- VHE optimizations
 - EL2 address space randomization
 - speculative execution mitigations ("variant 3a", aka execution past invalid
 privilege register access)
 - bugfixes and cleanups
 
 PPC:
 - improvements for the radix page fault handler for HV KVM on POWER9
 
 s390:
 - more kvm stat counters
 - virtio gpu plumbing
 - documentation
 - facilities improvements
 
 x86:
 - support for VMware magic I/O port and pseudo-PMCs
 - AMD pause loop exiting
 - support for AMD core performance extensions
 - support for synchronous register access
 - expose nVMX capabilities to userspace
 - support for Hyper-V signaling via eventfd
 - use Enlightened VMCS when running on Hyper-V
 - allow userspace to disable MWAIT/HLT/PAUSE vmexits
 - usual roundup of optimizations and nested virtualization bugfixes
 
 Generic:
 - API selftest infrastructure (though the only tests are for x86 as of now)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
 rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
 N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
 kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
 UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
 Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
 =bPlD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - VHE optimizations

   - EL2 address space randomization

   - speculative execution mitigations ("variant 3a", aka execution past
     invalid privilege register access)

   - bugfixes and cleanups

  PPC:
   - improvements for the radix page fault handler for HV KVM on POWER9

  s390:
   - more kvm stat counters

   - virtio gpu plumbing

   - documentation

   - facilities improvements

  x86:
   - support for VMware magic I/O port and pseudo-PMCs

   - AMD pause loop exiting

   - support for AMD core performance extensions

   - support for synchronous register access

   - expose nVMX capabilities to userspace

   - support for Hyper-V signaling via eventfd

   - use Enlightened VMCS when running on Hyper-V

   - allow userspace to disable MWAIT/HLT/PAUSE vmexits

   - usual roundup of optimizations and nested virtualization bugfixes

  Generic:
   - API selftest infrastructure (though the only tests are for x86 as
     of now)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
  kvm: x86: fix a prototype warning
  kvm: selftests: add sync_regs_test
  kvm: selftests: add API testing infrastructure
  kvm: x86: fix a compile warning
  KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
  KVM: X86: Introduce handle_ud()
  KVM: vmx: unify adjacent #ifdefs
  x86: kvm: hide the unused 'cpu' variable
  KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
  Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
  kvm: Add emulation for movups/movupd
  KVM: VMX: raise internal error for exception during invalid protected mode state
  KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
  KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
  KVM: x86: Fix misleading comments on handling pending exceptions
  KVM: x86: Rename interrupt.pending to interrupt.injected
  KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
  x86/kvm: use Enlightened VMCS when running on Hyper-V
  x86/hyper-v: detect nested features
  x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  ...
This commit is contained in:
Linus Torvalds 2018-04-09 11:42:31 -07:00
commit d8312a3f61
150 changed files with 11930 additions and 2012 deletions

View File

@ -1907,6 +1907,9 @@
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
Default is false (don't support).
kvm.mmu_audit= [KVM] This is a R/W parameter which allows audit
KVM MMU at runtime.
Default is 0 (off)

View File

@ -86,9 +86,12 @@ Translation table lookup with 64KB pages:
+-------------------------------------------------> [63] TTBR0/1
When using KVM without the Virtualization Host Extensions, the hypervisor
maps kernel pages in EL2 at a fixed offset from the kernel VA. See the
kern_hyp_va macro for more details.
When using KVM without the Virtualization Host Extensions, the
hypervisor maps kernel pages in EL2 at a fixed (and potentially
random) offset from the linear mapping. See the kern_hyp_va macro and
kvm_update_va_mask function for more details. MMIO devices such as
GICv2 gets mapped next to the HYP idmap page, as do vectors when
ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.

View File

@ -1,7 +1,12 @@
00-INDEX
- this file.
amd-memory-encryption.rst
- notes on AMD Secure Encrypted Virtualization feature and SEV firmware
command description
api.txt
- KVM userspace API.
arm
- internal ABI between the kernel and HYP (for arm/arm64)
cpuid.txt
- KVM-specific cpuid leaves (x86).
devices/
@ -26,6 +31,5 @@ s390-diag.txt
- Diagnose hypercall description (for IBM S/390)
timekeeping.txt
- timekeeping virtualization for x86-based architectures.
amd-memory-encryption.txt
- notes on AMD Secure Encrypted Virtualization feature and SEV firmware
command description
vcpu-requests.rst
- internal VCPU request API

View File

@ -3480,7 +3480,7 @@ encrypted VMs.
Currently, this ioctl is used for issuing Secure Encrypted Virtualization
(SEV) commands on AMD Processors. The SEV commands are defined in
Documentation/virtual/kvm/amd-memory-encryption.txt.
Documentation/virtual/kvm/amd-memory-encryption.rst.
4.111 KVM_MEMORY_ENCRYPT_REG_REGION
@ -3516,6 +3516,38 @@ Returns: 0 on success; -1 on error
This ioctl can be used to unregister the guest memory region registered
with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
4.113 KVM_HYPERV_EVENTFD
Capability: KVM_CAP_HYPERV_EVENTFD
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_hyperv_eventfd (in)
This ioctl (un)registers an eventfd to receive notifications from the guest on
the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
causing a user exit. SIGNAL_EVENT hypercall with non-zero event flag number
(bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit.
struct kvm_hyperv_eventfd {
__u32 conn_id;
__s32 fd;
__u32 flags;
__u32 padding[3];
};
The conn_id field should fit within 24 bits:
#define KVM_HYPERV_CONN_ID_MASK 0x00ffffff
The acceptable values for the flags field are:
#define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0)
Returns: 0 on success,
-EINVAL if conn_id or flags is outside the allowed range
-ENOENT on deassign if the conn_id isn't registered
-EEXIST on assign if the conn_id is already registered
5. The kvm_run structure
------------------------
@ -3873,7 +3905,7 @@ in userspace.
__u64 kvm_dirty_regs;
union {
struct kvm_sync_regs regs;
char padding[1024];
char padding[SYNC_REGS_SIZE_BYTES];
} s;
If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
@ -4078,6 +4110,46 @@ Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
the guest.
6.74 KVM_CAP_SYNC_REGS
Architectures: s390, x86
Target: s390: always enabled, x86: vcpu
Parameters: none
Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register
sets are supported (bitfields defined in arch/x86/include/uapi/asm/kvm.h).
As described above in the kvm_sync_regs struct info in section 5 (kvm_run):
KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers
without having to call SET/GET_*REGS". This reduces overhead by eliminating
repeated ioctl calls for setting and/or getting register values. This is
particularly important when userspace is making synchronous guest state
modifications, e.g. when emulating and/or intercepting instructions in
userspace.
For s390 specifics, please refer to the source code.
For x86:
- the register sets to be copied out to kvm_run are selectable
by userspace (rather that all sets being copied out for every exit).
- vcpu_events are available in addition to regs and sregs.
For x86, the 'kvm_valid_regs' field of struct kvm_run is overloaded to
function as an input bit-array field set by userspace to indicate the
specific register sets to be copied out on the next exit.
To indicate when userspace has modified values that should be copied into
the vCPU, the all architecture bitarray field, 'kvm_dirty_regs' must be set.
This is done using the same bitflags as for the 'kvm_valid_regs' field.
If the dirty bit is not set, then the register set values will not be copied
into the vCPU even if they've been modified.
Unused bitfields in the bitarrays must be set to zero.
struct kvm_sync_regs {
struct kvm_regs regs;
struct kvm_sregs sregs;
struct kvm_vcpu_events events;
};
7. Capabilities that can be enabled on VMs
------------------------------------------
@ -4286,6 +4358,26 @@ enables QEMU to build error log and branch to guest kernel registered
machine check handling routine. Without this capability KVM will
branch to guests' 0x200 interrupt vector.
7.13 KVM_CAP_X86_DISABLE_EXITS
Architectures: x86
Parameters: args[0] defines which exits are disabled
Returns: 0 on success, -EINVAL when args[0] contains invalid exits
Valid bits in args[0] are
#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
workloads, and is suggested when vCPUs are associated to dedicated
physical CPUs. More bits can be added in the future; userspace can
just pass the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable
all such vmexits.
Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
8. Other capabilities.
----------------------
@ -4398,15 +4490,6 @@ reserved.
Both registers and addresses are 64-bits wide.
It will be possible to run 64-bit or 32-bit guest code.
8.8 KVM_CAP_X86_GUEST_MWAIT
Architectures: x86
This capability indicates that guest using memory monotoring instructions
(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit. As such time
spent while virtual CPU is halted in this way will then be accounted for as
guest running time on the host (as opposed to e.g. HLT).
8.9 KVM_CAP_ARM_USER_IRQ
Architectures: arm, arm64
@ -4483,3 +4566,33 @@ Parameters: none
This capability indicates if the flic device will be able to get/set the
AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
to discover this without having to create a flic device.
8.14 KVM_CAP_S390_PSW
Architectures: s390
This capability indicates that the PSW is exposed via the kvm_run structure.
8.15 KVM_CAP_S390_GMAP
Architectures: s390
This capability indicates that the user space memory used as guest mapping can
be anywhere in the user memory address space, as long as the memory slots are
aligned and sized to a segment (1MB) boundary.
8.16 KVM_CAP_S390_COW
Architectures: s390
This capability indicates that the user space memory used as guest mapping can
use copy-on-write semantics as well as dirty pages tracking via read-only page
tables.
8.17 KVM_CAP_S390_BPB
Architectures: s390
This capability indicates that kvm will implement the interfaces to handle
reset, migration and nested KVM for branch prediction blocking. The stfle
facility 82 should not be provided to the guest without this capability.

View File

@ -23,8 +23,8 @@ This function queries the presence of KVM cpuid leafs.
function: define KVM_CPUID_FEATURES (0x40000001)
returns : ebx, ecx, edx = 0
eax = and OR'ed group of (1 << flag), where each flags is:
returns : ebx, ecx
eax = an OR'ed group of (1 << flag), where each flags is:
flag || value || meaning
@ -66,3 +66,14 @@ KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
|| || per-cpu warps are expected in
|| || kvmclock.
------------------------------------------------------------------------------
edx = an OR'ed group of (1 << flag), where each flags is:
flag || value || meaning
==================================================================================
KVM_HINTS_DEDICATED || 0 || guest checks this feature bit to
|| || determine if there is vCPU pinning
|| || and there is no vCPU over-commitment,
|| || allowing optimizations
----------------------------------------------------------------------------------

View File

@ -6516,7 +6516,7 @@ S: Maintained
F: Documentation/networking/netvsc.txt
F: arch/x86/include/asm/mshyperv.h
F: arch/x86/include/asm/trace/hyperv.h
F: arch/x86/include/uapi/asm/hyperv.h
F: arch/x86/include/asm/hyperv-tlfs.h
F: arch/x86/kernel/cpu/mshyperv.c
F: arch/x86/hyperv
F: drivers/hid/hid-hyperv.c

View File

@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
/* no VHE on 32-bit :( */
static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { BUG(); return 0; }
extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
extern void __init_stage2_translation(void);

View File

@ -41,7 +41,17 @@ static inline unsigned long *vcpu_reg32(struct kvm_vcpu *vcpu, u8 reg_num)
return vcpu_reg(vcpu, reg_num);
}
unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu);
static inline unsigned long vpcu_read_spsr(struct kvm_vcpu *vcpu)
{
return *__vcpu_spsr(vcpu);
}
static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
{
*__vcpu_spsr(vcpu) = v;
}
static inline unsigned long vcpu_get_reg(struct kvm_vcpu *vcpu,
u8 reg_num)
@ -92,14 +102,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
vcpu->arch.hcr = HCR_GUEST_MASK;
}
static inline unsigned long vcpu_get_hcr(const struct kvm_vcpu *vcpu)
static inline unsigned long *vcpu_hcr(const struct kvm_vcpu *vcpu)
{
return vcpu->arch.hcr;
}
static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
{
vcpu->arch.hcr = hcr;
return (unsigned long *)&vcpu->arch.hcr;
}
static inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu)

View File

@ -155,9 +155,6 @@ struct kvm_vcpu_arch {
/* HYP trapping configuration */
u32 hcr;
/* Interrupt related fields */
u32 irq_lines; /* IRQ and FIQ levels */
/* Exception Information */
struct kvm_vcpu_fault_info fault;
@ -315,4 +312,7 @@ static inline bool kvm_arm_harden_branch_predictor(void)
return false;
}
static inline void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu) {}
static inline void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu) {}
#endif /* __ARM_KVM_HOST_H__ */

View File

@ -110,6 +110,10 @@ void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
asmlinkage void __vfp_save_state(struct vfp_hard_struct *vfp);
asmlinkage void __vfp_restore_state(struct vfp_hard_struct *vfp);

View File

@ -28,6 +28,13 @@
*/
#define kern_hyp_va(kva) (kva)
/* Contrary to arm64, there is no need to generate a PC-relative address */
#define hyp_symbol_addr(s) \
({ \
typeof(s) *addr = &(s); \
addr; \
})
/*
* KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
*/
@ -42,8 +49,15 @@
#include <asm/pgalloc.h>
#include <asm/stage2_pgtable.h>
/* Ensure compatibility with arm64 */
#define VA_BITS 32
int create_hyp_mappings(void *from, void *to, pgprot_t prot);
int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
void __iomem **kaddr,
void __iomem **haddr);
int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
void **haddr);
void free_hyp_pgds(void);
void stage2_unmap_vm(struct kvm *kvm);

View File

@ -135,6 +135,15 @@ struct kvm_arch_memory_slot {
#define KVM_REG_ARM_CRM_SHIFT 7
#define KVM_REG_ARM_32_CRN_MASK 0x0000000000007800
#define KVM_REG_ARM_32_CRN_SHIFT 11
/*
* For KVM currently all guest registers are nonsecure, but we reserve a bit
* in the encoding to distinguish secure from nonsecure for AArch32 system
* registers that are banked by security. This is 1 for the secure banked
* register, and 0 for the nonsecure banked register or if the register is
* not banked by security.
*/
#define KVM_REG_ARM_SECURE_MASK 0x0000000010000000
#define KVM_REG_ARM_SECURE_SHIFT 28
#define ARM_CP15_REG_SHIFT_MASK(x,n) \
(((x) << KVM_REG_ARM_ ## n ## _SHIFT) & KVM_REG_ARM_ ## n ## _MASK)

View File

@ -270,6 +270,60 @@ static bool access_gic_sre(struct kvm_vcpu *vcpu,
return true;
}
static bool access_cntp_tval(struct kvm_vcpu *vcpu,
const struct coproc_params *p,
const struct coproc_reg *r)
{
u64 now = kvm_phys_timer_read();
u64 val;
if (p->is_write) {
val = *vcpu_reg(vcpu, p->Rt1);
kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL, val + now);
} else {
val = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL);
*vcpu_reg(vcpu, p->Rt1) = val - now;
}
return true;
}
static bool access_cntp_ctl(struct kvm_vcpu *vcpu,
const struct coproc_params *p,
const struct coproc_reg *r)
{
u32 val;
if (p->is_write) {
val = *vcpu_reg(vcpu, p->Rt1);
kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CTL, val);
} else {
val = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CTL);
*vcpu_reg(vcpu, p->Rt1) = val;
}
return true;
}
static bool access_cntp_cval(struct kvm_vcpu *vcpu,
const struct coproc_params *p,
const struct coproc_reg *r)
{
u64 val;
if (p->is_write) {
val = (u64)*vcpu_reg(vcpu, p->Rt2) << 32;
val |= *vcpu_reg(vcpu, p->Rt1);
kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL, val);
} else {
val = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL);
*vcpu_reg(vcpu, p->Rt1) = val;
*vcpu_reg(vcpu, p->Rt2) = val >> 32;
}
return true;
}
/*
* We could trap ID_DFR0 and tell the guest we don't support performance
* monitoring. Unfortunately the patch to make the kernel check ID_DFR0 was
@ -423,10 +477,17 @@ static const struct coproc_reg cp15_regs[] = {
{ CRn(13), CRm( 0), Op1( 0), Op2( 4), is32,
NULL, reset_unknown, c13_TID_PRIV },
/* CNTP */
{ CRm64(14), Op1( 2), is64, access_cntp_cval},
/* CNTKCTL: swapped by interrupt.S. */
{ CRn(14), CRm( 1), Op1( 0), Op2( 0), is32,
NULL, reset_val, c14_CNTKCTL, 0x00000000 },
/* CNTP */
{ CRn(14), CRm( 2), Op1( 0), Op2( 0), is32, access_cntp_tval },
{ CRn(14), CRm( 2), Op1( 0), Op2( 1), is32, access_cntp_ctl },
/* The Configuration Base Address Register. */
{ CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
};

View File

@ -142,7 +142,7 @@ unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num)
/*
* Return the SPSR for the current mode of the virtual CPU.
*/
unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu)
unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu)
{
unsigned long mode = *vcpu_cpsr(vcpu) & MODE_MASK;
switch (mode) {
@ -174,5 +174,5 @@ unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu)
*/
void kvm_inject_vabt(struct kvm_vcpu *vcpu)
{
vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VA);
*vcpu_hcr(vcpu) |= HCR_VA;
}

View File

@ -9,7 +9,6 @@ KVM=../../../../virt/kvm
CFLAGS_ARMV7VE :=$(call cc-option, -march=armv7ve)
obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o
obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o

View File

@ -44,7 +44,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu, u32 *fpexc_host)
isb();
}
write_sysreg(vcpu->arch.hcr | vcpu->arch.irq_lines, HCR);
write_sysreg(vcpu->arch.hcr, HCR);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(HSTR_T(15), HSTR);
write_sysreg(HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11), HCPTR);
@ -90,18 +90,18 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
{
if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
__vgic_v3_save_state(vcpu);
else
__vgic_v2_save_state(vcpu);
__vgic_v3_deactivate_traps(vcpu);
}
}
static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
{
if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
__vgic_v3_activate_traps(vcpu);
__vgic_v3_restore_state(vcpu);
else
__vgic_v2_restore_state(vcpu);
}
}
static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
@ -154,7 +154,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
return true;
}
int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;

View File

@ -922,6 +922,22 @@ config HARDEN_BRANCH_PREDICTOR
If unsure, say Y.
config HARDEN_EL2_VECTORS
bool "Harden EL2 vector mapping against system register leak" if EXPERT
default y
help
Speculation attacks against some high-performance processors can
be used to leak privileged information such as the vector base
register, resulting in a potential defeat of the EL2 layout
randomization.
This config option will map the vectors to a fixed location,
independent of the EL2 code mapping, so that revealing VBAR_EL2
to an attacker does not give away any extra information. This
only gets enabled on affected CPUs.
If unsure, say Y.
menuconfig ARMV8_DEPRECATED
bool "Emulate deprecated/obsolete ARMv8 instructions"
depends on COMPAT

View File

@ -5,6 +5,8 @@
#include <asm/cpucaps.h>
#include <asm/insn.h>
#define ARM64_CB_PATCH ARM64_NCAPS
#ifndef __ASSEMBLY__
#include <linux/init.h>
@ -22,12 +24,19 @@ struct alt_instr {
u8 alt_len; /* size of new instruction(s), <= orig_len */
};
typedef void (*alternative_cb_t)(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
void __init apply_alternatives_all(void);
void apply_alternatives(void *start, size_t length);
#define ALTINSTR_ENTRY(feature) \
#define ALTINSTR_ENTRY(feature,cb) \
" .word 661b - .\n" /* label */ \
" .if " __stringify(cb) " == 0\n" \
" .word 663f - .\n" /* new instruction */ \
" .else\n" \
" .word " __stringify(cb) "- .\n" /* callback */ \
" .endif\n" \
" .hword " __stringify(feature) "\n" /* feature bit */ \
" .byte 662b-661b\n" /* source len */ \
" .byte 664f-663f\n" /* replacement len */
@ -45,15 +54,18 @@ void apply_alternatives(void *start, size_t length);
* but most assemblers die if insn1 or insn2 have a .inst. This should
* be fixed in a binutils release posterior to 2.25.51.0.2 (anything
* containing commit 4e4d08cf7399b606 or c1baaddf8861).
*
* Alternatives with callbacks do not generate replacement instructions.
*/
#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled) \
#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled, cb) \
".if "__stringify(cfg_enabled)" == 1\n" \
"661:\n\t" \
oldinstr "\n" \
"662:\n" \
".pushsection .altinstructions,\"a\"\n" \
ALTINSTR_ENTRY(feature) \
ALTINSTR_ENTRY(feature,cb) \
".popsection\n" \
" .if " __stringify(cb) " == 0\n" \
".pushsection .altinstr_replacement, \"a\"\n" \
"663:\n\t" \
newinstr "\n" \
@ -61,11 +73,17 @@ void apply_alternatives(void *start, size_t length);
".popsection\n\t" \
".org . - (664b-663b) + (662b-661b)\n\t" \
".org . - (662b-661b) + (664b-663b)\n" \
".else\n\t" \
"663:\n\t" \
"664:\n\t" \
".endif\n" \
".endif\n"
#define _ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg, ...) \
__ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg))
__ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg), 0)
#define ALTERNATIVE_CB(oldinstr, cb) \
__ALTERNATIVE_CFG(oldinstr, "NOT_AN_INSTRUCTION", ARM64_CB_PATCH, 1, cb)
#else
#include <asm/assembler.h>
@ -132,6 +150,14 @@ void apply_alternatives(void *start, size_t length);
661:
.endm
.macro alternative_cb cb
.set .Lasm_alt_mode, 0
.pushsection .altinstructions, "a"
altinstruction_entry 661f, \cb, ARM64_CB_PATCH, 662f-661f, 0
.popsection
661:
.endm
/*
* Provide the other half of the alternative code sequence.
*/
@ -157,6 +183,13 @@ void apply_alternatives(void *start, size_t length);
.org . - (662b-661b) + (664b-663b)
.endm
/*
* Callback-based alternative epilogue
*/
.macro alternative_cb_end
662:
.endm
/*
* Provides a trivial alternative or default sequence consisting solely
* of NOPs. The number of NOPs is chosen automatically to match the

View File

@ -32,7 +32,7 @@
#define ARM64_HAS_VIRT_HOST_EXTN 11
#define ARM64_WORKAROUND_CAVIUM_27456 12
#define ARM64_HAS_32BIT_EL0 13
#define ARM64_HYP_OFFSET_LOW 14
#define ARM64_HARDEN_EL2_VECTORS 14
#define ARM64_MISMATCHED_CACHE_LINE_SIZE 15
#define ARM64_HAS_NO_FPSIMD 16
#define ARM64_WORKAROUND_REPEAT_TLBI 17

View File

@ -70,6 +70,7 @@ enum aarch64_insn_imm_type {
AARCH64_INSN_IMM_6,
AARCH64_INSN_IMM_S,
AARCH64_INSN_IMM_R,
AARCH64_INSN_IMM_N,
AARCH64_INSN_IMM_MAX
};
@ -314,6 +315,11 @@ __AARCH64_INSN_FUNCS(eor, 0x7F200000, 0x4A000000)
__AARCH64_INSN_FUNCS(eon, 0x7F200000, 0x4A200000)
__AARCH64_INSN_FUNCS(ands, 0x7F200000, 0x6A000000)
__AARCH64_INSN_FUNCS(bics, 0x7F200000, 0x6A200000)
__AARCH64_INSN_FUNCS(and_imm, 0x7F800000, 0x12000000)
__AARCH64_INSN_FUNCS(orr_imm, 0x7F800000, 0x32000000)
__AARCH64_INSN_FUNCS(eor_imm, 0x7F800000, 0x52000000)
__AARCH64_INSN_FUNCS(ands_imm, 0x7F800000, 0x72000000)
__AARCH64_INSN_FUNCS(extr, 0x7FA00000, 0x13800000)
__AARCH64_INSN_FUNCS(b, 0xFC000000, 0x14000000)
__AARCH64_INSN_FUNCS(bl, 0xFC000000, 0x94000000)
__AARCH64_INSN_FUNCS(cbz, 0x7F000000, 0x34000000)
@ -423,6 +429,16 @@ u32 aarch64_insn_gen_logical_shifted_reg(enum aarch64_insn_register dst,
int shift,
enum aarch64_insn_variant variant,
enum aarch64_insn_logic_type type);
u32 aarch64_insn_gen_logical_immediate(enum aarch64_insn_logic_type type,
enum aarch64_insn_variant variant,
enum aarch64_insn_register Rn,
enum aarch64_insn_register Rd,
u64 imm);
u32 aarch64_insn_gen_extr(enum aarch64_insn_variant variant,
enum aarch64_insn_register Rm,
enum aarch64_insn_register Rn,
enum aarch64_insn_register Rd,
u8 lsb);
u32 aarch64_insn_gen_prefetch(enum aarch64_insn_register base,
enum aarch64_insn_prfm_type type,
enum aarch64_insn_prfm_target target,

View File

@ -25,6 +25,7 @@
/* Hyp Configuration Register (HCR) bits */
#define HCR_TEA (UL(1) << 37)
#define HCR_TERR (UL(1) << 36)
#define HCR_TLOR (UL(1) << 35)
#define HCR_E2H (UL(1) << 34)
#define HCR_ID (UL(1) << 33)
#define HCR_CD (UL(1) << 32)
@ -64,6 +65,7 @@
/*
* The bits we set in HCR:
* TLOR: Trap LORegion register accesses
* RW: 64bit by default, can be overridden for 32bit VMs
* TAC: Trap ACTLR
* TSC: Trap SMC
@ -81,9 +83,9 @@
*/
#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW)
HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
HCR_FMO | HCR_IMO)
#define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
#define HCR_INT_OVERRIDE (HCR_FMO | HCR_IMO)
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
/* TCR_EL2 Registers bits */

View File

@ -33,6 +33,7 @@
#define KVM_ARM64_DEBUG_DIRTY_SHIFT 0
#define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
/* Translate a kernel address of @sym into its equivalent linear mapping */
#define kvm_ksym_ref(sym) \
({ \
void *val = &sym; \
@ -57,7 +58,9 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
extern u64 __vgic_v3_get_ich_vtr_el2(void);
extern u64 __vgic_v3_read_vmcr(void);
@ -70,6 +73,20 @@ extern u32 __init_stage2_translation(void);
extern void __qcom_hyp_sanitize_btac_predictors(void);
#else /* __ASSEMBLY__ */
.macro get_host_ctxt reg, tmp
adr_l \reg, kvm_host_cpu_state
mrs \tmp, tpidr_el2
add \reg, \reg, \tmp
.endm
.macro get_vcpu_ptr vcpu, ctxt
get_host_ctxt \ctxt, \vcpu
ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU]
kern_hyp_va \vcpu
.endm
#endif
#endif /* __ARM_KVM_ASM_H__ */

View File

@ -26,13 +26,15 @@
#include <asm/esr.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmio.h>
#include <asm/ptrace.h>
#include <asm/cputype.h>
#include <asm/virt.h>
unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr);
@ -45,6 +47,11 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu);
void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
{
return !(vcpu->arch.hcr_el2 & HCR_RW);
}
static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
@ -59,16 +66,19 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
vcpu->arch.hcr_el2 &= ~HCR_RW;
/*
* TID3: trap feature register accesses that we virtualise.
* For now this is conditional, since no AArch32 feature regs
* are currently virtualised.
*/
if (!vcpu_el1_is_32bit(vcpu))
vcpu->arch.hcr_el2 |= HCR_TID3;
}
static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.hcr_el2;
}
static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
{
vcpu->arch.hcr_el2 = hcr;
return (unsigned long *)&vcpu->arch.hcr_el2;
}
static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
@ -81,11 +91,27 @@ static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pc;
}
static inline unsigned long *vcpu_elr_el1(const struct kvm_vcpu *vcpu)
static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
{
return (unsigned long *)&vcpu_gp_regs(vcpu)->elr_el1;
}
static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
{
if (vcpu->arch.sysregs_loaded_on_cpu)
return read_sysreg_el1(elr);
else
return *__vcpu_elr_el1(vcpu);
}
static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned long v)
{
if (vcpu->arch.sysregs_loaded_on_cpu)
write_sysreg_el1(v, elr);
else
*__vcpu_elr_el1(vcpu) = v;
}
static inline unsigned long *vcpu_cpsr(const struct kvm_vcpu *vcpu)
{
return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pstate;
@ -135,13 +161,28 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
}
/* Get vcpu SPSR for current mode */
static inline unsigned long *vcpu_spsr(const struct kvm_vcpu *vcpu)
static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
{
if (vcpu_mode_is_32bit(vcpu))
return vcpu_spsr32(vcpu);
return vcpu_read_spsr32(vcpu);
return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
if (vcpu->arch.sysregs_loaded_on_cpu)
return read_sysreg_el1(spsr);
else
return vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
}
static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
{
if (vcpu_mode_is_32bit(vcpu)) {
vcpu_write_spsr32(vcpu, v);
return;
}
if (vcpu->arch.sysregs_loaded_on_cpu)
write_sysreg_el1(v, spsr);
else
vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
}
static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
@ -282,15 +323,18 @@ static inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
{
return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
}
static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
{
if (vcpu_mode_is_32bit(vcpu))
if (vcpu_mode_is_32bit(vcpu)) {
*vcpu_cpsr(vcpu) |= COMPAT_PSR_E_BIT;
else
vcpu_sys_reg(vcpu, SCTLR_EL1) |= (1 << 25);
} else {
u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
sctlr |= (1 << 25);
vcpu_write_sys_reg(vcpu, SCTLR_EL1, sctlr);
}
}
static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
@ -298,7 +342,7 @@ static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
if (vcpu_mode_is_32bit(vcpu))
return !!(*vcpu_cpsr(vcpu) & COMPAT_PSR_E_BIT);
return !!(vcpu_sys_reg(vcpu, SCTLR_EL1) & (1 << 25));
return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & (1 << 25));
}
static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,

View File

@ -272,9 +272,6 @@ struct kvm_vcpu_arch {
/* IO related fields */
struct kvm_decode mmio_decode;
/* Interrupt related fields */
u64 irq_lines; /* IRQ and FIQ levels */
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
@ -287,10 +284,25 @@ struct kvm_vcpu_arch {
/* Virtual SError ESR to restore when HCR_EL2.VSE is set */
u64 vsesr_el2;
/* True when deferrable sysregs are loaded on the physical CPU,
* see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
bool sysregs_loaded_on_cpu;
};
#define vcpu_gp_regs(v) (&(v)->arch.ctxt.gp_regs)
#define vcpu_sys_reg(v,r) ((v)->arch.ctxt.sys_regs[(r)])
/*
* Only use __vcpu_sys_reg if you know you want the memory backed version of a
* register, and not the one most recently accessed by a running VCPU. For
* example, for userspace access or for system registers that are never context
* switched, but only emulated.
*/
#define __vcpu_sys_reg(v,r) ((v)->arch.ctxt.sys_regs[(r)])
u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
/*
* CP14 and CP15 live in the same array, as they are backed by the
* same system registers.
@ -298,14 +310,6 @@ struct kvm_vcpu_arch {
#define vcpu_cp14(v,r) ((v)->arch.ctxt.copro[(r)])
#define vcpu_cp15(v,r) ((v)->arch.ctxt.copro[(r)])
#ifdef CONFIG_CPU_BIG_ENDIAN
#define vcpu_cp15_64_high(v,r) vcpu_cp15((v),(r))
#define vcpu_cp15_64_low(v,r) vcpu_cp15((v),(r) + 1)
#else
#define vcpu_cp15_64_high(v,r) vcpu_cp15((v),(r) + 1)
#define vcpu_cp15_64_low(v,r) vcpu_cp15((v),(r))
#endif
struct kvm_vm_stat {
ulong remote_tlb_flush;
};
@ -358,10 +362,15 @@ int kvm_perf_teardown(void);
struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
void __kvm_set_tpidr_el2(u64 tpidr_el2);
DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
unsigned long hyp_stack_ptr,
unsigned long vector_ptr)
{
u64 tpidr_el2;
/*
* Call initialization code, and switch to the full blown HYP code.
* If the cpucaps haven't been finalized yet, something has gone very
@ -370,6 +379,16 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
*/
BUG_ON(!static_branch_likely(&arm64_const_caps_ready));
__kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr);
/*
* Calculate the raw per-cpu offset without a translation from the
* kernel's mapping to the linear mapping, and store it in tpidr_el2
* so that we can use adr_l to access per-cpu variables in EL2.
*/
tpidr_el2 = (u64)this_cpu_ptr(&kvm_host_cpu_state)
- (u64)kvm_ksym_ref(kvm_host_cpu_state);
kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2);
}
static inline void kvm_arch_hardware_unsetup(void) {}
@ -416,6 +435,13 @@ static inline void kvm_arm_vhe_guest_enter(void)
static inline void kvm_arm_vhe_guest_exit(void)
{
local_daif_restore(DAIF_PROCCTX_NOIRQ);
/*
* When we exit from the guest we change a number of CPU configuration
* parameters, such as traps. Make sure these changes take effect
* before running the host or additional guests.
*/
isb();
}
static inline bool kvm_arm_harden_branch_predictor(void)
@ -423,4 +449,7 @@ static inline bool kvm_arm_harden_branch_predictor(void)
return cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR);
}
void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu);
void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
#endif /* __ARM64_KVM_HOST_H__ */

View File

@ -120,37 +120,38 @@ typeof(orig) * __hyp_text fname(void) \
return val; \
}
void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
void __timer_enable_traps(struct kvm_vcpu *vcpu);
void __timer_disable_traps(struct kvm_vcpu *vcpu);
void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt);
void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt);
void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt);
void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt);
void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt);
void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt);
void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt);
void __sysreg32_save_state(struct kvm_vcpu *vcpu);
void __sysreg32_restore_state(struct kvm_vcpu *vcpu);
void __debug_save_state(struct kvm_vcpu *vcpu,
struct kvm_guest_debug_arch *dbg,
struct kvm_cpu_context *ctxt);
void __debug_restore_state(struct kvm_vcpu *vcpu,
struct kvm_guest_debug_arch *dbg,
struct kvm_cpu_context *ctxt);
void __debug_cond_save_host_state(struct kvm_vcpu *vcpu);
void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu);
void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
void __debug_switch_to_host(struct kvm_vcpu *vcpu);
void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
bool __fpsimd_enabled(void);
void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
void deactivate_traps_vhe_put(void);
u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
void __noreturn __hyp_do_panic(unsigned long, ...);

View File

@ -69,9 +69,6 @@
* mappings, and none of this applies in that case.
*/
#define HYP_PAGE_OFFSET_HIGH_MASK ((UL(1) << VA_BITS) - 1)
#define HYP_PAGE_OFFSET_LOW_MASK ((UL(1) << (VA_BITS - 1)) - 1)
#ifdef __ASSEMBLY__
#include <asm/alternative.h>
@ -81,28 +78,19 @@
* Convert a kernel VA into a HYP VA.
* reg: VA to be converted.
*
* This generates the following sequences:
* - High mask:
* and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
* nop
* - Low mask:
* and x0, x0, #HYP_PAGE_OFFSET_HIGH_MASK
* and x0, x0, #HYP_PAGE_OFFSET_LOW_MASK
* - VHE:
* nop
* nop
*
* The "low mask" version works because the mask is a strict subset of
* the "high mask", hence performing the first mask for nothing.
* Should be completely invisible on any viable CPU.
* The actual code generation takes place in kvm_update_va_mask, and
* the instructions below are only there to reserve the space and
* perform the register allocation (kvm_update_va_mask uses the
* specific registers encoded in the instructions).
*/
.macro kern_hyp_va reg
alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
and \reg, \reg, #HYP_PAGE_OFFSET_HIGH_MASK
alternative_else_nop_endif
alternative_if ARM64_HYP_OFFSET_LOW
and \reg, \reg, #HYP_PAGE_OFFSET_LOW_MASK
alternative_else_nop_endif
alternative_cb kvm_update_va_mask
and \reg, \reg, #1 /* mask with va_mask */
ror \reg, \reg, #1 /* rotate to the first tag bit */
add \reg, \reg, #0 /* insert the low 12 bits of the tag */
add \reg, \reg, #0, lsl 12 /* insert the top 12 bits of the tag */
ror \reg, \reg, #63 /* rotate back */
alternative_cb_end
.endm
#else
@ -113,23 +101,43 @@ alternative_else_nop_endif
#include <asm/mmu_context.h>
#include <asm/pgtable.h>
void kvm_update_va_mask(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
static inline unsigned long __kern_hyp_va(unsigned long v)
{
asm volatile(ALTERNATIVE("and %0, %0, %1",
"nop",
ARM64_HAS_VIRT_HOST_EXTN)
: "+r" (v)
: "i" (HYP_PAGE_OFFSET_HIGH_MASK));
asm volatile(ALTERNATIVE("nop",
"and %0, %0, %1",
ARM64_HYP_OFFSET_LOW)
: "+r" (v)
: "i" (HYP_PAGE_OFFSET_LOW_MASK));
asm volatile(ALTERNATIVE_CB("and %0, %0, #1\n"
"ror %0, %0, #1\n"
"add %0, %0, #0\n"
"add %0, %0, #0, lsl 12\n"
"ror %0, %0, #63\n",
kvm_update_va_mask)
: "+r" (v));
return v;
}
#define kern_hyp_va(v) ((typeof(v))(__kern_hyp_va((unsigned long)(v))))
/*
* Obtain the PC-relative address of a kernel symbol
* s: symbol
*
* The goal of this macro is to return a symbol's address based on a
* PC-relative computation, as opposed to a loading the VA from a
* constant pool or something similar. This works well for HYP, as an
* absolute VA is guaranteed to be wrong. Only use this if trying to
* obtain the address of a symbol (i.e. not something you obtained by
* following a pointer).
*/
#define hyp_symbol_addr(s) \
({ \
typeof(s) *addr; \
asm("adrp %0, %1\n" \
"add %0, %0, :lo12:%1\n" \
: "=r" (addr) : "S" (&s)); \
addr; \
})
/*
* We currently only support a 40bit IPA.
*/
@ -140,7 +148,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
#include <asm/stage2_pgtable.h>
int create_hyp_mappings(void *from, void *to, pgprot_t prot);
int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
void __iomem **kaddr,
void __iomem **haddr);
int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
void **haddr);
void free_hyp_pgds(void);
void stage2_unmap_vm(struct kvm *kvm);
@ -249,7 +261,7 @@ struct kvm;
static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
{
return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
}
static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
@ -348,36 +360,95 @@ static inline unsigned int kvm_get_vmid_bits(void)
return (cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8;
}
#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
#ifdef CONFIG_KVM_INDIRECT_VECTORS
/*
* EL2 vectors can be mapped and rerouted in a number of ways,
* depending on the kernel configuration and CPU present:
*
* - If the CPU has the ARM64_HARDEN_BRANCH_PREDICTOR cap, the
* hardening sequence is placed in one of the vector slots, which is
* executed before jumping to the real vectors.
*
* - If the CPU has both the ARM64_HARDEN_EL2_VECTORS cap and the
* ARM64_HARDEN_BRANCH_PREDICTOR cap, the slot containing the
* hardening sequence is mapped next to the idmap page, and executed
* before jumping to the real vectors.
*
* - If the CPU only has the ARM64_HARDEN_EL2_VECTORS cap, then an
* empty slot is selected, mapped next to the idmap page, and
* executed before jumping to the real vectors.
*
* Note that ARM64_HARDEN_EL2_VECTORS is somewhat incompatible with
* VHE, as we don't have hypervisor-specific mappings. If the system
* is VHE and yet selects this capability, it will be ignored.
*/
#include <asm/mmu.h>
extern void *__kvm_bp_vect_base;
extern int __kvm_harden_el2_vector_slot;
static inline void *kvm_get_hyp_vector(void)
{
struct bp_hardening_data *data = arm64_get_bp_hardening_data();
void *vect = kvm_ksym_ref(__kvm_hyp_vector);
void *vect = kern_hyp_va(kvm_ksym_ref(__kvm_hyp_vector));
int slot = -1;
if (data->fn) {
vect = __bp_harden_hyp_vecs_start +
data->hyp_vectors_slot * SZ_2K;
if (!has_vhe())
vect = lm_alias(vect);
if (cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR) && data->fn) {
vect = kern_hyp_va(kvm_ksym_ref(__bp_harden_hyp_vecs_start));
slot = data->hyp_vectors_slot;
}
if (this_cpu_has_cap(ARM64_HARDEN_EL2_VECTORS) && !has_vhe()) {
vect = __kvm_bp_vect_base;
if (slot == -1)
slot = __kvm_harden_el2_vector_slot;
}
if (slot != -1)
vect += slot * SZ_2K;
return vect;
}
/* This is only called on a !VHE system */
static inline int kvm_map_vectors(void)
{
return create_hyp_mappings(kvm_ksym_ref(__bp_harden_hyp_vecs_start),
kvm_ksym_ref(__bp_harden_hyp_vecs_end),
PAGE_HYP_EXEC);
}
/*
* HBP = ARM64_HARDEN_BRANCH_PREDICTOR
* HEL2 = ARM64_HARDEN_EL2_VECTORS
*
* !HBP + !HEL2 -> use direct vectors
* HBP + !HEL2 -> use hardened vectors in place
* !HBP + HEL2 -> allocate one vector slot and use exec mapping
* HBP + HEL2 -> use hardened vertors and use exec mapping
*/
if (cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR)) {
__kvm_bp_vect_base = kvm_ksym_ref(__bp_harden_hyp_vecs_start);
__kvm_bp_vect_base = kern_hyp_va(__kvm_bp_vect_base);
}
if (cpus_have_const_cap(ARM64_HARDEN_EL2_VECTORS)) {
phys_addr_t vect_pa = __pa_symbol(__bp_harden_hyp_vecs_start);
unsigned long size = (__bp_harden_hyp_vecs_end -
__bp_harden_hyp_vecs_start);
/*
* Always allocate a spare vector slot, as we don't
* know yet which CPUs have a BP hardening slot that
* we can reuse.
*/
__kvm_harden_el2_vector_slot = atomic_inc_return(&arm64_el2_vector_last_slot);
BUG_ON(__kvm_harden_el2_vector_slot >= BP_HARDEN_EL2_SLOTS);
return create_hyp_exec_mappings(vect_pa, size,
&__kvm_bp_vect_base);
}
return 0;
}
#else
static inline void *kvm_get_hyp_vector(void)
{
return kvm_ksym_ref(__kvm_hyp_vector);
return kern_hyp_va(kvm_ksym_ref(__kvm_hyp_vector));
}
static inline int kvm_map_vectors(void)

View File

@ -21,6 +21,8 @@
#define USER_ASID_FLAG (UL(1) << USER_ASID_BIT)
#define TTBR_ASID_MASK (UL(0xffff) << 48)
#define BP_HARDEN_EL2_SLOTS 4
#ifndef __ASSEMBLY__
typedef struct {
@ -49,9 +51,13 @@ struct bp_hardening_data {
bp_hardening_cb_t fn;
};
#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
#if (defined(CONFIG_HARDEN_BRANCH_PREDICTOR) || \
defined(CONFIG_HARDEN_EL2_VECTORS))
extern char __bp_harden_hyp_vecs_start[], __bp_harden_hyp_vecs_end[];
extern atomic_t arm64_el2_vector_last_slot;
#endif /* CONFIG_HARDEN_BRANCH_PREDICTOR || CONFIG_HARDEN_EL2_VECTORS */
#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
DECLARE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
static inline struct bp_hardening_data *arm64_get_bp_hardening_data(void)

View File

@ -288,6 +288,12 @@
#define SYS_MAIR_EL1 sys_reg(3, 0, 10, 2, 0)
#define SYS_AMAIR_EL1 sys_reg(3, 0, 10, 3, 0)
#define SYS_LORSA_EL1 sys_reg(3, 0, 10, 4, 0)
#define SYS_LOREA_EL1 sys_reg(3, 0, 10, 4, 1)
#define SYS_LORN_EL1 sys_reg(3, 0, 10, 4, 2)
#define SYS_LORC_EL1 sys_reg(3, 0, 10, 4, 3)
#define SYS_LORID_EL1 sys_reg(3, 0, 10, 4, 7)
#define SYS_VBAR_EL1 sys_reg(3, 0, 12, 0, 0)
#define SYS_DISR_EL1 sys_reg(3, 0, 12, 1, 1)

View File

@ -55,9 +55,7 @@ arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
arm64-obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
ifeq ($(CONFIG_KVM),y)
arm64-obj-$(CONFIG_HARDEN_BRANCH_PREDICTOR) += bpi.o
endif
arm64-obj-$(CONFIG_KVM_INDIRECT_VECTORS)+= bpi.o
obj-y += $(arm64-obj-y) vdso/ probes/
obj-m += $(arm64-obj-m)

View File

@ -107,32 +107,53 @@ static u32 get_alt_insn(struct alt_instr *alt, __le32 *insnptr, __le32 *altinsnp
return insn;
}
static void patch_alternative(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst)
{
__le32 *replptr;
int i;
replptr = ALT_REPL_PTR(alt);
for (i = 0; i < nr_inst; i++) {
u32 insn;
insn = get_alt_insn(alt, origptr + i, replptr + i);
updptr[i] = cpu_to_le32(insn);
}
}
static void __apply_alternatives(void *alt_region, bool use_linear_alias)
{
struct alt_instr *alt;
struct alt_region *region = alt_region;
__le32 *origptr, *replptr, *updptr;
__le32 *origptr, *updptr;
alternative_cb_t alt_cb;
for (alt = region->begin; alt < region->end; alt++) {
u32 insn;
int i, nr_inst;
int nr_inst;
if (!cpus_have_cap(alt->cpufeature))
/* Use ARM64_CB_PATCH as an unconditional patch */
if (alt->cpufeature < ARM64_CB_PATCH &&
!cpus_have_cap(alt->cpufeature))
continue;
BUG_ON(alt->alt_len != alt->orig_len);
if (alt->cpufeature == ARM64_CB_PATCH)
BUG_ON(alt->alt_len != 0);
else
BUG_ON(alt->alt_len != alt->orig_len);
pr_info_once("patching kernel code\n");
origptr = ALT_ORIG_PTR(alt);
replptr = ALT_REPL_PTR(alt);
updptr = use_linear_alias ? lm_alias(origptr) : origptr;
nr_inst = alt->alt_len / sizeof(insn);
nr_inst = alt->orig_len / AARCH64_INSN_SIZE;
for (i = 0; i < nr_inst; i++) {
insn = get_alt_insn(alt, origptr + i, replptr + i);
updptr[i] = cpu_to_le32(insn);
}
if (alt->cpufeature < ARM64_CB_PATCH)
alt_cb = patch_alternative;
else
alt_cb = ALT_REPL_PTR(alt);
alt_cb(alt, origptr, updptr, nr_inst);
flush_icache_range((uintptr_t)origptr,
(uintptr_t)(origptr + nr_inst));

View File

@ -138,6 +138,7 @@ int main(void)
DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs));
DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context));
DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx));

View File

@ -19,42 +19,61 @@
#include <linux/linkage.h>
#include <linux/arm-smccc.h>
.macro ventry target
.rept 31
#include <asm/alternative.h>
#include <asm/mmu.h>
.macro hyp_ventry
.align 7
1: .rept 27
nop
.endr
b \target
/*
* The default sequence is to directly branch to the KVM vectors,
* using the computed offset. This applies for VHE as well as
* !ARM64_HARDEN_EL2_VECTORS.
*
* For ARM64_HARDEN_EL2_VECTORS configurations, this gets replaced
* with:
*
* stp x0, x1, [sp, #-16]!
* movz x0, #(addr & 0xffff)
* movk x0, #((addr >> 16) & 0xffff), lsl #16
* movk x0, #((addr >> 32) & 0xffff), lsl #32
* br x0
*
* Where addr = kern_hyp_va(__kvm_hyp_vector) + vector-offset + 4.
* See kvm_patch_vector_branch for details.
*/
alternative_cb kvm_patch_vector_branch
b __kvm_hyp_vector + (1b - 0b)
nop
nop
nop
nop
alternative_cb_end
.endm
.macro vectors target
ventry \target + 0x000
ventry \target + 0x080
ventry \target + 0x100
ventry \target + 0x180
ventry \target + 0x200
ventry \target + 0x280
ventry \target + 0x300
ventry \target + 0x380
ventry \target + 0x400
ventry \target + 0x480
ventry \target + 0x500
ventry \target + 0x580
ventry \target + 0x600
ventry \target + 0x680
ventry \target + 0x700
ventry \target + 0x780
.macro generate_vectors
0:
.rept 16
hyp_ventry
.endr
.org 0b + SZ_2K // Safety measure
.endm
.text
.pushsection .hyp.text, "ax"
.align 11
ENTRY(__bp_harden_hyp_vecs_start)
.rept 4
vectors __kvm_hyp_vector
.rept BP_HARDEN_EL2_SLOTS
generate_vectors
.endr
ENTRY(__bp_harden_hyp_vecs_end)
.popsection
ENTRY(__qcom_hyp_sanitize_link_stack_start)
stp x29, x30, [sp, #-16]!
.rept 16

View File

@ -78,6 +78,8 @@ cpu_enable_trap_ctr_access(const struct arm64_cpu_capabilities *__unused)
config_sctlr_el1(SCTLR_EL1_UCT, 0);
}
atomic_t arm64_el2_vector_last_slot = ATOMIC_INIT(-1);
#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
#include <asm/mmu_context.h>
#include <asm/cacheflush.h>
@ -108,7 +110,6 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
const char *hyp_vecs_start,
const char *hyp_vecs_end)
{
static int last_slot = -1;
static DEFINE_SPINLOCK(bp_lock);
int cpu, slot = -1;
@ -121,10 +122,8 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
}
if (slot == -1) {
last_slot++;
BUG_ON(((__bp_harden_hyp_vecs_end - __bp_harden_hyp_vecs_start)
/ SZ_2K) <= last_slot);
slot = last_slot;
slot = atomic_inc_return(&arm64_el2_vector_last_slot);
BUG_ON(slot >= BP_HARDEN_EL2_SLOTS);
__copy_hyp_vect_bpi(slot, hyp_vecs_start, hyp_vecs_end);
}
@ -348,6 +347,10 @@ static const struct arm64_cpu_capabilities arm64_bp_harden_list[] = {
#endif
#ifndef ERRATA_MIDR_ALL_VERSIONS
#define ERRATA_MIDR_ALL_VERSIONS(x) MIDR_ALL_VERSIONS(x)
#endif
const struct arm64_cpu_capabilities arm64_errata[] = {
#if defined(CONFIG_ARM64_ERRATUM_826319) || \
defined(CONFIG_ARM64_ERRATUM_827319) || \
@ -500,6 +503,18 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
.capability = ARM64_HARDEN_BP_POST_GUEST_EXIT,
ERRATA_MIDR_RANGE_LIST(qcom_bp_harden_cpus),
},
#endif
#ifdef CONFIG_HARDEN_EL2_VECTORS
{
.desc = "Cortex-A57 EL2 vector hardening",
.capability = ARM64_HARDEN_EL2_VECTORS,
ERRATA_MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
},
{
.desc = "Cortex-A72 EL2 vector hardening",
.capability = ARM64_HARDEN_EL2_VECTORS,
ERRATA_MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
},
#endif
{
}

View File

@ -838,19 +838,6 @@ static bool has_no_hw_prefetch(const struct arm64_cpu_capabilities *entry, int _
MIDR_CPU_VAR_REV(1, MIDR_REVISION_MASK));
}
static bool hyp_offset_low(const struct arm64_cpu_capabilities *entry,
int __unused)
{
phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
/*
* Activate the lower HYP offset only if:
* - the idmap doesn't clash with it,
* - the kernel is not running at EL2.
*/
return idmap_addr > GENMASK(VA_BITS - 2, 0) && !is_kernel_in_hyp_mode();
}
static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unused)
{
u64 pfr0 = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
@ -1121,12 +1108,6 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.field_pos = ID_AA64PFR0_EL0_SHIFT,
.min_field_value = ID_AA64PFR0_EL0_32BIT_64BIT,
},
{
.desc = "Reduced HYP mapping offset",
.capability = ARM64_HYP_OFFSET_LOW,
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.matches = hyp_offset_low,
},
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
{
.desc = "Kernel page table isolation (KPTI)",

View File

@ -577,6 +577,13 @@ set_hcr:
7:
msr mdcr_el2, x3 // Configure debug traps
/* LORegions */
mrs x1, id_aa64mmfr1_el1
ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
cbz x0, 1f
msr_s SYS_LORC_EL1, xzr
1:
/* Stage-2 translation */
msr vttbr_el2, xzr

View File

@ -35,6 +35,7 @@
#define AARCH64_INSN_SF_BIT BIT(31)
#define AARCH64_INSN_N_BIT BIT(22)
#define AARCH64_INSN_LSL_12 BIT(22)
static int aarch64_insn_encoding_class[] = {
AARCH64_INSN_CLS_UNKNOWN,
@ -343,6 +344,10 @@ static int __kprobes aarch64_get_imm_shift_mask(enum aarch64_insn_imm_type type,
mask = BIT(6) - 1;
shift = 16;
break;
case AARCH64_INSN_IMM_N:
mask = 1;
shift = 22;
break;
default:
return -EINVAL;
}
@ -899,9 +904,18 @@ u32 aarch64_insn_gen_add_sub_imm(enum aarch64_insn_register dst,
return AARCH64_BREAK_FAULT;
}
/* We can't encode more than a 24bit value (12bit + 12bit shift) */
if (imm & ~(BIT(24) - 1))
goto out;
/* If we have something in the top 12 bits... */
if (imm & ~(SZ_4K - 1)) {
pr_err("%s: invalid immediate encoding %d\n", __func__, imm);
return AARCH64_BREAK_FAULT;
/* ... and in the low 12 bits -> error */
if (imm & (SZ_4K - 1))
goto out;
imm >>= 12;
insn |= AARCH64_INSN_LSL_12;
}
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RD, insn, dst);
@ -909,6 +923,10 @@ u32 aarch64_insn_gen_add_sub_imm(enum aarch64_insn_register dst,
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn, src);
return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_12, insn, imm);
out:
pr_err("%s: invalid immediate encoding %d\n", __func__, imm);
return AARCH64_BREAK_FAULT;
}
u32 aarch64_insn_gen_bitfield(enum aarch64_insn_register dst,
@ -1481,3 +1499,171 @@ pstate_check_t * const aarch32_opcode_cond_checks[16] = {
__check_hi, __check_ls, __check_ge, __check_lt,
__check_gt, __check_le, __check_al, __check_al
};
static bool range_of_ones(u64 val)
{
/* Doesn't handle full ones or full zeroes */
u64 sval = val >> __ffs64(val);
/* One of Sean Eron Anderson's bithack tricks */
return ((sval + 1) & (sval)) == 0;
}
static u32 aarch64_encode_immediate(u64 imm,
enum aarch64_insn_variant variant,
u32 insn)
{
unsigned int immr, imms, n, ones, ror, esz, tmp;
u64 mask = ~0UL;
/* Can't encode full zeroes or full ones */
if (!imm || !~imm)
return AARCH64_BREAK_FAULT;
switch (variant) {
case AARCH64_INSN_VARIANT_32BIT:
if (upper_32_bits(imm))
return AARCH64_BREAK_FAULT;
esz = 32;
break;
case AARCH64_INSN_VARIANT_64BIT:
insn |= AARCH64_INSN_SF_BIT;
esz = 64;
break;
default:
pr_err("%s: unknown variant encoding %d\n", __func__, variant);
return AARCH64_BREAK_FAULT;
}
/*
* Inverse of Replicate(). Try to spot a repeating pattern
* with a pow2 stride.
*/
for (tmp = esz / 2; tmp >= 2; tmp /= 2) {
u64 emask = BIT(tmp) - 1;
if ((imm & emask) != ((imm >> tmp) & emask))
break;
esz = tmp;
mask = emask;
}
/* N is only set if we're encoding a 64bit value */
n = esz == 64;
/* Trim imm to the element size */
imm &= mask;
/* That's how many ones we need to encode */
ones = hweight64(imm);
/*
* imms is set to (ones - 1), prefixed with a string of ones
* and a zero if they fit. Cap it to 6 bits.
*/
imms = ones - 1;
imms |= 0xf << ffs(esz);
imms &= BIT(6) - 1;
/* Compute the rotation */
if (range_of_ones(imm)) {
/*
* Pattern: 0..01..10..0
*
* Compute how many rotate we need to align it right
*/
ror = __ffs64(imm);
} else {
/*
* Pattern: 0..01..10..01..1
*
* Fill the unused top bits with ones, and check if
* the result is a valid immediate (all ones with a
* contiguous ranges of zeroes).
*/
imm |= ~mask;
if (!range_of_ones(~imm))
return AARCH64_BREAK_FAULT;
/*
* Compute the rotation to get a continuous set of
* ones, with the first bit set at position 0
*/
ror = fls(~imm);
}
/*
* immr is the number of bits we need to rotate back to the
* original set of ones. Note that this is relative to the
* element size...
*/
immr = (esz - ror) % esz;
insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_N, insn, n);
insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_R, insn, immr);
return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_S, insn, imms);
}
u32 aarch64_insn_gen_logical_immediate(enum aarch64_insn_logic_type type,
enum aarch64_insn_variant variant,
enum aarch64_insn_register Rn,
enum aarch64_insn_register Rd,
u64 imm)
{
u32 insn;
switch (type) {
case AARCH64_INSN_LOGIC_AND:
insn = aarch64_insn_get_and_imm_value();
break;
case AARCH64_INSN_LOGIC_ORR:
insn = aarch64_insn_get_orr_imm_value();
break;
case AARCH64_INSN_LOGIC_EOR:
insn = aarch64_insn_get_eor_imm_value();
break;
case AARCH64_INSN_LOGIC_AND_SETFLAGS:
insn = aarch64_insn_get_ands_imm_value();
break;
default:
pr_err("%s: unknown logical encoding %d\n", __func__, type);
return AARCH64_BREAK_FAULT;
}
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RD, insn, Rd);
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn, Rn);
return aarch64_encode_immediate(imm, variant, insn);
}
u32 aarch64_insn_gen_extr(enum aarch64_insn_variant variant,
enum aarch64_insn_register Rm,
enum aarch64_insn_register Rn,
enum aarch64_insn_register Rd,
u8 lsb)
{
u32 insn;
insn = aarch64_insn_get_extr_value();
switch (variant) {
case AARCH64_INSN_VARIANT_32BIT:
if (lsb > 31)
return AARCH64_BREAK_FAULT;
break;
case AARCH64_INSN_VARIANT_64BIT:
if (lsb > 63)
return AARCH64_BREAK_FAULT;
insn |= AARCH64_INSN_SF_BIT;
insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_N, insn, 1);
break;
default:
pr_err("%s: unknown variant encoding %d\n", __func__, variant);
return AARCH64_BREAK_FAULT;
}
insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_S, insn, lsb);
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RD, insn, Rd);
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn, Rn);
return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RM, insn, Rm);
}

View File

@ -57,6 +57,9 @@ config KVM_ARM_PMU
Adds support for a virtual Performance Monitoring Unit (PMU) in
virtual machines.
config KVM_INDIRECT_VECTORS
def_bool KVM && (HARDEN_BRANCH_PREDICTOR || HARDEN_EL2_VECTORS)
source drivers/vhost/Kconfig
endif # VIRTUALIZATION

View File

@ -16,7 +16,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/e
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o

View File

@ -46,7 +46,9 @@ static DEFINE_PER_CPU(u32, mdcr_el2);
*/
static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
{
vcpu->arch.guest_debug_preserved.mdscr_el1 = vcpu_sys_reg(vcpu, MDSCR_EL1);
u64 val = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
vcpu->arch.guest_debug_preserved.mdscr_el1 = val;
trace_kvm_arm_set_dreg32("Saved MDSCR_EL1",
vcpu->arch.guest_debug_preserved.mdscr_el1);
@ -54,10 +56,12 @@ static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
static void restore_guest_debug_regs(struct kvm_vcpu *vcpu)
{
vcpu_sys_reg(vcpu, MDSCR_EL1) = vcpu->arch.guest_debug_preserved.mdscr_el1;
u64 val = vcpu->arch.guest_debug_preserved.mdscr_el1;
vcpu_write_sys_reg(vcpu, val, MDSCR_EL1);
trace_kvm_arm_set_dreg32("Restored MDSCR_EL1",
vcpu_sys_reg(vcpu, MDSCR_EL1));
vcpu_read_sys_reg(vcpu, MDSCR_EL1));
}
/**
@ -108,6 +112,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
{
bool trap_debug = !(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY);
unsigned long mdscr;
trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
@ -152,9 +157,13 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
*/
if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
*vcpu_cpsr(vcpu) |= DBG_SPSR_SS;
vcpu_sys_reg(vcpu, MDSCR_EL1) |= DBG_MDSCR_SS;
mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
mdscr |= DBG_MDSCR_SS;
vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1);
} else {
vcpu_sys_reg(vcpu, MDSCR_EL1) &= ~DBG_MDSCR_SS;
mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
mdscr &= ~DBG_MDSCR_SS;
vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1);
}
trace_kvm_arm_set_dreg32("SPSR_EL2", *vcpu_cpsr(vcpu));
@ -170,7 +179,9 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
*/
if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW) {
/* Enable breakpoints/watchpoints */
vcpu_sys_reg(vcpu, MDSCR_EL1) |= DBG_MDSCR_MDE;
mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
mdscr |= DBG_MDSCR_MDE;
vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1);
vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
@ -193,8 +204,12 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
if (trap_debug)
vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
/* If KDE or MDE are set, perform a full save/restore cycle. */
if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_sys_reg(vcpu, MDSCR_EL1));
trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
}
void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)

View File

@ -117,7 +117,6 @@ CPU_BE( orr x4, x4, #SCTLR_ELx_EE)
/* Set the stack and new vectors */
kern_hyp_va x1
mov sp, x1
kern_hyp_va x2
msr vbar_el2, x2
/* copy tpidr_el1 into tpidr_el2 for use by HYP */

View File

@ -7,10 +7,10 @@ ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
KVM=../../../../virt/kvm
obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o
obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o
obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-cpuif-proxy.o
obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
obj-$(CONFIG_KVM_ARM_HOST) += entry.o

View File

@ -66,11 +66,6 @@
default: write_debug(ptr[0], reg, 0); \
}
static void __hyp_text __debug_save_spe_vhe(u64 *pmscr_el1)
{
/* The vcpu can run. but it can't hide. */
}
static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
{
u64 reg;
@ -103,11 +98,7 @@ static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
dsb(nsh);
}
static hyp_alternate_select(__debug_save_spe,
__debug_save_spe_nvhe, __debug_save_spe_vhe,
ARM64_HAS_VIRT_HOST_EXTN);
static void __hyp_text __debug_restore_spe(u64 pmscr_el1)
static void __hyp_text __debug_restore_spe_nvhe(u64 pmscr_el1)
{
if (!pmscr_el1)
return;
@ -119,16 +110,13 @@ static void __hyp_text __debug_restore_spe(u64 pmscr_el1)
write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
}
void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
struct kvm_guest_debug_arch *dbg,
struct kvm_cpu_context *ctxt)
static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
struct kvm_guest_debug_arch *dbg,
struct kvm_cpu_context *ctxt)
{
u64 aa64dfr0;
int brps, wrps;
if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
return;
aa64dfr0 = read_sysreg(id_aa64dfr0_el1);
brps = (aa64dfr0 >> 12) & 0xf;
wrps = (aa64dfr0 >> 20) & 0xf;
@ -141,16 +129,13 @@ void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
ctxt->sys_regs[MDCCINT_EL1] = read_sysreg(mdccint_el1);
}
void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
struct kvm_guest_debug_arch *dbg,
struct kvm_cpu_context *ctxt)
static void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
struct kvm_guest_debug_arch *dbg,
struct kvm_cpu_context *ctxt)
{
u64 aa64dfr0;
int brps, wrps;
if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
return;
aa64dfr0 = read_sysreg(id_aa64dfr0_el1);
brps = (aa64dfr0 >> 12) & 0xf;
@ -164,27 +149,54 @@ void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
write_sysreg(ctxt->sys_regs[MDCCINT_EL1], mdccint_el1);
}
void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
{
/* If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY is set, perform
* a full save/restore cycle. */
if ((vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_KDE) ||
(vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_MDE))
vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
struct kvm_guest_debug_arch *host_dbg;
struct kvm_guest_debug_arch *guest_dbg;
__debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
kern_hyp_va(vcpu->arch.host_cpu_context));
__debug_save_spe()(&vcpu->arch.host_debug_state.pmscr_el1);
/*
* Non-VHE: Disable and flush SPE data generation
* VHE: The vcpu can run, but it can't hide.
*/
if (!has_vhe())
__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
return;
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
guest_ctxt = &vcpu->arch.ctxt;
host_dbg = &vcpu->arch.host_debug_state.regs;
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
__debug_save_state(vcpu, host_dbg, host_ctxt);
__debug_restore_state(vcpu, guest_dbg, guest_ctxt);
}
void __hyp_text __debug_cond_restore_host_state(struct kvm_vcpu *vcpu)
void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
{
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
__debug_restore_state(vcpu, &vcpu->arch.host_debug_state.regs,
kern_hyp_va(vcpu->arch.host_cpu_context));
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
struct kvm_guest_debug_arch *host_dbg;
struct kvm_guest_debug_arch *guest_dbg;
if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
if (!has_vhe())
__debug_restore_spe_nvhe(vcpu->arch.host_debug_state.pmscr_el1);
if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
return;
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
guest_ctxt = &vcpu->arch.ctxt;
host_dbg = &vcpu->arch.host_debug_state.regs;
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
__debug_save_state(vcpu, guest_dbg, guest_ctxt);
__debug_restore_state(vcpu, host_dbg, host_ctxt);
vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
}
u32 __hyp_text __kvm_get_mdcr_el2(void)

View File

@ -62,9 +62,6 @@ ENTRY(__guest_enter)
// Store the host regs
save_callee_saved_regs x1
// Store host_ctxt and vcpu for use at exit time
stp x1, x0, [sp, #-16]!
add x18, x0, #VCPU_CONTEXT
// Restore guest regs x0-x17
@ -118,8 +115,7 @@ ENTRY(__guest_exit)
// Store the guest regs x19-x29, lr
save_callee_saved_regs x1
// Restore the host_ctxt from the stack
ldr x2, [sp], #16
get_host_ctxt x2, x3
// Now restore the host regs
restore_callee_saved_regs x2

View File

@ -55,15 +55,9 @@ ENTRY(__vhe_hyp_call)
ENDPROC(__vhe_hyp_call)
el1_sync: // Guest trapped into EL2
stp x0, x1, [sp, #-16]!
alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
mrs x1, esr_el2
alternative_else
mrs x1, esr_el1
alternative_endif
lsr x0, x1, #ESR_ELx_EC_SHIFT
mrs x0, esr_el2
lsr x0, x0, #ESR_ELx_EC_SHIFT
cmp x0, #ESR_ELx_EC_HVC64
ccmp x0, #ESR_ELx_EC_HVC32, #4, ne
b.ne el1_trap
@ -117,10 +111,14 @@ el1_hvc_guest:
eret
el1_trap:
get_vcpu_ptr x1, x0
mrs x0, esr_el2
lsr x0, x0, #ESR_ELx_EC_SHIFT
/*
* x0: ESR_EC
* x1: vcpu pointer
*/
ldr x1, [sp, #16 + 8] // vcpu stored by __guest_enter
/*
* We trap the first access to the FP/SIMD to save the host context
@ -137,18 +135,18 @@ alternative_else_nop_endif
b __guest_exit
el1_irq:
stp x0, x1, [sp, #-16]!
ldr x1, [sp, #16 + 8]
get_vcpu_ptr x1, x0
mov x0, #ARM_EXCEPTION_IRQ
b __guest_exit
el1_error:
stp x0, x1, [sp, #-16]!
ldr x1, [sp, #16 + 8]
get_vcpu_ptr x1, x0
mov x0, #ARM_EXCEPTION_EL1_SERROR
b __guest_exit
el2_error:
ldp x0, x1, [sp], #16
/*
* Only two possibilities:
* 1) Either we come from the exit path, having just unmasked
@ -180,14 +178,7 @@ ENTRY(__hyp_do_panic)
ENDPROC(__hyp_do_panic)
ENTRY(__hyp_panic)
/*
* '=kvm_host_cpu_state' is a host VA from the constant pool, it may
* not be accessible by this address from EL2, hyp_panic() converts
* it with kern_hyp_va() before use.
*/
ldr x0, =kvm_host_cpu_state
mrs x1, tpidr_el2
add x0, x0, x1
get_host_ctxt x0, x1
b hyp_panic
ENDPROC(__hyp_panic)
@ -206,32 +197,43 @@ ENDPROC(\label)
invalid_vector el2h_sync_invalid
invalid_vector el2h_irq_invalid
invalid_vector el2h_fiq_invalid
invalid_vector el1_sync_invalid
invalid_vector el1_irq_invalid
invalid_vector el1_fiq_invalid
.ltorg
.align 11
.macro valid_vect target
.align 7
stp x0, x1, [sp, #-16]!
b \target
.endm
.macro invalid_vect target
.align 7
b \target
ldp x0, x1, [sp], #16
b \target
.endm
ENTRY(__kvm_hyp_vector)
ventry el2t_sync_invalid // Synchronous EL2t
ventry el2t_irq_invalid // IRQ EL2t
ventry el2t_fiq_invalid // FIQ EL2t
ventry el2t_error_invalid // Error EL2t
invalid_vect el2t_sync_invalid // Synchronous EL2t
invalid_vect el2t_irq_invalid // IRQ EL2t
invalid_vect el2t_fiq_invalid // FIQ EL2t
invalid_vect el2t_error_invalid // Error EL2t
ventry el2h_sync_invalid // Synchronous EL2h
ventry el2h_irq_invalid // IRQ EL2h
ventry el2h_fiq_invalid // FIQ EL2h
ventry el2_error // Error EL2h
invalid_vect el2h_sync_invalid // Synchronous EL2h
invalid_vect el2h_irq_invalid // IRQ EL2h
invalid_vect el2h_fiq_invalid // FIQ EL2h
valid_vect el2_error // Error EL2h
ventry el1_sync // Synchronous 64-bit EL1
ventry el1_irq // IRQ 64-bit EL1
ventry el1_fiq_invalid // FIQ 64-bit EL1
ventry el1_error // Error 64-bit EL1
valid_vect el1_sync // Synchronous 64-bit EL1
valid_vect el1_irq // IRQ 64-bit EL1
invalid_vect el1_fiq_invalid // FIQ 64-bit EL1
valid_vect el1_error // Error 64-bit EL1
ventry el1_sync // Synchronous 32-bit EL1
ventry el1_irq // IRQ 32-bit EL1
ventry el1_fiq_invalid // FIQ 32-bit EL1
ventry el1_error // Error 32-bit EL1
valid_vect el1_sync // Synchronous 32-bit EL1
valid_vect el1_irq // IRQ 32-bit EL1
invalid_vect el1_fiq_invalid // FIQ 32-bit EL1
valid_vect el1_error // Error 32-bit EL1
ENDPROC(__kvm_hyp_vector)

View File

@ -33,21 +33,60 @@ static bool __hyp_text __fpsimd_enabled_nvhe(void)
return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
}
static bool __hyp_text __fpsimd_enabled_vhe(void)
static bool fpsimd_enabled_vhe(void)
{
return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
}
static hyp_alternate_select(__fpsimd_is_enabled,
__fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
ARM64_HAS_VIRT_HOST_EXTN);
bool __hyp_text __fpsimd_enabled(void)
/* Save the 32-bit only FPSIMD system register state */
static void __hyp_text __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu)
{
return __fpsimd_is_enabled()();
if (!vcpu_el1_is_32bit(vcpu))
return;
vcpu->arch.ctxt.sys_regs[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
}
static void __hyp_text __activate_traps_vhe(void)
static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
{
/*
* We are about to set CPTR_EL2.TFP to trap all floating point
* register accesses to EL2, however, the ARM ARM clearly states that
* traps are only taken to EL2 if the operation would not otherwise
* trap to EL1. Therefore, always make sure that for 32-bit guests,
* we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
* If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
* it will cause an exception.
*/
if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
write_sysreg(1 << 30, fpexc32_el2);
isb();
}
}
static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
{
/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
/*
* Make sure we trap PMU access from EL0 to EL2. Also sanitize
* PMSELR_EL0 to make sure it never contains the cycle
* counter, which could make a PMXEVCNTR_EL0 access UNDEF at
* EL1 instead of being trapped to EL2.
*/
write_sysreg(0, pmselr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
}
static void __hyp_text __deactivate_traps_common(void)
{
write_sysreg(0, hstr_el2);
write_sysreg(0, pmuserenr_el0);
}
static void activate_traps_vhe(struct kvm_vcpu *vcpu)
{
u64 val;
@ -59,71 +98,36 @@ static void __hyp_text __activate_traps_vhe(void)
write_sysreg(kvm_get_hyp_vector(), vbar_el1);
}
static void __hyp_text __activate_traps_nvhe(void)
static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
{
u64 val;
__activate_traps_common(vcpu);
val = CPTR_EL2_DEFAULT;
val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
write_sysreg(val, cptr_el2);
}
static hyp_alternate_select(__activate_traps_arch,
__activate_traps_nvhe, __activate_traps_vhe,
ARM64_HAS_VIRT_HOST_EXTN);
static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
u64 hcr = vcpu->arch.hcr_el2;
/*
* We are about to set CPTR_EL2.TFP to trap all floating point
* register accesses to EL2, however, the ARM ARM clearly states that
* traps are only taken to EL2 if the operation would not otherwise
* trap to EL1. Therefore, always make sure that for 32-bit guests,
* we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
* If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
* it will cause an exception.
*/
val = vcpu->arch.hcr_el2;
write_sysreg(hcr, hcr_el2);
if (!(val & HCR_RW) && system_supports_fpsimd()) {
write_sysreg(1 << 30, fpexc32_el2);
isb();
}
if (val & HCR_RW) /* for AArch64 only: */
val |= HCR_TID3; /* TID3: trap feature register accesses */
write_sysreg(val, hcr_el2);
if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (val & HCR_VSE))
if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
/*
* Make sure we trap PMU access from EL0 to EL2. Also sanitize
* PMSELR_EL0 to make sure it never contains the cycle
* counter, which could make a PMXEVCNTR_EL0 access UNDEF at
* EL1 instead of being trapped to EL2.
*/
write_sysreg(0, pmselr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
__activate_traps_arch()();
__activate_traps_fpsimd32(vcpu);
if (has_vhe())
activate_traps_vhe(vcpu);
else
__activate_traps_nvhe(vcpu);
}
static void __hyp_text __deactivate_traps_vhe(void)
static void deactivate_traps_vhe(void)
{
extern char vectors[]; /* kernel exception vectors */
u64 mdcr_el2 = read_sysreg(mdcr_el2);
mdcr_el2 &= MDCR_EL2_HPMN_MASK |
MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
MDCR_EL2_TPMS;
write_sysreg(mdcr_el2, mdcr_el2);
write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
write_sysreg(vectors, vbar_el1);
@ -133,6 +137,8 @@ static void __hyp_text __deactivate_traps_nvhe(void)
{
u64 mdcr_el2 = read_sysreg(mdcr_el2);
__deactivate_traps_common();
mdcr_el2 &= MDCR_EL2_HPMN_MASK;
mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
@ -141,10 +147,6 @@ static void __hyp_text __deactivate_traps_nvhe(void)
write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
}
static hyp_alternate_select(__deactivate_traps_arch,
__deactivate_traps_nvhe, __deactivate_traps_vhe,
ARM64_HAS_VIRT_HOST_EXTN);
static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
{
/*
@ -156,14 +158,32 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
if (vcpu->arch.hcr_el2 & HCR_VSE)
vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
__deactivate_traps_arch()();
write_sysreg(0, hstr_el2);
write_sysreg(0, pmuserenr_el0);
if (has_vhe())
deactivate_traps_vhe();
else
__deactivate_traps_nvhe();
}
static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
{
__activate_traps_common(vcpu);
}
void deactivate_traps_vhe_put(void)
{
u64 mdcr_el2 = read_sysreg(mdcr_el2);
mdcr_el2 &= MDCR_EL2_HPMN_MASK |
MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
MDCR_EL2_TPMS;
write_sysreg(mdcr_el2, mdcr_el2);
__deactivate_traps_common();
}
static void __hyp_text __activate_vm(struct kvm *kvm)
{
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
write_sysreg(kvm->arch.vttbr, vttbr_el2);
}
@ -172,29 +192,22 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
write_sysreg(0, vttbr_el2);
}
static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
/* Save VGICv3 state on non-VHE systems */
static void __hyp_text __hyp_vgic_save_state(struct kvm_vcpu *vcpu)
{
if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
__vgic_v3_save_state(vcpu);
else
__vgic_v2_save_state(vcpu);
write_sysreg(read_sysreg(hcr_el2) & ~HCR_INT_OVERRIDE, hcr_el2);
__vgic_v3_deactivate_traps(vcpu);
}
}
static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
/* Restore VGICv3 state on non_VEH systems */
static void __hyp_text __hyp_vgic_restore_state(struct kvm_vcpu *vcpu)
{
u64 val;
val = read_sysreg(hcr_el2);
val |= HCR_INT_OVERRIDE;
val |= vcpu->arch.irq_lines;
write_sysreg(val, hcr_el2);
if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
__vgic_v3_activate_traps(vcpu);
__vgic_v3_restore_state(vcpu);
else
__vgic_v2_restore_state(vcpu);
}
}
static bool __hyp_text __true_value(void)
@ -305,54 +318,27 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
}
}
int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
/*
* Return true when we were able to fixup the guest exit and should return to
* the guest, false when we should restore the host state and return to the
* main run loop.
*/
static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
{
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
bool fp_enabled;
u64 exit_code;
vcpu = kern_hyp_va(vcpu);
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
__sysreg_save_host_state(host_ctxt);
__debug_cond_save_host_state(vcpu);
__activate_traps(vcpu);
__activate_vm(vcpu);
__vgic_restore_state(vcpu);
__timer_enable_traps(vcpu);
/*
* We must restore the 32-bit state before the sysregs, thanks
* to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
*/
__sysreg32_restore_state(vcpu);
__sysreg_restore_guest_state(guest_ctxt);
__debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
/* Jump in the fire! */
again:
exit_code = __guest_enter(vcpu, host_ctxt);
/* And we're baaack! */
if (ARM_EXCEPTION_CODE(exit_code) != ARM_EXCEPTION_IRQ)
if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
/*
* We're using the raw exception code in order to only process
* the trap if no SError is pending. We will come back to the
* same PC once the SError has been injected, and replay the
* trapping instruction.
*/
if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
goto again;
if (*exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
return true;
if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
exit_code == ARM_EXCEPTION_TRAP) {
*exit_code == ARM_EXCEPTION_TRAP) {
bool valid;
valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
@ -366,9 +352,9 @@ again:
if (ret == 1) {
if (__skip_instr(vcpu))
goto again;
return true;
else
exit_code = ARM_EXCEPTION_TRAP;
*exit_code = ARM_EXCEPTION_TRAP;
}
if (ret == -1) {
@ -380,29 +366,112 @@ again:
*/
if (!__skip_instr(vcpu))
*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
exit_code = ARM_EXCEPTION_EL1_SERROR;
*exit_code = ARM_EXCEPTION_EL1_SERROR;
}
/* 0 falls through to be handler out of EL2 */
}
}
if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
exit_code == ARM_EXCEPTION_TRAP &&
*exit_code == ARM_EXCEPTION_TRAP &&
(kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
int ret = __vgic_v3_perform_cpuif_access(vcpu);
if (ret == 1) {
if (__skip_instr(vcpu))
goto again;
return true;
else
exit_code = ARM_EXCEPTION_TRAP;
*exit_code = ARM_EXCEPTION_TRAP;
}
/* 0 falls through to be handled out of EL2 */
}
/* Return to the host kernel and handle the exit */
return false;
}
/* Switch to the guest for VHE systems running in EL2 */
int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
bool fp_enabled;
u64 exit_code;
host_ctxt = vcpu->arch.host_cpu_context;
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
sysreg_save_host_state_vhe(host_ctxt);
__activate_traps(vcpu);
__activate_vm(vcpu->kvm);
sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
do {
/* Jump in the fire! */
exit_code = __guest_enter(vcpu, host_ctxt);
/* And we're baaack! */
} while (fixup_guest_exit(vcpu, &exit_code));
fp_enabled = fpsimd_enabled_vhe();
sysreg_save_guest_state_vhe(guest_ctxt);
__deactivate_traps(vcpu);
sysreg_restore_host_state_vhe(host_ctxt);
if (fp_enabled) {
__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
__fpsimd_save_fpexc32(vcpu);
}
__debug_switch_to_host(vcpu);
return exit_code;
}
/* Switch to the guest for legacy non-VHE systems */
int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
bool fp_enabled;
u64 exit_code;
vcpu = kern_hyp_va(vcpu);
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
__sysreg_save_state_nvhe(host_ctxt);
__activate_traps(vcpu);
__activate_vm(kern_hyp_va(vcpu->kvm));
__hyp_vgic_restore_state(vcpu);
__timer_enable_traps(vcpu);
/*
* We must restore the 32-bit state before the sysregs, thanks
* to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
*/
__sysreg32_restore_state(vcpu);
__sysreg_restore_state_nvhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
do {
/* Jump in the fire! */
exit_code = __guest_enter(vcpu, host_ctxt);
/* And we're baaack! */
} while (fixup_guest_exit(vcpu, &exit_code));
if (cpus_have_const_cap(ARM64_HARDEN_BP_POST_GUEST_EXIT)) {
u32 midr = read_cpuid_id();
@ -413,29 +482,29 @@ again:
}
}
fp_enabled = __fpsimd_enabled();
fp_enabled = __fpsimd_enabled_nvhe();
__sysreg_save_guest_state(guest_ctxt);
__sysreg_save_state_nvhe(guest_ctxt);
__sysreg32_save_state(vcpu);
__timer_disable_traps(vcpu);
__vgic_save_state(vcpu);
__hyp_vgic_save_state(vcpu);
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
__sysreg_restore_host_state(host_ctxt);
__sysreg_restore_state_nvhe(host_ctxt);
if (fp_enabled) {
__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
__fpsimd_save_fpexc32(vcpu);
}
__debug_save_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
/*
* This must come after restoring the host sysregs, since a non-VHE
* system may enable SPE here and make use of the TTBRs.
*/
__debug_cond_restore_host_state(vcpu);
__debug_switch_to_host(vcpu);
return exit_code;
}
@ -443,10 +512,20 @@ again:
static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
struct kvm_vcpu *vcpu)
struct kvm_cpu_context *__host_ctxt)
{
struct kvm_vcpu *vcpu;
unsigned long str_va;
vcpu = __host_ctxt->__hyp_running_vcpu;
if (read_sysreg(vttbr_el2)) {
__timer_disable_traps(vcpu);
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
__sysreg_restore_state_nvhe(__host_ctxt);
}
/*
* Force the panic string to be loaded from the literal pool,
* making sure it is a kernel address and not a PC-relative
@ -460,40 +539,31 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
read_sysreg(hpfar_el2), par, vcpu);
}
static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
struct kvm_vcpu *vcpu)
static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu *vcpu;
vcpu = host_ctxt->__hyp_running_vcpu;
__deactivate_traps(vcpu);
sysreg_restore_host_state_vhe(host_ctxt);
panic(__hyp_panic_string,
spsr, elr,
read_sysreg_el2(esr), read_sysreg_el2(far),
read_sysreg(hpfar_el2), par, vcpu);
}
static hyp_alternate_select(__hyp_call_panic,
__hyp_call_panic_nvhe, __hyp_call_panic_vhe,
ARM64_HAS_VIRT_HOST_EXTN);
void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt)
void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu *vcpu = NULL;
u64 spsr = read_sysreg_el2(spsr);
u64 elr = read_sysreg_el2(elr);
u64 par = read_sysreg(par_el1);
if (read_sysreg(vttbr_el2)) {
struct kvm_cpu_context *host_ctxt;
host_ctxt = kern_hyp_va(__host_ctxt);
vcpu = host_ctxt->__hyp_running_vcpu;
__timer_disable_traps(vcpu);
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
__sysreg_restore_host_state(host_ctxt);
}
/* Call panic for real */
__hyp_call_panic()(spsr, elr, par, vcpu);
if (!has_vhe())
__hyp_call_panic_nvhe(spsr, elr, par, host_ctxt);
else
__hyp_call_panic_vhe(spsr, elr, par, host_ctxt);
unreachable();
}

View File

@ -19,32 +19,43 @@
#include <linux/kvm_host.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
/* Yes, this does nothing, on purpose */
static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
/*
* Non-VHE: Both host and guest must save everything.
*
* VHE: Host must save tpidr*_el0, actlr_el1, mdscr_el1, sp_el0,
* and guest must save everything.
* VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and pstate,
* which are handled as part of the el2 return state) on every switch.
* tpidr_el0 and tpidrro_el0 only need to be switched when going
* to host userspace or a different VCPU. EL1 registers only need to be
* switched when potentially going to run a different VCPU. The latter two
* classes are handled as part of kvm_arch_vcpu_load and kvm_arch_vcpu_put.
*/
static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);
/*
* The host arm64 Linux uses sp_el0 to point to 'current' and it must
* therefore be saved/restored on every entry/exit to/from the guest.
*/
ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
}
static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
}
static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
ctxt->sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr);
ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
ctxt->sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr);
ctxt->sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0);
ctxt->sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1);
@ -64,6 +75,10 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1);
ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr);
ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
}
static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
{
ctxt->gp_regs.regs.pc = read_sysreg_el2(elr);
ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr);
@ -71,36 +86,48 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
ctxt->sys_regs[DISR_EL1] = read_sysreg_s(SYS_VDISR_EL2);
}
static hyp_alternate_select(__sysreg_call_save_host_state,
__sysreg_save_state, __sysreg_do_nothing,
ARM64_HAS_VIRT_HOST_EXTN);
void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
void __hyp_text __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
{
__sysreg_save_el1_state(ctxt);
__sysreg_save_common_state(ctxt);
__sysreg_save_user_state(ctxt);
__sysreg_save_el2_return_state(ctxt);
}
void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt)
{
__sysreg_call_save_host_state()(ctxt);
__sysreg_save_common_state(ctxt);
}
void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt)
{
__sysreg_save_state(ctxt);
__sysreg_save_common_state(ctxt);
__sysreg_save_el2_return_state(ctxt);
}
static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
{
write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
/*
* The host arm64 Linux uses sp_el0 to point to 'current' and it must
* therefore be saved/restored on every entry/exit to/from the guest.
*/
write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
}
static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
{
write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
}
static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
{
write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1], sctlr);
write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
write_sysreg_el1(ctxt->sys_regs[CPACR_EL1], cpacr);
write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1], ttbr0);
write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1], ttbr1);
@ -120,6 +147,11 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
write_sysreg_el1(ctxt->gp_regs.elr_el1, elr);
write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
}
static void __hyp_text
__sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
{
write_sysreg_el2(ctxt->gp_regs.regs.pc, elr);
write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
@ -127,27 +159,30 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg_s(ctxt->sys_regs[DISR_EL1], SYS_VDISR_EL2);
}
static hyp_alternate_select(__sysreg_call_restore_host_state,
__sysreg_restore_state, __sysreg_do_nothing,
ARM64_HAS_VIRT_HOST_EXTN);
void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
void __hyp_text __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
{
__sysreg_restore_el1_state(ctxt);
__sysreg_restore_common_state(ctxt);
__sysreg_restore_user_state(ctxt);
__sysreg_restore_el2_return_state(ctxt);
}
void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt)
{
__sysreg_call_restore_host_state()(ctxt);
__sysreg_restore_common_state(ctxt);
}
void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
{
__sysreg_restore_state(ctxt);
__sysreg_restore_common_state(ctxt);
__sysreg_restore_el2_return_state(ctxt);
}
void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
{
u64 *spsr, *sysreg;
if (read_sysreg(hcr_el2) & HCR_RW)
if (!vcpu_el1_is_32bit(vcpu))
return;
spsr = vcpu->arch.ctxt.gp_regs.spsr;
@ -161,10 +196,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
if (__fpsimd_enabled())
sysreg[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
}
@ -172,7 +204,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
{
u64 *spsr, *sysreg;
if (read_sysreg(hcr_el2) & HCR_RW)
if (!vcpu_el1_is_32bit(vcpu))
return;
spsr = vcpu->arch.ctxt.gp_regs.spsr;
@ -186,6 +218,78 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
}
/**
* kvm_vcpu_load_sysregs - Load guest system registers to the physical CPU
*
* @vcpu: The VCPU pointer
*
* Load system registers that do not affect the host's execution, for
* example EL1 system registers on a VHE system where the host kernel
* runs at EL2. This function is called from KVM's vcpu_load() function
* and loading system register state early avoids having to load them on
* every entry to the VM.
*/
void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
if (!has_vhe())
return;
__sysreg_save_user_state(host_ctxt);
/*
* Load guest EL1 and user state
*
* We must restore the 32-bit state before the sysregs, thanks
* to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
*/
__sysreg32_restore_state(vcpu);
__sysreg_restore_user_state(guest_ctxt);
__sysreg_restore_el1_state(guest_ctxt);
vcpu->arch.sysregs_loaded_on_cpu = true;
activate_traps_vhe_load(vcpu);
}
/**
* kvm_vcpu_put_sysregs - Restore host system registers to the physical CPU
*
* @vcpu: The VCPU pointer
*
* Save guest system registers that do not affect the host's execution, for
* example EL1 system registers on a VHE system where the host kernel
* runs at EL2. This function is called from KVM's vcpu_put() function
* and deferring saving system register state until we're no longer running the
* VCPU avoids having to save them on every exit from the VM.
*/
void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
if (!has_vhe())
return;
deactivate_traps_vhe_put();
__sysreg_save_el1_state(guest_ctxt);
__sysreg_save_user_state(guest_ctxt);
__sysreg32_save_state(vcpu);
/* Restore host user state */
__sysreg_restore_user_state(host_ctxt);
vcpu->arch.sysregs_loaded_on_cpu = false;
}
void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)
{
asm("msr tpidr_el2, %0": : "r" (tpidr_el2));
}

View File

@ -0,0 +1,78 @@
/*
* Copyright (C) 2012-2015 - ARM Ltd
* Author: Marc Zyngier <marc.zyngier@arm.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <linux/compiler.h>
#include <linux/irqchip/arm-gic.h>
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
/*
* __vgic_v2_perform_cpuif_access -- perform a GICV access on behalf of the
* guest.
*
* @vcpu: the offending vcpu
*
* Returns:
* 1: GICV access successfully performed
* 0: Not a GICV access
* -1: Illegal GICV access
*/
int __hyp_text __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
struct vgic_dist *vgic = &kvm->arch.vgic;
phys_addr_t fault_ipa;
void __iomem *addr;
int rd;
/* Build the full address */
fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
/* If not for GICV, move on */
if (fault_ipa < vgic->vgic_cpu_base ||
fault_ipa >= (vgic->vgic_cpu_base + KVM_VGIC_V2_CPU_SIZE))
return 0;
/* Reject anything but a 32bit access */
if (kvm_vcpu_dabt_get_as(vcpu) != sizeof(u32))
return -1;
/* Not aligned? Don't bother */
if (fault_ipa & 3)
return -1;
rd = kvm_vcpu_dabt_get_rd(vcpu);
addr = hyp_symbol_addr(kvm_vgic_global_state)->vcpu_hyp_va;
addr += fault_ipa - vgic->vgic_cpu_base;
if (kvm_vcpu_dabt_iswrite(vcpu)) {
u32 data = vcpu_data_guest_to_host(vcpu,
vcpu_get_reg(vcpu, rd),
sizeof(u32));
writel_relaxed(data, addr);
} else {
u32 data = readl_relaxed(addr);
vcpu_set_reg(vcpu, rd, vcpu_data_host_to_guest(vcpu, data,
sizeof(u32)));
}
return 1;
}

View File

@ -58,7 +58,7 @@ static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
exc_offset = LOWER_EL_AArch32_VECTOR;
}
return vcpu_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
return vcpu_read_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
}
static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
@ -67,13 +67,13 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
u32 esr = 0;
*vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
*vcpu_spsr(vcpu) = cpsr;
vcpu_write_spsr(vcpu, cpsr);
vcpu_sys_reg(vcpu, FAR_EL1) = addr;
vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
/*
* Build an {i,d}abort, depending on the level and the
@ -94,7 +94,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
if (!is_iabt)
esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
vcpu_sys_reg(vcpu, ESR_EL1) = esr | ESR_ELx_FSC_EXTABT;
vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
}
static void inject_undef64(struct kvm_vcpu *vcpu)
@ -102,11 +102,11 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
unsigned long cpsr = *vcpu_cpsr(vcpu);
u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
*vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
*vcpu_spsr(vcpu) = cpsr;
vcpu_write_spsr(vcpu, cpsr);
/*
* Build an unknown exception, depending on the instruction
@ -115,7 +115,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
if (kvm_vcpu_trap_il_is32bit(vcpu))
esr |= ESR_ELx_IL;
vcpu_sys_reg(vcpu, ESR_EL1) = esr;
vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
}
/**
@ -128,7 +128,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
*/
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
{
if (!(vcpu->arch.hcr_el2 & HCR_RW))
if (vcpu_el1_is_32bit(vcpu))
kvm_inject_dabt32(vcpu, addr);
else
inject_abt64(vcpu, false, addr);
@ -144,7 +144,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
*/
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
{
if (!(vcpu->arch.hcr_el2 & HCR_RW))
if (vcpu_el1_is_32bit(vcpu))
kvm_inject_pabt32(vcpu, addr);
else
inject_abt64(vcpu, true, addr);
@ -158,7 +158,7 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
*/
void kvm_inject_undefined(struct kvm_vcpu *vcpu)
{
if (!(vcpu->arch.hcr_el2 & HCR_RW))
if (vcpu_el1_is_32bit(vcpu))
kvm_inject_undef32(vcpu);
else
inject_undef64(vcpu);
@ -167,7 +167,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
{
vcpu_set_vsesr(vcpu, esr);
vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
*vcpu_hcr(vcpu) |= HCR_VSE;
}
/**

View File

@ -141,28 +141,61 @@ unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num)
/*
* Return the SPSR for the current mode of the virtual CPU.
*/
unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu)
static int vcpu_spsr32_mode(const struct kvm_vcpu *vcpu)
{
unsigned long mode = *vcpu_cpsr(vcpu) & COMPAT_PSR_MODE_MASK;
switch (mode) {
case COMPAT_PSR_MODE_SVC:
mode = KVM_SPSR_SVC;
break;
case COMPAT_PSR_MODE_ABT:
mode = KVM_SPSR_ABT;
break;
case COMPAT_PSR_MODE_UND:
mode = KVM_SPSR_UND;
break;
case COMPAT_PSR_MODE_IRQ:
mode = KVM_SPSR_IRQ;
break;
case COMPAT_PSR_MODE_FIQ:
mode = KVM_SPSR_FIQ;
break;
case COMPAT_PSR_MODE_SVC: return KVM_SPSR_SVC;
case COMPAT_PSR_MODE_ABT: return KVM_SPSR_ABT;
case COMPAT_PSR_MODE_UND: return KVM_SPSR_UND;
case COMPAT_PSR_MODE_IRQ: return KVM_SPSR_IRQ;
case COMPAT_PSR_MODE_FIQ: return KVM_SPSR_FIQ;
default: BUG();
}
}
unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
{
int spsr_idx = vcpu_spsr32_mode(vcpu);
if (!vcpu->arch.sysregs_loaded_on_cpu)
return vcpu_gp_regs(vcpu)->spsr[spsr_idx];
switch (spsr_idx) {
case KVM_SPSR_SVC:
return read_sysreg_el1(spsr);
case KVM_SPSR_ABT:
return read_sysreg(spsr_abt);
case KVM_SPSR_UND:
return read_sysreg(spsr_und);
case KVM_SPSR_IRQ:
return read_sysreg(spsr_irq);
case KVM_SPSR_FIQ:
return read_sysreg(spsr_fiq);
default:
BUG();
}
return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[mode];
}
void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
{
int spsr_idx = vcpu_spsr32_mode(vcpu);
if (!vcpu->arch.sysregs_loaded_on_cpu) {
vcpu_gp_regs(vcpu)->spsr[spsr_idx] = v;
return;
}
switch (spsr_idx) {
case KVM_SPSR_SVC:
write_sysreg_el1(v, spsr);
case KVM_SPSR_ABT:
write_sysreg(v, spsr_abt);
case KVM_SPSR_UND:
write_sysreg(v, spsr_und);
case KVM_SPSR_IRQ:
write_sysreg(v, spsr_irq);
case KVM_SPSR_FIQ:
write_sysreg(v, spsr_fiq);
}
}

View File

@ -35,6 +35,7 @@
#include <asm/kvm_coproc.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_host.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <asm/perf_event.h>
#include <asm/sysreg.h>
@ -76,6 +77,93 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
return false;
}
u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
{
if (!vcpu->arch.sysregs_loaded_on_cpu)
goto immediate_read;
/*
* System registers listed in the switch are not saved on every
* exit from the guest but are only saved on vcpu_put.
*
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the guest cannot modify its
* own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
* thread when emulating cross-VCPU communication.
*/
switch (reg) {
case CSSELR_EL1: return read_sysreg_s(SYS_CSSELR_EL1);
case SCTLR_EL1: return read_sysreg_s(sctlr_EL12);
case ACTLR_EL1: return read_sysreg_s(SYS_ACTLR_EL1);
case CPACR_EL1: return read_sysreg_s(cpacr_EL12);
case TTBR0_EL1: return read_sysreg_s(ttbr0_EL12);
case TTBR1_EL1: return read_sysreg_s(ttbr1_EL12);
case TCR_EL1: return read_sysreg_s(tcr_EL12);
case ESR_EL1: return read_sysreg_s(esr_EL12);
case AFSR0_EL1: return read_sysreg_s(afsr0_EL12);
case AFSR1_EL1: return read_sysreg_s(afsr1_EL12);
case FAR_EL1: return read_sysreg_s(far_EL12);
case MAIR_EL1: return read_sysreg_s(mair_EL12);
case VBAR_EL1: return read_sysreg_s(vbar_EL12);
case CONTEXTIDR_EL1: return read_sysreg_s(contextidr_EL12);
case TPIDR_EL0: return read_sysreg_s(SYS_TPIDR_EL0);
case TPIDRRO_EL0: return read_sysreg_s(SYS_TPIDRRO_EL0);
case TPIDR_EL1: return read_sysreg_s(SYS_TPIDR_EL1);
case AMAIR_EL1: return read_sysreg_s(amair_EL12);
case CNTKCTL_EL1: return read_sysreg_s(cntkctl_EL12);
case PAR_EL1: return read_sysreg_s(SYS_PAR_EL1);
case DACR32_EL2: return read_sysreg_s(SYS_DACR32_EL2);
case IFSR32_EL2: return read_sysreg_s(SYS_IFSR32_EL2);
case DBGVCR32_EL2: return read_sysreg_s(SYS_DBGVCR32_EL2);
}
immediate_read:
return __vcpu_sys_reg(vcpu, reg);
}
void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
{
if (!vcpu->arch.sysregs_loaded_on_cpu)
goto immediate_write;
/*
* System registers listed in the switch are not restored on every
* entry to the guest but are only restored on vcpu_load.
*
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the the MPIDR should only be
* set once, before running the VCPU, and never changed later.
*/
switch (reg) {
case CSSELR_EL1: write_sysreg_s(val, SYS_CSSELR_EL1); return;
case SCTLR_EL1: write_sysreg_s(val, sctlr_EL12); return;
case ACTLR_EL1: write_sysreg_s(val, SYS_ACTLR_EL1); return;
case CPACR_EL1: write_sysreg_s(val, cpacr_EL12); return;
case TTBR0_EL1: write_sysreg_s(val, ttbr0_EL12); return;
case TTBR1_EL1: write_sysreg_s(val, ttbr1_EL12); return;
case TCR_EL1: write_sysreg_s(val, tcr_EL12); return;
case ESR_EL1: write_sysreg_s(val, esr_EL12); return;
case AFSR0_EL1: write_sysreg_s(val, afsr0_EL12); return;
case AFSR1_EL1: write_sysreg_s(val, afsr1_EL12); return;
case FAR_EL1: write_sysreg_s(val, far_EL12); return;
case MAIR_EL1: write_sysreg_s(val, mair_EL12); return;
case VBAR_EL1: write_sysreg_s(val, vbar_EL12); return;
case CONTEXTIDR_EL1: write_sysreg_s(val, contextidr_EL12); return;
case TPIDR_EL0: write_sysreg_s(val, SYS_TPIDR_EL0); return;
case TPIDRRO_EL0: write_sysreg_s(val, SYS_TPIDRRO_EL0); return;
case TPIDR_EL1: write_sysreg_s(val, SYS_TPIDR_EL1); return;
case AMAIR_EL1: write_sysreg_s(val, amair_EL12); return;
case CNTKCTL_EL1: write_sysreg_s(val, cntkctl_EL12); return;
case PAR_EL1: write_sysreg_s(val, SYS_PAR_EL1); return;
case DACR32_EL2: write_sysreg_s(val, SYS_DACR32_EL2); return;
case IFSR32_EL2: write_sysreg_s(val, SYS_IFSR32_EL2); return;
case DBGVCR32_EL2: write_sysreg_s(val, SYS_DBGVCR32_EL2); return;
}
immediate_write:
__vcpu_sys_reg(vcpu, reg) = val;
}
/* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
static u32 cache_levels;
@ -121,16 +209,26 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
bool was_enabled = vcpu_has_cache_enabled(vcpu);
u64 val;
int reg = r->reg;
BUG_ON(!p->is_write);
if (!p->is_aarch32) {
vcpu_sys_reg(vcpu, r->reg) = p->regval;
/* See the 32bit mapping in kvm_host.h */
if (p->is_aarch32)
reg = r->reg / 2;
if (!p->is_aarch32 || !p->is_32bit) {
val = p->regval;
} else {
if (!p->is_32bit)
vcpu_cp15_64_high(vcpu, r->reg) = upper_32_bits(p->regval);
vcpu_cp15_64_low(vcpu, r->reg) = lower_32_bits(p->regval);
val = vcpu_read_sys_reg(vcpu, reg);
if (r->reg % 2)
val = (p->regval << 32) | (u64)lower_32_bits(val);
else
val = ((u64)upper_32_bits(val) << 32) |
lower_32_bits(p->regval);
}
vcpu_write_sys_reg(vcpu, val, reg);
kvm_toggle_cache(vcpu, was_enabled);
return true;
@ -175,6 +273,14 @@ static bool trap_raz_wi(struct kvm_vcpu *vcpu,
return read_zero(vcpu, p);
}
static bool trap_undef(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
kvm_inject_undefined(vcpu);
return false;
}
static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
@ -231,10 +337,10 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
if (p->is_write) {
vcpu_sys_reg(vcpu, r->reg) = p->regval;
vcpu_write_sys_reg(vcpu, p->regval, r->reg);
vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
} else {
p->regval = vcpu_sys_reg(vcpu, r->reg);
p->regval = vcpu_read_sys_reg(vcpu, r->reg);
}
trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
@ -447,7 +553,8 @@ static void reset_wcr(struct kvm_vcpu *vcpu,
static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
vcpu_sys_reg(vcpu, AMAIR_EL1) = read_sysreg(amair_el1);
u64 amair = read_sysreg(amair_el1);
vcpu_write_sys_reg(vcpu, amair, AMAIR_EL1);
}
static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
@ -464,7 +571,7 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
vcpu_write_sys_reg(vcpu, (1ULL << 31) | mpidr, MPIDR_EL1);
}
static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
@ -478,12 +585,12 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
*/
val = ((pmcr & ~ARMV8_PMU_PMCR_MASK)
| (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) & (~ARMV8_PMU_PMCR_E);
vcpu_sys_reg(vcpu, PMCR_EL0) = val;
__vcpu_sys_reg(vcpu, PMCR_EL0) = val;
}
static bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags)
{
u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
u64 reg = __vcpu_sys_reg(vcpu, PMUSERENR_EL0);
bool enabled = (reg & flags) || vcpu_mode_priv(vcpu);
if (!enabled)
@ -525,14 +632,14 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (p->is_write) {
/* Only update writeable bits of PMCR */
val = vcpu_sys_reg(vcpu, PMCR_EL0);
val = __vcpu_sys_reg(vcpu, PMCR_EL0);
val &= ~ARMV8_PMU_PMCR_MASK;
val |= p->regval & ARMV8_PMU_PMCR_MASK;
vcpu_sys_reg(vcpu, PMCR_EL0) = val;
__vcpu_sys_reg(vcpu, PMCR_EL0) = val;
kvm_pmu_handle_pmcr(vcpu, val);
} else {
/* PMCR.P & PMCR.C are RAZ */
val = vcpu_sys_reg(vcpu, PMCR_EL0)
val = __vcpu_sys_reg(vcpu, PMCR_EL0)
& ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C);
p->regval = val;
}
@ -550,10 +657,10 @@ static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return false;
if (p->is_write)
vcpu_sys_reg(vcpu, PMSELR_EL0) = p->regval;
__vcpu_sys_reg(vcpu, PMSELR_EL0) = p->regval;
else
/* return PMSELR.SEL field */
p->regval = vcpu_sys_reg(vcpu, PMSELR_EL0)
p->regval = __vcpu_sys_reg(vcpu, PMSELR_EL0)
& ARMV8_PMU_COUNTER_MASK;
return true;
@ -586,7 +693,7 @@ static bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx)
{
u64 pmcr, val;
pmcr = vcpu_sys_reg(vcpu, PMCR_EL0);
pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0);
val = (pmcr >> ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK;
if (idx >= val && idx != ARMV8_PMU_CYCLE_IDX) {
kvm_inject_undefined(vcpu);
@ -611,7 +718,7 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
if (pmu_access_event_counter_el0_disabled(vcpu))
return false;
idx = vcpu_sys_reg(vcpu, PMSELR_EL0)
idx = __vcpu_sys_reg(vcpu, PMSELR_EL0)
& ARMV8_PMU_COUNTER_MASK;
} else if (r->Op2 == 0) {
/* PMCCNTR_EL0 */
@ -666,7 +773,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) {
/* PMXEVTYPER_EL0 */
idx = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMU_COUNTER_MASK;
idx = __vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMU_COUNTER_MASK;
reg = PMEVTYPER0_EL0 + idx;
} else if (r->CRn == 14 && (r->CRm & 12) == 12) {
idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
@ -684,9 +791,9 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (p->is_write) {
kvm_pmu_set_counter_event_type(vcpu, p->regval, idx);
vcpu_sys_reg(vcpu, reg) = p->regval & ARMV8_PMU_EVTYPE_MASK;
__vcpu_sys_reg(vcpu, reg) = p->regval & ARMV8_PMU_EVTYPE_MASK;
} else {
p->regval = vcpu_sys_reg(vcpu, reg) & ARMV8_PMU_EVTYPE_MASK;
p->regval = __vcpu_sys_reg(vcpu, reg) & ARMV8_PMU_EVTYPE_MASK;
}
return true;
@ -708,15 +815,15 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
val = p->regval & mask;
if (r->Op2 & 0x1) {
/* accessing PMCNTENSET_EL0 */
vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
kvm_pmu_enable_counter(vcpu, val);
} else {
/* accessing PMCNTENCLR_EL0 */
vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val;
__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val;
kvm_pmu_disable_counter(vcpu, val);
}
} else {
p->regval = vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
}
return true;
@ -740,12 +847,12 @@ static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (r->Op2 & 0x1)
/* accessing PMINTENSET_EL1 */
vcpu_sys_reg(vcpu, PMINTENSET_EL1) |= val;
__vcpu_sys_reg(vcpu, PMINTENSET_EL1) |= val;
else
/* accessing PMINTENCLR_EL1 */
vcpu_sys_reg(vcpu, PMINTENSET_EL1) &= ~val;
__vcpu_sys_reg(vcpu, PMINTENSET_EL1) &= ~val;
} else {
p->regval = vcpu_sys_reg(vcpu, PMINTENSET_EL1) & mask;
p->regval = __vcpu_sys_reg(vcpu, PMINTENSET_EL1) & mask;
}
return true;
@ -765,12 +872,12 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (p->is_write) {
if (r->CRm & 0x2)
/* accessing PMOVSSET_EL0 */
vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= (p->regval & mask);
__vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= (p->regval & mask);
else
/* accessing PMOVSCLR_EL0 */
vcpu_sys_reg(vcpu, PMOVSSET_EL0) &= ~(p->regval & mask);
__vcpu_sys_reg(vcpu, PMOVSSET_EL0) &= ~(p->regval & mask);
} else {
p->regval = vcpu_sys_reg(vcpu, PMOVSSET_EL0) & mask;
p->regval = __vcpu_sys_reg(vcpu, PMOVSSET_EL0) & mask;
}
return true;
@ -807,10 +914,10 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return false;
}
vcpu_sys_reg(vcpu, PMUSERENR_EL0) = p->regval
& ARMV8_PMU_USERENR_MASK;
__vcpu_sys_reg(vcpu, PMUSERENR_EL0) =
p->regval & ARMV8_PMU_USERENR_MASK;
} else {
p->regval = vcpu_sys_reg(vcpu, PMUSERENR_EL0)
p->regval = __vcpu_sys_reg(vcpu, PMUSERENR_EL0)
& ARMV8_PMU_USERENR_MASK;
}
@ -893,6 +1000,12 @@ static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
task_pid_nr(current));
val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
} else if (id == SYS_ID_AA64MMFR1_EL1) {
if (val & (0xfUL << ID_AA64MMFR1_LOR_SHIFT))
pr_err_once("kvm [%i]: LORegions unsupported for guests, suppressing\n",
task_pid_nr(current));
val &= ~(0xfUL << ID_AA64MMFR1_LOR_SHIFT);
}
return val;
@ -1178,6 +1291,12 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_MAIR_EL1), access_vm_reg, reset_unknown, MAIR_EL1 },
{ SYS_DESC(SYS_AMAIR_EL1), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
{ SYS_DESC(SYS_LORSA_EL1), trap_undef },
{ SYS_DESC(SYS_LOREA_EL1), trap_undef },
{ SYS_DESC(SYS_LORN_EL1), trap_undef },
{ SYS_DESC(SYS_LORC_EL1), trap_undef },
{ SYS_DESC(SYS_LORID_EL1), trap_undef },
{ SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
@ -1545,6 +1664,11 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, c13_CID },
/* CNTP_TVAL */
{ Op1( 0), CRn(14), CRm( 2), Op2( 0), access_cntp_tval },
/* CNTP_CTL */
{ Op1( 0), CRn(14), CRm( 2), Op2( 1), access_cntp_ctl },
/* PMEVCNTRn */
PMU_PMEVCNTR(0),
PMU_PMEVCNTR(1),
@ -1618,6 +1742,7 @@ static const struct sys_reg_desc cp15_64_regs[] = {
{ Op1( 0), CRn( 0), CRm( 9), Op2( 0), access_pmu_evcntr },
{ Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi },
{ Op1( 1), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, c2_TTBR1 },
{ Op1( 2), CRn( 0), CRm(14), Op2( 0), access_cntp_cval },
};
/* Target specific emulation tables */
@ -2194,7 +2319,7 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
if (r->get_user)
return (r->get_user)(vcpu, r, reg, uaddr);
return reg_to_user(uaddr, &vcpu_sys_reg(vcpu, r->reg), reg->id);
return reg_to_user(uaddr, &__vcpu_sys_reg(vcpu, r->reg), reg->id);
}
int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@ -2215,7 +2340,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
if (r->set_user)
return (r->set_user)(vcpu, r, reg, uaddr);
return reg_from_user(&vcpu_sys_reg(vcpu, r->reg), uaddr, reg->id);
return reg_from_user(&__vcpu_sys_reg(vcpu, r->reg), uaddr, reg->id);
}
static unsigned int num_demux_regs(void)
@ -2421,6 +2546,6 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
reset_sys_reg_descs(vcpu, table, num);
for (num = 1; num < NR_SYS_REGS; num++)
if (vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
panic("Didn't reset vcpu_sys_reg(%zi)", num);
if (__vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
panic("Didn't reset __vcpu_sys_reg(%zi)", num);
}

View File

@ -89,14 +89,14 @@ static inline void reset_unknown(struct kvm_vcpu *vcpu,
{
BUG_ON(!r->reg);
BUG_ON(r->reg >= NR_SYS_REGS);
vcpu_sys_reg(vcpu, r->reg) = 0x1de7ec7edbadc0deULL;
__vcpu_sys_reg(vcpu, r->reg) = 0x1de7ec7edbadc0deULL;
}
static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
BUG_ON(!r->reg);
BUG_ON(r->reg >= NR_SYS_REGS);
vcpu_sys_reg(vcpu, r->reg) = r->val;
__vcpu_sys_reg(vcpu, r->reg) = r->val;
}
static inline int cmp_sys_reg(const struct sys_reg_desc *i1,

View File

@ -38,13 +38,13 @@ static bool access_actlr(struct kvm_vcpu *vcpu,
if (p->is_write)
return ignore_write(vcpu, p);
p->regval = vcpu_sys_reg(vcpu, ACTLR_EL1);
p->regval = vcpu_read_sys_reg(vcpu, ACTLR_EL1);
return true;
}
static void reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
vcpu_sys_reg(vcpu, ACTLR_EL1) = read_sysreg(actlr_el1);
__vcpu_sys_reg(vcpu, ACTLR_EL1) = read_sysreg(actlr_el1);
}
/*

227
arch/arm64/kvm/va_layout.c Normal file
View File

@ -0,0 +1,227 @@
/*
* Copyright (C) 2017 ARM Ltd.
* Author: Marc Zyngier <marc.zyngier@arm.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <linux/kvm_host.h>
#include <linux/random.h>
#include <linux/memblock.h>
#include <asm/alternative.h>
#include <asm/debug-monitors.h>
#include <asm/insn.h>
#include <asm/kvm_mmu.h>
/*
* The LSB of the random hyp VA tag or 0 if no randomization is used.
*/
static u8 tag_lsb;
/*
* The random hyp VA tag value with the region bit if hyp randomization is used
*/
static u64 tag_val;
static u64 va_mask;
static void compute_layout(void)
{
phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
u64 hyp_va_msb;
int kva_msb;
/* Where is my RAM region? */
hyp_va_msb = idmap_addr & BIT(VA_BITS - 1);
hyp_va_msb ^= BIT(VA_BITS - 1);
kva_msb = fls64((u64)phys_to_virt(memblock_start_of_DRAM()) ^
(u64)(high_memory - 1));
if (kva_msb == (VA_BITS - 1)) {
/*
* No space in the address, let's compute the mask so
* that it covers (VA_BITS - 1) bits, and the region
* bit. The tag stays set to zero.
*/
va_mask = BIT(VA_BITS - 1) - 1;
va_mask |= hyp_va_msb;
} else {
/*
* We do have some free bits to insert a random tag.
* Hyp VAs are now created from kernel linear map VAs
* using the following formula (with V == VA_BITS):
*
* 63 ... V | V-1 | V-2 .. tag_lsb | tag_lsb - 1 .. 0
* ---------------------------------------------------------
* | 0000000 | hyp_va_msb | random tag | kern linear VA |
*/
tag_lsb = kva_msb;
va_mask = GENMASK_ULL(tag_lsb - 1, 0);
tag_val = get_random_long() & GENMASK_ULL(VA_BITS - 2, tag_lsb);
tag_val |= hyp_va_msb;
tag_val >>= tag_lsb;
}
}
static u32 compute_instruction(int n, u32 rd, u32 rn)
{
u32 insn = AARCH64_BREAK_FAULT;
switch (n) {
case 0:
insn = aarch64_insn_gen_logical_immediate(AARCH64_INSN_LOGIC_AND,
AARCH64_INSN_VARIANT_64BIT,
rn, rd, va_mask);
break;
case 1:
/* ROR is a variant of EXTR with Rm = Rn */
insn = aarch64_insn_gen_extr(AARCH64_INSN_VARIANT_64BIT,
rn, rn, rd,
tag_lsb);
break;
case 2:
insn = aarch64_insn_gen_add_sub_imm(rd, rn,
tag_val & GENMASK(11, 0),
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_ADSB_ADD);
break;
case 3:
insn = aarch64_insn_gen_add_sub_imm(rd, rn,
tag_val & GENMASK(23, 12),
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_ADSB_ADD);
break;
case 4:
/* ROR is a variant of EXTR with Rm = Rn */
insn = aarch64_insn_gen_extr(AARCH64_INSN_VARIANT_64BIT,
rn, rn, rd, 64 - tag_lsb);
break;
}
return insn;
}
void __init kvm_update_va_mask(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst)
{
int i;
BUG_ON(nr_inst != 5);
if (!has_vhe() && !va_mask)
compute_layout();
for (i = 0; i < nr_inst; i++) {
u32 rd, rn, insn, oinsn;
/*
* VHE doesn't need any address translation, let's NOP
* everything.
*
* Alternatively, if we don't have any spare bits in
* the address, NOP everything after masking that
* kernel VA.
*/
if (has_vhe() || (!tag_lsb && i > 0)) {
updptr[i] = cpu_to_le32(aarch64_insn_gen_nop());
continue;
}
oinsn = le32_to_cpu(origptr[i]);
rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD, oinsn);
rn = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RN, oinsn);
insn = compute_instruction(i, rd, rn);
BUG_ON(insn == AARCH64_BREAK_FAULT);
updptr[i] = cpu_to_le32(insn);
}
}
void *__kvm_bp_vect_base;
int __kvm_harden_el2_vector_slot;
void kvm_patch_vector_branch(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst)
{
u64 addr;
u32 insn;
BUG_ON(nr_inst != 5);
if (has_vhe() || !cpus_have_const_cap(ARM64_HARDEN_EL2_VECTORS)) {
WARN_ON_ONCE(cpus_have_const_cap(ARM64_HARDEN_EL2_VECTORS));
return;
}
if (!va_mask)
compute_layout();
/*
* Compute HYP VA by using the same computation as kern_hyp_va()
*/
addr = (uintptr_t)kvm_ksym_ref(__kvm_hyp_vector);
addr &= va_mask;
addr |= tag_val << tag_lsb;
/* Use PC[10:7] to branch to the same vector in KVM */
addr |= ((u64)origptr & GENMASK_ULL(10, 7));
/*
* Branch to the second instruction in the vectors in order to
* avoid the initial store on the stack (which we already
* perform in the hardening vectors).
*/
addr += AARCH64_INSN_SIZE;
/* stp x0, x1, [sp, #-16]! */
insn = aarch64_insn_gen_load_store_pair(AARCH64_INSN_REG_0,
AARCH64_INSN_REG_1,
AARCH64_INSN_REG_SP,
-16,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX);
*updptr++ = cpu_to_le32(insn);
/* movz x0, #(addr & 0xffff) */
insn = aarch64_insn_gen_movewide(AARCH64_INSN_REG_0,
(u16)addr,
0,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_MOVEWIDE_ZERO);
*updptr++ = cpu_to_le32(insn);
/* movk x0, #((addr >> 16) & 0xffff), lsl #16 */
insn = aarch64_insn_gen_movewide(AARCH64_INSN_REG_0,
(u16)(addr >> 16),
16,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_MOVEWIDE_KEEP);
*updptr++ = cpu_to_le32(insn);
/* movk x0, #((addr >> 32) & 0xffff), lsl #32 */
insn = aarch64_insn_gen_movewide(AARCH64_INSN_REG_0,
(u16)(addr >> 32),
32,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_MOVEWIDE_KEEP);
*updptr++ = cpu_to_le32(insn);
/* br x0 */
insn = aarch64_insn_gen_branch_reg(AARCH64_INSN_REG_0,
AARCH64_INSN_BRANCH_NOLINK);
*updptr++ = cpu_to_le32(insn);
}

View File

@ -94,6 +94,11 @@ static inline unsigned int kvm_arch_para_features(void)
return 0;
}
static inline unsigned int kvm_arch_para_hints(void)
{
return 0;
}
#ifdef CONFIG_MIPS_PARAVIRT
static inline bool kvm_para_available(void)
{

View File

@ -60,7 +60,6 @@
#define KVM_ARCH_WANT_MMU_NOTIFIER
extern int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
extern int kvm_unmap_hva_range(struct kvm *kvm,
unsigned long start, unsigned long end);
extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);

View File

@ -61,6 +61,11 @@ static inline unsigned int kvm_arch_para_features(void)
return r;
}
static inline unsigned int kvm_arch_para_hints(void)
{
return 0;
}
static inline bool kvm_check_and_clear_guest_paused(void)
{
return false;

View File

@ -295,7 +295,6 @@ struct kvmppc_ops {
const struct kvm_userspace_memory_region *mem,
const struct kvm_memory_slot *old,
const struct kvm_memory_slot *new);
int (*unmap_hva)(struct kvm *kvm, unsigned long hva);
int (*unmap_hva_range)(struct kvm *kvm, unsigned long start,
unsigned long end);
int (*age_hva)(struct kvm *kvm, unsigned long start, unsigned long end);

View File

@ -819,12 +819,6 @@ void kvmppc_core_commit_memory_region(struct kvm *kvm,
kvm->arch.kvm_ops->commit_memory_region(kvm, mem, old, new);
}
int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
{
return kvm->arch.kvm_ops->unmap_hva(kvm, hva);
}
EXPORT_SYMBOL_GPL(kvm_unmap_hva);
int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
{
return kvm->arch.kvm_ops->unmap_hva_range(kvm, start, end);

View File

@ -14,7 +14,6 @@
extern void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
struct kvm_memory_slot *memslot);
extern int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva);
extern int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start,
unsigned long end);
extern int kvm_age_hva_hv(struct kvm *kvm, unsigned long start,

View File

@ -877,15 +877,6 @@ static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_memory_slot *memslot,
return 0;
}
int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva)
{
hva_handler_fn handler;
handler = kvm_is_radix(kvm) ? kvm_unmap_radix : kvm_unmap_rmapp;
kvm_handle_hva(kvm, hva, handler);
return 0;
}
int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end)
{
hva_handler_fn handler;

View File

@ -150,7 +150,9 @@ static void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
{
int psize = MMU_BASE_PSIZE;
if (pshift >= PMD_SHIFT)
if (pshift >= PUD_SHIFT)
psize = MMU_PAGE_1G;
else if (pshift >= PMD_SHIFT)
psize = MMU_PAGE_2M;
addr &= ~0xfffUL;
addr |= mmu_psize_defs[psize].ap << 5;
@ -163,6 +165,17 @@ static void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
asm volatile("ptesync": : :"memory");
}
static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned long addr)
{
unsigned long rb = 0x2 << PPC_BITLSHIFT(53); /* IS = 2 */
asm volatile("ptesync": : :"memory");
/* RIC=1 PRS=0 R=1 IS=2 */
asm volatile(PPC_TLBIE_5(%0, %1, 1, 0, 1)
: : "r" (rb), "r" (kvm->arch.lpid) : "memory");
asm volatile("ptesync": : :"memory");
}
unsigned long kvmppc_radix_update_pte(struct kvm *kvm, pte_t *ptep,
unsigned long clr, unsigned long set,
unsigned long addr, unsigned int shift)
@ -223,9 +236,9 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
new_pud = pud_alloc_one(kvm->mm, gpa);
pmd = NULL;
if (pud && pud_present(*pud))
if (pud && pud_present(*pud) && !pud_huge(*pud))
pmd = pmd_offset(pud, gpa);
else
else if (level <= 1)
new_pmd = pmd_alloc_one(kvm->mm, gpa);
if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
@ -246,6 +259,50 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
new_pud = NULL;
}
pud = pud_offset(pgd, gpa);
if (pud_huge(*pud)) {
unsigned long hgpa = gpa & PUD_MASK;
/*
* If we raced with another CPU which has just put
* a 1GB pte in after we saw a pmd page, try again.
*/
if (level <= 1 && !new_pmd) {
ret = -EAGAIN;
goto out_unlock;
}
/* Check if we raced and someone else has set the same thing */
if (level == 2 && pud_raw(*pud) == pte_raw(pte)) {
ret = 0;
goto out_unlock;
}
/* Valid 1GB page here already, remove it */
old = kvmppc_radix_update_pte(kvm, (pte_t *)pud,
~0UL, 0, hgpa, PUD_SHIFT);
kvmppc_radix_tlbie_page(kvm, hgpa, PUD_SHIFT);
if (old & _PAGE_DIRTY) {
unsigned long gfn = hgpa >> PAGE_SHIFT;
struct kvm_memory_slot *memslot;
memslot = gfn_to_memslot(kvm, gfn);
if (memslot && memslot->dirty_bitmap)
kvmppc_update_dirty_map(memslot,
gfn, PUD_SIZE);
}
}
if (level == 2) {
if (!pud_none(*pud)) {
/*
* There's a page table page here, but we wanted to
* install a large page, so remove and free the page
* table page. new_pmd will be NULL since level == 2.
*/
new_pmd = pmd_offset(pud, 0);
pud_clear(pud);
kvmppc_radix_flush_pwc(kvm, gpa);
}
kvmppc_radix_set_pte_at(kvm, gpa, (pte_t *)pud, pte);
ret = 0;
goto out_unlock;
}
if (pud_none(*pud)) {
if (!new_pmd)
goto out_unlock;
@ -264,6 +321,11 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
ret = -EAGAIN;
goto out_unlock;
}
/* Check if we raced and someone else has set the same thing */
if (level == 1 && pmd_raw(*pmd) == pte_raw(pte)) {
ret = 0;
goto out_unlock;
}
/* Valid 2MB page here already, remove it */
old = kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd),
~0UL, 0, lgpa, PMD_SHIFT);
@ -276,35 +338,43 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
kvmppc_update_dirty_map(memslot,
gfn, PMD_SIZE);
}
} else if (level == 1 && !pmd_none(*pmd)) {
/*
* There's a page table page here, but we wanted
* to install a large page. Tell the caller and let
* it try installing a normal page if it wants.
*/
ret = -EBUSY;
}
if (level == 1) {
if (!pmd_none(*pmd)) {
/*
* There's a page table page here, but we wanted to
* install a large page, so remove and free the page
* table page. new_ptep will be NULL since level == 1.
*/
new_ptep = pte_offset_kernel(pmd, 0);
pmd_clear(pmd);
kvmppc_radix_flush_pwc(kvm, gpa);
}
kvmppc_radix_set_pte_at(kvm, gpa, pmdp_ptep(pmd), pte);
ret = 0;
goto out_unlock;
}
if (level == 0) {
if (pmd_none(*pmd)) {
if (!new_ptep)
goto out_unlock;
pmd_populate(kvm->mm, pmd, new_ptep);
new_ptep = NULL;
}
ptep = pte_offset_kernel(pmd, gpa);
if (pte_present(*ptep)) {
/* PTE was previously valid, so invalidate it */
old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT,
0, gpa, 0);
kvmppc_radix_tlbie_page(kvm, gpa, 0);
if (old & _PAGE_DIRTY)
mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
}
kvmppc_radix_set_pte_at(kvm, gpa, ptep, pte);
} else {
kvmppc_radix_set_pte_at(kvm, gpa, pmdp_ptep(pmd), pte);
if (pmd_none(*pmd)) {
if (!new_ptep)
goto out_unlock;
pmd_populate(kvm->mm, pmd, new_ptep);
new_ptep = NULL;
}
ptep = pte_offset_kernel(pmd, gpa);
if (pte_present(*ptep)) {
/* Check if someone else set the same thing */
if (pte_raw(*ptep) == pte_raw(pte)) {
ret = 0;
goto out_unlock;
}
/* PTE was previously valid, so invalidate it */
old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT,
0, gpa, 0);
kvmppc_radix_tlbie_page(kvm, gpa, 0);
if (old & _PAGE_DIRTY)
mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
}
kvmppc_radix_set_pte_at(kvm, gpa, ptep, pte);
ret = 0;
out_unlock:
@ -325,11 +395,11 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned long mmu_seq, pte_size;
unsigned long gpa, gfn, hva, pfn;
struct kvm_memory_slot *memslot;
struct page *page = NULL, *pages[1];
long ret, npages, ok;
unsigned int writing;
struct vm_area_struct *vma;
unsigned long flags;
struct page *page = NULL;
long ret;
bool writing;
bool upgrade_write = false;
bool *upgrade_p = &upgrade_write;
pte_t pte, *ptep;
unsigned long pgflags;
unsigned int shift, level;
@ -369,122 +439,131 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
dsisr & DSISR_ISSTORE);
}
writing = (dsisr & DSISR_ISSTORE) != 0;
if (memslot->flags & KVM_MEM_READONLY) {
if (writing) {
/* give the guest a DSI */
dsisr = DSISR_ISSTORE | DSISR_PROTFAULT;
kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
return RESUME_GUEST;
}
upgrade_p = NULL;
}
if (dsisr & DSISR_SET_RC) {
/*
* Need to set an R or C bit in the 2nd-level tables;
* since we are just helping out the hardware here,
* it is sufficient to do what the hardware does.
*/
pgflags = _PAGE_ACCESSED;
if (writing)
pgflags |= _PAGE_DIRTY;
/*
* We are walking the secondary page table here. We can do this
* without disabling irq.
*/
spin_lock(&kvm->mmu_lock);
ptep = __find_linux_pte(kvm->arch.pgtable,
gpa, NULL, &shift);
if (ptep && pte_present(*ptep) &&
(!writing || pte_write(*ptep))) {
kvmppc_radix_update_pte(kvm, ptep, 0, pgflags,
gpa, shift);
dsisr &= ~DSISR_SET_RC;
}
spin_unlock(&kvm->mmu_lock);
if (!(dsisr & (DSISR_BAD_FAULT_64S | DSISR_NOHPTE |
DSISR_PROTFAULT | DSISR_SET_RC)))
return RESUME_GUEST;
}
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_notifier_seq;
smp_rmb();
writing = (dsisr & DSISR_ISSTORE) != 0;
/*
* Do a fast check first, since __gfn_to_pfn_memslot doesn't
* do it with !atomic && !async, which is how we call it.
* We always ask for write permission since the common case
* is that the page is writable.
*/
hva = gfn_to_hva_memslot(memslot, gfn);
if (dsisr & DSISR_SET_RC) {
/*
* Need to set an R or C bit in the 2nd-level tables;
* if the relevant bits aren't already set in the linux
* page tables, fall through to do the gup_fast to
* set them in the linux page tables too.
*/
ok = 0;
pgflags = _PAGE_ACCESSED;
if (writing)
pgflags |= _PAGE_DIRTY;
local_irq_save(flags);
ptep = find_current_mm_pte(current->mm->pgd, hva, NULL, NULL);
if (ptep) {
pte = READ_ONCE(*ptep);
if (pte_present(pte) &&
(pte_val(pte) & pgflags) == pgflags)
ok = 1;
}
local_irq_restore(flags);
if (ok) {
spin_lock(&kvm->mmu_lock);
if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) {
spin_unlock(&kvm->mmu_lock);
return RESUME_GUEST;
}
/*
* We are walking the secondary page table here. We can do this
* without disabling irq.
*/
ptep = __find_linux_pte(kvm->arch.pgtable,
gpa, NULL, &shift);
if (ptep && pte_present(*ptep)) {
kvmppc_radix_update_pte(kvm, ptep, 0, pgflags,
gpa, shift);
spin_unlock(&kvm->mmu_lock);
return RESUME_GUEST;
}
spin_unlock(&kvm->mmu_lock);
if (upgrade_p && __get_user_pages_fast(hva, 1, 1, &page) == 1) {
pfn = page_to_pfn(page);
upgrade_write = true;
} else {
/* Call KVM generic code to do the slow-path check */
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
writing, upgrade_p);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
page = NULL;
if (pfn_valid(pfn)) {
page = pfn_to_page(pfn);
if (PageReserved(page))
page = NULL;
}
}
ret = -EFAULT;
pfn = 0;
pte_size = PAGE_SIZE;
pgflags = _PAGE_READ | _PAGE_EXEC;
/* See if we can insert a 1GB or 2MB large PTE here */
level = 0;
npages = get_user_pages_fast(hva, 1, writing, pages);
if (npages < 1) {
/* Check if it's an I/O mapping */
down_read(&current->mm->mmap_sem);
vma = find_vma(current->mm, hva);
if (vma && vma->vm_start <= hva && hva < vma->vm_end &&
(vma->vm_flags & VM_PFNMAP)) {
pfn = vma->vm_pgoff +
((hva - vma->vm_start) >> PAGE_SHIFT);
pgflags = pgprot_val(vma->vm_page_prot);
}
up_read(&current->mm->mmap_sem);
if (!pfn)
return -EFAULT;
} else {
page = pages[0];
pfn = page_to_pfn(page);
if (PageCompound(page)) {
pte_size <<= compound_order(compound_head(page));
/* See if we can insert a 2MB large-page PTE here */
if (pte_size >= PMD_SIZE &&
(gpa & (PMD_SIZE - PAGE_SIZE)) ==
(hva & (PMD_SIZE - PAGE_SIZE))) {
level = 1;
pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1);
}
}
/* See if we can provide write access */
if (writing) {
pgflags |= _PAGE_WRITE;
} else {
local_irq_save(flags);
ptep = find_current_mm_pte(current->mm->pgd,
hva, NULL, NULL);
if (ptep && pte_write(*ptep))
pgflags |= _PAGE_WRITE;
local_irq_restore(flags);
if (page && PageCompound(page)) {
pte_size = PAGE_SIZE << compound_order(compound_head(page));
if (pte_size >= PUD_SIZE &&
(gpa & (PUD_SIZE - PAGE_SIZE)) ==
(hva & (PUD_SIZE - PAGE_SIZE))) {
level = 2;
pfn &= ~((PUD_SIZE >> PAGE_SHIFT) - 1);
} else if (pte_size >= PMD_SIZE &&
(gpa & (PMD_SIZE - PAGE_SIZE)) ==
(hva & (PMD_SIZE - PAGE_SIZE))) {
level = 1;
pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1);
}
}
/*
* Compute the PTE value that we need to insert.
*/
pgflags |= _PAGE_PRESENT | _PAGE_PTE | _PAGE_ACCESSED;
if (pgflags & _PAGE_WRITE)
pgflags |= _PAGE_DIRTY;
pte = pfn_pte(pfn, __pgprot(pgflags));
if (page) {
pgflags = _PAGE_READ | _PAGE_EXEC | _PAGE_PRESENT | _PAGE_PTE |
_PAGE_ACCESSED;
if (writing || upgrade_write)
pgflags |= _PAGE_WRITE | _PAGE_DIRTY;
pte = pfn_pte(pfn, __pgprot(pgflags));
} else {
/*
* Read the PTE from the process' radix tree and use that
* so we get the attribute bits.
*/
local_irq_disable();
ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
pte = *ptep;
local_irq_enable();
if (shift == PUD_SHIFT &&
(gpa & (PUD_SIZE - PAGE_SIZE)) ==
(hva & (PUD_SIZE - PAGE_SIZE))) {
level = 2;
} else if (shift == PMD_SHIFT &&
(gpa & (PMD_SIZE - PAGE_SIZE)) ==
(hva & (PMD_SIZE - PAGE_SIZE))) {
level = 1;
} else if (shift && shift != PAGE_SHIFT) {
/* Adjust PFN */
unsigned long mask = (1ul << shift) - PAGE_SIZE;
pte = __pte(pte_val(pte) | (hva & mask));
}
if (!(writing || upgrade_write))
pte = __pte(pte_val(pte) & ~ _PAGE_WRITE);
pte = __pte(pte_val(pte) | _PAGE_EXEC);
}
/* Allocate space in the tree and write the PTE */
ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
if (ret == -EBUSY) {
/*
* There's already a PMD where wanted to install a large page;
* for now, fall back to installing a small page.
*/
level = 0;
pfn |= gfn & ((PMD_SIZE >> PAGE_SHIFT) - 1);
pte = pfn_pte(pfn, __pgprot(pgflags));
ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
}
if (page) {
if (!ret && (pgflags & _PAGE_WRITE))
if (!ret && (pte_val(pte) & _PAGE_WRITE))
set_page_dirty_lock(page);
put_page(page);
}
@ -662,6 +741,10 @@ void kvmppc_free_radix(struct kvm *kvm)
for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++pud) {
if (!pud_present(*pud))
continue;
if (pud_huge(*pud)) {
pud_clear(pud);
continue;
}
pmd = pmd_offset(pud, 0);
for (im = 0; im < PTRS_PER_PMD; ++im, ++pmd) {
if (pmd_is_leaf(*pmd)) {

View File

@ -450,7 +450,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
/*
* Synchronize with the MMU notifier callbacks in
* book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
* book3s_64_mmu_hv.c (kvm_unmap_hva_range_hv etc.).
* While we have the rmap lock, code running on other CPUs
* cannot finish unmapping the host real page that backs
* this guest real page, so we are OK to access the host

View File

@ -4375,7 +4375,6 @@ static struct kvmppc_ops kvm_ops_hv = {
.flush_memslot = kvmppc_core_flush_memslot_hv,
.prepare_memory_region = kvmppc_core_prepare_memory_region_hv,
.commit_memory_region = kvmppc_core_commit_memory_region_hv,
.unmap_hva = kvm_unmap_hva_hv,
.unmap_hva_range = kvm_unmap_hva_range_hv,
.age_hva = kvm_age_hva_hv,
.test_age_hva = kvm_test_age_hva_hv,

View File

@ -277,15 +277,6 @@ static void do_kvm_unmap_hva(struct kvm *kvm, unsigned long start,
}
}
static int kvm_unmap_hva_pr(struct kvm *kvm, unsigned long hva)
{
trace_kvm_unmap_hva(hva);
do_kvm_unmap_hva(kvm, hva, hva + PAGE_SIZE);
return 0;
}
static int kvm_unmap_hva_range_pr(struct kvm *kvm, unsigned long start,
unsigned long end)
{
@ -1773,7 +1764,6 @@ static struct kvmppc_ops kvm_ops_pr = {
.flush_memslot = kvmppc_core_flush_memslot_pr,
.prepare_memory_region = kvmppc_core_prepare_memory_region_pr,
.commit_memory_region = kvmppc_core_commit_memory_region_pr,
.unmap_hva = kvm_unmap_hva_pr,
.unmap_hva_range = kvm_unmap_hva_range_pr,
.age_hva = kvm_age_hva_pr,
.test_age_hva = kvm_test_age_hva_pr,

View File

@ -724,7 +724,7 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, enum instruction_type type,
/************* MMU Notifiers *************/
int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
static int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
{
trace_kvm_unmap_hva(hva);

View File

@ -254,21 +254,6 @@ TRACE_EVENT(kvm_exit,
)
);
TRACE_EVENT(kvm_unmap_hva,
TP_PROTO(unsigned long hva),
TP_ARGS(hva),
TP_STRUCT__entry(
__field( unsigned long, hva )
),
TP_fast_assign(
__entry->hva = hva;
),
TP_printk("unmap hva 0x%lx\n", __entry->hva)
);
#endif /* _TRACE_KVM_H */
/* This part must be outside protection */

View File

@ -294,6 +294,7 @@ struct kvm_vcpu_stat {
u64 exit_userspace;
u64 exit_null;
u64 exit_external_request;
u64 exit_io_request;
u64 exit_external_interrupt;
u64 exit_stop_request;
u64 exit_validity;
@ -310,16 +311,29 @@ struct kvm_vcpu_stat {
u64 exit_program_interruption;
u64 exit_instr_and_program;
u64 exit_operation_exception;
u64 deliver_ckc;
u64 deliver_cputm;
u64 deliver_external_call;
u64 deliver_emergency_signal;
u64 deliver_service_signal;
u64 deliver_virtio_interrupt;
u64 deliver_virtio;
u64 deliver_stop_signal;
u64 deliver_prefix_signal;
u64 deliver_restart_signal;
u64 deliver_program_int;
u64 deliver_io_int;
u64 deliver_program;
u64 deliver_io;
u64 deliver_machine_check;
u64 exit_wait_state;
u64 inject_ckc;
u64 inject_cputm;
u64 inject_external_call;
u64 inject_emergency_signal;
u64 inject_mchk;
u64 inject_pfault_init;
u64 inject_program;
u64 inject_restart;
u64 inject_set_prefix;
u64 inject_stop_signal;
u64 instruction_epsw;
u64 instruction_gs;
u64 instruction_io_other;
@ -644,7 +658,12 @@ struct kvm_vcpu_arch {
};
struct kvm_vm_stat {
ulong remote_tlb_flush;
u64 inject_io;
u64 inject_float_mchk;
u64 inject_pfault_done;
u64 inject_service_signal;
u64 inject_virtio;
u64 remote_tlb_flush;
};
struct kvm_arch_memory_slot {
@ -792,6 +811,7 @@ struct kvm_arch{
int css_support;
int use_irqchip;
int use_cmma;
int use_pfmfi;
int user_cpu_state_ctrl;
int user_sigp;
int user_stsi;

View File

@ -193,6 +193,11 @@ static inline unsigned int kvm_arch_para_features(void)
return 0;
}
static inline unsigned int kvm_arch_para_hints(void)
{
return 0;
}
static inline bool kvm_check_and_clear_guest_paused(void)
{
return false;

View File

@ -22,8 +22,8 @@ typedef struct {
unsigned int has_pgste:1;
/* The mmu context uses storage keys. */
unsigned int use_skey:1;
/* The mmu context uses CMMA. */
unsigned int use_cmma:1;
/* The mmu context uses CMM. */
unsigned int uses_cmm:1;
} mm_context_t;
#define INIT_MM_CONTEXT(name) \

View File

@ -31,7 +31,7 @@ static inline int init_new_context(struct task_struct *tsk,
(current->mm && current->mm->context.alloc_pgste);
mm->context.has_pgste = 0;
mm->context.use_skey = 0;
mm->context.use_cmma = 0;
mm->context.uses_cmm = 0;
#endif
switch (mm->context.asce_limit) {
case _REGION2_SIZE:

View File

@ -1050,8 +1050,7 @@ shadow_r2t:
rc = gmap_shadow_r2t(sg, saddr, rfte.val, *fake);
if (rc)
return rc;
/* fallthrough */
}
} /* fallthrough */
case ASCE_TYPE_REGION2: {
union region2_table_entry rste;
@ -1077,8 +1076,7 @@ shadow_r3t:
rc = gmap_shadow_r3t(sg, saddr, rste.val, *fake);
if (rc)
return rc;
/* fallthrough */
}
} /* fallthrough */
case ASCE_TYPE_REGION3: {
union region3_table_entry rtte;
@ -1113,8 +1111,7 @@ shadow_sgt:
rc = gmap_shadow_sgt(sg, saddr, rtte.val, *fake);
if (rc)
return rc;
/* fallthrough */
}
} /* fallthrough */
case ASCE_TYPE_SEGMENT: {
union segment_table_entry ste;

View File

@ -50,18 +50,6 @@ u8 kvm_s390_get_ilen(struct kvm_vcpu *vcpu)
return ilen;
}
static int handle_noop(struct kvm_vcpu *vcpu)
{
switch (vcpu->arch.sie_block->icptcode) {
case 0x10:
vcpu->stat.exit_external_request++;
break;
default:
break; /* nothing */
}
return 0;
}
static int handle_stop(struct kvm_vcpu *vcpu)
{
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
@ -465,8 +453,11 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
switch (vcpu->arch.sie_block->icptcode) {
case ICPT_EXTREQ:
vcpu->stat.exit_external_request++;
return 0;
case ICPT_IOREQ:
return handle_noop(vcpu);
vcpu->stat.exit_io_request++;
return 0;
case ICPT_INST:
rc = handle_instruction(vcpu);
break;

View File

@ -391,6 +391,7 @@ static int __must_check __deliver_cpu_timer(struct kvm_vcpu *vcpu)
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
int rc;
vcpu->stat.deliver_cputm++;
trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CPU_TIMER,
0, 0);
@ -410,6 +411,7 @@ static int __must_check __deliver_ckc(struct kvm_vcpu *vcpu)
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
int rc;
vcpu->stat.deliver_ckc++;
trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CLOCK_COMP,
0, 0);
@ -595,6 +597,7 @@ static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu)
trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
KVM_S390_MCHK,
mchk.cr14, mchk.mcic);
vcpu->stat.deliver_machine_check++;
rc = __write_machine_check(vcpu, &mchk);
}
return rc;
@ -710,7 +713,7 @@ static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
ilen = pgm_info.flags & KVM_S390_PGM_FLAGS_ILC_MASK;
VCPU_EVENT(vcpu, 3, "deliver: program irq code 0x%x, ilen:%d",
pgm_info.code, ilen);
vcpu->stat.deliver_program_int++;
vcpu->stat.deliver_program++;
trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_PROGRAM_INT,
pgm_info.code, 0);
@ -899,7 +902,7 @@ static int __must_check __deliver_virtio(struct kvm_vcpu *vcpu)
VCPU_EVENT(vcpu, 4,
"deliver: virtio parm: 0x%x,parm64: 0x%llx",
inti->ext.ext_params, inti->ext.ext_params2);
vcpu->stat.deliver_virtio_interrupt++;
vcpu->stat.deliver_virtio++;
trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
inti->type,
inti->ext.ext_params,
@ -975,7 +978,7 @@ static int __must_check __deliver_io(struct kvm_vcpu *vcpu,
inti->io.subchannel_id >> 1 & 0x3,
inti->io.subchannel_nr);
vcpu->stat.deliver_io_int++;
vcpu->stat.deliver_io++;
trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
inti->type,
((__u32)inti->io.subchannel_id << 16) |
@ -1004,7 +1007,7 @@ static int __must_check __deliver_io(struct kvm_vcpu *vcpu,
VCPU_EVENT(vcpu, 4, "%s isc %u", "deliver: I/O (AI/gisa)", isc);
memset(&io, 0, sizeof(io));
io.io_int_word = isc_to_int_word(isc);
vcpu->stat.deliver_io_int++;
vcpu->stat.deliver_io++;
trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
KVM_S390_INT_IO(1, 0, 0, 0),
((__u32)io.subchannel_id << 16) |
@ -1268,6 +1271,7 @@ static int __inject_prog(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
{
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
vcpu->stat.inject_program++;
VCPU_EVENT(vcpu, 3, "inject: program irq code 0x%x", irq->u.pgm.code);
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_PROGRAM_INT,
irq->u.pgm.code, 0);
@ -1309,6 +1313,7 @@ static int __inject_pfault_init(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
{
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
vcpu->stat.inject_pfault_init++;
VCPU_EVENT(vcpu, 4, "inject: pfault init parameter block at 0x%llx",
irq->u.ext.ext_params2);
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_PFAULT_INIT,
@ -1327,6 +1332,7 @@ static int __inject_extcall(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
struct kvm_s390_extcall_info *extcall = &li->irq.extcall;
uint16_t src_id = irq->u.extcall.code;
vcpu->stat.inject_external_call++;
VCPU_EVENT(vcpu, 4, "inject: external call source-cpu:%u",
src_id);
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_EXTERNAL_CALL,
@ -1351,6 +1357,7 @@ static int __inject_set_prefix(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
struct kvm_s390_prefix_info *prefix = &li->irq.prefix;
vcpu->stat.inject_set_prefix++;
VCPU_EVENT(vcpu, 3, "inject: set prefix to %x",
irq->u.prefix.address);
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_SIGP_SET_PREFIX,
@ -1371,6 +1378,7 @@ static int __inject_sigp_stop(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
struct kvm_s390_stop_info *stop = &li->irq.stop;
int rc = 0;
vcpu->stat.inject_stop_signal++;
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_SIGP_STOP, 0, 0);
if (irq->u.stop.flags & ~KVM_S390_STOP_SUPP_FLAGS)
@ -1395,6 +1403,7 @@ static int __inject_sigp_restart(struct kvm_vcpu *vcpu,
{
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
vcpu->stat.inject_restart++;
VCPU_EVENT(vcpu, 3, "%s", "inject: restart int");
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_RESTART, 0, 0);
@ -1407,6 +1416,7 @@ static int __inject_sigp_emergency(struct kvm_vcpu *vcpu,
{
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
vcpu->stat.inject_emergency_signal++;
VCPU_EVENT(vcpu, 4, "inject: emergency from cpu %u",
irq->u.emerg.code);
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_EMERGENCY,
@ -1427,6 +1437,7 @@ static int __inject_mchk(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
struct kvm_s390_mchk_info *mchk = &li->irq.mchk;
vcpu->stat.inject_mchk++;
VCPU_EVENT(vcpu, 3, "inject: machine check mcic 0x%llx",
irq->u.mchk.mcic);
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_MCHK, 0,
@ -1457,6 +1468,7 @@ static int __inject_ckc(struct kvm_vcpu *vcpu)
{
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
vcpu->stat.inject_ckc++;
VCPU_EVENT(vcpu, 3, "%s", "inject: clock comparator external");
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_CLOCK_COMP,
0, 0);
@ -1470,6 +1482,7 @@ static int __inject_cpu_timer(struct kvm_vcpu *vcpu)
{
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
vcpu->stat.inject_cputm++;
VCPU_EVENT(vcpu, 3, "%s", "inject: cpu timer external");
trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_CPU_TIMER,
0, 0);
@ -1596,6 +1609,7 @@ static int __inject_service(struct kvm *kvm,
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
kvm->stat.inject_service_signal++;
spin_lock(&fi->lock);
fi->srv_signal.ext_params |= inti->ext.ext_params & SCCB_EVENT_PENDING;
/*
@ -1621,6 +1635,7 @@ static int __inject_virtio(struct kvm *kvm,
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
kvm->stat.inject_virtio++;
spin_lock(&fi->lock);
if (fi->counters[FIRQ_CNTR_VIRTIO] >= KVM_S390_MAX_VIRTIO_IRQS) {
spin_unlock(&fi->lock);
@ -1638,6 +1653,7 @@ static int __inject_pfault_done(struct kvm *kvm,
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
kvm->stat.inject_pfault_done++;
spin_lock(&fi->lock);
if (fi->counters[FIRQ_CNTR_PFAULT] >=
(ASYNC_PF_PER_VCPU * KVM_MAX_VCPUS)) {
@ -1657,6 +1673,7 @@ static int __inject_float_mchk(struct kvm *kvm,
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
kvm->stat.inject_float_mchk++;
spin_lock(&fi->lock);
fi->mchk.cr14 |= inti->mchk.cr14 & (1UL << CR_PENDING_SUBCLASS);
fi->mchk.mcic |= inti->mchk.mcic;
@ -1672,6 +1689,7 @@ static int __inject_io(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
struct list_head *list;
int isc;
kvm->stat.inject_io++;
isc = int_word_to_isc(inti->io.io_int_word);
if (kvm->arch.gisa && inti->type & KVM_S390_INT_IO_AI_MASK) {

View File

@ -57,6 +57,7 @@
(KVM_MAX_VCPUS + LOCAL_IRQS))
#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "userspace_handled", VCPU_STAT(exit_userspace) },
@ -64,6 +65,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "exit_validity", VCPU_STAT(exit_validity) },
{ "exit_stop_request", VCPU_STAT(exit_stop_request) },
{ "exit_external_request", VCPU_STAT(exit_external_request) },
{ "exit_io_request", VCPU_STAT(exit_io_request) },
{ "exit_external_interrupt", VCPU_STAT(exit_external_interrupt) },
{ "exit_instruction", VCPU_STAT(exit_instruction) },
{ "exit_pei", VCPU_STAT(exit_pei) },
@ -78,16 +80,34 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "instruction_lctl", VCPU_STAT(instruction_lctl) },
{ "instruction_stctl", VCPU_STAT(instruction_stctl) },
{ "instruction_stctg", VCPU_STAT(instruction_stctg) },
{ "deliver_ckc", VCPU_STAT(deliver_ckc) },
{ "deliver_cputm", VCPU_STAT(deliver_cputm) },
{ "deliver_emergency_signal", VCPU_STAT(deliver_emergency_signal) },
{ "deliver_external_call", VCPU_STAT(deliver_external_call) },
{ "deliver_service_signal", VCPU_STAT(deliver_service_signal) },
{ "deliver_virtio_interrupt", VCPU_STAT(deliver_virtio_interrupt) },
{ "deliver_virtio", VCPU_STAT(deliver_virtio) },
{ "deliver_stop_signal", VCPU_STAT(deliver_stop_signal) },
{ "deliver_prefix_signal", VCPU_STAT(deliver_prefix_signal) },
{ "deliver_restart_signal", VCPU_STAT(deliver_restart_signal) },
{ "deliver_program_interruption", VCPU_STAT(deliver_program_int) },
{ "deliver_io_interrupt", VCPU_STAT(deliver_io_int) },
{ "deliver_program", VCPU_STAT(deliver_program) },
{ "deliver_io", VCPU_STAT(deliver_io) },
{ "deliver_machine_check", VCPU_STAT(deliver_machine_check) },
{ "exit_wait_state", VCPU_STAT(exit_wait_state) },
{ "inject_ckc", VCPU_STAT(inject_ckc) },
{ "inject_cputm", VCPU_STAT(inject_cputm) },
{ "inject_external_call", VCPU_STAT(inject_external_call) },
{ "inject_float_mchk", VM_STAT(inject_float_mchk) },
{ "inject_emergency_signal", VCPU_STAT(inject_emergency_signal) },
{ "inject_io", VM_STAT(inject_io) },
{ "inject_mchk", VCPU_STAT(inject_mchk) },
{ "inject_pfault_done", VM_STAT(inject_pfault_done) },
{ "inject_program", VCPU_STAT(inject_program) },
{ "inject_restart", VCPU_STAT(inject_restart) },
{ "inject_service_signal", VM_STAT(inject_service_signal) },
{ "inject_set_prefix", VCPU_STAT(inject_set_prefix) },
{ "inject_stop_signal", VCPU_STAT(inject_stop_signal) },
{ "inject_pfault_init", VCPU_STAT(inject_pfault_init) },
{ "inject_virtio", VM_STAT(inject_virtio) },
{ "instruction_epsw", VCPU_STAT(instruction_epsw) },
{ "instruction_gs", VCPU_STAT(instruction_gs) },
{ "instruction_io_other", VCPU_STAT(instruction_io_other) },
@ -152,13 +172,33 @@ static int nested;
module_param(nested, int, S_IRUGO);
MODULE_PARM_DESC(nested, "Nested virtualization support");
/* upper facilities limit for kvm */
unsigned long kvm_s390_fac_list_mask[16] = { FACILITIES_KVM };
unsigned long kvm_s390_fac_list_mask_size(void)
/*
* For now we handle at most 16 double words as this is what the s390 base
* kernel handles and stores in the prefix page. If we ever need to go beyond
* this, this requires changes to code, but the external uapi can stay.
*/
#define SIZE_INTERNAL 16
/*
* Base feature mask that defines default mask for facilities. Consists of the
* defines in FACILITIES_KVM and the non-hypervisor managed bits.
*/
static unsigned long kvm_s390_fac_base[SIZE_INTERNAL] = { FACILITIES_KVM };
/*
* Extended feature mask. Consists of the defines in FACILITIES_KVM_CPUMODEL
* and defines the facilities that can be enabled via a cpu model.
*/
static unsigned long kvm_s390_fac_ext[SIZE_INTERNAL] = { FACILITIES_KVM_CPUMODEL };
static unsigned long kvm_s390_fac_size(void)
{
BUILD_BUG_ON(ARRAY_SIZE(kvm_s390_fac_list_mask) > S390_ARCH_FAC_MASK_SIZE_U64);
return ARRAY_SIZE(kvm_s390_fac_list_mask);
BUILD_BUG_ON(SIZE_INTERNAL > S390_ARCH_FAC_MASK_SIZE_U64);
BUILD_BUG_ON(SIZE_INTERNAL > S390_ARCH_FAC_LIST_SIZE_U64);
BUILD_BUG_ON(SIZE_INTERNAL * sizeof(unsigned long) >
sizeof(S390_lowcore.stfle_fac_list));
return SIZE_INTERNAL;
}
/* available cpu features supported by kvm */
@ -679,6 +719,8 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
mutex_lock(&kvm->lock);
if (!kvm->created_vcpus) {
kvm->arch.use_cmma = 1;
/* Not compatible with cmma. */
kvm->arch.use_pfmfi = 0;
ret = 0;
}
mutex_unlock(&kvm->lock);
@ -1583,7 +1625,7 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,
return -EINVAL;
/* CMMA is disabled or was not used, or the buffer has length zero */
bufsize = min(args->count, KVM_S390_CMMA_SIZE_MAX);
if (!bufsize || !kvm->mm->context.use_cmma) {
if (!bufsize || !kvm->mm->context.uses_cmm) {
memset(args, 0, sizeof(*args));
return 0;
}
@ -1660,7 +1702,7 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,
/*
* This function sets the CMMA attributes for the given pages. If the input
* buffer has zero length, no action is taken, otherwise the attributes are
* set and the mm->context.use_cmma flag is set.
* set and the mm->context.uses_cmm flag is set.
*/
static int kvm_s390_set_cmma_bits(struct kvm *kvm,
const struct kvm_s390_cmma_log *args)
@ -1710,9 +1752,9 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
srcu_read_unlock(&kvm->srcu, srcu_idx);
up_read(&kvm->mm->mmap_sem);
if (!kvm->mm->context.use_cmma) {
if (!kvm->mm->context.uses_cmm) {
down_write(&kvm->mm->mmap_sem);
kvm->mm->context.use_cmma = 1;
kvm->mm->context.uses_cmm = 1;
up_write(&kvm->mm->mmap_sem);
}
out:
@ -1967,20 +2009,15 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (!kvm->arch.sie_page2)
goto out_err;
/* Populate the facility mask initially. */
memcpy(kvm->arch.model.fac_mask, S390_lowcore.stfle_fac_list,
sizeof(S390_lowcore.stfle_fac_list));
for (i = 0; i < S390_ARCH_FAC_LIST_SIZE_U64; i++) {
if (i < kvm_s390_fac_list_mask_size())
kvm->arch.model.fac_mask[i] &= kvm_s390_fac_list_mask[i];
else
kvm->arch.model.fac_mask[i] = 0UL;
}
/* Populate the facility list initially. */
kvm->arch.model.fac_list = kvm->arch.sie_page2->fac_list;
memcpy(kvm->arch.model.fac_list, kvm->arch.model.fac_mask,
S390_ARCH_FAC_LIST_SIZE_BYTE);
for (i = 0; i < kvm_s390_fac_size(); i++) {
kvm->arch.model.fac_mask[i] = S390_lowcore.stfle_fac_list[i] &
(kvm_s390_fac_base[i] |
kvm_s390_fac_ext[i]);
kvm->arch.model.fac_list[i] = S390_lowcore.stfle_fac_list[i] &
kvm_s390_fac_base[i];
}
/* we are always in czam mode - even on pre z14 machines */
set_kvm_facility(kvm->arch.model.fac_mask, 138);
@ -2028,6 +2065,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm->arch.css_support = 0;
kvm->arch.use_irqchip = 0;
kvm->arch.use_pfmfi = sclp.has_pfmfi;
kvm->arch.epoch = 0;
spin_lock_init(&kvm->arch.start_stop_lock);
@ -2454,8 +2492,6 @@ int kvm_s390_vcpu_setup_cmma(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->cbrlo = get_zeroed_page(GFP_KERNEL);
if (!vcpu->arch.sie_block->cbrlo)
return -ENOMEM;
vcpu->arch.sie_block->ecb2 &= ~ECB2_PFMFI;
return 0;
}
@ -2491,7 +2527,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
if (test_kvm_facility(vcpu->kvm, 73))
vcpu->arch.sie_block->ecb |= ECB_TE;
if (test_kvm_facility(vcpu->kvm, 8) && sclp.has_pfmfi)
if (test_kvm_facility(vcpu->kvm, 8) && vcpu->kvm->arch.use_pfmfi)
vcpu->arch.sie_block->ecb2 |= ECB2_PFMFI;
if (test_kvm_facility(vcpu->kvm, 130))
vcpu->arch.sie_block->ecb2 |= ECB2_IEP;
@ -3023,7 +3059,7 @@ retry:
if (kvm_check_request(KVM_REQ_START_MIGRATION, vcpu)) {
/*
* Disable CMMA virtualization; we will emulate the ESSA
* Disable CMM virtualization; we will emulate the ESSA
* instruction manually, in order to provide additional
* functionalities needed for live migration.
*/
@ -3033,11 +3069,11 @@ retry:
if (kvm_check_request(KVM_REQ_STOP_MIGRATION, vcpu)) {
/*
* Re-enable CMMA virtualization if CMMA is available and
* was used.
* Re-enable CMM virtualization if CMMA is available and
* CMM has been used.
*/
if ((vcpu->kvm->arch.use_cmma) &&
(vcpu->kvm->mm->context.use_cmma))
(vcpu->kvm->mm->context.uses_cmm))
vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
goto retry;
}
@ -4044,7 +4080,7 @@ static int __init kvm_s390_init(void)
}
for (i = 0; i < 16; i++)
kvm_s390_fac_list_mask[i] |=
kvm_s390_fac_base[i] |=
S390_lowcore.stfle_fac_list[i] & nonhyp_mask(i);
return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);

View File

@ -294,8 +294,6 @@ void exit_sie(struct kvm_vcpu *vcpu);
void kvm_s390_sync_request(int req, struct kvm_vcpu *vcpu);
int kvm_s390_vcpu_setup_cmma(struct kvm_vcpu *vcpu);
void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu);
unsigned long kvm_s390_fac_list_mask_size(void);
extern unsigned long kvm_s390_fac_list_mask[];
void kvm_s390_set_cpu_timer(struct kvm_vcpu *vcpu, __u64 cputm);
__u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu);

View File

@ -1078,9 +1078,9 @@ static int handle_essa(struct kvm_vcpu *vcpu)
* value really needs to be written to; if the value is
* already correct, we do nothing and avoid the lock.
*/
if (vcpu->kvm->mm->context.use_cmma == 0) {
if (vcpu->kvm->mm->context.uses_cmm == 0) {
down_write(&vcpu->kvm->mm->mmap_sem);
vcpu->kvm->mm->context.use_cmma = 1;
vcpu->kvm->mm->context.uses_cmm = 1;
up_write(&vcpu->kvm->mm->mmap_sem);
}
/*

View File

@ -62,6 +62,13 @@ static struct facility_def facility_defs[] = {
}
},
{
/*
* FACILITIES_KVM contains the list of facilities that are part
* of the default facility mask and list that are passed to the
* initial CPU model. If no CPU model is used, this, together
* with the non-hypervisor managed bits, is the maximum list of
* guest facilities supported by KVM.
*/
.name = "FACILITIES_KVM",
.bits = (int[]){
0, /* N3 instructions */
@ -89,6 +96,19 @@ static struct facility_def facility_defs[] = {
-1 /* END */
}
},
{
/*
* FACILITIES_KVM_CPUMODEL contains the list of facilities
* that can be enabled by CPU model code if the host supports
* it. These facilities are not passed to the guest without
* CPU model support.
*/
.name = "FACILITIES_KVM_CPUMODEL",
.bits = (int[]){
-1 /* END */
}
},
};
static void print_facility_list(struct facility_def *def)

View File

@ -21,7 +21,7 @@
#include <asm/apic.h>
#include <asm/desc.h>
#include <asm/hypervisor.h>
#include <asm/hyperv.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <linux/version.h>
#include <linux/vmalloc.h>
@ -88,11 +88,15 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
u32 *hv_vp_index;
EXPORT_SYMBOL_GPL(hv_vp_index);
struct hv_vp_assist_page **hv_vp_assist_page;
EXPORT_SYMBOL_GPL(hv_vp_assist_page);
u32 hv_max_vp_index;
static int hv_cpu_init(unsigned int cpu)
{
u64 msr_vp_index;
struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
hv_get_vp_index(msr_vp_index);
@ -101,6 +105,22 @@ static int hv_cpu_init(unsigned int cpu)
if (msr_vp_index > hv_max_vp_index)
hv_max_vp_index = msr_vp_index;
if (!hv_vp_assist_page)
return 0;
if (!*hvp)
*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
if (*hvp) {
u64 val;
val = vmalloc_to_pfn(*hvp);
val = (val << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) |
HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
}
return 0;
}
@ -198,6 +218,9 @@ static int hv_cpu_die(unsigned int cpu)
struct hv_reenlightenment_control re_ctrl;
unsigned int new_cpu;
if (hv_vp_assist_page && hv_vp_assist_page[cpu])
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
if (hv_reenlightenment_cb == NULL)
return 0;
@ -224,6 +247,7 @@ void hyperv_init(void)
{
u64 guest_id, required_msrs;
union hv_x64_msr_hypercall_contents hypercall_msr;
int cpuhp;
if (x86_hyper_type != X86_HYPER_MS_HYPERV)
return;
@ -241,9 +265,17 @@ void hyperv_init(void)
if (!hv_vp_index)
return;
if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
hv_cpu_init, hv_cpu_die) < 0)
hv_vp_assist_page = kcalloc(num_possible_cpus(),
sizeof(*hv_vp_assist_page), GFP_KERNEL);
if (!hv_vp_assist_page) {
ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
goto free_vp_index;
}
cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
hv_cpu_init, hv_cpu_die);
if (cpuhp < 0)
goto free_vp_assist_page;
/*
* Setup the hypercall page and enable hypercalls.
@ -256,7 +288,7 @@ void hyperv_init(void)
hv_hypercall_pg = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
if (hv_hypercall_pg == NULL) {
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
goto free_vp_index;
goto remove_cpuhp_state;
}
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@ -304,6 +336,11 @@ register_msr_cs:
return;
remove_cpuhp_state:
cpuhp_remove_state(cpuhp);
free_vp_assist_page:
kfree(hv_vp_assist_page);
hv_vp_assist_page = NULL;
free_vp_index:
kfree(hv_vp_index);
hv_vp_index = NULL;

View File

@ -1,6 +1,13 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _ASM_X86_HYPERV_H
#define _ASM_X86_HYPERV_H
/*
* This file contains definitions from Hyper-V Hypervisor Top-Level Functional
* Specification (TLFS):
* https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
*/
#ifndef _ASM_X86_HYPERV_TLFS_H
#define _ASM_X86_HYPERV_TLFS_H
#include <linux/types.h>
@ -14,6 +21,7 @@
#define HYPERV_CPUID_FEATURES 0x40000003
#define HYPERV_CPUID_ENLIGHTMENT_INFO 0x40000004
#define HYPERV_CPUID_IMPLEMENT_LIMITS 0x40000005
#define HYPERV_CPUID_NESTED_FEATURES 0x4000000A
#define HYPERV_HYPERVISOR_PRESENT_BIT 0x80000000
#define HYPERV_CPUID_MIN 0x40000005
@ -159,6 +167,9 @@
/* Recommend using the newer ExProcessorMasks interface */
#define HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED (1 << 11)
/* Recommend using enlightened VMCS */
#define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED (1 << 14)
/*
* Crash notification flag.
*/
@ -192,7 +203,7 @@
#define HV_X64_MSR_EOI 0x40000070
#define HV_X64_MSR_ICR 0x40000071
#define HV_X64_MSR_TPR 0x40000072
#define HV_X64_MSR_APIC_ASSIST_PAGE 0x40000073
#define HV_X64_MSR_VP_ASSIST_PAGE 0x40000073
/* Define synthetic interrupt controller model specific registers. */
#define HV_X64_MSR_SCONTROL 0x40000080
@ -240,6 +251,55 @@
#define HV_X64_MSR_CRASH_PARAMS \
(1 + (HV_X64_MSR_CRASH_P4 - HV_X64_MSR_CRASH_P0))
/*
* Declare the MSR used to setup pages used to communicate with the hypervisor.
*/
union hv_x64_msr_hypercall_contents {
u64 as_uint64;
struct {
u64 enable:1;
u64 reserved:11;
u64 guest_physical_address:52;
};
};
/*
* TSC page layout.
*/
struct ms_hyperv_tsc_page {
volatile u32 tsc_sequence;
u32 reserved1;
volatile u64 tsc_scale;
volatile s64 tsc_offset;
u64 reserved2[509];
};
/*
* The guest OS needs to register the guest ID with the hypervisor.
* The guest ID is a 64 bit entity and the structure of this ID is
* specified in the Hyper-V specification:
*
* msdn.microsoft.com/en-us/library/windows/hardware/ff542653%28v=vs.85%29.aspx
*
* While the current guideline does not specify how Linux guest ID(s)
* need to be generated, our plan is to publish the guidelines for
* Linux and other guest operating systems that currently are hosted
* on Hyper-V. The implementation here conforms to this yet
* unpublished guidelines.
*
*
* Bit(s)
* 63 - Indicates if the OS is Open Source or not; 1 is Open Source
* 62:56 - Os Type; Linux is 0x100
* 55:48 - Distro specific identification
* 47:16 - Linux kernel version number
* 15:0 - Distro specific identification
*
*
*/
#define HV_LINUX_VENDOR_ID 0x8100
/* TSC emulation after migration */
#define HV_X64_MSR_REENLIGHTENMENT_CONTROL 0x40000106
@ -278,10 +338,13 @@ struct hv_tsc_emulation_status {
#define HVCALL_POST_MESSAGE 0x005c
#define HVCALL_SIGNAL_EVENT 0x005d
#define HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE 0x00000001
#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT 12
#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \
(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE 0x00000001
#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT 12
#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK \
(~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
/* Hyper-V Enlightened VMCS version mask in nested features CPUID */
#define HV_X64_ENLIGHTENED_VMCS_VERSION 0xff
#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x00000001
#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
@ -301,12 +364,22 @@ enum HV_GENERIC_SET_FORMAT {
HV_GENERIC_SET_ALL,
};
#define HV_HYPERCALL_RESULT_MASK GENMASK_ULL(15, 0)
#define HV_HYPERCALL_FAST_BIT BIT(16)
#define HV_HYPERCALL_VARHEAD_OFFSET 17
#define HV_HYPERCALL_REP_COMP_OFFSET 32
#define HV_HYPERCALL_REP_COMP_MASK GENMASK_ULL(43, 32)
#define HV_HYPERCALL_REP_START_OFFSET 48
#define HV_HYPERCALL_REP_START_MASK GENMASK_ULL(59, 48)
/* hypercall status code */
#define HV_STATUS_SUCCESS 0
#define HV_STATUS_INVALID_HYPERCALL_CODE 2
#define HV_STATUS_INVALID_HYPERCALL_INPUT 3
#define HV_STATUS_INVALID_ALIGNMENT 4
#define HV_STATUS_INVALID_PARAMETER 5
#define HV_STATUS_INSUFFICIENT_MEMORY 11
#define HV_STATUS_INVALID_PORT_ID 17
#define HV_STATUS_INVALID_CONNECTION_ID 18
#define HV_STATUS_INSUFFICIENT_BUFFERS 19
@ -321,6 +394,8 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
#define HV_SYNIC_SINT_COUNT (16)
/* Define the expected SynIC version. */
#define HV_SYNIC_VERSION_1 (0x1)
/* Valid SynIC vectors are 16-255. */
#define HV_SYNIC_FIRST_VALID_VECTOR (16)
#define HV_SYNIC_CONTROL_ENABLE (1ULL << 0)
#define HV_SYNIC_SIMP_ENABLE (1ULL << 0)
@ -415,6 +490,216 @@ struct hv_timer_message_payload {
__u64 delivery_time; /* When the message was delivered */
};
/* Define virtual processor assist page structure. */
struct hv_vp_assist_page {
__u32 apic_assist;
__u32 reserved;
__u64 vtl_control[2];
__u64 nested_enlightenments_control[2];
__u32 enlighten_vmentry;
__u64 current_nested_vmcs;
};
struct hv_enlightened_vmcs {
u32 revision_id;
u32 abort;
u16 host_es_selector;
u16 host_cs_selector;
u16 host_ss_selector;
u16 host_ds_selector;
u16 host_fs_selector;
u16 host_gs_selector;
u16 host_tr_selector;
u64 host_ia32_pat;
u64 host_ia32_efer;
u64 host_cr0;
u64 host_cr3;
u64 host_cr4;
u64 host_ia32_sysenter_esp;
u64 host_ia32_sysenter_eip;
u64 host_rip;
u32 host_ia32_sysenter_cs;
u32 pin_based_vm_exec_control;
u32 vm_exit_controls;
u32 secondary_vm_exec_control;
u64 io_bitmap_a;
u64 io_bitmap_b;
u64 msr_bitmap;
u16 guest_es_selector;
u16 guest_cs_selector;
u16 guest_ss_selector;
u16 guest_ds_selector;
u16 guest_fs_selector;
u16 guest_gs_selector;
u16 guest_ldtr_selector;
u16 guest_tr_selector;
u32 guest_es_limit;
u32 guest_cs_limit;
u32 guest_ss_limit;
u32 guest_ds_limit;
u32 guest_fs_limit;
u32 guest_gs_limit;
u32 guest_ldtr_limit;
u32 guest_tr_limit;
u32 guest_gdtr_limit;
u32 guest_idtr_limit;
u32 guest_es_ar_bytes;
u32 guest_cs_ar_bytes;
u32 guest_ss_ar_bytes;
u32 guest_ds_ar_bytes;
u32 guest_fs_ar_bytes;
u32 guest_gs_ar_bytes;
u32 guest_ldtr_ar_bytes;
u32 guest_tr_ar_bytes;
u64 guest_es_base;
u64 guest_cs_base;
u64 guest_ss_base;
u64 guest_ds_base;
u64 guest_fs_base;
u64 guest_gs_base;
u64 guest_ldtr_base;
u64 guest_tr_base;
u64 guest_gdtr_base;
u64 guest_idtr_base;
u64 padding64_1[3];
u64 vm_exit_msr_store_addr;
u64 vm_exit_msr_load_addr;
u64 vm_entry_msr_load_addr;
u64 cr3_target_value0;
u64 cr3_target_value1;
u64 cr3_target_value2;
u64 cr3_target_value3;
u32 page_fault_error_code_mask;
u32 page_fault_error_code_match;
u32 cr3_target_count;
u32 vm_exit_msr_store_count;
u32 vm_exit_msr_load_count;
u32 vm_entry_msr_load_count;
u64 tsc_offset;
u64 virtual_apic_page_addr;
u64 vmcs_link_pointer;
u64 guest_ia32_debugctl;
u64 guest_ia32_pat;
u64 guest_ia32_efer;
u64 guest_pdptr0;
u64 guest_pdptr1;
u64 guest_pdptr2;
u64 guest_pdptr3;
u64 guest_pending_dbg_exceptions;
u64 guest_sysenter_esp;
u64 guest_sysenter_eip;
u32 guest_activity_state;
u32 guest_sysenter_cs;
u64 cr0_guest_host_mask;
u64 cr4_guest_host_mask;
u64 cr0_read_shadow;
u64 cr4_read_shadow;
u64 guest_cr0;
u64 guest_cr3;
u64 guest_cr4;
u64 guest_dr7;
u64 host_fs_base;
u64 host_gs_base;
u64 host_tr_base;
u64 host_gdtr_base;
u64 host_idtr_base;
u64 host_rsp;
u64 ept_pointer;
u16 virtual_processor_id;
u16 padding16[3];
u64 padding64_2[5];
u64 guest_physical_address;
u32 vm_instruction_error;
u32 vm_exit_reason;
u32 vm_exit_intr_info;
u32 vm_exit_intr_error_code;
u32 idt_vectoring_info_field;
u32 idt_vectoring_error_code;
u32 vm_exit_instruction_len;
u32 vmx_instruction_info;
u64 exit_qualification;
u64 exit_io_instruction_ecx;
u64 exit_io_instruction_esi;
u64 exit_io_instruction_edi;
u64 exit_io_instruction_eip;
u64 guest_linear_address;
u64 guest_rsp;
u64 guest_rflags;
u32 guest_interruptibility_info;
u32 cpu_based_vm_exec_control;
u32 exception_bitmap;
u32 vm_entry_controls;
u32 vm_entry_intr_info_field;
u32 vm_entry_exception_error_code;
u32 vm_entry_instruction_len;
u32 tpr_threshold;
u64 guest_rip;
u32 hv_clean_fields;
u32 hv_padding_32;
u32 hv_synthetic_controls;
u32 hv_enlightenments_control;
u32 hv_vp_id;
u64 hv_vm_id;
u64 partition_assist_page;
u64 padding64_4[4];
u64 guest_bndcfgs;
u64 padding64_5[7];
u64 xss_exit_bitmap;
u64 padding64_6[7];
};
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE 0
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_IO_BITMAP BIT(0)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP BIT(1)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2 BIT(2)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP1 BIT(3)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_PROC BIT(4)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_EVENT BIT(5)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_ENTRY BIT(6)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_EXCPN BIT(7)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR BIT(8)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_XLAT BIT(9)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC BIT(10)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1 BIT(11)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2 BIT(12)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_POINTER BIT(13)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1 BIT(14)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL BIT(15)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL 0xFFFF
#define HV_STIMER_ENABLE (1ULL << 0)
#define HV_STIMER_PERIODIC (1ULL << 1)
#define HV_STIMER_LAZY (1ULL << 2)

View File

@ -34,6 +34,7 @@
#include <asm/msr-index.h>
#include <asm/asm.h>
#include <asm/kvm_page_track.h>
#include <asm/hyperv-tlfs.h>
#define KVM_MAX_VCPUS 288
#define KVM_SOFT_MAX_VCPUS 240
@ -73,6 +74,7 @@
#define KVM_REQ_HV_RESET KVM_ARCH_REQ(20)
#define KVM_REQ_HV_EXIT KVM_ARCH_REQ(21)
#define KVM_REQ_HV_STIMER KVM_ARCH_REQ(22)
#define KVM_REQ_LOAD_EOI_EXITMAP KVM_ARCH_REQ(23)
#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@ -498,6 +500,7 @@ struct kvm_vcpu_arch {
u64 apic_base;
struct kvm_lapic *apic; /* kernel irqchip context */
bool apicv_active;
bool load_eoi_exitmap_pending;
DECLARE_BITMAP(ioapic_handled_vectors, 256);
unsigned long apic_attention;
int32_t apic_arb_prio;
@ -571,7 +574,7 @@ struct kvm_vcpu_arch {
} exception;
struct kvm_queued_interrupt {
bool pending;
bool injected;
bool soft;
u8 nr;
} interrupt;
@ -754,6 +757,12 @@ struct kvm_hv {
u64 hv_crash_ctl;
HV_REFERENCE_TSC_PAGE tsc_ref;
struct idr conn_to_evt;
u64 hv_reenlightenment_control;
u64 hv_tsc_emulation_control;
u64 hv_tsc_emulation_status;
};
enum kvm_irqchip_mode {
@ -762,15 +771,6 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
};
struct kvm_sev_info {
bool active; /* SEV enabled guest */
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
unsigned long pages_locked; /* Number of pages locked */
struct list_head regions_list; /* List of registered regions */
};
struct kvm_arch {
unsigned int n_used_mmu_pages;
unsigned int n_requested_mmu_pages;
@ -800,13 +800,13 @@ struct kvm_arch {
struct mutex apic_map_lock;
struct kvm_apic_map *apic_map;
unsigned int tss_addr;
bool apic_access_page_done;
gpa_t wall_clock;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
bool mwait_in_guest;
bool hlt_in_guest;
bool pause_in_guest;
unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
@ -849,17 +849,8 @@ struct kvm_arch {
bool disabled_lapic_found;
/* Struct members for AVIC */
u32 avic_vm_id;
u32 ldr_mode;
struct page *avic_logical_id_table_page;
struct page *avic_physical_id_table_page;
struct hlist_node hnode;
bool x2apic_format;
bool x2apic_broadcast_quirk_disabled;
struct kvm_sev_info sev_info;
};
struct kvm_vm_stat {
@ -936,6 +927,8 @@ struct kvm_x86_ops {
bool (*cpu_has_high_real_mode_segbase)(void);
void (*cpuid_update)(struct kvm_vcpu *vcpu);
struct kvm *(*vm_alloc)(void);
void (*vm_free)(struct kvm *);
int (*vm_init)(struct kvm *kvm);
void (*vm_destroy)(struct kvm *kvm);
@ -1007,6 +1000,7 @@ struct kvm_x86_ops {
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr);
int (*get_tdp_level)(struct kvm_vcpu *vcpu);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
int (*get_lpage_level)(void);
@ -1109,6 +1103,17 @@ struct kvm_arch_async_pf {
extern struct kvm_x86_ops *kvm_x86_ops;
#define __KVM_HAVE_ARCH_VM_ALLOC
static inline struct kvm *kvm_arch_alloc_vm(void)
{
return kvm_x86_ops->vm_alloc();
}
static inline void kvm_arch_free_vm(struct kvm *kvm)
{
return kvm_x86_ops->vm_free(kvm);
}
int kvm_mmu_module_init(void);
void kvm_mmu_module_exit(void);
@ -1187,6 +1192,8 @@ enum emulation_result {
#define EMULTYPE_SKIP (1 << 2)
#define EMULTYPE_RETRY (1 << 3)
#define EMULTYPE_NO_REEXECUTE (1 << 4)
#define EMULTYPE_NO_UD_ON_FAIL (1 << 5)
#define EMULTYPE_VMWARE (1 << 6)
int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
int emulation_type, void *insn, int insn_len);
@ -1204,8 +1211,7 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
struct x86_emulate_ctxt;
int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, unsigned short port);
int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in);
int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
int kvm_emulate_halt(struct kvm_vcpu *vcpu);
int kvm_vcpu_halt(struct kvm_vcpu *vcpu);

View File

@ -88,6 +88,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
#ifdef CONFIG_KVM_GUEST
bool kvm_para_available(void);
unsigned int kvm_arch_para_features(void);
unsigned int kvm_arch_para_hints(void);
void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
void kvm_async_pf_task_wake(u32 token);
u32 kvm_read_and_reset_pf_reason(void);
@ -115,6 +116,11 @@ static inline unsigned int kvm_arch_para_features(void)
return 0;
}
static inline unsigned int kvm_arch_para_hints(void)
{
return 0;
}
static inline u32 kvm_read_and_reset_pf_reason(void)
{
return 0;

View File

@ -6,90 +6,23 @@
#include <linux/atomic.h>
#include <linux/nmi.h>
#include <asm/io.h>
#include <asm/hyperv.h>
#include <asm/hyperv-tlfs.h>
#include <asm/nospec-branch.h>
/*
* The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
* is set by CPUID(HVCPUID_VERSION_FEATURES).
*/
enum hv_cpuid_function {
HVCPUID_VERSION_FEATURES = 0x00000001,
HVCPUID_VENDOR_MAXFUNCTION = 0x40000000,
HVCPUID_INTERFACE = 0x40000001,
/*
* The remaining functions depend on the value of
* HVCPUID_INTERFACE
*/
HVCPUID_VERSION = 0x40000002,
HVCPUID_FEATURES = 0x40000003,
HVCPUID_ENLIGHTENMENT_INFO = 0x40000004,
HVCPUID_IMPLEMENTATION_LIMITS = 0x40000005,
};
struct ms_hyperv_info {
u32 features;
u32 misc_features;
u32 hints;
u32 nested_features;
u32 max_vp_index;
u32 max_lp_index;
};
extern struct ms_hyperv_info ms_hyperv;
/*
* Declare the MSR used to setup pages used to communicate with the hypervisor.
*/
union hv_x64_msr_hypercall_contents {
u64 as_uint64;
struct {
u64 enable:1;
u64 reserved:11;
u64 guest_physical_address:52;
};
};
/*
* TSC page layout.
*/
struct ms_hyperv_tsc_page {
volatile u32 tsc_sequence;
u32 reserved1;
volatile u64 tsc_scale;
volatile s64 tsc_offset;
u64 reserved2[509];
};
/*
* The guest OS needs to register the guest ID with the hypervisor.
* The guest ID is a 64 bit entity and the structure of this ID is
* specified in the Hyper-V specification:
*
* msdn.microsoft.com/en-us/library/windows/hardware/ff542653%28v=vs.85%29.aspx
*
* While the current guideline does not specify how Linux guest ID(s)
* need to be generated, our plan is to publish the guidelines for
* Linux and other guest operating systems that currently are hosted
* on Hyper-V. The implementation here conforms to this yet
* unpublished guidelines.
*
*
* Bit(s)
* 63 - Indicates if the OS is Open Source or not; 1 is Open Source
* 62:56 - Os Type; Linux is 0x100
* 55:48 - Distro specific identification
* 47:16 - Linux kernel version number
* 15:0 - Distro specific identification
*
*
*/
#define HV_LINUX_VENDOR_ID 0x8100
/*
* Generate the guest ID based on the guideline described above.
* Generate the guest ID.
*/
static inline __u64 generate_guest_id(__u64 d_info1, __u64 kernel_version,
@ -228,14 +161,6 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
return hv_status;
}
#define HV_HYPERCALL_RESULT_MASK GENMASK_ULL(15, 0)
#define HV_HYPERCALL_FAST_BIT BIT(16)
#define HV_HYPERCALL_VARHEAD_OFFSET 17
#define HV_HYPERCALL_REP_COMP_OFFSET 32
#define HV_HYPERCALL_REP_COMP_MASK GENMASK_ULL(43, 32)
#define HV_HYPERCALL_REP_START_OFFSET 48
#define HV_HYPERCALL_REP_START_MASK GENMASK_ULL(59, 48)
/* Fast hypercall with 8 bytes of input and no output */
static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
{
@ -307,6 +232,15 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
*/
extern u32 *hv_vp_index;
extern u32 hv_max_vp_index;
extern struct hv_vp_assist_page **hv_vp_assist_page;
static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
{
if (!hv_vp_assist_page)
return NULL;
return hv_vp_assist_page[cpu];
}
/**
* hv_cpu_number_to_vp_number() - Map CPU to VP.
@ -343,6 +277,10 @@ static inline void hyperv_setup_mmu_ops(void) {}
static inline void set_hv_tscchange_cb(void (*cb)(void)) {}
static inline void clear_hv_tscchange_cb(void) {}
static inline void hyperv_stop_tsc_emulation(void) {};
static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
{
return NULL;
}
#endif /* CONFIG_HYPERV */
#ifdef CONFIG_HYPERV_TSCPAGE

View File

@ -353,7 +353,21 @@
/* Fam 15h MSRs */
#define MSR_F15H_PERF_CTL 0xc0010200
#define MSR_F15H_PERF_CTL0 MSR_F15H_PERF_CTL
#define MSR_F15H_PERF_CTL1 (MSR_F15H_PERF_CTL + 2)
#define MSR_F15H_PERF_CTL2 (MSR_F15H_PERF_CTL + 4)
#define MSR_F15H_PERF_CTL3 (MSR_F15H_PERF_CTL + 6)
#define MSR_F15H_PERF_CTL4 (MSR_F15H_PERF_CTL + 8)
#define MSR_F15H_PERF_CTL5 (MSR_F15H_PERF_CTL + 10)
#define MSR_F15H_PERF_CTR 0xc0010201
#define MSR_F15H_PERF_CTR0 MSR_F15H_PERF_CTR
#define MSR_F15H_PERF_CTR1 (MSR_F15H_PERF_CTR + 2)
#define MSR_F15H_PERF_CTR2 (MSR_F15H_PERF_CTR + 4)
#define MSR_F15H_PERF_CTR3 (MSR_F15H_PERF_CTR + 6)
#define MSR_F15H_PERF_CTR4 (MSR_F15H_PERF_CTR + 8)
#define MSR_F15H_PERF_CTR5 (MSR_F15H_PERF_CTR + 10)
#define MSR_F15H_NB_PERF_CTL 0xc0010240
#define MSR_F15H_NB_PERF_CTR 0xc0010241
#define MSR_F15H_PTSC 0xc0010280

View File

@ -407,9 +407,19 @@ union irq_stack_union {
DECLARE_PER_CPU_FIRST(union irq_stack_union, irq_stack_union) __visible;
DECLARE_INIT_PER_CPU(irq_stack_union);
static inline unsigned long cpu_kernelmode_gs_base(int cpu)
{
return (unsigned long)per_cpu(irq_stack_union.gs_base, cpu);
}
DECLARE_PER_CPU(char *, irq_stack_ptr);
DECLARE_PER_CPU(unsigned int, irq_count);
extern asmlinkage void ignore_sysret(void);
#if IS_ENABLED(CONFIG_KVM)
/* Save actual FS/GS selectors and bases to current->thread */
void save_fsgs_for_kvm(void);
#endif
#else /* X86_64 */
#ifdef CONFIG_CC_STACKPROTECTOR
/*

View File

@ -60,7 +60,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u32 intercept_dr;
u32 intercept_exceptions;
u64 intercept;
u8 reserved_1[42];
u8 reserved_1[40];
u16 pause_filter_thresh;
u16 pause_filter_count;
u64 iopm_base_pa;
u64 msrpm_base_pa;

View File

@ -354,8 +354,25 @@ struct kvm_xcrs {
__u64 padding[16];
};
/* definition of registers in kvm_run */
#define KVM_SYNC_X86_REGS (1UL << 0)
#define KVM_SYNC_X86_SREGS (1UL << 1)
#define KVM_SYNC_X86_EVENTS (1UL << 2)
#define KVM_SYNC_X86_VALID_FIELDS \
(KVM_SYNC_X86_REGS| \
KVM_SYNC_X86_SREGS| \
KVM_SYNC_X86_EVENTS)
/* kvm_sync_regs struct included by kvm_run struct */
struct kvm_sync_regs {
/* Members of this structure are potentially malicious.
* Care must be taken by code reading, esp. interpreting,
* data fields from them inside KVM to prevent TOCTOU and
* double-fetch types of vulnerabilities.
*/
struct kvm_regs regs;
struct kvm_sregs sregs;
struct kvm_vcpu_events events;
};
#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0)

View File

@ -3,15 +3,16 @@
#define _UAPI_ASM_X86_KVM_PARA_H
#include <linux/types.h>
#include <asm/hyperv.h>
/* This CPUID returns the signature 'KVMKVMKVM' in ebx, ecx, and edx. It
* should be used to determine that a VM is running under KVM.
*/
#define KVM_CPUID_SIGNATURE 0x40000000
/* This CPUID returns a feature bitmap in eax. Before enabling a particular
* paravirtualization, the appropriate feature bit should be checked.
/* This CPUID returns two feature bitmaps in eax, edx. Before enabling
* a particular paravirtualization, the appropriate feature bit should
* be checked in eax. The performance hint feature bit should be checked
* in edx.
*/
#define KVM_CPUID_FEATURES 0x40000001
#define KVM_FEATURE_CLOCKSOURCE 0
@ -28,6 +29,8 @@
#define KVM_FEATURE_PV_TLB_FLUSH 9
#define KVM_FEATURE_ASYNC_PF_VMEXIT 10
#define KVM_HINTS_DEDICATED 0
/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
*/

View File

@ -487,7 +487,7 @@ void load_percpu_segment(int cpu)
loadsegment(fs, __KERNEL_PERCPU);
#else
__loadsegment_simple(gs, 0);
wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
#endif
load_stack_canary_segment();
}
@ -1398,6 +1398,7 @@ __setup("clearcpuid=", setup_clearcpuid);
#ifdef CONFIG_X86_64
DEFINE_PER_CPU_FIRST(union irq_stack_union,
irq_stack_union) __aligned(PAGE_SIZE) __visible;
EXPORT_PER_CPU_SYMBOL_GPL(irq_stack_union);
/*
* The following percpu variables are hot. Align current_task to

View File

@ -22,7 +22,7 @@
#include <linux/kexec.h>
#include <asm/processor.h>
#include <asm/hypervisor.h>
#include <asm/hyperv.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <asm/desc.h>
#include <asm/irq_regs.h>
@ -216,8 +216,8 @@ static void __init ms_hyperv_init_platform(void)
pr_info("Hyper-V: features 0x%x, hints 0x%x\n",
ms_hyperv.features, ms_hyperv.hints);
ms_hyperv.max_vp_index = cpuid_eax(HVCPUID_IMPLEMENTATION_LIMITS);
ms_hyperv.max_lp_index = cpuid_ebx(HVCPUID_IMPLEMENTATION_LIMITS);
ms_hyperv.max_vp_index = cpuid_eax(HYPERV_CPUID_IMPLEMENT_LIMITS);
ms_hyperv.max_lp_index = cpuid_ebx(HYPERV_CPUID_IMPLEMENT_LIMITS);
pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
@ -225,11 +225,12 @@ static void __init ms_hyperv_init_platform(void)
/*
* Extract host information.
*/
if (cpuid_eax(HVCPUID_VENDOR_MAXFUNCTION) >= HVCPUID_VERSION) {
hv_host_info_eax = cpuid_eax(HVCPUID_VERSION);
hv_host_info_ebx = cpuid_ebx(HVCPUID_VERSION);
hv_host_info_ecx = cpuid_ecx(HVCPUID_VERSION);
hv_host_info_edx = cpuid_edx(HVCPUID_VERSION);
if (cpuid_eax(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS) >=
HYPERV_CPUID_VERSION) {
hv_host_info_eax = cpuid_eax(HYPERV_CPUID_VERSION);
hv_host_info_ebx = cpuid_ebx(HYPERV_CPUID_VERSION);
hv_host_info_ecx = cpuid_ecx(HYPERV_CPUID_VERSION);
hv_host_info_edx = cpuid_edx(HYPERV_CPUID_VERSION);
pr_info("Hyper-V Host Build:%d-%d.%d-%d-%d.%d\n",
hv_host_info_eax, hv_host_info_ebx >> 16,
@ -243,6 +244,11 @@ static void __init ms_hyperv_init_platform(void)
x86_platform.calibrate_cpu = hv_get_tsc_khz;
}
if (ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED) {
ms_hyperv.nested_features =
cpuid_eax(HYPERV_CPUID_NESTED_FEATURES);
}
#ifdef CONFIG_X86_LOCAL_APIC
if (ms_hyperv.features & HV_X64_ACCESS_FREQUENCY_MSRS &&
ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {

View File

@ -454,6 +454,13 @@ static void __init sev_map_percpu_data(void)
}
#ifdef CONFIG_SMP
static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
{
native_smp_prepare_cpus(max_cpus);
if (kvm_para_has_hint(KVM_HINTS_DEDICATED))
static_branch_disable(&virt_spin_lock_key);
}
static void __init kvm_smp_prepare_boot_cpu(void)
{
/*
@ -546,6 +553,7 @@ static void __init kvm_guest_init(void)
}
if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
!kvm_para_has_hint(KVM_HINTS_DEDICATED) &&
kvm_para_has_feature(KVM_FEATURE_STEAL_TIME))
pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
@ -556,6 +564,7 @@ static void __init kvm_guest_init(void)
kvm_setup_vsyscall_timeinfo();
#ifdef CONFIG_SMP
smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/kvm:online",
kvm_cpu_online, kvm_cpu_down_prepare) < 0)
@ -605,6 +614,11 @@ unsigned int kvm_arch_para_features(void)
return cpuid_eax(kvm_cpuid_base() | KVM_CPUID_FEATURES);
}
unsigned int kvm_arch_para_hints(void)
{
return cpuid_edx(kvm_cpuid_base() | KVM_CPUID_FEATURES);
}
static uint32_t __init kvm_detect(void)
{
return kvm_cpuid_base();
@ -635,6 +649,7 @@ static __init int kvm_setup_pv_tlb_flush(void)
int cpu;
if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
!kvm_para_has_hint(KVM_HINTS_DEDICATED) &&
kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
for_each_possible_cpu(cpu) {
zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
@ -730,6 +745,9 @@ void __init kvm_spinlock_init(void)
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
if (kvm_para_has_hint(KVM_HINTS_DEDICATED))
return;
__pv_init_lock_hash();
pv_lock_ops.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);

View File

@ -205,6 +205,20 @@ static __always_inline void save_fsgs(struct task_struct *task)
save_base_legacy(task, task->thread.gsindex, GS);
}
#if IS_ENABLED(CONFIG_KVM)
/*
* While a process is running,current->thread.fsbase and current->thread.gsbase
* may not match the corresponding CPU registers (see save_base_legacy()). KVM
* wants an efficient way to save and restore FSBASE and GSBASE.
* When FSGSBASE extensions are enabled, this will have to use RD{FS,GS}BASE.
*/
void save_fsgs_for_kvm(void)
{
save_fsgs(current);
}
EXPORT_SYMBOL_GPL(save_fsgs_for_kvm);
#endif
static __always_inline void loadseg(enum which_selector which,
unsigned short sel)
{

View File

@ -135,6 +135,11 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
return -EINVAL;
}
best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
if (kvm_hlt_in_guest(vcpu->kvm) && best &&
(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
/* Update physical-address width */
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
kvm_mmu_reset_context(vcpu);
@ -370,7 +375,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM) |
F(TOPOEXT);
F(TOPOEXT) | F(PERFCTR_CORE);
/* cpuid 0x80000008.ebx */
const u32 kvm_cpuid_8000_0008_ebx_x86_features =

View File

@ -30,6 +30,7 @@
#include "x86.h"
#include "tss.h"
#include "mmu.h"
#include "pmu.h"
/*
* Operand types
@ -2887,6 +2888,9 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
return ctxt->ops->cpl(ctxt) > iopl;
}
#define VMWARE_PORT_VMPORT (0x5658)
#define VMWARE_PORT_VMRPC (0x5659)
static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
u16 port, u16 len)
{
@ -2898,6 +2902,14 @@ static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
unsigned mask = (1 << len) - 1;
unsigned long base;
/*
* VMware allows access to these ports even if denied
* by TSS I/O permission bitmap. Mimic behavior.
*/
if (enable_vmware_backdoor &&
((port == VMWARE_PORT_VMPORT) || (port == VMWARE_PORT_VMRPC)))
return true;
ops->get_segment(ctxt, &tr, &tr_seg, &base3, VCPU_SREG_TR);
if (!tr_seg.p)
return false;
@ -4282,6 +4294,13 @@ static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
u64 cr4 = ctxt->ops->get_cr(ctxt, 4);
u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
/*
* VMware allows access to these Pseduo-PMCs even when read via RDPMC
* in Ring3 when CR4.PCE=0.
*/
if (enable_vmware_backdoor && is_vmware_backdoor_pmc(rcx))
return X86EMUL_CONTINUE;
if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
ctxt->ops->check_pmc(ctxt, rcx))
return emulate_gp(ctxt, 0);
@ -4498,6 +4517,10 @@ static const struct gprefix pfx_0f_2b = {
ID(0, &instr_dual_0f_2b), ID(0, &instr_dual_0f_2b), N, N,
};
static const struct gprefix pfx_0f_10_0f_11 = {
I(Unaligned, em_mov), I(Unaligned, em_mov), N, N,
};
static const struct gprefix pfx_0f_28_0f_29 = {
I(Aligned, em_mov), I(Aligned, em_mov), N, N,
};
@ -4709,7 +4732,9 @@ static const struct opcode twobyte_table[256] = {
DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
/* 0x10 - 0x1F */
N, N, N, N, N, N, N, N,
GP(ModRM | DstReg | SrcMem | Mov | Sse, &pfx_0f_10_0f_11),
GP(ModRM | DstMem | SrcReg | Mov | Sse, &pfx_0f_10_0f_11),
N, N, N, N, N, N,
D(ImplicitOps | ModRM | SrcMem | NoAccess),
N, N, N, N, N, N, D(ImplicitOps | ModRM | SrcMem | NoAccess),
/* 0x20 - 0x2F */

View File

@ -29,6 +29,7 @@
#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <linux/sched/cputime.h>
#include <linux/eventfd.h>
#include <asm/apicdef.h>
#include <trace/events/kvm.h>
@ -74,22 +75,11 @@ static bool synic_has_vector_auto_eoi(struct kvm_vcpu_hv_synic *synic,
return false;
}
static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
u64 data, bool host)
static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
int vector)
{
int vector;
vector = data & HV_SYNIC_SINT_VECTOR_MASK;
if (vector < 16 && !host)
return 1;
/*
* Guest may configure multiple SINTs to use the same vector, so
* we maintain a bitmap of vectors handled by synic, and a
* bitmap of vectors with auto-eoi behavior. The bitmaps are
* updated here, and atomically queried on fast paths.
*/
atomic64_set(&synic->sint[sint], data);
if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
return;
if (synic_has_vector_connected(synic, vector))
__set_bit(vector, synic->vec_bitmap);
@ -100,6 +90,37 @@ static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
__set_bit(vector, synic->auto_eoi_bitmap);
else
__clear_bit(vector, synic->auto_eoi_bitmap);
}
static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
u64 data, bool host)
{
int vector, old_vector;
bool masked;
vector = data & HV_SYNIC_SINT_VECTOR_MASK;
masked = data & HV_SYNIC_SINT_MASKED;
/*
* Valid vectors are 16-255, however, nested Hyper-V attempts to write
* default '0x10000' value on boot and this should not #GP. We need to
* allow zero-initing the register from host as well.
*/
if (vector < HV_SYNIC_FIRST_VALID_VECTOR && !host && !masked)
return 1;
/*
* Guest may configure multiple SINTs to use the same vector, so
* we maintain a bitmap of vectors handled by synic, and a
* bitmap of vectors with auto-eoi behavior. The bitmaps are
* updated here, and atomically queried on fast paths.
*/
old_vector = synic_read_sint(synic, sint) & HV_SYNIC_SINT_VECTOR_MASK;
atomic64_set(&synic->sint[sint], data);
synic_update_vector(synic, old_vector);
synic_update_vector(synic, vector);
/* Load SynIC vectors into EOI exit bitmap */
kvm_make_request(KVM_REQ_SCAN_IOAPIC, synic_to_vcpu(synic));
@ -736,6 +757,9 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
case HV_X64_MSR_CRASH_CTL:
case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
case HV_X64_MSR_RESET:
case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
case HV_X64_MSR_TSC_EMULATION_CONTROL:
case HV_X64_MSR_TSC_EMULATION_STATUS:
r = true;
break;
}
@ -981,6 +1005,15 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
kvm_make_request(KVM_REQ_HV_RESET, vcpu);
}
break;
case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
hv->hv_reenlightenment_control = data;
break;
case HV_X64_MSR_TSC_EMULATION_CONTROL:
hv->hv_tsc_emulation_control = data;
break;
case HV_X64_MSR_TSC_EMULATION_STATUS:
hv->hv_tsc_emulation_status = data;
break;
default:
vcpu_unimpl(vcpu, "Hyper-V uhandled wrmsr: 0x%x data 0x%llx\n",
msr, data);
@ -1009,17 +1042,17 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
return 1;
hv->vp_index = (u32)data;
break;
case HV_X64_MSR_APIC_ASSIST_PAGE: {
case HV_X64_MSR_VP_ASSIST_PAGE: {
u64 gfn;
unsigned long addr;
if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE)) {
if (!(data & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE)) {
hv->hv_vapic = data;
if (kvm_lapic_enable_pv_eoi(vcpu, 0))
return 1;
break;
}
gfn = data >> HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT;
gfn = data >> HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT;
addr = kvm_vcpu_gfn_to_hva(vcpu, gfn);
if (kvm_is_error_hva(addr))
return 1;
@ -1105,6 +1138,15 @@ static int kvm_hv_get_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case HV_X64_MSR_RESET:
data = 0;
break;
case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
data = hv->hv_reenlightenment_control;
break;
case HV_X64_MSR_TSC_EMULATION_CONTROL:
data = hv->hv_tsc_emulation_control;
break;
case HV_X64_MSR_TSC_EMULATION_STATUS:
data = hv->hv_tsc_emulation_status;
break;
default:
vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
return 1;
@ -1129,7 +1171,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
return kvm_hv_vapic_msr_read(vcpu, APIC_ICR, pdata);
case HV_X64_MSR_TPR:
return kvm_hv_vapic_msr_read(vcpu, APIC_TASKPRI, pdata);
case HV_X64_MSR_APIC_ASSIST_PAGE:
case HV_X64_MSR_VP_ASSIST_PAGE:
data = hv->hv_vapic;
break;
case HV_X64_MSR_VP_RUNTIME:
@ -1226,10 +1268,47 @@ static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
return 1;
}
static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
{
struct eventfd_ctx *eventfd;
if (unlikely(!fast)) {
int ret;
gpa_t gpa = param;
if ((gpa & (__alignof__(param) - 1)) ||
offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
return HV_STATUS_INVALID_ALIGNMENT;
ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
if (ret < 0)
return HV_STATUS_INVALID_ALIGNMENT;
}
/*
* Per spec, bits 32-47 contain the extra "flag number". However, we
* have no use for it, and in all known usecases it is zero, so just
* report lookup failure if it isn't.
*/
if (param & 0xffff00000000ULL)
return HV_STATUS_INVALID_PORT_ID;
/* remaining bits are reserved-zero */
if (param & ~KVM_HYPERV_CONN_ID_MASK)
return HV_STATUS_INVALID_HYPERCALL_INPUT;
/* conn_to_evt is protected by vcpu->kvm->srcu */
eventfd = idr_find(&vcpu->kvm->arch.hyperv.conn_to_evt, param);
if (!eventfd)
return HV_STATUS_INVALID_PORT_ID;
eventfd_signal(eventfd, 1);
return HV_STATUS_SUCCESS;
}
int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
{
u64 param, ingpa, outgpa, ret;
uint16_t code, rep_idx, rep_cnt, res = HV_STATUS_SUCCESS, rep_done = 0;
u64 param, ingpa, outgpa, ret = HV_STATUS_SUCCESS;
uint16_t code, rep_idx, rep_cnt;
bool fast, longmode;
/*
@ -1268,7 +1347,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
/* Hypercall continuation is not supported yet */
if (rep_cnt || rep_idx) {
res = HV_STATUS_INVALID_HYPERCALL_CODE;
ret = HV_STATUS_INVALID_HYPERCALL_CODE;
goto set_result;
}
@ -1276,11 +1355,15 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
case HVCALL_NOTIFY_LONG_SPIN_WAIT:
kvm_vcpu_on_spin(vcpu, true);
break;
case HVCALL_POST_MESSAGE:
case HVCALL_SIGNAL_EVENT:
ret = kvm_hvcall_signal_event(vcpu, fast, ingpa);
if (ret != HV_STATUS_INVALID_PORT_ID)
break;
/* maybe userspace knows this conn_id: fall through */
case HVCALL_POST_MESSAGE:
/* don't bother userspace if it has no way to handle it */
if (!vcpu_to_synic(vcpu)->active) {
res = HV_STATUS_INVALID_HYPERCALL_CODE;
ret = HV_STATUS_INVALID_HYPERCALL_CODE;
break;
}
vcpu->run->exit_reason = KVM_EXIT_HYPERV;
@ -1292,12 +1375,79 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
kvm_hv_hypercall_complete_userspace;
return 0;
default:
res = HV_STATUS_INVALID_HYPERCALL_CODE;
ret = HV_STATUS_INVALID_HYPERCALL_CODE;
break;
}
set_result:
ret = res | (((u64)rep_done & 0xfff) << 32);
kvm_hv_hypercall_set_result(vcpu, ret);
return 1;
}
void kvm_hv_init_vm(struct kvm *kvm)
{
mutex_init(&kvm->arch.hyperv.hv_lock);
idr_init(&kvm->arch.hyperv.conn_to_evt);
}
void kvm_hv_destroy_vm(struct kvm *kvm)
{
struct eventfd_ctx *eventfd;
int i;
idr_for_each_entry(&kvm->arch.hyperv.conn_to_evt, eventfd, i)
eventfd_ctx_put(eventfd);
idr_destroy(&kvm->arch.hyperv.conn_to_evt);
}
static int kvm_hv_eventfd_assign(struct kvm *kvm, u32 conn_id, int fd)
{
struct kvm_hv *hv = &kvm->arch.hyperv;
struct eventfd_ctx *eventfd;
int ret;
eventfd = eventfd_ctx_fdget(fd);
if (IS_ERR(eventfd))
return PTR_ERR(eventfd);
mutex_lock(&hv->hv_lock);
ret = idr_alloc(&hv->conn_to_evt, eventfd, conn_id, conn_id + 1,
GFP_KERNEL);
mutex_unlock(&hv->hv_lock);
if (ret >= 0)
return 0;
if (ret == -ENOSPC)
ret = -EEXIST;
eventfd_ctx_put(eventfd);
return ret;
}
static int kvm_hv_eventfd_deassign(struct kvm *kvm, u32 conn_id)
{
struct kvm_hv *hv = &kvm->arch.hyperv;
struct eventfd_ctx *eventfd;
mutex_lock(&hv->hv_lock);
eventfd = idr_remove(&hv->conn_to_evt, conn_id);
mutex_unlock(&hv->hv_lock);
if (!eventfd)
return -ENOENT;
synchronize_srcu(&kvm->srcu);
eventfd_ctx_put(eventfd);
return 0;
}
int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args)
{
if ((args->flags & ~KVM_HYPERV_EVENTFD_DEASSIGN) ||
(args->conn_id & ~KVM_HYPERV_CONN_ID_MASK))
return -EINVAL;
if (args->flags == KVM_HYPERV_EVENTFD_DEASSIGN)
return kvm_hv_eventfd_deassign(kvm, args->conn_id);
return kvm_hv_eventfd_assign(kvm, args->conn_id, args->fd);
}

View File

@ -88,4 +88,8 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu);
void kvm_hv_setup_tsc_page(struct kvm *kvm,
struct pvclock_vcpu_time_info *hv_clock);
void kvm_hv_init_vm(struct kvm *kvm);
void kvm_hv_destroy_vm(struct kvm *kvm);
int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
#endif

View File

@ -73,8 +73,19 @@ static int kvm_cpu_has_extint(struct kvm_vcpu *v)
*/
int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
{
/*
* FIXME: interrupt.injected represents an interrupt that it's
* side-effects have already been applied (e.g. bit from IRR
* already moved to ISR). Therefore, it is incorrect to rely
* on interrupt.injected to know if there is a pending
* interrupt in the user-mode LAPIC.
* This leads to nVMX/nSVM not be able to distinguish
* if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on
* pending interrupt or should re-inject an injected
* interrupt.
*/
if (!lapic_in_kernel(v))
return v->arch.interrupt.pending;
return v->arch.interrupt.injected;
if (kvm_cpu_has_extint(v))
return 1;
@ -91,8 +102,19 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
*/
int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
{
/*
* FIXME: interrupt.injected represents an interrupt that it's
* side-effects have already been applied (e.g. bit from IRR
* already moved to ISR). Therefore, it is incorrect to rely
* on interrupt.injected to know if there is a pending
* interrupt in the user-mode LAPIC.
* This leads to nVMX/nSVM not be able to distinguish
* if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on
* pending interrupt or should re-inject an injected
* interrupt.
*/
if (!lapic_in_kernel(v))
return v->arch.interrupt.pending;
return v->arch.interrupt.injected;
if (kvm_cpu_has_extint(v))
return 1;

View File

@ -41,7 +41,7 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
if (!test_bit(VCPU_EXREG_PDPTR,
(unsigned long *)&vcpu->arch.regs_avail))
kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
kvm_x86_ops->cache_reg(vcpu, (enum kvm_reg)VCPU_EXREG_PDPTR);
return vcpu->arch.walk_mmu->pdptrs[index];
}
@ -93,6 +93,11 @@ static inline void enter_guest_mode(struct kvm_vcpu *vcpu)
static inline void leave_guest_mode(struct kvm_vcpu *vcpu)
{
vcpu->arch.hflags &= ~HF_GUEST_MASK;
if (vcpu->arch.load_eoi_exitmap_pending) {
vcpu->arch.load_eoi_exitmap_pending = false;
kvm_make_request(KVM_REQ_LOAD_EOI_EXITMAP, vcpu);
}
}
static inline bool is_guest_mode(struct kvm_vcpu *vcpu)

View File

@ -321,8 +321,16 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
if (!lapic_in_kernel(vcpu))
return;
/*
* KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
* which doesn't have EOI register; Some buggy OSes (e.g. Windows with
* Hyper-V role) disable EOI broadcast in lapic not checking for IOAPIC
* version first and level-triggered interrupts never get EOIed in
* IOAPIC.
*/
feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))))
if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))) &&
!ioapic_in_kernel(vcpu->kvm))
v |= APIC_LVR_DIRECTED_EOI;
kvm_lapic_set_reg(apic, APIC_LVR, v);
}

View File

@ -109,7 +109,7 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
{
return vcpu->arch.hyperv.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
return vcpu->arch.hyperv.hv_vapic & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
}
int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);

View File

@ -3031,7 +3031,7 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
return RET_PF_RETRY;
}
return RET_PF_EMULATE;
return -EFAULT;
}
static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,

Some files were not shown because too many files have changed in this diff Show More