The first batch of KVM patches, mostly covering x86, which I am sending out
early due to me travelling next week.  There is a lone mm patch for which
Andrew gave an informal ack at
https://lore.kernel.org/linux-mm/20220817102500.440c6d0a3fce296fdf91bea6@linux-foundation.org.
I will send the bulk of ARM work, as well as other architectures, at the end
of next week.

ARM:

* Account stage2 page table allocations in memory stats.

x86:

* Account EPT/NPT arm64 page table allocations in memory stats.
* Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR accesses.
* Drop eVMCS controls filtering for KVM on Hyper-V, all known versions of
  Hyper-V now support eVMCS fields associated with features that are
  enumerated to the guest.
* Use KVM's sanitized VMCS config as the basis for the values of nested VMX
  capabilities MSRs.
* A myriad event/exception fixes and cleanups.  Most notably, pending
  exceptions morph into VM-Exits earlier, as soon as the exception is queued,
  instead of waiting until the next vmentry.  This fixed a longstanding issue
  where the exceptions would incorrectly become double-faults instead of
  triggering a vmexit; the common case of page-fault vmexits had a special
  workaround, but now it's fixed for good.
* A handful of fixes for memory leaks in error paths.
* Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow.
* Never write to memory from non-sleepable kvm_vcpu_check_block()
* Selftests refinements and cleanups.
* Misc typo cleanups.

Generic:

* remove KVM_REQ_UNHALT

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The first batch of KVM patches, mostly covering x86.

  ARM:

   - Account stage2 page table allocations in memory stats

  x86:

   - Account EPT/NPT arm64 page table allocations in memory stats

   - Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR accesses

   - Drop eVMCS controls filtering for KVM on Hyper-V, all known versions of
     Hyper-V now support eVMCS fields associated with features that are
     enumerated to the guest

   - Use KVM's sanitized VMCS config as the basis for the values of nested
     VMX capabilities MSRs

   - A myriad event/exception fixes and cleanups.  Most notably, pending
     exceptions morph into VM-Exits earlier, as soon as the exception is
     queued, instead of waiting until the next vmentry.  This fixed a
     longstanding issue where the exceptions would incorrectly become
     double-faults instead of triggering a vmexit; the common case of
     page-fault vmexits had a special workaround, but now it's fixed for good

   - A handful of fixes for memory leaks in error paths

   - Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow

   - Never write to memory from non-sleepable kvm_vcpu_check_block()

   - Selftests refinements and cleanups

   - Misc typo cleanups

  Generic:

   - remove KVM_REQ_UNHALT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  KVM: remove KVM_REQ_UNHALT
  KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  KVM: x86: never write to memory from kvm_vcpu_check_block()
  KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
  KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
  KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
  KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
  KVM: x86: lapic does not have to process INIT if it is blocked
  KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
  KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
  KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
  KVM: x86: make vendor code check for all nested events
  mailmap: Update Oliver's email address
  KVM: x86: Allow force_emulation_prefix to be written without a reload
  KVM: selftests: Add an x86-only test to verify nested exception queueing
  KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
  KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
  KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
  KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
  KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
  ...
commit ef688f8b8c
@@ -336,6 +336,7 @@ Oleksij Rempel <linux@rempel-privat.de> <external.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <fixed-term.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <o.rempel@pengutronix.de>
Oleksij Rempel <linux@rempel-privat.de> <ore@pengutronix.de>
Oliver Upton <oliver.upton@linux.dev> <oupton@google.com>
Pali Rohár <pali@kernel.org> <pali.rohar@gmail.com>
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Patrick Mochel <mochel@digitalimplant.org>
@@ -1355,6 +1355,11 @@ PAGE_SIZE multiple when read back.
pagetables
Amount of memory allocated for page tables.

sec_pagetables
Amount of memory allocated for secondary page tables,
this currently includes KVM mmu allocations on x86
and arm64.

percpu (npn)
Amount of memory used for storing per-cpu kernel
data structures.
@@ -982,6 +982,7 @@ Example output. You may not have all of these fields.
SUnreclaim: 142336 kB
KernelStack: 11168 kB
PageTables: 20540 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
@@ -1090,6 +1091,9 @@ KernelStack
Memory consumed by the kernel stacks of all tasks
PageTables
Memory consumed by userspace page tables
SecPageTables
Memory consumed by secondary page tables, this currently
currently includes KVM mmu allocations on x86 and arm64.
NFS_Unstable
Always zero. Previous counted pages which had been written to
the server, but has not been committed to stable storage.
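For reference, a minimal userspace sketch of reading the SecPageTables counter
documented above; the field name and the /proc/meminfo path come from the hunk
above, while the parsing and error handling are illustrative assumptions rather
than anything from this series:

#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[256];
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
                return 1;

        /* Print the SecPageTables line, e.g. "SecPageTables:   0 kB". */
        while (fgets(line, sizeof(line), f)) {
                if (!strncmp(line, "SecPageTables:", 14)) {
                        fputs(line, stdout);
                        break;
                }
        }
        fclose(f);
        return 0;
}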
@@ -4074,7 +4074,7 @@ Queues an SMI on the thread's vcpu.
4.97 KVM_X86_SET_MSR_FILTER
----------------------------

:Capability: KVM_X86_SET_MSR_FILTER
:Capability: KVM_CAP_X86_MSR_FILTER
:Architectures: x86
:Type: vm ioctl
:Parameters: struct kvm_msr_filter
@@ -4173,8 +4173,10 @@ If an MSR access is not permitted through the filtering, it generates a
allows user space to deflect and potentially handle various MSR accesses
into user space.

If a vCPU is in running state while this ioctl is invoked, the vCPU may
experience inconsistent filtering behavior on MSR accesses.
Note, invoking this ioctl while a vCPU is running is inherently racy. However,
KVM does guarantee that vCPUs will see either the previous filter or the new
filter, e.g. MSRs with identical settings in both the old and new filter will
have deterministic behavior.

4.98 KVM_CREATE_SPAPR_TCE_64
----------------------------
@@ -5287,110 +5289,7 @@ KVM_PV_DUMP
authentication tag all of which are needed to decrypt the dump at a
later time.


4.126 KVM_X86_SET_MSR_FILTER
----------------------------

:Capability: KVM_CAP_X86_MSR_FILTER
:Architectures: x86
:Type: vm ioctl
:Parameters: struct kvm_msr_filter
:Returns: 0 on success, < 0 on error

::

struct kvm_msr_filter_range {
#define KVM_MSR_FILTER_READ  (1 << 0)
#define KVM_MSR_FILTER_WRITE (1 << 1)
__u32 flags;
__u32 nmsrs; /* number of msrs in bitmap */
__u32 base; /* MSR index the bitmap starts at */
__u8 *bitmap; /* a 1 bit allows the operations in flags, 0 denies */
};

#define KVM_MSR_FILTER_MAX_RANGES 16
struct kvm_msr_filter {
#define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0)
#define KVM_MSR_FILTER_DEFAULT_DENY (1 << 0)
__u32 flags;
struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES];
};

flags values for ``struct kvm_msr_filter_range``:

``KVM_MSR_FILTER_READ``

Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap
indicates that a read should immediately fail, while a 1 indicates that
a read for a particular MSR should be handled regardless of the default
filter action.

``KVM_MSR_FILTER_WRITE``

Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap
indicates that a write should immediately fail, while a 1 indicates that
a write for a particular MSR should be handled regardless of the default
filter action.

``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE``

Filter both read and write accesses to MSRs using the given bitmap. A 0
in the bitmap indicates that both reads and writes should immediately fail,
while a 1 indicates that reads and writes for a particular MSR are not
filtered by this range.

flags values for ``struct kvm_msr_filter``:

``KVM_MSR_FILTER_DEFAULT_ALLOW``

If no filter range matches an MSR index that is getting accessed, KVM will
fall back to allowing access to the MSR.

``KVM_MSR_FILTER_DEFAULT_DENY``

If no filter range matches an MSR index that is getting accessed, KVM will
fall back to rejecting access to the MSR. In this mode, all MSRs that should
be processed by KVM need to explicitly be marked as allowed in the bitmaps.

This ioctl allows user space to define up to 16 bitmaps of MSR ranges to
specify whether a certain MSR access should be explicitly filtered for or not.

If this ioctl has never been invoked, MSR accesses are not guarded and the
default KVM in-kernel emulation behavior is fully preserved.

Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR
filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
an error.

As soon as the filtering is in place, every MSR access is processed through
the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff);
x2APIC MSRs are always allowed, independent of the ``default_allow`` setting,
and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base
register.

If a bit is within one of the defined ranges, read and write accesses are
guarded by the bitmap's value for the MSR index if the kind of access
is included in the ``struct kvm_msr_filter_range`` flags. If no range
cover this particular access, the behavior is determined by the flags
field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW``
and ``KVM_MSR_FILTER_DEFAULT_DENY``.

Each bitmap range specifies a range of MSRs to potentially allow access on.
The range goes from MSR index [base .. base+nmsrs]. The flags field
indicates whether reads, writes or both reads and writes are filtered
by setting a 1 bit in the bitmap for the corresponding MSR index.

If an MSR access is not permitted through the filtering, it generates a
#GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that
allows user space to deflect and potentially handle various MSR accesses
into user space.

Note, invoking this ioctl with a vCPU is running is inherently racy. However,
KVM does guarantee that vCPUs will see either the previous filter or the new
filter, e.g. MSRs with identical settings in both the old and new filter will
have deterministic behavior.

4.127 KVM_XEN_HVM_SET_ATTR
4.126 KVM_XEN_HVM_SET_ATTR
--------------------------

:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
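For reference, a minimal sketch of driving the KVM_X86_SET_MSR_FILTER ioctl
retained in section 4.97 above. The struct layout and flag names are the ones
documented in the duplicated section being removed; the VM file descriptor,
the chosen MSR range and the error handling are assumptions made purely for
illustration:

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Deny guest writes to a small MSR range while leaving everything else on
 * the default-allow path.  vm_fd is assumed to be an already-created VM fd.
 */
static int deny_msr_writes(int vm_fd, __u32 base, __u32 nmsrs)
{
        /* One bit per MSR; all bits zero => writes in the range are denied. */
        static __u8 bitmap[8];
        struct kvm_msr_filter filter;

        if (nmsrs > 8 * sizeof(bitmap))
                return -1;

        memset(&filter, 0, sizeof(filter));
        filter.flags = KVM_MSR_FILTER_DEFAULT_ALLOW;
        filter.ranges[0].flags = KVM_MSR_FILTER_WRITE;
        filter.ranges[0].base = base;   /* example: first MSR index of the range */
        filter.ranges[0].nmsrs = nmsrs;
        filter.ranges[0].bitmap = bitmap;

        return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
}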
@@ -97,7 +97,7 @@ VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_ could
also be used, e.g. ::

clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);
clear_bit(KVM_REQ_UNBLOCK & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction. The first 8 bits are reserved for architecture
@@ -126,17 +126,6 @@ KVM_REQ_UNBLOCK
or in order to update the interrupt routing and ensure that assigned
devices will wake up the vCPU.

KVM_REQ_UNHALT

This request may be made from the KVM common function kvm_vcpu_block(),
which is used to emulate an instruction that causes a CPU to halt until
one of an architectural specific set of events and/or interrupts is
received (determined by checking kvm_arch_vcpu_runnable()). When that
event or interrupt arrives kvm_vcpu_block() makes the request. This is
in contrast to when kvm_vcpu_block() returns due to any other reason,
such as a pending signal, which does not indicate the VCPU's halt
emulation should stop, and therefore does not make the request.

KVM_REQ_OUTSIDE_GUEST_MODE

This "request" ensures the target vCPU has exited guest mode prior to the
@@ -297,21 +286,6 @@ architecture dependent. kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken. One reason to do so is to provide
architectures a function where requests may be checked if necessary.

Clearing Requests
-----------------

Generally it only makes sense for the receiving VCPU thread to clear a
request. However, in some circumstances, such as when the requesting
thread and the receiving VCPU thread are executed serially, such as when
they are the same thread, or when they are using some form of concurrency
control to temporarily execute synchronously, then it's possible to know
that the request may be cleared immediately, rather than waiting for the
receiving VCPU thread to handle the request in VCPU RUN. The only current
examples of this are kvm_vcpu_block() calls made by VCPUs to block
themselves. A possible side-effect of that call is to make the
KVM_REQ_UNHALT request, which may then be cleared immediately when the
VCPU returns from the call.

References
==========
@ -666,7 +666,6 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
|
||||
|
||||
kvm_vcpu_halt(vcpu);
|
||||
vcpu_clear_flag(vcpu, IN_WFIT);
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
|
||||
preempt_disable();
|
||||
vgic_v4_load(vcpu);
|
||||
|
@ -92,9 +92,13 @@ static bool kvm_is_device_pfn(unsigned long pfn)
|
||||
static void *stage2_memcache_zalloc_page(void *arg)
|
||||
{
|
||||
struct kvm_mmu_memory_cache *mc = arg;
|
||||
void *virt;
|
||||
|
||||
/* Allocated with __GFP_ZERO, so no need to zero */
|
||||
return kvm_mmu_memory_cache_alloc(mc);
|
||||
virt = kvm_mmu_memory_cache_alloc(mc);
|
||||
if (virt)
|
||||
kvm_account_pgtable_pages(virt, 1);
|
||||
return virt;
|
||||
}
|
||||
|
||||
static void *kvm_host_zalloc_pages_exact(size_t size)
|
||||
@ -102,6 +106,21 @@ static void *kvm_host_zalloc_pages_exact(size_t size)
|
||||
return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
|
||||
}
|
||||
|
||||
static void *kvm_s2_zalloc_pages_exact(size_t size)
|
||||
{
|
||||
void *virt = kvm_host_zalloc_pages_exact(size);
|
||||
|
||||
if (virt)
|
||||
kvm_account_pgtable_pages(virt, (size >> PAGE_SHIFT));
|
||||
return virt;
|
||||
}
|
||||
|
||||
static void kvm_s2_free_pages_exact(void *virt, size_t size)
|
||||
{
|
||||
kvm_account_pgtable_pages(virt, -(size >> PAGE_SHIFT));
|
||||
free_pages_exact(virt, size);
|
||||
}
|
||||
|
||||
static void kvm_host_get_page(void *addr)
|
||||
{
|
||||
get_page(virt_to_page(addr));
|
||||
@ -112,6 +131,15 @@ static void kvm_host_put_page(void *addr)
|
||||
put_page(virt_to_page(addr));
|
||||
}
|
||||
|
||||
static void kvm_s2_put_page(void *addr)
|
||||
{
|
||||
struct page *p = virt_to_page(addr);
|
||||
/* Dropping last refcount, the page will be freed */
|
||||
if (page_count(p) == 1)
|
||||
kvm_account_pgtable_pages(addr, -1);
|
||||
put_page(p);
|
||||
}
|
||||
|
||||
static int kvm_host_page_count(void *addr)
|
||||
{
|
||||
return page_count(virt_to_page(addr));
|
||||
@ -625,10 +653,10 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
|
||||
|
||||
static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
|
||||
.zalloc_page = stage2_memcache_zalloc_page,
|
||||
.zalloc_pages_exact = kvm_host_zalloc_pages_exact,
|
||||
.free_pages_exact = free_pages_exact,
|
||||
.zalloc_pages_exact = kvm_s2_zalloc_pages_exact,
|
||||
.free_pages_exact = kvm_s2_free_pages_exact,
|
||||
.get_page = kvm_host_get_page,
|
||||
.put_page = kvm_host_put_page,
|
||||
.put_page = kvm_s2_put_page,
|
||||
.page_count = kvm_host_page_count,
|
||||
.phys_to_virt = kvm_host_va,
|
||||
.virt_to_phys = kvm_host_pa,
|
||||
|
@ -955,13 +955,11 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
|
||||
kvm_vcpu_halt(vcpu);
|
||||
|
||||
/*
|
||||
* We we are runnable, then definitely go off to user space to
|
||||
* We are runnable, then definitely go off to user space to
|
||||
* check if any I/O interrupts are pending.
|
||||
*/
|
||||
if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
if (kvm_arch_vcpu_runnable(vcpu))
|
||||
vcpu->run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
|
||||
}
|
||||
}
|
||||
|
||||
return EMULATE_DONE;
|
||||
|
@ -499,7 +499,6 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
|
||||
if (msr & MSR_POW) {
|
||||
if (!vcpu->arch.pending_exceptions) {
|
||||
kvm_vcpu_halt(vcpu);
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
vcpu->stat.generic.halt_wakeup++;
|
||||
|
||||
/* Unset POW bit after we woke up */
|
||||
|
@ -393,7 +393,6 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
|
||||
case H_CEDE:
|
||||
kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
|
||||
kvm_vcpu_halt(vcpu);
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
vcpu->stat.generic.halt_wakeup++;
|
||||
return EMULATE_DONE;
|
||||
case H_LOGICAL_CI_LOAD:
|
||||
|
@ -719,7 +719,6 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
|
||||
if (vcpu->arch.shared->msr & MSR_WE) {
|
||||
local_irq_enable();
|
||||
kvm_vcpu_halt(vcpu);
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
hard_irq_disable();
|
||||
|
||||
kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
|
||||
|
@ -239,7 +239,6 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
|
||||
case EV_HCALL_TOKEN(EV_IDLE):
|
||||
r = EV_SUCCESS;
|
||||
kvm_vcpu_halt(vcpu);
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
break;
|
||||
default:
|
||||
r = EV_UNIMPLEMENTED;
|
||||
|
@ -191,7 +191,6 @@ void kvm_riscv_vcpu_wfi(struct kvm_vcpu *vcpu)
|
||||
kvm_vcpu_srcu_read_unlock(vcpu);
|
||||
kvm_vcpu_halt(vcpu);
|
||||
kvm_vcpu_srcu_read_lock(vcpu);
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -4343,8 +4343,6 @@ retry:
|
||||
goto retry;
|
||||
}
|
||||
|
||||
/* nothing to do, just clear the request */
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
/* we left the vsie handler, nothing to do, just clear the request */
|
||||
kvm_clear_request(KVM_REQ_VSIE_RESTART, vcpu);
|
||||
|
||||
|
@ -138,6 +138,9 @@
|
||||
#define HV_X64_NESTED_GUEST_MAPPING_FLUSH BIT(18)
|
||||
#define HV_X64_NESTED_MSR_BITMAP BIT(19)
|
||||
|
||||
/* Nested features #2. These are HYPERV_CPUID_NESTED_FEATURES.EBX bits. */
|
||||
#define HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL BIT(0)
|
||||
|
||||
/*
|
||||
* This is specific to AMD and specifies that enlightened TLB flush is
|
||||
* supported. If guest opts in to this feature, ASID invalidations only
|
||||
@ -546,7 +549,7 @@ struct hv_enlightened_vmcs {
|
||||
u64 guest_rip;
|
||||
|
||||
u32 hv_clean_fields;
|
||||
u32 hv_padding_32;
|
||||
u32 padding32_1;
|
||||
u32 hv_synthetic_controls;
|
||||
struct {
|
||||
u32 nested_flush_hypercall:1;
|
||||
@ -554,14 +557,25 @@ struct hv_enlightened_vmcs {
|
||||
u32 reserved:30;
|
||||
} __packed hv_enlightenments_control;
|
||||
u32 hv_vp_id;
|
||||
|
||||
u32 padding32_2;
|
||||
u64 hv_vm_id;
|
||||
u64 partition_assist_page;
|
||||
u64 padding64_4[4];
|
||||
u64 guest_bndcfgs;
|
||||
u64 padding64_5[7];
|
||||
u64 guest_ia32_perf_global_ctrl;
|
||||
u64 guest_ia32_s_cet;
|
||||
u64 guest_ssp;
|
||||
u64 guest_ia32_int_ssp_table_addr;
|
||||
u64 guest_ia32_lbr_ctl;
|
||||
u64 padding64_5[2];
|
||||
u64 xss_exit_bitmap;
|
||||
u64 padding64_6[7];
|
||||
u64 encls_exiting_bitmap;
|
||||
u64 host_ia32_perf_global_ctrl;
|
||||
u64 tsc_multiplier;
|
||||
u64 host_ia32_s_cet;
|
||||
u64 host_ssp;
|
||||
u64 host_ia32_int_ssp_table_addr;
|
||||
u64 padding64_6;
|
||||
} __packed;
|
||||
|
||||
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE 0
|
||||
|
@ -67,7 +67,7 @@ KVM_X86_OP(get_interrupt_shadow)
|
||||
KVM_X86_OP(patch_hypercall)
|
||||
KVM_X86_OP(inject_irq)
|
||||
KVM_X86_OP(inject_nmi)
|
||||
KVM_X86_OP(queue_exception)
|
||||
KVM_X86_OP(inject_exception)
|
||||
KVM_X86_OP(cancel_injection)
|
||||
KVM_X86_OP(interrupt_allowed)
|
||||
KVM_X86_OP(nmi_allowed)
|
||||
|
@ -615,6 +615,8 @@ struct kvm_vcpu_hv {
|
||||
u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
|
||||
u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
|
||||
u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
|
||||
u32 nested_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
|
||||
u32 nested_ebx; /* HYPERV_CPUID_NESTED_FEATURES.EBX */
|
||||
} cpuid_cache;
|
||||
};
|
||||
|
||||
@ -639,6 +641,16 @@ struct kvm_vcpu_xen {
|
||||
struct timer_list poll_timer;
|
||||
};
|
||||
|
||||
struct kvm_queued_exception {
|
||||
bool pending;
|
||||
bool injected;
|
||||
bool has_error_code;
|
||||
u8 vector;
|
||||
u32 error_code;
|
||||
unsigned long payload;
|
||||
bool has_payload;
|
||||
};
|
||||
|
||||
struct kvm_vcpu_arch {
|
||||
/*
|
||||
* rip and regs accesses must go through
|
||||
@ -738,16 +750,12 @@ struct kvm_vcpu_arch {
|
||||
|
||||
u8 event_exit_inst_len;
|
||||
|
||||
struct kvm_queued_exception {
|
||||
bool pending;
|
||||
bool injected;
|
||||
bool has_error_code;
|
||||
u8 nr;
|
||||
u32 error_code;
|
||||
unsigned long payload;
|
||||
bool has_payload;
|
||||
u8 nested_apf;
|
||||
} exception;
|
||||
bool exception_from_userspace;
|
||||
|
||||
/* Exceptions to be injected to the guest. */
|
||||
struct kvm_queued_exception exception;
|
||||
/* Exception VM-Exits to be synthesized to L1. */
|
||||
struct kvm_queued_exception exception_vmexit;
|
||||
|
||||
struct kvm_queued_interrupt {
|
||||
bool injected;
|
||||
@ -858,7 +866,6 @@ struct kvm_vcpu_arch {
|
||||
u32 id;
|
||||
bool send_user_only;
|
||||
u32 host_apf_flags;
|
||||
unsigned long nested_apf_token;
|
||||
bool delivery_as_pf_vmexit;
|
||||
bool pageready_pending;
|
||||
} apf;
|
||||
@ -1524,7 +1531,7 @@ struct kvm_x86_ops {
|
||||
unsigned char *hypercall_addr);
|
||||
void (*inject_irq)(struct kvm_vcpu *vcpu, bool reinjected);
|
||||
void (*inject_nmi)(struct kvm_vcpu *vcpu);
|
||||
void (*queue_exception)(struct kvm_vcpu *vcpu);
|
||||
void (*inject_exception)(struct kvm_vcpu *vcpu);
|
||||
void (*cancel_injection)(struct kvm_vcpu *vcpu);
|
||||
int (*interrupt_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
|
||||
int (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
|
||||
@ -1634,10 +1641,10 @@ struct kvm_x86_ops {
|
||||
|
||||
struct kvm_x86_nested_ops {
|
||||
void (*leave_nested)(struct kvm_vcpu *vcpu);
|
||||
bool (*is_exception_vmexit)(struct kvm_vcpu *vcpu, u8 vector,
|
||||
u32 error_code);
|
||||
int (*check_events)(struct kvm_vcpu *vcpu);
|
||||
bool (*handle_page_fault_workaround)(struct kvm_vcpu *vcpu,
|
||||
struct x86_exception *fault);
|
||||
bool (*hv_timer_pending)(struct kvm_vcpu *vcpu);
|
||||
bool (*has_events)(struct kvm_vcpu *vcpu);
|
||||
void (*triple_fault)(struct kvm_vcpu *vcpu);
|
||||
int (*get_state)(struct kvm_vcpu *vcpu,
|
||||
struct kvm_nested_state __user *user_kvm_nested_state,
|
||||
@ -1863,7 +1870,7 @@ void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long pay
|
||||
void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
|
||||
void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
|
||||
void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
|
||||
bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
|
||||
void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
|
||||
struct x86_exception *fault);
|
||||
bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
|
||||
bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
|
||||
|
@ -311,6 +311,15 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
|
||||
|
||||
static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
|
||||
{
|
||||
struct kvm_cpuid_entry2 *entry;
|
||||
|
||||
entry = cpuid_entry2_find(entries, nent, HYPERV_CPUID_INTERFACE,
|
||||
KVM_CPUID_INDEX_NOT_SIGNIFICANT);
|
||||
return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX;
|
||||
}
|
||||
|
||||
static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
@ -346,7 +355,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
vcpu->arch.cr4_guest_rsvd_bits =
|
||||
__cr4_reserved_bits(guest_cpuid_has, vcpu);
|
||||
|
||||
kvm_hv_set_cpuid(vcpu);
|
||||
kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
|
||||
vcpu->arch.cpuid_nent));
|
||||
|
||||
/* Invoke the vendor callback only after the above state is updated. */
|
||||
static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);
|
||||
@ -409,6 +419,12 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (kvm_cpuid_has_hyperv(e2, nent)) {
|
||||
r = kvm_hv_vcpu_init(vcpu);
|
||||
if (r)
|
||||
return r;
|
||||
}
|
||||
|
||||
r = kvm_check_cpuid(vcpu, e2, nent);
|
||||
if (r)
|
||||
return r;
|
||||
|
@ -1137,9 +1137,11 @@ static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
|
||||
static void decode_register_operand(struct x86_emulate_ctxt *ctxt,
|
||||
struct operand *op)
|
||||
{
|
||||
unsigned reg = ctxt->modrm_reg;
|
||||
unsigned int reg;
|
||||
|
||||
if (!(ctxt->d & ModRM))
|
||||
if (ctxt->d & ModRM)
|
||||
reg = ctxt->modrm_reg;
|
||||
else
|
||||
reg = (ctxt->b & 7) | ((ctxt->rex_prefix & 1) << 3);
|
||||
|
||||
if (ctxt->d & Sse) {
|
||||
@ -1953,7 +1955,7 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
|
||||
if (rc != X86EMUL_CONTINUE)
|
||||
return rc;
|
||||
|
||||
if (ctxt->modrm_reg == VCPU_SREG_SS)
|
||||
if (seg == VCPU_SREG_SS)
|
||||
ctxt->interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
|
||||
if (ctxt->op_bytes > 2)
|
||||
rsp_increment(ctxt, ctxt->op_bytes - 2);
|
||||
@ -3645,13 +3647,10 @@ static int em_wrmsr(struct x86_emulate_ctxt *ctxt)
|
||||
| ((u64)reg_read(ctxt, VCPU_REGS_RDX) << 32);
|
||||
r = ctxt->ops->set_msr_with_filter(ctxt, msr_index, msr_data);
|
||||
|
||||
if (r == X86EMUL_IO_NEEDED)
|
||||
return r;
|
||||
|
||||
if (r > 0)
|
||||
if (r == X86EMUL_PROPAGATE_FAULT)
|
||||
return emulate_gp(ctxt, 0);
|
||||
|
||||
return r < 0 ? X86EMUL_UNHANDLEABLE : X86EMUL_CONTINUE;
|
||||
return r;
|
||||
}
|
||||
|
||||
static int em_rdmsr(struct x86_emulate_ctxt *ctxt)
|
||||
@ -3662,15 +3661,14 @@ static int em_rdmsr(struct x86_emulate_ctxt *ctxt)
|
||||
|
||||
r = ctxt->ops->get_msr_with_filter(ctxt, msr_index, &msr_data);
|
||||
|
||||
if (r == X86EMUL_IO_NEEDED)
|
||||
return r;
|
||||
|
||||
if (r)
|
||||
if (r == X86EMUL_PROPAGATE_FAULT)
|
||||
return emulate_gp(ctxt, 0);
|
||||
|
||||
*reg_write(ctxt, VCPU_REGS_RAX) = (u32)msr_data;
|
||||
*reg_write(ctxt, VCPU_REGS_RDX) = msr_data >> 32;
|
||||
return X86EMUL_CONTINUE;
|
||||
if (r == X86EMUL_CONTINUE) {
|
||||
*reg_write(ctxt, VCPU_REGS_RAX) = (u32)msr_data;
|
||||
*reg_write(ctxt, VCPU_REGS_RDX) = msr_data >> 32;
|
||||
}
|
||||
return r;
|
||||
}
|
||||
|
||||
static int em_store_sreg(struct x86_emulate_ctxt *ctxt, int segment)
|
||||
@ -4171,8 +4169,7 @@ static int check_dr7_gd(struct x86_emulate_ctxt *ctxt)
|
||||
|
||||
ctxt->ops->get_dr(ctxt, 7, &dr7);
|
||||
|
||||
/* Check if DR7.Global_Enable is set */
|
||||
return dr7 & (1 << 13);
|
||||
return dr7 & DR7_GD;
|
||||
}
|
||||
|
||||
static int check_dr_read(struct x86_emulate_ctxt *ctxt)
|
||||
|
@ -38,9 +38,6 @@
|
||||
#include "irq.h"
|
||||
#include "fpu.h"
|
||||
|
||||
/* "Hv#1" signature */
|
||||
#define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
|
||||
|
||||
#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
|
||||
|
||||
static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer,
|
||||
@ -934,11 +931,14 @@ static void stimer_init(struct kvm_vcpu_hv_stimer *stimer, int timer_index)
|
||||
stimer_prepare_msg(stimer);
|
||||
}
|
||||
|
||||
static int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
|
||||
int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_vcpu_hv *hv_vcpu;
|
||||
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
|
||||
int i;
|
||||
|
||||
if (hv_vcpu)
|
||||
return 0;
|
||||
|
||||
hv_vcpu = kzalloc(sizeof(struct kvm_vcpu_hv), GFP_KERNEL_ACCOUNT);
|
||||
if (!hv_vcpu)
|
||||
return -ENOMEM;
|
||||
@ -962,11 +962,9 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages)
|
||||
struct kvm_vcpu_hv_synic *synic;
|
||||
int r;
|
||||
|
||||
if (!to_hv_vcpu(vcpu)) {
|
||||
r = kvm_hv_vcpu_init(vcpu);
|
||||
if (r)
|
||||
return r;
|
||||
}
|
||||
r = kvm_hv_vcpu_init(vcpu);
|
||||
if (r)
|
||||
return r;
|
||||
|
||||
synic = to_hv_synic(vcpu);
|
||||
|
||||
@ -1660,10 +1658,8 @@ int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
|
||||
if (!host && !vcpu->arch.hyperv_enabled)
|
||||
return 1;
|
||||
|
||||
if (!to_hv_vcpu(vcpu)) {
|
||||
if (kvm_hv_vcpu_init(vcpu))
|
||||
return 1;
|
||||
}
|
||||
if (kvm_hv_vcpu_init(vcpu))
|
||||
return 1;
|
||||
|
||||
if (kvm_hv_msr_partition_wide(msr)) {
|
||||
int r;
|
||||
@ -1683,10 +1679,8 @@ int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
|
||||
if (!host && !vcpu->arch.hyperv_enabled)
|
||||
return 1;
|
||||
|
||||
if (!to_hv_vcpu(vcpu)) {
|
||||
if (kvm_hv_vcpu_init(vcpu))
|
||||
return 1;
|
||||
}
|
||||
if (kvm_hv_vcpu_init(vcpu))
|
||||
return 1;
|
||||
|
||||
if (kvm_hv_msr_partition_wide(msr)) {
|
||||
int r;
|
||||
@ -1987,49 +1981,49 @@ ret_success:
|
||||
return HV_STATUS_SUCCESS;
|
||||
}
|
||||
|
||||
void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu, bool hyperv_enabled)
|
||||
{
|
||||
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
|
||||
struct kvm_cpuid_entry2 *entry;
|
||||
struct kvm_vcpu_hv *hv_vcpu;
|
||||
|
||||
entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_INTERFACE);
|
||||
if (entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX) {
|
||||
vcpu->arch.hyperv_enabled = true;
|
||||
} else {
|
||||
vcpu->arch.hyperv_enabled = false;
|
||||
vcpu->arch.hyperv_enabled = hyperv_enabled;
|
||||
|
||||
if (!hv_vcpu) {
|
||||
/*
|
||||
* KVM should have already allocated kvm_vcpu_hv if Hyper-V is
|
||||
* enabled in CPUID.
|
||||
*/
|
||||
WARN_ON_ONCE(vcpu->arch.hyperv_enabled);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!to_hv_vcpu(vcpu) && kvm_hv_vcpu_init(vcpu))
|
||||
return;
|
||||
memset(&hv_vcpu->cpuid_cache, 0, sizeof(hv_vcpu->cpuid_cache));
|
||||
|
||||
hv_vcpu = to_hv_vcpu(vcpu);
|
||||
if (!vcpu->arch.hyperv_enabled)
|
||||
return;
|
||||
|
||||
entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_FEATURES);
|
||||
if (entry) {
|
||||
hv_vcpu->cpuid_cache.features_eax = entry->eax;
|
||||
hv_vcpu->cpuid_cache.features_ebx = entry->ebx;
|
||||
hv_vcpu->cpuid_cache.features_edx = entry->edx;
|
||||
} else {
|
||||
hv_vcpu->cpuid_cache.features_eax = 0;
|
||||
hv_vcpu->cpuid_cache.features_ebx = 0;
|
||||
hv_vcpu->cpuid_cache.features_edx = 0;
|
||||
}
|
||||
|
||||
entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO);
|
||||
if (entry) {
|
||||
hv_vcpu->cpuid_cache.enlightenments_eax = entry->eax;
|
||||
hv_vcpu->cpuid_cache.enlightenments_ebx = entry->ebx;
|
||||
} else {
|
||||
hv_vcpu->cpuid_cache.enlightenments_eax = 0;
|
||||
hv_vcpu->cpuid_cache.enlightenments_ebx = 0;
|
||||
}
|
||||
|
||||
entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES);
|
||||
if (entry)
|
||||
hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
|
||||
else
|
||||
hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
|
||||
|
||||
entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES);
|
||||
if (entry) {
|
||||
hv_vcpu->cpuid_cache.nested_eax = entry->eax;
|
||||
hv_vcpu->cpuid_cache.nested_ebx = entry->ebx;
|
||||
}
|
||||
}
|
||||
|
||||
int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
|
||||
@ -2552,7 +2546,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
|
||||
case HYPERV_CPUID_NESTED_FEATURES:
|
||||
ent->eax = evmcs_ver;
|
||||
ent->eax |= HV_X64_NESTED_MSR_BITMAP;
|
||||
|
||||
ent->ebx |= HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
|
||||
break;
|
||||
|
||||
case HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS:
|
||||
|
@ -23,6 +23,9 @@
|
||||
|
||||
#include <linux/kvm_host.h>
|
||||
|
||||
/* "Hv#1" signature */
|
||||
#define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
|
||||
|
||||
/*
|
||||
* The #defines related to the synthetic debugger are required by KDNet, but
|
||||
* they are not documented in the Hyper-V TLFS because the synthetic debugger
|
||||
@ -141,7 +144,8 @@ void kvm_hv_request_tsc_page_update(struct kvm *kvm);
|
||||
|
||||
void kvm_hv_init_vm(struct kvm *kvm);
|
||||
void kvm_hv_destroy_vm(struct kvm *kvm);
|
||||
void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu);
|
||||
int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu);
|
||||
void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu, bool hyperv_enabled);
|
||||
int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce);
|
||||
int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
|
||||
int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
|
||||
|
@ -3025,17 +3025,8 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
u8 sipi_vector;
|
||||
int r;
|
||||
unsigned long pe;
|
||||
|
||||
if (!lapic_in_kernel(vcpu))
|
||||
return 0;
|
||||
|
||||
/*
|
||||
* Read pending events before calling the check_events
|
||||
* callback.
|
||||
*/
|
||||
pe = smp_load_acquire(&apic->pending_events);
|
||||
if (!pe)
|
||||
if (!kvm_apic_has_pending_init_or_sipi(vcpu))
|
||||
return 0;
|
||||
|
||||
if (is_guest_mode(vcpu)) {
|
||||
@ -3043,38 +3034,31 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
|
||||
if (r < 0)
|
||||
return r == -EBUSY ? 0 : r;
|
||||
/*
|
||||
* If an event has happened and caused a vmexit,
|
||||
* we know INITs are latched and therefore
|
||||
* we will not incorrectly deliver an APIC
|
||||
* event instead of a vmexit.
|
||||
* Continue processing INIT/SIPI even if a nested VM-Exit
|
||||
* occurred, e.g. pending SIPIs should be dropped if INIT+SIPI
|
||||
* are blocked as a result of transitioning to VMX root mode.
|
||||
*/
|
||||
}
|
||||
|
||||
/*
|
||||
* INITs are latched while CPU is in specific states
|
||||
* (SMM, VMX root mode, SVM with GIF=0).
|
||||
* Because a CPU cannot be in these states immediately
|
||||
* after it has processed an INIT signal (and thus in
|
||||
* KVM_MP_STATE_INIT_RECEIVED state), just eat SIPIs
|
||||
* and leave the INIT pending.
|
||||
* INITs are blocked while CPU is in specific states (SMM, VMX root
|
||||
* mode, SVM with GIF=0), while SIPIs are dropped if the CPU isn't in
|
||||
* wait-for-SIPI (WFS).
|
||||
*/
|
||||
if (kvm_vcpu_latch_init(vcpu)) {
|
||||
if (!kvm_apic_init_sipi_allowed(vcpu)) {
|
||||
WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
|
||||
if (test_bit(KVM_APIC_SIPI, &pe))
|
||||
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
|
||||
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (test_bit(KVM_APIC_INIT, &pe)) {
|
||||
clear_bit(KVM_APIC_INIT, &apic->pending_events);
|
||||
if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
|
||||
kvm_vcpu_reset(vcpu, true);
|
||||
if (kvm_vcpu_is_bsp(apic->vcpu))
|
||||
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
|
||||
else
|
||||
vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
|
||||
}
|
||||
if (test_bit(KVM_APIC_SIPI, &pe)) {
|
||||
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
|
||||
if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events)) {
|
||||
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
|
||||
/* evaluate pending_events before reading the vector */
|
||||
smp_rmb();
|
||||
|
@ -7,6 +7,7 @@
|
||||
#include <linux/kvm_host.h>
|
||||
|
||||
#include "hyperv.h"
|
||||
#include "kvm_cache_regs.h"
|
||||
|
||||
#define KVM_APIC_INIT 0
|
||||
#define KVM_APIC_SIPI 1
|
||||
@ -223,11 +224,17 @@ static inline bool kvm_vcpu_apicv_active(struct kvm_vcpu *vcpu)
|
||||
return lapic_in_kernel(vcpu) && vcpu->arch.apic->apicv_active;
|
||||
}
|
||||
|
||||
static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
|
||||
static inline bool kvm_apic_has_pending_init_or_sipi(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return lapic_in_kernel(vcpu) && vcpu->arch.apic->pending_events;
|
||||
}
|
||||
|
||||
static inline bool kvm_apic_init_sipi_allowed(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return !is_smm(vcpu) &&
|
||||
!static_call(kvm_x86_apic_init_signal_blocked)(vcpu);
|
||||
}
|
||||
|
||||
static inline bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq)
|
||||
{
|
||||
return (irq->delivery_mode == APIC_DM_LOWEST ||
|
||||
|
@ -1667,6 +1667,18 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
|
||||
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
|
||||
}
|
||||
|
||||
static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
|
||||
{
|
||||
kvm_mod_used_mmu_pages(kvm, +1);
|
||||
kvm_account_pgtable_pages((void *)sp->spt, +1);
|
||||
}
|
||||
|
||||
static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
|
||||
{
|
||||
kvm_mod_used_mmu_pages(kvm, -1);
|
||||
kvm_account_pgtable_pages((void *)sp->spt, -1);
|
||||
}
|
||||
|
||||
static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
|
||||
{
|
||||
MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
|
||||
@ -2124,7 +2136,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
|
||||
*/
|
||||
sp->mmu_valid_gen = kvm->arch.mmu_valid_gen;
|
||||
list_add(&sp->link, &kvm->arch.active_mmu_pages);
|
||||
kvm_mod_used_mmu_pages(kvm, +1);
|
||||
kvm_account_mmu_page(kvm, sp);
|
||||
|
||||
sp->gfn = gfn;
|
||||
sp->role = role;
|
||||
@ -2458,7 +2470,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
|
||||
list_add(&sp->link, invalid_list);
|
||||
else
|
||||
list_move(&sp->link, invalid_list);
|
||||
kvm_mod_used_mmu_pages(kvm, -1);
|
||||
kvm_unaccount_mmu_page(kvm, sp);
|
||||
} else {
|
||||
/*
|
||||
* Remove the active root from the active page list, the root
|
||||
@ -4292,7 +4304,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
|
||||
|
||||
vcpu->arch.l1tf_flush_l1d = true;
|
||||
if (!flags) {
|
||||
trace_kvm_page_fault(fault_address, error_code);
|
||||
trace_kvm_page_fault(vcpu, fault_address, error_code);
|
||||
|
||||
if (kvm_event_needs_reinjection(vcpu))
|
||||
kvm_mmu_unprotect_page_virt(vcpu, fault_address);
|
||||
@ -6704,10 +6716,12 @@ int kvm_mmu_vendor_module_init(void)
|
||||
|
||||
ret = register_shrinker(&mmu_shrinker, "x86-mmu");
|
||||
if (ret)
|
||||
goto out;
|
||||
goto out_shrinker;
|
||||
|
||||
return 0;
|
||||
|
||||
out_shrinker:
|
||||
percpu_counter_destroy(&kvm_total_used_mmu_pages);
|
||||
out:
|
||||
mmu_destroy_caches();
|
||||
return ret;
|
||||
|
@ -472,7 +472,7 @@ error:
|
||||
|
||||
#if PTTYPE == PTTYPE_EPT
|
||||
/*
|
||||
* Use PFERR_RSVD_MASK in error_code to to tell if EPT
|
||||
* Use PFERR_RSVD_MASK in error_code to tell if EPT
|
||||
* misconfiguration requires to be injected. The detection is
|
||||
* done by is_rsvd_bits_set() above.
|
||||
*
|
||||
|
@ -372,6 +372,16 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
|
||||
}
|
||||
}
|
||||
|
||||
static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
|
||||
{
|
||||
kvm_account_pgtable_pages((void *)sp->spt, +1);
|
||||
}
|
||||
|
||||
static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
|
||||
{
|
||||
kvm_account_pgtable_pages((void *)sp->spt, -1);
|
||||
}
|
||||
|
||||
/**
|
||||
* tdp_mmu_unlink_sp() - Remove a shadow page from the list of used pages
|
||||
*
|
||||
@ -384,6 +394,7 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
|
||||
static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
|
||||
bool shared)
|
||||
{
|
||||
tdp_unaccount_mmu_page(kvm, sp);
|
||||
if (shared)
|
||||
spin_lock(&kvm->arch.tdp_mmu_pages_lock);
|
||||
else
|
||||
@ -1132,6 +1143,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
|
||||
if (account_nx)
|
||||
account_huge_nx_page(kvm, sp);
|
||||
spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
|
||||
tdp_account_mmu_page(kvm, sp);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -55,28 +55,6 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
|
||||
nested_svm_vmexit(svm);
|
||||
}
|
||||
|
||||
static bool nested_svm_handle_page_fault_workaround(struct kvm_vcpu *vcpu,
|
||||
struct x86_exception *fault)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
struct vmcb *vmcb = svm->vmcb;
|
||||
|
||||
WARN_ON(!is_guest_mode(vcpu));
|
||||
|
||||
if (vmcb12_is_intercept(&svm->nested.ctl,
|
||||
INTERCEPT_EXCEPTION_OFFSET + PF_VECTOR) &&
|
||||
!WARN_ON_ONCE(svm->nested.nested_run_pending)) {
|
||||
vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + PF_VECTOR;
|
||||
vmcb->control.exit_code_hi = 0;
|
||||
vmcb->control.exit_info_1 = fault->error_code;
|
||||
vmcb->control.exit_info_2 = fault->address;
|
||||
nested_svm_vmexit(svm);
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
@ -468,7 +446,7 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
|
||||
unsigned int nr;
|
||||
|
||||
if (vcpu->arch.exception.injected) {
|
||||
nr = vcpu->arch.exception.nr;
|
||||
nr = vcpu->arch.exception.vector;
|
||||
exit_int_info = nr | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_EXEPT;
|
||||
|
||||
if (vcpu->arch.exception.has_error_code) {
|
||||
@ -781,11 +759,15 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
int ret;
|
||||
|
||||
trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb12_gpa,
|
||||
vmcb12->save.rip,
|
||||
vmcb12->control.int_ctl,
|
||||
vmcb12->control.event_inj,
|
||||
vmcb12->control.nested_ctl);
|
||||
trace_kvm_nested_vmenter(svm->vmcb->save.rip,
|
||||
vmcb12_gpa,
|
||||
vmcb12->save.rip,
|
||||
vmcb12->control.int_ctl,
|
||||
vmcb12->control.event_inj,
|
||||
vmcb12->control.nested_ctl,
|
||||
vmcb12->control.nested_cr3,
|
||||
vmcb12->save.cr3,
|
||||
KVM_ISA_SVM);
|
||||
|
||||
trace_kvm_nested_intercepts(vmcb12->control.intercepts[INTERCEPT_CR] & 0xffff,
|
||||
vmcb12->control.intercepts[INTERCEPT_CR] >> 16,
|
||||
@ -1304,44 +1286,46 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static bool nested_exit_on_exception(struct vcpu_svm *svm)
|
||||
static bool nested_svm_is_exception_vmexit(struct kvm_vcpu *vcpu, u8 vector,
|
||||
u32 error_code)
|
||||
{
|
||||
unsigned int nr = svm->vcpu.arch.exception.nr;
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
|
||||
return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(nr));
|
||||
return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(vector));
|
||||
}
|
||||
|
||||
static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
|
||||
static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
unsigned int nr = svm->vcpu.arch.exception.nr;
|
||||
struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit;
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
struct vmcb *vmcb = svm->vmcb;
|
||||
|
||||
vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
|
||||
vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + ex->vector;
|
||||
vmcb->control.exit_code_hi = 0;
|
||||
|
||||
if (svm->vcpu.arch.exception.has_error_code)
|
||||
vmcb->control.exit_info_1 = svm->vcpu.arch.exception.error_code;
|
||||
if (ex->has_error_code)
|
||||
vmcb->control.exit_info_1 = ex->error_code;
|
||||
|
||||
/*
|
||||
* EXITINFO2 is undefined for all exception intercepts other
|
||||
* than #PF.
|
||||
*/
|
||||
if (nr == PF_VECTOR) {
|
||||
if (svm->vcpu.arch.exception.nested_apf)
|
||||
vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
|
||||
else if (svm->vcpu.arch.exception.has_payload)
|
||||
vmcb->control.exit_info_2 = svm->vcpu.arch.exception.payload;
|
||||
if (ex->vector == PF_VECTOR) {
|
||||
if (ex->has_payload)
|
||||
vmcb->control.exit_info_2 = ex->payload;
|
||||
else
|
||||
vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
|
||||
} else if (nr == DB_VECTOR) {
|
||||
/* See inject_pending_event. */
|
||||
kvm_deliver_exception_payload(&svm->vcpu);
|
||||
if (svm->vcpu.arch.dr7 & DR7_GD) {
|
||||
svm->vcpu.arch.dr7 &= ~DR7_GD;
|
||||
kvm_update_dr7(&svm->vcpu);
|
||||
vmcb->control.exit_info_2 = vcpu->arch.cr2;
|
||||
} else if (ex->vector == DB_VECTOR) {
|
||||
/* See kvm_check_and_inject_events(). */
|
||||
kvm_deliver_exception_payload(vcpu, ex);
|
||||
|
||||
if (vcpu->arch.dr7 & DR7_GD) {
|
||||
vcpu->arch.dr7 &= ~DR7_GD;
|
||||
kvm_update_dr7(vcpu);
|
||||
}
|
||||
} else
|
||||
WARN_ON(svm->vcpu.arch.exception.has_payload);
|
||||
} else {
|
||||
WARN_ON(ex->has_payload);
|
||||
}
|
||||
|
||||
nested_svm_vmexit(svm);
|
||||
}
|
||||
@ -1353,10 +1337,22 @@ static inline bool nested_exit_on_init(struct vcpu_svm *svm)
|
||||
|
||||
static int svm_check_nested_events(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
bool block_nested_events =
|
||||
kvm_event_needs_reinjection(vcpu) || svm->nested.nested_run_pending;
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
/*
|
||||
* Only a pending nested run blocks a pending exception. If there is a
|
||||
* previously injected event, the pending exception occurred while said
|
||||
* event was being delivered and thus needs to be handled.
|
||||
*/
|
||||
bool block_nested_exceptions = svm->nested.nested_run_pending;
|
||||
/*
|
||||
* New events (not exceptions) are only recognized at instruction
|
||||
* boundaries. If an event needs reinjection, then KVM is handling a
|
||||
* VM-Exit that occurred _during_ instruction execution; new events are
|
||||
* blocked until the instruction completes.
|
||||
*/
|
||||
bool block_nested_events = block_nested_exceptions ||
|
||||
kvm_event_needs_reinjection(vcpu);
|
||||
|
||||
if (lapic_in_kernel(vcpu) &&
|
||||
test_bit(KVM_APIC_INIT, &apic->pending_events)) {
|
||||
@ -1368,18 +1364,16 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (vcpu->arch.exception.pending) {
|
||||
/*
|
||||
* Only a pending nested run can block a pending exception.
|
||||
* Otherwise an injected NMI/interrupt should either be
|
||||
* lost or delivered to the nested hypervisor in the EXITINTINFO
|
||||
* vmcb field, while delivering the pending exception.
|
||||
*/
|
||||
if (svm->nested.nested_run_pending)
|
||||
if (vcpu->arch.exception_vmexit.pending) {
|
||||
if (block_nested_exceptions)
|
||||
return -EBUSY;
|
||||
if (!nested_exit_on_exception(svm))
|
||||
return 0;
|
||||
nested_svm_inject_exception_vmexit(svm);
|
||||
nested_svm_inject_exception_vmexit(vcpu);
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (vcpu->arch.exception.pending) {
|
||||
if (block_nested_exceptions)
|
||||
return -EBUSY;
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -1720,8 +1714,8 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
|
||||
|
||||
struct kvm_x86_nested_ops svm_nested_ops = {
|
||||
.leave_nested = svm_leave_nested,
|
||||
.is_exception_vmexit = nested_svm_is_exception_vmexit,
|
||||
.check_events = svm_check_nested_events,
|
||||
.handle_page_fault_workaround = nested_svm_handle_page_fault_workaround,
|
||||
.triple_fault = nested_svm_triple_fault,
|
||||
.get_nested_state_pages = svm_get_nested_state_pages,
|
||||
.get_state = svm_get_nested_state,
|
||||
|
@ -461,24 +461,22 @@ static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void svm_queue_exception(struct kvm_vcpu *vcpu)
|
||||
static void svm_inject_exception(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_queued_exception *ex = &vcpu->arch.exception;
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
unsigned nr = vcpu->arch.exception.nr;
|
||||
bool has_error_code = vcpu->arch.exception.has_error_code;
|
||||
u32 error_code = vcpu->arch.exception.error_code;
|
||||
|
||||
kvm_deliver_exception_payload(vcpu);
|
||||
kvm_deliver_exception_payload(vcpu, ex);
|
||||
|
||||
if (kvm_exception_is_soft(nr) &&
|
||||
if (kvm_exception_is_soft(ex->vector) &&
|
||||
svm_update_soft_interrupt_rip(vcpu))
|
||||
return;
|
||||
|
||||
svm->vmcb->control.event_inj = nr
|
||||
svm->vmcb->control.event_inj = ex->vector
|
||||
| SVM_EVTINJ_VALID
|
||||
| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
|
||||
| (ex->has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
|
||||
| SVM_EVTINJ_TYPE_EXEPT;
|
||||
svm->vmcb->control.event_inj_err = error_code;
|
||||
svm->vmcb->control.event_inj_err = ex->error_code;
|
||||
}
|
||||
|
||||
static void svm_init_erratum_383(void)
|
||||
@ -1975,7 +1973,7 @@ static int npf_interception(struct kvm_vcpu *vcpu)
|
||||
u64 fault_address = svm->vmcb->control.exit_info_2;
|
||||
u64 error_code = svm->vmcb->control.exit_info_1;
|
||||
|
||||
trace_kvm_page_fault(fault_address, error_code);
|
||||
trace_kvm_page_fault(vcpu, fault_address, error_code);
|
||||
return kvm_mmu_page_fault(vcpu, fault_address, error_code,
|
||||
static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
|
||||
svm->vmcb->control.insn_bytes : NULL,
|
||||
@ -2341,7 +2339,8 @@ void svm_set_gif(struct vcpu_svm *svm, bool value)
|
||||
enable_gif(svm);
|
||||
if (svm->vcpu.arch.smi_pending ||
|
||||
svm->vcpu.arch.nmi_pending ||
|
||||
kvm_cpu_has_injectable_intr(&svm->vcpu))
|
||||
kvm_cpu_has_injectable_intr(&svm->vcpu) ||
|
||||
kvm_apic_has_pending_init_or_sipi(&svm->vcpu))
|
||||
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
|
||||
} else {
|
||||
disable_gif(svm);
|
||||
@ -3522,7 +3521,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
|
||||
|
||||
/* Note, this is called iff the local APIC is in-kernel. */
|
||||
if (!READ_ONCE(vcpu->arch.apic->apicv_active)) {
|
||||
/* Process the interrupt via inject_pending_event */
|
||||
/* Process the interrupt via kvm_check_and_inject_events(). */
|
||||
kvm_make_request(KVM_REQ_EVENT, vcpu);
|
||||
kvm_vcpu_kick(vcpu);
|
||||
return;
|
||||
@ -4697,15 +4696,7 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
|
||||
/*
|
||||
* TODO: Last condition latch INIT signals on vCPU when
|
||||
* vCPU is in guest-mode and vmcb12 defines intercept on INIT.
|
||||
* To properly emulate the INIT intercept,
|
||||
* svm_check_nested_events() should call nested_svm_vmexit()
|
||||
* if an INIT signal is pending.
|
||||
*/
|
||||
return !gif_set(svm) ||
|
||||
(vmcb_is_intercept(&svm->vmcb->control, INTERCEPT_INIT));
|
||||
return !gif_set(svm);
|
||||
}
|
||||
|
||||
static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
|
||||
@ -4798,7 +4789,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
|
||||
.patch_hypercall = svm_patch_hypercall,
|
||||
.inject_irq = svm_inject_irq,
|
||||
.inject_nmi = svm_inject_nmi,
|
||||
.queue_exception = svm_queue_exception,
|
||||
.inject_exception = svm_inject_exception,
|
||||
.cancel_injection = svm_cancel_injection,
|
||||
.interrupt_allowed = svm_interrupt_allowed,
|
||||
.nmi_allowed = svm_nmi_allowed,
|
||||
|
@ -394,20 +394,25 @@ TRACE_EVENT(kvm_inj_exception,
|
||||
* Tracepoint for page fault.
|
||||
*/
|
||||
TRACE_EVENT(kvm_page_fault,
|
||||
TP_PROTO(unsigned long fault_address, unsigned int error_code),
|
||||
TP_ARGS(fault_address, error_code),
|
||||
TP_PROTO(struct kvm_vcpu *vcpu, u64 fault_address, u64 error_code),
|
||||
TP_ARGS(vcpu, fault_address, error_code),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( unsigned long, fault_address )
|
||||
__field( unsigned int, error_code )
|
||||
__field( unsigned int, vcpu_id )
|
||||
__field( unsigned long, guest_rip )
|
||||
__field( u64, fault_address )
|
||||
__field( u64, error_code )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->vcpu_id = vcpu->vcpu_id;
|
||||
__entry->guest_rip = kvm_rip_read(vcpu);
|
||||
__entry->fault_address = fault_address;
|
||||
__entry->error_code = error_code;
|
||||
),
|
||||
|
||||
TP_printk("address %lx error_code %x",
|
||||
TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx",
|
||||
__entry->vcpu_id, __entry->guest_rip,
|
||||
__entry->fault_address, __entry->error_code)
|
||||
);
|
||||
|
||||
@@ -589,10 +594,12 @@ TRACE_EVENT(kvm_pv_eoi,
/*
* Tracepoint for nested VMRUN
*/
TRACE_EVENT(kvm_nested_vmrun,
TRACE_EVENT(kvm_nested_vmenter,
TP_PROTO(__u64 rip, __u64 vmcb, __u64 nested_rip, __u32 int_ctl,
__u32 event_inj, bool npt),
TP_ARGS(rip, vmcb, nested_rip, int_ctl, event_inj, npt),
__u32 event_inj, bool tdp_enabled, __u64 guest_tdp_pgd,
__u64 guest_cr3, __u32 isa),
TP_ARGS(rip, vmcb, nested_rip, int_ctl, event_inj, tdp_enabled,
guest_tdp_pgd, guest_cr3, isa),

TP_STRUCT__entry(
__field( __u64, rip )
@@ -600,7 +607,9 @@ TRACE_EVENT(kvm_nested_vmrun,
__field( __u64, nested_rip )
__field( __u32, int_ctl )
__field( __u32, event_inj )
__field( bool, npt )
__field( bool, tdp_enabled )
__field( __u64, guest_pgd )
__field( __u32, isa )
),

TP_fast_assign(
@@ -609,14 +618,24 @@ TRACE_EVENT(kvm_nested_vmrun,
__entry->nested_rip = nested_rip;
__entry->int_ctl = int_ctl;
__entry->event_inj = event_inj;
__entry->npt = npt;
__entry->tdp_enabled = tdp_enabled;
__entry->guest_pgd = tdp_enabled ? guest_tdp_pgd : guest_cr3;
__entry->isa = isa;
),

TP_printk("rip: 0x%016llx vmcb: 0x%016llx nrip: 0x%016llx int_ctl: 0x%08x "
"event_inj: 0x%08x npt: %s",
__entry->rip, __entry->vmcb, __entry->nested_rip,
__entry->int_ctl, __entry->event_inj,
__entry->npt ? "on" : "off")
TP_printk("rip: 0x%016llx %s: 0x%016llx nested_rip: 0x%016llx "
"int_ctl: 0x%08x event_inj: 0x%08x nested_%s=%s %s: 0x%016llx",
__entry->rip,
__entry->isa == KVM_ISA_VMX ? "vmcs" : "vmcb",
__entry->vmcb,
__entry->nested_rip,
__entry->int_ctl,
__entry->event_inj,
__entry->isa == KVM_ISA_VMX ? "ept" : "npt",
__entry->tdp_enabled ? "y" : "n",
!__entry->tdp_enabled ? "guest_cr3" :
__entry->isa == KVM_ISA_VMX ? "nested_eptp" : "nested_cr3",
__entry->guest_pgd)
);

TRACE_EVENT(kvm_nested_intercepts,

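With the tracepoint generalized from kvm_nested_vmrun to kvm_nested_vmenter, both VMX and SVM report nested VM-Enter through the same event: the new tdp_enabled/guest_tdp_pgd/guest_cr3/isa arguments let TP_fast_assign() pick the right guest page-table root. The VMX call site appears later in this patch; the SVM side would look roughly like the sketch below (field names taken from struct vmcb, the exact call site is an assumption):

/* Hedged sketch of the SVM caller; not a hunk from this patch. */
trace_kvm_nested_vmenter(kvm_rip_read(vcpu),
			 vmcb12_gpa,
			 vmcb12->save.rip,
			 vmcb12->control.int_ctl,
			 vmcb12->control.event_inj,
			 vmcb12->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE,
			 vmcb12->control.nested_cr3,
			 vmcb12->save.cr3,
			 KVM_ISA_SVM);
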
@@ -65,6 +65,7 @@ struct vmcs_config {
u64 cpu_based_3rd_exec_ctrl;
u32 vmexit_ctrl;
u32 vmentry_ctrl;
u64 misc;
struct nested_vmx_msrs nested;
};
extern struct vmcs_config vmcs_config;
@@ -82,7 +83,8 @@ static inline bool cpu_has_vmx_basic_inout(void)

static inline bool cpu_has_virtual_nmis(void)
{
return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS;
return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
vmcs_config.cpu_based_exec_ctrl & CPU_BASED_NMI_WINDOW_EXITING;
}

static inline bool cpu_has_vmx_preemption_timer(void)
@@ -224,11 +226,8 @@ static inline bool cpu_has_vmx_vmfunc(void)

static inline bool cpu_has_vmx_shadow_vmcs(void)
{
u64 vmx_msr;

/* check if the cpu supports writing r/o exit information fields */
rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
if (!(vmx_msr & MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS))
if (!(vmcs_config.misc & MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS))
return false;

return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -370,10 +369,7 @@ static inline bool cpu_has_vmx_invvpid_global(void)

static inline bool cpu_has_vmx_intel_pt(void)
{
u64 vmx_msr;

rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
return (vmx_msr & MSR_IA32_VMX_MISC_INTEL_PT) &&
return (vmcs_config.misc & MSR_IA32_VMX_MISC_INTEL_PT) &&
(vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_PT_USE_GPA) &&
(vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_RTIT_CTL);
}
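The new vmcs_config.misc field caches MSR_IA32_VMX_MISC once at hardware setup, which is what lets cpu_has_vmx_shadow_vmcs() and cpu_has_vmx_intel_pt() above drop their per-call rdmsrl(). A minimal sketch of the caching step, assuming it happens during setup_vmcs_config() (the exact hunk appears further down in this patch):

/* Illustrative: read the MSR once and stash it in the global config. */
static __init void example_cache_vmx_misc(struct vmcs_config *vmcs_conf)
{
	u64 misc_msr;

	rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
	vmcs_conf->misc = misc_msr;
}
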
@ -10,6 +10,8 @@
|
||||
#include "vmx.h"
|
||||
#include "trace.h"
|
||||
|
||||
#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
|
||||
|
||||
DEFINE_STATIC_KEY_FALSE(enable_evmcs);
|
||||
|
||||
#define EVMCS1_OFFSET(x) offsetof(struct hv_enlightened_vmcs, x)
|
||||
@ -28,6 +30,8 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
|
||||
EVMCS1_FIELD(HOST_IA32_EFER, host_ia32_efer,
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
|
||||
EVMCS1_FIELD(HOST_IA32_PERF_GLOBAL_CTRL, host_ia32_perf_global_ctrl,
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
|
||||
EVMCS1_FIELD(HOST_CR0, host_cr0,
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
|
||||
EVMCS1_FIELD(HOST_CR3, host_cr3,
|
||||
@ -78,6 +82,8 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
|
||||
EVMCS1_FIELD(GUEST_IA32_EFER, guest_ia32_efer,
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
|
||||
EVMCS1_FIELD(GUEST_IA32_PERF_GLOBAL_CTRL, guest_ia32_perf_global_ctrl,
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
|
||||
EVMCS1_FIELD(GUEST_PDPTR0, guest_pdptr0,
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
|
||||
EVMCS1_FIELD(GUEST_PDPTR1, guest_pdptr1,
|
||||
@ -126,6 +132,28 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
|
||||
EVMCS1_FIELD(XSS_EXIT_BITMAP, xss_exit_bitmap,
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
|
||||
EVMCS1_FIELD(ENCLS_EXITING_BITMAP, encls_exiting_bitmap,
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
|
||||
EVMCS1_FIELD(TSC_MULTIPLIER, tsc_multiplier,
|
||||
HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
|
||||
/*
|
||||
* Not used by KVM:
|
||||
*
|
||||
* EVMCS1_FIELD(0x00006828, guest_ia32_s_cet,
|
||||
* HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
|
||||
* EVMCS1_FIELD(0x0000682A, guest_ssp,
|
||||
* HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC),
|
||||
* EVMCS1_FIELD(0x0000682C, guest_ia32_int_ssp_table_addr,
|
||||
* HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
|
||||
* EVMCS1_FIELD(0x00002816, guest_ia32_lbr_ctl,
|
||||
* HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
|
||||
* EVMCS1_FIELD(0x00006C18, host_ia32_s_cet,
|
||||
* HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
|
||||
* EVMCS1_FIELD(0x00006C1A, host_ssp,
|
||||
* HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
|
||||
* EVMCS1_FIELD(0x00006C1C, host_ia32_int_ssp_table_addr,
|
||||
* HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
|
||||
*/
|
||||
|
||||
/* 64 bit read only */
|
||||
EVMCS1_FIELD(GUEST_PHYSICAL_ADDRESS, guest_physical_address,
|
||||
@ -294,19 +322,6 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
|
||||
};
|
||||
const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);
|
||||
|
||||
#if IS_ENABLED(CONFIG_HYPERV)
|
||||
__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
|
||||
{
|
||||
vmcs_conf->cpu_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_EXEC_CTRL;
|
||||
vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL;
|
||||
vmcs_conf->cpu_based_2nd_exec_ctrl &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
|
||||
vmcs_conf->cpu_based_3rd_exec_ctrl = 0;
|
||||
|
||||
vmcs_conf->vmexit_ctrl &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
|
||||
vmcs_conf->vmentry_ctrl &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
|
||||
}
|
||||
#endif
|
||||
|
||||
bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa)
|
||||
{
|
||||
struct hv_vp_assist_page assist_page;
|
||||
@ -334,6 +349,9 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
|
||||
* versions: lower 8 bits is the minimal version, higher 8 bits is the
|
||||
* maximum supported version. KVM supports versions from 1 to
|
||||
* KVM_EVMCS_VERSION.
|
||||
*
|
||||
* Note, do not check the Hyper-V is fully enabled in guest CPUID, this
|
||||
* helper is used to _get_ the vCPU's supported CPUID.
|
||||
*/
|
||||
if (kvm_cpu_cap_get(X86_FEATURE_VMX) &&
|
||||
(!vcpu || to_vmx(vcpu)->nested.enlightened_vmcs_enabled))
|
||||
@ -342,10 +360,67 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
|
||||
return 0;
|
||||
}
|
||||
|
||||
void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
|
||||
enum evmcs_revision {
|
||||
EVMCSv1_LEGACY,
|
||||
NR_EVMCS_REVISIONS,
|
||||
};
|
||||
|
||||
enum evmcs_ctrl_type {
|
||||
EVMCS_EXIT_CTRLS,
|
||||
EVMCS_ENTRY_CTRLS,
|
||||
EVMCS_2NDEXEC,
|
||||
EVMCS_PINCTRL,
|
||||
EVMCS_VMFUNC,
|
||||
NR_EVMCS_CTRLS,
|
||||
};
|
||||
|
||||
static const u32 evmcs_unsupported_ctrls[NR_EVMCS_CTRLS][NR_EVMCS_REVISIONS] = {
|
||||
[EVMCS_EXIT_CTRLS] = {
|
||||
[EVMCSv1_LEGACY] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL,
|
||||
},
|
||||
[EVMCS_ENTRY_CTRLS] = {
|
||||
[EVMCSv1_LEGACY] = EVMCS1_UNSUPPORTED_VMENTRY_CTRL,
|
||||
},
|
||||
[EVMCS_2NDEXEC] = {
|
||||
[EVMCSv1_LEGACY] = EVMCS1_UNSUPPORTED_2NDEXEC,
|
||||
},
|
||||
[EVMCS_PINCTRL] = {
|
||||
[EVMCSv1_LEGACY] = EVMCS1_UNSUPPORTED_PINCTRL,
|
||||
},
|
||||
[EVMCS_VMFUNC] = {
|
||||
[EVMCSv1_LEGACY] = EVMCS1_UNSUPPORTED_VMFUNC,
|
||||
},
|
||||
};
|
||||
|
||||
static u32 evmcs_get_unsupported_ctls(enum evmcs_ctrl_type ctrl_type)
|
||||
{
|
||||
enum evmcs_revision evmcs_rev = EVMCSv1_LEGACY;
|
||||
|
||||
return evmcs_unsupported_ctrls[ctrl_type][evmcs_rev];
|
||||
}
|
||||
|
||||
static bool evmcs_has_perf_global_ctrl(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
|
||||
|
||||
/*
|
||||
* PERF_GLOBAL_CTRL has a quirk where some Windows guests may fail to
|
||||
* boot if a PV CPUID feature flag is not also set. Treat the fields
|
||||
* as unsupported if the flag is not set in guest CPUID. This should
|
||||
* be called only for guest accesses, and all guest accesses should be
|
||||
* gated on Hyper-V being enabled and initialized.
|
||||
*/
|
||||
if (WARN_ON_ONCE(!hv_vcpu))
|
||||
return false;
|
||||
|
||||
return hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
|
||||
}
|
||||
|
||||
void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
|
||||
{
|
||||
u32 ctl_low = (u32)*pdata;
|
||||
u32 ctl_high = (u32)(*pdata >> 32);
|
||||
u32 unsupported_ctrls;
|
||||
|
||||
/*
|
||||
* Hyper-V 2016 and 2019 try using these features even when eVMCS
|
||||
@ -354,77 +429,70 @@ void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
|
||||
switch (msr_index) {
|
||||
case MSR_IA32_VMX_EXIT_CTLS:
|
||||
case MSR_IA32_VMX_TRUE_EXIT_CTLS:
|
||||
ctl_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
|
||||
unsupported_ctrls = evmcs_get_unsupported_ctls(EVMCS_EXIT_CTRLS);
|
||||
if (!evmcs_has_perf_global_ctrl(vcpu))
|
||||
unsupported_ctrls |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
|
||||
ctl_high &= ~unsupported_ctrls;
|
||||
break;
|
||||
case MSR_IA32_VMX_ENTRY_CTLS:
|
||||
case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
|
||||
ctl_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
|
||||
unsupported_ctrls = evmcs_get_unsupported_ctls(EVMCS_ENTRY_CTRLS);
|
||||
if (!evmcs_has_perf_global_ctrl(vcpu))
|
||||
unsupported_ctrls |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
|
||||
ctl_high &= ~unsupported_ctrls;
|
||||
break;
|
||||
case MSR_IA32_VMX_PROCBASED_CTLS2:
|
||||
ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
|
||||
ctl_high &= ~evmcs_get_unsupported_ctls(EVMCS_2NDEXEC);
|
||||
break;
|
||||
case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
|
||||
case MSR_IA32_VMX_PINBASED_CTLS:
|
||||
ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
|
||||
ctl_high &= ~evmcs_get_unsupported_ctls(EVMCS_PINCTRL);
|
||||
break;
|
||||
case MSR_IA32_VMX_VMFUNC:
|
||||
ctl_low &= ~EVMCS1_UNSUPPORTED_VMFUNC;
|
||||
ctl_low &= ~evmcs_get_unsupported_ctls(EVMCS_VMFUNC);
|
||||
break;
|
||||
}
|
||||
|
||||
*pdata = ctl_low | ((u64)ctl_high << 32);
|
||||
}
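For reference, the ctl_low/ctl_high split above follows the VMX capability MSR layout: bits 31:0 hold the allowed-0 settings and bits 63:32 the allowed-1 settings, so clearing a bit in the high half hides that feature from the guest. A standalone sketch of the pattern (not a kernel function):

/* Minimal sketch of the split/strip/merge done by the filter above. */
static u64 example_strip_unsupported_high_bits(u64 msr_val, u32 unsupported)
{
	u32 low = (u32)msr_val;			/* allowed-0 settings */
	u32 high = (u32)(msr_val >> 32);	/* allowed-1 settings */

	high &= ~unsupported;
	return low | ((u64)high << 32);
}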
|
||||
|
||||
static bool nested_evmcs_is_valid_controls(enum evmcs_ctrl_type ctrl_type,
|
||||
u32 val)
|
||||
{
|
||||
return !(val & evmcs_get_unsupported_ctls(ctrl_type));
|
||||
}
|
||||
|
||||
int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
|
||||
{
|
||||
int ret = 0;
|
||||
u32 unsupp_ctl;
|
||||
if (CC(!nested_evmcs_is_valid_controls(EVMCS_PINCTRL,
|
||||
vmcs12->pin_based_vm_exec_control)))
|
||||
return -EINVAL;
|
||||
|
||||
unsupp_ctl = vmcs12->pin_based_vm_exec_control &
|
||||
EVMCS1_UNSUPPORTED_PINCTRL;
|
||||
if (unsupp_ctl) {
|
||||
trace_kvm_nested_vmenter_failed(
|
||||
"eVMCS: unsupported pin-based VM-execution controls",
|
||||
unsupp_ctl);
|
||||
ret = -EINVAL;
|
||||
}
|
||||
if (CC(!nested_evmcs_is_valid_controls(EVMCS_2NDEXEC,
|
||||
vmcs12->secondary_vm_exec_control)))
|
||||
return -EINVAL;
|
||||
|
||||
unsupp_ctl = vmcs12->secondary_vm_exec_control &
|
||||
EVMCS1_UNSUPPORTED_2NDEXEC;
|
||||
if (unsupp_ctl) {
|
||||
trace_kvm_nested_vmenter_failed(
|
||||
"eVMCS: unsupported secondary VM-execution controls",
|
||||
unsupp_ctl);
|
||||
ret = -EINVAL;
|
||||
}
|
||||
if (CC(!nested_evmcs_is_valid_controls(EVMCS_EXIT_CTRLS,
|
||||
vmcs12->vm_exit_controls)))
|
||||
return -EINVAL;
|
||||
|
||||
unsupp_ctl = vmcs12->vm_exit_controls &
|
||||
EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
|
||||
if (unsupp_ctl) {
|
||||
trace_kvm_nested_vmenter_failed(
|
||||
"eVMCS: unsupported VM-exit controls",
|
||||
unsupp_ctl);
|
||||
ret = -EINVAL;
|
||||
}
|
||||
if (CC(!nested_evmcs_is_valid_controls(EVMCS_ENTRY_CTRLS,
|
||||
vmcs12->vm_entry_controls)))
|
||||
return -EINVAL;
|
||||
|
||||
unsupp_ctl = vmcs12->vm_entry_controls &
|
||||
EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
|
||||
if (unsupp_ctl) {
|
||||
trace_kvm_nested_vmenter_failed(
|
||||
"eVMCS: unsupported VM-entry controls",
|
||||
unsupp_ctl);
|
||||
ret = -EINVAL;
|
||||
}
|
||||
/*
|
||||
* VM-Func controls are 64-bit, but KVM currently doesn't support any
|
||||
* controls in bits 63:32, i.e. dropping those bits on the consistency
|
||||
* check is intentional.
|
||||
*/
|
||||
if (WARN_ON_ONCE(vmcs12->vm_function_control >> 32))
|
||||
return -EINVAL;
|
||||
|
||||
unsupp_ctl = vmcs12->vm_function_control & EVMCS1_UNSUPPORTED_VMFUNC;
|
||||
if (unsupp_ctl) {
|
||||
trace_kvm_nested_vmenter_failed(
|
||||
"eVMCS: unsupported VM-function controls",
|
||||
unsupp_ctl);
|
||||
ret = -EINVAL;
|
||||
}
|
||||
if (CC(!nested_evmcs_is_valid_controls(EVMCS_VMFUNC,
|
||||
vmcs12->vm_function_control)))
|
||||
return -EINVAL;
|
||||
|
||||
return ret;
|
||||
return 0;
|
||||
}
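The rewrite above leans on the CC() consistency-check macro (aliased to KVM_NESTED_VMENTER_CONSISTENCY_CHECK at the top of this file), which evaluates the check and emits the kvm_nested_vmenter_failed tracepoint with the stringified expression, replacing the hand-rolled trace calls that are removed here. An approximate expansion, stated as an assumption since the real definition lives in nested.h:

#define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)	\
({									\
	bool failed = (consistency_check);				\
	if (failed)							\
		trace_kvm_nested_vmenter_failed(#consistency_check, 0);	\
	failed;								\
})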
|
||||
|
||||
int nested_enable_evmcs(struct kvm_vcpu *vcpu,
|
||||
|
@ -42,8 +42,6 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
|
||||
* PLE_GAP = 0x00004020,
|
||||
* PLE_WINDOW = 0x00004022,
|
||||
* VMX_PREEMPTION_TIMER_VALUE = 0x0000482E,
|
||||
* GUEST_IA32_PERF_GLOBAL_CTRL = 0x00002808,
|
||||
* HOST_IA32_PERF_GLOBAL_CTRL = 0x00002c04,
|
||||
*
|
||||
* Currently unsupported in KVM:
|
||||
* GUEST_IA32_RTIT_CTL = 0x00002814,
|
||||
@ -61,9 +59,8 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
|
||||
SECONDARY_EXEC_TSC_SCALING | \
|
||||
SECONDARY_EXEC_PAUSE_LOOP_EXITING)
|
||||
#define EVMCS1_UNSUPPORTED_VMEXIT_CTRL \
|
||||
(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | \
|
||||
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
|
||||
#define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
|
||||
(VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
|
||||
#define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (0)
|
||||
#define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
|
||||
|
||||
struct evmcs_field {
|
||||
@ -212,7 +209,6 @@ static inline void evmcs_load(u64 phys_addr)
|
||||
vp_ap->enlighten_vmentry = 1;
|
||||
}
|
||||
|
||||
__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf);
|
||||
#else /* !IS_ENABLED(CONFIG_HYPERV) */
|
||||
static __always_inline void evmcs_write64(unsigned long field, u64 value) {}
|
||||
static inline void evmcs_write32(unsigned long field, u32 value) {}
|
||||
@ -243,7 +239,7 @@ bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa);
|
||||
uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
|
||||
int nested_enable_evmcs(struct kvm_vcpu *vcpu,
|
||||
uint16_t *vmcs_version);
|
||||
void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
|
||||
void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
|
||||
int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
|
||||
|
||||
#endif /* __KVM_X86_VMX_EVMCS_H */
|
||||
|
@ -439,61 +439,22 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
|
||||
return inequality ^ bit;
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* KVM wants to inject page-faults which it got to the guest. This function
|
||||
* checks whether in a nested guest, we need to inject them to L1 or L2.
|
||||
*/
|
||||
static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit_qual)
|
||||
{
|
||||
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
|
||||
unsigned int nr = vcpu->arch.exception.nr;
|
||||
bool has_payload = vcpu->arch.exception.has_payload;
|
||||
unsigned long payload = vcpu->arch.exception.payload;
|
||||
|
||||
if (nr == PF_VECTOR) {
|
||||
if (vcpu->arch.exception.nested_apf) {
|
||||
*exit_qual = vcpu->arch.apf.nested_apf_token;
|
||||
return 1;
|
||||
}
|
||||
if (nested_vmx_is_page_fault_vmexit(vmcs12,
|
||||
vcpu->arch.exception.error_code)) {
|
||||
*exit_qual = has_payload ? payload : vcpu->arch.cr2;
|
||||
return 1;
|
||||
}
|
||||
} else if (vmcs12->exception_bitmap & (1u << nr)) {
|
||||
if (nr == DB_VECTOR) {
|
||||
if (!has_payload) {
|
||||
payload = vcpu->arch.dr6;
|
||||
payload &= ~DR6_BT;
|
||||
payload ^= DR6_ACTIVE_LOW;
|
||||
}
|
||||
*exit_qual = payload;
|
||||
} else
|
||||
*exit_qual = 0;
|
||||
return 1;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static bool nested_vmx_handle_page_fault_workaround(struct kvm_vcpu *vcpu,
|
||||
struct x86_exception *fault)
|
||||
static bool nested_vmx_is_exception_vmexit(struct kvm_vcpu *vcpu, u8 vector,
|
||||
u32 error_code)
|
||||
{
|
||||
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
|
||||
|
||||
WARN_ON(!is_guest_mode(vcpu));
|
||||
/*
|
||||
* Drop bits 31:16 of the error code when performing the #PF mask+match
|
||||
* check. All VMCS fields involved are 32 bits, but Intel CPUs never
|
||||
* set bits 31:16 and VMX disallows setting bits 31:16 in the injected
|
||||
* error code. Including the to-be-dropped bits in the check might
|
||||
* result in an "impossible" or missed exit from L1's perspective.
|
||||
*/
|
||||
if (vector == PF_VECTOR)
|
||||
return nested_vmx_is_page_fault_vmexit(vmcs12, (u16)error_code);
|
||||
|
||||
if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) &&
|
||||
!WARN_ON_ONCE(to_vmx(vcpu)->nested.nested_run_pending)) {
|
||||
vmcs12->vm_exit_intr_error_code = fault->error_code;
|
||||
nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
|
||||
PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
|
||||
INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
|
||||
fault->address);
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
return (vmcs12->exception_bitmap & (1u << vector));
|
||||
}
|
||||
|
||||
static int nested_vmx_check_io_bitmap_controls(struct kvm_vcpu *vcpu,
|
||||
@ -1607,6 +1568,10 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
|
||||
vmcs12->guest_rflags = evmcs->guest_rflags;
|
||||
vmcs12->guest_interruptibility_info =
|
||||
evmcs->guest_interruptibility_info;
|
||||
/*
|
||||
* Not present in struct vmcs12:
|
||||
* vmcs12->guest_ssp = evmcs->guest_ssp;
|
||||
*/
|
||||
}
|
||||
|
||||
if (unlikely(!(hv_clean_fields &
|
||||
@ -1653,6 +1618,13 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
|
||||
vmcs12->host_fs_selector = evmcs->host_fs_selector;
|
||||
vmcs12->host_gs_selector = evmcs->host_gs_selector;
|
||||
vmcs12->host_tr_selector = evmcs->host_tr_selector;
|
||||
vmcs12->host_ia32_perf_global_ctrl = evmcs->host_ia32_perf_global_ctrl;
|
||||
/*
|
||||
* Not present in struct vmcs12:
|
||||
* vmcs12->host_ia32_s_cet = evmcs->host_ia32_s_cet;
|
||||
* vmcs12->host_ssp = evmcs->host_ssp;
|
||||
* vmcs12->host_ia32_int_ssp_table_addr = evmcs->host_ia32_int_ssp_table_addr;
|
||||
*/
|
||||
}
|
||||
|
||||
if (unlikely(!(hv_clean_fields &
|
||||
@ -1720,6 +1692,8 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
|
||||
vmcs12->tsc_offset = evmcs->tsc_offset;
|
||||
vmcs12->virtual_apic_page_addr = evmcs->virtual_apic_page_addr;
|
||||
vmcs12->xss_exit_bitmap = evmcs->xss_exit_bitmap;
|
||||
vmcs12->encls_exiting_bitmap = evmcs->encls_exiting_bitmap;
|
||||
vmcs12->tsc_multiplier = evmcs->tsc_multiplier;
|
||||
}
|
||||
|
||||
if (unlikely(!(hv_clean_fields &
|
||||
@ -1767,6 +1741,13 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
|
||||
vmcs12->guest_bndcfgs = evmcs->guest_bndcfgs;
|
||||
vmcs12->guest_activity_state = evmcs->guest_activity_state;
|
||||
vmcs12->guest_sysenter_cs = evmcs->guest_sysenter_cs;
|
||||
vmcs12->guest_ia32_perf_global_ctrl = evmcs->guest_ia32_perf_global_ctrl;
|
||||
/*
|
||||
* Not present in struct vmcs12:
|
||||
* vmcs12->guest_ia32_s_cet = evmcs->guest_ia32_s_cet;
|
||||
* vmcs12->guest_ia32_lbr_ctl = evmcs->guest_ia32_lbr_ctl;
|
||||
* vmcs12->guest_ia32_int_ssp_table_addr = evmcs->guest_ia32_int_ssp_table_addr;
|
||||
*/
|
||||
}
|
||||
|
||||
/*
|
||||
@ -1869,12 +1850,23 @@ static void copy_vmcs12_to_enlightened(struct vcpu_vmx *vmx)
|
||||
* evmcs->vm_exit_msr_store_count = vmcs12->vm_exit_msr_store_count;
|
||||
* evmcs->vm_exit_msr_load_count = vmcs12->vm_exit_msr_load_count;
|
||||
* evmcs->vm_entry_msr_load_count = vmcs12->vm_entry_msr_load_count;
|
||||
* evmcs->guest_ia32_perf_global_ctrl = vmcs12->guest_ia32_perf_global_ctrl;
|
||||
* evmcs->host_ia32_perf_global_ctrl = vmcs12->host_ia32_perf_global_ctrl;
|
||||
* evmcs->encls_exiting_bitmap = vmcs12->encls_exiting_bitmap;
|
||||
* evmcs->tsc_multiplier = vmcs12->tsc_multiplier;
|
||||
*
|
||||
* Not present in struct vmcs12:
|
||||
* evmcs->exit_io_instruction_ecx = vmcs12->exit_io_instruction_ecx;
|
||||
* evmcs->exit_io_instruction_esi = vmcs12->exit_io_instruction_esi;
|
||||
* evmcs->exit_io_instruction_edi = vmcs12->exit_io_instruction_edi;
|
||||
* evmcs->exit_io_instruction_eip = vmcs12->exit_io_instruction_eip;
|
||||
* evmcs->host_ia32_s_cet = vmcs12->host_ia32_s_cet;
|
||||
* evmcs->host_ssp = vmcs12->host_ssp;
|
||||
* evmcs->host_ia32_int_ssp_table_addr = vmcs12->host_ia32_int_ssp_table_addr;
|
||||
* evmcs->guest_ia32_s_cet = vmcs12->guest_ia32_s_cet;
|
||||
* evmcs->guest_ia32_lbr_ctl = vmcs12->guest_ia32_lbr_ctl;
|
||||
* evmcs->guest_ia32_int_ssp_table_addr = vmcs12->guest_ia32_int_ssp_table_addr;
|
||||
* evmcs->guest_ssp = vmcs12->guest_ssp;
|
||||
*/
|
||||
|
||||
evmcs->guest_es_selector = vmcs12->guest_es_selector;
|
||||
@ -1982,7 +1974,7 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld(
|
||||
bool evmcs_gpa_changed = false;
|
||||
u64 evmcs_gpa;
|
||||
|
||||
if (likely(!vmx->nested.enlightened_vmcs_enabled))
|
||||
if (likely(!guest_cpuid_has_evmcs(vcpu)))
|
||||
return EVMPTRLD_DISABLED;
|
||||
|
||||
if (!nested_enlightened_vmentry(vcpu, &evmcs_gpa)) {
|
||||
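This and several later hunks switch call sites from vmx->nested.enlightened_vmcs_enabled to guest_cpuid_has_evmcs(), so eVMCS handling is also gated on Hyper-V being exposed in the guest's CPUID. The helper's definition is not part of this excerpt; a plausible shape, stated here as an assumption:

/* Assumed sketch of the helper; the real definition is elsewhere in the series. */
static inline bool guest_cpuid_has_evmcs(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.hyperv_enabled &&
	       to_vmx(vcpu)->nested.enlightened_vmcs_enabled;
}
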
@ -2328,9 +2320,14 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
|
||||
* are emulated by vmx_set_efer() in prepare_vmcs02(), but speculate
|
||||
* on the related bits (if supported by the CPU) in the hope that
|
||||
* we can avoid VMWrites during vmx_set_efer().
|
||||
*
|
||||
* Similarly, take vmcs01's PERF_GLOBAL_CTRL in the hope that if KVM is
|
||||
* loading PERF_GLOBAL_CTRL via the VMCS for L1, then KVM will want to
|
||||
* do the same for L2.
|
||||
*/
|
||||
exec_control = __vm_entry_controls_get(vmcs01);
|
||||
exec_control |= vmcs12->vm_entry_controls;
|
||||
exec_control |= (vmcs12->vm_entry_controls &
|
||||
~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
|
||||
exec_control &= ~(VM_ENTRY_IA32E_MODE | VM_ENTRY_LOAD_IA32_EFER);
|
||||
if (cpu_has_load_ia32_efer()) {
|
||||
if (guest_efer & EFER_LMA)
|
||||
@ -2863,7 +2860,7 @@ static int nested_vmx_check_controls(struct kvm_vcpu *vcpu,
|
||||
nested_check_vm_entry_controls(vcpu, vmcs12))
|
||||
return -EINVAL;
|
||||
|
||||
if (to_vmx(vcpu)->nested.enlightened_vmcs_enabled)
|
||||
if (guest_cpuid_has_evmcs(vcpu))
|
||||
return nested_evmcs_check_controls(vmcs12);
|
||||
|
||||
return 0;
|
||||
@ -3145,7 +3142,7 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
|
||||
* L2 was running), map it here to make sure vmcs12 changes are
|
||||
* properly reflected.
|
||||
*/
|
||||
if (vmx->nested.enlightened_vmcs_enabled &&
|
||||
if (guest_cpuid_has_evmcs(vcpu) &&
|
||||
vmx->nested.hv_evmcs_vmptr == EVMPTR_MAP_PENDING) {
|
||||
enum nested_evmptrld_status evmptrld_status =
|
||||
nested_vmx_handle_enlightened_vmptrld(vcpu, false);
|
||||
@ -3364,12 +3361,24 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
|
||||
};
|
||||
u32 failed_index;
|
||||
|
||||
trace_kvm_nested_vmenter(kvm_rip_read(vcpu),
|
||||
vmx->nested.current_vmptr,
|
||||
vmcs12->guest_rip,
|
||||
vmcs12->guest_intr_status,
|
||||
vmcs12->vm_entry_intr_info_field,
|
||||
vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT,
|
||||
vmcs12->ept_pointer,
|
||||
vmcs12->guest_cr3,
|
||||
KVM_ISA_VMX);
|
||||
|
||||
kvm_service_local_tlb_flush_requests(vcpu);
|
||||
|
||||
evaluate_pending_interrupts = exec_controls_get(vmx) &
|
||||
(CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
|
||||
if (likely(!evaluate_pending_interrupts) && kvm_vcpu_apicv_active(vcpu))
|
||||
evaluate_pending_interrupts |= vmx_has_apicv_interrupt(vcpu);
|
||||
if (!evaluate_pending_interrupts)
|
||||
evaluate_pending_interrupts |= kvm_apic_has_pending_init_or_sipi(vcpu);
|
||||
|
||||
if (!vmx->nested.nested_run_pending ||
|
||||
!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
|
||||
@ -3450,18 +3459,10 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
|
||||
}
|
||||
|
||||
/*
|
||||
* If L1 had a pending IRQ/NMI until it executed
|
||||
* VMLAUNCH/VMRESUME which wasn't delivered because it was
|
||||
* disallowed (e.g. interrupts disabled), L0 needs to
|
||||
* evaluate if this pending event should cause an exit from L2
|
||||
* to L1 or delivered directly to L2 (e.g. In case L1 don't
|
||||
* intercept EXTERNAL_INTERRUPT).
|
||||
*
|
||||
* Usually this would be handled by the processor noticing an
|
||||
* IRQ/NMI window request, or checking RVI during evaluation of
|
||||
* pending virtual interrupts. However, this setting was done
|
||||
* on VMCS01 and now VMCS02 is active instead. Thus, we force L0
|
||||
* to perform pending event evaluation by requesting a KVM_REQ_EVENT.
|
||||
* Re-evaluate pending events if L1 had a pending IRQ/NMI/INIT/SIPI
|
||||
* when it executed VMLAUNCH/VMRESUME, as entering non-root mode can
|
||||
* effectively unblock various events, e.g. INIT/SIPI cause VM-Exit
|
||||
* unconditionally.
|
||||
*/
|
||||
if (unlikely(evaluate_pending_interrupts))
|
||||
kvm_make_request(KVM_REQ_EVENT, vcpu);
|
||||
@ -3718,7 +3719,7 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
|
||||
is_double_fault(exit_intr_info))) {
|
||||
vmcs12->idt_vectoring_info_field = 0;
|
||||
} else if (vcpu->arch.exception.injected) {
|
||||
nr = vcpu->arch.exception.nr;
|
||||
nr = vcpu->arch.exception.vector;
|
||||
idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
|
||||
|
||||
if (kvm_exception_is_soft(nr)) {
|
||||
@ -3819,19 +3820,40 @@ mmio_needed:
|
||||
return -ENXIO;
|
||||
}
|
||||
|
||||
static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
|
||||
unsigned long exit_qual)
|
||||
static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit;
|
||||
u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
|
||||
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
|
||||
unsigned int nr = vcpu->arch.exception.nr;
|
||||
u32 intr_info = nr | INTR_INFO_VALID_MASK;
|
||||
unsigned long exit_qual;
|
||||
|
||||
if (vcpu->arch.exception.has_error_code) {
|
||||
vmcs12->vm_exit_intr_error_code = vcpu->arch.exception.error_code;
|
||||
if (ex->has_payload) {
|
||||
exit_qual = ex->payload;
|
||||
} else if (ex->vector == PF_VECTOR) {
|
||||
exit_qual = vcpu->arch.cr2;
|
||||
} else if (ex->vector == DB_VECTOR) {
|
||||
exit_qual = vcpu->arch.dr6;
|
||||
exit_qual &= ~DR6_BT;
|
||||
exit_qual ^= DR6_ACTIVE_LOW;
|
||||
} else {
|
||||
exit_qual = 0;
|
||||
}
|
||||
|
||||
if (ex->has_error_code) {
|
||||
/*
|
||||
* Intel CPUs do not generate error codes with bits 31:16 set,
|
||||
* and more importantly VMX disallows setting bits 31:16 in the
|
||||
* injected error code for VM-Entry. Drop the bits to mimic
|
||||
* hardware and avoid inducing failure on nested VM-Entry if L1
|
||||
* chooses to inject the exception back to L2. AMD CPUs _do_
|
||||
* generate "full" 32-bit error codes, so KVM allows userspace
|
||||
* to inject exception error codes with bits 31:16 set.
|
||||
*/
|
||||
vmcs12->vm_exit_intr_error_code = (u16)ex->error_code;
|
||||
intr_info |= INTR_INFO_DELIVER_CODE_MASK;
|
||||
}
|
||||
|
||||
if (kvm_exception_is_soft(nr))
|
||||
if (kvm_exception_is_soft(ex->vector))
|
||||
intr_info |= INTR_TYPE_SOFT_EXCEPTION;
|
||||
else
|
||||
intr_info |= INTR_TYPE_HARD_EXCEPTION;
|
||||
@ -3844,16 +3866,39 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns true if a debug trap is pending delivery.
|
||||
* Returns true if a debug trap is (likely) pending delivery. Infer the class
|
||||
* of a #DB (trap-like vs. fault-like) from the exception payload (to-be-DR6).
|
||||
* Using the payload is flawed because code breakpoints (fault-like) and data
|
||||
* breakpoints (trap-like) set the same bits in DR6 (breakpoint detected), i.e.
|
||||
* this will return false positives if a to-be-injected code breakpoint #DB is
|
||||
* pending (from KVM's perspective, but not "pending" across an instruction
|
||||
* boundary). ICEBP, a.k.a. INT1, is also not reflected here even though it
|
||||
* too is trap-like.
|
||||
*
|
||||
* In KVM, debug traps bear an exception payload. As such, the class of a #DB
|
||||
* exception may be inferred from the presence of an exception payload.
|
||||
* KVM "works" despite these flaws as ICEBP isn't currently supported by the
|
||||
* emulator, Monitor Trap Flag is not marked pending on intercepted #DBs (the
|
||||
* #DB has already happened), and MTF isn't marked pending on code breakpoints
|
||||
* from the emulator (because such #DBs are fault-like and thus don't trigger
|
||||
* actions that fire on instruction retire).
|
||||
*/
|
||||
static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu)
|
||||
static unsigned long vmx_get_pending_dbg_trap(struct kvm_queued_exception *ex)
|
||||
{
|
||||
return vcpu->arch.exception.pending &&
|
||||
vcpu->arch.exception.nr == DB_VECTOR &&
|
||||
vcpu->arch.exception.payload;
|
||||
if (!ex->pending || ex->vector != DB_VECTOR)
|
||||
return 0;
|
||||
|
||||
/* General Detect #DBs are always fault-like. */
|
||||
return ex->payload & ~DR6_BD;
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns true if there's a pending #DB exception that is lower priority than
|
||||
* a pending Monitor Trap Flag VM-Exit. TSS T-flag #DBs are not emulated by
|
||||
* KVM, but could theoretically be injected by userspace. Note, this code is
|
||||
* imperfect, see above.
|
||||
*/
|
||||
static bool vmx_is_low_priority_db_trap(struct kvm_queued_exception *ex)
|
||||
{
|
||||
return vmx_get_pending_dbg_trap(ex) & ~DR6_BT;
|
||||
}
|
||||
|
||||
/*
|
||||
@ -3865,9 +3910,11 @@ static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu)
|
||||
*/
|
||||
static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (vmx_pending_dbg_trap(vcpu))
|
||||
vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
|
||||
vcpu->arch.exception.payload);
|
||||
unsigned long pending_dbg;
|
||||
|
||||
pending_dbg = vmx_get_pending_dbg_trap(&vcpu->arch.exception);
|
||||
if (pending_dbg)
|
||||
vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, pending_dbg);
|
||||
}
|
||||
|
||||
static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
|
||||
@ -3876,21 +3923,113 @@ static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
|
||||
to_vmx(vcpu)->nested.preemption_timer_expired;
|
||||
}
|
||||
|
||||
static bool vmx_has_nested_events(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return nested_vmx_preemption_timer_pending(vcpu) ||
|
||||
to_vmx(vcpu)->nested.mtf_pending;
|
||||
}
|
||||
|
||||
/*
|
||||
* Per the Intel SDM's table "Priority Among Concurrent Events", with minor
|
||||
* edits to fill in missing examples, e.g. #DB due to split-lock accesses,
|
||||
* and less minor edits to splice in the priority of VMX Non-Root specific
|
||||
* events, e.g. MTF and NMI/INTR-window exiting.
|
||||
*
|
||||
* 1 Hardware Reset and Machine Checks
|
||||
* - RESET
|
||||
* - Machine Check
|
||||
*
|
||||
* 2 Trap on Task Switch
|
||||
* - T flag in TSS is set (on task switch)
|
||||
*
|
||||
* 3 External Hardware Interventions
|
||||
* - FLUSH
|
||||
* - STOPCLK
|
||||
* - SMI
|
||||
* - INIT
|
||||
*
|
||||
* 3.5 Monitor Trap Flag (MTF) VM-exit[1]
|
||||
*
|
||||
* 4 Traps on Previous Instruction
|
||||
* - Breakpoints
|
||||
* - Trap-class Debug Exceptions (#DB due to TF flag set, data/I-O
|
||||
* breakpoint, or #DB due to a split-lock access)
|
||||
*
|
||||
* 4.3 VMX-preemption timer expired VM-exit
|
||||
*
|
||||
* 4.6 NMI-window exiting VM-exit[2]
|
||||
*
|
||||
* 5 Nonmaskable Interrupts (NMI)
|
||||
*
|
||||
* 5.5 Interrupt-window exiting VM-exit and Virtual-interrupt delivery
|
||||
*
|
||||
* 6 Maskable Hardware Interrupts
|
||||
*
|
||||
* 7 Code Breakpoint Fault
|
||||
*
|
||||
* 8 Faults from Fetching Next Instruction
|
||||
* - Code-Segment Limit Violation
|
||||
* - Code Page Fault
|
||||
* - Control protection exception (missing ENDBRANCH at target of indirect
|
||||
* call or jump)
|
||||
*
|
||||
* 9 Faults from Decoding Next Instruction
|
||||
* - Instruction length > 15 bytes
|
||||
* - Invalid Opcode
|
||||
* - Coprocessor Not Available
|
||||
*
|
||||
*10 Faults on Executing Instruction
|
||||
* - Overflow
|
||||
* - Bound error
|
||||
* - Invalid TSS
|
||||
* - Segment Not Present
|
||||
* - Stack fault
|
||||
* - General Protection
|
||||
* - Data Page Fault
|
||||
* - Alignment Check
|
||||
* - x86 FPU Floating-point exception
|
||||
* - SIMD floating-point exception
|
||||
* - Virtualization exception
|
||||
* - Control protection exception
|
||||
*
|
||||
* [1] Per the "Monitor Trap Flag" section: System-management interrupts (SMIs),
|
||||
* INIT signals, and higher priority events take priority over MTF VM exits.
|
||||
* MTF VM exits take priority over debug-trap exceptions and lower priority
|
||||
* events.
|
||||
*
|
||||
* [2] Debug-trap exceptions and higher priority events take priority over VM exits
|
||||
* caused by the VMX-preemption timer. VM exits caused by the VMX-preemption
|
||||
* timer take priority over VM exits caused by the "NMI-window exiting"
|
||||
* VM-execution control and lower priority events.
|
||||
*
|
||||
* [3] Debug-trap exceptions and higher priority events take priority over VM exits
|
||||
* caused by "NMI-window exiting". VM exits caused by this control take
|
||||
* priority over non-maskable interrupts (NMIs) and lower priority events.
|
||||
*
|
||||
* [4] Virtual-interrupt delivery has the same priority as that of VM exits due to
|
||||
* the 1-setting of the "interrupt-window exiting" VM-execution control. Thus,
|
||||
* non-maskable interrupts (NMIs) and higher priority events take priority over
|
||||
* delivery of a virtual interrupt; delivery of a virtual interrupt takes
|
||||
* priority over external interrupts and lower priority events.
|
||||
*/
|
||||
static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct vcpu_vmx *vmx = to_vmx(vcpu);
|
||||
unsigned long exit_qual;
|
||||
bool block_nested_events =
|
||||
vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
|
||||
bool mtf_pending = vmx->nested.mtf_pending;
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
|
||||
struct vcpu_vmx *vmx = to_vmx(vcpu);
|
||||
/*
|
||||
* Clear the MTF state. If a higher priority VM-exit is delivered first,
|
||||
* this state is discarded.
|
||||
* Only a pending nested run blocks a pending exception. If there is a
|
||||
* previously injected event, the pending exception occurred while said
|
||||
* event was being delivered and thus needs to be handled.
|
||||
*/
|
||||
if (!block_nested_events)
|
||||
vmx->nested.mtf_pending = false;
|
||||
bool block_nested_exceptions = vmx->nested.nested_run_pending;
|
||||
/*
|
||||
* New events (not exceptions) are only recognized at instruction
|
||||
* boundaries. If an event needs reinjection, then KVM is handling a
|
||||
* VM-Exit that occurred _during_ instruction execution; new events are
|
||||
* blocked until the instruction completes.
|
||||
*/
|
||||
bool block_nested_events = block_nested_exceptions ||
|
||||
kvm_event_needs_reinjection(vcpu);
|
||||
|
||||
if (lapic_in_kernel(vcpu) &&
|
||||
test_bit(KVM_APIC_INIT, &apic->pending_events)) {
|
||||
@ -3900,6 +4039,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
|
||||
clear_bit(KVM_APIC_INIT, &apic->pending_events);
|
||||
if (vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED)
|
||||
nested_vmx_vmexit(vcpu, EXIT_REASON_INIT_SIGNAL, 0, 0);
|
||||
|
||||
/* MTF is discarded if the vCPU is in WFS. */
|
||||
vmx->nested.mtf_pending = false;
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -3909,31 +4051,41 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
|
||||
return -EBUSY;
|
||||
|
||||
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
|
||||
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED)
|
||||
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
|
||||
nested_vmx_vmexit(vcpu, EXIT_REASON_SIPI_SIGNAL, 0,
|
||||
apic->sipi_vector & 0xFFUL);
|
||||
return 0;
|
||||
return 0;
|
||||
}
|
||||
/* Fallthrough, the SIPI is completely ignored. */
|
||||
}
|
||||
|
||||
/*
|
||||
* Process any exceptions that are not debug traps before MTF.
|
||||
* Process exceptions that are higher priority than Monitor Trap Flag:
|
||||
* fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
|
||||
* could theoretically come in from userspace), and ICEBP (INT1).
|
||||
*
|
||||
* Note that only a pending nested run can block a pending exception.
|
||||
* Otherwise an injected NMI/interrupt should either be
|
||||
* lost or delivered to the nested hypervisor in the IDT_VECTORING_INFO,
|
||||
* while delivering the pending exception.
|
||||
* TODO: SMIs have higher priority than MTF and trap-like #DBs (except
|
||||
* for TSS T flag #DBs). KVM also doesn't save/restore pending MTF
|
||||
* across SMI/RSM as it should; that needs to be addressed in order to
|
||||
* prioritize SMI over MTF and trap-like #DBs.
|
||||
*/
|
||||
|
||||
if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu)) {
|
||||
if (vmx->nested.nested_run_pending)
|
||||
if (vcpu->arch.exception_vmexit.pending &&
|
||||
!vmx_is_low_priority_db_trap(&vcpu->arch.exception_vmexit)) {
|
||||
if (block_nested_exceptions)
|
||||
return -EBUSY;
|
||||
if (!nested_vmx_check_exception(vcpu, &exit_qual))
|
||||
goto no_vmexit;
|
||||
nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
|
||||
|
||||
nested_vmx_inject_exception_vmexit(vcpu);
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (mtf_pending) {
|
||||
if (vcpu->arch.exception.pending &&
|
||||
!vmx_is_low_priority_db_trap(&vcpu->arch.exception)) {
|
||||
if (block_nested_exceptions)
|
||||
return -EBUSY;
|
||||
goto no_vmexit;
|
||||
}
|
||||
|
||||
if (vmx->nested.mtf_pending) {
|
||||
if (block_nested_events)
|
||||
return -EBUSY;
|
||||
nested_vmx_update_pending_dbg(vcpu);
|
||||
@ -3941,15 +4093,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (vcpu->arch.exception.pending) {
|
||||
if (vmx->nested.nested_run_pending)
|
||||
if (vcpu->arch.exception_vmexit.pending) {
|
||||
if (block_nested_exceptions)
|
||||
return -EBUSY;
|
||||
if (!nested_vmx_check_exception(vcpu, &exit_qual))
|
||||
goto no_vmexit;
|
||||
nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
|
||||
|
||||
nested_vmx_inject_exception_vmexit(vcpu);
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (vcpu->arch.exception.pending) {
|
||||
if (block_nested_exceptions)
|
||||
return -EBUSY;
|
||||
goto no_vmexit;
|
||||
}
|
||||
|
||||
if (nested_vmx_preemption_timer_pending(vcpu)) {
|
||||
if (block_nested_events)
|
||||
return -EBUSY;
|
||||
@ -4255,14 +4412,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
|
||||
nested_vmx_abort(vcpu,
|
||||
VMX_ABORT_SAVE_GUEST_MSR_FAIL);
|
||||
}
|
||||
|
||||
/*
|
||||
* Drop what we picked up for L2 via vmx_complete_interrupts. It is
|
||||
* preserved above and would only end up incorrectly in L1.
|
||||
*/
|
||||
vcpu->arch.nmi_injected = false;
|
||||
kvm_clear_exception_queue(vcpu);
|
||||
kvm_clear_interrupt_queue(vcpu);
|
||||
}
|
||||
|
||||
/*
|
||||
@ -4538,6 +4687,9 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
|
||||
struct vcpu_vmx *vmx = to_vmx(vcpu);
|
||||
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
|
||||
|
||||
/* Pending MTF traps are discarded on VM-Exit. */
|
||||
vmx->nested.mtf_pending = false;
|
||||
|
||||
/* trying to cancel vmlaunch/vmresume is a bug */
|
||||
WARN_ON_ONCE(vmx->nested.nested_run_pending);
|
||||
|
||||
@ -4602,6 +4754,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
|
||||
WARN_ON_ONCE(nested_early_check);
|
||||
}
|
||||
|
||||
/*
|
||||
* Drop events/exceptions that were queued for re-injection to L2
|
||||
* (picked up via vmx_complete_interrupts()), as well as exceptions
|
||||
* that were pending for L2. Note, this must NOT be hoisted above
|
||||
* prepare_vmcs12(), events/exceptions queued for re-injection need to
|
||||
* be captured in vmcs12 (see vmcs12_save_pending_event()).
|
||||
*/
|
||||
vcpu->arch.nmi_injected = false;
|
||||
kvm_clear_exception_queue(vcpu);
|
||||
kvm_clear_interrupt_queue(vcpu);
|
||||
|
||||
vmx_switch_vmcs(vcpu, &vmx->vmcs01);
|
||||
|
||||
/* Update any VMCS fields that might have changed while L2 ran */
|
||||
@@ -5030,8 +5193,8 @@ static int handle_vmxoff(struct kvm_vcpu *vcpu)

free_nested(vcpu);

/* Process a latched INIT during time CPU was in VMX operation */
kvm_make_request(KVM_REQ_EVENT, vcpu);
if (kvm_apic_has_pending_init_or_sipi(vcpu))
kvm_make_request(KVM_REQ_EVENT, vcpu);

return nested_vmx_succeed(vcpu);
}
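handle_vmxoff() now only forces event re-evaluation when an INIT or SIPI is actually latched. The helper it calls is not shown in this excerpt; its presumed shape (an assumption, the real definition lives in the local APIC code) is simply:

/* Assumed sketch: reports whether an INIT/SIPI is latched in the in-kernel APIC. */
static inline bool kvm_apic_has_pending_init_or_sipi(struct kvm_vcpu *vcpu)
{
	return lapic_in_kernel(vcpu) && vcpu->arch.apic->pending_events;
}
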
@ -5067,7 +5230,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
|
||||
* state. It is possible that the area will stay mapped as
|
||||
* vmx->nested.hv_evmcs but this shouldn't be a problem.
|
||||
*/
|
||||
if (likely(!vmx->nested.enlightened_vmcs_enabled ||
|
||||
if (likely(!guest_cpuid_has_evmcs(vcpu) ||
|
||||
!nested_enlightened_vmentry(vcpu, &evmcs_gpa))) {
|
||||
if (vmptr == vmx->nested.current_vmptr)
|
||||
nested_release_vmcs12(vcpu);
|
||||
@ -6463,6 +6626,9 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
|
||||
if (ret)
|
||||
goto error_guest_mode;
|
||||
|
||||
if (vmx->nested.mtf_pending)
|
||||
kvm_make_request(KVM_REQ_EVENT, vcpu);
|
||||
|
||||
return 0;
|
||||
|
||||
error_guest_mode:
|
||||
@ -6522,8 +6688,10 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void)
|
||||
* bit in the high half is on if the corresponding bit in the control field
|
||||
* may be on. See also vmx_control_verify().
|
||||
*/
|
||||
void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
|
||||
void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
|
||||
{
|
||||
struct nested_vmx_msrs *msrs = &vmcs_conf->nested;
|
||||
|
||||
/*
|
||||
* Note that as a general rule, the high half of the MSRs (bits in
|
||||
* the control fields which may be 1) should be initialized by the
|
||||
@ -6540,11 +6708,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
|
||||
*/
|
||||
|
||||
/* pin-based controls */
|
||||
rdmsr(MSR_IA32_VMX_PINBASED_CTLS,
|
||||
msrs->pinbased_ctls_low,
|
||||
msrs->pinbased_ctls_high);
|
||||
msrs->pinbased_ctls_low |=
|
||||
msrs->pinbased_ctls_low =
|
||||
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
|
||||
|
||||
msrs->pinbased_ctls_high = vmcs_conf->pin_based_exec_ctrl;
|
||||
msrs->pinbased_ctls_high &=
|
||||
PIN_BASED_EXT_INTR_MASK |
|
||||
PIN_BASED_NMI_EXITING |
|
||||
@ -6555,50 +6722,47 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
|
||||
PIN_BASED_VMX_PREEMPTION_TIMER;
|
||||
|
||||
/* exit controls */
|
||||
rdmsr(MSR_IA32_VMX_EXIT_CTLS,
|
||||
msrs->exit_ctls_low,
|
||||
msrs->exit_ctls_high);
|
||||
msrs->exit_ctls_low =
|
||||
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
|
||||
|
||||
msrs->exit_ctls_high = vmcs_conf->vmexit_ctrl;
|
||||
msrs->exit_ctls_high &=
|
||||
#ifdef CONFIG_X86_64
|
||||
VM_EXIT_HOST_ADDR_SPACE_SIZE |
|
||||
#endif
|
||||
VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
|
||||
VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
|
||||
VM_EXIT_CLEAR_BNDCFGS;
|
||||
msrs->exit_ctls_high |=
|
||||
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
|
||||
VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
|
||||
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT;
|
||||
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT |
|
||||
VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
|
||||
|
||||
/* We support free control of debug control saving. */
|
||||
msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;
|
||||
|
||||
/* entry controls */
|
||||
rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
|
||||
msrs->entry_ctls_low,
|
||||
msrs->entry_ctls_high);
|
||||
msrs->entry_ctls_low =
|
||||
VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
|
||||
|
||||
msrs->entry_ctls_high = vmcs_conf->vmentry_ctrl;
|
||||
msrs->entry_ctls_high &=
|
||||
#ifdef CONFIG_X86_64
|
||||
VM_ENTRY_IA32E_MODE |
|
||||
#endif
|
||||
VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
|
||||
VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
|
||||
VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
|
||||
msrs->entry_ctls_high |=
|
||||
(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER);
|
||||
(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
|
||||
VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
|
||||
|
||||
/* We support free control of debug control loading. */
|
||||
msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;
|
||||
|
||||
/* cpu-based controls */
|
||||
rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
|
||||
msrs->procbased_ctls_low,
|
||||
msrs->procbased_ctls_high);
|
||||
msrs->procbased_ctls_low =
|
||||
CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
|
||||
|
||||
msrs->procbased_ctls_high = vmcs_conf->cpu_based_exec_ctrl;
|
||||
msrs->procbased_ctls_high &=
|
||||
CPU_BASED_INTR_WINDOW_EXITING |
|
||||
CPU_BASED_NMI_WINDOW_EXITING | CPU_BASED_USE_TSC_OFFSETTING |
|
||||
@ -6632,12 +6796,9 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
|
||||
* depend on CPUID bits, they are added later by
|
||||
* vmx_vcpu_after_set_cpuid.
|
||||
*/
|
||||
if (msrs->procbased_ctls_high & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)
|
||||
rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
|
||||
msrs->secondary_ctls_low,
|
||||
msrs->secondary_ctls_high);
|
||||
|
||||
msrs->secondary_ctls_low = 0;
|
||||
|
||||
msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
|
||||
msrs->secondary_ctls_high &=
|
||||
SECONDARY_EXEC_DESC |
|
||||
SECONDARY_EXEC_ENABLE_RDTSCP |
|
||||
@ -6717,10 +6878,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
|
||||
msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING;
|
||||
|
||||
/* miscellaneous data */
|
||||
rdmsr(MSR_IA32_VMX_MISC,
|
||||
msrs->misc_low,
|
||||
msrs->misc_high);
|
||||
msrs->misc_low &= VMX_MISC_SAVE_EFER_LMA;
|
||||
msrs->misc_low = (u32)vmcs_conf->misc & VMX_MISC_SAVE_EFER_LMA;
|
||||
msrs->misc_low |=
|
||||
MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |
|
||||
VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
|
||||
@ -6814,9 +6972,9 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
|
||||
|
||||
struct kvm_x86_nested_ops vmx_nested_ops = {
|
||||
.leave_nested = vmx_leave_nested,
|
||||
.is_exception_vmexit = nested_vmx_is_exception_vmexit,
|
||||
.check_events = vmx_check_nested_events,
|
||||
.handle_page_fault_workaround = nested_vmx_handle_page_fault_workaround,
|
||||
.hv_timer_pending = nested_vmx_preemption_timer_pending,
|
||||
.has_events = vmx_has_nested_events,
|
||||
.triple_fault = nested_vmx_triple_fault,
|
||||
.get_state = vmx_get_nested_state,
|
||||
.set_state = vmx_set_nested_state,
|
||||
|
@ -17,7 +17,7 @@ enum nvmx_vmentry_status {
|
||||
};
|
||||
|
||||
void vmx_leave_nested(struct kvm_vcpu *vcpu);
|
||||
void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
|
||||
void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps);
|
||||
void nested_vmx_hardware_unsetup(void);
|
||||
__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
|
||||
void nested_vmx_set_vmcs_shadowing_bitmap(void);
|
||||
|
@ -129,7 +129,7 @@ static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr)
|
||||
ex.address = gva;
|
||||
ex.error_code_valid = true;
|
||||
ex.nested_page_fault = false;
|
||||
kvm_inject_page_fault(vcpu, &ex);
|
||||
kvm_inject_emulated_page_fault(vcpu, &ex);
|
||||
} else {
|
||||
kvm_inject_gp(vcpu, 0);
|
||||
}
|
||||
|
@ -189,13 +189,16 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
|
||||
xor %ebx, %ebx
|
||||
|
||||
.Lclear_regs:
|
||||
/* Discard @regs. The register is irrelevant, it just can't be RBX. */
|
||||
pop %_ASM_AX
|
||||
|
||||
/*
|
||||
* Clear all general purpose registers except RSP and RBX to prevent
|
||||
* speculative use of the guest's values, even those that are reloaded
|
||||
* via the stack. In theory, an L1 cache miss when restoring registers
|
||||
* could lead to speculative execution with the guest's values.
|
||||
* Zeroing XORs are dirt cheap, i.e. the extra paranoia is essentially
|
||||
* free. RSP and RAX are exempt as RSP is restored by hardware during
|
||||
* free. RSP and RBX are exempt as RSP is restored by hardware during
|
||||
* VM-Exit and RBX is explicitly loaded with 0 or 1 to hold the return
|
||||
* value.
|
||||
*/
|
||||
@ -216,9 +219,6 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
|
||||
xor %r15d, %r15d
|
||||
#endif
|
||||
|
||||
/* "POP" @regs. */
|
||||
add $WORD_SIZE, %_ASM_SP
|
||||
|
||||
/*
|
||||
* IMPORTANT: RSB filling and SPEC_CTRL handling must be done before
|
||||
* the first unbalanced RET after vmexit!
|
||||
@ -234,7 +234,6 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
|
||||
FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT,\
|
||||
X86_FEATURE_RSB_VMEXIT_LITE
|
||||
|
||||
|
||||
pop %_ASM_ARG2 /* @flags */
|
||||
pop %_ASM_ARG1 /* @vmx */
|
||||
|
||||
@ -293,22 +292,13 @@ SYM_FUNC_START(vmread_error_trampoline)
|
||||
push %r10
|
||||
push %r11
|
||||
#endif
|
||||
#ifdef CONFIG_X86_64
|
||||
|
||||
/* Load @field and @fault to arg1 and arg2 respectively. */
|
||||
mov 3*WORD_SIZE(%rbp), %_ASM_ARG2
|
||||
mov 2*WORD_SIZE(%rbp), %_ASM_ARG1
|
||||
#else
|
||||
/* Parameters are passed on the stack for 32-bit (see asmlinkage). */
|
||||
push 3*WORD_SIZE(%ebp)
|
||||
push 2*WORD_SIZE(%ebp)
|
||||
#endif
|
||||
mov 3*WORD_SIZE(%_ASM_BP), %_ASM_ARG2
|
||||
mov 2*WORD_SIZE(%_ASM_BP), %_ASM_ARG1
|
||||
|
||||
call vmread_error
|
||||
|
||||
#ifndef CONFIG_X86_64
|
||||
add $8, %esp
|
||||
#endif
|
||||
|
||||
/* Zero out @fault, which will be popped into the result register. */
|
||||
_ASM_MOV $0, 3*WORD_SIZE(%_ASM_BP)
|
||||
|
||||
|
@ -439,7 +439,7 @@ do { \
|
||||
pr_warn_ratelimited(fmt); \
|
||||
} while (0)
|
||||
|
||||
asmlinkage void vmread_error(unsigned long field, bool fault)
|
||||
void vmread_error(unsigned long field, bool fault)
|
||||
{
|
||||
if (fault)
|
||||
kvm_spurious_fault();
|
||||
@ -864,7 +864,7 @@ unsigned int __vmx_vcpu_run_flags(struct vcpu_vmx *vmx)
|
||||
return flags;
|
||||
}
|
||||
|
||||
static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
|
||||
static __always_inline void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
|
||||
unsigned long entry, unsigned long exit)
|
||||
{
|
||||
vm_entry_controls_clearbit(vmx, entry);
|
||||
@ -922,7 +922,7 @@ skip_guest:
|
||||
vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);
|
||||
}
|
||||
|
||||
static void add_atomic_switch_msr_special(struct vcpu_vmx *vmx,
|
||||
static __always_inline void add_atomic_switch_msr_special(struct vcpu_vmx *vmx,
|
||||
unsigned long entry, unsigned long exit,
|
||||
unsigned long guest_val_vmcs, unsigned long host_val_vmcs,
|
||||
u64 guest_val, u64 host_val)
|
||||
@ -1652,17 +1652,25 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
|
||||
|
||||
/*
|
||||
* Per the SDM, MTF takes priority over debug-trap exceptions besides
|
||||
* T-bit traps. As instruction emulation is completed (i.e. at the
|
||||
* instruction boundary), any #DB exception pending delivery must be a
|
||||
* debug-trap. Record the pending MTF state to be delivered in
|
||||
* TSS T-bit traps and ICEBP (INT1). KVM doesn't emulate T-bit traps
|
||||
* or ICEBP (in the emulator proper), and skipping of ICEBP after an
|
||||
* intercepted #DB deliberately avoids single-step #DB and MTF updates
|
||||
* as ICEBP is higher priority than both. As instruction emulation is
|
||||
* completed at this point (i.e. KVM is at the instruction boundary),
|
||||
* any #DB exception pending delivery must be a debug-trap of lower
|
||||
* priority than MTF. Record the pending MTF state to be delivered in
|
||||
* vmx_check_nested_events().
|
||||
*/
|
||||
if (nested_cpu_has_mtf(vmcs12) &&
|
||||
(!vcpu->arch.exception.pending ||
|
||||
vcpu->arch.exception.nr == DB_VECTOR))
|
||||
vcpu->arch.exception.vector == DB_VECTOR) &&
|
||||
(!vcpu->arch.exception_vmexit.pending ||
|
||||
vcpu->arch.exception_vmexit.vector == DB_VECTOR)) {
|
||||
vmx->nested.mtf_pending = true;
|
||||
else
|
||||
kvm_make_request(KVM_REQ_EVENT, vcpu);
|
||||
} else {
|
||||
vmx->nested.mtf_pending = false;
|
||||
}
|
||||
}
|
||||
|
||||
static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
|
||||
@ -1684,32 +1692,40 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
|
||||
vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
|
||||
}
|
||||
|
||||
static void vmx_queue_exception(struct kvm_vcpu *vcpu)
|
||||
static void vmx_inject_exception(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_queued_exception *ex = &vcpu->arch.exception;
|
||||
u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
|
||||
struct vcpu_vmx *vmx = to_vmx(vcpu);
|
||||
unsigned nr = vcpu->arch.exception.nr;
|
||||
bool has_error_code = vcpu->arch.exception.has_error_code;
|
||||
u32 error_code = vcpu->arch.exception.error_code;
|
||||
u32 intr_info = nr | INTR_INFO_VALID_MASK;
|
||||
|
||||
kvm_deliver_exception_payload(vcpu);
|
||||
kvm_deliver_exception_payload(vcpu, ex);
|
||||
|
||||
if (has_error_code) {
|
||||
vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
|
||||
if (ex->has_error_code) {
|
||||
/*
|
||||
* Despite the error code being architecturally defined as 32
|
||||
* bits, and the VMCS field being 32 bits, Intel CPUs and thus
|
||||
* VMX don't actually support setting bits 31:16. Hardware
|
||||
* will (should) never provide a bogus error code, but AMD CPUs
|
||||
* do generate error codes with bits 31:16 set, and so KVM's
|
||||
* ABI lets userspace shove in arbitrary 32-bit values. Drop
|
||||
* the upper bits to avoid VM-Fail, losing information that
|
||||
* doesn't really exist is preferable to killing the VM.
|
||||
*/
|
||||
vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)ex->error_code);
|
||||
intr_info |= INTR_INFO_DELIVER_CODE_MASK;
|
||||
}
|
||||
|
||||
if (vmx->rmode.vm86_active) {
|
||||
int inc_eip = 0;
|
||||
if (kvm_exception_is_soft(nr))
|
||||
if (kvm_exception_is_soft(ex->vector))
|
||||
inc_eip = vcpu->arch.event_exit_inst_len;
|
||||
kvm_inject_realmode_interrupt(vcpu, nr, inc_eip);
|
||||
kvm_inject_realmode_interrupt(vcpu, ex->vector, inc_eip);
|
||||
return;
|
||||
}
|
||||
|
||||
WARN_ON_ONCE(vmx->emulation_required);
|
||||
|
||||
if (kvm_exception_is_soft(nr)) {
|
||||
if (kvm_exception_is_soft(ex->vector)) {
|
||||
vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
|
||||
vmx->vcpu.arch.event_exit_inst_len);
|
||||
intr_info |= INTR_TYPE_SOFT_EXCEPTION;
|
||||
@ -1930,9 +1946,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
||||
* sanity checking and refuse to boot. Filter all unsupported
|
||||
* features out.
|
||||
*/
|
||||
if (!msr_info->host_initiated &&
|
||||
vmx->nested.enlightened_vmcs_enabled)
|
||||
nested_evmcs_filter_control_msr(msr_info->index,
|
||||
if (!msr_info->host_initiated && guest_cpuid_has_evmcs(vcpu))
|
||||
nested_evmcs_filter_control_msr(vcpu, msr_info->index,
|
||||
&msr_info->data);
|
||||
break;
|
||||
case MSR_IA32_RTIT_CTL:
|
||||
@@ -2494,6 +2509,30 @@ static bool cpu_has_sgx(void)
return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
}

/*
* Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
* can't be used due to errata where VM Exit may incorrectly clear
* IA32_PERF_GLOBAL_CTRL[34:32]. Work around the errata by using the
* MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL.
*/
static bool cpu_has_perf_global_ctrl_bug(void)
{
if (boot_cpu_data.x86 == 0x6) {
switch (boot_cpu_data.x86_model) {
case INTEL_FAM6_NEHALEM_EP: /* AAK155 */
case INTEL_FAM6_NEHALEM: /* AAP115 */
case INTEL_FAM6_WESTMERE: /* AAT100 */
case INTEL_FAM6_WESTMERE_EP: /* BC86,AAY89,BD102 */
case INTEL_FAM6_NEHALEM_EX: /* BA97 */
return true;
default:
break;
}
}

return false;
}

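cpu_has_perf_global_ctrl_bug() lets the setup path hide VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL on the affected steppings so the atomic MSR-load lists are used instead. A hedged sketch of a consumer (illustrative only, not the exact setup_vmcs_config() logic):

/* Illustrative: strip the PERF_GLOBAL_CTRL controls on buggy parts. */
static void example_filter_perf_global_ctrl(u32 *vmentry_ctrl, u32 *vmexit_ctrl)
{
	if (cpu_has_perf_global_ctrl_bug()) {
		*vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
		*vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
	}
}
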
static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
|
||||
u32 msr, u32 *result)
|
||||
{
|
||||
@ -2526,13 +2565,13 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
|
||||
struct vmx_capability *vmx_cap)
|
||||
{
|
||||
u32 vmx_msr_low, vmx_msr_high;
|
||||
u32 min, opt, min2, opt2;
|
||||
u32 _pin_based_exec_control = 0;
|
||||
u32 _cpu_based_exec_control = 0;
|
||||
u32 _cpu_based_2nd_exec_control = 0;
|
||||
u64 _cpu_based_3rd_exec_control = 0;
|
||||
u32 _vmexit_control = 0;
|
||||
u32 _vmentry_control = 0;
|
||||
u64 misc_msr;
|
||||
int i;
|
||||
|
||||
/*
|
||||
@ -2552,64 +2591,17 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
|
||||
};
|
||||
|
||||
memset(vmcs_conf, 0, sizeof(*vmcs_conf));
|
||||
min = CPU_BASED_HLT_EXITING |
|
||||
#ifdef CONFIG_X86_64
|
||||
CPU_BASED_CR8_LOAD_EXITING |
|
||||
CPU_BASED_CR8_STORE_EXITING |
|
||||
#endif
|
||||
CPU_BASED_CR3_LOAD_EXITING |
|
||||
CPU_BASED_CR3_STORE_EXITING |
|
||||
CPU_BASED_UNCOND_IO_EXITING |
|
||||
CPU_BASED_MOV_DR_EXITING |
|
||||
CPU_BASED_USE_TSC_OFFSETTING |
|
||||
CPU_BASED_MWAIT_EXITING |
|
||||
CPU_BASED_MONITOR_EXITING |
|
||||
CPU_BASED_INVLPG_EXITING |
|
||||
CPU_BASED_RDPMC_EXITING;
|
||||
|
||||
opt = CPU_BASED_TPR_SHADOW |
|
||||
CPU_BASED_USE_MSR_BITMAPS |
|
||||
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
|
||||
CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
|
||||
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
|
||||
&_cpu_based_exec_control) < 0)
|
||||
if (adjust_vmx_controls(KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL,
|
||||
KVM_OPTIONAL_VMX_CPU_BASED_VM_EXEC_CONTROL,
|
||||
MSR_IA32_VMX_PROCBASED_CTLS,
|
||||
&_cpu_based_exec_control))
|
||||
return -EIO;
|
||||
#ifdef CONFIG_X86_64
|
||||
if (_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)
|
||||
_cpu_based_exec_control &= ~CPU_BASED_CR8_LOAD_EXITING &
|
||||
~CPU_BASED_CR8_STORE_EXITING;
|
||||
#endif
|
||||
if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) {
|
||||
min2 = 0;
|
||||
opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
|
||||
SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
|
||||
SECONDARY_EXEC_WBINVD_EXITING |
|
||||
SECONDARY_EXEC_ENABLE_VPID |
|
||||
SECONDARY_EXEC_ENABLE_EPT |
|
||||
SECONDARY_EXEC_UNRESTRICTED_GUEST |
|
||||
SECONDARY_EXEC_PAUSE_LOOP_EXITING |
|
||||
SECONDARY_EXEC_DESC |
|
||||
SECONDARY_EXEC_ENABLE_RDTSCP |
|
||||
SECONDARY_EXEC_ENABLE_INVPCID |
|
||||
SECONDARY_EXEC_APIC_REGISTER_VIRT |
|
||||
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
|
||||
SECONDARY_EXEC_SHADOW_VMCS |
|
||||
SECONDARY_EXEC_XSAVES |
|
||||
SECONDARY_EXEC_RDSEED_EXITING |
|
||||
SECONDARY_EXEC_RDRAND_EXITING |
|
||||
SECONDARY_EXEC_ENABLE_PML |
|
||||
SECONDARY_EXEC_TSC_SCALING |
|
||||
SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
|
||||
SECONDARY_EXEC_PT_USE_GPA |
|
||||
SECONDARY_EXEC_PT_CONCEAL_VMX |
|
||||
SECONDARY_EXEC_ENABLE_VMFUNC |
|
||||
SECONDARY_EXEC_BUS_LOCK_DETECTION |
|
||||
SECONDARY_EXEC_NOTIFY_VM_EXITING;
|
||||
if (cpu_has_sgx())
|
||||
opt2 |= SECONDARY_EXEC_ENCLS_EXITING;
|
||||
if (adjust_vmx_controls(min2, opt2,
|
||||
if (adjust_vmx_controls(KVM_REQUIRED_VMX_SECONDARY_VM_EXEC_CONTROL,
|
||||
KVM_OPTIONAL_VMX_SECONDARY_VM_EXEC_CONTROL,
|
||||
MSR_IA32_VMX_PROCBASED_CTLS2,
|
||||
&_cpu_based_2nd_exec_control) < 0)
|
||||
&_cpu_based_2nd_exec_control))
|
||||
return -EIO;
|
||||
}
|
||||
#ifndef CONFIG_X86_64
|
||||
@ -2627,13 +2619,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
|
||||
rdmsr_safe(MSR_IA32_VMX_EPT_VPID_CAP,
|
||||
&vmx_cap->ept, &vmx_cap->vpid);
|
||||
|
||||
if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
|
||||
/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
|
||||
enabled */
|
||||
_cpu_based_exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
|
||||
CPU_BASED_CR3_STORE_EXITING |
|
||||
CPU_BASED_INVLPG_EXITING);
|
||||
} else if (vmx_cap->ept) {
|
||||
if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) &&
|
||||
vmx_cap->ept) {
|
||||
pr_warn_once("EPT CAP should not exist if not support "
|
||||
"1-setting enable EPT VM-execution control\n");
|
||||
|
||||
@ -2653,32 +2640,24 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
|
||||
vmx_cap->vpid = 0;
|
||||
}
|
||||
|
||||
if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
|
||||
u64 opt3 = TERTIARY_EXEC_IPI_VIRT;
|
||||
if (!cpu_has_sgx())
|
||||
_cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_ENCLS_EXITING;
|
||||
|
||||
_cpu_based_3rd_exec_control = adjust_vmx_controls64(opt3,
|
||||
if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS)
|
||||
_cpu_based_3rd_exec_control =
|
||||
adjust_vmx_controls64(KVM_OPTIONAL_VMX_TERTIARY_VM_EXEC_CONTROL,
|
||||
MSR_IA32_VMX_PROCBASED_CTLS3);
|
||||
}
|
||||
|
||||
min = VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_ACK_INTR_ON_EXIT;
|
||||
#ifdef CONFIG_X86_64
|
||||
min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
|
||||
#endif
|
||||
opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
|
||||
VM_EXIT_LOAD_IA32_PAT |
|
||||
VM_EXIT_LOAD_IA32_EFER |
|
||||
VM_EXIT_CLEAR_BNDCFGS |
|
||||
VM_EXIT_PT_CONCEAL_PIP |
|
||||
VM_EXIT_CLEAR_IA32_RTIT_CTL;
|
||||
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
|
||||
&_vmexit_control) < 0)
|
||||
if (adjust_vmx_controls(KVM_REQUIRED_VMX_VM_EXIT_CONTROLS,
|
||||
KVM_OPTIONAL_VMX_VM_EXIT_CONTROLS,
|
||||
MSR_IA32_VMX_EXIT_CTLS,
|
||||
&_vmexit_control))
|
||||
return -EIO;
|
||||
|
||||
min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
|
||||
opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR |
|
||||
PIN_BASED_VMX_PREEMPTION_TIMER;
|
||||
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
|
||||
&_pin_based_exec_control) < 0)
|
||||
if (adjust_vmx_controls(KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL,
|
||||
KVM_OPTIONAL_VMX_PIN_BASED_VM_EXEC_CONTROL,
|
||||
MSR_IA32_VMX_PINBASED_CTLS,
|
||||
&_pin_based_exec_control))
|
||||
return -EIO;
|
||||
|
||||
if (cpu_has_broken_vmx_preemption_timer())
|
||||
@ -2687,15 +2666,10 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
|
||||
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
|
||||
_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
|
||||
|
||||
min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
|
||||
opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
|
||||
VM_ENTRY_LOAD_IA32_PAT |
|
||||
VM_ENTRY_LOAD_IA32_EFER |
|
||||
VM_ENTRY_LOAD_BNDCFGS |
|
||||
VM_ENTRY_PT_CONCEAL_PIP |
|
||||
VM_ENTRY_LOAD_IA32_RTIT_CTL;
|
||||
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
|
||||
&_vmentry_control) < 0)
|
||||
if (adjust_vmx_controls(KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS,
|
||||
KVM_OPTIONAL_VMX_VM_ENTRY_CONTROLS,
|
||||
MSR_IA32_VMX_ENTRY_CTLS,
|
||||
&_vmentry_control))
|
||||
return -EIO;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(vmcs_entry_exit_pairs); i++) {
|
||||
@ -2715,30 +2689,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
|
||||
_vmexit_control &= ~x_ctrl;
|
||||
}
|
||||
|
||||
/*
|
||||
* Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
|
||||
* can't be used due to an errata where VM Exit may incorrectly clear
|
||||
* IA32_PERF_GLOBAL_CTRL[34:32]. Workaround the errata by using the
|
||||
* MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL.
|
||||
*/
|
||||
if (boot_cpu_data.x86 == 0x6) {
|
||||
switch (boot_cpu_data.x86_model) {
|
||||
case 26: /* AAK155 */
|
||||
case 30: /* AAP115 */
|
||||
case 37: /* AAT100 */
|
||||
case 44: /* BC86,AAY89,BD102 */
|
||||
case 46: /* BA97 */
|
||||
_vmentry_control &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
|
||||
_vmexit_control &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
|
||||
pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
|
||||
"does not work properly. Using workaround\n");
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);
|
||||
|
||||
/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
|
||||
@ -2755,6 +2705,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
|
||||
if (((vmx_msr_high >> 18) & 15) != 6)
|
||||
return -EIO;
|
||||
|
||||
rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
|
||||
|
||||
vmcs_conf->size = vmx_msr_high & 0x1fff;
|
||||
vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
|
||||
|
||||
@ -2766,11 +2718,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
|
||||
vmcs_conf->cpu_based_3rd_exec_ctrl = _cpu_based_3rd_exec_control;
|
||||
vmcs_conf->vmexit_ctrl = _vmexit_control;
|
||||
vmcs_conf->vmentry_ctrl = _vmentry_control;
|
||||
|
||||
#if IS_ENABLED(CONFIG_HYPERV)
|
||||
if (enlightened_vmcs)
|
||||
evmcs_sanitize_exec_ctrls(vmcs_conf);
|
||||
#endif
|
||||
vmcs_conf->misc = misc_msr;
|
||||
|
||||
return 0;
|
||||
}
|
||||
@ -3037,10 +2985,15 @@ int vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
|
||||
return 0;
|
||||
|
||||
vcpu->arch.efer = efer;
|
||||
#ifdef CONFIG_X86_64
|
||||
if (efer & EFER_LMA)
|
||||
vm_entry_controls_setbit(vmx, VM_ENTRY_IA32E_MODE);
|
||||
else
|
||||
vm_entry_controls_clearbit(vmx, VM_ENTRY_IA32E_MODE);
|
||||
#else
|
||||
if (KVM_BUG_ON(efer & EFER_LMA, vcpu->kvm))
|
||||
return 1;
|
||||
#endif
|
||||
|
||||
vmx_setup_uret_msrs(vmx);
|
||||
return 0;
|
||||
@ -4327,18 +4280,37 @@ static u32 vmx_vmentry_ctrl(void)
|
||||
if (vmx_pt_mode_is_system())
|
||||
vmentry_ctrl &= ~(VM_ENTRY_PT_CONCEAL_PIP |
|
||||
VM_ENTRY_LOAD_IA32_RTIT_CTL);
|
||||
/* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
|
||||
return vmentry_ctrl &
|
||||
~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER);
|
||||
/*
|
||||
* IA32e mode, and loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically.
|
||||
*/
|
||||
vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
|
||||
VM_ENTRY_LOAD_IA32_EFER |
|
||||
VM_ENTRY_IA32E_MODE);
|
||||
|
||||
if (cpu_has_perf_global_ctrl_bug())
|
||||
vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
|
||||
|
||||
return vmentry_ctrl;
|
||||
}
|
||||
|
||||
static u32 vmx_vmexit_ctrl(void)
|
||||
{
|
||||
u32 vmexit_ctrl = vmcs_config.vmexit_ctrl;
|
||||
|
||||
/*
|
||||
* Not used by KVM and never set in vmcs01 or vmcs02, but emulated for
|
||||
* nested virtualization and thus allowed to be set in vmcs12.
|
||||
*/
|
||||
vmexit_ctrl &= ~(VM_EXIT_SAVE_IA32_PAT | VM_EXIT_SAVE_IA32_EFER |
|
||||
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
|
||||
|
||||
if (vmx_pt_mode_is_system())
|
||||
vmexit_ctrl &= ~(VM_EXIT_PT_CONCEAL_PIP |
|
||||
VM_EXIT_CLEAR_IA32_RTIT_CTL);
|
||||
|
||||
if (cpu_has_perf_global_ctrl_bug())
|
||||
vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
|
||||
|
||||
/* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
|
||||
return vmexit_ctrl &
|
||||
~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
|
||||
@ -4376,20 +4348,38 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
|
||||
{
|
||||
u32 exec_control = vmcs_config.cpu_based_exec_ctrl;
|
||||
|
||||
/*
|
||||
* Not used by KVM, but fully supported for nesting, i.e. are allowed in
|
||||
* vmcs12 and propagated to vmcs02 when set in vmcs12.
|
||||
*/
|
||||
exec_control &= ~(CPU_BASED_RDTSC_EXITING |
|
||||
CPU_BASED_USE_IO_BITMAPS |
|
||||
CPU_BASED_MONITOR_TRAP_FLAG |
|
||||
CPU_BASED_PAUSE_EXITING);
|
||||
|
||||
/* INTR_WINDOW_EXITING and NMI_WINDOW_EXITING are toggled dynamically */
|
||||
exec_control &= ~(CPU_BASED_INTR_WINDOW_EXITING |
|
||||
CPU_BASED_NMI_WINDOW_EXITING);
|
||||
|
||||
if (vmx->vcpu.arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)
|
||||
exec_control &= ~CPU_BASED_MOV_DR_EXITING;
|
||||
|
||||
if (!cpu_need_tpr_shadow(&vmx->vcpu)) {
|
||||
if (!cpu_need_tpr_shadow(&vmx->vcpu))
|
||||
exec_control &= ~CPU_BASED_TPR_SHADOW;
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
if (exec_control & CPU_BASED_TPR_SHADOW)
|
||||
exec_control &= ~(CPU_BASED_CR8_LOAD_EXITING |
|
||||
CPU_BASED_CR8_STORE_EXITING);
|
||||
else
|
||||
exec_control |= CPU_BASED_CR8_STORE_EXITING |
|
||||
CPU_BASED_CR8_LOAD_EXITING;
|
||||
#endif
|
||||
}
|
||||
if (!enable_ept)
exec_control |= CPU_BASED_CR3_STORE_EXITING |
CPU_BASED_CR3_LOAD_EXITING |
CPU_BASED_INVLPG_EXITING;
/* No need to intercept CR3 access or INVLPG when using EPT. */
if (enable_ept)
exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
CPU_BASED_CR3_STORE_EXITING |
CPU_BASED_INVLPG_EXITING);
|
||||
if (kvm_mwait_in_guest(vmx->vcpu.kvm))
|
||||
exec_control &= ~(CPU_BASED_MWAIT_EXITING |
|
||||
CPU_BASED_MONITOR_EXITING);
|
||||
@ -5155,8 +5145,10 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
|
||||
* instruction. ICEBP generates a trap-like #DB, but
|
||||
* despite its interception control being tied to #DB,
|
||||
* is an instruction intercept, i.e. the VM-Exit occurs
|
||||
* on the ICEBP itself. Note, skipping ICEBP also
|
||||
* clears STI and MOVSS blocking.
|
||||
* on the ICEBP itself. Use the inner "skip" helper to
|
||||
* avoid single-step #DB and MTF updates, as ICEBP is
|
||||
* higher priority. Note, skipping ICEBP still clears
|
||||
* STI and MOVSS blocking.
|
||||
*
|
||||
* For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS
|
||||
* if single-step is enabled in RFLAGS and STI or MOVSS
|
||||
@ -5638,7 +5630,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
|
||||
vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
|
||||
|
||||
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
|
||||
trace_kvm_page_fault(gpa, exit_qualification);
|
||||
trace_kvm_page_fault(vcpu, gpa, exit_qualification);
|
||||
|
||||
/* Is it a read fault? */
|
||||
error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
|
||||
@ -5710,7 +5702,7 @@ static bool vmx_emulation_required_with_pending_exception(struct kvm_vcpu *vcpu)
|
||||
struct vcpu_vmx *vmx = to_vmx(vcpu);
|
||||
|
||||
return vmx->emulation_required && !vmx->rmode.vm86_active &&
|
||||
(vcpu->arch.exception.pending || vcpu->arch.exception.injected);
|
||||
(kvm_is_exception_pending(vcpu) || vcpu->arch.exception.injected);
|
||||
}
|
||||
|
||||
static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
|
||||
@ -7430,7 +7422,7 @@ static int __init vmx_check_processor_compat(void)
|
||||
if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
|
||||
return -EIO;
|
||||
if (nested)
|
||||
nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
|
||||
nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept);
|
||||
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
|
||||
printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
|
||||
smp_processor_id());
|
||||
@ -8070,7 +8062,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
|
||||
.patch_hypercall = vmx_patch_hypercall,
|
||||
.inject_irq = vmx_inject_irq,
|
||||
.inject_nmi = vmx_inject_nmi,
|
||||
.queue_exception = vmx_queue_exception,
|
||||
.inject_exception = vmx_inject_exception,
|
||||
.cancel_injection = vmx_cancel_injection,
|
||||
.interrupt_allowed = vmx_interrupt_allowed,
|
||||
.nmi_allowed = vmx_nmi_allowed,
|
||||
@ -8227,6 +8219,10 @@ static __init int hardware_setup(void)
|
||||
if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
|
||||
return -EIO;
|
||||
|
||||
if (cpu_has_perf_global_ctrl_bug())
|
||||
pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
|
||||
"does not work properly. Using workaround\n");
|
||||
|
||||
if (boot_cpu_has(X86_FEATURE_NX))
|
||||
kvm_enable_efer_bits(EFER_NX);
|
||||
|
||||
@ -8341,11 +8337,9 @@ static __init int hardware_setup(void)
|
||||
|
||||
if (enable_preemption_timer) {
|
||||
u64 use_timer_freq = 5000ULL * 1000 * 1000;
|
||||
u64 vmx_msr;
|
||||
|
||||
rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
|
||||
cpu_preemption_timer_multi =
|
||||
vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
|
||||
vmcs_config.misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
|
||||
|
||||
if (tsc_khz)
|
||||
use_timer_freq = (u64)tsc_khz * 1000;
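The masked-out rate field is the shift described for IA32_VMX_MISC bits 4:0: the VMX preemption timer counts down at the TSC frequency divided by 2^rate, which is what the tsc_khz math here feeds into. A standalone check of that arithmetic; the concrete TSC frequency and rate are example values, and the shift semantics are taken from the SDM rather than from this diff:

/* Hedged arithmetic check: preemption timer frequency = TSC rate >> shift. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t tsc_khz = 3000000;		/* example: 3 GHz TSC */
	unsigned int rate = 5;			/* example IA32_VMX_MISC[4:0] value */
	uint64_t timer_hz = (tsc_khz * 1000) >> rate;

	printf("preemption timer ticks at %llu Hz\n",
	       (unsigned long long)timer_hz);	/* 93750000 */
	return 0;
}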
|
||||
@ -8381,8 +8375,7 @@ static __init int hardware_setup(void)
|
||||
setup_default_sgx_lepubkeyhash();
|
||||
|
||||
if (nested) {
|
||||
nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
|
||||
vmx_capability.ept);
|
||||
nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept);
|
||||
|
||||
r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
|
||||
if (r)
|
||||
|
@ -477,29 +477,145 @@ static inline u8 vmx_get_rvi(void)
|
||||
return vmcs_read16(GUEST_INTR_STATUS) & 0xff;
|
||||
}
|
||||
|
||||
#define BUILD_CONTROLS_SHADOW(lname, uname, bits) \
|
||||
static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val) \
|
||||
{ \
|
||||
if (vmx->loaded_vmcs->controls_shadow.lname != val) { \
|
||||
vmcs_write##bits(uname, val); \
|
||||
vmx->loaded_vmcs->controls_shadow.lname = val; \
|
||||
} \
|
||||
} \
|
||||
static inline u##bits __##lname##_controls_get(struct loaded_vmcs *vmcs) \
|
||||
{ \
|
||||
return vmcs->controls_shadow.lname; \
|
||||
} \
|
||||
static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx) \
|
||||
{ \
|
||||
return __##lname##_controls_get(vmx->loaded_vmcs); \
|
||||
} \
|
||||
static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val) \
|
||||
{ \
|
||||
lname##_controls_set(vmx, lname##_controls_get(vmx) | val); \
|
||||
} \
|
||||
static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val) \
|
||||
{ \
|
||||
lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val); \
|
||||
#define __KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS \
|
||||
(VM_ENTRY_LOAD_DEBUG_CONTROLS)
|
||||
#ifdef CONFIG_X86_64
|
||||
#define KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS \
|
||||
(__KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS | \
|
||||
VM_ENTRY_IA32E_MODE)
|
||||
#else
|
||||
#define KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS \
|
||||
__KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS
|
||||
#endif
|
||||
#define KVM_OPTIONAL_VMX_VM_ENTRY_CONTROLS \
|
||||
(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | \
|
||||
VM_ENTRY_LOAD_IA32_PAT | \
|
||||
VM_ENTRY_LOAD_IA32_EFER | \
|
||||
VM_ENTRY_LOAD_BNDCFGS | \
|
||||
VM_ENTRY_PT_CONCEAL_PIP | \
|
||||
VM_ENTRY_LOAD_IA32_RTIT_CTL)
|
||||
|
||||
#define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \
|
||||
(VM_EXIT_SAVE_DEBUG_CONTROLS | \
|
||||
VM_EXIT_ACK_INTR_ON_EXIT)
|
||||
#ifdef CONFIG_X86_64
|
||||
#define KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \
|
||||
(__KVM_REQUIRED_VMX_VM_EXIT_CONTROLS | \
|
||||
VM_EXIT_HOST_ADDR_SPACE_SIZE)
|
||||
#else
|
||||
#define KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \
|
||||
__KVM_REQUIRED_VMX_VM_EXIT_CONTROLS
|
||||
#endif
|
||||
#define KVM_OPTIONAL_VMX_VM_EXIT_CONTROLS \
|
||||
(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | \
|
||||
VM_EXIT_SAVE_IA32_PAT | \
|
||||
VM_EXIT_LOAD_IA32_PAT | \
|
||||
VM_EXIT_SAVE_IA32_EFER | \
|
||||
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | \
|
||||
VM_EXIT_LOAD_IA32_EFER | \
|
||||
VM_EXIT_CLEAR_BNDCFGS | \
|
||||
VM_EXIT_PT_CONCEAL_PIP | \
|
||||
VM_EXIT_CLEAR_IA32_RTIT_CTL)
|
||||
|
||||
#define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL \
|
||||
(PIN_BASED_EXT_INTR_MASK | \
|
||||
PIN_BASED_NMI_EXITING)
|
||||
#define KVM_OPTIONAL_VMX_PIN_BASED_VM_EXEC_CONTROL \
|
||||
(PIN_BASED_VIRTUAL_NMIS | \
|
||||
PIN_BASED_POSTED_INTR | \
|
||||
PIN_BASED_VMX_PREEMPTION_TIMER)
|
||||
|
||||
#define __KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL \
|
||||
(CPU_BASED_HLT_EXITING | \
|
||||
CPU_BASED_CR3_LOAD_EXITING | \
|
||||
CPU_BASED_CR3_STORE_EXITING | \
|
||||
CPU_BASED_UNCOND_IO_EXITING | \
|
||||
CPU_BASED_MOV_DR_EXITING | \
|
||||
CPU_BASED_USE_TSC_OFFSETTING | \
|
||||
CPU_BASED_MWAIT_EXITING | \
|
||||
CPU_BASED_MONITOR_EXITING | \
|
||||
CPU_BASED_INVLPG_EXITING | \
|
||||
CPU_BASED_RDPMC_EXITING | \
|
||||
CPU_BASED_INTR_WINDOW_EXITING)
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
#define KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL \
|
||||
(__KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL | \
|
||||
CPU_BASED_CR8_LOAD_EXITING | \
|
||||
CPU_BASED_CR8_STORE_EXITING)
|
||||
#else
|
||||
#define KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL \
|
||||
__KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL
|
||||
#endif
|
||||
|
||||
#define KVM_OPTIONAL_VMX_CPU_BASED_VM_EXEC_CONTROL \
|
||||
(CPU_BASED_RDTSC_EXITING | \
|
||||
CPU_BASED_TPR_SHADOW | \
|
||||
CPU_BASED_USE_IO_BITMAPS | \
|
||||
CPU_BASED_MONITOR_TRAP_FLAG | \
|
||||
CPU_BASED_USE_MSR_BITMAPS | \
|
||||
CPU_BASED_NMI_WINDOW_EXITING | \
|
||||
CPU_BASED_PAUSE_EXITING | \
|
||||
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS | \
|
||||
CPU_BASED_ACTIVATE_TERTIARY_CONTROLS)
|
||||
|
||||
#define KVM_REQUIRED_VMX_SECONDARY_VM_EXEC_CONTROL 0
|
||||
#define KVM_OPTIONAL_VMX_SECONDARY_VM_EXEC_CONTROL \
|
||||
(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | \
|
||||
SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE | \
|
||||
SECONDARY_EXEC_WBINVD_EXITING | \
|
||||
SECONDARY_EXEC_ENABLE_VPID | \
|
||||
SECONDARY_EXEC_ENABLE_EPT | \
|
||||
SECONDARY_EXEC_UNRESTRICTED_GUEST | \
|
||||
SECONDARY_EXEC_PAUSE_LOOP_EXITING | \
|
||||
SECONDARY_EXEC_DESC | \
|
||||
SECONDARY_EXEC_ENABLE_RDTSCP | \
|
||||
SECONDARY_EXEC_ENABLE_INVPCID | \
|
||||
SECONDARY_EXEC_APIC_REGISTER_VIRT | \
|
||||
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY | \
|
||||
SECONDARY_EXEC_SHADOW_VMCS | \
|
||||
SECONDARY_EXEC_XSAVES | \
|
||||
SECONDARY_EXEC_RDSEED_EXITING | \
|
||||
SECONDARY_EXEC_RDRAND_EXITING | \
|
||||
SECONDARY_EXEC_ENABLE_PML | \
|
||||
SECONDARY_EXEC_TSC_SCALING | \
|
||||
SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE | \
|
||||
SECONDARY_EXEC_PT_USE_GPA | \
|
||||
SECONDARY_EXEC_PT_CONCEAL_VMX | \
|
||||
SECONDARY_EXEC_ENABLE_VMFUNC | \
|
||||
SECONDARY_EXEC_BUS_LOCK_DETECTION | \
|
||||
SECONDARY_EXEC_NOTIFY_VM_EXITING | \
|
||||
SECONDARY_EXEC_ENCLS_EXITING)
|
||||
|
||||
#define KVM_REQUIRED_VMX_TERTIARY_VM_EXEC_CONTROL 0
|
||||
#define KVM_OPTIONAL_VMX_TERTIARY_VM_EXEC_CONTROL \
|
||||
(TERTIARY_EXEC_IPI_VIRT)
|
||||
|
||||
#define BUILD_CONTROLS_SHADOW(lname, uname, bits) \
|
||||
static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val) \
|
||||
{ \
|
||||
if (vmx->loaded_vmcs->controls_shadow.lname != val) { \
|
||||
vmcs_write##bits(uname, val); \
|
||||
vmx->loaded_vmcs->controls_shadow.lname = val; \
|
||||
} \
|
||||
} \
|
||||
static inline u##bits __##lname##_controls_get(struct loaded_vmcs *vmcs) \
|
||||
{ \
|
||||
return vmcs->controls_shadow.lname; \
|
||||
} \
|
||||
static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx) \
|
||||
{ \
|
||||
return __##lname##_controls_get(vmx->loaded_vmcs); \
|
||||
} \
|
||||
static __always_inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val) \
|
||||
{ \
|
||||
BUILD_BUG_ON(!(val & (KVM_REQUIRED_VMX_##uname | KVM_OPTIONAL_VMX_##uname))); \
|
||||
lname##_controls_set(vmx, lname##_controls_get(vmx) | val); \
|
||||
} \
|
||||
static __always_inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val) \
|
||||
{ \
|
||||
BUILD_BUG_ON(!(val & (KVM_REQUIRED_VMX_##uname | KVM_OPTIONAL_VMX_##uname))); \
|
||||
lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val); \
|
||||
}
|
||||
BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)
|
||||
BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32)
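Folding the KVM_REQUIRED_/KVM_OPTIONAL_ masks into the setbit/clearbit helpers turns misuse into a build failure: passing a bit that KVM never configures for that control word trips the BUILD_BUG_ON. A hedged illustration of the kind of mistake this catches (hypothetical call, not part of this series; VM_ENTRY_SMM is a real control bit that KVM never sets):

/* Hypothetical misuse, shown only to illustrate the BUILD_BUG_ON guard. */
static void example_misuse(struct vcpu_vmx *vmx)
{
	/*
	 * VM_ENTRY_SMM is in neither KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS nor
	 * KVM_OPTIONAL_VMX_VM_ENTRY_CONTROLS, so with the masks above this
	 * call now fails to build instead of silently shoving a bit KVM
	 * never manages into the controls shadow.
	 */
	vm_entry_controls_setbit(vmx, VM_ENTRY_SMM);
}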
|
||||
@ -626,4 +742,14 @@ static inline bool vmx_can_use_ipiv(struct kvm_vcpu *vcpu)
|
||||
return lapic_in_kernel(vcpu) && enable_ipiv;
|
||||
}
|
||||
|
||||
static inline bool guest_cpuid_has_evmcs(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
/*
|
||||
* eVMCS is exposed to the guest if Hyper-V is enabled in CPUID and
|
||||
* eVMCS has been explicitly enabled by userspace.
|
||||
*/
|
||||
return vcpu->arch.hyperv_enabled &&
|
||||
to_vmx(vcpu)->nested.enlightened_vmcs_enabled;
|
||||
}
|
||||
|
||||
#endif /* __KVM_X86_VMX_H */
|
||||
|
@ -10,7 +10,7 @@
|
||||
#include "vmcs.h"
|
||||
#include "../x86.h"
|
||||
|
||||
asmlinkage void vmread_error(unsigned long field, bool fault);
|
||||
void vmread_error(unsigned long field, bool fault);
|
||||
__attribute__((regparm(0))) void vmread_error_trampoline(unsigned long field,
|
||||
bool fault);
|
||||
void vmwrite_error(unsigned long field, unsigned long value);
|
||||
|
@ -173,8 +173,13 @@ bool __read_mostly enable_vmware_backdoor = false;
|
||||
module_param(enable_vmware_backdoor, bool, S_IRUGO);
|
||||
EXPORT_SYMBOL_GPL(enable_vmware_backdoor);
|
||||
|
||||
static bool __read_mostly force_emulation_prefix = false;
|
||||
module_param(force_emulation_prefix, bool, S_IRUGO);
|
||||
/*
|
||||
* Flags to manipulate forced emulation behavior (any non-zero value will
|
||||
* enable forced emulation).
|
||||
*/
|
||||
#define KVM_FEP_CLEAR_RFLAGS_RF BIT(1)
|
||||
static int __read_mostly force_emulation_prefix;
|
||||
module_param(force_emulation_prefix, int, 0644);
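Since the parameter is now an int bit-mask (any non-zero value enables forced emulation, and bit 1 additionally clears RFLAGS.RF), consumers read it once per invocation. A condensed, hedged sketch of the consumer side, based on the handle_ud() hunk further down; guest_has_fep_prefix() is an invented stand-in for the signature check done there:

/* Hedged sketch of the consumer side, not a quote of handle_ud(). */
int fep_flags = READ_ONCE(force_emulation_prefix);

if (fep_flags && guest_has_fep_prefix(vcpu)) {	/* hypothetical helper */
	/* Bit 1 opts in to clearing RFLAGS.RF before emulating. */
	if (fep_flags & KVM_FEP_CLEAR_RFLAGS_RF)
		kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) & ~X86_EFLAGS_RF);
	emul_type = EMULTYPE_TRAP_UD_FORCED;
}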
|
||||
|
||||
int __read_mostly pi_inject_timer = -1;
|
||||
module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
|
||||
@ -528,6 +533,7 @@ static int exception_class(int vector)
|
||||
#define EXCPT_TRAP 1
|
||||
#define EXCPT_ABORT 2
|
||||
#define EXCPT_INTERRUPT 3
|
||||
#define EXCPT_DB 4
|
||||
|
||||
static int exception_type(int vector)
|
||||
{
|
||||
@ -538,8 +544,14 @@ static int exception_type(int vector)
|
||||
|
||||
mask = 1 << vector;
|
||||
|
||||
/* #DB is trap, as instruction watchpoints are handled elsewhere */
|
||||
if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
|
||||
/*
|
||||
* #DBs can be trap-like or fault-like, the caller must check other CPU
|
||||
* state, e.g. DR6, to determine whether a #DB is a trap or fault.
|
||||
*/
|
||||
if (mask & (1 << DB_VECTOR))
|
||||
return EXCPT_DB;
|
||||
|
||||
if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
|
||||
return EXCPT_TRAP;
|
||||
|
||||
if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
|
||||
@ -549,16 +561,13 @@ static int exception_type(int vector)
|
||||
return EXCPT_FAULT;
|
||||
}
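Since EXCPT_DB is now its own class, callers can no longer treat the return value as a simple trap/fault split. A hedged sketch of the intended calling pattern; the DR6 consultation is an assumption about how a caller would disambiguate, not code from this series:

/* Hedged sketch, not kernel code. */
switch (exception_type(vector)) {
case EXCPT_TRAP:
	/* #BP/#OF: RIP already points past the trapping instruction. */
	break;
case EXCPT_DB:
	/* #DB: consult DR6 (e.g. single-step vs. code breakpoint) to tell
	 * trap-like from fault-like before deciding how to advance RIP. */
	break;
default:
	/* Faults and aborts: the instruction has not completed. */
	break;
}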
|
||||
|
||||
void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
|
||||
void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
|
||||
struct kvm_queued_exception *ex)
|
||||
{
|
||||
unsigned nr = vcpu->arch.exception.nr;
|
||||
bool has_payload = vcpu->arch.exception.has_payload;
|
||||
unsigned long payload = vcpu->arch.exception.payload;
|
||||
|
||||
if (!has_payload)
|
||||
if (!ex->has_payload)
|
||||
return;
|
||||
|
||||
switch (nr) {
|
||||
switch (ex->vector) {
|
||||
case DB_VECTOR:
|
||||
/*
|
||||
* "Certain debug exceptions may clear bit 0-3. The
|
||||
@ -583,8 +592,8 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
|
||||
* So they need to be flipped for DR6.
|
||||
*/
|
||||
vcpu->arch.dr6 |= DR6_ACTIVE_LOW;
|
||||
vcpu->arch.dr6 |= payload;
|
||||
vcpu->arch.dr6 ^= payload & DR6_ACTIVE_LOW;
|
||||
vcpu->arch.dr6 |= ex->payload;
|
||||
vcpu->arch.dr6 ^= ex->payload & DR6_ACTIVE_LOW;
|
||||
|
||||
/*
|
||||
* The #DB payload is defined as compatible with the 'pending
|
||||
@ -595,15 +604,30 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
|
||||
vcpu->arch.dr6 &= ~BIT(12);
|
||||
break;
|
||||
case PF_VECTOR:
|
||||
vcpu->arch.cr2 = payload;
|
||||
vcpu->arch.cr2 = ex->payload;
|
||||
break;
|
||||
}
|
||||
|
||||
vcpu->arch.exception.has_payload = false;
|
||||
vcpu->arch.exception.payload = 0;
|
||||
ex->has_payload = false;
|
||||
ex->payload = 0;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload);
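The three DR6 statements above are easy to misread, so here is a standalone walk-through of the active-low handling with concrete numbers. The constants are reproduced from memory of debugreg.h and should be treated as assumptions; the point is the |=, |=, ^= sequence, not the exact masks.

/* Hedged, standalone illustration of the active-low DR6 payload math. */
#include <stdint.h>
#include <stdio.h>

#define DR6_BS		(1u << 14)	/* single-step, active-high */
#define DR6_RTM		(1u << 16)	/* RTM bit, active-low in DR6 */
#define DR6_ACTIVE_LOW	0xffff0ff0u	/* fixed-1 bits plus DR6_RTM (assumed) */

int main(void)
{
	uint32_t dr6 = 0;
	uint32_t payload = DR6_BS | DR6_RTM;	/* payload is active-high for everything */

	dr6 |= DR6_ACTIVE_LOW;			/* start from the "nothing active" value */
	dr6 |= payload;				/* set the active-high bits */
	dr6 ^= payload & DR6_ACTIVE_LOW;	/* flip active-low bits named in the payload */

	printf("dr6 = 0x%08x\n", dr6);		/* 0xffef4ff0: BS set, RTM bit cleared */
	return 0;
}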
|
||||
|
||||
static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vector,
|
||||
bool has_error_code, u32 error_code,
|
||||
bool has_payload, unsigned long payload)
|
||||
{
|
||||
struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit;
|
||||
|
||||
ex->vector = vector;
|
||||
ex->injected = false;
|
||||
ex->pending = true;
|
||||
ex->has_error_code = has_error_code;
|
||||
ex->error_code = error_code;
|
||||
ex->has_payload = has_payload;
|
||||
ex->payload = payload;
|
||||
}
|
||||
|
||||
static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
|
||||
unsigned nr, bool has_error, u32 error_code,
|
||||
bool has_payload, unsigned long payload, bool reinject)
|
||||
@ -613,18 +637,31 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
|
||||
|
||||
kvm_make_request(KVM_REQ_EVENT, vcpu);
|
||||
|
||||
/*
* If the exception is destined for L2 and isn't being reinjected,
* morph it to a VM-Exit if L1 wants to intercept the exception. A
* previously injected exception is not checked because it was checked
* when it was originally queued, and re-checking is incorrect if _L1_
* injected the exception, in which case it's exempt from interception.
*/
if (!reinject && is_guest_mode(vcpu) &&
kvm_x86_ops.nested_ops->is_exception_vmexit(vcpu, nr, error_code)) {
kvm_queue_exception_vmexit(vcpu, nr, has_error, error_code,
has_payload, payload);
return;
}
|
||||
|
||||
if (!vcpu->arch.exception.pending && !vcpu->arch.exception.injected) {
|
||||
queue:
|
||||
if (reinject) {
|
||||
/*
|
||||
* On vmentry, vcpu->arch.exception.pending is only
|
||||
* true if an event injection was blocked by
|
||||
* nested_run_pending. In that case, however,
|
||||
* vcpu_enter_guest requests an immediate exit,
|
||||
* and the guest shouldn't proceed far enough to
|
||||
* need reinjection.
|
||||
* On VM-Entry, an exception can be pending if and only
|
||||
* if event injection was blocked by nested_run_pending.
|
||||
* In that case, however, vcpu_enter_guest() requests an
|
||||
* immediate exit, and the guest shouldn't proceed far
|
||||
* enough to need reinjection.
|
||||
*/
|
||||
WARN_ON_ONCE(vcpu->arch.exception.pending);
|
||||
WARN_ON_ONCE(kvm_is_exception_pending(vcpu));
|
||||
vcpu->arch.exception.injected = true;
|
||||
if (WARN_ON_ONCE(has_payload)) {
|
||||
/*
|
||||
@ -639,17 +676,18 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
|
||||
vcpu->arch.exception.injected = false;
|
||||
}
|
||||
vcpu->arch.exception.has_error_code = has_error;
|
||||
vcpu->arch.exception.nr = nr;
|
||||
vcpu->arch.exception.vector = nr;
|
||||
vcpu->arch.exception.error_code = error_code;
|
||||
vcpu->arch.exception.has_payload = has_payload;
|
||||
vcpu->arch.exception.payload = payload;
|
||||
if (!is_guest_mode(vcpu))
|
||||
kvm_deliver_exception_payload(vcpu);
|
||||
kvm_deliver_exception_payload(vcpu,
|
||||
&vcpu->arch.exception);
|
||||
return;
|
||||
}
|
||||
|
||||
/* to check exception */
|
||||
prev_nr = vcpu->arch.exception.nr;
|
||||
prev_nr = vcpu->arch.exception.vector;
|
||||
if (prev_nr == DF_VECTOR) {
|
||||
/* triple fault -> shutdown */
|
||||
kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
|
||||
@ -657,25 +695,22 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
|
||||
}
|
||||
class1 = exception_class(prev_nr);
|
||||
class2 = exception_class(nr);
|
||||
if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
|
||||
|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
|
||||
if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
|
||||
(class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
|
||||
/*
|
||||
* Generate double fault per SDM Table 5-5. Set
|
||||
* exception.pending = true so that the double fault
|
||||
* can trigger a nested vmexit.
|
||||
* Synthesize #DF. Clear the previously injected or pending
|
||||
* exception so as not to incorrectly trigger shutdown.
|
||||
*/
|
||||
vcpu->arch.exception.pending = true;
|
||||
vcpu->arch.exception.injected = false;
|
||||
vcpu->arch.exception.has_error_code = true;
|
||||
vcpu->arch.exception.nr = DF_VECTOR;
|
||||
vcpu->arch.exception.error_code = 0;
|
||||
vcpu->arch.exception.has_payload = false;
|
||||
vcpu->arch.exception.payload = 0;
|
||||
} else
|
||||
vcpu->arch.exception.pending = false;
|
||||
|
||||
kvm_queue_exception_e(vcpu, DF_VECTOR, 0);
|
||||
} else {
|
||||
/* replace previous exception with a new one in a hope
|
||||
that instruction re-execution will regenerate lost
|
||||
exception */
|
||||
goto queue;
|
||||
}
|
||||
}
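The class math above implements the double-fault matrix from the SDM (Table 5-5, as the old comment noted). A standalone restatement of that matrix with the classes reduced to a small enum; exception_class() itself (not shown here) maps #DE/#TS/#NP/#SS/#GP to contributory and #PF to its own class.

/* Hedged sketch of the double-fault matrix, not the kernel code. */
#include <stdbool.h>
#include <stdio.h>

enum excpt_class { BENIGN, CONTRIBUTORY, PAGE_FAULT };

static bool causes_double_fault(enum excpt_class first, enum excpt_class second)
{
	/* Contributory followed by contributory, or #PF followed by anything
	 * that is not benign, escalates to #DF. */
	return (first == CONTRIBUTORY && second == CONTRIBUTORY) ||
	       (first == PAGE_FAULT && second != BENIGN);
}

int main(void)
{
	printf("#GP then #GP -> #DF? %d\n", causes_double_fault(CONTRIBUTORY, CONTRIBUTORY));	/* 1 */
	printf("#PF then #GP -> #DF? %d\n", causes_double_fault(PAGE_FAULT, CONTRIBUTORY));	/* 1 */
	printf("#GP then #PF -> #DF? %d\n", causes_double_fault(CONTRIBUTORY, PAGE_FAULT));	/* 0, handled serially */
	return 0;
}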
|
||||
|
||||
void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
|
||||
@ -729,20 +764,22 @@ static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
|
||||
void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
|
||||
{
|
||||
++vcpu->stat.pf_guest;
|
||||
vcpu->arch.exception.nested_apf =
|
||||
is_guest_mode(vcpu) && fault->async_page_fault;
|
||||
if (vcpu->arch.exception.nested_apf) {
|
||||
vcpu->arch.apf.nested_apf_token = fault->address;
|
||||
kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
|
||||
} else {
|
||||
|
||||
/*
|
||||
* Async #PF in L2 is always forwarded to L1 as a VM-Exit regardless of
|
||||
* whether or not L1 wants to intercept "regular" #PF.
|
||||
*/
|
||||
if (is_guest_mode(vcpu) && fault->async_page_fault)
|
||||
kvm_queue_exception_vmexit(vcpu, PF_VECTOR,
|
||||
true, fault->error_code,
|
||||
true, fault->address);
|
||||
else
|
||||
kvm_queue_exception_e_p(vcpu, PF_VECTOR, fault->error_code,
|
||||
fault->address);
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
|
||||
|
||||
/* Returns true if the page fault was immediately morphed into a VM-Exit. */
|
||||
bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
|
||||
void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
|
||||
struct x86_exception *fault)
|
||||
{
|
||||
struct kvm_mmu *fault_mmu;
|
||||
@ -760,26 +797,7 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
|
||||
kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
|
||||
fault_mmu->root.hpa);
|
||||
|
||||
/*
|
||||
* A workaround for KVM's bad exception handling. If KVM injected an
|
||||
* exception into L2, and L2 encountered a #PF while vectoring the
|
||||
* injected exception, manually check to see if L1 wants to intercept
|
||||
* #PF, otherwise queuing the #PF will lead to #DF or a lost exception.
|
||||
* In all other cases, defer the check to nested_ops->check_events(),
|
||||
* which will correctly handle priority (this does not). Note, other
|
||||
* exceptions, e.g. #GP, are theoretically affected, #PF is simply the
|
||||
* most problematic, e.g. when L0 and L1 are both intercepting #PF for
|
||||
* shadow paging.
|
||||
*
|
||||
* TODO: Rewrite exception handling to track injected and pending
|
||||
* (VM-Exit) exceptions separately.
|
||||
*/
|
||||
if (unlikely(vcpu->arch.exception.injected && is_guest_mode(vcpu)) &&
|
||||
kvm_x86_ops.nested_ops->handle_page_fault_workaround(vcpu, fault))
|
||||
return true;
|
||||
|
||||
fault_mmu->inject_page_fault(vcpu, fault);
|
||||
return false;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault);
|
||||
|
||||
@ -4841,7 +4859,7 @@ static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
|
||||
return (kvm_arch_interrupt_allowed(vcpu) &&
|
||||
kvm_cpu_accept_dm_intr(vcpu) &&
|
||||
!kvm_event_needs_reinjection(vcpu) &&
|
||||
!vcpu->arch.exception.pending);
|
||||
!kvm_is_exception_pending(vcpu));
|
||||
}
|
||||
|
||||
static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
|
||||
@ -5016,25 +5034,38 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
|
||||
static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
|
||||
struct kvm_vcpu_events *events)
|
||||
{
|
||||
struct kvm_queued_exception *ex;
|
||||
|
||||
process_nmi(vcpu);
|
||||
|
||||
if (kvm_check_request(KVM_REQ_SMI, vcpu))
|
||||
process_smi(vcpu);
|
||||
|
||||
/*
|
||||
* In guest mode, payload delivery should be deferred,
|
||||
* so that the L1 hypervisor can intercept #PF before
|
||||
* CR2 is modified (or intercept #DB before DR6 is
|
||||
* modified under nVMX). Unless the per-VM capability,
|
||||
* KVM_CAP_EXCEPTION_PAYLOAD, is set, we may not defer the delivery of
|
||||
* an exception payload and handle after a KVM_GET_VCPU_EVENTS. Since we
|
||||
* opportunistically defer the exception payload, deliver it if the
|
||||
* capability hasn't been requested before processing a
|
||||
* KVM_GET_VCPU_EVENTS.
|
||||
* KVM's ABI only allows for one exception to be migrated. Luckily,
|
||||
* the only time there can be two queued exceptions is if there's a
|
||||
* non-exiting _injected_ exception, and a pending exiting exception.
|
||||
* In that case, ignore the VM-Exiting exception as it's an extension
|
||||
* of the injected exception.
|
||||
*/
|
||||
if (vcpu->arch.exception_vmexit.pending &&
|
||||
!vcpu->arch.exception.pending &&
|
||||
!vcpu->arch.exception.injected)
|
||||
ex = &vcpu->arch.exception_vmexit;
|
||||
else
|
||||
ex = &vcpu->arch.exception;
|
||||
|
||||
/*
* In guest mode, payload delivery should be deferred if the exception
* will be intercepted by L1, e.g. KVM should not modify CR2 if L1
* intercepts #PF, ditto for DR6 and #DBs. If the per-VM capability,
* KVM_CAP_EXCEPTION_PAYLOAD, is not set, userspace may or may not
* propagate the payload and so it cannot be safely deferred. Deliver
* the payload if the capability hasn't been requested.
*/
if (!vcpu->kvm->arch.exception_payload_enabled &&
vcpu->arch.exception.pending && vcpu->arch.exception.has_payload)
kvm_deliver_exception_payload(vcpu);
ex->pending && ex->has_payload)
kvm_deliver_exception_payload(vcpu, ex);
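The capability is enabled per-VM from userspace; until it is, KVM keeps delivering payloads eagerly as described. A hedged userspace sketch of opting in (error handling trimmed; assumes vm_fd is an open KVM VM file descriptor):

/* Hedged userspace sketch; not part of this series. */
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

static int enable_exception_payload(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_EXCEPTION_PAYLOAD;
	cap.args[0] = 1;

	/* After this, KVM_GET_VCPU_EVENTS reports pending vs. injected
	 * exceptions separately and leaves the payload undelivered. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}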
|
||||
|
||||
/*
|
||||
* The API doesn't provide the instruction length for software
|
||||
@ -5042,26 +5073,25 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
|
||||
* isn't advanced, we should expect to encounter the exception
|
||||
* again.
|
||||
*/
|
||||
if (kvm_exception_is_soft(vcpu->arch.exception.nr)) {
|
||||
if (kvm_exception_is_soft(ex->vector)) {
|
||||
events->exception.injected = 0;
|
||||
events->exception.pending = 0;
|
||||
} else {
|
||||
events->exception.injected = vcpu->arch.exception.injected;
|
||||
events->exception.pending = vcpu->arch.exception.pending;
|
||||
events->exception.injected = ex->injected;
|
||||
events->exception.pending = ex->pending;
|
||||
/*
|
||||
* For ABI compatibility, deliberately conflate
|
||||
* pending and injected exceptions when
|
||||
* KVM_CAP_EXCEPTION_PAYLOAD isn't enabled.
|
||||
*/
|
||||
if (!vcpu->kvm->arch.exception_payload_enabled)
|
||||
events->exception.injected |=
|
||||
vcpu->arch.exception.pending;
|
||||
events->exception.injected |= ex->pending;
|
||||
}
|
||||
events->exception.nr = vcpu->arch.exception.nr;
|
||||
events->exception.has_error_code = vcpu->arch.exception.has_error_code;
|
||||
events->exception.error_code = vcpu->arch.exception.error_code;
|
||||
events->exception_has_payload = vcpu->arch.exception.has_payload;
|
||||
events->exception_payload = vcpu->arch.exception.payload;
|
||||
events->exception.nr = ex->vector;
|
||||
events->exception.has_error_code = ex->has_error_code;
|
||||
events->exception.error_code = ex->error_code;
|
||||
events->exception_has_payload = ex->has_payload;
|
||||
events->exception_payload = ex->payload;
|
||||
|
||||
events->interrupt.injected =
|
||||
vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft;
|
||||
@ -5131,9 +5161,22 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
|
||||
return -EINVAL;
|
||||
|
||||
process_nmi(vcpu);
|
||||
|
||||
/*
* Flag that userspace is stuffing an exception; the next KVM_RUN will
* morph the exception to a VM-Exit if appropriate. Do this only for
* pending exceptions, already-injected exceptions are not subject to
* interception. Note, userspace that conflates pending and injected
* is hosed, and will incorrectly convert an injected exception into a
* pending exception, which in turn may cause a spurious VM-Exit.
*/
vcpu->arch.exception_from_userspace = events->exception.pending;
|
||||
|
||||
vcpu->arch.exception_vmexit.pending = false;
|
||||
|
||||
vcpu->arch.exception.injected = events->exception.injected;
|
||||
vcpu->arch.exception.pending = events->exception.pending;
|
||||
vcpu->arch.exception.nr = events->exception.nr;
|
||||
vcpu->arch.exception.vector = events->exception.nr;
|
||||
vcpu->arch.exception.has_error_code = events->exception.has_error_code;
|
||||
vcpu->arch.exception.error_code = events->exception.error_code;
|
||||
vcpu->arch.exception.has_payload = events->exception_has_payload;
|
||||
@ -7257,6 +7300,7 @@ static int kvm_can_emulate_insn(struct kvm_vcpu *vcpu, int emul_type,
|
||||
int handle_ud(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
static const char kvm_emulate_prefix[] = { __KVM_EMULATE_PREFIX };
|
||||
int fep_flags = READ_ONCE(force_emulation_prefix);
|
||||
int emul_type = EMULTYPE_TRAP_UD;
|
||||
char sig[5]; /* ud2; .ascii "kvm" */
|
||||
struct x86_exception e;
|
||||
@ -7264,10 +7308,12 @@ int handle_ud(struct kvm_vcpu *vcpu)
|
||||
if (unlikely(!kvm_can_emulate_insn(vcpu, emul_type, NULL, 0)))
|
||||
return 1;
|
||||
|
||||
if (force_emulation_prefix &&
|
||||
if (fep_flags &&
|
||||
kvm_read_guest_virt(vcpu, kvm_get_linear_rip(vcpu),
|
||||
sig, sizeof(sig), &e) == 0 &&
|
||||
memcmp(sig, kvm_emulate_prefix, sizeof(sig)) == 0) {
|
||||
if (fep_flags & KVM_FEP_CLEAR_RFLAGS_RF)
|
||||
kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) & ~X86_EFLAGS_RF);
|
||||
kvm_rip_write(vcpu, kvm_rip_read(vcpu) + sizeof(sig));
|
||||
emul_type = EMULTYPE_TRAP_UD_FORCED;
|
||||
}
|
||||
@ -7933,14 +7979,20 @@ static int emulator_get_msr_with_filter(struct x86_emulate_ctxt *ctxt,
|
||||
int r;
|
||||
|
||||
r = kvm_get_msr_with_filter(vcpu, msr_index, pdata);
|
||||
if (r < 0)
|
||||
return X86EMUL_UNHANDLEABLE;
|
||||
|
||||
if (r && kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_RDMSR, 0,
|
||||
complete_emulated_rdmsr, r)) {
|
||||
/* Bounce to user space */
|
||||
return X86EMUL_IO_NEEDED;
|
||||
if (r) {
|
||||
if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_RDMSR, 0,
|
||||
complete_emulated_rdmsr, r))
|
||||
return X86EMUL_IO_NEEDED;
|
||||
|
||||
trace_kvm_msr_read_ex(msr_index);
|
||||
return X86EMUL_PROPAGATE_FAULT;
|
||||
}
|
||||
|
||||
return r;
|
||||
trace_kvm_msr_read(msr_index, *pdata);
|
||||
return X86EMUL_CONTINUE;
|
||||
}
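The reworked helper now maps the three possible outcomes of kvm_get_msr_with_filter() explicitly: negative means an internal KVM error, positive means the access was denied or needs userspace (bounce if userspace will complete it, otherwise inject #GP), and zero traces the read and continues emulation. A compressed, hedged restatement of that mapping (the helper below is invented for illustration):

/* Hedged restatement of the new return-value mapping; not the kernel code. */
static int map_msr_result(int r, bool bounced_to_userspace)
{
	if (r < 0)
		return X86EMUL_UNHANDLEABLE;		/* internal error */
	if (r) {
		if (bounced_to_userspace)
			return X86EMUL_IO_NEEDED;	/* complete via KVM_EXIT_X86_RDMSR */
		return X86EMUL_PROPAGATE_FAULT;		/* inject #GP into the guest */
	}
	return X86EMUL_CONTINUE;			/* success, keep emulating */
}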
|
||||
|
||||
static int emulator_set_msr_with_filter(struct x86_emulate_ctxt *ctxt,
|
||||
@ -7950,14 +8002,20 @@ static int emulator_set_msr_with_filter(struct x86_emulate_ctxt *ctxt,
|
||||
int r;
|
||||
|
||||
r = kvm_set_msr_with_filter(vcpu, msr_index, data);
|
||||
if (r < 0)
|
||||
return X86EMUL_UNHANDLEABLE;
|
||||
|
||||
if (r && kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
|
||||
complete_emulated_msr_access, r)) {
|
||||
/* Bounce to user space */
|
||||
return X86EMUL_IO_NEEDED;
|
||||
if (r) {
|
||||
if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
|
||||
complete_emulated_msr_access, r))
|
||||
return X86EMUL_IO_NEEDED;
|
||||
|
||||
trace_kvm_msr_write_ex(msr_index, data);
|
||||
return X86EMUL_PROPAGATE_FAULT;
|
||||
}
|
||||
|
||||
return r;
|
||||
trace_kvm_msr_write(msr_index, data);
|
||||
return X86EMUL_CONTINUE;
|
||||
}
|
||||
|
||||
static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
|
||||
@ -8161,18 +8219,17 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
|
||||
}
|
||||
}
|
||||
|
||||
static bool inject_emulated_exception(struct kvm_vcpu *vcpu)
|
||||
static void inject_emulated_exception(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
|
||||
if (ctxt->exception.vector == PF_VECTOR)
|
||||
return kvm_inject_emulated_page_fault(vcpu, &ctxt->exception);
|
||||
|
||||
if (ctxt->exception.error_code_valid)
|
||||
if (ctxt->exception.vector == PF_VECTOR)
|
||||
kvm_inject_emulated_page_fault(vcpu, &ctxt->exception);
|
||||
else if (ctxt->exception.error_code_valid)
|
||||
kvm_queue_exception_e(vcpu, ctxt->exception.vector,
|
||||
ctxt->exception.error_code);
|
||||
else
|
||||
kvm_queue_exception(vcpu, ctxt->exception.vector);
|
||||
return false;
|
||||
}
|
||||
|
||||
static struct x86_emulate_ctxt *alloc_emulate_ctxt(struct kvm_vcpu *vcpu)
|
||||
@ -8548,8 +8605,46 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
|
||||
|
||||
static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu, int *r)
|
||||
static bool kvm_is_code_breakpoint_inhibited(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
u32 shadow;
|
||||
|
||||
if (kvm_get_rflags(vcpu) & X86_EFLAGS_RF)
|
||||
return true;
|
||||
|
||||
/*
|
||||
* Intel CPUs inhibit code #DBs when MOV/POP SS blocking is active,
|
||||
* but AMD CPUs do not. MOV/POP SS blocking is rare, check that first
|
||||
* to avoid the relatively expensive CPUID lookup.
|
||||
*/
|
||||
shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
|
||||
return (shadow & KVM_X86_SHADOW_INT_MOV_SS) &&
|
||||
guest_cpuid_is_intel(vcpu);
|
||||
}
|
||||
|
||||
static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu,
|
||||
int emulation_type, int *r)
|
||||
{
|
||||
WARN_ON_ONCE(emulation_type & EMULTYPE_NO_DECODE);
|
||||
|
||||
/*
|
||||
* Do not check for code breakpoints if hardware has already done the
|
||||
* checks, as inferred from the emulation type. On NO_DECODE and SKIP,
|
||||
* the instruction has passed all exception checks, and all intercepted
|
||||
* exceptions that trigger emulation have lower priority than code
|
||||
* breakpoints, i.e. the fact that the intercepted exception occurred
|
||||
* means any code breakpoints have already been serviced.
|
||||
*
|
||||
* Note, KVM needs to check for code #DBs on EMULTYPE_TRAP_UD_FORCED as
|
||||
* hardware has checked the RIP of the magic prefix, but not the RIP of
|
||||
* the instruction being emulated. The intent of forced emulation is
|
||||
* to behave as if KVM intercepted the instruction without an exception
|
||||
* and without a prefix.
|
||||
*/
|
||||
if (emulation_type & (EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
|
||||
EMULTYPE_TRAP_UD | EMULTYPE_VMWARE_GP | EMULTYPE_PF))
|
||||
return false;
|
||||
|
||||
if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) &&
|
||||
(vcpu->arch.guest_debug_dr7 & DR7_BP_EN_MASK)) {
|
||||
struct kvm_run *kvm_run = vcpu->run;
|
||||
@ -8569,7 +8664,7 @@ static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu, int *r)
|
||||
}
|
||||
|
||||
if (unlikely(vcpu->arch.dr7 & DR7_BP_EN_MASK) &&
|
||||
!(kvm_get_rflags(vcpu) & X86_EFLAGS_RF)) {
|
||||
!kvm_is_code_breakpoint_inhibited(vcpu)) {
|
||||
unsigned long eip = kvm_get_linear_rip(vcpu);
|
||||
u32 dr6 = kvm_vcpu_check_hw_bp(eip, 0,
|
||||
vcpu->arch.dr7,
|
||||
@ -8671,8 +8766,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
|
||||
* are fault-like and are higher priority than any faults on
|
||||
* the code fetch itself.
|
||||
*/
|
||||
if (!(emulation_type & EMULTYPE_SKIP) &&
|
||||
kvm_vcpu_check_code_breakpoint(vcpu, &r))
|
||||
if (kvm_vcpu_check_code_breakpoint(vcpu, emulation_type, &r))
|
||||
return r;
|
||||
|
||||
r = x86_decode_emulated_instruction(vcpu, emulation_type,
|
||||
@ -8770,8 +8864,7 @@ restart:
|
||||
|
||||
if (ctxt->have_exception) {
|
||||
r = 1;
|
||||
if (inject_emulated_exception(vcpu))
|
||||
return r;
|
||||
inject_emulated_exception(vcpu);
|
||||
} else if (vcpu->arch.pio.count) {
|
||||
if (!vcpu->arch.pio.in) {
|
||||
/* FIXME: return into emulator if single-stepping. */
|
||||
@ -8801,6 +8894,12 @@ writeback:
|
||||
unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
|
||||
toggle_interruptibility(vcpu, ctxt->interruptibility);
|
||||
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
|
||||
|
||||
/*
|
||||
* Note, EXCPT_DB is assumed to be fault-like as the emulator
|
||||
* only supports code breakpoints and general detect #DB, both
|
||||
* of which are fault-like.
|
||||
*/
|
||||
if (!ctxt->have_exception ||
|
||||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
|
||||
kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
|
||||
@ -9662,74 +9761,155 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
|
||||
|
||||
static void kvm_inject_exception(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
trace_kvm_inj_exception(vcpu->arch.exception.nr,
|
||||
trace_kvm_inj_exception(vcpu->arch.exception.vector,
|
||||
vcpu->arch.exception.has_error_code,
|
||||
vcpu->arch.exception.error_code,
|
||||
vcpu->arch.exception.injected);
|
||||
|
||||
if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
|
||||
vcpu->arch.exception.error_code = false;
|
||||
static_call(kvm_x86_queue_exception)(vcpu);
|
||||
static_call(kvm_x86_inject_exception)(vcpu);
|
||||
}
|
||||
|
||||
static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
|
||||
/*
|
||||
* Check for any event (interrupt or exception) that is ready to be injected,
|
||||
* and if there is at least one event, inject the event with the highest
|
||||
* priority. This handles both "pending" events, i.e. events that have never
|
||||
* been injected into the guest, and "injected" events, i.e. events that were
|
||||
* injected as part of a previous VM-Enter, but weren't successfully delivered
|
||||
* and need to be re-injected.
|
||||
*
|
||||
* Note, this is not guaranteed to be invoked on a guest instruction boundary,
|
||||
* i.e. doesn't guarantee that there's an event window in the guest. KVM must
|
||||
* be able to inject exceptions in the "middle" of an instruction, and so must
|
||||
* also be able to re-inject NMIs and IRQs in the middle of an instruction.
|
||||
* I.e. for exceptions and re-injected events, NOT invoking this on instruction
|
||||
* boundaries is necessary and correct.
|
||||
*
|
||||
* For simplicity, KVM uses a single path to inject all events (except events
|
||||
* that are injected directly from L1 to L2) and doesn't explicitly track
|
||||
* instruction boundaries for asynchronous events. However, because VM-Exits
|
||||
* that can occur during instruction execution typically result in KVM skipping
|
||||
* the instruction or injecting an exception, e.g. instruction and exception
|
||||
* intercepts, and because pending exceptions have higher priority than pending
|
||||
* interrupts, KVM still honors instruction boundaries in most scenarios.
|
||||
*
|
||||
* But, if a VM-Exit occurs during instruction execution, and KVM does NOT skip
* the instruction or inject an exception, then KVM can incorrectly inject a new
* asynchronous event if the event became pending after the CPU fetched the
* instruction (in the guest). E.g. if a page fault (#PF, #NPF, EPT violation)
* occurs and is resolved by KVM, a coincident NMI, SMI, IRQ, etc... can be
* injected on the restarted instruction instead of being deferred until the
* instruction completes.
|
||||
*
|
||||
* In practice, this virtualization hole is unlikely to be observed by the
|
||||
* guest, and even less likely to cause functional problems. To detect the
|
||||
* hole, the guest would have to trigger an event on a side effect of an early
|
||||
* phase of instruction execution, e.g. on the instruction fetch from memory.
|
||||
* And for it to be a functional problem, the guest would need to depend on the
|
||||
* ordering between that side effect, the instruction completing, _and_ the
|
||||
* delivery of the asynchronous event.
|
||||
*/
|
||||
static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
|
||||
bool *req_immediate_exit)
|
||||
{
|
||||
bool can_inject;
|
||||
int r;
|
||||
bool can_inject = true;
|
||||
|
||||
/* try to reinject previous events if any */
|
||||
|
||||
if (vcpu->arch.exception.injected) {
|
||||
kvm_inject_exception(vcpu);
|
||||
can_inject = false;
|
||||
}
|
||||
/*
|
||||
* Do not inject an NMI or interrupt if there is a pending
|
||||
* exception. Exceptions and interrupts are recognized at
|
||||
* instruction boundaries, i.e. the start of an instruction.
|
||||
* Trap-like exceptions, e.g. #DB, have higher priority than
|
||||
* NMIs and interrupts, i.e. traps are recognized before an
|
||||
* NMI/interrupt that's pending on the same instruction.
|
||||
* Fault-like exceptions, e.g. #GP and #PF, are the lowest
|
||||
* priority, but are only generated (pended) during instruction
|
||||
* execution, i.e. a pending fault-like exception means the
|
||||
* fault occurred on the *previous* instruction and must be
|
||||
* serviced prior to recognizing any new events in order to
|
||||
* fully complete the previous instruction.
|
||||
* Process nested events first, as nested VM-Exit supersedes event
* re-injection. If there's an event queued for re-injection, it will
* be saved into the appropriate vmc{b,s}12 fields on nested VM-Exit.
|
||||
*/
|
||||
else if (!vcpu->arch.exception.pending) {
|
||||
if (vcpu->arch.nmi_injected) {
|
||||
static_call(kvm_x86_inject_nmi)(vcpu);
|
||||
can_inject = false;
|
||||
} else if (vcpu->arch.interrupt.injected) {
|
||||
static_call(kvm_x86_inject_irq)(vcpu, true);
|
||||
can_inject = false;
|
||||
}
|
||||
}
|
||||
if (is_guest_mode(vcpu))
|
||||
r = kvm_check_nested_events(vcpu);
|
||||
else
|
||||
r = 0;
|
||||
|
||||
/*
|
||||
* Re-inject exceptions and events *especially* if immediate entry+exit
|
||||
* to/from L2 is needed, as any event that has already been injected
|
||||
* into L2 needs to complete its lifecycle before injecting a new event.
|
||||
*
|
||||
* Don't re-inject an NMI or interrupt if there is a pending exception.
|
||||
* This collision arises if an exception occurred while vectoring the
|
||||
* injected event, KVM intercepted said exception, and KVM ultimately
|
||||
* determined the fault belongs to the guest and queues the exception
|
||||
* for injection back into the guest.
|
||||
*
|
||||
* "Injected" interrupts can also collide with pending exceptions if
|
||||
* userspace ignores the "ready for injection" flag and blindly queues
|
||||
* an interrupt. In that case, prioritizing the exception is correct,
|
||||
* as the exception "occurred" before the exit to userspace. Trap-like
|
||||
* exceptions, e.g. most #DBs, have higher priority than interrupts.
|
||||
* And while fault-like exceptions, e.g. #GP and #PF, are the lowest
|
||||
* priority, they're only generated (pended) during instruction
|
||||
* execution, and interrupts are recognized at instruction boundaries.
|
||||
* Thus a pending fault-like exception means the fault occurred on the
|
||||
* *previous* instruction and must be serviced prior to recognizing any
|
||||
* new events in order to fully complete the previous instruction.
|
||||
*/
|
||||
if (vcpu->arch.exception.injected)
|
||||
kvm_inject_exception(vcpu);
|
||||
else if (kvm_is_exception_pending(vcpu))
|
||||
; /* see above */
|
||||
else if (vcpu->arch.nmi_injected)
|
||||
static_call(kvm_x86_inject_nmi)(vcpu);
|
||||
else if (vcpu->arch.interrupt.injected)
|
||||
static_call(kvm_x86_inject_irq)(vcpu, true);
|
||||
|
||||
/*
|
||||
* Exceptions that morph to VM-Exits are handled above, and pending
|
||||
* exceptions on top of injected exceptions that do not VM-Exit should
|
||||
* either morph to #DF or, sadly, override the injected exception.
|
||||
*/
|
||||
WARN_ON_ONCE(vcpu->arch.exception.injected &&
|
||||
vcpu->arch.exception.pending);
|
||||
|
||||
/*
|
||||
* Call check_nested_events() even if we reinjected a previous event
|
||||
* in order for caller to determine if it should require immediate-exit
|
||||
* from L2 to L1 due to pending L1 events which require exit
|
||||
* from L2 to L1.
|
||||
* Bail if immediate entry+exit to/from the guest is needed to complete
|
||||
* nested VM-Enter or event re-injection so that a different pending
|
||||
* event can be serviced (or if KVM needs to exit to userspace).
|
||||
*
|
||||
* Otherwise, continue processing events even if VM-Exit occurred. The
|
||||
* VM-Exit will have cleared exceptions that were meant for L2, but
|
||||
* there may now be events that can be injected into L1.
|
||||
*/
|
||||
if (is_guest_mode(vcpu)) {
|
||||
r = kvm_check_nested_events(vcpu);
|
||||
if (r < 0)
|
||||
goto out;
|
||||
}
|
||||
if (r < 0)
|
||||
goto out;
|
||||
|
||||
/*
|
||||
* A pending exception VM-Exit should either result in nested VM-Exit
|
||||
* or force an immediate re-entry and exit to/from L2, and exception
|
||||
* VM-Exits cannot be injected (flag should _never_ be set).
|
||||
*/
|
||||
WARN_ON_ONCE(vcpu->arch.exception_vmexit.injected ||
|
||||
vcpu->arch.exception_vmexit.pending);
|
||||
|
||||
/*
|
||||
* New events, other than exceptions, cannot be injected if KVM needs
|
||||
* to re-inject a previous event. See above comments on re-injecting
|
||||
* for why pending exceptions get priority.
|
||||
*/
|
||||
can_inject = !kvm_event_needs_reinjection(vcpu);
|
||||
|
||||
/* try to inject new event if pending */
|
||||
if (vcpu->arch.exception.pending) {
|
||||
if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
|
||||
/*
* Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
* value pushed on the stack. Trap-like exceptions and all #DBs
* leave RF as-is (KVM follows Intel's behavior in this regard;
* AMD states that code breakpoint #DBs explicitly clear RF=0).
*
* Note, most versions of Intel's SDM and AMD's APM incorrectly
* describe the behavior of General Detect #DBs, which are
* fault-like. They do _not_ set RF, a la code breakpoints.
*/
|
||||
if (exception_type(vcpu->arch.exception.vector) == EXCPT_FAULT)
|
||||
__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
|
||||
X86_EFLAGS_RF);
|
||||
|
||||
if (vcpu->arch.exception.nr == DB_VECTOR) {
|
||||
kvm_deliver_exception_payload(vcpu);
|
||||
if (vcpu->arch.exception.vector == DB_VECTOR) {
|
||||
kvm_deliver_exception_payload(vcpu, &vcpu->arch.exception);
|
||||
if (vcpu->arch.dr7 & DR7_GD) {
|
||||
vcpu->arch.dr7 &= ~DR7_GD;
|
||||
kvm_update_dr7(vcpu);
|
||||
@ -9801,11 +9981,11 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
|
||||
}
|
||||
|
||||
if (is_guest_mode(vcpu) &&
|
||||
kvm_x86_ops.nested_ops->hv_timer_pending &&
|
||||
kvm_x86_ops.nested_ops->hv_timer_pending(vcpu))
|
||||
kvm_x86_ops.nested_ops->has_events &&
|
||||
kvm_x86_ops.nested_ops->has_events(vcpu))
|
||||
*req_immediate_exit = true;
|
||||
|
||||
WARN_ON(vcpu->arch.exception.pending);
|
||||
WARN_ON(kvm_is_exception_pending(vcpu));
|
||||
return 0;
|
||||
|
||||
out:
|
||||
@ -10110,7 +10290,7 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
|
||||
* When APICv gets disabled, we may still have injected interrupts
|
||||
* pending. At the same time, KVM_REQ_EVENT may not be set as APICv was
|
||||
* still active when the interrupt got accepted. Make sure
|
||||
* inject_pending_event() is called to check for that.
|
||||
* kvm_check_and_inject_events() is called to check for that.
|
||||
*/
|
||||
if (!apic->apicv_active)
|
||||
kvm_make_request(KVM_REQ_EVENT, vcpu);
|
||||
@ -10407,7 +10587,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
|
||||
goto out;
|
||||
}
|
||||
|
||||
r = inject_pending_event(vcpu, &req_immediate_exit);
|
||||
r = kvm_check_and_inject_events(vcpu, &req_immediate_exit);
|
||||
if (r < 0) {
|
||||
r = 0;
|
||||
goto out;
|
||||
@ -10646,10 +10826,26 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
|
||||
if (hv_timer)
|
||||
kvm_lapic_switch_to_hv_timer(vcpu);
|
||||
|
||||
if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
|
||||
/*
|
||||
* If the vCPU is not runnable, a signal or another host event
|
||||
* of some kind is pending; service it without changing the
|
||||
* vCPU's activity state.
|
||||
*/
|
||||
if (!kvm_arch_vcpu_runnable(vcpu))
|
||||
return 1;
|
||||
}
|
||||
|
||||
/*
|
||||
* Evaluate nested events before exiting the halted state. This allows
|
||||
* the halt state to be recorded properly in the VMCS12's activity
|
||||
* state field (AMD does not have a similar field and a VM-Exit always
|
||||
* causes a spurious wakeup from HLT).
|
||||
*/
|
||||
if (is_guest_mode(vcpu)) {
|
||||
if (kvm_check_nested_events(vcpu) < 0)
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (kvm_apic_accept_events(vcpu) < 0)
|
||||
return 0;
|
||||
switch(vcpu->arch.mp_state) {
|
||||
@ -10673,9 +10869,6 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
|
||||
|
||||
static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (is_guest_mode(vcpu))
|
||||
kvm_check_nested_events(vcpu);
|
||||
|
||||
return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
|
||||
!vcpu->arch.apf.halted);
|
||||
}
|
||||
@ -10824,6 +11017,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
|
||||
|
||||
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_queued_exception *ex = &vcpu->arch.exception;
|
||||
struct kvm_run *kvm_run = vcpu->run;
|
||||
int r;
|
||||
|
||||
@ -10852,7 +11046,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
|
||||
r = 0;
|
||||
goto out;
|
||||
}
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
r = -EAGAIN;
|
||||
if (signal_pending(current)) {
|
||||
r = -EINTR;
|
||||
@ -10882,6 +11075,21 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* If userspace set a pending exception and L2 is active, convert it to
|
||||
* a pending VM-Exit if L1 wants to intercept the exception.
|
||||
*/
|
||||
if (vcpu->arch.exception_from_userspace && is_guest_mode(vcpu) &&
|
||||
kvm_x86_ops.nested_ops->is_exception_vmexit(vcpu, ex->vector,
|
||||
ex->error_code)) {
|
||||
kvm_queue_exception_vmexit(vcpu, ex->vector,
|
||||
ex->has_error_code, ex->error_code,
|
||||
ex->has_payload, ex->payload);
|
||||
ex->injected = false;
|
||||
ex->pending = false;
|
||||
}
|
||||
vcpu->arch.exception_from_userspace = false;
|
||||
|
||||
if (unlikely(vcpu->arch.complete_userspace_io)) {
|
||||
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
|
||||
vcpu->arch.complete_userspace_io = NULL;
|
||||
@ -10988,6 +11196,7 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
|
||||
kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
|
||||
|
||||
vcpu->arch.exception.pending = false;
|
||||
vcpu->arch.exception_vmexit.pending = false;
|
||||
|
||||
kvm_make_request(KVM_REQ_EVENT, vcpu);
|
||||
}
|
||||
@ -11125,11 +11334,12 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
|
||||
}
|
||||
|
||||
/*
|
||||
* KVM_MP_STATE_INIT_RECEIVED means the processor is in
|
||||
* INIT state; latched init should be reported using
|
||||
* KVM_SET_VCPU_EVENTS, so reject it here.
|
||||
* Pending INITs are reported using KVM_SET_VCPU_EVENTS, disallow
|
||||
* forcing the guest into INIT/SIPI if those events are supposed to be
|
||||
* blocked. KVM prioritizes SMI over INIT, so reject INIT/SIPI state
|
||||
* if an SMI is pending as well.
|
||||
*/
|
||||
if ((kvm_vcpu_latch_init(vcpu) || vcpu->arch.smi_pending) &&
|
||||
if ((!kvm_apic_init_sipi_allowed(vcpu) || vcpu->arch.smi_pending) &&
|
||||
(mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED ||
|
||||
mp_state->mp_state == KVM_MP_STATE_INIT_RECEIVED))
|
||||
goto out;
|
||||
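For reference, the user-visible effect of the reworked check can be sketched from plain userspace as below. The helper name and the open vcpu_fd are assumptions for illustration; KVM_SET_MP_STATE, struct kvm_mp_state and KVM_MP_STATE_INIT_RECEIVED are the real uapi.

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Expected to fail while INIT/SIPI are blocked (e.g. the vCPU is in SMM)
 * or an SMI is pending, per the check above; returns the raw ioctl result. */
static int try_force_init_received(int vcpu_fd)
{
	struct kvm_mp_state state = {
		.mp_state = KVM_MP_STATE_INIT_RECEIVED,
	};

	return ioctl(vcpu_fd, KVM_SET_MP_STATE, &state);
}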
@ -11368,7 +11578,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
|
||||
|
||||
if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) {
|
||||
r = -EBUSY;
|
||||
if (vcpu->arch.exception.pending)
|
||||
if (kvm_is_exception_pending(vcpu))
|
||||
goto out;
|
||||
if (dbg->control & KVM_GUESTDBG_INJECT_DB)
|
||||
kvm_queue_exception(vcpu, DB_VECTOR);
|
||||
@ -11750,8 +11960,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
|
||||
struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate;
|
||||
|
||||
/*
|
||||
* To avoid have the INIT path from kvm_apic_has_events() that be
|
||||
* called with loaded FPU and does not let userspace fix the state.
|
||||
* All paths that lead to INIT are required to load the guest's
|
||||
* FPU state (because most paths are buried in KVM_RUN).
|
||||
*/
|
||||
if (init_event)
|
||||
kvm_put_guest_fpu(vcpu);
|
||||
@ -12080,6 +12290,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
|
||||
if (ret)
|
||||
goto out_page_track;
|
||||
|
||||
ret = static_call(kvm_x86_vm_init)(kvm);
|
||||
if (ret)
|
||||
goto out_uninit_mmu;
|
||||
|
||||
INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
|
||||
INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
|
||||
atomic_set(&kvm->arch.noncoherent_dma_count, 0);
|
||||
@ -12115,8 +12329,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
|
||||
kvm_hv_init_vm(kvm);
|
||||
kvm_xen_init_vm(kvm);
|
||||
|
||||
return static_call(kvm_x86_vm_init)(kvm);
|
||||
return 0;
|
||||
|
||||
out_uninit_mmu:
|
||||
kvm_mmu_uninit_vm(kvm);
|
||||
out_page_track:
|
||||
kvm_page_track_cleanup(kvm);
|
||||
out:
|
||||
@ -12589,13 +12805,14 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
|
||||
if (!list_empty_careful(&vcpu->async_pf.done))
|
||||
return true;
|
||||
|
||||
if (kvm_apic_has_events(vcpu))
|
||||
if (kvm_apic_has_pending_init_or_sipi(vcpu) &&
|
||||
kvm_apic_init_sipi_allowed(vcpu))
|
||||
return true;
|
||||
|
||||
if (vcpu->arch.pv.pv_unhalted)
|
||||
return true;
|
||||
|
||||
if (vcpu->arch.exception.pending)
|
||||
if (kvm_is_exception_pending(vcpu))
|
||||
return true;
|
||||
|
||||
if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
|
||||
@ -12617,16 +12834,13 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
|
||||
return true;
|
||||
|
||||
if (is_guest_mode(vcpu) &&
|
||||
kvm_x86_ops.nested_ops->hv_timer_pending &&
|
||||
kvm_x86_ops.nested_ops->hv_timer_pending(vcpu))
|
||||
kvm_x86_ops.nested_ops->has_events &&
|
||||
kvm_x86_ops.nested_ops->has_events(vcpu))
|
||||
return true;
|
||||
|
||||
if (kvm_xen_has_pending_events(vcpu))
|
||||
return true;
|
||||
|
||||
if (kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu))
|
||||
return true;
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
@ -12850,7 +13064,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (unlikely(!lapic_in_kernel(vcpu) ||
|
||||
kvm_event_needs_reinjection(vcpu) ||
|
||||
vcpu->arch.exception.pending))
|
||||
kvm_is_exception_pending(vcpu)))
|
||||
return false;
|
||||
|
||||
if (kvm_hlt_in_guest(vcpu->kvm) && !kvm_can_deliver_async_pf(vcpu))
|
||||
@ -13401,7 +13615,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmenter);
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit);
|
||||
|
@ -82,10 +82,18 @@ static inline unsigned int __shrink_ple_window(unsigned int val,
|
||||
void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
|
||||
int kvm_check_nested_events(struct kvm_vcpu *vcpu);
|
||||
|
||||
static inline bool kvm_is_exception_pending(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.exception.pending ||
	       vcpu->arch.exception_vmexit.pending ||
	       kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu);
}

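Note that "pending" in the helper above is deliberately broad: it covers an exception queued for injection into the guest, an exception that has already been morphed into a pending exception VM-Exit for L1, and a pending TRIPLE_FAULT request.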
static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
vcpu->arch.exception.pending = false;
|
||||
vcpu->arch.exception.injected = false;
|
||||
vcpu->arch.exception_vmexit.pending = false;
|
||||
}
|
||||
|
||||
static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector,
|
||||
@ -267,11 +275,6 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
|
||||
return !(kvm->arch.disabled_quirks & quirk);
|
||||
}
|
||||
|
||||
static inline bool kvm_vcpu_latch_init(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return is_smm(vcpu) || static_call(kvm_x86_apic_init_signal_blocked)(vcpu);
|
||||
}
|
||||
|
||||
void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
|
||||
|
||||
u64 get_kvmclock_ns(struct kvm *kvm);
|
||||
@ -286,7 +289,8 @@ int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu,
|
||||
|
||||
int handle_ud(struct kvm_vcpu *vcpu);
|
||||
|
||||
void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu);
|
||||
void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
|
||||
struct kvm_queued_exception *ex);
|
||||
|
||||
void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu);
|
||||
u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
|
||||
|
@ -1065,7 +1065,6 @@ static bool kvm_xen_schedop_poll(struct kvm_vcpu *vcpu, bool longmode,
|
||||
del_timer(&vcpu->arch.xen.poll_timer);
|
||||
|
||||
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
|
||||
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
|
||||
}
|
||||
|
||||
vcpu->arch.xen.poll_evtchn = 0;
|
||||
|
@ -433,6 +433,7 @@ static ssize_t node_read_meminfo(struct device *dev,
|
||||
"Node %d ShadowCallStack:%8lu kB\n"
|
||||
#endif
|
||||
"Node %d PageTables: %8lu kB\n"
|
||||
"Node %d SecPageTables: %8lu kB\n"
|
||||
"Node %d NFS_Unstable: %8lu kB\n"
|
||||
"Node %d Bounce: %8lu kB\n"
|
||||
"Node %d WritebackTmp: %8lu kB\n"
|
||||
@ -459,6 +460,7 @@ static ssize_t node_read_meminfo(struct device *dev,
|
||||
nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
|
||||
#endif
|
||||
nid, K(node_page_state(pgdat, NR_PAGETABLE)),
|
||||
nid, K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
|
||||
nid, 0UL,
|
||||
nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
|
||||
nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
|
||||
|
@ -115,6 +115,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
|
||||
#endif
|
||||
show_val_kb(m, "PageTables: ",
|
||||
global_node_page_state(NR_PAGETABLE));
|
||||
show_val_kb(m, "SecPageTables: ",
|
||||
global_node_page_state(NR_SECONDARY_PAGETABLE));
|
||||
|
||||
show_val_kb(m, "NFS_Unstable: ", 0);
|
||||
show_val_kb(m, "Bounce: ",
|
||||
|
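To see the new counter from userspace, a minimal reader is sketched below (an added illustration, not part of the series). The same statistic also appears per-node under /sys/devices/system/node/nodeN/meminfo and as "sec_pagetables" in cgroup memory.stat, per the surrounding hunks.

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		/* Print only the secondary (e.g. KVM) page table usage. */
		if (!strncmp(line, "SecPageTables:", 14))
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}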
@ -151,12 +151,11 @@ static inline bool is_error_page(struct page *page)
|
||||
#define KVM_REQUEST_NO_ACTION BIT(10)
|
||||
/*
|
||||
* Architecture-independent vcpu->requests bit members
|
||||
* Bits 4-7 are reserved for more arch-independent bits.
|
||||
* Bits 3-7 are reserved for more arch-independent bits.
|
||||
*/
|
||||
#define KVM_REQ_TLB_FLUSH (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
|
||||
#define KVM_REQ_VM_DEAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
|
||||
#define KVM_REQ_UNBLOCK 2
|
||||
#define KVM_REQ_UNHALT 3
|
||||
#define KVM_REQUEST_ARCH_BASE 8
|
||||
|
||||
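Since only generic bit 3 is freed, architecture-specific requests are unaffected: they are numbered relative to KVM_REQUEST_ARCH_BASE. A hypothetical arch request (the name is invented purely for illustration) would still be defined as:

#define KVM_REQ_EXAMPLE_EVENT	KVM_ARCH_REQ(5)	/* resolves to bit 5 + 8 = 13 */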
/*
|
||||
@ -2247,6 +2246,19 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
|
||||
}
|
||||
#endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
|
||||
|
||||
/*
* If more than one page is being (un)accounted, @virt must be the address of
* the first page of a block of pages that were allocated together (i.e.
* accounted together).
*
* kvm_account_pgtable_pages() is thread-safe because mod_lruvec_page_state()
* is thread-safe.
*/
static inline void kvm_account_pgtable_pages(void *virt, int nr)
{
	mod_lruvec_page_state(virt_to_page(virt), NR_SECONDARY_PAGETABLE, nr);
}
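A minimal usage sketch of the helper above (an assumed caller, written only to show the charge/uncharge pairing; it roughly mirrors how the EPT/NPT and stage-2 MMUs are converted elsewhere in this series):

static int example_alloc_free_pgtable_page(void)
{
	void *pt = (void *)__get_free_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);

	if (!pt)
		return -ENOMEM;

	/* Charge the new page table page to NR_SECONDARY_PAGETABLE. */
	kvm_account_pgtable_pages(pt, +1);

	/* ... the page would be used as an EPT/NPT/stage-2 table here ... */

	/* Uncharge before freeing. */
	kvm_account_pgtable_pages(pt, -1);
	free_page((unsigned long)pt);
	return 0;
}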
|
||||
/*
|
||||
* This defines how many reserved entries we want to keep before we
|
||||
* kick the vcpu to the userspace to avoid dirty ring full. This
|
||||
|
@ -216,6 +216,7 @@ enum node_stat_item {
|
||||
NR_KERNEL_SCS_KB, /* measured in KiB */
|
||||
#endif
|
||||
NR_PAGETABLE, /* used for pagetables */
|
||||
NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. KVM pagetables */
|
||||
#ifdef CONFIG_SWAP
|
||||
NR_SWAPCACHE,
|
||||
#endif
|
||||
|
@ -1401,6 +1401,7 @@ static const struct memory_stat memory_stats[] = {
|
||||
{ "kernel", MEMCG_KMEM },
|
||||
{ "kernel_stack", NR_KERNEL_STACK_KB },
|
||||
{ "pagetables", NR_PAGETABLE },
|
||||
{ "sec_pagetables", NR_SECONDARY_PAGETABLE },
|
||||
{ "percpu", MEMCG_PERCPU_B },
|
||||
{ "sock", MEMCG_SOCK },
|
||||
{ "vmalloc", MEMCG_VMALLOC },
|
||||
|
@ -6085,7 +6085,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
|
||||
" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
|
||||
" unevictable:%lu dirty:%lu writeback:%lu\n"
|
||||
" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
|
||||
" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
|
||||
" mapped:%lu shmem:%lu pagetables:%lu\n"
|
||||
" sec_pagetables:%lu bounce:%lu\n"
|
||||
" kernel_misc_reclaimable:%lu\n"
|
||||
" free:%lu free_pcp:%lu free_cma:%lu\n",
|
||||
global_node_page_state(NR_ACTIVE_ANON),
|
||||
@ -6102,6 +6103,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
|
||||
global_node_page_state(NR_FILE_MAPPED),
|
||||
global_node_page_state(NR_SHMEM),
|
||||
global_node_page_state(NR_PAGETABLE),
|
||||
global_node_page_state(NR_SECONDARY_PAGETABLE),
|
||||
global_zone_page_state(NR_BOUNCE),
|
||||
global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
|
||||
global_zone_page_state(NR_FREE_PAGES),
|
||||
@ -6135,6 +6137,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
|
||||
" shadow_call_stack:%lukB"
|
||||
#endif
|
||||
" pagetables:%lukB"
|
||||
" sec_pagetables:%lukB"
|
||||
" all_unreclaimable? %s"
|
||||
"\n",
|
||||
pgdat->node_id,
|
||||
@ -6160,6 +6163,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
|
||||
node_page_state(pgdat, NR_KERNEL_SCS_KB),
|
||||
#endif
|
||||
K(node_page_state(pgdat, NR_PAGETABLE)),
|
||||
K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
|
||||
pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
|
||||
"yes" : "no");
|
||||
}
|
||||
|
@ -1247,6 +1247,7 @@ const char * const vmstat_text[] = {
|
||||
"nr_shadow_call_stack",
|
||||
#endif
|
||||
"nr_page_table_pages",
|
||||
"nr_sec_page_table_pages",
|
||||
#ifdef CONFIG_SWAP
|
||||
"nr_swapcached",
|
||||
#endif
|
||||
|
tools/testing/selftests/kvm/.gitignore
@ -28,6 +28,7 @@
|
||||
/x86_64/max_vcpuid_cap_test
|
||||
/x86_64/mmio_warning_test
|
||||
/x86_64/monitor_mwait_test
|
||||
/x86_64/nested_exceptions_test
|
||||
/x86_64/nx_huge_pages_test
|
||||
/x86_64/platform_info_test
|
||||
/x86_64/pmu_event_filter_test
|
||||
|
@ -91,6 +91,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
|
||||
TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
|
||||
TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
|
||||
TEST_GEN_PROGS_x86_64 += x86_64/monitor_mwait_test
|
||||
TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
|
||||
TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
|
||||
TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
|
||||
TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
|
||||
|
@ -203,14 +203,25 @@ struct hv_enlightened_vmcs {
|
||||
u32 reserved:30;
|
||||
} hv_enlightenments_control;
|
||||
u32 hv_vp_id;
|
||||
|
||||
u32 padding32_2;
|
||||
u64 hv_vm_id;
|
||||
u64 partition_assist_page;
|
||||
u64 padding64_4[4];
|
||||
u64 guest_bndcfgs;
|
||||
u64 padding64_5[7];
|
||||
u64 guest_ia32_perf_global_ctrl;
|
||||
u64 guest_ia32_s_cet;
|
||||
u64 guest_ssp;
|
||||
u64 guest_ia32_int_ssp_table_addr;
|
||||
u64 guest_ia32_lbr_ctl;
|
||||
u64 padding64_5[2];
|
||||
u64 xss_exit_bitmap;
|
||||
u64 padding64_6[7];
|
||||
u64 encls_exiting_bitmap;
|
||||
u64 host_ia32_perf_global_ctrl;
|
||||
u64 tsc_multiplier;
|
||||
u64 host_ia32_s_cet;
|
||||
u64 host_ssp;
|
||||
u64 host_ia32_int_ssp_table_addr;
|
||||
u64 padding64_6;
|
||||
};
|
||||
|
||||
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE 0
|
||||
@ -656,6 +667,18 @@ static inline int evmcs_vmread(uint64_t encoding, uint64_t *value)
|
||||
case VIRTUAL_PROCESSOR_ID:
|
||||
*value = current_evmcs->virtual_processor_id;
|
||||
break;
|
||||
case HOST_IA32_PERF_GLOBAL_CTRL:
|
||||
*value = current_evmcs->host_ia32_perf_global_ctrl;
|
||||
break;
|
||||
case GUEST_IA32_PERF_GLOBAL_CTRL:
|
||||
*value = current_evmcs->guest_ia32_perf_global_ctrl;
|
||||
break;
|
||||
case ENCLS_EXITING_BITMAP:
|
||||
*value = current_evmcs->encls_exiting_bitmap;
|
||||
break;
|
||||
case TSC_MULTIPLIER:
|
||||
*value = current_evmcs->tsc_multiplier;
|
||||
break;
|
||||
default: return 1;
|
||||
}
|
||||
|
||||
@ -1169,6 +1192,22 @@ static inline int evmcs_vmwrite(uint64_t encoding, uint64_t value)
|
||||
current_evmcs->virtual_processor_id = value;
|
||||
current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_XLAT;
|
||||
break;
|
||||
case HOST_IA32_PERF_GLOBAL_CTRL:
|
||||
current_evmcs->host_ia32_perf_global_ctrl = value;
|
||||
current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1;
|
||||
break;
|
||||
case GUEST_IA32_PERF_GLOBAL_CTRL:
|
||||
current_evmcs->guest_ia32_perf_global_ctrl = value;
|
||||
current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1;
|
||||
break;
|
||||
case ENCLS_EXITING_BITMAP:
|
||||
current_evmcs->encls_exiting_bitmap = value;
|
||||
current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2;
|
||||
break;
|
||||
case TSC_MULTIPLIER:
|
||||
current_evmcs->tsc_multiplier = value;
|
||||
current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2;
|
||||
break;
|
||||
default: return 1;
|
||||
}
|
||||
|
||||
|
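A usage sketch for one of the newly handled encodings, assuming the selftest environment above with an enlightened VMCS already loaded (i.e. current_evmcs is valid) and guest-side GUEST_ASSERT available:

static void check_tsc_multiplier_evmcs_field(void)
{
	uint64_t mult;

	/* 1ull << 48 encodes 1.0 in the TSC-scaling fixed-point format. */
	GUEST_ASSERT(!evmcs_vmwrite(TSC_MULTIPLIER, 1ull << 48));
	GUEST_ASSERT(!evmcs_vmread(TSC_MULTIPLIER, &mult));
	GUEST_ASSERT(mult == 1ull << 48);
}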
@ -9,15 +9,12 @@
|
||||
#ifndef SELFTEST_KVM_SVM_UTILS_H
|
||||
#define SELFTEST_KVM_SVM_UTILS_H
|
||||
|
||||
#include <asm/svm.h>
|
||||
|
||||
#include <stdint.h>
|
||||
#include "svm.h"
|
||||
#include "processor.h"
|
||||
|
||||
#define SVM_EXIT_EXCP_BASE 0x040
|
||||
#define SVM_EXIT_HLT 0x078
|
||||
#define SVM_EXIT_MSR 0x07c
|
||||
#define SVM_EXIT_VMMCALL 0x081
|
||||
|
||||
struct svm_test_data {
|
||||
/* VMCB */
|
||||
struct vmcb *vmcb; /* gva */
|
||||
|
@ -8,6 +8,8 @@
|
||||
#ifndef SELFTEST_KVM_VMX_H
|
||||
#define SELFTEST_KVM_VMX_H
|
||||
|
||||
#include <asm/vmx.h>
|
||||
|
||||
#include <stdint.h>
|
||||
#include "processor.h"
|
||||
#include "apic.h"
|
||||
@ -100,55 +102,6 @@
|
||||
#define VMX_EPT_VPID_CAP_AD_BITS 0x00200000
|
||||
|
||||
#define EXIT_REASON_FAILED_VMENTRY 0x80000000
|
||||
#define EXIT_REASON_EXCEPTION_NMI 0
|
||||
#define EXIT_REASON_EXTERNAL_INTERRUPT 1
|
||||
#define EXIT_REASON_TRIPLE_FAULT 2
|
||||
#define EXIT_REASON_INTERRUPT_WINDOW 7
|
||||
#define EXIT_REASON_NMI_WINDOW 8
|
||||
#define EXIT_REASON_TASK_SWITCH 9
|
||||
#define EXIT_REASON_CPUID 10
|
||||
#define EXIT_REASON_HLT 12
|
||||
#define EXIT_REASON_INVD 13
|
||||
#define EXIT_REASON_INVLPG 14
|
||||
#define EXIT_REASON_RDPMC 15
|
||||
#define EXIT_REASON_RDTSC 16
|
||||
#define EXIT_REASON_VMCALL 18
|
||||
#define EXIT_REASON_VMCLEAR 19
|
||||
#define EXIT_REASON_VMLAUNCH 20
|
||||
#define EXIT_REASON_VMPTRLD 21
|
||||
#define EXIT_REASON_VMPTRST 22
|
||||
#define EXIT_REASON_VMREAD 23
|
||||
#define EXIT_REASON_VMRESUME 24
|
||||
#define EXIT_REASON_VMWRITE 25
|
||||
#define EXIT_REASON_VMOFF 26
|
||||
#define EXIT_REASON_VMON 27
|
||||
#define EXIT_REASON_CR_ACCESS 28
|
||||
#define EXIT_REASON_DR_ACCESS 29
|
||||
#define EXIT_REASON_IO_INSTRUCTION 30
|
||||
#define EXIT_REASON_MSR_READ 31
|
||||
#define EXIT_REASON_MSR_WRITE 32
|
||||
#define EXIT_REASON_INVALID_STATE 33
|
||||
#define EXIT_REASON_MWAIT_INSTRUCTION 36
|
||||
#define EXIT_REASON_MONITOR_INSTRUCTION 39
|
||||
#define EXIT_REASON_PAUSE_INSTRUCTION 40
|
||||
#define EXIT_REASON_MCE_DURING_VMENTRY 41
|
||||
#define EXIT_REASON_TPR_BELOW_THRESHOLD 43
|
||||
#define EXIT_REASON_APIC_ACCESS 44
|
||||
#define EXIT_REASON_EOI_INDUCED 45
|
||||
#define EXIT_REASON_EPT_VIOLATION 48
|
||||
#define EXIT_REASON_EPT_MISCONFIG 49
|
||||
#define EXIT_REASON_INVEPT 50
|
||||
#define EXIT_REASON_RDTSCP 51
|
||||
#define EXIT_REASON_PREEMPTION_TIMER 52
|
||||
#define EXIT_REASON_INVVPID 53
|
||||
#define EXIT_REASON_WBINVD 54
|
||||
#define EXIT_REASON_XSETBV 55
|
||||
#define EXIT_REASON_APIC_WRITE 56
|
||||
#define EXIT_REASON_INVPCID 58
|
||||
#define EXIT_REASON_PML_FULL 62
|
||||
#define EXIT_REASON_XSAVES 63
|
||||
#define EXIT_REASON_XRSTORS 64
|
||||
#define LAST_EXIT_REASON 64
|
||||
|
||||
enum vmcs_field {
|
||||
VIRTUAL_PROCESSOR_ID = 0x00000000,
|
||||
@ -208,6 +161,8 @@ enum vmcs_field {
|
||||
VMWRITE_BITMAP_HIGH = 0x00002029,
|
||||
XSS_EXIT_BITMAP = 0x0000202C,
|
||||
XSS_EXIT_BITMAP_HIGH = 0x0000202D,
|
||||
ENCLS_EXITING_BITMAP = 0x0000202E,
|
||||
ENCLS_EXITING_BITMAP_HIGH = 0x0000202F,
|
||||
TSC_MULTIPLIER = 0x00002032,
|
||||
TSC_MULTIPLIER_HIGH = 0x00002033,
|
||||
GUEST_PHYSICAL_ADDRESS = 0x00002400,
|
||||
|
tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c (new file, 295 lines)
@ -0,0 +1,295 @@
|
||||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
#define _GNU_SOURCE /* for program_invocation_short_name */
|
||||
|
||||
#include "test_util.h"
|
||||
#include "kvm_util.h"
|
||||
#include "processor.h"
|
||||
#include "vmx.h"
|
||||
#include "svm_util.h"
|
||||
|
||||
#define L2_GUEST_STACK_SIZE 256
|
||||
|
||||
/*
|
||||
* Arbitrary, never shoved into KVM/hardware, just need to avoid conflict with
|
||||
* the "real" exceptions used, #SS/#GP/#DF (12/13/8).
|
||||
*/
|
||||
#define FAKE_TRIPLE_FAULT_VECTOR 0xaa
|
||||
|
||||
/* Arbitrary 32-bit error code injected by this test. */
|
||||
#define SS_ERROR_CODE 0xdeadbeef
|
||||
|
||||
/*
|
||||
* Bit '0' is set on Intel if the exception occurs while delivering a previous
|
||||
* event/exception. AMD's wording is ambiguous, but presumably the bit is set
|
||||
* if the exception occurs while delivering an external event, e.g. NMI or INTR,
|
||||
* but not for exceptions that occur when delivering other exceptions or
|
||||
* software interrupts.
|
||||
*
|
||||
* Note, Intel's name for it, "External event", is misleading and much more
|
||||
* aligned with AMD's behavior, but the SDM is quite clear on its behavior.
|
||||
*/
|
||||
#define ERROR_CODE_EXT_FLAG BIT(0)
|
||||
|
||||
/*
|
||||
* Bit '1' is set if the fault occurred when looking up a descriptor in the
|
||||
* IDT, which is the case here as the IDT is empty/NULL.
|
||||
*/
|
||||
#define ERROR_CODE_IDT_FLAG BIT(1)
|
||||
|
||||
/*
|
||||
* The #GP that occurs when vectoring #SS should show the index into the IDT
|
||||
* for #SS, plus have the "IDT flag" set.
|
||||
*/
|
||||
#define GP_ERROR_CODE_AMD ((SS_VECTOR * 8) | ERROR_CODE_IDT_FLAG)
|
||||
#define GP_ERROR_CODE_INTEL ((SS_VECTOR * 8) | ERROR_CODE_IDT_FLAG | ERROR_CODE_EXT_FLAG)
|
||||
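Worked out with the architectural #SS vector (SS_VECTOR is 12): 12 * 8 is 0x60, so L1 should observe a #GP error code of 0x62 on AMD and 0x63 on Intel. The asserts below are only an added illustration of that arithmetic, not part of the test.

_Static_assert(((12 * 8) | ERROR_CODE_IDT_FLAG) == 0x62, "AMD #GP error code");
_Static_assert(((12 * 8) | ERROR_CODE_IDT_FLAG | ERROR_CODE_EXT_FLAG) == 0x63,
	       "Intel #GP error code");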
|
||||
/*
|
||||
* Intel and AMD both shove '0' into the error code on #DF, regardless of what
|
||||
* led to the double fault.
|
||||
*/
|
||||
#define DF_ERROR_CODE 0
|
||||
|
||||
#define INTERCEPT_SS (BIT_ULL(SS_VECTOR))
|
||||
#define INTERCEPT_SS_DF (INTERCEPT_SS | BIT_ULL(DF_VECTOR))
|
||||
#define INTERCEPT_SS_GP_DF (INTERCEPT_SS_DF | BIT_ULL(GP_VECTOR))
|
||||
|
||||
static void l2_ss_pending_test(void)
|
||||
{
|
||||
GUEST_SYNC(SS_VECTOR);
|
||||
}
|
||||
|
||||
static void l2_ss_injected_gp_test(void)
|
||||
{
|
||||
GUEST_SYNC(GP_VECTOR);
|
||||
}
|
||||
|
||||
static void l2_ss_injected_df_test(void)
|
||||
{
|
||||
GUEST_SYNC(DF_VECTOR);
|
||||
}
|
||||
|
||||
static void l2_ss_injected_tf_test(void)
|
||||
{
|
||||
GUEST_SYNC(FAKE_TRIPLE_FAULT_VECTOR);
|
||||
}
|
||||
|
||||
static void svm_run_l2(struct svm_test_data *svm, void *l2_code, int vector,
|
||||
uint32_t error_code)
|
||||
{
|
||||
struct vmcb *vmcb = svm->vmcb;
|
||||
struct vmcb_control_area *ctrl = &vmcb->control;
|
||||
|
||||
vmcb->save.rip = (u64)l2_code;
|
||||
run_guest(vmcb, svm->vmcb_gpa);
|
||||
|
||||
if (vector == FAKE_TRIPLE_FAULT_VECTOR)
|
||||
return;
|
||||
|
||||
GUEST_ASSERT_EQ(ctrl->exit_code, (SVM_EXIT_EXCP_BASE + vector));
|
||||
GUEST_ASSERT_EQ(ctrl->exit_info_1, error_code);
|
||||
}
|
||||
|
||||
static void l1_svm_code(struct svm_test_data *svm)
|
||||
{
|
||||
struct vmcb_control_area *ctrl = &svm->vmcb->control;
|
||||
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
|
||||
|
||||
generic_svm_setup(svm, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
|
||||
svm->vmcb->save.idtr.limit = 0;
|
||||
ctrl->intercept |= BIT_ULL(INTERCEPT_SHUTDOWN);
|
||||
|
||||
ctrl->intercept_exceptions = INTERCEPT_SS_GP_DF;
|
||||
svm_run_l2(svm, l2_ss_pending_test, SS_VECTOR, SS_ERROR_CODE);
|
||||
svm_run_l2(svm, l2_ss_injected_gp_test, GP_VECTOR, GP_ERROR_CODE_AMD);
|
||||
|
||||
ctrl->intercept_exceptions = INTERCEPT_SS_DF;
|
||||
svm_run_l2(svm, l2_ss_injected_df_test, DF_VECTOR, DF_ERROR_CODE);
|
||||
|
||||
ctrl->intercept_exceptions = INTERCEPT_SS;
|
||||
svm_run_l2(svm, l2_ss_injected_tf_test, FAKE_TRIPLE_FAULT_VECTOR, 0);
|
||||
GUEST_ASSERT_EQ(ctrl->exit_code, SVM_EXIT_SHUTDOWN);
|
||||
|
||||
GUEST_DONE();
|
||||
}
|
||||
|
||||
static void vmx_run_l2(void *l2_code, int vector, uint32_t error_code)
|
||||
{
|
||||
GUEST_ASSERT(!vmwrite(GUEST_RIP, (u64)l2_code));
|
||||
|
||||
GUEST_ASSERT_EQ(vector == SS_VECTOR ? vmlaunch() : vmresume(), 0);
|
||||
|
||||
if (vector == FAKE_TRIPLE_FAULT_VECTOR)
|
||||
return;
|
||||
|
||||
GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_EXCEPTION_NMI);
|
||||
GUEST_ASSERT_EQ((vmreadz(VM_EXIT_INTR_INFO) & 0xff), vector);
|
||||
GUEST_ASSERT_EQ(vmreadz(VM_EXIT_INTR_ERROR_CODE), error_code);
|
||||
}
|
||||
|
||||
static void l1_vmx_code(struct vmx_pages *vmx)
|
||||
{
|
||||
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
|
||||
|
||||
GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
|
||||
|
||||
GUEST_ASSERT_EQ(load_vmcs(vmx), true);
|
||||
|
||||
prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
|
||||
GUEST_ASSERT_EQ(vmwrite(GUEST_IDTR_LIMIT, 0), 0);
|
||||
|
||||
/*
|
||||
* VMX disallows injecting an exception with error_code[31:16] != 0,
|
||||
* and hardware will never generate a VM-Exit with bits 31:16 set.
|
||||
* KVM should likewise truncate the "bad" userspace value.
|
||||
*/
|
||||
GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS_GP_DF), 0);
|
||||
vmx_run_l2(l2_ss_pending_test, SS_VECTOR, (u16)SS_ERROR_CODE);
|
||||
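/*
 * Concretely: SS_ERROR_CODE is 0xdeadbeef, so after the documented
 * truncation L1 must observe (u16)0xdeadbeef == 0xbeef in
 * VM_EXIT_INTR_ERROR_CODE for the pending-#SS run above.
 */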
vmx_run_l2(l2_ss_injected_gp_test, GP_VECTOR, GP_ERROR_CODE_INTEL);
|
||||
|
||||
GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS_DF), 0);
|
||||
vmx_run_l2(l2_ss_injected_df_test, DF_VECTOR, DF_ERROR_CODE);
|
||||
|
||||
GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS), 0);
|
||||
vmx_run_l2(l2_ss_injected_tf_test, FAKE_TRIPLE_FAULT_VECTOR, 0);
|
||||
GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_TRIPLE_FAULT);
|
||||
|
||||
GUEST_DONE();
|
||||
}
|
||||
|
||||
static void __attribute__((__flatten__)) l1_guest_code(void *test_data)
|
||||
{
|
||||
if (this_cpu_has(X86_FEATURE_SVM))
|
||||
l1_svm_code(test_data);
|
||||
else
|
||||
l1_vmx_code(test_data);
|
||||
}
|
||||
|
||||
static void assert_ucall_vector(struct kvm_vcpu *vcpu, int vector)
|
||||
{
|
||||
struct kvm_run *run = vcpu->run;
|
||||
struct ucall uc;
|
||||
|
||||
TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
|
||||
"Unexpected exit reason: %u (%s),\n",
|
||||
run->exit_reason, exit_reason_str(run->exit_reason));
|
||||
|
||||
switch (get_ucall(vcpu, &uc)) {
|
||||
case UCALL_SYNC:
|
||||
TEST_ASSERT(vector == uc.args[1],
|
||||
"Expected L2 to ask for %d, got %ld", vector, uc.args[1]);
|
||||
break;
|
||||
case UCALL_DONE:
|
||||
TEST_ASSERT(vector == -1,
|
||||
"Expected L2 to ask for %d, L2 says it's done", vector);
|
||||
break;
|
||||
case UCALL_ABORT:
|
||||
TEST_FAIL("%s at %s:%ld (0x%lx != 0x%lx)",
|
||||
(const char *)uc.args[0], __FILE__, uc.args[1],
|
||||
uc.args[2], uc.args[3]);
|
||||
break;
|
||||
default:
|
||||
TEST_FAIL("Expected L2 to ask for %d, got unexpected ucall %lu", vector, uc.cmd);
|
||||
}
|
||||
}
|
||||
|
||||
static void queue_ss_exception(struct kvm_vcpu *vcpu, bool inject)
|
||||
{
|
||||
struct kvm_vcpu_events events;
|
||||
|
||||
vcpu_events_get(vcpu, &events);
|
||||
|
||||
TEST_ASSERT(!events.exception.pending,
|
||||
"Vector %d unexpectedlt pending", events.exception.nr);
|
||||
TEST_ASSERT(!events.exception.injected,
|
||||
"Vector %d unexpectedly injected", events.exception.nr);
|
||||
|
||||
events.flags = KVM_VCPUEVENT_VALID_PAYLOAD;
|
||||
events.exception.pending = !inject;
|
||||
events.exception.injected = inject;
|
||||
events.exception.nr = SS_VECTOR;
|
||||
events.exception.has_error_code = true;
|
||||
events.exception.error_code = SS_ERROR_CODE;
|
||||
vcpu_events_set(vcpu, &events);
|
||||
}
|
||||
|
||||
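For comparison with the selftest wrappers used above, the same queueing can be sketched with the raw uapi (illustration only; it assumes an open vcpu_fd and KVM_CAP_EXCEPTION_PAYLOAD already enabled on the VM):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Queue a pending #GP (vector 13) with a zero error code. */
static void queue_gp_exception_raw(int vcpu_fd)
{
	struct kvm_vcpu_events events = {};

	if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &events))
		return;

	events.flags = KVM_VCPUEVENT_VALID_PAYLOAD;
	events.exception.pending = 1;
	events.exception.injected = 0;
	events.exception.nr = 13;		/* #GP */
	events.exception.has_error_code = 1;
	events.exception.error_code = 0;

	ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &events);
}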
/*
* Verify KVM_{G,S}ET_EVENTS play nice with pending vs. injected exceptions
* when an exception is being queued for L2. Specifically, verify that KVM
* honors L1 exception intercept controls when a #SS is pending/injected,
* triggers a #GP on vectoring the #SS, morphs to #DF if #GP isn't intercepted
* by L1, and finally causes (nested) SHUTDOWN if #DF isn't intercepted by L1.
*/
int main(int argc, char *argv[])
|
||||
{
|
||||
vm_vaddr_t nested_test_data_gva;
|
||||
struct kvm_vcpu_events events;
|
||||
struct kvm_vcpu *vcpu;
|
||||
struct kvm_vm *vm;
|
||||
|
||||
TEST_REQUIRE(kvm_has_cap(KVM_CAP_EXCEPTION_PAYLOAD));
|
||||
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM) || kvm_cpu_has(X86_FEATURE_VMX));
|
||||
|
||||
vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
|
||||
vm_enable_cap(vm, KVM_CAP_EXCEPTION_PAYLOAD, -2ul);
|
||||
|
||||
if (kvm_cpu_has(X86_FEATURE_SVM))
|
||||
vcpu_alloc_svm(vm, &nested_test_data_gva);
|
||||
else
|
||||
vcpu_alloc_vmx(vm, &nested_test_data_gva);
|
||||
|
||||
vcpu_args_set(vcpu, 1, nested_test_data_gva);
|
||||
|
||||
/* Run L1 => L2. L2 should sync and request #SS. */
|
||||
vcpu_run(vcpu);
|
||||
assert_ucall_vector(vcpu, SS_VECTOR);
|
||||
|
||||
/* Pend #SS and request immediate exit. #SS should still be pending. */
|
||||
queue_ss_exception(vcpu, false);
|
||||
vcpu->run->immediate_exit = true;
|
||||
vcpu_run_complete_io(vcpu);
|
||||
|
||||
/* Verify the pending event comes back out the same as it went in. */
|
||||
vcpu_events_get(vcpu, &events);
|
||||
ASSERT_EQ(events.flags & KVM_VCPUEVENT_VALID_PAYLOAD,
|
||||
KVM_VCPUEVENT_VALID_PAYLOAD);
|
||||
ASSERT_EQ(events.exception.pending, true);
|
||||
ASSERT_EQ(events.exception.nr, SS_VECTOR);
|
||||
ASSERT_EQ(events.exception.has_error_code, true);
|
||||
ASSERT_EQ(events.exception.error_code, SS_ERROR_CODE);
|
||||
|
||||
/*
|
||||
* Run for real with the pending #SS, L1 should get a VM-Exit due to
|
||||
* #SS interception and re-enter L2 to request #GP (via injected #SS).
|
||||
*/
|
||||
vcpu->run->immediate_exit = false;
|
||||
vcpu_run(vcpu);
|
||||
assert_ucall_vector(vcpu, GP_VECTOR);
|
||||
|
||||
/*
|
||||
* Inject #SS, the #SS should bypass interception and cause #GP, which
|
||||
* L1 should intercept before KVM morphs it to #DF. L1 should then
|
||||
* disable #GP interception and run L2 to request #DF (via #SS => #GP).
|
||||
*/
|
||||
queue_ss_exception(vcpu, true);
|
||||
vcpu_run(vcpu);
|
||||
assert_ucall_vector(vcpu, DF_VECTOR);
|
||||
|
||||
/*
|
||||
* Inject #SS, the #SS should bypass interception and cause #GP, which
* L1 is no longer intercepting, and so should see a #DF VM-Exit. L1
* should then signal that it is done.
|
||||
*/
|
||||
queue_ss_exception(vcpu, true);
|
||||
vcpu_run(vcpu);
|
||||
assert_ucall_vector(vcpu, FAKE_TRIPLE_FAULT_VECTOR);
|
||||
|
||||
/*
|
||||
* Inject #SS yet again. L1 is not intercepting #GP or #DF, and so
|
||||
* should see nested TRIPLE_FAULT / SHUTDOWN.
|
||||
*/
|
||||
queue_ss_exception(vcpu, true);
|
||||
vcpu_run(vcpu);
|
||||
assert_ucall_vector(vcpu, -1);
|
||||
|
||||
kvm_vm_free(vm);
|
||||
}
|
@ -118,13 +118,6 @@ void run_test(int reclaim_period_ms, bool disable_nx_huge_pages,
|
||||
vm = vm_create(1);
|
||||
|
||||
if (disable_nx_huge_pages) {
|
||||
/*
|
||||
* Cannot run the test without NX huge pages if the kernel
|
||||
* does not support it.
|
||||
*/
|
||||
if (!kvm_check_cap(KVM_CAP_VM_DISABLE_NX_HUGE_PAGES))
|
||||
return;
|
||||
|
||||
r = __vm_disable_nx_huge_pages(vm);
|
||||
if (reboot_permissions) {
|
||||
TEST_ASSERT(!r, "Disabling NX huge pages should succeed if process has reboot permissions");
|
||||
@ -248,18 +241,13 @@ int main(int argc, char **argv)
|
||||
}
|
||||
}
|
||||
|
||||
if (token != MAGIC_TOKEN) {
|
||||
print_skip("This test must be run with the magic token %d.\n"
|
||||
"This is done by nx_huge_pages_test.sh, which\n"
|
||||
"also handles environment setup for the test.",
|
||||
MAGIC_TOKEN);
|
||||
exit(KSFT_SKIP);
|
||||
}
|
||||
TEST_REQUIRE(kvm_has_cap(KVM_CAP_VM_DISABLE_NX_HUGE_PAGES));
|
||||
TEST_REQUIRE(reclaim_period_ms > 0);
|
||||
|
||||
if (!reclaim_period_ms) {
|
||||
print_skip("The NX reclaim period must be specified and non-zero");
|
||||
exit(KSFT_SKIP);
|
||||
}
|
||||
__TEST_REQUIRE(token == MAGIC_TOKEN,
|
||||
"This test must be run with the magic token %d.\n"
|
||||
"This is done by nx_huge_pages_test.sh, which\n"
|
||||
"also handles environment setup for the test.");
|
||||
|
||||
run_test(reclaim_period_ms, false, reboot_permissions);
|
||||
run_test(reclaim_period_ms, true, reboot_permissions);
|
||||
|
@ -3409,10 +3409,8 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
|
||||
int ret = -EINTR;
|
||||
int idx = srcu_read_lock(&vcpu->kvm->srcu);
|
||||
|
||||
if (kvm_arch_vcpu_runnable(vcpu)) {
|
||||
kvm_make_request(KVM_REQ_UNHALT, vcpu);
|
||||
if (kvm_arch_vcpu_runnable(vcpu))
|
||||
goto out;
|
||||
}
|
||||
if (kvm_cpu_has_pending_timer(vcpu))
|
||||
goto out;
|
||||
if (signal_pending(current))
|
||||
@ -5881,7 +5879,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
|
||||
|
||||
r = kvm_async_pf_init();
|
||||
if (r)
|
||||
goto out_free_5;
|
||||
goto out_free_4;
|
||||
|
||||
kvm_chardev_ops.owner = module;
|
||||
|
||||
@ -5905,10 +5903,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
|
||||
|
||||
out_unreg:
|
||||
kvm_async_pf_deinit();
|
||||
out_free_5:
|
||||
out_free_4:
|
||||
for_each_possible_cpu(cpu)
|
||||
free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
|
||||
out_free_4:
|
||||
kmem_cache_destroy(kvm_vcpu_cache);
|
||||
out_free_3:
|
||||
unregister_reboot_notifier(&kvm_reboot_notifier);
|
||||
|