linux

Author	SHA1	Message	Date
Sean Christopherson	2fba4fc155	KVM: VMX: Hide VMCS control calculators in vmx.c Now that nested VMX pulls KVM's desired VMCS controls from vmcs01 instead of re-calculating on the fly, bury the helpers that do the calcluations in vmx.c. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:35:15 -04:00
Sean Christopherson	b6247686b7	KVM: VMX: Drop caching of KVM's desired sec exec controls for vmcs01 Remove the secondary execution controls cache now that it's effectively dead code; it is only read immediately after it is written. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:35:15 -04:00
Sean Christopherson	389ab25216	KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01 When preparing controls for vmcs02, grab KVM's desired controls from vmcs01's shadow state instead of recalculating the controls from scratch, or in the secondary execution controls, instead of using the dedicated cache. Calculating secondary exec controls is eye-poppingly expensive due to the guest CPUID checks, hence the dedicated cache, but the other calculations aren't exactly free either. Explicitly clear several bits (x2APIC, DESC exiting, and load EFER on exit) as appropriate as they may be set in vmcs01, whereas the previous implementation relied on dynamic bits being cleared in the calculator. Intentionally propagate VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL from vmcs01 to vmcs02. Whether or not PERF_GLOBAL_CTRL is loaded depends on whether or not perf itself is active, so unless perf stops between the exit from L1 and entry to L2, vmcs01 will hold the desired value. This is purely an optimization as atomic_switch_perf_msrs() will set/clear the control as needed at VM-Enter, i.e. it avoids two extra VMWRITEs in the case where perf is active (versus starting with the bits clear in vmcs02, which was the previous behavior). Cc: Zeng Guang <guang.zeng@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:35:15 -04:00
Paolo Bonzini	c3e9434c98	Merge branch 'kvm-vmx-secctl' into HEAD Merge common topic branch for 5.14-rc6 and 5.15 merge window.	2021-08-10 13:45:26 -04:00
Sean Christopherson	7b9cae027b	KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to effectively get the controls for the current VMCS, as opposed to using vmx->secondary_exec_controls, which is the cached value of KVM's desired controls for vmcs01 and truly not reflective of any particular VMCS. While the waitpkg control is not dynamic, i.e. vmcs01 will always hold the same waitpkg configuration as vmx->secondary_exec_controls, the same does not hold true for vmcs02 if the L1 VMM hides the feature from L2. If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL, L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP. Fixes: `6e3ba4abce` ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-10 13:32:09 -04:00
Sean Christopherson	84ec8d2d53	KVM: VMX: Smush x2APIC MSR bitmap adjustments into single function Consolidate all of the dynamic MSR bitmap adjustments into vmx_update_msr_bitmap_x2apic(), and rename the mode tracker to reflect that it is x2APIC specific. If KVM gains more cases of dynamic MSR pass-through, odds are very good that those new cases will be better off with their own logic, e.g. see Intel PT MSRs and MSR_IA32_SPEC_CTRL. Attempting to handle all updates in a common helper did more harm than good, as KVM ended up collecting a large number of useless "updates". Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-42-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-02 11:01:58 -04:00
Sean Christopherson	284036c644	KVM: nVMX: Remove obsolete MSR bitmap refresh at nested transitions Drop unnecessary MSR bitmap updates during nested transitions, as L1's APIC_BASE MSR is not modified by the standard VM-Enter/VM-Exit flows, and L2's MSR bitmap is managed separately. In the unlikely event that L1 is pathological and loads APIC_BASE via the VM-Exit load list, KVM will handle updating the bitmap in its normal WRMSR flows. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-39-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-02 11:01:58 -04:00
Sean Christopherson	816be9e9be	KVM: nVMX: Don't evaluate "emulation required" on nested VM-Exit Use the "internal" variants of setting segment registers when stuffing state on nested VM-Exit in order to skip the "emulation required" updates. VM-Exit must always go to protected mode, and all segments are mostly hardcoded (to valid values) on VM-Exit. The bits of the segments that aren't hardcoded are explicitly checked during VM-Enter, e.g. the selector RPLs must all be zero. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-30-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-02 11:01:55 -04:00
Yu Zhang	c0e1303ed4	KVM: VMX: Remove vmx_msr_index from vmx.h vmx_msr_index was used to record the list of MSRs which can be lazily restored when kvm returns to userspace. It is now reimplemented as kvm_uret_msrs_list, a common x86 list which is only used inside x86.c. So just remove the obsolete declaration in vmx.h. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210707235702.31595-1-yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-07-15 10:19:41 -04:00
Sean Christopherson	b33bb78a1f	KVM: nVMX: Handle split-lock #AC exceptions that happen in L2 Mark #ACs that won't be reinjected to the guest as wanted by L0 so that KVM handles split-lock #AC from L2 instead of forwarding the exception to L1. Split-lock #AC isn't yet virtualized, i.e. L1 will treat it like a regular #AC and do the wrong thing, e.g. reinject it into L2. Fixes: `e6f8b6c12f` ("KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest") Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622172244.3561540-1-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 04:31:16 -04:00
Vineeth Pillai	3c86c0d3db	KVM: x86: hyper-v: Move the remote TLB flush logic out of vmx Currently the remote TLB flush logic is specific to VMX. Move it to a common place so that SVM can use it as well. Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <4f4e4ca19778437dae502f44363a38e99e3ef5d1.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:36 -04:00
Ilias Stamatis	1ab9287add	KVM: X86: Add vendor callbacks for writing the TSC multiplier Currently vmx_vcpu_load_vmcs() writes the TSC_MULTIPLIER field of the VMCS every time the VMCS is loaded. Instead of doing this, set this field from common code on initialization and whenever the scaling ratio changes. Additionally remove vmx->current_tsc_ratio. This field is redundant as vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling ratio. The vmx->current_tsc_ratio field is only used for avoiding unnecessary writes but it is no longer needed after removing the code from the VMCS load path. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Message-Id: <20210607105438.16541-1-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:29 -04:00
Ilias Stamatis	307a94c721	KVM: X86: Add functions for retrieving L2 TSC fields from common code In order to implement as much of the nested TSC scaling logic as possible in common code, we need these vendor callbacks for retrieving the TSC offset and the TSC multiplier that L1 has set for L2. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-7-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:28 -04:00
Sean Christopherson	ee9d22e08d	KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list Explicitly flag a uret MSR as needing to be loaded into hardware instead of resorting the list of "active" MSRs and tracking how many MSRs in total need to be loaded. The only benefit to sorting the list is that the loop to load MSRs during vmx_prepare_switch_to_guest() doesn't need to iterate over all supported uret MRS, only those that are active. But that is a pointless optimization, as the most common case, running a 64-bit guest, will load the vast majority of MSRs. Not to mention that a single WRMSR is far more expensive than iterating over the list. Providing a stable list order obviates the need to track a given MSR's "slot" in the per-CPU list of user return MSRs; all lists simply use the same ordering. Future patches will take advantage of the stable order to further simplify the related code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-07 06:06:18 -04:00
Sean Christopherson	b6194b94a2	KVM: VMX: Configure list of user return MSRs at module init Configure the list of user return MSRs that are actually supported at module init instead of reprobing the list of possible MSRs every time a vCPU is created. Curating the list on a per-vCPU basis is pointless; KVM is completely hosed if the set of supported MSRs changes after module init, or if the set of MSRs differs per physical PCU. The per-vCPU lists also increase complexity (see __vmx_find_uret_msr()) and creates corner cases that _should_ be impossible, but theoretically exist in KVM, e.g. advertising RDTSCP to userspace without actually being able to virtualize RDTSCP if probing MSR_TSC_AUX fails. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-07 06:06:17 -04:00
Sean Christopherson	e23f6d490e	KVM: VMX: Invert the inlining of MSR interception helpers Invert the inline declarations of the MSR interception helpers between the wrapper, vmx_set_intercept_for_msr(), and the core implementations, vmx_{dis,en}able_intercept_for_msr(). Letting the compiler _not_ inline the implementation reduces KVM's code footprint by ~3k bytes. Back when the helpers were added in commit `904e14fb7c` ("KVM: VMX: make MSR bitmaps per-VCPU"), both the wrapper and the implementations were __always_inline because the end code distilled down to a few conditionals and a bit operation. Today, the implementations involve a variety of checks and bit ops in order to support userspace MSR filtering. Furthermore, the vast majority of calls to manipulate MSR interception are not performance sensitive, e.g. vCPU creation and x2APIC toggling. On the other hand, the one path that is performance sensitive, dynamic LBR passthrough, uses the wrappers, i.e. is largely untouched by inverting the inlining. In short, forcing the low level MSR interception code to be inlined no longer makes sense. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210423221912.3857243-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-26 05:19:33 -04:00
Sean Christopherson	8f102445d4	KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs Emulate the four Launch Enclave public key hash MSRs (LE hash MSRs) that exist on CPUs that support SGX Launch Control (LC). SGX LC modifies the behavior of ENCLS[EINIT] to use the LE hash MSRs when verifying the key used to sign an enclave. On CPUs without LC support, the LE hash is hardwired into the CPU to an Intel controlled key (the Intel key is also the reset value of the LE hash MSRs). Track the guest's desired hash so that a future patch can stuff the hash into the hardware MSRs when executing EINIT on behalf of the guest, when those MSRs are writable in host. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <c58ef601ddf88f3a113add837969533099b1364a.1618196135.git.kai.huang@intel.com> [Add a comment regarding the MSRs being available until SGX is locked. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-20 04:18:55 -04:00
David Edmondson	0702a3cbbf	KVM: x86: dump_vmcs should show the effective EFER If EFER is not being loaded from the VMCS, show the effective value by reference to the MSR autoload list or calculation. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-5-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:31:00 -04:00
Sean Christopherson	978c834a66	KVM: VMX: Track root HPA instead of EPTP for paravirt Hyper-V TLB flush Track the address of the top-level EPT struct, a.k.a. the root HPA, instead of the EPTP itself for Hyper-V's paravirt TLB flush. The paravirt API takes only the address, not the full EPTP, and in theory tracking the EPTP could lead to false negatives, e.g. if the HPA matched but the attributes in the EPTP do not. In practice, such a mismatch is extremely unlikely, if not flat out impossible, given how KVM generates the EPTP. Opportunsitically rename the related fields to use the 'root' nomenclature, and to prefix them with 'hv_' to connect them to Hyper-V's paravirt TLB flushing. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:44:06 -04:00
Sean Christopherson	ee36656f0a	KVM: VMX: Define Hyper-V paravirt TLB flush fields iff Hyper-V is enabled Ifdef away the Hyper-V specific fields in structs kvm_vmx and vcpu_vmx as each field has only a single reference outside of the struct itself that isn't already wrapped in ifdeffery (and both are initialization). vcpu_vmx.ept_pointer in particular should be wrapped as it is valid if and only if Hyper-v is active, i.e. non-Hyper-V code cannot rely on it to actually track the current EPTP (without additional code changes). Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:44:04 -04:00
Sean Christopherson	cdbd4b40e7	KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch Drop the dedicated 'ept_pointers_match' field in favor of stuffing 'hv_tlb_eptp' with INVALID_PAGE to mark it as invalid, i.e. to denote that there is at least one EPTP mismatch. Use a local variable to track whether or not a mismatch is detected so that hv_tlb_eptp can be used to skip redundant flushes. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:44:01 -04:00
Sean Christopherson	a4038ef1aa	KVM: VMX: Track common EPTP for Hyper-V's paravirt TLB flush Explicitly track the EPTP that is common to all vCPUs instead of grabbing vCPU0's EPTP when invoking Hyper-V's paravirt TLB flush. Tracking the EPTP will allow optimizing the checks when loading a new EPTP and will also allow dropping ept_pointer_match, e.g. by marking the common EPTP as invalid. This also technically fixes a bug where KVM could theoretically flush an invalid GPA if all vCPUs have an invalid root. In practice, it's likely impossible to trigger a remote TLB flush in such a scenario. In any case, the superfluous flush is completely benign. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:57 -04:00
Sean Christopherson	e83bc09caf	KVM: x86: Get active PCID only when writing a CR3 value Retrieve the active PCID only when writing a guest CR3 value, i.e. don't get the PCID when using EPT or NPT. The PCID is especially problematic for EPT as the bits have different meaning, and so the PCID and must be manually stripped, which is annoying and unnecessary. And on VMX, getting the active PCID also involves reading the guest's CR3 and CR4.PCIDE, i.e. may add pointless VMREADs. Opportunistically rename the pgd/pgd_level params to root_hpa and root_level to better reflect their new roles. Keep the function names, as "load the guest PGD" is still accurate/correct. Last, and probably least, pass root_hpa as a hpa_t/u64 instead of an unsigned long. The EPTP holds a 64-bit value, even in 32-bit mode, so in theory EPT could support HIGHMEM for 32-bit KVM. Never mind that doing so would require changing the MMU page allocators and reworking the MMU to use kmap(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-2-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:56 -04:00
Makarand Sonare	a85863c2ec	KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging Currently, if enable_pml=1 PML remains enabled for the entire lifetime of the VM irrespective of whether dirty logging is enable or disabled. When dirty logging is disabled, all the pages of the VM are manually marked dirty, so that PML is effectively non-operational. Setting the dirty bits is an expensive operation which can cause severe MMU lock contention in a performance sensitive path when dirty logging is disabled after a failed or canceled live migration. Manually setting dirty bits also fails to prevent PML activity if some code path clears dirty bits, which can incur unnecessary VM-Exits. In order to avoid this extra overhead, dynamically enable/disable PML when dirty logging gets turned on/off for the first/last memslot. Signed-off-by: Makarand Sonare <makarandsonare@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:34 -05:00
Uros Bizjak	150f17bfab	KVM/nVMX: Use __vmx_vcpu_run in nested_vmx_check_vmentry_hw Replace inline assembly in nested_vmx_check_vmentry_hw with a call to __vmx_vcpu_run. The function is not performance critical, so (double) GPR save/restore in __vmx_vcpu_run can be tolerated, as far as performance effects are concerned. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Reviewed-and-tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> [sean: dropped versioning info from changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201231002702.2223707-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:32 -05:00
Jason Baron	b6a7cc3544	KVM: X86: prepend vmx/svm prefix to additional kvm_x86_ops functions A subsequent patch introduces macros in preparation for simplifying the definition for vmx_x86_ops and svm_x86_ops. Making the naming more uniform expands the coverage of the macros. Add vmx/svm prefix to the following functions: update_exception_bitmap(), enable_nmi_window(), enable_irq_window(), update_cr8_intercept and enable_smi_window(). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Message-Id: <ed594696f8e2c2b2bfc747504cee9bbb2a269300.1610680941.git.jbaron@akamai.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:29 -05:00
Like Xu	9254beaafd	KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation When the LBR records msrs has already been pass-through, there is no need to call vmx_update_intercept_for_lbr_msrs() again and again, and vice versa. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-8-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:25 -05:00
Like Xu	1b5ac3226a	KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE In addition to DEBUGCTLMSR_LBR, any KVM trap caused by LBR msrs access will result in a creation of guest LBR event per-vcpu. If the guest LBR event is scheduled on with the corresponding vcpu context, KVM will pass-through all LBR records msrs to the guest. The LBR callstack mechanism implemented in the host could help save/restore the guest LBR records during the event context switches, which reduces a lot of overhead if we save/restore tens of LBR msrs (e.g. 32 LBR records entries) in the much more frequent VMX transitions. To avoid reclaiming LBR resources from any higher priority event on host, KVM would always check the exist of guest LBR event and its state before vm-entry as late as possible. A negative result would cancel the pass-through state, and it also prevents real registers accesses and potential data leakage. If host reclaims the LBR between two checks, the interception state and LBR records can be safely preserved due to native save/restore support from guest LBR event. The KVM emits a pr_warn() when the LBR hardware is unavailable to the guest LBR event. The administer is supposed to reminder users that the guest result may be inaccurate if someone is using LBR to record hypervisor on the host side. Suggested-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-7-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:25 -05:00
Like Xu	8e12911b24	KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR When vcpu sets DEBUGCTLMSR_LBR in the MSR_IA32_DEBUGCTLMSR, the KVM handler would create a guest LBR event which enables the callstack mode and none of hardware counter is assigned. The host perf would schedule and enable this event as usual but in an exclusive way. The guest LBR event will be released when the vPMU is reset but soon, the lazy release mechanism would be applied to this event like a vPMC. Suggested-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-6-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:24 -05:00
Like Xu	c646236344	KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES MSR which tells about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:24 -05:00
Paolo Bonzini	9c9520ce88	KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES MSR which tells about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:24 -05:00
Like Xu	252e365eb2	KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static To make code responsibilities clear, we may resue and invoke the vmx_set_intercept_for_msr() in other vmx-specific files (e.g. pmu_intel.c), so expose it to passthrough LBR msrs later. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-2-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:23 -05:00
Chenyi Qiang	fe6b6bc802	KVM: VMX: Enable bus lock VM exit Virtual Machine can exploit bus locks to degrade the performance of system. Bus lock can be caused by split locked access to writeback(WB) memory or by using locks on uncacheable(UC) memory. The bus lock is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). To address the threat, bus lock VM exit is introduced to notify the VMM when a bus lock was acquired, allowing it to enforce throttling or other policy based mitigations. A VMM can enable VM exit due to bus locks by setting a new "Bus Lock Detection" VM-execution control(bit 30 of Secondary Processor-based VM execution controls). If delivery of this VM exit was preempted by a higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of exit reason in vmcs field is set to 1. In current implementation, the KVM exposes this capability through KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap (i.e. off and exit) and enable it explicitly (disabled by default). If bus locks in guest are detected by KVM, exit to user space even when current exit reason is handled by KVM internally. Set a new field KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there is a bus lock detected in guest. Document for Bus Lock VM exit is now available at the latest "Intel Architecture Instruction Set Extensions Programming Reference". Document Link: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:21 -05:00
Sean Christopherson	8e53324021	KVM: VMX: Convert vcpu_vmx.exit_reason to a union Convert vcpu_vmx.exit_reason from a u32 to a union (of size u32). The full VM_EXIT_REASON field is comprised of a 16-bit basic exit reason in bits 15:0, and single-bit modifiers in bits 31:16. Historically, KVM has only had to worry about handling the "failed VM-Entry" modifier, which could only be set in very specific flows and required dedicated handling. I.e. manually stripping the FAILED_VMENTRY bit was a somewhat viable approach. But even with only a single bit to worry about, KVM has had several bugs related to comparing a basic exit reason against the full exit reason store in vcpu_vmx. Upcoming Intel features, e.g. SGX, will add new modifier bits that can be set on more or less any VM-Exit, as opposed to the significantly more restricted FAILED_VMENTRY, i.e. correctly handling everything in one-off flows isn't scalable. Tracking exit reason in a union forces code to explicitly choose between consuming the full exit reason and the basic exit, and is a convenient way to document and access the modifiers. No functional change intended. Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-2-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:21 -05:00
Sean Christopherson	c2fe3cd460	KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(), and invoke the new hook from kvm_valid_cr4(). This fixes an issue where KVM_SET_SREGS would return success while failing to actually set CR4. Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return in __set_sregs() is not a viable option as KVM has already stuffed a variety of vCPU state. Note, kvm_valid_cr4() and is_valid_cr4() have different return types and inverted semantics. This will be remedied in a future patch. Fixes: `5e1746d620` ("KVM: nVMX: Allow setting the VMXE bit in CR4") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-11-15 09:49:07 -05:00
Paolo Bonzini	c0623f5e5d	Merge branch 'kvm-fixes' into 'next' Pick up bugfixes from 5.9, otherwise various tests fail.	2020-10-21 18:05:58 -04:00
Maxim Levitsky	72f211ecaa	KVM: x86: allow kvm_x86_ops.set_efer to return an error value This will be used to signal an error to the userspace, in case the vendor code failed during handling of this msr. (e.g -ENOMEM) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-21 17:48:48 -04:00
Alexander Graf	3eb900173c	KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied We will introduce the concept of MSRs that may not be handled in kernel space soon. Some MSRs are directly passed through to the guest, effectively making them handled by KVM from user space's point of view. This patch introduces all logic required to ensure that MSRs that user space wants trapped are not marked as direct access for guests. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-7-graf@amazon.com> [Replace "_idx" with "_slot". - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:07 -04:00
Aaron Lewis	476c9bd8e9	KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs Prepare vmx and svm for a subsequent change that ensures the MSR permission bitmap is set to allow an MSR that userspace is tracking to force a vmx_vmexit in the guest. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Oliver Upton <oupton@google.com> [agraf: rebase, adapt SVM scheme to nested changes that came in between] Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-5-graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:05 -04:00
Sean Christopherson	802145c56a	KVM: VMX: Rename vmx_uret_msr's "index" to "slot" Rename "index" to "slot" in struct vmx_uret_msr to align with the terminology used by common x86's kvm_user_return_msrs, and to avoid conflating "MSR's ECX index" with "MSR's index into an array". No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-16-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:02 -04:00
Sean Christopherson	d85a8034c0	KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr" Rename "find_msr_entry" to scope it to VMX and to associate it with guest_uret_msrs. Drop the "entry" so that the function name pairs with the existing __vmx_find_uret_msr(), which intentionally uses a double underscore prefix instead of appending "index" or "slot" as those names are already claimed by other pieces of the user return MSR stack. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-13-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:00 -04:00
Sean Christopherson	658ece84f5	KVM: VMX: Rename vcpu_vmx's "guest_msrs_ready" to "guest_uret_msrs_loaded" Add "uret" to "guest_msrs_ready" to explicitly associate it with the "guest_uret_msrs" array, and replace "ready" with "loaded" to more precisely reflect what it tracks, e.g. "ready" could be interpreted as meaning ready for processing (setup_msrs() has run), which is wrong. "loaded" also aligns with the similar "guest_state_loaded" field. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-8-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:58 -04:00
Sean Christopherson	e9bb1ae92d	KVM: VMX: Rename vcpu_vmx's "save_nmsrs" to "nr_active_uret_msrs" Add "uret" into the name of "save_nmsrs" to explicitly associate it with the guest_uret_msrs array, and replace "save" with "active" (for lack of a better word) to better describe what is being tracked. While "save" is more or less accurate when viewed as a literal description of the field, e.g. it holds the number of MSRs that were saved into the array the last time setup_msrs() was invoked, it can easily be misinterpreted by the reader, e.g. as meaning the number of MSRs that were saved from hardware at some point in the past, or as the number of MSRs that need to be saved at some point in the future, both of which are wrong. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:57 -04:00
Sean Christopherson	fbc1800738	KVM: VMX: Rename vcpu_vmx's "nmsrs" to "nr_uret_msrs" Rename vcpu_vmx.nsmrs to vcpu_vmx.nr_uret_msrs to explicitly associate it with the guest_uret_msrs array. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:57 -04:00
Sean Christopherson	eb3db1b137	KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr" Rename struct "shared_msr_entry" to "vmx_uret_msr" to align with x86's rename of "shared_msrs" to "user_return_msrs", and to call out that the struct is specific to VMX, i.e. not part of the generic "shared_msrs" framework. Abbreviate "user_return" as "uret" to keep line lengths marginally sane and code more or less readable. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:56 -04:00
Sean Christopherson	a128a934f2	KVM: VMX: Rename "vmx_find_msr_index" to "vmx_find_loadstore_msr_slot" Add "loadstore" to vmx_find_msr_index() to differentiate it from the so called shared MSRs helpers (which will soon be renamed), and replace "index" with "slot" to better convey that the helper returns slot in the array, not the MSR index (the value that gets stuffed into ECX). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:55 -04:00
Sean Christopherson	ce833b2324	KVM: VMX: Prepend "MAX_" to MSR array size defines Add "MAX" to the LOADSTORE and so called SHARED MSR defines to make it more clear that the define controls the array size, as opposed to the actual number of valid entries that are in the array. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:55 -04:00
Sean Christopherson	2ba4493a8b	KVM: nVMX: Explicitly check for valid guest state for !unrestricted guest Call guest_state_valid() directly instead of querying emulation_required when checking if L1 is attempting VM-Enter with invalid guest state. If emulate_invalid_guest_state is false, KVM will fixup segment regs to avoid emulation and will never set emulation_required, i.e. KVM will incorrectly miss the associated consistency checks because the nested path stuffs segments directly into vmcs02. Opportunsitically add Consistency Check tracing to make future debug suck a little less. Fixes: `2bb8cafea8` ("KVM: vVMX: signal failure for nested VMEntry if emulation_required") Fixes: `3184a995f7` ("KVM: nVMX: fix vmentry failure code when L2 state would require emulation") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923184452.980-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:45 -04:00
Sean Christopherson	5a085326d5	KVM: VMX: Rename ops.h to vmx_ops.h Rename ops.h to vmx_ops.h to allow adding a tdx_ops.h in the future without causing massive confusion. Trust Domain Extensions (TDX) is built on VMX, but KVM cannot directly access the VMCS(es) for a TDX guest, thus TDX will need its own "ops" implementation for wrapping the low level operations. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923183112.3030-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:38 -04:00
Xiaoyao Li	8888cdd099	KVM: VMX: Extract posted interrupt support to separate files Extract the posted interrupt code so that it can be reused for Trust Domain Extensions (TDX), which requires posted interrupts and can use KVM VMX's implementation almost verbatim. TDX is different enough from raw VMX that it is highly desirable to implement the guts of TDX in a separate file, i.e. reusing posted interrupt code by shoving TDX support into vmx.c would be a mess. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923183112.3030-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:38 -04:00

1 2 3

133 Commits