linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-04 01:51:34 +00:00

Author	SHA1	Message	Date
Vitaly Kuznetsov	046f5756c4	KVM: nVMX: hyper-v: Cache VP assist page in 'struct kvm_vcpu_hv' In preparation for enabling L2 TLB flush, cache VP assist page in 'struct kvm_vcpu_hv'. While at it, rename nested_enlightened_vmentry() to nested_get_evmptr() and make it return eVMCS GPA directly. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-26-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:16 -05:00
Vitaly Kuznetsov	d4baf1a9a5	KVM: x86: hyper-v: Introduce fast guest_hv_cpuid_has_l2_tlb_flush() check Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL from L2 in L0 to process L2 TLB flush requests. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-25-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:15 -05:00
Vitaly Kuznetsov	c58a318f60	KVM: x86: hyper-v: L2 TLB flush Handle L2 TLB flush requests by going through all vCPUs and checking whether there are vCPUs running the same VM_ID with a VP_ID specified in the requests. Perform synthetic exit to L2 upon finish. Note, while checking VM_ID/VP_ID of running vCPUs seems to be a bit racy, we count on the fact that KVM flushes the whole L2 VPID upon transition. Also, KVM_REQ_HV_TLB_FLUSH request needs to be done upon transition between L1 and L2 to make sure all pending requests are always processed. For reference, Hyper-V TLFS refers to the feature as "Direct Virtual Flush". Note, nVMX/nSVM code does not handle VMCALL/VMMCALL from L2 yet. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-24-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:14 -05:00
Vitaly Kuznetsov	3c9eb0655f	KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall() The newly introduced helper checks whether vCPU is performing a Hyper-V TLB flush hypercall. This is required to filter out L2 TLB flush hypercalls for processing. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-23-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:14 -05:00
Vitaly Kuznetsov	b0c9c25e46	KVM: x86: Introduce .hv_inject_synthetic_vmexit_post_tlb_flush() nested hook Hyper-V supports injecting synthetic L2->L1 exit after performing L2 TLB flush operation but the procedure is vendor specific. Introduce .hv_inject_synthetic_vmexit_post_tlb_flush nested hook for it. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-22-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:13 -05:00
Vitaly Kuznetsov	e45aa2444d	KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id Similar to nVMX, KVM needs to know L2's VM_ID/VP_ID and Partition assist page address to handle L2 TLB flush requests. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-21-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:12 -05:00
Vitaly Kuznetsov	38edb45231	KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use To handle L2 TLB flush requests, KVM needs to keep track of L2's VM_ID/ VP_IDs which are set by L1 hypervisor. 'Partition assist page' address is also needed to handle post-flush exit to L1 upon request. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-20-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:11 -05:00
Vitaly Kuznetsov	7d5e88d301	KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks' To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1 may use vCPU overcommit for L2. To avoid growing on-stack allocation, make 'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated dynamically. Note: sparse_set_to_vcpu_mask() can't currently be used to handle L2 requests as KVM does not keep L2 VM_ID -> L2 VCPU_ID -> L1 vCPU mappings, i.e. its vp_bitmap array is still bounded by the number of L1 vCPUs and so can remain an on-stack allocation. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-19-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:10 -05:00
Vitaly Kuznetsov	53ca765a04	KVM: x86: hyper-v: Create a separate fifo for L2 TLB flush To handle L2 TLB flush requests, KVM needs to use a separate fifo from regular (L1) Hyper-V TLB flush requests: e.g. when a request to flush something in L2 is made, the target vCPU can transition from L2 to L1, receive a request to flush a GVA for L1 and then try to re-enter L2. The first request needs to be processed at this point. Similarly, requests to flush GVAs in L1 must wait until L2 exits to L1. No functional change as KVM doesn't handle L2 TLB flush requests from L2 yet. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-18-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:10 -05:00
Vitaly Kuznetsov	b6c2c22fa7	KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi() Get rid of on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi() for a smaller number of vCPUs in the request. When Hyper-V TLB flush is in use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to send IPIs to a large number of vCPUs (and are rarely used in general). Introduce hv_is_vp_in_sparse_set() to directly check if the specified VP_ID is present in sparse vCPU set. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-17-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:09 -05:00
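The direct membership check described in b6c2c22fa7 can be sketched in userspace C. This is a minimal illustration, not the kernel's hv_is_vp_in_sparse_set(): the function name vp_in_sparse_set(), the popcount64() helper, and the argument layout are assumptions made for the example. It assumes the sparse set is a 64-bit valid-bank mask plus a packed array holding one 64-bit word per set bit of that mask, in ascending bank order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values named in the commit messages: 64 banks of 64 vCPUs each. */
#define HV_MAX_SPARSE_VCPU_BANKS 64
#define HV_VCPUS_PER_SPARSE_BANK 64

/* Portable population count (hypothetical helper for this sketch). */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/*
 * Return true if vp_id is present in the sparse set.
 * valid_bank_mask has one bit per bank; sparse_banks[] holds one word
 * for each set bit of the mask, packed in ascending bank order.
 */
static bool vp_in_sparse_set(uint32_t vp_id, uint64_t valid_bank_mask,
			     const uint64_t *sparse_banks)
{
	uint32_t bank = vp_id / HV_VCPUS_PER_SPARSE_BANK;
	uint32_t bit = vp_id % HV_VCPUS_PER_SPARSE_BANK;
	uint64_t preceding;

	if (bank >= HV_MAX_SPARSE_VCPU_BANKS)
		return false;
	if (!(valid_bank_mask & (UINT64_C(1) << bank)))
		return false;

	/* Index into the packed array = number of valid banks below this one. */
	preceding = valid_bank_mask & ((UINT64_C(1) << bank) - 1);
	return (sparse_banks[popcount64(preceding)] & (UINT64_C(1) << bit)) != 0;
}
```

Checking one VP_ID this way touches a single bank word, so no per-request vCPU mask has to be built when only a few vCPUs are targeted.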
Vitaly Kuznetsov	ca7372aca7	KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64' It may not be clear where the '64' limit for the maximum sparse bank number comes from; use the HV_MAX_SPARSE_VCPU_BANKS define instead. Use HV_VCPUS_PER_SPARSE_BANK in KVM_HV_MAX_SPARSE_VCPU_SET_BITS's definition. Opportunistically adjust the comment around BUILD_BUG_ON(). No functional change. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-16-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:08 -05:00
Vitaly Kuznetsov	bd19c94a19	x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants It may not be clear where the magical '64' value used in __cpumask_to_vpset() comes from. Moreover, '64' means both the maximum sparse bank number as well as the number of vCPUs per bank. Add defines to make things clear. These defines are also going to be used by KVM. No functional change. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-15-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:07 -05:00
Vitaly Kuznetsov	aee738236d	KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs To handle L2 TLB flush requests, KVM needs to translate the specified L2 GPA to L1 GPA to read hypercall arguments from there. No functional change as KVM doesn't handle VMCALL/VMMCALL from L2 yet. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-14-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:07 -05:00
Vitaly Kuznetsov	f84fcb6656	KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls Extended GVA ranges support bit seems to indicate whether lower 12 bits of GVA can be used to specify up to 4095 additional consecutive GVAs to flush. This is somewhat described in TLFS. Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} requests by flushing the whole VPID so technically, extended GVA ranges were already supported. As such requests are handled more gently now, advertising support for extended ranges starts making sense to reduce the size of TLB flush requests. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-13-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:06 -05:00
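The extended GVA range encoding mentioned in f84fcb6656 can be illustrated with a small decoder. This is a sketch under stated assumptions: 4 KiB pages (so the low 12 bits are free to carry a count), and the names struct gva_range and decode_flush_entry() are hypothetical, not taken from KVM.

```c
#include <stdint.h>

#define GVA_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define GVA_LOW_MASK ((UINT64_C(1) << GVA_PAGE_SHIFT) - 1)

/* Decoded form of one flush-list entry (hypothetical type). */
struct gva_range {
	uint64_t base;	/* page-aligned first address to flush */
	uint32_t pages;	/* total pages to flush, 1..4096 */
};

/*
 * The low 12 bits hold the number of ADDITIONAL pages that follow the
 * base page, so an entry covers between 1 and 4096 consecutive pages.
 */
static struct gva_range decode_flush_entry(uint64_t entry)
{
	struct gva_range r;

	r.base = entry & ~GVA_LOW_MASK;
	r.pages = (uint32_t)(entry & GVA_LOW_MASK) + 1;
	return r;
}
```

With this encoding a single 64-bit entry can describe a run of up to 4096 pages, which is why advertising the feature can shrink flush requests.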
Vitaly Kuznetsov	260970862c	KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently Currently, HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls are handled the exact same way as HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE{,EX}: by flushing the whole VPID and this is sub-optimal. Switch to handling these requests with 'flush_tlb_gva()' hooks instead. Use the newly introduced TLB flush fifo to queue the requests. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-12-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:05 -05:00
Sean Christopherson	56b5354fd8	KVM: x86: hyper-v: Add helper to read hypercall data for array Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for reading a guest-provided array can be reused in the future, e.g. for getting a list of virtual addresses whose TLB entries need to be flushed. Opportunistically swap the order of the data and XMM adjustment so that the XMM/gpa offsets are bundled together. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-11-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:04 -05:00
Vitaly Kuznetsov	0823570f01	KVM: x86: hyper-v: Introduce TLB flush fifo To allow flushing individual GVAs instead of always flushing the whole VPID a per-vCPU structure to pass the requests is needed. Use standard 'kfifo' to queue two types of entries: individual GVA (GFN + up to 4095 following GFNs in the lower 12 bits) and 'flush all'. The size of the fifo is arbitrarily set to '16'. Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now and kvm_hv_vcpu_flush_tlb() doesn't actually read the fifo just resets the queue before returning -EOPNOTSUPP (which triggers full TLB flush) so the functional change is very small but the infrastructure is prepared to handle individual GVA flush requests. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-10-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:04 -05:00
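The queueing scheme from 0823570f01 (a 16-entry fifo carrying either individual GVA entries or a 'flush all' marker) can be sketched with a plain ring buffer. This is not the kernel's kfifo-based code: struct flush_fifo, flush_enqueue(), flush_drain(), and the FLUSH_ALL_ENTRY sentinel value are assumptions for illustration, and the sketch is single-threaded with no locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLUSH_FIFO_SIZE 16		/* size quoted in the commit message */
#define FLUSH_ALL_ENTRY UINT64_MAX	/* hypothetical 'flush all' sentinel */

struct flush_fifo {
	uint64_t slots[FLUSH_FIFO_SIZE];
	unsigned int head;	/* total entries read so far */
	unsigned int tail;	/* total entries written so far */
};

static unsigned int fifo_avail(const struct flush_fifo *f)
{
	return FLUSH_FIFO_SIZE - (f->tail - f->head);
}

static void fifo_put(struct flush_fifo *f, uint64_t entry)
{
	f->slots[f->tail++ % FLUSH_FIFO_SIZE] = entry;
}

/*
 * Queue a batch of GVA entries. The batch must leave one slot free, so
 * that a later 'flush all' always fits; otherwise it degrades to a
 * single 'flush all' entry. A full fifo therefore always contains one.
 */
static void flush_enqueue(struct flush_fifo *f, const uint64_t *entries,
			  unsigned int count)
{
	if (count && entries && count < fifo_avail(f)) {
		for (unsigned int i = 0; i < count; i++)
			fifo_put(f, entries[i]);
		return;
	}
	if (fifo_avail(f))
		fifo_put(f, FLUSH_ALL_ENTRY);
}

/*
 * Drain the fifo into out[] (room for FLUSH_FIFO_SIZE entries).
 * Returns false if a 'flush all' was seen, i.e. the caller should
 * fall back to a full flush instead of per-GVA flushes.
 */
static bool flush_drain(struct flush_fifo *f, uint64_t *out, unsigned int *n)
{
	bool precise = true;

	*n = 0;
	while (f->head != f->tail) {
		uint64_t e = f->slots[f->head++ % FLUSH_FIFO_SIZE];

		if (e == FLUSH_ALL_ENTRY)
			precise = false;
		else
			out[(*n)++] = e;
	}
	return precise;
}
```

Falling back to 'flush all' on overflow keeps the queue small and bounded while remaining correct, since a full flush covers every individually queued address.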
Vitaly Kuznetsov	adc43caa0a	KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag In preparation for implementing fine-grained Hyper-V TLB flush and L2 TLB flush, resurrect dedicated KVM_REQ_HV_TLB_FLUSH request bit. As KVM_REQ_TLB_FLUSH_GUEST is a stronger operation, clear KVM_REQ_HV_TLB_FLUSH request in kvm_vcpu_flush_tlb_guest(). The flush itself is temporarily handled by kvm_vcpu_flush_tlb_guest(). No functional change intended. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:03 -05:00
Sean Christopherson	e94cea0930	KVM: x86: Move clearing of TLB_FLUSH_CURRENT to kvm_vcpu_flush_tlb_all() Clear KVM_REQ_TLB_FLUSH_CURRENT in kvm_vcpu_flush_tlb_all() instead of in its sole caller that processes KVM_REQ_TLB_FLUSH. Regardless of why/when kvm_vcpu_flush_tlb_all() is called, flushing "all" TLB entries also flushes "current" TLB entries. Ideally, there will never be another caller of kvm_vcpu_flush_tlb_all(), and moving the handling "requires" extra work to document the ordering requirement, but future Hyper-V paravirt TLB flushing support will add similar logic for flush "guest" (Hyper-V can flush a subset of "guest" entries). And in the Hyper-V case, KVM needs to do more than just clear the request; the queue of GPAs to flush also needs to be purged, and doing all of that only in the request path is undesirable as kvm_vcpu_flush_tlb_guest() does have multiple callers (though it's unlikely KVM's paravirt TLB flush will coincide with Hyper-V's paravirt TLB flush). Move the logic even though it adds extra "work" so that KVM will be consistent with how flush requests are processed when the Hyper-V support lands. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:02 -05:00
Vitaly Kuznetsov	a789aeba41	KVM: VMX: Rename "vmx/evmcs.{ch}" to "vmx/hyperv.{ch}" To conform with SVM, rename VMX specific Hyper-V files from "evmcs.{ch}" to "hyperv.{ch}". While Enlightened VMCS is a lion's share of these files, some stuff (e.g. enlightened MSR bitmap, the upcoming Hyper-V L2 TLB flush, ...) goes beyond that. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:01 -05:00
Vitaly Kuznetsov	b83237ad21	KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush' To make terminology between Hyper-V-on-KVM and KVM-on-Hyper-V consistent, rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'. The change eliminates the use of confusing 'direct' and adds the missing underscore. No functional change. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:59:00 -05:00
Sean Christopherson	26b516bb39	x86/hyperv: KVM: Rename "hv_enlightenments" to "hv_vmcb_enlightenments" Now that KVM isn't littered with "struct hv_enlightenments" casts, rename the struct to "hv_vmcb_enlightenments" to highlight the fact that the struct is specifically for SVM's VMCB. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:58:59 -05:00
Sean Christopherson	68ae7c7bc5	KVM: SVM: Add a proper field for Hyper-V VMCB enlightenments Add a union to provide hv_enlightenments side-by-side with the sw_reserved bytes that Hyper-V's enlightenments overlay. Casting sw_reserved everywhere is messy, confusing, and unnecessarily unsafe. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:58:58 -05:00
Sean Christopherson	381fc63ac0	KVM: selftests: Move "struct hv_enlightenments" to x86_64/svm.h Move Hyper-V's VMCB "struct hv_enlightenments" to the svm.h header so that the struct can be referenced in "struct vmcb_control_area". Alternatively, a dedicated header for SVM+Hyper-V could be added, a la x86_64/evmcs.h, but it doesn't appear that Hyper-V will end up needing a wholesale replacement for the VMCB. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:58:58 -05:00
Sean Christopherson	089fe572a2	x86/hyperv: Move VMCB enlightenment definitions to hyperv-tlfs.h Move Hyper-V's VMCB enlightenment definitions to the TLFS header; the definitions come directly from the TLFS[], not from KVM. No functional change intended. [] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_svm_enlightened_vmcb_fields [vitaly: rename VMCB_HV_ -> HV_VMCB_ to match the rest of hyperv-tlfs.h, keep svm/hyperv.h] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 12:58:57 -05:00
Paolo Bonzini	6c7b2202e4	KVM: x86: avoid memslot check in NX hugepage recovery if it cannot succeed Since gfn_to_memslot() is relatively expensive, it helps to skip it if the memslot cannot possibly have dirty logging enabled. In order to do this, add to struct kvm a counter of the number of log-page memslots. While the correct value can only be read with slots_lock taken, the NX recovery thread is content with using an approximate value. Therefore, the counter is an atomic_t. Based on https://lore.kernel.org/kvm/20221027200316.2221027-2-dmatlack@google.com/ by David Matlack. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-18 11:30:12 -05:00
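The counter-based shortcut described in 6c7b2202e4 can be sketched with C11 atomics in userspace. The names struct vm_sketch, slot_dirty_log_changed(), and may_need_slot_lookup() are hypothetical; the kernel uses atomic_t inside struct kvm, which this example only approximates.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM state that holds the counter. */
struct vm_sketch {
	atomic_int nr_dirty_log_slots;	/* memslots with dirty logging on */
};

/* Called when a memslot's dirty-logging flag is changed. */
static void slot_dirty_log_changed(struct vm_sketch *vm, bool was_on,
				   bool now_on)
{
	if (was_on == now_on)
		return;
	atomic_fetch_add_explicit(&vm->nr_dirty_log_slots, now_on ? 1 : -1,
				  memory_order_relaxed);
}

/*
 * Cheap pre-check for a recovery worker: if no slot has dirty logging
 * enabled, the expensive per-page memslot lookup can be skipped. The
 * value may be slightly stale, which the commit message says is fine.
 */
static bool may_need_slot_lookup(struct vm_sketch *vm)
{
	return atomic_load_explicit(&vm->nr_dirty_log_slots,
				    memory_order_relaxed) != 0;
}
```

Relaxed ordering is used here because the reader tolerates an approximate value; a caller needing the exact count would have to take the lock that serializes slot updates.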
Paolo Bonzini	771a579c6e	Merge branch 'kvm-svm-harden' into HEAD This fixes three issues in nested SVM: 1) in the shutdown_interception() vmexit handler we call kvm_vcpu_reset(). However, if running nested and L1 doesn't intercept shutdown, the function resets vcpu->arch.hflags without properly leaving the nested state. This leaves the vCPU in an inconsistent state and later triggers a kernel panic in SVM code. The same bug can likely be triggered by sending INIT via local apic to a vCPU which runs a nested guest. On VMX we are lucky that the issue can't happen because VMX always intercepts triple faults, thus triple fault in L2 will always be redirected to L1. Plus, handle_triple_fault() doesn't reset the vCPU. INIT IPI can't happen on VMX either because INIT events are masked while in VMX mode. Secondarily, KVM doesn't honour SHUTDOWN intercept bit of L1 on SVM. A normal hypervisor should always intercept SHUTDOWN, a unit test on the other hand might want to not do so. Finally, the guest can trigger a non-rate-limited kernel printk on SVM, which is fixed as well. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:51:09 -05:00
Maxim Levitsky	05311ce954	KVM: x86: remove exit_int_info warning in svm_handle_exit It is valid to receive external interrupt and have broken IDT entry, which will lead to #GP with exit_int_info that will contain the index of the IDT entry (i.e. any value). Other exceptions can happen as well, like #NP or #SS (if stack switch fails). Thus this warning can be user triggered and has very little value. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-10-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:41:10 -05:00
Maxim Levitsky	8357b9e19b	KVM: selftests: add svm part to triple_fault_test Add a SVM implementation to triple_fault_test to test that emulated/injected shutdown works. Since, unlike VMX, SVM allows the hypervisor to avoid intercepting shutdown in the guest, don't intercept shutdown to test that KVM supports this correctly. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-9-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:40:00 -05:00
Maxim Levitsky	92e7d5c83a	KVM: x86: allow L1 to not intercept triple fault This is an SVM correctness fix - although a sane L1 would intercept SHUTDOWN event, it doesn't have to, so we have to honour this. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:39:59 -05:00
Maxim Levitsky	0bd2d3f487	kvm: selftests: add svm nested shutdown test Add a test that checks that, on SVM, if L1 doesn't intercept SHUTDOWN, a shutdown in L2 crashes L1 rather than L2. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:39:59 -05:00
Maxim Levitsky	fc6392d51d	KVM: selftests: move idt_entry to header struct idt_entry will be used for a test which will break IDT on purpose. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:39:58 -05:00
Maxim Levitsky	ed129ec905	KVM: x86: forcibly leave nested mode on vCPU reset While not obvious, kvm_vcpu_reset() leaves the nested mode by clearing 'vcpu->arch.hflags' but it does so without all the required housekeeping. On SVM, it is possible to have a vCPU reset while in guest mode because unlike VMX, on SVM, INIT's are not latched in SVM non root mode and in addition to that L1 doesn't have to intercept triple fault, which should also trigger L1's reset if it happens in L2 while L1 didn't intercept it. If one of the above conditions happens, KVM will continue to use vmcb02 while not being in guest mode. Later the IA32_EFER will be cleared which will lead to freeing of the nested guest state which will (correctly) free the vmcb02, but since KVM still uses it (incorrectly) this will lead to a use-after-free and a kernel crash. This issue is assigned CVE-2022-3344. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:39:57 -05:00
Maxim Levitsky	f9697df251	KVM: x86: add kvm_leave_nested Add kvm_leave_nested, which wraps a call to nested_ops->leave_nested into a function. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:39:56 -05:00
Maxim Levitsky	16ae56d7e0	KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use Make sure that KVM uses vmcb01 before freeing nested state, and warn if that is not the case. This is a minimal fix for CVE-2022-3344 making the kernel print a warning instead of a kernel panic. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:39:54 -05:00
Maxim Levitsky	917401f26a	KVM: x86: nSVM: leave nested mode on vCPU free If the VM was terminated while nested, we free the nested state while the vCPU still is in nested mode. Soon a warning will be added for this condition. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:39:53 -05:00
David Matlack	eb29860570	KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages Do not recover (i.e. zap) an NX Huge Page that is being dirty tracked, as it will just be faulted back in at the same 4KiB granularity when accessed by a vCPU. This may need to be changed if KVM ever supports 2MiB (or larger) dirty tracking granularity, or faulting huge pages during dirty tracking for reads/executes. However for now, these zaps are entirely wasteful. In order to check if this commit increases the CPU usage of the NX recovery worker thread I used a modified version of execute_perf_test [1] that supports splitting guest memory into multiple slots and reports /proc/pid/schedstat:se.sum_exec_runtime for the NX recovery worker just before tearing down the VM. The goal was to force a large number of NX Huge Page recoveries and see if the recovery worker used any more CPU.
Test Setup:
  echo 1000 > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
  echo 10 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
Test Command:
  ./execute_perf_test -v64 -s anonymous_hugetlb_1gb -x 16 -o
kvm-nx-lpage-re:se.sum_exec_runtime
  Run | Before     | After
  --- | ---------- | ----------
  1   | 730.084105 | 724.375314
  2   | 728.751339 | 740.581988
  3   | 736.264720 | 757.078163
Comparing the median results, this commit results in about a 1% increase in CPU usage of the NX recovery worker when testing a VM with 16 slots. However, the effect is negligible with the default halving time of NX pages, which is 1 hour rather than 10 seconds given by period_ms = 1000, ratio = 10. [1] https://lore.kernel.org/kvm/20221019234050.3919566-2-dmatlack@google.com/ Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20221103204421.1146958-1-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:26:35 -05:00
Paolo Bonzini	63d28a25e0	KVM: x86/mmu: simplify kvm_tdp_mmu_map flow when guest has to retry A removed SPTE is never present, hence the "if" in kvm_tdp_mmu_map only fails in the exact same conditions that the earlier loop tested in order to issue a "break". So, instead of checking twice the condition (upper level SPTEs could not be created or was frozen), just exit the loop with a goto---the usual poor-man C replacement for RAII early returns. While at it, do not use the "ret" variable for return values of functions that do not return a RET_PF_* enum. This is clearer and also makes it possible to initialize ret to RET_PF_RETRY. Suggested-by: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 11:10:25 -05:00
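The control-flow change described in 63d28a25e0 (leave the loop with a goto and let the return value default to "retry") can be shown in a reduced form. This is only an illustration of the pattern, not the kvm_tdp_mmu_map() code: the enum ret_pf_sketch values, map_sketch(), and the level_ok[] input are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RET_PF_* results named in the commit. */
enum ret_pf_sketch { SKETCH_RET_PF_RETRY, SKETCH_RET_PF_FIXED };

/*
 * Walk the levels from the top down. If any upper level cannot be set
 * up, jump straight to the exit label; ret is initialized to "retry",
 * so no second check of the same condition is needed after the loop.
 */
static enum ret_pf_sketch map_sketch(const bool *level_ok, int levels)
{
	enum ret_pf_sketch ret = SKETCH_RET_PF_RETRY;

	for (int i = 0; i < levels; i++) {
		if (!level_ok[i])
			goto retry;	/* the guest has to retry the fault */
	}

	/* Every level was installed: the mapping is in place. */
	ret = SKETCH_RET_PF_FIXED;

retry:
	return ret;
}
```

Initializing ret to the retry value means every early exit is correct by default, and only the success path has to assign a result.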
David Matlack	c4b33d28ea	KVM: x86/mmu: Split huge pages mapped by the TDP MMU on fault Now that the TDP MMU has a mechanism to split huge pages, use it in the fault path when a huge page needs to be replaced with a mapping at a lower level. This change reduces the negative performance impact of NX HugePages. Prior to this change if a vCPU executed from a huge page and NX HugePages was enabled, the vCPU would take a fault, zap the huge page, and map the faulting address at 4KiB with execute permissions enabled. The rest of the memory would be left unmapped and have to be faulted back in by the guest upon access (read, write, or execute). If the guest is backed by 1GiB pages, a single execute instruction can zap an entire GiB of its physical address space. For example, it can take a VM longer to execute from its memory than to populate that memory in the first place:
  $ ./execute_perf_test -s anonymous_hugetlb_1gb -v96
  Populating memory : 2.748378795s
  Executing from memory : 2.899670885s
With this change, such faults split the huge page instead of zapping it, which avoids the non-present faults on the rest of the huge page:
  $ ./execute_perf_test -s anonymous_hugetlb_1gb -v96
  Populating memory : 2.729544474s
  Executing from memory : 0.111965688s <---
This change also reduces the performance impact of dirty logging when eager_page_split=N. eager_page_split=N (abbreviated "eps=N" below) can be desirable for read-heavy workloads, as it avoids allocating memory to split huge pages that are never written and avoids increasing the TLB miss cost on reads of those pages.
Config: ept=Y, tdp_mmu=Y, 5% writes (Iteration 1 dirty memory time)
  vCPU Count | eps=N (Before) | eps=N (After) | eps=Y
  ---------- | -------------- | ------------- | ------------
  2          | 0.332305091s   | 0.019615027s  | 0.006108211s
  4          | 0.353096020s   | 0.019452131s  | 0.006214670s
  8          | 0.453938562s   | 0.019748246s  | 0.006610997s
  16         | 0.719095024s   | 0.019972171s  | 0.007757889s
  32         | 1.698727124s   | 0.021361615s  | 0.012274432s
  64         | 2.630673582s   | 0.031122014s  | 0.016994683s
  96         | 3.016535213s   | 0.062608739s  | 0.044760838s
Eager page splitting remains beneficial for write-heavy workloads, but the gap is now reduced.
Config: ept=Y, tdp_mmu=Y, 100% writes (Iteration 1 dirty memory time)
  vCPU Count | eps=N (Before) | eps=N (After) | eps=Y
  ---------- | -------------- | ------------- | ------------
  2          | 0.317710329s   | 0.296204596s  | 0.058689782s
  4          | 0.337102375s   | 0.299841017s  | 0.060343076s
  8          | 0.386025681s   | 0.297274460s  | 0.060399702s
  16         | 0.791462524s   | 0.298942578s  | 0.062508699s
  32         | 1.719646014s   | 0.313101996s  | 0.075984855s
  64         | 2.527973150s   | 0.455779206s  | 0.079789363s
  96         | 2.681123208s   | 0.673778787s  | 0.165386739s
Further study is needed to determine if the remaining gap is acceptable for customer workloads or if eager_page_split=N still requires a-priori knowledge of the VM workload, especially when considering these costs extrapolated out to large VMs with e.g. 416 vCPUs and 12TB RAM. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20221109185905.486172-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-11-17 10:52:48 -05:00
Paolo Bonzini	92292c1de2	Merge tag 'kvm-selftests-6.2-1' of https://github.com/kvm-x86/linux into HEAD KVM selftests updates for 6.2 perf_util: - Add support for pinning vCPUs in dirty_log_perf_test. - Add a lightweight pseudo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the so called "perf util" tests. - Rename the so called "perf_util" framework to "memstress". ucall: - Add a common pool-based ucall implementation (code dedup and pre-work for running SEV (and beyond) guests in selftests). - Fix an issue in ARM's single-step test when using the new pool-based implementation; LDREX/STREX don't play nice with single-step exceptions. init: - Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). x86: - Clean up x86's page table management. - Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. - Clean up the nEPT support checks. - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.	2022-11-17 09:09:29 -05:00
David Matlack	ecb89a5172	KVM: selftests: Check for KVM nEPT support using "feature" MSRs When checking for nEPT support in KVM, use kvm_get_feature_msr() instead of vcpu_get_msr() to retrieve KVM's default TRUE_PROCBASED_CTLS and PROCBASED_CTLS2 MSR values, i.e. don't require a VM+vCPU to query nEPT support. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20220927165209.930904-1-dmatlack@google.com [sean: rebase on merged code, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-11-16 16:59:07 -08:00
David Matlack	5c107f7085	KVM: selftests: Assert in prepare_eptp() that nEPT is supported Now that a VM isn't needed to check for nEPT support, assert that KVM supports nEPT in prepare_eptp() instead of skipping the test, and push the TEST_REQUIRE() check out to individual tests. The require+assert are somewhat redundant and will incur some amount of ongoing maintenance burden, but placing the "require" logic in the test makes it easier to find/understand a test's requirements and in this case, provides a very strong hint that the test cares about nEPT. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20220927165209.930904-1-dmatlack@google.com [sean: rebase on merged code, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-11-16 16:59:07 -08:00
Sean Christopherson	b941ba2380	KVM: selftests: Drop helpers for getting specific KVM supported CPUID entry Drop kvm_get_supported_cpuid_entry() and its inner helper now that all known usage can use X86_FEATURE_*, X86_PROPERTY_*, X86_PMU_FEATURE_*, or the dedicated Family/Model helpers. Providing "raw" access to CPUID leafs is undesirable as it encourages open coding CPUID checks, which is often error prone and not self-documenting. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006005125.680782-13-seanjc@google.com	2022-11-16 16:59:07 -08:00
Sean Christopherson	074e9d4c9c	KVM: selftests: Add and use KVM helpers for x86 Family and Model Add KVM variants of the x86 Family and Model helpers, and use them in the PMU event filter test. Open code the retrieval of KVM's supported CPUID entry 0x1.0 in anticipation of dropping kvm_get_supported_cpuid_entry(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006005125.680782-12-seanjc@google.com	2022-11-16 16:59:06 -08:00
Sean Christopherson	24f3f9898e	KVM: selftests: Add dedicated helpers for getting x86 Family and Model Add dedicated helpers for getting x86's Family and Model, which are the last holdouts that "need" raw access to CPUID information. FMS info is a mess and requires not only splicing together multiple values, but requires doing so conditionally in the Family case. Provide wrappers to reduce the odds of copy+paste errors, but mostly to allow for the eventual removal of kvm_get_supported_cpuid_entry(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006005125.680782-11-seanjc@google.com	2022-11-16 16:59:06 -08:00
Sean Christopherson	5228c02a4c	KVM: selftests: Add PMU feature framework, use in PMU event filter test Add an X86_PMU_FEATURE_* framework to simplify probing architectural events on Intel PMUs, which require checking the length of a bit vector and the _absence_ of a "feature" bit. Add helpers for both KVM and "this CPU", and use the newfangled magic (along with X86_PROPERTY_*) to clean up pmu_event_filter_test. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006005125.680782-10-seanjc@google.com	2022-11-16 16:59:05 -08:00
Sean Christopherson	4feb9d21a4	KVM: selftests: Convert vmx_pmu_caps_test to use X86_PROPERTY_* Add X86_PROPERTY_PMU_VERSION and use it in vmx_pmu_caps_test to replace open coded versions of the same functionality. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006005125.680782-9-seanjc@google.com	2022-11-16 16:59:05 -08:00
Sean Christopherson	5dc19f1c7d	KVM: selftests: Convert AMX test to use X86_PROPRETY_XXX Add and use x86 "properties" for the myriad AMX CPUID values that are validated by the AMX test. Drop most of the test's single-usage helpers so that the asserts more precisely capture what check failed. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006005125.680782-8-seanjc@google.com	2022-11-16 16:59:05 -08:00
Sean Christopherson	40854713e3	KVM: selftests: Add kvm_cpu_*() support for X86_PROPERTY_* Extend X86_PROPERTY_* support to KVM, i.e. add kvm_cpu_property() and kvm_cpu_has_p(), and use the new helpers in kvm_get_cpu_address_width(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006005125.680782-7-seanjc@google.com	2022-11-16 16:59:04 -08:00
Sean Christopherson	a29e6e383b	KVM: selftests: Refactor kvm_cpuid_has() to prep for X86_PROPERTY_* support Refactor kvm_cpuid_has() to prepare for extending X86_PROPERTY_* support to KVM as well as "this CPU". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221006005125.680782-6-seanjc@google.com	2022-11-16 16:59:04 -08:00
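
Commit 24f3f9898e above notes that Family/Model decoding means splicing several CPUID fields together, conditionally in the Family case. A minimal sketch of that splicing follows, assuming the input is the raw EAX value of CPUID leaf 0x1. The function names are hypothetical and are not the selftest helpers themselves. The Model splice here is unconditional, which assumes the extended-model bits read as zero on CPUs where they are not meaningful.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; "eax" is CPUID.0x1:EAX.
 *
 * Family: base family lives in bits 11:8. Only when the base family is 0xF
 * is the extended family (bits 27:20) added on top.
 */
static inline uint32_t sketch_x86_family(uint32_t eax)
{
	uint32_t family = (eax >> 8) & 0xf;

	if (family == 0xf)
		family += (eax >> 20) & 0xff;

	return family;
}

/*
 * Model: extended model (bits 19:16) forms the high nibble and base model
 * (bits 7:4) forms the low nibble. Assumption: the extended-model bits are
 * zero whenever they do not apply, so no family check is made here.
 */
static inline uint32_t sketch_x86_model(uint32_t eax)
{
	return ((eax >> 12) & 0xf0) | ((eax >> 4) & 0xf);
}
```

For example, an EAX value of 0x000506E3 decodes to family 0x6 and model 0x5E, while 0x00870F10 has base family 0xF plus extended family 0x08, giving family 0x17 and model 0x71.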

1137474 Commits