linux

Author	SHA1	Message	Date
Mark Rutland	72e3244512	kvm/mips: rework guest entry logic In kvm_arch_vcpu_ioctl_run() we use guest_enter_irqoff() and guest_exit_irqoff() directly, with interrupts masked between these. As we don't handle any timer ticks during this window, we will not account time spent within the guest as guest time, which is unfortunate. Additionally, we do not inform lockdep or tracing that interrupts will be enabled during guest execution, which caan lead to misleading traces and warnings that interrupts have been enabled for overly-long periods. This patch fixes these issues by using the new timing and context entry/exit helpers to ensure that interrupts are handled during guest vtime but with RCU watching, with a sequence: guest_timing_enter_irqoff(); guest_state_enter_irqoff(); < run the vcpu > guest_state_exit_irqoff(); < take any pending IRQs > guest_timing_exit_irqoff(); In addition, as guest exits during the "run the vcpu" step are handled by kvm_mips_handle_exit(), a wrapper function is added which ensures that such exists are handled with a sequence: guest_state_exit_irqoff(); < handle the exit > guest_state_enter_irqoff(); This means that exits which stop the vCPU running will have a redundant guest_state_enter_irqoff() .. guest_state_exit_irqoff() sequence, which can be addressed with future rework. Since instrumentation may make use of RCU, we must also ensure that no instrumented code is run during the EQS. I've split out the critical section into a new kvm_mips_enter_exit_vcpu() helper which is marked noinstr. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Message-Id: <20220201132926.3301912-6-mark.rutland@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-01 08:51:53 -05:00
Mark Rutland	ef9989afda	kvm: add guest_state_{enter,exit}_irqoff() When transitioning to/from guest mode, it is necessary to inform lockdep, tracing, and RCU in a specific order, similar to the requirements for transitions to/from user mode. Additionally, it is necessary to perform vtime accounting for a window around running the guest, with RCU enabled, such that timer interrupts taken from the guest can be accounted as guest time. Most architectures don't handle all the necessary pieces, and a have a number of common bugs, including unsafe usage of RCU during the window between guest_enter() and guest_exit(). On x86, this was dealt with across commits: `87fa7f3e98` ("x86/kvm: Move context tracking where it belongs") `0642391e21` ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit") `9fc975e9ef` ("x86/kvm/svm: Add hardirq tracing on guest enter/exit") `3ebccdf373` ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text") `135961e0a7` ("x86/kvm/svm: Move guest enter/exit into .noinstr.text") `1604571401` ("KVM: x86: Defer vtime accounting 'til after IRQ handling") `bc908e091b` ("KVM: x86: Consolidate guest enter/exit logic to common helpers") ... but those fixes are specific to x86, and as the resulting logic (while correct) is split across generic helper functions and x86-specific helper functions, it is difficult to see that the entry/exit accounting is balanced. This patch adds generic helpers which architectures can use to handle guest entry/exit consistently and correctly. The guest_{enter,exit}() helpers are split into guest_timing_{enter,exit}() to perform vtime accounting, and guest_context_{enter,exit}() to perform the necessary context tracking and RCU management. The existing guest_{enter,exit}() heleprs are left as wrappers of these. Atop this, new guest_state_enter_irqoff() and guest_state_exit_irqoff() helpers are added to handle the ordering of lockdep, tracing, and RCU manageent. These are inteneded to mirror exit_to_user_mode() and enter_from_user_mode(). Subsequent patches will migrate architectures over to the new helpers, following a sequence: guest_timing_enter_irqoff(); guest_state_enter_irqoff(); < run the vcpu > guest_state_exit_irqoff(); < take any pending IRQs > guest_timing_exit_irqoff(); This sequences handles all of the above correctly, and more clearly balances the entry and exit portions, making it easier to understand. The existing helpers are marked as deprecated, and will be removed once all architectures have been converted. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Message-Id: <20220201132926.3301912-2-mark.rutland@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-01 08:51:18 -05:00
Sean Christopherson	57dfd7b53d	KVM: x86: Move delivery of non-APICv interrupt into vendor code Handle non-APICv interrupt delivery in vendor code, even though it means VMX and SVM will temporarily have duplicate code. SVM's AVIC has a race condition that requires KVM to fall back to legacy interrupt injection _after_ the interrupt has been logged in the vIRR, i.e. to fix the race, SVM will need to open code the full flow anyways[]. Refactor the code so that the SVM bug without introducing other issues, e.g. SVM would return "success" and thus invoke trace_kvm_apicv_accept_irq() even when delivery through the AVIC failed, and to opportunistically prepare for using KVM_X86_OP to fill each vendor's kvm_x86_ops struct, which will rely on the vendor function matching the kvm_x86_op pointer name. No functional change intended. [] https://lore.kernel.org/all/20211213104634.199141-4-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-01 06:03:41 -05:00
Janosch Frank	f6c6804c43	kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h This way we can more easily find the next free IOCTL number when adding new IOCTLs. Fixes: `be50b2065d` ("kvm: x86: Add support for getting/setting expanded xstate buffer") Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-Id: <20220128154025.102666-1-frankja@linux.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-31 09:21:56 -05:00
Paolo Bonzini	17179d0068	KVM/arm64 fixes for 5.17, take #1 - Correctly update the shadow register on exception injection when running in nVHE mode - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmHzv5EPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpD0DcQAMF0hcKYxwuXi+UwQ8u5SsrpQQZ1BWC6euvB FFQiUPANXq/u0xM2kV+5FhjEfHqqjnh7nYLVKpBcetcvGSfWUnZlVI4DKI+5pdte PTa/minS5sq9BDZ/clRnnomNw0UwtH2OLeolg7+UAqBMihicddVBBU6IqvY1Nx+z F2qovZa3Qqb1EB+9+hPS+qGcjlguaBOEzrJ9uIaw532G1JD1K9hhMlabdhJhiJA3 gWuUJO+cuYEdctli+OJb9g92zIDt0hVP+/1tndlbib5BUw6e2vkdyKF0+/7u77xr SDKNmUosvZt/fABZpv6ycgRszoKRjBCIC5takQCZI/l2QzZFbiP/414E8L0J/zLV PI8e1bs/H9pBF3c7WG+if/3jYs+D+/nYhkE+PeW3k5lxzsHo7XE5ei6mzoxzBusC l4c0QQ7lpwep4dOWm4oRxzE0/9IONgVKKlIKGBkpSbtznDkAToTWobAIFVeZj+nm BVxf+A6ddcnQSzXYa/FUsfV3ZEsJVPSs/DL6mBBJuG8lxNzZnabkt+ODfXuhyrXe 6kGkF9+4HE9XyItieZVDUgRcZ9x57c+3q7A9b7Kl+Ds1Z+hsu0tVqghf5YVQAj3a 4IkOBdPEtaGCSrJWxupX+oimCXqdNfbnOqf4VsO8l1O0O8WBRvYaqYL2RKR32kX2 n3nzO/vE =BKqv -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #1 - Correctly update the shadow register on exception injection when running in nVHE mode - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations	2022-01-28 07:45:15 -05:00
Hou Wenlong	6a0c61703e	KVM: eventfd: Fix false positive RCU usage warning Fix the following false positive warning: ============================= WARNING: suspicious RCU usage 5.16.0-rc4+ #57 Not tainted ----------------------------- arch/x86/kvm/../../../virt/kvm/eventfd.c:484 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by fc_vcpu 0/330: #0: ffff8884835fc0b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x88/0x6f0 [kvm] #1: ffffc90004c0bb68 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0x600/0x1860 [kvm] #2: ffffc90004c0c1d0 (&kvm->irq_srcu){....}-{0:0}, at: kvm_notify_acked_irq+0x36/0x180 [kvm] stack backtrace: CPU: 26 PID: 330 Comm: fc_vcpu 0 Not tainted 5.16.0-rc4+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x44/0x57 kvm_notify_acked_gsi+0x6b/0x70 [kvm] kvm_notify_acked_irq+0x8d/0x180 [kvm] kvm_ioapic_update_eoi+0x92/0x240 [kvm] kvm_apic_set_eoi_accelerated+0x2a/0xe0 [kvm] handle_apic_eoi_induced+0x3d/0x60 [kvm_intel] vmx_handle_exit+0x19c/0x6a0 [kvm_intel] vcpu_enter_guest+0x66e/0x1860 [kvm] kvm_arch_vcpu_ioctl_run+0x438/0x7f0 [kvm] kvm_vcpu_ioctl+0x38a/0x6f0 [kvm] __x64_sys_ioctl+0x89/0xc0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&kvm->irq_srcu), kvm->irq_ack_notifier_list is protected by kvm->irq_srcu. In fact, kvm->irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it a false positive warning. So use hlist_for_each_entry_srcu() instead of hlist_for_each_entry_rcu(). Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com> Message-Id: <f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:27 -05:00
Vitaly Kuznetsov	6cbbaab60f	KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when Enlightened VMCS interface is in use: "Any VMREAD or VMWRITE instructions while an enlightened VMCS is active is unsupported and can result in unexpected behavior."" Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field 0x4404 ("VM-exit interruption information") are observed. Failing these attempts with nested_vmx_failInvalid() makes such guests unbootable. Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed eventually but for the time being we need a workaround. (Temporary) allow VMREAD to get data from the currently loaded Enlightened VMCS. Note: VMWRITE instructions remain forbidden, it is not clear how to handle them properly and hopefully won't ever be needed. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:26 -05:00
Vitaly Kuznetsov	892a42c10d	KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread() In preparation to allowing reads from Enlightened VMCS from handle_vmread(), implement evmcs_field_offset() to get the correct read offset. get_evmcs_offset(), which is being used by KVM-on-Hyper-V, is almost what's needed but a few things need to be adjusted. First, WARN_ON() is unacceptable for handle_vmread() as any field can (in theory) be supplied by the guest and not all fields are defined in eVMCS v1. Second, we need to handle 'holes' in eVMCS (missing fields). It also sounds like a good idea to WARN_ON() if such fields are ever accessed by KVM-on-Hyper-V. Implement dedicated evmcs_field_offset() helper. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:26 -05:00
Vitaly Kuznetsov	2423a4c0d1	KVM: nVMX: Rename vmcs_to_field_offset{,_table} vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque blob which is not supposed to be accessed directly. In fact, vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure. Rename vmcs_field_to_offset() to get_vmcs12_field_offset() for clarity. No functional change intended. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:26 -05:00
Vitaly Kuznetsov	7a601e2cf6	KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field, PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too. Note, none of the currently existing Windows/Hyper-V versions are known to enable 'save VMX-preemption timer value' when eVMCS is in use, the change is aimed at making the filtering future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:25 -05:00
Vitaly Kuznetsov	f80ae0ef08	KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS Similar to MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS, MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pair, MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way MSR_IA32_VMX_PINBASED_CTLS is currently filtered as guests may solely rely on 'true' MSR data. Note, none of the currently existing Windows/Hyper-V versions are known to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS, the change is aimed at making the filtering future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:25 -05:00
Paolo Bonzini	b19c99b9f4	selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP Provide coverage for the new API. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:38:25 -05:00
Paolo Bonzini	dd6e631220	KVM: x86: add system attribute to retrieve full set of supported xsave states Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that have not been enabled. Probing those, for example so that they can be passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl. The latter is in fact worse, even though that is what the rest of the API uses, because it would require supported_xcr0 to be moved from the KVM module to the kernel just for this use. In addition, the value would be nonsensical (or an error would have to be returned) until the KVM module is loaded in. Therefore, to limit the growth of system ioctls, add a /dev/kvm variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86 with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:33:32 -05:00
Sean Christopherson	56f289a8d2	KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr Add a helper to handle converting the u64 userspace address embedded in struct kvm_device_attr into a userspace pointer, it's all too easy to forget the intermediate "unsigned long" cast as well as the truncation check. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-28 07:32:00 -05:00
Paolo Bonzini	dd4516aee3	selftests: kvm: move vm_xsave_req_perm call to amx_test There is no need for tests other than amx_test to enable dynamic xsave states. Remove the call to vm_xsave_req_perm from generic code, and move it inside the test. While at it, allow customizing the bit that is requested, so that future tests can use it differently. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:45:20 -05:00
Like Xu	05a9e06505	KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time XCR0 is reset to 1 by RESET but not INIT and IA32_XSS is zeroed by both RESET and INIT. The kvm_set_msr_common()'s handling of MSR_IA32_XSS also needs to update kvm_update_cpuid_runtime(). In the above cases, the size in bytes of the XSAVE area containing all states enabled by XCR0 or (XCRO \| IA32_XSS) needs to be updated. For simplicity and consistency, existing helpers are used to write values and call kvm_update_cpuid_runtime(), and it's not exactly a fast path. Fixes: `a554d207dc` ("KVM: X86: Processor States following Reset or INIT") Cc: stable@vger.kernel.org Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220126172226.2298529-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:42:45 -05:00
Like Xu	4c282e51e4	KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS Do a runtime CPUID update for a vCPU if MSR_IA32_XSS is written, as the size in bytes of the XSAVE area is affected by the states enabled in XSS. Fixes: `203000993d` ("kvm: vmx: add MSR logic for XSAVES") Cc: stable@vger.kernel.org Signed-off-by: Like Xu <likexu@tencent.com> [sean: split out as a separate patch, adjust Fixes tag] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220126172226.2298529-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:42:45 -05:00
Xiaoyao Li	be4f3b3f82	KVM: x86: Keep MSR_IA32_XSS unchanged for INIT It has been corrected from SDM version 075 that MSR_IA32_XSS is reset to zero on Power up and Reset but keeps unchanged on INIT. Fixes: `a554d207dc` ("KVM: X86: Processor States following Reset or INIT") Cc: stable@vger.kernel.org Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220126172226.2298529-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:42:44 -05:00
Sean Christopherson	811f95ff95	KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2} Free the "struct kvm_cpuid_entry2" array on successful post-KVM_RUN KVM_SET_CPUID{,2} to fix a memory leak, the callers of kvm_set_cpuid() free the array only on failure. BUG: memory leak unreferenced object 0xffff88810963a800 (size 2048): comm "syz-executor025", pid 3610, jiffies 4294944928 (age 8.080s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 0d 00 00 00 ................ 47 65 6e 75 6e 74 65 6c 69 6e 65 49 00 00 00 00 GenuntelineI.... backtrace: [<ffffffff814948ee>] kmalloc_node include/linux/slab.h:604 [inline] [<ffffffff814948ee>] kvmalloc_node+0x3e/0x100 mm/util.c:580 [<ffffffff814950f2>] kvmalloc include/linux/slab.h:732 [inline] [<ffffffff814950f2>] vmemdup_user+0x22/0x100 mm/util.c:199 [<ffffffff8109f5ff>] kvm_vcpu_ioctl_set_cpuid2+0x8f/0xf0 arch/x86/kvm/cpuid.c:423 [<ffffffff810711b9>] kvm_arch_vcpu_ioctl+0xb99/0x1e60 arch/x86/kvm/x86.c:5251 [<ffffffff8103e92d>] kvm_vcpu_ioctl+0x4ad/0x950 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4066 [<ffffffff815afacc>] vfs_ioctl fs/ioctl.c:51 [inline] [<ffffffff815afacc>] __do_sys_ioctl fs/ioctl.c:874 [inline] [<ffffffff815afacc>] __se_sys_ioctl fs/ioctl.c:860 [inline] [<ffffffff815afacc>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860 [<ffffffff844a3335>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff844a3335>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `c6617c61e8` ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN") Cc: stable@vger.kernel.org Reported-by: syzbot+be576ad7655690586eec@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125210445.2053429-1-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:42:31 -05:00
Sean Christopherson	d6e656cd26	KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 WARN if KVM attempts to allocate a shadow VMCS for vmcs02. KVM emulates VMCS shadowing but doesn't virtualize it, i.e. KVM should never allocate a "real" shadow VMCS for L2. The previous code WARNed but continued anyway with the allocation, presumably in an attempt to avoid NULL pointer dereference. However, alloc_vmcs (and hence alloc_shadow_vmcs) can fail, and indeed the sole caller does: if (enable_shadow_vmcs && !alloc_shadow_vmcs(vcpu)) goto out_shadow_vmcs; which makes it not a useful attempt. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125220527.2093146-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:04 -05:00
Sean Christopherson	4cf3d3ebe8	KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest Don't skip the vmcall() in l2_guest_code() prior to re-entering L2, doing so will result in L2 running to completion, popping '0' off the stack for RET, jumping to address '0', and ultimately dying with a triple fault shutdown. It's not at all obvious why the test re-enters L2 and re-executes VMCALL, but presumably it serves a purpose. The VMX path doesn't skip vmcall(), and the test can't possibly have passed on SVM, so just do what VMX does. Fixes: `d951b2210c` ("KVM: selftests: smm_test: Test SMM enter from L2") Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125221725.2101126-1-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:04 -05:00
Vitaly Kuznetsov	033a3ea59a	KVM: x86: Check .flags in kvm_cpuid_check_equal() too kvm_cpuid_check_equal() checks for the (full) equality of the supplied CPUID data so .flags need to be checked too. Reported-by: Sean Christopherson <seanjc@google.com> Fixes: `c6617c61e8` ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220126131804.2839410-1-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:03 -05:00
Sean Christopherson	f7e570780e	KVM: x86: Forcibly leave nested virt when SMM state is toggled Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated. Don't attempt to gracefully handle the transition as (a) most transitions are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active. Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state. WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Modules linked in: CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00 Call Trace: <TASK> kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline] kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460 kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline] kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline] kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250 kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273 __fput+0x286/0x9f0 fs/file_table.c:311 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xb29/0x2a30 kernel/exit.c:806 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 get_signal+0x4b0/0x28c0 kernel/signal.c:2862 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Cc: stable@vger.kernel.org Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125220358.2091737-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:03 -05:00
Vitaly Kuznetsov	aa3b39f38c	KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments() Commit `3fa5e8fd0a` ("KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized") re-arranged svm_vcpu_init_msrpm() call in svm_create_vcpu(), thus making the comment about vmcb being NULL obsolete. Drop it. While on it, drop superfluous vmcb_is_clean() check: vmcb_mark_dirty() is a bit flip, an extra check is unlikely to bring any performance gain. Drop now-unneeded vmcb_is_clean() helper as well. Fixes: `3fa5e8fd0a` ("KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211220152139.418372-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:03 -05:00
Vitaly Kuznetsov	38dfa8308c	KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real Commit `c4327f15df` ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support") introduced enlightened MSR-Bitmap support for KVM-on-Hyper-V but it didn't actually enable the support. Similar to enlightened NPT TLB flush and direct TLB flush features, the guest (KVM) has to tell L0 (Hyper-V) that it's using the feature by setting the appropriate feature fit in VMCB control area (sw reserved fields). Fixes: `c4327f15df` ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211220152139.418372-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:02 -05:00
Sean Christopherson	cdf85e0c5d	KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode Inject a #GP instead of synthesizing triple fault to try to avoid killing the guest if emulation of an SEV guest fails due to encountering the SMAP erratum. The injected #GP may still be fatal to the guest, e.g. if the userspace process is providing critical functionality, but KVM should make every attempt to keep the guest alive. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:02 -05:00
Sean Christopherson	3280cc22ae	KVM: SVM: Don't apply SEV+SMAP workaround on code fetch or PT access Resume the guest instead of synthesizing a triple fault shutdown if the instruction bytes buffer is empty due to the #NPF being on the code fetch itself or on a page table access. The SMAP errata applies if and only if the code fetch was successful and ucode's subsequent data read from the code page encountered a SMAP violation. In practice, the guest is likely hosed either way, but crashing the guest on a code fetch to emulated MMIO is technically wrong according to the behavior described in the APM. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:01 -05:00
Sean Christopherson	04c40f344d	KVM: SVM: Inject #UD on attempted emulation for SEV guest w/o insn buffer Inject #UD if KVM attempts emulation for an SEV guests without an insn buffer and instruction decoding is required. The previous behavior of allowing emulation if there is no insn buffer is undesirable as doing so means KVM is reading guest private memory and thus decoding cyphertext, i.e. is emulating garbage. The check was previously necessary as the emulation type was not provided, i.e. SVM needed to allow emulation to handle completion of emulation after exiting to userspace to handle I/O. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:01 -05:00
Sean Christopherson	132627c64d	KVM: SVM: WARN if KVM attempts emulation on #UD or #GP for SEV guests WARN if KVM attempts to emulate in response to #UD or #GP for SEV guests, i.e. if KVM intercepts #UD or #GP, as emulation on any fault except #NPF is impossible since KVM cannot read guest private memory to get the code stream, and the CPU's DecodeAssists feature only provides the instruction bytes on #NPF. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-7-seanjc@google.com> [Warn on EMULTYPE_TRAP_UD_FORCED according to Liam Merwick's review. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:01 -05:00
Sean Christopherson	4d31d9eff2	KVM: x86: Pass emulation type to can_emulate_instruction() Pass the emulation type to kvm_x86_ops.can_emulate_insutrction() so that a future commit can harden KVM's SEV support to WARN on emulation scenarios that should never happen. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:00 -05:00
Sean Christopherson	c532f2903b	KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support Add a sanity check on DECODEASSIST being support if SEV is supported, as KVM cannot read guest private memory and thus relies on the CPU to provide the instruction byte stream on #NPF for emulation. The intent of the check is to document the dependency, it should never fail in practice as producing hardware that supports SEV but not DECODEASSISTS would be non-sensical. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:00 -05:00
Sean Christopherson	0b0be065b7	KVM: SVM: Don't intercept #GP for SEV guests Never intercept #GP for SEV guests as reading SEV guest private memory will return cyphertext, i.e. emulating on #GP can't work as intended. Cc: stable@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:15:00 -05:00
Sean Christopherson	31c2558569	Revert "KVM: SVM: avoid infinite loop on NPF from bad address" Revert a completely broken check on an "invalid" RIP in SVM's workaround for the DecodeAssists SMAP errata. kvm_vcpu_gfn_to_memslot() obviously expects a gfn, i.e. operates in the guest physical address space, whereas RIP is a virtual (not even linear) address. The "fix" worked for the problematic KVM selftest because the test identity mapped RIP. Fully revert the hack instead of trying to translate RIP to a GPA, as the non-SEV case is now handled earlier, and KVM cannot access guest page tables to translate RIP. This reverts commit `e72436bc3a`. Fixes: `e72436bc3a` ("KVM: SVM: avoid infinite loop on NPF from bad address") Reported-by: Liam Merwick <liam.merwick@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:14:59 -05:00
Sean Christopherson	55467fcd55	KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests Always signal that emulation is possible for !SEV guests regardless of whether or not the CPU provided a valid instruction byte stream. KVM can read all guest state (memory and registers) for !SEV guests, i.e. can fetch the code stream from memory even if the CPU failed to do so because of the SMAP errata. Fixes: `05d5a48635` ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)") Cc: stable@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:14:59 -05:00
Denis Valeev	47c28d436f	KVM: x86: nSVM: skip eax alignment check for non-SVM instructions The bug occurs on #GP triggered by VMware backdoor when eax value is unaligned. eax alignment check should not be applied to non-SVM instructions because it leads to incorrect omission of the instructions emulation. Apply the alignment check only to SVM instructions to fix. Fixes: `d1cba6c922` ("KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround") Signed-off-by: Denis Valeev <lemniscattaden@gmail.com> Message-Id: <Yexlhaoe1Fscm59u@q> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:14:59 -05:00
Like Xu	1ffce0924a	KVM: x86/cpuid: Exclude unpermitted xfeatures sizes at KVM_GET_SUPPORTED_CPUID With the help of xstate_get_guest_group_perm(), KVM can exclude unpermitted xfeatures in cpuid.0xd.0.eax, in which case the corresponding xfeatures sizes should also be matched to the permitted xfeatures. To fix this inconsistency, the permitted_xcr0 and permitted_xss are defined consistently, which implies 'supported' plus certain permissions for this task, and it also fixes cpuid.0xd.1.ebx and later leaf-by-leaf queries. Fixes: `445ecdf79b` ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID") Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220125115223.33707-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:14:58 -05:00
Wanpeng Li	35fe7cfbab	KVM: LAPIC: Also cancel preemption timer during SET_LAPIC The below warning is splatting during guest reboot. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1931 at arch/x86/kvm/x86.c:10322 kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm] CPU: 0 PID: 1931 Comm: qemu-system-x86 Tainted: G I 5.17.0-rc1+ #5 RIP: 0010:kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm] Call Trace: <TASK> kvm_vcpu_ioctl+0x279/0x710 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fd39797350b This can be triggered by not exposing tsc-deadline mode and doing a reboot in the guest. The lapic_shutdown() function which is called in sys_reboot path will not disarm the flying timer, it just masks LVTT. lapic_shutdown() clears APIC state w/ LVT_MASKED and timer-mode bit is 0, this can trigger timer-mode switch between tsc-deadline and oneshot/periodic, which can result in preemption timer be cancelled in apic_update_lvtt(). However, We can't depend on this when not exposing tsc-deadline mode and oneshot/periodic modes emulated by preemption timer. Qemu will synchronise states around reset, let's cancel preemption timer under KVM_SET_LAPIC. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1643102220-35667-1-git-send-email-wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:14:58 -05:00
Jim Mattson	519669cc58	KVM: VMX: Remove vmcs_config.order The maximum size of a VMCS (or VMXON region) is 4096. By definition, these are order 0 allocations. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220125004359.147600-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-26 12:14:58 -05:00
Quanfa Fu	d081a343dd	KVM/X86: Make kvm_vcpu_reload_apic_access_page() static Make kvm_vcpu_reload_apic_access_page() static as it is no longer invoked directly by vmx and it is also no longer exported. No functional change intended. Signed-off-by: Quanfa Fu <quanfafu@gmail.com> Message-Id: <20211219091446.174584-1-quanfafu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-25 09:40:20 -05:00
David Matlack	de1956f485	KVM: selftests: Re-enable access_tracking_perf_test This selftest was accidentally removed by commit `6a58150859` ("selftest: KVM: Add intra host migration tests"). Add it back. Fixes: `6a58150859` ("selftest: KVM: Add intra host migration tests") Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220120003826.2805036-1-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-25 09:40:20 -05:00
Sean Christopherson	b9bed78e2f	KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or MOVSS blocking is active. Setting the flag is necessary to make VM-Entry consistency checks happy, as VMX has an invariant that if RFLAGS.TF is set and STI/MOVSS blocking is true, then the previous instruction must have been STI or MOV/POP, and therefore a single-step #DB must be pending since the RFLAGS.TF cannot have been set by the previous instruction, i.e. the one instruction delay after setting RFLAGS.TF must have already expired. Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately when recording guest state as part of a VM-Exit, but #DB VM-Exits intentionally do not treat the #DB as "guest state" as interception of the #DB effectively makes the #DB host-owned, thus KVM needs to manually set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest. Note, although this bug can be triggered by guest userspace, doing so requires IOPL=3, and guest userspace running with IOPL=3 has full access to all I/O ports (from the guest's perspective) and can crash/reboot the guest any number of ways. IOPL=3 is required because STI blocking kicks in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF. MOVSS blocking can be initiated by userspace, but can be coincident with a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking is problematic only for CPL0 (and only if the guest is crazy enough to access a DR in a MOVSS shadow). All other sources of #DBs are either suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), are mutually exclusive with MOVSS blocking (T-bit task switch), or are already handled by KVM (ICEBP, a.k.a. INT1). This bug was originally found by running tests[1] created for XSA-308[2]. Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is presumably why the Xen bug was deemed to be an exploitable DOS from guest userspace. KVM already handles ICEBP by skipping the ICEBP instruction and thus clears MOVSS blocking as a side effect of its "emulation". [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html [2] https://xenbits.xen.org/xsa/advisory-308.html Reported-by: David Woodhouse <dwmw2@infradead.org> Reported-by: Alexander Graf <graf@amazon.de> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220120000624.655815-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-25 09:40:19 -05:00
Vitaly Kuznetsov	5c89be1dd5	KVM: x86: Move CPUID.(EAX=0x12,ECX=1) mangling to __kvm_update_cpuid_runtime() Full equality check of CPUID data on update (kvm_cpuid_check_equal()) may fail for SGX enabled CPUs as CPUID.(EAX=0x12,ECX=1) is currently being mangled in kvm_vcpu_after_set_cpuid(). Move it to __kvm_update_cpuid_runtime() and split off cpuid_get_supported_xcr0() helper as 'vcpu->arch.guest_supported_xcr0' update needs (logically) to stay in kvm_vcpu_after_set_cpuid(). Cc: stable@vger.kernel.org Fixes: `feb627e8d6` ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220124103606.2630588-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-25 09:40:19 -05:00
Xianting Tian	1625566ec8	KVM: remove async parameter of hva_to_pfn_remapped() The async parameter of hva_to_pfn_remapped() is not used, so remove it. Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Message-Id: <20220124020456.156386-1-xianting.tian@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-24 08:53:37 -05:00
Peter Zijlstra	adb759e599	x86,kvm/xen: Remove superfluous .fixup usage Commit `14243b3871` ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery") adds superfluous .fixup usage after the whole .fixup section was removed in commit `e5eefda5aa` ("x86: Remove .fixup section"). Fixes: `14243b3871` ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery") Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220123124219.GH20638@worktop.programming.kicks-ass.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-24 08:53:00 -05:00
Sean Christopherson	94fea1d8a3	KVM: VMX: Zero host's SYSENTER_ESP iff SYSENTER is NOT used Zero vmcs.HOST_IA32_SYSENTER_ESP when initializing constant host state if and only if SYSENTER cannot be used, i.e. the kernel is a 64-bit kernel and is not emulating 32-bit syscalls. As the name suggests, vmx_set_constant_host_state() is intended for state that is constant. When SYSENTER is used, SYSENTER_ESP isn't constant because stacks are per-CPU, and the VMCS must be updated whenever the vCPU is migrated to a new CPU. The logic in vmx_vcpu_load_vmcs() doesn't differentiate between "never loaded" and "loaded on a different CPU", i.e. setting SYSENTER_ESP on VMCS load also handles setting correct host state when the VMCS is first loaded. Because a VMCS must be loaded before it is initialized during vCPU RESET, zeroing the field in vmx_set_constant_host_state() obliterates the value that was written when the VMCS was loaded. If the vCPU is run before it is migrated, the subsequent VM-Exit will zero out MSR_IA32_SYSENTER_ESP, leading to a #DF on the next 32-bit syscall. double fault: 0000 [#1] SMP CPU: 0 PID: 990 Comm: stable Not tainted 5.16.0+ #97 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 EIP: entry_SYSENTER_32+0x0/0xe7 Code: <9c> 50 eb 17 0f 20 d8 a9 00 10 00 00 74 0d 25 ff ef ff ff 0f 22 d8 EAX: 000000a2 EBX: a8d1300c ECX: a8d13014 EDX: 00000000 ESI: a8f87000 EDI: a8d13014 EBP: a8d12fc0 ESP: 00000000 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210093 CR0: 80050033 CR2: fffffffc CR3: 02c3b000 CR4: 00152e90 Fixes: `6ab8a4053f` ("KVM: VMX: Avoid to rdmsrl(MSR_IA32_SYSENTER_ESP)") Cc: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220122015211.1468758-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-24 08:52:49 -05:00
Marc Zyngier	278583055a	KVM: arm64: Use shadow SPSR_EL1 when injecting exceptions on !VHE Injecting an exception into a guest with non-VHE is risky business. Instead of writing in the shadow register for the switch code to restore it, we override the CPU register instead. Which gets overriden a few instructions later by said restore code. The result is that although the guest correctly gets the exception, it will return to the original context in some random state, depending on what was there the first place... Boo. Fix the issue by writing to the shadow register. The original code is absolutely fine on VHE, as the state is already loaded, and writing to the shadow register in that case would actually be a bug. Fixes: `bb666c472c` ("KVM: arm64: Inject AArch64 exceptions from HYP") Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20220121184207.423426-1-maz@kernel.org	2022-01-24 09:39:03 +00:00
Marc Zyngier	d11a327ed9	KVM: arm64: vgic-v3: Restrict SEIS workaround to known broken systems Contrary to what `df652bcf11` ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors") was asserting, there is at least one other system out there (Cavium ThunderX2) implementing SEIS, and not in an obviously broken way. So instead of imposing the M1 workaround on an innocent bystander, let's limit it to the two known broken Apple implementations. Fixes: `df652bcf11` ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors") Reported-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220122103912.795026-1-maz@kernel.org	2022-01-22 11:38:16 +00:00
Wei Wang	e2e83a73d7	docs: kvm: fix WARNINGs from api.rst Use the api number 134 for KVM_GET_XSAVE2, instead of 42, which has been used by KVM_GET_XSAVE. Also, fix the WARNINGs of the underlines being too short. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Message-Id: <20220120045003.315177-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-20 12:13:35 -05:00
Jinrong Liang	83a34ad848	selftests: kvm/x86: Fix the warning in lib/x86_64/processor.c The following warning appears when executing make -C tools/testing/selftests/kvm include/x86_64/processor.h:290:2: warning: 'ecx' may be used uninitialized in this function [-Wmaybe-uninitialized] asm volatile("cpuid" ^~~ lib/x86_64/processor.c:1523:21: note: 'ecx' was declared here uint32_t eax, ebx, ecx, edx, max_ext_leaf; Just initialize ecx to remove this warning. Fixes: `c8cc43c1ea` ("selftests: KVM: avoid failures due to reserved HyperTransport region") Signed-off-by: Jinrong Liang <cloudliang@tencent.com> Message-Id: <20220119140325.59369-1-cloudliang@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-20 12:12:47 -05:00
Jinrong Liang	a0f4ba7f51	selftests: kvm/x86: Fix the warning in pmu_event_filter_test.c The following warning appears when executing make -C tools/testing/selftests/kvm x86_64/pmu_event_filter_test.c: In function 'vcpu_supports_intel_br_retired': x86_64/pmu_event_filter_test.c:241:28: warning: variable 'cpuid' set but not used [-Wunused-but-set-variable] 241 \| struct kvm_cpuid2 cpuid; \| ^~~~~ x86_64/pmu_event_filter_test.c: In function 'vcpu_supports_amd_zen_br_retired': x86_64/pmu_event_filter_test.c:258:28: warning: variable 'cpuid' set but not used [-Wunused-but-set-variable] 258 \| struct kvm_cpuid2 cpuid; \| ^~~~~ Just delete the unused variables to stay away from warnings. Fixes: dc7e75b3b3ee ("selftests: kvm/x86: Add test for KVM_SET_PMU_EVENT_FILTER") Signed-off-by: Jinrong Liang <cloudliang@tencent.com> Message-Id: <20220119133910.56285-1-cloudliang@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-20 12:12:30 -05:00

1 2 3 4 5 ...

1060734 Commits