VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
interrupted a KVM update of the percpu in-use VMCS list.
Because NMIs are not blocked by disabling IRQs, it's possible that
crash_vmclear_local_loaded_vmcss() could be called while the percpu list
of VMCSes is being modified, e.g. in the middle of list_add() in
vmx_vcpu_load_vmcs(). This potential corner case was called out in the
original commit[*], but the analysis of its impact was wrong.
Skipping the VMCLEARs is wrong because it all but guarantees that a
loaded, and therefore cached, VMCS will live across kexec and corrupt
memory in the new kernel. Corruption will occur because the CPU's VMCS
cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
memory on its eviction will overwrite random memory in the new kernel.
The VMCS will live because the NMI shootdown also disables VMX, i.e. the
in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
VMCS cache on VMXOFF.
Furthermore, interrupting list_add() and list_del() is safe due to
crash_vmclear_local_loaded_vmcss() using forward iteration. list_add()
ensures the new entry is not visible to forward iteration unless the
entire add completes, via WRITE_ONCE(prev->next, new). A bad "prev"
pointer could be observed if the NMI shootdown interrupted list_del() or
list_add(), but list_for_each_entry() does not consume ->prev.
In addition to removing the temporary disabling of VMCLEAR, open code
loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
the VMCS is deleted from the list only after it's been VMCLEAR'd.
Deleting the VMCS before VMCLEAR would allow a race where the NMI
shootdown could arrive between list_del() and vmcs_clear() and thus
neither flow would execute a successful VMCLEAR. Alternatively, more
code could be moved into loaded_vmcs_init(), but that gets rather silly
as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
and would need to work around the list_del().
Update the smp_*() comments related to the list manipulation, and
opportunistically reword them to improve clarity.
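Roughly, the resulting flow in __loaded_vmcs_clear() looks like this
(simplified sketch, not the verbatim code):

  vmcs_clear(loaded_vmcs->vmcs);
  if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
          vmcs_clear(loaded_vmcs->shadow_vmcs);

  list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);

  /*
   * Ensure the VMCLEARs and the list_del() are visible before marking
   * the VMCS as not loaded, otherwise another CPU could observe it as
   * unloaded and add it to its own percpu list while it is still on
   * this CPU's list.
   */
  smp_wmb();

  loaded_vmcs->cpu = -1;
  loaded_vmcs->launched = 0;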
[*] https://patchwork.kernel.org/patch/1675731/#3720461
Fixes: 8f536b7697 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For CPUs supporting fast short REP MOV (X86_FEATURE_FSRM), e.g. Icelake and
Tigerlake, expose it in KVM supported CPUID as well.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Message-Id: <20200323092236.3703-1-zhenyuw@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add an alternative format that can be more easily used for further
processing later on.
Note that we add a timestamp in the first column for both the regular
and the new csv format.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20200306114250.57585-5-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This now controls both the refresh rate of the interactive mode and the
logging mode. As a consequence, the default for logging mode is now 3s,
too (use command line switch '-s' to adjust to your liking).
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20200306114250.57585-4-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
optparse has been deprecated for a while, hence switch over to argparse
(which also works with python2).
As a consequence, the help output has some subtle changes, the most
significant one being that the options are all listed explicitly
instead of a universal '[options]' indicator. Also, some of the error
messages are phrased slightly differently.
While at it, squash a number of minor PEP8 issues.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20200306114250.57585-3-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make sure command line arguments are sorted alphabetically
everywhere, and adjust existing texts for the interactive command 's' to
be consistent with the long form --set-delay.
Throw in some PEP8 fixes (all cosmetic) for good measure.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20200306114250.57585-2-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The steal_time test's timespec stop condition was wrong; it should have
used the timespec helpers to avoid that, but timespec_diff had a strange
interface. Rework the whole timespec API and its uses.
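For reference, a sane diff helper boils down to something like this
(illustrative sketch; names may not match the selftests exactly):

  #include <time.h>

  /* Return a - b, normalizing tv_nsec into [0, 1e9). */
  static struct timespec timespec_sub(struct timespec a, struct timespec b)
  {
          struct timespec r;

          r.tv_sec = a.tv_sec - b.tv_sec;
          r.tv_nsec = a.tv_nsec - b.tv_nsec;
          if (r.tv_nsec < 0) {
                  r.tv_sec -= 1;
                  r.tv_nsec += 1000000000;
          }
          return r;
  }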
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In kvm_arch_dev_ioctl(), the brackets of case KVM_X86_GET_MCE_CAP_SUPPORTED
accidentally encapsulate case KVM_GET_MSR_FEATURE_INDEX_LIST and case
KVM_GET_MSRS. It doesn't affect functionality but it's misleading.
Remove the unnecessary brackets and opportunistically add a "break" in the
default path.
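For illustration, the misleading shape is roughly the following (a generic
example, not the actual KVM code):

  switch (cmd) {
  case CMD_A: {           /* brace accidentally spans the next cases too */
          handle_a();
          break;
  case CMD_B:             /* still legal C: the label attaches to a
                           * statement inside the nested block */
          handle_b();
          break;
  }
  default:
          break;
  }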
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tack on "used max basic" at the end of the CPUID tracepoint when the
output values correspond to the max basic leaf, i.e. when emulating
Intel's out-of-range CPUID behavior. Observing "cpuid entry not found"
in the tracepoint with non-zero output values is confusing for users
that aren't familiar with the out-of-range semantics, and qualifying the
"not found" case hopefully makes it clear that "found" means "found the
exact entry".
Suggested-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Output the requested index when tracing CPUID emulation; it's basically
mandatory for leafs where the index is meaningful, and is helpful for
verifying KVM correctness even when the index isn't meaningful, e.g. the
trace for a Linux guest's hypervisor_cpuid_base() probing appears to
be broken (returns all zeroes) at first glance, but is correct because
the index is non-zero, i.e. the output values correspond to a random
index in the maximum basic leaf.
Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
EFER is set for L2 using svm_set_efer, which hardcodes EFER_SVME to 1 and hides
an incorrect value for EFER.SVME in the L1 VMCB. Perform the check manually
to detect invalid guest state.
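A minimal sketch of the added check in the nested VMCB validation path
(simplified; field names approximate):

  /* Reject VMRUN if L1 provides a VMCB with EFER.SVME clear. */
  if (!(nested_vmcb->save.efer & EFER_SVME))
          return false;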
Reported-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On Tigerlake the new AVX512 VP2INTERSECT feature is available; expose it
via KVM_GET_SUPPORTED_CPUID.
Cc: "Zhong, Yang" <yang.zhong@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The name of nested_vmx_exit_reflected suggests that it's purely
a test, but it actually marks VMCS12 pages as dirty. Move this to
vmx_handle_exit, observing that the initial nested_run_pending check in
nested_vmx_exit_reflected is pointless---nested_run_pending has just
been cleared in vmx_vcpu_run and won't be set until handle_vmlaunch
or handle_vmresume.
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Registers in "regs" array are indexed as rax/rcx/rdx/.../rsi/rdi/r8/...
Reorder access to "regs" array in vmenter.S to follow its natural order.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'kvm-s390-next-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Features and Enhancements for 5.7 part1
1. Allow to disable gisa
2. protected virtual machines
Protected VMs (PVM) are KVM VMs where KVM can't access the VM's
state, like guest memory and guest registers, anymore. Instead, the
PVMs are mostly managed by a new entity called the Ultravisor (UV),
which provides an API so that KVM and the PVM can request management
actions.
PVMs are encrypted at rest and protected from hypervisor access
while running. They switch from normal operation into protected
mode, so we can still use the standard boot process to load an
encrypted blob and then move it into protected mode.
Rebooting is only possible by passing through unprotected/normal
mode and switching to protected mode again.
One mm related patch will go via Andrew's mm tree (mm/gup/writeback:
add callbacks for inaccessible pages).
Check that the guest doesn't hang when an invalid eVMCS GPA is specified.
Testing that #UD is injected would probably be better, but selftests
currently lack the infrastructure.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check that VMfailInvalid happens when the eVMCS revision is invalid.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM allows using the revision_id from MSR_IA32_VMX_BASIC as the eVMCS
revision_id to work around a bug in genuine Hyper-V (see the comment in
nested_vmx_handle_enlightened_vmptrld()), but this shouldn't be used by
default. Switch to using KVM_EVMCS_VERSION(1).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nested_vmx_handle_enlightened_vmptrld() fails in two cases:
- when we fail to kvm_vcpu_map() the supplied GPA
- when revision_id is incorrect.
Genuine Hyper-V raises #UD in the former case (at least with *some*
incorrect GPAs) and does VMfailInvalid() in the latter. KVM doesn't do
anything, so L1 just gets stuck retrying the same faulty VMLAUNCH.
nested_vmx_handle_enlightened_vmptrld() has two call sites:
nested_vmx_run() and nested_get_vmcs12_pages(). The former can mimic
Hyper-V's behavior and fail the nested VM entry; the latter can't do
much: the failure there happens after migration when L2 was running (and
L1 did something weird like wrote to the VP assist page from a different
vCPU), so just kill L1 with KVM_EXIT_INTERNAL_ERROR.
Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Squash kbuild autopatch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When vmx_set_nested_state() happens, we may not have all the required
data to map enlightened VMCS: e.g. HV_X64_MSR_VP_ASSIST_PAGE MSR may not
yet be restored so we need a postponed action. Currently, we (ab)use
need_vmcs12_to_shadow_sync/nested_sync_vmcs12_to_shadow() for that but
this is not ideal:
- We may not need to sync anything if L2 is running
- It is hard to propagate errors from nested_sync_vmcs12_to_shadow()
as we call it from vmx_prepare_switch_to_guest() which happens just
before we do VMLAUNCH, the code is not ready to handle errors there.
Move eVMCS mapping to nested_get_vmcs12_pages() and request
KVM_REQ_GET_VMCS12_PAGES; this seems to be less abusive in nature.
It would probably be possible to introduce a specialized KVM_REQ_EVMCS_MAP
but it is undesirable to propagate eVMCS specifics all the way up to x86.c
Note, we don't need to request KVM_REQ_GET_VMCS12_PAGES from
vmx_set_nested_state() directly as nested_vmx_enter_non_root_mode() already
does that. Requesting KVM_REQ_GET_VMCS12_PAGES is done to document the
(non-obvious) side-effect and to be future proof.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change all tests and utilities to use the TEST_FAIL macro
instead of TEST_ASSERT(false, ...).
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some tests/utilities use the TEST_ASSERT(false, ...) pattern to
indicate a failure and stop execution.
This change introduces the TEST_FAIL macro, which is a wrapper around
TEST_ASSERT(false, ...) and so provides a direct alternative for
failing a test.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Normal reset and initial CPU reset do not clear all registers. Add a
test that those registers are NOT changed.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We should not only test the oneregs and the get_(x)regs interfaces but
also sync_regs, which is usually the canonical place for register
content.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The guest crashes very early due to changes in the control registers
used by dynamic address translation. Let us use different registers
that will not crash the guest.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The steal-time test confirms that what is reported to the guest as stolen
time is consistent with the run_delay reported for the VCPU thread
on the host. Both x86_64 and AArch64 have the concept of steal/stolen
time so this test is introduced for both architectures.
While adding the test we ensure .gitignore has all tests listed
(it was missing s390x/resets) and that the Makefile has all tests
listed in alphabetical order (not really necessary, but it almost
was already...). We also extend the common API with a new num-guest-
pages call and a new timespec call.
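For context, the host-side run_delay can be read roughly like this (a
sketch; the actual test may differ in details):

  #include <stdio.h>

  /* run_delay is the 2nd field of /proc/<tid>/schedstat, in nanoseconds. */
  static unsigned long get_run_delay(long tid)
  {
          char path[64];
          unsigned long sum_exec = 0, run_delay = 0;
          FILE *fp;

          snprintf(path, sizeof(path), "/proc/%ld/schedstat", tid);
          fp = fopen(path, "r");
          if (fp) {
                  if (fscanf(fp, "%lu %lu", &sum_exec, &run_delay) != 2)
                          run_delay = 0;
                  fclose(fp);
          }
          return run_delay;
  }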
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Also correct the comment and prototype for vm_create_default(),
as it takes a number of pages, not a size.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the format attribute to enable printf format warnings, and
then fix them all.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
acrs are 32 bit and not 64 bit.
Reported-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The value is a u64, not a string.
Reported-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move function documentation comment blocks to the header files in
order to avoid duplicating them for each architecture. While at
it clean up and fix up the comment blocks.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add svm_vmcall_test to gitignore list, and realphabetize it.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After test_and_set_bit() on kvm->arch.apicv_inhibit_reasons,
kvm_apicv_activated() will always return false, because
apicv_inhibit_reasons is then guaranteed to be non-zero.
What the code wants to do is check whether APICv was *already* active
and, if so, skip the costly request; we can do this using cmpxchg.
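The update then looks roughly like this (simplified sketch of the cmpxchg
loop):

  unsigned long old, new, expected;

  old = READ_ONCE(kvm->arch.apicv_inhibit_reasons);
  do {
          expected = new = old;
          if (activate)
                  __clear_bit(bit, &new);
          else
                  __set_bit(bit, &new);
          if (new == old)
                  break;          /* APICv state unchanged, skip the request */
          old = cmpxchg(&kvm->arch.apicv_inhibit_reasons, expected, new);
  } while (old != expected);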
Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
commit 5ef8acbdd6 ("KVM: nVMX: Emulate MTF when performing
instruction emulation") introduced a helper to check the MTF
VM-execution control in vmcs12. Change pre-existing check in
nested_vmx_exit_reflected() to instead use the helper.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The PMU is not exposed to the guest by most cloud providers' products
due to the poor performance of PMU emulation and security concerns.
However, KVM calls perf_guest_switch_get_msrs() and
clear_atomic_switch_msr() unconditionally before each vmentry, even if
the PMU is not exposed to the guest.
A ~2% reduction in vmexit time can be observed with
kvm-unit-tests/vmexit.flat on my SKX server.
Before patch:
vmcall 1559
After patch:
vmcall 1529
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
s390 requires 1M aligned guest sizes. Embedding the rounding in
vm_adjust_num_guest_pages() allows us to remove it from a few
other places.
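For reference, the rounding boils down to something like this
(illustrative sketch, not the exact helper):

  #include <stdint.h>

  /* Round a guest page count up so the total guest size is 1M aligned. */
  static inline uint64_t align_guest_pages_to_1m(uint64_t num_pages,
                                                 uint64_t page_size)
  {
          uint64_t pages_per_1m = (1ULL << 20) / page_size;

          return (num_pages + pages_per_1m - 1) / pages_per_1m * pages_per_1m;
  }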
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The GA Log tracepoint is useful when debugging AVIC performance
issues, as it can be used with perf to count the number of times the
IOMMU AVIC injects interrupts through the slow path instead of
injecting them directly to the target vCPU.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clarify locking.rst to mention early on that we're not enabling fast page
fault for indirect sps. The previous wording is confusing, in that it
seems the proposed solution has already been implemented, but it has not.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch reproduces for nSVM the change that was made for nVMX in
commit b5861e5cf2 ("KVM: nVMX: Fix loss of pending IRQ/NMI before
entering L2"). While I do not have a test that breaks without it, I
cannot see why it would not be necessary since all events are unblocked
by VMRUN's setting of GIF back to 1.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current implementation of physical interrupt delivery to a nested guest
is quite broken. It relies on svm_interrupt_allowed returning false if
VINTR=1 so that the interrupt can be injected from enable_irq_window,
but this does not work for guests that do not intercept HLT or that rely
on clearing the host IF to block physical interrupts while L2 runs.
This patch can be split into two logical parts, but including only
one of them breaks tests, so I am combining both changes together.
The first and easiest is simply to return true for svm_interrupt_allowed
if HF_VINTR_MASK is set and HIF is set. This way the semantics of
svm_interrupt_allowed are respected: svm_interrupt_allowed being false
does not mean "call enable_irq_window", it means "interrupts cannot
be injected now".
After doing this, however, we need another place to inject the
interrupt, and fortunately we already have one, check_nested_events,
which nested SVM does not implement but which is meant exactly for this
purpose. It is called before interrupts are injected, and it can
therefore do the L2->L1 switch while leaving inject_pending_event
none the wiser.
This patch was developed together with Cathy Avery, who wrote the
test and did a lot of the initial debugging.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a nested VM is started while an IRQ was pending and with
V_INTR_MASKING=1, the behavior of the guest depends on host IF. If it
is 1, the VM should exit immediately, before executing the first
instruction of the guest, because VMRUN sets GIF back to 1.
If it is 0 and the host has VGIF, however, at the time of the VMRUN
instruction L0 is running the guest with a pending interrupt window
request. This interrupt window request is completely irrelevant to
L2, since IF only controls virtual interrupts, so this patch drops
INTERCEPT_VINTR from the VMCB while running L2 under these circumstances.
To simplify the code, both steps of enabling the interrupt window
(setting the VINTR intercept and requesting a fake virtual interrupt
in svm_inject_irq) are grouped in the svm_set_vintr function, and
likewise for dismissing the interrupt window request in svm_clear_vintr.
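The grouping is roughly (simplified sketch; the real functions also program
and clear the fake V_IRQ in the VMCB interrupt control fields):

  static void svm_set_vintr(struct vcpu_svm *svm)
  {
          set_intercept(svm, INTERCEPT_VINTR);
          /* ...and request a fake virtual interrupt so the VINTR exit fires */
  }

  static void svm_clear_vintr(struct vcpu_svm *svm)
  {
          clr_intercept(svm, INTERCEPT_VINTR);
          /* ...and drop the pending virtual interrupt request */
  }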
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of touching the host intercepts so that the bitwise OR in
recalc_intercepts just works, mask away uninteresting intercepts
directly in recalc_intercepts.
This is cleaner and keeps the logic in one place even for intercepts
that can change while L2 is running.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The set_cr3 callback is not setting the guest CR3, it is setting the
root of the guest page tables, either shadow or two-dimensional.
To make this clearer as well as to indicate that the MMU calls it
via kvm_mmu_load_cr3, rename it to load_mmu_pgd.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to what kvm-intel.ko is doing, provide a single callback that
merges svm_set_cr3, set_tdp_cr3 and nested_svm_set_tdp_cr3.
This lets us unify the set_cr3 and set_tdp_cr3 entries in kvm_x86_ops.
I'm doing that in this same patch because splitting it adds quite a bit
of churn due to the need for forward declarations. For the same reason
the assignment to vcpu->arch.mmu->set_cr3 is moved to kvm_init_shadow_mmu
from init_kvm_softmmu and nested_svm_init_mmu_context.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Invert and rename the kvm_cpuid() param that controls out-of-range logic
to better reflect the semantics of the affected callers, i.e. callers
that bypass the out-of-range logic do so because they are looking up an
exact guest CPUID entry, e.g. to query the maxphyaddr.
Similarly, rename kvm_cpuid()'s internal "found" to "exact" to clarify
that it tracks whether or not the exact requested leaf was found, as
opposed to any usable leaf being found.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move all of the out-of-range logic into a single helper,
get_out_of_range_cpuid_entry(), to avoid an extra lookup of CPUID.0.0
and to provide a single location for documenting the out-of-range
behavior.
No functional change intended.
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rework the masking in the out-of-range CPUID logic to handle the
Hypervisor sub-classes, as well as the Centaur class if the guest
virtual CPU vendor is Centaur.
Masking against 0x80000000 only handles basic and extended leafs, which
results in Hypervisor range checks being performed against the basic
CPUID class, and Centaur range checks being performed against the
Extended class. E.g. if CPUID.0x40000000.EAX returns 0x4000000A and
there is no entry for CPUID.0x40000006, then function 0x40000006 would
be incorrectly reported as out of bounds.
While there is no official definition of what constitutes a class, the
convention established for Hypervisor classes effectively uses bits 31:8
as the mask by virtue of checking for different bases in increments of
0x100, e.g. KVM advertises its CPUID functions starting at 0x40000100
when HyperV features are advertised at the default base of 0x40000000.
The bad range check doesn't cause functional problems for any known VMM
because out-of-range semantics only come into play if the exact entry
isn't found, and VMMs either support a very limited Hypervisor range,
e.g. the official KVM range is 0x40000000-0x40000001 (effectively no
room for undefined leafs) or explicitly defines gaps to be zero, e.g.
Qemu explicitly creates zeroed entries up to the Centaur and Hypervisor
limits (the latter comes into play when providing HyperV features).
The bad behavior can be visually confirmed by dumping CPUID output in
the guest when running Qemu with a stable TSC, as Qemu extends the limit
of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
without defining zeroed entries for 0x40000002 - 0x4000000f.
Note, documentation of Centaur/VIA CPUs is hard to come by. Designating
0xc0000000 - 0xcfffffff as the Centaur class is a best guess as to the
behavior of a real Centaur/VIA CPU.
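Conceptually, the class of a CPUID function can be derived along these
lines (illustrative sketch, not the exact KVM code):

  /* Determine the base leaf of a CPUID function's "class". */
  static u32 cpuid_class_base(u32 function)
  {
          /* Hypervisor classes: 0x40000000, 0x40000100, ... in 0x100 steps. */
          if ((function & 0xc0000000) == 0x40000000)
                  return function & 0xffffff00;

          /* Centaur class, 0xc0000000 - 0xcfffffff. */
          if ((function & 0xf0000000) == 0xc0000000)
                  return 0xc0000000;

          /* Basic (0x0) vs. extended (0x80000000). */
          return function & 0x80000000;
  }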
Fixes: 43561123ab ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH")
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>