linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-13 22:53:20 +00:00

Author	SHA1	Message	Date
Sean Christopherson	8fea056eeb	KVM: selftests: Use kvm_cpu_has() in AMX test Use kvm_cpu_has() in the AMX test instead of open coding equivalent functionality using kvm_get_supported_cpuid_entry() and kvm_get_supported_cpuid_index(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-12-seanjc@google.com	2022-07-13 18:14:12 -07:00
Sean Christopherson	2697646bd3	KVM: selftests: Check for _both_ XTILE data and cfg in AMX test Check for _both_ XTILE data and cfg support in the AMX test instead of checking for _either_ feature. Practically speaking, no sane CPU or vCPU will support one but not the other, but the effective "or" behavior is subtle and technically incorrect. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-11-seanjc@google.com	2022-07-13 18:14:12 -07:00
Sean Christopherson	fdd1e2788c	KVM: selftests: Use kvm_cpu_has() for XSAVES in XSS MSR test Use kvm_cpu_has() in the XSS MSR test instead of open coding equivalent functionality using kvm_get_supported_cpuid_index(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-10-seanjc@google.com	2022-07-13 18:14:11 -07:00
Sean Christopherson	50445ea233	KVM: selftests: Drop redundant vcpu_set_cpuid() from PMU selftest Drop a redundant vcpu_set_cpuid() from the PMU test. The vCPU's CPUID is set to KVM's supported CPUID by vm_create_with_one_vcpu(), which was also true back when the helper was named vm_create_default(). Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-9-seanjc@google.com	2022-07-13 18:14:11 -07:00
Sean Christopherson	ea129d2254	KVM: selftests: Use kvm_cpu_has() to query PDCM in PMU selftest Use kvm_cpu_has() in the PMU test to query PDCM support instead of open coding equivalent functionality using kvm_get_supported_cpuid_index(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-8-seanjc@google.com	2022-07-13 18:14:10 -07:00
Sean Christopherson	1ecbb337fa	KVM: selftests: Use kvm_cpu_has() for nested VMX checks Use kvm_cpu_has() to check for nested VMX support, and drop the helpers now that their functionality is trivial to implement. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-7-seanjc@google.com	2022-07-13 18:14:10 -07:00
Sean Christopherson	f21940a3bb	KVM: selftests: Use kvm_cpu_has() for nested SVM checks Use kvm_cpu_has() to check for nested SVM support, and drop the helpers now that their functionality is trivial to implement. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-6-seanjc@google.com	2022-07-13 18:14:10 -07:00
Sean Christopherson	c5c5b827f1	KVM: selftests: Use kvm_cpu_has() in the SEV migration test Use kvm_cpu_has() in the SEV migration test instead of open coding equivalent functionality using kvm_get_supported_cpuid_entry(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-5-seanjc@google.com	2022-07-13 18:14:09 -07:00
Sean Christopherson	61d76b8a69	KVM: selftests: Add framework to query KVM CPUID bits Add X86_FEATURE_* magic in the style of KVM-Unit-Tests' implementation, where the CPUID function, index, output register, and output bit position are embedded in the macro value. Add kvm_cpu_has() to query KVM's supported CPUID and use it set_sregs_test, which is the most prolific user of manual feature querying. Opportunstically rename calc_cr4_feature_bits() to calc_supported_cr4_feature_bits() to better capture how the CR4 bits are chosen. Link: https://lore.kernel.org/all/20210422005626.564163-1-ricarkol@google.com Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-4-seanjc@google.com	2022-07-13 18:14:09 -07:00
Sean Christopherson	683edfd42b	KVM: sefltests: Use CPUID_* instead of X86_FEATURE_* for one-off usage Rename X86_FEATURE_* macros to CPUID_* in various tests to free up the X86_FEATURE_* names for KVM-Unit-Tests style CPUID automagic where the function, leaf, register, and bit for the feature is embedded in its macro value. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-3-seanjc@google.com	2022-07-13 18:14:08 -07:00
Sean Christopherson	4c16fa3ee9	KVM: selftests: Set KVM's supported CPUID as vCPU's CPUID during recreate On x86-64, set KVM's supported CPUID as the vCPU's CPUID when recreating a VM+vCPU to deduplicate code for state save/restore tests, and to provide symmetry of sorts with respect to vm_create_with_one_vcpu(). The extra KVM_SET_CPUID2 call is wasteful for Hyper-V, but ultimately is nothing more than an expensive nop, and overriding the vCPU's CPUID with the Hyper-V CPUID information is the only known scenario where a state save/restore test wouldn't need/want the default CPUID. Opportunistically use __weak for the default vm_compute_max_gfn(), it's provided by tools' compiler.h. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220614200707.3315957-2-seanjc@google.com	2022-07-13 18:14:08 -07:00
Colton Lewis	594a1c271c	KVM: selftests: Fix filename reporting in guest asserts Fix filename reporting in guest asserts by ensuring the GUEST_ASSERT macro records __FILE__ and substituting REPORT_GUEST_ASSERT for many repetitive calls to TEST_FAIL. Previously filename was reported by using __FILE__ directly in the selftest, wrongly assuming it would always be the same as where the assertion failed. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reported-by: Ricardo Koller <ricarkol@google.com> Fixes: `4e18bccc2e` Link: https://lore.kernel.org/r/20220615193116.806312-5-coltonlewis@google.com [sean: convert more TEST_FAIL => REPORT_GUEST_ASSERT instances] Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-13 18:14:08 -07:00
Colton Lewis	ddcb57afd5	KVM: selftests: Write REPORT_GUEST_ASSERT macros to pair with GUEST_ASSERT Write REPORT_GUEST_ASSERT macros to pair with GUEST_ASSERT to abstract and make consistent all guest assertion reporting. Every report includes an explanatory string, a filename, and a line number. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20220615193116.806312-4-coltonlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-13 18:14:07 -07:00
Colton Lewis	fc573fa4f3	KVM: selftests: Increase UCALL_MAX_ARGS to 7 Increase UCALL_MAX_ARGS to 7 to allow GUEST_ASSERT_4 to pass 3 builtin ucall arguments specified in guest_assert_builtin_args plus 4 user-specified arguments. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20220615193116.806312-3-coltonlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-13 18:14:07 -07:00
Colton Lewis	8fb2638a56	KVM: selftests: enumerate GUEST_ASSERT arguments Enumerate GUEST_ASSERT arguments to avoid magic indices to ucall.args. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20220615193116.806312-2-coltonlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-13 18:14:06 -07:00
Sean Christopherson	0bc2732661	KVM: x86: WARN only once if KVM leaves a dangling userspace I/O request Change a WARN_ON() to separate WARN_ON_ONCE() if KVM has an outstanding PIO or MMIO request without an associated callback, i.e. if KVM queued a userspace I/O exit but didn't actually exit to userspace before moving on to something else. Warning on every KVM_RUN risks spamming the kernel if KVM gets into a bad state. Opportunistically split the WARNs so that it's easier to triage failures when a WARN fires. Deliberately do not use KVM_BUG_ON(), i.e. don't kill the VM. While the WARN is all but guaranteed to fire if and only if there's a KVM bug, a dangling I/O request does not present a danger to KVM (that flag is truly truly consumed only in a single emulator path), and any such bug is unlikely to be fatal to the VM (KVM essentially failed to do something it shouldn't have tried to do in the first place). In other words, note the bug, but let the VM keep running. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-13 18:14:06 -07:00
Sean Christopherson	2626206963	KVM: x86: Set error code to segment selector on LLDT/LTR non-canonical #GP When injecting a #GP on LLDT/LTR due to a non-canonical LDT/TSS base, set the error code to the selector. Intel SDM's says nothing about the #GP, but AMD's APM explicitly states that both LLDT and LTR set the error code to the selector, not zero. Note, a non-canonical memory operand on LLDT/LTR does generate a #GP(0), but the KVM code in question is specific to the base from the descriptor. Fixes: `e37a75a13c` ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-13 18:14:06 -07:00
Sean Christopherson	ec6e4d8632	KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checks Wait to mark the TSS as busy during LTR emulation until after all fault checks for the LTR have passed. Specifically, don't mark the TSS busy if the new TSS base is non-canonical. Opportunistically drop the one-off !seg_desc.PRESENT check for TR as the only reason for the early check was to avoid marking a !PRESENT TSS as busy, i.e. the common !PRESENT is now done before setting the busy bit. Fixes: `e37a75a13c` ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR") Reported-by: syzbot+760a73552f47a8cd0fd9@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-13 18:14:05 -07:00
Sean Christopherson	43bb9e000e	KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specific Add a "UD" clause to KVM_X86_QUIRK_MWAIT_NEVER_FAULTS to make it clear that the quirk only controls the #UD behavior of MONITOR/MWAIT. KVM doesn't currently enforce fault checks when MONITOR/MWAIT are supported, but that could change in the future. SVM also has a virtualization hole in that it checks all faults before intercepts, and so "never faults" is already a lie when running on SVM. Fixes: `bfbcc81bb8` ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior") Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220711225753.1073989-4-seanjc@google.com	2022-07-13 18:14:05 -07:00
Vitaly Kuznetsov	14fd95bf14	KVM: selftests: Use "a" and "d" to set EAX/EDX for wrmsr_safe() Do not use GCC's "A" constraint to load EAX:EDX in wrmsr_safe(). Per GCC's documenation on x86-specific constraints, "A" will not actually load a 64-bit value into EAX:EDX on x86-64. The a and d registers. This class is used for instructions that return double word results in the ax:dx register pair. Single word values will be allocated either in ax or dx. For example on i386 the following implements rdtsc: unsigned long long rdtsc (void) { unsigned long long tick; __asm__ __volatile__("rdtsc":"=A"(tick)); return tick; } This is not correct on x86-64 as it would allocate tick in either ax or dx. You have to use the following variant instead: unsigned long long rdtsc (void) { unsigned int tickl, tickh; __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh)); return ((unsigned long long)tickh << 32)\|tickl; } Because a u64 fits in a single 64-bit register, using "A" for selftests, which are 64-bit only, results in GCC loading the value into either RAX or RDX instead of splitting it across EAX:EDX. E.g.: kvm_exit: reason MSR_WRITE rip 0x402919 info 0 0 kvm_msr: msr_write 40000118 = 0x60000000001 (#GP) ... With "A": 48 8b 43 08 mov 0x8(%rbx),%rax 49 b9 ba da ca ba 0a movabs $0xabacadaba,%r9 00 00 00 4c 8d 15 07 00 00 00 lea 0x7(%rip),%r10 # 402f44 <guest_msr+0x34> 4c 8d 1d 06 00 00 00 lea 0x6(%rip),%r11 # 402f4a <guest_msr+0x3a> 0f 30 wrmsr With "a"/"d": 48 8b 53 08 mov 0x8(%rbx),%rdx 89 d0 mov %edx,%eax 48 c1 ea 20 shr $0x20,%rdx 49 b9 ba da ca ba 0a movabs $0xabacadaba,%r9 00 00 00 4c 8d 15 07 00 00 00 lea 0x7(%rip),%r10 # 402fc3 <guest_msr+0xb3> 4c 8d 1d 06 00 00 00 lea 0x6(%rip),%r11 # 402fc9 <guest_msr+0xb9> 0f 30 wrmsr Fixes: `3b23054cd3` ("KVM: selftests: Add x86-64 support for exception fixup") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints [sean: use "& -1u", provide GCC blurb and link to documentation] Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220714011115.3135828-1-seanjc@google.com	2022-07-13 18:13:50 -07:00
Sean Christopherson	b624ae3541	KVM: selftests: Provide valid inputs for MONITOR/MWAIT regs Provide valid inputs for RAX, RCX, and RDX when testing whether or not KVM injects a #UD on MONITOR/MWAIT. SVM has a virtualization hole and checks for _all_ faults before checking for intercepts, e.g. MONITOR with an unsupported RCX will #GP before KVM gets a chance to intercept and emulate. Fixes: `2325d4dd73` ("KVM: selftests: Add MONITOR/MWAIT quirk test") Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220711225753.1073989-3-seanjc@google.com	2022-07-12 22:31:14 +00:00
Sean Christopherson	874190fd4e	KVM: selftests: Test MONITOR and MWAIT, not just MONITOR for quirk Fix a copy+paste error in monitor_mwait_test by switching one of the two "monitor" instructions to an "mwait". The intent of the test is very much to verify the quirk handles both MONITOR and MWAIT. Fixes: `2325d4dd73` ("KVM: selftests: Add MONITOR/MWAIT quirk test") Reported-by: Yuan Yao <yuan.yao@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220711225753.1073989-2-seanjc@google.com	2022-07-12 22:31:13 +00:00
Sean Christopherson	79f772b9e8	KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor, again Read vcpu->vcpu_idx directly instead of bouncing through the one-line wrapper, kvm_vcpu_get_idx(), and drop the wrapper. The wrapper is a remnant of the original implementation and serves no purpose; remove it (again) before it gains more users. kvm_vcpu_get_idx() was removed in the not-too-distant past by commit `4eeef24241` ("KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor"), but was unintentionally re-introduced by commit `a54d806688` ("KVM: Keep memslots in tree-based structures instead of array-based ones"), likely due to a rebase goof. The wrapper then managed to gain users in KVM's Xen code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220614225615.3843835-1-seanjc@google.com	2022-07-12 22:31:13 +00:00
Hou Wenlong	6e1d2a3f25	KVM: x86/mmu: Replace UNMAPPED_GVA with INVALID_GPA for gva_to_gpa() The result of gva_to_gpa() is physical address not virtual address, it is odd that UNMAPPED_GVA macro is used as the result for physical address. Replace UNMAPPED_GVA with INVALID_GPA and drop UNMAPPED_GVA macro. No functional change intended. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/6104978956449467d3c68f1ad7f2c2f6d771d0ee.1656667239.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-12 22:31:12 +00:00
Vitaly Kuznetsov	156b9d76e8	KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1 Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to hang upon boot or shortly after when a non-default TSC frequency was set for L1. The issue is observed on a host where TSC scaling is supported. The problem appears to be that Windows doesn't use TSC scaling for its guests, even when the feature is advertised, and KVM filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from L1's VMCS. This leads to L2 running with the default frequency (matching host's) while L1 is running with an altered one. Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when it was set for L1. TSC_MULTIPLIER is already correctly computed and written by prepare_vmcs02(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Fixes: `d041b5ea93` ("KVM: nVMX: Enable nested TSC scaling") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220712135009.952805-1-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-12 12:06:20 -07:00
Vitaly Kuznetsov	159e037d2e	KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op() 'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally not needed for APIC_DM_REMRD, they're still referenced by __apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize the structure to avoid consuming random stack memory. Fixes: `a183b638b6` ("KVM: x86: make apic_accept_irq tracepoint more generic") Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220708125147.593975-1-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-08 15:59:28 -07:00
Sean Christopherson	f83894b24c	KVM: x86: Fix handling of APIC LVT updates when userspace changes MCG_CAP Add a helper to update KVM's in-kernel local APIC in response to MCG_CAP being changed by userspace to fix multiple bugs. First and foremost, KVM needs to check that there's an in-kernel APIC prior to dereferencing vcpu->arch.apic. Beyond that, any "new" LVT entries need to be masked, and the APIC version register needs to be updated as it reports out the number of LVT entries. Fixes: `4b903561ec` ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.") Reported-by: syzbot+8cdad6430c24f396f158@syzkaller.appspotmail.com Cc: Siddh Raman Pant <code@siddh.me> Cc: Jue Wang <juew@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-08 15:58:16 -07:00
Sean Christopherson	03d84f9689	KVM: x86: Initialize number of APIC LVT entries during APIC creation Initialize the number of LVT entries during APIC creation, else the field will be incorrectly left '0' if userspace never invokes KVM_X86_SETUP_MCE. Add and use a helper to calculate the number of entries even though MCG_CMCI_P is not set by default in vcpu->arch.mcg_cap. Relying on that to always be true is unnecessarily risky, and subtle/confusing as well. Fixes: `4b903561ec` ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.") Reported-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Jue Wang <juew@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-07-08 15:55:28 -07:00
Sean Christopherson	4a627b0b16	Merge branch 'kvm-5.20-msr-eperm' Merge a bug fix and cleanups for {g,s}et_msr_mce() using a base that predates commit `281b52780b` ("KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs."), which was written with the intention that it be applied _after_ the bug fix and cleanups. The bug fix in particular needs to be sent to stable trees; give them a stable hash to use.	2022-07-08 15:02:41 -07:00
Sean Christopherson	54ad60ba9d	KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs Add helpers to identify CTL (control) and STATUS MCi MSR types instead of open coding the checks using the offset. Using the offset is perfectly safe, but unintuitive, as understanding what the code does requires knowing that the offset calcuation will not affect the lower three bits. Opportunistically comment the STATUS logic to save readers a trip to Intel's SDM or AMD's APM to understand the "data != 0" check. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220512222716.4112548-4-seanjc@google.com	2022-07-08 14:57:20 -07:00
Sean Christopherson	f5223a332f	KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce() Use an explicit case statement to grab the full range of MCx bank MSRs in {g,s}et_msr_mce(), and manually check only the "end" (the number of banks configured by userspace may be less than the max). The "default" trick works, but is a bit odd now, and will be quite odd if/when support for accessing MCx_CTL2 MSRs is added, which has near identical logic. Hoist "offset" to function scope so as to avoid curly braces for the case statement, and because MCx_CTL2 support will need the same variables. Opportunstically clean up the comment about allowing bit 10 to be cleared from bank 4. No functional change intended. Cc: Jue Wang <juew@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220512222716.4112548-3-seanjc@google.com	2022-07-08 14:57:12 -07:00
Sean Christopherson	2368048bf5	KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or MCi_STATUS MSR. The behavior of "all zeros' or "all ones" for CTL MSRs is architectural, as is the "only zeros" behavior for STATUS MSRs. I.e. the intent is to inject a #GP, not exit to userspace due to an unhandled emulation case. Returning '-1' gets interpreted as -EPERM up the stack and effecitvely kills the guest. Fixes: `890ca9aefa` ("KVM: Add MCE support") Fixes: `9ffd986c6e` ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.com	2022-07-08 14:52:59 -07:00
Sean Christopherson	b9b71f4368	KVM: x86/mmu: Buffer nested MMU split_desc_cache only by default capacity Buffer split_desc_cache, the cache used to allcoate rmap list entries, only by the default cache capacity (currently 40), not by doubling the minimum (513). Aliasing L2 GPAs to L1 GPAs is uncommon, thus eager page splitting is unlikely to need 500+ entries. And because each object is a non-trivial 128 bytes (see struct pte_list_desc), those extra ~500 entries means KVM is in all likelihood wasting ~64kb of memory per VM. Link: https://lore.kernel.org/all/YrTDcrsn0%2F+alpzf@google.com Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220624171808.2845941-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-25 04:54:51 -04:00
Sean Christopherson	72ae5822b8	KVM: x86/mmu: Use "unsigned int", not "u32", for SPTEs' @access info Use an "unsigned int" for @access parameters instead of a "u32", mostly to be consistent throughout KVM, but also because "u32" is misleading. @access can actually squeeze into a u8, i.e. doesn't need 32 bits, but is as an "unsigned int" because sp->role.access is an unsigned int. No functional change intended. Link: https://lore.kernel.org/all/YqyZxEfxXLsHGoZ%2F@google.com Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220624171808.2845941-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-25 04:54:40 -04:00
Paolo Bonzini	db209369d4	KVM: SEV-ES: reuse advance_sev_es_emulated_ins for OUT too complete_emulator_pio_in() only has to be called by complete_sev_es_emulated_ins() now; therefore, all that the function does now is adjust sev_pio_count and sev_pio_data. Which is the same for both IN and OUT. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 13:05:35 -04:00
Paolo Bonzini	f35cee4adb	KVM: x86: de-underscorify __emulator_pio_in Now all callers except emulator_pio_in_emulated are using __emulator_pio_in/complete_emulator_pio_in explicitly. Move the "either copy the result or attempt PIO" logic in emulator_pio_in_emulated, and rename __emulator_pio_in to just emulator_pio_in. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:54:33 -04:00
Paolo Bonzini	dc7a4bfde5	KVM: x86: wean fast IN from emulator_pio_in Use __emulator_pio_in() directly for fast PIO instead of bouncing through emulator_pio_in() now that __emulator_pio_in() fills "val" when handling in-kernel PIO. vcpu->arch.pio.count is guaranteed to be '0', so this a pure nop. emulator_pio_in_emulated is now the last caller of emulator_pio_in. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:54:20 -04:00
Paolo Bonzini	0c05e10bce	KVM: x86: wean in-kernel PIO from vcpu->arch.pio* Make emulator_pio_in_out operate directly on the provided buffer as long as PIO is handled inside KVM. For input operations, this means that, in the case of in-kernel PIO, __emulator_pio_in() does not have to be always followed by complete_emulator_pio_in(). This affects emulator_pio_in() and kvm_sev_es_ins(); for the latter, that is why the call moves from advance_sev_es_emulated_ins() to complete_sev_es_emulated_ins(). For output, it means that vcpu->pio.count is never set unnecessarily and there is no need to clear it; but also vcpu->pio.size must not be used in kvm_sev_es_outs(), because it will not be updated for in-kernel OUT. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:54:04 -04:00
Paolo Bonzini	30d583fd4e	KVM: x86: move all vcpu->arch.pio* setup in emulator_pio_in_out() For now, this is basically an excuse to add back the void* argument to the function, while removing some knowledge of vcpu->arch.pio* from its callers. The WARN that vcpu->arch.pio.count is zero is also extended to OUT operations. The vcpu->arch.pio* fields still need to be filled even when the PIO is handled in-kernel as __emulator_pio_in() is always followed by complete_emulator_pio_in(). But after fixing that, it will be possible to to only populate the vcpu->arch.pio* fields on userspace exits. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:53:50 -04:00
Paolo Bonzini	35ab3b77a0	KVM: x86: drop PIO from unregistered devices KVM protects the device list with SRCU, and therefore different calls to kvm_io_bus_read()/kvm_io_bus_write() can very well see different incarnations of kvm->buses. If userspace unregisters a device while vCPUs are running there is no well-defined result. This patch applies a safe fallback by returning early from emulator_pio_in_out(). This corresponds to returning zeroes from IN, and dropping the writes on the floor for OUT. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:53:37 -04:00
Paolo Bonzini	0f87ac234d	KVM: x86: inline kernel_pio into its sole caller The caller of kernel_pio already has arguments for most of what kernel_pio fishes out of vcpu->arch.pio. This is the first step towards ensuring that vcpu->arch.pio.* is only used when exiting to userspace. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:53:23 -04:00
Paolo Bonzini	7a6177d6f3	KVM: x86: complete fast IN directly with complete_emulator_pio_in() Use complete_emulator_pio_in() directly when completing fast PIO, there's no need to bounce through emulator_pio_in(): the comment about ECX changing doesn't apply to fast PIO, which isn't used for string I/O. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:53:07 -04:00
Maxim Levitsky	091abbf578	KVM: x86: nSVM: optimize svm_set_x2apic_msr_interception - Avoid toggling the x2apic msr interception if it is already up to date. - Avoid touching L0 msr bitmap when AVIC is inhibited on entry to the guest mode, because in this case the guest usually uses its own msr bitmap. Later on VM exit, the 1st optimization will allow KVM to skip touching the L0 msr bitmap as well. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220519102709.24125-18-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:52:59 -04:00
Suravee Suthikulpanit	39b6b8c35c	KVM: SVM: Add AVIC doorbell tracepoint Add a tracepoint to track number of doorbells being sent to signal a running vCPU to process IRQ after being injected. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-17-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:52:45 -04:00
Suravee Suthikulpanit	8c9e639da4	KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible For x2AVIC, the index from incomplete IPI #vmexit info is invalid for logical cluster mode. Only ICRH/ICRL values can be used to determine the IPI destination APIC ID. Since QEMU defines guest physical APIC ID to be the same as vCPU ID, it can be used to quickly identify the target vCPU to deliver IPI, and avoid the overhead from searching through all vCPUs to match the target vCPU. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-16-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:52:18 -04:00
Suravee Suthikulpanit	f8d8ac2159	KVM: x86: Warning APICv inconsistency only when vcpu APIC mode is valid When launching a VM with x2APIC and specify more than 255 vCPUs, the guest kernel can disable x2APIC (e.g. specify nox2apic kernel option). The VM fallbacks to xAPIC mode, and disable the vCPU ID 255 and greater. In this case, APICV is deactivated for the disabled vCPUs. However, the current APICv consistency warning does not account for this case, which results in a warning. Therefore, modify warning logic to report only when vCPU APIC mode is valid. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-15-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:51:15 -04:00
Suravee Suthikulpanit	0e311d33bf	KVM: SVM: Introduce hybrid-AVIC mode Currently, AVIC is inhibited when booting a VM w/ x2APIC support. because AVIC cannot virtualize x2APIC MSR register accesses. However, the AVIC doorbell can be used to accelerate interrupt injection into a running vCPU, while all guest accesses to x2APIC MSRs will be intercepted and emulated by KVM. With hybrid-AVIC support, the APICV_INHIBIT_REASON_X2APIC is no longer enforced. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-14-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:51:00 -04:00
Suravee Suthikulpanit	c0caeee65a	KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu Originalliy, this WARN_ON is designed to detect when calling avic_vcpu_load() on an already running vcpu in AVIC mode (i.e. the AVIC is_running bit is set). However, for x2AVIC, the vCPU can switch from xAPIC to x2APIC mode while in running state, in which the avic_vcpu_load() will be called from svm_refresh_apicv_exec_ctrl(). Therefore, remove this warning since it is no longer appropriate. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-13-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:50:49 -04:00
Suravee Suthikulpanit	4d1d7942e3	KVM: SVM: Introduce logic to (de)activate x2AVIC mode Introduce logic to (de)activate AVIC, which also allows switching between AVIC to x2AVIC mode at runtime. When an AVIC-enabled guest switches from APIC to x2APIC mode, the SVM driver needs to perform the following steps: 1. Set the x2APIC mode bit for AVIC in VMCB along with the maximum APIC ID support for each mode accodingly. 2. Disable x2APIC MSRs interception in order to allow the hardware to virtualize x2APIC MSRs accesses. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-12-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:50:35 -04:00
Maxim Levitsky	7a8f7c1f34	KVM: x86: nSVM: always intercept x2apic msrs As a preparation for x2avic, this patch ensures that x2apic msrs are always intercepted for the nested guest. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220519102709.24125-11-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:50:26 -04:00

1 2 3 4 5 ...

1105134 Commits