Expose IA32_FLUSH_CMD to the guest if the guest CPUID enumerates
support for this MSR. As with IA32_PRED_CMD, permission for
unintercepted writes to this MSR will be granted to the guest after
the first non-zero write.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230201132905.549148-3-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
WARN if generating a GATag given a VM ID and vCPU ID doesn't yield the
same IDs when pulling the IDs back out of the tag. Don't bother adding
error handling to callers, this is very much a paranoid sanity check as
KVM fully controls the VM ID and is supposed to reject too-big vCPU IDs.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20230207002156.521736-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Define AVIC_VCPU_ID_MASK based on AVIC_PHYSICAL_MAX_INDEX, i.e. the mask
that effectively controls the largest guest physical APIC ID supported by
x2AVIC, instead of hardcoding the number of bits to 8 (and the number of
VM bits to 24).
The AVIC GATag is programmed into the AMD IOMMU IRTE to provide a
reference back to KVM in case the IOMMU cannot inject an interrupt into a
non-running vCPU. In such a case, the IOMMU notifies software by creating
a GALog entry with the corresponded GATag, and KVM then uses the GATag to
find the correct VM+vCPU to kick. Dropping bit 8 from the GATag results
in kicking the wrong vCPU when targeting vCPUs with x2APIC ID > 255.
Fixes: 4d1d7942e3 ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20230207002156.521736-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Provide a virtual cache topology to the guest to avoid
inconsistencies with migration on heterogenous systems. Non secure
software has no practical need to traverse the caches by set/way in
the first place.
- Add support for taking stage-2 access faults in parallel. This was an
accidental omission in the original parallel faults implementation,
but should provide a marginal improvement to machines w/o FEAT_HAFDBS
(such as hardware from the fruit company).
- A preamble to adding support for nested virtualization to KVM,
including vEL2 register state, rudimentary nested exception handling
and masking unsupported features for nested guests.
- Fixes to the PSCI relay that avoid an unexpected host SVE trap when
resuming a CPU when running pKVM.
- VGIC maintenance interrupt support for the AIC
- Improvements to the arch timer emulation, primarily aimed at reducing
the trap overhead of running nested.
- Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
interest of CI systems.
- Avoid VM-wide stop-the-world operations when a vCPU accesses its own
redistributor.
- Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
in the host.
- Aesthetic and comment/kerneldoc fixes
- Drop the vestiges of the old Columbia mailing list and add [Oliver]
as co-maintainer
This also drags in arm64's 'for-next/sme2' branch, because both it and
the PSCI relay changes touch the EL2 initialization code.
RISC-V:
- Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
- Correctly place the guest in S-mode after redirecting a trap to the guest
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
s390:
- Two patches sorting out confusion between virtual and physical
addresses, which currently are the same on s390.
- A new ioctl that performs cmpxchg on guest memory
- A few fixes
x86:
- Change tdp_mmu to a read-only parameter
- Separate TDP and shadow MMU page fault paths
- Enable Hyper-V invariant TSC control
- Fix a variety of APICv and AVIC bugs, some of them real-world,
some of them affecting architecurally legal but unlikely to
happen in practice
- Mark APIC timer as expired if its in one-shot mode and the count
underflows while the vCPU task was being migrated
- Advertise support for Intel's new fast REP string features
- Fix a double-shootdown issue in the emergency reboot code
- Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
similar treatment to VMX
- Update Xen's TSC info CPUID sub-leaves as appropriate
- Add support for Hyper-V's extended hypercalls, where "support" at this
point is just forwarding the hypercalls to userspace
- Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
MSR filters
- One-off fixes and cleanups
- Fix and cleanup the range-based TLB flushing code, used when KVM is
running on Hyper-V
- Add support for filtering PMU events using a mask. If userspace
wants to restrict heavily what events the guest can use, it can now
do so without needing an absurd number of filter entries
- Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
support is disabled
- Add PEBS support for Intel Sapphire Rapids
- Fix a mostly benign overflow bug in SEV's send|receive_update_data()
- Move several SVM-specific flags into vcpu_svm
x86 Intel:
- Handle NMI VM-Exits before leaving the noinstr region
- A few trivial cleanups in the VM-Enter flows
- Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
EPTP switching (or any other VM function) for L1
- Fix a crash when using eVMCS's enlighted MSR bitmaps
Generic:
- Clean up the hardware enable and initialization flow, which was
scattered around multiple arch-specific hooks. Instead, just
let the arch code call into generic code. Both x86 and ARM should
benefit from not having to fight common KVM code's notion of how
to do initialization.
- Account allocations in generic kvm_arch_alloc_vm()
- Fix a memory leak if coalesced MMIO unregistration fails
selftests:
- On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
the correct hypercall instruction instead of relying on KVM to patch
in VMMCALL
- Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
/WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
=goe1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Provide a virtual cache topology to the guest to avoid
inconsistencies with migration on heterogenous systems. Non secure
software has no practical need to traverse the caches by set/way in
the first place
- Add support for taking stage-2 access faults in parallel. This was
an accidental omission in the original parallel faults
implementation, but should provide a marginal improvement to
machines w/o FEAT_HAFDBS (such as hardware from the fruit company)
- A preamble to adding support for nested virtualization to KVM,
including vEL2 register state, rudimentary nested exception
handling and masking unsupported features for nested guests
- Fixes to the PSCI relay that avoid an unexpected host SVE trap when
resuming a CPU when running pKVM
- VGIC maintenance interrupt support for the AIC
- Improvements to the arch timer emulation, primarily aimed at
reducing the trap overhead of running nested
- Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
interest of CI systems
- Avoid VM-wide stop-the-world operations when a vCPU accesses its
own redistributor
- Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
exceptions in the host
- Aesthetic and comment/kerneldoc fixes
- Drop the vestiges of the old Columbia mailing list and add [Oliver]
as co-maintainer
RISC-V:
- Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
- Correctly place the guest in S-mode after redirecting a trap to the
guest
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
s390:
- Sort out confusion between virtual and physical addresses, which
currently are the same on s390
- A new ioctl that performs cmpxchg on guest memory
- A few fixes
x86:
- Change tdp_mmu to a read-only parameter
- Separate TDP and shadow MMU page fault paths
- Enable Hyper-V invariant TSC control
- Fix a variety of APICv and AVIC bugs, some of them real-world, some
of them affecting architecurally legal but unlikely to happen in
practice
- Mark APIC timer as expired if its in one-shot mode and the count
underflows while the vCPU task was being migrated
- Advertise support for Intel's new fast REP string features
- Fix a double-shootdown issue in the emergency reboot code
- Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
SVM similar treatment to VMX
- Update Xen's TSC info CPUID sub-leaves as appropriate
- Add support for Hyper-V's extended hypercalls, where "support" at
this point is just forwarding the hypercalls to userspace
- Clean up the kvm->lock vs. kvm->srcu sequences when updating the
PMU and MSR filters
- One-off fixes and cleanups
- Fix and cleanup the range-based TLB flushing code, used when KVM is
running on Hyper-V
- Add support for filtering PMU events using a mask. If userspace
wants to restrict heavily what events the guest can use, it can now
do so without needing an absurd number of filter entries
- Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
support is disabled
- Add PEBS support for Intel Sapphire Rapids
- Fix a mostly benign overflow bug in SEV's
send|receive_update_data()
- Move several SVM-specific flags into vcpu_svm
x86 Intel:
- Handle NMI VM-Exits before leaving the noinstr region
- A few trivial cleanups in the VM-Enter flows
- Stop enabling VMFUNC for L1 purely to document that KVM doesn't
support EPTP switching (or any other VM function) for L1
- Fix a crash when using eVMCS's enlighted MSR bitmaps
Generic:
- Clean up the hardware enable and initialization flow, which was
scattered around multiple arch-specific hooks. Instead, just let
the arch code call into generic code. Both x86 and ARM should
benefit from not having to fight common KVM code's notion of how to
do initialization
- Account allocations in generic kvm_arch_alloc_vm()
- Fix a memory leak if coalesced MMIO unregistration fails
selftests:
- On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
emit the correct hypercall instruction instead of relying on KVM to
patch in VMMCALL
- Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
KVM: SVM: hyper-v: placate modpost section mismatch error
KVM: x86/mmu: Make tdp_mmu_allowed static
KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
KVM: arm64: nv: Filter out unsupported features from ID regs
KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
KVM: arm64: nv: Handle SMCs taken from virtual EL2
KVM: arm64: nv: Handle trapped ERET from virtual EL2
KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
KVM: arm64: nv: Support virtual EL2 exceptions
KVM: arm64: nv: Handle HCR_EL2.NV system register traps
KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
KVM: arm64: nv: Add EL2 system registers to vcpu context
KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
KVM: arm64: nv: Introduce nested virtualization VCPU feature
KVM: arm64: Use the S2 MMU context to iterate over S2 table
...
modpost reports section mismatch errors/warnings:
WARNING: modpost: vmlinux.o: section mismatch in reference: svm_hv_hardware_setup (section: .text) -> (unknown) (section: .init.data)
WARNING: modpost: vmlinux.o: section mismatch in reference: svm_hv_hardware_setup (section: .text) -> (unknown) (section: .init.data)
WARNING: modpost: vmlinux.o: section mismatch in reference: svm_hv_hardware_setup (section: .text) -> (unknown) (section: .init.data)
This "(unknown) (section: .init.data)" all refer to svm_x86_ops.
Tag svm_hv_hardware_setup() with __init to fix a modpost warning as the
non-stub implementation accesses __initdata (svm_x86_ops), i.e. would
generate a use-after-free if svm_hv_hardware_setup() were actually invoked
post-init. The helper is only called from svm_hardware_setup(), which is
also __init, i.e. lack of __init is benign other than the modpost warning.
Fixes: 1e0c7d4075 ("KVM: SVM: hyper-v: Remote TLB flush for SVM")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineeth Pillai <viremana@linux.microsoft.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20230222073315.9081-1-rdunlap@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
where possible, when supporting a debug registers swap feature for
SEV-ES guests
- Add support for AMD's version of eIBRS called Automatic IBRS which is
a set-and-forget control of indirect branch restriction speculation
resources on privilege change
- Add support for a new x86 instruction - LKGS - Load kernel GS which is
part of the FRED infrastructure
- Reset SPEC_CTRL upon init to accomodate use cases like kexec which
rediscover
- Other smaller fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmP1RDIACgkQEsHwGGHe
VUohBw//ZB9ZRqsrKdm6D9YaP2x4Zb+kqKqo6rjYeWaYqyPyCwDujPwh+pb3Oq1t
aj62muDv1t/wEJc8mKNkfXkjEEtBVAOcpb5YIpKreoEvNKyevol83Ih0u5iJcTRE
E5qf8HDS8b/JZrcazJJLl6WQmQNH5RiKSu5bbCpRhoeOcyo5pRYR5MztK9vNmAQk
GMdwHsUSU+jN8uiE4HnpaOb/luhgFindRwZVTpdjJegQWLABS8cl3CKeTv4+PW45
isvv37XnQP248wsptIEVRHeG6g3g/HtvwRx7DikUw06QwUyUK7H9hJssOoSP8TL9
u4psRwfWnJ1OxU6klL+s0Ii+pjQ97wXmK/oqK7QkdUwhWqR/mQAW2e9kWHAngyDn
A6mKbzSM6HFAeSXQpB9cMb6uvYRD44SngDFe3WXtEK8jiiQ70ikUm4E28I5KJOPg
s+RyioHk0NFRHYSOOBqNG1NKz6ED7L3GbgbbzxkgMh21AAyI3X351t+PtGoLV5ew
eqOsM7lbg9Scg1LvPk1JcoALS8USWqgar397rz9qGUs+OkPWBtEBCmTdMz/Eb+2t
g/WHdLS5/ajSs5gNhT99W3DeqZMPDEkgBRSeyBBmY3CUD3gBL2wXEktRXv504zBR
RC4oyUPX3c9E2ib6GATLE3kBLbcz9hTWbMxF+X3lLJvTVd/Qc2o=
=v/ZC
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Cache the AMD debug registers in per-CPU variables to avoid MSR
writes where possible, when supporting a debug registers swap feature
for SEV-ES guests
- Add support for AMD's version of eIBRS called Automatic IBRS which is
a set-and-forget control of indirect branch restriction speculation
resources on privilege change
- Add support for a new x86 instruction - LKGS - Load kernel GS which
is part of the FRED infrastructure
- Reset SPEC_CTRL upon init to accomodate use cases like kexec which
rediscover
- Other smaller fixes and cleanups
* tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd: Cache debug register values in percpu variables
KVM: x86: Propagate the AMD Automatic IBRS feature to the guest
x86/cpu: Support AMD Automatic IBRS
x86/cpu, kvm: Add the SMM_CTL MSR not present feature
x86/cpu, kvm: Add the Null Selector Clears Base feature
x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf
x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature
KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code
x86/cpu, kvm: Add support for CPUID_80000021_EAX
x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h>
x86/gsseg: Use the LKGS instruction if available for load_gs_index()
x86/gsseg: Move load_gs_index() to its own new header file
x86/gsseg: Make asm_load_gs_index() take an u16
x86/opcode: Add the LKGS instruction to x86-opcode-map
x86/cpufeature: Add the CPU feature bit for LKGS
x86/bugs: Reset speculation control settings on init
x86/cpu: Remove redundant extern x86_read_arch_cap_msr()
- Fix a mostly benign overflow bug in SEV's send|receive_update_data()
- Move the SVM-specific "host flags" into vcpu_svm (extracted from the
vNMI enabling series)
- A handful for fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmPsH7kSHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5/dQQAJSVCYA7F7LcfJf+c1ULG0XCd8rHnXdR
EGTygTnWrzwTCRaunOBPE4AJRxKdkwTKy+yfnnQVRdYYfRe1SZKpUQ2XNEDGn0+v
zVfOmSFFcCWXmJeY8y5n1GlDH4ENO3G7nD1ncDQ0I9PazmOsmxChoVZ9afFJ5bpo
73hjcYVfUDYxGkeRLWSSSFtWIGguE8BkpRH3wZ8MGZi+ueoFUJPPBKHeDtxCV2/T
KcJLne8tQVTiWCdMO3EFwxgIvsjQDoT0gZYLYNHJ6KqD9Szc3jA9v2ryTm5IYlpb
akYUqePaD0SGrrfDBrwz3bLu3fehDu7eduXESRlzzb8S4xP7/qXeo9KeVN+b4MBb
nmBBFncvMWbC8Po5wB5OVAfAa7ACmGiXeBV8pfgGI6FTq1fpc4VNm2PevKkDvlqN
O2eZ1KuNkwBnbIPj3JVPPnsJcUjYXFjZyzfpMV1T+ExmL/IYceatX4S7zfgL5nUg
3qFi5mX2Cufk2EBvBu+Dkpt/H4lze+ysZRciMC+v7Q4LWAYZ8HW1a44pnBVUJMPM
bWiJ1/O8RIWM1tWIrlO38+ZZalbu3spIVMBXKzqEGXvpUwJ4UgZM1tFiWvISTVFe
2X6N3d7aT/DQ1PzZU6BsyVZWAFaodHBauMcr9FUkWqqGu3HOhqC4rSJZ9eRR7V5O
WSp1gTVY1JXy
=AVpx
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-svm-6.3' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.3:
- Fix a mostly benign overflow bug in SEV's send|receive_update_data()
- Move the SVM-specific "host flags" into vcpu_svm (extracted from the
vNMI enabling series)
- A handful for fixes and cleanups
KVM_SEV_SEND_UPDATE_DATA and KVM_SEV_RECEIVE_UPDATE_DATA have an integer
overflow issue. Params.guest_len and offset are both 32 bits wide, with a
large params.guest_len the check to confirm a page boundary is not
crossed can falsely pass:
/* Check if we are crossing the page boundary *
offset = params.guest_uaddr & (PAGE_SIZE - 1);
if ((params.guest_len + offset > PAGE_SIZE))
Add an additional check to confirm that params.guest_len itself is not
greater than PAGE_SIZE.
Note, this isn't a security concern as overflow can happen if and only if
params.guest_len is greater than 0xfffff000, and the FW spec says these
commands fail with lengths greater than 16KB, i.e. the PSP will detect
KVM's goof.
Fixes: 15fb7de1a7 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Fixes: d3d1af85e2 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
Reported-by: Andy Nguyen <theflow@google.com>
Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230207171354.4012821-1-pgonda@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Move HF_NMI_MASK and HF_IRET_MASK (a.k.a. "waiting for IRET") out of the
common "hflags" and into dedicated flags in "struct vcpu_svm". The flags
are used only for the SVM and thus should not be in hflags.
Tracking NMI masking in software isn't SVM specific, e.g. VMX has a
similar flag (soft_vnmi_blocked), but that's much more of a hack as VMX
can't intercept IRET, is useful only for ancient CPUs, i.e. will
hopefully be removed at some point, and again the exact behavior is
vendor specific and shouldn't ever be referenced in common code.
converting VMX
No functional change is intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20221129193717.513824-5-mlevitsk@redhat.com
[sean: split from HF_GIF_MASK patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Move HF_GIF_MASK out of the common "hflags" and into vcpu_svm.guest_gif.
GIF is an SVM-only concept and has should never be consulted outside of
SVM-specific code.
No functional change is intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20221129193717.513824-5-mlevitsk@redhat.com
[sean: split to separate patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Don't sync the TLB control field from vmcb02 to vmcs12 on nested VM-Exit.
Per AMD's APM, the field is not modified by hardware:
The VMRUN instruction reads, but does not change, the value of the
TLB_CONTROL field
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20221129193717.513824-2-mlevitsk@redhat.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add helpers to print unimplemented MSR accesses and condition all such
prints on report_ignored_msrs, i.e. honor userspace's request to not
print unimplemented MSRs. Even though vcpu_unimpl() is ratelimited,
printing can still be problematic, e.g. if a print gets stalled when host
userspace is writing MSRs during live migration, an effective stall can
result in very noticeable disruption in the guest.
E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU
counters while the PMU was disabled in KVM.
- 99.75% 0.00% [.] __ioctl
- __ioctl
- 99.74% entry_SYSCALL_64_after_hwframe
do_syscall_64
sys_ioctl
- do_vfs_ioctl
- 92.48% kvm_vcpu_ioctl
- kvm_arch_vcpu_ioctl
- 85.12% kvm_set_msr_ignored_check
svm_set_msr
kvm_set_msr_common
printk
vprintk_func
vprintk_default
vprintk_emit
console_unlock
call_console_drivers
univ8250_console_write
serial8250_console_write
uart_console_write
Reported-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230124234905.3774678-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Limit kvm_pmu_cap.num_counters_gp during kvm_init_pmu_capability() based
on the vendor PMU capabilities so that consuming num_counters_gp naturally
does the right thing. This fixes a mostly theoretical bug where KVM could
over-report its PMU support in KVM_GET_SUPPORTED_CPUID for leaf 0xA, e.g.
if the number of counters reported by perf is greater than KVM's
hardcoded internal limit. Incorporating input from the AMD PMU also
avoids over-reporting MSRs to save when running on AMD.
Link: https://lore.kernel.org/r/20230124234905.3774678-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add the AMD Automatic IBRS feature bit to those being propagated to the guest,
and enable the guest EFER bit.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-9-kim.phillips@amd.com
Account the temp/scratch allocation used to decrypt unaligned debug
accesses to SEV guest memory, the allocation is very much tied to the
target VM.
Reported-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Anish Ghulati <aghulati@google.com>
Link: https://lore.kernel.org/r/20230113220923.2834699-1-aghulati@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Even in commit 4bdec12aa8 ("KVM: SVM: Detect X2APIC virtualization
(x2AVIC) support"), where avic_hardware_setup() was first introduced,
its only pass-in parameter "struct kvm_x86_ops *ops" is not used at all.
Clean it up a bit to avoid compiler ranting from LLVM toolchain.
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221109115952.92816-1-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Return value from svm_nmi_blocked() directly instead of taking
this in another redundant variable.
Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn>
Link: https://lore.kernel.org/r/202211282003389362484@zte.com.cn
Signed-off-by: Sean Christopherson <seanjc@google.com>
When checking if a pmu event the guest is attempting to program should
be filtered, only consider the event select + unit mask in that
decision. Use an architecture specific mask to mask out all other bits,
including bits 35:32 on Intel. Those bits are not part of the event
select and should not be considered in that decision.
Fixes: 66bb8a065f ("KVM: x86: PMU Event Filter")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-2-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
The first half or so patches fix semi-urgent, real-world relevant APICv
and AVIC bugs.
The second half fixes a variety of AVIC and optimized APIC map bugs
where KVM doesn't play nice with various edge cases that are
architecturally legal(ish), but are unlikely to occur in most real world
scenarios
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ARM:
* Fix the PMCR_EL0 reset value after the PMU rework
* Correctly handle S2 fault triggered by a S1 page table walk
by not always classifying it as a write, as this breaks on
R/O memslots
* Document why we cannot exit with KVM_EXIT_MMIO when taking
a write fault from a S1 PTW on a R/O memslot
* Put the Apple M2 on the naughty list for not being able to
correctly implement the vgic SEIS feature, just like the M1
before it
* Reviewer updates: Alex is stepping down, replaced by Zenghui
x86:
* Fix various rare locking issues in Xen emulation and teach lockdep
to detect them
* Documentation improvements
* Do not return host topology information from KVM_GET_SUPPORTED_CPUID
Track the per-vendor required APICv inhibits with a variable instead of
calling into vendor code every time KVM wants to query the set of
required inhibits. The required inhibits are a property of the vendor's
virtualization architecture, i.e. are 100% static.
Using a variable allows the compiler to inline the check, e.g. generate
a single-uop TEST+Jcc, and thus eliminates any desire to avoid checking
inhibits for performance reasons.
No functional change intended.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-32-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Turns out that some warnings exist for good reasons. Restore the warning
in avic_vcpu_load() that guards against calling avic_vcpu_load() on a
running vCPU now that KVM avoids doing so when switching between x2APIC
and xAPIC. The entire point of the WARN is to highlight that KVM should
not be reloading an AVIC.
Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid
spamming the kernel if it does fire.
This reverts commit c0caeee65a.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-31-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop writes to APIC_RRR, a.k.a. Remote Read Data Register, on AVIC
unaccelerated write traps. The register is read-only and isn't emulated
by KVM. Sending the register through kvm_apic_write_nodecode() will
result in screaming when x2APIC is enabled due to the unexpected failure
to retrieve the MSR (KVM expects that only "legal" accesses will trap).
Fixes: 4d1d7942e3 ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230106011306.85230-30-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Iterate over all target logical IDs in the AVIC kick fastpath instead of
bailing if there is more than one target. Now that KVM inhibits AVIC if
vCPUs aren't mapped 1:1 with logical IDs, each bit in the destination is
guaranteed to match to at most one vCPU, i.e. iterating over the bitmap
is guaranteed to kick each valid target exactly once.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-29-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not modify AVIC's logical ID table if the logical ID portion of the
LDR is not a power-of-2, i.e. if the LDR has multiple bits set. Taking
only the first bit means that KVM will fail to match MDAs that intersect
with "higher" bits in the "ID"
The "ID" acts as a bitmap, but is referred to as an ID because there's an
implicit, unenforced "requirement" that software only set one bit. This
edge case is arguably out-of-spec behavior, but KVM cleanly handles it
in all other cases, e.g. the optimized logical map (and AVIC!) is also
disabled in this scenario.
Refactor the code to consolidate the checks, and so that the code looks
more like avic_kick_target_vcpus_fast().
Fixes: 18f40c53e1 ("svm: Add VMEXIT handlers for AVIC")
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-28-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Update SVM's cache of the LDR even if the new value is "bad". Leaving
stale information in the cache can result in KVM missing updates and/or
invalidating the wrong entry, e.g. if avic_invalidate_logical_id_entry()
is triggered after a different vCPU has "claimed" the old LDR.
Fixes: 18f40c53e1 ("svm: Add VMEXIT handlers for AVIC")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-27-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Update the vCPU's local (virtual) APIC on LDR writes even if the write
"fails". The APIC needs to recalc the optimized logical map even if the
LDR is invalid or zero, e.g. if the guest clears its LDR, the optimized
map will be left as is and the vCPU will receive interrupts using its
old LDR.
Fixes: 18f40c53e1 ("svm: Add VMEXIT handlers for AVIC")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-26-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Inhibit SVM's AVIC if multiple vCPUs are aliased to the same logical ID.
Architecturally, all CPUs whose logical ID matches the MDA are supposed
to receive the interrupt; overwriting existing entries in AVIC's
logical=>physical map can result in missed IPIs.
Fixes: 18f40c53e1 ("svm: Add VMEXIT handlers for AVIC")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-25-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Inhibit APICv/AVIC if the optimized physical map is disabled so that KVM
KVM provides consistent APIC behavior if xAPIC IDs are aliased due to
vcpu_id being truncated and the x2APIC hotplug hack isn't enabled. If
the hotplug hack is disabled, events that are emulated by KVM will follow
architectural behavior (all matching vCPUs receive events, even if the
"match" is due to truncation), whereas APICv and AVIC will deliver events
only to the first matching vCPU, i.e. the vCPU that matches without
truncation.
Note, the "extra" inhibit is needed because KVM deliberately ignores
mismatches due to truncation when applying the APIC_ID_MODIFIED inhibit
so that large VMs (>255 vCPUs) can run with APICv/AVIC.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-24-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a helper to perform the final kick, two instances of the ICR decoding
is one too many.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230106011306.85230-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Document that AVIC is inhibited if any vCPU's APIC ID diverges from its
vCPU ID, i.e. that there's no need to check for a destination match in
the AVIC kick fast path.
Opportunistically tweak comments to remove "guest bug", as that suggests
KVM is punting on error handling, which is not the case. Targeting a
non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether
or not it's a guest bug is irrelevant. Such behavior is architecturally
legal and thus needs to faithfully emulated by KVM (and it is).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230106011306.85230-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Due to a likely mismerge of patches, KVM ended up with a superfluous
commit to "enable" AVIC's fast path for x2AVIC mode. Even worse, the
superfluous commit has several bugs and creates a nasty local shadow
variable.
Rather than fix the bugs piece-by-piece[*] to achieve the same end
result, revert the patch wholesale.
Opportunistically add a comment documenting the x2AVIC dependencies.
This reverts commit 8c9e639da4.
[*] https://lore.kernel.org/all/YxEP7ZBRIuFWhnYJ@google.com
Fixes: 8c9e639da4 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible")
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For X2APIC ID in cluster mode, the logical ID is bit [15:0].
Fixes: 603ccef42c ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Compute the destination from ICRH using the sender's x2APIC status, not
each (potential) target's x2APIC status.
Fixes: c514d3a348 ("KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC ID")
Cc: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230106011306.85230-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace the "avic_mode" enum with a single bool to track whether or not
x2AVIC is enabled. KVM already has "apicv_enabled" that tracks if any
flavor of AVIC is enabled, i.e. AVIC_MODE_NONE and AVIC_MODE_X1 are
redundant and unnecessary noise.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230106011306.85230-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Free the APIC access page memslot if any vCPU enables x2APIC and SVM's
AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with
x2APIC enabled. On AMD, if its "hybrid" mode is enabled (AVIC is enabled
when x2APIC is enabled even without x2AVIC support), keeping the APIC
access page memslot results in the guest being able to access the virtual
APIC page as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
that the guest is operating in x2APIC mode.
Exempt nested SVM's update of APICv state from the new logic as x2APIC
can't be toggled on VM-Exit. In practice, invoking the x2APIC logic
should be harmless precisely because it should be a glorified nop, but
play it safe to avoid latent bugs, e.g. with dropping the vCPU's SRCU
lock.
Intel doesn't suffer from the same issue as APICv has fully independent
VMCS controls for xAPIC vs. x2APIC virtualization. Technically, KVM
should provide bus error semantics and not memory semantics for the APIC
page when x2APIC is enabled, but KVM already provides memory semantics in
other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware
disabled (via APIC_BASE MSR).
Note, checking apic_access_memslot_enabled without taking locks relies
it being set during vCPU creation (before kvm_vcpu_reset()). vCPUs can
race to set the inhibit and delete the memslot, i.e. can get false
positives, but can't get false negatives as apic_access_memslot_enabled
can't be toggled "on" once any vCPU reaches KVM_RUN.
Opportunistically drop the "can" while updating avic_activate_vmcb()'s
comment, i.e. to state that KVM _does_ support the hybrid mode. Move
the "Note:" down a line to conform to preferred kernel/KVM multi-line
comment style.
Opportunistically update the apicv_update_lock comment, as it isn't
actually used to protect apic_access_memslot_enabled (which is protected
by slots_lock).
Fixes: 0e311d33bf ("KVM: SVM: Introduce hybrid-AVIC mode")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230106011306.85230-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the APIC access page allocation helper function to common x86 code,
the allocation routine is virtually identical between APICv (VMX) and
AVIC (SVM). Keep APICv's gfn_to_page() + put_page() sequence, which
verifies that a backing page can be allocated, i.e. that the system isn't
under heavy memory pressure. Forcing the backing page to be populated
isn't strictly necessary, but skipping the effective prefetch only delays
the inevitable.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into
avic_set_virtual_apic_mode() and invert the dependency being said
functions to avoid calling avic_vcpu_{load,put}() and
avic_set_pi_irte_mode() when "only" setting the virtual APIC mode.
avic_set_virtual_apic_mode() is invoked from common x86 with preemption
enabled, which makes avic_vcpu_{load,put}() unhappy. Luckily, calling
those and updating IRTE stuff is unnecessary as the only reason
avic_set_virtual_apic_mode() is called is to handle transitions between
xAPIC and x2APIC that don't also toggle APICv activation. And if
activation doesn't change, there's no need to fiddle with the physical
APIC ID table or update IRTE.
The "full" refresh is guaranteed to be called if activation changes in
this case as the only call to the "set" path is:
kvm_vcpu_update_apicv(vcpu);
static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
and kvm_vcpu_update_apicv() invokes the refresh if activation changes:
if (apic->apicv_active == activate)
goto out;
apic->apicv_active = activate;
kvm_apic_update_apicv(vcpu);
static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
Rename the helper to reflect that it is also called during "refresh".
WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd]
CPU: 183 PID: 49186 Comm: stable Tainted: G O 6.0.0-smp--fcddbca45f0a-sink #34
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd]
avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd]
avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd]
kvm_lapic_set_base+0x149/0x1a0 [kvm]
kvm_set_apic_base+0x8f/0xd0 [kvm]
kvm_set_msr_common+0xa3a/0xdc0 [kvm]
svm_set_msr+0x364/0x6b0 [kvm_amd]
__kvm_set_msr+0xb8/0x1c0 [kvm]
kvm_emulate_wrmsr+0x58/0x1d0 [kvm]
msr_interception+0x1c/0x30 [kvm_amd]
svm_invoke_exit_handler+0x31/0x100 [kvm_amd]
svm_handle_exit+0xfc/0x160 [kvm_amd]
vcpu_enter_guest+0x21bb/0x23e0 [kvm]
vcpu_run+0x92/0x450 [kvm]
kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm]
kvm_vcpu_ioctl+0x559/0x620 [kvm]
Fixes: 05c4fe8c1b ("KVM: SVM: Refresh AVIC configuration when changing APIC mode")
Cc: stable@vger.kernel.org
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Emulate ICR writes on AVIC IPI failures due to invalid targets using the
same logic as failures due to invalid types. AVIC acceleration fails if
_any_ of the targets are invalid, and crucially VM-Exits before sending
IPIs to targets that _are_ valid. In logical mode, the destination is a
bitmap, i.e. a single IPI can target multiple logical IDs. Doing nothing
causes KVM to drop IPIs if at least one target is valid and at least one
target is invalid.
Fixes: 18f40c53e1 ("svm: Add VMEXIT handlers for AVIC")
Cc: stable@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Flush the TLB when activating AVIC as the CPU can insert into the TLB
while AVIC is "locally" disabled. KVM doesn't treat "APIC hardware
disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC
hardware disabled, AVIC is not guaranteed to be inhibited. As a result,
KVM may create a valid NPT mapping for the APIC base, which the CPU can
cache as a non-AVIC translation.
Note, Intel handles this in vmx_set_virtual_apic_mode().
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230106011306.85230-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The mysterious comment "We only want the cr8 intercept bits of L1"
dates back to basically the introduction of nested SVM, back when
the handling of "less typical" hypervisors was very haphazard.
With the development of kvm-unit-tests for interrupt handling,
the same code grew another vmcb_clr_intercept for the interrupt
window (VINTR) vmexit, this time with a comment that is at least
decent.
It turns out however that the same comment applies to the CR8 write
intercept, which is also a "recheck if an interrupt should be
injected" intercept. The CR8 read intercept instead has not
been used by KVM for 14 years (commit 649d68643e, "KVM: SVM:
sync TPR value to V_TPR field in the VMCB"), so do not bother
clearing it and let one comment describe both CR8 write and VINTR
handling.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do compatibility checks when enabling hardware to effectively add
compatibility checks when onlining a CPU. Abort enabling, i.e. the
online process, if the (hotplugged) CPU is incompatible with the known
good setup.
At init time, KVM does compatibility checks to ensure that all online
CPUs support hardware virtualization and a common set of features. But
KVM uses hotplugged CPUs without such compatibility checks. On Intel
CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX, or
VM-Entry failure if the hotplugged CPU doesn't support all features
enabled by KVM.
Note, this is little more than a NOP on SVM, as SVM already checks for
full SVM support during hardware enabling.
Opportunistically add a pr_err() if setup_vmcs_config() fails, and
tweak all error messages to output which CPU failed.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20221130230934.1014142-41-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the .check_processor_compatibility() callback from kvm_x86_init_ops
to kvm_x86_ops to allow a future patch to do compatibility checks during
CPU hotplug.
Do kvm_ops_update() before compat checks so that static_call() can be
used during compat checks.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20221130230934.1014142-40-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check that SVM is supported and enabled in the processor compatibility
checks. SVM already checks for support during hardware enabling,
i.e. this doesn't really add new functionality. The net effect is that
KVM will refuse to load if a CPU doesn't have SVM fully enabled, as
opposed to failing KVM_CREATE_VM.
Opportunistically move svm_check_processor_compat() up in svm.c so that
it can be invoked during hardware enabling in a future patch.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-39-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do basic VMX/SVM support checks directly in vendor code instead of
implementing them via kvm_x86_ops hooks. Beyond the superficial benefit
of providing common messages, which isn't even clearly a net positive
since vendor code can provide more precise/detailed messages, there's
zero advantage to bouncing through common x86 code.
Consolidating the checks will also simplify performing the checks
across all CPUs (in a future patch).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-37-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code. In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems.
Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.
Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.
Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20221130230934.1014142-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use KBUILD_MODNAME to specify the vendor module name instead of manually
writing out the name to make it a bit more obvious that the name isn't
completely arbitrary. A future patch will also use KBUILD_MODNAME to
define pr_fmt, at which point using KBUILD_MODNAME for kvm_x86_ops.name
further reinforces the intended usage of kvm_x86_ops.name.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-34-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop kvm_arch_check_processor_compat() and its support code now that all
architecture implementations are nops.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390
Acked-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20221130230934.1014142-33-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the CPU compatibility checks to pure x86 code, i.e. drop x86's use
of the common kvm_x86_check_cpu_compat() arch hook. x86 is the only
architecture that "needs" to do per-CPU compatibility checks, moving
the logic to x86 will allow dropping the common code, and will also
give x86 more control over when/how the compatibility checks are
performed, e.g. TDX will need to enable hardware (do VMXON) in order to
perform compatibility checks.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20221130230934.1014142-32-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the guts of kvm_arch_init() to a new helper, kvm_x86_vendor_init(),
so that VMX can do _all_ arch and vendor initialization before calling
kvm_init(). Calling kvm_init() must be the _very_ last step during init,
as kvm_init() exposes /dev/kvm to userspace, i.e. allows creating VMs.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>