Move the enter/exit logic in {svm,vmx}_vcpu_enter_exit() to common
helpers. Opportunistically update the somewhat stale comment about the
updates needing to occur immediately after VM-Exit.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210505002735.1684165-9-seanjc@google.com
Defer the call to account guest time until after servicing any IRQ(s)
that happened in the guest or immediately after VM-Exit. Tick-based
accounting of vCPU time relies on PF_VCPU being set when the tick IRQ
handler runs, and IRQs are blocked throughout the main sequence of
vcpu_enter_guest(), including the call into vendor code to actually
enter and exit the guest.
This fixes a bug where reported guest time remains '0', even when
running an infinite loop in the guest:
https://bugzilla.kernel.org/show_bug.cgi?id=209831
Fixes: 87fa7f3e98 ("x86/kvm: Move context tracking where it belongs")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation,
zap under read lock, enable/disably dirty page logging under
read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing
the architecture-specific code
- Some selftests improvements
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK
y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b
c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+
Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY
+2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b
M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ==
=AXUi
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This is a large update by KVM standards, including AMD PSP (Platform
Security Processor, aka "AMD Secure Technology") and ARM CoreSight
(debug and trace) changes.
ARM:
- CoreSight: Add support for ETE and TRBE
- Stage-2 isolation for the host kernel when running in protected
mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- AMD PSP driver changes
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation, zap under
read lock, enable/disably dirty page logging under read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing the
architecture-specific code
- a handful of "Get rid of oprofile leftovers" patches
- Some selftests improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
KVM: selftests: Speed up set_memory_region_test
selftests: kvm: Fix the check of return value
KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
KVM: SVM: Skip SEV cache flush if no ASIDs have been used
KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
KVM: SVM: Drop redundant svm_sev_enabled() helper
KVM: SVM: Move SEV VMCB tracking allocation to sev.c
KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
KVM: SVM: Unconditionally invoke sev_hardware_teardown()
KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
KVM: SVM: Move SEV module params/variables to sev.c
KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
KVM: SVM: Zero out the VMCB array used to track SEV ASID association
x86/sev: Drop redundant and potentially misleading 'sev_enabled'
KVM: x86: Move reverse CPUID helpers to separate header file
KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
...
Pull cgroup changes from Tejun Heo:
"The only notable change is Vipin's new misc cgroup controller.
This implements generic support for resources which can be controlled
by simply counting and limiting the number of resource instances - ie
there's X number of these on the system and this cgroup subtree can
have upto Y of those.
The first user is the address space IDs used for virtual machine
memory encryption and expected future usages are similar - niche
hardware features with concrete resource limits and simple usage
models"
* 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: use tsk->in_iowait instead of delayacct_is_task_waiting_on_io()
cgroup/cpuset: fix typos in comments
cgroup: misc: mark dummy misc_cg_res_total_usage() static inline
svm/sev: Register SEV and SEV-ES ASIDs to the misc controller
cgroup: Miscellaneous cgroup documentation.
cgroup: Add misc cgroup controller
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGmYIACgkQEsHwGGHe
VUr45w/8CSXr7MXaFBj4To0hTWJXSZyF6YGqlZOSJXFcFh4cWTNwfVOoFaV47aDo
+HsCNTkGENcKhLrDUWDRiG/Uo46jxtOtl1vhq7U4pGemSYH871XWOKfb5k5XNMwn
/uhaHMI4aEfd6bUFnF518NeyRIsD0BdqFj4tB7RbAiyFwdETDX9Tkj/uBKnQ4zon
4tEDoXgThuK5YKK9zVQg5pa7aFp2zg1CAdX/WzBkS8BHVBPXSV0CF97AJYQOM/V+
lUHv+BN3wp97GYHPQMPsbkNr8IuFoe2mIvikwjxg8iOFpzEU1G1u09XV9R+PXByX
LclFTRqK/2uU5hJlcsBiKfUuidyErYMRYImbMAOREt2w0ogWVu2zQ7HkjVve25h1
sQPwPudbAt6STbqRxvpmB3yoV4TCYwnF91FcWgEy+rcEK2BDsHCnScA45TsK5I1C
kGR1K17pHXprgMZFPveH+LgxewB6smDv+HllxQdSG67LhMJXcs2Epz0TsN8VsXw8
dlD3lGReK+5qy9FTgO7mY0xhiXGz1IbEdAPU4eRBgih13puu03+jqgMaMabvBWKD
wax+BWJUrPtetwD5fBPhlS/XdJDnd8Mkv2xsf//+wT0s4p+g++l1APYxeB8QEehm
Pd7Mvxm4GvQkfE13QEVIPYQRIXCMH/e9qixtY5SHUZDBVkUyFM0=
=bO1i
-----END PGP SIGNATURE-----
Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 cleanups from Borislav Petkov:
"Trivial cleanups and fixes all over the place"
* tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Remove me from IDE/ATAPI section
x86/pat: Do not compile stubbed functions when X86_PAT is off
x86/asm: Ensure asm/proto.h can be included stand-alone
x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files
x86/msr: Make locally used functions static
x86/cacheinfo: Remove unneeded dead-store initialization
x86/process/64: Move cpu_current_top_of_stack out of TSS
tools/turbostat: Unmark non-kernel-doc comment
x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
x86/fpu/math-emu: Fix function cast warning
x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
x86: Fix various typos in comments, take #2
x86: Remove unusual Unicode characters from comments
x86/kaslr: Return boolean values from a function returning bool
x86: Fix various typos in comments
x86/setup: Remove unused RESERVE_BRK_ARRAY()
stacktrace: Move documentation for arch_stack_walk_reliable() to header
x86: Remove duplicate TSC DEADLINE MSR definitions
Skip SEV's expensive WBINVD and DF_FLUSH if there are no SEV ASIDs
waiting to be reclaimed, e.g. if SEV was never used. This "fixes" an
issue where the DF_FLUSH fails during hardware teardown if the original
SEV_INIT failed. Ideally, SEV wouldn't be marked as enabled in KVM if
SEV_INIT fails, but that's a problem for another day.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the forward declaration of sev_flush_asids(), which is only a few
lines above the function itself.
No functional change intended.
Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace calls to svm_sev_enabled() with direct checks on sev_enabled, or
in the case of svm_mem_enc_op, simply drop the call to svm_sev_enabled().
This effectively replaces checks against a valid max_sev_asid with checks
against sev_enabled. sev_enabled is forced off by sev_hardware_setup()
if max_sev_asid is invalid, all call sites are guaranteed to run after
sev_hardware_setup(), and all of the checks care about SEV being fully
enabled (as opposed to intentionally handling the scenario where
max_sev_asid is valid but SEV enabling fails due to OOM).
Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the allocation of the SEV VMCB array to sev.c to help pave the way
toward encapsulating SEV enabling wholly within sev.c.
No functional change intended.
Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Query max_sev_asid directly after setting it instead of bouncing through
its wrapper, svm_sev_enabled(). Using the wrapper is unnecessary
obfuscation.
No functional change intended.
Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the redundant svm_sev_enabled() check when calling
sev_hardware_teardown(), the teardown helper itself does the check.
Removing the check from svm.c will eventually allow dropping
svm_sev_enabled() entirely.
No functional change intended.
Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable the 'sev' and 'sev_es' module params by default instead of having
them conditioned on CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT. The extra
Kconfig is pointless as KVM SEV/SEV-ES support is already controlled via
CONFIG_KVM_AMD_SEV, and CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT has the
unfortunate side effect of enabling all the SEV-ES _guest_ code due to
it being dependent on CONFIG_AMD_MEM_ENCRYPT=y.
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Define sev_enabled and sev_es_enabled as 'false' and explicitly #ifdef
out all of sev_hardware_setup() if CONFIG_KVM_AMD_SEV=n. This kills
three birds at once:
- Makes sev_enabled and sev_es_enabled off by default if
CONFIG_KVM_AMD_SEV=n. Previously, they could be on by default if
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y, regardless of KVM SEV
support.
- Hides the sev and sev_es modules params when CONFIG_KVM_AMD_SEV=n.
- Resolves a false positive -Wnonnull in __sev_recycle_asids() that is
currently masked by the equivalent IS_ENABLED(CONFIG_KVM_AMD_SEV)
check in svm_sev_enabled(), which will be dropped in a future patch.
Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename sev and sev_es to sev_enabled and sev_es_enabled respectively to
better align with other KVM terminology, and to avoid pseudo-shadowing
when the variables are moved to sev.c in a future patch ('sev' is often
used for local struct kvm_sev_info pointers.
No functional change intended.
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a reverse-CPUID entry for the memory encryption word, 0x8000001F.EAX,
and use it to override the supported CPUID flags reported to userspace.
Masking the reported CPUID flags avoids over-reporting KVM support, e.g.
without the mask a SEV-SNP capable CPU may incorrectly advertise SNP
support to userspace.
Clear SEV/SEV-ES if their corresponding module parameters are disabled,
and clear the memory encryption leaf completely if SEV is not fully
supported in KVM. Advertise SME_COHERENT in addition to SEV and SEV-ES,
as the guest can use SME_COHERENT to avoid CLFLUSH operations.
Explicitly omit SME and VM_PAGE_FLUSH from the reporting. These features
are used by KVM, but are not exposed to the guest, e.g. guest access to
related MSRs will fault.
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Unconditionally invoke sev_hardware_setup() when configuring SVM and
handle clearing the module params/variable 'sev' and 'sev_es' in
sev_hardware_setup(). This allows making said variables static within
sev.c and reduces the odds of a collision with guest code, e.g. the guest
side of things has already laid claim to 'sev_enabled'.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Disable SEV and SEV-ES if NPT is disabled. While the APM doesn't clearly
state that NPT is mandatory, it's alluded to by:
The guest page tables, managed by the guest, may mark data memory pages
as either private or shared, thus allowing selected pages to be shared
outside the guest.
And practically speaking, shadow paging can't work since KVM can't read
the guest's page tables.
Fixes: e9df094289 ("KVM: SVM: Add sev module_param")
Cc: Brijesh Singh <brijesh.singh@amd.com
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Free sev_asid_bitmap if the reclaim bitmap allocation fails, othwerise
KVM will unnecessarily keep the bitmap when SEV is not fully enabled.
Freeing the page is also necessary to avoid introducing a bug when a
future patch eliminates svm_sev_enabled() in favor of using the global
'sev' flag directly. While sev_hardware_enabled() checks max_sev_asid,
which is true even if KVM setup fails, 'sev' will be true if and only
if KVM setup fully succeeds.
Fixes: 33af3a7ef9 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Zero out the array of VMCB pointers so that pre_sev_run() won't see
garbage when querying the array to detect when an SEV ASID is being
associated with a new VMCB. In practice, reading random values is all
but guaranteed to be benign as a false negative (which is extremely
unlikely on its own) can only happen on CPU0 on the first VMRUN and would
only cause KVM to skip the ASID flush. For anything bad to happen, a
previous instance of KVM would have to exit without flushing the ASID,
_and_ KVM would have to not flush the ASID at any time while building the
new SEV guest.
Cc: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Fixes: 70cd94e60c ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Append raw to the direct variants of kvm_register_read/write(), and
drop the "l" from the mode-aware variants. I.e. make the mode-aware
variants the default, and make the direct variants scary sounding so as
to discourage use. Accessing the full 64-bit values irrespective of
mode is rarely the desired behavior.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong. The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call
out that the emulation is wrong.
Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.
Per the APM:
The portion of rAX used to form the address is determined by the
effective address size (current execution mode and optional address
size prefix). The ASID is taken from ECX.
Fixes: ff092385e8 ("KVM: SVM: Implement INVLPGA")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
in 64-bit mode. The APM states bits 63:32 are dropped for both DRs and
CRs:
In 64-bit mode, the operand size is fixed at 64 bits without the need
for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
bits and the upper 32 bits of the destination are forced to 0.
Fixes: 7ff76d58a9 ("KVM: SVM: enhance MOV CR intercept handler")
Fixes: cae3797a46 ("KVM: SVM: enhance mov DR intercept handler")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use KVM's "user return MSRs" framework to defer restoring the host's
MSR_TSC_AUX until the CPU returns to userspace. Add/improve comments to
clarify why MSR_TSC_AUX is intercepted on both RDMSR and WRMSR, and why
it's safe for KVM to keep the guest's value loaded even if KVM is
scheduled out.
Cc: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210423223404.3860547-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Force clear bits 63:32 of MSR_TSC_AUX on write to emulate current AMD
CPUs, which completely ignore the upper 32 bits, including dropping them
on write. Emulating AMD hardware will also allow migrating a vCPU from
AMD hardware to Intel hardware without requiring userspace to manually
clear the upper bits, which are reserved on Intel hardware.
Presumably, MSR_TSC_AUX[63:32] are intended to be reserved on AMD, but
sadly the APM doesn't say _anything_ about those bits in the context of
MSR access. The RDTSCP entry simply states that RCX contains bits 31:0
of the MSR, zero extended. And even worse is that the RDPID description
implies that it can consume all 64 bits of the MSR:
RDPID reads the value of TSC_AUX MSR used by the RDTSCP instruction
into the specified destination register. Normal operand size prefixes
do not apply and the update is either 32 bit or 64 bit based on the
current mode.
Emulate current hardware behavior to give KVM the best odds of playing
nice with whatever the behavior of future AMD CPUs happens to be.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210423223404.3860547-3-seanjc@google.com>
[Fix broken patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
the guest's CPUID model.
Fixes: 46896c73c1 ("KVM: svm: add support for RDTSCP")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210423223404.3860547-2-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the local stack to "allocate" the structures used to communicate with
the PSP. The largest struct used by KVM, sev_data_launch_secret, clocks
in at 52 bytes, well within the realm of reasonable stack usage. The
smallest structs are a mere 4 bytes, i.e. the pointer for the allocation
is larger than the allocation itself.
Now that the PSP driver plays nice with vmalloc pointers, putting the
data on a virtually mapped stack (CONFIG_VMAP_STACK=y) will not cause
explosions.
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406224952.4177376-9-seanjc@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
[Apply same treatment to PSP migration commands. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The command finalize the guest receiving process and make the SEV guest
ready for the execution.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <d08914dc259644de94e29b51c3b68a13286fc5a3.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The command is used for copying the incoming buffer into the
SEV guest memory space.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <c5d0e3e719db7bb37ea85d79ed4db52e9da06257.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The command is used to create the encryption context for an incoming
SEV guest. The encryption context can be later used by the hypervisor
to import the incoming data into the SEV guest memory space.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <c7400111ed7458eee01007c4d8d57cdf2cbb0fc2.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After completion of SEND_START, but before SEND_FINISH, the source VMM can
issue the SEND_CANCEL command to stop a migration. This is necessary so
that a cancelled migration can restart with a new target later.
Reviewed-by: Nathan Tempelman <natet@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Steve Rutherford <srutherford@google.com>
Message-Id: <20210412194408.2458827-1-srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The command is used for encrypting the guest memory region using the encryption
context created with KVM_SEV_SEND_START.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by : Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <d6a6ea740b0c668b30905ae31eac5ad7da048bb3.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a capability for userspace to mirror SEV encryption context from
one vm to another. On our side, this is intended to support a
Migration Helper vCPU, but it can also be used generically to support
other in-guest workloads scheduled by the host. The intention is for
the primary guest and the mirror to have nearly identical memslots.
The primary benefits of this are that:
1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
can't accidentally clobber each other.
2) The VMs can have different memory-views, which is necessary for post-copy
migration (the migration vCPUs on the target need to read and write to
pages, when the primary guest would VMEXIT).
This does not change the threat model for AMD SEV. Any memory involved
is still owned by the primary guest and its initial state is still
attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
to circumvent SEV, they could achieve the same effect by simply attaching
a vCPU to the primary VM.
This patch deliberately leaves userspace in charge of the memslots for the
mirror, as it already has the power to mess with them in the primary guest.
This patch does not support SEV-ES (much less SNP), as it does not
handle handing off attested VMSAs to the mirror.
For additional context, we need a Migration Helper because SEV PSP
migration is far too slow for our live migration on its own. Using
an in-guest migrator lets us speed this up significantly.
Signed-off-by: Nathan Tempelman <natet@google.com>
Message-Id: <20210408223214.2582277-1-natet@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to section "Canonicalization and Consistency Checks" in APM vol 2,
the following guest state is illegal:
"The MSR or IOIO intercept tables extend to a physical address that
is greater than or equal to the maximum supported physical address."
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20210412215611.110095-5-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Define the actual size of the IOPM and MSRPM tables so that the actual size
can be used when initializing them and when checking the consistency of their
physical address.
These #defines are placed in svm.h so that they can be shared.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20210412215611.110095-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly document why a vmcb must be marked dirty and assigned a new
asid when it will be run on a different cpu. The "what" is relatively
obvious, whereas the "why" requires reading the APM and/or KVM code.
Opportunistically remove a spurious period and several unnecessary
newlines in the comment.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406171811.4043363-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a comment above the declaration of vcpu_svm.vmcb to call out that it
is simply a shorthand for current_vmcb->ptr. The myriad accesses to
svm->vmcb are quite confusing without this crucial detail.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406171811.4043363-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove vmcb_pa from vcpu_svm and simply read current_vmcb->pa directly in
the one path where it is consumed. Unlike svm->vmcb, use of the current
vmcb's address is very limited, as evidenced by the fact that its use
can be trimmed to a single dereference.
Opportunistically add a comment about using vmcb01 for VMLOAD/VMSAVE, at
first glance using vmcb01 instead of vmcb_pa looks wrong.
No functional change intended.
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406171811.4043363-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not update the new vmcb's last-run cpu when switching to a different
vmcb. If the vCPU is migrated between its last run and a vmcb switch,
e.g. for nested VM-Exit, then setting the cpu without marking the vmcb
dirty will lead to KVM running the vCPU on a different physical cpu with
stale clean bit settings.
vcpu->cpu current_vmcb->cpu hardware
pre_svm_run() cpu0 cpu0 cpu0,clean
kvm_arch_vcpu_load() cpu1 cpu0 cpu0,clean
svm_switch_vmcb() cpu1 cpu1 cpu0,clean
pre_svm_run() cpu1 cpu1 kaboom
Simply delete the offending code; unlike VMX, which needs to update the
cpu at switch time due to the need to do VMPTRLD, SVM only cares about
which cpu last ran the vCPU.
Fixes: af18fa775d ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
Cc: Cathy Avery <cavery@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406171811.4043363-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Access to the GHCB is mainly in the VMGEXIT path and it is known that the
GHCB will be mapped. But there are two paths where it is possible the GHCB
might not be mapped.
The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
However, if a SIPI is performed without a corresponding AP Reset Hold,
then the GHCB might not be mapped (depending on the previous VMEXIT),
which will result in a NULL pointer dereference.
The svm_complete_emulated_msr() routine will update the GHCB to inform
the caller of a RDMSR/WRMSR operation about any errors. While it is likely
that the GHCB will be mapped in this situation, add a safe guard
in this path to be certain a NULL pointer dereference is not encountered.
Fixes: f1c6366e30 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Fixes: 647daca25d ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Injected interrupts/nmi should not block a pending exception,
but rather be either lost if nested hypervisor doesn't
intercept the pending exception (as in stock x86), or be delivered
in exitintinfo/IDT_VECTORING_INFO field, as a part of a VMexit
that corresponds to the pending exception.
The only reason for an exception to be blocked is when nested run
is pending (and that can't really happen currently
but still worth checking for).
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While KVM's MMU should be fully reset by loading of nested CR0/CR3/CR4
by KVM_SET_SREGS, we are not in nested mode yet when we do it and therefore
only root_mmu is reset.
On regular nested entries we call nested_svm_load_cr3 which both updates
the guest's CR3 in the MMU when it is needed, and it also initializes
the mmu again which makes it initialize the walk_mmu as well when nested
paging is enabled in both host and guest.
Since we don't call nested_svm_load_cr3 on nested state load,
the walk_mmu can be left uninitialized, which can lead to a NULL pointer
dereference while accessing it if we happen to get a nested page fault
right after entering the nested guest first time after the migration and
we decide to emulate it, which leads to the emulator trying to access
walk_mmu->gva_to_gpa which is NULL.
Therefore we should call this function on nested state load as well.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently to support Intel->AMD migration, if CPU vendor is GenuineIntel,
we emulate the full 64 value for MSR_IA32_SYSENTER_{EIP|ESP}
msrs, and we also emulate the sysenter/sysexit instruction in long mode.
(Emulator does still refuse to emulate sysenter in 64 bit mode, on the
ground that the code for that wasn't tested and likely has no users)
However when virtual vmload/vmsave is enabled, the vmload instruction will
update these 32 bit msrs without triggering their msr intercept,
which will lead to having stale values in kvm's shadow copy of these msrs,
which relies on the intercept to be up to date.
Fix/optimize this by doing the following:
1. Enable the MSR intercepts for SYSENTER MSRs iff vendor=GenuineIntel
(This is both a tiny optimization and also ensures that in case
the guest cpu vendor is AMD, the msrs will be 32 bit wide as
AMD defined).
2. Store only high 32 bit part of these msrs on interception and combine
it with hardware msr value on intercepted read/writes
iff vendor=GenuineIntel.
3. Disable vmload/vmsave virtualization if vendor=GenuineIntel.
(It is somewhat insane to set vendor=GenuineIntel and still enable
SVM for the guest but well whatever).
Then zero the high 32 bit parts when kvm intercepts and emulates vmload.
Thanks a lot to Paulo Bonzini for helping me with fixing this in the most
correct way.
This patch fixes nested migration of 32 bit nested guests, that was
broken because incorrect cached values of SYSENTER msrs were stored in
the migration stream if L1 changed these msrs with
vmload prior to L2 entry.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401111928.996871-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are
clearly associated with a single task/VM.
Note, there are a several SEV allocations that aren't accounted, but
those can (hopefully) be fixed by using the local stack for memory.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331023025.2485960-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one
or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES
prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV
enabled, and svm_create_vcpu() needs to allocate the VMSA. At best,
creating vCPUs before SEV/SEV-ES init will lead to unexpected errors
and/or behavior, and at worst it will crash the host, e.g.
sev_launch_update_vmsa() will dereference a null svm->vmsa pointer.
Fixes: 1654efcbc4 ("KVM: SVM: Add KVM_SEV_INIT command")
Fixes: ad73109ae7 ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Set sev->es_active only after the guts of KVM_SEV_ES_INIT succeeds. If
the command fails, e.g. because SEV is already active or there are no
available ASIDs, then es_active will be left set even though the VM is
not fully SEV-ES capable.
Refactor the code so that "es_active" is passed on the stack instead of
being prematurely shoved into sev_info, both to avoid having to unwind
sev_info and so that it's more obvious what actually consumes es_active
in sev_guest_init() and its helpers.
Fixes: ad73109ae7 ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the kvm_for_each_vcpu() helper to iterate over vCPUs when encrypting
VMSAs for SEV, which effectively switches to use online_vcpus instead of
created_vcpus. This fixes a possible null-pointer dereference as
created_vcpus does not guarantee a vCPU exists, since it is updated at
the very beginning of KVM_CREATE_VCPU. created_vcpus exists to allow the
bulk of vCPU creation to run in parallel, while still correctly
restricting the max number of max vCPUs.
Fixes: ad73109ae7 ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to APM, the #DB intercept for a single-stepped VMRUN must happen
after the completion of that instruction, when the guest does #VMEXIT to
the host. However, in the current implementation of KVM, the #DB intercept
for a single-stepped VMRUN happens after the completion of the instruction
that follows the VMRUN instruction. When the #DB intercept handler is
invoked, it shows the RIP of the instruction that follows VMRUN, instead of
of VMRUN itself. This is an incorrect RIP as far as single-stepping VMRUN
is concerned.
This patch fixes the problem by checking, in nested_svm_vmexit(), for the
condition that the VMRUN instruction is being single-stepped and if so,
queues the pending #DB intercept so that the #DB is accounted for before
we execute L1's next instruction.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oraacle.com>
Message-Id: <20210323175006.73249-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>