linux/arch/x86
Sean Christopherson fc387d8daf KVM: nVMX: Reset the segment cache when stuffing guest segs
Explicitly reset the segment cache after stuffing guest segment regs in
prepare_vmcs02_rare().  Although the cache is reset when switching to
vmcs02, there is nothing that prevents KVM from re-populating the cache
prior to writing vmcs02 with vmcs12's values.  E.g. if the vCPU is
preempted after switching to vmcs02 but before prepare_vmcs02_rare(),
kvm_arch_vcpu_put() will dereference GUEST_SS_AR_BYTES via .get_cpl()
and cache the stale vmcs02 value.  While the current code base only
caches stale data in the preemption case, it's theoretically possible
future code could read a segment register during the nested flow itself,
i.e. this isn't technically illegal behavior in kvm_arch_vcpu_put(),
although it did introduce the bug.

This manifests as an unexpected nested VM-Enter failure when running
with unrestricted guest disabled if the above preemption case coincides
with L1 switching L2's CPL, e.g. when switching from a L2 vCPU at CPL3
to to a L2 vCPU at CPL0.  stack_segment_valid() will see the new SS_SEL
but the old SS_AR_BYTES and incorrectly mark the guest state as invalid
due to SS.dpl != SS.rpl.

Don't bother updating the cache even though prepare_vmcs02_rare() writes
every segment.  With unrestricted guest, guest segments are almost never
read, let alone L2 guest segments.  On the other hand, populating the
cache requires a large number of memory writes, i.e. it's unlikely to be
a net win.  Updating the cache would be a win when unrestricted guest is
not supported, as guest_state_valid() will immediately cache all segment
registers.  But, nested virtualization without unrestricted guest is
dirt slow, saving some VMREADs won't change that, and every CPU
manufactured in the last decade supports unrestricted guest.  In other
words, the extra (minor) complexity isn't worth the trouble.

Note, kvm_arch_vcpu_put() may see stale data when querying guest CPL
depending on when preemption occurs.  This is "ok" in that the usage is
imperfect by nature, i.e. it's used heuristically to improve performance
but doesn't affect functionality.  kvm_arch_vcpu_put() could be "fixed"
by also disabling preemption while loading segments, but that's
pointless and misleading as reading state from kvm_sched_{in,out}() is
guaranteed to see stale data in one form or another.  E.g. even if all
the usage of regs_avail is fixed to call kvm_register_mark_available()
after the associated state is set, the individual state might still be
stale with respect to the overall vCPU state.  I.e. making functional
decisions in an asynchronous hook is doomed from the get go.  Thankfully
KVM doesn't do that.

Fixes: de63ad4cf4 ("KVM: X86: implement the logic for spinlock optimization")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923184452.980-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:44 -04:00
..
boot Merge 'x86/kaslr' to pick up dependent bits 2020-09-07 18:09:43 +02:00
configs x86/defconfigs: Refresh defconfig files 2020-07-25 12:02:14 +02:00
crypto crypto: x86/curve25519 - Remove unused carry variables 2020-07-31 18:25:29 +10:00
entry x86/entry: Unbreak 32bit fast syscall 2020-09-04 15:50:14 +02:00
events treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
hyperv vmalloc: fix the owner argument for the new __vmalloc_node_range callers 2020-07-03 16:15:25 -07:00
ia32 mm: remove unneeded includes of <asm/pgalloc.h> 2020-08-07 11:33:26 -07:00
include KVM: VMX: Rename RDTSCP secondary exec control name to insert "ENABLE" 2020-09-28 07:57:30 -04:00
kernel cpuidle-haltpoll: fix error comments in arch_haltpoll_disable 2020-09-28 07:57:28 -04:00
kvm KVM: nVMX: Reset the segment cache when stuffing guest segs 2020-09-28 07:57:44 -04:00
lib x86/cmdline: Disable jump tables for cmdline.c 2020-09-03 10:59:16 +02:00
math-emu treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mm x86, fakenuma: Fix invalid starting node ID 2020-09-04 08:56:13 +02:00
net bpf, i386: Remove unneeded conversion to bool 2020-05-07 16:29:14 +02:00
oprofile
pci xen: branch for v5.9-rc2 2020-08-21 12:28:33 -07:00
platform efi/x86: Move 32-bit code into efi_32.c 2020-08-20 11:18:36 +02:00
power Kbuild updates for v5.9 2020-08-09 14:10:26 -07:00
purgatory Misc fixes and small updates all around the place: 2020-08-15 10:38:03 -07:00
ras treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
realmode Rebase locking/kcsan to locking/urgent 2020-06-11 20:02:46 +02:00
tools
um kbuild: remove cc-option test of -fno-stack-protector 2020-07-07 11:13:10 +09:00
video
xen xen: branch for v5.9-rc1b 2020-08-14 13:34:37 -07:00
.gitignore
Kbuild
Kconfig A set of posix CPU timer changes which allows to defer the heavy work of 2020-08-14 14:17:51 -07:00
Kconfig.assembler x86/delay: Introduce TPAUSE delay 2020-05-07 16:06:20 +02:00
Kconfig.cpu treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Kconfig.debug locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs 2020-07-27 15:13:29 +02:00
Makefile Kbuild updates for v5.9 2020-08-09 14:10:26 -07:00
Makefile_32.cpu
Makefile.um