linux/Documentation
Sean Christopherson ce25681d59 KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync.  When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault.  The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.

Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations.  But underflow is the best case scenario.  The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.

Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok.  Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.

For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write.  If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.

Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault.  I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled.  Holding mmu_lock for read also
prevents the indirect shadow page from being freed.  But as above, keep
it simple and always take the lock.

Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP.  Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).

Alternative #2, the unsync logic could be made thread safe.  In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice.  However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).
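
As a rough illustration of what Alternative #2 implies, the sketch below uses
C11 atomics in userspace.  It is an assumption-laden model, not kernel code:
struct toy_sp and the toy_*() helpers are hypothetical, and the kernel would
use its own atomic_t and bitops APIs rather than <stdatomic.h>.  An atomic
fetch-or acts as a test-and-set on the bitmap, and the counter is adjusted
only by the caller that actually flipped the bit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CHILDREN		512
#define TOY_BITS_PER_WORD	(8 * sizeof(unsigned long))
#define TOY_NR_WORDS		(TOY_NR_CHILDREN / TOY_BITS_PER_WORD)

/* Hypothetical stand-in for the unsync tracking in a shadow page. */
struct toy_sp {
	atomic_ulong unsync_child_bitmap[TOY_NR_WORDS];
	atomic_uint unsync_children;
};

static void toy_sp_init(struct toy_sp *sp)
{
	size_t i;

	for (i = 0; i < TOY_NR_WORDS; i++)
		atomic_init(&sp->unsync_child_bitmap[i], 0UL);
	atomic_init(&sp->unsync_children, 0U);
}

/* Set the child's bit; returns true only for the caller that set it. */
static bool toy_set_unsync_child(struct toy_sp *sp, size_t idx)
{
	unsigned long mask = 1UL << (idx % TOY_BITS_PER_WORD);
	unsigned long old;

	old = atomic_fetch_or(&sp->unsync_child_bitmap[idx / TOY_BITS_PER_WORD],
			      mask);
	if (old & mask)
		return false;	/* already set, do not count it twice */

	atomic_fetch_add(&sp->unsync_children, 1U);
	return true;
}

/* Clear the child's bit; returns true only for the caller that cleared it. */
static bool toy_clear_unsync_child(struct toy_sp *sp, size_t idx)
{
	unsigned long mask = 1UL << (idx % TOY_BITS_PER_WORD);
	unsigned long old;

	old = atomic_fetch_and(&sp->unsync_child_bitmap[idx / TOY_BITS_PER_WORD],
			       ~mask);
	if (!(old & mask))
		return false;	/* was not set, so no decrement and no underflow */

	atomic_fetch_sub(&sp->unsync_children, 1U);
	return true;
}

static unsigned int toy_unsync_children(struct toy_sp *sp)
{
	return atomic_load(&sp->unsync_children);
}
```

Even in this toy model the bitmap and the counter are updated in two separate
atomic steps, so a concurrent reader can briefly observe them out of step.
That is one concrete reason an audit of every user would be needed before
such a conversion could be trusted.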

Fixes: a2855afc7e ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812181815.3378104-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13 03:32:14 -04:00
ABI Networking fixes for 5.14-rc2, including fixes from bpf and netfilter. 2021-07-14 09:24:32 -07:00
accounting delayacct: Add sysctl to enable at runtime 2021-05-12 11:43:25 +02:00
admin-guide USB / Thunderbolt patches for 5.14-rc1 2021-07-05 14:16:22 -07:00
arm docs: Fix typo in Documentation/arm/marvell.rst 2021-06-04 11:28:36 -06:00
arm64 arm64: Document requirement for access to FEAT_HCX 2021-05-25 19:05:28 +01:00
block for-5.14/block-2021-06-29 2021-06-30 12:12:56 -07:00
bpf Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
cdrom docs: cdrom-standard.rst: get rid of uneeded UTF-8 chars 2021-05-11 11:00:17 -06:00
core-api module: add printk formats to add module build ID to stacktraces 2021-07-08 11:48:22 -07:00
cpu-freq cpufreq: Remove ->resolve_freq() 2021-06-30 19:45:42 +02:00
crypto
dev-tools Documentation: kunit: drop obsolete note about uml_abort for coverage 2021-07-12 13:54:12 -06:00
devicetree ARM: SoC fixes for v5.14 2021-07-17 15:58:24 -07:00
doc-guide docs: doc-guide: avoid using ReST :doc:foo markup 2021-06-17 13:24:37 -06:00
driver-api Documentation: Fix intiramfs script name 2021-07-18 23:48:14 +09:00
fault-injection docs: fault-injection: fix non-working usage of negative values 2021-06-14 15:58:22 -06:00
fb
features Documentation/features: Add THREAD_INFO_IN_TASK feature matrix 2021-07-15 06:33:44 -06:00
filesystems Documentation: Fix intiramfs script name 2021-07-18 23:48:14 +09:00
firmware_class
firmware-guide pwm: Changes for v5.14-rc1 2021-07-08 12:18:04 -07:00
fpga Documentation: fpga: dfl: change FPGA indirect article to an 2021-06-09 14:51:25 +02:00
gpu drm/amd/display: Add Freesync video documentation 2021-06-18 17:06:43 -04:00
hid
hwmon hwmon: (pmbus) Add driver for Delta DPS-920AB PSU 2021-06-17 04:21:46 -07:00
i2c Merge branch 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2021-07-04 11:47:18 -07:00
ia64
ide
iio
infiniband
input docs: networking: Replace strncpy() with strscpy() 2021-06-04 11:21:43 -06:00
isdn
kbuild
kernel-hacking docs: kernel-hacking: hacking.rst: avoid using ReST :doc:foo markup 2021-06-17 13:24:38 -06:00
leds
litmus-tests
livepatch
locking locking/lockdep,doc: Improve readability of the block matrix 2021-05-31 10:14:54 +02:00
m68k
maintainer
mhi
mips
misc-devices
netlabel
networking Networking fixes for 5.14-rc2, including fixes from bpf and netfilter. 2021-07-14 09:24:32 -07:00
nios2
nvdimm
openrisc
parisc
PCI pci-v5.14-changes 2021-07-08 12:06:20 -07:00
pcmcia
power PM: runtime: Clarify documentation when callbacks are unassigned 2021-06-11 19:04:07 +02:00
powerpc powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls 2021-05-21 00:58:03 +10:00
process docs: process: submitting-patches.rst: avoid using ReST :doc:foo markup 2021-06-17 13:24:38 -06:00
RCU Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2021-07-04 12:58:33 -07:00
riscv riscv: Ensure BPF_JIT_REGION_START aligned with PMD size 2021-06-18 21:10:05 -07:00
s390 vfio/mdev: Remove CONFIG_VFIO_MDEV_DEVICE 2021-06-21 15:29:25 -06:00
scheduler This was a reasonably active cycle for documentation; this pull includes: 2021-06-28 16:53:05 -07:00
scsi scsi: core: Kill message byte 2021-05-31 22:48:24 -04:00
security This was a reasonably active cycle for documentation; this pull includes: 2021-06-28 16:53:05 -07:00
sh
sound ASoC: Updates for v5.14 2021-07-01 08:36:12 +02:00
sparc
sphinx
sphinx-static
spi spi: pxa2xx: Update documentation to point out that it's outdated 2021-05-18 14:05:36 +01:00
staging
target
timers Documentation: drop optional BOMs 2021-05-10 15:17:34 -06:00
trace Tracing updates for 5.14: 2021-07-03 11:13:22 -07:00
translations docs/zh_CN: add a missing space character 2021-07-15 06:33:44 -06:00
usb USB / Thunderbolt patches for 5.14-rc1 2021-07-05 14:16:22 -07:00
userspace-api Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
virt KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock 2021-08-13 03:32:14 -04:00
vm Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
w1 w1: fix build warning in w1_ds2438.rst 2021-05-26 09:11:24 +02:00
watchdog
x86 Fixes and improvements for FPU handling on x86: 2021-07-07 11:12:01 -07:00
xtensa
.gitignore
arch.rst
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py docs: Take a little noise out of the build process 2021-06-17 13:49:18 -06:00
COPYING-logo
docutils.conf
dontdiff
index.rst
Kconfig
logo.gif
Makefile docs: Makefile: Use CONFIG_SHELL not SHELL 2021-06-18 11:26:08 -06:00
memory-barriers.txt
SubmittingPatches
watch_queue.rst