linux/arch/x86
Sean Christopherson 31603d4fc2 KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
interrupted a KVM update of the percpu in-use VMCS list.

Because NMIs are not blocked by disabling IRQs, it's possible that
crash_vmclear_local_loaded_vmcss() could be called while the percpu list
of VMCSes is being modified, e.g. in the middle of list_add() in
vmx_vcpu_load_vmcs().  This potential corner case was called out in the
original commit[*], but the analysis of its impact was wrong.

Skipping the VMCLEARs is wrong because it all but guarantees that a
loaded, and therefore cached, VMCS will live across kexec and corrupt
memory in the new kernel.  Corruption will occur because the CPU's VMCS
cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
memory on its eviction will overwrite random memory in the new kernel.
The VMCS will live because the NMI shootdown also disables VMX, i.e. the
in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
VMCS cache on VMXOFF.

Furthermore, interrupting list_add() and list_del() is safe due to
crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
ensures the new entry is not visible to forward iteration unless the
entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
pointer could be observed if the NMI shootdown interrupted list_del() or
list_add(), but list_for_each_entry() does not consume ->prev.

In addition to removing the temporary disabling of VMCLEAR, open code
loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
the VMCS is deleted from the list only after it's been VMCLEAR'd.
Deleting the VMCS before VMCLEAR would allow a race where the NMI
shootdown could arrive between list_del() and vmcs_clear() and thus
neither flow would execute a successful VMCLEAR.  Alternatively, more
code could be moved into loaded_vmcs_init(), but that gets rather silly
as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
and would need to work around the list_del().

Update the smp_*() comments related to the list manipulation, and
opportunistically reword them to improve clarity.

[*] https://patchwork.kernel.org/patch/1675731/#3720461

Fixes: 8f536b7697 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:44:25 -04:00
..
boot Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
configs x86: Remove the calgary IOMMU driver 2019-11-15 10:36:59 +01:00
crypto crypto: x86/poly1305 - emit does base conversion itself 2020-01-22 16:21:11 +08:00
entry kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
events perf/x86/intel: Fix inaccurate period in context switch for auto-reload 2020-02-11 13:23:27 +01:00
hyperv x86/hyperv: Suspend/resume the hypercall page for hibernation 2020-02-01 09:41:16 +01:00
ia32 x86: Remove force_iret() 2020-01-08 19:40:51 +01:00
include KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgd 2020-03-16 17:58:52 +01:00
kernel A set of fixes for X86: 2020-02-09 12:11:12 -08:00
kvm KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support 2020-03-23 15:44:25 -04:00
lib Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-01-31 11:05:33 -08:00
math-emu Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 10:42:40 -08:00
mm x86: mm: avoid allocating struct mm_struct on the stack 2020-02-04 03:05:25 +00:00
net bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS 2020-01-09 08:46:18 -08:00
oprofile x86: Use pr_warn instead of pr_warning 2019-10-18 15:00:18 +02:00
pci pci-v5.6-changes 2020-01-31 14:48:54 -08:00
platform A single fix for a EFI boot regression on X86 which was caused by the 2020-02-09 11:54:50 -08:00
power x86/asm/32: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 12:03:43 +02:00
purgatory Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 10:42:40 -08:00
ras
realmode kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
tools kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
um um: Implement copy_thread_tls 2020-01-07 13:31:29 +01:00
video
xen A set of fixes for X86: 2020-02-09 12:11:12 -08:00
.gitignore
Kbuild
Kconfig asm-generic/tlb: rename HAVE_RCU_TABLE_FREE 2020-02-04 03:05:26 +00:00
Kconfig.cpu x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs 2020-01-13 18:02:53 +01:00
Kconfig.debug x86: mm: convert dump_pagetables to use walk_page_range 2020-02-04 03:05:25 +00:00
Makefile
Makefile_32.cpu
Makefile.um