linux/arch/x86/kvm/vmx
Sean Christopherson 31603d4fc2 KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
interrupted a KVM update of the percpu in-use VMCS list.

Because NMIs are not blocked by disabling IRQs, it's possible that
crash_vmclear_local_loaded_vmcss() could be called while the percpu list
of VMCSes is being modified, e.g. in the middle of list_add() in
vmx_vcpu_load_vmcs().  This potential corner case was called out in the
original commit[*], but the analysis of its impact was wrong.

Skipping the VMCLEARs is wrong because it all but guarantees that a
loaded, and therefore cached, VMCS will live across kexec and corrupt
memory in the new kernel.  Corruption will occur because the CPU's VMCS
cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
memory on its eviction will overwrite random memory in the new kernel.
The VMCS will live because the NMI shootdown also disables VMX, i.e. the
in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
VMCS cache on VMXOFF.

Furthermore, interrupting list_add() and list_del() is safe due to
crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
ensures the new entry is not visible to forward iteration unless the
entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
pointer could be observed if the NMI shootdown interrupted list_del() or
list_add(), but list_for_each_entry() does not consume ->prev.

In addition to removing the temporary disabling of VMCLEAR, open code
loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
the VMCS is deleted from the list only after it's been VMCLEAR'd.
Deleting the VMCS before VMCLEAR would allow a race where the NMI
shootdown could arrive between list_del() and vmcs_clear() and thus
neither flow would execute a successful VMCLEAR.  Alternatively, more
code could be moved into loaded_vmcs_init(), but that gets rather silly
as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
and would need to work around the list_del().

Update the smp_*() comments related to the list manipulation, and
opportunistically reword them to improve clarity.

[*] https://patchwork.kernel.org/patch/1675731/#3720461

Fixes: 8f536b7697 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:44:25 -04:00
..
capabilities.h KVM: x86: Handle PKU CPUID adjustment in VMX code 2020-03-16 17:58:19 +01:00
evmcs.c x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests 2020-02-05 15:55:26 +01:00
evmcs.h KVM: nVMX: properly handle errors in nested_vmx_handle_enlightened_vmptrld() 2020-03-16 18:19:30 +01:00
nested.c KVM: nVMX: remove side effects from nested_vmx_exit_reflected 2020-03-18 12:16:39 +01:00
nested.h KVM: nVMX: remove side effects from nested_vmx_exit_reflected 2020-03-18 12:16:39 +01:00
ops.h KVM: VMX: Add error handling to VMREAD helper 2019-09-25 15:30:09 +02:00
pmu_intel.c KVM: VMX: Directly query Intel PT mode when refreshing PMUs 2020-03-16 17:58:38 +01:00
vmcs12.c
vmcs12.h KVM/arm updates for 5.3 2019-07-11 15:14:16 +02:00
vmcs_shadow_fields.h KVM: Fix some out-dated function names in comment 2020-01-21 13:57:27 +01:00
vmcs.h KVM: VMX: Leave preemption timer running when it's disabled 2019-06-18 17:10:46 +02:00
vmenter.S KVM: VMX: access regs array in vmenter.S in its natural order 2020-03-17 14:14:32 +01:00
vmx.c KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support 2020-03-23 15:44:25 -04:00
vmx.h KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgd 2020-03-16 17:58:52 +01:00