Commit Graph

1167 Commits

Author SHA1 Message Date
Paolo Bonzini
c4ad77e0d4 KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP
These bits were not defined until now in common code, but they are
now that the kernel supports UMIP too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-17 13:20:23 +01:00
Liran Alon
917dc6068b KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2
vmx_check_nested_events() should return -EBUSY only in case there is a
pending L1 event which requires a VMExit from L2 to L1 but such a
VMExit is currently blocked. Such VMExits are blocked either
because nested_run_pending=1 or an event was reinjected to L2.
vmx_check_nested_events() should return 0 in case there are no
pending L1 events which requires a VMExit from L2 to L1 or if
a VMExit from L2 to L1 was done internally.

However, upstream commit which introduced blocking in case an event was
reinjected to L2 (commit acc9ab6013 ("KVM: nVMX: Fix pending events
injection")) contains a bug: It returns -EBUSY even if there are no
pending L1 events which requires VMExit from L2 to L1.

This commit fix this issue.

Fixes: acc9ab6013 ("KVM: nVMX: Fix pending events injection")

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:21 +01:00
Wanpeng Li
5af4157388 KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
Commit 4f350c6dbc (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure
properly) can result in L1(run kvm-unit-tests/run_tests.sh vmx_controls in L1)
null pointer deference and also L0 calltrace when EPT=0 on both L0 and L1.

In L1:

BUG: unable to handle kernel paging request at ffffffffc015bf8f
 IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
 PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
 Oops: 0003 [#1] PREEMPT SMP
 CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
 RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
 Call Trace:
 WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002

In L0:

-----------[ cut here ]------------
 WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
 CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc7+ #25
 RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
 Call Trace:
  paging64_page_fault+0x500/0xde0 [kvm]
  ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
  ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
  ? __asan_storeN+0x12/0x20
  ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
  ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
  ? lock_acquire+0x2c0/0x2c0
  ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
  ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
  ? sched_clock+0x1f/0x30
  ? check_chain_key+0x137/0x1e0
  ? __lock_acquire+0x83c/0x2420
  ? kvm_multiple_exception+0xf2/0x220 [kvm]
  ? debug_check_no_locks_freed+0x240/0x240
  ? debug_smp_processor_id+0x17/0x20
  ? __lock_is_held+0x9e/0x100
  kvm_mmu_page_fault+0x90/0x180 [kvm]
  kvm_handle_page_fault+0x15c/0x310 [kvm]
  ? __lock_is_held+0x9e/0x100
  handle_exception+0x3c7/0x4d0 [kvm_intel]
  vmx_handle_exit+0x103/0x1010 [kvm_intel]
  ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]

The commit avoids to load host state of vmcs12 as vmcs01's guest state
since vmcs12 is not modified (except for the VM-instruction error field)
if the checking of vmcs control area fails. However, the mmu context is
switched to nested mmu in prepare_vmcs02() and it will not be reloaded
since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME
fails. This patch fixes it by reloading mmu context when nested
VMLAUNCH/VMRESUME fails.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:14 +01:00
Wanpeng Li
f1b026a331 KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
According to the SDM, if the "load IA32_BNDCFGS" VM-entry controls is 1, the
following checks are performed on the field for the IA32_BNDCFGS MSR:
 - Bits reserved in the IA32_BNDCFGS MSR must be 0.
 - The linear address in bits 63:12 must be canonical.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:13 +01:00
Liran Alon
9b8ae63798 KVM: x86: Don't re-execute instruction when not passing CR2 value
In case of instruction-decode failure or emulation failure,
x86_emulate_instruction() will call reexecute_instruction() which will
attempt to use the cr2 value passed to x86_emulate_instruction().
However, when x86_emulate_instruction() is called from
emulate_instruction(), cr2 is not passed (passed as 0) and therefore
it doesn't make sense to execute reexecute_instruction() logic at all.

Fixes: 51d8b66199 ("KVM: cleanup emulate_instruction")

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:12 +01:00
Liran Alon
61cb57c9ed KVM: x86: Exit to user-mode on #UD intercept when emulator requires
Instruction emulation after trapping a #UD exception can result in an
MMIO access, for example when emulating a MOVBE on a processor that
doesn't support the instruction.  In this case, the #UD vmexit handler
must exit to user mode, but there wasn't any code to do so.  Add it for
both VMX and SVM.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:10 +01:00
Liran Alon
ac9b305caa KVM: nVMX/nSVM: Don't intercept #UD when running L2
When running L2, #UD should be intercepted by L1 or just forwarded
directly to L2. It should not reach L0 x86 emulator.
Therefore, set intercept for #UD only based on L1 exception-bitmap.

Also add WARN_ON_ONCE() on L0 #UD intercept handlers to make sure
it is never reached while running L2.

This improves commit ae1f576707 ("KVM: nVMX: Do not emulate #UD while
in guest mode") by removing an unnecessary exit from L2 to L0 on #UD
when L1 doesn't intercept it.

In addition, SVM L0 #UD intercept handler doesn't handle correctly the
case it is raised from L2. In this case, it should forward the #UD to
guest instead of x86 emulator. As done in VMX #UD intercept handler.
This commit fixes this issue as-well.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:09 +01:00
Paolo Bonzini
d02fcf5077 kvm: vmx: Allow disabling virtual NMI support
To simplify testing of these rarely used code paths, add a module parameter
that turns it on.  One eventinj.flat test (NMI after iret) fails when
loading kvm_intel with vnmi=0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:07 +01:00
Paolo Bonzini
8a1b43922d kvm: vmx: Reinstate support for CPUs without virtual NMI
This is more or less a revert of commit 2c82878b0c ("KVM: VMX: require
virtual NMI support", 2017-03-27); it turns out that Core 2 Duo machines
only had virtual NMIs in some SKUs.

The revert is not trivial because in the meanwhile there have been several
fixes to nested NMI injection.  Therefore, the entire vNMI state is moved
to struct loaded_vmcs.

Another change compared to before the patch is a simplification here:

       if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked &&
           !(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis(
                                       get_vmcs12(vcpu))))) {

The final condition here is always true (because nested_cpu_has_virtual_nmis
is always false) and is removed.

Fixes: 2c82878b0c
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1490803
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17 13:20:07 +01:00
Linus Torvalds
974aa5630b First batch of KVM changes for 4.15
Common:
  - Python 3 support in kvm_stat
 
  - Accounting of slabs to kmemcg
 
 ARM:
  - Optimized arch timer handling for KVM/ARM
 
  - Improvements to the VGIC ITS code and introduction of an ITS reset
    ioctl
 
  - Unification of the 32-bit fault injection logic
 
  - More exact external abort matching logic
 
 PPC:
  - Support for running hashed page table (HPT) MMU mode on a host that
    is using the radix MMU mode;  single threaded mode on POWER 9 is
    added as a pre-requisite
 
  - Resolution of merge conflicts with the last second 4.14 HPT fixes
 
  - Fixes and cleanups
 
 s390:
  - Some initial preparation patches for exitless interrupts and crypto
 
  - New capability for AIS migration
 
  - Fixes
 
 x86:
  - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs, and
    after-reset state
 
  - Refined dependencies for VMX features
 
  - Fixes for nested SMI injection
 
  - A lot of cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJaDayXAAoJEED/6hsPKofo/3UH/3HvlcHt+ADTkCU1/iiKAs+i
 0zngIOXIxgHDnV0ww6bV+Znww0BzTYgKCAXX76z603jdpDwG/pzQQcbLDF5ZoJnD
 sQtF10gZinWaRsHlfbLqjrHGL2pGDHO1UKBKLJ0bAIyORPZBxs7i+VmrY/blnr9c
 0wsybJ8RbvwAxjsDL5jeX/z4NehPupmKUc4Lf0eZdSHwVOf9sjn+MP6jJ0r2JcIb
 D+zddPBiLStzN97t4gZpQsrlj3LKrDS+6hY+1TjSvlh+yHKFVFh58VhLm4DuDeb5
 bYOAlWJ/gAWEzfvr5Ld+Nd7SqWWn/14logPkQ4gcU4BI/neAOzk4c6hJfCHl1nk=
 =593n
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "First batch of KVM changes for 4.15

  Common:
   - Python 3 support in kvm_stat
   - Accounting of slabs to kmemcg

  ARM:
   - Optimized arch timer handling for KVM/ARM
   - Improvements to the VGIC ITS code and introduction of an ITS reset
     ioctl
   - Unification of the 32-bit fault injection logic
   - More exact external abort matching logic

  PPC:
   - Support for running hashed page table (HPT) MMU mode on a host that
     is using the radix MMU mode; single threaded mode on POWER 9 is
     added as a pre-requisite
   - Resolution of merge conflicts with the last second 4.14 HPT fixes
   - Fixes and cleanups

  s390:
   - Some initial preparation patches for exitless interrupts and crypto
   - New capability for AIS migration
   - Fixes

  x86:
   - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs,
     and after-reset state
   - Refined dependencies for VMX features
   - Fixes for nested SMI injection
   - A lot of cleanups"

* tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits)
  KVM: s390: provide a capability for AIS state migration
  KVM: s390: clear_io_irq() requests are not expected for adapter interrupts
  KVM: s390: abstract conversion between isc and enum irq_types
  KVM: s390: vsie: use common code functions for pinning
  KVM: s390: SIE considerations for AP Queue virtualization
  KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup
  KVM: PPC: Book3S HV: Cosmetic post-merge cleanups
  KVM: arm/arm64: fix the incompatible matching for external abort
  KVM: arm/arm64: Unify 32bit fault injection
  KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET
  KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
  KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared
  KVM: arm/arm64: vgic-its: New helper functions to free the caches
  KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device
  arm/arm64: KVM: Load the timer state when enabling the timer
  KVM: arm/arm64: Rework kvm_timer_should_fire
  KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
  KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
  KVM: arm/arm64: Move phys_timer_emulate function
  KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
  ...
2017-11-16 13:00:24 -08:00
Jan H. Schönherr
4191db26b7 KVM: x86: Update APICv on APIC reset
In kvm_apic_set_state() we update the hardware virtualized APIC after
the full APIC state has been overwritten. Do the same, when the full
APIC state has been reset in kvm_lapic_reset().

This updates some hardware state that was previously forgotten, as
far as I can tell. Also, this allows removing some APIC-related reset
code from vmx_vcpu_reset().

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-02 18:28:13 +01:00
Jan H. Schönherr
a4888486c5 KVM: VMX: Do not fully reset PI descriptor on vCPU reset
Parts of the posted interrupt descriptor configure host behavior,
such as the notification vector and destination. Overwriting them
with zero as done during vCPU reset breaks posted interrupts.
KVM (re-)writes these fields on certain occasions and belatedly fixes
the situation in many cases. However, if you have a guest configured
with "idle=poll", for example, the fields might stay zero forever.

Do not reset the full descriptor in vmx_vcpu_reset(). Instead,
reset only the outstanding notifications and leave everything
else untouched.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-02 18:27:11 +01:00
Wanpeng Li
61f1dd9099 KVM: VMX: Fix VPID capability detection
In my setup, EPT is not exposed to L1, the VPID capability is exposed and
can be observed by vmxcap tool in L1:
INVVPID supported                        yes
Individual-address INVVPID               yes
Single-context INVVPID                   yes
All-context INVVPID                      yes
Single-context-retaining-globals INVVPID yes

However, the module parameter of VPID observed in L1 is always N, the
cpu_has_vmx_invvpid() check in L1 KVM fails since vmx_capability.vpid
is 0 and it is not read from MSR due to EPT is not exposed.

The VPID can be used to tag linear mappings when EPT is not enabled. However,
current logic just detects VPID capability if EPT is enabled, this patch
fixes it.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-20 18:05:01 +02:00
Wanpeng Li
575b3a2cb4 KVM: nVMX: Fix EPT switching advertising
I can use vmxcap tool to observe "EPTP Switching   yes" even if EPT is not
exposed to L1.

EPT switching is advertised unconditionally since it is emulated, however,
it can be treated as an extended feature for EPT and it should not be
advertised if EPT itself is not exposed. This patch fixes it.

Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-20 18:05:01 +02:00
Ladi Prosek
cc3d967f7e KVM: SVM: detect opening of SMI window using STGI intercept
Commit 05cade71cf ("KVM: nSVM: fix SMI injection in guest mode") made
KVM mask SMI if GIF=0 but it didn't do anything to unmask it when GIF is
enabled.

The issue manifests for me as a significantly longer boot time of Windows
guests when running with SMM-enabled OVMF.

This commit fixes it by intercepting STGI instead of requesting immediate
exit if the reason why SMM was masked is GIF.

Fixes: 05cade71cf ("KVM: nSVM: fix SMI injection in guest mode")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-18 21:21:22 +02:00
Ladi Prosek
72e9cbdb43 KVM: nVMX: fix SMI injection in guest mode
Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* SMM handler couldn't write to vmx_set_cr4 because of incorrect validity checks
  predicated on nested.vmxon
* MMU was confused (walk_mmu was still set to nested_mmu)

Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
entering SMM in 34.14.1 Default Treatment of SMI Delivery. What we need to do is
basically get out of guest mode and set nested.vmxon to false for the duration of
SMM. All this completely transparent to L1, i.e. L1 is not given control and no
L1 observable state changes.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_vmx_vmexit with exit_reason == -1 is called, a trick already
used in vmx_leave_nested. Re-entering is cleaner, using enter_vmx_non_root_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-12 14:01:55 +02:00
Ladi Prosek
21f2d55118 KVM: nVMX: set IDTR and GDTR limits when loading L1 host state
Intel SDM 27.5.2 Loading Host Segment and Descriptor-Table Registers:

"The GDTR and IDTR limits are each set to FFFFH."

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-12 14:01:55 +02:00
Ladi Prosek
72d7b374b1 KVM: x86: introduce ISA specific smi_allowed callback
Similar to NMI, there may be ISA specific reasons why an SMI cannot be
injected into the guest. This commit adds a new smi_allowed callback to
be implemented in following commits.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-12 14:01:55 +02:00
Ladi Prosek
0234bf8852 KVM: x86: introduce ISA specific SMM entry/exit callbacks
Entering and exiting SMM may require ISA specific handling under certain
circumstances. This commit adds two new callbacks with empty implementations.
Actual functionality will be added in following commits.

* pre_enter_smm() is to be called when injecting an SMM, before any
  SMM related vcpu state has been changed
* pre_leave_smm() is to be called when emulating the RSM instruction,
  when the vcpu is in real mode and before any SMM related vcpu state
  has been restored

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-12 14:01:55 +02:00
Paolo Bonzini
d000653057 KVM: SVM: limit kvm_handle_page_fault to #PF handling
It has always annoyed me a bit how SVM_EXIT_NPF is handled by
pf_interception.  This is also the only reason behind the
under-documented need_unprotect argument to kvm_handle_page_fault.
Let NPF go straight to kvm_mmu_page_fault, just like VMX
does in handle_ept_violation and handle_ept_misconfig.

Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-12 14:01:55 +02:00
Wanpeng Li
8ad8182e93 KVM: VMX: Don't expose unrestricted_guest is enabled if ept is disabled
SDM mentioned:

 "If either the “unrestricted guest†VM-execution control or the “mode-based
  execute control for EPT†VM- execution control is 1, the “enable EPTâ€
  VM-execution control must also be 1."

However, we can still observe unrestricted_guest is Y after inserting the kvm-intel.ko
w/ ept=N. It depends on later starts a guest in order that the function
vmx_compute_secondary_exec_control() can be executed, then both the module parameter
and exec control fields will be amended.

This patch fixes it by amending module parameter immediately during vmcs data setup.

Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-12 14:01:54 +02:00
Wanpeng Li
a554d207dc KVM: X86: Processor States following Reset or INIT
- XCR0 is reset to 1 by RESET but not INIT
- XSS is zeroed by both RESET and INIT
- BNDCFGU, BND0-BND3, BNDCFGS, BNDSTATUS are zeroed by both RESET and INIT

This patch does this according to SDM.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-12 14:01:54 +02:00
David Hildenbrand
736fdf7251 KVM: VMX: rename RDSEED and RDRAND vmx ctrls to reflect exiting
Let's just name these according to the SDM. This should make it clearer
that the are used to enable exiting and not the feature itself.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:53 +02:00
David Hildenbrand
d8a6e365b2 KVM: VMX: cleanup init_rmode_identity_map()
No need for another enable_ept check. kvm->arch.ept_identity_map_addr
only has to be inititalized once. Having alloc_identity_pagetable() is
overkill and dropping BUG_ONs is always nice.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:53 +02:00
David Hildenbrand
1c13bffd94 KVM: nVMX: no need to set ept/vpid caps to 0
They are inititally 0, so no need to reset them to 0.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:52 +02:00
David Hildenbrand
0ee096d006 KVM: nVMX: no need to set vcpu->cpu when switching vmcs
vcpu->cpu is not cleared when doing a vmx_vcpu_put/load, so this can be
dropped.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:52 +02:00
David Hildenbrand
9522ea9ef9 KVM: VMX: drop unnecessary function declarations
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:52 +02:00
David Hildenbrand
f5f51586db KVM: VMX: require INVEPT GLOBAL for EPT
Without this, we won't be able to do any flushes, so let's just require
it. Should be absent in very strange configurations.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:52 +02:00
David Hildenbrand
fdf288bf72 KVM: VMX: call ept_sync_global() with enable_ept only
ept_* function should only be called with enable_ept being set.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:52 +02:00
David Hildenbrand
0e1252dc46 KVM: VMX: drop enable_ept check from ept_sync_context()
This function is only called with enable_ept.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:52 +02:00
David Hildenbrand
12d79917a4 KVM: VMX: vmx_vcpu_setup() cannot fail
Make it a void and drop error handling code.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:51 +02:00
Wanpeng Li
0f107682cb KVM: VMX: Don't expose PLE enable if there is no hardware support
KVM doesn't expose the PLE capability to the L1 hypervisor, however,
ple_window still shows the default value on L1 hypervisor. This patch
fixes it by clearing all the PLE related module parameter if there is
no PLE capability.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-12 14:01:50 +02:00
Haozhong Zhang
8eb3f87d90 KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit
When KVM emulates an exit from L2 to L1, it loads L1 CR4 into the
guest CR4. Before this CR4 loading, the guest CR4 refers to L2
CR4. Because these two CR4's are in different levels of guest, we
should vmx_set_cr4() rather than kvm_set_cr4() here. The latter, which
is used to handle guest writes to its CR4, checks the guest change to
CR4 and may fail if the change is invalid.

The failure may cause trouble. Consider we start
  a L1 guest with non-zero L1 PCID in use,
     (i.e. L1 CR4.PCIDE == 1 && L1 CR3.PCID != 0)
and
  a L2 guest with L2 PCID disabled,
     (i.e. L2 CR4.PCIDE == 0)
and following events may happen:

1. If kvm_set_cr4() is used in load_vmcs12_host_state() to load L1 CR4
   into guest CR4 (in VMCS01) for L2 to L1 exit, it will fail because
   of PCID check. As a result, the guest CR4 recorded in L0 KVM (i.e.
   vcpu->arch.cr4) is left to the value of L2 CR4.

2. Later, if L1 attempts to change its CR4, e.g., clearing VMXE bit,
   kvm_set_cr4() in L0 KVM will think L1 also wants to enable PCID,
   because the wrong L2 CR4 is used by L0 KVM as L1 CR4. As L1
   CR3.PCID != 0, L0 KVM will inject GP to L1 guest.

Fixes: 4704d0befb ("KVM: nVMX: Exiting from L2 to L1")
Cc: qemu-stable@nongnu.org
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-12 13:54:56 +02:00
Linus Torvalds
42057e1825 Mixed bugfixes. Perhaps the most interesting one is a latent bug
that was finally triggered by PCID support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZzmJqAAoJEL/70l94x66D99oH/R4hOMfzDxFOgW3LnaCQJvwo
 n1+tH3as0dfdkpggZ+UmJuKnbVJ0625+qozenrdYkKtk1YiyIIQWG3vdsz4HBfzp
 CYK2NVVymf0dg8DQaluz6iB1R28ke12PggzyFv01s1QyENBDA8J38pslZarPM2OA
 tnpRKC6B59/VmRD0PWS6yRmTXY+HfzWlWg4JMraq2VdybbEXJhh8BNfjjNn30DkZ
 SW8kHq60AUd5Arhb3cmiPiXZCQ7odqF2u2mEcBmnA9hAacaGEheSzKCUOaEIjmZV
 5/jTyG1tZkN7CbrG81ryuoa8A6qTOSyHxo1QkzAmE/p+s2IzGfzzLqmtfIsAWkE=
 =1lM1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Mixed bugfixes. Perhaps the most interesting one is a latent bug that
  was finally triggered by PCID support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm/x86: Handle async PF in RCU read-side critical sections
  KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume
  KVM: VMX: use cmpxchg64
  KVM: VMX: simplify and fix vmx_vcpu_pi_load
  KVM: VMX: avoid double list add with VT-d posted interrupts
  KVM: VMX: extract __pi_post_block
  KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception
  KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
2017-09-29 12:18:55 -07:00
Wanpeng Li
305d0ab476 KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume
------------[ cut here ]------------
 WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0+ #17
 RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 Call Trace:
  ? emulator_read_emulated+0x15/0x20 [kvm]
  ? segmented_read+0xae/0xf0 [kvm]
  vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  x86_emulate_instruction+0x733/0x810 [kvm]
  vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
  ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
  kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2

A nested #PF is triggered during L0 emulating instruction for L2. However, it
doesn't consider we should not break L1's vmlauch/vmresme. This patch fixes
it by queuing the #PF exception instead ,requesting an immediate VM exit from
L2 and keeping the exception for L1 pending for a subsequent nested VM exit.

This should actually work all the time, making vmx_inject_page_fault_nested
totally unnecessary.  However, that's not working yet, so this patch can work
around the issue in the meanwhile.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-29 16:30:37 +02:00
Paolo Bonzini
c0a1666bcb KVM: VMX: use cmpxchg64
This fixes a compilation failure on 32-bit systems.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-28 17:58:41 +02:00
Paolo Bonzini
31afb2ea2b KVM: VMX: simplify and fix vmx_vcpu_pi_load
The simplify part: do not touch pi_desc.nv, we can set it when the
VCPU is first created.  Likewise, pi_desc.sn is only handled by
vmx_vcpu_pi_load, do not touch it in __pi_post_block.

The fix part: do not check kvm_arch_has_assigned_device, instead
check the SN bit to figure out whether vmx_vcpu_pi_put ran before.
This matches what the previous patch did in pi_post_block.

Cc: Huangweidong <weidong.huang@huawei.com>
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: wangxin <wangxinxin.wang@huawei.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-27 13:45:42 +02:00
Paolo Bonzini
8b306e2f3c KVM: VMX: avoid double list add with VT-d posted interrupts
In some cases, for example involving hot-unplug of assigned
devices, pi_post_block can forget to remove the vCPU from the
blocked_vcpu_list.  When this happens, the next call to
pi_pre_block corrupts the list.

Fix this in two ways.  First, check vcpu->pre_pcpu in pi_pre_block
and WARN instead of adding the element twice in the list.  Second,
always do the list removal in pi_post_block if vcpu->pre_pcpu is
set (not -1).

The new code keeps interrupts disabled for the whole duration of
pi_pre_block/pi_post_block.  This is not strictly necessary, but
easier to follow.  For the same reason, PI.ON is checked only
after the cmpxchg, and to handle it we just call the post-block
code.  This removes duplication of the list removal code.

Cc: Huangweidong <weidong.huang@huawei.com>
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: wangxin <wangxinxin.wang@huawei.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-27 13:45:37 +02:00
Paolo Bonzini
cd39e1176d KVM: VMX: extract __pi_post_block
Simple code movement patch, preparing for the next one.

Cc: Huangweidong <weidong.huang@huawei.com>
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: wangxin <wangxinxin.wang@huawei.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-27 13:45:28 +02:00
Linus Torvalds
a141fd55f2 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Another round of CR3/PCID related fixes (I think this addresses all
  but one of the known problems with PCID support), an objtool fix plus
  a Clang fix that (finally) solves all Clang quirks to build a bootable
  x86 kernel as-is"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Fix inline asm call constraints for Clang
  objtool: Handle another GCC stack pointer adjustment bug
  x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
  x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
  x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
  x86/mm: Factor out CR3-building code
2017-09-24 12:33:58 -07:00
Josh Poimboeuf
f5caf621ee x86/asm: Fix inline asm call constraints for Clang
For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:

  static inline void foo()
  {
	register void *__sp asm(_ASM_SP);
	asm("call bar" : "+r" (__sp))
  }

Unfortunately, that pattern causes Clang to corrupt the stack pointer.

The fix is easy: convert the stack pointer register variable to a global
variable.

It should be noted that the end result is different based on the GCC
version.  With GCC 6.4, this patch has exactly the same result as
before:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9820389		9491555		8816046		8516940
 after	9820389		9491555		8816046		8516940

With GCC 7.2, however, GCC's behavior has changed.  It now changes its
behavior based on the conversion of the register variable to a global.
That somehow convinces it to *always* set up the frame pointer before
inserting *any* inline asm.  (Therefore, listing the variable as an
output constraint is a no-op and is no longer necessary.)  It's a bit
overkill, but the performance impact should be negligible.  And in fact,
there's a nice improvement with frame pointers disabled:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9796316		9468236		9076191		8790305
 after	9796957		9464267		9076381		8785949

So in summary, while listing the stack pointer as an output constraint
is no longer necessary for newer versions of GCC, it's still needed for
older versions.

Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-23 15:06:20 +02:00
Ladi Prosek
44889942b6 KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
For nested virt we maintain multiple VMCS that can run on a vCPU. So it is
incorrect to keep vmcs_host_cr3 and vmcs_host_cr4, whose purpose is caching
the value of the rarely changing HOST_CR3 and HOST_CR4 VMCS fields, in
vCPU-wide data structures.

Hyper-V nested on KVM runs into this consistently for me with PCID enabled.
CR3 is updated with a new value, unlikely(cr3 != vmx->host_state.vmcs_host_cr3)
fires, and the currently loaded VMCS is updated. Then we switch from L2 to
L1 and the next exit reverts CR3 to its old value.

Fixes: d6e41f1151 ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 10:41:32 +02:00
Haozhong Zhang
5753743fa5 KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc)) in kvm_vcpu_trigger_posted_interrupt()
intends to detect the violation of invariant that VT-d PI notification
event is not suppressed when vcpu is in the guest mode. Because the
two checks for the target vcpu mode and the target suppress field
cannot be performed atomically, the target vcpu mode may change in
between. If that does happen, WARN_ON_ONCE() here may raise false
alarms.

As the previous patch fixed the real invariant breaker, remove this
WARN_ON_ONCE() to avoid false alarms, and document the allowed cases
instead.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d60f ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-19 15:09:16 +02:00
Haozhong Zhang
dc91f2eb1a KVM: VMX: do not change SN bit in vmx_update_pi_irte()
In kvm_vcpu_trigger_posted_interrupt() and pi_pre_block(), KVM
assumes that PI notification events should not be suppressed when the
target vCPU is not blocked.

vmx_update_pi_irte() sets the SN field before changing an interrupt
from posting to remapping, but it does not check the vCPU mode.
Therefore, the change of SN field may break above the assumption.
Besides, I don't see reasons to suppress notification events here, so
remove the changes of SN field to avoid race condition.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d60f ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-19 15:09:11 +02:00
Linus Torvalds
9db59599ae * PPC bugfixes
* RCU splat fix
 * swait races fix
 * pointless userspace-triggerable BUG() fix
 * misc fixes for KVM_RUN corner cases
 * nested virt correctness fixes + one host DoS
 * some cleanups
 * clang build fix
 * fix AMD AVIC with default QEMU command line options
 * x86 bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZvANtAAoJEL/70l94x66DtcIH/0i4fenYamxdq2xiWtsZdbcy
 yfk7mWKEzWGZhP+2X8SSOeetd5mqnIcf2cc4m68UCXpt0zoPEjY0i0D4xrYJHZ03
 R3ifqvtpHByodfT7dOKQPEisO8PdJ5tvecaCMnK3u6SNaNLjAZfhobuLppQHOwQO
 eBvpm0jROpA7ENlDgXtsti8MEdsoWtnmGGrRBY77EGW+t24OpNuGB1EMC0nvcs65
 eChwZ3u8xeU5Ws3Y/DiC8tK8t628znknd8ay02LTZjA303Ftoe192jPpS33V4v15
 kqS6vUFy2lpr9L6wicZtcnnSLtKv+LqecK6o8cxNjzlkOeaZuo9D8UMYsWQfj6w=
 =Ma23
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 - PPC bugfixes
 - RCU splat fix
 - swait races fix
 - pointless userspace-triggerable BUG() fix
 - misc fixes for KVM_RUN corner cases
 - nested virt correctness fixes + one host DoS
 - some cleanups
 - clang build fix
 - fix AMD AVIC with default QEMU command line options
 - x86 bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
  kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
  kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
  kvm,mips: Fix potential swait_active() races
  kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
  kvm: Serialize wq active checks in kvm_vcpu_wake_up()
  kvm,x86: Fix apf_task_wake_one() wq serialization
  kvm,lapic: Justify use of swait_active()
  kvm,async_pf: Use swq_has_sleeper()
  sched/wait: Add swq_has_sleeper()
  KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
  KVM: Don't accept obviously wrong gsi values via KVM_IRQFD
  kvm: nVMX: Don't allow L2 to access the hardware CR8
  KVM: trace events: update list of exit reasons
  KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
  KVM: X86: Don't block vCPU if there is pending exception
  KVM: SVM: Add irqchip_split() checks before enabling AVIC
  KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
  KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
  KVM: x86: fix clang build
  ...
2017-09-15 15:43:55 -07:00
Jim Mattson
4f350c6dbc kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.

The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15 16:57:15 +02:00
Jim Mattson
b060ca3b2e kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.

On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15 16:57:15 +02:00
Jim Mattson
7881f96cac kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
After a successful VM-entry, RFLAGS is cleared, with the exception of
bit 1, which is always set. This is handled by load_vmcs12_host_state.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15 16:57:14 +02:00
Jan H. Schönherr
3a8b0677fc KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of bounds. (Especially,
since KVM as a whole seems to hang after that.)

Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).

This fixes CVE-2017-1000252.

Fixes: efc644048e ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15 16:56:43 +02:00
Jim Mattson
51aa68e7d5 kvm: nVMX: Don't allow L2 to access the hardware CR8
If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.

This fixes CVE-2017-12154.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15 14:05:46 +02:00