Commit Graph

6076 Commits

Linus Torvalds
dbb381b619 Merge tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timekeeping and timer updates from Thomas Gleixner:
 "Core:

   - Consolidation of the vDSO build infrastructure to address the
     difficulties of cross-builds for ARM64 compat vDSO libraries by
     restricting the exposure of header content to the vDSO build.

     This is achieved by splitting out header content into separate
     headers which contain only the minimal information necessary to
     build the vDSO. These new headers are included from the kernel
     headers and the vDSO specific files.

   - Enhancements to the generic vDSO library allowing more fine-grained
     control over the compiled-in code, further reducing
     architecture-specific storage and preparing for adoption of the
     generic library by PPC.

   - Cleanup and consolidation of the exit-related code in POSIX CPU
     timers.

   - Small cleanups and enhancements here and there

  Drivers:

   - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support

   - Correct the clock rate of PIT64b global clock

   - setup_irq() cleanup

   - Preparation for PWM and suspend support for the TI DM timer

   - Expand the fttmr010 driver to support ast2600 systems

   - The usual small fixes, enhancements and cleanups all over the
     place"

* tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
  Revert "clocksource/drivers/timer-probe: Avoid creating dead devices"
  vdso: Fix clocksource.h macro detection
  um: Fix header inclusion
  arm64: vdso32: Enable Clang Compilation
  lib/vdso: Enable common headers
  arm: vdso: Enable arm to use common headers
  x86/vdso: Enable x86 to use common headers
  mips: vdso: Enable mips to use common headers
  arm64: vdso32: Include common headers in the vdso library
  arm64: vdso: Include common headers in the vdso library
  arm64: Introduce asm/vdso/processor.h
  arm64: vdso32: Code clean up
  linux/elfnote.h: Replace elf.h with UAPI equivalent
  scripts: Fix the inclusion order in modpost
  common: Introduce processor.h
  linux/ktime.h: Extract common header for vDSO
  linux/jiffies.h: Extract common header for vDSO
  linux/time64.h: Extract common header for vDSO
  linux/time32.h: Extract common header for vDSO
  linux/time.h: Extract common header for vDSO
  ...
2020-03-30 18:51:47 -07:00
Linus Torvalds
9b82f05f86 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle were:

  Kernel side changes:

   - A couple of x86/cpu cleanups and changes were grandfathered in due
     to patch dependencies. These clean up the set of CPU model/family
     matching macros with a consistent namespace and C99 initializer
     style.

   - A bunch of updates to various low level PMU drivers:
       * AMD Family 19h L3 uncore PMU
       * Intel Tiger Lake uncore support
       * misc fixes to LBR TOS sampling

   - optprobe fixes

   - perf/cgroup: optimize cgroup event sched-in processing

   - misc cleanups and fixes

  Tooling side changes are to:

   - perf {annotate,expr,record,report,stat,test}

   - perl scripting

   - libapi, libperf and libtraceevent

   - vendor events on Intel and S390, ARM cs-etm

   - Intel PT updates

   - Documentation changes and updates to core facilities

   - misc cleanups, fixes and other enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
  cpufreq/intel_pstate: Fix wrong macro conversion
  x86/cpu: Cleanup the now unused CPU match macros
  hwrng: via_rng: Convert to new X86 CPU match macros
  crypto: Convert to new CPU match macros
  ASoC: Intel: Convert to new X86 CPU match macros
  powercap/intel_rapl: Convert to new X86 CPU match macros
  PCI: intel-mid: Convert to new X86 CPU match macros
  mmc: sdhci-acpi: Convert to new X86 CPU match macros
  intel_idle: Convert to new X86 CPU match macros
  extcon: axp288: Convert to new X86 CPU match macros
  thermal: Convert to new X86 CPU match macros
  hwmon: Convert to new X86 CPU match macros
  platform/x86: Convert to new CPU match macros
  EDAC: Convert to new X86 CPU match macros
  cpufreq: Convert to new X86 CPU match macros
  ACPI: Convert to new X86 CPU match macros
  x86/platform: Convert to new CPU match macros
  x86/kernel: Convert to new CPU match macros
  x86/kvm: Convert to new CPU match macros
  x86/perf/events: Convert to new CPU match macros
  ...
2020-03-30 16:40:08 -07:00
Thomas Gleixner
a215032725 Merge branch 'next.uaccess-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into x86/cleanups
Pull uaccess cleanups from Al Viro:

  Consolidate the user access areas and get rid of uaccess_try(), user_ex()
  and other warts.
2020-03-28 11:57:02 +01:00
Wanpeng Li
d5361678e6 KVM: X86: Micro-optimize IPI fastpath delay
This patch optimizes the virtual IPI fastpath emulation sequence:

write ICR2                          send virtual IPI
read ICR2                           write ICR2
send virtual IPI         ==>        write ICR
write ICR

We can observe a ~0.67% performance improvement for the IPI microbenchmark
(https://lore.kernel.org/kvm/20171219085010.4081-1-ynorov@caviumnetworks.com/)
on a Skylake server.
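
A minimal standalone sketch of the reordering (hypothetical names, not
the actual KVM internals): the virtual IPI is sent before the ICR
registers are written back, so delivery is not delayed by the
bookkeeping.

    #include <stdint.h>

    struct lapic { uint32_t icr, icr2; };

    /* stub: the real code would kick the target vCPU here */
    static void send_virtual_ipi(struct lapic *apic, uint64_t data)
    {
        (void)apic; (void)data;
    }

    static void icr_write_fastpath(struct lapic *apic, uint64_t data)
    {
        send_virtual_ipi(apic, data);         /* 1. deliver the IPI first */
        apic->icr2 = (uint32_t)(data >> 32);  /* 2. then update ICR2...   */
        apic->icr  = (uint32_t)data;          /* 3. ...and ICR            */
    }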

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1585189202-1708-4-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-26 05:58:26 -04:00
Wanpeng Li
8a1038de11 KVM: X86: Delay read msr data iff writes ICR MSR
Delay reading the MSR data until we identify that the guest is accessing
the ICR MSR, to avoid penalizing all other MSR writes.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1585189202-1708-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-26 05:58:25 -04:00
Wanpeng Li
e1be9ac8e6 KVM: X86: Narrow down the IPI fastpath to single target IPI
The original single target IPI fastpath patch forgot to filter the
ICR destination shorthand field. Multicast IPI is not suitable for
this feature, since waking up multiple sleeping vCPUs extends the
interrupt-disabled time; it is especially bad in over-subscribed
scenarios where the VM has a larger number of vCPUs. Let's narrow it
down to single target IPI.

With two VMs of 76 vCPUs each, one running 'ebizzy -M' and the other
running cyclictest on all vCPUs, this patch improves the average
cyclictest score by more than 5% (pv tlb, pv ipi and pv sched yield
were disabled during testing to avoid interference).
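
A minimal sketch of the shorthand filter; the mask values follow the
x86 APIC register layout, but the helper name is illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    #define APIC_MODE_MASK     0x00700  /* delivery mode field        */
    #define APIC_DM_FIXED      0x00000  /* fixed interrupt            */
    #define APIC_SHORT_MASK    0xc0000  /* destination shorthand bits */
    #define APIC_DEST_NOSHORT  0x00000  /* no shorthand: one target   */

    /* only take the fastpath for a plain fixed IPI to a single target */
    static bool is_single_target_fixed_ipi(uint32_t icr_low)
    {
        return (icr_low & APIC_MODE_MASK) == APIC_DM_FIXED &&
               (icr_low & APIC_SHORT_MASK) == APIC_DEST_NOSHORT;
    }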

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1585189202-1708-3-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-26 05:44:21 -04:00
Ingo Molnar
629b3df7ec Merge branch 'x86/cpu' into perf/core, to resolve conflict
Conflicts:
	arch/x86/events/intel/uncore.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-25 15:20:44 +01:00
Thomas Gleixner
320debe5ef x86/kvm: Convert to new CPU match macros
The new macro set has a consistent namespace and uses C99 initializers
instead of the grufty C89 ones.
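
A kernel-context sketch of the difference (the driver table and model
are placeholders, not taken from this patch):

    /* old C89 style: positional initializers, easy to misread */
    static const struct x86_cpu_id old_ids[] = {
        { X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X, X86_FEATURE_ANY, 0 },
        {}
    };

    /* new style: one namespaced macro per entry, with C99 designated
     * initializers behind the scenes */
    static const struct x86_cpu_id new_ids[] = {
        X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, NULL),
        {}
    };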

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131509.136884777@linutronix.de
2020-03-24 21:27:29 +01:00
Thomas Gleixner
ba5bade4cc x86/devicetable: Move x86 specific macro out of generic code
There is no reason that this gunk is in a generic header file. The wildcard
defines need to stay as they are required by file2alias.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131508.736205164@linutronix.de
2020-03-24 21:02:47 +01:00
Wanpeng Li
94be4b85d8 KVM: LAPIC: Also cancel preemption timer when disarm LAPIC timer
The timer is disarmed when switching between TSC deadline and other
modes, and everything should be set to the disarmed state. However, the
LAPIC timer can be emulated by the preemption timer, which keeps running
as long as vmx->hv_deadline_timer is not -1. This patch also cancels the
preemption timer when disarming the LAPIC timer.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1585031530-19823-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-24 07:25:20 -04:00
Sean Christopherson
4f6ea0a876 KVM: VMX: Gracefully handle faults on VMXON
Gracefully handle faults on VMXON, e.g. #GP due to VMX being disabled by
BIOS, instead of letting the fault crash the system.  Now that KVM uses
cpufeatures to query support instead of reading MSR_IA32_FEAT_CTL
directly, it's possible for a bug in a different subsystem to cause KVM
to incorrectly attempt VMXON[*].  Crashing the system is especially
annoying if the system is configured such that hardware_enable() will
be triggered during boot.

Opportunistically rename @addr to @vmxon_pointer and use a named param
to reference it in the inline assembly.

Print 0xdeadbeef in the ultra-"rare" case that reading MSR_IA32_FEAT_CTL
also faults.

[*] https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com
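
A kernel-context sketch, close to the shape of the fix: an exception
table entry turns a faulting VMXON into an error return instead of a
crash.

    static int kvm_cpu_vmxon(u64 vmxon_pointer)
    {
        u64 msr;

        asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
                          _ASM_EXTABLE(1b, %l[fault])
                          : : [vmxon_pointer] "m"(vmxon_pointer)
                          : : fault);
        return 0;

    fault:
        WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
                  rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
        return -EFAULT;
    }
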
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-4-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:44:26 -04:00
Sean Christopherson
d260f9ef50 KVM: VMX: Fold loaded_vmcs_init() into alloc_loaded_vmcs()
Subsume loaded_vmcs_init() into alloc_loaded_vmcs(), its only remaining
caller, and drop the VMCLEAR on the shadow VMCS, which is guaranteed to
be NULL.  loaded_vmcs_init() was previously used by loaded_vmcs_clear(),
but loaded_vmcs_clear() also subsumed loaded_vmcs_init() to properly
handle smp_wmb() with respect to VMCLEAR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-3-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:44:26 -04:00
Sean Christopherson
31603d4fc2 KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
interrupted a KVM update of the percpu in-use VMCS list.

Because NMIs are not blocked by disabling IRQs, it's possible that
crash_vmclear_local_loaded_vmcss() could be called while the percpu list
of VMCSes is being modified, e.g. in the middle of list_add() in
vmx_vcpu_load_vmcs().  This potential corner case was called out in the
original commit[*], but the analysis of its impact was wrong.

Skipping the VMCLEARs is wrong because it all but guarantees that a
loaded, and therefore cached, VMCS will live across kexec and corrupt
memory in the new kernel.  Corruption will occur because the CPU's VMCS
cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
memory on its eviction will overwrite random memory in the new kernel.
The VMCS will live because the NMI shootdown also disables VMX, i.e. the
in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
VMCS cache on VMXOFF.

Furthermore, interrupting list_add() and list_del() is safe due to
crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
ensures the new entry is not visible to forward iteration unless the
entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
pointer could be observed if the NMI shootdown interrupted list_del() or
list_add(), but list_for_each_entry() does not consume ->prev.

In addition to removing the temporary disabling of VMCLEAR, open code
loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
the VMCS is deleted from the list only after it's been VMCLEAR'd.
Deleting the VMCS before VMCLEAR would allow a race where the NMI
shootdown could arrive between list_del() and vmcs_clear() and thus
neither flow would execute a successful VMCLEAR.  Alternatively, more
code could be moved into loaded_vmcs_init(), but that gets rather silly
as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
and would need to work around the list_del().

Update the smp_*() comments related to the list manipulation, and
opportunistically reword them to improve clarity.

[*] https://patchwork.kernel.org/patch/1675731/#3720461
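
A standalone sketch of the publication ordering that keeps forward
iteration safe (generic code, not the actual VMX list):

    #define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

    struct vmcs_node { struct vmcs_node *next; };

    static void publish(struct vmcs_node *prev, struct vmcs_node *new)
    {
        /* fully initialize the new entry first; iterators can't see it */
        new->next = prev->next;
        /* a single store makes it visible, so an NMI walking the list
         * sees either the old list or the complete new one, never a
         * half-built entry */
        WRITE_ONCE(prev->next, new);
    }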

Fixes: 8f536b7697 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:44:25 -04:00
Zhenyu Wang
e3747407c4 KVM: x86: Expose fast short REP MOV for supported cpuid
For CPUs supporting fast short REP MOV (X86_FEATURE_FSRM), e.g. Ice Lake
and Tiger Lake, expose it in the KVM supported CPUID as well.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Message-Id: <20200323092236.3703-1-zhenyuw@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:44:24 -04:00
Nick Desaulniers
428b8f1d9f KVM: VMX: don't allow memory operands for inline asm that modifies SP
THUNK_TARGET defines [thunk_target] as having "rm" input constraints
when CONFIG_RETPOLINE is not set, which isn't constrained enough for
this specific case.

For inline assembly that modifies the stack pointer before using this
input, the underspecification of constraints is dangerous, and results
in an indirect call to a previously pushed flags register.

In this case `entry`'s stack slot is good enough to satisfy the "m"
constraint in "rm", but the inline assembly in
handle_external_interrupt_irqoff() modifies the stack pointer via
push+pushf before using this input, which in this case results in
calling what was the previous state of the flags register, rather than
`entry`.

Be more specific in the constraints by requiring `entry` be in a
register, and not a memory operand.
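
An illustrative fragment (not the actual handler) of why "r" is
required once the asm moves the stack pointer before the call:

    static void indirect_call(void *entry)
    {
        asm volatile("pushf\n\t"      /* %rsp moves here, so an "m"      */
                     "call *%0\n\t"   /* operand addressed via %rsp      */
                     "popf\n\t"       /* would now point at stale data   */
                     : : "r"(entry)   /* "r", not "rm": force a register */
                     : "memory", "cc", "rax", "rcx", "rdx", "rsi", "rdi",
                       "r8", "r9", "r10", "r11");
    }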

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+3f29ca2efb056a761e38@syzkaller.appspotmail.com
Debugged-by: Alexander Potapenko <glider@google.com>
Debugged-by: Paolo Bonzini <pbonzini@redhat.com>
Debugged-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Message-Id: <20200323191243.30002-1-ndesaulniers@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 15:40:51 -04:00
He Zhe
edec6e015a KVM: LAPIC: Mark hrtimer for period or oneshot mode to expire in hard interrupt context
apic->lapic_timer.timer was initialized with HRTIMER_MODE_ABS_HARD but
started later with HRTIMER_MODE_ABS, which may cause the following warning
in PREEMPT_RT kernel.

WARNING: CPU: 1 PID: 2957 at kernel/time/hrtimer.c:1129 hrtimer_start_range_ns+0x348/0x3f0
CPU: 1 PID: 2957 Comm: qemu-system-x86 Not tainted 5.4.23-rt11 #1
Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1a 09/18/2018
RIP: 0010:hrtimer_start_range_ns+0x348/0x3f0
Code: 4d b8 0f 94 c1 0f b6 c9 e8 35 f1 ff ff 4c 8b 45
      b0 e9 3b fd ff ff e8 d7 3f fa ff 48 98 4c 03 34
      c5 a0 26 bf 93 e9 a1 fd ff ff <0f> 0b e9 fd fc ff
      ff 65 8b 05 fa b7 90 6d 89 c0 48 0f a3 05 60 91
RSP: 0018:ffffbc60026ffaf8 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff9d81657d4110 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000006cc7987bcf RDI: ffff9d81657d4110
RBP: ffffbc60026ffb58 R08: 0000000000000001 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000000 R12: 0000006cc7987bcf
R13: 0000000000000000 R14: 0000006cc7987bcf R15: ffffbc60026d6a00
FS: 00007f401daed700(0000) GS:ffff9d81ffa40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000ffffffff CR3: 0000000fa7574000 CR4: 00000000003426e0
Call Trace:
? kvm_release_pfn_clean+0x22/0x60 [kvm]
start_sw_timer+0x85/0x230 [kvm]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
kvm_lapic_switch_to_sw_timer+0x72/0x80 [kvm]
vmx_pre_block+0x1cb/0x260 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_sync_pir_to_irr+0x9e/0x100 [kvm_intel]
? kvm_apic_has_interrupt+0x46/0x80 [kvm]
kvm_arch_vcpu_ioctl_run+0x85b/0x1fa0 [kvm]
? _raw_spin_unlock_irqrestore+0x18/0x50
? _copy_to_user+0x2c/0x30
kvm_vcpu_ioctl+0x235/0x660 [kvm]
? rt_spin_unlock+0x2c/0x50
do_vfs_ioctl+0x3e4/0x650
? __fget+0x7a/0xa0
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x4d/0x120
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4027cc54a7
Code: 00 00 90 48 8b 05 e9 59 0c 00 64 c7 00 26 00 00
      00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00
      00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff
      73 01 c3 48 8b 0d b9 59 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f401dae9858 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005558bd029690 RCX: 00007f4027cc54a7
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000d
RBP: 00007f4028b72000 R08: 00005558bc829ad0 R09: 00000000ffffffff
R10: 00005558bcf90ca0 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00005558bce1c840
--[ end trace 0000000000000002 ]--
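
A kernel-context fragment of the fix's shape (sketch): the mode passed
to hrtimer_start() must match the mode used at init time, otherwise
PREEMPT_RT's hard-irq expiry check fires the warning above.

    hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
                 HRTIMER_MODE_ABS_HARD);
    /* ... */
    hrtimer_start(&apic->lapic_timer.timer, expire_time,
                  HRTIMER_MODE_ABS_HARD);  /* was HRTIMER_MODE_ABS */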

Signed-off-by: He Zhe <zhe.he@windriver.com>
Message-Id: <1584687967-332859-1-git-send-email-zhe.he@windriver.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 09:01:14 -04:00
Tom Lendacky
2e2409afe5 KVM: SVM: Issue WBINVD after deactivating an SEV guest
Currently, CLFLUSH is used to flush SEV guest memory before the guest is
terminated (or a memory hotplug region is removed). However, CLFLUSH is
not enough to ensure that SEV guest tagged data is flushed from the cache.

With 33af3a7ef9 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations"), the
original WBINVD was removed. This then exposed crashes at random times
because of a cache flush race with a page that had both a hypervisor and
a guest tag in the cache.

Restore the WBINVD when destroying an SEV guest and add a WBINVD to the
svm_unregister_enc_region() function to ensure hotplug memory is flushed
when removed. The DF_FLUSH can still be avoided at this point.

Fixes: 33af3a7ef9 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c8bf9087ca3711c5770bdeaafa3e45b717dc5ef4.1584720426.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 09:01:04 -04:00
Paolo Bonzini
2da1ed62d5 KVM: SVM: document KVM_MEM_ENCRYPT_OP, let userspace detect if SEV is available
Userspace has no way to query if SEV has been disabled with the
sev module parameter of kvm-amd.ko.  Actually it has one, but it
is a hack: do ioctl(KVM_MEM_ENCRYPT_OP, NULL) and check if it
returns EFAULT.  Make it a little nicer by returning zero for
SEV enabled and NULL argument, and while at it document the
ioctl arguments.
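
A hypothetical userspace probe based on the behavior described above
(error handling elided):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

        /* NULL argument: 0 now means "SEV enabled"; an error means
         * disabled or unsupported */
        int r = ioctl(vm, KVM_MEMORY_ENCRYPT_OP, NULL);
        printf("SEV %s\n", r == 0 ? "enabled" : "disabled/unsupported");
        return 0;
    }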

Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-20 13:47:52 -04:00
Paolo Bonzini
d332945418 KVM: x86: remove bogus user-triggerable WARN_ON
The WARN_ON is essentially comparing a user-provided value with 0.  It is
trivial to trigger it just by passing garbage to KVM_SET_CLOCK.  Guests
can break if you do so, but the same applies to every KVM_SET_* ioctl.
So, if it hurts when you do this, just do not do it.

Reported-by: syzbot+00be5da1d75f1cc95f6b@syzkaller.appspotmail.com
Fixes: 9446e6fce0 ("KVM: x86: fix WARN_ON check of an unsigned less than zero")
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-20 13:43:21 -04:00
Xiaoyao Li
cf6c26ec7b KVM: x86: Code style cleanup in kvm_arch_dev_ioctl()
In kvm_arch_dev_ioctl(), the brackets of case KVM_X86_GET_MCE_CAP_SUPPORTED
accidentally encapsulate case KVM_GET_MSR_FEATURE_INDEX_LIST and case
KVM_GET_MSRS. This doesn't affect functionality, but it is misleading.

Remove unnecessary brackets and opportunistically add a "break" in the
default path.
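
A generic illustration of the brace problem (simplified, not the
actual kvm_arch_dev_ioctl() body):

    switch (cmd) {
    case KVM_X86_GET_MCE_CAP_SUPPORTED: {
        /* ... */
    case KVM_GET_MSR_FEATURE_INDEX_LIST:  /* lexically inside the   */
    case KVM_GET_MSRS:                    /* braces opened above!   */
        /* ... */
        break;
    }
    default:
        break;  /* the cleanup also adds a break here */
    }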

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18 14:05:45 +01:00
Sean Christopherson
2b110b6164 KVM: x86: Add blurb to CPUID tracepoint when using max basic leaf values
Tack on "used max basic" at the end of the CPUID tracepoint when the
output values correspond to the max basic leaf, i.e. when emulating
Intel's out-of-range CPUID behavior.  Observing "cpuid entry not found"
in the tracepoint with non-zero output values is confusing for users
that aren't familiar with the out-of-range semantics, and qualifying the
"not found" case hopefully makes it clear that "found" means "found the
exact entry".

Suggested-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18 13:44:43 +01:00
Sean Christopherson
e7adda2810 KVM: x86: Add requested index to the CPUID tracepoint
Output the requested index when tracing CPUID emulation; it's basically
mandatory for leafs where the index is meaningful, and is helpful for
verifying KVM correctness even when the index isn't meaningful, e.g. the
trace for a Linux guest's hypervisor_cpuid_base() probing appears to
be broken (returns all zeroes) at first glance, but is correct because
the index is non-zero, i.e. the output values correspond to a random
index in the maximum basic leaf.

Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18 13:44:18 +01:00
Paolo Bonzini
d55c9d4009 KVM: nSVM: check for EFER.SVME=1 before entering guest
EFER is set for L2 using svm_set_efer, which hardcodes EFER_SVME to 1 and hides
an incorrect value for EFER.SVME in the L1 VMCB.  Perform the check manually
to detect invalid guest state.
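
A sketch of the added check, assuming the usual nested-VMCB naming:

    /* VMRUN must fail if the L2 state has EFER.SVME clear */
    if (!(nested_vmcb->save.efer & EFER_SVME))
        return false;  /* invalid guest state */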

Reported-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18 13:41:32 +01:00
Zhenyu Wang
9401f2e5b0 KVM: x86: Expose AVX512 VP2INTERSECT in cpuid for TGL
The new AVX512 VP2INTERSECT feature is available on Tiger Lake.
Expose it via KVM_GET_SUPPORTED_CPUID.

Cc: "Zhong, Yang" <yang.zhong@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18 12:25:45 +01:00
Paolo Bonzini
96b100cd14 KVM: nVMX: remove side effects from nested_vmx_exit_reflected
The name of nested_vmx_exit_reflected suggests that it's purely
a test, but it actually marks VMCS12 pages as dirty.  Move this to
vmx_handle_exit, observing that the initial nested_run_pending check in
nested_vmx_exit_reflected is pointless---nested_run_pending has just
been cleared in vmx_vcpu_run and won't be set until handle_vmlaunch
or handle_vmresume.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18 12:16:39 +01:00
Uros Bizjak
bb03911f79 KVM: VMX: access regs array in vmenter.S in its natural order
Registers in the "regs" array are indexed as rax/rcx/rdx/.../rsi/rdi/r8/...
Reorder accesses to the "regs" array in vmenter.S to follow that natural order.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-17 14:14:32 +01:00
Vitaly Kuznetsov
b6a0653ae2 KVM: nVMX: properly handle errors in nested_vmx_handle_enlightened_vmptrld()
nested_vmx_handle_enlightened_vmptrld() fails in two cases:
- when we fail to kvm_vcpu_map() the supplied GPA
- when revision_id is incorrect.
Genuine Hyper-V raises #UD in the former case (at least with *some*
incorrect GPAs) and does VMfailInvalid() in the latter. KVM doesn't do
anything, so L1 just gets stuck retrying the same faulty VMLAUNCH.

nested_vmx_handle_enlightened_vmptrld() has two call sites:
nested_vmx_run() and nested_get_vmcs12_pages(). The latter doesn't need
to do much: the failure there happens after migration when L2 was running
(and L1 did something weird like writing to the VP assist page from a
different vCPU); just kill L1 with KVM_EXIT_INTERNAL_ERROR.

Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Squash kbuild autopatch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 18:19:30 +01:00
Vitaly Kuznetsov
e942dbf8c5 KVM: nVMX: stop abusing need_vmcs12_to_shadow_sync for eVMCS mapping
When vmx_set_nested_state() happens, we may not have all the required
data to map enlightened VMCS: e.g. HV_X64_MSR_VP_ASSIST_PAGE MSR may not
yet be restored so we need a postponed action. Currently, we (ab)use
need_vmcs12_to_shadow_sync/nested_sync_vmcs12_to_shadow() for that but
this is not ideal:
- We may not need to sync anything if L2 is running
- It is hard to propagate errors from nested_sync_vmcs12_to_shadow(),
 as we call it from vmx_prepare_switch_to_guest(), which happens just
 before we do VMLAUNCH; the code is not ready to handle errors there.

Move eVMCS mapping to nested_get_vmcs12_pages() and request
KVM_REQ_GET_VMCS12_PAGES; this seems to be less abusive in nature.
It would probably be possible to introduce a specialized
KVM_REQ_EVMCS_MAP, but it is undesirable to propagate eVMCS specifics
all the way up to x86.c.

Note, we don't need to request KVM_REQ_GET_VMCS12_PAGES from
vmx_set_nested_state() directly as nested_vmx_enter_non_root_mode() already
does that. Requesting KVM_REQ_GET_VMCS12_PAGES is done to document the
(non-obvious) side-effect and to be future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 18:19:29 +01:00
Paolo Bonzini
0c546725ee Merge branch 'kvm-null-pointer-fix' into HEAD 2020-03-16 17:59:11 +01:00
Miaohe Lin
0b66465344 KVM: nSVM: Remove an obsolete comment.
The function does not return bool anymore.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:59:00 +01:00
Paolo Bonzini
8e205a6b2a KVM: X86: correct meaningless kvm_apicv_activated() check
After test_and_set_bit() for kvm->arch.apicv_inhibit_reasons, calling
kvm_apicv_activated() will always return false, because
apicv_inhibit_reasons is guaranteed to be non-zero at that point.

What the code wants to do is check whether APICv was *already* active
and if so skip the costly request; we can do this using cmpxchg.
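
A standalone sketch of the idea, using C11 atomics in place of the
kernel's cmpxchg():

    #include <stdatomic.h>
    #include <stdbool.h>

    static _Atomic unsigned long inhibit_reasons;

    /* returns true iff APICv actually transitions, i.e. only then is
     * the costly activation/deactivation request needed */
    static bool set_inhibit(unsigned long bit, bool inhibit)
    {
        unsigned long old = atomic_load(&inhibit_reasons), new;

        do {
            new = inhibit ? (old | bit) : (old & ~bit);
            if (new == old)
                return false;  /* no change, skip the request */
        } while (!atomic_compare_exchange_weak(&inhibit_reasons,
                                               &old, new));

        return (old == 0) != (new == 0);  /* active iff mask is zero */
    }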

Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:59 +01:00
Oliver Upton
212617dbb6 KVM: nVMX: Consolidate nested MTF checks to helper function
commit 5ef8acbdd6 ("KVM: nVMX: Emulate MTF when performing
instruction emulation") introduced a helper to check the MTF
VM-execution control in vmcs12. Change the pre-existing check in
nested_vmx_exit_reflected() to use the helper instead.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:59 +01:00
Wanpeng Li
041bc42ce2 KVM: VMX: Micro-optimize vmexit time when not exposing PMU
Most cloud providers do not expose the PMU to guests, because of the
poor performance of PMU emulation and security concerns. However, KVM
calls perf_guest_switch_get_msrs() and clear_atomic_switch_msr()
unconditionally before each vmentry, even if the PMU is not exposed to
the guest.

A ~2% reduction in vmexit time can be observed with
kvm-unit-tests/vmexit.flat on my SKX server.

Before patch:
vmcall 1559

After patch:
vmcall 1529

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:58 +01:00
Suravee Suthikulpanit
ab56f8e62d kvm: svm: Introduce GA Log tracepoint for AVIC
The GA Log tracepoint is useful when debugging AVIC performance
issues, as it can be used with perf to count the number of times the
IOMMU AVIC injects interrupts through the slow path instead of
injecting them directly to the target vCPU.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:56 +01:00
Paolo Bonzini
78f2145c4d KVM: nSVM: avoid loss of pending IRQ/NMI before entering L2
This patch reproduces for nSVM the change that was made for nVMX in
commit b5861e5cf2 ("KVM: nVMX: Fix loss of pending IRQ/NMI before
entering L2").  While I do not have a test that breaks without it, I
cannot see why it would not be necessary since all events are unblocked
by VMRUN's setting of GIF back to 1.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:55 +01:00
Paolo Bonzini
b518ba9fa6 KVM: nSVM: implement check_nested_events for interrupts
The current implementation of physical interrupt delivery to a nested guest
is quite broken.  It relies on svm_interrupt_allowed returning false if
VINTR=1 so that the interrupt can be injected from enable_irq_window,
but this does not work for guests that do not intercept HLT or that rely
on clearing the host IF to block physical interrupts while L2 runs.

This patch could be split into two logical parts, but including only
one breaks tests, so I am combining both changes together.

The first and easiest is simply to return true for svm_interrupt_allowed
if HF_VINTR_MASK is set and HIF is set.  This way the semantics of
svm_interrupt_allowed are respected: svm_interrupt_allowed being false
does not mean "call enable_irq_window", it means "interrupts cannot
be injected now".

After doing this, however, we need another place to inject the
interrupt, and fortunately we already have one, check_nested_events,
which nested SVM does not implement but which is meant exactly for this
purpose.  It is called before interrupts are injected, and it can
therefore do the L2->L1 switch while leaving inject_pending_event
none the wiser.

This patch was developed together with Cathy Avery, who wrote the
test and did a lot of the initial debugging.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:54 +01:00
Paolo Bonzini
64b5bd2704 KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1
If a nested VM is started while an IRQ was pending and with
V_INTR_MASKING=1, the behavior of the guest depends on host IF.  If it
is 1, the VM should exit immediately, before executing the first
instruction of the guest, because VMRUN sets GIF back to 1.

If it is 0 and the host has VGIF, however, at the time of the VMRUN
instruction L0 is running the guest with a pending interrupt window
request.  This interrupt window request is completely irrelevant to
L2, since IF only controls virtual interrupts, so this patch drops
INTERCEPT_VINTR from the VMCB while running L2 under these circumstances.
To simplify the code, both steps of enabling the interrupt window
(setting the VINTR intercept and requesting a fake virtual interrupt
in svm_inject_irq) are grouped in the svm_set_vintr function, and
likewise for dismissing the interrupt window request in svm_clear_vintr.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:53 +01:00
Paolo Bonzini
b5ec2e020b KVM: nSVM: do not change host intercepts while nested VM is running
Instead of touching the host intercepts so that the bitwise OR in
recalc_intercepts just works, mask away uninteresting intercepts
directly in recalc_intercepts.

This is cleaner and keeps the logic in one place even for intercepts
that can change even while L2 is running.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:53 +01:00
Paolo Bonzini
727a7e27cf KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgd
The set_cr3 callback is not setting the guest CR3, it is setting the
root of the guest page tables, either shadow or two-dimensional.
To make this clearer as well as to indicate that the MMU calls it
via kvm_mmu_load_cr3, rename it to load_mmu_pgd.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:52 +01:00
Paolo Bonzini
689f3bf216 KVM: x86: unify callbacks to load paging root
Similar to what kvm-intel.ko is doing, provide a single callback that
merges svm_set_cr3, set_tdp_cr3 and nested_svm_set_tdp_cr3.

This lets us unify the set_cr3 and set_tdp_cr3 entries in kvm_x86_ops.
I'm doing that in this same patch because splitting it adds quite a bit
of churn due to the need for forward declarations.  For the same reason
the assignment to vcpu->arch.mmu->set_cr3 is moved to kvm_init_shadow_mmu
from init_kvm_softmmu and nested_svm_init_mmu_context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:51 +01:00
Sean Christopherson
f91af5176c KVM: x86: Refactor kvm_cpuid() param that controls out-of-range logic
Invert and rename the kvm_cpuid() param that controls out-of-range logic
to better reflect the semantics of the affected callers, i.e. callers
that bypass the out-of-range logic do so because they are looking up an
exact guest CPUID entry, e.g. to query the maxphyaddr.

Similarly, rename kvm_cpuid()'s internal "found" to "exact" to clarify
that it tracks whether or not the exact requested leaf was found, as
opposed to any usable leaf being found.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:50 +01:00
Sean Christopherson
09c7431ed3 KVM: x86: Refactor out-of-range logic to contain the madness
Move all of the out-of-range logic into a single helper,
get_out_of_range_cpuid_entry(), to avoid an extra lookup of CPUID.0.0
and to provide a single location for documenting the out-of-range
behavior.

No functional change intended.

Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:49 +01:00
Sean Christopherson
8d8923115f KVM: x86: Fix CPUID range checks for Hypervisor and Centaur classes
Rework the masking in the out-of-range CPUID logic to handle the
Hypervisor sub-classes, as well as the Centaur class if the guest
virtual CPU vendor is Centaur.

Masking against 0x80000000 only handles basic and extended leafs, which
results in Hypervisor range checks being performed against the basic
CPUID class, and Centaur range checks being performed against the
Extended class.  E.g. if CPUID.0x40000000.EAX returns 0x4000000A and
there is no entry for CPUID.0x40000006, then function 0x40000006 would
be incorrectly reported as out of bounds.

While there is no official definition of what constitutes a class, the
convention established for Hypervisor classes effectively uses bits 31:8
as the mask by virtue of checking for different bases in increments of
0x100, e.g. KVM advertises its CPUID functions starting at 0x40000100
when HyperV features are advertised at the default base of 0x40000000.

The bad range check doesn't cause functional problems for any known VMM
because out-of-range semantics only come into play if the exact entry
isn't found, and VMMs either support a very limited Hypervisor range,
e.g. the official KVM range is 0x40000000-0x40000001 (effectively no
room for undefined leafs) or explicitly defines gaps to be zero, e.g.
Qemu explicitly creates zeroed entries up to the Centaur and Hypervisor
limits (the latter comes into play when providing HyperV features).

The bad behavior can be visually confirmed by dumping CPUID output in
the guest when running Qemu with a stable TSC, as Qemu extends the limit
of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq,
without defining zeroed entries for 0x40000002 - 0x4000000f.

Note, documentation of Centaur/VIA CPUs is hard to come by.  Designating
0xc0000000 - 0xcfffffff as the Centaur class is a best guess as to the
behavior of a real Centaur/VIA CPU.
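
A standalone sketch of the class computation (simplified; the Centaur
special case is omitted):

    #include <stdint.h>

    static uint32_t cpuid_class(uint32_t function)
    {
        /* hypervisor ranges 0x40000000-0x4fffffff use 0x100-sized
         * bases, so bits 31:8 identify the class */
        if ((function & 0xc0000000) == 0x40000000)
            return function & 0xffffff00;
        /* otherwise only basic (0x0...) vs. extended (0x8...) */
        return function & 0x80000000;
    }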

Fixes: 43561123ab ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH")
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:49 +01:00
Sean Christopherson
23493d0a17 KVM x86: Extend AMD specific guest behavior to Hygon virtual CPUs
Extend guest_cpuid_is_amd() to cover Hygon virtual CPUs and rename it
accordingly.  Hygon CPUs use an AMD-based core and so have the same
basic behavior as AMD CPUs.

Fixes: b8f4abb652 ("x86/kvm: Add Hygon Dhyana support to KVM")
Cc: Pu Wen <puwen@hygon.cn>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:48 +01:00
Sean Christopherson
15608ed03f KVM: x86: Add helpers to perform CPUID-based guest vendor check
Add helpers to provide CPUID-based guest vendor checks, i.e. to do the
ugly register comparisons.  Use the new helpers to check for an AMD
guest vendor in guest_cpuid_is_amd() as well as in the existing emulator
flows.

Using the new helpers fixes a _very_ theoretical bug where
guest_cpuid_is_amd() would get a false positive on a non-AMD virtual CPU
with a vendor string beginning with "Auth" due to the previous logic
only checking EBX.  It also fixes a marginally less theoretical bug
where guest_cpuid_is_amd() would incorrectly return false for a guest
CPU with "AMDisbetter!" as its vendor string.
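
A standalone sketch of such a helper; the constants spell
"AuthenticAMD" in the little-endian register layout:

    #include <stdbool.h>
    #include <stdint.h>

    static bool is_guest_vendor_amd(uint32_t ebx, uint32_t ecx,
                                    uint32_t edx)
    {
        return ebx == 0x68747541 &&  /* "Auth" */
               edx == 0x69746e65 &&  /* "enti" */
               ecx == 0x444d4163;    /* "cAMD" */
    }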

Fixes: a0c0feb579 ("KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:47 +01:00
Jan Kiszka
b7fb8488c8 KVM: x86: Trace the original requested CPUID function in kvm_cpuid()
Trace the requested CPUID function instead of the effective function,
e.g. if the requested function is out-of-range and KVM is emulating an
Intel CPU, as the intent of the tracepoint is to show if the output came
from the actual leaf as opposed to the max basic leaf via redirection.

Similarly, leave "found" as is, i.e. report that an entry was found if
and only if the requested entry was found.

Fixes: 43561123ab ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
[Sean: Drop "found" semantic change, reword changelog accordingly]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:46 +01:00
Paolo Bonzini
408e9a318f KVM: CPUID: add support for supervisor states
Current CPUID 0xd enumeration code does not support supervisor
states, because KVM only supports setting IA32_XSS to zero.
Change it instead to use a new variable supported_xss, to be
set from the hardware_setup callback which is in charge of CPU
capabilities.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:45 +01:00
Sean Christopherson
257038745c KVM: x86: Move nSVM CPUID 0x8000000A handling into common x86 code
Handle CPUID 0x8000000A in the main switch in __do_cpuid_func() and drop
->set_supported_cpuid() now that both VMX and SVM implementations are
empty.  Like leaf 0x14 (Intel PT) and leaf 0x8000001F (SEV), leaf
0x8000000A is (obviously) vendor specific but can be queried in
common code while respecting SVM's wishes by querying kvm_cpu_cap_has().

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:45 +01:00
Sean Christopherson
4eb87460c4 KVM: nSVM: Advertise and enable NRIPS for L1 iff nrips is enabled
Set NRIPS in KVM capabilities if and only if nrips=true, which naturally
incorporates the boot_cpu_has() check, and set nrips_enabled only if the
KVM capability is enabled.

Note, previously KVM would set nrips_enabled based purely on userspace
input, but at worst that would cause KVM to propagate garbage into L1,
i.e. userspace would simply be hosing its VM.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:44 +01:00
Sean Christopherson
a50718cc3f KVM: nSVM: Expose SVM features to L1 iff nested is enabled
Set SVM feature bits in KVM capabilities if and only if nested=true, KVM
shouldn't advertise features that realistically can't be used.  Use
kvm_cpu_cap_has(X86_FEATURE_SVM) to indirectly query "nested" in
svm_set_supported_cpuid() in anticipation of moving CPUID 0x8000000A
adjustments into common x86 code.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:43 +01:00
Sean Christopherson
91661989d1 KVM: x86: Move VMX's host_efer to common x86 code
Move host_efer to common x86 code and use it for CPUID's is_efer_nx() to
avoid constantly re-reading the MSR.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:42 +01:00
Sean Christopherson
600087b614 KVM: Drop largepages_enabled and its accessor/mutator
Drop largepages_enabled, kvm_largepages_enabled() and
kvm_disable_largepages() now that all users are gone.

Note, largepages_enabled was an x86-only flag that got left in common
KVM code when KVM gained support for multiple architectures.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:42 +01:00
Sean Christopherson
e884b854ee KVM: x86: Don't propagate MMU lpage support to memslot.disallow_lpage
Stop propagating MMU large page support into a memslot's disallow_lpage
now that the MMU's max_page_level handles the scenario where VMX's EPT is
enabled and EPT doesn't support 2M pages.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:41 +01:00
Sean Christopherson
703c335d06 KVM: x86/mmu: Configure max page level during hardware setup
Configure the max page level during hardware setup to avoid a retpoline
in the page fault handler.  Drop ->get_lpage_level() as the page fault
handler was the last user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:40 +01:00
Sean Christopherson
bde7723559 KVM: x86/mmu: Merge kvm_{enable,disable}_tdp() into a common function
Combine kvm_enable_tdp() and kvm_disable_tdp() into a single function,
kvm_configure_mmu(), in preparation for doing additional configuration
during hardware setup.  And because having separate helpers is silly.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:39 +01:00
Sean Christopherson
213e0e1f50 KVM: SVM: Refactor logging of NPT enabled/disabled
Tweak SVM's logging of NPT enabled/disabled to handle the logging in a
single pr_info() in preparation for merging kvm_enable_tdp() and
kvm_disable_tdp() into a single function.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:38 +01:00
Sean Christopherson
a1bead2aba KVM: VMX: Directly query Intel PT mode when refreshing PMUs
Use vmx_pt_mode_is_host_guest() in intel_pmu_refresh() instead of
bouncing through kvm_x86_ops->pt_supported, and remove ->pt_supported()
as the PMU code was the last remaining user.

Opportunistically clean up the wording of a comment that referenced
kvm_x86_ops->pt_supported().

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:38 +01:00
Sean Christopherson
7b874c26a6 KVM: x86: Check for Intel PT MSR virtualization using KVM cpu caps
Use kvm_cpu_cap_has() to check for Intel PT when processing the list of
virtualized MSRs to pave the way toward removing ->pt_supported().

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:37 +01:00
Sean Christopherson
a7a200eb4c KVM: VMX: Directly use VMX capabilities helper to detect RDTSCP support
Use cpu_has_vmx_rdtscp() directly when computing secondary exec controls
and drop the now defunct vmx_rdtscp_supported().

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:36 +01:00
Sean Christopherson
139085101f KVM: x86: Use KVM cpu caps to detect MSR_TSC_AUX virt support
Check for MSR_TSC_AUX virtualization via kvm_cpu_cap_has() and drop
->rdtscp_supported().

Note, vmx_rdtscp_supported() needs to hang around a tiny bit longer due
to other usage in VMX code.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:35 +01:00
Sean Christopherson
7c7f954810 KVM: x86: Do kvm_cpuid_array capacity checks in terminal functions
Perform the capacity checks on the userspace provided kvm_cpuid_array
in the lower __do_cpuid_func() and __do_cpuid_func_emulated().
Pre-checking the array in do_cpuid_func() no longer adds value now that
__do_cpuid_func() has been trimmed down to size, i.e. doesn't invoke a
big pile of retpolined functions before doing anything useful.

Note, __do_cpuid_func() already checks the array capacity via
do_host_cpuid(), "moving" the check to __do_cpuid_func() simply means
removing a WARN_ON().

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:35 +01:00
Sean Christopherson
dd69cc2542 KVM: x86: Use kvm_cpu_caps to detect Intel PT support
Check for Intel PT using kvm_cpu_cap_has() to pave the way toward
eliminating ->pt_supported().

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:34 +01:00
Sean Christopherson
93c380e7b5 KVM: x86: Set emulated/transmuted feature bits via kvm_cpu_caps
Set emulated and transmuted (set based on other features) feature bits
via kvm_cpu_caps now that the CPUID output for KVM_GET_SUPPORTED_CPUID
is directly overridden with kvm_cpu_caps.

Note, VMX emulation of UMIP already sets kvm_cpu_caps.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:33 +01:00
Sean Christopherson
bd79199990 KVM: x86: Override host CPUID results with kvm_cpu_caps
Override CPUID entries with kvm_cpu_caps during KVM_GET_SUPPORTED_CPUID
instead of masking the host CPUID result, which is redundant now that
the host CPUID is incorporated into kvm_cpu_caps at runtime.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:32 +01:00
Sean Christopherson
d8577a4c23 KVM: x86: Do host CPUID at load time to mask KVM cpu caps
Mask kvm_cpu_caps based on host CPUID in preparation for overriding the
CPUID results during KVM_GET_SUPPORTED_CPUID instead of doing the
masking at runtime.

Note, masking may or may not be necessary, e.g. the kernel rarely, if
ever, sets real CPUID bits that are not supported by hardware.  But, the
code is cheap and only runs once at load, so an abundance of caution is
warranted.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:31 +01:00
Sean Christopherson
7ff6c03503 KVM: x86: Remove stateful CPUID handling
Remove the code for handling stateful CPUID 0x2 and mark the associated
flags as deprecated.  WARN if host CPUID 0x2.0.AL > 1, i.e. if by some
miracle a host with stateful CPUID 0x2 is encountered.

No known CPU exists that supports hardware accelerated virtualization
_and_ a stateful CPUID 0x2.  Barring an extremely contrived nested
virtualization scenario, stateful CPUID support is dead code.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:31 +01:00
Sean Christopherson
c571a144ef KVM: x86: Squash CPUID 0x2.0 insanity for modern CPUs
Rework CPUID 0x2.0 to be a normal CPUID leaf if it returns "01" in AL,
i.e. EAX & 0xff, as a step towards removing KVM's stateful CPUID code
altogether.

Long ago, Intel documented CPUID 0x2.0 as being a stateful leaf, e.g. a
version of the SDM circa 1995 states:

  The least-significant byte in register EAX (register AL) indicates the
  number of times the CPUID instruction must be executed with an input
  value of 2 to get a complete description of the processor's caches
  and TLBs.  The Pentium Pro family of processors will return a 1.

A 2000 version of the SDM only updated the paragraph to reference
Intel's new processor family:

  The first member of the family of Pentium 4 processors will return a 1.

Fast forward to the present, and Intel's SDM now states:

  The least-significant byte in register EAX (register AL) will always
  return 01H.  Software should ignore this value and not interpret it as
  an information descriptor.

AMD's APM simply states that CPUID 0x2 is reserved.

Given that CPUID itself was introduced in the Pentium, odds are good
that the only Intel CPU family that *maybe* implemented a stateful CPUID
was the P5.  Which obviously did not support VMX, or KVM.

In other words, KVM's emulation of a stateful CPUID 0x2.0 has likely
been dead code from the day it was introduced.  This is backed up by
commit 0fdf8e59fa ("KVM: Fix cpuid iteration on multiple leaves per
eac"), which shows that the stateful iteration code was completely
broken when it was introduced by commit 0771671749 ("KVM: Enhance
guest cpuid management"), i.e. not actually tested.

Annotate all stateful code paths as "unlikely", but defer its removal to
a future patch to simplify reinstating the code if by some miracle there
is someone running KVM on a CPU with a stateful CPUID 0x2.
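
A standalone probe confirming the modern behavior quoted above from
the SDM:

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (__get_cpuid(2, &eax, &ebx, &ecx, &edx))
            printf("CPUID.0x2 AL = %u\n", eax & 0xff);  /* expect 1 */
        return 0;
    }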

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:30 +01:00
Sean Christopherson
bcf600ca8d KVM: x86: Remove the unnecessary loop on CPUID 0x7 sub-leafs
Explicitly handle CPUID 0x7 sub-leaf 1.  The kernel is currently aware
of exactly one feature in CPUID 0x7.1, which means there is room for
another 127 features before CPUID 0x7.2 will see the light of day, i.e.
the looping is likely to be dead code for years to come.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:29 +01:00
Sean Christopherson
09f628a0b4 KVM: x86: Fold CPUID 0x7 masking back into __do_cpuid_func()
Move the CPUID 0x7 masking back into __do_cpuid_func() now that the
size of the code has been trimmed down significantly.

Tweak the WARN case, which is impossible to hit unless the CPU is
completely broken, to break the loop before creating the bogus entry.

Opportunistically reorder the cpuid_entry_set() calls and shorten the
comment about emulation to further reduce the footprint of CPUID 0x7.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:28 +01:00
Sean Christopherson
90d2f60f41 KVM: x86: Use KVM cpu caps to track UMIP emulation
Set UMIP in kvm_cpu_caps when it is emulated by VMX, even though the
bit will effectively be dropped by do_host_cpuid().  This allows
checking for UMIP emulation via kvm_cpu_caps instead of a dedicated
kvm_x86_ops callback.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:28 +01:00
Sean Christopherson
c10398b6d0 KVM: x86: Use KVM cpu caps to mark CR4.LA57 as not-reserved
Add accessor(s) for KVM cpu caps and use said accessor to detect
hardware support for LA57 instead of manually querying CPUID.

Note, the explicit conversion to bool via '!!' in kvm_cpu_cap_has() is
technically unnecessary, but it gives people a warm fuzzy feeling.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:27 +01:00
Sean Christopherson
8721f5b061 KVM: x86: Add a helper to check kernel support when setting cpu cap
Add a helper, kvm_cpu_cap_check_and_set(), to query boot_cpu_has() as
part of setting a KVM cpu capability.  VMX in particular has a number of
features that are dependent on both a VMCS capability and kernel
support.
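
The helper is essentially (sketch; the call site shown is
illustrative):

  static __always_inline void kvm_cpu_cap_check_and_set(unsigned int x86_feature)
  {
          if (boot_cpu_has(x86_feature))
                  kvm_cpu_cap_set(x86_feature);
  }

  /* Illustrative VMX call site: VMCS capability + kernel support. */
  if (cpu_has_vmx_invpcid())
          kvm_cpu_cap_check_and_set(X86_FEATURE_INVPCID);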

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:26 +01:00
Sean Christopherson
b3d895d5c4 KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap update
Move the clearing of the XSAVES CPUID bit into VMX, which has a separate
VMCS control to enable XSAVES in non-root, to eliminate the last ugly
remnant of the undesirable "unsigned f_* = *_supported ? F(*) : 0"
pattern in the common CPUID handling code.

Drop ->xsaves_supported(), CPUID adjustment was the only user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:25 +01:00
Sean Christopherson
3ec6fd8cf0 KVM: VMX: Convert feature updates from CPUID to KVM cpu caps
Use the recently introduced KVM CPU caps to propagate VMX-only (kernel)
settings to supported CPUID flags.
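
The conversion funnels the VMX-only adjustments into a single load-time
hook, roughly (a sketch with illustrative features):

  static __init void vmx_set_cpu_caps(void)
  {
          /* Clear bits whose VMCS controls are unavailable... */
          if (!cpu_has_vmx_rdtscp())
                  kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);

          /* ...and set bits that VMX emulates or virtualizes. */
          if (vmx_umip_emulated())
                  kvm_cpu_cap_set(X86_FEATURE_UMIP);
  }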

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:24 +01:00
Sean Christopherson
9b58b9857f KVM: SVM: Convert feature updates from CPUID to KVM cpu caps
Use the recently introduced KVM CPU caps to propagate SVM-only (kernel)
settings to supported CPUID flags.

Note, there are a few subtleties:

  - Setting a flag based on a *different* feature is effectively
    emulation, and must be done at runtime via ->set_supported_cpuid().

  - CPUID 0x8000000A.EDX is a feature leaf that was previously not
    adjusted by kvm_cpu_cap_mask() because all features are hidden by
    default.

Opportunistically add a technically unnecessary break and fix an
indentation issue in svm_set_supported_cpuid().
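
For the first subtlety, the runtime adjustment looks roughly like
(sketch; the exact feature pairing is illustrative):

  /* Setting VIRT_SSBD based on *different* host features is
   * emulation, so it stays in ->set_supported_cpuid(). */
  if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
      boot_cpu_has(X86_FEATURE_AMD_SSBD))
          cpuid_entry_set(entry, X86_FEATURE_VIRT_SSBD);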

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:24 +01:00
Sean Christopherson
66a6950f99 KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking
Calculate the CPUID masks for KVM_GET_SUPPORTED_CPUID at load time using
what is effectively a KVM-adjusted copy of boot_cpu_data, or more
precisely, the x86_capability array in boot_cpu_data.

In terms of KVM support, the vast majority of CPUID feature bits are
constant, and *all* feature support is known at KVM load time.  Rather
than apply boot_cpu_data, which is effectively read-only after init,
at runtime, copy it into a KVM-specific array and use *that* to mask
CPUID registers.

In addition to consolidating the masking, kvm_cpu_caps can be adjusted
by SVM/VMX at load time and thus eliminate all feature bit manipulation
in ->set_supported_cpuid().
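
The gist, as a sketch (array sizing and the mask value shown are
illustrative):

  u32 kvm_cpu_caps[NCAPINTS] __read_mostly;

  void kvm_set_cpu_caps(void)
  {
          /* Start from the kernel's view of the hardware... */
          memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
                 sizeof(kvm_cpu_caps));

          /* ...then drop everything KVM doesn't support, per leaf. */
          kvm_cpu_cap_mask(CPUID_1_ECX, F(XMM3) | F(PCLMULQDQ));
  }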

Opportunistically clean up a few warts:

  - Replace bare "unsigned" with "unsigned int" when a feature flag is
    captured in a local variable, e.g. f_nx.

  - Sort the CPUID masks by function, index and register (alphabetically
    for registers, i.e. EBX comes before ECX/EDX).

  - Remove the superfluous /* cpuid 7.0.ecx */ comments.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Call kvm_set_cpu_caps from kvm_x86_ops->hardware_setup due to the fixed
 GBPAGES patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:23 +01:00
Sean Christopherson
9e6d01c2d9 KVM: x86: Refactor handling of XSAVES CPUID adjustment
Invert the handling of XSAVES, i.e. set it based on boot_cpu_has() by
default, in preparation for adding KVM cpu caps, which will generate the
mask at load time before ->xsaves_supported() is ready.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:22 +01:00
Paolo Bonzini
fb7d4377d5 KVM: x86: handle GBPAGE CPUID adjustment for EPT with generic code
The clearing of the GBPAGE CPUID bit for VMX is wrong; support for 1GB
pages in EPT has no relationship to whether 1GB pages should be marked as
supported in CPUID.  This has no ill effect because we're only clearing
the bit, but we're not marking 1GB pages as available when EPT is disabled
(even though they are actually supported thanks to shadowing).  Instead,
forcibly enable 1GB pages in the shadow paging case.

This also eliminates an instance of the undesirable "unsigned f_* =
*_supported ? F(*) : 0" pattern in the common CPUID handling code,
and paves the way toward eliminating ->get_lpage_level().
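
Sketch of the generic handling (using the cpuid_entry mutators already
in place at this point; the exact placement is in the diff):

  /* 1GB guest pages always work with shadow paging (TDP disabled),
   * regardless of host support, so forcibly report GBPAGES. */
  if (!tdp_enabled)
          cpuid_entry_set(entry, X86_FEATURE_GBPAGES);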

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:21 +01:00
Sean Christopherson
dbd068040c KVM: x86: Handle Intel PT CPUID adjustment in VMX code
Move the Processor Trace CPUID adjustment into VMX code to eliminate
an instance of the undesirable "unsigned f_* = *_supported ? F(*) : 0"
pattern in the common CPUID handling code, and to pave the way toward
eventually removing ->pt_supported().

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:20 +01:00
Sean Christopherson
733deafc00 KVM: x86: Handle RDTSCP CPUID adjustment in VMX code
Move the clearing of the RDTSCP CPUID bit into VMX, which has a separate
VMCS control to enable RDTSCP in non-root, to eliminate an instance of
the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the
common CPUID handling code.  Drop ->rdtscp_supported() since CPUID
adjustment was the last remaining user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:20 +01:00
Sean Christopherson
d64d83d1e0 KVM: x86: Handle PKU CPUID adjustment in VMX code
Move the setting of the PKU CPUID bit into VMX to eliminate an instance
of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in
the common CPUID handling code.  Drop ->pku_supported(), CPUID
adjustment was the only user.

Note, some AMD CPUs now support PKU, but SVM doesn't yet support
exposing it to a guest.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:19 +01:00
Sean Christopherson
e574768f84 KVM: x86: Handle UMIP emulation CPUID adjustment in VMX code
Move the CPUID adjustment for UMIP emulation into VMX code to eliminate
an instance of the undesirable "unsigned f_* = *_supported ? F(*) : 0"
pattern in the common CPUID handling code.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:18 +01:00
Sean Christopherson
5ffec6f910 KVM: x86: Handle INVPCID CPUID adjustment in VMX code
Move the INVPCID CPUID adjustments into VMX to eliminate an instance of
the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the
common CPUID handling code.  Drop ->invpcid_supported(), CPUID
adjustment was the only user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:17 +01:00
Sean Christopherson
6c7ea4b56b KVM: x86: Handle MPX CPUID adjustment in VMX code
Move the MPX CPUID adjustments into VMX to eliminate an instance of the
undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the
common CPUID handling code.

Note, to maintain existing behavior, VMX must manually check for kernel
support for MPX by querying boot_cpu_has(X86_FEATURE_MPX).  Previously,
do_cpuid_7_mask() masked MPX based on boot_cpu_data by invoking
cpuid_mask() on the associated cpufeatures word, but cpuid_mask() runs
prior to executing vmx_set_supported_cpuid().
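
The VMX-side adjustment therefore pairs the VMCS check with an explicit
kernel-support check, roughly (sketch):

  /* Both VMCS support and kernel MPX support are required. */
  if (kvm_mpx_supported() && boot_cpu_has(X86_FEATURE_MPX))
          cpuid_entry_set(entry, X86_FEATURE_MPX);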

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:17 +01:00
Sean Christopherson
e745e37d49 KVM: x86: Refactor cpuid_mask() to auto-retrieve the register
Use the recently introduced cpuid_entry_get_reg() to automatically get
the appropriate register when masking a CPUID entry.
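
E.g. the masking helper shrinks to roughly (sketch):

  static __always_inline void cpuid_entry_mask(struct kvm_cpuid_entry2 *entry,
                                               enum cpuid_leafs leaf)
  {
          u32 *reg = cpuid_entry_get_reg(entry, leaf * 32);

          *reg &= boot_cpu_data.x86_capability[leaf];
  }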

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:16 +01:00
Sean Christopherson
b32666b13a KVM: x86: Introduce cpuid_entry_{change,set,clear}() mutators
Introduce mutators to modify feature bits in CPUID entries and use the
new mutators where applicable.  Using the mutators eliminates the need
to manually specify the register to modify, at no extra cost, and will
allow adding runtime consistency checks on the function/index in a
future patch.
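
Sketch of the core mutator; set/clear are thin wrappers around it:

  static __always_inline void cpuid_entry_change(struct kvm_cpuid_entry2 *entry,
                                                 unsigned int x86_feature,
                                                 bool set)
  {
          u32 *reg = cpuid_entry_get_reg(entry, x86_feature);

          if (set)
                  *reg |= __feature_bit(x86_feature);
          else
                  *reg &= ~__feature_bit(x86_feature);
  }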

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:15 +01:00
Sean Christopherson
4c61534aaa KVM: x86: Introduce cpuid_entry_{get,has}() accessors
Introduce accessors to retrieve feature bits from CPUID entries and use
the new accessors where applicable.  Using the accessors eliminates the
need to manually specify the register to be queried at no extra cost
(binary output is identical) and will allow adding runtime consistency
checks on the function and index in a future patch.
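
Sketch of the pair (cpuid_entry_get_reg() resolves the register from
the feature's reverse-CPUID metadata):

  static __always_inline u32 cpuid_entry_get(struct kvm_cpuid_entry2 *entry,
                                             unsigned int x86_feature)
  {
          u32 *reg = cpuid_entry_get_reg(entry, x86_feature);

          return *reg & __feature_bit(x86_feature);
  }

  static __always_inline bool cpuid_entry_has(struct kvm_cpuid_entry2 *entry,
                                              unsigned int x86_feature)
  {
          return !!cpuid_entry_get(entry, x86_feature);
  }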

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:14 +01:00
Sean Christopherson
5e12b2bb34 KVM: x86: Replace bare "unsigned" with "unsigned int" in cpuid helpers
Replace "unsigned" with "unsigned int" to make checkpatch and people
everywhere a little bit happier, and to avoid propagating the filth when
future patches add more cpuid helpers that work with unsigned (ints).

No functional change intended.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:13 +01:00
Sean Christopherson
3be5a60b45 KVM: x86: Use u32 for holding CPUID register value in helpers
Change the intermediate CPUID output register values from "int" to "u32"
to match both hardware and the storage type in struct cpuid_reg.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:13 +01:00
Sean Christopherson
160b486f65 KVM: x86: Drop explicit @func param from ->set_supported_cpuid()
Drop the explicit @func param from ->set_supported_cpuid() and instead
pull the CPUID function from the relevant entry.  This sets the stage
for hardening guest CPUID updates in future patches, e.g. allows adding
run-time assertions that the CPUID feature being changed is actually
a bit in the referenced CPUID entry.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:12 +01:00
Sean Christopherson
7392079c4e KVM: x86: Clear output regs for CPUID 0x14 if PT isn't exposed to guest
Clear the output regs for the main CPUID 0x14 leaf (index=0) if Intel PT
isn't exposed to the guest.  Leaf 0x14 enumerates Intel PT capabilities
and should return zeroes if PT is not supported.  Incorrectly reporting
PT capabilities is essentially a cosmetic error, i.e. doesn't negatively
affect any known userspace/kernel, as the existence of PT itself is
correctly enumerated via CPUID 0x7.
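
The fix is a straightforward early-out, roughly (sketch; f_intel_pt per
the "f_*" pattern still in place at this point):

  case 0x14:
          /* PT isn't exposed to the guest: zero the whole leaf. */
          if (!f_intel_pt) {
                  entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
                  break;
          }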

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:11 +01:00
Sean Christopherson
615a4ae1c7 KVM: x86: Make kvm_mpx_supported() an inline function
Expose kvm_mpx_supported() as a static inline so that it can be inlined
in kvm_intel.ko.

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:10 +01:00
Sean Christopherson
7f5581f592 KVM: x86: Use supported_xcr0 to detect MPX support
Query supported_xcr0 when checking for MPX support instead of invoking
->mpx_supported() and drop ->mpx_supported() as kvm_mpx_supported() was
its last user.  Rename vmx_mpx_supported() to cpu_has_vmx_mpx() to
better align with VMX/VMCS nomenclature.

Modify VMX's adjustment of xcr0 to call cpu_has_vmx_mpx() (renamed from
vmx_mpx_supported()) directly to avoid reading supported_xcr0 before
it's fully configured.
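
With Paolo's tweak, the check requires *both* MPX xfeature bits, e.g.
(sketch):

  static inline bool kvm_mpx_supported(void)
  {
          /* MPX is usable only if BNDREGS *and* BNDCSR are exposed. */
          return (supported_xcr0 & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR))
                  == (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
  }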

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Test that *all* bits are set. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:10 +01:00
Sean Christopherson
cfc481810c KVM: x86: Calculate the supported xcr0 mask at load time
Add a new global variable, supported_xcr0, to track which xcr0 bits can
be exposed to the guest instead of calculating the mask on every call.
The supported bits are constant for a given instance of KVM.

This paves the way toward eliminating the ->mpx_supported() call in
kvm_mpx_supported(), e.g. eliminates multiple retpolines in VMX's nested
VM-Enter path, and eventually toward eliminating ->mpx_supported()
altogether.
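
Sketch of the load-time computation (assumed shape):

  u64 __read_mostly supported_xcr0;

  /* At module load, once, instead of on every call: */
  if (boot_cpu_has(X86_FEATURE_XSAVE)) {
          host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
          supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
  }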

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:09 +01:00
Sean Christopherson
2ef7619d43 KVM: VMX: Add helpers to query Intel PT mode
Add helpers to query which of the (two) supported PT modes is active.
The primary motivation is to help document that there is a third PT mode
(host-only) that's currently not supported by KVM.  As is, it's not
obvious that PT_MODE_SYSTEM != !PT_MODE_HOST_GUEST and vice versa, e.g.
that "pt_mode == PT_MODE_SYSTEM" and "pt_mode != PT_MODE_HOST_GUEST" are
two distinct checks.
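
The helpers are trivial but self-documenting, e.g. (sketch):

  static inline bool vmx_pt_mode_is_system(void)
  {
          return pt_mode == PT_MODE_SYSTEM;
  }

  static inline bool vmx_pt_mode_is_host_guest(void)
  {
          return pt_mode == PT_MODE_HOST_GUEST;
  }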

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:08 +01:00
Sean Christopherson
0eee8f9d9d KVM: x86: Use common loop iterator when handling CPUID 0xD.N
Use __do_cpuid_func()'s common loop iterator, "i", when enumerating the
sub-leafs for CPUID 0xD now that the CPUID 0xD loop doesn't need to
manually maintain separate counts for the entries index and CPUID index.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:07 +01:00
Sean Christopherson
695538aa21 KVM: x86: Drop redundant array size check
Drop a "nent >= maxnent" check in kvm_get_cpuid() that's fully redundant
now that kvm_get_cpuid() isn't indexing the array to pass an entry to
do_cpuid_func().

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:06 +01:00
Sean Christopherson
e53c95e8d4 KVM: x86: Encapsulate CPUID entries and metadata in struct
Add a struct to hold the array of CPUID entries and its associated
metadata when handling KVM_GET_SUPPORTED_CPUID.  Lookup and provide
the correct entry in do_host_cpuid(), which eliminates the majority of
array indexing shenanigans, e.g. entries[i - 1], and generally makes the
code more readable.  The last array indexing holdout is kvm_get_cpuid(),
which can't really be avoided without throwing the baby out with the
bathwater.
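
The struct is simply (sketch):

  struct kvm_cpuid_array {
          struct kvm_cpuid_entry2 *entries;
          int maxnent;
          int nent;
  };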

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:06 +01:00
Sean Christopherson
c862903963 KVM: x86: Refactor CPUID 0x4 and 0x8000001d handling
Refactor the sub-leaf handling for CPUID 0x4/0x8000001d to eliminate
a one-off variable and its associated brackets.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:05 +01:00
Sean Christopherson
74fa0bc7f0 KVM: x86: Hoist loop counter and terminator to top of __do_cpuid_func()
Declare "i" and "max_idx" at the top of __do_cpuid_func() to consolidate
a handful of declarations in various case statements.

More importantly, establish the pattern of using max_idx instead of e.g.
entry->eax as the loop terminator in preparation for refactoring how
entry is handled in __do_cpuid_func().
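
I.e. the sub-leaf loops converge on the pattern (sketch):

  /* max_idx, not entry->eax, terminates the loop. */
  for (i = 1, max_idx = entry->eax; i <= max_idx; ++i) {
          entry = do_host_cpuid(array, function, i);
          if (!entry)
                  goto out;
  }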

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:04 +01:00