Commit Graph

109057 Commits

Author SHA1 Message Date
Xiao Guangrong
b18d5431ac KVM: x86: fix CR0.CD virtualization
Currently, CR0.CD is not checked when we virtualize memory cache type for
noncoherent_dma guests, this patch fixes it by :

- setting UC for all memory if CR0.CD = 1
- zapping all the last sptes in MMU if CR0.CD is changed

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:26 +02:00
Bandan Das
f104765b4f KVM: nSVM: Check for NRIPS support before updating control field
If hardware doesn't support DecodeAssist - a feature that provides
more information about the intercept in the VMCB, KVM decodes the
instruction and then updates the next_rip vmcb control field.
However, NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS].
Since skip_emulated_instruction() doesn't verify nrip support
before accepting control.next_rip as valid, avoid writing this
field if support isn't present.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:26 +02:00
Paolo Bonzini
05fe125fa3 KVM/ARM changes for v4.2:
- Proper guest time accounting
 - FP access fix for 32bit
 - The usual pile of GIC fixes
 - PSCI fixes
 - Random cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVg/c2AAoJECPQ0LrRPXpDjSoQALjxhdY0ROAz2yjVqcYD5Goj
 8XHLbwDxlXQIqXulEakpx/2CtQs3tYsnt7gg3HoO3aY0TqxKY+0nh9RuZ/b15KcP
 hailUK/hbdLo9sDmoPBRBBP26gqjnXVMYTNqSFk4u2KADYUYgl5auhXcTNARH6Gx
 +c3LJG9HUbbvKkTG+RB+Xgl3f9N0uaIaV15oTX+WsolNIQVmA8Sh5X7nXPnFXV4r
 klb9pcR3FPWA9m+fDW8VszoLZ39a12kfQsK0yshKy5ep/AA1liPEYDVc6Wp/cEqz
 ybDfGuO9GPoPLfzhYq1Cw/SeKaAFruu6QD1ugsSlh1DVZ2evme+OsEvzRsfCqEYc
 VkKkX2Rc8DQTjsJoaTCOmNfU+v0dWwriCXmfLPBj01+55M4oCDcntTasrEsDAc3o
 YkTk68T9HJxsNesWpkZDo0pYzGeu3GDS1Hg/ElESminCNfK/S/GBTtidLq7Rs7KU
 VJCaUeob6CoQnRdmNEm+nHInh5pSoOlKzdduFM9IcvOO5B12YlNMcVnr48Y16ea7
 ElYGkxR/n/uLt33A3Yvaj8+PeJbwxQ1RcWRDJy8Fw77AFLEaT2VTr2kZgJIqTatQ
 ENnCLMxNP2uAt1TrGhEempndYZdWK/RiuaA9MjnxSscFkrXPfn4bMJzV58ghF9L+
 62hk2S8I8q69SvqbPTgv
 =VQWP
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM changes for v4.2:

- Proper guest time accounting
- FP access fix for 32bit
- The usual pile of GIC fixes
- PSCI fixes
- Random cleanups
2015-06-19 17:15:24 +02:00
Marc Zyngier
4642019dc4 arm/arm64: KVM: vgic: Do not save GICH_HCR / ICH_HCR_EL2
The GIC Hypervisor Configuration Register is used to enable
the delivery of virtual interupts to a guest, as well as to
define in which conditions maintenance interrupts are delivered
to the host.

This register doesn't contain any information that we need to
read back (the EOIcount is utterly useless for us).

So let's save ourselves some cycles, and not save it before
writing zero to it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:59:55 +01:00
Lorenzo Pieralisi
e2d997366d ARM: kvm: psci: fix handling of unimplemented functions
According to the PSCI specification and the SMC/HVC calling
convention, PSCI function_ids that are not implemented must
return NOT_SUPPORTED as return value.

Current KVM implementation takes an unhandled PSCI function_id
as an error and injects an undefined instruction into the guest
if PSCI implementation is called with a function_id that is not
handled by the resident PSCI version (ie it is not implemented),
which is not the behaviour expected by a guest when calling a
PSCI function_id that is not implemented.

This patch fixes this issue by returning NOT_SUPPORTED whenever
the kvm PSCI call is executed for a function_id that is not
implemented by the PSCI kvm layer.

Cc: <stable@vger.kernel.org> # 3.18+
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:46:29 +01:00
Alex Bennée
921ef1e16c KVM: arm64: fix misleading comments in save/restore
The elr_el2 and spsr_el2 registers in fact contain the processor state
before entry into EL2. In the case of guest state it could be in either
el0 or el1.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:46:29 +01:00
Kim Phillips
8889583c03 KVM: arm/arm64: Enable the KVM-VFIO device
The KVM-VFIO device is used by the QEMU VFIO device. It is used to
record the list of in-use VFIO groups so that KVM can manipulate
them.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:46:29 +01:00
Christoffer Dall
1b3d546daf arm/arm64: KVM: Properly account for guest CPU time
Until now we have been calling kvm_guest_exit after re-enabling
interrupts when we come back from the guest, but this has the
unfortunate effect that CPU time accounting done in the context of timer
interrupts occurring while the guest is running doesn't properly notice
that the time since the last tick was spent in the guest.

Inspired by the comment in the x86 code, move the kvm_guest_exit() call
below the local_irq_enable() call and change __kvm_guest_exit() to
kvm_guest_exit(), because we are now calling this function with
interrupts enabled.  We have to now explicitly disable preemption and
not enable preemption before we've called kvm_guest_exit(), since
otherwise we could be preempted and everything happening before we
eventually get scheduled again would be accounted for as guest time.

At the same time, move the trace_kvm_exit() call outside of the atomic
section, since there is no reason for us to do that with interrupts
disabled.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:46:28 +01:00
Tiejun Chen
ea2c6d9745 kvm: remove one useless check extension
We already check KVM_CAP_IRQFD in generic once enable CONFIG_HAVE_KVM_IRQFD,

kvm_vm_ioctl_check_extension_generic()
    |
    + switch (arg) {
    +   ...
    +   #ifdef CONFIG_HAVE_KVM_IRQFD
    +       case KVM_CAP_IRQFD:
    +   #endif
    +   ...
    +   return 1;
    +   ...
    + }
    |
    + kvm_vm_ioctl_check_extension()

So its not necessary to check this in arch again, and also fix one typo,
s/emlation/emulation.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-06-17 09:46:28 +01:00
Marc Zyngier
85e84ba310 arm: KVM: force execution of HCPTR access on VM exit
On VM entry, we disable access to the VFP registers in order to
perform a lazy save/restore of these registers.

On VM exit, we restore access, test if we did enable them before,
and save/restore the guest/host registers if necessary. In this
sequence, the FPEXC register is always accessed, irrespective
of the trapping configuration.

If the guest didn't touch the VFP registers, then the HCPTR access
has now enabled such access, but we're missing a barrier to ensure
architectural execution of the new HCPTR configuration. If the HCPTR
access has been delayed/reordered, the subsequent access to FPEXC
will cause a trap, which we aren't prepared to handle at all.

The same condition exists when trapping to enable VFP for the guest.

The fix is to introduce a barrier after enabling VFP access. In the
vmexit case, it can be relaxed to only takes place if the guest hasn't
accessed its view of the VFP registers, making the access to FPEXC safe.

The set_hcptr macro is modified to deal with both vmenter/vmexit and
vmtrap operations, and now takes an optional label that is branched to
when the guest hasn't touched the VFP registers.

Reported-by: Vikram Sethi <vikrams@codeaurora.org>
Cc: stable@kernel.org	# v3.9+
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:40:14 +01:00
Firo Yang
a5f56ba3b4 ARM: KVM: Remove pointless void pointer cast
No need to cast the void pointer returned by kmalloc() in
arch/arm/kvm/mmu.c::kvm_alloc_stage2_pgd().

Signed-off-by: Firo Yang <firogm@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-09 18:05:17 +01:00
Paolo Bonzini
e80a4a9426 KVM: x86: mark legacy PCI device assignment as deprecated
Follow up to commit e194bbdf36.

Suggested-by: Bandan Das <bsd@redhat.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:26:39 +02:00
Paolo Bonzini
6d396b5520 KVM: x86: advertise KVM_CAP_X86_SMM
... and we're done. :)

Because SMBASE is usually relocated above 1M on modern chipsets, and
SMM handlers might indeed rely on 4G segment limits, we only expose it
if KVM is able to run the guest in big real mode.  This includes any
of VMX+emulate_invalid_guest_state, VMX+unrestricted_guest, or SVM.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:26:38 +02:00
Paolo Bonzini
699023e239 KVM: x86: add SMM to the MMU role, support SMRAM address space
This is now very simple to do.  The only interesting part is a simple
trick to find the right memslot in gfn_to_rmap, retrieving the address
space from the spte role word.  The same trick is used in the auditing
code.

The comment on top of union kvm_mmu_page_role has been stale forever,
so remove it.  Speaking of stale code, remove pad_for_nice_hex_output
too: it was splitting the "access" bitfield across two bytes and thus
had effectively turned into pad_for_ugly_hex_output.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:26:37 +02:00
Paolo Bonzini
9da0e4d5ac KVM: x86: work on all available address spaces
This patch has no semantic change, but it prepares for the introduction
of a second address space for system management mode.

A new function x86_set_memory_region (and the "slots_lock taken"
counterpart __x86_set_memory_region) is introduced in order to
operate on all address spaces when adding or deleting private
memory slots.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:26:37 +02:00
Paolo Bonzini
54bf36aac5 KVM: x86: use vcpu-specific functions to read/write/translate GFNs
We need to hide SMRAM from guests not running in SMM.  Therefore,
all uses of kvm_read_guest* and kvm_write_guest* must be changed to
check whether the VCPU is in system management mode and use a
different set of memslots.  Switch from kvm_* to the newly-introduced
kvm_vcpu_*, which call into kvm_arch_vcpu_memslots_id.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:26:36 +02:00
Paolo Bonzini
e4cd1da944 KVM: x86: pass struct kvm_mmu_page to gfn_to_rmap
This is always available (with one exception in the auditing code),
and with the same auditing exception the level was coming from
sp->role.level.

Later, the spte's role will also be used to look up the right memslots
array.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:26:35 +02:00
Paolo Bonzini
f481b069e6 KVM: implement multiple address spaces
Only two ioctls have to be modified; the address space id is
placed in the higher 16 bits of their slot id argument.

As of this patch, no architecture defines more than one
address space; x86 will be the first.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:26:35 +02:00
Paolo Bonzini
660a5d517a KVM: x86: save/load state on SMM switch
The big ugly one.  This patch adds support for switching in and out of
system management mode, respectively upon receiving KVM_REQ_SMI and upon
executing a RSM instruction.  Both 32- and 64-bit formats are supported
for the SMM state save area.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:17:46 +02:00
Paolo Bonzini
cd7764fe9f KVM: x86: latch INITs while in system management mode
Do not process INITs immediately while in system management mode, keep
it instead in apic->pending_events.  Tell userspace if an INIT is
pending when they issue GET_VCPU_EVENTS, and similarly handle the
new field in SET_VCPU_EVENTS.

Note that the same treatment should be done while in VMX non-root mode.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:01:51 +02:00
Paolo Bonzini
64d6067057 KVM: x86: stubs for SMM support
This patch adds the interface between x86.c and the emulator: the
SMBASE register, a new emulator flag, the RSM instruction.  It also
adds a new request bit that will be used by the KVM_SMI ioctl.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:01:45 +02:00
Paolo Bonzini
f077825a87 KVM: x86: API changes for SMM support
This patch includes changes to the external API for SMM support.
Userspace can predicate the availability of the new fields and
ioctls on a new capability, KVM_CAP_X86_SMM, which is added at the end
of the patch series.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:01:11 +02:00
Paolo Bonzini
a584539b24 KVM: x86: pass the whole hflags field to emulator and back
The hflags field will contain information about system management mode
and will be useful for the emulator.  Pass the entire field rather than
just the guest-mode information.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:01:05 +02:00
Paolo Bonzini
609e36d372 KVM: x86: pass host_initiated to functions that read MSRs
SMBASE is only readable from SMM for the VCPU, but it must be always
accessible if userspace is accessing it.  Thus, all functions that
read MSRs are changed to accept a struct msr_data; the host_initiated
and index fields are pre-initialized, while the data field is filled
on return.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:01:00 +02:00
Paolo Bonzini
62ef68bb4d KVM: x86: introduce num_emulated_msrs
We will want to filter away MSR_IA32_SMBASE from the emulated_msrs if
the host CPU does not support SMM virtualization.  Introduce the
logic to do that, and also move paravirt MSRs to emulated_msrs for
simplicity and to get rid of KVM_SAVE_MSRS_BEGIN.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:00:46 +02:00
Paolo Bonzini
e69fab5df4 KVM: x86: clear hidden CPU state at reset time
This was noticed by Radim while reviewing the implementation of
system management mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 10:44:44 +02:00
Paolo Bonzini
ce40cd3fc7 kvm: x86: fix kvm_apic_has_events to check for NULL pointer
Malicious (or egregiously buggy) userspace can trigger it, but it
should never happen in normal operation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 10:16:17 +02:00
Paolo Bonzini
e194bbdf36 kvm: x86: default legacy PCI device assignment support to "n"
VFIO has proved itself a much better option than KVM's built-in
device assignment.  It is mature, provides better isolation because
it enforces ACS, and even the userspace code is being tested on
a wider variety of hardware these days than the legacy support.

Disable legacy device assignment by default.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 09:51:50 +02:00
Paolo Bonzini
f71f81d70a KVM: s390: Fix and cleanup for 4.2 (kvm/next)
One small fix for a commit targetted for 4.2 and one cleanup
 regarding our printks.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJVbWMvAAoJEBF7vIC1phx8G9IP/1VxA0aXRyUEA0KgHIbkWYJb
 DPXpznubJqgWTYGMWM9R8/frIwH3rnJZMdWDNtu3SMULQTC9HzTMcl5PAe+XEe9F
 +oxU4IgUdvcbZkI49rnHn+3n99vhnQS5emmX7ivPV2YQrWFVC36gAOMlNe+S40fJ
 SJ4iXANo+3LT6MaeD67Kcb+nLsrGTTP+6RtNthc4yV14fYPLdafy8+5BAvMZfLRn
 xWS9In8zqQtCnaB4eJ08C4D7MuNL6yIu3s54PLunKVlvCayxThsFNk+al/QwyS74
 6vJZLCFX55RLSBZLkEYH6b2k1ckF//ZgLOL29sLIHwi2Ry01guZ43PjjRa/jdkbj
 cOq5rDsfcfKp8sIMJhGF5Y/UneaqBW+/vAfQrIHANDwUcCkfFh95/Gv/nF5KrcsO
 0pvzi+SSnu7Y2hWL5iJIvrHAclMazEHewWnubur7UTgkzPWxA35gfBqwZir5q/pI
 cG2AELzjERWYWIip4hT2z1UGSKZQNYOddrmZxN6noj0MCyIdnq/wuklOr9y5HTif
 ei+k0xtaViEES4vlc0H6Jo5Cplgv28nxXBemAtNwCCL8iGVJbM7JJcbclrcIkqgc
 AgIWSTd8ZsUqBZUWLX37CXqhdym1LmyE3r1A/eV42NXFWatZnDbCzy9k10y9RVGX
 /i5OFil2B640rmFUEMHM
 =t6ws
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20150602' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

KVM: s390: Fix and cleanup for 4.2 (kvm/next)

One small fix for a commit targetted for 4.2 and one cleanup
regarding our printks.
2015-06-03 14:51:02 +02:00
David Hildenbrand
ea2cdd27dc KVM: s390: introduce KMSG_COMPONENT for kvm-s390
Let's remove "kvm-s390" from our printk messages and make use
of pr_fmt instead.

Also replace one printk() occurrence by a equivalent pr_warn
on the way.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-06-02 09:39:20 +02:00
David Hildenbrand
61a6df54b6 KVM: s390: call exit_sie() directly on vcpu block/request
Thinking about it, I can't find a real use case where we want
to block a VCPU and not kick it out of SIE. (except if we want
to do the same in batch for multiple VCPUs - but that's a micro
optimization)

So let's simply perform the exit_sie() calls directly when setting
the other magic block bits in the SIE.

Otherwise e.g. kvm_s390_set_tod_low() still has other VCPUs running
after that call, working with a wrong epoch.

Fixes: 27406cd50c ("KVM: s390: provide functions for blocking all CPUs")
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-06-02 09:38:01 +02:00
Marcelo Tosatti
b7e60c5aed KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR
Initialize kvmclock base, on kvmclock system MSR write time,
so that the guest sees kvmclock counting from zero.

This matches baremetal behaviour when kvmclock in guest
sets sched clock stable.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[Remove unnecessary comment. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-29 14:02:40 +02:00
Luiz Capitulino
0ad83caa21 x86: kvmclock: set scheduler clock stable
If you try to enable NOHZ_FULL on a guest today, you'll get
the following error when the guest tries to deactivate the
scheduler tick:

 WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
 NO_HZ FULL will not work with unstable sched clock
 CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: events flush_to_ldisc
  ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002
  ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8
  0000000000000000 0000000000000003 0000000000000001 0000000000000001
 Call Trace:
  <IRQ>  [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
  [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
  [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
  [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
  [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
  [<ffffffff810511c5>] irq_exit+0xc5/0x130
  [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
  [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
  <EOI>  [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
  [<ffffffff8108bbc8>] __wake_up+0x48/0x60
  [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
  [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
  [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
  [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
  [<ffffffff81064d05>] process_one_work+0x1d5/0x540
  [<ffffffff81064c81>] ? process_one_work+0x151/0x540
  [<ffffffff81065191>] worker_thread+0x121/0x470
  [<ffffffff81065070>] ? process_one_work+0x540/0x540
  [<ffffffff8106b4df>] kthread+0xef/0x110
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
  [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
 ---[ end trace 06e3507544a38866 ]---

However, it turns out that kvmclock does provide a stable
sched_clock callback. So, let the scheduler know this which
in turn makes NOHZ_FULL work in the guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-29 14:01:52 +02:00
Marcelo Tosatti
611917258f x86: kvmclock: add flag to indicate pvclock counts from zero
Setting sched clock stable for kvmclock causes the printk timestamps
to not start from zero, which is different from baremetal and
can possibly break userspace. Add a flag to indicate that
hypervisor sets clock base at zero when kvmclock is initialized.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-29 14:01:39 +02:00
Andrew Morton
4141259b56 arch/x86/kvm/mmu.c: work around gcc-4.4.4 bug
arch/x86/kvm/mmu.c: In function 'kvm_mmu_pte_write':
arch/x86/kvm/mmu.c:4256: error: unknown field 'cr0_wp' specified in initializer
arch/x86/kvm/mmu.c:4257: error: unknown field 'cr4_pae' specified in initializer
arch/x86/kvm/mmu.c:4257: warning: excess elements in union initializer
...

gcc-4.4.4 (at least) has issues when using anonymous unions in
initializers.

Fixes: edc90b7dc4 ("KVM: MMU: fix SMAP virtualization")
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28 10:43:38 +02:00
Jan Kiszka
e453aa0f7e KVM: x86: Allow ARAT CPU feature
There is no reason to deny this feature to guests. We are emulating the
APIC timer, thus are exposing it without stops in power-saving states.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28 10:43:37 +02:00
Radim Krčmář
c028dd6bb6 KVM: x86: preserve x2APIC LDR on INIT
Logical x2APIC stops working if we rewrite it with zeros.
The best references are SDM April 2015: 10.12.10.1 Logical Destination
Mode in x2APIC Mode

  [...], the LDR are initialized by hardware based on the value of
  x2APIC ID upon x2APIC state transitions.

and SDM April 2015: 10.12.10.2 Deriving Logical x2APIC ID from the Local
x2APIC ID

  The LDR initialization occurs whenever the x2APIC mode is enabled

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28 10:43:36 +02:00
Radim Krčmář
257b9a5faa KVM: x86: use correct APIC ID on x2APIC transition
SDM April 2015, 10.12.5 State Changes From xAPIC Mode to x2APIC Mode
• Any APIC ID value written to the memory-mapped local APIC ID register
  is not preserved.

Fix it by sourcing vcpu_id (= initial APIC ID) instead of memory-mapped
APIC ID.  Proper use of apic functions would result in two calls to
recalculate_apic_map(), so this patch makes a new helper.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28 10:43:35 +02:00
Marcelo Tosatti
630994b3c7 KVM: x86: add module parameter to disable periodic kvmclock sync
The periodic kvmclock sync can be an undesired source of latencies.

When running cyclictest on a guest, a latency spike is visible.
With kvmclock periodic sync disabled, the spike is gone.

Guests should use ntp which means the propagations of ntp corrections
from the host clock are unnecessary.

v2:
-> Make parameter read-only (Radim)
-> Return early on kvmclock_sync_fn (Andrew)

Reported-and-tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28 10:43:35 +02:00
Paolo Bonzini
3ed1a47876 KVM: x86: pass struct kvm_mmu_page to account/unaccount_shadowed
Prepare for multiple address spaces this way, since a VCPU is not available
where unaccount_shadowed is called.  We will get to the right kvm_memslots
struct through the role field in struct kvm_mmu_page.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28 10:43:34 +02:00
Paolo Bonzini
d9ef13c2b3 KVM: pass kvm_memory_slot to gfn_to_page_many_atomic
The memory slot is already available from gfn_to_memslot_dirty_bitmap.
Isn't it a shame to look it up again?  Plus, it makes gfn_to_page_many_atomic
agnostic of multiple VCPU address spaces.

Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28 10:43:33 +02:00
Paolo Bonzini
f36f3f2846 KVM: add "new" argument to kvm_arch_commit_memory_region
This lets the function access the new memory slot without going through
kvm_memslots and id_to_memslot.  It will simplify the code when more
than one address space will be supported.

Unfortunately, the "const"ness of the new argument must be casted
away in two places.  Fixing KVM to accept const struct kvm_memory_slot
pointers would require modifications in pretty much all architectures,
and is left for later.

Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28 10:42:58 +02:00
Paolo Bonzini
15f46015ee KVM: add memslots argument to kvm_arch_memslots_updated
Prepare for the case of multiple address spaces.

Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26 12:40:17 +02:00
Paolo Bonzini
09170a4942 KVM: const-ify uses of struct kvm_userspace_memory_region
Architecture-specific helpers are not supposed to muck with
struct kvm_userspace_memory_region contents.  Add const to
enforce this.

In order to eliminate the only write in __kvm_set_memory_region,
the cleaning of deleted slots is pulled up from update_memslots
to __kvm_set_memory_region.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26 12:40:13 +02:00
Paolo Bonzini
9f6b802978 KVM: use kvm_memslots whenever possible
kvm_memslots provides lockdep checking.  Use it consistently instead of
explicit dereferencing of kvm->memslots.

Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26 12:40:08 +02:00
Paolo Bonzini
a9b4fb7e79 Merge branch 'kvm-master' into kvm-next
Grab MPX bugfix, and fix conflicts against Rik's adaptive FPU
deactivation patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:31:37 +02:00
Liang Li
c447e76b4c kvm/fpu: Enable eager restore kvm FPU for MPX
The MPX feature requires eager KVM FPU restore support. We have verified
that MPX cannot work correctly with the current lazy KVM FPU restore
mechanism. Eager KVM FPU restore should be enabled if the MPX feature is
exposed to VM.

Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
[Also activate the FPU on AMD processors. - Paolo]
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:30:26 +02:00
Paolo Bonzini
0fdd74f778 Revert "KVM: x86: drop fpu_activate hook"
This reverts commit 4473b570a7.  We'll
use the hook again.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:30:15 +02:00
Andrea Arcangeli
e8fd5e9e99 kvm: fix crash in kvm_vcpu_reload_apic_access_page
memslot->userfault_addr is set by the kernel with a mmap executed
from the kernel but the userland can still munmap it and lead to the
below oops after memslot->userfault_addr points to a host virtual
address that has no vma or mapping.

[  327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe
[  327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50
[  327.538474] PGD 1a01067 PUD 1a03067 PMD 0
[  327.538529] Oops: 0000 [#1] SMP
[  327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me
[  327.539488]  mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod
[  327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1
[  327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014
[  327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000
[  327.540184] RIP: 0010:[<ffffffff811a7b55>]  [<ffffffff811a7b55>] put_page+0x5/0x50
[  327.540261] RSP: 0018:ffff880317c5bcf8  EFLAGS: 00010246
[  327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000
[  327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe
[  327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000
[  327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe
[  327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50
[  327.540643] FS:  00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000
[  327.540717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0
[  327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  327.540974] Stack:
[  327.541008]  ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0
[  327.541093]  ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d
[  327.541177]  00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000
[  327.541261] Call Trace:
[  327.541321]  [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm]
[  327.543615]  [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm]
[  327.545918]  [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm]
[  327.548211]  [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm]
[  327.550500]  [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm]
[  327.552768]  [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50
[  327.555069]  [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30
[  327.557373]  [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80
[  327.559663]  [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530
[  327.561917]  [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0
[  327.564185]  [<ffffffff816de829>] system_call_fastpath+0x16/0x1b
[  327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:30:06 +02:00
Nicholas Krause
ed3cf15271 kvm: x86: Make functions that have no external callers static
This makes the functions kvm_guest_cpu_init and  kvm_init_debugfs
static now due to having no external callers outside their
declarations in the file, kvm.c.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 11:48:21 +02:00