Commit Graph

494812 Commits

Author SHA1 Message Date
Jason J. Herne
72f250206f KVM: s390: Provide guest TOD Clock Get/Set Controls
Provide controls for setting/getting the guest TOD clock based on the VM
attribute interface.

Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of
Day clock value.

TOD_HIGH is presently always set to 0. In the future it will contain a high
order expansion of the tod clock value after it overflows the 64-bits of
the TOD.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:40 +01:00
Jens Freimann
556cc0dab1 KVM: s390: trace correct values for set prefix and machine checks
When injecting SIGP set prefix or a machine check, we trace
the values in our per-vcpu local_int data structure instead
of the parameters passed to the function.

Fix this by changing the trace statement to use the correct values.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:39 +01:00
Jens Freimann
49538d1238 KVM: s390: fix bug in sigp emergency signal injection
Currently we are always setting the wrong bit in the
bitmap for pending emergency signals. Instead of using
emerg.code from the passed in irq parameter, we use the
value in our per-vcpu local_int structure, which is always zero.
That means all emergency signals will have address 0 as parameter.
If two CPUs send a SIGP to the same target, one might be lost.

Let's fix this by using the value from the parameter and
also trace the correct value.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:39 +01:00
Thomas Huth
3cfad02380 KVM: s390: Take addressing mode into account for MVPG interception
The handler for MVPG partial execution interception does not take
the current CPU addressing mode into account yet, so addresses are
always treated as 64-bit addresses. For correct behaviour, we should
properly handle 24-bit and 31-bit addresses, too.

Since MVPG is defined to work with logical addresses, we can simply
use guest_translate_address() to achieve the required behaviour
(since DAT is disabled here, guest_translate_address() skips the MMU
translation and only translates the address via kvm_s390_logical_to_effective()
and kvm_s390_real_to_abs(), which is exactly what we want here).

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:38 +01:00
Christian Borntraeger
69a8d45626 KVM: s390: no need to hold the kvm->mutex for floating interrupts
The kvm mutex was (probably) used to protect against cpu hotplug.
The current code no longer needs to protect against that, as we only
rely on CPU data structures that are guaranteed to be available
if we can access the CPU. (e.g. vcpu_create will put the cpu
in the array AFTER the cpu is ready).

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
2015-01-23 13:25:37 +01:00
David Hildenbrand
2444b352c3 KVM: s390: forward most SIGP orders to user space
Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU

We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).

This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:37 +01:00
David Hildenbrand
9fbd80828c KVM: s390: clear the pfault queue if user space sets the invalid token
We need a way to clear the async pfault queue from user space (e.g.
for resets and SIGP SET ARCHITECTURE).

This patch simply clears the queue as soon as user space sets the
invalid pfault token. The definition of the invalid token is moved
to uapi.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand
ea5f496925 KVM: s390: only one external call may be pending at a time
Only one external call may be pending at a vcpu at a time. For this
reason, we have to detect whether the SIGP externcal call interpretation
facility is available. If so, all external calls have to be injected
using this mechanism.

SIGP EXTERNAL CALL orders have to return whether another external
call is already pending. This check was missing until now.

SIGP SENSE hasn't returned yet in all conditions whether an external
call was pending.

If a SIGP EXTERNAL CALL irq is to be injected and one is already
pending, -EBUSY is returned.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand
d614be05c8 s390/sclp: introduce check for the SIGP Interpretation Facility
This patch introduces the infrastructure to check whether the SIGP
Interpretation Facility is installed on all VCPUs in the configuration.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:35 +01:00
David Hildenbrand
a3a9c59a68 KVM: s390: SIGP SET PREFIX cleanup
This patch cleanes up the the SIGP SET PREFIX code.

A SIGP SET PREFIX irq may only be injected if the target vcpu is
stopped. Let's move the checking code into the injection code and
return -EBUSY if the target vcpu is not stopped.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:34 +01:00
David Hildenbrand
9a022067ad KVM: s390: a VCPU may only stop when no interrupts are left pending
As a SIGP STOP is an interrupt with the least priority, it may only result
in stop of the vcpu when no other interrupts are left pending.

To detect whether a non-stop irq is pending, we need a way to mask out
stop irqs from the general kvm_cpu_has_interrupt() function. For this
reason, the existing function (with an outdated name) is replaced by
kvm_s390_vcpu_has_irq() which allows to mask out pending stop irqs.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:34 +01:00
David Hildenbrand
6cddd432e3 KVM: s390: handle stop irqs without action_bits
This patch removes the famous action_bits and moves the handling of
SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt.

The new local interrupt infrastructure is used to track pending stop
requests.

STOP irqs are the only irqs that don't get actively delivered. They
remain pending until the stop function is executed (=stop intercept).

If another STOP irq is already pending, -EBUSY will now be returned
(needed for the SIGP handling code).

Migration of pending SIGP STOP (AND STORE STATUS) orders should now
be supported out of the box.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
David Hildenbrand
2822545f9f KVM: s390: new parameter for SIGP STOP irqs
In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.

For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
David Hildenbrand
2d00f75942 KVM: s390: forward hrtimer if guest ckc not pending yet
Patch 0759d0681c ("KVM: s390: cleanup handle_wait by reusing
kvm_vcpu_block") changed the way pending guest clock comparator
interrupts are detected. It was assumed that as soon as the hrtimer
wakes up, the condition for the guest ckc is satisfied.

This is however only true as long as adjclock() doesn't speed
up the monotonic clock. Reason is that the hrtimer is based on
CLOCK_MONOTONIC, the guest clock comparator detection is based
on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the
TOD clock, the hrtimer wakes the target VCPU up too early and
the target VCPU will not detect any pending interrupts, therefore
going back to sleep. It will never be woken up again because the
hrtimer has finished. The VCPU is stuck.

As a quick fix, we have to forward the hrtimer until the guest
clock comparator is really due, to guarantee properly timed wake
ups.

As the hrtimer callback might be triggered on another cpu, we
have to make sure that the timer is really stopped and not currently
executing the callback on another cpu. This can happen if the vcpu
thread is scheduled onto another physical cpu, but the timer base
is not migrated. So lets use hrtimer_cancel instead of try_to_cancel.

A proper fix might be to introduce a RAW based hrtimer.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:32 +01:00
David Hildenbrand
0ac96caf0f KVM: s390: base hrtimer on a monotonic clock
The hrtimer that handles the wait with enabled timer interrupts
should not be disturbed by changes of the host time.

This patch changes our hrtimer to be based on a monotonic clock.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:32 +01:00
David Hildenbrand
bda343ef14 KVM: s390: prevent sleep duration underflows in handle_wait()
We sometimes get an underflow for the sleep duration, which most
likely won't result in the short sleep time we wanted.

So let's check for sleep duration underflows and directly continue
to run the guest if we get one.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:31 +01:00
Dominik Dingel
8c0a7ce606 KVM: s390: Allow userspace to limit guest memory size
With commit c6c956b80b ("KVM: s390/mm: support gmap page tables with less
than 5 levels") we are able to define a limit for the guest memory size.

As we round up the guest size in respect to the levels of page tables
we get to guest limits of: 2048 MB, 4096 GB, 8192 TB and 16384 PB.
We currently limit the guest size to 16 TB, which means we end up
creating a page table structure supporting guest sizes up to 8192 TB.

This patch introduces an interface that allows userspace to tune
this limit. This may bring performance improvements for small guests.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:30 +01:00
Dominik Dingel
dafd032a15 KVM: s390: move vcpu specific initalization to a later point
As we will allow in a later patch to recreate gmaps with new limits,
we need to make sure that vcpus get their reference for that gmap
after they increased the online_vcpu counter, so there is no possible race.

While we are doing this, we also can simplify the vcpu_init function, by
moving ucontrol specifics to an own function.
That way we also start now setting the kvm_valid_regs for the ucontrol path.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:30 +01:00
Christian Borntraeger
0675d92dcf KVM: s390: make local function static
sparse rightfully complains about
warning: symbol '__inject_extcall' was not declared. Should it be static?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:29 +01:00
Dominik Dingel
31928aa586 KVM: remove unneeded return value of vcpu_postcreate
The return value of kvm_arch_vcpu_postcreate is not checked in its
caller.  This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed.  But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:24:52 +01:00
Nicholas Krause
bab5bb3982 kvm: x86: Remove kvm_make_request from lapic.c
Adds a function kvm_vcpu_set_pending_timer instead of calling
kvm_make_request in lapic.c.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:08 +01:00
Nadav Amit
edccda7ca7 KVM: x86: Access to LDT/GDT that wraparound is incorrect
When access to descriptor in LDT/GDT wraparound outside long-mode, the address
of the descriptor should be truncated to 32-bit.  Citing Intel SDM 2.1.1.1
"Global and Local Descriptor Tables in IA-32e Mode": "GDTR and LDTR registers
are expanded to 64-bits wide in both IA-32e sub-modes (64-bit mode and
compatibility mode)."

So in other cases, we need to truncate. Creating new function to return a
pointer to descriptor table to avoid too much code duplication.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Wrap 64-bit check with #ifdef CONFIG_X86_64, to avoid a "right shift count
 >= width of type" warning and consequent undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:08 +01:00
Nadav Amit
e2cefa746e KVM: x86: Do not set access bit on accessed segments
When segment is loaded, the segment access bit is set unconditionally.  In
fact, it should be set conditionally, based on whether the segment had the
accessed bit set before. In addition, it can improve performance.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:07 +01:00
Nadav Amit
ab708099a0 KVM: x86: POP [ESP] is not emulated correctly
According to Intel SDM: "If the ESP register is used as a base register for
addressing a destination operand in memory, the POP instruction computes the
effective address of the operand after it increments the ESP register."

The current emulation does not behave so. The fix required to waste another
of the precious instruction flags and to check the flag in decode_modrm.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:07 +01:00
Nadav Amit
80976dbb5c KVM: x86: em_call_far should return failure result
Currently, if em_call_far fails it returns success instead of the resulting
error-code. Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:06 +01:00
Nadav Amit
3dc4bc4f6b KVM: x86: JMP/CALL using call- or task-gate causes exception
The KVM emulator does not emulate JMP and CALL that target a call gate or a
task gate.  This patch does not try to implement these scenario as they are
presumably rare; yet it returns X86EMUL_UNHANDLEABLE error in such cases
instead of generating an exception.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:05 +01:00
Nadav Amit
16bebefe29 KVM: x86: fnstcw and fnstsw may cause spurious exception
Since the operand size of fnstcw and fnstsw is updated during the execution,
the emulation may cause spurious exceptions as it reads the memory beforehand.

Marking these instructions as Mov (since the previous value is ignored) and
DstMem16 to simplify the setting of operand size.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:05 +01:00
Nadav Amit
3313bc4ee8 KVM: x86: pop sreg accesses only 2 bytes
Although pop sreg updates RSP according to the operand size, only 2 bytes are
read.  The current behavior may result in incorrect #GP or #PF exceptions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:04 +01:00
Paolo Bonzini
fa4a2c080e KVM: x86: mmu: replace assertions with MMU_WARN_ON, a conditional WARN_ON
This makes the direction of the conditions consistent with code that
is already using WARN_ON.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:04 +01:00
Paolo Bonzini
4c1a50de92 KVM: x86: mmu: remove ASSERT(vcpu)
Because ASSERT is just a printk, these would oops right away.
The assertion thus hardly adds anything.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:03 +01:00
Paolo Bonzini
ad896af0b5 KVM: x86: mmu: remove argument to kvm_init_shadow_mmu and kvm_init_shadow_ept_mmu
The initialization function in mmu.c can always use walk_mmu, which
is known to be vcpu->arch.mmu.  Only init_kvm_nested_mmu is used to
initialize vcpu->arch.nested_mmu.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:02 +01:00
Paolo Bonzini
e0c6db3e22 KVM: x86: mmu: do not use return to tail-call functions that return void
This is, pedantically, not valid C.  It also looks weird.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:02 +01:00
Marcelo Tosatti
6c19b7538f KVM: x86: add tracepoint to wait_lapic_expire
Add tracepoint to wait_lapic_expire.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[Remind reader if early or late. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:01 +01:00
Marcelo Tosatti
d0659d946b KVM: x86: add option to advance tscdeadline hrtimer expiration
For the hrtimer which emulates the tscdeadline timer in the guest,
add an option to advance expiration, and busy spin on VM-entry waiting
for the actual expiration time to elapse.

This allows achieving low latencies in cyclictest (or any scenario
which requires strict timing regarding timer expiration).

Reduces average cyclictest latency from 12us to 8us
on Core i5 desktop.

Note: this option requires tuning to find the appropriate value
for a particular hardware/guest combination. One method is to measure the
average delay between apic_timer_fn and VM-entry.
Another method is to start with 1000ns, and increase the value
in say 500ns increments until avg cyclictest numbers stop decreasing.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:47:30 +01:00
Marcelo Tosatti
7c6a98dfa1 KVM: x86: add method to test PIR bitmap vector
kvm_x86_ops->test_posted_interrupt() returns true/false depending
whether 'vector' is set.

Next patch makes use of this interface.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:18 +01:00
Tiejun Chen
b4eef9b36d kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv
In most cases calling hwapic_isr_update(), we always check if
kvm_apic_vid_enabled() == 1, but actually,
kvm_apic_vid_enabled()
    -> kvm_x86_ops->vm_has_apicv()
        -> vmx_vm_has_apicv() or '0' in svm case
            -> return enable_apicv && irqchip_in_kernel(kvm)

So its a little cost to recall vmx_vm_has_apicv() inside
hwapic_isr_update(), here just NULL out hwapic_isr_update() in
case of !enable_apicv inside hardware_setup() then make all
related stuffs follow this. Note we don't check this under that
condition of irqchip_in_kernel() since we should make sure
definitely any caller don't work  without in-kernel irqchip.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:17 +01:00
Nicholas Krause
5ff22e7ebf KVM: x86: Remove FIXMEs in emulate.c for the function,task_switch_32
Remove FIXME comments about needing fault addresses to be returned.  These
are propaagated from walk_addr_generic to gva_to_gpa and from there to
ops->read_std and ops->write_std.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:17 +01:00
Eugene Korenevsky
19d5f10b3a KVM: nVMX: consult PFEC_MASK and PFEC_MATCH when generating #PF VM-exit
When generating #PF VM-exit, check equality:
(PFEC & PFEC_MASK) == PFEC_MATCH
If there is equality, the 14 bit of exception bitmap is used to take decision
about generating #PF VM-exit. If there is inequality, inverted 14 bit is used.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:16 +01:00
Eugene Korenevsky
e9ac033e6b KVM: nVMX: Improve nested msr switch checking
This patch improve checks required by Intel Software Developer Manual.
 - SMM MSRs are not allowed.
 - microcode MSRs are not allowed.
 - check x2apic MSRs only when LAPIC is in x2apic mode.
 - MSR switch areas must be aligned to 16 bytes.
 - address of first and last byte in MSR switch areas should not set any bits
   beyond the processor's physical-address width.

Also it adds warning messages on failures during MSR switch. These messages
are useful for people who debug their VMMs in nVMX.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:15 +01:00
Wincy Van
ff651cb613 KVM: nVMX: Add nested msr load/restore algorithm
Several hypervisors need MSR auto load/restore feature.
We read MSRs from VM-entry MSR load area which specified by L1,
and load them via kvm_set_msr in the nested entry.
When nested exit occurs, we get MSRs via kvm_get_msr, writing
them to L1`s MSR store area. After this, we read MSRs from VM-exit
MSR load area, and load them via kvm_set_msr.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:14 +01:00
Linus Torvalds
b1940cd21c Linux 3.19-rc3 2015-01-05 17:05:20 -08:00
Linus Torvalds
79b8cb9737 powerpc fixes for 3.19
Wire up sys_execveat(). Tested on 32 & 64 bit.
 
 Fix for kdump on LE systems with cpus hot unplugged.
 
 Revert Anton's fix for "kernel BUG at kernel/smpboot.c:134!", this broke other
 platforms, we'll do a proper fix for 3.20.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUqillAAoJEFHr6jzI4aWA+aYP/0zMNyRbQehtpF9OLKw2TGJW
 LeSdQMN+g6JWKzG2hsNIiKga5kquuCi0zgtu3dW/rxlzbXY0gxwAY7HFkjzgE7Em
 qlt/iairlg8ifE436DRCkH2SXTCDN2AnlRLO35QZ2zKoo8osJuA6YjDiQRN3Y//p
 ELvWraYoVdhbfGPg+gYJn94U/OAmFu0E0VrwEmpWie1YrfTNbbfTkS/BieqhqJTu
 UEX53tW6FXbmit3ReQGGsPzcoVWbXfIg09gRy14Bn8dtA1EipHKr1m5o1JCjGu9J
 moJVB8cMkj5o4OcQrPtJZI76FkzaVueNUVTUdFtIt1q6NCyvX8tZe16OKPYY0TEN
 j8snpJfsk+oZij5caYSKqTeMAPrh6JbIjbkl8cIwc7ClRkRf/l5sW0zaFPS+GDWl
 c9OFu1CdKO59g+apQ16BmMRw8b/pV22WkxSpqm9GD3I7Y7ABoUN+1l07d5H+aGRq
 khD7M2rUWh4CwSRvg+ealpyAP++4Sj6mn/z4pe/gem4Wn/ynxGi6CSsYYijB6PWP
 bm47t+QqyomRiH65Yo41vKGNYu/JfenvAWyJp8AnIajWqxwCRLVNSgySjtylUESs
 f02qdvLsBTd3/CorZxigWgThM1wWeKyCDO6NBbjVUC/L9O8Ck9U0WbJUVQFDgG0b
 +52Wv5FRb/xBF9XdpWsw
 =1yze
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc fixes from Michael Ellerman:

 - Wire up sys_execveat(). Tested on 32 & 64 bit.

 - Fix for kdump on LE systems with cpus hot unplugged.

 - Revert Anton's fix for "kernel BUG at kernel/smpboot.c:134!", this
   broke other platforms, we'll do a proper fix for 3.20.

* tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  Revert "powerpc: Secondary CPUs must set cpu_callin_map after setting active and online"
  powerpc/kdump: Ignore failure in enabling big endian exception during crash
  powerpc: Wire up sys_execveat() syscall
2015-01-05 14:49:02 -08:00
Linus Torvalds
f40bde8588 Add execveat syscall
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUqwAbAAoJEKurIx+X31iBQCkP/3W3+bo2uabm8sjz6ivBi3N8
 FW+xiBeyKoCsJIbsdr+txpWoSeIJSQ2Wgd3QwB5L4ztrsykZMSXDg5NzMnnsWh/A
 30EfjraillriN9PHt+jcBBEr43VIMoxHA4E8HOuG08tiEaEQMXyGE0PirbNIB2lu
 Dh5dq5u+fPA9KiD6lHLLfYSAq2PAOl425tsRRRWqaivWjyu0N5pvZJ7zsNN1bFv1
 zVlA++IK7XyiM8QuGIMsQ3hLbVs3tdWVeqlS3CcqHRNi2rHQaRI5oX4Rdfj573Tn
 M7LfYINdhEzzggOpAK9o3olzF9+Z7E9Dj+hSO70GercgD0m3uaRPIgTig9SGP8nq
 IctyFjhC0MfTPMql83fhmcqEyXJelzKXpELT1FG3MpPUlSMXKyOHWxwfqyfUIj1z
 5JV8hurPME0p5FlfLWVaA0o3tncx26SYZ3y7Tdd5fHoOkaZj+Wqcdgs48h5o+1UL
 VjtCh9JUTx5jwn1+hviPxi4l0gfLwcBLVMqV1F3UCBLMeFD9lRuUs9HiX6tQKXAv
 E6RR5CdGZPgzqgcmoNQU73JjleZWlMqNDgn/8QpZ1NWEOq8wkPJqKkbPL1wfO7yC
 Fso4tsRwJn7ykpz4xbEoXYaoTycuIPgqoddxv/OqWw1bTAtlk/mo1sxalrg0XLrv
 mdp5VTjy4jf+/m6HcOid
 =0lKw
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull ia64 fixlet from Tony Luck:
 "Add execveat syscall"

* tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Enable execveat syscall for ia64
2015-01-05 14:31:20 -08:00
Tony Luck
b739896dd2 [IA64] Enable execveat syscall for ia64
See commit 51f39a1f0c
    syscalls: implement execveat() system call

Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-01-05 11:25:19 -08:00
Linus Torvalds
693a30b8f1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
 "Two fixes for UML regressions. Nothing exciting"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  x86, um: actually mark system call tables readonly
  um: Skip futex_atomic_cmpxchg_inatomic() test
2015-01-04 11:46:43 -08:00
Pavel Machek
4bf9636c39 Revert "ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo"
Commit 9fc2105aea ("ARM: 7830/1: delay: don't bother reporting
bogomips in /proc/cpuinfo") breaks audio in python, and probably
elsewhere, with message

  FATAL: cannot locate cpu MHz in /proc/cpuinfo

I'm not the first one to hit it, see for example

  https://theredblacktree.wordpress.com/2014/08/10/fatal-cannot-locate-cpu-mhz-in-proccpuinfo/
  https://devtalk.nvidia.com/default/topic/765800/workaround-for-fatal-cannot-locate-cpu-mhz-in-proc-cpuinf/?offset=1

Reading original changelog, I have to say "Stop breaking working setups.
You know who you are!".

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-04 11:39:59 -08:00
Daniel Borkmann
b485342bd7 x86, um: actually mark system call tables readonly
Commit a074335a37 ("x86, um: Mark system call tables readonly") was
supposed to mark the sys_call_table in UML as RO by adding the const,
but it doesn't have the desired effect as it's nevertheless being placed
into the data section since __cacheline_aligned enforces sys_call_table
being placed into .data..cacheline_aligned instead. We need to use
the ____cacheline_aligned version instead to fix this issue.

Before:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                 U sys_writev
0000000000000000 D sys_call_table
0000000000000000 D syscall_table_size

After:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                 U sys_writev
0000000000000000 R sys_call_table
0000000000000000 D syscall_table_size

Fixes: a074335a37 ("x86, um: Mark system call tables readonly")
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2015-01-04 14:21:25 +01:00
Richard Weinberger
f911d73105 um: Skip futex_atomic_cmpxchg_inatomic() test
futex_atomic_cmpxchg_inatomic() does not work on UML because
it triggers a copy_from_user() in kernel context.
On UML copy_from_user() can only be used if the kernel was called
by a real user space process such that UML can use ptrace()
to fetch the value.

Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Daniel Walter <d.walter@0x90.at>
2015-01-04 14:20:26 +01:00
Linus Torvalds
d753856c9f SCSI fixes on 20150102
This is a set of three fixes: one to correct an abort path thinko causing
 failures (and a panic) in USB on device misbehaviour, One to fix an out of
 order issue in the fnic driver and one to match discard expectations to qemu
 which otherwise cause Linux to behave badly as a guest.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJUpsmHAAoJEDeqqVYsXL0MdToH/1UsmOdtxNh6AfBDWWYi45o8
 kBno1gTnRrDgIlOiRu+BtmL25A1rNcCdCQKjG6JBEBruqUVAwZztktGjfuSvso6s
 EYbrnE+5DmDs6cW8pp6GK3QGV+R+AmbT8oBbe/Kpg5LdTsdnQOozSycUp7X3XgTi
 pLOm6rW/AgEV1QFcQz1bjI6cnbcOZMcGZnC5qwphiKgBnVYd+PZY24RSHDKCu/va
 z2lsa5yqXFHKZZZRhYG343YqCTf3Dkph78124JoNvVm3EjO+GQAAiojiUa7l59UF
 RqNRqeMxfz2cPmBnJxbNmWiP1YQGBgOaNDRgc7D7SPxaMwUe9444Gm4MkoZWzmQ=
 =706l
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of three fixes: one to correct an abort path thinko
  causing failures (and a panic) in USB on device misbehaviour, One to
  fix an out of order issue in the fnic driver and one to match discard
  expectations to qemu which otherwise cause Linux to behave badly as a
  guest"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  SCSI: fix regression in scsi_send_eh_cmnd()
  fnic: IOMMU Fault occurs when IO and abort IO is out of order
  sd: tweak discard heuristics to work around QEMU SCSI issue
2015-01-02 13:24:41 -08:00
Linus Torvalds
6a4bfa7c3f sound fixes for 3.19-rc3
Nothing too exciting as a new year's start here: most of fixes are for
 ASoC, a boot crash fix on OMAP for deferred probe, a few driver
 specific fixes (Intel, dwc, rockchip, rt5677), in addition to typo
 fixes in kerneldoc comments for PCM.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJUplctAAoJEGwxgFQ9KSmkmjwP/0Ccw2+8sh3uqPbmiwCBOYYL
 AUo0nhjvsoeCFqgs1rUdQBEeVqvTw5wckhmnvDAYri4wSlnT078m9o4lJpTo/Bem
 1VngAWjcA7CErhaU9G988znGAuIWhf9nr/DVhabIr22vmsPrGUDQrC3B/28596mn
 Pcj2H412vZRMOu6+ZDKX+9oxAWBnP9UDD3hjrKjTCUxwVlhfxAfdbsPNbkivIrLb
 QAkhqIHuMlb3D2MaBUjbQ+CQf+Q0N1szZoy9t83OCUgbNhmNObuwvso21mD92mwQ
 4reW0TdFHXOk3HrY0RBMkvnmrRzZoPBWxBH0bCTLO87rBtZC11IdkMpQ9mZ7gO9Q
 p/P8z+S1I8nuVmFamW0IbSjslVl8+zh/hJ3H+t4bZlOJgXevcl+VwIocsNcdPhTo
 V8CqcgwKLwwMuPOgHsiHk+2uCDxm23zmlgo5jyOuKuSnKxo5qu1aH3Mw7mU3S3dd
 +AOWmM4ighCK2rXkFJSH4UV7/bFIFGS0wfC7adqHW2BaXS/CRStwoYw/7C48c9Mk
 bY3vSjk7HwOebdnhwXclqWpLDPnmAWWfQvYW+mYakIWX0+qLpL1BFQUr6fYwmLMP
 GpPBdgqZmrchwEKQM5Wg+Gz4jlkOXzs1qUMidq4f6JxM2A1H2A8BfTz3pScJqqDh
 +lHj5upG2yVpYQbf4DEf
 =ARQX
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Nothing too exciting as a new year's start here: most of fixes are for
  ASoC, a boot crash fix on OMAP for deferred probe, a few driver
  specific fixes (Intel, dwc, rockchip, rt5677), in addition to typo
  fixes in kerneldoc comments for PCM"

* tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: pcm: Fix kerneldoc for params_*() functions
  ASoC: rockchip: i2s: fix maxburst of dma data to 4
  ASoC: rockchip: i2s: fix error defination of transmit data level
  ASoC: Intel: correct the fixed free block allocation
  ASoC: rt5677: fixed rt5677_dsp_vad_put rt5677_dsp_vad_get panic
  ASoC: Intel: Fix BYTCR machine driver MODULE_ALIAS
  ASoC: Intel: Fix BYTCR firmware name
  ASoC: dwc: Iterate over all channels
  ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
  ASoC: Intel: Add I2C dependency to two new machines
  ASoC: dapm: Remove snd_soc_of_parse_audio_routing() due to deferred probe
2015-01-02 12:57:20 -08:00