Currently the calculations of stolen time for PPC Book3S HV guests
uses fields in both the vcpu struct and the kvmppc_vcore struct. The
fields in the kvmppc_vcore struct are protected by the
vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for
running the virtual core. This works correctly but confuses lockdep,
because it sees that the code takes the tbacct_lock for a vcpu in
kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in
vcore_stolen_time(), and it thinks there is a possibility of deadlock,
causing it to print reports like this:
=============================================
[ INFO: possible recursive locking detected ]
3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted
---------------------------------------------
qemu-system-ppc/6188 is trying to acquire lock:
(&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
but task is already holding lock:
(&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&vcpu->arch.tbacct_lock)->rlock);
lock(&(&vcpu->arch.tbacct_lock)->rlock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by qemu-system-ppc/6188:
#0: (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f98>] .vcpu_load+0x28/0xe0 [kvm]
#1: (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecb41b0>] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv]
#2: (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
stack backtrace:
CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89
Call Trace:
[c000000b2754f3f0] [c000000000b31b6c] .dump_stack+0x88/0xb4 (unreliable)
[c000000b2754f470] [c0000000000faeb8] .__lock_acquire+0x1878/0x2190
[c000000b2754f600] [c0000000000fbf0c] .lock_acquire+0xcc/0x1a0
[c000000b2754f6d0] [c000000000b2954c] ._raw_spin_lock_irq+0x4c/0x70
[c000000b2754f760] [d00000000ecb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
[c000000b2754f7f0] [d00000000ecb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv]
[c000000b2754f880] [d00000000ecb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv]
[c000000b2754f9f0] [d00000000eb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
[c000000b2754fa60] [d00000000eb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
[c000000b2754faf0] [d00000000eb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm]
[c000000b2754fcb0] [c000000000267eb4] .do_vfs_ioctl+0x444/0x770
[c000000b2754fd90] [c0000000002682a4] .SyS_ioctl+0xc4/0xe0
[c000000b2754fe30] [c0000000000092e4] syscall_exit+0x0/0x98
In order to make the locking easier to analyse, we change the code to
use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and
preempt_tb fields. This lock needs to be an irq-safe lock since it is
used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv()
functions, which are called with the scheduler rq lock held, which is
an irq-safe lock.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Remove the function inst_set_field() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Alexander Graf <agraf@suse.de>
Remove the function get_fpr_index() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Alexander Graf <agraf@suse.de>
Removes some functions that are not used anywhere:
kvmppc_core_load_guest_debugstate() kvmppc_core_load_host_debugstate()
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Alexander Graf <agraf@suse.de>
Remove the function sr_nx() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Alexander Graf <agraf@suse.de>
The kvmppc_vcore_blocked() code does not check for the wait condition
after putting the process on the wait queue. This means that it is
possible for an external interrupt to become pending, but the vcpu to
remain asleep until the next decrementer interrupt. The fix is to
make one last check for pending exceptions and ceded state before
calling schedule().
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
When being restored from qemu, the kvm_get_htab_header are in native
endian, but the ptes are big endian.
This patch fixes restore on a KVM LE host. Qemu also needs a fix for
this :
http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg00008.html
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This fixes some inaccuracies in the state machine for the virtualized
ICP when implementing the H_IPI hcall (Set_MFFR and related states):
1. The old code wipes out any pending interrupts when the new MFRR is
more favored than the CPPR but less favored than a pending
interrupt (by always modifying xisr and the pending_pri). This can
cause us to lose a pending external interrupt.
The correct code here is to only modify the pending_pri and xisr in
the ICP if the MFRR is equal to or more favored than the current
pending pri (since in this case, it is guaranteed that that there
cannot be a pending external interrupt). The code changes are
required in both kvmppc_rm_h_ipi and kvmppc_h_ipi.
2. Again, in both kvmppc_rm_h_ipi and kvmppc_h_ipi, there is a check
for whether MFRR is being made less favored AND further if new MFFR
is also less favored than the current CPPR, we check for any
resends pending in the ICP. These checks look like they are
designed to cover the case where if the MFRR is being made less
favored, we opportunistically trigger a resend of any interrupts
that had been previously rejected. Although, this is not a state
described by PAPR, this is an action we actually need to do
especially if the CPPR is already at 0xFF. Because in this case,
the resend bit will stay on until another ICP state change which
may be a long time coming and the interrupt stays pending until
then. The current code which checks for MFRR < CPPR is broken when
CPPR is 0xFF since it will not get triggered in that case.
Ideally, we would want to do a resend only if
prio(pending_interrupt) < mfrr && prio(pending_interrupt) < cppr
where pending interrupt is the one that was rejected. But we don't
have the priority of the pending interrupt state saved, so we
simply trigger a resend whenever the MFRR is made less favored.
3. In kvmppc_rm_h_ipi, where we save state to pass resends to the
virtual mode, we also need to save the ICP whose need_resend we
reset since this does not need to be my ICP (vcpu->arch.icp) as is
incorrectly assumed by the current code. A new field rm_resend_icp
is added to the kvmppc_icp structure for this purpose.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Testing with KSM active in the host showed occasional corruption of
guest memory. Typically a page that should have contained zeroes
would contain values that look like the contents of a user process
stack (values such as 0x0000_3fff_xxxx_xxx).
Code inspection in kvmppc_h_protect revealed that there was a race
condition with the possibility of granting write access to a page
which is read-only in the host page tables. The code attempts to keep
the host mapping read-only if the host userspace PTE is read-only, but
if that PTE had been temporarily made invalid for any reason, the
read-only check would not trigger and the host HPTE could end up
read-write. Examination of the guest HPT in the failure situation
revealed that there were indeed shared pages which should have been
read-only that were mapped read-write.
To close this race, we don't let a page go from being read-only to
being read-write, as far as the real HPTE mapping the page is
concerned (the guest view can go to read-write, but the actual mapping
stays read-only). When the guest tries to write to the page, we take
an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a
writable HPTE for the page.
This eliminates the occasional corruption of shared pages
that was previously seen with KSM active.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
The B (segment size) field in the RB operand for the tlbie
instruction is two bits, which we get from the top two bits of
the first doubleword of the HPT entry to be invalidated. These
bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM
bit numbering).
The compute_tlbie_rb() function gets these bits as v >> (62 - 8),
which is not correct as it will bring in the top 10 bits, not
just the top two. These extra bits could corrupt the AP, AVAL
and L fields in the RB value. To fix this we shift right 62 bits
and then shift left 8 bits, so we only get the two bits of the
B field.
The first doubleword of the HPT entry is under the control of the
guest kernel. In fact, Linux guests will always put zeroes in bits
54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing
this.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the
next HPTE without unlocking the invalid one. In fact we should never
find an invalid and unlocked HPTE in the rmap chain, but for robustness
we should unlock it. This adds the missing unlock.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
When injecting an IRQ, we only document which IRQ priority (which translates
to IRQ type) gets injected. However, when reading traces you don't necessarily
have all the numbers in your head to know which IRQ really is meant.
This patch converts the IRQ number field to a symbolic name that is in sync
with the respective define. That way it's a lot easier for readers to figure
out what interrupt gets injected.
Signed-off-by: Alexander Graf <agraf@suse.de>
We reused host EBX and ECX, but KVM might not support all features;
emulated XSAVE size should be smaller.
EBX depends on unknown XCR0, so we default to ECX.
SDM CPUID (EAX = 0DH, ECX = 0):
EBX Bits 31-00: Maximum size (bytes, from the beginning of the
XSAVE/XRSTOR save area) required by enabled features in XCR0. May
be different than ECX if some features at the end of the XSAVE save
area are not enabled.
ECX Bit 31-00: Maximum size (bytes, from the beginning of the
XSAVE/XRSTOR save area) of the XSAVE/XRSTOR save area required by
all supported features in the processor, i.e all the valid bit
fields in XCR0.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add nested virtualization support for xsaves.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add logic to get/set the XSS model-specific register.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Initialize the XSS exit bitmap. It is zero so there should be no XSAVES
or XRSTORS exits.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- EAX=0Dh, ECX=1: output registers EBX/ECX/EDX are reserved.
- EAX=0Dh, ECX>1: output register ECX bit 0 is clear for all the CPUID
leaves we support, because variable "supported" comes from XCR0 and not
XSS. Bits above 0 are reserved, so ECX is overall zero. Output register
EDX is reserved.
Source: Intel Architecture Instruction Set Extensions Programming
Reference, ref. number 319433-022
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is the size of the XSAVES area. This starts providing guest support
for XSAVES (with no support yet for supervisor states, i.e. XSS == 0
always in guests for now).
Wanpeng Li suggested testing XSAVEC as well as XSAVES, since in practice
no real processor exists that only has one of them, and there is no
other way for userspace programs to compute the area of the XSAVEC
save area. CPUID(EAX=0xd,ECX=1).EBX provides an upper bound.
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expose the XSAVES feature to the guest if the kvm_x86_ops say it is
available.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
struct xsave_struct might be using the compacted format. Convert
in order to preserve userspace ABI.
Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
but the kernel will pass it to XRSTORS, and we need to convert back.
Fixes: f31a9f7c71
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
get_xsave_addr is the API to access XSAVE states, and KVM would
like to use it. Export it.
Cc: stable@vger.kernel.org
Cc: x86@kernel.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Here we have two fixups of the latest interrupt rework and
one architectural fixup.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJUgIO9AAoJEBF7vIC1phx8U5YQALKPvkoyVaGOexDCQQJrz04+
9W5ZQuvg020jj1EmRPlv9yekhtYBTYA/OVT+FsDG3eq+xCS4Y04vlvUfk2Lq0YAB
Bk+cU7plNP8f1Ml/kRTC/yfWcPe8iCVI1WA7zqA2wwEbaEZCz+9+i76wyriDWtJM
FJEKXqVmLZH/H+GAw4GQG0YrO4uZ70Sk5F5YVtceGN5mDlqCATXa2xADm4ciAiIW
L6w+G3KjIuv/Opds6DFMfJgYQeAF+0MFFWpD2rUfVLwIvdtBTtnxVwQZfpoXH7rF
SeNU6n7CILS0csW/ZMcuTE8UrYgW1kkj1iVgc6fT7nkTWWZ5+iGRbAKQG5WPKibn
84f9cn7kJ22ZobayaLskNfe041o1a+zMUCswHdFYTgF29WefeqkikoPx0B80g4p6
O1jZqSZXKiTtlmIqiISikJhDVx0/kC+ftlu4MJLKq7O8RgcaYmAJp7gSGz50sMPB
AybcfNkoYsg5J85sinT16TA/Vmcd3ZBWRqaokiRFD/uuqh5cKnyZOTpPwph8Welq
QxBtqpG36cBLRJRMzlSJNXg1LKvb8WAxWYxGjiSL/8y/V3intnSFYdLjhNpx3yj8
WCxaG3m2Gf68RQOVSdqlvgH//xBeSz0l9grX+BzJNvBY7aZMmTbXHwIt/lSCiBZd
radGCMmVN4YEq4OoTgyr
=ZZSq
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-20141204' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixups for kvm/next (3.19)
Here we have two fixups of the latest interrupt rework and
one architectural fixup.
Instead of returning a possibly random or'ed together value, let's
always return -EFAULT if rc is set.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Currently we use a mixture of atomic/non-atomic bitops
and the local_int spin lock to protect the pending_irqs bitmap
and interrupt payload data.
We need to use atomic bitops for the pending_irqs bitmap everywhere
and in addition acquire the local_int lock where interrupt data needs
to be protected.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The cpu address of a source cpu (responsible for an external irq) is only to
be stored if bit 6 of the ext irq code is set.
If bit 6 is not set, it is to be zeroed out.
The special external irq code used for virtio and pfault uses the cpu addr as a
parameter field. As bit 6 is set, this implementation is correct.
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We currently track the pid of the task that runs the VCPU in vcpu_load.
If a yield to that VCPU is triggered while the PID of the wrong thread
is active, the wrong thread might receive a yield, but this will most
likely not help the executing thread at all. Instead, if we only track
the pid on the KVM_RUN ioctl, there are two possibilities:
1) the thread that did a non-KVM_RUN ioctl is holding a mutex that
the VCPU thread is waiting for. In this case, the VCPU thread is not
runnable, but we also do not do a wrong yield.
2) the thread that did a non-KVM_RUN ioctl is sleeping, or doing
something that does not block the VCPU thread. In this case, the
VCPU thread can receive the directed yield correctly.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
CC: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_enter_guest() has to be called with preemption disabled and will
set PF_VCPU. Current code takes PF_VCPU as a hint that the VCPU thread
is running and therefore needs no yield.
However, the check on PF_VCPU is wrong on s390, where preemption has
to stay enabled in order to correctly process page faults. Thus,
s390 reenables preemption and starts to execute the guest. The thread
might be scheduled out between kvm_enter_guest() and kvm_exit_guest(),
resulting in PF_VCPU being set but not being run. When this happens,
the opportunity for directed yield is missed.
However, this check is done already in kvm_vcpu_on_spin before calling
kvm_vcpu_yield_loop:
if (!ACCESS_ONCE(vcpu->preempted))
continue;
so the check on PF_VCPU is superfluous in general, and this patch
removes it.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current linear search doesn't scale well when
large amount of memslots is used and looked up slot
is not in the beginning memslots array.
Taking in account that memslots don't overlap, it's
possible to switch sorting order of memslots array from
'npages' to 'base_gfn' and use binary search for
memslot lookup by GFN.
As result of switching to binary search lookup times
are reduced with large amount of memslots.
Following is a table of search_memslot() cycles
during WS2008R2 guest boot.
boot, boot + ~10 min
mostly same of using it,
slot lookup randomized lookup
max average average
cycles cycles cycles
13 slots : 1450 28 30
13 slots : 1400 30 40
binary search
117 slots : 13000 30 460
117 slots : 2000 35 180
binary search
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
it will allow to use binary search for GFN -> memslot
lookups, reducing lookup cost with large slots amount.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In typical guest boot workload only 2-3 memslots are used
extensively, and at that it's mostly the same memslot
lookup operation.
Adding LRU cache improves average lookup time from
46 to 28 cycles (~40%) for this workload.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
UP/DOWN shift loops will shift array in needed
direction and stop at place where new slot should
be placed regardless of old slot size.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
if number of pages haven't changed sorting algorithm
will do nothing, so there is no need to do extra check
to avoid entering sorting logic.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While fixing an x2apic bug,
17d68b7 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
we've made only one cluster available. This means that the amount of
logically addressible x2APICs was reduced to 16 and VCPUs kept
overwriting themselves in that region, so even the first cluster wasn't
set up correctly.
This patch extends x2APIC support back to the logical_map's limit, and
keeps the CVE fixed as messages for non-present APICs are dropped.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
They can't be violated now, but play it safe for the future.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
x2apic allows destinations > 0xff and we don't want them delivered to
lower APICs. They are correctly handled by doing nothing.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Physical mode can't address more than one APIC, but lowest-prio is
allowed, so we just reuse our paths.
SDM 10.6.2.1 Physical Destination:
Also, for any non-broadcast IPI or I/O subsystem initiated interrupt
with lowest priority delivery mode, software must ensure that APICs
defined in the interrupt address are present and enabled to receive
interrupts.
We could warn on top of that.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
False from kvm_irq_delivery_to_apic_fast() means that we don't handle it
in the fast path, but we still return false in cases that were perfectly
handled, fix that.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR.
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 "x2APIC
Register Address Space"). KVM needs to cause #GP on such accesses.
Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes).
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Certain x86 instructions that use modrm operands only allow memory operand
(i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact.
Currently, the instructions that are such and are emulated by KVM are MOVBE,
MOVNTPS, MOVNTPD and MOVNTI. MOVBE is the most blunt example, since it may be
emulated by the host regardless of MMIO.
The fix introduces a new group for handling such instructions, marking mod3 as
illegal instruction.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Here is a bunch of fixes that deal mostly with architectural compliance:
- interrupt priorities
- interrupt handling
- intruction exit handling
We also provide a helper function for getting the guest visible storage key.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJUeHKOAAoJEBF7vIC1phx8fS8P/2i1zXvsB+Mp9FafU6FO3Ci6
yM/ZBLFEsX2jmU0TGR2szZCP4xuHqomJm441+2CX9bXzjf8vnA2hGiDYX0bnBraK
1/klx6Li1fQxsiSHXIgBXr0wh5ftNUdFVZiJoifY9dEdrhVI+YEiptIl7lADCFXi
SdGtjEAzrVEe8H0g6OuBXeEfPzHvxAzNJ31yHuiCKl7vbFVnXNVPeZhq3dJZwmWC
iIdlSqGIjbcHNMLXvrScLDAScBe6WruBLhPSy5aTIA2eBU6f4qkOedSFABJkIAq+
V6v5pXWbrVBIqfLXE7Vp7jxhP7+viBGzu/gPkfT8HV1pQZDa94WojF8hGU0DLGLd
vgZuFDV8cMOZMUS/onXOwnIXN5VPvP8V2v3y8gTiap3MykRiyTGEzq2auU9p0K5n
/8W6Cn1P/WBS8MOFYg726DGmMAWkzpEVz9rxpCLaTpzz7QVyLuSLq/n3SXyQNQIl
zhox/KzwUQD0t1062USoK3w4suYNvnX0BuFOwxXvS7f4bsb+6V/t0GyIBnVAL/OF
DZzJSIyzP/Ur/9krxJQxML3kEELU1CjwSLOrzDUnZA3ytaKvsLrHkTb9nK6fREDK
14AGRnp94B3EMPR6T+7T6gvwK2Y/QIo8Y/EFAa2BwXY4Q/BQPSVU3x3RK6L9+7jY
VaG9sgn4OC2ZPzxjqMTc
=MQUy
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-20141128' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Several fixes,cleanups and reworks
Here is a bunch of fixes that deal mostly with architectural compliance:
- interrupt priorities
- interrupt handling
- intruction exit handling
We also provide a helper function for getting the guest visible storage key.
Allow to specify CR14, logout area, external damage code
and failed storage address.
Since more then one machine check can be indicated to the guest at
a time we need to combine all indication bits with already pending
requests.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch adapts handling of local interrupts to be more compliant with
the z/Architecture Principles of Operation and introduces a data
structure
which allows more efficient handling of interrupts.
* get rid of li->active flag, use bitmap instead
* Keep interrupts in a bitmap instead of a list
* Deliver interrupts in the order of their priority as defined in the
PoP
* Use a second bitmap for sigp emergency requests, as a CPU can have
one request pending from every other CPU in the system.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Adds a bitmap to the vcpu structure which is used to keep track
of local pending interrupts. Also add enum with all interrupt
types sorted in order of priority (highest to lowest)
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Move delivery code for cpu-local interrupt from the huge do_deliver_interrupt()
to smaller functions which handle one type of interrupt.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Get rid of open coded value for virtio and pfault completion interrupts.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The 32bit external interrupt parameter is only valid for timing-alert and
service-signal interrupts.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
In preparation for the rework of the local interrupt injection code,
factor out injection routines from kvm_s390_inject_vcpu().
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>