Commit Graph

468297 Commits

Author SHA1 Message Date
Ard Biesheuvel
c40f2f8ff8 arm/arm64: KVM: add 'writable' parameter to kvm_phys_addr_ioremap
Add support for read-only MMIO passthrough mappings by adding a
'writable' parameter to kvm_phys_addr_ioremap. For the moment,
mappings will be read-write even if 'writable' is false, but once
the definition of PAGE_S2_DEVICE gets changed, those mappings will
be created read-only.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-10 13:07:37 +02:00
Ard Biesheuvel
37b544087e arm/arm64: KVM: fix potential NULL dereference in user_mem_abort()
Handle the potential NULL return value of find_vma_intersection()
before dereferencing it.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-10 13:07:37 +02:00
Ard Biesheuvel
e9e8578b6c arm/arm64: KVM: use __GFP_ZERO not memset() to get zeroed pages
Pass __GFP_ZERO to __get_free_pages() instead of calling memset()
explicitly.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-10 13:07:37 +02:00
Arnd Bergmann
b5e7a955a3 ARM: KVM: fix vgic-disabled build
The vgic code can be disabled in Kconfig and there are dummy implementations
of most of the provided API functions for the disabled case.

However, the newly introduced kvm_vgic_destroy/kvm_vgic_vcpu_destroy
functions are lacking those dummies, resulting in this build error:

arch/arm/kvm/arm.c: In function 'kvm_arch_destroy_vm':
arch/arm/kvm/arm.c:165:2: error: implicit declaration of function 'kvm_vgic_destroy' [-Werror=implicit-function-declaration]
  kvm_vgic_destroy(kvm);
  ^
arch/arm/kvm/arm.c: In function 'kvm_arch_vcpu_free':
arch/arm/kvm/arm.c:248:2: error: implicit declaration of function 'kvm_vgic_vcpu_destroy' [-Werror=implicit-function-declaration]
  kvm_vgic_vcpu_destroy(vcpu);
  ^

This adds two inline helpers to get it to build again in this configuration.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: c1bfb577ad ("arm/arm64: KVM: vgic: switch to dynamic allocation")
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-07 12:30:16 +02:00
Vladimir Murzin
37a34ac1d4 arm: kvm: fix CPU hotplug
On some platforms with no power management capabilities, the hotplug
implementation is allowed to return from a smp_ops.cpu_die() call as a
function return. Upon a CPU onlining event, the KVM CPU notifier tries
to reinstall the hyp stub, which fails on platform where no reset took
place following a hotplug event, with the message:

CPU1: smp_ops.cpu_die() returned, trying to resuscitate
CPU1: Booted secondary processor
Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x80409540
unexpected data abort in Hyp mode at: 0x80401fe8
unexpected HVC/SVC trap in Hyp mode at: 0x805c6170

since KVM code is trying to reinstall the stub on a system where it is
already configured.

To prevent this issue, this patch adds a check in the KVM hotplug
notifier that detects if the HYP stub really needs re-installing when a
CPU is onlined and skips the installation call if the stub is already in
place, which means that the CPU has not been reset.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-29 16:02:34 +02:00
Christoffer Dall
0496daa5cf arm/arm64: KVM: Report correct FSC for unsupported fault types
When we catch something that's not a permission fault or a translation
fault, we log the unsupported FSC in the kernel log, but we were masking
off the bottom bits of the FSC which was not very helpful.

Also correctly report the FSC for data and instruction faults rather
than telling people it was a DFCS, which doesn't exist in the ARM ARM.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-26 14:39:45 +02:00
Joel Schopp
dbff124e29 arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
The current aarch64 calculation for VTTBR_BADDR_MASK masks only 39 bits
and not all the bits in the PA range. This is clearly a bug that
manifests itself on systems that allocate memory in the higher address
space range.

 [ Modified from Joel's original patch to be based on PHYS_MASK_SHIFT
   instead of a hard-coded value and to move the alignment check of the
   allocation to mmu.c.  Also added a comment explaining why we hardcode
   the IPA range and changed the stage-2 pgd allocation to be based on
   the 40 bit IPA range instead of the maximum possible 48 bit PA range.
   - Christoffer ]

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Joel Schopp <joel.schopp@amd.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-26 14:39:13 +02:00
Christoffer Dall
0fea6d7628 arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
The sgi values calculated in read_set_clear_sgi_pend_reg() and
write_set_clear_sgi_pend_reg() were horribly incorrectly multiplied by 4
with catastrophic results in that subfunctions ended up overwriting
memory not allocated for the expected purpose.

This showed up as bugs in kfree() and the kernel complaining a lot of
you turn on memory debugging.

This addresses: http://marc.info/?l=kvm&m=141164910007868&w=2

Reported-by: Shannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-25 19:38:25 +02:00
Marc Zyngier
a98f26f183 arm/arm64: KVM: vgic: make number of irqs a configurable attribute
In order to make the number of interrupts configurable, use the new
fancy device management API to add KVM_DEV_ARM_VGIC_GRP_NR_IRQS as
a VGIC configurable attribute.

Userspace can now specify the exact size of the GIC (by increments
of 32 interrupts).

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-18 18:48:58 -07:00
Marc Zyngier
4956f2bc1f arm/arm64: KVM: vgic: delay vgic allocation until init time
It is now quite easy to delay the allocation of the vgic tables
until we actually require it to be up and running (when the first
vcpu is kicking around, or someones tries to access the GIC registers).

This allow us to allocate memory for the exact number of CPUs we
have. As nobody configures the number of interrupts just yet,
use a fallback to VGIC_NR_IRQS_LEGACY.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-18 18:48:58 -07:00
Marc Zyngier
5fb66da640 arm/arm64: KVM: vgic: kill VGIC_NR_IRQS
Nuke VGIC_NR_IRQS entierly, now that the distributor instance
contains the number of IRQ allocated to this GIC.

Also add VGIC_NR_IRQS_LEGACY to preserve the current API.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-18 18:48:57 -07:00
Marc Zyngier
c3c918361a arm/arm64: KVM: vgic: handle out-of-range MMIO accesses
Now that we can (almost) dynamically size the number of interrupts,
we're facing an interesting issue:

We have to evaluate at runtime whether or not an access hits a valid
register, based on the sizing of this particular instance of the
distributor. Furthermore, the GIC spec says that accessing a reserved
register is RAZ/WI.

For this, add a new field to our range structure, indicating the number
of bits a single interrupts uses. That allows us to find out whether or
not the access is in range.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-18 18:48:57 -07:00
Marc Zyngier
fc675e355e arm/arm64: KVM: vgic: kill VGIC_MAX_CPUS
We now have the information about the number of CPU interfaces in
the distributor itself. Let's get rid of VGIC_MAX_CPUS, and just
rely on KVM_MAX_VCPUS where we don't have the choice. Yet.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-18 18:48:57 -07:00
Marc Zyngier
fb65ab63b8 arm/arm64: KVM: vgic: Parametrize VGIC_NR_SHARED_IRQS
Having a dynamic number of supported interrupts means that we
cannot relly on VGIC_NR_SHARED_IRQS being fixed anymore.

Instead, make it take the distributor structure as a parameter,
so it can return the right value.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-18 18:48:56 -07:00
Marc Zyngier
c1bfb577ad arm/arm64: KVM: vgic: switch to dynamic allocation
So far, all the VGIC data structures are statically defined by the
*maximum* number of vcpus and interrupts it supports. It means that
we always have to oversize it to cater for the worse case.

Start by changing the data structures to be dynamically sizeable,
and allocate them at runtime.

The sizes are still very static though.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-18 18:48:52 -07:00
Marc Zyngier
71afaba4a2 KVM: ARM: vgic: plug irq injection race
As it stands, nothing prevents userspace from injecting an interrupt
before the guest's GIC is actually initialized.

This goes unnoticed so far (as everything is pretty much statically
allocated), but ends up exploding in a spectacular way once we switch
to a more dynamic allocation (the GIC data structure isn't there yet).

The fix is to test for the "ready" flag in the VGIC distributor before
trying to inject the interrupt. Note that in order to avoid breaking
userspace, we have to ignore what is essentially an error.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-18 18:45:06 -07:00
Christoffer Dall
7e362919a5 arm/arm64: KVM: vgic: Clarify and correct vgic documentation
The VGIC virtual distributor implementation documentation was written a
very long time ago, before the true nature of the beast had been
partially absorbed into my bloodstream.  Clarify the docs.

Plus, it fixes an actual bug.  ICFRn, pfff.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-18 18:44:32 -07:00
Christoffer Dall
9da48b5502 arm/arm64: KVM: vgic: Fix SGI writes to GICD_I{CS}PENDR0
Writes to GICD_ISPENDR0 and GICD_ICPENDR0 ignore all settings of the
pending state for SGIs.  Make sure the implementation handles this
correctly.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-18 18:44:32 -07:00
Christoffer Dall
faa1b46c3e arm/arm64: KVM: vgic: Improve handling of GICD_I{CS}PENDRn
Writes to GICD_ISPENDRn and GICD_ICPENDRn are currently not handled
correctly for level-triggered interrupts.  The spec states that for
level-triggered interrupts, writes to the GICD_ISPENDRn activate the
output of a flip-flop which is in turn or'ed with the actual input
interrupt signal.  Correspondingly, writes to GICD_ICPENDRn simply
deactivates the output of that flip-flop, but does not (of course) affect
the external input signal.  Reads from GICC_IAR will also deactivate the
flip-flop output.

This requires us to track the state of the level-input separately from
the state in the flip-flop.  We therefore introduce two new variables on
the distributor struct to track these two states.  Astute readers may
notice that this is introducing more state than required (because an OR
of the two states gives you the pending state), but the remaining vgic
code uses the pending bitmap for optimized operations to figure out, at
the end of the day, if an interrupt is pending or not on the distributor
side.  Refactoring the code to consider the two state variables all the
places where we currently access the precomputed pending value, did not
look pretty.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-18 18:44:31 -07:00
Christoffer Dall
cced50c928 arm/arm64: KVM: vgic: Clear queued flags on unqueue
If we unqueue a level-triggered interrupt completely, and the LR does
not stick around in the active state (and will therefore no longer
generate a maintenance interrupt), then we should clear the queued flag
so that the vgic can actually queue this level-triggered interrupt at a
later time and deal with its pending state then.

Note: This should actually be properly fixed to handle the active state
on the distributor.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-18 18:44:31 -07:00
Christoffer Dall
dbf20f9d81 arm/arm64: KVM: Rename irq_active to irq_queued
We have a special bitmap on the distributor struct to keep track of when
level-triggered interrupts are queued on the list registers.  This was
named irq_active, which is confusing, because the active state of an
interrupt as per the GIC spec is a different thing, not specifically
related to edge-triggered/level-triggered configurations but rather
indicates an interrupt which has been ack'ed but not yet eoi'ed.

Rename the bitmap and the corresponding accessor functions to irq_queued
to clarify what this is actually used for.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-18 18:44:30 -07:00
Christoffer Dall
227844f538 arm/arm64: KVM: Rename irq_state to irq_pending
The irq_state field on the distributor struct is ambiguous in its
meaning; the comment says it's the level of the input put, but that
doesn't make much sense for edge-triggered interrupts.  The code
actually uses this state variable to check if the interrupt is in the
pending state on the distributor so clarify the comment and rename the
actual variable and accessor methods.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-18 18:44:30 -07:00
Christoffer Dall
a875dafcf9 Merge remote-tracking branch 'kvm/next' into queue
Conflicts:
	arch/arm64/include/asm/kvm_host.h
	virt/kvm/arm/vgic.c
2014-09-18 18:15:32 -07:00
Tang Chen
f51770ed46 kvm: Make init_rmode_identity_map() return 0 on success.
In init_rmode_identity_map(), there two variables indicating the return
value, r and ret, and it return 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and ret is redundant.

This patch removes the redundant variable, and makes init_rmode_identity_map()
return 0 on success, -errno on failure.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-17 13:10:12 +02:00
Tang Chen
a255d4795f kvm: Remove ept_identity_pagetable from struct kvm_arch.
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
      As a result, it cannot be migrated/hot-removed. After this patch, since
      kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
      is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-17 13:10:11 +02:00
Will Deacon
80ce163972 KVM: VFIO: register kvm_device_ops dynamically
Now that we have a dynamic means to register kvm_device_ops, use that
for the VFIO kvm device, instead of relying on the static table.

This is achieved by a module_init call to register the ops with KVM.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alex Williamson <Alex.Williamson@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-17 13:10:10 +02:00
Cornelia Huck
84877d9333 KVM: s390: register flic ops dynamically
Using the new kvm_register_device_ops() interface makes us get rid of
an #ifdef in common code.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-17 13:10:09 +02:00
Will Deacon
c06a841bf3 KVM: ARM: vgic: register kvm_device_ops dynamically
Now that we have a dynamic means to register kvm_device_ops, use that
for the ARM VGIC, instead of relying on the static table.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-17 13:10:09 +02:00
Will Deacon
d60eacb070 KVM: device: add simple registration mechanism for kvm_device_ops
kvm_ioctl_create_device currently has knowledge of all the device types
and their associated ops. This is fairly inflexible when adding support
for new in-kernel device emulations, so move what we currently have out
into a table, which can support dynamic registration of ops by new
drivers for virtual hardware.

Cc: Alex Williamson <Alex.Williamson@redhat.com>
Cc: Alex Graf <agraf@suse.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-17 13:10:08 +02:00
Zhang Haoyu
184564efae kvm: ioapic: conditionally delay irq delivery duringeoi broadcast
Currently, we call ioapic_service() immediately when we find the irq is still
active during eoi broadcast. But for real hardware, there's some delay between
the EOI writing and irq delivery.  If we do not emulate this behavior, and
re-inject the interrupt immediately after the guest sends an EOI and re-enables
interrupts, a guest might spend all its time in the ISR if it has a broken
handler for a level-triggered interrupt.

Such livelock actually happens with Windows guests when resuming from
hibernation.

As there's no way to recognize the broken handle from new raised ones, this patch
delays an interrupt if 10.000 consecutive EOIs found that the interrupt was
still high.  The guest can then make a little forward progress, until a proper
IRQ handler is set or until some detection routine in the guest (such as
Linux's note_interrupt()) recognizes the situation.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-16 14:44:48 +02:00
Guo Hui Liu
105b21bbf6 KVM: x86: Use kvm_make_request when applicable
This patch replace the set_bit method by kvm_make_request
to make code more readable and consistent.

Signed-off-by: Guo Hui Liu <liuguohui@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-16 14:44:20 +02:00
Eric Auger
0ba09511dd KVM: EVENTFD: remove inclusion of irq.h
No more needed. irq.h would be void on ARM.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-11 11:31:19 +01:00
Ard Biesheuvel
a7d079cea2 ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()
The ISS encoding for an exception from a Data Abort has a WnR
bit[6] that indicates whether the Data Abort was caused by a
read or a write instruction. While there are several fields
in the encoding that are only valid if the ISV bit[24] is set,
WnR is not one of them, so we can read it unconditionally.

Instead of fixing both implementations of kvm_is_write_fault()
in place, reimplement it just once using kvm_vcpu_dabt_iswrite(),
which already does the right thing with respect to the WnR bit.
Also fix up the callers to pass 'vcpu'

Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-11 11:31:13 +01:00
Paolo Bonzini
a183b638b6 KVM: x86: make apic_accept_irq tracepoint more generic
Initially the tracepoint was added only to the APIC_DM_FIXED case,
also because it reported coalesced interrupts that only made sense
for that case.  However, the coalesced argument is not used anymore
and tracing other delivery modes is useful, so hoist the call out
of the switch statement.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-11 11:51:02 +02:00
Tang Chen
73a6d94162 kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the address of
apic access page. So use this macro.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-11 11:10:22 +02:00
Paolo Bonzini
2c69c1a321 KVM: s390: Fixes and features for next (3.18)
1. Crypto/CPACF support: To enable the MSA4 instructions we have to
    provide a common control structure for each SIE control block
 2. Two cleanups found by a static code checker: one redundant assignment
    and one useless if
 3. Fix the page handling of the diag10 ballooning interface. If the
    guest freed the pages at absolute 0 some checks and frees were
    incorrect
 4. Limit guests to 16TB
 5. Add __must_check to interrupt injection code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJUEFCCAAoJEBF7vIC1phx8HgYP/RQLmy5cD8EJCUnETCjnj9+E
 5AhAtHVMAgjTuKdAIIrTbcdA3od8BjnYgfHi7zczG9XSOTA6Mq+suelTwkoWWQmU
 xA4Q4/zuliU5w2LgFYDeSQIrn4faNrCut0eDd2ViQhKL2m8PYBA7O4CllsDo1o8l
 B8q4cHTinXSU4fbppKaohDrdoNvSJVSiNeIy/doAxYLx/OKhlxmGd+dFXQYtUIvP
 /aO31h5BhFkoljRi0AU11GC6yzIHqwJtjRjnV5k2BWmPi8kyeJe8mAbkE0+M10uI
 OSVC34BHh7p+O9vNUZHKBzhZnqvyB4qc+s8Vp5wXwG0mWjWPkaevZgxOLvccOBi1
 1SPKv4ic2aFdfcSiMrP+EEL45OS/ISely2o1bgNVWHxSdgsnMir5H3Frohn+pWaZ
 y6Fw9jaXKTy5f25/IzTteENEvURQVfNjAHUQO03oXu/L+G8MVhzY6g7WiK+tRHDo
 anM3fUThkPwBPdMVpDsBTsCoTOTVTJqMHvnxJujEWYJmVmnnzZzMyUDIUK9qqzVi
 jZDpP8NEkV5LbWMr240DW1XteRGFKWtPZ176+NJUfut40jQWTiDLhBf1wG1xBRsF
 40JYBqATgaLM1Ggk4r3ykd1U9jrX3oqxMT7XJsc111m5ieaaiD1F5SJPXw07Rvug
 cSJI39G+9Df7Jz4R3hwv
 =jt+E
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20140910' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

KVM: s390: Fixes and features for next (3.18)

1. Crypto/CPACF support: To enable the MSA4 instructions we have to
   provide a common control structure for each SIE control block
2. Two cleanups found by a static code checker: one redundant assignment
   and one useless if
3. Fix the page handling of the diag10 ballooning interface. If the
   guest freed the pages at absolute 0 some checks and frees were
   incorrect
4. Limit guests to 16TB
5. Add __must_check to interrupt injection code
2014-09-11 11:09:33 +02:00
Christian Borntraeger
bfac1f59a1 KVM: s390/interrupt: remove double assignment
r is already initialized to 0.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
2014-09-10 12:19:45 +02:00
Christian Borntraeger
f7a960affc KVM: s390/cmm: Fix prefix handling for diag 10 balloon
The old handling of prefix pages was broken in the diag10 ballooner.
We now rely on gmap_discard to check for start > end and do a
slow path if the prefix swap pages are affected:
1. discard the pages from start to prefix
2. discard the absolute 0 pages
3. discard the pages after prefix swap to end

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
2014-09-10 12:19:42 +02:00
Christian Borntraeger
6b331952f1 KVM: s390: get rid of constant condition in ipte_unlock_simple
Due to the earlier check we know that ipte_lock_count must be 0.
No need to add a useless if. Let's make clear that we are going
to always wakeup when we execute that code.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-09-10 12:19:38 +02:00
Christian Borntraeger
f346026e55 KVM: s390: unintended fallthrough for external call
We must not fallthrough if the conditions for external call are not met.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
2014-09-10 12:19:30 +02:00
Christian Borntraeger
0349985add KVM: s390: Limit guest size to 16TB
Currently we fill up a full 5 level page table to hold the guest
mapping. Since commit "support gmap page tables with less than 5
levels" we can do better.
Having more than 4 TB might be useful for some testing scenarios,
so let's just limit ourselves to 16TB guest size.
Having more than that is totally untested as I do not have enough
swap space/memory.

We continue to allow ucontrol the full size.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-10 12:19:15 +02:00
Christian Borntraeger
614aeab4dc KVM: s390: add __must_check to interrupt deliver functions
We now propagate interrupt injection errors back to the ioctl. We
should mark functions that might fail with __must_check.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
2014-09-10 12:19:12 +02:00
Tony Krowiak
5102ee8795 KVM: CPACF: Enable MSA4 instructions for kvm guest
We have to provide a per guest crypto block for the CPUs to
enable MSA4 instructions. According to icainfo on z196 or
later this enables CCM-AES-128, CMAC-AES-128, CMAC-AES-192
and CMAC-AES-256.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
2014-09-10 12:19:05 +02:00
Alex Bennée
209cf19fcd KVM: fix api documentation of KVM_GET_EMULATED_CPUID
It looks like when this was initially merged it got accidentally included
in the following section. I've just moved it back in the correct section
and re-numbered it as other ioctls have been added since.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-10 11:34:39 +02:00
Alex Bennée
4bd9d3441e KVM: document KVM_SET_GUEST_DEBUG api
In preparation for working on the ARM implementation I noticed the debug
interface was missing from the API document. I've pieced together the
expected behaviour from the code and commit messages written it up as
best I can.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-10 11:33:12 +02:00
Christian Borntraeger
f2a2516088 KVM: remove redundant assignments in __kvm_set_memory_region
__kvm_set_memory_region sets r to EINVAL very early.
Doing it again is not necessary. The same is true later on, where
r is assigned -ENOMEM twice.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-05 12:01:15 +02:00
Christian Borntraeger
a13f533b2f KVM: remove redundant assigment of return value in kvm_dev_ioctl
The first statement of kvm_dev_ioctl is
        long r = -EINVAL;

No need to reassign the same value.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-05 12:01:15 +02:00
Christian Borntraeger
3465611318 KVM: remove redundant check of in_spin_loop
The expression `vcpu->spin_loop.in_spin_loop' is always true,
because it is evaluated only when the condition
`!vcpu->spin_loop.in_spin_loop' is false.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-05 12:01:14 +02:00
Paolo Bonzini
54987b7afa KVM: x86: propagate exception from permission checks on the nested page fault
Currently, if a permission error happens during the translation of
the final GPA to HPA, walk_addr_generic returns 0 but does not fill
in walker->fault.  To avoid this, add an x86_exception* argument
to the translate_gpa function, and let it fill in walker->fault.
The nested_page_fault field will be true, since the walk_mmu is the
nested_mmu and translate_gpu instead operates on the "outer" (NPT)
instance.

Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-05 12:01:13 +02:00
Paolo Bonzini
ef54bcfeea KVM: x86: skip writeback on injection of nested exception
If a nested page fault happens during emulation, we will inject a vmexit,
not a page fault.  However because writeback happens after the injection,
we will write ctxt->eip from L2 into the L1 EIP.  We do not write back
if an instruction caused an interception vmexit---do the same for page
faults.

Suggested-by: Gleb Natapov <gleb@kernel.org>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-05 12:01:06 +02:00