Commit Graph

607 Commits

Author SHA1 Message Date
Marcelo Tosatti
42897d866b KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization
TSC initialization will soon make use of online_vcpus.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:14 -02:00
Alexander Graf
0588000eac Merge commit 'origin/queue' into for-queue
Conflicts:
	arch/powerpc/include/asm/Kbuild
	arch/powerpc/include/uapi/asm/Kbuild
2012-10-31 13:36:18 +01:00
Paul Mackerras
9f8c8c7812 KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0
Commit 55b665b026 ("KVM: PPC: Book3S HV: Provide a way for userspace
to get/set per-vCPU areas") includes a check on the length of the
dispatch trace log (DTL) to make sure the buffer is at least one entry
long.  This is appropriate when registering a buffer, but the
interface also allows for any existing buffer to be unregistered by
specifying a zero address.  In this case the length check is not
appropriate.  This makes the check conditional on the address being
non-zero.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:58 +01:00
Paul Mackerras
c7b676709c KVM: PPC: Book3S HV: Fix accounting of stolen time
Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.

In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis.  Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we need to count the time when a vcpu task
is preempted while it is not running as part of a vcore as stolen also.

To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state.  Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu->arch.tbacct_lock.  This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped.  Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:57 +01:00
Paul Mackerras
8455d79e21 KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:56 +01:00
Paul Mackerras
2f12f03436 KVM: PPC: Book3S HV: Fixes for late-joining threads
If a thread in a virtual core becomes runnable while other threads
in the same virtual core are already running in the guest, it is
possible for the latecomer to join the others on the core without
first pulling them all out of the guest.  Currently this only happens
rarely, when a vcpu is first started.  This fixes some bugs and
omissions in the code in this case.

First, we need to check for VPA updates for the latecomer and make
a DTL entry for it.  Secondly, if it comes along while the master
vcpu is doing a VPA update, we don't need to do anything since the
master will pick it up in kvmppc_run_core.  To handle this correctly
we introduce a new vcore state, VCORE_STARTING.  Thirdly, there is
a race because we currently clear the hardware thread's hwthread_req
before waiting to see it get to nap.  A latecomer thread could have
its hwthread_req cleared before it gets to test it, and therefore
never increment the nap_count, leading to messages about wait_for_nap
timeouts.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:55 +01:00
Paul Mackerras
913d3ff9a3 KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
There were a few places where we were traversing the list of runnable
threads in a virtual core, i.e. vc->runnable_threads, without holding
the vcore spinlock.  This extends the places where we hold the vcore
spinlock to cover everywhere that we traverse that list.

Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
this moves the call of it from kvmppc_handle_exit out to
kvmppc_vcpu_run, where we don't hold the vcore lock.

In kvmppc_vcore_blocked, we don't actually need to check whether
all vcpus are ceded and don't have any pending exceptions, since the
caller has already done that.  The caller (kvmppc_run_vcpu) wasn't
actually checking for pending exceptions, so we add that.

The change of if to while in kvmppc_run_vcpu is to make sure that we
never call kvmppc_remove_runnable() when the vcore state is RUNNING or
EXITING.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:55 +01:00
Paul Mackerras
7b444c6710 KVM: PPC: Book3S HV: Fix some races in starting secondary threads
Subsequent patches implementing in-kernel XICS emulation will make it
possible for IPIs to arrive at secondary threads at arbitrary times.
This fixes some races in how we start the secondary threads, which
if not fixed could lead to occasional crashes of the host kernel.

This makes sure that (a) we have grabbed all the secondary threads,
and verified that they are no longer in the kernel, before we start
any thread, (b) that the secondary thread loads its vcpu pointer
after clearing the IPI that woke it up (so we don't miss a wakeup),
and (c) that the secondary thread clears its vcpu pointer before
incrementing the nap count.  It also removes unnecessary setting
of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:54 +01:00
Paul Mackerras
512691d490 KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
When a Book3S HV KVM guest is running, we need the host to be in
single-thread mode, that is, all of the cores (or at least all of
the cores where the KVM guest could run) to be running only one
active hardware thread.  This is because of the hardware restriction
in POWER processors that all of the hardware threads in the core
must be in the same logical partition.  Complying with this restriction
is much easier if, from the host kernel's point of view, only one
hardware thread is active.

This adds two hooks in the SMP hotplug code to allow the KVM code to
make sure that secondary threads (i.e. hardware threads other than
thread 0) cannot come online while any KVM guest exists.  The KVM
code still has to check that any core where it runs a guest has the
secondary threads offline, but having done that check it can now be
sure that they will not come online while the guest is running.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:53 +01:00
Alexander Graf
388cf9ee3c KVM: PPC: Move mtspr/mfspr emulation into own functions
The mtspr/mfspr emulation code became quite big over time. Move it
into its own function so things stay more readable.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:51 +01:00
Alexander Graf
e43a028752 KVM: PPC: 44x: fix DCR read/write
When remembering the direction of a DCR transaction, we should write
to the same variable that we interpret on later when doing vcpu_run
again.

Signed-off-by: Alexander Graf <agraf@suse.de>
Cc: stable@vger.kernel.org
2012-10-30 10:54:50 +01:00
Xiao Guangrong
81c52c56e2 KVM: do not treat noslot pfn as a error pfn
This patch filters noslot pfn out from error pfns based on Marcelo comment:
noslot pfn is not a error pfn

After this patch,
- is_noslot_pfn indicates that the gfn is not in slot
- is_error_pfn indicates that the gfn is in slot but the error is occurred
  when translate the gfn to pfn
- is_error_noslot_pfn indicates that the pfn either it is error pfns or it
  is noslot pfn
And is_invalid_pfn can be removed, it makes the code more clean

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-29 20:31:04 -02:00
Marcelo Tosatti
19bf7f8ac3 Merge remote-tracking branch 'master' into queue
Merge reason: development work has dependency on kvm patches merged
upstream.

Conflicts:
	arch/powerpc/include/asm/Kbuild
	arch/powerpc/include/asm/kvm_para.h

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-29 19:15:32 -02:00
Christoffer Dall
8ca40a70a7 KVM: Take kvm instead of vcpu to mmu_notifier_retry
The mmu_notifier_retry is not specific to any vcpu (and never will be)
so only take struct kvm as a parameter.

The motivation is the ARM mmu code that needs to call this from
somewhere where we long let go of the vcpu pointer.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-23 13:35:43 +02:00
Aneesh Kumar K.V
ce236ab576 powerpc: Build fix for powerpc KVM
Fix build failure for powerpc KVM by adding missing VPN_SHIFT definition
and the ';'

arch/powerpc/kvm/book3s_32_mmu_host.c: In function 'kvmppc_mmu_map_page':
arch/powerpc/kvm/book3s_32_mmu_host.c:176: error: 'VPN_SHIFT' undeclared (first use in this function)
arch/powerpc/kvm/book3s_32_mmu_host.c:176: error: (Each undeclared identifier is reported only once
arch/powerpc/kvm/book3s_32_mmu_host.c:176: error: for each function it appears in.)
arch/powerpc/kvm/book3s_32_mmu_host.c:178: error: expected ';' before 'next_pteg'
arch/powerpc/kvm/book3s_32_mmu_host.c:190: error: label 'next_pteg' used but not defined
make[1]: *** [arch/powerpc/kvm/book3s_32_mmu_host.o] Error 1

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-10-18 10:37:52 +11:00
Konstantin Khlebnikov
314e51b985 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct.  Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:19 +09:00
Julia Lawall
12ecd9570d arch/powerpc/kvm/e500_tlb.c: fix error return code
Convert a 0 error return code to a negative one, as returned elsewhere in the
function.

A new label is also added to avoid freeing things that are known to not yet
be allocated.

A simplified version of the semantic match that finds the first problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@

(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
    when != ret = e3
*if (x == NULL || ...)
{
  ... when != ret = e4
*  return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:55 +02:00
Paul Mackerras
55b665b026 KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas
The PAPR paravirtualization interface lets guests register three
different types of per-vCPU buffer areas in its memory for communication
with the hypervisor.  These are called virtual processor areas (VPAs).
Currently the hypercalls to register and unregister VPAs are handled
by KVM in the kernel, and userspace has no way to know about or save
and restore these registrations across a migration.

This adds "register" codes for these three areas that userspace can
use with the KVM_GET/SET_ONE_REG ioctls to see what addresses have
been registered, and to register or unregister them.  This will be
needed for guest hibernation and migration, and is also needed so
that userspace can unregister them on reset (otherwise we corrupt
guest memory after reboot by writing to the VPAs registered by the
previous kernel).

The "register" for the VPA is a 64-bit value containing the address,
since the length of the VPA is fixed.  The "registers" for the SLB
shadow buffer and dispatch trace log (DTL) are 128 bits long,
consisting of the guest physical address in the high (first) 64 bits
and the length in the low 64 bits.

This also fixes a bug where we were calling init_vpa unconditionally,
leading to an oops when unregistering the VPA.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:55 +02:00
Paul Mackerras
a8bd19ef4d KVM: PPC: Book3S: Get/set guest FP regs using the GET/SET_ONE_REG interface
This enables userspace to get and set all the guest floating-point
state using the KVM_[GS]ET_ONE_REG ioctls.  The floating-point state
includes all of the traditional floating-point registers and the
FPSCR (floating point status/control register), all the VMX/Altivec
vector registers and the VSCR (vector status/control register), and
on POWER7, the vector-scalar registers (note that each FP register
is the high-order half of the corresponding VSR).

Most of these are implemented in common Book 3S code, except for VSX
on POWER7.  Because HV and PR differ in how they store the FP and VSX
registers on POWER7, the code for these cases is not common.  On POWER7,
the FP registers are the upper halves of the VSX registers vsr0 - vsr31.
PR KVM stores vsr0 - vsr31 in two halves, with the upper halves in the
arch.fpr[] array and the lower halves in the arch.vsr[] array, whereas
HV KVM on POWER7 stores the whole VSX register in arch.vsr[].

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix whitespace, vsx compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:54 +02:00
Paul Mackerras
a136a8bdc0 KVM: PPC: Book3S: Get/set guest SPRs using the GET/SET_ONE_REG interface
This enables userspace to get and set various SPRs (special-purpose
registers) using the KVM_[GS]ET_ONE_REG ioctls.  With this, userspace
can get and set all the SPRs that are part of the guest state, either
through the KVM_[GS]ET_REGS ioctls, the KVM_[GS]ET_SREGS ioctls, or
the KVM_[GS]ET_ONE_REG ioctls.

The SPRs that are added here are:

- DABR:  Data address breakpoint register
- DSCR:  Data stream control register
- PURR:  Processor utilization of resources register
- SPURR: Scaled PURR
- DAR:   Data address register
- DSISR: Data storage interrupt status register
- AMR:   Authority mask register
- UAMOR: User authority mask override register
- MMCR0, MMCR1, MMCRA: Performance monitor unit control registers
- PMC1..PMC8: Performance monitor unit counter registers

In order to reduce code duplication between PR and HV KVM code, this
moves the kvm_vcpu_ioctl_[gs]et_one_reg functions into book3s.c and
centralizes the copying between user and kernel space there.  The
registers that are handled differently between PR and HV, and those
that exist only in one flavor, are handled in kvmppc_[gs]et_one_reg()
functions that are specific to each flavor.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: minimal style fixes]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:54 +02:00
Scott Wood
5bd1cf1185 KVM: PPC: set IN_GUEST_MODE before checking requests
Avoid a race as described in the code comment.

Also remove a related smp_wmb() from booke's kvmppc_prepare_to_enter().
I can't see any reason for it, and the book3s_pr version doesn't have it.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:54 +02:00
Scott Wood
adbb48a854 KVM: PPC: e500: MMU API: fix leak of shared_tlb_pages
This was found by kmemleak.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:53 +02:00
Scott Wood
e400e72f25 KVM: PPC: e500: fix allocation size error on g2h_tlb1_map
We were only allocating half the bytes we need, which was made more
obvious by a recent fix to the memset in  clear_tlb1_bitmap().

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Cc: stable@vger.kernel.org
2012-10-05 23:38:53 +02:00
Paul Mackerras
70bddfefbd KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation
In the case where the host kernel is using a 64kB base page size and
the guest uses a 4k HPTE (hashed page table entry) to map an emulated
MMIO device, we were calculating the guest physical address wrongly.
We were calculating a gfn as the guest physical address shifted right
16 bits (PAGE_SHIFT) but then only adding back in 12 bits from the
effective address, since the HPTE had a 4k page size.  Thus the gpa
reported to userspace was missing 4 bits.

Instead, we now compute the guest physical address from the HPTE
without reference to the host page size, and then compute the gfn
by shifting the gpa right PAGE_SHIFT bits.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:53 +02:00
Paul Mackerras
964ee98ccd KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs
When making a vcpu non-runnable we incorrectly changed the
thread IDs of all other threads on the core, just remove that
code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:52 +02:00
Paul Mackerras
a47d72f361 KVM: PPC: Book3S HV: Fix updates of vcpu->cpu
This removes the powerpc "generic" updates of vcpu->cpu in load and
put, and moves them to the various backends.

The reason is that "HV" KVM does its own sauce with that field
and the generic updates might corrupt it. The field contains the
CPU# of the -first- HW CPU of the core always for all the VCPU
threads of a core (the one that's online from a host Linux
perspective).

However, the preempt notifiers are going to be called on the
threads VCPUs when they are running (due to them sleeping on our
private waitqueue) causing unload to be called, potentially
clobbering the value.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:52 +02:00
Paul Mackerras
dfe49dbd1f KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
This adds an implementation of kvm_arch_flush_shadow_memslot for
Book3S HV, and arranges for kvmppc_core_commit_memory_region to
flush the dirty log when modifying an existing slot.  With this,
we can handle deletion and modification of memory slots.

kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which
on Book3S HV now traverses the reverse map chains to remove any HPT
(hashed page table) entries referring to pages in the memslot.  This
gets called by generic code whenever deleting a memslot or changing
the guest physical address for a memslot.

We flush the dirty log in kvmppc_core_commit_memory_region for
consistency with what x86 does.  We only need to flush when an
existing memslot is being modified, because for a new memslot the
rmap array (which stores the dirty bits) is all zero, meaning that
every page is considered clean already, and when deleting a memslot
we obviously don't care about the dirty bits any more.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:51 +02:00
Paul Mackerras
a66b48c3a3 KVM: PPC: Move kvm->arch.slot_phys into memslot.arch
Now that we have an architecture-specific field in the kvm_memory_slot
structure, we can use it to store the array of page physical addresses
that we need for Book3S HV KVM on PPC970 processors.  This reduces the
size of struct kvm_arch for Book3S HV, and also reduces the size of
struct kvm_arch_memory_slot for other PPC KVM variants since the fields
in it are now only compiled in for Book3S HV.

This necessitates making the kvm_arch_create_memslot and
kvm_arch_free_memslot operations specific to each PPC KVM variant.
That in turn means that we now don't allocate the rmap arrays on
Book3S PR and Book E.

Since we now unpin pages and free the slot_phys array in
kvmppc_core_free_memslot, we no longer need to do it in
kvmppc_core_destroy_vm, since the generic code takes care to free
all the memslots when destroying a VM.

We now need the new memslot to be passed in to
kvmppc_core_prepare_memory_region, since we need to initialize its
arch.slot_phys member on Book3S HV.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:51 +02:00
Paul Mackerras
2c9097e4c1 KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots
The generic KVM code uses SRCU (sleeping RCU) to protect accesses
to the memslots data structures against updates due to userspace
adding, modifying or removing memory slots.  We need to do that too,
both to avoid accessing stale copies of the memslots and to avoid
lockdep warnings.  This therefore adds srcu_read_lock/unlock pairs
around code that accesses and uses memslots.

Since the real-mode handlers for H_ENTER, H_REMOVE and H_BULK_REMOVE
need to access the memslots, and we don't want to call the SRCU code
in real mode (since we have no assurance that it would only access
the linear mapping), we hold the SRCU read lock for the VM while
in the guest.  This does mean that adding or removing memory slots
while some vcpus are executing in the guest will block for up to
two jiffies.  This tradeoff is acceptable since adding/removing
memory slots only happens rarely, while H_ENTER/H_REMOVE/H_BULK_REMOVE
are performance-critical hot paths.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:51 +02:00
Alexander Graf
7a08c2740f KVM: PPC: BookE: Support FPU on non-hv systems
When running on HV aware hosts, we can not trap when the guest sets the FP
bit, so we just let it do so when it wants to, because it has full access to
MSR.

For non-HV aware hosts with an FPU (like 440), we need to also adjust the
shadow MSR though. Otherwise the guest gets an FP unavailable trap even when
it really enabled the FP bit in MSR.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:50 +02:00
Alexander Graf
ceb985f9d1 KVM: PPC: 440: Implement mfdcrx
We need mfdcrx to execute properly on 460 cores.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:49 +02:00
Alexander Graf
e4dcfe88fb KVM: PPC: 440: Implement mtdcrx
We need mtdcrx to execute properly on 460 cores.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:49 +02:00
Alexander Graf
430c7ff52f KVM: PPC: E500: Remove E500_TLB_DIRTY flag
Since we always mark pages as dirty immediately when mapping them read/write
now, there's no need for the dirty flag in our cache.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:48 +02:00
Alexander Graf
166a2b7000 KVM: PPC: Use symbols for exit trace
Exit traces are a lot easier to read when you don't have to remember
cryptic numbers for guest exit reasons. Symbolify them in our trace
output.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:48 +02:00
Alexander Graf
50c871edf5 KVM: PPC: BookE: Add MCSR SPR support
Add support for the MCSR SPR. This only implements the SPR storage
bits, not actual machine checks.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:48 +02:00
Alexander Graf
491dd5b8a4 KVM: PPC: 44x: Initialize PVR
We need to make sure that vcpu->arch.pvr is initialized to a sane value,
so let's just take the host PVR.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:47 +02:00
Bharat Bhushan
6df8d3fc58 booke: Added ONE_REG interface for IAC/DAC debug registers
IAC/DAC are defined as 32 bit while they are 64 bit wide. So ONE_REG
interface is added to set/get them.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:47 +02:00
Bharat Bhushan
f61c94bb99 KVM: PPC: booke: Add watchdog emulation
This patch adds the watchdog emulation in KVM. The watchdog
emulation is enabled by KVM_ENABLE_CAP(KVM_CAP_PPC_BOOKE_WATCHDOG) ioctl.
The kernel timer are used for watchdog emulation and emulates
h/w watchdog state machine. On watchdog timer expiry, it exit to QEMU
if TCR.WRC is non ZERO. QEMU can reset/shutdown etc depending upon how
it is configured.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
[bharat.bhushan@freescale.com: reworked patch]
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
[agraf: adjust to new request framework]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:47 +02:00
Alexander Graf
7c973a2ebb KVM: PPC: Add return value to core_check_requests
Requests may want to tell us that we need to go back into host state,
so add a return value for the checks.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:46 +02:00
Alexander Graf
7ee788556b KVM: PPC: Add return value in prepare_to_enter
Our prepare_to_enter helper wants to be able to return in more circumstances
to the host than only when an interrupt is pending. Broaden the interface a
bit and move even more generic code to the generic helper.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:46 +02:00
Alexander Graf
206c2ed7f1 KVM: PPC: Ignore EXITING_GUEST_MODE mode
We don't need to do anything when mode is EXITING_GUEST_MODE, because
we essentially are outside of guest mode and did everything it asked
us to do by the time we check it.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:46 +02:00
Alexander Graf
3766a4c693 KVM: PPC: Move kvm_guest_enter call into generic code
We need to call kvm_guest_enter in booke and book3s, so move its
call to generic code.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:45 +02:00
Alexander Graf
bd2be6836e KVM: PPC: Book3S: PR: Rework irq disabling
Today, we disable preemption while inside guest context, because we need
to expose to the world that we are not in a preemptible context. However,
during that time we already have interrupts disabled, which would indicate
that we are in a non-preemptible context.

The reason the checks for irqs_disabled() fail for us though is that we
manually control hard IRQs and ignore all the lazy EE framework. Let's
stop doing that. Instead, let's always use lazy EE to indicate when we
want to disable IRQs, but do a special final switch that gets us into
EE disabled, but soft enabled state. That way when we get back out of
guest state, we are immediately ready to process interrupts.

This simplifies the code drastically and reduces the time that we appear
as preempt disabled.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:45 +02:00
Alexander Graf
24afa37b9c KVM: PPC: Consistentify vcpu exit path
When getting out of __vcpu_run, let's be consistent about the state we
return in. We want to always

  * have IRQs enabled
  * have called kvm_guest_exit before

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:45 +02:00
Alexander Graf
0652eaaebe KVM: PPC: Book3S: PR: Indicate we're out of guest mode
When going out of guest mode, indicate that we are in vcpu->mode. That way
requests from other CPUs don't needlessly need to kick us to process them,
because it'll just happen next time we enter the guest.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:44 +02:00
Alexander Graf
706fb730cb KVM: PPC: Exit guest context while handling exit
The x86 implementation of KVM accounts for host time while processing
guest exits. Do the same for us.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:43 +02:00
Alexander Graf
c63ddcb454 KVM: PPC: Book3S: PR: Only do resched check once per exit
Now that we use our generic exit helper, we can safely drop our previous
kvm_resched that we used to trigger at the beginning of the exit handler
function.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:43 +02:00
Alexander Graf
e85ad380c6 KVM: PPC: BookE: Drop redundant vcpu->mode set
We only need to set vcpu->mode to outside once.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:43 +02:00
Alexander Graf
9b0cb3c808 KVM: PPC: Book3s: PR: Add (dumb) MMU Notifier support
Now that we have very simple MMU Notifier support for e500 in place,
also add the same simple support to book3s. It gets us one step closer
to actual fast support.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:43 +02:00
Alexander Graf
03d25c5bd5 KVM: PPC: Use same kvmppc_prepare_to_enter code for booke and book3s_pr
We need to do the same things when preparing to enter a guest for booke and
book3s_pr cores. Fold the generic code into a generic function that both call.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-05 23:38:42 +02:00