When we get a program interrupt in guest kernel mode, we try to emulate the
instruction.
If that doesn't fail, we report to the user and try again - at the exact same
instruction pointer. So if the guest kernel really does trigger an invalid
instruction, we loop forever.
So let's better go and forward program exceptions to the guest when we don't
know the instruction we're supposed to emulate.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
The code to unset HID5.dcbz32 is broken.
This patch makes it do the right rotate magic.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Book3S needs some flags in SRR1 to get to know details about an interrupt.
One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.
This patch implements above behavior, making WARN_ON behave like WARN_ON.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.
To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:
A small helper in linear mapped memory that does mtmsr with IR=0 and
then RFIs info the actual handler.
Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Using an RFI in IR=1 is dangerous. We need to set two SRRs and then do an RFI
without getting interrupted at all, because every interrupt could potentially
overwrite the SRR values.
Fortunately, we don't need to RFI in at least this particular case of the code,
so we can just replace it with an mtmsr and b.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.
Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB which makes us go in the Linux page fault handler which totally
breaks because we still use the guest's SLB entries.
To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.
That way a potentially bad lwz doesn't trigger any faults and we can later
on interpret the invalid instruction we fetched as "fetch didn't work".
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.
After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.
That way we can drastically improve the readability of the code, make it
less racy and less complex.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We now have helpers for the GPRs, so let's also add some for CR and XER.
Having them in the PACA simplifies code a lot, as we don't need to care
about where to store CC or not to overflow any integers.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
All code in PPC KVM currently accesses gprs in the vcpu struct directly.
While there's nothing wrong with that wrt the current way gprs are stored
and loaded, it doesn't suffice for the PACA acceleration that will follow
in this patchset.
So let's just create little wrapper inline functions that we call whenever
a GPR needs to be read from or written to. The compiled code shouldn't really
change at all for now.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The PowerPC C ABI defines that registers r14-r31 need to be preserved across
function calls. Since our exit handler is written in C, we can make use of that
and don't need to reload r14-r31 on every entry/exit cycle.
This technique is also used in the BookE code and is called "lightweight exits"
there. To follow the tradition, it's called the same in Book3S.
So far this optimization was disabled though, as the code didn't do what it was
expected to do, but failed to work.
This patch fixes and enables lightweight exits again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When we're loading bolted entries into the SLB again, we're checking if an
entry is in use and only slbmte it when it is.
Unfortunately, the check always goes to the skip label of the first entry,
resulting in an endless loop when it actually gets triggered.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Have a pointer to an allocated region inside struct kvm.
[alex: fix ppc book 3s]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Because we now emulate the DEC interrupt according to real life behavior,
there's no need to keep the AGGRESSIVE_DEC hack around.
Let's just remove it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
We treated the DEC interrupt like an edge based one. This is not true for
Book3s. The DEC keeps firing until mtdec is issued again and thus clears
the interrupt line.
So let's implement this logic in KVM too. This patch moves the line clearing
from the firing of the interrupt to the mtdec emulation.
This makes PPC64 guests work without AGGRESSIVE_DEC defined.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
We're using a switch table to find the irqprio that belongs to a specific
interrupt vector. This table is part of the interrupt inject logic.
Since we'll add a new function to stop interrupts, let's move this table
out of the injection logic into a separate function.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Embedded PowerPC KVM has an exit timing implementation to track and evaluate
how much time was spent in which exit path.
For Book3S, we don't implement it. So let's not expose it as a config option
either.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We were shifting the Ks/Kp/N bits one bit too far on mtsrin. It took
me some time to figure that out, so I also put in some debugging and a
comment explaining the conversion.
This fixes current OpenBIOS boot on PPC64 KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently userspace has no chance to find out which virtual address space we're
in and resolve addresses. While that is a big problem for migration, it's also
unpleasent when debugging, as gdb and the monitor don't work on virtual
addresses.
This patch exports enough of the MMU segment state to userspace to make
debugging work and thus also includes the groundwork for migration.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The old BUILD_BUG_ON implementation didn't work with __builtin_constant_p().
Fixing that revealed this test had been inverted for a long time without
anybody noticing...
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).
Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.
To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.
So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Following S390's good example we should use hrtimers for the decrementer too!
This patch converts the timer from the old mechanism to hrtimers.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It looks like the variable "pc" is defined. At least the current code always
failed on me stating that "pc" is already defined somewhere else.
Let's use _pc instead, because that doesn't collide.
Is this the right approach? Does it break on 440 too? If not, why not?
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now we have everything in place to be able to build KVM, so let's add it
as config option and in the Makefile.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To be able to keep KVM as module, we need to export the SLB trampoline
addresses to the module, so it knows where to jump to.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Little opcodes behave differently on desktop and embedded PowerPC cores.
In order to reflect those differences, let's add some #ifdef code to emulate.c.
We could probably also handle them in the core specific emulation files, but I
would prefer to reuse as much code as possible.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We support setting the DEC to a certain value right now. Doing that basically
triggers the CPU local timer.
But there's also an mfdec command that enabled the OS to read the decrementor.
This is required at least by all desktop and server PowerPC Linux kernels. It
can't really hurt to allow embedded ones to do it as well though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are generic parts of PowerPC that can be shared across all
implementations and specific parts that only apply to BookE or desktop PPCs.
This patch adds emulation for desktop specific opcodes that don't apply
to BookE CPUs.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds an implementation for a G3/G4 MMU, so we can run G3 and
G4 guests in KVM on Book3s_64.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To be able to run a guest, we also need to implement a guest MMU.
This patch adds MMU handling for Book3s_64 guests.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We designed the Book3S port of KVM as modular as possible. Most
of the code could be easily used on a Book3S_32 host as well.
The main difference between 32 and 64 bit cores is the MMU. To keep
things well separated, we treat the book3s_64 MMU as one possible compile
option.
This patch adds all the MMU helpers the rest of the code needs in
order to modify the host's MMU, like setting PTEs and segments.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the book3s core handling file. Here everything that is generic to
desktop PowerPC cores is handled, including interrupt injections, MSR settings,
etc.
It basically takes over the same role as booke.c for embedded PowerPCs.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Getting from host state to the guest is only half the story. We also need
to return to our host context and handle whatever happened to get us out of
the guest.
On PowerPC every guest exit is an interrupt. So all we need to do is trap
the host's interrupt handlers and get into our #VMEXIT code to handle it.
PowerPCs also have a register that can add an offset to the interrupt handlers'
adresses which is what the booke KVM code uses. Unfortunately that is a
hypervisor ressource and we also want to be able to run KVM when we're running
in an LPAR. So we have to hook into the Linux interrupt handlers.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is the really low level of guest entry/exit code.
Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
currently aware of.
The segments in the guest differ from the ones on the host, so we need
to switch the SLB to tell the MMU that we're in a new context.
So we store a shadow of the guest's SLB in the PACA, switch to that on
entry and only restore bolted entries on exit, leaving the rest to the
Linux SLB fault handler.
That way we get a really clean way of switching the SLB.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is the of entry / exit code. In order to switch between host and guest
context, we need to switch register state and call the exit code handler on
exit.
This assembly file does exactly that. To finally enter the guest it calls
into book3s_64_slb.S. On exit it gets jumped at from book3s_64_slb.S too.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PowerPC code handles dirty logging in the generic parts atm. While this
is great for "return -ENOTSUPP", we need to be rather target specific
when actually implementing it.
So let's split it to implementation specific code, so we can implement
it for book3s.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
powerpc/nvram: Enable use Generic NVRAM driver for different size chips
powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
powerpc/ps3: Workaround for flash memory I/O error
powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
powerpc/perf_counters: Reduce stack usage of power_check_constraints
powerpc: Fix bug where perf_counters breaks oprofile
powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
powerpc/irq: Improve nanodoc
powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
powerpc/book3e: Add missing page sizes
powerpc/pseries: Fix to handle slb resize across migration
powerpc/powermac: Thermal control turns system off too eagerly
powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
powerpc/405ex: support cuImage via included dtb
powerpc/405ex: provide necessary fixup function to support cuImage
powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
powerpc/44x: Update Arches defconfig
powerpc/44x: Update Arches dts
...
Fix up conflicts in drivers/char/agp/uninorth-agp.c
Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
interface between general code and arch code. kvm_arch_vcpu_runnable()
checks for interrupts instead.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>