linux/arch/x86/kvm
Nadav Har'El b6f1250edb KVM: nVMX: Correct handling of interrupt injection
The code in this patch correctly emulates external-interrupt injection
while a nested guest L2 is running.

Because this code is not very obvious, I include here a longer-than-usual
justification for what it does - much longer than the code itself ;-)

To understand how to correctly emulate interrupt injection while L2 is
running, let's look first at what we need to emulate: How would things look
if the extra L0 hypervisor layer were removed, and instead of L0 injecting
an interrupt, we had hardware delivering an interrupt?

Now we have L1 running on bare metal with a guest L2, and the hardware
generates an interrupt. Assuming that L1 set PIN_BASED_EXT_INTR_MASK to 1, and
VM_EXIT_ACK_INTR_ON_EXIT to 0 (we'll revisit these assumptions below), what
happens now is this: The processor exits from L2 to L1, with an external-
interrupt exit reason but without an interrupt vector. L1 runs, with
interrupts disabled, and it doesn't yet know what the interrupt was. Soon
after, it enables interrupts and only at that moment, it gets the interrupt
from the processor. When L1 is KVM, Linux handles this interrupt.

Now we need exactly the same thing to happen when that L1->L2 system runs
on top of L0, instead of real hardware. This is how we do this:

When L0 wants to inject an interrupt, it needs to exit from L2 to L1, with
external-interrupt exit reason (with an invalid interrupt vector), and run L1.
Just like in the bare metal case, it likely can't deliver the interrupt to
L1 now because L1 is running with interrupts disabled, in which case it turns
on the interrupt window when running L1 after the exit. L1 will soon enable
interrupts, and at that point L0 will gain control again and inject the
interrupt to L1.
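
The sequence above can be sketched as a small state model. This is only an
illustration under stated assumptions: struct vcpu_model, l0_try_inject() and
l1_enable_interrupts() are hypothetical names invented here, not KVM's data
structures or functions, and the model ignores everything except the
interrupt flag and the pending interrupt.

```c
#include <stdbool.h>

/* Model-only constant; the value mirrors the VMX basic exit reason. */
#define MODEL_EXIT_EXTERNAL_INTERRUPT 1

/* Hypothetical, heavily simplified vCPU state (not KVM's real structures). */
struct vcpu_model {
    bool in_l2;                /* true while the nested guest L2 is running */
    bool l1_if;                /* L1's interrupt-enable flag */
    bool irq_pending;          /* L0 has an external interrupt to deliver */
    int  irq_vector;           /* vector of that interrupt */
    bool window_requested;     /* L0 asked for an interrupt-window exit */
    int  l1_exit_reason;       /* exit reason reflected to L1 */
    bool l1_exit_vector_valid; /* whether L1 sees a vector in the exit info */
    int  delivered_vector;     /* vector L1 finally received, -1 if none */
};

/* L0 wants to deliver its pending interrupt. */
void l0_try_inject(struct vcpu_model *v)
{
    if (!v->irq_pending)
        return;

    if (v->in_l2) {
        /* Emulate the exit from L2 to L1: reason only, no vector. */
        v->in_l2 = false;
        v->l1_exit_reason = MODEL_EXIT_EXTERNAL_INTERRUPT;
        v->l1_exit_vector_valid = false;
        v->l1_if = false;      /* L1 resumes with interrupts disabled */
    }

    if (!v->l1_if) {
        /* Cannot deliver yet: wait until L1 enables interrupts. */
        v->window_requested = true;
        return;
    }

    /* L1 can take the interrupt now: deliver it. */
    v->delivered_vector = v->irq_vector;
    v->irq_pending = false;
    v->window_requested = false;
}

/* L1 enables interrupts; with the window open, L0 regains control. */
void l1_enable_interrupts(struct vcpu_model *v)
{
    v->l1_if = true;
    if (v->window_requested)
        l0_try_inject(v);
}
```

In this model one interrupt costs two trips through L0: the reflected
external-interrupt exit, and then the interrupt-window exit at which the
vector is actually delivered to L1.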

Finally, there is an extra complication in the code: when nested_run_pending
is set, we cannot return to L1 now, and must launch L2 first. We need to
remember the interrupt we wanted to inject (and not clear it now), and
handle it on the next exit.
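
A minimal sketch of that rule, again as a toy model rather than the actual
patch: struct nested_model and both functions are hypothetical, and only the
nested_run_pending name comes from the description above.

```c
#include <stdbool.h>

/* Hypothetical model of the state relevant to a pending L2 launch. */
struct nested_model {
    bool nested_run_pending; /* L1 asked to run L2; the entry has not happened yet */
    bool irq_pending;        /* L0 has an external interrupt to deliver */
    bool in_l2;              /* true while L2 is the running guest */
};

/*
 * Returns true if L0 reflected an external-interrupt exit to L1.
 * While nested_run_pending is set, nothing is reflected and the
 * interrupt stays pending so it can be handled on the next exit.
 */
bool l0_handle_pending_irq(struct nested_model *n)
{
    if (!n->irq_pending)
        return false;

    if (n->nested_run_pending)
        return false;        /* must launch L2 first; irq_pending is kept */

    n->in_l2 = false;        /* exit from L2 to L1 */
    return true;
}

/* The launch of L2 actually happens; the pending-run flag is consumed. */
void l0_enter_l2(struct nested_model *n)
{
    n->in_l2 = true;
    n->nested_run_pending = false;
}
```

Note that irq_pending is deliberately left set in both branches: in the
reflected-exit case the vector is only delivered later, once L1 enables
interrupts.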

The above explanation shows that the relative strangeness of the nested
interrupt injection code in this patch, and the extra interrupt-window
exit incurred, are in fact necessary for accurate emulation, and are not
just an unoptimized implementation.

Let's revisit now the two assumptions made above:

If L1 turns off PIN_BASED_EXT_INTR_MASK (no hypervisor that I know of
does, by the way), things are simple: L0 may inject the interrupt directly
to the L2 guest - using the normal code path that injects to any guest.
We support this case in the code below.
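
The routing decision for this case can be sketched as follows. The enum and
route_external_interrupt() are hypothetical names used only for illustration;
the bit value is the architectural "external-interrupt exiting" bit of the
pin-based VM-execution controls.

```c
#include <stdint.h>

/* Bit 0 of the pin-based controls: external-interrupt exiting. */
#define PIN_BASED_EXT_INTR_MASK 0x00000001u

/* Hypothetical result type for this sketch. */
enum irq_route {
    IRQ_EXIT_TO_L1,   /* L1 asked for exits: reflect an exit, deliver later */
    IRQ_DIRECT_TO_L2  /* L1 did not ask: inject into L2 like any other guest */
};

/* Decide where an external interrupt goes while L2 is running. */
enum irq_route route_external_interrupt(uint32_t l1_pin_based_controls)
{
    if (l1_pin_based_controls & PIN_BASED_EXT_INTR_MASK)
        return IRQ_EXIT_TO_L1;
    return IRQ_DIRECT_TO_L2;
}
```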

If L1 turns on VM_EXIT_ACK_INTR_ON_EXIT, things look very different from the
description above: L1 expects to see an exit from L2 with the interrupt vector
already filled in the exit information, and does not expect to be interrupted
again with this interrupt. The current code does not (yet) support this case,
so we do not allow the VM_EXIT_ACK_INTR_ON_EXIT exit-control to be turned on
by L1.
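
One way to picture "not allowing" a control is the usual VMX capability
check: L0 advertises which bits must be 1 and which may be 1, and a control
word from L1 is rejected if it sets a bit outside the allowed set. The sketch
below assumes that mechanism; whether this patch enforces the restriction
exactly this way is an assumption, and exit_controls_valid() is a
hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural VM-exit control bits. */
#define VM_EXIT_HOST_ADDR_SPACE_SIZE 0x00000200u
#define VM_EXIT_ACK_INTR_ON_EXIT     0x00008000u

/*
 * A control word is valid if every must-be-one bit is set and no bit
 * outside the may-be-one mask is set.
 */
bool exit_controls_valid(uint32_t control, uint32_t must_be_one,
                         uint32_t may_be_one)
{
    return ((control & may_be_one) | must_be_one) == control;
}
```

With a may-be-one mask that leaves out VM_EXIT_ACK_INTR_ON_EXIT, any attempt
by L1 to set that bit fails the check, while the other advertised bits keep
working.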

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:17 +03:00
emulate.c KVM: fix uninitialized warning 2011-07-12 11:45:06 +03:00
i8254.c KVM: fix typo in copyright notice 2010-10-24 10:53:14 +02:00
i8254.h KVM: remove useless function declaration kvm_inject_pit_timer_irqs() 2011-05-11 07:57:09 -04:00
i8259.c KVM: remove isr_ack logic from PIC 2011-03-17 13:08:30 -03:00
irq.c KVM: fix typo in copyright notice 2010-10-24 10:53:14 +02:00
irq.h KVM: remove useless function declarations from file arch/x86/kvm/irq.h 2011-05-11 07:57:09 -04:00
Kconfig KVM: Halt vcpu if page it tries to access is swapped out 2011-01-12 11:21:39 +02:00
kvm_cache_regs.h KVM: Fetch guest cr3 from hardware on demand 2011-01-12 11:31:16 +02:00
kvm_timer.h KVM: arch/x86/kvm/kvm_timer.h checkpatch cleanup 2010-05-17 12:14:42 +03:00
lapic.c KVM: x86: Remove useless regs_page pointer from kvm_lapic 2011-03-17 13:08:33 -03:00
lapic.h KVM: x86: Remove useless regs_page pointer from kvm_lapic 2011-03-17 13:08:33 -03:00
Makefile KVM: x86: Makefile clean up 2011-01-12 11:29:08 +02:00
mmu_audit.c KVM: MMU: audit: allow audit more guests at the same time 2011-01-12 11:31:17 +02:00
mmu.c KVM: MMU: cleanup for dropping parent pte 2011-07-12 11:45:07 +03:00
mmu.h KVM: MMU: Don't track nested fault info in error-code 2010-10-24 10:52:55 +02:00
mmutrace.h KVM: MMU: support disable/enable mmu audit dynamicly 2010-10-24 10:51:56 +02:00
paging_tmpl.h KVM: MMU: Fix build warnings in walk_addr_generic() 2011-06-19 19:23:13 +03:00
svm.c KVM: nVMX: Allow setting the VMXE bit in CR4 2011-07-12 11:45:10 +03:00
timer.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
trace.h tracing: Fix event alignment: kvm:kvm_hv_hypercall 2011-03-10 10:34:24 -05:00
tss.h KVM: x86: hardware task switching support 2008-04-27 12:00:39 +03:00
vmx.c KVM: nVMX: Correct handling of interrupt injection 2011-07-12 11:45:17 +03:00
x86.c KVM: nVMX: Implement VMPTRST 2011-07-12 11:45:13 +03:00
x86.h KVM: nVMX: Implement VMPTRST 2011-07-12 11:45:13 +03:00