This patch allows a guest to use the VMXON and VMXOFF instructions, and
emulates them accordingly. Basically this amounts to checking some
prerequisites, and then remembering whether the guest has enabled or disabled
VMX operation.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds to kvm_intel a module option "nested". This option controls
whether the guest can use VMX instructions, i.e., whether we allow nested
virtualization. A similar, but separate, option already exists for the
SVM module.
This option currently defaults to 0, meaning that nested VMX must be
explicitly enabled by giving nested=1. When nested VMX matures, the default
should probably be changed to enable nested VMX by default - just like
nested SVM is currently enabled by default.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.
The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may be have been last loaded on a different cpu.
So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Acked-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
vmcs_readl() and friends are really short, but gcc thinks they are long because of
the out-of-line exception handlers. Mark them always_inline to clear the
misunderstanding.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We clean up a failed VMREAD by clearing the output register. Do
it in the exception handler instead of unconditionally. This is
worthwhile since there are more than a hundred call sites.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Only decache guest CR3 value if vcpu->arch.cr3 is stale.
Fixes loadvm with live guest.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Markus Schade <markus.schade@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Since the emulator now checks segment limits and access rights, it
generates a lot more accesses to the vmcs segment fields. Undo some
of the performance hit by cacheing those fields in a read-only cache
(the entire cache is invalidated on any write, or on guest exit).
Signed-off-by: Avi Kivity <avi@redhat.com>
When doing a soft int, we need to bump eip before pushing it to
the stack. Otherwise we'll do the int a second time.
[apw@canonical.com: merged eip update as per Jan's recommendation.]
Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
In case certain allocations fail, vmx_create_vcpu may return 0 as error
instead of a negative value encoded via ERR_PTR. This causes a NULL
pointer dereferencing later on in kvm_vm_ioctl_vcpu_create.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
With TSC scaling in SVM the tsc-offset needs to be
calculated differently. This patch propagates this
calculation into the architecture specific modules so that
this complexity can be handled there.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements a call-back into the architecture code
to allow the propagation of changes to the virtual tsc_khz
of the vcpu.
On SVM it updates the tsc_ratio variable, on VMX it does
nothing.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds a callback into kvm_x86_ops so that svm and
vmx code can do intercept checks on emulated instructions.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Use vmx_set_nmi_mask() instead of open-coding management of
the hardware bit and the software hint (nmi_known_unmasked).
There's a slight change of behaviour when running without
hardware virtual NMI support - we now clear the NMI mask if
NMI delivery faulted in that case as well. This improves
emulation accuracy.
Signed-off-by: Avi Kivity <avi@redhat.com>
When we haven't injected an interrupt, we don't need to recover
the nmi blocking state (since the guest can't set it by itself).
This allows us to avoid a VMREAD later on.
Signed-off-by: Avi Kivity <avi@redhat.com>
We may read the cpl quite often in the same vmexit (instruction privilege
check, memory access checks for instruction and operands), so we gain
a bit if we cache the value.
Signed-off-by: Avi Kivity <avi@redhat.com>
In long mode, vm86 mode is disallowed, so we need not check for
it. Reading rflags.vm may require a VMREAD, so it is expensive.
Signed-off-by: Avi Kivity <avi@redhat.com>
Some rflags bits are owned by the host, not guest, so we need to use
kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them
back.
Signed-off-by: Avi Kivity <avi@redhat.com>
Commit 6440e5967bc broke old userspaces that do not set tss address
before entering vcpu. Unbreak it by setting tss address to a safe
value on the first vcpu entry. New userspaces should set tss address,
so print warning in case it doesn't.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Access to this page is mostly done through the regs member which holds
the address to this page. The exceptions are in vmx_vcpu_reset() and
kvm_free_lapic() and these both can easily be converted to using regs.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently vm86 task is initialized on each real mode entry and vcpu
reset. Initialization is done by zeroing TSS and updating relevant
fields. But since all vcpus are using the same TSS there is a race where
one vcpu may use TSS while other vcpu is initializing it, so the vcpu
that uses TSS will see wrong TSS content and will behave incorrectly.
Fix that by initializing TSS only once.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When rmode.vm86 is active TR descriptor is updated with vm86 task values,
but selector is left intact. vmx_set_segment() makes sure that if TR
register is written into while vm86 is active the new values are saved
for use after vm86 is deactivated, but since selector is not updated on
vm86 activation/deactivation new value is lost. Fix this by writing new
selector into vmcs immediately.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The changelog of 104f226 said "adds the __noclone attribute",
but it was missing in its patch. I think it is still needed.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch fixes the logic used to detect whether BIOS has disabled VMX, for
the case where VMX is enabled only under SMX, but tboot is not active.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of exchanging the guest and host rcx, have separate storage
for each. This allows us to avoid using the xchg instruction, which
is is a little slower than normal operations.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Change
push top-of-stack
pop guest-rcx
pop dummy
to
pop guest-rcx
which is the same thing, only simpler.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger
PLE exits, even with the minimalistic PLE test from kvm-unit-tests.
http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545
For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in
order to get pause loop exits:
# modprobe kvm_intel ple_gap=47
# taskset 1 /usr/local/bin/qemu-system-x86_64 \
-device testdev,chardev=log -chardev stdio,id=log \
-kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 58298446
# rmmod kvm_intel
# modprobe kvm_intel ple_gap=48
# taskset 1 /usr/local/bin/qemu-system-x86_64 \
-device testdev,chardev=log -chardev stdio,id=log \
-kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 36616
Increase the ple_gap to 128 to be on the safe side.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Zhai, Edwin <edwin.zhai@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When emulating real mode, we fake some state:
- tr.base points to a fake vm86 tss
- segment registers are made to conform to vm86 restrictions
change vmx_get_segment() not to expose this fake state to userspace;
instead, return the original state.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When emulating real mode we play with tr hidden state, but leave
tr.selector alone. That works well, except for save/restore, since
loading TR writes it to the hidden state in vmx->rmode.
Fix by also saving and restoring the tr selector; this makes things
more consistent and allows migration to work during the early
boot stages of Windows XP.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of syncing the guest cr3 every exit, which is expensince on vmx
with ept enabled, sync it only on demand.
[sheng: fix incorrect cr3 seen by Windows XP]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
'error' is byte sized, so use a byte register constraint.
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When NX is enabled on the host but not on the guest, we use the entry/exit
msr load facility, which is slow. Optimize it to use entry/exit efer load,
which is ~1200 cycles faster.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In case of a nested page fault or an intercepted #PF newer SVM
implementations provide a copy of the faulting instruction bytes
in the VMCB.
Use these bytes to feed the instruction emulator and avoid the costly
guest instruction fetch in this case.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
emulate_instruction had many callers, but only one used all
parameters. One parameter was unused, another one is now
hidden by a wrapper function (required for a future addition
anyway), so most callers use now a shorter parameter list.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
move the complete_insn_gp() helper function out of the VMX part
into the generic x86 part to make it usable by SVM.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The handling of CR8 writes in KVM is currently somewhat cumbersome.
This patch makes it look like the other CR register handlers
and fixes a possible issue in VMX, where the RIP would be incremented
despite an injected #GP.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling. There are many approaches to achieve
this but the most direct is to simply avoid trapping the HLT instruction which
lets the guest directly execute the instruction putting the processor to sleep.
Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If we execute VMREAD during reboot we'll just skip over it. Instead of
returning garbage, return 0, which has a much smaller chance of confusing
the code. Otherwise we risk a flood of debug printk()s which block the
reboot process if a serial console or netconsole is enabled.
Signed-off-by: Avi Kivity <avi@redhat.com>
The exit reason alone is insufficient to understand exactly why an exit
occured; add ISA-specific trace parameters for additional information.
Because fetching these parameters is expensive on vmx, and because these
parameters are fetched even if tracing is disabled, we fetch the
parameters via a callback instead of as traditional trace arguments.
Signed-off-by: Avi Kivity <avi@redhat.com>
exit_reason's meaning depend on the instruction set; record it so a trace
taken on one machine can be interpreted on another.
Signed-off-by: Avi Kivity <avi@redhat.com>
cea15c2 ("KVM: Move KVM context switch into own function") split vmx_vcpu_run()
to prevent multiple copies of the context switch from being generated (causing
problems due to a label). This patch folds them back together again and adds
the __noclone attribute to prevent the label from being duplicated.
Signed-off-by: Avi Kivity <avi@redhat.com>
Inform user to either disable TXT in the BIOS or do TXT launch
with tboot before enabling KVM since some BIOSes do not set
FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled.
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
While not mandated by the spec, Linux relies on NMI being blocked by an
IF-enabling STI. VMX also refuses to enter a guest in this state, at
least on some implementations.
Disallow NMI while blocked by STI by checking for the condition, and
requesting an interrupt window exit if it occurs.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently the exit is unhandled, so guest halts with error if it tries
to execute INVD instruction. Call into emulator when INVD instruction
is executed by a guest instead. This instruction is not needed by ordinary
guests, but firmware (like OpenBIOS) use it and fail.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The EPT present/writable bits use the same position as normal
pagetable bits.
Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting
the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte.
Also pass present/writable error code information from EPT violation
to generic pagefault handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
gcc 4.5 with some special options is able to duplicate the VMX
context switch asm in vmx_vcpu_run(). This results in a compile error
because the inline asm sequence uses an on local label. The non local
label is needed because other code wants to set up the return address.
This patch moves the asm code into an own function and marks
that explicitely noinline to avoid this problem.
Better would be probably to just move it into an .S file.
The diff looks worse than the change really is, it's all just
code movement and no logic change.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
To support xsave properly for the guest the SVM module need
software support for it. As long as this is not present do
not report the xsave as supported feature in cpuid.
As a side-effect this patch moves the bit() helper function
into the x86.h file so that it can be used in svm.c too.
KVM-Stable-Tag.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We now use load_gs_index() to load gs safely; unfortunately this also
changes MSR_KERNEL_GS_BASE, which we managed separately. This resulted
in confusion and breakage running 32-bit host userspace on a 64-bit kernel.
Fix by
- saving guest MSR_KERNEL_GS_BASE before we we reload the host's gs
- doing the host save/load unconditionally, instead of only when in guest
long mode
Things can be cleaned up further, but this is the minmal fix for now.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If fs or gs refer to the ldt, they must be reloaded after the ldt. Reorder
the code to that effect.
Userspace code that uses the ldt with kvm is nonexistent, so this doesn't fix
a user-visible bug.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
By chance this caused no harm so far. We overwrite AX during switch
to/from guest context, so we must declare this.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If an interrupt is pending, we need to stop emulation so we
can inject it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Replace the inject-as-software-interrupt hack we currently have with
emulated injection.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Change the interrupt injection code to work from preemptible, interrupts
enabled context. This works by adding a ->cancel_injection() operation
that undoes an injection in case we were not able to actually enter the guest
(this condition could never happen with atomic injection).
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently vmx_complete_interrupts() can decode event information from vmx
exit fields into the generic kvm event queues. Make it able to decode
the information from the entry fields as well by parametrizing it.
Signed-off-by: Avi Kivity <avi@redhat.com>
vmx_complete_interrupts() does too much, split it up:
- vmx_vcpu_run() gets the "cache important vmcs fields" part
- a new vmx_complete_atomic_exit() gets the parts that must be done atomically
- a new vmx_recover_nmi_blocking() does what its name says
- vmx_complete_interrupts() retains the event injection recovery code
This helps in reducing the work done in atomic context.
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of blindly attempting to inject an event before each guest entry,
check for a possible event first in vcpu->requests. Sites that can trigger
event injection are modified to set KVM_REQ_EVENT:
- interrupt, nmi window opening
- ppr updates
- i8259 output changes
- local apic irr changes
- rflags updates
- gif flag set
- event set on exit
This improves non-injecting entry performance, and sets the stage for
non-atomic injection.
Signed-off-by: Avi Kivity <avi@redhat.com>
This function need to be able to load the pdptrs from any
mmu context currently in use. So change this function to
take an kvm_mmu parameter to fit these needs.
As a side effect this patch also moves the cached pdptrs
from vcpu_arch into the kvm_mmu struct.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch introduces a special set_tdp_cr3 function pointer
in kvm_x86_ops which is only used for tpd enabled mmu
contexts. This allows to remove some hacks from svm code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Move the TSC control logic from the vendor backends into x86.c
by adding adjust_tsc_offset to x86 ops. Now all TSC decisions
can be done in one place.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Also, ensure that the storing of the offset and the reading of the TSC
are never preempted by taking a spinlock. While the lock is overkill
now, it is useful later in this patch series.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Change svm / vmx to be the same internally and write TSC offset
instead of bare TSC in helper functions. Isolated as a single
patch to contain code movement.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This is used only by the VMX code, and is not done properly;
if the TSC is indeed backwards, it is out of sync, and will
need proper handling in the logic at each and every CPU change.
For now, drop this test during init as misguided.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Now that we have the host gdt conveniently stored in a variable, make use
of it instead of querying the cpu.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
kvm reloads the host's fs and gs blindly, however the underlying segment
descriptors may be invalid due to the user modifying the ldt after loading
them.
Fix by using the safe accessors (loadsegment() and load_gs_index()) instead
of home grown unsafe versions.
This is CVE-2010-3698.
KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
vmx does not restore GDT.LIMIT to the host value, instead it sets it to 64KB.
This means host userspace can learn a few bits of host memory.
Fix by reloading GDTR when we load other host state.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Commit 341d9b535b6c simplify reload logic while entry guest mode, it
can avoid unnecessary sync-root if KVM_REQ_MMU_RELOAD and
KVM_REQ_MMU_SYNC both set.
But, it cause a issue that when we handle 'KVM_REQ_TLB_FLUSH', the
root is invalid, it is triggered during my test:
Kernel BUG at ffffffffa00212b8 [verbose debug info unavailable]
......
Fixed by directly return if the root is not ready.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Some guest device driver may leverage the "Non-Snoop" I/O, and explicitly
WBINVD or CLFLUSH to a RAM space. Since migration may occur before WBINVD or
CLFLUSH, we need to maintain data consistency either by:
1: flushing cache (wbinvd) when the guest is scheduled out if there is no
wbinvd exit, or
2: execute wbinvd on all dirty physical CPUs when guest wbinvd exits.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
On Intel, we call skip_emulated_instruction() even if we injected a #GP,
resulting in the #GP pointing at the wrong address.
Fix by injecting the exception and skipping the instruction at the same place,
so we can do just one or the other.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
On Intel, we call skip_emulated_instruction() even if we injected a #GP,
resulting in the #GP pointing at the wrong address.
Fix by injecting the exception and skipping the instruction at the same place,
so we can do just one or the other.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
On Intel, we call skip_emulated_instruction() even if we injected a #GP,
resulting in the #GP pointing at the wrong address.
Fix by injecting the exception and skipping the instruction at the same place,
so we can do just one or the other.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch enable guest to use XSAVE/XRSTOR instructions.
We assume that host_xcr0 would use all possible bits that OS supported.
And we loaded xcr0 in the same way we handled fpu - do it as late as we can.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The name "pid_sync_vcpu_all" isn't appropriate since it just affect
a single vpid, so rename it to vpid_sync_vcpu_single().
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
According to SDM, we need check whether single-context INVVPID type is supported
before issuing invvpid instruction.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The vmexit handler returns KVM_EXIT_UNKNOWN since there is no handler
for vmentry failures. This intercepts vmentry failures and returns
KVM_FAIL_ENTRY to userspace instead.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Memory allocation may fail. Propagate such errors.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.
Therefore in vmm coexistence case, we should firstly call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
support hosted VMM coexistance, VMCLEAR is executed on vcpu
schedule out, and VMPTRLD is executed on vcpu schedule in.
This could also eliminate the IPI when doing VMCLEAR.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Do some preparations for vmm coexistence support.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm
code. Also move VMXE bit operation out of kvm_cpu_vmxoff().
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Do not kill VM when instruction emulation fails. Inject #UD and report
failure to userspace instead. Userspace may choose to reenter guest if
vcpu is in userspace (cpl == 3) in which case guest OS will kill
offending process and continue running.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each
entry. That is slow, so instead, set HOST_CR0 to have TS set unconditionally
(which is a safe value), and issue a clts() just before exiting vcpu context
if the task indeed owns the fpu.
Saves ~50 cycles/exit.
Signed-off-by: Avi Kivity <avi@redhat.com>