KVM avoids reloading the EFER MSR when the guest and host values differ
only in the long mode bits (which are switched by hardware) and the NX
bit (which is emulated by the KVM MMU).
This patch also allows KVM to ignore SCE (syscall enable) when the guest
is running in 32-bit mode. This is because the syscall instruction is
not available in 32-bit mode on Intel processors, so the SCE bit is
effectively meaningless.
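The comparison described above can be sketched as a mask test. This is an illustrative sketch, not the patch itself: the helper name efer_needs_reload() and its guest_long_mode parameter are assumptions, while the bit positions are the architectural EFER layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFER bit positions. */
#define EFER_SCE (1ULL << 0)   /* syscall enable */
#define EFER_LME (1ULL << 8)   /* long mode enable */
#define EFER_LMA (1ULL << 10)  /* long mode active */
#define EFER_NX  (1ULL << 11)  /* no-execute enable */

/*
 * Hypothetical helper: returns true if the guest and host EFER values
 * differ in a bit that is neither switched by hardware nor emulated,
 * so the MSR would have to be rewritten.
 */
static bool efer_needs_reload(uint64_t guest, uint64_t host,
			      bool guest_long_mode)
{
	uint64_t ignore = EFER_LME | EFER_LMA | EFER_NX;

	/* SCE is only ignored while the guest is not in long mode. */
	if (!guest_long_mode)
		ignore |= EFER_SCE;

	return ((guest ^ host) & ~ignore) != 0;
}
```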
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move emulate_ctxt to kvm_vcpu to keep the emulation context when we exit
from the kvm module. Call x86_decode_insn() only when needed. Modify
x86_emulate_insn() to not modify the context if it must be re-entered.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
emulate_instruction() now calls x86_decode_insn() and x86_emulate_insn().
x86_emulate_insn() is x86_emulate_memop() without the decoding part.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Split the decoding process into a new function x86_decode_insn().
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move all variables that x86_emulate_memop() shares between decode and
execute into a structure, decode_cache. This will help in later separating
decode and emulate.
struct decode_cache {
	u8 twobyte;
	u8 b;
	u8 lock_prefix;
	u8 rep_prefix;
	u8 op_bytes;
	u8 ad_bytes;
	struct operand src;
	struct operand dst;
	unsigned long *override_base;
	unsigned int d;
	unsigned long regs[NR_VCPU_REGS];
	unsigned long eip;
	/* modrm */
	u8 modrm;
	u8 modrm_mod;
	u8 modrm_reg;
	u8 modrm_rm;
	u8 use_modrm_ea;
	unsigned long modrm_ea;
	unsigned long modrm_val;
};
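As an illustration of how the modrm, modrm_mod, modrm_reg and modrm_rm fields relate, the ModR/M byte can be split as below. This is a minimal sketch: struct modrm_fields and split_modrm() are hypothetical names, and REX prefix extensions of the reg and rm fields are not handled.

```c
#include <stdint.h>

/* Hypothetical container for the three ModR/M sub-fields. */
struct modrm_fields {
	uint8_t mod;
	uint8_t reg;
	uint8_t rm;
};

/* Split a ModR/M byte into mod (bits 7:6), reg (bits 5:3), rm (bits 2:0). */
static struct modrm_fields split_modrm(uint8_t modrm)
{
	struct modrm_fields f;

	f.mod = (modrm & 0xc0) >> 6;
	f.reg = (modrm & 0x38) >> 3;
	f.rm  = modrm & 0x07;
	return f;
}
```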
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch refactors the current hypercall infrastructure to better
support live migration and SMP. It eliminates the hypercall page by
trapping the UD exception that would occur if you used the wrong hypercall
instruction for the underlying architecture and replacing it with the right
one lazily.
A fall-out of this patch is that the unhandled hypercalls no longer trap to
userspace. There is very little reason though to use a hypercall to
communicate with userspace as PIO or MMIO can be used. There is no code
in tree that uses userspace hypercalls.
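The lazy replacement can be pictured roughly as follows. This is a sketch under assumptions, not the code in this patch: patch_hypercall() and its host_is_intel parameter are hypothetical, and it operates on a plain byte buffer rather than guest memory. The opcode bytes are the architectural encodings of vmcall and vmmcall.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static const uint8_t VMCALL_INSN[3]  = { 0x0f, 0x01, 0xc1 }; /* Intel VMX */
static const uint8_t VMMCALL_INSN[3] = { 0x0f, 0x01, 0xd9 }; /* AMD SVM */

/*
 * Hypothetical helper, called after a #UD: if the faulting bytes are
 * the other vendor's hypercall instruction, overwrite them with the
 * native one. Returns true if the buffer was patched.
 */
static bool patch_hypercall(uint8_t *insn, bool host_is_intel)
{
	const uint8_t *native  = host_is_intel ? VMCALL_INSN : VMMCALL_INSN;
	const uint8_t *foreign = host_is_intel ? VMMCALL_INSN : VMCALL_INSN;

	if (memcmp(insn, foreign, 3) != 0)
		return false;	/* a genuine #UD, not a hypercall */

	memcpy(insn, native, 3);
	return true;
}
```

Because both instructions are three bytes long, the patch can be done in place without moving any surrounding guest code.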
[avi: fix #ud injection on vmx]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add vmmcall/vmcall to x86_emulate. A future patch will implement the
functionality for these instructions.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
There's no need for the *_MASK flags (TF_MASK, IF_MASK, etc), found in
processor.h (both _32 and _64). They have a one-to-one mapping with the
EFLAGS value. This patch removes the definitions and uses the already
existing X86_EFLAGS_ versions where applicable.
[ roland@redhat.com: KVM build fixes. ]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch unifies struct desc_ptr between i386 and x86_64.
They can be expressed in the exact same way in C code, only
having to change the name of one of them. As Xgt_desc_struct
is ugly and big, this is the one that goes away.
There's also a padding field in i386, but it is not really
needed in the C structure definition.
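For reference, a unified definition could look like the sketch below. The field names and the packed attribute are assumptions about the resulting layout (a 16-bit limit followed by a word-sized base, as consumed by lgdt/lidt), and the attribute syntax assumes GCC or Clang.

```c
#include <stddef.h>

/*
 * Assumed layout of the unified descriptor-table pointer: 16-bit limit
 * followed directly by the base address. The same C source yields a
 * 6-byte object on i386 and a 10-byte object on x86_64.
 */
struct desc_ptr {
	unsigned short size;
	unsigned long address;
} __attribute__((packed));
```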
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
All kobjects require a dynamically allocated name now. We no longer
need to keep track of whether the name is statically assigned; we can
just unconditionally free() all kobject names on cleanup.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The clts code didn't use set_cr0 properly, so our lazy FPU
processing wasn't being done by the clts instruction at all.
(This isn't called on Intel, as the hardware does the decode for us.)
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
emulator_write_std() is not implemented, and calling write_emulated should
work just as well in place of write_std.
Fixes emulator failures with the push r/m instruction.
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Make sure that, regardless of the operand size, the whole value of eip
is saved.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Change JMP_REL to call register_address_increment(): the operand size
should not affect the calculation of eip; ad_bytes should affect it
instead.
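The idea of letting the address size bound the increment can be sketched as a function. This is illustrative only: address_increment() is a hypothetical stand-in, written as a plain function, and is not the emulator's register_address_increment() itself.

```c
/*
 * Hypothetical sketch: add inc to reg, but let only the low ad_bytes
 * bytes change, so that a 16-bit address size wraps at 64K instead of
 * carrying into the upper bits.
 */
static unsigned long address_increment(unsigned long reg, long inc,
				       unsigned ad_bytes)
{
	unsigned long mask;

	/* Full-width address size: ordinary wrapping addition. */
	if (ad_bytes >= sizeof(unsigned long))
		return reg + (unsigned long)inc;

	mask = (1UL << (ad_bytes * 8)) - 1;
	return (reg & ~mask) | ((reg + (unsigned long)inc) & mask);
}
```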
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
'invd' can destroy host data, and 'wbinvd' allows the guest to induce
long (milliseconds) latencies.
Noted by Ben Serebrin.
Signed-off-by: Avi Kivity <avi@qumranet.com>
If we stgi() too soon, NMIs can reach the processor even though interrupts
are disabled, catching it in a half-switched state. Delay the stgi() until
we're done switching.
Signed-off-by: Avi Kivity <avi@qumranet.com>
'push imm8' found itself in the wrong switch somehow, so it is never executed.
This fixes Windows 2003 installation.
Signed-off-by: Avi Kivity <avi@qumranet.com>
In kvm_flush_remote_tlbs(), replace a loop using smp_call_function_single()
with a single call to smp_call_function_mask() (which is new for x86_64).
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Better handle wrap-around cases when reading the APIC CCR
(current count register). Also, if ICR is 0, CCR should also
be 0... previously reading CCR before setting ICR would result
in a large kinda-random number.
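The intended behaviour can be sketched as follows. This is a simplified model under assumptions, not the patch: apic_ccr() is a hypothetical helper, elapsed is taken to be the number of already-divided timer ticks since ICR was written, and a periodic timer is modelled as wrapping every ICR ticks.

```c
#include <stdint.h>

/*
 * Hypothetical model of a CCR read. icr is the initial count, elapsed
 * is the number of timer ticks since ICR was written.
 */
static uint32_t apic_ccr(uint32_t icr, uint64_t elapsed, int periodic)
{
	/* Timer never armed: CCR reads as 0. */
	if (icr == 0)
		return 0;

	if (elapsed >= icr) {
		/* A one-shot timer stays at 0 once it has expired. */
		if (!periodic)
			return 0;
		/* A periodic timer wraps around and counts down again. */
		elapsed %= icr;
	}

	return icr - (uint32_t)elapsed;
}
```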
Signed-off-by: Kevin Pedretti <kevin.pedretti@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm_lapic_reset() was initializing apic->timer.divide_count to 0,
which could potentially lead to a divide by zero error in
apic_get_tmcct(). Any guest that reads the APIC's CCR (current count)
register before setting DCR (divide configuration) would trigger a divide
by zero exception in the host kernel, leading to a host-OS crash.
This patch results in apic->timer.divide_count being initialized to
2 at reset, eliminating the bug (DCR=0 at reset, meaning divide by 2).
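The relationship between DCR and the divide count can be sketched as below, which shows why DCR=0 corresponds to a divisor of 2 and why a decoded divisor is never 0. The helper name apic_divide_count() is an assumption; the encoding (DCR bits 0, 1 and 3) follows the architectural definition of the divide configuration register.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode the APIC divide configuration register.
 * Bits 0, 1 and 3 form a 3-bit value; 0 means divide by 2, 6 means
 * divide by 128, and 7 means divide by 1.
 */
static uint32_t apic_divide_count(uint32_t dcr)
{
	uint32_t tmp = (dcr & 0x3) | ((dcr & 0x8) >> 1);

	return 1u << ((tmp + 1) & 0x7);
}
```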
Signed-off-by: Kevin Pedretti <kevin.pedretti@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
We need to make sure that the timer interrupt happens before we clear
PF_VCPU, so the accounting code actually sees guest mode.
http://lkml.org/lkml/2007/10/15/114
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The patch below changes the access type to register from memory for
instructions that are declared as SrcMem or DstMem, but have a
ModR/M byte with Mod = 3.
It fixes (at least) the lmsw and smsw instructions on an AMD64 CPU,
which are needed for FreeBSD.
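The rule can be sketched as a check on the top two bits of the ModR/M byte. The enum and function names here are illustrative assumptions rather than the emulator's own identifiers.

```c
#include <stdint.h>

/* Illustrative operand kinds. */
enum operand_kind { OPERAND_REG, OPERAND_MEM };

/*
 * Hypothetical helper: an operand declared as SrcMem or DstMem is
 * really a register when the Mod field (bits 7:6) equals 3.
 */
static enum operand_kind modrm_operand_kind(uint8_t modrm)
{
	return ((modrm >> 6) == 3) ? OPERAND_REG : OPERAND_MEM;
}
```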
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Resetting an SMP guest forces the APs to enter real mode (RESET) from
protected mode with paging enabled. However, the current enter_rmode()
can only handle a mode switch from nonpaging mode to real mode, which
leads to SMP reboot failure.
Fix by reloading the mmu context on entering real mode.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This makes sure we handle NMI on the current cpu, and that we don't service
maskable interrupts before non-maskable ones.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Shadow page table entries should be set atomically using set_shadow_pte().
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The repnz/repne instructions must set rep_prefix to 1 like rep/repe/repz.
This patch corrects the disk probe problem encountered with OpenBSD.
The issue appeared with commit e70669abd4: before it, the decoding was
done internally by kvm; after it, the decoding is done by x86_emulate.c
(which doesn't do it correctly).
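A prefix scan that treats both encodings alike might look like the sketch below. It is a simplified illustration, not the emulator's decoder: has_rep_prefix() is a hypothetical name, and REX prefixes are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: scan the legacy prefix bytes of an instruction
 * and report whether a repeat prefix is present. Both 0xf2
 * (repne/repnz) and 0xf3 (rep/repe/repz) count as a repeat prefix.
 */
static int has_rep_prefix(const uint8_t *insn, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		switch (insn[i]) {
		case 0xf2:	/* repne/repnz */
		case 0xf3:	/* rep/repe/repz */
			return 1;
		case 0xf0:	/* lock */
		case 0x66:	/* operand-size override */
		case 0x67:	/* address-size override */
		case 0x26: case 0x2e: case 0x36: case 0x3e:
		case 0x64: case 0x65:	/* segment overrides */
			continue;	/* other prefix, keep scanning */
		default:
			return 0;	/* reached the opcode byte */
		}
	}
	return 0;
}
```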
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This code ended up in the wrong place in the file. Move it back to the
right location.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If we defer updating rip until pio instructions are executed, we have a
problem with reset: a pio reset updates rip, and when the instruction
completes we skip the emulated instruction, pointing rip somewhere completely
unrelated.
Fix by updating rip when we decode the instruction, not after emulation.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Some operand fetches are less than the machine word size and can result in
stale bits if used together with operands of different sizes.
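One way to avoid stale upper bits is to start from a zeroed word before copying in the fetched bytes, as in this sketch. The helper fetch_operand() is hypothetical and reads from a plain little-endian byte buffer rather than guest memory.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fetch a little-endian operand of the given size
 * into a full machine word. Starting from 0 guarantees that the bytes
 * above the operand size are zero rather than left over from an
 * earlier, wider fetch.
 */
static unsigned long fetch_operand(const uint8_t *src, unsigned bytes)
{
	unsigned long val = 0;
	unsigned i;

	for (i = 0; i < bytes && i < sizeof(val); i++)
		val |= (unsigned long)src[i] << (8 * i);

	return val;
}
```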
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Implement emulation of the instruction
lea r16/r32, m
opcode: 0x8d
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
According to Intel Software Developer's Manual, Vol. 3B, Appendix H.4.2,
exit qualification should be of natural width. However, current code
uses u64 as the data type for this register, which occasionally
introduces invalid values into the VM exit handling logic. This patch
fixes this bug.
I have tested Windows and Linux guests on an i386 host, and they boot
successfully with this patch.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>