Resent spec says that for 0f (20|21|22|23) the 2 bits in the mod field
are ignored. Interestingly enough older spec says that 11 is only valid
encoding.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Eliminate the need to call back into KVM to get it from emulator.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Use this callback instead of directly call kvm function. Also rename
realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
to do with real mode.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
mmio ring page and dev even after it has freed them.
Also, if this function fails, though it might be rare, it seems to be
suggesting the system's serious state: so we'd better stop the works
following the kvm_creat_vm().
This patch clears these problems.
We move the coalesced mmio's initialization out of kvm_create_vm().
This seems to be natural because it includes a registration which
can be done only when vm is successfully created.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Make use of bool as return values, and remove some useless
bool value converting. Thanks Avi to point this out.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mov reg, cr instruction doesn't change flags in any meaningful way, so
no need to update rflags after instruction execution.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Check return value against correct define instead of open code
the value.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
During rep emulation access length to RCX depends on current address
mode.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:
"The processor may create entries in paging-structure caches for
translations required for prefetches and for accesses that are a
result of speculative execution that would never actually occur
in the executed code path."
And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.
Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter. If a race occured, do not install the pte.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The update_pte() path currently uses a nontrapping spte when a nonpresent
(or nonaccessed) gpte is written. This is fine since at present it is only
used on sync pages. However, on an unsync page this will cause an endless
fault loop as the guest is under no obligation to invlpg a gpte that
transitions from nonpresent to present.
Needed for the next patch which reinstates update_pte() on invlpg.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently emulated atomic operations are immediately followed by a non-atomic
operation, so that kvm_mmu_pte_write() can be invoked. This updates the mmu
but undoes the whole point of doing things atomically.
Fix by only performing the atomic operation and the mmu update, and avoiding
the non-atomic write.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Once upon a time, locked operations were emulated while holding the mmu mutex.
Since mmu pages were write protected, it was safe to emulate the writes in
a non-atomic manner, since there could be no other writer, either in the
guest or in the kernel.
These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page
to writes by the guest. This may cause corruption of guest page tables.
Fix by using an atomic cmpxchg for these operations.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
kvm_mmu_pte_write() reads guest ptes in two different occasions, both to
allow a 32-bit pae guest to update a pte with 4-byte writes. Consolidate
these into a single read, which also allows us to consolidate another read
from an invlpg speculating a gpte into the shadow page table.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch change the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
from -EINVAL to -ENXIO if no coalesced mmio dev exists.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If no irq chip in kernel, ioctl KVM_IRQ_LINE will return -EFAULT.
But I see in other place such as KVM_[GET|SET]IRQCHIP, -ENXIO is
return. So this patch used -ENXIO instead of -EFAULT.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If no irq chip in kernel, ioctl KVM_IRQ_LINE will return -EFAULT.
But I see in other place such as KVM_[GET|SET]IRQCHIP, -ENXIO is
return. So this patch used -ENXIO instead of -EFAULT.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The ioctl KVM_IA64_VCPU_GET_STACK does not set the error code if
copy_to_user() fail, and 0 will be return, we should use -EFAULT
instead of 0 in this case, so this patch fixed it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch use generic linux function native_store_idt()
instead of kvm_get_idt(), and also removed the useless
function kvm_get_idt().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reading rip is expensive on vmx, so move it inside the tracepoint so we only
incur the cost if tracing is enabled.
Signed-off-by: Avi Kivity <avi@redhat.com>
The prep_new_page() in page allocator calls set_page_private(page, 0).
So we don't need to reinitialize private of page.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Avi Kivity<avi@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch does:
- no need call tracepoint_synchronize_unregister() when kvm module
is unloaded since ftrace can handle it
- cleanup ftrace's macro
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If fail to create the vcpu, we should not create the debugfs
for it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Alexander Graf <agraf@suse.de>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch fixed possible memory leak in kvm_arch_vcpu_create()
under s390, which would happen when kvm_arch_vcpu_create() fails.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
LMSW is present in both group tables. It was marked privileged only in
one of them. Intel analog of VMMCALL is already marked privileged.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
These bits are ignored by the hardware too. Implement this
for nested svm too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds the correct handling of the nested io
permission bitmap. Old behavior was to not lookup the port
in the iopm but only reinject an io intercept to the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There is a generic function now to calculate msrpm offsets.
Use that function in nested_svm_exit_handled_msr() remove
the duplicate logic (which had a bug anyway).
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch optimizes the way the msrpm of the host and the
guest are merged. The old code merged the 2 msrpm pages
completly. This code needed to touch 24kb of memory for that
operation. The optimized variant this patch introduces
merges only the parts where the host msrpm may contain zero
bits. This reduces the amount of memory which is touched to
48 bytes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch introduces a list with all msrs a guest might
have direct access to and changes the svm_vcpu_init_msrpm
function to use this list.
It also adds a check to set_msr_interception which triggers
a warning if a developer changes a msr intercept that is not
in the list.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The algorithm to find the offset in the msrpm for a given
msr is needed at other places too. Move that logic to its
own function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The nested_svm_exit_handled_msr() returned an bool which is
a bug. I worked by accident because the exected integer
return values match with the true and false values. This
patch changes the return value to int and let the function
return the correct values.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
arch/x86/kvm/kvm_timer.h:13: ERROR: code indent should use tabs where possible
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Implement jmp far opcode ff/5. It is used by multiboot loader.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
fix segment_base() to properly check for null segment selector and
avoid accessing NULL pointer if ldt selector in null.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Linux now has native_store_gdt() to do the same. Use it instead of
kvm local version.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo introduced gfn_to_hva_memslot() when he implemented
gfn_to_pfn_memslot(). Let's use this for gfn_to_hva() too.
Note: also remove parentheses next to return as checkpatch said to do.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
When injecting an vmexit.intr into the nested hypervisor
there might be leftover values in the exit_info fields.
Clear them to not confuse nested hypervisors.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If we have the following situation with nested svm:
1. Host KVM intercepts cr0 writes
2. Guest hypervisor intercepts only selective cr0 writes
Then we get an cr0 write intercept which is handled on the
host. But that intercepts may actually be a selective cr0
intercept for the guest. This patch checks for this
condition and injects a selective cr0 intercept if needed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The vcpu->arch.cr0 variable is already set in the
architecture specific set_cr0 callbacks. There is no need to
set it in the common code.
This allows the architecture code to keep the old arch.cr0
value if it wants. This is required for nested svm to decide
if a selective_cr0 exit needs to be injected.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Hyper-V as a guest wants to write this bit. This patch
ignores it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the emulation of the vm_cr msr for
nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>