linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-15 07:33:56 +00:00

Author	SHA1	Message	Date
Joerg Roedel	c2c63a4939	KVM: SVM: Report emulated SVM features to userspace This patch implements the reporting of the emulated SVM features to userspace instead of the real hardware capabilities. Every real hardware capability needs emulation in nested svm so the old behavior was broken. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:24 +03:00
Joerg Roedel	d4330ef2fb	KVM: x86: Add callback to let modules decide over some supported cpuid bits This patch adds the get_supported_cpuid callback to kvm_x86_ops. It will be used in do_cpuid_ent to delegate the decision about some supported cpuid bits to the architecture modules. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:23 +03:00
Joerg Roedel	228070b1b3	KVM: SVM: Propagate nested entry failure into guest hypervisor This patch implements propagation of a failed guest vmrun back into the guest instead of killing the whole guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:21 +03:00
Joerg Roedel	2be4fc7a02	KVM: SVM: Sync cr0 and cr3 to kvm state before nested handling This patch syncs cr0 and cr3 from the vmcb to the kvm state before nested intercept handling is done. This allows the vmexit path to be simplified. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:20 +03:00
Joerg Roedel	2041a06a50	KVM: SVM: Make sure rip is synced to vmcb before nested vmexit This patch fixes a bug where a nested guest always went over the same instruction because the rip was not advanced on a nested vmexit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:18 +03:00
Joerg Roedel	924584ccb0	KVM: SVM: Fix nested nmi handling The patch introducing nested nmi handling had a bug. The check does not belong to enable_nmi_window but must be in nmi_allowed. This patch fixes this. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:17 +03:00
Gleb Natapov	acb5451789	KVM: prevent spurious exit to userspace during task switch emulation. If kvm_task_switch() fails, the code exits to userspace without specifying an exit reason, so userspace reuses the previous exit reason. Fix this by specifying the exit reason correctly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:55 +03:00
Jan Kiszka	e269fb2189	KVM: x86: Push potential exception error code on task switches When a fault triggers a task switch, the error code, if existent, has to be pushed on the new task's stack. Implement the missing bits. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:46 +03:00
Gleb Natapov	020df0794f	KVM: move DR register access handling into generic code Currently both SVM and VMX have their own DR handling code. Move it to x86.c. Acked-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:39 +03:00
Andre Przywara	6bc31bdc55	KVM: SVM: implement NEXTRIPsave SVM feature On SVM we set the instruction length of skipped instructions to hard-coded, well known values, which could be wrong when (bogus, but valid) prefixes (REX, segment override) are used. Newer AMD processors (Fam10h 45nm and better, aka. PhenomII or AthlonII) have an explicit NEXTRIP field in the VMCB containing the desired information. Since it is cheap to do so, we use this field to override the guessed value on newer processors. A fix for older CPUs would be rather expensive, as it would require fetching and partially decoding the instruction. As the problem is not a security issue and needs special, handcrafted code to trigger (no compiler will ever generate such code), I omit a fix for older CPUs. If someone is interested, I have both a patch for these CPUs as well as demo code triggering this issue: It segfaults under KVM, but runs perfectly on native Linux. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:38 +03:00
Gleb Natapov	cf8f70bfe3	KVM: x86 emulator: fix in/out emulation. in/out emulation is broken now. The breakage is different depending on where the IO device resides. If it is in userspace, the emulator reports emulation failure since it incorrectly interprets the kvm_emulate_pio() return value. If the IO device is in the kernel, emulation of 'in' will do nothing since kvm_emulate_pio() stores the result directly into vcpu registers, so the emulator will overwrite the result of emulation during the commit of shadowed registers. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:25 +03:00
Avi Kivity	5bfd8b5455	KVM: Move kvm_exit tracepoint rip reading inside tracepoint Reading rip is expensive on vmx, so move it inside the tracepoint so we only incur the cost if tracing is enabled. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:15:25 +03:00
Joerg Roedel	f71385383f	KVM: SVM: Ignore lower 12 bit of nested msrpm_pa These bits are ignored by the hardware too. Implement this for nested svm too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:16 +03:00
Joerg Roedel	ce2ac085ff	KVM; SVM: Add correct handling of nested iopm This patch adds the correct handling of the nested io permission bitmap. The old behavior was not to look up the port in the iopm but only to reinject an io intercept into the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:15 +03:00
Joerg Roedel	0d6b35378e	KVM: SVM: Use svm_msrpm_offset in nested_svm_exit_handled_msr There is a generic function now to calculate msrpm offsets. Use that function in nested_svm_exit_handled_msr() and remove the duplicate logic (which had a bug anyway). Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:13 +03:00
Joerg Roedel	323c3d809b	KVM: SVM: Optimize nested svm msrpm merging This patch optimizes the way the msrpm of the host and the guest are merged. The old code merged the 2 msrpm pages completely. This code needed to touch 24kb of memory for that operation. The optimized variant this patch introduces merges only the parts where the host msrpm may contain zero bits. This reduces the amount of memory which is touched to 48 bytes. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:12 +03:00
Joerg Roedel	ac72a9b733	KVM: SVM: Introduce direct access msr list This patch introduces a list with all msrs a guest might have direct access to and changes the svm_vcpu_init_msrpm function to use this list. It also adds a check to set_msr_interception which triggers a warning if a developer changes a msr intercept that is not in the list. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:10 +03:00
Joerg Roedel	455716fa94	KVM: SVM: Move msrpm offset calculation to seperate function The algorithm to find the offset in the msrpm for a given msr is needed at other places too. Move that logic to its own function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:08 +03:00
Joerg Roedel	d24778265a	KVM: SVM: Return correct values in nested_svm_exit_handled_msr The nested_svm_exit_handled_msr() function returned a bool, which is a bug. It worked by accident because the expected integer return values match the true and false values. This patch changes the return value to int and lets the function return the correct values. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:07 +03:00
Joerg Roedel	061e2fd168	KVM: SVM: Fix wrong intercept masks on 32 bit This patch makes KVM on 32 bit SVM work again by correcting the masks used for iret interception. With the wrong masks the upper 32 bits of the intercepts are masked out, which leaves vmrun unintercepted. This is not legal on svm and the vmrun fails. Bug was introduced by commits `95ba827313` and `3cfc3092`. Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-13 01:24:08 -03:00
Gleb Natapov	d6ab1ed446	KVM: Drop kvm_get_gdt() in favor of generic linux function Linux now has native_store_gdt() to do the same. Use it instead of kvm local version. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:53:32 +03:00
Joerg Roedel	197717d581	KVM: SVM: Clear exit_info for injected INTR exits When injecting a vmexit.intr into the nested hypervisor there might be leftover values in the exit_info fields. Clear them to not confuse nested hypervisors. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:53:26 +03:00
Joerg Roedel	7f5d8b5600	KVM: SVM: Handle nested selective_cr0 intercept correctly If we have the following situation with nested svm: 1. Host KVM intercepts cr0 writes 2. Guest hypervisor intercepts only selective cr0 writes Then we get a cr0 write intercept which is handled on the host. But that intercept may actually be a selective cr0 intercept for the guest. This patch checks for this condition and injects a selective cr0 intercept if needed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:53:23 +03:00
Joerg Roedel	4a810181c8	KVM: SVM: Implement emulation of vm_cr msr This patch implements the emulation of the vm_cr msr for nested svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:53:14 +03:00
Joerg Roedel	2e554e8d67	KVM: SVM: Add kvm_nested_intercepts tracepoint This patch adds a tracepoint to get information about the most important intercept bitmasks from the nested vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:53:10 +03:00
Joerg Roedel	ecf1405df2	KVM: SVM: Restore tracing of nested vmcb address A recent change broke tracing of the nested vmcb address. It was reported as 0 all the time. This patch fixes it. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:53:07 +03:00
Joerg Roedel	887f500ca1	KVM: SVM: Check for nested intercepts on NMI injection This patch implements the NMI intercept checking for nested svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:53:04 +03:00
Joerg Roedel	0e5cbe368b	KVM: SVM: Reset MMU on nested_svm_vmrun for NPT too Without resetting the MMU the gva_to_gpa function will not work reliably when the vcpu is running in nested context. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:53:01 +03:00
Joerg Roedel	e02317153e	KVM: SVM: Coding style cleanup This patch removes whitespace errors, fixes comment formats and most of checkpatch warnings. Now vim does not show c-space-errors anymore. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:52:58 +03:00
Jan Kiszka	66b7138f91	KVM: SVM: Emulate nRIP feature when reinjecting INT3 When in guest debugging mode, we have to reinject those #BP software exceptions that are caused by guest-injected INT3. As older AMD processors do not support the required nRIP VMCB field, try to emulate it by moving RIP past the instruction on exception injection. Fix it up again in case the injection failed and we were able to catch this. This does not work for unintercepted faults, but it is better than doing nothing. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:00:43 +03:00
Jan Kiszka	116a4752c8	KVM: SVM: Move svm_queue_exception Move svm_queue_exception past skip_emulated_instruction to allow calling it later on. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 13:00:37 +03:00
Jan Kiszka	48005f64d0	KVM: x86: Save&restore interrupt shadow mask The interrupt shadow created by STI or MOV-SS-like operations is part of the VCPU state and must be preserved across migration. Transfer it in the spare padding field of kvm_vcpu_events.interrupt. As a side effect we now have to make vmx_set_interrupt_shadow robust against both shadow types being set. Give MOV SS a higher priority and skip STI in that case to avoid that VMX throws a fault on next entry. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:38:28 +03:00
Joerg Roedel	8fe546547c	KVM: SVM: Fix wrong interrupt injection in enable_irq_windows The nested_svm_intr() function does not execute the vmexit anymore. Therefore we may still be in the nested state after that function ran. This patch changes the nested_svm_intr() function to return whether the irq window could be enabled. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:38:11 +03:00
Joerg Roedel	66a562f7e2	KVM: SVM: Make lazy FPU switching work with nested svm The new lazy fpu switching code may disable cr0 intercepts when running nested. This is a bug because the nested hypervisor may still want to intercept cr0, which will break in this situation. This patch fixes this issue and makes lazy fpu switching work with nested svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:28 +03:00
Joerg Roedel	06fc777269	KVM: SVM: Activate nested state only when guest state is complete Certain functions called during the emulated world switch behave differently when the vcpu is running nested. This is not the expected behavior during a world switch emulation. This patch ensures that the nested state is activated only if the vcpu is completely in nested state. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:25 +03:00
Joerg Roedel	88ab24adc7	KVM: SVM: Don't sync nested cr8 to lapic and back This patch makes syncing of the guest tpr to the lapic conditional on !nested. Otherwise a nested guest using the TPR could freeze the guest. Another important change this patch introduces is that the cr8 intercept bits are no longer ORed at vmrun emulation if the guest sets VINTR_MASKING in its VMCB. The reason is that nested cr8 accesses always need to be handled by the nested hypervisor because they change the shadow version of the tpr. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:22 +03:00
Joerg Roedel	4c7da8cb43	KVM: SVM: Fix nested msr intercept handling The nested_svm_exit_handled_msr() function maps only one page of the guest's msr permission bitmap. This patch changes the code to use kvm_read_guest to fix the bug. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:19 +03:00
Joerg Roedel	6c3bd3d766	KVM: SVM: Annotate nested_svm_map with might_sleep() The nested_svm_map() function can sleep and must not be called from atomic context. So annotate that function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:16 +03:00
Joerg Roedel	cdbbdc1210	KVM: SVM: Sync all control registers on nested vmexit Currently the vmexit emulation does not sync control registers where the access is typically intercepted by the nested hypervisor. But we cannot count on those intercepts, so sync these registers too and make the code architecturally more correct. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:13 +03:00
Joerg Roedel	b8e88bc8ff	KVM: SVM: Fix schedule-while-atomic on nested exception handling Move the actual vmexit routine out of code that runs with irqs and preemption disabled. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:10 +03:00
Joerg Roedel	7597f129d8	KVM: SVM: Don't use kmap_atomic in nested_svm_map Use of kmap_atomic disables preemption but if we run in shadow-shadow mode the vmrun emulation executes kvm_set_cr3 which might sleep or fault. So use kmap instead for nested_svm_map. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:07 +03:00
Gleb Natapov	89a27f4d0e	KVM: use desc_ptr struct instead of kvm private descriptor_table x86 arch defines desc_ptr for idt/gdt pointers, no need to define another structure in kvm code. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:27:28 +03:00
Takuya Yoshikawa	b7af404338	KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails svm_create_vcpu() does not free the pages allocated during the creation when it fails to complete the allocations. This patch fixes it. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-20 12:55:04 +03:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the following. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. i.e. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Avi Kivity	59200273c4	KVM: Trace failed msr reads and writes Record failed msr reads and writes, and the fact that they failed as well. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-03-01 12:36:06 -03:00
Avi Kivity	f6801dff23	KVM: Rename vcpu->shadow_efer to efer None of the other registers have the shadow_ prefix. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-03-01 12:36:04 -03:00
Avi Kivity	6b52d18605	KVM: Activate fpu on clts Assume that if the guest executes clts, it knows what it's doing, and load the guest fpu to prevent an #NM exception. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-03-01 12:36:04 -03:00
Jan Kiszka	727f5a23e2	KVM: SVM: Trap all debug register accesses To enable proper debug register emulation under all conditions, trap access to all DR0..7. This may be optimized later on. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-03-01 12:36:02 -03:00
Jan Kiszka	c76de350c8	KVM: SVM: Clean up and enhance mov dr emulation Enhance mov dr instruction emulation used by SVM so that it properly handles dr4/5: alias to dr6/7 if cr4.de is cleared. Otherwise return EMULATE_FAIL which will let our only possible caller in that scenario, ud_interception, re-inject UD. We do not need to inject faults, SVM does this for us (exceptions take precedence over instruction interceptions). For the same reason, the value overflow checks can be removed. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-03-01 12:36:02 -03:00
Avi Kivity	4610c83cdc	KVM: SVM: Lazy fpu with npt Now that we can allow the guest to play with cr0 when the fpu is loaded, we can enable lazy fpu when npt is in use. Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:51 -03:00
Avi Kivity	d225157bc6	KVM: SVM: Selective cr0 intercept If two conditions apply: - no bits outside TS and EM differ between the host and guest cr0 - the fpu is active then we can activate the selective cr0 write intercept and drop the unconditional cr0 read and write intercept, and allow the guest to run with the host fpu state. This reduces cr0 exits due to guest fpu management while the guest fpu is loaded. Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:51 -03:00
Avi Kivity	888f9f3e0c	KVM: SVM: Restore unconditional cr0 intercept under npt Currently we don't intercept cr0 at all when npt is enabled. This improves performance but requires us to activate the fpu at all times. Remove this behaviour in preparation for adding selective cr0 intercepts. Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:51 -03:00
Avi Kivity	bff7827479	KVM: SVM: Initialize fpu_active in init_vmcb() init_vmcb() sets up the intercepts as if the fpu is active, so initialize it there. This avoids an INIT from setting up intercepts inconsistent with fpu_active. Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:51 -03:00
Avi Kivity	02daab21d9	KVM: Lazify fpu activation and deactivation Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep it loaded until the next heavyweight exit (where we are forced to unload it). This reduces unnecessary exits. We also defer fpu activation on clts; while clts signals the intent to use the fpu, we can't be sure the guest will actually use it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:50 -03:00
Avi Kivity	e8467fda83	KVM: VMX: Allow the guest to own some cr0 bits We will use this later to give the guest ownership of cr0.ts. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:50 -03:00
Avi Kivity	4d4ec08745	KVM: Replace read accesses of vcpu->arch.cr0 by an accessor Since we'd like to allow the guest to own a few bits of cr0 at times, we need to know when we access those bits. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:50 -03:00
Sheng Yang	17cc393596	KVM: x86: Rename gb_page_enable() to get_lpage_level() in kvm_x86_ops Then the callback can provide the maximum supported large page level, which is more flexible. Also move the gb page support into x86_64 specific. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-03-01 12:35:46 -03:00
Joerg Roedel	953899b659	KVM: SVM: Adjust tsc_offset only if tsc_unstable The tsc_offset adjustment in svm_vcpu_load is executed unconditionally even if Linux considers the host tsc as stable. This causes a Linux guest to detect an unstable tsc in every case. This patch removes the tsc_offset adjustment if the host tsc is stable. The guest will now get the benefit of a stable tsc too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:41 -03:00
Sheng Yang	4e47c7a6d7	KVM: VMX: Add instruction rdtscp support for guest Before enabling, execution of "rdtscp" in guest would result in #UD. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:40 -03:00
Sheng Yang	0e85188049	KVM: Add cpuid_update() callback to kvm_x86_ops Sometimes, we need to adjust some state in order to reflect guest CPUID setting, e.g. if we don't expose rdtscp to guest, we won't want to enable it on hardware. cpuid_update() is introduced for this purpose. Also export kvm_find_cpuid_entry() for later use. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:40 -03:00
Linus Torvalds	d0316554d3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c	2009-12-14 09:58:24 -08:00
Jan Kiszka	3cfc3092f4	KVM: x86: Add KVM_GET/SET_VCPU_EVENTS This new IOCTL exports all yet user-invisible states related to exceptions, interrupts, and NMIs. Together with appropriate user space changes, this fixes sporadic problems of vmsave/restore, live migration and system reset. [avi: future-proof abi by adding a flags field] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:25 +02:00
Eduardo Habkost	3ce672d484	KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization The svm_set_cr0() call will initialize save->cr0 properly even when npt is enabled, clearing the NW and CD bits as expected, so we don't need to initialize it manually for npt_enabled anymore. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:21 +02:00
Eduardo Habkost	18fa000ae4	KVM: SVM: Reset cr0 properly on vcpu reset svm_vcpu_reset() was not properly resetting the contents of the guest-visible cr0 register, causing the following issue: https://bugzilla.redhat.com/show_bug.cgi?id=525699 Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine with paging enabled, making the vcpu get a pagefault exception while trying to run it. Instead of setting vmcb->save.cr0 directly, the new code just resets kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0(). kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:21 +02:00
Jan Kiszka	6be7d3062b	KVM: SVM: Cleanup NMI singlestep Push the NMI-related singlestep variable into vcpu_svm. It's dealing with an AMD-specific deficit, nothing generic for x86. Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> arch/x86/include/asm/kvm_host.h | 1 - arch/x86/kvm/svm.c | 12 +++++++----- 2 files changed, 7 insertions(+), 6 deletions(-) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:19 +02:00
Mark Langsdorf	565d0998ec	KVM: SVM: Support Pause Filter in AMD processors New AMD processors (Family 0x10 models 8+) support the Pause Filter Feature. This feature creates a new field in the VMCB called Pause Filter Count. If Pause Filter Count is greater than 0 and intercepting PAUSEs is enabled, the processor will increment an internal counter when a PAUSE instruction occurs instead of intercepting. When the internal counter reaches the Pause Filter Count value, a PAUSE intercept will occur. This feature can be used to detect contended spinlocks, especially when the lock holding VCPU is not scheduled. Rescheduling another VCPU prevents the VCPU seeking the lock from wasting its quantum by spinning idly. Experimental results show that most spinlocks are held for less than 1000 PAUSE cycles or more than a few thousand. Default the Pause Filter Counter to 3000 to detect the contended spinlocks. Processor support for this feature is indicated by a CPUID bit. On a 24 core system running 4 guests each with 16 VCPUs, this patch improved overall performance of each guest's 32 job kernbench by approximately 3-5% when combined with a scheduler algorithm that caused the VCPU to sleep for a brief period. Further performance improvement may be possible with a more sophisticated yield algorithm. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:17 +02:00
Joerg Roedel	d36f19e9ec	KVM: SVM: Remove nsvm_printk debugging code With all important information now delivered through tracepoints we can safely remove the nsvm_printk debugging code for nested svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:17 +02:00
Joerg Roedel	532a46b989	KVM: SVM: Add tracepoint for skinit instruction This patch adds a tracepoint for the event that the guest executed the SKINIT instruction. This information is important because SKINIT is an SVM extension not yet implemented by nested SVM and we may need this information for debugging hypervisors that do not yet run on nested SVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:16 +02:00
Joerg Roedel	ec1ff79084	KVM: SVM: Add tracepoint for invlpga instruction This patch adds a tracepoint for the event that the guest executed the INVLPGA instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:16 +02:00
Joerg Roedel	236649de33	KVM: SVM: Add tracepoint for #vmexit because intr pending This patch adds a special tracepoint for the event that a nested #vmexit is injected because kvm wants to inject an interrupt into the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:16 +02:00
Joerg Roedel	17897f3668	KVM: SVM: Add tracepoint for injected #vmexit This patch adds a tracepoint for a nested #vmexit that gets re-injected to the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:15 +02:00
Joerg Roedel	d8cabddf7e	KVM: SVM: Add tracepoint for nested #vmexit This patch adds a tracepoint for every #vmexit we get from a nested guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:15 +02:00
Joerg Roedel	0ac406de8f	KVM: SVM: Add tracepoint for nested vmrun This patch adds a dedicated kvm tracepoint for a nested vmrun. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:15 +02:00
Joerg Roedel	cd3ff653ae	KVM: SVM: Move INTR vmexit out of atomic code The nested SVM code emulates a #vmexit caused by a request to open the irq window right in the request function. This is a bug because the request function runs with preemption and interrupts disabled but the #vmexit emulation might sleep. This can cause a schedule()-while-atomic bug and is fixed with this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:15 +02:00
Alexander Graf	8d23c46624	KVM: SVM: Notify nested hypervisor of lost event injections If event_inj is valid on a #vmexit the host CPU would write the contents to exit_int_info, so the hypervisor knows that the event wasn't injected. Nested SVM did not do this until now, which is a bug fixed by this patch. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:14 +02:00
Jan Kiszka	355be0b930	KVM: x86: Refactor guest debug IOCTL handling Much of the so-far vendor-specific code for setting up guest debug can actually be handled by the generic code. This also fixes a minor deficit in the SVM part w.r.t. processing KVM_GUESTDBG_ENABLE. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:14 +02:00
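The rule in this commit fits in a few lines. This is a minimal sketch: the struct is a hypothetical slice of the nested VMCB control area, and only the VALID bit (bit 31 of EVENTINJ) is assumed from the AMD64 layout.

```c
#include <stdint.h>

#define EVTINJ_VALID (1u << 31)  /* "valid" bit of EVENTINJ */

/* Hypothetical slice of the nested VMCB control area. */
struct nested_ctl {
    uint32_t event_inj;
    uint32_t event_inj_err;
    uint32_t exit_int_info;
    uint32_t exit_int_info_err;
};

/* On an emulated #vmexit, report an event that was queued for injection
 * but never delivered, the way real hardware does. */
static void report_lost_event(struct nested_ctl *c)
{
    if (c->event_inj & EVTINJ_VALID) {
        c->exit_int_info     = c->event_inj;
        c->exit_int_info_err = c->event_inj_err;
    }
}
```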
Zachary Amsden	3230bb4707	KVM: Fix hotplug of CPUs Both VMX and SVM require per-cpu memory allocation, which is done at module init time, for only online cpus. Backend was not allocating enough structure for all possible CPUs, so new CPUs coming online could not be hardware enabled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:13 +02:00
Zachary Amsden	e6732a5af9	KVM: Fix printk name error in svm.c Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:13 +02:00
Joerg Roedel	e935d48e1b	KVM: SVM: Remove remaining occurrences of rdtscll This patch replaces them with native_read_tsc() which can also be used in expressions and saves a variable on the stack in this case. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:12 +02:00
Joerg Roedel	33527ad7e1	KVM: SVM: don't copy exit_int_info on nested vmrun The exit_int_info field is only written by the hardware and never read. So it does not need to be copied on a vmrun emulation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:11 +02:00
Joerg Roedel	7fcdb5103d	KVM: SVM: reorganize svm_interrupt_allowed This patch reorganizes the logic in svm_interrupt_allowed to make it easier to read. This is important because the logic is a lot more complicated with Nested SVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:11 +02:00
Alexander Graf	10474ae894	KVM: Activate Virtualization On Demand X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can result in unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. To circumvent this problem at least a bit, this patch introduces on demand activation of virtualization. This means that virtualization is instead enabled on creation of the first virtual machine and disabled on destruction of the last one. So using this, KVM can be easily autoloaded, while keeping other hypervisors usable. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:10 +02:00
Marcelo Tosatti	e8b3433a5c	KVM: SVM: remove needless mmap_sem acquisition from nested_svm_map nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since gfn_to_page / get_user_pages are responsible for it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:10 +02:00
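On-demand activation boils down to a usage count around VM creation and destruction. The sketch below uses hypothetical names and leaves out the locking and per-CPU work a real implementation needs. It only shows the first-create/last-destroy transitions.

```c
#include <stdbool.h>

/* Hypothetical global state: number of live VMs and whether the
 * virtualization extensions are currently switched on. */
static int vm_count;
static bool hw_enabled;

static void hardware_enable_all(void)  { hw_enabled = true; }
static void hardware_disable_all(void) { hw_enabled = false; }

/* First VM switches virtualization on. */
static void create_vm(void)
{
    if (vm_count++ == 0)
        hardware_enable_all();
}

/* Last VM switches it off again, so other hypervisors can run. */
static void destroy_vm(void)
{
    if (--vm_count == 0)
        hardware_disable_all();
}
```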
Avi Kivity	851ba6922a	KVM: Don't pass kvm_run arguments They're just copies of vcpu->run, which is readily accessible. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:06 +02:00
Tejun Heo	0fe1e00954	percpu: make percpu symbols in x86 unique This patch updates percpu related symbols in x86 such that percpu symbols are unique and don't clash with local symbols. This serves two purposes of decreasing the possibility of global percpu symbol collision and allowing dropping per_cpu__ prefix from percpu symbols. * arch/x86/kernel/cpu/common.c: rename local variable to avoid collision * arch/x86/kvm/svm.c: s/svm_data/sd/ for local variables to avoid collision * arch/x86/kernel/cpu/cpu_debug.c: s/cpu_arr/cpud_arr/ s/priv_arr/cpud_priv_arr/ s/cpu_priv_count/cpud_priv_count/ * arch/x86/kernel/cpu/intel_cacheinfo.c: s/cpuid4_info/ici_cpuid4_info/ s/cache_kobject/ici_cache_kobject/ s/index_kobject/ici_index_kobject/ * arch/x86/kernel/ds.c: s/cpu_context/cpu_ds_context/ Partly based on Rusty Russell's "alloc_percpu: rename percpu vars which cause name clashes" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: (kvm) Avi Kivity <avi@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: x86@kernel.org	2009-10-29 22:34:14 +09:00
Joerg Roedel	20824f30bb	KVM: SVM: Handle tsc in svm_get_msr/svm_set_msr correctly When running nested we need to touch the L1 guest's tsc_offset. Otherwise changes will be lost or a wrong value be read. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-10-04 13:57:23 +02:00
Joerg Roedel	77b1ab1732	KVM: SVM: Fix tsc offset adjustment when running nested When svm_vcpu_load is called while the vcpu is running in guest mode the tsc adjustment made there is lost on the next emulated #vmexit. This causes the TSC to run backwards in the guest. This patch fixes the issue by also adjusting the tsc_offset in the emulated hsave area so that it will not get lost. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-10-04 13:57:22 +02:00
Avi Kivity	52c7847d12	KVM: SVM: Drop tlb flush workaround in npt It is no longer possible to reproduce the problem, so presumably it has been fixed. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:40 +03:00
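A minimal sketch of the fix, assuming a stripped-down model where the offset lives in two places while L2 runs: the active VMCB and the hsave copy that is restored on an emulated #vmexit. All names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical view of the two TSC offsets kept while running nested. */
struct nested_tsc {
    int64_t vmcb_tsc_offset;   /* offset currently loaded (L2) */
    int64_t hsave_tsc_offset;  /* L1's offset, restored on emulated #vmexit */
    int guest_mode;            /* non-zero while running L2 */
};

/* Apply a host TSC delta, e.g. after moving to another physical CPU. */
static void adjust_tsc_offset(struct nested_tsc *t, int64_t delta)
{
    t->vmcb_tsc_offset += delta;
    if (t->guest_mode)
        t->hsave_tsc_offset += delta;  /* keep L1's copy in sync too */
}

/* Emulated #vmexit: L1's saved offset becomes the active one again. */
static void emulate_vmexit(struct nested_tsc *t)
{
    t->vmcb_tsc_offset = t->hsave_tsc_offset;
    t->guest_mode = 0;
}
```

Without the `hsave_tsc_offset` update, the #vmexit would restore the stale L1 offset and undo the adjustment, which is the backwards-running TSC described above.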
Joerg Roedel	4b6e4dca70	KVM: SVM: enable nested svm by default Nested SVM is (in my experience) stable enough to be enabled by default. So omit the requirement to pass a module parameter. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:38 +03:00
Joerg Roedel	108768de55	KVM: SVM: check for nested VINTR flag in svm_interrupt_allowed Not checking for this flag breaks any nested hypervisor that does not set VINTR. So fix it with this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:37 +03:00
Joerg Roedel	26666957a5	KVM: SVM: move nested_svm_intr main logic out of if-clause This patch removes one indentation level from nested_svm_intr and makes the logic more readable. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:36 +03:00
Joerg Roedel	cda0ffdd86	KVM: SVM: remove unnecessary is_nested check from svm_cpu_run This check is not necessary. We have to sync the vcpu->arch.cr2 always back to the VMCB. This patch removes the is_nested check. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:34 +03:00
Joerg Roedel	410e4d573d	KVM: SVM: move special nested exit handling to separate function This patch moves the handling for special nested vmexits like #pf to a separate function. This makes the kvm_override parameter obsolete and makes the code more readable. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:33 +03:00
Joerg Roedel	1f8da47805	KVM: SVM: handle errors in vmrun emulation path appropriately If nested svm fails to load the msrpm the vmrun succeeds with the old msrpm which is not correct. This patch changes the logic to roll back to host mode in case the msrpm cannot be loaded. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:32 +03:00
Joerg Roedel	ea8e064fe2	KVM: SVM: remove nested_svm_do and helper functions This function is no longer required. So remove it. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:31 +03:00
Joerg Roedel	9738b2c97d	KVM: SVM: clean up nested vmrun path This patch removes the usage of nested_svm_do from the vmrun emulation path. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:29 +03:00
Joerg Roedel	9966bf6872	KVM: SVM: clean up nested vmload/vmsave paths This patch removes the usage of nested_svm_do from the vmload and vmsave emulation code paths. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:28 +03:00
Joerg Roedel	3d62d9aa98	KVM: SVM: clean up nested_svm_exit_handled_msr This patch changes nested svm to call nested_svm_exit_handled_msr directly and not through nested_svm_do. [alex: fix oops due to nested kmap_atomics] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:45:43 +03:00
Joerg Roedel	34f80cfad5	KVM: SVM: get rid of nested_svm_vmexit_real This patch is the starting point of removing nested_svm_do from the nested svm code. The nested_svm_do function basically maps two guest physical pages to host virtual addresses and calls a passed function on it. This function pointer code flow is hard to read and not the best technical solution here. As a side effect this patch introduces the nested_svm_[un]map helper functions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:26 +03:00
Joerg Roedel	0295ad7de8	KVM: SVM: simplify nested_svm_check_exception Makes the code of this function more readable by removing one indentation level for the core logic. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:25 +03:00
Joerg Roedel	9c4e40b994	KVM: SVM: do nested vmexit in nested_svm_exit_handled If this function returns true a nested vmexit is required. Move that vmexit into the nested_svm_exit_handled function. This also simplifies the handling of nested #pf intercepts in this function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:25 +03:00
Joerg Roedel	4c2161aed5	KVM: SVM: consolidate nested_svm_exit_handled When caching guest intercepts there is no need anymore for the nested_svm_exit_handled_real function. So move its code into nested_svm_exit_handled. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:24 +03:00
Joerg Roedel	aad42c641c	KVM: SVM: cache nested intercepts When the nested intercepts are cached we don't need to call get_user_pages and/or map the nested vmcb on every nested #vmexit to check who will handle the intercept. Further this patch aligns the emulated svm behavior better to real hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:24 +03:00
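With the intercepts cached at emulated VMRUN time, deciding who handles an exit becomes a bitmap test instead of a guest-page mapping. This is a minimal sketch for the exception case. The struct and helper are hypothetical, and the caching itself (copying from the nested VMCB on VMRUN) is assumed to have already happened.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache of the L1 hypervisor's intercept settings, filled
 * once on emulated VMRUN so later exits need not map the nested VMCB. */
struct nested_cache {
    uint64_t intercept;             /* intercepted instructions/events */
    uint32_t intercept_exceptions;  /* bit n set => exception vector n */
};

/* Should this exception exit be reflected to L1 as a nested #vmexit? */
static bool nested_wants_exception(const struct nested_cache *c,
                                   unsigned vector)
{
    return vector < 32 &&
           (c->intercept_exceptions & (UINT32_C(1) << vector)) != 0;
}
```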
Joerg Roedel	e6aa9abd73	KVM: SVM: move nested svm state into separate struct This makes it more clear for which purpose these members in the vcpu_svm exist. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:24 +03:00
Joerg Roedel	a5c3832dfe	KVM: SVM: complete interrupts after handling nested exits The interrupt completion code must run after nested exits are handled because interrupts or exceptions that were not injected may be handled by the L1 guest first. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:24 +03:00
Joerg Roedel	0460a979b4	KVM: SVM: copy only necessary parts of the control area on vmrun/vmexit The vmcb control area contains more than 800 bytes of reserved fields which are unnecessarily copied. Fix this by introducing a copy function which only copies the relevant part and saves time. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:23 +03:00
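The idea can be illustrated with a field-by-field copy that skips the reserved padding. The struct below is a hypothetical, shortened control area: its field list and reserved-array sizes are illustrative only and do not match the real VMCB layout.

```c
#include <stdint.h>

/* Hypothetical, shortened control area with large reserved regions. */
struct control_area {
    uint32_t intercept_exceptions;
    uint64_t intercept;
    uint8_t  reserved_1[44];
    uint64_t tsc_offset;
    uint32_t asid;
    uint8_t  reserved_2[800];
};

/* Copy only the meaningful fields instead of memcpy()ing the whole
 * structure, reserved bytes included. */
static void copy_control_area(struct control_area *dst,
                              const struct control_area *src)
{
    dst->intercept_exceptions = src->intercept_exceptions;
    dst->intercept            = src->intercept;
    dst->tsc_offset           = src->tsc_offset;
    dst->asid                 = src->asid;
}
```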
Joerg Roedel	defbba5660	KVM: SVM: optimize nested vmrun Only copy the necessary parts of the vmcb save area on vmrun and save precious time. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:23 +03:00
Joerg Roedel	33740e4009	KVM: SVM: optimize nested #vmexit It is more efficient to copy only the relevant parts of the vmcb back to the nested vmcb when we emulate a vmexit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:23 +03:00
Joerg Roedel	2af9194d1b	KVM: SVM: add helper functions for global interrupt flag This patch makes the code easier to read when it comes to setting, clearing and checking the status of the virtualized global interrupt flag for the VCPU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:23 +03:00
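Helpers of this kind usually wrap a single flag bit. The sketch assumes the virtual GIF is kept as one bit in a per-VCPU flags word. The struct, the mask name and its bit position are assumptions, not taken from this log.

```c
#include <stdbool.h>

#define HF_GIF_MASK (1u << 0)  /* assumed bit position in hflags */

/* Hypothetical minimal VCPU with a hidden-flags word. */
struct vcpu {
    unsigned hflags;
};

static void enable_gif(struct vcpu *v)  { v->hflags |= HF_GIF_MASK; }
static void disable_gif(struct vcpu *v) { v->hflags &= ~HF_GIF_MASK; }

static bool gif_set(const struct vcpu *v)
{
    return (v->hflags & HF_GIF_MASK) != 0;
}
```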
Joerg Roedel	344f414fa0	KVM: report 1GB page support to userspace If userspace knows that the kernel part supports 1GB pages it can enable the corresponding cpuid bit so that guests actually use GB pages. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:19 +03:00
Akinobu Mita	b792c344df	KVM: x86: use kvm_get_gdt() and kvm_read_ldt() Use kvm_get_gdt() and kvm_read_ldt() to reduce inline assembly code. Cc: Avi Kivity <avi@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:15 +03:00
Andre Przywara	8f1589d95e	KVM: ignore AMD's HWCR register access to set the FFDIS bit Linux tries to disable the flush filter on all AMD K8 CPUs. Since KVM does not handle the needed MSR, the injected #GP will panic the Linux kernel. Ignore setting of the HWCR.FFDIS bit in this MSR to let Linux boot with an AMD K8 family guest CPU. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:02 +03:00
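A minimal sketch of the filtering, assuming HWCR is MSR 0xc0010015 and FFDIS is bit 6. The helper is hypothetical, and its return convention (non-zero means unhandled, leading to a #GP) is a simplification.

```c
#include <stdint.h>

#define MSR_K7_HWCR 0xc0010015u             /* AMD hardware configuration */
#define HWCR_FFDIS  (UINT64_C(1) << 6)      /* flush filter disable */

/* Returns 0 if the write is accepted (silently ignored), non-zero if it
 * should be treated as unhandled. */
static int set_msr_hwcr(uint64_t data)
{
    data &= ~HWCR_FFDIS;  /* drop the bit Linux sets on K8 */
    if (data != 0)
        return 1;         /* any other bit is still unsupported */
    return 0;
}
```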
Marcelo Tosatti	229456fc34	KVM: convert custom marker based tracing to event traces This allows use of the powerful ftrace infrastructure. See Documentation/trace/ for usage information. [avi, stephen: various build fixes] [sheng: fix control register breakage] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:59 +03:00
Alexander Graf	219b65dcf6	KVM: SVM: Improve nested interrupt injection While trying to get Hyper-V running, I realized that the interrupt injection mechanisms that are in place right now are not 100% correct. This patch makes nested SVM's interrupt injection behave more like on a real machine. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:59 +03:00
Alexander Graf	ff092385e8	KVM: SVM: Implement INVLPGA SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, so let's implement it! For now we just do the same thing invlpg does, as asid switching means we flush the mmu anyways. That might change one day though. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:58 +03:00
Alexander Graf	3c5d0a44b0	KVM: Implement MSRs used by Hyper-V Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage. But let's be nice today and have it its way, because otherwise it fails terribly. [jaswinder: fix build for linux-next changes] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:58 +03:00
Avi Kivity	b3dbf89e67	KVM: SVM: Don't save/restore host cr2 The host never reads cr2 in process context, so we are free to clobber it. The vmx code does this, so we can safely remove the save/restore code. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:58 +03:00
Andre Przywara	71db602322	KVM: Move performance counter MSR access interception to generic x86 path The performance counter MSRs are different for AMD and Intel CPUs and they are chosen mainly by the CPUID vendor string. This patch catches writes to all addresses (regardless of VMX/SVM path) and handles them in the generic MSR handler routine. Writing a 0 into the event select register is something we perfectly emulate ;-), so don't print out a warning to dmesg in this case. This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:54 +03:00
Gleb Natapov	c5af89b68a	KVM: Introduce kvm_vcpu_is_bsp() function. Use it instead of open code "vcpu_id zero is BSP" assumption. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:51 +03:00
Avi Kivity	6de4f3ada4	KVM: Cache pdptrs Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx, guest memory access on svm) extract them on demand. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:46 +03:00
Avi Kivity	6c8166a77c	KVM: SVM: Fold kvm_svm.h into svm.c kvm_svm.h is only included from svm.c, so fold it in. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:44 +03:00
Andre Przywara	017cb99e87	KVM: SVM: use explicit 64bit storage for sysenter values Since AMD does not support sysenter in 64bit mode, the VMCB fields storing the MSRs are truncated to 32bit upon VMRUN/#VMEXIT. So store the values in a separate 64bit storage to avoid truncation. [andre: fix amd->amd migration] Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:43 +03:00
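The truncation problem and its fix can be modelled in a few lines. In this sketch a 32-bit field stands in for what survives a VMRUN/#VMEXIT round trip, and a separate 64-bit copy serves guest reads. Struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one sysenter MSR. */
struct sysenter_state {
    uint32_t vmcb_sysenter_eip;  /* what survives VMRUN/#VMEXIT */
    uint64_t sysenter_eip;       /* separate full-width copy */
};

static void write_sysenter_eip(struct sysenter_state *s, uint64_t val)
{
    s->sysenter_eip = val;
    s->vmcb_sysenter_eip = (uint32_t)val;  /* models the truncation */
}

/* Guest RDMSR is answered from the 64-bit copy, not the VMCB field. */
static uint64_t read_sysenter_eip(const struct sysenter_state *s)
{
    return s->sysenter_eip;
}
```

Keeping the full value outside the VMCB matters mainly when the state is migrated to a host (such as an Intel one) that does honour all 64 bits.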
Jaswinder Singh Rajput	af24a4e4ae	KVM: Replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h Use standard msr-index.h's MSR declaration. MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves 80 column issue. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:38 +03:00
Marcelo Tosatti	4b656b1202	KVM: SVM: force new asid on vcpu migration If a migrated vcpu matches the asid_generation value of the target pcpu, there will be no TLB flush via TLB_CONTROL_FLUSH_ALL_ASID. The check for vcpu.cpu in pre_svm_run is meaningless since svm_vcpu_load already updated it on schedule in. Such vcpu will VMRUN with stale TLB entries. Based on original patch from Joerg Roedel (http://patchwork.kernel.org/patch/10021/) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-08-05 13:59:29 +03:00
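The generation scheme and the migration hole can be shown with a toy allocator. In this sketch each physical CPU hands out ASIDs within a generation, and a VCPU only gets a new ASID when its generation differs. Resetting the VCPU's generation on migration forces that. All names and the starting values are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-physical-CPU ASID allocator. */
struct pcpu_asid {
    uint64_t asid_generation;
    uint32_t next_asid;
    uint32_t max_asid;
};

/* Hypothetical per-VCPU state. */
struct vcpu_asid {
    uint64_t asid_generation;  /* 0 means "no valid ASID" */
    uint32_t asid;
    bool flush_all;            /* stands in for TLB_CONTROL_FLUSH_ALL_ASID */
};

static void new_asid(struct vcpu_asid *v, struct pcpu_asid *p)
{
    if (p->next_asid > p->max_asid) {  /* exhausted: new generation */
        p->asid_generation++;
        p->next_asid = 1;
        v->flush_all = true;
    }
    v->asid_generation = p->asid_generation;
    v->asid = p->next_asid++;
}

/* Called when the VCPU is loaded on a different physical CPU. */
static void vcpu_migrated(struct vcpu_asid *v)
{
    v->asid_generation = 0;
}

/* Called before every VMRUN. */
static void pre_run(struct vcpu_asid *v, struct pcpu_asid *p)
{
    if (v->asid_generation != p->asid_generation)
        new_asid(v, p);
}
```

The test exercises the failure case: two CPUs with equal generation numbers let a migrated VCPU keep an ASID the new CPU never issued, unless `vcpu_migrated()` runs first.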
Gleb Natapov	44c11430b5	KVM: inject NMI after IRET from a previous NMI, not before. If NMI is received during handling of another NMI it should be injected immediately after IRET from previous NMI handler, but SVM intercepts IRET before instruction execution so we can't inject pending NMI at this point and there is no way to request exit when NMI window opens. This patch fixes the SVM code to open NMI window after IRET by single stepping over IRET instruction. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:59 +03:00
Gleb Natapov	66fd3f7f90	KVM: Do not re-execute INTn instruction. Re-inject event instead. This is what Intel suggests. Also use correct instruction length when re-injecting soft fault/interrupt. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:58 +03:00
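The two-step sequence can be written as a tiny state machine. This sketch has three steps: the IRET intercept arms a single step, and the debug trap after IRET retires opens the NMI window. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical NMI state for one VCPU. */
struct nmi_state {
    bool nmi_masked;       /* inside an NMI handler, NMIs blocked */
    bool iret_singlestep;  /* stepping over the intercepted IRET */
};

/* IRET intercept fires before IRET executes: NMIs are still masked,
 * so arm a single step instead of injecting now. */
static void on_iret_intercept(struct nmi_state *s)
{
    s->iret_singlestep = true;
}

/* Debug trap after IRET has retired: the NMI window is now open. */
static void on_singlestep_trap(struct nmi_state *s)
{
    if (s->iret_singlestep) {
        s->iret_singlestep = false;
        s->nmi_masked = false;
    }
}

static bool nmi_allowed(const struct nmi_state *s)
{
    return !s->nmi_masked;
}
```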
Gleb Natapov	f629cf8485	KVM: skip_emulated_instruction() decode instruction if size is not known Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:58 +03:00
Gleb Natapov	3298b75c88	KVM: Unprotect a page if #PF happens during NMI injection. It is done for exception and interrupt already. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:57 +03:00
Glauber Costa	2809f5d2c4	KVM: Replace ->drop_interrupt_shadow() by ->set_interrupt_shadow() This patch replaces drop_interrupt_shadow with the more general set_interrupt_shadow, that can either drop or raise it, depending on its parameter. It also adds ->get_interrupt_shadow() for future use. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:54 +03:00
Gleb Natapov	fe8e7f83de	KVM: SVM: Don't reinject event that caused a task switch If a task switch is caused by an event, remove the event from the event queue. VMX already does that. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:51 +03:00
Andre Przywara	b586eb0253	KVM: SVM: Fix cross vendor migration issue in segment descriptor On AMD CPUs sometimes the DB bit in the stack segment descriptor is left as 1, although the whole segment has been made unusable. Clear it here to pass an Intel VMX entry check when cross vendor migrating. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:51 +03:00
Sheng Yang	4b12f0de33	KVM: Replace get_mt_mask_shift with get_mt_mask Shadow_mt_mask is out of date, now it has only been used as a flag to indicate if TDP enabled. Get rid of it and use tdp_enabled instead. Also put memory type logic in kvm_x86_ops->get_mt_mask(). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:49 +03:00
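This is a minimal sketch of the SS fixup. It assumes, as a later commit in this log describes, that "unusable" is derived from "not present" because the VMCB has no explicit field for it. The cut-down segment struct is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical cut-down segment as seen by the generic x86 code. */
struct segment {
    bool present;
    bool unusable;
    bool db;  /* default operation size / big bit */
};

/* Export SS from SVM state in a form a VMX entry check accepts. */
static void fixup_ss(struct segment *ss)
{
    ss->unusable = !ss->present;  /* no explicit unusable bit in SVM */
    if (ss->unusable)
        ss->db = false;           /* AMD may leave DB set on a dead SS */
}
```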
Gleb Natapov	14d0bc1f7c	KVM: Get rid of get_irq() callback It just returns pending IRQ vector from the queue for VMX/SVM. Get IRQ directly from the queue before migration and put it back after. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:49 +03:00
Gleb Natapov	95ba827313	KVM: SVM: Add NMI injection support Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:48 +03:00
Gleb Natapov	c4282df98a	KVM: Get rid of arch.interrupt_window_open & arch.nmi_window_open They are recalculated before each use anyway. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:48 +03:00
Gleb Natapov	0a5fff1923	KVM: Do not report TPR write to userspace if the new value is bigger than or equal to the previous one. Saves many exits to userspace in a case of IRQ chip in userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:47 +03:00
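The test itself is a single comparison. The reasoning behind it is an explanatory assumption, not a statement from the commit: raising TPR can only mask more interrupts, so only a decrease can make a pending interrupt deliverable and require userspace's attention. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* With a userspace IRQ chip, exit to userspace only when the guest
 * lowers its task priority. */
static bool tpr_write_needs_exit(uint8_t old_tpr, uint8_t new_tpr)
{
    return new_tpr < old_tpr;
}
```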
Gleb Natapov	615d519305	KVM: sync_lapic_to_cr8() should always sync cr8 to V_TPR Even if IRQ chip is in userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:47 +03:00
Gleb Natapov	1d6ed0cb95	KVM: Remove inject_pending_vectors() callback It is the same as inject_pending_irq() for VMX/SVM now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:47 +03:00
Gleb Natapov	1cb948ae86	KVM: Remove exception_injected() callback. It always returns false for VMX/SVM now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:46 +03:00
Gleb Natapov	9222be18f7	KVM: SVM: Coalesce userspace/kernel irqchip interrupt injection logic Start to use interrupt/exception queues like VMX does. This also fixes the bug that if exit was caused by a guest internal exception access to IDT the exception was not reinjected. Use EVENTINJ to inject interrupts. Use VINT only for detecting when IRQ window is open again. EVENTINJ ensures the interrupt is injected immediately and not delayed. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:46 +03:00
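For reference, the low 32 bits of EVENTINJ have a simple layout in the AMD64 APM: vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11, valid in bit 31. The macro and helper names below are shortened, hypothetical versions rather than the kernel's own.

```c
#include <stdint.h>

#define EVTINJ_VEC_MASK    0xffu
#define EVTINJ_TYPE_SHIFT  8
#define EVTINJ_TYPE_INTR   (0u << EVTINJ_TYPE_SHIFT)  /* external interrupt */
#define EVTINJ_TYPE_NMI    (2u << EVTINJ_TYPE_SHIFT)
#define EVTINJ_TYPE_EXEPT  (3u << EVTINJ_TYPE_SHIFT)  /* exception */
#define EVTINJ_VALID_ERR   (1u << 11)                 /* error code valid */
#define EVTINJ_VALID       (1u << 31)

/* Encode an external interrupt for injection on the next VMRUN. */
static uint32_t evtinj_intr(uint8_t vector)
{
    return EVTINJ_VALID | EVTINJ_TYPE_INTR | (vector & EVTINJ_VEC_MASK);
}
```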
Gleb Natapov	863e8e658e	KVM: VMX: Consolidate userspace and kernel interrupt injection for VMX Use the same callback to inject irq/nmi events no matter what irqchip is in use. Only from VMX for now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:45 +03:00
Gleb Natapov	8061823a25	KVM: Make kvm_cpu_(has|get)_interrupt() work for userspace irqchip too At the vector level, kernel and userspace irqchip are fairly similar. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:45 +03:00
Gleb Natapov	8317c298ea	KVM: SVM: Skip instruction on a task switch only when appropriate If a task switch was initiated because of a task gate in IDT and IDT was accessed because of an external event the instruction should not be skipped. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:42 +03:00
Gleb Natapov	64a7ec0668	KVM: Fix unneeded instruction skipping during task switching. There is no need to skip instruction if the reason for a task switch is a task gate in IDT and access to it is caused by an external event. The problem is currently solved only for VMX since there is no reliable way to skip an instruction in SVM. We should emulate it instead. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:38 +03:00
Gleb Natapov	78646121e9	KVM: Fix interrupt unhalting a vcpu when it shouldn't kvm_vcpu_block() unhalts the vcpu on an interrupt/timer without checking if the interrupt window is actually open. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:33 +03:00
Gleb Natapov	fe4c7b1914	KVM: reuse (pop|push)_irq from svm.c in vmx.c The prioritized bit vector manipulation functions are useful in both vmx and svm. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:31 +03:00
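A prioritized bit vector of pending IRQs can be sketched as a two-level bitmap. A summary word records which 64-bit words of the pending array are non-empty, so the next vector is found without scanning all 256 bits. In this sketch the lowest-numbered vector is popped first. The names and the 64-bit word size are assumptions, not the svm.c code.

```c
#include <stdint.h>

#define NR_VECTORS 256
#define WORD_BITS  64
#define NR_WORDS   (NR_VECTORS / WORD_BITS)

/* Hypothetical two-level pending-IRQ bitmap. */
struct irq_queue {
    uint64_t summary;            /* bit w set => pending[w] != 0 */
    uint64_t pending[NR_WORDS];
};

/* Index of the lowest set bit; x must be non-zero. */
static int lowest_bit(uint64_t x)
{
    int i = 0;
    while (!(x & 1)) {
        x >>= 1;
        i++;
    }
    return i;
}

static void push_irq(struct irq_queue *q, uint8_t irq)
{
    q->pending[irq / WORD_BITS] |= UINT64_C(1) << (irq % WORD_BITS);
    q->summary |= UINT64_C(1) << (irq / WORD_BITS);
}

/* Remove and return the lowest pending vector, or -1 if none. */
static int pop_irq(struct irq_queue *q)
{
    if (!q->summary)
        return -1;
    int w = lowest_bit(q->summary);
    int b = lowest_bit(q->pending[w]);
    q->pending[w] &= ~(UINT64_C(1) << b);
    if (!q->pending[w])
        q->summary &= ~(UINT64_C(1) << w);
    return w * WORD_BITS + b;
}
```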
Gleb Natapov	61c50edfcd	KVM: SVM: Remove duplicate code in svm_do_inject_vector() svm_do_inject_vector() reimplements pop_irq(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:30 +03:00
Avi Kivity	99f85a28a7	KVM: SVM: Remove port 80 passthrough KVM optimizes guest port 80 accesses by passing them through to the host. Some AMD machines die on port 80 writes, allowing the guest to hard-lock the host. Remove the port passthrough to avoid the problem. Cc: stable@kernel.org Reported-by: Piotr Jaroszyński <p.jaroszynski@gmail.com> Tested-by: Piotr Jaroszyński <p.jaroszynski@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-05-11 14:40:51 +03:00
Andre Przywara	19bca6ab75	KVM: SVM: Fix cross vendor migration issue with unusable bit AMD's VMCB does not have an explicit unusable segment descriptor field, so we emulate it by using "not present". This has to be setup before the fixups, because this field is used there. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-05-11 11:18:04 +03:00
Andre Przywara	1fbdc7a585	KVM: SVM: set accessed bit for VMCB segment selectors In the segment descriptor _cache_ the accessed bit is always set (although it can be cleared in the descriptor itself). Since Intel checks for this condition on a VMENTRY, set this bit in the AMD path to enable cross vendor migration. Cc: stable@kernel.org Signed-off-by: Andre Przywara <andre.przywara@amd.com> Acked-By: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:03:11 +02:00
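A minimal sketch of the adjustment: when a usable segment is read back from the VMCB, report its 4-bit type with the accessed bit (bit 0) set, matching what the descriptor cache holds. Limiting the fix to usable segments, and the helper name, are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_TYPE_ACCESSED 0x1u  /* bit 0 of the 4-bit segment type */

/* The descriptor cache always has the accessed bit set, even if the
 * in-memory descriptor has it clear, and VMX entry checks for it. */
static uint8_t cached_seg_type(uint8_t vmcb_type, bool unusable)
{
    if (!unusable)
        vmcb_type |= SEG_TYPE_ACCESSED;
    return vmcb_type;
}
```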
Jan Kiszka	34c33d163f	KVM: Drop unused evaluations from string pio handlers Looks like neither the direction nor the rep prefix are used anymore. Drop related evaluations from SVM's and VMX's I/O exit handlers. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:03:08 +02:00
Alexander Graf	1b2fd70c4e	KVM: Add FFXSR support AMD K10 CPUs implement the FFXSR feature that gets enabled using EFER. Let's check if the virtual CPU description includes that CPUID feature bit and allow enabling it then. This is required for Windows Server 2008 in Hyper-V mode. v2 adds CPUID capability exposure Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:03:08 +02:00
Joe Perches	ff81ff10b4	KVM: SVM: Fix typo in has_svm() Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:03:04 +02:00
Alexander Graf	c8a73f186b	KVM: SVM: Add microcode patch level dummy VMware ESX checks if the microcode level is correct when using a Barcelona CPU, in order to see if it actually can use SVM. Let's tell it we're on the safe side... Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:03:02 +02:00
Jan Kiszka	ae675ef01c	KVM: x86: Wire-up hardware breakpoints for guest debugging Add the remaining bits to make use of debug registers also for guest debugging, thus enabling the use of hardware breakpoints and watchpoints. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:50 +02:00
Jan Kiszka	42dbaa5a05	KVM: x86: Virtualize debug registers So far KVM only had basic x86 debug register support, once introduced to realize guest debugging that way. The guest itself was not able to use those registers. This patch now adds (almost) full support for guest self-debugging via hardware registers. It refactors the code, moving generic parts out of SVM (VMX was already cleaned up by the KVM_SET_GUEST_DEBUG patches), and it ensures that the registers are properly switched between host and guest. This patch also prepares debug register usage by the host. The latter will (once wired-up by the following patch) allow for hardware breakpoints/watchpoints in guest code. If this is enabled, the guest will only see faked debug registers without functionality, but with content reflecting the guest's modifications. Tested on Intel only, but SVM /should/ work as well, but who knows... Known limitations: Trapping on tss switch won't work - most probably on Intel. Credits also go to Joerg Roedel - I used his once posted debugging series as platform for this patch. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:49 +02:00
Jan Kiszka	d0bfb940ec	KVM: New guest debug interface This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic part, controlling the "main switch" and the single-step feature. The arch specific part adds an x86 interface for intercepting both types of debug exceptions separately and re-injecting them when the host was not interested. Moreover, the foundation for guest debugging via debug registers is laid. To signal breakpoint events properly back to userland, an arch-specific data block is now returned along with KVM_EXIT_DEBUG. For x86, the arch block contains the PC, the debug exception, and relevant debug registers to tell debug events properly apart. The availability of this new interface is signaled by KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are provided. Note that both SVM and VTX are supported, but only the latter has been tested so far. Based on the experience with all those VTX corner cases, I would be fairly surprised if SVM works out of the box. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:49 +02:00
Alexander Graf	236de05553	KVM: SVM: Allow setting the SVME bit Normally setting the SVME bit in EFER is not allowed, as we did not support SVM. Now that we do, we should also allow enabling SVM mode. v2 comes as last patch, so we don't enable half-ready code v4 introduces a module option to enable SVM v6 warns that nesting is enabled Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:48 +02:00
Joerg Roedel	eb6f302edf	KVM: SVM: Allow read access to MSR_VM_VR KVM tries to read the VM_CR MSR to find out if SVM was disabled by the BIOS. So implement read support for this MSR to make nested SVM work. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:47 +02:00
Alexander Graf	cf74a78b22	KVM: SVM: Add VMEXIT handler and intercepts This adds the #VMEXIT intercept, so we return to the level 1 guest when something happens in the level 2 guest that should return to the level 1 guest. v2 implements HIF handling and cleans up exception interception v3 adds support for V_INTR_MASKING_MASK v4 uses the host page hsave v5 removes IOPM merging code v6 moves mmu code out of the atomic section Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:47 +02:00
Alexander Graf	3d6368ef58	KVM: SVM: Add VMRUN handler This patch implements VMRUN. VMRUN enters a virtual CPU and runs that in the same context as the normal guest CPU would run. So basically it is implemented the same way a normal CPU would do it. We also prepare all intercepts that get OR'ed with the original intercepts, as we do not allow a level 2 guest to be intercepted less than the first level guest. v2 implements the following improvements: - fixes the CPL check - does not allocate iopm when not used - remembers the host's IF in the HIF bit in the hflags v3: - make use of the new permission checking - add support for V_INTR_MASKING_MASK v4: - use host page backed hsave v5: - remove IOPM merging code v6: - save cr4 so PAE l1 guests work v7: - return 0 on vmrun so we check the MSRs too - fix MSR check to use the correct variable Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:47 +02:00
Alexander Graf	5542675baa	KVM: SVM: Add VMLOAD and VMSAVE handlers This implements the VMLOAD and VMSAVE instructions, that usually surround the VMRUN instruction. Both instructions load / save the same elements, so we only need to implement them once. v2 fixes CPL checking and replaces memcpy by assignments v3 makes use of the new permission checking Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:47 +02:00
Alexander Graf	b286d5d8b0	KVM: SVM: Implement hsave Implement the hsave MSR, that gives the VCPU a GPA to save the old guest state in. v2 allows userspace to save/restore hsave v4 dummies out the hsave MSR, so we use a host page v6 remembers the guest's hsave and exports the MSR Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:46 +02:00
Alexander Graf	1371d90460	KVM: SVM: Implement GIF, clgi and stgi This patch implements the GIF flag and the clgi and stgi instructions that set this flag. Only if the flag is set (default), interrupts can be received by the CPU. To keep the information about that somewhere, this patch adds a new hidden flags vector that is used to store information that does not go into the vmcb, but is SVM specific. I tried to write some code to make -no-kvm-irqchip work too, but the first level guest won't even boot with that atm, so I ditched it. v2 moves the hflags to x86 generic code v3 makes use of the new permission helper v6 only enables interrupt_window if GIF=1 Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:46 +02:00
Alexander Graf	c0725420cf	KVM: SVM: Add helper functions for nested SVM These are helpers for the nested SVM implementation. - nsvm_printk implements a debug printk variant - nested_svm_do calls a handler that can access gpa-based memory v3 makes use of the new permission checker v6 changes: - streamline nsvm_debug() - remove printk(KERN_ERR) - SVME check before CPL check - give GP error code - use new EFER constant Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:46 +02:00
Alexander Graf	9962d032bb	KVM: SVM: Move EFER and MSR constants to generic x86 code MSR_EFER_SVME_MASK, MSR_VM_CR and MSR_VM_HSAVE_PA are set in KVM specific headers. Linux does have nice header files to collect EFER bits and MSR IDs, so IMHO we should put them there. While at it, I also changed the naming scheme to match that of the other defines. (introduced in v6) Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:46 +02:00
Alexander Graf	f0b85051d0	KVM: SVM: Clean up VINTR setting The current VINTR intercept setters don't look clean to me. To make the code easier to read and enable the possibility to trap on a VINTR set, this uses a helper function to set the VINTR intercept. v2 uses two distinct functions for setting and clearing the bit Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:45 +02:00
Marcelo Tosatti	b682b814e3	KVM: x86: fix LAPIC pending count calculation Simplify LAPIC TMCCT calculation by using hrtimer provided function to query remaining time until expiration. Fixes host hang with nested ESX. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-02-15 02:47:38 +02:00
Eduardo Habkost	2c8dceebb2	KVM: SVM: move svm_hardware_disable() code to asm/virtext.h Create cpu_svm_disable() function. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:52:30 +02:00
Eduardo Habkost	63d1142f8f	KVM: SVM: move has_svm() code to asm/virtext.h Use a trick to keep the printk()s on has_svm() working as before. gcc will take care of not generating code for the 'msg' stuff when the function is called with a NULL msg argument. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:52:29 +02:00
Guillaume Thouvenin	e93f36bcfa	KVM: allow emulator to adjust rip for emulated pio instructions If we call the emulator we shouldn't call skip_emulated_instruction() in the first place, since the emulator already computes the next rip for us. Thus we move ->skip_emulated_instruction() out of kvm_emulate_pio() and into handle_io() (and the svm equivalent). We also replaced "return 0" by "break" in the "do_io:" case because now the shadow register state needs to be committed. Otherwise eip will never be updated. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:51:48 +02:00
Amit Shah	c0d09828c8	KVM: SVM: Set the 'busy' flag of the TR selector The busy flag of the TR selector is not set by the hardware. This breaks migration from AMD hosts to Intel hosts. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:51:48 +02:00
Amit Shah	25022acc3d	KVM: SVM: Set the 'g' bit of the cs selector for cross-vendor migration The hardware does not set the 'g' bit of the cs selector and this breaks migration from AMD hosts to Intel hosts. Set this bit if the segment limit is beyond 1 MB. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:51:48 +02:00
Sheng Yang	64d4d52175	KVM: Enable MTRR for EPT The effective memory type of EPT is the mixture of MSR_IA32_CR_PAT and memory type field of EPT entry. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:51:45 +02:00
Marcelo Tosatti	a7052897b3	KVM: x86: trap invlpg With pages out of sync invlpg needs to be trapped. For now simply nuke the entry. Untested on AMD. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-10-15 14:25:21 +02:00
Avi Kivity	fa89a81766	KVM: Add statistics for guest irq injections These can help show whether a guest is making progress or not. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-10-15 10:15:25 +02:00
Avi Kivity	48d1503949	KVM: SVM: No need to unprotect memory during event injection when using npt No memory is protected anyway. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-10-15 10:15:24 +02:00
Amit Shah	94c935a1ee	KVM: SVM: Fix typo Fix typo in as-yet unused macro definition. Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-10-15 10:15:20 +02:00
Avi Kivity	80e31d4f61	KVM: SVM: Unify register save/restore across 32 and 64 bit hosts Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-10-15 10:15:14 +02:00
Jan Kiszka	19bd8afdc4	KVM: Consolidate XX_VECTOR defines Signed-off-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-10-15 10:15:14 +02:00
Marcelo Tosatti	5fdbf9765b	KVM: x86: accessors for guest registers As suggested by Avi, introduce accessors to read/write guest registers. This simplifies the ->cache_regs/->decache_regs interface, and improves register caching which is important for VMX, where the cost of vmcs_read/vmcs_write is significant. [avi: fix warnings] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-10-15 10:13:57 +02:00
Joerg Roedel	e5eab0cede	KVM: SVM: fix guest global tlb flushes with NPT Accesses to CR4 are intercepted even with Nested Paging enabled. But the code does not check if the guest wants to do a global TLB flush. So this flush gets lost. This patch adds the check and the flush to svm_set_cr4. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-09-11 11:39:25 +03:00
Joerg Roedel	44874f8491	KVM: SVM: fix random segfaults with NPT enabled This patch introduces a guest TLB flush on every NPF exit in KVM. This fixes random segfaults and #UD exceptions in the guest seen under some workloads (e.g. long running compile workloads or tbench). A kernbench run with and without that fix showed that it has a slowdown lower than 0.5% Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-09-11 11:31:53 +03:00
Avi Kivity	577bdc4966	KVM: Avoid instruction emulation when event delivery is pending When an event (such as an interrupt) is injected, and the stack is shadowed (and therefore write protected), the guest will exit. The current code will see that the stack is shadowed and emulate a few instructions, each time postponing the injection. Eventually the injection may succeed, but at that time the guest may be unwilling to accept the interrupt (for example, the TPR may have changed). This occurs every once in a while during a Windows 2008 boot. Fix by unshadowing the fault address if the fault was due to an event injection. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-27 11:34:10 +03:00
Joerg Roedel	5f4cb662a0	KVM: SVM: allow enabling/disabling NPT by reloading only the architecture module If NPT is enabled after loading both KVM modules on AMD and it should be disabled, both KVM modules must be reloaded. If only the architecture module is reloaded the behavior is undefined. With this patch it is possible to disable NPT only by reloading the kvm_amd module. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-27 11:34:09 +03:00
Avi Kivity	d6e88aec07	KVM: Prefix some x86 low level function with kvm_, to avoid namespace issues Fixes compilation with CONFIG_VMI enabled. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:39 +03:00
Joerg Roedel	0da1db75a2	KVM: SVM: fix suspend/resume support On suspend the svm_hardware_disable function is called which frees all svm_data variables. On resume they are not re-allocated. This patch moves the deallocation of svm_data from the hardware_disable function to the hardware_unsetup function, which is not called on suspend. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:37 +03:00
Avi Kivity	7cc8883074	KVM: Remove decache_vcpus_on_cpu() and related callbacks Obsoleted by the vmx-specific per-cpu list. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:25 +03:00
Avi Kivity	4ecac3fd6d	KVM: Handle virtualization instruction #UD faults during reboot KVM turns off hardware virtualization extensions during reboot, in order to disassociate the memory used by the virtualization extensions from the processor, and in order to have the system in a consistent state. Unfortunately virtual machines may still be running while this goes on, and once virtualization extensions are turned off, any virtualization instruction will #UD on execution. Fix by adding an exception handler to virtualization instructions; if we get an exception during reboot, we simply spin waiting for the reset to complete. If it's a true exception, BUG() so we can have our stack trace. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:41:43 +03:00
Chris Lalancette	14ae51b6c0	KVM: SVM: Fake MSR_K7 performance counters Attached is a patch that fixes a guest crash when booting older Linux kernels. The problem stems from the fact that we are currently emulating MSR_K7_EVNTSEL[0-3], but not emulating MSR_K7_PERFCTR[0-3]. Because of this, setup_k7_watchdog() in the Linux kernel receives a GPF when it attempts to write into MSR_K7_PERFCTR, which causes an OOPs. The patch fixes it by just "fake" emulating the appropriate MSRs, throwing away the data in the process. This causes the NMI watchdog to not actually work, but it's not such a big deal in a virtualized environment. When we get a write to one of these counters, we printk_ratelimit() a warning. I decided to print it out for all writes, even if the data is 0; it doesn't seem to make sense to me to special case when data == 0. Tested by myself on a RHEL-4 guest, and Joerg Roedel on a Windows XP 64-bit guest. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:49 +03:00
Joerg Roedel	d2ebb4103f	KVM: SVM: add tracing support for TDP page faults To distinguish between real page faults and nested page faults they should be traced as different events. This is implemented by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:48 +03:00
Joerg Roedel	af9ca2d703	KVM: SVM: add missing kvmtrace markers This patch adds the missing kvmtrace markers to the svm module of kvm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:48 +03:00
Joerg Roedel	a069805579	KVM: SVM: implement dedicated INTR exit handler With an exit handler for INTR intercepts it's possible to account them using kvmtrace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:47 +03:00
Joerg Roedel	c47f098d69	KVM: SVM: implement dedicated NMI exit handler With an exit handler for NMI intercepts it's possible to account them using kvmtrace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:47 +03:00
Marcelo Tosatti	2f5997140f	KVM: migrate PIT timer Migrate the PIT timer to the physical CPU which vcpu0 is scheduled on, similarly to what is done for the LAPIC timers, otherwise PIT interrupts will be delayed until an unrelated event causes an exit. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-06-06 21:25:51 +03:00
Sheng Yang	67253af52e	KVM: Add kvm_x86_ops get_tdp_level() The function get_tdp_level() provides the number of TDP levels for EPT and NPT rather than the NPT-specific macro. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-05-04 14:44:34 +03:00
Joerg Roedel	1336028b9a	KVM: SVM: remove selective CR0 comment There is no selective cr0 intercept bug. The code in the comment sets the CR0.PG bit. But KVM always sets the CR0.PG bit for SVM to implement the paged real mode. So the 'mov %eax,%cr0' instruction does not change the CR0.PG bit. Selective CR0 intercepts only occur when a bit is actually changed. So it's the right behavior that there is no intercept on this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:44 +03:00
Joerg Roedel	aaf697e4e0	KVM: SVM: remove now obsolete FIXME comment With the usage of the V_TPR field this comment is now obsolete. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:43 +03:00
Joerg Roedel	aaacfc9ae2	KVM: SVM: disable CR8 intercept when tpr is not masking interrupts This patch disables the intercept of CR8 writes if the TPR is not masking interrupts. This reduces the total number of CR8 intercepts to below 1 percent of what we have without this patch using Windows 64 bit guests. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:43 +03:00
Joerg Roedel	d7bf8221a3	KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled If the CR8 write intercept is disabled the V_TPR field of the VMCB needs to be synced with the TPR field in the local apic. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:42 +03:00

376 Commits