linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-07 12:41:55 +00:00

Author	SHA1	Message	Date
Xiao Guangrong	c22885050e	KVM: MMU: fix Dirty bit missed if CR0.WP = 0 If the write-fault access is from supervisor and CR0.WP is not set on the vcpu, kvm will fix it by adjusting pte access - it sets the W bit on pte and clears U bit. This is the chance that kvm can change pte access from readonly to writable Unfortunately, the pte access is the access of 'direct' shadow page table, means direct sp.role.access = pte_access, then we will create a writable spte entry on the readonly shadow page table. It will cause Dirty bit is not tracked when two guest ptes point to the same large page. Note, it does not have other impact except Dirty bit since cr0.wp is encoded into sp.role It can be fixed by adjusting pte access before establishing shadow page table. Also, after that, no mmu specified code exists in the common function and drop two parameters in set_spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-10 15:28:08 -02:00
Avi Kivity	fb864fbc72	KVM: x86 emulator: convert basic ALU ops to fastop Opcodes: TEST CMP ADD ADC SUB SBB XOR OR AND Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-09 17:39:30 -02:00
Avi Kivity	f7857f35db	KVM: x86 emulator: add macros for defining 2-operand fastop emulation Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-09 17:39:28 -02:00
Avi Kivity	45a1467d7e	KVM: x86 emulator: convert NOT, NEG to fastop Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-09 17:39:25 -02:00
Avi Kivity	75f728456f	KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-09 17:39:21 -02:00
Avi Kivity	b6744dc3fb	KVM: x86 emulator: introduce NoWrite flag Instead of disabling writeback via OP_NONE, just specify NoWrite. Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-09 17:39:18 -02:00
Avi Kivity	b7d491e7f0	KVM: x86 emulator: Support for declaring single operand fastops Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-09 17:39:17 -02:00
Avi Kivity	e28bbd44da	KVM: x86 emulator: framework for streamlining arithmetic opcodes We emulate arithmetic opcodes by executing a "similar" (same operation, different operands) on the cpu. This ensures accurate emulation, esp. wrt. eflags. However, the prologue and epilogue around the opcode is fairly long, consisting of a switch (for the operand size) and code to load and save the operands. This is repeated for every opcode. This patch introduces an alternative way to emulate arithmetic opcodes. Instead of the above, we have four (three on i386) functions consisting of just the opcode and a ret; one for each operand size. For example: .align 8 em_notb: not %al ret .align 8 em_notw: not %ax ret .align 8 em_notl: not %eax ret .align 8 em_notq: not %rax ret The prologue and epilogue are shared across all opcodes. Note the functions use a special calling convention; notably eflags is an input/output parameter and is not clobbered. Rather than dispatching the four functions through a jump table, the functions are declared as a constant size (8) so their address can be calculated. Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-09 17:39:17 -02:00
Marcelo Tosatti	b09408d00f	KVM: VMX: fix incorrect cached cpl value with real/v8086 modes CPL is always 0 when in real mode, and always 3 when virtual 8086 mode. Using values other than those can cause failures on operations that check CPL. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-08 17:25:35 -02:00
Gleb Natapov	b0cfeb5ded	KVM: x86: remove unused variable from walk_addr_generic() Fix compilation warning. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-08 17:23:39 -02:00
Gleb Natapov	908e7d7999	KVM: MMU: simplify folding of dirty bit into accessed_dirty MMU code tries to avoid if()s HW is not able to predict reliably by using bitwise operation to streamline code execution, but in case of a dirty bit folding this gives us nothing since write_fault is checked right before the folding code. Lets just piggyback onto the if() to make code more clear. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-07 20:31:35 -02:00
Gleb Natapov	ee04e0cea8	KVM: mmu: remove unused trace event trace_kvm_mmu_delay_free_pages() is no longer used. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-07 19:54:50 -02:00
Gleb Natapov	0ca1b4f4ba	KVM: VMX: handle IO when emulation is due to #GP in real mode. With emulate_invalid_guest_state=0 if a vcpu is in real mode VMX can enter the vcpu with smaller segment limit than guest configured. If the guest tries to access past this limit it will get #GP at which point instruction will be emulated with correct segment limit applied. If during the emulation IO is detected it is not handled correctly. Vcpu thread should exit to userspace to serve the IO, but it returns to the guest instead. Since emulation is not completed till userspace completes the IO the faulty instruction is re-executed ad infinitum. The patch fixes that by exiting to userspace if IO happens during instruction emulation. Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-02 19:36:31 -02:00
Gleb Natapov	d54d07b2ca	KVM: VMX: Do not fix segment register during vcpu initialization. Segment registers will be fixed according to current emulation policy during switching to real mode for the first time. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-02 19:36:30 -02:00
Gleb Natapov	d99e415275	KVM: VMX: fix emulation of invalid guest state. Currently when emulation of invalid guest state is enabled (emulate_invalid_guest_state=1) segment registers are still sometimes fixed for entry to vm86 mode. Segment register fixing is avoided in enter_rmode(), but vmx_set_segment() still does it unconditionally. The patch fixes it. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-02 19:36:29 -02:00
Gleb Natapov	89efbed02c	KVM: VMX: make rmode_segment_valid() more strict. Currently it allows entering vm86 mode if segment limit is greater than 0xffff and db bit is set. Both of those can cause incorrect execution of instruction by cpu since in vm86 mode limit will be set to 0xffff and db will be forced to 0. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-02 19:36:28 -02:00
Gleb Natapov	045a282ca4	KVM: emulator: implement fninit, fnstsw, fnstcw Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-02 19:36:27 -02:00
Gleb Natapov	3a78a4f463	KVM: emulator: drop RPL check from linearize() function According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6 "Privilege Level Checking When Accessing Data Segments" RPL checking is done during loading of a segment selector, not during data access. We already do checking during segment selector loading, so drop the check during data access. Checking RPL during data access triggers #GP if after transition from real mode to protected mode RPL bits in a segment selector are set. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-01-02 19:36:26 -02:00
Gleb Natapov	f924d66d27	KVM: VMX: remove unneeded temporary variable from vmx_set_segment() Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-23 14:02:00 +02:00
Gleb Natapov	1ecd50a947	KVM: VMX: clean-up vmx_set_segment() Move all vm86_active logic into one place. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-23 14:01:49 +02:00
Gleb Natapov	39dcfb95de	KVM: VMX: remove redundant code from vmx_set_segment() Segment descriptor's base is fixed by call to fix_rmode_seg(). No need to do it twice. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-23 14:01:37 +02:00
Gleb Natapov	beb853ffec	KVM: VMX: use fix_rmode_seg() to fix all code/data segments The code for SS and CS does the same thing fix_rmode_seg() is doing. Use it instead of hand crafted code. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-23 14:01:18 +02:00
Gleb Natapov	c6ad115348	KVM: VMX: return correct segment limit and flags for CS/SS registers in real mode VMX without unrestricted mode cannot virtualize real mode, so if emulate_invalid_guest_state=0 kvm uses vm86 mode to approximate it. Sometimes, when guest moves from protected mode to real mode, it leaves segment descriptors in a state not suitable for use by vm86 mode virtualization, so we keep shadow copy of segment descriptors for internal use and load fake register to VMCS for guest entry to succeed. Till now we kept shadow for all segments except SS and CS (for SS and CS we returned parameters directly from VMCS), but since commit `a5625189f6` emulator enforces segment limits in real mode. This causes #GP during move from protected mode to real mode when emulator fetches first instruction after moving to real mode since it uses incorrect CS base and limit to linearize the %rip. Fix by keeping shadow for SS and CS too. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-23 14:01:03 +02:00
Gleb Natapov	0647f4aa8c	KVM: VMX: relax check for CS register in rmode_segment_valid() rmode_segment_valid() checks if segment descriptor can be used to enter vm86 mode. VMX spec mandates that in vm86 mode CS register will be of type data, not code. Lets allow guest entry with vm86 mode if the only problem with CS register is incorrect type. Otherwise entire real mode will be emulated. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-23 14:00:47 +02:00
Gleb Natapov	07f42f5f25	KVM: VMX: cleanup rmode_segment_valid() Set segment fields explicitly instead of using binary operations. No behaviour changes. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-23 14:00:36 +02:00
Nickolai Zeldovich	d4b06c2d4c	kvm: fix i8254 counter 0 wraparound The kvm i8254 emulation for counter 0 (but not for counters 1 and 2) has at least two bugs in mode 0: 1. The OUT bit, computed by pit_get_out(), is never set high. 2. The counter value, computed by pit_get_count(), wraps back around to the initial counter value, rather than wrapping back to 0xFFFF (which is the behavior described in the comment in __kpit_elapsed, the behavior implemented by qemu, and the behavior observed on AMD hardware). The bug stems from __kpit_elapsed computing the elapsed time mod the initial counter value (stored as nanoseconds in ps->period). This is both unnecessary (none of the callers of kpit_elapsed expect the value to be at most the initial counter value) and incorrect (it causes pit_get_count to appear to wrap around to the initial counter value rather than 0xFFFF). Removing this mod from __kpit_elapsed fixes both of the above bugs. Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-18 11:12:38 +02:00
Gleb Natapov	e11ae1a102	KVM: remove unused variable. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-14 23:07:44 -02:00
Alex Williamson	f82a8cfe93	KVM: struct kvm_memory_slot.user_alloc -> bool There's no need for this to be an int, it holds a boolean. Move to the end of the struct for alignment. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 23:24:38 -02:00
Alex Williamson	bbacc0c111	KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is the user accessible slots and the other is user + private. Make this more obvious. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 23:21:57 -02:00
Gleb Natapov	f3200d00ea	KVM: inject ExtINT interrupt before APIC interrupts According to Intel SDM Volume 3 Section 10.8.1 "Interrupt Handling with the Pentium 4 and Intel Xeon Processors" and Section 10.8.2 "Interrupt Handling with the P6 Family and Pentium Processors" ExtINT interrupts are sent directly to the processor core for handling. Currently KVM checks APIC before it considers ExtINT interrupts for injection which is backwards from the spec. Make code behave according to the SDM. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: "Zhang, Yang Z" <yang.z.zhang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 23:05:21 -02:00
Nadav Amit	5e2c688351	KVM: x86: fix mov immediate emulation for 64-bit operands MOV immediate instruction (opcodes 0xB8-0xBF) may take 64-bit operand. The previous emulation implementation assumes the operand is no longer than 32. Adding OpImm64 for this matter. Fixes https://bugzilla.redhat.com/show_bug.cgi?id=881579 Signed-off-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 22:30:56 -02:00
Gleb Natapov	7f662273e4	KVM: emulator: implement AAD instruction Windows2000 uses it during boot. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=50921 Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 22:27:22 -02:00
Linus Torvalds	66cdd0ceaf	Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Marcelo Tosatti: "Considerable KVM/PPC work, x86 kvmclock vsyscall support, IA32_TSC_ADJUST MSR emulation, amongst others." Fix up trivial conflict in kernel/sched/core.c due to cross-cpu migration notifier added next to rq migration call-back. * tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits) KVM: emulator: fix real mode segment checks in address linearization VMX: remove unneeded enable_unrestricted_guest check KVM: VMX: fix DPL during entry to protected mode x86/kexec: crash_vmclear_local_vmcss needs __rcu kvm: Fix irqfd resampler list walk KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary KVM: MMU: optimize for set_spte KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation KVM: PPC: bookehv: Add guest computation mode for irq delivery KVM: PPC: Make EPCR a valid field for booke64 and bookehv KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation KVM: PPC: e500: Add emulation helper for getting instruction ea KVM: PPC: bookehv64: Add support for interrupt handling KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler KVM: PPC: booke: Fix get_tb() compile error on 64-bit KVM: PPC: e500: Silence bogus GCC warning in tlb code ...	2012-12-13 15:31:08 -08:00
Gleb Natapov	58b7825bc3	KVM: emulator: fix real mode segment checks in address linearization In real mode CS register is writable, so do not #GP on write. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-11 21:00:28 -02:00
Gleb Natapov	0b26b588d9	VMX: remove unneeded enable_unrestricted_guest check If enable_unrestricted_guest is true vmx->rmode.vm86_active will always be false. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-11 21:00:28 -02:00
Gleb Natapov	a4d3326c2d	KVM: VMX: fix DPL during entry to protected mode On CPUs without support for unrestricted guests DPL cannot be smaller than RPL for data segments during guest entry, but this state can occur if a data segment selector changes while vcpu is in real mode to a value with lowest two bits != 00. Fix that by forcing DPL == RPL on transition to protected mode. This is a regression introduced by `c865c43de6`. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-11 21:00:27 -02:00
Zhang Yanfei	8f536b7697	KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump The vmclear function will be assigned to the callback function pointer when loading kvm-intel module. And the bitmap indicates whether we should do VMCLEAR operation in kdump. The bits in the bitmap are set/unset according to different conditions. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-06 18:26:57 +02:00
Xiao Guangrong	c219346325	KVM: MMU: optimize for set_spte There are two cases we need to adjust page size in set_spte: 1): the one is other vcpu creates new sp in the window between mapping_level() and acquiring mmu-lock. 2): the another case is the new sp is created by itself (page-fault path) when guest uses the target gfn as its page table. In current code, set_spte drop the spte and emulate the access for these case, it works not good: - for the case 1, it may destroy the mapping established by other vcpu, and do expensive instruction emulation. - for the case 2, it may emulate the access even if the guest is accessing the page which not used as page table. There is a example, 0~2M is used as huge page in guest, in this huge page, only page 3 used as page table, then guest read/writes on other pages can cause instruction emulation. Both of these cases can be fixed by allowing guest to retry the access, it will refault, then we can establish the mapping by using small page Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-06 09:11:25 +02:00
Julian Stecklina	66f7b72e11	KVM: x86: Make register state after reset conform to specification VMX behaves now as SVM wrt to FPU initialization. Code has been moved to generic code path. General-purpose registers are now cleared on reset and INIT. SVM code properly initializes EDX. Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-05 18:00:07 +02:00
Zhang Xiantao	2b3c5cbc0d	kvm: don't use bit24 for detecting address-specific invalidation capability Bit24 in VMX_EPT_VPID_CAP_MASI is not used for address-specific invalidation capability reporting, so remove it from KVM to avoid conflicts in future. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-05 16:35:48 +02:00
Zhang Xiantao	0307b7b8c2	kvm: remove unnecessary bit checking for ept violation Bit 6 in EPT vmexit's exit qualification is not defined in SDM, so remove it. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-05 16:35:21 +02:00
Jan Kiszka	45e3cc7d9f	KVM: x86: Fix uninitialized return code This is a regression caused by `18595411a7`. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-02 17:37:04 +02:00
Will Auld	ba904635d4	KVM: x86: Emulate IA32_TSC_ADJUST MSR CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported. Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control. However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmcs tsc_offset for a guest process when it does a rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution, has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor sets the IA32_TSC_ADJUST, then the guest sets it and a guest process performs a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-30 18:29:30 -02:00
Will Auld	8fe8ab46be	KVM: x86: Add code to track call origin for msr assignment In order to track who initiated the call (host or guest) to modify an msr value I have changed function call parameters along the call path. The specific change is to add a struct pointer parameter that points to (index, data, caller) information rather than having this information passed as individual parameters. The initial use for this capability is for updating the IA32_TSC_ADJUST msr while setting the tsc value. It is anticipated that this capability is useful for other tasks. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-30 18:26:12 -02:00
Xiao Guangrong	5a560f8b5e	KVM: VMX: fix memory order between loading vmcs and clearing vmcs vmcs->cpu indicates whether it exists on the target cpu, -1 means the vmcs does not exist on any cpu. If vcpu loads vmcs with vmcs->cpu = -1, it can be directly added to cpu's percpu list. The list can be corrupted if the cpu prefetches the vmcs's list before reading vmcs->cpu. Meanwhile, we should remove vmcs from the list before making vmcs->cpu == -1 visible. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-29 21:14:46 -02:00
Xiao Guangrong	e6c7d32172	KVM: VMX: fix invalid cpu passed to smp_call_function_single In loaded_vmcs_clear, loaded_vmcs->cpu is the first parameter passed to smp_call_function_single. If the target cpu is going down (cpu hot remove), loaded_vmcs->cpu can become -1, and then -1 is passed to smp_call_function_single. It can be triggered when vcpu is being destroyed, since loaded_vmcs_clear is called in a preemptible context. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-28 22:04:58 -02:00
Marcelo Tosatti	d98d07ca7e	KVM: x86: update pvclock area conditionally, on cpu migration As requested by Glauber, do not update kvmclock area on vcpu->pcpu migration, in case the host has stable TSC. This is to reduce cacheline bouncing. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:15 -02:00
Marcelo Tosatti	b48aa97e38	KVM: x86: require matched TSC offsets for master clock With master clock, a pvclock clock read calculates: ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ] Where 'rdtsc' is the host TSC. system_timestamp and tsc_timestamp are unique, one tuple per VM: the "master clock". Given a host with synchronized TSCs, its obvious that guest TSC must be matched for the above to guarantee monotonicity. Allow master clock usage only if guest TSCs are synchronized. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:15 -02:00
Marcelo Tosatti	42897d866b	KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization TSC initialization will soon make use of online_vcpus. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:14 -02:00
Marcelo Tosatti	d828199e84	KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag KVM added a global variable to guarantee monotonicity in the guest. One of the reasons for that is that the time between 1. ktime_get_ts(&timespec); 2. rdtscll(tsc); Is variable. That is, given a host with stable TSC, suppose that two VCPUs read the same time via ktime_get_ts() above. The time required to execute 2. is not the same on those two instances executing in different VCPUS (cache misses, interrupts...). If the TSC value that is used by the host to interpolate when calculating the monotonic time is the same value used to calculate the tsc_timestamp value stored in the pvclock data structure, and a single <system_timestamp, tsc_timestamp> tuple is visible to all vcpus simultaneously, this problem disappears. See comment on top of pvclock_update_vm_gtod_copy for details. Monotonicity is then guaranteed by synchronicity of the host TSCs and guest TSCs. Set TSC stable pvclock flag in that case, allowing the guest to read clock from userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:13 -02:00
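
The master clock read formula quoted in commit b48aa97e38 above, ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ], can be illustrated with a small sketch. This is not kernel code: the function name pvclock_read, the sample numbers, and the simplification that one TSC tick equals one nanosecond (the real pvclock scales the TSC delta) are assumptions made purely for illustration. It shows why a shared <system_timestamp, tsc_timestamp> tuple only gives monotonic readings when the guest TSC offsets of the vcpus match.

```python
def pvclock_read(host_tsc, tsc_offset, system_timestamp, tsc_timestamp):
    """Simplified master clock read (hypothetical helper, 1 tick == 1 ns).

    ret = system_timestamp + [(rdtsc + tsc_offset) - tsc_timestamp]
    """
    guest_tsc = host_tsc + tsc_offset
    return system_timestamp + (guest_tsc - tsc_timestamp)


# One <system_timestamp, tsc_timestamp> tuple shared by every vcpu of the VM.
SYSTEM_TIMESTAMP = 1_000_000
TSC_TIMESTAMP = 5_000

# vcpu0 reads the clock first, vcpu1 reads it 100 host ticks later.
t0 = pvclock_read(10_000, -4_000, SYSTEM_TIMESTAMP, TSC_TIMESTAMP)

# Matched offsets: the later read returns a later time.
t1_matched = pvclock_read(10_100, -4_000, SYSTEM_TIMESTAMP, TSC_TIMESTAMP)

# Mismatched offsets: the later read appears to go back in time.
t1_mismatched = pvclock_read(10_100, -4_500, SYSTEM_TIMESTAMP, TSC_TIMESTAMP)

print(t0, t1_matched, t1_mismatched)
```

With the sample values, the read on the second vcpu is later than the first when the offsets match and earlier when they differ, which is the situation the commit rules out by allowing master clock usage only if guest TSCs are synchronized.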


2443 Commits