linux

Author	SHA1	Message	Date
Linus Torvalds	b9b16a7922	Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature update from Ingo Molnar: "Two refinements to clflushopt support" * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: If we disable CLFLUSH, we should disable CLFLUSHOPT x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH	2014-04-01 10:11:21 -07:00
Radim Krčmář	596f3142d2	KVM: SVM: fix cr8 intercept window We always disable cr8 intercept in its handler, but only re-enable it if handling KVM_REQ_EVENT, so there can be a window where we do not intercept cr8 writes, which allows an interrupt to disrupt a higher priority task. Fix this by disabling intercepts in the same function that re-enables them when needed. This fixes BSOD in Windows 2008. Cc: <stable@vger.kernel.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-12 18:21:10 +01:00
Paolo Bonzini	1b385cbdd7	kvm, vmx: Really fix lazy FPU on nested guest Commit `e504c9098e` (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13) highlighted a real problem, but the fix was subtly wrong. nested_read_cr0 is the CR0 as read by L2, but here we want to look at the CR0 value reflecting L1's setup. In other words, L2 might think that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually running it with TS=1, we should inject the fault into L1. The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use it. Fixes: `e504c9098e` Reported-by: Kashyap Chamarty <kchamart@redhat.com> Reported-by: Stefan Bader <stefan.bader@canonical.com> Tested-by: Kashyap Chamarty <kchamart@redhat.com> Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-27 22:54:11 +01:00
Andrew Honig	a08d3b3b99	kvm: x86: fix emulator buffer overflow (CVE-2014-0049) The problem occurs when the guest performs a pusha with the stack address pointing to an mmio address (or an invalid guest physical address) to start with, but then extending into an ordinary guest physical address. When doing repeated emulated pushes emulator_read_write sets mmio_needed to 1 on the first one. On a later push when the stack points to regular memory, mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0. As a result, KVM exits to userspace, and then returns to complete_emulated_mmio. In complete_emulated_mmio vcpu->mmio_cur_fragment is incremented. The termination condition of vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved. The code bounces back and fourth to userspace incrementing mmio_cur_fragment past it's buffer. If the guest does nothing else it eventually leads to a a crash on a memcpy from invalid memory address. However if a guest code can cause the vm to be destroyed in another vcpu with excellent timing, then kvm_clear_async_pf_completion_queue can be used by the guest to control the data that's pointed to by the call to cancel_work_item, which can be used to gain execution. Fixes: `f78146b0f9` Signed-off-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org (3.5+) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-27 19:35:22 +01:00
H. Peter Anvin	840d2830e6	x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH We call this "clflush" in /proc/cpuinfo, and have cpu_has_clflush()... let's be consistent and just call it that. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alan Cox <alan@linux.intel.com> Link: http://lkml.kernel.org/n/tip-mlytfzjkvuf739okyn40p8a5@git.kernel.org	2014-02-27 08:31:30 -08:00
Marcelo Tosatti	404381c583	KVM: MMU: drop read-only large sptes when creating lower level sptes Read-only large sptes can be created due to read-only faults as follows: - QEMU pagetable entry that maps guest memory is read-only due to COW. - Guest read faults such memory, COW is not broken, because it is a read-only fault. - Enable dirty logging, large spte not nuked because it is read-only. - Write-fault on such memory causes guest to loop endlessly (which must go down to level 1 because dirty logging is enabled). Fix by dropping large spte when necessary. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-26 17:23:32 +01:00
Paolo Bonzini	5f66b62095	kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef Self explanatory. Reported-by: Radim Krcmar <rkrcmar@redhat.com> Cc: Vadim Rozenfeld <vrozenfe@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-29 18:10:45 +01:00
Jan Kiszka	58cb628dbe	KVM: x86: Validate guest writes to MSR_IA32_APICBASE Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This address both enabling of the x2APIC when it is not supported and all invalid transitions as described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> [Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-27 14:39:44 +01:00
Vadim Rozenfeld	b3af1e889e	KVM: x86: mark hyper-v vapic assist page as dirty Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-24 16:51:15 +01:00
Vadim Rozenfeld	b94b64c9a7	KVM: x86: mark hyper-v hypercall page as dirty Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-23 19:00:04 +01:00
Linus Torvalds	7ebd3faa9b	First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJS3TVKAAoJEBvWZb6bTYbyIFgP/2cmt4ifCuFMaZv4+G1S8jZU uC9ZB/+7vzht/p6zAy+4BxurKbHmSBFkC1OKcxYuy7yB4CQkHabzj4V2vRtqFdwH 5lExP9qh3kqaVLuhnvxLTmkktR3EW4PFy6OI53l5kRNktOXSuZ0aN6K3V7tCg/X0 iL7ASo4bJKlxeWcDpmuVrNgAajmZVfXrjKY7robgBQno+yIsgKhRZRBQHjozA6B8 FpCo/k48RZd/EzIbV/PDDRI4hmmry/lgrO9SKjzq56wSqff2bd/k/KYze4dbAPfd Ps60enPTuHmeEjjb4MMMU4EKHVdTQFUMx/xZCmT4xzoh8s4of6RHphXbfE0SUznQ dTveyEQAR7E3JNS0k1+3WEX5fWlFesp0hO2NeE0wzUq4TAr9ztgVO9NQ6Si15e7Z 2HysO0T5Ojtt0lY08/PvS6i48eCAuuBomrejJS8hLW4SUZ5adn+yW4Qo7Fp9JeBR l9a3LsVT8BZMtUWrUuFcVhlM4MbzElUPjDbgWhR8UYU/kpfVZOQu8qWgGKR4UWXy X7/t9l/tjR99CmfMJBAOzJid+ScSpAfg77BdaKiQrVfVIJmsjEjlO8vUMyj5b1HF hPX5wNyJjHAOfridLeHSs4Rdm4a8sk8Az5d4h76pLVz8M4jyTi2v0rO3N4/dU/pu x7N8KR5hAj+mLBoM9/Al =8sYU -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) kvm: make KVM_MMU_AUDIT help text more readable KVM: s390: Fix memory access error detection KVM: nVMX: Update guest activity state field on L2 exits KVM: nVMX: Fix nested_run_pending on activity state HLT KVM: nVMX: Clean up handling of VMX-related MSRs KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit KVM: nVMX: Leave VMX mode on clearing of feature control MSR KVM: VMX: Fix DR6 update on #DB exception KVM: SVM: Fix reading of DR6 KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS add support for Hyper-V reference time counter KVM: remove useless write to vcpu->hv_clock.tsc_timestamp KVM: x86: fix tsc catchup issue with tsc scaling KVM: x86: limit PIT timer frequency KVM: x86: handle invalid root_hpa everywhere kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub kvm: vfio: silence GCC warning KVM: ARM: Remove duplicate include arm/arm64: KVM: relax the requirements of VMA alignment for THP ...	2014-01-22 21:40:43 -08:00
Randy Dunlap	94491620e1	kvm: make KVM_MMU_AUDIT help text more readable Make KVM_MMU_AUDIT kconfig help text readable and collapse two spaces between words down to one space. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-20 12:59:26 +01:00
Jan Kiszka	3edf1e698f	KVM: nVMX: Update guest activity state field on L2 exits Set guest activity state in L1's VMCS according to the VCPUs mp_state. This ensures we report the correct state in case we L2 executed HLT or if we put L2 into HLT state and it was now woken up by an event. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:18 +01:00
Jan Kiszka	7af40ad37b	KVM: nVMX: Fix nested_run_pending on activity state HLT When we suspend the guest in HLT state, the nested run is no longer pending - we emulated it completely. So only set nested_run_pending after checking the activity state. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:17 +01:00
Jan Kiszka	cae501397a	KVM: nVMX: Clean up handling of VMX-related MSRs This simplifies the code and also stops issuing warning about writing to unhandled MSRs when VMX is disabled or the Feature Control MSR is locked - we do handle them all according to the spec. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:16 +01:00
Jan Kiszka	542060ea79	KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject Already used by nested SVM for tracing nested vmexit: kvm_nested_vmexit marks exits from L2 to L0 while kvm_nested_vmexit_inject marks vmexits that are reflected to L1. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:15 +01:00
Jan Kiszka	533558bcb6	KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit Instead of fixing up the vmcs12 after the nested vmexit, pass key parameters already when calling nested_vmx_vmexit. This will help tracing those vmexits. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:14 +01:00
Jan Kiszka	42124925c1	KVM: nVMX: Leave VMX mode on clearing of feature control MSR When userspace sets MSR_IA32_FEATURE_CONTROL to 0, make sure we leave root and non-root mode, fully disabling VMX. The register state of the VCPU is undefined after this step, so userspace has to set it to a proper state afterward. This enables to reboot a VM while it is running some hypervisor code. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:13 +01:00
Jan Kiszka	8246bf52c7	KVM: VMX: Fix DR6 update on #DB exception According to the SDM, only bits 0-3 of DR6 "may" be cleared by "certain" debug exception. So do update them on #DB exception in KVM, but leave the rest alone, only setting BD and BS in addition to already set bits in DR6. This also aligns us with kvm_vcpu_check_singlestep. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:12 +01:00
Jan Kiszka	73aaf249ee	KVM: SVM: Fix reading of DR6 In contrast to VMX, SVM dose not automatically transfer DR6 into the VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor hook to obtain the current value. And as SVM now picks the DR6 state from its VMCB, we also need a set callback in order to write updates of DR6 back. Fixes a regression of `020df0794f`. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:10 +01:00
Jan Kiszka	9926c9fdbd	KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS Whenever we change arch.dr7, we also have to call kvm_update_dr7. In case guest debugging is off, this will synchronize the new state into hardware. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:09 +01:00
Vadim Rozenfeld	e984097b55	add support for Hyper-V reference time counter Signed-off: Peter Lieven <pl@kamp.de> Signed-off: Gleb Natapov Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com> After some consideration I decided to submit only Hyper-V reference counters support this time. I will submit iTSC support as a separate patch as soon as it is ready. v1 -> v2 1. mark TSC page dirty as suggested by Eric Northup <digitaleric@google.com> and Gleb 2. disable local irq when calling get_kernel_ns, as it was done by Peter Lieven <pl@amp.de> 3. move check for TSC page enable from second patch to this one. v3 -> v4 Get rid of ref counter offset. v4 -> v5 replace __copy_to_user with kvm_write_guest when updateing iTSC page. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:08 +01:00
Paolo Bonzini	aab6d7ce37	KVM: remove useless write to vcpu->hv_clock.tsc_timestamp After the previous patch from Marcelo, the comment before this write became obsolete. In fact, the write is unnecessary. The calls to kvm_write_tsc ultimately result in a master clock update as soon as all TSCs agree and the master clock is re-enabled. This master clock update will rewrite tsc_timestamp. So, together with the comment, delete the dead write too. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 18:08:25 +01:00
Marcelo Tosatti	f25e656d31	KVM: x86: fix tsc catchup issue with tsc scaling To fix a problem related to different resolution of TSC and system clock, the offset in TSC units is approximated by delta = vcpu->hv_clock.tsc_timestamp - vcpu->last_guest_tsc (Guest TSC value at (Guest TSC value at last VM-exit) the last kvm_guest_time_update call) Delta is then later scaled using mult,shift pair found in hv_clock structure (which is correct against tsc_timestamp in that structure). However, if a frequency change is performed between these two points, this delta is measured using different TSC frequencies, but scaled using mult,shift pair for one frequency only. The end result is an incorrect delta. The bug which this code works around is not the only cause for clock backwards events. The global accumulator is still necessary, so remove the max_kernel_ns fix and rely on the global accumulator for no clock backwards events. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 13:44:46 +01:00
Andrew Jones	0dce7cd67f	kvm: x86: fix apic_base enable check Commit `e66d2ae7c6` moved the assignment vcpu->arch.apic_base = value above a condition with (vcpu->arch.apic_base ^ value), causing that check to always fail. Use old_value, vcpu->arch.apic_base's old value, in the condition instead. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 13:42:14 +01:00
Marcelo Tosatti	9ed96e87c5	KVM: x86: limit PIT timer frequency Limit PIT timer frequency similarly to the limit applied by LAPIC timer. Cc: stable@kernel.org Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 12:43:54 +01:00
Marcelo Tosatti	37f6a4e237	KVM: x86: handle invalid root_hpa everywhere Rom Freiman <rom@stratoscale.com> notes other code paths vulnerable to bug fixed by `989c6b34f6`. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-01-15 12:16:16 +01:00
Marcelo Tosatti	26a865f4aa	KVM: VMX: fix use after free of vmx->loaded_vmcs After free_loaded_vmcs executes, the "loaded_vmcs" structure is kfreed, and now vmx->loaded_vmcs points to a kfreed area. Subsequent free_loaded_vmcs then attempts to manipulate vmx->loaded_vmcs. Switch the order to avoid the problem. https://bugzilla.redhat.com/show_bug.cgi?id=1047892 Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-01-08 19:14:08 -02:00
Chen Fan	96893977b8	KVM: x86: Fix debug typo error in lapic fix the 'vcpi' typos when apic_debug is enabled. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-01-08 19:09:57 -02:00
Zhihui Zhang	2f0a6397dd	KVM: VMX: check use I/O bitmap first before unconditional I/O exit According to Table C-1 of Intel SDM 3C, a VM exit happens on an I/O instruction when "use I/O bitmaps" VM-execution control was 0 _and_ the "unconditional I/O exiting" VM-execution control was 1. So we can't just check "unconditional I/O exiting" alone. This patch was improved by suggestion from Jan Kiszka. Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-01-08 19:01:40 -02:00
Jan Kiszka	29bf08f12b	KVM: nVMX: Unconditionally uninit the MMU on nested vmexit Three reasons for doing this: 1. arch.walk_mmu points to arch.mmu anyway in case nested EPT wasn't in use. 2. this aligns VMX with SVM. But 3. is most important: nested_cpu_has_ept(vmcs12) queries the VMCS page, and if one guest VCPU manipulates the page of another VCPU in L2, we may be fooled to skip over the nested_ept_uninit_mmu_context, leaving mmu in nested state. That can crash the host later on if nested_ept_get_cr3 is invoked while L1 already left vmxon and nested.current_vmcs12 became NULL therefore. Cc: stable@kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-01-02 11:22:14 -02:00
Jan Kiszka	e66d2ae7c6	KVM: x86: Fix APIC map calculation after re-enabling Update arch.apic_base before triggering recalculate_apic_map. Otherwise the recalculation will work against the previous state of the APIC and will fail to build the correct map when an APIC is hardware-enabled again. This fixes a regression of `1e08ec4a13`. Cc: stable@vger.kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-12-30 18:58:17 -02:00
Marcelo Tosatti	989c6b34f6	KVM: MMU: handle invalid root_hpa at __direct_map It is possible for __direct_map to be called on invalid root_hpa (-1), two examples: 1) try_async_pf -> can_do_async_pf -> vmx_interrupt_allowed -> nested_vmx_vmexit 2) vmx_handle_exit -> vmx_interrupt_allowed -> nested_vmx_vmexit Then to load_vmcs12_host_state and kvm_mmu_reset_context. Check for this possibility, let fault exception be regenerated. BZ: https://bugzilla.redhat.com/show_bug.cgi?id=924916 Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-20 19:22:49 +01:00
Jan Kiszka	4c4d563b49	KVM: VMX: Do not skip the instruction if handle_dr injects a fault If kvm_get_dr or kvm_set_dr reports that it raised a fault, we must not advance the instruction pointer. Otherwise the exception will hit the wrong instruction. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-20 19:20:39 +01:00
Jan Kiszka	ca3f257ae5	KVM: nVMX: Support direct APIC access from L2 It's a pathological case, but still a valid one: If L1 disables APIC virtualization and also allows L2 to directly write to the APIC page, we have to forcibly enable APIC virtualization while in L2 if the in-kernel APIC is in use. This allows to run the direct interrupt test case in the vmx unit test without x2APIC. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-18 10:27:09 +01:00
Takuya Yoshikawa	9357d93952	KVM: x86: Add comment on vcpu_enter_guest()'s return value Giving proper names to the 0 and 1 was once suggested. But since 0 is returned to the userspace, giving it another name can introduce extra confusion. This patch just explains the meanings instead. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-13 14:23:54 +01:00
Takuya Yoshikawa	c08ac06ab3	KVM: Use cond_resched() directly and remove useless kvm_resched() Since the commit `15ad7146` ("KVM: Use the scheduler preemption notifiers to make kvm preemptible"), the remaining stuff in this function is a simple cond_resched() call with an extra need_resched() check which was there to avoid dropping VCPUs unnecessarily. Now it is meaningless. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-13 14:23:45 +01:00
Gleb Natapov	17d68b763f	KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) A guest can cause a BUG_ON() leading to a host kernel crash. When the guest writes to the ICR to request an IPI, while in x2apic mode the following things happen, the destination is read from ICR2, which is a register that the guest can control. kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the cluster id. A BUG_ON is triggered, which is a protection against accessing map->logical_map with an out-of-bounds access and manages to avoid that anything really unsafe occurs. The logic in the code is correct from real HW point of view. The problem is that KVM supports only one cluster with ID 0 in clustered mode, but the code that has the bug does not take this into account. Reported-by: Lars Bull <larsbull@google.com> Cc: stable@vger.kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 22:46:18 +01:00
Andy Honig	fda4e2e855	KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patches concerts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_address specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and oldish Intel that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: `b93463aa59` ('KVM: Accelerated apic support') Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 22:39:46 +01:00
Andy Honig	b963a22e6d	KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367) Under guest controllable circumstances apic_get_tmcct will execute a divide by zero and cause a crash. If the guest cpuid support tsc deadline timers and performs the following sequence of requests the host will crash. - Set the mode to periodic - Set the TMICT to 0 - Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline) - Set the TMICT to non-zero. Then the lapic_timer.period will be 0, but the TMICT will not be. If the guest then reads from the TMCCT then the host will perform a divide by 0. This patch ensures that if the lapic_timer.period is 0, then the division does not occur. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 22:39:45 +01:00
Jan Kiszka	6dfacadd58	KVM: nVMX: Add support for activity state HLT We can easily emulate the HLT activity state for L1: If it decides that L2 shall be halted on entry, just invoke the normal emulation of halt after switching to L2. We do not depend on specific host features to provide this, so we can expose the capability unconditionally. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 10:49:56 +01:00
Gleb Natapov	2961e8764f	KVM: VMX: shadow VM_(ENTRY\|EXIT)_CONTROLS vmcs field VM_(ENTRY\|EXIT)_CONTROLS vmcs fields are read/written on each guest entry but most times it can be avoided since values do not changes. Keep fields copy in memory to avoid unnecessary reads from vmcs. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 10:49:49 +01:00
Sasha Levin	521ee0cfb8	kvm: mmu: delay mmu audit activation We should not be using jump labels before they were initialized. Push back the callback to until after jump label initialization. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-20 11:12:56 +02:00
Anthoine Bourgeois	e504c9098e	kvm, vmx: Fix lazy FPU on nested guest If a nested guest does a NM fault but its CR0 doesn't contain the TS flag (because it was already cleared by the guest with L1 aid) then we have to activate FPU ourselves in L0 and then continue to L2. If TS flag is set then we fallback on the previous behavior, forward the fault to L1 if it asked for. Signed-off-by: Anthoine Bourgeois <bourgeois@bertin.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-11-13 18:46:54 +01:00
Borislav Petkov	1b2ca42267	kvm, cpuid: Fix sparse warning We need to copy padding to kernel space first before looking at it. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-07 12:27:46 +02:00
Michael S. Tsirkin	01b71917b5	kvm: optimize out smp_mb after srcu_read_unlock I noticed that srcu_read_lock/unlock both have a memory barrier, so just by moving srcu_read_unlock earlier we can get rid of one call to smp_mb() using smp_mb__after_srcu_read_unlock instead. Unsurprisingly, the gain is small but measureable using the unit test microbenchmark: before vmcall in the ballpark of 1410 cycles after vmcall in the ballpark of 1360 cycles Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-06 09:32:31 +02:00
Gleb Natapov	a9d4e4393b	KVM: x86: trace cpuid emulation when called from emulator Currently cpuid emulation is traced only when executed by intercept. Move trace point so that emulator invocation is traced too. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-05 09:11:40 +02:00
Gleb Natapov	6d4d85ec56	KVM: emulator: cleanup decode_register_operand() a bit Make code shorter. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-05 09:11:30 +02:00
Gleb Natapov	aa9ac1a632	KVM: emulator: check rex prefix inside decode_register() All decode_register() callers check if instruction has rex prefix to properly decode one byte operand. It make sense to move the check inside. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-05 09:11:18 +02:00
Gleb Natapov	95f328d3ad	Merge branch 'kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into queue Conflicts: arch/powerpc/include/asm/processor.h	2013-11-04 10:20:57 +02:00

1 2 3 4 5 ...

2784 Commits