linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-17 16:43:08 +00:00

Author	SHA1	Message	Date
Ladi Prosek	72bbf9358c	KVM: hyperv: define VP assist page helpers The state related to the VP assist page is still managed by the LAPIC code in the pv_eoi field. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:30:13 +02:00
Vitaly Kuznetsov	f21dd49450	KVM: x86: hyperv: optimize sparse VP set processing Rewrite kvm_hv_flush_tlb()/send_ipi_vcpus_mask() making them cleaner and somewhat more optimal. hv_vcpu_in_sparse_set() is converted to sparse_set_to_vcpu_mask() which copies sparse banks u64-at-a-time and then, depending on the num_mismatched_vp_indexes value, returns immediately or does vp index to vcpu index conversion by walking all vCPUs. To support the change and make kvm_hv_send_ipi() look similar to kvm_hv_flush_tlb() send_ipi_vcpus_mask() is introduced. Suggested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:30:01 +02:00
Vitaly Kuznetsov	e6b6c483eb	KVM: x86: hyperv: fix 'tlb_lush' typo Regardless of whether your TLB is lush or not it still needs flushing. Reported-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:30:00 +02:00
Vitaly Kuznetsov	214ff83d44	KVM: x86: hyperv: implement PV IPI send hypercalls Using hypercall for sending IPIs is faster because this allows to specify any number of vCPUs (even > 64 with sparse CPU set), the whole procedure will take only one VMEXIT. Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi hypercall can't be 'fast' (passing parameters through registers) but apparently this is not true, Windows always uses it as 'fast' so we need to support that. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:47 +02:00
Vitaly Kuznetsov	2cefc5feb8	KVM: x86: hyperv: optimize kvm_hv_flush_tlb() for vp_index == vcpu_idx case VP inedx almost always matches VCPU and when it does it's faster to walk the sparse set instead of all vcpus. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:46 +02:00
Vitaly Kuznetsov	0b0a31badb	KVM: x86: hyperv: valid_bank_mask should be 'u64' This probably doesn't matter much (KVM_MAX_VCPUS is much lower nowadays) but valid_bank_mask is really u64 and not unsigned long. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:46 +02:00
Vitaly Kuznetsov	87ee613d07	KVM: x86: hyperv: keep track of mismatched VP indexes In most common cases VP index of a vcpu matches its vcpu index. Userspace is, however, free to set any mapping it wishes and we need to account for that when we need to find a vCPU with a particular VP index. To keep search algorithms optimal in both cases introduce 'num_mismatched_vp_indexes' counter showing how many vCPUs with mismatching VP index we have. In case the counter is zero we can assume vp_index == vcpu_idx. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:45 +02:00
Vitaly Kuznetsov	1779a39f78	KVM: x86: hyperv: consistently use 'hv_vcpu' for 'struct kvm_vcpu_hv' variables Rename 'hv' to 'hv_vcpu' in kvm_hv_set_msr/kvm_hv_get_msr(); 'hv' is 'reserved' for 'struct kvm_hv' variables across the file. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:45 +02:00
Vitaly Kuznetsov	a812297c4f	KVM: x86: hyperv: optimize 'all cpus' case in kvm_hv_flush_tlb() We can use 'NULL' to represent 'all cpus' case in kvm_make_vcpus_request_mask() and avoid building vCPU mask with all vCPUs. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:44 +02:00
Vitaly Kuznetsov	9170200ec0	KVM: x86: hyperv: enforce vp_index < KVM_MAX_VCPUS Hyper-V TLFS (5.0b) states: > Virtual processors are identified by using an index (VP index). The > maximum number of virtual processors per partition supported by the > current implementation of the hypervisor can be obtained through CPUID > leaf 0x40000005. A virtual processor index must be less than the > maximum number of virtual processors per partition. Forbid userspace to set VP_INDEX above KVM_MAX_VCPUS. get_vcpu_by_vpidx() can now be optimized to bail early when supplied vpidx is >= KVM_MAX_VCPUS. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:43 +02:00
Paolo Bonzini	44883f01fe	KVM: x86: ensure all MSRs can always be KVM_GET/SET_MSR'd Some of the MSRs returned by GET_MSR_INDEX_LIST currently cannot be sent back to KVM_GET_MSR and/or KVM_SET_MSR; either they can never be sent back, or you they are only accepted under special conditions. This makes the API a pain to use. To avoid this pain, this patch makes it so that the result of the get-list ioctl can always be used for host-initiated get and set. Since we don't have a separate way to check for read-only MSRs, this means some Hyper-V MSRs are ignored when written. Arguably they should not even be in the result of GET_MSR_INDEX_LIST, but I am leaving there in case userspace is using the outcome of GET_MSR_INDEX_LIST to derive the support for the corresponding Hyper-V feature. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-08-06 17:32:01 +02:00
Linus Torvalds	b357bf6023	Small update for KVM. * ARM: lazy context-switching of FPSIMD registers on arm64, "split" regions for vGIC redistributor * s390: cleanups for nested, clock handling, crypto, storage keys and control register bits * x86: many bugfixes, implement more Hyper-V super powers, implement lapic_timer_advance_ns even when the LAPIC timer is emulated using the processor's VMX preemption timer. Two security-related bugfixes at the top of the branch. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJbH8Z/AAoJEL/70l94x66DF+UIAJeOuTp6LGasT/9uAb2OovaN +5kGmOPGFwkTcmg8BQHI2fXT4vhxMXWPFcQnyig9eXJVxhuwluXDOH4P9IMay0yw VDCBsWRdMvZDQad2hn6Z5zR4Jx01XrSaG/KqvXbbDKDCy96mWG7SYAY2m3ZwmeQi 3Pa3O3BTijr7hBYnMhdXGkSn4ZyU8uPaAgIJ8795YKeOJ2JmioGYk6fj6y2WCxA3 ztJymBjTmIoZ/F8bjuVouIyP64xH4q9roAyw4rpu7vnbWGqx1fjPYJoB8yddluWF JqCPsPzhKDO7mjZJy+lfaxIlzz2BN7tKBNCm88s5GefGXgZwk3ByAq/0GQ2M3rk= =H5zI -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "Small update for KVM: ARM: - lazy context-switching of FPSIMD registers on arm64 - "split" regions for vGIC redistributor s390: - cleanups for nested - clock handling - crypto - storage keys - control register bits x86: - many bugfixes - implement more Hyper-V super powers - implement lapic_timer_advance_ns even when the LAPIC timer is emulated using the processor's VMX preemption timer. - two security-related bugfixes at the top of the branch" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (79 commits) kvm: fix typo in flag name kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system KVM: x86: introduce linear_{read,write}_system kvm: nVMX: Enforce cpl=0 for VMX instructions kvm: nVMX: Add support for "VMWRITE to any supported field" kvm: nVMX: Restrict VMX capability MSR changes KVM: VMX: Optimize tscdeadline timer latency KVM: docs: nVMX: Remove known limitations as they do not exist now KVM: docs: mmu: KVM support exposing SLAT to guests kvm: no need to check return value of debugfs_create functions kvm: Make VM ioctl do valloc for some archs kvm: Change return type to vm_fault_t KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008 kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation KVM: introduce kvm_make_vcpus_request_mask() API KVM: x86: hyperv: do rep check for each hypercall separately ...	2018-06-12 11:34:04 -07:00
Vitaly Kuznetsov	c70126764b	KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation Implement HvFlushVirtualAddress{List,Space}Ex hypercalls in the same way we've implemented non-EX counterparts. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Initialized valid_bank_mask to silence misguided GCC warnigs. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-05-26 15:35:35 +02:00
Vitaly Kuznetsov	e2f11f4282	KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation Implement HvFlushVirtualAddress{List,Space} hypercalls in a simplistic way: do full TLB flush with KVM_REQ_TLB_FLUSH and kick vCPUs which are currently IN_GUEST_MODE. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-05-26 14:14:33 +02:00
Vitaly Kuznetsov	56b9ae7830	KVM: x86: hyperv: do rep check for each hypercall separately Prepare to support TLB flush hypercalls, some of which are REP hypercalls. Also, return HV_STATUS_INVALID_HYPERCALL_INPUT as it seems more appropriate. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-05-26 14:14:33 +02:00
Vitaly Kuznetsov	142c95da92	KVM: x86: hyperv: use defines when parsing hypercall parameters Avoid open-coding offsets for hypercall input parameters, we already have defines for them. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-05-26 14:14:33 +02:00
Radim Krčmář	696ca779a9	KVM: x86: fix #UD address of failed Hyper-V hypercalls If the hypercall was called from userspace or real mode, KVM injects #UD and then advances RIP, so it looks like #UD was caused by the following instruction. This probably won't cause more than confusion, but could give an unexpected access to guest OS' instruction emulator. Also, refactor the code to count hv hypercalls that were handled by the virt userspace. Fixes: `6356ee0c96` ("x86: Delay skip of emulated hypercall instruction") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-05-25 21:33:31 +02:00
Paolo Bonzini	452a68d0ef	KVM: hyperv: idr_find needs RCU protection Even though the eventfd is released after the KVM SRCU grace period elapses, the conn_to_evt data structure itself is not; it uses RCU internally, instead. Fix the read-side critical section to happen under rcu_read_lock/unlock; the result is still protected by vcpu->kvm->srcu. Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-05-11 11:21:11 +02:00
Marian Rotariu	6356ee0c96	x86: Delay skip of emulated hypercall instruction The IP increment should be done after the hypercall emulation, after calling the various handlers. In this way, these handlers can accurately identify the the IP of the VMCALL if they need it. This patch keeps the same functionality for the Hyper-V handler which does not use the return code of the standard kvm_skip_emulated_instruction() call. Signed-off-by: Marian Rotariu <mrotariu@bitdefender.com> [Hyper-V hypercalls also need kvm_skip_emulated_instruction() - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-05-11 11:21:10 +02:00
Ladi Prosek	d4abc577bb	x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE The assist page has been used only for the paravirtual EOI so far, hence the "APIC" in the MSR name. Renaming to match the Hyper-V TLFS where it's called "Virtual VP Assist MSR". Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Dan Carpenter	d32ef547fd	kvm: x86: hyperv: delete dead code in kvm_hv_hypercall() "rep_done" is always zero so the "(((u64)rep_done & 0xfff) << 32)" expression is just zero. We can remove the "res" temporary variable as well and just use "ret" directly. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-23 20:11:01 +01:00
Vitaly Kuznetsov	915e6f78bd	x86/kvm/hyper-v: inject #GP only when invalid SINTx vector is unmasked Hyper-V 2016 on KVM with SynIC enabled doesn't boot with the following trace: kvm_entry: vcpu 0 kvm_exit: reason MSR_WRITE rip 0xfffff8000131c1e5 info 0 0 kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000090 data 0x10000 host 0 kvm_msr: msr_write 40000090 = 0x10000 (#GP) kvm_inj_exception: #GP (0x0) KVM acts according to the following statement from TLFS: " 11.8.4 SINTx Registers ... Valid values for vector are 16-255 inclusive. Specifying an invalid vector number results in #GP. " However, I checked and genuine Hyper-V doesn't #GP when we write 0x10000 to SINTx. I checked with Microsoft and they confirmed that if either the Masked bit (bit 16) or the Polling bit (bit 18) is set to 1, then they ignore the value of Vector. Make KVM act accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:33 +01:00
Vitaly Kuznetsov	98f65ad458	x86/kvm/hyper-v: remove stale entries from vec_bitmap/auto_eoi_bitmap on vector change When a new vector is written to SINx we update vec_bitmap/auto_eoi_bitmap but we forget to remove old vector from these masks (in case it is not present in some other SINTx). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:32 +01:00
Vitaly Kuznetsov	a2e164e7f4	x86/kvm/hyper-v: add reenlightenment MSRs support Nested Hyper-V/Windows guest running on top of KVM will use TSC page clocksource in two cases: - L0 exposes invariant TSC (CPUID.80000007H:EDX[8]). - L0 provides Hyper-V Reenlightenment support (CPUID.40000003H:EAX[13]). Exposing invariant TSC effectively blocks migration to hosts with different TSC frequencies, providing reenlightenment support will be needed when we start migrating nested workloads. Implement rudimentary support for reenlightenment MSRs. For now, these are just read/write MSRs with no effect. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-16 22:01:31 +01:00
Roman Kagan	faeb7833ee	kvm: x86: hyperv: guest->host event signaling via eventfd In Hyper-V, the fast guest->host notification mechanism is the SIGNAL_EVENT hypercall, with a single parameter of the connection ID to signal. Currently this hypercall incurs a user exit and requires the userspace to decode the parameters and trigger the notification of the potentially different I/O context. To avoid the costly user exit, process this hypercall and signal the corresponding eventfd in KVM, similar to ioeventfd. The association between the connection id and the eventfd is established via the newly introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an (srcu-protected) IDR. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: David Hildenbrand <david@redhat.com> [asm/hyperv.h changes approved by KY Srinivasan. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-06 18:40:36 +01:00
Roman Kagan	cbc0236a4b	kvm: x86: factor out kvm.arch.hyperv (de)init Move kvm.arch.hyperv initialization and cleanup to separate functions. For now only a mutex is inited in the former, and the latter is empty; more stuff will go in there in a followup patch. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-06 18:19:03 +01:00
Longpeng(Mike)	de63ad4cf4	KVM: X86: implement the logic for spinlock optimization get_cpl requires vcpu_load, so we must cache the result (whether the vcpu was preempted when its cpl=0) in kvm_vcpu_arch. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-08-08 10:57:43 +02:00
Longpeng(Mike)	199b5763d3	KVM: add spinlock optimization framework If a vcpu exits due to request a user mode spinlock, then the spinlock-holder may be preempted in user mode or kernel mode. (Note that not all architectures trap spin loops in user mode, only AMD x86 and ARM/ARM64 currently do). But if a vcpu exits in kernel mode, then the holder must be preempted in kernel mode, so we should choose a vcpu in kernel mode as a more likely candidate for the lock holder. This introduces kvm_arch_vcpu_in_kernel() to decide whether the vcpu is in kernel-mode when it's preempted. kvm_vcpu_on_spin's new argument says the same of the spinning VCPU. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-08-08 10:57:43 +02:00
Ladi Prosek	72c139bacf	KVM: hyperv: support HV_X64_MSR_TSC_FREQUENCY and HV_X64_MSR_APIC_FREQUENCY It has been experimentally confirmed that supporting these two MSRs is one of the necessary conditions for nested Hyper-V to use the TSC page. Modern Windows guests are noticeably slower when they fall back to reading timestamps from the HV_X64_MSR_TIME_REF_COUNT MSR instead of using the TSC page. The newly supported MSRs are advertised with the AccessFrequencyRegs partition privilege flag and CPUID.40000003H:EDX[8] "Support for determining timer frequencies is available" (both outside of the scope of this KVM patch). Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-08-07 15:26:06 +02:00
Roman Kagan	f1ff89ec44	kvm: x86: hyperv: avoid livelock in oneshot SynIC timers If the SynIC timer message delivery fails due to SINT message slot being busy, there's no point to attempt starting the timer again until we're notified of the slot being released by the guest (via EOM or EOI). Even worse, when a oneshot timer fails to deliver its message, its re-arming with an expiration time in the past leads to immediate retry of the delivery, and so on, without ever letting the guest vcpu to run and release the slot, which results in a livelock. To avoid that, only start the timer when there's no timer message pending delivery. When there is, meaning the slot is busy, the processing will be restarted upon notification from the guest that the slot is released. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-07-20 17:00:00 +02:00
Roman Kagan	d3457c877b	kvm: x86: hyperv: make VP_INDEX managed by userspace Hyper-V identifies vCPUs by Virtual Processor Index, which can be queried via HV_X64_MSR_VP_INDEX msr. It is defined by the spec as a sequential number which can't exceed the maximum number of vCPUs per VM. APIC ids can be sparse and thus aren't a valid replacement for VP indices. Current KVM uses its internal vcpu index as VP_INDEX. However, to make it predictable and persistent across VM migrations, the userspace has to control the value of VP_INDEX. This patch achieves that, by storing vp_index explicitly on vcpu, and allowing HV_X64_MSR_VP_INDEX to be set from the host side. For compatibility it's initialized to KVM vcpu index. Also a few variables are renamed to make clear distinction betweed this Hyper-V vp_index and KVM vcpu_id (== APIC id). Besides, a new capability, KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip attempting msr writes where unsupported, to avoid spamming error logs. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-07-14 16:28:18 +02:00
Roman Kagan	efc479e690	kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 There is a flaw in the Hyper-V SynIC implementation in KVM: when message page or event flags page is enabled by setting the corresponding msr, KVM zeroes it out. This is problematic because on migration the corresponding MSRs are loaded on the destination, so the content of those pages is lost. This went unnoticed so far because the only user of those pages was in-KVM hyperv synic timers, which could continue working despite that zeroing. Newer QEMU uses those pages for Hyper-V VMBus implementation, and zeroing them breaks the migration. Besides, in newer QEMU the content of those pages is fully managed by QEMU, so zeroing them is undesirable even when writing the MSRs from the guest side. To support this new scheme, introduce a new capability, KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic pages aren't zeroed out in KVM. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-07-13 17:41:04 +02:00
Ingo Molnar	32ef5517c2	sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> Introduce a trivial, mostly empty <linux/sched/cputime.h> header to prepare for the moving of cputime functionality out of sched.h. Update all code that relies on these facilities. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:39 +01:00
Linus Torvalds	fd7e9a8834	4.11 is going to be a relatively large release for KVM, with a little over 200 commits and noteworthy changes for most architectures. * ARM: - GICv3 save/restore - cache flushing fixes - working MSI injection for GICv3 ITS - physical timer emulation * MIPS: - various improvements under the hood - support for SMP guests - a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers to support copy-on-write, KSM, idle page tracking, swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is also supported, so that writes to some memory regions can be treated as MMIO. The new MMU also paves the way for hardware virtualization support. * PPC: - support for POWER9 using the radix-tree MMU for host and guest - resizable hashed page table - bugfixes. * s390: expose more features to the guest - more SIMD extensions - instruction execution protection - ESOP2 * x86: - improved hashing in the MMU - faster PageLRU tracking for Intel CPUs without EPT A/D bits - some refactoring of nested VMX entry/exit code, preparing for live migration support of nested hypervisors - expose yet another AVX512 CPUID bit - host-to-guest PTP support - refactoring of interrupt injection, with some optimizations thrown in and some duct tape removed. - remove lazy FPU handling - optimizations of user-mode exits - optimizations of vcpu_is_preempted() for KVM guests * generic: - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJYral1AAoJEL/70l94x66DbNgH/Rx8YXuidFq2fe3RWOvld3RK 85OM/D5g38cTLpBE0/sJpcvX34iYN8U/l5foCZwpxB+83GHEk2Cr57JyfTogdaAJ x8dBhHKQCA/HxSQUQLN6nFqRV+yT8WUR92Fhqx82+80BSen5Yzcfee/TDoW6T1IW g8CYgX9FrRaGOX066ImAuUfdAdUVjyssfs9VttDTX+HiusPeuBPx/wsRe1ZEEPlH vnltIJQb1ETV2GOZLUojKjzH6aZkjIl29XxjkYii9JTUornClG0DfW+5QT3uLrB5 gJ+G+Zmpsq8ZBx9jNDtAi7sFsoPY1Mzf+JPNCGXBra2sP2GrBAuXcxmgznRYltQ= =8IIp -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "4.11 is going to be a relatively large release for KVM, with a little over 200 commits and noteworthy changes for most architectures. ARM: - GICv3 save/restore - cache flushing fixes - working MSI injection for GICv3 ITS - physical timer emulation MIPS: - various improvements under the hood - support for SMP guests - a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers to support copy-on-write, KSM, idle page tracking, swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is also supported, so that writes to some memory regions can be treated as MMIO. The new MMU also paves the way for hardware virtualization support. PPC: - support for POWER9 using the radix-tree MMU for host and guest - resizable hashed page table - bugfixes. s390: - expose more features to the guest - more SIMD extensions - instruction execution protection - ESOP2 x86: - improved hashing in the MMU - faster PageLRU tracking for Intel CPUs without EPT A/D bits - some refactoring of nested VMX entry/exit code, preparing for live migration support of nested hypervisors - expose yet another AVX512 CPUID bit - host-to-guest PTP support - refactoring of interrupt injection, with some optimizations thrown in and some duct tape removed. - remove lazy FPU handling - optimizations of user-mode exits - optimizations of vcpu_is_preempted() for KVM guests generic: - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits) x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 x86/paravirt: Change vcp_is_preempted() arg type to long KVM: VMX: use correct vmcs_read/write for guest segment selector/base x86/kvm/vmx: Defer TR reload after VM exit x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss x86/kvm/vmx: Simplify segment_base() x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels x86/kvm/vmx: Don't fetch the TSS base from the GDT x86/asm: Define the kernel TSS limit in a macro kvm: fix page struct leak in handle_vmon KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now KVM: Return an error code only as a constant in kvm_get_dirty_log() KVM: Return an error code only as a constant in kvm_get_dirty_log_protect() KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl() KVM: x86: remove code for lazy FPU handling KVM: race-free exit from KVM_RUN without POSIX signals KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message KVM: PPC: Book3S PR: Ratelimit copy data failure error messages KVM: Support vCPU-based gfn->hva cache KVM: use separate generations for each address space ...	2017-02-22 18:22:53 -08:00
Frederic Weisbecker	5613fda9a5	sched/cputime: Convert task/group cputime to nsecs Now that most cputime readers use the transition API which return the task cputime in old style cputime_t, we can safely store the cputime in nsecs. This will eventually make cputime statistics less opaque and more granular. Back and forth convertions between cputime_t and nsecs in order to deal with cputime_t random granularity won't be needed anymore. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 09:13:49 +01:00
Radim Krčmář	f98a3efb28	KVM: x86: use delivery to self in hyperv synic Interrupt to self can be sent without knowing the APIC ID. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-09 14:46:13 +01:00
Paolo Bonzini	3f5ad8be37	KVM: hyperv: fix locking of struct kvm_hv fields Introduce a new mutex to avoid an AB-BA deadlock between kvm->lock and vcpu->mutex. Protect accesses in kvm_hv_setup_tsc_page too, as suggested by Roman. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-16 17:53:38 +01:00
Jiang Biao	ecd8a8c2b4	kvm: x86: hyperv: make function static to avoid compiling warning synic_set_irq is only used in hyperv.c, and should be static to avoid compiling warning when with -Wmissing-prototypes option. Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-16 22:09:44 +01:00
Paolo Bonzini	095cf55df7	KVM: x86: Hyper-V tsc page setup Lately tsc page was implemented but filled with empty values. This patch setup tsc page scale and offset based on vcpu tsc, tsc_khz and HV_X64_MSR_TIME_REF_COUNT value. The valid tsc page drops HV_X64_MSR_TIME_REF_COUNT msr reads count to zero which potentially improves performance. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Peter Hornyack <peterhornyack@google.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> [Computation of TSC page parameters rewritten to use the Linux timekeeper parameters. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-20 09:26:20 +02:00
Paolo Bonzini	108b249c45	KVM: x86: introduce get_kvmclock_ns Introduce a function that reads the exact nanoseconds value that is provided to the guest in kvmclock. This crystallizes the notion of kvmclock as a thin veneer over a stable TSC, that the guest will (hopefully) convert with NTP. In other words, kvmclock is not a paravirtualized host-to-guest NTP. Drop the get_kernel_ns() function, that was used both to get the base value of the master clock and to get the current value of kvmclock. The former use is replaced by ktime_get_boot_ns(), the latter is the purpose of get_kernel_ns(). This also allows KVM to provide a Hyper-V time reference counter that is synchronized with the time that is computed from the TSC page. Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-20 09:26:15 +02:00
Paolo Bonzini	a2b5c3c0c8	KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled If SynIC is disabled, there is nothing that userspace can do to handle these exits; on the other hand, userspace probably will not know about KVM_EXIT_HYPERV_HCALL and complain about it or even exit. Just prevent anything bad from happening by handling the hypercall in KVM and returning an "invalid hypercall" code. Fixes: `83326e43f2` Cc: Andrey Smetanin <irqlevel@gmail.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-04-01 12:10:09 +02:00
Andrey Smetanin	83326e43f2	kvm/x86: Hyper-V VMBus hypercall userspace exit The patch implements KVM_EXIT_HYPERV userspace exit functionality for Hyper-V VMBus hypercalls: HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT. Changes v3: * use vcpu->arch.complete_userspace_io to setup hypercall result Changes v2: * use KVM_EXIT_HYPERV for hypercalls Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 18:48:44 +01:00
Andrey Smetanin	b2fdc2570a	kvm/x86: Reject Hyper-V hypercall continuation Currently we do not support Hyper-V hypercall continuation so reject it. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 18:48:42 +01:00
Andrey Smetanin	0d9c055eaa	kvm/x86: Pass return code of kvm_emulate_hypercall Pass the return code from kvm_emulate_hypercall on to the caller, in order to allow it to indicate to the userspace that the hypercall has to be handled there. Also adjust all the existing code paths to return 1 to make sure the hypercall isn't passed to the userspace without setting kvm_run appropriately. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 18:48:41 +01:00
Andrey Smetanin	8ed6d76781	kvm/x86: Rename Hyper-V long spin wait hypercall Rename HV_X64_HV_NOTIFY_LONG_SPIN_WAIT by HVCALL_NOTIFY_LONG_SPIN_WAIT, so the name is more consistent with the other hypercalls. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org [Change name, Andrey used HV_X64_HCALL_NOTIFY_LONG_SPIN_WAIT. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 18:48:38 +01:00
Andrey Smetanin	ac3e5fcae8	kvm/x86: Hyper-V SynIC timers tracepoints Trace the following Hyper SynIC timers events: * periodic timer start * one-shot timer start * timer callback * timer expiration and message delivery result * timer config setup * timer count setup * timer cleanup Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-01-08 19:04:43 +01:00
Andrey Smetanin	18659a9cb1	kvm/x86: Hyper-V SynIC tracepoints Trace the following Hyper SynIC events: * set msr * set sint irq * ack sint * sint irq eoi Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-01-08 19:04:43 +01:00
Andrey Smetanin	f3b138c5d8	kvm/x86: Update SynIC timers on guest entry only Consolidate updating the Hyper-V SynIC timers in a single place: on guest entry in processing KVM_REQ_HV_STIMER request. This simplifies the overall logic, and makes sure the most current state of msrs and guest clock is used for arming the timers (to achieve that, KVM_REQ_HV_STIMER has to be processed after KVM_REQ_CLOCK_UPDATE). Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-01-08 19:04:42 +01:00
Andrey Smetanin	7be58a6488	kvm/x86: Skip SynIC vector check for QEMU side QEMU zero-inits Hyper-V SynIC vectors. We should allow that, and don't reject zero values if set by the host. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-01-08 19:04:42 +01:00
Andrey Smetanin	23a3b201fd	kvm/x86: Hyper-V fix SynIC timer disabling condition Hypervisor Function Specification(HFS) doesn't require to disable SynIC timer at timer config write if timer->count = 0. So drop this check, this allow to load timers MSR's during migration restore, because config are set before count in QEMU side. Also fix condition according to HFS doc(15.3.1): "It is not permitted to set the SINTx field to zero for an enabled timer. If attempted, the timer will be marked disabled (that is, bit 0 cleared) immediately." Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-01-08 19:04:41 +01:00

1 2

64 Commits