linux

Author	SHA1	Message	Date
Paolo Bonzini	4415b33528	Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The main thing here is a new implementation of the in-kernel XICS interrupt controller emulation for POWER9 machines, from Ben Herrenschmidt. POWER9 has a new interrupt controller called XIVE (eXternal Interrupt Virtualization Engine) which is able to deliver interrupts directly to guest virtual CPUs in hardware without hypervisor intervention. With this new code, the guest still sees the old XICS interface but performance is better because the XICS emulation in the host uses the XIVE directly rather than going through a XICS emulation in firmware. Conflicts: arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix] arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]	2017-05-09 11:50:01 +02:00
Geliang Tang	3bed8888ed	KVM: set no_llseek in stat_fops_per_vm In vm_stat_get_per_vm_fops and vcpu_stat_get_per_vm_fops, since we use nonseekable_open() to open, we should use no_llseek() to seek, not generic_file_llseek(). Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-05-09 11:48:23 +02:00
Linus Torvalds	bf5f89463f	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...	2017-05-08 18:17:56 -07:00
Michal Hocko	a7c3e901a4	mm: introduce kv[mz]alloc helpers Patch series "kvmalloc", v5. There are many open coded kmalloc with vmalloc fallback instances in the tree. Most of them are not careful enough or simply do not care about the underlying semantic of the kmalloc/page allocator which means that a) some vmalloc fallbacks are basically unreachable because the kmalloc part will keep retrying until it succeeds b) the page allocator can invoke a really disruptive steps like the OOM killer to move forward which doesn't sound appropriate when we consider that the vmalloc fallback is available. As it can be seen implementing kvmalloc requires quite an intimate knowledge if the page allocator and the memory reclaim internals which strongly suggests that a helper should be implemented in the memory subsystem proper. Most callers, I could find, have been converted to use the helper instead. This is patch 6. There are some more relying on __GFP_REPEAT in the networking stack which I have converted as well and Eric Dumazet was not opposed [2] to convert them as well. [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com This patch (of 9): Using kmalloc with the vmalloc fallback for larger allocations is a common pattern in the kernel code. Yet we do not have any common helper for that and so users have invented their own helpers. Some of them are really creative when doing so. Let's just add kv[mz]alloc and make sure it is implemented properly. This implementation makes sure to not make a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also to not warn about allocation failures. This also rules out the OOM killer as the vmalloc is a more approapriate fallback than a disruptive user visible action. This patch also changes some existing users and removes helpers which are specific for them. In some cases this is not possible (e.g. ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and require GFP_NO{FS,IO} context which is not vmalloc compatible in general (note that the page table allocation is GFP_KERNEL). Those need to be fixed separately. While we are at it, document that __vmalloc{_node} about unsupported gfp mask because there seems to be a lot of confusion out there. kvmalloc_node will warn about GFP_KERNEL incompatible (which are not superset) flags to catch new abusers. Existing ones would have to die slowly. [sfr@canb.auug.org.au: f2fs fixup] Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part] Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-08 17:15:12 -07:00
Linus Torvalds	2d3e4866de	* ARM: HYP mode stub supports kexec/kdump on 32-bit; improved PMU support; virtual interrupt controller performance improvements; support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) * MIPS: basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) * PPC: in-kernel acceleration for VFIO * s390: support for guests without storage keys; adapter interruption suppression * x86: usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits; emulation of CPL3 CPUID faulting * generic: first part of VCPU thread request API; kvm_stat improvements -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJZEHUkAAoJEL/70l94x66DBeYH/09wrpJ2FjU4Rqv7FxmqgWfH 9WGi4wvn/Z+XzQSyfMJiu2SfZVzU69/Y67OMHudy7vBT6knB+ziM7Ntoiu/hUfbG 0g5KsDX79FW15HuvuuGh9kSjUsj7qsQdyPZwP4FW/6ZoDArV9mibSvdjSmiUSMV/ 2wxaoLzjoShdOuCe9EABaPhKK0XCrOYkygT6Paz1pItDxaSn8iW3ulaCuWMprUfG Niq+dFemK464E4yn6HVD88xg5j2eUM6bfuXB3qR3eTR76mHLgtwejBzZdDjLG9fk 32PNYKhJNomBxHVqtksJ9/7cSR6iNPs7neQ1XHemKWTuYqwYQMlPj1NDy0aslQU= =IsiZ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "ARM: - HYP mode stub supports kexec/kdump on 32-bit - improved PMU support - virtual interrupt controller performance improvements - support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) MIPS: - basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) PPC: - in-kernel acceleration for VFIO s390: - support for guests without storage keys - adapter interruption suppression x86: - usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits - emulation of CPL3 CPUID faulting generic: - first part of VCPU thread request API - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) kvm: nVMX: Don't validate disabled secondary controls KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick Revert "KVM: Support vCPU-based gfn->hva cache" tools/kvm: fix top level makefile KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING KVM: Documentation: remove VM mmap documentation kvm: nVMX: Remove superfluous VMX instruction fault checks KVM: x86: fix emulation of RSM and IRET instructions KVM: mark requests that need synchronization KVM: return if kvm_vcpu_wake_up() did wake up the VCPU KVM: add explicit barrier to kvm_vcpu_kick KVM: perform a wake_up in kvm_make_all_cpus_request KVM: mark requests that do not need a wakeup KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up KVM: x86: always use kvm_make_request instead of set_bit KVM: add kvm_{test,clear}_request to replace {test,clear}_bit s390: kvm: Cpu model support for msa6, msa7 and msa8 KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK kvm: better MWAIT emulation for guests KVM: x86: virtualize cpuid faulting ...	2017-05-08 12:37:56 -07:00
Linus Torvalds	ab182e67ec	arm64 updates for 4.12: - kdump support, including two necessary memblock additions: memblock_clear_nomap() and memblock_cap_memory_range() - ARMv8.3 HWCAP bits for JavaScript conversion instructions, complex numbers and weaker release consistency - arm64 ACPI platform MSI support - arm perf updates: ACPI PMU support, L3 cache PMU in some Qualcomm SoCs, Cortex-A53 L2 cache events and DTLB refills, MAINTAINERS update for DT perf bindings - architected timer errata framework (the arch/arm64 changes only) - support for DMA_ATTR_FORCE_CONTIGUOUS in the arm64 iommu DMA API - arm64 KVM refactoring to use common system register definitions - remove support for ASID-tagged VIVT I-cache (no ARMv8 implementation using it and deprecated in the architecture) together with some I-cache handling clean-up - PE/COFF EFI header clean-up/hardening - define BUG() instruction without CONFIG_BUG -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJZDKMoAAoJEGvWsS0AyF7xR+YP/0EMEz5MDfCv0PVYj7/AIa0G Zphl7OhysIkeDAz7urXw9Jdl0NfORNIqmD1vZNVSc321IyNp56Od+kWd82lBrOWB ad3nNT67pEmu0pAW7CO48ju3rTesEnEl3ra45E1tULeLihmv93jc4ZlfXgumlKq3 /GE84XJ5ZFmluuhq1zgNefeUtyl1tbxTxHJ74+INF7dTd/5sJcphpqS4Dzpb+msT 20WYliccQCBF9zBFUYHc2KjcXXKRQGxLulGS3MuoN2DLkD+U9YyR/OmA7SoXh2J2 WXC5b0x856xTQJFCJ39pb7rw5xHjt3l5zfU3VLSvqEVL/+asBqCcgGNtNUgOW1Es dEHC6bc66Ley6mn7bbpFE3MK8D+K5q8HwMF6G5KDtIVB6DB/iQ6kzi5aXKoupxtb 1EuU4OW6cDhmOFQYjgIDofLgqbmVvJofdF6+NfxasfZmWrMgHzv0rYvaCDnAV/Tr t7bhH7hf9/KcP/wpk86O2AMKKpgoNTqe1Qy8cWVFFLnut567Pb6zs/L3ZXfleoLv t613yM8Zj2fE05ja8ylMDjaasidNpXGttb08/4kAn06Daaoueqla0jmduAhy4aaV dQ3OFP9lJ5MFaFnMMTPfU3vtvNLMHuo9MZsYCrv5zCaNNs3lpAPUiPNh588ZscKa sWx4PEiaCi+wcOsLsJvh =SDkm -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - kdump support, including two necessary memblock additions: memblock_clear_nomap() and memblock_cap_memory_range() - ARMv8.3 HWCAP bits for JavaScript conversion instructions, complex numbers and weaker release consistency - arm64 ACPI platform MSI support - arm perf updates: ACPI PMU support, L3 cache PMU in some Qualcomm SoCs, Cortex-A53 L2 cache events and DTLB refills, MAINTAINERS update for DT perf bindings - architected timer errata framework (the arch/arm64 changes only) - support for DMA_ATTR_FORCE_CONTIGUOUS in the arm64 iommu DMA API - arm64 KVM refactoring to use common system register definitions - remove support for ASID-tagged VIVT I-cache (no ARMv8 implementation using it and deprecated in the architecture) together with some I-cache handling clean-up - PE/COFF EFI header clean-up/hardening - define BUG() instruction without CONFIG_BUG * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits) arm64: Fix the DMA mmap and get_sgtable API with DMA_ATTR_FORCE_CONTIGUOUS arm64: Print DT machine model in setup_machine_fdt() arm64: pmu: Wire-up Cortex A53 L2 cache events and DTLB refills arm64: module: split core and init PLT sections arm64: pmuv3: handle pmuv3+ arm64: Add CNTFRQ_EL0 trap handler arm64: Silence spurious kbuild warning on menuconfig arm64: pmuv3: use arm_pmu ACPI framework arm64: pmuv3: handle !PMUv3 when probing drivers/perf: arm_pmu: add ACPI framework arm64: add function to get a cpu's MADT GICC table drivers/perf: arm_pmu: split out platform device probe logic drivers/perf: arm_pmu: move irq request/free into probe drivers/perf: arm_pmu: split cpu-local irq request/free drivers/perf: arm_pmu: rename irq request/free functions drivers/perf: arm_pmu: handle no platform_device drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs() drivers/perf: arm_pmu: factor out pmu registration drivers/perf: arm_pmu: fold init into alloc drivers/perf: arm_pmu: define armpmu_init_fn ...	2017-05-05 12:11:37 -07:00
Paolo Bonzini	0266c894b5	KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick The #ifndef was removed in `75aaafb79f`, but it was also protecting smp_send_reschedule() in kvm_vcpu_kick(). Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-05-04 15:14:13 +02:00
Paolo Bonzini	4e335d9e7d	Revert "KVM: Support vCPU-based gfn->hva cache" This reverts commit `bbd6411513`. I've been sitting on this revert for too long and it unfortunately missed 4.11. It's also the reason why I haven't merged ring-based dirty tracking for 4.12. Using kvm_vcpu_memslots in kvm_gfn_to_hva_cache_init and kvm_vcpu_write_guest_offset_cached means that the MSR value can now be used to access SMRAM, simply by making it point to an SMRAM physical address. This is problematic because it lets the guest OS overwrite memory that it shouldn't be able to touch. Cc: stable@vger.kernel.org Fixes: `bbd6411513` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-05-03 16:30:26 +02:00
David Hildenbrand	5c0aea0e8d	KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING We needed the lock to avoid racing with creation of the irqchip on x86. As kvm_set_irq_routing() calls srcu_synchronize_expedited(), this lock might be held for a longer time. Let's introduce an arch specific callback to check if we can actually add irq routes. For x86, all we have to do is check if we have an irqchip in the kernel. We don't need kvm->lock at that point as the irqchip is marked as inititalized only when actually fully created. Reported-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: `1df6ddede1` ("KVM: x86: race between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP") Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-05-02 14:45:45 +02:00
Paul Mackerras	fb7dcf723d	Merge remote-tracking branch 'remotes/powerpc/topic/xive' into kvm-ppc-next This merges in the powerpc topic/xive branch to bring in the code for the in-kernel XICS interrupt controller emulation to use the new XIVE (eXternal Interrupt Virtualization Engine) hardware in the POWER9 chip directly, rather than via a XICS emulation in firmware. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-28 08:23:16 +10:00
Paolo Bonzini	c24a7be211	KVM/ARM Changes for v4.12. Changes include: - Using the common sysreg definitions between KVM and arm64 - Improved hyp-stub implementation with support for kexec and kdump on the 32-bit side - Proper PMU exception handling - Performance improvements of our GIC handling - Support for irqchip in userspace with in-kernel arch-timers and PMU support - A fix for a race condition in our PSCI code -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJY/IasAAoJEEtpOizt6ddyd7gH/2N3BIMxi/Uqigx0e0byA43s f+8gNq8A71VBTERGW2l9QP1/AZAXpQYNWdWmN2jn+91x2yoVL7AT00gEsliSLEZv tqZaTGFXKi1vNihYrxEWm1mfVNzhRrnbW6vjLrO4J5Advq7T3OWhNuVt2BLTxz3Y h0iqOWNVrUD9h3QSBFH8tz7yXhguDTSppAcXbE0tACdRu4vN50wqEWokHJG5TsMG Tl3KYWrcc3YCKlAJGuJi7t5rMrXk+g1q6HnxlIN6OSk0POC2Vmw9/Gigtltj1Qwh ZEAwsnka/U8ak8WaWeZa3EsGTSFSoAk/+pKv2FB8mFN+uOmWDqVlEiol4dW49AY= =mEOk -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.12. Changes include: - Using the common sysreg definitions between KVM and arm64 - Improved hyp-stub implementation with support for kexec and kdump on the 32-bit side - Proper PMU exception handling - Performance improvements of our GIC handling - Support for irqchip in userspace with in-kernel arch-timers and PMU support - A fix for a race condition in our PSCI code Conflicts: Documentation/virtual/kvm/api.txt include/uapi/linux/kvm.h	2017-04-27 17:33:14 +02:00
Paolo Bonzini	7a97cec26b	KVM: mark requests that need synchronization kvm_make_all_requests() provides a synchronization that waits until all kicked VCPUs have acknowledged the kick. This is important for KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is underway. This patch adds the synchronization property into all requests that are currently being used with kvm_make_all_requests() in order to preserve the current behavior and only introduce a new framework. Removing it from requests where it is not necessary is left for future patches. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-27 14:36:44 +02:00
Radim Krčmář	178f02ffaf	KVM: return if kvm_vcpu_wake_up() did wake up the VCPU No need to kick a VCPU that we have just woken up. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-27 14:18:45 +02:00
Andrew Jones	cde9af6e79	KVM: add explicit barrier to kvm_vcpu_kick kvm_vcpu_kick() must issue a general memory barrier prior to reading vcpu->mode in order to ensure correctness of the mutual-exclusion memory barrier pattern used with vcpu->requests. While the cmpxchg called from kvm_vcpu_kick(): kvm_vcpu_kick kvm_arch_vcpu_should_kick kvm_vcpu_exiting_guest_mode cmpxchg implies general memory barriers before and after the operation, that implication is only valid when cmpxchg succeeds. We need an explicit barrier for when it fails, otherwise a VCPU thread on its entry path that reads zero for vcpu->requests does not exclude the possibility the requesting thread sees !IN_GUEST_MODE when it reads vcpu->mode. kvm_make_all_cpus_request already had a barrier, so we remove it, as now it would be redundant. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-27 14:16:17 +02:00
Radim Krčmář	6c6e8360b3	KVM: perform a wake_up in kvm_make_all_cpus_request We want to have kvm_make_all_cpus_request() to be an optmized version of kvm_for_each_vcpu(i, vcpu, kvm) { kvm_make_request(vcpu, request); kvm_vcpu_kick(vcpu); } and kvm_vcpu_kick() wakes up the target vcpu. We know which requests do not need the wake up and use it to optimize the loop. Thanks to that, this patch doesn't change the behavior of current users (the all don't need the wake up) and only prepares for future where the wake up is going to be needed. I think that most requests do not need the wake up, so we would flip the bit then. Later on, kvm_make_request() will take care of kicking too, using this bit to make the decision whether to kick or not. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-27 14:14:33 +02:00
Radim Krčmář	75aaafb79f	KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up The #ifndef was protecting a missing halt_wakeup stat, but that is no longer necessary. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-27 14:13:32 +02:00
Benjamin Herrenschmidt	5af5099385	KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 21:37:29 +10:00
Paolo Bonzini	ec594c4710	Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD	2017-04-21 12:47:31 +02:00
Alexey Kardashevskiy	121f80ba68	KVM: PPC: VFIO: Add in-kernel acceleration for VFIO This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO without passing them to user space which saves time on switching to user space and back. This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM. KVM tries to handle a TCE request in the real mode, if failed it passes the request to the virtual mode to complete the operation. If it a virtual mode handler fails, the request is passed to the user space; this is not expected to happen though. To avoid dealing with page use counters (which is tricky in real mode), this only accelerates SPAPR TCE IOMMU v2 clients which are required to pre-register the userspace memory. The very first TCE request will be handled in the VFIO SPAPR TCE driver anyway as the userspace view of the TCE table (iommu_table::it_userspace) is not allocated till the very first mapping happens and we cannot call vmalloc in real mode. If we fail to update a hardware IOMMU table unexpected reason, we just clear it and move on as there is nothing really we can do about it - for example, if we hot plug a VFIO device to a guest, existing TCE tables will be mirrored automatically to the hardware and there is no interface to report to the guest about possible failures. This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd and associates a physical IOMMU table with the SPAPR TCE table (which is a guest view of the hardware IOMMU table). The iommu_table object is cached and referenced so we do not have to look up for it in real mode. This does not implement the UNSET counterpart as there is no use for it - once the acceleration is enabled, the existing userspace won't disable it unless a VFIO container is destroyed; this adds necessary cleanup to the KVM_DEV_VFIO_GROUP_DEL handler. This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user space. This adds real mode version of WARN_ON_ONCE() as the generic version causes problems with rcu_sched. Since we testing what vmalloc_to_phys() returns in the code, this also adds a check for already existing vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect(). This finally makes use of vfio_external_user_iommu_id() which was introduced quite some time ago and was considered for removal. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:26 +10:00
Marc Zyngier	cffcd9df10	KVM: arm/arm64: vgic-v3: Fix off-by-one LR access When iterating over the used LRs, be careful not to try to access an unused LR, or even an unimplemented one if you're unlucky... Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-19 17:28:38 +02:00
Marc Zyngier	ff567614d5	KVM: arm/arm64: vgic-v3: De-optimize VMCR save/restore when emulating a GICv2 When emulating a GICv2-on-GICv3, special care must be taken to only save/restore VMCR_EL2 when ICC_SRE_EL1.SRE is cleared. Otherwise, all Group-0 interrupts end-up being delivered as FIQ, which is probably not what the guest expects, as demonstrated here with an unhappy EFI: FIQ Exception at 0x000000013BD21CC4 This means that we cannot perform the load/put trick when dealing with VMCR_EL2 (because the host has SRE set), and we have to deal with it in the world-switch. Fortunately, this is not the most common case (modern guests should be able to deal with GICv3 directly), and the performance is not worse than what it was before the VMCR optimization. Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-19 17:28:38 +02:00
David Hildenbrand	8c6b7828c2	KVM: x86: cleanup return handling in setup_routing_entry() Let's drop the goto and return the error value directly. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-12 20:17:14 +02:00
David Hildenbrand	993225adf4	KVM: x86: rename kvm_vcpu_request_scan_ioapic() Let's rename it into a proper arch specific callback. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-12 20:17:14 +02:00
David Hildenbrand	1df6ddede1	KVM: x86: race between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP Avoid races between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP by taking the kvm->lock when setting up routes. If KVM_CREATE_IRQCHIP fails, KVM_SET_GSI_ROUTING could have already set up routes pointing at pic/ioapic, being silently removed already. Also, as a side effect, this patch makes sure that KVM_SET_GSI_ROUTING and KVM_CAP_SPLIT_IRQCHIP cannot run in parallel. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-12 20:17:13 +02:00
Christoffer Dall	3dbbdf7863	KVM: arm/arm64: Report PMU overflow interrupts to userspace irqchip When not using an in-kernel VGIC, but instead emulating an interrupt controller in userspace, we should report the PMU overflow status to that userspace interrupt controller using the KVM_CAP_ARM_USER_IRQ feature. Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:39 -07:00
Alexander Graf	d9e1397783	KVM: arm/arm64: Support arch timers with a userspace gic If you're running with a userspace gic or other interrupt controller (that is no vgic in the kernel), then you have so far not been able to use the architected timers, because the output of the architected timers, which are driven inside the kernel, was a kernel-only construct between the arch timer code and the vgic. This patch implements the new KVM_CAP_ARM_USER_IRQ feature, where we use a side channel on the kvm_run structure, run->s.regs.device_irq_level, to always notify userspace of the timer output levels when using a userspace irqchip. This works by ensuring that before we enter the guest, if the timer output level has changed compared to what we last told userspace, we don't enter the guest, but instead return to userspace to notify it of the new level. If we are exiting, because of an MMIO for example, and the level changed at the same time, the value is also updated and userspace can sample the line as it needs. This is nicely achieved simply always updating the timer_irq_level field after the main run loop. Note that the kvm_timer_update_irq trace event is changed to show the host IRQ number for the timer instead of the guest IRQ number, because the kernel no longer know which IRQ userspace wires up the timer signal to. Also note that this patch implements all required functionality but does not yet advertise the capability. Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:38 -07:00
Christoffer Dall	b22e7df2d8	KVM: arm/arm64: Cleanup the arch timer code's irqchip checking Currently we check if we have an in-kernel irqchip and if the vgic was properly implemented several places in the arch timer code. But, we already predicate our enablement of the arm timers on having a valid and initialized gic, so we can simply check if the timers are enabled or not. This also gets rid of the ugly "error that's not an error but used to signal that the timer shouldn't poke the gic" construct we have. Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:37 -07:00
Christoffer Dall	8ac76ef4b5	KVM: arm/arm64: vgic: Improve sync_hwstate performance There is no need to call any functions to fold LRs when we don't use any LRs and we don't need to mess with overflow flags, take spinlocks, or prune the AP list if the AP list is empty. Note: list_empty is a single atomic read (uses READ_ONCE) and can therefore check if a list is empty or not without the need to take the spinlock protecting the list. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:49:12 -07:00
Christoffer Dall	0b09b6e519	KVM: arm/arm64: vgic: Don't check vgic_initialized in sync/flush Now when we do an early init of the static parts of the VGIC data structures, we can do things like checking if the AP lists are empty directly without having to explicitly check if the vgic is initialized and reduce a bit of work in our critical path. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:49:11 -07:00
Christoffer Dall	966e014919	KVM: arm/arm64: vgic: Implement early VGIC init functionality Implement early initialization for both the distributor and the CPU interfaces. The basic idea is that even though the VGIC is not functional or not requested from user space, the critical path of the run loop can still call VGIC functions that just won't do anything, without them having to check additional initialization flags to ensure they don't look at uninitialized data structures. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:49:11 -07:00
Christoffer Dall	096f31c436	KVM: arm/arm64: vgic: Get rid of MISR and EISR fields We don't use these fields anymore so let's nuke them completely. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:10 -07:00
Christoffer Dall	b6095b084d	KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state Now when we don't look at the MISR and EISR values anymore, we can get rid of the logic to save them in the GIC save/restore code. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:09 -07:00
Christoffer Dall	af0614991a	KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation Since we always read back the LRs that we wrote to the guest and the MISR and EISR registers simply provide a summary of the configuration of the bits in the LRs, there is really no need to read back those status registers and process them. We might as well just signal the notifyfd when folding the LR state and save some cycles in the process. We now clear the underflow bit in the fold_lr_state functions as we only need to clear this bit if we had used all the LRs, so this is as good a place as any to do that work. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:07 -07:00
Christoffer Dall	90cac1f52a	KVM: arm/arm64: vgic: Only set underflow when actually out of LRs We currently assume that all the interrupts in our AP list will be queued to LRs, but that's not necessarily the case, because some of them could have been migrated away to different VCPUs and only the VCPU thread itself can remove interrupts from its AP list. Therefore, slightly change the logic to only setting the underflow interrupt when we actually run out of LRs. As it turns out, this allows us to further simplify the handling in vgic_sync_hwstate in later patches. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:45:32 -07:00
Christoffer Dall	00dafa0fcf	KVM: arm/arm64: vgic: Get rid of live_lrs There is no need to calculate and maintain live_lrs when we always populate the lowest numbered LRs first on every entry and clear all LRs on every exit. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:45:31 -07:00
Shih-Wei Li	f6769581e9	KVM: arm/arm64: vgic: Avoid flushing vgic state when there's no pending IRQ We do not need to flush vgic states in each world switch unless there is pending IRQ queued to the vgic's ap list. We can thus reduce the overhead by not grabbing the spinlock and not making the extra function call to vgic_flush_lr_state. Note: list_empty is a single atomic read (uses READ_ONCE) and can therefore check if a list is empty or not without the need to take the spinlock protecting the list. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:45:31 -07:00
Christoffer Dall	328e566479	KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put We don't have to save/restore the VMCR on every entry to/from the guest, since on GICv2 we can access the control interface from EL1 and on VHE systems with GICv3 we can access the control interface from KVM running in EL2. GICv3 systems without VHE becomes the rare case, which has to save/restore the register on each round trip. Note that userspace accesses may see out-of-date values if the VCPU is running while accessing the VGIC state via the KVM device API, but this is already the case and it is up to userspace to quiesce the CPUs before reading the CPU registers from the GIC for an up-to-date view. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:45:22 -07:00
Paolo Bonzini	4b4357e025	kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET public Its value has never changed; we might as well make it part of the ABI instead of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO). Because PPC does not always make MMIO available, the code has to be made dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-07 16:49:01 +02:00
Paolo Bonzini	3042255899	kvm: make KVM_CAP_COALESCED_MMIO architecture agnostic Remove code from architecture files that can be moved to virt/kvm, since there is already common code for coalesced MMIO. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Removed a pointless 'break' after 'return'.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-07 16:49:00 +02:00
Paolo Bonzini	ad6260da1e	KVM: x86: drop legacy device assignment Legacy device assignment has been deprecated since 4.2 (released 1.5 years ago). VFIO is better and everyone should have switched to it. If they haven't, this should convince them. :) Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-07 16:49:00 +02:00
Radim Krčmář	6fd6410311	KVM/ARM Fixes for v4.11-rc6 Fixes include: - Fix a problem with GICv3 userspace save/restore - Clarify GICv2 userspace save/restore ABI - Be more careful in clearing GIC LRs - Add missing synchronization primitive to our MMU handling code -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJY5MItAAoJEEtpOizt6ddy4mUH/1Z2rt2mUAYFQpWD/vy9WMxf zJKMtcLlZZGjeU78zFfWuOxEo1bbDO+tOTV1docNnY8xjyszCZ5XKOqMeo2a7Vfh 1QYHxJTOmgxcRmMsOnJpqUXhhYm9hDxrbU88U/wvoNllLjWBea01ZXiJbWFPBssT jrdtcCVstDGp3x3D91RgYNNzj9jNw80RBekACZZwYokDRpBZyUb8DYKfUgABFEKT UPiHrxb8UOVqvbCuXMBNzhUZcuMoAh3oY02R9sV7u1QOXAJYfRV4fOV12fIcYbHf tnyU8cCxEkSI1pHrpVG6SStcMt8yznQ+UPo0okQNBJXim2yI8+QKHtQlvx7Tjo8= =tPDd -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm From: Christoffer Dall <cdall@linaro.org> KVM/ARM Fixes for v4.11-rc6 Fixes include: - Fix a problem with GICv3 userspace save/restore - Clarify GICv2 userspace save/restore ABI - Be more careful in clearing GIC LRs - Add missing synchronization primitive to our MMU handling code	2017-04-05 16:27:47 +02:00
Christoffer Dall	6d56111c92	KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI As an oversight, for GICv2, we accidentally export the GICC_PMR register in the format of the GICH_VMCR.VMPriMask field in the lower 5 bits of a word, meaning that userspace must always use the lower 5 bits to communicate with the KVM device and must shift the value left by 3 places to obtain the actual priority mask level. Since GICv3 supports the full 8 bits of priority masking in the ICH_VMCR, we have to fix the value we export when emulating a GICv2 on top of a hardware GICv3 and exporting the emulated GICv2 state to userspace. Take the chance to clarify this aspect of the ABI. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-04 14:33:59 +02:00
Christoffer Dall	5b0d2cc280	KVM: arm64: Ensure LRs are clear when they should be We currently have some code to clear the list registers on GICv3, but we never call this code, because the caller got nuked when removing the old vgic. We also used to have a similar GICv2 part, but that got lost in the process too. Let's reintroduce the logic for GICv2 and call the logic when we initialize the use of hypervisors on the CPU, for example when first loading KVM or when exiting a low power state. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-04-04 14:33:58 +02:00
Herongguang (Stephen)	0292e169b2	KVM: pci-assign: do not map smm memory slot pages in vt-d page tables or VM memory are not put thus leaked in kvm_iommu_unmap_memslots() when destroy VM. This is consistent with current vfio implementation. Signed-off-by: herongguang <herongguang.he@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-03-28 10:08:54 +02:00
David Hildenbrand	90db10434b	KVM: kvm_io_bus_unregister_dev() should never fail No caller currently checks the return value of kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on freeing their device. A stale reference will remain in the io_bus, getting at least used again, when the iobus gets teared down on kvm_destroy_vm() - leading to use after free errors. There is nothing the callers could do, except retrying over and over again. So let's simply remove the bus altogether, print an error and make sure no one can access this broken bus again (returning -ENOMEM on any attempt to access it). Fixes: `e93f8a0f82` ("KVM: convert io_bus to SRCU") Cc: stable@vger.kernel.org # 3.4+ Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-03-23 19:02:25 +01:00
Ard Biesheuvel	63d7c6afc5	arm: kvm: move kvm_vgic_global_state out of .text section The kvm_vgic_global_state struct contains a static key which is written to by jump_label_init() at boot time. So in preparation of making .text regions truly (well, almost truly) read-only, mark kvm_vgic_global_state __ro_after_init so it moves to the .rodata section instead. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2017-03-23 13:53:46 +00:00
Peter Xu	df630b8c1e	KVM: x86: clear bus pointer when destroyed When releasing the bus, let's clear the bus pointers to mark it out. If any further device unregister happens on this bus, we know that we're done if we found the bus being released already. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-03-21 17:11:18 +01:00
Andre Przywara	a5e1e6ca94	KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled The ITS spec says that ITS commands are only processed when the ITS is enabled (section 8.19.4, Enabled, bit[0]). Our emulation was not taking this into account. Fix this by checking the enabled state before handling CWRITER writes. On the other hand that means that CWRITER could advance while the ITS is disabled, and enabling it would need those commands to be processed. Fix this case as well by refactoring actual command processing and calling this from both the GITS_CWRITER and GITS_CTLR handlers. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-03-07 15:44:08 +00:00
Jintack Lim	370a0ec181	KVM: arm/arm64: Let vcpu thread modify its own active state Currently, if a vcpu thread tries to change the active state of an interrupt which is already on the same vcpu's AP list, it will loop forever. Since the VGIC mmio handler is called after a vcpu has already synced back the LR state to the struct vgic_irq, we can just let it proceed safely. Cc: stable@vger.kernel.org Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-03-07 14:48:16 +00:00
Marc Zyngier	4dfc050571	KVM: arm/arm64: vgic-v3: Don't pretend to support IRQ/FIQ bypass Our GICv3 emulation always presents ICC_SRE_EL1 with DIB/DFB set to zero, which implies that there is a way to bypass the GIC and inject raw IRQ/FIQ by driving the CPU pins. Of course, we don't allow that when the GIC is configured, but we fail to indicate that to the guest. The obvious fix is to set these bits (and never let them being changed again). Reported-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-03-06 10:30:57 +00:00

1 2 3 4 5 ...

1271 Commits