linux

Author	SHA1	Message	Date
Miklos Szeredi	a9667ac88e	fuse: remove unused arg in fuse_write_file_get() The struct fuse_conn argument is not used and can be removed. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-09-06 13:37:10 +02:00
Miklos Szeredi	660585b56e	fuse: wait for writepages in syncfs In case of fuse the MM subsystem doesn't guarantee that page writeback completes by the time ->sync_fs() is called. This is because fuse completes page writeback immediately to prevent DoS of memory reclaim by the userspace file server. This means that fuse itself must ensure that writes are synced before sending the SYNCFS request to the server. Introduce sync buckets, that hold a counter for the number of outstanding write requests. On syncfs replace the current bucket with a new one and wait until the old bucket's counter goes down to zero. It is possible to have multiple syncfs calls in parallel, in which case there could be more than one waited-on buckets. Descendant buckets must not complete until the parent completes. Add a count to the child (new) bucket until the (parent) old bucket completes. Use RCU protection to dereference the current bucket and to wake up an emptied bucket. Use fc->lock to protect against parallel assignments to the current bucket. This leaves just the counter to be a possible scalability issue. The fc->num_waiting counter has a similar issue, so both should be addressed at the same time. Reported-by: Amir Goldstein <amir73il@gmail.com> Fixes: `2d82ab251e` ("virtiofs: propagate sync() to file server") Cc: <stable@vger.kernel.org> # v5.14 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-09-06 13:37:02 +02:00
Xie Yongji	7bc7f61897	Documentation: Add documentation for VDUSE VDUSE (vDPA Device in Userspace) is a framework to support implementing software-emulated vDPA devices in userspace. This document is intended to clarify the VDUSE design and usage. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-14-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:58 -04:00
Xie Yongji	c8a6153b6c	vduse: Introduce VDUSE - vDPA Device in Userspace This VDUSE driver enables implementing software-emulated vDPA devices in userspace. The vDPA device is created by ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device interface (/dev/vduse/$NAME) is exported to userspace for device emulation. In order to make the device emulation more secure, the device's control path is handled in kernel. A message mechnism is introduced to forward some dataplane related control messages to userspace. And in the data path, the DMA buffer will be mapped into userspace address space through different ways depending on the vDPA bus to which the vDPA device is attached. In virtio-vdpa case, the MMU-based software IOTLB is used to achieve that. And in vhost-vdpa case, the DMA buffer is reside in a userspace memory region which can be shared to the VDUSE userspace processs via transferring the shmfd. For more details on VDUSE design and usage, please see the follow-on Documentation commit. NB(mst): when merging this with `b542e383d8` ("eventfd: Make signal recursion protection a task bit") replace eventfd_signal_count with eventfd_signal_allowed, and drop the previous ("eventfd: Export eventfd_wake_count to modules"). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-13-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:58 -04:00
Xie Yongji	8c773d53fb	vduse: Implement an MMU-based software IOTLB This implements an MMU-based software IOTLB to support mapping kernel dma buffer into userspace dynamically. The basic idea behind it is treating MMU (VA->PA) as IOMMU (IOVA->PA). The software IOTLB will set up MMU mapping instead of IOMMU mapping for the DMA transfer so that the userspace process is able to use its virtual address to access the dma buffer in kernel. To avoid security issue, a bounce-buffering mechanism is introduced to prevent userspace accessing the original buffer directly which may contain other kernel data. During the mapping, unmapping, the software IOTLB will copy the data from the original buffer to the bounce buffer and back, depending on the direction of the transfer. And the bounce-buffer addresses will be mapped into the user address space instead of the original one. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-12-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:58 -04:00
Xie Yongji	d8945ec411	vdpa: Support transferring virtual addressing during DMA mapping This patch introduces an attribute for vDPA device to indicate whether virtual address can be used. If vDPA device driver set it, vhost-vdpa bus driver will not pin user page and transfer userspace virtual address instead of physical address during DMA mapping. And corresponding vma->vm_file and offset will be also passed as an opaque pointer. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-11-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:57 -04:00
Xie Yongji	22af48cf91	vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() The upcoming patch is going to support VA mapping/unmapping. So let's factor out the logic of PA mapping/unmapping firstly to make the code more readable. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-10-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:57 -04:00
Xie Yongji	c10fb9454a	vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() Add an opaque pointer for DMA mapping. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-9-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:57 -04:00
Xie Yongji	59dfe4f1e8	vhost-iotlb: Add an opaque pointer for vhost IOTLB Add an opaque pointer for vhost IOTLB. And introduce vhost_iotlb_add_range_ctx() to accept it. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-8-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:57 -04:00
Xie Yongji	7f05630dc6	vhost-vdpa: Handle the failure of vdpa_reset() The vdpa_reset() may fail now. This adds check to its return value and fail the vhost_vdpa_open(). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-7-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:57 -04:00
Xie Yongji	0686082dbf	vdpa: Add reset callback in vdpa_config_ops This adds a new callback to support device specific reset behavior. The vdpa bus driver will call the reset function instead of setting status to zero during resetting. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210831103634.33-6-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:57 -04:00
Xie Yongji	86e17a51c1	vdpa: Fix some coding style issues Fix some code indent issues and following checkpatch warning: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' 371: FILE: include/linux/vdpa.h:371: +static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset, Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-5-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:57 -04:00
Xie Yongji	9c930054f2	file: Export receive_fd() to modules Export receive_fd() so that some modules can use it to pass file descriptor between processes without missing any security stuffs. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-4-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:56 -04:00
Xie Yongji	7a6b92d33a	eventfd: Export eventfd_wake_count to modules Export eventfd_wake_count so that some modules can use the eventfd_signal_count() to check whether the eventfd_signal() call should be deferred to a safe context. NB(mst): this patch is not needed in Linus tree since there eventfd_signal_count() has been superseded by an already exported eventfd_signal_allowed(). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210831103634.33-3-xieyongji@bytedance.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:56 -04:00
Xie Yongji	a93a962669	iova: Export alloc_iova_fast() and free_iova_fast() Export alloc_iova_fast() and free_iova_fast() so that some modules can make use of the per-CPU cache to get rid of rbtree spinlock in alloc_iova() and free_iova() during IOVA allocation. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210831103634.33-2-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:56 -04:00
Max Gurtovoy	6105d1fe6f	virtio-blk: remove unneeded "likely" statements Usually we use "likely/unlikely" to optimize the fast path. Remove redundant "likely/unlikely" statements in the control path to simplify the code and make it easier to read. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20210905085717.7427-1-mgurtovoy@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-09-06 07:20:56 -04:00
Xianting Tian	81a83d7f4c	virtio-balloon: Use virtio_find_vqs() helper Use the helper virtio_find_vqs(). Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210723054259.2779-1-xianting.tian@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-09-06 07:20:56 -04:00
Zelin Deng	d9130a2dfd	KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted When MSR_IA32_TSC_ADJUST is written by guest due to TSC ADJUST feature especially there's a big tsc warp (like a new vCPU is hot-added into VM which has been up for a long time), tsc_offset is added by a large value then go back to guest. This causes system time jump as tsc_timestamp is not adjusted in the meantime and pvclock monotonic character. To fix this, just notify kvm to update vCPU's guest time before back to guest. Cc: stable@vger.kernel.org Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1619576521-81399-2-git-send-email-zelin.deng@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 07:07:03 -04:00
Paolo Bonzini	4ac214574d	KVM: MMU: mark role_regs and role accessors as maybe unused It is reasonable for these functions to be used only in some configurations, for example only if the host is 64-bits (and therefore supports 64-bit guests). It is also reasonable to keep the role_regs and role accessors in sync even though some of the accessors may be used only for one of the two sets (as is the case currently for CR4.LA57).. Because clang reports warnings for unused inlines declared in a .c file, mark both sets of accessors as __maybe_unused. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:56:38 -04:00
Huacai Chen	a3cf527e70	KVM: MIPS: Remove a "set but not used" variable This fix a build warning: arch/mips/kvm/vz.c: In function '_kvm_vz_restore_htimer': >> arch/mips/kvm/vz.c:392:10: warning: variable 'freeze_time' set but not used [-Wunused-but-set-variable] 392 \| ktime_t freeze_time; \| ^~~~~~~~~~~ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Message-Id: <20210406024911.2008046-1-chenhuacai@loongson.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:52:10 -04:00
Paolo Bonzini	e99314a340	Merge tag 'kvmarm-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.15 - Page ownership tracking between host EL1 and EL2 - Rely on userspace page tables to create large stage-2 mappings - Fix incompatibility between pKVM and kmemleak - Fix the PMU reset state, and improve the performance of the virtual PMU - Move over to the generic KVM entry code - Address PSCI reset issues w.r.t. save/restore - Preliminary rework for the upcoming pKVM fixed feature - A bunch of MM cleanups - a vGIC fix for timer spurious interrupts - Various cleanups	2021-09-06 06:34:48 -04:00
Paolo Bonzini	0d0a19395b	Merge tag 'kvm-s390-next-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix and feature for 5.15 - enable interpretion of specification exceptions - fix a vcpu_idx vs vcpu_id mixup	2021-09-06 06:33:40 -04:00
Lai Jiangshan	a40b2fd064	x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait Commit `f4e61f0c9a` ("x86/kvm: Fix broken irq restoration in kvm_wait") replaced "local_irq_restore() when IRQ enabled" with "local_irq_enable() when IRQ enabled" to suppress a warnning. Although there is no similar debugging warnning for doing local_irq_enable() when IRQ enabled as doing local_irq_restore() in the same IRQ situation. But doing local_irq_enable() when IRQ enabled is no less broken as doing local_irq_restore() and we'd better avoid it. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20210814035129.154242-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:32:35 -04:00
Jing Zhang	3cc4e148b9	KVM: stats: Add VM stat for remote tlb flush requests Add a new stat that counts the number of times a remote TLB flush is requested, regardless of whether it kicks vCPUs out of guest mode. This allows us to look at how often flushes are initiated. Unlike remote_tlb_flush, this one applies to ARM's instruction-set-based TLB flush implementation, so apply it there too. Original-by: David Matlack <dmatlack@google.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210817002639.3856694-1-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:30:45 -04:00
Sean Christopherson	fdde13c13f	KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count() Don't export KVM's MMU notifier count helpers, under no circumstance should any downstream module, including x86's vendor code, have a legitimate reason to piggyback KVM's MMU notifier logic. E.g in the x86 case, only KVM's MMU should be elevating the notifier count, and that code is always built into the core kvm.ko module. Fixes: `edb298c663` ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range") Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210902175951.1387989-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:21:29 -04:00
Sean Christopherson	1148bfc47b	KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page Move "lpage_disallowed_link" out of the first 64 bytes, i.e. out of the first cache line, of kvm_mmu_page so that "spt" and to a lesser extent "gfns" land in the first cache line. "lpage_disallowed_link" is accessed relatively infrequently compared to "spt", which is accessed any time KVM is walking and/or manipulating the shadow page tables. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901221023.1303578-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:20:05 -04:00
Sean Christopherson	ca41c34cab	KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality Move "tdp_mmu_page" into the 1-byte void left by the recently removed "mmio_cached" so that it resides in the first 64 bytes of kvm_mmu_page, i.e. in the same cache line as the most commonly accessed fields. Don't bother wrapping tdp_mmu_page in CONFIG_X86_64, including the field in 32-bit builds doesn't affect the size of kvm_mmu_page, and a future patch can always wrap the field in the unlikely event KVM gains a 1-byte flag that is 32-bit specific. Note, the size of kvm_mmu_page is also unchanged on CONFIG_X86_64=y due to it previously sharing an 8-byte chunk with write_flooding_count. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901221023.1303578-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:19:07 -04:00
Sean Christopherson	e7177339d7	Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" Revert a misguided illegal GPA check when "translating" a non-nested GPA. The check is woefully incomplete as it does not fill in @exception as expected by all callers, which leads to KVM attempting to inject a bogus exception, potentially exposing kernel stack information in the process. WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525 CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0 RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525 Call Trace: x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853 kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199 handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336 __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline] vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038 vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712 vcpu_run arch/x86/kvm/x86.c:9779 [inline] kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010 kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652 The bug has escaped notice because practically speaking the GPA check is useless. The GPA check in question only comes into play when KVM is walking guest page tables (or "translating" CR3), and KVM already handles illegal GPA checks by setting reserved bits in rsvd_bits_mask for each PxE, or in the case of CR3 for loading PTDPTRs, manually checks for an illegal CR3. This particular failure doesn't hit the existing reserved bits checks because syzbot sets guest.MAXPHYADDR=1, and IA32 architecture simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging doesn't define any reserved PA bits checks, which KVM emulates by only incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32. Simply remove the bogus check. There is zero meaningful value and no architectural justification for supporting guest.MAXPHYADDR < 32, and properly filling the exception would introduce non-trivial complexity. This reverts commit `ec7771ab47`. Fixes: `ec7771ab47` ("KVM: x86: mmu: Add guest physical address check in translate_gpa()") Cc: stable@vger.kernel.org Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210831164224.1119728-2-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:18:02 -04:00
Jia He	678a305b85	KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page After reverting and restoring the fast tlb invalidation patch series, the mmio_cached is not removed. Hence a unused field is left in kvm_mmu_page. Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jia He <justin.he@arm.com> Message-Id: <20210830145336.27183-1-justin.he@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:13:09 -04:00
Eduardo Habkost	1dbaf04cb9	kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 Support for 710 VCPUs was tested by Red Hat since RHEL-8.4, so increase KVM_SOFT_MAX_VCPUS to 710. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20210903211600.2002377-4-ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:12:05 -04:00
Eduardo Habkost	074c82c8f7	kvm: x86: Increase MAX_VCPUS to 1024 Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs. I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it might involve complicated questions around the meaning of "supported" and "recommended" in the upstream tree. KVM_SOFT_MAX_VCPUS will be changed in a separate patch. For reference, visible effects of this change are: - KVM_CAP_MAX_VCPUS will now return 1024 (of course) - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX will now be 1024 - KVM_MAX_VCPU_ID will change from 1151 to 4096 - Size of struct kvm will increase from 19328 to 22272 bytes (in x86_64) - Size of struct kvm_ioapic will increase from 1780 to 5084 bytes (in x86_64) - Bitmap stack variables that will grow: - At kvm_hv_flush_tlb() kvm_hv_send_ipi(), vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long - vcpu_bitmap at bioapic_write_indirect() will be 128 bytes long once patch "KVM: x86: Fix stack-out-of-bounds memory access from ioapic_write_indirect()" is applied Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20210903211600.2002377-3-ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:11:55 -04:00
Eduardo Habkost	4ddacd525a	kvm: x86: Set KVM_MAX_VCPU_ID to 4KVM_MAX_VCPUS Instead of requiring KVM_MAX_VCPU_ID to be manually increased every time we increase KVM_MAX_VCPUS, set it to 4KVM_MAX_VCPUS. This should be enough for CPU topologies where Cores-per-Package and Packages-per-Socket are not powers of 2. In practice, this increases KVM_MAX_VCPU_ID from 1023 to 1152. The only side effect of this change is making some fields in struct kvm_ioapic larger, increasing the struct size from 1628 to 1780 bytes (in x86_64). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20210903211600.2002377-2-ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:11:39 -04:00
Randy Dunlap	e4457a45b4	iwlwifi: fix printk format warnings in uefi.c The kernel test robot reports printk format warnings in uefi.c, so correct them. ../drivers/net/wireless/intel/iwlwifi/fw/uefi.c: In function 'iwl_uefi_get_pnvm': ../drivers/net/wireless/intel/iwlwifi/fw/uefi.c:52:30: warning: format '%zd' expects argument of type 'signed size_t', but argument 7 has type 'long unsigned int' [-Wformat=] 52 \| "PNVM UEFI variable not found %d (len %zd)\n", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 53 \| err, package_size); \| ~~~~~~~~~~~~ \| \| \| long unsigned int ../drivers/net/wireless/intel/iwlwifi/fw/uefi.c:59:29: warning: format '%zd' expects argument of type 'signed size_t', but argument 6 has type 'long unsigned int' [-Wformat=] 59 \| IWL_DEBUG_FW(trans, "Read PNVM from UEFI with size %zd\n", package_size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~ \| \| \| long unsigned int Fixes: `84c3c9952a` ("iwlwifi: move UEFI code to a separate file") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Luca Coelho <luciano.coelho@intel.com> Cc: linux-wireless@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210821020901.25901-1-rdunlap@infradead.org	2021-09-06 13:03:06 +03:00
Maxim Levitsky	81b4b56d4f	KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation If we are emulating an invalid guest state, we don't have a correct exit reason, and thus we shouldn't do anything in this function. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Fixes: `95b5a48c4f` ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 06:00:27 -04:00
Sean Christopherson	a717a780fc	KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host Include pml5_root in the set of special roots if and only if the host, and thus NPT, is using 5-level paging. mmu_alloc_special_roots() expects special roots to be allocated as a bundle, i.e. they're either all valid or all NULL. But for pml5_root, that expectation only holds true if the host uses 5-level paging, which causes KVM to WARN about pml5_root being NULL when the other special roots are valid. The silver lining of 4-level vs. 5-level NPT being tied to the host kernel's paging level is that KVM's shadow root level is constant; unlike VMX's EPT, KVM can't choose 4-level NPT based on guest.MAXPHYADDR. That means KVM can still expect pml5_root to be bundled with the other special roots, it just needs to be conditioned on the shadow root level. Fixes: `cb0f722aff` ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210824005824.205536-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-06 05:56:38 -04:00
Hector.Yuan	4855e26bcf	cpufreq: mediatek-hw: Add support for CPUFREQ HW Introduce cpufreq HW driver which can support CPU frequency adjust in MT6779 platform. Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com> [ Viresh: Massaged the patch and cleaned some stuff. ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-09-06 15:15:19 +05:30
Hector.Yuan	8486a32dd4	cpufreq: Add of_perf_domain_get_sharing_cpumask Add of_perf_domain_get_sharing_cpumask function to group cpu to specific performance domain. Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com> [ Viresh: create separate routine parse_perf_domain() and always set the cpumask. ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-09-06 15:14:35 +05:30
Hector.Yuan	a8bbe0c944	dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW Add devicetree bindings for MediaTek HW driver. Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-09-06 15:14:30 +05:30
Rafał Miłecki	c2f24933a1	dt-bindings: mfd: Add Broadcom CRU CRU is a block used in e.g. Northstar devices. It can be seen in the bcm5301x.dtsi and this binding documents its proper usage. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>	2021-09-06 10:40:52 +01:00
Daniel Vetter	dcc5d82063	drm/i915: Stop rcu support for i915_address_space The full audit is quite a bit of work: - i915_dpt has very simple lifetime (somehow we create a display pagetable vm per object, so its _very_ simple, there's only ever a single vma in there), and uses i915_vm_close(), which internally does a i915_vm_put(). No rcu. Aside: wtf is i915_dpt doing in the intel_display.c garbage collector as a new feature, instead of added as a separate file with some clean-ish interface. Also, i915_dpt unfortunately re-introduces some coding patterns from pre-dma_resv_lock conversion times. - i915_gem_proto_ctx is fully refcounted and no rcu, all protected by fpriv->proto_context_lock. - i915_gem_context is itself rcu protected, and that might leak to anything it points at. Before commit `cf977e1861` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Dec 2 11:21:40 2020 +0000 drm/i915/gem: Spring clean debugfs and commit `db80a1294c` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 18 11:08:54 2021 +0000 drm/i915/gem: Remove per-client stats from debugfs/i915_gem_objects we had a bunch of debugfs files that relied on rcu protecting everything, but those are gone now. The main one was removed even earlier with There doesn't seem to be anything left that's actually protecting stuff now that the ctx->vm itself is invariant. See commit `ccbc1b9794` Author: Jason Ekstrand <jason@jlekstrand.net> Date: Thu Jul 8 10:48:30 2021 -0500 drm/i915/gem: Don't allow changing the VM on running contexts (v4) Note that we drop the vm refcount before the final release of the gem context refcount, so this is all very dangerous even without rcu. Note that aside from later on creating new engines (a defunct feature) and debug output we're never looked at gem_ctx->vm for anything functional, hence why this is ok. Fingers crossed. Preceeding patches removed all vestiges of rcu use from gem_ctx->vm derferencing to make it clear it's really not used. The gem_ctx->rcu protection was introduced in commit `a4e7ccdac3` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Oct 4 14:40:09 2019 +0100 drm/i915: Move context management under GEM The commit message is somewhat entertaining because it fails to mention this fact completely, and compensates that by an in-commit changelog entry that claims that ctx->vm is protected by ctx->mutex. Which was the case _before_ this commit, but no longer after it. - intel_context holds a full reference. Unfortunately intel_context is also rcu protected and the reference to the ->vm is dropped before the rcu barrier - only the kfree is delayed. So again we need to check whether that leaks anywhere on the intel_context->vm. RCU is only used to protect intel_context sitting on the breadcrumb lists, which don't look at the vm anywhere, so we are fine. Nothing else relies on rcu protection of intel_context and hence is fully protected by the kref refcount alone, which protects intel_context->vm in turn. The breadcrumbs rcu usage was added in commit `c744d50363` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Nov 26 14:04:06 2020 +0000 drm/i915/gt: Split the breadcrumb spinlock between global and contexts its parent commit added the intel_context rcu protection: commit `14d1eaf088` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Nov 26 14:04:05 2020 +0000 drm/i915/gt: Protect context lifetime with RCU given some credence to my claim that I've actually caught them all. - drm_i915_gem_object's shares_resv_from pointer has a full refcount to the dma_resv, which is a sub-refcount that's released after the final i915_vm_put() has been called. Safe. Aside: Maybe we should have a struct dma_resv_shared which is just dma_resv + kref as a stand-alone thing. It's a pretty useful pattern which other drivers might want to copy. For a bit more context see commit `4d8151ae53` Author: Thomas Hellström <thomas.hellstrom@linux.intel.com> Date: Tue Jun 1 09:46:41 2021 +0200 drm/i915: Don't free shared locks while shared - the fpriv->vm_xa was relying on rcu_read_lock for lookup, but that was updated in a prep patch too to just be a spinlock-protected lookup. - intel_gt->vm is set at driver load in intel_gt_init() and released in intel_gt_driver_release(). There seems to be some issue that in some error paths this is called twice, but otherwise no rcu to be found anywhere. This was added in the below commit, which unfortunately doesn't explain why this complication exists. commit `e6ba764802` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Dec 21 16:03:24 2019 +0000 drm/i915: Remove i915->kernel_context The proper fix most likely for this is to start using drmm_ at large scale, but that's also huge amounts of work. - i915_vma->vm is some real pain, because rcu is rcu protected, at least in the vma lookup in the context lookup cache in eb_lookup_vma(). This was added in commit `4ff4b44cbb` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 16 15:05:16 2017 +0100 drm/i915: Store a direct lookup from object handle to vma This was changed to a radix tree from the hashtable in, but with the locking unchanged, in commit `d1b48c1e71` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Aug 16 09:52:08 2017 +0100 drm/i915: Replace execbuf vma ht with an idr In commit `93159e1235` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Mar 23 09:28:41 2020 +0000 drm/i915/gem: Avoid gem_context->mutex for simple vma lookup the locking was changed from dev->struct_mutex to rcu, which added the requirement to rcu protect i915_vma. Somehow this was missed in review (or I'm completely blind). Irrespective of all that the vma lookup cache rcu_read_lock grabs a full reference of the vma and the rcu doesn't leak further. So no impact on i915_address_space from that. I have not found any other rcu use for i915_vma, but given that it seems broken I also didn't bother to do a careful in-depth audit. Alltogether there's nothing left in-tree anymore which requires that a pointer deref to an i915_address_space is safe undre rcu_read_lock only. rcu protection of i915_address_space was introduced in commit `b32fa81115` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jun 20 19:37:05 2019 +0100 drm/i915/gtt: Defer address space cleanup to an RCU worker by mixing up a bugfixing (i915_address_space needs to be released from a worker) with enabling rcu support. The commit message also seems somewhat confused, because it talks about cleanup of WC pages requiring sleep, while the code and linked bugzilla are about a requirement to take dev->struct_mutex (which yes sleeps but it's a much more specific problem). Since final kref_put can be called from pretty much anywhere (including hardirq context through the scheduler's i915_active cleanup) we need a worker here. Hence that part must be kept. Ideally all these reclaim workers should have some kind of integration with our shrinkers, but for some of these it's rather tricky. Anyway, that's a preexisting condition in the codeebase that we wont fix in this patch here. We also remove the rcu_barrier in ggtt_cleanup_hw added in commit `60a4233a49` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jul 29 14:24:12 2019 +0100 drm/i915: Flush the i915_vm_release before ggtt shutdown Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-11-daniel.vetter@ffwll.ch	2021-09-06 11:11:56 +02:00
Daniel Vetter	8431515218	drm/i915: use xa_lock/unlock for fpriv->vm_xa lookups We don't need the absolute speed of rcu for this. And i915_address_space in general dont need rcu protection anywhere else, after we've made gem contexts and engines a lot more immutable. Note that this semantically reverts commit `aabbe344dc` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Aug 30 19:03:25 2019 +0100 drm/i915: Use RCU for unlocked vm_idr lookup except we have the conversion from idr to xarray in between. v2: kref_get_unless_zero is no longer required (Maarten) Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-10-daniel.vetter@ffwll.ch	2021-09-06 11:11:45 +02:00
Daniel Vetter	9ec8795e7d	drm/i915: Drop __rcu from gem_context->vm It's been invariant since commit `ccbc1b9794` Author: Jason Ekstrand <jason@jlekstrand.net> Date: Thu Jul 8 10:48:30 2021 -0500 drm/i915/gem: Don't allow changing the VM on running contexts (v4) this just completes the deed. I've tried to split out prep work for more careful review as much as possible, this is what's left: - get_ppgtt gets simplified since we don't need to grab a temporary reference - we can rely on the temporary reference for the gem_ctx while we inspect the vm. The new vm_id still needs a full i915_vm_open ofc. This also removes the final caller of context_get_vm_rcu - A pile of selftests can now just look at ctx->vm instead of rcu_dereference_protected( , true) or similar things. - All callers of i915_gem_context_vm also disappear. - I've changed the hugepage selftest to set scrub_64K without any locking, because when we inspect that setting we're also not taking any locks either. It works because it's a selftests that's careful (single threaded gives you nice ordering) and not a live driver where races can happen from anywhere. These can only be split up further if we have some intermediate state with a bunch more rcu_dereference_protected(ctx->vm, true), just to shut up lockdep and sparse. The conversion to __rcu happened in commit `a4e7ccdac3` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Oct 4 14:40:09 2019 +0100 drm/i915: Move context management under GEM Note that we're not breaking the actual bugfix in there: The real bugfix is pushing the i915_vm_relase onto a separate worker, to avoid locking inversion issues. The rcu conversion was just thrown in for entertainment value on top (no vm lookup isn't even close to anything that's a hotpath where removing the single spinlock can be measured). v2: Rebase over the change to move the i915_vm_put() into i915_gem_context_release(). v3: Trivial conflict against repainted shed. Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-9-daniel.vetter@ffwll.ch	2021-09-06 11:11:32 +02:00
Daniel Vetter	0483a30187	drm/i915: Use i915_gem_context_get_eb_vm in intel_context_set_gem Since commit `ccbc1b9794` Author: Jason Ekstrand <jason@jlekstrand.net> Date: Thu Jul 8 10:48:30 2021 -0500 drm/i915/gem: Don't allow changing the VM on running contexts (v4) the gem_ctx->vm can't change anymore. Plus we always set the intel_context->vm, so might as well use the helper we have for that. This makes it very clear that we always overwrite intel_context->vm for userspace contexts, since the default is gt->vm, which is explicitly reserved for kernel context use. It would be good to split things up a bit further and avoid any possibility for an accident where we run kernel stuff in userspace vm or the other way round. Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-8-daniel.vetter@ffwll.ch	2021-09-06 11:04:35 +02:00
Daniel Vetter	a82a9979de	drm/i915: Add i915_gem_context_is_full_ppgtt And use it anywhere we have open-coded checks for ctx->vm that really only check for full ppgtt. Plus for paranoia add a GEM_BUG_ON that checks it's really only set when we have full ppgtt, just in case. gem_context->vm is different since it's NULL in ggtt mode, unlike intel_context->vm or gt->vm, which is always set. v2: 0day found a testcase that I missed. v3: Repaint shed (Jon, Tvrtko) Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-7-daniel.vetter@ffwll.ch	2021-09-06 10:53:04 +02:00
Daniel Vetter	24fad29e52	drm/i915: Use i915_gem_context_get_eb_vm in ctx_getparam Consolidates the "which is the vm my execbuf runs in" code a bit. We do some get/put which isn't really required, but all the other users want the refcounting, and I figured doing a function just for this getparam to avoid 2 atomis is a bit much. Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-6-daniel.vetter@ffwll.ch	2021-09-06 10:52:03 +02:00
Daniel Vetter	c6d04e48d2	drm/i915: Rename i915_gem_context_get_vm_rcu to i915_gem_context_get_eb_vm The important part isn't so much that this does an rcu lookup - that's more an implementation detail, which will also be removed. The thing that makes this different from other functions is that it's gettting you the vm that batchbuffers will run in for that gem context, which is either a full ppgtt stored in gem->ctx, or the ggtt. We'll make more use of this function later on. Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-5-daniel.vetter@ffwll.ch	2021-09-06 10:46:04 +02:00
Daniel Vetter	e1068a9e80	drm/i915: Drop code to handle set-vm races from execbuf Changing the vm from a finalized gem ctx is no longer possible, which means we don't have to check for that anymore. I was pondering whether to keep the check as a WARN_ON, but things go boom real bad real fast if the vm of a vma is wrong. Plus we'd need to also get the ggtt vm for !full-ppgtt platforms. Ditching it all seemed like a better idea. Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> References: `ccbc1b9794` ("drm/i915/gem: Don't allow changing the VM on running contexts (v4)") Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-4-daniel.vetter@ffwll.ch	2021-09-06 10:45:56 +02:00
Daniel Vetter	8cf97637ff	drm/i915: Keep gem ctx->vm alive until the final put The comment added in commit `b81dde7194` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue May 21 22:11:29 2019 +0100 drm/i915: Allow userspace to clone contexts on creation and moved in commit `27dbae8f36` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 6 09:13:12 2019 +0000 drm/i915/gem: Safely acquire the ctx->vm when copying suggested that i915_address_space were at least intended to be managed through SLAB_TYPESAFE_BY_RCU: * This ppgtt may have be reallocated between * the read and the kref, and reassigned to a third * context. In order to avoid inadvertent sharing * of this ppgtt with that third context (and not * src), we have to confirm that we have the same * ppgtt after passing through the strong memory * barrier implied by a successful * kref_get_unless_zero(). But extensive git history search has not brough any such reuse to light. What has come to light though is that ever since commit `2850748ef8` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Oct 4 14:39:58 2019 +0100 drm/i915: Pull i915_vma_pin under the vm->mutex (yes this commit is earlier) the final i915_vma_put call has been moved from i915_gem_context_free (now called _release) to context_close, which means it's not actually safe anymore to access the ctx->vm pointer without lock helds, because it might disappear at any moment. Note that superficially things all still work, because the i915_address_space is RCU protected since commit `b32fa81115` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jun 20 19:37:05 2019 +0100 drm/i915/gtt: Defer address space cleanup to an RCU worker except the very clever macro above (which is designed to protected against object reuse due to SLAB_TYPESAFE_BY_RCU or similar tricks) results in an endless loop if the refcount of the ctx->vm ever permanently drops to 0. Which it totally now can. Fix that by moving the final i915_vm_put to where it should be. Note that i915_gem_context is rcu protected, but _only_ the final kfree. This means anyone who chases a pointer to a gem ctx solely under the protection can pretty only call kref_get_unless_zero(). This seems to be pretty much the case, aside from a bunch of cases that consult the scheduling information without any further protection. Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Fixes: `2850748ef8` ("drm/i915: Pull i915_vma_pin under the vm->mutex") Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-3-daniel.vetter@ffwll.ch	2021-09-06 10:45:48 +02:00
Daniel Vetter	c238980efd	drm/i915: Release ctx->syncobj on final put, not on ctx close gem context refcounting is another exercise in least locking design it seems, where most things get destroyed upon context closure (which can race with anything really). Only the actual memory allocation and the locks survive while holding a reference. This tripped up Jason when reimplementing the single timeline feature in commit `00dae4d3d3` Author: Jason Ekstrand <jason@jlekstrand.net> Date: Thu Jul 8 10:48:12 2021 -0500 drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4) We could fix the bug by holding ctx->mutex in execbuf and clear the pointer (again while holding the mutex) context_close, but it's cleaner to just make the context object actually invariant over its _entire_ lifetime. This way any other ioctl that's potentially racing, but holding a full reference, can still rely on ctx->syncobj being an immutable pointer. Which without this change, is not the case. Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Fixes: `00dae4d3d3` ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-2-daniel.vetter@ffwll.ch	2021-09-06 10:45:40 +02:00
Daniel Vetter	75eefd8258	drm/i915: Release i915_gem_context from a worker The only reason for this really is the i915_gem_engines->fence callback engines_notify(), which exists purely as a fairly funky reference counting scheme for that. Otherwise all other callers are from process context, and generally fairly benign locking context. Unfortunately untangling that requires some major surgery, and we have a few i915_gem_context reference counting bugs that need fixing, and they blow in the current hardirq calling context, so we need a stop-gap measure. Put a FIXME comment in when this should be removable again. v2: Fix mock_context(), noticed by intel-gfx-ci. Acked-by: Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-1-daniel.vetter@ffwll.ch	2021-09-06 10:45:29 +02:00

... 9 10 11 12 13 ...

1042822 Commits