linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-28 15:11:31 +00:00

Author	SHA1	Message	Date
David Hildenbrand	f55484fd7b	virtio-mem: check if the config changed before fake offlining memory If we repeatedly fail to fake offline memory to unplug it, we won't be sending any unplug requests to the device. However, we only check if the config changed when sending such (un)plug requests. We could end up trying for a long time to unplug memory, even though the config changed already and we're not supposed to unplug memory anymore. For example, the hypervisor might detect a low-memory situation while unplugging memory and decide to replug some memory. Continuing trying to unplug memory in that case can be problematic. So let's check on a more regular basis. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-5-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:51:46 -04:00
David Hildenbrand	a31648fd4f	virtio-mem: keep retrying on offline_and_remove_memory() errors in Sub Block Mode (SBM) In case offline_and_remove_memory() fails in SBM, we leave a completely unplugged Linux memory block stick around until we try plugging memory again. We won't try removing that memory block again. offline_and_remove_memory() may, for example, fail if we're racing with another alloc_contig_range() user, if allocating temporary memory fails, or if some memory notifier rejected the offlining request. Let's handle that case better, by simple retrying to offline and remove such memory. Tested using CONFIG_MEMORY_NOTIFIER_ERROR_INJECT. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-4-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:51:46 -04:00
David Hildenbrand	ddf4098514	virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY Just like we do with alloc_contig_range(), let's convert all unknown errors to -EBUSY, but WARN so we can look into the issue. For example, offline_pages() could fail with -EINTR, which would be unexpected in our case. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-3-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:51:46 -04:00
David Hildenbrand	f504e15b94	virtio-mem: remove unsafe unplug in Big Block Mode (BBM) When "unsafe unplug" is enabled, we don't fake-offline all memory ahead of actual memory offlining using alloc_contig_range(). Instead, we rely on offline_pages() to also perform actual page migration, which might fail or take a very long time. In that case, it's possible to easily run into endless loops that cannot be aborted anymore (as offlining is triggered by a workqueue then): For example, a single (accidentally) permanently unmovable page in ZONE_MOVABLE results in an endless loop. For ZONE_NORMAL, races between isolating the pageblock (and checking for unmovable pages) and concurrent page allocation are possible and similarly result in endless loops. The idea of the unsafe unplug mode was to make it possible to more reliably unplug large memory blocks. However, (a) we really should be tackling that differently, by extending the alloc_contig_range()-based mechanism; and (b) this mode is not the default and as far as I know, it's unused either way. So let's simply get rid of it. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-2-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:51:46 -04:00
Gal Pressman	df95570464	virtio-vdpa: Fix cpumask memory leak in virtio_vdpa_find_vqs() Free the cpumask allocated by create_affinity_masks() before returning from the function. Fixes: `3dad56823b` ("virtio-vdpa: Support interrupt affinity spreading mechanism") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230726191036.14324-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>	2023-08-10 15:24:28 -04:00
Feng Liu	13f3efaca0	virtio-pci: Fix legacy device flag setting error in probe The 'is_legacy' flag is used to differentiate between legacy vs modern device. Currently, it is based on the value of vp_dev->ldev.ioaddr. However, due to the shared memory of the union between struct virtio_pci_legacy_device and struct virtio_pci_modern_device, when virtio_pci_modern_probe modifies the content of struct virtio_pci_modern_device, it affects the content of struct virtio_pci_legacy_device, and ldev.ioaddr is no longer zero, causing the 'is_legacy' flag to be set as true. To resolve issue, when legacy device is probed, mark 'is_legacy' as true, when modern device is probed, keep 'is_legacy' as false. Fixes: `4f0fc22534` ("virtio_pci: Optimize virtio_pci_device structure size") Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20230719154550.79536-1-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-08-10 15:24:28 -04:00
Wolfram Sang	55c91fedd0	virtio-mmio: don't break lifecycle of vm_dev vm_dev has a separate lifecycle because it has a 'struct device' embedded. Thus, having a release callback for it is correct. Allocating the vm_dev struct with devres totally breaks this protection, though. Instead of waiting for the vm_dev release callback, the memory is freed when the platform_device is removed. Resulting in a use-after-free when finally the callback is to be called. To easily see the problem, compile the kernel with CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs. The fix is easy, don't use devres in this case. Found during my research about object lifetime problems. Fixes: `7eb781b1bb` ("virtio_mmio: add cleanup for virtio_mmio_probe") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-08-10 15:24:27 -04:00
Shannon Nelson	5d7d82d39e	virtio: allow caller to override device DMA mask in vp_modern To add a bit of vendor flexibility with various virtio based devices, allow the caller to specify a different DMA mask. This adds a dma_mask field to struct virtio_pci_modern_device. If defined by the driver, this mask will be used in a call to dma_set_mask_and_coherent() instead of the traditional DMA_BIT_MASK(64). This allows limiting the DMA space on vendor devices with address limitations. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230519215632.12343-3-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-06-27 10:47:08 -04:00
Shannon Nelson	a37c0191ac	virtio: allow caller to override device id in vp_modern To add a bit of vendor flexibility with various virtio based devices, allow the caller to check for a different device id. This adds a function pointer field to struct virtio_pci_modern_device to specify an override device id check. If defined by the driver, this function will be called to check that the PCI device is the vendor's expected device, and will return the found device id to be stored in mdev->id.device. This allows vendors with alternative vendor device ids to use this library on their own device BAR. Note: A lot of the diff in this is simply indenting the existing code into an else block. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230519215632.12343-2-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-06-27 10:47:08 -04:00
Feng Liu	4f0fc22534	virtio_pci: Optimize virtio_pci_device structure size Improve the size of the virtio_pci_device structure, which is commonly used to represent a virtio PCI device. A given virtio PCI device can either of legacy type or modern type, with the struct virtio_pci_legacy_device occupying 32 bytes and the struct virtio_pci_modern_device occupying 88 bytes. Make them a union, thereby save 32 bytes of memory as shown by the pahole tool. This improvement is particularly beneficial when dealing with numerous devices, as it helps conserve memory resources. Before the modification, pahole tool reported the following: struct virtio_pci_device { [...] struct virtio_pci_legacy_device ldev; /* 824 32 / / --- cacheline 13 boundary (832 bytes) was 24 bytes ago --- / struct virtio_pci_modern_device mdev; / 856 88 / / XXX last struct has 4 bytes of padding / [...] / size: 1056, cachelines: 17, members: 19 / [...] }; After the modification, pahole tool reported the following: struct virtio_pci_device { [...] union { struct virtio_pci_legacy_device ldev; / 824 32 / struct virtio_pci_modern_device mdev; / 824 88 / }; / 824 88 / [...] / size: 1024, cachelines: 16, members: 18 */ [...] }; Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20230516135446.16266-1-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-06-27 10:47:08 -04:00
Dragos Tatulea	fe37efba47	virtio-vdpa: Fix unchecked call to NULL set_vq_affinity The referenced patch calls set_vq_affinity without checking if the op is valid. This patch adds the check. Fixes: `3dad56823b` ("virtio-vdpa: Support interrupt affinity spreading mechanism") Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230504135053.2283816-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Feng Liu <feliu@nvidia.com>	2023-06-27 10:47:08 -04:00
Linus Torvalds	7fa8a8ee94	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page(). - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY= =J+Dj -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ...	2023-04-27 19:42:02 -07:00
Linus Torvalds	8ccd54fe45	virtio,vhost,vdpa: features, fixes, cleanups reduction in interrupt rate in virtio perf improvement for VDUSE scalability for vhost-scsi non power of 2 ring support for packed rings better management for mlx5 vdpa suspend for snet VIRTIO_F_NOTIFICATION_DATA shared backend with vdpa-sim-blk user VA support in vdpa-sim better struct packing for virtio fixes, cleanups all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmRG+QcPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpMyAIALpq8Z9ljl7ADGLuvt/xeCnIdifo7NXam71s +algalRplF3QplnMxZ0vH19Z8Gvyl18fkk/l0tHoCrZZgyseYR6DbyZXPv8YIfFh NSBokhil+ZURH6eNJc2PLcBUF3QIL3rSv7tBq7/++PN3KIqdHIePbyUFLlwqb272 NLkOkHT30QBtncRWJORj/GqDxi/4H1zHDmfMd6xD/1B6IrC3gin205RnLuCa2H65 bP0IE025VrmrRqNGX7nhi7dIFo6SmMPwG5O0YWeEhFHaSOL9PJM/Z9EN4tLhC1v1 Y34fryH9e+MMSgBnCK2ExxTq/pGWsbhPbvisDfDf3M1m1HHfhYI= =N1SV -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "virtio,vhost,vdpa: features, fixes, and cleanups: - reduction in interrupt rate in virtio - perf improvement for VDUSE - scalability for vhost-scsi - non power of 2 ring support for packed rings - better management for mlx5 vdpa - suspend for snet - VIRTIO_F_NOTIFICATION_DATA - shared backend with vdpa-sim-blk - user VA support in vdpa-sim - better struct packing for virtio and fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (52 commits) vhost_vdpa: fix unmap process in no-batch mode MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS tools/virtio: fix build caused by virtio_ring changes virtio_ring: add a struct device forward declaration vdpa_sim_blk: support shared backend vdpa_sim: move buffer allocation in the devices vdpa/snet: use likely/unlikely macros in hot functions vdpa/snet: implement kick_vq_with_data callback virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support virtio: add VIRTIO_F_NOTIFICATION_DATA feature support vdpa/snet: support the suspend vDPA callback vdpa/snet: support getting and setting VQ state MAINTAINERS: add vringh.h to Virtio Core and Net Drivers vringh: address kdoc warnings vdpa: address kdoc warnings virtio_ring: don't update event idx on get_buf vdpa_sim: add support for user VA vdpa_sim: replace the spinlock with a mutex to protect the state vdpa_sim: use kthread worker vdpa_sim: make devices agnostic for work management ...	2023-04-27 17:05:34 -07:00
Alvaro Karsz	2c4e4a22a3	virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support Add VIRTIO_F_NOTIFICATION_DATA support for vDPA transport. If this feature is negotiated, the driver passes extra data when kicking a virtqueue. A device that offers this feature needs to implement the kick_vq_with_data callback. kick_vq_with_data receives the vDPA device and data. data includes: 16 bits vqn and 16 bits next available index for split virtqueues. 16 bits vqs, 15 least significant bits of next available index and 1 bit next_wrap for packed virtqueues. This patch follows a patch [1] by Viktor Prutyanov which adds support for the MMIO, channel I/O and modern PCI transports. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230413081855.36643-3-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-04-21 03:02:35 -04:00
Viktor Prutyanov	af8ececda1	virtio: add VIRTIO_F_NOTIFICATION_DATA feature support According to VirtIO spec v1.2, VIRTIO_F_NOTIFICATION_DATA feature indicates that the driver passes extra data along with the queue notifications. In a split queue case, the extra data is 16-bit available index. In a packed queue case, the extra data is 1-bit wrap counter and 15-bit available index. Add support for this feature for MMIO, channel I/O and modern PCI transports. Signed-off-by: Viktor Prutyanov <viktor@daynix.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230413081855.36643-2-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-04-21 03:02:35 -04:00
Albert Huang	6c0b057cec	virtio_ring: don't update event idx on get_buf In virtio_net, if we disable napi_tx, when we trigger a tx interrupt, the vq->event_triggered will be set to true. It is then never reset until we explicitly call virtqueue_enable_cb_delayed or virtqueue_enable_cb_prepare. If we disable the napi_tx, virtqueue_enable_cb* will only be called when the tx ring is getting relatively empty. Since event_triggered is true, VRING_AVAIL_F_NO_INTERRUPT or VRING_PACKED_EVENT_FLAG_DISABLE will not be set. As a result we update vring_used_event(&vq->split.vring) or vq->packed.vring.driver->off_wrap every time we call virtqueue_get_buf_ctx. This causes more interrupts. To summarize: 1) event_triggered was set to true in vring_interrupt() 2) after this nothing will happen in virtqueue_disable_cb() so VRING_AVAIL_F_NO_INTERRUPT is not set in avail_flags_shadow 3) virtqueue_get_buf_ctx_split() will still think the cb is enabled and then it will publish a new event index To fix: update VRING_AVAIL_F_NO_INTERRUPT or VRING_PACKED_EVENT_FLAG_DISABLE in the vq when we call virtqueue_disable_cb even when event_triggered is true. Tested with iperf: iperf3 tcp stream: vm1 -----------------> vm2 vm2 just receives tcp data stream from vm1, and sends acks to vm1, there are many tx interrupts in vm2. with the patch applied there are just a few tx interrupts. v2->v3: -update the interrupt disable flag even with the event_triggered is set, -instead of checking whether event_triggered is set in -virtqueue_get_buf_ctx_{packed/split}, will cause the drivers which have -not called virtqueue_{enable/disable}_cb to miss notifications. v3->v4: -remove change for -"if (vq->packed.event_flags_shadow != VRING_PACKED_EVENT_FLAG_DISABLE)" -in virtqueue_disable_cb_packed Fixes: `8d622d21d2` ("virtio: fix up virtio_disable_cb") Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230329102300.61000-1-huangjie.albert@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-04-21 03:02:34 -04:00
Xie Yongji	5e68470f4e	vdpa: Add eventfd for the vdpa callback Add eventfd for the vdpa callback so that user can signal it directly instead of triggering the callback. It will be used for vhost-vdpa case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-9-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-04-21 03:02:32 -04:00
Xie Yongji	3dad56823b	virtio-vdpa: Support interrupt affinity spreading mechanism To support interrupt affinity spreading mechanism, this makes use of group_cpus_evenly() to create an irq callback affinity mask for each virtqueue of vdpa device. Then we will unify set_vq_affinity callback to pass the affinity to the vdpa device driver. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-4-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-04-21 03:02:31 -04:00
Xie Yongji	1d24692732	vdpa: Add set/get_vq_affinity callbacks in vdpa_config_ops This introduces set/get_vq_affinity callbacks in vdpa_config_ops to support virtqueue affinity management for vdpa device drivers. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-3-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-04-21 03:02:31 -04:00
Feng Liu	a084983dcd	virtio_ring: Allow non power of 2 sizes for packed virtqueue According to the Virtio Specification, the Queue Size parameter of a virtqueue corresponds to the maximum number of descriptors in that queue, and it does not have to be a power of 2 for packed virtqueues. However, the virtio_pci_modern driver enforced a power of 2 check for virtqueue sizes, which is unnecessary and restrictive for packed virtuqueue. Split virtqueue still needs to check the virtqueue size is power_of_2 which has been done in vring_alloc_queue_split of the virtio_ring layer. To validate this change, we tested various virtqueue sizes for packed rings, including 128, 256, 512, 100, 200, 500, and 1000, with CONFIG_PAGE_POISONING enabled, and all tests passed successfully. Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20230315185458.11638-2-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-04-21 03:02:31 -04:00
Feng Liu	4b6ec919b8	virtio_ring: Use const to annotate read-only pointer params Add const to make the read-only pointer parameters clear, similar to many existing functions. To implement this change, the commit also introduces the use of `container_of_const` to implement `to_vvq`, which ensures the const-ness of read-only parameters and avoids accidental modification of their members. Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Message-Id: <20230310053428.3376-4-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-04-21 03:02:30 -04:00
Feng Liu	1adbd6b2fc	virtio_ring: Avoid using inline for small functions According to kernel coding style [1], defining inline functions is not necessary and beneficial for simple functions. Hence clean up the code by removing the inline keyword. It is verified with GCC 12.2.0, the generated code with/without inline is same. Additionally tested with pktgen and iperf, and verified the result, the pps test results are the same in the cases of with/without inline. Iperf and pps of pktgen for virtio-net didn't change before and after the change. [1] https://www.kernel.org/doc/html/v6.2-rc3/process/coding-style.html#the-inline-disease Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20230310053428.3376-3-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-04-21 03:02:30 -04:00
Rob Herring	c5111a5b1a	virtio-mmio: Add explicit include for of.h With linux/acpi.h no longer implicitly including of.h, add an explicit include of of.h to fix the following error: drivers/virtio/virtio_mmio.c:492:13: error: implicit declaration of function 'of_property_read_bool'; did you mean 'fwnode_property_read_bool'? [-Werror=implicit-function-declaration] Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2023-04-06 20:36:27 +02:00
Kirill A. Shutemov	23baf831a3	mm, treewide: redefine MAX_ORDER sanely MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-04-05 19:42:46 -07:00
Linus Torvalds	84cc6674b7	virtio,vhost,vdpa: features, fixes device feature provisioning in ifcvf, mlx5 new SolidNET driver support for zoned block device in virtio blk numa support in virtio pmem VIRTIO_F_RING_RESET support in vhost-net more debugfs entries in mlx5 resume support in vdpa completion batching in virtio blk cleanup of dma api use in vdpa now simulating more features in vdpa-sim documentation, features, fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmP0D98PHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpV6IH/iecRgLMWWjp3n31IFdu31f/J4HpF7dczVjK qtV98eJ1N2pkgeJkdCfmB5XszfvFBeAurrS7++FTHiJhrRfR3Z+2ml/Qtvh5DEyP qxz6wOw6VVsi/txdUxM1wsxLeEmmzkmFdAmPM+FXeIjhWj76GOgy/4A3eaj6TgzV W8ShsBve/UZ5qMOC3XbIscvdOrudHJ18tH90Tiz3NZfH1fAs5E4uWbU6Mrz9DJVr canGvf4kAI9z8qram5HSgzPIXRJEYiF4q/eiStdtiiME8gL1mHLRZDNP1I1LeCAb q6Q6RCRKi3Ek+LGdH6u+nR1Swu03N2b/g+vgKtv30kJo06oZVzw= =EasV -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: - device feature provisioning in ifcvf, mlx5 - new SolidNET driver - support for zoned block device in virtio blk - numa support in virtio pmem - VIRTIO_F_RING_RESET support in vhost-net - more debugfs entries in mlx5 - resume support in vdpa - completion batching in virtio blk - cleanup of dma api use in vdpa - now simulating more features in vdpa-sim - documentation, features, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa/mlx5: support device features provisioning vdpa/mlx5: make MTU/STATUS presence conditional on feature bits vdpa: validate device feature provisioning against supported class vdpa: validate provisioned device features against specified attribute vdpa: conditionally read STATUS in config space vdpa: fix improper error message when adding vdpa dev vdpa/mlx5: Initialize CVQ iotlb spinlock vdpa/mlx5: Don't clear mr struct on destroy MR vdpa/mlx5: Directly assign memory key tools/virtio: enable to build with retpoline vringh: fix a typo in comments for vringh_kiov vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails scsi: virtio_scsi: fix handling of kmalloc failure vdpa: Fix a couple of spelling mistakes in some messages vhost-net: support VIRTIO_F_RING_RESET vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit vdpa: mlx5: support per virtqueue dma device vdpa: set dma mask for vDPA device virtio-vdpa: support per vq dma device vdpa: introduce get_vq_dma_device() ...	2023-02-25 11:48:02 -08:00
Jason Wang	a1baedb11e	virtio-vdpa: support per vq dma device This patch adds the support of per vq dma device for virito-vDPA. vDPA parents then are allowed to use different DMA devices. This is useful for the parents that have software or emulated virtqueues. Reviewed-by: Eli Cohen <elic@nvidia.com> Tested-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230119061525.75068-4-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-02-20 19:26:58 -05:00
Jason Wang	2713ea3c7d	virtio_ring: per virtqueue dma device This patch introduces a per virtqueue dma device. This will be used for virtio devices whose virtqueue are backed by different underlayer devices. One example is the vDPA that where the control virtqueue could be implemented through software mediation. Some of the work are actually done before since the helper like vring_dma_device(). This work left are: - Let vring_dma_device() return the per virtqueue dma device instead of the vdev's parent. - Allow passing a dma_device when creating the virtqueue through a new helper, old vring creation helper will keep using vdev's parent. Reviewed-by: Eli Cohen <elic@nvidia.com> Tested-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230119061525.75068-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-02-20 19:26:58 -05:00
Greg Kroah-Hartman	2a81ada32f	driver core: make struct bus_type.uevent() take a const * The uevent() callback in struct bus_type should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-27 13:45:52 +01:00
Dawei Li	c8e82e3877	virtio: Implementing attribute show with sysfs_emit Replace sprintf with sysfs_emit or its variants for their built-in PAGE_SIZE awareness. Signed-off-by: Dawei Li <set_pte_at@outlook.com> Message-Id: <TYCP286MB23232A999FE7DBDF50BA0FAACA0F9@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-12-28 05:28:11 -05:00
Angus Chen	b66ead2d0e	virtio_pci: modify ENOENT to EINVAL Virtio_crypto use max_data_queues+1 to setup vqs, we use vp_modern_get_num_queues to protect the vq range in setup_vq. We could enter index >= vp_modern_get_num_queues(mdev) in setup_vq if common->num_queues is not set well,and it return -ENOENT. It is better to use -EINVAL instead. Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com> Message-Id: <20221101111655.1947-1-angus.chen@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-12-28 05:28:09 -05:00
Shaoqin Huang	b9d978a892	virtio_ring: use helper function is_power_of_2() Use helper function is_power_of_2() to check if num is power of two. Minor readability improvement. Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com> Message-Id: <20221021062734.228881-3-shaoqin.huang@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>	2022-12-28 05:28:09 -05:00
Shaoqin Huang	344686136d	virtio_pci: use helper function is_power_of_2() Use helper function is_power_of_2() to check if num is power of two. Minor readability improvement. Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com> Message-Id: <20221021062734.228881-2-shaoqin.huang@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>	2022-12-28 05:28:09 -05:00
Michael S. Tsirkin	2145ab513e	virtio_pci: use irq to detect interrupt support commit `71491c54ea` ("virtio_pci: don't try to use intxif pin is zero") breaks virtio_pci on powerpc, when running as a qemu guest. vp_find_vqs() bails out because pci_dev->pin == 0. But pci_dev->irq is populated correctly, so vp_find_vqs_intx() would succeed if we called it - which is what the code used to do. This seems to happen because pci_dev->pin is not populated in pci_assign_irq(). A PCI core bug? Maybe. However Linus said: I really think that that is basically the only time you should use that 'pci_dev->pin' thing: it basically exists not for "does this device have an IRQ", but for "what is the routing of this irq on this device". and The correct way to check for "no irq" doesn't use NO_IRQ at all, it just does if (dev->irq) ... so let's just check irq and be done with it. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Fixes: `71491c54ea` ("virtio_pci: don't try to use intxif pin is zero") Cc: "Angus Chen" <angus.chen@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221012220312.308522-1-mst@redhat.com>	2022-10-13 09:33:03 -04:00
Linus Torvalds	27bc50fc90	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam R. Howlett. An overlapping range-based tree for vmas. It it apparently slight more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com). This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU= =xfWx -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...	2022-10-10 17:53:04 -07:00
Angus Chen	71491c54ea	virtio_pci: don't try to use intxif pin is zero The background is that we use dpu in cloud computing,the arch is x86,80 cores. We will have a lots of virtio devices,like 512 or more. When we probe about 200 virtio_blk devices,it will fail and the stack is printed as follows: [25338.485128] virtio-pci 0000:b3:00.0: virtio_pci: leaving for legacy driver [25338.496174] genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer) [25338.503822] CPU: 20 PID: 5431 Comm: kworker/20:0 Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.30.1.el8.x86_64 [25338.516403] Hardware name: Inspur NF5280M5/YZMB-00882-10E, BIOS 4.1.21 08/25/2021 [25338.523881] Workqueue: events work_for_cpu_fn [25338.528235] Call Trace: [25338.530687] dump_stack+0x5c/0x80 [25338.534000] __setup_irq.cold.53+0x7c/0xd3 [25338.538098] request_threaded_irq+0xf5/0x160 [25338.542371] vp_find_vqs+0xc7/0x190 [25338.545866] init_vq+0x17c/0x2e0 [virtio_blk] [25338.550223] ? ncpus_cmp_func+0x10/0x10 [25338.554061] virtblk_probe+0xe6/0x8a0 [virtio_blk] [25338.558846] virtio_dev_probe+0x158/0x1f0 [25338.562861] really_probe+0x255/0x4a0 [25338.566524] ? __driver_attach_async_helper+0x90/0x90 [25338.571567] driver_probe_device+0x49/0xc0 [25338.575660] bus_for_each_drv+0x79/0xc0 [25338.579499] __device_attach+0xdc/0x160 [25338.583337] bus_probe_device+0x9d/0xb0 [25338.587167] device_add+0x418/0x780 [25338.590654] register_virtio_device+0x9e/0xe0 [25338.595011] virtio_pci_probe+0xb3/0x140 [25338.598941] local_pci_probe+0x41/0x90 [25338.602689] work_for_cpu_fn+0x16/0x20 [25338.606443] process_one_work+0x1a7/0x360 [25338.610456] ? create_worker+0x1a0/0x1a0 [25338.614381] worker_thread+0x1cf/0x390 [25338.618132] ? create_worker+0x1a0/0x1a0 [25338.622051] kthread+0x116/0x130 [25338.625283] ? kthread_flush_work_fn+0x10/0x10 [25338.629731] ret_from_fork+0x1f/0x40 [25338.633395] virtio_blk: probe of virtio418 failed with error -16 The log : "genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer)" was printed because of the irq 0 is used by timer exclusive,and when vp_find_vqs call vp_find_vqs_msix and returns false twice (for whatever reason), then it will call vp_find_vqs_intx as a fallback. Because vp_dev->pci_dev->irq is zero, we request irq 0 with flag IRQF_SHARED, and get a backtrace like above. According to PCI spec about "Interrupt Pin" Register (Offset 3Dh): "The Interrupt Pin register is a read-only register that identifies the legacy interrupt Message(s) the Function uses. Valid values are 01h, 02h, 03h, and 04h that map to legacy interrupt Messages for INTA, INTB, INTC, and INTD respectively. A value of 00h indicates that the Function uses no legacy interrupt Message(s)." So if vp_dev->pci_dev->pin is zero, we should not request legacy interrupt. Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220930000915.548-1-angus.chen@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-10-07 20:00:44 -04:00
Deming Wang	f7adf38928	virtio_ring: make vring_alloc_queue_packed prettier Add some spaces to vring_alloc_queue(make it look prettier). Signed-off-by: Deming Wang <wangdeming@inspur.com> Message-Id: <20220926183306.4535-1-wangdeming@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-10-07 09:32:40 -04:00
Deming Wang	bdeb2f9836	virtio_ring: split: Operators use unified style The operators of vring_alloc_queue_split should use the unified style.Add space for the '\|' ,make it be looked more pretty. Signed-off-by: Deming Wang <wangdeming@inspur.com> Message-Id: <20220926022202.1516-1-wangdeming@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-10-07 09:32:40 -04:00
Alexander Potapenko	88938359e2	virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg() If vring doesn't use the DMA API, KMSAN is unable to tell whether the memory is initialized by hardware. Explicitly call kmsan_handle_dma() from vring_map_one_sg() in this case to prevent false positives. Link: https://lkml.kernel.org/r/20220915150417.722975-23-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-10-03 14:03:22 -07:00
Ricardo Cañuelo	5c669c4a4c	virtio: kerneldocs fixes and enhancements Fix variable names in some kerneldocs, naming in others. Add kerneldocs for struct vring_desc and vring_interrupt. Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Message-Id: <20220810094004.1250-2-ricardo.canuelo@collabora.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>	2022-08-16 01:40:24 -04:00
Michael S. Tsirkin	9993a4f989	virtio: Revert "virtio: find_vqs() add arg sizes" This reverts commit `a10fba0377`: the proposed API isn't supported on all transports but no effort was made to address this. It might not be hard to fix if we want to: maybe just rename size to size_hint and make sure legacy transports ignore the hint. But it's not sure what the benefit is in any case, so let's drop it. Fixes: `a10fba0377` ("virtio: find_vqs() add arg sizes") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220816053602.173815-8-mst@redhat.com>	2022-08-16 01:40:24 -04:00
Michael S. Tsirkin	9e82eb574c	virtio_vdpa: Revert "virtio_vdpa: support the arg sizes of find_vqs()" This reverts commit `99e8927d8a`: proposed API isn't supported on all transports but no effort was made to address this. It might not be hard to fix if we want to: maybe just rename size to size_hint and make sure legacy transports ignore the hint. But it's not sure what the benefit is in any case, so let's drop it. Fixes: `99e8927d8a` ("virtio_vdpa: support the arg sizes of find_vqs()") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220816053602.173815-6-mst@redhat.com>	2022-08-16 01:40:18 -04:00
Michael S. Tsirkin	13aa8c6c37	virtio_pci: Revert "virtio_pci: support the arg sizes of find_vqs()" This reverts commit `cdb44806fc`: the legacy path is wrong and in fact can not support the proposed API since for a legacy device we never communicate the vq size to the hypervisor. Reported-by: Andres Freund <andres@anarazel.de> Fixes: `cdb44806fc` ("virtio_pci: support the arg sizes of find_vqs()") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220816053602.173815-5-mst@redhat.com>	2022-08-16 01:38:29 -04:00
Michael S. Tsirkin	c62f61b58f	virtio-mmio: Revert "virtio_mmio: support the arg sizes of find_vqs()" This reverts commit `fbed86abba`. The API is now unused, let's not carry dead code around. Fixes: `fbed86abba` ("virtio_mmio: support the arg sizes of find_vqs()") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220816053602.173815-4-mst@redhat.com>	2022-08-16 01:38:29 -04:00
Linus Torvalds	7a53e17acc	virtio: fatures, fixes A huge patchset supporting vq resize using the new vq reset capability. Features, fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmL2F9APHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRp00QIAKpxyu+zCtrdDuh68DsNn1Cu0y0PXG336ySy MA1ck/bv94MZBIbI/Bnn3T1jDmUqTFHJiwaGz/aZ5gGAplZiejhH5Ds3SYjHckaa MKeJ4FTXin9RESP+bXhv4BgZ+ju3KHHkf1jw3TAdVKQ7Nma1u4E6f8nprYEi0TI0 7gLUYenqzS7X1+v9O3rEvPr7tSbAKXYGYpV82sSjHIb9YPQx5luX1JJIZade8A25 mTt5hG1dP1ugUm1NEBPQHjSvdrvO3L5Ahy0My2Bkd77+tOlNF4cuMPt2NS/6+Pgd n6oMt3GXqVvw5RxZyY8dpknH5kofZhjgFyZXH0l+aNItfHUs7t0= =rIo2 -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: - A huge patchset supporting vq resize using the new vq reset capability - Features, fixes, and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (88 commits) vdpa/mlx5: Fix possible uninitialized return value vdpa_sim_blk: add support for discard and write-zeroes vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests vdpa_sim_blk: check if sector is 0 for commands other than read or write vdpa_sim: Implement suspend vdpa op vhost-vdpa: uAPI to suspend the device vhost-vdpa: introduce SUSPEND backend feature bit vdpa: Add suspend operation virtio-blk: Avoid use-after-free on suspend/resume virtio_vdpa: support the arg sizes of find_vqs() vhost-vdpa: Call ida_simple_remove() when failed vDPA: fix 'cast to restricted le16' warnings in vdpa.c vDPA: !FEATURES_OK should not block querying device config space vDPA/ifcvf: support userspace to query features and MQ of a management device vDPA/ifcvf: get_config_size should return a value no greater than dev implementation vhost scsi: Allow user to control num virtqueues vhost-scsi: Fix max number of virtqueues vdpa/mlx5: Support different address spaces for control and data vdpa/mlx5: Implement susupend virtqueue callback ...	2022-08-12 09:50:34 -07:00
Bo Liu	99e8927d8a	virtio_vdpa: support the arg sizes of find_vqs() Virtio vdpa support the new parameter sizes of find_vqs(). Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20220810085151.7251-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:26:09 -04:00
Bo Liu	95bf979877	virtio: Check dev_set_name() return value It's possible that dev_set_name() returns -ENOMEM, catch and handle this. Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20220707031751.4802-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2022-08-11 04:26:07 -04:00
Xuan Zhuo	fbed86abba	virtio_mmio: support the arg sizes of find_vqs() Virtio MMIO support the new parameter sizes of find_vqs(). Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-36-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:41 -04:00
Xuan Zhuo	cdb44806fc	virtio_pci: support the arg sizes of find_vqs() Virtio PCI supports new parameter sizes of find_vqs(). Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-35-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:41 -04:00
Xuan Zhuo	a10fba0377	virtio: find_vqs() add arg sizes find_vqs() adds a new parameter sizes to specify the size of each vq vring. NULL as sizes means that all queues in find_vqs() use the maximum size. A value in the array is 0, which means that the corresponding queue uses the maximum size. In the split scenario, the meaning of size is the largest size, because it may be limited by memory, the virtio core will try a smaller size. And the size is power of 2. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-34-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:40 -04:00
Xuan Zhuo	04ca0b0b16	virtio_pci: support VIRTIO_F_RING_RESET This patch implements virtio pci support for QUEUE RESET. Performing reset on a queue is divided into these steps: 1. notify the device to reset the queue 2. recycle the buffer submitted 3. reset the vring (may re-alloc) 4. mmap vring to device, and enable the queue This patch implements virtio_reset_vq(), virtio_enable_resetq() in the pci scenario. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-33-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:40 -04:00
Xuan Zhuo	56bdc06139	virtio_pci: extract the logic of active vq for modern pci Introduce vp_active_vq() to configure vring to backend after vq attach vring. And configure vq vector if necessary. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-32-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:40 -04:00
Xuan Zhuo	0b50cece0b	virtio_pci: introduce helper to get/set queue reset Introduce new helpers to implement queue reset and get queue reset status. https://github.com/oasis-tcs/virtio-spec/issues/124 https://github.com/oasis-tcs/virtio-spec/issues/139 Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-31-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:40 -04:00
Xuan Zhuo	4913e85441	virtio_ring: struct virtqueue introduce reset Introduce a new member reset to the structure virtqueue to determine whether the current vq is in the reset state. Subsequent patches will use it. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-29-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:40 -04:00
Xuan Zhuo	3251063155	virtio: allow to unbreak/break virtqueue individually This patch allows the new introduced __virtqueue_break()/__virtqueue_unbreak() to break/unbreak the virtqueue. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-27-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:40 -04:00
Xuan Zhuo	c790e8e181	virtio_ring: introduce virtqueue_resize() Introduce virtqueue_resize() to implement the resize of vring. Based on these, the driver can dynamically adjust the size of the vring. For example: ethtool -G. virtqueue_resize() implements resize based on the vq reset function. In case of failure to allocate a new vring, it will give up resize and use the original vring. During this process, if the re-enable reset vq fails, the vq can no longer be used. Although the probability of this situation is not high. The parameter recycle is used to recycle the buffer that is no longer used. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-25-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:40 -04:00
Xuan Zhuo	947f9fcf67	virtio_ring: packed: introduce virtqueue_resize_packed() virtio ring packed supports resize. Only after the new vring is successfully allocated based on the new num, we will release the old vring. In any case, an error is returned, indicating that the vring still points to the old vring. In the case of an error, re-initialize(by virtqueue_reinit_packed()) the virtqueue to ensure that the vring can be used. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-24-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:40 -04:00
Xuan Zhuo	56775e141b	virtio_ring: packed: introduce virtqueue_reinit_packed() Introduce a function to initialize vq without allocating new ring, desc_state, desc_extra. Subsequent patches will call this function after reset vq to reinitialize vq. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-23-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:40 -04:00
Xuan Zhuo	51d649f14a	virtio_ring: packed: extract the logic of attach vring Separate the logic of attach vring, the subsequent patch will call it separately. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-22-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	1a107c87eb	virtio_ring: packed: extract the logic of vring init Separate the logic of initializing vring, and subsequent patches will call it separately. This function completes the variable initialization of packed vring. It together with the logic of atatch constitutes the initialization of vring. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-21-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	ef3167cfd5	virtio_ring: packed: extract the logic of alloc state and extra Separate the logic for alloc desc_state and desc_extra, which will be called separately by subsequent patches. Use struct vring_packed to pass desc_state, desc_extra. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-20-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	6b60b9c008	virtio_ring: packed: extract the logic of alloc queue Separate the logic of packed to create vring queue. This feature is required for subsequent virtuqueue reset vring. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-19-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	6356f8bb01	virtio_ring: packed: introduce vring_free_packed Free the structure struct vring_vritqueue_packed. Subsequent patches require it. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-18-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	6fea20e561	virtio_ring: split: introduce virtqueue_resize_split() virtio ring split supports resize. Only after the new vring is successfully allocated based on the new num, we will release the old vring. In any case, an error is returned, indicating that the vring still points to the old vring. In the case of an error, re-initialize(virtqueue_reinit_split()) the virtqueue to ensure that the vring can be used. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-17-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	af36b16f6c	virtio_ring: split: reserve vring_align, may_reduce_num In vring_alloc_queue_split() save vring_align, may_reduce_num to structure vring_virtqueue_split. Used to create a new vring when implementing resize. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-16-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	e5175b419a	virtio_ring: split: introduce virtqueue_reinit_split() Introduce a function to initialize vq without allocating new ring, desc_state, desc_extra. Subsequent patches will call this function after reset vq to reinitialize vq. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-15-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	e1d6a423ea	virtio_ring: split: extract the logic of attach vring Separate the logic of attach vring, subsequent patches will call it separately. virtqueue_vring_init_split() completes the initialization of other variables of vring split. We can directly use vq->split = *vring_split to complete attach. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-14-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	198fa7be96	virtio_ring: split: extract the logic of vring init Separate the logic of initializing vring, and subsequent patches will call it separately. This function completes the variable initialization of split vring. It together with the logic of atatch constitutes the initialization of vring. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-13-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	a2b36c8d7d	virtio_ring: split: extract the logic of alloc state and extra Separate the logic of creating desc_state, desc_extra, and subsequent patches will call it independently. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-12-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	c2d87fe68d	virtio_ring: split: extract the logic of alloc queue Separate the logic of split to create vring queue. This feature is required for subsequent virtuqueue reset vring. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-11-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	89f05d94a3	virtio_ring: split: introduce vring_free_split() Free the structure struct vring_vritqueue_split. Subsequent patches require it. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-10-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:39 -04:00
Xuan Zhuo	cd4c812acb	virtio_ring: split: __vring_new_virtqueue() accept struct vring_virtqueue_split __vring_new_virtqueue() instead accepts struct vring_virtqueue_split. The purpose of this is to pass more information into __vring_new_virtqueue() to make the code simpler and the structure cleaner. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-9-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:38 -04:00
Xuan Zhuo	07d9629d49	virtio_ring: split: stop __vring_new_virtqueue as export symbol There is currently only one place to reference __vring_new_virtqueue() directly from the outside of virtio core. And here vring_new_virtqueue() can be used instead. Subsequent patches will modify __vring_new_virtqueue, so stop it as an export symbol for now. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-8-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:38 -04:00
Xuan Zhuo	3a897128d3	virtio_ring: introduce virtqueue_init() Separate the logic of virtqueue initialization. These variables should be reset during reset. This logic can be called independently when implementing resize/reset later. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-7-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:38 -04:00
Xuan Zhuo	d76136e434	virtio_ring: split vring_virtqueue Separate the two inline structures(split and packed) from the structure vring_virtqueue. In this way, we can use these two structures later to pass parameters and retain temporary variables. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-6-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:38 -04:00
Xuan Zhuo	3ea19e3265	virtio_ring: extract the logic of freeing vring Introduce vring_free() to free the vring of vq. Subsequent patches will use vring_free() alone. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-5-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:38 -04:00
Xuan Zhuo	a62eecb3a9	virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset Added documentation for virtqueue_detach_unused_buf, allowing it to be called on queue reset. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-4-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:38 -04:00
Xuan Zhuo	da80296183	virtio: record the maximum queue num supported by the device. virtio-net can display the maximum (supported by hardware) ring size in ethtool -g eth0. When the subsequent patch implements vring reset, it can judge whether the ring size passed by the driver is legal based on this. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-2-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:38 -04:00
David Hildenbrand	0b6fd46ec5	drivers/virtio: Clarify CONFIG_VIRTIO_MEM for unsupported architectures Let's make it clearer that simply unlocking CONFIG_VIRTIO_MEM on an architecture is most probably not sufficient to have it working as expected. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20220610094737.65254-1-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:38 -04:00
Minghao Xue	02213273f7	virtio_mmio: add support to set IRQ of a virtio device as wakeup source According to virtio_mmio wakeup flag in device trees, set its IRQ as wakeup source in virtqueue initialization. Signed-off-by: Minghao Xue <quic_mingxue@quicinc.com> Message-Id: <1654851507-13891-3-git-send-email-quic_mingxue@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:38 -04:00
Michael S. Tsirkin	ebe797f25f	virtio: VIRTIO_HARDEN_NOTIFICATION is broken This option doesn't really work and breaks too many drivers. Not yet sure what's the right thing to do, for now let's make sure randconfig isn't broken by this. Fixes: `c346dae4f3` ("virtio: disable notification hardening by default") Cc: "Jason Wang" <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2022-08-11 04:06:37 -04:00
Xuan Zhuo	96ef18a24b	virtio_ring: remove the arg vq of vring_alloc_desc_extra() The parameter vq of vring_alloc_desc_extra() is useless. This patch removes this parameter. Subsequent patches will call this function to avoid passing useless arguments. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220624025621.128843-6-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-08-11 04:06:37 -04:00
Linus Torvalds	6614a3c316	- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/ SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE= =w/UH -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...	2022-08-05 16:32:45 -07:00
Linus Torvalds	7447691ef9	xen: branch for v6.0-rc1 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYuooOQAKCRCAXGG7T9hj vmmlAPoCfYBh4jKwRnvGvyn+sPQed/r0TH0wnsGK1ccONhyIvAD+IZcSTPsnp4Cj m1URGGff2PvAyjOIAzQZbKZomtfICwM= =z2e5 -----END PGP SIGNATURE----- Merge tag 'for-linus-6.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - a series fine tuning virtio support for Xen guests, including removal the now again unused "platform_has()" feature. - a fix for host admin triggered reboot of Xen guests - a simple spelling fix * tag 'for-linus-6.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: don't require virtio with grants for non-PV guests kernel: remove platform_has() infrastructure virtio: replace restricted mem access flag with callback xen: Fix spelling mistake xen/manage: Use orderly_reboot() to reboot	2022-08-04 15:10:55 -07:00
Linus Torvalds	f00654007f	Folio changes for 6.0 - Fix an accounting bug that made NR_FILE_DIRTY grow without limit when running xfstests - Convert more of mpage to use folios - Remove add_to_page_cache() and add_to_page_cache_locked() - Convert find_get_pages_range() to filemap_get_folios() - Improvements to the read_cache_page() family of functions - Remove a few unnecessary checks of PageError - Some straightforward filesystem conversions to use folios - Split PageMovable users out from address_space_operations into their own movable_operations - Convert aops->migratepage to aops->migrate_folio - Remove nobh support (Christoph Hellwig) -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmLpViQACgkQDpNsjXcp gj5pBgf/f3+K7Hi3qw7aYQCYJQ7IA/bLyE/DLWI59kuiao6wDSve40B9YH9X++Ha mRLp55bkQS+bwS2xa4jlqrIDJzAfNoWlXaXZHUXGL1C/52ChTF6jaH2cvO9PVlDS 7fLv1hy2LwiIdzpKJkUW7T+kcQGj3QLKqtQ4x8zD0LGMg055yvt/qndHSUi41nWT /58+6W8Sk4vvRgkpeChFzF1lGLy00+FGT8y5V2kM9uRliFQ7XPCwqB2a3e5jbW6z C1NXQmRnopCrnOT1TFIhK3DyX6MDIWV5qcikNAmCKFb9fQFPmjDLPt9iSoMGjw2M Z+UVhJCaU3ISccd0DG5Ra/vzs9/O9Q== =DgUi -----END PGP SIGNATURE----- Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache Pull folio updates from Matthew Wilcox: - Fix an accounting bug that made NR_FILE_DIRTY grow without limit when running xfstests - Convert more of mpage to use folios - Remove add_to_page_cache() and add_to_page_cache_locked() - Convert find_get_pages_range() to filemap_get_folios() - Improvements to the read_cache_page() family of functions - Remove a few unnecessary checks of PageError - Some straightforward filesystem conversions to use folios - Split PageMovable users out from address_space_operations into their own movable_operations - Convert aops->migratepage to aops->migrate_folio - Remove nobh support (Christoph Hellwig) * tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits) fs: remove the NULL get_block case in mpage_writepages fs: don't call ->writepage from __mpage_writepage fs: remove the nobh helpers jfs: stop using the nobh helper ext2: remove nobh support ntfs3: refactor ntfs_writepages mm/folio-compat: Remove migration compatibility functions fs: Remove aops->migratepage() secretmem: Convert to migrate_folio hugetlb: Convert to migrate_folio aio: Convert to migrate_folio f2fs: Convert to filemap_migrate_folio() ubifs: Convert to filemap_migrate_folio() btrfs: Convert btrfs_migratepage to migrate_folio mm/migrate: Add filemap_migrate_folio() mm/migrate: Convert migrate_page() to migrate_folio() nfs: Convert to migrate_folio btrfs: Convert btree_migratepage to migrate_folio mm/migrate: Convert expected_page_refs() to folio_expected_refs() mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio() ...	2022-08-03 10:35:43 -07:00
Matthew Wilcox (Oracle)	68f2736a85	mm: Convert all PageMovable users to movable_operations These drivers are rather uncomfortably hammered into the address_space_operations hole. They aren't filesystems and don't behave like filesystems. They just need their own movable_operations structure, which we can point to directly from page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:03 -04:00
Juergen Gross	a603002eea	virtio: replace restricted mem access flag with callback Instead of having a global flag to require restricted memory access for all virtio devices, introduce a callback which can select that requirement on a per-device basis. For convenience add a common function returning always true, which can be used for use cases like SEV. Per default use a callback always returning false. As the callback needs to be set in early init code already, add a virtio anchor which is builtin in case virtio is enabled. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> # Arm64 guest using Xen Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20220622063838.8854-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>	2022-08-01 07:42:49 +02:00
Kefeng Wang	07252dfea2	mm: use is_zone_movable_page() helper Use is_zone_movable_page() helper to simplify code. Link: https://lkml.kernel.org/r/20220726131135.146912-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-29 18:07:20 -07:00
Roman Gushchin	e33c267ab7	mm: shrinkers: provide shrinkers with names Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-03 18:08:40 -07:00
Deming Wang	c7cc29aaeb	virtio_ring: make vring_create_virtqueue_split prettier Add some spaces to vring_alloc_queue(make it look prettier). Signed-off-by: Deming Wang <wangdeming@inspur.com> Message-Id: <20220622192306.4371-1-wangdeming@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-06-27 08:05:35 -04:00
Stephan Gerhold	e0c2ce8217	virtio_mmio: Restore guest page size on resume Virtio devices might lose their state when the VMM is restarted after a suspend to disk (hibernation) cycle. This means that the guest page size register must be restored for the virtio_mmio legacy interface, since otherwise the virtio queues are not functional. This is particularly problematic for QEMU that currently still defaults to using the legacy interface for virtio_mmio. Write the guest page size register again in virtio_mmio_restore() to make legacy virtio_mmio devices work correctly after hibernation. Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Message-Id: <20220621110621.3638025-3-stephan.gerhold@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-06-27 08:05:35 -04:00
Stephan Gerhold	ed7ac37fde	virtio_mmio: Add missing PM calls to freeze/restore Most virtio drivers provide freeze/restore callbacks to finish up device usage before suspend and to reinitialize the virtio device after resume. However, these callbacks are currently only called when using virtio_pci. virtio_mmio does not have any PM ops defined. This causes problems for example after suspend to disk (hibernation), since the virtio devices might lose their state after the VMM is restarted. Calling virtio_device_freeze()/restore() ensures that the virtio devices are re-initialized correctly. Fix this by implementing the dev_pm_ops for virtio_mmio, similar to virtio_pci_common. Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Message-Id: <20220621110621.3638025-2-stephan.gerhold@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-06-27 08:05:35 -04:00
Jason Wang	c346dae4f3	virtio: disable notification hardening by default We try to harden virtio device notifications in `8b4ec69d7e` ("virtio: harden vring IRQ"). It works with the assumption that the driver or core can properly call virtio_device_ready() at the right place. Unfortunately, this seems to be not true and uncover various bugs of the existing drivers, mainly the issue of using virtio_device_ready() incorrectly. So let's add a Kconfig option and disable it by default. It gives us time to fix the drivers and then we can consider re-enabling it. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220622012940.21441-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>	2022-06-24 02:49:48 -04:00
Bo Liu	03d9571706	virtio: Remove unnecessary variable assignments In function vp_modern_probe(), "pci_dev" is initialized with the value of "mdev->pci_dev", so assigning "pci_dev" to "mdev->pci_dev" is unnecessary since they store the same value. Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20220617055952.5364-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>	2022-06-24 02:49:48 -04:00
huangjie.albert	a7722890fd	virtio_ring : keep used_wrap_counter in vq->last_used_idx the used_wrap_counter and the vq->last_used_idx may get out of sync if they are separate assignment，and interrupt might use an incorrect value to check for the used index. for example:OOB access ksoftirqd may consume the packet and it will call: virtnet_poll -->virtnet_receive -->virtqueue_get_buf_ctx -->virtqueue_get_buf_ctx_packed and in virtqueue_get_buf_ctx_packed: vq->last_used_idx += vq->packed.desc_state[id].num; if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) { vq->last_used_idx -= vq->packed.vring.num; vq->packed.used_wrap_counter ^= 1; } if at the same time, there comes a vring interrupt，in vring_interrupt: we will call: vring_interrupt -->more_used -->more_used_packed -->is_used_desc_packed in is_used_desc_packed, the last_used_idx maybe >= vq->packed.vring.num. so this could case a memory out of bounds bug. this patch is to keep the used_wrap_counter in vq->last_used_idx so we can get the correct value to check for used index in interrupt. v3->v4: - use READ_ONCE/WRITE_ONCE to get/set vq->last_used_idx v2->v3: - add inline function to get used_wrap_counter and last_used - when use vq->last_used_idx, only read once if vq->last_used_idx is read twice, the values can be inconsistent. - use last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR)) to get the all bits below VRING_PACKED_EVENT_F_WRAP_CTR v1->v2: - reuse the VRING_PACKED_EVENT_F_WRAP_CTR - Remove parameter judgment in is_used_desc_packed, because it can't be illegal Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com> Message-Id: <20220617020411.80367-1-huangjie.albert@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-06-24 02:49:48 -04:00
Linus Torvalds	abe71eb32f	virtio,vdpa: fixes Fixes all over the place, most notably fixes for latent bugs in drivers that got exposed by suppressing interrupts before DRIVER_OK, which in turn has been done by `8b4ec69d7e` ("virtio: harden vring IRQ"). Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmKkIwUPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpo/8H/3JwrGv0k5vj1wbHXymmBCRJrQ90Wg5/lwGj 8419DWHidPqf8X/ZAEHoWMr3FM7Jsj7KuTl3sQZODxQbQ4wERvEljTRosxImJ3iI k2+T7HGf4pAIp8zzSp+1xL1feyctHT3atWKlpjXGqsDNCONexBIfG7/M63MkArOg SzrbcTsZzabdI2eKDSipiywnrGlQH6mq7YXynaKWBdL7RtJC+PHAYLksJufPz4Lw vsHpNEfWkoss+hsYKhH73lU4B++eQnHKbjiMktPkNLusxWW035unP97kqJKvxFgu NDF9PDa/4e47nJqqcibvo6VEZ5q6hLjaDCqRPjzEdKl44V8VeNE= =Q+zn -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael Tsirkin: "Fixes all over the place, most notably fixes for latent bugs in drivers that got exposed by suppressing interrupts before DRIVER_OK, which in turn has been done by `8b4ec69d7e` ("virtio: harden vring IRQ")" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: um: virt-pci: set device ready in probe() vdpa: make get_vq_group and set_group_asid optional virtio: Fix all occurences of the "the the" typo vduse: Fix NULL pointer dereference on sysfs access vringh: Fix loop descriptors check in the indirect cases vdpa/mlx5: clean up indenting in handle_ctrl_vlan() vdpa/mlx5: fix error code for deleting vlan virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed vdpa/mlx5: Fix syntax errors in comments virtio-rng: make device ready before making request	2022-06-11 16:32:47 -07:00
Bo Liu	acb0055e18	virtio: Fix all occurences of the "the the" typo There are double "the" in message in file virtio_mmio.c and virtio_pci_modern_dev.c, fix it. Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20220609031106.2161-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-06-09 00:26:16 -04:00
chengkaitao	a58a7f97ba	virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed The reference must be released when device_register(&vm_cmdline_parent) failed. Add the corresponding 'put_device()' in the error handling path. Signed-off-by: chengkaitao <pilgrimtao@gmail.com> Message-Id: <20220602005542.16489-1-chengkaitao@didiglobal.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2022-06-08 08:56:03 -04:00
Juergen Gross	3f9dfbebdc	virtio: replace arch_has_restricted_virtio_memory_access() Instead of using arch_has_restricted_virtio_memory_access() together with CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, replace those with platform_has() and a new platform feature PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> # Arm64 only Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Borislav Petkov <bp@suse.de>	2022-06-06 08:22:01 +02:00
keliu	4f58afd6eb	virtio: Directly use ida_alloc()/free() Use ida_alloc()/ida_free() instead of deprecated ida_simple_get()/ida_simple_remove() . Signed-off-by: keliu <liuke94@huawei.com> Message-Id: <20220527073302.2474073-1-liuke94@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-05-31 12:45:10 -04:00
Jason Wang	8b4ec69d7e	virtio: harden vring IRQ This is a rework on the previous IRQ hardening that is done for virtio-pci where several drawbacks were found and were reverted: 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ that is used by some device such as virtio-blk 2) done only for PCI transport The vq->broken is re-used in this patch for implementing the IRQ hardening. The vq->broken is set to true during both initialization and reset. And the vq->broken is set to false in virtio_device_ready(). Then vring_interrupt() can check and return when vq->broken is true. And in this case, switch to return IRQ_NONE to let the interrupt core aware of such invalid interrupt to prevent IRQ storm. The reason of using a per queue variable instead of a per device one is that we may need it for per queue reset hardening in the future. Note that the hardening is only done for vring interrupt since the config interrupt hardening is already done in commit `22b7050a02` ("virtio: defer config changed notifications"). But the method that is used by config interrupt can't be reused by the vring interrupt handler because it uses spinlock to do the synchronization which is expensive. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Vineeth Vijayan <vneethv@linux.ibm.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: linux-s390@vger.kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220527060120.20964-9-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>	2022-05-31 12:45:10 -04:00
Jason Wang	be83f04d25	virtio: allow to unbreak virtqueue This patch allows the new introduced __virtio_break_device() to unbreak the virtqueue. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Vineeth Vijayan <vneethv@linux.ibm.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: linux-s390@vger.kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220527060120.20964-8-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>	2022-05-31 12:45:10 -04:00
Jason Wang	9e9b289328	virtio-mmio: implement synchronize_cbs() Simply synchronize the platform irq that is used by us. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Vineeth Vijayan <vneethv@linux.ibm.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: linux-s390@vger.kernel.org Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220527060120.20964-6-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>	2022-05-31 12:45:09 -04:00
Jason Wang	48b3dd2438	virtio-pci: implement synchronize_cbs() We can simply reuse vp_synchronize_vectors() for .synchronize_cbs(). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Vineeth Vijayan <vneethv@linux.ibm.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: linux-s390@vger.kernel.org Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220527060120.20964-5-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>	2022-05-31 12:45:09 -04:00
Jason Wang	0aa96837c3	virtio: use virtio_reset_device() when possible This allows us to do common extension without duplicating code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Vineeth Vijayan <vneethv@linux.ibm.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: linux-s390@vger.kernel.org Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220527060120.20964-3-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>	2022-05-31 12:45:09 -04:00
Stefano Garzarella	2536b2ca15	virtio: use virtio_device_ready() in virtio_device_restore() It will allow us to do extension on virtio_device_ready() without duplicating code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Vineeth Vijayan <vneethv@linux.ibm.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: linux-s390@vger.kernel.org Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220527060120.20964-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>	2022-05-31 12:45:09 -04:00
Solomon Tan	0619eda83d	virtio: Replace long long int with long long This patch addresses the checkpatch.pl warning that long long is preferred over long long int. Signed-off-by: Solomon Tan <solomonbstoner@protonmail.ch> Message-Id: <YlzTUQa06sP94zxB@ArchDesktop> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-05-31 12:45:09 -04:00
Solomon Tan	3153234097	virtio: Replace unsigned with unsigned int This patch addresses the checkpatch.pl warning where unsigned int is preferred over unsigned. Signed-off-by: Solomon Tan <solomonbstoner@protonmail.ch> Message-Id: <YlzS49Wo8JMDhKOt@ArchDesktop> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-05-31 12:45:09 -04:00
Murilo Opsfelder Araujo	7e415282b4	virtio-pci: Remove wrong address verification in vp_del_vqs() GCC 12 enhanced -Waddress when comparing array address to null [0], which warns: drivers/virtio/virtio_pci_common.c: In function ‘vp_del_vqs’: drivers/virtio/virtio_pci_common.c:257:29: warning: the comparison will always evaluate as ‘true’ for the pointer operand in ‘vp_dev->msix_affinity_masks + (sizetype)((long unsigned int)i * 256)’ must not be NULL [-Waddress] 257 \| if (vp_dev->msix_affinity_masks[i]) \| ^~~~~~ In fact, the verification is comparing the result of a pointer arithmetic, the address "msix_affinity_masks + i", which will always evaluate to true. Under the hood, free_cpumask_var() calls kfree(), which is safe to pass NULL, not requiring non-null verification. So remove the verification to make compiler happy (happy compiler, happy life). [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102103 Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Message-Id: <20220415023002.49805-1-muriloo@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Christophe de Dinechin <dinechin@redhat.com>	2022-05-31 12:45:08 -04:00
Christophe JAILLET	7a836a2aba	virtio: pci: Fix an error handling path in vp_modern_probe() If an error occurs after a successful pci_request_selected_regions() call, it should be undone by a corresponding pci_release_selected_regions() call, as already done in vp_modern_remove(). Fixes: `fd502729fb` ("virtio-pci: introduce modern device module") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <237109725aad2c3c03d14549f777b1927c84b045.1648977064.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-05-31 12:45:08 -04:00
Gautam Dawar	ea239a6746	virtio-vdpa: don't set callback if virtio doesn't need it There's no need for setting callbacks for the driver that doesn't care about that. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-3-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-05-31 12:44:26 -04:00
Xianting Tian	b4b4ff73ef	virtio_ring: add unlikely annotation for free descs check The 'if (vq->vq.num_free < descs_used)' check will almost always be false. Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Message-Id: <20220328105817.1028065-2-xianting.tian@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>	2022-05-31 12:44:25 -04:00
Xianting Tian	35c51e093d	virtio_ring: remove unnecessary to_vvq call in vring hot path It passes '_vq' to virtqueue_use_indirect(), which still calls to_vvq to get 'vq', let's directly pass 'vq'. It can avoid unnecessary call of to_vvq in hot path. Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Message-Id: <20220328105817.1028065-1-xianting.tian@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>	2022-05-31 12:44:24 -04:00
Zi Yan	448b8ec3bf	drivers: virtio_mem: use pageblock size as the minimum virtio_mem size. alloc_contig_range() now only needs to be aligned to pageblock_nr_pages, drop virtio_mem size requirement that it needs to be MAX_ORDER_NR_PAGES. Link: https://lkml.kernel.org/r/20220425143118.2850746-7-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Ren <renzhengeek@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-05-13 07:20:13 -07:00
Linus Torvalds	3e732ebf73	virtio: fixes, cleanups A couple of mlx5 fixes related to cvq A couple of reverts dropping useless code (code that used it got reverted earlier) Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmJKyK4PHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpE40IAIvsq1WDsXfg/649lp0wMO5MPEzAvCfgQpub hCti2lKY4K9Ykqh+uTipDTxpRmg3wt/4QlQ6NYZVmn8jQ1JsNAIFCkc7ptG1xwQI 0RypofGsFQwbar2vKh8jEzYUsIbwLKeLNlib113fH9zyDRw7KjRppUgC2cjMoYv6 JhlOFZAApzFknXC/I8fmJyzVYpX+v2GB8RX2wWiNXHQZ9hkdKd1IvJqRiuRuSY3L haubMBFG8hrdW8qQ40ngvt+GnrxKUustmPM6x9DCJYaMkKRkXC6NxueUcIk6qWiw UJz5IOXURdJ3i1qxmkMGSnJiYbFpmwVGlqRekN3qNVpXNovfbg0= =l15S -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael Tsirkin: "Fixes and cleanups: - A couple of mlx5 fixes related to cvq - A couple of reverts dropping useless code (code that used it got reverted earlier)" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa: mlx5: synchronize driver status with CVQ vdpa: mlx5: prevent cvq work from hogging CPU Revert "virtio_config: introduce a new .enable_cbs method" Revert "virtio: use virtio_device_ready() in virtio_device_restore()"	2022-04-05 10:40:52 -07:00
Linus Torvalds	f4f5d7cfb2	virtio: features, fixes vdpa generic device type support More virtio hardening for broken devices On the same theme, revert some virtio hotplug hardening patches - they were misusing some interrupt flags, will have to be reverted. RSS support in virtio-net max device MTU support in mlx5 vdpa akcipher support in virtio-crypto shared IRQ support in ifcvf vdpa a minor performance improvement in vhost Enable virtio mem for ARM64 beginnings of advance dma support Cleanups, fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmJEEk8PHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpcpUH+wRIXrzveirsN4MYH0aAeF+SLYaA5pgtO4U7 da22HYtwlMrDRMxwjepKBOTSu89uP5LEK7IKWPj9VRZg+GLz/Cdfc6BZl/fND3qt 0yFpwG1ZLsBK1+WHbysWQneEbPjXqQdbh9eVkKVGcNkRuLJJwXbmF95dyQEJwzeh dPHssDcEC2tRgHAMrLyjLPKwMCRwcgtdPoB1ZC+lqTs3G6lktAfREEvqVfJOVe1b mQcgdAJ+aRM0J/w/PYTmxFOZPYAmQ6hmAQ8Hf7nkjfRWQ4EM91W0cKAoZPc/+7KN ZfFKVL28GEZLJqnx+3xijwCR2gwVHsRYZHaTjfGgQUWZPoB3Vrc= =ynRx -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: - vdpa generic device type support - more virtio hardening for broken devices (but on the same theme, revert some virtio hotplug hardening patches - they were misusing some interrupt flags and had to be reverted) - RSS support in virtio-net - max device MTU support in mlx5 vdpa - akcipher support in virtio-crypto - shared IRQ support in ifcvf vdpa - a minor performance improvement in vhost - enable virtio mem for ARM64 - beginnings of advance dma support - cleanups, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (33 commits) vdpa/mlx5: Avoid processing works if workqueue was destroyed vhost: handle error while adding split ranges to iotlb vdpa: support exposing the count of vqs to userspace vdpa: change the type of nvqs to u32 vdpa: support exposing the config size to userspace vdpa/mlx5: re-create forwarding rules after mac modified virtio: pci: check bar values read from virtio config space Revert "virtio_pci: harden MSI-X interrupts" Revert "virtio-pci: harden INTX interrupts" drivers/net/virtio_net: Added RSS hash report control. drivers/net/virtio_net: Added RSS hash report. drivers/net/virtio_net: Added basic RSS support. drivers/net/virtio_net: Fixed padded vheader to use v1 with hash. virtio: use virtio_device_ready() in virtio_device_restore() tools/virtio: compile with -pthread tools/virtio: fix after premapped buf support virtio_ring: remove flags check for unmap packed indirect desc virtio_ring: remove flags check for unmap split indirect desc virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed() net/mlx5: Add support for configuring max device MTU ...	2022-03-31 13:57:15 -07:00
Michael S. Tsirkin	7414539c5f	Revert "virtio: use virtio_device_ready() in virtio_device_restore()" This reverts commit `8d65bc9a5b`. We reverted the problematic changes, no more need for work arounds on restore. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2022-03-30 04:18:14 -04:00
Keir Fraser	3f63a1d7f6	virtio: pci: check bar values read from virtio config space virtio pci config structures may in future have non-standard bar values in the bar field. We should anticipate this by skipping any structures containing such a reserved value. The bar value should never change: check for harmful modified values we re-read it from the config space in vp_modern_map_capability(). Also clean up an existing check to consistently use PCI_STD_NUM_BARS. Signed-off-by: Keir Fraser <keirf@google.com> Link: https://lore.kernel.org/r/20220323140727.3499235-1-keirf@google.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-28 16:52:59 -04:00
Jason Wang	eb4cecb453	Revert "virtio_pci: harden MSI-X interrupts" This reverts commit `9e35276a53`. Issue were reported for the drivers that are using affinity managed IRQ where manually toggling IRQ status is not expected. And we forget to enable the interrupts in the restore path as well. In the future, we will rework on the interrupt hardening. Fixes: `9e35276a53` ("virtio_pci: harden MSI-X interrupts") Reported-by: Marc Zyngier <maz@kernel.org> Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220323031524.6555-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-28 16:52:59 -04:00
Jason Wang	7b79edfb86	Revert "virtio-pci: harden INTX interrupts" This reverts commit `080cd7c3ac`. Since the MSI-X interrupts hardening will be reverted in the next patch. We will rework the interrupt hardening in the future. Fixes: `080cd7c3ac` ("virtio-pci: harden INTX interrupts") Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220323031524.6555-1-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-28 16:52:59 -04:00
Stefano Garzarella	8d65bc9a5b	virtio: use virtio_device_ready() in virtio_device_restore() After waking up a suspended VM, the kernel prints the following trace for virtio drivers which do not directly call virtio_device_ready() in the .restore: PM: suspend exit irq 22: nobody cared (try booting with the "irqpoll" option) Call Trace: <IRQ> dump_stack_lvl+0x38/0x49 dump_stack+0x10/0x12 __report_bad_irq+0x3a/0xaf note_interrupt.cold+0xb/0x60 handle_irq_event+0x71/0x80 handle_fasteoi_irq+0x95/0x1e0 __common_interrupt+0x6b/0x110 common_interrupt+0x63/0xe0 asm_common_interrupt+0x1e/0x40 ? __do_softirq+0x75/0x2f3 irq_exit_rcu+0x93/0xe0 sysvec_apic_timer_interrupt+0xac/0xd0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch_cpu_idle+0x12/0x20 default_idle_call+0x39/0xf0 do_idle+0x1b5/0x210 cpu_startup_entry+0x20/0x30 start_secondary+0xf3/0x100 secondary_startup_64_no_verify+0xc3/0xcb </TASK> handlers: [<000000008f9bac49>] vp_interrupt [<000000008f9bac49>] vp_interrupt Disabling IRQ #22 This happens because we don't invoke .enable_cbs callback in virtio_device_restore(). That callback is used by some transports (e.g. virtio-pci) to enable interrupts. Let's fix it, by calling virtio_device_ready() as we do in virtio_dev_probe(). This function calls .enable_cts callback and sets DRIVER_OK status bit. This fix also avoids setting DRIVER_OK twice for those drivers that call virtio_device_ready() in the .restore. Fixes: `d50497eb4e` ("virtio_config: introduce a new .enable_cbs method") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20220322114313.116516-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-28 16:52:59 -04:00
Xuan Zhuo	920379a465	virtio_ring: remove flags check for unmap packed indirect desc When calling vring_unmap_desc_packed(), it will not encounter the situation that the flags contains VRING_DESC_F_INDIRECT. So remove this logic. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220224110402.108161-4-xuanzhuo@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-28 16:52:58 -04:00
Xuan Zhuo	b4282ebc71	virtio_ring: remove flags check for unmap split indirect desc When calling vring_unmap_one_split_indirect(), it will not encounter the situation that the flags contains VRING_DESC_F_INDIRECT. So remove this logic. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220224110402.108161-3-xuanzhuo@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-28 16:52:58 -04:00
Xuan Zhuo	d80dc15bb6	virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed() The actual parameter handled by vring_unmap_state_packed() is that vring_desc_extra, so this function should use "extra" instead of "state". Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220224110402.108161-2-xuanzhuo@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-28 16:52:58 -04:00
Gavin Shan	6f4abbaa1b	drivers/virtio: Enable virtio mem for ARM64 This enables virtio-mem device support by allowing to enable the corresponding kernel config option (CONFIG_VIRTIO_MEM) on the architecture. Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220119010551.181405-1-gshan@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-28 16:52:57 -04:00
David Hildenbrand	b3d40a2b6d	mm: enforce pageblock_order < MAX_ORDER Some places in the kernel don't really expect pageblock_order >= MAX_ORDER, and it looks like this is only possible in corner cases: 1) CONFIG_DEFERRED_STRUCT_PAGE_INIT we'll end up freeing pageblock_order pages via __free_pages_core(), which cannot possibly work. 2) find_zone_movable_pfns_for_nodes() will roundup the ZONE_MOVABLE start PFN to MAX_ORDER_NR_PAGES. Consequently with a bigger pageblock_order, we could have a single pageblock partially managed by two zones. 3) compaction code runs into __fragmentation_index() with order >= MAX_ORDER, when checking WARN_ON_ONCE(order >= MAX_ORDER). [1] 4) mm/page_reporting.c won't be reporting any pages with default page_reporting_order == pageblock_order, as we'll be skipping the reporting loop inside page_reporting_process_zone(). 5) __rmqueue_fallback() will never be able to steal with ALLOC_NOFRAGMENT. pageblock_order >= MAX_ORDER is weird either way: it's a pure optimization for making alloc_contig_range(), as used for allcoation of gigantic pages, a little more reliable to succeed. However, if there is demand for somewhat reliable allocation of gigantic pages, affected setups should be using CMA or boottime allocations instead. So let's make sure that pageblock_order < MAX_ORDER and simplify. [1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20220214174132.219303-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Frank Rowand <frowand.list@gmail.com> Cc: John Garry via iommu <iommu@lists.linux-foundation.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-03-22 15:57:06 -07:00
Michael S. Tsirkin	e7c552ec89	virtio: drop default for virtio-mem There's no special reason why virtio-mem needs a default that's different from what kconfig provides, any more than e.g. virtio blk. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>	2022-03-06 06:06:50 -05:00
Si-Wei Liu	e0077cc13b	vdpa: factor out vdpa_set_features_unlocked for vdpa internal use No functional change introduced. vdpa bus driver such as virtio_vdpa or vhost_vdpa is not supposed to take care of the locking for core by its own. The locked API vdpa_set_features should suffice the bus driver's need. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/1642206481-30721-2-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2022-03-04 11:56:33 -05:00
Michael S. Tsirkin	c46eccdaad	virtio: document virtio_reset_device Looks like most callers get driver/device removal wrong. Document what's expected of callers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-04 08:33:22 -05:00
Michael S. Tsirkin	4fa59ede95	virtio: acknowledge all features before access The feature negotiation was designed in a way that makes it possible for devices to know which config fields will be accessed by drivers. This is broken since commit `404123c2db` ("virtio: allow drivers to validate features") with fallout in at least block and net. We have a partial work-around in commit `2f9a174f91` ("virtio: write back F_VERSION_1 before validate") which at least lets devices find out which format should config space have, but this is a partial fix: guests should not access config space without acknowledging features since otherwise we'll never be able to change the config space format. To fix, split finalize_features from virtio_finalize_features and call finalize_features with all feature bits before validation, and then - if validation changed any bits - once again after. Since virtio_finalize_features no longer writes out features rename it to virtio_features_ok - since that is what it does: checks that features are ok with the device. As a side effect, this also reduces the amount of hypervisor accesses - we now only acknowledge features once unless we are clearing any features when validating (which is uncommon). IRC I think that this was more or less always the intent in the spec but unfortunately the way the spec is worded does not say this explicitly, I plan to address this at the spec level, too. Acked-by: Jason Wang <jasowang@redhat.com> Cc: stable@vger.kernel.org Fixes: `404123c2db` ("virtio: allow drivers to validate features") Fixes: `2f9a174f91` ("virtio: write back F_VERSION_1 before validate") Cc: "Halil Pasic" <pasic@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-03-04 08:33:21 -05:00
Michael S. Tsirkin	838d6d3461	virtio: unexport virtio_finalize_features virtio_finalize_features is only used internally within virtio. No reason to export it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2022-03-04 08:33:21 -05:00
Eli Cohen	aba21aff77	vdpa: Allow to configure max data virtqueues Add netlink support to configure the max virtqueue pairs for a device. At least one pair is required. The maximum is dictated by the device. Example: $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 4 Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-6-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-14 18:50:53 -05:00
Eli Cohen	73bc0dbb59	vdpa: Sync calls set/get config/status with cf_mutex Add wrappers to get/set status and protect these operations with cf_mutex to serialize these operations with respect to get/set config operations. Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-4-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-14 18:50:53 -05:00
Eli Cohen	a64917bc2e	vdpa: Provide interface to read driver features Provide an interface to read the negotiated features. This is needed when building the netlink message in vdpa_dev_net_config_fill(). Also fix the implementation of vdpa_dev_net_config_fill() to use the negotiated features instead of the device features. To make APIs clearer, make the following name changes to struct vdpa_config_ops so they better describe their operations: get_features -> get_device_features set_features -> set_driver_features Finally, add get_driver_features to return the negotiated features and add implementation to all the upstream drivers. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-14 18:50:53 -05:00
Michael S. Tsirkin	1861ba626a	virtio_ring: mark ring unused on error A recently added error path does not mark ring unused when exiting on OOM, which will lead to BUG on the next entry in debug builds. TODO: refactor code so we have START_USE and END_USE in the same function. Fixes: `fc6d70f40b` ("virtio_ring: check desc == NULL when using indirect with packed") Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com> Cc: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-14 18:50:53 -05:00
Peng Hao	49814ce9e2	virtio/virtio_pci_legacy_dev: ensure the correct return value When pci_iomap return NULL, the return value is zero. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20211222112014.87394-1-flyingpeng@tencent.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2022-01-14 18:50:53 -05:00
Peng Hao	cf4a4493ff	virtio/virtio_mem: handle a possible NULL as a memcpy parameter There is a check for vm->sbm.sb_states before, and it should check it here as well. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20211222011225.40573-1-flyingpeng@tencent.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: `5f1f79bbc9` ("virtio-mem: Paravirtualized memory hotplug") Cc: stable@vger.kernel.org # v5.8+	2022-01-14 18:50:53 -05:00
Dapeng Mi	2b68224ec6	virtio: fix a typo in function "vp_modern_remove" comments. Function name "vp_modern_remove" in comments is written to "vp_modern_probe" incorrectly. Change it. Signed-off-by: Dapeng Mi <dapeng1.mi@intel.com> Link: https://lore.kernel.org/r/20211210073546.700783-1-dapeng1.mi@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>	2022-01-14 18:50:53 -05:00
王贇	6017599bb2	virtio-pci: fix the confusing error message The error message on the failure of pfn check should tell virtio-pci rather than virtio-mmio, just fix it. Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/ae5e154e-ac59-f0fa-a7c7-091a2201f581@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-14 18:50:53 -05:00
David Hildenbrand	57c5a5b304	virtio-mem: prepare fake page onlining code for granularity smaller than MAX_ORDER - 1 Let's prepare our fake page onlining code for subblock size smaller than MAX_ORDER - 1: we might get called for ranges not covering properly aligned MAX_ORDER - 1 pages. We have to detect the order to use dynamically. Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211126134209.17332-3-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Eric Ren <renzhengeek@gmail.com>	2022-01-14 18:50:52 -05:00
David Hildenbrand	6639032acc	virtio-mem: prepare page onlining code for granularity smaller than MAX_ORDER - 1 Let's prepare our page onlining code for subblock size smaller than MAX_ORDER - 1: we'll get called for a MAX_ORDER - 1 page but might have some subblocks in the range plugged and some unplugged. In that case, fallback to subblock granularity to properly only expose the plugged parts to the buddy. Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211126134209.17332-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Eric Ren <renzhengeek@gmail.com>	2022-01-14 18:50:52 -05:00
Michael S. Tsirkin	d9679d0013	virtio: wrap config->reset calls This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2022-01-14 18:50:52 -05:00
Will Deacon	817fc978b5	virtio_ring: Fix querying of maximum DMA mapping size for virtio device virtio_max_dma_size() returns the maximum DMA mapping size of the virtio device by querying dma_max_mapping_size() for the device when the DMA API is in use for the vring. Unfortunately, the device passed is initialised by register_virtio_device() and does not inherit the DMA configuration from its parent, resulting in SWIOTLB errors when bouncing is enabled and the default 256K mapping limit (IO_TLB_SEGSIZE) is not respected: \| virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 294912 bytes), total 1024 (slots), used 725 (slots) Follow the pattern used elsewhere in the virtio_ring code when calling into the DMA layer and pass the parent device to dma_max_mapping_size() instead. Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211201112018.25276-1-will@kernel.org Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Fixes: `e6d6dd6c87` ("virtio: Introduce virtio_max_dma_size()") Cc: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-12-08 15:04:06 -05:00
Michael S. Tsirkin	f124034faa	Revert "virtio_ring: validate used buffer length" This reverts commit `939779f515`. Attempts to validate length in the core did not work out: there turn out to exist multiple broken devices, and in particular legacy devices are known to be broken in this respect. We have ideas for handling this better in the next version but for now let's revert to a known good state to make sure drivers work for people. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-11-24 18:47:27 -05:00
David Hildenbrand	61082ad6a6	virtio-mem: support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE The initial virtio-mem spec states that while unplugged memory should not be read, the device still has to allow for reading unplugged memory inside the usable region. The primary motivation for this default handling was to simplify bringup of virtio-mem, because there were corner cases where Linux might have accidentially read unplugged memory inside added Linux memory blocks. In the meantime, we: 1. Removed /dev/kmem in commit `bbcd53c960` ("drivers/char: remove /dev/kmem for good") 2. Disallowed access to virtio-mem device memory via /dev/mem in commit `2128f4e21a` ("virtio-mem: disallow mapping virtio-mem memory via /dev/mem") 3. Sanitized access to virtio-mem device memory via /proc/kcore in commit `0daa322b8f` ("fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages") 4. Sanitized access to virtio-mem device memory via /proc/vmcore in commit `ce2814622e` ("virtio-mem: kdump mode to sanitize /proc/vmcore access") "Accidential" access to unplugged memory is no longer possible; we can support the new VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE feature that will be required by some hypervisors implementing virtio-mem in the near future. Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: Hui Zhu <teawater@gmail.com> Cc: Sebastien Boeuf <sebastien.boeuf@intel.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2021-11-10 15:32:38 +01:00
Linus Torvalds	59a2ceeef6	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "87 patches. Subsystems affected by this patch series: mm (pagecache and hugetlb), procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs, init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork, sysvfs, kcov, gdb, resource, selftests, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (87 commits) ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL ipc: check checkpoint_restore_ns_capable() to modify C/R proc files selftests/kselftest/runner/run_one(): allow running non-executable files virtio-mem: disallow mapping virtio-mem memory via /dev/mem kernel/resource: disallow access to exclusive system RAM regions kernel/resource: clean up and optimize iomem_is_exclusive() scripts/gdb: handle split debug for vmlinux kcov: replace local_irq_save() with a local_lock_t kcov: avoid enable+disable interrupts if !in_task() kcov: allocate per-CPU memory on the relevant node Documentation/kcov: define `ip' in the example Documentation/kcov: include types.h in the example sysv: use BUILD_BUG_ON instead of runtime check kernel/fork.c: unshare(): use swap() to make code cleaner seq_file: fix passing wrong private data seq_file: move seq_escape() to a header signal: remove duplicate include in signal.h crash_dump: remove duplicate include in crash_dump.h crash_dump: fix boolreturn.cocci warning hfs/hfsplus: use WARN_ON for sanity check ...	2021-11-09 10:11:53 -08:00
David Hildenbrand	2128f4e21a	virtio-mem: disallow mapping virtio-mem memory via /dev/mem We don't want user space to be able to map virtio-mem device memory directly (e.g., via /dev/mem) in order to have guarantees that in a sane setup we'll never accidentially access unplugged memory within the device-managed region of a virtio-mem device, just as required by the virtio-spec. As soon as the virtio-mem driver is loaded, the device region is visible in /proc/iomem via the parent device region. From that point on user space is aware of the device region and we want to disallow mapping anything inside that region (where we will dynamically (un)plug memory) until the driver has been unloaded cleanly and e.g., another driver might take over. By creating our parent IORESOURCE_SYSTEM_RAM resource with IORESOURCE_EXCLUSIVE, we will disallow any /dev/mem access to our device region until the driver was unloaded cleanly and removed the parent region. This will work even though only some memory blocks are actually currently added to Linux and appear as busy in the resource tree. So access to the region from user space is only possible a) if we don't load the virtio-mem driver. b) after unloading the virtio-mem driver cleanly. Don't build virtio-mem if access to /dev/mem cannot be restricticted -- if we have CONFIG_DEVMEM=y but CONFIG_STRICT_DEVMEM is not set. Link: https://lkml.kernel.org/r/20210920142856.17758-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:52 -08:00
David Hildenbrand	ce2814622e	virtio-mem: kdump mode to sanitize /proc/vmcore access Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [1] Let's register a vmcore callback, to allow vmcore code to check if a PFN belonging to a virtio-mem device is either currently plugged and should be dumped or is currently unplugged and should not be accessed, instead mapping the shared zeropage or returning zeroes when reading. This is important when not capturing /proc/vmcore via tools like "makedumpfile" that can identify logically unplugged virtio-mem memory via PG_offline in the memmap, but simply by e.g., copying the file. Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. With this series, we'll send one virtio-mem state request for every ~2 MiB chunk of virtio-mem memory indicated in the vmcore that we intend to read/map. In the future, we might want to allow building virtio-mem for kdump mode only, even without CONFIG_MEMORY_HOTPLUG and friends: this way, we could support special stripped-down kdump kernels that have many other config options disabled; we'll tackle that once required. Further, we might want to try sensing bigger blocks (e.g., memory sections) first before falling back to device blocks on demand. Tested with Fedora rawhide, which contains a recent kexec-tools version (considering "System RAM (virtio_mem)" when creating the vmcore header) and a recent dracut version (including the virtio_mem module in the kdump initrd). Link: https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com [1] Link: https://github.com/dracutdevs/dracut/pull/1157 [2] Link: https://lkml.kernel.org/r/20211005121430.30136-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	ffc763d0c3	virtio-mem: factor out hotplug specifics from virtio_mem_remove() into virtio_mem_deinit_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	84e17e684e	virtio-mem: factor out hotplug specifics from virtio_mem_probe() into virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-8-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	94300fcf4c	virtio-mem: factor out hotplug specifics from virtio_mem_init() into virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00

1 2 3 4 5 ...

910 Commits