linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-24 05:02:12 +00:00

Author	SHA1	Message	Date
Alex Williamson	b1b8132a65	vfio: More vfio_file_is_group() use cases Replace further open coded tests with helper. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/166516896843.1215571.5378890510536477434.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-07 13:06:52 -06:00
Jason Gunthorpe	3dd59a7dcb	vfio: Make the group FD disassociate from the iommu_group Allow the vfio_group struct to exist with a NULL iommu_group pointer. When the pointer is NULL the vfio_group users promise not to touch the iommu_group. This allows a driver to be hot unplugged while userspace is keeping the group FD open. Remove all the code waiting for the group FD to close. This fixes a userspace regression where we learned that virtnodedevd leaves a group FD open even though the /dev/ node for it has been deleted and all the drivers for it unplugged. Fixes: `ca5f21b257` ("vfio: Follow a strict lifetime for struct iommu_group") Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v2-15417f29324e+1c-vfio_group_disassociate_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-07 08:10:52 -06:00
Jason Gunthorpe	819da99a73	vfio: Hold a reference to the iommu_group in kvm for SPAPR SPAPR exists completely outside the normal iommu driver framework, the groups it creates are fake and are only created to enable VFIO's uAPI. Thus, it does not need to follow the iommu core rule that the iommu_group will only be touched while a driver is attached. Carry a group reference into KVM and have KVM directly manage the lifetime of this object independently of VFIO. This means KVM no longer relies on the vfio group file being valid to maintain the group reference. Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v2-15417f29324e+1c-vfio_group_disassociate_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-07 08:10:52 -06:00
Jason Gunthorpe	4b22ef042d	vfio: Add vfio_file_is_group() This replaces uses of vfio_file_iommu_group() which were only detecting if the file is a VFIO file with no interest in the actual group. The only remaning user of vfio_file_iommu_group() is in KVM for the SPAPR stuff. It passes the iommu_group into the arch code through kvm for some reason. Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v2-15417f29324e+1c-vfio_group_disassociate_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-07 08:10:52 -06:00
Jason Gunthorpe	c82e81ab25	vfio: Change vfio_group->group_rwsem to a mutex These days not much is using the read side: - device first open - ioctl_get_status - device FD release - check enforced_coherent None of this is performance, so just make it into a normal mutex. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/2-v1-917e3647f123+b1a-vfio_group_users_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Jason Gunthorpe	912b74d26c	vfio: Remove the vfio_group->users and users_comp Kevin points out that the users is really just tracking if group->opened_file is set, so we can simplify this code to a wait_queue that looks for !opened_file under the group_rwsem. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/1-v1-917e3647f123+b1a-vfio_group_users_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Jason Gunthorpe	9c799c224d	vfio/mdev: add mdev available instance checking to the core Many of the mdev drivers use a simple counter for keeping track of the available instances. Move this code to the core code and store the counter in the mdev_parent. Implement it using correct locking, fixing mdpy. Drivers just provide the value in the mdev_driver at registration time and the core code takes care of maintaining it and exposing the value in sysfs. [hch: count instances per-parent instead of per-type, use an atomic_t to avoid taking mdev_list_lock in the show method] Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-15-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	685a1537f4	vfio/mdev: consolidate all the description sysfs into the core code Every driver just emits a string, simply add a method to the mdev_driver to return it and provide a standard sysfs show function. Remove the now unused types_attrs field in struct mdev_driver and the support code for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-14-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	f2fbc72e6d	vfio/mdev: consolidate all the available_instance sysfs into the core code Every driver just print a number, simply add a method to the mdev_driver to return it and provide a standard sysfs show function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-13-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	0bc79069cc	vfio/mdev: consolidate all the name sysfs into the core code Every driver just emits a static string, simply add a field to the mdev_type for the driver to fill out or fall back to the sysfs name and provide a standard sysfs show function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-12-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Jason Gunthorpe	290aac5df8	vfio/mdev: consolidate all the device_api sysfs into the core code Every driver just emits a static string, simply feed it through the ops and provide a standard sysfs show function. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-11-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	c7c1f38f6c	vfio/mdev: remove mtype_get_parent_dev Just open code the dereferences in the only user. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-10-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	062e720cd2	vfio/mdev: remove mdev_parent_dev Just open code the dereferences in the only user. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-9-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	2815fe149f	vfio/mdev: unexport mdev_bus_type mdev_bus_type is only used in mdev.ko now, so unexport it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-8-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	cbf3bb28aa	vfio/mdev: remove mdev_from_dev Just open code it in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-7-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	da44c340c4	vfio/mdev: simplify mdev_type handling Instead of abusing struct attribute_group to control initialization of struct mdev_type, just define the actual attributes in the mdev_driver, allocate the mdev_type structures in the caller and pass them to mdev_register_parent. This allows the caller to use container_of to get at the containing structure and thus significantly simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-6-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	89345d5177	vfio/mdev: embedd struct mdev_parent in the parent data structure Simplify mdev_{un}register_device by requiring the caller to pass in a structure allocate as part of the parent device structure. This removes the need for a list of parents and the separate mdev_parent refcount as we can simplify rely on the reference to the parent device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-5-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Christoph Hellwig	bdef2b7896	vfio/mdev: make mdev.h standalone includable Include <linux/device.h> and <linux/uuid.h> so that users of this headers don't need to do that and remove those includes that aren't needed any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-4-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-10-04 12:06:58 -06:00
Longfang Liu	42e1d1eed2	hisi_acc_vfio_pci: Update some log and comment formats 1. Modify some annotation information formats to keep the entire driver annotation format consistent. 2. Modify some log description formats to be consistent with the format of the entire driver log. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-6-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-27 09:30:31 -06:00
Longfang Liu	3b7cfba0d8	hisi_acc_vfio_pci: Remove useless macro definitions The QM_QUE_ISO_CFG macro definition is no longer used and needs to be deleted from the current driver. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-5-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-27 09:30:31 -06:00
Longfang Liu	af72f53c1b	hisi_acc_vfio_pci: Remove useless function parameter Remove unused function parameters for vf_qm_fun_reset() and ensure the device is enabled before the reset operation is performed. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-4-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-27 09:30:31 -06:00
Longfang Liu	008e5e996f	hisi_acc_vfio_pci: Fix device data address combination problem The queue address of the accelerator device should be combined into a dma address in a way of combining the low and high bits. The previous combination is wrong and needs to be modified. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-3-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-27 09:30:31 -06:00
Longfang Liu	948f5ada58	hisi_acc_vfio_pci: Fixes error return code issue During the process of compatibility and matching of live migration device information, if the isolation status of the two devices is inconsistent, the live migration needs to be exited. The current driver does not return the error code correctly and needs to be fixed. Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-2-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-27 09:30:31 -06:00
Jason Gunthorpe	ca5f21b257	vfio: Follow a strict lifetime for struct iommu_group The iommu_group comes from the struct device that a driver has been bound to and then created a struct vfio_device against. To keep the iommu layer sane we want to have a simple rule that only an attached driver should be using the iommu API. Particularly only an attached driver should hold ownership. In VFIO's case since it uses the group APIs and it shares between different drivers it is a bit more complicated, but the principle still holds. Solve this by waiting for all users of the vfio_group to stop before allowing vfio_unregister_group_dev() to complete. This is done with a new completion to know when the users go away and an additional refcount to keep track of how many device drivers are sharing the vfio group. The last driver to be unregistered will clean up the group. This solves crashes in the S390 iommu driver that come because VFIO ends up racing releasing ownership (which attaches the default iommu_domain to the device) with the removal of that same device from the iommu driver. This is a side case that iommu drivers should not have to cope with. iommu driver failed to attach the default/blocking domain WARNING: CPU: 0 PID: 5082 at drivers/iommu/iommu.c:1961 iommu_detach_group+0x6c/0x80 Modules linked in: macvtap macvlan tap vfio_pci vfio_pci_core irqbypass vfio_virqfd kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink mlx5_ib sunrpc ib_uverbs ism smc uvdevice ib_core s390_trng eadm_sch tape_3590 tape tape_class vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha aes_s390 mlx5_core des_s390 libdes sha3_512_s390 nvme sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme_core zfcp scsi_transport_fc pkey zcrypt rng_core autofs4 CPU: 0 PID: 5082 Comm: qemu-system-s39 Tainted: G W 6.0.0-rc3 #5 Hardware name: IBM 3931 A01 782 (LPAR) Krnl PSW : 0704c00180000000 000000095bb10d28 (iommu_detach_group+0x70/0x80) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000001 0000000900000027 0000000000000039 000000095c97ffe0 00000000fffeffff 00000009fc290000 00000000af1fda50 00000000af590b58 00000000af1fdaf0 0000000135c7a320 0000000135e52258 0000000135e52200 00000000a29e8000 00000000af590b40 000000095bb10d24 0000038004b13c98 Krnl Code: 000000095bb10d18: c020003d56fc larl %r2,000000095c2bbb10 000000095bb10d1e: c0e50019d901 brasl %r14,000000095be4bf20 #000000095bb10d24: af000000 mc 0,0 >000000095bb10d28: b904002a lgr %r2,%r10 000000095bb10d2c: ebaff0a00004 lmg %r10,%r15,160(%r15) 000000095bb10d32: c0f4001aa867 brcl 15,000000095be65e00 000000095bb10d38: c004002168e0 brcl 0,000000095bf3def8 000000095bb10d3e: eb6ff0480024 stmg %r6,%r15,72(%r15) Call Trace: [<000000095bb10d28>] iommu_detach_group+0x70/0x80 ([<000000095bb10d24>] iommu_detach_group+0x6c/0x80) [<000003ff80243b0e>] vfio_iommu_type1_detach_group+0x136/0x6c8 [vfio_iommu_type1] [<000003ff80137780>] __vfio_group_unset_container+0x58/0x158 [vfio] [<000003ff80138a16>] vfio_group_fops_unl_ioctl+0x1b6/0x210 [vfio] pci 0004:00:00.0: Removing from iommu group 4 [<000000095b5b62e8>] __s390x_sys_ioctl+0xc0/0x100 [<000000095be5d3b4>] __do_syscall+0x1d4/0x200 [<000000095be6c072>] system_call+0x82/0xb0 Last Breaking-Event-Address: [<000000095be4bf80>] __warn_printk+0x60/0x68 It indicates that domain->ops->attach_dev() failed because the driver has already passed the point of destructing the device. Fixes: `9ac8545199` ("iommu: Fix use-after-free in iommu_release_device") Reported-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-a3c5f4429e2a+55-iommu_group_lifetime_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-27 09:01:23 -06:00
Joerg Roedel	38713c6028	Merge branches 'apple/dart', 'arm/mediatek', 'arm/omap', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next	2022-09-26 15:52:31 +02:00
Jason Gunthorpe	cdc71fe4ec	vfio: Move container code into drivers/vfio/container.c All the functions that dereference struct vfio_container are moved into container.c. Simple code motion, no functional change. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-22 15:46:06 -06:00
Jason Gunthorpe	9446162e74	vfio: Split the register_device ops call into functions This is a container item. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-22 15:46:06 -06:00
Jason Gunthorpe	1408640d57	vfio: Rename vfio_ioctl_check_extension() To vfio_container_ioctl_check_extension(). A following patch will turn this into a non-static function, make it clear it is related to the container. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-22 15:46:06 -06:00
Jason Gunthorpe	c41da4622e	vfio: Split out container code from the init/cleanup functions This miscdev, noiommu driver and a couple of globals are all container items. Move this init into its own functions. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-22 15:46:06 -06:00
Jason Gunthorpe	444d43ecd0	vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU This can all be accomplished using typical IS_ENABLED techniques, drop it all. Also rename the variable to vfio_noiommu so this can be made global in following patches. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-22 15:46:06 -06:00
Jason Gunthorpe	03e650f661	vfio: Split the container logic into vfio_container_attach_group() This splits up the ioctl of vfio_group_ioctl_set_container() so it determines the type of file then invokes a type specific attachment function. Future patches will add iommufd to this function as an alternative type. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-22 15:46:06 -06:00
Jason Gunthorpe	429a781c8e	vfio: Rename __vfio_group_unset_container() To vfio_group_detach_container(). This function is really a container function. Fold the WARN_ON() into it as a precondition assertion. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-22 15:46:06 -06:00
Jason Gunthorpe	e3bb4de0a0	vfio: Add header guards and includes to drivers/vfio/vfio.h As is normal for headers. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-22 15:46:06 -06:00
Yi Liu	3c28a76124	vfio: Add struct device to vfio_device and replace kref. With it a 'vfio-dev/vfioX' node is created under the sysfs path of the parent, indicating the device is bound to a vfio driver, e.g.: /sys/devices/pci0000\:6f/0000\:6f\:01.0/vfio-dev/vfio0 It is also a preparatory step toward adding cdev for supporting future device-oriented uAPI. Add Documentation/ABI/testing/sysfs-devices-vfio-dev. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-16-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:11 -06:00
Kevin Tian	4a725b8de4	vfio: Rename vfio_device_put() and vfio_device_try_get() With the addition of vfio_put_device() now the names become confusing. vfio_put_device() is clear from object life cycle p.o.v given kref. vfio_device_put()/vfio_device_try_get() are helpers for tracking users on a registered device. Now rename them: - vfio_device_put() -> vfio_device_put_registration() - vfio_device_try_get() -> vfio_device_try_get_registration() Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-15-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:11 -06:00
Kevin Tian	ebb72b765f	vfio/ccw: Use the new device life cycle helpers ccw is the only exception which cannot use vfio_alloc_device() because its private device structure is designed to serve both mdev and parent. Life cycle of the parent is managed by css_driver so vfio_ccw_private must be allocated/freed in css_driver probe/remove path instead of conforming to vfio core life cycle for mdev. Given that use a wait/completion scheme so the mdev remove path waits after vfio_put_device() until receiving a completion notification from @release. The completion indicates that all active references on vfio_device have been released. After that point although free of vfio_ccw_private is delayed to css_driver it's at least guaranteed to have no parallel reference on released vfio device part from other code paths. memset() in @probe is removed. vfio_device is either already cleared when probed for the first time or cleared in @release from last probe. The right fix is to introduce separate structures for mdev and parent, but this won't happen in short term per prior discussions. Remove vfio_init/uninit_group_dev() as no user now. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220921104401.38898-14-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:11 -06:00
Kevin Tian	ac1237912f	vfio/amba: Use the new device life cycle helpers Implement amba's own vfio_device_ops. Remove vfio_platform_probe/remove_common() given no user now. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-13-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:11 -06:00
Kevin Tian	5f6c7e0831	vfio/platform: Use the new device life cycle helpers Move vfio_device_ops from platform core to platform drivers so device specific init/cleanup can be added. Introduce two new helpers vfio_platform_init/release_common() for the use in driver @init/@release. vfio_platform_probe/remove_common() will be deprecated. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-12-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:11 -06:00
Yi Liu	7566692c57	vfio/fsl-mc: Use the new device life cycle helpers Also add a comment to mark that vfio core releases device_set if @init fails. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-11-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:11 -06:00
Yi Liu	27aeb91559	vfio/hisi_acc: Use the new device life cycle helpers Tidy up @probe so all migration specific initialization logic is moved to migration specific @init callback. Remove vfio_pci_core_{un}init_device() given no user now. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220921104401.38898-5-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:10 -06:00
Yi Liu	d3966e305a	vfio/mlx5: Use the new device life cycle helpers mlx5 has its own @init/@release for handling migration cap. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-4-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:10 -06:00
Yi Liu	63d7c77989	vfio/pci: Use the new device life cycle helpers Also introduce two pci core helpers as @init/@release for pci drivers: - vfio_pci_core_init_dev() - vfio_pci_core_release_dev() Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-3-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:10 -06:00
Kevin Tian	cb9ff3f3b8	vfio: Add helpers for unifying vfio_device life cycle The idea is to let vfio core manage the vfio_device life cycle instead of duplicating the logic cross drivers. This is also a preparatory step for adding struct device into vfio_device. New pair of helpers together with a kref in vfio_device: - vfio_alloc_device() - vfio_put_device() Drivers can register @init/@release callbacks to manage any private state wrapping the vfio_device. However vfio-ccw doesn't fit this model due to a life cycle mess that its private structure mixes both parent and mdev info hence must be allocated/freed outside of the life cycle of vfio device. Per prior discussions this won't be fixed in short term by IBM folks. Instead of waiting for those modifications introduce another helper vfio_init_device() so ccw can call it to initialize a pre-allocated vfio_device. Further implication of the ccw trick is that vfio_device cannot be freed uniformly in vfio core. Instead, require EVERY driver to implement @release and free vfio_device inside. Then ccw can choose to delay the free at its own discretion. Another trick down the road is that kvzalloc() is used to accommodate the need of gvt which uses vzalloc() while all others use kzalloc(). So drivers should call a helper vfio_free_device() to free the vfio_device instead of assuming that kfree() or vfree() is appliable. Later once the ccw mess is fixed we can remove those tricks and fully handle structure alloc/free in vfio core. Existing vfio_{un}init_group_dev() will be deprecated after all existing usages are converted to the new model. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Co-developed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-2-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-21 14:15:10 -06:00
Linus Torvalds	725f3f3b27	VFIO fix for v6.0-rc5 - Fix zero page refcount leak (Alex Williamson) -----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmMaUVMbHGFsZXgud2ls bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiBl4P/3dXKhitc/ryWcqm0BsH xooWVUglmO7p3KuInJWOkT44a8f13EucKMaBry69G48T+H2d7a/e4qPDzsA84ELq 1XjJHUcvyojblwDnqE5MlC0IM7Pb1vUQ0KbcGdGMfh+hYQZ0mrifmNeXE+a+VFYs KIPlesid4A8+9y431ynWEPQ6GtldigURYj93QPIDfPMGtZVugtNEtUZRN/xvSqrt /AHjcYYFdhFCuy9yDZ255+Hnn4NJ7gctFBT7u2znR5fQntQLyK8z8J8ydfzxMu/+ ifMn/uAV5qIv9p0ir0NrFjTda/TD7Qjli5Tw7LKRqMrhnYmG1Y9BwUMYXeQjYlWV qaeE0lVH1E7a9sqrZy8MfaEZdNbfBcawRUjktNea4fZIGHhMQUU2b96cwesMkomW BlfHKp4Ml3lIGFcCh/LYgNHdevTl0WJ2qqStJqYqMxWBY+4zxAOO/AJen/gJ2qDL qj9FqrdjjVwU67/Rg1he4LcXAeG5rPWut2hXvmv/tsc+KDJ0KbQ618xDvlR1dgH8 7KqZIQxKJ6EX5HlPIGO1vb1KhHLTF4OnHPOznorIyS/9bYwOMcoSN7waLCpCKMan 3n+nx16CmsgBh/hwzSzoe7yviIu0V+MGcMzHE++O1moCd+I5hjTE+nHFk4nhDgcl Pc32wk0ql9nCXCMGjV1t/rNs =Ps/q -----END PGP SIGNATURE----- Merge tag 'vfio-v6.0-rc5' of https://github.com/awilliam/linux-vfio Pull VFIO fix from Alex Williamson: - Fix zero page refcount leak (Alex Williamson) * tag 'vfio-v6.0-rc5' of https://github.com/awilliam/linux-vfio: vfio/type1: Unpin zero pages	2022-09-09 07:44:33 -04:00
Yishai Hadas	f39856aacb	vfio/mlx5: Set the driver DMA logging callbacks Now that everything is ready set the driver DMA logging callbacks if supported by the device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-11-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-08 12:59:01 -06:00
Yishai Hadas	e295738756	vfio/mlx5: Manage error scenarios on tracker Handle async error events and health/recovery flow to safely stop the tracker upon error scenarios. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-10-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-08 12:59:01 -06:00
Yishai Hadas	1047797e8e	vfio/mlx5: Report dirty pages from tracker Report dirty pages from tracker. It includes: Querying for dirty pages in a given IOVA range, this is done by modifying the tracker into the reporting state and supplying the required range. Using the CQ event completion mechanism to be notified once data is ready on the CQ/QP to be processed. Once data is available turn on the corresponding bits in the bit map. This functionality will be used as part of the 'log_read_and_clear' driver callback in the next patches. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-9-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-08 12:59:01 -06:00
Yishai Hadas	c1d050b0d1	vfio/mlx5: Create and destroy page tracker object Add support for creating and destroying page tracker object. This object is used to control/report the device dirty pages. As part of creating the tracker need to consider the device capabilities for max ranges and adapt/combine ranges accordingly. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-8-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-08 12:59:01 -06:00
Yishai Hadas	79c3cf2799	vfio/mlx5: Init QP based resources for dirty tracking Init QP based resources for dirty tracking to be used upon start logging. It includes: Creating the host and firmware RC QPs, move each of them to its expected state based on the device specification, etc. Creating the relevant resources which are needed by both QPs as of UAR, PD, etc. Creating the host receive side resources as of MKEY, CQ, receive WQEs, etc. The above resources are cleaned-up upon stop logging. The tracker object that will be introduced by next patches will use those resources. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-08 12:59:00 -06:00
Yishai Hadas	80c4b92a2d	vfio: Introduce the DMA logging feature support Introduce the DMA logging feature support in the vfio core layer. It includes the processing of the device start/stop/report DMA logging UAPIs and calling the relevant driver 'op' to do the work. Specifically, Upon start, the core translates the given input ranges into an interval tree, checks for unexpected overlapping, non aligned ranges and then pass the translated input to the driver for start tracking the given ranges. Upon report, the core translates the given input user space bitmap and page size into an IOVA kernel bitmap iterator. Then it iterates it and call the driver to set the corresponding bits for the dirtied pages in a specific IOVA range. Upon stop, the driver is called to stop the previous started tracking. The next patches from the series will introduce the mlx5 driver implementation for the logging ops. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-08 12:59:00 -06:00
Joao Martins	58ccf0190d	vfio: Add an IOVA bitmap support The new facility adds a bunch of wrappers that abstract how an IOVA range is represented in a bitmap that is granulated by a given page_size. So it translates all the lifting of dealing with user pointers into its corresponding kernel addresses backing said user memory into doing finally the (non-atomic) bitmap ops to change various bits. The formula for the bitmap is: data[(iova / page_size) / 64] & (1ULL << (iova % 64)) Where 64 is the number of bits in a unsigned long (depending on arch) It introduces an IOVA iterator that uses a windowing scheme to minimize the pinning overhead, as opposed to pinning it on demand 4K at a time. Assuming a 4K kernel page and 4K requested page size, we can use a single kernel page to hold 512 page pointers, mapping 2M of bitmap, representing 64G of IOVA space. An example usage of these helpers for a given @base_iova, @page_size, @length and __user @data: bitmap = iova_bitmap_alloc(base_iova, page_size, length, data); if (IS_ERR(bitmap)) return -ENOMEM; ret = iova_bitmap_for_each(bitmap, arg, dirty_reporter_fn); iova_bitmap_free(bitmap); Each iteration of the @dirty_reporter_fn is called with a unique @iova and @length argument, indicating the current range available through the iova_bitmap. The @dirty_reporter_fn uses iova_bitmap_set() to mark dirty areas (@iova_length) within that provided range, as following: iova_bitmap_set(bitmap, iova, iova_length); The facility is intended to be used for user bitmaps representing dirtied IOVAs by IOMMU (via IOMMUFD) and PCI Devices (via vfio-pci). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-08 12:59:00 -06:00
Alex Williamson	71aef261e0	Merge remote-tracking branch 'mlx5/mlx5-vfio' into v6.1/vfio/next Merge net/mlx5 depedencies for device DMA logging and mlx5 variant driver suppport. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-08 10:44:34 -06:00
Christophe JAILLET	eab60bbc05	vfio/fsl-mc: Fix a typo in a message L and S are swapped in the message. s/VFIO_FLS_MC/VFIO_FSL_MC/ Also use 'ret' instead of 'WARN_ON(ret)' to avoid a duplicated message. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Diana Craciun <diana.craciun@oss.nxp.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/a7c1394346725b7435792628c8d4c06a0a745e0b.1662134821.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-08 10:41:41 -06:00
Robin Murphy	fa49364cd5	iommu/dma: Move public interfaces to linux/iommu.h The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with only a couple more public interfaces left pertaining to MSI integration. Since these depend on the main IOMMU API header anyway, move their declarations there, taking the opportunity to update the half-baked comments to proper kerneldoc along the way. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/9cd99738f52094e6bed44bfee03fa4f288d20695.1660668998.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-09-07 14:47:00 +02:00
Shameer Kolothum	245898eb92	hisi_acc_vfio_pci: Correct the function prefix for hssi_acc_drvdata() Commit 91be0bd6c6cf("vfio/pci: Have all VFIO PCI drivers store the vfio_pci_core_device in drvdata") introduced a helper function to retrieve the drvdata but used "hssi" instead of "hisi" for the function prefix. Correct that and also while at it, moved the function a bit down so that it's close to other hisi_ prefixed functions. No functional changes. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220831085943.993-1-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:30:04 -06:00
Jason Gunthorpe	21c13829bc	vfio: Remove vfio_group dev_counter This counts the number of devices attached to a vfio_group, ie the number of items in the group->device_list. It is only read in vfio_pin_pages(), as some kind of protection against limitations in type1. However, with all the code cleanups in this area, now that vfio_pin_pages() accepts a vfio_device directly it is redundant. All drivers are already calling vfio_register_emulated_iommu_dev() which directly creates a group specifically for the device and thus it is guaranteed that there is a singleton group. Leave a note in the comment about this requirement and remove the logic. Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v2-d4374a7bf0c9+c4-vfio_dev_counter_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Abhishek Sahu	453e6c98fd	vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP This patch implements VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature. In the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, if there is any access for the VFIO device on the host side, then the device will be moved out of the low power state without the user's guest driver involvement. Once the device access has been finished, then the host can move the device again into low power state. With the low power entry happened through VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP, the device will not be moved back into the low power state and a notification will be sent to the user by triggering wakeup eventfd. vfio_pci_core_pm_entry() will be called for both the variants of low power feature entry so add an extra argument for wakeup eventfd context and store locally in 'struct vfio_pci_core_device'. For the entry happened without wakeup eventfd, all the exit related handling will be done by the LOW_POWER_EXIT device feature only. When the LOW_POWER_EXIT will be called, then the vfio core layer vfio_device_pm_runtime_get() will increment the usage count and will resume the device. In the driver runtime_resume callback, the 'pm_wake_eventfd_ctx' will be NULL. Then vfio_pci_core_pm_exit() will call vfio_pci_runtime_pm_exit() and all the exit related handling will be done. For the entry happened with wakeup eventfd, in the driver resume callback, eventfd will be triggered and all the exit related handling will be done. When vfio_pci_runtime_pm_exit() will be called by vfio_pci_core_pm_exit(), then it will return early. But if the runtime suspend has not happened on the host side, then all the exit related handling will be done in vfio_pci_core_pm_exit() only. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-6-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Abhishek Sahu	cc2742fe36	vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY/EXIT Currently, if the runtime power management is enabled for vfio-pci based devices in the guest OS, then the guest OS will do the register write for PCI_PM_CTRL register. This write request will be handled in vfio_pm_config_write() where it will do the actual register write of PCI_PM_CTRL register. With this, the maximum D3hot state can be achieved for low power. If we can use the runtime PM framework, then we can achieve the D3cold state (on the supported systems) which will help in saving maximum power. 1. D3cold state can't be achieved by writing PCI standard PM config registers. This patch implements the following newly added low power related device features: - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT The VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY feature will allow the device to make use of low power platform states on the host while the VFIO_DEVICE_FEATURE_LOW_POWER_EXIT will prevent further use of those power states. 2. The vfio-pci driver uses runtime PM framework for low power entry and exit. On the platforms where D3cold state is supported, the runtime PM framework will put the device into D3cold otherwise, D3hot or some other power state will be used. There are various cases where the device will not go into the runtime suspended state. For example, - The runtime power management is disabled on the host side for the device. - The user keeps the device busy after calling LOW_POWER_ENTRY. - There are dependent devices that are still in runtime active state. For these cases, the device will be in the same power state that has been configured by the user through PCI_PM_CTRL register. 3. The hypervisors can implement virtual ACPI methods. For example, in guest linux OS if PCI device ACPI node has _PR3 and _PR0 power resources with _ON/_OFF method, then guest linux OS invokes the _OFF method during D3cold transition and then _ON during D0 transition. The hypervisor can tap these virtual ACPI calls and then call the low power device feature IOCTL. 4. The 'pm_runtime_engaged' flag tracks the entry and exit to runtime PM. This flag is protected with 'memory_lock' semaphore. 5. All the config and other region access are wrapped under pm_runtime_resume_and_get() and pm_runtime_put(). So, if any device access happens while the device is in the runtime suspended state, then the device will be resumed first before access. Once the access has been finished, then the device will again go into the runtime suspended state. 6. The memory region access through mmap will not be allowed in the low power state. Since __vfio_pci_memory_enabled() is a common function, so check for 'pm_runtime_engaged' has been added explicitly in vfio_pci_mmap_fault() to block only mmap'ed access. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-5-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Abhishek Sahu	4813724c4b	vfio/pci: Mask INTx during runtime suspend This patch adds INTx handling during runtime suspend/resume. All the suspend/resume related code for the user to put the device into the low power state will be added in subsequent patches. The INTx lines may be shared among devices. Whenever any INTx interrupt comes for the VFIO devices, then vfio_intx_handler() will be called for each device sharing the interrupt. Inside vfio_intx_handler(), it calls pci_check_and_mask_intx() and checks if the interrupt has been generated for the current device. Now, if the device is already in the D3cold state, then the config space can not be read. Attempt to read config space in D3cold state can cause system unresponsiveness in a few systems. To prevent this, mask INTx in runtime suspend callback, and unmask the same in runtime resume callback. If INTx has been already masked, then no handling is needed in runtime suspend/resume callbacks. 'pm_intx_masked' tracks this, and vfio_pci_intx_mask() has been updated to return true if the INTx vfio_pci_irq_ctx.masked value is changed inside this function. For the runtime suspend which is triggered for the no user of VFIO device, the 'irq_type' will be VFIO_PCI_NUM_IRQS and these callbacks won't do anything. The MSI/MSI-X are not shared so similar handling should not be needed for MSI/MSI-X. vfio_msihandler() triggers eventfd_signal() without doing any device-specific config access. When the user performs any config access or IOCTL after receiving the eventfd notification, then the device will be moved to the D0 state first before servicing any request. Another option was to check this flag 'pm_intx_masked' inside vfio_intx_handler() instead of masking the interrupts. This flag is being set inside the runtime_suspend callback but the device can be in non-D3cold state (for example, if the user has disabled D3cold explicitly by sysfs, the D3cold is not supported in the platform, etc.). Also, in D3cold supported case, the device will be in D0 till the PCI core moves the device into D3cold. In this case, there is a possibility that the device can generate an interrupt. Adding check in the IRQ handler will not clear the IRQ status and the interrupt line will still be asserted. This can cause interrupt flooding. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Abhishek Sahu	8e5c699511	vfio: Increment the runtime PM usage count during IOCTL call The vfio-pci based drivers will have runtime power management support where the user can put the device into the low power state and then PCI devices can go into the D3cold state. If the device is in the low power state and the user issues any IOCTL, then the device should be moved out of the low power state first. Once the IOCTL is serviced, then it can go into the low power state again. The runtime PM framework manages this with help of usage count. One option was to add the runtime PM related API's inside vfio-pci driver but some IOCTL (like VFIO_DEVICE_FEATURE) can follow a different path and more IOCTL can be added in the future. Also, the runtime PM will be added for vfio-pci based drivers variant currently, but the other VFIO based drivers can use the same in the future. So, this patch adds the runtime calls runtime-related API in the top-level IOCTL function itself. For the VFIO drivers which do not have runtime power management support currently, the runtime PM API's won't be invoked. Only for vfio-pci based drivers currently, the runtime PM API's will be invoked to increment and decrement the usage count. In the vfio-pci drivers also, the variant drivers can opt-out by incrementing the usage count during device-open. The pm_runtime_resume_and_get() checks the device current status and will return early if the device is already in the ACTIVE state. Taking this usage count incremented while servicing IOCTL will make sure that the user won't put the device into the low power state when any other IOCTL is being serviced in parallel. Let's consider the following scenario: 1. Some other IOCTL is called. 2. The user has opened another device instance and called the IOCTL for low power entry. 3. The low power entry IOCTL moves the device into the low power state. 4. The other IOCTL finishes. If we don't keep the usage count incremented then the device access will happen between step 3 and 4 while the device has already gone into the low power state. The pm_runtime_resume_and_get() will be the first call so its error should not be propagated to user space directly. For example, if pm_runtime_resume_and_get() can return -EINVAL for the cases where the user has passed the correct argument. So the pm_runtime_resume_and_get() errors have been masked behind -EIO. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-3-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	99a27c088b	vfio: Split VFIO_GROUP_GET_STATUS into a function This is the last sizable implementation in vfio_group_fops_unl_ioctl(), move it to a function so vfio_group_fops_unl_ioctl() is emptied out. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	b3b43590fa	vfio: Follow the naming pattern for vfio_group_ioctl_unset_container() Make it clear that this is the body of the ioctl. Fold the locking into the function so it is self contained like the other ioctls. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/7-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	67671f153e	vfio: Fold VFIO_GROUP_SET_CONTAINER into vfio_group_set_container() No reason to split it up like this, just have one function to process the ioctl. Move the lock into the function as well to avoid having a lockdep annotation. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	150ee2f9cd	vfio: Fold VFIO_GROUP_GET_DEVICE_FD into vfio_group_get_device_fd() No reason to split it up like this, just have one function to process the ioctl. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	663eab456e	vfio-pci: Replace 'void __user *' with proper types in the ioctl functions This makes the code clearer and replaces a few places trying to access a flex array with an actual flex array. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	ea3fc04d4f	vfio-pci: Re-indent what was vfio_pci_core_ioctl() Done mechanically with: $ git clang-format-14 -i --lines 675:1210 drivers/vfio/pci/vfio_pci_core.c And manually reflow the multi-line comments clang-format doesn't fix. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	2ecf3b58ed	vfio-pci: Break up vfio_pci_core_ioctl() into one function per ioctl 500 lines is a bit long for a single function, move the bodies of each ioctl into separate functions and leave behind a switch statement to dispatch them. This patch just adds the function declarations and does not fix the indenting. The next patch will restore the indenting. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	16f4cbd9e1	vfio-pci: Fix vfio_pci_ioeventfd() to return int This only returns 0 or -ERRNO, it should return int like all the other ioctl dispatch functions. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	c462a8c5d9	vfio/pci: Simplify the is_intx/msi/msix/etc defines Only three of these are actually used, simplify to three inline functions, and open code the if statement in vfio_pci_config.c. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/3-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	1e979ef5df	vfio/pci: Rename vfio_pci_register_dev_region() As this is part of the vfio_pci_core component it should be called vfio_pci_core_register_dev_region() like everything else exported from this module. Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/2-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Jason Gunthorpe	e34a0425b8	vfio/pci: Split linux/vfio_pci_core.h The header in include/linux should have only the exported interface for other vfio_pci modules to use. Internal definitions for vfio_pci.ko should be in a "priv" header along side the .c files. Move the internal declarations out of vfio_pci_core.h. They either move to vfio_pci_priv.h or to the C file that is the only user. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/1-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-09-01 15:29:11 -06:00
Alex Williamson	873aefb376	vfio/type1: Unpin zero pages There's currently a reference count leak on the zero page. We increment the reference via pin_user_pages_remote(), but the page is later handled as an invalid/reserved page, therefore it's not accounted against the user and not unpinned by our put_pfn(). Introducing special zero page handling in put_pfn() would resolve the leak, but without accounting of the zero page, a single user could still create enough mappings to generate a reference count overflow. The zero page is always resident, so for our purposes there's no reason to keep it pinned. Therefore, add a loop to walk pages returned from pin_user_pages_remote() and unpin any zero pages. Cc: stable@vger.kernel.org Reported-by: Luboslav Pivarc <lpivarc@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/166182871735.3518559.8884121293045337358.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-08-31 08:57:30 -06:00
Pierre Morel	ca922fecda	KVM: s390: pci: Hook to access KVM lowlevel from VFIO We have a cross dependency between KVM and VFIO when using s390 vfio_pci_zdev extensions for PCI passthrough To be able to keep both subsystem modular we add a registering hook inside the S390 core code. This fixes a build problem when VFIO is built-in and KVM is built as a module. Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Fixes: `09340b2fca` ("KVM: s390: pci: add routines to start/stop interpretive execution") Cc: <stable@vger.kernel.org> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20220819122945.9309-1-pmorel@linux.ibm.com Message-Id: <20220819122945.9309-1-pmorel@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>	2022-08-29 13:29:28 +02:00
Jason Gunthorpe	0f3e72b5c8	vfio: Move vfio.c to vfio_main.c If a source file has the same name as a module then kbuild only supports a single source file in the module. Rename vfio.c to vfio_main.c so that we can have more that one .c file in vfio.ko. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220731125503.142683-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-08-08 14:33:41 -06:00
Linus Torvalds	a9cf69d0e7	VFIO updates for v6.0-rc1 - Cleanup use of extern in function prototypes (Alex Williamson) - Simplify bus_type usage and convert to device IOMMU interfaces (Robin Murphy) - Check missed return value and fix comment typos (Bo Liu) - Split migration ops from device ops and fix races in mlx5 migration support (Yishai Hadas) - Fix missed return value check in noiommu support (Liam Ni) - Hardening to clear buffer pointer to avoid use-after-free (Schspa Shi) - Remove requirement that only the same mm can unmap a previously mapped range (Li Zhe) - Adjust semaphore release vs device open counter (Yi Liu) - Remove unused arg from SPAPR support code (Deming Wang) - Rework vfio-ccw driver to better fit new mdev framework (Eric Farman, Michael Kawano) - Replace DMA unmap notifier with callbacks (Jason Gunthorpe) - Clarify SPAPR support comment relative to iommu_ops (Alexey Kardashevskiy) - Revise page pinning API towards compatibility with future iommufd support (Nicolin Chen) - Resolve issues in vfio-ccw, including use of DMA unmap callback (Eric Farman) -----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmLqvYMbHGFsZXgud2ls bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiHM0P/1n/bszel20PRC7x+NLI P7b/0aonW4Qtei2HORwowmaznb4NgRE5GCm5RU+a9+AwQKnK44j3lqy0skcfgZXr f4viFlxOyd0H4blOhUZ+FuPNkUMAyz6HerzvJ9jQFG426pL5vr7UKWBuJPYB5RCT 4jEy3EUTSH8/Zt8ApLysFTyR64xN3Sk7vSUcj9rEhu5T3FWq8t9+jb3tE/HW/Xaw pMwdC+ctYzYaBD/oA7Ns2IebNS9AUIUjKMXC25oCmc83WGgGOqgLB2mAthQ2NKB5 5capKBYuYl7PWERvpGpsPILEWvR6m+Rxh8r4Pqjcoyfq4k7vp+A/AFKiD7AEYBdy BtfLWO59w6vuRQ5XXOa6Hu4ef6BcMvH4StrHxlHkKcgI4PJA0QscIXiJPQSt7Crr m+kCNgPPgrfZDu7lmZTiWbXOYSkJR3Mxkhf2iNHudW9SsJT9pUAVEiGVVA/kC1Y/ fNBziRQeVF6JUW8M4pveXEWEbA8iE1HQeJA6aVRonxAkJk1KBaQgm/GKJlPXCHIR R6lI90NXZHz/3ndIX1znKOm0qli+8auX/FH8iWUffZxGmtINOGGMYebD6YxFdCCJ sWalL8vlQNCams2MZdovu/5BowXWtwOMm6KNG9RXSyWIWZEcNVbAzhTr+rrDdHZd AJiUNCGO9UlO9FZM+ntfQTSr =4BE8 -----END PGP SIGNATURE----- Merge tag 'vfio-v6.0-rc1' of https://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Cleanup use of extern in function prototypes (Alex Williamson) - Simplify bus_type usage and convert to device IOMMU interfaces (Robin Murphy) - Check missed return value and fix comment typos (Bo Liu) - Split migration ops from device ops and fix races in mlx5 migration support (Yishai Hadas) - Fix missed return value check in noiommu support (Liam Ni) - Hardening to clear buffer pointer to avoid use-after-free (Schspa Shi) - Remove requirement that only the same mm can unmap a previously mapped range (Li Zhe) - Adjust semaphore release vs device open counter (Yi Liu) - Remove unused arg from SPAPR support code (Deming Wang) - Rework vfio-ccw driver to better fit new mdev framework (Eric Farman, Michael Kawano) - Replace DMA unmap notifier with callbacks (Jason Gunthorpe) - Clarify SPAPR support comment relative to iommu_ops (Alexey Kardashevskiy) - Revise page pinning API towards compatibility with future iommufd support (Nicolin Chen) - Resolve issues in vfio-ccw, including use of DMA unmap callback (Eric Farman) * tag 'vfio-v6.0-rc1' of https://github.com/awilliam/linux-vfio: (40 commits) vfio/pci: fix the wrong word vfio/ccw: Check return code from subchannel quiesce vfio/ccw: Remove FSM Close from remove handlers vfio/ccw: Add length to DMA_UNMAP checks vfio: Replace phys_pfn with pages for vfio_pin_pages() vfio/ccw: Add kmap_local_page() for memcpy vfio: Rename user_iova of vfio_dma_rw() vfio/ccw: Change pa_pfn list to pa_iova list vfio/ap: Change saved_pfn to saved_iova vfio: Pass in starting IOVA to vfio_pin/unpin_pages API vfio/ccw: Only pass in contiguous pages vfio/ap: Pass in physical address of ind to ap_aqic() drm/i915/gvt: Replace roundup with DIV_ROUND_UP vfio: Make vfio_unpin_pages() return void vfio/spapr_tce: Fix the comment vfio: Replace the iommu notifier with a device list vfio: Replace the DMA unmapping notifier with a callback vfio/ccw: Move FSM open/close to MDEV open/close vfio/ccw: Refactor vfio_ccw_mdev_reset vfio/ccw: Create a CLOSE FSM event ...	2022-08-06 08:59:35 -07:00
Linus Torvalds	7c5c3a6177	ARM: * Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack * Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure * Disagregation of the vcpu flags in separate sets to better track their use model. * A fix for the GICv2-on-v3 selftest * A small set of cosmetic fixes RISC-V: * Track ISA extensions used by Guest using bitmap * Added system instruction emulation framework * Added CSR emulation framework * Added gfp_custom flag in struct kvm_mmu_memory_cache * Added G-stage ioremap() and iounmap() functions * Added support for Svpbmt inside Guest s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes x86: * Permit guests to ignore single-bit ECC errors * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Use try_cmpxchg64 instead of cmpxchg64 * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * Allow NX huge page mitigation to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs * Miscellaneous cleanups: MCE MSR emulation Use separate namespaces for guest PTEs and shadow PTEs bitmasks PIO emulation Reorganize rmap API, mostly around rmap destruction Do not workaround very old KVM bugs for L0 that runs with nesting enabled new selftests API for CPUID Generic: * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmLnyo4UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtQQf/XjVWiRcWLPR9dqzRM/vvRXpiG+UL jU93R7m6ma99aqTtrxV/AE+kHgamBlma3Cwo+AcWk9uCVNbIhFjv2YKg6HptKU0e oJT3zRYp+XIjEo7Kfw+TwroZbTlG6gN83l1oBLFMqiFmHsMLnXSI2mm8MXyi3dNB vR2uIcTAl58KIprqNNsYJ2dNn74ogOMiXYx9XzoA9/5Xb6c0h4rreHJa5t+0s9RO Gz7Io3PxumgsbJngjyL1Ve5oxhlIAcZA8DU0PQmjxo3eS+k6BcmavGFd45gNL5zg iLpCh4k86spmzh8CWkAAwWPQE4dZknK6jTctJc0OFVad3Z7+X7n0E8TFrA== =PM8o -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Quite a large pull request due to a selftest API overhaul and some patches that had come in too late for 5.19. ARM: - Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack - Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure - Disagregation of the vcpu flags in separate sets to better track their use model. - A fix for the GICv2-on-v3 selftest - A small set of cosmetic fixes RISC-V: - Track ISA extensions used by Guest using bitmap - Added system instruction emulation framework - Added CSR emulation framework - Added gfp_custom flag in struct kvm_mmu_memory_cache - Added G-stage ioremap() and iounmap() functions - Added support for Svpbmt inside Guest s390: - add an interface to provide a hypervisor dump for secure guests - improve selftests to use TAP interface - enable interpretive execution of zPCI instructions (for PCI passthrough) - First part of deferred teardown - CPU Topology - PV attestation - Minor fixes x86: - Permit guests to ignore single-bit ECC errors - Intel IPI virtualization - Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS - PEBS virtualization - Simplify PMU emulation by just using PERF_TYPE_RAW events - More accurate event reinjection on SVM (avoid retrying instructions) - Allow getting/setting the state of the speaker port data bit - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent - "Notify" VM exit (detect microarchitectural hangs) for Intel - Use try_cmpxchg64 instead of cmpxchg64 - Ignore benign host accesses to PMU MSRs when PMU is disabled - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior - Allow NX huge page mitigation to be disabled on a per-vm basis - Port eager page splitting to shadow MMU as well - Enable CMCI capability by default and handle injected UCNA errors - Expose pid of vcpu threads in debugfs - x2AVIC support for AMD - cleanup PIO emulation - Fixes for LLDT/LTR emulation - Don't require refcounted "struct page" to create huge SPTEs - Miscellaneous cleanups: - MCE MSR emulation - Use separate namespaces for guest PTEs and shadow PTEs bitmasks - PIO emulation - Reorganize rmap API, mostly around rmap destruction - Do not workaround very old KVM bugs for L0 that runs with nesting enabled - new selftests API for CPUID Generic: - Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache - new selftests API using struct kvm_vcpu instead of a (vm, id) tuple" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits) selftests: kvm: set rax before vmcall selftests: KVM: Add exponent check for boolean stats selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test selftests: KVM: Check stat name before other fields KVM: x86/mmu: remove unused variable RISC-V: KVM: Add support for Svpbmt inside Guest/VM RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap() RISC-V: KVM: Add G-stage ioremap() and iounmap() functions KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache RISC-V: KVM: Add extensible CSR emulation framework RISC-V: KVM: Add extensible system instruction emulation framework RISC-V: KVM: Factor-out instruction emulation into separate sources RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function RISC-V: KVM: Fix variable spelling mistake RISC-V: KVM: Improve ISA extension by using a bitmap KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs() KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT KVM: x86: Do not block APIC write for non ICR registers ...	2022-08-04 14:59:54 -07:00
Linus Torvalds	b44f2fd879	drm for 5.20/6.0 New driver: - logicvc vfio: - use aperture API core: - of: Add data-lane helpers and convert drivers - connector: Remove deprecated ida_simple_get() media: - Add various RGB666 and RGB888 format constants panel: - Add HannStar HSD101PWW - Add ETML0700Y5DHA dma-buf: - add sync-file API - set dma mask for udmabuf devices fbcon: - Improve scrolling performance - Sanitize input fbdev: - device unregistering fixes - vesa: Support COMPILE_TEST - Disable firmware-device registration when first native driver loads aperture: - fix segfault during hot-unplug - export for use with other subsystems client: - use driver validated modes dp: - aux: make probing more reliable - mst: Read extended DPCD capabilities during system resume - Support waiting for HDP signal - Port-validation fixes edid: - CEA data-block iterators - struct drm_edid introduction - implement HF-EEODB extension gem: - don't use fb format non-existing planes probe-helper: - use 640x480 as displayport fallback scheduler: - don't kill jobs in interrupt context bridge: - Add support for i.MX8qxp and i.MX8qm - lots of fixes/cleanups - Add TI-DLPC3433 - fy07024di26a30d: Optional GPIO reset - ldb: Add reg and reg-name properties to bindings, Kconfig fixes - lt9611: Fix display sensing; - tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling - tc358775: Fix clock settings - ti-sn65dsi83: Allow GPIO to sleep - adv7511: I2C fixes - anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback - fsl-ldb: Drop DE flip - ti-sn65dsi86: Convert to atomic modesetting amdgpu: - use atomic fence helpers in DM - fix VRAM address calculations - export CRTC bpc via debugfs - Initial devcoredump support - Enable high priority gfx queue on asics which support it - Adjust GART size on newer APUs for S/G display - Soft reset for GFX 11 / SDMA 6 - Add gfxoff status query for vangogh - Fix timestamps for cursor only commits - Adjust GART size on newer APUs for S/G display - fix buddy memory corruption amdkfd: - MMU notifier fixes - P2P DMA support using dma-buf - Add available memory IOCTL - HMM profiler support - Simplify GPUVM validation - Unified memory for CWSR save/restore area i915: - General driver clean-up - DG2 enabling (still under force probe) - DG2 small BAR memory support - HuC loading support - DG2 workarounds - DG2/ATS-M device IDs added - Ponte Vecchio prep work and new blitter engines - add Meteorlake support - Fix sparse warnings - DMC MMIO range checks - Audio related fixes - Runtime PM fixes - PSR fixes - Media freq factor and per-gt enhancements - DSI fixes for ICL+ - Disable DMC flip queue handlers - ADL_P voltage swing updates - Use more the VBT for panel information - Fix on Type-C ports with TBT mode - Improve fastset and allow seamless M/N changes - Accept more fixed modes with VRR/DMRRS panels - Disable connector polling for a headless SKU - ADL-S display PLL w/a - Enable THP on Icelake and beyond - Fix i915_gem_object_ggtt_pin_ww regression on old platforms - Expose per tile media freq factor in sysfs - Fix dma_resv fence handling in multi-batch execbuf - Improve on suspend / resume time with VT-d enabled - export CRTC bpc settings via debugfs msm: - gpu: a619 support - gpu: Fix for unclocked GMU register access - gpu: Devcore dump enhancements - client utilization via fdinfo support - fix fence rollover issue - gem: Lockdep false-positive warning fix - gem: Switch to pfn mappings - WB support on sc7180 - dp: dropped custom bulk clock implementation - fix link retraining on resolution change - hdmi: dropped obsolete GPIO support tegra: - context isolation for host1x engines - tegra234 soc support mediatek: - add vdosys0/1 for mt8195 - add MT8195 dp_intf driver exynos: - Fix resume function issue of exynos decon driver by calling clk_disable_unprepare() properly if clk_prepare_enable() failed. nouveau: - set of misc fixes/cleanups - display cleanups gma500: - Cleanup connector I2C handling hyperv: - Unify VRAM allocation of Gen1 and Gen2 meson: - Support YUV422 output; Refcount fixes mgag200: - Support damage clipping - Support gamma handling - Protect concurrent HW access - Fixes to connector - Store model-specific limits in device-info structure - fix PCI register init panfrost: - Valhall support r128: - Fix bit-shift overflow rockchip: - Locking fixes in error path ssd130x: - Fix built-in linkage udl: - Always advertize VGA connector ast: - Support multiple outputs - fix black screen on resume sun4i: - HDMI PHY cleanups vc4: - Add support for BCM2711 vkms: - Allocate output buffer with vmalloc() mcde: - Fix ref-count leak mxsfb/lcdif: - Support i.MX8MP LCD controller stm/ltdc: - Support dynamic Z order - Support mirroring ingenic: - Fix display at maximum resolution -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmLp/7YACgkQDHTzWXnE hr7NjhAAnefa+72EG42OAqajbwTQMENOtFfqyL3k6ueK2ciYbsj/wklw/xc4Ok3o DM5kG54t+nA9L1M7UyE7eaO36/XcuvS8Ea0uKKkamWt+3Ux4g1Vo1J37nP5sK5jI GT/wceKA5sk3nuYly2lBby6mVTGuhAX+3edTAFeOwmd0WvQzzpy4vV+nCAgfshUs ql4gfQPdQdP+wiovUzCIEu6exCSCAI/Oc944fd3AJi5bZbOPFXRS4rMMOLSrdoXV 9P44EZExPbYrDuVUCx/UaZtN8D9myyyBfZe62CtdgNyTYUHXnHCBYue+7D/s5O+y GaLWcP128MsqZNmJNhmcWFIlgqowO24YkKUH68JH0UtBLSWich8rfdEsrxIidYED 0ma1jodRapjyZOjrHEJ3N5deKpoflMmqvCMpvIk1Ev6pT8KX9a6u34kLgsOVCV41 2bDEYD+DbRW2FexGR79yB2huXHGSnguco6069ca1oy9RF4q8cX6Pb1w2u42oS7zX lIgLIashilVR2AYg/qi6IPHavmOQ9ItSXPC+4YasYiMGp/mwePqpmL63b/wkhg0D nXn6/F8Bm6wle2FFbkLGwo1fF1Hn7RzTHSlqRWDKSEaMLhCus6M09VsobFCB19i0 lO4FNVTL8ZtryR94bgVmgi616w9hOhDhM9A+C0kJ9KBkDnDYUJU= =HQ9U -----END PGP SIGNATURE----- Merge tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm Pull drm updates from Dave Airlie: "Highlights: - New driver for logicvc - which is a display IP core. - EDID parser rework to add new extensions - fbcon scrolling improvements - i915 has some more DG2 work but not enabled by default, but should have enough features for userspace to work now. Otherwise it's lots of work all over the place. Detailed summary: New driver: - logicvc vfio: - use aperture API core: - of: Add data-lane helpers and convert drivers - connector: Remove deprecated ida_simple_get() media: - Add various RGB666 and RGB888 format constants panel: - Add HannStar HSD101PWW - Add ETML0700Y5DHA dma-buf: - add sync-file API - set dma mask for udmabuf devices fbcon: - Improve scrolling performance - Sanitize input fbdev: - device unregistering fixes - vesa: Support COMPILE_TEST - Disable firmware-device registration when first native driver loads aperture: - fix segfault during hot-unplug - export for use with other subsystems client: - use driver validated modes dp: - aux: make probing more reliable - mst: Read extended DPCD capabilities during system resume - Support waiting for HDP signal - Port-validation fixes edid: - CEA data-block iterators - struct drm_edid introduction - implement HF-EEODB extension gem: - don't use fb format non-existing planes probe-helper: - use 640x480 as displayport fallback scheduler: - don't kill jobs in interrupt context bridge: - Add support for i.MX8qxp and i.MX8qm - lots of fixes/cleanups - Add TI-DLPC3433 - fy07024di26a30d: Optional GPIO reset - ldb: Add reg and reg-name properties to bindings, Kconfig fixes - lt9611: Fix display sensing; - tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling - tc358775: Fix clock settings - ti-sn65dsi83: Allow GPIO to sleep - adv7511: I2C fixes - anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback - fsl-ldb: Drop DE flip - ti-sn65dsi86: Convert to atomic modesetting amdgpu: - use atomic fence helpers in DM - fix VRAM address calculations - export CRTC bpc via debugfs - Initial devcoredump support - Enable high priority gfx queue on asics which support it - Adjust GART size on newer APUs for S/G display - Soft reset for GFX 11 / SDMA 6 - Add gfxoff status query for vangogh - Fix timestamps for cursor only commits - Adjust GART size on newer APUs for S/G display - fix buddy memory corruption amdkfd: - MMU notifier fixes - P2P DMA support using dma-buf - Add available memory IOCTL - HMM profiler support - Simplify GPUVM validation - Unified memory for CWSR save/restore area i915: - General driver clean-up - DG2 enabling (still under force probe) - DG2 small BAR memory support - HuC loading support - DG2 workarounds - DG2/ATS-M device IDs added - Ponte Vecchio prep work and new blitter engines - add Meteorlake support - Fix sparse warnings - DMC MMIO range checks - Audio related fixes - Runtime PM fixes - PSR fixes - Media freq factor and per-gt enhancements - DSI fixes for ICL+ - Disable DMC flip queue handlers - ADL_P voltage swing updates - Use more the VBT for panel information - Fix on Type-C ports with TBT mode - Improve fastset and allow seamless M/N changes - Accept more fixed modes with VRR/DMRRS panels - Disable connector polling for a headless SKU - ADL-S display PLL w/a - Enable THP on Icelake and beyond - Fix i915_gem_object_ggtt_pin_ww regression on old platforms - Expose per tile media freq factor in sysfs - Fix dma_resv fence handling in multi-batch execbuf - Improve on suspend / resume time with VT-d enabled - export CRTC bpc settings via debugfs msm: - gpu: a619 support - gpu: Fix for unclocked GMU register access - gpu: Devcore dump enhancements - client utilization via fdinfo support - fix fence rollover issue - gem: Lockdep false-positive warning fix - gem: Switch to pfn mappings - WB support on sc7180 - dp: dropped custom bulk clock implementation - fix link retraining on resolution change - hdmi: dropped obsolete GPIO support tegra: - context isolation for host1x engines - tegra234 soc support mediatek: - add vdosys0/1 for mt8195 - add MT8195 dp_intf driver exynos: - Fix resume function issue of exynos decon driver by calling clk_disable_unprepare() properly if clk_prepare_enable() failed. nouveau: - set of misc fixes/cleanups - display cleanups gma500: - Cleanup connector I2C handling hyperv: - Unify VRAM allocation of Gen1 and Gen2 meson: - Support YUV422 output; Refcount fixes mgag200: - Support damage clipping - Support gamma handling - Protect concurrent HW access - Fixes to connector - Store model-specific limits in device-info structure - fix PCI register init panfrost: - Valhall support r128: - Fix bit-shift overflow rockchip: - Locking fixes in error path ssd130x: - Fix built-in linkage udl: - Always advertize VGA connector ast: - Support multiple outputs - fix black screen on resume sun4i: - HDMI PHY cleanups vc4: - Add support for BCM2711 vkms: - Allocate output buffer with vmalloc() mcde: - Fix ref-count leak mxsfb/lcdif: - Support i.MX8MP LCD controller stm/ltdc: - Support dynamic Z order - Support mirroring ingenic: - Fix display at maximum resolution" * tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm: (1480 commits) drm/amd/display: Fix a compilation failure on PowerPC caused by FPU code drm/amdgpu: enable support for psp 13.0.4 block drm/amdgpu: add files for PSP 13.0.4 drm/amdgpu: add header files for MP 13.0.4 drm/amdgpu: correct RLC_RLCS_BOOTLOAD_STATUS offset and index drm/amdgpu: send msg to IMU for the front-door loading drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b" drm/amdgpu: fix hive reference leak when reflecting psp topology info drm/amd/pm: enable GFX ULV feature support for SMU13.0.0 drm/amd/pm: update driver if header for SMU 13.0.0 drm/amdgpu: move mes self test after drm sched re-started drm/amdgpu: drop non-necessary call trace dump drm/amdgpu: enable VCN cg and JPEG cg/pg drm/amdgpu: vcn_4_0_2 video codec query drm/amdgpu: add VCN_4_0_2 firmware support drm/amdgpu: add VCN function in NBIO v7.7 drm/amdgpu: fix a vcn4 boot poll bug in emulation mode drm/amd/amdgpu: add memory training support for PSP_V13 drm/amdkfd: remove an unnecessary amdgpu_bo_ref drm/amd/pm: Add get_gfx_off_status interface for yellow carp ...	2022-08-03 19:52:08 -07:00
Linus Torvalds	a782e86649	Saner handling of "lseek should fail with ESPIPE" - gets rid of magical no_llseek thing and makes checks consistent. In particular, ad-hoc "can we do splice via internal pipe" checks got saner (and somewhat more permissive, which is what Jason had been after, AFAICT) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYug2xgAKCRBZ7Krx/gZQ 6wxWAQDqeg+xMq2FGPXmgjCa+Cp3PXH96Lp6f3hHzakIDx+t8gEAxvuiXAD22Mct 6S1SKuGj0iDIuM4L7hUiWTiY/bDXSAc= =3EC/ -----END PGP SIGNATURE----- Merge tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs lseek updates from Al Viro: "Jason's lseek series. Saner handling of 'lseek should fail with ESPIPE' - this gets rid of the magical no_llseek thing and makes checks consistent. In particular, the ad-hoc "can we do splice via internal pipe" checks got saner (and somewhat more permissive, which is what Jason had been after, AFAICT)" * tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove no_llseek fs: check FMODE_LSEEK to control internal pipe splicing vfio: do not set FMODE_LSEEK flag dma-buf: remove useless FMODE_LSEEK flag fs: do not compare against ->llseek fs: clear or set FMODE_LSEEK based on llseek function	2022-08-03 11:35:20 -07:00
Bo Liu	099fd2c202	vfio/pci: fix the wrong word This patch fixes a wrong word in comment. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20220801013918.2520-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-08-01 13:37:42 -06:00
Paolo Bonzini	63f4b21041	Merge remote-tracking branch 'kvm/next' into kvm-next-5.20 KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID	2022-08-01 03:21:00 -04:00
Nicolin Chen	34a255e676	vfio: Replace phys_pfn with pages for vfio_pin_pages() Most of the callers of vfio_pin_pages() want "struct page " and the low-level mm code to pin pages returns a list of "struct page " too. So there's no gain in converting "struct page *" to PFN in between. Replace the output parameter "phys_pfn" list with a "pages" list, to simplify callers. This also allows us to replace the vfio_iommu_type1 implementation with a more efficient one. And drop the pfn_valid check in the gvt code, as there is no need to do such a check at a page-backed struct page pointer. For now, also update vfio_iommu_type1 to fit this new parameter too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Eric Farman <farman@linux.ibm.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-11-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-25 13:41:22 -06:00
Nicolin Chen	8561aa4fb7	vfio: Rename user_iova of vfio_dma_rw() Following the updated vfio_pin/unpin_pages(), use the simpler "iova". Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-9-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-25 13:41:22 -06:00
Nicolin Chen	44abdd1646	vfio: Pass in starting IOVA to vfio_pin/unpin_pages API The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA. Among all three callers, there was only one caller possibly passing in a non-contiguous PFN list, which is now ensured to have contiguous PFN inputs too. Pass in the starting address with "iova" alone to simplify things, so callers no longer need to maintain a PFN list or to pin/unpin one page at a time. This also allows VFIO to use more efficient implementations of pin/unpin_pages. For now, also update vfio_iommu_type1 to fit this new parameter too, while keeping its input intact (being user_iova) since we don't want to spend too much effort swapping its parameters and local variables at that level. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Eric Farman <farman@linux.ibm.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-6-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-25 13:41:22 -06:00
Nicolin Chen	e8f90717ed	vfio: Make vfio_unpin_pages() return void There's only one caller that checks its return value with a WARN_ON_ONCE, while all other callers don't check the return value at all. Above that, an undo function should not fail. So, simplify the API to return void by embedding similar WARN_ONs. Also for users to pinpoint which condition fails, separate WARN_ON lines, yet remove the "driver->ops->unpin_pages" check, since it's unreasonable for callers to unpin on something totally random that wasn't even pinned. And remove NULL pointer checks for they would trigger oops vs. warnings. Note that npage is already validated in the vfio core, thus drop the same check in the type1 code. Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-2-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-23 07:29:10 -06:00
Alexey Kardashevskiy	9cb633acfe	vfio/spapr_tce: Fix the comment Grepping for "iommu_ops" finds this spot and gives wrong impression that iommu_ops is used in here, fix the comment. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Link: https://lore.kernel.org/r/20220714080912.3713509-1-aik@ozlabs.ru [aw: convert to multi-line comment] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-22 16:24:47 -06:00
Jason Gunthorpe	8cfc5b6075	vfio: Replace the iommu notifier with a device list Instead of bouncing the function call to the driver op through a blocking notifier just have the iommu layer call it directly. Register each device that is being attached to the iommu with the lower driver which then threads them on a linked list and calls the appropriate driver op at the right time. Currently the only use is if dma_unmap() is defined. Also, fully lock all the debugging tests on the pinning path that a dma_unmap is registered. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-20 11:57:59 -06:00
Jason Gunthorpe	ce4b4657ff	vfio: Replace the DMA unmapping notifier with a callback Instead of having drivers register the notifier with explicit code just have them provide a dma_unmap callback op in their driver ops and rely on the core code to wire it up. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-20 11:57:59 -06:00
Jason A. Donenfeld	54ef7a47f6	vfio: do not set FMODE_LSEEK flag This file does not support llseek, so don't set the flag advertising it. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-07-16 09:19:15 -04:00
Matthew Rosato	ba6090ff8a	vfio-pci/zdev: different maxstbl for interpreted devices When doing load/store interpretation, the maximum store block length is determined by the underlying firmware, not the host kernel API. Reflect that in the associated Query PCI Function Group clp capability and let userspace decide which is appropriate to present to the guest. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-20-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2022-07-11 09:54:37 +02:00
Matthew Rosato	faf3bfcb89	vfio-pci/zdev: add function handle to clp base capability The function handle is a system-wide unique identifier for a zPCI device. With zPCI instruction interpretation, the host will no longer be executing the zPCI instructions on behalf of the guest. As a result, the guest needs to use the real function handle in order for firmware to associate the instruction with the proper PCI function. Let's provide that handle to the guest. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-19-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2022-07-11 09:54:36 +02:00
Matthew Rosato	8061d1c31f	vfio-pci/zdev: add open/close device hooks During vfio-pci open_device, pass the KVM associated with the vfio group (if one exists). This is needed in order to pass a special indicator (GISA) to firmware to allow zPCI interpretation facilities to be used for only the specific KVM associated with the vfio-pci device. During vfio-pci close_device, unregister the notifier. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-18-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2022-07-11 09:54:35 +02:00
Matthew Rosato	c435c54639	vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVM The current contents of vfio-pci-zdev are today only useful in a KVM environment; let's tie everything currently under vfio-pci-zdev to this Kconfig statement and require KVM in this case, reducing complexity (e.g. symbol lookups). Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-11-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2022-07-11 09:54:25 +02:00
Alex Williamson	2a8ed7ef00	Merge branches 'v5.20/vfio/spapr_tce-unused-arg-v1', 'v5.20/vfio/comment-typo-v1' and 'v5.20/vfio/vfio-ccw-rework-v4' into v5.20/vfio/next	2022-07-07 15:05:13 -06:00
Deming Wang	ff4f65e4dd	vfio/spapr_tce: Remove the unused parameters container The parameter of container has been unused for tce_iommu_unuse_page. So, we should delete it. Signed-off-by: Deming Wang <wangdeming@inspur.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Link: https://lore.kernel.org/r/20220702064613.5293-1-wangdeming@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-07 10:58:19 -06:00
Bo Liu	6577067d7f	vfio/pci: fix the wrong word This patch fixes a wrong word in comment. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20220704023649.3913-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-06 13:17:01 -06:00
Jason Gunthorpe	afe4e376ac	vfio: Move IOMMU_CAP_CACHE_COHERENCY test to after we know we have a group The test isn't going to work if a group doesn't exist. Normally this isn't a problem since VFIO isn't going to create a device if there is no group, but the special CONFIG_VFIO_NOIOMMU behavior allows bypassing this prevention. The new cap test effectively forces a group and breaks this config option. Move the cap test to vfio_group_find_or_alloc() which is the earliest time we know we have a group available and thus are not running in noiommu mode. Fixes: `e8ae0e140c` ("vfio: Require that devices support DMA cache coherence") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-e8934b490f36+f4-vfio_cap_fix_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-05 16:06:50 -06:00
Alex Williamson	7654a8881a	Merge branches 'v5.20/vfio/migration-enhancements-v3', 'v5.20/vfio/simplify-bus_type-determination-v3', 'v5.20/vfio/check-vfio_register_iommu_driver-return-v2', 'v5.20/vfio/check-iommu_group_set_name_return-v1', 'v5.20/vfio/clear-caps-buf-v3', 'v5.20/vfio/remove-useless-judgement-v1' and 'v5.20/vfio/move-device_open-count-v2' into v5.20/vfio/next	2022-06-30 11:50:40 -06:00
Yi Liu	330c179976	vfio: Move "device->open_count--" out of group_rwsem in vfio_device_open() We do not protect the vfio_device::open_count with group_rwsem elsewhere (see vfio_device_fops_release as a comparison, where we already drop group_rwsem before open_count--). So move the group_rwsem unlock prior to open_count--. This change now also drops group_rswem before setting device->kvm = NULL, but that's also OK (again, just like vfio_device_fops_release). The setting of device->kvm before open_device is technically done while holding the group_rwsem, this is done to protect the group kvm value we are copying from, and we should not be relying on that to protect the contents of device->kvm; instead we assume this value will not change until after the device is closed and while under the dev_set->lock. Cc: Matthew Rosato <mjrosato@linux.ibm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220627074119.523274-1-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-30 11:07:07 -06:00
Li Zhe	ffed0518d8	vfio: remove useless judgement In function vfio_dma_do_unmap(), we currently prevent process to unmap vfio dma region whose mm_struct is different from the vfio_dma->task. In our virtual machine scenario which is using kvm and qemu, this judgement stops us from liveupgrading our qemu, which uses fork() && exec() to load the new binary but the new process cannot do the VFIO_IOMMU_UNMAP_DMA action during vm exit because of this judgement. This judgement is added in commit `8f0d5bb95f` ("vfio iommu type1: Add task structure to vfio_dma") for the security reason. But it seems that no other task who has no family relationship with old and new process can get the same vfio_dma struct here for the reason of resource isolation. So this patch delete it. Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca> Link: https://lore.kernel.org/r/20220627035109.73745-1-lizhe.67@bytedance.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-30 11:02:40 -06:00
Schspa Shi	6641085e8d	vfio: Clear the caps->buf to NULL after free On buffer resize failure, vfio_info_cap_add() will free the buffer, report zero for the size, and return -ENOMEM. As additional hardening, also clear the buffer pointer to prevent any chance of a double free. Signed-off-by: Schspa Shi <schspa@gmail.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20220629022948.55608-1-schspa@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-30 11:01:14 -06:00
Liam Ni	1c61d51e96	vfio: check iommu_group_set_name() return value As iommu_group_set_name() can fail, we should check the return value. Signed-off-by: Liam Ni <zhiguangni01@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220625114239.9301-1-zhiguangni01@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-30 10:57:58 -06:00
Yishai Hadas	6e97eba8ad	vfio: Split migration ops from main device ops vfio core checks whether the driver sets some migration op (e.g. set_state/get_state) and accordingly calls its op. However, currently mlx5 driver sets the above ops without regards to its migration caps. This might lead to unexpected usage/Oops if user space may call to the above ops even if the driver doesn't support migration. As for example, the migration state_mutex is not initialized in that case. The cleanest way to manage that seems to split the migration ops from the main device ops, this will let the driver setting them separately from the main ops when it's applicable. As part of that, validate ops construction on registration and include a check for VFIO_MIGRATION_STOP_COPY since the uAPI claims it must be set in migration_flags. HISI driver was changed as well to match this scheme. This scheme may enable down the road to come with some extra group of ops (e.g. DMA log) that can be set without regards to the other options based on driver caps. Fixes: `6fadb02126` ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220628155910.171454-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-30 10:47:22 -06:00
Yishai Hadas	2b1c190628	vfio/mlx5: Protect mlx5vf_disable_fds() upon close device Protect mlx5vf_disable_fds() upon close device to be called under the state mutex as done in all other places. This will prevent a race with any other flow which calls mlx5vf_disable_fds() as of health/recovery upon MLX5_PF_NOTIFY_DISABLE_VF event. Encapsulate this functionality in a separate function named mlx5vf_cmd_close_migratable() to consider migration caps and for further usage upon close device. Fixes: `6fadb02126` ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220628155910.171454-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-30 10:45:39 -06:00
Bo Liu	a13b1e472b	vfio: check vfio_register_iommu_driver() return value As vfio_register_iommu_driver() can fail, we should check the return value. Signed-off-by: Bo Liu <liubo03@inspur.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20220622045651.5416-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-27 13:57:08 -06:00
Robin Murphy	3b498b6656	vfio: Use device_iommu_capable() Use the new interface to check the capabilities for our device specifically. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4ea5eb64246f1ee188d1a61c3e93b37756932eb7.1656092606.git.robin.murphy@arm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-27 13:23:32 -06:00
Robin Murphy	eed20c782a	vfio/type1: Simplify bus_type determination Since IOMMU groups are mandatory for drivers to support, it stands to reason that any device which has been successfully added to a group must be on a bus supported by that IOMMU driver, and therefore a domain viable for any device in the group must be viable for all devices in the group. This already has to be the case for the IOMMU API's internal default domain, for instance. Thus even if the group contains devices on different buses, that can only mean that the IOMMU driver actually supports such an odd topology, and so without loss of generality we can expect the bus type of any device in a group to be suitable for IOMMU API calls. Furthermore, scrutiny reveals a lack of protection for the bus being removed while vfio_iommu_type1_attach_group() is using it; the reference that VFIO holds on the iommu_group ensures that data remains valid, but does not prevent the group's membership changing underfoot. We can address both concerns by recycling vfio_bus_type() into some superficially similar logic to indirect the IOMMU API calls themselves. Each call is thus protected from races by the IOMMU group's own locking, and we no longer need to hold group-derived pointers beyond that scope. It also gives us an easy path for the IOMMU API's migration of bus-based interfaces to device-based, of which we can already take the first step with device_iommu_capable(). As with domains, any capability must in practice be consistent for devices in a given group - and after all it's still the same capability which was expected to be consistent across an entire bus! - so there's no need for any complicated validation. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/194a12d3434d7b38f84fa96503c7664451c8c395.1656092606.git.robin.murphy@arm.com [aw: add comment to vfio_iommu_device_capable()] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-27 13:23:29 -06:00
Alex Williamson	d1877e639b	vfio: de-extern-ify function prototypes The use of 'extern' in function prototypes has been disrecommended in the kernel coding style for several years now, remove them from all vfio related files so contributors no longer need to decide between style and consistency. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/165471414407.203056.474032786990662279.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-06-27 09:38:02 -06:00
Alex Williamson	d173780620	vfio/pci: Remove console drivers Console drivers can create conflicts with PCI resources resulting in userspace getting mmap failures to memory BARs. This is especially evident when trying to re-use the system primary console for userspace drivers. Use the aperture helpers to remove these conflicts. v3: * call aperture_remove_conflicting_pci_devices() Reported-by: Laszlo Ersek <lersek@redhat.com> Suggested-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220622140134.12763-4-tzimmermann@suse.de	2022-06-27 11:10:32 +02:00
Linus Torvalds	176882156a	VFIO updates for v5.19-rc1 - Improvements to mlx5 vfio-pci variant driver, including support for parallel migration per PF (Yishai Hadas) - Remove redundant iommu_present() check (Robin Murphy) - Ongoing refactoring to consolidate the VFIO driver facing API to use vfio_device (Jason Gunthorpe) - Use drvdata to store vfio_device among all vfio-pci and variant drivers (Jason Gunthorpe) - Remove redundant code now that IOMMU core manages group DMA ownership (Jason Gunthorpe) - Remove vfio_group from external API handling struct file ownership (Jason Gunthorpe) - Correct typo in uapi comments (Thomas Huth) - Fix coccicheck detected deadlock (Wan Jiabing) - Use rwsem to remove races and simplify code around container and kvm association to groups (Jason Gunthorpe) - Harden access to devices in low power states and use runtime PM to enable d3cold support for unused devices (Abhishek Sahu) - Fix dma_owner handling of fake IOMMU groups (Jason Gunthorpe) - Set driver_managed_dma on vfio-pci variant drivers (Jason Gunthorpe) - Pass KVM pointer directly rather than via notifier (Matthew Rosato) -----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmKPvyMbHGFsZXgud2ls bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsihegP/3XamiYsS0GuA7awAq/X h9Jahb6kJ+sh0RXL1Gqzc9nxH5X9H/hBcL88VOV3GLwyOhNVNpVjQXGguL3aLaCE zUrs0+AFEJb990y9H+VgwIDom5BIpgdZ2naG42bz9wUeVGg4daJnkMwOgXwIBzfx IOddktN6UwuE+DyA57yqL93f+0cTrhYZx9R14sDoLR5lE4uGnbQwIknawEKVtoeR rEPaCFptxPxCUbqoOSR0Y3bu6rUYSH4iiMZpMviqm2ak3aNn76gru3q4QAnI4gTd l/w+2OJNFC0U7H5Cz7cdIn2StdJvfSkX0e753+qsFccFsViRCGdnW0Lht/xrYrFC i8AJxkrq2/bs00LXs7kzcruaD8pJ2UPe2x2+nupHSEsj99K4NraeHRB2CC1uwj0d gYliOSW5T3//wOpztK48s475VppgXeKWkXGoNY3JJlGjAPyd0vFrH8hRLhVZJ9uI /eLh6hQnOJuCDz1rQrVNRk6cZi9R1Wpl5dvCBRLqjK519nm569aTlVBra+iNyUCQ lU5/kN0ym8+X8CweE5ILPGiX2iEXBYMqv+Dm5yOimRUHRJZHYv900FX0GVEnCUCq 23sMDaeHS1hyDCQk//bd2Ig7xjh7mbh7CrKcdJ7pL5Gc/A1zkCXd54hvxViiGwQq U5KIPTyJy+erpcpxjUApaoP2 =etEI -----END PGP SIGNATURE----- Merge tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfio Pull vfio updates from Alex Williamson: - Improvements to mlx5 vfio-pci variant driver, including support for parallel migration per PF (Yishai Hadas) - Remove redundant iommu_present() check (Robin Murphy) - Ongoing refactoring to consolidate the VFIO driver facing API to use vfio_device (Jason Gunthorpe) - Use drvdata to store vfio_device among all vfio-pci and variant drivers (Jason Gunthorpe) - Remove redundant code now that IOMMU core manages group DMA ownership (Jason Gunthorpe) - Remove vfio_group from external API handling struct file ownership (Jason Gunthorpe) - Correct typo in uapi comments (Thomas Huth) - Fix coccicheck detected deadlock (Wan Jiabing) - Use rwsem to remove races and simplify code around container and kvm association to groups (Jason Gunthorpe) - Harden access to devices in low power states and use runtime PM to enable d3cold support for unused devices (Abhishek Sahu) - Fix dma_owner handling of fake IOMMU groups (Jason Gunthorpe) - Set driver_managed_dma on vfio-pci variant drivers (Jason Gunthorpe) - Pass KVM pointer directly rather than via notifier (Matthew Rosato) * tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfio: (38 commits) vfio: remove VFIO_GROUP_NOTIFY_SET_KVM vfio/pci: Add driver_managed_dma to the new vfio_pci drivers vfio: Do not manipulate iommu dma_owner for fake iommu groups vfio/pci: Move the unused device into low power state with runtime PM vfio/pci: Virtualize PME related registers bits and initialize to zero vfio/pci: Change the PF power state to D0 before enabling VFs vfio/pci: Invalidate mmaps and block the access in D3hot power state vfio: Change struct vfio_group::container_users to a non-atomic int vfio: Simplify the life cycle of the group FD vfio: Fully lock struct vfio_group::container vfio: Split up vfio_group_get_device_fd() vfio: Change struct vfio_group::opened from an atomic to bool vfio: Add missing locking for struct vfio_group::kvm kvm/vfio: Fix potential deadlock problem in vfio include/uapi/linux/vfio.h: Fix trivial typo - _IORW should be _IOWR instead vfio/pci: Use the struct file as the handle not the vfio_group kvm/vfio: Remove vfio_group from kvm vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm() vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent() vfio: Remove vfio_external_group_match_file() ...	2022-06-01 13:49:15 -07:00
Linus Torvalds	e1cbc3b96a	IOMMU Updates for Linux v5.19 Including: - Intel VT-d driver updates - Domain force snooping improvement. - Cleanups, no intentional functional changes. - ARM SMMU driver updates - Add new Qualcomm device-tree compatible strings - Add new Nvidia device-tree compatible string for Tegra234 - Fix UAF in SMMUv3 shared virtual addressing code - Force identity-mapped domains for users of ye olde SMMU legacy binding - Minor cleanups - Patches to fix a BUG_ON in the vfio_iommu_group_notifier - Groundwork for upcoming iommufd framework - Introduction of DMA ownership so that an entire IOMMU group is either controlled by the kernel or by user-space - MT8195 and MT8186 support in the Mediatek IOMMU driver - Patches to make forcing of cache-coherent DMA more coherent between IOMMU drivers - Fixes for thunderbolt device DMA protection - Various smaller fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmKWCbUACgkQK/BELZcB GuPHmRAAuoH9iK/jrC3SgrqpBfH2iRN7ovIX8dFvgbQWX27lhXF4gvj2/nYdIvPK 75j/LmdibuzV3Iez4kjbGKNG1AikwK3dKIH21a84f3ctnoamQyL6nMfCVBFaVD/D kvPpTHyjbGPNf6KZyWQdkJ5DXD1aoG1DKkBnslH5pTNPqGuNqbcnRTg0YxiJFLBv 5w2B6jL06XRzunh+Sp1Dbj+po8ROjLRCEU+tdrndO8W/Dyp6+ZNNuxL9/3BM9zMj py0M4piFtGnhmJSdym1eeHm7r1YRjkZw+MN+e8NcrcSihmDutEWo7nRRxA5uVaa+ 3O2DNERqCvQUYxfNRUOKwzV8v51GYQHEPhvOe/MLgaEQDmDmlF2dHNGm93eCMdrv m1cT011oU7pa4qHomwLyTJxSsR7FzJ37igq/WeY++MBhl+frqfzEQPVxF+W7GLb8 QvT/+woCPzLVpJbE7s0FUD4nbPd8c1dAz4+HO1DajxILIOTq1bnPIorSjgXODRjq yzsiP1rAg0L0PsL7pXn3cPMzNCE//xtOsRsAGmaVv6wBoMLyWVFCU/wjPEdjrSWA nXpAuCL84uxCEl/KLYMsg9UhjT6ko7CuKdsybIG9zNIiUau43uSqgTen0xCpYt0i m//O/X3tPyxmoLKRW+XVehGOrBZW+qrQny6hk/Zex+6UJQqVMTA= =W0hj -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Intel VT-d driver updates: - Domain force snooping improvement. - Cleanups, no intentional functional changes. - ARM SMMU driver updates: - Add new Qualcomm device-tree compatible strings - Add new Nvidia device-tree compatible string for Tegra234 - Fix UAF in SMMUv3 shared virtual addressing code - Force identity-mapped domains for users of ye olde SMMU legacy binding - Minor cleanups - Fix a BUG_ON in the vfio_iommu_group_notifier: - Groundwork for upcoming iommufd framework - Introduction of DMA ownership so that an entire IOMMU group is either controlled by the kernel or by user-space - MT8195 and MT8186 support in the Mediatek IOMMU driver - Make forcing of cache-coherent DMA more coherent between IOMMU drivers - Fixes for thunderbolt device DMA protection - Various smaller fixes and cleanups * tag 'iommu-updates-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (88 commits) iommu/amd: Increase timeout waiting for GA log enablement iommu/s390: Tolerate repeat attach_dev calls iommu/vt-d: Remove hard coding PGSNP bit in PASID entries iommu/vt-d: Remove domain_update_iommu_snooping() iommu/vt-d: Check domain force_snooping against attached devices iommu/vt-d: Block force-snoop domain attaching if no SC support iommu/vt-d: Size Page Request Queue to avoid overflow condition iommu/vt-d: Fold dmar_insert_one_dev_info() into its caller iommu/vt-d: Change return type of dmar_insert_one_dev_info() iommu/vt-d: Remove unneeded validity check on dev iommu/dma: Explicitly sort PCI DMA windows iommu/dma: Fix iova map result check bug iommu/mediatek: Fix NULL pointer dereference when printing dev_name iommu: iommu_group_claim_dma_owner() must always assign a domain iommu/arm-smmu: Force identity domains for legacy binding iommu/arm-smmu: Support Tegra234 SMMU dt-bindings: arm-smmu: Add compatible for Tegra234 SOC dt-bindings: arm-smmu: Document nvidia,memory-controller property iommu/arm-smmu-qcom: Add SC8280XP support dt-bindings: arm-smmu: Add compatible for Qualcomm SC8280XP ...	2022-05-31 09:56:54 -07:00
Matthew Rosato	421cfe6596	vfio: remove VFIO_GROUP_NOTIFY_SET_KVM Rather than relying on a notifier for associating the KVM with the group, let's assume that the association has already been made prior to device_open. The first time a device is opened associate the group KVM with the device. This fixes a user-triggerable oops in GVT. Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Zhi Wang <zhi.a.wang@intel.com> Link: https://lore.kernel.org/r/20220519183311.582380-2-mjrosato@linux.ibm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-24 08:41:18 -06:00
Jason Gunthorpe	c490513c81	vfio/pci: Add driver_managed_dma to the new vfio_pci drivers When the iommu series adding driver_managed_dma was rebased it missed that new VFIO drivers were added and did not update them too. Without this vfio will claim the groups are not viable. Add driver_managed_dma to mlx5 and hisi. Fixes: `70693f4708` ("vfio: Set DMA ownership for VFIO devices") Reported-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-f9dfa642fab0+2b3-vfio_managed_dma_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-23 10:46:34 -06:00
Jason Gunthorpe	a3da1ab6fb	vfio: Do not manipulate iommu dma_owner for fake iommu groups Since asserting dma ownership now causes the group to have its DMA blocked the iommu layer requires a working iommu. This means the dma_owner APIs cannot be used on the fake groups that VFIO creates. Test for this and avoid calling them. Otherwise asserting dma ownership will fail for VFIO mdev devices as a BLOCKING iommu_domain cannot be allocated due to the NULL iommu ops. Fixes: `0286300e60` ("iommu: iommu_group_claim_dma_owner() must always assign a domain") Reported-by: Eric Farman <farman@linux.ibm.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/0-v1-9cfc47edbcd4+13546-vfio_dma_owner_fix_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-23 10:27:43 -06:00
Joerg Roedel	b0dacee202	Merge branches 'apple/dart', 'arm/mediatek', 'arm/msm', 'arm/smmu', 'ppc/pamu', 'x86/vt-d', 'x86/amd' and 'vfio-notifier-fix' into next	2022-05-20 12:27:17 +02:00
Abhishek Sahu	7ab5e10eda	vfio/pci: Move the unused device into low power state with runtime PM Currently, there is very limited power management support available in the upstream vfio_pci_core based drivers. If there are no users of the device, then the PCI device will be moved into D3hot state by writing directly into PCI PM registers. This D3hot state help in saving power but we can achieve zero power consumption if we go into the D3cold state. The D3cold state cannot be possible with native PCI PM. It requires interaction with platform firmware which is system-specific. To go into low power states (including D3cold), the runtime PM framework can be used which internally interacts with PCI and platform firmware and puts the device into the lowest possible D-States. This patch registers vfio_pci_core based drivers with the runtime PM framework. 1. The PCI core framework takes care of most of the runtime PM related things. For enabling the runtime PM, the PCI driver needs to decrement the usage count and needs to provide 'struct dev_pm_ops' at least. The runtime suspend/resume callbacks are optional and needed only if we need to do any extra handling. Now there are multiple vfio_pci_core based drivers. Instead of assigning the 'struct dev_pm_ops' in individual parent driver, the vfio_pci_core itself assigns the 'struct dev_pm_ops'. There are other drivers where the 'struct dev_pm_ops' is being assigned inside core layer (For example, wlcore_probe() and some sound based driver, etc.). 2. This patch provides the stub implementation of 'struct dev_pm_ops'. The subsequent patch will provide the runtime suspend/resume callbacks. All the config state saving, and PCI power management related things will be done by PCI core framework itself inside its runtime suspend/resume callbacks (pci_pm_runtime_suspend() and pci_pm_runtime_resume()). 3. Inside pci_reset_bus(), all the devices in dev_set needs to be runtime resumed. vfio_pci_dev_set_pm_runtime_get() will take care of the runtime resume and its error handling. 4. Inside vfio_pci_core_disable(), the device usage count always needs to be decremented which was incremented in vfio_pci_core_enable(). 5. Since the runtime PM framework will provide the same functionality, so directly writing into PCI PM config register can be replaced with the use of runtime PM routines. Also, the use of runtime PM can help us in more power saving. In the systems which do not support D3cold, With the existing implementation: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3hot So, with runtime PM, the upstream bridge or root port will also go into lower power state which is not possible with existing implementation. In the systems which support D3cold, // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3cold // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3cold So, with runtime PM, both the PCI device and upstream bridge will go into D3cold state. 6. If 'disable_idle_d3' module parameter is set, then also the runtime PM will be enabled, but in this case, the usage count should not be decremented. 7. vfio_pci_dev_set_try_reset() return value is unused now, so this function return type can be changed to void. 8. Use the runtime PM API's in vfio_pci_core_sriov_configure(). The device can be in low power state either with runtime power management (when there is no user) or PCI_PM_CTRL register write by the user. In both the cases, the PF should be moved to D0 state. For preventing any runtime usage mismatch, pci_num_vf() has been called explicitly during disable. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-5-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-18 10:00:48 -06:00
Abhishek Sahu	54918c2874	vfio/pci: Virtualize PME related registers bits and initialize to zero If any PME event will be generated by PCI, then it will be mostly handled in the host by the root port PME code. For example, in the case of PCIe, the PME event will be sent to the root port and then the PME interrupt will be generated. This will be handled in drivers/pci/pcie/pme.c at the host side. Inside this, the pci_check_pme_status() will be called where PME_Status and PME_En bits will be cleared. So, the guest OS which is using vfio-pci device will not come to know about this PME event. To handle these PME events inside guests, we need some framework so that if any PME events will happen, then it needs to be forwarded to virtual machine monitor. We can virtualize PME related registers bits and initialize these bits to zero so vfio-pci device user will assume that it is not capable of asserting the PME# signal from any power state. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-18 10:00:48 -06:00
Abhishek Sahu	f4162eb1e2	vfio/pci: Change the PF power state to D0 before enabling VFs According to [PCIe v5 9.6.2] for PF Device Power Management States "The PF's power management state (D-state) has global impact on its associated VFs. If a VF does not implement the Power Management Capability, then it behaves as if it is in an equivalent power state of its associated PF. If a VF implements the Power Management Capability, the Device behavior is undefined if the PF is placed in a lower power state than the VF. Software should avoid this situation by placing all VFs in lower power state before lowering their associated PF's power state." From the vfio driver side, user can enable SR-IOV when the PF is in D3hot state. If VF does not implement the Power Management Capability, then the VF will be actually in D3hot state and then the VF BAR access will fail. If VF implements the Power Management Capability, then VF will assume that its current power state is D0 when the PF is D3hot and in this case, the behavior is undefined. To support PF power management, we need to create power management dependency between PF and its VF's. The runtime power management support may help with this where power management dependencies are supported through device links. But till we have such support in place, we can disallow the PF to go into low power state, if PF has VF enabled. There can be a case, where user first enables the VF's and then disables the VF's. If there is no user of PF, then the PF can put into D3hot state again. But with this patch, the PF will still be in D0 state after disabling VF's since detecting this case inside vfio_pci_core_sriov_configure() requires access to struct vfio_device::open_count along with its locks. But the subsequent patches related to runtime PM will handle this case since runtime PM maintains its own usage count. Also, vfio_pci_core_sriov_configure() can be called at any time (with and without vfio pci device user), so the power state change and SR-IOV enablement need to be protected with the required locks. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-3-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-18 10:00:48 -06:00
Abhishek Sahu	2b2c651baf	vfio/pci: Invalidate mmaps and block the access in D3hot power state According to [PCIe v5 5.3.1.4.1] for D3hot state "Configuration and Message requests are the only TLPs accepted by a Function in the D3Hot state. All other received Requests must be handled as Unsupported Requests, and all received Completions may optionally be handled as Unexpected Completions." Currently, if the vfio PCI device has been put into D3hot state and if user makes non-config related read/write request in D3hot state, these requests will be forwarded to the host and this access may cause issues on a few systems. This patch leverages the memory-disable support added in commit 'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")' to generate page fault on mmap access and return error for the direct read/write. If the device is D3hot state, then the error will be returned for MMIO access. The IO access generally does not make the system unresponsive so the IO access can still happen in D3hot state. The default value should be returned in this case without bringing down the complete system. Also, the power related structure fields need to be protected so we can use the same 'memory_lock' to protect these fields also. This protection is mainly needed when user changes the PCI power state by writing into PCI_PM_CTRL register. vfio_lock_and_set_power_state() wrapper function will take the required locks and then it will invoke the vfio_pci_set_power_state(). Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-18 09:59:18 -06:00
Jason Gunthorpe	3ca5470878	vfio: Change struct vfio_group::container_users to a non-atomic int Now that everything is fully locked there is no need for container_users to remain as an atomic, change it to an unsigned int. Use 'if (group->container)' as the test to determine if the container is present or not instead of using container_users. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/6-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-17 13:07:09 -06:00
Jason Gunthorpe	b76c0eed74	vfio: Simplify the life cycle of the group FD Once userspace opens a group FD it is prevented from opening another instance of that same group FD until all the prior group FDs and users of the container are done. The first is done trivially by checking the group->opened during group FD open. However, things get a little weird if userspace creates a device FD and then closes the group FD. The group FD still cannot be re-opened, but this time it is because the group->container is still set and container_users is elevated by the device FD. Due to this mismatched lifecycle we have the vfio_group_try_dissolve_container() which tries to auto-free a container after the group FD is closed but the device FD remains open. Instead have the device FD hold onto a reference to the single group FD. This directly prevents vfio_group_fops_release() from being called when any device FD exists and makes the lifecycle model more understandable. vfio_group_try_dissolve_container() is removed as the only place a container is auto-deleted is during vfio_group_fops_release(). At this point the container_users is either 1 or 0 since all device FDs must be closed. Change group->opened to group->opened_file which points to the single struct file * that is open for the group. If the group->open_file is NULL then group->container == NULL. If all device FDs have closed then the group's notifier list must be empty. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/5-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-17 13:07:09 -06:00
Jason Gunthorpe	e0e29bdb59	vfio: Fully lock struct vfio_group::container This is necessary to avoid various user triggerable races, for instance racing SET_CONTAINER/UNSET_CONTAINER: ioctl(VFIO_GROUP_SET_CONTAINER) ioctl(VFIO_GROUP_UNSET_CONTAINER) vfio_group_unset_container int users = atomic_cmpxchg(&group->container_users, 1, 0); // users == 1 container_users == 0 __vfio_group_unset_container(group); container = group->container; vfio_group_set_container() if (!atomic_read(&group->container_users)) down_write(&container->group_lock); group->container = container; up_write(&container->group_lock); down_write(&container->group_lock); group->container = NULL; up_write(&container->group_lock); vfio_container_put(container); /* woops we lost/leaked the new container */ This can then go on to NULL pointer deref since container == 0 and container_users == 1. Wrap all touches of container, except those on a performance path with a known open device, with the group_rwsem. The only user of vfio_group_add_container_user() holds the user count for a simple operation, change it to just hold the group_lock over the operation and delete vfio_group_add_container_user(). Containers now only gain a user when a device FD is opened. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/4-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-17 13:07:09 -06:00
Jason Gunthorpe	805bb6c1bd	vfio: Split up vfio_group_get_device_fd() The split follows the pairing with the destroy functions: - vfio_group_get_device_fd() destroyed by close() - vfio_device_open() destroyed by vfio_device_fops_release() - vfio_device_assign_container() destroyed by vfio_group_try_dissolve_container() The next patch will put a lock around vfio_device_assign_container(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/3-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-17 13:07:09 -06:00
Jason Gunthorpe	c6f4860ef9	vfio: Change struct vfio_group::opened from an atomic to bool This is not a performance path, just use the group_rwsem to protect the value. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/2-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-17 13:07:09 -06:00
Jason Gunthorpe	be8d3adae6	vfio: Add missing locking for struct vfio_group::kvm Without locking userspace can trigger a UAF by racing KVM_DEV_VFIO_GROUP_DEL with VFIO_GROUP_GET_DEVICE_FD: CPU1 CPU2 ioctl(KVM_DEV_VFIO_GROUP_DEL) ioctl(VFIO_GROUP_GET_DEVICE_FD) vfio_group_get_device_fd open_device() intel_vgpu_open_device() vfio_register_notifier() vfio_register_group_notifier() blocking_notifier_call_chain(&group->notifier, VFIO_GROUP_NOTIFY_SET_KVM, group->kvm); set_kvm() group->kvm = NULL close() kfree(kvm) intel_vgpu_group_notifier() vdev->kvm = data [..] kvm_get_kvm(vgpu->kvm); // UAF! Add a simple rwsem in the group to protect the kvm while the notifier is using it. Note this doesn't fix the race internal to i915 where userspace can trigger two VFIO_GROUP_NOTIFY_SET_KVM's before we reach a consumer of vgpu->kvm and trigger this same UAF, it just makes the notifier self-consistent. Fixes: `ccd46dbae7` ("vfio: support notifier chain in vfio_group") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/1-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-17 13:07:09 -06:00
Jason Gunthorpe	6a985ae80b	vfio/pci: Use the struct file as the handle not the vfio_group VFIO PCI does a security check as part of hot reset to prove that the user has permission to manipulate all the devices that will be impacted by the reset. Use a new API vfio_file_has_dev() to perform this security check against the struct file directly and remove the vfio_group from VFIO PCI. Since VFIO PCI was the last user of vfio_group_get_external_user() and vfio_group_put_external_user() remove it as well. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-13 10:14:20 -06:00
Jason Gunthorpe	ba70a89f3c	vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm() Just change the argument from struct vfio_group to struct file *. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-13 10:14:20 -06:00
Jason Gunthorpe	a905ad043f	vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent() Instead of a general extension check change the function into a limited test if the iommu_domain has enforced coherency, which is the only thing kvm needs to query. Make the new op self contained by properly refcounting the container before touching it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-13 10:14:20 -06:00
Jason Gunthorpe	c38ff5b0c3	vfio: Remove vfio_external_group_match_file() vfio_group_fops_open() ensures there is only ever one struct file open for any struct vfio_group at any time: /* Do we need multiple instances of the group open? Seems not. / opened = atomic_cmpxchg(&group->opened, 0, 1); if (opened) { vfio_group_put(group); return -EBUSY; Therefor the struct file can be used directly to search the list of VFIO groups that KVM keeps instead of using the vfio_external_group_match_file() callback to try to figure out if the passed in FD matches the list or not. Delete vfio_external_group_match_file(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-13 10:14:19 -06:00
Jason Gunthorpe	50d63b5bbf	vfio: Change vfio_external_user_iommu_id() to vfio_file_iommu_group() The only caller wants to get a pointer to the struct iommu_group associated with the VFIO group file. Instead of returning the group ID then searching sysfs for that string to get the struct iommu_group just directly return the iommu_group pointer already held by the vfio_group struct. It already has a safe lifetime due to the struct file kref, the vfio_group and thus the iommu_group cannot be destroyed while the group file is open. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-13 10:14:19 -06:00
Jason Gunthorpe	dc15f82f53	vfio: Delete container_q Now that the iommu core takes care of isolation there is no race between driver attach and container unset. Once iommu_group_release_dma_owner() returns the device can immediately be re-used. Remove this mechanism. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-a1e8791d795b+6b-vfio_container_q_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-13 10:08:02 -06:00
Alex Williamson	c5e8c39282	Merge remote-tracking branch 'iommu/vfio-notifier-fix' into v5.19/vfio/next Merge IOMMU dependencies for vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-13 10:04:22 -06:00
Jason Gunthorpe	ff806cbd90	vfio/pci: Remove vfio_device_get_from_dev() The last user of this function is in PCI callbacks that want to convert their struct pci_dev to a vfio_device. Instead of searching use the vfio_device available trivially through the drvdata. When a callback in the device_driver is called, the caller must hold the device_lock() on dev. The purpose of the device_lock is to prevent remove() from being called (see __device_release_driver), and allow the driver to safely interact with its drvdata without races. The PCI core correctly follows this and holds the device_lock() when calling error_detected (see report_error_detected) and sriov_configure (see sriov_numvfs_store). Further, since the drvdata holds a positive refcount on the vfio_device any access of the drvdata, under the device_lock(), from a driver callback needs no further protection or refcounting. Thus the remark in the vfio_device_get_from_dev() comment does not apply here, VFIO PCI drivers all call vfio_unregister_group_dev() from their remove callbacks under the device_lock() and cannot race with the remaining callers. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:32:56 -06:00
Jason Gunthorpe	91be0bd6c6	vfio/pci: Have all VFIO PCI drivers store the vfio_pci_core_device in drvdata Having a consistent pointer in the drvdata will allow the next patch to make use of the drvdata from some of the core code helpers. Use a WARN_ON inside vfio_pci_core_register_device() to detect drivers that miss this. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:32:56 -06:00
Jason Gunthorpe	eadd86f835	vfio: Remove calls to vfio_group_add_container_user() When the open_device() op is called the container_users is incremented and held incremented until close_device(). Thus, so long as drivers call functions within their open_device()/close_device() region they do not need to worry about the container_users. These functions can all only be called between open_device() and close_device(): vfio_pin_pages() vfio_unpin_pages() vfio_dma_rw() vfio_register_notifier() vfio_unregister_notifier() Eliminate the calls to vfio_group_add_container_user() and add vfio_assert_device_open() to detect driver mis-use. This causes the close_device() op to check device->open_count so always leave it elevated while calling the op. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:13:00 -06:00
Jason Gunthorpe	231657b345	vfio: Remove dead code Now that callers have been updated to use the vfio_device APIs the driver facing group interface is no longer used, delete it: - vfio_group_get_external_user_from_dev() - vfio_group_pin_pages() - vfio_group_unpin_pages() - vfio_group_iommu_domain() -- Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:13:00 -06:00
Jason Gunthorpe	c6250ffbac	vfio/mdev: Pass in a struct vfio_device * to vfio_dma_rw() Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. Change vfio_dma_rw() to take in the struct vfio_device and move the container users that would have been held by vfio_group_get_external_user_from_dev() to vfio_dma_rw() directly, like vfio_pin/unpin_pages(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:12:59 -06:00
Jason Gunthorpe	8e432bb015	vfio/mdev: Pass in a struct vfio_device * to vfio_pin/unpin_pages() Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. The struct vfio_device already contains the group we need so this avoids complexity, extra refcountings, and a confusing lifecycle model. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:12:59 -06:00
Jason Gunthorpe	09ea48efff	vfio: Make vfio_(un)register_notifier accept a vfio_device All callers have a struct vfio_device trivially available, pass it in directly and avoid calling the expensive vfio_group_get_from_dev(). Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:12:58 -06:00
Robin Murphy	a77109ffca	vfio: Stop using iommu_present() IOMMU groups have been mandatory for some time now, so a device without one is necessarily a device without any usable IOMMU, therefore the iommu_present() check is redundant (or at best unhelpful). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/537103bbd7246574f37f2c88704d7824a3a889f2.1649160714.git.robin.murphy@arm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:12:58 -06:00
Alex Williamson	5acb6cd19d	Merge tag 'gvt-next-2022-04-29' into v5.19/vfio/next Merge GVT-g dependencies for vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:12:05 -06:00
Alex Williamson	920df8d6ef	Improve mlx5 live migration driver From Yishai: This series improves mlx5 live migration driver in few aspects as of below. Refactor to enable running migration commands in parallel over the PF command interface. To achieve that we exposed from mlx5_core an API to let the VF be notified before that the PF command interface goes down/up. (e.g. PF reload upon health recovery). Once having the above functionality in place mlx5 vfio doesn't need any more to obtain the global PF lock upon using the command interface but can rely on the above mechanism to be in sync with the PF. This can enable parallel VFs migration over the PF command interface from kernel driver point of view. In addition, Moved to use the PF async command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Alex, as this series touches mlx5_core we may need to send this in a pull request format to VFIO to avoid conflicts before acceptance. Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com Signed-of-by: Leon Romanovsky <leonro@nvidia.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCYntY3AAKCRAp8NhrnBAZ scRWAP0QzEqg/Xqk/geUAQ3dliFrA2DZJm8v9B3x5tA5nEAazAD9HqC17MvDzY8T 6KBP7G37JNg2NCkxnKnt2gCIT+O4lgA= =zwWT -----END PGP SIGNATURE----- Merge tag 'mlx5-lm-parallel' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.19/vfio/next Improve mlx5 live migration driver From Yishai: This series improves mlx5 live migration driver in few aspects as of below. Refactor to enable running migration commands in parallel over the PF command interface. To achieve that we exposed from mlx5_core an API to let the VF be notified before that the PF command interface goes down/up. (e.g. PF reload upon health recovery). Once having the above functionality in place mlx5 vfio doesn't need any more to obtain the global PF lock upon using the command interface but can rely on the above mechanism to be in sync with the PF. This can enable parallel VFs migration over the PF command interface from kernel driver point of view. In addition, Moved to use the PF async command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Alex, as this series touches mlx5_core we may need to send this in a pull request format to VFIO to avoid conflicts before acceptance. Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com Signed-of-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-11 13:08:49 -06:00
Yishai Hadas	85c205db60	vfio/mlx5: Run the SAVE state command in an async mode Use the PF asynchronous command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Link: https://lore.kernel.org/r/20220510090206.90374-5-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-11 09:33:40 +03:00
Yishai Hadas	8580ad14f9	vfio/mlx5: Refactor to enable VFs migration in parallel Refactor to enable different VFs to run their commands over the PF command interface in parallel and to not block one each other. This is done by not using the global PF lock that was used before but relying on the VF attach/detach mechanism to sync. Link: https://lore.kernel.org/r/20220510090206.90374-4-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-11 09:33:33 +03:00
Yishai Hadas	61a2f1460f	vfio/mlx5: Manage the VF attach/detach callback from the PF Manage the VF attach/detach callback from the PF. This lets the driver to enable parallel VFs migration as will be introduced in the next patch. As part of this, reorganize the VF is migratable code to be in a separate function and rename it to be set_migratable() to match its functionality. Link: https://lore.kernel.org/r/20220510090206.90374-3-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-11 09:33:22 +03:00
Dave Airlie	d53b8e19c2	Merge tag 'drm-intel-next-2022-05-06' of git://anongit.freedesktop.org/drm/drm-intel into drm-next drm/i915 feature pull #2 for v5.19: Features and functionality: - Add first set of DG2 PCI IDs for "motherboard down" designs (Matt Roper) - Add initial RPL-P PCI IDs as ADL-P subplatform (Matt Atwood) Refactoring and cleanups: - Power well refactoring and cleanup (Imre) - GVT-g refactor and mdev API cleanup (Christoph, Jason, Zhi) - DPLL refactoring and cleanup (Ville) - VBT panel specific data parsing cleanup (Ville) - Use drm_mode_init() for on-stack modes (Ville) Fixes: - Fix PSR state pipe A/B confusion by clearing more state on disable (José) - Fix FIFO underruns caused by not taking DRAM channel into account (Vinod) - Fix FBC flicker on display 11+ by enabling a workaround (José) - Fix VBT seamless DRRS min refresh rate check (Ville) - Fix panel type assumption on bogus VBT data (Ville) - Fix panel data parsing for VBT that misses panel data pointers block (Ville) - Fix spurious AUX timeout/hotplug handling on LTTPR links (Imre) Merges: - Backmerge drm-next (Jani) - GVT changes (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87bkwbkkdo.fsf@intel.com	2022-05-11 11:00:15 +10:00
Jason Gunthorpe	e8ae0e140c	vfio: Require that devices support DMA cache coherence IOMMU_CACHE means that normal DMAs do not require any additional coherency mechanism and is the basic uAPI that VFIO exposes to userspace. For instance VFIO applications like DPDK will not work if additional coherency operations are required. Therefore check IOMMU_CAP_CACHE_COHERENCY like vdpa & usnic do before allowing an IOMMU backed VFIO device to be created. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-04-28 17:24:57 +02:00
Jason Gunthorpe	71cfafda9c	vfio: Move the Intel no-snoop control off of IOMMU_CACHE IOMMU_CACHE means "normal DMA to this iommu_domain's IOVA should be cache coherent" and is used by the DMA API. The definition allows for special non-coherent DMA to exist - ie processing of the no-snoop flag in PCIe TLPs - so long as this behavior is opt-in by the device driver. The flag is mainly used by the DMA API to synchronize the IOMMU setting with the expected cache behavior of the DMA master. eg based on dev_is_dma_coherent() in some case. For Intel IOMMU IOMMU_CACHE was redefined to mean 'force all DMA to be cache coherent' which has the practical effect of causing the IOMMU to ignore the no-snoop bit in a PCIe TLP. x86 platforms are always IOMMU_CACHE, so Intel should ignore this flag. Instead use the new domain op enforce_cache_coherency() which causes every IOPTE created in the domain to have the no-snoop blocking behavior. Reconfigure VFIO to always use IOMMU_CACHE and call enforce_cache_coherency() to operate the special Intel behavior. Remove the IOMMU_CACHE test from Intel IOMMU. Ultimately VFIO plumbs the result of enforce_cache_coherency() back into the x86 platform code through kvm_arch_register_noncoherent_dma() which controls if the WBINVD instruction is available in the guest. No other archs implement kvm_arch_register_noncoherent_dma() nor are there any other known consumers of VFIO_DMA_CC_IOMMU that might be affected by the user visible result change on non-x86 archs. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-04-28 17:24:57 +02:00
Lu Baolu	3b86f317c9	vfio: Remove iommu group notifier The iommu core and driver core have been enhanced to avoid unsafe driver binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER) has been called. There's no need to register iommu group notifier. This removes the iommu group notifer which contains BUG_ON() and WARN(). Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220418005000.897664-11-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-04-28 15:32:20 +02:00
Jason Gunthorpe	93219ea943	vfio: Delete the unbound_list commit `60720a0fc6` ("vfio: Add device tracking during unbind") added the unbound list to plug a problem with KVM where KVM_DEV_VFIO_GROUP_DEL relied on vfio_group_get_external_user() succeeding to return the vfio_group from a group file descriptor. The unbound list allowed vfio_group_get_external_user() to continue to succeed in edge cases. However commit `5d6dee80a1` ("vfio: New external user group/file match") deleted the call to vfio_group_get_external_user() during KVM_DEV_VFIO_GROUP_DEL. Instead vfio_external_group_match_file() is used to directly match the file descriptor to the group pointer. This in turn avoids the call down to vfio_dev_viable() during KVM_DEV_VFIO_GROUP_DEL and also avoids the trouble the first commit was trying to fix. There are no other users of vfio_dev_viable() that care about the time after vfio_unregister_group_dev() returns, so simply delete the unbound_list entirely. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220418005000.897664-10-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-04-28 15:32:20 +02:00
Lu Baolu	31076af0cb	vfio: Remove use of vfio_group_viable() As DMA ownership is claimed for the iommu group when a VFIO group is added to a VFIO container, the VFIO group viability is guaranteed as long as group->container_users > 0. Remove those unnecessary group viability checks which are only hit when group->container_users is not zero. The only remaining reference is in GROUP_GET_STATUS, which could be called at any time when group fd is valid. Here we just replace the vfio_group_viable() by directly calling IOMMU core to get viability status. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220418005000.897664-9-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-04-28 15:32:20 +02:00

1 2 3 4 5 ...

1009 Commits