mirror of
https://github.com/torvalds/linux.git
synced 2024-11-22 04:02:20 +00:00
iommufd: Document overview of iommufd
Add iommufd into the documentation tree, and supply initial documentation. Much of this is linked from code comments by kdoc. Link: https://lore.kernel.org/r/5-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This commit is contained in:
parent
632ce1377d
commit
658234de0d
@ -25,6 +25,7 @@ place where this information is gathered.
|
||||
ebpf/index
|
||||
ioctl/index
|
||||
iommu
|
||||
iommufd
|
||||
media/index
|
||||
netlink/index
|
||||
sysfs-platform_profile
|
||||
|
223
Documentation/userspace-api/iommufd.rst
Normal file
223
Documentation/userspace-api/iommufd.rst
Normal file
@ -0,0 +1,223 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0+
|
||||
|
||||
=======
|
||||
IOMMUFD
|
||||
=======
|
||||
|
||||
:Author: Jason Gunthorpe
|
||||
:Author: Kevin Tian
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
IOMMUFD is the user API to control the IOMMU subsystem as it relates to managing
|
||||
IO page tables from userspace using file descriptors. It intends to be general
|
||||
and consumable by any driver that wants to expose DMA to userspace. These
|
||||
drivers are eventually expected to deprecate any internal IOMMU logic
|
||||
they may already/historically implement (e.g. vfio_iommu_type1.c).
|
||||
|
||||
At minimum iommufd provides universal support of managing I/O address spaces and
|
||||
I/O page tables for all IOMMUs, with room in the design to add non-generic
|
||||
features to cater to specific hardware functionality.
|
||||
|
||||
In this context the capital letter (IOMMUFD) refers to the subsystem while the
|
||||
small letter (iommufd) refers to the file descriptors created via /dev/iommu for
|
||||
use by userspace.
|
||||
|
||||
Key Concepts
|
||||
============
|
||||
|
||||
User Visible Objects
|
||||
--------------------
|
||||
|
||||
Following IOMMUFD objects are exposed to userspace:
|
||||
|
||||
- IOMMUFD_OBJ_IOAS, representing an I/O address space (IOAS), allowing map/unmap
|
||||
of user space memory into ranges of I/O Virtual Address (IOVA).
|
||||
|
||||
The IOAS is a functional replacement for the VFIO container, and like the VFIO
|
||||
container it copies an IOVA map to a list of iommu_domains held within it.
|
||||
|
||||
- IOMMUFD_OBJ_DEVICE, representing a device that is bound to iommufd by an
|
||||
external driver.
|
||||
|
||||
- IOMMUFD_OBJ_HW_PAGETABLE, representing an actual hardware I/O page table
|
||||
(i.e. a single struct iommu_domain) managed by the iommu driver.
|
||||
|
||||
The IOAS has a list of HW_PAGETABLES that share the same IOVA mapping and
|
||||
it will synchronize its mapping with each member HW_PAGETABLE.
|
||||
|
||||
All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
|
||||
|
||||
The diagram below shows relationship between user-visible objects and kernel
|
||||
datastructures (external to iommufd), with numbers referred to operations
|
||||
creating the objects and links::
|
||||
|
||||
_________________________________________________________
|
||||
| iommufd |
|
||||
| [1] |
|
||||
| _________________ |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | [3] [2] |
|
||||
| | | ____________ __________ |
|
||||
| | IOAS |<--| |<------| | |
|
||||
| | | |HW_PAGETABLE| | DEVICE | |
|
||||
| | | |____________| |__________| |
|
||||
| | | | | |
|
||||
| | | | | |
|
||||
| | | | | |
|
||||
| | | | | |
|
||||
| | | | | |
|
||||
| |_________________| | | |
|
||||
| | | | |
|
||||
|_________|___________________|___________________|_______|
|
||||
| | |
|
||||
| _____v______ _______v_____
|
||||
| PFN storage | | | |
|
||||
|------------>|iommu_domain| |struct device|
|
||||
|____________| |_____________|
|
||||
|
||||
1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can
|
||||
hold multiple IOAS objects. IOAS is the most generic object and does not
|
||||
expose interfaces that are specific to single IOMMU drivers. All operations
|
||||
on the IOAS must operate equally on each of the iommu_domains inside of it.
|
||||
|
||||
2. IOMMUFD_OBJ_DEVICE is created when an external driver calls the IOMMUFD kAPI
|
||||
to bind a device to an iommufd. The driver is expected to implement a set of
|
||||
ioctls to allow userspace to initiate the binding operation. Successful
|
||||
completion of this operation establishes the desired DMA ownership over the
|
||||
device. The driver must also set the driver_managed_dma flag and must not
|
||||
touch the device until this operation succeeds.
|
||||
|
||||
3. IOMMUFD_OBJ_HW_PAGETABLE is created when an external driver calls the IOMMUFD
|
||||
kAPI to attach a bound device to an IOAS. Similarly the external driver uAPI
|
||||
allows userspace to initiate the attaching operation. If a compatible
|
||||
pagetable already exists then it is reused for the attachment. Otherwise a
|
||||
new pagetable object and iommu_domain is created. Successful completion of
|
||||
this operation sets up the linkages among IOAS, device and iommu_domain. Once
|
||||
this completes the device could do DMA.
|
||||
|
||||
Every iommu_domain inside the IOAS is also represented to userspace as a
|
||||
HW_PAGETABLE object.
|
||||
|
||||
.. note::
|
||||
|
||||
Future IOMMUFD updates will provide an API to create and manipulate the
|
||||
HW_PAGETABLE directly.
|
||||
|
||||
A device can only bind to an iommufd due to DMA ownership claim and attach to at
|
||||
most one IOAS object (no support of PASID yet).
|
||||
|
||||
Kernel Datastructure
|
||||
--------------------
|
||||
|
||||
User visible objects are backed by following datastructures:
|
||||
|
||||
- iommufd_ioas for IOMMUFD_OBJ_IOAS.
|
||||
- iommufd_device for IOMMUFD_OBJ_DEVICE.
|
||||
- iommufd_hw_pagetable for IOMMUFD_OBJ_HW_PAGETABLE.
|
||||
|
||||
Several terminologies when looking at these datastructures:
|
||||
|
||||
- Automatic domain - refers to an iommu domain created automatically when
|
||||
attaching a device to an IOAS object. This is compatible to the semantics of
|
||||
VFIO type1.
|
||||
|
||||
- Manual domain - refers to an iommu domain designated by the user as the
|
||||
target pagetable to be attached to by a device. Though currently there are
|
||||
no uAPIs to directly create such domain, the datastructure and algorithms
|
||||
are ready for handling that use case.
|
||||
|
||||
- In-kernel user - refers to something like a VFIO mdev that is using the
|
||||
IOMMUFD access interface to access the IOAS. This starts by creating an
|
||||
iommufd_access object that is similar to the domain binding a physical device
|
||||
would do. The access object will then allow converting IOVA ranges into struct
|
||||
page * lists, or doing direct read/write to an IOVA.
|
||||
|
||||
iommufd_ioas serves as the metadata datastructure to manage how IOVA ranges are
|
||||
mapped to memory pages, composed of:
|
||||
|
||||
- struct io_pagetable holding the IOVA map
|
||||
- struct iopt_area's representing populated portions of IOVA
|
||||
- struct iopt_pages representing the storage of PFNs
|
||||
- struct iommu_domain representing the IO page table in the IOMMU
|
||||
- struct iopt_pages_access representing in-kernel users of PFNs
|
||||
- struct xarray pinned_pfns holding a list of pages pinned by in-kernel users
|
||||
|
||||
Each iopt_pages represents a logical linear array of full PFNs. The PFNs are
|
||||
ultimately derived from userspace VAs via an mm_struct. Once they have been
|
||||
pinned the PFNs are stored in IOPTEs of an iommu_domain or inside the pinned_pfns
|
||||
xarray if they have been pinned through an iommufd_access.
|
||||
|
||||
PFN have to be copied between all combinations of storage locations, depending
|
||||
on what domains are present and what kinds of in-kernel "software access" users
|
||||
exist. The mechanism ensures that a page is pinned only once.
|
||||
|
||||
An io_pagetable is composed of iopt_areas pointing at iopt_pages, along with a
|
||||
list of iommu_domains that mirror the IOVA to PFN map.
|
||||
|
||||
Multiple io_pagetable-s, through their iopt_area-s, can share a single
|
||||
iopt_pages which avoids multi-pinning and double accounting of page
|
||||
consumption.
|
||||
|
||||
iommufd_ioas is sharable between subsystems, e.g. VFIO and VDPA, as long as
|
||||
devices managed by different subsystems are bound to a same iommufd.
|
||||
|
||||
IOMMUFD User API
|
||||
================
|
||||
|
||||
.. kernel-doc:: include/uapi/linux/iommufd.h
|
||||
|
||||
IOMMUFD Kernel API
|
||||
==================
|
||||
|
||||
The IOMMUFD kAPI is device-centric with group-related tricks managed behind the
|
||||
scene. This allows the external drivers calling such kAPI to implement a simple
|
||||
device-centric uAPI for connecting its device to an iommufd, instead of
|
||||
explicitly imposing the group semantics in its uAPI as VFIO does.
|
||||
|
||||
.. kernel-doc:: drivers/iommu/iommufd/device.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: drivers/iommu/iommufd/main.c
|
||||
:export:
|
||||
|
||||
VFIO and IOMMUFD
|
||||
----------------
|
||||
|
||||
Connecting a VFIO device to iommufd can be done in two ways.
|
||||
|
||||
First is a VFIO compatible way by directly implementing the /dev/vfio/vfio
|
||||
container IOCTLs by mapping them into io_pagetable operations. Doing so allows
|
||||
the use of iommufd in legacy VFIO applications by symlinking /dev/vfio/vfio to
|
||||
/dev/iommufd or extending VFIO to SET_CONTAINER using an iommufd instead of a
|
||||
container fd.
|
||||
|
||||
The second approach directly extends VFIO to support a new set of device-centric
|
||||
user API based on aforementioned IOMMUFD kernel API. It requires userspace
|
||||
change but better matches the IOMMUFD API semantics and easier to support new
|
||||
iommufd features when comparing it to the first approach.
|
||||
|
||||
Currently both approaches are still work-in-progress.
|
||||
|
||||
There are still a few gaps to be resolved to catch up with VFIO type1, as
|
||||
documented in iommufd_vfio_check_extension().
|
||||
|
||||
Future TODOs
|
||||
============
|
||||
|
||||
Currently IOMMUFD supports only kernel-managed I/O page table, similar to VFIO
|
||||
type1. New features on the radar include:
|
||||
|
||||
- Binding iommu_domain's to PASID/SSID
|
||||
- Userspace page tables, for ARM, x86 and S390
|
||||
- Kernel bypass'd invalidation of user page tables
|
||||
- Re-use of the KVM page table in the IOMMU
|
||||
- Dirty page tracking in the IOMMU
|
||||
- Runtime Increase/Decrease of IOPTE size
|
||||
- PRI support with faults resolved in userspace
|
Loading…
Reference in New Issue
Block a user