drm/doc: device hot-unplug for userspace
Set up the expectations on what hot-unplugging a DRM device should look like to userspace.

Written at Daniel Vetter's request and largely based on his comments in IRC and from https://lists.freedesktop.org/archives/dri-devel/2020-May/265484.html .

A related Wayland protocol change proposal is at https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/35

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Noralf Trønnes <noralf@tronnes.org>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Simon Ser <contact@emersion.fr>
Cc: Ben Skeggs <skeggsb@gmail.com>
Cc: Karol Herbst <kherbst@redhat.com>
Acked-by: Simon Ser <contact@emersion.fr>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200707113805.30936-1-ppaalanen@gmail.com
parent 85b3bfa266
commit cfb9b89f11
				| @ -1,3 +1,5 @@ | ||||
| .. Copyright 2020 DisplayLink (UK) Ltd. | ||||
| 
 | ||||
| =================== | ||||
| Userland interfaces | ||||
| =================== | ||||
| @ -162,6 +164,116 @@ other hand, a driver requires shared state between clients which is | ||||
| visible to user-space and accessible beyond open-file boundaries, they | ||||
| cannot support render nodes. | ||||
| 
 | ||||
| Device Hot-Unplug | ||||
| ================= | ||||
| 
 | ||||
| .. note:: | ||||
|    The following is the plan. Implementation is not there yet | ||||
|    (2020 May). | ||||
| 
 | ||||
| Graphics devices (display and/or render) may be connected via USB (e.g. | ||||
| display adapters or docking stations) or Thunderbolt (e.g. eGPU). An end | ||||
| user is able to hot-unplug devices of this kind while they are being | ||||
| used, and expects that at the very least the machine does not crash. Any | ||||
| damage from hot-unplugging a DRM device needs to be limited as much as | ||||
| possible and userspace must be given the chance to handle it if it wants | ||||
| to. Ideally, unplugging a DRM device still lets a desktop continue to | ||||
| run, but that is going to need explicit support throughout the whole | ||||
| graphics stack: from kernel and userspace drivers, through display | ||||
| servers, via window system protocols, and in applications and libraries. | ||||
| 
 | ||||
| Other scenarios that should lead to the same are: unrecoverable GPU | ||||
| crash, PCI device disappearing off the bus, or forced unbind of a driver | ||||
| from the physical device. | ||||
| 
 | ||||
| In other words, from the userspace perspective everything needs to keep | ||||
| working, more or less, until userspace stops using the disappeared DRM | ||||
| device and closes it completely. Userspace will learn of the device | ||||
| disappearance from the device removed uevent, ioctls returning ENODEV | ||||
| (or driver-specific ioctls returning driver-specific things), or open() | ||||
| returning ENXIO. | ||||
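
The error codes named above can be folded into a single check on the userspace side. The following is a minimal sketch, not an existing libdrm or kernel interface: the `drm_call` enum and the `drm_device_gone()` helper are hypothetical names introduced for illustration. Since ENODEV is also documented for a device that is not yet fully initialized, a real client would probably combine this check with the device removed uevent.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: which kind of call produced the errno value. */
enum drm_call {
	DRM_CALL_OPEN,	/* open() on the device node */
	DRM_CALL_IOCTL,	/* generic (non driver-specific) ioctl */
};

/*
 * Hypothetical helper: returns true when the errno value matches the
 * "device has disappeared" code this document assigns to the call type.
 * Driver-specific ioctls may report the condition differently and are
 * not covered here.
 */
static bool drm_device_gone(enum drm_call call, int err)
{
	switch (call) {
	case DRM_CALL_OPEN:
		return err == ENXIO;
	case DRM_CALL_IOCTL:
		return err == ENODEV;
	}
	return false;
}
```

A caller would pass the saved `errno` right after a failed `open()` or `ioctl()`, and on a true result stop using the device and close its file descriptors.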
| 
 | ||||
| Only after userspace has closed all relevant DRM device and dmabuf file | ||||
| descriptors and removed all mmaps can the DRM driver tear down its | ||||
| instance for the device that no longer exists. If the same physical | ||||
| device somehow comes back in the meantime, it shall be a new DRM | ||||
| device. | ||||
| 
 | ||||
| Similar to PIDs, chardev minor numbers are not recycled immediately. A | ||||
| new DRM device always picks the next free minor number after the one | ||||
| most recently allocated, and wraps around when minor numbers are | ||||
| exhausted. | ||||
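
The allocation policy described above (search forward from the last allocation, wrap around at the end) can be illustrated with a toy allocator. This is only a sketch under stated assumptions: `MINOR_COUNT`, `struct minor_alloc`, `minor_get()` and `minor_put()` are made-up names, the pool is deliberately tiny, and none of this is the kernel's actual implementation.

```c
#include <stdbool.h>

#define MINOR_COUNT 8	/* deliberately tiny, for illustration only */

struct minor_alloc {
	bool used[MINOR_COUNT];
	int next;	/* index where the next search starts */
};

/*
 * Allocate the next free minor, searching forward from the slot after
 * the most recent allocation and wrapping around at the end.
 * Returns the minor number, or -1 if every minor is in use.
 */
static int minor_get(struct minor_alloc *a)
{
	for (int i = 0; i < MINOR_COUNT; i++) {
		int m = (a->next + i) % MINOR_COUNT;

		if (!a->used[m]) {
			a->used[m] = true;
			a->next = (m + 1) % MINOR_COUNT;
			return m;
		}
	}
	return -1;
}

/* Release a minor; it is only handed out again after a wrap-around. */
static void minor_put(struct minor_alloc *a, int m)
{
	a->used[m] = false;
}
```

With this policy, a minor released by an unplugged device is not handed to the next device that appears, so a stale file descriptor or path is less likely to be confused with the new device.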
| 
 | ||||
| The goal raises at least the following requirements for the kernel and | ||||
| drivers. | ||||
| 
 | ||||
| Requirements for KMS UAPI | ||||
| ------------------------- | ||||
| 
 | ||||
| - KMS connectors must change their status to disconnected. | ||||
| 
 | ||||
| - Legacy modesets and pageflips, and atomic commits, both real and | ||||
|   TEST_ONLY, and any other ioctls either fail with ENODEV or fake | ||||
|   success. | ||||
| 
 | ||||
| - Pending non-blocking KMS operations deliver the DRM events userspace | ||||
|   is expecting. This applies also to ioctls that faked success. | ||||
| 
 | ||||
| - open() on a device node whose underlying device has disappeared will | ||||
|   fail with ENXIO. | ||||
| 
 | ||||
| - Attempting to create a DRM lease on a disappeared DRM device will | ||||
|   fail with ENODEV. Existing DRM leases remain and work as listed | ||||
|   above. | ||||
| 
 | ||||
| Requirements for Render and Cross-Device UAPI | ||||
| --------------------------------------------- | ||||
| 
 | ||||
| - All GPU jobs that can no longer run must have their fences | ||||
|   force-signalled to avoid inflicting hangs on userspace. | ||||
|   The associated error code is ENODEV. | ||||
| 
 | ||||
| - Some userspace APIs already define what should happen when the device | ||||
|   disappears (OpenGL, GL ES: `GL_KHR_robustness`_; `Vulkan`_: | ||||
|   VK_ERROR_DEVICE_LOST; etc.). DRM drivers are free to implement this | ||||
|   behaviour the way they see best, e.g. returning failures in | ||||
|   driver-specific ioctls and handling those in userspace drivers, or | ||||
|   rely on uevents, and so on. | ||||
| 
 | ||||
| - dmabufs which point to memory that has disappeared will either fail to | ||||
|   import with ENODEV or continue to be successfully imported if the import | ||||
|   would have succeeded before the disappearance. See also the memory map | ||||
|   requirements below for already imported dmabufs. | ||||
| 
 | ||||
| - Attempting to import a dmabuf to a disappeared device will either fail | ||||
|   with ENODEV or succeed if it would have succeeded without the | ||||
|   disappearance. | ||||
| 
 | ||||
| - open() on a device node whose underlying device has disappeared will | ||||
|   fail with ENXIO. | ||||
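
The fence requirement in the list above can be pictured with a small userspace model. This is a toy sketch, not kernel code: `struct toy_fence` and `toy_force_signal_all()` are hypothetical names, and a real driver would go through the kernel's dma-fence machinery instead.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a GPU job fence. */
struct toy_fence {
	bool signalled;
	int error;	/* 0 on success, negative errno otherwise */
};

/*
 * Force-signal every fence that is still pending when the device
 * disappears, recording -ENODEV as the reason. Fences that already
 * signalled keep their original result.
 * Returns the number of fences that had to be force-signalled.
 */
static size_t toy_force_signal_all(struct toy_fence *fences, size_t n)
{
	size_t forced = 0;

	for (size_t i = 0; i < n; i++) {
		if (fences[i].signalled)
			continue;
		fences[i].error = -ENODEV;
		fences[i].signalled = true;
		forced++;
	}
	return forced;
}
```

The point of the model is that no waiter is left blocked: every pending fence ends up signalled, and the error field lets the waiter tell a completed job from one that was abandoned.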
| 
 | ||||
| .. _GL_KHR_robustness: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_robustness.txt | ||||
| .. _Vulkan: https://www.khronos.org/vulkan/ | ||||
| 
 | ||||
| Requirements for Memory Maps | ||||
| ---------------------------- | ||||
| 
 | ||||
| Memory maps have further requirements that apply to both existing maps | ||||
| and maps created after the device has disappeared. If the underlying | ||||
| memory disappears, the map is created or modified such that reads and | ||||
| writes will still complete successfully but the result is undefined. | ||||
| This applies to both userspace mmap()'d memory and memory pointed to by | ||||
| dmabuf which might be mapped to other devices (cross-device dmabuf | ||||
| imports). | ||||
| 
 | ||||
| Raising SIGBUS is not an option, because userspace cannot realistically | ||||
| handle it. Signal handlers are global, which makes them extremely | ||||
| difficult to use correctly from libraries like those that Mesa produces. | ||||
| Signal handlers are not composable: you can't have different handlers | ||||
| for GPU1 and GPU2 from different vendors, and a third handler for | ||||
| mmapped regular files. Threads cause additional pain with signal | ||||
| handling as well. | ||||
| 
 | ||||
| .. _drm_driver_ioctl: | ||||
| 
 | ||||
| IOCTL Support on Device Nodes | ||||
| @ -199,7 +311,7 @@ EPERM/EACCES: | ||||
|         difference between EACCES and EPERM. | ||||
| 
 | ||||
| ENODEV: | ||||
|         The device is not (yet) present or fully initialized. | ||||
|         The device is not present anymore or is not yet fully initialized. | ||||
| 
 | ||||
| EOPNOTSUPP: | ||||
|         Feature (like PRIME, modesetting, GEM) is not supported by the driver. | ||||
|  | ||||