linux/arch/x86
Jens Axboe 2b188cc1bb Add io_uring IO interface
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.

IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring is an index into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.

Two new system calls are added for this:

io_uring_setup(entries, params)
	Sets up an io_uring instance for doing async IO. On success,
	returns a file descriptor that the application can mmap to
	gain access to the SQ ring, CQ ring, and io_uring_sqes.

io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
	Initiates IO against the rings mapped to this fd, or waits for
	them to complete, or both. The behavior is controlled by the
	parameters passed in. If 'to_submit' is non-zero, then we'll
	try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
	kernel will wait for 'min_complete' events, if they aren't
	already available. It's valid to set IORING_ENTER_GETEVENTS
	and 'min_complete' == 0 at the same time, this allows the
	kernel to return already completed events without waiting
	for them. This is useful only for polling, as for IRQ
	driven IO, the application can just check the CQ ring
	without entering the kernel.

With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.

For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.

Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any data that can be accessed
directly in the page cache is done inline. This avoids the slowness
issue of usual threadpools, since cached data is accessed as quickly
as a sync interface.

Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28 08:24:23 -07:00
..
boot x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting 2019-02-06 18:56:18 +01:00
configs PCI: consolidate PCI config entry in drivers/pci 2018-11-23 11:45:34 +09:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-12-27 13:53:32 -08:00
entry Add io_uring IO interface 2019-02-28 08:24:23 -07:00
events perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu() 2019-02-04 08:44:51 +01:00
hyperv x86/hyper-v: Add HvFlushGuestAddressList hypercall support 2018-12-21 11:28:39 +01:00
ia32 Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
include Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-02-10 09:57:42 -08:00
kernel Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-02-10 09:57:42 -08:00
kvm KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) 2019-02-07 19:03:01 +01:00
lib x86: explicitly align IO accesses in memcpy_{to,from}io 2019-02-01 09:07:48 -08:00
math-emu Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
mm x86/mm/cpa: Fix set_mce_nospec() 2019-02-08 14:31:56 +01:00
net bpf: Add bpf_line_info support 2018-12-09 13:54:38 -08:00
oprofile
pci pci-v4.21-changes 2019-01-05 17:57:34 -08:00
platform Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-12-26 18:42:51 -08:00
power mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
purgatory
ras
realmode
tools x86: Clean up 'sizeof x' => 'sizeof(x)' 2018-10-29 07:13:28 +01:00
um Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
video
xen xen: fixes for 5.0-rc3 2019-01-19 05:53:41 +12:00
.gitignore
Kbuild KVM: x86: Allow Qemu/KVM to use PVH entry point 2018-12-13 13:41:49 -05:00
Kconfig x86/resctrl: Avoid confusion over the new X86_RESCTRL config 2019-02-02 10:34:52 +01:00
Kconfig.cpu x86/cpu: Create Hygon Dhyana architecture support file 2018-09-27 16:14:05 +02:00
Kconfig.debug x86/kconfig: Remove redundant 'default n' lines from all x86 Kconfig's 2018-10-17 08:39:42 +02:00
Makefile jump_label: move 'asm goto' support test to Kconfig 2019-01-06 09:46:51 +09:00
Makefile_32.cpu
Makefile.um x86, powerpc: Remove -funit-at-a-time compiler option entirely 2018-12-09 11:55:32 +01:00