linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 20:22:09 +00:00

Author	SHA1	Message	Date
Jens Axboe	977bc87356	io_uring/rsrc: always initialize 'folio' to NULL Smatch complains that: smatch warnings: io_uring/rsrc.c:1262 io_sqe_buffer_register() error: uninitialized symbol 'folio'. 'folio' may be used uninitialized, which can happen if we end up with a single page mapped. Ensure that we clear folio to NULL at the top so it's always set. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202302241432.YML1CD5C-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-24 12:58:31 -07:00
Pavel Begunkov	57bebf807e	io_uring/rsrc: optimise registered huge pages When registering huge pages, internally io_uring will split them into many PAGE_SIZE bvec entries. That's bad for performance as drivers need to eventually dma-map the data and will do it individually for each bvec entry. Coalesce huge pages into one large bvec. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:24 -07:00
Pavel Begunkov	b000ae0ec2	io_uring/rsrc: optimise single entry advance Iterating within the first bvec entry should be essentially free, but we use iov_iter_advance() for that, which shows up in benchmark profiles taking up to 0.5% of CPU. Replace it with a hand coded version. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:23 -07:00
Pavel Begunkov	edd4782696	io_uring/rsrc: disallow multi-source reg buffers If two or more mappings lie back to back they can be passed into io_uring to be registered as a single registered buffer. That would even work if mappings came from different sources, e.g. it's possible to mix in this way anon pages and pages from shmem or hugetlb. That is not a problem in itself, but it is less error-prone to forbid such mixing. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:23 -07:00
Pavel Begunkov	9a1563d172	io_uring: remove unused wq_list_merge There are no users of wq_list_merge, kill it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5f9ad0301949213230ad9000a8359d591aae615a.1677002255.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:23 -07:00
Wojciech Lukowicz	48ba08374e	io_uring: fix size calculation when registering buf ring Using struct_size() to calculate the size of io_uring_buf_ring will sum the size of the struct and of the bufs array. However, the struct's fields are overlaid with the array making the calculated size larger than it should be. When registering a ring with N * PAGE_SIZE / sizeof(struct io_uring_buf) entries, i.e. with fully filled pages, the calculated size will span one more page than it should and io_uring will try to pin the following page. Depending on how the application allocated the ring, it might succeed using an unrelated page or fail returning EFAULT. The size of the ring should be the product of ring_entries and the size of io_uring_buf, i.e. the size of the bufs array only. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230218184141.70891-1-wlukowicz01@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:23 -07:00
Pavel Begunkov	6bf65a1b36	io_uring/rsrc: fix a comment in io_import_fixed() io_import_fixed() supports offsets, but "may not" means the opposite. Replace it with "might not" so that the comment speaks about possible cases instead. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/5b5f79958456caa6dc532f6205f75f224b232c81.1676902343.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:23 -07:00
Jens Axboe	8d664282a0	io_uring: rename 'in_idle' to 'in_cancel' This better describes what it does - it's incremented when the task is currently undergoing a cancelation operation, due to exiting or exec'ing. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:23 -07:00
Jens Axboe	ce8e04f6e5	io_uring: consolidate the put_ref-and-return section of adding work We've got a few cases of this, move them to one section and just use gotos to get there. Reduces the text section on both arm64 and x86-64, using gcc-12.2. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:23 -07:00
Linus Torvalds	5b0ed59649	for-6.3/block-2023-02-16 Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - NVMe updates via Christoph: - Small improvements to the logging functionality (Amit Engel) - Authentication cleanups (Hannes Reinecke) - Cleanup and optimize the DMA mapping code in the PCIe driver (Keith Busch) - Work around the command effects for Format NVM (Keith Busch) - Misc cleanups (Keith Busch, Christoph Hellwig) - Fix and cleanup freeing single sgl (Keith Busch) - MD updates via Song: - Fix a rare crash during the takeover process - Don't update recovery_cp when curr_resync is ACTIVE - Free writes_pending in md_stop - Change active_io to percpu - Updates to drbd, inching us closer to unifying the out-of-tree driver with the in-tree one (Andreas, Christoph, Lars, Robert) - BFQ update adding support for multi-actuator drives (Paolo, Federico, Davide) - Make brd compliant with REQ_NOWAIT (me) - Fix for IOPOLL and queue entering, fixing stalled IO waiting on timeouts (me) - Fix for REQ_NOWAIT with multiple bios (me) - Fix memory leak in blktrace cleanup (Greg) - Clean up sbitmap and fix a potential hang (Kemeng) - Clean up some bits in BFQ, and fix a bug in the request injection (Kemeng) - Clean up the request allocation and issue code, and fix some bugs related to that (Kemeng) - ublk updates and fixes: - Add support for unprivileged ublk (Ming) - Improve device deletion handling (Ming) - Misc (Liu, Ziyang) - s390 dasd fixes (Alexander, Qiheng) - Improve utility of request caching and fixes (Anuj, Xiao) - zoned cleanups (Pankaj) - More constification for kobjs (Thomas) - blk-iocost cleanups (Yu) - Remove bio splitting from drivers that don't need it (Christoph) - Switch blk-cgroups to use struct gendisk. Some of this is now incomplete as select late reverts were done. (Christoph) - Add bvec initialization helpers, and convert callers to use that rather than open-coding it (Christoph) - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin, Matthew, Ulf, Zhong) * tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits) brd: use radix_tree_maybe_preload instead of radix_tree_preload block: use proper return value from bio_failfast() block: bio-integrity: Copy flags when bio_integrity_payload is cloned block: Fix io statistics for cgroup in throttle path brd: mark as nowait compatible brd: check for REQ_NOWAIT and set correct page allocation mask brd: return 0/-error from brd_insert_page() block: sync mixed merged request's failfast with 1st bio's Revert "blk-cgroup: pin the gendisk in struct blkcg_gq" Revert "blk-cgroup: pass a gendisk to blkg_lookup" Revert "blk-cgroup: delay blk-cgroup initialization until add_disk" Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release" Revert "blk-cgroup: move the cgroup information to struct gendisk" nvme-pci: remove iod use_sgls nvme-pci: fix freeing single sgl block: ublk: check IO buffer based on flag need_get_data s390/dasd: Fix potential memleak in dasd_eckd_init() s390/dasd: sort out physical vs virtual pointers usage block: Remove the ALLOC_CACHE_SLACK constant block: make kobj_type structures constant ...	2023-02-20 14:27:21 -08:00
Linus Torvalds	c1ef500307	for-6.3/iter-ubuf-2023-02-16 Merge tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux Pull io_uring ITER_UBUF conversion from Jens Axboe: "Since we now have ITER_UBUF available, switch to using it for single ranges as it's more efficient than ITER_IOVEC for that" * tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux: block: use iter_ubuf for single range iov_iter: move iter_ubuf check inside restore WARN io_uring: use iter_ubuf for single range imports io_uring: switch network send/recv to ITER_UBUF iov: add import_ubuf()	2023-02-20 14:03:57 -08:00
Josh Triplett	7d3fd88d61	io_uring: Support calling io_uring_register with a registered ring fd Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit of the opcode) to treat the fd as a registered index rather than a file descriptor. This makes it possible for a library to open an io_uring, register the ring fd, close the ring fd, and subsequently use the ring entirely via registered index. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/f2396369e638284586b069dbddffb8c992afba95.1676419314.git.josh@joshtriplett.org [axboe: remove extra high bit clear] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-16 06:09:30 -07:00
Richard Guy Briggs	fbe870a72f	io_uring,audit: don't log IORING_OP_MADVISE fadvise and madvise both provide hints for caching or access pattern for file and memory respectively. Skip them. Fixes: `5bd2182d58` ("audit,io_uring,io-wq: add some basic audit support to io_uring") Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Link: https://lore.kernel.org/r/b5dfdcd541115c86dbc774aa9dd502c964849c5f.1675282642.git.rgb@redhat.com Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-10 16:00:30 -07:00
Jens Axboe	2f2bb1ffc9	io_uring: mark task TASK_RUNNING before handling resume/task work Just like for task_work, set the task mode to TASK_RUNNING before doing any potential resume work. We're not holding any locks at this point, but we may have already set the task state to TASK_INTERRUPTIBLE in preparation for going to sleep waiting for events. Ensure that we set it back to TASK_RUNNING if we have work to process, to avoid warnings on calling blocking operations with !TASK_RUNNING. Fixes: `b5d3ae202f` ("io_uring: handle TIF_NOTIFY_RESUME when checking for task_work") Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202302062208.24d3e563-oliver.sang@intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-06 08:23:21 -07:00
Christoph Hellwig	cc342a2193	io_uring: use bvec_set_page to initialize a bvec Use the bvec_set_page helper to initialize a bvec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230203150634.3199647-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-03 10:17:42 -07:00
Dylan Yudaken	0ffae640ad	io_uring: always go async for unsupported open flags No point in issuing -> return -EAGAIN -> go async, when it can be done upfront. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20230127135227.3646353-5-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:18:26 -07:00
Dylan Yudaken	c31cc60fdd	io_uring: always go async for unsupported fadvise flags No point in issuing -> return -EAGAIN -> go async, when it can be done upfront. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20230127135227.3646353-4-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:18:26 -07:00
Dylan Yudaken	aebb224fd4	io_uring: for requests that require async, force it Some requests require being run async as they do not support non-blocking. Instead of trying to issue these requests, getting -EAGAIN and then queueing them for async issue, rather just force async upfront. Add WARN_ON_ONCE to make sure surprising code paths do not come up, however in those cases the bug would end up being a blocking io_uring_enter(2) which should not be critical. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20230127135227.3646353-3-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:18:26 -07:00
Dylan Yudaken	6bb3085556	io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async REQ_F_FORCE_ASYNC was being ignored for re-queueing linked requests. Instead obey that flag. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20230127135227.3646353-2-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:18:26 -07:00
Jens Axboe	f586800854	io_uring: add reschedule point to handle_tw_list() If CONFIG_PREEMPT_NONE is set and the task_work chains are long, we could be running into issues blocking others for too long. Add a reschedule check in handle_tw_list(), and flush the ctx if we need to reschedule. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Jens Axboe	fcc926bb85	io_uring: add a conditional reschedule to the IOPOLL cancelation loop If the kernel is configured with CONFIG_PREEMPT_NONE, we could be sitting in a tight loop reaping events but not giving them a chance to finish. This results in a trace ala: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 2-...!: (5249 ticks this GP) idle=935c/1/0x4000000000000000 softirq=4265/4274 fqs=1 (t=5251 jiffies g=465 q=4135 ncpus=4) rcu: rcu_sched kthread starved for 5249 jiffies! g465 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_sched state:R running task stack:0 pid:12 ppid:2 flags:0x00000008 Call trace: __switch_to+0xb0/0xc8 __schedule+0x43c/0x520 schedule+0x4c/0x98 schedule_timeout+0xbc/0xdc rcu_gp_fqs_loop+0x308/0x344 rcu_gp_kthread+0xd8/0xf0 kthread+0xb8/0xc8 ret_from_fork+0x10/0x20 rcu: Stack dump where RCU GP kthread last ran: Task dump for CPU 0: task:kworker/u8:10 state:R running task stack:0 pid:89 ppid:2 flags:0x0000000a Workqueue: events_unbound io_ring_exit_work Call trace: __switch_to+0xb0/0xc8 0xffff0000c8fefd28 CPU: 2 PID: 95 Comm: kworker/u8:13 Not tainted 6.2.0-rc5-00042-g40316e337c80-dirty #2759 Hardware name: linux,dummy-virt (DT) Workqueue: events_unbound io_ring_exit_work pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : io_do_iopoll+0x344/0x360 lr : io_do_iopoll+0xb8/0x360 sp : ffff800009bebc60 x29: ffff800009bebc60 x28: 0000000000000000 x27: 0000000000000000 x26: ffff0000c0f67d48 x25: ffff0000c0f67840 x24: ffff800008950024 x23: 0000000000000001 x22: 0000000000000000 x21: ffff0000c27d3200 x20: ffff0000c0f67840 x19: ffff0000c0f67800 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000 x11: 0000000000000179 x10: 0000000000000870 x9 : ffff800009bebd60 x8 : ffff0000c27d3ad0 x7 : fefefefefefefeff x6 : 0000646e756f626e x5 : ffff0000c0f67840 x4 : 0000000000000000 x3 : ffff0000c2398000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: io_do_iopoll+0x344/0x360 io_uring_try_cancel_requests+0x21c/0x334 io_ring_exit_work+0x90/0x40c process_one_work+0x1a4/0x254 worker_thread+0x1ec/0x258 kthread+0xb8/0xc8 ret_from_fork+0x10/0x20 Add a cond_resched() in the cancelation IOPOLL loop to fix this. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	50470fc572	io_uring: return normal tw run linking optimisation io_submit_flush_completions() may produce new task_work items, so it's a good idea to recheck the task_work list after flushing completions. The optimisation is not new and was accidentally removed by `f88262e60b` ("io_uring: lockless task list") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a7ed5ede84de190832cc33ebbcdd6e91cd90f5b6.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	cb6bf7f285	io_uring: refactor tctx_task_work Merge almost identical sections of tctx_task_work(), this will make code modifications later easier and also inlines handle_tw_list(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d06592d91e3e7559e7a4dbb8907d110863008dc7.1674484266.git.asml.silence@gmail.com [axboe: fold in setting count to zero patch from Tom Rix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	5afa465071	io_uring: refactor io_put_task helpers Add a helper for putting refs from the target task context, rename __io_put_task() and add a couple of comments around. Use the remote version for __io_req_complete_post(), the local is only needed for __io_submit_flush_completions(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3bf92ebd594769d8a5d648472a8e335f2031d542.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	c8576f3e61	io_uring: refactor req allocation Follow the io_get_sqe pattern returning the result via a pointer and hide request cache refill inside io_alloc_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8c37c2e8a3cb5e4cd6a8ae3b91371227a92708a6.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	b5083dfa36	io_uring: improve io_get_sqe Return an SQE from io_get_sqe() as a parameter and use the return value to determine if it failed or not. This enables the compiler to compile out the sqe NULL check when we know that the return SQE is valid. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9cceb11329240ea097dffef6bf0a675bca14cf42.1674484266.git.asml.silence@gmail.com [axboe: remove bogus const modifier on return value] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	b2aa66aff6	io_uring: kill outdated comment about overflow flush __io_cqring_overflow_flush() doesn't return anything anymore, remove the outdated comment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4ce2bcbb17eac80cdf883fd1459d5ee6586e238c.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	c10bb64684	io_uring: use user visible tail in io_uring_poll() We return POLLIN from io_uring_poll() depending on whether there are CQEs for the userspace, and so we should use the user visible tail pointer instead of a transient cached value. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/228ffcbf30ba98856f66ffdb9a6a60ead1dd96c0.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Jens Axboe	f499254474	io_uring: pass in io_issue_def to io_assign_file() This generates better code for me, avoiding an extra load on arm64, and both call sites already have this variable available for easy passing. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Breno Leitao	c1755c25a7	io_uring: Enable KASAN for request cache Every io_uring request is represented by struct io_kiocb, which is cached locally by io_uring (not SLAB/SLUB) in the list called submit_state.freelist. This patch simply enabled KASAN for this free list. This list is initially created by KMEM_CACHE, but later, managed by io_uring. This patch basically poisons the objects that are not used (i.e., they are the free list), and unpoisons it when the object is allocated/removed from the list. Touching these poisoned objects while in the freelist will cause a KASAN warning. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Jens Axboe	b5d3ae202f	io_uring: handle TIF_NOTIFY_RESUME when checking for task_work If TIF_NOTIFY_RESUME is set, then we need to call resume_user_mode_work() for PF_IO_WORKER threads. They never return to usermode, hence never get a chance to process any items that are marked by this flag. Most notably this includes the final put of files, but also any throttling markers set by block cgroups. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Jens Axboe	8572df941c	io_uring/msg-ring: ensure flags passing works for task_work completions If the target ring is using IORING_SETUP_SINGLE_ISSUER and we're posting a message from a different thread, then we need to ensure that the fallback task_work that posts the CQE knows about the flags passing as well. If not we'll always be posting 0 as the flags. Fixes: 3563d7ed58a5 ("io_uring/msg_ring: Pass custom flags to the cqe") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Breno Leitao	f30bd4d038	io_uring: Split io_issue_def struct This patch removes some "cold" fields from `struct io_issue_def`. The plan is to keep only highly used fields into `struct io_issue_def`, so, it may be hot in the cache. The hot fields are basically all the bitfields and the callback functions for .issue and .prep. The other less frequently used fields are now located in a secondary and cold struct, called `io_cold_def`. This is the size for the structs: Before: io_issue_def = 56 bytes After: io_issue_def = 24 bytes; io_cold_def = 40 bytes Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20230112144411.2624698-2-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Breno Leitao	a7dd27828b	io_uring: Rename struct io_op_def The current io_op_def struct is becoming huge and the name is a bit generic. The goal of this patch is to rename this struct to `io_issue_def`. This struct will contain the hot functions associated with the issue code path. For now, this patch only renames the structure, and an upcoming patch will break up the structure in two, moving the non-issue fields to a secondary struct. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20230112144411.2624698-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	68a2cc1bba	io_uring: refactor __io_req_complete_post Keep parts of __io_req_complete_post() relying on req->flags together so the value can be cached. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2b4fbb42f404a0e75c4d9f0a5b16f314a839d0a9.1673887636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	31f084b7b0	io_uring: simplify fallback execution Lock the ring with uring_lock in io_fallback_req_func(), which should make it a bit safer and easier. With that we also don't need refs pinning as io_ring_exit_work() will wait until uring_lock is freed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/56170e6a0cbfc8edee2794c6613e8f6f1d76d276.1673887636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	89800a2dd5	io_uring: don't export io_put_task() io_put_task() is only used in uring.c so enclose it there together with __io_put_task(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/43c7f9227e2ab215f1a6069dadbc5382bed346fe.1673887636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	b0b7a7d24b	io_uring: return back links tw run optimisation io_submit_flush_completions() may queue new requests for tw execution, especially true for linked requests. Recheck the tw list for emptiness after flushing completions. Note that this doesn't really fix the commit referenced below, but it does reinstate an optimization that existed before that got merged. Fixes: `f88262e60b` ("io_uring: lockless task list") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6328acdbb5e60efc762b18003382de077e6e1367.1673887636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Quanfa Fu	88b80534f6	io_uring: make io_sqpoll_wait_sq return void Change the return type to void since it always returns 0, so there is no need to check it in the io_uring_enter syscall. Signed-off-by: Quanfa Fu <quanfafu@gmail.com> Link: https://lore.kernel.org/r/20230115071519.554282-1-quanfafu@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	c3f4d39ee4	io_uring: optimise deferred tw execution We needed fake nodes in __io_run_local_work() to avoid unnecessary wake ups while the task is already running task_works, but we don't need them anymore since wake ups are protected by cq_waiting, which is always cleared by the time we're executing deferred task_work items. Note that because of loose sync around cq_waiting clearing io_req_local_work_add() may wake the task more than once, but that's fine and should be rare enough not to hurt perf. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8839534891f0a2f1076e78554a31ea7e099f7de5.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	d80c0f00d0	io_uring: add io_req_local_work_add wake fast path Don't wake the master task after queueing a deferred tw unless it's currently waiting in io_cqring_wait. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/717702d772825a6647e6c315b4690277ba84c3fc.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	130bd686d9	io_uring: waitqueue-less cq waiting With DEFER_TASKRUN only ctx->submitter_task might be waiting for CQEs, we can use this to optimise io_cqring_wait(). Replace ->cq_wait waitqueue with waking the task directly. It works but misses an important optimisation covered by the following patch, so this patch without follow ups might hurt performance. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/103d174d35d919d4cb0922d8a9c93a8f0c35f74a.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	3181e22fb7	io_uring: wake up optimisations Flushing completions is done either from the submit syscall or by the task_work, both are in the context of the submitter task, and for single-threaded rings, as implied by ->task_complete, there won't be any waiters on ->cq_wait but the master task. That means that there can be no tasks sleeping on cq_wait while we run __io_submit_flush_completions() and so waking up can be skipped. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60ad9768ec74435a0ddaa6eec0ffa7729474f69f.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	bca39f3905	io_uring: add lazy poll_wq activation Even though io_poll_wq_wake()'s waitqueue_active reuses a barrier we do for another waitqueue, it's not going to be the case in the future and so we want to have a fast path for it when the ring has never been polled. Move poll_wq wake ups into __io_commit_cqring_flush() using a new flag called ->poll_activated. The idea behind the flag is to set it when the ring was polled for the first time. This requires additional sync to not miss events, which is done here by using task_work for ->task_complete rings, and by default enabling the flag for all other types of rings. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/060785e8e9137a920b232c0c7f575b131af19cac.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	7b235dd82a	io_uring: separate wq for ring polling Don't use ->cq_wait for ring polling but add a separate wait queue for it. We need it for following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dea0be0bf990503443c5c6c337fc66824af7d590.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	360173ab9e	io_uring: move io_run_local_work_locked io_run_local_work_locked() is only used in io_uring.c, move it there. With that we can also make __io_run_local_work() static. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/91757bcb33e5774e49fed6f2b6e058630608119b.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	3e5655552a	io_uring: mark io_run_local_work static io_run_local_work is enclosed in io_uring.c, we don't need to export it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b477fb81f5e77044f724a06fe245d5c078659364.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	2f413956cc	io_uring: don't set TASK_RUNNING in local tw runner The CQ waiting loop sets TASK_RUNNING before trying to execute task_work, no need to repeat it in io_run_local_work(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9d9422c429ef3f9457b4f4b8288bf4789564f33b.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Pavel Begunkov	bd550173ac	io_uring: refactor io_wake_function Remove a local variable ctx in io_wake_function(), we don't need it if io_should_wake() triggers it to wake up. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e60eb1008aebe286aab7d34c772ed01c447bddb1.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
Dmitrii Bundin	81594e7e7a	io_uring: remove excessive unlikely on IS_ERR The IS_ERR function uses the IS_ERR_VALUE macro under the hood which already wraps the condition into unlikely. Signed-off-by: Dmitrii Bundin <dmitrii.bundin.a@gmail.com> Link: https://lore.kernel.org/r/20230109185854.25698-1-dmitrii.bundin.a@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:40 -07:00
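
The size arithmetic described in the "io_uring: fix size calculation when registering buf ring" entry above can be sketched in plain userspace C. This is only an illustration, not the kernel code: `demo_buf` and `demo_buf_ring` are hypothetical stand-ins that mirror the overlaid layout the commit message describes (16-byte entries, ring header sharing storage with the first entry), and the helper names are invented here. The `struct_size()`-style helper reproduces only the "struct plus array" sum and leaves out the overflow handling of the real macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one ring entry (assumed 16 bytes). */
struct demo_buf {
    uint64_t addr;
    uint32_t len;
    uint16_t bid;
    uint16_t resv;
};

/*
 * Hypothetical stand-in for the ring: the header fields are overlaid
 * on the first entry, so the struct is no larger than one entry.
 */
struct demo_buf_ring {
    union {
        struct {
            uint64_t resv1;
            uint32_t resv2;
            uint16_t resv3;
            uint16_t tail;
        };
        struct demo_buf bufs[1];
    };
};

static_assert(sizeof(struct demo_buf) == 16, "entry is assumed to be 16 bytes");
static_assert(sizeof(struct demo_buf_ring) == sizeof(struct demo_buf),
              "header overlays the first entry");

/* What a struct_size()-style sum yields: the struct plus the whole array. */
size_t ring_size_struct_size(size_t entries)
{
    return sizeof(struct demo_buf_ring) + entries * sizeof(struct demo_buf);
}

/* What the commit message says the size should be: the array only. */
size_t ring_size_fixed(size_t entries)
{
    return entries * sizeof(struct demo_buf);
}

/* Number of pages needed to cover `bytes`, rounding up. */
size_t pages_spanned(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

Assuming a 4096-byte page, a ring of 256 entries fills exactly one page: `ring_size_fixed(256)` is 4096 bytes, while `ring_size_struct_size(256)` is 4112 bytes and therefore spans two pages, which matches the extra pinned page the commit message describes.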


501 Commits