linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-23 20:51:44 +00:00

Author	SHA1	Message	Date
Linus Torvalds	5b9a7bb72f	for-6.4/io_uring-2023-04-21 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRCvawQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpiKTEACvp0jm3Lyhxb8RMsx5T6Ko0pFH3DIymiL4 xpoZmAUflOjD0c+99FwHRQqKKXuo3OelhW+YOm0N6OOAt6JMSGmKpZh0UNJx+Fgj wMwiQ0X3Y5SaAsr5ZpXM+G1BV7ajihsMpu8/a718ERB3U3cLDz2qJfnzJh+E5Ip5 pYB4vS3+/FAER2MYQ7IPeovch2wWYtxDPztOxNX6SORu3OvpWiz1GR/+8u0tqj50 ROq97Jwjh5Tl4zP356EUSj/Vkfdr2yb+NlLbun8My5x8tYftZjnrNQ/+qeJNLwB8 tWTrg166ox/VX3aYZruAgUPv0IyGPZg7qZV5R72ChBK3VhIbOOLOCm/V6dhvl/XH vu2FG7J8WylWHmc+OU8u7TeSJdrwxTLs4e2IFUBK9ymAYFp0Q9S924fgvSYsFvVB iNn58SPRIbuA4SPtRfCd7pENtZW/QKfBC5CYK+pjsZVX40c9dbe40foVu4t2/EAo gi9+gSWEUVRRW2osxjaHXh78cW63g0j9bNfS6n1Vy32Oo5Mwm7n+bVWqCU5bCBXI MpPOk6AgME3UPwFzGzSmx+PVw8VacPxYP1NF8RFTCwj7OowFnrolJtruDmKJgXWY BN41EDo41k/C5mEu16Jr9rAkHeVhHaNZ+JhyDrzv8llJ/rv+4zEJw9SrhnpufmOX +YERd/ndAw== =Erfk -----END PGP SIGNATURE----- Merge tag 'for-6.4/io_uring-2023-04-21' of git://git.kernel.dk/linux Pull io_uring updates from Jens Axboe: - Cleanup of the io-wq per-node mapping, notably getting rid of it so we just have a single io_wq entry per ring (Breno) - Followup to the above, move accounting to io_wq as well and completely drop struct io_wqe (Gabriel) - Enable KASAN for the internal io_uring caches (Breno) - Add support for multishot timeouts. Some applications use timeouts to wake someone waiting on completion entries, and this makes it a bit easier to just have a recurring timer rather than needing to rearm it every time (David) - Support archs that have shared cache coloring between userspace and the kernel, and hence have strict address requirements for mmap'ing the ring into userspace. This should only be parisc/hppa. (Helge, me) - XFS has supported O_DIRECT writes without needing to lock the inode exclusively for a long time, and ext4 now supports it as well. This is true for the common cases of not extending the file size. Flag the fs as having that feature, and utilize that to avoid serializing those writes in io_uring (me) - Enable completion batching for uring commands (me) - Revert patch adding io_uring restriction to what can be GUP mapped or not. This does not belong in io_uring, as io_uring isn't really special in this regard. Since this is also getting in the way of cleanups and improvements to the GUP code, get rid of if (me) - A few series greatly reducing the complexity of registered resources, like buffers or files. Not only does this clean up the code a lot, the simplified code is also a LOT more efficient (Pavel) - Series optimizing how we wait for events and run task_work related to it (Pavel) - Fixes for file/buffer unregistration with DEFER_TASKRUN (Pavel) - Misc cleanups and improvements (Pavel, me) * tag 'for-6.4/io_uring-2023-04-21' of git://git.kernel.dk/linux: (71 commits) Revert "io_uring/rsrc: disallow multi-source reg buffers" io_uring: add support for multishot timeouts io_uring/rsrc: disassociate nodes and rsrc_data io_uring/rsrc: devirtualise rsrc put callbacks io_uring/rsrc: pass node to io_rsrc_put_work() io_uring/rsrc: inline io_rsrc_put_work() io_uring/rsrc: add empty flag in rsrc_node io_uring/rsrc: merge nodes and io_rsrc_put io_uring/rsrc: infer node from ctx on io_queue_rsrc_removal io_uring/rsrc: remove unused io_rsrc_node::llist io_uring/rsrc: refactor io_queue_rsrc_removal io_uring/rsrc: simplify single file node switching io_uring/rsrc: clean up __io_sqe_buffers_update() io_uring/rsrc: inline switch_start fast path io_uring/rsrc: remove rsrc_data refs io_uring/rsrc: fix DEFER_TASKRUN rsrc quiesce io_uring/rsrc: use wq for quiescing io_uring/rsrc: refactor io_rsrc_ref_quiesce io_uring/rsrc: remove io_rsrc_node::done io_uring/rsrc: use nospec'ed indexes ...	2023-04-26 12:40:31 -07:00
Pavel Begunkov	0b222eeb65	io_uring/rsrc: remove rsrc_data refs Instead of waiting for rsrc_data->refs to be downed to zero, check whether there are rsrc nodes queued for completion, that's easier then maintaining references. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8e33fd143d83e11af3e386aea28eb6d6c6a1be10.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-15 14:44:57 -06:00
Pavel Begunkov	4ea15b56f0	io_uring/rsrc: use wq for quiescing Replace completions with waitqueues for rsrc data quiesce, the main wakeup condition is when data refs hit zero. Note that data refs are only changes under ->uring_lock, so we prepare before mutex_unlock() reacquire it after taking the lock back. This change will be needed in the next patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1d0dbc74b3b4fd67c8f01819e680c5e0da252956.1681395792.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-15 14:44:57 -06:00
Ming Lei	860e1c7f8b	io_uring: complete request via task work in case of DEFER_TASKRUN So far io_req_complete_post() only covers DEFER_TASKRUN by completing request via task work when the request is completed from IOWQ. However, uring command could be completed from any context, and if io uring is setup with DEFER_TASKRUN, the command is required to be completed from current context, otherwise wait on IORING_ENTER_GETEVENTS can't be wakeup, and may hang forever. The issue can be observed on removing ublk device, but turns out it is one generic issue for uring command & DEFER_TASKRUN, so solve it in io_uring core code. Fixes: `e6aeb2721d` ("io_uring: complete all requests in task context") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@kernel.dk/ Reported-by: Jens Axboe <axboe@kernel.dk> Cc: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-14 06:38:23 -06:00
Pavel Begunkov	2933ae6eaa	io_uring/rsrc: refactor io_rsrc_node_switch We use io_rsrc_node_switch() coupled with io_rsrc_node_switch_start() for a bunch of cases including initialising ctx->rsrc_node, i.e. by passing NULL instead of rsrc_data. Leave it to only deal with actual node changing. For that, first remove it from io_uring_create() and add a function allocating the first node. Then also remove all calls to io_rsrc_node_switch() from files/buffers register as we already have a node installed and it does essentially nothing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d146fe306ff98b1a5a60c997c252534f03d423d7.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	528407b1e0	io_uring/rsrc: consolidate node caching We store one pre-allocated rsrc node in ->rsrc_backup_node, merge it with ->rsrc_node_cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6d5410e51ccd29be7a716be045b51d6b371baef6.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	8b1df11f97	io_uring: shut io_prep_async_work warning io_uring/io_uring.c:432 io_prep_async_work() error: we previously assumed 'req->file' could be null (see line 425). Even though it's a false positive as there will not be REQ_F_ISREG set without a file, let's add a simple check to make the kernel test robot happy. We don't care about performance here, but assumingly it'll be optimised out by the compiler. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a6cfbe92c74b789c0b4f046f7f98d19b1ca2e5b7.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	360cd42c4e	io_uring: optimise io_req_local_work_add Chains of memory accesses are never good for performance. The req->task->io_uring->in_cancel in io_req_local_work_add() is there so that when a task is exiting via io_uring_try_cancel_requests() and starts waiting for completions, it gets woken up by every new task_work item queued. Do a little trick by announcing waiting in io_uring_try_cancel_requests(), making io_req_local_work_add() wake us up. We also need to check for deferred tw items after prepare_to_wait(TASK_INTERRUPTIBLE); Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fb11597e9bbcb365901824f8c5c2cf0d6ee100d0.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:24:36 -06:00
Pavel Begunkov	c66ae3ec38	io_uring: refactor __io_cq_unlock_post_flush() Separate ->task_complete path in __io_cq_unlock_post_flush(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/baa9b8d822f024e4ee01c40209dbbe38d9c8c11d.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	8751d15426	io_uring: reduce scheduling due to tw Every task_work will try to wake the task to be executed, which causes excessive scheduling and additional overhead. For some tw it's justified, but others won't do much but post a single CQE. When a task waits for multiple cqes, every such task_work will wake it up. Instead, the task may give a hint about how many cqes it waits for, io_req_local_work_add() will compare against it and skip wake ups if #cqes + #tw is not enough to satisfy the waiting condition. Task_work that uses the optimisation should be simple enough and never post more than one CQE. It's also ignored for non DEFER_TASKRUN rings. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d2b77e99d1e86624d8a69f7037d764b739dcd225.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	5150940079	io_uring: inline llist_add() We'll need to grab some information from the previous request in the tw list, inline llist_add(), it'll be used in the following patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f0165493af7b379943c792114b972f331e7d7d10.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	8501fe70ae	io_uring: add tw add flags We pass 'allow_local' into io_req_task_work_add() but will need more flags. Replace it with a flags bit field and name this allow_local flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4c0f01e7ef4e6feebfb199093cc995af7a19befa.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	6e7248adf8	io_uring: refactor io_cqring_wake() Instead of smp_mb() + __io_cqring_wake() in __io_cq_unlock_post_flush() use equivalent io_cqring_wake(). With that we can clean it up further and remove __io_cqring_wake(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/662ee5d898168ac206be06038525e97b64072a46.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	d73a572df2	io_uring: optimize local tw add ctx pinning We currently pin the ctx for io_req_local_work_add() with percpu_ref_get/put, which implies two rcu_read_lock/unlock pairs and some extra overhead on top in the fast path. Replace it with a pure rcu read and let io_ring_exit_work() synchronise against it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cbdfcb6b232627f30e9e50ef91f13c4f05910247.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:10 -06:00
Pavel Begunkov	ab1c590f5c	io_uring: move pinning out of io_req_local_work_add Move ctx pinning from io_req_local_work_add() to the caller, looks better and makes working with the code a bit easier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/49c0dbed390b0d6d04cb942dd3592879fd5bfb1b.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:22:07 -06:00
Pavel Begunkov	69bbc6ade9	io_uring/rsrc: add custom limit for node caching The number of entries in the rsrc node cache is limited to 512, which still seems unnecessarily large. Add per cache thresholds and set to to 32 for the rsrc node cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d0cd538b944dac0bf878e276fc0199f21e6bccea.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	1f2c8f610a	io_uring/rsrc: add lockdep sanity checks We should hold ->uring_lock while putting nodes with io_put_rsrc_node(), add a lockdep check for that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b50d5f156ac41450029796738c1dfd22a521df7a.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	9eae8655f9	io_uring/rsrc: cache struct io_rsrc_node Add allocation cache for struct io_rsrc_node, it's always allocated and put under ->uring_lock, so it doesn't need any extra synchronisation around caches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/252a9d9ef9654e6467af30fdc02f57c0118fb76e.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	36b9818a5a	io_uring/rsrc: don't offload node free struct delayed_work rsrc_put_work was previously used to offload node freeing because io_rsrc_node_ref_zero() was previously called by RCU in the IRQ context. Now, as percpu refcounting is gone, we can do it eagerly at the spot without pushing it to a worker. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/13fb1aac1e8d068ad8fd4a0c6d0d157ab61b90c0.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	0a4813b1ab	io_uring/rsrc: kill rsrc_ref_lock We use ->rsrc_ref_lock spinlock to protect ->rsrc_ref_list in io_rsrc_node_ref_zero(). Now we removed pcpu refcounting, which means io_rsrc_node_ref_zero() is not executed from the irq context as an RCU callback anymore, and we also put it under ->uring_lock. io_rsrc_node_switch(), which queues up nodes into the list, is also protected by ->uring_lock, so we can safely get rid of ->rsrc_ref_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b60af883c263551190b526a55ff2c9d5ae07141.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	ef8ae64ffa	io_uring/rsrc: protect node refs with uring_lock Currently, for nodes we have an atomic counter and some cached (non-atomic) refs protected by uring_lock. Let's put all ref manipulations under uring_lock and get rid of the atomic part. It's free as in all cases we care about we already hold the lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/25b142feed7d831008257d90c8b17c0115d4fc15.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	03adabe81a	io_uring: io_free_req() via tw io_free_req() is not often used but nevertheless problematic as there is no way to know the current context, it may be used from the submission path or even by an irq handler. Push it to a fresh context using task_work. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3a92fe80bb068757e51aaa0b105cfbe8f5dfee9e.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	2ad4c6d080	io_uring: don't put nodes under spinlocks io_req_put_rsrc() doesn't need any locking, so move it out of a spinlock section in __io_req_complete_post() and adjust helpers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d5b87a5f31270dade6805f7acafc4cc34b84b241.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Jens Axboe	e3ef728ff0	io_uring: cap io_sqring_entries() at SQ ring size We already do this manually for the !SQPOLL case, do it in general and we can also dump the ugly min3() in io_submit_sqes(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Jens Axboe	2ad57931db	io_uring: rename trace_io_uring_submit_sqe() tracepoint It has nothing to do with the SQE at this point, it's a request submission. While in there, get rid of the 'force_nonblock' argument which is also dead, as we only pass in true. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Pavel Begunkov	a282967c84	io_uring: encapsulate task_work state For task works we're passing around a bool pointer for whether the current ring is locked or not, let's wrap it in a structure, that will make it more opaque preventing abuse and will also help us to pass more info in the future if needed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Pavel Begunkov	13bfa6f15d	io_uring: remove extra tw trylocks Before cond_resched()'ing in handle_tw_list() we also drop the current ring context, and so the next loop iteration will need to pick/pin a new context and do trylock. The chunk removed by this patch was intended to be an optimisation covering exactly this case, i.e. retaking the lock after reschedule, but in reality it's skipped for the first iteration after resched as described and will keep hammering the lock if it's contended. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Breno Leitao	e1fe7ee885	io_uring: Add KASAN support for alloc_caches Add support for KASAN in the alloc_caches (apoll and netmsg_cache). Thus, if something touches the unused caches, it will raise a KASAN warning/exception. It poisons the object when the object is put to the cache, and unpoisons it when the object is gotten or freed. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:14 -06:00
Jens Axboe	c56e022c0a	io_uring: add support for user mapped provided buffer ring The ring mapped provided buffer rings rely on the application allocating the memory for the ring, and then the kernel will map it. This generally works fine, but runs into issues on some architectures where we need to be able to ensure that the kernel and application virtual address for the ring play nicely together. This at least impacts architectures that set SHM_COLOUR, but potentially also anyone setting SHMLBA. To use this variant of ring provided buffers, the application need not allocate any memory for the ring. Instead the kernel will do so, and the allocation must subsequently call mmap(2) on the ring with the offset set to: IORING_OFF_PBUF_RING \| (bgid << IORING_OFF_PBUF_SHIFT) to get a virtual address for the buffer ring. Normally the application would allocate a suitable piece of memory (and correctly aligned) and simply pass that in via io_uring_buf_reg.ring_addr and the kernel would map it. Outside of the setup differences, the kernel allocate + user mapped provided buffer ring works exactly the same. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:21 -06:00
Helge Deller	d808459b2e	io_uring: Adjust mapping wrt architecture aliasing requirements Some architectures have memory cache aliasing requirements (e.g. parisc) if memory is shared between userspace and kernel. This patch fixes the kernel to return an aliased address when asked by userspace via mmap(). Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:20 -06:00
Jens Axboe	d4755e1538	io_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it io_uring hashes writes to a given file/inode so that it can serialize them. This is useful if the file system needs exclusive access to the file to perform the write, as otherwise we end up with a ton of io-wq threads trying to lock the inode at the same time. This can cause excessive system time. But if the file system has flagged that it supports parallel O_DIRECT writes, then there's no need to serialize the writes. Check for that through FMODE_DIO_PARALLEL_WRITE and don't hash it if we don't need to. In a basic test of 8 threads writing to a file on XFS on a gen2 Optane, with each thread writing in 4k chunks, it improves performance from ~1350K IOPS (or ~5290MiB/sec) to ~1410K IOPS (or ~5500MiB/sec). Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:20 -06:00
Wojciech Lukowicz	b4a72c0589	io_uring: fix memory leak when removing provided buffers When removing provided buffers, io_buffer structs are not being disposed of, leading to a memory leak. They can't be freed individually, because they are allocated in page-sized groups. They need to be added to some free list instead, such as io_buffers_cache. All callers already hold the lock protecting it, apart from when destroying buffers, so had to extend the lock there. Fixes: `cc3cec8367` ("io_uring: speedup provided buffer handling") Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com> Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-01 16:52:12 -06:00
Jens Axboe	fa780334a8	io_uring: silence variable ‘prev’ set but not used warning If io_uring.o is built with W=1, it triggers a warning: io_uring/io_uring.c: In function ‘__io_submit_flush_completions’: io_uring/io_uring.c:1502:40: warning: variable ‘prev’ set but not used [-Wunused-but-set-variable] 1502 \| struct io_wq_work_node node, prev; \| ^~~~ which is due to the wq_list_for_each() iterator always keeping a 'prev' variable. Most users need this to remove an entry from a list, for example, but __io_submit_flush_completions() never does that. Add a basic helper that doesn't track prev instead, and use that in that function. Reported-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-09 10:10:58 -07:00
Linus Torvalds	53ae7e1176	io_uring-6.3-2023-03-03 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmQB58gQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpvz0EACk4hUVpReCU7mspo+eAnzo1+Oi5DHdm34C t4DU00lJ107Als5i86kP95zK2UHMXMsTt6N0fVwmDd/qQ8zhMwxH5sjyHVsknvGa sNLDCcA8yYwhfnhMnRIk024Rvn9p3U4CThSKF6G7g78absm5ULu/UL3MSiI4Q8aM gfK2MlD6aZLTZKLYefOI4fMBwLvJoZ3/mwpzdBfijJT/qUsYhrg8rJ9Pxo+JvtNf Oxny83DnT2i7ggVZraNouuD972EGf9Lm4TAmWV5l5nG4aKYZfRny+QwA48Eghs5+ EpUNtpdvL5pguZIJLI77LSkNmJA/0oFsjh/qGOYcPZfS9erY3KOr8+jBh9FKdLC+ 5AZVt8pnSKqWT1EjKmpEThbaPgzXEhiFYq9kKO4gG5VhSgD5oxC1MXhwb8QJ0gOm 9PR9pjzwPyYqMj7yAne4lY7I0OujHgYi9nrSpW2mLeeWjRJmkYXjB9NYxNnwiOLS H5SzbzHafn8HAuLacwp3StqrwqCE4bx+dTCYgLj/puN2IFEre/xtvy4S6sG2tPAF 1hJwvV2cYqqK9Khx5fBHv2GAlMlozqe8RwQF84VGTuu+Jturo73CNtlkTGPT2Yds SJCTNA05uo0WsvNdiwlMWUW+nXzMFuzu0stb/Gg14KjAcS8CxYGKRRP8kFAoxrQY LtmaNYcukQ== =qWNf -----END PGP SIGNATURE----- Merge tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linux Pull more io_uring updates from Jens Axboe: "Here's a set of fixes/changes that didn't make the first cut, either because they got queued before I sent the early merge request, or fixes that came in afterwards. In detail: - Don't set MSG_NOSIGNAL on recv/recvmsg opcodes, as AF_PACKET will error out (David) - Fix for spurious poll wakeups (me) - Fix for a file leak for buffered reads in certain conditions (Joseph) - Don't allow registered buffers of mixed types (Pavel) - Improve handling of huge pages for registered buffers (Pavel) - Provided buffer ring size calculation fix (Wojciech) - Minor cleanups (me)" * tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linux: io_uring/poll: don't pass in wake func to io_init_poll_iocb() io_uring: fix fget leak when fs don't support nowait buffered read io_uring/poll: allow some retries for poll triggering spuriously io_uring: remove MSG_NOSIGNAL from recvmsg io_uring/rsrc: always initialize 'folio' to NULL io_uring/rsrc: optimise registered huge pages io_uring/rsrc: optimise single entry advance io_uring/rsrc: disallow multi-source reg buffers io_uring: remove unused wq_list_merge io_uring: fix size calculation when registering buf ring io_uring/rsrc: fix a comment in io_import_fixed() io_uring: rename 'in_idle' to 'in_cancel' io_uring: consolidate the put_ref-and-return section of adding work	2023-03-03 10:25:29 -08:00
Joseph Qi	54aa7f2330	io_uring: fix fget leak when fs don't support nowait buffered read Heming reported a BUG when using io_uring doing link-cp on ocfs2. [1] Do the following steps can reproduce this BUG: mount -t ocfs2 /dev/vdc /mnt/ocfs2 cp testfile /mnt/ocfs2/ ./link-cp /mnt/ocfs2/testfile /mnt/ocfs2/testfile.1 umount /mnt/ocfs2 Then umount will fail, and it outputs: umount: /mnt/ocfs2: target is busy. While tracing umount, it blames mnt_get_count() not return as expected. Do a deep investigation for fget()/fput() on related code flow, I've finally found that fget() leaks since ocfs2 doesn't support nowait buffered read. io_issue_sqe \|-io_assign_file // do fget() first \|-io_read \|-io_iter_do_read \|-ocfs2_file_read_iter // return -EOPNOTSUPP \|-kiocb_done \|-io_rw_done \|-__io_complete_rw_common // set REQ_F_REISSUE \|-io_resubmit_prep \|-io_req_prep_async // override req->file, leak happens This was introduced by commit `a196c78b54` in v5.18. Fix it by don't re-assign req->file if it has already been assigned. [1] https://lore.kernel.org/ocfs2-devel/ab580a75-91c8-d68a-3455-40361be1bfa8@linux.alibaba.com/T/#t Fixes: `a196c78b54` ("io_uring: assign non-fixed early for async work") Cc: <stable@vger.kernel.org> Reported-by: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20230228045459.13524-1-joseph.qi@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-28 05:58:42 -07:00
Linus Torvalds	3822a7c409	- Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K DmxHkn0LAitGgJRS/W9w81yrgig9tAQ= =MlGs -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...	2023-02-23 17:09:35 -08:00
Jens Axboe	8d664282a0	io_uring: rename 'in_idle' to 'in_cancel' This better describes what it does - it's incremented when the task is currently undergoing a cancelation operation, due to exiting or exec'ing. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:23 -07:00
Jens Axboe	ce8e04f6e5	io_uring: consolidate the put_ref-and-return section of adding work We've got a few cases of this, move them to one section and just use gotos to get there. Reduces the text section on both arm64 and x86-64, using gcc-12.2. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-22 09:57:23 -07:00
Josh Triplett	7d3fd88d61	io_uring: Support calling io_uring_register with a registered ring fd Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit of the opcode) to treat the fd as a registered index rather than a file descriptor. This makes it possible for a library to open an io_uring, register the ring fd, close the ring fd, and subsequently use the ring entirely via registered index. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/f2396369e638284586b069dbddffb8c992afba95.1676419314.git.josh@joshtriplett.org [axboe: remove extra high bit clear] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-16 06:09:30 -07:00
Dylan Yudaken	6bb3085556	io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async REQ_F_FORCE_ASYNC was being ignored for re-queueing linked requests. Instead obey that flag. Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20230127135227.3646353-2-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:18:26 -07:00
Jens Axboe	f586800854	io_uring: add reschedule point to handle_tw_list() If CONFIG_PREEMPT_NONE is set and the task_work chains are long, we could be running into issues blocking others for too long. Add a reschedule check in handle_tw_list(), and flush the ctx if we need to reschedule. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Jens Axboe	fcc926bb85	io_uring: add a conditional reschedule to the IOPOLL cancelation loop If the kernel is configured with CONFIG_PREEMPT_NONE, we could be sitting in a tight loop reaping events but not giving them a chance to finish. This results in a trace ala: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 2-...!: (5249 ticks this GP) idle=935c/1/0x4000000000000000 softirq=4265/4274 fqs=1 (t=5251 jiffies g=465 q=4135 ncpus=4) rcu: rcu_sched kthread starved for 5249 jiffies! g465 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_sched state:R running task stack:0 pid:12 ppid:2 flags:0x00000008 Call trace: __switch_to+0xb0/0xc8 __schedule+0x43c/0x520 schedule+0x4c/0x98 schedule_timeout+0xbc/0xdc rcu_gp_fqs_loop+0x308/0x344 rcu_gp_kthread+0xd8/0xf0 kthread+0xb8/0xc8 ret_from_fork+0x10/0x20 rcu: Stack dump where RCU GP kthread last ran: Task dump for CPU 0: task:kworker/u8:10 state:R running task stack:0 pid:89 ppid:2 flags:0x0000000a Workqueue: events_unbound io_ring_exit_work Call trace: __switch_to+0xb0/0xc8 0xffff0000c8fefd28 CPU: 2 PID: 95 Comm: kworker/u8:13 Not tainted 6.2.0-rc5-00042-g40316e337c80-dirty #2759 Hardware name: linux,dummy-virt (DT) Workqueue: events_unbound io_ring_exit_work pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : io_do_iopoll+0x344/0x360 lr : io_do_iopoll+0xb8/0x360 sp : ffff800009bebc60 x29: ffff800009bebc60 x28: 0000000000000000 x27: 0000000000000000 x26: ffff0000c0f67d48 x25: ffff0000c0f67840 x24: ffff800008950024 x23: 0000000000000001 x22: 0000000000000000 x21: ffff0000c27d3200 x20: ffff0000c0f67840 x19: ffff0000c0f67800 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000 x11: 0000000000000179 x10: 0000000000000870 x9 : ffff800009bebd60 x8 : ffff0000c27d3ad0 x7 : fefefefefefefeff x6 : 0000646e756f626e x5 : ffff0000c0f67840 x4 : 0000000000000000 x3 : ffff0000c2398000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: io_do_iopoll+0x344/0x360 io_uring_try_cancel_requests+0x21c/0x334 io_ring_exit_work+0x90/0x40c process_one_work+0x1a4/0x254 worker_thread+0x1ec/0x258 kthread+0xb8/0xc8 ret_from_fork+0x10/0x20 Add a cond_resched() in the cancelation IOPOLL loop to fix this. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	50470fc572	io_uring: return normal tw run linking optimisation io_submit_flush_completions() may produce new task_work items, so it's a good idea to recheck the task_work list after flushing completions. The optimisation is not new and was accidentially removed by `f88262e60b` ("io_uring: lockless task list") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a7ed5ede84de190832cc33ebbcdd6e91cd90f5b6.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	cb6bf7f285	io_uring: refactor tctx_task_work Merge almost identical sections of tctx_task_work(), this will make code modifications later easier and also inlines handle_tw_list(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d06592d91e3e7559e7a4dbb8907d110863008dc7.1674484266.git.asml.silence@gmail.com [axboe: fold in setting count to zero patch from Tom Rix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	5afa465071	io_uring: refactor io_put_task helpers Add a helper for putting refs from the target task context, rename __io_put_task() and add a couple of comments around. Use the remote version for __io_req_complete_post(), the local is only needed for __io_submit_flush_completions(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3bf92ebd594769d8a5d648472a8e335f2031d542.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	c8576f3e61	io_uring: refactor req allocation Follow the io_get_sqe pattern returning the result via a pointer and hide request cache refill inside io_alloc_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8c37c2e8a3cb5e4cd6a8ae3b91371227a92708a6.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	b5083dfa36	io_uring: improve io_get_sqe Return an SQE from io_get_sqe() as a parameter and use the return value to determine if it failed or not. This enables the compiler to compile out the sqe NULL check when we know that the return SQE is valid. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9cceb11329240ea097dffef6bf0a675bca14cf42.1674484266.git.asml.silence@gmail.com [axboe: remove bogus const modifier on return value] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	b2aa66aff6	io_uring: kill outdated comment about overflow flush __io_cqring_overflow_flush() doesn't return anything anymore, remove outdate comment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4ce2bcbb17eac80cdf883fd1459d5ee6586e238c.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Pavel Begunkov	c10bb64684	io_uring: use user visible tail in io_uring_poll() We return POLLIN from io_uring_poll() depending on whether there are CQEs for the userspace, and so we should use the user visible tail pointer instead of a transient cached value. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/228ffcbf30ba98856f66ffdb9a6a60ead1dd96c0.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00
Jens Axboe	f499254474	io_uring: pass in io_issue_def to io_assign_file() This generates better code for me, avoiding an extra load on arm64, and both call sites already have this variable available for easy passing. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-29 15:17:41 -07:00

1 2 3 4 5 ...

277 Commits