Commit Graph

204 Commits

Author SHA1 Message Date
Pavel Begunkov
46929b0868 io_uring: add io_commit_cqring_flush()
Since __io_commit_cqring_flush users moved to different files, introduce
io_commit_cqring_flush() helper and encapsulate all flags testing details
inside.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0da03887435dd9869ffe46dcd3962bf104afcca3.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:15 -06:00
Pavel Begunkov
253993210b io_uring: introduce locking helpers for CQE posting
spin_lock(&ctx->completion_lock);
/* post CQEs */
io_commit_cqring(ctx);
spin_unlock(&ctx->completion_lock);
io_cqring_ev_posted(ctx);

We have many places repeating this sequence, and the three function
unlock section is not perfect from the maintainance perspective and also
makes it harder to add new locking/sync trick.

Introduce two helpers. io_cq_lock(), which is simple and only grabs
->completion_lock, and io_cq_unlock_post() encapsulating the three call
section.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fe0c682bf7f7b55d9be55b0d034be9c1949277dc.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
305bef9887 io_uring: hide eventfd assumptions in eventfd paths
Some io_uring-eventfd users assume that there won't be spurious wakeups.
That assumption has to be honoured by all io_cqring_ev_posted() callers,
which is inconvenient and from time to time leads to problems but should
be maintained to not break the userspace.

Instead of making the callers track whether a CQE was posted or not, hide
it inside io_eventfd_signal(). It saves ->cached_cq_tail it saw last time
and triggers the eventfd only when ->cached_cq_tail changed since then.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0ffc66bae37a2513080b601e4370e147faaa72c5.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
b321823a03 io_uring: fix io_poll_remove_all clang warnings
clang complains on bitwise operations with bools, add a bit more
verbosity to better show that we want to call io_poll_remove_all_table()
twice but with different arguments.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f11d21dcdf9233e0eeb15fa13b858a05a78eb310.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
ba3cdb6fbb io_uring: improve task exit timeout cancellations
Don't spin trying to cancel timeouts that are reachable but not
cancellable, e.g. already executing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ab8a7440a60bbdf69ae514f672ad050e43dd1b03.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
affa87db90 io_uring: fix multi ctx cancellation
io_uring_try_cancel_requests() loops until there is nothing left to do
with the ring, however there might be several rings and they might have
dependencies between them, e.g. via poll requests.

Instead of cancelling rings one by one, try to cancel them all and only
then loop over if we still potenially some work to do.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8d491fe02d8ac4c77ff38061cf86b9a827e8845c.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
d9dee4302a io_uring: remove ->flush_cqes optimisation
It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
->flush_cqes flag prevents from completion being flushed. Sometimes it's
high level of concurrency that enables it at least for one CQE, but
sometimes it doesn't save much because nobody waiting on the CQ.

Remove ->flush_cqes flag and the optimisation, it should benefit the
normal use case. Note, that there is no spurious eventfd problem with
that as checks for spuriousness were incorporated into
io_eventfd_signal().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com
[axboe: remove now dead state->flush_cqes variable]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
a830ffd287 io_uring: move io_eventfd_signal()
Move io_eventfd_signal() in the sources without any changes and kill its
forward declaration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9ebebb3f6f56f5a5448a621e0b6a537720c43334.1655637157.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
9046c6415b io_uring: reshuffle io_uring/io_uring.h
It's a good idea to first do forward declarations and then inline
helpers, otherwise there will be keep stumbling on dependencies
between them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1d7fa6672ed43f20ccc0c54ae201369ebc3ebfab.1655637157.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
d142c3ec8d io_uring: remove extra io_commit_cqring()
We don't post events in __io_commit_cqring_flush() anymore but send all
requests to tw, so no need to do io_commit_cqring() there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f2481e32375e749be89c42e4804268b608722cef.1655637157.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Jens Axboe
ad163a7e25 io_uring: move a few private types to local headers
Commit 3a3d47fa9cfd ("io_uring: make io_uring_types.h public") moved
a bunch of io_uring types to a kernel wide header, so we could make
tracing a bit saner rather than pass in a ton of arguments.

However, there are a few types in there that are not really needed to
be system wide. Move the cancel data and mapped buffers back to the
appropriate io_uring local headers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
48863ffd3e io_uring: clean up tracing events
We have lots of trace events accepting an io_uring request and wanting
to print some of its fields like user_data, opcode, flags and so on.
However, as trace points were unaware of io_uring structures, we had to
pass all the fields as arguments. Teach trace/events/io_uring.h about
struct io_kiocb and stop the misery of passing a horde of arguments to
trace helpers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/40ff72f92798114e56d400f2b003beb6cde6ef53.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
ab1c84d855 io_uring: make io_uring_types.h public
Move io_uring types to linux/include, need them public so tracing can
see the definitions and we can clean trace/events/io_uring.h

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a15f12e8cb7289b2de0deaddcc7518d98a132d17.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
27a9d66fec io_uring: kill extra io_uring_types.h includes
io_uring/io_uring.h already includes io_uring_types.h, no need to
include it every time. Kill it in a bunch of places, it prepares us for
following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
b3659a65be io_uring: change ->cqe_cached invariant for CQE32
With IORING_SETUP_CQE32 ->cqe_cached doesn't store a real address but
rather an implicit offset into cqes. Store the real cqe pointer and
increment it accordingly if CQE32.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ee1838cba16bed96381a006950b36ba640d998c.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
e8c328c391 io_uring: deduplicate io_get_cqe() calls
Deduplicate calls to io_get_cqe() from __io_fill_cqe_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4fa077986cc3abab7c59ff4e7c390c783885465f.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
ae5735c69b io_uring: deduplicate __io_fill_cqe_req tracing
Deduplicate two trace_io_uring_complete() calls in __io_fill_cqe_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/277ed85dba5189ab7d932164b314013a0f0b0fdc.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
68494a65d0 io_uring: introduce io_req_cqe_overflow()
__io_fill_cqe_req() is hot and inlined, we want it to be as small as
possible. Add io_req_cqe_overflow() accepting only a request and doing
all overflow accounting, and replace with it two calls to 6 argument
io_cqring_event_overflow().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/048b9fbcce56814d77a1a540409c98c3d383edcb.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
faf88dde06 io_uring: don't inline __io_get_cqe()
__io_get_cqe() is not as hot as io_get_cqe(), no need to inline it, it
sheds ~500B from the binary.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c1ac829198a881b7af8710926f99a3559b9f24c0.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
d245bca637 io_uring: don't expose io_fill_cqe_aux()
Deduplicate some code and add a helper for filling an aux CQE, locking
and notification.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7c6557c8f9dc5c4cfb01292116c682a0ff61081.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Hao Xu
f09c8643f0 io_uring: kbuf: add comments for some tricky code
Add comments to explain why it is always under uring lock when
incrementing head in __io_kbuf_recycle. And rectify one comemnt about
kbuf consuming in iowq case.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Link: https://lore.kernel.org/r/20220617050429.94293-1-hao.xu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
9ca9fb24d5 io_uring: mutex locked poll hashing
Currently we do two extra spin lock/unlock pairs to add a poll/apoll
request to the cancellation hash table and remove it from there.

On the submission side we often already hold ->uring_lock and tw
completion is likely to hold it as well. Add a second cancellation hash
table protected by ->uring_lock. In concerns for latency because of a
need to have the mutex locked on the completion side, use the new table
only in following cases:

1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
   is little to no contention and so the main tw hander will almost
   always end up grabbing it before calling callbacks.

2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
   a major user of ->uring_lock.

3) apoll: we normally grab the lock on the completion side anyway to
   execute the request, so it's free.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1bbad9c78c454b7b92f100bbf46730a37df7194f.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Pavel Begunkov
5d7943d99d io_uring: propagate locking state to poll cancel
Poll cancellation will be soon need to grab ->uring_lock inside, pass
the locking state, i.e. issue_flags, inside the cancellation functions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b86781d047727c07163443b57551a3fa57c7c5e1.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
e6f89be614 io_uring: introduce a struct for hash table
Instead of passing around a pointer to hash buckets, add a bit of type
safety and wrap it into a structure.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d65bc3faba537ec2aca9eabf334394936d44bd28.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
a2cdd51932 io_uring: pass hash table into poll_find
In preparation for having multiple cancellation hash tables, pass a
table pointer into io_poll_find() and other poll cancel functions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a31c88502463dce09254240fa037352927d7ecc3.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
97bbdc06a4 io_uring: add IORING_SETUP_SINGLE_ISSUER
Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
of it, i.e. put limitations of submitters. Also, don't allow it together
with IOPOLL as we're not going to put it to good use.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4bcc41ee467fdf04c8aab8baf6ce3ba21858c3d4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
0ec6dca223 io_uring: use state completion infra for poll reqs
Use io_req_task_complete() for poll request completions, so it can
utilise state completions and save lots of unnecessary locking.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ced94cb5a728d8e386c640d052fd3da3f5d6891a.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
8b1dfd343a io_uring: clean up io_ring_ctx_alloc
Add a variable for the number of hash buckets in io_ring_ctx_alloc(),
makes it more readable.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/993926ed0d614ba9a76b2a85bebae2babcb13983.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
4a07723fb4 io_uring: limit the number of cancellation buckets
Don't allocate to many hash/cancellation buckets, there might be too
many, clamp it to 8 bits, or 256 * 64B = 16KB. We don't usually have too
many requests, and 256 buckets should be enough, especially since we
do hash search only in the cancellation path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b9620c8072ba61a2d50eba894b89bd93a94a9abd.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
4dfab8abb4 io_uring: clean up io_try_cancel
Get rid of an unnecessary extra goto in io_try_cancel() and simplify the
function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/48cf5417b43a8386c6c364dba1ad9b4c7382d158.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
1ab1edb0a1 io_uring: pass poll_find lock back
Instead of using implicit knowledge of what is locked or not after
io_poll_find() and co returns, pass back a pointer to the locked
bucket if any. If set the user must to unlock the spinlock.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dae1dc5749aa34367812ecf62f82fd3f053aae44.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Hao Xu
38513c464d io_uring: switch cancel_hash to use per entry spinlock
Add a new io_hash_bucket structure so that each bucket in cancel_hash
has separate spinlock. Use per entry lock for cancel_hash, this removes
some completion lock invocation and remove contension between different
cancel_hash entries.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/05d1e135b0c8bce9d1441e6346776589e5783e26.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Hao Xu
3654ab0c51 io_uring: poll: remove unnecessary req->ref set
We now don't need to set req->refcount for poll requests since the
reworked poll code ensures no request release race.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec6fee45705890bdb968b0c175519242753c0215.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
53ccf69bda io_uring: don't inline io_put_kbuf
io_put_kbuf() is huge, don't bloat the kernel with inlining.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2e21ccf0be471ffa654032914b9430813cae53f8.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
7012c81593 io_uring: refactor io_req_task_complete()
Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ae3148ac7eb5cce3e06895cde306e9e959d6f6ae.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
75d7b3aec1 io_uring: kill REQ_F_COMPLETE_INLINE
REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.

As now we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/600ba20a9338b8a39b249b23d3d177803613dde4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
df9830d883 io_uring: rw: delegate sync completions to core io_uring
io_issue_sqe() from the io_uring core knows how to complete requests
based on the returned error code, we can delegate io_read()/io_write()
completion to it. Make kiocb_done() to return the right completion
code and propagate it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/32ef005b45d23bf6b5e6837740dc0331bb051bd4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Jens Axboe
bb8f870031 io_uring: remove unused IO_REQ_CACHE_SIZE defined
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
c65f5279ba io_uring: don't set REQ_F_COMPLETE_INLINE in tw
io_req_task_complete() enqueues requests for state completion itself, no
need for REQ_F_COMPLETE_INLINE, which is only serve the purpose of not
bloating the kernel.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/aca80f71464ad02c06f1311d998a2d6ee0b31573.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
3a08576b96 io_uring: remove check_cq checking from hot paths
All ctx->check_cq events are slow path, don't test every single flag one
by one in the hot path, but add a common guarding if.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dff026585cea7ff3a172a7c83894a3b0111bbf6a.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
aeaa72c694 io_uring: never defer-complete multi-apoll
Luckily, nnobody completes multi-apoll requests outside the polling
functions, but don't set IO_URING_F_COMPLETE_DEFER in any case as
there is nobody who is catching REQ_F_COMPLETE_INLINE, and so will leak
requests if used.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a65ed3f5effd9321ee06e6edea294a03be3e15a0.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
6a02e4be81 io_uring: inline ->registered_rings
There can be only 16 registered rings, no need to allocate an array for
them separately but store it in tctx.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/495f0b953c87994dd9e13de2134019054fa5830d.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
48c13d8980 io_uring: explain io_wq_work::cancel_seq placement
Add a comment on why we keep ->cancel_seq in struct io_wq_work instead
of struct io_kiocb despite it needed only by io_uring but not io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/988e87eec9dc700b5dae933df3aefef303502f6c.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
aa1e90f64e io_uring: move small helpers to headers
There is a bunch of inline helpers that will be useful not only to the
core of io_uring, move them to headers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/22df99c83723e44cba7e945e8519e64e3642c064.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:13 -06:00
Pavel Begunkov
22eb2a3fde io_uring: refactor ctx slow data placement
Shove all slow path data at the end of ctx and get rid of extra
indention.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bcaf200298dd469af20787650550efc66d89bef2.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Pavel Begunkov
aff5b2df9e io_uring: better caching for ctx timeout fields
Following timeout fields access patterns, move all of them into a
separate cache line inside ctx, so they don't intervene with normal
completion caching, especially since timeout removals and completion
are separated and the later is done via tw.

It also sheds some bytes from io_ring_ctx, 1216B -> 1152B

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4b163793072840de53b3cb66e0c2995e7226ff78.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Pavel Begunkov
b25436038f io_uring: move defer_list to slow data
draining is slow path, move defer_list to the end where slow data lives
inside the context.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e16379391ca72b490afdd24e8944baab849b4a7b.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Pavel Begunkov
5ff4fdffad io_uring: make reg buf init consistent
The default (i.e. empty) state of register buffer is dummy_ubuf, so set
it to dummy on init instead of NULL.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c5456aecf03d9627fbd6e65e100e2b5293a6151e.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
61a2732af4 io_uring: deprecate epoll_ctl support
As far as we know, nobody ever adopted the epoll_ctl management via
io_uring. Deprecate it now with a warning, and plan on removing it in
a later kernel version. When we do remove it, we can revert the following
commits as well:

39220e8d4a ("eventpoll: support non-blocking do_epoll_ctl() calls")
58e41a44c4 ("eventpoll: abstract out epoll_ctl() handler")

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/io-uring/CAHk-=wiTyisXBgKnVHAGYCNvkmjk=50agS2Uk6nr+n3ssLZg2w@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
b9ba8a4463 io_uring: add support for level triggered poll
By default, the POLL_ADD command does edge triggered poll - if we get
a non-zero mask on the initial poll attempt, we complete the request
successfully.

Support level triggered by always waiting for a notification, regardless
of whether or not the initial mask matches the file state.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
d9b57aa3cf io_uring: move opcode table to opdef.c
We already have the declarations in opdef.h, move the rest into its own
file rather than in the main io_uring.c file.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
f3b44f92e5 io_uring: move read/write related opcodes to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
c98817e6cd io_uring: move remaining file table manipulation to filetable.c
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
7357298448 io_uring: move rsrc related data, core, and commands
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
3b77495a97 io_uring: split provided buffers handling into its own file
Move both the opcodes related to it, and the internals code dealing with
it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
7aaff708a7 io_uring: move cancelation into its own file
This also helps cleanup the io_uring.h cancel parts, as we can make
things static in the cancel.c file, mostly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
329061d3e2 io_uring: move poll handling into its own file
Add a io_poll_issue() rather than export the general task_work locking
and io_issue_sqe(), and put the io_op_defs definition and structure into
a separate header file so that poll can use it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
cfd22e6b33 io_uring: add opcode name to io_op_defs
This kills the last per-op switch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
92ac8beaea io_uring: include and forward-declaration sanitation
Remove some dead headers we no longer need, and get rid of the
io_ring_ctx and io_uring_fops forward declarations.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
c9f06aa7de io_uring: move io_uring_task (tctx) helpers into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
a4ad4f748e io_uring: move fdinfo helpers to its own file
This also means moving a bit more of the fixed file handling to the
filetable side, which makes sense separately too.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
e5550a1447 io_uring: use io_is_uring_fops() consistently
Convert the last spots that check for io_uring_fops to use the provided
helper instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
17437f3114 io_uring: move SQPOLL related handling into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
59915143e8 io_uring: move timeout opcodes and handling into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
e418bbc97b io_uring: move our reference counting into a header
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
36404b09aa io_uring: move msg_ring into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:12 -06:00
Jens Axboe
f9ead18c10 io_uring: split network related opcodes into its own file
While at it, convert the handlers to just use io_eopnotsupp_prep()
if CONFIG_NET isn't set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
e0da14def1 io_uring: move statx handling to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
a9c210cebe io_uring: move epoll handler to its own file
Would be nice to sort out Kconfig for this and don't even compile
epoll.c if we don't have epoll configured.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
4cf9049528 io_uring: add a dummy -EOPNOTSUPP prep handler
Add it and use it for the epoll handling, if epoll isn't configured.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
99f15d8d61 io_uring: move uring_cmd handling to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
cd40cae29e io_uring: split out open/close operations
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
453b329be5 io_uring: separate out file table handling code
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
f4c163dd7d io_uring: split out fadvise/madvise operations
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
0d58472740 io_uring: split out fs related sync/fallocate functions
This splits out sync_file_range, fsync, and fallocate.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
531113bbd5 io_uring: split out splice related operations
This splits out splice and tee support.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
11aeb71406 io_uring: split out filesystem related operations
This splits out renameat, unlinkat, mkdirat, symlinkat, and linkat.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
e28683bdfc io_uring: move nop into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
5e2a18d93f io_uring: move xattr related opcodes to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
97b388d70b io_uring: handle completions in the core
Normally request handlers complete requests themselves, if they don't
return an error. For the latter case, the core will complete it for
them.

This is unhandy for pushing opcode handlers further out, as we don't
want a bunch of inline completion code and we don't want to make the
completion path slower than it is now.

Let the core handle any completion, unless the handler explicitly
asks us not to.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
de23077eda io_uring: set completion results upfront
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
e27f928ee1 io_uring: add io_uring_types.h
This adds definitions of structs that both the core and the various
opcode handlers need to know about.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
4d4c9cff4f io_uring: define a request type cleanup handler
This can move request type specific cleanup into a private handler,
removing the need for the core io_uring parts to know what types
they are dealing with.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
890968dc03 io_uring: unify struct io_symlink and io_hardlink
They are really just a subset of each other, just use the one type.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
9a3a11f977 io_uring: convert iouring_cmd to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
ceb452e1b4 io_uring: convert xattr to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
ea5af87d29 io_uring: convert rsrc_update to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:11 -06:00
Jens Axboe
c1ee559501 io_uring: convert msg and nop to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
2511d3030c io_uring: convert splice to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
3e93a3571a io_uring: convert epoll to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
bb040a21fd io_uring: convert file system request types to use io_cmd_type
This converts statx, rename, unlink, mkdir, symlink, and hardlink to
use io_cmd_type.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
37d4842f11 io_uring: convert madvise/fadvise to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
dd752582e3 io_uring: convert open/close path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
a43714ace5 io_uring: convert timeout path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
f38987f09a io_uring: convert cancel path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
e4a71006ea io_uring: convert the sync and fallocate paths to use io_cmd_type
They all share the same struct io_sync, convert them to use the
io_cmd_type approach instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
8ff86d85b7 io_uring: convert net related opcodes to use io_cmd_type
This converts accept, connect, send/recv, sendmsg/recvmsg, shutdown, and
socket to use io_cmd_type.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
bd8587e499 io_uring: remove recvmsg knowledge from io_arm_poll_handler()
There's a special case for recvmsg with MSG_ERRQUEUE set. This is
problematic as it means the core needs to know about this special
request type.

For now, just add a generic flag for it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
c24b154967 io_uring: convert poll_update path to use io_cmd_type
Remove struct io_poll_update from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
8d4388d116 io_uring: convert poll path to use io_cmd_type
Remove struct io_poll_iocb from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.

While at it, rename io_poll_iocb to io_poll which is consistent with the
other request type private structures.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
3c306fb2f9 io_uring: convert read/write path to use io_cmd_type
Remove struct io_rw from io_kiocb, and convert the read/write path to
use the io_cmd_type approach instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
f49eca2156 io_uring: add generic command payload type to struct io_kiocb
Each opcode generally has a command structure in io_kiocb which it can
use to store data associated with that request.

In preparation for having the core layer not know about what's inside
these fields, add a generic io_cmd_data type and put in the union as
well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
dc919caff6 io_uring: move req async preparation into opcode handler
Define an io_op_def->prep_async() handler and push the async preparation
to there. Since we now have that, we can drop ->needs_async_setup, as
they mean the same thing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00
Jens Axboe
ed29b0b4fd io_uring: move to separate directory
In preparation for splitting io_uring up a bit, move it into its own
top level directory. It didn't really belong in fs/ anyway, as it's
not a file system only API.

This adds io_uring/ and moves the core files in there, and updates the
MAINTAINERS file for the new location.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:10 -06:00