io_uring_cancel_files() cancels all requests that match files, regardless
of task. There is no real need for that; cancel only the requests of the
specified task. That also handles the SQPOLL case, as the code already
switches the task to the SQPOLL task.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add io_match_task() that matches both task and files.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If IORING_SETUP_SQPOLL is set, all requests belong to the corresponding
SQPOLL task, so skip the task check in that case and always match.
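As a rough sketch (not the literal patch; the ->task and
->work.identity->files fields are assumed from context), the combined
matcher described above could look like:

  static bool io_match_task(struct io_kiocb *req, struct task_struct *task,
                            struct files_struct *files)
  {
      /* with IORING_SETUP_SQPOLL, all requests belong to the SQPOLL task */
      if (task && req->task != task &&
          !(req->ctx->flags & IORING_SETUP_SQPOLL))
          return false;
      /* files == NULL means "match any files" */
      return !files || req->work.identity->files == files;
  }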
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Inline io_import_iovec() and leave only its former __io_import_iovec(),
renamed to the original name. That makes it more obvious what is reused in
io_read/write().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_size and iov_count in io_read() and io_write() hold the same value;
kill the latter.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is the only code that relies on import_iovec() returning iter.count
on success; removing that reliance allows a better interface to
import_iovec().
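For illustration, a sketch of the resulting pattern in io_uring (variable
names assumed); the size is read back from the iterator rather than from
the return value:

  ret = import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter);
  if (ret < 0)
      return ret;
  io_size = iov_iter_count(iter);    /* instead of io_size = ret; */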
Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, users who want to be woken while waiting for events must submit
a timeout command first. That is not safe for applications that split SQ
and CQ handling between two threads, such as mysql: the threads would have
to synchronize explicitly to protect the SQ, and that would hurt
performance.
This patch adds support for timeout to existing io_uring_enter(). To
avoid overloading arguments, it introduces a new parameter structure
which contains sigmask and timeout.
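For illustration only, the rough shape such a combined argument could take
(the struct and field names here are a sketch, not the final uapi):

  struct io_uring_getevents_arg {
      __u64 sigmask;       /* pointer to the signal mask, cast to u64 */
      __u32 sigmask_sz;    /* sizeof(sigset_t) as seen by userspace */
      __u32 pad;
      __u64 ts;            /* pointer to struct __kernel_timespec */
  };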
I have tested workloads with one thread submitting nop requests while the
other reaps the CQEs with a timeout. This shows a 1.8~2x speedup when the
iodepth is 16.
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
[axboe: various cleanups/fixes, and name change to SIG_IS_DATA]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We unconditionally call blk_start_plug() when starting the IO submission,
but we should really only do that if we have more than 1 request to
submit AND we're potentially dealing with block-based storage underneath.
For any other type of request, it's just a waste of time to do so.
Add a ->plug bit to io_op_def and set it for read/write requests. We
could make this more precise and check the file itself as well, but it
doesn't matter that much and would quickly become more expensive.
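A sketch of the idea (the submit-state field names are assumptions):

  struct io_op_def {
      /* ... existing per-opcode bits ... */
      unsigned plug : 1;    /* opcode may hit block-based storage */
  };

  /* in the submission path: plug only for batched, plug-worthy opcodes */
  if (!state->plug_started && state->ios_left > 1 &&
      io_op_defs[req->opcode].plug) {
      blk_start_plug(&state->plug);
      state->plug_started = true;
  }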
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We've got 8 spare bytes in the 2nd cacheline; put ->fixed_file_refs there,
so the inline execution path mostly doesn't touch the 3rd cacheline for
fixed-file requests either.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A singly linked list is enough for keeping linked requests, because we
almost always operate on the head and traverse forward, with the sole
exception of linked timeouts going one hop backwards.
Replace ->link_list with a handmade singly linked list. Also kill
REQ_F_LINK_HEAD in favour of checking a newly added ->list for NULL
directly.
That saves 8B in io_kiocb, is lighter than list fixups, makes better use
of the cache by not touching the previous request (i.e. the last request
of the link) on each list modification, enables further cache-use
optimisation in the following patch, and actually makes traversal easier,
removing a few lines in the end. Also, keeping the invariant in ->list
instead of having REQ_F_LINK_HEAD is less error-prone.
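Roughly, as a sketch (io_do_next() is a hypothetical stand-in for the
per-request work):

  struct io_kiocb {
      /* ... */
      struct io_kiocb *list;    /* next request in the link, or NULL */
  };

  /* "is a link head" is now simply req->list != NULL; traversal is a
   * plain forward walk: */
  struct io_kiocb *nxt;

  for (nxt = req->list; nxt; nxt = nxt->list)
      io_do_next(nxt);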
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for converting to singly linked lists for chaining
requests, make linked timeouts save the requests that they're responsible
for, rather than counting on the doubly linked list for back-referencing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Explicitly save not only a link's head in io_submit_sqe[s]() but the
tail as well. That's in preparation for keeping linked requests in a
singly linked list.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't use a single struct for polls and poll remove requests; they have
totally different layouts.
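For illustration, the two layouts could look roughly like this (fields
assumed from context):

  struct io_poll_iocb {
      struct file *file;
      struct wait_queue_head *head;
      __poll_t events;
      struct wait_queue_entry wait;
  };

  struct io_poll_remove {
      struct file *file;
      u64 addr;    /* user_data of the poll request to remove */
  };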
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that SQPOLL supports non-registered files and grabs the file table,
we can relax the restriction on open/close/accept/connect and allow
them on a ring that is setup with IORING_SETUP_SQPOLL.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The restriction of needing fixed files for SQPOLL is problematic, and
prevents/inhibits several valid use cases. With the referenced
files_struct that we have now, it's trivially supportable.
Treat ->files like we do the mm for the SQPOLL thread - grab a reference
to it (and assign it), and drop it when we're done.
This feature is exposed as IORING_FEAT_SQPOLL_NONFIXED.
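Userspace can probe for it at setup time, e.g. (sketch; io_uring_setup()
is the raw syscall, so it's invoked via syscall(2), with <linux/io_uring.h>
and <sys/syscall.h> included):

  struct io_uring_params p = { .flags = IORING_SETUP_SQPOLL };
  int fd = syscall(__NR_io_uring_setup, 8, &p);

  if (fd >= 0 && (p.features & IORING_FEAT_SQPOLL_NONFIXED))
      /* unregistered files may be used with this SQPOLL ring */;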
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This adds support for the shutdown(2) system call, which is useful for
dealing with sockets.
shutdown(2) may block, so we have to punt it to async context.
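A usage sketch with liburing, assuming a liburing version that provides
io_uring_prep_shutdown() (ring and sockfd set up elsewhere):

  struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

  io_uring_prep_shutdown(sqe, sockfd, SHUT_RDWR);
  io_uring_submit(&ring);
  /* the completion's cqe->res carries the shutdown(2) return value */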
Suggested-by: Norman Maurer <norman.maurer@googlemail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
CAP_SYS_ADMIN is too restrictive for a lot of use cases; allow
CAP_SYS_NICE based on the premise that such users are already allowed
to raise the priority of tasks.
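A sketch of the relaxed check:

  if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_NICE))
      return -EPERM;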
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Mostly regression or stable fodder:
- Disallow async path resolution of /proc/self
- Tighten constraints for segmented async buffered reads
- Fix double completion for a retry error case
- Fix for fixed file life times (Pavel)"
* tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
io_uring: order refnode recycling
io_uring: get an active ref_node from files_data
io_uring: don't double complete failed reissue request
mm: never attempt async page lock if we've transferred data already
io_uring: handle -EOPNOTSUPP on path resolution
proc: don't allow async path resolution of /proc/self components
Don't recycle a refnode until we're done with all requests of the nodes
ejected before it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
An active ref_node can always be found in ctx->files_data; it's much
safer to get it this way instead of poking into files_data->ref_list.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zorro reports that an xfstests test case is failing, and it turns out that
for the reissue path we can potentially issue a double completion on the
request for the failure path. There's an issue around the retry as well,
but for now, at least just make sure that we handle the error path
correctly.
Cc: stable@vger.kernel.org
Fixes: b63534c41e ("io_uring: re-issue block requests that failed because of resources")
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Any attempt to do path resolution on /proc/self from an async worker will
yield -EOPNOTSUPP. We can safely do that resolution from the task itself,
and without blocking, so retry it from there.
Ideally io_uring would know this upfront and not have to go through the
worker thread to find out, but that doesn't currently seem feasible.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull fs freeze fix and cleanups from Darrick Wong:
"A single vfs fix for 5.10, along with two subsequent cleanups.
A very long time ago, a hack was added to the vfs fs freeze protection
code to work around lockdep complaints about XFS, which would try to
run a transaction (which requires intwrite protection) to finalize an
xfs freeze (by which time the vfs had already taken intwrite).
Fast forward a few years, and XFS fixed the recursive intwrite problem
on its own, and the hack became unnecessary. Fast forward almost a
decade, and latent bugs in the code converting this hack from freeze
flags to freeze locks combine with lockdep bugs to make this reproduce
frequently enough to notice page faults racing with freeze.
Since the hack is unnecessary and causes thread race errors, just get
rid of it completely. Making this kind of vfs change midway through a
cycle makes me nervous, but a large enough number of the usual
VFS/ext4/XFS/btrfs suspects have said this looks good and solves a
real problem vector.
And once that removal is done, __sb_start_write is now simple enough
that it becomes possible to refactor the function into smaller,
simpler static inline helpers in linux/fs.h. The cleanup is
straightforward.
Summary:
- Finally remove the "convert to trylock" weirdness in the fs freezer
code. It was necessary 10 years ago to deal with nested
transactions in XFS, but we've long since removed that; and now
this is causing subtle race conditions when lockdep goes offline
and sb_start_* aren't prepared to retry a trylock failure.
- Minor cleanups of the sb_start_* fs freeze helpers"
* tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: move __sb_{start,end}_write* to fs.h
vfs: separate __sb_start_write into blocking and non-blocking helpers
vfs: remove lockdep bogosity in __sb_start_write
If an application specifies IORING_SETUP_CQSIZE to set the CQ ring size
to a specific size, we ensure that the CQ size is at least that of the
SQ ring size. But in doing so, we compare the SQ size, which has already
been rounded up to a power of two, against the as-yet unrounded CQ size.
This means that if an application passes in non-power-of-two sizes, we can return -EINVAL when
the final value would've been fine. As an example, an application passing
in 100/100 for sq/cq size should end up with 128 for both. But since we
round the SQ size first, we compare the CQ size of 100 to 128, and return
-EINVAL as that is too small.
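A sketch of the fix: round the requested CQ size up to a power of two
before the comparison (a full version would also cap it against
IORING_MAX_CQ_ENTRIES):

  p->cq_entries = roundup_pow_of_two(p->cq_entries);
  if (p->cq_entries < p->sq_entries)
      return -EINVAL;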
Cc: stable@vger.kernel.org
Fixes: 33a107f0a1 ("io_uring: allow application controlled CQ ring size")
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Break this function into two helpers so that it's obvious that the
trylock versions return a value that must be checked, and the blocking
versions don't require that. While we're at it, clean up the return
type mismatch.
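A sketch of the split (mirroring the freeze-level arithmetic the kernel
uses):

  static inline void __sb_start_write(struct super_block *sb, int level)
  {
      percpu_down_read(sb->s_writers.rw_sem + level - 1);
  }

  static inline bool __sb_start_write_trylock(struct super_block *sb, int level)
  {
      return percpu_down_read_trylock(sb->s_writers.rw_sem + level - 1);
  }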
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
We can't just go over linked requests because it may race with linked
timeouts. Take ctx->completion_lock in that case.
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Previous commit changed how we index the registered credentials, but
neglected to update one spot that is used when the personalities are
iterated through ->show_fdinfo(). Ensure we use the right struct type
for the iteration.
Reported-by: syzbot+a6d494688cdb797bdfce@syzkaller.appspotmail.com
Fixes: 1e6fa5216a ("io_uring: COW io_identity on mismatch")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a long-standing request of one task is locking up execution of
deferred requests, and the defer list contains requests of another task
(all files-less), then __io_uring_task_cancel() run by that other task
will sleep until the long-standing request completes, and that may take
long.
E.g.
tsk1: req1/read(empty_pipe) -> tsk2: req(DRAIN)
Then __io_uring_task_cancel(tsk2) waits for req1 completion.
It seems we can even manufacture a complicated case, with many tasks
sharing many rings, that locks them up forever.
Cancel deferred requests for __io_uring_task_cancel() as well.
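A sketch of the extra cancellation pass (the io_defer_entry layout and
the completion helper are assumptions):

  struct io_defer_entry *de, *tmp;

  list_for_each_entry_safe(de, tmp, &ctx->defer_list, list) {
      if (io_match_task(de->req, task, NULL)) {
          list_del_init(&de->list);
          io_req_complete(de->req, -ECANCELED);
      }
  }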
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current io_match_files() check in io_cqring_overflow_flush() is
useless because requests drop ->files before going to the overflow list;
however, requests linked to them do not, and we don't check those.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We can't bundle this into one operation, as the identity may not have
originated from the tctx to begin with. Drop one ref for each of them
separately, if they don't match the static assignment. If we don't, then
if the identity is a lookup from registered credentials, we could be
freeing that identity as we're dropping a reference assuming it came from
the tctx. syzbot reports this as a use-after-free, as the identity is
still reachable via IDR lookup:
==================================================================
BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
BUG: KASAN: use-after-free in atomic_fetch_add_relaxed include/asm-generic/atomic-instrumented.h:142 [inline]
BUG: KASAN: use-after-free in __refcount_add include/linux/refcount.h:193 [inline]
BUG: KASAN: use-after-free in __refcount_inc include/linux/refcount.h:250 [inline]
BUG: KASAN: use-after-free in refcount_inc include/linux/refcount.h:267 [inline]
BUG: KASAN: use-after-free in io_init_req fs/io_uring.c:6700 [inline]
BUG: KASAN: use-after-free in io_submit_sqes+0x15a9/0x25f0 fs/io_uring.c:6774
Write of size 4 at addr ffff888011e08e48 by task syz-executor165/8487
CPU: 1 PID: 8487 Comm: syz-executor165 Not tainted 5.10.0-rc1-next-20201102-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xae/0x4c8 mm/kasan/report.c:385
__kasan_report mm/kasan/report.c:545 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
atomic_fetch_add_relaxed include/asm-generic/atomic-instrumented.h:142 [inline]
__refcount_add include/linux/refcount.h:193 [inline]
__refcount_inc include/linux/refcount.h:250 [inline]
refcount_inc include/linux/refcount.h:267 [inline]
io_init_req fs/io_uring.c:6700 [inline]
io_submit_sqes+0x15a9/0x25f0 fs/io_uring.c:6774
__do_sys_io_uring_enter+0xc8e/0x1b50 fs/io_uring.c:9159
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x440e19
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 0f fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff644ff178 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000440e19
RDX: 0000000000000000 RSI: 000000000000450c RDI: 0000000000000003
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000022b4850
R13: 0000000000000010 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 8487:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:461
kmalloc include/linux/slab.h:552 [inline]
io_register_personality fs/io_uring.c:9638 [inline]
__io_uring_register fs/io_uring.c:9874 [inline]
__do_sys_io_uring_register+0x10f0/0x40a0 fs/io_uring.c:9924
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 8487:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0x102/0x140 mm/kasan/common.c:422
slab_free_hook mm/slub.c:1544 [inline]
slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1577
slab_free mm/slub.c:3140 [inline]
kfree+0xdb/0x360 mm/slub.c:4122
io_identity_cow fs/io_uring.c:1380 [inline]
io_prep_async_work+0x903/0xbc0 fs/io_uring.c:1492
io_prep_async_link fs/io_uring.c:1505 [inline]
io_req_defer fs/io_uring.c:5999 [inline]
io_queue_sqe+0x212/0xed0 fs/io_uring.c:6448
io_submit_sqe fs/io_uring.c:6542 [inline]
io_submit_sqes+0x14f6/0x25f0 fs/io_uring.c:6784
__do_sys_io_uring_enter+0xc8e/0x1b50 fs/io_uring.c:9159
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff888011e08e00
which belongs to the cache kmalloc-96 of size 96
The buggy address is located 72 bytes inside of
96-byte region [ffff888011e08e00, ffff888011e08e60)
The buggy address belongs to the page:
page:00000000a7104751 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11e08
flags: 0xfff00000000200(slab)
raw: 00fff00000000200 ffffea00004f8540 0000001f00000002 ffff888010041780
raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888011e08d00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
ffff888011e08d80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
> ffff888011e08e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
^
ffff888011e08e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
ffff888011e08f00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
==================================================================
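The shape of the fix, sketched (names assumed from context):

  static void io_put_identity(struct io_uring_task *tctx, struct io_kiocb *req)
  {
      /* the identity embedded in the tctx is static; never free it */
      if (req->work.identity == &tctx->__identity)
          return;
      if (refcount_dec_and_test(&req->work.identity->count))
          kfree(req->work.identity);
  }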
Reported-by: syzbot+625ce3bb7835b63f7f3d@syzkaller.appspotmail.com
Fixes: 1e6fa5216a ("io_uring: COW io_identity on mismatch")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ensure we get a valid view of the task mm, by using task_lock() when
attempting to grab the original task mm.
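A sketch of the idea (ctx->sqo_task is assumed from the Fixes commit):

  struct mm_struct *mm;

  task_lock(ctx->sqo_task);
  mm = ctx->sqo_task->mm;
  if (mm && !mmget_not_zero(mm))    /* the mm may already be going away */
      mm = NULL;
  task_unlock(ctx->sqo_task);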
Reported-by: syzbot+b57abf7ee60829090495@syzkaller.appspotmail.com
Fixes: 2aede0e417 ("io_uring: stash ctx task reference for SQPOLL")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Track if a given task io_uring context contains SQPOLL instances, so we
can iterate those for cancelation (and request counts). This ensures that
we properly wait on SQPOLL contexts, and find everything that needs
canceling.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'io_uring-5.10-2020-10-30' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Fixes for linked timeouts (Pavel)
- Set IO_WQ_WORK_CONCURRENT early for async offload (Pavel)
- Two minor simplifications that make the code easier to read and
follow (Pavel)
* tag 'io_uring-5.10-2020-10-30' of git://git.kernel.dk/linux-block:
io_uring: use type appropriate io_kiocb handler for double poll
io_uring: simplify __io_queue_sqe()
io_uring: simplify nxt propagation in io_queue_sqe
io_uring: don't miss setting IO_WQ_WORK_CONCURRENT
io_uring: don't defer put of cancelled ltimeout
io_uring: always clear LINK_TIMEOUT after cancel
io_uring: don't adjust LINK_HEAD in cancel ltimeout
io_uring: remove opcode check on ltimeout kill
io_poll_double_wake() is called for both request types: pure poll
requests and internal polls. This means that we should be using the
right handler based on the request type. Use the one that the original
caller already assigned for the waitqueue handling; that will always
match the correct type.
Cc: stable@vger.kernel.org # v5.8+
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- fsize was missed in previous unification of work flags
- Few fixes cleaning up the flags unification creds cases (Pavel)
- Fix NUMA affinities for completely unplugged/replugged node for io-wq
- Two fallout fixes from the set_fs changes. One local to io_uring, one
for the splice entry point that io_uring uses.
- Linked timeout fixes (Pavel)
- Removal of ->flush() ->files work-around that we don't need anymore
with referenced files (Pavel)
- Various cleanups (Pavel)
* tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
splice: change exported internal do_splice() helper to take kernel offset
io_uring: make loop_rw_iter() use original user supplied pointers
io_uring: remove req cancel in ->flush()
io-wq: re-set NUMA node affinities if CPUs come online
io_uring: don't reuse linked_timeout
io_uring: unify fsize with def->work_flags
io_uring: fix racy REQ_F_LINK_TIMEOUT clearing
io_uring: do poll's hash_node init in common code
io_uring: inline io_poll_task_handler()
io_uring: remove extra ->file check in poll prep
io_uring: make cached_cq_overflow non atomic_t
io_uring: inline io_fail_links()
io_uring: kill ref get/drop in personality init
io_uring: flags-based creds init in queue
Restructure __io_queue_sqe() so it follows simple if/else if/else
control flow. It's more readable and removes extra goto/labels.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't overuse gotos; complex control flow doesn't make compilers happy
and makes the code harder to read.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Set IO_WQ_WORK_CONCURRENT for all REQ_F_FORCE_ASYNC requests; doing it
in one place also looks better.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Inline io_link_cancel_timeout() and __io_kill_linked_timeout() into
io_kill_linked_timeout(). That makes it easy to move the put of a
cancelled linked timeout out of completion_lock and to not defer it.
It's also much more readable when not scattered across three different
functions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move REQ_F_LINK_TIMEOUT clearing out of __io_kill_linked_timeout()
because it might return early and leave the flag set. It's not a
problem, but may be confusing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
An armed linked timeout can never be a head of a link, so we don't need
to clear REQ_F_LINK_HEAD for it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
__io_kill_linked_timeout() already checks for REQ_F_LTIMEOUT_ACTIVE and
it's set only for linked timeouts. No need to verify next request's
opcode.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block
Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:
- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().
- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"
* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
We jump through hoops for fixed buffers, where we first map these to a
bvec, then kmap() the bvec to obtain the pointer we copy to/from. This
was always a bit ugly, and with the set_fs changes, it ends up being
practically problematic as well.
There's no need to jump through these hoops; just use the original user
pointers and length for the non-iter-based read/write.
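A sketch of the fallback path using the stashed user address (the
req->rw field names are assumptions):

  char __user *buf = u64_to_user_ptr(req->rw.addr);
  ssize_t nr = file->f_op->read(file, buf, req->rw.len, &kiocb->ki_pos);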
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Every close(io_uring) causes cancellation of all inflight requests
carrying ->files. That's not nice, but it was necessary up until
recently. Now task->files removal is handled in the core code, so that
part of flush can be removed.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Clear linked_timeout for the next request in __io_queue_sqe(), so we
won't queue it up unnecessarily when it's going to be punted.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Cc: stable@vger.kernel.org # v5.9
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This one was missed in the earlier conversion; it should be included like
any of the other IO identity flags. Make sure we restore to RLIM_INFINITY
when dropping the personality again.
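A sketch of the apply/restore pair (flag and field names assumed):

  if (work->flags & IO_WQ_WORK_FSIZE)
      current->signal->rlim[RLIMIT_FSIZE].rlim_cur = work->identity->fsize;
  /* ... and when dropping the personality: */
  current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;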
Fixes: 98447d65b4 ("io_uring: move io identity items into separate struct")
Signed-off-by: Jens Axboe <axboe@kernel.dk>