linux/io_uring
Jens Axboe 1d60d74e85 io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
When io_uring starts a write, it'll call kiocb_start_write() to bump the
super block rwsem, preventing any freezes from happening while that
write is in-flight. The freeze side will grab that rwsem for writing,
excluding any new writers from happening and waiting for existing writes
to finish. But io_uring unconditionally uses kiocb_start_write(), which
will block if someone is currently attempting to freeze the mount point.
This causes a deadlock where freeze is waiting for previous writes to
complete, but the previous writes cannot complete, as the task that is
supposed to complete them is blocked waiting on starting a new write.
This results in the following stuck trace showing that dependency with
the write blocked starting a new write:

task:fio             state:D stack:0     pid:886   tgid:886   ppid:876
Call trace:
 __switch_to+0x1d8/0x348
 __schedule+0x8e8/0x2248
 schedule+0x110/0x3f0
 percpu_rwsem_wait+0x1e8/0x3f8
 __percpu_down_read+0xe8/0x500
 io_write+0xbb8/0xff8
 io_issue_sqe+0x10c/0x1020
 io_submit_sqes+0x614/0x2110
 __arm64_sys_io_uring_enter+0x524/0x1038
 invoke_syscall+0x74/0x268
 el0_svc_common.constprop.0+0x160/0x238
 do_el0_svc+0x44/0x60
 el0_svc+0x44/0xb0
 el0t_64_sync_handler+0x118/0x128
 el0t_64_sync+0x168/0x170
INFO: task fsfreeze:7364 blocked for more than 15 seconds.
      Not tainted 6.12.0-rc5-00063-g76aaf945701c #7963

with the attempting freezer stuck trying to grab the rwsem:

task:fsfreeze        state:D stack:0     pid:7364  tgid:7364  ppid:995
Call trace:
 __switch_to+0x1d8/0x348
 __schedule+0x8e8/0x2248
 schedule+0x110/0x3f0
 percpu_down_write+0x2b0/0x680
 freeze_super+0x248/0x8a8
 do_vfs_ioctl+0x149c/0x1b18
 __arm64_sys_ioctl+0xd0/0x1a0
 invoke_syscall+0x74/0x268
 el0_svc_common.constprop.0+0x160/0x238
 do_el0_svc+0x44/0x60
 el0_svc+0x44/0xb0
 el0t_64_sync_handler+0x118/0x128
 el0t_64_sync+0x168/0x170

Fix this by having the io_uring side honor IOCB_NOWAIT, and only attempt a
blocking grab of the super block rwsem if it isn't set. For normal issue
where IOCB_NOWAIT would always be set, this returns -EAGAIN which will
have io_uring core issue a blocking attempt of the write. That will in
turn also get completions run, ensuring forward progress.

Since freezing requires CAP_SYS_ADMIN in the first place, this isn't
something that can be triggered by a regular user.

Cc: stable@vger.kernel.org # 5.10+
Reported-by: Peter Mann <peter.mann@sh.cz>
Link: https://lore.kernel.org/io-uring/38c94aec-81c9-4f62-b44e-1d87f5597644@sh.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-31 08:21:02 -06:00
..
advise.c io_uring/advise: support 64-bit lengths 2024-06-16 14:54:55 -06:00
advise.h
alloc_cache.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
cancel.c io_uring: fix warnings on shadow variables 2024-04-15 08:10:26 -06:00
cancel.h io_uring: fix cancellation overwriting req->flags 2024-06-13 19:25:28 -06:00
epoll.c
epoll.h
eventfd.c io_uring/eventfd: move refs to refcount_t 2024-09-08 16:43:57 -06:00
eventfd.h io_uring/eventfd: move eventfd handling to separate file 2024-06-16 14:54:55 -06:00
fdinfo.c io_uring/rsrc: change ubuf->ubuf_end to length tracking 2024-09-15 09:15:22 -06:00
fdinfo.h
filetable.c io_uring/filetable: don't unnecessarily clear/reset bitmap 2024-05-08 08:27:45 -06:00
filetable.h io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-08 13:27:03 -07:00
fs.c
fs.h
futex.c io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
futex.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
io_uring.c io_uring: fix casts to io_req_flags_t 2024-09-24 13:31:04 -06:00
io_uring.h io_uring/sqpoll: ensure task state is TASK_RUNNING when running task_work 2024-10-17 08:38:04 -06:00
io-wq.c io_uring/io-wq: inherit cpuset of cgroup in io worker 2024-09-11 07:27:56 -06:00
io-wq.h io_uring/io-wq: make io_wq_work flags atomic 2024-06-16 14:54:55 -06:00
kbuf.c for-6.12/io_uring-20240913 2024-09-16 13:29:00 +02:00
kbuf.h io_uring/kbuf: add support for incremental buffer consumption 2024-08-29 08:44:58 -06:00
Makefile io_uring: add GCOV_PROFILE_URING Kconfig option 2024-08-30 10:52:02 -06:00
memmap.c io_uring: don't attempt to mmap larger than what the user asks for 2024-05-29 09:53:14 -06:00
memmap.h io_uring: move mapping/allocation helpers to a separate file 2024-04-15 08:10:26 -06:00
msg_ring.c io_uring/msg_ring: fix uninitialized use of target_req->flags 2024-07-25 08:41:35 -06:00
msg_ring.h io_uring/msg_ring: add an alloc cache for io_kiocb entries 2024-06-24 08:39:55 -06:00
napi.c io_uring: user registered clockid for wait timeouts 2024-08-25 08:27:01 -06:00
napi.h io_uring/napi: postpone napi timeout adjustment 2024-08-25 08:27:01 -06:00
net.c io_uring/net: harden multishot termination case for recv 2024-09-30 08:26:59 -06:00
net.h io_uring: Introduce IORING_OP_LISTEN 2024-06-19 07:57:21 -06:00
nop.c io_uring: support to inject result for NOP 2024-05-10 06:09:45 -06:00
nop.h
notif.c io_uring/notif: disable LAZY_WAKE for linked notifs 2024-04-30 13:06:27 -06:00
notif.h io_uring/notif: implement notification stacking 2024-04-22 19:31:18 -06:00
opdef.c io_uring: Fix probe of disabled operations 2024-06-19 08:58:00 -06:00
opdef.h io_uring: Fix probe of disabled operations 2024-06-19 08:58:00 -06:00
openclose.c io_uring: enable audit and restrict cred override for IORING_OP_FIXED_FD_INSTALL 2024-01-23 15:25:14 -07:00
openclose.h io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL 2023-12-12 07:42:57 -07:00
poll.c io_uring: keep multishot request NAPI timeout current 2024-07-30 06:18:58 -06:00
poll.h io_uring/poll: shrink alloc cache size to 32 2024-04-15 08:10:25 -06:00
refs.h io_uring: kill dead code in io_req_complete_post 2024-04-15 08:10:26 -06:00
register.c io_uring: clean up a type in io_uring_register_get_file() 2024-09-16 12:04:10 -06:00
register.h io_uring: clean up a type in io_uring_register_get_file() 2024-09-16 12:04:10 -06:00
rsrc.c io_uring/rsrc: ignore dummy_ubuf for buffer cloning 2024-10-16 07:09:25 -06:00
rsrc.h io_uring/rsrc: change ubuf->ubuf_end to length tracking 2024-09-15 09:15:22 -06:00
rw.c io_uring/rw: fix missing NOWAIT check for O_DIRECT start write 2024-10-31 08:21:02 -06:00
rw.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
slist.h
splice.c splice: return type ssize_t from all helpers 2023-12-12 16:19:59 +01:00
splice.h
sqpoll.c for-6.12/io_uring-20240922 2024-09-24 11:11:38 -07:00
sqpoll.h io_uring/sqpoll: statistics of the true utilization of sq threads 2024-03-01 06:28:19 -07:00
statx.c vfs: retire user_path_at_empty and drop empty arg from getname_flags 2024-06-05 17:03:57 +02:00
statx.h
sync.c
sync.h
tctx.c
tctx.h
timeout.c io_uring: fix io_match_task must_hold 2024-07-24 08:01:49 -06:00
timeout.h
truncate.c io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
truncate.h io_uring: add support for ftruncate 2024-02-09 09:04:39 -07:00
uring_cmd.c io_uring/cmd: expose iowq to cmds 2024-09-11 10:44:10 -06:00
uring_cmd.h io_uring/alloc_cache: switch to array based caching 2024-04-15 08:10:25 -06:00
waitid.c io_uring: remove struct io_tw_state::locked 2024-04-15 08:10:24 -06:00
waitid.h
xattr.c vfs: retire user_path_at_empty and drop empty arg from getname_flags 2024-06-05 17:03:57 +02:00
xattr.h