linux/fs
Naohiro Aota 9ce7466f37 btrfs: ensure pages are unlocked on cow_file_range() failure
There is a hung_task report on zoned btrfs like below.

https://github.com/naota/linux/issues/59

  [726.328648] INFO: task rocksdb:high0:11085 blocked for more than 241 seconds.
  [726.329839]       Not tainted 5.16.0-rc1+ #1
  [726.330484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [726.331603] task:rocksdb:high0   state:D stack:    0 pid:11085 ppid: 11082 flags:0x00000000
  [726.331608] Call Trace:
  [726.331611]  <TASK>
  [726.331614]  __schedule+0x2e5/0x9d0
  [726.331622]  schedule+0x58/0xd0
  [726.331626]  io_schedule+0x3f/0x70
  [726.331629]  __folio_lock+0x125/0x200
  [726.331634]  ? find_get_entries+0x1bc/0x240
  [726.331638]  ? filemap_invalidate_unlock_two+0x40/0x40
  [726.331642]  truncate_inode_pages_range+0x5b2/0x770
  [726.331649]  truncate_inode_pages_final+0x44/0x50
  [726.331653]  btrfs_evict_inode+0x67/0x480
  [726.331658]  evict+0xd0/0x180
  [726.331661]  iput+0x13f/0x200
  [726.331664]  do_unlinkat+0x1c0/0x2b0
  [726.331668]  __x64_sys_unlink+0x23/0x30
  [726.331670]  do_syscall_64+0x3b/0xc0
  [726.331674]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [726.331677] RIP: 0033:0x7fb9490a171b
  [726.331681] RSP: 002b:00007fb943ffac68 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
  [726.331684] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb9490a171b
  [726.331686] RDX: 00007fb943ffb040 RSI: 000055a6bbe6ec20 RDI: 00007fb94400d300
  [726.331687] RBP: 00007fb943ffad00 R08: 0000000000000000 R09: 0000000000000000
  [726.331688] R10: 0000000000000031 R11: 0000000000000246 R12: 00007fb943ffb000
  [726.331690] R13: 00007fb943ffb040 R14: 0000000000000000 R15: 00007fb943ffd260
  [726.331693]  </TASK>

While we debug the issue, we found running fstests generic/551 on 5GB
non-zoned null_blk device in the emulated zoned mode also had a
similar hung issue.

Also, we can reproduce the same symptom with an error injected
cow_file_range() setup.

The hang occurs when cow_file_range() fails in the middle of
allocation. cow_file_range() called from do_allocation_zoned() can
split the give region ([start, end]) for allocation depending on
current block group usages. When btrfs can allocate bytes for one part
of the split regions but fails for the other region (e.g. because of
-ENOSPC), we return the error leaving the pages in the succeeded regions
locked. Technically, this occurs only when @unlock == 0. Otherwise, we
unlock the pages in an allocated region after creating an ordered
extent.

Considering the callers of cow_file_range(unlock=0) won't write out
the pages, we can unlock the pages on error exit from
cow_file_range(). So, we can ensure all the pages except @locked_page
are unlocked on error case.

In summary, cow_file_range now behaves like this:

- page_started == 1 (return value)
  - All the pages are unlocked. IO is started.
- unlock == 1
  - All the pages except @locked_page are unlocked in any case
- unlock == 0
  - On success, all the pages are locked for writing out them
  - On failure, all the pages except @locked_page are unlocked

Fixes: 42c0110009 ("btrfs: zoned: introduce dedicated data write path for zoned filesystems")
CC: stable@vger.kernel.org # 5.12+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-25 17:45:38 +02:00
..
9p 9p: fix EBADF errors in cached mode 2022-06-17 06:03:30 +09:00
adfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
affs affs: Convert affs to read_folio 2022-05-09 16:21:44 -04:00
afs netfs: do not unlock and put the folio twice 2022-07-14 10:10:12 +02:00
autofs
befs befs: Convert befs to read_folio 2022-05-09 16:21:45 -04:00
bfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
btrfs btrfs: ensure pages are unlocked on cow_file_range() failure 2022-07-25 17:45:38 +02:00
cachefiles cachefiles: narrow the scope of flushed requests when releasing fd 2022-07-05 16:12:21 +01:00
ceph netfs: do not unlock and put the folio twice 2022-07-14 10:10:12 +02:00
cifs smb3: workaround negprot bug in some Samba servers 2022-07-13 19:59:47 -05:00
coda coda: Convert coda to read_folio 2022-05-09 16:21:45 -04:00
configfs configfs: fix a race in configfs_{,un}register_subsystem() 2022-02-22 18:30:28 +01:00
cramfs cramfs: Convert cramfs to read_folio 2022-05-09 16:21:45 -04:00
crypto fscrypt: add new helper functions for test_dummy_encryption 2022-05-09 16:18:54 -07:00
debugfs debugfs: Document that debugfs_create functions need not be error checked 2022-02-25 11:56:13 +01:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-01-24 14:17:02 +01:00
dlm dlm: use kref_put_lock in __put_lkb 2022-05-02 11:23:49 -05:00
ecryptfs ecryptfs: Convert ecryptfs to read_folio 2022-05-09 16:21:45 -04:00
efivarfs
efs efs: Convert efs symlinks to read_folio 2022-05-09 16:21:45 -04:00
erofs Changes since last update: 2022-06-01 11:54:29 -07:00
exfat exfat: use updated exfat_chain directly during renaming 2022-06-09 21:26:32 +09:00
exportfs exportfs: support idmapped mounts 2022-04-28 16:31:10 +02:00
ext2 ext2: fix fs corruption when trying to remove a non-empty directory with IO error 2022-06-16 10:55:45 +02:00
ext4 ext4: fix a doubled word "need" in a comment 2022-06-18 19:36:20 -04:00
f2fs f2fs: do not count ENOENT for error case 2022-06-21 08:29:56 -07:00
fat Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
freevxfs SPDX changes for 5.19-rc1 2022-06-03 10:34:34 -07:00
fscache fscache: Fix invalidation/lookup race 2022-07-05 16:12:55 +01:00
fuse libnvdimm for 5.19 2022-05-27 15:49:30 -07:00
gfs2 Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
hfs fs: Change try_to_free_buffers() to take a folio 2022-05-09 23:12:34 -04:00
hfsplus fs: Change try_to_free_buffers() to take a folio 2022-05-09 23:12:34 -04:00
hostfs hostfs: Convert hostfs to read_folio 2022-05-09 16:21:45 -04:00
hpfs hpfs: Convert symlinks to read_folio 2022-05-09 16:21:45 -04:00
hugetlbfs hugetlbfs: zero partial pages during fallocate hole punch 2022-06-16 19:11:32 -07:00
iomap Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
isofs isofs: Convert symlinks and zisofs to read_folio 2022-05-09 16:21:45 -04:00
jbd2 fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment 2022-06-16 10:36:09 -04:00
jffs2 This pull request contains fixes for JFFS2, UBI and UBIFS 2022-06-03 14:42:24 -07:00
jfs JFS: One bug fix and some code cleanup 2022-05-27 15:59:21 -07:00
kernfs kernfs: Separate kernfs_pr_cont_buf and rename_lock. 2022-05-19 19:37:06 +02:00
ksmbd vfs: fix copy_file_range() regression in cross-fs copies 2022-06-30 15:16:38 -07:00
lockd lockd: fix nlm_close_files 2022-07-11 15:49:56 -04:00
minix fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
netfs netfs: do not unlock and put the folio twice 2022-07-14 10:10:12 +02:00
nfs NFSv4: Add an fattr allocation to _nfs4_discover_trunking() 2022-06-30 16:13:00 -04:00
nfs_common
nfsd Notable regression fixes: 2022-07-14 12:29:43 -07:00
nilfs2 nilfs2: fix incorrect masking of permission flags for symlinks 2022-07-03 15:42:33 -07:00
nls
notify fanotify: refine the validation checks on non-dir inode mask 2022-06-28 11:18:13 +02:00
ntfs Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
ntfs3 Ntfs3 for 5.19 2022-06-03 16:57:16 -07:00
ocfs2 Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
omfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
openpromfs fs: allocate inode by using alloc_inode_sb() 2022-03-22 15:57:03 -07:00
orangefs orangefs: Convert to free_folio 2022-05-09 23:12:53 -04:00
overlayfs ovl: turn of SB_POSIXACL with idmapped layers temporarily 2022-07-08 15:48:31 +02:00
proc Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
pstore pstore: Don't use semaphores in always-atomic-context code 2022-03-15 11:08:23 -07:00
qnx4 fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
qnx6 fs: Convert mpage_readpage to mpage_read_folio 2022-05-09 16:21:44 -04:00
quota quota: Prevent memory allocation recursion while holding dq_lock 2022-06-06 10:08:10 +02:00
ramfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
reiserfs fs: Change try_to_free_buffers() to take a folio 2022-05-09 23:12:34 -04:00
romfs romfs: Convert romfs to read_folio 2022-05-09 16:21:46 -04:00
smbfs_common Add various fsctl structs 2022-05-23 20:24:12 -05:00
squashfs Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
sysfs kobject: kobj_type: remove default_attrs 2022-04-05 15:39:19 +02:00
sysv Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
tracefs tracefs: Fix syntax errors in comments 2022-06-17 19:01:28 -04:00
ubifs This pull request contains fixes for JFFS2, UBI and UBIFS 2022-06-03 14:42:24 -07:00
udf Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
ufs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
unicode kbuild: unify cmd_copy and cmd_shipped 2022-02-14 10:37:32 +09:00
vboxsf vboxsf: Convert vboxsf to read_folio 2022-05-09 16:21:46 -04:00
verity Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
xfs xfs: prevent a UAF when log IO errors race with unmount 2022-07-01 09:09:52 -07:00
zonefs zonefs: fix zonefs_iomap_begin() for reads 2022-06-08 19:13:55 +09:00
aio.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2022-04-01 19:57:03 -07:00
anon_inodes.c
attr.c fs: account for group membership 2022-06-14 12:18:47 +02:00
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c coredump: Snapshot the vmas in do_coredump 2022-03-08 12:55:29 -06:00
binfmt_elf_test.c binfmt_elf: Introduce KUnit test 2022-03-03 20:38:56 -08:00
binfmt_elf.c revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" 2022-04-15 14:49:56 -07:00
binfmt_flat.c binfmt_flat: Remove shared library support 2022-04-22 10:57:18 -07:00
binfmt_misc.c Fix regression due to "fs: move binfmt_misc sysctl to its own file" 2022-02-09 09:50:02 -08:00
binfmt_script.c
buffer.c fs: Convert drop_buffers() to use a folio 2022-05-09 23:12:34 -04:00
char_dev.c
compat_binfmt_elf.c binfmt_elf: Introduce KUnit test 2022-03-03 20:38:56 -08:00
coredump.c ptrace: Cleanups for v5.18 2022-03-28 17:29:53 -07:00
d_path.c d_path: fix Kernel doc validator complaining 2021-11-06 13:30:32 -07:00
dax.c libnvdimm for 5.19 2022-05-27 15:49:30 -07:00
dcache.c mm: dcache: use kmem_cache_alloc_lru() to allocate dentry 2022-03-22 15:57:03 -07:00
direct-io.c direct-io: remove random prefetches 2022-04-17 19:50:02 -06:00
drop_caches.c
eventfd.c
eventpoll.c eventpoll: simplify sysctl declaration with register_sysctl() 2022-01-22 08:33:35 +02:00
exec.c fix race between exit_itimers() and /proc/pid/timers 2022-07-11 09:52:59 -07:00
fcntl.c VFS: add FMODE_CAN_ODIRECT file flag 2022-05-09 18:20:49 -07:00
fhandle.c
file_table.c Descriptor handling cleanups 2022-06-04 18:52:00 -07:00
file.c fix the breakage in close_fd_get_file() calling conventions change 2022-06-05 15:03:03 -04:00
filesystems.c
fs_context.c vfs: fs_context: fix up param length parsing in legacy_parse_param 2022-01-18 09:23:19 +02:00
fs_parser.c fs_parse: allow parameter value to be empty 2021-12-09 14:09:36 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c writeback: Fix inode->i_io_list not be protected by inode->i_lock error 2022-06-06 09:54:30 +02:00
fsopen.c uninline may_mount() and don't opencode it in fspick(2)/fsopen(2) 2022-05-19 23:25:10 -04:00
init.c
inode.c writeback: Fix inode->i_io_list not be protected by inode->i_lock error 2022-06-06 09:54:30 +02:00
internal.h Cleanups (and one fix) around struct mount handling. 2022-06-04 19:00:05 -07:00
io_uring.c io_uring: do not recycle buffer in READV 2022-07-21 08:31:31 -06:00
io-wq.c io-wq: use __set_notify_signal() to wake workers 2022-04-30 08:39:54 -06:00
io-wq.h io_uring: add support for IORING_ASYNC_CANCEL_ALL 2022-04-24 18:18:18 -06:00
ioctl.c Fixes for 5.18-rc1: 2022-04-01 19:35:56 -07:00
Kconfig mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* 2022-04-28 23:16:15 -07:00
Kconfig.binfmt m68knommu: changes for linux 5.19 2022-05-30 10:56:18 -07:00
kernel_read_file.c
libfs.c fs: Convert simple_readpage to simple_read_folio 2022-05-09 16:21:44 -04:00
locks.c fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict 2022-05-19 12:25:39 -04:00
Makefile Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
mbcache.c
mount.h
mpage.c fs: Change try_to_free_buffers() to take a folio 2022-05-09 23:12:34 -04:00
namei.c Several cleanups in fs/namei.c. 2022-06-04 19:07:15 -07:00
namespace.c Cleanups (and one fix) around struct mount handling. 2022-06-04 19:00:05 -07:00
no-block.c
nsfs.c
open.c RISC-V Patches for the 5.19 Merge Window, Part 1 2022-05-31 14:10:54 -07:00
pipe.c Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
pnode.c
pnode.h
posix_acl.c fs: fix acl translation 2022-04-19 10:19:02 -07:00
proc_namespace.c fs: add is_idmapped_mnt() helper 2021-12-03 18:44:06 +01:00
read_write.c vfs: fix copy_file_range() regression in cross-fs copies 2022-06-30 15:16:38 -07:00
readdir.c
remap_range.c Revert "vf/remap: return the amount of bytes actually deduplicated" 2022-07-14 15:35:24 -07:00
select.c select: Fix indefinitely sleeping task in poll_schedule_timeout() 2022-01-11 09:03:05 -08:00
seq_file.c rxrpc: Fix locking issue 2022-05-22 21:03:01 +01:00
signalfd.c Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
splice.c mm: Convert remove_mapping() to take a folio 2022-03-21 12:59:01 -04:00
stack.c
stat.c RISC-V Patches for the 5.19 Merge Window, Part 1 2022-05-31 14:10:54 -07:00
statfs.c
super.c block: add a bdev_stable_writes helper 2022-04-17 19:49:59 -06:00
sync.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
sysctls.c fs: move namespace sysctls and declare fs base directory 2022-01-22 08:33:36 +02:00
timerfd.c
userfaultfd.c mm/uffd: enable write protection for shmem & hugetlbfs 2022-05-13 07:20:11 -07:00
utimes.c
xattr.c fs: split off do_getxattr from getxattr 2022-04-24 18:18:37 -06:00