linux/fs/btrfs
Nikolay Borisov 4d14c5cde5 btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070eac ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277a49 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  #7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  #8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  #9  btrfs_update_inode at ffffffffc0385ba8
 #10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTIION OPENED
 #11  touch_atime at ffffffffa4cf0000
 #12  generic_file_read_iter at ffffffffa4c1f123
 #13  new_sync_read at ffffffffa4ccdc8a
 #14  vfs_read at ffffffffa4cd0849
 #15  ksys_read at ffffffffa4cd0bd1
 #16  do_syscall_64 at ffffffffa4a052eb
 #17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This will cause an asynchronous work to flush the delalloc inodes to
happen which can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  #7  cow_file_range at ffffffffc03879c1
  #8  btrfs_run_delalloc_range at ffffffffc038894c
  #9  writepage_delalloc at ffffffffc03a3c8f
 #10  __extent_writepage at ffffffffc03a4c01
 #11  extent_write_cache_pages at ffffffffc03a500b
 #12  extent_writepages at ffffffffc03a6de2
 #13  do_writepages at ffffffffa4c277eb
 #14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 #15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 #16  normal_work_helper at ffffffffc03b706c
 #17  process_one_work at ffffffffa4aba4e4
 #18  worker_thread at ffffffffa4aba6fd
 #19  kthread at ffffffffa4ac0a3d
 #20  ret_from_fork at ffffffffa54001ff

To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.

Fixes: c53e965360 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 17:17:09 +01:00
..
tests btrfs: extend btrfs_rmap_block for specifying a device 2021-02-09 02:46:05 +01:00
acl.c
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
backref.c btrfs: do not warn if we can't find the reloc root when looking up backref 2021-02-08 22:58:56 +01:00
backref.h btrfs: add asserts for deleting backref cache nodes 2021-02-08 22:58:56 +01:00
block-group.c btrfs: fix race between writes to swap files and scrub 2021-02-22 18:07:15 +01:00
block-group.h btrfs: fix race between writes to swap files and scrub 2021-02-22 18:07:15 +01:00
block-rsv.c btrfs: introduce mount option rescue=ignorebadroots 2020-12-08 15:53:41 +01:00
block-rsv.h btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
btrfs_inode.h btrfs: make btrfs_dio_private::bytes u32 2021-02-08 22:58:51 +01:00
check-integrity.c btrfs: drop casts of bio bi_sector 2020-12-09 19:16:05 +01:00
check-integrity.h btrfs: remove btrfsic_submit_bh() 2020-03-23 17:01:39 +01:00
compression.c btrfs: make check_compressed_csum() to be subpage compatible 2021-02-22 17:15:27 +01:00
compression.h btrfs: compression: move declarations to header 2020-10-07 12:06:55 +02:00
ctree.c btrfs: fix extent buffer leak on failure to copy root 2021-02-08 22:59:04 +01:00
ctree.h btrfs: fix race between writes to swap files and scrub 2021-02-22 18:07:15 +01:00
delalloc-space.c btrfs: fix parameter description of btrfs_inode_rsv_release/btrfs_delalloc_release_space 2021-02-08 22:58:54 +01:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: don't flush from btrfs_delayed_inode_reserve_metadata 2021-03-02 17:17:09 +01:00
delayed-inode.h btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-ref.c btrfs: account for new extents being deleted in total_bytes_pinned 2021-02-08 22:58:55 +01:00
delayed-ref.h btrfs: only let one thread pre-flush delayed refs in commit 2021-02-08 22:58:56 +01:00
dev-replace.c btrfs: zoned: mark block groups to copy for device-replace 2021-02-09 02:46:07 +01:00
dev-replace.h btrfs: zoned: mark block groups to copy for device-replace 2021-02-09 02:46:07 +01:00
dir-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
discard.c btrfs: document now parameter of peek_discard_list 2021-02-08 22:58:53 +01:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: zoned: reorder log node allocation on zoned filesystem 2021-02-09 02:48:41 +01:00
disk-io.h btrfs: split alloc_log_tree() 2021-02-09 02:46:07 +01:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h btrfs: export helpers for subvolume name/id resolution 2020-03-23 17:01:42 +01:00
extent_io.c btrfs: zoned: relocate block group to repair IO failure in zoned filesystems 2021-02-09 02:46:07 +01:00
extent_io.h btrfs: zoned: redirty released extent buffers 2021-02-09 02:46:04 +01:00
extent_map.c btrfs: fix parameter description of btrfs_add_extent_mapping 2021-02-08 22:58:53 +01:00
extent_map.h btrfs: remove extent_map::bdev 2019-11-18 23:43:44 +01:00
extent-io-tree.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent-tree.c btrfs: zoned: extend zoned allocator to use dedicated tree-log block group 2021-02-09 02:46:08 +01:00
file-item.c btrfs: fix function description formats in file-item.c 2021-02-08 22:58:53 +01:00
file.c btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors 2021-03-02 16:55:44 +01:00
free-space-cache.c btrfs: avoid double put of block group when emptying cluster 2021-02-22 18:07:45 +01:00
free-space-cache.h btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
free-space-tree.c btrfs: fix possible free space tree corruption with online conversion 2021-01-25 18:44:37 +01:00
free-space-tree.h btrfs: rename btrfs_block_group_cache 2019-11-18 17:51:51 +01:00
inode-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
inode.c btrfs: don't flush from btrfs_delayed_inode_reserve_metadata 2021-03-02 17:17:09 +01:00
ioctl.c btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl 2021-03-02 16:55:47 +01:00
Kconfig btrfs: switch to iomap for direct IO 2020-10-07 12:06:57 +02:00
locking.c btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
locking.h btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
lzo.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00
Makefile btrfs: introduce the skeleton of btrfs_subpage structure 2021-02-08 22:59:01 +01:00
misc.h btrfs: rename tree_entry to rb_simple_node and export it 2020-05-25 11:25:19 +02:00
ordered-data.c btrfs: zoned: use ZONE_APPEND write for zoned mode 2021-02-09 02:46:06 +01:00
ordered-data.h btrfs: zoned: use ZONE_APPEND write for zoned mode 2021-02-09 02:46:06 +01:00
orphan.c
print-tree.c btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
props.c btrfs: simplify iget helpers 2020-05-25 11:25:37 +02:00
props.h
qgroup.c btrfs: export and rename qgroup_reserve_meta 2021-03-02 16:58:30 +01:00
qgroup.h btrfs: export and rename qgroup_reserve_meta 2021-03-02 16:58:30 +01:00
raid56.c btrfs: fix raid6 qstripe kmap 2021-02-22 17:15:21 +01:00
raid56.h
rcu-string.h btrfs: rcu-string: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
reada.c btrfs: pass the owner_root and level to alloc_extent_buffer 2020-12-08 15:54:07 +01:00
ref-verify.c btrfs: ref-verify: use 'inline void' keyword ordering 2021-03-02 16:55:40 +01:00
ref-verify.h
reflink.c btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled 2021-02-22 18:07:45 +01:00
reflink.h Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
relocation.c btrfs: zoned: enable relocation on a zoned filesystem 2021-02-09 02:46:07 +01:00
root-tree.c btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations 2020-10-07 12:12:13 +02:00
scrub.c btrfs: fix race between writes to swap files and scrub 2021-02-22 18:07:15 +01:00
send.c btrfs: send: use struct send_ctx *sctx for btrfs_compare_trees and changed_cb 2021-02-08 22:58:57 +01:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
space-info.h btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
struct-funcs.c btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors 2020-12-09 19:16:10 +01:00
subpage.c btrfs: integrate page status update for data read path into begin/end_page_read 2021-02-08 22:59:03 +01:00
subpage.h btrfs: integrate page status update for data read path into begin/end_page_read 2021-02-08 22:59:03 +01:00
super.c btrfs: fix spurious free_space_tree remount warning 2021-03-02 16:55:55 +01:00
sysfs.c btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: zoned: redirty released extent buffers 2021-02-09 02:46:04 +01:00
transaction.h btrfs: zoned: redirty released extent buffers 2021-02-09 02:46:04 +01:00
tree-checker.c btrfs: tree-checker: do not error out if extent ref hash doesn't match 2021-02-22 18:07:44 +01:00
tree-checker.h
tree-defrag.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
tree-log.c btrfs: zoned: fix deadlock on log sync 2021-02-22 18:08:48 +01:00
tree-log.h btrfs: make fast fsyncs wait only for writeback 2020-10-07 12:06:56 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: remove unnecessary casts in printk 2020-12-08 15:53:52 +01:00
volumes.c btrfs: zoned: relocate block group to repair IO failure in zoned filesystems 2021-02-09 02:46:07 +01:00
volumes.h btrfs: zoned: relocate block group to repair IO failure in zoned filesystems 2021-02-09 02:46:07 +01:00
xattr.c btrfs: skip unnecessary searches for xattrs when logging an inode 2020-12-08 15:54:12 +01:00
xattr.h
zlib.c btrfs: use larger zlib buffer for s390 hardware compression 2020-01-31 10:30:40 -08:00
zoned.c btrfs: zoned: support dev-replace in zoned filesystems 2021-02-09 02:46:07 +01:00
zoned.h btrfs: zoned: extend zoned allocator to use dedicated tree-log block group 2021-02-09 02:46:08 +01:00
zstd.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00