linux/fs/xfs
Dave Chinner a9a4bc8c76 xfs: log worker needs to start before intent/unlink recovery
After 963 iterations of generic/530, it deadlocked during recovery
on a pinned inode cluster buffer like so:

XFS (pmem1): Starting recovery (logdev: internal)
INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
      Not tainted 5.17.0-rc6-dgc+ #975
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/8:0     state:D stack:13024 pid:306037 ppid:     2 flags:0x00004000
Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
Call Trace:
 <TASK>
 __schedule+0x30d/0x9e0
 schedule+0x55/0xd0
 schedule_timeout+0x114/0x160
 __down+0x99/0xf0
 down+0x5e/0x70
 xfs_buf_lock+0x36/0xf0
 xfs_buf_find+0x418/0x850
 xfs_buf_get_map+0x47/0x380
 xfs_buf_read_map+0x54/0x240
 xfs_trans_read_buf_map+0x1bd/0x490
 xfs_imap_to_bp+0x4f/0x70
 xfs_iunlink_map_ino+0x66/0xd0
 xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
 xfs_iunlink_remove_inode+0xf2/0x1d0
 xfs_inactive_ifree+0x1a3/0x900
 xfs_inode_unlink+0xcc/0x210
 xfs_inodegc_worker+0x1ac/0x2f0
 process_one_work+0x1ac/0x390
 worker_thread+0x56/0x3c0
 kthread+0xf6/0x120
 ret_from_fork+0x1f/0x30
 </TASK>
task:mount           state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x30d/0x9e0
 schedule+0x55/0xd0
 schedule_timeout+0x114/0x160
 __down+0x99/0xf0
 down+0x5e/0x70
 xfs_buf_lock+0x36/0xf0
 xfs_buf_find+0x418/0x850
 xfs_buf_get_map+0x47/0x380
 xfs_buf_read_map+0x54/0x240
 xfs_trans_read_buf_map+0x1bd/0x490
 xfs_imap_to_bp+0x4f/0x70
 xfs_iget+0x300/0xb40
 xlog_recover_process_one_iunlink+0x4c/0x170
 xlog_recover_process_iunlinks.isra.0+0xee/0x130
 xlog_recover_finish+0x57/0x110
 xfs_log_mount_finish+0xfc/0x1e0
 xfs_mountfs+0x540/0x910
 xfs_fs_fill_super+0x495/0x850
 get_tree_bdev+0x171/0x270
 xfs_fs_get_tree+0x15/0x20
 vfs_get_tree+0x24/0xc0
 path_mount+0x304/0xba0
 __x64_sys_mount+0x108/0x140
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>
task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x30d/0x9e0
 schedule+0x55/0xd0
 io_schedule+0x4b/0x80
 xfs_buf_wait_unpin+0x9e/0xf0
 __xfs_buf_submit+0x14a/0x230
 xfs_buf_delwri_submit_buffers+0x107/0x280
 xfs_buf_delwri_submit_nowait+0x10/0x20
 xfsaild+0x27e/0x9d0
 kthread+0xf6/0x120
 ret_from_fork+0x1f/0x30

We have the mount process waiting on an inode cluster buffer read,
inodegc doing unlink waiting on the same inode cluster buffer, and
the AIL push thread blocked in writeback waiting for the inode
cluster buffer to become unpinned.

What has happened here is that the AIL push thread has raced with
the inodegc process modifying, committing and pinning the inode
cluster buffer here in xfs_buf_delwri_submit_buffers() here:

	blk_start_plug(&plug);
	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
		if (!wait_list) {
			if (xfs_buf_ispinned(bp)) {
				pinned++;
				continue;
			}
Here >>>>>>
			if (!xfs_buf_trylock(bp))
				continue;

Basically, the AIL has found the buffer wasn't pinned and got the
lock without blocking, but then the buffer was pinned. This implies
the processing here was pre-empted between the pin check and the
lock, because the pin count can only be increased while holding the
buffer locked. Hence when it has gone to submit the IO, it has
blocked waiting for the buffer to be unpinned.

With all executing threads now waiting on the buffer to be unpinned,
we normally get out of situations like this via the background log
worker issuing a log force which will unpinned stuck buffers like
this. But at this point in recovery, we haven't started the log
worker. In fact, the first thing we do after processing intents and
unlinked inodes is *start the log worker*. IOWs, we start it too
late to have it break deadlocks like this.

Avoid this and any other similar deadlock vectors in intent and
unlinked inode recovery by starting the log worker before we recover
intents and unlinked inodes. This part of recovery runs as though
the filesystem is fully active, so we really should have the same
infrastructure running as we normally do at runtime.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-20 08:59:49 -07:00
..
libxfs xfs: constify xfs_name_dotdot 2022-03-14 10:23:17 -07:00
scrub xfs: fix online fsck handling of v5 feature bits on secondary supers 2022-01-12 09:45:21 -08:00
Kconfig xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n 2020-10-16 15:34:28 -07:00
kmem.c mm: introduce memalloc_retry_wait() 2022-01-15 16:30:29 +02:00
kmem.h xfs: remove kmem_zone typedef 2021-10-22 16:00:31 -07:00
Makefile
mrlock.h
xfs_acl.c overlayfs update for 5.15 2021-09-02 09:21:27 -07:00
xfs_acl.h vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
xfs_aops.c xfs, iomap: limit individual ioend chain lengths in writeback 2022-01-26 09:19:20 -08:00
xfs_aops.h
xfs_attr_inactive.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_attr_list.c xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown 2021-08-19 10:07:13 -07:00
xfs_bio_io.c xfs: async blkdev cache flush 2021-06-21 10:05:51 -07:00
xfs_bmap_item.c xfs: create slab caches for frequently-used deferred items 2021-10-22 16:04:36 -07:00
xfs_bmap_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_bmap_util.c xfs: set prealloc flag in xfs_alloc_file_space() 2022-02-01 14:14:48 -08:00
xfs_bmap_util.h xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls 2022-01-17 09:16:41 -08:00
xfs_buf_item_recover.c xfs: check sb_meta_uuid for dabuf buffer recovery 2021-12-21 09:49:41 -08:00
xfs_buf_item.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_buf_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_buf.c Merge branch 'akpm' (patches from Andrew) 2022-01-15 20:37:06 +02:00
xfs_buf.h dax: return the partition offset from fs_dax_get_by_bdev 2021-12-04 08:58:54 -08:00
xfs_dir2_readdir.c xfs: take the ILOCK when readdir inspects directory mapping data 2022-01-11 15:11:04 -08:00
xfs_discard.c xfs: convert mount flags to features 2021-08-19 10:07:12 -07:00
xfs_discard.h
xfs_dquot_item_recover.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_dquot_item.c xfs: remove support for disabling quota accounting on a mounted file system 2021-08-06 11:05:36 -07:00
xfs_dquot_item.h xfs: remove support for disabling quota accounting on a mounted file system 2021-08-06 11:05:36 -07:00
xfs_dquot.c xfs: hold quota inode ILOCK_EXCL until the end of dqalloc 2022-01-06 10:43:30 -08:00
xfs_dquot.h xfs: queue inactivation immediately when quota is nearing enforcement 2021-08-09 10:52:18 -07:00
xfs_error.c xfs: sysfs: use default_groups in kobj_type 2022-01-06 10:43:30 -08:00
xfs_error.h xfs: add trace point for fs shutdown 2021-08-18 18:46:00 -07:00
xfs_export.c xfs: convert remaining mount flags to state flags 2021-08-19 10:07:13 -07:00
xfs_export.h
xfs_extent_busy.c xfs: pass perags through to the busy extent code 2021-06-02 10:48:24 +10:00
xfs_extent_busy.h xfs: pass perags through to the busy extent code 2021-06-02 10:48:24 +10:00
xfs_extfree_item.c xfs: reduce the size of struct xfs_extent_free_item 2021-10-22 16:04:36 -07:00
xfs_extfree_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_file.c xfs: ensure log flush at the end of a synchronous fallocate call 2022-02-01 14:14:48 -08:00
xfs_filestream.c xfs: convert remaining mount flags to state flags 2021-08-19 10:07:13 -07:00
xfs_filestream.h xfs: convert mount flags to features 2021-08-19 10:07:12 -07:00
xfs_fsmap.c xfs: don't generate selinux audit messages for capability testing 2022-03-09 10:32:06 -08:00
xfs_fsmap.h xfs: fix deadlock and streamline xfs_getfsmap performance 2020-10-07 08:40:29 -07:00
xfs_fsops.c xfs: convert remaining mount flags to state flags 2021-08-19 10:07:13 -07:00
xfs_fsops.h xfs: get rid of xfs_growfs_{data,log}_t 2021-02-03 09:18:50 -08:00
xfs_globals.c xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_health.c xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown 2021-08-19 10:07:13 -07:00
xfs_icache.c New code for 5.17: 2022-01-22 11:04:27 +02:00
xfs_icache.h xfs: throttle inode inactivation queuing on memory reclaim 2021-08-09 11:13:17 -07:00
xfs_icreate_item.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_icreate_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_inode_item_recover.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_inode_item.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_inode_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_inode.c xfs: constify the name argument to various directory functions 2022-03-14 10:23:17 -07:00
xfs_inode.h xfs: constify the name argument to various directory functions 2022-03-14 10:23:17 -07:00
xfs_ioctl32.c xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls 2022-01-17 09:16:41 -08:00
xfs_ioctl32.h xfs: remove unused xfs_ioctl32.h declarations 2022-01-18 10:18:36 -08:00
xfs_ioctl.c xfs: don't generate selinux audit messages for capability testing 2022-03-09 10:32:06 -08:00
xfs_ioctl.h xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls 2022-01-17 09:16:41 -08:00
xfs_iomap.c fsdax: shift partition offset handling into the file systems 2021-12-04 08:58:54 -08:00
xfs_iomap.h iomap: add a IOMAP_DAX flag 2021-12-04 08:58:53 -08:00
xfs_iops.c xfs: refactor user/group quota chown in xfs_setattr_nonsize 2022-03-14 10:23:17 -07:00
xfs_iops.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_itable.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_itable.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_iwalk.c xfs: avoid buffer deadlocks when walking fs inodes 2021-08-09 11:13:16 -07:00
xfs_iwalk.h
xfs_linux.h fs: move mapping helpers 2021-12-03 18:50:17 +01:00
xfs_log_cil.c xfs: reduce kvmalloc overhead for CIL shadow buffers 2022-01-06 10:43:30 -08:00
xfs_log_priv.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_log_recover.c xfs: Remove redundant assignment of mp 2022-01-06 10:43:30 -08:00
xfs_log.c xfs: log worker needs to start before intent/unlink recovery 2022-03-20 08:59:49 -07:00
xfs_log.h xfs: AIL needs asynchronous CIL forcing 2021-08-16 12:09:30 -07:00
xfs_message.c
xfs_message.h once: implement DO_ONCE_LITE for non-fast-path "do once" functionality 2021-06-28 15:54:57 -07:00
xfs_mount.c xfs: only run COW extent recovery when there are no live extents 2021-12-21 09:49:41 -08:00
xfs_mount.h xfs: compute maximum AG btree height for critical reservation calculation 2021-10-19 11:45:15 -07:00
xfs_mru_cache.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_mru_cache.h
xfs_ondisk.h xfs: rename struct xfs_legacy_ictimestamp 2021-04-22 18:29:25 -07:00
xfs_pnfs.c xfs: use setattr_copy to set vfs inode attributes 2022-03-14 10:23:16 -07:00
xfs_pnfs.h
xfs_pwork.c xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_pwork.h xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_qm_bhv.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_qm_syscalls.c xfs: fix quotaoff mutex usage now that we don't support disabling it 2021-12-21 09:49:41 -08:00
xfs_qm.c xfs: remove the xfs_dqblk_t typedef 2021-10-14 09:19:33 -07:00
xfs_qm.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_quota.h xfs: queue inactivation immediately when quota is nearing enforcement 2021-08-09 10:52:18 -07:00
xfs_quotaops.c xfs: remove the active vs running quota differentiation 2021-08-06 11:05:37 -07:00
xfs_refcount_item.c xfs: create slab caches for frequently-used deferred items 2021-10-22 16:04:36 -07:00
xfs_refcount_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_reflink.c xfs: add missing cmap->br_state = XFS_EXT_NORM update 2022-03-09 10:32:06 -08:00
xfs_reflink.h xfs: convert xfs_sb_version_has checks to use mount features 2021-08-19 10:07:14 -07:00
xfs_rmap_item.c xfs: create slab caches for frequently-used deferred items 2021-10-22 16:04:36 -07:00
xfs_rmap_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_rtalloc.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_rtalloc.h xfs: make the record pointer passed to query_range functions const 2021-08-18 18:46:01 -07:00
xfs_stats.c xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_stats.h xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_super.c Bug fixes for 5.17-rc4: 2022-02-26 09:53:19 -08:00
xfs_super.h xfs: remove xfs_blkdev_issue_flush 2021-06-21 10:05:46 -07:00
xfs_symlink.c New code for 5.17: 2022-01-11 15:01:50 -08:00
xfs_symlink.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_sysctl.c xfs: restore speculative_cow_prealloc_lifetime sysctl 2021-02-24 10:16:08 -08:00
xfs_sysctl.h xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_sysfs.c xfs: sysfs: use default_groups in kobj_type 2022-01-06 10:43:30 -08:00
xfs_sysfs.h xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init 2020-08-07 11:50:17 -07:00
xfs_trace.c xfs: add trace point for fs shutdown 2021-08-18 18:46:00 -07:00
xfs_trace.h xfs: constify the name argument to various directory functions 2022-03-14 10:23:17 -07:00
xfs_trans_ail.c xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown 2021-08-19 10:07:13 -07:00
xfs_trans_buf.c xfs: introduce xfs_buf_daddr() 2021-08-19 10:07:14 -07:00
xfs_trans_dquot.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_trans_priv.h
xfs_trans.c xfs: reserve quota for dir expansion when linking/unlinking files 2022-03-14 10:23:17 -07:00
xfs_trans.h xfs: reserve quota for dir expansion when linking/unlinking files 2022-03-14 10:23:17 -07:00
xfs_xattr.c xfs: prevent metadata files from being inactivated 2021-03-25 16:47:50 -07:00
xfs.h