linux/fs/xfs
Dave Chinner 2c6e24ce1a xfs: prevent deadlock trying to cover an active log
Recent analysis of a deadlocked XFS filesystem from a kernel
crash dump indicated that the filesystem was stuck waiting for log
space. The short story of the hang on the RHEL6 kernel is this:

	- the tail of the log is pinned by an inode
	- the inode has been pushed by the xfsaild
	- the inode has been flushed to its backing buffer and is
	  currently flush locked and hence waiting for backing
	  buffer IO to complete and remove it from the AIL
	- the backing buffer is marked for write - it is on the
	  delayed write queue
	- the inode buffer has been modified directly and logged
	  recently due to unlinked inode list modification
	- the backing buffer is pinned in memory as it is in the
	  active CIL context.
	- the xfsbufd won't start buffer writeback because it is
	  pinned
	- xfssyncd won't force the log because it sees the log as
	  needing to be covered and hence wants to issue a dummy
	  transaction to move the log covering state machine along.

Hence nothing triggers a force of the CIL to the log. Without that
force the inode buffer is never unpinned, the inode IO never
completes, the inode stays in the AIL and the tail of the log cannot
move, so no new transactions can start.

Mainline kernels also have the same deadlock, though the signature
is slightly different - the inode buffer never reaches the delayed
write lists because xfs_buf_item_push() sees that it is pinned and
hence never adds it to the delayed write list that the xfsaild
flushes.

There are two possible solutions here. The first is to simply force
the log before trying to cover the log and so ensure that the CIL is
emptied before we try to reserve space for the dummy transaction in
the xfs_log_worker(). While this might work most of the time, it is
still racy and is no guarantee that we don't get stuck in
xfs_trans_reserve() waiting for log space to come free. Hence it's not
the best way to solve the problem.

The second solution is to modify xfs_log_need_covered() to be aware
of the CIL. We should only be attempting to cover the log if there
is no current activity in the log - covering the log is the process
of ensuring that the head and tail in the log on disk are identical
(i.e. the log is clean and at idle). Hence, by definition, if there
are items in the CIL then the log is not at idle and so we don't
need to attempt to cover it.
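
As a rough illustration of that rule, here is a minimal userspace
sketch. It is not the kernel implementation: struct model_log, its
fields and model_log_need_covered() are hypothetical names, and the
real covering state machine is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the log state. */
struct model_log {
	size_t cil_items;    /* items queued in the CIL, not yet written to the log */
	bool   cover_needed; /* covering state machine asks for a dummy transaction */
};

/*
 * A log with items in the CIL is active by definition, so it never
 * needs covering, whatever the state machine says.
 */
static bool model_log_need_covered(const struct model_log *log)
{
	assert(log != NULL);
	if (log->cil_items > 0)
		return false;
	return log->cover_needed;
}
```

The CIL check comes first so that an active log is never reported as
needing to be covered, even when the state machine flag is set.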

When we don't need to cover the log because it is active or idle, we
issue a log force from xfs_log_worker() - if the log is idle, then
this does nothing.  However, if the log is active due to there being
items in the CIL, it will force the items in the CIL to the log and
unpin them.

In the case of the above deadlock scenario, instead of
xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to
cover the log, it will instead force the log, thereby unpinning the
inode buffer, allowing IO to be issued and complete and hence
removing the inode that was pinning the tail of the log from the
AIL. At that point, everything will start moving along again. i.e.
the xfs_log_worker turns back into a watchdog that can alleviate
deadlocks based around pinned items that prevent the tail of the log
from being moved...
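
The resulting worker behaviour can be sketched in a few lines of
userspace C. All names here (model_log, model_log_worker() and so on)
are assumptions made for illustration and only mirror the decision
described above, not the actual kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum worker_action { WORKER_COVER_LOG, WORKER_FORCE_LOG };

/* Hypothetical, simplified view of the log state. */
struct model_log {
	size_t cil_items;    /* items in the CIL; they stay pinned until a force */
	bool   cover_needed; /* covering state machine wants a dummy transaction */
};

/* Cover only when nothing is in the CIL and covering is still pending. */
static bool model_need_covered(const struct model_log *log)
{
	return log->cil_items == 0 && log->cover_needed;
}

/* A force writes the CIL to the log and unpins its items; no-op when idle. */
static void model_log_force(struct model_log *log)
{
	log->cil_items = 0;
}

/* One pass of the periodic worker: cover an idle log, else force it. */
static enum worker_action model_log_worker(struct model_log *log)
{
	assert(log != NULL);
	if (model_need_covered(log))
		return WORKER_COVER_LOG; /* real code would reserve log space here */
	model_log_force(log);
	return WORKER_FORCE_LOG;
}
```

In this model a log with pinned items in the CIL always takes the
force path, so the worker never blocks on a space reservation while
those items are holding the tail.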

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17 10:56:17 -05:00
Kconfig xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00
kmem.c xfs: factor all the kmalloc-or-vmalloc fallback allocations 2013-09-10 13:57:03 -05:00
kmem.h xfs: factor all the kmalloc-or-vmalloc fallback allocations 2013-09-10 13:57:03 -05:00
Makefile xfs: Add xfs_log_rlimit.c 2013-08-12 17:49:38 -05:00
mrlock.h xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00
time.h
uuid.c
uuid.h xfs: add CRC infrastructure 2012-11-19 20:11:24 -06:00
xfs_acl.c xfs: factor all the kmalloc-or-vmalloc fallback allocations 2013-09-10 13:57:03 -05:00
xfs_acl.h xfs: increase number of ACL entries for V5 superblocks 2013-06-06 10:52:15 -05:00
xfs_ag.h xfs: make struct xfs_perag kernel only 2013-08-12 17:44:36 -05:00
xfs_alloc_btree.c xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00
xfs_alloc_btree.h xfs: add support for large btree blocks 2013-04-21 14:53:46 -05:00
xfs_alloc.c xfs: kill __KERNEL__ check for debug code in allocation code 2013-08-12 16:57:51 -05:00
xfs_alloc.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_aops.c xfs: get rid of count from xfs_iomap_write_allocate() 2013-10-01 15:42:34 -05:00
xfs_aops.h direct-io: Implement generic deferred AIO completions 2013-09-04 09:23:46 -04:00
xfs_attr_inactive.c xfs: refactor xfs_trans_reserve() interface 2013-08-12 17:47:34 -05:00
xfs_attr_leaf.c Fix wrong flag ASSERT in xfs_attr_shortform_getvalue 2013-08-30 15:20:50 -05:00
xfs_attr_leaf.h xfs: sync minor header differences needed by userspace. 2013-08-12 16:35:41 -05:00
xfs_attr_list.c xfs: split out attribute listing code into separate file 2013-08-12 16:41:29 -05:00
xfs_attr_remote.c xfs: fix issues that cause userspace warnings 2013-08-12 16:52:54 -05:00
xfs_attr_remote.h xfs: rework remote attr CRCs 2013-05-30 17:26:31 -05:00
xfs_attr_sf.h
xfs_attr.c xfs: avoid double-free in xfs_attr_node_addname 2013-08-13 15:48:01 -05:00
xfs_attr.h xfs: kill xfs_vnodeops.[ch] 2013-08-12 16:53:39 -05:00
xfs_bit.c
xfs_bit.h
xfs_bmap_btree.c xfs: = vs == typo in ASSERT() 2013-09-12 09:42:08 -05:00
xfs_bmap_btree.h xfs: recovery of swap extents operations for CRC filesystems 2013-09-10 12:49:57 -05:00
xfs_bmap_util.c xfs: factor all the kmalloc-or-vmalloc fallback allocations 2013-09-10 13:57:03 -05:00
xfs_bmap_util.h xfs: consolidate extent swap code 2013-08-12 16:56:06 -05:00
xfs_bmap.c xfs: fix some minor sparse warnings 2013-09-09 17:43:05 -05:00
xfs_bmap.h xfs: remove __KERNEL__ from debug code 2013-08-12 16:58:37 -05:00
xfs_btree.c xfs: recovery of swap extents operations for CRC filesystems 2013-09-10 12:49:57 -05:00
xfs_btree.h xfs: recovery of swap extents operations for CRC filesystems 2013-09-10 12:49:57 -05:00
xfs_buf_item.c xfs: lock the AIL before removing the buffer item 2013-09-24 12:31:41 -05:00
xfs_buf_item.h xfs: split out buf log item format definitions 2013-08-12 16:06:37 -05:00
xfs_buf.c super: fix for destroy lrus 2013-09-10 18:56:32 -04:00
xfs_buf.h xfs: rework buffer dispose list tracking 2013-09-10 18:56:31 -04:00
xfs_cksum.h xfs: add CRC infrastructure 2012-11-19 20:11:24 -06:00
xfs_da_btree.c xfs: fix node forward in xfs_node_toosmall 2013-09-26 10:38:17 -05:00
xfs_da_btree.h XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568 2013-08-30 09:48:59 -05:00
xfs_dinode.h xfs: di_flushiter considered harmful 2013-07-25 10:41:42 -05:00
xfs_dir2_block.c xfs: dirent dtype presence is dependent on directory magic numbers 2013-09-30 17:49:28 -05:00
xfs_dir2_data.c xfs: Add write support for dirent filetype field 2013-08-22 08:44:49 -05:00
xfs_dir2_format.h xfs: dirent dtype presence is dependent on directory magic numbers 2013-09-30 17:49:28 -05:00
xfs_dir2_leaf.c xfs: check magic numbers in dir3 leaf verifier first 2013-09-09 17:43:58 -05:00
xfs_dir2_node.c xfs: Add write support for dirent filetype field 2013-08-22 08:44:49 -05:00
xfs_dir2_priv.h xfs: Add read-only support for dirent filetype field 2013-08-22 08:40:24 -05:00
xfs_dir2_readdir.c xfs: dirent dtype presence is dependent on directory magic numbers 2013-09-30 17:49:28 -05:00
xfs_dir2_sf.c xfs: dirent dtype presence is dependent on directory magic numbers 2013-09-30 17:49:28 -05:00
xfs_dir2.c XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568 2013-08-30 09:48:59 -05:00
xfs_dir2.h xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino 2013-09-03 14:51:16 -05:00
xfs_discard.c xfs: split out transaction reservation code 2013-08-12 16:36:16 -05:00
xfs_discard.h
xfs_dquot_item.c xfs: fix some minor sparse warnings 2013-09-09 17:43:05 -05:00
xfs_dquot_item.h
xfs_dquot.c xfs: lockdep needs to know about 3 dquot-deep nesting 2013-09-30 17:48:25 -05:00
xfs_dquot.h xfs: Add pquota fields where gquota is used. 2013-07-11 10:35:32 -05:00
xfs_error.c xfs: consolidate xfs_utils.c 2013-08-12 16:55:17 -05:00
xfs_error.h
xfs_export.c xfs: kill xfs_vnodeops.[ch] 2013-08-12 16:53:39 -05:00
xfs_export.h
xfs_extent_busy.c xfs: fix some minor sparse warnings 2013-09-09 17:43:05 -05:00
xfs_extent_busy.h xfs: make xfs_extent_busy_trim not static 2012-05-14 16:21:04 -05:00
xfs_extfree_item.c xfs: return log item size in IOP_SIZE 2013-08-13 16:10:21 -05:00
xfs_extfree_item.h xfs: split out EFI/EFD log item format definition 2013-08-12 16:07:13 -05:00
xfs_file.c xfs: kill xfs_vnodeops.[ch] 2013-08-12 16:53:39 -05:00
xfs_filestream.c xfs: consolidate xfs_utils.c 2013-08-12 16:55:17 -05:00
xfs_filestream.h xfs: xfs_filestreams.h doesn't need __KERNEL__ 2013-08-12 17:00:11 -05:00
xfs_format.h xfs: split out the remote symlink handling 2013-08-12 16:43:38 -05:00
xfs_fs.h xfs: add the inode directory type support to XFS_IOC_FSGEOM 2013-10-08 14:28:09 -05:00
xfs_fsops.c xfs: add the inode directory type support to XFS_IOC_FSGEOM 2013-10-08 14:28:09 -05:00
xfs_fsops.h
xfs_globals.c xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_ialloc_btree.c xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00
xfs_ialloc_btree.h xfs: add support for large btree blocks 2013-04-21 14:53:46 -05:00
xfs_ialloc.c xfs: check correct status variable for xfs_inobt_get_rec() call 2013-08-30 13:48:35 -05:00
xfs_ialloc.h xfs: Inode create item recovery 2013-06-27 14:26:21 -05:00
xfs_icache.c xfs: remove usage of is_bad_inode 2013-10-01 17:38:16 -05:00
xfs_icache.h xfs: update #2 for v3.12-rc1 2013-09-12 16:13:41 -07:00
xfs_icreate_item.c xfs: return log item size in IOP_SIZE 2013-08-13 16:10:21 -05:00
xfs_icreate_item.h xfs: separate icreate log format definitions from xfs_icreate_item.h 2013-08-12 16:10:35 -05:00
xfs_inode_buf.c xfs: don't assert fail on bad inode numbers 2013-09-10 14:07:54 -05:00
xfs_inode_buf.h xfs: recovery of swap extents operations for CRC filesystems 2013-09-10 12:49:57 -05:00
xfs_inode_fork.c xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct() 2013-10-01 17:33:10 -05:00
xfs_inode_fork.h xfs: move inode fork definitions to a new header file 2013-08-12 16:37:32 -05:00
xfs_inode_item.c xfs: return log item size in IOP_SIZE 2013-08-13 16:10:21 -05:00
xfs_inode_item.h xfs: split out inode log item format definition 2013-08-12 16:05:19 -05:00
xfs_inode.c xfs: clean up xfs_inactive() error handling, kill VN_INACTIVE_[NO]CACHE 2013-10-08 17:20:41 -05:00
xfs_inode.h xfs: clean up xfs_inactive() error handling, kill VN_INACTIVE_[NO]CACHE 2013-10-08 17:20:41 -05:00
xfs_inum.h xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_ioctl32.c xfs: factor all the kmalloc-or-vmalloc fallback allocations 2013-09-10 13:57:03 -05:00
xfs_ioctl32.h
xfs_ioctl.c xfs: factor all the kmalloc-or-vmalloc fallback allocations 2013-09-10 13:57:03 -05:00
xfs_ioctl.h xfs: consolidate extent swap code 2013-08-12 16:56:06 -05:00
xfs_iomap.c xfs: get rid of count from xfs_iomap_write_allocate() 2013-10-01 15:42:34 -05:00
xfs_iomap.h xfs: get rid of count from xfs_iomap_write_allocate() 2013-10-01 15:42:34 -05:00
xfs_iops.c xfs: Add read-only support for dirent filetype field 2013-08-22 08:40:24 -05:00
xfs_iops.h xfs: kill xfs_vnodeops.[ch] 2013-08-12 16:53:39 -05:00
xfs_itable.c xfs: factor all the kmalloc-or-vmalloc fallback allocations 2013-09-10 13:57:03 -05:00
xfs_itable.h
xfs_linux.h xfs: remove two unused macro definitions in xfs_linux.h 2013-08-20 15:30:23 -05:00
xfs_log_cil.c xfs: prevent deadlock trying to cover an active log 2013-10-17 10:56:17 -05:00
xfs_log_format.h xfs: recovery of swap extents operations for CRC filesystems 2013-09-10 12:49:57 -05:00
xfs_log_priv.h xfs: prevent deadlock trying to cover an active log 2013-10-17 10:56:17 -05:00
xfs_log_recover.c xfs: Use kmem_free() instead of free() 2013-10-01 10:26:24 -05:00
xfs_log_recover.h
xfs_log_rlimit.c xfs: call roundup_64() to calculate the min_logblks 2013-08-13 14:19:11 -05:00
xfs_log.c xfs: prevent deadlock trying to cover an active log 2013-10-17 10:56:17 -05:00
xfs_log.h xfs: Reduce allocations during CIL insertion 2013-08-13 16:12:30 -05:00
xfs_message.c xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00
xfs_message.h xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00
xfs_mount.c xfs: Register hotcpu notifier after initialization 2013-08-22 14:05:27 -05:00
xfs_mount.h xfs: Introduce a new structure to hold transaction reservation items 2013-08-12 17:45:49 -05:00
xfs_mru_cache.c
xfs_mru_cache.h
xfs_qm_bhv.c xfs: separate dquot on disk format definitions out of xfs_quota.h 2013-08-12 16:09:52 -05:00
xfs_qm_syscalls.c xfs: Add support for the Q_XGETQSTATV 2013-08-20 17:00:38 -05:00
xfs_qm.c super: fix for destroy lrus 2013-09-10 18:56:32 -04:00
xfs_qm.h xfs: convert dquot cache lru to list_lru 2013-09-10 18:56:31 -04:00
xfs_quota_defs.h xfs: introduce xfs_quota_defs.h 2013-08-12 16:20:18 -05:00
xfs_quota_priv.h xfs: use per-filesystem radix trees for dquot lookup 2012-03-14 11:09:06 -05:00
xfs_quota.h xfs: XFS_MOUNT_QUOTA_ALL needed by userspace 2013-09-03 15:00:06 -05:00
xfs_quotaops.c xfs: Add support for the Q_XGETQSTATV 2013-08-20 17:00:38 -05:00
xfs_rtalloc.c xfs: refactor xfs_trans_reserve() interface 2013-08-12 17:47:34 -05:00
xfs_rtalloc.h xfs: introduce xfs_rtalloc_defs.h 2013-08-12 16:13:10 -05:00
xfs_sb.c xfs: fix the comment of xfs_sb_quiet_read_verify() 2013-08-20 15:51:49 -05:00
xfs_sb.h xfs: add xfs sb v4 support for dirent filetype field 2013-08-22 08:49:59 -05:00
xfs_stats.c xfs: use common code for quota statistics 2012-03-14 11:09:06 -05:00
xfs_stats.h xfs: use common code for quota statistics 2012-03-14 11:09:06 -05:00
xfs_super.c xfs: remove usage of is_bad_inode 2013-10-01 17:38:16 -05:00
xfs_super.h xfs: xfs_sync_data is redundant. 2012-10-17 12:01:25 -05:00
xfs_symlink_remote.c xfs: make struct xfs_perag kernel only 2013-08-12 17:44:36 -05:00
xfs_symlink.c xfs: push down inactive transaction mgmt for remote symlinks 2013-10-08 14:53:02 -05:00
xfs_symlink.h xfs: push down inactive transaction mgmt for remote symlinks 2013-10-08 14:53:02 -05:00
xfs_sysctl.c xfs: Convert use of typedef ctl_table to struct ctl_table 2013-06-17 17:42:25 -05:00
xfs_sysctl.h xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_trace.c xfs: separate dquot on disk format definitions out of xfs_quota.h 2013-08-12 16:09:52 -05:00
xfs_trace.h xfs: update for 3.11-rc1 2013-07-09 12:29:12 -07:00
xfs_trans_ail.c xfs: finish removing IOP_* macros. 2013-08-30 14:14:35 -05:00
xfs_trans_buf.c xfs: finish removing IOP_* macros. 2013-08-30 14:14:35 -05:00
xfs_trans_dquot.c xfs: separate dquot on disk format definitions out of xfs_quota.h 2013-08-12 16:09:52 -05:00
xfs_trans_extfree.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_trans_inode.c xfs: implement inode change count 2013-06-28 13:00:05 -05:00
xfs_trans_priv.h xfs: Simplify xfs_ail_min() with list_first_entry_or_null() 2013-08-23 12:57:43 -05:00
xfs_trans_resv.c xfs: inode log reservations are too small 2013-08-30 13:59:30 -05:00
xfs_trans_resv.h xfs: Get rid of all XFS_XXX_LOG_RES() macro 2013-08-12 17:48:08 -05:00
xfs_trans_space.h
xfs_trans.c xfs: finish removing IOP_* macros. 2013-08-30 14:14:35 -05:00
xfs_trans.h xfs: finish removing IOP_* macros. 2013-08-30 14:14:35 -05:00
xfs_types.h xfs: Add read-only support for dirent filetype field 2013-08-22 08:40:24 -05:00
xfs_vnode.h xfs: clean up xfs_inactive() error handling, kill VN_INACTIVE_[NO]CACHE 2013-10-08 17:20:41 -05:00
xfs_xattr.c xfs: kill xfs_vnodeops.[ch] 2013-08-12 16:53:39 -05:00
xfs.h xfs: introduce CONFIG_XFS_WARN 2013-05-07 18:45:36 -05:00