linux/fs/xfs
Dave Chinner 85bec5460a xfs: log mount failures don't wait for buffers to be released
Recently I've been seeing xfs/051 fail on 1k block size filesystems.
Trying to trace the events during the test lead to the problem going
away, indicating that it was a race condition that lead to this
ASSERT failure:

XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 156
.....
[<ffffffff814e1257>] xfs_free_perag+0x87/0xb0
[<ffffffff814e21b9>] xfs_mountfs+0x4d9/0x900
[<ffffffff814e5dff>] xfs_fs_fill_super+0x3bf/0x4d0
[<ffffffff811d8800>] mount_bdev+0x180/0x1b0
[<ffffffff814e3ff5>] xfs_fs_mount+0x15/0x20
[<ffffffff811d90a8>] mount_fs+0x38/0x170
[<ffffffff811f4347>] vfs_kern_mount+0x67/0x120
[<ffffffff811f7018>] do_mount+0x218/0xd60
[<ffffffff811f7e5b>] SyS_mount+0x8b/0xd0

When I finally caught it with tracing enabled, I saw that AG 2 had
an elevated reference count and a buffer was responsible for it. I
tracked down the specific buffer, and found that it was missing the
final reference count release that would put it back on the LRU and
hence be found by xfs_wait_buftarg() calls in the log mount failure
handling.

The last four traces for the buffer before the assert were (trimmed
for relevance)

kworker/0:1-5259   xfs_buf_iodone:        hold 2  lock 0 flags ASYNC
kworker/0:1-5259   xfs_buf_ioerror:       hold 2  lock 0 error -5
mount-7163	   xfs_buf_lock_done:     hold 2  lock 0 flags ASYNC
mount-7163	   xfs_buf_unlock:        hold 2  lock 1 flags ASYNC

This is an async write that is completing, so there's nobody waiting
for it directly.  Hence we call xfs_buf_relse() once all the
processing is complete. That does:

static inline void xfs_buf_relse(xfs_buf_t *bp)
{
	xfs_buf_unlock(bp);
	xfs_buf_rele(bp);
}

Now, it's clear that mount is waiting on the buffer lock, and that
it has been released by xfs_buf_relse() and gained by mount. This is
expected, because at this point the mount process is in
xfs_buf_delwri_submit() waiting for all the IO it submitted to
complete.

The mount process, however, is waiting on the lock for the buffer
because it is in xfs_buf_delwri_submit(). This waits for IO
completion, but it doesn't wait for the buffer reference owned by
the IO to go away. The mount process collects all the completions,
fails the log recovery, and the higher level code then calls
xfs_wait_buftarg() to free all the remaining buffers in the
filesystem.

The issue is that on unlocking the buffer, the scheduler has decided
that the mount process has higher priority than the the kworker
thread that is running the IO completion, and so immediately
switched contexts to the mount process from the semaphore unlock
code, hence preventing the kworker thread from finishing the IO
completion and releasing the IO reference to the buffer.

Hence by the time that xfs_wait_buftarg() is run, the buffer still
has an active reference and so isn't on the LRU list that the
function walks to free the remaining buffers. Hence we miss that
buffer and continue onwards to tear down the mount structures,
at which time we get find a stray reference count on the perag
structure. On a non-debug kernel, this will be ignored and the
structure torn down and freed. Hence when the kworker thread is then
rescheduled and the buffer released and freed, it will access a
freed perag structure.

The problem here is that when the log mount fails, we still need to
quiesce the log to ensure that the IO workqueues have returned to
idle before we run xfs_wait_buftarg(). By synchronising the
workqueues, we ensure that all IO completions are fully processed,
not just to the point where buffers have been unlocked. This ensures
we don't end up in the situation above.

cc: <stable@vger.kernel.org> # 3.18
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-19 08:28:10 +11:00
..
libxfs Merge branch 'xfs-dax-updates' into for-next 2015-11-03 13:28:41 +11:00
Kconfig xfs: require 64-bit sector_t 2014-07-30 09:12:05 +10:00
kmem.c xfs: more info from kmem deadlocks and high-level error msgs 2015-10-12 16:04:45 +11:00
kmem.h xfs: change kmem_free to use generic kvfree() 2015-02-02 09:54:18 +11:00
Makefile xfs: stats are no longer dependent on CONFIG_PROC_FS 2015-10-19 08:42:46 +11:00
mrlock.h
uuid.c
uuid.h
xfs_acl.c xfs: Fix error path in xfs_get_acl 2015-11-10 10:09:45 +11:00
xfs_acl.h xfs: Fix error path in xfs_get_acl 2015-11-10 10:09:45 +11:00
xfs_aops.c Merge branch 'xfs-dax-updates' into for-next 2015-11-03 13:28:41 +11:00
xfs_aops.h xfs: DAX does not use IO completion callbacks 2015-11-03 12:37:02 +11:00
xfs_attr_inactive.c Merge branch 'xfs-misc-fixes-for-4.2-3' into for-next 2015-06-23 08:49:01 +10:00
xfs_attr_list.c xfs: per-filesystem stats counter implementation 2015-10-12 18:21:22 +11:00
xfs_attr.h
xfs_bmap_util.c Merge branch 'xfs-dax-updates' into for-next 2015-11-03 13:28:41 +11:00
xfs_bmap_util.h xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate 2015-03-25 15:08:56 +11:00
xfs_buf_item.c Merge branch 'xfs-misc-fixes-for-4.3-3' into for-next 2015-08-25 10:13:35 +10:00
xfs_buf_item.h xfs: fix non-debug build warnings 2015-08-25 10:05:13 +10:00
xfs_buf.c xfs: log mount failures don't wait for buffers to be released 2016-01-19 08:28:10 +11:00
xfs_buf.h dax: move DAX-related functions to a new header 2015-09-08 15:35:28 -07:00
xfs_dir2_readdir.c xfs: per-filesystem stats counter implementation 2015-10-12 18:21:22 +11:00
xfs_discard.c xfs: pass mp to XFS_WANT_CORRUPTED_GOTO 2015-02-23 22:39:08 +11:00
xfs_discard.h
xfs_dquot_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_dquot_item.h
xfs_dquot.c xfs: per-filesystem stats counter implementation 2015-10-12 18:21:22 +11:00
xfs_dquot.h xfs: fix implicit bool to int conversion 2015-01-09 10:48:58 +11:00
xfs_error.c xfs: remove inst_t 2015-06-22 09:44:02 +10:00
xfs_error.h xfs: remove inst_t 2015-06-22 09:44:02 +10:00
xfs_export.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs_export.h
xfs_extent_busy.c xfs: merge xfs_ag.h into xfs_format.h 2014-11-28 14:25:04 +11:00
xfs_extent_busy.h
xfs_extfree_item.c xfs: add helper to conditionally remove items from the AIL 2015-08-19 10:01:08 +10:00
xfs_extfree_item.h xfs: fix efi/efd error handling to avoid fs shutdown hangs 2015-08-19 09:51:16 +10:00
xfs_file.c xfs: updates for 4.4-rc1 2015-11-11 20:18:48 -08:00
xfs_filestream.c xfs: clean up XFS_MIN_FREELIST macros 2015-06-22 10:13:30 +10:00
xfs_filestream.h xfs: add filestream allocator tracepoints 2014-04-23 07:11:52 +10:00
xfs_fsops.c xfs: growfs not aware of sb_meta_uuid 2015-08-19 10:31:41 +10:00
xfs_fsops.h
xfs_globals.c xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_icache.c xfs: per-filesystem stats counter implementation 2015-10-12 18:21:22 +11:00
xfs_icache.h xfs: merge xfs_ag.h into xfs_format.h 2014-11-28 14:25:04 +11:00
xfs_icreate_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_icreate_item.h
xfs_inode_item.c xfs: optimise away log forces on timestamp updates for fdatasync 2015-11-03 13:14:59 +11:00
xfs_inode_item.h xfs: optimise away log forces on timestamp updates for fdatasync 2015-11-03 13:14:59 +11:00
xfs_inode.c Merge branch 'xfs-misc-fixes-for-4.4-2' into for-next 2015-11-03 13:27:58 +11:00
xfs_inode.h xfs: clean up inode lockdep annotations 2015-08-19 10:32:49 +10:00
xfs_ioctl32.c xfs: prefix XATTR_LIST_MAX with XFS_ 2015-10-12 16:02:56 +11:00
xfs_ioctl32.h xfs: compat_xfs_bstat does not have forkoff 2014-10-02 09:17:58 +10:00
xfs_ioctl.c Merge branch 'xfs-misc-fixes-for-4.4-2' into for-next 2015-11-03 13:27:58 +11:00
xfs_ioctl.h
xfs_iomap.c Merge branch 'xfs-dax-updates' into for-next 2015-11-03 13:28:41 +11:00
xfs_iomap.h xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten 2015-01-09 10:48:12 +11:00
xfs_iops.c xfs: per-filesystem stats counter implementation 2015-10-12 18:21:22 +11:00
xfs_iops.h xfs: inodes are new until the dentry cache is set up 2015-02-23 22:38:08 +11:00
xfs_itable.c xfs: fix btree cursor error cleanups 2015-08-19 10:00:53 +10:00
xfs_itable.h xfs: bulkstat chunk formatting cursor is broken 2014-11-07 08:30:30 +11:00
xfs_linux.h xfs: pass xfsstats structures to handlers and macros 2015-10-12 05:19:45 +11:00
xfs_log_cil.c xfs: close xc_cil list_empty() races with cil commit sequence 2015-07-29 11:51:01 +10:00
xfs_log_priv.h xfs: validate metadata LSNs against log on v5 superblocks 2015-10-12 15:59:25 +11:00
xfs_log_recover.c Merge branch 'xfs-misc-fixes-for-4.4-3' into for-next 2015-11-10 10:20:48 +11:00
xfs_log.c Merge branch 'xfs-logging-fixes' into for-next 2015-10-12 18:37:58 +11:00
xfs_log.h xfs: validate metadata LSNs against log on v5 superblocks 2015-10-12 15:59:25 +11:00
xfs_message.c xfs: more info from kmem deadlocks and high-level error msgs 2015-10-12 16:04:45 +11:00
xfs_message.h
xfs_mount.c Merge branch 'xfs-misc-fixes-for-4.4-2' into for-next 2015-11-03 13:27:58 +11:00
xfs_mount.h Merge branch 'xfs-dax-updates' into for-next 2015-11-03 13:28:41 +11:00
xfs_mru_cache.c xfs: xfs_mru_cache_insert() should use GFP_NOFS 2015-03-25 14:57:53 +11:00
xfs_mru_cache.h xfs: embedd mru_elem into parent structure 2014-04-23 07:11:51 +10:00
xfs_pnfs.c xfs: add missing ilock around dio write last extent alignment 2015-10-12 15:34:20 +11:00
xfs_pnfs.h xfs: unlock i_mutex in xfs_break_layouts 2015-04-13 11:38:29 +10:00
xfs_qm_bhv.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_qm_syscalls.c xfs: saner xfs_trans_commit interface 2015-06-04 13:48:08 +10:00
xfs_qm.c xfs: updates for 4.4-rc1 2015-11-11 20:18:48 -08:00
xfs_qm.h xfs: Convert to using ->get_state callback 2015-03-04 16:06:36 +01:00
xfs_quota.h xfs: fix quota block reservation leak when tp allocates and frees blocks 2015-06-01 07:15:37 +10:00
xfs_quotaops.c xfs: Add support for Q_SETINFO 2015-03-04 16:06:38 +01:00
xfs_rtalloc.c xfs: add missing bmap cancel calls in error paths 2015-08-19 10:01:40 +10:00
xfs_rtalloc.h xfs: combine xfs_rtmodify_summary and xfs_rtget_summary 2014-09-09 11:58:42 +10:00
xfs_stats.c xfs: stats are no longer dependent on CONFIG_PROC_FS 2015-10-19 08:42:46 +11:00
xfs_stats.h xfs: per-filesystem stats counter implementation 2015-10-12 18:21:22 +11:00
xfs_super.c Merge branch 'xfs-misc-fixes-for-4.4-3' into for-next 2015-11-10 10:20:48 +11:00
xfs_super.h xfs: Remove icsb infrastructure 2015-02-23 21:22:31 +11:00
xfs_symlink.c Merge branch 'xfs-misc-fixes-for-4.3-2' into for-next 2015-08-20 09:28:45 +10:00
xfs_symlink.h
xfs_sysctl.c xfs: pass xfsstats structures to handlers and macros 2015-10-12 05:19:45 +11:00
xfs_sysctl.h xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_sysfs.c xfs: pass xfsstats structures to handlers and macros 2015-10-12 05:19:45 +11:00
xfs_sysfs.h xfs: create global stats and stats_clear in sysfs 2015-10-12 05:15:45 +11:00
xfs_trace.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_trace.h Merge branch 'xfs-dax-updates' into for-next 2015-11-03 13:28:41 +11:00
xfs_trans_ail.c Revert "xfs: clear PF_NOFREEZE for xfsaild kthread" 2016-01-19 08:21:46 +11:00
xfs_trans_buf.c xfs: only trace buffer items if they exist 2015-02-10 09:23:40 +11:00
xfs_trans_dquot.c xfs: Clean up xfs_trans_dup_dqinfo 2015-06-01 10:50:00 +10:00
xfs_trans_extfree.c xfs: ensure EFD trans aborts on log recovery extent free failure 2015-08-19 09:51:43 +10:00
xfs_trans_inode.c xfs: optimise away log forces on timestamp updates for fdatasync 2015-11-03 13:14:59 +11:00
xfs_trans_priv.h xfs: add helper to conditionally remove items from the AIL 2015-08-19 10:01:08 +10:00
xfs_trans.c xfs: per-filesystem stats counter implementation 2015-10-12 18:21:22 +11:00
xfs_trans.h xfs: ensure EFD trans aborts on log recovery extent free failure 2015-08-19 09:51:43 +10:00
xfs_xattr.c Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-13 18:02:30 -08:00
xfs.h