linux

Dave Chinner 85bec5460a xfs: log mount failures don't wait for buffers to be released

Recently I've been seeing xfs/051 fail on 1k block size filesystems. Trying to trace the events during the test led to the problem going away, indicating that it was a race condition that led to this ASSERT failure:

    XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 156
    .....
    [<ffffffff814e1257>] xfs_free_perag+0x87/0xb0
    [<ffffffff814e21b9>] xfs_mountfs+0x4d9/0x900
    [<ffffffff814e5dff>] xfs_fs_fill_super+0x3bf/0x4d0
    [<ffffffff811d8800>] mount_bdev+0x180/0x1b0
    [<ffffffff814e3ff5>] xfs_fs_mount+0x15/0x20
    [<ffffffff811d90a8>] mount_fs+0x38/0x170
    [<ffffffff811f4347>] vfs_kern_mount+0x67/0x120
    [<ffffffff811f7018>] do_mount+0x218/0xd60
    [<ffffffff811f7e5b>] SyS_mount+0x8b/0xd0

When I finally caught it with tracing enabled, I saw that AG 2 had an elevated reference count and a buffer was responsible for it. I tracked down the specific buffer, and found that it was missing the final reference count release that would put it back on the LRU and hence be found by xfs_wait_buftarg() calls in the log mount failure handling.

The last four traces for the buffer before the assert were (trimmed for relevance)

    kworker/0:1-5259   xfs_buf_iodone:        hold 2  lock 0 flags ASYNC
    kworker/0:1-5259   xfs_buf_ioerror:       hold 2  lock 0 error -5
    mount-7163         xfs_buf_lock_done:     hold 2  lock 0 flags ASYNC
    mount-7163         xfs_buf_unlock:        hold 2  lock 1 flags ASYNC

This is an async write that is completing, so there's nobody waiting for it directly. Hence we call xfs_buf_relse() once all the processing is complete. That does:

    static inline void xfs_buf_relse(xfs_buf_t *bp)
    {
        xfs_buf_unlock(bp);
        xfs_buf_rele(bp);
    }

Now, it's clear that mount is waiting on the buffer lock, and that it has been released by xfs_buf_relse() and gained by mount. This is expected, because at this point the mount process is in xfs_buf_delwri_submit() waiting for all the IO it submitted to complete.

The mount process, however, is waiting on the lock for the buffer because it is in xfs_buf_delwri_submit(). This waits for IO completion, but it doesn't wait for the buffer reference owned by the IO to go away. The mount process collects all the completions, fails the log recovery, and the higher level code then calls xfs_wait_buftarg() to free all the remaining buffers in the filesystem.

The issue is that on unlocking the buffer, the scheduler has decided that the mount process has higher priority than the kworker thread that is running the IO completion, and so immediately switched contexts to the mount process from the semaphore unlock code, hence preventing the kworker thread from finishing the IO completion and releasing the IO reference to the buffer.

Hence by the time that xfs_wait_buftarg() is run, the buffer still has an active reference and so isn't on the LRU list that the function walks to free the remaining buffers. Hence we miss that buffer and continue onwards to tear down the mount structures, at which time we find a stray reference count on the perag structure. On a non-debug kernel, this will be ignored and the structure torn down and freed. Hence when the kworker thread is then rescheduled and the buffer released and freed, it will access a freed perag structure.

The problem here is that when the log mount fails, we still need to quiesce the log to ensure that the IO workqueues have returned to idle before we run xfs_wait_buftarg(). By synchronising the workqueues, we ensure that all IO completions are fully processed, not just to the point where buffers have been unlocked. This ensures we don't end up in the situation above.

cc: <stable@vger.kernel.org> # 3.18
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

2016-01-19 08:28:10 +11:00
..
9p	9p: ->evict_inode() should kick out ->i_data, not ->i_mapping	2015-12-08 14:51:16 -05:00
adfs	fs/adfs: remove unneeded cast	2015-06-30 19:44:57 -07:00
affs	fs/affs: make root lookup from blkdev logical size	2015-09-10 13:29:01 -07:00
afs	net: Add a struct net parameter to sock_create_kern	2015-05-11 10:50:17 -04:00
autofs4	make simple_positive() public	2015-06-23 18:02:01 -04:00
befs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-07-04 19:36:06 -07:00
bfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-04-26 17:22:07 -07:00
btrfs	Merge branch 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs	2015-12-18 15:35:08 -08:00
cachefiles	FS-Cache: Add missing initialization of ret in cachefiles_write_page()	2015-11-16 20:38:43 -05:00
ceph	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client	2015-11-13 09:24:40 -08:00
cifs	sched/wait: Fix the signal handling fix	2015-12-13 14:30:59 -08:00
coda	fs/coda: fix readlink buffer overflow	2015-09-10 13:29:01 -07:00
configfs	configfs: allow dynamic group creation	2015-11-20 16:17:32 -08:00
cramfs
debugfs	debugfs: fix refcount imbalance in start_creating	2015-11-11 02:04:44 -05:00
devpts	devpts: if initialization failed, don't crash when opening /dev/ptmx	2015-06-30 19:44:58 -07:00
dlm	net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA	2015-12-01 15:45:05 -05:00
ecryptfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2015-11-07 13:05:44 -08:00
efivarfs	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2015-05-06 10:57:37 -07:00
efs	fs/efs: femove unneeded cast	2015-06-25 17:00:42 -07:00
exofs	osd fs: __r4w_get_page rely on PageUptodate for uptodate	2015-12-12 10:15:34 -08:00
exportfs
ext2	ext2, ext4: warn when mounting with dax enabled	2015-11-16 09:43:54 -08:00
ext4	Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings,	2015-12-07 10:25:00 -08:00
f2fs	Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-11-13 18:02:30 -08:00
fat	fat: fix fake_offset handling on error path	2015-11-20 16:17:32 -08:00
freevxfs	freevxfs: Grammar s/an negative/a negative/	2015-08-07 13:59:24 +02:00
fscache	FS-Cache: Handle a write to the page immediately beyond the EOF marker	2015-11-11 02:11:02 -05:00
fuse	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse	2015-12-11 10:56:41 -08:00
gfs2	Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-11-13 18:02:30 -08:00
hfs	hfs: fix B-tree corruption after insertion at position 0	2015-09-10 13:29:01 -07:00
hfsplus	xattr handlers: Pass handler to operations instead of flags	2015-11-13 20:34:32 -05:00
hostfs	fs: create and use seq_show_option for escaping	2015-09-04 16:54:41 -07:00
hpfs	fs/hpfs/namei.c: remove unnecessary new_valid_dev() check	2015-11-09 15:11:24 -08:00
hugetlbfs	mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes	2015-11-20 16:17:32 -08:00
isofs	VFS: normal filesystems (and lustre): d_inode() annotations	2015-04-15 15:06:57 -04:00
jbd2	Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings,	2015-12-07 10:25:00 -08:00
jffs2	xattr handlers: Pass handler to operations instead of flags	2015-11-13 20:34:32 -05:00
jfs	fs/jfs: remove unnecessary new_valid_dev() checks	2015-11-09 15:11:24 -08:00
kernfs	kernfs: implement kernfs_path_len()	2015-08-18 15:49:15 -07:00
lockd	Mainly smaller bugfixes and cleanup. We're still finding some bugs from	2015-11-11 20:11:28 -08:00
logfs	mm, fs: introduce mapping_gfp_constraint()	2015-11-06 17:50:42 -08:00
minix	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-07-04 19:36:06 -07:00
ncpfs	ncpfs: don't allow negative timeouts	2015-11-20 16:17:32 -08:00
nfs	sched/wait: Fix the signal handling fix	2015-12-13 14:30:59 -08:00
nfs_common	lockd: NLM grace period shouldn't block NFSv4 opens	2015-08-13 10:22:06 -04:00
nfsd	nfsd: don't hold ls_mutex across a layout recall	2015-12-16 11:49:58 -05:00
nilfs2	Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-11-11 09:45:24 -08:00
nls
notify	inotify: actually check for invalid bits in sys_inotify_add_watch()	2015-11-05 19:34:48 -08:00
ntfs	mm, fs: introduce mapping_gfp_constraint()	2015-11-06 17:50:42 -08:00
ocfs2	ocfs2/dlm: clear migration_pending when migration target goes down	2015-12-29 17:45:49 -08:00
omfs	omfs: fix potential integer overflow in allocator	2015-05-28 18:25:19 -07:00
openpromfs
overlayfs	ovl: get rid of the dead code left from broken (and disabled) optimizations	2015-12-06 12:31:07 -05:00
proc	proc: fix -ESRCH error when writing to /proc/$pid/coredump_filter	2015-12-18 14:25:40 -08:00
pstore	pstore: fix code comment to match code	2015-11-02 13:41:52 -08:00
qnx4
qnx6	pagemap.h: move dir_pages() over there	2015-06-23 18:02:00 -04:00
quota	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-09-05 20:34:28 -07:00
ramfs	mm, fs: obey gfp_mapping for add_to_page_cache()	2015-10-16 11:42:28 -07:00
reiserfs	Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-11-13 18:02:30 -08:00
romfs	make new_sync_{read,write}() static	2015-04-11 22:29:40 -04:00
squashfs	squashfs: xattr simplifications	2015-11-13 20:34:33 -05:00
sysfs	platform/chrome: Branch for v4.4	2015-11-13 21:53:18 -08:00
sysv	fix sysvfs symlinks	2015-11-23 21:11:08 -05:00
tracefs	tracefs: Fix refcount imbalance in start_creating()	2015-11-04 22:13:45 -05:00
ubifs	Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-11-13 18:02:30 -08:00
udf	udf: Don't modify filesystem for read-only mounts	2015-08-20 14:58:35 +02:00
ufs	fix ufs write vs readpage race when writing into a hole	2015-09-09 10:43:12 -07:00
xfs	xfs: log mount failures don't wait for buffers to be released	2016-01-19 08:28:10 +11:00
aio.c	mm: move ->mremap() from file_operations to vm_operations_struct	2015-09-04 16:54:41 -07:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c	libnvdimm for 4.4:	2015-11-10 12:07:22 -08:00
binfmt_elf.c	Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-11-11 09:45:24 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-04-26 17:22:07 -07:00
binfmt_script.c
block_dev.c	block: detach bdev inode from its wb in __blkdev_put()	2015-12-04 11:02:17 -07:00
buffer.c	vfs: remove unused wrapper block_page_mkwrite()	2015-11-11 02:19:33 -05:00
char_dev.c	fs/char_dev.c: fix incorrect documentation for unregister_chrdev_region	2015-08-05 13:49:35 -07:00
compat_binfmt_elf.c
compat_ioctl.c	i2c-dev: Fix typo in ioctl name reference	2015-10-23 23:26:43 +02:00
compat.c
coredump.c	coredump: change zap_threads() and zap_process() to use for_each_thread()	2015-11-06 17:50:42 -08:00
dax.c	dax: disable pmd mappings	2015-11-16 23:54:45 -08:00
dcache.c	dcache: Reduce the scope of i_lock in d_splice_alias	2015-08-21 02:34:37 -04:00
dcookies.c
direct-io.c	fix the regression from "direct-io: Fix negative return from dio read beyond eof"	2015-12-08 15:02:42 -05:00
drop_caches.c	inode: convert inode_sb_list_lock to per-sb	2015-08-17 18:39:46 -04:00
eventfd.c
eventpoll.c
exec.c	vfs: Commit to never having exectuables on proc and sysfs.	2015-07-10 10:39:25 -05:00
fcntl.c
fhandle.c	vfs: read file_handle only once in handle_to_path	2015-06-02 10:29:07 -07:00
file_table.c	fs, file table: reinit files_stat.max_files after deferred memory initialisation	2015-08-07 04:39:40 +03:00
file.c	vfs: clear remainder of 'full_fds_bits' in dup_fd()	2015-11-05 23:05:32 -08:00
filesystems.c
fs_pin.c	fs_pin: Allow for the possibility that m_list or s_list go unused.	2015-04-09 11:39:55 -05:00
fs_struct.c
fs-writeback.c	fs: fix writeback.c kernel-doc warnings	2015-11-11 02:18:27 -05:00
inode.c	fs: fix inode.c kernel-doc warning	2015-11-11 02:18:27 -05:00
internal.h	inode: rename i_wb_list to i_io_list	2015-08-17 23:38:10 -04:00
ioctl.c
Kconfig	dax: disable pmd mappings	2015-11-16 23:54:45 -08:00
Kconfig.binfmt	mm: split ET_DYN ASLR from mmap ASLR	2015-04-14 16:49:05 -07:00
libfs.c	fs: Set the size of empty dirs to 0.	2015-08-12 15:28:45 -05:00
locks.c	locks: cleanup posix_lock_inode_wait and flock_lock_inode_wait	2015-10-22 14:57:42 -04:00
Makefile	ext4: promote ext4 over ext2 in the default probe order	2015-10-15 10:33:21 -04:00
mbcache.c
mount.h	fs: use seq_open_private() for proc_mounts	2015-06-30 19:44:56 -07:00
mpage.c	mm, fs: introduce mapping_gfp_constraint()	2015-11-06 17:50:42 -08:00
namei.c	Don't reset ->total_link_count on nested calls of vfs_path_lookup()	2015-12-06 12:33:02 -05:00
namespace.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2015-09-01 16:13:25 -07:00
no-block.c
nsfs.c	fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void	2015-09-11 15:21:34 -07:00
open.c	vfs: Commit to never having exectuables on proc and sysfs.	2015-07-10 10:39:25 -05:00
pipe.c	fs/pipe.c: return error code rather than 0 in pipe_write()	2015-11-11 02:18:26 -05:00
pnode.c	mnt: Don't propagate unmounts to locked mounts	2015-04-02 20:34:20 -05:00
pnode.h	mnt: Clarify and correct the disconnect logic in umount_tree	2015-07-22 20:33:27 -05:00
posix_acl.c	xattr handlers: Pass handler to operations instead of flags	2015-11-13 20:34:32 -05:00
proc_namespace.c	fs: use seq_open_private() for proc_mounts	2015-06-30 19:44:56 -07:00
read_write.c	new_sync_write(): discard ->ki_pos unless the return value is positive	2015-04-11 22:29:46 -04:00
readdir.c
select.c	locking/arch: Rename set_mb() to smp_store_mb()	2015-05-19 08:32:00 +02:00
seq_file.c	fs, seqfile: always allow oom killer	2015-11-06 17:50:42 -08:00
signalfd.c	signalfd: fix information leak in signalfd_copyinfo	2015-08-07 04:39:40 +03:00
splice.c	vfs: Avoid softlockups with sendfile(2)	2015-11-23 21:15:30 -05:00
stack.c
stat.c	fs/stat.c: remove unnecessary new_valid_dev() check	2015-11-09 15:11:24 -08:00
statfs.c
super.c	Merge branch 'superblock-scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-next	2015-08-21 02:31:20 -04:00
sync.c	fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback	2015-11-06 17:50:42 -08:00
timerfd.c
userfaultfd.c	userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"	2015-09-22 15:09:53 -07:00
utimes.c
xattr.c	9p: xattr simplifications	2015-11-13 20:34:33 -05:00