linux/fs/xfs
Darrick J. Wong 4c6dbfd275 xfs: attach dquots to inode before reading data/cow fork mappings
I've been running near-continuous integration testing of online fsck,
and I've noticed that once a day, one of the ARM VMs will fail the test
with out of order records in the data fork.

xfs/804 races fsstress with online scrub (aka scan but do not change
anything), so I think this might be a bug in the core xfs code.  This
also only seems to trigger if one runs the test for more than ~6 minutes
via TIME_FACTOR=13 or something.
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/tree/tests/xfs/804?h=djwong-wtf

I added a debugging patch to the kernel to check the data fork extents
after taking the ILOCK, before dropping ILOCK, and before and after each
bmapping operation.  So far I've narrowed it down to the delalloc code
inserting a record in the wrong place in the iext tree:

xfs_bmap_add_extent_hole_delay, near line 2691:

	case 0:
		/*
		 * New allocation is not contiguous with another
		 * delayed allocation.
		 * Insert a new entry.
		 */
		oldlen = newlen = 0;
		xfs_iunlock_check_datafork(ip);		<-- ok here
		xfs_iext_insert(ip, icur, new, state);
		xfs_iunlock_check_datafork(ip);		<-- bad here
		break;
	}

I recorded the state of the data fork mappings and iext cursor state
when a corrupt data fork is detected immediately after the
xfs_bmap_add_extent_hole_delay call in xfs_bmapi_reserve_delalloc:

ino 0x140bb3 func xfs_bmapi_reserve_delalloc line 4164 data fork:
    ino 0x140bb3 nr 0x0 nr_real 0x0 offset 0xb9 blockcount 0x1f startblock 0x935de2 state 1
    ino 0x140bb3 nr 0x1 nr_real 0x1 offset 0xe6 blockcount 0xa startblock 0xffffffffe0007 state 0
    ino 0x140bb3 nr 0x2 nr_real 0x1 offset 0xd8 blockcount 0xe startblock 0x935e01 state 0

Here we see that a delalloc extent was inserted into the wrong position
in the iext leaf, same as all the other times.  The extra trace data I
collected are as follows:

ino 0x140bb3 fork 0 oldoff 0xe6 oldlen 0x4 oldprealloc 0x6 isize 0xe6000
    ino 0x140bb3 oldgotoff 0xea oldgotstart 0xfffffffffffffffe oldgotcount 0x0 oldgotstate 0
    ino 0x140bb3 crapgotoff 0x0 crapgotstart 0x0 crapgotcount 0x0 crapgotstate 0
    ino 0x140bb3 freshgotoff 0xd8 freshgotstart 0x935e01 freshgotcount 0xe freshgotstate 0
    ino 0x140bb3 nowgotoff 0xe6 nowgotstart 0xffffffffe0007 nowgotcount 0xa nowgotstate 0
    ino 0x140bb3 oldicurpos 1 oldleafnr 2 oldleaf 0xfffffc00f0609a00
    ino 0x140bb3 crapicurpos 2 crapleafnr 2 crapleaf 0xfffffc00f0609a00
    ino 0x140bb3 freshicurpos 1 freshleafnr 2 freshleaf 0xfffffc00f0609a00
    ino 0x140bb3 newicurpos 1 newleafnr 3 newleaf 0xfffffc00f0609a00

The first line shows that xfs_bmapi_reserve_delalloc was called with
whichfork=XFS_DATA_FORK, off=0xe6, len=0x4, prealloc=6.

The second line ("oldgot") shows the contents of @got at the beginning
of the call, which are the results of the first iext lookup in
xfs_buffered_write_iomap_begin.

Line 3 ("crapgot") is the result of duplicating the cursor at the start
of the body of xfs_bmapi_reserve_delalloc and performing a fresh lookup
at @off.

Line 4 ("freshgot") is the result of a new xfs_iext_get_extent right
before the call to xfs_bmap_add_extent_hole_delay.  Totally garbage.

Line 5 ("nowgot") is contents of @got after the
xfs_bmap_add_extent_hole_delay call.

Line 6 is the contents of @icur at the beginning fo the call.  Lines 7-9
are the contents of the iext cursors at the point where the block
mappings were sampled.

I think @oldgot is a HOLESTARTBLOCK extent because the first lookup
didn't find anything, so we filled in imap with "fake hole until the
end".  At the time of the first lookup, I suspect that there's only one
32-block unwritten extent in the mapping (hence oldicurpos==1) but by
the time we get to recording crapgot, crapicurpos==2.

Dave then added:

Ok, that's much simpler to reason about, and implies the smoke is
coming from xfs_buffered_write_iomap_begin() or
xfs_bmapi_reserve_delalloc(). I suspect the former - it does a lot
of stuff with the ILOCK_EXCL held.....

.... including calling xfs_qm_dqattach_locked().

xfs_buffered_write_iomap_begin
  ILOCK_EXCL
  look up icur
  xfs_qm_dqattach_locked
    xfs_qm_dqattach_one
      xfs_qm_dqget_inode
        dquot cache miss
        xfs_iunlock(ip, XFS_ILOCK_EXCL);
        error = xfs_qm_dqread(mp, id, type, can_alloc, &dqp);
        xfs_ilock(ip, XFS_ILOCK_EXCL);
  ....
  xfs_bmapi_reserve_delalloc(icur)

Yup, that's what is letting the magic smoke out -
xfs_qm_dqattach_locked() can cycle the ILOCK. If that happens, we
can pass a stale icur to xfs_bmapi_reserve_delalloc() and it all
goes downhill from there.

Back to Darrick now:

So.  Fix this by moving the dqattach_locked call up before we take the
ILOCK, like all the other callers in that file.

Fixes: a526c85c22 ("xfs: move xfs_file_iomap_begin_delay around") # goes further back than this
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-30 08:55:18 -08:00
..
libxfs xfs: add debug knob to slow down write for fun 2022-11-28 17:54:49 -08:00
scrub xfs: check inode core when scrubbing metadata files 2022-11-16 16:11:51 -08:00
Kconfig
kmem.c mm: introduce memalloc_retry_wait() 2022-01-15 16:30:29 +02:00
kmem.h xfs: remove kmem_zone typedef 2021-10-22 16:00:31 -07:00
Makefile - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
mrlock.h
xfs_acl.c xfs: move xfs_attr_use_log_assist usage out of libxfs 2022-05-27 10:34:04 +10:00
xfs_acl.h xfs: improve __xfs_set_acl 2022-04-26 13:34:42 +10:00
xfs_aops.c xfs: add debug knob to slow down writeback for fun 2022-11-28 17:24:35 -08:00
xfs_aops.h
xfs_attr_inactive.c xfs: don't leak memory when attr fork loading fails 2022-07-20 16:40:39 -07:00
xfs_attr_item.c xfs: dump corrupt recovered log intent items to dmesg consistently 2022-10-31 08:58:20 -07:00
xfs_attr_item.h xfs: share xattr name and value buffers when logging xattr updates 2022-05-23 08:43:46 +10:00
xfs_attr_list.c xfs: use XFS_IFORK_Q to determine the presence of an xattr fork 2022-07-09 15:17:21 -07:00
xfs_bio_io.c fs/xfs: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:33 -06:00
xfs_bmap_item.c xfs: dump corrupt recovered log intent items to dmesg consistently 2022-10-31 08:58:20 -07:00
xfs_bmap_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_bmap_util.c xfs: xfs_bmap_punch_delalloc_range() should take a byte range 2022-11-29 09:09:17 +11:00
xfs_bmap_util.h xfs: xfs_bmap_punch_delalloc_range() should take a byte range 2022-11-29 09:09:17 +11:00
xfs_buf_item_recover.c xfs: convert buf_cancel_table allocation to kmalloc_array 2022-05-27 10:27:19 +10:00
xfs_buf_item.c xfs: log items should have a xlog pointer, not a mount 2022-03-20 08:59:49 -07:00
xfs_buf_item.h xfs: convert buffer log item flags to unsigned. 2022-04-21 10:46:40 +10:00
xfs_buf.c xfs: invalidate block device page cache during unmount 2022-11-30 08:55:18 -08:00
xfs_buf.h xfs: xfs_buf cache destroy isn't RCU safe 2022-07-20 16:40:39 -07:00
xfs_dir2_readdir.c xfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx 2022-10-04 16:39:58 +11:00
xfs_discard.c xfs: pass perag to xfs_alloc_read_agf() 2022-07-07 19:07:40 +10:00
xfs_discard.h
xfs_dquot_item_recover.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_dquot_item.c xfs: remove support for disabling quota accounting on a mounted file system 2021-08-06 11:05:36 -07:00
xfs_dquot_item.h xfs: remove support for disabling quota accounting on a mounted file system 2021-08-06 11:05:36 -07:00
xfs_dquot.c xfs: Fix comment typo 2022-07-22 10:58:39 -07:00
xfs_dquot.h xfs: remove warning counters from struct xfs_dquot_res 2022-05-11 17:12:09 +10:00
xfs_error.c xfs: add debug knob to slow down write for fun 2022-11-28 17:54:49 -08:00
xfs_error.h xfs: add debug knob to slow down writeback for fun 2022-11-28 17:24:35 -08:00
xfs_export.c xfs: convert remaining mount flags to state flags 2021-08-19 10:07:13 -07:00
xfs_export.h
xfs_extent_busy.c
xfs_extent_busy.h
xfs_extfree_item.c xfs: dump corrupt recovered log intent items to dmesg consistently 2022-10-31 08:58:20 -07:00
xfs_extfree_item.h xfs: refactor all the EFI/EFD log item sizeof logic 2022-10-31 08:58:20 -07:00
xfs_file.c xfs: write page faults in iomap are not buffered writes 2022-11-07 10:09:11 +11:00
xfs_filestream.c xfs: pass perag to xfs_alloc_read_agf() 2022-07-07 19:07:40 +10:00
xfs_filestream.h xfs: convert mount flags to features 2021-08-19 10:07:12 -07:00
xfs_fsmap.c xfs: make rtbitmap ILOCKing consistent when scanning the rt bitmap file 2022-11-16 15:25:03 -08:00
xfs_fsmap.h
xfs_fsops.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
xfs_fsops.h
xfs_globals.c xfs: Add larp debug option 2022-05-11 17:01:22 +10:00
xfs_health.c xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown 2021-08-19 10:07:13 -07:00
xfs_icache.c xfs: fix incorrect i_nlink caused by inode racing 2022-11-21 10:00:01 -08:00
xfs_icache.h xfs: introduce xfs_inodegc_push() 2022-06-23 13:34:38 -07:00
xfs_icreate_item.c xfs: fix potential log item leak 2022-05-04 11:45:11 +10:00
xfs_icreate_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_inode_item_recover.c xfs: clean up "%Ld/%Lu" which doesn't meet C standard 2022-09-19 06:47:14 +10:00
xfs_inode_item.c xfs: remove the redundant word in comment 2022-09-19 06:45:14 +10:00
xfs_inode_item.h xfs: aborting inodes on shutdown may need buffer lock 2022-03-29 18:21:59 -07:00
xfs_inode.c xfs: fix incorrect error-out in xfs_remove 2022-11-16 19:20:20 -08:00
xfs_inode.h - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
xfs_ioctl32.c xfs: Set up infrastructure for log attribute replay 2022-05-04 12:41:02 +10:00
xfs_ioctl32.h xfs: remove unused xfs_ioctl32.h declarations 2022-01-18 10:18:36 -08:00
xfs_ioctl.c xfs: convert XFS_IFORK_PTR to a static inline helper 2022-07-09 15:17:21 -07:00
xfs_ioctl.h xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls 2022-01-17 09:16:41 -08:00
xfs_iomap.c xfs: attach dquots to inode before reading data/cow fork mappings 2022-11-30 08:55:18 -08:00
xfs_iomap.h xfs: use iomap_valid method to detect stale cached iomaps 2022-11-29 09:09:17 +11:00
xfs_iops.c xfs: changes for 6.1-rc1 2022-10-10 20:32:10 -07:00
xfs_iops.h xfs: remove xfs_setattr_time() declaration 2022-09-19 06:53:14 +10:00
xfs_itable.c xfs: port to vfs{g,u}id_t and associated helpers 2022-09-19 06:54:14 +10:00
xfs_itable.h xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters 2022-04-13 07:02:45 +00:00
xfs_iunlink_item.c xfs: add in-memory iunlink log item 2022-07-14 11:47:42 +10:00
xfs_iunlink_item.h xfs: add in-memory iunlink log item 2022-07-14 11:47:42 +10:00
xfs_iwalk.c xfs: avoid buffer deadlocks when walking fs inodes 2021-08-09 11:13:16 -07:00
xfs_iwalk.h xfs: Decouple XFS_IBULK flags from XFS_IWALK flags 2022-04-13 07:02:44 +00:00
xfs_linux.h fs/xfs: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:33 -06:00
xfs_log_cil.c xfs: xlog_sync() manually adjusts grant head space 2022-07-07 18:56:09 +10:00
xfs_log_priv.h xfs: xlog_sync() manually adjusts grant head space 2022-07-07 18:56:09 +10:00
xfs_log_recover.c xfs: avoid a UAF when log intent item recovery fails 2022-10-18 14:39:29 -07:00
xfs_log.c xfs: Print XFS UUID on mount and umount events. 2022-11-16 19:20:21 -08:00
xfs_log.h xfs: move CIL ordering to the logvec chain 2022-07-07 18:56:08 +10:00
xfs_message.c Merge branch 'guilt/xfs-unsigned-flags-5.18' into xfs-5.19-for-next 2022-04-21 16:45:03 +10:00
xfs_message.h xfs: implement per-mount warnings for scrub and shrink usage 2022-05-27 10:31:34 +10:00
xfs_mount.c xfs: fix sb write verify for lazysbcount 2022-11-16 19:20:20 -08:00
xfs_mount.h - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
xfs_mru_cache.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_mru_cache.h
xfs_notify_failure.c xfs: changes for 6.1-rc1 2022-10-10 20:32:10 -07:00
xfs_ondisk.h xfs: fix memcpy fortify errors in EFI log format copying 2022-10-31 08:58:20 -07:00
xfs_pnfs.c xfs: use iomap_valid method to detect stale cached iomaps 2022-11-29 09:09:17 +11:00
xfs_pnfs.h
xfs_pwork.c
xfs_pwork.h
xfs_qm_bhv.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_qm_syscalls.c xfs: introduce xfs_inodegc_push() 2022-06-23 13:34:38 -07:00
xfs_qm.c New code for 6.0: 2022-08-13 13:50:11 -07:00
xfs_qm.h xfs: remove quota warning limit from struct xfs_quota_limits 2022-05-11 17:12:09 +10:00
xfs_quota.h xfs: queue inactivation immediately when quota is nearing enforcement 2021-08-09 10:52:18 -07:00
xfs_quotaops.c xfs: don't set quota warning values 2022-05-11 17:12:09 +10:00
xfs_refcount_item.c xfs: dump corrupt recovered log intent items to dmesg consistently 2022-10-31 08:58:20 -07:00
xfs_refcount_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_reflink.c xfs: simplify if-else condition in xfs_reflink_trim_around_shared 2022-09-19 06:50:14 +10:00
xfs_reflink.h xfs: pass perag to xfs_alloc_read_agf() 2022-07-07 19:07:40 +10:00
xfs_rmap_item.c xfs: dump corrupt recovered log intent items to dmesg consistently 2022-10-31 08:58:20 -07:00
xfs_rmap_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_rtalloc.c xfs: make rtbitmap ILOCKing consistent when scanning the rt bitmap file 2022-11-16 15:25:03 -08:00
xfs_rtalloc.h xfs: recalculate free rt extents after log recovery 2022-04-12 06:49:42 +10:00
xfs_stats.c xfs: replace unnecessary seq_printf with seq_puts 2022-09-19 06:48:14 +10:00
xfs_stats.h
xfs_super.c xfs: Print XFS UUID on mount and umount events. 2022-11-16 19:20:21 -08:00
xfs_super.h xfs: implement ->notify_failure() for XFS 2022-07-17 17:14:30 -07:00
xfs_symlink.c xfs: replace inode fork size macros with functions 2022-07-12 11:17:27 -07:00
xfs_symlink.h
xfs_sysctl.c
xfs_sysctl.h xfs: Add larp debug option 2022-05-11 17:01:22 +10:00
xfs_sysfs.c xfs: Add larp debug option 2022-05-11 17:01:22 +10:00
xfs_sysfs.h xfs: Fix unreferenced object reported by kmemleak in xfs_sysfs_init() 2022-10-20 09:42:56 -07:00
xfs_trace.c xfs: add debug knob to slow down writeback for fun 2022-11-28 17:24:35 -08:00
xfs_trace.h xfs: add debug knob to slow down write for fun 2022-11-28 17:54:49 -08:00
xfs_trans_ail.c xfs: shut up -Wuninitialized in xfsaild_push 2022-11-30 08:55:18 -08:00
xfs_trans_buf.c xfs: introduce xfs_buf_daddr() 2021-08-19 10:07:14 -07:00
xfs_trans_dquot.c xfs: remove quota warning limit from struct xfs_quota_limits 2022-05-11 17:12:09 +10:00
xfs_trans_priv.h xfs: convert log vector chain to use list heads 2022-07-07 18:55:59 +10:00
xfs_trans.c xfs: introduce in-memory inode unlink log items 2022-07-14 09:21:42 -07:00
xfs_trans.h xfs: introduce in-memory inode unlink log items 2022-07-14 09:21:42 -07:00
xfs_xattr.c xfs: use memcpy, not strncpy, to format the attr prefix during listxattr 2022-11-30 08:55:18 -08:00
xfs_xattr.h xfs: move xfs_attr_use_log_assist usage out of libxfs 2022-05-27 10:34:04 +10:00
xfs.h