linux/fs/xfs
Dave Chinner 0fe0bbe00a xfs: bunmapi has unnecessary AG lock ordering issues
large directory block size operations are assert failing because
xfs_bunmapi() is not completely removing fragmented directory blocks
like so:

XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
....
Call Trace:
 xfs_dir2_shrink_inode+0x1a8/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_rename+0xb79/0xc50
 ? avc_has_perm+0x8d/0x1a0
 ? avc_has_perm_noaudit+0x9a/0x120
 xfs_vn_rename+0xdb/0x150
 vfs_rename+0x719/0xb50
 ? __lookup_hash+0x6a/0xa0
 do_renameat2+0x413/0x5e0
 __x64_sys_rename+0x45/0x50
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

We are aborting the bunmapi() pass because of this specific chunk of
code:

                /*
                 * Make sure we don't touch multiple AGF headers out of order
                 * in a single transaction, as that could cause AB-BA deadlocks.
                 */
                if (!wasdel && !isrt) {
                        agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
                        if (prev_agno != NULLAGNUMBER && prev_agno > agno)
                                break;
                        prev_agno = agno;
                }

This is designed to prevent deadlocks in AGF locking when freeing
multiple extents by ensuring that we only ever lock in increasing
AG number order. Unfortunately, this also violates the "bunmapi will
always succeed" semantic that some high level callers depend on,
such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
xfs_inactive_symlink_rmt().

This AG lock ordering was introduced back in 2017 to fix deadlocks
triggered by generic/299 as reported here:

https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/

This codebase is old enough that it was before we were defering all
AG based extent freeing from within xfs_bunmapi(). THat is, we never
actually lock AGs in xfs_bunmapi() any more - every non-rt based
extent free is added to the defer ops list, as is all BMBT block
freeing. And RT extents are not RT based, so there's no lock
ordering issues associated with them.

Hence this AGF lock ordering code is both broken and dead. Let's
just remove it so that the large directory block code works reliably
again.

Tested against xfs/538 and generic/299 which is the original test
that exposed the deadlocks that this code fixed.

Fixes: 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-05-27 08:11:24 -07:00
..
libxfs xfs: bunmapi has unnecessary AG lock ordering issues 2021-05-27 08:11:24 -07:00
scrub xfs: fix deadlock retry tracepoint arguments 2021-05-20 08:31:22 -07:00
Kconfig
kmem.c
kmem.h
Makefile
mrlock.h
xfs_acl.c xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_acl.h fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
xfs_aops.c iomap: remove unused private field from ioend 2021-05-04 08:54:29 -07:00
xfs_aops.h
xfs_attr_inactive.c
xfs_attr_list.c xfs: rename and simplify xfs_bmap_one_block 2021-04-15 09:35:50 -07:00
xfs_bio_io.c block: Add bio_max_segs 2021-02-26 15:49:51 -07:00
xfs_bmap_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_bmap_item.h
xfs_bmap_util.c xfs: retry allocations when locality-based search fails 2021-05-20 08:28:34 -07:00
xfs_bmap_util.h
xfs_buf_item_recover.c
xfs_buf_item.c xfs: optimise xfs_buf_item_size/format for contiguous regions 2021-03-25 16:47:51 -07:00
xfs_buf_item.h
xfs_buf.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_buf.h xfs: don't drain buffer lru on freeze and read-only remount 2021-01-22 16:54:50 -08:00
xfs_dir2_readdir.c xfs: remove XFS_IFINLINE 2021-04-15 09:35:51 -07:00
xfs_discard.c
xfs_discard.h
xfs_dquot_item_recover.c
xfs_dquot_item.c
xfs_dquot_item.h
xfs_dquot.c xfs: move the XFS_IFEXTENTS check into xfs_iread_extents 2021-04-15 09:35:50 -07:00
xfs_dquot.h
xfs_error.c xfs: add error injection for per-AG resv failure 2021-03-25 16:47:53 -07:00
xfs_error.h
xfs_export.c
xfs_export.h
xfs_extent_busy.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_extent_busy.h treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_extfree_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_extfree_item.h
xfs_file.c xfs: move the di_flags2 field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_filestream.c
xfs_filestream.h xfs: move the di_flags field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_fsmap.c xfs: drop freeze protection when running GETFSMAP 2021-03-24 10:36:05 -07:00
xfs_fsmap.h
xfs_fsops.c xfs: remove obsolete AGF counter debugging 2021-04-29 07:44:18 -07:00
xfs_fsops.h xfs: get rid of xfs_growfs_{data,log}_t 2021-02-03 09:18:50 -08:00
xfs_globals.c xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_health.c
xfs_icache.c xfs: move the xfs_can_free_eofblocks call under the IOLOCK 2021-04-07 14:38:16 -07:00
xfs_icache.h xfs: rename block gc start and stop functions 2021-02-03 09:18:50 -08:00
xfs_icreate_item.c
xfs_icreate_item.h
xfs_inode_item_recover.c xfs: rename struct xfs_legacy_ictimestamp 2021-04-22 18:29:25 -07:00
xfs_inode_item.c xfs: rename struct xfs_legacy_ictimestamp 2021-04-22 18:29:25 -07:00
xfs_inode_item.h
xfs_inode.c xfs: validate extsz hints against rt extent size when rtinherit is set 2021-05-24 18:01:04 -07:00
xfs_inode.h xfs: move the di_crtime field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_ioctl32.c xfs: convert to fileattr 2021-04-12 15:04:29 +02:00
xfs_ioctl32.h xfs: convert to fileattr 2021-04-12 15:04:29 +02:00
xfs_ioctl.c xfs: validate extsz hints against rt extent size when rtinherit is set 2021-05-24 18:01:04 -07:00
xfs_ioctl.h xfs: convert to fileattr 2021-04-12 15:04:29 +02:00
xfs_iomap.c xfs: remove XFS_IFEXTENTS 2021-04-15 09:35:51 -07:00
xfs_iomap.h
xfs_iops.c New code for 5.13: 2021-04-29 10:43:51 -07:00
xfs_iops.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_itable.c xfs: move the di_crtime field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_itable.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_iwalk.c xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_iwalk.h
xfs_linux.h xfs: move the di_flags field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_log_cil.c
xfs_log_priv.h
xfs_log_recover.c xfs: remove the di_dmevmask and di_dmstate fields from struct xfs_icdinode 2021-04-07 14:37:03 -07:00
xfs_log.c xfs: don't allow log writes if the data device is readonly 2021-05-04 08:43:27 -07:00
xfs_log.h xfs: cover the log during log quiesce 2021-01-22 16:54:51 -08:00
xfs_message.c
xfs_message.h xfs: validate extsz hints against rt extent size when rtinherit is set 2021-05-24 18:01:04 -07:00
xfs_mount.c xfs: set aside allocation btree blocks from block reservation 2021-04-29 07:45:44 -07:00
xfs_mount.h xfs: introduce in-core global counter of allocbt blocks 2021-04-29 07:45:44 -07:00
xfs_mru_cache.c xfs: set WQ_SYSFS on all workqueues in debug mode 2021-02-03 09:18:49 -08:00
xfs_mru_cache.h
xfs_ondisk.h xfs: rename struct xfs_legacy_ictimestamp 2021-04-22 18:29:25 -07:00
xfs_pnfs.c xfs: move the di_size field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_pnfs.h
xfs_pwork.c xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_pwork.h xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_qm_bhv.c xfs: move the di_projid field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_qm_syscalls.c xfs: move the di_size field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_qm.c xfs: move the XFS_IFEXTENTS check into xfs_iread_extents 2021-04-15 09:35:50 -07:00
xfs_qm.h
xfs_quota.h xfs: remove xfs_qm_vop_chown_reserve 2021-02-03 09:18:49 -08:00
xfs_quotaops.c xfs: move the di_nblocks field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_refcount_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_refcount_item.h
xfs_reflink.c xfs: fix xfs_reflink_unshare usage of filemap_write_and_wait_range 2021-04-29 07:45:44 -07:00
xfs_reflink.h
xfs_rmap_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_rmap_item.h
xfs_rtalloc.c xfs: move the di_flags field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_rtalloc.h xfs: remove xfs_buf_t typedef 2020-12-16 16:07:34 -08:00
xfs_stats.c
xfs_stats.h
xfs_super.c xfs: move the di_flags field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_super.h xfs: remove xfs_quiesce_attr declaration 2021-04-16 08:28:36 -07:00
xfs_symlink.c New code for 5.13: 2021-04-29 10:43:51 -07:00
xfs_symlink.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_sysctl.c xfs: restore speculative_cow_prealloc_lifetime sysctl 2021-02-24 10:16:08 -08:00
xfs_sysctl.h xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_sysfs.c
xfs_sysfs.h
xfs_trace.c xfs: add a tracepoint for blockgc scans 2021-02-03 09:18:49 -08:00
xfs_trace.h xfs: move the di_size field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_trans_ail.c
xfs_trans_buf.c xfs: remove xfs_buf_t typedef 2020-12-16 16:07:34 -08:00
xfs_trans_dquot.c xfs: shut down the filesystem if we screw up quota reservation 2021-02-03 09:18:49 -08:00
xfs_trans_priv.h
xfs_trans.c xfs: update superblock counters correctly for !lazysbcount 2021-04-29 07:44:18 -07:00
xfs_trans.h xfs: remove obsolete AGF counter debugging 2021-04-29 07:44:18 -07:00
xfs_xattr.c xfs: prevent metadata files from being inactivated 2021-03-25 16:47:50 -07:00
xfs.h