linux/fs/btrfs
Filipe Manana ff1f8250a9 Btrfs: fix race between block group creation and their cache writeout
So creating a block group has 2 distinct phases:

Phase 1 - creates the btrfs_block_group_cache item and adds it to the
rbtree fs_info->block_group_cache_tree and to the corresponding list
space_info->block_groups[];

Phase 2 - adds the block group item to the extent tree and corresponding
items to the chunk tree.

The first phase adds the block_group_cache_item to a list of pending block
groups in the transaction handle, and phase 2 happens when
btrfs_end_transaction() is called against the transaction handle.

It happens that once phase 1 completes, other concurrent tasks that use
their own transaction handle, but points to the same running transaction
(struct btrfs_trans_handle->transaction), can use this block group for
space allocations and therefore mark it dirty. Dirty block groups are
tracked in a list belonging to the currently running transaction (struct
btrfs_transaction) and not in the transaction handle (btrfs_trans_handle).

This is a problem because once a task calls btrfs_commit_transaction(),
it calls btrfs_start_dirty_block_groups() which will see all dirty block
groups and attempt to start their writeout, including those that are
still attached to the transaction handle of some concurrent task that
hasn't called btrfs_end_transaction() yet - which means those block
groups haven't gone through phase 2 yet and therefore when
write_one_cache_group() is called, it won't find the block group items
in the extent tree and abort the current transaction with -ENOENT,
turning the fs into readonly mode and require a remount.

Fix this by ignoring -ENOENT when looking for block group items in the
extent tree when we attempt to start the writeout of the block group
caches outside the critical section of the transaction commit. We will
try again later during the critical section and if there we still don't
find the block group item in the extent tree, we then abort the current
transaction.

This issue happened twice, once while running fstests btrfs/067 and once
for btrfs/078, which produced the following trace:

[ 3278.703014] WARNING: CPU: 7 PID: 18499 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
[ 3278.707329] BTRFS: Transaction aborted (error -2)
(...)
[ 3278.731555] Call Trace:
[ 3278.732396]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
[ 3278.733860]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
[ 3278.735312]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
[ 3278.736874]  [<ffffffffa03ada6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
[ 3278.738302]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
[ 3278.739520]  [<ffffffffa03ada6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
[ 3278.741222]  [<ffffffffa03b9e56>] write_one_cache_group+0xae/0xbf [btrfs]
[ 3278.742797]  [<ffffffffa03c487b>] btrfs_start_dirty_block_groups+0x170/0x2b2 [btrfs]
[ 3278.744492]  [<ffffffffa03d309c>] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
[ 3278.746084]  [<ffffffff8107d33d>] ? trace_hardirqs_on+0xd/0xf
[ 3278.747249]  [<ffffffffa03e5660>] btrfs_sync_file+0x313/0x387 [btrfs]
[ 3278.748744]  [<ffffffff8117acad>] vfs_fsync_range+0x95/0xa4
[ 3278.749958]  [<ffffffff81435b54>] ? ret_from_sys_call+0x1d/0x58
[ 3278.751218]  [<ffffffff8117acd8>] vfs_fsync+0x1c/0x1e
[ 3278.754197]  [<ffffffff8117ae54>] do_fsync+0x34/0x4e
[ 3278.755192]  [<ffffffff8117b07c>] SyS_fsync+0x10/0x14
[ 3278.756236]  [<ffffffff81435b32>] system_call_fastpath+0x12/0x17
[ 3278.757366] ---[ end trace 9a4d4df4969709aa ]---

Fixes: 1bbc621ef2 ("Btrfs: allow block group cache writeout
                      outside critical section in commit")

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-05-11 07:59:10 -07:00
..
tests Btrfs: qgroup: cleanup, remove an unsued parameter in btrfs_create_qgroup(). 2015-04-13 07:52:44 -07:00
acl.c btrfs: remove useless ACL check 2014-06-09 17:20:42 -07:00
async-thread.c btrfs: use correct type for workqueue flags 2015-02-16 18:48:47 +01:00
async-thread.h btrfs: use correct type for workqueue flags 2015-02-16 18:48:47 +01:00
backref.c btrfs: use explicit initializer for seq_elem 2015-03-03 17:23:59 +01:00
backref.h btrfs: cleanup, remove inode_item_info helper 2015-01-14 19:23:47 +01:00
btrfs_inode.h Btrfs: fix metadata inconsistencies after directory fsync 2015-03-26 17:56:23 -07:00
check-integrity.c Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:52:48 -07:00
check-integrity.h block: submit_bio_wait() conversions 2013-11-24 16:33:41 -07:00
compression.c Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:52:48 -07:00
compression.h btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00
ctree.c Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:52:48 -07:00
ctree.h btrfs: Don't allow subvolid >= (1 << BTRFS_QGROUP_LEVEL_SHIFT) to be created 2015-04-13 07:52:54 -07:00
delayed-inode.c Btrfs: fill ->last_trans for delayed inode in btrfs_fill_inode. 2015-04-26 06:27:03 -07:00
delayed-inode.h Btrfs: introduce the delayed inode ref deletion for the single link inode 2014-01-28 13:20:09 -08:00
delayed-ref.c Btrfs: account for crcs in delayed ref processing 2015-04-10 14:04:47 -07:00
delayed-ref.h Btrfs: account for crcs in delayed ref processing 2015-04-10 14:04:47 -07:00
dev-replace.c btrfs: cleanup 64bit/32bit divs, compile time constants 2015-03-03 17:23:57 +01:00
dev-replace.h
dir-item.c Btrfs: make xattr replace operations atomic 2014-11-20 17:20:07 -08:00
disk-io.c btrfs: Fix NO_SPACE bug caused by delayed-iput 2015-04-13 07:27:41 -07:00
disk-io.h Btrfs: disk-io: replace root args iff only fs_info used 2015-02-16 18:48:43 +01:00
export.c btrfs: kill the key type accessor helpers 2014-09-17 13:37:12 -07:00
export.h
extent_io.c Btrfs: btrfs_release_extent_buffer_page didn't free pages of dummy extent 2015-04-29 13:22:09 -07:00
extent_io.h btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00
extent_map.c Btrfs: do not move em to modified list when unpinning 2014-11-21 11:59:54 -08:00
extent_map.h Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
extent-tree.c Btrfs: fix race between block group creation and their cache writeout 2015-05-11 07:59:10 -07:00
file-item.c Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:52:48 -07:00
file.c btrfs: qgroup: do a reservation in a higher level. 2015-04-13 07:52:50 -07:00
free-space-cache.c Btrfs: fix crash after inode cache writeback failure 2015-05-11 07:59:10 -07:00
free-space-cache.h Btrfs: allow block group cache writeout outside critical section in commit 2015-04-10 14:07:22 -07:00
hash.c btrfs: LLVMLinux: Remove VLAIS 2014-10-14 10:51:22 +02:00
hash.h Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
inode-item.c Btrfs: fix fsync log replay for inodes with a mix of regular refs and extrefs 2015-01-21 18:02:05 -08:00
inode-map.c Btrfs: allow block group cache writeout outside critical section in commit 2015-04-10 14:07:22 -07:00
inode-map.h
inode.c Btrfs: fill ->last_trans for delayed inode in btrfs_fill_inode. 2015-04-26 06:27:03 -07:00
ioctl.c btrfs: unlock i_mutex after attempting to delete subvolume during send 2015-04-26 06:27:02 -07:00
Kconfig rcu: Make SRCU optional by using CONFIG_SRCU 2015-01-06 11:04:29 -08:00
locking.c btrfs: fix lockups from btrfs_clear_path_blocking 2014-11-19 10:34:35 -08:00
locking.h btrfs: fix lockups from btrfs_clear_path_blocking 2014-11-19 10:34:35 -08:00
lzo.c btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00
Makefile Btrfs: add sanity tests for new qgroup accounting code 2014-06-09 17:20:49 -07:00
math.h btrfs: cleanup 64bit/32bit divs, compile time constants 2015-03-03 17:23:57 +01:00
ordered-data.c Btrfs: fix panic when starting bg cache writeout after IO error 2015-05-11 07:59:10 -07:00
ordered-data.h Btrfs: collect only the necessary ordered extents on ranged fsync 2014-11-21 11:59:56 -08:00
orphan.c btrfs: kill the key type accessor helpers 2014-09-17 13:37:12 -07:00
print-tree.c btrfs: remove parameter blocksize from read_tree_block 2014-10-02 17:14:50 +02:00
print-tree.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
props.c btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00
props.h Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
qgroup.c btrfs: quota: Automatically update related qgroups or mark INCONSISTENT flags when assigning/deleting a qgroup relations. 2015-04-13 07:52:59 -07:00
qgroup.h btrfs: qgroup: do a reservation in a higher level. 2015-04-13 07:52:50 -07:00
raid56.c Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:52:48 -07:00
raid56.h Btrfs: Make raid_map array be inlined in btrfs_bio structure 2015-01-21 18:06:47 -08:00
rcu-string.h
reada.c Btrfs: add ref_count and free function for btrfs_bio 2015-01-21 18:06:48 -08:00
relocation.c btrfs: qgroup: do a reservation in a higher level. 2015-04-13 07:52:50 -07:00
root-tree.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
scrub.c Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:52:48 -07:00
send.c Btrfs: incremental send, remove dead code 2015-03-26 17:55:52 -07:00
send.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
struct-funcs.c
super.c btrfs: cleanup orphans while looking up default subvolume 2015-03-26 18:10:25 -07:00
sysfs.c btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00
sysfs.h btrfs: switch helper macros to static inlines in sysfs.h 2015-03-03 17:24:02 +01:00
transaction.c Btrfs: allow block group cache writeout outside critical section in commit 2015-04-10 14:07:22 -07:00
transaction.h Btrfs: allow block group cache writeout outside critical section in commit 2015-04-10 14:07:22 -07:00
tree-defrag.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
tree-log.c btrfs: actively run the delayed refs while deleting large files 2015-04-10 14:00:14 -07:00
tree-log.h Btrfs: fix metadata inconsistencies after directory fsync 2015-03-26 17:56:23 -07:00
ulist.c Btrfs: do not export ulist functions 2014-01-29 07:06:27 -08:00
ulist.h Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch 2014-08-15 07:43:19 -07:00
uuid-tree.c Btrfs: make btrfs_search_forward return with nodes unlocked 2014-09-17 13:38:02 -07:00
volumes.c Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole 2015-04-26 06:26:59 -07:00
volumes.h btrfs: remove unused fs_info arg from btrfs_close_extra_devices() 2015-02-16 18:48:45 +01:00
xattr.c btrfs: don't accept bare namespace as a valid xattr 2015-03-26 18:10:24 -07:00
xattr.h btrfs: use generic posix ACL infrastructure 2014-01-25 23:58:18 -05:00
zlib.c btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00