linux/fs/btrfs
Filipe Manana 8e7f82deb0 btrfs: fix race between reading a directory and adding entries to it
When opening a directory (opendir(3)) or rewinding it (rewinddir(3)), we
are not holding the directory's inode locked, and this can result in later
attempting to add two entries to the directory with the same index number,
resulting in a transaction abort, with -EEXIST (-17), when inserting the
second delayed dir index. This results in a trace like the following:

  Sep 11 22:34:59 myhostname kernel: BTRFS error (device dm-3): err add delayed dir index item(name: cockroach-stderr.log) into the insertion tree of the delayed node(root id: 5, inode id: 4539217, errno: -17)
  Sep 11 22:34:59 myhostname kernel: ------------[ cut here ]------------
  Sep 11 22:34:59 myhostname kernel: kernel BUG at fs/btrfs/delayed-inode.c:1504!
  Sep 11 22:34:59 myhostname kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  Sep 11 22:34:59 myhostname kernel: CPU: 0 PID: 7159 Comm: cockroach Not tainted 6.4.15-200.fc38.x86_64 #1
  Sep 11 22:34:59 myhostname kernel: Hardware name: ASUS ESC500 G3/P9D WS, BIOS 2402 06/27/2018
  Sep 11 22:34:59 myhostname kernel: RIP: 0010:btrfs_insert_delayed_dir_index+0x1da/0x260
  Sep 11 22:34:59 myhostname kernel: Code: eb dd 48 (...)
  Sep 11 22:34:59 myhostname kernel: RSP: 0000:ffffa9980e0fbb28 EFLAGS: 00010282
  Sep 11 22:34:59 myhostname kernel: RAX: 0000000000000000 RBX: ffff8b10b8f4a3c0 RCX: 0000000000000000
  Sep 11 22:34:59 myhostname kernel: RDX: 0000000000000000 RSI: ffff8b177ec21540 RDI: ffff8b177ec21540
  Sep 11 22:34:59 myhostname kernel: RBP: ffff8b110cf80888 R08: 0000000000000000 R09: ffffa9980e0fb938
  Sep 11 22:34:59 myhostname kernel: R10: 0000000000000003 R11: ffffffff86146508 R12: 0000000000000014
  Sep 11 22:34:59 myhostname kernel: R13: ffff8b1131ae5b40 R14: ffff8b10b8f4a418 R15: 00000000ffffffef
  Sep 11 22:34:59 myhostname kernel: FS:  00007fb14a7fe6c0(0000) GS:ffff8b177ec00000(0000) knlGS:0000000000000000
  Sep 11 22:34:59 myhostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Sep 11 22:34:59 myhostname kernel: CR2: 000000c00143d000 CR3: 00000001b3b4e002 CR4: 00000000001706f0
  Sep 11 22:34:59 myhostname kernel: Call Trace:
  Sep 11 22:34:59 myhostname kernel:  <TASK>
  Sep 11 22:34:59 myhostname kernel:  ? die+0x36/0x90
  Sep 11 22:34:59 myhostname kernel:  ? do_trap+0xda/0x100
  Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
  Sep 11 22:34:59 myhostname kernel:  ? do_error_trap+0x6a/0x90
  Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
  Sep 11 22:34:59 myhostname kernel:  ? exc_invalid_op+0x50/0x70
  Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
  Sep 11 22:34:59 myhostname kernel:  ? asm_exc_invalid_op+0x1a/0x20
  Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
  Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
  Sep 11 22:34:59 myhostname kernel:  btrfs_insert_dir_item+0x200/0x280
  Sep 11 22:34:59 myhostname kernel:  btrfs_add_link+0xab/0x4f0
  Sep 11 22:34:59 myhostname kernel:  ? ktime_get_real_ts64+0x47/0xe0
  Sep 11 22:34:59 myhostname kernel:  btrfs_create_new_inode+0x7cd/0xa80
  Sep 11 22:34:59 myhostname kernel:  btrfs_symlink+0x190/0x4d0
  Sep 11 22:34:59 myhostname kernel:  ? schedule+0x5e/0xd0
  Sep 11 22:34:59 myhostname kernel:  ? __d_lookup+0x7e/0xc0
  Sep 11 22:34:59 myhostname kernel:  vfs_symlink+0x148/0x1e0
  Sep 11 22:34:59 myhostname kernel:  do_symlinkat+0x130/0x140
  Sep 11 22:34:59 myhostname kernel:  __x64_sys_symlinkat+0x3d/0x50
  Sep 11 22:34:59 myhostname kernel:  do_syscall_64+0x5d/0x90
  Sep 11 22:34:59 myhostname kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
  Sep 11 22:34:59 myhostname kernel:  ? do_syscall_64+0x6c/0x90
  Sep 11 22:34:59 myhostname kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc

The race leading to the problem happens like this:

1) Directory inode X is loaded into memory, its ->index_cnt field is
   initialized to (u64)-1 (at btrfs_alloc_inode());

2) Task A is adding a new file to directory X, holding its vfs inode lock,
   and calls btrfs_set_inode_index() to get an index number for the entry.

   Because the inode's index_cnt field is set to (u64)-1 it calls
   btrfs_inode_delayed_dir_index_count() which fails because no dir index
   entries were added yet to the delayed inode and then it calls
   btrfs_set_inode_index_count(). This functions finds the last dir index
   key and then sets index_cnt to that index value + 1. It found that the
   last index key has an offset of 100. However before it assigns a value
   of 101 to index_cnt...

3) Task B calls opendir(3), ending up at btrfs_opendir(), where the VFS
   lock for inode X is not taken, so it calls btrfs_get_dir_last_index()
   and sees index_cnt still with a value of (u64)-1. Because of that it
   calls btrfs_inode_delayed_dir_index_count() which fails since no dir
   index entries were added to the delayed inode yet, and then it also
   calls btrfs_set_inode_index_count(). This also finds that the last
   index key has an offset of 100, and before it assigns the value 101
   to the index_cnt field of inode X...

4) Task A assigns a value of 101 to index_cnt. And then the code flow
   goes to btrfs_set_inode_index() where it increments index_cnt from
   101 to 102. Task A then creates a delayed dir index entry with a
   sequence number of 101 and adds it to the delayed inode;

5) Task B assigns 101 to the index_cnt field of inode X;

6) At some later point when someone tries to add a new entry to the
   directory, btrfs_set_inode_index() will return 101 again and shortly
   after an attempt to add another delayed dir index key with index
   number 101 will fail with -EEXIST resulting in a transaction abort.

Fix this by locking the inode at btrfs_get_dir_last_index(), which is only
only used when opening a directory or attempting to lseek on it.

Reported-by: ken <ken@bllue.org>
Link: https://lore.kernel.org/linux-btrfs/CAE6xmH+Lp=Q=E61bU+v9eWX8gYfLvu6jLYxjxjFpo3zHVPR0EQ@mail.gmail.com/
Reported-by: syzbot+d13490c82ad5353c779d@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
Fixes: 9b378f6ad4 ("btrfs: fix infinite directory reads")
CC: stable@vger.kernel.org # 6.5+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-14 23:24:42 +02:00
..
tests btrfs: tests: test invalid splitting when skipping pinned drop extent_map 2023-08-21 14:54:49 +02:00
accessors.c btrfs: add eb to btrfs_node_key_ptr_offset 2022-12-05 18:00:58 +01:00
accessors.h btrfs: use helper sizeof_field in struct accessors 2023-08-21 14:52:13 +02:00
acl.c fs: port acl to mnt_idmap 2023-01-19 09:24:28 +01:00
acl.h fs: port ->set_acl() to pass mnt_idmap 2023-01-19 09:24:27 +01:00
async-thread.c btrfs: use alloc_ordered_workqueue() to create ordered workqueues 2023-06-19 13:59:30 +02:00
async-thread.h btrfs: use alloc_ordered_workqueue() to create ordered workqueues 2023-06-19 13:59:30 +02:00
backref.c btrfs: remove v0 extent handling 2023-08-21 14:54:48 +02:00
backref.h btrfs: fix backref walking not returning all inode refs 2023-05-09 22:09:11 +02:00
bio.c btrfs: add an ordered_extent pointer to struct btrfs_bio 2023-06-19 13:59:36 +02:00
bio.h btrfs: add an ordered_extent pointer to struct btrfs_bio 2023-06-19 13:59:36 +02:00
block-group.c btrfs: fix race between finishing block group creation and its item update 2023-09-08 14:10:36 +02:00
block-group.h btrfs: rename add_new_free_space() to btrfs_add_new_free_space() 2023-08-21 14:52:12 +02:00
block-rsv.c btrfs: account block group tree when calculating global reserve size 2023-07-20 19:22:54 +02:00
block-rsv.h btrfs: move btrfs_check_trunc_cache_free_space into block-rsv.c 2023-06-19 13:59:24 +02:00
btrfs_inode.h btrfs: reduce the number of arguments to btrfs_run_delalloc_range 2023-08-21 14:52:14 +02:00
check-integrity.c btrfs: rename __btrfs_map_block to btrfs_map_block 2023-06-19 13:59:34 +02:00
check-integrity.h
compression.c btrfs: make btrfs_compressed_bioset static 2023-06-19 17:01:44 +02:00
compression.h btrfs: pass an ordered_extent to btrfs_submit_compressed_write 2023-06-19 13:59:36 +02:00
ctree.c btrfs: replace BUG_ON() at split_item() with proper error handling 2023-06-19 13:59:39 +02:00
ctree.h btrfs: fix infinite directory reads 2023-08-14 16:17:37 +02:00
defrag.c btrfs: drop gfp from parameter extent state helpers 2023-06-19 13:59:30 +02:00
defrag.h btrfs: move defrag related prototypes to their own header 2022-12-05 18:00:46 +01:00
delalloc-space.c btrfs: count extents before taking inode's spinlock when reserving metadata 2023-04-17 18:01:19 +02:00
delalloc-space.h btrfs: move delalloc space related prototypes to delalloc-space.h 2022-12-05 18:00:44 +01:00
delayed-inode.c btrfs: assert delayed node locked when removing delayed item 2023-09-08 14:20:40 +02:00
delayed-inode.h btrfs: fix infinite directory reads 2023-08-14 16:17:37 +02:00
delayed-ref.c btrfs: use a single switch statement when initializing delayed ref head 2023-06-19 13:59:32 +02:00
delayed-ref.h btrfs: use bool type for delayed ref head fields that are used as booleans 2023-06-19 13:59:32 +02:00
dev-replace.c btrfs: make find_first_extent_bit() return a boolean 2023-08-21 14:52:12 +02:00
dev-replace.h btrfs: move dev-replace prototypes into dev-replace.h 2022-12-05 18:00:47 +01:00
dir-item.c btrfs: move dir-item prototypes into dir-item.h 2022-12-05 18:00:46 +01:00
dir-item.h btrfs: move dir-item prototypes into dir-item.h 2022-12-05 18:00:46 +01:00
discard.c btrfs: unexport btrfs_run_discard_work and make it static 2023-06-19 13:59:25 +02:00
discard.h btrfs: unexport btrfs_run_discard_work and make it static 2023-06-19 13:59:25 +02:00
disk-io.c btrfs: fix a compilation error if DEBUG is defined in btree_dirty_folio 2023-09-08 14:11:04 +02:00
disk-io.h btrfs: make btrfs_cleanup_fs_roots() static 2023-08-21 14:52:18 +02:00
export.c btrfs: move super_block specific helpers into super.h 2022-12-05 18:00:47 +01:00
export.h btrfs: simplify generation check in btrfs_get_dentry 2022-12-05 18:00:41 +01:00
extent_io.c btrfs: don't clear uptodate on write errors 2023-09-13 18:41:07 +02:00
extent_io.h btrfs: zoned: introduce block group context to btrfs_eb_write_context 2023-08-21 14:52:19 +02:00
extent_map.c btrfs: fix incorrect splitting in btrfs_drop_extent_map_range 2023-08-18 14:38:10 +02:00
extent_map.h btrfs: pass the new logical address to split_extent_map 2023-06-19 13:59:33 +02:00
extent-io-tree.c btrfs: make find_first_extent_bit() return a boolean 2023-08-21 14:52:12 +02:00
extent-io-tree.h btrfs: make find_first_extent_bit() return a boolean 2023-08-21 14:52:12 +02:00
extent-tree.c btrfs: remove v0 extent handling 2023-08-21 14:54:48 +02:00
extent-tree.h btrfs: wait on uncached block groups on every allocation loop 2023-08-21 14:54:47 +02:00
file-item.c btrfs: scrub: avoid unnecessary csum tree search preparing stripes 2023-08-21 14:54:48 +02:00
file-item.h btrfs: scrub: avoid unnecessary csum tree search preparing stripes 2023-08-21 14:54:48 +02:00
file.c btrfs: file_remove_privs needs an exclusive lock in direct io write 2023-09-13 18:41:03 +02:00
file.h btrfs: use cached state when looking for delalloc ranges with fiemap 2022-12-05 18:00:56 +01:00
free-space-cache.c btrfs: zoned: no longer count fresh BG region as zone unusable 2023-08-21 14:52:19 +02:00
free-space-cache.h btrfs: move btrfs_check_trunc_cache_free_space into block-rsv.c 2023-06-19 13:59:24 +02:00
free-space-tree.c btrfs: rename add_new_free_space() to btrfs_add_new_free_space() 2023-08-21 14:52:12 +02:00
free-space-tree.h btrfs: make clear_cache mount option to rebuild FST without disabling it 2023-05-10 14:51:27 +02:00
fs.c btrfs: sysfs: update fs features directory asynchronously 2023-02-13 17:50:35 +01:00
fs.h btrfs: zoned: activate metadata block group on write time 2023-08-21 14:52:19 +02:00
inode-item.c btrfs: remove obsolete delayed ref throttling logic when truncating items 2023-04-17 18:01:19 +02:00
inode-item.h btrfs: move split_flags/combine_flags helpers to inode-item.h 2023-06-19 13:59:25 +02:00
inode.c btrfs: fix race between reading a directory and adding entries to it 2023-09-14 23:24:42 +02:00
ioctl.c btrfs: release path before inode lookup during the ino lookup ioctl 2023-09-08 14:10:40 +02:00
ioctl.h fs: port ->fileattr_set() to pass mnt_idmap 2023-01-19 09:24:27 +01:00
Kconfig MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org 2023-09-08 14:21:27 +02:00
locking.c btrfs: add block-group tree to lockdep classes 2023-06-19 13:59:35 +02:00
locking.h btrfs: do not block starts waiting on previous transaction commit 2023-09-08 14:10:49 +02:00
lru_cache.c btrfs: send: cache utimes operations for directories if possible 2023-02-15 19:38:50 +01:00
lru_cache.h btrfs: remove btrfs_lru_cache_is_full() inline function 2023-04-17 18:01:18 +02:00
lzo.c btrfs: disable allocation warnings for compression workspaces 2023-06-19 13:59:34 +02:00
Makefile btrfs: send: genericize the backref cache to allow it to be reused 2023-02-13 17:50:35 +01:00
messages.c btrfs: remove v0 extent handling 2023-08-21 14:54:48 +02:00
messages.h btrfs: remove v0 extent handling 2023-08-21 14:54:48 +02:00
misc.h btrfs: export bitmap_test_range_all_{set,zero} 2023-06-19 13:59:22 +02:00
ordered-data.c btrfs: check for BTRFS_FS_ERROR in pending ordered assert 2023-09-08 14:10:59 +02:00
ordered-data.h btrfs: add a btrfs_finish_ordered_extent helper 2023-06-19 13:59:37 +02:00
orphan.c btrfs: move orphan prototypes into orphan.h 2022-12-05 18:00:47 +01:00
orphan.h btrfs: move orphan prototypes into orphan.h 2022-12-05 18:00:47 +01:00
print-tree.c btrfs: remove v0 extent handling 2023-08-21 14:54:48 +02:00
print-tree.h btrfs: print-tree: pass const extent buffer pointer 2023-06-19 13:59:22 +02:00
props.c btrfs: move super_block specific helpers into super.h 2022-12-05 18:00:47 +01:00
props.h btrfs: make module init/exit match their sequence 2022-12-05 18:00:40 +01:00
qgroup.c btrfs: avoid start and commit empty transaction when flushing qgroups 2023-08-21 14:52:18 +02:00
qgroup.h btrfs: sink gfp_t parameter to btrfs_qgroup_trace_extent 2022-12-05 18:00:43 +01:00
raid56.c btrfs: scrub: avoid unnecessary csum tree search preparing stripes 2023-08-21 14:54:48 +02:00
raid56.h btrfs: raid56: remove unused BTRFS_RBIO_REBUILD_MISSING 2023-08-21 14:52:12 +02:00
rcu-string.h btrfs: replace strncpy() with strscpy() 2022-12-05 18:00:59 +01:00
ref-verify.c btrfs: move accessor helpers into accessors.h 2022-12-05 18:00:42 +01:00
ref-verify.h
reflink.c btrfs: pass btrfs_inode to btrfs_inode_unlock 2022-12-05 18:00:53 +01:00
reflink.h
relocation.c btrfs: remove v0 extent handling 2023-08-21 14:54:48 +02:00
relocation.h btrfs: pass an ordered_extent to btrfs_reloc_clone_csums 2023-06-19 13:59:36 +02:00
root-tree.c btrfs: move orphan prototypes into orphan.h 2022-12-05 18:00:47 +01:00
root-tree.h btrfs: move root tree prototypes to their own header 2022-12-05 18:00:44 +01:00
scrub.c btrfs: scrub: move write back of repaired sectors to scrub_stripe_read_repair_worker() 2023-08-21 14:54:49 +02:00
scrub.h btrfs: scrub: remove scrub_bio structure 2023-04-17 18:01:24 +02:00
send.c btrfs: use LIST_HEAD() to initialize the list_head 2023-08-21 14:54:46 +02:00
send.h btrfs: send add define for v2 buffer size 2022-12-05 18:00:41 +01:00
space-info.c btrfs: zoned: re-enable metadata over-commit for zoned mode 2023-08-21 14:52:19 +02:00
space-info.h btrfs: update documentation for BTRFS_RESERVE_FLUSH_EVICT flush method 2023-04-17 18:01:18 +02:00
subpage.c btrfs: stop setting PageError in the data I/O path 2023-06-19 13:59:35 +02:00
subpage.h btrfs: stop setting PageError in the data I/O path 2023-06-19 13:59:35 +02:00
super.c btrfs: deprecate integrity checker feature 2023-08-21 14:52:13 +02:00
super.h btrfs: move super_block specific helpers into super.h 2022-12-05 18:00:47 +01:00
sysfs.c btrfs: sysfs: show if ACL support has been compiled in 2023-08-21 14:52:12 +02:00
sysfs.h btrfs: sysfs: update fs features directory asynchronously 2023-02-13 17:50:35 +01:00
transaction.c btrfs: do not block starts waiting on previous transaction commit 2023-09-08 14:10:49 +02:00
transaction.h btrfs: do not block starts waiting on previous transaction commit 2023-09-08 14:10:49 +02:00
tree-checker.c for-6.5-rc5-tag 2023-08-12 13:28:55 -07:00
tree-checker.h btrfs: move btrfs_verify_level_key into tree-checker.c 2023-06-19 13:59:25 +02:00
tree-log.c btrfs: use LIST_HEAD() to initialize the list_head 2023-08-21 14:54:46 +02:00
tree-log.h btrfs: change for_rename argument of btrfs_record_unlink_dir() to bool 2023-06-19 13:59:26 +02:00
tree-mod-log.c btrfs: avoid tree mod log ENOMEM failures when we don't need to log 2023-06-19 13:59:38 +02:00
tree-mod-log.h btrfs: fix SPDX comment in tree-mod-log.h 2022-12-05 18:00:48 +01:00
ulist.c btrfs: constify ulist parameter of ulist_next() 2022-12-05 18:00:50 +01:00
ulist.h btrfs: constify ulist parameter of ulist_next() 2022-12-05 18:00:50 +01:00
uuid-tree.c btrfs: move uuid tree prototypes to uuid-tree.h 2022-12-05 18:00:46 +01:00
uuid-tree.h btrfs: move uuid tree prototypes to uuid-tree.h 2022-12-05 18:00:46 +01:00
verity.c btrfs: convert btrfs_read_merkle_tree_page() to use a folio 2023-09-13 18:40:54 +02:00
verity.h btrfs: move verity prototypes into verity.h 2022-12-05 18:00:47 +01:00
volumes.c btrfs: simplify memcpy either of metadata_uuid or fsid 2023-08-21 14:54:48 +02:00
volumes.h btrfs: add a helper to read the superblock metadata_uuid 2023-08-21 14:54:48 +02:00
xattr.c fs: drop unused posix acl handlers 2023-03-06 09:57:12 +01:00
xattr.h
zlib.c btrfs: disable allocation warnings for compression workspaces 2023-06-19 13:59:34 +02:00
zoned.c btrfs: zoned: skip splitting and logical rewriting on pre-alloc write 2023-08-22 14:19:59 +02:00
zoned.h btrfs: zoned: reserve zones for an active metadata/system block group 2023-08-21 14:52:19 +02:00
zstd.c btrfs: disable allocation warnings for compression workspaces 2023-06-19 13:59:34 +02:00