linux/fs/btrfs
Stefan Behrens ff76b05655 Btrfs: Don't allocate inode that is already in use
Due to an off-by-one error, it is possible to reproduce a bug
when the inode cache is used.

The same inode number is assigned twice; the second time, this
leads to an EEXIST error in btrfs_insert_empty_items().

The issue can happen when a file is removed right after a subvolume
is created, and a new inode number is then allocated before the
inodes in free_ino_pinned are processed.
In this case, unlink() calls btrfs_return_ino(), which calls
start_caching(). start_caching() adds the range
[highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] to the free inode cache,
where highest_ino is found by searching for the highest inode number
in use (btrfs_find_free_objectid() can no longer find the unlinked
inode at this point). So if the unlinked inode's number is equal to
highest_ino + 1 (i.e. >= this value; testing for > this value was the
off-by-one error), the number must not be added to free_ino_pinned
(caching_thread() already handles this correctly). Instead, it has to
be added directly to the inode cache, which fails in this case because
the number is already covered by the range added by start_caching().

If this inode number is allocated while it is still in free_ino_pinned,
it is added back to the free inode cache when the pinned inodes are
processed, even though it is now in use. One of the following inode
number allocations then gets an inode number that is already in use
and fails with EEXIST in btrfs_insert_empty_items().

One example, recorded with the reproducer below (ranges are given as
[start, count]):
Create a snapshot; all remaining steps happen in the new snapshot.
unlink() of inode 34284 calls btrfs_return_ino(), which calls start_caching().
start_caching() calls add_free_space [34284, 18446744073709517077].
Back in btrfs_return_ino(), [34284, 1] is added to the pinned tree, which is wrong.
mkdir() calls btrfs_find_ino_for_alloc(), which returns 34284.
btrfs_unpin_free_ino() calls add_free_space [34284, 1].
mkdir() calls btrfs_find_ino_for_alloc(), which again returns 34284.
Inserting the new inode fails with EEXIST.

One possible reproducer:
#!/bin/sh
# preparation
TEST_DEV=/dev/sdc1
TEST_MNT=/mnt
umount "${TEST_MNT}" 2>/dev/null || true
mkfs.btrfs -f "${TEST_DEV}"
mount "${TEST_DEV}" "${TEST_MNT}" -o \
 rw,relatime,compress=lzo,space_cache,inode_cache
btrfs subvolume create "${TEST_MNT}/s1"
for i in $(seq 34027); do touch "${TEST_MNT}/s1/${i}"; done
btrfs subvolume snapshot "${TEST_MNT}/s1" "${TEST_MNT}/s2"
FILENAME=$(find "${TEST_MNT}/s1/" -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|')
rm "${TEST_MNT}/s2/${FILENAME}"
touch "${TEST_MNT}/s2/${FILENAME}"
# the following steps can be repeated to reproduce the issue again and again
[ -e "${TEST_MNT}/s3" ] && btrfs subvolume delete "${TEST_MNT}/s3"
btrfs subvolume snapshot "${TEST_MNT}/s2" "${TEST_MNT}/s3"
rm "${TEST_MNT}/s3/${FILENAME}"
touch "${TEST_MNT}/s3/${FILENAME}"
ls -alFi "${TEST_MNT}"/s?/"${FILENAME}"
touch "${TEST_MNT}/s3/_1" || logger FAILED
ls -alFi "${TEST_MNT}"/s?/_1
touch "${TEST_MNT}/s3/_2" || logger FAILED
ls -alFi "${TEST_MNT}"/s?/_2
touch "${TEST_MNT}/s3/__1" || logger FAILED
ls -alFi "${TEST_MNT}"/s?/__1
touch "${TEST_MNT}/s3/__2" || logger FAILED
ls -alFi "${TEST_MNT}"/s?/__2
# if the above is not enough, add the following loop:
for i in $(seq 3 9); do touch "${TEST_MNT}/s3/__${i}" || logger FAILED; done
#for i in $(seq 3 34027); do touch "${TEST_MNT}/s3/__${i}" || logger FAILED; done
# one of the touch(1) calls in s3 fails with EEXIST because
# btrfs_find_ino_for_alloc() returns an inode number that is already in use.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11 22:02:36 -05:00
tests Btrfs: add a sanity test for a vacant extent at the front of a file 2013-11-11 21:58:19 -05:00
acl.c
async-thread.c Btrfs: eliminate races in worker stopping code 2013-10-04 16:02:13 -04:00
async-thread.h Btrfs: eliminate races in worker stopping code 2013-10-04 16:02:13 -04:00
backref.c btrfs: drop unused parameter from btrfs_item_nr 2013-11-11 21:50:48 -05:00
backref.h Btrfs: allocate prelim_ref with a slab allocater 2013-09-01 08:16:27 -04:00
btrfs_inode.h Btrfs: improve inode hash function/inode lookup 2013-11-11 21:55:19 -05:00
check-integrity.c Btrfs: Use %z to format size_t 2013-09-01 08:16:19 -04:00
check-integrity.h
compat.h
compression.c Btrfs: Remove superfluous casts from u64 to unsigned long long 2013-09-01 08:16:08 -04:00
compression.h
ctree.c Btrfs: fix btrfs_prev_leaf() previous key computation 2013-11-11 22:02:26 -05:00
ctree.h Btrfs: add tests for btrfs_get_extent 2013-11-11 21:57:30 -05:00
delayed-inode.c Btrfs: don't leak delayed node on path allocation failure 2013-11-11 22:01:27 -05:00
delayed-inode.h
delayed-ref.c Btrfs: get rid of sparse warnings 2013-09-01 08:15:50 -04:00
delayed-ref.h
dev-replace.c Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts 2013-11-11 22:00:50 -05:00
dev-replace.h
dir-item.c btrfs: drop unused parameter from btrfs_item_nr 2013-11-11 21:50:48 -05:00
disk-io.c Btrfs: Simplify the logic in alloc_extent_buffer() for existing extent buffer case 2013-11-11 21:59:11 -05:00
disk-io.h Btrfs: add a sanity test for btrfs_split_item 2013-11-11 21:51:02 -05:00
export.c
export.h
extent_io.c Btrfs: Simplify the logic in alloc_extent_buffer() for existing extent buffer case 2013-11-11 21:59:11 -05:00
extent_io.h Btrfs: Simplify the logic in alloc_extent_buffer() for existing extent buffer case 2013-11-11 21:59:11 -05:00
extent_map.c
extent_map.h
extent-tree.c Btrfs: fixup error path in __btrfs_inc_extent_ref 2013-11-11 22:01:00 -05:00
file-item.c Btrfs: add an assert to btrfs_lookup_csums_range for alignment 2013-11-11 21:58:45 -05:00
file.c Btrfs: fix up seek_hole/seek_data handling 2013-11-11 21:58:56 -05:00
free-space-cache.c Btrfs: remove path arg from btrfs_truncate_free_space_cache 2013-11-11 21:51:33 -05:00
free-space-cache.h Btrfs: remove path arg from btrfs_truncate_free_space_cache 2013-11-11 21:51:33 -05:00
hash.h
inode-item.c btrfs: drop unused parameter from btrfs_item_nr 2013-11-11 21:50:48 -05:00
inode-map.c Btrfs: Don't allocate inode that is already in use 2013-11-11 22:02:36 -05:00
inode-map.h
inode.c Btrfs: handle a missing extent for the first file extent 2013-11-11 21:58:05 -05:00
ioctl.c btrfs: simplify kmalloc+copy_from_user to memdup_user 2013-11-11 22:01:51 -05:00
Kconfig Btrfs: add support for asserts 2013-09-01 08:16:32 -04:00
locking.c
locking.h
lzo.c
Makefile Btrfs: add tests for btrfs_get_extent 2013-11-11 21:57:30 -05:00
math.h
ordered-data.c Btrfs: btrfs_add_ordered_operation: Fix last modified transaction comparison. 2013-11-11 22:01:37 -05:00
ordered-data.h Btrfs: kill delay_iput arg to the wait_ordered functions 2013-09-21 11:05:27 -04:00
orphan.c
print-tree.c btrfs: drop unused parameter from btrfs_item_nr 2013-11-11 21:50:48 -05:00
print-tree.h
qgroup.c Btrfs: Remove superfluous casts from u64 to unsigned long long 2013-09-01 08:16:08 -04:00
raid56.c Btrfs, raid56: fix memory leak when allocating pages for p/q stripes failed 2013-09-01 08:04:27 -04:00
raid56.h
rcu-string.h
reada.c
relocation.c Btrfs: fix BUG_ON() casued by the reserved space migration 2013-11-11 21:54:28 -05:00
root-tree.c Btrfs: insert orphan roots into fs radix tree 2013-10-10 21:30:53 -04:00
scrub.c Btrfs: fix the dev-replace suspend sequence 2013-11-11 21:55:36 -05:00
send.c btrfs: drop unused parameter from btrfs_item_nr 2013-11-11 21:50:48 -05:00
send.h
struct-funcs.c
super.c Btrfs: Wait for uuid-tree rebuild task on remount read-only 2013-11-11 22:01:18 -05:00
sysfs.c
transaction.c Btrfs: fix memory leaks on transaction commit failure 2013-11-11 21:55:46 -05:00
transaction.h Btrfs: fix BUG_ON() casued by the reserved space migration 2013-11-11 21:54:28 -05:00
tree-defrag.c Btrfs: cleanup dead code of defragment 2013-11-11 21:59:45 -05:00
tree-log.c Btrfs: optimize tree-log.c:count_inode_refs() 2013-11-11 22:02:19 -05:00
tree-log.h
ulist.c
ulist.h
uuid-tree.c Btrfs: remove unused max_key arg from btrfs_search_forward 2013-11-11 21:54:57 -05:00
volumes.c Btrfs: init device stats for new devices 2013-11-11 22:01:09 -05:00
volumes.h Btrfs: add btrfs_alloc_device and switch to it 2013-09-01 08:16:04 -04:00
xattr.c
xattr.h
zlib.c