linux

Author	SHA1	Message	Date
Ilya Dryomov	c6664b42c4	Btrfs: remove lock assert from get_restripe_target() This fixes a regression introduced by `fc67c450`. spin_is_locked() always returns 0 on UP kernels, which caused assert in get_restripe_target() to be fired on every call from btrfs_reduce_alloc_profile() on UP systems. Remove it completely for now, it's not clear if it's going to be needed in future. Reported-by: Bobby Powers <bobbypowers@gmail.com> Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org> Tested-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-04-12 16:03:56 -04:00
Chris Mason	8e62c2de6e	Revert "Btrfs: increase the global block reserve estimates" This reverts commit `5500cdbe14`. We've had a number of complaints of early enospc that bisect down to this patch. We'll hae to fix the reservations differently. CC: stable@kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-04-12 13:46:48 -04:00
Liu Bo	15d1ff8111	Btrfs: fix deadlock during allocating chunks This deadlock comes from xfstests 251. We'll hold the chunk_mutex throughout the whole of a chunk allocation. But if we find that we've used up system chunk space, we need to allocate a new system chunk, but this will lead to a recursion of chunk allocation and end up with a deadlock on chunk_mutex. So instead we need to allocate the system chunk first if we find we're in ENOSPC. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-29 09:57:44 -04:00
Liu Bo	2bcc0328c3	Btrfs: show useful info in space reservation tracepoint o For space info, the type of space info is useful for debug. o For transaction handle, its transid is useful. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-29 09:57:44 -04:00
Chris Mason	1c691b330a	Merge branch 'for-chris' of git://github.com/idryomov/btrfs-unstable into for-linus	2012-03-28 20:32:46 -04:00
Chris Mason	1d4284bd6e	Merge branch 'error-handling' into for-linus Conflicts: fs/btrfs/ctree.c fs/btrfs/disk-io.c fs/btrfs/extent-tree.c fs/btrfs/extent_io.c fs/btrfs/extent_io.h fs/btrfs/inode.c fs/btrfs/scrub.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-28 20:31:37 -04:00
Ilya Dryomov	4a5e98f5d6	Btrfs: improve the logic in btrfs_can_relocate() Currently if we don't have enough space allocated we go ahead and loop though devices in the hopes of finding enough space for a chunk of the same type as the one we are trying to relocate. The problem with that is that if we are trying to restripe the chunk its target type can be more relaxed than the current one (eg require less devices or less space). So, when restriping, run checks against the target profile instead of the current one. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	7738a53a3a	Btrfs: add __get_block_group_index() helper Add __get_block_group_index() helper to be able to derive block group index from an arbitary set of flags. Implement get_block_group_index() in terms of it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	fc67c45083	Btrfs: add get_restripe_target() helper Add get_restripe_target() helper and switch everybody to use it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	0c460c0d70	Btrfs: move alloc_profile_is_valid() to volumes.c Header file is not a good place to define functions. This also moves a call to alloc_profile_is_valid() down the stack and removes a redundant check from __btrfs_alloc_chunk() - alloc_profile_is_valid() takes it into account. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	e8920a640b	Btrfs: make profile_is_valid() check more strict "0" is a valid value for an on-disk chunk profile, but it is not a valid extended profile. (We have a separate bit for single chunks in extended case) Also rename it to alloc_profile_is_valid() for clarity. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:17 +03:00
Ilya Dryomov	899c81eac8	Btrfs: add wrappers for working with alloc profiles Add functions to abstract the conversion between chunk and extended allocation profile formats and switch everybody to use them. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:16 +03:00
Ilya Dryomov	e3176ca276	Btrfs: stop silently switching single chunks to raid0 on balance This has been causing a lot of confusion for quite a while now and a lot of users were surprised by this (some of them were even stuck in a ENOSPC situation which they couldn't easily get out of). The addition of restriper gives users a clear choice between raid0 and drive concat setup so there's absolutely no excuse for us to keep doing this. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-03-27 17:09:16 +03:00
Josef Bacik	3083ee2e18	Btrfs: introduce free_extent_buffer_stale Because btrfs cow's we can end up with extent buffers that are no longer necessary just sitting around in memory. So instead of evicting these pages, we could end up evicting things we actually care about. Thus we have free_extent_buffer_stale for use when we are freeing tree blocks. This will make it so that the ref for the eb being in the radix tree is dropped as soon as possible and then is freed when the refcount hits 0 instead of waiting to be released by releasepage. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-03-26 16:51:08 -04:00
Josef Bacik	81c9ad237c	Btrfs: remove search_start and search_end from find_free_extent and callers We have been passing nothing but (u64)-1 to find_free_extent for search_end in all of the callers, so it's completely useless, and we've always been passing 0 in as search_start, so just remove them as function arguments and move search_start into find_free_extent. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-03-26 14:42:51 -04:00
Josef Bacik	285ff5af6c	Btrfs: remove the ideal caching code This is a relic from before we had the disk space cache and it was to make bootup times when you had btrfs as root not be so damned slow. Now that we have the disk space cache this isn't a problem anymore and really having this code casues uneeded fragmentation and complexity, so just remove it. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-03-26 14:42:51 -04:00
Jeff Mahoney	79787eaab4	btrfs: replace many BUG_ONs with proper error handling btrfs currently handles most errors with BUG_ON. This patch is a work-in- progress but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 11:52:54 +01:00
Jeff Mahoney	2c536799f1	btrfs: btrfs_drop_snapshot should return int Commit `cb1b69f4` (Btrfs: forced readonly when btrfs_drop_snapshot() fails) made btrfs_drop_snapshot return void because there were no callers checking the return value. That is the wrong order to handle error propogation since the caller will have no idea that an error has occured and continue on as if nothing went wrong. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:36 +01:00
Jeff Mahoney	143bede527	btrfs: return void in functions without error conditions Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:34 +01:00
Jeff Mahoney	538042801a	btrfs: avoid NULL deref in btrfs_reserve_extent with DEBUG_ENOSPC __find_space_info can return NULL but we don't check it before calling dump_space_info(). Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:32 +01:00
Chris Mason	e77266e4c4	Btrfs: fix compiler warnings on 32 bit systems The enospc tracing code added some interesting uses of u64 pointer casts. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-24 10:39:05 -05:00
Liu Bo	5500cdbe14	Btrfs: increase the global block reserve estimates When doing IO with large amounts of data fragmentation, the global block reserve calulations are too low. This increases them to avoid ENOSPC crashes. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-23 10:49:04 -05:00
Liu Bo	d9b0218f6c	Btrfs: fix a bug on overcommit stuff When overcommitting, we should check the sum of pinned space and bytes for delayed item. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-02-16 17:23:18 +01:00
Liu Bo	2cac13e41b	Btrfs: fix trim 0 bytes after a device delete A user reported a bug of btrfs's trim, that is we will trim 0 bytes after a device delete. The reproducer: $ mkfs.btrfs disk1 $ mkfs.btrfs disk2 $ mount disk1 /mnt $ fstrim -v /mnt $ btrfs device add disk2 /mnt $ btrfs device del disk1 /mnt $ fstrim -v /mnt This is because after we delete the device, the block group may start from a non-zero place, which will confuse trim to discard nothing. Reported-by: Lutz Euler <lutz.euler@freenet.de> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-02-15 16:40:23 +01:00
Miao Xie	9e622d6bea	Btrfs: fix enospc error caused by wrong checks of the chunk When we did sysbench test for inline files, enospc error happened easily though there was lots of free disk space which could be allocated for new chunks. Reproduce steps: # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition> # mount <test partition> /mnt # ulimit -n 102400 # cd /mnt # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr prepare # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr run <soon later, BUG_ON() was triggered by enospc error> The reason of this bug is: Now, we can reserve space which is larger than the free space in the chunks if we have enough free disk space which can be used for new chunks. By this way, the space allocator should allocate a new chunk by force if there is no free space in the free space cache. But there are two wrong checks which break this operation. One is if (ret == -ENOSPC && num_bytes > min_alloc_size) in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk even we fail to allocate free space by minimum allocable size. The other is if (space_info->force_alloc) force = space_info->force_alloc; in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen. Fix these two wrong checks. Especially the second one, we fix it by changing the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has higher priority. And if the value which is passed in by the caller is greater than ->force_alloc, use the passed value. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
Chris Mason	96bdc7dc61	Btrfs: use larger system chunks system chunks by default are very small. This makes them slightly larger and also fixes the conditional checks to make sure we don't allocate a billion of them at once. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-16 15:38:24 -05:00
Josef Bacik	f248679e86	Btrfs: add a delalloc mutex to inodes for delalloc reservations I was using i_mutex for this, but we're getting bogus lockdep warnings by doing that and theres no real way to get rid of those, so just stop using i_mutex to protect delalloc metadata reservations and use a delalloc mutex instead. This shouldn't be contended often at all, only if you are writing and mmap writing to the file at the same time. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-01-16 15:29:43 -05:00
Josef Bacik	8c2a3ca20f	Btrfs: space leak tracepoints This in addition to a script in my btrfs-tracing tree will help track down space leaks when we're getting space left over in block groups on umount. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-01-16 15:29:43 -05:00
Josef Bacik	3f7de037fb	Btrfs: add allocator tracepoints I used these tracepoints when figuring out what the cluster stuff was doing, so add them to mainline in case we need to profile this stuff again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-01-16 15:29:42 -05:00
Chris Mason	9785dbdf26	Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into integration	2012-01-16 15:26:31 -05:00
Chris Mason	d756bd2d93	Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integration Conflicts: fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-16 15:26:17 -05:00
Chris Mason	27263e2832	Merge branch 'restriper' of git://github.com/idryomov/btrfs-unstable into integration	2012-01-16 15:26:02 -05:00
Ilya Dryomov	e4d8ec0f65	Btrfs: implement online profile changing Profile changing is done by launching a balance with BTRFS_BALANCE_CONVERT bits set and target fields of respective btrfs_balance_args structs initialized. Profile reducing code in this case will pick restriper's target profile if it's available instead of doing a blind reduce. If target profile is not yet available it goes back to a plain reduce. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:48 +02:00
Ilya Dryomov	70922617b0	Btrfs: do not reduce profile in do_chunk_alloc() Every caller of do_chunk_alloc() feeds it the reduced allocation profile, so stop trying to reduce it one more time. Instead check the validity of the passed profile. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:48 +02:00
Ilya Dryomov	10ea00f55a	Btrfs: make avail_*_alloc_bits fields dynamic Currently when new chunks are created respective avail_alloc_bits field is updated to reflect profiles of all chunks present in the system. However when chunks are removed profile bits are never cleared. This patch clears profile bit of respective avail_alloc_bits field when the last chunk with that profile is removed. Restriper needs this to properly operate when "downgrading". Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Ilya Dryomov	a46d11a8b0	Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for avail_{data,metadata,system}_alloc_bits fields, which gather info about available allocation profiles in the FS. When chunk is created or read from disk, its profile is OR'ed with the corresponding avail_alloc_bits field. Since SINGLE is denoted by 0 in the on-disk format, currently there is no way to tell when such chunks become avaialble. Restriper needs that information, so add a separate bit for SINGLE profile. This bit is going to be in-memory only, it should never be written out to disk, so it's not a disk format change. However to avoid remappings in future, reserve corresponding on-disk bit. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Ilya Dryomov	52ba692972	Btrfs: introduce masks for chunk type and profile Chunk's type and profile are encoded in u64 flags field. Introduce masks to easily access them. Also fix the type of BTRFS_BLOCK_GROUP_* constants, it should be ULL. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Ilya Dryomov	6fef8df1dc	Btrfs: get rid of *_alloc_profile fields {data,metadata,system}_alloc_profile fields have been unused for a long time now. Get rid of them. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Li Zefan	c7c144db53	Btrfs: update global block_rsv when creating a new block group A bug was triggered while using seed device: # mkfs.btrfs /dev/loop1 # btrfstune -S 1 /dev/loop1 # mount -o /dev/loop1 /mnt # btrfs dev add /dev/loop2 /mnt btrfs: block rsv returned -28 ------------[ cut here ]------------ WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]() ... Call Trace: ... [<f7b7c31c>] btrfs_cow_block+0x101/0x147 [btrfs] [<f7b7eaa6>] btrfs_search_slot+0x1b8/0x55f [btrfs] [<f7b7f844>] btrfs_insert_empty_items+0x42/0x7f [btrfs] [<f7b7f8c1>] btrfs_insert_item+0x40/0x7e [btrfs] [<f7b8ac02>] btrfs_make_block_group+0x243/0x2aa [btrfs] [<f7bb3f53>] __btrfs_alloc_chunk+0x672/0x70e [btrfs] [<f7bb41ff>] init_first_rw_device+0x77/0x13c [btrfs] [<f7bb5a62>] btrfs_init_new_device+0x664/0x9fd [btrfs] [<f7bbb65a>] btrfs_ioctl+0x694/0xdbe [btrfs] [<c04f55f7>] do_vfs_ioctl+0x496/0x4cc [<c04f5660>] sys_ioctl+0x33/0x4f [<c07b9edf>] sysenter_do_call+0x12/0x38 ---[ end trace 906adac595facc7d ]--- Since seed device is readonly, there's no usable space in the filesystem. Afterwards we add a sprout device to it, and the kernel creates a METADATA block group and a SYSTEM block group where comes free space we can reserve, but we still get revervation failure because the global block_rsv hasn't been updated accordingly. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:52 +08:00
Li Zefan	125ccb0ae6	Btrfs: don't pass a trans handle unnecessarily in volumes.c Some functions never use the transaction handle passed to them. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:42 +08:00
Alexandre Oliva	fc7c1077ce	Btrfs: don't set up allocation result twice We store the allocation start and length twice in ins, once right after the other, but with intervening calls that may prevent the duplicate from being optimized out by the compiler. Remove one of the assignments. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-07 19:15:14 -05:00
Alexandre Oliva	a5f6f719a5	Btrfs: test free space only for unclustered allocation Since the clustered allocation may be taking extents from a different block group, there's no point in spin-locking and testing the current block group free space before attempting to allocate space from a cluster, even more so when we might refrain from even trying the cluster in the current block group because, after the cluster was set up, not enough free space remained. Furthermore, cluster creation attempts fail fast when the block group doesn't have enough free space, so the test was completely superfluous. I've move the free space test past the cluster allocation attempt, where it is more useful, and arranged for a cluster in the current block group to be released before trying an unclustered allocation, when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in the cluster stands a chance of being combined with additional free space in the block group so as to succeed in the allocation attempt. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-06 15:48:21 -05:00
Chris Mason	cf1d72c9ce	Btrfs: lower the bar for chunk allocation The chunk allocation code has tried to keep a pretty tight lid on creating new metadata chunks. This is partially because in the past the reservation code didn't give us an accurate idea of how much space was being used. The new code is much more accurate, so we're able to get rid of some of these checks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-06 15:41:34 -05:00
Chris Mason	203bf287cb	Btrfs: run chunk allocations while we do delayed refs Btrfs tries to batch extent allocation tree changes to improve performance and reduce metadata trashing. But it doesn't allocate new metadata chunks while it is doing allocations for the extent allocation tree. This commit changes the delayed refence code to do chunk allocations if we're getting low on room. It prevents crashes and improves performance. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-06 15:23:57 -05:00
Jan Schmidt	a168650c08	Btrfs: add waitqueue instead of doing busy waiting for more delayed refs Now that we may be holding back delayed refs for a limited period, we might end up having no runnable delayed refs. Without this commit, we'd do busy waiting in that thread until another (runnable) ref arives. Instead, we're detecting this situation and use a waitqueue, such that we only try to run more refs after a) another runnable ref was added or b) delayed refs are no longer held back Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-01-04 16:12:48 +01:00
Arne Jansen	d1270cd91f	Btrfs: put back delayed refs that are too new When processing a delayed ref, first check if there are still old refs in the process of being added. If so, put this ref back to the tree. To avoid looping on this ref, choose a newer one in the next loop. btrfs_find_ref_cluster has to take care of that. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-01-04 16:12:45 +01:00
Arne Jansen	66d7e7f09f	Btrfs: mark delayed refs as for cow Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value from every call site. The for_cow parameter will later on be used to determine if a ref will change anything with respect to qgroups. Delayed refs coming from relocation are always counted as for_cow, as they don't change subvol quota. Also pass in the fs_info for later use. btrfs_find_all_roots() will use this as an optimization, as changes that are for_cow will not change anything with respect to which root points to a certain leaf. Thus, we don't need to add the current sequence number to those delayed refs. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2011-12-22 16:22:27 +01:00
Linus Torvalds	c9a7fe9672	Merge branches 'for-linus' and 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: unplug every once and a while Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code Btrfs: only set cache_generation if we setup the block group Btrfs: don't panic if orphan item already exists Btrfs: fix leaked space in truncate Btrfs: fix how we do delalloc reservations and how we free reservations on error Btrfs: deal with enospc from dirtying inodes properly Btrfs: fix num_workers_starting bug and other bugs in async thread BTRFS: Establish i_ops before calling d_instantiate Btrfs: add a cond_resched() into the worker loop Btrfs: fix ctime update of on-disk inode btrfs: keep orphans for subvolume deletion Btrfs: fix inaccurate available space on raid0 profile Btrfs: fix wrong disk space information of the files Btrfs: fix wrong i_size when truncating a file to a larger size Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror * 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: lower the dirty balance poll interval	2011-12-16 12:15:50 -08:00
Josef Bacik	e65cbb94e0	Btrfs: only set cache_generation if we setup the block group A user reported a problem booting into a new kernel with the old format inodes. He was panicing in cow_file_range while writing out the inode cache. This is because if the block group is not cached we'll just skip writing out the cache, however if it gets dirtied again in the same transaction and it finished caching we'd go ahead and write it out, but since we set cache_generation to the transid we think we've already truncated it and will just carry on, running into cow_file_range and blowing up. We need to make sure we only set cache_generation if we've done the truncate. The user tested this patch and verified that the panic no longer occured. Thanks, Reported-and-Tested-by: Klaus Bitto <klaus.bitto@gmail.com> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-12-15 11:04:24 -05:00
Josef Bacik	660d3f6cde	Btrfs: fix how we do delalloc reservations and how we free reservations on error Running xfstests 269 with some tracing my scripts kept spitting out errors about releasing bytes that we didn't actually have reserved. This took me down a huge rabbit hole and it turns out the way we deal with reserved_extents is wrong, we need to only be setting it if the reservation succeeds, otherwise the free() method will come in and unreserve space that isn't actually reserved yet, which can lead to other warnings and such. The math was all working out right in the end, but it caused all sorts of other issues in addition to making my scripts yell and scream and generally make it impossible for me to track down the original issue I was looking for. The other problem is with our error handling in the reservation code. There are two cases that we need to deal with 1) We raced with free. In this case free won't free anything because csum_bytes is modified before we dro the lock in our reservation path, so free rightly doesn't release any space because the reservation code may be depending on that reservation. However if we fail, we need the reservation side to do the free at that point since that space is no longer in use. So as it stands the code was doing this fine and it worked out, except in case #2 2) We don't race with free. Nobody comes in and changes anything, and our reservation fails. In this case we didn't reserve anything anyway and we just need to clean up csum_bytes but not free anything. So we keep track of csum_bytes before we drop the lock and if it hasn't changed we know we can just decrement csum_bytes and carry on. Because of the case where we can race with free()'s since we have to drop our spin_lock to do the reservation, I'm going to serialize all reservations with the i_mutex. We already get this for free in the heavy use paths, truncate and file write all hold the i_mutex, just needed to add it to page_mkwrite and various ioctl/balance things. With this patch my space leak scripts no longer scream bloody murder. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-12-15 11:04:22 -05:00

1 2 3 4 5 ...

620 Commits