linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-03 17:41:22 +00:00

Author	SHA1	Message	Date
Filipe Manana	46fefe41b5	Btrfs: remove unused wait queue in struct extent_buffer The lock_wq wait queue is not used anywhere, therefore just remove it. On a x86_64 system, this reduced sizeof(struct extent_buffer) from 320 bytes down to 296 bytes, which means a 4Kb page can now be used for 13 extent buffers instead of 12. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:28 -07:00
Gerhard Heift	550ac1d85e	btrfs: new function read_extent_buffer_to_user This new function reads the content of an extent directly to user memory. Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>	2014-06-12 18:21:56 -07:00
Josef Bacik	faa2dbf004	Btrfs: add sanity tests for new qgroup accounting code This exercises the various parts of the new qgroup accounting code. We do some basic stuff and do some things with the shared refs to make sure all that code works. I had to add a bunch of infrastructure because I needed to be able to insert items into a fake tree without having to do all the hard work myself, hopefully this will be useful in the future. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:49 -07:00
Josef Bacik	a26e8c9f75	Btrfs: don't clear uptodate if the eb is under IO So I have an awful exercise script that will run snapshot, balance and send/receive in parallel. This sometimes would crash spectacularly and when it came back up the fs would be completely hosed. Turns out this is because of a bad interaction of balance and send/receive. Send will hold onto its entire path for the whole send, but its blocks could get relocated out from underneath it, and because it doesn't hold tree locks there's nothing to keep this from happening. So it will go to read in a slot with an old transid, and we could have re-allocated this block for something else and it could have a completely different transid. But because we think it is invalid we clear uptodate and re-read in the block. If we do this before we actually write out the new block we could write back stale data to the fs, and boom we're screwed. Now we definitely need to fix this disconnect between send and balance, but we really really need to not allow ourselves to accidentally read in stale data over new data. So make sure we check if the extent buffer is not under io before clearing uptodate, this will kick back EIO to the caller instead of reading in stale data and keep us from corrupting the fs. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-04-06 17:34:37 -07:00
Josef Bacik	f28491e0a6	Btrfs: move the extent buffer radix tree into the fs_info I need to create a fake tree to test qgroups and I don't want to have to setup a fake btree_inode. The fact is we only use the radix tree for the fs_info, so everybody else who allocates an extent_io_tree is just wasting the space anyway. This patch moves the radix tree and its lock into btrfs_fs_info so there is less stuff I have to fake to do qgroup sanity tests. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-01-28 13:19:55 -08:00
Josef Bacik	34b41acec1	Btrfs: use a bit to track if we're in the radix tree For creating a dummy in-memory btree I need to be able to use the radix tree to keep track of the buffers like normal extent buffers. With dummy buffers we skip the radix tree step, and we still want to do that for the tree mod log dummy buffers but for my test buffers we need to be able to remove them from the radix tree like normal. This will give me a way to do that. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-01-28 13:19:54 -08:00
Chandra Seetharaman	452c75c3d2	Btrfs: Simplify the logic in alloc_extent_buffer() for existing extent buffer case alloc_extent_buffer() uses radix_tree_lookup() when radix_tree_insert() fails with EEXIST. That part of the code is very similar to the code in find_extent_buffer(). This patch replaces radix_tree_lookup() and surrounding code in alloc_extent_buffer() with find_extent_buffer(). Note that radix_tree_lookup() does not need to be protected by tree->buffer_lock. It is protected by eb->refs. While at it, this patch - changes the other usage of radix_tree_lookup() in alloc_extent_buffer() with find_extent_buffer() to reduce redundancy. - removes the unused argument 'len' to find_extent_buffer(). Signed-Off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 21:59:11 -05:00
Josef Bacik	294e30fee3	Btrfs: add tests for find_lock_delalloc_range So both Liu and I made huge messes of find_lock_delalloc_range trying to fix stuff, me first by fixing extent size, then him by fixing something I broke and then me again telling him to fix it a different way. So this is obviously a candidate for some testing. This patch adds a pseudo fs so we can allocate fake inodes for tests that need an inode or pages. Then it adds a bunch of tests to make sure find_lock_delalloc_range is acting the way it is supposed to. With this patch and all of our previous patches to find_lock_delalloc_range I am sure it is working as expected now. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 21:56:51 -05:00
Sergei Trofimovich	171170c1c5	btrfs: mark some local function as 'static' Cc: Josef Bacik <jbacik@fusionio.com> Cc: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:15:51 -04:00
Mark Fasheh	4b384318a7	btrfs: Introduce extent_read_full_page_nolock() We want this for btrfs_extent_same. Basically readpage and friends do their own extent locking but for the purposes of dedupe, we want to have both files locked down across a set of readpage operations (so that we can compare data). Introduce this variant and a flag which can be set for extent_read_full_page() to indicate that we are already locked. Partial credit for this patch goes to Gabriel de Perthuis <g2p.code@gmail.com> as I have included a fix from him to the original patch which avoids a deadlock on compressed extents. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:04:59 -04:00
Josef Bacik	c2790a2e2b	Btrfs: cleanup arguments to extent_clear_unlock_delalloc This patch removes the io_tree argument for extent_clear_unlock_delalloc since we always use &BTRFS_I(inode)->io_tree, and it separates out the extent tree operations from the page operations. This way we just pass in the extent bits we want to clear and then pass in the operations we want done to the pages. This is because I'm going to fix what extent bits we clear in some cases and rather than add a bunch of new flags we'll just use the actual extent bits we want to clear. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:04:38 -04:00
Miao Xie	facc8a2247	Btrfs: don't cache the csum value into the extent state tree Before applying this patch, we cached the csum value into the extent state tree when reading some data from the disk, this operation increased the lock contention of the state tree. Now, we just store the csum value into the bio structure or other unshared structure, so we can reduce the lock contention. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:04:33 -04:00
Josef Bacik	7ee9e4405f	Btrfs: check if we can nocow if we don't have data space We always just try and reserve data space when we write, but if we are out of space but have prealloc'ed extents we should still successfully write. This patch will try and see if we can write to prealloc'ed space and if we can go ahead and allow the write to continue. With this patch we now pass xfstests generic/274. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-07-02 11:50:45 -04:00
Chris Mason	9be3395bcd	Btrfs: use a btrfs bioset instead of abusing bio internals Btrfs has been pointer tagging bi_private and using bi_bdev to store the stripe index and mirror number of failed IOs. As bios bubble back up through the call chain, we use these to decide if and how to retry our IOs. They are also used to count IO failures on a per device basis. Recently a bio tracepoint was added, which led to crashes because we were abusing bi_bdev. This commit adds a btrfs bioset, and creates explicit fields for the mirror number and stripe index. The plan is to extend this structure for all of the fields currently in struct btrfs_bio, which will mean one less kmalloc in our IO path. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Tejun Heo <tj@kernel.org>	2013-05-17 21:52:52 -04:00
David Sterba	410748882a	btrfs: use unsigned long type for extent state bits Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:27 -04:00
David Sterba	f7a52a40ca	btrfs: remove unused gfp mask parameter from release_extent_buffer callchain It's unused since `0b32f4bbb4`. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:24 -04:00
Eric Sandeen	48a3b6366f	btrfs: make static code static & remove dead code Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:23 -04:00
Eric Sandeen	6d49ba1b47	btrfs: move leak debug code to functions Clean up the leak debugging in extent_io.c by moving the debug code into functions. This also removes the list_heads used for debugging from the extent_buffer and extent_state structures when debug is not enabled. Since we need a global debug config to do that last part, implement CONFIG_BTRFS_DEBUG to accommodate. Thanks to Dave Sterba for the Kconfig bit. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:16 -04:00
Josef Bacik	fd8b2b6115	Btrfs: cleanup destroy_marked_extents We can just look up the extent_buffers for the range and free stuff that way. This makes the cleanup a bit cleaner and we can make sure to evict the extent_buffers pretty quickly by marking them as stale. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:11 -04:00
Miao Xie	e4100d987b	Btrfs: improve the performance of the csums lookup It is very likely that there are several blocks in bio, it is very inefficient if we get their csums one by one. This patch improves this problem by getting the csums in batch. According to the result of the following test, the execute time of __btrfs_lookup_bio_sums() is down by ~28%(300us -> 217us). # dd if=<mnt>/file of=/dev/null bs=1M count=1024 Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:54:35 -04:00
Chris Mason	4adaa61102	Btrfs: fix race between mmap writes and compression Btrfs uses page_mkwrite to ensure stable pages during crc calculations and mmap workloads. We call clear_page_dirty_for_io before we do any crcs, and this forces any application with the file mapped to wait for the crc to finish before it is allowed to change the file. With compression on, the clear_page_dirty_for_io step is happening after we've compressed the pages. This means the applications might be changing the pages while we are compressing them, and some of those modifications might not hit the disk. This commit adds the clear_page_dirty_for_io before compression starts and makes sure to redirty the page if we have to fallback to uncompressed IO as well. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Alexandre Oliva <oliva@gnu.org> cc: stable@vger.kernel.org	2013-03-26 13:19:14 -04:00
David Sterba	b8dae31388	btrfs: use only inline_pages from extent buffer The nodesize is capped at 64k and there are enough pages preallocated in extent_buffer::inline_pages. The fallback to kmalloc never happened because even on the smallest page size considered (4k) inline_pages covered the needs. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-02-28 13:33:56 -05:00
Chris Mason	e942f883bc	Merge branch 'raid56-experimental' into for-linus-3.9 Signed-off-by: Chris Mason <chris.mason@fusionio.com> Conflicts: fs/btrfs/ctree.h fs/btrfs/extent-tree.c fs/btrfs/inode.c fs/btrfs/volumes.c	2013-02-20 14:06:05 -05:00
Josef Bacik	c8f2f24bd5	Btrfs: remove unused extent io tree ops V2 Nobody uses these io tree ops anymore so just remove them and clean up the code a bit. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-02-20 12:59:52 -05:00
David Woodhouse	64a167011b	Btrfs: add rw argument to merge_bio_hook() We'll want to merge writes so they can fill a full RAID[56] stripe, but not necessarily reads. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-02-01 11:49:47 -05:00
Stefan Behrens	3ec706c831	Btrfs: pass fs_info to btrfs_map_block() instead of mapping_tree This is required for the device replace procedure in a later step. Two calling functions also had to be changed to have the fs_info pointer: repair_io_failure() and scrub_setup_recheck_block(). Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-12-12 17:15:34 -05:00
Robin Dong	479ed9abdb	btrfs: move inline function code to header file When building btrfs from kernel code, it will report: fs/btrfs/extent_io.h:281: warning: 'extent_buffer_page' declared inline after being called fs/btrfs/extent_io.h:281: warning: previous declaration of 'extent_buffer_page' was here fs/btrfs/extent_io.h:280: warning: 'num_extent_pages' declared inline after being called fs/btrfs/extent_io.h:280: warning: previous declaration of 'num_extent_pages' was here because of the wrong declaration of inline functions. Signed-off-by: Robin Dong <sanbai@taobao.com>	2012-10-09 09:15:43 -04:00
Josef Bacik	e6138876ad	Btrfs: cache extent state when writing out dirty metadata pages Everytime we write out dirty pages we search for an offset in the tree, convert the bits in the state, and then when we wait we search for the offset again and clear the bits. So for every dirty range in the io tree we are doing 4 rb searches, which is suboptimal. With this patch we are only doing 2 searches for every cycle (modulo weird things happening). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-10-09 09:15:41 -04:00
Josef Bacik	de0022b9da	Btrfs: do not async metadata csumming in certain situations There are a couple of scenarios where farming metadata csumming off to an async thread doesn't help. The first is if our processor supports crc32c, in which case the csumming will be fast and so the overhead of the async model is not worth the cost. The other case is for our tree log. We will be making that stuff dirty and writing it out and waiting for it immediately. Even with software crc32c this gives me a ~15% increase in speed with O_SYNC workloads. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-10-09 09:15:40 -04:00
Liu Bo	9e8a4a8b0b	Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag We're going to use this flag EXTENT_DEFRAG to indicate which range belongs to defragment so that we can implement snapshot-aware defrag: We set the EXTENT_DEFRAG flag when dirtying the extents that need to be defragmented, so later on the writeback thread can differentiate between normal writeback and writeback started by defragmentation. Original-Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>	2012-10-01 15:19:15 -04:00
Chris Mason	1e20932a23	Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linus Conflicts: fs/btrfs/ulist.h Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-05-31 16:49:53 -04:00
Josef Bacik	5fd0204355	Btrfs: finish ordered extents in their own thread We noticed that the ordered extent completion doesn't really rely on having a page and that it could be done independently of ending the writeback on a page. This patch makes us not do the threaded endio stuff for normal buffered writes and direct writes so we can end page writeback as soon as possible (in irq context) and only start threads to do the ordered work when it is actually done. Compression needs to be reworked some to take advantage of this as well, but atm it has to do a find_get_page in its endio handler so it must be done in its own thread. This makes direct writes quite a bit faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:33 -04:00
Jan Schmidt	815a51c74a	Btrfs: dummy extent buffers for tree mod log The tree modification log needs two ways to create dummy extent buffers, once by allocating a fresh one (to rebuild an old root) and once by cloning an existing one (to make private rewind modifications) to it. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:54 +02:00
Josef Bacik	5cf1ab5613	Btrfs: always store the mirror we read the eb from A user reported a panic where we were trying to fix a bad mirror but the mirror number we were giving was 0, which is invalid. This is because we don't do the transid verification until after the read, so as far as the read code is concerned the read was a success. So instead store the mirror we read from so that if there is some failure post read we know which mirror to try next and which mirror needs to be fixed if we find a good copy of the block. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-04-18 19:22:30 +02:00
Chris Mason	1d4284bd6e	Merge branch 'error-handling' into for-linus Conflicts: fs/btrfs/ctree.c fs/btrfs/disk-io.c fs/btrfs/extent-tree.c fs/btrfs/extent_io.c fs/btrfs/extent_io.h fs/btrfs/inode.c fs/btrfs/scrub.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-28 20:31:37 -04:00
Josef Bacik	ea46679408	Btrfs: deal with read errors on extent buffers differently Since we need to read and write extent buffers in their entirety we can't use the normal bio_readpage_error stuff since it only works on a per page basis. So instead make it so that if we see an io error in endio we just mark the eb as having an IO error and then in btree_read_extent_buffer_pages we will manually try other mirrors and then overwrite the bad mirror if we find a good copy. This works with larger than page size blocks. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-26 21:57:36 -04:00
Josef Bacik	0b32f4bbb4	Btrfs: ensure an entire eb is written at once This patch simplifies how we track our extent buffers. Previously we could exit writepages with only having written half of an extent buffer, which meant we had to track the state of the pages and the state of the extent buffers differently. Now we only read in entire extent buffers and write out entire extent buffers, this allows us to simply set bits in our bflags to indicate the state of the eb and we no longer have to do things like track uptodate with our iotree. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-26 17:04:23 -04:00
Josef Bacik	3083ee2e18	Btrfs: introduce free_extent_buffer_stale Because btrfs cow's we can end up with extent buffers that are no longer necessary just sitting around in memory. So instead of evicting these pages, we could end up evicting things we actually care about. Thus we have free_extent_buffer_stale for use when we are freeing tree blocks. This will make it so that the ref for the eb being in the radix tree is dropped as soon as possible and then is freed when the refcount hits 0 instead of waiting to be released by releasepage. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-03-26 16:51:08 -04:00
Josef Bacik	4f2de97ace	Btrfs: set page->private to the eb We spend a lot of time looking up extent buffers from pages when we could just store the pointer to the eb the page is associated with in page->private. This patch does just that, and it makes things a little simpler and reduces a bit of CPU overhead involved with doing metadata IO. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-03-26 16:51:07 -04:00
Chris Mason	727011e07c	Btrfs: allow metadata blocks larger than the page size A few years ago the btrfs code to support blocks larger than the page size was disabled to fix a few corner cases in the page cache handling. This fixes the code to properly support large metadata blocks again. Since current kernels will crash early and often with larger metadata blocks, this adds an incompat bit so that older kernels can't mount it. This also does away with different blocksizes for nodes and leaves. You get a single block size for all tree blocks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-26 16:50:37 -04:00
Jeff Mahoney	3fbe5c02ae	btrfs: split extent_state ops set_extent_bit can do exclusive locking but only when called by lock_extent*, Drop the exclusive bits argument except when called by lock_extent. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:35 +01:00
Jeff Mahoney	d0082371cf	btrfs: drop gfp_t from lock_extent lock_extent and unlock_extent are always called with GFP_NOFS, drop the argument and use GFP_NOFS consistently. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:35 +01:00
Jeff Mahoney	143bede527	btrfs: return void in functions without error conditions Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:34 +01:00
Jeff Mahoney	87826df0ec	btrfs: delalloc for page dirtied out-of-band in fixup worker We encountered an issue that was easily observable on s/390 systems but could really happen anywhere. The timing just seemed to hit reliably on s/390 with limited memory. The gist is that when an unexpected set_page_dirty() happened, we'd run into the BUG() in btrfs_writepage_fixup_worker since it wasn't properly set up for delalloc. This patch does the following: - Performs the missing delalloc in the fixup worker - Allow the start hook to return -EBUSY which informs __extent_writepage that it should mark the page skipped and not to redirty it. This is required since the fixup worker can fail with -ENOSPC and the page will have already been redirtied. That causes an Oops in drop_outstanding_extents later. Retrying the fixup worker could lead to an infinite loop. Deferring the page redirty also saves us some cycles since the page would be stuck in a resubmit-redirty loop until the fixup worker completes. It's not harmful, just wasteful. - If the fixup worker fails, we mark the page and mapping as errored, and end the writeback, similar to what we would do had the page actually been submitted to writeback. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-02-15 16:40:25 +01:00
Arne Jansen	5b25f70f42	Btrfs: add nested locking mode for paths This patch adds the possibility to read-lock an extent even if it is already write-locked from the same thread. btrfs_find_all_roots() needs this capability. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-01-04 16:12:29 +01:00
Jan Schmidt	32240a913d	btrfs: mirror_num should be int, not u64 My previous patch introduced some u64 for failed_mirror variables, this one makes it consistent again. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-20 07:42:14 -05:00
Chris Mason	806468f8bf	Merge git://git.jan-o-sch.net/btrfs-unstable into integration Conflicts: fs/btrfs/Makefile fs/btrfs/extent_io.c fs/btrfs/extent_io.h fs/btrfs/scrub.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-06 03:07:10 -05:00
Chris Mason	531f4b1ae5	Merge branch 'for-chris' of git://github.com/sensille/linux into integration Conflicts: fs/btrfs/ctree.h Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-06 03:05:08 -05:00
Chris Mason	01d658f2ca	Btrfs: make sure to flush queued bios if write_cache_pages waits write_cache_pages tries to build up a large bio to stuff down the pipe. But if it needs to wait for a page lock, it needs to make sure and send down any pending writes so we don't deadlock with anyone who has the page lock and is waiting for writeback of things inside the bio. Dave Sterba triggered this as a deadlock between the autodefrag code and the extent write_cache_pages Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-06 03:03:48 -05:00
Josef Bacik	1728366efa	Btrfs: stop using write_one_page While looking for a performance regression a user was complaining about, I noticed that we had a regression with the varmail test of filebench. This was introduced by `0d10ee2e6d` which keeps us from calling writepages in writepage. This is a correct change, however it happens to help the varmail test because we write out in larger chunks. This is largely to do with how we write out dirty pages for each transaction. If you run filebench with load varmail set $dir=/mnt/btrfs-test run 60 prior to this patch you would get ~1420 ops/second, but with the patch you get ~1200 ops/second. This is a 16% decrease. So since we know the range of dirty pages we want to write out, don't write out in one page chunks, write out in ranges. So to do this we call filemap_fdatawrite_range() on the range of bytes. Then we convert the DIRTY extents to NEED_WAIT extents. When we then call btrfs_wait_marked_extents() we only have to filemap_fdatawait_range() on that range and clear the NEED_WAIT extents. This doesn't get us back to our original speeds, but I've been seeing ~1380 ops/second, which is a <5% regression as opposed to a >15% regression. That is acceptable given that the original commit greatly reduces our latency to begin with. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:48 -04:00

116 Commits