linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-08 05:01:48 +00:00

Author	SHA1	Message	Date
Chris Mason	dc7fdde39e	Btrfs: reduce lock contention during extent insertion We're spending huge amounts of time on lock contention during end_io processing because we unconditionally assume we are overwriting an existing extent in the file for each IO. This checks to see if we are outside i_size, and if so, it uses a less expensive readonly search of the btree to look for existing extents. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-04-27 14:51:05 -04:00
Jeff Mahoney	79787eaab4	btrfs: replace many BUG_ONs with proper error handling btrfs currently handles most errors with BUG_ON. This patch is a work-in- progress but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 11:52:54 +01:00
Jeff Mahoney	d0082371cf	btrfs: drop gfp_t from lock_extent lock_extent and unlock_extent are always called with GFP_NOFS, drop the argument and use GFP_NOFS consistently. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:35 +01:00
Linus Torvalds	855a85f704	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Quoth Chris: "This is later than I wanted because I got backed up running through btrfs bugs from the Oracle QA teams. But they are all bug fixes that we've queued and tested since rc1. Nothing in particular stands out, this just reflects bug fixing and QA done in parallel by all the btrfs developers. The most user visible of these is: Btrfs: clear the extent uptodate bits during parent transid failures Because that helps deal with out of date drives (say an iscsi disk that has gone away and come back). The old code wasn't always properly retrying the other mirror for this type of failure." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits) Btrfs: fix compiler warnings on 32 bit systems Btrfs: increase the global block reserve estimates Btrfs: clear the extent uptodate bits during parent transid failures Btrfs: add extra sanity checks on the path names in btrfs_mksubvol Btrfs: make sure we update latest_bdev Btrfs: improve error handling for btrfs_insert_dir_item callers Btrfs: be less strict on finding next node in clear_extent_bit Btrfs: fix a bug on overcommit stuff Btrfs: kick out redundant stuff in convert_extent_bit Btrfs: skip states when they does not contain bits to clear Btrfs: check return value of lookup_extent_mapping() correctly Btrfs: fix deadlock on page lock when doing auto-defragment Btrfs: fix return value check of extent_io_ops btrfs: honor umask when creating subvol root btrfs: silence warning in raid array setup btrfs: fix structs where bitfields and spinlock/atomic share 8B word btrfs: delalloc for page dirtied out-of-band in fixup worker Btrfs: fix memory leak in load_free_space_cache() btrfs: don't check DUP chunks twice Btrfs: fix trim 0 bytes after a device delete ...	2012-02-24 09:02:53 -08:00
Jeff Liu	6af021d8fc	Btrfs: return the internal error unchanged if btrfs_get_extent_fiemap() call failed for SEEK_DATA/SEEK_HOLE inquiry Given that ENXIO only means "offset beyond EOF" for either SEEK_DATA or SEEK_HOLE inquiry in a desired file range, so we should return the internal error unchanged if btrfs_get_extent_fiemap() call failed, rather than ENXIO. Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com>	2012-02-15 16:40:23 +01:00
Chris Mason	d98456fcaf	Btrfs: don't reserve data with extents locked in btrfs_fallocate btrfs_fallocate tries to allocate space only if ranges in the file don't already exist. But the enospc checks it does are not allowed with extents locked. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-31 20:27:41 -05:00
Linus Torvalds	f9156c7288	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits) Btrfs: use larger system chunks Btrfs: add a delalloc mutex to inodes for delalloc reservations Btrfs: space leak tracepoints Btrfs: protect orphan block rsv with spin_lock Btrfs: add allocator tracepoints Btrfs: don't call btrfs_throttle in file write Btrfs: release space on error in page_mkwrite Btrfs: fix btrfsck error 400 when truncating a compressed Btrfs: do not use btrfs_end_transaction_throttle everywhere Btrfs: add balance progress reporting Btrfs: allow for resuming restriper after it was paused Btrfs: allow for canceling restriper Btrfs: allow for pausing restriper Btrfs: add skip_balance mount option Btrfs: recover balance on mount Btrfs: save balance parameters to disk Btrfs: soft profile changing mode (aka soft convert) Btrfs: implement online profile changing Btrfs: do not reduce profile in do_chunk_alloc() Btrfs: virtual address space subset filter ... Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new mnt_drop_write_file() helper.	2012-01-17 15:49:54 -08:00
Josef Bacik	45a8090e62	Btrfs: don't call btrfs_throttle in file write Btrfs_throttle will make us wait if there is a currently committing transaction until we can open new transactions, which is ridiculous since we don't actually start any transactions within the file write path anyway, so all this does is introduce big latencies if we have a sync/fsync heavy workload going on while somebody else is trying to do work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-16 15:28:55 -05:00
Chris Mason	9785dbdf26	Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into integration	2012-01-16 15:26:31 -05:00
Linus Torvalds	001a541ea9	Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c writeback: balanced_rate cannot exceed write bandwidth writeback: do strict bdi dirty_exceeded writeback: avoid tiny dirty poll intervals writeback: max, min and target dirty pause time writeback: dirty ratelimit - think time compensation btrfs: fix dirtied pages accounting on sub-page writes writeback: fix dirtied pages accounting on redirty writeback: fix dirtied pages accounting on sub-page writes writeback: charge leaked page dirties to active tasks writeback: Include all dirty inodes in background writeback	2012-01-10 16:59:59 -08:00
Johannes Weiner	e3a41a5ba9	btrfs: pass __GFP_WRITE for buffered write page allocations Tell the page allocator that pages allocated for a buffered write are expected to become dirty soon. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:44 -08:00
Arne Jansen	66d7e7f09f	Btrfs: mark delayed refs as for cow Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value from every call site. The for_cow parameter will later on be used to determine if a ref will change anything with respect to qgroups. Delayed refs coming from relocation are always counted as for_cow, as they don't change subvol quota. Also pass in the fs_info for later use. btrfs_find_all_roots() will use this as an optimization, as changes that are for_cow will not change anything with respect to which root points to a certain leaf. Thus, we don't need to add the current sequence number to those delayed refs. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2011-12-22 16:22:27 +01:00
Wu Fengguang	32c7f202a4	btrfs: fix dirtied pages accounting on sub-page writes When doing 1KB sequential writes to the same page, balance_dirty_pages_ratelimited_nr() should be called once instead of 4 times, the latter makes the dirtier tasks be throttled much too heavy. Fix it with proper de-accounting on clear_page_dirty_for_io(). CC: Chris Mason <chris.mason@oracle.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-12-18 14:20:25 +08:00
Linus Torvalds	c9a7fe9672	Merge branches 'for-linus' and 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: unplug every once and a while Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code Btrfs: only set cache_generation if we setup the block group Btrfs: don't panic if orphan item already exists Btrfs: fix leaked space in truncate Btrfs: fix how we do delalloc reservations and how we free reservations on error Btrfs: deal with enospc from dirtying inodes properly Btrfs: fix num_workers_starting bug and other bugs in async thread BTRFS: Establish i_ops before calling d_instantiate Btrfs: add a cond_resched() into the worker loop Btrfs: fix ctime update of on-disk inode btrfs: keep orphans for subvolume deletion Btrfs: fix inaccurate available space on raid0 profile Btrfs: fix wrong disk space information of the files Btrfs: fix wrong i_size when truncating a file to a larger size Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror * 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: lower the dirty balance poll interval	2011-12-16 12:15:50 -08:00
Wu Fengguang	142349f541	btrfs: lower the dirty balance poll interval Tests show that the original large intervals can easily make the dirty limit exceeded on 100 concurrent dd's. So adapt to as large as the next check point selected by the dirty throttling algorithm. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-12-16 12:32:57 -05:00
Josef Bacik	22c44fe65a	Btrfs: deal with enospc from dirtying inodes properly Now that we're properly keeping track of delayed inode space we've been getting a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is because a bunch of people call mark_inode_dirty, which is void so we can't return ENOSPC. This needs to be fixed in a few areas 1) file_update_time - this updates the mtime and such when writing to a file, which will call mark_inode_dirty. So copy file_update_time into btrfs so we can call btrfs_dirty_inode directly and return an error if we get one appropriately. 2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't setting ->setattr for symlinks, even though we should have been. This catches one of the cases where we were getting errors in mark_inode_dirty. 3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly instead of mark_inode_dirty. This lets us return errors properly for truncate and chown/anything related to setattr. 4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and print an error if we have one. The only remaining user we can't control for this is touch_atime(), but we don't really want to keep people from walking down the tree if we don't have space to save the atime update, so just complain but don't worry about it. With this patch xfstests 83 complains a handful of times instead of hundreds of times. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-12-15 11:04:21 -05:00
Linus Torvalds	6a6662ced4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits) Btrfs: check for a null fs root when writing to the backup root log Btrfs: fix race during transaction joins Btrfs: fix a potential btrfs_bio leak on scrub fixups Btrfs: rename btrfs_bio multi -> bbio for consistency Btrfs: stop leaking btrfs_bios on readahead Btrfs: stop the readahead threads on failed mount Btrfs: fix extent_buffer leak in the metadata IO error handling Btrfs: fix the new inspection ioctls for 32 bit compat Btrfs: fix delayed insertion reservation Btrfs: ClearPageError during writepage and clean_tree_block Btrfs: be smarter about committing the transaction in reserve_metadata_bytes Btrfs: make a delayed_block_rsv for the delayed item insertion Btrfs: add a log of past tree roots btrfs: separate superblock items out of fs_info Btrfs: use the global reserve when truncating the free space cache inode Btrfs: release metadata from global reserve if we have to fallback for unlink Btrfs: make sure to flush queued bios if write_cache_pages waits Btrfs: fix extent pinning bugs in the tree log Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN Btrfs: don't wait as long for more batches during SSD log commit ...	2011-11-06 20:03:41 -08:00
Andi Kleen	ef3d0fd27e	vfs: do (nearly) lockless generic_file_llseek The i_mutex lock use of generic _file_llseek hurts. Independent processes accessing the same file synchronize over a single lock, even though they have no need for synchronization at all. Under high utilization this can cause llseek to scale very poorly on larger systems. This patch does some rethinking of the llseek locking model: First the 64bit f_pos is not necessarily atomic without locks on 32bit systems. This can already cause races with read() today. This was discussed on linux-kernel in the past and deemed acceptable. The patch does not change that. Let's look at the different seek variants: SEEK_SET: Doesn't really need any locking. If there's a race one writer wins, the other loses. For 32bit the non atomic update races against read() stay the same. Without a lock they can also happen against write() now. The read() race was deemed acceptable in past discussions, and I think if it's ok for read it's ok for write too. => Don't need a lock. SEEK_END: This behaves like SEEK_SET plus it reads the maximum size too. Reading the maximum size would have the 32bit atomic problem. But luckily we already have a way to read the maximum size without locking (i_size_read), so we can just use that instead. Without i_mutex there is no synchronization with write() anymore, however since the write() update is atomic on 64bit it just behaves like another racy SEEK_SET. On non atomic 32bit it's the same as SEEK_SET. => Don't need a lock, but need to use i_size_read() SEEK_CUR: This has a read-modify-write race window on the same file. One could argue that any application doing unsynchronized seeks on the same file is already broken. But for the sake of not adding a regression here I'm using the file->f_lock to synchronize this. Using this lock is much better than the inode mutex because it doesn't synchronize between processes. => So still need a lock, but can use a f_lock. This patch implements this new scheme in generic_file_llseek. I dropped generic_file_llseek_unlocked and changed all callers. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-10-28 14:58:58 +02:00
Josef Bacik	3b16a4e3c3	Btrfs: use the inode's mapping mask for allocating pages Johannes pointed out we were allocating only kernel pages for doing writes, which is kind of a big deal if you are on 32bit and have more than a gig of ram. So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we don't re-enter. Thanks, Reported-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:45 -04:00
Josef Bacik	1b9c332b6c	Btrfs: only reserve space in fallocate if we have to do a preallocate Lukas found a problem where if he tries to fallocate over the same region twice and the first fallocate took up all the space we would fail with ENOSPC. This is because we reserve the total space we want to use for fallocate, regardless of wether or not we will have to actually preallocate. So instead move the check into the loop where we actually have to do the preallocate. Thanks, Tested-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:36 -04:00
Linus Torvalds	7fd21be75d	Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux * 'btrfs-3.0' of git://github.com/chrismason/linux: Btrfs: force a page fault if we have a shorty copy on a page boundary	2011-10-03 12:17:44 -07:00
Josef Bacik	b6316429af	Btrfs: force a page fault if we have a shorty copy on a page boundary A user reported a problem where ceph was getting into 100% cpu usage while doing some writing. It turns out it's because we were doing a short write on a not uptodate page, which means we'd fall back at one page at a time and fault the page in. The problem is our position is on the page boundary, so our fault in logic wasn't actually reading the page, so we'd just spin forever or until the page got read in by somebody else. This will force a readpage if we end up doing a short copy. Alexandre could reproduce this easily with ceph and reports it fixes his problem. I also wrote a reproducer that no longer hangs my box with this patch. Thanks, Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-30 15:23:54 -04:00
Jeff Liu	48802c8ae2	BTRFS: Fix lseek return value for error The recent reworking of btrfs' lseek lead to incorrect values being returned. This adds checks for seeking beyond EOF in SEEK_HOLE and makes sure the error values come back correct. Andi Kleen also sent in similar patches. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:34:02 -04:00
Linus Torvalds	0b001b2eda	Merge branch 'for-linus' of git://github.com/chrismason/linux * 'for-linus' of git://github.com/chrismason/linux: Btrfs: add dummy extent if dst offset excceeds file end in Btrfs: calc file extent num_bytes correctly in file clone btrfs: xattr: fix attribute removal Btrfs: fix wrong nbytes information of the inode Btrfs: fix the file extent gap when doing direct IO Btrfs: fix unclosed transaction handle in btrfs_cont_expand Btrfs: fix misuse of trans block rsv Btrfs: reset to appropriate block rsv after orphan operations Btrfs: skip locking if searching the commit root in csum lookup btrfs: fix warning in iput for bad-inode Btrfs: fix an oops when deleting snapshots	2011-09-12 11:47:49 -07:00
Miao Xie	0c1a98c814	Btrfs: fix the file extent gap when doing direct IO When we write some data to the place that is beyond the end of the file in direct I/O mode, a data hole will be created. And Btrfs should insert a file extent item that point to this hole into the fs tree. But unfortunately Btrfs forgets doing it. The following is a simple way to reproduce it: # mkfs.btrfs /dev/sdc2 # mount /dev/sdc2 /test4 # touch /test4/a # dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc # umount /test4 # btrfsck /dev/sdc2 root 5 inode 257 errors 100 Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:24 -04:00
Chris Mason	81d86e1b70	Merge branch 'btrfs-3.0' into for-linus	2011-08-18 10:38:03 -04:00
Josef Bacik	f1e490a7eb	Btrfs: set i_size properly when fallocating and we already xfstests exposed a problem with preallocate when it fallocates a range that already has an extent. We don't set the new i_size properly because we see that we already have an extent. This isn't right and we should update i_size if the space already exists. With this patch we now pass xfstests 075. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-18 10:36:39 -04:00
Dan Carpenter	9a4327ca1f	btrfs: unlock on error in btrfs_file_llseek() There were some unlocks on error missing in a recent patch to btrfs_file_llseek(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-18 10:16:05 -04:00
Dan Carpenter	f4ac904c41	btrfs: memory leak in btrfs_add_inode_defrag() We don't use the defrag struct on this path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:15 -04:00
Linus Torvalds	ed8f37370d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (31 commits) Btrfs: don't call writepages from within write_full_page Btrfs: Remove unused variable 'last_index' in file.c Btrfs: clean up for find_first_extent_bit() Btrfs: clean up for wait_extent_bit() Btrfs: clean up for insert_state() Btrfs: remove unused members from struct extent_state Btrfs: clean up code for merging extent maps Btrfs: clean up code for extent_map lookup Btrfs: clean up search_extent_mapping() Btrfs: remove redundant code for dir item lookup Btrfs: make acl functions really no-op if acl is not enabled Btrfs: remove remaining ref-cache code Btrfs: remove a BUG_ON() in btrfs_commit_transaction() Btrfs: use wait_event() Btrfs: check the nodatasum flag when writing compressed files Btrfs: copy string correctly in INO_LOOKUP ioctl Btrfs: don't print the leaf if we had an error btrfs: make btrfs_set_root_node void Btrfs: fix oops while writing data to SSD partitions Btrfs: Protect the readonly flag of block group ... Fix up trivial conflicts (due to acl and writeback cleanups) in - fs/btrfs/acl.c - fs/btrfs/ctree.h - fs/btrfs/extent_io.c	2011-08-02 21:14:05 -10:00
Mitch Harder	341d14f161	Btrfs: Remove unused variable 'last_index' in file.c The variable 'last_index' is calculated in the __btrfs_buffered_write function and passed as a parameter to the prepare_pages function, but is not used anywhere in the prepare_pages function. Remove instances of 'last_index' in these functions. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:32:39 -04:00
Wanlong Gao	a0f98dde11	Btrfs:don't check the return value of __btrfs_add_inode_defrag Don't need to check the return value of __btrfs_add_inode_defrag(), since it will always return 0. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:41 -04:00
Chris Mason	b43b31bdf2	Merge branch 'alloc_path' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/btrfs-error-handling into for-linus	2011-08-01 14:27:34 -04:00
Linus Torvalds	22712200e1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors Btrfs: use the commit_root for reading free_space_inode crcs Btrfs: reduce extent_state lock contention for metadata Btrfs: remove lockdep magic from btrfs_next_leaf Btrfs: make a lockdep class for each root Btrfs: switch the btrfs tree locks to reader/writer Btrfs: fix deadlock when throttling transactions Btrfs: stop using highmem for extent_buffers Btrfs: fix BUG_ON() caused by ENOSPC when relocating space Btrfs: tag pages for writeback in sync Btrfs: fix enospc problems with delalloc Btrfs: don't flush delalloc arbitrarily Btrfs: use find_or_create_page instead of grab_cache_page Btrfs: use a worker thread to do caching Btrfs: fix how we merge extent states and deal with cached states Btrfs: use the normal checksumming infrastructure for free space cache Btrfs: serialize flushers in reserve_metadata_bytes Btrfs: do transaction space reservation before joining the transaction Btrfs: try to only do one btrfs_search_slot in do_setxattr	2011-07-27 16:43:52 -07:00
Josef Bacik	9e0baf60de	Btrfs: fix enospc problems with delalloc So I had this brilliant idea to use atomic counters for outstanding and reserved extents, but this turned out to be a bad idea. Consider this where we have 1 outstanding extent and 1 reserved extent Reserver Releaser atomic_dec(outstanding) now 0 atomic_read(outstanding)+1 get 1 atomic_read(reserved) get 1 don't actually reserve anything because they are the same atomic_cmpxchg(reserved, 1, 0) atomic_inc(outstanding) atomic_add(0, reserved) free reserved space for 1 extent Then the reserver now has no actual space reserved for it, and when it goes to finish the ordered IO it won't have enough space to do it's allocation and you get those lovely warnings. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:44 -04:00
Josef Bacik	a94733d0bc	Btrfs: use find_or_create_page instead of grab_cache_page grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we need GFP_NOFS so we don't deadlock. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-07-27 12:46:43 -04:00
Josef Bacik	02c24a8218	fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:59 -04:00
Josef Bacik	b26751575a	Btrfs: implement our own ->llseek In order to handle SEEK_HOLE/SEEK_DATA we need to implement our own llseek. Basically for the normal SEEK_*'s we will just defer to the generic helper, and for SEEK_HOLE/SEEK_DATA we will use our fiemap helper to figure out the nearest hole or data. Currently this helper doesn't check for delalloc bytes for prealloc space, so for now treat prealloc as data until that is fixed. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:56 -04:00
Mark Fasheh	d8926bb3ba	btrfs: don't BUG_ON btrfs_alloc_path() errors This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation failure. All the sites that are fixed in this patch were checked by me to be fairly trivial to fix because of at least one of two criteria: - Callers of the function catch errors from it already so bubbling the error up will be handled. - Callers of the function might BUG_ON any nonzero return code in which case there is no behavior changed (but we still got to remove a BUG_ON) The following functions were updated: btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group, btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written, btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink, insert_reserved_file_extent, and run_delalloc_nocow Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-14 14:14:44 -07:00
David Sterba	7841cb2898	btrfs: add helper for fs_info->closing wrap checking of filesystem 'closing' flag and fix a few missing memory barriers. Signed-off-by: David Sterba <dsterba@suse.cz>	2011-06-04 08:11:22 -04:00
David Sterba	a4689d2bd3	btrfs: use btrfs_ino to access inode number commit `4cb5300bc` ("Btrfs: add mount -o auto_defrag") accesses inode number directly while it should use the helper with the new inode number allocator. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:46 -04:00
Chris Mason	ff5714cca9	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus Conflicts: fs/btrfs/disk-io.c fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-28 07:00:39 -04:00
Chris Mason	4cb5300bc8	Btrfs: add mount -o auto_defrag This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-26 17:52:15 -04:00
Josef Bacik	a4abeea41a	Btrfs: kill trans_mutex We use trans_mutex for lots of things, here's a basic list 1) To serialize trans_handles joining the currently running transaction 2) To make sure that no new trans handles are started while we are committing 3) To protect the dead_roots list and the transaction lists Really the serializing trans_handles joining is not too hard, and can really get bogged down in acquiring a reference to the transaction. So replace the trans_mutex with a trans_lock spinlock and use it to do the following 1) Protect fs_info->running_transaction. All trans handles have to do is check this, and then take a reference of the transaction and keep on going. 2) Protect the fs_info->trans_list. This doesn't get used too much, basically it just holds the current transactions, which will usually just be the currently committing transaction and the currently running transaction at most. 3) Protect the dead roots list. This is only ever processed by splicing the list so this is relatively simple. 4) Protect the fs_info->reloc_ctl stuff. This is very lightweight and was using the trans_mutex before, so this is a pretty straightforward change. 5) Protect fs_info->no_trans_join. Because we don't hold the trans_lock over the entirety of the commit we need to have a way to block new people from creating a new transaction while we're doing our work. So we set no_trans_join and in join_transaction we test to see if that is set, and if it is we do a wait_on_commit. 6) Make the transaction use count atomic so we don't need to take locks to modify it when we're dropping references. 7) Add a commit_lock to the transaction to make sure multiple people trying to commit the same transaction don't race and commit at the same time. 8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl trans. I have tested this with xfstests, but obviously it is a pretty hairy change so lots of testing is greatly appreciated. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:00:57 -04:00
Chris Mason	945d8962ce	Merge branch 'cleanups' of git://repo.or.cz/linux-2.6/btrfs-unstable into inode_numbers Conflicts: fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/tree-log.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-22 12:33:42 -04:00
David Sterba	b3b4aa74b5	btrfs: drop unused parameter from btrfs_release_path parameter tree root it's not used since commit `5f39d397df` ("Btrfs: Create extent_buffer interface for large blocksizes") Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:22 +02:00
David Sterba	172ddd60a6	btrfs: drop gfp parameter from alloc_extent_map pass GFP_NOFS directly to kmem_cache_alloc Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:21 +02:00
David Sterba	c704005d88	btrfs: unify checking of IS_ERR and null use IS_ERR_OR_NULL when possible, done by this coccinelle script: @ match @ identifier id; @@ ( - BUG_ON(IS_ERR(id) \|\| !id); + BUG_ON(IS_ERR_OR_NULL(id)); \| - IS_ERR(id) \|\| !id + IS_ERR_OR_NULL(id) \| - !id \|\| IS_ERR(id) + IS_ERR_OR_NULL(id) ) Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:20 +02:00
Li Zefan	33345d0152	Btrfs: Always use 64bit inode number There's a potential problem in 32bit system when we exhaust 32bit inode numbers and start to allocate big inode numbers, because btrfs uses inode->i_ino in many places. So here we always use BTRFS_I(inode)->location.objectid, which is an u64 variable. There are 2 exceptions that BTRFS_I(inode)->location.objectid != inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2), and inode->i_ino will be used in those cases. Another reason to make this change is I'm going to use a special inode to save free ino cache, and the inode number must be > (u64)-256. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-04-25 16:46:09 +08:00
Josef Bacik	be1a12a0df	Btrfs: deal with the case that we run out of space in the cache Currently we don't handle running out of space in the cache, so to fix this we keep track of how far in the cache we are. Then we only dirty the pages if we successfully modify all of them, otherwise if we have an error or run out of space we can just drop them and not worry about the vm writing them out. Thanks, Tested-by Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-04-08 13:00:27 -04:00

1 2 3 4 5

249 Commits