linux

Author	SHA1	Message	Date
Dave Chinner	cd4a3c503c	xfs: clean up code layout in xfs_trans_ail.c This patch rearranges the location of functions in xfs_trans_ail.c to remove the need for forward declarations of those functions in preparation for adding new functions without the need for forward declarations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-04-08 12:45:07 +10:00
Dave Chinner	0bf6a5bd4b	xfs: convert the xfsaild threads to a workqueue Similar to the xfssyncd, the per-filesystem xfsaild threads can be converted to a global workqueue and run periodically by delayed works. This makes sense for the AIL pushing because it uses variable timeouts depending on the work that needs to be done. By removing the xfsaild, we simplify the AIL pushing code and remove the need to spread the code to implement the threading and pushing across multiple files. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-04-08 12:45:07 +10:00
Dave Chinner	a7b339f1b8	xfs: introduce background inode reclaim work Background inode reclaim needs to run more frequently that the XFS syncd work is run as 30s is too long between optimal reclaim runs. Add a new periodic work item to the xfs syncd workqueue to run a fast, non-blocking inode reclaim scan. Background inode reclaim is kicked by the act of marking inodes for reclaim. When an AG is first marked as having reclaimable inodes, the background reclaim work is kicked. It will continue to run periodically untill it detects that there are no more reclaimable inodes. It will be kicked again when the first inode is queued for reclaim. To ensure shrinker based inode reclaim throttles to the inode cleaning and reclaim rate but still reclaim inodes efficiently, make it kick the background inode reclaim so that when we are low on memory we are trying to reclaim inodes as efficiently as possible. This kick shoul d not be necessary, but it will protect against failures to kick the background reclaim when inodes are first dirtied. To provide the rate throttling, make the shrinker pass do synchronous inode reclaim so that it blocks on inodes under IO. This means that the shrinker will reclaim inodes rather than just skipping over them, but it does not adversely affect the rate of reclaim because most dirty inodes are already under IO due to the background reclaim work the shrinker kicked. These two modifications solve one of the two OOM killer invocations Chris Mason reported recently when running a stress testing script. The particular workload trigger for the OOM killer invocation is where there are more threads than CPUs all unlinking files in an extremely memory constrained environment. Unlike other solutions, this one does not have a performance impact on performance when memory is not constrained or the number of concurrent threads operating is <= to the number of CPUs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-04-08 12:45:07 +10:00
Dave Chinner	89e4cb550a	xfs: convert ENOSPC inode flushing to use new syncd workqueue On of the problems with the current inode flush at ENOSPC is that we queue a flush per ENOSPC event, regardless of how many are already queued. Thi can result in hundreds of queued flushes, most of which simply burn CPU scanned and do no real work. This simply slows down allocation at ENOSPC. We really only need one active flush at a time, and we can easily implement that via the new xfs_syncd_wq. All we need to do is queue a flush if one is not already active, then block waiting for the currently active flush to complete. The result is that we only ever have a single ENOSPC inode flush active at a time and this greatly reduces the overhead of ENOSPC processing. On my 2p test machine, this results in tests exercising ENOSPC conditions running significantly faster - 042 halves execution time, 083 drops from 60s to 5s, etc - while not introducing test regressions. This allows us to remove the old xfssyncd threads and infrastructure as they are no longer used. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-04-08 12:45:07 +10:00
Dave Chinner	c6d09b666d	xfs: introduce a xfssyncd workqueue All of the work xfssyncd does is background functionality. There is no need for a thread per filesystem to do this work - it can al be managed by a global workqueue now they manage concurrency effectively. Introduce a new gglobal xfssyncd workqueue, and convert the periodic work to use this new functionality. To do this, use a delayed work construct to schedule the next running of the periodic sync work for the filesystem. When the sync work is complete, queue a new delayed work for the next running of the sync work. For laptop mode, we wait on completion for the sync works, so ensure that the sync work queuing interface can flush and wait for work to complete to enable the work queue infrastructure to replace the current sequence number and wakeup that is used. Because the sync work does non-trivial amounts of work, mark the new work queue as CPU intensive. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-04-08 12:45:07 +10:00
Dave Chinner	e828776a8a	xfs: fix extent format buffer allocation size When formatting an inode item, we have to allocate a separate buffer to hold extents when there are delayed allocation extents on the inode and it is in extent format. The allocation size is derived from the in-core data fork representation, which accounts for delayed allocation extents, while the on-disk representation does not contain any delalloc extents. As a result of this mismatch, the allocated buffer can be far larger than needed to hold the real extent list which, due to the fact the inode is in extent format, is limited to the size of the literal area of the inode. However, we can have thousands of delalloc extents, resulting in an allocation size orders of magnitude larger than is needed to hold all the real extents. Fix this by limiting the size of the buffer being allocated to the size of the literal area of the inodes in the filesystem (i.e. the maximum size an inode fork can grow to). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-04-08 12:45:07 +10:00
Dave Chinner	89b3600ccf	xfs: fix unreferenced var error in xfs_buf.c Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-03-30 23:34:20 -05:00
Dave Chinner	0444d76ae6	fs: don't use igrab() while holding i_lock Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph‥ If we are already holding the i_lock, we have a reference to the inode so we can safely use ihold() to gain an extra reference. This avoids hangs due to lock recursion on the i_lock now that the inode_lock is gone and igrab() uses the i_lock itself. Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Ryan Mallon <ryan@bluewatersys.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-29 07:50:34 -07:00
Linus Torvalds	c5850150d0	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: stop using the page cache to back the buffer cache xfs: register the inode cache shrinker before quotachecks xfs: xfs_trans_read_buf() should return an error on failure xfs: introduce inode cluster buffer trylocks for xfs_iflush vmap: flush vmap aliases when mapping fails xfs: preallocation transactions do not need to be synchronous Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_buf.c due to plug removal.	2011-03-28 15:51:02 -07:00
Linus Torvalds	7f5fe3ec8e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: eCryptfs: write lock requested keys eCryptfs: move ecryptfs_find_auth_tok_for_sig() call before mutex_lock eCryptfs: verify authentication tokens before their use eCryptfs: modified size of keysig in the ecryptfs_key_sig structure eCryptfs: removed num_global_auth_toks from ecryptfs_mount_crypt_stat eCryptfs: ecryptfs_keyring_auth_tok_for_sig() bug fix eCryptfs: Unlock page in write_begin error path ecryptfs: modify write path to encrypt page in writepage eCryptfs: Remove ECRYPTFS_NEW_FILE crypt stat flag eCryptfs: Remove unnecessary grow_file() function	2011-03-28 15:43:25 -07:00
Linus Torvalds	212a17ab87	Merge branch 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (45 commits) Btrfs: fix __btrfs_map_block on 32 bit machines btrfs: fix possible deadlock by clearing __GFP_FS flag btrfs: check link counter overflow in link(2) btrfs: don't mess with i_nlink of unlocked inode in rename() Btrfs: check return value of btrfs_alloc_path() Btrfs: fix OOPS of empty filesystem after balance Btrfs: fix memory leak of empty filesystem after balance Btrfs: fix return value of setflags ioctl Btrfs: fix uncheck memory allocations btrfs: make inode ref log recovery faster Btrfs: add btrfs_trim_fs() to handle FITRIM Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP Btrfs: make update_reserved_bytes() public btrfs: return EXDEV when linking from different subvolumes Btrfs: Per file/directory controls for COW and compression Btrfs: add datacow flag in inode flag btrfs: use GFP_NOFS instead of GFP_KERNEL Btrfs: check return value of read_tree_block() btrfs: properly access unaligned checksum buffer ... Fix up trivial conflicts in fs/btrfs/volumes.c due to plug removal in the block layer.	2011-03-28 15:31:05 -07:00
Linus Torvalds	03e4970c10	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (39 commits) Treat writes as new when holes span across page boundaries fs,ocfs2: Move o2net_get_func_run_time under CONFIG_OCFS2_FS_STATS. ocfs2/dlm: Move kmalloc() outside the spinlock ocfs2: Make the left masklogs compat. ocfs2: Remove masklog ML_AIO. ocfs2: Remove masklog ML_UPTODATE. ocfs2: Remove masklog ML_BH_IO. ocfs2: Remove masklog ML_JOURNAL. ocfs2: Remove masklog ML_EXPORT. ocfs2: Remove masklog ML_DCACHE. ocfs2: Remove masklog ML_NAMEI. ocfs2: Remove mlog(0) from fs/ocfs2/dir.c ocfs2: remove NAMEI from symlink.c ocfs2: Remove masklog ML_QUOTA. ocfs2: Remove mlog(0) from quota_local.c. ocfs2: Remove masklog ML_RESERVATIONS. ocfs2: Remove masklog ML_XATTR. ocfs2: Remove masklog ML_SUPER. ocfs2: Remove mlog(0) from fs/ocfs2/heartbeat.c ocfs2: Remove mlog(0) from fs/ocfs2/slot_map.c ... Fix up trivial conflict in fs/ocfs2/super.c	2011-03-28 13:03:31 -07:00
Goldwyn Rodrigues	272b62c1f0	Treat writes as new when holes span across page boundaries When a hole spans across page boundaries, the next write forces a read of the block. This could end up reading existing garbage data from the disk in ocfs2_map_page_blocks. This leads to non-zero holes. In order to avoid this, mark the writes as new when the holes span across page boundaries. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: jlbec <jlbec@evilplan.org>	2011-03-28 09:44:58 -07:00
Joel Becker	99bdc3880c	Merge branch 'mlog_replace_for_39' of git://repo.or.cz/taoma-kernel into ocfs2-merge-window-fix	2011-03-28 09:44:26 -07:00
Rakib Mullick	ed59992e8d	fs,ocfs2: Move o2net_get_func_run_time under CONFIG_OCFS2_FS_STATS. When CONFIG_DEBUG_FS=y and CONFIG_OCFS2_FS_STATS=n, we get the following warning: fs/ocfs2/cluster/tcp.c:213:16: warning: ‘o2net_get_func_run_time’ defined but not used Since o2net_get_func_run_time is only called from o2net_update_recv_stats, so move it under CONFIG_OCFS2_FS_STATS. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: jlbec <jlbec@evilplan.org>	2011-03-28 09:43:28 -07:00
Linus Torvalds	1788c208aa	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Ensure that rpc_release_resources_task() can be called twice. NFS: Don't leak RPC clients in NFSv4 secinfo negotiation NFS: Fix a hang in the writeback path	2011-03-28 07:52:58 -07:00
Chris Mason	d9d0487932	Btrfs: fix __btrfs_map_block on 32 bit machines Recent changes for discard support didn't compile, this fixes them not to try and % 64 bit numbers. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:59 -04:00
Miao Xie	1561deda68	btrfs: fix possible deadlock by clearing __GFP_FS flag Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's page may cause deadlock. Task1 open() ... btrfs_search_slot() ... btrfs_cow_block() ... alloc_page() wait for reclaiming shrink_slab() ... shrink_icache_memory() ... btrfs_evict_inode() ... btrfs_search_slot() If the path is locked by task1, the deadlock happens. So the btree's page cache is different with the file's page cache, it can not allocate pages by GFP_HIGHUSER_MOVABLE flag, we must clear __GFP_FS flag in GFP_HIGHUSER_MOVABLE flag. Reported-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:58 -04:00
Al Viro	c055e99eea	btrfs: check link counter overflow in link(2) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:56 -04:00
Al Viro	92986796d8	btrfs: don't mess with i_nlink of unlocked inode in rename() old_inode is not locked; it's not safe to play with its link count. Instead of bumping it and calling btrfs_unlink_inode(), add a variant of the latter that does not do btrfs_drop_nlink()/ btrfs_update_inode(), call it instead of btrfs_inc_nlink()/ btrfs_unlink_inode() and do btrfs_update_inode() ourselves. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:55 -04:00
Tsutomu Itoh	c2db1073fd	Btrfs: check return value of btrfs_alloc_path() Adding the check on the return value of btrfs_alloc_path() to several places. And, some of callers are modified by this change. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:54 -04:00
liubo	c59021f846	Btrfs: fix OOPS of empty filesystem after balance btrfs will remove unused block groups after balance. When a empty filesystem is balanced, the block group with tag "DATA" may be dropped, and after umount and mount again, it will not find "DATA" space_info and lead to OOPS. So we initial the necessary space_infos(DATA, SYSTEM, METADATA) to avoid OOPS. Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:53 -04:00
liubo	9f7c43c967	Btrfs: fix memory leak of empty filesystem after balance After Josef's patch(commit `3c14874acc`), btrfs will exclude super bytes when reading block groups(by marking a extent state UPTODATE). However, these bytes do not get freed while balance remove unused block groups, and we won't process those removed ones any more, when we do umount and unload the btrfs module, btrfs hits a memory leak. This patch add the missing free operation. Reproduce steps: $ mkfs.btrfs disk $ mount disk /mnt/btrfs -o loop $ btrfs filesystem balance /mnt/btrfs $ umount /mnt/btrfs $ rmmod btrfs Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:52 -04:00
liubo	2d4e6f6ad2	Btrfs: fix return value of setflags ioctl setflags ioctl should return error when any checks fail. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:51 -04:00
Yoshinori Sano	dac97e516c	Btrfs: fix uncheck memory allocations To make Btrfs code more robust, several return value checks where memory allocation can fail are introduced. I use BUG_ON where I don't know how to handle the error properly, which increases the number of using the notorious BUG_ON, though. Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:49 -04:00
liubo	c622ae6085	btrfs: make inode ref log recovery faster When we recover from crash via write-ahead log tree and process the inode refs, for each btrfs_inode_ref item, we will 1) check if we already have a perfect match in fs/file tree, if we have, then we're done. 2) search the corresponding back reference in fs/file tree, and check all the names in this back reference to see if they are also in the log to avoid conflict corners. 3) recover the logged inode refs to fs/file tree. In current btrfs, however, - for 2)'s check, once is enough, since the checked back reference will remain unchanged after processing all the inode refs belonged to the key. - it has no need to do another 1) between 2) and 3). I've made a small test to show how it improves, $dd if=/dev/zero of=foobar bs=4K count=1 $sync $make 100 hard links continuously, like ln foobar link_i $fsync foobar $echo b > /proc/sysrq-trigger after reboot $time mount DEV PATH without patch: real 0m0.285s user 0m0.001s sys 0m0.009s with patch: real 0m0.123s user 0m0.000s sys 0m0.010s Changelog v1->v2: - fix double free - pointed by David Sterba Changelog v2->v3: - adjust free order Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:48 -04:00
Li Dongyang	f7039b1d5c	Btrfs: add btrfs_trim_fs() to handle FITRIM We take an free extent out from allocator, trim it, then put it back, but before we trim the block group, we should make sure the block group is cached, so plus a little change to make cache_block_group() run without a transaction. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:47 -04:00
Li Dongyang	5378e60734	Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes Callers of btrfs_discard_extent() should check if we are mounted with -o discard, as we want to make fitrim to work even the fs is not mounted with -o discard. Also we should use REQ_DISCARD to map the free extent to get a full mapping, last we only return errors if 1. the error is not a EOPNOTSUPP 2. no device supports discard Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:46 -04:00
Li Dongyang	fce3bb9a1b	Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP btrfs_map_block() will only return a single stripe length, but we want the full extent be mapped to each disk when we are trimming the extent, so we add length to btrfs_bio_stripe and fill it if we are mapping for REQ_DISCARD. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:45 -04:00
Li Dongyang	b4d00d569a	Btrfs: make update_reserved_bytes() public Make the function public as we should update the reserved extents calculations after taking out an extent for trimming. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:43 -04:00
Mark Fasheh	3ab3564f01	btrfs: return EXDEV when linking from different subvolumes btrfs_link returns EPERM if a cross-subvolume link is attempted. However, in this case I believe EXDEV to be the more appropriate value. >From the link(2) man page: EXDEV oldpath and newpath are not on the same mounted file system. (Linux permits a file system to be mounted at multiple points, but link() does not work across different mount points, even if the same file system is mounted on both.) This matters because an application may have different behaviors based on return codes. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:42 -04:00
Liu Bo	75e7cb7fe0	Btrfs: Per file/directory controls for COW and compression Data compression and data cow are controlled across the entire FS by mount options right now. ioctls are needed to set this on a per file or per directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones. According to Chris's comment, there should be just one true compression method(probably LZO) stored in the super. However, before this, we would wait for that one method is stable enough to be adopted into the super. So I list it as a long term goal, and just store it in ram today. After applying this patch, we can use the generic "FS_IOC_SETFLAGS" ioctl to control file and directory's datacow and compression attribute. NOTE: - The compression type is selected by such rules: If we mount btrfs with compress options, ie, zlib/lzo, the type is it. Otherwise, we'll use the default compress type (zlib today). v1->v2: - rebase to the latest btrfs. v2->v3: - fix a problem, i.e. when a file is set NOCOW via mount option, then this NOCOW will be screwed by inheritance from parent directory. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:41 -04:00
Miao Xie	fc0e4a314e	btrfs: use GFP_NOFS instead of GFP_KERNEL In the filesystem context, we must allocate memory by GFP_NOFS, or we may start another filesystem operation and make kswap thread hang up. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:39 -04:00
Tsutomu Itoh	97d9a8a420	Btrfs: check return value of read_tree_block() This patch is checking return value of read_tree_block(), and if it is NULL, error processing. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:37 -04:00
David Sterba	7e75bf3ff3	btrfs: properly access unaligned checksum buffer On Fri, Mar 18, 2011 at 11:56:53AM -0400, Chris Mason wrote: > Thanks for fielding this one. Does put_unaligned_le32 optimize away on > platforms with efficient access? It would be great if we didn't need > the #ifdef. (quicktest: assembly output is same for put_unaligned_le32 and direct assignment on my x86_64) I was originally following examples in Documentation/unaligned-memory-access.txt. From other code it seems to me that the define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is intended for larger portions of code. Macros/wrappers for {put,get}_unaligned* are chosen via arch/<arch>/include/asm/unaligned.h accordingly, therefore it's safe to use put_unaligned_le32 without the ifdef. dave Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:36 -04:00
Tsutomu Itoh	db5b493ac7	Btrfs: cleanup some BUG_ON() This patch changes some BUG_ON() to the error return. (but, most callers still use BUG_ON()) Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:35 -04:00
liubo	1abe9b8a13	Btrfs: add initial tracepoint support for btrfs Tracepoints can provide insight into why btrfs hits bugs and be greatly helpful for debugging, e.g dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0 dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0 btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0) btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0) btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8 flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0) flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0) flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0) Here is what I have added: 1) ordere_extent: btrfs_ordered_extent_add btrfs_ordered_extent_remove btrfs_ordered_extent_start btrfs_ordered_extent_put These provide critical information to understand how ordered_extents are updated. 2) extent_map: btrfs_get_extent extent_map is used in both read and write cases, and it is useful for tracking how btrfs specific IO is running. 3) writepage: __extent_writepage btrfs_writepage_end_io_hook Pages are cirtical resourses and produce a lot of corner cases during writeback, so it is valuable to know how page is written to disk. 4) inode: btrfs_inode_new btrfs_inode_request btrfs_inode_evict These can show where and when a inode is created, when a inode is evicted. 5) sync: btrfs_sync_file btrfs_sync_fs These show sync arguments. 6) transaction: btrfs_transaction_commit In transaction based filesystem, it will be useful to know the generation and who does commit. 7) back reference and cow: btrfs_delayed_tree_ref btrfs_delayed_data_ref btrfs_delayed_ref_head btrfs_cow_block Btrfs natively supports back references, these tracepoints are helpful on understanding btrfs's COW mechanism. 8) chunk: btrfs_chunk_alloc btrfs_chunk_free Chunk is a link between physical offset and logical offset, and stands for space infomation in btrfs, and these are helpful on tracing space things. 9) reserved_extent: btrfs_reserved_extent_alloc btrfs_reserved_extent_free These can show how btrfs uses its space. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:33 -04:00
Chris Mason	240f62c875	Btrfs: use RCU instead of a spinlock to protect the root node The pointer to the extent buffer for the root of each tree is protected by a spinlock so that we can safely read the pointer and take a reference on the extent buffer. But now that the extent buffers are freed via RCU, we can safely use rcu_read_lock instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:22 -04:00
Roberto Sassu	b5695d0463	eCryptfs: write lock requested keys A requested key is write locked in order to prevent modifications on the authentication token while it is being used. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:49:43 -05:00
Roberto Sassu	950983fc04	eCryptfs: move ecryptfs_find_auth_tok_for_sig() call before mutex_lock The ecryptfs_find_auth_tok_for_sig() call is moved before the mutex_lock(s->tfm_mutex) instruction in order to avoid possible deadlocks that may occur by holding the lock on the two semaphores 'key->sem' and 's->tfm_mutex' in reverse order. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:49:42 -05:00
Roberto Sassu	0e1fc5ef47	eCryptfs: verify authentication tokens before their use Authentication tokens content may change if another requestor calls the update() method of the corresponding key. The new function ecryptfs_verify_auth_tok_from_key() retrieves the authentication token from the provided key and verifies if it is still valid before being used to encrypt or decrypt an eCryptfs file. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> [tyhicks: Minor formatting changes] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:49:41 -05:00
Roberto Sassu	7762e230fd	eCryptfs: modified size of keysig in the ecryptfs_key_sig structure The size of the 'keysig' array is incremented of one byte in order to make room for the NULL character. The 'keysig' variable is used, in the function ecryptfs_generate_key_packet_set(), to find an authentication token with the given signature and is printed a debug message if it cannot be retrieved. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:49:40 -05:00
Roberto Sassu	cf35ca6913	eCryptfs: removed num_global_auth_toks from ecryptfs_mount_crypt_stat This patch removes the 'num_global_auth_toks' field of the ecryptfs_mount_crypt_stat structure, used to count the number of items in the 'global_auth_tok_list' list. This variable is not needed because there are no checks based upon it. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:49:39 -05:00
Roberto Sassu	1821df040a	eCryptfs: ecryptfs_keyring_auth_tok_for_sig() bug fix The pointer '(*auth_tok_key)' is set to NULL in case request_key() fails, in order to prevent its use by functions calling ecryptfs_keyring_auth_tok_for_sig(). Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:49:15 -05:00
Tyler Hicks	50f198ae16	eCryptfs: Unlock page in write_begin error path Unlock the page in error path of ecryptfs_write_begin(). This may happen, for example, if decryption fails while bring the page up-to-date. Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:47:46 -05:00
Thieu Le	57db4e8d73	ecryptfs: modify write path to encrypt page in writepage Change the write path to encrypt the data only when the page is written to disk in ecryptfs_writepage. Previously, ecryptfs encrypts the page in ecryptfs_write_end which means that if there are multiple write requests to the same page, ecryptfs ends up re-encrypting that page over and over again. This patch minimizes the number of encryptions needed. Signed-off-by: Thieu Le <thieule@chromium.org> [tyhicks: Changed NULL .drop_inode sop pointer to generic_drop_inode] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:47:45 -05:00
Tyler Hicks	fed8859b3a	eCryptfs: Remove ECRYPTFS_NEW_FILE crypt stat flag Now that grow_file() is not called in the ecryptfs_create() path, the ECRYPTFS_NEW_FILE flag is no longer needed. It helped ecryptfs_readpage() know not to decrypt zeroes that were read from the lower file in the grow_file() path. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:47:44 -05:00
Tyler Hicks	bd4f0fe8bb	eCryptfs: Remove unnecessary grow_file() function When creating a new eCryptfs file, the crypto metadata is written out and then the lower file was being "grown" with 4 kB of encrypted zeroes. I suspect that growing the encrypted file was to prevent an information leak that the unencrypted file was empty. However, the unencrypted file size is stored, in plaintext, in the metadata so growing the file is unnecessary. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-03-28 01:47:43 -05:00
Linus Torvalds	a17d47300b	Merge branch 'for-linus-1' of git://git.infradead.org/mtd-2.6 * 'for-linus-1' of git://git.infradead.org/mtd-2.6: (49 commits) mtd: mtdswap: fix compilation warning mtdswap: kill strict error handling option mtd: nand: enable software BCH ECC in nand simulator mtd: nand: add software BCH ECC support mtd: fix printf format warnings, mostly lack of %zd for size_t, in mtdswap mtd: sm_rtl: check kmalloc return value mtd: cfi: add support for AMIC flashes (e.g. A29L160AT) lib: add shared BCH ECC library mtd: mxc_nand: fix OOB corruption when page size > 2KiB mtd: DaVinci: Removed header file that is not required mtd: pxa3xx_nand: clean the keep configure code mtd: pxa3xx_nand: mtd scan id process could be defined by driver itself mtd: pxa3xx_nand: unify prepare command mtd: pxa3xx_nand: discard wait_for_event,write_cmd,__readid function mtd: pxa3xx_nand: rework irq logic mtd: pxa3xx_nand: make scan procedure more clear mtd: speedtest: fix integer overflow mtd: mxc_nand: fix read past buffer end mtd: omap3: nand: report corrected ecc errors jffs2: remove a trailing white space in commentaries ...	2011-03-27 19:40:56 -07:00
Randy Dunlap	b6d0ad686d	fs: fix inode.c kernel-doc warning Fix inode.c kernel-doc fatal error: 2 comment sections have the same name: Error(fs/inode.c:1171): duplicate section name 'Note' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-27 19:30:19 -07:00

1 2 3 4 5 ...

22153 Commits