linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-03 17:41:22 +00:00

Author	SHA1	Message	Date
Al Viro	0714a53380	vfs: now it can be done - make mnt_parent point to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:05 -05:00
Al Viro	3376f34fff	vfs: mnt_parent moved to struct mount the second victim... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:04 -05:00
Al Viro	676da58df7	vfs: spread struct mount - mnt_has_parent Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:04 -05:00
Al Viro	afac7cba7e	vfs: more mnt_parent cleanups a) mount --move is checking that ->mnt_parent is non-NULL before looking if that parent happens to be shared; ->mnt_parent is never NULL and it's not even an misspelled !mnt_has_parent() b) pivot_root open-codes is_path_reachable(), poorly. c) so does path_is_under(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:36 -05:00
Al Viro	b2dba1af3c	vfs: new internal helper: mnt_has_parent(mnt) vfsmounts have ->mnt_parent pointing either to a different vfsmount or to itself; it's never NULL and termination condition in loops traversing the tree towards root is mnt == mnt->mnt_parent. At least one place (see the next patch) is confused about what's going on; let's add an explicit helper checking it right way and use it in all places where we need it. Not that there had been too many, but... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:36 -05:00
Al Viro	02125a8264	fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API __d_path() API is asking for trouble and in case of apparmor d_namespace_path() getting just that. The root cause is that when __d_path() misses the root it had been told to look for, it stores the location of the most remote ancestor in root. Without grabbing references. Sure, at the moment of call it had been pinned down by what we have in path. And if we raced with umount -l, we could have very well stopped at vfsmount/dentry that got freed as soon as prepend_path() dropped vfsmount_lock. It is safe to compare these pointers with pre-existing (and known to be still alive) vfsmount and dentry, as long as all we are asking is "is it the same address?". Dereferencing is not safe and apparmor ended up stepping into that. d_namespace_path() really wants to examine the place where we stopped, even if it's not connected to our namespace. As the result, it looked at ->d_sb->s_magic of a dentry that might've been already freed by that point. All other callers had been careful enough to avoid that, but it's really a bad interface - it invites that kind of trouble. The fix is fairly straightforward, even though it's bigger than I'd like: * prepend_path() root argument becomes const. * __d_path() is never called with NULL/NULL root. It was a kludge to start with. Instead, we have an explicit function - d_absolute_root(). Same as __d_path(), except that it doesn't get root passed and stops where it stops. apparmor and tomoyo are using it. * __d_path() returns NULL on path outside of root. The main caller is show_mountinfo() and that's precisely what we pass root for - to skip those outside chroot jail. Those who don't want that can (and do) use d_path(). * __d_path() root argument becomes const. Everyone agrees, I hope. * apparmor does NOT try to use __d_path() or any of its variants when it sees that path->mnt is an internal vfsmount. In that case it's definitely not mounted anywhere and dentry_path() is exactly what we want there. Handling of sysctl()-triggered weirdness is moved to that place. * if apparmor is asked to do pathname relative to chroot jail and __d_path() tells it we it's not in that jail, the sucker just calls d_absolute_path() instead. That's the other remaining caller of __d_path(), BTW. * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway - the normal seq_file logics will take care of growing the buffer and redoing the call of ->show() just fine). However, if it gets path not reachable from root, it returns SEQ_SKIP. The only caller adjusted (i.e. stopped ignoring the return value as it used to do). Reviewed-by: John Johansen <john.johansen@canonical.com> ACKed-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org	2011-12-06 23:57:18 -05:00
David Howells	dd179946db	VFS: Log the fact that we've given ELOOP rather than creating a loop To prevent an NFS server from being used to create a directory loop in an NFS superblock on the client, the following patch was committed: commit `1836750115` Author: Al Viro <viro@zeniv.linux.org.uk> Date: Tue Jul 12 21:42:24 2011 -0400 Subject: fix loop checks in d_materialise_unique() This causes ELOOP to be reported to anyone trying to access the dentry that would otherwise cause the kernel to complete the loop. However, no indication is given to the caller as to why an operation that ought to work doesn't. The fault is with the kernel, which doesn't want to try and solve the problem as it gets horrendously messy if there's another mountpoint somewhere in the trees being spliced that can't be moved[]. [] The real problem is that we don't handle the excision of a subtree that gets moved _out_ of what we can see. This can happen on the server where a directory is merely moved between two other dirs on the same filesystem, but where destination dir is not accessible by the client. So, given the choice to return ELOOP rather than trying to reconfigure the dentry tree, we should give the caller some indication of why they aren't being allowed to make what should be a legitimate request and log a message. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-11-20 23:04:27 -05:00
Al Viro	50e696308c	vfs: d_invalidate() should leave mountpoints alone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-07 10:54:10 -08:00
Sage Weil	f0023bc617	vfs: add d_prune dentry operation This adds a d_prune dentry operation that is called by the VFS prior to pruning (i.e. unhashing and killing) a hashed dentry from the dcache. Wrap dentry_lru_del() and use the new _prune() helper in the cases where we are about to unhash and kill the dentry. This will be used by Ceph to maintain a flag indicating whether the complete contents of a directory are contained in the dcache, allowing it to satisfy lookups and readdir without addition server communication. Renumber a few DCACHE_* #defines to group DCACHE_OP_PRUNE with the other DCACHE_OP_ bits. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-11-02 12:53:43 +01:00
Linus Torvalds	830c0f0edc	vfs: renumber DCACHE_xyz flags, remove some stale ones Gcc tends to generate better code with small integers, including the DCACHE_xyz flag tests - so move the common ones to be first in the list. Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and DCACHE_AUTOFS_PENDING values, their users no longer exists in the source tree. And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the common case to be a nice straight-line fall-through. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-06 22:52:40 -07:00
Randy Dunlap	2af1416265	fs/dcache.c: fix new kernel-doc warning Fix new kernel-doc warning in fs/dcache.c: Warning(fs/dcache.c:797): No description found for parameter 'sb' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-03 14:25:21 -10:00
David Howells	43c1c9cd24	VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock Reorganise shrink_dcache_for_umount_subtree() in light of the demise of dcache_lock. Without that dcache_lock, there is no need for the batching of removal of dentries from the system under it (we wanted to make intensive use of the locked data whilst we held it, but didn't want to hold it for long at a time). This works, provided the preceding patch is correct in its removal of locking on dentry->d_lock on the basis that no one should be locking these dentries any more as the whole superblock is defunct. With this patch, the calls to dentry_lru_del() and __d_shrink() are placed at the point where each dentry is detached handled. It is possible that, as an alternative, the batching should still be done - but only for dentry_lru_del() of all a dentry's children in one go. In such a case, the batching would be done under dcache_lru_lock. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:27:57 -04:00
David Howells	c6627c60c0	VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree() Locks of the dcache_lock were replaced by locks of dentry->d_lock in commits such as: `2304450783` `2fd6b7f507` as part of the RCU-based pathwalk changes, despite the fact that the caller (shrink_dcache_for_umount()) notes in the banner comment the reasons that d_lock is not necessary in these functions: /* * destroy the dentries attached to a superblock on unmounting * - we don't need to use dentry->d_lock because: * - the superblock is detached from all mountings and open files, so the * dentry trees will not be rearranged by the VFS * - s_umount is write-locked, so the memory pressure shrinker will ignore * any dentries belonging to this superblock that it comes across * - the filesystem itself is no longer permitted to rearrange the dentries * in this superblock */ So remove these locks. If the locks are actually necessary, then this banner comment should be altered instead. The hash table chains are protected by 1-bit locks in the hash table heads, so those shouldn't be a problem. Note that to make this work, __d_drop() has to be split so that the RCUwalk barrier can be avoided. This causes problems otherwise as it has an assertion that dentry->d_lock is locked - but there is no need for that as no one else can be trying to access this dentry, except to step over it (and that should be handled by d_free(), I think). Signed-off-by: David Howells <dhowells@redhat.com> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:27:57 -04:00
David Howells	35f40ef002	VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree() Remove the detached-dentry counter from shrink_dcache_for_umount_subtree() as the value it computes is no longer used as of commit `312d3ca856` which made the nr_dentry counters summed per-CPU rather than global atomic. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:27:57 -04:00
Jeff Layton	c46c887744	vfs: document locking requirements for d_move, __d_move and d_materialise_unique Adding a comment to d_materialise_unique per Al's request... d_move and __d_move have some pretty substantial locking requirements, but they are not clearly documented. Add some comments spelling them out. Also, document the requirement for the i_mutex of the parent in d_materialise_unique. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 13:41:14 -04:00
Linus Torvalds	bbd9d6f7fb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits) vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp isofs: Remove global fs lock jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory fix IN_DELETE_SELF on overwriting rename() on ramfs et.al. mm/truncate.c: fix build for CONFIG_BLOCK not enabled fs:update the NOTE of the file_operations structure Remove dead code in dget_parent() AFS: Fix silly characters in a comment switch d_add_ci() to d_splice_alias() in "found negative" case as well simplify gfs2_lookup() jfs_lookup(): don't bother with . or .. get rid of useless dget_parent() in btrfs rename() and link() get rid of useless dget_parent() in fs/btrfs/ioctl.c fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers drivers: fix up various ->llseek() implementations fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek Ext4: handle SEEK_HOLE/SEEK_DATA generically Btrfs: implement our own ->llseek fs: add SEEK_HOLE and SEEK_DATA flags reiserfs: make reiserfs default to barrier=flush ... Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new shrinker callout for the inode cache, that clashed with the xfs code to start the periodic workers later.	2011-07-22 19:02:39 -07:00
Linus Torvalds	b91da88fed	vfs: drop conditional inode prefetch in __do_lookup_rcu It seems to hurt performance in real life. Yes, the inode will be used later, but the conditional doesn't seem to predict all that well (negative dentries are not uncommon) and it looks like the cost of prefetching is simply higher than depending on the cache doing the right thing. As usual. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-21 11:01:42 -07:00
Al Viro	86c98e8cdb	Remove dead code in dget_parent() ->d_parent is never NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:04 -04:00
Al Viro	4513d899c4	switch d_add_ci() to d_splice_alias() in "found negative" case as well Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:02 -04:00
Dave Chinner	b0d40c92ad	superblock: introduce per-sb cache shrinker infrastructure With context based shrinkers, we can implement a per-superblock shrinker that shrinks the caches attached to the superblock. We currently have global shrinkers for the inode and dentry caches that split up into per-superblock operations via a coarse proportioning method that does not batch very well. The global shrinkers also have a dependency - dentries pin inodes - so we have to be very careful about how we register the global shrinkers so that the implicit call order is always correct. With a per-sb shrinker callout, we can encode this dependency directly into the per-sb shrinker, hence avoiding the need for strictly ordering shrinker registrations. We also have no need for any proportioning code for the shrinker subsystem already provides this functionality across all shrinkers. Allowing the shrinker to operate on a single superblock at a time means that we do less superblock list traversals and locking and reclaim should batch more effectively. This should result in less CPU overhead for reclaim and potentially faster reclaim of items from each filesystem. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:10 -04:00
Al Viro	a9049376ee	make d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err) ... and simplify the living hell out of callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:26 -04:00
Al Viro	a4464dbc0c	Make ->d_sb assign-once and always non-NULL New helper (non-exported, fs/internal.h-only): __d_alloc(sb, name). Allocates dentry, sets its ->d_sb to given superblock and sets ->d_op accordingly. Old d_alloc(NULL, name) callers are converted to that (all of them know what superblock they want). d_alloc() itself is left only for parent != NULl case; uses __d_alloc(), inserts result into the list of parent's children. Note that now ->d_sb is assign-once and never NULL and ->d_parent is never NULL either. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:17 -04:00
Josef Bacik	44396f4b5c	fs: add a DCACHE_NEED_LOOKUP flag for d_flags Btrfs (and I'd venture most other fs's) stores its indexes in nice disk order for readdir, but unfortunately in the case of anything that stats the files in order that readdir spits back (like oh say ls) that means we still have to do the normal lookup of the file, which means looking up our other index and then looking up the inode. What I want is a way to create dummy dentries when we find them in readdir so that when ls or anything else subsequently does a stat(), we already have the location information in the dentry and can go straight to the inode itself. The lookup stuff just assumes that if it finds a dentry it is done, it doesn't perform a lookup. So add a DCACHE_NEED_LOOKUP flag so that the lookup code knows it still needs to run i_op->lookup() on the parent to get the inode for the dentry. I have tested this with btrfs and I went from something that looks like this http://people.redhat.com/jwhiter/ls-noreada.png To this http://people.redhat.com/jwhiter/ls-good.png Thats a savings of 1300 seconds, or 22 minutes. That is a significant savings. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:03 -04:00
Al Viro	1836750115	fix loop checks in d_materialise_unique() Both __d_unalias() and __d_materialise_dentry() need loop prevention. Grab rename_lock in caller, check for loops there... As a side benefit, we have dentry_lock_for_move() called only under rename_lock, which seriously reduces deadlock potential of the execrable "locking order" used for ->d_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-14 21:33:41 -04:00
Ying Han	1495f230fa	vmscan: change shrinker API by passing shrink_control struct Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:26 -07:00
Linus Torvalds	268bb0ce3e	sanitize <linux/prefetch.h> usage Commit `e66eed651f` ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h, which uncovered several cases that had apparently relied on that rather obscure header file dependency. So this fixes things up a bit, using grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw(' -- '.[ch]') grep -L 'prefetchw(' $(git grep -l 'linux/prefetch.h' -- '.[ch]') to guide us in finding files that either need <linux/prefetch.h> inclusion, or have it despite not needing it. There are more of them around (mostly network drivers), but this gets many core ones. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-20 12:50:29 -07:00
Christoph Hellwig	1879fd6a26	add hlist_bl_lock/unlock helpers Now that the whole dcache_hash_bucket crap is gone, go all the way and also remove the weird locking layering violations for locking the hash buckets. Add hlist_bl_lock/unlock helpers to move the locking into the list abstraction instead of requiring each caller to open code it. After all allowing for the bit locks is the whole point of these helpers over the plain hlist variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-04-25 18:14:10 -07:00
Linus Torvalds	dea3667bc3	vfs: get rid of insane dentry hashing rules The dentry hashing rules have been really quite complicated for a long while, in odd ways. That made functions like __d_drop() very fragile and non-obvious. In particular, whether a dentry was hashed or not was indicated with an explicit DCACHE_UNHASHED bit. That's despite the fact that the hash abstraction that the dentries use actually have a 'is this entry hashed or not' model (which is a simple test of the 'pprev' pointer). The reason that was done is because we used the normal 'is this entry unhashed' model to mark whether the dentry had _ever_ been hashed in the dentry hash tables, and that logic goes back many years (commit `b3423415fb`: "dcache: avoid RCU for never-hashed dentries"). That, in turn, meant that __d_drop had totally different unhashing logic for the dentry hash table case and for the anonymous dcache case, because in order to use the "is this dentry hashed" logic as a flag for whether it had ever been on the RCU hash table, we had to unhash such a dentry differently so that we'd never think that it wasn't 'unhashed' and wouldn't be free'd correctly. That's just insane. It made the logic really hard to follow, when there were two different kinds of "unhashed" states, and one of them (the one that used "list_bl_unhashed()") really had nothing at all to do with being unhashed per se, but with a very subtle lifetime rule instead. So turn all of it around, and make it logical. Instead of having a DENTRY_UNHASHED bit in d_flags to indicate whether the dentry is on the hash chains or not, use the hash chain unhashed logic for that. Suddenly "d_unhashed()" just uses "list_bl_unhashed()", and everything makes sense. And for the lifetime rule, just use an explicit DENTRY_RCUACCEES bit. If we ever insert the dentry into the dentry hash table so that it is visible to RCU lookup, we mark it DENTRY_RCUACCESS to show that it now needs the RCU lifetime rules. Now suddently that test at dentry free time makes sense too. And because unhashing now is sane and doesn't depend on where the dentry got unhashed from (because the dentry hash chain details doesn't have some subtle side effects), we can re-unify the __d_drop() logic and use common code for the unhashing. Also fix one more open-coded hash chain bit_spin_lock() that I missed in the previous chain locking cleanup commit. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-04-24 07:58:46 -07:00
Linus Torvalds	b07ad9967f	vfs: get rid of 'struct dcache_hash_bucket' abstraction It's a useless abstraction for 'hlist_bl_head', and it doesn't actually help anything - quite the reverse. All the users end up having to know about the hlist_bl_head details anyway, using 'struct hlist_bl_node *' etc. So it just makes the code look confusing. And the cost of it is extra '&b->head' syntactic noise, but more importantly it spuriously makes the hash table dentry list look different from the per-superblock DCACHE_DISCONNECTED dentry list. As a result, the code ended up using ad-hoc locking for one case and special helper functions for what is really another totally identical case in the very same function. Make it all look and work the same. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-04-23 22:32:03 -07:00
Linus Torvalds	7ebfa57f6d	vfs: fix incorrect dentry_update_name_case() BUG_ON() test The case we should be verifying when updating the dentry name is that the _parent_ inode (the directory) semaphore is held, not the semaphore for the dentry itself. It's the directory locking that rename and readdir() etc all care about. The comment just above even says so - but then the BUG_ON() still checked the dentry inode itself. Very few people noticed, because this helper function really isn't used for very much, so you had to be using ncpfs to ever hit it. I think I should just remove the BUG_ON (the function really has just one user), but let's run with it fixed for a while before getting rid of it entirely. Reported-and-tested-by: Bongani Hlope <bonganih@bankservafrica.com> Reported-and-tested-by: Bernd Feige <bernd.feige@uniklinik-freiburg.de> Cc: Petr Vandrovec <petr@vandrovec.name>, Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-04-15 07:34:26 -07:00
Linus Torvalds	f539abece1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fs: call security_d_instantiate in d_obtain_alias V2 lose 'mounting_here' argument in ->d_manage() don't pass 'mounting_here' flag to follow_down() change the locking order for namespace_sem fix deadlock in pivot_root() vfs: split off vfsmount-related parts of vfs_kern_mount() Some fixes for pstore kill simple_set_mnt()	2011-03-18 10:51:11 -07:00
Linus Torvalds	e16b396ce3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits) doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore Update cpuset info & webiste for cgroups dcdbas: force SMI to happen when expected arch/arm/Kconfig: remove one to many l's in the word. asm-generic/user.h: Fix spelling in comment drm: fix printk typo 'sracth' Remove one to many n's in a word Documentation/filesystems/romfs.txt: fixing link to genromfs drivers:scsi Change printk typo initate -> initiate serial, pch uart: Remove duplicate inclusion of linux/pci.h header fs/eventpoll.c: fix spelling mm: Fix out-of-date comments which refers non-existent functions drm: Fix printk typo 'failled' coh901318.c: Change initate to initiate. mbox-db5500.c Change initate to initiate. edac: correct i82975x error-info reported edac: correct i82975x mci initialisation edac: correct commented info fs: update comments to point correct document target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c ... Trivial conflict in fs/eventpoll.c (spelling vs addition)	2011-03-18 10:37:40 -07:00
Josef Bacik	24ff6663cc	fs: call security_d_instantiate in d_obtain_alias V2 While trying to track down some NFS problems with BTRFS, I kept noticing I was getting -EACCESS for no apparent reason. Eric Paris and printk() helped me figure out that it was SELinux that was giving me grief, with the following denial type=AVC msg=audit(1290013638.413:95): avc: denied { 0x800000 } for pid=1772 comm="nfsd" name="" dev=sda1 ino=256 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file Turns out this is because in d_obtain_alias if we can't find an alias we create one and do all the normal instantiation stuff, but we don't do the security_d_instantiate. Usually we are protected from getting a hashed dentry that hasn't yet run security_d_instantiate() by the parent's i_mutex, but obviously this isn't an option there, so in order to deal with the case that a second thread comes in and finds our new dentry before we get to run security_d_instantiate(), we go ahead and call it if we find a dentry already. Eric assures me that this is ok as the code checks to see if the dentry has been initialized already so calling security_d_instantiate() against the same dentry multiple times is ok. With this patch I'm no longer getting errant -EACCESS values. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-18 10:02:09 -04:00
Trond Myklebust	c83ce989cb	VFS: Fix the nfs sillyrename regression in kernel 2.6.38 The new vfs locking scheme introduced in 2.6.38 breaks NFS sillyrename because the latter relies on being able to determine the parent directory of the dentry in the ->iput() callback in order to send the appropriate unlink rpc call. Looking at the code that cares about races with dput(), there doesn't seem to be anything that specifically uses d_parent as a test for whether or not there is a race: - __d_lookup_rcu(), __d_lookup() all test for d_hashed() after d_parent - shrink_dcache_for_umount() is safe since nothing else can rearrange the dentries in that super block. - have_submount(), select_parent() and d_genocide() can test for a deletion if we set the DCACHE_DISCONNECTED flag when the dentry is removed from the parent's d_subdirs list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org (2.6.38, needs commit `c826cb7dfc` "dcache.c: create helper function for duplicated functionality" ) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-15 15:46:11 -07:00
Linus Torvalds	c826cb7dfc	dcache.c: create helper function for duplicated functionality This creates a helper function for he "try to ascend into the parent directory" case, which was written out in triplicate before. With all the locking and subtle sequence number stuff, we really don't want to duplicate that kind of code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-15 15:29:21 -07:00
J. Bruce Fields	d891eedbc3	fs/dcache: allow d_obtain_alias() to return unhashed dentries Without this patch, inodes are not promptly freed on last close of an unlinked file by an nfs client: client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 05:18:54 -05:00
Namhyung Kim	b0a4bb830e	fs: update comments to point correct document dcache-locking.txt is not exist any more, and the path was not correct anyway. Fix it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-02-17 16:41:32 +01:00
Randy Dunlap	ff5fdb6149	fs: fix new dcache.c kernel-doc warnings Fix new fs/dcache.c kernel-doc warnings: Warning(fs/dcache.c:184): No description found for parameter 'dentry' Warning(fs/dcache.c:296): No description found for parameter 'parent' Warning(fs/dcache.c:1985): No description found for parameter 'dparent' Warning(fs/dcache.c:1985): Excess function parameter 'parent' description in 'd_validate' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-22 20:32:38 -08:00
Linus Torvalds	f8206b925f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits) sanitize vfsmount refcounting changes fix old umount_tree() breakage autofs4: Merge the remaining dentry ops tables Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Allow d_manage() to be used in RCU-walk mode Remove a further kludge from __do_follow_link() autofs4: Bump version autofs4: Add v4 pseudo direct mount support autofs4: Fix wait validation autofs4: Clean up autofs4_free_ino() autofs4: Clean up dentry operations autofs4: Clean up inode operations autofs4: Remove unused code autofs4: Add d_manage() dentry operation autofs4: Add d_automount() dentry operation Remove the automount through follow_link() kludge code from pathwalk CIFS: Use d_automount() rather than abusing follow_link() NFS: Use d_automount() rather than abusing follow_link() AFS: Use d_automount() rather than abusing follow_link() Add an AT_NO_AUTOMOUNT flag to suppress terminal automount ...	2011-01-16 11:31:50 -08:00
David Howells	9875cf8064	Add a dentry op to handle automounting rather than abusing follow_link() Add a dentry op (d_automount) to handle automounting directories rather than abusing the follow_link() inode operation. The operation is keyed off a new dentry flag (DCACHE_NEED_AUTOMOUNT). This also makes it easier to add an AT_ flag to suppress terminal segment automount during pathwalk and removes the need for the kludge code in the pathwalk algorithm to handle directories with follow_link() semantics. The ->d_automount() dentry operation: struct vfsmount (d_automount)(struct path mountpoint); takes a pointer to the directory to be mounted upon, which is expected to provide sufficient data to determine what should be mounted. If successful, it should return the vfsmount struct it creates (which it should also have added to the namespace using do_add_mount() or similar). If there's a collision with another automount attempt, NULL should be returned. If the directory specified by the parameter should be used directly rather than being mounted upon, -EISDIR should be returned. In any other case, an error code should be returned. The ->d_automount() operation is called with no locks held and may sleep. At this point the pathwalk algorithm will be in ref-walk mode. Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is added to handle mountpoints. It will return -EREMOTE if the automount flag was set, but no d_automount() op was supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR if the walk point should be used without mounting and 0 if successful. The path will be updated to point to the mounted filesystem if a successful automount took place. __follow_mount() is replaced by follow_managed() which is more generic (especially with the patch that adds ->d_manage()). This handles transits from directories during pathwalk, including automounting and skipping over mountpoints (and holding processes with the next patch). __follow_mount_rcu() will jump out of RCU-walk mode if it encounters an automount point with nothing mounted on it. follow_dotdot() does not handle automounts as you don't want to trigger them whilst following "..". I've also extracted the mount/don't-mount logic from autofs4 and included it here. It makes the mount go ahead anyway if someone calls open() or creat(), tries to traverse the directory, tries to chdir/chroot/etc. into the directory, or sticks a '/' on the end of the pathname. If they do a stat(), however, they'll only trigger the automount if they didn't also say O_NOFOLLOW. I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their inodes as automount points. This flag is automatically propagated to the dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could save AFS a private flag bit apiece, but is not strictly necessary. It would be preferable to do the propagation in d_set_d_op(), but that doesn't normally have access to the inode. [AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu() succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after that, rather than just returning with ungrabbed *path] Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:05:03 -05:00
Linus Torvalds	6f7f7caab2	Turn d_set_d_op() BUG_ON() into WARN_ON_ONCE() It's indicative of a real problem, and it actually triggers with autofs4, but the BUG_ON() is excessive. The autofs4 case is being fixed (to only set d_op in the ->lookup method) but not merged yet. In the meantime this gets the code limping along. Reported-by: Alex Elder <aelder@sgi.com> Cc: Ian Kent <raven@themaw.net> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-14 13:26:18 -08:00
Randy Dunlap	208898c17a	fs: fix kernel-doc for dcache::prepend_path Fix function kernel-doc warning for prepend_path(): Warning(fs/dcache.c:1924): missing initial short description on line: Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:06:57 -05:00
Randy Dunlap	1c977540fd	fs: fix kernel-doc for dcache::d_validate Fix function parameter kernel-doc for d_validate(): Warning(fs/dcache.c:1495): No description found for parameter 'parent' Warning(fs/dcache.c:1495): Excess function parameter 'dparent' description in 'd_validate' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:06:55 -05:00
Al Viro	c8aebb0c9f	per-superblock default ->d_op Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:34 -05:00
Nick Piggin	9d55c369bb	fs: implement faster dentry memcmp The standard memcmp function on a Westmere system shows up hot in profiles in the `git diff` workload (both parallel and single threaded), and it is likely due to the costs associated with trapping into microcode, and little opportunity to improve memory access (dentry name is not likely to take up more than a cacheline). So replace it with an open-coded byte comparison. This increases code size by 8 bytes in the critical __d_lookup_rcu function, but the speedup is huge, averaging 10 runs of each: git diff st user sys elapsed CPU before 1.15 2.57 3.82 97.1 after 1.14 2.35 3.61 96.8 git diff mt user sys elapsed CPU before 1.27 3.85 1.46 349 after 1.26 3.54 1.43 333 Elapsed time for single threaded git diff at 95.0% confidence: -0.21 +/- 0.01 -5.45% +/- 0.24% It's -0.66% +/- 0.06% elapsed time on my Opteron, so rep cmp costs on the fam10h seem to be relatively smaller, but there is still a win. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:32 +11:00
Nick Piggin	e1bb578263	fs: prefetch inode data in dcache lookup This makes single threaded git diff -1.25% +/- 0.05% elapsed time on my 2s12c24t Westmere system, and -0.86% +/- 0.05% on my 2s8c Barcelona, by prefetching the important first cacheline of the inode in while we do the actual name compare and other operations on the dentry. There was no measurable slowdown in the single file stat case, or the creat case (where negative dentries would be common). Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:32 +11:00
Nick Piggin	4b936885ab	fs: improve scalability of pseudo filesystems Regardless of how much we possibly try to scale dcache, there is likely always going to be some fundamental contention when adding or removing children under the same parent. Pseudo filesystems do not seem need to have connected dentries because by definition they are disconnected. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:32 +11:00
Nick Piggin	873feea09e	fs: dcache per-inode inode alias locking dcache_inode_lock can be replaced with per-inode locking. Use existing inode->i_lock for this. This is slightly non-trivial because we sometimes need to find the inode from the dentry, which requires d_inode to be stabilised (either with refcount or d_lock). Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:31 +11:00
Nick Piggin	ceb5bdc2d2	fs: dcache per-bucket dcache hash locking We can turn the dcache hash locking from a global dcache_hash_lock into per-bucket locking. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:31 +11:00
Nick Piggin	44a7d7a878	fs: cache optimise dentry and inode for rcu-walk Put dentry and inode fields into top of data structure. This allows RCU path traversal to perform an RCU dentry lookup in a path walk by touching only the first 56 bytes of the dentry. We also fit in 8 bytes of inline name in the first 64 bytes, so for short names, only 64 bytes needs to be touched to perform the lookup. We should get rid of the hash->prev pointer from the first 64 bytes, and fit 16 bytes of name in there, which will take care of 81% rather than 32% of the kernel tree. inode is also rearranged so that RCU lookup will only touch a single cacheline in the inode, plus one in the i_ops structure. This is important for directory component lookups in RCU path walking. In the kernel source, directory names average is around 6 chars, so this works. When we reach the last element of the lookup, we need to lock it and take its refcount which requires another cacheline access. Align dentry and inode operations structs, so members will be at predictable offsets and we can group common operations into head of structure. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	fb045adb99	fs: dcache reduce branches in lookup path Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])d_op([[:space:]])=' \| xargs sed -e 's/$[^\t ]$->d_op = $.$;/d_set_d_op(\1, \2);/' -e 's/$[^\t ]$\.d_op = $.$;/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	5f57cbcc02	fs: dcache remove d_mounted Rather than keep a d_mounted count in the dentry, set a dentry flag instead. The flag can be cleared by checking the hash table to see if there are any mounts left, which is not time critical because it is performed at detach time. The mounted state of a dentry is only used to speculatively take a look in the mount hash table if it is set -- before following the mount, vfsmount lock is taken and mount re-checked without races. This saves 4 bytes on 32-bit, nothing on 64-bit but it does provide a hole I might use later (and some configs have larger than 32-bit spinlocks which might make use of the hole). Autofs4 conversion and changelog by Ian Kent <raven@themaw.net>: In autofs4, when expring direct (or offset) mounts we need to ensure that we block user path walks into the autofs mount, which is covered by another mount. To do this we clear the mounted status so that follows stop before walking into the mount and are essentially blocked until the expire is completed. The automount daemon still finds the correct dentry for the umount due to the follow mount logic in fs/autofs4/root.c:autofs4_follow_link(), which is set as an inode operation for direct and offset mounts only and is called following the lookup that stopped at the covered mount. At the end of the expire the covering mount probably has gone away so the mounted status need not be restored. But we need to check this and only restore the mounted status if the expire failed. XXX: autofs may not work right if we have other mounts go over the top of it? Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	31e6b01f41	fs: rcu-walk for path lookup Perform common cases of path lookups without any stores or locking in the ancestor dentry elements. This is called rcu-walk, as opposed to the current algorithm which is a refcount based walk, or ref-walk. This results in far fewer atomic operations on every path element, significantly improving path lookup performance. It also avoids cacheline bouncing on common dentries, significantly improving scalability. The overall design is like this: * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk. * Take the RCU lock for the entire path walk, starting with the acquiring of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are not required for dentry persistence. * synchronize_rcu is called when unregistering a filesystem, so we can access d_ops and i_ops during rcu-walk. * Similarly take the vfsmount lock for the entire path walk. So now mnt refcounts are not required for persistence. Also we are free to perform mount lookups, and to assume dentry mount points and mount roots are stable up and down the path. * Have a per-dentry seqlock to protect the dentry name, parent, and inode, so we can load this tuple atomically, and also check whether any of its members have changed. * Dentry lookups (based on parent, candidate string tuple) recheck the parent sequence after the child is found in case anything changed in the parent during the path walk. * inode is also RCU protected so we can load d_inode and use the inode for limited things. * i_mode, i_uid, i_gid can be tested for exec permissions during path walk. * i_op can be loaded. When we reach the destination dentry, we lock it, recheck lookup sequence, and increment its refcount and mountpoint refcount. RCU and vfsmount locks are dropped. This is termed "dropping rcu-walk". If the dentry refcount does not match, we can not drop rcu-walk gracefully at the current point in the lokup, so instead return -ECHILD (for want of a better errno). This signals the path walking code to re-do the entire lookup with a ref-walk. Aside from the final dentry, there are other situations that may be encounted where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take a reference on the last good dentry) and continue with a ref-walk. Again, if we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup using ref-walk. But it is very important that we can continue with ref-walk for most cases, particularly to avoid the overhead of double lookups, and to gain the scalability advantages on common path elements (like cwd and root). The cases where rcu-walk cannot continue are: * NULL dentry (ie. any uncached path element) * parent with d_inode->i_op->permission or ACLs * dentries with d_revalidate * Following links In future patches, permission checks and d_revalidate become rcu-walk aware. It may be possible eventually to make following links rcu-walk aware. Uncached path elements will always require dropping to ref-walk mode, at the very least because i_mutex needs to be grabbed, and objects allocated. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:27 +11:00
Nick Piggin	77812a1ef1	fs: consolidate dentry kill sequence The tricky locking for disposing of a dentry is duplicated 3 times in the dcache (dput, pruning a dentry from the LRU, and pruning its ancestors). Consolidate them all into a single function dentry_kill. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:25 +11:00
Nick Piggin	ec33679d78	fs: use RCU in shrink_dentry_list to reduce lock nesting Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:25 +11:00
Nick Piggin	be182bff72	fs: reduce dcache_inode_lock width in lru scanning Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:25 +11:00
Nick Piggin	89e6054836	fs: dcache reduce prune_one_dentry locking prune_one_dentry can avoid quite a bit of locking in the common case where ancestors have an elevated refcount. Alternatively, we could have gone the other way and made fewer trylocks in the case where d_count goes to zero, but is probably less common. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:25 +11:00
Nick Piggin	a734eb458a	fs: dcache reduce d_parent locking Use RCU to simplify locking in dget_parent. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:24 +11:00
Nick Piggin	dc0474be3e	fs: dcache rationalise dget variants dget_locked was a shortcut to avoid the lazy lru manipulation when we already held dcache_lock (lru manipulation was relatively cheap at that point). However, how that the lru lock is an innermost one, we never hold it at any caller, so the lock cost can now be avoided. We already have well working lazy dcache LRU, so it should be fine to defer LRU manipulations to scan time. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:24 +11:00
Nick Piggin	357f8e658b	fs: dcache reduce dcache_inode_lock dcache_inode_lock can be avoided in d_delete() and d_materialise_unique() in cases where it is not required. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:24 +11:00
Nick Piggin	89ad485f01	fs: dcache reduce locking in d_alloc Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:24 +11:00
Nick Piggin	61f3dee4af	fs: dcache reduce dput locking It is possible to run dput without taking data structure locks up-front. In many cases where we don't kill the dentry anyway, these locks are not required. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:23 +11:00
Nick Piggin	58db63d086	fs: dcache avoid starvation in dcache multi-step operations Long lived dcache "multi-step" operations which retry on rename seq can be starved with a lot of rename activity. If they fail after the 1st pass, take the rename_lock for writing to avoid further starvation. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:23 +11:00
Nick Piggin	b5c84bf6f6	fs: dcache remove dcache_lock dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:23 +11:00
Nick Piggin	949854d024	fs: Use rename lock and RCU for multi-step operations The remaining usages for dcache_lock is to allow atomic, multi-step read-side operations over the directory tree by excluding modifications to the tree. Also, to walk in the leaf->root direction in the tree where we don't have a natural d_lock ordering. This could be accomplished by taking every d_lock, but this would mean a huge number of locks and actually gets very tricky. Solve this instead by using the rename seqlock for multi-step read-side operations, retry in case of a rename so we don't walk up the wrong parent. Concurrent dentry insertions are not serialised against. Concurrent deletes are tricky when walking up the directory: our parent might have been deleted when dropping locks so also need to check and retry for that. We can also use the rename lock in cases where livelock is a worry (and it is introduced in subsequent patch). Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:22 +11:00
Nick Piggin	9abca36087	fs: increase d_name lock coverage Cover d_name with d_lock in more cases, where there may be concurrent modification to it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:22 +11:00
Nick Piggin	b23fb0a603	fs: scale inode alias list Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list from concurrent modification. d_alias is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:22 +11:00
Nick Piggin	2fd6b7f507	fs: dcache scale subdirs Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00
Nick Piggin	da5029563a	fs: dcache scale d_unhashed Protect d_unhashed(dentry) condition with d_lock. This means keeping DCACHE_UNHASHED bit in synch with hash manipulations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00
Nick Piggin	b7ab39f631	fs: dcache scale dentry refcount Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00
Nick Piggin	2304450783	fs: dcache scale lru Add a new lock, dcache_lru_lock, to protect the dcache LRU list from concurrent modification. d_lru is also protected by d_lock, which allows LRU lists to be accessed without the lru lock, using RCU in future patches. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:20 +11:00
Nick Piggin	789680d1ee	fs: dcache scale hash Add a new lock, dcache_hash_lock, to protect the dcache hash table from concurrent modification. d_hash is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:20 +11:00
Nick Piggin	ec2447c278	hostfs: simplify locking Remove dcache_lock locking from hostfs filesystem, and move it into dcache helpers. All that is required is a coherent path name. Protection from concurrent modification of the namespace after path name generation is not provided in current code, because dcache_lock is dropped before the path is used. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:20 +11:00
Nick Piggin	b1e6a015a5	fs: change d_hash for rcu-walk Change d_hash so it may be called from lock-free RCU lookups. See similar patch for d_compare for details. For in-tree filesystems, this is just a mechanical change. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:20 +11:00
Nick Piggin	621e155a35	fs: change d_compare for rcu-walk Change d_compare so it may be called from lock-free RCU lookups. This does put significant restrictions on what may be done from the callback, however there don't seem to have been any problems with in-tree fses. If some strange use case pops up that _really_ cannot cope with the rcu-walk rules, we can just add new rcu-unaware callbacks, which would cause name lookup to drop out of rcu-walk mode. For in-tree filesystems, this is just a mechanical change. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:19 +11:00
Nick Piggin	fb2d5b86af	fs: name case update method smpfs and ncpfs want to update a live dentry name in-place. Rather than have them open code the locking, provide a documented dcache API. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:19 +11:00
Nick Piggin	fe15ce446b	fs: change d_delete semantics Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:18 +11:00
Nick Piggin	3e880fb5e4	fs: use fast counters for vfs caches percpu_counter library generates quite nasty code, so unless you need to dynamically allocate counters or take fast approximate value, a simple per cpu set of counters is much better. The percpu_counter can never be made to work as well, because it has an indirection from pointer to percpu memory, and it can't use direct this_cpu_inc interfaces because it doesn't use static PER_CPU data, so code will always be worse. In the fastpath, it is the difference between this: incl %gs:nr_dentry # nr_dentry and this: movl percpu_counter_batch(%rip), %edx # percpu_counter_batch, movl $1, %esi #, movq $nr_dentry, %rdi #, call __percpu_counter_add # (plus I clobber registers) __percpu_counter_add: pushq %rbp # movq %rsp, %rbp #, subq $32, %rsp #, movq %rbx, -24(%rbp) #, movq %r12, -16(%rbp) #, movq %r13, -8(%rbp) #, movq %rdi, %rbx # fbc, fbc #APP # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1 movq %gs:kernel_stack,%rax #, pfo_ret__ # 0 "" 2 #NO_APP incl -8124(%rax) # <variable>.preempt_count movq 32(%rdi), %r12 # <variable>.counters, tcp_ptr__ #APP # 78 "lib/percpu_counter.c" 1 add %gs:this_cpu_off, %r12 # this_cpu_off, tcp_ptr__ # 0 "" 2 #NO_APP movslq (%r12),%r13 #* tcp_ptr__, tmp73 movslq %edx,%rax # batch, batch addq %rsi, %r13 # amount, count cmpq %rax, %r13 # batch, count jge .L27 #, negl %edx # tmp76 movslq %edx,%rdx # tmp76, tmp77 cmpq %rdx, %r13 # tmp77, count jg .L28 #, .L27: movq %rbx, %rdi # fbc, call _raw_spin_lock # addq %r13, 8(%rbx) # count, <variable>.count movq %rbx, %rdi # fbc, movl $0, (%r12) #,* tcp_ptr__ call _raw_spin_unlock # .L29: #APP # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1 movq %gs:kernel_stack,%rax #, pfo_ret__ # 0 "" 2 #NO_APP decl -8124(%rax) # <variable>.preempt_count movq -8136(%rax), %rax #, D.14625 testb $8, %al #, D.14625 jne .L32 #, .L31: movq -24(%rbp), %rbx #, movq -16(%rbp), %r12 #, movq -8(%rbp), %r13 #, leave ret .p2align 4,,10 .p2align 3 .L28: movl %r13d, (%r12) # count,* jmp .L29 # .L32: call preempt_schedule # .p2align 4,,6 jmp .L31 # .size __percpu_counter_add, .-__percpu_counter_add .p2align 4,,15 Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:17 +11:00
Nick Piggin	86c8749ede	vfs: revert per-cpu nr_unused counters for dentry and inodes The nr_unused counters count the number of objects on an LRU, and as such they are synchronized with LRU object insertion and removal and scanning, and protected under the LRU lock. Making it per-cpu does not actually get any concurrency improvements because of this lock, and summing the counter is much slower, and incrementing/decrementing it costs more code size and is slower too. These counters should stay per-LRU, which currently means global. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:17 +11:00
Nick Piggin	786a5e15b6	fs: d_validate fixes d_validate has been broken for a long time. kmem_ptr_validate does not guarantee that a pointer can be dereferenced if it can go away at any time. Even rcu_read_lock doesn't help, because the pointer might be queued in RCU callbacks but not executed yet. So the parent cannot be checked, nor the name hashed. The dentry pointer can not be touched until it can be verified under lock. Hashing simply cannot be used. Instead, verify the parent/child relationship by traversing parent's d_child list. It's slow, but only ncpfs and the destaged smbfs care about it, at this point. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:16 +11:00
Nick Piggin	d3a23e1678	Revert "fs: use RCU read side protection in d_validate" This reverts commit `3825bdb7ed`. You cannot dget() a dentry without having a reference, or holding a lock that guarantees it remains valid. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-05 20:01:21 +11:00
Christoph Hellwig	3825bdb7ed	fs: use RCU read side protection in d_validate d_validate does a purely read lookup in the dentry hash, so use RCU read side locking instead of dcache_lock. Split out from a larget patch by Nick Piggin <npiggin@suse.de>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:13 -04:00
Christoph Hellwig	a4633357ac	fs: clean up dentry lru modification Always do a list_del_init on the LRU to make sure the list_empty invariant for not beeing on the LRU always holds true, and fold dentry_lru_del_init into dentry_lru_del. Replace the dentry_lru_add_tail primitive with a dentry_lru_move_tail operations that simpler when the dentry already is one the list, which is always is. Move the list_empty into dentry_lru_add to fit the scheme of the other lru helpers, and simplify locking once we move to a separate LRU lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:13 -04:00
Christoph Hellwig	3049cfe24e	fs: split __shrink_dcache_sb Currently __shrink_dcache_sb has an extremly awkward calling convention because it tries to please very different callers. Split out the main loop into a shrink_dentry_list helper, which gets called directly from shrink_dcache_sb for the cases where all dentries need to be pruned, or from __shrink_dcache_sb for pruning only a certain number of dentries. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:13 -04:00
Nick Piggin	265ac90230	fs: improve DCACHE_REFERENCED usage dentry referenced bit is only set when installing the dentry back onto the LRU. However with lazy LRU, the dentry can already be on the LRU list at dput time, thus missing out on setting the referenced bit. Fix this. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:12 -04:00
Christoph Hellwig	312d3ca856	fs: use percpu counter for nr_dentry and nr_dentry_unused The nr_dentry stat is a globally touched cacheline and atomic operation twice over the lifetime of a dentry. It is used for the benfit of userspace only. Turn it into a per-cpu counter and always decrement it in d_free instead of doing various batching operations to reduce lock hold times in the callers. Based on an earlier patch from Nick Piggin <npiggin@suse.de>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:12 -04:00
Christoph Hellwig	9c82ab9c9e	fs: simplify __d_free Remove d_callback and always call __d_free with a RCU head. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:12 -04:00
Christoph Hellwig	be148247cf	fs: take dcache_lock inside __d_path All callers take dcache_lock just around the call to __d_path, so take the lock into it in preparation of getting rid of dcache_lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:12 -04:00
Nick Piggin	99b7db7b8f	fs: brlock vfsmount_lock fs: brlock vfsmount_lock Use a brlock for the vfsmount lock. It must be taken for write whenever modifying the mount hash or associated fields, and may be taken for read when performing mount hash lookups. A new lock is added for the mnt-id allocator, so it doesn't need to take the heavy vfsmount write-lock. The number of atomics should remain the same for fastpath rlock cases, though code would be slightly slower due to per-cpu access. Scalability is not not be much improved in common cases yet, due to other locks (ie. dcache_lock) getting in the way. However path lookups crossing mountpoints should be one case where scalability is improved (currently requiring the global lock). The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node Altix system (high latency to remote nodes), a simple umount microbenchmark (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it took 6.8s, afterwards took 7.1s, about 5% slower. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 08:35:48 -04:00
Nick Piggin	b04f784e5d	fs: remove extra lookup in __lookup_hash fs: remove extra lookup in __lookup_hash Optimize lookup for create operations, where no dentry should often be common-case. In cases where it is not, such as unlink, the added overhead is much smaller than the removed. Also, move comments about __d_lookup racyness to the __d_lookup call site. d_lookup is intuitive; __d_lookup is what needs commenting. So in that same vein, add kerneldoc comments to __d_lookup and clean up some of the comments: - We are interested in how the RCU lookup works here, particularly with renames. Make that explicit, and point to the document where it is explained in more detail. - RCU is pretty standard now, and macros make implementations pretty mindless. If we want to know about RCU barrier details, we look in RCU code. - Delete some boring legacy comments because we don't care much about how the code used to work, more about the interesting parts of how it works now. So comments about lazy LRU may be interesting, but would better be done in the LRU or refcount management code. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 08:35:47 -04:00
Randy Dunlap	cd956a1c03	fs/dcache: fix function param name in kernel-doc Fix parameter name in kernel-doc notation (causes a warning). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-14 16:21:00 -07:00
Miklos Szeredi	8df9d1a414	vfs: show unreachable paths in getcwd and proc Prepend "(unreachable)" to path strings if the path is not reachable from the current root. Two places updated are - the return string from getcwd() - and symlinks under /proc/$PID. Other uses of d_path() are left unchanged (we know that some old software crashes if /proc/mounts is changed). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:29:47 -04:00
Miklos Szeredi	ffd1f4ed5b	vfs: only add " (deleted)" where necessary __d_path() has 4 callers: d_path() sys_getcwd() seq_path_root() tomoyo_realpath_from_path2() Of these the only one which needs the " (deleted)" ending is d_path(). sys_getcwd() checks for existence before calling __d_path(). seq_path_root() is used to show the mountpoint path in /proc/PID/mountinfo, which is always a positive. And tomoyo doesn't want the deleted ending. Create a helper "path_with_deleted()" as subsequent patches will need this in multiple places. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:28:21 -04:00
Miklos Szeredi	f2eb6575d5	vfs: add prepend_path() helper Split off prepend_path() from __d_path(). This new helper takes an end-of-buffer pointer and buffer-length pointer just like the other prepend_* functions. Move the " (deleted)" postfix out to __d_path(). This patch doesn't change any functionality but paves the way for the following patches. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:28:21 -04:00
Miklos Szeredi	98dc568bc2	vfs: __d_path: dont prepend the name of the root dentry In the old times pseudo-filesystems set the name of theroot dentry to some prefix like "pipe:" and the name of the child dentry to "[123]" and relied on a hack in __d_path() to replace the preceding slash with the root's name to get "pipe:[123]". Then the d_dname() dentry operation was introduced which solved the same problem without having to pre-fill the name in each dentry. Currently the following pseudo filesystems exist in the kernel: perfmon mtd anon_inode bdev pipe socket Of these only perfmon, anon_inode, pipe and socket create sub-dentries, all of which have now been switched to using d_dname(). bdev and mtd only create inodes. This means that now the hack to overwrite the slash can be removed, so for unreachable paths (e.g. within a detached mount) the path string won't be polluted with garbage. For these cases a subsequent patch will add a prefix, indicating that the path is unreachable. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:28:21 -04:00
Miklos Szeredi	f7ad3c6be9	vfs: add helpers to get root and pwd Add three helpers that retrieve a refcounted copy of the root and cwd from the supplied fs_struct. get_fs_root() get_fs_pwd() get_fs_root_and_pwd() Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:28:20 -04:00
Al Viro	dca332528b	no need for list_for_each_entry_safe()/resetting with superblock list just delay __put_super() a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:49:02 -04:00
Al Viro	c103135c14	new helper: __dentry_path() builds path relative to fs root, called under dcache_lock, doesn't append any nonsense to unlinked ones. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:13 -04:00
Dave Chinner	7f8275d0d6	mm: add context argument to shrinker callback The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-07-19 14:56:17 +10:00
npiggin@suse.de	57439f878a	fs: fix superblock iteration race list_for_each_entry_safe is not suitable to protect against concurrent modification of the list. `6754af6` introduced a race in sb walking. list_for_each_entry can use the trick of pinning the current entry in the list before we drop and retake the lock because it subsequently follows cur->next. However list_for_each_entry_safe saves n=cur->next for following before entering the loop body, so when the lock is dropped, n may be deleted. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 10:38:22 -07:00

1 2 3 4 5 ...

256 Commits