linux

Author	SHA1	Message	Date
Jeff Layton	90478939dc	locks: require that flock->l_pid be set to 0 for file-private locks Neil Brown suggested potentially overloading the l_pid value as a "lock context" field for file-private locks. While I don't think we will probably want to do that here, it's probably a good idea to ensure that in the future we could extend this API without breaking existing callers. Typically the l_pid value is ignored for incoming struct flock arguments, serving mainly as a place to return the pid of the owner if there is a conflicting lock. For file-private locks, require that it currently be set to 0 and return EINVAL if it isn't. If we eventually want to make a non-zero l_pid mean something, then this will help ensure that we don't break legacy programs that are using file-private locks. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	5d50ffd7c3	locks: add new fcntl cmd values for handling file private locks Due to some unfortunate history, POSIX locks have very strange and unhelpful semantics. The thing that usually catches people by surprise is that they are dropped whenever the process closes any file descriptor associated with the inode. This is extremely problematic for people developing file servers that need to implement byte-range locks. Developers often need a "lock management" facility to ensure that file descriptors are not closed until all of the locks associated with the inode are finished. Additionally, "classic" POSIX locks are owned by the process. Locks taken between threads within the same process won't conflict with one another, which renders them useless for synchronization between threads. This patchset adds a new type of lock that attempts to address these issues. These locks conflict with classic POSIX read/write locks, but have semantics that are more like BSD locks with respect to inheritance and behavior on close. This is implemented primarily by changing how fl_owner field is set for these locks. Instead of having them owned by the files_struct of the process, they are instead owned by the filp on which they were acquired. Thus, they are inherited across fork() and are only released when the last reference to a filp is put. These new semantics prevent them from being merged with classic POSIX locks, even if they are acquired by the same process. These locks will also conflict with classic POSIX locks even if they are acquired by the same process or on the same file descriptor. The new locks are managed using a new set of cmd values to the fcntl() syscall. The initial implementation of this converts these values to "classic" cmd values at a fairly high level, and the details are not exposed to the underlying filesystem. We may eventually want to push this handling out to the lower filesystem code but for now I don't see any need for it. Also, note that with this implementation the new cmd values are only available via fcntl64() on 32-bit arches. There's little need to add support for legacy apps on a new interface like this. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	57b65325fe	locks: skip deadlock detection on FL_FILE_PVT locks It's not really feasible to do deadlock detection with FL_FILE_PVT locks since they aren't owned by a single task, per-se. Deadlock detection also tends to be rather expensive so just skip it for these sorts of locks. Also, add a FIXME comment about adding more limited deadlock detection that just applies to ro -> rw upgrades, per Andy's request. Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	c1e62b8fc3	locks: pass the cmd value to fcntl_getlk/getlk64 Once we introduce file private locks, we'll need to know what cmd value was used, as that affects the ownership and whether a conflict would arise. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	3fd80cddc6	locks: report l_pid as -1 for FL_FILE_PVT locks FL_FILE_PVT locks are no longer tied to a particular pid, and are instead inheritable by child processes. Report a l_pid of '-1' for these sorts of locks since the pid is somewhat meaningless for them. This precedent comes from FreeBSD. There, POSIX and flock() locks can conflict with one another. If fcntl(F_GETLK, ...) returns a lock set with flock() then the l_pid member cannot be a process ID because the lock is not held by a process as such. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	c918d42a27	locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT" In a later patch, we'll be adding a new type of lock that's owned by the struct file instead of the files_struct. Those sorts of locks will be flagged with a new FL_FILE_PVT flag. Report these types of locks as "FLPVT" in /proc/locks to distinguish them from "classic" POSIX locks. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	78ed8a1338	locks: rename locks_remove_flock to locks_remove_file This function currently removes leases in addition to flock locks and in a later patch we'll have it deal with file-private locks too. Rename it to locks_remove_file to indicate that it removes locks that are associated with a particular struct file, and not just flock locks. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	bce7560d49	locks: consolidate checks for compatible filp->f_mode values in setlk handlers Move this check into flock64_to_posix_lock instead of duplicating it in two places. This also fixes a minor wart in the code where we continue referring to the struct flock after converting it to struct file_lock. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
J. Bruce Fields	ef12e72a01	locks: fix posix lock range overflow handling In the 32-bit case fcntl assigns the 64-bit f_pos and i_size to a 32-bit off_t. The existing range checks also seem to depend on signed arithmetic wrapping when it overflows. In practice maybe that works, but we can be more careful. That also allows us to make a more reliable distinction between -EINVAL and -EOVERFLOW. Note that in the 32-bit case SEEK_CUR or SEEK_END might allow the caller to set a lock with starting point no longer representable as a 32-bit value. We could return -EOVERFLOW in such cases, but the locks code is capable of handling such ranges, so we choose to be lenient here. The only problem is that subsequent GETLK calls on such a lock will fail with EOVERFLOW. While we're here, do some cleanup including consolidating code for the flock and flock64 cases. Signed-off-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	8c3cac5e6a	locks: eliminate BUG() call when there's an unexpected lock on file close A leftover lock on the list is surely a sign of a problem of some sort, but it's not necessarily a reason to panic the box. Instead, just log a warning with some info about the lock, and then delete it like we would any other lock. In the event that the filesystem declares a ->lock f_op, we may end up leaking something, but that's generally preferable to an immediate panic. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	b03dfdec03	locks: add __acquires and __releases annotations to locks_start and locks_stop ...to make sparse happy. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	6ca10ed8ed	locks: remove "inline" qualifier from fl_link manipulation functions It's best to let the compiler decide that. Acked-by: J. Bruce Fields <bfields@fieldses.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	46dad7603f	locks: clean up comment typo Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	24cbe7845e	locks: close potential race between setlease and open As Al Viro points out, there is an unlikely, but possible race between opening a file and setting a lease on it. generic_add_lease is done with the i_lock held, but the inode->i_flock check in break_lease is lockless. It's possible for another task doing an open to do the entire pathwalk and call break_lease between the point where generic_add_lease checks for a conflicting open and adds the lease to the list. If this occurs, we can end up with a lease set on the file with a conflicting open. To guard against that, check again for a conflicting open after adding the lease to the i_flock list. If the above race occurs, then we can simply unwind the lease setting and return -EAGAIN. Because we take dentry references and acquire write access on the file before calling break_lease, we know that if the i_flock list is empty when the open caller goes to check it then the necessary refcounts have already been incremented. Thus the additional check for a conflicting open will see that there is one and the setlease call will fail. Cc: Bruce Fields <bfields@fieldses.org> Cc: David Howells <dhowells@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@fieldses.org>	2014-03-31 08:24:42 -04:00
Dan Carpenter	4fdb793ffe	locks: missing unlock on error in generic_add_lease() We should unlock here before returning. Fixes: `df4e8d2c1d` ('locks: implement delegations') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 07:30:53 -05:00
J. Bruce Fields	df4e8d2c1d	locks: implement delegations Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock type. Note nfsd is the only delegation user and is only using read delegations. Warn on any attempt to set a write delegation for now. We'll come back to that case later. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-09 00:16:41 -05:00
J. Bruce Fields	617588d518	locks: introduce new FL_DELEG lock flag For now FL_DELEG is just a synonym for FL_LEASE. So this patch doesn't change behavior. Next we'll modify break_lease to treat FL_DELEG leases differently, to account for the fact that NFSv4 delegations should be broken in more situations than Windows oplocks. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-09 00:16:41 -05:00
Al Viro	72c2d53192	file->f_op is never NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:54 -04:00
Jeff Layton	7012b02a2b	locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock The file_lock_list is only used for /proc/locks. The vastly common case is for locks to be put onto the list and come off again, without ever being traversed. Help optimize for this use-case by moving to percpu hlist_head-s. At the same time, we can make the locking less contentious by moving to an lglock. When iterating over the lists for /proc/locks, we must take the global lock and then iterate over each CPU's list in turn. This change necessitates a new fl_link_cpu field to keep track of which CPU the entry is on. On x86_64 at least, this field is placed within an existing hole in the struct to avoid growing the size. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-07-08 13:36:42 +04:00
Al Viro	84d08fa888	helper for reading ->d_count Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-07-05 18:59:33 +04:00
Jeff Layton	7b2296afb3	locks: give the blocked_hash its own spinlock There's no reason we have to protect the blocked_hash and file_lock_list with the same spinlock. With the tests I have, breaking it in two gives a barely measurable performance benefit, but it seems reasonable to make this locking as granular as possible. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:46 +04:00
Jeff Layton	3999e49364	locks: add a new "lm_owner_key" lock operation Currently, the hashing that the locking code uses to add these values to the blocked_hash is simply calculated using fl_owner field. That's valid in most cases except for server-side lockd, which validates the owner of a lock based on fl_owner and fl_pid. In the case where you have a small number of NFS clients doing a lot of locking between different processes, you could end up with all the blocked requests sitting in a very small number of hash buckets. Add a new lm_owner_key operation to the lock_manager_operations that will generate an unsigned long to use as the key in the hashtable. That function is only implemented for server-side lockd, and simply XORs the fl_owner and fl_pid. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:45 +04:00
Jeff Layton	48f7418654	locks: turn the blocked_list into a hashtable Break up the blocked_list into a hashtable, using the fl_owner as a key. This speeds up searching the hash chains, which is especially significant for deadlock detection. Note that the initial implementation assumes that hashing on fl_owner is sufficient. In most cases it should be, with the notable exception being server-side lockd, which compares ownership using a tuple of the nlm_host and the pid sent in the lock request. So, this may degrade to a single hash bucket when you only have a single NFS client. That will be addressed in a later patch. The careful observer may note that this patch leaves the file_lock_list alone. There's much less of a case for turning the file_lock_list into a hashtable. The only user of that list is the code that generates /proc/locks, and it always walks the entire list. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:44 +04:00
Jeff Layton	139ca04ee5	locks: convert fl_link to a hlist_node Testing has shown that iterating over the blocked_list for deadlock detection turns out to be a bottleneck. In order to alleviate that, begin the process of turning it into a hashtable. We start by turning the fl_link into a hlist_node and the global lists into hlists. A later patch will do the conversion of the blocked_list to a hashtable. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:44 +04:00
Jeff Layton	4e8c765d38	locks: avoid taking global lock if possible when waking up blocked waiters Since we always hold the i_lock when inserting a new waiter onto the fl_block list, we can avoid taking the global lock at all if we find that it's empty when we go to wake up blocked waiters. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:43 +04:00
Jeff Layton	1c8c601a8c	locks: protect most of the file_lock handling with i_lock Having a global lock that protects all of this code is a clear scalability problem. Instead of doing that, move most of the code to be protected by the i_lock instead. The exceptions are the global lists that the ->fl_link sits on, and the ->fl_block list. ->fl_link is what connects these structures to the global lists, so we must ensure that we hold those locks when iterating over or updating these lists. Furthermore, sound deadlock detection requires that we hold the blocked_list state steady while checking for loops. We also must ensure that the search and update to the list are atomic. For the checking and insertion side of the blocked_list, push the acquisition of the global lock into __posix_lock_file and ensure that checking and update of the blocked_list is done without dropping the lock in between. On the removal side, when waking up blocked lock waiters, take the global lock before walking the blocked list and dequeue the waiters from the global list prior to removal from the fl_block list. With this, deadlock detection should be race free while we minimize excessive file_lock_lock thrashing. Finally, in order to avoid a lock inversion problem when handling /proc/locks output we must ensure that manipulations of the fl_block list are also protected by the file_lock_lock. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:42 +04:00
Jeff Layton	8897469171	locks: encapsulate the fl_link list handling Move the fl_link list handling routines into a separate set of helpers. Also ensure that locks and requests are always put on global lists last (after fully initializing them) and are taken off before uninitializing them. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:41 +04:00
Jeff Layton	b9746ef80f	locks: make "added" in __posix_lock_file a bool Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:40 +04:00
Jeff Layton	1cb3601259	locks: comment cleanups and clarifications Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:39 +04:00
Jeff Layton	d4f22d19df	locks: make generic_add_lease and generic_delete_lease static Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:39 +04:00
Jeff Layton	1a9e64a711	cifs: use posix_unblock_lock instead of locks_delete_block commit `66189be74` (CIFS: Fix VFS lock usage for oplocked files) exported the locks_delete_block symbol. There's already an exported helper function that provides this capability however, so make cifs use that instead and turn locks_delete_block back into a static function. Note that if fl->fl_next == NULL then this lock has already been through locks_delete_block(), so we should be OK to ignore an ENOENT error here and simply not retry the lock. Cc: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:38 +04:00
Jeff Layton	f891a29f46	locks: drop the unused filp argument to posix_unblock_lock Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:57:37 +04:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
J. Bruce Fields	f474af7051	UAPI Disintegration 2012-10-09 nfs: disintegrate UAPI for nfs This is to complete part of the Userspace API (UAPI) disintegration for which the preparatory patches were pulled recently. After these patches, userspace headers will be segregated into: include/uapi/linux/.../foo.h for the userspace interface stuff, and: include/linux/.../foo.h for the strictly kernel internal stuff. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-10-09 18:35:22 -04:00
Al Viro	2903ff019b	switch simple cases of fget_light to fdget Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 22:20:08 -04:00
Jeff Layton	0ee5c6d632	vfs: don't treat fl_type as a bitmap The rules for fl_type are rather convoluted. Typically it's treated as holding specific values, except in the case of LOCK_MAND, in which case it can be or'ed with LOCK_READ|LOCK_WRITE. On some arches F_WRLCK == 2 and F_UNLCK == 3, so and'ing with F_WRLCK will also catch the F_UNLCK case. It's unlikely in either case here that we'd ever see F_UNLCK since those shouldn't end up on any lists, but it's still best to be consistent. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-08-20 18:39:42 -04:00
J. Bruce Fields	068535f1fe	locks: remove unused lm_release_private In commit `3b6e2723f3` ("locks: prevent side-effects of locks_release_private before file_lock is initialized") we removed the last user of lm_release_private without removing the field itself. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-01 09:01:46 -07:00
Linus Torvalds	08843b79fb	Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux Pull nfsd changes from J. Bruce Fields: "This has been an unusually quiet cycle--mostly bugfixes and cleanup. The one large piece is Stanislav's work to containerize the server's grace period--but that in itself is just one more step in a not-yet-complete project to allow fully containerized nfs service. There are a number of outstanding delegation, container, v4 state, and gss patches that aren't quite ready yet; 3.7 may be wilder." * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits) NFSd: make boot_time variable per network namespace NFSd: make grace end flag per network namespace Lockd: move grace period management from lockd() to per-net functions LockD: pass actual network namespace to grace period management functions LockD: manage grace list per network namespace SUNRPC: service request network namespace helper introduced NFSd: make nfsd4_manager allocated per network namespace context. LockD: make lockd manager allocated per network namespace LockD: manage grace period per network namespace Lockd: add more debug to host shutdown functions Lockd: host complaining function introduced LockD: manage used host count per networks namespace LockD: manage garbage collection timeout per networks namespace LockD: make garbage collector network namespace aware. LockD: mark host per network namespace on garbage collect nfsd4: fix missing fault_inject.h include locks: move lease-specific code out of locks_delete_lock locks: prevent side-effects of locks_release_private before file_lock is initialized NFSd: set nfsd_serv to NULL after service destruction NFSd: introduce nfsd_destroy() helper ...	2012-07-31 14:42:28 -07:00
J. Bruce Fields	96d6d59cea	locks: move lease-specific code out of locks_delete_lock No point putting something only used by one caller into common code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:18:00 -04:00
Filipe Brandenburger	3b6e2723f3	locks: prevent side-effects of locks_release_private before file_lock is initialized When calling fcntl(fd, F_SETLEASE, lck) [with lck=F_WRLCK or F_RDLCK], the custom signal or owner (if any were previously set using F_SETSIG or F_SETOWN fcntls) would be reset when F_SETLEASE was called for the second time on the same file descriptor. This bug is a regression of 2.6.37 and is described here: https://bugzilla.kernel.org/show_bug.cgi?id=43336 This patch reverts a commit from Oct 2004 (with subject "nfs4 lease: move the f_delown processing") which originally introduced the lm_release_private callback. Signed-off-by: Filipe Brandenburger <filbranden@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 09:39:55 -04:00
J. Bruce Fields	0ec4f431eb	locks: fix checking of fcntl_setlease argument The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.) are done after converting the long to an int. Thus some illegal values may be let through and cause problems in later code. [ They actually don't cause problems in mainline, as of Dave Jones's commit `8d657eb3b4` "Remove easily user-triggerable BUG from generic_setlease", but we should fix this anyway. And this patch will be necessary to fix real bugs on earlier kernels. ] Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-23 12:46:01 -07:00
Dave Jones	8d657eb3b4	Remove easily user-triggerable BUG from generic_setlease This can be trivially triggered from userspace by passing in something unexpected. kernel BUG at fs/locks.c:1468! invalid opcode: 0000 [#1] SMP RIP: 0010:generic_setlease+0xc2/0x100 Call Trace: __vfs_setlease+0x35/0x40 fcntl_setlease+0x76/0x150 sys_fcntl+0x1c6/0x810 system_call_fastpath+0x1a/0x1f Signed-off-by: Dave Jones <davej@redhat.com> Cc: stable@kernel.org # 3.2+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-13 10:50:23 -07:00
Al Viro	bdc689594b	switch flock to fget_light/fput_light Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:31 -04:00
Linus Torvalds	644473e9c6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace enhancements from Eric Biederman: "This is a course correction for the user namespace, so that we can reach an inexpensive, maintainable, and reasonably complete implementation. Highlights: - Config guards make it impossible to enable the user namespace and code that has not been converted to be user namespace safe. - Use of the new kuid_t type ensures that if you somehow get past the config guards the kernel will encounter type errors if you enable user namespaces and attempt to compile in code whose permission checks have not been updated to be user namespace safe. - All uids from child user namespaces are mapped into the initial user namespace before they are processed. Removing the need to add an additional check to see if the user namespace of the compared uids remains the same. - With the user namespaces compiled out the performance is as good or better than it is today. - For most operations absolutely nothing changes performance or operationally with the user namespace enabled. - The worst case performance I could come up with was timing 1 billion cache cold stat operations with the user namespace code enabled. This went from 156s to 164s on my laptop (or 156ns to 164ns per stat operation). - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value. Most uid/gid setting system calls treat these values specially anyway so attempting to use -1 as a uid would likely cause entertaining failures in userspace. - If setuid is called with a uid that can not be mapped setuid fails. I have looked at sendmail, login, ssh and every other program I could think of that would call setuid and they all check for and handle the case where setuid fails. - If stat or a similar system call is called from a context in which we can not map a uid we lie and return overflowuid. The LFS experience suggests not lying and returning an error code might be better, but the historical precedent with uids is different and I can not think of anything that would break by lying about a uid we can't map. - Capabilities are localized to the current user namespace making it safe to give the initial user in a user namespace all capabilities. My git tree covers all of the modifications needed to convert the core kernel and enough changes to make a system bootable to runlevel 1." Fix up trivial conflicts due to nearby independent changes in fs/stat.c * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) userns: Silence silly gcc warning. cred: use correct cred accessor with regards to rcu read lock userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq userns: Convert cgroup permission checks to use uid_eq userns: Convert tmpfs to use kuid and kgid where appropriate userns: Convert sysfs to use kgid/kuid where appropriate userns: Convert sysctl permission checks to use kuid and kgids. userns: Convert proc to use kuid/kgid where appropriate userns: Convert ext4 to user kuid/kgid where appropriate userns: Convert ext3 to use kuid/kgid where appropriate userns: Convert ext2 to use kuid/kgid where appropriate. userns: Convert devpts to use kuid/kgid where appropriate userns: Convert binary formats to use kuid/kgid where appropriate userns: Add negative depends on entries to avoid building code that is userns unsafe userns: signal remove unnecessary map_cred_ns userns: Teach inode_capable to understand inodes whose uids map to other namespaces. userns: Fail exec for suid and sgid binaries with ids outside our user namespace. userns: Convert stat to return values mapped from kuids and kgids userns: Convert user specfied uids and gids in chown into kuids and kgid userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs ...	2012-05-23 17:42:39 -07:00
Eric W. Biederman	8e96e3b7b8	userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-03 03:29:34 -07:00
Pavel Shilovsky	66189be74f	CIFS: Fix VFS lock usage for oplocked files We can deadlock if we have a write oplock and two processes use the same file handle. In this case the first process can't unlock its lock if the second process blocked on the lock at the same time. Fix it by using posix_lock_file rather than posix_lock_file_wait under cinode->lock_mutex. If we request a blocking lock and posix_lock_file indicates that there is another lock that prevents us, wait until that lock is released and restart our call. Cc: stable@kernel.org Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-04-01 13:54:27 -05:00
Linus Torvalds	6d4b9e38d3	vfs: fix handling of lock allocation failure in lease-break case Bruce Fields notes that commit `778fc546f7` ("locks: fix tracking of inprogress lease breaks") introduced a possible error pointer dereference on failure to allocate memory. locks_conflict() will dereference the passed-in new lease lock structure that may be an error pointer. This means an open (without O_NONBLOCK set) on a file with a lease applied (generally only done when Samba or nfsd (with v4) is running) could crash if a kmalloc() fails. So instead of playing games with IS_ERROR() all over the place, just check the allocation failure early. That makes the code more straightforward, and avoids this possible bad pointer dereference. Based-on-patch-by: J. Bruce Fields <bfields@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-12-26 10:25:26 -08:00
Linus Torvalds	1442d1678c	Merge branch 'for-3.2' of git://linux-nfs.org/~bfields/linux * 'for-3.2' of git://linux-nfs.org/~bfields/linux: (103 commits) nfs41: implement DESTROY_CLIENTID operation nfsd4: typo logical vs bitwise negate for want_mask nfsd4: allow NFS4_SHARE_SIGNAL_DELEG_WHEN_RESRC_AVAIL | NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED nfsd4: seq->status_flags may be used unitialized nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid nfsd4: implement new 4.1 open reclaim types nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaround nfsd4: warn on open failure after create nfsd4: preallocate open stateid in process_open1() nfsd4: do idr preallocation with stateid allocation nfsd4: preallocate nfs4_file in process_open1() nfsd4: clean up open owners on OPEN failure nfsd4: simplify process_open1 logic nfsd4: make is_open_owner boolean nfsd4: centralize renew_client() calls nfsd4: typo logical vs bitwise negate nfs: fix bug about IPv6 address scope checking nfsd4: more robust ignoring of WANT bits in OPEN nfsd4: move name-length checks to xdr nfsd4: move access/deny validity checks to xdr code ...	2011-10-25 15:42:01 +02:00
Paul Bolle	395cf9691d	doc: fix broken references There are numerous broken references to Documentation files (in other Documentation files, in comments, etc.). These broken references are caused by typos in the references, and by renames or removals of the Documentation files. Some broken references are simply odd. Fix these broken references, sometimes by dropping the irrelevant text they were part of. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-27 18:08:04 +02:00
J. Bruce Fields	8335ebd94b	leases: split up generic_setlease into lock/unlock cases Eventually we should probably do the same thing to the file operations as well. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-09-21 10:40:54 -04:00
J. Bruce Fields	c1f24ef4ed	locks: setlease cleanup There's an incorrect comment here. Also clean up the logic: the "rdlease" and "wrlease" locals are confusingly named, and don't really add anything since we can make a decision as soon as we hit one of these cases. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-08-19 13:25:35 -04:00
J. Bruce Fields	778fc546f7	locks: fix tracking of inprogress lease breaks We currently use a bit in fl_flags to record whether a lease is being broken, and set fl_type to the type (RDLCK or UNLCK) that it will eventually have. This means that once the lease break starts, we forget what the lease's type used to be. Breaking a read lease will then result in blocking read opens, even though there's no conflict--because the lease type is now F_UNLCK and we can no longer tell whether it was previously a read or write lease. So, instead keep fl_type as the original type (the type which we enforce), and keep track of whether we're unlocking or merely downgrading by replacing the single FL_INPROGRESS flag by FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags. To get this right we also need to track separate downgrade and break times, to handle the case where a write-leased file gets conflicting opens first for read, then later for write. (I first considered just eliminating the downgrade behavior completely--nfsv4 doesn't need it, and nobody as far as I can tell actually uses it currently--but Jeremy Allison tells me that Windows oplocks do behave this way, so Samba will probably use this some day.) Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-08-19 13:25:34 -04:00
J. Bruce Fields	710b721696	locks: move F_INPROGRESS from fl_type to fl_flags field F_INPROGRESS isn't exposed to userspace. To me it makes more sense in fl_flags.... Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-08-19 13:25:34 -04:00
J. Bruce Fields	ab83fa4b49	locks: minor lease cleanup Use a helper function, to simplify upcoming changes. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-08-19 13:25:33 -04:00
J. Bruce Fields	8fb47a4fbf	locks: rename lock-manager ops Both the filesystem and the lock manager can associate operations with a lock. Confusingly, one of them (fl_release_private) actually has the same name in both operation structures. It would save some confusion to give the lock-manager ops different names. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-20 20:23:19 -04:00
Miklos Szeredi	ee19cc406d	fs: locks: remove init_once From: Miklos Szeredi <mszeredi@suse.cz> Remove SLAB initialization entirely, as suggested by Bruce and Linus. Allocate with __GFP_ZERO instead and only initialize list heads. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 19:00:39 -04:00
Miklos Szeredi	a51cb91d81	fs: fix lock initialization locks_alloc_lock() assumed that the allocated struct file_lock is already initialized to zero members. This is only true for the first allocation of the structure, after reuse some of the members will have random values. This will for example result in passing random fl_start values to userspace in fuse for FL_FLOCK locks, which is an information leak at best. Fix by reinitializing those members which may be non-zero after freeing. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-06 10:41:13 -07:00
Linus Torvalds	dc87c55120	Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux * 'for-2.6.39' of git://linux-nfs.org/~bfields/linux: SUNRPC: Remove resource leak in svc_rdma_send_error() nfsd: wrong index used in inner loop nfsd4: fix comment and remove unused nfsd4_file fields nfs41: make sure nfs server return right ca_maxresponsesize_cached nfsd: fix compile error svcrpc: fix bad argument in unix_domain_find nfsd4: fix struct file leak nfsd4: minor nfs4state.c reshuffling svcrpc: fix rare race on unix_domain creation nfsd41: modify the members value of nfsd4_op_flags nfsd: add proc file listing kernel's gss_krb5 enctypes gss:krb5 only include enctype numbers in gm_upcall_enctypes NFSD, VFS: Remove dead code in nfsd_rename() nfsd: kill unused macro definition locks: use assign_type()	2011-03-24 08:20:39 -07:00
Namhyung Kim	f32cb53219	locks: use assign_type() Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-07 12:05:09 -05:00
Matt Fleming	ae7eb8979c	fs/locks.c: Remove stale FIXME left over from BKL conversion The comment is no longer true as (now that the BKL conversion is finished) a spinlock _is_ now used to protect file_lock_list, blocked_list and inode->i_flock. Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2011-03-05 10:55:59 +01:00
Linus Torvalds	18bce371ae	Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits) nfsd4: fix callback restarting nfsd: break lease on unlink, link, and rename nfsd4: break lease on nfsd setattr nfsd: don't support msnfs export option nfsd4: initialize cb_per_client nfsd4: allow restarting callbacks nfsd4: simplify nfsd4_cb_prepare nfsd4: give out delegations more quickly in 4.1 case nfsd4: add helper function to run callbacks nfsd4: make sure sequence flags are set after destroy_session nfsd4: re-probe callback on connection loss nfsd4: set sequence flag when backchannel is down nfsd4: keep finer-grained callback status rpc: allow xprt_class->setup to return a preexisting xprt rpc: keep backchannel xprt as long as server connection rpc: move sk_bc_xprt to svc_xprt nfsd4: allow backchannel recovery nfsd4: support BIND_CONN_TO_SESSION nfsd4: modify session list under cl_lock Documentation: fl_mylease no longer exists ... Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work. The vfs-scale work touched some msnfs cases, and this merge removes support for that entirely, so the conflict was trivial to resolve.	2011-01-14 13:17:26 -08:00
Nick Piggin	b7ab39f631	fs: dcache scale dentry refcount Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00
J. Bruce Fields	255c7cf810	locks: minor setlease cleanup Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-04 16:49:29 -05:00
J. Bruce Fields	c45821d263	locks: eliminate fl_mylease callback The nfs server only supports read delegations for now, so we don't care how conflicts are determined. All we care is that unlocks are recognized as matching the leases they are meant to remove. After the last patch, a comparison of struct files will work for that purpose. So we no longer need this callback. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-04 16:49:28 -05:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
J. Bruce Fields	8896b93f42	locks: remove dead lease error-handling code A minor oversight from `f7347ce4ee`, "fasync: re-organize fasync entry insertion to allow it under a spinlock": this cleanup-on-error was only needed to handle -ENOMEM. Now that we're preallocating it's unneeded. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-10 14:31:29 -05:00
J. Bruce Fields	3df057ac9a	locks: fix leak on merging leases We must also free the passed-in lease in the case it wasn't used because an existing lease was upgraded/downgraded or already existed. Note the nfsd caller doesn't care because its fl_change callback returns an error in those cases. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-10 14:31:23 -05:00
Christoph Hellwig	bb8430a2c8	locks: remove fl_copy_lock lock_manager operation This one was only used for a nasty hack in nfsd, which has recently been removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-31 06:35:15 -07:00
Christoph Hellwig	51ee4b84f5	locks: let the caller free file_lock on ->setlease failure The caller allocated it, the caller should free it. The only issue so far is that we could change the flp pointer even on an error return if the fl_change callback failed. But we can simply move the flp assignment after the fl_change invocation, as the callers don't care about the flp return value if the setlease call failed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-31 06:35:15 -07:00
J. Bruce Fields	05fa3135fd	locks: fix setlease methods to free passed-in lock We modified setlease to require the caller to allocate the new lease in the case of creating a new lease, but forgot to fix up the filesystem methods. Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-30 18:08:15 -07:00
J. Bruce Fields	096657b65e	locks: fix leaks on setlease errors We're depending on setlease to free the passed-in lease on failure. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-30 18:08:15 -07:00
J. Bruce Fields	0ceaf6c700	locks: prevent ENOMEM on lease unlock Removing a lock shouldn't require any allocations; a failure due to ENOMEM leaves the caller with a choice between retrying or giving up and leaking an unused lease. Next we should split the other lease calls into add and delete cases. I wanted to start with just the bugfix. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-30 18:08:14 -07:00
Linus Torvalds	7420a8c0de	Merge branch 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: locks: turn lock_flocks into a spinlock fasync: re-organize fasync entry insertion to allow it under a spinlock locks/nfsd: allocate file lock outside of spinlock lockd: fix nlmsvc_notify_blocked locking lockd: push lock_flocks down	2010-10-27 18:13:34 -07:00
Arnd Bergmann	72f98e7255	locks: turn lock_flocks into a spinlock Nothing depends on lock_flocks using the BKL any more, so we can do the switch over to a private spinlock. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-27 22:07:36 +02:00
Linus Torvalds	f7347ce4ee	fasync: re-organize fasync entry insertion to allow it under a spinlock You currently cannot use "fasync_helper()" in an atomic environment to insert a new fasync entry, because it will need to allocate the new "struct fasync_struct". Yet fcntl_setlease() wants to call this under lock_flocks(), which is in the process of being converted from the BKL to a spinlock. In order to fix this, this abstracts out the actual fasync list insertion and the fasync allocations into functions of their own, and teaches fs/locks.c to pre-allocate the fasync_struct entry. That way the actual list insertion can happen while holding the required spinlock. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bfields@redhat.com: rebase on top of my changes to Arnd's patch] Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-27 22:06:17 +02:00
Arnd Bergmann	c5b1f0d92c	locks/nfsd: allocate file lock outside of spinlock As suggested by Christoph Hellwig, this moves allocation of new file locks out of generic_setlease into the callers, nfs4_open_delegation and fcntl_setlease in order to allow GFP_KERNEL allocations when lock_flocks has become a spinlock. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: J. Bruce Fields <bfields@redhat.com>	2010-10-27 21:41:50 +02:00
Jerome Marchand	99dc829256	procfs: fix numbering in /proc/locks The lock number in /proc/locks (first field) is implemented by a counter (private field of struct seq_file) which is incremented at each call of locks_show() and reset to 1 in locks_start() whatever the offset is. It should be reset according to the actual position in the list. Because of this, the numbering erratically restarts at 1 several times when reading a long /proc/locks file. Moreover, locks_show() can be called twice to print a single line thus skipping a number. The counter should be incremented in locks_next(). And last, pos is a loff_t, which can be bigger than a pointer, so we don't use the pointer as an integer anymore, and allocate a loff_t instead. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:13 -07:00
Arnd Bergmann	b89f432133	fs/locks.c: prepare for BKL removal This prepares the removal of the big kernel lock from the file locking code. We still use the BKL as long as fs/lockd uses it and ceph might sleep, but we can flip the definition to a private spinlock as soon as that's done. All users outside of fs/lockd get converted to use lock_flocks() instead of lock_kernel() where appropriate. Based on an earlier patch to use a spinlock from Matthew Wilcox, who has attempted this a few times before, the earliest patch from over 10 years ago turned it into a semaphore, which ended up being slower than the BKL and was subsequently reverted. Someone should do some serious performance testing when this becomes a spinlock, since this has caused problems before. Using a spinlock should be at least as good as the BKL in theory, but who knows... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Matthew Wilcox <willy@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org	2010-10-05 11:02:04 +02:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
Al Viro	8737c9305b	Switch may_open() and break_lease() to passing O_... ... instead of mixing FMODE_ and O_ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 13:00:21 -05:00
Adam Buchbinder	c9404c9c39	Fix misspelling of "should" and "shouldn't" in comments. Some comments misspell "should" or "shouldn't"; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-05 12:22:30 +01:00
Alexey Dobriyan	7b021967c5	const: make lock_manager_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:25 -07:00
Linus Torvalds	774a694f8c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits) sched: Fix sched::sched_stat_wait tracepoint field sched: Disable NEW_FAIR_SLEEPERS for now sched: Keep kthreads at default priority sched: Re-tune the scheduler latency defaults to decrease worst-case latencies sched: Turn off child_runs_first sched: Ensure that a child can't gain time over it's parent after fork() sched: enable SD_WAKE_IDLE sched: Deal with low-load in wake_affine() sched: Remove short cut from select_task_rq_fair() sched: Turn on SD_BALANCE_NEWIDLE sched: Clean up topology.h sched: Fix dynamic power-balancing crash sched: Remove reciprocal for cpu_power sched: Try to deal with low capacity, fix update_sd_power_savings_stats() sched: Try to deal with low capacity sched: Scale down cpu_power due to RT tasks sched: Implement dynamic cpu_power sched: Add smt_gain sched: Update the cpu_power sum during load-balance sched: Add SD_PREFER_SIBLING ...	2009-09-11 13:23:18 -07:00
Frederic Weisbecker	def01bc53d	sched: Convert the only user of cond_resched_bkl to use cond_resched() fs/locks.c:flock_lock_file() is the only user of cond_resched_bkl() This helper doesn't do anything more than cond_resched(). The latter naming is enough to explain that we are rescheduling if needed. The bkl suffix suggests another semantics but it's actually a synonym of cond_resched(). Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-7-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-18 15:51:45 +02:00
Sten Spans	713c0ecdb8	security: fix security_file_lock cmd argument Pass posix-translated lock operations to security_file_lock when invoked via sys_flock. Signed-off-by: Sten Spans <Sten_Spans@genua.de> Signed-off-by: James Morris <jmorris@namei.org>	2009-07-17 07:41:23 +10:00
Felix Blyakher	a9e61e25f9	lockd: call locks_release_private to cleanup per-filesystem state For every lock request lockd creates a new file_lock object in nlmsvc_setgrantargs() by copying the passed in file_lock with locks_copy_lock(). A filesystem can attach its own lock_operations vector to the file_lock. It has to be cleaned up at the end of the file_lock's life. However, lockd doesn't do it today, yet it asserts in nlmclnt_release_lockargs() that the per-filesystem state is clean. This patch fixes it by exporting locks_release_private() and adding it to nlmsvc_freegrantargs(), to be symmetrical to creating a file_lock in nlmsvc_setgrantargs(). Signed-off-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-24 16:36:03 -04:00
Heiko Carstens	002c8976ee	[CVE-2009-0029] System call wrappers part 16 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:25 +01:00
David Howells	da9592edeb	CRED: Wrap task credential accesses in the filesystem subsystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:05 +11:00
Linus Torvalds	88ed86fee6	Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc * 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: (35 commits) proc: remove fs/proc/proc_misc.c proc: move /proc/vmcore creation to fs/proc/vmcore.c proc: move pagecount stuff to fs/proc/page.c proc: move all /proc/kcore stuff to fs/proc/kcore.c proc: move /proc/schedstat boilerplate to kernel/sched_stats.h proc: move /proc/modules boilerplate to kernel/module.c proc: move /proc/diskstats boilerplate to block/genhd.c proc: move /proc/zoneinfo boilerplate to mm/vmstat.c proc: move /proc/vmstat boilerplate to mm/vmstat.c proc: move /proc/pagetypeinfo boilerplate to mm/vmstat.c proc: move /proc/buddyinfo boilerplate to mm/vmstat.c proc: move /proc/vmallocinfo to mm/vmalloc.c proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c proc: move /proc/slab_allocators boilerplate to mm/slab.c proc: move /proc/interrupts boilerplate code to fs/proc/interrupts.c proc: move /proc/stat to fs/proc/stat.c proc: move rest of /proc/partitions code to block/genhd.c proc: move /proc/cpuinfo code to fs/proc/cpuinfo.c proc: move /proc/devices code to fs/proc/devices.c proc: move rest of /proc/locks to fs/locks.c ...	2008-10-23 12:04:37 -07:00
Alexey Dobriyan	d8ba7a3633	proc: move rest of /proc/locks to fs/locks.c Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2008-10-23 14:37:00 +04:00
Al Viro	aeb5d72706	[PATCH] introduce fmode_t, do annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:06 -04:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexers. Nobody uses this "feature", nor does anybody use the passed kmem cache in a non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Miklos Szeredi	764c76b371	locks: allow ->lock() to return FILE_LOCK_DEFERRED Allow filesystem's ->lock() method to call posix_lock_file() instead of posix_lock_file_wait(), and return FILE_LOCK_DEFERRED. This makes it possible to implement such a ->lock() function, that works with the lock manager, which needs the call to be asynchronous. Now the vfs_lock_file() helper can be used, so this is a cleanup as well. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	b648a6de00	locks: cleanup code duplication Extract common code into a function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	bde74e4bc6	locks: add special return value for asynchronous locks Use a special error value FILE_LOCK_DEFERRED to mean that a locking operation returned asynchronously. This is returned by posix_lock_file() for sleeping locks to mean that the lock has been queued on the block list, and will be woken up when it might become available and needs to be retried (either fl_lmops->fl_notify() is called or fl_wait is woken up). It is also returned by f_op->lock() to mean either the above, or that the filesystem will call back with fl_lmops->fl_grant() when the result of the locking operation is known. The filesystem can do this for sleeping as well as non-sleeping locks. This is to make sure that return values of -EAGAIN and -EINPROGRESS by filesystems are not mistaken to mean an asynchronous locking. This also makes error handling in fs/locks.c and lockd/svclock.c slightly cleaner. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
Denis V. Lunev	f9f48ec72b	[patch 4/4] flock: remove unused fields from file_lock_operations fl_insert and fl_remove are not used right now in the kernel. Remove them. Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-06-23 11:52:30 -04:00
Linus Torvalds	c3921ab715	Add new 'cond_resched_bkl()' helper function It acts exactly like a regular 'cond_resched()', but will not get optimized away when CONFIG_PREEMPT is set. Normal kernel code is already preemptable in the presence of CONFIG_PREEMPT, so cond_resched() is optimized away (see commit `02b67cc3ba` "sched: do not do cond_resched() when CONFIG_PREEMPT"). But when wanting to conditionally reschedule while holding a lock, you need to use "cond_sched_lock(lock)", and the new function is the BKL equivalent of that. Also make fs/locks.c use it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-11 16:04:48 -07:00
Al Viro	0b2bac2f1e	[PATCH] fix SMP ordering hole in fcntl_setlk() fcntl_setlk()/close() race prevention has a subtle hole - we need to make sure that if we do have an fcntl/close race on SMP box, the access to descriptor table and inode->i_flock won't get reordered. As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs. STORE descriptor table entry, LOAD inode->i_flock with not a single lock in common on both sides. We do have BKL around the first STORE, but check in locks_remove_posix() is outside of BKL and for a good reason - we don't want BKL on common path of close(2). Solution is to hold ->file_lock around fcheck() in there; that orders us wrt removal from descriptor table that preceded locks_remove_posix() on close path and we either come first (in which case eviction will be handled by the close side) or we'll see the effect of close and do eviction ourselves. Note that even though it's read-only access, we do need ->file_lock here - rcu_read_lock() won't be enough to order the things. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-05-06 13:58:34 -04:00
Al Viro	9f3acc3140	[PATCH] split linux/file.h Initial splitoff of the low-level stuff; taken to fdtable.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-05-01 13:08:16 -04:00
Roland Dreier	3dd7b71ca0	Export __locks_copy_lock() so modular lockd builds Commit `1a747ee0` ("locks: don't call ->copy_lock methods on return of conflicting locks") changed fs/lockd/svclock.c to call __locks_copy_lock() instead of locks_copy_lock(), but lockd can be built as a module and __locks_copy_lock() is not exported, which causes a build error ERROR: "__locks_copy_lock" [fs/lockd/lockd.ko] undefined! with CONFIG_LOCKD=m. Fix this by exporting __locks_copy_lock(). Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-25 15:49:46 -07:00
J. Bruce Fields	1a747ee0cc	locks: don't call ->copy_lock methods on return of conflicting locks The file_lock structure is used both as a heavy-weight representation of an active lock, with pointers to reference-counted structures, etc., and as a simple container for parameters that describe a file lock. The conflicting lock returned from __posix_lock_file is an example of the latter; so don't call the filesystem or lock manager callbacks when copying to it. This also saves the need for an unnecessary locks_init_lock in the nfsv4 server. Thanks to Trond for pointing out the error. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-04-25 13:00:11 -04:00
David M. Richter	9d91cdcc0c	leases: remove unneeded variable from fcntl_setlease(). fcntl_setlease() has a struct dentry* that is used only once; this patch removes it. Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-04-25 12:58:22 -04:00
David M. Richter	1908555767	leases: move lock allocation earlier in generic_setlease() In generic_setlease(), the struct file_lock is allocated after tests for the presence of conflicting readers/writers is done, despite the fact that the allocation might block; this patch moves the allocation earlier. A subsequent set of patches will rely on this behavior to properly serialize between a modified __break_lease() and generic_setlease(). Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-04-25 12:58:22 -04:00
David M. Richter	288b2fd825	leases: when unlocking, skip locking-related steps In generic_setlease(), we don't need to allocate a new struct file_lock or check for readers or writers when called with F_UNLCK. Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-04-25 12:58:22 -04:00
David M. Richter	5fcc60c3a0	leases: fix a return-value mixup Fixes a return-value mixup from `85c59580b3` "locks: Fix potential OOPS in generic_setlease()", in which -ENOMEM replaced what had been intended to stay -EAGAIN in the variable "error". Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-04-25 12:58:22 -04:00
Matthew Wilcox	cb688371e2	fs: Remove unnecessary inclusions of asm/semaphore.h None of these files use any of the functionality promised by asm/semaphore.h. Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2008-04-18 22:16:44 -04:00
J. Bruce Fields	19e729a928	locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs Miklos Szeredi found the bug: "Basically what happens is that on the server nlm_fopen() calls nfsd_open() which returns -EACCES, to which nlm_fopen() returns NLM_LCK_DENIED. "On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), which in will cause fcntl_setlk() to retry forever." So, for example, opening a file on an nfs filesystem, changing permissions to forbid further access, then trying to lock the file, could result in an infinite loop. And Trond Myklebust identified the culprit, from Marc Eshel and me: `7723ec9777` "locks: factor out generic/filesystem switch from setlock code" That commit claimed to just be reshuffling code, but actually introduced a behavioral change by calling the lock method repeatedly as long as it returned -EAGAIN. We assumed this would be safe, since we assumed a lock of type SETLKW would only return with either success or an error other than -EAGAIN. However, nfs can in fact return -EAGAIN in this situation, and independently of whether that behavior is correct or not, we don't actually need this change, and it seems far safer not to depend on such assumptions about the filesystem's ->lock method. Therefore, revert the problematic part of the original commit. This leaves vfs_lock_file() and its other callers unchanged, while returning fcntl_setlk and fcntl_setlk64 to their former behavior. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Tested-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-14 12:22:14 -07:00
Randy Dunlap	a6b91919e0	fs: fix kernel-doc notation warnings Fix kernel-doc notation warnings in fs/. Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line: * mark_files_ro Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line: * lease_get_mtime Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line: * lease_get_mtime Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line: * lookup_one_len: filesystem helper to lookup single pathname component Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line: * bh_uptodate_or_lock: Test whether the buffer is uptodate Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line: * bh_submit_read: Submit a locked buffer for reading Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line: * writeback_acquire: attempt to get exclusive writeback access to a device Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line: * writeback_in_progress: determine whether there is writeback in progress Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line: * writeback_release: relinquish exclusive writeback access against a device. Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line: * void journal_invalidatepage() Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Pavel Emelyanov	6c5f3e7b43	Pidns: make full use of xxx_vnr() calls Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were _all_ converted to operate on the current pid namespace. After this each call like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo) one. Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where appropriate. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Vitaliy Gusev	ab1f161165	pid-namespaces-vs-locks-interaction fcntl(F_GETLK, ...) can return a pid from a pid namespace other than the caller's (when the lock owner belongs to several namespaces). The same is true of the pids shown in /proc/locks. The correct behavior is to save a pointer to the struct pid of the process that owns the lock. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-03 17:51:36 -05:00
Matthew Wilcox	4321e01e7d	file locks: Use wait_event_interruptible_timeout() interruptible_sleep_on_locked() is just an open-coded wait_event_interruptible_timeout(), with the one difference that interruptible_sleep_on_locked() doesn't bother to check the condition on which it is waiting, depending instead on the BKL to avoid the case where it blocks after the wakeup has already been called. locks_block_on_timeout() is only used in one place, so it's actually simpler to inline it into its caller. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-03 17:51:36 -05:00
J. Bruce Fields	b533184fc3	locks: clarify posix_locks_deadlock For such a short function (with such a long comment), posix_locks_deadlock() seems to cause a lot of confusion. Attempt to make it a bit clearer: - Remove the initial posix_same_owner() check, which can never pass (since this is only called in the case that block_fl and caller_fl conflict) - Use an explicit loop (and a helper function) instead of a goto. - Rewrite the comment, attempting a clearer explanation, and removing some uninteresting historical detail. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-03 17:51:36 -05:00
J. Bruce Fields	97855b49b6	locks: fix possible infinite loop in posix deadlock detection It's currently possible to send posix_locks_deadlock() into an infinite loop (under the BKL). For now, fix this just by bailing out after a few iterations. We may want to fix this in a way that better clarifies the semantics of deadlock detection. But that will take more time, and this minimal fix is probably adequate for any realistic scenario, and is simple enough to be appropriate for applying to stable kernels now. Thanks to George Davis for reporting the problem. Cc: "George G. Davis" <gdavis@mvista.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-30 09:04:18 -07:00
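The "bail out after a few iterations" approach in `97855b49b6` can be pictured with a small userspace model. This is a sketch under stated assumptions, not the code in fs/locks.c: the wait-for relation is reduced to an array in which `waits_for[i]` is the owner that owner `i` is blocked on (or -1), and the names `would_deadlock` and `MAX_CHAIN_STEPS`, as well as the cap of 10, are chosen here purely for illustration.

```c
#include <stddef.h>

/* Hypothetical cap on how far the wait-for chain is followed. */
#define MAX_CHAIN_STEPS 10

/*
 * waits_for[i] is the index of the owner that owner i is blocked on,
 * or -1 if owner i is not blocked. Returns 1 if letting `caller` block
 * on `blocker` would close a cycle back to `caller`, 0 otherwise.
 * A cycle that does not include `caller` never reaches it, so the
 * step cap is what guarantees termination in that case.
 */
int would_deadlock(const int *waits_for, size_t n, int caller, int blocker)
{
    int cur = blocker;
    int steps = 0;

    while (cur >= 0 && (size_t)cur < n) {
        if (cur == caller)
            return 1;               /* chain leads back to the caller */
        if (steps++ >= MAX_CHAIN_STEPS)
            return 0;               /* give up and report no deadlock */
        cur = waits_for[cur];
    }
    return 0;                       /* chain ended at an unblocked owner */
}
```

The trade-off is the one the commit message accepts: a genuine deadlock through a chain longer than the cap goes unreported, in exchange for a walk that always terminates.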
Christoph Lameter	4ba9b9d0ba	Slab API: remove useless ctor parameter and reorder parameters Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void *object, struct kmem_cache *s, unsigned long flags) to ctor(struct kmem_cache *s, void *object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:45 -07:00
Pavel Emelyanov	7f8ada98d9	Rework /proc/locks via seq_files and seq_list helpers Currently /proc/locks is shown with a proc_read function, but its behavior is rather complex as it has to manually handle current offset and buffer length. On the other hand, files that show objects from lists can be easily reimplemented using the sequential files and the seq_list_XXX() helpers. This saves (as usual) 16 lines of code and more than 200 from the .text section. [akpm@linux-foundation.org: no externs in C] [akpm@linux-foundation.org: warning fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-09 18:32:46 -04:00
Matthias Kaehlcke	094f282521	fs/locks.c: use list_for_each_entry() instead of list_for_each() fs/locks.c: use list_for_each_entry() instead of list_for_each() in posix_locks_deadlock() and get_locks_status() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-09 18:32:46 -04:00
Pavel Emelyanov	a16877ca9c	Cleanup macros for distinguishing mandatory locks The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the inode as "mandatory lockable" and there's a macro for this check called MANDATORY_LOCK(inode). However, fs/locks.c and some filesystems still perform the explicit i_mode checking. Besides, Andrew pointed out, that this macro is buggy itself, as it dereferences the inode arg twice. Convert this macro into static inline function and switch its users to it, making the code shorter and more readable. The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK() for superblock is already known to be true. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: David Howells <dhowells@redhat.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-09 18:32:46 -04:00
Pavel Emelyanov	85c59580b3	locks: Fix potential OOPS in generic_setlease() This code is run under lock_kernel(), which is dropped during sleeping operations, so the following race is possible: CPU1: CPU2: vfs_setlease(); vfs_setlease(); lock_kernel(); lock_kernel(); /* spin */ generic_setlease(): ... for (before = ...) /* here we found some lease after * which we will insert the new one */ fl = locks_alloc_lock(); /* go to sleep in this allocation and * drop the BKL */ generic_setlease(): ... for (before = ...) /* here we find the "before" pointing * at the one we found on CPU1 */ ->fl_change(my_before, arg); lease_modify(); locks_free_lock(); /* and we freed it */ ... unlock_kernel(); locks_insert_lock(before, fl); /* OOPS! We have just tried to add the lease * at the tail of already removed one */ The similar races are already handled in other code - all the allocations are performed before any checks/updates. Thanks to Kamalesh Babulal for testing and for a bug report on an earlier version. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>	2007-10-09 18:32:45 -04:00
Pavel Emelyanov	f0c1cd0eaf	Use list_first_entry in locks_wake_up_blocks This routine deletes all the elements from the list with the "while (!list_empty())" loop, and we already have a list_first_entry() macro to help it look nicer :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org>	2007-10-09 18:32:45 -04:00
J. Bruce Fields	02888f41e9	locks: fix flock_lock_file() comment This comment wasn't updated when lease support was added, and it makes essentially the same mistake that the code made before a recent bugfix. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-10-09 18:32:45 -04:00
Pavel Emelyanov	84d535ade6	Memory shortage can result in inconsistent flocks state When the flock_lock_file() is called to change the flock from F_RDLCK to F_WRLCK or vice versa the existing flock can be removed without appropriate warning. Look: for_each_lock(inode, before) { struct file_lock *fl = *before; if (IS_POSIX(fl)) break; if (IS_LEASE(fl)) continue; if (filp != fl->fl_file) continue; if (request->fl_type == fl->fl_type) goto out; found = 1; locks_delete_lock(before); <<<<<< ! break; } if after this point the subsequent locks_alloc_lock() will fail the return code will be -ENOMEM, but the existing lock is already removed. This is a known feature that such "re-locking" is not atomic, but in the racy case the file should stay locked (although by some other process), but in this case the file will be unlocked. The proposal is to prepare the lock in advance keeping no chance to fail in the future code. Found during making the flocks pid-namespaces aware. (Note: Thanks to Reuben Farrelly for finding a bug in an earlier version of this patch.) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Reuben Farrelly <reuben-linuxkernel@reub.net>	2007-10-09 18:32:45 -04:00
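The "prepare the lock in advance" idea from `84d535ade6` is an instance of a general pattern: perform every step that can fail before touching existing state. The following userspace sketch illustrates that ordering only. All names in it (`held_lock`, `file_slot`, `relock`, `alloc_ok`, `alloc_fail`) are hypothetical, and the allocator is passed in as a function pointer so that an allocation failure can be simulated.

```c
#include <errno.h>
#include <stdlib.h>

struct held_lock { int type; };
struct file_slot { struct held_lock *lock; };   /* NULL means unlocked */

typedef struct held_lock *(*lock_alloc_fn)(void);

/* Two stand-in allocators: one that works, one that simulates ENOMEM. */
struct held_lock *alloc_ok(void)   { return malloc(sizeof(struct held_lock)); }
struct held_lock *alloc_fail(void) { return NULL; }

/*
 * Change the lock type held in `slot`. The new lock is allocated
 * before the old one is released, so a failed allocation leaves the
 * slot exactly as it was.
 */
int relock(struct file_slot *slot, int new_type, lock_alloc_fn alloc)
{
    struct held_lock *fresh;

    if (slot->lock && slot->lock->type == new_type)
        return 0;                   /* already held with this type */

    fresh = alloc();                /* the only step that can fail */
    if (!fresh)
        return -ENOMEM;             /* old lock is still in place */

    fresh->type = new_type;
    free(slot->lock);               /* safe even if slot was unlocked */
    slot->lock = fresh;
    return 0;
}
```

Calling `relock(&slot, 2, alloc_fail)` on a slot that holds a type-1 lock returns `-ENOMEM` and leaves the type-1 lock untouched, which is the property the commit message asks for.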
J. Bruce Fields	526985b9dd	locks: kill redundant local variable There's no need for another variable local to this loop; we can use the variable (of the same name!) already declared at the top of the function, and not used till later (at which point it's initialized, so this is safe). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-10-09 18:32:45 -04:00
J. Bruce Fields	b842e240f2	locks: reverse order of posix_locks_conflict() arguments The first argument to posix_locks_conflict() is meant to be a lock request, and the second a lock from an inode's lock request. It doesn't really make a difference which order you call them in, since the only asymmetric test in posix_locks_conflict() is the check whether the second argument is a posix lock--and every caller already does that check for some reason. But may as well fix posix_test_lock() to call posix_locks_conflict() with the arguments in the same order as everywhere else. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-10-09 18:32:45 -04:00
Pavel Emelyanov	0e2f6db88a	Leases can be hidden by flocks The inode->i_flock list contains the leases, flocks and posix locks in the specified order. However, the flocks are added in the head of this list thus hiding the leases from F_GETLEASE command, from time_out_leases() and other code that expects the leases to come first. The following example will demonstrate this: #define _GNU_SOURCE #include <unistd.h> #include <fcntl.h> #include <stdio.h> #include <sys/file.h> static void show_lease(int fd) { int res; res = fcntl(fd, F_GETLEASE); switch (res) { case F_RDLCK: printf("Read lease\n"); break; case F_WRLCK: printf("Write lease\n"); break; case F_UNLCK: printf("No leases\n"); break; default: printf("Some shit\n"); break; } } int main(int argc, char **argv) { int fd, res; fd = open(argv[1], O_RDONLY); if (fd == -1) { perror("Can't open file"); return 1; } res = fcntl(fd, F_SETLEASE, F_WRLCK); if (res == -1) { perror("Can't set lease"); return 1; } show_lease(fd); if (flock(fd, LOCK_SH) == -1) { perror("Can't flock shared"); return 1; } show_lease(fd); return 0; } The first call to show_lease() will show the write lease set, but the second will show no leases. Fix the flock adding so that the leases always stay in the head of this list. Found during making the flocks pid-namespaces aware. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-11 17:21:27 -07:00
Christoph Hellwig	0af1a45046	rename setlease to generic_setlease Make it a little more clear that this is the default implementation for the setlease operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:43 -07:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
J. Bruce Fields	6924c55492	locks: fix vfs_test_lock() comment Thanks to Doug Chapman for pointing out that the comment here is inconsistent with the function prototype. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:19 -04:00
J. Bruce Fields	6d34ac199a	locks: make posix_test_lock() interface more consistent Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or presence of a conflict lock by setting fl_type to, respectively, F_UNLCK or something other than F_UNLCK, the return value is no longer needed. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:19 -04:00
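The convention that `6d34ac199a` relies on, where the lock type is `F_UNLCK` when there is no conflict and something else when there is one, is the same one userspace sees from `fcntl(F_GETLK)`. The sketch below shows that userspace view only; it does not reproduce the in-kernel `posix_test_lock()` interface, and the helper name `probe_write_lock` is invented for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Ask whether a whole-file write lock on fd would conflict with a
 * lock held by another process. Returns F_UNLCK if no conflicting
 * lock was found, the conflicting lock's type (F_RDLCK or F_WRLCK)
 * otherwise, or -1 if fcntl() itself failed.
 */
int probe_write_lock(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;        /* the lock we pretend to request */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file" */

    if (fcntl(fd, F_GETLK, &fl) == -1) {
        perror("fcntl(F_GETLK)");
        return -1;
    }
    return fl.l_type;           /* overwritten by the kernel */
}
```

Because the answer comes back through `l_type` in the same structure that carried the request, the call's return value only has to signal success or failure.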
J. Bruce Fields	4698afe8e3	locks: export setlease to filesystems Export setlease so it can be used by filesystems to implement their lease methods. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:06 -04:00
J. Bruce Fields	f9ffed26d6	locks: provide a file lease method enabling cluster-coherent leases Currently leases are only kept locally, so there's no way for a distributed filesystem to enforce them against multiple clients. We're particularly interested in the case of nfsd exporting a cluster filesystem, in which case nfsd needs cluster-coherent leases in order to implement delegations correctly. Also add some documentation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:14:47 -04:00
J. Bruce Fields	a9933cea7a	locks: rename lease functions to reflect locks.c conventions We've been using the convention that vfs_foo is the function that calls a filesystem-specific foo method if it exists, or falls back on a generic method if it doesn't; thus vfs_foo is what is called when some other part of the kernel (normally lockd or nfsd) wants to get a lock, whereas foo is what filesystems call to use the underlying local functionality as part of their lock implementation. So rename setlease to vfs_setlease (which will call a filesystem-specific setlease after a later patch) and __setlease to setlease. Also, vfs_setlease need only be GPL-exported as long as it's only needed by lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:14:12 -04:00
J. Bruce Fields	6d5e8b05ca	locks: share more common lease code Share more code between setlease (used by nfsd) and fcntl. Also some minor cleanup. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Christoph Hellwig <hch@infradead.org>	2007-07-18 19:09:27 -04:00
J. Bruce Fields	e32b8ee27b	locks: clean up lease_alloc() Return the newly allocated structure as the return value instead of using a struct ** parameter. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:09:27 -04:00
J. Bruce Fields	d2ab0b0c4c	locks: convert an -EINVAL return to a BUG There's no point trying to return an error in these cases, which all represent bugs in the callers. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:09:27 -04:00
david m. richter	87250dd26a	leases: minor break_lease() comment clarification clarify that break_lease() checks for presence of any lock, not just leases. Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:09:27 -04:00
Christoph Lameter	a35afb830f	Remove SLAB_CTOR_CONSTRUCTOR SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:04 -07:00
J. Bruce Fields	129a84de23	locks: fix F_GETLK regression (failure to find conflicts) In `9d6a8c5c21` we changed posix_test_lock to modify its single file_lock argument instead of taking separate input and output arguments. This makes it no longer safe to set the output lock's fl_type to F_UNLCK before looking for a conflict, since that means searching for a conflict against a lock with type F_UNLCK. This fixes a regression which causes F_GETLK to incorrectly report no conflict on most filesystems (including any filesystem that doesn't do its own locking). Also fix posix_lock_to_flock() to copy the lock type. This isn't strictly necessary, since the caller already does this; but it seems less likely to cause confusion in the future. Thanks to Doug Chapman for the bug report. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Doug Chapman <doug.chapman@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-10 20:25:59 -07:00
Linus Torvalds	2d56d3c43c	Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux * 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux: gfs2: nfs lock support for gfs2 lockd: add code to handle deferred lock requests lockd: always preallocate block in nlmsvc_lock() lockd: handle test_lock deferrals lockd: pass cookie in nlmsvc_testlock lockd: handle fl_grant callbacks lockd: save lock state on deferral locks: add fl_grant callback for asynchronous lock return nfsd4: Convert NFSv4 to new lock interface locks: add lock cancel command locks: allow {vfs,posix}_lock_file to return conflicting lock locks: factor out generic/filesystem switch from setlock code locks: factor out generic/filesystem switch from test_lock locks: give posix_test_lock same interface as ->lock locks: make ->lock release private data before returning in GETLK case locks: create posix-to-flock helper functions locks: trivial removal of unnecessary parentheses	2007-05-07 12:34:24 -07:00
Christoph Lameter	50953fe9e0	slab allocators: Remove SLAB_DEBUG_INITIAL flag I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Marc Eshel	2beb6614f5	locks: add fl_grant callback for asynchronous lock return Acquiring a lock on a cluster filesystem may require communication with remote hosts, and to avoid blocking lockd or nfsd threads during such communication, we allow the results to be returned asynchronously. When a ->lock() call needs to block, the file system will return -EINPROGRESS, and then later return the results with a call to the routine in the fl_grant field of the lock_manager_operations struct. This differs from the case when ->lock returns -EAGAIN to a blocking lock request; in that case, the filesystem calls fl_notify when the lock is granted, and the caller retries the original lock. So while fl_notify is merely a hint to the caller that it should retry, fl_grant actually communicates the final result of the lock operation (with the lock already acquired in the successful case). Therefore fl_grant takes a lock, a status and, for the test lock case, a conflicting lock. We also allow fl_grant to return an error to the filesystem, to handle the case where the fl_grant request arrives after the lock manager has already given up waiting for it. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:49 -04:00
Marc Eshel	9b9d2ab415	locks: add lock cancel command Lock managers need to be able to cancel pending lock requests. In the case where the exported filesystem manages its own locks, it's not sufficient just to call posix_unblock_lock(); we need to let the filesystem know what's happening too. We do this by adding a new fcntl lock command: FL_CANCELLK. Some day this might also be made available to userspace applications that could benefit from an asynchronous locking api. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 20:38:28 -04:00
Marc Eshel	150b393456	locks: allow {vfs,posix}_lock_file to return conflicting lock The nfsv4 protocol's lock operation, in the case of a conflict, returns information about the conflicting lock. It's unclear how clients can use this, so for now we're not going so far as to add a filesystem method that can return a conflicting lock, but we may as well return something in the local case when it's easy to. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 19:23:24 -04:00
Marc Eshel	7723ec9777	locks: factor out generic/filesystem switch from setlock code Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for all the setlk code. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 18:08:49 -04:00
J. Bruce Fields	3ee17abd14	locks: factor out generic/filesystem switch from test_lock Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for test_lock. Note that this hasn't been necessary until recently, because the few filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS. However GFS (and, in the future, other cluster filesystems) need to implement their own locking to get cluster-coherent locking, and also want to be able to export locking to NFS (lockd and NFSv4). So we accomplish this by factoring out code such as this and exporting it for the use of lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 18:06:44 -04:00
Marc Eshel	9d6a8c5c21	locks: give posix_test_lock same interface as ->lock posix_test_lock() and ->lock() do the same job but have gratuitously different interfaces. Modify posix_test_lock() so the two agree, simplifying some code in the process. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 17:39:00 -04:00
J. Bruce Fields	70cc6487a4	locks: make ->lock release private data before returning in GETLK case The file_lock argument to ->lock is used to return the conflicting lock when found. There's no reason for the filesystem to return any private information with this conflicting lock, but nfsv4 is. Fix nfsv4 client, and modify locks.c to stop calling fl_release_private for it in this case. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: "Trond Myklebust" <Trond.Myklebust@netapp.com>	2007-05-06 17:38:19 -04:00
J. Bruce Fields	c2fa1b8a6c	locks: create posix-to-flock helper functions Factor out a bit of messy code by creating posix-to-flock counterparts to the existing flock-to-posix helper functions. Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-04-16 13:40:37 -04:00
J. Bruce Fields	226a998dbf	locks: trivial removal of unnecessary parentheses Remove some unnecessary parentheses. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-04-16 13:40:37 -04:00
Josef "Jeff" Sipek	0f7fc9e4d0	[PATCH] VFS: change struct file to use struct path This patch changes struct file to use struct path instead of having independent pointers to struct dentry and struct vfsmount, and converts all users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}. Additionally, it adds two #define's to make the transition easier for users of the f_dentry and f_vfsmnt. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:41 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00


289 Commits