linux

Author	SHA1	Message	Date
Ian Kent	213614d583	autofs4: always use lookup for lookup We need to be able to cope with the directory mutex being held during ->d_revalidate() in some cases, but not all cases, and not necessarily by us. Because we need to release the mutex when we call back to the daemon to do perform a mount we must be sure that it is us who holds the mutex so we must redirect mount requests to ->lookup() if the mutex is held. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:58 -08:00
Ian Kent	cb4b492ac7	autofs4: rename dentry to expiring in autofs4_lookup_expiring() In autofs4_lookup_expiring() a declaration within the list traversal loop uses a declaration that has the same name as the function parameter. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:58 -08:00
Ian Kent	e4d5ade7b5	autofs4: rename dentry to active in autofs4_lookup_active() In autofs4_lookup_active() a declaration within the list traversal loop uses a declaration that has the same name as the function parameter. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:58 -08:00
Ian Kent	c42c7f7e69	autofs4: eliminate d_unhashed in path walk checks We unhash the dentry (in a subsequent patch) in ->d_revalidate() in order to send mount requests to ->lookup(). But then we can not rely on d_unhased() to give reliable results because it may be called at any time by any code path. The d_unhashed() function is used by __simple_empty() in the path walking callbacks but autofs mount point dentrys should have no directories at all so a list_empty() on d_subdirs should be (and is) sufficient. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:58 -08:00
Ian Kent	6510c9d859	autofs4: cleanup active and expire lookup The lookup functions for active and expiring dentrys use parameters that can be easily obtained on entry so we change the call to to take just the dentry. This makes the subsequent change, to send all lookups to ->lookup(), a bit cleaner. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:58 -08:00
Ian Kent	90387c9c1d	autofs4: renamer unhashed to active in autofs4_lookup() Rename the variable unhashed to active in autofs4_lookup() to better reflect its usage. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:58 -08:00
Ian Kent	aa952eb26d	autofs4: use autofs_info for pending flag Eliminate the use of the d_lock spin lock by using the autofs super block info spin lock. This reduces the number of spin locks we use by one and makes the code for the following patch (to redirect ->d_revalidate() to ->lookup()) a little simpler. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:58 -08:00
Ian Kent	36b6413ef3	autofs4: use helper function for need mount check Define simple helper function for checking if we need to trigger a mount. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:57 -08:00
Ian Kent	c4cd70b3e3	autofs4: use helper functions for expiring list Define some simple helper functions for adding and deleting entries on the expiring dentry list. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:57 -08:00
Ian Kent	4f8427d190	autofs4: use helper functions for active list handling Define some simple helper functions for adding and deleting entries on the active (and unhashed) dentry list. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Yehuda Saheh <yehuda@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:57 -08:00
Alexey Dobriyan	135d5655dc	proc: rename de_get() to pde_get() and inline it * de_get() is trivial -- make inline, save a few bits of code, drop "refcount is 0" check -- it should be done in some generic refcount code, don't recall it's was helpful * rename GET and PUT functions to pde_get(), pde_put() for cool prefix! * remove obvious and incorrent comments * in remove_proc_entry() use pde_put(), when I fixed PDE refcounting to be normal one, remove_proc_entry() was supposed to do "-1" and code now reflects that. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:19:57 -08:00
Wu Fengguang	1a9b5b7fe0	mm: export stable page flags Rename get_uflags() to stable_page_flags() and make it a global function for use in the hwpoison page flags filter, which need to compare user page flags with the value provided by user space. Also move KPF_* to kernel-page-flags.h for use by user space tools. Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> CC: Nick Piggin <npiggin@suse.de> CC: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2009-12-16 12:19:59 +01:00
David Woodhouse	2e16cfca6e	jffs2: Fix long-standing bug with symlink garbage collection. Ever since jffs2_garbage_collect_metadata() was first half-written in February 2001, it's been broken on architectures where 'char' is signed. When garbage collecting a symlink with target length above 127, the payload length would end up negative, causing interesting and bad things to happen. Cc: stable@kernel.org Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-12-16 03:27:20 +00:00
Yan, Zheng	920bbbfb05	Btrfs: Rewrite btrfs_drop_extents Rewrite btrfs_drop_extents by using btrfs_duplicate_item, so we can avoid calling lock_extent within transaction. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-12-15 21:24:52 -05:00
Yan, Zheng	ad48fd7546	Btrfs: Add btrfs_duplicate_item btrfs_duplicate_item duplicates item with new key, guaranteeing the source item and the new items are in the same tree leaf and contiguous. It allows us to split file extent in place, without using lock_extent to prevent bookend extent race. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-12-15 21:24:25 -05:00
Yan, Zheng	8cef4e160d	Btrfs: Avoid superfluous tree-log writeout We allow two log transactions at a time, but use same flag to mark dirty tree-log btree blocks. So we may flush dirty blocks belonging to newer log transaction when committing a log transaction. This patch fixes the issue by using two flags to mark dirty tree-log btree blocks. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-12-15 21:24:25 -05:00
Trond Myklebust	380454126f	NFSv4: Fix a regression in the NFSv4 state manager Commit `5601a00d67` (nfs: run state manager in privileged mode) introduces a regression in the NFSv4 code when compiled with CONFIG_NFS_V4_1. The calls to nfs4_end_drain_session() from the main loop in nfs4_state_manager() Oops due to the lack of an NFSv4.1 session when running NFSv4.0. The fix is to move those two calls back into nfs41_init_clientid() and nfs4_reset_session(). The calls to nfs4_end_drain_session() that remain inside nfs4_state_manager() are safe, since the NFSv4.0 code will never set the NFS4CLNT_SESSION_DRAINING bit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 17:36:57 -05:00
J. Bruce Fields	7663dacd92	nfsd: remove pointless paths in file headers The new .h files have paths at the top that are now out of date. While we're here, just remove all of those from fs/nfsd; they never served any purpose. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-15 15:01:47 -05:00
J. Bruce Fields	1557aca790	nfsd: move most of nfsfh.h to fs/nfsd Most of this can be trivially moved to a private header as well. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-15 15:01:46 -05:00
J. Bruce Fields	774b147828	nfsd: make V4ROOT exports read-only I can't see any use for writeable V4ROOT exports. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-15 15:01:44 -05:00
Trond Myklebust	72211dbe72	NFSv4: Release the sequence id before restarting a CLOSE rpc call If the CLOSE or OPEN_DOWNGRADE call triggers a state recovery, and has to be resent, then we must release the seqid. Otherwise the open recovery will wait for the close to finish, which causes a deadlock. This is mainly a NFSv4.1 problem, although it can theoretically happen with NFSv4.0 too, in a OPEN_DOWNGRADE situation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 14:47:36 -05:00
Steve Dickson	03a816b46d	nfsd: restrict filehandles accepted in V4ROOT case On V4ROOT exports, only accept filehandles that are the root of some export. This allows mountd to allow or deny access to individual directories and symlinks on the pseudofilesystem. Note that the checks in readdir and lookup are not enough, since a malicious host with access to the network could guess filehandles that they weren't able to obtain through lookup or readdir. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-15 14:07:24 -05:00
J. Bruce Fields	f2ca7153ca	nfsd: allow exports of symlinks We want to allow exports of symlinks, to allow mountd to communicate to the kernel which symlinks lead to exports, and hence which symlinks need to be visible on the pseudofilesystem. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-15 14:07:24 -05:00
J. Bruce Fields	3227fa41ab	nfsd: filter readdir results in V4ROOT case As with lookup, we treat every boject as a mountpoint and pretend it doesn't exist if it isn't exported. The preexisting code here is confusing, but I haven't yet figured out how to make it clearer. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-15 14:07:24 -05:00
J. Bruce Fields	82ead7fe41	nfsd: filter lookup results in V4ROOT case We treat every object as a mountpoint and pretend it doesn't exist if it isn't exported. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-15 14:07:23 -05:00
J. Bruce Fields	3b6cee7bc4	nfsd4: don't continue "under" mounts in V4ROOT case If /A/mount/point/ has filesystem "B" mounted on top of it, and if "A" is exported, but not "B", then the nfs server has always returned to the client a filehandle for the mountpoint, instead of for the root of "B", allowing the client to see the subtree of "A" that would otherwise be hidden by B. Disable this behavior in the case of V4ROOT exports; we implement the path restrictions of V4ROOT exports by treating every directory as if it were a mountpoint, and allowing traversal only if the new directory is exported. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-15 14:07:23 -05:00
Steve Dickson	eb4c86c6a5	nfsd: introduce export flag for v4 pseudoroot NFSv4 differs from v2 and v3 in that it presents a single unified filesystem tree, whereas v2 and v3 exported multiple filesystem (whose roots could be found using a separate mount protocol). Our original NFSv4 server implementation asked the administrator to designate a single filesystem as the NFSv4 root, then to mount filesystems they wished to export underneath. (Often using bind mounts of already-existing filesystems.) This was conceptually simple, and allowed easy implementation, but created a serious obstacle to upgrading between v2/v3: since the paths to v4 filesystems were different, administrators would have to adjust all the paths in client-side mount commands when switching to v4. Various workarounds are possible. For example, the administrator could export "/" and designate it as the v4 root. However, the security risks of that approach are obvious, and in any case we shouldn't be requiring the administrator to take extra steps to fix this problem; instead, the server should present consistent paths across different versions by default. These patches take a modified version of that approach: we provide a new export option which exports only a subset of a filesystem. With this flag, it becomes safe for mountd to export "/" by default, with no need for additional configuration. We begin just by defining the new flag. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-15 14:00:40 -05:00
Andy Adamson	68bf05efb7	nfs41: fix session fore channel negotiation If the rsize or wsize is not set on the mount command, negotiate the highest supported rsize and wsize in session creation. Fixes a bug where the client negotiated nfs41_maxwrite_overhead as ca_maxrequestsize and nfs41_maxread_overhead as ca_maxresponsesize resulting in NFS4ERR_REQ_TOO_BIG errors on writes. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 13:58:42 -05:00
Andy Adamson	a5523b84c4	nfs41: do not zero seqid portion of stateid on close Remove code left over from a previous minorversion draft. which specified zeroing seqid portions of stateid's. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 13:58:36 -05:00
Alexandros Batsakis	5601a00d67	nfs: run state manager in privileged mode Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 13:58:23 -05:00
Alexandros Batsakis	b257957e50	nfs: make recovery state manager operations privileged Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 13:58:07 -05:00
Alexandros Batsakis	689cf5c15b	nfs: enforce FIFO ordering of operations trying to acquire slot Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 13:55:18 -05:00
Alexandros Batsakis	40ead580ae	nfs: remove rpc_task argument from nfs4_find_slot Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 13:51:22 -05:00
Alexandros Batsakis	afe6c27ccb	nfs: change nfs4_do_setlk params to identify recovery type Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 13:50:32 -05:00
Alexandros Batsakis	0f7e720694	nfs: do not do a LOOKUP after open Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 13:50:21 -05:00
Alexandros Batsakis	3bfb0fc591	nfs: minor cleanup of session draining Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-15 13:50:01 -05:00
Linus Torvalds	d180ec5d34	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: event tracing support xfs: change the xfs_iext_insert / xfs_iext_remove xfs: cleanup bmap extent state macros	2009-12-15 09:12:43 -08:00
Joe Perches	7f2f4e72d3	fs/ubifs: use %pUB to print UUIDs Signed-off-by: Joe Perches <joe@perches.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:33 -08:00
Joe Perches	f0b34ae634	fs/gfs2/sys.c: use %pUB to print UUIDs Signed-off-by: Joe Perches <joe@perches.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:33 -08:00
Joe Perches	03daa57cdb	fs/xfs/xfs_log_recover.c: use %pU to print UUIDs Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:33 -08:00
André Goddard Rosa	e7d2860b69	tree-wide: convert open calls to remove spaces to skip_spaces() lib function Makes use of skip_spaces() defined in lib/string.c for removing leading spaces from strings all over the tree. It decreases lib.a code size by 47 bytes and reuses the function tree-wide: text data bss dec hex filename 64688 584 592 65864 10148 (TOTALS-BEFORE) 64641 584 592 65817 10119 (TOTALS-AFTER) Also, while at it, if we see (str && isspace(str)), we can be sure to remove the first condition (str) as the second one (isspace(str)) also evaluates to 0 whenever str == 0, making it redundant. In other words, "a char equals zero is never a space". Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below, and found occurrences of this pattern on 3 more files: drivers/leds/led-class.c drivers/leds/ledtrig-timer.c drivers/video/output.c @@ expression str; @@ ( // ignore skip_spaces cases while (str && isspace(str)) { $str++;\\|++str;$ } \| - str && isspace(*str) ) Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Cc: Julia Lawall <julia@diku.dk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Neil Brown <neilb@suse.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: David Howells <dhowells@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:32 -08:00
Hiroshi Shimamoto	e4c570c4cb	task_struct: make journal_info conditional journal_info in task_struct is used in journaling file system only. So introduce CONFIG_FS_JOURNAL_INFO and make it conditional. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:27 -08:00
john stultz	4614a696bd	procfs: allow threads to rename siblings via /proc/pid/tasks/tid/comm Setting a thread's comm to be something unique is a very useful ability and is helpful for debugging complicated threaded applications. However currently the only way to set a thread name is for the thread to name itself via the PR_SET_NAME prctl. However, there may be situations where it would be advantageous for a thread dispatcher to be naming the threads its managing, rather then having the threads self-describe themselves. This sort of behavior is available on other systems via the pthread_setname_np() interface. This patch exports a task's comm via proc/pid/comm and proc/pid/task/tid/comm interfaces, and allows thread siblings to write to these values. [akpm@linux-foundation.org: cleanups] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Mike Fulton <fultonm@ca.ibm.com> Cc: Sean Foley <Sean_Foley@ca.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:24 -08:00
Steven J. Magnani	7e1e0ef22c	procfs: use proper units for noMMU statm On no-MMU systems, sizes reported in /proc/n/statm have units of bytes. Per Documentation/filesystems/proc.txt, these values should be in pages. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:24 -08:00
Jie Zhang	ea63763959	nommu: fix malloc performance by adding uninitialized flag The NOMMU code currently clears all anonymous mmapped memory. While this is what we want in the default case, all memory allocation from userspace under NOMMU has to go through this interface, including malloc() which is allowed to return uninitialized memory. This can easily be a significant performance penalty. So for constrained embedded systems were security is irrelevant, allow people to avoid clearing memory unnecessarily. This also alters the ELF-FDPIC binfmt such that it obtains uninitialised memory for the brk and stack region. Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:24 -08:00
Naoya Horiguchi	5dc37642cb	mm hugetlb: add hugepage support to pagemap This patch enables extraction of the pfn of a hugepage from /proc/pid/pagemap in an architecture independent manner. Details ------- My test program (leak_pagemap) works as follows: - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,) - read()/write() something on it, - call page-types with option -p, - munmap() and unlink() the file on hugetlbfs Without my patches ------------------ $ ./leak_pagemap flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 1 0 __________________________________ 0x0000000000000804 1 0 __R________M______________________ referenced,mmap 0x000000000000086c 81 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap 0x0000000000005808 5 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 101 0 The output of page-types don't show any hugepage. With my patches --------------- $ ./leak_pagemap flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 1 0 __________________________________ 0x0000000000030000 51100 199 ________________TG________________ compound_tail,huge 0x0000000000028018 100 0 ___UD__________H_G________________ uptodate,dirty,compound_head,huge 0x0000000000000804 1 0 __R________M______________________ referenced,mmap 0x000000000000080c 1 0 __RU_______M______________________ referenced,uptodate,mmap 0x000000000000086c 80 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap 0x0000000000005808 4 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 51300 200 The output of page-types shows 51200 pages contributing to hugepages, containing 100 head pages and 51100 tail pages as expected. [akpm@linux-foundation.org: build fix] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:24 -08:00
Amerigo Wang	ec81aecb29	hfs: fix a potential buffer overflow A specially-crafted Hierarchical File System (HFS) filesystem could cause a buffer overflow to occur in a process's kernel stack during a memcpy() call within the hfs_bnode_read() function (at fs/hfs/bnode.c:24). The attacker can provide the source buffer and length, and the destination buffer is a local variable of a fixed length. This local variable (passed as "&entry" from fs/hfs/dir.c:112 and allocated on line 60) is stored in the stack frame of hfs_bnode_read()'s caller, which is hfs_readdir(). Because the hfs_readdir() function executes upon any attempt to read a directory on the filesystem, it gets called whenever a user attempts to inspect any filesystem contents. [amwang@redhat.com: modify this patch and fix coding style problems] Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Eugene Teo <eteo@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Anderson <anderson@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:10 -08:00
Christoph Hellwig	0b1b213fcf	xfs: event tracing support Convert the old xfs tracing support that could only be used with the out of tree kdb and xfsidbg patches to use the generic event tracer. To use it make sure CONFIG_EVENT_TRACING is enabled and then enable all xfs trace channels by: echo 1 > /sys/kernel/debug/tracing/events/xfs/enable or alternatively enable single events by just doing the same in one event subdirectory, e.g. echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable or set more complex filters, etc. In Documentation/trace/events.txt all this is desctribed in more detail. To reads the events do a cat /sys/kernel/debug/tracing/trace Compared to the last posting this patch converts the tracing mostly to the one tracepoint per callsite model that other users of the new tracing facility also employ. This allows a very fine-grained control of the tracing, a cleaner output of the traces and also enables the perf tool to use each tracepoint as a virtual performance counter, allowing us to e.g. count how often certain workloads git various spots in XFS. Take a look at http://lwn.net/Articles/346470/ for some examples. Also the btree tracing isn't included at all yet, as it will require additional core tracing features not in mainline yet, I plan to deliver it later. And the really nice thing about this patch is that it actually removes many lines of code while adding this nice functionality: fs/xfs/Makefile \| 8 fs/xfs/linux-2.6/xfs_acl.c \| 1 fs/xfs/linux-2.6/xfs_aops.c \| 52 - fs/xfs/linux-2.6/xfs_aops.h \| 2 fs/xfs/linux-2.6/xfs_buf.c \| 117 +-- fs/xfs/linux-2.6/xfs_buf.h \| 33 fs/xfs/linux-2.6/xfs_fs_subr.c \| 3 fs/xfs/linux-2.6/xfs_ioctl.c \| 1 fs/xfs/linux-2.6/xfs_ioctl32.c \| 1 fs/xfs/linux-2.6/xfs_iops.c \| 1 fs/xfs/linux-2.6/xfs_linux.h \| 1 fs/xfs/linux-2.6/xfs_lrw.c \| 87 -- fs/xfs/linux-2.6/xfs_lrw.h \| 45 - fs/xfs/linux-2.6/xfs_super.c \| 104 --- fs/xfs/linux-2.6/xfs_super.h \| 7 fs/xfs/linux-2.6/xfs_sync.c \| 1 fs/xfs/linux-2.6/xfs_trace.c \| 75 ++ fs/xfs/linux-2.6/xfs_trace.h \| 1369 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/linux-2.6/xfs_vnode.h \| 4 fs/xfs/quota/xfs_dquot.c \| 110 --- fs/xfs/quota/xfs_dquot.h \| 21 fs/xfs/quota/xfs_qm.c \| 40 - fs/xfs/quota/xfs_qm_syscalls.c \| 4 fs/xfs/support/ktrace.c \| 323 --------- fs/xfs/support/ktrace.h \| 85 -- fs/xfs/xfs.h \| 16 fs/xfs/xfs_ag.h \| 14 fs/xfs/xfs_alloc.c \| 230 +----- fs/xfs/xfs_alloc.h \| 27 fs/xfs/xfs_alloc_btree.c \| 1 fs/xfs/xfs_attr.c \| 107 --- fs/xfs/xfs_attr.h \| 10 fs/xfs/xfs_attr_leaf.c \| 14 fs/xfs/xfs_attr_sf.h \| 40 - fs/xfs/xfs_bmap.c \| 507 +++------------ fs/xfs/xfs_bmap.h \| 49 - fs/xfs/xfs_bmap_btree.c \| 6 fs/xfs/xfs_btree.c \| 5 fs/xfs/xfs_btree_trace.h \| 17 fs/xfs/xfs_buf_item.c \| 87 -- fs/xfs/xfs_buf_item.h \| 20 fs/xfs/xfs_da_btree.c \| 3 fs/xfs/xfs_da_btree.h \| 7 fs/xfs/xfs_dfrag.c \| 2 fs/xfs/xfs_dir2.c \| 8 fs/xfs/xfs_dir2_block.c \| 20 fs/xfs/xfs_dir2_leaf.c \| 21 fs/xfs/xfs_dir2_node.c \| 27 fs/xfs/xfs_dir2_sf.c \| 26 fs/xfs/xfs_dir2_trace.c \| 216 ------ fs/xfs/xfs_dir2_trace.h \| 72 -- fs/xfs/xfs_filestream.c \| 8 fs/xfs/xfs_fsops.c \| 2 fs/xfs/xfs_iget.c \| 111 --- fs/xfs/xfs_inode.c \| 67 -- fs/xfs/xfs_inode.h \| 76 -- fs/xfs/xfs_inode_item.c \| 5 fs/xfs/xfs_iomap.c \| 85 -- fs/xfs/xfs_iomap.h \| 8 fs/xfs/xfs_log.c \| 181 +---- fs/xfs/xfs_log_priv.h \| 20 fs/xfs/xfs_log_recover.c \| 1 fs/xfs/xfs_mount.c \| 2 fs/xfs/xfs_quota.h \| 8 fs/xfs/xfs_rename.c \| 1 fs/xfs/xfs_rtalloc.c \| 1 fs/xfs/xfs_rw.c \| 3 fs/xfs/xfs_trans.h \| 47 + fs/xfs/xfs_trans_buf.c \| 62 - fs/xfs/xfs_vnodeops.c \| 8 70 files changed, 2151 insertions(+), 2592 deletions(-) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-14 23:08:16 -06:00
Christoph Hellwig	6ef3554422	xfs: change the xfs_iext_insert / xfs_iext_remove Change the xfs_iext_insert / xfs_iext_remove prototypes to pass more information which will allow pushing the trace points from the callers into those functions. This includes folding the whichfork information into the state variable to minimize the addition stack footprint. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-14 23:08:15 -06:00
Christoph Hellwig	7574aa92f9	xfs: cleanup bmap extent state macros Cleanup the extent state macros in the bmap code to use one common set of flags that we can pass to the tracing code later and remove a lot of the macro obsfucation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-14 23:08:15 -06:00
J. Bruce Fields	12045a6ee9	nfsd: let "insecure" flag vary by pseudoflavor This was an oversight; it should be among the export flags that can be allowed to vary by pseudoflavor. This allows an administrator to (for example) allow auth_sys mounts only from low ports, but allow auth_krb5 mounts to use any port. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-14 19:08:58 -05:00
J. Bruce Fields	e8e8753f7a	nfsd: new interface to advertise export features Soon we will add the new V4ROOT flag, and allow the INSECURE flag to vary by pseudoflavor. It would be useful for nfs-utils (for example, for improved exportfs error reporting) to be able to know when this happens. Use this new interface for that purpose. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-14 18:51:29 -05:00
Boaz Harrosh	9a74af2133	nfsd: Move private headers to source directory Lots of include/linux/nfsd/* headers are only used by nfsd module. Move them to the source directory Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-14 18:12:12 -05:00
Boaz Harrosh	68590c382b	vfs: nfsctl.c un-used nfsd #includes Only linux/nfsd/syscall.h is actually used. Remove the other nfsd #includes, so they can be moved to source directory. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-14 18:12:11 -05:00
Boaz Harrosh	0296f55f98	lockd: Remove un-used nfsd headers #includes In what history where these ever needed? Well not any more. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-14 18:12:11 -05:00
Boaz Harrosh	370d560017	compat.c: Remove dependence on nfsd private headers Two nfsd related headers where included but never actually used. The linux/nfsd/nfsd.h file will eventually be moved to fs/nfsd directory as it is only needed by nfsd itself. There are 3 more compat.c files in the Kernel at other ARCHs that wrongly #include nfsd headers. Once these are fixed the headers can be moved. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-14 18:12:10 -05:00
Boaz Harrosh	341eb18446	nfsd: Source files #include cleanups Now that the headers are fixed and carry their own wait, all fs/nfsd/ source files can include a minimal set of headers. and still compile just fine. This patch should improve the compilation speed of the nfsd module. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-14 18:12:09 -05:00
J. Bruce Fields	57ecb34feb	nfsd4: fix share mode permissions NFSv4 opens may function as locks denying other NFSv4 users the rights to open a file. We're requiring a user to have write permissions before they can deny write. We're not requiring a user to have write permissions to deny read, which is if anything a more drastic denial. What was intended was to require write permissions for DENY_READ. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-12-14 18:06:54 -05:00
Linus Torvalds	3ea6b3d0e6	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: Avoid IO in udf_clear_inode udf: Try harder when looking for VAT inode udf: Fix compilation with UDFFS_DEBUG enabled	2009-12-14 12:50:25 -08:00
Jan Kara	2c948b3f86	udf: Avoid IO in udf_clear_inode It is not very good to do IO in udf_clear_inode. First, VFS does not really expect inode to become dirty there and thus we have to write it ourselves, second, memory reclaim gets blocked waiting for IO when it does not really expect it, third, the IO pattern (e.g. on umount) resulting from writes in udf_clear_inode is bad and it slows down writing a lot. The reason why UDF needed to do IO in udf_clear_inode is that UDF standard mandates extent length to exactly match inode size. But when we allocate extents to a file or directory, we don't really know what exactly the final file size will be and thus temporarily set it to block boundary and later truncate it to exact length in udf_clear_inode. Now, this is changed to truncate to final file size in udf_release_file for regular files. For directories and symlinks, we do the truncation at the moment when learn what the final file size will be. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-14 21:40:04 +01:00
Jan Kara	e971b0b9e0	udf: Try harder when looking for VAT inode Some disks do not contain VAT inode in the last recorded block as required by the standard but a few blocks earlier (or the number of recorded blocks is wrong). So look for the VAT inode a bit before the end of the media. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-14 21:40:04 +01:00
Jan Kara	1fefd086df	udf: Fix compilation with UDFFS_DEBUG enabled Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-14 21:40:04 +01:00
Linus Torvalds	37222e1c9e	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: (27 commits) md: add 'recovery_start' per-device sysfs attribute md: rcu_read_lock() walk of mddev->disks in md_do_sync() md: integrate spares into array at earliest opportunity. md: move compat_ioctl handling into md.c md: revise Kconfig help for MD_MULTIPATH md: add MODULE_DESCRIPTION for all md related modules. raid: improve MD/raid10 handling of correctable read errors. md/raid10: print more useful messages on device failure. md/bitmap: update dirty flag when bitmap bits are explicitly set. md: Support write-intent bitmaps with externally managed metadata. md/bitmap: move setting of daemon_lastrun out of bitmap_read_sb md: support updating bitmap parameters via sysfs. md: factor out parsing of fixed-point numbers md: support bitmap offset appropriate for external-metadata arrays. md: remove needless setting of thread->timeout in raid10_quiesce md: change daemon_sleep to be in 'jiffies' rather than 'seconds'. md: move offset, daemon_sleep and chunksize out of bitmap structure md: collect bitmap-specific fields into one structure. md/raid1: add takeover support for raid5->raid1 md: add honouring of suspend_{lo,hi} to raid1. ...	2009-12-14 10:03:36 -08:00
Linus Torvalds	3f86ce72cf	Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (75 commits) NFS: Fix nfs_migrate_page() rpc: remove unneeded function parameter in gss_add_msg() nfs41: Invoke RECLAIM_COMPLETE on all new client ids SUNRPC: IS_ERR/PTR_ERR confusion NFSv41: Fix a potential state leakage when restarting nfs4_close_prepare nfs41: Handle NFSv4.1 session errors in the delegation recall code nfs41: Retry delegation return if it failed with session error nfs41: Handle session errors during delegation return nfs41: Mark stateids in need of reclaim if state manager gets stale clientid NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configured nfs41: Don't clear DRAINING flag on NFS4ERR_STALE_CLIENTID nfs41: nfs41_setup_state_renewal NFSv41: More cleanups NFSv41: Fix up some bugs in the NFS4CLNT_SESSION_DRAINING code NFSv41: Clean up slot table management NFSv41: Fix nfs4_proc_create_session nfs41: Invoke RECLAIM_COMPLETE nfs41: RECLAIM_COMPLETE functionality nfs41: RECLAIM_COMPLETE XDR functionality Cleanup some NFSv4 XDR decode comments ...	2009-12-14 10:00:24 -08:00
Linus Torvalds	d0316554d3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c	2009-12-14 09:58:24 -08:00
Surbhi Palande	034fb4c95f	ext4: replace BUG() with return -EIO in ext4_ext_get_blocks This patch fixes the Kernel BZ #14286. When the address of an extent corresponding to a valid block is corrupted, a -EIO should be reported instead of a BUG(). This situation should not normally not occur except in the case of a corrupted filesystem. If however it does, then the system should not panic directly but depending on the mount time options appropriate action should be taken. If the mount options so permit, the I/O should be gracefully aborted by returning a -EIO. http://bugzilla.kernel.org/show_bug.cgi?id=14286 Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-14 09:53:52 -05:00
Theodore Ts'o	51b7e3c9fb	ext4: add module aliases for ext2 and ext3 Add module aliases for ext2 and ext3 when CONFIG_EXT4_USE_FOR_EXT23 is set. This makes the existing user-space stuff like mkinitrd working as is. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-21 10:56:09 -05:00
David Howells	84c6647303	ext4: Don't ask about supporting ext2/3 in ext4 if ext4 is not configured Don't offer to build ext2/3 support into ext4 if ext4 itself is not configured on. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-21 10:54:09 -05:00
Huang Weiyi	149feb00d7	ext4: remove unused #include <linux/version.h> Remove unused #include <linux/version.h>('s) in fs/ext4/block_validity.c fs/ext4/mballoc.h Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-14 09:24:20 -05:00
Frederic Weisbecker	cb1c2e51c5	reiserfs: Fix reiserfs lock and journal lock inversion dependency When we were using the bkl, we didn't care about dependencies against other locks, but the mutex conversion created new ones, which is why we have reiserfs_mutex_lock_safe(), which unlocks the reiserfs lock before acquiring another mutex. But this trick actually fails if we have acquired the reiserfs lock recursively, as we try to unlock it to acquire the new mutex without inverted dependency, but we eventually only decrease its depth. This happens in the case of a nested inode creation/deletion. Say we have no space left on the device, we create an inode and tak the lock but fail to create its entry, then we release the inode using iput(), which calls reiserfs_delete_inode() that takes the reiserfs lock recursively. The path eventually ends up in journal_begin() where we try to take the journal safely but we fail because of the reiserfs lock recursion: [ INFO: possible circular locking dependency detected ] 2.6.32-06486-g053fe57 #2 ------------------------------------------------------- vi/23454 is trying to acquire lock: (&journal->j_mutex){+.+...}, at: [<c110dac4>] do_journal_begin_r+0x64/0x2f0 but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c104f8f3>] validate_chain+0xa23/0xf70 [<c1050325>] __lock_acquire+0x4e5/0xa70 [<c105092a>] lock_acquire+0x7a/0xa0 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0 [<c11106a8>] reiserfs_write_lock+0x28/0x40 [<c110dacb>] do_journal_begin_r+0x6b/0x2f0 [<c110ddcf>] journal_begin+0x7f/0x120 [<c10f76c2>] reiserfs_remount+0x212/0x4d0 [<c1093997>] do_remount_sb+0x67/0x140 [<c10a9ca6>] do_mount+0x436/0x6b0 [<c10a9f86>] sys_mount+0x66/0xa0 [<c1002c50>] sysenter_do_call+0x12/0x36 -> #0 (&journal->j_mutex){+.+...}: [<c104fe38>] validate_chain+0xf68/0xf70 [<c1050325>] __lock_acquire+0x4e5/0xa70 [<c105092a>] lock_acquire+0x7a/0xa0 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0 [<c110dac4>] do_journal_begin_r+0x64/0x2f0 [<c110ddcf>] journal_begin+0x7f/0x120 [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140 [<c10a55fc>] generic_delete_inode+0x9c/0x150 [<c10a56ed>] generic_drop_inode+0x3d/0x60 [<c10a4607>] iput+0x47/0x50 [<c10e915c>] reiserfs_create+0x16c/0x1c0 [<c109a9c1>] vfs_create+0xc1/0x130 [<c109dbec>] do_filp_open+0x81c/0x920 [<c109004f>] do_sys_open+0x4f/0x110 [<c1090179>] sys_open+0x29/0x40 [<c1002c50>] sysenter_do_call+0x12/0x36 other info that might help us debug this: 2 locks held by vi/23454: #0: (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [<c109d64e>] do_filp_open+0x27e/0x920 #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40 stack backtrace: Pid: 23454, comm: vi Not tainted 2.6.32-06486-g053fe57 #2 Call Trace: [<c134b202>] ? printk+0x18/0x1e [<c104e960>] print_circular_bug+0xc0/0xd0 [<c104fe38>] validate_chain+0xf68/0xf70 [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10 [<c1050325>] __lock_acquire+0x4e5/0xa70 [<c105092a>] lock_acquire+0x7a/0xa0 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0 [<c110ff80>] ? delete_one_xattr+0x0/0x1c0 [<c110dac4>] do_journal_begin_r+0x64/0x2f0 [<c110ddcf>] journal_begin+0x7f/0x120 [<c11105b5>] ? reiserfs_delete_xattrs+0x15/0x50 [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140 [<c10a55bf>] ? generic_delete_inode+0x5f/0x150 [<c10ef490>] ? reiserfs_delete_inode+0x0/0x140 [<c10a55fc>] generic_delete_inode+0x9c/0x150 [<c10a56ed>] generic_drop_inode+0x3d/0x60 [<c10a4607>] iput+0x47/0x50 [<c10e915c>] reiserfs_create+0x16c/0x1c0 [<c1099a5d>] ? inode_permission+0x7d/0xa0 [<c109a9c1>] vfs_create+0xc1/0x130 [<c10e8ff0>] ? reiserfs_create+0x0/0x1c0 [<c109dbec>] do_filp_open+0x81c/0x920 [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10 [<c134dc0d>] ? _spin_unlock+0x1d/0x20 [<c10a6eea>] ? alloc_fd+0xba/0xf0 [<c109004f>] do_sys_open+0x4f/0x110 [<c1090179>] sys_open+0x29/0x40 [<c1002c50>] sysenter_do_call+0x12/0x36 To fix this, use reiserfs_lock_once() from reiserfs_delete_inode() which prevents from adding reiserfs lock recursion. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>	2009-12-14 11:47:11 +01:00
Frederic Weisbecker	500f5a0bf5	reiserfs: Fix possible recursive lock While allocating the bitmap using vmalloc, we hold the reiserfs lock, which makes lockdep later reporting a possible deadlock as we may swap out pages to allocate memory and then take the reiserfs lock recursively: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/312 [HC0[0]:SC0[0]:HE1:SE1] takes: (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c11108a8>] reiserfs_write_lock+0x28/0x40 {RECLAIM_FS-ON-W} state was registered at: [<c104e1c2>] mark_held_locks+0x62/0x90 [<c104e28a>] lockdep_trace_alloc+0x9a/0xc0 [<c108e396>] kmem_cache_alloc+0x26/0xf0 [<c10850ec>] __get_vm_area_node+0x6c/0xf0 [<c10857de>] __vmalloc_node+0x7e/0xa0 [<c108597b>] vmalloc+0x2b/0x30 [<c10e00b9>] reiserfs_init_bitmap_cache+0x39/0x70 [<c10f8178>] reiserfs_fill_super+0x2e8/0xb90 [<c1094345>] get_sb_bdev+0x145/0x180 [<c10f5a11>] get_super_block+0x21/0x30 [<c10931f0>] vfs_kern_mount+0x40/0xd0 [<c10932d9>] do_kern_mount+0x39/0xd0 [<c10a9857>] do_mount+0x2c7/0x6b0 [<c10a9ca6>] sys_mount+0x66/0xa0 [<c161589b>] mount_block_root+0xc4/0x245 [<c1615a75>] mount_root+0x59/0x5f [<c1615b8c>] prepare_namespace+0x111/0x14b [<c1615269>] kernel_init+0xcf/0xdb [<c10031fb>] kernel_thread_helper+0x7/0x1c This is actually fine for two reasons: we call vmalloc at mount time then it's not in the swapping out path. Also the reiserfs lock can be acquired recursively, but since its implementation depends on a mutex, it's hard and not necessary worth it to teach that to lockdep. The lock is useless at mount time anyway, at least until we replay the journal. But let's remove it from this path later as this needs more thinking and is a sensible change. For now we can just relax the lock around vmalloc, Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>	2009-12-14 11:43:09 +01:00
Arnd Bergmann	aa98aa3198	md: move compat_ioctl handling into md.c The RAID ioctls are only implemented in md.c, so the handling for them should also be moved there from fs/compat_ioctl.c. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Neil Brown <neilb@suse.de> Cc: Andre Noll <maan@systemlinux.org> Cc: linux-raid@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>	2009-12-14 12:51:41 +11:00
Trond Myklebust	52c9948b1f	Merge branch 'nfs-for-2.6.33'	2009-12-13 13:56:27 -05:00
Linus Torvalds	09cea96caa	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits) powerpc: Fix usage of 64-bit instruction in 32-bit altivec code MAINTAINERS: Add PowerPC patterns powerpc/pseries: Track previous CPPR values to correctly EOI interrupts powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP powerpc: Make "intspec" pointers in irq_host->xlate() const powerpc/8xx: DTLB Miss cleanup powerpc/8xx: Remove DIRTY pte handling in DTLB Error. powerpc/8xx: Start using dcbX instructions in various copy routines powerpc/8xx: Restore _PAGE_WRITETHRU powerpc/8xx: Add missing Guarded setting in DTLB Error. powerpc/8xx: Fixup DAR from buggy dcbX instructions. powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions. powerpc/8xx: Update TLB asm so it behaves as linux mm expects. powerpc/8xx: Invalidate non present TLBs powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate pseries/pseries: Add code to online/offline CPUs of a DLPAR node powerpc: stop_this_cpu: remove the cpu from the online map. powerpc/pseries: Add kernel based CPU DLPAR handling sysfs/cpu: Add probe/release files powerpc/pseries: Kernel DLPAR Infrastructure ...	2009-12-12 14:27:24 -08:00
Linus Torvalds	92340ee319	Merge branch 'compat-ioctl-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground * 'compat-ioctl-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: usbdevfs: move compat_ioctl handling to devio.c lp: move compat_ioctl handling into lp.c compat_ioctl: pass compat pointer directly to handlers compat_ioctl: simplify lookup table compat_ioctl: simplify calling of handlers compat_ioctl: inline all conversion handlers compat_ioctl: Remove BKL compat_ioctl: remove all VT ioctl handling	2009-12-11 20:57:46 -08:00
Linus Torvalds	0f4974c439	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (58 commits) tty: split the lock up a bit further tty: Move the leader test in disassociate tty: Push the bkl down a bit in the hangup code tty: Push the lock down further into the ldisc code tty: push the BKL down into the handlers a bit tty: moxa: split open lock tty: moxa: Kill the use of lock_kernel tty: moxa: Fix modem op locking tty: moxa: Kill off the throttle method tty: moxa: Locking clean up tty: moxa: rework the locking a bit tty: moxa: Use more tty_port ops tty: isicom: fix deadlock on shutdown tty: mxser: Use the new locking rules to fix setserial properly tty: mxser: use the tty_port_open method tty: isicom: sort out the board init logic tty: isicom: switch to the new tty_port_open helper tty: tty_port: Add a kref object to the tty port tty: istallion: tty port open/close methods tty: stallion: Convert to the tty_port_open/close methods ...	2009-12-11 15:34:40 -08:00
Linus Torvalds	3126c136bc	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits) ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks() ext3: Fix data / filesystem corruption when write fails to copy data ext4: Support for 64-bit quota format ext3: Support for vfsv1 quota format quota: Implement quota format with 64-bit space and inode limits quota: Move definition of QFMT_OCFS2 to linux/quota.h ext2: fix comment in ext2_find_entry about return values ext3: Unify log messages in ext3 ext2: clear uptodate flag on super block I/O error ext2: Unify log messages in ext2 ext3: make "norecovery" an alias for "noload" ext3: Don't update the superblock in ext3_statfs() ext3: journal all modifications in ext3_xattr_set_handle ext2: Explicitly assign values to on-disk enum of filetypes quota: Fix WARN_ON in lookup_one_len const: struct quota_format_ops ubifs: remove manual O_SYNC handling afs: remove manual O_SYNC handling kill wait_on_page_writeback_range vfs: Implement proper O_SYNC semantics ...	2009-12-11 15:31:13 -08:00
Linus Torvalds	f4d544ee57	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: Fix error return for fallocate() on XFS xfs: cleanup dmapi macros in the umount path xfs: remove incorrect sparse annotation for xfs_iget_cache_miss xfs: kill the STATIC_INLINE macro xfs: uninline xfs_get_extsz_hint xfs: rename xfs_attr_fetch to xfs_attr_get_int xfs: simplify xfs_buf_get / xfs_buf_read interfaces xfs: remove IO_ISAIO xfs: Wrapped journal record corruption on read at recovery xfs: cleanup data end I/O handlers xfs: use WRITE_SYNC_PLUG for synchronous writeout xfs: reset the i_iolock lock class in the reclaim path xfs: I/O completion handlers must use NOFS allocations xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocks xfs: simplify inode teardown	2009-12-11 15:30:29 -08:00
Linus Torvalds	f58df54a54	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (27 commits) Driver core: fix race in dev_driver_string Driver Core: Early platform driver buffer sysfs: sysfs_setattr remove unnecessary permission check. sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir sysfs: Propagate renames to the vfs on demand sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish sysfs: In sysfs_chmod_file lazily propagate the mode change. sysfs: Implement sysfs_getattr & sysfs_permission sysfs: Nicely indent sysfs_symlink_inode_operations sysfs: Update s_iattr on link and unlink. sysfs: Fix locking and factor out sysfs_sd_setattr sysfs: Simplify iattr time assignments sysfs: Simplify sysfs_chmod_file semantics sysfs: Use dentry_ops instead of directly playing with the dcache sysfs: Rename sysfs_d_iput to sysfs_dentry_iput sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex debugfs: fix create mutex racy fops and private data Driver core: Don't remove kobjects in device_shutdown. firmware_class: make request_firmware_nowait more useful Driver-Core: devtmpfs - set root directory mode to 0755 ...	2009-12-11 15:24:56 -08:00
Sukadev Bhattiprolu	edfacdd6f8	devpts_get_tty() should validate inode devpts_get_tty() assumes that the inode passed in is associated with a valid pty. But if the only reference to the pty is via a bind-mount, the inode passed to devpts_get_tty() while valid, would refer to a pty that no longer exists. With a lot of debug effort, Grzegorz Nosek developed a small program (see below) to reproduce a crash on recent kernels. This crash is a regression introduced by the commit: commit `527b3e4773` Author: Sukadev Bhattiprolu <sukadev@us.ibm.com> Date: Mon Oct 13 10:43:08 2008 +0100 To fix, ensure that the dentry associated with the inode has not yet been deleted/unhashed by devpts_pty_kill(). See also: https://lists.linux-foundation.org/pipermail/containers/2009-July/019273.html tty-bug.c: #define _GNU_SOURCE #include <fcntl.h> #include <sched.h> #include <stdlib.h> #include <sys/mount.h> #include <sys/signal.h> #include <unistd.h> #include <stdio.h> #include <linux/fs.h> void dummy(int sig) { } static int child(void unused) { int fd; signal(SIGINT, dummy); signal(SIGHUP, dummy); pause(); / cheesy synchronisation to wait for /dev/pts/0 to appear / mount("/dev/pts/0", "/dev/console", NULL, MS_BIND, NULL); sleep(2); fd = open("/dev/console", O_RDWR); dup(0); dup(0); write(1, "Hello world!\n", sizeof("Hello world!\n")-1); return 0; } int main(void) { pid_t pid; char stack; stack = malloc(16384); pid = clone(child, stack+16384, CLONE_NEWNS\|SIGCHLD, NULL); open("/dev/ptmx", O_RDWR\|O_NOCTTY\|O_NONBLOCK); unlockpt(fd); grantpt(fd); sleep(2); kill(pid, SIGHUP); sleep(1); return 0; /* exit before child opens /dev/console */ } Reported-by: Grzegorz Nosek <root@localdomain.pl> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Tested-by: Serge Hallyn <serue@us.ibm.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 15:18:05 -08:00
Jason Gunthorpe	44a743f687	xfs: Fix error return for fallocate() on XFS Noticed that through glibc fallocate would return 28 rather than -1 and errno = 28 for ENOSPC. The xfs routines uses XFS_ERROR format positive return error codes while the syscalls use negative return codes. Fixup the two cases in xfs_vn_fallocate syscall to convert to negative. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:23 -06:00
Christoph Hellwig	30ac0683dd	xfs: cleanup dmapi macros in the umount path Stop the flag saving as we never mangle those in the unmount path, and hide all the weird arguents to the dmapi code inside the XFS_SEND_PREUNMOUNT / XFS_SEND_UNMOUNT macros. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:23 -06:00
Christoph Hellwig	0c3dc2b02a	xfs: remove incorrect sparse annotation for xfs_iget_cache_miss xfs_iget_cache_miss does not get called with the pag_ici_lock held, so the __releases annotation is incorrect. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:23 -06:00
Christoph Hellwig	b8f82a4a6f	xfs: kill the STATIC_INLINE macro Remove our own STATIC_INLINE macro. For small function inside implementation files just use STATIC and let gcc inline it, and for those in headers do the normal static inline - they are all small enough to be inlined for debug builds, too. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:22 -06:00
Christoph Hellwig	5683f53e36	xfs: uninline xfs_get_extsz_hint This function is too large to efficiently be inlined. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:22 -06:00
Christoph Hellwig	e82fa0c7ca	xfs: rename xfs_attr_fetch to xfs_attr_get_int Using a totally different name for the low-level get operation does not fit the _int convention used in the rest of the attr code, so rename it. While we're at it also fix the prototype to use the normal convention and mark it static as it's never used outside of xfs_attr.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:22 -06:00
Christoph Hellwig	6ad112bfb5	xfs: simplify xfs_buf_get / xfs_buf_read interfaces Currently the low-level buffer cache interfaces are highly confusing as we have a _flags variant of each that does actually respect the flags, and one without _flags which has a flags argument that gets ignored and overriden with a default set. Given that very few places use the default arguments get rid of the duplication and convert all callers to pass the flags explicitly. Also remove the now confusing _flags postfix. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:21 -06:00
Christoph Hellwig	c355c656fe	xfs: remove IO_ISAIO We set the IO_ISAIO flag for all read/write I/O since early Linux 2.6.x. Remove it as it has lost it's purpose long ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:21 -06:00
Andy Poling	fc5bc4c85c	xfs: Wrapped journal record corruption on read at recovery Summary of problem: If a journal record wraps at the physical end of the journal, it has to be read in two parts in xlog_do_recovery_pass(): a read at the physical end and a read at the physical beginning. If xlog_bread() has to re-align the first read, the second read request does not take that re-alignment into account. If the first read was re-aligned, the second read over-writes the end of the data from the first read, effectively corrupting it. This can happen either when reading the record header or reading the record data. The first sanity check in xlog_recover_process_data() is to check for a valid clientid, so that is the error reported. Summary of fix: If there was a first read at the physical end, XFS_BUF_PTR() returns where the data was requested to begin. Conversely, because it is the result of xlog_align(), offset indicates where the requested data for the first read actually begins - whether or not xlog_bread() has re-aligned it. Using offset as the base for the calculation of where to place the second read data ensures that it will be correctly placed immediately following the data from the first read instead of sometimes over-writing the end of it. The attached patch has resolved the reported problem of occasional inability to recover the journal (reporting "bad clientid"). Signed-off-by: Andy Poling <andy@realbig.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:21 -06:00
Christoph Hellwig	5ec4fabb02	xfs: cleanup data end I/O handlers Currently we have different end I/O handlers for read vs the different types of write I/O. But they are all very similar so we could just use one with a few conditionals and reduce code size a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:20 -06:00
Christoph Hellwig	06342cf8ad	xfs: use WRITE_SYNC_PLUG for synchronous writeout The VM and I/O schedulers now expect us to use WRITE_SYNC_PLUG for synchronous writeout. Right now I can't see any changes in performance numbers with this, but we're getting some beating for not using it, and the knowledge definitely could help the block code to make better decisions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:20 -06:00
Christoph Hellwig	033da48fda	xfs: reset the i_iolock lock class in the reclaim path The iolock is used for protecting reads, writes and block truncates against each other. We have two classes of callers, the first one is induced by a file operation and requires a reference to the inode be held and not dropped after the operation is done: - xfs_vm_vmap, xfs_vn_fallocate, xfs_read, xfs_write, xfs_splice_read, xfs_splice_write and xfs_setattr are all implementations of VFS methods that require a live inode - xfs_getbmap and xfs_swap_extents are ioctl subcommand for which the same is true - xfs_truncate_file is only called on quota inodes just returned from xfs_iget - xfs_sync_inode_data does the lock just after an igrab() - xfs_filestream_associate and xfs_filestream_new_ag take the iolock on the parent inode of an inode which by VFS rules must be referenced And we have various calls to truncate blocks past EOF or the whole file when dropping the last reference to an inode. Unfortunately lockdep complains when we do memory allocations that can recurse into the filesystem in the first class because the second class happens to take the same lock. To avoid this re-init the iolock in the beginning of xfs_fs_clear_inode to get a new lock class. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:20 -06:00
Christoph Hellwig	80641dc66a	xfs: I/O completion handlers must use NOFS allocations When completing I/O requests we must not allow the memory allocator to recurse into the filesystem, as we might deadlock on waiting for the I/O completion otherwise. The only thing currently allocating normal GFP_KERNEL memory is the allocation of the transaction structure for the unwritten extent conversion. Add a memflags argument to _xfs_trans_alloc to allow controlling the allocator behaviour. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Thomas Neumann <tneumann@users.sourceforge.net> Tested-by: Thomas Neumann <tneumann@users.sourceforge.net> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:20 -06:00
Christoph Hellwig	c56c9631cb	xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocks When xfs_free_eofblocks is called from ->release the VM might already hold the mmap_sem, but in the write path we take the iolock before taking the mmap_sem in the generic write code. Switch xfs_free_eofblocks to only trylock the iolock if called from ->release and skip trimming the prellocated blocks in that case. We'll still free them later on the final iput. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:19 -06:00
Christoph Hellwig	848ce8f731	xfs: simplify inode teardown Currently the reclaim code for the case where we don't reclaim the final reclaim is overly complicated. We know that the inode is clean but instead of just directly reclaiming the clean inode we go through the whole process of marking the inode reclaimable just to directly reclaim it from the calling context. Besides being overly complicated this introduces a race where iget could recycle an inode between marked reclaimable and actually being reclaimed leading to panics. This patch gets rid of the existing reclaim path, and replaces it with a simple call to xfs_ireclaim if the inode was clean. While we're at it we also use the slightly more lax xfs_inode_clean check we'd use later to determine if we need to flush the inode here. Finally get rid of xfs_reclaim function and place the remaining small bits of reclaim code directly into xfs_fs_destroy_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Patrick Schreurs <patrick@news-service.com> Reported-by: Tommy van Leeuwen <tommy@news-service.com> Tested-by: Patrick Schreurs <patrick@news-service.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:19 -06:00
Linus Torvalds	4e2ccdb040	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (49 commits) nilfs2: separate wait function from nilfs_segctor_write nilfs2: add iterator for segment buffers nilfs2: hide nilfs_write_info struct in segment buffer code nilfs2: relocate io status variables to segment buffer nilfs2: do not return io error for bio allocation failure nilfs2: use list_splice_tail or list_splice_tail_init nilfs2: replace mark_inode_dirty as nilfs_mark_inode_dirty nilfs2: delete mark_inode_dirty in nilfs_delete_entry nilfs2: delete mark_inode_dirty in nilfs_commit_chunk nilfs2: change return type of nilfs_commit_chunk nilfs2: split nilfs_unlink as nilfs_do_unlink and nilfs_unlink nilfs2: delete redundant mark_inode_dirty nilfs2: expand inode_inc_link_count and inode_dec_link_count nilfs2: delete mark_inode_dirty from nilfs_set_link nilfs2: delete mark_inode_dirty in nilfs_new_inode nilfs2: add norecovery mount option nilfs2: add helper to get if volume is in a valid state nilfs2: move recovery completion into load_nilfs function nilfs2: apply readahead for recovery on mount nilfs2: clean up get/put function of a segment usage ...	2009-12-11 11:49:18 -08:00
Eric W. Biederman	e16acb503b	sysfs: sysfs_setattr remove unnecessary permission check. inode_change_ok already clears the SGID bit when necessary so there is no reason for sysfs_setattr to carry code to do the same, and it is good to kill the extra copy because when I moved the code last in certain corner cases the code will look at the wrong gid. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	ca1bab3819	sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir These two functions do 90% of the same work and it doesn't significantly obfuscate the function to allow both the parent dir and the name to change at the same time. So merge them together to simplify maintenance, and increase testing. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	832b6af198	sysfs: Propagate renames to the vfs on demand By teaching sysfs_revalidate to hide a dentry for a sysfs_dirent if the sysfs_dirent has been renamed, and by teaching sysfs_lookup to return the original dentry if the sysfs dirent has been renamed. I can show the results of renames correctly without having to update the dcache during the directory rename. This massively simplifies the rename logic allowing a lot of weird sysfs special cases to be removed along with a lot of now unnecesary helper code. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	a16bbc3430	sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish With lazy inode updates and dentry operations bringing everything into sync on demand there is no longer any need to immediately update the vfs or grab i_mutex to protect those updates as we make changes to sysfs. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	06fc0d66f7	sysfs: In sysfs_chmod_file lazily propagate the mode change. Now that sysfs_getattr and sysfs_permission refresh the vfs inode there is no need to immediatly push the mode change into the vfs cache. Reducing the amount of work needed and simplifying the locking. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	e61ab4ae48	sysfs: Implement sysfs_getattr & sysfs_permission With the implementation of sysfs_getattr and sysfs_permission sysfs becomes able to lazily propogate inode attribute changes from the sysfs_dirents to the vfs inodes. This paves the way for deleting significant chunks of now unnecessary code. While doing this we did not reference sysfs_setattr from sysfs_symlink_inode_operations so I added along with sysfs_getattr and sysfs_permission. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	c099aacd48	sysfs: Nicely indent sysfs_symlink_inode_operations Lining up the functions in sysfs_symlink_inode_operations follows the pattern in the rest of sysfs and makes things slightly more readable. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	6b0bfe9383	sysfs: Update s_iattr on link and unlink. Currently sysfs updates the timestamps on the vfs directory inode when we create or remove a directory entry but doesn't update the cached copy on the sysfs_dirent, fix that oversight. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	35df63c46c	sysfs: Fix locking and factor out sysfs_sd_setattr Cleanly separate the work that is specific to setting the attributes of a sysfs_dirent from what is needed to update the attributes of a vfs inode. Additionally grab the sysfs_mutex to keep any nasties from surprising us when updating the sysfs_dirent. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	4be3df28be	sysfs: Simplify iattr time assignments The granularity of sysfs time when we keep it is 1 ns. Which when passed to timestamp_trunc results in a nop. So remove the unnecessary function call making sysfs_setattr slightly easier to read. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	4c6974f51a	sysfs: Simplify sysfs_chmod_file semantics Currently every caller of sysfs_chmod_file happens at either file creation time to set a non-default mode or in response to a specific user requested space change in policy. Making timestamps of when the chmod happens and notification of a file changing mode uninteresting. Remove the unnecessary time stamp and filesystem change notification, and removes the last of the explicit inotify and donitfy support from sysfs. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	e8f077c883	sysfs: Use dentry_ops instead of directly playing with the dcache Calling d_drop unconditionally when a sysfs_dirent is deleted has the potential to leak mounts, so instead implement dentry delete and revalidate operations that cause sysfs dentries to be removed at the appropriate time. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	28a027cfc0	sysfs: Rename sysfs_d_iput to sysfs_dentry_iput Using dentry instead of d in the function name is what several other filesystems are doing and it seems to be a more readable convention. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	f44d3e7857	sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex The sysfs_mutex is required to ensure updates are and will remain atomic with respect to other inode iattr updates, that do not happen through the filesystem. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Mathieu Desnoyers	d3a3b0adad	debugfs: fix create mutex racy fops and private data Setting fops and private data outside of the mutex at debugfs file creation introduces a race where the files can be opened with the wrong file operations and private data. It is easy to trigger with a process waiting on file creation notification. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Stefan Richter	f38506c49d	sysfs: mark a locally-only used function static Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: David P. Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:51 -08:00
Arnd Bergmann	637e8a60a7	usbdevfs: move compat_ioctl handling to devio.c Half the compat_ioctl handling is in devio.c, the other half is in fs/compat_ioctl.c. This moves everything into one place for consistency. As a positive side-effect, push down the BKL into the ioctl methods. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Oliver Neukum <oliver@neukum.org> Cc: Alon Bar-Lev <alon.barlev@gmail.com> Cc: David Vrabel <david.vrabel@csr.com> Cc: linux-usb@vger.kernel.org	2009-12-10 22:55:37 +01:00
Arnd Bergmann	3695669cd4	lp: move compat_ioctl handling into lp.c Handling for LPSETTIMEOUT can easily be done in lp_ioctl, which is the only user. As a positive side-effect, push the BKL into the ioctl methods. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-10 22:55:36 +01:00
Arnd Bergmann	43c6e7b97f	compat_ioctl: pass compat pointer directly to handlers Instead of having each handler call compat_ptr, we can now convert the pointer once and pass that to each handler. This saves a little bit of both source and object code size. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2009-12-10 22:52:12 +01:00
Arnd Bergmann	661f627da9	compat_ioctl: simplify lookup table The compat_ioctl table now only contains entries for COMPATIBLE_IOCTL, so we only need to know if a number is listed in it or now. As an optimization, we hash the table entries with a reversible transformation to get a more uniform distribution over it, sort the table at startup and then guess the position in the table when an ioctl number gets called to do a linear search from there. With the current set of ioctl numbers and the chosen transformation function, we need an average of four steps to find if a number is in the set, all of the accesses within one or two cache lines. This at least as good as the previous hash table approach but saves 8.5 kb of kernel memory. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2009-12-10 22:52:11 +01:00
Arnd Bergmann	789f0f8911	compat_ioctl: simplify calling of handlers The compat_ioctl array now contains only entries for ioctl numbers that do not require a separate handler. By special-casing the ULONG_IOCTL case in the do_ioctl_trans function, we can kill the final use of a function pointer in the array. text data bss dec hex filename 7539 13352 2080 22971 59bb before/fs/compat_ioctl.o 7910 8552 2080 18542 486e after/fs/compat_ioctl.o Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2009-12-10 22:52:10 +01:00
Arnd Bergmann	5a07ea0b97	compat_ioctl: inline all conversion handlers This makes all ioctl conversion handlers called from a single switch statement, leaving only COMPATIBLE_IOCTL and ULONG_IOCTL statements in the table. This is somewhat more space efficient and also lets us simplify the handling of the lookup table significantly. before: text data bss dec hex filename 7619 14024 2080 23723 5cab obj/fs/compat_ioctl.o after: 7567 13352 2080 22999 59d7 obj/fs/compat_ioctl.o Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2009-12-10 22:52:09 +01:00
Arnd Bergmann	348c4b9078	compat_ioctl: Remove BKL We have always called ioctl conversion handlers under the big kernel lock, although that is generally not necessary. In particular it is not needed for conversion of data structures and for calling sys_ioctl or do_vfs_ioctl, which will get the BKL again if needed. Handlers doing more than those two have been moved out, so we can kill off the BKL from compat_sys_ioctl. This may significantly improve latencies with 32 bit applications, and it avoids a common scenario where a thread acquires the BKL twice. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2009-12-10 22:52:08 +01:00
Arnd Bergmann	fb07a5f857	compat_ioctl: remove all VT ioctl handling The VT driver now handles all of these ioctls directly, so we can remove the handlers from common code. These are the only handlers that require the BKL because they directly perform the ioctl action rather than just converting the data structures. Once they are gone, we can remove the BKL from the remaining ioctl conversion handlers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2009-12-10 22:52:08 +01:00
Linus Torvalds	02412f49f6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: always use GFP_NOFS	2009-12-10 09:33:59 -08:00
Linus Torvalds	4515c3069d	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits) ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) ext4: Do not override ext2 or ext3 if built they are built as modules jbd2: Export jbd2_log_start_commit to fix ext4 build ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT ext4: Wait for proper transaction commit on fsync ext4: fix incorrect block reservation on quota transfer. ext4: quota macros cleanup ext4: ext4_get_reserved_space() must return bytes instead of blocks ext4: remove blocks from inode prealloc list on failure ext4: wait for log to commit when umounting ext4: Avoid data / filesystem corruption when write fails to copy data ext4: Use ext4 file system driver for ext2/ext3 file system mounts ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() ext4: remove unused parameter wbc from __ext4_journalled_writepage() ext4: remove encountered_congestion trace ext4: move_extent_per_page() cleanup ext4: initialize moved_len before calling ext4_move_extents() ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT ext4: use ext4_data_block_valid() in ext4_free_blocks() ...	2009-12-10 09:33:29 -08:00
Linus Torvalds	a5eba3f66f	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd * 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Multi-device mirror support exofs: Move all operations to an io_engine exofs: move osd.c to ios.c exofs: statfs blocks is sectors not FS blocks exofs: Prints on mount and unmout exofs: refactor exofs_i_info initialization into common helper exofs: dbg-print less exofs: More sane debug print trivial: some small fixes in exofs documentation	2009-12-10 09:32:24 -08:00
Linus Torvalds	fc1495bf99	Merge git://git.infradead.org/ubifs-2.6 * git://git.infradead.org/ubifs-2.6: UBIFS: fix return code in check_leaf UBI: flush wl before clearing update marker MAINTAINERS: change e-mail of Artem Bityutskiy UBIFS: remove manual O_SYNC handling UBIFS: support mounting of UBI volume character devices UBI: Add ubi_open_volume_path	2009-12-10 09:31:45 -08:00
Trond Myklebust	190f38e5ce	NFS: Fix nfs_migrate_page() The call to migrate_page() will cause the page->private field to be cleared. Also fix up the locking around the page->private transfer, so that we ensure that calls to nfs_page_find_request() don't end up racing. Finally, fix up a double free bug: nfs_unlock_request() already calls nfs_release_request() for us... Reported-by: Wu Fengguang <fengguang.wu@intel.com> Tested-by: Andi Kleen <andi@firstfloor.org> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-10 09:05:55 -05:00
Roel Kluin	8e0eb4011b	ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks() Return the PTR_ERR of the correct pointer. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:55 +01:00
Jan Kara	68eb3db083	ext3: Fix data / filesystem corruption when write fails to copy data When ext3_write_begin fails after allocating some blocks or generic_perform_write fails to copy data to write, we truncate blocks already instantiated beyond i_size. Although these blocks were never inside i_size, we have to truncate pagecache of these blocks so that corresponding buffers get unmapped. Otherwise subsequent __block_prepare_write (called because we are retrying the write) will find the buffers mapped, not call ->get_block, and thus the page will be backed by already freed blocks leading to filesystem and data corruption. Reported-by: James Y Knight <foom@fuhm.net> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:55 +01:00
Jan Kara	5a20bdfcdc	ext4: Support for 64-bit quota format Add support for new 64-bit quota format. It is enough to add proper mount options handling. The rest is done by the generic code. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:54 +01:00
Jan Kara	1aeec43432	ext3: Support for vfsv1 quota format We just have to add proper mount options handling. The rest is handled by the generic quota code. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:54 +01:00
Jan Kara	498c60153e	quota: Implement quota format with 64-bit space and inode limits So far the maximum quota space limit was 4TB. Apparently this isn't enough for Lustre guys anymore. So implement new quota format which raises block limits to 2^64 bytes. Also store number of inodes and inode limits in 64-bit variables as 2^32 files isn't that insanely high anymore. The first version of the patch has been developed by Andrew Perepechko <Andrew.Perepechko@Sun.COM>. CC: Andrew.Perepechko@Sun.COM Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:54 +01:00
Jan Kara	3067393005	quota: Move definition of QFMT_OCFS2 to linux/quota.h Move definition of this constant to linux/quota.h so that it cannot clash with other format IDs. CC: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:53 +01:00
Jérémy Cochoy	92e128884b	ext2: fix comment in ext2_find_entry about return values Signed-off-by: Jérémy Cochoy <jeremy.cochoy@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:53 +01:00
Alexey Fisher	4cf46b67eb	ext3: Unify log messages in ext3 Make messages produced by ext3 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684964] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended dmesg after patch: [ 873.300792] EXT3-fs (loop0): using internal journaln [ 873.300796] EXT3-fs (loop0): mounted filesystem with writeback data mode [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. [ 723.755642] EXT3-fs (loop0): error: bad blocksize 8192 [ 357.874687] EXT3-fs (loop0): error: no journal found. mounting ext3 over ext2? [ 873.300764] EXT3-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:53 +01:00
Stephen Hemminger	2074abfeb8	ext2: clear uptodate flag on super block I/O error This fixes a WARN backtrace in mark_buffer_dirty() that occurs during unmount when a USB or floppy device is removed. I reported this a kernel regression, but looks like it might have been there for longer than that. The super block update from a previous operation has marked the buffer as in error, and the flag has to be cleared before doing the update. (Similar code already exists in ext4). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:53 +01:00
Alexey Fisher	2314b07cb4	ext2: Unify log messages in ext2 make messages produced by ext2 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684961] EXT2-fs warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT II FS: 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] dmesg after patch: [ 4893.684892] EXT2-fs (loop0): reservations ON [ 4893.684896] EXT2-fs (loop0): xip option not supported [ 4893.684961] EXT2-fs (loop0): warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT2-fs (loop0): 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Reviewed-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:52 +01:00
Eric Sandeen	dee1d3b627	ext3: make "norecovery" an alias for "noload" Users on the list recently complained about differences across filesystems w.r.t. how to mount without a journal replay. In the discussion it was noted that xfs's "norecovery" option is perhaps more descriptively accurate than "noload," so let's make that an alias for ext3. Also show this status in /proc/mounts Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:52 +01:00
Eric Sandeen	b918397542	ext3: Don't update the superblock in ext3_statfs() commit `a71ce8c6c9` updated ext3_statfs() to update the on-disk superblock counters, but modified this buffer directly without any journaling of the change. This is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. The modifications were originally to keep the sb "more" in sync, so that a readonly fsck of the device didn't flag this as an error (as often), but apparently e2fsprogs deals with this differently now, anyway. Based on Ted's patch for ext4, which was in turn based on my work on that bug and another preliminary patch... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:52 +01:00
Eric Sandeen	d965736b8c	ext3: journal all modifications in ext3_xattr_set_handle ext3_xattr_set_handle() was zeroing out an inode outside of journaling constraints; this is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Although ext3 doesn't have the crc issue, modifications out of journal control are a Bad Thing. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:52 +01:00
Jan Kara	c56818d7dc	quota: Fix WARN_ON in lookup_one_len We should hold i_mutex when looking up quota files for journaled quotas, otherwise a WARN_ON in lookup_one_len triggers. The fact that we didn't hold i_mutex previously probably could not lead to a real bug since the filesystem is just being mounted / remounted read-write and thus the root directory cannot change anyway but it's definitely cleaner with i_mutex. Reported-by: Bastien ROUCARIES <roucaries.bastien@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:51 +01:00
Alexey Dobriyan	1472da5fdc	const: struct quota_format_ops Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:51 +01:00
Christoph Hellwig	5ced58f735	ubifs: remove manual O_SYNC handling generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate call to ubifs_sync_wbufs_by_inode which is already covered by ubifs_fsync. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:51 +01:00
Christoph Hellwig	027cf316af	afs: remove manual O_SYNC handling generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate manual invocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:50 +01:00
Christoph Hellwig	94004ed726	kill wait_on_page_writeback_range All callers really want the more logical filemap_fdatawait_range interface, so convert them to use it and merge wait_on_page_writeback_range into filemap_fdatawait_range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:50 +01:00
Christoph Hellwig	6b2f3d1f76	vfs: Implement proper O_SYNC semantics While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatiblity it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actuall care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefinining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:50 +01:00
Jan Kara	59bc055211	zisofs: Implement reading of compressed files when PAGE_CACHE_SIZE > compress block size Also split and cleanup zisofs_readpage() when we are changing it anyway. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:49 +01:00
Boaz Harrosh	04dc1e88ad	exofs: Multi-device mirror support This patch changes on-disk format, it is accompanied with a parallel patch to mkfs.exofs that enables multi-device capabilities. After this patch, old exofs will refuse to mount a new formatted FS and new exofs will refuse an old format. This is done by moving the magic field offset inside the FSCB. A new FSCB version field was added. In the future, exofs will refuse to mount unmatched FSCB version. To up-grade or down-grade an exofs one must use mkfs.exofs --upgrade option before mounting. Introduced, a new object that contains a device-table. This object contains the default data-map and a linear array of devices information, which identifies the devices used in the filesystem. This object is only written to offline by mkfs.exofs. This is why it is kept separate from the FSCB, since the later is written to while mounted. Same partition number, same object number is used on all devices only the device varies. * define the new format, then load the device table on mount time make sure every thing is supported. * Change I/O engine to now support Mirror IO, .i.e write same data to multiple devices, read from a random device to spread the read-load from multiple clients (TODO: stripe read) Implementation notes: A few points introduced in previous patch should be mentioned here: * Special care was made so absolutlly all operation that have any chance of failing are done before any osd-request is executed. This is to minimize the need for a data consistency recovery, to only real IO errors. * Each IO state has a kref. It starts at 1, any osd-request executed will increment the kref, finally when all are executed the first ref is dropped. At IO-done, each request completion decrements the kref, the last one to return executes the internal _last_io() routine. _last_io() will call the registered io_state_done. On sync mode a caller does not supply a done method, indicating a synchronous request, the caller is put to sleep and a special io_state_done is registered that will awaken the caller. Though also in sync mode all operations are executed in parallel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:23 +02:00
Boaz Harrosh	06886a5a3d	exofs: Move all operations to an io_engine In anticipation for multi-device operations, we separate osd operations into an abstract I/O API. Currently only one device is used but later when adding more devices, we will drive all devices in parallel according to a "data_map" that describes how data is arranged on multiple devices. The file system level operates, like before, as if there is one object (inode-number) and an i_size. The io engine will split this to the same object-number but on multiple device. At first we introduce Mirror (raid 1) layout. But at the final outcome we intend to fully implement the pNFS-Objects data-map, including raid 0,4,5,6 over mirrored devices, over multiple device-groups. And more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12 * Define an io_state based API for accessing osd storage devices in an abstract way. Usage: First a caller allocates an io state with: exofs_get_io_state(struct exofs_sb_info sbi, struct exofs_io_state* ios); Then calles one of: exofs_sbi_create(struct exofs_io_state ios); exofs_sbi_remove(struct exofs_io_state ios); exofs_sbi_write(struct exofs_io_state ios); exofs_sbi_read(struct exofs_io_state ios); exofs_oi_truncate(struct exofs_i_info oi, u64 new_len); And when done exofs_put_io_state(struct exofs_io_state ios); * Convert all source files to use this new API * Convert from bio_alloc to bio_kmalloc * In io engine we make use of the now fixed osd_req_decode_sense There are no functional changes or on disk additions after this patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:22 +02:00
Boaz Harrosh	8ce9bdd1fb	exofs: move osd.c to ios.c If I do a "git mv" together with a massive code change and commit in one patch, git looses the rename and records a delete/new instead. This is bad because I want a rename recorded so later rebased/cherry-picked patches to the old name will work. Also the --follow is lost. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:21 +02:00
Boaz Harrosh	cae012d853	exofs: statfs blocks is sectors not FS blocks Even though exofs has a 4k block size, statfs blocks is in sectors (512 bytes). Also if target returns 0 for capacity then make it ULLONG_MAX. df does not like zero-size filesystems Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:21 +02:00
Boaz Harrosh	19fe294f2e	exofs: Prints on mount and unmout It is important to print in the logs when a filesystem was mounted and eventually unmounted. Print the osd-device's osd_name and pid the FS was mounted/unmounted on. TODO: How to also print the namespace path the filesystem was mounted on? Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:20 +02:00
Boaz Harrosh	9cfdc7aa9f	exofs: refactor exofs_i_info initialization into common helper There are two places that initialize inodes: exofs_iget() and exofs_new_inode() As more members of exofs_i_info that need initialization are added this code will grow. (soon) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:19 +02:00
Boaz Harrosh	fe33cc1ee1	exofs: dbg-print less Iner-loops printing is converted to EXOFS_DBG2 which is #defined to nothing. It is now almost bareable to just leave debug-on. Every operation is printed once, with most relevant info (I hope). Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:18 +02:00
Boaz Harrosh	58311c43df	exofs: More sane debug print debug prints should be somewhat useful without actually reading the source code Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:17 +02:00
Linus Torvalds	4ef58d4e2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...	2009-12-09 19:43:33 -08:00
Theodore Ts'o	fab3a549e2	ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) Fix the following potential circular locking dependency between mm->mmap_sem and ei->i_data_sem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-04115-gec044c5 #37 ------------------------------------------------------- ureadahead/1855 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac but task is already holding lock: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem){++++..}: [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81516633>] down_read+0x51/0x84 [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5 [<ffffffff811a3453>] ext4_get_block+0xab/0xef [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d [<ffffffff81155360>] mpage_readpages+0xd0/0x114 [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc [<ffffffff810f86f2>] ra_submit+0x21/0x25 [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c [<ffffffff81107b97>] __do_fault+0x55/0x3a2 [<ffffffff81109db0>] handle_mm_fault+0x327/0x734 [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa [<ffffffff81518205>] page_fault+0x25/0x30 [<ffffffff812a34d8>] clear_user+0x38/0x3c [<ffffffff81167e16>] padzero+0x20/0x31 [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff81166d64>] load_script+0x1b8/0x1cc [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff8113255f>] do_execve+0x1ce/0x2cf [<ffffffff81027494>] sys_execve+0x43/0x5a [<ffffffff8102918a>] stub_execve+0x6a/0xc0 -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by ureadahead/1855: #0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 stack backtrace: Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37 Call Trace: [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7 [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff8102f229>] ? sched_clock+0x9/0xd [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95 [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-09 21:30:02 -05:00
Theodore Ts'o	a214238d3b	ext4: Do not override ext2 or ext3 if built they are built as modules The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the ext2 or ext3 file systems if the those file system drivers are configured to be built as mdoules. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-09 21:09:58 -05:00
Theodore Ts'o	3b799d15f2	jbd2: Export jbd2_log_start_commit to fix ext4 build This fixes: ERROR: "jbd2_log_start_commit" [fs/ext4/ext4.ko] undefined! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-09 20:42:53 -05:00
Linus Torvalds	a9280fed38	Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: (31 commits) kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block() kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0() kill-the-bkl/reiserfs: definitely drop the bkl from reiserfs_ioctl() kill-the-bkl/reiserfs: always lock the ioctl path kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency kill-the-bkl/reiserfs: panic in case of lock imbalance kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() kill-the-bkl/reiserfs: fix recursive reiserfs lock in reiserfs_mkdir() kill-the-bkl/reiserfs: fix "reiserfs lock" / "inode mutex" lock inversion dependency kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock kill-the-bkl/reiserfs: acquire the inode mutex safely kill-the-bkl/reiserfs: unlock only when needed in search_by_key kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() kill-the-bkl/reiserfs: reduce number of contentions in search_by_key() kill-the-bkl/reiserfs: don't hold the write recursively in reiserfs_lookup() kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() kill-the-bkl/reiserfs: conditionaly release the write lock on fs_changed() kill-the-BKL/reiserfs: add reiserfs_cond_resched() ...	2009-12-09 07:58:15 -08:00
Benjamin Herrenschmidt	bcd6acd51f	Merge commit 'origin/master' into next Conflicts: include/linux/kvm.h	2009-12-09 17:14:38 +11:00
Ricardo Labiaga	7cab89b275	nfs41: Invoke RECLAIM_COMPLETE on all new client ids The NFSv4.1 spec indicates RECLAIM_COMPLETE is to be issued whenever a client establishes a new client id, not only after detecting the server has rebooted. Set the NFS4CLNT_RECLAIM_REBOOT bit after every new client id has been established. This enables us to issue RECLAIM_COMPLETE during the wrap up of the NFS4CLNT_RECLAIM_REBOOT state. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-08 14:35:28 -05:00
Linus Torvalds	6035ccd8e9	Merge branch 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits) cfq-iosched: Do not access cfqq after freeing it block: include linux/err.h to use ERR_PTR cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit blkio: Allow CFQ group IO scheduling even when CFQ is a module blkio: Implement dynamic io controlling policy registration blkio: Export some symbols from blkio as its user CFQ can be a module block: Fix io_context leak after failure of clone with CLONE_IO block: Fix io_context leak after clone with CLONE_IO cfq-iosched: make nonrot check logic consistent io controller: quick fix for blk-cgroup and modular CFQ cfq-iosched: move IO controller declerations to a header file cfq-iosched: fix compile problem with !CONFIG_CGROUP blkio: Documentation blkio: Wait on sync-noidle queue even if rq_noidle = 1 blkio: Implement group_isolation tunable blkio: Determine async workload length based on total number of queues blkio: Wait for cfq queue to get backlogged if group is empty blkio: Propagate cgroup weight updation to cfq groups blkio: Drop the reference to queue once the task changes cgroup blkio: Provide some isolation between groups ...	2009-12-08 08:19:16 -08:00
Linus Torvalds	d7fc02c7ba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c	2009-12-08 07:55:01 -08:00
Linus Torvalds	1557d33007	Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...	2009-12-08 07:38:50 -08:00
Trond Myklebust	88069f77e1	NFSv41: Fix a potential state leakage when restarting nfs4_close_prepare Currently, if the call to nfs4_setup_sequence() in nfs4_close_prepare fails, any later retries will fail to launch an RPC call, due to the fact that the &state->flags will have been cleared. Ditto if nfs4_close_done() triggers a call to the NFSv4.1 version of nfs_restart_rpc(). We therefore move the actual clearing of the state->flags to nfs4_close_done(), when we know that the RPC call was successful. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-08 08:33:16 -05:00
Roel Kluin	b38882f5c0	UBIFS: fix return code in check_leaf Return the PTR_ERR of the correct pointer. This fixes the debugging code. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-12-08 14:24:00 +02:00
Jiri Kosina	d014d04386	Merge branch 'for-next' into for-linus Conflicts: kernel/irq/chip.c	2009-12-07 18:36:35 +01:00
Ricardo Labiaga	74e7bb73a3	nfs41: Handle NFSv4.1 session errors in the delegation recall code Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:48:30 -05:00
Ricardo Labiaga	7970886118	nfs41: Retry delegation return if it failed with session error Update nfs4_delegreturn_done() to retry the operation after setting the NFS4CLNT_SESSION_SETUP bit to indicate the need to reset the session. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:23:21 -05:00
Ricardo Labiaga	bcfa49f6f9	nfs41: Handle session errors during delegation return Add session error handling to nfs4_open_delegation_recall() Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:22:29 -05:00
Ricardo Labiaga	f455848a11	nfs41: Mark stateids in need of reclaim if state manager gets stale clientid The state manager was not marking the stateids as needing to be reclaimed after reestablishing the clientid. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:16:09 -05:00
Trond Myklebust	0110ee152b	NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configured Also rename it: it is used in generic code, and so should not have a 'nfs4' prefix. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:00:24 -05:00
Frederic Weisbecker	6548698f92	Merge commit 'v2.6.32' into reiserfs/kill-bkl Merge-reason: The tree was based 2.6.31. It's better to be up to date with 2.6.32. Although no conflicting changes were made in between, it gives benchmarking results closer to the lastest kernel behaviour.	2009-12-07 07:29:22 +01:00
Steve French	a994b8fa66	[CIFS] Enable mmap on forcedirectio mounts openoffice and gedit failed with 'direct' options Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-12-07 05:44:46 +00:00
Akira Fujita	4a58579b9e	ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT This patch fixes three problems in the handling of the EXT4_IOC_MOVE_EXT ioctl: 1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for original and donor files, but they allow the illegal write access to donor file, since donor file is overwritten by original file data. To fix this problem, change access mode checks of original (r->r/w) and donor (r->w) files. 2. Disallow the use of donor files that have a setuid or setgid bits. 3. Call mnt_want_write() and mnt_drop_write() before and after ext4_move_extents() calling to get write access to a mount. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-06 23:38:31 -05:00
Jan Kara	b436b9bef8	ext4: Wait for proper transaction commit on fsync We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 23:51:10 -05:00
Dmitry Monakhov	194074acac	ext4: fix incorrect block reservation on quota transfer. Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid This means that we may end-up with transferring all quotas. Add we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in case of QUOTA_INIT_BLOCKS. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 22:42:28 -05:00
Dmitry Monakhov	5aca07eb7d	ext4: quota macros cleanup Currently all quota block reservation macros contains hard-coded "2" aka MAXQUOTAS value. This is no good because in some places it is not obvious to understand what does this digit represent. Let's introduce new macro with self descriptive name. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 22:42:15 -05:00
Dmitry Monakhov	8aa6790f87	ext4: ext4_get_reserved_space() must return bytes instead of blocks Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 22:41:52 -05:00
Curt Wohlgemuth	b844167edc	ext4: remove blocks from inode prealloc list on failure This fixes a leak of blocks in an inode prealloc list if device failures cause ext4_mb_mark_diskspace_used() to fail. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 22:18:25 -05:00
Josef Bacik	d4edac314e	ext4: wait for log to commit when umounting There is a potential race when a transaction is committing right when the file system is being umounting. This could reduce in a race because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super before the commit code calls a callback so the mballoc code can release freed blocks in the transaction, resulting in a panic trying to access the freed s_group_info. The fix is to wait for the transaction to finish committing before we shutdown the multiblock allocator. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 21:48:58 -05:00
Jan Kara	b9a4207d5e	ext4: Avoid data / filesystem corruption when write fails to copy data When ext4_write_begin fails after allocating some blocks or generic_perform_write fails to copy data to write, we truncate blocks already instantiated beyond i_size. Although these blocks were never inside i_size, we have to truncate the pagecache of these blocks so that corresponding buffers get unmapped. Otherwise subsequent __block_prepare_write (called because we are retrying the write) will find the buffers mapped, not call ->get_block, and thus the page will be backed by already freed blocks leading to filesystem and data corruption. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 21:24:33 -05:00
Theodore Ts'o	24b584240a	ext4: Use ext4 file system driver for ext2/ext3 file system mounts Add a new config option, CONFIG_EXT4_USE_FOR_EXT23 which if enabled, will cause ext4 to be used for either ext2 or ext3 file system mounts when ext2 or ext3 is not enabled in the configuration. This allows minimalist kernel fanatics to drop to file system drivers from their compiled kernel with out losing functionality. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-07 14:08:51 -05:00
Roel Kluin	c09eef305d	ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-07 10:38:16 -05:00
Ricardo Labiaga	9dfdf404c9	nfs41: Don't clear DRAINING flag on NFS4ERR_STALE_CLIENTID If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, don't clear the NFS4CLNT_SESSION_DRAINING flag and don't wake RPCs waiting for the session to be reestablished. We don't have a session yet, so there is no reason to wake other RPCs. This avoids sending spurious compounds with bogus sequenceID during session and state recovery. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [Trond.Myklebust@netapp.com: cleaned up patch by adding the nfs41_begin/end_drain_session() helpers] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-06 12:57:34 -05:00
Ricardo Labiaga	9430fb6b53	nfs41: nfs41_setup_state_renewal Move call to get the lease time and the setup of the state renewal out of nfs4_create_session so that it can be called after clearing the DRAINING flag. We use the getattr RPC to obtain the lease time, which requires a sequence slot. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-06 12:23:46 -05:00
Trond Myklebust	bcb56164ce	NFSv41: More cleanups Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 19:32:19 -05:00
Trond Myklebust	35dc1d74a8	NFSv41: Fix up some bugs in the NFS4CLNT_SESSION_DRAINING code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 19:32:19 -05:00
Trond Myklebust	d61e612a72	NFSv41: Clean up slot table management We no longer need to maintain a distinction between nfs41_sequence_done and nfs41_sequence_free_slot. This fixes a number of slot table leakages in the NFSv4.1 code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 19:32:19 -05:00
Trond Myklebust	f26468fb93	NFSv41: Fix nfs4_proc_create_session We should not assume that nfs41_init_clientid() will always want to initialise the session. If it is being called due to a server reboot, then we just want to reset the session after re-establishing the clientid. Fix this by getting rid of the 'reset' parameter in nfs4_proc_create_session(), and instead relying on whether or not the session slot table pointer is non-NULL. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 19:32:11 -05:00
Linus Torvalds	897e81bea1	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits) sched, cputime: Introduce thread_group_times() sched, cputime: Cleanups related to task_times() Revert "sched, x86: Optimize branch hint in __switch_to()" sched: Fix isolcpus boot option sched: Revert `498657a478` sched, time: Define nsecs_to_jiffies() sched: Remove task_{u,s,g}time() sched: Introduce task_times() to replace task_{u,s}time() pair sched: Limit the number of scheduler debug messages sched.c: Call debug_show_all_locks() when dumping all tasks sched, x86: Optimize branch hint in __switch_to() sched: Optimize branch hint in context_switch() sched: Optimize branch hint in pick_next_task_fair() sched_feat_write(): Update ppos instead of file->f_pos sched: Sched_rt_periodic_timer vs cpu hotplug sched, kvm: Fix race condition involving sched_in_preempt_notifers sched: More generic WAKE_AFFINE vs select_idle_sibling() sched: Cleanup select_task_rq_fair() sched: Fix granularity of task_u/stime() sched: Fix/add missing update_rq_clock() calls ...	2009-12-05 15:30:49 -08:00
David S. Miller	28b4d5cc17	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ Conflicts: drivers/net/pcmcia/fmvj18x_cs.c drivers/net/pcmcia/nmclan_cs.c drivers/net/pcmcia/xirc2ps_cs.c drivers/net/wireless/ray_cs.c	2009-12-05 15:22:26 -08:00
Ricardo Labiaga	da6ebfe34a	nfs41: Invoke RECLAIM_COMPLETE This patch invokes RECLAIM_COMPLETE after the client is done reclaiming state. There are interpretations of the spec that suggest that RECLAIM_COMPLETE should also be issued after a new clientid has been obtained from the server and even if there is no state to reclaim. This tells the server that the client has no state to reclaim even if the client isn't aware the server may have rebooted. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 16:08:41 -05:00
Ricardo Labiaga	fce5c838e1	nfs41: RECLAIM_COMPLETE functionality Implements RECLAIM_COMPLETE as an asynchronous RPC. NFS4ERR_DELAY is retried, NFS4ERR_DEADSESSION invokes the error handling but does not result in a retry, since we don't want to have a lingering RECLAIM_COMPLETE call sent in the middle of a possible new state recovery cycle. If a session reset occurs, a new wave of reclaim operations will follow, containing their own RECLAIM_COMPLETE call. We don't want a retry to get on the way of recovery by incorrectly indicating to the server that we're done reclaiming state. A subsequent patch invokes the functionality. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 16:08:41 -05:00
Ricardo Labiaga	180197536b	nfs41: RECLAIM_COMPLETE XDR functionality XDR encoding and decoding for RECLAIM_COMPLETE. Implements the necessary encoding to indicate reclaim complete for the entire client. In the future, it can be extended to provide reclaim complete functionality for a single file system after migration. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 16:08:40 -05:00
Ricardo Labiaga	8b173218bd	Cleanup some NFSv4 XDR decode comments Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 16:08:39 -05:00
Trond Myklebust	0556d1a695	NFSv41: nfs4_reset_session must always set NFS4CLNT_SESSION_DRAINING Otherwise we have no guarantees that other processes won't start another RPC call while we're resetting the session. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 15:03:20 -05:00
Alexandros Batsakis	2597641dea	nfs41: v2 fix cb_recall bug in NFSv4.1 the seqid part of a stateid in CB_RECALL must be 0 Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:48:55 -05:00
Alexandros Batsakis	0629e370dd	nfs41: check SEQUENCE status flag the server can indicate a number of error conditions by setting the appropriate bits in the SEQUENCE operation. The client re-establishes state with the server when it receives one of those, with the action depending on the specific case. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:46:14 -05:00
Alexandros Batsakis	2449ea2e19	nfs41: V2 adjust max_rqst_sz, max_resp_sz w.r.t to rsize, wsize The v4.1 client should take into account the desired rsize, wsize when negotiating the max size in CREATE_SESSION. Accordingly, it should use rsize, wsize that are smaller than the session negotiated values. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:36:55 -05:00
Alexandros Batsakis	7b183d0d43	nfs41: remove server-only EXCHGID4_FLAG_CONFIRMED_R flag from exchange_id Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:33:25 -05:00
Alexandros Batsakis	4882ef72cd	nfs41: add support for the exclusive create flags In v4.1 the client MUST/SHOULD use the EXCLUSIVE4_1 flag instead of EXCLUSIVE4, and GUARDED when the server supports persistent sessions. For now (and until we support suppattr_exclcreat), we don't send any attributes with EXCLUSIVE4_1 relying in the subsequent SETATTR as in v4.0 Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:30:21 -05:00
Alexandros Batsakis	d8cb1a7ce3	nfs41: check if session exists and if it is persistent Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:29:53 -05:00
Alexandros Batsakis	31f0960778	nfs41: V2 initial support for CB_RECALL_ANY For now the clients returns _all_ the delegations of the specificed type it holds Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:27:02 -05:00
Alexandros Batsakis	c79571a508	nfs4: V2 return/expire delegations depending on their type Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:20:52 -05:00
Alexandros Batsakis	b4a6f4966e	nfs4: minor delegation cleaning Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:19:11 -05:00
Alexandros Batsakis	07bccc2dd4	nfs41: add support for callback with RPC version number 4 The NFSv4.1 spec-29 (18.36.3) says that the server MUST use an ONC RPC (program) version number equal to 4 in callbacks sent to the client. For now we allow both versions 1 and 4. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-05 13:19:01 -05:00
Linus Torvalds	1ebb275afc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (31 commits) GFS2: Fix glock refcount issues writeback: remove unused nonblocking and congestion checks (gfs2) GFS2: drop rindex glock to refresh rindex list GFS2: Tag all metadata with jid GFS2: Locking order fix in gfs2_check_blk_state GFS2: Remove dirent_first() function GFS2: Display nobarrier option in /proc/mounts GFS2: add barrier/nobarrier mount options GFS2: remove division from new statfs code GFS2: Improve statfs and quota usability GFS2: Use dquot_send_warning() VFS: Export dquot_send_warning GFS2: Add set_xquota support GFS2: Add get_xquota support GFS2: Clean up gfs2_adjust_quota() and do_glock() GFS2: Remove constant argument from qd_get() GFS2: Remove constant argument from qdsb_get() GFS2: Add proper error reporting to quota sync via sysfs GFS2: Add get_xstate quota function GFS2: Remove obsolete code in quota.c ...	2009-12-05 09:47:17 -08:00
Adam Buchbinder	6070d81eb5	tree-wide: fix misspelling of "definition" in comments "Definition" is misspelled "defintion" in several comments; this patch fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 23:41:47 +01:00
Adam Buchbinder	febe29d957	reiserfs: fix misspelling of "journaled" "Journaled" is misspelled "journlaled" in an output string; this patch fixed it. No changes in functionality. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 23:39:11 +01:00
Andy Adamson	0b9e2d41f1	nfs41: only state manager sets NFS4CLNT_SESSION_SETUP Replace sync and async handlers setting of the NFS4CLNT_SESSION_SETUP bit with setting NFS4CLNT_CHECK_LEASE, and let the state manager decide to reset the session. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 16:02:14 -05:00
Andy Adamson	691daf3b0c	nfs41: drain session cleanup Do not wake up the next slot_tbl_waitq task in nfs4_free_slot because we may be draining the slot. Either signal the state manager that the session is drained (the state manager wakes up tasks) OR wake up the next task. In nfs41_sequence_done, the slot dereference is only needed in the sequence operation success case. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:39 -05:00
Andy Adamson	ea028ac925	nfs41: nfs41: fix state manager deadlock in session reset If the session is reset during state recovery, the state manager thread can sleep on the slot_tbl_waitq causing a deadlock. Add a completion framework to the session. Have the state manager thread set a new session state (NFS4CLNT_SESSION_DRAINING) and wait for the session slot table to drain. Signal the state manager thread in nfs41_sequence_free_slot when the NFS4CLNT_SESSION_DRAINING bit is set and the session is drained. Reported-by: Trond Myklebust <trond@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:38 -05:00
Andy Adamson	05f0d23647	nfs41: remove nfs4_recover_session nfs4_recover_session can put rpciod to sleep. Just use nfs4_schedule_recovery. Reported-by: Trond Myklebust <trond.myklebust@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:37 -05:00
Andy Adamson	2628eddff1	nfs41: don't clear tk_action on success Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:35 -05:00
Andy Adamson	8ba9bf8e51	nfs41: fix switch in nfs4_recovery_handle_error Do not fall through and set NFS4CLNT_SESSION_RESET bit on NFS4ERR_EXPIRED Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:34 -05:00
Andy Adamson	b9179237e2	nfs41: fix switch in nfs4_handle_exception Do not fall through and call nfs4_delay on session error handling. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:32 -05:00
Andy Adamson	36bbe34239	nfs41: free the slot on unhandled read errors nfs4_read_done returns zero on unhandled errors. nfs_readpage_result will return on a negative tk_status without freeing the slot. Call nfs4_sequence_free_slot on unhandled errors in nfs4_read_done. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:30 -05:00
Andy Adamson	e608e79f1b	nfs41: call free slot from nfs4_restart_rpc nfs41_sequence_free_slot can be called multiple times on SEQUENCE operation errors. No reason to inline nfs4_restart_rpc Reported-by: Trond Myklebust <trond.myklebust@netapp.com> nfs_writeback_done and nfs_readpage_retry call nfs4_restart_rpc outside the error handler, and the slot is not freed prior to restarting in the rpc_prepare state during session reset. Fix this by moving the call to nfs41_sequence_free_slot from the error path of nfs41_sequence_done into nfs4_restart_rpc, and by removing the test for NFS4CLNT_SESSION_SETUP. Always free slot and goto the rpc prepare state on async errors. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:29 -05:00
Andy Adamson	1d9ddde94a	nfs41: nfs4_get_lease_time will never session reset Make this clear by calling rpc_restart-call. Prepare for nfs4_restart_rpc() to free slots. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:27 -05:00
Andy Adamson	6df08189ff	nfs41: rename cl_state session SETUP bit to RESET The bit is no longer used for session setup, only for session reset. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:55:05 -05:00
Andy Adamson	4d643d1dfa	nfs41: add create session into establish_clid Reported-by: Trond Myklebust <trond.myklebust@netapp.com> Resetting the clientid from the state manager could result in not confirming the clientid due to create session not being called. Move the create session call from the NFS4CLNT_SESSION_SETUP state manager initialize session case into the NFS4CLNT_LEASE_EXPIRED case establish_clid call. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-04 15:52:24 -05:00
Giuseppe Scrivano	336e8683b9	inotify: remove superfluous return code check Signed-off-by: Giuseppe Scrivano <gscrivano@gnu.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:58 +01:00
André Goddard Rosa	af901ca181	tree-wide: fix assorted typos all over the place That is "success", "unknown", "through", "performance", "[re\|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over\|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:55 +01:00
Alberto Bertogli	be030e653f	fs/debugfs/inode.c: fix comment typos Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:52 +01:00
Uwe Kleine-König	bf48aabb89	tree-wide: fix typos "offest" -> "offset" This patch was generated by git grep -E -i -l 'offest' \| xargs -r perl -p -i -e 's/offest/offset/' Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:50 +01:00
Trond Myklebust	7285f2d2ff	Merge branch 'devel' into linux-next	2009-12-03 21:27:36 -05:00
NeilBrown	44ed3556ba	NFS4ERR_FILE_OPEN handling in Linux/NFS NFS4ERR_FILE_OPEN is return by the server when an operation cannot be performed because the file is currently open and local (to the server) semantics prohibit the operation while the file is open. A typical case is a RENAME operation on an MS-Windows platform, which prevents rename while the file is open. While it is possible that such a condition is transitory, it is also very possible that the file will be held open for an extended period of time thus preventing the operation. The current behaviour of Linux/NFS is to retry the operation indefinitely. This is not appropriate - we do not expect a rename to take an arbitrary amount of time to complete. Rather, and error should be returned. The most obvious error code would be EBUSY, which is a legal at least for 'rename' and 'unlink', and accurately captures the reason for the error. This patch allows a few retries until about 2 seconds have elapsed, then returns EBUSY. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 21:26:36 -05:00
Miklos Szeredi	24e93025ee	nfs: clean up sillyrenaming in nfs_rename() The d_instantiate(new_dentry, NULL) is superfluous, the dentry is already negative. Rehashing this dummy dentry isn't needed either, d_move() works fine on an unhashed target. The re-checking for busy after a failed nfs_sillyrename() is bogus too: new_dentry->d_count < 2 would be a bug here. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:58:56 -05:00
Miklos Szeredi	27226104e6	nfs: dont unhash target if renaming a directory Move unhashing the target to after the check for existence and being a non-directory. If renaming a directory then the VFS already unhashes the target if it is not busy. If it's busy then acquiring more references during the rename makes no difference. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:58:56 -05:00
Miklos Szeredi	28f79a1a69	nfs: fix comments in nfs_rename() Comments are wrong or out of date. In particular d_drop() doesn't free the inode it just unhashes the dentry. And if target is a directory then it is not checked for being busy. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:58:56 -05:00
Miklos Szeredi	e48de5ec25	nfs: remove unnecessary check from nfs_rename() VFS already checks if both source and target are directories. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:58:56 -05:00
Trond Myklebust	9c4c761a62	NFSv4.1: Handle NFSv4.1 session errors in the lock recovery code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:58:56 -05:00
Chuck Lever	dd47f96c07	NFS: Revert default r/wsize behavior When the "rsize=" or "wsize=" mount options are not specified, text-based mounts have slightly different behavior than legacy binary mounts. Text-based mounts use the smaller of the server's maximum and the client's maximum, but binary mounts use the smaller of the server's _preferred_ size and the client's maximum. This difference is actually pretty subtle. Most servers advertise the same value as their maximum and their preferred transfer size, so the end result is the same in most cases. The reason for this difference is that for text-based mounts, if r/wsize are not specified, they are set to the largest value supported by the client. For legacy mounts, the values are set to zero if these options are not specified. nfs_server_set_fsinfo() can negotiate the transfer size defaults correctly in any case. There's no need to specify any particular value as default in the text-based option parsing logic. Note that nfs4 doesn't use nfs_server_set_fsinfo(), but the mount.nfs4 command does set rsize and wsize to 0 if the user didn't specify these options. So, make the same change for text-based NFSv4 mounts. Thanks to James Pearson <james-p@moving-picture.com> for reporting and diagnosing the problem. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:58:56 -05:00
Chuck Lever	d250e190fb	NFS: Display compressed (shorthand) IPv6 in /proc/mounts Recent changes to snprintf() introduced the %pI6c formatter, which can display an IPv6 address with standard shorthanding. Use this new formatter when displaying IPv6 server addresses in /proc/mounts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:58:56 -05:00
Jeff Layton	ee671b016f	NFS: convert proto= option to use netids rather than a protoname Solaris uses netids as values for the proto= option, so that when someone specifies "tcp6" they get traffic over TCP + IPv6. Until recently, this has never really been an issue for Linux since it didn't support NFS over IPv6. The netid and the protocol name were generally always the same (modulo any strange configuration in /etc/netconfig). The solaris manpage documents their proto= option as: proto= _netid_ \| rdma This patch is intended to bring Linux closer to how the Solaris proto= option works, by declaring a static netid mapping in the kernel and converting the proto= and mountproto= options to follow it and display the proper values in /proc/mounts. Much of this functionality will need to be provided by a userspace mount.nfs patch. Chuck Lever has a patch to change mount.nfs in the same way. In principle, we could do all of this in userspace but that would mean that the options in /proc/mounts may not match the options used by userspace. The alternative to the static mapping here is to add a mechanism to upcall to userspace for netid's. I'm not opposed to that option, but it'll probably mean more overhead (and quite a bit more code). Rather than shoot for that at first, I figured it was probably better to start simply. Comments welcome. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:58:56 -05:00
J. Bruce Fields	d4e935bd67	The rpc server does not require that service threads take the BKL. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:58:33 -05:00
Trond Myklebust	1185a552e3	NFSv4: Ensure nfs4_close_context() is declared as static Fix another 'sparse' warning in fs/nfs/nfs4proc.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:54:02 -05:00
Trond Myklebust	0a6566ecd3	NFSv4: Ensure nfs_dns_lookup() and nfs_dns_update() are declared static Fix two 'sparse' warnings in fs/nfs/dns_resolve.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:54:01 -05:00
Trond Myklebust	b6d408ba8c	NFSv4: Fix up error handling in the state manager main loop. The nfs4_state_manager should not be looking at the error values when deciding whether or not to loop round in order to handle a higher priority state recovery task. It should rather be looking at the clp->cl_state. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:53:22 -05:00
Trond Myklebust	a9ed2e2583	NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease. If our lease expires, and the server reboots while we're recovering, we need to be able to wait until the grace period is over. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:53:21 -05:00
Trond Myklebust	c8b7ae3d32	NFSv4: Ensure the state manager handles NFS4ERR_NO_GRACE correctly Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:53:21 -05:00
Trond Myklebust	4f7cdf18e1	NFSv4: The state manager shouldn't exit on errors that were handled nfs4_recovery_handle_error() will correctly handle errors such as NFS4ERR_CB_PATH_DOWN, however because they are still passed back to the main loop in nfs4_state_manager(), they can cause the latter to exit prematurely. Fix this by letting nfs4_recovery_handle_error() change the error value in cases where there is no action required by the caller. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:53:20 -05:00
Trond Myklebust	e345e88a77	NFSv4: Fix up the callers of nfs4_state_end_reclaim_reboot In practice, we need to ensure that we call nfs4_state_end_reclaim_reboot in 2 cases: - If we lose the lease while we were reclaiming state OR - After we're done with reboot recovery Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 15:52:41 -05:00
Jeff Layton	a2934c7b36	cifs: NULL out tcon, pSesInfo, and srvTcp pointers when chasing DFS referrals The scenario is this: The kernel gets EREMOTE and starts chasing a DFS referral at mount time. The tcon reference is put, which puts the session reference too, but neither pointer is zeroed out. The mount gets retried (goto try_mount_again) with new mount info. Session setup fails fails and rc ends up being non-zero. The code then falls through to the end and tries to put the previously freed tcon pointer again. Oops at: cifs_put_smb_ses+0x14/0xd0 Fix this by moving the initialization of the rc variable and the tcon, pSesInfo and srvTcp pointers below the try_mount_again label. Also, add a FreeXid() before the goto to prevent xid "leaks". Signed-off-by: Jeff Layton <jlayton@redhat.com> Reported-by: Gustavo Carvalho Homem <gustavo@angulosolido.pt> CC: stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-12-03 16:12:41 +00:00
Trond Myklebust	d18cc1fda2	NFSv4: Fix a potential state manager deadlock when returning delegations The nfsv4 state manager could potentially deadlock inside __nfs_inode_return_delegation() if the server reboots, so that the calls to nfs_msync_inode() end up waiting on state recovery to complete. Also ensure that if a server reboot or network partition causes us to have to stop returning delegations, that NFS4CLNT_DELEGRETURN is set so that the state manager can resume any outstanding delegation returns after it has dealt with the state recovery situation. Finally, ensure that the state manager doesn't wait for the DELEGRETURN call to complete. It doesn't need to, and that too can cause a deadlock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 08:10:17 -05:00
J. Bruce Fields	d327cf7449	Re: acl trouble after upgrading ubuntu Subject: [PATCH] nfs: fix acl decoding Commit `28f566942c` "NFS: use dynamically computed compound_hdr.replen for xdr_inline_pages offset" accidentally changed the amount of space to allow for the acl reply, resulting in an IO error on attempts to get an acl. Reported-by: Paul Rudin <paul@rudin.co.uk> Cc: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 08:10:17 -05:00
Trond Myklebust	96f287b0cf	NFS: BKL removal from the mount code... None of the code in nfs_umount_begin() or nfs_remount() has any BKL dependency. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-03 08:09:56 -05:00
Wu Fengguang	0d99519efe	writeback: remove unused nonblocking and congestion checks - no one is calling wb_writeback and write_cache_pages with wbc.nonblocking=1 any more - lumpy pageout will want to do nonblocking writeback without the congestion wait So remove the congestion checks as suggested by Chris. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Alex Elder <aelder@sgi.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-12-03 13:54:25 +01:00
Wu Fengguang	b17621fed6	writeback: introduce wbc.for_background It will lower the flush priority for NFS, and maybe more in future. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-12-03 13:54:25 +01:00
Wu Fengguang	951c30d135	writeback: remove the always false bdi_cap_writeback_dirty() test This is dead code because no bdi flush thread will be started for !bdi_cap_writeback_dirty bdi. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-12-03 13:54:25 +01:00

... 3 4 5 6 7 ...

16449 Commits