linux

Author	SHA1	Message	Date
Christoph Hellwig	f4378b6eaf	xfs: actually enable the swapext compat handler Fix a small typo in the compat ioctl handler that cause the swapext compat handler to never be called. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Torsten Kaiser <just.for.lkml@googlemail.com> Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 16:55:53 -05:00
Christoph Hellwig	aa72a5cf00	xfs: simplify xfs_trans_iget xfs_trans_iget is a wrapper for xfs_iget that adds the inode to the transaction after it is read. Except when the inode already is in the inode cache, in which case it returns the existing locked inode with increment lock recursion counts. Now, no one in the tree every decrements these lock recursion counts, so any user of this gets a potential double unlock when both the original owner of the inode and the xfs_trans_iget caller unlock it. When looking back in a git bisect in the historic XFS tree there was only one place that decremented these counts, xfs_trans_iput. Introduced in commit ca25df7a840f426eb566d52667b6950b92bb84b5 by Adam Sweeney in 1993, and removed in commit 19f899a3ab155ff6a49c0c79b06f2f61059afaf3 by Steve Lord in 2003. And as long as it didn't slip through git bisects cracks never actually used in that time frame. A quick audit of the callers of xfs_trans_iget shows that no caller really relies on this behaviour fortunately - xfs_ialloc allows this inode from disk so it must not be there before, and all the RT allocator routines only every add each RT bitmap inode once. In addition to removing lots of code and reducing the size of the inode item this patch also avoids the double inode cache lookup in each create/mkdir/mknod transaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:46:16 -05:00
Christoph Hellwig	13e6d5cdde	xfs: merge fsync and O_SYNC handling The guarantees for O_SYNC are exactly the same as the ones we need to make for an fsync call (and given that Linux O_SYNC is O_DSYNC the equivalent is fdadatasync, but we treat both the same in XFS), except with a range data writeout. Jan Kara has started unifying these two path for filesystems using the generic helpers, and I've started to look at XFS. The actual transaction commited by xfs_fsync and xfs_write_sync_logforce has a different transaction number, but actually is exactly the same. We'll only use the fsync transaction going forward. One major difference is that xfs_write_sync_logforce never issues a cache flush unless we commit a transaction causing that as a side-effect, which is an obvious bug in the O_SYNC handling. Second all the locking and i_update_size vs i_update_core changes from `978b723712` never made it to xfs_write_sync_logforce, so we add them back. To make xfs_fsync easily usable from the O_SYNC path, the filemap_fdatawait call is moved up to xfs_file_fsync, so that we don't wait on the whole file after we already waited for our portion in xfs_write. We'll also use a plain call to filemap_write_and_wait_range instead of the previous sync_page_rang which did it in two steps including an half-hearted inode write out that doesn't help us. Once we're done with this also remove the now useless i_update_size tracking. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:57 -05:00
Dave Chinner	bd16956599	xfs: speed up free inode search Don't search too far - abort if it is outside a certain radius and simply do a linear search for the first free inode. In AGs with a million inodes this can speed up allocation speed by 3-4x. [hch: ported to the new xfs_ialloc.c world order] Signed-off-by: Dave Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:48 -05:00
Christoph Hellwig	2187550525	xfs: rationalize xfs_inobt_lookup* Currenly we have a xfs_inobt_lookup* variant for each comparism direction, and all these get all three fields of the inobt records passed, while the common case is just looking for the inode number and we have only marginally more callers than xfs_inobt_lookup* variants. So opencode a direct call to xfs_btree_lookup for the single case where we need all fields, and replace xfs_inobt_lookup* with a xfs_inobt_looku that just takes the inode number and the direction for all other callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:39 -05:00
Christoph Hellwig	4254b0bbb1	xfs: untangle xfs_dialloc Clarify the control flow in xfs_dialloc. Factor out a helper to go to the next node from the current one and improve the control flow by expanding composite if statements and using gotos. The xfs_ialloc_next_rec helper is borrowed from Dave Chinners dynamic allocation policy patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:29 -05:00
Dave Chinner	0b48db80ba	xfs: factor out debug checks from xfs_dialloc and xfs_difree Factor out a common helper from repeated debug checks in xfs_dialloc and xfs_difree. [hch: split out from Dave's dynamic allocation policy patches] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:18 -05:00
Christoph Hellwig	afabc24a73	xfs: improve xfs_inobt_update prototype Both callers of xfs_inobt_update have the record in form of a xfs_inobt_rec_incore_t, so just pass a pointer to it instead of the individual variables. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:08 -05:00
Christoph Hellwig	2e287a731e	xfs: improve xfs_inobt_get_rec prototype Most callers of xfs_inobt_get_rec need to fill a xfs_inobt_rec_incore_t, and those who don't yet are fine with a xfs_inobt_rec_incore_t, instead of the three individual variables, too. So just change xfs_inobt_get_rec to write the output into a xfs_inobt_rec_incore_t directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:44:56 -05:00
Dave Chinner	85c0b2ab5e	xfs: factor out inode initialisation Factor out code to initialize new inode clusters into a function of it's own. This keeps xfs_ialloc_ag_alloc smaller and better structured and enables a future inode cluster initialization transaction. Also initialize the agno variable earlier in xfs_ialloc_ag_alloc to avoid repeated byte swaps. [hch: The original patch is from Dave from his unpublished inode create transaction patch series, with some modifcations by me to apply stand-alone] Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:44:27 -05:00
Steve French	ca43e3beee	[CIFS] Fix checkpatch warnings Also update version number to 1.61 Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-09-01 17:20:50 +00:00
Suresh Jayaraman	bdb97adcdf	PATCH] cifs: fix broken mounts when a SSH tunnel is used (try #4 ) One more try.. It seems there is a regression that got introduced while Jeff fixed all the mount/umount races. While attempting to find whether a tcp session is already existing, we were not checking whether the "port" used are the same. When a second mount is attempted with a different "port=" option, it is being ignored. Because of this the cifs mounts that uses a SSH tunnel appears to be broken. Steps to reproduce: 1. create 2 shares # SSH Tunnel a SMB session 2. ssh -f -L 6111:127.0.0.1:445 root@localhost "sleep 86400" 3. ssh -f -L 6222:127.0.0.1:445 root@localhost "sleep 86400" 4. tcpdump -i lo 6111 & 5. mkdir -p /mnt/mnt1 6. mkdir -p /mnt/mnt2 7. mount.cifs //localhost/a /mnt/mnt1 -o username=guest,ip=127.0.0.1,port=6111 #(shows tcpdump activity on port 6111) 8. mount.cifs //localhost/b /mnt/mnt2 -o username=guest,ip=127.0.0.1,port=6222 #(shows tcpdump activity only on port 6111 and not on 6222 Fix by adding a check to compare the port _only_ if the user tries to override the tcp port with "port=" option, before deciding that an existing tcp session is found. Also, clean up a bit by replacing if-else if by a switch statment while at it as suggested by Jeff. Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-09-01 17:08:48 +00:00
Alexander Strakh	1b3859bc9e	[CIFS] Memory leak in ntlmv2 hash calculation in function calc_ntlmv2_hash memory is not released. 1. If in the line 333 we successfully allocate memory and assign it to pctxt variable: pctxt = kmalloc(sizeof(struct HMACMD5Context), GFP_KERNEL); then we go to line 376 and exit wihout releasing memory pointed to by pctxt variable. Add a memory releasing for pctxt variable before exit from function calc_ntlmv2_hash. Signed-off-by: Alexander Strakh <strakh@ispras.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-09-01 17:02:24 +00:00
Ian Kent	37d0892c5a	autofs4 - fix missed case when changing to use struct path In the recent change by Al Viro that changes verious subsystems to use "struct path" one case was missed in the autofs4 module which causes mounts to no longer expire. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-31 17:44:05 -10:00
Theodore Ts'o	b3a3ca8ca0	ext4: Add new tracepoint: trace_ext4_da_write_pages() Add a new tracepoint which shows the pages that will be written using write_cache_pages() by ext4_da_writepages(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-31 23:13:11 -04:00
Theodore Ts'o	de89de6e0c	ext4: Restore wbc->range_start in ext4_da_writepages() To solve a lock inversion problem, we implement part of the range_cyclic algorithm in ext4_da_writepages(). (See commit `2acf2c26` for more details.) As part of that change wbc->range_start was modified by ext4's writepages function, which causes its callers to get confused since they aren't expecting the filesystem to modify it. The simplest fix is to save and restore wbc->range_start in ext4_da_writepages. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-31 17:00:59 -04:00
Julia Lawall	a0f7bfd342	fs/xfs: Correct redundant test bp was tested for NULL a few lines before, followed by a return, and there is no intervening modification of its value. A simplified version of the semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; expression E; position p1,p2; @@ if (x == NULL \|\| ...) { ... when forall return ...; } ... when != $x=E\\|x--\\|x++\\|--x\\|++x\\|x-=E\\|x+=E\\|x\|=E\\|x&=E\\|&x$ ( x == NULL \| x != NULL ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-31 14:46:22 -05:00
Eric Sandeen	eb00457d62	xfs: remove XFS_INO64_OFFSET Commit `a19d9f887d` removed the ino64 option but left the XFS_INO64_OFFSET define it used in place - just remove it. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-31 14:46:22 -05:00
Eric Sandeen	fef1111ecd	un-static xfs_read_agf CONFIG_XFS_DEBUG builds still need xfs_read_agf to be non-static, oops. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-31 14:46:21 -05:00
Eric Sandeen	d96f8f891f	xfs: add more statics & drop some unused functions A lot more functions could be made static, but they need forward declarations; this does some easy ones, and also found a few unused functions in the process. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-31 14:46:20 -05:00
Steve French	2920ee2b47	[CIFS] potential NULL dereference in parse_DFS_referrals() memory allocation may fail, prevent a NULL dereference Pointed out by Roel Kluin CC: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-08-31 15:27:26 +00:00
Ryusuke Konishi	b1f1b8ce0a	nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key This will fix the following preempt count underflow reported from users with the title "[NILFS users] segctord problem" (Message-ID: <949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID: <debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>): WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0() Hardware name: HP Compaq 6530b (KR980UT#ABC) Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7 Call Trace: [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0 [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0 [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20 [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0 [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2] [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2] [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2] [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2] [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2] [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40 [<ffffffff803c06e0>] ? __up_write+0xe0/0x150 [<ffffffff80262959>] ? up_write+0x9/0x10 [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2] [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2] [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2] [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2] [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2] [<ffffffff80252633>] ? add_timer+0x13/0x20 [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90 [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffff8025e556>] kthread+0x56/0x90 [<ffffffff8020cdea>] child_rip+0xa/0x20 [<ffffffff8025e500>] ? kthread+0x0/0x90 [<ffffffff8020cde0>] ? child_rip+0x0/0x20 This problem was caused due to a missing radix_tree_preload() call in the retry path of nilfs_btnode_prepare_change_key() function. Reported-by: Eric A <eric225125@yahoo.com> Reported-by: Jerome Poulin <jeromepoulin@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Jerome Poulin <jeromepoulin@gmail.com> Cc: stable@kernel.org	2009-08-31 12:03:06 +09:00
Theodore Ts'o	b05ab1dc37	ext4: Limit number of links that can be created by ext4_link() In ext4_link we need to check using EXT4_LINK_MAX, and not EXT4_DIR_LINK_MAX(), since ext4_link() is creating hard links of regular files, and not directories. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-29 21:08:08 -04:00
Aneesh Kumar K.V	2c94eb86c6	ext4: Allow rename to create more than EXT4_LINK_MAX subdirectories Use EXT4_DIR_LINK_MAX so that rename() can move a directory into new parent directory without running into the EXT4_LINK_MAX limit. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-28 21:43:15 -04:00
Eric Paris	750a8870fe	inotify: update the group mask on mark addition Seperating the addition and update of marks in inotify resulted in a regression in that inotify never gets events. The inotify group mask is always 0. This mask should be updated any time a new mark is added. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-08-28 12:51:14 -04:00
Andy Adamson	468de9e54a	nfsd41: expand solo sequence check Compounds consisting of only a sequence operation don't need any additional caching beyond the sequence information we store in the slot entry. Fix nfsd4_is_solo_sequence to identify this case correctly. The additional check for a failed sequence in nfsd4_store_cache_entry() is redundant, since the nfsd4_is_solo_sequence call lower down catches this case. The final ce_cachethis set in nfsd4_sequence is also redundant. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-28 12:20:15 -04:00
Eric Paris	83cb10f0ef	inotify: fix length reporting and size checking `0db501bd06` introduced a regresion in that it now sends a nul terminator but the length accounting when checking for space or reporting to userspace did not take this into account. This corrects all of the rounding logic. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-08-28 11:57:55 -04:00
Theodore Ts'o	55ad63bf3a	ext4: fix extent sanity checking code with AGGRESSIVE_TEST The extents sanity-checking code depends on the ext4_ext_space_*() functions returning the maximum alloable size for eh_max; however, when the debugging #ifdef AGGRESSIVE_TEST is enabled to test the extent tree handling code, this prevents a normally created ext4 filesystem from being mounted with the errors: Aug 26 15:43:50 bsd086 kernel: [ 96.070277] EXT4-fs error (device sda8): ext4_ext_check_inode: bad header/extent in inode #8: too large eh_max - magic f30a, entries 1, max 4(3), depth 0(0) Aug 26 15:43:50 bsd086 kernel: [ 96.070526] EXT4-fs (sda8): no journal found Bug reported by Akira Fujita. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-28 10:40:33 -04:00
Brian Rogers	b962e7312a	inotify: do not send a block of zeros when no pathname is available When an event has no pathname, there's no need to pad it with a null byte and therefore generate an inotify_event sized block of zeros. This fixes a regression introduced by commit `0db501bd06` where my system wouldn't finish booting because some process was being confused by this. Signed-off-by: Brian Rogers <brian@xyzw.org> Signed-off-by: Eric Paris <eparis@redhat.com>	2009-08-28 10:03:06 -04:00
Tao Ma	a1b08e75df	ocfs2: invalidate dentry if its dentry_lock isn't initialized. In commit `a5a0a63092`, when ocfs2_attch_dentry_lock fails, we call an extra iput and reset dentry->d_fsdata to NULL. This resolve a bug, but it isn't completed and the dentry is still there. When we want to use it again, ocfs2_dentry_revalidate doesn't catch it and return true. That make future ocfs2_dentry_lock panic out. One bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162. The resolution is to add a check for dentry->d_fsdata in revalidate process and return false if dentry->d_fsdata is NULL, so that a new ocfs2_lookup will be called again. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-27 18:10:54 -07:00
Frank Filz	d8d0b85b11	nfsd4: remove ACE4_IDENTIFIER_GROUP flag from GROUP@ entry RFC 3530 says "ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these special identifiers. When encoding entries with these special identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero." It really shouldn't matter either way, but the point is that this flag is used to distinguish named users from named groups (since unix allows a group to have the same name as a user), so it doesn't really make sense to use it on a special identifier such as this.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-27 17:35:41 -04:00
Benny Halevy	aaf84eb95a	nfsd41: renew_client must be called under the state lock Until we work out the state locking so we can use a spin lock to protect the cl_lru, we need to take the state_lock to renew the client. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: Do not renew state on error] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: Simplify exit code] Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-27 17:17:40 -04:00
Linus Torvalds	9c504cadc4	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: Ensure we alwasy write the terminating NULL. inotify: fix locking around inotify watching in the idr inotify: do not BUG on idr entries at inotify destruction inotify: seperate new watch creation updating existing watches	2009-08-27 12:26:02 -07:00
Linus Torvalds	cf481442f2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: update documentation pointers 9p: remove unnecessary v9fses->options which duplicates the mount string net/9p: insulate the client against an invalid error code sent by a 9p server 9p: Add missing cast for the error return value in v9fs_get_inode 9p: Remove redundant inode uid/gid assignment 9p: Fix possible regressions when ->get_sb fails. 9p: Fix v9fs show_options 9p: Fix possible memleak in v9fs_inode_from fid. 9p: minor comment fixes 9p: Fix possible inode leak in v9fs_get_inode. 9p: Check for error in return value of v9fs_fid_add	2009-08-27 12:24:08 -07:00
David Howells	9886e836a6	AFS: Stop readlink() on AFS crashing due to NULL 'file' ptr kAFS crashes when asked to read a symbolic link because page_getlink() passes a NULL file pointer to read_mapping_page(), but afs_readpage() expects a file pointer from which to extract a key. Modify afs_readpage() to request the appropriate key from the calling process's keyrings if a file struct is not supplied with one attached. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-27 12:22:08 -07:00
Steven Whitehouse	8d8291ae93	GFS2: Remove no_formal_ino generating code The inum structure used throughout GFS2 has two fields. One no_addr is the disk block number of the inode in question and is used everywhere as the inode number. The other, no_formal_ino, is used only as the generation number for NFS. Historically the no_formal_ino field was set using a complicated system of one global and one per-node file containing inode numbers in order to ensure that each no_formal_ino was unique. Also this code made no provision for what would happen when eventually the (64 bit) numbers ran out. Now I know that is pretty unlikely to happen given the large space of numbers, but it is possible nevertheless. The only guarantee required for no_formal_ino is that, for any single inode, the same number doesn't get reused too quickly. We already have a generation number which is kept in the inode and initialised from a counter in the resource group (almost no overhead, since we have to touch the resource group anyway in order to allocate an inode in the first place). Aside from ensuring that we never use the value 0 in the no_formal_ino field, we can use that counter directly. As a result of that change, we lose about 200 lines of code and also gain about 10 creates/sec on the postmark benchmark (on my test machine). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-27 15:51:07 +01:00
Eric W. Biederman	0db501bd06	inotify: Ensure we alwasy write the terminating NULL. Before the rewrite copy_event_to_user always wrote a terqminating '\0' byte to user space after the filename. Since the rewrite that terminating byte was skipped if your filename is exactly a multiple of event_size. Ouch! So add one byte to name_size before we round up and use clear_user to set userspace to zero like /dev/zero does instead of copying the strange nul_inotify_event. I can't quite convince myself len_to_zero will never exceed 16 and even if it doesn't clear_user should be more efficient and a more accurate reflection of what the code is trying to do. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2009-08-27 08:02:10 -04:00
Eric Paris	dead537dd8	inotify: fix locking around inotify watching in the idr The are races around the idr storage of inotify watches. It's possible that a watch could be found from sys_inotify_rm_watch() in the idr, but it could be removed from the idr before that code does it's removal. Move the locking and the refcnt'ing so that these have to happen atomically. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-08-27 08:02:04 -04:00
Eric Paris	cf4374267f	inotify: do not BUG on idr entries at inotify destruction If an inotify watch is left in the idr when an fsnotify group is destroyed this will lead to a BUG. This is not a dangerous situation and really indicates a programming bug and leak of memory. This patch changes it to use a WARN and a printk rather than killing people's boxes. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-08-27 08:02:04 -04:00
Eric Paris	52cef7555a	inotify: seperate new watch creation updating existing watches There is nothing known wrong with the inotify watch addition/modification but this patch seperates the two code paths to make them each easy to verify as correct. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-08-27 08:02:04 -04:00
Steven Whitehouse	307cf6e63c	GFS2: Rename eattr.[ch] as xattr.[ch] Use the more conventional name for the extended attribute support code. Update all the places which care. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-26 18:51:04 +01:00
Steven Whitehouse	40b78a3223	GFS2: Clean up of extended attribute support This has been on my list for some time. We need to change the way in which we handle extended attributes to allow faster file creation times (by reducing the number of transactions required) and the extended attribute code is the main obstacle to this. In addition to that, the VFS provides a way to demultiplex the xattr calls which we ought to be using, rather than rolling our own. This patch changes the GFS2 code to use that VFS feature and as a result the code shrinks by a couple of hundred lines or so, and becomes easier to read. I'm planning on doing further clean up work in this area, but this patch is a good start. The cleaned up code also uses the more usual "xattr" shorthand, I plan to eliminate the use of "eattr" eventually and in the mean time it serves as a flag as to which bits of the code have been updated. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-26 18:41:32 +01:00
Eric Sandeen	a36b44988c	ext4: use ext4_grpblk_t more extensively unsigned short is potentially too small to track blocks within a group; today it is safe due to restrictions in e2fsprogs but we have _lo / _hi bits for group blocks with the intent to go up to 32 bits, so clean this up now. There are many more places where we use unsigned/int/unsigned int to contain a group block but this should at least fix all the short types. I added a few comments to the struct ext4_group_info definition as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-25 22:36:45 -04:00
Eric Sandeen	1927805e65	ext4: use variables not types in sizeofs() for allocations Precursor to changing some types; to keep things in sync, it seems better to allocate/memset based on the size of the variables we are using rather than on some disconnected basic type like "unsigned short" Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2009-08-25 22:36:25 -04:00
Aneesh Kumar K.V	a8526e84ac	ext4: Add missing unlock_new_inode() call in extent migration code We need to unlock the new inode before iput. This patch fixes the following warning when calling chattr +e to migrate a file to use extents. It also fixes problems in when e4defrag attempts to defragment an inode. [ 470.400044] ------------[ cut here ]------------ [ 470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a() [ 470.400072] Hardware name: N/A ..... ... [ 470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4 [ 470.400359] Call Trace: [ 470.400372] [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f [ 470.400385] [<ffffffff81037798>] warn_slowpath_null+0xf/0x11 [ 470.400395] [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a [ 470.400405] [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd [ 470.400413] [<ffffffff810b7083>] iput+0x61/0x65 [ 470.400455] [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4] [ 470.400492] [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4] [ 470.400507] [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82 [ 470.400517] [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9 [ 470.400527] [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf [ 470.400537] [<ffffffff810b2087>] sys_ioctl+0x51/0x74 [ 470.400549] [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b [ 470.400557] ---[ end trace ab85723542352dac ]--- Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-25 22:36:05 -04:00
Linus Torvalds	e9cab24cf3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: ext3: Improve error message that changing journaling mode on remount is not possible ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED	2009-08-25 09:47:36 -07:00
Ryusei Yamaguchi	ed2d8aed52	knfsd: Replace lock_kernel with a mutex in nfsd pool stats. lock_kernel() in knfsd was replaced with a mutex. The later commit `03cf6c9f49` ("knfsd: add file to export stats about nfsd pools") did not follow that change. This patch fixes the issue. Also move the get and put of nfsd_serv to the open and close methods (instead of start and stop methods) to allow atomic check and increment of reference count in the open method (where we can still return an error). Signed-off-by: Ryusei Yamaguchi <mandel59@gmail.com> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: Greg Banks <gnb@fmeh.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-25 12:39:37 -04:00
Frank Filz	55bb55dca0	nfsd: Fix unnecessary deny bits in NFSv4 ACL The group deny entries end up denying tcy even though tcy was just allowed by the allow entry. This appears to be due to: ace->access_mask = mask_from_posix(deny, flags); instead of: ace->access_mask = deny_mask_from_posix(deny, flags); Denying a previously allowed bit has no effect, so this shouldn't affect behavior, but it's ugly. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-24 20:01:22 -04:00
Trond Myklebust	7111dc7392	NFSv4: Fix an infinite looping problem with the nfs4_state_manager Commit `76db6d9500` (nfs41: add session setup to the state manager) introduces an infinite loop possibility in the NFSv4 state manager. By first checking nfs4_has_session() before clearing the NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets that flag, but it never gets cleared, and so the state manager loops. In fact commit `c3fad1b1aa` (nfs41: add session reset to state manager) causes this to happen every time we get a network partition error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-24 16:28:42 -07:00
Linus Torvalds	2584e7986f	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/dlm: Wait on lockres instead of erroring cancel requests ocfs2: Add missing lock name ocfs2: Don't oops in ocfs2_kill_sb on a failed mount ocfs2: release the buffer head in ocfs2_do_truncate. ocfs2: Handle quota file corruption more gracefully	2009-08-24 14:41:28 -07:00
Hugh Dickins	353d5c30c6	mm: fix hugetlb bug due to user_shm_unlock call 2.6.30's commit `8a0bdec194` removed user_shm_lock() calls in hugetlb_file_setup() but left the user_shm_unlock call in shm_destroy(). In detail: Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock() is not called in hugetlb_file_setup(). However, user_shm_unlock() is called in any case in shm_destroy() and in the following atomic_dec_and_lock(&up->__count) in free_uid() is executed and if up->__count gets zero, also cleanup_user_struct() is scheduled. Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set. However, the ref counter up->__count gets unexpectedly non-positive and the corresponding structs are freed even though there are live references to them, resulting in a kernel oops after a lots of shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set. Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the time of shm_destroy() may give a different answer from at the time of hugetlb_file_setup(). And fixed newseg()'s no_id error path, which has missed user_shm_unlock() ever since it came in 2.6.9. Reported-by: Stefan Huber <shuber2@gmail.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Tested-by: Stefan Huber <shuber2@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-24 12:53:01 -07:00
Paolo Bonzini	1329e3f2c8	dlm: use kernel_sendpage Using kernel_sendpage() is cleaner and safer than following sock->ops ourselves. Signed-off-by: Paolo Bonzini <bonzini@gnu.org> Signed-off-by: David Teigland <teigland@redhat.com>	2009-08-24 13:18:04 -05:00
Lars Marowsky-Bree	063c4c9963	dlm: fix connection close handling Closing a connection to a node can create problems if there are outstanding messages for that node. The problems include dlm_send spinning attempting to reconnect, or BUG from tcp_connect_to_sock() attempting to use a partially closed connection. To cleanly close a connection, we now first attempt to send any pending messages, cancel any remaining workqueue work, and flag the connection as closed to avoid reconnect attempts. Signed-off-by: Lars Marowsky-Bree <lmb@suse.de> Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2009-08-24 13:13:56 -05:00
Jan Kara	3c4cec6527	ext3: Improve error message that changing journaling mode on remount is not possible This patch makes the error message about changing journaling mode on remount more descriptive. Some people are going to hit this error now due to commit `bbae8bcc49` if they configure a kernel to default to data=writeback mode. The problem happens if they have data=ordered set for the root filesystem in /etc/fstab but not in the kernel command line (and they don't use initrd). Their filesystem then gets mounted as data=writeback by kernel but then their boot fails because init scripts won't be able to remount the filesystem rw. Better error message will hopefully make it easier for them to find the error in their setup and bother us less with error reports :). Signed-off-by: Jan Kara <jack@suse.cz>	2009-08-24 16:48:45 +02:00
Theodore Ts'o	6d41807614	ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED The old description for this configuration option was perhaps not completely balanced in terms of describing the tradeoffs of using a default of data=writeback vs. data=ordered. Despite the fact that old description very strongly recomended disabling this feature, all of the major distributions have elected to preserve the existing 'legacy' default, which is a strong hint that it perhaps wasn't telling the whole story. This revised description has been vetted by a number of ext3 developers as being better at informing the user about the tradeoffs of enabling or disabling this configuration feature. Cc: linux-ext4@vger.kernel.org Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2009-08-24 16:48:32 +02:00
Bob Peterson	d34843d0c4	GFS2: Add "-o errors=panic\|withdraw" mount options This patch adds "-o errors=panic" and "-o errors=withdraw" to the gfs2 mount options. The "errors=withdraw" option is today's current behaviour, meaning to withdraw from the file system if a non-serious gfs2 error occurs. The new "errors=panic" option tells gfs2 to force a kernel panic if a non-serious gfs2 file system error occurs. This may be useful, for example, where fabric-level fencing is used that has no way to reboot (such as fence_scsi). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-24 10:44:18 +01:00
Roel Kluin	cd0120751d	GFS2: jumping to wrong label? Also a gfs2_glock_dq() is required here. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-24 10:41:44 +01:00
Mimi Zohar	6777d773a4	kernel_read: redefine offset type vfs_read() offset is defined as loff_t, but kernel_read() offset is only defined as unsigned long. Redefine kernel_read() offset as loff_t. Cc: stable@kernel.org Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-08-24 14:58:23 +10:00
Chuck Lever	5eecfde615	NFS: Handle a zero-length auth flavor list Some releases of Linux rpc.mountd (nfs-utils 1.1.4 and later) return an empty auth flavor list if no sec= was specified for the export. This is notably broken server behavior. The new auth flavor list checking added in a recent commit rejects this case. The OpenSolaris client does too. The broken mountd implementation is already widely deployed. To avoid a behavioral regression, the kernel's mount client skips flavor checking (ie reverts to the pre-2.6.32 behavior) if mountd returns an empty flavor list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-23 23:43:57 -04:00
Artem Bityutskiy	8c6866b071	UBIFS: constify file and inode operations This patch adds 'const' qualifier to UBIFS xattr inode and file operations. Pointed-out-by: Julia Lawall <julia@diku.dk> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-08-22 11:54:51 +03:00
Linus Torvalds	8e9d78edea	Re-introduce page mapping check in mark_buffer_dirty() In commit `a8e7d49aa7` ("Fix race in create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test for a NULL page mapping unintentionally when some of the code inside __set_page_dirty() was moved to the callers. That removal generally didn't matter, since a filesystem would serialize truncation (which clears the page mapping) against writing (which marks the buffer dirty), so locking at a higher level (either per-page or an inode at a time) should mean that the buffer page would be stable. And indeed, nothing bad seemed to happen. Except it turns out that apparently reiserfs does something odd when under load and writing out the journal, and we have a number of bugzilla entries that look similar: http://bugzilla.kernel.org/show_bug.cgi?id=13556 http://bugzilla.kernel.org/show_bug.cgi?id=13756 http://bugzilla.kernel.org/show_bug.cgi?id=13876 and it looks like reiserfs depended on that check (the common theme seems to be "data=journal", and a journal writeback during a truncate). I suspect reiserfs should have some additional locking, but in the meantime this should get us back to the pre-2.6.29 behavior. Pattern-pointed-out-by: Roland Kletzing <devzero@web.de> Cc: stable@kernel.org (2.6.29 and 2.6.30) Cc: Jeff Mahoney <jeffm@suse.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-21 17:40:08 -07:00
Linus Torvalds	b57f92157e	Merge branch 'btrfs' of git://git.kernel.dk/linux-2.6-block * 'btrfs' of git://git.kernel.dk/linux-2.6-block: btrfs: fix inode rbtree corruption	2009-08-21 09:56:55 -07:00
Jeff Layton	fbf4665f41	nfsd: populate sin6_scope_id on callback address with scopeid from rq_addr on SETCLIENTID call When a SETCLIENTID call comes in, one of the args given is the svc_rqst. This struct contains an rq_addr field which holds the address that sent the call. If this is an IPv6 address, then we can use the sin6_scope_id field in this address to populate the sin6_scope_id field in the callback address. AFAICT, the rq_addr.sin6_scope_id is non-zero if and only if the client mounted the server's link-local address. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-21 11:27:44 -04:00
Jeff Layton	7077ecbabd	nfsd: add support for NFSv4 callbacks over IPv6 The framework to add this is all in place. Now, add the code to allow support for establishing a callback channel on an IPv6 socket. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-21 11:27:44 -04:00
Jeff Layton	aa9a4ec770	nfsd: convert nfs4_cb_conn struct to hold address in sockaddr_storage ...rather than as a separate address and port fields. This will be necessary for implementing callbacks over IPv6. Also, convert gen_callback to use the standard rpcuaddr2sockaddr routine rather than its own private one. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-21 11:27:43 -04:00
Jeff Layton	363168b4ea	nfsd: make nfs4_client->cl_addr a struct sockaddr_storage It's currently a __be32, which isn't big enough to hold an IPv6 address. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-21 11:27:43 -04:00
Jeff Layton	4516fc0454	sunrpc: add routine for comparing addresses lockd needs these sort of routines, as does the NFSv4 callback code. Move lockd's routines into common code and rename them so that they can be used by others. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-21 11:27:42 -04:00
J. Bruce Fields	e9dc122166	Merge branch 'nfs-for-2.6.32' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 into for-2.6.32-incoming Conflicts: net/sunrpc/cache.c	2009-08-21 11:27:29 -04:00
From: Nick Piggin	03e860bd9f	btrfs: fix inode rbtree corruption Node may not be inserted over existing node. This causes inode tree corruption and I was seeing crashes in inode_tree_del which I can not reproduce after this patch. The other way to fix this would be to tie inode lifetime in the rbtree with inode while not in freeing state. I had a look at this but it is not so trivial at this point. At least this patch gets things working again. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Acked-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-08-21 10:09:44 +02:00
Amerigo Wang	939a9421eb	vfs: allow file truncations when both suid and write permissions set When suid is set and the non-owner user has write permission, any writing into this file should be allowed and suid should be removed after that. However, current kernel only allows writing without truncations, when we do truncations on that file, we get EPERM. This is a bug. Steps to reproduce this bug: % ls -l rootdir/file1 -rwsrwsrwx 1 root root 3 Jun 25 15:42 rootdir/file1 % echo h > rootdir/file1 zsh: operation not permitted: rootdir/file1 % ls -l rootdir/file1 -rwsrwsrwx 1 root root 3 Jun 25 15:42 rootdir/file1 % echo h >> rootdir/file1 % ls -l rootdir/file1 -rwxrwxrwx 1 root root 5 Jun 25 16:34 rootdir/file1 Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Eric Sandeen <esandeen@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Cc: Eugene Teo <eteo@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Christoph Hellwig <hch@lst.de> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org>	2009-08-21 14:25:48 +10:00
Goldwyn Rodrigues	c795b33ba1	ocfs2/dlm: Wait on lockres instead of erroring cancel requests In case a downconvert is queued, and a flock receives a signal, BUG_ON(lockres->l_action != OCFS2_AST_INVALID) is triggered because a lock cancel triggers a dlmunlock while an AST is scheduled. To avoid this, allow a LKM_CANCEL to pass through, and let it wait on __dlm_wait_on_lockres(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Acked-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-20 18:42:34 -07:00
Jan Kara	a8b88d3d49	ocfs2: Add missing lock name There is missing name for NFSSync cluster lock. This makes lockdep unhappy because we end up passing NULL to lockdep when initializing lock key. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-20 16:41:53 -07:00
Jan Kara	e1af88a1ad	nfs: Remove reference to generic_osync_inode from a comment generic_file_direct_write() no longer calls generic_osync_inode() so remove the comment. CC: linux-nfs@vger.kernel.org CC: Neil Brown <neilb@suse.de> CC: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-19 19:48:08 -04:00
James Morris	ece13879e7	Merge branch 'master' into next Conflicts: security/Kconfig Manual fix. Signed-off-by: James Morris <jmorris@namei.org>	2009-08-20 09:18:42 +10:00
Trond Myklebust	7d7ea88289	NFS: Use the DNS resolver in the mount code. In the referral code, use it to look up the new server's ip address if the fs_locations attribute contains a hostname. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-19 18:22:15 -04:00
Trond Myklebust	e571cbf1a4	NFS: Add a dns resolver for use with NFSv4 referrals and migration The NFSv4 and NFSv4.1 protocols both allow for the redirection of a client from one server to another in order to support filesystem migration and replication. For full protocol support, we need to add the ability to convert a DNS host name into an IP address that we can feed to the RPC client. We'll reuse the sunrpc cache, now that it has been converted to work with rpc_pipefs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-19 18:22:15 -04:00
Trond Myklebust	6a396f67d2	Merge branch 'nfsv4_xdr_cleanups-for-2.6.32' into nfs-for-2.6.32 Conflicts: fs/nfs/nfs4xdr.c	2009-08-19 18:21:52 -04:00
Linus Torvalds	6c30c53fd5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix oopses with doubly mounted snapshots nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint()	2009-08-19 10:40:24 -07:00
KOSAKI Motohiro	0753ba01e1	mm: revert "oom: move oom_adj value" The commit `2ff05b2b` (oom: move oom_adj value) moveed the oom_adj value to the mm_struct. It was a very good first step for sanitize OOM. However Paul Menage reported the commit makes regression to his job scheduler. Current OOM logic can kill OOM_DISABLED process. Why? His program has the code of similar to the following. ... set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom / ... if (vfork() == 0) { set_oom_adj(0); / Invoked child can be killed */ execve("foo-bar-cmd"); } .... vfork() parent and child are shared the same mm_struct. then above set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also change oom_adj for vfork() parent. Then, vfork() parent (job scheduler) lost OOM immune and it was killed. Actually, fork-setting-exec idiom is very frequently used in userland program. We must not break this assumption. Then, this patch revert commit `2ff05b2b` and related commit. Reverted commit list --------------------- - commit `2ff05b2b4e` (oom: move oom_adj value from task_struct to mm_struct) - commit `4d8b9135c3` (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE) - commit `8123681022` (oom: only oom kill exiting tasks with attached memory) - commit `933b787b57` (mm: copy over oom_adj value at fork time) Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-18 16:31:13 -07:00
Jeff Layton	89a4eb4b66	vfs: make get_sb_pseudo set s_maxbytes to value that can be cast to signed get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast to a signed value. Fix it to use MAX_LFS_FILESIZE which casts properly to a positive signed value. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Steve French <smfrench@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-18 16:31:12 -07:00
Casey Dahlin	b5711b8e5a	dlm: fix double-release of socket in error exit path The last correction to the tcp_connect_to_sock error exit path, commit `a89d63a159`, can free an already freed socket, due to collision with a previous (incomplete) attempt to fix the same issue, commit `311f6fc77c`. Signed-off-by: Casey Dahlin <cdahlin@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2009-08-18 15:09:24 -05:00
Ryusuke Konishi	a924586036	nilfs2: fix oopses with doubly mounted snapshots will fix kernel oopses like the following: # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2 # umount /test1 # umount /test2 BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069 in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2 1 lock held by umount.nilfs2/3886: #0: (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c irq event stamp: 1219 hardirqs last enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119 hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119 softirqs last enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55 Call Trace: [<c1023549>] __might_sleep+0x107/0x10e [<c13603c0>] do_page_fault+0x246/0x397 [<c136017a>] ? do_page_fault+0x0/0x397 [<c135e753>] error_code+0x6b/0x70 [<c136017a>] ? do_page_fault+0x0/0x397 [<c104f805>] ? __lock_acquire+0x91/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050b2b>] lock_acquire+0xba/0xdd [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c135d4fe>] down_write+0x2a/0x46 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c104ea2c>] ? mark_held_locks+0x43/0x5b [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133 [<c104ece4>] ? trace_hardirqs_on+0xb/0xd [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2] [<c10b3352>] generic_shutdown_super+0x49/0xb8 [<c10b33de>] kill_block_super+0x1d/0x31 [<c10e6599>] ? vfs_quota_off+0x0/0x12 [<c10b398f>] deactivate_super+0x57/0x6c [<c10c4bc3>] mntput_no_expire+0x8c/0xb4 [<c10c5094>] sys_umount+0x27f/0x2a4 [<c10c50c6>] sys_oldumount+0xd/0xf [<c10031a4>] sysenter_do_call+0x12/0x38 ... This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify remaining sget() use"). In the patch, a new "put resource" function, nilfs_put_sbinfo() was introduced to delay freeing nilfs_sb_info struct. But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test() function to check the reference count, and it caused the nilfs_sb_info was freed when user mounted a snapshot twice. This bug also suggests there was unseen memory leak in usual mount /umount operations for nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-08-19 02:10:13 +09:00
Wengang Wang	970343cd49	GFS2: free disk inode which is deleted by remote node -V2 this patch is for the same problem that Benjamin Marzinski fixes at commit `b94a170e96` quotation of the original problem: ---cut here--- When a file is deleted from a gfs2 filesystem on one node, a dcache entry for it may still exist on other nodes in the cluster. If this happens, gfs2 will be unable to free this file on disk. Because of this, it's possible to have a gfs2 filesystem with no files on it and no free space. With this patch, when a node receives a callback notifying it that the file is being deleted on another node, it schedules a new workqueue thread to remove the file's dcache entry. ---end cut--- after applying Benjamin's patch, I think there is still a case in which the disk inode remains even when "no space" is hit. the case is that when running d_prune_aliases() against the inode, there are one or more dentries(aliases) which have reference count number > 0. in this case the dentries won't be pruned. and even later, the reference count becomes to 0, the dentries can still be cached in memory. unfortunately, no callback come again, things come back to the state before the callback runs. thus the on disk inode remains there until in memoryinode is removed for some other reason(shrinking inode cache or unmount the volume..). this patch is to remove those dentries when their reference count becomes to 0 and the inode is deleted by remote node. for implementation, gfs2_dentry_delete() is added as dentry_operations.d_delete. the function returns true when the inode is deleted by remote node. in dput(), gfs2_dentry_delete() is called and since it returns true, the dentry is unhashed from dcache and then removed. when all dentries are removed, the in memory inode get removed so that the on disk inode is freed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-18 10:29:39 +01:00
Zhang Qiang	1154ecbd2f	nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint() 'ns_cno' of structure 'the_nilfs' must be protected from segment writer, in other words, the caller of nilfs_get_checkpoint should hold read lock for nilfs->ns_segctor_sem. This patch adds the lock/unlock operations in nilfs_attach_checkpoint() when calling nilfs_cpfile_get_checkpoint(). Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-08-18 17:32:27 +09:00
Eric Sandeen	a13fb1a453	ext4: Add feature set check helper for mount & remount paths A user reported that although his root ext4 filesystem was mounting fine, other filesystems would not mount, with the: "Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF" error on his 32-bit box built without CONFIG_LBDAF. This is because the test at mount time for this situation was not being re-checked on remount, and the normal boot process makes an ro->rw transition, so this was being missed. Refactor to make a common helper function to test the filesystem features against the type of mount request (RO vs. RW) so that we stay consistent. Addresses Red-Hat-Bugzilla: #517650 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-18 00:20:23 -04:00
Eric Sandeen	38877f4e8d	simplify some logic in ext4_mb_normalize_request While reading through some of the mballoc code it seems that a couple spots in the size normalization function could be streamlined. The test for non-overlapping PAs can be or'd for the start & end conditions, and the tests for adjacent PAs can be else-if'd - it's essentially independently testing: if (A + B <= C) ... if (A > C) ... These cannot both be true so it seems like the else-if might be slightly more efficient and/or informative. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-17 23:55:24 -04:00
Eric Sandeen	0373130d5b	ext4: open-code ext4_mb_update_group_info ext4_mb_update_group_info is only called in one place, and it's extremely simple. There's no reason to have it in a separate function in a separate file as far as I can tell, it just obfuscates what's really going on. Perhaps it was intended to keep the grp->bb_* manipulation local to mballoc.c but we're already accessing other grp-> fields in balloc.c directly so this seems ok. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-17 23:51:29 -04:00
Eric Sandeen	bf43d84b18	ext4: reject too-large filesystems on 32-bit kernels ext4 will happily mount a > 16T filesystem on a 32-bit box, but this is not safe; writes to the block device will wrap past 16T and the page cache can't index past 16T (232 index * 4k pages). Adding another test to the existing "too many sectors" test should do the trick. Add a comment, a relevant return value, and fix the reference to the CONFIG_LBD(AF) option as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-17 23:48:51 -04:00
Jan Kara	487caeef9f	ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() During truncate we are sometimes forced to start a new transaction as the amount of blocks to be journaled is both quite large and hard to predict. So far we restarted a transaction while holding i_data_sem and that violates lock ordering because i_data_sem ranks below a transaction start (and it can lead to a real deadlock with ext4_get_blocks() mapping blocks in some page while having a transaction open). We fix the problem by dropping the i_data_sem before restarting the transaction and acquire it afterwards. It's slightly subtle that this works: 1) By the time ext4_truncate() is called, all the page cache for the truncated part of the file is dropped so get_block() should not be called on it (we only have to invalidate extent cache after we reacquire i_data_sem because some extent from not-truncated part could extend also into the part we are going to truncate). 2) Writes, migrate or defrag hold i_mutex so they are stopped for all the time of the truncate. This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-17 22:17:20 -04:00
Jan Kara	9599b0e597	jbd2: Annotate transaction start also for jbd2_journal_restart() lockdep annotation for a transaction start has been at the end of jbd2_journal_start(). But a transaction is also started from jbd2_journal_restart(). Move the lockdep annotation to start_this_handle() which covers both cases. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-17 21:23:17 -04:00
Mingming	553f900893	ext4: Show unwritten extent flag in ext4_ext_show_leaf() ext4_ext_show_leaf() will display the leaf extents when extent debugging is enabled. Printing out the unwritten bit is useful for debugging unwritten extent, allow us to see the unwritten extents vs written extents, after the unwritten extents are splitted or converted. Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2009-09-18 13:34:55 -04:00
Mingming	84fe3bef59	ext4: Compile warning fix when EXT_DEBUG enabled When EXT_DEBUG is enabled I received the following compile warning on PPC64: CC [M] fs/ext4/inode.o CC [M] fs/ext4/extents.o fs/ext4/extents.c: In function ‘ext4_ext_rm_leaf’: fs/ext4/extents.c:2097: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_lblk_t’ fs/ext4/extents.c: In function ‘ext4_ext_get_blocks’: fs/ext4/extents.c:2789: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’ fs/ext4/extents.c:2852: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘ext4_lblk_t’ fs/ext4/extents.c:2953: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’ CC [M] fs/ext4/migrate.o The patch fixes compile warning. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Index: linux-2.6.31-rc4/fs/ext4/extents.c ===================================================================	2009-09-01 08:44:37 -04:00
Theodore Ts'o	50797481a7	ext4: Avoid group preallocation for closed files Currently the group preallocation code tries to find a large (512) free block from which to do per-cpu group allocation for small files. The problem with this scheme is that it leaves the filesystem horribly fragmented. In the worst case, if the filesystem is unmounted and remounted (after a system shutdown, for example) we forget the fact that wee were using a particular (now-partially filled) 512 block extent. So the next time we try to allocate space for a small file, we will find another completely free 512 block chunk to allocate small files. Given that there are 32,768 blocks in a block group, after 64 iterations of "mount, write one 4k file in a directory, unmount", the block group will have 64 files, each separated by 511 blocks, and the block group will no longer have any free 512 completely free chunks of blocks for group preallocation space. So if we try to allocate blocks for a file that has been closed, such that we know the final size of the file, and the filesystem is not busy, avoid using group preallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-18 13:34:02 -04:00
Abhishek Kulkarni	4b53e4b500	9p: remove unnecessary v9fses->options which duplicates the mount string The mount options string is saved in sb->s_options. This patch removes the redundant duplicating of the mount options. Also, since we are not displaying anything special in show options, we replace v9fs_show_options with generic_show_options for now. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-08-17 16:42:28 -05:00
Abhishek Kulkarni	48559b4c30	9p: Add missing cast for the error return value in v9fs_get_inode Cast the error return value (ENOMEM) in v9fs_get_inode() to its correct type using ERR_PTR. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-08-17 16:35:08 -05:00
Jan Kara	5fd1318937	ocfs2: Don't oops in ocfs2_kill_sb on a failed mount If we fail to mount the filesystem, we have to be careful not to dereference uninitialized structures in ocfs2_kill_sb. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-17 14:32:24 -07:00
Abhishek Kulkarni	4d3297ca5b	9p: Remove redundant inode uid/gid assignment Remove a redundant update of inode's i_uid and i_gid after v9fs_get_inode() since the latter already sets up a new inode and sets the proper uid and gid values. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-08-17 16:27:58 -05:00
Abhishek Kulkarni	1b5ab3e867	9p: Fix possible regressions when ->get_sb fails. ->get_sb can fail causing some badness. this patch fixes * clear sb->fs_s_info in kill_sb. * deactivate_locked_super() calls kill_sb (v9fs_kill_super) which closes the destroys the client, clunks all its fids and closes the v9fs session. Attempting to do it twice will cause an oops. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-08-17 16:27:57 -05:00
Abhishek Kulkarni	4f4038328d	9p: Fix v9fs show_options Add the delimiter ',' before the options when they are passed and check if no option parameters are passed to prevent displaying NULL in /proc/mounts. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-08-17 16:27:57 -05:00
Abhishek Kulkarni	02bc35672b	9p: Fix possible memleak in v9fs_inode_from fid. Add missing p9stat_free in v9fs_inode_from_fid to avoid any possible leaks. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-08-17 16:27:57 -05:00
Abhishek Kulkarni	0e15597ebf	9p: minor comment fixes Fix the comments -- mostly the improper and/or missing descriptions of function parameters. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-08-17 16:27:57 -05:00
Abhishek Kulkarni	2bb541157f	9p: Fix possible inode leak in v9fs_get_inode. Add a missing iput when cleaning up if v9fs_get_inode fails after returning a valid inode. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-08-17 16:27:57 -05:00
Abhishek Kulkarni	50fb6d2bd7	9p: Check for error in return value of v9fs_fid_add Check if v9fs_fid_add was successful or not based on its return value. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-08-17 16:27:57 -05:00
Linus Torvalds	c58afec8b2	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix locking in xfs_iget_cache_hit	2009-08-17 13:39:30 -07:00
Eric Paris	08e53fcb0d	inotify: start watch descriptor count at 1 The inotify_add_watch man page specifies that inotify_add_watch() will return a non-negative integer. However, historically the inotify watches started at 1, not at 0. Turns out that the inotifywait program provided by the inotify-tools package doesn't properly handle a 0 watch descriptor. In `7e790dd5` we changed from starting at 1 to starting at 0. This patch starts at 1, just like in previous kernels, but also just like in previous kernels it's possible for it to wrap back to 0. This preserves the kernel functionality exactly like it was before the patch (neither method broke the spec) Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-17 13:37:37 -07:00
Eric Paris	cd94c8bbef	inotify: tail drop inotify q_overflow events In `f44aebcc` the tail drop logic of events with no file backing (q_overflow and in_ignored) was reversed so IN_IGNORED events would never be tail dropped. This now means that Q_OVERFLOW events are NOT tail dropped. The fix is to not tail drop IN_IGNORED, but to tail drop Q_OVERFLOW. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-17 13:37:37 -07:00
Eric Paris	eef3a116be	notify: unused event private race inotify decides if private data it passed to get added to an event was used by checking list_empty(). But it's possible that the event may have been dequeued and the private event removed so it would look empty. The fix is to use the return code from fsnotify_add_notify_event rather than looking at the list. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-17 13:37:37 -07:00
Tao Ma	60e2ec4866	ocfs2: release the buffer head in ocfs2_do_truncate. In ocfs2_do_truncate, we forget to release last_eb_bh which will cause memleak. So call brelse in the end. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-17 12:50:35 -07:00
Jan Kara	ada508274b	ocfs2: Handle quota file corruption more gracefully ocfs2_read_virt_blocks() does BUG when we try to read a block from a file beyond its end. Since this can happen due to filesystem corruption, it is not really an appropriate answer. Make ocfs2_read_quota_block() check the condition and handle it by calling ocfs2_error() and returning EIO. [ Modified to print ip_blkno in the error - Joel ] Reported-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-17 12:50:12 -07:00
Steven Whitehouse	31e54b01f3	GFS2: Add sysfs link to device This adds a link from the per-gfs2 sb sysfs directory to the block device upon which the filesystem is mounted. The link is called "device", strangely enough :-) Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-17 11:11:18 +01:00
Steven Whitehouse	05164e5b37	GFS2: Replace assertion with proper error handling One fewer assert, one more place we can recover gracefully if there is an error. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-17 11:06:43 +01:00
Steven Whitehouse	6050b9c74f	GFS2: Improve error handling in inode allocation A little while back, block allocation was given some improved error handling which meant that -EIO was returned in the case of there being a problem in the resource group data. In addition a message is printed explaning what went wrong and how to fix it. This extends that error handling so that it also covers inode allocation too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-17 11:05:31 +01:00
Steven Whitehouse	440d6da207	GFS2: Add some more info to uevents With each uevent, we now always include the journal ID. We can't call it JID since that is already in use by some of the individual events relating to recovery, so we use JOURNALID instead. We don't send the JOURNALID for spectator mounts, since there isn't one. Also the ADD event now has both RDONLY and SPECTATOR information to match that of the ONLINE event. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-17 11:05:00 +01:00
Steven Whitehouse	8633ecfaba	GFS2: Add online uevent to GFS2 We already have an offline uevent (used when a withdraw occurs) but no online uevent. This adds an online uevent so that userspace will be able to detect a successful mount by means other than not receiving a remove event after the add & recovery (change) uevents. It has also been added to the remount path as well - we can't use a change uevent there as older GFS2 userspace acts on change uevents according to the state that it thinks the fs is in, so we can't easily add any new ones. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-08-17 11:04:42 +01:00
Christoph Hellwig	bc990f5cb4	xfs: fix locking in xfs_iget_cache_hit The locking in xfs_iget_cache_hit currently has numerous problems: - we clear the reclaim tag without i_flags_lock which protects modifications to it - we call inode_init_always which can sleep with pag_ici_lock held (this is oss.sgi.com BZ #819) - we acquire and drop i_flags_lock a lot and thus provide no consistency between the various flags we set/clear under it This patch fixes all that with a major revamp of the locking in the function. The new version acquires i_flags_lock early and only drops it once we need to call into inode_init_always or before calling xfs_ilock. This patch fixes a bug seen in the wild where we race modifying the reclaim tag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-17 01:23:48 -05:00
Guillaume Knispel	b2add73dbf	poll/select: initialize triggered field of struct poll_wqueues The triggered field of struct poll_wqueues introduced in commit `5f820f648c` ("poll: allow f_op->poll to sleep"). It was first set to 1 in pollwake() (now __pollwake() ), tested and later set to 0 in poll_schedule_timeout(), but not initialized before. As a result when the process needs to sleep, triggered was likely to be non-zero even if pollwake() is not called before the first poll_schedule_timeout(), meaning schedule_hrtimeout_range() would not be called and an extra loop calling all ->poll() would be done. This patch initialize triggered to 0 in poll_initwait() so the ->poll() are not called twice before the process goes to sleep when it needs to. Signed-off-by: Guillaume Knispel <gknispel@proformatique.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-15 18:40:11 -07:00
Benny Halevy	cccddf4f55	nfs: nfs4xdr: optimize low level decoding do not increment decoding ptr if not needed. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 14:02:26 -04:00
Benny Halevy	c0eae66ece	nfs: nfs4xdr: get rid of READ_BUF Use xdr_inline_decode instead. Open code debug printout and error return. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 14:02:23 -04:00
Benny Halevy	2460ba57c4	nfs: nfs4xdr: simplify decode_exchange_id by reusing decode_opaque_inline Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 14:02:20 -04:00
Benny Halevy	99398d0655	nfs: nfs4xdr: get rid of COPYMEM Just directly call memcpy. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 14:02:17 -04:00
Benny Halevy	e78291e4e0	nfs: nfs4xdr: introduce decode_sessionid helper Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 14:02:14 -04:00
Benny Halevy	db942bbd09	nfs: nfs4xdr: introduce decode_verifier helper Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Trond: Fixed up an 'uninitialised variable' issue in decode_readdir] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:57:58 -04:00
Benny Halevy	07d30434cf	nfs: nfs4xdr: introduce decode_opaque_fixed and decode_stateid helpers Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:26:27 -04:00
Benny Halevy	686841b3cc	nfs: nfs4xdr: introduce print_overflow_msg Part fo the nfs4xdr cleanup. READ_BUF will go away. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:24:38 -04:00
Benny Halevy	c816fd3406	nfs: nfs4xdr: get rid of READTIME It has no users. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:24:32 -04:00
Benny Halevy	3ceb4dbb99	nfs: nfs4xdr: get rid of READ64 s/READ64$\(.)$/p = xdr_decode_hyper(p, \1)/ s/READ64$(.*)$/p = xdr_decode_hyper(p, &\1)/ Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:24:13 -04:00
Benny Halevy	6f723f7710	nfs: nfs4xdr: get rid of READ32 s/READ32$(.*)$/\1 = be32_to_cpup(p++)/ Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:23:58 -04:00
Benny Halevy	811652bd6e	nfs: nfs4xdr: merge xdr_encode_int+xdr_encode_opaque_fixed into xdr_encode_opaque use encode_string where appropriate. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:19:24 -04:00
Benny Halevy	345585132a	nfs: nfs4xdr: optimize low level encoding do not increment encoding ptr if not needed. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:18:03 -04:00
Benny Halevy	13c65ce900	nfs: nfs4xdr: change RESERVE_SPACE macro into a static helper In order to open code and expose the result pointer assignment. Alternatively, we can open code the call to xdr_reserve_space and do the BUG_ON an the error case at the call site. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:17:17 -04:00
Benny Halevy	2220f13a8b	nfs: nfs4xdr: encode_compound_hdr does not have to round up reserved bytes This is already done by xdr_reserve_space and since encode_compound_hdr is adding a byte count to "12" which is already word aligned, the xdr level rounding will work just as well. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:16:00 -04:00
Benny Halevy	42edd69812	nfs: nfs4xdr: optimize RESERVE_SPACE in encode_create_session and encode_sequence Coalesce multilpe constant RESERVE_SPACEs into one Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:15:20 -04:00
Benny Halevy	93f0cf2594	nfs: nfs4xdr: get rid of WRITEMEM s/WRITEMEM(/p = xdr_encode_opaque_fixed(p, / Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:13:58 -04:00
Benny Halevy	b95be5a976	nfs: nfs4xdr: get rid of WRITE64 s/WRITE64/p = xdr_encode_hyper(p, / Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:13:15 -04:00
Benny Halevy	e75bc1c89e	nfs: nfs4xdr: get rid of WRITE32 s/WRITE32/*p++ = cpu_to_be32/ Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-14 13:12:55 -04:00
Steven Whitehouse	d7e623da1a	GFS2: Fix permissions on "recover" file Although this file is only ever written and not read by userspace, it seems that the utils are opening this file O_RDWR, so we need to allow that. Also fixes the whitespace which seemed to be broken. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: David Teigland <teigland@redhat.com>	2009-08-14 14:04:46 +01:00
Linus Torvalds	bc7af9ba15	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (22 commits) ocfs2: Fix possible deadlock when extending quota file ocfs2: keep index within status_map[] ocfs2: Initialize the cluster we're writing to in a non-sparse extend ocfs2: Remove redundant BUG_ON in __dlm_queue_ast() ocfs2/quota: Release lock for error in ocfs2_quota_write. ocfs2: Define credit counts for quota operations ocfs2: Remove syncjiff field from quota info ocfs2: Fix initialization of blockcheck stats ocfs2: Zero out padding of on disk dquot structure ocfs2: Initialize blocks allocated to local quota file ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq() ocfs2: Make global quota files blocksize aligned ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records. ocfs2: Fix deadlock on umount ocfs2: Add extra credits and access the modified bh in update_edge_lengths. ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation ocfs2: Fix error return in ocfs2_write_cluster() ocfs2: Fix compilation warning for fs/ocfs2/xattr.c ocfs2: Initialize count in aio_write before generic_write_checks ocfs2: log the actual return value of ocfs2_file_aio_write() ...	2009-08-13 11:17:40 -07:00
David S. Miller	aa11d958d1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: arch/microblaze/include/asm/socket.h	2009-08-12 17:44:53 -07:00
Linus Torvalds	78efd1ddd9	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix spin_is_locked assert on uni-processor builds xfs: check for dinode realtime flag corruption use XFS_CORRUPTION_ERROR in xfs_btree_check_sblock xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_get xfs: switch to NOFS allocation under i_lock in xfs_readlink_bmap xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_set xfs: switch to NOFS allocation under i_lock in xfs_buf_associate_memory xfs: switch to NOFS allocation under i_lock in xfs_dir_cilookup_result xfs: switch to NOFS allocation under i_lock in xfs_da_buf_make xfs: switch to NOFS allocation under i_lock in xfs_da_state_alloc xfs: switch to NOFS allocation under i_lock in xfs_getbmap xfs: avoid memory allocation under m_peraglock in growfs code	2009-08-12 08:49:35 -07:00
Trond Myklebust	1ae88b2e44	NFS: Fix an O_DIRECT Oops... We can't call nfs_readdata_release()/nfs_writedata_release() without first initialising and referencing args.context. Doing so inside nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment() causes an Oops. We should rather be calling nfs_readdata_free()/nfs_writedata_free() in those cases. Looking at the O_DIRECT code, the "struct nfs_direct_req" is already referencing the nfs_open_context for us. Since the readdata and writedata structures carry a reference to that, we can simplify things by getting rid of the extra nfs_open_context references, so that we can replace all instances of nfs_readdata_release()/nfs_writedata_release(). Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-12 08:21:39 -07:00
Christoph Hellwig	a8914f3a6d	xfs: fix spin_is_locked assert on uni-processor builds Without SMP or preemption spin_is_locked always returns false, so we can't do an assert with it. Instead use assert_spin_locked, which does the right thing on all builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reported-by: Johannes Engel <jcnengel@googlemail.com> Tested-by: Johannes Engel <jcnengel@googlemail.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:08:27 -05:00
Christoph Hellwig	b89d4208de	xfs: check for dinode realtime flag corruption Ramon tested XFS with a modified version of fsfuzzer and hit a NULL pointer dereference in __xfs_get_blocks due to the RT device target pointer being NULL. To fix this reject inode with the realtime bit set on a a filesystem without an RT subvolume during inode read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reported-by: Ramon de Carvalho Valle <ramon@risesecurity.org> Tested-by: Ramon de Carvalho Valle <ramon@risesecurity.org> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:08:21 -05:00
Eric Sandeen	e0c222c411	use XFS_CORRUPTION_ERROR in xfs_btree_check_sblock In Red Hat Bug 512552 - Can't write to XFS mount during raid5 resync a user ran into corruption while resyncing a raid, and we failed a consistency test, but didn't get much more info; it'd be nice to call XFS_CORRUPTION_ERROR here so we can see the buffer contents. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:08:10 -05:00
Christoph Hellwig	ddd3a14e0f	xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_get xfs_attr_rmtval_get is always called with i_lock held, but i_lock is taken in reclaim context so all allocations under it must avoid recursions into the filesystem. Reported by the new reclaim context tracing in lockdep. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:08:01 -05:00
Christoph Hellwig	7b02ecb303	xfs: switch to NOFS allocation under i_lock in xfs_readlink_bmap xfs_readlink_bmap is called with i_lock held, but i_lock is taken in reclaim context so all allocations under it must avoid recursions into the filesystem. Reported by the new reclaim context tracing in lockdep. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:07:53 -05:00
Christoph Hellwig	10746e47e7	xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_set xfs_attr_rmtval_set is always called with i_lock held, and i_lock is taken in reclaim context so all allocations under it must avoid recursions into the filesystem. Reported by the new reclaim context tracing in lockdep. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:07:44 -05:00
Christoph Hellwig	36fae17a64	xfs: switch to NOFS allocation under i_lock in xfs_buf_associate_memory xfs_buf_associate_memory is used for setting up the spare buffer for the log wrap case in xlog_sync which can happen under i_lock when called from xfs_fsync. The i_lock mutex is taken in reclaim context so all allocations under it must avoid recursions into the filesystem. There are a couple more uses of xfs_buf_associate_memory in the log recovery code that are also affected by this, but I'd rather keep the code simple than passing on a gfp_mask argument. Longer term we should just stop requiring the memoery allocation in xlog_sync by some smaller rework of the buffer layer. Reported by the new reclaim context tracing in lockdep. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:07:38 -05:00
Christoph Hellwig	3f52c2f0a0	xfs: switch to NOFS allocation under i_lock in xfs_dir_cilookup_result xfs_dir_cilookup_result is always called with i_lock held, but i_lock is taken in reclaim context so all allocations under it must avoid recursions into the filesystem. Reported by the new reclaim context tracing in lockdep. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:07:23 -05:00
Christoph Hellwig	73195ed786	xfs: switch to NOFS allocation under i_lock in xfs_da_buf_make i_lock is taken in the reclaim context so all allocations under it must avoid recursions into the filesystem. Reported by the new reclaim context tracing in lockdep. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:07:14 -05:00
Christoph Hellwig	f41d7fb9da	xfs: switch to NOFS allocation under i_lock in xfs_da_state_alloc xfs_da_state_alloc is always called with i_lock held, but i_lock is taken in reclaim context so all allocations under it must avoid recursions into the filesystem. Reported by the new reclaim context tracing in lockdep. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:07:07 -05:00
Christoph Hellwig	ca35dcd6ca	xfs: switch to NOFS allocation under i_lock in xfs_getbmap xfs_getbmap allocates memory with i_lock held, but i_lock is taken in reclaim context so all allocations under it must avoid recursions into the filesystem. Reported by the new reclaim context tracing in lockdep. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:06:59 -05:00
Christoph Hellwig	0cc6eee130	xfs: avoid memory allocation under m_peraglock in growfs code Allocate the memory for the larger m_perag array before taking the per-AG lock as the per-AG lock can be taken under the i_lock which can be taken from reclaim context. Reported by the new reclaim context tracing in lockdep. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:06:51 -05:00
James Morris	8b4bfc7feb	Merge branch 'master' into next	2009-08-11 08:33:01 +10:00
Trond Myklebust	f884dcaead	Merge branch 'sunrpc_cache-for-2.6.32' into nfs-for-2.6.32	2009-08-10 17:45:58 -04:00
Trond Myklebust	976a6f921c	Merge branch 'patches_cel-for-2.6.32' into nfs-for-2.6.32	2009-08-10 17:45:50 -04:00
Jan Kara	b409d7a0ab	ocfs2: Fix possible deadlock when extending quota file In OCFS2, allocator locks rank above transaction start. Thus we cannot extend quota file from inside a transaction less we could deadlock. We solve the problem by starting transaction not already in ocfs2_acquire_dquot() but only in ocfs2_local_read_dquot() and ocfs2_global_read_dquot() and we allocate blocks to quota files before starting the transaction. In case we crash, quota files will just have a few blocks more but that's no problem since we just use them next time we extend the quota file. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-10 12:20:22 -07:00
Bartlomiej Zolnierkiewicz	e576e05a73	nfs: remove superfluous BUG_ON()s Subject: [PATCH] nfs: remove superfluous BUG_ON()s Remove duplicated BUG_ON()s from nfs[4]_create_server() (we make the same checks earlier in both functions). This takes care of the following entries from Dan's list: fs/nfs/client.c +1078 nfs_create_server(47) warning: variable derefenced before check 'server->nfs_client' fs/nfs/client.c +1079 nfs_create_server(48) warning: variable derefenced before check 'server->nfs_client->rpc_ops' fs/nfs/client.c +1363 nfs4_create_server(43) warning: variable derefenced before check 'server->nfs_client' fs/nfs/client.c +1364 nfs4_create_server(44) warning: variable derefenced before check 'server->nfs_ Reported-by: Dan Carpenter <error27@gmail.com> Cc: corbet@lwn.net Cc: eteo@redhat.com Cc: Julia Lawall <julia@diku.dk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-10 08:54:16 -04:00
Peter Staubach	38c73044f5	NFS: read-modify-write page updating Hi. I have a proposal for possibly resolving this issue. I believe that this situation occurs due to the way that the Linux NFS client handles writes which modify partial pages. The Linux NFS client handles partial page modifications by allocating a page from the page cache, copying the data from the user level into the page, and then keeping track of the offset and length of the modified portions of the page. The page is not marked as up to date because there are portions of the page which do not contain valid file contents. When a read call comes in for a portion of the page, the contents of the page must be read in the from the server. However, since the page may already contain some modified data, that modified data must be written to the server before the file contents can be read back in the from server. And, since the writing and reading can not be done atomically, the data must be written and committed to stable storage on the server for safety purposes. This means either a FILE_SYNC WRITE or a UNSTABLE WRITE followed by a COMMIT. This has been discussed at length previously. This algorithm could be described as modify-write-read. It is most efficient when the application only updates pages and does not read them. My proposed solution is to add a heuristic to decide whether to do this modify-write-read algorithm or switch to a read- modify-write algorithm when initially allocating the page in the write system call path. The heuristic uses the modes that the file was opened with, the offset in the page to read from, and the size of the region to read. If the file was opened for reading in addition to writing and the page would not be filled completely with data from the user level, then read in the old contents of the page and mark it as Uptodate before copying in the new data. If the page would be completely filled with data from the user level, then there would be no reason to read in the old contents because they would just be copied over. This would optimize for applications which randomly access and update portions of files. The linkage editor for the C compiler is an example of such a thing. I tested the attached patch by using rpmbuild to build the current Fedora rawhide kernel. The kernel without the patch generated about 269,500 WRITE requests. The modified kernel containing the patch generated about 261,000 WRITE requests. Thus, about 8,500 fewer WRITE requests were generated. I suspect that many of these additional WRITE requests were probably FILE_SYNC requests to WRITE a single page, but I didn't test this theory. The difference between this patch and the previous one was to remove the unneeded PageDirty() test. I then retested to ensure that the resulting system continued to behave as desired. Thanx... ps Signed-off-by: Peter Staubach <staubach@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-10 08:54:16 -04:00
Trond Myklebust	074cc1deec	NFS: Add a ->migratepage() aop for NFS Make NFS a bit more friendly to NUMA and memory hot removal... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-10 08:54:13 -04:00
Tejun Heo	1905b1bfc0	chrdev: implement __[un]register_chrdev() [un]register_chrdev() assume minor range 0-255. This patch adds __ prefixed versions which take @minorbase and @count explicitly. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2009-08-10 13:59:12 +02:00
Oleg Nesterov	704b836cbf	mm_for_maps: take ->cred_guard_mutex to fix the race with exec The problem is minor, but without ->cred_guard_mutex held we can race with exec() and get the new ->mm but check old creds. Now we do not need to re-check task->mm after ptrace_may_access(), it can't be changed to the new mm under us. Strictly speaking, this also fixes another very minor problem. Unless security check fails or the task exits mm_for_maps() should never return NULL, the caller should get either old or new ->mm. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-08-10 20:49:26 +10:00
Oleg Nesterov	00f89d2185	mm_for_maps: shift down_read(mmap_sem) to the caller mm_for_maps() takes ->mmap_sem after security checks, this looks strange and obfuscates the locking rules. Move this lock to its single caller, m_start(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-08-10 20:48:32 +10:00
Oleg Nesterov	13f0feafa6	mm_for_maps: simplify, use ptrace_may_access() It would be nice to kill __ptrace_may_access(). It requires task_lock(), but this lock is only needed to read mm->flags in the middle. Convert mm_for_maps() to use ptrace_may_access(), this also simplifies the code a little bit. Also, we do not need to take ->mmap_sem in advance. In fact I think mm_for_maps() should not play with ->mmap_sem at all, the caller should take this lock. With or without this patch, without ->cred_guard_mutex held we can race with exec() and get the new ->mm but check old creds. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-08-10 20:47:42 +10:00
Oleg Nesterov	896a6de40e	mm_for_maps: take ->cred_guard_mutex to fix the race with exec The problem is minor, but without ->cred_guard_mutex held we can race with exec() and get the new ->mm but check old creds. Now we do not need to re-check task->mm after ptrace_may_access(), it can't be changed to the new mm under us. Strictly speaking, this also fixes another very minor problem. Unless security check fails or the task exits mm_for_maps() should never return NULL, the caller should get either old or new ->mm. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-08-10 12:21:08 +10:00
Oleg Nesterov	d3c8660233	mm_for_maps: shift down_read(mmap_sem) to the caller mm_for_maps() takes ->mmap_sem after security checks, this looks strange and obfuscates the locking rules. Move this lock to its single caller, m_start(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-08-10 12:21:06 +10:00
Theodore Ts'o	4ba74d00a2	ext4: Fix bugs in mballoc's stream allocation mode The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all screwed up. These fields were getting unconditionally all the time, set even when stream allocation had not taken place, and if they were being used when the file was smaller than s_mb_stream_request, which is when the allocation should _not_ be doing stream allocation. Fix this by determining whether or not we stream allocation should take place once, in ext4_mb_group_or_file(), and setting a flag which gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found(). This simplifies the code and assures that we are consistently using (or not using) the stream allocation logic. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-09 22:01:13 -04:00
Theodore Ts'o	0ef90db93a	ext4: Display the mballoc flags in mb_history in hex instead of decimal Displaying the flags in base 16 makes it easier to see which flags have been set. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-09 16:46:13 -04:00
Theodore Ts'o	6ba495e925	ext4: Add configurable run-time mballoc debugging Allow mballoc debugging to be enabled at run-time instead of just at compile time. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-18 13:38:55 -04:00
Peng Tao	91cc219ad9	ext4: fix journal ref count in move_extent_par_page move_extent_par_page calls a_ops->write_begin() to increase journal handler's reference count. However, if either mext_replace_branches() or ext4_get_block fails, the increased reference count isn't decreased. This will cause a later attempt to umount of the fs to hang forever. The patch addresses the issue by calling ext4_journal_stop() if page is not NULL (which means a_ops->write_end() isn't invoked). Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-10 23:05:28 -04:00
Andreas Dilger	b1f485f20e	jbd2: round commit timer up to avoid uncommitted transaction fix jiffie rounding in jbd commit timer setup code. Rounding down could cause the timer to be fired before the corresponding transaction has expired. That transaction can stay not committed forever if no new transaction is created or expicit sync/umount happens. Signed-off-by: Alex Zhuravlev (Tomas) <alex.zhuravlev@sun.com> Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-10 22:51:53 -04:00
Roel Kluin	c333e073b7	ext4: remove redundant test on unsigned unsigned i_block cannot be less than 0. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-10 22:47:22 -04:00
Trond Myklebust	bc74b4f5e6	SUNRPC: Allow the cache_detail to specify alternative upcall mechanisms For events that are rare, such as referral DNS lookups, it makes limited sense to have a daemon constantly listening for upcalls on a channel. An alternative in those cases might simply be to run the app that fills the cache using call_usermodehelper_exec() and friends. The following patch allows the cache_detail to specify alternative upcall mechanisms for these particular cases. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:29 -04:00
Trond Myklebust	2da8ca26c6	NFSD: Clean up the idmapper warning... What part of 'internal use' is so hard to understand? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:26 -04:00
Trond Myklebust	7d217caca5	SUNRPC: Replace rpc_client->cl_dentry and cl_mnt, with a cl_path Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:24 -04:00
Trond Myklebust	b693ba4a33	SUNRPC: Constify rpc_pipe_ops... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:14:15 -04:00
Chuck Lever	4116092b92	NFSD: Support IPv6 addresses in write_failover_ip() In write_failover_ip(), replace the sscanf() with a call to the common sunrpc.ko presentation address parser. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:40 -04:00
Chuck Lever	c15128c5e4	lockd: Replace nsm_display_address() with rpc_ntop() Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:39 -04:00
Chuck Lever	b97a56747e	lockd: Replace nlm_clear_port() Clean up: Use shared rpc_set_port() function instead of nlm_clear_port(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:38 -04:00
Chuck Lever	ec6ee61250	NFS: Replace nfs_set_port() with rpc_set_port() Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:37 -04:00
Chuck Lever	53a0b9c4c9	NFS: Replace nfs_parse_ip_address() with rpc_pton() Clean up: Use the common routine now provided in sunrpc.ko for parsing mount addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:36 -04:00
Chuck Lever	a02d692611	SUNRPC: Provide functions for managing universal addresses Introduce a set of functions in the kernel's RPC implementation for converting between a socket address and either a standard presentation address string or an RPC universal address. The universal address functions will be used to encode and decode RPCB_FOO and NFSv4 SETCLIENTID arguments. The other functions are part of a previous promise to deliver shared functions that can be used by upper-layer protocols to display and manipulate IP addresses. The kernel's current address printf formatters were designed specifically for kernel to user-space APIs that require a particular string format for socket addresses, thus are somewhat limited for the purposes of sunrpc.ko. The formatter for IPv6 addresses, %pI6, does not support short-handing or scope IDs. Also, these printf formatters are unique per address family, so a separate formatter string is required for printing AF_INET and AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:34 -04:00
Chuck Lever	ec88f28d1e	NFS: Use the authentication flavor list returned by mountd Commit `a14017db` added support in the kernel's NFS mount client to decode the authentication flavor list returned by mountd. The NFS client can now use this list to determine whether the authentication flavor requested by the user is actually supported by the server. Note we don't actually negotiate the security flavor if none was specified by the user. Instead, we try to use AUTH_SYS, and fail if the server does not support it. This prevents us from negotiating an inappropriate security flavor (some servers list AUTH_NULL first). If the server does not support AUTH_SYS, the user must provide an appropriate security flavor by specifying the "sec=" mount option. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:32 -04:00
Chuck Lever	059f90b323	NFS: Fix auth flavor len accounting Previous logic in the NFS mount parsing code path assumed auth_flavor_len was set to zero for simple authentication flavors (like AUTH_UNIX), and 1 for compound flavors (like AUTH_GSS). At some earlier point (maybe even before the option parsers were merged?) specific checks for auth_flavor_len being zero were removed from the functions that validate the mount option that sets the mount point's authentication flavor. Since we are populating an array for authentication flavors, the auth_flavor_len should always be set to the number of flavors. Let's eliminate some cleverness here, and prepare for new logic that needs to know the number of flavors in the auth_flavors[] array. (auth_flavors[] is an array because at some point we want to allow a list of acceptable authentication flavors to be specified via the sec= mount option. For now it remains a single element array). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:31 -04:00
Chuck Lever	0b524123c9	NFS: Add ability to send MOUNTPROC_UMNT to the kernel's mountd client After certain failure modes of an NFS mount, an NFS client should send a MOUNTPROC_UMNT request to remove the just-added mount entry from the server's mount table. While no-one should rely on the accuracy of the server's mount table, sending a UMNT is simply being a good internet neighbor. Since NFS mount processing is handled in the kernel now, we will need a function in the kernel's mountd client that can post a MOUNTRPC_UMNT request, in order to handle these failure modes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:30 -04:00
Chuck Lever	f3f4f4ed26	NFS: Fix up new minorversion= option The new minorversion= mount option (commit `3fd5be9e`) was merged at the same time as the recent sloppy parser fixes (commit `a5a16bae`), so minorversion= still uses the old value parsing logic. If the minorversion= option specifies a bogus value, it should fail with "bad value" not "bad option." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:09:29 -04:00
Trond Myklebust	c140aa9135	NFSv4: Clean up the nfs.callback_tcpport option Tighten up the validity checking in param_set_port: check for NULL pointers. Ensure that the option shows up on 'modinfo' output. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:06:19 -04:00
Trond Myklebust	80e52aced1	NFSv4: Don't do idmapper upcalls for asynchronous RPC calls We don't want to cause rpciod to hang... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:06:19 -04:00
Trond Myklebust	62ab460cf5	NFSv4: Add 'server capability' flags for NFSv4 recommended attributes If the NFSv4 server doesn't support a POSIX attribute, the generic NFS code needs to know that, so that it don't keep trying to poll for it. However, by the same count, if the NFSv4 server does support that attribute, then we should ensure that the inode metadata is appropriately labelled as being untrusted. For instance, if we don't know the correct value of the file's uid, we should certainly not be caching ACLs or ACCESS results. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:06:19 -04:00
Trond Myklebust	a78cb57a10	NFSv4: Don't loop forever on state recovery failure... If the server is broken, then retrying forever won't fix it. We should just give up after a while, and return an error to the user. We set the number of retries to 10 for now... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:06:19 -04:00
Roel Kluin	dd8ac1da41	nfs: Keep index within mnt_errtbl[] Ensure that index i remains within array mnt_errtbl[] and mnt3_errtbl[]. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-08-09 15:06:19 -04:00
Linus Torvalds	d6a0967c90	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix balancing oops when invalidate_inode_pages2 returns EBUSY Btrfs: correct error-handling zlib error handling Btrfs: remove superfluous NULL pointer check in btrfs_rename() Btrfs: make sure the async caching thread advances the key Btrfs: fix btrfs_remove_from_free_space corner case	2009-08-07 19:03:09 -07:00
Linus Torvalds	fb385003c4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/hch/xfs-icache-races * git://git.kernel.org/pub/scm/linux/kernel/git/hch/xfs-icache-races: xfs: fix freeing of inodes not yet added to the inode cache vfs: add __destroy_inode vfs: fix inode_init_always calling convention	2009-08-07 18:53:44 -07:00
Roel Kluin	8a57a9dda7	ocfs2: keep index within status_map[] Do not exceed array status_map[] Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-07 13:16:50 -07:00
Sunil Mushran	e7432675f8	ocfs2: Initialize the cluster we're writing to in a non-sparse extend In a non-sparse extend, we correctly allocate (and zero) the clusters between the old_i_size and pos, but we don't zero the portions of the cluster we're writing to outside of pos<->len. It handles clustersize > pagesize and blocksize < pagesize. [Cleaned up by Joel Becker.] Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-07 13:16:23 -07:00
Yan Zheng	ceab36edd3	Btrfs: fix balancing oops when invalidate_inode_pages2 returns EBUSY invalidate_inode_pages2_range may return -EBUSY occasionally which results Oops. This patch fixes the issue by moving invalidate_inode_pages2_range into a loop and keeping calling it until the return value is not -EBUSY. The EBUSY return is temporary, and can happen when the btrfs release page function is unable to release a page because the EXTENT_LOCK bit is set. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-08-07 13:51:33 -04:00
Julia Lawall	60f2e8f8a0	Btrfs: correct error-handling zlib error handling find_zlib_workspace returns an ERR_PTR value in an error case instead of NULL. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @match exists@ expression x, E; statement S1, S2; @@ x = find_zlib_workspace(...) ... when != x = E ( * if (x == NULL \|\| ...) S1 else S2 \| * if (x == NULL && ...) S1 else S2 ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-08-07 13:51:33 -04:00
Bartlomiej Zolnierkiewicz	4baf8c9201	Btrfs: remove superfluous NULL pointer check in btrfs_rename() This takes care of the following entry from Dan's list: fs/btrfs/inode.c +4788 btrfs_rename(36) warning: variable derefenced before check 'old_inode' Reported-by: Dan Carpenter <error27@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eugene Teo <eteo@redhat.com> Cc: Julia Lawall <julia@diku.dk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-08-07 13:47:08 -04:00
Linus Torvalds	389623fef0	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: jffs2: Fix return value from jffs2_do_readpage_nolock() mtd: mtdblock: introduce mtdblks_lock mtd: remove 'SBC8240 Wind River' Device Driver Code mtd: OneNAND: OMAP2/3: free GPMC CS on module removal mtd: OneNAND: fix incorrect bufferram offset mtd: blkdevs: do not forget to get MTD devices mtd: fix the conversion from dev to mtd_info mtd: let include/linux/mtd/partitions.h stand on its own	2009-08-07 10:42:31 -07:00
Linus Torvalds	3440625d78	flat: fix uninitialized ptr with shared libs The new credentials code broke load_flat_shared_library() as it now uses an uninitialized cred pointer. Reported-by: Bernd Schmidt <bernds_cb1@t-online.de> Tested-by: Bernd Schmidt <bernds_cb1@t-online.de> Cc: Mike Frysinger <vapier@gentoo.org> Cc: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-07 10:39:57 -07:00
OGAWA Hirofumi	2d8dd38a5a	vfs: mnt_want_write_file(): fix special file handling I suspect that mnt_want_write_file() may have wrong assumption. I think mnt_want_write_file() is assuming it increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's special_file(), it is false? Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-07 10:39:56 -07:00
Eric Sandeen	69130c7cf9	compat_ioctl: hook up compat handler for FIEMAP ioctl The FIEMAP_IOC_FIEMAP mapping ioctl was missing a 32-bit compat handler, which means that 32-bit suerspace on 64-bit kernels cannot use this ioctl command. The structure is nicely aligned, padded, and sized, so it is just this simple. Tested w/ 32-bit ioctl tester (from Josef) on a 64-bit kernel on ext4. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Mark Lord <lkml@rtr.ca> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Josef Bacik <josef@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-07 10:39:56 -07:00
Christoph Hellwig	b36ec0428a	xfs: fix freeing of inodes not yet added to the inode cache When freeing an inode that lost race getting added to the inode cache we must not call into ->destroy_inode, because that would delete the inode that won the race from the inode cache radix tree. This patch uses splits a new xfs_inode_free helper out of xfs_ireclaim and uses that plus __destroy_inode to make sure we really only free the memory allocted for the inode that lost the race, and not mess with the inode cache state. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reported-by: Alex Samad <alex@samad.com.au> Reported-by: Andrew Randrianasulu <randrik@mail.ru> Reported-by: Stephane <sharnois@max-t.com> Reported-by: Tommy <tommy@news-service.com> Reported-by: Miah Gregory <mace@darksilence.net> Reported-by: Gabriel Barazer <gabriel@oxeva.fr> Reported-by: Leandro Lucarella <llucax@gmail.com> Reported-by: Daniel Burr <dburr@fami.com.au> Reported-by: Nickolay <newmail@spaces.ru> Reported-by: Michael Guntsche <mike@it-loops.com> Reported-by: Dan Carley <dan.carley+linuxkern-bugs@gmail.com> Reported-by: Michael Ole Olsen <gnu@gmx.net> Reported-by: Michael Weissenbacher <mw@dermichi.com> Reported-by: Martin Spott <Martin.Spott@mgras.net> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Michael Guntsche <mike@it-loops.com> Tested-by: Dan Carley <dan.carley+linuxkern-bugs@gmail.com> Tested-by: Christian Kujau <lists@nerdbynature.de>	2009-08-07 14:38:34 -03:00
Christoph Hellwig	2e00c97e2c	vfs: add __destroy_inode When we want to tear down an inode that lost the add to the cache race in XFS we must not call into ->destroy_inode because that would delete the inode that won the race from the inode cache radix tree. This patch provides the __destroy_inode helper needed to fix this, the actual fix will be in th next patch. As XFS was the only reason destroy_inode was exported we shift the export to the new __destroy_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>	2009-08-07 14:38:29 -03:00
Christoph Hellwig	54e346215e	vfs: fix inode_init_always calling convention Currently inode_init_always calls into ->destroy_inode if the additional initialization fails. That's not only counter-intuitive because inode_init_always did not allocate the inode structure, but in case of XFS it's actively harmful as ->destroy_inode might delete the inode from a radix-tree that has never been added. This in turn might end up deleting the inode for the same inum that has been instanciated by another process and cause lots of cause subtile problems. Also in the case of re-initializing a reclaimable inode in XFS it would free an inode we still want to keep alive. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>	2009-08-07 14:38:25 -03:00
James Morris	012a5299a2	Merge branch 'master' into next	2009-08-06 08:55:03 +10:00
Linus Torvalds	624720e09c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix missing unlock in error path of nilfs_mdt_write_page nilfs2: fix oops due to inconsistent state in page with discrete b-tree nodes	2009-08-04 15:28:23 -07:00
Linus Torvalds	849c9caa60	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Update readme to reflect forceuid mount parms cifs: Read buffer overflow cifs: show noforceuid/noforcegid mount options (try #2) cifs: reinstate original behavior when uid=/gid= options are specified [CIFS] Updates fs/cifs/CHANGES cifs: fix error handling in mount-time DFS referral chasing code	2009-08-04 15:27:56 -07:00
Anders Grafström	57ca7deb06	jffs2: Fix return value from jffs2_do_readpage_nolock() This fixes "kernel BUG at fs/jffs2/file.c:251!". This pseudocode hopefully illustrates the scenario that triggers it: jffs2_write_begin { jffs2_do_readpage_nolock { jffs2_read_inode_range { jffs2_read_dnode { Data CRC 33c102e9 != calculated CRC 0ef77e7b for node at 005d42e4 return -EIO; } } ClearPageUptodate(pg); return 0; } } jffs2_write_end { BUG_ON(!PageUptodate(pg)); } Signed-off-by: Anders Grafström <grfstrm@users.sourceforge.net> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-08-04 12:13:06 +01:00
Steve French	d098564f3b	[CIFS] Update readme to reflect forceuid mount parms Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-08-04 03:53:28 +00:00
Roel Kluin	24e2fb615f	cifs: Read buffer overflow Check whether index is within bounds before testing the element. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-08-03 21:01:32 +00:00
Jeff Layton	4486d6ede1	cifs: show noforceuid/noforcegid mount options (try #2 ) Since forceuid is the default, we now need to show when it's disabled. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-08-03 21:01:17 +00:00
Ryusuke Konishi	01a261e09a	nilfs2: fix missing unlock in error path of nilfs_mdt_write_page This adds a missing unlock of nilfs->ns_writer_mutex in nilfs_mdt_write_page() function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-08-02 22:24:15 +09:00
Ingo Molnar	8e9ed8b024	Merge branch 'sched/urgent' into sched/core Merge reason: avoid upcoming patch conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:23:57 +02:00
Jeff Layton	9b9d6b2434	cifs: reinstate original behavior when uid=/gid= options are specified This patch fixes the regression reported here: http://bugzilla.kernel.org/show_bug.cgi?id=13861 commit `4ae1507f6d` changed the default behavior when the uid= or gid= option was specified for a mount. The existing behavior was to always clobber the ownership information provided by the server when these options were specified. The above commit changed this behavior so that these options simply provided defaults when the server did not provide this information (unless "forceuid" or "forcegid" were specified) This patch reverts this change so that the default behavior is restored. It also adds "noforceuid" and "noforcegid" options to make it so that ownership information from the server is preserved, even when the mount has uid= or gid= options specified. It also adds a couple of printk notices that pop up when forceuid or forcegid options are specified without a uid= or gid= option. Reported-by: Tom Chiverton <bugzilla.kernel.org@falkensweb.com> Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-08-02 03:47:25 +00:00
Ryusuke Konishi	a97778457f	nilfs2: fix oops due to inconsistent state in page with discrete b-tree nodes Andrea Gelmini gave me a report that a kernel oops hit on a nilfs filesystem with a 1KB block size when doing rsync. This turned out to be caused by an inconsistency of dirty state between a page and its buffers storing b-tree node blocks. If the page had multiple buffers split over multiple logs, and if the logs were written at a time, a dirty flag remained in the page even every dirty flag in the buffers was cleared. This will fix the failure by dropping the dirty flag properly for pages with the discrete multiple b-tree nodes. Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: stable@kernel.org	2009-08-01 22:48:32 +09:00
Paul Wise	955234755c	vfat: change the default from shortname=lower to shortname=mixed Because, with "shortname=lower", copying one FAT filesystem tree to another FAT filesystem tree using Linux results in semantically different filesystems. (E.g.: Filenames which were once "all uppercase" are now "all lowercase"). So, this changes the default of "shortname=lower" to "shortname=mixed". Signed-off-by: Paul Wise <pabs3@bonedaddy.net> [change fat_show_options()] Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>	2009-08-01 21:35:25 +09:00
OGAWA Hirofumi	67638e4043	fat/nls: Fix handling of utf8 invalid char With utf8 option, vfat allowed the duplicated filenames. Normal nls returns -EINVAL for invalid char. But utf8s_to_utf16s() skipped the invalid char historically. So, this changes the utf8s_to_utf16s() directly to return -EINVAL for invalid char, because vfat is only user of it. mkdir /mnt/fatfs FILENAME=`echo -ne "invalidutf8char_\\0341_endofchar"` echo "Using filename: $FILENAME" dd if=/dev/zero of=fatfs bs=512 count=128 mkdosfs -F 32 fatfs mount -o loop,utf8 fatfs /mnt/fatfs touch "/mnt/fatfs/$FILENAME" umount /mnt/fatfs mount -o loop,utf8 fatfs /mnt/fatfs touch "/mnt/fatfs/$FILENAME" ls -l /mnt/fatfs umount /mnt/fatfs ---- And the output is: Using filename: invalidutf8char_\0341_endofchar 128+0 records in 128+0 records out 65536 bytes (66 kB) copied, 0.000388118 s, 169 MB/s mkdosfs 2.11 (12 Mar 2005) total 0 -rwxr-xr-x 1 root root 0 Jun 28 19:46 invalidutf8char__endofchar -rwxr-xr-x 1 root root 0 Jun 28 19:46 invalidutf8char__endofchar Tested-by: Marton Balint <cus@fazekas.hu> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>	2009-08-01 21:35:21 +09:00
Goldwyn Rodrigues	ab57a40827	ocfs2: Remove redundant BUG_ON in __dlm_queue_ast() We BUG_ON() the same thing twice. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-31 13:43:44 -07:00
Linus Torvalds	f5266cbd2f	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: bump up nr_to_write in xfs_vm_writepage xfs: reduce bmv_count in xfs_vn_fiemap	2009-07-31 12:17:37 -07:00
Chris Mason	013f1b12f4	Btrfs: make sure the async caching thread advances the key The async caching thread can end up looping forever if a given search puts it at the last key in a leaf. It will end up calling btrfs_next_leaf and then checking if it needs to politely drop the read semaphore. Most of the time this looping isn't noticed because it is able to make progress the next time around. But, during log replay, we wait on the async caching thread to finish, and the async thread is waiting on the commit, and no progress is really made. The fix used here is to copy the key out of the next leaf, that way our search lands there properly. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-31 14:57:55 -04:00
Josef Bacik	6606bb97e1	Btrfs: fix btrfs_remove_from_free_space corner case Yan Zheng hit a problem where we tried to remove some free space but failed because we couldn't find the free space entry. This is because the free space was held within a bitmap that had a starting offset well before the actual offset of the free space, and there were free space extents that were in the same range as that offset, so tree_search_offset returned with NULL because we couldn't find a free space extent that had that offset. This is fixed by making sure that if we fail to find the entry, we re-search again with bitmap_only set to 1 and do an offset_to_bitmap so we can get the appropriate bitmap. A similar problem happens in btrfs_alloc_from_bitmap for the clustering code, but that is not as bad since we will just go and redo our cluster allocation. Also this adds some debugging checks to make sure that the free space we are trying to remove from the bitmap is in fact there. This can probably go away after a while, but since this code is only used by the tree-logging stuff it would be nice to run with it for a while to make sure there are no problems. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-31 11:03:58 -04:00
Eric Sandeen	c8a4051c37	xfs: bump up nr_to_write in xfs_vm_writepage VM calculation for nr_to_write seems off. Bump it way up, this gets simple streaming writes zippy again. To be reviewed again after Jens' writeback changes. Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Cc: Chris Mason <chris.mason@oracle.com> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-07-31 00:57:11 -05:00
Eric Sandeen	97db39a1f6	xfs: reduce bmv_count in xfs_vn_fiemap commit `6321e3ed2a` caused the full bmv_count's worth of getbmapx structures to get allocated; telling it to do MAXEXTNUM was a bit insane, resulting in ENOMEM every time. Chop it down to something reasonable, the number of slots in the caller's input buffer. If this is too large the caller may get ENOMEM but the reason should not be a mystery, and they can try again with something smaller. We add 1 to the value because in the normal getbmap world, bmv_count includes the header and xfs_getbmap does: nex = bmv->bmv_count - 1; if (nex <= 0) return XFS_ERROR(EINVAL); Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Olaf Weber <olaf@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-07-31 00:56:58 -05:00
Linus Torvalds	ec6a8679fa	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: be more polite in the async caching threads Btrfs: preserve commit_root for async caching	2009-07-30 16:46:48 -07:00
Linus Torvalds	784b1d6b21	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: Fix loading of VAT inode when drive wrongly reports number of recorded blocks	2009-07-30 16:46:17 -07:00
Linus Torvalds	691c5f7374	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: quota: Silence lockdep on quota_on	2009-07-30 16:46:06 -07:00
Linus Torvalds	79af313317	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: remove dcache entries for remote deleted inodes GFS2: Fix incorrent statfs consistency check GFS2: Don't put unlikely reclaim candidates on the reclaim list. GFS2: Don't try and dealloc own inode GFS2: Fix panic in glock memory shrinker GFS2: keep statfs info in sync on grows GFS2: Shrink the shrinker	2009-07-30 16:45:37 -07:00
Tao Ma	e9956fae7d	ocfs2/quota: Release lock for error in ocfs2_quota_write. ocfs2_quota_write needs to release the lock if it fails to read quota block. So use "goto out" instead of "return err". Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-30 11:06:06 -07:00
Jan Kara	dee865656f	quota: Silence lockdep on quota_on Commit `d01730d74d` didn't completely fix the problem since we still take dqio_mutex and i_mutex in the wrong order. Move taking of i_mutex further down (luckily it's needed only for updating inode flags) below where dqio_mutex is taken. Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-30 17:31:23 +02:00
Jan Kara	4bf17af0db	udf: Fix loading of VAT inode when drive wrongly reports number of recorded blocks VAT inode is located in the last block recorded block of the medium. When the drive errorneously reports number of recorded blocks, we failed to load the VAT inode and thus mount the medium. This patch makes kernel try to read VAT inode from the last block of the device if it is different from the last recorded block. Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-30 17:28:26 +02:00
Chris Mason	f36f3042ea	Btrfs: be more polite in the async caching threads The semaphore used by the async caching threads can prevent a transaction commit, which can make the FS appear to stall. This releases the semaphore more often when a transaction commit is in progress. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-30 10:14:46 -04:00
Yan Zheng	276e680d19	Btrfs: preserve commit_root for async caching The async block group caching code uses the commit_root pointer to get a stable version of the extent allocation tree for scanning. This copy of the tree root isn't going to change and it significantly reduces the complexity of the scanning code. During a commit, we have a loop where we update the extent allocation tree root. We need to loop because updating the root pointer in the tree of tree roots may allocate blocks which may change the extent allocation tree. Right now the commit_root pointer is changed inside this loop. It is more correct to change the commit_root pointer only after all the looping is done. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-30 09:40:40 -04:00
Benjamin Marzinski	b94a170e96	GFS2: remove dcache entries for remote deleted inodes When a file is deleted from a gfs2 filesystem on one node, a dcache entry for it may still exist on other nodes in the cluster. If this happens, gfs2 will be unable to free this file on disk. Because of this, it's possible to have a gfs2 filesystem with no files on it and no free space. With this patch, when a node receives a callback notifying it that the file is being deleted on another node, it schedules a new workqueue thread to remove the file's dcache entry. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 11:01:03 +01:00
Benjamin Marzinski	6b94617024	GFS2: Fix incorrent statfs consistency check Since both linked and unlinked inodes are counted by rgd->rd_dinodes, It makes no sense to count them with the used data blocks (first check that I changed), it makes sense to count them with the linked inodes (second check), and it makes no sense to care if there are more unlinked inodes than linked ones. This fixes these errors. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 11:00:28 +01:00
Benjamin Marzinski	8ff22a6f9b	GFS2: Don't put unlikely reclaim candidates on the reclaim list. GFS2 was placing far too many glocks on the reclaim list that were not good candidates for freeing up from cache. These locks would sit there and repeatedly get scanned to see if they could be reclaimed, wasting a lot of time when there was memory pressure. This fix does more checks on the locks to see if they are actually likely to be removable from cache. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 11:00:09 +01:00
Steven Whitehouse	1e19a19584	GFS2: Don't try and dealloc own inode When searching for unlinked, but still allocated inodes during block allocation, avoid the block relating to the inode that is doing the allocation. This fixes a hang caused when an unlinked, but still open, inode tries to allocate some more blocks and lands up finding itself during the search for deallocatable inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 10:59:50 +01:00
Benjamin Marzinski	a51b56fff3	GFS2: Fix panic in glock memory shrinker It is possible for gfs2_shrink_glock_memory() to check a glock for demotion that's in the process of being freed by gfs2_glock_put(). In this case, gfs2_shrink_glock_memory() will acquire a new reference to this glock, and then try to free the glock itself when it drops the refernce. To solve this, gfs2_shrink_glock_memory() just needs to check if the glock is in the process of being freed, and if so skip it without ever unlocking the lru_lock. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 10:59:28 +01:00
Benjamin Marzinski	1946f70ab5	GFS2: keep statfs info in sync on grows GFS2 wasn't syncing its statfs info on grows. This causes a problem when you grow the filesystem on multiple nodes. GFS2 would calculate the new space based on the resource groups (which are always current), and then assume that the filesystem had grown the from the existing statfs size. If you grew the filesystem on two different nodes in a short time, the second node wouldn't see the statfs size change from the first node, and would assume that it was grown by a larger amount than it was. When all these changes were synced out, the total fileystem size would be incorrect (the first grow would be counted twice). This patch syncs makes GFS2 read in the statfs changes from disk before a grow, and write them out after the grow, while the master statfs inode is locked. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 10:52:33 +01:00
Steven Whitehouse	2163b1e616	GFS2: Shrink the shrinker This patch removes some of the special cases that the shrinker was trying to deal with. As a result we leave fewer items on the list and none at all which cannot be demoted. This makes the list scanning more efficient and solves some issues seen with large numbers of inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-07-30 10:52:14 +01:00
Steve French	5bd9052d79	[CIFS] Updates fs/cifs/CHANGES Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-30 02:26:14 +00:00
Linus Torvalds	91a5698d1f	Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Hibernate: Replace bdget call with simple atomic_inc of i_count PM / ACPI: HP G7000 Notebook needs a SCI_EN resume quirk	2009-07-29 19:15:18 -07:00
Catalin Marinas	5c80536523	fs/ramfs/file-nommu.c needs include/linux/sched.h This file makes use of various macros defined in files like asm/current.h or asm-generic/resource.h. All these files can be included via sched.h. The building of the !MMU ARM kernel (with additional patches) fails without this change. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-29 19:10:36 -07:00
Linus Torvalds	ccf5675a82	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: driver core: documentation: make it clear that sysfs is optional driver core: sysdev: do not send KOBJ_ADD uevent if kobject_init_and_add fails Dynamic debug: fix typo: -/-> driver core: firmware_class:fix memory leak of page pointers array sysfs: fix hardlink count on device_move	2009-07-29 12:29:08 -07:00
Alan Jenkins	dddac6a7b4	PM / Hibernate: Replace bdget call with simple atomic_inc of i_count Create bdgrab(). This function copies an existing reference to a block_device. It is safe to call from any context. Hibernation code wishes to copy a reference to the active swap device. Right now it calls bdget() under a spinlock, but this is wrong because bdget() can sleep. It doesn't need a full bdget() because we already hold a reference to active swap devices (and the spinlock protects against swapoff). Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13827 Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2009-07-29 21:07:55 +02:00
Linus Torvalds	655c5d8fc1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (22 commits) Btrfs: Fix async caching interaction with unmount Btrfs: change how we unpin extents Btrfs: Correct redundant test in add_inode_ref Btrfs: find smallest available device extent during chunk allocation Btrfs: clear all space_info->full after removing a block group Btrfs: make flushoncommit mount option correctly wait on ordered_extents Btrfs: Avoid delayed reference update looping Btrfs: Fix ordering of key field checks in btrfs_previous_item Btrfs: find_free_dev_extent doesn't handle holes at the start of the device Btrfs: Remove code duplication in comp_keys Btrfs: async block group caching Btrfs: use hybrid extents+bitmap rb tree for free space Btrfs: Fix crash on read failures at mount Btrfs: remove of redundant btrfs_header_level Btrfs: adjust NULL test Btrfs: Remove broken sanity check from btrfs_rmap_block() Btrfs: convert nested spin_lock_irqsave to spin_lock Btrfs: make sure all dirty blocks are written at commit time Btrfs: fix locking issue in btrfs_find_next_key Btrfs: fix double increment of path->slots[0] in btrfs_next_leaf ...	2009-07-28 14:27:06 -07:00
Ramon de Carvalho Valle	f151cd2c54	eCryptfs: parse_tag_3_packet check tag 3 packet encrypted key size The parse_tag_3_packet function does not check if the tag 3 packet contains a encrypted key size larger than ECRYPTFS_MAX_ENCRYPTED_KEY_BYTES. Signed-off-by: Ramon de Carvalho Valle <ramon@risesecurity.org> [tyhicks@linux.vnet.ibm.com: Added printk newline and changed goto to out_free] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: stable@kernel.org (2.6.27 and 30) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-28 14:26:06 -07:00
Tyler Hicks	6352a29305	eCryptfs: Check Tag 11 literal data buffer size Tag 11 packets are stored in the metadata section of an eCryptfs file to store the key signature(s) used to encrypt the file encryption key. After extracting the packet length field to determine the key signature length, a check is not performed to see if the length would exceed the key signature buffer size that was passed into parse_tag_11_packet(). Thanks to Ramon de Carvalho Valle for finding this bug using fsfuzzer. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: stable@kernel.org (2.6.27 and 30) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-28 14:26:06 -07:00
Peter Oberparleiter	0f58b44582	sysfs: fix hardlink count on device_move Update directory hardlink count when moving kobjects to a new parent. Fixes the following problem which occurs when several devices are moved to the same parent and then unregistered: > ls -laF /sys/devices/css0/defunct/ > total 0 > drwxr-xr-x 4294967295 root root 0 2009-07-14 17:02 ./ > drwxr-xr-x 114 root root 0 2009-07-14 17:02 ../ > drwxr-xr-x 2 root root 0 2009-07-14 17:01 power/ > -rw-r--r-- 1 root root 4096 2009-07-14 17:01 uevent Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-07-28 13:45:21 -07:00
Andy Adamson	abfabf8caf	nfsd41: encode replay sequence from the slot values The sequence operation is not cached; always encode the sequence operation on a replay from the slot table and session values. This simplifies the sessions replay logic in nfsd4_proc_compound. If this is a replay of a compound that was specified not to be cached, return NFS4ERR_RETRY_UNCACHED_REP. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 16:12:34 -04:00
Andy Adamson	c8647947f8	nfsd41: rename nfsd4_enc_uncached_replay This function is only used for SEQUENCE replay. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:30:36 -04:00
Andy Adamson	49557cc74c	nfsd41: Use separate DRC for setclientid Instead of trying to share the generic 4.1 reply cache code for the CREATE_SESSION reply cache, it's simpler to handle CREATE_SESSION separately. The nfs41 single slot clientid DRC holds the results of create session processing. CREATE_SESSION can be preceeded by a SEQUENCE operation (an embedded CREATE_SESSION) and the create session single slot cache must be maintained. nfsd4_replay_cache_entry() and nfsd4_store_cache_entry() do not implement the replay of an embedded CREATE_SESSION. The clientid DRC slot does not need the inuse, cachethis or other fields that the multiple slot session cache uses. Replace the clientid DRC cache struct nfs4_slot cache with a new nfsd4_clid_slot cache. Save the xdr struct nfsd4_create_session into the cache at the end of processing, and on a replay, replace the struct for the replay request with the cached version all while under the state lock. nfsd4_proc_compound will handle both the solo and embedded CREATE_SESSION case via the normal use of encode_operation. Errors that do not change the create session cache: A create session NFS4ERR_STALE_CLIENTID error means that a client record (and associated create session slot) could not be found and therefore can't be changed. NFSERR_SEQ_MISORDERED errors do not change the slot cache. All other errors get cached. Remove the clientid DRC specific check in nfs4svc_encode_compoundres to put the session only if cstate.session is set which will now always be true. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:30:29 -04:00
Andy Adamson	88e588d56a	nfsd41: change check_slot_seqid parameters For separation of session slot and clientid slot processing. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:30:23 -04:00
Andy Adamson	5261dcf8eb	nfsd41: remove redundant forechannel max requests check This check is done in set_forechannel_maxreqs. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:30:15 -04:00
Andy Adamson	0c193054a4	nfsd41: hange from page to memory based drc limits NFSD_SLOT_CACHE_SIZE is the size of all encoded operation responses (excluding the sequence operation) that we want to cache. For now, keep NFSD_SLOT_CACHE_SIZE at PAGE_SIZE. It will be reduced when the DRC is changed from page based to memory based. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:30:05 -04:00
Andy Adamson	6a14dd1a4f	nfsd41: reserve less memory for DRC Also remove a slightly misleading comment. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:29:59 -04:00
Andy Adamson	b101ebbc39	nfsd41: minor set_forechannel_maxreqs cleanup Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:29:54 -04:00
Andy Adamson	be98d1bbd1	nfsd41: reclaim DRC memory on session free This fixes a leak which would eventually lock out new clients. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:29:48 -04:00
J. Bruce Fields	413d63d710	nfsd: minor write_pool_threads exit cleanup Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:29:41 -04:00
Eric Sesterhenn	2522a776c1	Fix memory leak in write_pool_threads kmemleak produces the following warning unreferenced object 0xc9ec02a0 (size 8): comm "cat", pid 19048, jiffies 730243 backtrace: [<c01bf970>] create_object+0x100/0x240 [<c01bfadb>] kmemleak_alloc+0x2b/0x60 [<c01bcd4b>] __kmalloc+0x14b/0x270 [<c02fd027>] write_pool_threads+0x87/0x1d0 [<c02fcc08>] nfsctl_transaction_write+0x58/0x70 [<c02fcc6f>] nfsctl_transaction_read+0x4f/0x60 [<c01c2574>] vfs_read+0x94/0x150 [<c01c297d>] sys_read+0x3d/0x70 [<c0102d6b>] sysenter_do_call+0x12/0x32 [<ffffffff>] 0xffffffff write_pool_threads() only frees nthreads on error paths, in the success case we leak it. Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-28 14:29:34 -04:00
Yan Zheng	f25784b35f	Btrfs: Fix async caching interaction with unmount - don't stop the caching thread until btrfs_commit_super return. - if caching is interrupted by umount, set last to (u64)-1. otherwise the un-scanned range of block group will be considered as free extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-28 08:41:57 -04:00
Peng Tao	785b4b3a5a	ext4: fix build warning when EXT4FS_DEBUG is on When compiling with EXT4FS_DEBUG on, gcc will complain with following warnings: linux-2.6/fs/ext4/ialloc.c: In function ‘ext4_count_free_inodes’: linux-2.6/fs/ext4/ialloc.c:1192: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_group_t’ So add a type cast to suppress it. Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-27 21:44:40 -04:00
Jeff Layton	7b91e2661a	cifs: fix error handling in mount-time DFS referral chasing code If the referral is malformed or the hostname can't be resolved, then the current code generates an oops. Fix it to handle these errors gracefully. Reported-by: Sandro Mathys <sm@sandro-mathys.ch> Acked-by: Igor Mammedov <niallain@gmail.com> CC: Stable <stable@kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-28 00:51:59 +00:00
Linus Torvalds	fc013a5885	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: use GFP_NOFS under potential memory pressure fsnotify: fix inotify tail drop check with path entries inotify: check filename before dropping repeat events fsnotify: use def_bool in kconfig instead of letting the user choose inotify: fix error paths in inotify_update_watch inotify: do not leak inode marks in inotify_add_watch inotify: drop user watch count when a watch is removed	2009-07-27 15:54:10 -07:00
Linus Torvalds	a9355cf8e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: Fix early release of acl in jfs_get_acl	2009-07-27 12:15:56 -07:00
Linus Torvalds	2bc20d09b0	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: jbd: fix race between write_metadata_buffer and get_write_access ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() jbd: Fix a race between checkpointing code and journal_get_write_access() ext3: Fix truncation of symlinks after failed write jbd: Fail to load a journal if it is too short	2009-07-27 12:12:10 -07:00
Linus Torvalds	c7425eb481	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] fix sparse warning cifs: fix sb->s_maxbytes so that it casts properly to a signed value cifs: disable serverino if server doesn't support it	2009-07-27 12:11:43 -07:00
Josef Bacik	68b38550dd	Btrfs: change how we unpin extents We are racy with async block caching and unpinning extents. This patch makes things much less complicated by only unpinning the extent if the block group is cached. We check the block_group->cached var under the block_group->lock spin lock. If it is set to BTRFS_CACHE_FINISHED then we update the pinned counters, and unpin the extent and add the free space back. If it is not set to this, we start the caching of the block group so the next time we unpin extents we can unpin the extent. This keeps us from racing with the async caching threads, lets us kill the fs wide async thread counter, and keeps us from having to set DELALLOC bits for every extent we hit if there are caching kthreads going. One thing that needed to be changed was btrfs_free_super_mirror_extents. Now instead of just looking for LOCKED extents, we also look for DIRTY extents, since we could have left some extents pinned in the previous transaction that will never get freed now that we are unmounting, which would cause us to leak memory. So btrfs_free_super_mirror_extents has been changed to btrfs_free_pinned_extents, and it will clear the extents locked for the super mirror, and any remaining pinned extents that may be present. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-27 13:57:01 -04:00
Julia Lawall	631c07c8d1	Btrfs: Correct redundant test in add_inode_ref dir has already been tested. It seems that this test should be on the recently returned value inode. A simplified version of the semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-27 13:57:00 -04:00
Chris Mason	9779b72f05	Btrfs: find smallest available device extent during chunk allocation Allocating new block group is easy when the disk has plenty of space. But things get difficult as the disk fills up, especially if the FS has been run through btrfs-vol -b. The balance operation is likely to make the total bytes available on the device greater than the largest extent we'll actually be able to allocate. But the device extent allocation code incorrectly assumes that a device with 5G free will be able to allocate a 5G extent. It isn't normally a problem because device extents don't get freed unless btrfs-vol -b is run. This fixes the device extent allocator to remember the largest free extent it can find, and then uses that value as a fallback. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 16:41:41 -04:00
Chris Mason	283bb1979f	Btrfs: clear all space_info->full after removing a block group Btrfs allocates individual extents from block groups, and each block group has a specific type. It may hold metadata, data mirrored or striped etc. When we balance space (btrfs-vol -b) or remove a drive (btrfs-vol -r) we free block groups. Once a block group is freed, the space it was using on the device may be available for use by new block groups. btrfs_remove_block_group was clearing the flag that said 'our devices are full, don't even try to allocate new block groups', but it was only clearing that flag for a specific type of block group. This commit clears the full flag for all of the types of block groups, making it much more likely that we'll be able to balance space when the drive is close to full. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 16:30:55 -04:00
Sage Weil	ebecd3d9d2	Btrfs: make flushoncommit mount option correctly wait on ordered_extents The commit_transaction call to wait_ordered_extents when snap_pending passes nocow_only=1 to process only NOCOW or PREALLOC extents. This isn't correct for the 'flushoncommit' mode, as it skips extents we just started IO on in start_delalloc_inodes. So, in the flushoncommit case, wait on all ordered extents. Otherwise, only pass the nocow_only flag to wait_ordered_extents if snap_pending. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 13:17:44 -04:00
Yan Zheng	d717aa1d31	Btrfs: Avoid delayed reference update looping btrfs_split_leaf and btrfs_del_items can end up in a loop where one is constantly spliting a given leaf and the other is constantly merging it back with the adjacent nodes. There is a better fix for this, but in the interest of something small, this patch just changes btrfs_del_items back to balancing less often. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 12:42:46 -04:00
Yan Zheng	0a4eefbb74	Btrfs: Fix ordering of key field checks in btrfs_previous_item Check objectid of item before checking the item type, otherwise we may return zero for a key that is actually too low. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 11:22:47 -04:00
Yan Zheng	1fcbac581b	Btrfs: find_free_dev_extent doesn't handle holes at the start of the device find_free_dev_extent does not properly handle the case where the device is not complete free, and there is a free extent at the beginning of the device. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 11:22:47 -04:00
Diego Calleja	20736abaa3	Btrfs: Remove code duplication in comp_keys comp_keys is duplicating what is done in btrfs_comp_cpu_keys, so just call it. Signed-off-by: Diego Calleja <diegocg@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 11:22:46 -04:00
Josef Bacik	817d52f8db	Btrfs: async block group caching This patch moves the caching of the block group off to a kthread in order to allow people to allocate sooner. Instead of blocking up behind the caching mutex, we instead kick of the caching kthread, and then attempt to make an allocation. If we cannot, we wait on the block groups caching waitqueue, which the caching kthread will wake the waiting threads up everytime it finds 2 meg worth of space, and then again when its finished caching. This is how I tested the speedup from this mkfs the disk mount the disk fill the disk up with fs_mark unmount the disk mount the disk time touch /mnt/foo Without my changes this took 11 seconds on my box, with these changes it now takes 1 second. Another change thats been put in place is we lock the super mirror's in the pinned extent map in order to keep us from adding that stuff as free space when caching the block group. This doesn't really change anything else as far as the pinned extent map is concerned, since for actual pinned extents we use EXTENT_DIRTY, but it does mean that when we unmount we have to go in and unlock those extents to keep from leaking memory. I've also added a check where when we are reading block groups from disk, if the amount of space used == the size of the block group, we go ahead and mark the block group as cached. This drastically reduces the amount of time it takes to cache the block groups. Using the same test as above, except doing a dd to a file and then unmounting, it used to take 33 seconds to umount, now it takes 3 seconds. This version uses the commit_root in the caching kthread, and then keeps track of how many async caching threads are running at any given time so if one of the async threads is still running as we cross transactions we can wait until its finished before handling the pinned extents. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 09:23:39 -04:00
Josef Bacik	9630308170	Btrfs: use hybrid extents+bitmap rb tree for free space Currently btrfs has a problem where it can use a ridiculous amount of RAM simply tracking free space. As free space gets fragmented, we end up with thousands of entries on an rb-tree per block group, which usually spans 1 gig of area. Since we currently don't ever flush free space cache back to disk this gets to be a bit unweildly on large fs's with lots of fragmentation. This patch solves this problem by using PAGE_SIZE bitmaps for parts of the free space cache. Initially we calculate a threshold of extent entries we can handle, which is however many extent entries we can cram into 16k of ram. The maximum amount of RAM that should ever be used to track 1 gigabyte of diskspace will be 32k of RAM, which scales much better than we did before. Once we pass the extent threshold, we start adding bitmaps and using those instead for tracking the free space. This patch also makes it so that any free space thats less than 4 * sectorsize we go ahead and put into a bitmap. This is nice since we try and allocate out of the front of a block group, so if the front of a block group is heavily fragmented and then has a huge chunk of free space at the end, we go ahead and add the fragmented areas to bitmaps and use a normal extent entry to track the big chunk at the back of the block group. I've also taken the opportunity to revamp how we search for free space. Previously we indexed free space via an offset indexed rb tree and a bytes indexed rb tree. I've dropped the bytes indexed rb tree and use only the offset indexed rb tree. This cuts the number of tree operations we were doing previously down by half, and gives us a little bit of a better allocation pattern since we will always start from a specific offset and search forward from there, instead of searching for the size we need and try and get it as close as possible to the offset we want. I've given this a healthy amount of testing pre-new format stuff, as well as post-new format stuff. I've booted up my fedora box which is installed on btrfs with this patch and ran with it for a few days without issues. I've not seen any performance regressions in any of my tests. Since the last patch Yan Zheng fixed a problem where we could have overlapping entries, so updating their offset inline would cause problems. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 09:23:30 -04:00
Artem Bityutskiy	887ee17117	UBIFS: remove unneeded call from ubifs_sync_fs Nowadays VFS always synchronizes all dirty inodes and pages before calling '->sync_fs()', so remove unneeded 'generic_sync_sb_inodes()' from 'ubifs_sync_fs()'. It used to be needed, but not any longer. Pointed-out-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-24 15:14:55 +03:00
Artem Bityutskiy	e9d6bbc428	UBIFS: kill BKL The BKL was pushed down from VFS to the file-systems. It used to serialize mount/unmount/remount and prevented more than one instance of the same file-system from doing mount/umount/remount at the same time. But it is OK for UBIFS and it does not need any additional locking for these cases. Thus, kick the BKL out of UBIFS. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-24 15:14:54 +03:00
Subrata Modak	b5148da40c	UBIFS: remove unused functions Remove 'xent_key_init_hash()' and 'data_key_init_flash()' functions, as they are unot used anywhere. Signed-off-by: Subrata Modak <subrata@linux.vnet.ibm.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-24 15:14:54 +03:00
Subrata Modak	83ef2ecdbb	UBIFS: suppress compilation warning Fix "using uninitialized variable" compilation warning by using the "unititialized_var()" helper. Signed-off-by: Subrata Modak<subrata@linux.vnet.ibm.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-24 15:14:54 +03:00
Jan Kara	0584974a77	ocfs2: Define credit counts for quota operations Numbers of needed credits for some quota operations were written as raw numbers. Create appropriate defines instead. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:31 -07:00
Jan Kara	4539f1df25	ocfs2: Remove syncjiff field from quota info syncjiff is just a converted value of syncms. Some places which are updating syncms forgot to update syncjiff as well. Since the conversion is just a simple division / multiplication and it does not happen frequently, just remove the syncjiff field to avoid forgotten conversions. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:27 -07:00
Jan Kara	1c1d9793ff	ocfs2: Fix initialization of blockcheck stats We just set blockcheck stats to zeros but we should also properly initialize the spinlock there. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:24 -07:00
Jan Kara	7669f54c55	ocfs2: Zero out padding of on disk dquot structure Padding fields of on-disk dquot structure were not zeroed. Zero them so that it's easier to use them later. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:17 -07:00
Jan Kara	0e7f387bf3	ocfs2: Initialize blocks allocated to local quota file When we extend local quota file, we should initialize data in newly allocated block. Firstly because on recovery we could parse bogus data, secondly so that block checksums are properly computed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:11 -07:00
Jan Kara	4b3fa1904c	ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq() In a code path extending local quota files we marked new header buffer uptodate only after calling ocfs2_journal_access_dq() which triggers a bug. Fix it and also call ocfs2 variant of the function marking buffer uptodate. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:05 -07:00
Jan Kara	b57ac2c43e	ocfs2: Make global quota files blocksize aligned Change i_size of global quota files so that it always remains aligned to block size. This is mainly because the end of quota block may contain checksum (if checksumming is enabled) and it's a bit awkward for it to be "outside" of quota file (and it makes life harder for ocfs2-tools). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:58:59 -07:00
Tao Ma	82e12644cf	ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records. In ocfs2_adjust_adjacent_records, we will adjust adjacent records according to the extent_list in the lower level. But actually the lower level tree will either be a leaf or a branch. If we only use ocfs2_is_empty_extent we will meet with some problem if the lower tree is a branch (tree_depth > 1). So use !ocfs2_rec_clusters instead. And actually only the leaf record can have holes. So add a BUG_ON for non-leaf branch. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:58:46 -07:00
Stefan Bader	4a19fb11a9	jfs: Fix early release of acl in jfs_get_acl BugLink: http://bugs.launchpad.net/ubuntu/+bug/396780 Commit `073aaa1b14` "helpers for acl caching + switch to those" introduced new helper functions for acl handling but seems to have introduced a regression for jfs as the acl is released before returning it to the caller, instead of leaving this for the caller to do. This causes the acl object to be used after freeing it, leading to kernel panics in completely different places. Thanks to Christophe Dumez for reporting and bisecting into this. Reported-by: Christophe Dumez <dchris@gmail.com> Tested-by: Christophe Dumez <dchris@gmail.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2009-07-23 11:08:36 -05:00
Linus Torvalds	81cbf6d055	Merge branch 'lockdep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep * 'lockdep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep: lockdep: Fix lockdep annotation for pipe_double_lock()	2009-07-22 16:44:18 -07:00
Steve French	f1230c9797	[CIFS] fix sparse warning Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-22 23:13:01 +00:00
Jeff Layton	03aa3a49ad	cifs: fix sb->s_maxbytes so that it casts properly to a signed value This off-by-one bug causes sendfile() to not work properly. When a task calls sendfile() on a file on a CIFS filesystem, the syscall returns -1 and sets errno to EOVERFLOW. do_sendfile uses s_maxbytes to verify the returned offset of the file. The problem there is that this value is cast to a signed value (loff_t). When this is done on the s_maxbytes value that cifs uses, it becomes negative and the comparisons against it fail. Even though s_maxbytes is an unsigned value, it seems that it's not OK to set it in such a way that it'll end up negative when it's cast to a signed value. These casts happen in other codepaths besides sendfile too, but the VFS is a little hard to follow in this area and I can't be sure if there are other bugs that this will fix. It's not clear to me why s_maxbytes isn't just declared as loff_t in the first place, but either way we still need to fix these values to make sendfile work properly. This is also an opportunity to replace the magic bit-shift values here with the standard #defines for this. This fixes the reproducer program I have that does a sendfile and will probably also fix the situation where apache is serving from a CIFS share. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-22 21:08:00 +00:00
Jeff Layton	ce6e7fcd43	cifs: disable serverino if server doesn't support it A recent regression when dealing with older servers. This bug was introduced when we made serverino the default... When the server can't provide inode numbers, disable it for the mount. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-22 21:07:51 +00:00
David Woodhouse	83121942b2	Btrfs: Fix crash on read failures at mount If the tree roots hit read errors during mount, btrfs is not properly erroring out. We need to check the uptodate bits after reading in the tree root node. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:52:13 -04:00
Daniel Cadete	c271b49241	Btrfs: remove of redundant btrfs_header_level This removes the continues call's of btrfs_header_level. One call of btrfs_header_level(c) its enough. Signed-off-by Daniel Cadete <danielncadete10@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:52:13 -04:00
Julia Lawall	33c17ad571	Btrfs: adjust NULL test Move the call to BUG_ON to before the dereference of the tested value. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:49:01 -04:00
David Woodhouse	3acada49c2	Btrfs: Remove broken sanity check from btrfs_rmap_block() It was never actually doing anything anyway (see the loop condition), and it would be difficult to make it work for RAID[56]. Even if it was actually working, it's checking for the wrong thing anyway. Instead of checking whether we list a block which _doesn't_ land at the relevant physical location, it should be checking that we _have_ listed all the logical blocks which refer to the required physical location on all devices. This function is only called from remove_sb_from_cache() to ensure that we reserve the logical blocks which would reside at the same physical location as the superblock copies. So listing more blocks than we need is actually OK. With RAID[56] we're going to throw away an entire stripe for each block we have to ignore, so we _are_ going to list blocks other than the ones which actually contain the superblock. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:49:01 -04:00
Julia Lawall	29c5e8ce01	Btrfs: convert nested spin_lock_irqsave to spin_lock If spin_lock_irqsave is called twice in a row with the same second argument, the interrupt state at the point of the second call overwrites the value saved by the first call. Indeed, the second call does not need to save the interrupt state, so it is changed to a simple spin_lock. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:49:00 -04:00
Peter Zijlstra	023d43c7b5	lockdep: Fix lockdep annotation for pipe_double_lock() The presumed use of the pipe_double_lock() routine is to lock 2 locks in a deadlock free way by ordering the locks by their address. However it fails to keep the specified lock classes in order and explicitly annotates a deadlock. Rectify this. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Miklos Szeredi <mszeredi@suse.cz> LKML-Reference: <1248163763.15751.11098.camel@twins>	2009-07-22 21:14:14 +02:00
Linus Torvalds	1f9758d4e7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: fs/Kconfig: move nilfs2 out	2009-07-22 10:05:00 -07:00
Yan Zheng	4a8c9a62d7	Btrfs: make sure all dirty blocks are written at commit time Write dirty block groups may allocate new block, and so may add new delayed back ref. btrfs_run_delayed_refs may make some block groups dirty. commit_cowonly_roots does not handle the recursion properly, and some dirty blocks can be left unwritten at commit time. This patch moves btrfs_run_delayed_refs into the loop that writes dirty block groups, and makes the code not break out of the loop until there are no dirty block groups or delayed back refs. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 10:07:05 -04:00
Yan Zheng	33c66f430b	Btrfs: fix locking issue in btrfs_find_next_key When walking up the tree, btrfs_find_next_key assumes the upper level tree block is properly locked. This isn't always true even path->keep_locks is 1. This is because btrfs_find_next_key may advance path->slots[] several times instead of only once. When 'path->slots[level] >= btrfs_header_nritems(path->nodes[level])' is found, we can't guarantee the original value of 'path->slots[level]' is 'btrfs_header_nritems(path->nodes[level]) - 1'. If it's not, the tree block at 'level + 1' isn't locked. This patch fixes the issue by explicitly checking the locking state, re-searching the tree if it's not locked. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Yan Zheng	e457afec60	Btrfs: fix double increment of path->slots[0] in btrfs_next_leaf if 1 is returned by btrfs_search_slot, the path already points to the first item with 'key > searching key'. So increasing path->slots[0] by one is superfluous in that case. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Yan Zheng	bf1fb512a5	Btrfs: properly update space information after shrinking device. Change 'goto done' to 'break' for the case of all device extents have been freed, so that the code updates space information will be execute. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Yan Zheng	1bec1aed1e	Btrfs: fix definition of struct btrfs_extent_inline_ref use __le64 instead of u64 in on-disk structure definition. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Trond Myklebust	d953126a28	NFSv4: Fix a problem whereby a buggy server can oops the kernel We just had a case in which a buggy server occasionally returns the wrong attributes during an OPEN call. While the client does catch this sort of condition in nfs4_open_done(), and causes the nfs4_atomic_open() to return -EISDIR, the logic in nfs_atomic_lookup() is broken, since it causes a fallback to an ordinary lookup instead of just returning the error. When the buggy server then returns a regular file for the fallback lookup, the VFS allows the open, and bad things start to happen, since the open file doesn't have any associated NFSv4 state. The fix is firstly to return the EISDIR/ENOTDIR errors immediately, and secondly to ensure that we are always careful when dereferencing the nfs_open_context state pointer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-07-21 19:22:38 -04:00
Jan Kara	f7b1aa69be	ocfs2: Fix deadlock on umount In commit `ea455f8ab6`, we moved the dentry lock put process into ocfs2_wq. This causes problems during umount because ocfs2_wq can drop references to inodes while they are being invalidated by invalidate_inodes() causing all sorts of nasty things (invalidate_inodes() ending in an infinite loop, "Busy inodes after umount" messages etc.). We fix the problem by stopping ocfs2_wq from doing any further releasing of inode references on the superblock being unmounted, wait until it finishes the current round of releasing and finally cleaning up all the references in dentry_lock_list from ocfs2_put_super(). The issue was tracked down by Tao Ma <tao.ma@oracle.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-21 15:47:55 -07:00
Tao Ma	3c5e10683e	ocfs2: Add extra credits and access the modified bh in update_edge_lengths. In normal tree rotation left process, we will never touch the tree branch above subtree_index and ocfs2_extend_rotate_transaction doesn't reserve the credits for them either. But when we want to delete the rightmost extent block, we have to update the rightmost records for all the rightmost branch(See ocfs2_update_edge_lengths), so we have to allocate extra credits for them. What's more, we have to access them also. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-21 14:41:54 -07:00
Trond Myklebust	fccba80455	NFSv4: Fix an NFSv4 mount regression Commit `008f55d0e0` (nfs41: recover lease in _nfs4_lookup_root) forces the state manager to always run on mount. This is a bug in the case of NFSv4.0, which doesn't require us to send a setclientid until we want to grab file state. In any case, this is completely the wrong place to be doing state management. Moving that code into nfs4_init_session... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-07-21 16:48:07 -04:00
Trond Myklebust	b64aec8d1e	NFSv4: Fix an Oops in nfs4_free_lock_state The oops http://www.kerneloops.org/raw.php?rawid=537858&msgid= appears to be due to the nfs4_lock_state->ls_state field being uninitialised. This happens if the call to nfs4_free_lock_state() is triggered at the end of nfs4_get_lock_state(). The fix is to move the initialisation of ls_state into the allocator. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-07-21 16:47:46 -04:00
Eric Paris	f44aebcc56	inotify: use GFP_NOFS under potential memory pressure inotify can have a watchs removed under filesystem reclaim. ================================= [ INFO: inconsistent lock state ] 2.6.31-rc2 #16 --------------------------------- inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes: (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3 {IN-RECLAIM_FS-W} state was registered at: [<c10536ab>] __lock_acquire+0x2c9/0xac4 [<c1053f45>] lock_acquire+0x9f/0xc2 [<c1308872>] __mutex_lock_common+0x2d/0x323 [<c1308c00>] mutex_lock_nested+0x2e/0x36 [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2 [<c108bfb6>] shrink_slab+0xe2/0x13c [<c108c3e1>] kswapd+0x3d1/0x55d [<c10449b5>] kthread+0x66/0x6b [<c1003fdf>] kernel_thread_helper+0x7/0x10 [<ffffffff>] 0xffffffff Two things are needed to fix this. First we need a method to tell fsnotify_create_event() to use GFP_NOFS and second we need to stop using one global IN_IGNORED event and allocate them one at a time. This solves current issues with multiple IN_IGNORED on a queue having tail drop problems and simplifies the allocations since we don't have to worry about two tasks opperating on the IGNORED event concurrently. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:27 -04:00
Eric Paris	c05594b621	fsnotify: fix inotify tail drop check with path entries fsnotify drops new events when they are the same as the tail event on the queue to be sent to userspace. The problem is that if the event comes with a path we forget to break out of the switch statement and fall into the code path which matches on events that do not have any type of file backed information (things like IN_UNMOUNT and IN_Q_OVERFLOW). The problem is that this code thinks all such events should be dropped. Fix is to add a break. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	4a148ba988	inotify: check filename before dropping repeat events inotify drops events if the last event on the queue is the same as the current event. But it does 2 things wrong. First it is comparing old->inode with new->inode. But after an event if put on the queue the ->inode is no longer allowed to be used. It's possible between the last event and this new event the inode could be reused and we would falsely match the inode's memory address between two differing events. The second problem is that when a file is removed fsnotify is passed the negative dentry for the removed object rather than the postive dentry from immediately before the removal. This mean the (broken) inotify tail drop code was matching the NULL ->inode of differing events. The fix is to check the file name which is stored with events when doing the tail drop instead of wrongly checking the address of the stored ->inode. Reported-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	520dc2a526	fsnotify: use def_bool in kconfig instead of letting the user choose fsnotify doens't give the user anything. If someone chooses inotify or dnotify it should build fsnotify, if they don't select one it shouldn't be built. This patch changes fsnotify to be a def_bool=n and makes everything else select it. Also fixes the issue people complained about on lwn where gdm hung because they didn't have inotify and they didn't get the inotify build option..... Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	7e790dd5fc	inotify: fix error paths in inotify_update_watch inotify_update_watch could leave things in a horrid state on a number of error paths. We could try to remove idr entries that didn't exist, we could send an IN_IGNORED to userspace for watches that don't exist, and a bit of other stupidity. Clean these up by doing the idr addition before we put the mark on the inode since we can clean that up on error and getting off the inode's mark list is hard. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	75fe2b2639	inotify: do not leak inode marks in inotify_add_watch inotify_add_watch had a couple of problems. The biggest being that if inotify_add_watch was called on the same inode twice (to update or change the event mask) a refence was taken on the original inode mark by fsnotify_find_mark_entry but was not being dropped at the end of the inotify_add_watch call. Thus if inotify_rm_watch was called although the mark was removed from the inode, the refcnt wouldn't hit zero and we would leak memory. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	5549f7cdf8	inotify: drop user watch count when a watch is removed The inotify rewrite forgot to drop the inotify watch use cound when a watch was removed. This means that a single inotify fd can only ever register a maximum of /proc/sys/fs/max_user_watches even if some of those had been freed. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
dingdinghua	f1015c4477	jbd: fix race between write_metadata_buffer and get_write_access The function journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in) too early; this could potentially allow another thread to call get_write_access on the buffer head, modify the data, and dirty it, and allowing the wrong data to be written into the journal. Fortunately, if we lose this race, the only time this will actually cause filesystem corruption is if there is a system crash or other unclean shutdown of the system before the next commit can take place. Signed-off-by: dingdinghua <dingdinghua85@gmail.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-21 11:54:42 +02:00
Wengang Wang	1f4cea3790	ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation ocfs2_get_block() does no allocation. Hole filling for writes should have happened farther up in the call chain. We detect this case and print an error, but we then continue with the function. We should be exiting immediately. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-20 17:36:07 -07:00
Wengang Wang	cbfa9639ae	ocfs2: Fix error return in ocfs2_write_cluster() A typo caused ocfs2_write_cluster() to return 0 in some error cases. Fix it. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-20 17:32:55 -07:00
Linus Torvalds	457f82bac6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: Fix incorrect parameters to v9fs_file_readn. 9p: Possible regression in p9_client_stat 9p: default 9p transport module fix	2009-07-20 16:48:31 -07:00
Subrata Modak	44d8e4e1e9	ocfs2: Fix compilation warning for fs/ocfs2/xattr.c gcc 4.4.1 generates the following build warning on i386: CC [M] fs/ocfs2/xattr.o fs/ocfs2/xattr.c: In function ‘ocfs2_xattr_block_get’: fs/ocfs2/xattr.c:1055: warning: ‘block_off’ may be used uninitialized in this function The following fix is based on a similar approach by David Howells few days back: http://lkml.org/lkml/2009/7/9/109, Signed-off-by: Subrata Modak<subrata@linux.vnet.ibm.com>, Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-20 15:52:57 -07:00
Goldwyn Rodrigues	cefcb800fa	ocfs2: Initialize count in aio_write before generic_write_checks generic_write_checks() expects count to be initialized to the size of the write. Writes to files open with O_DIRECT\|O_LARGEFILE write 0 bytes because count is uninitialized. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-20 15:47:58 -07:00
Jeff Layton	90a98b2f3f	cifs: free nativeFileSystem field before allocating a new one ...otherwise, we'll leak this memory if we have to reconnect (e.g. after network failure). Signed-off-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-20 18:24:37 +00:00
Frederic Weisbecker	def01bc53d	sched: Convert the only user of cond_resched_bkl to use cond_resched() fs/locks.c:flock_lock_file() is the only user of cond_resched_bkl() This helper doesn't do anything more than cond_resched(). The latter naming is enough to explain that we are rescheduling if needed. The bkl suffix suggests another semantics but it's actually a synonym of cond_resched(). Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-7-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-18 15:51:45 +02:00
Frederic Weisbecker	613afbf832	sched: Pull up the might_sleep() check into cond_resched() might_sleep() is called late-ish in cond_resched(), after the need_resched()/preempt enabled/system running tests are checked. It's better to check the sleeps while atomic earlier and not depend on some environment datas that reduce the chances to detect a problem. Also define cond_resched_*() helpers as macros, so that the FILE/LINE reported in the sleeping while atomic warning displays the real origin and not sched.h Changes in v2: - Call __might_sleep() directly instead of might_sleep() which may call cond_resched() - Turn cond_resched() into a macro so that the file:line couple reported refers to the caller of cond_resched() and not __cond_resched() itself. Changes in v3: - Also propagate this __might_sleep() pull up to cond_resched_lock() and cond_resched_softirq() Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-6-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-18 15:51:44 +02:00
Sten Spans	713c0ecdb8	security: fix security_file_lock cmd argument Pass posix-translated lock operations to security_file_lock when invoked via sys_flock. Signed-off-by: Sten Spans <Sten_Spans@genua.de> Signed-off-by: James Morris <jmorris@namei.org>	2009-07-17 07:41:23 +10:00
Steve French	f6c4338543	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-07-16 04:21:39 +00:00
Jan Kara	43237b5490	ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to be a relict from some old days and setting disksize in this function does not make much sence. Currently it was set only by ext3_getblk(). Since the parameter has some effect only if create == 1, it is easy to check that the three callers which end up calling ext3_getblk() with create == 1 (ext3_append, ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves. Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-15 21:30:46 +02:00
Jan Kara	1e9fd53b78	jbd: Fix a race between checkpointing code and journal_get_write_access() The following race can happen: CPU1 CPU2 checkpointing code checks the buffer, adds it to an array for writeback do_get_write_access() ... lock_buffer() unlock_buffer() flush_batch() submits the buffer for IO __jbd_journal_file_buffer() So a buffer under writeout is returned from do_get_write_access(). Since the filesystem code relies on the fact that journaled buffers cannot be written out, it does not take the buffer lock and so it can modify buffer while it is under writeout. That can lead to a filesystem corruption if we crash at the right moment. The similar problem can happen with the journal_get_create_access() path. We fix the problem by clearing the buffer dirty bit under buffer_lock even if the buffer is on BJ_None list. Actually, we clear the dirty bit regardless the list the buffer is in and warn about the fact if the buffer is already journalled. Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>. Reported-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-15 21:30:07 +02:00
Jan Kara	9eaaa2d575	ext3: Fix truncation of symlinks after failed write Contents of long symlinks is written via standard write methods. So when the write fails, we add inode to orphan list. But symlinks don't have .truncate method defined so nobody properly removes them from the orphan list (both on disk and in memory). Fix this by calling ext3_truncate() directly instead of calling vmtruncate() (which is saner anyway since we don't need anything vmtruncate() does except from calling .truncate in these paths). We also add inode to orphan list only if ext3_can_truncate() is true (currently, it can be false for symlinks when there are no blocks allocated) - otherwise orphan list processing will complain and ext3_truncate() will not remove inode from on-disk orphan list. Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-15 21:28:07 +02:00
Jan Kara	7447a668a3	jbd: Fail to load a journal if it is too short Due to on disk corruption, it can happen that journal is too short. Fail to load it in such case so that we don't oops somewhere later. Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-15 21:26:23 +02:00
Linus Torvalds	8aa651e23e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: free socket in error exit path dlm: fix plock use-after-free dlm: Fix uninitialised variable warning in lock.c	2009-07-14 18:37:24 -07:00
Linus Torvalds	c0c50b541a	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing/function-profiler: do not free per cpu variable stat tracing/events: Move TRACE_SYSTEM outside of include guard	2009-07-14 18:34:32 -07:00
Andy Adamson	4bd9b0f4af	nfsd41: use globals for DRC limits The version 4.1 DRC memory limit and tracking variables are server wide and session specific. Replace struct svc_serv fields with globals. Stop using the svc_serv sv_lock. Add a spinlock to serialize access to the DRC limit management variables which change on session creation and deletion (usage counter) or (future) administrative action to adjust the total DRC memory limit. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-07-14 17:52:40 -04:00
Abhishek Kulkarni	9c9ad6162e	9p: Fix incorrect parameters to v9fs_file_readn. Fix v9fs_vfs_readpage. The offset and size parameters to v9fs_file_readn were interchanged and hence passed incorrectly. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-07-14 15:54:42 -05:00
Casey Dahlin	a89d63a159	dlm: free socket in error exit path In the tcp_connect_to_sock() error exit path, the socket allocated at the top of the function was not being freed. Signed-off-by: Casey Dahlin <cdahlin@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2009-07-14 12:28:43 -05:00
Yu Zhiguo	9208faf297	NFSv4: ACL in operations 'open' and 'create' should be used ACL in operations 'open' and 'create' is decoded but never be used. It should be set as the initial ACL for the object according to RFC3530. If error occurs when setting the ACL, just clear the ACL bit in the returned attr bitmap. Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-14 12:16:47 -04:00
Ryusuke Konishi	4fed598a49	fs/Kconfig: move nilfs2 out fs/Kconfig file was split into individual fs/*/Kconfig files before nilfs was merged. I've found the current config entry of nilfs is tainting the work. Sorry, I didn't notice. This fixes the violation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Alexey Dobriyan <adobriyan@gmail.com>	2009-07-14 12:34:17 +09:00
Linus Torvalds	1cf29683f4	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix race between write_metadata_buffer and get_write_access ext4: Fix ext4_mb_initialize_context() to initialize all fields ext4: fix null handler of ioctls in no journal mode ext4: Fix buffer head reference leak in no-journal mode ext4: Move __ext4_journalled_writepage() to avoid forward declaration ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaoc ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocation ext4: Don't look at buffer_heads outside i_size. ext4: Fix goal inum check in the inode allocator ext4: fix no journal corruption with locale-gen ext4: Calculate required journal credits for inserting an extent properly ext4: Fix truncation of symlinks after failed write jbd2: Fix a race between checkpointing code and journal_get_write_access() ext4: Use rcu_barrier() on module unload. ext4: naturally align struct ext4_allocation_request ext4: mark several more functions in mballoc.c as noinline ext4: Fix potential reclaim deadlock when truncating partial block jbd2: Remove GFP_ATOMIC kmalloc from inside spinlock critical region ext4: Fix type warning on 64-bit platforms in tracing events header	2009-07-13 16:39:25 -07:00
dingdinghua	96577c4382	jbd2: fix race between write_metadata_buffer and get_write_access The function jbd2_journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in) too early; this could potentially allow another thread to call get_write_access on the buffer head, modify the data, and dirty it, and allowing the wrong data to be written into the journal. Fortunately, if we lose this race, the only time this will actually cause filesystem corruption is if there is a system crash or other unclean shutdown of the system before the next commit can take place. Signed-off-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 17:55:35 -04:00
Linus Torvalds	a4dc32374e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: wm97xx_batery: replace driver_data with dev_get_drvdata() omap: video: remove direct access of driver_data Sound: remove direct access of driver_data driver model: fix show/store prototypes in doc. Firmware: firmware_class, fix lock imbalance Driver Core: remove BUS_ID_SIZE sparc: remove driver-core BUS_ID_SIZE partitions: fix broken uevent_suppress conversion devres: WARN() and return, don't crash on device_del() of uninitialized device	2009-07-13 10:24:08 -07:00
James Morris	7d45ecafb6	Merge branch 'master' into next Conflicts: include/linux/personality.h Use Linus' version. Signed-off-by: James Morris <jmorris@namei.org>	2009-07-14 00:30:40 +10:00
Theodore Ts'o	833576b362	ext4: Fix ext4_mb_initialize_context() to initialize all fields Pavel Roskin pointed out that kmemcheck indicated that ext4_mb_store_history() was accessing uninitialized values of ac->ac_tail and ac->ac_buddy leading to garbage in the mballoc history. Fix this by initializing the entire structure to all zeros first. Also, two fields were getting doubly initialized by the caller of ext4_mb_initialize_context, so remove them for efficiency's sake. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 09:45:52 -04:00
Peng Tao	ac046f1d61	ext4: fix null handler of ioctls in no journal mode The EXT4_IOC_GROUP_ADD and EXT4_IOC_GROUP_EXTEND ioctls should not flush the journal in no_journal mode. Otherwise, running resize2fs on a mounted no_journal partition triggers the following error messages: BUG: unable to handle kernel NULL pointer dereference at 00000014 IP: [<c039d282>] _spin_lock+0x8/0x19 *pde = 00000000 Oops: 0002 [#1] SMP Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 09:30:17 -04:00
Curt Wohlgemuth	e6b5d30104	ext4: Fix buffer head reference leak in no-journal mode We found a problem with buffer head reference leaks when using an ext4 partition without a journal. In particular, calls to ext4_forget() would not to a brelse() on the input buffer head, which will cause pages they belong to to not be reclaimable. Further investigation showed that all places where ext4_journal_forget() and ext4_journal_revoke() are called are subject to the same problem. The patch below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit release of the buffer head when the journal handle isn't valid. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 09:07:20 -04:00
Li Zefan	d0b6e04a4c	tracing/events: Move TRACE_SYSTEM outside of include guard If TRACE_INCLDUE_FILE is defined, <trace/events/TRACE_INCLUDE_FILE.h> will be included and compiled, otherwise it will be <trace/events/TRACE_SYSTEM.h> So TRACE_SYSTEM should be defined outside of #if proctection, just like TRACE_INCLUDE_FILE. Imaging this scenario: #include <trace/events/foo.h> -> TRACE_SYSTEM == foo ... #include <trace/events/bar.h> -> TRACE_SYSTEM == bar ... #define CREATE_TRACE_POINTS #include <trace/events/foo.h> -> TRACE_SYSTEM == bar !!! and then bar.h will be included and compiled. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A5A9CF1.2010007@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-13 10:59:55 +02:00
Johannes Berg	134e63756d	genetlink: make netns aware This makes generic netlink network namespace aware. No generic netlink families except for the controller family are made namespace aware, they need to be checked one by one and then set the family->netnsok member to true. A new function genlmsg_multicast_netns() is introduced to allow sending a multicast message in a given namespace, for example when it applies to an object that lives in that namespace, a new function genlmsg_multicast_allns() to send a message to all network namespaces (for objects that do not have an associated netns). The function genlmsg_multicast() is changed to multicast the message in just init_net, which is currently correct for all generic netlink families since they only work in init_net right now. Some will later want to work in all net namespaces because they do not care about the netns at all -- those will have to be converted to use one of the new functions genlmsg_multicast_allns() or genlmsg_multicast_netns() whenever they are made netns aware in some way. After this patch families can easily decide whether or not they should be available in all net namespaces. Many genl families us it for objects not related to networking and should therefore be available in all namespaces, but that will have to be done on a per family basis. Note that this doesn't touch on the checkpoint/restart problem where network namespaces could be used, genl families and multicast groups are numbered globally and I see no easy way of changing that, especially since it must be possible to multicast to all network namespaces for those families that do not care about netns. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-12 14:03:27 -07:00
Heiko Carstens	f8c73c790c	partitions: fix broken uevent_suppress conversion git commit `f67f129e` "Driver core: implement uevent suppress in kobject" contains this chunk for fs/partitions/check.c: /* suppress uevent if the disk supresses it */ - if (!ddev->uevent_suppress) + if (!dev_get_uevent_suppress(pdev)) kobject_uevent(&pdev->kobj, KOBJ_ADD); However that should have been - if (!ddev->uevent_suppress) + if (!dev_get_uevent_suppress(ddev)) Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Ming Lei <tom.leiming@gmail.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-07-12 13:02:09 -07:00
Artem Bityutskiy	dd0d9a46f5	AFS: Fix compilation warning Fix the following warning: fs/afs/dir.c: In function 'afs_d_revalidate': fs/afs/dir.c:567: warning: 'fid.vnode' may be used uninitialized in this function fs/afs/dir.c:567: warning: 'fid.unique' may be used uninitialized in this function by marking the 'fid' variable as an uninitialized_var. The problem is that gcc doesn't always manage to work out that fid is always set on the path through the function that uses it. Cc: linux-afs@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:24:07 -07:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Linus Torvalds	81e4e1ba7e	Revert "fuse: Fix build error" as unnecessary This reverts commit `097041e576`. Trond had a better fix, which is the parent of this one ("Fix compile error due to congestion_wait() changes") Requested-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-11 11:22:34 -07:00
Bartlomiej Zolnierkiewicz	8711c67bee	isofs: fix Joliet regression commit `5404ac8e44` ("isofs: cleanup mount option processing") missed conversion of joliet option flag resulting in non-working Joliet support. CC: walt <w41ter@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-10 19:18:59 -07:00
Linus Torvalds	44c695b13b	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: fix corruption dump UBIFS: clean up free space checking UBIFS: small amendments in the LEB scanning code UBIFS: dump a little more in case of corruptions MAINTAINERS: update ahunter's e-mail address UBIFS: allow more than one volume to be mounted UBIFS: fix assertion warning UBIFS: minor spelling and grammar fixes UBIFS: fix 64-bit divisions in debug print UBIFS: few spelling fixes UBIFS: set write-buffer timout to 3-5 seconds UBIFS: slightly optimize write-buffer timer usage UBIFS: improve debugging messaged UBIFS: fix integer overflow warning	2009-07-10 19:14:48 -07:00
Linus Torvalds	04eef90c2e	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd * 'for-linus' of git://git.open-osd.org/linux-open-osd: osdblk: Adjust queue limits to lower device's limits osdblk: a Linux block device for OSD objects MAINTAINERS: Add osd maintained files (F:) exofs: Avoid using file_fsync() exofs: Remove IBM copyrights exofs: Fix bio leak in error handling path (sync read)	2009-07-10 19:12:24 -07:00
Larry Finger	097041e576	fuse: Fix build error When building v2.6.31-rc2-344-g69ca06c, the following build errors are found due to missing includes: CC [M] fs/fuse/dev.o fs/fuse/dev.c: In function ‘request_end’: fs/fuse/dev.c:289: error: ‘BLK_RW_SYNC’ undeclared (first use in this function) ... fs/nfs/write.c: In function ‘nfs_set_page_writeback’: fs/nfs/write.c:207: error: ‘BLK_RW_ASYNC’ undeclared (first use in this function) Signed-off-by: Larry Finger@lwfinger.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-10 19:09:46 -07:00
Wengang Wang	812e7a6a43	ocfs2: log the actual return value of ocfs2_file_aio_write() in ocfs2_file_aio_write(), log_exit() could don't log the value which is really returned. this patch fixes it. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-10 16:53:52 -07:00
Linus Torvalds	69ca06c945	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cfq-iosched: reset oom_cfqq in cfq_set_request() block: fix sg SG_DXFER_TO_FROM_DEV regression block: call blk_scsi_ioctl_init() Fix congestion_wait() sync/async vs read/write confusion	2009-07-10 14:29:58 -07:00
Linus Torvalds	9f2d8be426	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix disorder in cp count on error during deleting checkpoints nilfs2: fix lockdep warning between regular file and inode file nilfs2: fix incorrect KERN_CRIT messages in case of write failures nilfs2: fix hang problem of log writer which occurs after write failures nilfs2: remove unlikely directive causing mis-conversion of error code	2009-07-10 14:27:21 -07:00
FUJITA Tomonori	ecb554a846	block: fix sg SG_DXFER_TO_FROM_DEV regression I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use the block layer mapping API (2.6.28). Douglas Gilbert explained SG_DXFER_TO_FROM_DEV: http://www.spinics.net/lists/linux-scsi/msg37135.html = The semantics of SG_DXFER_TO_FROM_DEV were: - copy user space buffer to kernel (LLD) buffer - do SCSI command which is assumed to be of the DATA_IN (data from device) variety. This would overwrite some or all of the kernel buffer - copy kernel (LLD) buffer back to the user space. The idea was to detect short reads by filling the original user space buffer with some marker bytes ("0xec" it would seem in this report). The "resid" value is a better way of detecting short reads but that was only added this century and requires co-operation from the LLD. = This patch changes the block layer mapping API to support this semantics. This simply adds another field to struct rq_map_data and enables __bio_copy_iov() to copy data from user space even with READ requests. It's better to add the flags field and kills null_mapped and the new from_user fields in struct rq_map_data but that approach makes it difficult to send this patch to stable trees because st and osst drivers use struct rq_map_data (they were converted to use the block layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer mapping API. zhou sf reported this regiression and tested this patch: http://www.spinics.net/lists/linux-scsi/msg37128.html http://www.spinics.net/lists/linux-scsi/msg37168.html Reported-by: zhou sf <sxzzsf@gmail.com> Tested-by: zhou sf <sxzzsf@gmail.com> Cc: stable@kernel.org Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:53 +02:00
Jens Axboe	8aa7e847d8	Fix congestion_wait() sync/async vs read/write confusion Commit `1faa16d228` accidentally broke the bdi congestion wait queue logic, causing us to wait on congestion for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:53 +02:00
Steve French	65bc98b005	[CIFS] Distinguish posix opens and mkdirs from legacy mkdirs in stats Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-10 15:27:25 +00:00
Linus Torvalds	c2cc49a2f8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: when ATTR_READONLY is set, only clear write bits on non-directories cifs: remove cifsInodeInfo->inUse counter cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget [CIFS] update cifs version number cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo cifs: add pid of initiating process to spnego upcall info cifs: fix regression with O_EXCL creates and optimize away lookup cifs: add new cifs_iget function and convert unix codepath to use it	2009-07-09 20:40:58 -07:00
Jeff Layton	d0c280d26d	cifs: when ATTR_READONLY is set, only clear write bits on non-directories cifs: when ATTR_READONLY is set, only clear write bits on non-directories On windows servers, ATTR_READONLY apparently either has no meaning or serves as some sort of queue to certain applications for unrelated behavior. This MS kbase article has details: http://support.microsoft.com/kb/326549/ Don't clear the write bits directory mode when ATTR_READONLY is set. Reported-by: pouchat@peewiki.net Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 23:06:04 +00:00
Jeff Layton	aeaaf253c4	cifs: remove cifsInodeInfo->inUse counter cifs: remove cifsInodeInfo->inUse counter It was purported to be a refcounter of some sort, but was never used that way. It never served any purpose that wasn't served equally well by the I_NEW flag. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 23:06:00 +00:00
Jeff Layton	0b8f18e358	cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget Rather than allocating an inode and filling it out, have cifs_get_inode_info fill out a cifs_fattr and call cifs_iget. This means a pretty hefty reorganization of cifs_get_inode_info. For the readdir codepath, add a couple of new functions for filling out cifs_fattr's from different FindFile response infolevels. Finally, remove cifs_new_inode since there are no more callers. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 23:05:48 +00:00
Steve French	b77863bfa1	[CIFS] update cifs version number Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 22:51:38 +00:00
Jeff Layton	3bbeeb3c93	cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls When there's an open filehandle, SET_FILE_INFO is apparently preferred over SET_PATH_INFO. Add a new variant that sets a FILE_UNIX_INFO_BASIC infolevel via SET_FILE_INFO and switch cifs_setattr_unix to use the new call when there's an open filehandle available. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 21:15:10 +00:00
Jeff Layton	654cf14ac0	cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO The SET_FILE_INFO variant will need to do the same thing here. Break this code out into a separate function that both variants can call. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 21:15:06 +00:00
Jeff Layton	01ea95e3b6	cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo ...in preparation of adding a SET_FILE_INFO variant. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 21:15:02 +00:00
Jeff Layton	c4c1bff64d	cifs: add pid of initiating process to spnego upcall info cifs: add pid of initiating process to spnego upcall info This will allow the upcall to poke in /proc/<pid>/environ and get the value of the $KRB5CCNAME env var for the process. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 21:14:58 +00:00
Artem Bityutskiy	0611254760	UBIFS: fix corruption dump In the 'ubifs_recover_leb()' function, when we find corrupted empty space, we dump 8K starting from the offset where the last node ends. This is OK if the corrupted empty space is somewhere near that offset. But if the corruption is far at the end of the LEB, we will dump all 0xFF bytes and complitely ignore the interesting data. This is observed on a PPC ("kilauea") with NOR flash. This patch changes the behavior and teaches UBIFS to print only interesting data. I.e., now we find where corruption starts and start dumping from that offset. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>	2009-07-09 09:19:39 +03:00
Artem Bityutskiy	431102fed3	UBIFS: clean up free space checking recovery.c has 'is_empty()' helper and it is better to use this helper instead of re-implementing it in several places. This patch does this and removes some amount of unneeded code. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>	2009-07-09 09:19:38 +03:00
Artem Bityutskiy	ed43f2f06c	UBIFS: small amendments in the LEB scanning code This patch fixes few minor things I've spotted while going through code: 1. Better document return codes 2. If 'ubifs_scan_a_node()' returns some thing we do not expect, treat this as an error. 3. Try to do recovery only when 'ubifs_scan()' returns %-EUCLEAN, not on any error. 4. If empty space starts at a non-aligned address, print a message. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>	2009-07-09 09:19:38 +03:00
Artem Bityutskiy	086b3640c1	UBIFS: dump a little more in case of corruptions In case of corruptions, dump 8192 bytes instead of 4096. The largest node is 4096+ bytes, so it is better to see a node boundary, which is not always possible when only 4096 bytes are printed. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>	2009-07-09 09:19:38 +03:00
Jeff Liu	17ae26b669	ocfs2: trivial fix for s/migrate/migration/ in dlmrecovery.c logging in dlmrecovery.c:1121, replace 'migrate' to 'migration' to keep the consistency by comparing to other lines with the similar log info in the same file. Signed-off-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-08 15:34:40 -07:00
Jeff Mahoney	8b712cd58a	ocfs2: Fixup orphan scan cleanup after failed mount If the mount fails for any reason, ocfs2_dismount_volume calls ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init be called to setup the mutex and work queues, but that doesn't happen if the mount has failed and we oops accessing an uninitialized work queue. This patch splits the init and startup of the orphan scan, eliminating the oops. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-08 15:34:02 -07:00
Jeff Layton	5ddf1e0ff0	cifs: fix regression with O_EXCL creates and optimize away lookup cifs: fix regression with O_EXCL creates and optimize away lookup Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Shirish Pargaonkar <shirishp@gmail.com> CC: Stable Kernel <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-08 21:55:45 +00:00
Joe Perches	ad361c9884	Remove multiple KERN_ prefixes from printk formats Commit `5fd29d6ccb` ("printk: clean up handling of log-levels and newlines") changed printk semantics. printk lines with multiple KERN_<level> prefixes are no longer emitted as before the patch. <level> is now included in the output on each additional use. Remove all uses of multiple KERN_<level>s in formats. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 10:30:03 -07:00
Linus Torvalds	728b690fd5	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: quota: Fix possible deadlock during parallel quotaon and quotaoff	2009-07-08 09:35:50 -07:00
Catalin Marinas	d5ce5b40bc	Free the memory allocated by memdup_user() in fs/sysfs/bin.c Commit `1c8542c7bb` replaced kmalloc() with memdup_user() in the write() function but also dropped the kfree(temp). The memdup_user() function allocates memory which is never freed. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Parag Warudkar <parag.warudkar@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 09:34:07 -07:00
Alexey Dobriyan	b43f3cbd21	headers: mnt_namespace.h redux Fix various silly problems wrt mnt_namespace.h: - exit_mnt_ns() isn't used, remove it - done that, sched.h and nsproxy.h inclusions aren't needed - mount.h inclusion was need for vfsmount_lock, but no longer - remove mnt_namespace.h inclusion from files which don't use anything from mnt_namespace.h Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 09:31:56 -07:00
Jiaying Zhang	d01730d74d	quota: Fix possible deadlock during parallel quotaon and quotaoff The following test script triggers a deadlock on ext2 filesystem: while true; do quotaon /dev/hda >&/dev/null; usleep $RANDOM; done & while true; do quotaoff /dev/hda >&/dev/null; usleep $RANDOM; done & I found there is a potential deadlock between quotaon and quotaoff (or quotasync). Basically, all of quotactl operations need to be protected by dqonoff_mutex. vfs_quota_off and vfs_quota_sync also call sb->s_op->quota_write that needs to grab the i_mutex of the quota file. But in vfs_quota_on_inode (called from quotaon operation), the current code tries to grab the i_mutex of the quota file first before getting quonoff_mutex. Reverse the order in which we take locks in vfs_quota_on_inode(). Jan Kara: Changed changelog to be more readable, made lockdep happy with I_MUTEX_QUOTA. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-07 18:15:21 +02:00
Csaba Henk	7a6d3c8b30	fuse: make the number of max background requests and congestion threshold tunable The practical values for these limits depend on the design of the filesystem server so let userspace set them at initialization time. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-07-07 17:28:52 +02:00
Oleg Nesterov	793285fcaf	cred_guard_mutex: do not return -EINTR to user-space do_execve() and ptrace_attach() return -EINTR if mutex_lock_interruptible(->cred_guard_mutex) fails. This is not right, change the code to return ERESTARTNOINTR. Perhaps we should also change proc_pid_attr_write(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-06 13:57:04 -07:00
Zhang, Yanmin	3beab0b424	sys_sync(): fix 16% performance regression in ffsb create_4k test I run many ffsb test cases on JBODs (typically 13/12 disks). Comparing with kernel 2.6.30, 2.6.31-rc1 has about 16% regression with ffsb_create_4k. The sub test case creates files continuously for 10 minitues and every file is 1MB. Bisect located below patch. `5cee5815d1` is first bad commit commit `5cee5815d1` Author: Jan Kara <jack@suse.cz> Date: Mon Apr 27 16:43:51 2009 +0200 vfs: Make sys_sync() use fsync_super() (version 4) It is unnecessarily fragile to have two places (fsync_super() and do_sync()) doing data integrity sync of the filesystem. Alter __fsync_super() to accommodate needs of both callers and use it. So after this patch __fsync_super() is the only place where we gather all the calls needed to properly send all data on a filesystem to disk. As a matter of fact, ffsb calls sys_sync in the end to make sure all data is flushed to disks and the flushing is counted into the result. vmstat shows ffsb is blocked when syncing for a long time. With 2.6.30, ffsb is blocked for a short time. I checked the patch and did experiments to recover the original methods. Eventually, the root cause is the patch deletes the calling to wakeup_pdflush when syncing, so only ffsb is blocked on disk I/O. wakeup_pdflush could ask pdflush to write back pages with ffsb at the same time. [akpm@linux-foundation.org: restore comment too] Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-06 13:57:03 -07:00
Akira Fujita	1c71850517	ext4: Fix compile warnings with MB_DEBUG When MB_DEBUG is enabled, we get some compile warnings because ext4_group_t is unsigned int. This patch fixes them. Signed-off-by Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-05 23:04:36 -04:00
Joe Perches	5a4a798937	ext4: Remove unnecessary semicolons in mballoc.c Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-05 22:33:08 -04:00
Curt Wohlgemuth	6487a9d3b5	ext4: More buffer head reference leaks After the patch I posted last week regarding buffer head ref leaks in no-journal mode, I looked at all the code that uses buffer heads and searched for more potential leaks. The patch below fixes the issues I found; these can occur even when a journal is present. The change to inode.c fixes a double release if ext4_journal_get_create_access() fails. The changes to namei.c are more complicated. add_dirent_to_buf() will release the input buffer head EXCEPT when it returns -ENOSPC. There are some callers of this routine that don't always do the brelse() in the event that -ENOSPC is returned. Unfortunately, to put this fix into ext4_add_entry() required capturing the return value of make_indexed_dir() and add_dirent_to_buf(). Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-17 10:54:08 -04:00
Jan Kara	f6f50e28f0	jbd2: Fail to load a journal if it is too short Due to on disk corruption, it can happen that journal is too short. Fail to load it in such case so that we don't oops somewhere later. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-17 10:40:01 -04:00
Theodore Ts'o	78f1ddbb49	ext4: Avoid null pointer dereference when decoding EROFS w/o a journal We need to check to make sure a journal is present before checking the journal flags in ext4_decode_error(). Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-27 23:09:47 -04:00
Manish Katiyar	43b3852029	ext4: Fix typo in ext4/Kconfig Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-27 21:38:17 -04:00
Aneesh Kumar K.V	024eab4d5b	ext4: Fix memory leak fix when mounting an ext4 filesystem The allocation of the ext4_group_info array was moved to a new function ext4_mb_add_group_info() in commit `5f21b0e6` so that online resize would use a common (and correct) codepath. Unfortunately, the call to the new ext4_mb_add_group_info() function was added without removing the code which originally allocated the array. This caused a memory leak each time an ext4 filesystem was mounted. The fix is simple; remove the code that did the original allocation, since it is no longer needed. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-17 09:01:04 -04:00
Daniel Mack	7fcd9c3ecb	UBIFS: allow more than one volume to be mounted UBIFS uses a bdi device per volume, but does not care to hand out unique names to each of them. This causes an error when trying to mount more than one volumes. Append the UBI volume and device ID to avoid that. [Amended a bit by Artem Bityutskiy] Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Adrian Hunter <ext-adrian.hunter@nokia.com> Cc: linux-mtd@lists.infradead.org Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-05 18:45:19 +03:00
Artem Bityutskiy	1fb8bd01ed	UBIFS: fix assertion warning When debugging is enabled and an unclean file-system is mounter, the following assertion is triggered: UBIFS assert failed in ubifs_tnc_start_commit at 805 (pid 1081) Call Trace: [cfaffbd0] [c0006cf8] show_stack+0x44/0x16c (unreliable) [cfaffc10] [c011b738] ubifs_tnc_start_commit+0xbb8/0xd18 [cfaffc90] [c0112670] do_commit+0x150/0xa44 [cfaffd10] [c0125234] ubifs_rcvry_gc_commit+0xd8/0x544 [cfaffd60] [c0100e9c] ubifs_fill_super+0xe78/0x15f8 [cfaffdf0] [c0102118] ubifs_get_sb+0x20c/0x320 [cfaffe70] [c007f764] vfs_kern_mount+0x58/0xe0 [cfaffe90] [c007f83c] do_kern_mount+0x40/0xf8 [cfaffeb0] [c0095c24] do_mount+0x550/0x758 [cfafff10] [c0095ebc] sys_mount+0x90/0xe0 [cfafff40] [c000ed4c] ret_from_syscall+0x0/0x3c The reason is that we initialize 'c->min_leb_idx' early, and do not re-calculate it after journal replay. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-05 18:45:19 +03:00
Adrian Hunter	681947d2fa	UBIFS: minor spelling and grammar fixes Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>	2009-07-05 18:45:18 +03:00
Adrian Hunter	4473758944	UBIFS: fix 64-bit divisions in debug print Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>	2009-07-05 18:45:18 +03:00
Artem Bityutskiy	cb54ef8b13	UBIFS: few spelling fixes Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-05 18:45:18 +03:00
Artem Bityutskiy	2a35a3a8ab	UBIFS: set write-buffer timout to 3-5 seconds This patch cleans up write-buffer timeout initialization and sets it to 3-5 interval. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-05 18:45:17 +03:00
Artem Bityutskiy	0b335b9d7d	UBIFS: slightly optimize write-buffer timer usage This patch adds the following minor optimization: 1. If write-buffer does not use the timer, indicate it with the wbuf->no_timer variable, instead of using the wbuf->softlimit variable. This is better because wbuf->softlimit is of ktime_t type, and the ktime_to_ns function contains 64-bit multiplication. 2. Do not call the 'hrtimer_cancel()' function for write-buffers which do not use timers. 3. Do not cancel the timer in 'ubifs_put_super()' because the synchronization function does this. This patch also removes a confusing comment. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-05 18:45:16 +03:00
Artem Bityutskiy	70aee2f153	UBIFS: improve debugging messaged 1. Make the I/O debugging message print the journal head number. 2. Add prints to timer functions. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-05 18:45:16 +03:00
Adrian Hunter	e3dc5a665d	UBIFS: fix integer overflow warning Fix the following warning: fs/ubifs/io.c: In function 'ubifs_wbuf_init': fs/ubifs/io.c:860: warning: integer overflow in expression And limit maximum hrtimer delta to ULONG_MAX because the argument is 'unsigned long'. Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-07-05 18:45:15 +03:00
Jiro SEKIBA	d9a0a345ab	nilfs2: fix disorder in cp count on error during deleting checkpoints This fixes a bug that checkpoint count gets wrong on errors when deleting a series of checkpoints. The count error is persistent since the checkpoint count is stored on disk. Some userland programs refer to the count via ioctl, and this bugfix is needed to prevent malfunction of such programs. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	ff54de363a	nilfs2: fix lockdep warning between regular file and inode file This will fix the following false positive of recursive locking which lockdep has detected: ============================================= [ INFO: possible recursive locking detected ] 2.6.30-nilfs #42 --------------------------------------------- nilfs_cleanerd/10607 is trying to acquire lock: (&bmap->b_sem){++++-.}, at: [<e0d025b7>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] but task is already holding lock: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2] other info that might help us debug this: 2 locks held by nilfs_cleanerd/10607: #0: (&nilfs->ns_segctor_sem){++++.+}, at: [<e0d0d75a>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] #1: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	4a52df7797	nilfs2: fix incorrect KERN_CRIT messages in case of write failures In case of write-failure retries, the following KERN_CRIT level messages are mistakenly output by nilfs_dat_commit_start() function: nilfs_dat_commit_start: vbn = 408463, start = 12506, end = 18446744073709551615, pbn = 530210 nilfs_dat_commit_start: vbn = 408515, start = 12506, end = 18446744073709551615, pbn = 530211 nilfs_dat_commit_start: vbn = 408464, start = 12506, end = 18446744073709551615, pbn = 530212 ... This suppresses these messages. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	8227b29722	nilfs2: fix hang problem of log writer which occurs after write failures Leandro Lucarella gave me a report that nilfs gets stuck after its write function fails. The problem turned out to be caused by bugs which leave writeback flag on pages. This fixes the problem by ensuring to clear the writeback flag in error path. Reported-by: Leandro Lucarella <llucax@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	0cfae3d879	nilfs2: remove unlikely directive causing mis-conversion of error code The following error code handling in nilfs_segctor_write() function wrongly converted negative error codes to a truth value (i.e. 1): err = unlikely(err) ? : res; which originaly meant to be err = err ? : res; This mis-conversion caused that write or sync functions receive the unexpected error code. This fixes the bug by removing the unlikely directive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:19 +09:00
Linus Torvalds	14c1b7c212	Merge branch 'for-2.6.31' of git://linux-nfs.org/~bfields/linux * 'for-2.6.31' of git://linux-nfs.org/~bfields/linux: NFSD: Don't hold unrefcounted creds over call to nfsd_setuser()	2009-07-04 10:11:38 -07:00
David Howells	033a666ccb	NFSD: Don't hold unrefcounted creds over call to nfsd_setuser() nfsd_open() gets an unrefcounted pointer to the current process's effective credentials at the top of the function, then calls nfsd_setuser() via fh_verify() - which may replace and destroy the current process's effective credentials - and then passes the unrefcounted pointer to dentry_open() - but the credentials may have been destroyed by this point. Instead, the value from current_cred() should be passed directly to dentry_open() as one of its arguments, rather than being cached in a variable. Possibly fh_verify() should return the creds to use. This is a regression introduced by `745ca2475a` "CRED: Pass credentials through dentry_open()". Signed-off-by: David Howells <dhowells@redhat.com> Tested-and-Verified-By: Steve Dickson <steved@redhat.com> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-07-03 10:21:10 -04:00
Linus Torvalds	746a99a5af	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: fs/notify/inotify: decrement user inotify count on close	2009-07-02 16:54:07 -07:00
Linus Torvalds	5291a12f05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix error message formatting Btrfs: fix use after free in btrfs_start_workers fail path Btrfs: honor nodatacow/sum mount options for new files Btrfs: update backrefs while dropping snapshot Btrfs: account for space we may use in fallocate Btrfs: fix the file clone ioctl for preallocated extents Btrfs: don't log the inode in file_write while growing the file	2009-07-02 16:52:38 -07:00
Hu Tao	68f5a38c3e	Btrfs: fix error message formatting Make an error msg look nicer by inserting a space between number and word. Signed-off-by: Hu Tao <hu.taoo@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:55:45 -04:00
Jiri Slaby	9b627e9bf4	Btrfs: fix use after free in btrfs_start_workers fail path worker memory is already freed on one fail path in btrfs_start_workers, but is still dereferenced. Switch the dereference and kfree. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:50:58 -04:00
Chris Mason	9427216476	Btrfs: honor nodatacow/sum mount options for new files The btrfs attr patches unconditionally inherited the inode flags field without honoring nodatacow and nodatasum. This fix makes sure we properly record the nodatacow/sum mount options in new inodes. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:41:17 -04:00
Yan Zheng	2c47e605a9	Btrfs: update backrefs while dropping snapshot The new backref format has restriction on type of backref item. If a tree block isn't referenced by its owner tree, full backrefs must be used for the pointers in it. When a tree block loses its owner tree's reference, backrefs for the pointers in it should be updated to full backrefs. Current btrfs_drop_snapshot misses the code that updates backrefs, so it's unsafe for general use. This patch adds backrefs update code to btrfs_drop_snapshot. It isn't a problem in the restricted form btrfs_drop_snapshot is used today, but for general snapshot deletion this update is required. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:41:17 -04:00
Josef Bacik	a970b0a16c	Btrfs: account for space we may use in fallocate Using Eric Sandeen's xfstest for fallocate, you can easily trigger a ENOSPC panic on btrfs. This is because we do not account for data we may use when doing the fallocate. This patch fixes the problem by properly reserving space, and then just freeing it when we are done. The reservation stuff was made with delalloc in mind, so its a little crude for this case, but it keeps the box from panicing. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:41:16 -04:00
Chris Mason	c8a894d77d	Btrfs: fix the file clone ioctl for preallocated extents	2009-07-02 13:41:16 -04:00
Chris Mason	f597bb19cc	Btrfs: don't log the inode in file_write while growing the file	2009-07-02 13:41:16 -04:00
Keith Packard	bdae997f44	fs/notify/inotify: decrement user inotify count on close The per-user inotify_devs value is incremented each time a new file is allocated, but never decremented. This led to inotify_init failing after a limited number of calls. Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-02 08:23:00 -04:00
Jeff Layton	cc0bad7552	cifs: add new cifs_iget function and convert unix codepath to use it cifs: add new cifs_iget function and convert unix codepath to use it In order to unify some codepaths, introduce a common cifs_fattr struct for storing inode attributes. The different codepaths (unix, legacy, normal, etc...) can fill out this struct with inode info. It can then be passed as an arg to a common set of routines to get and update inodes. Add a new cifs_iget function that uses iget5_locked to identify inodes. This will compare inodes based on the uniqueid value in a cifs_fattr struct. Rather than filling out an already-created inode, have cifs_get_inode_info_unix instead fill out cifs_fattr and hand that off to cifs_iget. cifs_iget can then properly look for hardlinked inodes. On the readdir side, add a new cifs_readdir_lookup function that spawns populated dentries. Redefine FILE_UNIX_INFO so that it's basically a FILE_UNIX_BASIC_INFO that has a few fields wrapped around it. This allows us to more easily use the same function for filling out the fattr as the non-readdir codepath. With this, we should then have proper hardlink detection and can eventually get rid of some nasty CIFS-specific hacks for handing them. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-01 21:26:42 +00:00
Linus Torvalds	5c5d4e8eaf	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: mtd: nand: fix build failure and incorrect return from omap_wait() mtd: Use BLOCK_NIL consistently in NFTL/INFTL mtd: m25p80 timeout too short for worst-case m25p16 devices mtd: atmel_nand: Fix typo s/parititions/partitions/ mtd: cmdlineparts: Use 64-bit format when printing a debug message. mtd: maps: Remove BUS_ID_SIZE from integrator_flash jffs2: fix another potential leak on error path in scan.c	2009-07-01 11:25:46 -07:00
Linus Torvalds	fa172f4006	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: invalidation reverse calls fuse: allow umask processing in userspace fuse: fix bad return value in fuse_file_poll() fuse: fix return value of fuse_dev_write()	2009-07-01 11:20:46 -07:00
Amerigo Wang	e2dbe12557	elf: fix one check-after-use Check before use it. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-01 11:14:28 -07:00
Martin K. Petersen	7878cba9f0	block: Create bip slabs with embedded integrity vectors This patch restores stacking ability to the block layer integrity infrastructure by creating a set of dedicated bip slabs. Each bip slab has an embedded bio_vec array at the end. This cuts down on memory allocations and also simplifies the code compared to the original bvec version. Only the largest bip slab is backed by a mempool. The pool is contained in the bio_set so stacking drivers can ensure forward progress. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.(none)>	2009-07-01 10:56:25 +02:00
Wolfgang Illmeyer	752fa51e4c	hostfs: set maximum filesize in superblock for proper LFS support Maximum file size for hostfs mounts defaults to 2GB, so bigger files cannot be read/written through hostfs. This patch initializes the maximum file size to MAX_LFS_SIZE. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13531 Signed-off-by: Wolfgang Illmeyer <wolfgang@illmeyer.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:56:03 -07:00
Bryan Donlan	4d6c13f87d	ext2: return -EIO not -ESTALE on directory traversal through deleted inode ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to report errors to NFS properly. However, in ext[234]_lookup(), this -ESTALE can be propagated to userspace if the filesystem is corrupted such that a directory entry references a deleted inode. This leads to a misleading error message - "Stale NFS file handle" - and confusion on the part of the admin. The bug can be easily reproduced by creating a new filesystem, making a link to an unused inode using debugfs, then mounting and attempting to ls -l said link. This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE from ext2_iget(), as ext2 does for other filesystem metadata corruption; and also invokes the appropriate ext*_error functions when this case is detected. Signed-off-by: Bryan Donlan <bdonlan@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:56:00 -07:00
KAMEZAWA Hiroyuki	341c87bf34	elf: limit max map count to safe value With ELF, at generating coredump, some more headers other than used vmas are added. When max_map_count == 65536, a core generated by following kinds of code can be unreadable because the number of ELF's program header is written in 16bit in Ehdr (please see elf.h) and the number overflows. == ... = mmap(); (munmap, mprotect, etc...) if (failed) abort(); == This can happen in mmap/munmap/mprotect/etc...which calls split_vma(). I think 65536 is not safe as _default_ and reduce it to 65530 is good for avoiding unexpected corrupted core. Anyway, max_map_count can be enlarged by sysctl if a user is brave.. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Jakub Jelinek <jakub@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:55:59 -07:00
Davide Libenzi	133890103b	eventfd: revised interface and cleanups Change the eventfd interface to de-couple the eventfd memory context, from the file pointer instance. Without such change, there is no clean way to racely free handle the POLLHUP event sent when the last instance of the file* goes away. Also, now the internal eventfd APIs are using the eventfd context instead of the file*. This patch is required by KVM's IRQfd code, which is still under development. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Gregory Haskins <ghaskins@novell.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:55:58 -07:00
Jiri Slaby	f7c2df9b55	AFS: Fix lock imbalance Don't unlock on vfs_rejected_lock path in afs_do_setlk, since the lock is unlocked after abort_attempt label. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 13:30:44 -07:00
John Muir	3b463ae0c6	fuse: invalidation reverse calls Add notification messages that allow the filesystem to invalidate VFS caches. Two notifications are added: 1) inode invalidation - invalidate cached attributes - invalidate a range of pages in the page cache (this is optional) 2) dentry invalidation - try to invalidate a subtree in the dentry cache Care must be taken while accessing the 'struct super_block' for the mount, as it can go away while an invalidation is in progress. To prevent this, introduce a rw-semaphore, that is taken for read during the invalidation and taken for write in the ->kill_sb callback. Cc: Csaba Henk <csaba@gluster.com> Cc: Anand Avati <avati@zresearch.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-06-30 20:12:24 +02:00
Miklos Szeredi	e0a43ddcc0	fuse: allow umask processing in userspace This patch lets filesystems handle masking the file mode on creation. This is needed if filesystem is using ACLs. - The CREATE, MKDIR and MKNOD requests are extended with a "umask" parameter. - A new FUSE_DONT_MASK flag is added to the INIT request/reply. With this the filesystem may request that the create mode is not masked. CC: Jean-Pierre André <jean-pierre.andre@wanadoo.fr> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-06-30 20:12:23 +02:00
Miklos Szeredi	201fa69a28	fuse: fix bad return value in fuse_file_poll() Fix fuse_file_poll() which returned a -errno value instead of a poll mask. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org	2009-06-30 20:06:24 +02:00
Csaba Henk	b4c458b3a2	fuse: fix return value of fuse_dev_write() On 64 bit systems -- where sizeof(ssize_t) > sizeof(int) -- the following test exposes a bug due to a non-careful return of an int or unsigned value: implement a FUSE filesystem which sends an unsolicited notification to the kernel with invalid opcode. The respective write to /dev/fuse will return (1 << 32) - EINVAL with errno == 0 instead of -1 with errno == EINVAL. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org	2009-06-30 20:06:23 +02:00
James Morris	ac7242142b	Merge branch 'master' into next	2009-06-30 09:10:35 +10:00
Mimi Zohar	94e5d714f6	integrity: add ima_counts_put (updated) This patch fixes an imbalance message as reported by J.R. Okajima. The IMA file counters are incremented in ima_path_check. If the actual open fails, such as ETXTBSY, decrement the counters to prevent unnecessary imbalance messages. Reported-by: J.R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-06-29 08:59:10 +10:00
Jeff Layton	f0a71eb820	cifs: fix fh_mutex locking in cifs_reopen_file Fixes a regression caused by commit `a6ce4932fb` When this lock was converted to a mutex, the locks were turned into unlocks and vice-versa. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com> Cc: Stable Tree <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-27 23:46:43 +00:00
Linus Torvalds	aada1bc927	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] remove unknown mount option warning message [CIFS] remove bkl usage from umount begin cifs: Fix incorrect return code being printed in cFYI messages [CIFS] cleanup asn handling for ntlmssp [CIFS] Copy struct after setting the port, instead of before. cifs: remove rw/ro options cifs: fix problems with earlier patches cifs: have cifs parse scope_id out of IPv6 addresses and use it [CIFS] Do not send tree disconnect if session is already disconnected [CIFS] Fix build break cifs: display scopeid in /proc/mounts cifs: add new routine for converting AF_INET and AF_INET6 addrs cifs: have cifs_show_options show forceuid/forcegid options cifs: remove unneeded NULL checks from cifs_show_options	2009-06-26 09:37:19 -07:00
Steve French	71a394faaa	[CIFS] remove unknown mount option warning message Jeff's previous patch which removed the unneeded rw/ro parsing can cause a minor warning in dmesg (about the unknown rw or ro mount option) at mount time. This patch makes cifs ignore them in kernel to remove the warning (they are already handled in the mount helper and VFS). Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-26 04:07:18 +00:00
Steve French	ad8034f197	[CIFS] remove bkl usage from umount begin The lock_kernel call moved into the fs for umount_begin is not needed. This adds a check to make sure we don't call umount_begin twice on the same fs. umount_begin for cifs is probably not needed and may eventually be able to be removed, but in the meantime this smaller patch is safe and gets rid of the bkl from this path which provides some benefit. Acked-by: Jeff Layton <redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-26 03:25:49 +00:00
Suresh Jayaraman	0f3bc09ee1	cifs: Fix incorrect return code being printed in cFYI messages FreeXid() along with freeing Xid does add a cifsFYI debug message that prints rc (return code) as well. In some code paths where we set/return error code after calling FreeXid(), incorrect error code is being printed when cifsFYI is enabled. This could be misleading in few cases. For eg. In cifs_open() if cifs_fill_filedata() returns a valid pointer to cifsFileInfo, FreeXid() prints rc=-13 whereas 0 is actually being returned. Fix this by setting rc before calling FreeXid(). Basically convert FreeXid(xid); rc = -ERR; return -ERR; => FreeXid(xid); return rc; [Note that Christoph would like to replace the GetXid/FreeXid calls, which are primarily used for debugging. This seems like a good longer term goal, but although there is an alternative tracing facility, there are no examples yet available that I know of that we can use (yet) to convert this cifs function entry/exit logging, and for creating an identifier that we can use to correlate all dmesg log entries for a particular vfs operation (ie identify all log entries for a particular vfs request to cifs: e.g. a particular close or read or write or byte range lock call ... and just using the thread id is harder). Eventually when a replacement for this is available (e.g. when NFS switches over and various samples to look at in other file systems) we can remove the GetXid/FreeXid macro but in the meantime multiple people use this run time configurable logging all the time for debugging, and Suresh's patch fixes a problem which made it harder to notice some low memory problems in the log so it is worthwhile to fix this problem until a better logging approach is able to be used] Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-25 19:12:57 +00:00
Steve French	f46c7234e4	[CIFS] cleanup asn handling for ntlmssp Also removes obsolete distinction between rawntlmssp and ntlmssp (in asn/SPNEGO) since as jra noted we can always send raw ntlmssp in session setup now. remove check for experimental runtime flag (/proc/fs/cifs/Experimental) in ntlmssp path. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-25 03:07:48 +00:00
Simo Leone	6debdbc0ba	[CIFS] Copy struct after setting the port, instead of before. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Simo Leone <simo@archlinux.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-25 02:44:43 +00:00
Jeff Layton	6459340cfc	cifs: remove rw/ro options cifs: remove rw/ro options These options are handled at the VFS layer. They only ever set the option in the smb_vol struct. Nothing was ever done with them afterward anyway. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-25 02:33:01 +00:00
Jeff Layton	b48a485884	cifs: fix problems with earlier patches cifs: fix problems with earlier patches cifs_show_address hasn't been introduced yet, and fix a typo that was silently fixed by a later patch in the series. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-25 02:32:57 +00:00
Jeff Layton	681bf72e48	cifs: have cifs parse scope_id out of IPv6 addresses and use it This patch has CIFS look for a '%' in an IPv6 address. If one is present then it will try to treat that value as a numeric interface index suitable for stuffing into the sin6_scope_id field. This should allow people to mount servers on IPv6 link-local addresses. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: David Holder <david@erion.co.uk> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-25 01:14:36 +00:00
Steve French	268875b9d1	[CIFS] Do not send tree disconnect if session is already disconnected Noticed this when tree connect timed out (due to Samba server crash) - we try to send a tree disconnect for a tid that does not exist since we don't have a valid tree id yet. This checks that the session is valid before sending the tree disconnect to handle this case. Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-25 00:29:21 +00:00
Al Viro	d5bb68adda	another race fix in jfs_check_acl() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 17:02:42 -04:00
Al Viro	72c04902d1	Get "no acls for this inode" right, fix shmem breakage Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 16:58:48 -04:00
Linus Torvalds	936940a9c7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits) switch xfs to generic acl caching helpers helpers for acl caching + switch to those switch shmem to inode->i_acl switch reiserfs to inode->i_acl switch reiserfs to usual conventions for caching ACLs reiserfs: minimal fix for ACL caching switch nilfs2 to inode->i_acl switch btrfs to inode->i_acl switch jffs2 to inode->i_acl switch jfs to inode->i_acl switch ext4 to inode->i_acl switch ext3 to inode->i_acl switch ext2 to inode->i_acl add caching of ACLs in struct inode fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls cleanup __writeback_single_inode ... and the same for vfsmount id/mount group id Make allocation of anon devices cheaper update Documentation/filesystems/Locking devpts: remove module-related code ...	2009-06-24 10:03:12 -07:00
Linus Torvalds	d7ed9c05eb	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: remove redundant tests on unsigned udf: Use device size when drive reported bogus number of written blocks	2009-06-24 09:57:10 -07:00
Oleg Nesterov	a893a84e87	mm_for_maps: simplify, use ptrace_may_access() It would be nice to kill __ptrace_may_access(). It requires task_lock(), but this lock is only needed to read mm->flags in the middle. Convert mm_for_maps() to use ptrace_may_access(), this also simplifies the code a little bit. Also, we do not need to take ->mmap_sem in advance. In fact I think mm_for_maps() should not play with ->mmap_sem at all, the caller should take this lock. With or without this patch, without ->cred_guard_mutex held we can race with exec() and get the new ->mm but check old creds. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-06-25 00:20:58 +10:00
Al Viro	1cbd20d820	switch xfs to generic acl caching helpers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:07 -04:00
Al Viro	073aaa1b14	helpers for acl caching + switch to those helpers: get_cached_acl(inode, type), set_cached_acl(inode, type, acl), forget_cached_acl(inode, type). ubifs/xattr.c needed includes reordered, the rest is a plain switchover. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:07 -04:00
Al Viro	281eede032	switch reiserfs to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:06 -04:00
Al Viro	7a77b15d92	switch reiserfs to usual conventions for caching ACLs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:06 -04:00
Al Viro	e68888bcb6	reiserfs: minimal fix for ACL caching reiserfs uses NULL as "unknown" and ERR_PTR(-ENODATA) as "no ACL"; several codepaths store the former instead of the latter. All those codepaths go through iset_acl() and all cases when it's called with NULL acl are for the second variety, so the minimal fix is to teach iset_acl() to deal with that. Proper fix is to switch to more usual conventions and avoid back and forth between internally used ERR_PTR(-ENODATA) and NULL expected by the rest of the kernel. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:05 -04:00
Al Viro	d441b1c293	switch nilfs2 to inode->i_acl Actually, get rid of private analog, since nothing in there is using ACLs at all so far. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:05 -04:00
Al Viro	5affd88a10	switch btrfs to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:05 -04:00
Al Viro	290c263bf8	switch jffs2 to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:05 -04:00
Al Viro	05fc0790b6	switch jfs to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:04 -04:00
Al Viro	d4bfe2f76d	switch ext4 to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:04 -04:00
Al Viro	6582a0e6f6	switch ext3 to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:04 -04:00
Al Viro	5e78b43568	switch ext2 to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:28 -04:00
Al Viro	f19d4a8fa6	add caching of ACLs in struct inode No helpers, no conversions yet. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:27 -04:00
Ankit Jain	3e63cbb1ef	fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls This patch adds ioctls to vfs for compatibility with legacy XFS pre-allocation ioctls (XFS_IOC_RESVP). The implementation effectively invokes sys_fallocate for the new ioctls. Also handles the compat_ioctl case. Note: These legacy ioctls are also implemented by OCFS2. [AV: folded fixes from hch] Signed-off-by: Ankit Jain <me@ankitjain.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:27 -04:00
Christoph Hellwig	01c031945f	cleanup __writeback_single_inode There is no reason to for the split between __writeback_single_inode and __sync_single_inode, the former just does a couple of checks before tail-calling the latter. So merge the two, and while we're at it split out the I_SYNC waiting case for data integrity writers, as it's logically separate function. Finally rename __writeback_single_inode to writeback_single_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:26 -04:00
Al Viro	f21f62208a	... and the same for vfsmount id/mount group id Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:26 -04:00
Al Viro	c63e09eccc	Make allocation of anon devices cheaper Standard trick - add a new variable (start) such that for each n < start n is known to be busy. Allocation can skip checking everything in [0..start) and if it returns n, we can set start to n + 1. Freeing below start sets start to what we'd just freed. Of course, it still sucks if we do something like free 0 allocate allocate in a loop - still O(n^2) time. However, on saner loads it improves the things a lot and the entire thing is not worth the trouble of switching to something with better worst-case behaviour. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:25 -04:00
H. Peter Anvin	f6cc746bbb	devpts: remove module-related code These days, the devpts filesystem is closely integrated with the pty memory management, and cannot be built as a module, even less removed from the kernel. Accordingly, remove all module-related stuff from this filesystem. [ v2: only remove code that's actually dead ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:24 -04:00
Trond Myklebust	3b22edc573	VFS: Switch init_mount_tree() to use the new create_mnt_ns() helper Eliminates some duplicated code... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:24 -04:00
J. R. Okajima	654f562c52	vfs: fix nd->root leak in do_filp_open() commit `2a73787110` "Cache root in nameidata" introduced a new member nd->root, but forgot to put it in do_filp_open(). Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:24 -04:00
Christoph Hellwig	b5450d9c84	reiserfs: remove stray unlock_super in reiserfs_resize Reiserfs doesn't use lock_super anywhere internally, and ->remount_fs which calls reiserfs_resize does have it currently but also expects it to be held on return, so there's no business for the unlock_super here. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked by Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:24 -04:00
Roel Kluin	3391faa4f1	udf: remove redundant tests on unsigned first_block and goal are unsigned. When negative they are wrapped and caught by the other test. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-06-24 13:48:28 +02:00
Linus Torvalds	cf5434e894	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define. ocfs2: Add lockdep annotations vfs: Set special lockdep map for dirs only if not set by fs ocfs2: Disable orphan scanning for local and hard-ro mounts ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() ocfs2: Stop orphan scan as early as possible during umount ocfs2: Fix ocfs2_osb_dump() ocfs2: Pin journal head before accessing jh->b_committed_data ocfs2: Update atime in splice read if necessary. ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.	2009-06-23 19:36:02 -07:00
Trond Myklebust	b88f8a546f	NFS: Correct the NFS mount path when following a referral Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	0b75b35c7c	NFS: Fix nfs_path() to always return a '/' at the beginning of the path Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	c02d7adf8c	NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace As noted in the previous patch, the NFSv4 client mount code currently has several limitations. If the mount path contains symlinks, or referrals, or even if it just contains a '..', then the client code in nfs4_path_walk() will fail with an error. This patch replaces the nfs4_path_walk()-based lookup with a helper function that sets up a private namespace to represent the namespace on the server, then uses the ordinary VFS and NFS path lookup code to walk down the mount path in that namespace. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	cf8d2c11cb	VFS: Add VFS helper functions for setting up private namespaces The purpose of this patch is to improve the remote mount path lookup support for distributed filesystems such as the NFSv4 client. When given a mount command of the form "mount server:/foo/bar /mnt", the NFSv4 client is required to look up the filehandle for "server:/", and then look up each component of the remote mount path "foo/bar" in order to find the directory that is actually going to be mounted on /mnt. Following that remote mount path may involve following symlinks, crossing server-side mount points and even following referrals to filesystem volumes on other servers. Since the standard VFS path lookup code already supports walking paths that contain all these features (using in-kernel automounts for following referrals) we would like to be able to reuse that rather than duplicate the full path traversal functionality in the NFSv4 client code. This patch therefore defines a VFS helper function create_mnt_ns(), that sets up a temporary filesystem namespace and attaches a root filesystem to it. It exports the create_mnt_ns() and put_mnt_ns() function for use by filesystem modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	616511d039	VFS: Uninline the function put_mnt_ns() In order to allow modules to use it without having to export vfsmount_lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
David Woodhouse	4839641333	jffs2: fix another potential leak on error path in scan.c Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-06-23 01:34:19 +01:00
Linus Torvalds	ac1b7c378e	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (63 commits) mtd: OneNAND: Allow setting of boundary information when built as module jffs2: leaking jffs2_summary in function jffs2_scan_medium mtd: nand: Fix memory leak on txx9ndfmc probe failure. mtd: orion_nand: use burst reads with double word accesses mtd/nand: s3c6400 support for s3c2410 driver [MTD] [NAND] S3C2410: Use DIV_ROUND_UP [MTD] [NAND] S3C2410: Deal with unaligned lengths in S3C2440 buffer read/write [MTD] [NAND] S3C2410: Allow the machine code to get the BBT table from NAND [MTD] [NAND] S3C2410: Added a kerneldoc for s3c2410_nand_set mtd: physmap_of: Add multiple regions and concatenation support mtd: nand: max_retries off by one in mxc_nand mtd: nand: s3c2410_nand_setrate(): use correct macros for 2412/2440 mtd: onenand: add bbt_wait & unlock_all as replaceable for some platform mtd: Flex-OneNAND support mtd: nand: add OMAP2/OMAP3 NAND driver mtd: maps: Blackfin async: fix memory leaks in probe/remove funcs mtd: uclinux: mark local stuff static mtd: uclinux: do not allow to be built as a module mtd: uclinux: allow systems to override map addr/size mtd: blackfin NFC: fix hang when using NAND on BF527-EZKITs ...	2009-06-22 16:56:22 -07:00
Tao Ma	d246ab307d	ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define. Actually ocfs2_sysfile_cluster_lock_key is only used if we enable CONFIG_DEBUG_LOCK_ALLOC. Wrap it so that we can avoid a building warning. fs/ocfs2/sysfile.c:53: warning: ‘ocfs2_sysfile_cluster_lock_key’ defined but not used Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:34:29 -07:00
Jan Kara	cb25797d45	ocfs2: Add lockdep annotations Add lockdep support to OCFS2. The support also covers all of the cluster locks except for open locks, journal locks, and local quotafile locks. These are special because they are acquired for a node, not for a particular process and lockdep cannot deal with such type of locking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:34:26 -07:00
Jan Kara	9a7aa12f39	vfs: Set special lockdep map for dirs only if not set by fs Some filesystems need to set lockdep map for i_mutex differently for different directories. For example OCFS2 has system directories (for orphan inode tracking and for gathering all system files like journal or quota files into a single place) which have different locking locking rules than standard directories. For a filesystem setting lockdep map is naturaly done when the inode is read but we have to modify unlock_new_inode() not to overwrite the lockdep map the filesystem has set. Acked-by: peterz@infradead.org CC: mingo@redhat.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:34:22 -07:00
Sunil Mushran	df152c241d	ocfs2: Disable orphan scanning for local and hard-ro mounts Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:55 -07:00
Sunil Mushran	3211949f89	ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() We don't access the LVB in our ocfs2_*_lock_res_init() functions. Since the LVB can become invalid during some cluster recovery operations, the dlmglue must be able to handle an uninitialized LVB. For the orphan scan lock, we initialized an uninitialzed LVB with our scan sequence number plus one. This starts a normal orphan scan cycle. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:53 -07:00
Sunil Mushran	692684e19e	ocfs2: Stop orphan scan as early as possible during umount Currently if the orphan scan fires a tick before the user issues the umount, the umount will wait for the queued orphan scan tasks to complete. This patch makes the umount stop the orphan scan as early as possible so as to reduce the probability of the queued tasks slowing down the umount. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:51 -07:00
Sunil Mushran	c3d38840ab	ocfs2: Fix ocfs2_osb_dump() Skip printing information that is not valid for local mounts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:49 -07:00
Sunil Mushran	94e41ecfe0	ocfs2: Pin journal head before accessing jh->b_committed_data This patch adds jbd_lock_bh_state() and jbd_unlock_bh_state() around accessses to jh->b_committed_data. Fixes oss bugzilla#1131 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1131 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:47 -07:00
Tao Ma	1962f39abb	ocfs2: Update atime in splice read if necessary. We should call ocfs2_inode_lock_atime instead of ocfs2_inode_lock in ocfs2_file_splice_read like we do in ocfs2_file_aio_read so that we can update atime in splice read if necessary. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:45 -07:00
Joel Becker	1c520dfbf3	ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API. The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and the DLM cannot reconstruct its state. Clients of the DLM need to know this. ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses track of the state. This is not a standard behavior, but ocfs2 has always relied on it. Thus, an o2dlm LVB is always "valid". ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When fs/dlm loses track of an LVBs state, it sets a flag (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of the LVB may be garbage or merely stale. ocfs2 doesn't want to try to guess at the validity of the stale LVB. Instead, it should be checking the VALNOTVALID flag. As this is the 'standard' way of treating LVBs, we will promote this behavior. We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when the LVB is valid. o2dlm will always return valid, while fs/dlm will check VALNOTVALID. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com>	2009-06-22 14:24:30 -07:00
Linus Torvalds	7e0338c0de	Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd * 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits) SUNRPC: Fix the TCP server's send buffer accounting nfsd41: Backchannel: minorversion support for the back channel nfsd41: Backchannel: cleanup nfs4.0 callback encode routines nfsd41: Remove ip address collision detection case nfsd: optimise the starting of zero threads when none are running. nfsd: don't take nfsd_mutex twice when setting number of threads. nfsd41: sanity check client drc maxreqs nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct NFS: kill off complicated macro 'PROC' sunrpc: potential memory leak in function rdma_read_xdr nfsd: minor nfsd_vfs_write cleanup nfsd: Pull write-gathering code out of nfsd_vfs_write nfsd: track last inode only in use_wgather case sunrpc: align cache_clean work's timer nfsd: Use write gathering only with NFSv2 NFSv4: kill off complicated macro 'PROC' NFSv4: do exact check about attribute specified knfsd: remove unreported filehandle stats counters knfsd: fix reply cache memory corruption knfsd: reply cache cleanups ...	2009-06-22 12:55:50 -07:00
Linus Torvalds	df36b439c5	Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (128 commits) nfs41: sunrpc: xprt_alloc_bc_request() should not use spin_lock_bh() nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res nfs: remove unnecessary NFS_INO_INVALID_ACL checks NFS: More "sloppy" parsing problems NFS: Invalid mount option values should always fail, even with "sloppy" NFS: Remove unused XDR decoder functions NFS: Update MNT and MNT3 reply decoding functions NFS: add XDR decoder for mountd version 3 auth-flavor lists NFS: add new file handle decoders to in-kernel mountd client NFS: Add separate mountd status code decoders for each mountd version NFS: remove unused function in fs/nfs/mount_clnt.c NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument NFS: Clean up MNT program definitions lockd: Don't bother with RPC ping for NSM upcalls lockd: Update NSM state from SM_MON replies NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available NFS: Return error code from nfs_callback_up() to user space NFS: Do not display the setting of the "intr" mount option NFS: add support for splice writes nfs41: Backchannel: CB_SEQUENCE validation ...	2009-06-22 12:53:06 -07:00
Hitoshi Mitake	e38be994b9	Making fs/minix/minix.h double including safe I happened to find that fs/minix/minix.h doesn't guard double include. Yes, I know this never cause something destructive because this is self-evidence that no source file includes minix.h twice, but I think fixing this is better than disregarding it. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 11:34:42 -07:00
Boaz Harrosh	baaf94cdc7	exofs: Avoid using file_fsync() The use of file_fsync() in exofs_file_sync() is not necessary since it does some extra stuff not used by exofs. Open code just the parts that are currently needed. TODO: Farther optimization can be done to sync the sb only on inode update of new files, Usually the sb update is not needed in exofs. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-06-21 17:53:47 +03:00
Boaz Harrosh	27d2e14919	exofs: Remove IBM copyrights Boaz, Congrats on getting all the OSD stuff into 2.6.30! I just pulled the git, and saw that the IBM copyrights are still there. Please remove them from all files: * Copyright (C) 2005, 2006 * International Business Machines IBM has revoked all rights on the code - they gave it to me. Thanks! Avishay Signed-off-by: Avishay Traeger <avishay@gmail.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-06-21 17:53:47 +03:00
Boaz Harrosh	b76a3f93d0	exofs: Fix bio leak in error handling path (sync read) When failing a read request in the sync path, called from write_begin, I forgot to free the allocated bio, fix it. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-06-21 17:53:44 +03:00
Benny Halevy	578e458568	nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res nfs4_open_recover_helper clears opendata->o_res before calling nfs4_init_opendata_res, thus causing NFSv4.0 OPEN operations to be sent rather than nfsv4.1. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-20 14:55:12 -04:00
Linus Torvalds	12e24f34cb	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) perfcounter: Handle some IO return values perf_counter: Push perf_sample_data through the swcounter code perf_counter tools: Define and use our own u64, s64 etc. definitions perf_counter: Close race in perf_lock_task_context() perf_counter, x86: Improve interactions with fast-gup perf_counter: Simplify and fix task migration counting perf_counter tools: Add a data file header perf_counter: Update userspace callchain sampling uses perf_counter: Make callchain samples extensible perf report: Filter to parent set by default perf_counter tools: Handle lost events perf_counter: Add event overlow handling fs: Provide empty .set_page_dirty() aop for anon inodes perf_counter: tools: Makefile tweaks for 64-bit powerpc perf_counter: powerpc: Add processor back-end for MPC7450 family perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels perf_counter: powerpc: Change how processor-specific back-ends get selected perf_counter: powerpc: Use unsigned long for register and constraint values perf_counter: powerpc: Enable use of software counters on 32-bit powerpc perf_counter tools: Add and use isprint() ...	2009-06-20 11:29:32 -07:00
OGAWA Hirofumi	3e107603ae	fat: Fix the removal of opts->fs_dmask (`ce3b0f8d5c`: New helper - current_umask()) is removing the opts->fs_dmask, probably it's a cut-and-paste miss or something. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>	2009-06-20 21:50:47 +09:00
Linus Torvalds	bee89ab228	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: inotify_destroy_mark_entry could get called twice	2009-06-19 17:46:44 -07:00
Linus Torvalds	31583d6acf	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Fix kernel-doc parameter name typo in blk-settings.c: block: rename CONFIG_LBD to CONFIG_LBDAF block: Fix bounce_pfn setting hd: stop defining MAJOR_NR	2009-06-19 17:43:04 -07:00
Eric Paris	528da3e9e2	inotify: inotify_destroy_mark_entry could get called twice inotify_destroy_mark_entry could get called twice for the same mark since it is called directly in inotify_rm_watch and when the mark is being destroyed for another reason. As an example assume that the file being watched was just deleted so inotify_destroy_mark_entry would get called from the path fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() -> fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry(). If this happened at the same time as userspace tried to remove a watch via inotify_rm_watch we could attempt to remove the mark from the idr twice and could thus double dec the ref cnt and potentially could be in a use after free/double free situation. The fix is to have inotify_rm_watch use the generic recursive safe fsnotify_destroy_mark_by_entry() so we are sure the inotify_destroy_mark_entry() function can only be called one. This patch also renames the function to inotify_ingored_remove_idr() so it is clear what is actually going on in the function. Hopefully this fixes: [ 20.342058] idr_remove called for id=20 which is not allocated. [ 20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077 [ 20.353933] Call Trace: [ 20.356410] [<ffffffff811a82b7>] idr_remove+0x115/0x18f [ 20.361737] [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75 [ 20.367061] [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf [ 20.373771] [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf [ 20.380306] [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10 [ 20.386238] [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170 [ 20.393293] [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf [ 20.399829] [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6 [ 20.405850] [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Eric Paris <eparis@redhat.com> Tested-by: Peter Ziljlstra <peterz@infradead.org>	2009-06-19 12:42:48 -04:00
Bartlomiej Zolnierkiewicz	90c699a9ee	block: rename CONFIG_LBD to CONFIG_LBDAF Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-19 08:08:50 +02:00
Andy Adamson	ab52ae6db0	nfsd41: Backchannel: minorversion support for the back channel Prepare to share backchannel code with NFSv4.1. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [nfsd41: use nfsd4_cb_sequence for callback minorversion] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-18 18:33:57 -07:00
Andy Adamson	ef52bff840	nfsd41: Backchannel: cleanup nfs4.0 callback encode routines Mimic the client and prepare to share the back channel xdr with NFSv4.1. Bump the number of operations in each encode routine, then backfill the number of operations. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-18 18:33:57 -07:00
Trond Myklebust	1f84603c09	Merge branch 'devel-for-2.6.31' into for-2.6.31 Conflicts: fs/nfs/client.c fs/nfs/super.c	2009-06-18 18:13:44 -07:00
Mike Sager	6ddbbbfe52	nfsd41: Remove ip address collision detection case Verified that cthon and pynfs exchange id tests pass (except for the two expected fails: EID8 and EID50) Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-18 17:43:53 -07:00
Linus Torvalds	0732f87761	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: clean up jbd2_journal_try_to_free_buffers() ext4: Don't update ctime for non-extent-mapped inodes ext4: Fix up whitespace issues in fs/ext4/inode.c ext4: Fix 64-bit block type problem on 32-bit platforms ext4: teach the inode allocator to use a goal inode number ext4: Use a hash of the topdir directory name for the Orlov parent group ext4: document the "abort" mount option ext4: move the abort flag from s_mount_opts to s_mount_flags ext4: update the s_last_mounted field in the superblock ext4: change s_mount_opt to be an unsigned int ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl ext4: avoid unnecessary spinlock in critical POSIX ACL path ext3: avoid unnecessary spinlock in critical POSIX ACL path ext4: convert instrumentation from markers to tracepoints jbd2: convert instrumentation from markers to tracepoints	2009-06-18 14:07:46 -07:00
Peter Oberparleiter	0b923606e7	seq_file: add function to write binary data seq_write() can be used to construct seq_files containing arbitrary data. Required by the gcov-profiling interface to synthesize binary profiling data files. Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:57 -07:00
Oleg Nesterov	3b34fc5880	elf_core_dump: use rcu_read_lock() to access ->real_parent In theory it is not safe to dereference ->parent/real_parent without tasklist or rcu lock, we can race with re-parenting. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:52 -07:00
Jeff Mahoney	1d965fe0eb	reiserfs: fix warnings with gcc 4.4 Several code paths in reiserfs have a construct like: if (is_direntry_le_ih(ih = B_N_PITEM_HEAD(src, item_num))) ... which, in addition to being ugly, end up causing compiler warnings with gcc 4.4.0. Previous compilers didn't issue a warning. fs/reiserfs/do_balan.c:1273: warning: operation on `aux_ih' may be undefined fs/reiserfs/lbalance.c:393: warning: operation on `ih' may be undefined fs/reiserfs/lbalance.c:421: warning: operation on `ih' may be undefined fs/reiserfs/lbalance.c:777: warning: operation on `ih' may be undefined I believe this is due to the ih being passed to macros which evaluate the argument more than once. This is old code and we haven't seen any problems with it, but this patch eliminates the warnings. It converts the multiple evaluation macros to static inlines and does a preassignment for the cases that were causing the warnings because that code is just ugly. Reported-by: Chris Mason <mason@oracle.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:46 -07:00
Roel Kluin	37044c86ba	ufs: sector_t cannot be negative unsigned i_block,fragment cannot be negative. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:46 -07:00
Jan Kara	5404ac8e44	isofs: cleanup mount option processing Remove unused variables from isofs_sb_info (used to be some mount options), unify variables for option to use 0/1 (some options used 'y'/'n'), use bit fields for option flags in superblock. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	5c4a656b7e	isofs: fix setting of uid and gid to 0 isofs allows setting of default uid and gid of files but value 0 was used to indicate that user did not specify any uid/gid mount option. Since this option also overrides uid/gid set in Rock Ridge extension, it makes sense to allow forcing uid/gid 0. Fix option processing to allow this. Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	52b680c812	isofs: let mode and dmode mount options override rock ridge mode setting So far, permissions set via 'mode' and/or 'dmode' mount options were effective only if the medium had no rock ridge extensions (or was mounted without them). Add 'overriderockmode' mount option to indicate that these options should override permissions set in rock ridge extensions. Maybe this should be default but the current behavior is there since mount options were created so I think we should not change how they behave. Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	ef43618a47	ext3: make sure inode is deleted from orphan list after truncate As Ted pointed out, it can happen that ext3_truncate() returns without removing inode from orphan list. This way we could in some rare cases (like when we get ENOMEM from an allocation in ext3_truncate called because of failed ext3_write_begin) leave the inode on orphan list and that triggers assertion failure on umount. So make ext3_truncate() always remove inode from in-memory orphan list. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Hisashi Hifumi	6f3f1cb21f	jbd: clean up journal_try_to_free_buffers() I delete the following patch "commit `3f31fddfa2` Author: Mingming Cao <cmm@us.ibm.com> Date: Fri Jul 25 01:46:22 2008 -0700 jbd: fix race between free buffer and commit transaction This patch is no longer needed because if race between freeing buffer and committing transaction functionality occurs and dio gets error, currently dio falls back to buffered IO by the following patch. commit `6ccfa806a9` Author: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Date: Tue Sep 2 14:35:40 2008 -0700 VFS: fix dio write returning EIO when try_to_release_page fails Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Theodore Tso <tytso@mit.edu> Cc: Mingming Cao <cmm@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	e8ef7aaea7	ext3: fix chain verification in ext3_get_blocks() Chain verification in ext3_get_blocks() has been hosed since it called verify_chain(chain, NULL) which always returns success. As a result readers could in theory race with truncate. On the other hand the race probably cannot happen with the current locking scheme, since by the time ext3_truncate() is called all the pages are already removed and hence get_block() shouldn't be called on such pages... Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	39fe7557b4	ext2: Do not update mtime of a moved directory One of our users is complaining that his backup tool is upset on ext2 (while it's happy on ext3, xfs, ...) because of the mtime change. The problem is: mkdir foo mkdir bar mkdir foo/a Now under ext2: mv foo/a foo/b changes mtime of 'foo/a' (foo/b after the move). That does not really make sense and it does not happen under any other filesystem I've seen. More complicated is: mv foo/a bar/a This changes mtime of foo/a (bar/a after the move) and it makes some sense since we had to update parent directory pointer of foo/a. But again, no other filesystem does this. So after some thoughts I'd vote for consistency and change ext2 to behave the same as other filesystems. Do not update mtime of a moved directory. Specs don't say anything about it (neither that it should, nor that it should not be updated) and other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do it. So let's become more consistent. Spotted by ronny.pretzsch@dfs.de, initial fix by Jörn Engel. Reported-by: <ronny.pretzsch@dfs.de> Cc: <hare@suse.de> Cc: Jörn Engel <joern@logfs.org> Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:44 -07:00
Cyrill Gorcunov	2f6d311080	proc: vmcore - use kzalloc in get_new_element() Instead of kmalloc+memset better use straight kzalloc Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:41 -07:00
Michal Simek	bcac2b1b7d	procfs: remove sparse errors in proc_devtree.c CHECK fs/proc/proc_devtree.c fs/proc/proc_devtree.c:197:14: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:203:34: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:210:14: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:223:26: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:226:14: warning: Using plain integer as NULL pointer Signed-off-by: Michal Simek <monstr@monstr.eu> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:41 -07:00
Davide Libenzi	3fe4a975d6	epoll: fix nested calls support This fixes a regression in 2.6.30. I unfortunately accepted a patch time ago, to drop the "current" usage from possible IRQ context, w/out proper thought over it. The patch switched to using the CPU id by bounding the nested call callback with a get_cpu()/put_cpu(). Unfortunately the ep_call_nested() function can be called with a callback that grabs sleepy locks (from own f_op->poll()), that results in epic fails. The following patch uses the proper "context" depending on the path where it is called, and on the kind of callback. This has been reported by Stefan Richter, that has also verified the patch is his previously failing environment. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:41 -07:00
Keika Kobayashi	d3d64df21d	proc: export statistics for softirq to /proc Export statistics for softirq in /proc/softirqs and /proc/stat. 1. /proc/softirqs Implement /proc/softirqs which shows the number of softirq for each CPU like /proc/interrupts. 2. /proc/stat Add the "softirq" line to /proc/stat. This line shows the number of softirq for all cpu. The first column is the total of all softirqs and each subsequent column is the total for particular softirq. [kosaki.motohiro@jp.fujitsu.com: remove redundant for_each_possible_cpu() loop] Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp> Reviewed-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:41 -07:00
David Teigland	c78a87d0a1	dlm: fix plock use-after-free Fix a regression from the original addition of nfs lock support `586759f03e`. When a synchronous (non-nfs) plock completes, the waiting thread will wake up and free the op struct. This races with the user thread in dev_write() which goes on to read the op's callback field to check if the lock is async and needs a callback. This check can happen on the freed op. The fix is to note the callback value before the op can be freed. Signed-off-by: David Teigland <teigland@redhat.com>	2009-06-18 13:42:42 -05:00
NeilBrown	671e1fcf63	nfsd: optimise the starting of zero threads when none are running. Currently, if we ask to set then number of nfsd threads to zero when there are none running, we set up all the sockets and register the service, and then tear it all down again. This is pointless. So detect that case and exit promptly. (also remove an assignment to 'error' which was never used. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com>	2009-06-18 09:42:41 -07:00
NeilBrown	82e12fe924	nfsd: don't take nfsd_mutex twice when setting number of threads. Currently when we write a number to 'threads' in nfsdfs, we take the nfsd_mutex, update the number of threads, then take the mutex again to read the number of threads. Mostly this isn't a big deal. However if we are write '0', and portmap happens to be dead, then we can get unpredictable behaviour. If the nfsd threads all got killed quickly and the last thread is waiting for portmap to respond, then the second time we take the mutex we will block waiting for the last thread. However if the nfsd threads didn't die quite that fast, then there will be no contention when we try to take the mutex again. Unpredictability isn't fun, and waiting for the last thread to exit is pointless, so avoid taking the lock twice. To achieve this, get nfsd_svc return a non-negative number of active threads when not returning a negative error. Signed-off-by: NeilBrown <neilb@suse.de>	2009-06-18 09:40:31 -07:00
Peter Zijlstra	d3a9262e59	fs: Provide empty .set_page_dirty() aop for anon inodes .set_page_dirty() is one of those a_ops that defaults to the buffer implementation when not set. Therefore provide a dummy function to make it do nothing. (Uncovered by perfcounters fd's which can now be writable-mmap-ed.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-18 14:46:10 +02:00
Jan Kara	24a5d59f34	udf: Use device size when drive reported bogus number of written blocks Some drives report 0 as the number of written blocks when there are some blocks recorded. Use device size in such case so that we can automagically mount such media. Signed-off-by: Jan Kara <jack@suse.cz>	2009-06-18 12:33:16 +02:00
James Morris	4bf259e3ae	nfs: remove unnecessary NFS_INO_INVALID_ACL checks Unless I'm mistaken, NFS_INO_INVALID_ACL is being checked twice during getacl calls (i.e. first via nfs_revalidate_inode() and then by each all site). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:14 -07:00
Chuck Lever	a5a16bae70	NFS: More "sloppy" parsing problems Specifying "port=-5" with the kernel's current mount option parser generates "unrecognized mount option". If "sloppy" is set, this causes the mount to succeed and use the default values; the desired behavior is that, since this is a valid option with an invalid value, the mount should fail, even with "sloppy." To properly handle "sloppy" parsing, we need to distinguish between correct options with invalid values, and incorrect options. We will need to parse integer values by hand, therefore, and not rely on match_token(). For instance, these must all fail with "invalid value": port=12345678 port=-5 port=samuel and not with "unrecognized option," as they do currently. Thus, for the sake of match_token() we need to treat the values for these options as strings, and do the conversion to integers using strict_strtol(). This is basically the same solution we used for the earlier "retry=" fix (commit `ecbb3845`), except in this case the kernel actually has to parse the value, rather than ignore it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:14 -07:00
Chuck Lever	d23c45fd84	NFS: Invalid mount option values should always fail, even with "sloppy" Ian Kent reports: "I've noticed a couple of other regressions with the options vers and proto option of mount.nfs(8). The commands: mount -t nfs -o vers=<invalid version> <server>:/<path> /<mountpoint> mount -t nfs -o proto=<invalid proto> <server>:/<path> /<mountpoint> both immediately fail. But if the "-s" option is also used they both succeed with the mount falling back to defaults (by the look of it). In the past these failed even when the sloppy option was given, as I think they should. I believe the sloppy option is meant to allow the mount command to still function for mount options (for example in shared autofs maps) that exist on other Unix implementations but aren't present in the Linux mount.nfs(8). So, an invalid value specified for a known mount option is different to an unknown mount option and should fail appropriately." See RH bugzilla 486266. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:13 -07:00
Chuck Lever	065015e5ef	NFS: Remove unused XDR decoder functions Clean up: Remove xdr_decode_fhstatus() and xdr_decode_fhstatus3(), now that they are unused. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:13 -07:00
Chuck Lever	8e02f6b9aa	NFS: Update MNT and MNT3 reply decoding functions Solder xdr_stream-based XDR decoding functions into the in-kernel mountd client that are more careful about checking data types and watching for buffer overflows. The new MNT3 decoder includes support for auth-flavor list decoding. The "_sz" macro for MNT3 replies was missing the size of the file handle. I've added this back, and included the size of the auth flavor array. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:13 -07:00
Chuck Lever	a14017db28	NFS: add XDR decoder for mountd version 3 auth-flavor lists Introduce an xdr_stream-based XDR decoder that can unpack the auth- flavor list returned in a MNT3 reply. The nfs_mount() function's caller allocates an array, and passes the size and a pointer to it. The decoder decodes all the flavors it can into the array, and returns the number of decoded flavors. If the caller is not interested in the auth flavors, it can pass a value of zero as the size of the pre-allocated array. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:12 -07:00
Chuck Lever	4fdcd9966d	NFS: add new file handle decoders to in-kernel mountd client Introduce xdr_stream-based XDR file handle decoders to the in-kernel mountd client. These are more careful than the existing decoder functions about buffer overflows and data type and range checking. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:12 -07:00
Chuck Lever	fb12529577	NFS: Add separate mountd status code decoders for each mountd version Introduce data structures and xdr_stream-based decoding functions for unmarshalling mountd status codes properly. Mountd version 3 uses specific standard error return codes that are not errno values and not NFS3ERR_ values. These have a well-defined standard mapping to local errno values. Introduce data structures and a decoder function that map these status codes to local errno values properly. This is new functionality (but not used yet). Version 1 mountd status values are defined by RFC 1094 as UNIX error values (errno values). Errno values on heterogeneous systems do not necessarily match each other. To avoid exposing possibly incorrect errno values to upper layers, the current XDR decoder converts all non-zero MNT version 1 status codes to -EACCES. The OpenGroup XNFS standard provides a mapping similar to but smaller than the version 3 error codes. Implement a decoder that uses the XNFS error codes, replacing the current decoder. For both mountd protocol versions, map unrecognized errors to -EACCES. Finally we introduce a replacement data structure for mnt_fhstatus at this time, which is used by the new XDR decoders. In addition to documenting that the status value returned by the XDR decoders is always an errno, this new structure will be expanded in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:12 -07:00
Chuck Lever	99835db430	NFS: remove unused function in fs/nfs/mount_clnt.c Clean up: remove xdr_encode_dirpath() now that it has been replaced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:11 -07:00
Chuck Lever	29a1bd6bf8	NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument Check the length of the supplied dirpath, and see that it fits properly in the RPC buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:11 -07:00
Chuck Lever	2ad780978b	NFS: Clean up MNT program definitions Clean up: Relocate MNT program procedure number definitions to the only file that uses them. Relocate the version number definitions, which are shared, to nfs.h. Remove duplicate program number definitions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:11 -07:00
Chuck Lever	0e5c2632e1	lockd: Don't bother with RPC ping for NSM upcalls Cut NSM upcall RPC traffic in half -- don't do a NULL call first. The cases where a ping would be helpful are rare. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:11 -07:00
Chuck Lever	6c9dc42551	lockd: Update NSM state from SM_MON replies When rpc.statd starts up in user space at boot time, it attempts to write the latest NSM local state number into /proc/sys/fs/nfs/nsm_local_state. If lockd.ko isn't loaded yet (as is the case in most configurations), that file doesn't exist, thus the kernel's NSM state remains set to its initial value of zero during lockd operation. This is a problem because rpc.statd and lockd use the NSM state number to prevent repeated lock recovery on rebooted hosts. If lockd sends a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state number is received, there is no way for lockd or rpc.statd to distinguish that stale SM_NOTIFY from an actual reboot. Thus lock recovery could be performed after the rebooted host has already started reclaiming locks, and those locks will be lost. We could change /etc/init.d/nfslock so it always modprobes lockd.ko before starting rpc.statd. However, if lockd.ko is ever unloaded and reloaded, we are back at square one, since the NSM state is not preserved across an unload/reload cycle. This may happen frequently on clients that use automounter. A period of NFS inactivity causes lockd.ko to be unloaded, and the kernel loses its NSM state setting. Instead, let's use the fact that rpc.statd plants the local system's NSM state in every SM_MON (and SM_UNMON) reply. lockd performs a synchronous SM_MON upcall to the local rpc.statd _before_ sending its first NLM request to a new remote. This would permit rpc.statd to provide the current NSM state to lockd, even after lockd.ko had been unloaded and reloaded. Note that NLMPROC_LOCK arguments are constructed before the nsm_monitor() call, so we have to rearrange argument construction very slightly to make this all work out. And, the kernel appears to treat NSM state as a u32 (see struct nlm_args and nsm_res). Make nsm_local_state a u32 as well, to ensure we don't get bogus comparison results. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:10 -07:00
Chuck Lever	18fc316419	NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available Clear "ret" if the error return from svc_create_xprt(AF_INET6) was -EAFNOSUPORT. Otherwise, callback start-up will succeed, but nfs_callback_up() will return -EAFNOSUPPORT anyway, and the first NFSv4 mount attempt after a reboot will fail. Bug introduced by commit `f738f517` in 2.6.30-rc1. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:10 -07:00
Chuck Lever	a21bdd9b96	NFS: Return error code from nfs_callback_up() to user space If the kernel cannot start the NFSv4 callback service during a mount request, it returns -ENOMEM to user space, resulting in this message: mount.nfs4: Cannot allocate memory Adjust nfs_alloc_client() and nfs_get_client() to pass NFSv4 callback start-up errors back to user space so a less mysterious error message can be displayed by the mount command. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:10 -07:00
Chuck Lever	c381ad2cf2	NFS: Do not display the setting of the "intr" mount option The "intr" mount option has been deprecated for a while, but /proc/mounts continues to display "nointr" whether "intr" or "nointr" has been specified for a mount point. Since these options do not have any effect, simply do not display them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:09 -07:00
Suresh Jayaraman	bf40d3435c	NFS: add support for splice writes Adds support for splice writes. It effectively calls generic_file_splice_write() to do the writes. We need not worry about O_APPEND case as the combination of splice() writes and O_APPEND is disallowed. This patch propagates NFS write errors back to the caller. The number of bytes written via splice are being added to NFSIO_NORMALWRITTENBYTES as these are effectively cached writes. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:09 -07:00
Trond Myklebust	301933a0ac	Merge commit 'linux-pnfs/nfs41-for-2.6.31' into nfsv41-for-2.6.31	2009-06-17 17:59:58 -07:00
Hisashi Hifumi	536fc240e7	jbd2: clean up jbd2_journal_try_to_free_buffers() This patch reverts `3f31fddf`, which is no longer needed because if a race between freeing buffer and committing transaction functionality occurs and dio gets error, currently dio falls back to buffered IO due to the commit `6ccfa806`. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Mingming Cao <cmm@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-17 20:08:51 -04:00
Ricardo Labiaga	68f3f90133	nfs41: Backchannel: CB_SEQUENCE validation Validates the callback's sessionID, the slot number, and the sequence ID. Increments the slot's sequence. Detects replays, but simply prints a debug message (if debugging is enabled since we don't yet implement a duplicate request cache for the backchannel. This should not present a problem, since only idempotent callbacks are currently implemented. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Backchannel: Be more obvious about the return value] [nfs41: Backchannel: dprink in host order] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:43 -07:00
Ricardo Labiaga	963891ac43	nfs41: Backchannel: New find_client_with_session() Finds the 'struct nfs_client' that matches the server's address, major version number, and session ID. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:43 -07:00
Ricardo Labiaga	f8625a6a4b	nfs41: Backchannel: Add a backchannel slot table to the session Defines a new 'struct nfs4_slot_table' in the 'struct nfs4_session' for use by the backchannel. Initializes, resets, and destroys the backchannel slot table in the same manner the forechannel slot table is initialized, reset, and destroyed. The sequenceid for each slot in the backchannel slot table is initialized to 0, whereas the forechannel slotid's sequenceid is set to 1. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:42 -07:00
Ricardo Labiaga	050047ce71	nfs41: Backchannel: Refactor nfs4_init_slot_table() Generalize nfs4_init_slot_table() so it can be used to initialize the backchannel slot table in addition to the forechannel slot table. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:42 -07:00
Ricardo Labiaga	b73dafa7ac	nfs41: Backchannel: Refactor nfs4_reset_slot_table() Generalize nfs4_reset_slot_table() so it can be used to reset the backchannel slot table in addition to the forechannel slot table. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:41 -07:00
Ricardo Labiaga	65fc64e547	nfs41: Backchannel: update cb_sequence args and results Change the type of cs_addr and csr_status to 'struct sockaddr' and '__be32' since the cb_sequence processing function will use existing functionality that expects these types. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:40 -07:00
Benny Halevy	281fe15dc1	nfs41: verify CB_SEQUENCE position in callback compound CB_SEQUENCE must appear first in the callback compound RPC. If it is not the first operation NFS4ERR_SEQUENCE_POS must be returned. If the first operation ni the CB_COMPOUND is not CB_SEQUENCE then NFS4ERR_OP_NOT_IN_SESSION must be returned. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: refactor op preprocessing out of process_op] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:39 -07:00
Benny Halevy	4aece6a19c	nfs41: cb_sequence xdr implementation [nfs41: get rid of READMEM and COPYMEM for callback_xdr.c] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: get rid of READ64 in callback_xdr.c] See http://linux-nfs.org/pipermail/pnfs/2009-June/007846.html Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:38 -07:00
Benny Halevy	d49433e1e3	nfs41: cb_sequence proc implementation Currently, just free up any referring calls information. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: fix csr_{,target}highestslotid] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:38 -07:00
Benny Halevy	2d9b9ec344	nfs41: cb_sequence protocol level data structures Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:37 -07:00
Benny Halevy	34bc47c941	nfs41: consider minorversion in callback_xdr:process_op Note that this patch changes the nfsv4.0 behavior also when CONFIG_NFS_V4_1 is not defined where NFS4ERR_MINOR_VERS_MISMATCH will be returned if the client received a CB_COMPOUND with minorversion != 0. Previously, it would have returned NFS4ERR_OP_ILLEGAL for CB_SEQUENCE. (or if the server is broken and sent OP_CB_GETATTR or OP_CB_RECALL with minorversion!=0, they would have been processed normally. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: refactor op preprocessing out of process_op] See http://linux-nfs.org/pipermail/pnfs/2009-June/007845.html [nfs41: define CB_NOTIFY_DEVICEID as not supported] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:37 -07:00
Benny Halevy	45377b94ed	nfs41: callback numbers definitions Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:36 -07:00
Benny Halevy	48a9e2d228	nfs41: decode minorversion 1 cb_compound header decode cb_compound header conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Get rid of cb_compound_hdr_arg.callback_ident callback_ident is not used anywhere so we shouldn't waste any memory to store it. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: no need to break read_buf in decode_compound_hdr_arg] See http://linux-nfs.org/pipermail/pnfs/2009-June/007844.html Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:35 -07:00
Benny Halevy	b8f2ef84b0	nfs41: store minorversion in cb_compound_hdr_arg Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:35 -07:00
Andy Adamson	5a0ffe544c	nfs41: Release backchannel resources associated with session Frees the preallocated backchannel resources that are associated with this session when the session is destroyed. A backchannel is currently created once per session. Destroy the backchannel only when the session is destroyed. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:34 -07:00
Andy Adamson	0f91421e8e	nfs41: Client indicates presence of NFSv4.1 callback channel. Set the SESSION4_BACK_CHAN flag to indicate the client supports a backchannel. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:33 -07:00
Andy Adamson	0b5b7ae0a8	nfs41: Setup the backchannel The NFS v4.1 callback service has already been setup, and rpc_xprt->serv points to the svc_serv structure describing it. Invoke the xprt_setup_backchannel() initialization to pre- allocate the necessary backchannel structures. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: change nfs4_put_session(nfs4_session*) to nfs4_destroy_session(nfs_session)] Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [moved xprt_setup_backchannel from nfs4_init_session to nfs4_init_backchannel] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:32 -07:00
Andy Adamson	e82dc22dac	nfs41: Allow NFSv4 and NFSv4.1 callback services to coexist Tracks the nfs_callback_info for both versions, enabling the callback service for v4 and v4.1 to run concurrently and be stopped independently of each other. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:32 -07:00
Benny Halevy	8f97524235	nfs41: create a svc_xprt for nfs41 callback thread and use for incoming callbacks Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:31 -07:00
Ricardo Labiaga	a43cde94fe	nfs41: Implement NFSv4.1 callback service process. nfs41_callback_up() initializes the necessary queues and creates the new nfs41_callback_svc thread. This thread executes the callback service which waits for requests to arrive on the svc_serv->sv_cb_list. NFS41_BC_MIN_CALLBACKS is set to 1 because we expect callbacks to not cause substantial latency. The actual processing of the callback will be implemented as a separate patch. There is only one NFSv4.1 callback service. The first caller of nfs4_callback_up() creates the service, subsequent callers increment a reference count on the service. The service is destroyed when the last caller invokes nfs_callback_down(). The transport needs to hold a reference to the callback service in order to invoke it during callback processing. Currently this reference is only obtained when the service is first created. This is incorrect, since subsequent registrations for other transports will leave the xprt->serv pointer uninitialized, leading to an oops when a callback arrives on the "unreferenced" transport. This patch fixes the problem by ensuring that a reference to the service is saved in xprt->serv, either because the service is created by this invocation to nfs4_callback_up() or by a prior invocation. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Add a reference to svc_serv during callback service bring up] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Type check arguments of nfs_callback_up] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: save svc_serv in nfs_callback_info] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Removal of ugly #ifdefs] [nfs41: Update to removal of ugly #ifdefs] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:29 -07:00
Trond Myklebust	5cd973c44a	NFSv4/NLM: Push file locking BKL dependencies down into the NLM layer Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 13:23:01 -07:00
Trond Myklebust	3f09df70e3	NFS: Ensure we always hold the BKL when dereferencing inode->i_flock Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 13:23:00 -07:00
Trond Myklebust	965b5d6791	NFSv4: Handle more errors when recovering open file and locking state It is possible for servers to return NFS4ERR_BAD_STATEID when the state management code is recovering locks or is reclaiming state when returning a delegation. Ensure that we handle that case. While we're at it, add in handlers for NFS4ERR_STALE, NFS4ERR_ADMIN_REVOKED, NFS4ERR_OPENMODE, NFS4ERR_DENIED and NFS4ERR_STALE_STATEID, since the protocol appears to allow for them too. Also handle ENOMEM... Finally, rather than add new NFSv4.0-specific errors and error handling into the generic delegation code, move that open file and locking state error handling into the NFSv4 layer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 13:22:59 -07:00
Trond Myklebust	d5122201a7	NFSv4: Move error handling out of the delegation generic code The NFSv4 delegation recovery code is required by the protocol to handle more errors. Rather than add NFSv4.0 specific errors into 'generic' delegation code, we should move the error handling into the NFSv4 layer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 13:22:58 -07:00
Trond Myklebust	01c3f05228	NFSv4: Fix the 'nolock' option regression NFSv4 should just ignore the 'nolock' option. It is an NFSv2/v3 thing... This fixes the Oops in http://bugzilla.kernel.org/show_bug.cgi?id=13330 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 13:22:58 -07:00
Benny Halevy	7146851376	nfs41: minorversion support for nfs4_{init,destroy}_callback move nfs4_init_callback into nfs4_init_client_minor_version and nfs4_destroy_callback into nfs4_clear_client_minor_version as these need to happen also when auto-negotiating the minorversion once the callback service for nfs41 becomes different than for nfs4.0 Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Fix checkpatch warning] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Type check arguments of nfs_callback_up] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Backchannel: Remove FIXME comment] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 13:06:01 -07:00
Benny Halevy	9bdaa86d2a	nfs41: Refactor nfs4_{init,destroy}_callback for nfs4.0 Refactor-out code to bring the callback service up and down. Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 12:43:47 -07:00
Benny Halevy	34dc1ad752	nfs41: increment_{open,lock}_seqid Unlike minorversion0, in nfsv4.1 the open and lock seqids need not be incremented by the client and should always be set to zero. This is implemented using a new nfs_rpc_ops methods - increment_open_seqid and increment_lock_seqid Signed-off-by: Rahul Iyer <iyer@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: check for session not minorversion] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:43:45 -07:00
Andy Adamson	78722e9c92	nfs41: only retry EXCHANGE_ID on recoverable errors Stops an infinite loop of EXCHANGE_ID. Signed-off-by: Andy Adamson <andros@netapp.com> [fixed checkpatch warnings] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 12:43:44 -07:00
Benny Halevy	008f55d0e0	nfs41: recover lease in _nfs4_lookup_root This creates the nfsv4.1 session on mount. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:13 -07:00
Andy Adamson	b4b82607ff	nfs41: get_clid_cred for EXCHANGE_ID Unlike SETCLIENTID, EXCHANGE_ID requires a machine credential. Do not search for credentials other than the machine credential. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:13 -07:00
Andy Adamson	90a16617ee	nfs41: add a get_clid_cred function to nfs4_state_recovery_ops EXCHANGE_ID has different credential requirements than SETCLIENTID. Prepare for a separate credential function. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:12 -07:00
Andy Adamson	591d71cbde	nfs41: establish sessions-based clientid nfsv4.1 clientid is established via EXCHANGE_ID rather than SETCLIENTID{,_CONFIRM} This is implemented using a new establish_clid method in nfs4_state_recovery_ops. nfs41: establish clientid via exchange id only if cred != NULL >From 2.6.26 reclaimer() uses machine cred for setting up the client id therefore it is never expected to be NULL. Signed-off-by: Rahul Iyer <iyer@netapp.com> [removed dprintk] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: lease renewal] [revamped patch for new nfs4_state_manager design] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:11 -07:00
Andy Adamson	a7b721037f	nfs41: introduce get_state_renewal_cred Use the machine cred for sending SEQUENCE to renew the client's lease. [revamp patch for new state management design starting 2.6.29] [nfs41: support minorversion 1 for nfs4_check_lease] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: get cred in exchange_id when cred arg is NULL] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: use cl_machined_cred instead of cl_ex_cred] Since EXCHANGE_ID insists on using the machine credential, cl_ex_cred is not needed. nfs4_proc_exchange_id() is only called if the machine credential is available. Remove the credential logic from nfs4_proc_exchange_id. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:11 -07:00
Benny Halevy	8e69514f29	nfs41: support minorversion 1 for nfs4_check_lease [moved nfs4_get_renew_cred related changes to "nfs41: introduce get_state_renewal_cred"] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:10 -07:00
Benny Halevy	29fba38b79	nfs41: lease renewal Send a NFSv4.1 SEQUENCE op rather than RENEW that was deprecated in minorversion 1. Use the nfs_client minorversion to select reboot_recover/ network_partition_recovery/state_renewal ops. Note: we use reclaimer to create the nfs41 session before there are any cl_superblocks for the nfs_client. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: check for session not minorversion] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [revamped patch for new nfs4_state_manager design] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: obliterate nfs4_state_recovery_ops.renew_lease method] moved to nfs4_state_maintenance_ops [also undid per-minorversion nfs4_state_recovery_ops here] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:09 -07:00
Andy Adamson	b069d94af7	nfs41: schedule async session reset Define a new session reset state which is set upon a sequence operation error in both the sync and async error handlers. Place all new requests and all but the last outstanding rpc on the slot_tbl_waitq. Spawn the recovery thread when the last slot is free. Call nfs4_proc_destroy_session, reinitialize the session, call nfs4_proc_create_session, clear the session reset state, and wake up the next task on the slot_tbl_waitq. Return the nfs4_proc_destroy_session status to the session reclaimer and check for NFS4ERR_BADSESSION and NFS4ERR_DEADSESSION. Other destroy session errors should be handled in nfs4_proc_destroy_session where the call can be retried with adjusted arguments. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> nfs41: make nfs4_wait_bit_killable public] nfs4_wait_bit_killable to be used by NFSv4.1 session recover logic. Signed-off-by: Rahul Iyer <iyer@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: have create_session work on nfs_client] Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: trigger the state manager for session reset] Replace the session reset state with the NFS4CLNT_SESSION_SETUP cl_state. Place all rpc tasks to sleep on the slot table waitqueue until the slot table is drained, then schedule state recovery and wait for it to complete. Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: remove nfs41_session_recovery [ch] Replaced by using the nfs4_state_manager. Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: nfs4_wait_bit_killable only used locally] [nfs41: keep nfs4_wait_bit_killable static] [nfs41: keep const nfs_server in nfs4_handle_exception] [nfs41: remove session parameter from nfs4_find_slot] Signed-off-by: Andy Adamson <andros@netapp.com Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: resset the session from nfs41_setup_sequence] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:09 -07:00
Andy Adamson	4745e3154b	nfs41: kick start nfs41 session recovery when handling errors Remove checking for any errors that the SEQUENCE operation does not return. -NFS4ERR_STALE_CLIENTID, NFS4ERR_EXPIRED, NFS4ERR_CB_PATH_DOWN, NFS4ERR_BACK_CHAN_BUSY, NFS4ERR_OP_NOT_IN_SESSION. SEQUENCE operation error recovery is very primative, we only reset the session. Remove checking for any errors that are returned by the SEQUENCE operation, but that resetting the session won't address. NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SEQUENCE_POS,NFS4ERR_TOO_MANY_OPS. Add error checking for missing SEQUENCE errors that a session reset will address. NFS4ERR_BAD_HIGH_SLOT, NFS4ERR_DEADSESSION, NFS4ERR_SEQ_FALSE_RETRY. A reset of the session is currently our only response to a SEQUENCE operation error. Don't reset the session on errors where a new session won't help. Don't reset the session on errors where a new session won't help. [nfs41: nfs4_async_handle_error update error checking] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: trigger the state manager for session reset] Replace session state bit with nfs_client state bit. Set the NFS4CLNT_SESSION_SETUP bit upon a session related error in the sync/async error handlers. [nfs41: _nfs4_async_handle_error fix session reset error list] Sequence operation errors that session reset could help. NFS4ERR_BADSESSION NFS4ERR_BADSLOT NFS4ERR_BAD_HIGH_SLOT NFS4ERR_DEADSESSION NFS4ERR_CONN_NOT_BOUND_TO_SESSION NFS4ERR_SEQ_FALSE_RETRY NFS4ERR_SEQ_MISORDERED Sequence operation errors that a session reset would not help NFS4ERR_BADXDR NFS4ERR_DELAY NFS4ERR_REP_TOO_BIG NFS4ERR_REP_TOO_BIG_TO_CACHE NFS4ERR_REQ_TOO_BIG NFS4ERR_RETRY_UNCACHED_REP NFS4ERR_SEQUENCE_POS NFS4ERR_TOO_MANY_OPS Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41 nfs4_handle_exception fix session reset error list] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [moved nfs41_sequece_call_done code to nfs41: sequence operation] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:08 -07:00
Andy Adamson	eedc020e71	nfs41: use rpc prepare call state for session reset [nfs41: change nfs4_restart_rpc argument] [nfs41: check for session not minorversion] [nfs41: trigger the state manager for session reset] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [always define nfs4_restart_rpc] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:07 -07:00
Andy Adamson	c3fad1b1aa	nfs41: add session reset to state manager Move the code to reset a session from the session_reclaimer to the nfs4_state_manager. Destroy the session, and create a new one. Treat NFS4ERR_BADSESSION and NFS4ERR_DEADSESSION as a successful nfs4_proc_destroy_session. Signal nfs4_proc_create_session that this is a session reset so that the session slot table is re-used. If the clientid is stale, set both NFS4CLNT_LEASE_EXPIRED and NFS4CLNT_SESSION_SETUP bits and retry. Use a switch statement in nfs4_session_recovery_handle_error for future patche which will add handling for other errors. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: session reset in nfs4_recovery_handle_error] Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: reset session on nfs4_do_reclaim session reset error] If nfs4_do_reclaim gets a session reset error, nfs4_recovery_handle_error will set the NFS4CLNT_SESSION_SETUP bit, and the state manager should continue processing to reset the session. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [move nfs4_proc_destroy_session declaration here] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:06 -07:00
Andy Adamson	76db6d9500	nfs41: add session setup to the state manager At mount, nfs_alloc_client sets the cl_state NFS4CLNT_LEASE_EXPIRED bit and nfs4_alloc_session sets the NFS4CLNT_SESSION_SETUP bit, so both bits are set when nfs4_lookup_root calls nfs4_recover_expired_lease which schedules the nfs4_state_manager and waits for it to complete. Place the session setup after the clientid establishment in nfs4_state_manager so that the session is setup right after the clientid has been established without rescheduling the state manager. Unlike nfsv4.0, the nfs_client struct is not ready to use until the session has been established. Postpone marking the nfs_client struct to NFS_CS_READY until after a successful CREATE_SESSION call so that other threads cannot use the client until the session is established. If the EXCHANGE_ID call fails and the session has not been setup (the NFS4CLNT_SESSION_SETUP bit is set), mark the client with the error and return. If the session setup CREATE_SESSION call fails with NFS4ERR_STALE_CLIENTID which could occur due to server reboot or network partition inbetween the EXCHANGE_ID and CREATE_SESSION call, reset the NFS4CLNT_LEASE_EXPIRED and NFS4CLNT_SESSION_SETUP bits and try again. If the CREATE_SESSION call fails with other errors, mark the client with the error and return. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: NFS_CS_SESSION_SETUP cl_cons_state for back channel setup] On session setup, the CREATE_SESSION reply races with the server back channel probe which needs to succeed to setup the back channel. Set a new cl_cons_state NFS_CS_SESSION_SETUP just prior to the CREATE_SESSION call and add it as a valid state to nfs_find_client so that the client back channel can find the nfs_client struct and won't drop the server backchannel probe. Use a new cl_cons_state so that NFSv4.0 back channel behaviour which only sets NFS_CS_READY is unchanged. Adjust waiting on the nfs_client_active_wq accordingly. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: rename NFS_CS_SESSION_SETUP to NFS_CS_SESSION_INITING] Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: set NFS_CL_SESSION_INITING in alloc_session] Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: move session setup into a function] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [moved nfs4_proc_create_session declaration here] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:06 -07:00
Andy Adamson	ac72b7b3b3	nfs41: reset the session slot table Separated from nfs41: schedule async session reset Do not kfree the session slot table upon session reset, just re-initialize it. Add a boolean to nfs4_proc_create_session to inidicate if this is a session reset or a session initialization. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:05 -07:00
Andy Adamson	fc01cea963	nfs41: sequence operation Implement the sequence operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Check returned sessionid, slotid and slot sequenceid in decode_sequence. If the server returns different values for sessionID, slotID or slot sequence number than what was sent, the server is looney tunes. Pass the sequence operation status to nfs41_sequence_done in order to determine when to increment the slot sequence ID. Free slot is separated from sequence done. Signed-off-by: Rahul Iyer <iyer@netapp.com> Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Andy Adamson<andros@umich.edu> [nfs41: sequence res use slotid] Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: deref slot table in decode_sequence only for minorversion!=0] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_call_sync] [nfs41: remove SEQ4_STATUS_USE_TK_STATUS] [nfs41: return ESERVERFAULT in decode_sequence] [no sr_session, no sr_flags] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: use nfs4_call_sync_sequence to renew session lease] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: remove nfs4_call_sync_sequence forward definition] Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: use struct nfs_client for nfs41_proc_async_sequence] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: pass *session in seq_args and seq_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41 nfs41_sequence_call_done update error checking] [nfs41 nfs41_sequence_done update error checking] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: remove switch on error from nfs41_sequence_call_done] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:04 -07:00
Andy Adamson	8328d59f38	nfs41: enable nfs_client only nfs4_async_handle_error The session is per struct nfs_client, not per nfs_server. Allow the handler to be called with no nfs_server which simplifies the nfs4_proc_async_sequence session renewal call and will let it be used by pnfs file layout data servers. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:25:04 -07:00
Andy Adamson	0f3e66c6a6	nfs41: destroy_session operation Implement the destroy_session operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: remove extraneous rpc_clnt pointer] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41; NFS_CS_READY required for DESTROY_SESSION] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: pass *session in seq_args and seq_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [nfs41: fix encode_destroy_session's xdr Xcoding pointer type] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 12:24:55 -07:00
Andy Adamson	96b09e024f	nfs41: use session attributes for rsize and wsize Set the mount points rsize and wsize to the negotiated session fore channel maximum response and requeset size. These values will be bound checked in nfs_server_set_fsinfo. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [move nfs4_session_set_rwsize into CONFIG_NFS_V4] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:24:53 -07:00
Andy Adamson	8d35301d7d	nfs41: verify session channel attribues Invalidate the session if the server returns invalid fore or back channel attributes. Use a KERN_WARNING to report the fatal session estabishment error. Signed-off-by: Andy Adamson <andros@netapp.com> [refactor nfs4_verify_channel_attrs] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:24:52 -07:00
Andy Adamson	fc931582c2	nfs41: create_session operation Implement the create_session operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Set the real fore channel max operations to preserve server resources. Note: If the server returns < NFS4_MAX_OPS, the client will very soon get an NFS4ERR_TOO_MANY_OPS. A later patch will handle this. Set the max_rqst_sz and max_resp_sz to PAGE_SIZE - we preallocate the buffers. Set the back channel max_resp_sz_cached to zero to force the client to always set csa_cachethis to FALSE because the current implementation of the back channel DRC only supports caching the CB_SEQUENCE operation. The client back channel server supports one slot, and desires 2 operations per compound. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: remove extraneous rpc_clnt pointer] Use the struct nfs_client cl_rpcclient. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_init_channel_attrs, just use nfs41_create_session_args] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: use rsize and wsize for session channel attributes] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: set channel max operations] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: set back channel attributes] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: obliterate nfs4_adjust_channel_attrs] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: have create_session work on nfs_client] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: move CONFIG_NFS_V4_1 endif] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: pass *session in seq_args and seq_res] [moved nfs4_init_slot_table definition here] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: use kcalloc to allocate slot table] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [nfs41: fix Xcode_create_session's xdr Xcoding pointer type] [nfs41: refactor decoding of channel attributes] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 12:24:34 -07:00
Andy Adamson	2050f0cc07	nfs41: get_lease_time get_lease_time uses the FSINFO rpc operation to get the lease time attribute. nfs4_get_lease_time() is only called from the state manager on session setup so don't recover from clientid or sequence level errors. We do need to recover from NFS4ERR_DELAY or NFS4ERR_GRACE. Use NFS4_POLL_RETRY_MIN - the Linux server returns NFS4ERR_DELAY when an upcall is needed to resolve an uncached export referenced by a file handle. [nfs41: sequence res use slotid] Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: remove extraneous rpc_clnt pointer] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: have get_lease_time work on nfs_client] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: get_lease_time recover from NFS4ERR_DELAY] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: pass *session in seq_args and seq_res] [define nfs4_get_lease_time_{args,res}] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 12:24:32 -07:00
Benny Halevy	99fe60d062	nfs41: exchange_id operation Implement the exchange_id operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Unlike NFSv4.0, NFSv4.1 requires machine credentials. RPC_AUTH_GSS machine credentials will be passed into the kernel at mount time to be available for the exchange_id operation. RPC_AUTH_UNIX root mounts can use the UNIX root credential. Store the root credential in the nfs_client struct. Without a credential, NFSv4.1 state renewal fails. [nfs41: establish clientid via exchange id only if cred != NULL] Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: move nfstime4 from under CONFIG_NFS_V4_1] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: do not wait a lease time in exchange id] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: pass *session in seq_args and seq_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [nfs41: Ignoring impid in decode_exchange_id is missing a READ_BUF] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: fix Xcode_exchange_id's xdr Xcoding pointer type] [nfs41: get rid of unused struct nfs41_exchange_id_res members] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 12:23:57 -07:00
Andy Adamson	938e101091	nfs41 delegreturn sequence setup done support Separate delegreturn calls from nfs41: sequence setup/done support Implement the delegreturn rpc_call_prepare method for asynchronuos nfs rpcs, call nfs41_setup_sequence from respective rpc_call_validate_args methods. Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:51 -07:00
Andy Adamson	21d9a851aa	nfs41 commit sequence setup done support Separate commit calls from nfs41: sequence setup/done support Implement the commit rpc_call_prepare method for asynchronuos nfs rpcs, call nfs41_setup_sequence from respective rpc_call_validate_args methods. Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Support sessions with O_DIRECT.] Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: separate free slot from sequence done] [nfs41: nfs4_sequence_free_slot use nfs_client for data server] Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:50 -07:00
Andy Adamson	def6ed7ef4	nfs41 write sequence setup done support Separate write calls from nfs41: sequence setup/done support Implement the write rpc_call_prepare method for asynchronuos nfs rpcs, call nfs41_setup_sequence from respective rpc_call_validate_args methods. Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] Signed-off-by: Andy Adamson <andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [move the nfs4_sequence_free_slot call in nfs_readpage_retry from] [nfs41: separate free slot from sequence done Signed-off-by: Andy Adamson <andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Support sessions with O_DIRECT.] Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_sequence_free_slot use nfs_client for data server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:49 -07:00
Andy Adamson	f11c88af26	nfs41: read sequence setup/done support Implement the read rpc_call_prepare method for asynchronuos nfs rpcs, call nfs41_setup_sequence from respective rpc_call_validate_args methods. Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] Signed-off-by: Andy Adamson <andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [move the nfs4_sequence_free_slot call in nfs_readpage_retry from] [nfs41: separate free slot from sequence done] [remove nfs_readargs.nfs_server, use calldata->inode instead] Signed-off-by: Andy Adamson <andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Support sessions with O_DIRECT] Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_sequence_free_slot use nfs_client for data server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:48 -07:00
Andy Adamson	472cfbd9b9	nfs41: unlink sequence setup/done support Implement the rpc_call_prepare methods for asynchronuos nfs rpcs, call nfs41_setup_sequence from respective rpc_call_validate_args methods. Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: separate free slot from sequence done] [nfs41: sequence res use slotid] [nfs41: remove SEQ4_STATUS_USE_TK_STATUS] [nfs41: nfs4_sequence_free_slot use nfs_client for data server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:47 -07:00
Andy Adamson	a893693c15	nfs41: locku sequence setup/done support Separate nfs4_locku calls from nfs41: sequence setup/done support Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] Signed-off-by: Andy Adamson <andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_sequence_free_slot use nfs_client for data server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:46 -07:00
Andy Adamson	66179efee3	nfs41: lock sequence setup/done support Separate nfs4_lock calls from nfs41: sequence setup/done support Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] [use nfs4_sequence_done_free_slot] Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:45 -07:00
Andy Adamson	d898528cdb	nfs41: open sequence setup/done support Separate nfs4_open calls from nfs41: sequence setup/done support Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] [use nfs4_sequence_done_free_slot] Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:44 -07:00
Andy Adamson	19ddab06ed	nfs41: close sequence setup/done support Separate nfs4_close calls from nfs41: sequence setup/done support Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] Signed-off-by: Andy Adamson<andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: separate free slot from sequence done] [nfs41: sequence res use slotid] [nfs41: remove SEQ4_STATUS_USE_TK_STATUS] [nfs41: nfs4_sequence_free_slot use nfs_client for data server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:43 -07:00
Andy Adamson	69ab40c4c3	nfs41: nfs41_call_sync_done Implement nfs4.1 synchronous rpc_call_done method that essentially just calls nfs4_sequence_done, that turns around and calls nfs41_sequence_done for minorversion1 rpcs. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: check for session not minorversion] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [move adding nfs4_sequence_free_slot from nfs41-separate-free-slot-from-sequence-done] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs41_call_sync_data use nfs_client not nfs_server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:42 -07:00
Andy Adamson	b0df806c0f	nfs41: nfs41_sequence_done Handle session level errors, update slot sequence id and sessions bookeeping, free slot. [nfs41: sequence res use slotid] Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: remove SEQ4_STATUS_USE_TK_STATUS] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: check for session not minorversion] Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: bail out early out of nfs41_sequence_done if !res->sr_session] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [move nfs4_sequence_done from nfs41: nfs41_call_sync_done] Signed-off-by: Andy Adamson <andros@netapp.com> [move nfs4_sequence_free_slot from nfs41: separate free slot from sequence done] Don't free the slot until after all rpc_restart_calls have completed. Session reset will require more work. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [moved reset sr_slotid to nfs41_sequence_free_slot] [free slot also on unexpectecd error] [remove seq_res.sr_session member, use nfs_client's instead] [ditch seq_res.sr_flags until used] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [look at sr_slotid for bailing out early from nfs41_sequence_done] [nfs41: rpc_wake_up_next if sessions slot was not consumed.] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_sequence_free_slot use nfs_client for data server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: remove unused error checking in nfs41_sequence_done] Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: remove nfs4_has_session check in nfs41_sequence_done] Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: remove nfs_client pointer check] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:41 -07:00
Andy Adamson	13615871cd	nfs41: nfs41_sequence_free_slot [from nfs41: separate free slot from sequence done] Don't free the slot until after all rpc_restart_calls have completed. Session reset will require more work. As noted by Trond, since we're using rpc_wake_up_next rather than rpc_wake_up() we must always wake up the next task in the queue either by going through nfs4_free_slot, or just calling rpc_wake_up_next if no slot is to be freed. [nfs41: sequence res use slotid] [nfs41: remove SEQ4_STATUS_USE_TK_STATUS] [got rid of nfs4_sequence_res.sr_session, use nfs_client.cl_session instead] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: rpc_wake_up_next if sessions slot was not consumed.] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_sequence_free_slot use nfs_client for data server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:41 -07:00
Andy Adamson	e2c4ab3ce2	nfs41: free slot Free a slot in the slot table. Mark the slot as free in the bitmap-based allocation table by clearing a bit corresponding to the slotid. Update lowest_free_slotid if freed slotid is lower than that. Update highest_used_slotid. In the case the freed slotid equals the highest_used_slotid, scan downwards for the next highest used slotid using the optimized fls* functions. Finally, wake up thread waiting on slot_tbl_waitq for a free slot to become available. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: free slot use slotid] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: use find_first_zero_bit for nfs4_find_slot] While at it, obliterate lowest_free_slotid and fix-up related comments. As per review comment 21/85. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: use __clear_bit for nfs4_free_slot] While at it, fix-up function comment. Part of review comment 22/85. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: use find_last_bit in nfs4_free_slot to determine highest used slot.] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: rpc_sleep_on slot_tbl_waitq must be called under slot_tbl_lock] Otherwise there's a race (we've hit) with nfs4_free_slot where nfs41_setup_sequence sees a full slot table, unlocks slot_tbl_lock, nfs4_free_slots happen concurrently and call rpc_wake_up_next where there's nobody to wake up yet, context goes back to nfs41_setup_sequence which goes to sleep when the slot table is actually empty now and there's no-one to wake it up anymore. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:39 -07:00
Andy Adamson	fbcd4abcb3	nfs41: setup_sequence method Allocate a slot in the session slot table and set the sequence op arguments. Called at the rpc prepare stage. Add a status to nfs41_sequence_res, initialize it to one so that we catch rpc level failures which do not go through decode_sequence which sets the new status field. Note that upon an rpc level failure, we don't know if the server processed the sequence operation or not. Proceed as if the server did process the sequence operation. Signed-off-by: Rahul Iyer <iyer@netapp.com> [nfs41: sequence args use slotid] [nfs41: find slot return slotid] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: remove SEQ4_STATUS_USE_TK_STATUS] As per 11-14-08 review [move extern declaration from nfs41: sequence setup/done support] [removed sa_session definition, changed sa_cache_this into a u8 to reduce footprint] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: rpc_sleep_on slot_tbl_waitq must be called under slot_tbl_lock] Otherwise there's a race (we've hit) with nfs4_free_slot where nfs41_setup_sequence sees a full slot table, unlocks slot_tbl_lock, nfs4_free_slots happen concurrently and call rpc_wake_up_next where there's nobody to wake up yet, context goes back to nfs41_setup_sequence which goes to sleep when the slot table is actually empty now and there's no-one to wake it up anymore. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:39 -07:00
Benny Halevy	510b81756f	nfs41: find slot Find a free slot using bitmap-based allocation. Use the optimized ffz function to find a zero bit in the bitmap that indicates a free slot, starting the search from the 'lowest_free_slotid' position. If found, mark the slot as used in the bitmap, get the slot's slotid and seqid, and update max_slotid to be used by the SEQUENCE operation. Also, update lowest_free_slotid for next search. If no free slot was found the caller has to wait for a free slot (outside the scope of this function) Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: find slot return slotid] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [use find_first_zero_bit for nfs4_find_slot as per review comment 21/85.] [use NFS4_MAX_SLOT_TABLE rather than NFS4_NO_SLOT] [nfs41: rpc_sleep_on slot_tbl_waitq must be called under slot_tbl_lock] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:37 -07:00
Andy Adamson	ce5039c1be	nfs41: nfs4_setup_sequence Perform the nfs4_setup_sequence in the rpc_call_prepare state. If a session slot is not available, we will rpc_sleep_on the slot wait queue leaving the tk_action as rpc_call_prepare. Once we have a session slot, hang on to it even through rpc_restart_calls. Ensure the nfs41_sequence_res sr_slot pointer is NULL before rpc_run_task is called as nfs41_setup_sequence will only find a new slot if it is NULL. A future patch will call free slot after any rpc_restart_calls, and handle the rpc restart that result from a sequence operation error. Signed-off-by: Rahul Iyer <iyer@netapp.com> [nfs41: sequence res use slotid] Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: simplify nfs4_call_sync] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_call_sync] [nfs41: check for session not minorversion] [nfs41: remove rpc_message from nfs41_call_sync_args] [moved NFS4_MAX_SLOT_TABLE logic into nfs41_setup_sequence] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs41_call_sync_data use nfs_client not nfs_server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: expose nfs4_call_sync_session for lease renewal] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: remove unnecessary return check] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:36 -07:00
Andy Adamson	9b7b9fcc9c	nfs41: xdr {encode,decode}_sequence Implement stubs for encode and decode sequence, defined as no-ops when CONFIG_NFS_V4_1 is not defined. Add the nfsv41 encode and decode sizes. Add encode_sequence to all nfs4_enc_* routines and decode_sequence to all nfs4_dec_* routines as required by v41. [was nfs41: minorversion support for xdr] [added nfs_client argument to encode_sequence so not to use sequence_args to pass sa_session] Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: pass *session in seq_args and seq_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:36 -07:00
Benny Halevy	66cc042970	nfs41: encode minorversion in compound header Signed-off-by: Andy Adamdon <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: pass *session in seq_args and seq_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:35 -07:00
Benny Halevy	28f566942c	NFS: use dynamically computed compound_hdr.replen for xdr_inline_pages offset As Trond suggested, rather than passing a constant to xdr_inline_pages, keep a running count of the expected reply bytes. In preparation for nfs41, where additional op sequence are expteced when talking to nfs41 servers. [NFS: cb_compoundhdr.replen is in words not bytes] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: get fs_locations replen before encoding the GETATTR] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: get getacl replen before encoding the GETATTR] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:34 -07:00
Benny Halevy	dadf0c2767	NFS: update hdr->replen for every encode op Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:33 -07:00
Benny Halevy	0c4e8c1877	NFS: define and initialize compound_hdr.replen replen holds the running count of expected reply bytes. repl will then be used by encoding routines for xdr_inline_pages offset after which data bytes are to be received directly into the xdr buffer pages. NOTE: According to the nfsv4 and v4.1 RFCs, the replied tag SHOULD be the same is the one sent, but this is not required as a MUST for the server to do so. The server may screw us if it replies a tag of a different length in the compound result. [NFS: cb_compoundhdr.replen is in words not bytes] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:32 -07:00
Benny Halevy	6ce183919b	NFS: use decode_change_info_maxsz for xdr maxsz calculations Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:31 -07:00
Andy Adamson	5f7dbd5c75	nfs41: set up seq_res.sr_slotid Initialize nfs4_sequence_res sr_slotid to NFS4_MAX_SLOT_TABLE. [was nfs41: sequence res use slotid] Signed-off-by: Andy Adamson <andros@netapp.com> [pulled definition of struct nfs4_sequence_res.sr_slotid to here] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:30 -07:00
Benny Halevy	f3752975ca	nfs41: nfs41: pass *session in seq_args and seq_res To be used for getting the rpc's minorversion and for nfs41 xdr {en,de}coding of the sequence operation. Reset the seq session ptrs for minorversion=0 rpc calls. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:29 -07:00
Andy Adamson	cccef3b96a	nfs41: introduce nfs4_call_sync Use nfs4_call_sync rather than rpc_call_sync to provide for a nfs41 sessions-enabled interface for sessions manipulation. The nfs41 rpc logic uses the rpc_call_prepare method to recover and create the session, as well as selecting a free slot id and the rpc_call_done to free the slot and update slot table related metadata. In the coming patches we'll add rpc prepare and done routines for setting up the sequence op and processing the sequence result. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_call_sync] As per 11-14-08 review. Squash into "nfs41: introduce nfs4_call_sync" and "nfs41: nfs4_setup_sequence" Define two functions one for v4 and one for v41 add a pointer to struct nfs4_client to the correct one. Signed-off-by: Andy Adamson <andros@netapp.com> [added BUG() in _nfs4_call_sync_session if !CONFIG_NFS_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: check for session not minorversion] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [group minorversion specific stuff together] Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> [nfs41: fixup nfs4_clear_client_minor_version] [introduce nfs4_init_client_minor_version() in this patch] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [cleaned-up patch: got rid of nfs_call_sync_t, dprintks, cosmetics, extra server defs] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:28 -07:00
Benny Halevy	22958463d5	nfs41: use nfs4_fs_locations_res In preparation for nfs41 sequence processing. Signed-off-by: Andy Admason <andros@netapp.com> [find nfs4_fs_locations_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:27 -07:00
Benny Halevy	73c403a9a9	nfs41: use nfs4_setaclres In preparation for nfs41 sequence processing. Signed-off-by: Andy Admason <andros@netapp.com> [define nfs_setaclres] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:26 -07:00
Benny Halevy	9e9ecc03d6	NFS: get rid of unused xdr decode_setattr(, res) argument Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:26 -07:00
Benny Halevy	663c79b3cd	nfs41: use nfs4_getaclres In preparation for nfs41 sequence processing. Signed-off-by: Andy Admason <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: embed resp_len in nfs_getaclres] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:25 -07:00
Benny Halevy	d45b2989a7	nfs41: use nfs4_pathconf_res In preparation for nfs41 sequence processing. Signed-off-by: Andy Admason <andros@netapp.com> [define nfs4_pathconf_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:24 -07:00
Benny Halevy	3dda5e4347	nfs41: use nfs4_fsinfo_res In preparation for nfs41 sequence processing. Signed-off-by: Andy Admason <andros@netapp.com> [define nfs4_fsinfo_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:23 -07:00
Benny Halevy	24ad148a0f	nfs41: use nfs4_statfs_res In preparation for nfs41 sequence processing. Signed-off-by: Andy Admason <andros@netapp.com> [define nfs4_statfs_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:22 -07:00
Benny Halevy	f50c700081	nfs41: use nfs4_readlink_res In preparation for nfs41 sequence processing. Signed-off-by: Andy Admason <andros@netapp.com> [define nfs4_readlink_res] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:21 -07:00
Benny Halevy	43652ad553	nfs41: use nfs4_server_caps_arg In preparation for nfs41 sequence processing. Signed-off-by: Andy Admason <andros@netapp.com> [define nfs4_server_caps_arg] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:20 -07:00
Andy Adamson	557134a39c	nfs41: sessions client infrastructure NFSv4.1 Sessions basic data types, initialization, and destruction. The session is always associated with a struct nfs_client that holds the exchange_id results. Signed-off-by: Rahul Iyer <iyer@netapp.com> Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [remove extraneous rpc_clnt pointer, use the struct nfs_client cl_rpcclient. remove the rpc_clnt parameter from nfs4 nfs4_init_session] Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Use the presence of a session to determine behaviour instead of the minorversion number.] Signed-off-by: Andy Adamson <andros@netapp.com> [constified nfs4_has_session's struct nfs_client parameter] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Rename nfs4_put_session() to nfs4_destroy_session() and call it from nfs4_free_client() not nfs4_free_server(). Also get rid of nfs4_get_session() and the ref_count in nfs4_session struct as keeping track of nfs_client should be sufficient] Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com> [nfs41: pass rsize and wsize into nfs4_init_session] Signed-off-by: Andy Adamson <andros@netapp.com> [separated out removal of rpc_clnt parameter from nfs4_init_session ot a patch of its own] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Pass the nfs_client pointer into nfs4_alloc_session] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: don't assign to session->clp->cl_session in nfs4_destroy_session] [nfs41: fixup nfs4_clear_client_minor_version] [introduce nfs4_clear_client_minor_version() in this patch] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Refactor nfs4_init_session] Moved session allocation into nfs4_init_client_minor_version, called from nfs4_init_client. Leave rwise and wsize initialization in nfs4_init_session, called from nfs4_init_server. Reverted moving of nfs_fsid definition to nfs_fs_sb.h Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Move NFS4_MAX_SLOT_TABLE define from under CONFIG_NFS_V4_1] [Fix comile error when CONFIG_NFS_V4_1 is not set.] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [moved nfs4_init_slot_table definition to "create_session operation"] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: alloc session with GFP_KERNEL] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:19 -07:00
Benny Halevy	c2e713dd83	nfs41: translate NFS4ERR_MINOR_VERS_MISMATCH to EPROTONOSUPPORT To be returned to the mount command when trying to mount a v4 server using minorversion 1. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:18 -07:00
Benny Halevy	5aae4a9ae0	nfs41: Use mount minorversion option Use the mount minorversion option to initialize the nfs_client cl_minorversion and match it in nfs_match_client() when looking up a nfs_client. [nfs41: remove ifdefs around nfs_client_initdata.minorversion] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:17 -07:00
Benny Halevy	94a417f3d7	nfs41: nfs_client.cl_minorversion This field is set to the nfsv4 minor version for this mount. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Note: This patch sets the referral to the same minorversion as the current mount. Revisit in future patch. Signed-off-by: Andy Adamson <andros@netapp.com> [removed cl_minorversion assignment in nfs_set_client] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [always define nfs_client.cl_minorversion] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:16 -07:00
Mike Sager	3fd5be9e19	nfs41: add mount command option minorversion mount -t nfs4 -o minorversion=[0\|1] specifies whether to use 4.0 or 4.1. By default, the minorversion is set to 0. Signed-off-by: Mike Sager <sager@netapp.com> [set default minorversion to 0 as per Trond and SteveD's request] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:15 -07:00
Ricardo Labiaga	1efae38140	nfs41: Add Kconfig symbols for NFSv4.1 Added CONFIG_NFS_V4_1 and made it depend upon CONFIG_NFS_V4 and EXPERIMENTAL. Indicate that CONFIG_NFS_V4_1 is for NFS developers at the moment At the moment we're expecting folks trying out nfs41 to actively participate in the development process by helping us debug issues and ideally send patches to fix problems. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 10:46:13 -07:00
Linus Torvalds	aa2638a210	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: [SCSI] aic79xx: make driver respect nvram for IU and QAS settings [SCSI] don't attach ULD to Dell Universal Xport [SCSI] lpfc 8.3.3 : Update driver version to 8.3.3 [SCSI] lpfc 8.3.3 : Add support for Target Reset handler entrypoint [SCSI] lpfc 8.3.3 : Fix a couple of spin_lock and memory issues and a crash [SCSI] lpfc 8.3.3 : FC/FCOE discovery fixes [SCSI] lpfc 8.3.3 : Fix various SLI-3 vs SLI-4 differences [SCSI] qla2xxx: Resolve a performance issue in interrupt [SCSI] cnic, bnx2i: Fix build failure when CONFIG_PCI is not set. [SCSI] nsp_cs: time_out reaches -1 [SCSI] qla2xxx: fix printk format warnings [SCSI] ncr53c8xx: div reaches -1 [SCSI] compat: don't perform unneeded copy in sg_io code [SCSI] zfcp: Update FC pass-through support [SCSI] zfcp: Add FC pass-through support [SCSI] FC Pass Thru support	2009-06-17 09:50:44 -07:00
Linus Torvalds	b7c142dbf1	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: start using hrtimers hrtimer: export ktime_add_safe UBIFS: do not forget to register BDI device UBIFS: allow sync option in rootflags UBIFS: remove dead code UBIFS: use anonymous device UBIFS: return proper error code if the compr is not present UBIFS: return error if link and unlink race UBIFS: reset no_space flag after inode deletion	2009-06-17 09:46:33 -07:00
Steven Whitehouse	a566a6b11c	dlm: Fix uninitialised variable warning in lock.c CC [M] fs/dlm/lock.o fs/dlm/lock.c: In function ‘find_rsb’: fs/dlm/lock.c:438: warning: ‘r’ may be used uninitialized in this function Since r is used on the error path to set r_ret, set it to NULL. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2009-06-17 11:31:32 -05:00
Linus Torvalds	9cb0fbf7f8	Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (47 commits) MIPS: Add hibernation support MIPS: Move Cavium CP0 hwrena impl bits to cpu-feature-overrides.h MIPS: Allow CPU specific overriding of CP0 hwrena impl bits. MIPS: Kconfig Add SYS_SUPPORTS_HUGETLBFS and enable it for some systems. Hugetlbfs: Enable hugetlbfs for more systems in Kconfig. MIPS: TLB support for hugetlbfs. MIPS: Add hugetlbfs page defines. MIPS: Add support files for hugetlbfs. MIPS: Remove unused parameters from iPTE_LW. Staging: Add octeon-ethernet driver files. MIPS: Export erratum function needed by octeon-ethernet driver. MIPS: Cavium-Octeon: Add more chip specific feature tests. MIPS: Cavium-Octeon: Add more board type constants. MIPS: Export cvmx_sysinfo_get needed by octeon-ethernet driver. MIPS: Add named alloc functions to OCTEON boot monitor memory allocator. MIPS: Alchemy: devboards: Convert to gpio calls. MIPS: Alchemy: xxs1500: use linux gpio api. MIPS: Alchemy: MTX-1: Use linux gpio api. MIPS: Alchemy: Rewrite GPIO support. MIPS: Alchemy: Remove unused au1000_gpio.h header ...	2009-06-17 09:13:52 -07:00
Linus Torvalds	feb72ce827	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: get rid of BKL in fs/sysv get rid of BKL in fs/minix get rid of BKL in fs/efs befs ->pust_super() doesn't need BKL Cleanup of adfs headers 9P doesn't need BKL in ->umount_begin() fuse doesn't need BKL in ->umount_begin() No instance of ->bmap() needs BKL remove unlock_kernel() left accidentally ext4: avoid unnecessary spinlock in critical POSIX ACL path ext3: avoid unnecessary spinlock in critical POSIX ACL path	2009-06-17 08:46:57 -07:00
David Daney	852969b2d2	Hugetlbfs: Enable hugetlbfs for more systems in Kconfig. As part of adding hugetlbfs support for MIPS, I am adding a new kconfig variable 'SYS_SUPPORTS_HUGETLBFS'. Since some mips cpu varients don't yet support it, we can enable selection of HUGETLBFS on a system by system basis from the arch/mips/Kconfig. Signed-off-by: David Daney <ddaney@caviumnetworks.com> CC: William Irwin <wli@holomorphy.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2009-06-17 11:06:31 +01:00
Al Viro	5ac3455a84	get rid of BKL in fs/sysv Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:37 -04:00
Al Viro	cc46759a8c	get rid of BKL in fs/minix Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:37 -04:00
Al Viro	e7ec952f6a	get rid of BKL in fs/efs Only readdir() really needed it, and that's easily fixable by switch to generic_file_llseek() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:36 -04:00
Al Viro	536c94901e	befs ->pust_super() doesn't need BKL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:36 -04:00
Al Viro	608ba50bd0	Cleanup of adfs headers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:36 -04:00
Al Viro	ee450f796f	9P doesn't need BKL in ->umount_begin() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:36 -04:00
Al Viro	66c6af2e8b	fuse doesn't need BKL in ->umount_begin() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:36 -04:00
Al Viro	fe36adf47e	No instance of ->bmap() needs BKL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:35 -04:00
J. R. Okajima	b0895513f4	remove unlock_kernel() left accidentally commit `337eb00a2c` Push BKL down into ->remount_fs() and commit `4aa98cf768` Push BKL down into do_remount_sb() were uncorrectly merged. The former removes one pair of lock/unlock_kernel(), but the latter adds several unlock_kernel(). Finally a few unlock_kernel() calls left. Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:35 -04:00
Theodore Ts'o	210ad6aedb	ext4: avoid unnecessary spinlock in critical POSIX ACL path If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem to do POSIX ACL checks on any files not owned by the caller, and it does this for every single pathname component that it looks up. That obviously can be pretty expensive if the filesystem isn't careful about it, especially with locking. That's doubly sad, since the common case tends to be that there are no ACL's associated with the files in question. ext4 already caches the ACL data so that it doesn't have to look it up over and over again, but it does so by taking the inode->i_lock spinlock on every lookup. Which is a noticeable overhead even if it's a private lock, especially on CPU's where the serialization is expensive (eg Intel Netburst aka 'P4'). For the special case of not actually having any ACL's, all that locking is unnecessary. Even if somebody else were to be changing the ACL's on another CPU, we simply don't care - if we've seen a NULL ACL, we might as well use it. So just load the ACL speculatively without any locking, and if it was NULL, just use it. If it's non-NULL (either because we had a cached entry, or because the cache hasn't been filled in at all), it means that we'll need to get the lock and re-load it properly. (This commit was ported from a patch originally authored by Linus for ext3.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:35 -04:00
Linus Torvalds	9c64daff9d	ext3: avoid unnecessary spinlock in critical POSIX ACL path If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem to do POSIX ACL checks on any files not owned by the caller, and it does this for every single pathname component that it looks up. That obviously can be pretty expensive if the filesystem isn't careful about it, especially with locking. That's doubly sad, since the common case tends to be that there are no ACL's associated with the files in question. ext3 already caches the ACL data so that it doesn't have to look it up over and over again, but it does so by taking the inode->i_lock spinlock on every lookup. Which is a noticeable overhead even if it's a private lock, especially on CPU's where the serialization is expensive (eg Intel Netburst aka 'P4'). For the special case of not actually having any ACL's, all that locking is unnecessary. Even if somebody else were to be changing the ACL's on another CPU, we simply don't care - if we've seen a NULL ACL, we might as well use it. So just load the ACL speculatively without any locking, and if it was NULL, just use it. If it's non-NULL (either because we had a cached entry, or because the cache hasn't been filled in at all), it means that we'll need to get the lock and re-load it properly. This is noticeable even on Nehalem, which does locking quite well (much better than P4). From lmbench: Processor, Processes - times in microseconds - smaller is better -------------------------------------------------------------------- Host OS Mhz null null open slct fork exec sh call I/O stat clos TCP proc proc proc --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- - before: nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.45 2.18 69.1 273. 1141 nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.48 2.28 69.9 253. 1140 nehalem.l Linux 2.6.30- 3193 0.04 0.10 0.95 1.42 2.19 68.6 284. 1141 - after: nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.44 2.12 68.3 282. 1094 nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.20 67.0 308. 1123 nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.36 67.4 293. 1148 where you can see what appears to be a roughly 3% improvement in stat and open/close latencies from just the removal of the locking overhead. Of course, this only matters for files you don't own (the owner never needs to do the ACL checks), but that's the common case for libraries, header files, and executables. As well as for the base components of any absolute pathname, even if you are the owner of the final file. [ At some point we probably want to move this ACL caching logic entirely into the VFS layer (and only call down to the filesystem when uncached), but in the meantime this improves ext3 a bit. A similar fix to btrfs makes a much bigger difference (15x improvement in lmbench) due to broken caching. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:35 -04:00
David Howells	005411c3e9	AFS: Correctly translate auth error aborts and don't failover in such cases Authentication error abort codes should be translated to appropriate Linux error codes, rather than all being translated to EREMOTEIO - which indicates that the server had internal problems. Additionally, a server shouldn't be marked unavailable and the next server tried if an authentication error occurs. This will quickly make all the servers unavailable to the client. Instead the error should be returned straight to the user. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 21:20:14 -07:00
Linus Torvalds	517d08699b	Merge branch 'akpm' * akpm: (182 commits) fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset fbdev: bfin: fix __dev{init,exit} markings fbdev: bfin: drop unnecessary calls to memset fbdev: bfin-t350mcqb-fb: drop unused local variables fbdev: blackfin has __raw I/O accessors, so use them in fb.h fbdev: s1d13xxxfb: add accelerated bitblt functions tcx: use standard fields for framebuffer physical address and length fbdev: add support for handoff from firmware to hw framebuffers intelfb: fix a bug when changing video timing fbdev: use framebuffer_release() for freeing fb_info structures radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb? s3c-fb: CPUFREQ frequency scaling support s3c-fb: fix resource releasing on error during probing carminefb: fix possible access beyond end of carmine_modedb[] acornfb: remove fb_mmap function mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF mb862xxfb: restrict compliation of platform driver to PPC Samsung SoC Framebuffer driver: add Alpha Channel support atmel-lcdc: fix pixclock upper bound detection offb: use framebuffer_alloc() to allocate fb_info struct ... Manually fix up conflicts due to kmemcheck in mm/slab.c	2009-06-16 19:50:13 -07:00
Tomas Szepe	69050eee8e	CONFIG_FILE_LOCKING should not depend on CONFIG_BLOCK CONFIG_FILE_LOCKING should not depend on CONFIG_BLOCK. This makes it possible to run complete systems out of a CONFIG_BLOCK=n initramfs on current kernels again (this last worked on 2.6.27.*). Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:52 -07:00
Thomas Gleixner	8b0b1db013	remove put_cpu_no_resched() put_cpu_no_resched() is an optimization of put_cpu() which unfortunately can cause high latencies. The nfs iostats code uses put_cpu_no_resched() in a code sequence where a reschedule request caused by an interrupt between the get_cpu() and the put_cpu_no_resched() can delay the reschedule for at least HZ. The other users of put_cpu_no_resched() optimize correctly in interrupt code, but there is no real harm in using the put_cpu() function which is an alias for preempt_enable(). The extra check of the preemmpt count is not as critical as the potential source of missing a reschedule. Debugged in the preempt-rt tree and verified in mainline. Impact: remove a high latency source [akpm@linux-foundation.org: build fix] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:48 -07:00
Eric Dumazet	4938d7e023	poll: avoid extra wakeups in select/poll After introduction of keyed wakeups Davide Libenzi did on epoll, we are able to avoid spurious wakeups in poll()/select() code too. For example, typical use of poll()/select() is to wait for incoming network frames on many sockets. But TX completion for UDP/TCP frames call sock_wfree() which in turn schedules thread. When scheduled, thread does a full scan of all polled fds and can sleep again, because nothing is really available. If number of fds is large, this cause significant load. This patch makes select()/poll() aware of keyed wakeups and useless wakeups are avoided. This reduces number of context switches by about 50% on some setups, and work performed by sofirq handlers. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:48 -07:00
Robert P. J. Day	02d5341ae5	ntfs: use is_power_of_2() function for clarity. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:48 -07:00
Wu Fengguang	84a8924560	writeback: skip new or to-be-freed inodes 1) I_FREEING tests should be coupled with I_CLEAR The two I_FREEING tests are racy because clear_inode() can set i_state to I_CLEAR between the clear of I_SYNC and the test of I_FREEING. 2) skip I_WILL_FREE inodes in generic_sync_sb_inodes() to avoid possible races with generic_forget_inode() generic_forget_inode() sets I_WILL_FREE call writeback on its own, so generic_sync_sb_inodes() shall not try to step in and create possible races: generic_forget_inode inode->i_state \|= I_WILL_FREE; spin_unlock(&inode_lock); generic_sync_sb_inodes() spin_lock(&inode_lock); __iget(inode); __writeback_single_inode // see non zero i_count may WARN here ==> WARN_ON(inode->i_state & I_WILL_FREE); spin_unlock(&inode_lock); may call generic_forget_inode again ==> iput(inode); The above race and warning didn't turn up because writeback_inodes() holds the s_umount lock, so generic_forget_inode() finds MS_ACTIVE and returns early. But we are not sure the UBIFS calls and future callers will guarantee that. So skip I_WILL_FREE inodes for the sake of safety. Cc: Eric Sandeen <sandeen@sandeen.net> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:45 -07:00
Mike Waychison	286973552f	mm: remove __invalidate_mapping_pages variant Remove __invalidate_mapping_pages atomic variant now that its sole caller can sleep (fixed in `eccb95cee4` ("vfs: fix lock inversion in drop_pagecache_sb()")). This fixes softlockups that can occur while in the drop_caches path. Signed-off-by: Mike Waychison <mikew@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:43 -07:00
David Rientjes	2ff05b2b4e	oom: move oom_adj value from task_struct to mm_struct The per-task oom_adj value is a characteristic of its mm more than the task itself since it's not possible to oom kill any thread that shares the mm. If a task were to be killed while attached to an mm that could not be freed because another thread were set to OOM_DISABLE, it would have needlessly been terminated since there is no potential for future memory freeing. This patch moves oomkilladj (now more appropriately named oom_adj) from struct task_struct to struct mm_struct. This requires task_lock() on a task to check its oom_adj value to protect against exec, but it's already necessary to take the lock when dereferencing the mm to find the total VM size for the badness heuristic. This fixes a livelock if the oom killer chooses a task and another thread sharing the same memory has an oom_adj value of OOM_DISABLE. This occurs because oom_kill_task() repeatedly returns 1 and refuses to kill the chosen task while select_bad_process() will repeatedly choose the same task during the next retry. Taking task_lock() in select_bad_process() to check for OOM_DISABLE and in oom_kill_task() to check for threads sharing the same memory will be removed in the next patch in this series where it will no longer be necessary. Writing to /proc/pid/oom_adj for a kthread will now return -EINVAL since these threads are immune from oom killing already. They simply report an oom_adj value of OOM_DISABLE. Cc: Nick Piggin <npiggin@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:43 -07:00
KOSAKI Motohiro	6837765963	mm: remove CONFIG_UNEVICTABLE_LRU config option Currently, nobody wants to turn UNEVICTABLE_LRU off. Thus this configurability is unnecessary. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andi Kleen <andi@firstfloor.org> Acked-by: Minchan Kim <minchan.kim@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:42 -07:00
Wu Fengguang	1779754959	proc: export more page flags in /proc/kpageflags Export all page flags faithfully in /proc/kpageflags. 11. KPF_MMAP (pseudo flag) memory mapped page 12. KPF_ANON (pseudo flag) memory mapped page (anonymous) 13. KPF_SWAPCACHE page is in swap cache 14. KPF_SWAPBACKED page is swap/RAM backed 15. KPF_COMPOUND_HEAD () 16. KPF_COMPOUND_TAIL () 17. KPF_HUGE hugeTLB pages 18. KPF_UNEVICTABLE page is in the unevictable LRU list 19. KPF_HWPOISON(TBD) hardware detected corruption 20. KPF_NOPAGE (pseudo flag) no page frame at the address 32-39. more obscure flags for kernel developers (*) For compound pages, exporting _both_ head/tail info enables users to tell where a compound page starts/ends, and its order. The accompanying page-types tool will handle the details like decoupling overloaded flags and hiding obscure flags to normal users. Thanks to KOSAKI and Andi for their valuable recommendations! Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:38 -07:00
Wu Fengguang	ed7ce0f102	proc: kpagecount/kpageflags code cleanup Move increments of pfn/out to bottom of the loop. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:36 -07:00
Wu Fengguang	20a0307c03	mm: introduce PageHuge() for testing huge/gigantic pages A series of patches to enhance the /proc/pagemap interface and to add a userspace executable which can be used to present the pagemap data. Export 10 more flags to end users (and more for kernel developers): 11. KPF_MMAP (pseudo flag) memory mapped page 12. KPF_ANON (pseudo flag) memory mapped page (anonymous) 13. KPF_SWAPCACHE page is in swap cache 14. KPF_SWAPBACKED page is swap/RAM backed 15. KPF_COMPOUND_HEAD () 16. KPF_COMPOUND_TAIL () 17. KPF_HUGE hugeTLB pages 18. KPF_UNEVICTABLE page is in the unevictable LRU list 19. KPF_HWPOISON hardware detected corruption 20. KPF_NOPAGE (pseudo flag) no page frame at the address (*) For compound pages, exporting _both_ head/tail info enables users to tell where a compound page starts/ends, and its order. a simple demo of the page-types tool # ./page-types -h page-types [options] -r\|--raw Raw mode, for kernel developers -a\|--addr addr-spec Walk a range of pages -b\|--bits bits-spec Walk pages with specified bits -l\|--list Show page details in ranges -L\|--list-each Show page details one by one -N\|--no-summary Don't show summay info -h\|--help Show this usage message addr-spec: N one page at offset N (unit: pages) N+M pages range from N to N+M-1 N,M pages range from N to M-1 N, pages range from N to end ,M pages range from 0 to M bits-spec: bit1,bit2 (flags & (bit1\|bit2)) != 0 bit1,bit2=bit1 (flags & (bit1\|bit2)) == bit1 bit1,~bit2 (flags & (bit1\|bit2)) == bit1 =bit1,bit2 flags == (bit1\|bit2) bit-names: locked error referenced uptodate dirty lru active slab writeback reclaim buddy mmap anonymous swapcache swapbacked compound_head compound_tail huge unevictable hwpoison nopage reserved(r) mlocked(r) mappedtodisk(r) private(r) private_2(r) owner_private(r) arch(r) uncached(r) readahead(o) slob_free(o) slub_frozen(o) slub_debug(o) (r) raw mode bits (o) overloaded bits # ./page-types flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 487369 1903 _________________________________ 0x0000000000000014 5 0 __R_D____________________________ referenced,dirty 0x0000000000000020 1 0 _____l___________________________ lru 0x0000000000000024 34 0 __R__l___________________________ referenced,lru 0x0000000000000028 3838 14 ___U_l___________________________ uptodate,lru 0x0001000000000028 48 0 ___U_l_______________________I___ uptodate,lru,readahead 0x000000000000002c 6478 25 __RU_l___________________________ referenced,uptodate,lru 0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead 0x0000000000000040 8344 32 ______A__________________________ active 0x0000000000000060 1 0 _____lA__________________________ lru,active 0x0000000000000068 348 1 ___U_lA__________________________ uptodate,lru,active 0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead 0x000000000000006c 988 3 __RU_lA__________________________ referenced,uptodate,lru,active 0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead 0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked 0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked 0x0000000000000400 503 1 __________B______________________ buddy 0x0000000000000804 1 0 __R________M_____________________ referenced,mmap 0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap 0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead 0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap 0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead 0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap 0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead 0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap 0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead 0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked 0x0000000000001000 492 1 ____________a____________________ anonymous 0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 30 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 513968 2007 # ./page-types -r flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 468002 1828 _________________________________ 0x0000000100000000 19102 74 _____________________r___________ reserved 0x0000000000008000 41 0 _______________H_________________ compound_head 0x0000000000010000 188 0 ________________T________________ compound_tail 0x0000000000008014 1 0 __R_D__________H_________________ referenced,dirty,compound_head 0x0000000000010014 4 0 __R_D___________T________________ referenced,dirty,compound_tail 0x0000000000000020 1 0 _____l___________________________ lru 0x0000000800000024 34 0 __R__l__________________P________ referenced,lru,private 0x0000000000000028 3794 14 ___U_l___________________________ uptodate,lru 0x0001000000000028 46 0 ___U_l_______________________I___ uptodate,lru,readahead 0x0000000400000028 44 0 ___U_l_________________d_________ uptodate,lru,mappedtodisk 0x0001000400000028 2 0 ___U_l_________________d_____I___ uptodate,lru,mappedtodisk,readahead 0x000000000000002c 6434 25 __RU_l___________________________ referenced,uptodate,lru 0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead 0x000000040000002c 14 0 __RU_l_________________d_________ referenced,uptodate,lru,mappedtodisk 0x000000080000002c 30 0 __RU_l__________________P________ referenced,uptodate,lru,private 0x0000000800000040 8124 31 ______A_________________P________ active,private 0x0000000000000040 219 0 ______A__________________________ active 0x0000000800000060 1 0 _____lA_________________P________ lru,active,private 0x0000000000000068 322 1 ___U_lA__________________________ uptodate,lru,active 0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead 0x0000000400000068 13 0 ___U_lA________________d_________ uptodate,lru,active,mappedtodisk 0x0000000800000068 12 0 ___U_lA_________________P________ uptodate,lru,active,private 0x000000000000006c 977 3 __RU_lA__________________________ referenced,uptodate,lru,active 0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead 0x000000040000006c 5 0 __RU_lA________________d_________ referenced,uptodate,lru,active,mappedtodisk 0x000000080000006c 3 0 __RU_lA_________________P________ referenced,uptodate,lru,active,private 0x0000000c0000006c 3 0 __RU_lA________________dP________ referenced,uptodate,lru,active,mappedtodisk,private 0x0000000c00000068 1 0 ___U_lA________________dP________ uptodate,lru,active,mappedtodisk,private 0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked 0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked 0x0000000000000400 538 2 __________B______________________ buddy 0x0000000000000804 1 0 __R________M_____________________ referenced,mmap 0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap 0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead 0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap 0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead 0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap 0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead 0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap 0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead 0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked 0x0000000000001000 492 1 ____________a____________________ anonymous 0x0000000000005008 2 0 ___U________a_b__________________ uptodate,anonymous,swapbacked 0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked 0x000000000000580c 1 0 __RU_______Ma_b__________________ referenced,uptodate,mmap,anonymous,swapbacked 0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 29 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 513968 2007 # ./page-types --raw --list --no-summary --bits reserved offset count flags 0 15 _____________________r___________ 31 4 _____________________r___________ 159 97 _____________________r___________ 4096 2067 _____________________r___________ 6752 2390 _____________________r___________ 9355 3 _____________________r___________ 9728 14526 _____________________r___________ This patch: Introduce PageHuge(), which identifies huge/gigantic pages by their dedicated compound destructor functions. Also move prep_compound_gigantic_page() to hugetlb.c and make __free_pages_ok() non-static. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:36 -07:00
Andy Adamson	5d77ddfbcb	nfsd41: sanity check client drc maxreqs Ensure the client requested maximum requests are between 1 and NFSD_MAX_SLOTS_PER_SESSION Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-16 17:13:16 -07:00
Oleg Nesterov	8eeee4e2f0	send_sigio_to_task: sanitize the usage of fown->signum send_sigio_to_task() reads fown->signum several times, we can race with F_SETSIG which changes ->signum lockless. In theory, this can fool security checks or we can call group_send_sig_info() with the wrong ->si_signo which does not match "int sig". Change the code to cache ->signum. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 15:36:17 -07:00
Oleg Nesterov	2f38d70fb4	shift current_cred() from __f_setown() to f_modown() Shift current_cred() from __f_setown() to f_modown(). This reduces the number of arguments and saves 48 bytes from fs/fcntl.o. [ Note: this doesn't clear euid/uid when pid is set to NULL. But if f_owner.pid == NULL we never use f_owner.uid/euid. Otherwise we'd have a bug anyway: we must not send signals if pid was reset to NULL. ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 14:19:00 -07:00
Linus Torvalds	e1f5b94fd0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (143 commits) USB: xhci depends on PCI. USB: xhci: Add Makefile, MAINTAINERS, and Kconfig entries. USB: xhci: Respect critical sections. USB: xHCI: Fix interrupt moderation. USB: xhci: Remove packed attribute from structures. usb; xhci: Fix TRB offset calculations. USB: xhci: replace if-elseif-else with switch-case USB: xhci: Make xhci-mem.c include linux/dmapool.h USB: xhci: drop spinlock in xhci_urb_enqueue() error path. USB: Change names of SuperSpeed ep companion descriptor structs. USB: xhci: Avoid compiler reordering in Link TRB giveback. USB: xhci: Clean up xhci_irq() function. USB: xhci: Avoid global namespace pollution. USB: xhci: Fix Link TRB handoff bit twiddling. USB: xhci: Fix register write order. USB: xhci: fix some compiler warnings in xhci.h USB: xhci: fix lots of compiler warnings. USB: xhci: use xhci_handle_event instead of handle_event USB: xhci: URB cancellation support. USB: xhci: Scatter gather list support for bulk transfers. ...	2009-06-16 13:06:10 -07:00
Linus Torvalds	6fd03301d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (64 commits) debugfs: use specified mode to possibly mark files read/write only debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem. xen: remove driver_data direct access of struct device from more drivers usb: gadget: at91_udc: remove driver_data direct access of struct device uml: remove driver_data direct access of struct device block/ps3: remove driver_data direct access of struct device s390: remove driver_data direct access of struct device parport: remove driver_data direct access of struct device parisc: remove driver_data direct access of struct device of_serial: remove driver_data direct access of struct device mips: remove driver_data direct access of struct device ipmi: remove driver_data direct access of struct device infiniband: ehca: remove driver_data direct access of struct device ibmvscsi: gadget: at91_udc: remove driver_data direct access of struct device hvcs: remove driver_data direct access of struct device xen block: remove driver_data direct access of struct device thermal: remove driver_data direct access of struct device scsi: remove driver_data direct access of struct device pcmcia: remove driver_data direct access of struct device PCIE: remove driver_data direct access of struct device ... Manually fix up trivial conflicts due to different direct driver_data direct access fixups in drivers/block/{ps3disk.c,ps3vram.c}	2009-06-16 12:57:37 -07:00
Linus Torvalds	cd5232bd6b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: fix regression preventing coalescing of extents	2009-06-16 12:23:52 -07:00
Linus Torvalds	300df7dc89	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/net: Use wait_event() in o2net_send_message_vec() ocfs2: Adjust rightmost path in ocfs2_add_branch. ocfs2: fdatasync should skip unimportant metadata writeout ocfs2: Remove redundant gotos in ocfs2_mount_volume() ocfs2: Add statistics for the checksum and ecc operations. ocfs2 patch to track delayed orphan scan timer statistics ocfs2: timer to queue scan of all orphan slots ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories ocfs2: Fix possible deadlock in quota recovery ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() ocfs2: Fix lock inversion in ocfs2_local_read_info() ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() ocfs2: update comments in masklog.h ocfs2: Don't printk the error when listing too many xattrs.	2009-06-16 12:11:57 -07:00
Linus Torvalds	d613839ef9	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: remove some includings of blktrace_api.h mg_disk: seperate mg_disk.h again block: Introduce helper to reset queue limits to default values cfq: remove extraneous '\n' in blktrace output ubifs: register backing_dev_info btrfs: properly register fs backing device block: don't overwrite bdi->state after bdi_init() has been run cfq: cleanup for last_end_request in cfq_data	2009-06-16 11:46:45 -07:00
Dave Kleikamp	f7c52fd17a	jfs: fix regression preventing coalescing of extents Commit `fec1878fe9` caused a regression in which contiguous blocks being allocated to the end of an extent were getting a new extent created. This typically results in files entirely made up of 1-block extents even though the blocks are contiguous on disk. Apparently grub doesn't handle a jfs file being fragmented into too many extents, since it refuses to boot a kernel from jfs that was created by the 2.6.30 kernel. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Reported-by: Alex <alevkovich@tut.by>	2009-06-16 13:43:22 -05:00
Linus Torvalds	69257cae20	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: always update root items for fs trees at commit time	2009-06-16 11:30:16 -07:00
Linus Torvalds	23059a0df5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6: fat: split fat_generic_ioctl FAT: add 'errors' mount option	2009-06-16 11:29:44 -07:00
Alexandros Batsakis	6c18ba9f5e	nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct the change is valid for both the forechannel and the backchannel (currently dummy) Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-16 10:13:45 -07:00
Li Zefan	e212d6f250	block: remove some includings of blktrace_api.h When porting blktrace to tracepoints, we changed to trace/block.h for trace prober declarations. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 11:19:36 +02:00
Jens Axboe	a979eff181	ubifs: register backing_dev_info Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 08:21:04 +02:00
Jens Axboe	ad081f1430	btrfs: properly register fs backing device btrfs assigns this bdi to all inodes on that file system, so make sure it's registered. This isn't really important now, but will be when we put dirty inodes there. Even now, we miss the stats when the bdi isn't visible. Also fixes failure to check bdi_init() return value, and bad inherit of ->capabilities flags from the default bdi. Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 08:21:03 +02:00
Alan Stern	74675a5850	NLS: update handling of Unicode This patch (as1239) updates the kernel's treatment of Unicode. The character-set conversion routines are well behind the current state of the Unicode specification: They don't recognize the existence of code points beyond plane 0 or of surrogate pairs in the UTF-16 encoding. The old wchar_t 16-bit type is retained because it's still used in lots of places. This shouldn't cause any new problems; if a conversion now results in an invalid 16-bit code then before it must have yielded an undefined code. Difficult-to-read names like "utf_mbstowcs" are replaced with more transparent names like "utf8s_to_utf16s" and the ordering of the parameters is rationalized (buffer lengths come immediate after the pointers they refer to, and the inputs precede the outputs). Fortunately the low-level conversion routines are used in only a few places; the interfaces to the higher-level uni2char and char2uni methods have been left unchanged. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:44:43 -07:00
Clemens Ladisch	905c02acbd	nls: utf8_wcstombs: fix buffer overflow utf8_wcstombs forgot to include one-byte UTF-8 characters when calculating the output buffer size, i.e., theoretically, it was possible to overflow the output buffer with an input string that contains enough ASCII characters. In practice, this was no problem because the only user so far (VFAT) always uses a big enough output buffer. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:44:43 -07:00
Clemens Ladisch	e27ecdd94d	nls: utf8_wcstombs: use correct buffer size in error case When utf8_wcstombs encounters a character that cannot be encoded, we must not decrease the remaining output buffer size because nothing has been written to the output buffer. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:44:43 -07:00
Robin Getz	e4792aa30f	debugfs: use specified mode to possibly mark files read/write only In many SoC implementations there are hardware registers can be read or write only. This extends the debugfs to enforce the file permissions for these types of registers by providing a set of fops which are read or write only. This assumes that the kernel developer knows more about the hardware than the user (even root users) -- which is normally true. Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Bryan Wu <cooloney@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:28 -07:00
Jonathan Corbet	400ced61fa	debugfs: fix docbook error Fix an error in debugfs_create_blob's docbook description It cannot actually be used to write a binary blob. Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2009-06-15 21:30:24 -07:00
Steven Rostedt	56a83cc929	debugfs: dont stop on first failed recursive delete debugfs: dont stop on first failed recursive delete While running a while loop of removing a module that removes a debugfs directory with debugfs_remove_recursive, and at the same time doing a while loop of cat of a file in that directory, I would hit a point where somehow the cat of the file caused the remove to fail. The result is that other files did not get removed when the module was removed. I simple read of one of those file can oops the kernel because the operations to the file no longer exist (removed by module). The funny thing is that the file being cat'ed was removed. It was the siblings that were not. I see in the code to debugfs_remove_recursive there's a test that checks if the child fails to bail out of the loop to prevent an infinite loop. What this patch does is to still try any siblings in that directory. If all the siblings fail, or there are no more siblings, then we exit the loop. This fixes the above symptom, but... This is no full proof. It makes the debugfs_remove_recursive a bit more robust, but it does not explain why the one file failed. There may be some kind of delay deletion that makes the debugfs think it did not succeed. So this patch is more of a fix for the symptom but not the disease. This patch still makes the debugfs_remove_recursive more robust and until I can find out why the bug exists, this patch will keep the kernel from oopsing in most cases. Even after the cause is found I think this change can stand on its own and should be kept. [ Impact: prevent kernel oops on module unload and reading debugfs files ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:24 -07:00
Armin Kuster	557411eb2c	Sysfs: fix possible memleak in sysfs_follow_link There is the possiblity of a memory leak if a page is allocated and if sysfs_getlink() fails in the sysfs_follow_link. Signed-off-by: Armin Kuster <akuster@mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:23 -07:00
Yu Zhiguo	b9081d90f5	NFS: kill off complicated macro 'PROC' kill off obscure macro 'PROC' of NFSv2&3 in order to make the code more clear. Among other things, this makes it simpler to grep for callers of these functions--something which has frequently caused confusion among nfs developers. Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-15 19:34:32 -07:00
J. Bruce Fields	e4636d535e	nfsd: minor nfsd_vfs_write cleanup There's no need to check host_err >= 0 every time here when we could check host_err < 0 once, following the usual kernel style. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-15 19:18:34 -07:00
J. Bruce Fields	d911df7b8d	nfsd: Pull write-gathering code out of nfsd_vfs_write This is a relatively self-contained piece of code that handles a special case--move it to its own function. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-15 18:54:05 -07:00
J. Bruce Fields	9d2a3f31d6	nfsd: track last inode only in use_wgather case Updating last_ino and last_dev probably isn't useful in the !use_wgather case. Also remove some pointless ifdef'd-out code. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-15 18:52:47 -07:00
Trond Myklebust	48e03bc515	nfsd: Use write gathering only with NFSv2 NFSv3 and above can use unstable writes whenever they are sending more than one write, rather than relying on the flaky write gathering heuristics. More often than not, write gathering is currently getting it wrong when the NFSv3 clients are sending a single write with FILE_SYNC for efficiency reasons. This patch turns off write gathering for NFSv3/v4, and ensures that it only applies to the one case that can actually benefit: namely NFSv2. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-15 18:14:57 -07:00
J. Bruce Fields	7eef4091a6	Merge commit 'v2.6.30' into for-2.6.31	2009-06-15 18:08:07 -07:00
Yan Zheng	978d910d31	Btrfs: always update root items for fs trees at commit time commit_fs_roots skips updating root items for fs trees that aren't modified. This is unsafe now that relocation code modifies root item's last_snapshot field without modifying corresponding fs tree. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-15 20:01:02 -04:00
Sunil Mushran	9af0b38ff3	ocfs2/net: Use wait_event() in o2net_send_message_vec() Replace wait_event_interruptible() with wait_event() in o2net_send_message_vec(). This is because this function is called by the dlm that expects signals to be blocked. Fixes oss bugzilla#1126 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1126 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-15 14:50:14 -07:00
Tao Ma	6b791bcc8b	ocfs2: Adjust rightmost path in ocfs2_add_branch. In ocfs2_add_branch, we use the rightmost rec of the leaf extent block to generate the e_cpos for the newly added branch. In the most case, it is OK but if the parent extent block's rightmost rec covers more clusters than the leaf does, it will cause kernel panic if we insert some clusters in it. The message is something like: (7445,1):ocfs2_insert_at_leaf:3775 ERROR: bug expression: le16_to_cpu(el->l_next_free_rec) >= le16_to_cpu(el->l_count) (7445,1):ocfs2_insert_at_leaf:3775 ERROR: inode 66053, depth 0, count 28, next free 28, rec.cpos 270, rec.clusters 1, insert.cpos 275, insert.clusters 1 [<fa7ad565>] ? ocfs2_do_insert_extent+0xb58/0xda0 [ocfs2] [<fa7b08f2>] ? ocfs2_insert_extent+0x5bd/0x6ba [ocfs2] [<fa7b1b8b>] ? ocfs2_add_clusters_in_btree+0x37f/0x564 [ocfs2] ... The panic can be easily reproduced by the following small test case (with bs=512, cs=4K, and I remove all the error handling so that it looks clear enough for reading). int main(int argc, char *argv) { int fd, i; char buf[5] = "test"; fd = open(argv[1], O_RDWR\|O_CREAT); for (i = 0; i < 30; i++) { lseek(fd, 40960 i, SEEK_SET); write(fd, buf, 5); } ftruncate(fd, 1146880); lseek(fd, 1126400, SEEK_SET); write(fd, buf, 5); close(fd); return 0; } The reason of the panic is that: the 30 writes and the ftruncate makes the file's extent list looks like: Tree Depth: 1 Count: 19 Next Free Rec: 1 ## Offset Clusters Block# 0 0 280 86183 SubAlloc Bit: 7 SubAlloc Slot: 0 Blknum: 86183 Next Leaf: 0 CRC32: 00000000 ECC: 0000 Tree Depth: 0 Count: 28 Next Free Rec: 28 ## Offset Clusters Block# Flags 0 0 1 143368 0x0 1 10 1 143376 0x0 ... 26 260 1 143576 0x0 27 270 1 143584 0x0 Now another write at 1126400(275 cluster) whiich will write at the gap between 271 and 280 will trigger ocfs2_add_branch, but the result after the function looks like: Tree Depth: 1 Count: 19 Next Free Rec: 2 ## Offset Clusters Block# 0 0 280 86183 1 271 0 143592 So the extent record is intersected and make the following operation bug out. This patch just try to remove the gap before we add the new branch, so that the root(branch) rightmost rec will cover the same right position. So in the above case, before adding branch the tree will be changed to Tree Depth: 1 Count: 19 Next Free Rec: 1 ## Offset Clusters Block# 0 0 271 86183 SubAlloc Bit: 7 SubAlloc Slot: 0 Blknum: 86183 Next Leaf: 0 CRC32: 00000000 ECC: 0000 Tree Depth: 0 Count: 28 Next Free Rec: 28 ## Offset Clusters Block# Flags 0 0 1 143368 0x0 1 10 1 143376 0x0 ... 26 260 1 143576 0x0 27 270 1 143584 0x0 And after branch add, the tree looks like Tree Depth: 1 Count: 19 Next Free Rec: 2 ## Offset Clusters Block# 0 0 271 86183 1 271 0 143592 Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-15 14:49:43 -07:00
Linus Torvalds	9c7cb99a82	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits) nilfs2: support contiguous lookup of blocks nilfs2: add sync_page method to page caches of meta data nilfs2: use device's backing_dev_info for btree node caches nilfs2: return EBUSY against delete request on snapshot nilfs2: modify list of unsupported features in caveats nilfs2: enable sync_page method nilfs2: set bio unplug flag for the last bio in segment nilfs2: allow future expansion of metadata read out via get info ioctl NILFS2: Pagecache usage optimization on NILFS2 nilfs2: remove nilfs_btree_operations from btree mapping nilfs2: remove nilfs_direct_operations from direct mapping nilfs2: remove bmap pointer operations nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function nilfs2: move get block functions in bmap.c into btree codes nilfs2: remove nilfs_bmap_delete_block nilfs2: remove nilfs_bmap_put_block nilfs2: remove header file for segment list operations nilfs2: eliminate removal list of segments nilfs2: add sufile function that can modify multiple segment usages ...	2009-06-15 09:13:49 -07:00
Alexey Zaytsev	df59c0ad05	[SCSI] compat: don't perform unneeded copy in sg_io code The members from 'status' in struct sg_io_hdr to the last are used to transfer information from kernel to user space. The values that user space sets are just ignored. Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-06-15 10:09:30 -05:00
Steve French	361ea1ae54	[CIFS] Fix build break Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-15 13:46:12 +00:00
Jeff Layton	61f98ffd74	cifs: display scopeid in /proc/mounts Move address display into a new function and display the scopeid as part of the address in /proc/mounts. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-15 13:42:30 +00:00
Christian Engelmayer	a2ab0ce09e	jffs2: leaking jffs2_summary in function jffs2_scan_medium In case of an error returned by file_dirty() 's' is not freed as the cleanup path is skipped. Reported by Coverity. Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-06-15 11:17:46 +01:00
Theodore Ts'o	4159175058	ext4: Don't update ctime for non-extent-mapped inodes The VFS handles updating ctime, so we don't need to update the inode's ctime in ext4_splace_branch() to update the direct or indirect blocks. This was harmless when we did this in ext3, but in ext4, thanks to delayed allocation, updating the ctime in ext4_splice_branch() can cause the ctime to mysteriously jump when the blocks are finally allocated. Thanks to Björn Steinbrink for pointing out this problem on the git mailing list. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-15 03:41:23 -04:00
Mike Frysinger	0a8eba9b7f	ramfs: ignore unknown mount options On systems where CONFIG_SHMEM is disabled, mounting tmpfs filesystems can fail when tmpfs options are used. This is because tmpfs creates a small wrapper around ramfs which rejects unknown options, and ramfs itself only supports a tiny subset of what tmpfs supports. This makes it pretty hard to use the same userspace systems across different configuration systems. As such, ramfs should ignore the tmpfs options when tmpfs is merely a wrapper around ramfs. This used to work before commit `c3b1b1cbf0` as previously, ramfs would ignore all options. But now, we get: ramfs: bad mount option: size=10M mount: mounting mdev on /dev failed: Invalid argument Another option might be to restore the previous behavior, where ramfs simply ignored all unknown mount options ... which is what Hugh prefers. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Acked-by: Matt Mackall <mpm@selenic.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-14 17:58:25 -07:00
Aneesh Kumar K.V	62e086be5d	ext4: Move __ext4_journalled_writepage() to avoid forward declaration In addition, fix two unused variable warnings. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-14 17:59:34 -04:00
Aneesh Kumar K.V	43ce1d23b4	ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaoc This patch fixes the mmap/truncate race that was fixed for delayed allocation by merging ext4_{journalled,normal,da}_writepage() into ext4_writepage(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-14 17:58:45 -04:00
Aneesh Kumar K.V	c364b22c95	ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocation It is possible to see buffer_heads which are not mapped in the writepage callback in the following scneario (where the fs blocksize is 1k and the page size is 4k): 1) truncate(f, 1024) 2) mmap(f, 0, 4096) 3) a[0] = 'a' 4) truncate(f, 4096) 5) writepage(...) Now if we get a writepage callback immediately after (4) and before an attempt to write at any other offset via mmap address (which implies we are yet to get a pagefault and do a get_block) what we would have is the page which is dirty have first block allocated and the other three buffer_heads unmapped. In the above case the writepage should go ahead and try to write the first blocks and clear the page_dirty flag. Further attempts to write to the page will again create a fault and result in allocating blocks and marking page dirty. If we don't write any other offset via mmap address we would still have written the first block to the disk and rest of the space will be considered as a hole. So to address this, we change all of the places where we look for delayed, unmapped, or unwritten buffer heads, and only check for delayed or unwritten buffer heads instead. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-14 17:57:10 -04:00
Theodore Ts'o	de9a55b841	ext4: Fix up whitespace issues in fs/ext4/inode.c This is a pure cleanup patch. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-14 17:45:34 -04:00
Theodore Ts'o	0610b6e999	ext4: Fix 64-bit block type problem on 32-bit platforms The function ext4_mb_free_blocks() was using an "unsigned long" to pass a block number; this will cause 64-bit block numbers to get truncated on x86 and other 32-bit platforms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2009-06-15 03:45:05 -04:00
Linus Torvalds	489f7ab6c1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits) trivial: remove the trivial patch monkey's name from SubmittingPatches trivial: Fix a typo in comment of addrconf_dad_start() trivial: usb: fix missing space typo in doc trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug trivial: Remove the hyphen from git commands trivial: fix ETIMEOUT -> ETIMEDOUT typos trivial: Kconfig: .ko is normally not included in module names trivial: SubmittingPatches: fix typo trivial: Documentation/dell_rbu.txt: fix typos trivial: Fix Pavel's address in MAINTAINERS trivial: ftrace:fix description of trace directory trivial: unnecessary (void*) cast removal in sound/oss/msnd.c trivial: input/misc: Fix typo in Kconfig trivial: fix grammo in bus_for_each_dev() kerneldoc trivial: rbtree.txt: fix rb_entry() parameters in sample code trivial: spelling fix in ppc code comments trivial: fix typo in bio_alloc kernel doc trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt trivial: Miscellaneous documentation typo fixes trivial: fix typo milisecond/millisecond for documentation and source comments. ...	2009-06-14 13:46:25 -07:00
Steve French	b70b92e41d	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-06-14 13:34:46 +00:00
Andreas Dilger	11013911da	ext4: teach the inode allocator to use a goal inode number Enhance the inode allocator to take a goal inode number as a paremeter; if it is specified, it takes precedence over Orlov or parent directory inode allocation algorithms. The extents migration function uses the goal inode number so that the extent trees allocated the migration function use the correct flex_bg. In the future, the goal inode functionality will also be used to allocate an adjacent inode for the extended attributes. Also, for testing purposes the goal inode number can be specified via /sys/fs/{dev}/inode_goal. This can be useful for testing inode allocation beyond 2^32 blocks on very large filesystems. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 11:45:35 -04:00
Theodore Ts'o	f157a4aa98	ext4: Use a hash of the topdir directory name for the Orlov parent group Instead of using a random number to determine the goal parent grop for the Orlov top directories, use a hash of the directory name. This allows for repeatable results when trying to benchmark filesystem layout algorithms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 11:09:42 -04:00
Theodore Ts'o	4ab2f15b7f	ext4: move the abort flag from s_mount_opts to s_mount_flags We're running out of space in the mount options word, and EXT4_MOUNT_ABORT isn't really a mount option, but a run-time flag. So move it to become EXT4_MF_FS_ABORTED in s_mount_flags. Also remove bogus ext2_fs.h / ext4.h simultaneous #include protection, which can never happen. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:36 -04:00
Theodore Ts'o	bc0b0d6d69	ext4: update the s_last_mounted field in the superblock This field can be very helpful when a system administrator is trying to sort through large numbers of block devices or filesystem images. What is stored in this field can be ambiguous if multiple filesystem namespaces are in play; what we store in practice is the mountpoint interpreted by the process's namespace which first opens a file in the filesystem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:48 -04:00
Theodore Ts'o	7f4520cc62	ext4: change s_mount_opt to be an unsigned int We can only fit 32 options in s_mount_opt because an unsigned long is 32-bits on a x86 machine. So use an unsigned int to save space on 64-bit platforms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:41 -04:00
Akira Fujita	748de6736c	ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl The EXT4_IOC_MOVE_EXT exchanges the blocks between orig_fd and donor_fd, and then write the file data of orig_fd to donor_fd. ext4_mext_move_extent() is the main fucntion of ext4 online defrag, and this patch includes all functions related to ext4 online defrag. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-17 19:24:03 -04:00
Jeff Layton	1e68b2b275	cifs: add new routine for converting AF_INET and AF_INET6 addrs ...to consolidate some logic used in more than one place. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-13 08:17:30 +00:00
Jeff Layton	340481a364	cifs: have cifs_show_options show forceuid/forcegid options Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-13 08:17:30 +00:00
Jeff Layton	8616e0fc1e	cifs: remove unneeded NULL checks from cifs_show_options show_options is always called with the namespace_sem held. Therefore we don't need to worry about the vfsmount being NULL, or it vanishing while the function is running. By the same token, there's no need to worry about the superblock, tcon, smb or tcp sessions being NULL on entry. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-06-13 08:17:30 +00:00
Felix Blyakher	fd40261354	Merge branch 'master' of git://oss.sgi.com/xfs/xfs into for-linus	2009-06-12 21:28:59 -05:00
Christoph Hellwig	e83f1eb6bf	xfs: fix small mismerge in xfs_vn_mknod Identation got messed up when merging the current_umask changes with the generic ACL support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-06-12 21:15:31 -05:00
Christoph Hellwig	493b87e5ed	xfs: fix warnings with CONFIG_XFS_QUOTA disabled Fix warnings about unitialized dquot variables by making sure xfs_qm_vop_dqalloc touches it even when quotas are disabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-06-12 21:15:12 -05:00
Linus Torvalds	f3ad116588	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/configfs * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/configfs: configfs: Rework configfs_depend_item() locking and make lockdep happy configfs: Silence lockdep on mkdir() and rmdir()	2009-06-12 18:21:19 -07:00
Linus Torvalds	c53567ad45	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: use more NOFS allocation dlm: connect to nodes earlier dlm: fix use count with multiple joins dlm: Make name input parameter of {,dlm_}new_lockspace() const	2009-06-12 13:17:12 -07:00
Linus Torvalds	c9b8af00ff	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (154 commits) [SCSI] osd: Remove out-of-tree left overs [SCSI] libosd: Use REQ_QUIET requests. [SCSI] osduld: use filp_open() when looking up an osd-device [SCSI] libosd: Define an osd_dev wrapper to retrieve the request_queue [SCSI] libosd: osd_req_{read,write} takes a length parameter [SCSI] libosd: Let _osd_req_finalize_data_integrity receive number of out_bytes [SCSI] libosd: osd_req_{read,write}_kern new API [SCSI] libosd: Better printout of OSD target system information [SCSI] libosd: OSD2r05: Attribute definitions [SCSI] libosd: OSD2r05: Additional command enums [SCSI] mpt fusion: fix up doc book comments [SCSI] mpt fusion: Added support for Broadcast primitives Event handling [SCSI] mpt fusion: Queue full event handling [SCSI] mpt fusion: RAID device handling and Dual port Raid support is added [SCSI] mpt fusion: Put IOC into ready state if it not already in ready state [SCSI] mpt fusion: Code Cleanup patch [SCSI] mpt fusion: Rescan SAS topology added [SCSI] mpt fusion: SAS topology scan changes, expander events [SCSI] mpt fusion: Firmware event implementation using seperate WorkQueue [SCSI] mpt fusion: rewrite of ioctl_cmds internal generated function ...	2009-06-12 09:50:42 -07:00
Linus Torvalds	6cb8a91174	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Remove lock_kernel from gfs2_put_super() GFS2: Add tracepoints	2009-06-12 09:43:44 -07:00
Linus Torvalds	7f3591cfac	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits) lguest: add support for indirect ring entries lguest: suppress notifications in example Launcher lguest: try to batch interrupts on network receive lguest: avoid sending interrupts to Guest when no activity occurs. lguest: implement deferred interrupts in example Launcher lguest: remove obsolete LHREQ_BREAK call lguest: have example Launcher service all devices in separate threads lguest: use eventfds for device notification eventfd: export eventfd_signal and eventfd_fget for lguest lguest: allow any process to send interrupts lguest: PAE fixes lguest: PAE support lguest: Add support for kvm_hypercall4() lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated lguest: map switcher with executable page table entries lguest: fix writev returning short on console output lguest: clean up length-used value in example launcher lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition. lguest: beyond ARRAY_SIZE of cpu->arch.gdt ...	2009-06-12 09:32:26 -07:00
Linus Torvalds	c34752bc8b	Merge branch 'cuse' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'cuse' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: CUSE: implement CUSE - Character device in Userspace fuse: export symbols to be used by CUSE fuse: update fuse_conn_init() and separate out fuse_conn_kill() fuse: don't use inode in fuse_file_poll fuse: don't use inode in fuse_do_ioctl() helper fuse: don't use inode in fuse_sync_release() fuse: create fuse_do_open() helper for CUSE fuse: clean up args in fuse_finish_open() and fuse_release_fill() fuse: don't use inode in helpers called by fuse_direct_io() fuse: add members to struct fuse_file fuse: prepare fuse_direct_io() for CUSE fuse: clean up fuse_write_fill() fuse: use struct path in release structure fuse: misc cleanups	2009-06-12 09:31:20 -07:00
Linus Torvalds	d614aec475	Merge branch 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (29 commits) ide: re-implement ide_pci_init_one() on top of ide_pci_init_two() ide: unexport ide_find_dma_mode() ide: fix PowerMac bootup oops ide: skip probe if there are no devices on the port (v2) sl82c105: add printk() logging facility ide-tape: fix proc warning ide: add IDE_DFLAG_NIEN_QUIRK device flag ide: respect quirk_drives[] list on all controllers hpt366: enable all quirks for devices on quirk_drives[] list hpt366: sync quirk_drives[] list with pdc202xx_{new,old}.c ide: remove superfluous SELECT_MASK() call from do_rw_taskfile() ide: remove superfluous SELECT_MASK() call from ide_driveid_update() icside: remove superfluous ->maskproc method ide-tape: fix IDE_AFLAG_* atomic accesses ide-tape: change IDE_AFLAG_IGNORE_DSC non-atomically pdc202xx_old: kill resetproc() method pdc202xx_old: don't call pdc202xx_reset() on IRQ timeout pdc202xx_old: use ide_dma_test_irq() ide: preserve Host Protected Area by default (v2) ide-gd: implement block device ->set_capacity method (v2) ...	2009-06-12 09:29:42 -07:00
Nikanth Karthikesan	76d93ff344	trivial: fix typo in bio_alloc kernel doc Fix typo in bio_alloc kernel doc. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:47 +02:00
Thadeu Lima de Souza Cascardo	ab2274af05	trivial: fix typo compatiable/compatiability has extra 'a'. Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:46 +02:00
Wolfram Sang	2eadfc0ed6	trivial: fs/inode: Fix typo in file_update_time nanodoc The advertised flag for not updating the time was wrong. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:45 +02:00
Nikanth Karthikesan	ff677f8d10	trivial: fix comment typo in fs/compat.c Fix a typo in fs/compat.c Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:44 +02:00
Ali Gholami Rudi	88164ff4fc	trivial: ext2: fix a typo in comment in ext2.h Signed-off-by: Ali Gholami Rudi <ali@rudi.ir> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:44 +02:00
Felix Blyakher	7747a0b0af	xfs: fix freeing memory in xfs_getbmap() Regression from commit `28e211700a`. Need to free temporary buffer allocated in xfs_getbmap(). Signed-off-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Hedi Berriche <hedi@sgi.com> Reported-by: Justin Piszcz <jpiszcz@lucidpixels.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de>	2009-06-12 10:26:52 -05:00
James Bottomley	82681a318f	[SCSI] Merge branch 'linus' Conflicts: drivers/message/fusion/mptsas.c fixed up conflict between req->data_len accessors and mptsas driver updates. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-06-12 10:02:03 -05:00
Rusty Russell	5718607bb6	eventfd: export eventfd_signal and eventfd_fget for lguest lguest wants to attach eventfds to guest notifications, and lguest is usually a module. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> To: Davide Libenzi <davidel@xmailserver.org>	2009-06-12 22:27:09 +09:30
Steven Whitehouse	3ea400581f	GFS2: Remove lock_kernel from gfs2_put_super() It is not required here. Signed-off-by: Steven Whitehouse <swhiteho@redhat,com> Cc: Christoph Hellwig <hch@infradead.org>	2009-06-12 13:40:47 +01:00
Steven Whitehouse	63997775b7	GFS2: Add tracepoints This patch adds the ability to trace various aspects of the GFS2 filesystem. The trace points are divided into three groups, glocks, logging and bmap. These points have been chosen because they allow inspection of the major internal functions of GFS2 and they are also generic enough that they are unlikely to need any major changes as the filesystem evolves. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-06-12 08:49:20 +01:00
Ryusuke Konishi	aa7dfb8954	nilfs2: get rid of bd_mount_sem use from nilfs This will remove every bd_mount_sem use in nilfs. The intended exclusion control was replaced by the previous patch ("nilfs2: correct exclusion control in nilfs_remount function") for nilfs_remount(), and this patch will replace remains with a new mutex that this inserts in nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:18 -04:00
Ryusuke Konishi	e59399d010	nilfs2: correct exclusion control in nilfs_remount function nilfs_remount() changes mount state of a superblock instance. Even though nilfs accesses other superblock instances during mount or remount, the mount state was not properly protected in nilfs_remount(). Moreover, nilfs_remount() has a lock order reversal problem; nilfs_get_sb() holds: 1. bdev->bd_mount_sem 2. sb->s_umount (sget acquires) and nilfs_remount() holds: 1. sb->s_umount (locked by the caller in vfs) 2. bdev->bd_mount_sem To avoid these problems, this patch divides a semaphore protecting super block instances from nilfs->ns_sem, and applies it to the mount state protection in nilfs_remount(). With this change, bd_mount_sem use is removed from nilfs_remount() and the lock order reversal will be resolved. And the new rw-semaphore, nilfs->ns_super_sem will properly protect the mount state except the modification from nilfs_error function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:18 -04:00
Ryusuke Konishi	6dd4740662	nilfs2: simplify remaining sget() use This simplifies the test function passed on the remaining sget() callsite in nilfs. Instead of checking mount type (i.e. ro-mount/rw-mount/snapshot mount) in the test function passed to sget(), this patch first looks up the nilfs_sb_info struct which the given mount type matches, and then acquires the super block instance holding the nilfs_sb_info. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:18 -04:00
Ryusuke Konishi	3f82ff5516	nilfs2: get rid of sget use for checking if current mount is present This stops using sget() for checking if an r/w-mount or an r/o-mount exists on the device. This elimination uses a back pointer to the current mount added to nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Ryusuke Konishi	33c8e57c86	nilfs2: get rid of sget use for acquiring nilfs object This will change the way to obtain nilfs object in nilfs_get_sb() function. Previously, a preliminary sget() call was performed, and the nilfs object was acquired from a super block instance found by the sget() call. This patch, instead, instroduces a new dedicated function find_or_create_nilfs(); as the name implies, the function finds an existent nilfs object from a global list or creates a new one if no object is found on the device. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Ryusuke Konishi	81fc20bd0e	nilfs2: remove meaningless EBUSY case from nilfs_get_sb function The following EBUSY case in nilfs_get_sb() is meaningless. Indeed, this error code is never returned to the caller. if (!s->s_root) { ... } else if (!(s->s_flags & MS_RDONLY)) { err = -EBUSY; } This simply removes the else case. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Christoph Hellwig	0c95ee190e	remove the call to ->write_super in __sync_filesystem Now that all filesystems provide ->sync_fs methods we can change __sync_filesystem to only call ->sync_fs. This gives us a clear separation between periodic writeouts which are driven by ->write_super and data integrity syncs that go through ->sync_fs. (modulo file_fsync which is also going away) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00

... 13 14 15 16 17 ...

15761 Commits