linux

Author	SHA1	Message	Date
Jan Schmidt	a542ad1baf	btrfs: added helper functions to iterate backrefs These helper functions iterate back references and call a function for each backref. There is also a function to resolve an inode to a path in the file system. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2011-09-29 12:54:27 +02:00
Linus Torvalds	b6c8069d35	vfs: remove LOOKUP_NO_AUTOMOUNT flag That flag no longer makes sense, since we don't look up automount points as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT handling was buggy to begin with: it would avoid automounting even for cases where we really needed to do the automount handling, and could return ENOENT for autofs entries that hadn't been instantiated yet. With our new non-eager automount semantics, one discussion has been about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the newfstatat() and fstatat64() system calls), but it's probably not worth it: you can always force at least directory automounting by simply adding the final '/' to the filename, which works for all of the stat family system calls, old and new. So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a result of our bad default behavior. Acked-by: Ian Kent <raven@themaw.net> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-27 08:12:33 -07:00
Trond Myklebust	815d405cef	VFS: Fix the remaining automounter semantics regressions The concensus seems to be that system calls such as stat() etc should not trigger an automount. Neither should the l* versions. This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups that _should_ trigger an automount on the last path element. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [ Edited to leave out the cases that are already covered by LOOKUP_OPEN, LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally force automounting for their own reasons - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-26 19:16:46 -07:00
Linus Torvalds	d94c177bee	vfs pathname lookup: Add LOOKUP_AUTOMOUNT flag Since we've now turned around and made LOOKUP_FOLLOW not force an automount, we want to add the ability to force an automount event on lookup even if we don't happen to have one of the other flags that force it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..) Most cases will never want to use this, since you'd normally want to delay automounting as long as possible, which usually implies LOOKUP_OPEN (when we open a file or directory, we really cannot avoid the automount any more). But Trond argued sufficiently forcefully that at a minimum bind mounting a file and quotactl will want to force the automount lookup. Some other cases (like nfs_follow_remote_path()) could use it too, although LOOKUP_DIRECTORY would work there as well. This commit just adds the flag and logic, no users yet, though. It also doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and was made irrelevant by the same change that made us not follow on LOOKUP_FOLLOW. Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-26 17:44:55 -07:00
Linus Torvalds	fed678dc8a	Merge branch 'for-linus' of git://git.kernel.dk/linux-block * 'for-linus' of git://git.kernel.dk/linux-block: floppy: use del_timer_sync() in init cleanup blk-cgroup: be able to remove the record of unplugged device block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request mm: Add comment explaining task state setting in bdi_forker_thread() mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread() block: simplify force plug flush code a little bit block: change force plug flush call order block: Fix queue_flag update when rq_affinity goes from 2 to 1 block: separate priority boosting from REQ_META block: remove READ_META and WRITE_META xen-blkback: fixed indentation and comments xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.	2011-09-21 13:20:21 -07:00
Dave Hansen	32ef43848f	teach /proc/$pid/numa_maps about transparent hugepages This is modeled after the smaps code. It detects transparent hugepages and then does a single gather_stats() for the page as a whole. This has two benifits: 1. It is more efficient since it does many pages in a single shot. 2. It does not have to break down the huge page. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-21 13:15:44 -07:00
Dave Hansen	3200a8aaab	break out numa_maps gather_pte_stats() checks gather_pte_stats() does a number of checks on a target page to see whether it should even be considered for statistics. This breaks that code out in to a separate function so that we can use it in the transparent hugepage case in the next patch. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Christoph Lameter <cl@gentwo.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-21 13:15:44 -07:00
Dave Hansen	eb4866d006	make /proc/$pid/numa_maps gather_stats() take variable page size We need to teach the numa_maps code about transparent huge pages. The first step is to teach gather_stats() that the pte it is dealing with might represent more than one page. Note that will we use this in a moment for transparent huge pages since they have use a single pmd_t which _acts_ as a "surrogate" for a bunch of smaller pte_t's. I'm a _bit_ unhappy that this interface counts in hugetlbfs page sizes for hugetlbfs pages and PAGE_SIZE for normal pages. That means that to figure out how many _bytes_ "dirty=1" means, you must first know the hugetlbfs page size. That's easier said than done especially if you don't have visibility in to the mount. But, that's probably a discussion for another day especially since it would change behavior to fix it. But, just in case anyone wonders why this patch only passes a '1' in the hugetlb case... Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-21 13:15:44 -07:00
Linus Torvalds	43a964a7bf	Merge branch 'for-linus' of git://github.com/chrismason/linux * 'for-linus' of git://github.com/chrismason/linux: Btrfs: reserve sufficient space for ioctl clone	2011-09-20 14:22:55 -07:00
Chris Mason	0a7a0519d1	Merge branch 'btrfs-3.0' into for-linus	2011-09-20 14:49:29 -04:00
Sage Weil	b6f3409b21	Btrfs: reserve sufficient space for ioctl clone Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We need to reserve space for: - adjusting the old extent (possibly splitting it) - adding the new extent - updating the inode Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-20 14:48:51 -04:00
Shirish Pargaonkar	cfbd6f84c2	cifs: Fix broken sec=ntlmv2/i sec option (try #2 ) Fix sec=ntlmv2/i authentication option during mount of Samba shares. cifs client was coding ntlmv2 response incorrectly. All that is needed in temp as specified in MS-NLMP seciton 3.3.2 "Define ComputeResponse(NegFlg, ResponseKeyNT, ResponseKeyLM, CHALLENGE_MESSAGE.ServerChallenge, ClientChallenge, Time, ServerName) as Set temp to ConcatenationOf(Responserversion, HiResponserversion, Z(6), Time, ClientChallenge, Z(4), ServerName, Z(4)" is MsvAvNbDomainName. For sec=ntlmsspi, build_av_pair is not used, a blob is plucked from type 2 response sent by the server to use in authentication. I tested sec=ntlmv2/i and sec=ntlmssp/i mount options against Samba (3.6) and Windows - XP, 2003 Server and 7. They all worked. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-09-19 21:16:58 -05:00
Steve French	c9c7fa0064	Fix the conflict between rwpidforward and rw mount options Both these options are started with "rw" - that's why the first one isn't switched on even if it is specified. Fix this by adding a length check for "rw" option check. Cc: <stable@kernel.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-09-19 21:16:20 -05:00
Pavel Shilovsky	5b980b0121	CIFS: Fix ERR_PTR dereference in cifs_get_root move it to the beginning of the loop. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-09-19 21:15:03 -05:00
Jeff Layton	9438fabb73	cifs: fix possible memory corruption in CIFSFindNext The name_len variable in CIFSFindNext is a signed int that gets set to the resume_name_len in the cifs_search_info. The resume_name_len however is unsigned and for some infolevels is populated directly from a 32 bit value sent by the server. If the server sends a very large value for this, then that value could look negative when converted to a signed int. That would make that value pass the PATH_MAX check later in CIFSFindNext. The name_len would then be used as a length value for a memcpy. It would then be treated as unsigned again, and the memcpy scribbles over a ton of memory. Fix this by making the name_len an unsigned value in CIFSFindNext. Cc: <stable@kernel.org> Reported-by: Darren Lavender <dcl@hppine99.gbr.hp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-09-19 21:14:40 -05:00
Linus Torvalds	50f2d407c0	Merge branch 'for-linus' of git://github.com/chrismason/linux * 'for-linus' of git://github.com/chrismason/linux: Btrfs: only clear the need lookup flag after the dentry is setup BTRFS: Fix lseek return value for error Btrfs: don't change inode flag of the dest clone file Btrfs: don't make a file partly checksummed through file clone Btrfs: fix pages truncation in btrfs_ioctl_clone() btrfs: fix d_off in the first dirent	2011-09-19 17:17:32 -07:00
Josef Bacik	a66e7cc626	Btrfs: only clear the need lookup flag after the dentry is setup We can race with readdir and the RCU path walking stuff. This is because we clear the need lookup flag before actually instantiating the inode. This will lead the RCU path walk stuff to find a dentry it thinks is valid without a d_inode attached. So instead unhash the dentry when we first start the lookup, and then clear the flag after we've instantiated the dentry so we're garunteed to either try the slow lookup, or have the d_inode set properly. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:34:03 -04:00
Jeff Liu	48802c8ae2	BTRFS: Fix lseek return value for error The recent reworking of btrfs' lseek lead to incorrect values being returned. This adds checks for seeking beyond EOF in SEEK_HOLE and makes sure the error values come back correct. Andi Kleen also sent in similar patches. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:34:02 -04:00
Chris Mason	2cf4ce7c2a	Merge branch 'btrfs-3.0' into for-linus	2011-09-18 10:31:44 -04:00
Li Zefan	dde820fbf7	Btrfs: don't change inode flag of the dest clone file The dst file will have the same inode flags with dst file after file clone, and I think it's unexpected. For example, the dst file will suddenly become immutable after getting some share of data with src file, if the src is immutable. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:20:46 -04:00
Li Zefan	0e7b824c4e	Btrfs: don't make a file partly checksummed through file clone To reproduce the bug: # mount /dev/sda7 /mnt # dd if=/dev/zero of=/mnt/src bs=4K count=1 # umount /mnt # mount -o nodatasum /dev/sda7 /mnt # dd if=/dev/zero of=/mnt/dst bs=4K count=1 # clone_range -s 4K -l 4K /mnt/src /mnt/dst # echo 3 > /proc/sys/vm/drop_caches # cat /mnt/dst # dmesg ... btrfs no csum found for inode 258 start 0 btrfs csum failed ino 258 off 0 csum 2566472073 private 0 It's because part of the file is checksummed and the other part is not, and then btrfs will complain checksum is not found when we read the file. Disallow file clone if src and dst file have different checksum flag, so we ensure a file is completely checksummed or unchecksummed. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:20:46 -04:00
Li Zefan	71ef078610	Btrfs: fix pages truncation in btrfs_ioctl_clone() It's a bug in commit `f81c9cdc56` (Btrfs: truncate pages from clone ioctl target range) We should pass the dest range to the truncate function, but not the src range. Also move the function before locking extent state. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:20:46 -04:00
Hidetoshi Seto	3765fefaee	btrfs: fix d_off in the first dirent Since the d_off in the first dirent for "." (that originates from the 4th argument "offset" of filldir() for the 2nd dirent for "..") is wrongly assigned in btrfs_real_readdir(), telldir returns same offset for different locations. \| # mkfs.btrfs /dev/sdb1 \| # mount /dev/sdb1 fs0 \| # cd fs0 \| # touch file0 file1 \| # ../test \| telldir: 0 \| readdir: d_off = 2, d_name = "." \| telldir: 2 \| readdir: d_off = 2, d_name = ".." \| telldir: 2 \| readdir: d_off = 3, d_name = "file0" \| telldir: 3 \| readdir: d_off = 2147483647, d_name = "file1" \| telldir: 2147483647 To fix this problem, pass filp->f_pos (which is loff_t) instead. \| # ../test \| telldir: 0 \| readdir: d_off = 1, d_name = "." \| telldir: 1 \| readdir: d_off = 2, d_name = ".." \| telldir: 2 \| readdir: d_off = 3, d_name = "file0" : At the moment the "offset" for "." is unused because there is no preceding dirent, however it is better to pass filp->f_pos to follow grammatical usage. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:20:46 -04:00
Linus Torvalds	17d8428e4c	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Do not allow multiple mounts on same mountpoint when using -o noac NFS: Fix a typo in nfs_flush_multi NFSv4: renewd needs to be able to handle the NFS4ERR_CB_PATH_DOWN error NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation NFSv4: nfs4_proc_renew should be declared static NFSv4: nfs4_proc_async_renew should use a GFP_NOFS allocation	2011-09-15 12:36:01 -07:00
Christoph Hellwig	f1fcd9f0e9	hfsplus: fix filesystem size checks generic_check_addressable can't deal with hfsplus's larger than page size allocation blocks, so simply opencode the checks that we actually need in hfsplus_fill_super. Signed-off-by: Christoph Hellwig <hch@tuxera.com> Reported-by: Pavel Ivanov <paivanof@gmail.com> Tested-by: Pavel Ivanov <paivanof@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-15 09:03:17 -07:00
Seth Forshee	f588c960fc	hfsplus: Fix kfree of wrong pointers in hfsplus_fill_super() error path Commit `6596528e39` ("hfsplus: ensure bio requests are not smaller than the hardware sectors") changed the pointers used for volume header allocations but failed to free the correct pointers in the error path path of hfsplus_fill_super() and hfsplus_read_wrapper. The second hunk came from a separate patch by Pavel Ivanov. Reported-by: Pavel Ivanov <paivanof@gmail.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-15 09:03:16 -07:00
Linus Torvalds	53d872e995	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix a use after free in xfs_end_io_direct_write	2011-09-14 16:08:29 -07:00
Al Viro	1d2ef59014	restore pinning the victim dentry in vfs_rmdir()/vfs_rename_dir() We used to get the victim pinned by dentry_unhash() prior to commit `64252c75a2` ("vfs: remove dget() from dentry_unhash()") and ->rmdir() and ->rename() instances relied on that; most of them don't care, but ones that used d_delete() themselves do. As the result, we are getting rmdir() oopses on NFS now. Just grab the reference before locking the victim and drop it explicitly after unlocking, same as vfs_rename_other() does. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Simon Kirby <sim@hostway.ca> Cc: stable@kernel.org (3.0.x) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-14 11:31:55 -07:00
Christoph Hellwig	2d2422aebc	xfs: fix a use after free in xfs_end_io_direct_write There is a window in which the ioend that we call inode_dio_wake on in xfs_end_io_direct_write is already free. Fix this by storing the inode pointer in a local variable. This is a fix for the regression introduced in 3.1-rc by "fs: move inode_dio_done to the end_io handler". Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-09-14 08:56:35 -05:00
Sachin Prabhu	fb2088ccc1	nfs: Do not allow multiple mounts on same mountpoint when using -o noac Do not allow multiple mounts on same mountpoint when using -o noac When you normally attempt to mount a share twice on the same mountpoint, a check in do_add_mount causes it to return an error # mount localhost:/nfsv3 /mnt # mount localhost:/nfsv3 /mnt mount.nfs: /mnt is already mounted or busy However when using the option 'noac', the user is able to mount the same share on the same mountpoint multiple times. This happens because a share mounted with the noac option is automatically assigned the 'sync' flag MS_SYNCHRONOUS in nfs_initialise_sb(). This flag is set after the check for already existing superblocks is done in sget(). The check for the mount flags in nfs_compare_mount_options() does not take into account the 'sync' flag applied later on in the code path. This means that when using 'noac', a new superblock structure is assigned for every new mount of the same share and multiple shares on the same mountpoint are allowed. ie. # mount -onoac localhost:/nfsv3 /mnt can be run multiple times. The patch checks for noac and assigns the sync flag before sget() is called to obtain an already existing superblock structure. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-09-13 17:10:15 -04:00
Trond Myklebust	f13c3620a4	NFS: Fix a typo in nfs_flush_multi Fix a typo which causes an Oops in the RPC layer, when using wsize < 4k. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Sricharan R <r.sricharan@ti.com>	2011-09-13 17:06:57 -04:00
Linus Torvalds	0b001b2eda	Merge branch 'for-linus' of git://github.com/chrismason/linux * 'for-linus' of git://github.com/chrismason/linux: Btrfs: add dummy extent if dst offset excceeds file end in Btrfs: calc file extent num_bytes correctly in file clone btrfs: xattr: fix attribute removal Btrfs: fix wrong nbytes information of the inode Btrfs: fix the file extent gap when doing direct IO Btrfs: fix unclosed transaction handle in btrfs_cont_expand Btrfs: fix misuse of trans block rsv Btrfs: reset to appropriate block rsv after orphan operations Btrfs: skip locking if searching the commit root in csum lookup btrfs: fix warning in iput for bad-inode Btrfs: fix an oops when deleting snapshots	2011-09-12 11:47:49 -07:00
Miklos Szeredi	5dfcc87fd7	fuse: fix memory leak kmemleak is reporting that 32 bytes are being leaked by FUSE: unreferenced object 0xe373b270 (size 32): comm "fusermount", pid 1207, jiffies 4294707026 (age 2675.187s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<b05517d7>] kmemleak_alloc+0x27/0x50 [<b0196435>] kmem_cache_alloc+0xc5/0x180 [<b02455be>] fuse_alloc_forget+0x1e/0x20 [<b0245670>] fuse_alloc_inode+0xb0/0xd0 [<b01b1a8c>] alloc_inode+0x1c/0x80 [<b01b290f>] iget5_locked+0x8f/0x1a0 [<b0246022>] fuse_iget+0x72/0x1a0 [<b02461da>] fuse_get_root_inode+0x8a/0x90 [<b02465cf>] fuse_fill_super+0x3ef/0x590 [<b019e56f>] mount_nodev+0x3f/0x90 [<b0244e95>] fuse_mount+0x15/0x20 [<b019d1bc>] mount_fs+0x1c/0xc0 [<b01b5811>] vfs_kern_mount+0x41/0x90 [<b01b5af9>] do_kern_mount+0x39/0xd0 [<b01b7585>] do_mount+0x2e5/0x660 [<b01b7966>] sys_mount+0x66/0xa0 This leak report is consistent and happens once per boot on 3.1.0-rc5-dirty. This happens if a FORGET request is queued after the fuse device was released. Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-12 11:47:10 -07:00
Miklos Szeredi	24114504c4	fuse: fix flock breakage Commit `37fb3a30b4` ("fuse: fix flock") added in 3.1-rc4 caused flock() to fail with ENOSYS with the kernel ABI version 7.16 or earlier. Fix by falling back to testing FUSE_POSIX_LOCKS for ABI versions 7.16 and earlier. Reported-by: Martin Ziegler <ziegler@email.mathematik.uni-freiburg.de> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Martin Ziegler <ziegler@email.mathematik.uni-freiburg.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-12 11:47:10 -07:00
Li Zefan	d525e8ab02	Btrfs: add dummy extent if dst offset excceeds file end in You can see there's no file extent with range [0, 4096]. Check this by btrfsck: # btrfsck /dev/sda7 root 5 inode 258 errors 100 ... Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:25 -04:00
Li Zefan	d72c0842ff	Btrfs: calc file extent num_bytes correctly in file clone num_bytes should be 4096 not 12288. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:25 -04:00
David Sterba	4815053aba	btrfs: xattr: fix attribute removal An attribute is not removed by 'setfattr -x attr file' and remains visible in attr list. This makes xfstests/062 pass again. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:25 -04:00
Miao Xie	a39f752143	Btrfs: fix wrong nbytes information of the inode If we write some data into the data hole of the file(no preallocation for this hole), Btrfs will allocate some disk space, and update nbytes of the inode, but the other element--disk_i_size needn't be updated. At this condition, we must update inode metadata though disk_i_size is not changed(btrfs_ordered_update_i_size() return 1). # mkfs.btrfs /dev/sdb1 # mount /dev/sdb1 /mnt # touch /mnt/a # truncate -s 856002 /mnt/a # dd if=/dev/zero of=/mnt/a bs=4K count=1 conv=nocreat,notrunc # umount /mnt # btrfsck /dev/sdb1 root 5 inode 257 errors 400 found 32768 bytes used err is 1 Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:25 -04:00
Miao Xie	0c1a98c814	Btrfs: fix the file extent gap when doing direct IO When we write some data to the place that is beyond the end of the file in direct I/O mode, a data hole will be created. And Btrfs should insert a file extent item that point to this hole into the fs tree. But unfortunately Btrfs forgets doing it. The following is a simple way to reproduce it: # mkfs.btrfs /dev/sdc2 # mount /dev/sdc2 /test4 # touch /test4/a # dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc # umount /test4 # btrfsck /dev/sdc2 root 5 inode 257 errors 100 Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:24 -04:00
Miao Xie	5b397377e9	Btrfs: fix unclosed transaction handle in btrfs_cont_expand The function - btrfs_cont_expand() forgot to close the transaction handle before it jump out the while loop. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:24 -04:00
Liu Bo	98c9942aca	Btrfs: fix misuse of trans block rsv At the beginning of create_pending_snapshot, trans->block_rsv is set to pending->block_rsv and is used for snapshot things, however, when it is done, we do not recover it as will. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:24 -04:00
Liu Bo	65450aa645	Btrfs: reset to appropriate block rsv after orphan operations While truncating free space cache, we forget to change trans->block_rsv back to the original one, but leave it with the orphan_block_rsv, and then with option inode_cache enable, it leads to countless warnings of btrfs_alloc_free_block and btrfs_orphan_commit_root: WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]() ... WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]() Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:24 -04:00
Josef Bacik	ddf23b3fc6	Btrfs: skip locking if searching the commit root in csum lookup It's not enough to just search the commit root, since we could be cow'ing the very block we need to search through, which would mean that its locked and we'll still deadlock. So use path->skip_locking as well. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:24 -04:00
Sergei Trofimovich	e0b6d65be5	btrfs: fix warning in iput for bad-inode iput() shouldn't be called for inodes in I_NEW state. We need to mark inode as constructed first. WARNING: at fs/inode.c:1309 iput+0x20b/0x210() Call Trace: [<ffffffff8103e7ba>] warn_slowpath_common+0x7a/0xb0 [<ffffffff8103e805>] warn_slowpath_null+0x15/0x20 [<ffffffff810eaf0b>] iput+0x20b/0x210 [<ffffffff811b96fb>] btrfs_iget+0x1eb/0x4a0 [<ffffffff811c3ad6>] btrfs_run_defrag_inodes+0x136/0x210 [<ffffffff811ad55f>] cleaner_kthread+0x17f/0x1a0 [<ffffffff81035b7d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff811ad3e0>] ? transaction_kthread+0x280/0x280 [<ffffffff8105af86>] kthread+0x96/0xa0 [<ffffffff814336d4>] kernel_thread_helper+0x4/0x10 [<ffffffff8105aef0>] ? kthread_worker_fn+0x190/0x190 [<ffffffff814336d0>] ? gs_change+0xb/0xb Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> CC: Konstantin Khlebnikov <khlebnikov@openvz.org> Tested-by: David Sterba <dsterba@suse.cz> CC: Josef Bacik <josef@redhat.com> CC: Chris Mason <chris.mason@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:24 -04:00
Liu Bo	14c7cca780	Btrfs: fix an oops when deleting snapshots We can reproduce this oops via the following steps: $ mkfs.btrfs /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs $ for ((i=0; i<3; i++)); do btrfs sub snap /mnt/btrfs /mnt/btrfs/s_$i; done $ rm -fr /mnt/btrfs/* $ rm -fr /mnt/btrfs/* then we'll get ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:2264! [...] Call Trace: [<ffffffffa05578c7>] btrfs_rmdir+0xf7/0x1b0 [btrfs] [<ffffffff81150b95>] vfs_rmdir+0xa5/0xf0 [<ffffffff81153cc3>] do_rmdir+0x123/0x140 [<ffffffff81145ac7>] ? fput+0x197/0x260 [<ffffffff810aecff>] ? audit_syscall_entry+0x1bf/0x1f0 [<ffffffff81153d0d>] sys_unlinkat+0x2d/0x40 [<ffffffff8147896b>] system_call_fastpath+0x16/0x1b RIP [<ffffffffa054f7b9>] btrfs_orphan_add+0x179/0x1a0 [btrfs] When it comes to btrfs_lookup_dentry, we may set a snapshot's inode->i_ino to BTRFS_EMPTY_SUBVOL_DIR_OBJECTID instead of BTRFS_FIRST_FREE_OBJECTID, while the snapshot's location.objectid remains unchanged. However, btrfs_ino() does not take this into account, and returns a wrong ino, and causes the oops. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:24 -04:00
Linus Torvalds	290a1cc4f7	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: md: Fix handling for devices from 2TB to 4TB in 0.90 metadata. md/raid1,10: Remove use-after-free bug in make_request. md/raid10: unify handling of write completion. Avoid dereferencing a 'request_queue' after last close.	2011-09-10 10:19:15 -07:00
NeilBrown	94007751bb	Avoid dereferencing a 'request_queue' after last close. On the last close of an 'md' device which as been stopped, the device is destroyed and in particular the request_queue is freed. The free is done in a separate thread so it might happen a short time later. __blkdev_put calls bdev_inode_switch_bdi after ->release has been called. Since commit `f758eeabeb` bdev_inode_switch_bdi will dereference the 'old' bdi, which lives inside a request_queue, to get a spin lock. This causes the last close on an md device to sometime take a spin_lock which lives in freed memory - which results in an oops. So move the called to bdev_inode_switch_bdi before the call to ->release. Cc: Christoph Hellwig <hch@lst.de> Cc: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>	2011-09-10 17:20:21 +10:00
Linus Torvalds	0d20fbbe82	Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client * 'for-linus' of git://ceph.newdream.net/git/ceph-client: libceph: fix leak of osd structs during shutdown ceph: fix memory leak ceph: fix encoding of ino only (not relative) paths libceph: fix msgpool	2011-09-09 15:48:34 -07:00
Miklos Szeredi	0ec26fd069	vfs: automount should ignore LOOKUP_FOLLOW Prior to 2.6.38 automount would not trigger on either stat(2) or lstat(2) on the automount point. After 2.6.38, with the introduction of the ->d_automount() infrastructure, stat(2) and others would start triggering automount while lstat(2), etc. still would not. This is a regression and a userspace ABI change. Problem originally reported here: http://thread.gmane.org/gmane.linux.kernel.autofs/6098 It appears that there was an attempt at fixing various userspace tools to not trigger the automount. But since the stat system call is rather common it is impossible to "fix" all userspace. This patch reverts the original behavior, which is to not trigger on stat(2) and other symlink following syscalls. [ It's not really clear what the right behavior is. Apparently Solaris does the "automount on stat, leave alone on lstat". And some programs can get unhappy when "stat+open+fstat" ends up giving a different result from the fstat than from the initial stat. But the change in 2.6.38 resulted in problems for some people, so we're going back to old behavior. Maybe we can re-visit this discussion at some future date - Linus ] Reported-by: Leonardo Chiquitto <leonardo.lists@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Ian Kent <raven@themaw.net> Cc: David Howells <dhowells@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-09 15:42:34 -07:00
Linus Torvalds	54d6d53744	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 and git://git.infradead.org/ubi-2.6 * branch 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: not build debug messages with CONFIG_UBIFS_FS_DEBUG disabled * branch 'linux-next' of git://git.infradead.org/ubi-2.6: UBI: do not link debug messages when debugging is disabled	2011-09-07 09:51:43 -07:00
Jim Garlick	51b8b4fb32	fs/9p: Use protocol-defined value for lock/getlock 'type' field. Signed-off-by: Jim Garlick <garlick@llnl.gov> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2011-09-06 08:17:16 -05:00
Aneesh Kumar K.V	73f507171c	fs/9p: Always ask new inode in lookup for cache mode disabled This make sure we don't end up reusing the unlinked inode object. The ideal way is to use inode i_generation. But i_generation is not available in userspace always. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2011-09-06 08:17:15 -05:00
Aneesh Kumar K.V	f88657ce3f	fs/9p: Add OS dependent open flags in 9p protocol Some of the flags are OS/arch dependent we add a 9p protocol value which maps to asm-generic/fcntl.h values in Linux Based on the original patch from Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2011-09-06 08:17:15 -05:00
Aneesh Kumar K.V	45089142b1	fs/9p: Don't update file type when updating file attributes We should only update attributes that we can change on stat2inode. Also do file type initialization in v9fs_init_inode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-09-06 08:17:14 -05:00
Aneesh Kumar K.V	5441ae5eb3	fs/9p: Add fid before dentry instantiation d_instantiate marks the dentry positive. So a parallel lookup and mkdir of the directory can find dentry that doesn't have fid attached. This can result in both the code path doing v9fs_fid_add which results in v9fs_dentry leak. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-09-06 08:17:14 -05:00
Linus Torvalds	4d7b5a116f	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix ->write_inode return values xfs: fix xfs_mark_inode_dirty during umount xfs: deprecate the nodelaylog mount option	2011-09-02 08:25:23 -07:00
Christoph Hellwig	58d84c4ee0	xfs: fix ->write_inode return values Currently we always redirty an inode that was attempted to be written out synchronously but has been cleaned by an AIL pushed internall, which is rather bogus. Fix that by doing the i_update_core check early on and return 0 for it. Also include async calls for it, as doing any work for those is just as pointless. While we're at it also fix the sign for the EIO return in case of a filesystem shutdown, and fix the completely non-sensical locking around xfs_log_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> (cherry picked from commit 297db93bb74cf687510313eb235a7aec14d67e97) Signed-off-by: Alex Elder <aelder@sgi.com>	2011-09-01 09:46:11 -05:00
Christoph Hellwig	866e4ed774	xfs: fix xfs_mark_inode_dirty during umount During umount we do not add a dirty inode to the lru and wait for it to become clean first, but force writeback of data and metadata with I_WILL_FREE set. Currently there is no way for XFS to detect that the inode has been redirtied for metadata operations, as we skip the mark_inode_dirty call during teardown. Fix this by setting i_update_core nanually in that case, so that the inode gets flushed during inode reclaim. Alternatively we could enable calling mark_inode_dirty for inodes in I_WILL_FREE state, and let the VFS dirty tracking handle this. I decided against this as we will get better I/O patterns from reclaim compared to the synchronous writeout in write_inode_now, and always marking the inode dirty in some way from xfs_mark_inode_dirty is a better safetly net in either case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> (cherry picked from commit da6742a5a4cc844a9982fdd936ddb537c0747856) Signed-off-by: Alex Elder <aelder@sgi.com>	2011-08-31 17:59:39 -05:00
Linus Torvalds	b79c4f75e4	Merge tag 'for_linus-20110831' of git://github.com/tytso/ext4 * tag 'for_linus-20110831' of git://github.com/tytso/ext4: ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining	2011-08-31 15:08:19 -07:00
Jiaying Zhang	8c0bec2151	ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining The i_mutex lock and flush_completed_IO() added by commit `2581fdc810` in ext4_evict_inode() causes lockdep complaining about potential deadlock in several places. In most/all of these LOCKDEP complaints it looks like it's a false positive, since many of the potential circular locking cases can't take place by the time the ext4_evict_inode() is called; but since at the very least it may mask real problems, we need to address this. This change removes the flush_completed_IO() and i_mutex lock in ext4_evict_inode(). Instead, we take a different approach to resolve the software lockup that commit `2581fdc810` intends to fix. Rather than having ext4-dio-unwritten thread wait for grabing the i_mutex lock of an inode, we use mutex_trylock() instead, and simply requeue the work item if we fail to grab the inode's i_mutex lock. This should speed up work queue processing in general and also prevents the following deadlock scenario: During page fault, shrink_icache_memory is called that in turn evicts another inode B. Inode B has some pending io_end work so it calls ext4_ioend_wait() that waits for inode B's i_ioend_count to become zero. However, inode B's ioend work was queued behind some of inode A's ioend work on the same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten thread on that cpu is processing inode A's ioend work, it tries to grab inode A's i_mutex lock. Since the i_mutex lock of inode A is still hold before the page fault happened, we enter a deadlock. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-08-31 11:50:51 -04:00
NeilBrown	f5b9409973	All Arch: remove linkage for sys_nfsservctl system call The nfsservctl system call is now gone, so we should remove all linkage for it. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-26 15:09:58 -07:00
Josh Boyer	e096d0c7e2	lockdep: Add helper function for dir vs file i_mutex annotation Purely in-memory filesystems do not use the inode hash as the dcache tells us if an entry already exists. As a result, they do not call unlock_new_inode, and thus directory inodes do not get put into a different lockdep class for i_sem. We need the different lockdep classes, because the locking order for i_mutex is different for directory inodes and regular inodes. Directory inodes can do "readdir()", which takes i_mutex before possibly taking mm->mmap_sem (due to a page fault while copying the directory entry to user space). In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem before accessing i_mutex. The two cases can never happen for the same inode, so no real deadlock can occur, but without the different lockdep classes, lockdep cannot understand that. As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this can lead to false positives from lockdep like below: find/645 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac but task is already holding lock: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>] vfs_readdir+0x5b/0xb4 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}: [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361 [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45 [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110 [<ffffffff81111557>] mmap_region+0x258/0x432 [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306 [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a [<ffffffff8100c858>] sys_mmap+0x22/0x24 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b -> #0 (&mm->mmap_sem){++++++}: [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7 [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff81109541>] might_fault+0x89/0xac [<ffffffff81149cff>] filldir+0x6f/0xc7 [<ffffffff811586ea>] dcache_readdir+0x67/0x205 [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4 [<ffffffff8114a073>] sys_getdents+0x7e/0xd1 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b This patch moves the directory vs file lockdep annotation into a helper function that can be called by in-memory filesystems and has hugetlbfs call it. Signed-off-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-25 10:50:18 -07:00
Christoph Hellwig	242d621964	xfs: deprecate the nodelaylog mount option Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-08-25 10:30:05 -05:00
Trond Myklebust	042b60beb4	NFSv4: renewd needs to be able to handle the NFS4ERR_CB_PATH_DOWN error The NFSv4 spec does not specify that the server must repeat that error, so in order to avoid having the delegations revoked, we should handle it immediately. Also note that NFS4ERR_CB_PATH_DOWN does in fact renew the lease... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-08-24 15:07:37 -04:00
Trond Myklebust	2f60ea6b8c	NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation RFC3530 states that if the client holds a delegation, then it is obliged to continue to send RENEW calls once every lease period in order to allow the server to return NFS4ERR_CB_PATH_DOWN if the callback path is unreachable. This is not required for NFSv4.1, since the server can at any time set the SEQ4_STATUS_CB_PATH_DOWN_SESSION in any SEQUENCE operation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-08-24 15:07:37 -04:00
Trond Myklebust	8534d4ec05	NFSv4: nfs4_proc_renew should be declared static Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-08-24 15:07:37 -04:00
Trond Myklebust	b569ad3492	NFSv4: nfs4_proc_async_renew should use a GFP_NOFS allocation We shouldn't allow the renew daemon to do direct reclaim on the NFS partition. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-08-24 15:07:35 -04:00
Linus Torvalds	051732bcbe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message fuse: mark pages accessed when written to fuse: delete dead .write_begin and .write_end aops fuse: fix flock fuse: fix non-ANSI void function notation	2011-08-24 09:14:42 -07:00
Miklos Szeredi	c2183d1e9b	fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message FUSE_NOTIFY_INVAL_ENTRY didn't check the length of the write so the message processing could overrun and result in a "kernel BUG at fs/fuse/dev.c:629!" Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org	2011-08-24 10:20:17 +02:00
Linus Torvalds	35a177a08d	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix tracing builds inside the source tree xfs: remove subdirectories xfs: don't expect xfs headers to be in subdirectories	2011-08-23 11:41:44 -07:00
Christoph Hellwig	65299a3b78	block: separate priority boosting from REQ_META Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule, and lave REQ_META purely for marking requests as metadata in blktrace. All existing callers of REQ_META except for XFS are updated to also set REQ_PRIO for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-23 14:50:29 +02:00
Christoph Hellwig	5dc06c5a70	block: remove READ_META and WRITE_META Replace all occurnanced of the undocumented READ_META with READ \| REQ_META and remove the unused WRITE_META define. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-23 14:49:55 +02:00
Christoph Hellwig	b6bede3b4c	xfs: fix tracing builds inside the source tree The code really requires the current source directory to be in the header search path. We already do this if building with an object tree separate from the source, but it needs to be added manually if building inside the source. The cflags addition for it accidentally got removed when collapsing the xfs directory structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-08-22 16:37:24 -05:00
Noah Watkins	259a187ade	ceph: fix memory leak kfree does not clean up indirect allocations in ceph_fs_client and ceph_options (e.g. snapdir_name). Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-08-22 13:06:59 -07:00
Josef Bacik	6719db6a23	Btrfs: fix 64 bit divide problem This fixes a regression introduced by commit `cdcb725c05` ("Btrfs: check if there is enough space for balancing smarter"). We can't do 64-bit divides on 32-bit architectures. In cases where we need to divide/multiply by 2 we should just left/right shift respectively, and in cases where theres N number of devices use do_div. Also make the counters u64 to match up with rw_devices. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Acked-and-tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-21 07:02:00 -07:00
Linus Torvalds	c063d8a60f	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: flush any pending end_io requests before DIO reads w/dioread_nolock ext4: fix nomblk_io_submit option so it correctly converts uninit blocks ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode ext4: Fix ext4_should_writeback_data() for no-journal mode	2011-08-21 06:59:41 -07:00
Jiaying Zhang	dccaf33fa3	ext4: flush any pending end_io requests before DIO reads w/dioread_nolock There is a race between ext4 buffer write and direct_IO read with dioread_nolock mount option enabled. The problem is that we clear PageWriteback flag during end_io time but will do uninitialized-to-initialized extent conversion later with dioread_nolock. If an O_direct read request comes in during this period, ext4 will return zero instead of the recently written data. This patch checks whether there are any pending uninitialized-to-initialized extent conversion requests before doing O_direct read to close the race. Note that this is just a bandaid fix. The fundamental issue is that we clear PageWriteback flag before we really complete an IO, which is problem-prone. To fix the fundamental issue, we may need to implement an extent tree cache that we can use to look up pending to-be-converted extents. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-08-19 19:13:32 -04:00
Michal Marek	016f1c5440	UBIFS: not build debug messages with CONFIG_UBIFS_FS_DEBUG disabled With $ grep -e UBIFS_FS_DEBUG -e DYNAMIC_DEBUG .config # CONFIG_UBIFS_FS_DEBUG is not set CONFIG_DYNAMIC_DEBUG=y Debug messages are kept in the object files due to the dynamic_pr_debug() macro, even if they are never going to be printed: $ make fs/ubifs/super.o $ strings fs/ubifs/super.o \| grep 'compiled on' compiled on: Aug 11 2011 at 12:21:38 Use plain printk to fix this. Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>	2011-08-19 18:58:58 +03:00
Linus Torvalds	35a21b4299	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets NFSv4.1: Fix the callback 'highest_used_slotid' behaviour pnfs-obj: Fix the comp_index != 0 case pnfs-obj: Bug when we are running out of bio nfs: add missing prefetch.h include	2011-08-18 22:47:13 -07:00
Linus Torvalds	5c80c71b9a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: set i_size properly when fallocating and we already btrfs: unlock on error in btrfs_file_llseek() btrfs: btrfs_permission's RO check shouldn't apply to device nodes Btrfs: truncate pages from clone ioctl target range Btrfs: fix uninitialized sync_pending Btrfs: fix wrong free space information btrfs: memory leak in btrfs_add_inode_defrag() Btrfs: use plain page_address() in header fields setget functions Btrfs: forced readonly when btrfs_drop_snapshot() fails Btrfs: check if there is enough space for balancing smarter Btrfs: fix a bug of balance on full multi-disk partitions Btrfs: fix an oops of log replay Btrfs: detect wether a device supports discard Btrfs: force unplugs when switching from high to regular priority bios	2011-08-18 14:20:00 -07:00
Linus Torvalds	01fa4ba52c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: update cifs version to 1.75 [CIFS] possible memory corruption on mount cifs: demote cERROR in build_path_from_dentry to cFYI	2011-08-18 14:19:36 -07:00
Linus Torvalds	f6a975c50a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6: fat: fat16 support maximum 4GB file/vol size as WinXP or 7. fat: fix utf8 iocharset warning message fat: fix build warning	2011-08-18 14:16:13 -07:00
Steve French	04c05b4a68	update cifs version to 1.75 Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-18 16:55:10 +00:00
Steve French	13589c437d	[CIFS] possible memory corruption on mount CIFS cleanup_volume_info_contents() looks like having a memory corruption problem. When UNCip is set to "&vol->UNC[2]" in cifs_parse_mount_options(), it should not be kfree()-ed in cleanup_volume_info_contents(). Introduced in commit `b946845a9d` Signed-off-by: J.R. Okajima <hooanon05@yahoo.co.jp> Reviewed-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-18 16:53:02 +00:00
Chris Mason	81d86e1b70	Merge branch 'btrfs-3.0' into for-linus	2011-08-18 10:38:03 -04:00
Josef Bacik	f1e490a7eb	Btrfs: set i_size properly when fallocating and we already xfstests exposed a problem with preallocate when it fallocates a range that already has an extent. We don't set the new i_size properly because we see that we already have an extent. This isn't right and we should update i_size if the space already exists. With this patch we now pass xfstests 075. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-18 10:36:39 -04:00
Dan Carpenter	9a4327ca1f	btrfs: unlock on error in btrfs_file_llseek() There were some unlocks on error missing in a recent patch to btrfs_file_llseek(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-18 10:16:05 -04:00
Jeff Mahoney	cb6db4e576	btrfs: btrfs_permission's RO check shouldn't apply to device nodes This patch tightens the read-only access checks in btrfs_permission to match the constraints in inode_permission. Currently, even though the device node itself will be unmodified, read-write access to device nodes is denied to when the device node resides on a read-only subvolume or a is a file that has been marked read-only by the btrfs conversion utility. With this patch applied, the check only affects regular files, directories, and symlinks. It also restructures the code a bit so that we don't duplicate the MAY_WRITE check for both tests. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-18 10:16:03 -04:00
Timo Warns	338d0f0a6f	befs: Validate length of long symbolic links. Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-17 13:31:24 -07:00
Namjae Jeon	710d4403a4	fat: fat16 support maximum 4GB file/vol size as WinXP or 7. FAT16 support maximum 4GB vol/file size with 64KB cluster size. Win NT/XP/7 increased the maximum cluster size to 64KB, and file/vol size increased 4GB also. Although increasing, the file size of linux FAT is still limited at 2GB. I found that it is limited by sb->maxbytes(0x7fffffff) when partition is formatted by FAT16. sb->s_maxbytes in fill_super should be set to 0xffffffff like fat32. Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>	2011-08-17 19:35:00 +09:00
Mihai Moldovan	186b53701c	fat: fix utf8 iocharset warning message The fat_msg function already formats the given message and appends a newline to it - we don't need to do this in the passed message string as well, or will end up with a blank line printed in the kernel log ring buffer. Also change the loglevel from error to warning. Signed-off-by: Mihai Moldovan <ionic@ionic.de> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>	2011-08-17 19:34:59 +09:00
Jonas Aberg	8c320c079c	fat: fix build warning This fixes a compile warning (unititialized variable) in the fat filesystem code. Signed-off-by: Jonas Aberg <jonas.aberg@stericsson.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>	2011-08-17 19:34:58 +09:00
Sage Weil	f81c9cdc56	Btrfs: truncate pages from clone ioctl target range We need to truncate page cache pages for the clone ioctl target range or else we'll confuse ourselves to no end. If the old data was cached, we used to still see it (until remount). If the page was partially updated we used to get a mix of old and new data. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:31 -04:00
Miao Xie	0e58885961	Btrfs: fix uninitialized sync_pending sync_pending is uninitialized before it be used, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:31 -04:00
Miao Xie	bb3ac5a4df	Btrfs: fix wrong free space information Btrfs subtracted the size of the allocated space twice when it allocated the space from the bitmap in the cluster, it broke the free space information and led to oops finally. And this patch also fixes the bug that ctl->free_space was subtracted without lock. Reported-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:31 -04:00
Dan Carpenter	f4ac904c41	btrfs: memory leak in btrfs_add_inode_defrag() We don't use the defrag struct on this path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:15 -04:00
Li Zefan	c97c2916e2	Btrfs: use plain page_address() in header fields setget functions We've stopped using highmem for extent buffers. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:15 -04:00
Tsutomu Itoh	cb1b69f450	Btrfs: forced readonly when btrfs_drop_snapshot() fails The filesystem turns readonly instead of returning the error to the caller when detected error in btrfs_drop_snapshot(). and, because the caller doesn't check the error, the function type is changed to 'void'. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:15 -04:00
liubo	cdcb725c05	Btrfs: check if there is enough space for balancing smarter When checking if there is enough space for balancing a block group, since we do not take raid types into consideration, we do not account corrent amounts of space that we needed. This makes us do some extra work before we get ENOSPC. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:15 -04:00
liubo	38c01b9605	Btrfs: fix a bug of balance on full multi-disk partitions When balancing, we'll first try to shrink devices for some space, but if it is working on a full multi-disk partition with raid protection, we may encounter a bug, that is, while shrinking, total_bytes may be less than bytes_used, and btrfs may allocate a dev extent that accesses out of device's bounds. Then we will not be able to write or read the data which stores at the end of the device, and get the followings: device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5 Btrfs detected SSD devices, enabling SSD mode btrfs: relocating block group 476315648 flags 9 btrfs: found 4 extents attempt to access beyond end of device sdb5: rw=145, want=546176, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546304, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546432, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546560, limit=546147 attempt to access beyond end of device Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:15 -04:00
liubo	34f3e4f23c	Btrfs: fix an oops of log replay When btrfs recovers from a crash, it may hit the oops below: ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:4580! [...] RIP: 0010:[<ffffffffa03df251>] [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs] [...] Call Trace: [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs] [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs] [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs] [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs] [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs] [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs] [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs] [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs] [...] This comes from that while replaying an inode ref item, we forget to check those old conflicting DIR_ITEM and DIR_INDEX items in fs/file tree, then we will come to conflict corners which lead to BUG_ON(). Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Tested-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:15 -04:00
Josef Bacik	d5e2003c2b	Btrfs: detect wether a device supports discard We have a problem where if a user specifies discard but doesn't actually support it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem because this gets called (in a fashion) from the tree log recovery code, which has a nice little BUG_ON(ret) after it, which causes us to fail the tree log replay. So instead detect wether our devices support discard when we're adding them and then don't issue discards if we know that the device doesn't support it. And just for good measure set ret = 0 in btrfs_issue_discard just in case we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:15 -04:00
Jeff Layton	fa71f44706	cifs: demote cERROR in build_path_from_dentry to cFYI Running the cthon tests on a recent kernel caused this message to pop occasionally: CIFS VFS: did not end path lookup where expected namelen is 0 Some added debugging showed that namelen and dfsplen were both 0 when this occurred. That means that the read_seqretry returned true. Assuming that the comment inside the if statement is true, this should be harmless and just means that we raced with a rename. If that is the case, then there's no need for alarm and we can demote this to cFYI. While we're at it, print the dfsplen too so that we can see what happened here if the message pops during debugging. Cc: stable@kernel.org Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-16 13:07:24 +00:00
Sage Weil	795858dbd2	ceph: fix encoding of ino only (not relative) paths A 'path' consists of a starting ino and relative component. Encode even when there is no relative component. This is primarily needed by the NFS reexport code. Signed-off-by: Sage Weil <sage@newdream.net>	2011-08-15 13:03:56 -07:00
Linus Torvalds	a0b3447fb1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: flush journal completely before releasing metadata inodes	2011-08-15 08:40:24 -07:00
Linus Torvalds	aa2b1cf5ca	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: Do not set cifs/ntfs acl using a file handle (try #4) [CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines more readable	2011-08-15 08:36:30 -07:00
Theodore Ts'o	9dd75f1f1a	ext4: fix nomblk_io_submit option so it correctly converts uninit blocks Bug discovered by Jan Kara: Finally, commit `1449032be1` returned back the old IO submission code but apparently it forgot to return the old handling of uninitialized buffers so we unconditionnaly call block_write_full_page() without specifying end_io function. So AFAICS we never convert unwritten extents to written in some cases. For example when I mount the fs as: mount -t ext4 -o nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do int fd = open(argv[1], O_RDWR \| O_CREAT \| O_TRUNC, 0600); char buf[1024]; memset(buf, 'a', sizeof(buf)); fallocate(fd, 0, 0, 16384); write(fd, buf, sizeof(buf)); I get a file full of zeros (after remounting the filesystem so that pagecache is dropped) instead of seeing the first KB contain 'a's. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-08-13 12:58:21 -04:00
Tao Ma	32c80b32c0	ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten should be done simultaneously since ext4_end_io_nolock always clear the flag and decrease the counter in the same time. We don't increase i_aiodio_unwritten when setting EXT4_IO_END_UNWRITTEN so it will go nagative and causes some process to wait forever. Part of the patch came from Eric in his e-mail, but it doesn't fix the problem met by Michael actually. http://marc.info/?l=linux-ext4&m=131316851417460&w=2 Reported-and-Tested-by: Michael Tokarev<mjt@tls.msk.ru> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-08-13 12:30:59 -04:00
Jiaying Zhang	2581fdc810	ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode Flush inode's i_completed_io_list before calling ext4_io_wait to prevent the following deadlock scenario: A page fault happens while some process is writing inode A. During page fault, shrink_icache_memory is called that in turn evicts another inode B. Inode B has some pending io_end work so it calls ext4_ioend_wait() that waits for inode B's i_ioend_count to become zero. However, inode B's ioend work was queued behind some of inode A's ioend work on the same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten thread on that cpu is processing inode A's ioend work, it tries to grab inode A's i_mutex lock. Since the i_mutex lock of inode A is still hold before the page fault happened, we enter a deadlock. Also moves ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode() to ext4_evict_inode(). During inode deleteion, ext4_evict_inode() is called before ext4_destroy_inode() and in ext4_evict_inode(), we may call ext4_truncate() without holding i_mutex lock. As a result, there is a race between flush_completed_IO that is called from ext4_ext_truncate() and ext4_end_io_work, which may cause corruption on an io_end structure. This change moves ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode() to ext4_evict_inode() to resolve the race between ext4_truncate() and ext4_end_io_work during inode deletion. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-08-13 12:17:13 -04:00
Curt Wohlgemuth	441c850857	ext4: Fix ext4_should_writeback_data() for no-journal mode ext4_should_writeback_data() had an incorrect sequence of tests to determine if it should return 0 or 1: in particular, even in no-journal mode, 0 was being returned for a non-regular-file inode. This meant that, in non-journal mode, we would use ext4_journalled_aops for directories, symlinks, and other non-regular files. However, calling journalled aop callbacks when there is no valid handle, can cause problems. This would cause a kernel crash with Jan Kara's commit `2d859db3e4` ("ext4: fix data corruption in inodes with journalled data"), because we now dereference 'handle' in ext4_journalled_write_end(). I also added BUG_ONs to check for a valid handle in the obviously journal-only aops callbacks. I tested this running xfstests with a scratch device in these modes: - no-journal - data=ordered - data=writeback - data=journal All work fine; the data=journal run has many failures and a crash in xfstests 074, but this is no different from a vanilla kernel. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-08-13 11:25:18 -04:00
Linus Torvalds	e68ff9cd15	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: replace xfs_buf_geterror() with bp->b_error xfs: Check the return value of xfs_buf_read() for NULL "xfs: fix error handling for synchronous writes" revisited xfs: set cursor in xfs_ail_splice() even when AIL was empty xfs: Remove the macro XFS_BUFTARG_NAME xfs: Remove the macro XFS_BUF_TARGET xfs: Remove the macro XFS_BUF_SET_TARGET Replace the macro XFS_BUF_ISPINNED with helper xfs_buf_ispinned xfs: Remove the macro XFS_BUF_SET_PTR xfs: Remove the macro XFS_BUF_PTR xfs: Remove macro XFS_BUF_SET_START xfs: Remove macro XFS_BUF_HOLD xfs: Remove macro XFS_BUF_BUSY and family xfs: Remove the macro XFS_BUF_ERROR and family xfs: Remove the macro XFS_BUF_BFLAGS	2011-08-12 20:43:01 -07:00
Christoph Hellwig	c59d87c460	xfs: remove subdirectories Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the annoying subdirectories in the XFS source code. Besides the large amount of file rename the only changes are to the Makefile, a few files including headers with the subdirectory prefix, and the binary sysctl compat code that includes a header under fs/xfs/ from kernel/. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-08-12 16:21:35 -05:00
Alex Elder	06f8e2d675	xfs: don't expect xfs headers to be in subdirectories Fix up some #include directives in preparation for moving a few header files out of xfs source subdirectories. Note that "xfs_linux.h" also got its quoting convention for included files switched. Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-08-12 13:57:55 -05:00
Chandra Seetharaman	e570280521	xfs: replace xfs_buf_geterror() with bp->b_error Since we just checked bp for NULL, it is ok to replace xfs_buf_geterror() with bp->b_error in these places. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-08-12 13:39:40 -05:00
Chandra Seetharaman	ac4d6888b2	xfs: Check the return value of xfs_buf_read() for NULL Check the return value of xfs_buf_read() for NULL and return ENOMEM if it is NULL. This is necessary in a few spots to avoid subsequent code blindly dereferencing the null buffer pointer. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-08-12 13:39:29 -05:00
Linus Torvalds	ce8a84ef1e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) e1000e: increase driver version number e1000e: alternate MAC address update e1000e: do not disable receiver on 82574/82583 e1000e: alternate MAC address does not work on device id 0x1060 PCnet: Fix section mismatch bnx2x: disable dcb on 578xx since not supported yet bnx2x: properly clean indirect addresses bnx2x: prevent race between undi_unload and load flows bnx2x: fix select_queue when FCoE is disabled bnx2x: init FCOE FP only once ipv4: some rt_iif -> rt_route_iif conversions net/bridge/netfilter/ebtables.c: use available error handling code net/netlabel/netlabel_kapi.c: add missing cleanup code net/irda: sh_sir: tidyup compile warning net/irda: sh_sir: add missing header net/irda: sh_irda: add missing header slcan: ldisc generated skbs are received in softirq context scm: Capture the full credentials of the scm sender tcp: initialize variable ecn_ok in syncookies path drivers/net/wireless/wl1251: add missing kfree ...	2011-08-12 06:43:53 -07:00
Boaz Harrosh	8cf1fb2163	pnfs: Automatically select blocks & objects layouts Just like files-layout, blocks & objects layouts are part of the NFS 4.1 protocol and should be automatically selected if NFS_4_1 is selected. The small problem is that these depend on other Kernel support being present, while files only depends on NFS itself. This patch removes from the user choice the presence of objects and blocks layout. But makes sure these are selected only if the depended subsystems are present in the Kernel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-11 17:51:27 -07:00
Eric Sandeen	8c20871998	ext4: Properly count journal credits for long symlinks Commit `df5e622340` ("ext4: fix deadlock in ext4_symlink() in ENOSPC conditions") recalculated the number of credits needed for a long symlink, in the process of splitting it into two transactions. However, the first credit calculation under-counted because if selinux is enabled, credits are needed to create the selinux xattr as well. Overrunning the reservation will result in an OOPS in jbd2_journal_dirty_metadata() due to this assert: J_ASSERT_JH(jh, handle->h_buffer_credits > 0); Fix this by increasing the reservation size. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-11 17:23:40 -07:00
Eric Sandeen	d2db60df1e	ext3: Properly count journal credits for long symlinks Commit `ae54870a1d` ("ext3: Fix lock inversion in ext3_symlink()") recalculated the number of credits needed for a long symlink, in the process of splitting it into two transactions. However, the first credit calculation under-counted because if selinux is enabled, credits are needed to create the selinux xattr as well. Overrunning the reservation will result in an OOPS in journal_dirty_metadata() due to this assert: J_ASSERT_JH(jh, handle->h_buffer_credits > 0); Fix this by increasing the reservation size. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-11 17:23:40 -07:00
Vasiliy Kulikov	72fa59970f	move RLIMIT_NPROC check from set_user() to do_execve_common() The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC check in set_user() to check for NPROC exceeding via setuid() and similar functions. Before the check there was a possibility to greatly exceed the allowed number of processes by an unprivileged user if the program relied on rlimit only. But the check created new security threat: many poorly written programs simply don't check setuid() return code and believe it cannot fail if executed with root privileges. So, the check is removed in this patch because of too often privilege escalations related to buggy programs. The NPROC can still be enforced in the common code flow of daemons spawning user processes. Most of daemons do fork()+setuid()+execve(). The check introduced in execve() (1) enforces the same limit as in setuid() and (2) doesn't create similar security issues. Neil Brown suggested to track what specific process has exceeded the limit by setting PF_NPROC_EXCEEDED process flag. With the change only this process would fail on execve(), and other processes' execve() behaviour is not changed. Solar Designer suggested to re-check whether NPROC limit is still exceeded at the moment of execve(). If the process was sleeping for days between set*uid() and execve(), and the NPROC counter step down under the limit, the defered execve() failure because NPROC limit was exceeded days ago would be unexpected. If the limit is not exceeded anymore, we clear the flag on successful calls to execve() and fork(). The flag is also cleared on successful calls to set_user() as the limit was exceeded for the previous user, not the current one. Similar check was introduced in -ow patches (without the process flag). v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user(). Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-11 11:24:42 -07:00
Shirish Pargaonkar	e22906c564	cifs: Do not set cifs/ntfs acl using a file handle (try #4 ) Set security descriptor using path name instead of a file handle. We can't be sure that the file handle has adequate permission to set a security descriptor (to modify DACL). Function set_cifs_acl_by_fid() has been removed since we can't be sure how a file was opened for writing, a valid request can fail if the file was not opened with two above mentioned permissions. We could have opted to add on WRITE_DAC and WRITE_OWNER permissions to file opens and then use that file handle but adding addtional permissions such as WRITE_DAC and WRITE_OWNER could cause an any open to fail. And it was incorrect to look for read file handle to set a security descriptor anyway. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-11 18:23:45 +00:00
Steve French	789e666123	[CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines more readable Christoph had requested that the stats related code (in CONFIG_CIFS_STATS2) be moved into helpers to make code flow more readable. This patch should help. For example the following section from transport.c spin_unlock(&GlobalMid_Lock); atomic_inc(&ses->server->num_waiters); wait_event(ses->server->request_q, atomic_read(&ses->server->inFlight) < cifs_max_pending); atomic_dec(&ses->server->num_waiters); spin_lock(&GlobalMid_Lock); becomes simpler (with the patch below): spin_unlock(&GlobalMid_Lock); cifs_num_waiters_inc(server); wait_event(server->request_q, atomic_read(&server->inFlight) < cifs_max_pending); cifs_num_waiters_dec(server); spin_lock(&GlobalMid_Lock); Reviewed-by: Jeff Layton <jlayton@redhat.com> CC: Christoph Hellwig <hch@infradead.org> Signed-off-by: Steve French <sfrench@us.ibm.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>	2011-08-11 18:23:45 +00:00
Peng Tao	54a33b190a	NFS41: make PNFS_BLOCK selectable PNFS_BLOCK needs BLK_DEV_DM/MD, which is not a dependency for other pnfs layout drivers. Seperate it out so others can still build when BLK_DEV_DM/MD is not enabled. Also change select to depends on to avoid build failures. Reported-and-tested-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Peng Tao <peng_tao@emc.com> Acked-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-11 08:58:02 -07:00
Ajeet Yadav	9e978d8f7d	"xfs: fix error handling for synchronous writes" revisited xfs: fix for hang during synchronous buffer write error If removed storage while synchronous buffer write underway, "xfslogd" hangs. Detailed log http://oss.sgi.com/archives/xfs/2011-07/msg00740.html Related work `bfc60177f8` "xfs: fix error handling for synchronous writes" Given that xfs_bwrite actually does the shutdown already after waiting for the b_iodone completion and given that we actually found that calling xfs_force_shutdown from inside xfs_buf_iodone_callbacks was a major contributor the problem it better to drop this call. Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-08-10 17:00:21 -05:00
Linus Torvalds	d55140ce3a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: Ecryptfs: Add mount option to check uid of device being mounted = expect uid eCryptfs: Fix payload_len unitialized variable warning eCryptfs: fix compile error eCryptfs: Return error when lower file pointer is NULL	2011-08-10 11:08:06 -07:00
John Johansen	764355487e	Ecryptfs: Add mount option to check uid of device being mounted = expect uid Close a TOCTOU race for mounts done via ecryptfs-mount-private. The mount source (device) can be raced when the ownership test is done in userspace. Provide Ecryptfs a means to force the uid check at mount time. Signed-off-by: John Johansen <john.johansen@canonical.com> Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-08-09 23:29:01 -05:00
Alex Elder	e44f4112a4	xfs: set cursor in xfs_ail_splice() even when AIL was empty In xfs_ail_splice(), if a cursor is provided it is updated to point to the last item on the list being spliced into the AIL. But if the AIL was found to be empty, the cursor (if provided) is just initialized instead. There is no reason the empty AIL case needs to be treated any differently. And treating it the same way allows this code to be rearranged a bit, with a somewhat tidier result. Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-08-09 15:30:43 -05:00
Tyler Hicks	99b373ff2d	eCryptfs: Fix payload_len unitialized variable warning fs/ecryptfs/keystore.c: In function ‘ecryptfs_generate_key_packet_set’: fs/ecryptfs/keystore.c:1991:28: warning: ‘payload_len’ may be used uninitialized in this function [-Wuninitialized] fs/ecryptfs/keystore.c:1976:9: note: ‘payload_len’ was declared here Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-08-09 13:42:46 -05:00
Roberto Sassu	4b6fee17b1	eCryptfs: fix compile error This patch fixes the compile error reported at the address: https://bugzilla.kernel.org/show_bug.cgi?id=40292 The problem arises when compiling eCryptfs as built-in and the 'encrypted' key type as a module. The patch prevents this combination from being set in the kernel configuration, by fixing the eCryptfs dependencies. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Reported-by: David Hill <hilld@binarystorm.net> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-08-09 13:42:46 -05:00
Tyler Hicks	f61500e000	eCryptfs: Return error when lower file pointer is NULL When an eCryptfs inode's lower file has been closed, and the pointer has been set to NULL, return an error when trying to do a lower read or write rather than calling BUG(). https://bugzilla.kernel.org/show_bug.cgi?id=37292 Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: <stable@kernel.org>	2011-08-09 13:42:45 -05:00
Linus Torvalds	2f84dd7091	autofs4: fix debug printk warning uncovered by cleanup The previous comit made the autofs4 debug printouts check types against the printout format, and uncovered this bug: fs/autofs4/waitq.c:106:2: warning: format ‘%08lx’ expects type ‘long unsigned int’, but argument 4 has type ‘autofs_wqt_t’ which is due to the insane type for wait_queue_token. That thing should be some fixed well-defined size (preferably just 'unsigned int' or 'u32') but for unexplained reasons it is randomly either 'unsigned long' or 'unsigned int' depending on the architecture. For now, cast it to 'unsigned long' for printing, the way we do elsewhere. Somebody else can try to explain the typedef mess. (There's a reason we don't support excessive use of typedefs in the kernel: it's usually just a good way of confusing yourself). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-08 12:02:43 -07:00
Linus Torvalds	c3ad996246	autofs4: clean up uaotfs use of debug/info/warning printouts Use 'pr_debug()' for DPRINTK, which will do the proper type checking on the arguments (without generating code) even when DEBUG isn't #defined. Also, use the standard __VA_ARGS__ for the macros, and stop the pointless abuse of 'do { xyz } while (0)' when the macro is already a perfectly well-formed single statement. Reported-by: David Howells <dhowells@redhat.com> Suggested-by: Joe Perches <joe@perches.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-08 11:35:17 -07:00
Johannes Weiner	478e0841b3	fuse: mark pages accessed when written to As fuse does not use the page cache library functions when userspace writes to a file, it did not benefit from 'c8236db mm: mark page accessed before we write_end()' that made sure pages are properly marked accessed when written to. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-08-08 16:08:08 +02:00
Johannes Weiner	b40cdd56df	fuse: delete dead .write_begin and .write_end aops Ever since 'ea9b990 fuse: implement perform_write', the .write_begin and .write_end aops have been dead code. Their task - acquiring a page from the page cache, sending out a write request and releasing the page again - is now done batch-wise to maximize the number of pages send per userspace request. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-08-08 16:08:08 +02:00
Miklos Szeredi	37fb3a30b4	fuse: fix flock Commit `a9ff4f87` "fuse: support BSD locking semantics" overlooked a number of issues with supporing flock locks over existing POSIX locking infrastructure: - it's not backward compatible, passing flock(2) calls to userspace unconditionally (if userspace sets FUSE_POSIX_LOCKS) - it doesn't cater for the fact that flock locks are automatically unlocked on file release - it doesn't take into account the fact that flock exclusive locks (write locks) don't need an fd opened for write. The last one invalidates the original premise of the patch that flock locks can be emulated with POSIX locks. This patch fixes the first two issues. The last one needs to be fixed in userspace if the filesystem assumed that a write lock will happen only on a file operned for write (as in the case of the current fuse library). Reported-by: Sebastian Pipping <webmaster@hartwork.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-08-08 16:08:08 +02:00
Alex Elder	2ddb4e9406	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux	2011-08-08 07:06:24 -05:00
Florian Westphal	8bab6f1408	compat_ioctl: add compat handler for PPPIOCGL2TPSTATS fixes following error seen on x86_64 kernel: ioctl32(openl2tpd:7480): Unknown cmd fd(14) cmd(80487436){t:'t';sz:72} arg(ffa7e6c0) on socket:[105094] The argument (struct pppol2tp_ioc_stats) uses "aligned_u64" and thus doesn't need fixups. Cc: James Chapman <jchapman@katalix.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-07 22:24:41 -07:00
Linus Torvalds	7813b94a54	vfs: rename 'do_follow_link' to 'should_follow_link' Al points out that the do_follow_link() helper function really is misnamed - it's about whether we should try to follow a symlink or not, not about actually doing the following. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-07 13:42:25 -07:00
Ari Savolainen	206b1d09a5	Fix POSIX ACL permission check After commit `3567866bf2`: "RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached" posix_acl_permission is being called with an unsupported flag and the permission check fails. This patch fixes the issue. Signed-off-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-07 04:52:23 -04:00
Linus Torvalds	c2f340a69c	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd * 'for-linus' of git://git.open-osd.org/linux-open-osd: ore: Make ore its own module exofs: Rename raid engine from exofs/ios.c => ore exofs: ios: Move to a per inode components & device-table exofs: Move exofs specific osd operations out of ios.c exofs: Add offset/length to exofs_get_io_state exofs: Fix truncate for the raid-groups case exofs: Small cleanup of exofs_fill_super exofs: BUG: Avoid sbi realloc exofs: Remove pnfs-osd private definitions nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4	2011-08-06 22:56:03 -07:00
Linus Torvalds	3ddcd0569c	vfs: optimize inode cache access patterns The inode structure layout is largely random, and some of the vfs paths really do care. The path lookup in particular is already quite D$ intensive, and profiles show that accessing the 'inode->i_op->xyz' fields is quite costly. We already optimized the dcache to not unnecessarily load the d_op structure for members that are often NULL using the DCACHE_OP_xyz bits in dentry->d_flags, and this does something very similar for the inode ops that are used during pathname lookup. It also re-orders the fields so that the fields accessed by 'stat' are together at the beginning of the inode structure, and roughly in the order accessed. The effect of this seems to be in the 1-2% range for an empty kernel "make -j" run (which is fairly kernel-intensive, mostly in filename lookup), so it's visible. The numbers are fairly noisy, though, and likely depend a lot on exact microarchitecture. So there's more tuning to be done. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-06 22:53:23 -07:00
Linus Torvalds	830c0f0edc	vfs: renumber DCACHE_xyz flags, remove some stale ones Gcc tends to generate better code with small integers, including the DCACHE_xyz flag tests - so move the common ones to be first in the list. Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and DCACHE_AUTOFS_PENDING values, their users no longer exists in the source tree. And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the common case to be a nice straight-line fall-through. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-06 22:52:40 -07:00
Boaz Harrosh	cf283ade08	ore: Make ore its own module Export everything from ore need exporting. Change Kbuild and Kconfig to build ore.ko as an independent module. Import ore from exofs Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-08-06 19:36:19 -07:00
Boaz Harrosh	8ff660ab85	exofs: Rename raid engine from exofs/ios.c => ore ORE stands for "Objects Raid Engine" This patch is a mechanical rename of everything that was in ios.c and its API declaration to an ore.c and an osd_ore.h header. The ore engine will later be used by the pnfs objects layout driver. * File ios.c => ore.c * Declaration of types and API are moved from exofs.h to a new osd_ore.h * All used types are prefixed by ore_ from their exofs_ name. * Shift includes from exofs.h to osd_ore.h so osd_ore.h is independent, include it from exofs.h. Other than a pure rename there are no other changes. Next patch will move the ore into it's own module and will export the API to be used by exofs and later the layout driver Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-08-06 19:36:18 -07:00
Boaz Harrosh	9e9db45649	exofs: ios: Move to a per inode components & device-table Exofs raid engine was saving on memory space by having a single layout-info, single pid, and a single device-table, global to the filesystem. Then passing a credential and object_id info at the io_state level, private for each inode. It would also devise this contraption of rotating the device table view for each inode->ino to spread out the device usage. This is not compatible with the pnfs-objects standard, demanding that each inode can have it's own layout-info, device-table, and each object component it's own pid, oid and creds. So: Bring exofs raid engine to be usable for generic pnfs-objects use by: * Define an exofs_comp structure that holds obj_id and credential info. * Break up exofs_layout struct to an exofs_components structure that holds a possible array of exofs_comp and the array of devices + the size of the arrays. * Add a "comps" parameter to get_io_state() that specifies the ids creds and device array to use for each IO. This enables to keep the layout global, but the device-table view, creds and IDs at the inode level. It only adds two 64bit to each inode, since some of these members already existed in another form. * ios raid engine now access layout-info and comps-info through the passed pointers. Everything is pre-prepared by caller for generic access of these structures and arrays. At the exofs Level: * Super block holds an exofs_components struct that holds the device array, previously in layout. The devices there are in device-table order. The device-array is twice bigger and repeats the device-table twice so now each inode's device array can point to a random device and have a round-robin view of the table, making it compatible to previous exofs versions. * Each inode has an exofs_components struct that is initialized at load time, with it's own view of the device table IDs and creds. When doing IO this gets passed to the io_state together with the layout. While preforming this change. Bugs where found where credentials with the wrong IDs where used to access the different SB objects (super.c). As well as some dead code. It was never noticed because the target we use does not check the credentials. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-08-06 19:35:32 -07:00
Boaz Harrosh	85e44df474	exofs: Move exofs specific osd operations out of ios.c ios.c will be moving to an external library, for use by the objects-layout-driver. Remove from it some exofs specific functions. Also g_attr_logical_length is used both by inode.c and ios.c move definition to the later, to keep it independent Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-08-06 19:35:31 -07:00
Boaz Harrosh	e1042ba099	exofs: Add offset/length to exofs_get_io_state In future raid code we will need to know the IO offset/length and if it's a read or write to determine some of the array sizes we'll need. So add a new exofs_get_rw_state() API for use when writeing/reading. All other simple cases are left using the old way. The major change to this is that now we need to call exofs_get_io_state later at inode.c::read_exec and inode.c::write_exec when we actually know these things. So this patch is kept separate so I can test things apart from other changes. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-08-06 19:35:31 -07:00
Linus Torvalds	1957e7fdef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: cope with negative dentries in cifs_get_root cifs: convert prefixpath delimiters in cifs_build_path_to_root CIFS: Fix missing a decrement of inFlight value cifs: demote DFS referral lookup errors to cFYI Revert "cifs: advertise the right receive buffer size to the server"	2011-08-06 13:54:36 -07:00
Linus Torvalds	1117f72ea0	vfs: show O_CLOEXE bit properly in /proc/<pid>/fdinfo/<fd> files The CLOEXE bit is magical, and for performance (and semantic) reasons we don't actually maintain it in the file descriptor itself, but in a separate bit array. Which means that when we show f_flags, the CLOEXE status is shown incorrectly: we show the status not as it is now, but as it was when the file was opened. Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit array. Uli needs this in order to re-implement the pfiles program: "For normal file descriptors (not sockets) this was the last piece of information which wasn't available. This is all part of my 'give Solaris users no reason to not switch' effort. I intend to offer the code to the util-linux-ng maintainers." Requested-by: Ulrich Drepper <drepper@akkadia.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-06 11:51:33 -07:00
Linus Torvalds	c21427043d	oom_ajd: don't use WARN_ONCE, just use printk_once WARN_ONCE() is very annoying, in that it shows the stack trace that we don't care about at all, and also triggers various user-level "kernel oopsed" logic that we really don't care about. And it's not like the user can do anything about the applications (sshd) in question, it's a distro issue. Requested-by: Andi Kleen <andi@firstfloor.org> (and many others) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-06 11:43:08 -07:00
Chris Mason	2ab1ba68ae	Btrfs: force unplugs when switching from high to regular priority bios Btrfs does bio submissions from a worker thread, and each device has a list of high priority bios and regular priority bios. Synchronous writes go to the high priority thread while async writes go to regular list. This commit brings back an explicit unplug any time we switch from high to regular priority, which makes it easier for the block layer to give us low latencies. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-05 13:48:18 -04:00
Jeff Layton	80975d21aa	cifs: cope with negative dentries in cifs_get_root The loop around lookup_one_len doesn't handle the case where it might return a negative dentry, which can cause an oops on the next pass through the loop. Check for that and break out of the loop with an error of -ENOENT if there is one. Fixes the panic reported here: https://bugzilla.redhat.com/show_bug.cgi?id=727927 Reported-by: TR Bentley <home@trarbentley.net> Reported-by: Iain Arnell <iarnell@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-05 15:03:09 +00:00
Jeff Layton	f9e8c45002	cifs: convert prefixpath delimiters in cifs_build_path_to_root Regression from 2.6.39... The delimiters in the prefixpath are not being converted based on whether posix paths are in effect. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=727834 Reported-and-Tested-by: Iain Arnell <iarnell@gmail.com> Reported-by: Patrick Oltmann <patrick.oltmann@gmx.net> Cc: Pavel Shilovsky <piastryyy@gmail.com> Cc: stable@kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-05 14:55:15 +00:00
Linus Torvalds	24f0eed266	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached get rid of boilerplate switches in posix_acl.h fix block device fallout from ->fsync() changes	2011-08-04 16:44:40 -10:00
Boaz Harrosh	16f75bb35d	exofs: Fix truncate for the raid-groups case In the general raid-group case the truncate was wrong in that it did not also fix the object length of the neighboring groups. There are two bad cases in the old code: 1. Space that should be freed was not. 2. If a file That was big is truncated small, then made bigger again, the holes would not contain zeros but could expose old data. (If the growing of the file expands to more than a full groups cycle + group size (> S + T)) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-08-04 12:35:25 -07:00
Boaz Harrosh	9ce730475e	exofs: Small cleanup of exofs_fill_super Small cleanup that unifies duplicated code used in both the error and success cases Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-08-04 12:35:23 -07:00
Boaz Harrosh	6d4073e881	exofs: BUG: Avoid sbi realloc Since the beginning we realloced the sbi structure when a bigger then one device table was specified. (I know that was really stupid). Then much later when "register bdi" was added (By Jens) it was registering the pointer to sbi->bdi before the realloc. We never saw this problem because up till now the realloc did not do anything since the device table was small enough to fit in the original allocation. But once we starting testing with large device tables (Bigger then 28) we noticed the crash of writeback operating on a deallocated pointer. * Avoid the all mess by allocating the device-table as a second array and get rid of the variable-sized structure and the rest of this mess. * Take the chance to clean near by structures and comments. * Add a needed dprint on startup to indicate the loaded layout. * Also move the bdi registration to the very end because it will only fail in a low memory, which will probably fail before hand. There are many more likely causes to not load before that. This way the error handling is made simpler. (Just doing this would be enough to fix the BUG) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-08-04 12:35:20 -07:00
Boaz Harrosh	26ae93c2dc	exofs: Remove pnfs-osd private definitions Now that pnfs-osd has hit mainline we can remove exofs's private header. (And the FIXME comment) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2011-08-04 12:35:18 -07:00
Trond Myklebust	910ac68a2b	NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets If the client is in the process of resetting the session when it receives a callback, then returning NFS4ERR_DELAY may cause a deadlock with the DESTROY_SESSION call. Basically, if the client returns NFS4ERR_DELAY in response to the CB_SEQUENCE call, then the server is entitled to believe that the client is busy because it is already processing that call. In that case, the server is perfectly entitled to respond with a NFS4ERR_BACK_CHAN_BUSY to any DESTROY_SESSION call. Fix this by having the client reply with a NFS4ERR_BADSESSION in response to the callback if it is resetting the session. Cc: stable@kernel.org [2.6.38+] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-08-04 11:55:35 -04:00
Trond Myklebust	55a673990e	NFSv4.1: Fix the callback 'highest_used_slotid' behaviour Currently, there is no guarantee that we will call nfs4_cb_take_slot() even though nfs4_callback_compound() will consistently call nfs4_cb_free_slot() provided the cb_process_state has set the 'clp' field. The result is that we can trigger the BUG_ON() upon the next call to nfs4_cb_take_slot(). This patch fixes the above problem by using the slot id that was taken in the CB_SEQUENCE operation as a flag for whether or not we need to call nfs4_cb_free_slot(). It also fixes an atomicity problem: we need to set tbl->highest_used_slotid atomically with the check for NFS4_SESSION_DRAINING, otherwise we end up racing with the various tests in nfs4_begin_drain_session(). Cc: stable@kernel.org [2.6.38+] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-08-04 11:55:35 -04:00
Boaz Harrosh	9af7db3228	pnfs-obj: Fix the comp_index != 0 case There were bugs in the case of partial layout where olo_comp_index is not zero. This used to work and was tested but one of the later cleanup SQUASHMEs broke it and was not tested since. Also add a dprint that specify those received layout parameters. Everything else was already printed. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-08-04 11:54:48 -04:00
Boaz Harrosh	20618b21da	pnfs-obj: Bug when we are running out of bio When we have a situation that the number of pages we want to encode is bigger then the size of the bio. (Which can currently happen only when all IO is going to a single device .e.g group_width==1) then the IO is submitted short and we report back only the amount of bytes we actually wrote/read and all is fine. BUT ... There was a bug that the current length counter was advanced before the fail to add the extra page, and we come to a situation that the CDB length was one-page longer then the actual bio size, which is of course rejected by the osd-target. While here also fix the bio size calculation, in the case that we received more then one group of devices. CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-08-04 11:54:38 -04:00
Heiko Carstens	88c9e42196	nfs: add missing prefetch.h include Fix this compile error on s390: CC [M] fs/nfs/blocklayout/blocklayout.o fs/nfs/blocklayout/blocklayout.c: In function 'bl_end_io_read': fs/nfs/blocklayout/blocklayout.c:201:4: error: implicit declaration of function 'prefetchw' Introduced with `9549ec01` "pnfsblock: bl_read_pagelist". Cc: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-08-04 11:54:25 -04:00
Linus Torvalds	14595f708e	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: use kzalloc in ext4_kzalloc()	2011-08-03 15:09:10 -10:00
Robert P. J. Day	206506ccf0	tmpfs: expand "help" to explain value of TMPFS_POSIX_ACL Expand the fs/Kconfig "help" info to clarify why it's a bad idea to deselect the TMPFS_POSIX_ACL config variable. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-03 14:25:25 -10:00
Hugh Dickins	31475dd611	mm: a few small updates for radix-swap Remove PageSwapBacked (!page_is_file_cache) cases from add_to_page_cache_locked() and add_to_page_cache_lru(): those pages now go through shmem_add_to_page_cache(). Remove a comment on maximum tmpfs size from fsstack_copy_inode_size(), and add a comment on swap entries to invalidate_mapping_pages(). And mincore_page() uses find_get_page() on what might be shmem or a tmpfs file: allow for a radix_tree_exceptional_entry(), and proceed to find_get_page() on swapper_space if so (oh, swapper_space needs #ifdef). Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-03 14:25:24 -10:00
Randy Dunlap	2af1416265	fs/dcache.c: fix new kernel-doc warning Fix new kernel-doc warning in fs/dcache.c: Warning(fs/dcache.c:797): No description found for parameter 'sb' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-03 14:25:21 -10:00
Pavel Shilovsky	0193e07226	CIFS: Fix missing a decrement of inFlight value if we failed on getting mid entry in cifs_call_async. Cc: stable@kernel.org Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-03 19:42:12 +00:00
Mathias Krause	db9481c047	ext4: use kzalloc in ext4_kzalloc() Commit 9933fc0i (ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()) intruduced wrappers around k*alloc/vmalloc but introduced a typo for ext4_kzalloc() by not using kzalloc() but kmalloc(). Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-08-03 14:57:11 -04:00
Linus Torvalds	ed8f37370d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (31 commits) Btrfs: don't call writepages from within write_full_page Btrfs: Remove unused variable 'last_index' in file.c Btrfs: clean up for find_first_extent_bit() Btrfs: clean up for wait_extent_bit() Btrfs: clean up for insert_state() Btrfs: remove unused members from struct extent_state Btrfs: clean up code for merging extent maps Btrfs: clean up code for extent_map lookup Btrfs: clean up search_extent_mapping() Btrfs: remove redundant code for dir item lookup Btrfs: make acl functions really no-op if acl is not enabled Btrfs: remove remaining ref-cache code Btrfs: remove a BUG_ON() in btrfs_commit_transaction() Btrfs: use wait_event() Btrfs: check the nodatasum flag when writing compressed files Btrfs: copy string correctly in INO_LOOKUP ioctl Btrfs: don't print the leaf if we had an error btrfs: make btrfs_set_root_node void Btrfs: fix oops while writing data to SSD partitions Btrfs: Protect the readonly flag of block group ... Fix up trivial conflicts (due to acl and writeback cleanups) in - fs/btrfs/acl.c - fs/btrfs/ctree.h - fs/btrfs/extent_io.c	2011-08-02 21:14:05 -10:00
Al Viro	3567866bf2	RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-03 00:58:42 -04:00
Jeff Layton	b802898334	cifs: demote DFS referral lookup errors to cFYI cifs: demote DFS referral lookup errors to cFYI Now that we call into this routine on every mount, anyone who doesn't have the upcall configured will get multiple printks about failed lookups. Reported-and-Tested-by: Martijn Uffing <mp3project@sarijopen.student.utwente.nl> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-03 03:19:28 +00:00
Steve French	fc05a78efb	Revert "cifs: advertise the right receive buffer size to the server" This reverts commit `c4d3396b26`. Problems discovered with readdir to Samba due to not accounting for header size properly with this change	2011-08-03 03:17:43 +00:00
Rafael J. Wysocki	da5aa861be	fix block device fallout from ->fsync() changes blkdev_fsync() needs to write pages in pagecache... Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 21:33:47 -04:00
Linus Torvalds	60ad446682	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits) ext4: prevent memory leaks from ext4_mb_init_backend() on error path ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode # ext4: use ext4_msg() instead of printk in mballoc ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree() ext4: use the correct error exit path in ext4_init_inode_table() ext4: add missing kfree() on error return path in add_new_gdb() ext4: change umode_t in tracepoint headers to be an explicit __u16 ext4: fix races in ext4_sync_parent() ext4: Fix overflow caused by missing cast in ext4_fallocate() ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole ext4: simplify parameters of reserve_backup_gdb() ext4: simplify parameters of add_new_gdb() ext4: remove lock_buffer in bclean() and setup_new_group_blocks() ext4: simplify journal handling in setup_new_group_blocks() ext4: let setup_new_group_blocks() set multiple bits at a time ext4: fix a typo in ext4_group_extend() ext4: let ext4_group_add_blocks() handle 0 blocks quickly ext4: let ext4_group_add_blocks() return an error code ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks() ... Fix up conflict in fs/ext4/inode.c: commit `aacfc19c62` ("fs: simplify the blockdev_direct_IO prototype") had changed the ext4_ind_direct_IO() function for the new simplified calling convention, while commit `dae1e52cb1` ("ext4: move ext4_ind_* functions from inode.c to indirect.c") moved the function to another file.	2011-08-01 13:56:03 -10:00
Linus Torvalds	1b8e94993c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree() VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree() switch posix_acl_chmod() to umode_t switch posix_acl_from_mode() to umode_t switch posix_acl_equiv_mode() to umode_t * switch posix_acl_create() to umode_t * block: initialise bd_super in bdget() vfs: avoid call to inode_lru_list_del() if possible vfs: avoid taking inode_hash_lock on pipes and sockets vfs: conditionally call inode_wb_list_del() VFS: Fix automount for negative autofs dentries Btrfs: load the key from the dir item in readdir into a fake dentry devtmpfs: missing initialialization in never-hit case hppfs: missing include	2011-08-01 13:48:31 -10:00
Linus Torvalds	a2d7730235	Merge branch 'pstore-efi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'pstore-efi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: efivars: Introduce PSTORE_EFI_ATTRIBUTES efivars: Use string functions in pstore_write efivars: introduce utf16_strncmp efivars: String functions efi: Add support for using efivars as a pstore backend pstore: Allow the user to explicitly choose a backend pstore: Make "part" unsigned pstore: Add extra context for writes and erases pstore: Extend API for more flexibility in new backends	2011-08-01 13:40:51 -10:00
Yu Jian	79a77c5ac3	ext4: prevent memory leaks from ext4_mb_init_backend() on error path In ext4_mb_init(), if the s_locality_group allocation fails it will currently cause the allocations made in ext4_mb_init_backend() to be leaked. Moving the ext4_mb_init_backend() allocation after the s_locality_group allocation avoids that problem. Signed-off-by: Yu Jian <yujian@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-08-01 17:41:46 -04:00
Yu Jian	48e6061bf4	ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode # Signed-off-by: Yu Jian <yujian@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-08-01 17:41:39 -04:00
Theodore Ts'o	9d8b9ec442	ext4: use ext4_msg() instead of printk in mballoc Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-08-01 17:41:35 -04:00
Josef Bacik	0d10ee2e6d	Btrfs: don't call writepages from within write_full_page When doing a writepage we call writepages to try and write out any other dirty pages in the area. This could cause problems where we commit a transaction and then have somebody else dirtying metadata in the area as we could end up writing out a lot more than we care about, which could cause latency on anybody who is waiting for the transaction to completely finish committing. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:37:36 -04:00
Mitch Harder	341d14f161	Btrfs: Remove unused variable 'last_index' in file.c The variable 'last_index' is calculated in the __btrfs_buffered_write function and passed as a parameter to the prepare_pages function, but is not used anywhere in the prepare_pages function. Remove instances of 'last_index' in these functions. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:32:39 -04:00
Xiao Guangrong	69261c4b6a	Btrfs: clean up for find_first_extent_bit() find_first_extent_bit() and find_first_extent_bit_state() share most of the code, and we can just make the former call the latter. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:32:39 -04:00
Xiao Guangrong	ded91f0814	Btrfs: clean up for wait_extent_bit() We can just use cond_resched_lock(). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:32:38 -04:00
Xiao Guangrong	3150b69969	Btrfs: clean up for insert_state() Don't duplicate set_state_bits(). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:32:30 -04:00
Xiao Guangrong	3a6d457ec7	Btrfs: remove unused members from struct extent_state These members are not used at all. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:50 -04:00
Li Zefan	4d2c8f62f1	Btrfs: clean up code for merging extent maps unpin_extent_cache() and add_extent_mapping() shares the same code that merges extent maps. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:50 -04:00
Li Zefan	ed64f06652	Btrfs: clean up code for extent_map lookup lookup_extent_map() and search_extent_map() can share most of code. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:49 -04:00
Li Zefan	7e016a038e	Btrfs: clean up search_extent_mapping() rb_node returned by __tree_search() can be a valid pointer or NULL, but won't be some errno. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:49 -04:00
Li Zefan	85d85a743d	Btrfs: remove redundant code for dir item lookup When we search a dir item with a specific hash code, we can just return NULL without further checking if btrfs_search_slot() returns 1. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:48 -04:00
Li Zefan	9b89d95a14	Btrfs: make acl functions really no-op if acl is not enabled So there's no overhead for something we don't use. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:48 -04:00
Li Zefan	15de900d08	Btrfs: remove remaining ref-cache code Since commit `f2a97a9dbd` ("btrfs: remove all unused functions"), there's no extern functions at all in ref-cache.c, so just remove the remaining dead code. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:47 -04:00
Li Zefan	b9c8300c2a	Btrfs: remove a BUG_ON() in btrfs_commit_transaction() wait_for_commit() always returns 0. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:47 -04:00
Li Zefan	72d63ed642	Btrfs: use wait_event() Use wait_event() when possible to avoid code duplication. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:46 -04:00
Li Zefan	e55179b3d7	Btrfs: check the nodatasum flag when writing compressed files If mounting with nodatasum option, we won't csum file data for general write or direct-io write, and this rule should also be applied when writing compressed files. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:46 -04:00
Li Zefan	77906a5075	Btrfs: copy string correctly in INO_LOOKUP ioctl Memory areas [ptr, ptr+total_len] and [name, name+total_len] may overlap, so it's wrong to use memcpy(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:45 -04:00
Josef Bacik	b783e62d96	Btrfs: don't print the leaf if we had an error In __btrfs_free_extent we will print the leaf if we fail to find the extent we wanted, but the problem is if we get an error we won't have a leaf so often this leads to a NULL pointer dereference and we lose the error that actually occurred. So only print the leaf if ret > 0, which means we didn't find the item we were looking for but we didn't error either. This way the error is preserved. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:45 -04:00
Mark Fasheh	bf5f32ecb6	btrfs: make btrfs_set_root_node void This is fairly trivial - btrfs_set_root_node() - always returns zero so we can just make it void. All callers ignore the return code now anyway. I also made sure to check that none of the functions that btrfs_set_root_node() calls returns an error that we might have needed to catch and pass back. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:44 -04:00
liubo	ff1f2b4407	Btrfs: fix oops while writing data to SSD partitions Here I have a two SSD-partitions btrfs, and they are defaultly set to "data=raid0, metadata=raid1", then I try to fill my btrfs partition till "No space left on device", via "dd if=/dev/zero of=/mnt/btrfs/tmp". I get an oops panic from kernel BUG at fs/btrfs/extent-tree.c:5199!, which refers to find_free_extent's BUG_ON(index != get_block_group_index(block_group)); In SSD mode, in order to find enough space to alloc, we may check the block_group cache which has been checked sometime before, but the index is not updated, where it hits the BUG_ON. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Acked-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:44 -04:00
WuBo	61cfea9bb8	Btrfs: Protect the readonly flag of block group The access for ro in btrfs_block_group_cache should be protected because of the racy lock in relocation. Signed-off-by: Wu Bo <wu.bo@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:43 -04:00
Jeff Mahoney	1bf85046e4	btrfs: Make extent-io callbacks that never fail return void The set/clear bit and the extent split/merge hooks only ever return 0. Changing them to return void simplifies the error handling cases later. This patch changes the hook prototypes, the single implementation of each, and the functions that call them to return void instead. Since all four of these hooks execute under a spinlock, they're necessarily simple. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:43 -04:00
Li Zefan	b6973aa622	Btrfs: fix readahead in file defrag We passed the wrong value to btrfs_force_ra(). Fix this by changing the argument of btrfs_force_ra() from last_index to nr_page. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:42 -04:00
Tsutomu Itoh	b532402e4d	Btrfs: return error to caller when btrfs_unlink() failes When btrfs_unlink_inode() and btrfs_orphan_add() in btrfs_unlink() are error, the error code is returned to the caller instead of BUG_ON(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:42 -04:00
Wanlong Gao	a0f98dde11	Btrfs:don't check the return value of __btrfs_add_inode_defrag Don't need to check the return value of __btrfs_add_inode_defrag(), since it will always return 0. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:41 -04:00
Chris Mason	b43b31bdf2	Merge branch 'alloc_path' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/btrfs-error-handling into for-linus	2011-08-01 14:27:34 -04:00
Dave Kleikamp	1c8007b076	jfs: flush journal completely before releasing metadata inodes This fixes a race during unmount. We need to not only make sure that the journal is completely written, but that the metadata changes make it to disk before releasing ipimap and ipbmap. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-08-01 12:41:00 -05:00
Linus Torvalds	5f66d2b58c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: CIFS: Cleanup demupltiplex thread exiting code CIFS: Move mid search to a separate function CIFS: Move RFC1002 check to a separate function CIFS: Simplify socket reading in demultiplex thread CIFS: Move buffer allocation to a separate function cifs: remove unneeded variable initialization in cifs_reconnect_tcon cifs: simplify refcounting for oplock breaks cifs: fix compiler warning in CIFSSMBQAllEAs cifs: fix name parsing in CIFSSMBQAllEAs cifs: don't start signing too early cifs: trivial: goto out here is unnecessary cifs: advertise the right receive buffer size to the server	2011-08-01 06:14:25 -10:00
Pavel Shilovsky	762dfd1057	CIFS: Cleanup demupltiplex thread exiting code Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-01 12:49:45 +00:00
Pavel Shilovsky	ad69bae178	CIFS: Move mid search to a separate function Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-01 12:49:42 +00:00
Pavel Shilovsky	98bac62c9f	CIFS: Move RFC1002 check to a separate function Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-01 12:49:38 +00:00
Pavel Shilovsky	e7015fb1c5	CIFS: Simplify socket reading in demultiplex thread Move reading to separate function and remove csocket variable. Also change semantic in a little: goto incomplete_rcv only when we get -EAGAIN (or a familiar error) while reading rfc1002 header. In this case we don't check for echo timeout when we don't get whole header at once, as it was before. Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-01 12:49:34 +00:00
Theodore Ts'o	f18a5f21c2	ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-08-01 08:45:38 -04:00
Theodore Ts'o	9933fc0ac1	ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree() Introduce new helper functions which try kmalloc, and then fall back to vmalloc if necessary, and use them for allocating and deallocating s_flex_groups. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-08-01 08:45:02 -04:00
Pavel Shilovsky	3d9c2472a5	CIFS: Move buffer allocation to a separate function Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-08-01 12:33:44 +00:00
Yongqiang Yang	33853a0dde	ext4: use the correct error exit path in ext4_init_inode_table() This patch lets ext4_init_inode_table() handle errors right. ext4_init_inode_table() should down_write() alloc_sem which has been up_write()ed and stop the started journal handle. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-08-01 06:32:19 -04:00
Markus Trippelsdorf	206d440f64	xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set commit `4e34e719e4`, that takes the ACL checks to common code, accidentely broke the build when CONFIG_FS_POSIX_ACL is not set: CC fs/xfs/linux-2.6/xfs_iops.o fs/xfs/linux-2.6/xfs_iops.c:1025:14: error: ‘xfs_get_acl’ undeclared here (not in a function) Fix this by declaring xfs_get_acl a static inline function. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:35:04 -04:00
David Howells	43c1c9cd24	VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock Reorganise shrink_dcache_for_umount_subtree() in light of the demise of dcache_lock. Without that dcache_lock, there is no need for the batching of removal of dentries from the system under it (we wanted to make intensive use of the locked data whilst we held it, but didn't want to hold it for long at a time). This works, provided the preceding patch is correct in its removal of locking on dentry->d_lock on the basis that no one should be locking these dentries any more as the whole superblock is defunct. With this patch, the calls to dentry_lru_del() and __d_shrink() are placed at the point where each dentry is detached handled. It is possible that, as an alternative, the batching should still be done - but only for dentry_lru_del() of all a dentry's children in one go. In such a case, the batching would be done under dcache_lru_lock. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:27:57 -04:00
David Howells	c6627c60c0	VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree() Locks of the dcache_lock were replaced by locks of dentry->d_lock in commits such as: `2304450783` `2fd6b7f507` as part of the RCU-based pathwalk changes, despite the fact that the caller (shrink_dcache_for_umount()) notes in the banner comment the reasons that d_lock is not necessary in these functions: /* * destroy the dentries attached to a superblock on unmounting * - we don't need to use dentry->d_lock because: * - the superblock is detached from all mountings and open files, so the * dentry trees will not be rearranged by the VFS * - s_umount is write-locked, so the memory pressure shrinker will ignore * any dentries belonging to this superblock that it comes across * - the filesystem itself is no longer permitted to rearrange the dentries * in this superblock */ So remove these locks. If the locks are actually necessary, then this banner comment should be altered instead. The hash table chains are protected by 1-bit locks in the hash table heads, so those shouldn't be a problem. Note that to make this work, __d_drop() has to be split so that the RCUwalk barrier can be avoided. This causes problems otherwise as it has an assertion that dentry->d_lock is locked - but there is no need for that as no one else can be trying to access this dentry, except to step over it (and that should be handled by d_free(), I think). Signed-off-by: David Howells <dhowells@redhat.com> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:27:57 -04:00
David Howells	35f40ef002	VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree() Remove the detached-dentry counter from shrink_dcache_for_umount_subtree() as the value it computes is no longer used as of commit `312d3ca856` which made the nr_dentry counters summed per-CPU rather than global atomic. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:27:57 -04:00
Al Viro	86bc704db0	switch posix_acl_chmod() to umode_t again, that's what all callers pass to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:10:32 -04:00
Al Viro	3a5fba19b0	switch posix_acl_from_mode() to umode_t ... seeing that this is what all callers pass to it anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:10:20 -04:00
Al Viro	d6952123b5	switch posix_acl_equiv_mode() to umode_t * ... so that &inode->i_mode could be passed to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:10:06 -04:00
Al Viro	d3fb612076	switch posix_acl_create() to umode_t * so we can pass &inode->i_mode to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 02:09:42 -04:00
Lachlan McIlroy	782b94cdf5	block: initialise bd_super in bdget() bd_super is currently reset to NULL in kill_block_super() so we rely on previous users of the block_device object to initialise this value for the next user. This quirk was exposed on RHEL5 when a third party filesystem did not always use kill_block_super() and therefore bd_super wasn't being reset when a block_device object was recycled within the cache. This may not be a problem upstream but makes sense to be defensive. Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 01:57:44 -04:00
Eric Dumazet	c4ae0c6545	vfs: avoid call to inode_lru_list_del() if possible inode_lru_list_del() is expensive because of per superblock lru locking, while some inodes are not in lru list. Adding a check in iput_final() can speedup pipe/sockets workloads on SMP. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 01:41:17 -04:00
Eric Dumazet	f2ee7abf4c	vfs: avoid taking inode_hash_lock on pipes and sockets Some inodes (pipes, sockets, ...) are not hashed, no need to take contended inode_hash_lock at dismantle time. nice speedup on SMP machines on socket intensive workloads. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 01:41:17 -04:00
Eric Dumazet	b12362bdb6	vfs: conditionally call inode_wb_list_del() Some inodes (pipes, sockets, ...) are not in bdi writeback list. evict() can avoid calling inode_wb_list_del() and its expensive spinlock by checking inode i_wb_list being empty or not. At this point, no other cpu/user can concurrently manipulate this inode i_wb_list Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 01:41:17 -04:00
David Howells	5a30d8a2b8	VFS: Fix automount for negative autofs dentries Autofs may set the DCACHE_NEED_AUTOMOUNT flag on negative dentries. These need attention from the automounter daemon regardless of the LOOKUP_FOLLOW flag. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 01:38:01 -04:00
Josef Bacik	b4aff1f874	Btrfs: load the key from the dir item in readdir into a fake dentry In btrfs we have 2 indexes for inodes. One is for readdir, it's in this nice sequential order and works out brilliantly for readdir. However if you use ls, it usually stat's each file it gets from readdir. This is where the second index comes in, which is based on a hash of the name of the file. So then the lookup has to lookup this index, and then lookup the inode. The index lookup is going to be in random order (since its based on the name hash), which gives us less than stellar performance. Since we know the inode location from the readdir index, I create a dummy dentry and copy the location key into dentry->d_fsdata. Then on lookup if we have d_fsdata we use that location to lookup the inode, avoiding looking up the other directory index. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-08-01 01:31:42 -04:00
Trond Myklebust	a00ed25cce	NFS: Re-enable compilation of nfs with !CONFIG_NFS_V4 \|\| !CONFIG_NFS_V4_1 Fix two recently introduced compile problems: Fix a typo in fs/nfs/pnfs.h Move the pnfs_blksize declaration outside the CONFIG_NFS_V4 section in struct nfs_server. Reported-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-31 14:27:04 -10:00
Jeff Layton	c4a5534a1b	cifs: remove unneeded variable initialization in cifs_reconnect_tcon Reported-and-acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-31 21:27:16 +00:00
Jeff Layton	ad635942c8	cifs: simplify refcounting for oplock breaks Currently, we take a sb->s_active reference and a cifsFileInfo reference when an oplock break workqueue job is queued. This is unnecessary and more complicated than it needs to be. Also as Al points out, deactivate_super has non-trivial locking implications so it's best to avoid that if we can. Instead, just cancel any pending oplock breaks for this filehandle synchronously in cifsFileInfo_put after taking it off the lists. That should ensure that this job doesn't outlive the structures it depends on. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-31 21:21:20 +00:00
Jeff Layton	5980fc966b	cifs: fix compiler warning in CIFSSMBQAllEAs The recent fix to the above function causes this compiler warning to pop on some gcc versions: CC [M] fs/cifs/cifssmb.o fs/cifs/cifssmb.c: In function ‘CIFSSMBQAllEAs’: fs/cifs/cifssmb.c:5708: warning: ‘ea_name_len’ may be used uninitialized in this function Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-31 21:21:13 +00:00
Jeff Layton	91d065c473	cifs: fix name parsing in CIFSSMBQAllEAs The code that matches EA names in CIFSSMBQAllEAs is incorrect. It uses strncmp to do the comparison with the length limited to the name_len sent in the response. Problem: Suppose we're looking for an attribute named "foobar" and have an attribute before it in the EA list named "foo". The comparison will succeed since we're only looking at the first 3 characters. Fix this by also comparing the length of the provided ea_name with the name_len in the response. If they're not equal then it shouldn't match. Reported-by: Jian Li <jiali@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-31 21:21:09 +00:00
Jeff Layton	998d6fcb24	cifs: don't start signing too early Sniffing traffic on the wire shows that windows clients send a zeroed out signature field in a NEGOTIATE request, and send "BSRSPYL" in the signature field during SESSION_SETUP. Make the cifs client behave the same way. It doesn't seem to make much difference in any server that I've tested against, but it's probably best to follow windows behavior as closely as possible here. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-31 21:21:06 +00:00
Jeff Layton	1f1cff0be0	cifs: trivial: goto out here is unnecessary ...and remove some obsolete comments. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-31 21:21:02 +00:00
Jeff Layton	c4d3396b26	cifs: advertise the right receive buffer size to the server Currently, we mirror the same size back to the server that it sends us. That makes little sense. Instead we should be sending the server the maximum buffer size that we can handle -- CIFSMaxBufSize minus the 4 byte RFC1001 header. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-31 21:20:58 +00:00
Linus Torvalds	24c3047095	Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs * 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) pnfsblock: write_pagelist handle zero invalid extents pnfsblock: note written INVAL areas for layoutcommit pnfsblock: bl_write_pagelist pnfsblock: bl_read_pagelist pnfsblock: cleanup_layoutcommit pnfsblock: encode_layoutcommit pnfsblock: merge rw extents pnfsblock: add extent manipulation functions pnfsblock: bl_find_get_extent pnfsblock: xdr decode pnfs_block_layout4 pnfsblock: call and parse getdevicelist pnfsblock: merge extents pnfsblock: lseg alloc and free pnfsblock: remove device operations pnfsblock: add device operations pnfsblock: basic extent code pnfsblock: use pageio_ops api pnfsblock: add blocklayout Kconfig option, Makefile, and stubs pnfs: cleanup_layoutcommit pnfs: ask for layout_blksize and save it in nfs_server ...	2011-07-31 06:26:50 -10:00
Peng Tao	71cdd40fd4	pnfsblock: write_pagelist handle zero invalid extents For invalid extents, find other pages in the same fsblock and write them out. [pnfsblock: write_begin] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	31e6306a40	pnfsblock: note written INVAL areas for layoutcommit Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	650e2d39bd	pnfsblock: bl_write_pagelist Note: When upper layer's read/write request cannot be fulfilled, the block layout driver shouldn't silently mark the page as error. It should do what can be done and leave the rest to the upper layer. To do so, we should set rdata/wdata->res.count properly. When upper layer re-send the read/write request to finish the rest part of the request, pgbase is the position where we should start at. [pnfsblock: bl_write_pagelist support functions] [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: Zhang Jingwang <yyalone@gmail.com> [pnfs-block: use new write_pagelist api] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> [SQUASHME: pnfsblock: mds_offset is set in the generic layer] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: mark IO error with NFS_LAYOUT_{RW\|RO}_FAILED] Signed-off-by: Peng Tao <peng_tao@emc.com> [pnfsblock: SQUASHME: adjust to API change] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fixup blksize alignment in bl_setup_layoutcommit] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: Zhang Jingwang <yyalone@gmail.com> [pnfs-block: use new write_pagelist api] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	9549ec01b0	pnfsblock: bl_read_pagelist Note: When upper layer's read/write request cannot be fulfilled, the block layout driver shouldn't silently mark the page as error. It should do what can be done and leave the rest to the upper layer. To do so, we should set rdata/wdata->res.count properly. When upper layer re-send the read/write request to finish the rest part of the request, pgbase is the position where we should start at. [pnfsblock: mark IO error with NFS_LAYOUT_{RW\|RO}_FAILED] Signed-off-by: Peng Tao <peng_tao@emc.com> [pnfsblock: read path error handling] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: Zhang Jingwang <yyalone@gmail.com> [pnfs-block: use new read_pagelist api] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	b2be7811dd	pnfsblock: cleanup_layoutcommit In blocklayout driver. There are two things happening while layoutcommit/cleanup. 1. the modified extents are encoded. 2. On cleanup the extents are put back on the layout rw extents list, for reads. In the new system where actual xdr encoding is done in encode_layoutcommit() directly into xdr buffer, these are the new commit stages: 1. On setup_layoutcommit, the range is adjusted as before and a structure is allocated for communication with bl_encode_layoutcommit && bl_cleanup_layoutcommit (Generic layer provides a void-star to hang it on) 2. bl_encode_layoutcommit is called to do the actual encoding directly into xdr. The commit-extent-list is not freed and is stored on above structure. FIXME: The code is not yet converted to the new XDR cleanup 3. On cleanup the commit-extent-list is put back by a call to set_to_rw() as before, but with no need for XDR decoding of the list as before. And the commit-extent-list is freed. Finally allocated structure is freed. [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] Signed-off-by: Jim Rees <rees@umich.edu> [pnfsblock: introduce bl_committing list] Signed-off-by: Peng Tao <peng_tao@emc.com> [pnfsblock: SQUASHME: adjust to API change] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [blocklayout: encode_layoutcommit implementation] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [pnfsblock: fix bug setting up layoutcommit.] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> [pnfsblock: cleanup_layoutcommit wants a status parameter] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	90ace12ac4	pnfsblock: encode_layoutcommit In blocklayout driver. There are two things happening while layoutcommit/cleanup. 1. the modified extents are encoded. 2. On cleanup the extents are put back on the layout rw extents list, for reads. In the new system where actual xdr encoding is done in encode_layoutcommit() directly into xdr buffer, these are the new commit stages: 1. On setup_layoutcommit, the range is adjusted as before and a structure is allocated for communication with bl_encode_layoutcommit && bl_cleanup_layoutcommit (Generic layer provides a void-star to hang it on) 2. bl_encode_layoutcommit is called to do the actual encoding directly into xdr. The commit-extent-list is not freed and is stored on above structure. FIXME: The code is not yet converted to the new XDR cleanup 3. On cleanup the commit-extent-list is put back by a call to set_to_rw() as before, but with no need for XDR decoding of the list as before. And the commit-extent-list is freed. Finally allocated structure is freed. [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [blocklayout: encode_layoutcommit implementation] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [pnfsblock: fix bug setting up layoutcommit.] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> [pnfsblock: prevent commit list corruption] [pnfsblock: fix layoutcommit with an empty opaque] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	9f3770422c	pnfsblock: merge rw extents Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	c1c2a4cd35	pnfsblock: add extent manipulation functions Adds working implementations of various support functions to handle INVAL extents, needed by writes, such as bl_mark_sectors_init and bl_is_sector_init. [pnfsblock: fix 64-bit compiler warnings for extent manipulation] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [Implement release_inval_marks] Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	6d742ba538	pnfsblock: bl_find_get_extent Implement bl_find_get_extent(), one of the core extent manipulation routines. [pnfsblock: Lookup list entry of layouts and tags in reverse order] Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> pnfsblock: fix print format warnings for sector_t and size_t gcc spews warnings about these on x86_64, e.g.: fs/nfs/blocklayout/blocklayout.c:74: warning: format ‘%Lu’ expects type ‘long long unsigned int’, but argument 2 has type ‘sector_t’ fs/nfs/blocklayout/blocklayout.c:388: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Fred Isaman	e9437ccef9	pnfsblock: xdr decode pnfs_block_layout4 XDR decodes the block layout payload sent in LAYOUTGET result, storing the result in an extent list. [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix bug getting pnfs_layout_type in translate_devid().] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Fred Isaman	2f9fd18260	pnfsblock: call and parse getdevicelist Call GETDEVICELIST during mount, then call and parse GETDEVICEINFO for each device returned. [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: Jim Rees <rees@umich.edu> [pnfsblock: fix pnfs_deviceid references] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix print format warnings for sector_t and size_t] [pnfs-block: #include <linux/vmalloc.h>] [pnfsblock: no PNFS_NFS_SERVER] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfsblock: fix bug determining size of striped volume] [pnfsblock: fix oops when using multiple devices] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: get rid of vmap and deviceid->area structure] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Fred Isaman	03341d2cc9	pnfsblock: merge extents Replace a stub, so that extents underlying the layouts are properly added, merged, or ignored as necessary. Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: delete the new node before put it] Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00

... 3 4 5 6 7 ...

24267 Commits