linux

Author	SHA1	Message	Date
Anton Blanchard	fc63cf2370	exec: setup_arg_pages() fails to return errors In setup_arg_pages we work hard to assign a value to ret, but on exit we always return 0. Also remove a now duplicated exit path and branch to out_unlock instead. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-11-12 07:25:58 -08:00
Heiko Carstens	7779d7bed9	fs: add missing compat_ptr handling for FS_IOC_RESVSP ioctl For FS_IOC_RESVSP and FS_IOC_RESVSP64 compat_sys_ioctl() uses its arg argument as a pointer to userspace. However it is missing a a call to compat_ptr() which will do a proper pointer conversion. This was introduced with `3e63cbb1` "fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ankit Jain <me@ankitjain.org> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Arnd Bergmann <arndbergmann@googlemail.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> [2.6.31.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-11-12 07:25:57 -08:00
Sukadev Bhattiprolu	29f12ca321	pidns: fix a leak in /proc dentries and inodes with pid namespaces. Daniel Lezcano reported a leak in 'struct pid' and 'struct pid_namespace' that is discussed in: http://lkml.org/lkml/2009/10/2/159. To summarize the thread, when container-init is terminated, it sets the PF_EXITING flag, zaps other processes in the container and waits to reap them. As a part of reaping, the container-init should flush any /proc dentries associated with the processes. But because the container-init is itself exiting and the following PF_EXITING check, the dentries are not flushed, resulting in leak in /proc inodes and dentries. This fix reverts the commit `7766755a2f` ("Fix /proc dcache deadlock in do_exit") which introduced the check for PF_EXITING. At the time of the commit, shrink_dcache_parent() flushed dentries from other filesystems also and could have caused a deadlock which the commit fixed. But as pointed out by Eric Biederman, after commit `0feae5c47a`, shrink_dcache_parent() no longer affects other filesystems. So reverting the commit is now safe. As pointed out by Jan Kara, the leak is not as critical since the unclaimed space will be reclaimed under memory pressure or by: echo 3 > /proc/sys/vm/drop_caches But since this check is no longer required, its best to remove it. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Reported-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Jan Kara <jack@ucw.cz> Cc: Andrea Arcangeli <andrea@cpushare.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-11-12 07:25:57 -08:00
Stefan Schmidt	ff5e4b51a3	fs/jbd: Export log_start_commit to fix ext3 build. This fixes: ERROR: "log_start_commit" [fs/ext3/ext3.ko] undefined! Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>	2009-11-12 10:24:12 +01:00
Linus Torvalds	aa021baa32	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix panic when trying to destroy a newly allocated Btrfs: allow more metadata chunk preallocation Btrfs: fallback on uncompressed io if compressed io fails Btrfs: find ideal block group for caching Btrfs: avoid null deref in unpin_extent_cache() Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root Btrfs: fix some metadata enospc issues Btrfs: fix how we set max_size for free space clusters Btrfs: cleanup transaction starting and fix journal_info usage Btrfs: fix data allocation hint start	2009-11-11 13:38:59 -08:00
Josef Bacik	a6dbd429d8	Btrfs: fix panic when trying to destroy a newly allocated There is a problem where iget5_locked will look for an inode, not find it, and then subsequently try to allocate it. Another CPU will have raced in and allocated the inode instead, so when iget5_locked gets the inode spin lock again and does a search, it finds the new inode. So it goes ahead and calls destroy_inode on the inode it just allocated. The problem is we don't set BTRFS_I(inode)->root until the new inode is completely initialized. This patch makes us set root to NULL when alloc'ing a new inode, so when we get to btrfs_destroy_inode and we see that root is NULL we can just free up the memory and continue on. This fixes the panic http://www.kerneloops.org/submitresult.php?number=812690 Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 15:53:34 -05:00
Linus Torvalds	fd801452a3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: JBD/JBD2: free j_wbuf if journal init fails. ext3: Wait for proper transaction commit on fsync ext3: retry failed direct IO allocations	2009-11-11 11:52:22 -08:00
Linus Torvalds	16fe4101ae	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: partial revert to fix double brelse WARNING() ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O ext4: code clean up for dio fallocate handling ext4: skip conversion of uninit extents after direct IO if there isn't any ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents ext4: discard preallocation when restarting a transaction during truncate	2009-11-11 11:28:11 -08:00
Chris Mason	33b2580864	Btrfs: allow more metadata chunk preallocation On an FS where all of the space has not been allocated into chunks yet, the enospc can return enospc just because the existing metadata chunks are full. We get around this by allowing more metadata chunks to be allocated up to a certain limit, and finding the right limit is a little fuzzy. The problem is the reservations for delalloc would preallocate way too much of the FS as metadata. We need to start saying no and just force some IO to happen. But we also need to let a reasonable amount of the FS become metadata. This bumps the hard limit up, later releases will have a better system. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 14:20:20 -05:00
Josef Bacik	f5a84ee3cd	Btrfs: fallback on uncompressed io if compressed io fails Currently compressed IO does not deal with not having its entire extent able to be allocated. So if we have enough free space to allocate for the extent, but its not contiguous, it will fail spectacularly. This patch fixes this by falling back on uncompressed IO which lets us spread the delalloc extent across multiple extents. I tested this by making us randomly think the reservation had failed to make it fallback on the uncompressed io way and it seemed to work fine. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 14:20:20 -05:00
Josef Bacik	ccf0e72537	Btrfs: find ideal block group for caching This patch changes a few things. Hopefully the comments are helpfull, but I'll try and be as verbose here. Problem: My fedora box was taking 1 minute and 21 seconds to boot with btrfs as root. Part of this problem was we pick the first block group we can find and start caching it, even if it may not have enough free space. The other problem is we only search for cached block groups the first time around, which we won't find any cached block groups because this is a newly mounted fs, so we end up caching several block groups during bootup, which with alot of fragmentation takes around 30-45 seconds to complete, which bogs down the system. So Solution: 1) Don't cache block groups willy-nilly at first. Instead try and figure out which block group has the most free, and therefore will take the least amount of time to cache. 2) Don't be so picky about cached block groups. The other problem is once we've filled up a cluster, if the block group isn't finished caching the next time we try and do the allocation we'll completely ignore the cluster and start searching from the beginning of the space, which makes us cache more block groups, which slows us down even more. So instead of skipping block groups that are not finished caching when we have a hint, only skip the block group if it hasn't started caching yet. There is one other tweak in here. Before if we allocated a chunk and still couldn't find new space, we'd end up switching the space info to force another chunk allocation. This could make us end up with way too many chunks, so keep track of this particular case. With this patch and my previous cluster fixes my fedora box now boots in 43 seconds, and according to the bootchart is not held up by our block group caching at all. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 14:20:19 -05:00
Dan Carpenter	4eb3991c5d	Btrfs: avoid null deref in unpin_extent_cache() I re-orderred the checks to avoid dereferencing "em" if it was null. Found by smatch static checker. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 14:20:18 -05:00
Li Dongyang	df66916e71	Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root We don't need to call btrfs_release_path because btrfs_free_path will do that for us. Signed-off-by: Li Dongyang <Jerry87905@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 14:20:18 -05:00
Josef Bacik	5df6a9f606	Btrfs: fix some metadata enospc issues We weren't reserving metadata space for rename, rmdir and unlink, which could cause problems. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 14:20:17 -05:00
Josef Bacik	01dea1efc2	Btrfs: fix how we set max_size for free space clusters This patch fixes a problem where max_size can be set to 0 even though we filled the cluster properly. We set max_size to 0 if we restart the cluster window, but if the new start entry is big enough to be our new cluster then we could return with a max_size set to 0, which will mean the next time we try to allocate from this cluster it will fail. So set max_extent to the entry's size. Tested this on my box and now we actually allocate from the cluster after we fill it. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 14:20:17 -05:00
Josef Bacik	249ac1e55c	Btrfs: cleanup transaction starting and fix journal_info usage We use journal_info to tell if we're in a nested transaction to make sure we don't commit the transaction within a nested transaction. We use another method to see if there are any outstanding ioctl trans handles, so if we're starting one do not set current->journal_info, since it will screw with other filesystems. This patch also cleans up the starting stuff so there aren't any magic numbers. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 14:20:16 -05:00
Josef Bacik	6346c93988	Btrfs: fix data allocation hint start Sometimes our start allocation hint when we cow a file can be either EXTENT_HOLE or some other such place holder, which is not optimal. So if we find that our em->block_start is one of these special values, check to see where the first block of the inode is stored, and use that as a hint. If that block is also a special value, just fallback on a hint of 0 and let the allocator figure out a good place to put the data. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-11-11 14:20:16 -05:00
Tao Ma	7b02bec07e	JBD/JBD2: free j_wbuf if journal init fails. If journal init fails, we need to free j_wbuf. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-11-11 15:24:14 +01:00
Jan Kara	fe8bc91c4c	ext3: Wait for proper transaction commit on fsync We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2009-11-11 15:22:49 +01:00
Eric Sandeen	ea0174a713	ext3: retry failed direct IO allocations On a 256M 4k block filesystem, doing this in a loop: dd if=/dev/zero of=test oflag=direct bs=1M count=64 rm -f test eventually leads to spurious ENOSPC: dd: writing `test': No space left on device As with other block allocation callers, it looks like we need to potentially retry the allocations on the initial ENOSPC. A similar patch went into ext4 (commit `fbbf694566`) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-11-11 15:22:49 +01:00
Trond Myklebust	96d25e5322	NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT Changeset `a65318bf3a` (NFSv4: Simplify some cache consistency post-op GETATTRs) incorrectly changed the getattr bitmap for readdir(). This causes the readdir() function to fail to return a fileid/inode number, which again exposed a bug in the NFS readdir code that causes spurious ENOENT errors to appear in applications (see http://bugzilla.kernel.org/show_bug.cgi?id=14541). The immediate band aid is to revert the incorrect bitmap change, but more long term, we should change the NFS readdir code to cope with the fact that NFSv4 servers are not required to support fileids/inode numbers. Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-11-11 16:15:42 +09:00
Linus Torvalds	b7b69c7e97	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix missing cleanup of gc cache on error cases nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks	2009-11-09 09:52:55 -08:00
Theodore Ts'o	1e424a3483	ext4: partial revert to fix double brelse WARNING() This is a partial revert of commit `6487a9d` (only the changes made to fs/ext4/namei.c), since it is causing the following brelse() double-free warning when running fsstress on a file system with 1k blocksize and we run into a block allocation failure while converting a single-block directory to a multi-block hash-tree indexed directory. WARNING: at fs/buffer.c:1197 __brelse+0x2e/0x33() Hardware name: VFS: brelse: Trying to free free buffer Modules linked in: Pid: 2226, comm: jbd2/sdd-8 Not tainted 2.6.32-rc6-00577-g0003f55 #101 Call Trace: [<c01587fb>] warn_slowpath_common+0x65/0x95 [<c0158869>] warn_slowpath_fmt+0x29/0x2c [<c021168e>] __brelse+0x2e/0x33 [<c0288a9f>] jbd2_journal_refile_buffer+0x67/0x6c [<c028a9ed>] jbd2_journal_commit_transaction+0x319/0x14d8 [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60 [<c0175bcc>] ? sched_clock_cpu+0x12a/0x13e [<c017f6b4>] ? trace_hardirqs_off+0xb/0xd [<c0175c1f>] ? cpu_clock+0x3f/0x5b [<c017f6ec>] ? lock_release_holdtime+0x36/0x137 [<c0664ad0>] ? _spin_unlock_irqrestore+0x44/0x51 [<c0180af3>] ? trace_hardirqs_on_caller+0x103/0x124 [<c0180b1f>] ? trace_hardirqs_on+0xb/0xd [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60 [<c0290d1c>] kjournald2+0x11a/0x310 [<c017118e>] ? autoremove_wake_function+0x0/0x38 [<c0290c02>] ? kjournald2+0x0/0x310 [<c0170ee6>] kthread+0x66/0x6b [<c0170e80>] ? kthread+0x0/0x6b [<c01251b3>] kernel_thread_helper+0x7/0x10 ---[ end trace 5579351b86af61e3 ]--- Commit `6487a9d` was an attempt some buffer head leaks in an ENOSPC error path, but in some cases it actually results in an excess ENOSPC, as shown above. Fixing this means cleaning up who is responsible for releasing the buffer heads from the callee to the caller of add_dirent_to_buf(). Since that's a relatively complex change, and we're late in the rcX development cycle, I'm reverting this now, and holding back a more complete fix until after 2.6.32 ships. We've lived with this buffer_head leak on ENOSPC in ext3 and ext4 for a very long time; a few more months won't kill us. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Curt Wohlgemuth <curtw@google.com>	2009-11-08 15:45:44 -05:00
Ryusuke Konishi	c083234f15	nilfs2: fix missing cleanup of gc cache on error cases This fixes an -rc1 regression brought by the commit: `1cf58fa840` ("nilfs2: shorten freeze period due to GC in write operation v3"). Although the patch moved out a function call of nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding cleanup job needed for the error case. This will move the missing cleanup job to the destination function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jiro SEKIBA <jir@unicus.jp>	2009-11-08 19:04:25 +09:00
Ryusuke Konishi	5399dd1fc8	nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks This fixes a kernel oops reported by Markus Trippelsdorf in the email titled "[NILFS users] kernel Oops while running nilfs_cleanerd". The oops was caused by a bug of error path in nilfs_ioctl_move_blocks() function, which was inlined in nilfs_ioctl_clean_segments(). nilfs_ioctl_move_blocks checks duplication of blocks which will be moved in garbage collection. But, the check should have be done within nilfs_ioctl_move_inode_block() to prevent list corruption among buffers storing the target blocks. To fix the kernel oops, this moves forward the duplication check before the list insertion. I also tested this for stable trees [2.6.30, 2.6.31]. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>	2009-11-08 19:01:35 +09:00
Jeff Layton	f475f67754	cifs: don't use CIFSGetSrvInodeNumber in is_path_accessible Because it's lighter weight, CIFS tries to use CIFSGetSrvInodeNumber to verify the accessibility of the root inode and then falls back to doing a full QPathInfo if that fails with -EOPNOTSUPP. I have at least a report of a server that returns NT_STATUS_INTERNAL_ERROR rather than something that translates to EOPNOTSUPP. Rather than trying to be clever with that call, just have is_path_accessible do a normal QPathInfo. That call is widely supported and it shouldn't increase the overhead significantly. Cc: Stable <stable@kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-11-06 22:06:14 +00:00
Jeff Layton	ec06aedd44	cifs: clean up handling when server doesn't consistently support inode numbers It's possible that a server will return a valid FileID when we query the FILE_INTERNAL_INFO for the root inode, but then zeroed out inode numbers when we do a FindFile with an infolevel of SMB_FIND_FILE_ID_FULL_DIR_INFO. In this situation turn off querying for server inode numbers, generate a warning for the user and just generate an inode number using iunique. Once we generate any inode number with iunique we can no longer use any server inode numbers or we risk collisions, so ensure that we don't do that in cifs_get_inode_info either. Cc: Stable <stable@kernel.org> Reported-by: Timothy Normand Miller <theosib@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-11-06 22:04:37 +00:00
Mingming	ba230c3f6d	ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O To prepare for a direct I/O write, we need to split the unwritten extents before submitting the I/O. When no extents needed to be split, ext4_split_unwritten_extents() was incorrectly returning 0 instead of the size of uninitialized extents. This bug caused the wrong return value sent back to VFS code when it gets called from async IO path, leading to an unnecessary fall back to buffered IO. This bug also hid the fact that the check to see whether or not a split would be necessary was incorrect; we can only skip splitting the extent if the write completely covers the uninitialized extent. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-06 04:01:23 -05:00
Linus Torvalds	e5a9236222	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: invalidate target of rename fuse: fix kunmap in fuse_ioctl_copy_user fuse: prevent fuse_put_request on invalid pointer	2009-11-05 13:23:29 -08:00
Linus Torvalds	1bbc9a66d0	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/kvm: Remove problematic BUILD_BUG_ON statement powerpc/pci: Fix regression in powerpc MSI-X powerpc: Avoid giving out RTC dates below EPOCH powerpc/mm: Remove debug context clamping from nohash code powerpc: Cleanup Kconfig selection of hugetlbfs support	2009-11-05 13:22:49 -08:00
Linus Torvalds	d4116f8204	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: sysfs: Don't leak secdata when a sysfs_dirent is freed.	2009-11-05 10:57:39 -08:00
Linus Torvalds	411094acb7	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, fs: Fix x86 procfs stack information for threads on 64-bit x86: Add reboot quirk for 3 series Mac mini x86: Fix printk message typo in mtrr cleanup code dma-debug: Fix compile warning with PAE enabled x86/amd-iommu: Un__init function required on shutdown x86/amd-iommu: Workaround for erratum 63	2009-11-05 10:54:08 -08:00
Eric W. Biederman	4c3da2209b	sysfs: Don't leak secdata when a sysfs_dirent is freed. While refreshing my sysfs patches I noticed a leak in the secdata implementation. We don't free the secdata when we free the sysfs dirent. This is a bug in 2.6.32-rc5 that we really should close. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-11-05 08:19:18 +11:00
Stefani Seibold	89240ba059	x86, fs: Fix x86 procfs stack information for threads on 64-bit This patch fixes two issues in the procfs stack information on x86-64 linux. The 32 bit loader compat_do_execve did not store stack start. (this was figured out by Alexey Dobriyan). The stack information on a x64_64 kernel always shows 0 kbyte stack usage, because of a missing implementation of the KSTK_ESP macro which always returned -1. The new implementation now returns the right value. Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Americo Wang <xiyou.wangcong@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1257240160.4889.24.camel@wall-e> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-04 13:25:03 +01:00
Miklos Szeredi	5219f346b0	fuse: invalidate target of rename Invalidate the target's attributes, which may have changed (such as nlink, change time) so that they are refreshed on the next getattr(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-11-04 10:24:52 +01:00
Jens Axboe	0bd87182d3	fuse: fix kunmap in fuse_ioctl_copy_user Looks like another victim of the confusing kmap() vs kmap_atomic() API differences. Reported-by: Todor Gyumyushev <yodor1@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org	2009-11-04 10:24:51 +01:00
Anand V. Avati	f60311d5f7	fuse: prevent fuse_put_request on invalid pointer fuse_direct_io() has a loop where requests are allocated in each iteration. if allocation fails, the loop is broken out and follows into an unconditional fuse_put_request() on that invalid pointer. Signed-off-by: Anand V. Avati <avati@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@kernel.org	2009-11-04 10:24:50 +01:00
Linus Torvalds	51bb296b09	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cfq-iosched: limit coop preemption cfq-iosched: fix bad return value cfq_should_preempt() backing-dev: bdi sb prune should be in the unregister path, not destroy Fix bio_alloc() and bio_kmalloc() documentation bio_put(): add bio_clone() to the list of functions in the comment	2009-11-03 18:16:21 -08:00
Mingming	4b70df1816	ext4: code clean up for dio fallocate handling The ext4_debug() call in ext4_end_io_dio() should be moved after the check to make sure that io_end is non-NULL. The comment above ext4_get_block_dio_write() ("Maximum number of blocks...") is a duplicate; the original and correct comment is above the #define DIO_MAX_BLOCKS up above. Based on review comments from Curt Wohlgemuth. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-03 14:44:54 -05:00
Mingming	5f5249507e	ext4: skip conversion of uninit extents after direct IO if there isn't any At the end of direct I/O operation, ext4_ext_direct_IO() always called ext4_convert_unwritten_extents(), regardless of whether there were any unwritten extents involved in the I/O or not. This commit adds a state flag so that ext4_ext_direct_IO() only calls ext4_convert_unwritten_extents() when necessary. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-10 10:48:04 -05:00
Mingming	109f556519	ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents After a direct I/O request covering an uninitalized extent (i.e., created using the fallocate system call) or a hole in a file, ext4 will convert the uninitialized extent so it is marked as initialized by calling ext4_convert_unwritten_extents(). This function returns zero on success. This return value was getting returned by ext4_direct_IO(); however the file system's direct_IO function is supposed to return the number of bytes read or written on a success. By returning zero, it confused the direct I/O code into falling back to buffered I/O unnecessarily. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-10 10:48:08 -05:00
Ryusuke Konishi	05b4358ad5	nilfs2: add zero-fill for new btree node buffers Adds missing initialization of newly allocated b-tree node buffers. This avoids garbage data to be mixed in b-tree node blocks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-03 12:32:03 +09:00
Ryusuke Konishi	aeda7f6343	nilfs2: fix irregular checkpoint creation due to data flush When nilfs flushes out dirty data to reduce memory pressure, creation of checkpoints is wrongly postponed. This bug causes irregular checkpoint creation especially in small footprint systems. To correct this issue, a timer for the checkpoint creation has to be continued if a log writer does not create a checkpoint. This will do the correction. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-03 12:32:03 +09:00
Ryusuke Konishi	b1e19e5601	nilfs2: fix dirty page accounting leak causing hang at write Bruno Prémont and Dunphy, Bill noticed me that NILFS will certainly hang on ARM-based targets. I found this was caused by an underflow of dirty pages counter. A b-tree cache routine was marking page dirty without adjusting page account information. This fixes the dirty page accounting leak and resolves the hang on arm-based targets. Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Reported-by: Dunphy, Bill <WDunphy@tandbergdata.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: stable <stable@kernel.org>	2009-11-03 12:31:36 +09:00
Aneesh Kumar K.V	fa5d11133b	ext4: discard preallocation when restarting a transaction during truncate When restart a transaction during a truncate operation, we drop and reacquire i_data_sem. After reacquiring i_data_sem, we need to discard any inode-based preallocation that might have been grabbed while we released i_data_sem (for example, if pdflush is allocating blocks and racing against the truncate). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-02 18:50:49 -05:00
Linus Torvalds	1836d95928	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: fix readdir corner cases 9p: fix readlink 9p: fix a small bug in readdir for long directories	2009-11-02 12:23:21 -08:00
Linus Torvalds	d4da6c9ccf	Revert "ext4: Remove journal_checksum mount option and enable it by default" This reverts commit `d0646f7b63`, as requested by Eric Sandeen. It can basically cause an ext4 filesystem to miss recovery (and thus get mounted with errors) if the journal checksum does not match. Quoth Eric: "My hand-wavy hunch about what is happening is that we're finding a bad checksum on the last partially-written transaction, which is not surprising, but if we have a wrapped log and we're doing the initial scan for head/tail, and we abort scanning on that bad checksum, then we are essentially running an unrecovered filesystem. But that's hand-wavy and I need to go look at the code. We lived without journal checksums on by default until now, and at this point they're doing more harm than good, so we should revert the default-changing commit until we can fix it and do some good power-fail testing with the fixes in place." See http://bugzilla.kernel.org/show_bug.cgi?id=14354 for all the gory details. Requested-by: Eric Sandeen <sandeen@redhat.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Alexey Fisher <bug-track@fisher-privat.net> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Mathias Burén <mathias.buren@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-11-02 10:15:27 -08:00
Eric Van Hensbergen	3e2796a90c	9p: fix readdir corner cases The patch below also addresses a couple of other corner cases in readdir seen with a large (e.g. 64k) msize. I'm not sure what people think of my co-opting of fid->aux here. I'd be happy to rework if there's a better way. When the size of the user supplied buffer passed to readdir is smaller than the data returned in one go by the 9P read request, v9fs_dir_readdir() currently discards extra data so that, on the next call, a 9P read request will be issued with offset < previous offset + bytes returned, which voilates the constraint described in paragraph 3 of read(5) description. This patch preseves the leftover data in fid->aux for use in the next call. Signed-off-by: Jim Garlick <garlick@llnl.gov> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-11-02 08:43:45 -06:00
Martin Stava	2511cd0b3b	9p: fix readlink I do not know if you've looked on the patch, but unfortunately it is incorrect. A suggested better version is in this email (the old version didn't work in case the user provided buffer was not long enough - it incorrectly appended null byte on a position of last char, and thus broke the contract of the readlink method). However, I'm still not sure this is 100% correct thing to do, I think readlink is supposed to return buffer without last null byte in all cases, but we do return last null byte (even the old version).. on the other hand it is likely unspecified what is in the remaining part of the buffer, so null character may be fine there ;): Signed-off-by: Martin Stava <martin.stava@gmail.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-11-02 08:43:45 -06:00
Martin Stava	f91b90993f	9p: fix a small bug in readdir for long directories Here is a proposed patch for bug in readdir. Listing of dirs with many files fails without this patch. Signed-off-by: Martin Stava <martin.stava@gmail.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-11-02 08:43:44 -06:00
Alberto Bertogli	5f04eeb8a7	Fix bio_alloc() and bio_kmalloc() documentation Commit `451a9ebf` accidentally broke bio_alloc() and bio_kmalloc() comments by (almost) swapping them. This patch fixes that, by placing the comments in the right place. Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-11-02 11:41:13 +01:00
Alberto Bertogli	ad0bf11070	bio_put(): add bio_clone() to the list of functions in the comment In bio_put()'s comment, add bio_clone() to the list of functions that can give you a bio reference. Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-11-02 11:39:22 +01:00
Linus Torvalds	a80a66caf8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs * 'for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs: xfs: fix xfs_quota remove error xfs: free temporary cursor in xfs_dialloc	2009-10-31 12:12:49 -07:00
Ryota Yamauchi	c7ff91d722	xfs: fix xfs_quota remove error The xfs_quota returns ENOSYS when remove command is executed. Reproducable with following steps. # mount -t xfs -o uquota /dev/sda7 /mnt/mp1 # xfs_quota -x -c off -c remove XFS_QUOTARM: Function not implemented. The remove command is allowed during quotaoff, but xfs_fs_set_xstate() checks whether quota is running, and it leads to ENOSYS. To solve this problem, add a check for X_QUOTARM. Signed-off-by: Ryota Yamauchi <r-yamauchi@vf.jp.nec.com> Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2009-10-30 09:27:44 +01:00
Eric Sandeen	3b826386d3	xfs: free temporary cursor in xfs_dialloc Commit `bd16956599` seems to have a slight regression where this code path: if (!--searchdistance) { /* * Not in range - save last search * location and allocate a new inode */ ... goto newino; } doesn't free the temporary cursor (tcur) that got dup'd in this function. This leaks an item in the xfs_btree_cur zone, and it's caught on module unload: =========================================================== BUG xfs_btree_cur: Objects remaining on kmem_cache_close() ----------------------------------------------------------- It seems like maybe a single free at the end of the function might be cleaner, but for now put a del_cursor right in this code block similar to the handling in the rest of the function. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Christoph Hellwig <hch@lst.de>	2009-10-30 09:27:07 +01:00
Benjamin Herrenschmidt	5a1eb5c445	powerpc: Cleanup Kconfig selection of hugetlbfs support Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 15:03:54 +11:00
Coly Li	837711f862	ocfs2: return f_fsid info in ocfs2_statfs() Currently the f_fsid of struct kstatfs returned from ocfs2_statfs() is undefined (vfs layer fills in 0 as default). Since in some conditions, f_fsid value might be used in a (f_fsid, ino) pair to uniquely identify a file, ocfs2 should return a unique defined f_fsid value from ocfs2_statfs(). Because uuid_str is the same on big or litlle endian machine, it's endian consistent to use osb->uuid_str to generate f_fsid value. Signed-off-by: Coly Li <coly.li@suse.de> Cc: Sunil Mushran <sunil.mushran@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-10-29 15:02:20 -07:00
Linus Torvalds	68e71d1902	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: backing-dev: ensure that a removed bdi no longer has super_block referencing it block: use after free bug in __blkdev_get block: silently error unsupported empty barriers too	2009-10-29 09:17:19 -07:00
Linus Torvalds	0d43f5123d	Merge branch 'sh/for-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * 'sh/for-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: sh: Fix hugetlbfs dependencies for SH-3 && MMU configurations. sh: Document uImage.bin target in archhelp. sh: add uImage.bin target sh: rsk7203 CONFIG_MTD=n fix sh: Check for return_to_handler when unwinding the stack sh: Build fix: define more __movmem* symbols sh: __irq_entry annotate do_IRQ(). Fix up sh/powerpc conflicts in fs/Kconfig	2009-10-29 09:07:15 -07:00
Linus Torvalds	fb3165b59f	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4: The link() operation should return any delegation on the file NFSv4: Fix two unbalanced put_rpccred() issues. NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE nfs: Panic when commit fails	2009-10-29 09:02:24 -07:00
Linus Torvalds	36f8a53ff2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fixing to avoid invalid kfree() in cifs_get_tcp_session()	2009-10-29 09:02:01 -07:00
Linus Torvalds	0a53f1693c	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule powerpc: Minor cleanup to lib/Kconfig.debug powerpc: Minor cleanup to sound/ppc/Kconfig powerpc: Minor cleanup to init/Kconfig powerpc: Limit memory hotplug support to PPC64 Book-3S machines powerpc: Limit hugetlbfs support to PPC64 Book-3S machines powerpc: Fix compile errors found by new ppc64e_defconfig powerpc: Add a Book-3E 64-bit defconfig powerpc/booke: Fix xmon single step on PowerPC Book-E powerpc: Align vDSO base address powerpc: Fix segment mapping in vdso32 powerpc/iseries: Remove compiler version dependent hack powerpc/perf_events: Fix priority of MSR HV vs PR bits powerpc/5200: Update defconfigs drivers/serial/mpc52xx_uart.c: Use UPIO_MEM rather than SERIAL_IO_MEM powerpc/boot/dts: drop obsolete 'fsl5200-clocking' of: Remove nested function mpc5200: support for the MAN mpc5200 based board mucmc52 mpc5200: support for the MAN mpc5200 based board uc101	2009-10-29 08:59:06 -07:00
Linus Torvalds	2375669214	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix double IRELE in xfs_dqrele_inode	2009-10-29 08:18:25 -07:00
Jeff Mahoney	47f365eb57	hfs: fix oops on mount with corrupted btree extent records A particular fsfuzzer run caused an hfs file system to crash on mount. This is due to a corrupted MDB extent record causing a miscalculation of HFS_I(inode)->first_blocks for the extent tree. If the extent records are zereod out, it won't trigger the first_blocks special case. Instead it falls through to the extent code which we're still in the middle of initializing. This patch catches the 0 size extent records, reports the corruption, and fails the mount. Reported-by: Ramon de Carvalho Valle <rcvalle@linux.vnet.ibm.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:29 -07:00
Ben Hutchings	5c36fe3d87	hfsplus: refuse to mount volumes larger than 2TB As found in <http://bugs.debian.org/550010>, hfsplus is using type u32 rather than sector_t for some sector number calculations. In particular, hfsplus_get_block() does: u32 ablock, dblock, mask; ... map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask)); I am not confident that I can find and fix all cases where a sector number may be truncated. For now, avoid data loss by refusing to mount HFS+ volumes with more than 2^32 sectors (2TB). [akpm@linux-foundation.org: fix 32 and 64-bit issues] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Eric Sesterhenn <snakebyte@gmx.de> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:27 -07:00
Hugh Dickins	370c28def6	hwpoison: fix/proc/meminfo alignment Given such a long name, the kB count in /proc/meminfo's HardwareCorrupted line is being shown too far right (it does align with x86_64's VmallocChunk above, but I hope nobody will ever have that much corrupted!). Align it. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
Tao Ma	2f48d593b6	ocfs2: duplicate inline data properly during reflink. The old reflink fails to handle inodes with inline data and will oops if it encounters them. This patch copies inline data to the new inode. Extended attributes may still be refcounted. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com>	2009-10-28 22:48:23 -07:00
Tao Ma	87f4b1bb98	ocfs2: Move ocfs2_complete_reflink to the right place. As its name ocfs2_complete_reflink indicates, it should be called after all the work for reflink is done, so it really should be called after we reflink xattr successfully. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com>	2009-10-28 22:44:19 -07:00
Joel Becker	fb5cbe9efd	ocfs2: Return -EINVAL when a device is not ocfs2. In case of non-modular kernels the root filesystem is mounted by trying several filesystems. If ocfs2 was tried before the actual filesystem type, the mount would fail because ocfs2_sb_probe() returns -EAGAIN instead of -EINVAL. ocfs2 will now return -EINVAL properly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Reported-by: Laszlo Attila Toth <panther@balabit.hu>	2009-10-28 22:28:24 -07:00
Kumar Gala	0cd9ad73b8	powerpc: Limit hugetlbfs support to PPC64 Book-3S machines Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-27 16:42:41 +11:00
Paul Mundt	ffb4a73d89	sh: Fix hugetlbfs dependencies for SH-3 && MMU configurations. The hugetlb dependencies presently depend on SUPERH && MMU while the hugetlb page size definitions depend on CPU_SH4 or CPU_SH5. This unfortunately allows SH-3 + MMU configurations to enable hugetlbfs without a corresponding HPAGE_SHIFT definition, resulting in the build blowing up. As SH-3 doesn't support variable page sizes, we tighten up the dependenies a bit to prevent hugetlbfs from being enabled. These days we also have a shiny new SYS_SUPPORTS_HUGETLBFS, so switch to using that rather than adding to the list of corner cases in fs/Kconfig. Reported-by: Kristoffer Ericson <kristoffer.ericson@gmail.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2009-10-27 07:22:37 +09:00
Neil Brown	960cc0f4fe	block: use after free bug in __blkdev_get commit `0762b8bde9` (from 14 months ago) introduced a use-after-free bug which has just recently started manifesting in my md testing. I tried git bisect to find out what caused the bug to start manifesting, and it could have been the recent change to blk_unregister_queue (`48c0d4d4c0`) but the results were inconclusive. This patch certainly fixes my symptoms and looks correct as the two calls are now in the same order as elsewhere in that function. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-26 15:27:11 +01:00
Trond Myklebust	9a3936aac1	NFSv4: The link() operation should return any delegation on the file Otherwise, we have to wait for the server to recall it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-26 08:09:46 -04:00
Trond Myklebust	141aeb9f26	NFSv4: Fix two unbalanced put_rpccred() issues. Commits `29fba38b` (nfs41: lease renewal) and `fc01cea9` (nfs41: sequence operation) introduce a couple of put_rpccred() calls on credentials for which there is no corresponding get_rpccred(). See http://bugzilla.kernel.org/show_bug.cgi?id=14249 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-26 08:09:46 -04:00
Trond Myklebust	52567b03ca	NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE RFC 3530 states that when we recieve the error NFS4ERR_RESOURCE, we are not supposed to bump the sequence number on OPEN, LOCK, LOCKU, CLOSE, etc operations. The problem is that we map that error into EREMOTEIO in the XDR layer, and so the NFSv4 middle-layer routines like seqid_mutating_err(), and nfs_increment_seqid() don't recognise it. The fix is to defer the mapping until after the middle layers have processed the error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-23 14:46:42 -04:00
Terry Loftin	a8b40bc7e6	nfs: Panic when commit fails Actually pass the NFS_FILE_SYNC option to the server to avoid a Panic in nfs_direct_write_complete() when a commit fails. At the end of an nfs write, if the nfs commit fails, all the writes will be rescheduled. They are supposed to be rescheduled as NFS_FILE_SYNC writes, but the rpc_task structure is not completely intialized and so the option is not passed. When the rescheduled writes complete, the return indicates that they are NFS_UNSTABLE and we try to do another commit. This leads to a Panic because the commit data structure pointer was set to null in the initial (failed) commit attempt. Signed-off-by: Terry Loftin <terry.loftin@hp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-10-23 14:16:30 -04:00
Linus Torvalds	d995053d04	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: dnotify: ignore FS_EVENT_ON_CHILD inotify: fix coalesce duplicate events into a single event in special case inotify: deprecate the inotify kernel interface fsnotify: do not set group for a mark before it is on the i_list	2009-10-22 08:28:28 +09:00
Yinghai Lu	4223a4a155	nfs: Fix nfs_parse_mount_options() kfree() leak Fix a (small) memory leak in one of the error paths of the NFS mount options parsing code. Regression introduced in 2.6.30 by commit `a67d18f` (NFS: load the rpc/rdma transport module automatically). Reported-by: Yinghai Lu <yinghai@kernel.org> Reported-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-22 08:15:23 +09:00
Earl Chew	ad3960243e	fs: pipe.c null pointer dereference This patch fixes a null pointer exception in pipe_rdwr_open() which generates the stack trace: > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: > [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70 > [<ffffffff8028125c>] __dentry_open+0x13c/0x230 > [<ffffffff8028143d>] do_filp_open+0x2d/0x40 > [<ffffffff802814aa>] do_sys_open+0x5a/0x100 > [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67 The failure mode is triggered by an attempt to open an anonymous pipe via /proc/pid/fd/* as exemplified by this script: ============================================================= while : ; do { echo y ; sleep 1 ; } \| { while read ; do echo z$REPLY; done ; } & PID=$! OUT=$(ps -efl \| grep 'sleep 1' \| grep -v grep \| { read PID REST ; echo $PID; } ) OUT="${OUT%% }" DELAY=$((RANDOM 1000 / 32768)) usleep $((DELAY * 1000 + RANDOM % 1000 )) echo n > /proc/$OUT/fd/1 # Trigger defect done ============================================================= Note that the failure window is quite small and I could only reliably reproduce the defect by inserting a small delay in pipe_rdwr_open(). For example: static int pipe_rdwr_open(struct inode inode, struct file filp) { msleep(100); mutex_lock(&inode->i_mutex); Although the defect was observed in pipe_rdwr_open(), I think it makes sense to replicate the change through all the pipe_*_open() functions. The core of the change is to verify that inode->i_pipe has not been released before attempting to manipulate it. If inode->i_pipe is no longer present, return ENOENT to indicate so. The comment about potentially using atomic_t for i_pipe->readers and i_pipe->writers has also been removed because it is no longer relevant in this context. The inode->i_mutex lock must be used so that inode->i_pipe can be dealt with correctly. Signed-off-by: Earl Chew <earl_chew@agilent.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-22 08:11:44 +09:00
Andreas Gruenbacher	945526846a	dnotify: ignore FS_EVENT_ON_CHILD Mask off FS_EVENT_ON_CHILD in dnotify_handle_event(). Otherwise, when there is more than one watch on a directory and dnotify_should_send_event() succeeds, events with FS_EVENT_ON_CHILD set will trigger all watches and cause spurious events. This case was overlooked in commit `e42e2773`. #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <signal.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> static void create_event(int s, siginfo_t* si, void* p) { printf("create\n"); } static void delete_event(int s, siginfo_t* si, void* p) { printf("delete\n"); } int main (void) { struct sigaction action; char tmpdir, file; int fd1, fd2; sigemptyset (&action.sa_mask); action.sa_flags = SA_SIGINFO; action.sa_sigaction = create_event; sigaction (SIGRTMIN + 0, &action, NULL); action.sa_sigaction = delete_event; sigaction (SIGRTMIN + 1, &action, NULL); # define TMPDIR "/tmp/test.XXXXXX" tmpdir = malloc(strlen(TMPDIR) + 1); strcpy(tmpdir, TMPDIR); mkdtemp(tmpdir); # define TMPFILE "/file" file = malloc(strlen(tmpdir) + strlen(TMPFILE) + 1); sprintf(file, "%s/%s", tmpdir, TMPFILE); fd1 = open (tmpdir, O_RDONLY); fcntl(fd1, F_SETSIG, SIGRTMIN); fcntl(fd1, F_NOTIFY, DN_MULTISHOT \| DN_CREATE); fd2 = open (tmpdir, O_RDONLY); fcntl(fd2, F_SETSIG, SIGRTMIN + 1); fcntl(fd2, F_NOTIFY, DN_MULTISHOT \| DN_DELETE); if (fork()) { /* This triggers a create event / creat(file, 0600); / This triggers a create and delete event (!) */ unlink(file); } else { sleep(1); rmdir(tmpdir); } return 0; } Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>	2009-10-20 18:02:33 -04:00
Wei Yongjun	3de0ef4f20	inotify: fix coalesce duplicate events into a single event in special case If we do rename a dir entry, like this: rename("/tmp/ino7UrgoJ.rename1", "/tmp/ino7UrgoJ.rename2") rename("/tmp/ino7UrgoJ.rename2", "/tmp/ino7UrgoJ") The duplicate events should be coalesced into a single event. But those two events do not be coalesced into a single event, due to some bad check in event_compare(). It can not match the two NULL inodes as the same event. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2009-10-18 15:49:38 -04:00
Eric Paris	9f0d793b52	fsnotify: do not set group for a mark before it is on the i_list fsnotify_add_mark is supposed to add a mark to the g_list and i_list and to set the group and inode for the mark. fsnotify_destroy_mark_by_entry uses the fact that ->group != NULL to know if this group should be destroyed or if it's already been done. But fsnotify_add_mark sets the group and inode before it actually adds the mark to the i_list and g_list. This can result in a race in inotify, it requires 3 threads. sys_inotify_add_watch("file") sys_inotify_add_watch("file") sys_inotify_rm_watch([a]) inotify_update_watch() inotify_new_watch() inotify_add_to_idr() ^--- returns wd = [a] inotfiy_update_watch() inotify_new_watch() inotify_add_to_idr() fsnotify_add_mark() ^--- returns wd = [b] returns to userspace; inotify_idr_find([a]) ^--- gives us the pointer from task 1 fsnotify_add_mark() ^--- this is going to set the mark->group and mark->inode fields, but will return -EEXIST because of the race with [b]. fsnotify_destroy_mark() ^--- since ->group != NULL we call back into inotify_freeing_mark() which calls inotify_remove_from_idr([a]) since fsnotify_add_mark() failed we call: inotify_remove_from_idr([a]) <------WHOOPS it's not in the idr, this could have been any entry added later! The fix is to make sure we don't set mark->group until we are sure the mark is on the inode and fsnotify_add_mark will return success. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-10-18 15:49:38 -04:00
Linus Torvalds	8e7768d3f4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: fix socket fd translation dlm: fix lowcomms_connect_node for sctp	2009-10-15 15:21:20 -07:00
Linus Torvalds	dcbeb0bec5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: always pin metadata in discard mode Btrfs: enable discard support Btrfs: add -o discard option Btrfs: properly wait log writers during log sync Btrfs: fix possible ENOSPC problems with truncate Btrfs: fix btrfs acl #ifdef checks Btrfs: streamline tree-log btree block writeout Btrfs: avoid tree log commit when there are no changes Btrfs: only write one super copy during fsync	2009-10-15 15:06:37 -07:00
Neil Brown	83db93f4de	sysfs: Allow sysfs_notify_dirent to be called from interrupt context. sysfs_notify_dirent is a simple atomic operation that can be used to alert user-space that new data can be read from a sysfs attribute. Unfortunately it cannot currently be called from non-process context because of its use of spin_lock which is sometimes taken with interrupts enabled. So change all lockers of sysfs_open_dirent_lock to disable interrupts, thus making sysfs_notify_dirent safe to be called from non-process context (as drivers/md does in md_safemode_timeout). sysfs_get_open_dirent is (documented as being) only called from process context, so it uses spin_lock_irq. Other places use spin_lock_irqsave. The usage for sysfs_notify_dirent in md_safemode_timeout was introduced in 2.6.28, so this patch is suitable for that and more recent kernels. Reported-by: Joel Andres Granados <jgranado@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-10-14 15:16:25 -07:00
Cornelia Huck	a6a8357788	sysfs: Allow sysfs_move_dir(..., NULL) again. As device_move() and kobject_move() both handle a NULL destination, sysfs_move_dir() should do this as well (again) and fall back to sysfs_root in that case. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-10-14 15:16:25 -07:00
Chris Mason	444528b3e6	Btrfs: always pin metadata in discard mode We have an optimization in btrfs to allow blocks to be immediately freed if they were allocated in this transaction and never written. Otherwise they are pinned and freed when the transaction commits. This isn't optimal for discard mode because immediately freeing them means immediately discarding them. It is better to give the block to the pinning code and letting the (slow) discard happen later. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-14 10:32:50 -04:00
Christoph Hellwig	0634857488	Btrfs: enable discard support The discard support code in btrfs currently is guarded by ifdefs for BIO_RW_DISCARD, which is never defines as it's the name of an enum memeber. Just remove the useless ifdefs to actually enable the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-14 10:32:49 -04:00
Christoph Hellwig	e244a0aeb6	Btrfs: add -o discard option Enable discard by default is not a good idea given the the trim speed of SSD prototypes we've seen, and the carecteristics for many high-end arrays. Turn of discards by default and require the -o discard option to enable them on. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-14 10:32:49 -04:00
Yan, Zheng	86df7eb921	Btrfs: properly wait log writers during log sync A recently fsync optimization make btrfs_sync_log skip calling wait_for_writer in the single log writer case. This is incorrect since the writer count can also be increased by btrfs_pin_log. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-14 10:32:48 -04:00
Josef Bacik	5d5e103a70	Btrfs: fix possible ENOSPC problems with truncate There's a problem where we don't do any space reservation for truncates, which can cause you to OOPs because you will be allowed to go off in the weeds a bit since we don't account for the delalloc bytes that are created as a result of the truncate. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-14 10:32:47 -04:00
Alex Elder	ba313e68fa	Merge branch 'master' of ssh://oss.sgi.com/oss/git/xfs/xfs into for-linus	2009-10-13 15:47:22 -05:00
Christoph Hellwig	05277c75f6	xfs: fix double IRELE in xfs_dqrele_inode xfs_dqrele_inode calls xfs_iput to release the ilock and a reference and then also calls IRELE which does a second decrement of the reference count. This leads to a premature freeing of inodes when quotas were turned off while the filesystem was mounted. Thanks to Utako Kusaka for reporting the bug and provinding a good testcase. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Utako Kusaka <u-kusaka@wm.jp.nec.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-10-13 13:16:36 -05:00
Chris Mason	0eda294dfc	Btrfs: fix btrfs acl #ifdef checks The btrfs acl code was #ifdefing for a define that didn't exist. This correctly matches it to the values used by the Kconfig file. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-13 13:51:39 -04:00
Chris Mason	690587d109	Btrfs: streamline tree-log btree block writeout Syncing the tree log is a 3 phase operation. 1) write and wait for all the tree log blocks for a given root. 2) write and wait for all the tree log blocks for the tree of tree log roots. 3) write and wait for the super blocks (barriers here) This isn't as efficient as it could be because there is no requirement to wait for the blocks from step one to hit the disk before we start writing the blocks from step two. This commit changes the sequence so that we don't start waiting until all the tree blocks from both steps one and two have been sent to disk. We do this by breaking up btrfs_write_wait_marked_extents into two functions, which is trivial because it was already broken up into two parts. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-13 13:35:12 -04:00
Chris Mason	257c62e1bc	Btrfs: avoid tree log commit when there are no changes rpm has a habit of running fdatasync when the file hasn't changed. We already detect if a file hasn't been changed in the current transaction but it might have been sent to the tree-log in this transaction and not changed since the last call to fsync. In this case, we want to avoid a tree log sync, which includes a number of synchronous writes and barriers. This commit extends the existing tracking of the last transaction to change a file to also track the last sub-transaction. The end result is that rpm -ivh and -Uvh are roughly twice as fast, and on par with ext3. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-13 13:35:12 -04:00
Chris Mason	4722607db6	Btrfs: only write one super copy during fsync During a tree-log commit for fsync, we've been writing at least two copies of the super block and forcing them to disk. The other filesystems write only one, and this change brings us on par with them. A full transaction commit will write all the super copies, so we still have redundant info written on a regular basis. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-10-13 13:35:11 -04:00
Linus Torvalds	80f506918f	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cciss: Add cciss_allow_hpsa module parameter cciss: Fix multiple calls to pci_release_regions blk-settings: fix function parameter kernel-doc notation writeback: kill space in debugfs item name writeback: account IO throttling wait as iowait elv_iosched_store(): fix strstrip() misuse cfq-iosched: avoid probable slice overrun when idling cfq-iosched: apply bool value where we return 0/1 cfq-iosched: fix think time allowed for seekers cfq-iosched: fix the slice residual sign cfq-iosched: abstract out the 'may this cfqq dispatch' logic block: use proper BLK_RW_ASYNC in blk_queue_start_tag() block: Seperate read and write statistics of in_flight requests v2 block: get rid of kblock_schedule_delayed_work() cfq-iosched: fix possible problem with jiffies wraparound cfq-iosched: fix issue with rq-rq merging and fifo list ordering	2009-10-13 10:21:33 -07:00
Theodore Ts'o	96ec2e0a71	ext3: Don't update superblock write time when filesystem is read-only This avoids updating the superblock write time when we are mounting the root file system read/only but we need to replay the journal; at that point, for people who are east of GMT and who make their clock tick in localtime for Windows bug-for-bug compatibility, and this will cause e2fsck to complain and force a full file system check. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2009-10-13 00:06:43 +02:00
Stefan Richter	a1be9eee29	NFS: suppress a build warning struct sockaddr_storage * can safely be used as struct sockaddr *. Suppress an "incompatible pointer type" warning. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-12 10:25:12 -07:00

1 2 3 4 5 ...

15790 Commits