Commit Graph

27799 Commits

Author SHA1 Message Date
Al Viro
63a44583f3 qnx6: don't bother with ->i_dentry in inode-freeing callback
we'll initialize it in inode_init_always() when we allocate that
object again.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:48 +04:00
Al Viro
6ce6e24e72 get rid of magic in proc_namespace.c
don't rely on proc_mounts->m being the first field; container_of()
is there for purpose.  No need to bother with ->private, while
we are at it - the same container_of will do nicely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:48 +04:00
Al Viro
f7a99c5b7c get rid of ->mnt_longterm
it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:47 +04:00
Julia Lawall
d187663ef2 fs/direct-io.c: adjust suspicious bit operation
READ is 0, so the result of the bit-and operation is 0.  Rewrite with == as
done elsewhere in the same file.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).

Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:46 +04:00
Artem Bityutskiy
3dd847820d affs: get rid of affs_sync_super
This patch makes affs stop using the VFS '->write_super()' method along with
the 's_dirt' superblock flag, because they are on their way out.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblocks using the '->write_super()' call-back.  But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:45 +04:00
Artem Bityutskiy
a215fef7ed affs: introduce VFS superblock object back-reference
Add an 'sb' VFS superblock back-reference to the 'struct affs_sb_info' data
structure - we will need to find the VFS superblock from a 'struct
affs_sb_info' object in the next patch, so this change is jut a preparation.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:45 +04:00
Artem Bityutskiy
a837107439 affs: stop using lock_super
The VFS's 'lock_super()' and 'unlock_super()' calls are deprecated and unwanted
and just wait for a brave knight who'd kill them. This patch makes AFFS stop
using them and use the buffer-head's own lock instead.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:44 +04:00
Artem Bityutskiy
e0471c8d8a affs: re-structure superblock locking a bit
AFFS wants to serialize the superblock (the root block in AFFS terms) updates
and uses 'lock_super()/unlock_super()' for these purposes. This patch pushes the
locking down to the 'affs_commit_super()' from the callers.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:43 +04:00
Artem Bityutskiy
0164b1a32e affs: remove useless superblock writeout on remount
We do not need to write out the superblock from '->remount_fs()' because
VFS has already called '->sync_fs()' by this time and the superblock has
already been written out. Thus, remove the 'affs_write_super()'
infocation from 'affs_remount()'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:42 +04:00
Artem Bityutskiy
c9753b1d20 affs: remove useless superblock writeout on unmount
We do not need to write out the superblock from '->put_super()' because VFS has
already called '->sync_fs()' by this time and the superblock has already been
written out. Thus, remove the 'affs_commit_super()' infocation from
'affs_put_super()'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:42 +04:00
Artem Bityutskiy
bc86256d2e affs: stop setting bm_flags
AFFS stores values '1' and '2' in 'bm_flags', and I fail to see any logic when
it prefers one or another. AFFS writes '1' only from '->put_super()', while
'->sync_fs()' and '->write_super()' store value '2'.  So on the first glance,
it looks like we want to have '1' if we unmount.  However, this does not really
happen in these cases:
  1. superblock is written via 'write_super()' then we unmount;
  2. we re-mount R/O, then unmount.
which are quite typical.

I could not find good documentation describing this field, except of one random
piece of documentation in the internet which says that -1 means that the root
block is valid, which is not consistent with what we have in the Linux AFFS
driver.

Jan Kara commented on this: "I have some vague recollection that on Amiga
boolean was usually encoded as: 0 == false, ~0 == -1 == true. But it has been
ages..."

Thus, my conclusion is that value of '1' is as good as value of '2' and we can
just always use '2'. An Jan Kara suggested to go further: "generally bm_flags
handling looks strange. If they are 0, we mount fs read only and thus cannot
change them.  If they are != 0, we write 2 there. So IMHO if you just removed
bm_flags setting, nothing will really happen."

So this patch removes the bm_flags setting completely. This makes the "clean"
argument of the 'affs_commit_super()' function unneeded, so it is also removed.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:41 +04:00
Christoph Hellwig
1632dcc93f xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
xfs_bdstrat_cb only adds a check for a shutdown filesystem over
xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
filesystem a little earlier.  In addition the shutdown handling in
xfs_bdstrat_cb is not very suitable for this caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:49 -05:00
Christoph Hellwig
40a9b7963d xfs: prevent recursion in xfs_buf_iorequest
If the b_iodone handler is run in calling context in xfs_buf_iorequest we
can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
into xfs_buf_iorequest because an I/O error happened, which keeps calling
back into xfs_buf_iorequest.  This chain will usually not take long
because the filesystem gets shut down because of log I/O errors, but even
over a short time it can cause stack overflows if run on the same context.

As a short term workaround make sure we always call the iodone handler in
workqueue context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:39 -05:00
Dave Chinner
aa292847b9 xfs: don't defer metadata allocation to the workqueue
Almost all metadata allocations come from shallow stack usage
situations. Avoid the overhead of switching the allocation to a
workqueue as we are not in danger of running out of stack when
making these allocations. Metadata allocations are already marked
through the args that are passed down, so this is trivial to do.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Mel Gorman <mgorman@suse.de>
Tested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:27 -05:00
Dave Chinner
e3a746f5aa xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near
The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:18 -05:00
Linus Torvalds
4264e6a263 NFS client bugfixes for Linux 3.5
- Fix an NFSv4 mount regression
 - Fix O_DIRECT list manipulation snafus
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQADo8AAoJEGcL54qWCgDyu8wP/2entLv4jYgM7zp+/kOBhps6
 RpAv57YYv86pLk5gxoZg/JJs3z4QzS3CjMRXK11CEzRMt1NUqJfI6587TIl11DYW
 0UJcrAQo4X3EgNR7CCKAJaHUmYLayjUgquL8OjdBTuXCrpJzGdHcnBa71oF3xaEb
 jJhsA0W+qqBpa0295HH8sdHkURX1+46OdSd47+Dg+PZurkr0CRhjZo+DX+AkaSsU
 blM+pYUgu20NFaNUEfBphib3XMnnCGXc84g3gqjeOldRMijJGv1VcmhfsmJwmyLA
 1mImwZ2HDzySVLrbA/D7yeddZfXuTaf8krFmyP07U6I7hIEXCAEr16DmqCQCF1UB
 ppzZM9lu8f6df9tIlmjdA/hGOfs5OSQzD1L9Irn8xpRIWYmg+mrxXSzIG6wZE8zu
 n1EURW5CrLrlz2d7rdhqA+Y/GIkAvIO0/iBbJnvkMUdMPsd9/ikFPkHGzMhsK1KN
 AxEi2r+n0e8Hwaueu0EP685QjrEjOyLDdKIsAzNy/UEGN4v297UoW8ZyYnrzEIJG
 mQg6l4ke34zuw3AxoR3X4SuW329rGpG7x0pXNwqJG292T9jki334dPzJCy1LbaGl
 o1g+Z0BcyMe9VC7tRwOFEiGsbcd/OYPRbVVhu/3RlrWuVvj46QDvVpQ7+1pOr2s/
 4Xu70AlKw3B02ge621+p
 =bX2K
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix an NFSv4 mount regression
 - Fix O_DIRECT list manipulation snafus

* tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix an NFSv4 mount regression
  NFS: Fix list manipulation snafus in fs/nfs/direct.c
2012-07-13 10:58:45 -07:00
Dave Jones
8d657eb3b4 Remove easily user-triggerable BUG from generic_setlease
This can be trivially triggered from userspace by passing in something unexpected.

    kernel BUG at fs/locks.c:1468!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:generic_setlease+0xc2/0x100
    Call Trace:
      __vfs_setlease+0x35/0x40
      fcntl_setlease+0x76/0x150
      sys_fcntl+0x1c6/0x810
      system_call_fastpath+0x1a/0x1f

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org # 3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13 10:50:23 -07:00
Jeff Moyer
91f68c89d8 block: fix infinite loop in __getblk_slow
Commit 080399aaaf ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.

The problem was initially reported by Torsten Hilbrich here:

    https://lkml.org/lkml/2012/6/18/54

and also reported independently here:

    http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511

and then Richard W.M.  Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue.  This patch has been
confirmed to fix:

    https://bugzilla.redhat.com/show_bug.cgi?id=835019

The main problem is here, in __getblk_slow:

        for (;;) {
                struct buffer_head * bh;
                int ret;

                bh = __find_get_block(bdev, block, size);
                if (bh)
                        return bh;

                ret = grow_buffers(bdev, block, size);
                if (ret < 0)
                        return NULL;
                if (ret == 0)
                        free_more_memory();
        }

__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page.  I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding.  However, we also continue to loop for other cases, like the
block lying beond the end of the disk.  So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).

The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in call cases.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Josh Boyer <jwboyer@redhat.com>
Cc: Stable <stable@vger.kernel.org>  # 3.0+
[ Jens is on vacation, taking this directly  - Linus ]
--
Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13 08:36:35 -07:00
Mathias Krause
0143fc5e9f udf: avoid info leak on export
For type 0x51 the udf.parent_partref member in struct fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-13 11:21:21 +02:00
Mathias Krause
fe685aabf7 isofs: avoid info leak on export
For type 1 the parent_offset member in struct isofs_fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-13 11:21:21 +02:00
Liu Bo
10983f2e8d Btrfs: fix typo in convert_extent_bit
It should be convert_extent_bit.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-07-12 11:27:34 +02:00
Steven J. Magnani
5d8ecbbc28 fat: fix non-atomic NFS i_pos read
fat_encode_fh() can fetch an invalid i_pos value on systems where 64-bit
accesses are not atomic.  Make it use the same accessor as the rest of the
FAT code.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:47 -07:00
Bob Liu
fea9f718b3 fs: ramfs: file-nommu: add SetPageUptodate()
There is a bug in the below scenario for !CONFIG_MMU:

 1. create a new file
 2. mmap the file and write to it
 3. read the file can't get the correct value

Because

  sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page()

which causes the page to be zeroed.

Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that
generic_file_aio_read() do not call simple_readpage().

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:47 -07:00
Luis Henriques
a4e08d001f ocfs2: fix NULL pointer dereference in __ocfs2_change_file_space()
As ocfs2_fallocate() will invoke __ocfs2_change_file_space() with a NULL
as the first parameter (file), it may trigger a NULL pointer dereferrence
due to a missing check.

Addresses http://bugs.launchpad.net/bugs/1006012

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Reported-by: Bret Towe <magnade@gmail.com>
Tested-by: Bret Towe <magnade@gmail.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:43 -07:00
Dong Aisheng
475d009429 of: Improve prom_update_property() function
prom_update_property() currently fails if the property doesn't
actually exist yet which isn't what we want. Change to add-or-update
instead of update-only, then we can remove a lot duplicated lines.

Suggested-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Dong Aisheng <dong.aisheng@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-11 15:26:51 +10:00
Trond Myklebust
f1daf666dd NFSv4: Fix an NFSv4 mount regression
The helper nfs_fs_mount() will always call nfs4_try_mount with the
mount_info->fill_super argument pointing to nfs_fill_super, which is
NFSv2/v3 only.
Fix is to have nfs4_try_mount replace it with nfs4_fill_super.

The regression was introduced by commit c40f8d1d (NFS: Create a common
fs_mount() function)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-10 13:25:39 -04:00
Jan Kara
57b9655d01 udf: Improve table length check to avoid possible overflow
When a partition table length is corrupted to be close to 1 << 32, the
check for its length may overflow on 32-bit systems and we will think
the length is valid. Later on the kernel can crash trying to read beyond
end of buffer. Fix the check to avoid possible overflow.

CC: stable@vger.kernel.org
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-10 18:02:17 +02:00
Jan Kara
44f4f729e7 ext3: Check return value of blkdev_issue_flush()
blkdev_issue_flush() can fail. Make sure the error gets properly propagated.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 23:40:46 +02:00
Jan Kara
349ecd6a3c jbd: Check return value of blkdev_issue_flush()
blkdev_issue_flush() can fail. Make sure the error gets properly propagated.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 23:38:36 +02:00
Jan Kara
17dc59ba41 udf: Do not decrement i_blocks when freeing indirect extent block
Indirect extent block is not accounted in i_blocks during allocation
thus we should not decrement i_blocks when we are freeing such block
during truncation.

Reported-by: Steve Nickel <snickel58@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 13:24:21 +02:00
Jan Kara
bff943af6f udf: Fix memory leak when mounting
When we are mounting filesystem, we can load one partition table before
finding out that we cannot complete processing of logical volume descriptor
and trying the reserve descriptor. Free the table properly before trying
the reserve descriptor.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Wanlong Gao
e124a32043 ext2: cleanup the confused goto label
Cleanup the confused goto label, since the big lock has been removed.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Ashish Sangwan
a0e589b485 UDF: Remove unnecessary variable "offset" from udf_fill_inode
The variable "offset" is not needed. Remove it.

Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Artem Bityutskiy
db8109ef98 udf: stop using s_dirt
The UDF file-system does not need the 's_dirt' superblock flag because it does
not define the 'write_super()' method. This flag was set to 1 in few places and
set to 0 in '->sync_fs()' and was basically useless. Stop using it because it
is on its way out.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:11 +02:00
Eric Sandeen
f007dbf8e5 ext3: force ro mount if ext3_setup_super() fails
If ext3_setup_super() fails i.e. due to a too-high revision,
the error is logged in dmesg but the fs is not mounted RO as
indicated.

Tested by:

[164152.114551] EXT3-fs (sdb6): error: revision level too high, forcing read-only mode
/dev/sdb6 /mnt/test2 ext3 rw,seclabel,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0 0
                          ^^

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:11 +02:00
Jeff Liu
f3da93105b quota: fix checkpatch.pl warning by replacing <asm/uaccess.h> with <linux/uaccess.h>
checkpatch.pl warns:

"WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>"

Below patch fixes it.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:11 +02:00
Trond Myklebust
4035c2487f NFS: Fix list manipulation snafus in fs/nfs/direct.c
Fix 2 bugs in nfs_direct_write_reschedule:

 - The request needs to be removed from the 'reqs' list before it can
   be added to 'failed'.
 - Fix an infinite loop if the 'failed' list is non-empty.

Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-08 10:32:08 -04:00
Linus Torvalds
332a2e1244 vfs: make O_PATH file descriptors usable for 'fchdir()'
We already use them for openat() and friends, but fchdir() also wants to
be able to use O_PATH file descriptors.  This should make it comparable
to the O_SEARCH of Solaris.  In particular, O_PATH allows you to access
(not-quite-open) a directory you don't have read persmission to, only
execute permission.

Noticed during development of multithread support for ksh93.

Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org    # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-07 17:19:02 -07:00
Linus Torvalds
26c439d400 Fixes an incorrect access mode check when preparing to open a file in the lower
filesystem. This isn't an urgent fix, but it is simple and the check was
 obviously incorrect.
 
 Also fixes a couple important bugs in the eCryptfs miscdev interface. These
 changes are low risk due to the small number of users that use the miscdev
 interface. I was able to keep the changes minimal and I have some cleaner, more
 complete changes queued up for the next merge window that will build on these
 patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCgAGBQJP92LsAAoJENaSAD2qAscKCWkP/3BWpv0AS0fnrZPniXv/+vjf
 gdV4NcQhE/86VsQ7CtZS7jqfSVTzm+YTta9BTKj6jWZuGUZGcjXsZdyMpleBZukh
 TvRSW3HKCRtC8XNHzle3YUukD1o465nMEiCUQOYcWjAa3in7cZTiFU+3S2Unn5UF
 yh2Slfzjxkl2EUHEbcBiBayzaMH2gqwAvRR4sjM0P175m/jjDF6pDGT5vc0skvcP
 kLzFr/3Ia9BW1nU0yblTtSNcHzYV8GTJVEpj1NR7q59x2gVJubF6hBDtbZdaaGK0
 rYlKV+w9mRwzUCuVdb4zPCa9EGrbqH4gYvIWsCW+R0zoK57rfIRolQVYEglGE2TU
 K3HHL6UOsPASZCQqhi+K+tCmYtZaCfeMhDRgxyDOaxS4rQ6dy+XO6f9zM30qw1UB
 QHeVEQl7bM0IpByCcjVbuNJT4zTlW7xmsLm/pbGv60UBdZpqaUZptEBEpgUFjq30
 shgNLlHHWvelhf52gbff+ytCHf+IDVPT/Q2aGjhC2fgqWiQno44vR88gtMQz6b7g
 4yEL7t0TqBB9jCBu/ikTITGpRH5S149e3oYGm2P/+YYZUGlw0Gf9N6TBkctJFSg/
 /vk6aobMnjfxmeM80xOKey5Y1zDis660sgt1hX8NVAuo4hp7VQfWGhEZ8lYqzCzP
 aJci4ZXaDzwXx6UCC5w2
 =TEei
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
 "Fixes an incorrect access mode check when preparing to open a file in
  the lower filesystem.  This isn't an urgent fix, but it is simple and
  the check was obviously incorrect.

  Also fixes a couple important bugs in the eCryptfs miscdev interface.
  These changes are low risk due to the small number of users that use
  the miscdev interface.  I was able to keep the changes minimal and I
  have some cleaner, more complete changes queued up for the next merge
  window that will build on these patches."

* tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
  eCryptfs: Fix lockdep warning in miscdev operations
  eCryptfs: Properly check for O_RDONLY flag before doing privileged open
2012-07-06 15:32:18 -07:00
Tyler Hicks
8dc6780587 eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
File operations on /dev/ecryptfs would BUG() when the operations were
performed by processes other than the process that originally opened the
file. This could happen with open files inherited after fork() or file
descriptors passed through IPC mechanisms. Rather than calling BUG(), an
error code can be safely returned in most situations.

In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
release even if the last file reference is being held by a process that
didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
be successful, so a pointer to the daemon is stored in the file's
private_data. The private_data pointer is initialized when the miscdev
file is opened and only used when the file is released.

https://launchpad.net/bugs/994247

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
2012-07-06 15:51:12 -05:00
Linus Torvalds
1b7fa4c271 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
Pull ocfs2 fixes from Joel Becker.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  aio: make kiocb->private NUll in init_sync_kiocb()
  ocfs2: Fix bogus error message from ocfs2_global_read_info
  ocfs2: for SEEK_DATA/SEEK_HOLE, return internal error unchanged if ocfs2_get_clusters_nocache() or ocfs2_inode_lock() call failed.
  ocfs2: use spinlock irqsave for downconvert lock.patch
  ocfs2: Misplaced parens in unlikley
  ocfs2: clear unaligned io flag when dio fails
2012-07-06 10:04:39 -07:00
Linus Torvalds
064ea1ae80 Merge git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize
  cifs: fix parsing of password mount option
2012-07-06 10:02:12 -07:00
Linus Torvalds
5eecb9cc90 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "I held off on my rc5 pull because I hit an oops during log recovery
  after a crash.  I wanted to make sure it wasn't a regression because
  we have some logging fixes in here.

  It turns out that a commit during the merge window just made it much
  more likely to trigger directory logging instead of full commits,
  which exposed an old bug.

  The new backref walking code got some additional fixes.  This should
  be the final set of them.

  Josef fixed up a corner where our O_DIRECT writes and buffered reads
  could expose old file contents (not stale, just not the most recent).
  He and Liu Bo fixed crashes during tree log recover as well.

  Ilya fixed errors while we resume disk balancing operations on
  readonly mounts."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: run delayed directory updates during log replay
  Btrfs: hold a ref on the inode during writepages
  Btrfs: fix tree log remove space corner case
  Btrfs: fix wrong check during log recovery
  Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
  Btrfs: resume balance on rw (re)mounts properly
  Btrfs: restore restriper state on all mounts
  Btrfs: fix dio write vs buffered read race
  Btrfs: don't count I/O statistic read errors for missing devices
  Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
  Btrfs: fix tree mod log rewind of ADD operations
  Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
  Btrfs: always put insert_ptr modifications into the tree mod log
  Btrfs: fix tree mod log for root replacements at leaf level
  Btrfs: support root level changes in __resolve_indirect_ref
  Btrfs: avoid waiting for delayed refs when we must not
2012-07-05 13:06:25 -07:00
Greg Kroah-Hartman
6fbfd0592e Merge v3.5-rc5 into driver-core-next
This picks up the big printk fixes, and resolves a merge issue with:
	drivers/extcon/extcon_gpio.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-05 08:25:34 -07:00
Jan Kara
a4564ead76 ocfs2: Fix bogus error message from ocfs2_global_read_info
'status' variable in ocfs2_global_read_info() is always != 0 when leaving the
function because it happens to contain number of read bytes. Thus we always log
error message although everything is OK. Since all error cases properly call
mlog_errno() before jumping to out_err, there's no reason to call mlog_errno()
on exit at all. This is a fallout of c1e8d35e (conversion of mlog_exit()
calls).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:17 -07:00
Jeff Liu
65622e647b ocfs2: for SEEK_DATA/SEEK_HOLE, return internal error unchanged if ocfs2_get_clusters_nocache() or ocfs2_inode_lock() call failed.
Hello,

Since ENXIO only means "offset beyond EOF" for SEEK_DATA/SEEK_HOLE,
Hence we should return the internal error unchanged if ocfs2_inode_lock() or
ocfs2_get_clusters_nocache() call failed rather than ENXIO.
Otherwise, it will confuse the user applications when they trying to understand the root cause.

Thanks Dave for pointing this out.

Thanks,
-Jeff

Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:16 -07:00
Srinivas Eeda
a75e9ccabd ocfs2: use spinlock irqsave for downconvert lock.patch
When ocfs2dc thread holds dc_task_lock spinlock and receives soft IRQ it
deadlock itself trying to get same spinlock in ocfs2_wake_downconvert_thread.
Below is the stack snippet.

The patch disables interrupts when acquiring dc_task_lock spinlock.

	ocfs2_wake_downconvert_thread
	ocfs2_rw_unlock
	ocfs2_dio_end_io
	dio_complete
	.....
	bio_endio
	req_bio_endio
	....
	scsi_io_completion
	blk_done_softirq
	__do_softirq
	do_softirq
	irq_exit
	do_IRQ
	ocfs2_downconvert_thread
	[kthread]

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:15 -07:00
roel
16865b7c42 ocfs2: Misplaced parens in unlikley
Fix misplaced parentheses

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:13 -07:00
Junxiao Bi
3e5d3c35a6 ocfs2: clear unaligned io flag when dio fails
The unaligned io flag is set in the kiocb when an unaligned
dio is issued, it should be cleared even when the dio fails,
or it may affect the following io which are using the same
kiocb.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:26:50 -07:00
Tyler Hicks
60d65f1f07 eCryptfs: Fix lockdep warning in miscdev operations
Don't grab the daemon mutex while holding the message context mutex.
Addresses this lockdep warning:

 ecryptfsd/2141 is trying to acquire lock:
  (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]

 but task is already holding lock:
  (&(*daemon)->mux){+.+...}, at: [<ffffffffa029c2ec>] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(*daemon)->mux){+.+...}:
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c5d7>] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
        [<ffffffffa029b744>] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
        [<ffffffffa029a24e>] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
        [<ffffffffa02960f8>] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
        [<ffffffffa0290f80>] ecryptfs_create+0x130/0x250 [ecryptfs]
        [<ffffffff811963a4>] vfs_create+0xb4/0x120
        [<ffffffff81197865>] do_last+0x8c5/0xa10
        [<ffffffff811998f9>] path_openat+0xd9/0x460
        [<ffffffff81199da2>] do_filp_open+0x42/0xa0
        [<ffffffff81187998>] do_sys_open+0xf8/0x1d0
        [<ffffffff81187a91>] sys_open+0x21/0x30
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

 -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
        [<ffffffff810a3418>] __lock_acquire+0x1bf8/0x1c50
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
        [<ffffffff811887d3>] vfs_read+0xb3/0x180
        [<ffffffff811888ed>] sys_read+0x4d/0x90
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-07-03 16:34:10 -07:00
Tyler Hicks
9fe79d7600 eCryptfs: Properly check for O_RDONLY flag before doing privileged open
If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.

The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-07-03 16:34:09 -07:00
Linus Torvalds
a3da2c6913 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block bits from Jens Axboe:
 "As vacation is coming up, thought I'd better get rid of my pending
  changes in my for-linus branch for this iteration.  It contains:

   - Two patches for mtip32xx.  Killing a non-compliant sysfs interface
     and moving it to debugfs, where it belongs.

   - A few patches from Asias.  Two legit bug fixes, and one killing an
     interface that is no longer in use.

   - A patch from Jan, making the annoying partition ioctl warning a bit
     less annoying, by restricting it to !CAP_SYS_RAWIO only.

   - Three bug fixes for drbd from Lars Ellenberg.

   - A fix for an old regression for umem, it hasn't really worked since
     the plugging scheme was changed in 3.0.

   - A few fixes from Tejun.

   - A splice fix from Eric Dumazet, fixing an issue with pipe
     resizing."

* 'for-linus' of git://git.kernel.dk/linux-block:
  scsi: Silence unnecessary warnings about ioctl to partition
  block: Drop dead function blk_abort_queue()
  block: Mitigate lock unbalance caused by lock switching
  block: Avoid missed wakeup in request waitqueue
  umem: fix up unplugging
  splice: fix racy pipe->buffers uses
  drbd: fix null pointer dereference with on-congestion policy when diskless
  drbd: fix list corruption by failing but already aborted reads
  drbd: fix access of unallocated pages and kernel panic
  xen/blkfront: Add WARN to deal with misbehaving backends.
  blkcg: drop local variable @q from blkg_destroy()
  mtip32xx: Create debugfs entries for troubleshooting
  mtip32xx: Remove 'registers' and 'flags' from sysfs
  blkcg: fix blkg_alloc() failure path
  block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
  block: fix return value on cfq_init() failure
  mtip32xx: Remove version.h header file inclusion
  xen/blkback: Copy id field when doing BLKIF_DISCARD.
2012-07-03 15:45:10 -07:00
Jeff Layton
ec01d738a1 cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize
When the server doesn't advertise CAP_LARGE_READ_X, then MS-CIFS states
that you must cap the size of the read at the client's MaxBufferSize.
Unfortunately, testing with many older servers shows that they often
can't service a read larger than their own MaxBufferSize.

Since we can't assume what the server will do in this situation, we must
be conservative here for the default. When the server can't do large
reads, then assume that it can't satisfy any read larger than its
MaxBufferSize either.

Luckily almost all modern servers can do large reads, so this won't
affect them. This is really just for older win9x and OS/2 era servers.
Also, note that this patch just governs the default rsize. The admin can
always override this if he so chooses.

Cc: <stable@vger.kernel.org> # 3.2
Reported-by: David H. Durgee <dhdurgee@acm.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steven French <sfrench@w500smf.(none)>
2012-07-03 12:54:42 -05:00
Chris Mason
b6305567e7 Btrfs: run delayed directory updates during log replay
While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.

This commit forces the delayed updates to run so the
replay code can find any modifications done.  It stops
us from crashing because the directory deleltion replay
expects items to be removed immediately from the tree.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
2012-07-02 15:39:19 -04:00
Josef Bacik
7fd1a3f73f Btrfs: hold a ref on the inode during writepages
We can race with unlink and not actually be able to do our igrab in
btrfs_add_ordered_extent.  This will result in all sorts of problems.
Instead of doing the complicated work to try and handle returning an error
properly from btrfs_add_ordered_extent, just hold a ref to the inode during
writepages.  If we cannot grab a ref we know we're freeing this inode anyway
and can just drop the dirty pages on the floor, because screw them we're
going to invalidate them anyway.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:18 -04:00
Josef Bacik
bdb7d303b3 Btrfs: fix tree log remove space corner case
The tree log stuff can have allocated space that we end up having split
across a bitmap and a real extent.  The free space code does not deal with
this, it assumes that if it finds an extent or bitmap entry that the entire
range must fall within the entry it finds.  This isn't necessarily the case,
so rework the remove function so it can handle this case properly.  This
fixed two panics the user hit, first in the case where the space was
initially in a bitmap and then in an extent entry, and then the reverse
case.  Thanks,

Reported-and-tested-by: Shaun Reich <sreich@kde.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:18 -04:00
Liu Bo
6bf02314d9 Btrfs: fix wrong check during log recovery
When we're evicting an inode during log recovery, we need to ensure that the inode
is not in orphan state any more, which means inode's run_time flags has _no_
BTRFS_INODE_HAS_ORPHAN_ITEM.  Thus, the BUG_ON was triggered because of a wrong
check for the flags.

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:17 -04:00
Alexander Block
d3a94048c9 Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
We used the wrong ioctl macro for the getflags ioctl before.
As we don't have the set/getflags ioctls in the user space ioctl.h
at the moment, it's safe to fix it now.

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
2012-07-02 15:39:17 -04:00
Ilya Dryomov
2b6ba629b5 Btrfs: resume balance on rw (re)mounts properly
This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-07-02 15:39:17 -04:00
Ilya Dryomov
68310a5e42 Btrfs: restore restriper state on all mounts
Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts.  This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-07-02 15:39:16 -04:00
Josef Bacik
c3473e8300 Btrfs: fix dio write vs buffered read race
Miao pointed out there's a problem with mixing dio writes and buffered
reads.  If the read happens between us invalidating the page range and
actually locking the extent we can bring in pages into page cache.  Then
once the write finishes if somebody tries to read again it will just find
uptodate pages and we'll read stale data.  So we need to lock the extent and
check for uptodate bits in the range.  If there are uptodate bits we need to
unlock and invalidate again.  This will keep this race from happening since
we will hold the extent locked until we create the ordered extent, and then
teh read side always waits for ordered extents.  There was also a race in
how we updated i_size, previously we were relying on the generic DIO stuff
to adjust the i_size after the DIO had completed, but this happens outside
of the extent lock which means reads could come in and not see the updated
i_size.  So instead move this work into where we create the extents, and
then this way the update ordered i_size stuff works properly in the endio
handlers.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-07-02 15:36:23 -04:00
Stefan Behrens
597a60fade Btrfs: don't count I/O statistic read errors for missing devices
It is normal behaviour of the low level btrfs function btrfs_map_bio()
to complete a bio with -EIO if the device is missing, instead of just
preventing the bio creation in an earlier step.
This used to cause I/O statistic read error increments and annoying
printk_ratelimited messages. This commit fixes the issue.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reported-by: Carey Underwood <cwillu@cwillu.com>
2012-07-02 15:36:23 -04:00
Linus Torvalds
221d3ebf3a Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes from Jan Kara:
 "Make UDF more robust in presence of corrupted filesystem"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fortify loading of sparing table
  udf: Avoid run away loop when partition table length is corrupted
  udf: Use 'ret' instead of abusing 'i' in udf_load_logicalvol()
2012-06-28 11:43:45 -07:00
Linus Torvalds
9a7c6b73c4 Fix the debugfs regression - we never enable it because incorrect
'IS_ENABLED()' macro usage: should be 'IS_ENABLED(CONFIG_DEBUG_FS)',
 but we had 'IS_ENABLED(DEBUG_FS)'. Also fix incorrect assertion.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP7Hx+AAoJECmIfjd9wqK04FEP/3NC0qMlleuEcHv9AFwOJ1PB
 rn1PDjz6kuEjZ1/xhGFoboBOUj577qyzIP6IvG7MqSH66Yc8gBF+sPbKWWglx7wB
 i2y7/Fi4lHM3w59GfvnOhdI7McklhPyl3R183MZ3EBJk4V0LPi2rXsl7G5puLNgG
 XkJuOjXLXZPgyeMR+DlBsoaaxBMihnh/pdpUAyLER1cQdzQwCzba82tNrMgnCp7i
 TIFTPtn+LmEQyHcqXx5ub/FV6BiEUXIJbkKlp5Ajqyh/olSNtGdCHRrP5MXpU2kI
 DmtyMvp+3PBHxrUYQjXT6uerL9uXhIUyRv49qO0tS68fUg44JCFflmPkoV9qEvvl
 ADbOqklx1DWdVCiZdhXWe1GFhf6U+TOoUyeiIzGIy0fIIlycNl915F4LzxVUqQKm
 yoqouEvzqd1LIAsopakF2DDIKoK6ViWmHBkuN04B+u+iab4DC4aX3vgxq+Ie8mqA
 0QNIamovk/2MR2665XhbARu0yDSEmGZvD6dkuSAgXIxjw7tdvvlY7pkSouWTOSR0
 fbqPrhgbRON+mT4Fcrb5dMq+PAiOTw5kp7az90+U6i1oLm5TY8CaHt3UvmdspOM/
 UeHRhLR8o/RhfnvnexiOxIWtQGCX+CCuePe9oN/fQabj9dfvKI1iR+zoyheFLwiT
 HxaU6I7oVYQVkeifNJVV
 =iZTv
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.5-rc5' of git://git.infradead.org/linux-ubifs

Pull ubi/ubifs fixes from Artem Bityutskiy:
 "Fix the debugfs regression - we never enable it because incorrect
  'IS_ENABLED()' macro usage: should be 'IS_ENABLED(CONFIG_DEBUG_FS)',
  but we had 'IS_ENABLED(DEBUG_FS)'.  Also fix incorrect assertion."

* tag 'upstream-3.5-rc5' of git://git.infradead.org/linux-ubifs:
  UBI: correct usage of IS_ENABLED()
  UBIFS: correct usage of IS_ENABLED()
  UBIFS: fix assertion
2012-06-28 11:41:43 -07:00
Jan Kara
1df2ae31c7 udf: Fortify loading of sparing table
Add sanity checks when loading sparing table from disk to avoid accessing
unallocated memory or writing to it.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-06-28 19:31:09 +02:00
Jan Kara
adee11b208 udf: Avoid run away loop when partition table length is corrupted
Check provided length of partition table so that (possibly maliciously)
corrupted partition table cannot cause accessing data beyond current buffer.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-06-28 19:30:58 +02:00
Jan Kara
cb14d340ef udf: Use 'ret' instead of abusing 'i' in udf_load_logicalvol()
Signed-off-by: Jan Kara <jack@suse.cz>
2012-06-28 19:30:40 +02:00
Masatake YAMATO
44b8db1386 GFS2: Fixing double brelse'ing bh allocated in gfs2_meta_read when EIO occurs
This patch fixes buffer_head double free in following code path:

gfs2_block_map
=> gfs2_meta_inode_buffer
 => gfs2_meta_indirect_buffer
  => gfs2_meta_read
=> release_metapath

gfs2_block_map calls gfs2_meta_inode_buffer with &mp.mp_bh[0]
as an argument. mp.mp_bh are filled with zero at the beginning
of gfs2_block_map.

If gfs2_meta_inode_buffer returns non-zero value, gfs2_block_map
calls release_metapath to free buffers chained to mp.mp_bh.
release_metapath checks each slot of mp.mp_bh[i] and
free(with brelse) unless the slot is filled with NULL.

&mp.mp_bh[0] passed to gfs2_meta_inode_buffer is filled at
gfs2_meta_read. gfs2_meta_read is filled a buffer allocated with
gfs2_getbuf even if EIO occurs. When EIO occurs, the allocated buffer
is brelse'ed though the pointer(wrong poiner) points the brelse'ed is
passed back to caller via an argument bhp.

gfs2_meta_indirect_buffer, the caller also pass the wrong pointer
to its caller with EIO. Finally gfs2_block_map gets both EIO and
&mp.mp_bh[0] filled with the wrong pointer. release_metapath
calls brelse again on the wrong pointer.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-28 15:35:47 +01:00
Jan Schmidt
d42244a0d3 Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
With the tree mod log, we may end up with two roots (the current root and a
rewinded version of it) both pointing to two leaves, l1 and l2, of which l2
had already been cow-ed in the current transaction. If we don't rewind any
tree blocks, we cannot have two roots both pointing to an already cowed tree
block.

Now there is btrfs_next_leaf, which has a leaf locked and wants a lock on
the next (right) leaf. And there is push_leaf_left, which has a (cowed!)
leaf locked and wants a lock on the previous (left) leaf.

In order to solve this dead lock situation, we use try_lock in
btrfs_next_leaf (only in case it's called with a tree mod log time_seq
paramter) and if we fail to get a lock on the next leaf, we give up our lock
on the current leaf and retry from the very beginning.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:40 +02:00
Jan Schmidt
19956c7e94 Btrfs: fix tree mod log rewind of ADD operations
When a MOD_LOG_KEY_ADD operation is rewinded, we remove the key from the
tree block. If its not the last key, removal involves a move operation.
This move operation was explicitly done before this commit.

However, at insertion time, there's a move operation before the actual
addition to make room for the new key, which is recorded in the tree mod
log as well. This means, we must drop the move operation when rewinding the
add operation, because the next operation we'll be rewinding will be the
corresponding MOD_LOG_MOVE_KEYS operation.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:40 +02:00
Jan Schmidt
155725c9c0 Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
When delayed refs exist, btrfs_find_all_roots used to hold the delayed ref
mutex way longer than actually required. We ought to drop it immediately
after we're done collecting all the delayed refs.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:39 +02:00
Jan Schmidt
c3e0696523 Btrfs: always put insert_ptr modifications into the tree mod log
Several callers of insert_ptr set the tree_mod_log parameter to 0 to avoid
addition to the tree mod log. In fact, we need all of those operations. This
commit simply removes the additional parameter and makes addition to the
tree mod log unconditional.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:39 +02:00
Jan Schmidt
28da9fb446 Btrfs: fix tree mod log for root replacements at leaf level
For the tree mod log, we don't log any operations at leaf level. If the root
is at the leaf level (i.e. the tree consists only of the root), then
__tree_mod_log_oldest_root will find a ROOT_REPLACE operation in the log
(because we always log that one no matter which level), but no other
operations.

With this patch __tree_mod_log_oldest_root exits cleanly instead of
BUGging in this situation. get_old_root checks if its really a root at leaf
level in case we don't have any operations and WARNs if this assumption
breaks.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:38 +02:00
Jan Schmidt
9345457f4a Btrfs: support root level changes in __resolve_indirect_ref
With the tree mod log, we can have a tree that's two levels high, but
btrfs_search_old_slot may still return a path with the tree root at level
one instead. __resolve_indirect_ref must care for this and accept parents in
a lower level than expected.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:38 +02:00
Jan Schmidt
8ca78f3eda Btrfs: avoid waiting for delayed refs when we must not
We track two conditions to decide if we should sleep while waiting for more
delayed refs, the number of delayed refs (num_refs) and the first entry in
the list of blockers (first_seq).

When we suspect staleness, we save num_refs and do one more cycle. If
nothing changes, we then save first_seq for later comparison and do
wait_event. We ought to save first_seq the very same moment we're saving
num_refs. Otherwise we cannot be sure that nothing has changed and we might
start waiting when we shouldn't, which could lead to starvation.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:35 +02:00
Brian Norris
2d4cf5ae12 UBIFS: correct usage of IS_ENABLED()
Commit "818039c UBIFS: fix debugfs-less systems support" fixed one
regression but introduced a different regression - the debugfs is now always
compiled out. Root cause: IS_ENABLED() arguments should be used with the
CONFIG_* prefix.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-06-27 14:22:15 +03:00
Greg Kroah-Hartman
bcc66c0b88 Merge 3.5-rc4 into staging-next
This picks up the staging changes made in 3.5-rc4 so that everyone can sync up
properly.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-25 09:31:00 -07:00
Linus Torvalds
002b758b6d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
 "There are a couple of fixes from Yan for bad pointer dereferences in
  the messenger code and when fiddling with page->private after page
  migration, a fix from Alex for a use-after-free in the osd client
  code, and a couple fixes for the message refcounting and shutdown
  ordering."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: flush msgr queue during mon_client shutdown
  rbd: Clear ceph_msg->bio_iter for retransmitted message
  libceph: use con get/put ops from osd_client
  libceph: osd_client: don't drop reply reference too early
  ceph: check PG_Private flag before accessing page->private
2012-06-22 17:47:08 -07:00
Linus Torvalds
369c4f542f Fixes for 3.5-rc
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJP435OAAoJENaLyazVq6ZOms0P/38KYwNpgGgoeO57ZNXtGXen
 C98aa0IwFkjNUFPIogJD4e6gcxfxI9d+626xFZpnkoIEXXpEco5xexjBSIRfg3d7
 rNB4HgyQvybJrmRimqyIonTq5DVhQNnlxfYLJKtpM8dhSodNF3YGWmXMcXRSvoZO
 D1gMXJxTCoZJ5HjVMFRUOfgKX8RYrM7zNmrzMnefUOkuyFN2Dll7ZxGil05GKnQn
 IYds1DvRnyge8o4b7tRUrI50YM3j/w1HgtEDuquQ6UWvOCdbivpPH/+JaP5yD6kQ
 H39UBmi+2orC5m8AbDGGaeNuWC844emsLHDkZ63YTrDlEPwo7XZZQ7q8PgKsfqWz
 wzOUj9K0VAmcRmPjLsgwktsZYr+tyNYW+fYz6NzlSTtBK8fTyTPiyTNEJ4fEA42O
 poRbwH1yoM0YLyII+I+dOb2gk7mOsJa7QMYUE+Art6QWapfUOvQEaIewIEoMyJL9
 5Tl4jAZzso0aHmz/72s7CgV7j3gUrOCKGklPYIe5pGt5504CUxypjF2E01cqV5hJ
 QNz3LuC8Koraj787DZqY/w7Kk+SNsyzBOidlgy7hgjJEmNrsXAtweyGXH1gKOF3G
 ZstBsrgCacgPi+1ixJNJBCDbnM0p6XUZhqFT76aV78CaMgKfslzbgS9qFeCEBS2h
 kMvFaTHC9obmAGikOD4f
 =QLde
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-Jun-21-2012' of git://oss.sgi.com/xfs/xfs

Pull XFS fixes from Ben Myers:
 - Fix stale data exposure with unwritten extents
 - Fix a warning in xfs_alloc_vextent with ODEBUG
 - Fix overallocation and alignment of pages for xfs_bufs
 - Fix a cursor leak
 - Fix a log hang
 - Fix a crash related to xfs_sync_worker
 - Rename xfs log structure from struct log to struct xlog so we can use
   crash dumps effectively

* tag 'for-linus-Jun-21-2012' of git://oss.sgi.com/xfs/xfs:
  xfs: rename log structure to xlog
  xfs: shutdown xfs_sync_worker before the log
  xfs: Fix overallocation in xfs_buf_allocate_memory()
  xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
  xfs: check for stale inode before acquiring iflock on push
  xfs: fix debug_object WARN at xfs_alloc_vextent()
  xfs: xfs_vm_writepage clear iomap_valid when !buffer_uptodate (REV2)
2012-06-22 11:07:55 -07:00
Linus Torvalds
636040b4ed NFS client bugfixes for Linux 3.5
Fixes include:
 - Fix a write hang due to an uninitalised variable when !defined(CONFIG_NFS_V4)
 - Address upcall races in the legacy NFSv4 idmapper
 - Remove an O_DIRECT refcounting issue
 - Fix a pNFS refcounting bug when the file layout metadata server is also
   acting as a data server
 - Fix a pNFS module loading race.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP46PEAAoJEGcL54qWCgDyClwP/RlcSUAgTeFo3VFcedMdZqKN
 cdYzzyT2r0rzxtEOkdE1aFqukspMTx6cU83opHYJKYP66stkx98JW0+LVcsg8vtm
 SKjYZRAM/xsZVo+m8E3iQ9Z7K0kn1W/+OSwzJO7arIqo++fV8aiGn4+Gpgx9SrWS
 FU5iC7p1LThOZks3Nis0VLbLDpS058vRJgyfCzTS1NyjABHOOEYb6JhhkYeXLH7G
 vn0QRGXyq2sGxUYhR3BNWdRMn9XV5p5mOUoVjLxPBV84gm7wIjYKOGJHQRSn4Ew/
 CEYAvBksGpT7ifJflkzgg8acVSvuq7HacMpHj9O9SpT5aesvVuhcm3pxUN1YJo6m
 WNRj3kqgc6eCuIbiA+ENZuHIsLDzOFw3H4RhKCPj7C9HG88nFGIhrGJ9OIIut1AF
 X81L5aTox3UASZXuieZ0dAqVyTH7n288SSTzYaYy5O++4cW4hqZt7wQegr8Sk6b9
 8zrWXkLjTNGFZo3mAhlgZf5qV3UYt/yNCk9U/1JxvH+1tPTvfYpqavdXusMJ03rn
 7z4LQxwD93YhkiD8NNGDHoBoZesmE3E0ucug+Cb1wLeT0b0C9ChOYdAphQkXxkNl
 lJxN4TfoBCgwwQx88Z/UilNvIGffJwVZzRgX6y//WACPssCdM5S0Zlb8nGIGb98G
 J2uFwuqP0WNhMPSbg+Wj
 =uEPz
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix a write hang due to an uninitalised variable when
   !defined(CONFIG_NFS_V4)
 - Address upcall races in the legacy NFSv4 idmapper
 - Remove an O_DIRECT refcounting issue
 - Fix a pNFS refcounting bug when the file layout metadata server is
   also acting as a data server
 - Fix a pNFS module loading race.

* tag 'nfs-for-3.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Force the legacy idmapper to be single threaded
  NFS: Initialise commit_info.rpc_out when !defined(CONFIG_NFS_V4)
  NFS: Fix a refcounting issue in O_DIRECT
  NFSv4.1: Fix a race in set_pnfs_layoutdriver
  NFSv4.1: Fix umount when filelayout DS is also the MDS
2012-06-21 16:05:43 -07:00
Linus Torvalds
8874e812fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This is a small pull with btrfs fixes.  The biggest of the bunch is
  another fix for the new backref walking code.

  We're still hammering out one btrfs dio vs buffered reads problem, but
  that one will have to wait for the next rc."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: delay iput with async extents
  Btrfs: add a missing spin_lock
  Btrfs: don't assume to be on the correct extent in add_all_parents
  Btrfs: introduce btrfs_next_old_item
2012-06-21 13:41:07 -07:00
Mark Tinguely
f7bdf03a99 xfs: rename log structure to xlog
Rename the XFS log structure to xlog to help crash distinquish it from the
other logs in Linux.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:21:11 -05:00
Ben Myers
8866fc6fa5 xfs: shutdown xfs_sync_worker before the log
Revert commit 1307bbd, which uses the s_umount semaphore to provide
exclusion between xfs_sync_worker and unmount, in favor of shutting down
the sync worker before freeing the log in xfs_log_unmount.  This is a
cleaner way of resolving the race between xfs_sync_worker and unmount
than using s_umount.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2012-06-21 14:20:48 -05:00
Jan Kara
59c84ed0dd xfs: Fix overallocation in xfs_buf_allocate_memory()
Commit de1cbee which removed b_file_offset in favor of b_bn introduced a bug
causing xfs_buf_allocate_memory() to overestimate the number of necessary
pages. The problem is that xfs_buf_alloc() sets b_bn to -1 and thus effectively
every buffer is straddling a page boundary which causes
xfs_buf_allocate_memory() to allocate two pages and use vmalloc() for access
which is unnecessary.

Dave says xfs_buf_alloc() doesn't need to set b_bn to -1 anymore since the
buffer is inserted into the cache only after being fully initialized now.
So just make xfs_buf_alloc() fill in proper block number from the beginning.

CC: David Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:20:36 -05:00
Dave Chinner
76d095388b xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
When we fail to find an matching extent near the requested extent
specification during a left-right distance search in
xfs_alloc_ag_vextent_near, we fail to free the original cursor that
we used to look up the XFS_BTNUM_CNT tree and hence leak it.

Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:20:20 -05:00
Brian Foster
9a3a5dab63 xfs: check for stale inode before acquiring iflock on push
An inode in the AIL can be flush locked and marked stale if
a cluster free transaction occurs at the right time. The
inode item is then marked as flushing, which causes xfsaild
to spin and leaves the filesystem stalled. This is
reproduced by running xfstests 273 in a loop for an
extended period of time.

Check for stale inodes before the flush lock. This marks
the inode as pinned, leads to a log flush and allows the
filesystem to proceed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:20:06 -05:00
Josef Bacik
cb77fcd885 Btrfs: delay iput with async extents
There is some concern that these iput()'s could be the final iputs and could
induce lockups on people waiting on writeback.  This would happen in the
rare case that we don't create ordered extents because of an error, but it
is theoretically possible and we already have a mechanism to deal with this
so just make them delayed iputs to negate any worry.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:36 -04:00
Josef Bacik
e18fca7342 Btrfs: add a missing spin_lock
When fixing up the locking in the delayed ref destruction work I accidently
broke the locking myself ;(.  Add back a spin_lock that should be there and
we are now all set.  Thanks,
Btrfs: add a missing spin_lock

When fixing up the locking in the delayed ref destruction work I accidently
broke the locking myself ;(.  Add back a spin_lock that should be there and
we are now all set.  Thanks,

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:35 -04:00
Alexander Block
69bca40d41 Btrfs: don't assume to be on the correct extent in add_all_parents
add_all_parents did assume that path is already at a correct extent data
item, which may not be true in case of data extents that were partly
rewritten and splitted.

We need to check if we're on a matching extent for every item and only
for the ones after the first. The loop is changed to do this now.

This patch also fixes a bug introduced with commit 3b127fd8 "Btrfs:
remove obsolete btrfs_next_leaf call from __resolve_indirect_ref".
The removal of next_leaf did sometimes result in slot==nritems when
the above described case happens, and thus resulting in invalid values
(e.g. wanted_obejctid) in add_all_parents (leading to missed backrefs
or even crashes).

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:34 -04:00
Alexander Block
1c8f52a5e9 Btrfs: introduce btrfs_next_old_item
We introduce btrfs_next_old_item that uses btrfs_next_old_leaf instead
of btrfs_next_leaf.

btrfs_next_item is also changed to simply call btrfs_next_old_item with
time_seq being 0.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:34 -04:00
Anton Vorontsov
1e6a9e5625 pstore/ram_core: Better ECC size checking
- Instead of exploiting unsigned overflows (which doesn't work for all
  sizes), use straightforward checking for ECC total size not exceeding
  initial buffer size;

- Printing overflowed buffer_size is not informative. Instead, print
  ecc_size and buffer_size;

- No need for buffer_size argument in persistent_ram_init_ecc(),
  we can address prz->buffer_size directly.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Anton Vorontsov
beeb94321a pstore/ram_core: Proper checking for post_init errors (e.g. improper ECC size)
We will implement variable-sized ECC buffers soon, so post_init routine
might fail much more likely, so we'd better check for its errors.

To make error handling simple, modify persistent_ram_free() to it be safe
at all times.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Anton Vorontsov
90b58d9690 pstore/ram: Fix error handling during przs allocation
persistent_ram_new() returns ERR_PTR() value on errors, so during
freeing of the przs we should check for both NULL and IS_ERR() entries,
otherwise bad things will happen.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Anton Vorontsov
924d37118f pstore/ram: Probe as early as possible
Registering the platform driver before module_init allows us to log oopses
that happen during device probing.

This requires changing module_init to postcore_initcall, and switching
from platform_driver_probe to platform_driver_register because the
platform device is not registered when the platform driver is registered;
and because we use driver_register, now can't use create_bundle() (since
it will try to register the same driver once again), so we have to switch
to platform_device_register_data().

Also, some __init -> __devinit changes were needed.

Overall, the registration logic is now much clearer, since we have only
one driver registration point, and just an optional dummy device, which
is created from the module parameters.

Suggested-by: Colin Cross <ccross@android.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Linus Torvalds
bc259adc9b staging tree fixes for 3.5-rc4
Here are a number of small fixes for the drivers/staging tree, as well as iio
 and pstore drivers (which came from the staging tree in the 3.5-rc1 merge).
 All of these are tiny, but resolve issues that people have been reporting.
 
 There's also a documentation update to reflect what the iio drivers really are
 doing, which is good to get straightened out.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk/iNeAACgkQMUfUDdst+ynNVwCdHCj6smC2JUbvN34gACNrpsYY
 WggAoJzQn9mQhwq0pa/ZTpaUOvCFZ39L
 =hDkC
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging tree fixes from Greg Kroah-Hartman:
 "Here are a number of small fixes for the drivers/staging tree, as well
  as iio and pstore drivers (which came from the staging tree in the
  3.5-rc1 merge).  All of these are tiny, but resolve issues that people
  have been reporting.

  There's also a documentation update to reflect what the iio drivers
  really are doing, which is good to get straightened out.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'staging-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8712u: Add new USB IDs
  staging: gdm72xx: Release netlink socket properly
  iio: drop wrong reference from Kconfig
  pstore/inode: Make pstore_fill_super() static
  pstore/ram: Should zap persistent zone on unlink
  pstore/ram_core: Factor persistent_ram_zap() out of post_init()
  pstore/ram_core: Do not reset restored zone's position and size
  pstore/ram: Should update old dmesg buffer before reading
  staging:iio:ad7298: Fix linker error due to missing IIO kfifo buffer
  Revert "staging: usbip: bugfix for stack corruption on 64-bit architectures"
  staging: usbip: bugfix for stack corruption on 64-bit architectures
  staging/comedi: fix build for USB not enabled
  staging: omapdrm: fix crash when freeing bad fb
  staging:iio:ad7606: Re-add missing scale attribute
  iio: Fix potential use after free
  staging:iio: remove num_interrupt_lines from documentation
  iio: documentation: Add out_altvoltage and friends
2012-06-20 15:15:03 -07:00
Linus Torvalds
fe80352460 Driver core and printk fixes for 3.5-rc4
Here are some fixes for 3.5-rc4 that resolve the kmsg problems that
 people have reported showing up after the printk and kmsg changes went
 into 3.5-rc1.  There are also a smattering of other tiny fixes for the
 extcon and hyper-v drivers that people have reported.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk/iNQcACgkQMUfUDdst+yklTQCfZCXFlhA43bZo/8Joqd2pLIIW
 2uoAoMze0SlfJeN6Qu7yY0P+qV/f/pc3
 =UNFY
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core and printk fixes from Greg Kroah-Hartman:
 "Here are some fixes for 3.5-rc4 that resolve the kmsg problems that
  people have reported showing up after the printk and kmsg changes went
  into 3.5-rc1.  There are also a smattering of other tiny fixes for the
  extcon and hyper-v drivers that people have reported.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  extcon: max8997: Add missing kfree for info->edev in max8997_muic_remove()
  extcon: Set platform drvdata in gpio_extcon_probe() and fix irq leak
  extcon: Fix wrong index in max8997_extcon_cable[]
  kmsg - kmsg_dump() fix CONFIG_PRINTK=n compilation
  printk: return -EINVAL if the message len is bigger than the buf size
  printk: use mutex lock to stop syslog_seq from going wild
  kmsg - kmsg_dump() use iterator to receive log buffer content
  vme: change maintainer e-mail address
  Extcon: Don't try to create duplicate link names
  driver core: fixup reversed deferred probe order
  printk: Fix alignment of buf causing crash on ARM EABI
  Tools: hv: verify origin of netlink connector message
2012-06-20 15:14:28 -07:00
Linus Torvalds
a2a2609c97 Merge branch 'akpm' (Andrew's patch-bomb)
* emailed from Andrew Morton <akpm@linux-foundation.org>: (21 patches)
  mm/memblock: fix overlapping allocation when doubling reserved array
  c/r: prctl: Move PR_GET_TID_ADDRESS to a proper place
  pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper
  pidns: guarantee that the pidns init will be the last pidns process reaped
  fault-inject: avoid call to random32() if fault injection is disabled
  Viresh has moved
  get_maintainer: Fix --help warning
  mm/memory.c: fix kernel-doc warnings
  mm: fix kernel-doc warnings
  mm: correctly synchronize rss-counters at exit/exec
  mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range
  h8300: use the declarations provided by <asm/sections.h>
  h8300: fix use of extinct _sbss and _ebss
  xtensa: use the declarations provided by <asm/sections.h>
  xtensa: use "test -e" instead of bashism "test -a"
  xtensa: replace xtensa-specific _f{data,text} by _s{data,text}
  memcg: fix use_hierarchy css_is_ancestor oops regression
  mm, oom: fix and cleanup oom score calculations
  nilfs2: ensure proper cache clearing for gc-inodes
  thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
  ...
2012-06-20 14:41:57 -07:00
Konstantin Khlebnikov
4fe7efdbdf mm: correctly synchronize rss-counters at exit/exec
do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does
put_user(clear_child_tid) which can update task->rss_stat and thus make
mm->rss_stat inconsistent.  This triggers the "BUG:" printk in check_mm().

Let's fix this bug in the safest way, and optimize/cleanup this later.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20 14:39:36 -07:00
Ryusuke Konishi
fbb24a3a91 nilfs2: ensure proper cache clearing for gc-inodes
A gc-inode is a pseudo inode used to buffer the blocks to be moved by
garbage collection.

Block caches of gc-inodes must be cleared every time a garbage collection
function (nilfs_clean_segments) completes.  Otherwise, stale blocks
buffered in the caches may be wrongly reused in successive calls of the GC
function.

For user files, this is not a problem because their gc-inodes are
distinguished by a checkpoint number as well as an inode number.  They
never buffer different blocks if either an inode number, a checkpoint
number, or a block offset differs.

However, gc-inodes of sufile, cpfile and DAT file can store different data
for the same block offset.  Thus, the nilfs_clean_segments function can
move incorrect block for these meta-data files if an old block is cached.
I found this is really causing meta-data corruption in nilfs.

This fixes the issue by ensuring cache clear of gc-inodes and resolves
reported GC problems including checkpoint file corruption, b-tree
corruption, and the following warning during GC.

  nilfs_palloc_freev: entry number 307234 already freed.
  ...

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>	[2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20 14:39:35 -07:00
Jeff Liu
3b876c8f2a xfs: fix debug_object WARN at xfs_alloc_vextent()
Fengguang reports:

[  780.529603] XFS (vdd): Ending clean mount
[  781.454590] ODEBUG: object is on stack, but not annotated
[  781.455433] ------------[ cut here ]------------
[  781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1()
[  781.455433] Hardware name: Bochs
[  781.455433] Modules linked in:
[  781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51
[  781.455433] Call Trace:
[  781.455433]  [<ffffffff8106bc84>] warn_slowpath_common+0x83/0x9b
[  781.455433]  [<ffffffff8106bcb6>] warn_slowpath_null+0x1a/0x1c
[  781.455433]  [<ffffffff814919a5>] __debug_object_init+0x173/0x1f1
[  781.455433]  [<ffffffff81491c65>] debug_object_init+0x14/0x16
[  781.455433]  [<ffffffff8108842a>] __init_work+0x20/0x22
[  781.455433]  [<ffffffff8134ea56>] xfs_alloc_vextent+0x6c/0xd5

Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK.

Reported-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-20 14:58:24 -05:00
Alain Renaud
66f9311381 xfs: xfs_vm_writepage clear iomap_valid when !buffer_uptodate (REV2)
On filesytems with a block size smaller than PAGE_SIZE we currently have
a problem with unwritten extents.  If a we have multi-block page for
which an unwritten extent has been allocated, and only some of the
buffers have been written to, and they are not contiguous, we can expose
stale data from disk in the blocks between the writes after extent
conversion.

Example of a page with unwritten and real data.
buffer  content
0       empty  b_state = 0
1       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
2       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
3       empty  b_state = 0
4       empty  b_state = 0
5       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
6       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
7       empty  b_state = 0

Buffers 1, 2, 5, and 6 have been written to, leaving 0, 3, 4, and 7
empty.  Currently buffers 1, 2, 5, and 6 are added to a single ioend,
and when IO has completed, extent conversion creates a real extent from
block 1 through block 6, leaving 0 and 7 unwritten.  However buffers 3
and 4 were not written to disk, so stale data is exposed from those
blocks on a subsequent read.

Fix this by setting iomap_valid = 0 when we find a buffer that is not
Uptodate.  This ensures that buffers 5 and 6 are not added to the same
ioend as buffers 1 and 2.  Later these blocks will be converted into two
separate real extents, leaving the blocks in between unwritten.

Signed-off-by: Alain Renaud <arenaud@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-20 14:57:28 -05:00
Bryan Schumaker
b1027439df NFS: Force the legacy idmapper to be single threaded
It was initially coded under the assumption that there would only be one
request at a time, so use a lock to enforce this requirement..

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@vger.kernel.org [3.4+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-20 14:38:11 -04:00
Yan, Zheng
61600ef848 ceph: check PG_Private flag before accessing page->private
I got lots of NULL pointer dereference Oops when compiling kernel on ceph.
The bug is because the kernel page migration routine replaces some pages
in the page cache with new pages, these new pages' private can be non-zero.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 28c0254ede)
2012-06-20 07:43:48 -05:00
Trond Myklebust
1a0de48ae5 NFS: Initialise commit_info.rpc_out when !defined(CONFIG_NFS_V4)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-06-19 18:42:28 -04:00
Trond Myklebust
5a695da263 NFS: Fix a refcounting issue in O_DIRECT
In nfs_direct_write_reschedule(), the requests from nfs_scan_commit_list
have a refcount of 2, whereas the operations in
nfs_direct_write_completion_ops expect them to have a refcount of 1.

This patch adds a call to release the extra references.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-06-19 18:42:14 -04:00
Trond Myklebust
0a9c63fae7 NFSv4.1: Fix a race in set_pnfs_layoutdriver
The call to try_module_get() dereferences ld_type outside the
spin locks, which means that it may be pointing to garbage if
a module unload was in progress.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-19 13:32:45 -04:00
Trond Myklebust
2a4c8994ee NFSv4.1: Fix umount when filelayout DS is also the MDS
Currently there is a 'chicken and egg' issue when the DS is also the mounted
MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the
cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client
is not called to dereference the MDS usage of a deviceid which holds a
reference to the DS nfs_client.  The result is the umount program returns,
but the nfs_client is not freed, and the cl_session hearbeat continues.

The MDS (and all other nfs mounts) lose their last nfs_client reference in
nfs_free_server when the last nfs_server (fsid) is umounted.
The file layout DS lose their last nfs_client reference in destroy_ds
when the last deviceid referencing the data server is put and destroy_ds is
called. This is triggered by a call to nfs4_deviceid_purge_client which
removes references to a pNFS deviceid used by an MDS mount.

The fix is to track how many pnfs enabled filesystems are mounted from
this server, and then to purge the device id cache once that count reaches
zero.

Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-18 08:45:16 -04:00
Dan Carpenter
1cfb727107 UBIFS: fix assertion
The asserts here never check anything because it uses '|' instead of
'&'.  Now if the flags are not set it prints a warning a a stack trace.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-06-18 14:17:08 +03:00
Matthew Garrett
7dea9665fe hfsplus: fix bless ioctl when used with hardlinks
HFS+ doesn't really implement hard links - instead, hardlinks are indicated
by a magic file type which refers to an indirect node in a hidden
directory. The spec indicates that stat() should return the inode number
of the indirect node, but it turns out that this doesn't satisfy the
firmware when it's looking for a bootloader - it wants the catalog ID of
the hardlink file instead. Fix up this case.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-17 14:39:59 -07:00
Janne Kalliomäki
a6dc8c0421 hfsplus: fix overflow in sector calculations in hfsplus_submit_bio
The variable io_size was unsigned int, which caused the wrong sector number
to be calculated after aligning it. This then caused mount to fail with big
volumes, as backup volume header information was searched from a
wrong sector.

Signed-off-by: Janne Kalliomäki <janne@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-17 14:39:45 -07:00
Linus Torvalds
d865983292 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs compile warning fixes from Chris Mason.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: cast devid to unsigned long long for printk %llu
  Btrfs: init old_generation in get_old_root
2012-06-16 17:01:41 -07:00
Linus Torvalds
873b779d99 NFS client bugfixes for Linux 3.5
Highlights include:
 
  - Fix a couple of mount regressions due to the recent cleanups.
  - Fix an Oops in the open recovery code
  - Fix an rpc_pipefs upcall hang that results from some of the
    net namespace work from 3.4.x (stable kernel candidate).
  - Fix a couple of write and o_direct regressions that were found
    at last weeks Bakeathon testing event in Ann Arbor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP2gmaAAoJEGcL54qWCgDyrBMP/RY/T++He8y5k3M9aEqiIv0q
 D8ZVMwzID6f4Zgw4xRg96aYr02sBTw0q+0mP5x1EZmg8mK29rnBiVeKHE1iwSfXq
 10/SYISlpIjhJC4I4kHXGd2KClgj7qRRCbDKFRWwoIIwYU+kJn8MRnPa9XqdL8kP
 q68lrtayW8THSJDR8bk1GQn+ARxGeoY++qzHxm3vpQCbZVVb19VqKMWAWSN4VKqb
 epWehOSAzB3iA7HrLRbf8Y8/sDdXewxCQpr9CC/wxuu++l5ifPphR0ToX+k9VZXI
 BKFLUojCUZHTMAgCxuxjrFYehMeyClbzL2lLkz5Pgj0gQhOX6Myj+WMXoEg/uWfo
 XNf51FH3yBbnfayTaOUs6Y50iuU+dQO7TUTAoWTPpW9V/iT5z/fWAKUVJhDtrPk5
 DVDkR6SEgb4P1RqkehZKLq5k5GSAcTR+MZr452eDrFYXJrY8ORDE6o6kP4Rr3Nnd
 n8gap0gHxzIYlhBghem6+nLN+HhpZQopWeD8mNub20VuXsChRDr9/+XWuMCSJaZF
 2kleVdt2+rTDzi9bJTRYlsX397oaThL0NbRvshHAwnXIDtIQrzxx6+dUyOsEWMEu
 go/EdSUUESXGNlsWTqewCBsOjPeE4L5ijI/QglfDkF+CzD5dDjrxl+5i57iMKVfc
 Ydste3pQJkS7PiZu1sWA
 =unbu
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - Fix a couple of mount regressions due to the recent cleanups.
   - Fix an Oops in the open recovery code
   - Fix an rpc_pipefs upcall hang that results from some of the net
     namespace work from 3.4.x (stable kernel candidate).
   - Fix a couple of write and o_direct regressions that were found at
     last weeks Bakeathon testing event in Ann Arbor."

* tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: add an endian notation for sparse
  NFSv4.1: integer overflow in decode_cb_sequence_args()
  rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer
  NFSv4 do not send an empty SETATTR compound
  NFSv2: EOF incorrectly set on short read
  NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
  NFS: fix directio refcount bug on commit
  NFSv4: Fix unnecessary delegation returns in nfs4_do_open
  NFSv4.1: Convert another trivial printk into a dprintk
  NFS4: Fix open bug when pnfs module blacklisted
  NFS: Remove incorrect BUG_ON in nfs_found_client
  NFS: Map minor mismatch error to protocol not support error.
  NFS: Fix a commit bug
  NFS4: Set parsed mount data version to 4
  NFSv4.1: Ensure we clear session state flags after a session creation
  NFSv4.1: Convert a trivial printk into a dprintk
  NFSv4: Fix up decode_attr_mdsthreshold
  NFSv4: Fix an Oops in the open recovery code
  NFSv4.1: Fix a request leak on the back channel
2012-06-15 17:37:23 -07:00
Linus Torvalds
93dd048dbd Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linux
Pull two nfsd bugfixes from J. Bruce Fields.

* 'for-3.5' of git://linux-nfs.org/~bfields/linux:
  nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
  NFS: hard-code init_net for NFS callback transports
2012-06-15 17:27:31 -07:00
Chris Mason
a8c4a33b98 Btrfs: cast devid to unsigned long long for printk %llu
Avoid warning in 32 bit machines

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 20:07:17 -04:00
Chris Mason
4325edd078 Btrfs: init old_generation in get_old_root
gcc was giving an uninit variable warning here.  Strictly
speaking we don't need to init it, but this will make things
much less error prone.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 20:06:54 -04:00
Linus Torvalds
718f58ad61 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
 "The dates look like I had to rebase this morning because there was a
  compiler warning for a printk arg that I had missed earlier.

  These are all fixes, including one to prevent using stale pointers for
  device names, and lots of fixes around transaction abort cleanups
  (Josef, Liu Bo).

  Jan Schmidt also sent in a number of fixes for the new reference
  number tracking code.

  Liu Bo beat me to updating the MAINTAINERS file.  Since he thought to
  also fix the git url, I kept his commit."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
  Btrfs: update MAINTAINERS info for BTRFS FILE SYSTEM
  Btrfs: destroy the items of the delayed inodes in error handling routine
  Btrfs: make sure that we've made everything in pinned tree clean
  Btrfs: avoid memory leak of extent state in error handling routine
  Btrfs: do not resize a seeding device
  Btrfs: fix missing inherited flag in rename
  Btrfs: fix incompat flags setting
  Btrfs: fix defrag regression
  Btrfs: call filemap_fdatawrite twice for compression
  Btrfs: keep inode pinned when compressing writes
  Btrfs: implement ->show_devname
  Btrfs: use rcu to protect device->name
  Btrfs: unlock everything properly in the error case for nocow
  Btrfs: fix btrfs_destroy_marked_extents
  Btrfs: abort the transaction if the commit fails
  Btrfs: wake up transaction waiters when aborting a transaction
  Btrfs: fix locking in btrfs_destroy_delayed_refs
  Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error
  Btrfs: fix race in tree mod log addition
  Btrfs: add btrfs_next_old_leaf
  ...
2012-06-15 16:04:37 -07:00
Kay Sievers
e2ae715d66 kmsg - kmsg_dump() use iterator to receive log buffer content
Provide an iterator to receive the log buffer content, and convert all
kmsg_dump() users to it.

The structured data in the kmsg buffer now contains binary data, which
should no longer be copied verbatim to the kmsg_dump() users.

The iterator should provide reliable access to the buffer data, and also
supports proper log line-aware chunking of data while iterating.

Signed-off-by: Kay Sievers <kay@vrfy.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Reported-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Tested-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-15 14:53:59 -07:00
Miao Xie
67cde3448d Btrfs: destroy the items of the delayed inodes in error handling routine
the items of the delayed inodes were forgotten to be freed, this patch
fixes it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:28 -04:00
Liu Bo
ed0eaa1498 Btrfs: make sure that we've made everything in pinned tree clean
Since we have two trees for recording pinned extents, we need to go through
both of them to make sure that we've done everything clean.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:27 -04:00
Liu Bo
6e841e32b1 Btrfs: avoid memory leak of extent state in error handling routine
We've forgotten to clear extent states in pinned tree, which will results in
space counter mismatch and memory leak:

WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 [btrfs]()
...
space_info 2 has 8380416 free, is not full
space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, readonly=4194304
btrfs state leak: start 29364224 end 29376511 state 1 in tree ffff880075f20090 refs 1
...

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:27 -04:00
Liu Bo
4e42ae1bdc Btrfs: do not resize a seeding device
Seeding devices are not supposed to change any more.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:26 -04:00
Liu Bo
bc1782374b Btrfs: fix missing inherited flag in rename
When we move a file into a directory with compression flag, we need to
inherite BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well.
But if we move a file into a directory without compression flag, we need
to clear both of them.

It is the way how our setflags deals with compression flag, so keep
the same behaviour here.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:33:30 -04:00
Chris Mason
acbcabd2de Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linus 2012-06-15 11:33:16 -04:00
Li Zefan
69e380d176 Btrfs: fix incompat flags setting
It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which
has only one bit set.

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-06-14 21:30:57 -04:00
Li Zefan
6c282eb40e Btrfs: fix defrag regression
If a file has 3 small extents:

| ext1 | ext2 | ext3 |

Running "btrfs fi defrag" will only defrag the last two extents, if those
extent mappings hasn't been read into memory from disk.

This bug was introduced by commit 17ce6ef8d7
("Btrfs: add a check to decide if we should defrag the range")

The cause is, that commit looked into previous and next extents using
lookup_extent_mapping() only.

While at it, remove the code that checks the previous extent, since
it's sufficient to check the next extent.

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-06-14 21:30:55 -04:00
Josef Bacik
7ddf5a42d3 Btrfs: call filemap_fdatawrite twice for compression
I removed this in an earlier commit and I was wrong.  Because compression
can return from filemap_fdatawrite() without having actually set any of it's
pages as writeback() it can make filemap_fdatawait() do essentially nothing,
and then we won't find any ordered extents because they may not have been
created yet.  So not only does this make fsync() completely useless, but it
will also screw up if you truncate on a non-page aligned offset since we
zero out the end and then wait on ordered extents and then call drop caches.
We can drop the cache before the io completes and then we try to unpin the
extent we just wrote we won't find it and everything goes sideways.  So fix
this by putting it back and put a giant comment there to keep me from trying
to remove it in the future.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:30:54 -04:00
Josef Bacik
8180ef8894 Btrfs: keep inode pinned when compressing writes
A user reported lots of problems using compression on the new code and it
turns out part of the problem was that igrab() was failing when we added a
new ordered extent.  This is because when writing out an inode under
compression we immediately return without actually doing anything to the
pages, and then in another thread at some point down the line actually do
the ordered dance.  The problem is between the point that we start writeback
and we actually add the ordered extent we could be trying to reclaim the
inode, which makes igrab() return NULL.  So we need to do an igrab() when we
create the async extent and then drop it when we are done with it.  This
makes sure we stay pinned in memory until the ordered extent can get a
reference on it and we are good to go.  With this patch we no longer panic
in btrfs_finish_ordered_io().  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:30:53 -04:00
Josef Bacik
9c5085c147 Btrfs: implement ->show_devname
Because btrfs can remove the device that was mounted we need to have a
->show_devname so that in this case we can print out some other device in
the file system to /proc/mount.  So if there are multiple devices in a btrfs
file system we will just print the device with the lowest devid that we can
find.  This will make everything consistent and deal with device removal
properly.  The drawback is if you mount with a device that is higher than
the lowest devicd it won't show up as the mounted device in /proc/mounts,
but this is a small price to pay. This was inspired by Miao Xie's patch.
Thanks,

Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:30:37 -04:00
Josef Bacik
606686eeac Btrfs: use rcu to protect device->name
Al pointed out that we can just toss out the old name on a device and add a
new one arbitrarily, so anybody who uses device->name in printk could
possibly use free'd memory.  Instead of adding locking around all of this he
suggested doing it with RCU, so I've introduced a struct rcu_string that
does just that and have gone through and protected all accesses to
device->name that aren't under the uuid_mutex with rcu_read_lock().  This
protects us and I will use it for dealing with removing the device that we
used to mount the file system in a later patch.  Thanks,

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:16 -04:00
Josef Bacik
17ca04aff7 Btrfs: unlock everything properly in the error case for nocow
I was getting hung on umount when a transaction was aborted because a range
of one of the free space inodes was still locked.  This is because the nocow
stuff doesn't unlock anything on error.  This fixed the problem and I
verified that is what was happening.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:15 -04:00
Josef Bacik
ee670f0af3 Btrfs: fix btrfs_destroy_marked_extents
So we're forcing the eb's to have their ref count set to 1 so invalidatepage
works but this breaks lots of things, for example root nodes, and is just
plain wrong, we don't need to just evict all of this stuff.  Also drop the
invalidatepage altogether and add a page_cache_release().  With this patch
we no longer hang when trying to access the root nodes after an aborted
transaction and we no longer leak memory.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:14 -04:00
Josef Bacik
7b8b92af58 Btrfs: abort the transaction if the commit fails
If a transaction commit fails we don't abort it so we don't set an error on
the file system.  This patch fixes that by actually calling the abort stuff
and then adding a check for a fs error in the transaction start stuff to
make sure it is caught properly.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:13 -04:00
Josef Bacik
d7096fc3ef Btrfs: wake up transaction waiters when aborting a transaction
I was getting lots of hung tasks and a NULL pointer dereference because we
are not cleaning up the transaction properly when it aborts.  First we need
to reset the running_transaction to NULL so we don't get a bad dereference
for any start_transaction callers after this.  Also we cannot rely on
waitqueue_active() since it's just a list_empty(), so just call wake_up()
directly since that will do the barrier for us and such.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:12 -04:00
Josef Bacik
b939d1ab76 Btrfs: fix locking in btrfs_destroy_delayed_refs
The transaction abort stuff was throwing warnings from the list debugging
code because we do a list_del_init outside of the delayed_refs spin lock.
The delayed refs locking makes baby Jesus cry so it's not hard to get wrong,
but we need to take the ref head mutex to make sure it's not being processed
currently, and so if it is we need to drop the spin lock and then take and
drop the mutex and do the search again.  If we can take the mutex then we
can safely remove the head from the list and carry on.  Now when the
transaction aborts I don't get the list debugging warnings.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:11 -04:00
Josef Bacik
beb42dd793 Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error
While doing my enospc work I got a transaction abortion that resulted in a
panic when we tried to unlock_page() an already unlocked page.  This is
because we aren't calling extent_clear_unlock_delalloc with the locked page
so it was unlocking all the pages in the range.  This is wrong since
__extent_writepage expects to have the page locked still unless we return
*page_started as 1.  This should keep us from panicing.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:09 -04:00
J. Bruce Fields
bc2df47a40 nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
Most frequent symptom was a BUG triggering in expire_client, with the
server locking up shortly thereafter.

Introduced by 508dc6e110 "nfsd41:
free_session/free_client must be called under the client_lock".

Cc: stable@kernel.org
Cc: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 13:54:08 -04:00
Stanislav Kinsbursky
12918b10d5 NFS: hard-code init_net for NFS callback transports
In case of destroying mount namespace on child reaper exit, nsproxy is zeroed
to the point already. So, dereferencing of it is invalid.
This patch hard-code "init_net" for all network namespace references for NFS
callback services. This will be fixed with proper NFS callback
containerization.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 13:53:43 -04:00
Jan Schmidt
3310c36eef Btrfs: fix race in tree mod log addition
When adding to the tree modification log, we grab two locks at different
stages. We must not drop the outer lock until we're done with section
protected by the inner lock. This moves the unlock call for the outer lock
to the appropriate position.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:52:39 +02:00
Jan Schmidt
3d7806eca4 Btrfs: add btrfs_next_old_leaf
To make sense of the tree mod log, the backref walker not only needs
btrfs_search_old_slot, but it also called btrfs_next_leaf, which in turn was
calling btrfs_search_slot. This obviously didn't give the correct result.

This commit adds btrfs_next_old_leaf, a drop-in replacement for
btrfs_next_leaf with a time_seq parameter. If it is zero, it behaves exactly
like btrfs_next_leaf. If it is non-zero, it will use btrfs_search_old_slot
with this time_seq parameter.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:52:09 +02:00
Jan Schmidt
a95236d99f Btrfs: fix return value for __tree_mod_log_oldest_root
In __tree_mod_log_oldest_root() we must return the found operation even if
it's not a ROOT_REPLACE operation. Otherwise, the caller assumes that there
are no operations to be rewinded and returns immediately.

The code in the caller is modified to improve readability.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:44:22 +02:00
Jan Schmidt
8ba97a15e7 Btrfs: use btrfs_read_lock_root_node in get_old_root
get_old_root could race with root node updates because we weren't locking
the node early enough. Use btrfs_read_lock_root_node to grab the root locked
in the very beginning and release the lock as soon as possible (just like
btrfs_search_slot does).

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:44:21 +02:00
Jan Schmidt
f617e2fd52 Btrfs: remove obsolete btrfs_next_leaf call from __resolve_indirect_ref
When resolving indirect refs, we used to call btrfs_next_leaf in case we
didn't find an exact match. While we should find exact matches most of the
time, in case we don't, we must continue searching. Treating those matches
differently depending on the level we're searching doesn't make sense.

Even worse, we might end up searching for a key larger than the largest, in
which case there is no next_leaf and subsequent jobs would fail. This commit
drops the bogous lines.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:44:20 +02:00
Bob Peterson
666d1d8ad2 GFS2: Combine functions get_local_rgrp and gfs2_inplace_reserve
This function combines rgrp functions get_local_rgrp and
gfs2_inplace_reserve so that the double retry loop is gone.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-14 09:58:40 +01:00
Anton Vorontsov
521f7288a8 pstore/platform: Disable automatic updates by default
Having automatic updates seems pointless for production system, and
even dangerous and thus counter-productive:

1. If we can mount pstore, or read files, we can as well read
   /proc/kmsg. So, there's little point in duplicating the
   functionality and present the same information but via another
   userland ABI;

2. Expecting the kernel to behave sanely after oops/panic is naive.
   It might work, but you'd rather not try it. Screwed up kernel
   can do rather bad things, like recursive faults[1]; and pstore
   rather provoking bad things to happen. It uses:

   1. Timers (assumes sane interrupts state);
   2. Workqueues and mutexes (assumes scheduler in a sane state);
   3. kzalloc (a working slab allocator);

   That's too much for a dead kernel, so the debugging facility
   itself might just make debugging harder, which is not what
   we want.

Maybe for non-oops message types it would make sense to re-enable
automatic updates, but so far I don't see any use case for this.
Even for tracing, it has its own run-time/normal ABI, so we're
only interested in pstore upon next boot, to retrieve what has
gone wrong with HW or SW.

So, let's disable the updates by default.

[1]
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff8104801b>] kthread_data+0xb/0x20
[...]
Process kworker/0:1 (pid: 14, threadinfo ffff8800072c0000, task ffff88000725b100)
[...
Call Trace:
 [<ffffffff81043710>] wq_worker_sleeping+0x10/0xa0
 [<ffffffff813687a8>] __schedule+0x568/0x7d0
 [<ffffffff8106c24d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81087e22>] ? call_rcu_sched+0x12/0x20
 [<ffffffff8102b596>] ? release_task+0x156/0x2d0
 [<ffffffff8102b45e>] ? release_task+0x1e/0x2d0
 [<ffffffff8106c24d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81368ac4>] schedule+0x24/0x70
 [<ffffffff8102cba8>] do_exit+0x1f8/0x370
 [<ffffffff810051e7>] oops_end+0x77/0xb0
 [<ffffffff8135c301>] no_context+0x1a6/0x1b5
 [<ffffffff8135c4de>] __bad_area_nosemaphore+0x1ce/0x1ed
 [<ffffffff81053156>] ? ttwu_queue+0xc6/0xe0
 [<ffffffff8135c50b>] bad_area_nosemaphore+0xe/0x10
 [<ffffffff8101fa47>] do_page_fault+0x2c7/0x450
 [<ffffffff8106e34b>] ? __lock_release+0x6b/0xe0
 [<ffffffff8106bf21>] ? mark_held_locks+0x61/0x140
 [<ffffffff810502fe>] ? __wake_up+0x4e/0x70
 [<ffffffff81185f7d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff81158970>] ? pstore_register+0x120/0x120
 [<ffffffff8136a37f>] page_fault+0x1f/0x30
 [<ffffffff81158970>] ? pstore_register+0x120/0x120
 [<ffffffff81185ab8>] ? memcpy+0x68/0x110
 [<ffffffff8115875a>] ? pstore_get_records+0x3a/0x130
 [<ffffffff811590f4>] ? persistent_ram_copy_old+0x64/0x90
 [<ffffffff81158bf4>] ramoops_pstore_read+0x84/0x130
 [<ffffffff81158799>] pstore_get_records+0x79/0x130
 [<ffffffff81042536>] ? process_one_work+0x116/0x450
 [<ffffffff81158970>] ? pstore_register+0x120/0x120
 [<ffffffff8115897e>] pstore_dowork+0xe/0x10
 [<ffffffff81042594>] process_one_work+0x174/0x450
 [<ffffffff81042536>] ? process_one_work+0x116/0x450
 [<ffffffff81042e13>] worker_thread+0x123/0x2d0
 [<ffffffff81042cf0>] ? manage_workers.isra.28+0x120/0x120
 [<ffffffff81047d8e>] kthread+0x8e/0xa0
 [<ffffffff8136ba74>] kernel_thread_helper+0x4/0x10
 [<ffffffff8136a199>] ? retint_restore_args+0xe/0xe
 [<ffffffff81047d00>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8136ba70>] ? gs_change+0xb/0xb
Code: be e2 00 00 00 48 c7 c7 d1 2a 4e 81 e8 bf fb fd ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 44 00 00 48 8b 87 08 02 00 00 55 48 89 e5 <48> 8b 40 f8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
RIP  [<ffffffff8104801b>] kthread_data+0xb/0x20
 RSP <ffff8800072c1888>
CR2: fffffffffffffff8
---[ end trace 996a332dc399111d ]---
Fixing recursive fault but reboot is needed!

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:59:37 -07:00
Anton Vorontsov
a3f5f075c2 pstore/platform: Make automatic updates interval configurable
There is no behavioural change, the default value is still 60 seconds.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:59:37 -07:00
Anton Vorontsov
b8587daa75 pstore/ram_core: Remove now unused code
The code tried to maintain the global list of persistent ram zones,
which isn't a great idea overall, plus since Android's ram_console
is no longer there, we can remove some unused functions.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Colin Cross <ccross@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:59:37 -07:00
Anton Vorontsov
602b5be4f1 pstore/ram_core: Silence some printks
Since we use multiple regions, the messages are somewhat annoying.
We do print total mapped memory already, so no need to print the
information for each region in the library routines.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Colin Cross <ccross@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:59:28 -07:00
Anton Vorontsov
b5d38e9bf1 pstore/ram: Add console messages handling
The console log size is configurable via ramoops.console_size
module option, and the log itself is available via
<pstore-mount>/console-ramoops file.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:59:28 -07:00
Anton Vorontsov
755d66b48f pstore/ram: Factor ramoops_get_next_prz() out of ramoops_pstore_read()
This will help make code clearer when we'll add support for other
message types.

The patch also changes return value from -EINVAL to 0 in case of
end-of-records. The exact value doesn't matter for pstore (it should
be just <= 0), but 0 feels more correct.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:59:28 -07:00
Anton Vorontsov
f4c5d2423c pstore/ram: Factor dmesg przs initialization out of probe()
This will help make code clearer when we'll add support for other
message types.

This also makes probe() much shorter and understandable, plus
makes mem/record size checking a bit easier.

Implementation detail: we now use a paddr pointer, this will
be used for allocating persistent ram zones for other message
types.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:59:28 -07:00
Anton Vorontsov
cac2eb7b58 pstore/ram: Give proper names to dump-related variables
We're about to add support for other message types, so let's rename
some variables to not be confused later.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:59:28 -07:00
Anton Vorontsov
f29e5956ae pstore: Add console log messages support
Pstore doesn't support logging kernel messages in run-time, it only
dumps dmesg when kernel oopses/panics. This makes pstore useless for
debugging hangs caused by HW issues or improper use of HW (e.g.
weird device inserted -> driver tried to write a reserved bits ->
SoC hanged. In that case we don't get any messages in the pstore.

Therefore, let's add a runtime logging support: PSTORE_TYPE_CONSOLE.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Colin Cross <ccross@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:59:27 -07:00
Anton Vorontsov
364ed2f465 pstore/inode: Make pstore_fill_super() static
There's no reason to extern it. The patch fixes the annoying sparse
warning:

CHECK   fs/pstore/inode.c
fs/pstore/inode.c:264:5: warning: symbol 'pstore_fill_super' was not
declared. Should it be static?

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:40 -07:00
Anton Vorontsov
93cce04968 pstore/ram: Should zap persistent zone on unlink
Otherwise, unlinked file will reappear on the next boot.

Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:40 -07:00
Anton Vorontsov
fce3979304 pstore/ram_core: Factor persistent_ram_zap() out of post_init()
A handy function that we will use outside of ram_core soon. But
so far just factor it out and start using it in post_init().

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:40 -07:00
Anton Vorontsov
25b63da647 pstore/ram_core: Do not reset restored zone's position and size
Otherwise, the files will survive just one reboot, and on a subsequent
boot they will disappear.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:39 -07:00
Anton Vorontsov
201e4aca5a pstore/ram: Should update old dmesg buffer before reading
Without the update, we'll only see the new dmesg buffer after the
reboot, but previously we could see it right away. Making an oops
visible in pstore filesystem before reboot is a somewhat dubious
feature, but removing it wasn't an intentional change, so let's
restore it.

For this we have to make persistent_ram_save_old() safe for calling
multiple times, and also extern it.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:39 -07:00
Arend van Spriel
a59d6293e5 debugfs: change parameter check in debugfs_remove() functions
The dentry parameter in debugfs_remove() and debugfs_remove_recursive()
is checked being a NULL pointer. To make cleanup by callers easier this
check is extended using the IS_ERR_OR_NULL macro instead because the
debugfs_create_... functions can return a ERR_PTR() value.

Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:40:41 -07:00
Eric Dumazet
047fe36052 splice: fix racy pipe->buffers uses
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()

commit 35f3d14dbb (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.

Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.

Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Cc: stable <stable@vger.kernel.org> # 2.6.35
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-13 21:16:42 +02:00
Bob Peterson
0d515210b6 GFS2: Add kobject release method
This patch adds a kobject release function that properly maintains
the kobject use count, so that accesses to the sysfs files do not
cause an access to freed kernel memory after an unmount.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-13 15:59:48 +01:00
Suresh Jayaraman
e73f843a32 cifs: fix parsing of password mount option
The double delimiter check that allows a comma in the password parsing code is
unconditional. We set "tmp_end" to the end of the string and we continue to
check for double delimiter. In the case where the password doesn't contain a
comma we end up setting tmp_end to NULL and eventually setting "options" to
"end". This results in the premature termination of the options string and hence
the values of UNCip and UNC are being set to NULL. This results in mount failure
with "Connecting to DFS root not implemented yet" error.

This error is usually not noticable as we have password as the last option in
the superblock mountdata. But when we call expand_dfs_referral() from
cifs_mount() and try to compose mount options for the submount, the resulting
mountdata will be of the form

   ",ver=1,user=foo,pass=bar,ip=x.x.x.x,unc=\\server\share"

and hence results in the above error. This bug has been seen with older NAS
servers running Samba 3.0.24.

Fix this by moving the double delimiter check inside the conditional loop.

Changes since -v1

   - removed the wrong strlen() micro optimization.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Cc: stable@vger.kernel.org [3.1+]
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-12 12:53:02 -05:00
Linus Torvalds
266ae4e615 fix unbalanced wb->list_lock in 3.5-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJP0tJwAAoJECvKgwp+S8JaSVAP/1FKw7u5E9QzOW8kPCj58nig
 FVF3BgaYgVrJglRjNOGTSfhJl3TVHGTzEMGICtsKUvXmgs7VmMudyWFuyokxfrZi
 z70DIbSuPDOMEt31MXiACq9z4E2IrZZ2kTYpRVd+zCV/2s4p789ejDLxOkk2ikpt
 M4TcIMPCAU2e4/ljRjNEQkg8d23ehdJsH9Hg8k56Jtb57e0LNpwnaZ0FzFKCzYh0
 Orm2dm9375vyzXZ2WNvgzl8ClJdEjkCyTq79i/39aQs4WeXsEP3nk+PFZYd+jrYA
 XwvV1sHJ43BOCCWwYS5dP0i9uGnRVlh/F2tyWWSSU2OizHzUXuOUWfXgY1JWyzaT
 uwEoCa8atYUOD7gp6ndnNR3uqBXtw0SihESst5fC7rKWuKOl62XToD6Pg6rPMex4
 ZuRaUw4Ts4efbX0dw3iDHMrS6Cf7a0JH52QOEYovrge4mDjiJzClHR69hsmVyoVb
 7s3j0q+mhur+NcnMkvC0C+pHn/m18q1i/yIBrD2K/UoLVGA17QMIZGoKfGnkeoaF
 a4fcoJ7fQE09tJ9KV1NLEdeD9PXQtETcNIAIZCfBCEJP9CPhHaUYkky+sOMcvKTr
 xEAe4f4WiHY7mTy8CRcDQjiOBigJe7Ksm+0qElauGcYw5Kqqrpdap1yPKbG4QWPL
 ABVYJyOvKSC3HLA6i7Gi
 =RBRw
 -----END PGP SIGNATURE-----

Merge tag 'writeback-lock-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback locking fix from Wu Fengguang:
 "fix unbalanced wb->list_lock in 3.5-rc1"

* tag 'writeback-lock-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Fix lock imbalance in writeback_sb_inodes()
2012-06-12 18:28:58 +03:00
Dan Carpenter
e216c8c771 NFS: add an endian notation for sparse
This is supposed to be a __be32 value.  Sparse complains a lot:

fs/nfs/callback_xdr.c:699:30: warning: incorrect type in initializer (different base types)
fs/nfs/callback_xdr.c:699:30:    expected unsigned int [unsigned] status
fs/nfs/callback_xdr.c:699:30:    got restricted __be32 const [usertype] csr_status
fs/nfs/callback_xdr.c:715:9: warning: cast to restricted __be32
fs/nfs/callback_xdr.c:716:16: warning: incorrect type in return expression (different base types)
fs/nfs/callback_xdr.c:716:16:    expected restricted __be32
fs/nfs/callback_xdr.c:716:16:    got unsigned int [unsigned] status

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-12 09:54:40 -04:00
Dan Carpenter
0439f31c35 NFSv4.1: integer overflow in decode_cb_sequence_args()
This seems like it could overflow on 32 bits.  Use kmalloc_array() which
has overflow protection built in.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-12 09:54:36 -04:00
Randy Dunlap
b84297197c exofs: fix sparse non-ANSI function warning
Fix sparse non-ANSI function warning:

  fs/exofs/sys.c:112:28: warning: non-ANSI function declaration of function 'exofs_sysfs_dbg_print'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-12 06:33:22 +03:00
Andy Adamson
2669940db8 NFSv4 do not send an empty SETATTR compound
Commit 536e43d12b ATTR_OPEN check can result in
an ia_valid with only ATTR_FILE set, and no NFS_VALID_ATTRS attributes to
request from the server.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-11 17:25:53 -04:00
Sachin Prabhu
64f9a83665 NFSv2: EOF incorrectly set on short read
In cases where the server returns fewer bytes then those requested, we
can incorrectly set the eof flag for the file. Fixing this allows the
request to be retried with updated offset and count arguments.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-11 17:25:00 -04:00
Steven Whitehouse
0fe2f1e929 GFS2: Size seq_file buffer more carefully
This places a limit on the buffer size for archs with larger
PAGE_SIZE.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-06-11 13:49:47 +01:00
Steven Whitehouse
1bb49303b7 GFS2: Use seq_vprintf for glocks debugfs file
Make use of the newly added seq_vprintf() function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
2012-06-11 13:26:50 +01:00
Steven Whitehouse
a4808147dc seq_file: Add seq_vprintf function and export it
The existing seq_printf function is rewritten in terms of the new
seq_vprintf which is also exported to modules. This allows GFS2
(and potentially other seq_file users) to have a vprintf based
interface and to avoid an extra copy into a temporary buffer in
some cases.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
2012-06-11 13:16:35 +01:00
Bryan Schumaker
c5afc8da5b NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
Older versions of nfs utils don't always pass a "vers=" mount option for
NFS.  This chould lead to attempts at using NFS v0 due to a zeroed out
nfs_parsed_mount_data struct.  I solve this by setting the default NFS
version to NFS_DEFAULT_VERSION in the v2 and v3 cases (v4 has already been
taken care of by a similar patch).

Reported-by: Joerg Roedel <joro@&bytes.org>
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 14:38:59 -04:00
Fred Isaman
906369e43c NFS: fix directio refcount bug on commit
This reverts a hunk from commit 0427708657
"NFS: Clean up - Simplify reference counting in fs/nfs/direct.c"

The cleanups in that patch affect the write path, but by the time
processing hits commit the removed reference has been added back by
nfs_scan_commit_list().  Without this reversion, any page that is
sent to commit holds on to an unbalanced reference that is never
freed.  The immediate effect is an imbalance over the wire between
OPENs and CLOSEs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 14:32:45 -04:00
Jan Kara
ead188f9f9 writeback: Fix lock imbalance in writeback_sb_inodes()
Fix bug introduced by 169ebd90.  We have to have wb_list_lock locked when
restarting writeback loop after having waited for inode writeback.

Bug description by Ted Tso:

  I can reproduce this fairly easily by using ext4 w/o a journal, running
  under KVM with 1024megs memory, with fsstress (xfstests #13):

  [   45.153294] =====================================
  [   45.154784] [ BUG: bad unlock balance detected! ]
  [   45.155591] 3.5.0-rc1-00002-gb22b1f1 #124 Not tainted
  [   45.155591] -------------------------------------
  [   45.155591] flush-254:16/2499 is trying to release lock (&(&wb->list_lock)->rlock) at:
  [   45.155591] [<c022c3da>] writeback_sb_inodes+0x160/0x327
  [   45.155591] but there are no more locks to release!

Reported-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-06-09 08:32:15 +09:00
Linus Torvalds
77249539cd ext4 bug fixes for 3.5-rc2
This update contains two bug fixes, both destined for the stable tree.
 Perhaps the most important is one which fixes ext4 when used with file
 systems originally formatted for use with ext3, but then later
 converted to take advantage of ext4.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJP0jyAAAoJENNvdpvBGATwGSIP/R0+4j/SqSncjLIEx6EvrCeY
 eyR9JYb0F44P1bKlZ6WErrupysmhPqHubnbxMg9bVMTxefN/m6tEyG+EG4MwMxFP
 mRtc5EVT0i/9k7U6s4Ey0kSnBnZqx6IBdFUR5lHrvAf7Ud920eY12Cx4oscI7Fj6
 E81lWASOkDTjzsJpzFIqcW7b257Ne1oxPAu1p290W5riSkDzle7qqT5xT+lBAbr0
 YM8ZFhznUdyv2aeuchf/dfys5YdVgSporplIjvP69bRyNr4x5B6QOUGxuX3RNhV3
 Hqf83GW8HycyAEdl4AtakNtxiMPhrIy0S/RamWqBTE2+nws3Wnkd1wN4c5HjZZad
 w5u2E/uSxedZfgGDZZriRwHCgR+2I2CEtMy45gL96o+2Cel6unN+vjVPTVbMpMUy
 3PsPoPjPVmNPLyn4vIfHlEQ5ZLhBzVpXFEsca9s+6JeHQ4dKP77fa8iq5AjzCjzA
 FYvmf1fNbDW1da81fRUOSnwuaC0JuUvOu5eQLLQ3jzz5gj+DvZvQqvleMHTXkQ3X
 hLpSXNTtXUBYeXIw5aJzh5VPJqqGzPkpJhAEbMRgCobRA1HtuAEJzZf4J7+jiWjV
 IrQpw0HIeZiK1oZW8iC8YYbIz8AakU+MnIVhvcPc3rIr5eeEtWPM/9ETx/mrO9s2
 /y0bwu84eQLIctFjzR2s
 =9oyU
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bug fixes from Theodore Ts'o:
 "This update contains two bug fixes, both destined for the stable tree.
  Perhaps the most important is one which fixes ext4 when used with file
  systems originally formatted for use with ext3, but then later
  converted to take advantage of ext4."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: don't set i_flags in EXT4_IOC_SETFLAGS
  ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
2012-06-08 11:15:31 -07:00
Linus Torvalds
e72643088f Fix UBI and UBIFS - they refuse to work without debugfs. This was
broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging
 Kconfig switches.
 
 Also, correct locking in 'ubi_wl_flush()' - it was extended to support
 flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP0jAHAAoJECmIfjd9wqK0qN8P/1JRuky3xsX8o5S5tqVLKkzO
 4naM8bH1AuUemg8f+K+2IUyHp1Mp0lsBj7zA46+3xWXvRcZBJaWgqZFkLOuWSZn0
 ChRGKKhdwRkXokq7JHgUJCrdTLyXsBV+uuBbJ+oTPQa2+7f1mxgZuLc+MJzpoKAs
 sNdE3A2pH3hhXDJKuT6hrM2wqD/2rW5bGurW4sYBxVUJEZ793HeXu30lTffZuWHo
 ClzDtC0iWIS5NmdN0sbQL6Os3p+uiR5mjJek0eABTHAmgXvfBUWpf9ewTXYS2ulK
 zY8brtIy2N8qmXFa8ZyguZKpCdulDqRLPz/2bmX5rOJSsyogoNxHI+UlZFCfq0u/
 txA+Wm4dQZtvSM3Stws6gxuIDCX83wOgHq/gUitMsVkxo+McwevjXpZb4Ha80OVl
 Iw7ERGrbxZn+B+lTh+9jgD6kstqpefO7fBP5gkLRvHToxDKY+6sdH6b3UuE6pDJI
 SA2D1uDyk33A/dyvYZThcvsq7KMdF0mhfLuxkScSfP17hc0hnh/FSzkj3fHrADfy
 oA4Oo/jKFbOFnmByNHjzvOaqeYHVsKwzq+feDcZ7sYA5GNrVMVViLZQljQBdJWo5
 E9howusrb81AGvjCIFobntBLE8DZF5OGfgt/n+60+jo3/qern/4jeuWfGL/Q9Wcw
 x27oVmOBLjuHuG3hXlGt
 =X7QS
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS fixes from Artem Bityutskiy:
 "Fix UBI and UBIFS - they refuse to work without debugfs.  This was
  broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging
  Kconfig switches.

  Also, correct locking in 'ubi_wl_flush()' - it was extended to support
  flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal."

* tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs:
  UBI: correct ubi_wl_flush locking
  UBIFS: fix debugfs-less systems support
  UBI: fix debugfs-less systems support
2012-06-08 11:04:06 -07:00
Linus Torvalds
32ba9c3fca Revert "vfs: stop d_splice_alias creating directory aliases"
This reverts commit 7732a557b1 (and commit
3f50fff4da, which was a follow-up
cleanup).

We're chasing an elusive bug that Dave Jones can apparently reproduce
using his system call fuzzer tool, and that looks like some kind of
locking ordering problem on the directory i_mutex chain.  Our i_mutex
locking is rather complex, and depends on the topological ordering of
the directories, which is why we have been very wary of splicing
directory entries around.

Of course, we really don't want to ever see aliased unconnected
directories anyway, so none of this should ever happen, but this revert
aims to basically get us back to a known older state.

Bruce points to some of the previous discussion at

       http://marc.info/?i=<20110310105821.GE22723@ZenIV.linux.org.uk>

and in particular a long post from Neil:

       http://marc.info/?i=<20110311150749.2fa2be66@notabene.brown>

It should be noted that it's possible that Dave's problems come from
other changes altohgether, including possibly just the fact that Dave
constantly is teachning his fuzzer new tricks.  So what appears to be a
new bug could in fact be an old one that just gets newly triggered, but
reverting these patches as "still under heavy discussion" is the right
thing regardless.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-08 10:34:03 -07:00
Trond Myklebust
2d0dbc6ae8 NFSv4: Fix unnecessary delegation returns in nfs4_do_open
While nfs4_do_open() expects the fmode argument to be restricted to
combinations of FMODE_READ and FMODE_WRITE, both nfs4_atomic_open()
and nfs4_proc_create will pass the nfs_open_context->mode,
which contains the full fmode_t.

This patch ensures that nfs4_do_open strips the other fmode_t bits,
fixing a problem in which the nfs4_do_open call would result in an
unnecessary delegation return.

Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-06-08 11:08:42 -04:00
Benjamin Marzinski
90306c41dc GFS2: Use lvbs for storing rgrp information with mount option
Instead of reading in the resource groups when gfs2 is checking
for free space to allocate from, gfs2 can store the necessary infromation
in the resource group's lvb.  Also, instead of searching for unlinked
inodes in every resource group that's checked for free space, gfs2 can
store the number of unlinked but inodes in the lvb, and only check for
unlinked inodes if it will find some.

The first time a resource group is locked, the lvb must initialized.
Since this involves counting the unlinked inodes in the resource group,
this takes a little extra time.  But after that, if the resource group
is locked with GL_SKIP, the buffer head won't be read in unless it's
actually needed.

Enabling the resource groups lvbs is done via the rgrplvb mount option.  If
this option isn't set, the lvbs will still be set and updated, but they won't
be verfied or used by the filesystem.  To safely turn on this option, all of
the nodes mounting the filesystem must be running code with this patch, and
the filesystem must have been completely unmounted since they were updated.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-08 11:50:01 +01:00
Steven Whitehouse
ba1ddcb6ca GFS2: Cache last hash bucket for glock seq_files
For the glocks and glstats seq_files, which are exposed via debugfs
we should cache the most recent hash bucket, along with the offset
into that bucket. This allows us to restart from that point, rather
than having to begin at the beginning each time.

This is an idea from Eric Dumazet, however I've slightly extended it
so that if the position from which we are due to start is at any
point beyond the last cached point, we start from the last cached
point, plus whatever is the appropriate offset. I don't really expect
people to be lseeking around these files, but if they did so with only
positive offsets, then we'd still get some of the benefit of using a
cached offset.

With my simple test of around 200k entries in the file, I'm seeing
an approx 10x speed up.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-08 11:16:22 +01:00
Linus Torvalds
48d212a2ee Revert "mm: correctly synchronize rss-counters at exit/exec"
This reverts commit 40af1bbdca.

It's horribly and utterly broken for at least the following reasons:

 - calling sync_mm_rss() from mmput() is fundamentally wrong, because
   there's absolutely no reason to believe that the task that does the
   mmput() always does it on its own VM.  Example: fork, ptrace, /proc -
   you name it.

 - calling it *after* having done mmdrop() on it is doubly insane, since
   the mm struct may well be gone now.

 - testing mm against NULL before you call it is insane too, since a
NULL mm there would have caused oopses long before.

.. and those are just the three bugs I found before I decided to give up
looking for me and revert it asap.  I should have caught it before I
even took it, but I trusted Andrew too much.

Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-07 17:54:07 -07:00
Tao Ma
b22b1f178f ext4: don't set i_flags in EXT4_IOC_SETFLAGS
Commit 7990696 uses the ext4_{set,clear}_inode_flags() functions to
change the i_flags automatically but fails to remove the error setting
of i_flags.  So we still have the problem of trashing state flags.
Fix this by removing the assignment.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-06-07 19:04:19 -04:00
Theodore Ts'o
b0dd6b70f0 ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
Ext3 filesystems that are converted to use as many ext4 file system
features as possible will enable uninit_bg to speed up e2fsck times.
These file systems will have a native ext3 layout of inode tables and
block allocation bitmaps (as opposed to ext4's flex_bg layout).
Unfortunately, in these cases, when first allocating a block in an
uninitialized block group, ext4 would incorrectly calculate the number
of free blocks in that block group, and then errorneously report that
the file system was corrupt:

EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd

This problem can be reproduced via:

    mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g
    mount -t ext4 /dev/vdd /mnt
    fallocate -l 4600m /mnt/test

The problem was caused by a bone headed mistake in the check to see if a
particular metadata block was part of the block group.

Many thanks to Kees Cook for finding and bisecting the buggy commit
which introduced this bug (commit fd034a84e1, present since v3.2).

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: stable@kernel.org
2012-06-07 18:56:06 -04:00
Konstantin Khlebnikov
40af1bbdca mm: correctly synchronize rss-counters at exit/exec
mm->rss_stat counters have per-task delta: task->rss_stat.  Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats,
audit and other stuff.  Unfortunately the kernel does this before
calling mm_release(), which can call put_user() for processing
task->clear_child_tid.  So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again.  As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier.  After mm_release() there should
be no pagefaults.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>		[3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-07 14:43:55 -07:00
Trond Myklebust
02c67525cf NFSv4.1: Convert another trivial printk into a dprintk
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 13:45:53 -04:00
Fred Isaman
fb47ddc9d5 NFS4: Fix open bug when pnfs module blacklisted
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 13:44:24 -04:00
Trond Myklebust
f07936f2c4 NFS: Remove incorrect BUG_ON in nfs_found_client
It is perfectly valid for nfs_get_client() to return a nfs_client that
is in the process of setting up the NFSv4.1 session.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 10:21:03 -04:00
Steven Whitehouse
df5d2f5560 GFS2: Increase buffer size for glocks and glstats debugfs files
As per Al Viro's suggestion, this increases the buffer size used
for these two files. This provides a speed up of slightly less than
8x (i.e. proportional to the buffer size) for cases when we have
large numbers of glocks.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-07 13:30:16 +01:00
Artem Bityutskiy
818039c7d5 UBIFS: fix debugfs-less systems support
Commit "f70b7e5 UBIFS: remove Kconfig debugging option" broke UBIFS and it
refuses to initialize if debugfs (CONFIG_DEBUG_FS) is disabled. I incorrectly
assumed that debugfs files creation function will return success if debugfs
is disabled, but they actually return -ENODEV. This patch fixes the issue.

Reported-by: Paul Parsons <lost.distance@yahoo.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Paul Parsons <lost.distance@yahoo.com>
2012-06-07 10:43:54 +03:00
Steve Dickson
f25efd851c NFS: Map minor mismatch error to protocol not support error.
Sservers that only have NFSv4.1 support the
NFS4ERR_MINOR_VERS_MISMATCH error is return on
v4.0 mounts. Mapping that error to EPROTONOSUPPORT
will cause the mount to back off to v3 instead of
failing.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-06 14:32:40 -04:00
Steven Whitehouse
1b8ba31a88 GFS2: Fix error handling when reading an invalid block from the journal
When we read an invalid block from the journal, we should not call
withdraw, but simply print a message and return an error. It is
up to the caller to then handle that error. In the case of mount
that means a failed mount, rather than a withdraw (requiring a
reboot). In the case of recovering another nodes journal then
we return an error via the uevent.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-06 11:27:49 +01:00
Steven Whitehouse
23d0bb834e GFS2: Add "top dir" flag support
This patch adds support for the "top dir" flag. Currently this is unused
but a subsequent patch is planned which will add support for the
Orlov allocation policy when allocating subdirectories in a parent
with this flag set.

In order to ensure backward compatible behaviour, mkfs.gfs2 does
not currently tag the root directory with this flag, it must always be
set manually.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-06 11:27:36 +01:00
Bob Peterson
5407e24229 GFS2: Fold quota data into the reservations struct
This patch moves the ancillary quota data structures into the
block reservations structure. This saves GFS2 some time and
effort in allocating and deallocating the qadata structure.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-06 11:20:22 +01:00
Bob Peterson
0a305e4960 GFS2: Extend the life of the reservations
This patch lengthens the lifespan of the reservations structure for
inodes. Before, they were allocated and deallocated for every write
operation. With this patch, they are allocated when the first write
occurs, and deallocated when the last process closes the file.
It's more efficient to do it this way because it saves GFS2 a lot of
unnecessary allocates and frees. It also gives us more flexibility
for the future: (1) we can now fold the qadata structure back into
the structure and save those alloc/frees, (2) we can use this for
multi-block reservations.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-06 11:17:59 +01:00
Trond Myklebust
9bce008bae NFS: Fix a commit bug
The new commit code fails to copy the verifier into the wb_verf field
of _all_ the nfs_page structures; it only copies it into the first entry.
The consequence is that most requests end up failing to match in
nfs_commit_release.

Fix is to copy the verifier into the req->wb_verf field in
nfs_write_completion.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-06-05 18:38:47 -04:00
Bryan Schumaker
cdf66442fa NFS4: Set parsed mount data version to 4
This patch only affects mounting through "-t nfs4" since it doesn't set
up an nfs version to use in the mount data.  The nfs client was trying
to mount using NFS v0, causing either a BUG() or a protocol not
supported message.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-05 15:50:47 -04:00
Linus Torvalds
f9ba7179ce Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix blksize calculation
  fuse: fix stat call on 32 bit platforms
  fuse: optimize fallocate on permanent failure
  fuse: add FALLOCATE operation
  fuse: Convert to kstrtoul_from_user
2012-06-05 10:11:11 -07:00
Trond Myklebust
bda197f5d7 NFSv4.1: Ensure we clear session state flags after a session creation
Both nfs4_reset_session and nfs41_init_clientid need to clear all the
session related state flags on success.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-05 10:22:14 -04:00
Trond Myklebust
08106ac7c8 NFSv4.1: Convert a trivial printk into a dprintk
There is no need to bug the user about the server returning an error
on destroy_session. The error will be handled by the state manager,
without any need for further input from anyone else.
So convert that printk into a debugging dprintk.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-05 10:08:24 -04:00
Trond Myklebust
029c534737 NFSv4: Fix up decode_attr_mdsthreshold
Fix an incorrect use of 'likely()'. The FATTR4_WORD2_MDSTHRESHOLD
bit is only expected in NFSv4.1 OPEN calls, and so is actually
rather _unlikely_.

decode_attr_mdsthreshold needs to clear FATTR4_WORD2_MDSTHRESHOLD
from the attribute bitmap after it has decoded the data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
2012-06-05 10:00:47 -04:00
Trond Myklebust
1549210fcc NFSv4: Fix an Oops in the open recovery code
The open recovery code does not need to request a new value for the
mdsthreshold, and so does not allocate a struct nfs4_threshold.
The problem is that encode_getfattr_open() will still request an
mdsthreshold, and so we end up Oopsing in decode_attr_mdsthreshold.

This patch fixes encode_getfattr_open so that it doesn't request an
mdsthreshold when the caller isn't asking for one. It also fixes
decode_attr_mdsthreshold so that it errors if the server returns
an mdsthreshold that we didn't ask for (instead of Oopsing).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
2012-06-05 10:00:14 -04:00
Linus Torvalds
bf2785a818 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French.

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Move get_next_mid to ops struct
  CIFS: Make accessing is_valid_oplock/dump_detail ops struct field safe
  CIFS: Improve identation in cifs_unlock_range
  CIFS: Fix possible wrong memory allocation
2012-06-04 15:00:58 -07:00
Linus Torvalds
0640113be2 vfs: Fix /proc/<tid>/fdinfo/<fd> file handling
Cyrill Gorcunov reports that I broke the fdinfo files with commit
30a08bf2d3 ("proc: move fd symlink i_mode calculations into
tid_fd_revalidate()"), and he's quite right.

The tid_fd_revalidate() function is not just used for the <tid>/fd
symlinks, it's also used for the <tid>/fdinfo/<fd> files, and the
permission model for those are different.

So do the dynamic symlink permission handling just for symlinks, making
the fdinfo files once more appear as the proper regular files they are.

Of course, Al Viro argued (probably correctly) that we shouldn't do the
symlink permission games at all, and make the symlinks always just be
the normal 'lrwxrwxrwx'.  That would have avoided this issue too, but
since somebody noticed that the permissions had changed (which was the
reason for that original commit 30a08bf2d3 in the first place), people
do apparently use this feature.

[ Basically, you can use the symlink permission data as a cheap "fdinfo"
  replacement, since you see whether the file is open for reading and/or
  writing by just looking at st_mode of the symlink.  So the feature
  does make sense, even if the pain it has caused means we probably
  shouldn't have done it to begin with. ]

Reported-and-tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-04 11:00:45 -07:00
Jan Schmidt
4d5a0565ce Btrfs: remove call to btrfs_header_nritems with no effect
This is a leftover from cleanup patch 559af821. Before the cleanup,
btrfs_header_nritems was called inside an if condition. As it has no side
effects we need to preserve here, it should simply be dropped.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-04 14:35:29 +02:00
Linus Torvalds
829f51dbd8 Merge branch 'akpm' (Fixups for Andrew's patchbomb)
Merge fixups for the mac NLS tables from Andrew.

* emailed from Andrew Morton, and one cleanup by me:
  nls: fix (and rename) mac NLS table files and config options
  fs/nls/Makefile: remove bogus CONFIG_ assignments
2012-06-01 19:56:23 -07:00
Linus Torvalds
8b8c0daa2c nls: fix (and rename) mac NLS table files and config options
The config options in the Kconfig file (with _CODEPAGE_ in the name)
didn't match the config option name in the Makefile (no _CODEPAGE_).

And both of them were of the hard-to-read MACXYZZY variety, which made
them hard to parse for normal humans: MACROMAN easily reads as "macro
man", not as "Mac Roman".

So rename the options to be consistent, and be NLS_MAC_xyzzy.  Rename
the files to be mac-xyzzy.c too, and drop the "nls" part entirely (it's
already in the directory name).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-01 19:51:22 -07:00
Andrew Morton
92a8956364 fs/nls/Makefile: remove bogus CONFIG_ assignments
These were debug things which snuck through.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Cc: Vladimir Serbinenko <phcoder@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-01 19:47:26 -07:00
Linus Torvalds
f5e7e844a5 - More robust parsing especially of xattr data in JFFS2
- Updates to mxc_nand and gpmi drivers to support new boards and device tree
  - Improve consistency of information about ECC strength in NAND devices
  - Clean up partition handling of plat_nand
  - Support NAND drivers without dedicated access to OOB area
  - BCH hardware ECC support for OMAP
  - Other fixes and cleanups, and a few new device IDs
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iEYEABECAAYFAk/JG1wACgkQdwG7hYl686M80wCglN4kutx20j+KJWuZofkr9Hog
 weEAoI4jrqEWEdW9EcT2CIWQw7eG+1v+
 =7tdo
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd

Pull mtd update from David Woodhouse:
 - More robust parsing especially of xattr data in JFFS2
 - Updates to mxc_nand and gpmi drivers to support new boards and device tree
 - Improve consistency of information about ECC strength in NAND devices
 - Clean up partition handling of plat_nand
 - Support NAND drivers without dedicated access to OOB area
 - BCH hardware ECC support for OMAP
 - Other fixes and cleanups, and a few new device IDs

Fixed trivial conflict in drivers/mtd/nand/gpmi-nand/gpmi-nand.c due to
added include files next to each other.

* tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd: (75 commits)
  mtd: mxc_nand: move ecc strengh setup before nand_scan_tail
  mtd: block2mtd: fix recursive call of mtd_writev
  mtd: gpmi-nand: define ecc.strength
  mtd: of_parts: fix breakage in Kconfig
  mtd: nand: fix scan_read_raw_oob
  mtd: docg3 fix in-middle of blocks reads
  mtd: cfi_cmdset_0002: Slight cleanup of fixup messages
  mtd: add fixup for S29NS512P NOR flash.
  jffs2: allow to complete xattr integrity check on first GC scan
  jffs2: allow to discriminate between recoverable and non-recoverable errors
  mtd: nand: omap: add support for hardware BCH ecc
  ARM: OMAP3: gpmc: add BCH ecc api and modes
  mtd: nand: check the return code of 'read_oob/read_oob_raw'
  mtd: nand: remove 'sndcmd' parameter of 'read_oob/read_oob_raw'
  mtd: m25p80: Add support for Winbond W25Q80BW
  jffs2: get rid of jffs2_sync_super
  jffs2: remove unnecessary GC pass on sync
  jffs2: remove unnecessary GC pass on umount
  jffs2: remove lock_super
  mtd: gpmi: add gpmi support for mx6q
  ...
2012-06-01 16:55:42 -07:00
Linus Torvalds
86c47b70f6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull third pile of signal handling patches from Al Viro:
 "This time it's mostly helpers and conversions to them; there's a lot
  of stuff remaining in the tree, but that'll either go in -rc2
  (isolated bug fixes, ideally via arch maintainers' trees) or will sit
  there until the next cycle."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  x86: get rid of calling do_notify_resume() when returning to kernel mode
  blackfin: check __get_user() return value
  whack-a-mole with TIF_FREEZE
  FRV: Optimise the system call exit path in entry.S [ver #2]
  FRV: Shrink TIF_WORK_MASK [ver #2]
  FRV: Prevent syscall exit tracing and notify_resume at end of kernel exceptions
  new helper: signal_delivered()
  powerpc: get rid of restore_sigmask()
  most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set
  set_restore_sigmask() is never called without SIGPENDING (and never should be)
  TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set
  don't call try_to_freeze() from do_signal()
  pull clearing RESTORE_SIGMASK into block_sigmask()
  sh64: failure to build sigframe != signal without handler
  openrisc: tracehook_signal_handler() is supposed to be called on success
  new helper: sigmask_to_save()
  new helper: restore_saved_sigmask()
  new helpers: {clear,test,test_and_clear}_restore_sigmask()
  HAVE_RESTORE_SIGMASK is defined on all architectures now
2012-06-01 11:53:44 -07:00
Pavel Shilovsky
8825736060 CIFS: Move get_next_mid to ops struct
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-01 12:35:19 -05:00
Pavel Shilovsky
7f0adb53bc CIFS: Make accessing is_valid_oplock/dump_detail ops struct field safe
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-01 12:35:16 -05:00
Pavel Shilovsky
ea319d57d3 CIFS: Improve identation in cifs_unlock_range
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-01 12:35:12 -05:00
Pavel Shilovsky
0013fb4ca3 CIFS: Fix possible wrong memory allocation
when cifs_reconnect sets maxBuf to 0 and we try to calculate a size
of memory we need to store locks.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-01 12:35:08 -05:00
Linus Torvalds
1193755ac6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs changes from Al Viro.
 "A lot of misc stuff.  The obvious groups:
   * Miklos' atomic_open series; kills the damn abuse of
     ->d_revalidate() by NFS, which was the major stumbling block for
     all work in that area.
   * ripping security_file_mmap() and dealing with deadlocks in the
     area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
     general.
   * ->encode_fh() switched to saner API; insane fake dentry in
     mm/cleancache.c gone.
   * assorted annotations in fs (endianness, __user)
   * parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
   * ->update_time() work from Josef.
   * other bits and pieces all over the place.

  Normally it would've been in two or three pull requests, but
  signal.git stuff had eaten a lot of time during this cycle ;-/"

Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
  nfs: don't open in ->d_revalidate
  vfs: retry last component if opening stale dentry
  vfs: nameidata_to_filp(): don't throw away file on error
  vfs: nameidata_to_filp(): inline __dentry_open()
  vfs: do_dentry_open(): don't put filp
  vfs: split __dentry_open()
  vfs: do_last() common post lookup
  vfs: do_last(): add audit_inode before open
  vfs: do_last(): only return EISDIR for O_CREAT
  vfs: do_last(): check LOOKUP_DIRECTORY
  vfs: do_last(): make ENOENT exit RCU safe
  vfs: make follow_link check RCU safe
  vfs: do_last(): use inode variable
  vfs: do_last(): inline walk_component()
  vfs: do_last(): make exit RCU safe
  vfs: split do_lookup()
  Btrfs: move over to use ->update_time
  fs: introduce inode operation ->update_time
  reiserfs: get rid of resierfs_sync_super
  reiserfs: mark the superblock as dirty a bit later
  ...
2012-06-01 10:34:35 -07:00
Linus Torvalds
4edebed866 Ext4 updates for 3.5
The major new feature added in this update is Darrick J. Wong's
 metadata checksum feature, which adds crc32 checksums to ext4's
 metadata fields.  There is also the usual set of cleanups and bug
 fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJPyNleAAoJENNvdpvBGATwtLMP/i3WsPyTvxmYP6HttHXQb8Jk
 GYCoTQ5bZMuTbOwOGg3w137cXWBv5uuPpxIk79YVLHSWx6HuanlGIa7/VnPKIaLu
 2ihuvVfnrDqpwQ4MJaSq4R1Eka9JCwZ7HbYYo+fYOVobxgw588JVV9VVI9EdKRGz
 z11UkW8iHE0f6Xa5gOhdAMkR0uaPnxwJX/qHZYiHuognRivuwMglqWJSiMr8nQmo
 A2GmeoLehhW+k65IqgTCmSW6ZgFTvZdk6bskQIij3fOYHW3hHn/gcLFtmLTIZ/B5
 LZdg/lngPYve+R/UyypliGKi+pv1qNEiTiBm0rrBgsdZFkBdGj0soSvGZzeK+Mp4
 Q1vAmOBPYPFzs6nVzPst2n/osryyykFCK6TgSGZ50dosJ0NO8cBeDdX/gh9JKD2R
 yQUMUltOCCSj/eWU4iwqZ0T3FXRiH/+S3XMHznoKJiwUyGDBNQy4+Yg2k2WzUXrz
 Cu5t5BwNG2WNP7y5Et/wmUIzpC7VPId4qYmGyHe7OwTxSJgW+6f7GVkHfjWcDMuv
 pGgEUiInbMmLajP3v2/LKfVU4hXLZy4uJbhoBgDdeIpZrnPifJG/MwDOS4W+dLVT
 tDzgO1SAh3/E4jATreZ5bjzD/HGsfe1OX09UH3Pbc1EcgkrLnyrQXFwdHshdVu4A
 cxMoKNPVCQJySb1UrLkO
 =SdJJ
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull Ext4 updates from Theodore Ts'o:
 "The major new feature added in this update is Darrick J Wong's
  metadata checksum feature, which adds crc32 checksums to ext4's
  metadata fields.

  There is also the usual set of cleanups and bug fixes."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
  ext4: hole-punch use truncate_pagecache_range
  jbd2: use kmem_cache_zalloc wrapper instead of flag
  ext4: remove mb_groups before tearing down the buddy_cache
  ext4: add ext4_mb_unload_buddy in the error path
  ext4: don't trash state flags in EXT4_IOC_SETFLAGS
  ext4: let getattr report the right blocks in delalloc+bigalloc
  ext4: add missing save_error_info() to ext4_error()
  ext4: add debugging trigger for ext4_error()
  ext4: protect group inode free counting with group lock
  ext4: use consistent ssize_t type in ext4_file_write()
  ext4: fix format flag in ext4_ext_binsearch_idx()
  ext4: cleanup in ext4_discard_allocated_blocks()
  ext4: return ENOMEM when mounts fail due to lack of memory
  ext4: remove redundundant "(char *) bh->b_data" casts
  ext4: disallow hard-linked directory in ext4_lookup
  ext4: fix potential integer overflow in alloc_flex_gd()
  ext4: remove needs_recovery in ext4_mb_init()
  ext4: force ro mount if ext4_setup_super() fails
  ext4: fix potential NULL dereference in ext4_free_inodes_counts()
  ext4/jbd2: add metadata checksumming to the list of supported features
  ...
2012-06-01 10:12:15 -07:00
Al Viro
754421c8ca HAVE_RESTORE_SIGMASK is defined on all architectures now
Everyone either defines it in arch thread_info.h or has TIF_RESTORE_SIGMASK
and picks default set_restore_sigmask() in linux/thread_info.h.  Kill the
ifdefs, slap #error in linux/thread_info.h to catch breakage when new ones
get merged.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:58:46 -04:00
Miklos Szeredi
0ef97dcfce nfs: don't open in ->d_revalidate
NFSv4 can't do reliable opens in d_revalidate, since it cannot know whether a
mount needs to be followed or not.  It does check d_mountpoint() on the dentry,
which can result in a weird error if the VFS found that the mount does not in
fact need to be followed, e.g.:

  # mount --bind /mnt/nfs /mnt/nfs-clone
  # echo something > /mnt/nfs/tmp/bar
  # echo x > /tmp/file
  # mount --bind /tmp/file /mnt/nfs-clone/tmp/bar
  # cat  /mnt/nfs/tmp/bar
  cat: /mnt/nfs/tmp/bar: Not a directory

Which should, by any sane filesystem, result in "something" being printed.

So instead do the open in f_op->open() and in the unlikely case that the cached
dentry turned out to be invalid, drop the dentry and return EOPENSTALE to let
the VFS retry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:02 -04:00
Miklos Szeredi
16b1c1cd71 vfs: retry last component if opening stale dentry
NFS optimizes away d_revalidates for last component of open.  This means that
open itself can find the dentry stale.

This patch allows the filesystem to return EOPENSTALE and the VFS will retry the
lookup on just the last component if possible.

If the lookup was done using RCU mode, including the last component, then this
is not possible since the parent dentry is lost.  In this case fall back to
non-RCU lookup.  Currently this is not used since NFS will always leave RCU
mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:01 -04:00
Miklos Szeredi
50ee93afca vfs: nameidata_to_filp(): don't throw away file on error
If open fails, don't put the file.  This allows it to be reused if open needs to
be retried.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:01 -04:00
Miklos Szeredi
91daee988d vfs: nameidata_to_filp(): inline __dentry_open()
Copy __dentry_open() into nameidata_to_filp().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:01 -04:00
Miklos Szeredi
78f71eff3c vfs: do_dentry_open(): don't put filp
Move put_filp() out to __dentry_open(), the only caller now.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:00 -04:00
Miklos Szeredi
90ad1a8ecb vfs: split __dentry_open()
Split __dentry_open() into two functions:

  do_dentry_open() - does most of the actual work, doesn't put file on failure
  open_check_o_direct() - after a successful open, checks direct_IO method

This will allow i_op->atomic_open to do just the file initialization and leave
the direct_IO checking to the VFS.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:00 -04:00
Miklos Szeredi
5f5daac12a vfs: do_last() common post lookup
Now the post lookup code can be shared between O_CREAT and plain opens since
they are essentially the same.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:00 -04:00
Miklos Szeredi
d7fdd7f6e1 vfs: do_last(): add audit_inode before open
This allows this code to be shared between O_CREAT and plain opens.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:59 -04:00
Miklos Szeredi
050ac841ea vfs: do_last(): only return EISDIR for O_CREAT
This allows this code to be shared between O_CREAT and plain opens.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:59 -04:00
Miklos Szeredi
af2f55426d vfs: do_last(): check LOOKUP_DIRECTORY
Check for ENOTDIR before finishing open.  This allows this code to be shared
between O_CREAT and plain opens.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:58 -04:00
Miklos Szeredi
54c33e7f95 vfs: do_last(): make ENOENT exit RCU safe
This will allow this code to be used in RCU mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:58 -04:00
Miklos Szeredi
d45ea86792 vfs: make follow_link check RCU safe
This will allow this code to be used in RCU mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:58 -04:00
Miklos Szeredi
decf340087 vfs: do_last(): use inode variable
Use helper variable instead of path->dentry->d_inode before complete_walk().
This will allow this code to be used in RCU mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:57 -04:00
Miklos Szeredi
a1eb331530 vfs: do_last(): inline walk_component()
Copy walk_component() into do_lookup().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:57 -04:00
Miklos Szeredi
e276ae672f vfs: do_last(): make exit RCU safe
Allow returning from do_last() with LOOKUP_RCU still set on the "out:" and
"exit:" labels.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:57 -04:00
Miklos Szeredi
697f514df1 vfs: split do_lookup()
Split do_lookup() into two functions:

  lookup_fast() - does cached lookup without i_mutex
  lookup_slow() - does lookup with i_mutex

Both follow managed dentries.

The new functions are needed by atomic_open.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:56 -04:00
Josef Bacik
e41f941a23 Btrfs: move over to use ->update_time
Btrfs had been doing it's own file_update_time so we could catch ENOSPC
properly, so just update our btrfs_update_time to work with the new stuff and
then we'll be fancy later.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-01 12:07:52 -04:00
Josef Bacik
c3b2da3148 fs: introduce inode operation ->update_time
Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail.  We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates.  So introduce
->update_time, where we will deal with i_version an a/m/c time updates and
indicate which changes need to be made.  The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-01 12:07:25 -04:00
Linus Torvalds
51eab603f5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This includes a fairly large change from Josef around data writeback
  completion.  Before, the writeback wasn't completed until the metadata
  insertions for the extent were done, and this made for fairly large
  latency spikes on the last page of each ordered extent.

  We already had a separate mechanism for tracking pending metadata
  insertions, so Josef just needed to tweak things a little to end
  writeback earlier on the page.  Overall it makes us much friendly to
  memory reclaim and lowers latencies quite a lot for synchronous IO.

  Jan Schmidt has finished some background work required to track btree
  blocks as they go through changes in ownership.  It's the missing
  piece he needed for both btrfs send/receive and subvolume quotas.
  Neither of those are ready yet, but the new tracking code is included
  here.  Most of the time, the new code is off.  It is only used by
  scrub and other backref walkers.

  Stefan Behrens has added io failure tracking.  This includes counters
  for which drives are causing the most trouble so the admin (or an
  automated tool) can choose to kick them out.  We're tracking IO
  errors, crc errors, and generation checks we do on each metadata
  block.

  RAID5/6 did miss the cut this time because I'm having trouble with
  corruptions.  I'll nail it down next week and post as a beta testing
  before 3.6"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (58 commits)
  Btrfs: fix tree mod log rewinded level and rewinding of moved keys
  Btrfs: fix tree mod log del_ptr
  Btrfs: add tree_mod_dont_log helper
  Btrfs: add missing spin_lock for insertion into tree mod log
  Btrfs: add inodes before dropping the extent lock in find_all_leafs
  Btrfs: use delayed ref sequence numbers for all fs-tree updates
  Btrfs: fix false positive in check-integrity on unmount
  Btrfs: fix runtime warning in check-integrity check data mode
  Btrfs: set ioprio of scrub readahead to idle
  Btrfs: fix return code in drop_objectid_items
  Btrfs: check to see if the inode is in the log before fsyncing
  Btrfs: return value of btrfs_read_buffer is checked correctly
  Btrfs: read device stats on mount, write modified ones during commit
  Btrfs: add ioctl to get and reset the device stats
  Btrfs: add device counters for detected IO and checksum errors
  btrfs: Drop unused function btrfs_abort_devices()
  Btrfs: fix the same inode id problem when doing auto defragment
  Btrfs: fall back to non-inline if we don't have enough space
  Btrfs: fix how we deal with the orphan block rsv
  Btrfs: convert the inode bit field to use the actual bit operations
  ...
2012-06-01 08:37:31 -07:00
Linus Torvalds
419f431949 Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linux
Pull the rest of the nfsd commits from Bruce Fields:
 "... and then I cherry-picked the remainder of the patches from the
  head of my previous branch"

This is the rest of the original nfsd branch, rebased without the
delegation stuff that I thought really needed to be redone.

I don't like rebasing things like this in general, but in this situation
this was the lesser of two evils.

* 'for-3.5' of git://linux-nfs.org/~bfields/linux: (50 commits)
  nfsd4: fix, consolidate client_has_state
  nfsd4: don't remove rebooted client record until confirmation
  nfsd4: remove some dprintk's and a comment
  nfsd4: return "real" sequence id in confirmed case
  nfsd4: fix exchange_id to return confirm flag
  nfsd4: clarify that renewing expired client is a bug
  nfsd4: simpler ordering of setclientid_confirm checks
  nfsd4: setclientid: remove pointless assignment
  nfsd4: fix error return in non-matching-creds case
  nfsd4: fix setclientid_confirm same_cred check
  nfsd4: merge 3 setclientid cases to 2
  nfsd4: pull out common code from setclientid cases
  nfsd4: merge last two setclientid cases
  nfsd4: setclientid/confirm comment cleanup
  nfsd4: setclientid remove unnecessary terms from a logical expression
  nfsd4: move rq_flavor into svc_cred
  nfsd4: stricter cred comparison for setclientid/exchange_id
  nfsd4: move principal name into svc_cred
  nfsd4: allow removing clients not holding state
  nfsd4: rearrange exchange_id logic to simplify
  ...
2012-06-01 08:32:58 -07:00
Artem Bityutskiy
033369d1af reiserfs: get rid of resierfs_sync_super
This patch stops reiserfs using the VFS 'write_super()' method along with the
s_dirt flag, because they are on their way out.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblock using the '->write_super()' call-back.  But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Artem Bityutskiy
5c5fd81962 reiserfs: mark the superblock as dirty a bit later
The 'journal_mark_dirty()' function currently first marks the superblock as
dirty by setting 's_dirt' to 1, then does various sanity checks and returns,
then actuall does all the magic with the journal.

This is not an ideal order, though. It makes more sense to first do all the
checks, then do all the internal stuff, and at the end notify the VFS that the
superblock is now dirty.

This patch moves the 's_dirt = 1' assignment from the very beginning of this
function to the very end.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Artem Bityutskiy
717f03c4d7 reiserfs: remove useless superblock dirtying
The 'reiserfs_resize()' function marks the superblock as dirty by assigning 1
to 's_dirt' and then calls 'journal_mark_dirty()' which does the same. Thus,
we can remove the assignment from 'reiserfs_resize()'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Artem Bityutskiy
25729b0e94 reiserfs: clean-up function return type
Turn 'reiserfs_flush_old_commits()' into a void function because the callers
do not cares about what it returns anyway.

We are going to remove the 'sb->s_dirt' field completely and this patch is a
small step towards this direction.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Artem Bityutskiy
efaa33eb13 reiserfs: cleanup reiserfs_fill_super a bit
We have the reiserfs superblock pointer in the 'sbi' variable in this
function, no need to use the 'REISERFS_SB(s)' macro which is the same.
This is jut a small clean-up.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Al Viro
e3fc629d7b switch aio and shm to do_mmap_pgoff(), make do_mmap() static
after all, 0 bytes and 0 pages is the same thing...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:17 -04:00
Hugh Dickins
5e44f8c374 ext4: hole-punch use truncate_pagecache_range
When truncating a file, we unmap pages from userspace first, as that's
usually more efficient than relying, page by page, on the fallback in
truncate_inode_page() - particularly if the file is mapped many times.

Do the same when punching a hole: 3.4 added truncate_pagecache_range()
to do the unmap and trunc, so use it in ext4_ext_punch_hole(), instead
of calling truncate_inode_pages_range() directly.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-06-01 00:15:28 -04:00
Wanlong Gao
b2f4edb335 jbd2: use kmem_cache_zalloc wrapper instead of flag
Use kmem_cache_zalloc wrapper instead of flag __GFP_ZERO.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-06-01 00:10:32 -04:00
Salman Qazi
95599968d1 ext4: remove mb_groups before tearing down the buddy_cache
We can't have references held on pages in the s_buddy_cache while we are
trying to truncate its pages and put the inode.  All the pages must be
gone before we reach clear_inode.  This can only be gauranteed if we
can prevent new users from grabbing references to s_buddy_cache's pages.

The original bug can be reproduced and the bug fix can be verified by:

while true; do mount -t ext4 /dev/ram0 /export/hda3/ram0; \
	umount /export/hda3/ram0; done &

while true; do cat /proc/fs/ext4/ram0/mb_groups; done

Signed-off-by: Salman Qazi <sqazi@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-31 23:52:14 -04:00
Salman Qazi
02b7831019 ext4: add ext4_mb_unload_buddy in the error path
ext4_free_blocks fails to pair an ext4_mb_load_buddy with a matching
ext4_mb_unload_buddy when it fails a memory allocation.

Signed-off-by: Salman Qazi <sqazi@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-31 23:51:27 -04:00
Theodore Ts'o
79906964a1 ext4: don't trash state flags in EXT4_IOC_SETFLAGS
In commit 353eb83c we removed i_state_flags with 64-bit longs, But
when handling the EXT4_IOC_SETFLAGS ioctl, we replace i_flags
directly, which trashes the state flags which are stored in the high
32-bits of i_flags on 64-bit platforms.  So use the the
ext4_{set,clear}_inode_flags() functions which use atomic bit
manipulation functions instead.

Reported-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-31 23:46:01 -04:00
Tao Ma
9660755100 ext4: let getattr report the right blocks in delalloc+bigalloc
In delayed allocation, i_reserved_data_blocks now indicates
clusters, not blocks. So report it in the right number.

This can be easily exposed by the following command:
echo foo > blah; du -hc blah; sync; du -hc blah

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-31 22:54:16 -04:00
Linus Torvalds
a00b6151a2 Merge branch 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields.

* 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux: (23 commits)
  nfsd: trivial: use SEEK_SET instead of 0 in vfs_llseek
  SUNRPC: split upcall function to extract reusable parts
  nfsd: allocate id-to-name and name-to-id caches in per-net operations.
  nfsd: make name-to-id cache allocated per network namespace context
  nfsd: make id-to-name cache allocated per network namespace context
  nfsd: pass network context to idmap init/exit functions
  nfsd: allocate export and expkey caches in per-net operations.
  nfsd: make expkey cache allocated per network namespace context
  nfsd: make export cache allocated per network namespace context
  nfsd: pass pointer to export cache down to stack wherever possible.
  nfsd: pass network context to export caches init/shutdown routines
  Lockd: pass network namespace to creation and destruction routines
  NFSd: remove hard-coded dereferences to name-to-id and id-to-name caches
  nfsd: pass pointer to expkey cache down to stack wherever possible.
  nfsd: use hash table from cache detail in nfsd export seq ops
  nfsd: pass svc_export_cache pointer as private data to "exports" seq file ops
  nfsd: use exp_put() for svc_export_cache put
  nfsd: use cache detail pointer from svc_export structure on cache put
  nfsd: add link to owner cache detail to svc_export structure
  nfsd: use passed cache_detail pointer expkey_parse()
  ...
2012-05-31 18:18:11 -07:00
Linus Torvalds
08615d7d85 Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc patches from Andrew Morton:

 - the "misc" tree - stuff from all over the map

 - checkpatch updates

 - fatfs

 - kmod changes

 - procfs

 - cpumask

 - UML

 - kexec

 - mqueue

 - rapidio

 - pidns

 - some checkpoint-restore feature work.  Reluctantly.  Most of it
   delayed a release.  I'm still rather worried that we don't have a
   clear roadmap to completion for this work.

* emailed from Andrew Morton <akpm@linux-foundation.org>: (78 patches)
  kconfig: update compression algorithm info
  c/r: prctl: add ability to set new mm_struct::exe_file
  c/r: prctl: extend PR_SET_MM to set up more mm_struct entries
  c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
  syscalls, x86: add __NR_kcmp syscall
  fs, proc: introduce /proc/<pid>/task/<tid>/children entry
  sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE
  aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector()
  eventfd: change int to __u64 in eventfd_signal()
  fs/nls: add Apple NLS
  pidns: make killed children autoreap
  pidns: use task_active_pid_ns in do_notify_parent
  rapidio/tsi721: add DMA engine support
  rapidio: add DMA engine support for RIO data transfers
  ipc/mqueue: add rbtree node caching support
  tools/selftests: add mq_perf_tests
  ipc/mqueue: strengthen checks on mqueue creation
  ipc/mqueue: correct mq_attr_ok test
  ipc/mqueue: improve performance of send/recv
  selftests: add mq_open_tests
  ...
2012-05-31 18:10:18 -07:00
Cyrill Gorcunov
5b172087f9 c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
We would like to have an ability to restore command line arguments and
program environment pointers but first we need to obtain them somehow.
Thus we put these values into /proc/$pid/stat.  The exit_code is needed to
restore zombie tasks.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:32 -07:00