Commit Graph

31943 Commits

Author SHA1 Message Date
Al Viro
956ce2083c [readdir] convert ntfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:48 +04:00
Al Viro
bfee7169c0 [readdir] convert isofs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:47 +04:00
Al Viro
0312fa7ccd [readdir] convert jffs2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:47 +04:00
Al Viro
6f7f231e7b [readdir] convert f2fs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:46 +04:00
Al Viro
8f29843a51 [readdir] convert 9p
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:45 +04:00
Al Viro
0edf977d2a [readdir] convert affs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:44 +04:00
Al Viro
2638ffbac9 [readdir] convert adfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:44 +04:00
Al Viro
46d0733801 [readdir] convert logfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:43 +04:00
Al Viro
070a0ebf42 [readdir] convert jfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:42 +04:00
Al Viro
77acfa29e1 [readdir] convert ceph
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:41 +04:00
Al Viro
23db862060 [readdir] convert nfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:40 +04:00
Al Viro
725bebb278 [readdir] convert ext4
and trim the living hell out of bogosities in inline dir case

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:40 +04:00
Al Viro
4deb398a1b [readdir] convert qnx6
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:39 +04:00
Al Viro
663f4deca7 [readdir] convert qnx4
... and use strnlen() instead of strlen() - it's done on untrusted data,
after all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:38 +04:00
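The strnlen() remark in 663f4deca7 can be illustrated in userspace. This is a minimal sketch, assuming a hypothetical fixed-width name field; `struct demo_entry`, `DEMO_NAME_MAX`, and `demo_name_len()` are invented for illustration and are not the qnx4 on-disk layout or code.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_MAX 16  /* hypothetical field width, not the qnx4 value */

/* Hypothetical on-disk entry: a name may fill the whole field with no NUL. */
struct demo_entry {
	char name[DEMO_NAME_MAX];
};

/*
 * Length of the name, never reading past the end of the field.
 * strlen() would keep scanning into whatever follows the field
 * when the on-disk data carries no terminator.
 */
static size_t demo_name_len(const struct demo_entry *de)
{
	assert(de != NULL);
	return strnlen(de->name, sizeof(de->name));
}
```

A name that fills the field without a terminator yields `DEMO_NAME_MAX` instead of an out-of-bounds read.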
Al Viro
9fd4d05949 [readdir] convert omfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:37 +04:00
Al Viro
1616abe841 [readdir] convert nilfs2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:36 +04:00
Al Viro
d55fea8ddb [readdir] convert sysfs
get rid of the kludges in sysfs_readdir()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:36 +04:00
Al Viro
d81a8ef598 [readdir] convert gfs2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:35 +04:00
Al Viro
75811d4fda [readdir] convert exofs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:34 +04:00
Al Viro
81b9f66e6b [readdir] convert bfs
... and get rid of that ridiculous mutex in bfs_readdir()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:33 +04:00
Al Viro
f0c3b5093a [readdir] convert procfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:32 +04:00
Al Viro
68c6147113 [readdir] convert openpromfs
what the hell is op_mutex for, BTW?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:32 +04:00
Al Viro
7aa123a0dc [readdir] convert efs
* sanity checks belong before risky operation, not after it
* don't quit as soon as we'd found an entry

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:31 +04:00
Al Viro
52018855e6 [readdir] convert configfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:30 +04:00
Al Viro
3903b38ce7 [readdir] convert romfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:29 +04:00
Al Viro
5f6039ce69 [readdir] convert squashfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:28 +04:00
Al Viro
01122e0688 [readdir] convert ubifs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:25 +04:00
Al Viro
5add2ee198 [readdir] convert udf
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:50 +04:00
Al Viro
5ded75ec4c [readdir] convert ext3
new helper: dir_relax(inode).  Call when you are in a location that will
_not_ be invalidated by directory modifications (block boundary, in case
of ext*).  Returns whether the directory has survived (dropping i_mutex
allows rmdir to kill the sucker; if it returns false to us, ->iterate()
is obviously done)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:49 +04:00
Al Viro
5f99f4e79a [readdir] switch dcache_readdir() users to ->iterate()
new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx),
dir_emit_dots(file, ctx).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:48 +04:00
Al Viro
80886298c0 [readdir] simple local unixlike: switch to ->iterate()
ext2, ufs, minix, sysv

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:47 +04:00
Al Viro
bb6f619b3a [readdir] introduce ->iterate(), ctx->pos, dir_emit()
New method - ->iterate(file, ctx).  That's the replacement for ->readdir();
it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
calls dir_emit(ctx, ...) instead of filldir(data, ...).  It does *not*
update file->f_pos (or look at it, for that matter); iterate_dir() does the
update.

Note that dir_emit() takes the offset from ctx->pos (and eventually
filldir_t will lose that argument).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:47 +04:00
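The division of labour described in bb6f619b3a and 5c0ba4e076 can be sketched in userspace. This is only an illustration, not the kernel code: every `demo_` name is invented, the struct layouts are assumptions, and the fixed three-entry directory is made up. The points it mirrors are that the callback lives in the context, the iterate step works on `ctx->pos` only, and the wrapper is the one place that reads and writes `f_pos`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_dir_context;
typedef int (*demo_actor_t)(struct demo_dir_context *ctx,
			    const char *name, long pos);

/* Stand-in for struct dir_context: callback plus position. */
struct demo_dir_context {
	demo_actor_t actor;
	long pos;
};

/* Stand-in for struct file; only the position matters here. */
struct demo_file {
	long f_pos;
};

/* dir_emit() analogue: the offset comes from ctx->pos, not from the caller. */
static bool demo_dir_emit(struct demo_dir_context *ctx, const char *name)
{
	assert(ctx->actor != NULL);
	return ctx->actor(ctx, name, ctx->pos) == 0;
}

static const char *const demo_names[] = { "a", "b", "c" };
#define DEMO_NR 3

/* ->iterate() analogue: neither reads nor updates file->f_pos. */
static int demo_iterate(struct demo_file *file, struct demo_dir_context *ctx)
{
	(void)file;
	while (ctx->pos < DEMO_NR) {
		if (!demo_dir_emit(ctx, demo_names[ctx->pos]))
			break;		/* callback asked us to stop */
		ctx->pos++;
	}
	return 0;
}

/* iterate_dir() analogue: seeds ctx->pos from f_pos and writes it back. */
static int demo_iterate_dir(struct demo_file *file,
			    struct demo_dir_context *ctx)
{
	int ret;

	ctx->pos = file->f_pos;
	ret = demo_iterate(file, ctx);
	file->f_pos = ctx->pos;
	return ret;
}

/* Caller data with the context embedded as its first member. */
struct demo_counter {
	struct demo_dir_context ctx;
	int seen;
	int limit;
};

/* Accepts entries until the limit is reached, then refuses further ones. */
static int demo_count_actor(struct demo_dir_context *ctx,
			    const char *name, long pos)
{
	struct demo_counter *c = (struct demo_counter *)ctx;

	(void)name;
	(void)pos;
	if (c->seen >= c->limit)
		return -1;
	c->seen++;
	return 0;
}
```

With `limit` set to 2, one call to `demo_iterate_dir()` leaves `f_pos` at 2. A second call with a larger limit resumes from there and emits the remaining entry.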
Al Viro
5c0ba4e076 [readdir] introduce iterate_dir() and dir_context
iterate_dir(): new helper, replacing vfs_readdir().

struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:46 +04:00
Al Viro
e06aeb5716 compat.c: LOOP_CLR_FD is taken care of in loop.c itself...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:44 +04:00
Artem Bityutskiy
605c912bb8 UBIFS: fix a horrid bug
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses
it, and this is a very bad bug: not only can 'ubifs_readdir()' return garbage,
but this may corrupt memory and lead to all kinds of problems like crashes and
security holes.

This patch fixes the problem by using the 'file->f_version' field, which
'->llseek()' always unconditionally sets to zero. We set it to 1 in
'ubifs_readdir()' and whenever we detect that it became 0, we know there was a
seek and it is time to clear the state saved in 'file->private_data'.

I tested this patch by writing a user-space program which runs readdir and
seek in parallel. I could easily crash the kernel without these patches, but
could not crash it with these patches.

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:45:37 +04:00
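The 'f_version' bookkeeping described in 605c912bb8 can be sketched as follows. This is a single-threaded userspace illustration with invented `demo_` names, so it shows only the flag protocol, not the race itself, and it is not the UBIFS code: the seek path merely clears the flag, and the readdir path, which owns the saved state, is the only one that frees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for struct file with the three fields the commit mentions. */
struct demo_file {
	long f_pos;
	unsigned long f_version;
	void *private_data;	/* readdir state saved between calls */
};

/* Seek analogue: never frees private_data, only zeroes f_version. */
static long demo_llseek(struct demo_file *file, long offset)
{
	file->f_pos = offset;
	file->f_version = 0;
	return offset;
}

/*
 * Readdir analogue.  Returns true when it noticed a seek (or a first
 * call) and dropped the saved state, false when the state was reused.
 */
static bool demo_readdir(struct demo_file *file)
{
	bool reset = false;

	assert(file != NULL);
	if (file->f_version == 0) {
		/* A seek happened since the last call: saved state is stale. */
		free(file->private_data);
		file->private_data = NULL;
		file->f_version = 1;
		reset = true;
	}
	if (!file->private_data)
		file->private_data = calloc(1, 16);	/* placeholder state */
	return reset;
}
```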
Artem Bityutskiy
33f1a63ae8 UBIFS: prepare to fix a horrid bug
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

First of all, this means that 'file->private_data' can be freed while
'ubifs_readdir()' uses it.  But this particular patch does not fix the problem.
This patch is only a preparation, and the fix will follow next.

In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly,
because 'file->f_pos' can be changed by '->llseek()' at any point. This may
lead 'ubifs_readdir()' to return inconsistent data: directory entry names
may correspond to incorrect file positions.

So here we introduce a local variable 'pos', read 'file->f_pos' once at the
very beginning, and then stick to 'pos'. The result of this is that when
'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of
'ubifs_readdir()', the latter "wins".

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:45:37 +04:00
Al Viro
7995bd2871 splice: don't pass the address of ->f_pos to methods
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-20 19:02:45 +04:00
Linus Torvalds
d0ff934881 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
 "Several fixes + obvious cleanup (you've missed a couple of open-coded
  can_lookup() back then)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  snd_pcm_link(): fix a leak...
  use can_lookup() instead of direct checks of ->i_op->lookup
  move exit_task_namespaces() outside of exit_notify()
  fput: task_work_add() can fail if the caller has passed exit_task_work()
  ncpfs: fix rmdir returns Device or resource busy
2013-06-14 19:18:56 -10:00
Linus Torvalds
d58c6ff0b7 xfs: fixes for 3.10-rc6
- Remove noisy warnings about experimental support which spams the logs
 - Add padding to align directory and attr structures correctly
 - Set block number on child buffer on a root btree split
 - Disable verifiers during log recovery for non-CRC filesystems

Merge tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs

Pull xfs fixes from Ben Myers:
 - Remove noisy warnings about experimental support which spams the logs
 - Add padding to align directory and attr structures correctly
 - Set block number on child buffer on a root btree split
 - Disable verifiers during log recovery for non-CRC filesystems

* tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs:
  xfs: don't shutdown log recovery on validation errors
  xfs: ensure btree root split sets blkno correctly
  xfs: fix implicit padding in directory and attr CRC formats
  xfs: don't emit v5 superblock warnings on write
2013-06-14 19:16:31 -10:00
Al Viro
0525290119 use can_lookup() instead of direct checks of ->i_op->lookup
a couple of places got missed back when Linus introduced that one...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-15 05:41:45 +04:00
Oleg Nesterov
e7b2c40692 fput: task_work_add() can fail if the caller has passed exit_task_work()
fput() assumes that it can't be called after exit_task_work() but
this is not true, for example free_ipc_ns()->shm_destroy() can do
this. In this case fput() silently leaks the file.

Change it to fallback to delayed_fput_work if task_work_add() fails.
The patch looks complicated but it is not, it changes the code from

	if (PF_KTHREAD) {
		schedule_work(...);
		return;
	}
	task_work_add(...)

to
	if (!PF_KTHREAD) {
		if (!task_work_add(...))
			return;
		/* fallback */
	}
	schedule_work(...);

As for shm_destroy() in particular, we could make another fix but I
think this change makes sense anyway. There could be another similar
user, it is not safe to assume that task_work_add() can't fail.

Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-15 05:39:08 +04:00
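The control-flow change in e7b2c40692 can be written out as a small runnable model. This is a sketch under stated assumptions: `demo_task_work_add()` is a hypothetical stand-in that fails when the task is already past exit_task_work(), and the function returns which path was taken instead of queueing any work.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_path {
	DEMO_TASK_WORK,		/* queued on the current task */
	DEMO_DELAYED_WORK	/* handed to the delayed-work fallback */
};

/* Hypothetical stand-in: 0 on success, nonzero when the task is exiting. */
static int demo_task_work_add(bool task_exiting)
{
	return task_exiting ? -1 : 0;
}

/* Mirrors the "after" shape from the commit message. */
static enum demo_path demo_fput(bool is_kthread, bool task_exiting)
{
	if (!is_kthread) {
		if (!demo_task_work_add(task_exiting))
			return DEMO_TASK_WORK;
		/* fallback: task_work_add() failed */
	}
	return DEMO_DELAYED_WORK;
}
```

The case the old shape lost is `is_kthread == false` with `task_exiting == true`: here it lands on the delayed-work path instead of being dropped.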
Dave Chinner
d302cf1d31 xfs: don't shutdown log recovery on validation errors
Unfortunately, we cannot guarantee that items logged multiple times
and replayed by log recovery do not take objects back in time. When
they are taken back in time, they go into an intermediate state which
is corrupt, and hence verification that occurs on this intermediate
state causes log recovery to abort with a corruption shutdown.

Instead of causing a shutdown and unmountable filesystem, don't
verify post-recovery items before they are written to disk. This is
less than optimal, but there is no way to detect this issue for
non-CRC filesystems. If log recovery successfully completes, this
will be undone and the object will be consistent by subsequent
transactions that are replayed, so in most cases we don't need to
take drastic action.

For CRC enabled filesystems, leave the verifiers in place - we need
to call them to recalculate the CRCs on the objects anyway. This
recovery problem can be solved for such filesystems - we have a LSN
stamped in all metadata at writeback time that we can use to determine
whether the item should be replayed or not. This is a separate piece
of work, so is not addressed by this patch.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 9222a9cf86)
2013-06-14 15:59:45 -05:00
Dave Chinner
088c9f67c3 xfs: ensure btree root split sets blkno correctly
For CRC enabled filesystems, the BMBT is rooted in an inode, so it
passes through a different code path on root splits than the
freespace and inode btrees. This is much less traversed by xfstests
than the other trees. When testing on a 1k block size filesystem,
I've been seeing ASSERT failures in generic/234 like:

XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317

which are generally preceded by a lblock check failure. I noticed
this in the bmbt stats:

$ pminfo -f xfs.btree.block_map

xfs.btree.block_map.lookup
    value 39135

xfs.btree.block_map.compare
    value 268432

xfs.btree.block_map.insrec
    value 15786

xfs.btree.block_map.delrec
    value 13884

xfs.btree.block_map.newroot
    value 2

xfs.btree.block_map.killroot
    value 0
.....

Very little coverage of root splits and merges. Indeed, on a 4k
filesystem, block_map.newroot and block_map.killroot are both zero.
i.e. the code is not exercised at all, and it's the only generic
btree infrastructure operation that is not exercised by a default run
of xfstests.

Turns out that on a 1k filesystem, generic/234 accounts for one of
those two root splits, and that is somewhat of a smoking gun. In
fact, it's the same problem we saw in the directory/attr code where
headers are memcpy()d from one block to another without updating the
self describing metadata.

Simple fix - when copying the header out of the root block, make
sure the block number is updated correctly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit ade1335afe)
2013-06-14 15:59:31 -05:00
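The class of bug described in 088c9f67c3, a header copied with memcpy() while its self-describing block number is left stale, can be shown with a toy structure. The layout and all `demo_` names are assumptions made for illustration and do not reflect the XFS btree block format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy self-describing block: it records where it is supposed to live. */
struct demo_block {
	uint64_t blkno;
	uint32_t magic;
	uint8_t payload[20];
};

/* Verifier analogue: a block is valid only at the location it names. */
static bool demo_verify(const struct demo_block *b, uint64_t where)
{
	return b->blkno == where;
}

/* Copy a block to a new location and fix up the self-describing field. */
static void demo_copy_block(struct demo_block *dst, uint64_t dst_blkno,
			    const struct demo_block *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, sizeof(*dst));
	dst->blkno = dst_blkno;	/* the step a plain memcpy() leaves out */
}
```

A bare `memcpy()` into the new block would leave `blkno` pointing at the source location, and `demo_verify()` would reject it at the destination.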
Dave Chinner
5170711df7 xfs: fix implicit padding in directory and attr CRC formats
Michael L. Semon has been testing CRC patches on a 32 bit system and
been seeing assert failures in the directory code from xfs/080.
Thanks to Michael's heroic efforts with printk debugging, we found
that the problem was that the last free space being left in the
directory structure was too small to fit an unused tag structure and
it was being corrupted and attempting to log a region out of bounds.
Hence the assert failure looked something like:

.....
#5 calling xfs_dir2_data_log_unused() 36 32
#1 4092 4095 4096
#2 8182 8183 4096
XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568

Where #1 showed the first region of the dup being logged (i.e. the
last 4 bytes of a directory buffer) and #2 shows the corrupt values
being calculated from the length of the dup entry which overflowed
the size of the buffer.

It turns out that the problem was not in the logging code, nor in
the freespace handling code. It is an initial condition bug that
only shows up on 32 bit systems. When a new buffer is initialised,
here's the freespace that is set up:

[  172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname()
[  172.316346] #9 calling xfs_dir2_data_log_unused()
[  172.316351] #1 calling xfs_trans_log_buf() 60 63 4096
[  172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096

Note the offset of the first region being logged? It's 60 bytes into
the buffer. Once I saw that, I pretty much knew that the bug was
going to be caused by this.

Essentially, all directory entries are rounded to 8 bytes in length,
and all entries start with an 8 byte alignment. This means that we
can decode inplace as variables are naturally aligned. With the
directory data supposedly starting on an 8 byte boundary, and all
entries padded to 8 bytes, the minimum freespace in a directory
block is supposed to be 8 bytes, which is large enough to fit an
unused data entry structure (6 bytes in size). The fact we only have
4 bytes of free space indicates a directory data block alignment
problem.

And what do you know - there's an implicit hole in the directory
data block header for the CRC format, which means the header is 60
bytes on 32 bit Intel systems and 64 bytes on 64 bit systems. Needs
padding. And while looking at the structures, I found the same
problem in the attr leaf header. Fix them both.

Note that this only affects 32 bit systems with CRCs enabled.
Everything else is just fine. Note that CRC enabled filesystems created
before this fix on such systems will not be readable with this fix
applied.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Debugged-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 8a1fd2950e)
2013-06-14 15:59:16 -05:00
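The implicit hole described in 5170711df7 comes from a 64-bit field following an odd number of 32-bit fields: the i386 ABI aligns 64-bit struct members to 4 bytes, while x86-64 aligns them to 8. A minimal sketch with a hypothetical header (the field names are invented and this is not the XFS structure):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: 12 bytes of 32-bit fields, then a 64-bit field. */
struct demo_hdr_implicit {
	uint32_t magic;
	uint32_t crc;
	uint32_t count;
	uint64_t blkno;	/* offset 12 on i386, offset 16 on x86-64 */
};

/* Same header with the hole spelled out, so every ABI agrees. */
struct demo_hdr_padded {
	uint32_t magic;
	uint32_t crc;
	uint32_t count;
	uint32_t pad;	/* explicit padding */
	uint64_t blkno;
};

/* Compile-time check that the padded layout is fixed. */
_Static_assert(offsetof(struct demo_hdr_padded, blkno) == 16,
	       "padded header must place blkno at offset 16");
_Static_assert(sizeof(struct demo_hdr_padded) == 24,
	       "padded header must be 24 bytes");

/* Size of the compiler-inserted hole in the implicit layout (ABI dependent). */
static size_t demo_implicit_hole(void)
{
	return offsetof(struct demo_hdr_implicit, blkno) - 3 * sizeof(uint32_t);
}
```

Pinning the size and offsets with `_Static_assert` is one way to make such a mismatch fail at build time instead of showing up as an on-disk incompatibility.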
Dave Chinner
47ad2fcba9 xfs: don't emit v5 superblock warnings on write
We write the superblock every 30s or so which results in the
verifier being called. Right now that results in this output
every 30s:

XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
Use of these features in this kernel is at your own risk!

And spamming the logs.

We don't need to check for whether we support v5 superblocks or
whether there are feature bits we don't support set as these are
only relevant when we first mount the filesystem. i.e. on superblock
read. Hence for the write verification we can just skip all the
checks (and hence verbose output) altogether.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 34510185ab)
2013-06-14 15:58:47 -05:00
Linus Torvalds
a2648ebb7e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This is an assortment of crash fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: stop all workers before cleaning up roots
  Btrfs: fix use-after-free bug during umount
  Btrfs: init relocate extent_io_tree with a mapping
  btrfs: Drop inode if inode root is NULL
  Btrfs: don't delete fs_roots until after we cleanup the transaction
2013-06-13 22:34:14 -07:00
Linus Torvalds
a568fa1c91 Merge branch 'akpm' (updates from Andrew Morton)
Merge misc fixes from Andrew Morton:
 "Bunch of fixes and one little addition to math64.h"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
  include/linux/math64.h: add div64_ul()
  mm: memcontrol: fix lockless reclaim hierarchy iterator
  frontswap: fix incorrect zeroing and allocation size for frontswap_map
  kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
  mm: migration: add migrate_entry_wait_huge()
  ocfs2: add missing lockres put in dlm_mig_lockres_handler
  mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
  drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
  aio: fix io_destroy() regression by using call_rcu()
  rtc-at91rm9200: use shadow IMR on at91sam9x5
  rtc-at91rm9200: add shadow interrupt mask
  rtc-at91rm9200: refactor interrupt-register handling
  rtc-at91rm9200: add configuration support
  rtc-at91rm9200: add match-table compile guard
  fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
  swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
  drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
  cciss: fix broken mutex usage in ioctl
  audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
  drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
  ...
2013-06-12 16:29:53 -07:00
Xue jiufei
27749f2ff0 ocfs2: add missing lockres put in dlm_mig_lockres_handler
dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.

Signed-off-by: joyce <xuejiufei@huawei.com>
Reviewed-by: shencanquan <shencanquan@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:46 -07:00
Kent Overstreet
4fcc712f5c aio: fix io_destroy() regression by using call_rcu()
There was a regression introduced by 36f5588905 ("aio: refcounting
cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
using RCU in the shutdown path, but the synchronize_rcu() was done in
the context of the io_destroy() syscall greatly increasing the time it
could block.

This patch switches it to call_rcu() and makes shutdown asynchronous
(more asynchronous than it was originally; before the refcount changes
io_destroy() would still wait on pending kiocbs).

Note that there's a global quota on the max outstanding kiocbs, and that
quota must be manipulated synchronously; otherwise io_setup() could
return -EAGAIN when there isn't quota available, and userspace won't
have any way of waiting until shutdown of the old kioctxs has finished
(besides busy looping).

So we release our quota before kioctx shutdown has finished, which
should be fine since the quota never corresponded to anything real
anyways.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:46 -07:00
Goldwyn Rodrigues
e099127169 fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
While removing a non-empty directory, the kernel dumps a message:

  (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39

Suppress the error message from being printed in the dmesg so users
don't panic.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:45 -07:00
Xiaowei.Hu
7869e59067 ocfs2: ocfs2_prep_new_orphaned_file() should return ret
If an error occurs, for example an EIO in __ocfs2_prepare_orphan_dir,
ocfs2_prep_new_orphaned_file will release the inode_ac, then when the
caller of ocfs2_prep_new_orphaned_file gets a 0 return, it will refer to
a NULL ocfs2_alloc_context struct in the following functions.  A kernel
panic happens.

Signed-off-by: "Xiaowei.Hu" <xiaowei.hu@oracle.com>
Reviewed-by: shencanquan <shencanquan@huawei.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
Kees Cook
637241a900 kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for accessing
dmesg, however /dev/kmsg isn't covered by the same protections.  Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions.  Newer versions of util-linux
dmesg(1) default to reading directly from /dev/kmsg.

To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:

 - /proc/kmsg allows:
  - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
    single-reader interface (SYSLOG_ACTION_READ).
  - everything, after an open.

 - syslog syscall allows:
  - anything, if CAP_SYSLOG.
  - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
    dmesg_restrict==0.
  - nothing else (EPERM).

The use-cases were:
 - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
 - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
   destructive SYSLOG_ACTION_READs.

AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.

Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.

To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.

 - /dev/kmsg allows:
  - open if CAP_SYSLOG or dmesg_restrict==0
  - reading/polling, after open

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
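The access rules listed in 637241a900 can be condensed into one decision function. This is a sketch of the rules as the message states them, not the kernel's implementation: the `demo_`/`DEMO_` names are invented, the capability is reduced to a boolean, and modelling the /dev/kmsg open as a non-destructive read-all from the reader source is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Renamed sources from the commit message, as local stand-ins. */
enum demo_source { DEMO_FROM_READER, DEMO_FROM_PROC };

enum demo_action {
	DEMO_ACTION_OPEN,
	DEMO_ACTION_READ,	/* destructive */
	DEMO_ACTION_READ_ALL,	/* non-destructive */
	DEMO_ACTION_CLEAR,
	DEMO_ACTION_SIZE_BUFFER
};

/* Returns 0 when allowed, -EPERM otherwise. */
static int demo_check_permissions(enum demo_action action,
				  enum demo_source source,
				  bool has_cap_syslog, int dmesg_restrict)
{
	/* /proc/kmsg: everything is allowed after the checked open. */
	if (source == DEMO_FROM_PROC && action != DEMO_ACTION_OPEN)
		return 0;

	/* Anything is allowed with the capability. */
	if (has_cap_syslog)
		return 0;

	/* Unprivileged readers get the non-destructive subset only. */
	if (source == DEMO_FROM_READER && dmesg_restrict == 0 &&
	    (action == DEMO_ACTION_READ_ALL ||
	     action == DEMO_ACTION_SIZE_BUFFER))
		return 0;

	return -EPERM;
}

/* /dev/kmsg open: allowed if CAP_SYSLOG or dmesg_restrict==0. */
static int demo_devkmsg_open(bool has_cap_syslog, int dmesg_restrict)
{
	return demo_check_permissions(DEMO_ACTION_READ_ALL, DEMO_FROM_READER,
				      has_cap_syslog, dmesg_restrict);
}
```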
Linus Torvalds
8d7a8fe2ce Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "There is a pair of fixes for double-frees in the recent bundle for
  3.10, a couple of fixes for long-standing bugs (sleep while atomic and
  an endianness fix), and a locking fix that can be triggered when osds
  are going down"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: fix cleanup in rbd_add()
  rbd: don't destroy ceph_opts in rbd_add()
  ceph: ceph_pagelist_append might sleep while atomic
  ceph: add cpu_to_le32() calls when encoding a reconnect capability
  libceph: must hold mutex for reset_changed_osds()
2013-06-12 08:28:19 -07:00
Mikulas Patocka
bbd465df73 hpfs: fix warnings when the filesystem fills up
This patch fixes warnings due to missing lock on write error path.

  WARNING: at fs/hpfs/hpfs_fn.h:353 hpfs_truncate+0x75/0x80 [hpfs]()
  Hardware name: empty
  Pid: 26563, comm: dd Tainted: P           O 3.9.4 #12
  Call Trace:
    hpfs_truncate+0x75/0x80 [hpfs]
    hpfs_write_begin+0x84/0x90 [hpfs]
    _hpfs_bmap+0x10/0x10 [hpfs]
    generic_file_buffered_write+0x121/0x2c0
    __generic_file_aio_write+0x1c7/0x3f0
    generic_file_aio_write+0x7c/0x100
    do_sync_write+0x98/0xd0
    hpfs_file_write+0xd/0x50 [hpfs]
    vfs_write+0xa2/0x160
    sys_write+0x51/0xa0
    page_fault+0x22/0x30
    system_call_fastpath+0x1a/0x1f

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: stable@kernel.org  # 2.6.39+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-08 17:39:40 -07:00
Linus Torvalds
81db4dbf59 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:

 - Trivial: unused variable removal

 - Posix-timers: Add the clock ID to the new proc interface to make it
   useful.  The interface is new and should be functional when we reach
   the final 3.10 release.

 - Cure a false positive warning in the tick code introduced by the
   overhaul in 3.10

 - Fix for a persistent clock detection regression introduced in this
   cycle

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Correct run-time detection of persistent_clock.
  ntp: Remove unused variable flags in __hardpps
  posix-timers: Show clock ID in proc file
  tick: Cure broadcast false positive pending bit warning
2013-06-08 15:51:21 -07:00
Josef Bacik
13e6c37b98 Btrfs: stop all workers before cleaning up roots
Dave reported a panic because the extent_root->commit_root was NULL in the
caching kthread.  That is because we just unset it in free_root_pointers, which
is not the correct thing to do, we have to either wait for the caching kthread
to complete or hold the extent_commit_sem lock so we know the thread has exited.
This patch makes the kthreads all stop first and then we do our cleanup.  This
should fix the race.  Thanks,

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-08 15:11:35 -04:00
Liu Bo
2932505abe Btrfs: fix use-after-free bug during umount
Commit be283b2e67 ("Btrfs: use helper to cleanup tree roots") introduced the
following bug:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
 IP: [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]
[...]
 Pid: 2463, comm: btrfs-cache-1 Tainted: G           O 3.9.0+ #4 innotek GmbH VirtualBox/VirtualBox
 RIP: 0010:[<ffffffffa039368c>]  [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]
 Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730)
[...]
 Call Trace:
  [<ffffffffa0398a99>] btrfs_search_slot+0x104/0x64d [btrfs]
  [<ffffffffa039aea4>] btrfs_next_old_leaf+0xa7/0x334 [btrfs]
  [<ffffffffa039b141>] btrfs_next_leaf+0x10/0x12 [btrfs]
  [<ffffffffa039ea13>] caching_thread+0x1a3/0x2e0 [btrfs]
  [<ffffffffa03d8811>] worker_loop+0x14b/0x48e [btrfs]
  [<ffffffffa03d86c6>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
  [<ffffffff81068d3d>] kthread+0x8d/0x95
  [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43
  [<ffffffff8151e5ac>] ret_from_fork+0x7c/0xb0
  [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43
RIP  [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]

We've freed commit_root before actually getting to free block groups, where
the caching thread needs a valid extent_root->commit_root.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-06-08 15:10:01 -04:00
Josef Bacik
a9995eece3 Btrfs: init relocate extent_io_tree with a mapping
Dave reported a NULL pointer deref.  This is caused because he thought he'd be
smart and add sanity checks to the extent_io bit operations, but he didn't
expect a tree to have a NULL mapping.  To fix this we just need to init the
relocation's processed_blocks with the btree_inode->i_mapping.  Thanks,

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-06-08 15:07:53 -04:00
Naohiro Aota
6379ef9fb2 btrfs: Drop inode if inode root is NULL
There is a path where btrfs_drop_inode() is called while its inode's root
is NULL: in btrfs_new_inode(), when btrfs_set_inode_index() fails,
iput() is called. We should handle this case before looking at
root->root_item.

Signed-off-by: Naohiro Aota <naota@elisp.net>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-06-08 15:07:53 -04:00
Josef Bacik
7b5ff90ed0 Btrfs: don't delete fs_roots until after we cleanup the transaction
We get a use after free if we had a transaction to cleanup since there could be
delayed inodes which refer to their respective fs_root.  Thanks

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-06-08 15:07:53 -04:00
Linus Torvalds
b8e9dbacdd * Fixes how eCryptfs handles msync to sync both the upper and lower file
* A couple of MAINTAINERS updates

Merge tag 'ecryptfs-3.10-rc5-msync' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:
 - Fixes how eCryptfs handles msync to sync both the upper and lower
   file
 - A couple of MAINTAINERS updates

* tag 'ecryptfs-3.10-rc5-msync' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Check return of filemap_write_and_wait during fsync
  Update eCryptFS maintainers
  ecryptfs: fixed msync to flush data
2013-06-07 16:21:44 -07:00
Linus Torvalds
e432785934 Merge branch 'for-3.10' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fix from Steve French:
 "Fix one byte buffer overrun with prefixpaths on cifs mounts which can
  cause a problem with mount depending on the string length"

* 'for-3.10' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix off-by-one bug in build_unc_path_to_root
2013-06-07 16:05:43 -07:00
Dave Chiluk
698b822363 ncpfs: fix rmdir returns Device or resource busy
1d2ef59014 caused a regression in ncpfs such that
directories could no longer be removed.  This was because ncp_rmdir checked
to see if a dentry could be unhashed before allowing it to be removed.  Since
1d2ef59014 introduced a change that incremented
dentry->d_count, causing it to always be greater than 1, the unhash would
always fail, and so the error path in ncp_rmdir was always taken.  Removing
this error path is safe as unhashing is still accomplished by calls to dput
from vfs_rmdir.

Signed-off-by: Dave Chiluk <chiluk@canonical.com>
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-07 12:15:38 -04:00
Linus Torvalds
e6395b68ad xfs: update for 3.10-rc5
- Rework of dquot CRCs
 - Fix for remote attribute invalidation of a leaf
 - Fix ordering of transaction replay in recovery
 - Implement CRCs for inode unlinked list
 - Disable noattr2/attr2 mount options when CRCs are enabled
 - Bump the limitation of ACL entries for v5 superblocks

Merge tag 'for-linus-v3.10-rc5' of git://oss.sgi.com/xfs/xfs

Pull more xfs updates from Ben Myers:
 "Here are several fixes for filesystems with CRC support turned on:
  fixes for quota, remote attributes, and recovery.  There is also some
  feature work related to CRCs: the implementation of CRCs for the inode
  unlinked lists, disabling noattr2/attr2 options when appropriate, and
  bumping the maximum number of ACLs.

  I would have preferred to defer this last category of items to 3.11.
  This would require setting a feature bit for the on-disk changes, so
  there is some pressure to get these in 3.10.  I believe this
  represents the end of the CRC related queue.

   - Rework of dquot CRCs
   - Fix for remote attribute invalidation of a leaf
   - Fix ordering of transaction replay in recovery
   - Implement CRCs for inode unlinked list
   - Disable noattr2/attr2 mount options when CRCs are enabled
   - Bump the limitation of ACL entries for v5 superblocks"

* tag 'for-linus-v3.10-rc5' of git://oss.sgi.com/xfs/xfs:
  xfs: increase number of ACL entries for V5 superblocks
  xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
  xfs: inode unlinked list needs to recalculate the inode CRC
  xfs: fix log recovery transaction item reordering
  xfs: fix remote attribute invalidation for a leaf
  xfs: rework dquot CRCs
2013-06-06 16:15:25 -07:00
Dave Chinner
0a8aa19397 xfs: increase number of ACL entries for V5 superblocks
The limit of 25 ACL entries is arbitrary, but baked into the on-disk
format.  For version 5 superblocks, increase it to the maximum number
of ACLs that can fit into a single xattr.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 5c87d4bc1a)
2013-06-06 10:52:15 -05:00
Dave Chinner
f763fd440e xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to cause mount errors.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit d3eaace84e)
2013-06-06 10:51:34 -05:00
Dave Chinner
ad868afddb xfs: inode unlinked list needs to recalculate the inode CRC
The inode unlinked list manipulations operate directly on the inode
buffer, and so bypass the inode CRC calculation mechanisms. Hence an
inode on the unlinked list has an invalid CRC. Fix this by
recalculating the CRC whenever we modify an unlinked list pointer in
an inode, including during log recovery. This is trivial to do and
results in unlinked list operations always leaving a consistent
inode in the buffer.
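
The idea of refreshing the checksum in the same step as the in-place
modification can be sketched in user space. This is only an illustration:
struct disk_rec, rec_set_next_unlinked() and rec_verify() are hypothetical
names, the record layout is not the XFS inode layout, and the bitwise
CRC-32C routine is a stand-in for whatever checksum helper the real code
uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record; not the XFS inode layout. */
struct disk_rec {
	uint32_t next_unlinked;
	uint32_t payload;
	uint32_t crc;          /* covers every byte before this field */
};

static_assert(offsetof(struct disk_rec, crc) == 8, "no padding before crc");

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Checksum over the record, excluding the crc field itself. */
static uint32_t rec_crc(const struct disk_rec *r)
{
	return crc32c(r, offsetof(struct disk_rec, crc));
}

/* Update the list pointer in place and refresh the CRC in the same step. */
static void rec_set_next_unlinked(struct disk_rec *r, uint32_t next)
{
	r->next_unlinked = next;
	r->crc = rec_crc(r);
}

/* Returns nonzero when the stored CRC matches the record contents. */
static int rec_verify(const struct disk_rec *r)
{
	return r->crc == rec_crc(r);
}
```

Writing next_unlinked directly, without going through a helper like
rec_set_next_unlinked(), leaves rec_verify() failing, which mirrors the
invalid CRC described above.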

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 0a32c26e72)
2013-06-06 10:51:19 -05:00
Dave Chinner
7540617075 xfs: fix log recovery transaction item reordering
There are several constraints that inode allocation and unlink
logging impose on log recovery. These all stem from the fact that
inode alloc/unlink are logged in buffers, but all other inode
changes are logged in inode items. Hence there are ordering
constraints that recovery must follow to ensure the correct result
occurs.

As it turns out, this ordering has been working more by chance
than good management. The existing code moves all buffers except
cancelled buffers to the head of the list, and everything else to
the tail of the list. The problem with this is that it interleaves
inode items with the buffer cancellation items, and hence whether
the inode item in a cancelled buffer gets replayed is essentially
left to chance.

Further, this ordering causes problems for log recovery when inode
CRCs are enabled. It typically replays the inode unlink buffer long before
it replays the inode core changes, and so the CRC recorded in an
unlink buffer is going to be invalid and hence any attempt to
validate the inode in the buffer is going to fail. Hence we really
need to enforce the ordering that the inode alloc/unlink code has
expected log recovery to have since inode chunk de-allocation was
introduced back in 2003...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit a775ad7780)
2013-06-06 10:51:07 -05:00
Dave Chinner
ea929536a4 xfs: fix remote attribute invalidation for a leaf
When invalidating an attribute leaf block, there might be
remote attributes that it points to. With the recent rework of the
remote attribute format, we have to make sure we calculate the
length of the attribute correctly. We aren't doing that in
xfs_attr3_leaf_inactive(), so fix it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 59913f14df)
2013-06-06 10:50:52 -05:00
Dave Chinner
bb9b8e86ad xfs: rework dquot CRCs
Calculating dquot CRCs when the backing buffer is written back just
doesn't work reliably. There are several places which manipulate
dquots directly in the buffers, and they don't calculate CRCs
appropriately, nor do they always set the buffer up to calculate
CRCs appropriately.

Firstly, if we log a dquot buffer (e.g. during allocation) it gets
logged without valid CRC, and so on recovery we end up with a dquot
that is not valid.

Secondly, if we recover/repair a dquot, we don't have a verifier
attached to the buffer and hence CRCs are not calculated on the way
down to disk.

Thirdly, calculating the CRC after we've changed the contents means
that if we re-read the dquot from the buffer, we cannot verify the
contents of the dquot are valid, as the CRC is invalid.

So, to avoid all the dquot CRC errors that are being detected by the
read verifier, change to using the same model as for inodes. That
is, dquot CRCs are calculated and written to the backing buffer at
the time the dquot is flushed to the backing buffer. If we modify
the dquot directly in the backing buffer, calculate the CRC
immediately after the modification is complete. Hence the dquot in
the on-disk buffer should always have a valid CRC.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 6fcdc59de2)
2013-06-06 10:50:35 -05:00
Tyler Hicks
bc5abcf7e4 eCryptfs: Check return of filemap_write_and_wait during fsync
Error out of ecryptfs_fsync() if filemap_write_and_wait() fails.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Paul Taysom <taysom@chromium.org>
Cc: Olof Johansson <olofj@chromium.org>
Cc: stable@vger.kernel.org # v3.6+
2013-06-04 23:53:31 -07:00
Linus Torvalds
6f66f9005b Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull gfs2 fixes from Steven Whitehouse:
 "There are four patches this time.

  The first fixes a problem where the wrong descriptor type was being
  written into the log for journaled data blocks.

  The second fixes a race relating to the deallocation of allocator
  data.

  The third provides a fallback if kmalloc is unable to satisfy a
  request to allocate a directory hash table.

  The fourth fixes the iopen glock caching so that inodes are deleted in
  a more timely manner after rmdir/unlink"

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Don't cache iopen glocks
  GFS2: Fall back to vmalloc if kmalloc fails for dir hash tables
  GFS2: Increase i_writecount during gfs2_setattr_size
  GFS2: Set log descriptor type for jdata blocks
2013-06-05 09:06:28 +09:00
Linus Torvalds
8764d86100 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "One patch fixes an Oops introduced in 3.9 with the readdirplus
  feature.  The rest are fixes for async-dio in 3.10"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix alignment in short read optimization for async_dio
  fuse: return -EIOCBQUEUED from fuse_direct_IO() for all async requests
  fuse: fix readdirplus Oops in fuse_dentry_revalidate
  fuse: update inode size and invalidate attributes on fallocate
  fuse: truncate pagecache range on hole punch
  fuse: allocate for_background dio requests based on io->async state
2013-06-05 09:03:31 +09:00
Linus Torvalds
0ab60871b4 A couple jfs bug fixes for 3.10-rc5

Merge tag 'jfs-3.10-rc5' of git://github.com/kleikamp/linux-shaggy

Pull jfs bugfixes from David Kleikamp:
 "A couple jfs bug fixes for 3.10-rc5"

* tag 'jfs-3.10-rc5' of git://github.com/kleikamp/linux-shaggy:
  fs/jfs: Add check if journaling to disk has been disabled in lbmRead()
  jfs: Several bugs in jfs_freeze() and jfs_unfreeze()
2013-06-04 06:33:44 +09:00
Bob Peterson
a6a4d98b01 GFS2: Don't cache iopen glocks
This patch makes GFS2 immediately reclaim/delete all iopen glocks
as soon as they're dequeued. This allows deleters to get an
EXclusive lock on iopen so files are deleted properly instead of
being set as unlinked.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03 16:40:22 +01:00
Bob Peterson
e8830d8856 GFS2: Fall back to vmalloc if kmalloc fails for dir hash tables
This version has one more correction: the vmalloc calls are replaced
by __vmalloc calls to preserve the GFP_NOFS flag.

When GFS2's directory management code allocates buffers for a
directory hash table, if it can't get the memory it needs, it
currently gives a bad return code. Rather than giving an error,
this patch allows it to use virtual memory rather than kernel
memory for the hash table. This should make it possible for
directories to function properly, even when kernel memory becomes
very fragmented.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03 16:39:44 +01:00
Bob Peterson
2b3dcf3581 GFS2: Increase i_writecount during gfs2_setattr_size
This patch calls get_write_access in a few functions. This
merely increases inode->i_writecount for the duration of the function.
That will ensure that any file closes won't delete the inode's
multi-block reservation while the function is running.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03 16:38:58 +01:00
Bob Peterson
4a58681205 GFS2: Set log descriptor type for jdata blocks
This patch sets the log descriptor type according to whether the
journal commit is for (journaled) data or metadata. This was
recently broken when the functions to process data and metadata
log ops were combined.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03 16:38:39 +01:00
Maxim Patlasov
e5c5f05dca fuse: fix alignment in short read optimization for async_dio
The bug was introduced with async_dio feature: trying to optimize short reads,
we cut number-of-bytes-to-read to i_size boundary. Hence the following example:

	truncate --size=300 /mnt/file
	dd if=/mnt/file of=/dev/null iflag=direct

led to a FUSE_READ request of 300 bytes. This turned out to be a problem
for userspace fuse implementations that rely on the assumption that kernel fuse
does not change the alignment of requests from the client FS.

The patch turns off the optimization if async_dio is disabled. And, if it's
enabled, the patch fixes adjustment of number-of-bytes-to-read to preserve
alignment.

Note that we cannot throw out the short read optimization entirely because
otherwise a direct read of a huge size issued on a tiny file would generate
a huge amount of fuse requests and most of them would be ACKed by userspace
with zero bytes read.
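
A minimal user-space sketch of an alignment-preserving trim, under stated
assumptions: trim_read_len() and round_up_pow2() are hypothetical helpers,
and the alignment unit is a parameter here because this message does not
say which unit the patch rounds to.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t n, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: trim a read of "count" bytes at "offset" so it does
 * not run far past "i_size", while keeping the trimmed length aligned.
 */
static uint64_t trim_read_len(uint64_t offset, uint64_t count,
			      uint64_t i_size, uint64_t align)
{
	uint64_t remaining;

	if (offset >= i_size)
		return 0;              /* nothing left to read */

	/* Round the remaining bytes up instead of cutting at i_size. */
	remaining = round_up_pow2(i_size - offset, align);
	return count < remaining ? count : remaining;
}
```

For the 300-byte file above and an assumed 4096-byte alignment, a 1 MiB
read at offset 0 is trimmed to 4096 bytes rather than 300, so the request
stays aligned while the flood of zero-byte reads is still avoided.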

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-06-03 15:15:42 +02:00
Brian Foster
c9ecf989cc fuse: return -EIOCBQUEUED from fuse_direct_IO() for all async requests
If request submission fails for an async request (i.e.,
get_user_pages() returns -ERESTARTSYS), we currently skip the
-EIOCBQUEUED return and drop into wait_for_sync_kiocb() forever.

Avoid this by always returning -EIOCBQUEUED for async requests. If
an error occurs, the error is passed into fuse_aio_complete(),
returned via aio_complete() and thus propagated to userspace via
io_getevents().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-06-03 15:15:42 +02:00
Miklos Szeredi
28420dad23 fuse: fix readdirplus Oops in fuse_dentry_revalidate
Fix bug introduced by commit 4582a4ab2a "FUSE: Adapt readdirplus to application
usage patterns".

We need to check for a positive dentry; negative dentries are not added by
readdirplus.  Secondly we need to advise the use of readdirplus on the *parent*,
otherwise the whole thing is useless.  Thirdly all this is only relevant if
"readdirplus_auto" mode is selected by the filesystem.

We advise the use of readdirplus only if the dentry was still valid.  If we had
to redo the lookup then there was no use in doing the -plus version.

Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Feng Shuo <steve.shuo.feng@gmail.com>
CC: stable@vger.kernel.org
2013-06-03 14:40:22 +02:00
Linus Torvalds
6cf3c73620 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull assorted fixes from Al Viro:
 "There'll be more - I'm trying to dig out from under the pile of mail
  (a couple of weeks of something flu-like ;-/) and there's several more
  things waiting for review; this is just the obvious stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  zoran: racy refcount handling in vm_ops ->open()/->close()
  befs_readdir(): do not increment ->f_pos if filldir tells us to stop
  hpfs: deadlock and race in directory lseek()
  qnx6: qnx6_readdir() has a braino in pos calculation
  fix buffer leak after "scsi: saner replacements for ->proc_info()"
  vfs: Fix invalid ida_remove() call
2013-06-01 19:51:52 +09:00
Linus Torvalds
f8cb27916a NFS client fixes:
- Fix a regression that broke NFS mounting using klibc and busybox
 - Stable fix to check access modes correctly on NFSv4 delegated open()

Merge tag 'nfs-for-3.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull two NFS client fixes from Trond Myklebust:
 - Fix a regression that broke NFS mounting using klibc and busybox
 - Stable fix to check access modes correctly on NFSv4 delegated open()

* tag 'nfs-for-3.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix security flavor negotiation with legacy binary mounts
  NFSv4: Fix a thinko in nfs4_try_open_cached
2013-06-01 19:48:59 +09:00
Linus Torvalds
1d822d6094 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull reiserfs fixes from Jan Kara:
 "Three reiserfs fixes.  They fix real problems spotted by users so I
  hope they are ok even at this stage."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: fix deadlock with nfs racing on create/lookup
  reiserfs: fix problems with chowning setuid file w/ xattrs
  reiserfs: fix spurious multiple-fill in reiserfs_readdir_dentry
2013-06-01 06:59:14 +09:00
Linus Torvalds
7cfb953258 xfs: extended attribute fixes for CRCs
- Remove assert on count of remote attribute CRC headers
 - Fix the number of blocks read in for remote attributes
 - Zero remote attribute tails properly
 - Fix mapping of remote attribute buffers to have correct length
 - initialize temp leaf properly in xfs_attr3_leaf_unbalance, and
   xfs_attr3_leaf_compact
 - Rework remote attributes to have a header per block

Merge tag 'for-linus-v3.10-rc4-crc-xattr-fixes' of git://oss.sgi.com/xfs/xfs

Pull xfs extended attribute fixes for CRCs from Ben Myers:
 "Here are several fixes that are relevant on CRC enabled XFS
  filesystems.  They are followed by a rework of the remote attribute
  code so that each block of the attribute contains a header with a CRC.

  Previously there was a CRC header per extent in the remote attribute
  code, but this was untenable because it was not possible to know how
  many extents would be allocated for the attribute until after the
  allocation has completed, due to the fragmentation of free space.
  This became complicated because the size of the headers needs to be
  added to the length of the payload to get the overall length required
  for the allocation.  With a header per block, things are less
  complicated at the cost of a little space.

  I would have preferred to defer this and the rest of the CRC queue to
  3.11 to mitigate risk for existing non-crc users in 3.10.  Doing so
  would require setting a feature bit for the on-disk changes, and so I
  have been pressured into sending this pull request by Eric Sandeen and
  David Chinner from Red Hat.  I'll send another pull request or two
  with the rest of the CRC queue next week.

   - Remove assert on count of remote attribute CRC headers
   - Fix the number of blocks read in for remote attributes
   - Zero remote attribute tails properly
   - Fix mapping of remote attribute buffers to have correct length
   - initialize temp leaf properly in xfs_attr3_leaf_unbalance, and
     xfs_attr3_leaf_compact
   - Rework remote attributes to have a header per block"

* tag 'for-linus-v3.10-rc4-crc-xattr-fixes' of git://oss.sgi.com/xfs/xfs:
  xfs: rework remote attr CRCs
  xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
  xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
  xfs: correctly map remote attr buffers during removal
  xfs: remote attribute tail zeroing does too much
  xfs: remote attribute read too short
  xfs: remote attribute allocation may be contiguous
2013-06-01 06:56:21 +09:00
Linus Torvalds
e8d256aca0 xfs: fixes for 3.10-rc4
- Fix nested transactions in xfs_qm_scall_setqlim
 - Clear suid/sgid bits when we truncate with size update
 - Fix recovery for split buffers
 - Fix block count on remote symlinks
 - Add fsgeom flag for v5 superblock support
 - Disable XFS_IOC_SWAPEXT for CRC enabled filesystems
 - Fix dirv3 freespace block corruption

Merge tag 'for-linus-v3.10-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs fixes from Ben Myers:
 - Fix nested transactions in xfs_qm_scall_setqlim
 - Clear suid/sgid bits when we truncate with size update
 - Fix recovery for split buffers
 - Fix block count on remote symlinks
 - Add fsgeom flag for v5 superblock support
 - Disable XFS_IOC_SWAPEXT for CRC enabled filesystems
 - Fix dirv3 freespace block corruption

* tag 'for-linus-v3.10-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: fix dir3 freespace block corruption
  xfs: disable swap extents ioctl on CRC enabled filesystems
  xfs: add fsgeom flag for v5 superblock support.
  xfs: fix incorrect remote symlink block count
  xfs: fix split buffer vector log recovery support
  xfs: kill suid/sgid through the truncate path.
  xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
2013-06-01 06:50:16 +09:00
Jeff Layton
1fc29baced cifs: fix off-by-one bug in build_unc_path_to_root
commit 839db3d10a (cifs: fix up handling of prefixpath= option) changed
the code such that the vol->prepath no longer contained a leading
delimiter and then fixed up the places that accessed that field to
account for that change.

One spot in build_unc_path_to_root was missed however. When doing the
pointer addition on pos, that patch failed to account for the fact that
we had already incremented "pos" by one when adding the length of the
prepath. This caused a buffer overrun by one byte.

This patch fixes the problem by correcting the handling of "pos".
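
The pointer arithmetic can be sketched in user space as follows. This is
not the cifs code: build_path() is a hypothetical stand-in, and the buffer
sizing is an assumption chosen to match the description above, where the
prefix path has no leading delimiter and its length already includes one
byte for the delimiter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a UNC-plus-prefix builder. "prepath" carries
 * no leading delimiter, so the delimiter is written explicitly.
 * The caller frees the returned string.
 */
static char *build_path(const char *unc, const char *prepath, char delim)
{
	size_t unc_len, pplen;
	char *full, *pos;

	assert(unc != NULL && prepath != NULL);

	unc_len = strlen(unc);
	pplen = prepath[0] ? strlen(prepath) + 1 : 0;   /* +1: delimiter */
	full = malloc(unc_len + pplen + 1);             /* +1: NUL */
	if (!full)
		return NULL;

	pos = full;
	memcpy(pos, unc, unc_len);
	pos += unc_len;

	if (pplen) {
		*pos = delim;
		memcpy(pos + 1, prepath, pplen - 1);
		/* pplen already counts the delimiter: advance by pplen only. */
		pos += pplen;
	}

	*pos = '\0';
	return full;
}
```

Advancing pos once for the delimiter and then again by pplen would place
the terminating NUL one byte past the end of the allocation, which is the
kind of one-byte overrun described above.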

Cc: <stable@vger.kernel.org> # v3.8+
Reported-by: Marcus Moeller <marcus.moeller@gmx.ch>
Reported-by: Ken Fallon <ken.fallon@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-05-31 16:23:35 -05:00
Jeff Mahoney
a1457c0ce9 reiserfs: fix deadlock with nfs racing on create/lookup
Reiserfs is currently able to be deadlocked by having two NFS clients
where one has removed and recreated a file and another is accessing the
file with an open file handle.

If one client deletes and recreates a file with timing such that the
recreated file obtains the same [dirid, objectid] pair as the original
file while another client accesses the file via file handle, the create
and lookup can race and deadlock if the lookup manages to create the
in-memory inode first.

The create thread, in insert_inode_locked4, will hold the write lock
while waiting on the other inode to be unlocked. The lookup thread,
anywhere in the iget path, will release and reacquire the write lock while
it schedules. If it needs to reacquire the lock while the create thread
has it, it will never be able to make forward progress because it needs
to reacquire the lock before ultimately unlocking the inode.

This patch drops the write lock across the insert_inode_locked4 call so
that the ordering of inode_wait -> write lock is retained. Since this
would have been the case before the BKL push-down, this is safe.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-05-31 23:14:20 +02:00
Jeff Mahoney
4a8570112b reiserfs: fix problems with chowning setuid file w/ xattrs
reiserfs_chown_xattrs() takes the iattr struct passed into ->setattr
and uses it to iterate over all the attrs associated with a file to change
ownership of xattrs (and transfer quota associated with the xattr files).

When the setuid bit is cleared during chown, ATTR_MODE and iattr->ia_mode
are passed to all the xattrs as well. This means that the xattr directory
will have S_IFREG added to its mode bits.

This has been prevented in practice by a missing IS_PRIVATE check
in reiserfs_acl_chmod, which caused a double-lock to occur while holding
the write lock. Since the file system was completely locked up, the
writeout of the corrupted mode never happened.

This patch temporarily clears everything but ATTR_UID|ATTR_GID for the
calls to reiserfs_setattr and adds the missing IS_PRIVATE check.
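
The temporary masking can be sketched as below. This is a user-space
illustration only: struct iattr_sketch, chown_one_xattr() and
report_valid() are hypothetical, and the ATTR_* macros are local stand-ins
named after the kernel flags, with values that should be treated as
illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins named after the kernel's ATTR_* flags; values are illustrative. */
#define ATTR_MODE (1u << 0)
#define ATTR_UID  (1u << 1)
#define ATTR_GID  (1u << 2)

struct iattr_sketch {
	unsigned int ia_valid;   /* which fields the caller wants applied */
};

/* Hypothetical callback standing in for the per-xattr setattr call. */
typedef unsigned int (*setattr_fn)(const struct iattr_sketch *attrs);

/* Example callback: reports which flags it was asked to apply. */
static unsigned int report_valid(const struct iattr_sketch *attrs)
{
	return attrs->ia_valid;
}

/*
 * Apply only ownership changes to an xattr object: mask ia_valid down to
 * ATTR_UID|ATTR_GID for the call, then restore the caller's flags.
 */
static unsigned int chown_one_xattr(struct iattr_sketch *attrs, setattr_fn fn)
{
	unsigned int saved, seen;

	assert(attrs != NULL && fn != NULL);

	saved = attrs->ia_valid;
	attrs->ia_valid &= (ATTR_UID | ATTR_GID);
	seen = fn(attrs);
	attrs->ia_valid = saved;
	return seen;
}
```

Because the mode flag never reaches the callback, a mode change requested
for the file itself cannot leak into the xattr objects.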

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-05-31 23:14:11 +02:00
Jeff Mahoney
0bdc7acba5 reiserfs: fix spurious multiple-fill in reiserfs_readdir_dentry
After sleeping for filldir(), we check to see if the file system has
changed and search again. The next_pos pointer is updated but its value
isn't pushed into the key used for the search itself. As a result,
the search returns the same item that the last cycle of the loop did
and filldir() is called multiple times with the same data.

The end result is that the buffer can contain the same name multiple
times. This can be returned to userspace or used internally in the
xattr code where it can manifest with the following warning:

jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)

reiserfs_for_each_xattr uses reiserfs_readdir_dentry to iterate over
the xattr names and ends up trying to unlink the same name twice. The
second attempt fails with -ENOENT and the error is returned. At some
point I'll need to add support into reiserfsck to remove the orphaned
directories left behind when this occurs.

The fix is to push the value into the key before re-searching.
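
The effect of the stale key can be shown with a toy model. This is a simplified sketch, not the reiserfs code: the directory is a sorted array of integer positions, search_dir() and readdir_sketch() are hypothetical helpers, and the loop re-searches after every entry as if the tree had always changed while we slept.

```c
#include <stddef.h>

/* Find the first entry whose position is >= key; returns n if none. */
static size_t search_dir(const int *entries, size_t n, int key)
{
    size_t i = 0;

    while (i < n && entries[i] < key)
        i++;
    return i;
}

/*
 * Emit directory entries into out (at most out_cap of them), doing a
 * fresh search before each one. With push_key == 0 the search key is
 * never advanced, which models the bug; with push_key != 0 next_pos
 * is copied into the key first, which models the fix.
 * Returns the number of entries emitted.
 */
static size_t readdir_sketch(const int *entries, size_t n, int push_key,
                             int *out, size_t out_cap)
{
    int key = 0;
    int next_pos = 0;
    size_t emitted = 0;

    while (emitted < out_cap) {
        size_t i = search_dir(entries, n, key);

        if (i == n)
            break;                      /* ran off the end of the directory */
        out[emitted++] = entries[i];    /* stands in for filldir() */
        next_pos = entries[i] + 1;
        if (push_key)
            key = next_pos;             /* push the value into the key */
    }
    return emitted;
}
```

For a directory holding positions 1, 2 and 3, the fixed variant emits each entry once, while the buggy variant keeps emitting the first entry until the output buffer is full.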

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-05-31 23:13:58 +02:00
Al Viro
448293aadb befs_readdir(): do not increment ->f_pos if filldir tells us to stop
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:17:56 -04:00
Al Viro
31abdab9c1 hpfs: deadlock and race in directory lseek()
For one thing, there's an ABBA deadlock on hpfs fs-wide lock and i_mutex
in hpfs_dir_lseek() - there's a lot of methods that grab the former with
the caller already holding the latter, so it must take i_mutex first.

For another, locking the damn thing, carefully validating the offset,
then dropping locks and assigning the offset is obviously racy.

Moreover, we _must_ do hpfs_add_pos(), or the machinery in dnode.c
won't modify the sucker on B-tree surgeries.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:17:43 -04:00
Al Viro
1d7095c72d qnx6: qnx6_readdir() has a braino in pos calculation
We want to mask lower 5 bits out, not leave only those and clear the
rest...  As it is, we end up always starting to read from the beginning
of directory, no matter what the current position had been.
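
A minimal sketch of the two mask expressions, assuming a 32-byte directory entry (which is what "lower 5 bits" implies). ENTRY_SIZE, pos_buggy() and pos_fixed() are names made up for this example.

```c
/* Assumed entry size for this sketch: 32 bytes, i.e. 5 low bits. */
#define ENTRY_SIZE 32UL

/* Buggy form: keeps only the low 5 bits and throws the rest away. */
static unsigned long pos_buggy(unsigned long pos)
{
    return pos & (ENTRY_SIZE - 1);
}

/* Intended form: clears the low 5 bits, rounding down to an entry boundary. */
static unsigned long pos_fixed(unsigned long pos)
{
    return pos & ~(ENTRY_SIZE - 1);
}
```

For a position of 4165, the intended form yields 4160 while the buggy form yields 5. Any buggy result is below 32, which is why reading always restarts near the beginning of the directory.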

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:17:31 -04:00
Takashi Iwai
5d477b6079 vfs: Fix invalid ida_remove() call
When the group id of a shared mount is not allocated, the umount still
tries to call mnt_release_group_id(), which eventually hits a kernel
warning at ida_remove() spewing a message like:
  ida_remove called for id=0 which is not allocated.

This patch fixes the bug by simply checking the group id in the caller.
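
The caller-side check can be pictured as below. This is only a sketch under the assumption that a group id of 0 means "not allocated", as the warning text suggests; struct mount_sketch, release_group_id() and cleanup_group_id() are hypothetical stand-ins, not the functions in the VFS.

```c
/* Hypothetical stand-in for a mount with an optional peer-group id. */
struct mount_sketch {
    int group_id;   /* 0 is assumed to mean "no id allocated" */
};

/* Counts how often the release path is entered, for demonstration only. */
static int release_calls;

/* Stand-in for the release helper that must not see an unallocated id. */
static void release_group_id(struct mount_sketch *m)
{
    release_calls++;
    m->group_id = 0;
}

/* Caller-side guard: release only when an id was actually allocated. */
static void cleanup_group_id(struct mount_sketch *m)
{
    if (m->group_id != 0)
        release_group_id(m);
}
```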

Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:16:33 -04:00
Linus Torvalds
484b002e28 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:

 - Three EFI-related fixes

 - Two early memory initialization fixes

 - build fix for older binutils

 - fix for an eager FPU performance regression -- currently we don't
   allow the use of the FPU at interrupt time *at all* in eager mode,
   which is clearly wrong.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Allow FPU to be used at interrupt time even with eagerfpu
  x86, crc32-pclmul: Fix build with older binutils
  x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
  x86, range: fix missing merge during add range
  x86, efi: initial the local variable of DataSize to zero
  efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
  efivarfs: Never return ENOENT from firmware again
2013-05-31 09:44:10 +09:00
Dave Chinner
7bc0dc271e xfs: rework remote attr CRCs
Note: this changes the on-disk remote attribute format. I assert
that this is OK to do as CRCs are marked experimental and the first
kernel it is included in has not yet been released. Further,
the userspace utilities are still evolving and so anyone using this
stuff right now is a developer or tester using volatile filesystems
for testing this feature. Hence changing the format right now to
save longer term pain is the right thing to do.

The fundamental change is to move from a header per extent in the
attribute to a header per filesystem block in the attribute. This
means there are more header blocks and the parsing of the attribute
data is slightly more complex, but it has the advantage that we
always know the size of the attribute on disk based on the length of
the data it contains.

This is where the header-per-extent method has problems. We don't
know the size of the attribute on disk without first knowing how
many extents are used to hold it. And we can't tell from a
mapping lookup, either, because remote attributes can be allocated
contiguously with other attribute blocks and so there is no obvious
way of determining the actual size of the attribute on disk short of
walking and mapping buffers.

The problem with this approach is that if we map a buffer
incorrectly (e.g. we make the last buffer for the attribute data too
long), we then get buffer cache lookup failure when we map it
correctly. i.e. we get a size mismatch on lookup. This is not
necessarily fatal, but it's a cache coherency problem that can lead
to returning the wrong data to userspace or writing the wrong data
to disk. And debug kernels will assert fail if this occurs.

I found lots of niggly little problems trying to fix this issue on a
4k block size filesystem, finally getting it to pass with lots of
fixes. The thing is, 1024 byte filesystems still failed, and it was
getting really complex handling all the corner cases that were
showing up. And there were clearly more that I hadn't found yet.

It is complex, fragile code, and if we don't fix it now, it will be
complex, fragile code forever more.

Hence the simple fix is to add a header to each filesystem block.
This gives us the same relationship between the attribute data
length and the number of blocks on disk as we have without CRCs -
it's a linear mapping and doesn't require us to guess anything. It
is simple to implement, too - the remote block count calculated at
lookup time can be used by the remote attribute set/get/remove code
without modification for both CRC and non-CRC filesystems. The world
becomes sane again.
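
The linear mapping can be sketched as a single ceiling division. This is an illustration rather than the XFS implementation: rmt_blocks() is a hypothetical helper, and the header size is passed in as a parameter (0 for the non-CRC case) because the exact on-disk header size is not stated here.

```c
/*
 * Number of filesystem blocks needed to hold attr_len bytes of remote
 * attribute data when every block starts with hdr_size bytes of header.
 * Pass hdr_size == 0 for the headerless (non-CRC) layout.
 * Returns 0 if the header leaves no room for data.
 */
static unsigned int rmt_blocks(unsigned int attr_len,
                               unsigned int block_size,
                               unsigned int hdr_size)
{
    unsigned int payload;

    if (hdr_size >= block_size)
        return 0;

    payload = block_size - hdr_size;            /* data bytes per block */
    return (attr_len + payload - 1) / payload;  /* round up */
}
```

Taking 56 bytes as an illustrative header size, a 4096-byte value needs one 4k block without headers and two with them, and the same formula works unchanged for 1k blocks.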

Because the copy-in and copy-out now need to iterate over each
filesystem block, I moved them into helper functions so we separate
the block mapping and buffer manipulations from the attribute data
and CRC header manipulations. The code becomes much clearer as a
result, and it is a lot easier to understand and debug. It also
appears to be much more robust - once it worked on 4k block size
filesystems, it has worked without failure on 1k block size
filesystems, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit ad1858d777)
2013-05-30 17:26:31 -05:00
Dave Chinner
634fd5322a xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
xfs_attr3_leaf_compact() uses a temporary buffer for compacting the
entries in a leaf. It copies the original buffer into the
temporary buffer, then zeros the original buffer completely. It then
copies the entries back into the original buffer.  However, the
original buffer has not been correctly initialised, and so the
movement of the entries goes horribly wrong.

Make sure the zeroed destination buffer is fully initialised, and
once we've set up the destination incore header appropriately, write
it back to the buffer before starting to move entries around.

While debugging this, the _d/_s prefixes weren't sufficient to
remind me what buffer was what, so rename them all _src/_dst.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit d4c712bcf2)
2013-05-30 17:26:24 -05:00
Dave Chinner
9e80c76205 xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining
the entries in two leaves when the destination leaf requires
compaction. The temporary buffer ends up being copied back over the
original destination buffer, so the header in the temporary buffer
needs to contain all the information that is in the destination
buffer.

To make sure the temporary buffer is fully initialised, once we've
set up the temporary incore header appropriately, write it back to
the temporary buffer before starting to move entries around.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 8517de2a81)
2013-05-30 17:26:16 -05:00
Dave Chinner
58a7228155 xfs: correctly map remote attr buffers during removal
If we don't map the buffers correctly (same as for get/set
operations) then the incore buffer lookup will fail. If a block
number matches but a length is wrong, then debug kernels will ASSERT
fail in _xfs_buf_find() due to the length mismatch. Ensure that we
map the buffers correctly by basing the length of the buffer on the
attribute data length rather than the remote block count.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 6863ef8449)
2013-05-30 17:26:08 -05:00
Dave Chinner
26f714450c xfs: remote attribute tail zeroing does too much
When attribute data does not fill the entire remote block, we
zero the remaining part of the buffer. However, the buffer has a
header, so both the offset where zeroing starts and the length of
zeroing need to take it into account. Otherwise we end up with zeros
over the end of the attribute value when CRCs are enabled.
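
A small sketch of header-aware tail zeroing, assuming a buffer laid out as header, then data, then unused tail. zero_tail() is a hypothetical helper, not the XFS function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero the unused tail of a buffer that holds hdr_size bytes of header
 * followed by data_len bytes of attribute data. The zeroing starts after
 * both the header and the data, so the end of the value is preserved.
 */
static void zero_tail(unsigned char *buf, size_t buf_len,
                      size_t hdr_size, size_t data_len)
{
    size_t start = hdr_size + data_len;

    if (start < buf_len)
        memset(buf + start, 0, buf_len - start);
}
```

Starting the memset at data_len instead of hdr_size + data_len would wipe the last hdr_size bytes of the value, which is the kind of overwrite described above.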

While there, make sure we only ask to map an extent that covers the
remaining range of the attribute, rather than asking every time for
the full length of remote data. If the remote attribute blocks are
contiguous with other parts of the attribute tree, it will map those
blocks as well and we can potentially zero them incorrectly. We can
also get buffer size mismatches when trying to read or remove the
remote attribute, and this can lead to not finding the correct
buffer when looking it up in cache.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 4af3644c9a)
2013-05-30 17:25:58 -05:00