Commit Graph

36728 Commits

Author SHA1 Message Date
Steven Whitehouse
b1ab1e44b4 GFS2: Remove extra "if" in gfs2_log_flush()
By reordering some of the assignments in gfs2_log_flush() it
is possible to remove one of the "if" statements as it can be
merged with one higher up the function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-25 11:52:20 +00:00
Jan Kara
ff57cd5863 fsnotify: Allocate overflow events with proper type
Commit 7053aee26a "fsnotify: do not share events between notification
groups" used overflow event statically allocated in a group with the
size of the generic notification event. This causes problems because
some code looks at type specific parts of event structure and gets
confused by a random data it sees there and causes crashes.

Fix the problem by allocating overflow event with type corresponding to
the group type so code cannot get confused.

Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-25 11:18:06 +01:00
Jan Kara
482ef06c5e fanotify: Handle overflow in case of permission events
If the event queue overflows when we are handling permission event, we
will never get response from userspace. So we must avoid waiting for it.
Change fsnotify_add_notify_event() to return whether overflow has
happened so that we can detect it in fanotify_handle_event() and act
accordingly.

Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-25 11:17:58 +01:00
Jan Kara
2513190a92 fsnotify: Fix detection whether overflow event is queued
Currently we didn't initialize event's list head when we removed it from
the event list. Thus a detection whether overflow event is already
queued wasn't working. Fix it by always initializing the list head when
deleting event from a list.

Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-25 11:17:52 +01:00
Dan Carpenter
47ba973440 fs: NULL dereference in posix_acl_to_xattr()
This patch moves the dereference of "buffer" after the check for NULL.
The only place which passes a NULL parameter is gfs2_set_acl().

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-25 10:01:09 +00:00
Steven Whitehouse
022ef4feed GFS2: Move log buffer accounting to transaction
Now we have a master transaction into which other transactions
are merged, the accounting can be done using this master
transaction. We no longer require the superblock fields which
were being used for this function.

In addition, this allows for a clean up in calc_reserved()
making it rather easier understand. Also, by reducing the
number of variables used to track the buffers being added
and removed from the journal, a number of error checks are
now no longer required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-24 19:49:12 +00:00
Steven Whitehouse
d69a3c6561 GFS2: Move log buffer lists into transaction
Over time, we hope to be able to improve the concurrency available
in the log code. This is one small step towards that, by moving
the buffer lists from the super block, and into the transaction
structure, so that each transaction builds its own buffer lists.

At transaction commit time, the buffer lists are merged into
the currently accumulating transaction. That transaction then
is passed into the before and after commit functions at journal
flush time. Thus there should be no change in overall behaviour
yet.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-24 16:54:54 +00:00
Jaegeuk Kim
8b8343fa9d f2fs: implement a lock-free stat_show
The stat_show is just to show the current status of f2fs.
So, we can remove all the there-in locks.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:41 +09:00
Jaegeuk Kim
8a7ed66aaf f2fs: introduce a radix_tree for the free_nid list
This patch introduces a radix tree for the list of free_nids, which enhances
the performance on free nid management.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:41 +09:00
Gu Zheng
f978f5a061 f2fs: introduce help macro on_build_free_nids()
Introduce help macro on_build_free_nids() which just uses build_lock
to judge whether the building free nid is going, so that we can remove
the on_build_free_nids field from f2fs_sb_info.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: remove an unnecessary white line removal]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:40 +09:00
Jaegeuk Kim
fffc2a00fc f2fs: fix to mark the checkpointed nat entry correctly
The nat cache entry maintains a status whether it is checkpointed or not.
So, if a new cache entry is loaded from the last checkpoint,
nat_entry->checkpointed should be true.
If the cache entry is modified as being dirty, nat_entry->checkpoint should
be false.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:40 +09:00
Jaegeuk Kim
6437d1b0ad f2fs: fix to do build_stat prior to the recovery procedure
At the end of the recovery procedure, write_checkpoint is called and updates
the cp count which is managed by f2fs stat.
But, previously build_stat() is called after the recovery procedure, which
results in:

BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
IP: [<ffffffffa03b1030>] write_checkpoint+0x720/0xbc0 [f2fs]
Call Trace:
 [<ffffffff810a6b44>] ? mark_held_locks+0x74/0x140
 [<ffffffff8109a3e0>] ? __init_waitqueue_head+0x60/0x60
 [<ffffffffa03bf036>] recover_fsync_data+0x656/0xf20 [f2fs]
 [<ffffffff812ee3eb>] ? security_d_instantiate+0x1b/0x30
 [<ffffffffa03aeb4d>] f2fs_fill_super+0x94d/0xa00 [f2fs]
 [<ffffffff811a9825>] mount_bdev+0x1a5/0x1f0
 [<ffffffff8114915e>] ? __get_free_pages+0xe/0x40
 [<ffffffffa03ae200>] ? f2fs_remount+0x130/0x130 [f2fs]
 [<ffffffffa03aa575>] f2fs_mount+0x15/0x20 [f2fs]
 [<ffffffff811aa713>] mount_fs+0x43/0x1b0
 [<ffffffff811c7124>] vfs_kern_mount+0x74/0x160
 [<ffffffff811c5cb1>] ? __get_fs_type+0x51/0x60
 [<ffffffff811c9727>] do_mount+0x237/0xb50
 [<ffffffff811c936a>] ? copy_mount_options+0x3a/0x170

So, this patche changes the order of recovery_fsync_data() and
f2fs_build_stats().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:39 +09:00
Jaegeuk Kim
8618b881e9 f2fs: fix not to write data pages on the page reclaiming path
Even if f2fs_write_data_page is called by the page reclaiming path, we should
not write the page to provide enough free segments for the worst case scenario.
Otherwise, f2fs can face with no free segment while gc is conducted, resulting
in:

 ------------[ cut here ]------------
 kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/segment.c:565!
 RIP: 0010:[<ffffffffa02c3b11>]  [<ffffffffa02c3b11>] new_curseg+0x331/0x340 [f2fs]
 Call Trace:
  allocate_segment_by_default+0x204/0x280 [f2fs]
  allocate_data_block+0x108/0x210 [f2fs]
  write_data_page+0x8a/0xc0 [f2fs]
  do_write_data_page+0xe1/0x2a0 [f2fs]
  move_data_page+0x8a/0xf0 [f2fs]
  f2fs_gc+0x446/0x970 [f2fs]
  f2fs_balance_fs+0xb6/0xd0 [f2fs]
  f2fs_write_begin+0x50/0x350 [f2fs]
  ? unlock_page+0x27/0x30
  ? unlock_page+0x27/0x30
  generic_file_buffered_write+0x10a/0x280
  ? file_update_time+0xa3/0xf0
  __generic_file_aio_write+0x1c8/0x3d0
  ? generic_file_aio_write+0x52/0xb0
  ? generic_file_aio_write+0x52/0xb0
  generic_file_aio_write+0x65/0xb0
  do_sync_write+0x5a/0x90
  vfs_write+0xc5/0x1f0
  SyS_write+0x55/0xa0
  system_call_fastpath+0x16/0x1b

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:33 +09:00
Jeff Layton
a26054d184 cifs: sanity check length of data to send before sending
We had a bug discovered recently where an upper layer function
(cifs_iovec_write) could pass down a smb_rqst with an invalid amount of
data in it. The length of the SMB frame would be correct, but the rqst
struct would cause smb_send_rqst to send nearly 4GB of data.

This should never be the case. Add some sanity checking to the beginning
of smb_send_rqst that ensures that the amount of data we're going to
send agrees with the length in the RFC1002 header. If it doesn't, WARN()
and return -EIO to the upper layers.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2014-02-23 20:55:07 -06:00
Pavel Shilovsky
6b1168e161 CIFS: Fix wrong pos argument of cifs_find_lock_conflict
and use generic_file_aio_write rather than __generic_file_aio_write
in cifs_writev.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steve French <smfrench@gmail.com>
2014-02-23 20:54:50 -06:00
Namjae Jeon
e1d8fb88a6 xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for XFS.

The semantics of this flag are following:
1) It collapses the range lying between offset and length by removing any data
   blocks which are present in this range and than updates all the logical
   offsets of extents beyond "offset + len" to nullify the hole created by
   removing blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag in combination.
3) Offset and length supplied to fallocate should be fs block size aligned
   in case of xfs and ext4.
4) Collaspe range does not work beyond i_size.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-24 10:58:19 +11:00
Namjae Jeon
00f5e61998 fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate
This patch is in response of the following post:
http://lwn.net/Articles/556136/
"ext4: introduce two new ioctls"

Dave chinner suggested that truncate_block_range
(which was one of the ioctls name) should be a fallocate operation
and not any fs specific ioctl, hence we add this functionality to new flags of fallocate.

This new functionality of collapsing range could be used by media editing tools
which does non linear editing to quickly purge and edit parts of a media file.
This will immensely improve the performance of these operations.
The limitation of fs block size aligned offsets can be easily handled
by media codecs which are encapsulated in a conatiner as they have to
just change the offset to next keyframe value to match the proper alignment.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-24 10:58:15 +11:00
Namjae Jeon
9eb79482a9 ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4.
 
The semantics of this flag are following:
1) It collapses the range lying between offset and length by removing any data
   blocks which are present in this range and than updates all the logical
   offsets of extents beyond "offset + len" to nullify the hole created by
   removing blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag in combination.
3) Offset and length supplied to fallocate should be fs block size aligned
   in case of xfs and ext4.
4) Collaspe range does not work beyond i_size.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Tested-by: Dongsu Park <dongsu.park@profitbricks.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-23 15:18:59 -05:00
Linus Torvalds
645ceee885 Merge branch 'xfs-fixes-for-3.14-rc4' of git://oss.sgi.com/xfs/xfs
Pull xfs fixes from Dave Chinner:
 "This is the first pull request I've had to do for you, so I'm still
  sorting things out.  The reason I'm sending this and not Ben should be
  obvious from the first commit below - SGI has stepped down from the
  XFS maintainership role.  As such, I'd like to take another
  opportunity to thank them for their many years of effort maintaining
  XFS and supporting the XFS community that they developed from the
  ground up.

  So I haven't had time to work things like signed tags into my
  workflows yet, so this is just a repo branch I'm asking you to pull
  from.  And yes, I named the branch -rc4 because I wanted the fixes in
  rc4, not because the branch was for merging into -rc3.  Probably not
  right, either.

  Anyway, I should have everything sorted out by the time the next merge
  window comes around.  If there's anything that you don't like in the
  pull req, feel free to flame me unmercifully.

  The changes are fixes for recent regressions and important thinkos in
  verification code:

        - a log vector buffer alignment issue on ia32
        - timestamps on truncate got mangled
        - primary superblock CRC validation fixes and error message
          sanitisation"

* 'xfs-fixes-for-3.14-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: limit superblock corruption errors to actual corruption
  xfs: skip verification on initial "guess" superblock read
  MAINTAINERS: SGI no longer maintaining XFS
  xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb
  xfs: ensure correct log item buffer alignment
  xfs: ensure correct timestamp updates from truncate
2014-02-22 08:26:01 -08:00
Lukas Czerner
a633f5a319 ext4: translate fallocate mode bits to strings
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-22 06:18:17 -05:00
Jan Kara
0dc83bd30b Revert "writeback: do not sync data dirtied after sync start"
This reverts commit c4a391b53a. Dave
Chinner <david@fromorbit.com> has reported the commit may cause some
inodes to be left out from sync(2). This is because we can call
redirty_tail() for some inode (which sets i_dirtied_when to current time)
after sync(2) has started or similarly requeue_inode() can set
i_dirtied_when to current time if writeback had to skip some pages. The
real problem is in the functions clobbering i_dirtied_when but fixing
that isn't trivial so revert is a safer choice for now.

CC: stable@vger.kernel.org # >= 3.13
Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-22 02:02:28 +01:00
Nicholas Bellinger
eec70897d8 bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter world
Given that bip->bip_iter.bi_size is decremented after bio_advance() ->
bio_integrity_advance() is called, the BUG_ON() in bio_integrity_verify()
ends up tripping in v3.14-rc1 code with the advent of immutable biovecs
in:

commit d57a5f7c66
Author: Kent Overstreet <kmo@daterainc.com>
Date:   Sat Nov 23 17:20:16 2013 -0800

    bio-integrity: Convert to bvec_iter

Given that there is no easy way to ascertain the original bi_size
value, go ahead and drop this BUG_ON().

Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-21 15:56:36 -08:00
Gu Zheng
bf36f9cfa6 fs/bio-integrity: remove duplicate code
Most code of function bio_integrity_verify and bio_integrity_generate
is the same, so introduce a help function bio_integrity_generate_verify()
to remove the duplicate code.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-21 15:28:30 -08:00
Steven Whitehouse
654a6d2f96 GFS2: Reduce struct gfs2_trans in size
A couple of "int" fields were being used as boolean values
so we can make them bitfields of one bit, and put them in
what might otherwise be a hole in the structure with 64
bit alignment.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-21 11:52:00 +00:00
Darrick J. Wong
a9b8241594 ext4: merge uninitialized extents
Allow for merging uninitialized extents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20 21:17:35 -05:00
Maxim Patlasov
e251f9bca9 ext4: avoid exposure of stale data in ext4_punch_hole()
While handling punch-hole fallocate, it's useless to truncate page cache
before removing the range from extent tree (or block map in indirect case)
because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
read) immediately after truncating page cache, but before updating extent
tree (or block map). In that case the user will see stale data even after
fallocate is completed.

Until the problem of data corruption resulting from pages backed by
already freed blocks is fully resolved, the simple thing we can do now
is to add another truncation of pagecache after punch hole is done.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2014-02-20 16:58:05 -05:00
Eric Whitney
ce140cdd9c ext4: silence warnings in extent status tree debugging code
Adjust the conversion specifications in a few optionally compiled debug
messages to match the return type of ext4_es_status().  Also, make a
couple of minor grammatical message edits while we're at it.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20 16:09:12 -05:00
Jan Kara
1362f4ea20 quota: Fix race between dqput() and dquot_scan_active()
Currently last dqput() can race with dquot_scan_active() causing it to
call callback for an already deactivated dquot. The race is as follows:

CPU1					CPU2
  dqput()
    spin_lock(&dq_list_lock);
    if (atomic_read(&dquot->dq_count) > 1) {
     - not taken
    if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
      spin_unlock(&dq_list_lock);
      ->release_dquot(dquot);
        if (atomic_read(&dquot->dq_count) > 1)
         - not taken
					  dquot_scan_active()
					    spin_lock(&dq_list_lock);
					    if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
					     - not taken
					    atomic_inc(&dquot->dq_count);
					    spin_unlock(&dq_list_lock);
        - proceeds to release dquot
					    ret = fn(dquot, priv);
					     - called for inactive dquot

Fix the problem by making sure possible ->release_dquot() is finished by
the time we call the callback and new calls to it will notice reference
dquot_scan_active() has taken and bail out.

CC: stable@vger.kernel.org # >= 2.6.29
Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-20 21:57:04 +01:00
Jan Kara
09ebb17ab4 udf: Fix data corruption on file type conversion
UDF has two types of files - files with data stored in inode (ICB in
UDF terminology) and files with data stored in external data blocks. We
convert file from in-inode format to external format in
udf_file_aio_write() when we find out data won't fit into inode any
longer. However the following race between two O_APPEND writes can happen:

CPU1					CPU2
udf_file_aio_write()			udf_file_aio_write()
  down_write(&iinfo->i_data_sem);
  checks that i_size + count1 fits within inode
    => no need to convert
  up_write(&iinfo->i_data_sem);
					  down_write(&iinfo->i_data_sem);
					  checks that i_size + count2 fits
					    within inode => no need to convert
					  up_write(&iinfo->i_data_sem);
  generic_file_aio_write()
    - extends file by count1 bytes
					  generic_file_aio_write()
					    - extends file by count2 bytes

Clearly if count1 + count2 doesn't fit into the inode, we overwrite
kernel buffers beyond inode, possibly corrupting the filesystem as well.

Fix the problem by acquiring i_mutex before checking whether write fits
into the inode and using __generic_file_aio_write() afterwards which
puts check and write into one critical section.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-20 21:56:00 +01:00
Linus Torvalds
6a4d07f85b Merge branch 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Quite a few fixes this time.

  Three locking fixes, all marked for -stable.  A couple error path
  fixes and some misc fixes.  Hugh found a bug in memcg offlining
  sequence and we thought we could fix that from cgroup core side but
  that turned out to be insufficient and got reverted.  A different fix
  has been applied to -mm"

* 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: update cgroup_enable_task_cg_lists() to grab siglock
  Revert "cgroup: use an ordered workqueue for cgroup destruction"
  cgroup: protect modifications to cgroup_idr with cgroup_mutex
  cgroup: fix locking in cgroup_cfts_commit()
  cgroup: fix error return from cgroup_create()
  cgroup: fix error return value in cgroup_mount()
  cgroup: use an ordered workqueue for cgroup destruction
  nfs: include xattr.h from fs/nfs/nfs3proc.c
  cpuset: update MAINTAINERS entry
  arm, pm, vmpressure: add missing slab.h includes
2014-02-20 12:01:09 -08:00
Eric Sandeen
dc9ddd984d ext4: remove unused ac_ex_scanned
When looking at a bug report with:

> kernel: EXT4-fs: 0 scanned, 0 found

I thought wow, 0 scanned, that's odd?  But it's not odd; it's printing
a variable that is initialized to 0 and never touched again.

It's never been used since the original merge, so I don't really even
know what the original intent was, either.

If anyone knows how to hook it up, speak now via patch, otherwise just
yank it so it's not making a confusing situation more confusing in
kernel logs.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20 13:32:10 -05:00
Theodore Ts'o
e861b5e9a4 ext4: avoid possible overflow in ext4_map_blocks()
The ext4_map_blocks() function returns the number of blocks which
satisfying the caller's request.  This number of blocks requested by
the caller is specified by an unsigned integer, but the return value
of ext4_map_blocks() is a signed integer (to accomodate error codes
per the kernel's standard error signalling convention).

Historically, overflows could never happen since mballoc() will refuse
to allocate more than 2048 blocks at a time (which is something we
should fix), and if the blocks were already allocated, the fact that
there would be some number of intervening metadata blocks pretty much
guaranteed that there could never be a contiguous region of data
blocks that was greater than 2**31 blocks.

However, this is now possible if there is a file system which is a bit
bigger than 8TB, and is created using the new mke2fs hugeblock
feature, which can create a perfectly contiguous file.  In that case,
if a userspace program attempted to call fallocate() on this already
fully allocated file, it's possible that ext4_map_blocks() could
return a number large enough that it would overflow a signed integer,
resulting in a ext4 thinking that the ext4_map_blocks() call had
failed with some strange error code.

Since ext4_map_blocks() is always free to return a smaller number of
blocks than what was requested by the caller, fix this by capping the
number of blocks that ext4_map_blocks() will ever try to map to 2**31
- 1.  In practice this should never get hit, except by someone
deliberately trying to provke the above-described bug.

Thanks to the PaX team for asking whethre this could possibly happen
in some off-line discussions about using some static code checking
technology they are developing to find bugs in kernel code.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20 12:54:05 -05:00
Jiri Kosina
d4263348f7 Merge branch 'master' into for-next 2014-02-20 14:54:28 +01:00
Theodore Ts'o
ab0c00fccf ext4: make sure ex.fe_logical is initialized
The lowest levels of mballoc set all of the fields of struct
ext4_free_extent except for fe_logical, since they are just trying to
find the requested free set of blocks, and the logical block hasn't
been set yet.  This makes some static code checkers sad.  Set it to
various different debug values, which would be useful when
debugging mballoc if these values were to ever show up due to the
parts of mballoc triyng to use ac->ac_b_ex.fe_logical before it is
properly upper layers of mballoc failing to properly set, usually by
ext4_mb_use_best_found().

Addresses-Coverity-Id: #139697
Addresses-Coverity-Id: #139698
Addresses-Coverity-Id: #139699

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20 00:36:41 -05:00
Dave Chinner
027f185e66 Merge remote-tracking branch 'xfs-async-aio-extend' into for-next 2014-02-20 15:16:39 +11:00
Dave Chinner
b678573e29 Merge branch 'xfs-fixes-for-3.15' into for-next 2014-02-20 15:16:09 +11:00
Trond Myklebust
4f14c194a9 NFSv4: Clear the open state flags if the new stateid does not match
RFC3530 and RFC5661 both prescribe that the 'opaque' field of the
open stateid returned by new OPEN/OPEN_DOWNGRADE/CLOSE calls for
the same file and open owner should match.
If this is not the case, assume that the open state has been lost,
and that we need to recover it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 21:21:07 -05:00
Trond Myklebust
226056c5c3 NFSv4: Use correct locking when updating nfs4_state in nfs4_close_done
The stateid and state->flags should be updated atomically under
protection of the state->seqlock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 21:21:07 -05:00
Trond Myklebust
78096ccac5 NFSv4.1: Ensure that we free existing layout segments if we get a new layout
If the server returns a completely new layout stateid in response to our
LAYOUTGET, then make sure to free any existing layout segments.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 21:21:06 -05:00
Trond Myklebust
9a7fe9e890 NFSv4.1: Minor optimisation in get_layout_by_fh_locked()
If the filehandles match, but the igrab() fails, or the layout is
freed before we can get it, then just return NULL.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 21:21:06 -05:00
Trond Myklebust
27999f2530 NFSv4.1: Ensure that the layout recall callback matches layout stateids
It is not sufficient to compare filehandles when we receive a layout
recall from the server; we also need to check that the layout stateids
match.

Reported-by: shaobingqing <shaobingqing@bwstor.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 21:21:05 -05:00
Trond Myklebust
e999e80ee9 NFSv4: Don't update the open stateid unless it is newer than the old one
This patch is in preparation for the NFSv4.1 parallel open capability.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 21:21:05 -05:00
Trond Myklebust
2c64c57dfc NFSv4.1: Fix wraparound issues in pnfs_seqid_is_newer()
Subtraction of signed integers does not have well defined wraparound
semantics in the C99 standard. In order to be wraparound-safe, we
have to use unsigned subtraction, and then cast the result.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 21:21:01 -05:00
Theodore Ts'o
7b1b2c1b9c ext4: don't calculate total xattr header size unless needed
The function ext4_expand_extra_isize_ea() doesn't need the size of all
of the extended attribute headers.  So if we don't calculate it when
it is unneeded, it we can skip some undeeded memory references, and as
a bonus, we eliminate some kvetching by static code analysis tools.

Addresses-Coverity-Id: #741291

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-19 20:15:21 -05:00
Theodore Ts'o
9a6633b1a3 ext4: add ext4_es_store_pblock_status()
Avoid false positives by static code analysis tools such as sparse and
coverity caused by the fact that we set the physical block, and then
the status in the extent_status structure.  It is also more efficient
to set both of these values at once.

Addresses-Coverity-Id: #989077
Addresses-Coverity-Id: #989078
Addresses-Coverity-Id: #1080722

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2014-02-19 20:15:15 -05:00
Eric Whitney
ce37c42919 ext4: fix error return from ext4_ext_handle_uninitialized_extents()
Commit 3779473246 breaks the return of error codes from
ext4_ext_handle_uninitialized_extents() in ext4_ext_map_blocks().  A
portion of the patch assigns that function's signed integer return
value to an unsigned int.  Consequently, negatively valued error codes
are lost and can be treated as a bogus allocated block count.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-02-19 18:52:39 -05:00
Linus Torvalds
e95003c3f9 Merge tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include stable fixes for the following bugs:

   - General performance regression due to NFS_INO_INVALID_LABEL being
     set when the server doesn't support labeled NFS
   - Hang in the RPC code due to a socket out-of-buffer race
   - Infinite loop when trying to establish the NFSv4 lease
   - Use-after-free bug in the RPCSEC gss code.
   - nfs4_select_rw_stateid is returning with a non-zero error value on
     success

  Other bug fixes:

  - Potential memory scribble in the RPC bi-directional RPC code
  - Pipe version reference leak
  - Use the correct net namespace in the new NFSv4 migration code"

* tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS fix error return in nfs4_select_rw_stateid
  NFSv4: Use the correct net namespace in nfs4_update_server
  SUNRPC: Fix a pipe_version reference leak
  SUNRPC: Ensure that gss_auth isn't freed before its upcall messages
  SUNRPC: Fix potential memory scribble in xprt_free_bc_request()
  SUNRPC: Fix races in xs_nospace()
  SUNRPC: Don't create a gss auth cache unless rpc.gssd is running
  NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS
2014-02-19 12:13:02 -08:00
Andy Adamson
146d70caaa NFS fix error return in nfs4_select_rw_stateid
Do not return an error when nfs4_copy_delegation_stateid succeeds.

Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1392737765-41942-1-git-send-email-andros@netapp.com
Fixes: ef1820f9be (NFSv4: Don't try to recover NFSv4 locks when...)
Cc: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 09:31:56 -05:00
Masanari Iida
e227867f12 treewide: Fix typo in Documentation/DocBook
This patch fix spelling typo in Documentation/DocBook.
It is because .html and .xml files are generated by make htmldocs,
I have to fix a typo within the source files.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-02-19 14:58:17 +01:00
Eric Sandeen
5ef11eb070 xfs: limit superblock corruption errors to actual corruption
Today, if

xfs_sb_read_verify
  xfs_sb_verify
    xfs_mount_validate_sb

detects superblock corruption, it'll be extremely noisy, dumping
2 stacks, 2 hexdumps, etc.

This is because we call XFS_CORRUPTION_ERROR in xfs_mount_validate_sb
as well as in xfs_sb_read_verify.

Also, *any* errors in xfs_mount_validate_sb which are not corruption
per se; things like too-big-blocksize, bad version, bad magic, v1 dirs,
rw-incompat etc - things which do not return EFSCORRUPTED - will
still do the whole XFS_CORRUPTION_ERROR spew when xfs_sb_read_verify
sees any error at all.  And it suggests to the user that they
should run xfs_repair, even if the root cause of the mount failure
is a simple incompatibility.

I'll submit that the probably-not-corrupted errors don't warrant
this much noise, so this patch removes the warning for anything
other than EFSCORRUPTED returns, and replaces the lower-level
XFS_CORRUPTION_ERROR with an xfs_notice().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19 15:39:35 +11:00