Commit Graph

44921 Commits

Author SHA1 Message Date
Christoph Hellwig
f0c6bcba74 xfs: reorder zeroing and flushing sequence in truncate
Currently zeroing out blocks and waiting for writeout is a bit of a mess in
truncate.  This patch gives it a clear order in preparation for the iomap
path:

 (1) we first wait for any direct I/O to complete to prevent any races
     for it
 (2) we then perform the actual zeroing, and only use the truncate_page
     helpers for truncating down.  The truncate up case already is
     handled by the separate call to xfs_zero_eof.
 (3) only then we write back dirty data, as zeroing block may cause
     dirty pages when using either xfs_zero_eof or the new iomap
     infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:52:47 +10:00
Christoph Hellwig
3b3dce0527 xfs: make xfs_bmbt_to_iomap available outside of xfs_pnfs.c
And ensure it works for RT subvolume files an set the block device,
both of which will be needed to be able to use the function in the
buffered write path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:52:47 +10:00
Christoph Hellwig
8be9f564d2 fs: iomap based fiemap implementation
Add a simple fiemap implementation based on iomap_ops, partially based
on a previous implementation from Bob Peterson <rpeterso@redhat.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:38:45 +10:00
Christoph Hellwig
9a286f0e52 fs: support DAX based iomap zeroing
This avoid needing a separate inefficient get_block based DAX zero_range
implementation in file systems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:31:39 +10:00
Christoph Hellwig
ae259a9c85 fs: introduce iomap infrastructure
Add infrastructure for multipage buffered writes.  This is implemented
using an main iterator that applies an actor function to a range that
can be written.

This infrastucture is used to implement a buffered write helper, one
to zero file ranges and one to implement the ->page_mkwrite VM
operations.  All of them borrow a fair amount of code from fs/buffers.
for now by using an internal version of __block_write_begin that
gets passed an iomap and builds the corresponding buffer head.

The file system is gets a set of paired ->iomap_begin and ->iomap_end
calls which allow it to map/reserve a range and get a notification
once the write code is finished with it.

Based on earlier code from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:23:11 +10:00
Christoph Hellwig
199a31c6d9 fs: move struct iomap from exportfs.h to a separate header
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21 09:22:39 +10:00
Dave Chinner
26f1fe858f xfs: reduce lock hold times in buffer writeback
When we have a lot of metadata to flush from the AIL, the buffer
list can get very long. The current submission code tries to batch
submission to optimise IO order of the metadata (i.e. ascending
block order) to maximise block layer merging or IO to adjacent
metadata blocks.

Unfortunately, the method used can result in long lock times
occurring as buffers locked early on in the buffer list might not be
dispatched until the end of the IO licst processing. This is because
sorting does not occur util after the buffer list has been processed
and the buffers that are going to be submitted are locked. Hence
when the buffer list is several thousand buffers long, the lock hold
times before IO dispatch can be significant.

To fix this, sort the buffer list before we start trying to lock and
submit buffers. This means we can now submit buffers immediately
after they are locked, allowing merging to occur immediately on the
plug and dispatch to occur as quickly as possible. This means there
is minimal delay between locking the buffer and IO submission
occuring, hence reducing the worst case lock hold times seen during
delayed write buffer IO submission signficantly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:15 +10:00
Christoph Hellwig
4478fb1f2d xfs: define XFS_IOC_FREEZE even if FIFREEZE is defined
And the same for XFS_IOC_THAW.  Just because we now have a common
version of the ioctl we still need to provide the old name for it
for anyone using those.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:15 +10:00
Eric Sandeen
0d5a75e9e2 xfs: make several functions static
Al Viro noticed that xfs_lock_inodes should be static, and
that led to ... a few more.

These are just the easy ones, others require moving functions
higher in source files, so that's not done here to keep
this review simple.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:15 +10:00
Brian Foster
0c871f9a10 xfs: remove spurious shutdown type check from xfs_bmap_finish()
The static checker reports that after commit 8d99fe92fe ("xfs: fix
efi/efd error handling to avoid fs shutdown hangs"), the code has been
reworked such that error == -EFSCORRUPTED is not possible in this
codepath.

Remove the spurious error check and just use SHUTDOWN_META_IO_ERROR
unconditionally.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:15 +10:00
Brian Foster
a3916e528b xfs: fix broken multi-fsb buffer logging
Multi-block buffers are logged based on buffer offset in
xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
the buffer and marks the associated range to be logged in the
xfs_buf_log_format bitmap for that mapping. This code is broken,
however, in that it marks the actual buffer offsets of the associated
range in each bitmap rather than shifting to the byte range for that
particular mapping.

For example, on a 4k fsb fs, buffer offset 4096 refers to the first byte
of the second mapping in the buffer. This means byte 0 of the second log
format bitmap should be tagged as dirty. Instead, the current code marks
byte offset 4096 of the second log format bitmap, which is invalid and
potentially out of range of the mapping.

As a result of this, the log item format code invoked at transaction
commit time is not be able to correctly identify what parts of the
buffer to copy into log vectors. This can lead to NULL log vector
pointer dereferences in CIL push context if the item format code was not
able to locate any dirty ranges at all. This crash has been reproduced
on a 4k FSB filesystem using 16k directory blocks where an unlink
operation happened not to log anything in the first block of the
mapping. The logged offsets were all over 4k, marked as such in the
subsequent log format mappings, and thus left the transaction with an
xfs_log_item that is marked DIRTY but without any logged regions.

Further, even when the logged regions are marked correctly in the buffer
log format bitmaps, the format code doesn't copy the correct ranges of
the buffer into the log. This means that any logged region beyond the
first block of a multi-block buffer is subject to corruption after a
crash and log recovery sequence. This is due to a failure to convert the
mapping bm_len field from basic blocks to bytes in the buffer offset
tracking code in xfs_buf_item_format().

Update xfs_buf_item_log() to convert buffer offsets to segment relative
offsets when logging multi-block buffers. This ensures that the modified
regions of a buffer are logged correctly and avoids the aforementioned
crash. Also update xfs_buf_item_format() to correctly track the source
offset into the buffer for the log vector formatting code. This ensures
that the correct data is copied into the log.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01 17:38:12 +10:00
George Spelvin
e0ab7af9bd hash_string: Fix zero-length case for !DCACHE_WORD_ACCESS
The self-test was updated to cover zero-length strings; the function
needs to be updated, too.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Fixes: fcfd2fbf22 ("fs/namei.c: Add hashlen_string() function")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-29 07:33:47 -07:00
George Spelvin
f2a031b66e Rename other copy of hash_string to hashlen_string
The original name was simply hash_string(), but that conflicted with a
function with that name in drivers/base/power/trace.c, and I decided
that calling it "hashlen_" was better anyway.

But you have to do it in two places.

[ This caused build errors for architectures that don't define
  CONFIG_DCACHE_WORD_ACCESS   - Linus ]

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: fcfd2fbf22 ("fs/namei.c: Add hashlen_string() function")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 22:34:33 -07:00
Mikulas Patocka
037369b872 hpfs: implement the show_options method
The HPFS filesystem used generic_show_options to produce string that is
displayed in /proc/mounts.  However, there is a problem that the options
may disappear after remount.  If we mount the filesystem with option1
and then remount it with option2, /proc/mounts should show both option1
and option2, however it only shows option2 because the whole option
string is replaced with replace_mount_options in hpfs_remount_fs.

To fix this bug, implement the hpfs_show_options function that prints
options that are currently selected.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 16:50:24 -07:00
Mikulas Patocka
01d6e08711 affs: fix remount failure when there are no options changed
Commit c8f33d0bec ("affs: kstrdup() memory handling") checks if the
kstrdup function returns NULL due to out-of-memory condition.

However, if we are remounting a filesystem with no change to
filesystem-specific options, the parameter data is NULL.  In this case,
kstrdup returns NULL (because it was passed NULL parameter), although no
out of memory condition exists.  The mount syscall then fails with
ENOMEM.

This patch fixes the bug.  We fail with ENOMEM only if data is non-NULL.

The patch also changes the call to replace_mount_options - if we didn't
pass any filesystem-specific options, we don't call
replace_mount_options (thus we don't erase existing reported options).

Fixes: c8f33d0bec ("affs: kstrdup() memory handling")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# v4.1+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 16:50:24 -07:00
Mikulas Patocka
44d51706b4 hpfs: fix remount failure when there are no options changed
Commit ce657611ba ("hpfs: kstrdup() out of memory handling") checks if
the kstrdup function returns NULL due to out-of-memory condition.

However, if we are remounting a filesystem with no change to
filesystem-specific options, the parameter data is NULL.  In this case,
kstrdup returns NULL (because it was passed NULL parameter), although no
out of memory condition exists.  The mount syscall then fails with
ENOMEM.

This patch fixes the bug.  We fail with ENOMEM only if data is non-NULL.

The patch also changes the call to replace_mount_options - if we didn't
pass any filesystem-specific options, we don't call
replace_mount_options (thus we don't erase existing reported options).

Fixes: ce657611ba ("hpfs: kstrdup() out of memory handling")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 16:50:24 -07:00
Guenter Roeck
d66492bce1 fs: fix binfmt_aout.c build error
Various builds (such as i386:allmodconfig) fail with

  fs/binfmt_aout.c:133:2: error: expected identifier or '(' before 'return'
  fs/binfmt_aout.c:134:1: error: expected identifier or '(' before '}' token

[ Oops. My bad, I had stupidly thought that "allmodconfig" covered this
  on x86-64 too, but it obviously doesn't.  Egg on my face.  - Linus ]

Fixes: 5d22fc25d4 ("mm: remove more IS_ERR_VALUE abuses")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-28 16:34:59 -07:00
Linus Torvalds
7e0fb73c52 Merge branch 'hash' of git://ftp.sciencehorizons.net/linux
Pull string hash improvements from George Spelvin:
 "This series does several related things:

   - Makes the dcache hash (fs/namei.c) useful for general kernel use.

     (Thanks to Bruce for noticing the zero-length corner case)

   - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
     above.

   - Avoids 64-bit multiplies in hash_64() on 32-bit platforms.  Two
     32-bit multiplies will do well enough.

   - Rids the world of the bad hash multipliers in hash_32.

     This finishes the job started in commit 689de1d6ca ("Minimal
     fix-up of bad hashing behavior of hash_64()")

     The vast majority of Linux architectures have hardware support for
     32x32-bit multiply and so derive no benefit from "simplified"
     multipliers.

     The few processors that do not (68000, h8/300 and some models of
     Microblaze) have arch-specific implementations added.  Those
     patches are last in the series.

   - Overhauls the dcache hash mixing.

     The patch in commit 0fed3ac866 ("namei: Improve hash mixing if
     CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
     Replaced with a much more careful design that's simultaneously
     faster and better.  (My own invention, as there was noting suitable
     in the literature I could find.  Comments welcome!)

   - Modify the hash_name() loop to skip the initial HASH_MIX().  This
     would let us salt the hash if we ever wanted to.

   - Sort out partial_name_hash().

     The hash function is declared as using a long state, even though
     it's truncated to 32 bits at the end and the extra internal state
     contributes nothing to the result.  And some callers do odd things:

      - fs/hfs/string.c only allocates 32 bits of state
      - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

   - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
     rather than 0..sizeof(long)-1.  This would simplify users other
     than full_name_hash"

  Special thanks to Bruce Fields for testing and finding bugs in v1.  (I
  learned some humbling lessons about "obviously correct" code.)

  On the arch-specific front, the m68k assembly has been tested in a
  standalone test harness, I've been in contact with the Microblaze
  maintainers who mostly don't care, as the hardware multiplier is never
  omitted in real-world applications, and I haven't heard anything from
  the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
  h8300: Add <asm/hash.h>
  microblaze: Add <asm/hash.h>
  m68k: Add <asm/hash.h>
  <linux/hash.h>: Add support for architecture-specific functions
  fs/namei.c: Improve dcache hash function
  Eliminate bad hash multipliers from hash_32() and  hash_64()
  Change hash_64() return value to 32 bits
  <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
  fs/namei.c: Add hashlen_string() function
  Pull out string hash to <linux/stringhash.h>
2016-05-28 16:15:25 -07:00
George Spelvin
468a942852 <linux/hash.h>: Add support for architecture-specific functions
This is just the infrastructure; there are no users yet.

This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/hash.h>.

That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis <alistai@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
2016-05-28 15:48:31 -04:00
George Spelvin
2a18da7a9c fs/namei.c: Improve dcache hash function
Patch 0fed3ac866 improved the hash mixing, but the function is slower
than necessary; there's a 7-instruction dependency chain (10 on x86)
each loop iteration.

Word-at-a-time access is a very tight loop (which is good, because
link_path_walk() is one of the hottest code paths in the entire kernel),
and the hash mixing function must not have a longer latency to avoid
slowing it down.

There do not appear to be any published fast hash functions that:
1) Operate on the input a word at a time, and
2) Don't need to know the length of the input beforehand, and
3) Have a single iterated mixing function, not needing conditional
   branches or unrolling to distinguish different loop iterations.

One of the algorithms which comes closest is Yann Collet's xxHash, but
that's two dependent multiplies per word, which is too much.

The key insights in this design are:

1) Barring expensive ops like multiplies, to diffuse one input bit
   across 64 bits of hash state takes at least log2(64) = 6 sequentially
   dependent instructions.  That is more cycles than we'd like.
2) An operation like "hash ^= hash << 13" requires a second temporary
   register anyway, and on a 2-operand machine like x86, it's three
   instructions.
3) A better use of a second register is to hold a two-word hash state.
   With careful design, no temporaries are needed at all, so it doesn't
   increase register pressure.  And this gets rid of register copying
   on 2-operand machines, so the code is smaller and faster.
4) Using two words of state weakens the requirement for one-round mixing;
   we now have two rounds of mixing before cancellation is possible.
5) A two-word hash state also allows operations on both halves to be
   done in parallel, so on a superscalar processor we get more mixing
   in fewer cycles.

I ended up using a mixing function inspired by the ChaCha and Speck
round functions.  It is 6 simple instructions and 3 cycles per iteration
(assuming multiply by 9 can be done by an "lea" instruction):

		x ^= *input++;
	y ^= x;	x = ROL(x, K1);
	x += y;	y = ROL(y, K2);
	y *= 9;

Not only is this reversible, two consecutive rounds are reversible:
if you are given the initial and final states, but not the intermediate
state, it is possible to compute both input words.  This means that at
least 3 words of input are required to create a collision.

(It also has the property, used by hash_name() to avoid a branch, that
it hashes all-zero to all-zero.)

The rotate constants K1 and K2 were found by experiment.  The search took
a sample of random initial states (I used 1023) and considered the effect
of flipping each of the 64 input bits on each of the 128 output bits two
rounds later.  Each of the 8192 pairs can be considered a biased coin, and
adding up the Shannon entropy of all of them produces a score.

The best-scoring shifts also did well in other tests (flipping bits in y,
trying 3 or 4 rounds of mixing, flipping all 64*63/2 pairs of input bits),
so the choice was made with the additional constraint that the sum of the
shifts is odd and not too close to the word size.

The final state is then folded into a 32-bit hash value by a less carefully
optimized multiply-based scheme.  This also has to be fast, as pathname
components tend to be short (the most common case is one iteration!), but
there's some room for latency, as there is a fair bit of intervening logic
before the hash value is used for anything.

(Performance verified with "bonnie++ -s 0 -n 1536:-2" on tmpfs.  I need
a better benchmark; the numbers seem to show a slight dip in performance
between 4.6.0 and this patch, but they're too noisy to quote.)

Special thanks to Bruce fields for diligent testing which uncovered a
nasty fencepost error in an earlier version of this patch.

[checkpatch.pl formatting complaints noted and respectfully disagreed with.]

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Tested-by: J. Bruce Fields <bfields@redhat.com>
2016-05-28 15:45:29 -04:00
George Spelvin
fcfd2fbf22 fs/namei.c: Add hashlen_string() function
We'd like to make more use of the highly-optimized dcache hash functions
throughout the kernel, rather than have every subsystem create its own,
and a function that hashes basic null-terminated strings is required
for that.

(The name is to emphasize that it returns both hash and length.)

It's actually useful in the dcache itself, specifically d_alloc_name().
Other uses in the next patch.

full_name_hash() is also tweaked to make it more generally useful:
1) Take a "char *" rather than "unsigned char *" argument, to
   be consistent with hash_name().
2) Handle zero-length inputs.  If we want more callers, we don't want
   to make them worry about corner cases.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
2016-05-28 15:42:50 -04:00
Linus Torvalds
23a3e178b9 This pull request contains mostly cleanups and minor
improvements of UBI and UBIFS.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXSJl6AAoJEEtJtSqsAOnWbbAP/2ls5KpGsRSoOP4NsKo5+8k9
 GsZ8hb0iN+UGDxMB5Jlj6O1ISNPz/8o7iGBuWa5OWFdh4Nn38sA1Qv996Rg3Ca3O
 8KYEGAD7POWANfxCTmyQX/L8AsOP62B3diFktPyGrtYmLsVzx/AFN9/nM27ticdF
 PQpUtLwvL/m7/oE6ymFCB4x34+DkTa7Jx4e+chVlF61CUipGc5I60VVX1Do+/DWt
 S37ajjIxMJx0BO7Zs6vrcH9uGFSbvSpVBr9/sC16t0Fa1/vAa9pXieg3y5XZmtOl
 +6NlwUxXVu5IjHkYYGZP8jsrA7bKJlflgCo/w6WKC6V0KECnBx+oSG9uMjDVPBSM
 rc4VF4nevnNLNqFGBRWhxiYrRUCdB1TcWVDOzM16peNeUNJiWj/+4PbsiJQBwECH
 ZnE/dBrjU7v+J8UmHsUJSveW6/5PluJf1loDzrip7gk5QD13wdpPo3VOycOj1vMD
 iS6H/TBjtcEzWWh7qcBbtgkbHQH0g+w9YSzVy7ODFweuoq+4qCZ1davqzzZaRv3q
 hylor1gea3b4GnH1EDjmXyQOOuIf2rLdJhou8WoZhmy2q3FrnE+89/CCOCMkeQMQ
 qQr1lAbxD7A+8bYwSZ7Nvv6lrBUrat59DV6FHFc69MdkGT7J6z85ZOyd3lxRvJc/
 XiZZkKhrYSKgV/iLFhHW
 =L16y
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.7-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:
 "This contains mostly cleanups and minor improvements of UBI and UBIFS"

* tag 'upstream-4.7-rc1' of git://git.infradead.org/linux-ubifs:
  ubifs: ubifs_dump_inode: Fix dumping field bulk_read
  UBI: Fix static volume checks when Fastmap is used
  UBI: Set free_count to zero before walking through erase list
  UBI: Silence an unintialized variable warning
  UBI: Clean up return in ubi_remove_volume()
  UBI: Modify wrong comment in ubi_leb_map function.
  UBI: Don't read back all data in ubi_eba_copy_leb()
  UBI: Add ro-mode sysfs attribute
2016-05-27 18:49:29 -07:00
Linus Torvalds
e0714ec4f9 nfs: fix anonymous member initializer build failure with older compilers
Older versions of gcc don't understand named initializers inside a
anonymous structure or union member.  It can be worked around by adding
the bracin gin the initializer for the anonymous member.

Without this, gcc 4.4.4 will fail the build with

    CC      fs/nfs/nfs4state.o
  fs/nfs/nfs4state.c:69: error: unknown field ‘data’ specified in initializer
  fs/nfs/nfs4state.c:69: warning: missing braces around initializer
  fs/nfs/nfs4state.c:69: warning: (near initialization for ‘zero_stateid.<anonymous>.data’)
  make[2]: *** [fs/nfs/nfs4state.o] Error 1

introduced in commit 93b717fd81 ("NFSv4: Label stateids with the type")

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Anna Schumaker <Anna.Schumaker@netapp.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 17:20:27 -07:00
Linus Torvalds
d102a56edb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Followups to the parallel lookup work:

   - update docs

   - restore killability of the places that used to take ->i_mutex
     killably now that we have down_write_killable() merged

   - Additionally, it turns out that I missed a prerequisite for
     security_d_instantiate() stuff - ->getxattr() wasn't the only thing
     that could be called before dentry is attached to inode; with smack
     we needed the same treatment applied to ->setxattr() as well"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch ->setxattr() to passing dentry and inode separately
  switch xattr_handler->set() to passing dentry and inode separately
  restore killability of old mutex_lock_killable(&inode->i_mutex) users
  add down_write_killable_nested()
  update D/f/directory-locking
2016-05-27 17:14:05 -07:00
Al Viro
3767e255b3 switch ->setxattr() to passing dentry and inode separately
smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
we'd hashed the new dentry and attached it to inode, we need ->setxattr()
instances getting the inode as an explicit argument rather than obtaining
it from dentry.

Similar change for ->getxattr() had been done in commit ce23e64.  Unlike
->getxattr() (which is used by both selinux and smack instances of
->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
it got missed back then.

Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-27 20:09:16 -04:00
Linus Torvalds
0121a32201 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
 "The meat of this is a change to use the mounter's credentials for
  operations that require elevated privileges (such as whiteout
  creation).  This fixes behavior under user namespaces as well as being
  a nice cleanup"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: Do d_type check only if work dir creation was successful
  ovl: update documentation
  ovl: override creds with the ones from the superblock mounter
2016-05-27 16:44:39 -07:00
Linus Torvalds
559b6d90a0 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs cleanups and fixes from Chris Mason:
 "We have another round of fixes and a few cleanups.

  I have a fix for short returns from btrfs_copy_from_user, which
  finally nails down a very hard to find regression we added in v4.6.

  Dave is pushing around gfp parameters, mostly to cleanup internal apis
  and make it a little more consistent.

  The rest are smaller fixes, and one speelling fixup patch"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (22 commits)
  Btrfs: fix handling of faults from btrfs_copy_from_user
  btrfs: fix string and comment grammatical issues and typos
  btrfs: scrub: Set bbio to NULL before calling btrfs_map_block
  Btrfs: fix unexpected return value of fiemap
  Btrfs: free sys_array eb as soon as possible
  btrfs: sink gfp parameter to convert_extent_bit
  btrfs: make state preallocation more speculative in __set_extent_bit
  btrfs: untangle gotos a bit in convert_extent_bit
  btrfs: untangle gotos a bit in __clear_extent_bit
  btrfs: untangle gotos a bit in __set_extent_bit
  btrfs: sink gfp parameter to set_record_extent_bits
  btrfs: sink gfp parameter to set_extent_new
  btrfs: sink gfp parameter to set_extent_defrag
  btrfs: sink gfp parameter to set_extent_delalloc
  btrfs: sink gfp parameter to clear_extent_dirty
  btrfs: sink gfp parameter to clear_record_extent_bits
  btrfs: sink gfp parameter to clear_extent_bits
  btrfs: sink gfp parameter to set_extent_bits
  btrfs: make find_workspace warn if there are no workspaces
  btrfs: make find_workspace always succeed
  ...
2016-05-27 16:37:36 -07:00
Linus Torvalds
5d22fc25d4 mm: remove more IS_ERR_VALUE abuses
The do_brk() and vm_brk() return value was "unsigned long" and returned
the starting address on success, and an error value on failure.  The
reasons are entirely historical, and go back to it basically behaving
like the mmap() interface does.

However, nobody actually wanted that interface, and it causes totally
pointless IS_ERR_VALUE() confusion.

What every single caller actually wants is just the simpler integer
return of zero for success and negative error number on failure.

So just convert to that much clearer and more common calling convention,
and get rid of all the IS_ERR_VALUE() uses wrt vm_brk().

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 15:57:31 -07:00
Arnd Bergmann
287980e49f remove lots of IS_ERR_VALUE abuses
Most users of IS_ERR_VALUE() in the kernel are wrong, as they
pass an 'int' into a function that takes an 'unsigned long'
argument. This happens to work because the type is sign-extended
on 64-bit architectures before it gets converted into an
unsigned type.

However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.

Andrzej Hajda has already fixed a lot of the worst abusers that
were causing actual bugs, but it would be nice to prevent any
users that are not passing 'unsigned long' arguments.

This patch changes all users of IS_ERR_VALUE() that I could find
on 32-bit ARM randconfig builds and x86 allmodconfig. For the
moment, this doesn't change the definition of IS_ERR_VALUE()
because there are probably still architecture specific users
elsewhere.

Almost all the warnings I got are for files that are better off
using 'if (err)' or 'if (err < 0)'.
The only legitimate user I could find that we get a warning for
is the (32-bit only) freescale fman driver, so I did not remove
the IS_ERR_VALUE() there but changed the type to 'unsigned long'.
For 9pfs, I just worked around one user whose calling conventions
are so obscure that I did not dare change the behavior.

I was using this definition for testing:

 #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \
       unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO))

which ends up making all 16-bit or wider types work correctly with
the most plausible interpretation of what IS_ERR_VALUE() was supposed
to return according to its users, but also causes a compile-time
warning for any users that do not pass an 'unsigned long' argument.

I suggested this approach earlier this year, but back then we ended
up deciding to just fix the users that are obviously broken. After
the initial warning that caused me to get involved in the discussion
(fs/gfs2/dir.c) showed up again in the mainline kernel, Linus
asked me to send the whole thing again.

[ Updated the 9p parts as per Al Viro  - Linus ]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.org/lkml/2016/1/7/363
Link: https://lkml.org/lkml/2016/5/27/486
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> # For nvmem part
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 15:26:11 -07:00
Eryu Guan
9ecd10b7a0 direct-io: fix direct write stale data exposure from concurrent buffered read
Currently direct writes inside i_size on a DIO_SKIP_HOLES filesystem are
not allowed to allocate blocks(get_more_blocks() sets 'create' to 0
before calling get_block() callback), if it's a sparse file, direct
writes fall back to buffered writes to avoid stale data exposure from
concurrent buffered read.  But there're two cases that can result in
stale data exposure are not correctly detected.

1. The detection for "writing inside i_size" is not sufficient,
   writes can be treated as "extending writes" wrongly.  For example,
   direct write 1FSB (file system block) to a 1FSB sparse file on
   ext2/3/4, starting from offset 0, in this case it's writing inside
   i_size, but 'create' is non-zero, because 'block_in_file' and
   '(i_size_read(inode) >> blkbits' are both zero.

2. Direct writes starting from or beyong i_size (not inside i_size)
   also could trigger block allocation and expose stale data.  For
   example, consider a sparse file with i_size of 2k, and a write to
   offset 2k or 3k into the file, with a filesystem block size of 4k.
   (Thanks to Jeff Moyer for pointing this case out in his review.)

The first problem can be demostrated by running ltp-aiodio test ADSP045
many times.  When testing on extN filesystems, I see test failures
occasionally, buffered read could read non-zero (stale) data.

ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 1

dio_sparse    0  TINFO  :  Dirtying free blocks
dio_sparse    0  TINFO  :  Starting I/O tests
non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa
non-zero read at offset 0
dio_sparse    0  TINFO  :  Killing childrens(s)
dio_sparse    1  TFAIL  :  dio_sparse.c:191: 1 children(s) exited abnormally

The second problem can also be reproduced easily by a hacked dio_sparse
program, which accepts an option to specify the write offset.

What we should really do is to disable block allocation for writes that
could result in filling holes inside i_size.

Link: http://lkml.kernel.org/r/1463156728-13357-1-git-send-email-guaneryu@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Junxiao Bi
38b52efd21 ocfs2: bump up o2cb network protocol version
Two new messages are added to support negotiating hb timeout.  Stop
nodes frmo talking an old version to mount as they will cause the
negotiation to fail.

Link: http://lkml.kernel.org/r/1464231615-27939-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Junxiao Bi
6633ca5731 ocfs2: o2hb: fix hb hung time
hr_last_timeout_start should be set as the last time where hb is
still OK.  When hb write timeout, hung time will be (jiffies -
hr_last_timeout_start).

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Junxiao Bi
88dbe98dc7 ocfs2: o2hb: don't negotiate if last hb fail
Sometimes io error is returned when storage is down for a while.  Like
for iscsi device, stroage is made offline when session timeout, and this
will make all io return -EIO.  For this case, nodes shouldn't do
negotiate timeout but should fence self.  So let nodes fence self when
o2hb_do_disk_heartbeat return an error, this is the same behavior with
o2hb without negotiate timer.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Junxiao Bi
1bd1290283 ocfs2: o2hb: add some user/debug log
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Junxiao Bi
e76f8237a2 ocfs2: o2hb: add NEGOTIATE_APPROVE message
This message is used to re-queue write timeout timer and negotiate timer
when all nodes suffer a write hung to storage, this makes node not fence
self if storage down.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Junxiao Bi
34069b886f ocfs2: o2hb: add NEGO_TIMEOUT message
This message is sent to master node when non-master nodes's negotiate
timer expired.  Master node records these nodes in a bitmap which is
used to do write timeout timer re-queue decision.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Junxiao Bi
e0cbb79805 ocfs2: o2hb: add negotiate timer
This series of patches is to fix the issue that when storage down, all
nodes will fence self due to write timeout.

With this patch set, all nodes will keep going until storage back
online, except if the following issue happens, then all nodes will do as
before to fence self.

1. io error got
2. network between nodes down
3. nodes panic

This patch (of 6):

When storage down, all nodes will fence self due to write timeout.  The
negotiate timer is designed to avoid this, with it node will wait until
storage up again.

Negotiate timer working in the following way:

1. The timer expires before write timeout timer, its timeout is half
   of write timeout now.  It is re-queued along with write timeout timer.
   If expires, it will send NEGO_TIMEOUT message to master node(node with
   lowest node number).  This message does nothing but marks a bit in a
   bitmap recording which nodes are negotiating timeout on master node.

2. If storage down, nodes will send this message to master node, then
   when master node finds its bitmap including all online nodes, it sends
   NEGO_APPROVL message to all nodes one by one, this message will
   re-queue write timeout timer and negotiate timer.  For any node doesn't
   receive this message or meets some issue when handling this message, it
   will be fenced.  If storage up at any time, o2hb_thread will run and
   re-queue all the timer, nothing will be affected by these two steps.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Al Viro
5930122683 switch xattr_handler->set() to passing dentry and inode separately
preparation for similar switch in ->setxattr() (see the next commit for
rationale).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-27 15:39:43 -04:00
Vivek Goyal
21765194ce ovl: Do d_type check only if work dir creation was successful
d_type check requires successful creation of workdir as iterates
through work dir and expects work dir to be present in it. If that's
not the case, this check will always return d_type not supported even
if underlying filesystem might be supporting it.

So don't do this check if work dir creation failed in previous step.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-05-27 10:18:56 +02:00
Antonio Murdaca
3fe6e52f06 ovl: override creds with the ones from the superblock mounter
In user namespace the whiteout creation fails with -EPERM because the
current process isn't capable(CAP_SYS_ADMIN) when setting xattr.

A simple reproducer:

$ mkdir upper lower work merged lower/dir
$ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
$ unshare -m -p -f -U -r bash

Now as root in the user namespace:

\# touch merged/dir/{1,2,3} # this will force a copy up of lower/dir
\# rm -fR merged/*

This ends up failing with -EPERM after the files in dir has been
correctly deleted:

unlinkat(4, "2", 0)                     = 0
unlinkat(4, "1", 0)                     = 0
unlinkat(4, "3", 0)                     = 0
close(4)                                = 0
unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not
permitted)

Interestingly, if you don't place files in merged/dir you can remove it,
meaning if upper/dir does not exist, creating the char device file works
properly in that same location.

This patch uses ovl_sb_creator_cred() to get the cred struct from the
superblock mounter and override the old cred with these new ones so that
the whiteout creation is possible because overlay is wrong in assuming that
the creds it will get with prepare_creds will be in the initial user
namespace.  The old cap_raise game is removed in favor of just overriding
the old cred struct.

This patch also drops from ovl_copy_up_one() the following two lines:

override_cred->fsuid = stat->uid;
override_cred->fsgid = stat->gid;

This is because the correct uid and gid are taken directly with the stat
struct and correctly set with ovl_set_attr().

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-05-27 08:55:26 +02:00
Linus Torvalds
e12fab28df Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  drivers/pinctrl/intel/pinctrl-baytrail.c: fix build with gcc-4.4
  update "mm/zsmalloc: don't fail if can't create debugfs info"
  dma-debug: avoid spinlock recursion when disabling dma-debug
  mm: oom_reaper: remove some bloat
  memcg: fix mem_cgroup_out_of_memory() return value.
  ocfs2: fix improper handling of return errno
  mm: slub: remove unused virt_to_obj()
  mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta
  mm: make CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on !FLATMEM explicitly
  seqlock: fix raw_read_seqcount_latch()
2016-05-26 21:32:40 -07:00
Linus Torvalds
478a1469a7 Filesystem DAX locking for 4.7
- We use a bit in an exceptional radix tree entry as a lock bit and use it
   similarly to how page lock is used for normal faults.  This fixes races
   between hole instantiation and read faults of the same index.
 
 - Filesystem DAX PMD faults are disabled, and will be re-enabled when PMD
   locking is implemented.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXRKwLAAoJEJ/BjXdf9fLB+BkP/3HBm05KlAKDklvnBIPFDMUK
 hA7g2K6vuvaEDZXZQ1ioc1Ajf1sCpVip7shXJsojZqwWmRz0/4nneF7ytluW9AjS
 dBX+0qCgKGH1fnwyGFF+MN7fuj7kGrSDz34lG0OObRN6/oKiVNb2svXiYKkT6J6C
 AgsWlWRUpMy9jrn1u/FduMjDhk92Z3ojarexuicr0i8NUlBClCIrdCEmUMi4orSB
 DuiIjestLOc7+mERBUwrXkzoh9v8Z0FpIgnDLWwpeEkAvJwWkGe5eXrBJwF+hEbi
 RYfTrOYc7bBQLo22LRb8pdighjrx3OW9EpNCfEmLDOjM3cYBbMK/d2i/ww52H6IK
 Mw6iS5rXdGgJtQIGL8N96HLFk+cDyZ8J8xNUCwbYYBJqgpMzxzVkL3vTm72tyFnl
 InWhih+miCMbBPytQSRd6+1wZG2piJTv6SsFTd5K1OaiRmJhBJZG47t2QTBRBu7Y
 5A4FGPtlraV+iDJvD6VLO1Tp8twxdLluOJ2BwdGeiKXiGh6LP+FGGFF3aFa5N4Ro
 xSslCTX7Q1G66zXQwD4+IMWLwS1FDNymPkUSsF6RQo6qfAnl9SrmYTc4xJ4QXy92
 sUdrWEz2OBTfxKNqbGyc/KrXKZT3RnEkJNft8snB2h6WTCdOPaNYs/yETUwiwkSc
 CXpuQFrxm69QYwNsqVu1
 =Pkd0
 -----END PGP SIGNATURE-----

Merge tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull DAX locking updates from Ross Zwisler:
 "Filesystem DAX locking for 4.7

   - We use a bit in an exceptional radix tree entry as a lock bit and
     use it similarly to how page lock is used for normal faults.  This
     fixes races between hole instantiation and read faults of the same
     index.

   - Filesystem DAX PMD faults are disabled, and will be re-enabled when
     PMD locking is implemented"

* tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Remove i_mmap_lock protection
  dax: Use radix tree entry lock to protect cow faults
  dax: New fault locking
  dax: Allow DAX code to replace exceptional entries
  dax: Define DAX lock bit for radix tree exceptional entry
  dax: Make huge page handling depend of CONFIG_BROKEN
  dax: Fix condition for filling of PMD holes
2016-05-26 20:00:28 -07:00
Linus Torvalds
315227f6da DAX error handling for 4.7
- Until now, dax has been disabled if media errors were found on
   any device. This enables the use of DAX in the presence of these
   errors by making all sector-aligned zeroing go through the driver.
 - The driver (already) has the ability to clear errors on writes that
   are sent through the block layer using 'DSMs' defined in ACPI 6.1.
 
 Other misc changes:
 
 - When mounting DAX filesystems, check to make sure the partition
   is page aligned. This is a requirement for DAX, and previously, we
   allowed such unaligned mounts to succeed, but subsequent reads/writes
   would fail.
 
 - Misc/cleanup fixes from Jan that remove unused code from DAX related to
   zeroing, writeback, and some size checks.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXQ4GKAAoJEHr6Yb6juE3/zowP/iclIhgXXXMQJRUHJlePMXC8
 15sGZ32JS1ak9g7vrsmNVEDNynfNtiMYdBxtUyRuj6xqgwdZvFk3F55KOCPtaeA1
 +yADkgeRkTAcwzmHw9WQVEzBCqyzSisdrwtEfH817qdq9FJdH66x2Kos6i+HeAVr
 5Q/e4gs7lKrjf384/QBl+wxNZOndJaQAPd2VRHQqx2A9F33v0ljdwRaUG1r4fjK2
 dtmhcZCqdQyuAGXW3piTnZc5ZFc3DPqO4FkEfqkEK3lFOflK0fd8wMsAZRp/Jd0j
 GJsgnVSWSqG0Dz476djlG0w8t2p5Jv1g9cKChV+ZZEdFLKWHCOUFqXNj8uI8I4k5
 cOEKCHyJ3IwfSHhNQqktEWrQN4T8ZXhWtuc9GuV4UZYuqJqHci6EdR/YsWsJjV+L
 lm/qvK4ipDS1pivxOy8KX/iN0z7Io8J9GXpStDx3g8iWjLlh4YYlbJLWeeRepo/z
 aPlV/QAKcHiGY6jzLExrZIyCWkzwo6O+0p1Kxerv9/7K/32HWbOodZ+tC8eD+N25
 pV69nCGf+u50T2TtIx1+iann4NC1r7zg5yqnT9AgpyZpiwR5joCDzI5sXW+D0rcS
 vPtfM84Ccdeq/e6mvfIpZgR0/npQapKnrmUest0J7P2BFPHiFPji1KzZ7M+1aFOo
 9R6JdrAj0Sc+FBa+cGzH
 =v6Of
 -----END PGP SIGNATURE-----

Merge tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull misc DAX updates from Vishal Verma:
 "DAX error handling for 4.7

   - Until now, dax has been disabled if media errors were found on any
     device.  This enables the use of DAX in the presence of these
     errors by making all sector-aligned zeroing go through the driver.

   - The driver (already) has the ability to clear errors on writes that
     are sent through the block layer using 'DSMs' defined in ACPI 6.1.

  Other misc changes:

   - When mounting DAX filesystems, check to make sure the partition is
     page aligned.  This is a requirement for DAX, and previously, we
     allowed such unaligned mounts to succeed, but subsequent
     reads/writes would fail.

   - Misc/cleanup fixes from Jan that remove unused code from DAX
     related to zeroing, writeback, and some size checks"

* tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: fix a comment in dax_zero_page_range and dax_truncate_page
  dax: for truncate/hole-punch, do zeroing through the driver if possible
  dax: export a low-level __dax_zero_page_range helper
  dax: use sb_issue_zerout instead of calling dax_clear_sectors
  dax: enable dax in the presence of known media errors (badblocks)
  dax: fallback from pmd to pte on error
  block: Update blkdev_dax_capable() for consistency
  xfs: Add alignment check for DAX mount
  ext2: Add alignment check for DAX mount
  ext4: Add alignment check for DAX mount
  block: Add bdev_dax_supported() for dax mount checks
  block: Add vfs_msg() interface
  dax: Remove redundant inode size checks
  dax: Remove pointless writeback from dax_do_io()
  dax: Remove zeroing from dax_io()
  dax: Remove dead zeroing code from fault handlers
  ext2: Avoid DAX zeroing to corrupt data
  ext2: Fix block zeroing in ext2_get_blocks() for DAX
  dax: Remove complete_unwritten argument
  DAX: move RADIX_DAX_ definitions to dax.c
2016-05-26 19:34:26 -07:00
Eric Ren
1f3a437fa0 ocfs2: fix improper handling of return errno
Previously, if a bad inode was found in ocfs2_iget(), -ESTALE was
returned back to the caller anyway.  Since commit d2b9d71a2da7 ("ocfs2:
check/fix inode block for online file check") can handle with return
value from ocfs2_read_locked_inode() now, we know the exact errno
returned for us.

Link: http://lkml.kernel.org/r/1463970656-18413-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-26 15:35:44 -07:00
Linus Torvalds
a10c38a4f3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "This changeset has a few main parts:

   - Ilya has finished a huge refactoring effort to sync up the
     client-side logic in libceph with the user-space client code, which
     has evolved significantly over the last couple years, with lots of
     additional behaviors (e.g., how requests are handled when cluster
     is full and transitions from full to non-full).

     This structure of the code is more closely aligned with userspace
     now such that it will be much easier to maintain going forward when
     behavior changes take place.  There are some locking improvements
     bundled in as well.

   - Zheng adds multi-filesystem support (multiple namespaces within the
     same Ceph cluster)

   - Zheng has changed the readdir offsets and directory enumeration so
     that dentry offsets are hash-based and therefore stable across
     directory fragmentation events on the MDS.

   - Zheng has a smorgasbord of bug fixes across fs/ceph"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
  ceph: fix wake_up_session_cb()
  ceph: don't use truncate_pagecache() to invalidate read cache
  ceph: SetPageError() for writeback pages if writepages fails
  ceph: handle interrupted ceph_writepage()
  ceph: make ceph_update_writeable_page() uninterruptible
  libceph: make ceph_osdc_wait_request() uninterruptible
  ceph: handle -EAGAIN returned by ceph_update_writeable_page()
  ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
  ceph: block non-fatal signals for fault/page_mkwrite
  ceph: make logical calculation functions return bool
  ceph: tolerate bad i_size for symlink inode
  ceph: improve fragtree change detection
  ceph: keep leaf frag when updating fragtree
  ceph: fix dir_auth check in ceph_fill_dirfrag()
  ceph: don't assume frag tree splits in mds reply are sorted
  ceph: fix inode reference leak
  ceph: using hash value to compose dentry offset
  ceph: don't forbid marking directory complete after forward seek
  ceph: record 'offset' for each entry of readdir result
  ceph: define 'end/complete' in readdir reply as bit flags
  ...
2016-05-26 14:10:32 -07:00
Chris Mason
56244ef151 Btrfs: fix handling of faults from btrfs_copy_from_user
When btrfs_copy_from_user isn't able to copy all of the pages, we need
to adjust our accounting to reflect the work that was actually done.

Commit 2e78c927d7 changed around the decisions a little and we ended up
skipping the accounting adjustments some of the time.  This commit makes
sure that when we don't copy anything at all, we still hop into
the adjustments, and switches to release_bytes instead of write_bytes,
since write_bytes isn't aligned.

The accounting errors led to warnings during btrfs_destroy_inode:

[   70.847532] WARNING: CPU: 10 PID: 514 at fs/btrfs/inode.c:9350 btrfs_destroy_inode+0x2b3/0x2c0
[   70.847536] Modules linked in: i2c_piix4 virtio_net i2c_core input_leds button led_class serio_raw acpi_cpufreq sch_fq_codel autofs4 virtio_blk
[   70.847538] CPU: 10 PID: 514 Comm: umount Tainted: G        W 4.6.0-rc6_00062_g2997da1-dirty #23
[   70.847539] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
[   70.847542]  0000000000000000 ffff880ff5cafab8 ffffffff8149d5e9 0000000000000202
[   70.847543]  0000000000000000 0000000000000000 0000000000000000 ffff880ff5cafb08
[   70.847547]  ffffffff8107bdfd ffff880ff5cafaf8 000024868120013d ffff880ff5cafb28
[   70.847547] Call Trace:
[   70.847550]  [<ffffffff8149d5e9>] dump_stack+0x51/0x78
[   70.847551]  [<ffffffff8107bdfd>] __warn+0xfd/0x120
[   70.847553]  [<ffffffff8107be3d>] warn_slowpath_null+0x1d/0x20
[   70.847555]  [<ffffffff8139c9e3>] btrfs_destroy_inode+0x2b3/0x2c0
[   70.847556]  [<ffffffff812003a1>] ? __destroy_inode+0x71/0x140
[   70.847558]  [<ffffffff812004b3>] destroy_inode+0x43/0x70
[   70.847559]  [<ffffffff810b7b5f>] ? wake_up_bit+0x2f/0x40
[   70.847560]  [<ffffffff81200c68>] evict+0x148/0x1d0
[   70.847562]  [<ffffffff81398ade>] ? start_transaction+0x3de/0x460
[   70.847564]  [<ffffffff81200d49>] dispose_list+0x59/0x80
[   70.847565]  [<ffffffff81201ba0>] evict_inodes+0x180/0x190
[   70.847566]  [<ffffffff812191ff>] ? __sync_filesystem+0x3f/0x50
[   70.847568]  [<ffffffff811e95f8>] generic_shutdown_super+0x48/0x100
[   70.847569]  [<ffffffff810b75c0>] ? woken_wake_function+0x20/0x20
[   70.847571]  [<ffffffff811e9796>] kill_anon_super+0x16/0x30
[   70.847573]  [<ffffffff81365cde>] btrfs_kill_super+0x1e/0x130
[   70.847574]  [<ffffffff811e99be>] deactivate_locked_super+0x4e/0x90
[   70.847576]  [<ffffffff811e9e61>] deactivate_super+0x51/0x70
[   70.847577]  [<ffffffff8120536f>] cleanup_mnt+0x3f/0x80
[   70.847579]  [<ffffffff81205402>] __cleanup_mnt+0x12/0x20
[   70.847581]  [<ffffffff81098358>] task_work_run+0x68/0xa0
[   70.847582]  [<ffffffff810022b6>] exit_to_usermode_loop+0xd6/0xe0
[   70.847583]  [<ffffffff81002e1d>] do_syscall_64+0xbd/0x170
[   70.847586]  [<ffffffff817d4dbc>] entry_SYSCALL64_slow_path+0x25/0x25

This is the test program I used to force short returns from
btrfs_copy_from_user

void *dontneed(void *arg)
{
	char *p = arg;
	int ret;

	while(1) {
		ret = madvise(p, BUFSIZE/4, MADV_DONTNEED);
		if (ret) {
			perror("madvise");
			exit(1);
		}
	}
}

int main(int ac, char **av) {
	int ret;
	int fd;
	char *filename;
	unsigned long offset;
	char *buf;
	int i;
	pthread_t tid;

	if (ac != 2) {
		fprintf(stderr, "usage: dammitdave filename\n");
		exit(1);
	}

	buf = mmap(NULL, BUFSIZE, PROT_READ|PROT_WRITE,
		   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	memset(buf, 'a', BUFSIZE);
	filename = av[1];

	ret = pthread_create(&tid, NULL, dontneed, buf);
	if (ret) {
		fprintf(stderr, "error %d from pthread_create\n", ret);
		exit(1);
	}

	ret = pthread_detach(tid);
	if (ret) {
		fprintf(stderr, "pthread detach failed %d\n", ret);
		exit(1);
	}

	while (1) {
		fd = open(filename, O_RDWR | O_CREAT, 0600);
		if (fd < 0) {
			perror("open");
			exit(1);
		}

		for (i = 0; i < ROUNDS; i++) {
			int this_write = BUFSIZE;

			offset = rand() % MAXSIZE;
			ret = pwrite(fd, buf, this_write, offset);
			if (ret < 0) {
				perror("pwrite");
				exit(1);
			} else if (ret != this_write) {
				fprintf(stderr, "short write to %s offset %lu ret %d\n",
					filename, offset, ret);
				exit(1);
			}
			if (i == ROUNDS - 1) {
				ret = sync_file_range(fd, offset, 4096,
				    SYNC_FILE_RANGE_WRITE);
				if (ret < 0) {
					perror("sync_file_range");
					exit(1);
				}
			}
		}
		ret = ftruncate(fd, 0);
		if (ret < 0) {
			perror("ftruncate");
			exit(1);
		}
		ret = close(fd);
		if (ret) {
			perror("close");
			exit(1);
		}
		ret = unlink(filename);
		if (ret) {
			perror("unlink");
			exit(1);
		}

	}
	return 0;
}

Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: Dave Jones <dsj@fb.com>
Fixes: 2e78c927d7
cc: stable@vger.kernel.org # v4.6
Signed-off-by: Chris Mason <clm@fb.com>
2016-05-26 13:23:59 -07:00
Linus Torvalds
ea8ea737c4 NFS client updates for Linux 4.7
Highlights include:
 
 Features:
 - Add support for the NFS v4.2 COPY operation
 - Add support for NFS/RDMA over IPv6
 
 Bugfixes and cleanups:
 - Avoid race that crashes nfs_init_commit()
 - Fix oops in callback path
 - Fix LOCK/OPEN race when unlinking an open file
 - Choose correct stateids when using delegations in setattr, read and write
 - Don't send empty SETATTR after OPEN_CREATE
 - xprtrdma: Prevent server from writing a reply into memory client has released
 - xprtrdma: Support using Read list and Reply chunk in one RPC call
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXRu76AAoJENfLVL+wpUDrDVoQAKPKv1tEVJMRUQA3UVoKoixd
 KjmmZMjl6GfpISwTZl+a8W549jyGuYH7Gl8vSbMaE9/FI+kJW6XZQniTYfFqY8/a
 LbMSdNx1+yURisbkyO0vPqqwKw9r6UmsfGeUT8SpS3ff61yp4Oj436ra2qcPJsZ3
 cWl/lHItzX7oKFAWmr0Nmq2X8ac/8+NFyK29+V/QGfwtp3qAPbpA8XM5HrHw3rA2
 uk5uNSr3hwqz7P3+Hi7ZoO2m4nQTAbQnEunfYpxlOwz4IaM7qcGnntT6Jhwq1pGE
 /1YasG7bHeiWjhynmZZ4CWuMkogau2UJ/G68Cz7ehLhPNr8rH/ZFCJZ+XX0e0CgI
 1d+AwxZvgszIQVBY3S7sg8ezVSCPBXRFJ8rtzggGscqC53aP7L+rLfUFH+OKrhMg
 6n7RQiq4EmGDJGviB/R2HixI9CpdOf2puNhDKSJmPOqiSS7UuHMw8QCq++vdru+1
 GLGunGyO7D70yTV92KtsdzJlFlnfa/g+FIJrmaMpL3HH1h0stTctWX5xlTYmqEL3
 z3aUuT8RySk2t1FTabSj6KRWqE/krK5BMZbX91kpF27WL4c/olXFaZPqBDsj0q4u
 2rm1fIrc8RxLXctJan9ro092s/e9dup/1JxV5XWMq/EGS1ezvf+0XkCOtURaAWp3
 2aPHlx7M8iuq2SouL6f7
 =QMmY
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Features:
   - Add support for the NFS v4.2 COPY operation
   - Add support for NFS/RDMA over IPv6

  Bugfixes and cleanups:
   - Avoid race that crashes nfs_init_commit()
   - Fix oops in callback path
   - Fix LOCK/OPEN race when unlinking an open file
   - Choose correct stateids when using delegations in setattr, read and
     write
   - Don't send empty SETATTR after OPEN_CREATE
   - xprtrdma: Prevent server from writing a reply into memory client
     has released
   - xprtrdma: Support using Read list and Reply chunk in one RPC call"

* tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (61 commits)
  pnfs: pnfs_update_layout needs to consider if strict iomode checking is on
  nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled
  nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO
  nfs: avoid race that crashes nfs_init_commit
  NFS: checking for NULL instead of IS_ERR() in nfs_commit_file()
  pnfs: make pnfs_layout_process more robust
  pnfs: rework LAYOUTGET retry handling
  pnfs: lift retry logic from send_layoutget to pnfs_update_layout
  pnfs: fix bad error handling in send_layoutget
  flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds
  flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED
  pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args
  pnfs: keep track of the return sequence number in pnfs_layout_hdr
  pnfs: record sequence in pnfs_layout_segment when it's created
  pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set
  pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes
  pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io
  pNFS/flexfile: Fix erroneous fall back to read/write through the MDS
  NFS: Reclaim writes via writepage are opportunistic
  NFSv4: Use the right stateid for delegations in setattr, read and write
  ...
2016-05-26 10:33:33 -07:00
Linus Torvalds
0b9210c9c8 xfs: update for 4.7-rc1
Changes in this update:
 o fixes for mount line parsing, sparse warnings, read-only compat
   feature remount behaviour
 o allow fast path symlink lookups for inline symlinks.
 o attribute listing cleanups
 o writeback goes direct to bios rather than indirecting through
   bufferheads
 o transaction allocation cleanup
 o optimised kmem_realloc
 o added configurable error handling for metadata write errors,
   changed default error handling behaviour from "retry forever" to
   "retry until unmount then fail"
 o fixed several inode cluster writeback lookup vs reclaim race
   conditions
 o fixed inode cluster writeback checking wrong inode after lookup
 o fixed bugs where struct xfs_inode freeing wasn't actually RCU safe
 o cleaned up inode reclaim tagging
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXRo8LAAoJEK3oKUf0dfodxLgP+wQMd46i/nCncr6jMcdoVXfL
 rE6cL1LJWWVOWzax/bmdlV1VNXqqW7n0ABAVMqikzqSEd+fBQe/HOkdBeVUywu7o
 bmqgNxuofMqHaiYhiTvUijHLHWLFxIgd/jNT7L5oGRzZdmP260VGf3EPipN7aA9U
 Y3KVhFQCqohpeIUeSV4Z/eIDdHN5LyUI1s+7zMLquHKCWyO4aO4GBX8YlyXdRRVe
 cwCZb4TBryS0PBCIra31MZ5wBRwLx8PBXqcNsnTQSR5Uu+WeuwxofXz5q3kdmNOU
 XGTWiabQbcvaC4twrzqnErHEX41PAs43tWxsI/qJH49QIFdfOYM+t8ERhEa2Q3DW
 Ihl+Q/2qiOuZZterG/t5MrxhybrmQhEFVJT6Ib8b/CnwpRm+K8kWTead1YJL8Xzd
 F9k8e57BCgTbDA7jWxWDbp7eQ1/4KglBD4sefFPpsuFgO882mmo5GmymALGjmitw
 JH1v3HL3PeTkQoHfcay8ZM/zNjX643CXHwCWYEOAgD+e77TPiOs/cHLZaXbrBkLK
 PpSJNfYiBe31eeSOEGsxivMapLpus+cHZyK3uR+XU+naJhjOdaBDTTo8RsAD9jS5
 C/dzxc4l7o+gYT+UjV5KtyfEeVwtGo5mtR9XozPbNDjNQor8Vo6NQMZXMXpFYDZI
 2XgzVNpkEf/74kexdEzV
 =0tYo
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs updates from Dave Chinner:
 "A pretty average collection of fixes, cleanups and improvements in
  this request.

  Summary:
   - fixes for mount line parsing, sparse warnings, read-only compat
     feature remount behaviour
   - allow fast path symlink lookups for inline symlinks.
   - attribute listing cleanups
   - writeback goes direct to bios rather than indirecting through
     bufferheads
   - transaction allocation cleanup
   - optimised kmem_realloc
   - added configurable error handling for metadata write errors,
     changed default error handling behaviour from "retry forever" to
     "retry until unmount then fail"
   - fixed several inode cluster writeback lookup vs reclaim race
     conditions
   - fixed inode cluster writeback checking wrong inode after lookup
   - fixed bugs where struct xfs_inode freeing wasn't actually RCU safe
   - cleaned up inode reclaim tagging"

* tag 'xfs-for-linus-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (39 commits)
  xfs: fix warning in xfs_finish_page_writeback for non-debug builds
  xfs: move reclaim tagging functions
  xfs: simplify inode reclaim tagging interfaces
  xfs: rename variables in xfs_iflush_cluster for clarity
  xfs: xfs_iflush_cluster has range issues
  xfs: mark reclaimed inodes invalid earlier
  xfs: xfs_inode_free() isn't RCU safe
  xfs: optimise xfs_iext_destroy
  xfs: skip stale inodes in xfs_iflush_cluster
  xfs: fix inode validity check in xfs_iflush_cluster
  xfs: xfs_iflush_cluster fails to abort on error
  xfs: remove xfs_fs_evict_inode()
  xfs: add "fail at unmount" error handling configuration
  xfs: add configuration handlers for specific errors
  xfs: add configuration of error failure speed
  xfs: introduce table-based init for error behaviors
  xfs: add configurable error support to metadata buffers
  xfs: introduce metadata IO error class
  xfs: configurable error behavior via sysfs
  xfs: buffer ->bi_end_io function requires irq-safe lock
  ...
2016-05-26 10:13:40 -07:00
Tom Haynes
c7d73af2d2 pnfs: pnfs_update_layout needs to consider if strict iomode checking is on
As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically
support enforcing that a IOMODE_RW segment will not allow READ I/O.

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-26 08:40:56 -04:00
Tom Haynes
602c4cd452 nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-26 08:40:51 -04:00
Al Viro
002354112f restore killability of old mutex_lock_killable(&inode->i_mutex) users
The ones that are taking it exclusive, that is...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-26 00:13:25 -04:00
Yan, Zheng
e536030934 ceph: fix wake_up_session_cb()
We should reset i_requested_max_size before waking the waiters.
(zero i_requested_max_size make waiter re-request the max size)

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:42 +02:00
Yan, Zheng
9abd4db713 ceph: don't use truncate_pagecache() to invalidate read cache
truncate_pagecache() drops dirty pages, it's dangerous to use it
to invalidate read cache. Besides, we shouldn't start invalidating
read cache while there are buffer writers. Because buffer writers
may add dirty pages later.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:42 +02:00
Yan, Zheng
b109eec6f4 ceph: SetPageError() for writeback pages if writepages fails
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:41 +02:00
Yan, Zheng
ad15ec06e5 ceph: handle interrupted ceph_writepage()
writepage() can be interrupted when it's called by direct memory
reclaimer (the direct memory relaimer is killed). To avoid lossing
data, we redirty the page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:41 +02:00
Yan, Zheng
a78bbd4b29 ceph: make ceph_update_writeable_page() uninterruptible
ceph_update_writeable_page() is used by ceph_write_begin(). It beaks
atomicity of write operation if it's interruptible.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:41 +02:00
Yan, Zheng
f0b33df57a ceph: handle -EAGAIN returned by ceph_update_writeable_page()
when ceph_update_writeable_page() return -EAGAIN, caller should
lock the page and call ceph_update_writeable_page() again.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:40 +02:00
Yan, Zheng
6ce026e411 ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:39 +02:00
Yan, Zheng
4f7e89f6ac ceph: block non-fatal signals for fault/page_mkwrite
Fault and page_mkwrite are supposed to be uninterruptable. But they
call ceph functions that are interruptible. So they should block
signals before calling functions that are interruptible

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:39 +02:00
Zhang Zhuoyu
3b33f692c8 ceph: make logical calculation functions return bool
This patch makes serverl logical caculation functions return bool to
improve readability due to these particular functions only using 0/1
as their return value.

No functional change.

Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
2016-05-26 01:15:39 +02:00
Yan, Zheng
224a7542b8 ceph: tolerate bad i_size for symlink inode
A mds bug can cause symlink's size to be truncated to zero.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:38 +02:00
Yan, Zheng
1b1bc16d66 ceph: improve fragtree change detection
check if number of splits in i_fragtree is equal to number of splits
in mds reply

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:38 +02:00
Yan, Zheng
a4b7431f39 ceph: keep leaf frag when updating fragtree
Nodes in i_fragtree are sorted according to ceph_compare_frag().
It means frag node in i_fragtree always follow its direct parent
node. To check if a leaf node is valid, we just need to check if
it's child of previous split node.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:37 +02:00
Yan, Zheng
421721195a ceph: fix dir_auth check in ceph_fill_dirfrag()
-1 is CDIR_AUTH_PARENT, it means dir's auth mds is the same as
inode's auth mds

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:37 +02:00
Yan, Zheng
a407846ef7 ceph: don't assume frag tree splits in mds reply are sorted
The algorithm that updates i_fragtree relies on that the frag tree
splits in mds reply are of the same order of i_fragtree. This is not
true because current MDS encodes frag tree splits in ascending order
of (unsigned)frag_t. But nodes in i_fragtree are sorted according to
ceph_frag_compare().

The fix is sort the frag tree splits first, then updates i_fragtree.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:37 +02:00
Yan, Zheng
209ae762a6 ceph: fix inode reference leak
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:36 +02:00
Yan, Zheng
f3c4ebe65e ceph: using hash value to compose dentry offset
If MDS sorts dentries in dirfrag in hash order, we use hash value to
compose dentry offset. dentry offset is:

  (0xff << 52) | ((24 bits hash) << 28) |
  (the nth entry hash hash collision)

This offset is stable across directory fragmentation. This alos means
there is no need to reset readdir offset if directory get fragmented
in the middle of readdir.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:36 +02:00
Yan, Zheng
076c40f18d ceph: don't forbid marking directory complete after forward seek
Forward seek within same frag does not update fi->last_name, it will
not affect contents of later readdir reply. So there is no need to
forbid marking directory complete

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:36 +02:00
Yan, Zheng
8974eebd38 ceph: record 'offset' for each entry of readdir result
This is preparation for using hash value as dentry 'offset'

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:35 +02:00
Yan, Zheng
956d39d631 ceph: define 'end/complete' in readdir reply as bit flags
Set a flag in readdir request, which indicates that client interprets
'end/complete' as bit flags. So that mds can reply additional flags in
readdir reply.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:35 +02:00
Yan, Zheng
2a5beea3f1 ceph: define struct for dir entry in readdir reply
This avoids defining multiple arrays for entries in readdir reply

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:34 +02:00
Yan, Zheng
a78600e7c4 ceph: simplify 'offset in frag'
don't distinguish leftmost frag from other frags. always use 2 as
first entry's offset.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:34 +02:00
Yan, Zheng
1cd42a4291 ceph: remove unnecessary checks in __dcache_readdir
we never add snapdir and the hidden .ceph dir into readdir cache

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:34 +02:00
Yan, Zheng
c530cd24c2 ceph: search cache postion for dcache readdir
use binary search to find cache index that corresponds to readdir
postion.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:33 +02:00
Yan, Zheng
04303d8ad0 ceph: use CEPH_MDS_OP_RMXATTR request to remove xattr
Setxattr with NULL value and XATTR_REPLACE flag should be equivalent
to removexattr. But current MDS does not support deleting vxattrs through
MDS_OP_SETXATTR request. The workaround is sending MDS_OP_RMXATTR request
if setxattr actually removs xattr.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:33 +02:00
Yan, Zheng
3f38495409 ceph: report mount root in session metadata
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:33 +02:00
Yan, Zheng
aeda081c5e ceph: don't show symlink target in debugfs/mdsc
symlink target is useless for debug and can be very long. It's annoying
to show it in debugfs/mdsc.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:32 +02:00
Yan, Zheng
6c93df5db6 ceph: don't call truncate_pagecache in ceph_writepages_start
truncate_pagecache() may decrease inode's reference. This can cause
deadlock if inode's last reference is dropped and iput_final() wants
to evict the inode. (evict() calls inode_wait_for_writeback(), which
waits for ceph_writepages_start() to return).

The fix is use work thead to truncate dirty pages. Also add 'forced
umount' check to ceph_update_writeable_page(), which prevents new
pages getting dirty.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:32 +02:00
Yan, Zheng
77310320c2 ceph: renew caps for read/write if mds session got killed.
When mds session gets killed, read/write operation may hang.
Client waits for Frw caps, but mds does not know what caps client
wants. To recover this, client sends an open request to mds. The
request will tell mds what caps client wants.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:31 +02:00
Yan, Zheng
d463a43d69 ceph: CEPH_FEATURE_MDSENC support
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:31 +02:00
Yan, Zheng
235a09821c ceph: multiple filesystem support
To access non-default filesystem, we just need to subscribe to
mdsmap.<MDS_NAMESPACE_ID> and add a new mount option for mds
namespace id.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: switch to a new libceph API]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:15:31 +02:00
Ilya Dryomov
5aea3dcd50 libceph: a major OSD client update
This is a major sync up, up to ~Jewel.  The highlights are:

- per-session request trees (vs a global per-client tree)
- per-session locking (vs a global per-client rwlock)
- homeless OSD session
- no ad-hoc global per-client lists
- support for pool quotas
- foundation for watch/notify v2 support
- foundation for map check (pool deletion detection) support

The switchover is incomplete: lingering requests can be setup and
teared down but aren't ever reestablished.  This functionality is
restored with the introduction of the new lingering infrastructure
(ceph_osd_linger_request, linger_work, etc) in a later commit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:14:03 +02:00
Linus Torvalds
55c1c7b2b6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr regression fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make xattr_resolve_handlers() safe to use with NULL ->s_xattr
  xattr: Fail with -EINVAL for NULL attribute names
2016-05-25 15:54:35 -07:00
Ilya Dryomov
fe5da05e97 libceph: redo callbacks and factor out MOSDOpReply decoding
If you specify ACK | ONDISK and set ->r_unsafe_callback, both
->r_callback and ->r_unsafe_callback(true) are called on ack.  This is
very confusing.  Redo this so that only one of them is called:

    ->r_unsafe_callback(true), on ack
    ->r_unsafe_callback(false), on commit

or

    ->r_callback, on ack|commit

Decode everything in decode_MOSDOpReply() to reduce clutter.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:28 +02:00
Ilya Dryomov
85e084feb4 libceph: drop msg argument from ceph_osdc_callback_t
finish_read(), its only user, uses it to get to hdr.data_len, which is
what ->r_result is set to on success.  This gains us the ability to
safely call callbacks from contexts other than reply, e.g. map check.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:27 +02:00
Ilya Dryomov
bb873b5391 libceph: switch to calc_target(), part 2
The crux of this is getting rid of ceph_osdc_build_request(), so that
MOSDOp can be encoded not before but after calc_target() calculates the
actual target.  Encoding now happens within ceph_osdc_start_request().

Also nuked is the accompanying bunch of pointers into the encoded
buffer that was used to update fields on each send - instead, the
entire front is re-encoded.  If we want to support target->name_len !=
base->name_len in the future, there is no other way, because oid is
surrounded by other fields in the encoded buffer.

Encoding OSD ops and adding data items to the request message were
mixed together in osd_req_encode_op().  While we want to re-encode OSD
ops, we don't want to add duplicate data items to the message when
resending, so all call to ceph_osdc_msg_data_add() are factored out
into a new setup_request_data().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:27 +02:00
Ilya Dryomov
63244fa123 libceph: introduce ceph_osd_request_target, calc_target()
Introduce ceph_osd_request_target, containing all mapping-related
fields of ceph_osd_request and calc_target() for calculating mappings
and populating it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:26 +02:00
Ilya Dryomov
f81f16339a libceph: rename ceph_calc_pg_primary()
Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to
emphasise that it returns acting primary.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:25 +02:00
Ilya Dryomov
d9591f5e28 libceph: rename ceph_oloc_oid_to_pg()
Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg().  Emphasise
that returned is raw PG and return -ENOENT instead of -EIO if the pool
doesn't exist.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:24 +02:00
Ilya Dryomov
fcd00b68bb libceph: DEFINE_RB_FUNCS macro
Given

    struct foo {
        u64 id;
        struct rb_node bar_node;
    };

generate insert_bar(), erase_bar() and lookup_bar() functions with

    DEFINE_RB_FUNCS(bar, struct foo, id, bar_node)

The key is assumed to be an integer (u64, int, etc), compared with
< and >.  nodefld has to be initialized with RB_CLEAR_NODE().

Start using it for MDS, MON and OSD requests and OSD sessions.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:23 +02:00
Ilya Dryomov
d30291b985 libceph: variable-sized ceph_object_id
Currently ceph_object_id can hold object names of up to 100
(CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases,
expect one - long rbd image names:

- a format 1 header is named "<imgname>.rbd"
- an object that points to a format 2 header is named "rbd_id.<imgname>"

We operate on these potentially long-named objects during rbd map, and,
for format 1 images, during header refresh.  (A format 2 header name is
a small system-generated string.)

Lift this 100 character limit by making ceph_object_id be able to point
to an externally-allocated string.  Apart from being able to work with
almost arbitrarily-long named objects, this allows us to reduce the
size of ceph_object_id from >100 bytes to 64 bytes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:22 +02:00
Ilya Dryomov
13d1ad16d0 libceph: move message allocation out of ceph_osdc_alloc_request()
The size of ->r_request and ->r_reply messages depends on the size of
the object name (ceph_object_id), while the size of ceph_osd_request is
fixed.  Move message allocation into a separate function that would
have to be called after ceph_object_id and ceph_object_locator (which
is also going to become variable in size with RADOS namespaces) have
been filled in:

    req = ceph_osdc_alloc_request(...);
    <fill in req->r_base_oid>
    <fill in req->r_base_oloc>
    ceph_osdc_alloc_messages(req);

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:21 +02:00
Ilya Dryomov
3ed97d6345 libceph: make ceph_osdc_put_request() accept NULL
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:20 +02:00
Al Viro
0040773bff make xattr_resolve_handlers() safe to use with NULL ->s_xattr
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-25 17:34:41 -04:00
Andreas Gruenbacher
aaf431b4f9 xattr: Fail with -EINVAL for NULL attribute names
Commit 98e9cb57 improved the xattr name checks in xattr_resolve_name but
didn't update the NULL attribute name check appropriately, so NULL
attribute names lead to NULL pointer dereferences.  Turn that into
-EINVAL results instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  fs/xattr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-25 17:33:47 -04:00
David Sterba
42f31734eb Merge branch 'cleanups-4.7' into for-chris-4.7-20160525 2016-05-25 22:51:03 +02:00
Nicholas D Steeves
0132761017 btrfs: fix string and comment grammatical issues and typos
Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-25 22:35:14 +02:00
Zhao Lei
f1fee6534d btrfs: scrub: Set bbio to NULL before calling btrfs_map_block
We usually call btrfs_put_bbio() when btrfs_map_block() failed,
btrfs_put_bbio() works right whether bbio is a valid value, or NULL.

But there is a exception, in some case, btrfs_map_block() will return
fail without touching *bbio(keeping its original value), and if bbio
was not initialized yet, invalid memory accessing will happened.

Above case is in scrub_missing_raid56_pages(), and similar case in
scrub_raid56_parity().

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-25 22:15:21 +02:00
Liu Bo
2d324f59f3 Btrfs: fix unexpected return value of fiemap
btrfs's fiemap is supposed to return 0 on success and return < 0 on
error. however, ret becomes 1 after looking up the last file extent:

  btrfs_lookup_file_extent ->
    btrfs_search_slot(..., ins_len=0, cow=0)

and if the offset is beyond EOF, we'll get 'path' pointed to the place
of potentail insertion, and ret == 1.

This may confuse applications using ioctl(FIEL_IOC_FIEMAP).

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-25 19:53:54 +02:00
Liu Bo
1c8b5b6e8b Btrfs: free sys_array eb as soon as possible
While reading sys_chunk_array in superblock, btrfs creates a temporary
extent buffer.  Since we don't use it after finishing reading
 sys_chunk_array, we don't need to keep it in memory.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-25 19:53:51 +02:00
Tom Haynes
fb1084e332 nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO
The mds can inform the client not to use the IOMODE_RW layout
segment for doing READs. I.e., it is basically a
IOMODE_WRITE layout segment.

It would do this to not interfere with the WRITEs.

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-25 13:35:20 -04:00
Weston Andros Adamson
ade8febde0 nfs: avoid race that crashes nfs_init_commit
Since the patch "NFS: Allow multiple commit requests in flight per file"
we can run multiple simultaneous commits on the same inode.  This
introduced a race over collecting pages to commit that made it possible
to call nfs_init_commit() with an empty list - which causes crashes like
the one below.

The fix is to catch this race and avoid calling nfs_init_commit and
initiate_commit when there is no work to do.

Here is the crash:

[600522.076832] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[600522.078475] IP: [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
[600522.078745] PGD 4272b1067 PUD 4272cb067 PMD 0
[600522.078972] Oops: 0000 [#1] SMP
[600522.079204] Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vmw_vsock_vmci_transport vsock bonding ipmi_devintf ipmi_msghandler coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon parport_pc parport acpi_cpufreq vmw_vmci i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmxnet3
[600522.081380]  vmw_pvscsi ata_generic pata_acpi
[600522.081809] CPU: 3 PID: 15667 Comm: /usr/bin/python Not tainted 4.1.9-100.pd.88.el7.x86_64 #1
[600522.082281] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[600522.082814] task: ffff8800bbbfa780 ti: ffff88042ae84000 task.ti: ffff88042ae84000
[600522.083378] RIP: 0010:[<ffffffffa0479e72>]  [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
[600522.083973] RSP: 0018:ffff88042ae87438  EFLAGS: 00010246
[600522.084571] RAX: 0000000000000000 RBX: ffff880003485e40 RCX: ffff88042ae87588
[600522.085188] RDX: 0000000000000000 RSI: ffff88042ae874b0 RDI: ffff880003485e40
[600522.085756] RBP: ffff88042ae87448 R08: ffff880003486010 R09: ffff88042ae874b0
[600522.086332] R10: 0000000000000000 R11: 0000000000000005 R12: ffff88042ae872d0
[600522.086905] R13: ffff88042ae874b0 R14: ffff880003485e40 R15: ffff88042704c840
[600522.087484] FS:  00007f4728ff2740(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
[600522.088070] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[600522.088663] CR2: 0000000000000040 CR3: 000000042b6aa000 CR4: 00000000001406e0
[600522.089327] Stack:
[600522.089926]  0000000000000001 ffff88042ae87588 ffff88042ae874f8 ffffffffa04f09fa
[600522.090549]  0000000000017840 0000000000017840 ffff88042ae87588 ffff8803258d9930
[600522.091169]  ffff88042ae87578 ffffffffa0563d80 0000000000000000 ffff88042704c840
[600522.091789] Call Trace:
[600522.092420]  [<ffffffffa04f09fa>] pnfs_generic_commit_pagelist+0x1da/0x320 [nfsv4]
[600522.093052]  [<ffffffffa0563d80>] ? ff_layout_commit_prepare_v3+0x30/0x30 [nfs_layout_flexfiles]
[600522.093696]  [<ffffffffa0562645>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles]
[600522.094359]  [<ffffffffa047bc78>] nfs_generic_commit_list+0xe8/0x120 [nfs]
[600522.095032]  [<ffffffffa047bd6a>] nfs_commit_inode+0xba/0x110 [nfs]
[600522.095719]  [<ffffffffa046ac54>] nfs_release_page+0x44/0xd0 [nfs]
[600522.096410]  [<ffffffff811a8122>] try_to_release_page+0x32/0x50
[600522.097109]  [<ffffffff811bd4f1>] shrink_page_list+0x961/0xb30
[600522.097812]  [<ffffffff811bdced>] shrink_inactive_list+0x1cd/0x550
[600522.098530]  [<ffffffff811bea65>] shrink_lruvec+0x635/0x840
[600522.099250]  [<ffffffff811bed60>] shrink_zone+0xf0/0x2f0
[600522.099974]  [<ffffffff811bf312>] do_try_to_free_pages+0x192/0x470
[600522.100709]  [<ffffffff811bf6ca>] try_to_free_pages+0xda/0x170
[600522.101464]  [<ffffffff811b2198>] __alloc_pages_nodemask+0x588/0x970
[600522.102235]  [<ffffffff811fbbd5>] alloc_pages_vma+0xb5/0x230
[600522.103000]  [<ffffffff813a1589>] ? cpumask_any_but+0x39/0x50
[600522.103774]  [<ffffffff811d6115>] wp_page_copy.isra.55+0x95/0x490
[600522.104558]  [<ffffffff810e3438>] ? __wake_up+0x48/0x60
[600522.105357]  [<ffffffff811d7d3b>] do_wp_page+0xab/0x4f0
[600522.106137]  [<ffffffff810a1bbb>] ? release_task+0x36b/0x470
[600522.106902]  [<ffffffff8126dbd7>] ? eventfd_ctx_read+0x67/0x1c0
[600522.107659]  [<ffffffff811da2a8>] handle_mm_fault+0xc78/0x1900
[600522.108431]  [<ffffffff81067ef1>] __do_page_fault+0x181/0x420
[600522.109173]  [<ffffffff811446a6>] ? __audit_syscall_exit+0x1e6/0x280
[600522.109893]  [<ffffffff810681c0>] do_page_fault+0x30/0x80
[600522.110594]  [<ffffffff81024f36>] ? syscall_trace_leave+0xc6/0x120
[600522.111288]  [<ffffffff81790a58>] page_fault+0x28/0x30
[600522.111947] Code: 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 4c 8d 87 d0 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce <48> 8b 40 40 4c 8b 50 30 74 24 48 8b 87 d0 01 00 00 48 8b 7e 08
[600522.113343] RIP  [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
[600522.114003]  RSP <ffff88042ae87438>
[600522.114636] CR2: 0000000000000040

Fixes: af7cf057 (NFS: Allow multiple commit requests in flight per file)
CC: stable@vger.kernel.org
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-25 12:59:01 -04:00
Dan Carpenter
2997bfd049 NFS: checking for NULL instead of IS_ERR() in nfs_commit_file()
nfs_create_request() doesn't return NULL, it returns error pointers.

Fixes: 67911c8f18 ('NFS: Add nfs_commit_file()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-25 12:11:58 -04:00
Linus Torvalds
5d22c5ab85 A very quiet cycle for nfsd, mainly just an RDMA update from Chuck Lever.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXRL2PAAoJECebzXlCjuG+c34P/1wnkehVxDozBJp7UEzhrsE/
 U1dpwfykzVEIMh68TldBvyrt2Lb4ThLPZ7V2dVwNqA831S/VM6fWJyw8WerSgGgU
 SUGOzdF04rNfy41lXQNpDiiC417Fbp4Js4O+Q5kd+8kqQbXYqCwz0ce3DVbAT571
 JmJgBI8gZLhicyNRDOt0y6C+/3P+0bbXYvS8wkzY+CwbNczHJOCLhwViKzWTptm9
 LCSgDGm68ckpR7mZkWfEF3WdiZ9+SxeI+pT9dcomzxNfbv8NluDplYmdLbepA2J8
 uWHGprVe9WJMDnw4hJhrI2b3/rHIntpxuZYktmnb/z/ezBTyi3FXYWgAEdE1by+Y
 Gf7OewKOp8XcQ/iHRZ8vwXNrheHAr9++SB49mGBZJ3qj6bO+FrISQKX9FRxo6PrJ
 SDRgYjt5yUG2oD1AAs1NzuBPqZzR40mA6Yk4zuNAcxzK/S7DdRF/9Kjyk86TVv08
 3E3O5i1RyVcU/A7JdnbiyeDFMQoRshdnN0HShIZcSfcfW+qFKghNlO9bFfSl904F
 jlG6moNB5OBiV8FNOelY+HGAYoUdw120QxqQMv47oZGKCjv+rfK38aB4GBJ4iEuo
 TrGqNmrMrs/AKdL3Sd+8LuJqSfXggrwUDc/KS6CFz/U0eBbp6k0kcd7FEyG/J8kW
 JxQ0URgyJ+DHfc60E8LN
 =k6RP
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "A very quiet cycle for nfsd, mainly just an RDMA update from Chuck
  Lever"

* tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux:
  sunrpc: fix stripping of padded MIC tokens
  svcrpc: autoload rdma module
  svcrdma: Generalize svc_rdma_xdr_decode_req()
  svcrdma: Eliminate code duplication in svc_rdma_recvfrom()
  svcrdma: Drain QP before freeing svcrdma_xprt
  svcrdma: Post Receives only for forward channel requests
  svcrdma: Remove superfluous line from rdma_read_chunks()
  svcrdma: svc_rdma_put_context() is invoked twice in Send error path
  svcrdma: Do not add XDR padding to xdr_buf page vector
  svcrdma: Support IPv6 with NFS/RDMA
  nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid
  Remove unnecessary allocation
2016-05-24 14:39:20 -07:00
Linus Torvalds
0e01df100b Fix a number of bugs, most notably a potential stale data exposure
after a crash and a potential BUG_ON crash if a file has the data
 journalling flag enabled while it has dirty delayed allocation blocks
 that haven't been written yet.  Also fix a potential crash in the new
 project quota code and a maliciously corrupted file system.
 
 In addition, fix some DAX-specific bugs, including when there is a
 transient ENOSPC situation and races between writes via direct I/O and
 an mmap'ed segment that could lead to lost I/O.
 
 Finally the usual set of miscellaneous cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXQ40fAAoJEPL5WVaVDYGjnwMH+wXHASgPfzZgtRInsTG8W/2L
 jsmAcMlyMAYIATWMppNtPIq0td49z1dYO0YkKhtPVMwfzu230IFWhGWp93WqP9ve
 XYHMmaBorFlMAzWgMKn1K0ExWZlV+ammmcTKgU0kU4qyZp0G/NnMtlXIkSNv2amI
 9Mn6R+v97c20gn8e9HWP/IVWkgPr+WBtEXaSGjC7dL6yI8hL+rJMqN82D76oU5ea
 vtwzrna/ISijy+etYmQzqHNYNaBKf40+B5HxQZw/Ta3FSHofBwXAyLaeEAr260Mf
 V3Eg2NDcKQxiZ3adBzIUvrRnrJV381OmHoguo8Frs8YHTTRiZ0T/s7FGr2Q0NYE=
 =7yIM
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Fix a number of bugs, most notably a potential stale data exposure
  after a crash and a potential BUG_ON crash if a file has the data
  journalling flag enabled while it has dirty delayed allocation blocks
  that haven't been written yet.  Also fix a potential crash in the new
  project quota code and a maliciously corrupted file system.

  In addition, fix some DAX-specific bugs, including when there is a
  transient ENOSPC situation and races between writes via direct I/O and
  an mmap'ed segment that could lead to lost I/O.

  Finally the usual set of miscellaneous cleanups"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: pre-zero allocated blocks for DAX IO
  ext4: refactor direct IO code
  ext4: fix race in transient ENOSPC detection
  ext4: handle transient ENOSPC properly for DAX
  dax: call get_blocks() with create == 1 for write faults to unwritten extents
  ext4: remove unmeetable inconsisteny check from ext4_find_extent()
  jbd2: remove excess descriptions for handle_s
  ext4: remove unnecessary bio get/put
  ext4: silence UBSAN in ext4_mb_init()
  ext4: address UBSAN warning in mb_find_order_for_block()
  ext4: fix oops on corrupted filesystem
  ext4: fix check of dqget() return value in ext4_ioctl_setproject()
  ext4: clean up error handling when orphan list is corrupted
  ext4: fix hang when processing corrupted orphaned inode list
  ext4: remove trailing \n from ext4_warning/ext4_error calls
  ext4: fix races between changing inode journal mode and ext4_writepages
  ext4: handle unwritten or delalloc buffers before enabling data journaling
  ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
  ext4: do not ask jbd2 to write data for delalloc buffers
  jbd2: add support for avoiding data writes during transaction commits
  ...
2016-05-24 12:55:26 -07:00
Andreas Gruenbacher
1112018cef ubifs: ubifs_dump_inode: Fix dumping field bulk_read
The wrong field (xattr) is dumped here due to a copy-and-paste error.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-05-24 15:29:44 +02:00
Linus Torvalds
84787c572d Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - Oleg's "wait/ptrace: assume __WALL if the child is traced".  It's a
   kernel-based workaround for existing userspace issues.

 - A few hotfixes

 - befs cleanups

 - nilfs2 updates

 - sys_wait() changes

 - kexec updates

 - kdump

 - scripts/gdb updates

 - the last of the MM queue

 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (84 commits)
  kgdb: depends on VT
  drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
  drm/radeon: make radeon_mn_get wait for mmap_sem killable
  drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
  uprobes: wait for mmap_sem for write killable
  prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
  exec: make exec path waiting for mmap_sem killable
  aio: make aio_setup_ring killable
  coredump: make coredump_wait wait for mmap_sem for write killable
  vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
  ipc, shm: make shmem attach/detach wait for mmap_sem killable
  mm, fork: make dup_mmap wait for mmap_sem for write killable
  mm, proc: make clear_refs killable
  mm: make vm_brk killable
  mm, elf: handle vm_brk error
  mm, aout: handle vm_brk failures
  mm: make vm_munmap killable
  mm: make vm_mmap killable
  mm: make mmap_sem for write waits killable for mm syscalls
  MAINTAINERS: add co-maintainer for scripts/gdb
  ...
2016-05-23 19:42:28 -07:00
Michal Hocko
f268dfe905 exec: make exec path waiting for mmap_sem killable
setup_arg_pages requires mmap_sem for write.  If the waiting task gets
killed by the oom killer it would block oom_reaper from asynchronous
address space reclaim and reduce the chances of timely OOM resolving.
Wait for the lock in the killable mode and return with EINTR if the task
got killed while waiting.  All the callers are already handling error
path and the fatal signal doesn't need any additional treatment.

The same applies to __bprm_mm_init.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Michal Hocko
013373e8b8 aio: make aio_setup_ring killable
aio_setup_ring waits for mmap_sem in writable mode.  If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.  This will also expedite the
return to the userspace and do_exit.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Benamin LaHaise <bcrl@kvack.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Michal Hocko
4136c26b65 coredump: make coredump_wait wait for mmap_sem for write killable
coredump_wait waits for mmap_sem for write currently which can prevent
oom_reaper to reclaim the oom victims address space asynchronously
because that requires mmap_sem for read.  This might happen if the oom
victim is multi threaded and some thread(s) is holding mmap_sem for read
(e.g.  page fault) and it is stuck in the page allocator while other
thread(s) reached coredump_wait already.

This patch simply uses down_write_killable and bails out with EINTR if
the lock got interrupted by the fatal signal.  do_coredump will return
right away and do_group_exit will take care to zap the whole thread
group.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Michal Hocko
4e80153a60 mm, proc: make clear_refs killable
CLEAR_REFS_MM_HIWATER_RSS and CLEAR_REFS_SOFT_DIRTY are relying on
mmap_sem for write.  If the waiting task gets killed by the oom killer
and it would operate on the current's mm it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.  This will also expedite the
return to the userspace and do_exit even if the mm is remote.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Petr Cermak <petrcermak@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Michal Hocko
ecc2bc8ac0 mm, elf: handle vm_brk error
load_elf_library doesn't handle vm_brk failure although nothing really
indicates it cannot do that because the function is allowed to fail due
to vm_mmap failures already.  This might be not a problem now but later
patch will make vm_brk killable (resp.  mmap_sem for write waiting will
become killable) and so the failure will be more probable.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Michal Hocko
864778b154 mm, aout: handle vm_brk failures
vm_brk is allowed to fail but load_aout_binary simply ignores the error
and happily continues.  I haven't noticed any problem from that in real
life but later patches will make the failure more likely because vm_brk
will become killable (resp.  mmap_sem for write waiting will become
killable) so we should be more careful now.

The error handling should be quite straightforward because there are
calls to vm_mmap which check the error properly already.  The only
notable exception is set_brk which is called after beyond_if label.  But
nothing indicates that we cannot move it above set_binfmt as the two do
not depend on each other and fail before we do set_binfmt and alter
reference counting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Oleg Nesterov
9eb8a659de exec: remove the no longer needed remove_arg_zero()->free_arg_page()
remove_arg_zero() does free_arg_page() for no reason.  This was needed
before and only if CONFIG_MMU=y: see commit 4fc75ff481 ("exec: fix
remove_arg_zero"), install_arg_page() was called for every page != NULL
in bprm->page[] array.  Today install_arg_page() has already gone and
free_arg_page() is nop after another commit b6a2fea393 ("mm: variable
length argument support").

CONFIG_MMU=n does free_arg_pages() in free_bprm() and thus it doesn't
need remove_arg_zero()->free_arg_page() too; apart from get_arg_page()
it never checks if the page in bprm->page[] was allocated or not, so the
"extra" non-freed page is fine.  OTOH, this free_arg_page() can add the
minor pessimization, the caller is going to do copy_strings_kernel()
right after remove_arg_zero() which will likely need to re-allocate the
same page again.

And as Hujunjie pointed out, the "offset == PAGE_SIZE" check is wrong
because we are going to increment bprm->p once again before return, so
CONFIG_MMU=n "leaks" the page anyway if '0' is the final byte in this
page.

NOTE: remove_arg_zero() assumes that argv[0] is null-terminated but this
is not necessarily true.  copy_strings() does "len = strnlen_user(...)",
then copy_from_user(len) but another thread or debuger can overwrite the
trailing '0' in between.  Afaics nothing really bad can happen because
we must always have the null-terminated bprm->filename copied by the 1st
copy_strings_kernel(), but perhaps we should change this code to check
"bprm->p < bprm->exec" anyway, and/or change copy_strings() to ensure
that the last byte in string is always zero.

Link: http://lkml.kernel.org/r/20160517155335.GA31435@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported by: hujunjie <jj.net@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
076a378ba6 nilfs2: fix block comments
This fixes block comments with proper formatting to eliminate the
following checkpatch.pl warnings:

  "WARNING: Block comments use * on subsequent lines"
  "WARNING: Block comments use a trailing */ on a separate line"

Link: http://lkml.kernel.org/r/1462886671-3521-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
80d6505232 nilfs2: remove loops of single statement macros
This fixes checkpatch.pl warning "WARNING: Single statement macros
should not use a do {} while (0) loop".

Link: http://lkml.kernel.org/r/1462886671-3521-7-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
7f00184e9c nilfs2: remove unnecessary else after return or break
This fixes the checkpatch.pl warning that suggests else is not
generally useful after a break or return.

Link: http://lkml.kernel.org/r/1462886671-3521-6-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
0c6c44cb9f nilfs2: avoid bare use of 'unsigned'
This fixes checkpatch.pl warning "WARNING: Prefer 'unsigned int' to
bare use of 'unsigned'".

Link: http://lkml.kernel.org/r/1462886671-3521-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
7592ecde65 nilfs2: fix code indent coding style issue
This fixes checkpatch.pl warning "WARNING: suspect code indent for
conditional statements".

Link: http://lkml.kernel.org/r/1462886671-3521-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
c9cb9b5c85 nilfs2: remove space before semicolon
This fixes the checkpatch.pl warning "WARNING: space prohibited before
semicolon" at nilfs_store_magic_and_option().

Link: http://lkml.kernel.org/r/1462886671-3521-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
06f4abf6ca nilfs2: do not emit extra newline on nilfs_warning() and nilfs_error()
This updates call sites of nilfs_warning() and nilfs_error() so that they
don't add a duplicate newline.  These output functions are already
designed to add a trailing newline to the message.

Link: http://lkml.kernel.org/r/1462886671-3521-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
facb9ec5e6 nilfs2: clean trailing semicolons in macros
Remove trailing semicolons from macros, as suggested by checkpatch.pl.

Link: http://lkml.kernel.org/r/1461935747-10380-12-git-send-email-konishi.ryusuke@lab.ntt.co.jp
[konishi.ryusuke@lab.ntt.co.jp: fix style issues]
  Link: http://lkml.kernel.org/r/20160509.231703.1481729973362188932.konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
4ad364ca1c nilfs2: add missing line spacing
Clean up checkpatch.pl warnings "WARNING: Missing a blank line after
declarations" from nilfs2.

Link: http://lkml.kernel.org/r/1461935747-10380-11-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
e7a142aaa0 nilfs2: replace __attribute__((packed)) with __packed
This fixes the following checkpatch.pl warning:

  WARNING: __packed is preferred over __attribute__((packed))
  #23: FILE: export.h:23:
  +} __attribute__ ((packed));

Link: http://lkml.kernel.org/r/1461935747-10380-10-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
2d19961d83 nilfs2: move cleanup code of metadata file from inode routines
Refactor nilfs_clear_inode() and nilfs_i_callback() so that cleanup
code or resource deallocation related to metadata file will be moved
out to mdt.c.

Link: http://lkml.kernel.org/r/1461935747-10380-9-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
24e20ead2f nilfs2: get rid of nilfs_mdt_mark_block_dirty()
nilfs_mdt_mark_block_dirty() can be replaced with primary functions
like nilfs_mdt_get_block() and mark_buffer_dirty(), and it's used only
by nilfs_ioctl_mark_blocks_dirty().

This gets rid of the function to simplify the interface of metadata
file.

Link: http://lkml.kernel.org/r/1461935747-10380-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
4b420ab4ee nilfs2: clean up old e-mail addresses
E-mail addresses of osrg.net domain are no longer available.  This
removes them from authorship notices and prevents reporters from being
confused.

Link: http://lkml.kernel.org/r/1461935747-10380-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
5726d0b454 nilfs2: remove FSF mailing address from GPL notices
This removes the extra paragraph which mentions FSF address in GPL
notices from source code of nilfs2 and avoids the checkpatch.pl error
related to it.

Link: http://lkml.kernel.org/r/1461935747-10380-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
f19e78dee9 nilfs2: remove space before comma
Fix checkpatch.pl error "ERROR: space prohibited before that ','
(ctx:WxW)" at nilfs_sufile_set_suinfo().

This also fixes checkpatch.pl warning "WARNING: Prefer 'unsigned int' to
bare use of 'unsigned'" at nilfs_sufile_set_suinfo() and
nilfs_sufile_get_suinfo().

Link: http://lkml.kernel.org/r/1461935747-10380-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ryusuke Konishi
8fa7c32094 nilfs2: fix white space issue in nilfs_mount()
Fix the following checkpatch.pl error and warnings:

  ERROR: code indent should use tabs where possible
  #1317: FILE: super.c:1317:
  + ^I^Is_new = true;$

  WARNING: please, no space before tabs
  #1317: FILE: super.c:1317:
  + ^I^Is_new = true;$

  WARNING: please, no spaces at the start of a line
  #1317: FILE: super.c:1317:
  + ^I^Is_new = true;$

Link: http://lkml.kernel.org/r/1461935747-10380-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Julia Lawall
1c613cb986 nilfs2: constify nilfs_sc_operations structures
The nilfs_sc_operations structures are never modified, so declare them
as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Salah Triki
a46d8c3fc7 fs/befs/io.c:befs_bread(): remove unneeded initialization to NULL
bh is reinitialized by sb_bread() so no need to init it
with NULL in the beginning of befs_bread()

Link: http://lkml.kernel.org/r/88481760b43226fac16adb1f1e68897e47d8235c.1462841692.git.salah.triki@acm.org
Signed-off-by: Salah Triki <salah.triki@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Salah Triki
77169af8b9 fs/befs/io.c:befs_bread_iaddr(): remove unneeded initialization to NULL
bh is reinitialized by sb_bread() so no need to init it
with NULL in the beginning of befs_bread_iaddr()

Link: http://lkml.kernel.org/r/586d2639d729345b9c07aac10ba713d8ceb8745a.1462841692.git.salah.triki@acm.org
Signed-off-by: Salah Triki <salah.triki@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Salah Triki
d6e0203369 fs/befs/linuxvfs.c:befs_iget(): remove unneeded befs_nio initialization to NULL
befs_ino is reinitialized by BEFS_I() so no need to init it
with NULL in the beginning of befs_iget()

Link: http://lkml.kernel.org/r/a5c02445e436629c4d4ba1b65d91501878942f58.1462842887.git.salah.triki@acm.org
Signed-off-by: Salah Triki <salah.triki@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Salah Triki
44ad809865 fs/befs/linuxvfs.c:befs_iget(): remove unneeded raw_inode initialization to NULL
raw_inode is reinitialized to bh->b_data so no need to init it
with NULL in the beginning of befs_iget()

Link: http://lkml.kernel.org/r/0a66baaaacb6b7e5fcea5b31b57b649261152281.1462842887.git.salah.triki@acm.org
Signed-off-by: Salah Triki <salah.triki@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Salah Triki
e4c7f5b109 fs/befs/linuxvfs.c:befs_iget(): remove unneeded initialization to NULL
bh is reinitialized by befs_bread() so no need to init it
with NULL in the beginning of befs_iget()

Link: http://lkml.kernel.org/r/38d62b1469bc3b316ba6b81fd8e26fc66fdd713b.1462842886.git.salah.triki@acm.org
Signed-off-by: Salah Triki <salah.triki@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Salah Triki
c940876368 fs/befs/linuxvfs.c:befs_get_block(): remove unneeded initialization to NULL
inode is reinitialized by befs_iget() so no need to init it
with NULL in the beginning of befs_lookup()

Link: http://lkml.kernel.org/r/03d7e46890aef94078130bed97c4f8f8ae9ea2b2.1462842886.git.salah.triki@acm.org
Signed-off-by: Salah Triki <salah.triki@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Salah Triki
6c8da24b90 fs/befs/datastream.c:befs_find_brun_dblindirect(): remove unneeded initializations to NULL
iaddr_array is unconditionally initialized to NULL in
befs_find_brun_dblindirect().

Link: http://lkml.kernel.org/r/940def273e30ef37957fba9da6981a10fb3c9741.1462649034.git.salah.triki@acm.org
Signed-off-by: Salah Triki <salah.triki@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Salah Triki
3080ea9e5b fs/befs/datastream.c:befs_read_lsymlink(): remove unneeded initialization to NULL
bh is reinitialized by befs_read_datastream() so no need to init it
with NULL in the beginning of befs_read_lsymlink().

Link: http://lkml.kernel.org/r/e22f279bceb8d026af048952e02ba98925b21c92.1462649034.git.salah.triki@acm.org
Signed-off-by: Salah Triki <salah.triki@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Salah Triki
ff2fe0aa2e fs/befs/datastream.c:befs_read_datastream(): remove unneeded initialization to NULL
bh is reinitialized by befs_bread_iaddr() so no need to init it
with NULL in the beginning of befs_read_datastream().

Link: http://lkml.kernel.org/r/81e1f70187db34d195c8e42b1ff78be6a71c0060.1462649034.git.salah.triki@acm.org
Signed-off-by: Salah Triki <salah.triki@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Ralf Baechle
f43edca7ed ELF/MIPS build fix
CONFIG_MIPS32_N32=y but CONFIG_BINFMT_ELF disabled results in the
following linker errors:

  arch/mips/built-in.o: In function `elf_core_dump':
  binfmt_elfn32.c:(.text+0x23dbc): undefined reference to `elf_core_extra_phdrs'
  binfmt_elfn32.c:(.text+0x246e4): undefined reference to `elf_core_extra_data_size'
  binfmt_elfn32.c:(.text+0x248d0): undefined reference to `elf_core_write_extra_phdrs'
  binfmt_elfn32.c:(.text+0x24ac4): undefined reference to `elf_core_write_extra_data'

CONFIG_MIPS32_O32=y but CONFIG_BINFMT_ELF disabled results in the following
linker errors:

  arch/mips/built-in.o: In function `elf_core_dump':
  binfmt_elfo32.c:(.text+0x28a04): undefined reference to `elf_core_extra_phdrs'
  binfmt_elfo32.c:(.text+0x29330): undefined reference to `elf_core_extra_data_size'
  binfmt_elfo32.c:(.text+0x2951c): undefined reference to `elf_core_write_extra_phdrs'
  binfmt_elfo32.c:(.text+0x29710): undefined reference to `elf_core_write_extra_data'

This is because binfmt_elfn32 and binfmt_elfo32 are using symbols from
elfcore but for these configurations elfcore will not be built.

Fixed by making elfcore selectable by a separate config symbol which
unlike the current mechanism can also be used from other directories
than kernel/, then having each flavor of ELF that relies on elfcore.o,
select it in Kconfig, including CONFIG_MIPS32_N32 and CONFIG_MIPS32_O32
which fixes this issue.

Link: http://lkml.kernel.org/r/20160520141705.GA1913@linux-mips.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Linus Torvalds
1f40c49570 libnvdimm for 4.7
1/ Device DAX for persistent memory:
    Device DAX is the device-centric analogue of Filesystem DAX
    (CONFIG_FS_DAX).  It allows memory ranges to be allocated and mapped
    without need of an intervening file system.  Device DAX is strict,
    precise and predictable.  Specifically this interface:
 
    a) Guarantees fault granularity with respect to a given page size
       (pte, pmd, or pud) set at configuration time.
 
    b) Enforces deterministic behavior by being strict about what fault
       scenarios are supported.
 
    Persistent memory is the first target, but the mechanism is also
    targeted for exclusive allocations of performance/feature differentiated
    memory ranges.
 
 2/ Support for the HPE DSM (device specific method) command formats.
    This enables management of these first generation devices until a
    unified DSM specification materializes.
 
 3/ Further ACPI 6.1 compliance with support for the common dimm
    identifier format.
 
 4/ Various fixes and cleanups across the subsystem.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXQhdeAAoJEB7SkWpmfYgCYP8P/RAgHkroL5lUKKU45TQUBKcY
 diC9POeNSccme4tIRIQCGQUZ7+7mKM5ECv2ulF4xYOHvFBCcd/8OF6xKAXs48r3v
 oguYhvX1YvIkBc9FUfBQbR1IsCOJ7uWp/UYiYCIQEXS5tS9Jv545j3ASqDt9xWoV
 TWlceZn3yWSbASiV9qZ2eXhEkk75pg4yara++rsm2/7rs/TTXn5EIjBs+57BtAo+
 6utI4fTy0CQvBYwVzam3m7y9dt2Z2jWXL4hgmT7pkvJ7HDoctVly0P9+bknJPUAo
 g+NugKgTGeiqH5GYp5CTZ9KvL91sDF4q00pfinITVdFl0E3VE293cIHlAzSQBm5/
 w58xxaRV958ZvpH7EaBmYQG82QDi/eFNqeHqVGn0xAM6MlaqO7avUMQp2lRPYMCJ
 u1z/NloR5yo+sffHxsn5Luiq9KqOf6zk33PuxEkKbN74OayCSPn/SeVCO7rQR0B6
 yPMJTTcTiCLnId1kOWAPaEmuK2U3BW/+ogg7hKgeCQSysuy5n6Ok5a2vEx/gJRAm
 v9yF68RmIWumpHr+QB0TmB8mVbD5SY+xWTm3CqJb9MipuFIOF7AVsPyTgucBvE7s
 v+i5F6MDO6tcVfiDT4AiZEt6D2TM5RbtckkUEX3ZTD6j7CGuR5D8bH0HNRrghrYk
 KT1lAk6tjWBOGAHc5Ji7
 =Y3Xv
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "The bulk of this update was stabilized before the merge window and
  appeared in -next.  The "device dax" implementation was revised this
  week in response to review feedback, and to address failures detected
  by the recently expanded ndctl unit test suite.

  Not included in this pull request are two dax topic branches (dax
  error handling, and dax radix-tree locking).  These topics were
  deferred to get a few more days of -next integration testing, and to
  coordinate a branch baseline with Ted and the ext4 tree.  Vishal and
  Ross will send the error handling and locking topics respectively in
  the next few days.

  This branch has received a positive build result from the kbuild robot
  across 226 configs.

  Summary:

   - Device DAX for persistent memory: Device DAX is the device-centric
     analogue of Filesystem DAX (CONFIG_FS_DAX).  It allows memory
     ranges to be allocated and mapped without need of an intervening
     file system.  Device DAX is strict, precise and predictable.
     Specifically this interface:

      a) Guarantees fault granularity with respect to a given page size
         (pte, pmd, or pud) set at configuration time.

      b) Enforces deterministic behavior by being strict about what
         fault scenarios are supported.

     Persistent memory is the first target, but the mechanism is also
     targeted for exclusive allocations of performance/feature
     differentiated memory ranges.

   - Support for the HPE DSM (device specific method) command formats.
     This enables management of these first generation devices until a
     unified DSM specification materializes.

   - Further ACPI 6.1 compliance with support for the common dimm
     identifier format.

   - Various fixes and cleanups across the subsystem"

* tag 'libnvdimm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (40 commits)
  libnvdimm, dax: fix deletion
  libnvdimm, dax: fix alignment validation
  libnvdimm, dax: autodetect support
  libnvdimm: release ida resources
  Revert "block: enable dax for raw block devices"
  /dev/dax, core: file operations and dax-mmap
  /dev/dax, pmem: direct access to persistent memory
  libnvdimm: stop requiring a driver ->remove() method
  libnvdimm, dax: record the specified alignment of a dax-device instance
  libnvdimm, dax: reserve space to store labels for device-dax
  libnvdimm, dax: introduce device-dax infrastructure
  nfit: add sysfs dimm 'family' and 'dsm_mask' attributes
  tools/testing/nvdimm: ND_CMD_CALL support
  nfit: disable vendor specific commands
  nfit: export subsystem ids as attributes
  nfit: fix format interface code byte order per ACPI6.1
  nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism
  nfit, libnvdimm: clarify "commands" vs "_DSMs"
  libnvdimm: increase max envelope size for ioctl
  acpi/nfit: Add sysfs "id" for NVDIMM ID
  ...
2016-05-23 11:18:01 -07:00
Linus Torvalds
f6c658df63 Enhancement
- fs-specific prefix for fscrypto
 - fault injection facility
 - expose validity bitmaps for user to be aware of fragmentation
 - fallocate/rm/preallocation speed up
 - use percpu counters
 
 Bug fixes
 - some inline_dentry/inline_data bugs
 - error handling for atomic/volatile/orphan inodes
 - recover broken superblock
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXQPu4AAoJEEAUqH6CSFDSILgP/1dj6fmtytr8c+55EBqXUGpo
 M7rS93JTxlmU5BduIo9psJsEquTQoVEmxB/Gjd+ZnI5R6Rp1c/REaP0ba374rEhZ
 ecMQh5QqzM1gRNFXrQhWFEL/KtfRqt3T80zebQP7pxFUm/m9NGMLWT43RzQ8AAhr
 Y3P0NLdvxA4HAnipKptkPJcGZQlWnL9W/MR+LgsXLXqLDwJHkVu61GcF0y2ibcJM
 lEtIRmyH5tg7hP5c5LTw9pKQFHkIZt5cHFLjrJ1x8FSm2TXOcJPbjOrThvcb+NKK
 e0O+6R0meH2eMpak+BTkZp2YbPPyXOb1N00j//lmbPjCoJPd4ZuiJ+oRoHUlTxtU
 FhO67t0brlDbMFQVRFrtv8VA8M6by+DTAAP3Ffx62I/TJkphKANCSoyQRhlWtxxO
 kRU69N7ipnRNxO4WCv40FjaQjSIElCKysP1POazRmAOQm7UFTGT9Nj37+eqUcEPJ
 HZ7O61DEHNemb0SMlJ8WSClstt0yUU+2cjRfTPAr2Wd3V8gYbRs0QUg5M2GLgywR
 EmiJfpkXse3f/nR8W6g1hganSOXA0AZX+EUibed6VkV3oYemdFbm8OymeEmLmWpM
 y2F3D7dPLW7MCoTXJqtwFWdoDwI+zkH4rJaPGTq5TVBRWVU/njX8OvoB47pOvKV1
 kccL7zv2PekE1hSDO5WF
 =6MSp
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, as Ted pointed out, fscrypto allows one more key prefix
  given by filesystem to resolve backward compatibility issues.  Other
  than that, we've fixed several error handling cases by introducing
  a fault injection facility.  We've also achieved performance
  improvement in some workloads as well as a bunch of bug fixes.

  Summary:

  Enhancements:
   - fs-specific prefix for fscrypto
   - fault injection facility
   - expose validity bitmaps for user to be aware of fragmentation
   - fallocate/rm/preallocation speed up
   - use percpu counters

  Bug fixes:
   - some inline_dentry/inline_data bugs
   - error handling for atomic/volatile/orphan inodes
   - recover broken superblock"

* tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (73 commits)
  f2fs: fix to update dirty page count correctly
  f2fs: flush pending bios right away when error occurs
  f2fs: avoid ENOSPC fault in the recovery process
  f2fs: make exit_f2fs_fs more clear
  f2fs: use percpu_counter for total_valid_inode_count
  f2fs: use percpu_counter for alloc_valid_block_count
  f2fs: use percpu_counter for # of dirty pages in inode
  f2fs: use percpu_counter for page counters
  f2fs: use bio count instead of F2FS_WRITEBACK page count
  f2fs: manipulate dirty file inodes when DATA_FLUSH is set
  f2fs: add fault injection to sysfs
  f2fs: no need inc dirty pages under inode lock
  f2fs: fix incorrect error path handling in f2fs_move_rehashed_dirents
  f2fs: fix i_current_depth during inline dentry conversion
  f2fs: correct return value type of f2fs_fill_super
  f2fs: fix deadlock when flush inline data
  f2fs: avoid f2fs_bug_on during recovery
  f2fs: show # of orphan inodes
  f2fs: support in batch fzero in dnode page
  f2fs: support in batch multi blocks preallocation
  ...
2016-05-21 18:25:28 -07:00
Dan Williams
36092ee8ba Merge branch 'for-4.7/dax' into libnvdimm-for-next 2016-05-21 12:33:04 -07:00
Linus Torvalds
07be1337b9 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This has our merge window series of cleanups and fixes.  These target
  a wide range of issues, but do include some important fixes for
  qgroups, O_DIRECT, and fsync handling.  Jeff Mahoney moved around a
  few definitions to make them easier for userland to consume.

  Also whiteout support is included now that issues with overlayfs have
  been cleared up.

  I have one more fix pending for page faults during btrfs_copy_from_user,
  but I wanted to get this bulk out the door first"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (90 commits)
  btrfs: fix memory leak during RAID 5/6 device replacement
  Btrfs: add semaphore to synchronize direct IO writes with fsync
  Btrfs: fix race between block group relocation and nocow writes
  Btrfs: fix race between fsync and direct IO writes for prealloc extents
  Btrfs: fix number of transaction units for renames with whiteout
  Btrfs: pin logs earlier when doing a rename exchange operation
  Btrfs: unpin logs if rename exchange operation fails
  Btrfs: fix inode leak on failure to setup whiteout inode in rename
  btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT
  Btrfs: pin log earlier when renaming
  Btrfs: unpin log if rename operation fails
  Btrfs: don't do unnecessary delalloc flushes when relocating
  Btrfs: don't wait for unrelated IO to finish before relocation
  Btrfs: fix empty symlink after creating symlink and fsync parent dir
  Btrfs: fix for incorrect directory entries after fsync log replay
  btrfs: build fixup for qgroup_account_snapshot
  btrfs: qgroup: Fix qgroup accounting when creating snapshot
  Btrfs: fix fspath error deallocation
  btrfs: make find_workspace warn if there are no workspaces
  btrfs: make find_workspace always succeed
  ...
2016-05-21 10:49:22 -07:00
Linus Torvalds
5469dc270c Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - the rest of MM

 - KASAN updates

 - procfs updates

 - exit, fork updates

 - printk updates

 - lib/ updates

 - radix-tree testsuite updates

 - checkpatch updates

 - kprobes updates

 - a few other misc bits

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
  samples/kprobes: print out the symbol name for the hooks
  samples/kprobes: add a new module parameter
  kprobes: add the "tls" argument for j_do_fork
  init/main.c: simplify initcall_blacklisted()
  fs/efs/super.c: fix return value
  checkpatch: improve --git <commit-count> shortcut
  checkpatch: reduce number of `git log` calls with --git
  checkpatch: add support to check already applied git commits
  checkpatch: add --list-types to show message types to show or ignore
  checkpatch: advertise the --fix and --fix-inplace options more
  checkpatch: whine about ACCESS_ONCE
  checkpatch: add test for keywords not starting on tabstops
  checkpatch: improve CONSTANT_COMPARISON test for structure members
  checkpatch: add PREFER_IS_ENABLED test
  lib/GCD.c: use binary GCD algorithm instead of Euclidean
  radix-tree: free up the bottom bit of exceptional entries for reuse
  dax: move RADIX_DAX_ definitions to dax.c
  radix-tree: make radix_tree_descend() more useful
  radix-tree: introduce radix_tree_replace_clear_tags()
  radix-tree: tidy up __radix_tree_create()
  ...
2016-05-20 22:31:33 -07:00
Dan Williams
acc93d30d7 Revert "block: enable dax for raw block devices"
This reverts commit 5a023cdba5.

The functionality is superseded by the new "Device DAX" facility.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-05-20 22:02:56 -07:00
Linus Torvalds
3aa2fc1667 driver core update for 4.7-rc1
Here's the "big" driver core update for 4.7-rc1.
 
 Mostly just debugfs changes, the long-known and messy races with removing
 debugfs files should be fixed thanks to the great work of Nicolai Stange.  We
 also have some isa updates in here (the x86 maintainers told me to take it
 through this tree), a new warning when we run out of dynamic char major
 numbers, and a few other assorted changes, details in the shortlog.
 
 All have been in linux-next for some time with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlc/0mwACgkQMUfUDdst+ynjXACgjNxR5nMUiM8ZuuD0i4Xj7VXd
 hnIAoM08+XDCv41noGdAcKv+2WZVZWMC
 =i+0H
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here's the "big" driver core update for 4.7-rc1.

  Mostly just debugfs changes, the long-known and messy races with
  removing debugfs files should be fixed thanks to the great work of
  Nicolai Stange.  We also have some isa updates in here (the x86
  maintainers told me to take it through this tree), a new warning when
  we run out of dynamic char major numbers, and a few other assorted
  changes, details in the shortlog.

  All have been in linux-next for some time with no reported issues"

* tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
  Revert "base: dd: don't remove driver_data in -EPROBE_DEFER case"
  gpio: ws16c48: Utilize the ISA bus driver
  gpio: 104-idio-16: Utilize the ISA bus driver
  gpio: 104-idi-48: Utilize the ISA bus driver
  gpio: 104-dio-48e: Utilize the ISA bus driver
  watchdog: ebc-c384_wdt: Utilize the ISA bus driver
  iio: stx104: Utilize the module_isa_driver and max_num_isa_dev macros
  iio: stx104: Add X86 dependency to STX104 Kconfig option
  Documentation: Add ISA bus driver documentation
  isa: Implement the max_num_isa_dev macro
  isa: Implement the module_isa_driver macro
  pnp: pnpbios: Add explicit X86_32 dependency to PNPBIOS
  isa: Decouple X86_32 dependency from the ISA Kconfig option
  driver-core: use 'dev' argument in dev_dbg_ratelimited stub
  base: dd: don't remove driver_data in -EPROBE_DEFER case
  kernfs: Move faulting copy_user operations outside of the mutex
  devcoredump: add scatterlist support
  debugfs: unproxify files created through debugfs_create_u32_array()
  debugfs: unproxify files created through debugfs_create_blob()
  debugfs: unproxify files created through debugfs_create_bool()
  ...
2016-05-20 21:26:15 -07:00
Linus Torvalds
087afe8aaf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes and more updates from David Miller:

 1) Tunneling fixes from Tom Herbert and Alexander Duyck.

 2) AF_UNIX updates some struct sock bit fields with the socket lock,
    whereas setsockopt() sets overlapping ones with locking.  Seperate
    out the synchronized vs.  the AF_UNIX unsynchronized ones to avoid
    corruption.  From Andrey Ryabinin.

 3) Mount BPF filesystem with mount_nodev rather than mount_ns, from
    Eric Biederman.

 4) A couple kmemdup conversions, from Muhammad Falak R Wani.

 5) BPF verifier fixes from Alexei Starovoitov.

 6) Don't let tunneled UDP packets get stuck in socket queues, if
    something goes wrong during the encapsulation just drop the packet
    rather than signalling an error up the call stack.  From Hannes
    Frederic Sowa.

 7) SKB ref after free in batman-adv, from Florian Westphal.

 8) TCP iSCSI, ocfs2, rds, and tipc have to disable BH in it's TCP
    callbacks since the TCP stack runs pre-emptibly now.  From Eric
    Dumazet.

 9) Fix crash in fixed_phy_add, from Rabin Vincent.

10) Fix length checks in xen-netback, from Paul Durrant.

11) Fix mixup in KEY vs KEYID macsec attributes, from Sabrina Dubroca.

12) RDS connection spamming bug fixes from Sowmini Varadhan

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits)
  net: suppress warnings on dev_alloc_skb
  uapi glibc compat: fix compilation when !__USE_MISC in glibc
  udp: prevent skbs lingering in tunnel socket queues
  bpf: teach verifier to recognize imm += ptr pattern
  bpf: support decreasing order in direct packet access
  net: usb: ch9200: use kmemdup
  ps3_gelic: use kmemdup
  net:liquidio: use kmemdup
  bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  net: cdc_ncm: update datagram size after changing mtu
  tuntap: correctly wake up process during uninit
  intel: Add support for IPv6 IP-in-IP offload
  ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled with FOU/GUE
  RDS: TCP: Avoid rds connection churn from rogue SYNs
  RDS: TCP: rds_tcp_accept_worker() must exit gracefully when terminating rds-tcp
  net: sock: move ->sk_shutdown out of bitfields.
  ipv6: Don't reset inner headers in ip6_tnl_xmit
  ip4ip6: Support for GSO/GRO
  ip6ip6: Support for GSO/GRO
  ipv6: Set features for IPv6 tunnels
  ...
2016-05-20 20:01:26 -07:00
Linus Torvalds
b99a9e8776 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Two small cifs fixes, including one spnego upcall cifs security fix
  for stable"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Remove some obsolete comments
  cifs: Create dedicated keyring for spnego operations
2016-05-20 19:16:12 -07:00