Commit Graph

45009 Commits

Author SHA1 Message Date
Linus Torvalds
3fc9d69093 Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "This branch also contains core changes.  I've come to the conclusion
  that from 4.9 and forward, I'll be doing just a single branch.  We
  often have dependencies between core and drivers, and it's hard to
  always split them up appropriately without pulling core into drivers
  when that happens.

  That said, this contains:

   - separate secure erase type for the core block layer, from
     Christoph.

   - set of discard fixes, from Christoph.

   - bio shrinking fixes from Christoph, as a followup up to the
     op/flags change in the core branch.

   - map and append request fixes from Christoph.

   - NVMeF (NVMe over Fabrics) code from Christoph.  This is pretty
     exciting!

   - nvme-loop fixes from Arnd.

   - removal of ->driverfs_dev from Dan, after providing a
     device_add_disk() helper.

   - bcache fixes from Bhaktipriya and Yijing.

   - cdrom subchannel read fix from Vchannaiah.

   - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

   - set of drbd updates and fixes from Fabian, Lars, and Philipp.

   - mg_disk error path fix from Bart.

   - user notification for failed device add for loop, from Minfei.

   - NVMe in general:
        + NVMe delay quirk from Guilherme.
        + SR-IOV support and command retry limits from Keith.
        + fix for memory-less NUMA node from Masayoshi.
        + use UINT_MAX for discard sectors, from Minfei.
        + cancel IO fixes from Ming.
        + don't allocate unused major, from Neil.
        + error code fixup from Dan.
        + use constants for PSDT/FUSE from James.
        + variable init fix from Jay.
        + fabrics fixes from Ming, Sagi, and Wei.
        + various fixes"

* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
  nvme/pci: Provide SR-IOV support
  nvme: initialize variable before logical OR'ing it
  block: unexport various bio mapping helpers
  scsi/osd: open code blk_make_request
  target: stop using blk_make_request
  block: simplify and export blk_rq_append_bio
  block: ensure bios return from blk_get_request are properly initialized
  virtio_blk: use blk_rq_map_kern
  memstick: don't allow REQ_TYPE_BLOCK_PC requests
  block: shrink bio size again
  block: simplify and cleanup bvec pool handling
  block: get rid of bio_rw and READA
  block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
  NVMe: don't allocate unused nvme_major
  nvme: avoid crashes when node 0 is memoryless node.
  nvme: Limit command retries
  loop: Make user notify for adding loop device failed
  nvme-loop: fix nvme-loop Kconfig dependencies
  nvmet: fix return value check in nvmet_subsys_alloc()
  ...
2016-07-26 15:37:51 -07:00
Linus Torvalds
d05d7f4079 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

   - the big change is the cleanup from Mike Christie, cleaning up our
     uses of command types and modified flags.  This is what will throw
     some merge conflicts

   - regression fix for the above for btrfs, from Vincent

   - following up to the above, better packing of struct request from
     Christoph

   - a 2038 fix for blktrace from Arnd

   - a few trivial/spelling fixes from Bart Van Assche

   - a front merge check fix from Damien, which could cause issues on
     SMR drives

   - Atari partition fix from Gabriel

   - convert cfq to highres timers, since jiffies isn't granular enough
     for some devices these days.  From Jan and Jeff

   - CFQ priority boost fix idle classes, from me

   - cleanup series from Ming, improving our bio/bvec iteration

   - a direct issue fix for blk-mq from Omar

   - fix for plug merging not involving the IO scheduler, like we do for
     other types of merges.  From Tahsin

   - expose DAX type internally and through sysfs.  From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
  block: Fix front merge check
  block: do not merge requests without consulting with io scheduler
  block: Fix spelling in a source code comment
  block: expose QUEUE_FLAG_DAX in sysfs
  block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
  Btrfs: fix comparison in __btrfs_map_block()
  block: atari: Return early for unsupported sector size
  Doc: block: Fix a typo in queue-sysfs.txt
  cfq-iosched: Charge at least 1 jiffie instead of 1 ns
  cfq-iosched: Fix regression in bonnie++ rewrite performance
  cfq-iosched: Convert slice_resid from u64 to s64
  block: Convert fifo_time from ulong to u64
  blktrace: avoid using timespec
  block/blk-cgroup.c: Declare local symbols static
  block/bio-integrity.c: Add #include "blk.h"
  block/partition-generic.c: Remove a set-but-not-used variable
  block: bio: kill BIO_MAX_SIZE
  cfq-iosched: temporarily boost queue priority for idle classes
  block: drbd: avoid to use BIO_MAX_SIZE
  block: bio: remove BIO_MAX_SECTORS
  ...
2016-07-26 15:03:07 -07:00
Linus Torvalds
55392c4c06 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "This update provides the following changes:

   - The rework of the timer wheel which addresses the shortcomings of
     the current wheel (cascading, slow search for next expiring timer,
     etc).  That's the first major change of the wheel in almost 20
     years since Finn implemted it.

   - A large overhaul of the clocksource drivers init functions to
     consolidate the Device Tree initialization

   - Some more Y2038 updates

   - A capability fix for timerfd

   - Yet another clock chip driver

   - The usual pile of updates, comment improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
  tick/nohz: Optimize nohz idle enter
  clockevents: Make clockevents_subsys static
  clocksource/drivers/time-armada-370-xp: Fix return value check
  timers: Implement optimization for same expiry time in mod_timer()
  timers: Split out index calculation
  timers: Only wake softirq if necessary
  timers: Forward the wheel clock whenever possible
  timers/nohz: Remove pointless tick_nohz_kick_tick() function
  timers: Optimize collect_expired_timers() for NOHZ
  timers: Move __run_timers() function
  timers: Remove set_timer_slack() leftovers
  timers: Switch to a non-cascading wheel
  timers: Reduce the CPU index space to 256k
  timers: Give a few structs and members proper names
  hlist: Add hlist_is_singular_node() helper
  signals: Use hrtimer for sigtimedwait()
  timers: Remove the deprecated mod_timer_pinned() API
  timers, net/ipv4/inet: Initialize connection request timers as pinned
  timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
  timers, drivers/tty/metag_da: Initialize the poll timer as pinned
  ...
2016-07-25 20:43:12 -07:00
Linus Torvalds
0f657262d5 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "Various x86 low level modifications:

   - preparatory work to support virtually mapped kernel stacks (Andy
     Lutomirski)

   - support for 64-bit __get_user() on 32-bit kernels (Benjamin
     LaHaise)

   - (involved) workaround for Knights Landing CPU erratum (Dave Hansen)

   - MPX enhancements (Dave Hansen)

   - mremap() extension to allow remapping of the special VDSO vma, for
     purposes of user level context save/restore (Dmitry Safonov)

   - hweight and entry code cleanups (Borislav Petkov)

   - bitops code generation optimizations and cleanups with modern GCC
     (H. Peter Anvin)

   - syscall entry code optimizations (Paolo Bonzini)"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  x86/mm/cpa: Add missing comment in populate_pdg()
  x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
  x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
  x86/smp: Remove unnecessary initialization of thread_info::cpu
  x86/smp: Remove stack_smp_processor_id()
  x86/uaccess: Move thread_info::addr_limit to thread_struct
  x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
  x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
  x86/dumpstack: When OOPSing, rewind the stack before do_exit()
  x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
  x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
  x86/dumpstack: Try harder to get a call trace on stack overflow
  x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
  x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
  x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
  x86/mm: Use pte_none() to test for empty PTE
  x86/mm: Disallow running with 32-bit PTEs to work around erratum
  x86/mm: Ignore A/D bits in pte/pmd/pud_none()
  x86/mm: Move swap offset/type up in PTE to work around erratum
  x86/entry: Inline enter_from_user_mode()
  ...
2016-07-25 15:34:18 -07:00
Linus Torvalds
9d0be76f52 Char/Misc driver patches for 4.8-rc1
Here is the big char/misc driver update for 4.8-rc1.
 
 Not a lot of stuff, but it's all over the place, full details are in the
 shortlog below.  All of these have been in linux-next with no reported
 issues for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iFYEABECABYFAleVPBsPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspEQgAoJOX
 nSWKA7j4JMGy1v+uNIqsgUmUAJsFyS388N+Faa2K4uyp7CYQ6jaAZw==
 =0Ofd
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big char/misc driver update for 4.8-rc1.

  Not a lot of stuff, but it's all over the place, full details are in
  the shortlog.  All of these have been in linux-next with no reported
  issues for a while"

* tag 'char-misc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (49 commits)
  lkdtm: silence warnings about function declarations
  lkdtm: hide unused functions
  intel_th: pci: Add Kaby Lake PCH-H support
  intel_th: Fix a deadlock in modprobing
  dsp56k: prevent a harmless underflow
  chardev: add missing line break in pr_warn
  lkdtm: use struct arrays instead of enums
  lkdtm: move jprobe entry points to start of source
  lkdtm: reorganize module paramaters
  lkdtm: rename globals for clarity
  lkdtm: rename "count" to "crash_count"
  lkdtm: remove intentional off-by-one array access
  lkdtm: split remaining logic bug tests to separate file
  lkdtm: split heap corruption tests to separate file
  lkdtm: split memory permissions tests to separate file
  lkdtm: split usercopy tests to separate file
  lkdtm: drop "alloc_size" parameter
  lkdtm: add usercopy test for blocking kernel text
  extcon: adc-jack: add suspend/resume support
  extcon: add missing of_node_put after calling of_parse_phandle
  ...
2016-07-24 16:26:26 -07:00
Linus Torvalds
b403f23044 We've got ten patches this time, half of which are related to a plethora
of nasty outcomes when inodes are transitioned from the unlinked state
 to the free state. Small file systems are particularly vulnerable to these
 problems, and it can manifest as mainly hangs, but also file system
 corruption. The patches have been tested for literally many weeks, with a
 very gruelling test, so I have a high level of confidence.
 
 - Andreas Gruenbacher wrote a series of 5 patches for various lockups
   during the transition of inodes from unlinked to free. The main patch
   is titled "Fix gfs2_lookup_by_inum lock inversion" and the other 4 are
   support and cleanup patches related to that.
 - Ben Marzinski contributed 2 patches with regard to a recreatable
   problem when gfs2 tries to write a page to a file that is being
   truncated, resulting in a BUG() in gfs2_remove_from_journal.
   Note that Ben had to export vfs function __block_write_full_page to get
   this to work properly. It's been posted a long time and he talked to
   various VFS people about it, and nobody seemed to mind.
 - I contributed 3 patches. (1) The first one fixes a memory corruptor:
   a race in which one process can overwrite the gl_object pointer set by
   another process, causing kernel panic and other symptoms. (2) The second
   patch fixes another race that resulted in a false-positive BUG_ON. This
   occurred when resource group reservations were freed by one process
   while another process was trying to grab a new reservation in the same
   resource group. (3) The third patch fixes a problem with doing journal
   replay when the journals are not all the same size.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXklXIAAoJENeLYdPf93o7AbIIAImLEixK+4CaItEArAKG9TXv
 WbO+eDJfo6AOtAteB6+MdX2UxXAHJsCY6RmiEIAi5LzlVFiiCgRo4z/QgDARAw3c
 2RxlndElaESh82S27sLiFbgZeY7GZv04C0t6AzMkc830BLXiKMs6bXfeq1fzW8Sf
 AgAneACVsX0faRWo/XDuQcK81dwZ+qdOnR2+FvtOSFl1KgV0BrtnsW7IHv+5MIot
 SREDN7VvSQwQrLgwMlC0PvhwK3KCVvuO9ZziLEPpYJONESJfEmuCpG265+tUJNTw
 dIcW3p/vvgow8fb56fSnAxaeplPLlF9qJCq1M9fWZrKVbHg2uyCZMx4P52Fnmz4=
 =uUVs
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Bob Peterson:
 "We've got ten patches this time, half of which are related to a
  plethora of nasty outcomes when inodes are transitioned from the
  unlinked state to the free state.  Small file systems are particularly
  vulnerable to these problems, and it can manifest as mainly hangs, but
  also file system corruption.  The patches have been tested for
  literally many weeks, with a very gruelling test, so I have a high
  level of confidence.

   - Andreas Gruenbacher wrote a series of five patches for various
     lockups during the transition of inodes from unlinked to free.

     The main patch is titled "Fix gfs2_lookup_by_inum lock inversion"
     and the other four are support and cleanup patches related to that.

   - Ben Marzinski contributed two patches with regard to a recreatable
     problem when gfs2 tries to write a page to a file that is being
     truncated, resulting in a BUG() in gfs2_remove_from_journal.

     Note that Ben had to export vfs function __block_write_full_page to
     get this to work properly.  It's been posted a long time and he
     talked to various VFS people about it, and nobody seemed to mind.

   - I contributed 3 patches:
       o The first one fixes a memory corruptor: a race in which one
         process can overwrite the gl_object pointer set by another
         process, causing kernel panic and other symptoms.
       o The second patch fixes another race that resulted in a
         false-positive BUG_ON.  This occurred when resource group
         reservations were freed by one process while another process
         was trying to grab a new reservation in the same resource
         group.
       o The third patch fixes a problem with doing journal replay when
         the journals are not all the same size"

* tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes
  GFS2: Check rs_free with rd_rsspin protection
  gfs2: writeout truncated pages
  fs: export __block_write_full_page
  gfs2: Lock holder cleanup
  gfs2: Large-filesystem fix for 32-bit systems
  gfs2: Get rid of gfs2_ilookup
  gfs2: Fix gfs2_lookup_by_inum lock inversion
  gfs2: Initialize iopen glock holder for new inodes
  GFS2: don't set rgrp gl_object until it's inserted into rgrp tree
2016-07-24 16:07:52 -07:00
Linus Torvalds
88083e9845 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "This contains a fix for a potential crash/corruption issue and another
  where the suid/sgid bits weren't cleared on write"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: verify upper dentry in ovl_remove_and_whiteout()
  ovl: Copy up underlying inode's ->i_mode to overlay inode
  ovl: handle ATTR_KILL*
2016-07-23 14:25:02 +09:00
Maxim Patlasov
cfc9fde0b0 ovl: verify upper dentry in ovl_remove_and_whiteout()
The upper dentry may become stale before we call ovl_lock_rename_workdir.
For example, someone could (mistakenly or maliciously) manually unlink(2)
it directly from upperdir.

To ensure it is not stale, let's lookup it after ovl_lock_rename_workdir
and and check if it matches the upper dentry.

Essentially, it is the same problem and similar solution as in
commit 11f3710417 ("ovl: verify upper dentry before unlink and rename").

Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-07-22 10:54:20 +02:00
Bob Peterson
e1cb6be9e1 GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes
Before this patch, if you used gfs2_jadd to add new journals of a
size smaller than the existing journals, replaying those new journals
would withdraw. That's because function gfs2_replay_incr_blk was
using the number of journal blocks (jd_block) from the superblock's
journal pointer. In other words, "My journal's max size" rather than
"the journal we're replaying's size." This patch changes the function
to use the size of the pertinent journal rather than always using the
journal we happen to be using.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-07-21 13:02:44 -05:00
Toshi Kani
163d4baaeb block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
Currently, presence of direct_access() in block_device_operations
indicates support of DAX on its block device.  Because
block_device_operations is instantiated with 'const', this DAX
capablity may not be enabled conditinally.

In preparation for supporting DAX to device-mapper devices, add
QUEUE_FLAG_DAX to request_queue flags to advertise their DAX
support.  This will allow to set the DAX capability based on how
mapped device is composed.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <linux-s390@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 21:01:01 -06:00
Christoph Hellwig
70246286e9 block: get rid of bio_rw and READA
These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces.  For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense.  Any check for READA is replaced with an
explicit check for REQ_RAHEAD.  Also remove the READA alias for
REQ_RAHEAD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:37:01 -06:00
Vincent Stehlé
df5c82a8dc Btrfs: fix comparison in __btrfs_map_block()
Add missing comparison to op in expression, which was forgotten when doing
the REQ_OP transition.

Fixes: b3d3fa5199 ("btrfs: update __btrfs_map_block for REQ_OP transition")
Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-18 15:28:23 -06:00
Jann Horn
3e0a396546 xfs: fix type confusion in xfs_ioc_swapext
Without this check, the following XFS_I invocations would return bad
pointers when used on non-XFS inodes (perhaps pointers into preceding
allocator chunks).

This could be used by an attacker to trick xfs_swap_extents into
performing locking operations on attacker-chosen structures in kernel
memory, potentially leading to code execution in the kernel.  (I have
not investigated how likely this is to be usable for an attack in
practice.)

Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-16 06:30:06 +09:00
H.J. Lu
3ebfd81f7f x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
Don't use the same syscall numbers for 2 different syscalls:

 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
 534	x32	preadv2			compat_sys_preadv2
 535	x32	pwritev2		compat_sys_pwritev2

Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset
is passed in one 64-bit register on x32, similar to compat_sys_preadv64()
and compat_sys_pwritev64().

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:30:26 +02:00
Fengguang Wu
077e2642fb chardev: add missing line break in pr_warn
To fix super long dmesg error lines like

  CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation rangeCHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation rangeswapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)

After fix, it should look like

  CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation range
  CHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation range
  swapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)

Reported-by: Philip Li <philip.li@intel.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-14 16:21:53 +09:00
Bob Peterson
44f52122a2 GFS2: Check rs_free with rd_rsspin protection
For the last process to close a file opened for write, function
gfs2_rsqa_delete was deleting the file's inode's block reservation
out of the rgrp reservations tree. Then it was checking to make sure
rs_free was 0, but it was performing the check outside the protection
of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed
to prevent a race between the process freeing the reservation and
another who is allocating a new set of blocks inside the same rgrp
for the same inode, thus changing its value.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-07-12 11:48:22 -05:00
Linus Torvalds
08d27eb206 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  posix_acl: de-union a_refcount and a_rcu
  nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed
  Use the right predicate in ->atomic_open() instances
2016-07-12 16:49:01 +09:00
Linus Torvalds
b987c759d2 eCryptfs fixes for 4.7-rc7:
- Provide a more concise fix for CVE-2016-1583
   + Additionally fixes linux-stable regressions caused by the cherry-picking of
     the original fix
 - Some very minor changes that have queued up
   + Fix typos in code comments
   + Remove unnecessary check for NULL before destroying kmem_cache
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJXf8nnAAoJENaSAD2qAscKwXgP/0awhY1z40dL/igP6fPv2ack
 HbqrOjUVO2DzxinvKB3vRLNy93zwESxe8UpwPsl84IJ85zOQjkUkJ8PYk1oyBf0N
 dVWqO11g6AKNZ+VQFspconvMhZATwSrsv8z3BzvwNGLsPhPuUQ+JmbBe8xMdrsZ5
 qVaWswsMtMlhM3p/zFh57vWO64fT1xiabpxSkKpG2LHJN6h6QAQxkfBfa2FuXCsN
 hZIw+ULcUJfdawXGq8lAfcYzbDmFpNt70fFquJgfJHrXFrOuensYfLcWUvhrSNbc
 HZ6imRK9LCG4IKjJTBNmCmBR8ho71yGzdKuup81Eap+2zx2kC7twokS1d5fha8iL
 Kzkx0NMDriY2N+tIfufHYk2IIenFzWG6Yuj0STswtJX4YhQGBc0H6VxcgrxE0PgW
 k1iKUV7jnJGxxN+d6lmV4+fX0vKGgBMsQq1Q76CkYLN1BAvdwz6GnWSfqP8hWz3o
 sNVyNtYh+/TXY8JMWKDBlps7Ib8W88qDW3K7YcAf2VPYAqIWm5Va1MR5m5s+UIeR
 QiCD32X/0PfDp13QRiKAHJ6C9CInyu0r+fF/g8ZMqLuWgLxoahxpr6ML/CnHoGl5
 IcDydJO3/bLBq9If8WxYsOQvVKCa4e7N7o7ZHPKd8U7O39mCGNfbQx7/FlMjtvf6
 +4HAxamUC1ogpLTkpWxI
 =Bt4P
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
 "Provide a more concise fix for CVE-2016-1583:
   - Additionally fixes linux-stable regressions caused by the
     cherry-picking of the original fix

  Some very minor changes that have queued up:
   - Fix typos in code comments
   - Remove unnecessary check for NULL before destroying kmem_cache"

* tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: don't allow mmap when the lower fs doesn't support it
  Revert "ecryptfs: forbid opening files without mmap handler"
  ecryptfs: fix spelling mistakes
  eCryptfs: fix typos in comment
  ecryptfs: drop null test before destroy functions
2016-07-08 09:48:28 -07:00
Jeff Mahoney
f0fe970df3 ecryptfs: don't allow mmap when the lower fs doesn't support it
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs.  We shouldn't emulate mmap support on file systems
that don't offer support natively.

CVE-2016-1583

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: stable@vger.kernel.org
[tyhicks: clean up f_op check by using ecryptfs_file_to_lower()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2016-07-08 10:35:28 -05:00
Jeff Mahoney
78c4e17241 Revert "ecryptfs: forbid opening files without mmap handler"
This reverts commit 2f36db7100.

It fixed a local root exploit but also introduced a dependency on
the lower file system implementing an mmap operation just to open a file,
which is a bit of a heavy hammer.  The right fix is to have mmap depend
on the existence of the mmap handler instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2016-07-07 18:47:57 -05:00
Linus Torvalds
ac904ae6e6 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block IO fixes from Jens Axboe:
 "Three small fixes that have been queued up and tested for this series:

   - A bug fix for xen-blkfront from Bob Liu, fixing an issue with
     incomplete requests during migration.

   - A fix for an ancient issue in retrieving the IO priority of a
     different PID than self, preventing that task from going away while
     we access it.  From Omar.

   - A writeback fix from Tahsin, fixing a case where we'd call ihold()
     with a zero ref count inode"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix use-after-free in sys_ioprio_get()
  writeback: inode cgroup wb switch should not call ihold()
  xen-blkfront: save uncompleted reqs in blkfront_resume()
2016-07-07 15:34:09 -07:00
Linus Torvalds
4c2a8499a4 Configfs fixes for Linux 4.7:
- a fix from Marek for ppos handling in configfs_write_bin_file,
    which was introduced in Linux 4.5, but didn't have any users
    until recently.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXfjK1AAoJEA+eU2VSBFGDJ0kQAMZHONK7pHwM+IxDUeTsDTfa
 FX+EplF1rLEtmUGOl01XbjgQp7acsP19YWikQfC09+ZjF6Vn1zFAFlNoU3qM+i/2
 zdukzIRBaSM+4w4HFDQ548zGGc8e9mesIIUHrHt6n/nL0OLKTU0XzbmRMXmvXAUJ
 u0nuB0OYlFJEVQFIlDYfG6E2rJy37FPilToonfw+AryVDenRm9iiUt0iMFoA8wPG
 EpogUinelxrKZ+ysOEeibaTGxxLLbd3AbWeUQbkhmsk4FfxuV7GFSfGPbhmJ1LeU
 n5X3LbK8lixG8goGdAW1NYulLnTjprZ6emUL0jwxYmI+MzOP7DUcsLqsGmdh5LEa
 Uw5gzQnzNDOUFR8XFt5CgARUHRXeDyJfzKeWvKFd4JUink6wE9R0yQJ2bPBXu3pY
 7t0P4qQSKWdeGjmYg54/JBamMba7BLQOLZuuiAplTTAt5Dg4tEi9Zuje2sUmBcDn
 3YnG4dnGxPeP9EKCElh1WWtwiRItUKT+YtaqSinL1Rh2j1aWu9WJQ3M1C+s3hFQ3
 vGR/CchllLtP4xmpY9TXEpUbBp4ZnTercWAxLRczi1/kOm3SdMIMUak26U6pTBmN
 QfkEzSx+K5F1wGaoqa2j7MuTMSNsxfT1R0xA6aBid4oOyCoPyHOVJ3mqV0AziyOl
 c7B92HgR/aGFMEevVA1G
 =mZ2H
 -----END PGP SIGNATURE-----

Merge tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs

Pull configfs fix from Christoph Hellwig:
 "A fix from Marek for ppos handling in configfs_write_bin_file, which
  was introduced in Linux 4.5, but didn't have any users until recently"

* tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs:
  configfs: Remove ppos increment in configfs_write_bin_file
2016-07-07 15:32:17 -07:00
Ingo Molnar
4b4b20852d Merge branch 'timers/fast-wheel' into timers/core 2016-07-07 10:35:28 +02:00
Al Viro
c94c09535c nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-05 16:02:31 -04:00
Al Viro
00699ad857 Use the right predicate in ->atomic_open() instances
->atomic_open() can be given an in-lookup dentry *or* a negative one
found in dcache.  Use d_in_lookup() to tell one from another, rather
than d_unhashed().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-05 16:02:23 -04:00
Vivek Goyal
07a2daab49 ovl: Copy up underlying inode's ->i_mode to overlay inode
Right now when a new overlay inode is created, we initialize overlay
inode's ->i_mode from underlying inode ->i_mode but we retain only
file type bits (S_IFMT) and discard permission bits.

This patch changes it and retains permission bits too. This should allow
overlay to do permission checks on overlay inode itself in task context.

[SzM] It also fixes clearing suid/sgid bits on write.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
2016-07-04 16:49:48 +02:00
Miklos Szeredi
b99c2d9138 ovl: handle ATTR_KILL*
Before 4bacc9c923 ("overlayfs: Make f_path...") file->f_path pointed to
the underlying file, hence suid/sgid removal on write worked fine.

After that patch file->f_path pointed to the overlay file, and the file
mode bits weren't copied to overlay_inode->i_mode.  So the suid/sgid
removal simply stopped working.

The fix is to copy the mode bits, but then ovl_setattr() needs to clear
ATTR_MODE to avoid the BUG() in notify_change().  So do this first, then in
the next patch copy the mode.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
2016-07-04 16:49:48 +02:00
Linus Torvalds
0b295dd5b8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
 "This makes sure userspace filesystems are not broken by the parallel
  lookups and readdir feature"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: serialize dirops by default
2016-07-03 12:02:00 -07:00
Linus Torvalds
236bfd8ed8 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "This contains fixes for a dentry leak, a regression in 4.6 noticed by
  Docker users and missing write access checking in truncate"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: warn instead of error if d_type is not supported
  ovl: get_write_access() in truncate
  ovl: fix dentry leak for default_permissions
2016-07-03 11:57:09 -07:00
Vivek Goyal
e7c0b5991d ovl: warn instead of error if d_type is not supported
overlay needs underlying fs to support d_type. Recently I put in a
patch in to detect this condition and started failing mount if
underlying fs did not support d_type.

But this breaks existing configurations over kernel upgrade. Those who
are running docker (partially broken configuration) with xfs not
supporting d_type, are surprised that after kernel upgrade docker does
not run anymore.

https://github.com/docker/docker/issues/22937#issuecomment-229881315

So instead of erroring out, detect broken configuration and warn
about it. This should allow existing docker setups to continue
working after kernel upgrade.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 45aebeaf4f ("ovl: Ensure upper filesystem supports d_type")
Cc: <stable@vger.kernel.org> 4.6
2016-07-03 09:39:31 +02:00
Linus Torvalds
48c4565ed6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Tmpfs readdir throughput regression fix (this cycle) + some -stable
  fodder all over the place.

  One missing bit is Miklos' tonight locks.c fix - NFS folks had already
  grabbed that one by the time I woke up ;-)"

[ The locks.c fix came through the nfsd tree just moments ago ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  namespace: update event counter when umounting a deleted dentry
  9p: use file_dentry()
  ceph: fix d_obtain_alias() misuses
  lockless next_positive()
  libfs.c: new helper - next_positive()
  dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
2016-07-01 15:20:11 -07:00
Linus Torvalds
2728c57fda One fix for lockd soft lookups in an error path, and one fix for file
leases on overlayfs.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXdr1UAAoJECebzXlCjuG+QlsQAJiZTmio6k9tupN5+iKsZNL3
 m919ooj8GYsXxlC0OTFfi09dUi1yeF8MEE3j9egk/+qxEtsdmKdEOIy0RVdcLSfd
 HeGjXgLh79hVcGxgyBP+pdax2XhZ3RVisg8F5gTw2GPo+FPFZfrEuO5h7ctn+t45
 MCQ+4yqYqzEhYnoPyo5XKh5Aj6wBWiaTzg3/jSe6uSuSfuBfyaMaBPq7l7ayGra/
 5El+tu61o/SrJ41N2EayWSj/bOFJE92LIuGOh8NdfANuuP70JhxlwgVSldah3CCQ
 6PymXAVcjw0+gJ00mokzKfTJW5FPfasxckMHaOvcFsSONy4rlmrwwqUr9C2AFzTE
 wQGIzibCDYOSI8uF+//Oe+dh8JWp2TF8rfJcmKLyJMIcCq/Xl6cNx1qrq/oSWvuk
 WKOv1otJrPeT31h/s5iLjr/E/Po1eX+d2ySJdvUHYu/5aZwFgWPnVXwJk0s9bLow
 auZU85tPnuz+tbS2pWEK+el7LMgDBdzraVRogMdH1c+m3+G9pr53EzmpYovkZ2X8
 duVJ2Leslyya347TnJAgEY47Fbeu26JaoeIChGVhKcEyCENlcqAJWaGVECrxvs3y
 p/Y2MYMkO8YrCz5wQXPiLFiG4rAc+jSIn1Q+vRGl2Pkel0y7AgJNNMFtANjMCSIO
 pg6BqUjOyKt8cXVy4UW7
 =K0mr
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux

Pull lockd/locks fixes from Bruce Fields:
 "One fix for lockd soft lookups in an error path, and one fix for file
  leases on overlayfs"

* tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux:
  locks: use file_inode()
  lockd: unregister notifier blocks if the service fails to come up completely
2016-07-01 15:18:49 -07:00
Linus Torvalds
f3683ccd12 Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "1/ Two regression fixes since v4.6: one for the byte order of a sysfs
     attribute (bz121161) and another for QEMU 2.6's NVDIMM _DSM (ACPI
     Device Specific Method) implementation that gets tripped up by new
     auto-probing behavior in the NFIT driver.

  2/ A fix tagged for -stable that stops the kernel from
     clobbering/ignoring changes to the configuration of a 'pfn'
     instance ("struct page" driver).  For example changing the
     alignment from 2M to 1G may silently revert to 2M if that value is
     currently stored on media.

  3/ A fix from Eric for an xfstests failure in dax.  It is not
     currently tagged for -stable since it requires an 8-exabyte file
     system to trigger, and there appear to be no user visible side
     effects"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nfit: fix format interface code byte order
  dax: fix offset overflow in dax_io
  acpi, nfit: fix acpi_check_dsm() vs zero functions implemented
  libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment
2016-07-01 15:15:03 -07:00
Miklos Szeredi
6343a21208 locks: use file_inode()
(Another one for the f_path debacle.)

ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.

The reason is that generic_add_lease() used filp->f_path.dentry->inode
while all the others use file_inode().  This makes a difference for files
opened on overlayfs since the former will point to the overlay inode the
latter to the underlying inode.

So generic_add_lease() added the lease to the overlay inode and
generic_delete_lease() removed it from the underlying inode.  When the file
was released the lease remained on the overlay inode's lock list, resulting
in use after free.

Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-01 10:24:18 -04:00
Andrey Ulanov
e06b933e6d namespace: update event counter when umounting a deleted dentry
- m_start() in fs/namespace.c expects that ns->event is incremented each
  time a mount added or removed from ns->list.
- umount_tree() removes items from the list but does not increment event
  counter, expecting that it's done before the function is called.
- There are some codepaths that call umount_tree() without updating
  "event" counter. e.g. from __detach_mounts().
- When this happens m_start may reuse a cached mount structure that no
  longer belongs to ns->list (i.e. use after free which usually leads
  to infinite loop).

This change fixes the above problem by incrementing global event counter
before invoking umount_tree().

Change-Id: I622c8e84dcb9fb63542372c5dbf0178ee86bb589
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Ulanov <andreyu@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-30 23:28:30 -04:00
Miklos Szeredi
b403f0e37a 9p: use file_dentry()
v9fs may be used as lower layer of overlayfs and accessing f_path.dentry
can lead to a crash.  In this case it's a NULL pointer dereference in
p9_fid_create().

Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.

Reported-by: Alessio Igor Bogani <alessioigorbogani@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Alessio Igor Bogani <alessioigorbogani@gmail.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-30 23:28:09 -04:00
Scott Mayhew
cb7d224f82 lockd: unregister notifier blocks if the service fails to come up completely
If the lockd service fails to start up then we need to be sure that the
notifier blocks are not registered, otherwise a subsequent start of the
service could cause the same notifier to be registered twice, leading to
soft lockups.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 0751ddf77b "lockd: Register callbacks on the inetaddr_chain..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-30 16:35:07 -04:00
Tahsin Erdogan
7452495555 writeback: inode cgroup wb switch should not call ihold()
Asynchronous wb switching of inodes takes an additional ref count on an
inode to make sure inode remains valid until switchover is completed.

However, anyone calling ihold() must already have a ref count on inode,
but in this case inode->i_count may already be zero:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30
CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Workqueue: writeback wb_workfn (flush-8:16)
 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000
 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2
 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003
Call Trace:
 [<ffffffff805990af>] dump_stack+0x4d/0x6e
 [<ffffffff80268702>] __warn+0xc2/0xe0
 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20
 [<ffffffff8035b4ab>] ihold+0x2b/0x30
 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180
 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0
 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530
 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0
 [<ffffffff8036a147>] wb_workfn+0xd7/0x280
 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0
 [<ffffffff8027bb09>] process_one_work+0x129/0x300
 [<ffffffff8027be06>] worker_thread+0x126/0x480
 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561
 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300
 [<ffffffff80280ff4>] kthread+0xc4/0xe0
 [<ffffffff80335578>] ? kfree+0xc8/0x100
 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40
 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70
---[ end trace aaefd2fd9f306bc4 ]---

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-30 13:58:41 -06:00
Miklos Szeredi
5c672ab3f0 fuse: serialize dirops by default
Negotiate with userspace filesystems whether they support parallel readdir
and lookup.  Disable parallelism by default for fear of breaking fuse
filesystems.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9902af79c0 ("parallel lookups: actual switch to rwsem")
Fixes: d9b3dbdcfd ("fuse: switch to ->iterate_shared()")
2016-06-30 13:10:49 +02:00
Marek Vasut
f8608985f8 configfs: Remove ppos increment in configfs_write_bin_file
The simple_write_to_buffer() already increments the @ppos on success,
see fs/libfs.c simple_write_to_buffer() comment:

"
On success, the number of bytes written is returned and the offset @ppos
advanced by this number, or negative value is returned on error.
"

If the configfs_write_bin_file() is invoked with @count smaller than the
total length of the written binary file, it will be invoked multiple times.
Since configfs_write_bin_file() increments @ppos on success, after calling
simple_write_to_buffer(), the @ppos is incremented twice.

Subsequent invocation of configfs_write_bin_file() will result in the next
piece of data being written to the offset twice as long as the length of
the previous write, thus creating buffer with "holes" in it.

The simple testcase using DTO follows:
  $ mkdir /sys/kernel/config/device-tree/overlays/1
  $ dd bs=1 if=foo.dtbo of=/sys/kernel/config/device-tree/overlays/1/dtbo
Without this patch, the testcase will result in twice as big buffer in the
kernel, which is then passed to the cfs_overlay_item_dtbo_write() .

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
2016-06-30 11:28:55 +02:00
Linus Torvalds
e7bdea7750 NFS client bugfixes for Linux 4.7
Stable bugfixes:
 - Fix _cancel_empty_pagelist
 - Fix a double page unlock
 - Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
 - Fix another OPEN_DOWNGRADE bug
 
 Other bugfixes:
 - Ensure we handle delegation errors in nfs4_proc_layoutget()
 - Layout stateids start out as being invalid
 - Add sparse lock annotations for pnfs_find_alloc_layout
 - Handle bad delegation stateids in nfs4_layoutget_handle_exception
 - Fix up O_DIRECT results
 - Fix potential use after free of state in nfs4_do_reclaim.
 - Mark the layout stateid invalid when all segments are removed
 - Don't let readdirplus revalidate an inode that was marked as stale
 - Fix potential race in nfs_fhget()
 - Fix an unused variable warning
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXdDWkAAoJENfLVL+wpUDrpsUP/1F2zu12r/Zkv3ZEFhShcpQc
 2N1TRD9X7Lruod2pUD95qqjjdw+/vu3LjcyJljrasRaJENijvZ2GQhKkB7xPODlu
 qxZcmnQsH+WmpmJKcqAByAW1czcNGMoMHnt4tV0gG21NH+XUb92fgn+aGeIJDVrK
 Hcd9d8TfnFWO70ZgTUXW/hv0CXwu4MEJhN2JfF4lolbxUkmjLHHLoxSDDm0AdXGC
 EE8f0V9/7xurvOeLe5bQOQXfZPedBydsLNXa1ZacMGKgmBUoRNxJ5yCpPUtcTVBx
 HwbiY+WDFQ7MdKTzUQqqbnrIqKw8Hu4SugIV/vHRqR+Lhc6u29YGOqdU4d2G8IKW
 Nh8MBqS+dDefCkL3TJoE7MpjhP3EOO6HXnv5FjMZLOuu2X2o+Sz3+DkhYCq6pj/g
 fFh480vZfZYaTsfDf1ttvVN8kIvQ+1Uk3LK6aC2EVwgPrv+0OIRu36F0JQQimxOp
 EbYDlhVk7mzH/ZQ31GmPPbSIk+3sm2V58lqXnUMovoqPFPiN3xZDuBcnlyFrrzaI
 NjOvsdVxkOdHWbYZyBQzj16Vo651EYbAAUEwsud70N8C3aCgkTxCZ30Q0v+KqqxU
 pP5kz3zYUdkXQeHxE6T0iXG9fGcv/nGS21hTfT01YJCK7v67K8TYRNMrOEVURVgk
 LSD/CZJXJHVJn1Vr4F7o
 =IGNO
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Stable bugfixes:
   - Fix _cancel_empty_pagelist
   - Fix a double page unlock
   - Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
   - Fix another OPEN_DOWNGRADE bug

  Other bugfixes:
   - Ensure we handle delegation errors in nfs4_proc_layoutget()
   - Layout stateids start out as being invalid
   - Add sparse lock annotations for pnfs_find_alloc_layout
   - Handle bad delegation stateids in nfs4_layoutget_handle_exception
   - Fix up O_DIRECT results
   - Fix potential use after free of state in nfs4_do_reclaim.
   - Mark the layout stateid invalid when all segments are removed
   - Don't let readdirplus revalidate an inode that was marked as stale
   - Fix potential race in nfs_fhget()
   - Fix an unused variable warning"

* tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix another OPEN_DOWNGRADE bug
  make nfs_atomic_open() call d_drop() on all ->open_context() errors.
  NFS: Fix an unused variable warning
  NFS: Fix potential race in nfs_fhget()
  NFS: Don't let readdirplus revalidate an inode that was marked as stale
  NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed
  NFS: Fix a double page unlock
  pnfs_nfs: fix _cancel_empty_pagelist
  nfs4: Fix potential use after free of state in nfs4_do_reclaim.
  NFS: Fix up O_DIRECT results
  NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exception
  NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout
  NFSv4.1/pnfs: Layout stateids start out as being invalid
  NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()
2016-06-29 15:30:26 -07:00
Miklos Szeredi
03bea60409 ovl: get_write_access() in truncate
When truncating a file we should check write access on the underlying
inode.  And we should do so on the lower file as well (before copy-up) for
consistency.

Original patch and test case by Aihua Zhang.

 - - >o >o - - test.c - - >o >o - -
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int ret;

	ret = truncate(argv[0], 4096);
	if (ret != -1) {
		fprintf(stderr, "truncate(argv[0]) should have failed\n");
		return 1;
	}
	if (errno != ETXTBSY) {
		perror("truncate(argv[0])");
		return 1;
	}

	return 0;
}
 - - >o >o - - >o >o - - >o >o - -

Reported-by: Aihua Zhang <zhangaihua1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-06-29 16:03:55 +02:00
Miklos Szeredi
a4859d7594 ovl: fix dentry leak for default_permissions
When using the 'default_permissions' mount option, ovl_permission() on
non-directories was missing a dput(alias), resulting in "BUG Dentry still
in use".

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8d3095f4ad ("ovl: default permissions")
Cc: <stable@vger.kernel.org> # v4.5+
2016-06-29 08:26:59 +02:00
Trond Myklebust
e547f26283 NFS: Fix another OPEN_DOWNGRADE bug
Olga Kornievskaia reports that the following test fails to trigger
an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.

	fd0 = open(foo, RDRW)   -- should be open on the wire for "both"
	fd1 = open(foo, RDONLY)  -- should be open on the wire for "read"
	close(fd0) -- should trigger an open_downgrade
	read(fd1)
	close(fd1)

The issue is that we're missing a check for whether or not the current
state transitioned from an O_RDWR state as opposed to having transitioned
from a combination of O_RDONLY and O_WRONLY.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: cd9288ffae ("NFSv4: Fix another bug in the close/open_downgrade code")
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-28 16:55:34 -04:00
Eric Sandeen
023954351f dax: fix offset overflow in dax_io
This isn't functionally apparent for some reason, but
when we test io at extreme offsets at the end of the loff_t
rang, such as in fstests xfs/071, the calculation of
"max" in dax_io() can be wrong due to pos + size overflowing.

For example,

# xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file

enters dax_io with:

start 0x7ffffffffffff000
end   0x7ffffffffffff200

and the rounded up "size" variable is 0x1000.  This yields:

pos + size 0x8000000000000000 (overflows loff_t)
       end 0x7ffffffffffff200

Due to the overflow, the min() function picks the wrong
value for the "max" variable, and when we send (max - pos)
into i.e. copy_from_iter_pmem() it is also the wrong value.

This somehow(tm) gets magically absorbed without incident,
probably because iter->count is correct.  But it seems best
to fix it up properly by comparing the two values as
unsigned.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-27 12:18:44 -07:00
Linus Torvalds
fbe601f7a3 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Various small cifs/smb3 fixes, include some for stable, and some from
  the recent SMB3 test event"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  File names with trailing period or space need special case conversion
  Fix reconnect to not defer smb3 session reconnect long after socket reconnect
  cifs: check hash calculating succeeded
  cifs: dynamic allocation of ntlmssp blob
  cifs: use CIFS_MAX_DOMAINNAME_LEN when converting the domain name
  cifs: stuff the fl_owner into "pid" field in the lock request
2016-06-27 11:23:44 -07:00
Benjamin Marzinski
fd4c5748b8 gfs2: writeout truncated pages
When gfs2 attempts to write a page to a file that is being truncated,
and notices that the page is completely outside of the file size, it
tries to invalidate it.  However, this may require a transaction for
journaled data files to revoke any buffers from the page on the active
items list. Unfortunately, this can happen inside a log flush, where a
transaction cannot be started. Also, gfs2 may need to be able to remove
the buffer from the ail1 list before it can finish the log flush.

To deal with this, when writing a page of a file with data journalling
enabled gfs2 now skips the check to see if the write is outside the file
size, and simply writes it anyway. This situation can only occur when
the truncate code still has the file locked exclusively, and hasn't
marked this block as free in the metadata (which happens later in
truc_dealloc).  After gfs2 writes this page out, the truncation code
will shortly invalidate it and write out any revokes if necessary.

To do this, gfs2 now implements its own version of block_write_full_page
without the check, and calls the newly exported __block_write_full_page.
It also no longer calls gfs2_writepage_common from gfs2_jdata_writepage.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 10:03:12 -05:00
Benjamin Marzinski
b4bba38909 fs: export __block_write_full_page
gfs2 needs to be able to skip the check to see if a page is outside of
the file size when writing it out. gfs2 can get into a situation where
it needs to flush its in-memory log to disk while a truncate is in
progress. If the file being trucated has data journaling enabled, it is
possible that there are data blocks in the log that are past the end of
the file. gfs can't finish the log flush without either writing these
blocks out or revoking them. Otherwise, if the node crashed, it could
overwrite subsequent changes made by other nodes in the cluster when
it's journal was replayed.

Unfortunately, there is no way to add log entries to the log during a
flush. So gfs2 simply writes out the page instead. This situation can
only occur when the truncate code still has the file locked exclusively,
and hasn't marked this block as free in the metadata (which happens
later in truc_dealloc).  After gfs2 writes this page out, the truncation
code will shortly invalidate it and write out any revokes if necessary.

In order to make this work, gfs2 needs to be able to skip the check for
writes outside the file size. Since the check exists in
block_write_full_page, this patch exports __block_write_full_page, which
doesn't have the check.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:58:40 -05:00
Andreas Gruenbacher
6df9f9a253 gfs2: Lock holder cleanup
Make the code more readable by cleaning up the different ways of
initializing lock holders and checking for initialized lock holders:
mark lock holders as uninitialized by setting the holder's glock to NULL
(gfs2_holder_mark_uninitialized) instead of zeroing out the entire
object or using a separate flag.  Recognize initialized holders by their
non-NULL glock (gfs2_holder_initialized).  Don't zero out holder objects
which are immeditiately initialized via gfs2_holder_init or
gfs2_glock_nq_init.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:47:09 -05:00
Andreas Gruenbacher
cda9dd4207 gfs2: Large-filesystem fix for 32-bit systems
Commit ff34245d switched from iget5_locked to iget_locked among other
things, but iget_locked doesn't work for filesystems larger than 2^32
blocks on 32-bit systems.  Switch back to iget5_locked.  Filesystems
larger than 2^32 blocks are unrealistic to work well on 32-bit systems,
so this is mostly a code cleanliness fix.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:47:08 -05:00