Commit Graph

59807 Commits

Author SHA1 Message Date
Linus Torvalds
ad28fd1cb2 SPDX fixes for 5.3-rc2
Here are some small SPDX fixes for 5.3-rc2 for things that came in
 during the 5.3-rc1 merge window that we previously missed.
 
 Only 3 small patches here:
 	- 2 uapi patches to resolve some SPDX tags that were not correct
 	- fix an invalid SPDX tag in the iomap Makefile file
 
 All have been properly reviewed on the public mailing lists.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXT2N9w8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylY9wCeJIYfs/eNf3tsjLQXxUBMYAJNqnsAn2IaMiTt
 cv2mck7JZm5KyHpP3f5N
 =RSZa
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx

Pull SPDX fixes from Greg KH:
 "Here are some small SPDX fixes for 5.3-rc2 for things that came in
  during the 5.3-rc1 merge window that we previously missed.

  Only three small patches here:

   - two uapi patches to resolve some SPDX tags that were not correct

   - fix an invalid SPDX tag in the iomap Makefile file

  All have been properly reviewed on the public mailing lists"

* tag 'spdx-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  iomap: fix Invalid License ID
  treewide: remove SPDX "WITH Linux-syscall-note" from kernel-space headers again
  treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers
2019-07-28 10:00:06 -07:00
Linus Torvalds
e24ce84e85 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "Two fixes for the fair scheduling class:

   - Prevent freeing memory which is accessible by concurrent readers

   - Make the RCU annotations for numa groups consistent"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Use RCU accessors consistently for ->numa_group
  sched/fair: Don't free p->numa_faults with concurrent readers
2019-07-27 21:22:33 -07:00
Linus Torvalds
88c5083442 Wimplicit-fallthrough patches for 5.3-rc2
Hi Linus,
 
 Please, pull the following patches that mark switch cases where we are
 expecting to fall through. These patches are part of the ongoing efforts
 to enable -Wimplicit-fallthrough. Most of them have been baking in linux-next
 for a whole development cycle.
 
 Also, pull the Makefile patch that globally enables the
 -Wimplicit-fallthrough option.
 
 Finally, some missing-break fixes that have been tagged for -stable:
 
  - drm/amdkfd: Fix missing break in switch statement
  - drm/amdgpu/gfx10: Fix missing break in switch statement
 
 Notice that with these changes, we completely get rid of all the
 fall-through warnings in the kernel.
 
 Thanks
 
 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl06XP0ACgkQRwW0y0cG
 2zHPXhAAsatJGNIg7vuSicVipJBIlwgRLwtbE2rV+MCneUAZj37O9NtBgbxHNoJ+
 5TBB8sLpNCw3rEvPKwm6vicRgntMclEY1vaplPOHt3RiY4lqVSIkpvFSCw1hGur3
 +34+O++n/6HbtY96T+DN5WGMXrU3JSe46xnBLIt0BOoUwyKMaNdntiyd79GrrnyB
 BwPkHrmB+9FEVq+dHmPnRIhc9WUfIdily3+j1CIncaM4eXKWLjoUlOIw1VcSEc4X
 SSdFh2co+3nm/6O54ZxUEC1PyuQZWedsgFmeiTrOanG3AEWQ/jX7GwXvPlgraa2E
 F5MzGUOQwXYL/IzXl1vGjQc4+FV4nhIjEBTDdZseOp7FP5xkHyyOwzGDEmaMXpXT
 XThb+k7Q6EbiSPLFcz8zkCwNB8ngNJMNOsGP3JPFD06MfquqdzP7T1BsHcNtiMVo
 FNwQg9KtvbK9F7a9lXDqfcxfuEScbqSvnKWWwNLImCqIBATYirJzeTeLvTbVWpA2
 iyXF3ylIclsSMOvCtaz41iuoqNh12eqw2UIFPBmkU2wpVkTt+ZIn0Apgom90xe1K
 tSUqMBbBMoHVtsBsILdkdz/m6FJpK+YmmwpRrJSDvPxA5c9caD7cj+62amW/gP4z
 Hi6mC4hw1H5bKMsC2PP/QgJcmS/pgo1zwA9J2a2NNJbYPATHfKI=
 =J2sO
 -----END PGP SIGNATURE-----

Merge tag 'Wimplicit-fallthrough-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull Wimplicit-fallthrough enablement from Gustavo A. R. Silva:
 "This marks switch cases where we are expecting to fall through, and
  globally enables the -Wimplicit-fallthrough option in the main
  Makefile.

  Finally, some missing-break fixes that have been tagged for -stable:

   - drm/amdkfd: Fix missing break in switch statement

   - drm/amdgpu/gfx10: Fix missing break in switch statement

  With these changes, we completely get rid of all the fall-through
  warnings in the kernel"

* tag 'Wimplicit-fallthrough-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  Makefile: Globally enable fall-through warning
  drm/i915: Mark expected switch fall-throughs
  drm/amd/display: Mark expected switch fall-throughs
  drm/amdkfd/kfd_mqd_manager_v10: Avoid fall-through warning
  drm/amdgpu/gfx10: Fix missing break in switch statement
  drm/amdkfd: Fix missing break in switch statement
  perf/x86/intel: Mark expected switch fall-throughs
  mtd: onenand_base: Mark expected switch fall-through
  afs: fsclient: Mark expected switch fall-throughs
  afs: yfsclient: Mark expected switch fall-throughs
  can: mark expected switch fall-throughs
  firewire: mark expected switch fall-throughs
2019-07-27 11:04:18 -07:00
Jaegeuk Kim
543b8c468f f2fs: fix to read source block before invalidating it
f2fs_allocate_data_block() invalidates old block address and enable new block
address. Then, if we try to read old block by f2fs_submit_page_bio(), it will
give WARN due to reading invalid blocks.

Let's make the order sanely back.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-26 17:49:04 -07:00
Linus Torvalds
4792ba1f1f for-5.3-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl07KzUACgkQxWXV+ddt
 WDs6Ow/+LQ4He5jW+xpgZ+5fSnSjervfSsYWxKARc4hYZog5l8e2ONASzrniKR0H
 aU2SWUyLF0Os1w++vp1FYYwa1xaPJJRTvlqTI6DSCFgEpu9DefXkCAFuP/+fXbhC
 Fx+ZDAbqgY2gUCEWcs4jQusSgz1sO0m97SJR7K4Vz+ZaBa+eneE5Hkm4kZd8AXg6
 ufKkjOD0+Vy+CUVBZdPg8RlU5KRJ+yAE7B2CRyuSeq6cF0LxTxoNeAtWz5KWd6Pn
 aNWDqa07na4kQz/p5s7Bw6qPC6i4hTdnNKVcc8N79iyz2TAetbjN8VXfbFFVCLZ9
 3+9O/kFCh+ZolgV1f9U8iLwSbg7V/8wu1uUdR3645cq2PQljluMJEHPSWOmm+LVw
 nOhm8VQiGsILbEYLRJbM3wI8dT4B0PtgP9IqYuk7TLV3qho6Pg8J2nyfpo5jehs5
 IP0gDanCBHYgsoCUwZwnJaKH3hIP2+5IJw6AqX0Lt6ge9aYA35+kWR4KE5459Pxw
 n0Eoiu+5y9QcfpgUuzb5EQIxHdRNlCjik70+n1yXBwwPtV9b6wDN9TlP3eI7p/Vx
 j+FUYcgrjwjfAP7Eh5a/ne/XEcq1R5UghM+2TWEWpqGh11NPnMDJS6d4OYyeCvA6
 yPLAhVvvqdbYM+PpALaNjvLxspxd6MSuKNLNk3pkJmkd69IXnXw=
 =RT/D
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Two regression fixes:

   - hangs caused by a missing barrier in the locking code

   - memory leaks of extent_state due to bad handling of a cached
     pointer"

* tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix extent_state leak in btrfs_lock_and_flush_ordered_range
  btrfs: Fix deadlock caused by missing memory barrier
2019-07-26 11:08:37 -07:00
Linus Torvalds
863fa8887b Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs umount_tree() leak fix from Al Viro:
 "Fix braino introduced in 'switch the remnants of releasing the
  mountpoint away from fs_pin'.

  The most visible result is leaking struct mount when mounting btrfs,
  making it impossible to shut down"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix the struct mount leak in umount_tree()
2019-07-26 10:58:44 -07:00
Linus Torvalds
0441281965 for-linus-20190726
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl07DGAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplf5EADOOvOdsz9N/Iw8ZHHHJXCqKR26zZv75G1z
 0h1PGC7p0JZQbYFo0Zo7mjiRBGlg6tlXc2d4Gyl94XJKDwjeYTcFDvbvERdYa+MH
 d2RiFkAfR967Ri4fb+FP5L3mYOQdMJ/zk0xCDHLv/DcxeFLa5a9EJS1+vBSR+AcB
 0JpJWuHypGqGmbTaL0z9q2pmx0mgA1ERlWQtkMLrsEr2Vqg/rrjGwe2bGFY00lXc
 vKtFkpfugKc4zVAPSzC1YZgojfDDpGNEA4QMtxMsEH4hqyMpHhrtUedNY5QrjC0B
 p9h6aPXXYr2KhGP0grrEytzaYUOzK2crK5h+q+1vu6nOgx2EgmnLM9tBu/LuRH1j
 uUzKJOa3/AE+bU7uZEsaUerTBsHrgEBa1x8G92obYRnjgW3aCD2CaSbjjBhNxTZ4
 1dXyr0DTHFXZmfcfWja5tO26JTPzjwVOrwiRyU0S727UsdVJupoHiYLr5fwaDfgn
 /Du2I/XWvFtflm5i0ND0sdcX1yRlFiGZ9e45z1QFaFmcteKKWzRBDlC6mQzI/lw3
 oc583mhDR3tRtJxow+wn6AuMUehFRh8wj0UhL/MEMjLW8GiqXU5aRtanT+22Xz4L
 saNDQieeEnV7raMYXMP0qIhkJtrNASmJQos+MOJAEGOWcS2ePIUUio2kSXie+071
 BphJd2RamQ==
 =HIzH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190726' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Several io_uring fixes/improvements:
     - Blocking fix for O_DIRECT (me)
     - Latter page slowness for registered buffers (me)
     - Fix poll hang under certain conditions (me)
     - Defer sequence check fix for wrapped rings (Zhengyuan)
     - Mismatch in async inc/dec accounting (Zhengyuan)
     - Memory ordering issue that could cause stall (Zhengyuan)
      - Track sequential defer in bytes, not pages (Zhengyuan)

 - NVMe pull request from Christoph

 - Set of hang fixes for wbt (Josef)

 - Redundant error message kill for libahci (Ding)

 - Remove unused blk_mq_sched_started_request() and related ops (Marcos)

 - drbd dynamic alloc shash descriptor to reduce stack use (Arnd)

 - blkcg ->pd_stat() non-debug print (Tejun)

 - bcache memory leak fix (Wei)

 - Comment fix (Akinobu)

 - BFQ perf regression fix (Paolo)

* tag 'for-linus-20190726' of git://git.kernel.dk/linux-block: (24 commits)
  io_uring: ensure ->list is initialized for poll commands
  Revert "nvme-pci: don't create a read hctx mapping without read queues"
  nvme: fix multipath crash when ANA is deactivated
  nvme: fix memory leak caused by incorrect subsystem free
  nvme: ignore subnqn for ADATA SX6000LNP
  drbd: dynamically allocate shash descriptor
  block: blk-mq: Remove blk_mq_sched_started_request and started_request
  bcache: fix possible memory leak in bch_cached_dev_run()
  io_uring: track io length in async_list based on bytes
  io_uring: don't use iov_iter_advance() for fixed buffers
  block: properly handle IOCB_NOWAIT for async O_DIRECT IO
  blk-mq: allow REQ_NOWAIT to return an error inline
  io_uring: add a memory barrier before atomic_read
  rq-qos: use a mb for got_token
  rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule
  rq-qos: don't reset has_sleepers on spurious wakeups
  rq-qos: fix missed wake-ups in rq_qos_throttle
  wait: add wq_has_single_sleeper helper
  block, bfq: check also in-flight I/O in dispatch plugging
  block: fix sysfs module parameters directory path in comment
  ...
2019-07-26 10:32:12 -07:00
Al Viro
19a1c4092e fix the struct mount leak in umount_tree()
We need to drop everything we remove from the tree, whether
mnt_has_parent() is true or not.  Usually the bug manifests as a slow
memory leak (leaked struct mount for initramfs); it becomes much more
visible in mount_subtree() users, such as btrfs.  There we leak
a struct mount for btrfs superblock being mounted, which prevents
fs shutdown on subsequent umount.

Fixes: 56cbb429d9 ("switch the remnants of releasing the mountpoint away from fs_pin")
Reported-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-26 07:59:06 -04:00
Naohiro Aota
a3b46b86ca btrfs: fix extent_state leak in btrfs_lock_and_flush_ordered_range
btrfs_lock_and_flush_ordered_range() loads given "*cached_state" into
cachedp, which, in general, is NULL. Then, lock_extent_bits() updates
"cachedp", but it never goes backs to the caller. Thus the caller still
see its "cached_state" to be NULL and never free the state allocated
under btrfs_lock_and_flush_ordered_range(). As a result, we will
see massive state leak with e.g. fstests btrfs/005. Fix this bug by
properly handling the pointers.

Fixes: bd80d94efb ("btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-26 12:21:22 +02:00
Gustavo A. R. Silva
2988160827 afs: fsclient: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warnings:

Warning level 3 was used: -Wimplicit-fallthrough=3

fs/afs/fsclient.c: In function ‘afs_deliver_fs_fetch_acl’:
fs/afs/fsclient.c:2199:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
   call->unmarshall++;
   ~~~~~~~~~~~~~~~~^~
fs/afs/fsclient.c:2202:2: note: here
  case 1:
  ^~~~
fs/afs/fsclient.c:2216:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
   call->unmarshall++;
   ~~~~~~~~~~~~~~~~^~
fs/afs/fsclient.c:2219:2: note: here
  case 2:
  ^~~~
fs/afs/fsclient.c:2225:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
   call->unmarshall++;
   ~~~~~~~~~~~~~~~~^~
fs/afs/fsclient.c:2228:2: note: here
  case 3:
  ^~~~

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-07-25 20:09:49 -05:00
Gustavo A. R. Silva
35a3a90cc5 afs: yfsclient: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warnings:

fs/afs/yfsclient.c: In function ‘yfs_deliver_fs_fetch_opaque_acl’:
fs/afs/yfsclient.c:1984:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
   call->unmarshall++;
   ~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:1987:2: note: here
  case 1:
  ^~~~
fs/afs/yfsclient.c:2005:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
   call->unmarshall++;
   ~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2008:2: note: here
  case 2:
  ^~~~
fs/afs/yfsclient.c:2014:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
   call->unmarshall++;
   ~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2017:2: note: here
  case 3:
  ^~~~
fs/afs/yfsclient.c:2035:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
   call->unmarshall++;
   ~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2038:2: note: here
  case 4:
  ^~~~
fs/afs/yfsclient.c:2047:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
   call->unmarshall++;
   ~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2050:2: note: here
  case 5:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Also, fix some commenting style issues.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-07-25 20:09:46 -05:00
Jens Axboe
36703247d5 io_uring: ensure ->list is initialized for poll commands
Daniel reports that when testing an http server that uses io_uring
to poll for incoming connections, sometimes it hard crashes. This is
due to an uninitialized list member for the io_uring request. Normally
this doesn't trigger and none of the test cases caught it.

Reported-by: Daniel Kozak <kozzi11@gmail.com>
Tested-by: Daniel Kozak <kozzi11@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-25 10:20:18 -06:00
Linus Torvalds
a29a0a467e Merge branch 'access-creds'
The access() (and faccessat()) credentials change can cause an
unnecessary load on the RCU machinery because every access() call ends
up freeing the temporary access credential using RCU.

This isn't really noticeable on small machines, but if you have hundreds
of cores you can cause huge slowdowns due to RCU storms.

It's easy to avoid: the temporary access crededntials aren't actually
normally accessed using RCU at all, so we can avoid the whole issue by
just marking them as such.

* access-creds:
  access: avoid the RCU grace period for the temporary subjective credentials
2019-07-25 08:36:29 -07:00
Nikolay Borisov
6e7ca09b58 btrfs: Fix deadlock caused by missing memory barrier
Commit 06297d8cef ("btrfs: switch extent_buffer blocking_writers from
atomic to int") changed the type of blocking_writers but forgot to
adjust relevant code in btrfs_tree_unlock by converting the
smp_mb__after_atomic to smp_mb.  This opened up the possibility of a
deadlock due to re-ordering of setting blocking_writers and
checking/waking up the waiter. This particular lockup is explained in a
comment above waitqueue_active() function.

Fix it by converting the memory barrier to a full smp_mb, accounting
for the fact that blocking_writers is a simple integer.

Fixes: 06297d8cef ("btrfs: switch extent_buffer blocking_writers from atomic to int")
Tested-by: Johannes Thumshirn <jthumshirn@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-25 17:34:08 +02:00
Jann Horn
16d51a590a sched/fair: Don't free p->numa_faults with concurrent readers
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25 15:37:04 +02:00
Masahiro Yamada
0ce38c5f92 iomap: fix Invalid License ID
Detected by:

  $ ./scripts/spdxcheck.py
  fs/iomap/Makefile: 1:27 Invalid License ID: GPL-2.0-or-newer

Fixes: 1c230208f5 ("iomap: start moving code to fs/iomap/")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25 11:05:11 +02:00
Linus Torvalds
d7852fbd0f access: avoid the RCU grace period for the temporary subjective credentials
It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
work because it installs a temporary credential that gets allocated and
freed for each system call.

The allocation and freeing overhead is mostly benign, but because
credentials can be accessed under the RCU read lock, the freeing
involves a RCU grace period.

Which is not a huge deal normally, but if you have a lot of access()
calls, this causes a fair amount of seconday damage: instead of having a
nice alloc/free patterns that hits in hot per-CPU slab caches, you have
all those delayed free's, and on big machines with hundreds of cores,
the RCU overhead can end up being enormous.

But it turns out that all of this is entirely unnecessary.  Exactly
because access() only installs the credential as the thread-local
subjective credential, the temporary cred pointer doesn't actually need
to be RCU free'd at all.  Once we're done using it, we can just free it
synchronously and avoid all the RCU overhead.

So add a 'non_rcu' flag to 'struct cred', which can be set by users that
know they only use it in non-RCU context (there are other potential
users for this).  We can make it a union with the rcu freeing list head
that we need for the RCU case, so this doesn't need any extra storage.

Note that this also makes 'get_current_cred()' clear the new non_rcu
flag, in case we have filesystems that take a long-term reference to the
cred and then expect the RCU delayed freeing afterwards.  It's not
entirely clear that this is required, but it makes for clear semantics:
the subjective cred remains non-RCU as long as you only access it
synchronously using the thread-local accessors, but you _can_ use it as
a generic cred if you want to.

It is possible that we should just remove the whole RCU markings for
->cred entirely.  Only ->real_cred is really supposed to be accessed
through RCU, and the long-term cred copies that nfs uses might want to
explicitly re-enable RCU freeing if required, rather than have
get_current_cred() do it implicitly.

But this is a "minimal semantic changes" change for the immediate
problem.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Glauber <jglauber@marvell.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-24 10:12:09 -07:00
Linus Torvalds
21c730d734 for-5.3-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl01plkACgkQxWXV+ddt
 WDtdmQ/7B4dLmod95cUIY6vhkmlZ3joVEPCCacSj1i1VXFP+QxDgHmmGXQQ0gTuP
 fJ0oe1gae5wFGnSZTVJ1WADkVeHqg3rZenSZaBNgLnNORwRVJUpgUchuvd+3L9z6
 G0W7CU4bTX99eAERKHAy2uZ3t8bihmKySnZ79gkkfNTN0sV+yq68+5nQLMghGjwM
 svutFEEKOenkXaV24a9cffcxcQliRMVjjHojHNqA5l6QJJpjM4ZKy7KPxvOkmeFx
 VTko8E33kMO4TvipwrdZ7wD1E/mdMoUDe3dh1oJDL6h1DFlhvr/DhdYOx2cXq/0O
 3zlAL8AEWPZp2gPeRgpmydvuVLdUkgd54yvbROYYpo0GcQgINSxW6VJ+wpHOMdhA
 dmfE0NzNhLMklfJFhCL5l6GFrsdXXl1zzaVcYanZly4HY60RNu4N3rG0YE1gHKvO
 biioerrMR9/+r0ZZh9RLcw0pnvVlx2VlD9lbE7QtoeJvQX6l3yQ2r/MFvNlluGiU
 9z21wqtQf9aDnSETvvec0JM3eOgDe6PnjINxj/zUzUY1qoc/ESuSqf9i6AJ3/ix7
 L9cevq/+hMARPAqILbtKWZWI/gqtKX22DckUIHcvqYdkG4zJW3e+PworBbsTI6Lb
 4QpEHW++dzctc/sdgoEBnwFgewXGtZnSHSCL88y3Zxt0suqwixY=
 =6yeW
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fixes for leaks caused by recently merged patches

 - one build fix

 - a fix to prevent mixing of incompatible features

* tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't leak extent_map in btrfs_get_io_geometry()
  btrfs: free checksum hash on in close_ctree
  btrfs: Fix build error while LIBCRC32C is module
  btrfs: inode: Don't compress if NODATASUM or NODATACOW set
2019-07-22 09:08:38 -07:00
Zhengyuan Liu
9310a7ba6d io_uring: track io length in async_list based on bytes
We are using PAGE_SIZE as the unit to determine if the total len in
async_list has exceeded max_pages, it's not fair for smaller io sizes.
For example, if we are doing 1k-size io streams, we will never exceed
max_pages since len >>= PAGE_SHIFT always gets zero. So use original
bytes to make it more accurate.

Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-21 21:46:55 -06:00
Jens Axboe
bd11b3a391 io_uring: don't use iov_iter_advance() for fixed buffers
Hrvoje reports that when a large fixed buffer is registered and IO is
being done to the latter pages of said buffer, the IO submission time
is much worse:

reading to the start of the buffer: 11238 ns
reading to the end of the buffer:   1039879 ns

In fact, it's worse by two orders of magnitude. The reason for that is
how io_uring figures out how to setup the iov_iter. We point the iter
at the first bvec, and then use iov_iter_advance() to fast-forward to
the offset within that buffer we need.

However, that is abysmally slow, as it entails iterating the bvecs
that we setup as part of buffer registration. There's really no need
to use this generic helper, as we know it's a BVEC type iterator, and
we also know that each bvec is PAGE_SIZE in size, apart from possibly
the first and last. Hence we can just use a shift on the offset to
find the right index, and then adjust the iov_iter appropriately.
After this fix, the timings are:

reading to the start of the buffer: 10135 ns
reading to the end of the buffer:   1377 ns

Or about an 755x improvement for the tail page.

Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-21 21:46:36 -06:00
Jens Axboe
6a43074e2f block: properly handle IOCB_NOWAIT for async O_DIRECT IO
A caller is supposed to pass in REQ_NOWAIT if we can't block for any
given operation, but O_DIRECT for block devices just ignore this. Hence
we'll block for various resource shortages on the block layer side,
like having to wait for requests.

Use the new REQ_NOWAIT_INLINE to ask for this error to be returned
inline, so we can handle it appropriately and return -EAGAIN to the
caller.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-21 21:46:29 -06:00
Linus Torvalds
91962d0f79 SMB3 fixes: two fixes for stable, one that had dependency on earlier patch in this merge window and can now go in, and a perf improvement in SMB3 open
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl0zg9YACgkQiiy9cAdy
 T1HEtQv/Vn2vb9jPoqbCc5QfSUDL13dNYyxhqt3xPdbb3m49a7XNHLoBByM/pnMu
 TF10QdOPkPS5eP5OTDpR7iwS9iNX9+8KGtlqlbzX7z+bwQmGbBQXg3VoYahSeeGU
 soJE8VgqoeoOkPV9Sl8tojOIVk6B+BAv4iRPpswt8iPD8Y8IPRg3ONZNoomybzfG
 GlQBFabhxJU8VLkJIb4NVB+E4AlgMDueD7HpCXD2zh6xsEULgRs2wwxfEopZlyzh
 prjNY6OZliXMuTMNLw2D07xlya3ZqaXOt+hHEuvSB//YxooTVxBme70ycLF47N3q
 Gvmaw8QB6scl110PHud9luqpHhdJsEvKOKr0Amv2ND2Atb098D0kZ+VITRP3QHt5
 LZcHtKNRXDpAvkHphgN0W5O0ngdFDFo197BaWdwPmi9rARPhuQPq61vkFpsrPwN2
 AxkG7IQANNJh4bII6/+xBpSjmtInm0BQhxuojpHIYGiZvdHNYMrJfwFcAoVWfVvl
 b9MtG9Qz
 =rP0b
 -----END PGP SIGNATURE-----

Merge tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Two fixes for stable, one that had dependency on earlier patch in this
  merge window and can now go in, and a perf improvement in SMB3 open"

* tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module number
  cifs: flush before set-info if we have writeable handles
  smb3: optimize open to not send query file internal info
  cifs: copy_file_range needs to strip setuid bits and update timestamps
  CIFS: fix deadlock in cached root handling
2019-07-21 10:01:17 -07:00
Linus Torvalds
18253e034d Merge branch 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache and mountpoint updates from Al Viro:
 "Saner handling of refcounts to mountpoints.

  Transfer the counting reference from struct mount ->mnt_mountpoint
  over to struct mountpoint ->m_dentry. That allows us to get rid of the
  convoluted games with ordering of mount shutdowns.

  The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
  mixed-filesystem shrink lists, which we'll also need for the Slab
  Movable Objects patchset"

* 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch the remnants of releasing the mountpoint away from fs_pin
  get rid of detach_mnt()
  make struct mountpoint bear the dentry reference to mountpoint, not struct mount
  Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
  fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
  __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
  nfs: dget_parent() never returns NULL
  ceph: don't open-code the check for dead lockref
2019-07-20 09:15:51 -07:00
Linus Torvalds
26473f8370 Also new for 5.3:
- Regroup the fs/iomap.c code by major functional area so that we can
   start development for 5.4 from a more stable base.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl0vMvMACgkQ+H93GTRK
 tOtgsw//Xrqy6pYnohvltKkmE2Ioo17Ylctg15MZpicxSREyozSntdUbPJ8Hv3qF
 uM80Z9PJh/XzlTbDbQ+bvEj6kAQxClGmcoKn8vBScW0LBqRz5rMwhJE2C8hyRx08
 hf310FPnZnyJK7jWGjZFhg1EsIqzQD8TZVNt4+sT/Kz/dWglkeT5sXJtoGTT8WI2
 Rgx8U8AYdpjaKfUf7X7ab68krYBNOrUS6vRp+4sfts6s7y4zILOom2QdDblwWT54
 pruq6iS4+2gyf4Pl7HXYT2A17R/coTb0AOrWNC3Sg0W4I6gdfoTXeten7jUVgXvl
 eXKOPHYYXqJadvdjPx7+DFW7sy6RSP8xe/KUp9uiEOW4dmKqxTrEoxYgFNBXgjwC
 FBUwgc2vhAw8o3P+/NcfbqYWwF/2fDvDBTQZ3kdwpmrFQqzhDyRxr5hPrhObuo5r
 wAJgP8F4M5KKdos0lg9jR4cirrInEzUOeHaLhFC+d9cFMNcxRo8ddx5KriMHVvuA
 JWgeXWvRKL3nPtbnyLRVxeEGmjhjwMkntKaCPqgD4FOD1+CGUuBtzykcPMbGfSS0
 sZd/qEJ6lZqYKRxee/R1d5RkJx+86TG3ZdWvuc49zSYavMLuqG/l2ohmfQ1P03nA
 Ux+8Bg6BbMGzlkVPXgiogHBN6ro2ZrjsHzu8E6+IuEXeL3NIC8A=
 =3uGR
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap split/cleanup from Darrick Wong:
 "As promised, here's the second part of the iomap merge for 5.3, in
  which we break up iomap.c into smaller files grouped by functional
  area so that it'll be easier in the long run to maintain cohesiveness
  of code units and to review incoming patches. There are no functional
  changes and fs/iomap.c split cleanly.

  Summary:

   - Regroup the fs/iomap.c code by major functional area so that we can
     start development for 5.4 from a more stable base"

* tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: move internal declarations into fs/iomap/
  iomap: move the main iteration code into a separate file
  iomap: move the buffered IO code into a separate file
  iomap: move the direct IO code into a separate file
  iomap: move the SEEK_HOLE code into a separate file
  iomap: move the file mapping reporting code into a separate file
  iomap: move the swapfile code into a separate file
  iomap: start moving code to fs/iomap/
2019-07-19 11:38:12 -07:00
Linus Torvalds
d2fbf4b6d5 Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull adfs updates from Al Viro:
 "More ADFS patches from Russell King"

* 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/adfs: add time stamp and file type helpers
  fs/adfs: super: limit idlen according to directory type
  fs/adfs: super: fix use-after-free bug
  fs/adfs: super: safely update options on remount
  fs/adfs: super: correct superblock flags
  fs/adfs: clean up indirect disc addresses and fragment IDs
  fs/adfs: clean up error message printing
  fs/adfs: use %pV for error messages
  fs/adfs: use format_version from disc_record
  fs/adfs: add helper to get filesystem size
  fs/adfs: add helper to get discrecord from map
  fs/adfs: correct disc record structure
2019-07-19 11:33:22 -07:00
Linus Torvalds
933a90bf4f Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount updates from Al Viro:
 "The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...
2019-07-19 10:42:02 -07:00
Linus Torvalds
249be8511b Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "The rest of MM and a kernel-wide procfs cleanup.

  Summary of the more significant patches:

   - Patch series "mm/memory_hotplug: Factor out memory block
     devicehandling", v3. David Hildenbrand.

     Some spring-cleaning of the memory hotplug code, notably in
     drivers/base/memory.c

   - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang
     Shi.

     Fix /proc/pid/smaps output for THP pages used in shmem.

   - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit.

     Bugfix and speedup for kernel/resource.c

   - Patch series "mm: Further memory block device cleanups", David
     Hildenbrand.

     More spring-cleaning of the memory hotplug code.

   - Patch series "mm: Sub-section memory hotplug support". Dan
     Williams.

     Generalise the memory hotplug code so that pmem can use it more
     completely. Then remove the hacks from the libnvdimm code which
     were there to work around the memory-hotplug code's constraints.

   - "proc/sysctl: add shared variables for range check", Matteo Croce.

     We have about 250 instances of

          int zero;
          ...
                  .extra1 = &zero,

     in the tree. This is a tree-wide sweep to make all those private
     "zero"s and "one"s use global variables.

     Alas, it isn't practical to make those two global integers const"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits)
  proc/sysctl: add shared variables for range check
  mm: migrate: remove unused mode argument
  mm/sparsemem: cleanup 'section number' data types
  libnvdimm/pfn: stop padding pmem namespaces to section alignment
  libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
  mm/devm_memremap_pages: enable sub-section remap
  mm: document ZONE_DEVICE memory-model implications
  mm/sparsemem: support sub-section hotplug
  mm/sparsemem: prepare for sub-section ranges
  mm: kill is_dev_zone() helper
  mm/hotplug: kill is_dev_zone() usage in __remove_pages()
  mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()
  mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal
  mm/sparsemem: add helpers track active portions of a section at boot
  mm/sparsemem: introduce a SECTION_IS_EARLY flag
  mm/sparsemem: introduce struct mem_section_usage
  drivers/base/memory.c: get rid of find_memory_block_hinted()
  mm/memory_hotplug: move and simplify walk_memory_blocks()
  mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns
  mm: make register_mem_sect_under_node() static
  ...
2019-07-19 09:45:58 -07:00
Matteo Croce
eec4844fae proc/sysctl: add shared variables for range check
In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range.  This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.

On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.

The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.

This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data                                         old     new   delta
    sysctl_vals                                    -      12     +12
    __kstrtab_sysctl_vals                          -      12     +12
    max                                           14      10      -4
    int_max                                       16       -     -16
    one                                           68       -     -68
    zero                                         128      28    -100
    Total: Before=20583249, After=20583085, chg -0.00%

[mcroce@redhat.com: tipc: remove two unused variables]
  Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
[akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
[arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
  Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
[akpm@linux-foundation.org: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
Keith Busch
371096949f mm: migrate: remove unused mode argument
migrate_page_move_mapping() doesn't use the mode argument.  Remove it
and update callers accordingly.

Link: http://lkml.kernel.org/r/20190508210301.8472-1-keith.busch@intel.com
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
Yang Shi
c06306696f mm: thp: fix false negative of shmem vma's THP eligibility
Commit 7635d9cbe8 ("mm, thp, proc: report THP eligibility for each
vma") introduced THPeligible bit for processes' smaps.  But, when
checking the eligibility for shmem vma, __transparent_hugepage_enabled()
is called to override the result from shmem_huge_enabled().  It may
result in the anonymous vma's THP flag override shmem's.  For example,
running a simple test which create THP for shmem, but with anonymous THP
disabled, when reading the process's smaps, it may show:

  7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
  Size:               4096 kB
  ...
  [snip]
  ...
  ShmemPmdMapped:     4096 kB
  ...
  [snip]
  ...
  THPeligible:    0

And, /proc/meminfo does show THP allocated and PMD mapped too:

  ShmemHugePages:     4096 kB
  ShmemPmdMapped:     4096 kB

This doesn't make too much sense.  The shmem objects should be treated
separately from anonymous THP.  Calling shmem_huge_enabled() with
checking MMF_DISABLE_THP sounds good enough.  And, we could skip stack
and dax vma check since we already checked if the vma is shmem already.

Also check if vma is suitable for THP by calling
transhuge_vma_suitable().

And minor fix to smaps output format and documentation.

Link: http://lkml.kernel.org/r/1560401041-32207-3-git-send-email-yang.shi@linux.alibaba.com
Fixes: 7635d9cbe8 ("mm, thp, proc: report THP eligibility for each vma")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:06 -07:00
Steve French
2a957ace44 cifs: update internal module number
To 2.21

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-18 18:14:47 -05:00
Ronnie Sahlberg
aa081859b1 cifs: flush before set-info if we have writeable handles
Servers can defer destaging any data and updating the mtime until close().
This means that if we do a setinfo to modify the mtime while other handles
are open for write the server may overwrite our setinfo timestamps when
if flushes the file on close() of the writeable handle.

To solve this we add an explicit flush when the mtime is about to
be updated.

This fixes "cp -p" to preserve mtime when copying a file onto an SMB2 share.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-18 17:46:23 -05:00
Steve French
89a5bfa350 smb3: optimize open to not send query file internal info
We can cut one third of the traffic on open by not querying the
inode number explicitly via SMB3 query_info since it is now
returned on open in the qfid context.

This is better in multiple ways, and
speeds up file open about 10% (more if network is slow).

Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-18 17:44:13 -05:00
Linus Torvalds
6860c981b9 NFS client updates for Linux 5.3
Highlights include:
 
 Stable fixes:
 - SUNRPC: Ensure bvecs are re-synced when we re-encode the RPC request
 - Fix an Oops in ff_layout_track_ds_error due to a PTR_ERR() dereference
 - Revert buggy NFS readdirplus optimisation
 - NFSv4: Handle the special Linux file open access mode
 - pnfs: Fix a problem where we gratuitously start doing I/O through the MDS
 
 Features:
 - Allow NFS client to set up multiple TCP connections to the server using
    a new 'nconnect=X' mount option. Queue length is used to balance load.
 - Enhance statistics reporting to report on all transports when using
    multiple connections.
 - Speed up SUNRPC by removing bh-safe spinlocks
 - Add a mechanism to allow NFSv4 to request that containers set a
    unique per-host identifier for when the hostname is not set.
 - Ensure NFSv4 updates the lease_time after a clientid update
 
 Bugfixes and cleanup:
 - Fix use-after-free in rpcrdma_post_recvs
 - Fix a memory leak when nfs_match_client() is interrupted
 - Fix buggy file access checking in NFSv4 open for execute
 - disable unsupported client side deduplication
 - Fix spurious client disconnections
 - Fix occasional RDMA transport deadlock
 - Various RDMA cleanups
 - Various tracepoint fixes
 - Fix the TCP callback channel to guarantee the server can actually send
    the number of callback requests that was negotiated at mount time.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl0wz/EACgkQZwvnipYK
 APJUrQ/9FpbWTaI/H0BkBf+/iIDAwuGozldDDRkHpHvykvcbNSFPNk5CA4fV/lCq
 rI0tD9ijj0KJhAjS1RWLb9oXMUm/MDIIcZNvsbOiIfGftzGnxAz022KS2JyV+Xk6
 ul7sCQ8QmbQJzfZk+aFZxl5slS9i5nM+Sa//k5SK3JghOlLuYkW50OlmE6ME8rKP
 sLWD2quLh1J5mjG+B9ml4YWozNi0elFN6ZqdzO/SjIhIG7rn2Hkr0glWcEprdiS7
 qHoF6dIn80oR37YNrcbBtEGqBnMnWkpGAGq2Hgf5I2YtO+xfEgMqqs9FNIcahtQG
 SGa1bnpcys9+fG4fnwzgOgYkvGzKdi/bIYc8lhBowl+76z+6VgjwlwT3wRtnWfO9
 288LZDMMfOSYoDkbA9iGtoQRviIsoGMoeH/r5tZe7U1W35h60NGPVhH9sP9/Gc2s
 Va1xBYK6PaIg+UxeghJmCnNK3zANyEI7AtDVvn0wsJuDcgXO5/PAANenc6/Fl6nf
 U9XB4wmgaM2jN5uqLYT3bJAaAPAXTOayVnnd9oDYb87/31sYM6iwzSPkmm5+qIsi
 0Ftks4PSjpjTTrmUvs3CL+ll3Y9bWUd58qlooGwi6u1DnXDQDzbIoRQBezNFJ+Lc
 MahUVY7w9ATyJFJRvbscO7mRS8qTmr4qP91bAOsYQ0P2Yo02L7k=
 =eR7d
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - SUNRPC: Ensure bvecs are re-synced when we re-encode the RPC
     request

   - Fix an Oops in ff_layout_track_ds_error due to a PTR_ERR()
     dereference

   - Revert buggy NFS readdirplus optimisation

   - NFSv4: Handle the special Linux file open access mode

   - pnfs: Fix a problem where we gratuitously start doing I/O through
     the MDS

  Features:

   - Allow NFS client to set up multiple TCP connections to the server
     using a new 'nconnect=X' mount option. Queue length is used to
     balance load.

   - Enhance statistics reporting to report on all transports when using
     multiple connections.

   - Speed up SUNRPC by removing bh-safe spinlocks

   - Add a mechanism to allow NFSv4 to request that containers set a
     unique per-host identifier for when the hostname is not set.

   - Ensure NFSv4 updates the lease_time after a clientid update

  Bugfixes and cleanup:

   - Fix use-after-free in rpcrdma_post_recvs

   - Fix a memory leak when nfs_match_client() is interrupted

   - Fix buggy file access checking in NFSv4 open for execute

   - disable unsupported client side deduplication

   - Fix spurious client disconnections

   - Fix occasional RDMA transport deadlock

   - Various RDMA cleanups

   - Various tracepoint fixes

   - Fix the TCP callback channel to guarantee the server can actually
     send the number of callback requests that was negotiated at mount
     time"

* tag 'nfs-for-5.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
  pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS
  pnfs: Fix a problem where we gratuitously start doing I/O through the MDS
  SUNRPC: Optimise transport balancing code
  SUNRPC: Ensure the bvecs are reset when we re-encode the RPC request
  pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error
  NFSv4: Don't use the zero stateid with layoutget
  SUNRPC: Fix up backchannel slot table accounting
  SUNRPC: Fix initialisation of struct rpc_xprt_switch
  SUNRPC: Skip zero-refcount transports
  SUNRPC: Replace division by multiplication in calculation of queue length
  NFSv4: Validate the stateid before applying it to state recovery
  nfs4.0: Refetch lease_time after clientid update
  nfs4: Rename nfs41_setup_state_renewal
  nfs4: Make nfs4_proc_get_lease_time available for nfs4.0
  nfs: Fix copy-and-paste error in debug message
  NFS: Replace 16 seq_printf() calls by seq_puts()
  NFS: Use seq_putc() in nfs_show_stats()
  Revert "NFS: readdirplus optimization by cache mechanism" (memleak)
  SUNRPC: Fix transport accounting when caller specifies an rpc_xprt
  NFS: Record task, client ID, and XID in xdr_status trace points
  ...
2019-07-18 14:32:33 -07:00
Trond Myklebust
d5b9216fd5 pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS
Add tracepoints to allow debugging of the event chain leading to
a pnfs fallback to doing I/O through the MDS.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18 15:50:28 -04:00
Trond Myklebust
58bbeab425 pnfs: Fix a problem where we gratuitously start doing I/O through the MDS
If the client has to stop in pnfs_update_layout() to wait for another
layoutget to complete, it currently exits and defaults to I/O through
the MDS if the layoutget was successful.

Fixes: d03360aaf5 ("pNFS: Ensure we return the error if someone kills...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20+
2019-07-18 15:33:42 -04:00
Amir Goldstein
bf3c90ee1e cifs: copy_file_range needs to strip setuid bits and update timestamps
cifs has both source and destination inodes locked throughout the copy.
Like ->write_iter(), we update mtime and strip setuid bits of destination
file before copy and like ->read_iter(), we update atime of source file
after copy.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-07-18 14:03:03 -05:00
Aurelien Aptel
7e5a70ad88 CIFS: fix deadlock in cached root handling
Prevent deadlock between open_shroot() and
cifs_mark_open_files_invalid() by releasing the lock before entering
SMB2_open, taking it again after and checking if we still need to use
the result.

Link: https://lore.kernel.org/linux-cifs/684ed01c-cbca-2716-bc28-b0a59a0f8521@prodrive-technologies.com/T/#u
Fixes: 3d4ef9a153 ("smb3: fix redundant opens on root")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-07-18 13:51:35 -05:00
Trond Myklebust
8e04fdfadd pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error
mirror->mirror_ds can be NULL if uninitialised, but can contain
a PTR_ERR() if call to GETDEVICEINFO failed.

Fixes: 65990d1afb ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # 4.10+
2019-07-18 14:43:52 -04:00
Trond Myklebust
d9aba2b40d NFSv4: Don't use the zero stateid with layoutget
The NFSv4.1 protocol explicitly forbids us from using the zero stateid
together with layoutget, so when we see that nfs4_select_rw_stateid()
is unable to return a valid delegation, lock or open stateid, then
we should initiate recovery and retry.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18 14:43:52 -04:00
Linus Torvalds
366a4e38b8 Also new for 5.3:
- Bring fs/xfs/libxfs/xfs_trans_inode.c in sync with userspace libxfs.
 - Convert the xfs administrator guide to rst and move it into the
   official admin guide under Documentation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl0t6wIACgkQ+H93GTRK
 tOuCqQ//TN2065e+0avyxuwNgUQIS73ujC/R8KivO3f9PlhMzjiMhgCBiIurGsi5
 8rInaCDr4L0DRL67W+3d72iWAc1eSP+l/zp2vEMUs/2YgDTGli5JfoWR0JeuKaiD
 7ymSxPvShfrwDwidMn6S7tXvcEjUEJrms4jeuERBrHwZMOvxzMiM3910oG12AfbM
 Fipvn9eDTmaInHX9T9nuBhvK077/1bXF1J2r/4XieOuhPnNjV5mLH3fJnPswu3uL
 RCSS1WYuJMBV/EQbLCgiM47dgVAtig8QSacqfpGtAZrDLAXy/Y6T+AwvclPFTE5c
 HlHLHf1+zI6FB8t2BNtnapMrRm1wmhCCHEEJLa+iHENY+J113R0I7c6Jscs6tnJd
 oW253Mc9juiB3bnC3iIAgcXrny68qPPpwozzYktQ9XsPxz5XEDEsWBpQS9rzoJRk
 u1zaU3VUxAO2sTJpKE4OXs5DLWvM5C5Yz8jbPXX3U1GBtNzk8EM9SmI0nA1+OVCA
 B4kbN6jKWMsG13bIIb16TbfkrhAucK/09Pbp2/4spayf6PYFci6E6Bg5NuVp2iaH
 n7oIk5WaobFPDbtdIaG27awAX0UCgItFrPMqLN+bBqeiYhNPuxc+ycmyb16VlRWf
 ob0OzWm6+nRJCNGQ1yBd3flILAPxV/Fa7Fegtf/HIDKCLfDW4xM=
 =+DS3
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.3-merge-13' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs cleanups from Darrick Wong:
 "We had a few more lateish cleanup patches come in for 5.3 -- a couple
  of syncups with the userspace libxfs code and a conversion of the XFS
  administrator's guide to ReST format.

  Summary:

   - Bring fs/xfs/libxfs/xfs_trans_inode.c in sync with userspace
     libxfs.

   - Convert the xfs administrator guide to rst and move it into the
     official admin guide under Documentation"

* tag 'xfs-5.3-merge-13' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  Documentation: filesystem: Convert xfs.txt to ReST
  xfs: sync up xfs_trans_inode with userspace
  xfs: move xfs_trans_inode.c to libxfs/
2019-07-18 11:18:00 -07:00
Linus Torvalds
ae9b728c8d smb3/cifs fixes (3 for stable) and improvements including much faster encryption (SMB3.1.1 GCM)
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl0wDEQACgkQiiy9cAdy
 T1E3CQv/e+8uTD0dSmU+bEBopYCtihRq7ZGXtCGSE8U/fj0l34qBxds/JLvTSSeY
 NhUD+F5e2NYSU7LZx8d9HkOJStcLaNx5Jq1YrxmGvVfUC6s7VKn9637nByXhrgrM
 t/rQj8Ot6RDGMNs7PlMUt1jjtP3zL9ugQ2DHsjLoCY+w07qbsVWCZlm9sJEmr8lS
 3umvfPPi8LKNsOxTT+DsSwZ+XN/BctCExeojVkdFRCBsYJyHbJtejeJPXWxv4/6m
 lQpY0uLwjxgRO6aZxFvMW18vhI8977f1svwA4CmgaVYB0A7yr1VptINWVPfN+mGK
 BYJRe1i54JSBZ8/vp1POvKrhLa6Y623BNpa6myjxOXYQ3/M7PDU+PycosI4V61Bp
 yyH451jdKGZYojG6O7qGGE8kTDyjCs/k/2GeNeUKvHcNX9juDBMTxx2G5kP+w/xd
 2lgvgrYlSWVG/p1ADlHtwsAEupg8xZcl/y3IGBIAw57uKAX2LRzujbeT/CpZ3phm
 k5ZljExt
 =bdbT
 -----END PGP SIGNATURE-----

Merge tag '4.3-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs updates from Steve French:
 "Fixes (three for stable) and improvements including much faster
  encryption (SMB3.1.1 GCM)"

* tag '4.3-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (27 commits)
  smb3: smbdirect no longer experimental
  cifs: fix crash in smb2_compound_op()/smb2_set_next_command()
  cifs: fix crash in cifs_dfs_do_automount
  cifs: fix parsing of symbolic link error response
  cifs: refactor and clean up arguments in the reparse point parsing
  SMB3: query inode number on open via create context
  smb3: Send netname context during negotiate protocol
  smb3: do not send compression info by default
  smb3: add new mount option to retrieve mode from special ACE
  smb3: Allow query of symlinks stored as reparse points
  cifs: Fix a race condition with cifs_echo_request
  cifs: always add credits back for unsolicited PDUs
  fs: cifs: cifsssmb: Change return type of convert_ace_to_cifs_ace
  add some missing definitions
  cifs: fix typo in debug message with struct field ia_valid
  smb3: minor cleanup of compound_send_recv
  CIFS: Fix module dependency
  cifs: simplify code by removing CONFIG_CIFS_ACL ifdef
  cifs: Fix check for matching with existing mount
  cifs: Properly handle auto disabling of serverino option
  ...
2019-07-18 11:11:51 -07:00
Linus Torvalds
d9b9c89304 Lots of exciting things this time!
- support for rbd object-map and fast-diff features (myself).  This
   will speed up reads, discards and things like snap diffs on sparse
   images.
 
 - ceph.snap.btime vxattr to expose snapshot creation time (David
   Disseldorp).  This will be used to integrate with "Restore Previous
   Versions" feature added in Windows 7 for folks who reexport ceph
   through SMB.
 
 - security xattrs for ceph (Zheng Yan).  Only selinux is supported
   for now due to the limitations of ->dentry_init_security().
 
 - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff
   Layton).  This is actually a single feature bit which was missing
   because of the filesystem pieces.  With this in, the kernel client
   will finally be reported as "luminous" by "ceph features" -- it is
   still being reported as "jewel" even though all required Luminous
   features were implemented in 4.13.
 
 - stop NULL-terminating ceph vxattrs (Jeff Layton).  The convention
   with xattrs is to not terminate and this was causing inconsistencies
   with ceph-fuse.
 
 - change filesystem time granularity from 1 us to 1 ns, again fixing
   an inconsistency with ceph-fuse (Luis Henriques).
 
 On top of this there are some additional dentry name handling and cap
 flushing fixes from Zheng.  Finally, Jeff is formally taking over for
 Zheng as the filesystem maintainer.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl0u+X8THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi9byB/9fIxzoxtDMvixtJabuGSJRtlijDlWF
 GlO6yIWCXl/8v8easR2PCF75U/xv0+QFQmze8PVi8u4Xz589P247NnEuyEZ9n84i
 aCavARho6QLZPEL+B04NaqoHBl+ORKQTA6eKGhyKwRp/rn83z5Ubuw2tN7krHT3b
 kCY61FuTQGxNY2o/WKv/iLwINYr7H23hCf0WwyyKH1bp7OegiQ14Ebn1NtfS3sMx
 hS6h8Ya826vmUW0bCSS/9kzKYBCjksTig0HphUOHq6BoZJs++0b7GukIulRyuLfD
 J9Gr9HGPoDCVzdmFfpn2FSlxdmqfO9amUSagd0ftLQfFlPlrpoULi0GW
 =Bgxr
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "Lots of exciting things this time!

   - support for rbd object-map and fast-diff features (myself). This
     will speed up reads, discards and things like snap diffs on sparse
     images.

   - ceph.snap.btime vxattr to expose snapshot creation time (David
     Disseldorp). This will be used to integrate with "Restore Previous
     Versions" feature added in Windows 7 for folks who reexport ceph
     through SMB.

   - security xattrs for ceph (Zheng Yan). Only selinux is supported for
     now due to the limitations of ->dentry_init_security().

   - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff
     Layton). This is actually a single feature bit which was missing
     because of the filesystem pieces. With this in, the kernel client
     will finally be reported as "luminous" by "ceph features" -- it is
     still being reported as "jewel" even though all required Luminous
     features were implemented in 4.13.

   - stop NULL-terminating ceph vxattrs (Jeff Layton). The convention
     with xattrs is to not terminate and this was causing
     inconsistencies with ceph-fuse.

   - change filesystem time granularity from 1 us to 1 ns, again fixing
     an inconsistency with ceph-fuse (Luis Henriques).

  On top of this there are some additional dentry name handling and cap
  flushing fixes from Zheng. Finally, Jeff is formally taking over for
  Zheng as the filesystem maintainer"

* tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits)
  ceph: fix end offset in truncate_inode_pages_range call
  ceph: use generic_delete_inode() for ->drop_inode
  ceph: use ceph_evict_inode to cleanup inode's resource
  ceph: initialize superblock s_time_gran to 1
  MAINTAINERS: take over for Zheng as CephFS kernel client maintainer
  rbd: setallochint only if object doesn't exist
  rbd: support for object-map and fast-diff
  rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
  libceph: export osd_req_op_data() macro
  libceph: change ceph_osdc_call() to take page vector for response
  libceph: bump CEPH_MSG_MAX_DATA_LEN (again)
  rbd: new exclusive lock wait/wake code
  rbd: quiescing lock should wait for image requests
  rbd: lock should be quiesced on reacquire
  rbd: introduce copyup state machine
  rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
  rbd: move OSD request allocation into object request state machines
  rbd: factor out __rbd_osd_setup_discard_ops()
  rbd: factor out rbd_osd_setup_copyup()
  rbd: introduce obj_req->osd_reqs list
  ...
2019-07-18 11:05:25 -07:00
Linus Torvalds
0fe49f70a0 - Fix a hang condition that started triggering after the Xarray
conversion of fsdax in the v4.20 kernel.
 
 - Add a 'resource' (root-only physical base address) sysfs attribute to
   device-dax instances to correlate memory-blocks onlined via the kmem
   driver with a given device instance.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdMIETAAoJEB7SkWpmfYgCei0P/A6BpAftQF8bWOX8drjrBj6J
 WSrrhmfNPQ0+D+UejfrPUGVg7JysmFpSvfaRkp41nSpKaX6wr6M2uQrHNQl5hIYK
 gi5PStYMQay4lM78TrLsFFdDqYX5M6VZhpO3Xgd82bPT2GMXhwckua4ad4WYoN8Y
 2ufNajZt/WxBL45VqL1FFqpPK+TKTbVihBR/3W36+NOSJnsj/IH5OlrHswsyq73v
 J1YkQY0IvhGR6nZdsNZZV9Faux4jsIVPFW/mh1k1QVLP1r70aJlxcCyka6lRVd4R
 ktYFOwtX/B39T72RPQB59Z4LOf/VC9pNaiK7hhWuGQ6XepMo5/0fkhYRslhQobll
 7XOYUC01J0jreMu5pvWrZKfaoF9HQwZ1q0NrwNeagZeOgrpoNLqE8WAXUj+c5hsv
 x7nPY4XNmRdw2/kkyPotyuRiGkbOOxNEdK0Avhl0id78RFiv4iwMzGdTRT+E9TMb
 SLF0KPskqKFcyjECD/zwhR2vEbm54harVqMI4pJU0745bjx/ZEfq+AYZN1Epza3N
 O2XYV+uWHi6NXALm195ccGuj2uWtfLHan9OFKyhJgfDbDfIngkr7dgtgCEAKqK9q
 zQrLemqeN3Xw2bRldsbg/6mnwhxqyLdoXbJ9JwD1JO1xvsVdElS0/z8dh4KdLv/8
 OQ2GIzgB1CyofYuyzpOI
 =UB+H
 -----END PGP SIGNATURE-----

Merge tag 'dax-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull dax updates from Dan Williams:
 "The fruits of a bug hunt in the fsdax implementation with Willy and a
  small feature update for device-dax:

   - Fix a hang condition that started triggering after the Xarray
     conversion of fsdax in the v4.20 kernel.

   - Add a 'resource' (root-only physical base address) sysfs attribute
     to device-dax instances to correlate memory-blocks onlined via the
     kmem driver with a given device instance"

* tag 'dax-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix missed wakeup with PMD faults
  device-dax: Add a 'resource' attribute
2019-07-18 10:58:52 -07:00
Linus Torvalds
f8c3500cd1 - virtio_pmem: The new virtio_pmem facility introduces a paravirtualized
persistent memory device that allows a guest VM to use DAX mechanisms to
   access a host-file with host-page-cache. It arranges for MAP_SYNC to
   be disabled and instead triggers a host fsync() when a 'write-cache
   flush' command is sent to the virtual disk device.
 
 - Miscellaneous small fixups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdMHwpAAoJEB7SkWpmfYgCUYoP/3vcgYBAaXNksyALF0iowPoP
 z4J0KoaOA1CzRFEQtCWUQa84CWj+XoSewwSeyrIkqKQvx/gghXblK+GVjVzBn0BD
 hmmiKr8af4DdxfzYdEXJp65cCpIiVMaJiGr20Aj9ObwvWJb4QZbz9q7hnPt6KgiI
 jVND3BpP3OERb4ZFcibdmJT5foKooMcXVG6+luVe+hc1+ZZQxJBsBaqie4brQIFq
 j59NX3HfHH2fr1vVwnVH0CO4tgbgYg9wZ2EivGu6wBWvORjrr7KiSSbOYP68EBtd
 lUoNps+vQtGnfXGwNzAjp1wuknrQYYh4/KMKjep7hiZD39rgyvBpbHbyynKzQCWV
 REe8cXr/nwphsENvBAUBiqY999EWVIxdT2iaVaSA6K/31JQAC5AFyxVK/P2Ke1SK
 rvePZ++iLQ1o4phTxQPNlVUqF9jOrFVVICGwMDqaqSkOsD9YKQdFClfOF/1ntlDz
 V0bs+Y0Pe8AJCd9ESep4X+vHAWRRIb4EQIuwLaX8RJoY+r1fGye9RPthpYYzvXKp
 DI2iJztFO3anzj2i9htNPUFIaiUmIhzEvG32O2If2yc5FL02hMpHPoFx6vHhe6s3
 f8OJ+olsJK+/IIrV8+DHqYvhzylOYIhmRTvIxIxaNDPHkhR1i2RDQ6KKK1YZmsr8
 MjAZ+Ym0GadDivs+wcM6
 =uAMG
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "Primarily just the virtio_pmem driver:

   - virtio_pmem

     The new virtio_pmem facility introduces a paravirtualized
     persistent memory device that allows a guest VM to use DAX
     mechanisms to access a host-file with host-page-cache. It arranges
     for MAP_SYNC to be disabled and instead triggers a host fsync()
     when a 'write-cache flush' command is sent to the virtual disk
     device.

   - Miscellaneous small fixups"

* tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  virtio_pmem: fix sparse warning
  xfs: disable map_sync for async flush
  ext4: disable map_sync for async flush
  dax: check synchronous mapping is supported
  dm: enable synchronous dax
  libnvdimm: add dax_dev sync flag
  virtio-pmem: Add virtio pmem driver
  libnvdimm: nd_region flush callback support
  libnvdimm, namespace: Drop uuid_t implementation detail
2019-07-18 10:52:08 -07:00
Zhengyuan Liu
c0e48f9dea io_uring: add a memory barrier before atomic_read
There is a hang issue while using fio to do some basic test. The issue
can be easily reproduced using the below script:

        while true
        do
                fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
                     -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
        done

After several minutes (or more), fio would block at
io_uring_enter->io_cqring_wait in order to waiting for previously
committed sqes to be completed and can't return to user anymore until
we send a SIGTERM to fio. After receiving SIGTERM, fio hangs at
io_ring_ctx_wait_and_kill with a backtrace like this:

        [54133.243816] Call Trace:
        [54133.243842]  __schedule+0x3a0/0x790
        [54133.243868]  schedule+0x38/0xa0
        [54133.243880]  schedule_timeout+0x218/0x3b0
        [54133.243891]  ? sched_clock+0x9/0x10
        [54133.243903]  ? wait_for_completion+0xa3/0x130
        [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
        [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
        [54133.243951]  wait_for_completion+0xab/0x130
        [54133.243962]  ? wake_up_q+0x70/0x70
        [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
        [54133.243998]  io_uring_release+0x20/0x30
        [54133.244008]  __fput+0xcf/0x270
        [54133.244029]  ____fput+0xe/0x10
        [54133.244040]  task_work_run+0x7f/0xa0
        [54133.244056]  do_exit+0x305/0xc40
        [54133.244067]  ? get_signal+0x13b/0xbd0
        [54133.244088]  do_group_exit+0x50/0xd0
        [54133.244103]  get_signal+0x18d/0xbd0
        [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
        [54133.244142]  do_signal+0x34/0x720
        [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
        [54133.244190]  exit_to_usermode_loop+0xc0/0x130
        [54133.244209]  do_syscall_64+0x16b/0x1d0
        [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The reason is that we had added a req to ctx->pending_async at the very
end, but it didn't get a chance to be processed. How could this happen?

        fio#cpu0                                        wq#cpu1

        io_add_to_prev_work                    io_sq_wq_submit_work

          atomic_read() <<< 1

                                                  atomic_dec_return() << 1->0
                                                  list_empty();    <<< true;

          list_add_tail()
          atomic_read() << 0 or 1?

As atomic_ops.rst states, atomic_read does not guarantee that the
runtime modification by any other thread is visible yet, so we must take
care of that with a proper implicit or explicit memory barrier.

This issue was detected with the help of Jackie's <liuyun01@kylinos.cn>

Fixes: 31b5151064 ("io_uring: allow workqueue item to handle multiple buffered requests")
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18 11:09:09 -06:00
Trond Myklebust
7402a4fedc SUNRPC: Fix up backchannel slot table accounting
Add a per-transport maximum limit in the socket case, and add
helpers to allow the NFSv4 code to discover that limit.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18 01:12:59 -04:00
Linus Torvalds
57a8ec387e Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "VM:
   - z3fold fixes and enhancements by Henry Burns and Vitaly Wool

   - more accurate reclaimed slab caches calculations by Yafang Shao

   - fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
     Christoph Hellwig

   - !CONFIG_MMU fixes by Christoph Hellwig

   - new novmcoredd parameter to omit device dumps from vmcore, by
     Kairui Song

   - new test_meminit module for testing heap and pagealloc
     initialization, by Alexander Potapenko

   - ioremap improvements for huge mappings, by Anshuman Khandual

   - generalize kprobe page fault handling, by Anshuman Khandual

   - device-dax hotplug fixes and improvements, by Pavel Tatashin

   - enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V

   - add pte_devmap() support for arm64, by Robin Murphy

   - unify locked_vm accounting with a helper, by Daniel Jordan

   - several misc fixes

  core/lib:
   - new typeof_member() macro including some users, by Alexey Dobriyan

   - make BIT() and GENMASK() available in asm, by Masahiro Yamada

   - changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better
     code generation, by Alexey Dobriyan

   - rbtree code size optimizations, by Michel Lespinasse

   - convert struct pid count to refcount_t, by Joel Fernandes

  get_maintainer.pl:
   - add --no-moderated switch to skip moderated ML's, by Joe Perches

  misc:
   - ptrace PTRACE_GET_SYSCALL_INFO interface

   - coda updates

   - gdb scripts, various"

[ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (100 commits)
  fs/select.c: use struct_size() in kmalloc()
  mm: add account_locked_vm utility function
  arm64: mm: implement pte_devmap support
  mm: introduce ARCH_HAS_PTE_DEVMAP
  mm: clean up is_device_*_page() definitions
  mm/mmap: move common defines to mman-common.h
  mm: move MAP_SYNC to asm-generic/mman-common.h
  device-dax: "Hotremove" persistent memory that is used like normal RAM
  mm/hotplug: make remove_memory() interface usable
  device-dax: fix memory and resource leak if hotplug fails
  include/linux/lz4.h: fix spelling and copy-paste errors in documentation
  ipc/mqueue.c: only perform resource calculation if user valid
  include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
  scripts/gdb: add helpers to find and list devices
  scripts/gdb: add lx-genpd-summary command
  drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
  kernel/pid.c: convert struct pid count to refcount_t
  drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
  select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
  select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
  ...
2019-07-17 08:58:04 -07:00
Johannes Thumshirn
373c3b80e4 btrfs: don't leak extent_map in btrfs_get_io_geometry()
btrfs_get_io_geometry() calls btrfs_get_chunk_map() to acquire a reference
on a extent_map, but on normal operation it does not drop this reference
anymore.

This leads to excessive kmemleak reports.

Always call free_extent_map(), not just in the error case.

Fixes: 5f1411265e ("btrfs: Introduce btrfs_io_geometry infrastructure")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-17 17:03:36 +02:00
Johannes Thumshirn
bfcea1c661 btrfs: free checksum hash on in close_ctree
fs_info::csum_hash gets initialized in btrfs_init_csum_hash() which is
called by open_ctree().

But it only gets freed if open_ctree() fails, not on normal operation.

This leads to a memory leak like the following found by kmemleak:
unreferenced object 0xffff888132cb8720 (size 96):

  comm "mount", pid 450, jiffies 4294912436 (age 17.584s)
  hex dump (first 32 bytes):
    04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000000c9643d4>] crypto_create_tfm+0x2d/0xd0
    [<00000000ae577f68>] crypto_alloc_tfm+0x4b/0xb0
    [<000000002b5cdf30>] open_ctree+0xb84/0x2060 [btrfs]
    [<0000000043204297>] btrfs_mount_root+0x552/0x640 [btrfs]
    [<00000000c99b10ea>] legacy_get_tree+0x22/0x40
    [<0000000071a6495f>] vfs_get_tree+0x1f/0xc0
    [<00000000f180080e>] fc_mount+0x9/0x30
    [<000000009e36cebd>] vfs_kern_mount.part.11+0x6a/0x80
    [<0000000004594c05>] btrfs_mount+0x174/0x910 [btrfs]
    [<00000000c99b10ea>] legacy_get_tree+0x22/0x40
    [<0000000071a6495f>] vfs_get_tree+0x1f/0xc0
    [<00000000b86e92c5>] do_mount+0x6b0/0x940
    [<0000000097464494>] ksys_mount+0x7b/0xd0
    [<0000000057213c80>] __x64_sys_mount+0x1c/0x20
    [<00000000cb689b5e>] do_syscall_64+0x43/0x130
    [<000000002194e289>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Free fs_info::csum_hash in close_ctree() to avoid the memory leak.

Fixes: 6d97c6e31b ("btrfs: add boilerplate code for directly including the crypto framework")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-17 17:03:33 +02:00
YueHaibing
314c4cd6d9 btrfs: Fix build error while LIBCRC32C is module
If CONFIG_BTRFS_FS is y and CONFIG_LIBCRC32C is m,
building fails:

  fs/btrfs/super.o: In function `btrfs_mount_root':
  super.c:(.text+0xb7f9): undefined reference to `crc32c_impl'
  fs/btrfs/super.o: In function `init_btrfs_fs':
  super.c:(.init.text+0x3465): undefined reference to `crc32c_impl'
  fs/btrfs/extent-tree.o: In function `hash_extent_data_ref':
  extent-tree.c:(.text+0xe60): undefined reference to `crc32c'
  extent-tree.c:(.text+0xe78): undefined reference to `crc32c'
  extent-tree.c:(.text+0xe8b): undefined reference to `crc32c'
  fs/btrfs/dir-item.o: In function `btrfs_insert_xattr_item':
  dir-item.c:(.text+0x291): undefined reference to `crc32c'
  fs/btrfs/dir-item.o: In function `btrfs_insert_dir_item':
  dir-item.c:(.text+0x429): undefined reference to `crc32c'

Select LIBCRC32C to fix it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: d5178578bc ("btrfs: directly call into crypto framework for checksumming")
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-17 17:03:30 +02:00
Qu Wenruo
42c16da6d6 btrfs: inode: Don't compress if NODATASUM or NODATACOW set
As btrfs(5) specified:

	Note
	If nodatacow or nodatasum are enabled, compression is disabled.

If NODATASUM or NODATACOW set, we should not compress the extent.

Normally NODATACOW is detected properly in run_delalloc_range() so
compression won't happen for NODATACOW.

However for NODATASUM we don't have any check, and it can cause
compressed extent without csum pretty easily, just by:
  mkfs.btrfs -f $dev
  mount $dev $mnt -o nodatasum
  touch $mnt/foobar
  mount -o remount,datasum,compress $mnt
  xfs_io -f -c "pwrite 0 128K" $mnt/foobar

And in fact, we have a bug report about corrupted compressed extent
without proper data checksum so even RAID1 can't recover the corruption.
(https://bugzilla.kernel.org/show_bug.cgi?id=199707)

Running compression without proper checksum could cause more damage when
corruption happens, as compressed data could make the whole extent
unreadable, so there is no need to allow compression for
NODATACSUM.

The fix will refactor the inode compression check into two parts:

- inode_can_compress()
  As the hard requirement, checked at btrfs_run_delalloc_range(), so no
  compression will happen for NODATASUM inode at all.

- inode_need_compress()
  As the soft requirement, checked at btrfs_run_delalloc_range() and
  compress_file_range().

Reported-by: James Harvey <jamespharvey20@gmail.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-17 17:03:28 +02:00
Darrick J. Wong
5d907307ad iomap: move internal declarations into fs/iomap/
Move internal function declarations out of fs/internal.h into
include/linux/iomap.h so that our transition is complete.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-07-17 07:21:02 -07:00
Darrick J. Wong
cb7181ff4b iomap: move the main iteration code into a separate file
Move the main iteration code into a separate file so that we can group
related functions in a single file instead of having a single enormous
source file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-07-17 07:20:43 -07:00
Darrick J. Wong
afc51aaa22 iomap: move the buffered IO code into a separate file
Move the buffered IO code into a separate file so that we can group
related functions in a single file instead of having a single enormous
source file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-07-17 07:16:00 -07:00
Darrick J. Wong
db074436f4 iomap: move the direct IO code into a separate file
Move the direct IO code into a separate file so that we can group
related functions in a single file instead of having a single enormous
source file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-07-17 07:16:00 -07:00
Darrick J. Wong
56a178981d iomap: move the SEEK_HOLE code into a separate file
Move the SEEK_HOLE/SEEK_DATA code into a separate file so that we can
group related functions in a single file instead of having a single
enormous source file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-07-17 07:14:10 -07:00
Darrick J. Wong
5157fb8f5a iomap: move the file mapping reporting code into a separate file
Move the file mapping reporting code (FIEMAP/FIBMAP) into a separate
file so that we can group related functions in a single file instead of
having a single enormous source file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-07-17 07:14:10 -07:00
Darrick J. Wong
a45c0eccc5 iomap: move the swapfile code into a separate file
Move the swapfile activation code into a separate file so that we can
group related functions in a single file instead of having a single
enormous source file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-07-17 07:14:10 -07:00
Al Viro
56cbb429d9 switch the remnants of releasing the mountpoint away from fs_pin
We used to need rather convoluted ordering trickery to guarantee
that dput() of ex-mountpoints happens before the final mntput()
of the same.  Since we don't need that anymore, there's no point
playing with fs_pin for that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-16 22:52:37 -04:00
Al Viro
2763d11912 get rid of detach_mnt()
Lift getting the original mount (dentry is actually not needed at all)
of the mountpoint into the callers - to do_move_mount() and pivot_root()
level.  That simplifies the cleanup in those and allows to get saner
arguments for attach_mnt_recursive().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-16 22:50:11 -04:00
Al Viro
4edbe133f8 make struct mountpoint bear the dentry reference to mountpoint, not struct mount
Using dput_to_list() to shift the contributing reference from ->mnt_mountpoint
to ->mnt_mp->m_dentry.  Dentries are dropped (with dput_to_list()) as soon
as struct mountpoint is destroyed; in cases where we are under namespace_sem
we use the global list, shrinking it in namespace_unlock().  In case of
detaching stuck MNT_LOCKed children at final mntput_no_expire() we use a local
list and shrink it ourselves.  ->mnt_ex_mountpoint crap is gone.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-16 22:43:40 -04:00
Matthew Wilcox (Oracle)
23c84eb783 dax: Fix missed wakeup with PMD faults
RocksDB can hang indefinitely when using a DAX file.  This is due to
a bug in the XArray conversion when handling a PMD fault and finding a
PTE entry.  We use the wrong index in the hash and end up waiting on
the wrong waitqueue.

There's actually no need to wait; if we find a PTE entry while looking
for a PMD entry, we can return immediately as we know we should fall
back to a PTE fault (which may not conflict with the lock held).

We reuse the XA_RETRY_ENTRY to signal a conflicting entry was found.
This value can never be found in an XArray while holding its lock, so
it does not create an ambiguity.

Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/CAPcyv4hwHpX-MkUEqxwdTj7wCCZCN4RV-L4jsnuwLGyL_UEG4A@mail.gmail.com
Fixes: b15cd80068 ("dax: Convert page fault handlers to XArray")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Robert Barror <robert.barror@intel.com>
Reported-by: Seema Pandit <seema.pandit@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-07-16 19:30:59 -07:00
Gustavo A. R. Silva
43e11fa2d1 fs/select.c: use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array.  For example:

  struct foo {
       int stuff;
       struct boo entry[];
  };

  size = sizeof(struct foo) + count * sizeof(struct boo);
  instance = kmalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

  instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

Also, notice that variable size is unnecessary, hence it is removed.

This code was detected with the help of Coccinelle.

Link: http://lkml.kernel.org/r/20190604164226.GA13823@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:25 -07:00
Oleg Nesterov
ac30102062 select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
Now that restore_saved_sigmask_unless() is always called with the same
argument right before poll_select_copy_remaining() we can move it into
poll_select_copy_remaining() and make it the only caller of restore() in
fs/select.c.

The patch also renames poll_select_copy_remaining(),
poll_select_finish() looks better after this change.

kern_select() doesn't use set_user_sigmask(), so in this case
poll_select_finish() does restore_saved_sigmask_unless() "for no
reason".  But this won't hurt, and WARN_ON(!TIF_SIGPENDING) is still
valid.

Link: http://lkml.kernel.org/r/20190606140915.GC13440@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Laight <David.Laight@aculab.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:24 -07:00
Oleg Nesterov
8cf8b5539a select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
do_poll() returns -EINTR if interrupted and after that all its callers
have to translate it into -ERESTARTNOHAND.  Change do_poll() to return
-ERESTARTNOHAND and update (simplify) the callers.

Note that this also unifies all users of restore_saved_sigmask_unless(),
see the next patch.

Linus:

: The *right* return value will actually be then chosen by
: poll_select_copy_remaining(), which will turn ERESTARTNOHAND to EINTR
: when it can't update the timeout.
:
: Except for the cases that use restart_block and do that instead and
: don't have the whole timeout restart issue as a result.

Link: http://lkml.kernel.org/r/20190606140852.GB13440@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Laight <David.Laight@aculab.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:24 -07:00
Oleg Nesterov
b772434be0 signal: simplify set_user_sigmask/restore_user_sigmask
task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
syscall paths.  This means that set_user_sigmask() can save ->blocked in
->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
was modified.

This way the callers do not need 2 sigset_t's passed to set/restore and
restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns
into the trivial helper which just calls restore_saved_sigmask().

Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Laight <David.Laight@aculab.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:24 -07:00
Hariprasad Kelam
dc0dde61f1 fs/reiserfs/journal.c: change return type of dirty_one_transaction
Change return type of dirty_one_transaction from int to void.  As this
function always return success.

Fixes below issue reported by coccicheck:

  fs/reiserfs/journal.c:1690:5-8: Unneeded variable: "ret".  Return "0" on line 1719

Link: http://lkml.kernel.org/r/20190702175430.GA5882@hari-Inspiron-1545
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bharath Vedartham <linux.bhar@gmail.com>
Cc: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:24 -07:00
YueHaibing
ba542f20f9 fs/ufs/super.c: remove set but not used variable 'usb3'
Fixes gcc '-Wunused-but-set-variable' warning:

  fs/ufs/super.c: In function ufs_statfs:
  fs/ufs/super.c:1409:32: warning: variable usb3 set but not used [-Wunused-but-set-variable]

It is not used since commmit c596961d1b ("ufs: fix s_size/s_dsize
users")

Link: http://lkml.kernel.org/r/20190525140654.15924-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Mathieu Malaterre
29774f3f4e fs/hfsplus/xattr.c: replace strncpy with memcpy
strncpy() was used to copy a fixed size buffer.  Since NUL-terminating
string is not required here, prefer a memcpy function.  The generated
code (ppc32) remains the same.

Silence the following warning triggered using W=1:

  fs/hfsplus/xattr.c:410:3: warning: 'strncpy' output truncated before terminating nul copying 4 bytes from a string of the same length [-Wstringop-truncation]

Link: http://lkml.kernel.org/r/20190529113341.11972-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Pedro Cuadra
a9fba24c6a coda: add hinting support for partial file caching
This adds support for partial file caching in Coda.  Every read, write
and mmap informs the userspace cache manager about what part of a file
is about to be accessed so that the cache manager can ensure the
relevant parts are available before the operation is allowed to proceed.

When a read or write operation completes, this is also reported to allow
the cache manager to track when partially cached content can be
released.

If the cache manager does not support partial file caching, or when the
entire file has been fetched into the local cache, the cache manager may
return an EOPNOTSUPP error to indicate that intent upcalls are no longer
necessary until the file is closed.

[akpm@linux-foundation.org: little whitespace fixup]
Link: http://lkml.kernel.org/r/20190618181301.6960-1-jaharkes@cs.cmu.edu
Signed-off-by: Pedro Cuadra <pjcuadra@gmail.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Fabian Frederick
5bb44810f4 coda: ftoc validity check integration
This patch moves cfi check in coda_ftoc() instead of repeating it in the
wild.

  Module size
     text	   data	    bss	    dec	    hex	filename
    28297	   1040	    700	  30037	   7555	fs/coda/coda.ko.before
    28263	    980	    700	  29943	   74f7	fs/coda/coda.ko.after

Link: http://lkml.kernel.org/r/a2c27663ec4547018c92d71c63b1dff4650b6546.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Fabian Frederick
7f6118ce95 coda: remove sb test in coda_fid_to_inode()
coda_fid_to_inode() is only called by coda_downcall() where sb is already
being tested.

Link: http://lkml.kernel.org/r/d2163b3136348faf83ba47dc2d65a5d0a9a135dd.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Fabian Frederick
6975259ae3 coda: remove sysctl object from module when unused
Inspired by NFS sysctl process

Link: http://lkml.kernel.org/r/9afcc2cd09490849b309786bbf47fef75de7f91c.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Fabian Frederick
f94845284a coda: add __init to init_coda_psdev()
init_coda_psdev() was only called by __init function.

Link: http://lkml.kernel.org/r/a12a5a135fa6b0ea997e1a0af4be0a235c463a24.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Fabian Frederick
50e9a6efb0 coda: use SIZE() for stat
max_t expression was already defined in coda sources

Link: http://lkml.kernel.org/r/e6cda497ce8691db155cb35f8d13ea44ca6cedeb.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Fabian Frederick
79a0d65e77 coda: destroy mutex in put_super()
We can safely destroy vc_mutex at the end of umount process.

Link: http://lkml.kernel.org/r/f436f68908c467c5663bc6a9251b52cd7b95d2a5.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Jan Harkes
6dc280ebee coda: remove uapi/linux/coda_psdev.h
Nothing is left in this header that is used by userspace.

Link: http://lkml.kernel.org/r/bb11378cef94739f2cf89425dd6d302a52c64480.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
David Howells
8fc8b9df83 coda: move internal defs out of include/linux/ [ver #2]
Move include/linux/coda_psdev.h to fs/coda/ as there's nothing else that
uses it.

Link: http://lkml.kernel.org/r/3ceeee0415a929b89fb02700b6b4b3a07938acb8.1558117389.git.jaharkes@cs.cmu.edu
Link: https://patchwork.kernel.org/patch/10590257/
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Jan Harkes
b6a18c6008 coda: bump module version
The out of tree module version had been bumped several times already,
but we haven't kept this in-tree one in sync, partly because most
changes go from here to the out-of-tree copy.

Link: http://lkml.kernel.org/r/8b0ab50a2da2f0180ac32c79d91811b4d1d0bd8b.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Dan Carpenter
936dae4525 coda: get rid of CODA_FREE()
The CODA_FREE() macro just calls kvfree().  We can call that directly
instead.

Link: http://lkml.kernel.org/r/4950a94fd30ec5f84835dd4ca0bb67c0448672f5.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Dan Carpenter
4dc48193d7 coda: get rid of CODA_ALLOC()
These days we have kvzalloc() so we can delete CODA_ALLOC().

I made a couple related changes in coda_psdev_write().  First, I added
some error handling to avoid a NULL dereference if the allocation
failed.  Second, I used kvmalloc() instead of kvzalloc() because we copy
over the memory on the next line so there is no need to zero it first.

Link: http://lkml.kernel.org/r/e56010c822e7a7cbaa8a238cf82ad31c67eaa800.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Jan Harkes
5e7c31dfe7 coda: change Coda's user api to use 64-bit time_t in timespec
Move the 32-bit time_t problems to userspace.

Link: http://lkml.kernel.org/r/8d089068823bfb292a4020f773922fbd82ffad39.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Arnd Bergmann
6ced9aa7b5 coda: stop using 'struct timespec' in user API
We exchange file timestamps with user space using psdev device
read/write operations with a fixed but architecture specific binary
layout.

On 32-bit systems, this uses a 'timespec' structure that is defined by
the C library to contain two 32-bit values for seconds and nanoseconds.
As we get ready for the year 2038 overflow of the 32-bit signed seconds,
the kernel now uses 64-bit timestamps internally, and user space will do
the same change by changing the 'timespec' definition in the future.

Unfortunately, this breaks the layout of the coda_vattr structure, so we
need to redefine that in terms of something that does not change.  I'm
introducing a new 'struct vtimespec' structure here that keeps the
existing layout, and the same change has to be done in the coda user
space copy of linux/coda.h before anyone can use that on a 32-bit
architecture with 64-bit time_t.

An open question is what should happen to actual times past y2038, as
they are now truncated to the last valid date when sent to user space,
and interpreted as pre-1970 times when a timestamp with the MSB set is
read back into the kernel.  Alternatively, we could change the new
timespec64_to_coda()/coda_to_timespec64() functions to use a different
interpretation and extend the available range further to the future by
disallowing past timestamps.  This would require more changes in the
user space side though.

Link: http://lkml.kernel.org/r/562b7324149461743e4fbe2fedbf7c242f7e274a.1558117389.git.jaharkes@cs.cmu.edu
Link: https://patchwork.kernel.org/patch/10474735/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Colin Ian King
850622136f coda: clean up indentation, replace spaces with tab
Trivial fix to clean up indentation, replace spaces with tab

Link: http://lkml.kernel.org/r/ffc2bfa5a37ffcdf891c51b2e2ed618103965b24.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Jan Harkes
9a05671dd8 coda: don't try to print names that were considered too long
Probably safer to just show the unexpected length and debug it from the
userspace side.

Link: http://lkml.kernel.org/r/582ae759a4fdfa31a64c35de489fa4efabac09d6.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Jan Harkes
6e51f8aa76 coda: potential buffer overflow in coda_psdev_write()
Add checks to make sure the downcall message we got from the Coda cache
manager is large enough to contain the data it is supposed to have.
i.e.  when we get a CODA_ZAPDIR we can access &out->coda_zapdir.CodaFid.

Link: http://lkml.kernel.org/r/894fb6b250add09e4e3935f14649f21284a5cb18.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Zhouyang Jia
02551c23bc coda: add error handling for fget
When fget fails, the lack of error-handling code may cause unexpected
results.

This patch adds error-handling code after calling fget.

Link: http://lkml.kernel.org/r/2514ec03df9c33b86e56748513267a80dd8004d9.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:23 -07:00
Jan Harkes
7fa0a1da3d coda: pass the host file in vma->vm_file on mmap
Patch series "Coda updates".

The following patch series is a collection of various fixes for Coda,
most of which were collected from linux-fsdevel or linux-kernel but
which have as yet not found their way upstream.

This patch (of 22):

Various file systems expect that vma->vm_file points at their own file
handle, several use file_inode(vma->vm_file) to get at their inode or
use vma->vm_file->private_data.  However the way Coda wrapped mmap on a
host file broke this assumption, vm_file was still pointing at the Coda
file and the host file systems would scribble over Coda's inode and
private file data.

This patch fixes the incorrect expectation and wraps vm_ops->open and
vm_ops->close to allow Coda to track when the vm_area_struct is
destroyed so we still release the reference on the Coda file handle at
the right time.

Link: http://lkml.kernel.org/r/0e850c6e59c0b147dc2dcd51a3af004c948c3697.1558117389.git.jaharkes@cs.cmu.edu
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:22 -07:00
Alexey Dobriyan
aa94b1dc5b fs/binfmt_elf.c: delete stale comment
"passed_fileno" variable was deleted 11 years ago in 2.6.25.

Link: http://lkml.kernel.org/r/20190529201747.GA23248@avx2
Fixes: d20894a237 ("Remove a.out interpreter support in ELF loader")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:22 -07:00
YueHaibing
1b113e04e2 fs/binfmt_flat.c: remove set but not used variable 'inode'
Fixes gcc '-Wunused-but-set-variable' warning:

  fs/binfmt_flat.c: In function load_flat_file:
  fs/binfmt_flat.c:419:16: warning: variable inode set but not used [-Wunused-but-set-variable]

It's never used and can be removed.

Link: http://lkml.kernel.org/r/20190525125341.9844-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:22 -07:00
Radoslaw Burny
5ec27ec735 fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
but this is not a correct behavior for proc.  Since sysctl permission
check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more
sense to use these values in u_[ug]id of proc inodes.  In other words:
although uid/gid in the inode is not read during test_perm, the inode
logically belongs to the root of the namespace.  I have confirmed this
with Eric Biederman at LPC and in this thread:
  https://lore.kernel.org/lkml/87k1kzjdff.fsf@xmission.com

Consequences
============

Since the i_[ug]id values of proc nodes are not used for permissions
checks, this change usually makes no functional difference.  However, it
causes an issue in a setup where:

 * a namespace container is created without root user in container -
   hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID

 * container creator tries to configure it by writing /proc/sys files,
   e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit

Kernel does not allow to open an inode for writing if its i_[ug]id are
invalid, making it impossible to write shmmax and thus - configure the
container.

Using a container with no root mapping is apparently rare, but we do use
this configuration at Google.  Also, we use a generic tool to configure
the container limits, and the inability to write any of them causes a
failure.

History
=======

The invalid uids/gids in inodes first appeared due to 8175435777 (fs:
Update i_[ug]id_(read|write) to translate relative to s_user_ns).
However, AFAIK, this did not immediately cause any issues.  The
inability to write to these "invalid" inodes was only caused by a later
commit 0bd23d09b8 (vfs: Don't modify inodes with a uid or gid unknown
to the vfs).

Tested: Used a repro program that creates a user namespace without any
mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
Before the change, it shows the overflow uid, with the change it's 0.
The overflow uid indicates that the uid in the inode is not correct and
thus it is not possible to open the file for writing.

Link: http://lkml.kernel.org/r/20190708115130.250149-1-rburny@google.com
Fixes: 0bd23d09b8 ("vfs: Don't modify inodes with a uid or gid unknown to the vfs")
Signed-off-by: Radoslaw Burny <rburny@google.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:21 -07:00
Alexey Dobriyan
9af27b28b1 fs/proc/inode.c: use typeof_member() macro
Don't repeat function signatures twice.

This is a kind-of-precursor for "struct proc_ops".

Note:

	typeof(pde->proc_fops->...) ...;

can't be used because ->proc_fops is "const struct file_operations *".
"const" prevents assignment down the code and it can't be deleted in the
type system.

Link: http://lkml.kernel.org/r/20190529191110.GB5703@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:21 -07:00
Kairui Song
c6c405336b vmcore: add a kernel parameter novmcoredd
Since commit 2724273e8f ("vmcore: add API to collect hardware dump in
second kernel"), drivers are allowed to add device related dump data to
vmcore as they want by using the device dump API.  This has a potential
issue, the data is stored in memory, drivers may append too much data
and use too much memory.  The vmcore is typically used in a kdump kernel
which runs in a pre-reserved small chunk of memory.  So as a result it
will make kdump unusable at all due to OOM issues.

So introduce new 'novmcoredd' command line option.  User can disable
device dump to reduce memory usage.  This is helpful if device dump is
using too much memory, disabling device dump could make sure a regular
vmcore without device dump data is still available.

[akpm@linux-foundation.org: tweak documentation]
[akpm@linux-foundation.org: vmcore.c needs moduleparam.h]
Link: http://lkml.kernel.org/r/20190528111856.7276-1-kasong@redhat.com
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:21 -07:00
Linus Torvalds
0a8ad0ffa4 orangefs: This simple pull request is just a fix for an
Unused Value that colin.king@canonical.com sent me and a
 related fix I added.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdLflkAAoJEM9EDqnrzg2+XEIQAImX96e5A/gDSAFhL23cIW73
 2mAeCjwvpHk9hmLESpek3NfKH9fBkKS07bv0ICr1P4vSW2KIecbyc6Megern8w6H
 WuY0WSELCDy207zqHf1BeGwEMC8Z/MadVSP/2wed7E0+AppcsPeGhe5yFlD8MKYq
 sFBoZ3tQ88iujNqlORlPy7bWJzsx3a9MYznDlbeIVY7gsgnFrgyHCoY+cpZTG9fr
 ZEQmyw2B7NWwzu438i3Qk+MDiztq4ldwhsO8HFt3+qTBtfUzmb7upWFMM1AUWQxd
 3XIYZiCXgPmfIsQQJE5HPeE10zrMNVRYu6CjeUZPl11iy+/OKcY/UVQCwNxbQFAc
 ZBFk6OIj9wZsMWAgnlp6zUUJ/s//OudnuSbbHlj3VDVEa+aZSJawgoIsauCyyZc/
 5/sDpKcjb5Z4O5Sd+BkT8Zg0iAszBx7aJSqj/ggPvqxt7UQ9dF97AgYdZYfav/br
 xL2O9KNfWGHj+qUV8qmQXVMJZWPrHZjQMPqA4Oi5CxmW+eF1kxr2mw3bU7suHXKm
 CA7fRECsiyo2ULIl3wT12cOrpuFRQegkUW6AuBq5Z2JNw+H15JCbPZVMllfj7xye
 TAxXXp9atcyRpCOrtB3dRO0itnGDrWx4WAxaKR127IeqKAiP2ZsPceu0mILamPno
 ekJLcwC8kwbUDrNHR170
 =PJuy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.3-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Two small fixes.

  This is just a fix for an unused value that Colin King sent me and a
  related fix I added"

* tag 'for-linus-5.3-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: eliminate needless variable assignments
  orangefs: remove redundant assignment to variable buffer_index
2019-07-16 15:15:29 -07:00
Linus Torvalds
a18f877541 for-5.3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl0sNWYACgkQxWXV+ddt
 WDsyQA/8CGnF68g6hwVuYz4K7f39gOiFlBnRxeN/3RT6vkNSyLZxvRDaDrSTzVIo
 cz2G/9qZLXsIll+3EfZlyzZZiA+4f4hEDAfAd4yVPavRom+uu7dbqzAIpgvFlYdH
 vhAYKOeWSqWElWJ06hzWO3FCwjY9GKFMk4PS0XHHp+STCT0hq1MkaHr44kiHsqdh
 T5nVGDwXz8nGDZ51RO6+mgiSrd5eHbs6kXCd8rW7hmjTx8ClKHa1tdkxN/us+pJm
 hTFT669m5ckHhY2AUKmkREoOwpnt2HcXQJNkz6gO+o03IDvYz73SScbhSYdNTlwi
 j74GLf89FA52qVM+JDg9MaWYqgf1pQI8AHK/rXw2FNbuP/eL9kuZ85ZIbO6CiO0c
 5jAixReSwzSP/V0+MKW3F7k4KtIqbHAV6mkI8zLwrAee4Xj81BOtgL7gYPFQTwSZ
 ma0hEoen7IV5+/z9upUuLA5wr4BT+h1T+EllCWe1+9+9mRYOvowtkRNBL8HZWTDI
 b65oTITfot54xX9ecKtiuG2qoqJEjjkR+YKdRM4nph6wflSNZxEoezBp3iRFpYOL
 Lx+g97RcJ2EEoBVjVMkTqfj93GeiKRifa8yXdRY+A0I2ZXZEcS8DjSJM6rj3AOPy
 4idIl+ABscayZowfqu0FSIULf1La0qiRXmbGNeG4ylhN4L6S/og=
 =eshk
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "Highlights:

   - chunks that have been trimmed and unchanged since last mount are
     tracked and skipped on repeated trims

   - use hw assissed crc32c on more arches, speedups if native
     instructions or optimized implementation is available

   - the RAID56 incompat bit is automatically removed when the last
     block group of that type is removed

  Fixes:

   - fsync fix for reflink on NODATACOW files that could lead to ENOSPC

   - fix data loss after inode eviction, renaming it, and fsync it

   - fix fsync not persisting dentry deletions due to inode evictions

   - update ctime/mtime/iversion after hole punching

   - fix compression type validation (reported by KASAN)

   - send won't be allowed to start when relocation is in progress, this
     can cause spurious errors or produce incorrect send stream

  Core:

   - new tracepoints for space update

   - tree-checker: better check for end of extents for some tree items

   - preparatory work for more checksum algorithms

   - run delayed iput at unlink time and don't push the work to cleaner
     thread where it's not properly throttled

   - wrap block mapping to structures and helpers, base for further
     refactoring

   - split large files, part 1:
       - space info handling
       - block group reservations
       - delayed refs
       - delayed allocation

   - other cleanups and refactoring"

* tag 'for-5.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (103 commits)
  btrfs: fix memory leak of path on error return path
  btrfs: move the subvolume reservation stuff out of extent-tree.c
  btrfs: migrate the delalloc space stuff to it's own home
  btrfs: migrate btrfs_trans_release_chunk_metadata
  btrfs: migrate the delayed refs rsv code
  btrfs: Evaluate io_tree in find_lock_delalloc_range()
  btrfs: migrate the global_block_rsv helpers to block-rsv.c
  btrfs: migrate the block-rsv code to block-rsv.c
  btrfs: stop using block_rsv_release_bytes everywhere
  btrfs: cleanup the target logic in __btrfs_block_rsv_release
  btrfs: export __btrfs_block_rsv_release
  btrfs: export btrfs_block_rsv_add_bytes
  btrfs: move btrfs_block_rsv definitions into it's own header
  btrfs: Simplify update of space_info in __reserve_metadata_bytes()
  btrfs: unexport can_overcommit
  btrfs: move reserve_metadata_bytes and supporting code to space-info.c
  btrfs: move dump_space_info to space-info.c
  btrfs: export block_rsv_use_bytes
  btrfs: move btrfs_space_info_add_*_bytes to space-info.c
  btrfs: move the space info update macro to space-info.h
  ...
2019-07-16 15:12:56 -07:00
Linus Torvalds
c309b6f242 docs conversion for v5.3-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl0tpocACgkQCF8+vY7k
 4RWoxA//b/fmDXP3WPzrjjSmpyB9ml0/epKzPbT5S2j0lftqKBmet29k+PCjVrTx
 Nq2QauehY9ug5h8UMVUCmzPr95F0tSIGRoqk1vrn7z0K3q6k1SHrtvqbY1Bgb2Uk
 Qvh2YFU4fQLJg8WAbExCjxCdbdmBKQVGKTwCtM+tP5OMxwAFOmQrjGaUaKCKIIA2
 7Wzrx8CpSji+bJ3uK/d36c+4M9oDly5eaxBhoboL3BI0y+GqwiSASGwTO7BxrPOg
 0wq5IZHnqS8+bprT9xQdDOqf+UOY9U1cxE/+sqsHxblfUEx9gfLy/R+FLmJn+SS9
 Z3yLy4SqVHQMpWBjEAGodohikF60PAuTdymSC11jqFaKCUxWrIZg5xO+0blMrxPF
 7vYIexutCkaBMHBlNaNsHIqB7B/2FGGKoN7QW64hwvwJCGvF7OmJcV+R4bROGvh4
 nFuis9/Nm66Fq7I3aw37ThyZ0aWZdaQ0QJTH9ksxU/ZCz2hhMNYu/rXggrDvkS4U
 nr77ZT5Gd7nj4b110zf8+99uiGiinY6hTfzPAuTCLBhaxwrv4/xDHAhpwdEB5T4j
 8gOkxV8c0XWtL7sKqhGJvs/RRe2za0Y9XH6fyxsYfWcfuLjEvug8ouXMad9gxFWH
 DL3WnKJEMGLScei2wux4kGOwEbkR1bUf2cHJfh3GpCB/y8vgLOc=
 =smxY
 -----END PGP SIGNATURE-----

Merge tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull rst conversion of docs from Mauro Carvalho Chehab:
 "As agreed with Jon, I'm sending this big series directly to you, c/c
  him, as this series required a special care, in order to avoid
  conflicts with other trees"

* tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits)
  docs: kbuild: fix build with pdf and fix some minor issues
  docs: block: fix pdf output
  docs: arm: fix a breakage with pdf output
  docs: don't use nested tables
  docs: gpio: add sysfs interface to the admin-guide
  docs: locking: add it to the main index
  docs: add some directories to the main documentation index
  docs: add SPDX tags to new index files
  docs: add a memory-devices subdir to driver-api
  docs: phy: place documentation under driver-api
  docs: serial: move it to the driver-api
  docs: driver-api: add remaining converted dirs to it
  docs: driver-api: add xilinx driver API documentation
  docs: driver-api: add a series of orphaned documents
  docs: admin-guide: add a series of orphaned documents
  docs: cgroup-v1: add it to the admin-guide book
  docs: aoe: add it to the driver-api book
  docs: add some documentation dirs to the driver-api book
  docs: driver-model: move it to the driver-api book
  docs: lp855x-driver.rst: add it to the driver-api book
  ...
2019-07-16 12:21:41 -07:00
Linus Torvalds
2954152298 Merge branch 'proc-cmdline' (/proc/<pid>/cmdline fixes)
This fixes two problems reported with the cmdline simplification and
cleanup last year:

 - the setproctitle() special cases didn't quite match the original
   semantics, and it can be noticeable:

      https://lore.kernel.org/lkml/alpine.LNX.2.21.1904052326230.3249@kich.toxcorp.com/

 - it could leak an uninitialized byte from the temporary buffer under
   the right (wrong) circustances:

      https://lore.kernel.org/lkml/20190712160913.17727-1-izbyshev@ispras.ru/

It rewrites the logic entirely, splitting it into two separate commits
(and two separate functions) for the two different cases ("unedited
cmdline" vs "setproctitle() has been used to change the command line").

* proc-cmdline:
  /proc/<pid>/cmdline: add back the setproctitle() special case
  /proc/<pid>/cmdline: remove all the special cases
2019-07-16 10:37:27 -07:00
Linus Torvalds
d26d0cd97c /proc/<pid>/cmdline: add back the setproctitle() special case
This makes the setproctitle() special case very explicit indeed, and
handles it with a separate helper function entirely.  In the process, it
re-instates the original semantics of simply stopping at the first NUL
character when the original last NUL character is no longer there.

[ The original semantics can still be seen in mm/util.c: get_cmdline()
  that is limited to a fixed-size buffer ]

This makes the logic about when we use the string lengths etc much more
obvious, and makes it easier to see what we do and what the two very
different cases are.

Note that even when we allow walking past the end of the argument array
(because the setproctitle() might have overwritten and overflowed the
original argv[] strings), we only allow it when it overflows into the
environment region if it is immediately adjacent.

[ Fixed for missing 'count' checks noted by Alexey Izbyshev ]

Link: https://lore.kernel.org/lkml/alpine.LNX.2.21.1904052326230.3249@kich.toxcorp.com/
Fixes: 5ab8271899 ("fs/proc: simplify and clarify get_mm_cmdline() function")
Cc: Jakub Jankowski <shasta@toxcorp.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 09:57:52 -07:00
Linus Torvalds
3d712546d8 /proc/<pid>/cmdline: remove all the special cases
Start off with a clean slate that only reads exactly from arg_start to
arg_end, without any oddities.  This simplifies the code and in the
process removes the case that caused us to potentially leak an
uninitialized byte from the temporary kernel buffer.

Note that in order to start from scratch with an understandable base,
this simplifies things _too_ much, and removes all the legacy logic to
handle setproctitle() having changed the argument strings.

We'll add back those special cases very differently in the next commit.

Link: https://lore.kernel.org/lkml/20190712160913.17727-1-izbyshev@ispras.ru/
Fixes: f5b65348fd ("proc: fix missing final NUL in get_mm_cmdline() rewrite")
Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 09:57:52 -07:00