Commit Graph

60650 Commits

Author SHA1 Message Date
Vivek Goyal
51fecdd255 virtiofs: Do not end request in submission context
Submission context can hold some locks which end request code tries to hold
again and deadlock can occur. For example, fc->bg_lock. If a background
request is being submitted, it might hold fc->bg_lock and if we could not
submit request (because device went away) and tried to end request, then
deadlock happens. During testing, I also got a warning from deadlock
detection code.

So put requests on a list and end requests from a worker thread.

I got following warning from deadlock detector.

[  603.137138] WARNING: possible recursive locking detected
[  603.137142] --------------------------------------------
[  603.137144] blogbench/2036 is trying to acquire lock:
[  603.137149] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse]
[  603.140701]
[  603.140701] but task is already holding lock:
[  603.140703] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse]
[  603.140713]
[  603.140713] other info that might help us debug this:
[  603.140714]  Possible unsafe locking scenario:
[  603.140714]
[  603.140715]        CPU0
[  603.140716]        ----
[  603.140716]   lock(&(&fc->bg_lock)->rlock);
[  603.140718]   lock(&(&fc->bg_lock)->rlock);
[  603.140719]
[  603.140719]  *** DEADLOCK ***

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Miklos Szeredi
6c26f71759 fuse: don't advise readdirplus for negative lookup
If the FUSE_READDIRPLUS_AUTO feature is enabled, then lookups on a
directory before/during readdir are used as an indication that READDIRPLUS
should be used instead of READDIR.  However if the lookup turns out to be
negative, then selecting READDIRPLUS makes no sense.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Miklos Szeredi
2b319d1f6f fuse: don't dereference req->args on finished request
Move the check for async request after check for the request being already
finished and done with.

Reported-by: syzbot+ae0bb7aae3de6b4594e2@syzkaller.appspotmail.com
Fixes: d49937749f ("fuse: stop copying args to fuse_req")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 09:11:40 +02:00
Miklos Szeredi
3f22c74671 virtio-fs: don't show mount options
Virtio-fs does not accept any mount options, so it's confusing and wrong to
show any in /proc/mounts.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com> 
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-15 16:11:41 +02:00
Vivek Goyal
112e72373d virtio-fs: Change module name to virtiofs.ko
We have been calling it virtio_fs and even file name is virtio_fs.c. Module
name is virtio_fs.ko but when registering file system user is supposed to
specify filesystem type as "virtiofs".

Masayoshi Mizuma reported that he specified filesytem type as "virtio_fs"
and got this warning on console.

  ------------[ cut here ]------------
  request_module fs-virtio_fs succeeded, but still no fs?
  WARNING: CPU: 1 PID: 1234 at fs/filesystems.c:274 get_fs_type+0x12c/0x138
  Modules linked in: ... virtio_fs fuse virtio_net net_failover ...
  CPU: 1 PID: 1234 Comm: mount Not tainted 5.4.0-rc1 #1

So looks like kernel could find the module virtio_fs.ko but could not find
filesystem type after that.

It probably is better to rename module name to virtiofs.ko so that above
warning goes away in case user ends up specifying wrong fs name.

Reported-by: Masayoshi Mizuma <msys.mizuma@gmail.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-14 10:20:33 +02:00
Linus Torvalds
bb48a59135 for-5.4-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl2SDbMACgkQxWXV+ddt
 WDsUhw/9HcRsT6SlrwA2R5leHxCR5UMwT2Zmbxpfft37ANF0SC1UINHBfnmquM97
 xX6fdRSR9RUjF9DrdLPfLBnJDQ/MnHl1ruIVBFhJm6cJ9TJwf9E0TiJBQt+08JWg
 vy5hZBWvsPWWRBJ94XPMe4LtakK/isW4Cz5W9AdrC2Siqw69j6eZzms2AnIjyBjA
 BoKg4se2Ay2rMxLZWXIOj9374PU+N1cnRnqgh77ZxLku5WdCzrDfB5safE7UmoTG
 /MWJuuIgzOk0iQpQORRtEZDS1dNe5KT9m4xXkUbrZbQROwqnXrT1SVIsuqNAvlPk
 uaymR1W8nshepzpMlSxVydLv/mKWZNUGnDxOJ23ooow8Yd7ndppXEtFuGwCYqIFc
 xQqxuTLREvJ9+jpSv11bmDpk/ULRqpV+2PjUqGaWlGwFArJ+qFRLVGYx31eXmDPj
 t2mrPOcXGzY0pKtIpbkuUGleY/jeI+BNsvD4+QPs+jnp0nmfvH0/Rmp7grGqx2FI
 rQM8Gn4a5i3nuEDWLp8nN2wcKC3ePwy96Vp2tqfsl6TVTPx4EFzGLkWogHR2yiqI
 0LAj8YWFmWuChSv71wYOjX79CppjcbNwOakSwtDjV30jkwoh2f/0D3OwOpua2xe8
 75KQMaSB0kesGZz7ZkL1kMqA5m5w7MGZom6XZoBJ+bq2HPLB2jo=
 =2UM7
 -----END PGP SIGNATURE-----

Merge tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A bunch of fixes that accumulated in recent weeks, mostly material for
  stable.

  Summary:

   - fix for regression from 5.3 that prevents to use balance convert
     with single profile

   - qgroup fixes: rescan race, accounting leak with multiple writers,
     potential leak after io failure recovery

   - fix for use after free in relocation (reported by KASAN)

   - other error handling fixups"

* tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
  btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
  btrfs: Fix a regression which we can't convert to SINGLE profile
  btrfs: relocation: fix use-after-free on dead relocation roots
  Btrfs: fix race setting up and completing qgroup rescan workers
  Btrfs: fix missing error return if writeback for extent buffer never started
  btrfs: adjust dirty_metadata_bytes after writeback failure of extent buffer
  Btrfs: fix selftests failure due to uninitialized i_mode in test inodes
2019-09-30 10:25:24 -07:00
Linus Torvalds
1eb80d6ffb Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "A couple of misc patches"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs dynroot: switch to simple_dir_operations
  fs/handle.c - fix up kerneldoc
2019-09-29 19:42:07 -07:00
Linus Torvalds
7edee5229c 9 smb3 patches including an important patch for debugging traces with wireshark, and 3 patches for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl2Pzl0ACgkQiiy9cAdy
 T1F7aAv9EUA2vEdV+3tyKX17yGm8GBVygANsdMlGqqmRhauO0+KJnrsTR19qh9na
 oe0r6EwaS6/JwDtM/Tt0YyjyRS7GDyfT4cNNFVmrJ0fnQV11FJR0X83uzdm3HydH
 eOyKNG22TwOeFJ3kWqdvSI0AtfbmIcVoOlUAAKsAsv2ksrJIW7Q1BIgQeD8estUV
 j8VjPEIc1c/69UU/H5ktrRHMeT5PO61SV8xGM47WnYkntlFDe1E83xWGoxo996Pc
 KdGSrB1edWXK6kSlX3yQWnoo8QxcUm8IjgsudqcnOrhnro9s/cDU5ZU1RlXNQeB8
 LMtYwNA7jEu9p3TIibxOCph4gofUWNV25GbEJWOY03NxWReTvgLsMbsreul+XNv9
 fow5mvCG94SaE8xDjTvzYRBTeYoXv0WjlTTJjqAlVshirQXk7a2dEBVBipkn0Ea7
 0845c3NtR20pDGQs3vVzdStDT2MwkNUl1hN4vE1Zl0p2ClOS+eFVq9MgIEddSLi2
 Z0oJsmfg
 =o1/m
 -----END PGP SIGNATURE-----

Merge tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull more cifs updates from Steve French:
 "Fixes from the recent SMB3 Test events and Storage Developer
  Conference (held the last two weeks).

  Here are nine smb3 patches including an important patch for debugging
  traces with wireshark, with three patches marked for stable.

  Additional fixes from last week to better handle some newly discovered
  reparse points, and a fix the create/mkdir path for setting the mode
  more atomically (in SMB3 Create security descriptor context), and one
  for path name processing are still being tested so are not included
  here"

* tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix oplock handling for SMB 2.1+ protocols
  smb3: missing ACL related flags
  smb3: pass mode bits into create calls
  smb3: Add missing reparse tags
  CIFS: fix max ea value size
  fs/cifs/sess.c: Remove set but not used variable 'capabilities'
  fs/cifs/smb2pdu.c: Make SMB2_notify_init static
  smb3: fix leak in "open on server" perf counter
  smb3: allow decryption keys to be dumped by admin for debugging
2019-09-29 19:37:32 -07:00
Linus Torvalds
3f2dc2798b Merge branch 'entropy'
Merge active entropy generation updates.

This is admittedly partly "for discussion".  We need to have a way
forward for the boot time deadlocks where user space ends up waiting for
more entropy, but no entropy is forthcoming because the system is
entirely idle just waiting for something to happen.

While this was triggered by what is arguably a user space bug with
GDM/gnome-session asking for secure randomness during early boot, when
they didn't even need any such truly secure thing, the issue ends up
being that our "getrandom()" interface is prone to that kind of
confusion, because people don't think very hard about whether they want
to block for sufficient amounts of entropy.

The approach here-in is to decide to not just passively wait for entropy
to happen, but to start actively collecting it if it is missing.  This
is not necessarily always possible, but if the architecture has a CPU
cycle counter, there is a fair amount of noise in the exact timings of
reasonably complex loads.

We may end up tweaking the load and the entropy estimates, but this
should be at least a reasonable starting point.

As part of this, we also revert the revert of the ext4 IO pattern
improvement that ended up triggering the reported lack of external
entropy.

* getrandom() active entropy waiting:
  Revert "Revert "ext4: make __ext4_get_inode_loc plug""
  random: try to actively add entropy rather than passively wait for it
2019-09-29 19:25:39 -07:00
Linus Torvalds
02f03c4206 Revert "Revert "ext4: make __ext4_get_inode_loc plug""
This reverts commit 72dbcf7215.

Instead of waiting forever for entropy that may just not happen, we now
try to actively generate entropy when required, and are thus hopefully
avoiding the problem that caused the nice ext4 IO pattern fix to be
reverted.

So revert the revert.

Cc: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-29 17:59:23 -07:00
Linus Torvalds
9c5efe9ae7 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:

 - Apply a number of membarrier related fixes and cleanups, which fixes
   a use-after-free race in the membarrier code

 - Introduce proper RCU protection for tasks on the runqueue - to get
   rid of the subtle task_rcu_dereference() interface that was easy to
   get wrong

 - Misc fixes, but also an EAS speedup

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Avoid redundant EAS calculation
  sched/core: Remove double update_max_interval() call on CPU startup
  sched/core: Fix preempt_schedule() interrupt return comment
  sched/fair: Fix -Wunused-but-set-variable warnings
  sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
  sched/membarrier: Return -ENOMEM to userspace on memory allocation failure
  sched/membarrier: Skip IPIs when mm->mm_users == 1
  selftests, sched/membarrier: Add multi-threaded test
  sched/membarrier: Fix p->mm->membarrier_state racy load
  sched/membarrier: Call sync_core only before usermode for same mm
  sched/membarrier: Remove redundant check
  sched/membarrier: Fix private expedited registration check
  tasks, sched/core: RCUify the assignment of rq->curr
  tasks, sched/core: With a grace period after finish_task_switch(), remove unnecessary code
  tasks, sched/core: Ensure tasks are available for a grace period after leaving the runqueue
  tasks: Add a count of task RCU users
  sched/core: Convert vcpu_is_preempted() from macro to an inline function
  sched/fair: Remove unused cfs_rq_clock_task() function
2019-09-28 12:39:07 -07:00
Linus Torvalds
aefcf2f4b5 Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull kernel lockdown mode from James Morris:
 "This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.

  From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

  There are two major changes since this was last proposed for mainline:

   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/

   -  Implementation as an LSM, with a default stackable lockdown LSM
      module. This allows the lockdown feature to be policy-driven,
      rather than encoding an implicit policy within the mechanism.

  The new locked_down LSM hook is provided to allow LSMs to make a
  policy decision around whether kernel functionality that would allow
  tampering with or examining the runtime state of the kernel should be
  permitted.

  The included lockdown LSM provides an implementation with a simple
  policy intended for general purpose use. This policy provides a coarse
  level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

  Enable the kernel lockdown feature. If set to integrity, kernel features
  that allow userland to modify the running kernel are disabled. If set to
  confidentiality, kernel features that allow userland to extract
  confidential information from the kernel are also disabled.

  This may also be controlled via /sys/kernel/security/lockdown and
  overriden by kernel configuration.

  New or existing LSMs may implement finer-grained controls of the
  lockdown features. Refer to the lockdown_reason documentation in
  include/linux/security.h for details.

  The lockdown feature has had signficant design feedback and review
  across many subsystems. This code has been in linux-next for some
  weeks, with a few fixes applied along the way.

  Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
  when kernel lockdown is in confidentiality mode") is missing a
  Signed-off-by from its author. Matthew responded that he is providing
  this under category (c) of the DCO"

* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
  kexec: Fix file verification on S390
  security: constify some arrays in lockdown LSM
  lockdown: Print current->comm in restriction messages
  efi: Restrict efivar_ssdt_load when the kernel is locked down
  tracefs: Restrict tracefs when the kernel is locked down
  debugfs: Restrict debugfs when the kernel is locked down
  kexec: Allow kexec_file() with appropriate IMA policy when locked down
  lockdown: Lock down perf when in confidentiality mode
  bpf: Restrict bpf when kernel lockdown is in confidentiality mode
  lockdown: Lock down tracing and perf kprobes when in confidentiality mode
  lockdown: Lock down /proc/kcore
  x86/mmiotrace: Lock down the testmmiotrace module
  lockdown: Lock down module params that specify hardware parameters (eg. ioport)
  lockdown: Lock down TIOCSSERIAL
  lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
  acpi: Disable ACPI table override if the kernel is locked down
  acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
  ACPI: Limit access to custom_method when the kernel is locked down
  x86/msr: Restrict MSR access when the kernel is locked down
  x86: Lock down IO port access when the kernel is locked down
  ...
2019-09-28 08:14:15 -07:00
Linus Torvalds
298fb76a55 Highlights:
- add a new knfsd file cache, so that we don't have to open and
 	  close on each (NFSv2/v3) READ or WRITE.  This can speed up
 	  read and write in some cases.  It also replaces our readahead
 	  cache.
 	- Prevent silent data loss on write errors, by treating write
 	  errors like server reboots for the purposes of write caching,
 	  thus forcing clients to resend their writes.
 	- Tweak the code that allocates sessions to be more forgiving,
 	  so that NFSv4.1 mounts are less likely to hang when a server
 	  already has a lot of clients.
 	- Eliminate an arbitrary limit on NFSv4 ACL sizes; they should
 	  now be limited only by the backend filesystem and the
 	  maximum RPC size.
 	- Allow the server to enforce use of the correct kerberos
 	  credentials when a client reclaims state after a reboot.
 
 And some miscellaneous smaller bugfixes and cleanup.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl2OoFcVHGJmaWVsZHNA
 ZmllbGRzZXMub3JnAAoJECebzXlCjuG+dRoP/3OW1NxPjpjbCQWZL0M+O3AYJJla
 W8E+uoZKMosFEe/ymokMD0Vn5s47jPaMCifMjHZa2GygW8zHN9X2v0HURx/lob+o
 /rJXwMn78N/8kdbfDz2FvaCPeT0IuNzRIFBV8/sSXofqwCBwvPo+cl0QGrd4/xLp
 X35qlupx62TRk+kbdRjvv8kpS5SJ7BvR+FSA1WubNYWw2hpdEsr2OCFdGq2Wvthy
 DK6AfGBXfJGsOE+HGCSj6ejRV6i0UOJ17P8gRSsx+YT0DOe5E7ROjt+qvvRwk489
 wmR8Vjuqr1e40eGAUq3xuLfk5F5NgycY4ekVxk/cTVFNwWcz2DfdjXQUlyPAbrSD
 SqIyxN1qdKT24gtr7AHOXUWJzBYPWDgObCVBXUGzyL81RiDdhf38HRNjL2TcSDld
 tzCjQ0wbPw+iT74v6qQRY05oS+h3JOtDjU4pxsBnxVtNn4WhGJtaLfWW8o1C1QwU
 bc4aX3TlYhDmzU7n7Zjt4rFXGJfyokM+o6tPao1Z60Pmsv1gOk4KQlzLtW/jPHx4
 ZwYTwVQUKRDBfC62nmgqDyGI3/Qu11FuIxL2KXUCgkwDxNWN4YkwYjOGw9Lb5qKM
 wFpxq6CDNZB/IWLEu8Yg85kDPPUJMoI657lZb7Osr/MfBpU0YljcMOIzLBy8uV1u
 9COUbPaQipiWGu/0
 =diBo
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Highlights:

   - Add a new knfsd file cache, so that we don't have to open and close
     on each (NFSv2/v3) READ or WRITE. This can speed up read and write
     in some cases. It also replaces our readahead cache.

   - Prevent silent data loss on write errors, by treating write errors
     like server reboots for the purposes of write caching, thus forcing
     clients to resend their writes.

   - Tweak the code that allocates sessions to be more forgiving, so
     that NFSv4.1 mounts are less likely to hang when a server already
     has a lot of clients.

   - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
     limited only by the backend filesystem and the maximum RPC size.

   - Allow the server to enforce use of the correct kerberos credentials
     when a client reclaims state after a reboot.

  And some miscellaneous smaller bugfixes and cleanup"

* tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
  sunrpc: clean up indentation issue
  nfsd: fix nfs read eof detection
  nfsd: Make nfsd_reset_boot_verifier_locked static
  nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
  nfsd: handle drc over-allocation gracefully.
  nfsd: add support for upcall version 2
  nfsd: add a "GetVersion" upcall for nfsdcld
  nfsd: Reset the boot verifier on all write I/O errors
  nfsd: Don't garbage collect files that might contain write errors
  nfsd: Support the server resetting the boot verifier
  nfsd: nfsd_file cache entries should be per net namespace
  nfsd: eliminate an unnecessary acl size limit
  Deprecate nfsd fault injection
  nfsd: remove duplicated include from filecache.c
  nfsd: Fix the documentation for svcxdr_tmpalloc()
  nfsd: Fix up some unused variable warnings
  nfsd: close cached files prior to a REMOVE or RENAME that would replace target
  nfsd: rip out the raparms cache
  nfsd: have nfsd_test_lock use the nfsd_file cache
  nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
  ...
2019-09-27 17:00:27 -07:00
Linus Torvalds
8f744bdee4 add virtio-fs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXYx2zAAKCRDh3BK/laaZ
 PFpHAQD2G+F8a9e41jFTJg5YpNKMD8/Pl4T6v9chIO9qPXF2IAEAji0P1JterRfv
 ixiBhv54hSwYbk527nxNWE9tP5gAHAQ=
 =WCHy
 -----END PGP SIGNATURE-----

Merge tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse virtio-fs support from Miklos Szeredi:
 "Virtio-fs allows exporting directory trees on the host and mounting
  them in guest(s).

  This isn't actually a new filesystem, but a glue layer between the
  fuse filesystem and a virtio based back-end.

  It's similar in functionality to the existing virtio-9p solution, but
  significantly faster in benchmarks and has better POSIX compliance.
  Further permformance improvements can be achieved by sharing the page
  cache between host and guest, allowing for faster I/O and reduced
  memory use.

  Kata Containers have been including the out-of-tree virtio-fs (with
  the shared page cache patches as well) since version 1.7 as an
  experimental feature. They have been active in development and plan to
  switch from virtio-9p to virtio-fs as their default solution. There
  has been interest from other sources as well.

  The userspace infrastructure is slated to be merged into qemu once the
  kernel part hits mainline.

  This was developed by Vivek Goyal, Dave Gilbert and Stefan Hajnoczi"

* tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  virtio-fs: add virtiofs filesystem
  virtio-fs: add Documentation/filesystems/virtiofs.rst
  fuse: reserve values for mapping protocol
2019-09-27 15:54:24 -07:00
Linus Torvalds
9977b1a714 9p pull request for inclusion in 5.4
Small fixes all around:
  - avoid overlayfs copy-up for PRIVATE mmaps
  - KUMSAN uninitialized warning for transport error
  - one syzbot memory leak fix in 9p cache
  - internal API cleanup for v9fs_fill_super
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAl2OG0MACgkQq06b7GqY
 5nB21Q/+P4kD+wBg4d1b74Mf+oK2p2cSvCnYHcwMgkRsuyFgu7JXLSuRy2DLlAa4
 F+92x0jvit+yUy8bFqlEsSGGmE/dZC3WfpBOVChZRT/9ZLcxNasowtGDaySrX1e1
 tmx5QfYsNL840Cv8Vfos4QJjze1oE6RmSCUUO4KyC250PO/jxEgWGVPv1S8LfTUb
 KyhkcKlJ+TCqEON9nZ+CrQ4rndHNc7LxtBjVR0M03UYW55kYAhMLSj/AWqKA5I0q
 tARlTSgHHRdj4EJvWvLw7INg7BuvTuuyIzCIo/RfpUvijTDyLXFQI3qgtXosKLSq
 wuOacq8ykDRv8nvpLJ98daCw0pQOF2Qw40niFfl8Y7AxnaXggMI+UsVX/BnigVWX
 E6kaJm6d/YERee/f7aBsJkjKbwRqO7tMbNDrbK2/AzAaZo92dnB5nsEJnvKtCdjl
 aylBujCJq/DPp/SOf0Rl8mr24GdYVIk04hvqpxr4X6a/sacO3EUMh8GkGD7muY63
 axbYildLQHyChEn0TNdicFFrVie3A984eiK8sht+visd7c5QRm9iu1lsYtXdVGuz
 0OvSRtJuTDCTs6s1DhhVnez9luzH4VZJAj0RYJjcF1M4tzy0sKWUSSUf3RUbyxmY
 bocklihTlI5UzZfCP/wChXJ26+QrmcL7j0PrKA0NJy7ooFyT5eI=
 =VBrx
 -----END PGP SIGNATURE-----

Merge tag '9p-for-5.4' of git://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:
 "Some of the usual small fixes and cleanup.

  Small fixes all around:
   - avoid overlayfs copy-up for PRIVATE mmaps
   - KUMSAN uninitialized warning for transport error
   - one syzbot memory leak fix in 9p cache
   - internal API cleanup for v9fs_fill_super"

* tag '9p-for-5.4' of git://github.com/martinetd/linux:
  9p/vfs_super.c: Remove unused parameter data in v9fs_fill_super
  9p/cache.c: Fix memory leak in v9fs_cache_session_get_cookie
  9p: Transport error uninitialized
  9p: avoid attaching writeback_fid on mmap with type PRIVATE
2019-09-27 15:10:34 -07:00
Linus Torvalds
738f531d87 for-5.4/io_uring-2019-09-27
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl2OIu4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsedD/4h54330vuq66DsGqzFLonLwFl5YHC5NeJX
 aV38j7pAUPHvr9CSnr3d2VwTk/ThBtPI50I9/d9SXh8n1oAAA5C/+nPf1XknME47
 giKHr3eb0FNLOySt/Ry284gla8mO0GZM83zUbDMnF0N+tfwAtFvbvgCpsPFK9vdL
 xNzMLsq6++pj7p9c6IXd+zv0nmJzDikZQz6PtU1KYlbfnU7hh/cP3CrIHIPtGbyk
 c/7tfbKTB9UnbW5guGakZt3BzNViJpK28SKn+S6AlLiEOYpC+55dBbaZQIy5qxHv
 CZsx0GJIw0Ya0Lw3UEFp/74krLHq2610jmx/va8P7MZCjZAR675G3mjxKUnC/+SY
 mEdLo6vghMNAIqMBWNu59CFQOPnqa8sqRii0q6cRWXSqKiFr1FLN8mstb3Ghh9K0
 kGVA/gw6ESWB/e/X6I+pD6pTm6O6BPWEqBzGAWSvavQQIP9YpIzf5j+k3JsRu03/
 IIzR6gW9k9u4k0rFlOJKbp1+AO5sK3VtJFR8JGELiRwwgjD91w50gjPYak5OGM37
 Mi7OHCxqtwFGTkSvT6RM6om6onBsizrveszkrPUO01bWYIHHbtu6ofLyQlfnEtpv
 qbGZtLW6KYj9VsIKZNDfg99Ff79IAOiAZDbXWAu/JKyg/gu1Y9uOiVkNFPJGPNHV
 8ourcldMGg==
 =DEYH
 -----END PGP SIGNATURE-----

Merge tag 'for-5.4/io_uring-2019-09-27' of git://git.kernel.dk/linux-block

Pull more io_uring updates from Jens Axboe:
 "Just two things in here:

   - Improvement to the io_uring CQ ring wakeup for batched IO (me)

   - Fix wrong comparison in poll handling (yangerkun)

  I realize the first one is a little late in the game, but it felt
  pointless to hold it off until the next release. Went through various
  testing and reviews with Pavel and peterz"

* tag 'for-5.4/io_uring-2019-09-27' of git://git.kernel.dk/linux-block:
  io_uring: make CQ ring wakeups be more efficient
  io_uring: compare cached_cq_tail with cq.head in_io_uring_poll
2019-09-27 12:08:24 -07:00
Qu Wenruo
d4e204948f btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
[BUG]
The following script can cause btrfs qgroup data space leak:

  mkfs.btrfs -f $dev
  mount $dev -o nospace_cache $mnt

  btrfs subv create $mnt/subv
  btrfs quota en $mnt
  btrfs quota rescan -w $mnt
  btrfs qgroup limit 128m $mnt/subv

  for (( i = 0; i < 3; i++)); do
          # Create 3 64M holes for latter fallocate to fail
          truncate -s 192m $mnt/subv/file
          xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
          xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
          sync

          # it's supposed to fail, and each failure will leak at least 64M
          # data space
          xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
          rm $mnt/subv/file
          sync
  done

  # Shouldn't fail after we removed the file
  xfs_io -f -c "falloc 0 64m" $mnt/subv/file

[CAUSE]
Btrfs qgroup data reserve code allow multiple reservations to happen on
a single extent_changeset:
E.g:
	btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M);
	btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M);
	btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_4M);

Btrfs qgroup code has its internal tracking to make sure we don't
double-reserve in above example.

The only pattern utilizing this feature is in the main while loop of
btrfs_fallocate() function.

However btrfs_qgroup_reserve_data()'s error handling has a bug in that
on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED
flag but doesn't free previously reserved bytes.

This bug has a two fold effect:
- Clearing EXTENT_QGROUP_RESERVED ranges
  This is the correct behavior, but it prevents
  btrfs_qgroup_check_reserved_leak() to catch the leakage as the
  detector is purely EXTENT_QGROUP_RESERVED flag based.

- Leak the previously reserved data bytes.

The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
the last one fails, leaking space reserved in the previous ones.

[FIX]
Also free previously reserved data bytes when btrfs_qgroup_reserve_data
fails.

Fixes: 5247255370 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-27 15:24:34 +02:00
Qu Wenruo
bab32fc069 btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
[BUG]
Under the following case with qgroup enabled, if some error happened
after we have reserved delalloc space, then in error handling path, we
could cause qgroup data space leakage:

From btrfs_truncate_block() in inode.c:

	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
					   block_start, blocksize);
	if (ret)
		goto out;

 again:
	page = find_or_create_page(mapping, index, mask);
	if (!page) {
		btrfs_delalloc_release_space(inode, data_reserved,
					     block_start, blocksize, true);
		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
		ret = -ENOMEM;
		goto out;
	}

[CAUSE]
In the above case, btrfs_delalloc_reserve_space() will call
btrfs_qgroup_reserve_data() and mark the io_tree range with
EXTENT_QGROUP_RESERVED flag.

In the error handling path, we have the following call stack:
btrfs_delalloc_release_space()
|- btrfs_free_reserved_data_space()
   |- btrsf_qgroup_free_data()
      |- __btrfs_qgroup_release_data(reserved=@reserved, free=1)
         |- qgroup_free_reserved_data(reserved=@reserved)
            |- clear_record_extent_bits();
            |- freed += changeset.bytes_changed;

However due to a completion bug, qgroup_free_reserved_data() will clear
EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, other
than the correct BTRFS_I(inode)->io_tree.
Since io_failure_tree is never marked with that flag,
btrfs_qgroup_free_data() will not free any data reserved space at all,
causing a leakage.

This type of error handling can only be triggered by errors outside of
qgroup code. So EDQUOT error from qgroup can't trigger it.

[FIX]
Fix the wrong target io_tree.

Reported-by: Josef Bacik <josef@toxicpanda.com>
Fixes: bc42bda223 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-27 15:24:28 +02:00
Pavel Shilovsky
a016e2794f CIFS: Fix oplock handling for SMB 2.1+ protocols
There may be situations when a server negotiates SMB 2.1
protocol version or higher but responds to a CREATE request
with an oplock rather than a lease.

Currently the client doesn't handle such a case correctly:
when another CREATE comes in the server sends an oplock
break to the initial CREATE and the client doesn't send
an ack back due to a wrong caching level being set (READ
instead of RWH). Missing an oplock break ack makes the
server wait until the break times out which dramatically
increases the latency of the second CREATE.

Fix this by properly detecting oplocks when using SMB 2.1
protocol version and higher.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-26 16:42:44 -05:00
Steve French
ff3ee62a55 smb3: missing ACL related flags
Various SMB3 ACL related flags (for security descriptor and
ACEs for example) were missing and some fields are different
in SMB3 and CIFS. Update cifsacl.h definitions based on
current MS-DTYP specification.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2019-09-26 16:37:43 -05:00
Linus Torvalds
972a2bf7df NFS Client Updates for Linux 5.3
Stable bugfixes:
 - Dequeue the request from the receive queue while we're re-encoding # v4.20+
 - Fix buffer handling of GSS MIC without slack # 5.1
 
 Features:
 - Increase xprtrdma maximum transport header and slot table sizes
 - Add support for nfs4_call_sync() calls using a custom rpc_task_struct
 - Optimize the default readahead size
 - Enable pNFS filelayout LAYOUTGET on OPEN
 
 Other bugfixes and cleanups:
 - Fix possible null-pointer dereferences and memory leaks
 - Various NFS over RDMA cleanups
 - Various NFS over RDMA comment updates
 - Don't receive TCP data into a reset request buffer
 - Don't try to parse incomplete RPC messages
 - Fix congestion window race with disconnect
 - Clean up pNFS return-on-close error handling
 - Fixes for NFS4ERR_OLD_STATEID handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl2NC04ACgkQ18tUv7Cl
 QOs4Tg//bAlGs+dIKixAmeMKmTd6I34laUnuyV/12yPQDgo6bryLrTngfe2BYvmG
 2l+8H7yHfR4/gQE4vhR0c15xFgu6pvjBGR0/nNRaXienIPXO4xsQkcaxVA7SFRY2
 HjffZwyoBfjyRps0jL+2sTsKbRtSkf9Dn+BONRgesg51jK1jyWkXqXpmgi4uMO4i
 ojpTrW81dwo7Yhv08U2A/Q1ifMJ8F9dVYuL5sm+fEbVI/Nxoz766qyB8rs8+b4Xj
 3gkfyh/Y1zoMmu6c+r2Q67rhj9WYbDKpa6HH9yX1zM/RLTiU7czMX+kjuQuOHWxY
 YiEk73NjJ48WJEep3odess1q/6WiAXX7UiJM1SnDFgAa9NZMdfhqMm6XduNO1m60
 sy0i8AdxdQciWYexOXMsBuDUCzlcoj4WYs1QGpY3uqO1MznQS/QUfu65fx8CzaT5
 snm6ki5ivqXH/js/0Z4MX2n/sd1PGJ5ynMkekxJ8G3gw+GC/oeSeGNawfedifLKK
 OdzyDdeiel5Me1p4I28j1WYVLHvtFmEWEU9oytdG0D/rjC/pgYgW/NYvAao8lQ4Z
 06wdcyAM66ViAPrbYeE7Bx4jy8zYRkiw6Y3kIbLgrlMugu3BhIW5Mi3BsgL4f4am
 KsqkzUqPZMCOVwDuUILSuPp4uHaR+JTJttywiLniTL6reF5kTiA=
 =4Ey6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Dequeue the request from the receive queue while we're re-encoding
     # v4.20+
   - Fix buffer handling of GSS MIC without slack # 5.1

  Features:
   - Increase xprtrdma maximum transport header and slot table sizes
   - Add support for nfs4_call_sync() calls using a custom
     rpc_task_struct
   - Optimize the default readahead size
   - Enable pNFS filelayout LAYOUTGET on OPEN

  Other bugfixes and cleanups:
   - Fix possible null-pointer dereferences and memory leaks
   - Various NFS over RDMA cleanups
   - Various NFS over RDMA comment updates
   - Don't receive TCP data into a reset request buffer
   - Don't try to parse incomplete RPC messages
   - Fix congestion window race with disconnect
   - Clean up pNFS return-on-close error handling
   - Fixes for NFS4ERR_OLD_STATEID handling"

* tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
  pNFS/filelayout: enable LAYOUTGET on OPEN
  NFS: Optimise the default readahead size
  NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
  NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
  NFSv4: Fix OPEN_DOWNGRADE error handling
  pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
  NFSv4: Add a helper to increment stateid seqids
  NFSv4: Handle RPC level errors in LAYOUTRETURN
  NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
  NFSv4: Clean up pNFS return-on-close error handling
  pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
  NFS: remove unused check for negative dentry
  NFSv3: use nfs_add_or_obtain() to create and reference inodes
  NFS: Refactor nfs_instantiate() for dentry referencing callers
  SUNRPC: Fix congestion window race with disconnect
  SUNRPC: Don't try to parse incomplete RPC messages
  SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
  SUNRPC: Fix buffer handling of GSS MIC without slack
  SUNRPC: RPC level errors should always set task->tk_rpc_status
  SUNRPC: Don't receive TCP data into a request buffer that has been reset
  ...
2019-09-26 12:20:14 -07:00
Kees Cook
7be3cb019d binfmt_elf: Do not move brk for INTERP-less ET_EXEC
When brk was moved for binaries without an interpreter, it should have
been limited to ET_DYN only. In other words, the special case was an
ET_DYN that lacks an INTERP, not just an executable that lacks INTERP.
The bug manifested for giant static executables, where the brk would end
up in the middle of the text area on 32-bit architectures.

Reported-and-tested-by: Richard Kojedzinszky <richard@kojedz.in>
Fixes: bbdc6076d2 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26 11:38:55 -07:00
Linus Torvalds
2268419e4c Changes since last update:
- Minor code cleanups.
 - Fix a superblock logging error.
 - Ensure that collapse range converts the data fork to extents format
   when necessary.
 - Revert the ALLOC_USERDATA cleanup because it caused subtle
   behavior regressions.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl2KR6oACgkQ+H93GTRK
 tOvtVA//Q6iqCTYnvbIHgNMMP6WvD/qL/G1bTm+ql7KJEyd99PC6EATY3IW2DP2Q
 gwj1VDid362eF65XiLoe/4zjhj1SmrpX92B9jDBt3n47sHLnDIjtT+54jo0n51xm
 MW+79qPC1luT16pdgurkJo7jGrR5Zj5xcUlNj8qVfH9fD9ZAUu2+OzTD+iWn8qvu
 L4geuMG2WKinxlfP6v4y5Fn7X9PiWM14N2N2p+N6ITuhdxyKch/EohI/P0syUcTJ
 Ad1za0PFFI+BtI4F9kWWBgoGe0oCqCeU6xCGkK5HJ9XTPmJrlU6WKk5lQJs6T92z
 OEAQfWkb/Yc7YbSP+CATH1vBIYQZzvhVXkHd0q1JBqhLOPeBnLhO/gVefjSDBgIq
 KlYiGlCn3Rme28KNUX+o9/JAsOVpwnxL1GIUAu4v0V2GQQIx7G0WCykiMgRjl0sm
 iCsarAev1eHPVsRWArdR1fFrOQ206tvMTz4zSREYVFMsHn0Zq3c9j3R78azMaUA6
 tbAfPlp7p0M90u3uC4FrZHyPPu6VHLNuE82zAwi8oLD1kCO9wxluF5gRg1UVOiB5
 Rp4Y/6BeO75039aZka7+5iQzzhiudezsXK138YGBx40nW+2nW2ZLWIc3Kkm2mGl3
 aJ0xsMusAv2q/lVKyg7280EjlkPYcHLTkaZ9+hCkmd3A9jcb/l8=
 =D/XD
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.4-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "There are a couple of bug fixes and some small code cleanups that came
  in recently:

   - Minor code cleanups

   - Fix a superblock logging error

   - Ensure that collapse range converts the data fork to extents format
     when necessary

   - Revert the ALLOC_USERDATA cleanup because it caused subtle behavior
     regressions"

* tag 'xfs-5.4-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: avoid unused to_mp() function warning
  xfs: log proper length of superblock
  xfs: revert 1baa2800e6 ("xfs: remove the unused XFS_ALLOC_USERDATA flag")
  xfs: removed unneeded variable
  xfs: convert inode to extent format after extent merge due to shift
2019-09-26 11:36:20 -07:00
Linus Torvalds
dadedd8563 Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull jffs2 fix from Al Viro:
 "braino fix for mount API conversion for jffs2"

* 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  jffs2: Fix mounting under new mount API
2019-09-26 11:33:30 -07:00
Linus Torvalds
cbafe18c71 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - almost all of the rest of -mm

 - various other subsystems

Subsystems affected by this patch series:
  memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork,
  cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise,
  cleanups, pagemap

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (77 commits)
  arch/sparc/include/asm/pgtable_64.h: fix build
  mm: treewide: clarify pgtable_page_{ctor,dtor}() naming
  ntfs: remove (un)?likely() from IS_ERR() conditions
  IB/hfi1: remove unlikely() from IS_ERR*() condition
  xfs: remove unlikely() from WARN_ON() condition
  wimax/i2400m: remove unlikely() from WARN*() condition
  fs: remove unlikely() from WARN_ON() condition
  xen/events: remove unlikely() from WARN() condition
  checkpatch: check for nested (un)?likely() calls
  hexagon: drop empty and unused free_initrd_mem
  mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
  mm: introduce MADV_PAGEOUT
  mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
  mm: introduce MADV_COLD
  mm: untag user pointers in mmap/munmap/mremap/brk
  vfio/type1: untag user pointers in vaddr_get_pfn
  tee/shm: untag user pointers in tee_shm_register
  media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
  drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
  drm/amdgpu: untag user pointers
  ...
2019-09-26 10:29:42 -07:00
Denis Efremov
cc22c800e1 ntfs: remove (un)?likely() from IS_ERR() conditions
"likely(!IS_ERR(x))" is excessive. IS_ERR() already uses
unlikely() internally.

Link: http://lkml.kernel.org/r/20190829165025.15750-11-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26 10:10:44 -07:00
Denis Efremov
14ed868807 xfs: remove unlikely() from WARN_ON() condition
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely()
internally.

Link: http://lkml.kernel.org/r/20190829165025.15750-7-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26 10:10:30 -07:00
Denis Efremov
7159d54418 fs: remove unlikely() from WARN_ON() condition
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely()
internally.

Link: http://lkml.kernel.org/r/20190829165025.15750-5-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26 10:10:30 -07:00
David Howells
a3bc18a48e jffs2: Fix mounting under new mount API
The mounting of jffs2 is broken due to the changes from the new mount API
because it specifies a "source" operation, but then doesn't actually
process it.  But because it specified it, it doesn't return -ENOPARAM and
the caller doesn't process it either and the source gets lost.

Fix this by simply removing the source parameter from jffs2 and letting the
VFS deal with it in the default manner.

To test it, enable CONFIG_MTD_MTDRAM and allow the default size and erase
block size parameters, then try and mount the /dev/mtdblock<N> file that
that creates as jffs2.  No need to initialise it.

Fixes: ec10a24f10 ("vfs: Convert jffs2 to use the new mount API")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Woodhouse <dwmw2@infradead.org>
cc: Richard Weinberger <richard@nod.at>
cc: linux-mtd@lists.infradead.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-26 10:26:55 -04:00
Jens Axboe
bda521624e io_uring: make CQ ring wakeups be more efficient
For batched IO, it's not uncommon for waiters to ask for more than 1
IO to complete before being woken up. This is a problem with
wait_event() since tasks will get woken for every IO that completes,
re-check condition, then go back to sleep. For batch counts on the
order of what you do for high IOPS, that can result in 10s of extra
wakeups for the waiting task.

Add a private wake function that checks for the wake up count criteria
being met before calling autoremove_wake_function(). Pavel reports that
one test case he has runs 40% faster with proper batching of wakeups.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26 03:55:40 -06:00
Steve French
c3ca78e217 smb3: pass mode bits into create calls
We need to populate an ACL (security descriptor open context)
on file and directory correct.  This patch passes in the
mode.  Followon patch will build the open context and the
security descriptor (from the mode) that goes in the open
context.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2019-09-26 02:06:42 -05:00
Andrey Konovalov
7d0325749a userfaultfd: untag user pointers
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

userfaultfd code use provided user pointers for vma lookups, which can
only by done with untagged pointers.

Untag user pointers in validate_range().

Link: http://lkml.kernel.org/r/cdc59ddd7011012ca2e689bc88c3b65b1ea7e413.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:41 -07:00
Andrey Konovalov
ed8a66b832 fs/namespace: untag user pointers in copy_mount_options
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

In copy_mount_options a user address is being subtracted from TASK_SIZE.
If the address is lower than TASK_SIZE, the size is calculated to not
allow the exact_copy_from_user() call to cross TASK_SIZE boundary.
However if the address is tagged, then the size will be calculated
incorrectly.

Untag the address before subtracting.

Link: http://lkml.kernel.org/r/1de225e4a54204bfd7f25dac2635e31aa4aa1d90.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:41 -07:00
Markus Elfring
aadc4e01db fat: delete an unnecessary check before brelse()
brelse() tests whether its argument is NULL and then returns immediately.
Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Link: http://lkml.kernel.org/r/cfff3b81-fb5d-af26-7b5e-724266509045@web.de
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
Jason Yan
b25bab1722 fs/reiserfs/do_balan.c: remove set but not used variable
Fix the following gcc warning:

fs/reiserfs/do_balan.c: In function balance_leaf_insert_right:
fs/reiserfs/do_balan.c:629:6: warning: variable ret set but not used
[-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/20190827032932.46622-2-yanaijie@huawei.com
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: zhengbin <zhengbin13@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
Jason Yan
3e9fd5a48c fs/reiserfs/journal.c: remove set but not used variable
Fix the following gcc warning:

fs/reiserfs/journal.c: In function flush_used_journal_lists:
fs/reiserfs/journal.c:1791:6: warning: variable ret set but not used
[-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/20190827032932.46622-1-yanaijie@huawei.com
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: zhengbin <zhengbin13@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
zhengbin
da5184c2ab fs/reiserfs/do_balan.c: remove set but not used variables
fs/reiserfs/do_balan.c: In function balance_leaf_when_delete:
fs/reiserfs/do_balan.c:245:20: warning: variable ih set but not used [-Wunused-but-set-variable]
fs/reiserfs/do_balan.c: In function balance_leaf_insert_left:
fs/reiserfs/do_balan.c:301:7: warning: variable version set but not used [-Wunused-but-set-variable]
fs/reiserfs/do_balan.c: In function balance_leaf_insert_right:
fs/reiserfs/do_balan.c:649:7: warning: variable version set but not used [-Wunused-but-set-variable]
fs/reiserfs/do_balan.c: In function balance_leaf_new_nodes_insert:
fs/reiserfs/do_balan.c:953:7: warning: variable version set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/1566379929-118398-8-git-send-email-zhengbin13@huawei.com
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
zhengbin
4fadcd1c14 fs/reiserfs/fix_node.c: remove set but not used variables
fs/reiserfs/fix_node.c: In function get_num_ver:
fs/reiserfs/fix_node.c:379:6: warning: variable cur_free set but not used [-Wunused-but-set-variable]
fs/reiserfs/fix_node.c: In function dc_check_balance_internal:
fs/reiserfs/fix_node.c:1737:6: warning: variable maxsize set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/1566379929-118398-7-git-send-email-zhengbin13@huawei.com
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
zhengbin
73fbff5eea fs/reiserfs/prints.c: remove set but not used variables
Fixes gcc '-Wunused-but-set-variable' warning:

fs/reiserfs/prints.c: In function check_internal_block_head:
fs/reiserfs/prints.c:749:21: warning: variable blkh set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/1566379929-118398-6-git-send-email-zhengbin13@huawei.com
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
zhengbin
4a70aebb12 fs/reiserfs/objectid.c: remove set but not used variables
Fixes gcc '-Wunused-but-set-variable' warning:

fs/reiserfs/objectid.c: In function reiserfs_convert_objectid_map_v1:
fs/reiserfs/objectid.c:186:25: warning: variable new_objectid_map set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/1566379929-118398-5-git-send-email-zhengbin13@huawei.com
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
zhengbin
d4a1a857e3 fs/reiserfs/lbalance.c: remove set but not used variables
Fixes gcc '-Wunused-but-set-variable' warning:

fs/reiserfs/lbalance.c: In function leaf_paste_entries:
fs/reiserfs/lbalance.c:1325:9: warning: variable old_entry_num set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/1566379929-118398-4-git-send-email-zhengbin13@huawei.com
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
zhengbin
66985cb9ee fs/reiserfs/stree.c: remove set but not used variables
Fixes gcc '-Wunused-but-set-variable' warning:

fs/reiserfs/stree.c: In function search_by_key:
fs/reiserfs/stree.c:596:6: warning: variable right_neighbor_of_leaf_node set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/1566379929-118398-3-git-send-email-zhengbin13@huawei.com
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
zhengbin
6e9ca45f77 fs/reiserfs/journal.c: remove set but not used variables
Fixes gcc '-Wunused-but-set-variable' warning:

fs/reiserfs/journal.c: In function flush_older_commits:
fs/reiserfs/journal.c:894:15: warning: variable first_trans_id set but not used [-Wunused-but-set-variable]
fs/reiserfs/journal.c: In function flush_journal_list:
fs/reiserfs/journal.c:1354:38: warning: variable last set but not used [-Wunused-but-set-variable]
fs/reiserfs/journal.c: In function do_journal_release:
fs/reiserfs/journal.c:1916:6: warning: variable flushed set but not used [-Wunused-but-set-variable]
fs/reiserfs/journal.c: In function do_journal_end:
fs/reiserfs/journal.c:3993:6: warning: variable old_start set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/1566379929-118398-2-git-send-email-zhengbin13@huawei.com
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
Jia-Ju Bai
d256085be1 fs: reiserfs: remove unnecessary check of bh in remove_from_transaction()
On lines 3430-3434, bh has been assured to be non-null:
    cn = get_journal_hash_dev(sb, journal->j_hash_table, blocknr);
    if (!cn || !cn->bh) {
        return ret;
    }
    bh = cn->bh;

Thus, the check of bh on line 3447 is unnecessary and can be removed.
Thank Andrew Morton for good advice.

Link: http://lkml.kernel.org/r/20190727084019.11307-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Cc: Bharath Vedartham <linux.bhar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:39 -07:00
Linus Torvalds
f41def3971 The highlights are:
- automatic recovery of a blacklisted filesystem session (Zheng Yan).
   This is disabled by default and can be enabled by mounting with the
   new "recover_session=clean" option.
 
 - serialize buffered reads and O_DIRECT writes (Jeff Layton).  Care is
   taken to avoid serializing O_DIRECT reads and writes with each other,
   this is based on the exclusion scheme from NFS.
 
 - handle large osdmaps better in the face of fragmented memory (myself)
 
 - don't limit what security.* xattrs can be get or set (Jeff Layton).
   We were overly restrictive here, unnecessarily preventing things like
   file capability sets stored in security.capability from working.
 
 - allow copy_file_range() within the same inode and across different
   filesystems within the same cluster (Luis Henriques)
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl2LoD8THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzixRYB/9H5S4fif8Pn9eiXkIJiT9UZR/o7S1k
 ikfQNPeDxlBLKnoZXpDp2HqCu1/YuCcJ0zpZzPGGrKECZb7r+NaayxhmEXAZ+Vsg
 YwsO3eNHBbb58pe9T4oiHp19sflwcTOeNwg8wlvmvrgfBupFz2pU8Xm72EdFyoYm
 tP0QNTOCAuQK3pJcgozaptAO1TzBL3LomyVM0YzAKcumgMg47zALpaSLWJLGtDLM
 5+5WLvcVfBGLVv60h4B62ldS39eBxqTsFodcRMUaqAsnhLK70HVfKlwR3GgtZggr
 PDqbsuIfw/O3b65U2XDKZt1P9dyG3OE/ucueduXUxJPYNGmooEE+PpE+
 =DRVP
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The highlights are:

   - automatic recovery of a blacklisted filesystem session (Zheng Yan).
     This is disabled by default and can be enabled by mounting with the
     new "recover_session=clean" option.

   - serialize buffered reads and O_DIRECT writes (Jeff Layton). Care is
     taken to avoid serializing O_DIRECT reads and writes with each
     other, this is based on the exclusion scheme from NFS.

   - handle large osdmaps better in the face of fragmented memory
     (myself)

   - don't limit what security.* xattrs can be get or set (Jeff Layton).
     We were overly restrictive here, unnecessarily preventing things
     like file capability sets stored in security.capability from
     working.

   - allow copy_file_range() within the same inode and across different
     filesystems within the same cluster (Luis Henriques)"

* tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client: (41 commits)
  ceph: call ceph_mdsc_destroy from destroy_fs_client
  libceph: use ceph_kvmalloc() for osdmap arrays
  libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc()
  ceph: allow object copies across different filesystems in the same cluster
  ceph: include ceph_debug.h in cache.c
  ceph: move static keyword to the front of declarations
  rbd: pull rbd_img_request_create() dout out into the callers
  ceph: reconnect connection if session hang in opening state
  libceph: drop unused con parameter of calc_target()
  ceph: use release_pages() directly
  rbd: fix response length parameter for encoded strings
  ceph: allow arbitrary security.* xattrs
  ceph: only set CEPH_I_SEC_INITED if we got a MAC label
  ceph: turn ceph_security_invalidate_secctx into static inline
  ceph: add buffered/direct exclusionary locking for reads and writes
  libceph: handle OSD op ceph_pagelist_append() errors
  ceph: don't return a value from void function
  ceph: don't freeze during write page faults
  ceph: update the mtime when truncating up
  ceph: fix indentation in __get_snap_name()
  ...
2019-09-25 10:21:13 -07:00
Linus Torvalds
7b1373dd6e fuse update for 5.4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXYod6wAKCRDh3BK/laaZ
 PG3fAP9WXuvUeYh3X7ThQPa2D33VCIMJRd6t+1TVSSc/H8P3dAD/ehN5HIWjnmzz
 iZFc3zDtO9UCJUe23IZomblxOQbu6Qk=
 =I0S2
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Continue separating the transport (user/kernel communication) and the
   filesystem layers of fuse. Getting rid of most layering violations
   will allow for easier cleanup and optimization later on.

 - Prepare for the addition of the virtio-fs filesystem. The actual
   filesystem will be introduced by a separate pull request.

 - Convert to new mount API.

 - Various fixes, optimizations and cleanups.

* tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (55 commits)
  fuse: Make fuse_args_to_req static
  fuse: fix memleak in cuse_channel_open
  fuse: fix beyond-end-of-page access in fuse_parse_cache()
  fuse: unexport fuse_put_request
  fuse: kmemcg account fs data
  fuse: on 64-bit store time in d_fsdata directly
  fuse: fix missing unlock_page in fuse_writepage()
  fuse: reserve byteswapped init opcodes
  fuse: allow skipping control interface and forced unmount
  fuse: dissociate DESTROY from fuseblk
  fuse: delete dentry if timeout is zero
  fuse: separate fuse device allocation and installation in fuse_conn
  fuse: add fuse_iqueue_ops callbacks
  fuse: extract fuse_fill_super_common()
  fuse: export fuse_dequeue_forget() function
  fuse: export fuse_get_unique()
  fuse: export fuse_send_init_request()
  fuse: export fuse_len_args()
  fuse: export fuse_end_request()
  fuse: fix request limit
  ...
2019-09-25 09:55:59 -07:00
Linus Torvalds
4ef5b13a29 New code for 5.4:
- Report both io errors and short io results to the directio endio
   handler.
 - Allow directio callers to pass an ops structure to iomap_dio_rw.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl2EQBwACgkQ+H93GTRK
 tOvZ0w//R00Dya3pZmO+HQX/Kovo/AOKCct/gPaLvRf4DVvWZ3ZH5lPtDUSKCHBV
 Pw+potTuabwtp//mZrQ84ZMoQYOmxBmxchquJ2L49qWT/mZ1BDqBA2sxQOCehffO
 Dlv715eOPVhkVHOEAHv+BsatxlFDkuKgGcwUqxE4LNHWGrvmjm6rBWPCnxQMTwX9
 BhGo/0WG6D0leWFSMoUIpcD4Oj302/O43RX4lJsyl0Jd5pKJD/RFtaKqwIeE+pj9
 36MnoQYtT5e9BTm1/zzHRVpGgLDDWTY+IClgZNlZprU/Em+TtOGMWitWzob3wWcU
 QHj5bJ9NiWBT6QJ192i0+I0thSTwW/peOAJrkBOW3YhBkH3UIKSzaPiwCBoSEUKS
 fuIOJkRHFW7HTjKSw6wUCJJ6ShD+rNPOoaJ9Ivzz7x2HGEqb1al3EIumu6oDJgyq
 BdnLEecN/JSxPnakpSGAnvlCjZiYbJ95E0JfapzMy7HYpMXfbzZZ4JO9Ld0hQflR
 sMrGwL82tqE9ish1yRr+2nNEY5PqVJqV0XXU3dZz3HVknNuAvqrqXDXgIZT142zS
 s9vGAW+2XphyUKHH9pImQKw142n2xiZK+Pd2AuiO4I06E0EkAvN/n9CN7oJ4Cr4F
 z9rQKNXNx8OSjFtryYe6+IzQ0qQtl2OP7o88K91jYArKP17D8ds=
 =zkFp
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.4-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap updates from Darrick Wong:
 "After last week's failed pull request attempt, I scuttled everything
  in the branch except for the directio endio api changes, which were
  trivial. Everything else will simply have to wait for the next cycle.

  Summary:

   - Report both io errors and short io results to the directio endio
     handler.

   - Allow directio callers to pass an ops structure to iomap_dio_rw"

* tag 'iomap-5.4-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: move the iomap_dio_rw ->end_io callback into a structure
  iomap: split size and error for iomap_dio_rw ->end_io
2019-09-25 09:01:43 -07:00
Mathieu Desnoyers
227a4aadc7 sched/membarrier: Fix p->mm->membarrier_state racy load
The membarrier_state field is located within the mm_struct, which
is not guaranteed to exist when used from runqueue-lock-free iteration
on runqueues by the membarrier system call.

Copy the membarrier_state from the mm_struct into the scheduler runqueue
when the scheduler switches between mm.

When registering membarrier for mm, after setting the registration bit
in the mm membarrier state, issue a synchronize_rcu() to ensure the
scheduler observes the change. In order to take care of the case
where a runqueue keeps executing the target mm without swapping to
other mm, iterate over each runqueue and issue an IPI to copy the
membarrier_state from the mm_struct into each runqueue which have the
same mm which state has just been modified.

Move the mm membarrier_state field closer to pgd in mm_struct to use
a cache line already touched by the scheduler switch_mm.

The membarrier_execve() (now membarrier_exec_mmap) hook now needs to
clear the runqueue's membarrier state in addition to clear the mm
membarrier state, so move its implementation into the scheduler
membarrier code so it can access the runqueue structure.

Add memory barrier in membarrier_exec_mmap() prior to clearing
the membarrier state, ensuring memory accesses executed prior to exec
are not reordered with the stores clearing the membarrier state.

As suggested by Linus, move all membarrier.c RCU read-side locks outside
of the for each cpu loops.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-5-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-25 17:42:30 +02:00
Qu Wenruo
fab2735955 btrfs: Fix a regression which we can't convert to SINGLE profile
[BUG]
With v5.3 kernel, we can't convert to SINGLE profile:

  # btrfs balance start -f -dconvert=single $mnt
  ERROR: error during balancing '/mnt/btrfs': Invalid argument
  # dmesg -t | tail
  validate_convert_profile: data profile=0x1000000000000 allowed=0x20 is_valid=1 final=0x1000000000000 ret=1
  BTRFS error (device dm-3): balance: invalid convert data profile single

[CAUSE]
With the extra debug output added, it shows that the @allowed bit is
lacking the special in-memory only SINGLE profile bit.

Thus we fail at that (profile & ~allowed) check.

This regression is caused by commit 081db89b13 ("btrfs: use raid_attr
to get allowed profiles for balance conversion") and the fact that we
don't use any bit to indicate SINGLE profile on-disk, but uses special
in-memory only bit to help distinguish different profiles.

[FIX]
Add that BTRFS_AVAIL_ALLOC_BIT_SINGLE to @allowed, so the code should be
the same as it was and fix the regression.

Reported-by: Chris Murphy <lists@colorremedies.com>
Fixes: 081db89b13 ("btrfs: use raid_attr to get allowed profiles for balance conversion")
CC: stable@vger.kernel.org # 5.3+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-25 16:00:37 +02:00
Qu Wenruo
1fac4a5437 btrfs: relocation: fix use-after-free on dead relocation roots
[BUG]
One user reported a reproducible KASAN report about use-after-free:

  BTRFS info (device sdi1): balance: start -dvrange=1256811659264..1256811659265
  BTRFS info (device sdi1): relocating block group 1256811659264 flags data|raid0
  ==================================================================
  BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
  Write of size 8 at addr ffff88856f671710 by task kworker/u24:10/261579

  CPU: 2 PID: 261579 Comm: kworker/u24:10 Tainted: P           OE     5.2.11-arch1-1-kasan #4
  Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
  Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
  Call Trace:
   dump_stack+0x7b/0xba
   print_address_description+0x6c/0x22e
   ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
   __kasan_report.cold+0x1b/0x3b
   ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
   kasan_report+0x12/0x17
   __asan_report_store8_noabort+0x17/0x20
   btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
   record_root_in_trans+0x2a0/0x370 [btrfs]
   btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
   start_transaction+0x1ab/0xe90 [btrfs]
   btrfs_join_transaction+0x1d/0x20 [btrfs]
   btrfs_finish_ordered_io+0x7bf/0x18a0 [btrfs]
   ? lock_repin_lock+0x400/0x400
   ? __kmem_cache_shutdown.cold+0x140/0x1ad
   ? btrfs_unlink_subvol+0x9b0/0x9b0 [btrfs]
   finish_ordered_fn+0x15/0x20 [btrfs]
   normal_work_helper+0x1bd/0xca0 [btrfs]
   ? process_one_work+0x819/0x1720
   ? kasan_check_read+0x11/0x20
   btrfs_endio_write_helper+0x12/0x20 [btrfs]
   process_one_work+0x8c9/0x1720
   ? pwq_dec_nr_in_flight+0x2f0/0x2f0
   ? worker_thread+0x1d9/0x1030
   worker_thread+0x98/0x1030
   kthread+0x2bb/0x3b0
   ? process_one_work+0x1720/0x1720
   ? kthread_park+0x120/0x120
   ret_from_fork+0x35/0x40

  Allocated by task 369692:
   __kasan_kmalloc.part.0+0x44/0xc0
   __kasan_kmalloc.constprop.0+0xba/0xc0
   kasan_kmalloc+0x9/0x10
   kmem_cache_alloc_trace+0x138/0x260
   btrfs_read_tree_root+0x92/0x360 [btrfs]
   btrfs_read_fs_root+0x10/0xb0 [btrfs]
   create_reloc_root+0x47d/0xa10 [btrfs]
   btrfs_init_reloc_root+0x1e2/0x340 [btrfs]
   record_root_in_trans+0x2a0/0x370 [btrfs]
   btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
   start_transaction+0x1ab/0xe90 [btrfs]
   btrfs_start_transaction+0x1e/0x20 [btrfs]
   __btrfs_prealloc_file_range+0x1c2/0xa00 [btrfs]
   btrfs_prealloc_file_range+0x13/0x20 [btrfs]
   prealloc_file_extent_cluster+0x29f/0x570 [btrfs]
   relocate_file_extent_cluster+0x193/0xc30 [btrfs]
   relocate_data_extent+0x1f8/0x490 [btrfs]
   relocate_block_group+0x600/0x1060 [btrfs]
   btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
   btrfs_relocate_chunk+0x9e/0x180 [btrfs]
   btrfs_balance+0x14e4/0x2fc0 [btrfs]
   btrfs_ioctl_balance+0x47f/0x640 [btrfs]
   btrfs_ioctl+0x119d/0x8380 [btrfs]
   do_vfs_ioctl+0x9f5/0x1060
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x73/0xb0
   do_syscall_64+0xa5/0x370
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 369692:
   __kasan_slab_free+0x14f/0x210
   kasan_slab_free+0xe/0x10
   kfree+0xd8/0x270
   btrfs_drop_snapshot+0x154c/0x1eb0 [btrfs]
   clean_dirty_subvols+0x227/0x340 [btrfs]
   relocate_block_group+0x972/0x1060 [btrfs]
   btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
   btrfs_relocate_chunk+0x9e/0x180 [btrfs]
   btrfs_balance+0x14e4/0x2fc0 [btrfs]
   btrfs_ioctl_balance+0x47f/0x640 [btrfs]
   btrfs_ioctl+0x119d/0x8380 [btrfs]
   do_vfs_ioctl+0x9f5/0x1060
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x73/0xb0
   do_syscall_64+0xa5/0x370
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  The buggy address belongs to the object at ffff88856f671100
   which belongs to the cache kmalloc-4k of size 4096
  The buggy address is located 1552 bytes inside of
   4096-byte region [ffff88856f671100, ffff88856f672100)
  The buggy address belongs to the page:
  page:ffffea0015bd9c00 refcount:1 mapcount:0 mapping:ffff88864400e600 index:0x0 compound_mapcount: 0
  flags: 0x2ffff0000010200(slab|head)
  raw: 02ffff0000010200 dead000000000100 dead000000000200 ffff88864400e600
  raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff88856f671600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff88856f671680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  >ffff88856f671700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                           ^
   ffff88856f671780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff88856f671800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ==================================================================
  BTRFS info (device sdi1): 1 enospc errors during balance
  BTRFS info (device sdi1): balance: ended with status: -28

[CAUSE]
The problem happens when finish_ordered_io() get called with balance
still running, while the reloc root of that subvolume is already dead.
(Tree is swap already done, but tree not yet deleted for possible qgroup
usage.)

That means root->reloc_root still exists, but that reloc_root can be
under btrfs_drop_snapshot(), thus we shouldn't access it.

The following race could cause the use-after-free problem:

                CPU1              |                CPU2
--------------------------------------------------------------------------
                                  | relocate_block_group()
                                  | |- unset_reloc_control(rc)
                                  | |- btrfs_commit_transaction()
btrfs_finish_ordered_io()         | |- clean_dirty_subvols()
|- btrfs_join_transaction()       |    |
   |- record_root_in_trans()      |    |
      |- btrfs_init_reloc_root()  |    |
         |- if (root->reloc_root) |    |
         |                        |    |- root->reloc_root = NULL
         |                        |    |- btrfs_drop_snapshot(reloc_root);
         |- reloc_root->last_trans|
                 = trans->transid |
	    ^^^^^^^^^^^^^^^^^^^^^^
            Use after free

[FIX]
Fix it by the following modifications:

- Test if the root has dead reloc tree before accessing root->reloc_root
  If the root has BTRFS_ROOT_DEAD_RELOC_TREE, then we don't need to
  create or update root->reloc_tree

- Clear the BTRFS_ROOT_DEAD_RELOC_TREE flag until we have fully dropped
  reloc tree
  To co-operate with above modification, so as long as
  BTRFS_ROOT_DEAD_RELOC_TREE is still set, we won't try to re-create
  reloc tree at record_root_in_trans().

Reported-by: Cebtenzzre <cebtenzzre@gmail.com>
Fixes: d2311e6985 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
CC: stable@vger.kernel.org # 5.1+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-25 15:23:42 +02:00