Commit Graph

65282 Commits

Author SHA1 Message Date
Linus Torvalds
6a70f89cc5 Merge tag 'nfs-for-5.8-3' of git://git.linux-nfs.org/projects/anna/linux-nfs into master
Pull NFS client fixes from Anna Schumaker:
 "A few more NFS client bugfixes for Linux 5.8:

  NFS:
   - Fix interrupted slots by using the SEQUENCE operation

  SUNRPC:
   - revert d03727b248 to fix unkillable IOs

  xprtrdma:
   - Fix double-free in rpcrdma_ep_create()
   - Fix recursion into rpcrdma_xprt_disconnect()
   - Fix return code from rpcrdma_xprt_connect()
   - Fix handling of connect errors
   - Fix incorrect header size calculations"

* tag 'nfs-for-5.8-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC reverting d03727b248 ("NFSv4 fix CLOSE not waiting for direct IO compeletion")
  xprtrdma: fix incorrect header size calculations
  NFS: Fix interrupted slots by sending a solo SEQUENCE operation
  xprtrdma: Fix handling of connect errors
  xprtrdma: Fix return code from rpcrdma_xprt_connect()
  xprtrdma: Fix recursion into rpcrdma_xprt_disconnect()
  xprtrdma: Fix double-free in rpcrdma_ep_create()
2020-07-17 16:37:52 -07:00
Olga Kornievskaia
65caafd0d2 SUNRPC reverting d03727b248 ("NFSv4 fix CLOSE not waiting for direct IO compeletion")
Reverting commit d03727b248 "NFSv4 fix CLOSE not waiting for
direct IO compeletion". This patch made it so that fput() by calling
inode_dio_done() in nfs_file_release() would wait uninterruptably
for any outstanding directIO to the file (but that wait on IO should
be killable).

The problem the patch was also trying to address was REMOVE returning
ERR_ACCESS because the file is still opened, is supposed to be resolved
by server returning ERR_FILE_OPEN and not ERR_ACCESS.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-17 14:47:38 -04:00
Linus Torvalds
4ebf8d7649 Merge tag 'io_uring-5.8-2020-07-17' of git://git.kernel.dk/linux-block into master
Pull io_uring fix from Jens Axboe:
 "Fix for a case where, with automatic buffer selection, we can leak the
  buffer descriptor for recvmsg"

* tag 'io_uring-5.8-2020-07-17' of git://git.kernel.dk/linux-block:
  io_uring: fix recvmsg memory leak with buffer selection
2020-07-17 10:47:51 -07:00
Linus Torvalds
0dd68a34ec Merge tag 'fuse-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into master
Pull fuse fixes from Miklos Szeredi:

 - two regressions in this cycle caused by the conversion of writepage
   list to an rb_tree

 - two regressions in v5.4 cause by the conversion to the new mount API

 - saner behavior of fsconfig(2) for the reconfigure case

 - an ancient issue with FS_IOC_{GET,SET}FLAGS ioctls

* tag 'fuse-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS
  fuse: don't ignore errors from fuse_writepages_fill()
  fuse: clean up condition for writepage sending
  fuse: reject options on reconfigure via fsconfig(2)
  fuse: ignore 'data' argument of mount(..., MS_REMOUNT)
  fuse: use ->reconfigure() instead of ->remount_fs()
  fuse: fix warning in tree_insert() and clean up writepage insertion
  fuse: move rb_erase() before tree_insert()
2020-07-17 10:36:19 -07:00
Linus Torvalds
44fea37378 Merge tag 'ovl-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into master
Pull overlayfs fixes from Miklos Szeredi:

 - fix a regression introduced in v4.20 in handling a regenerated
   squashfs lower layer

 - two regression fixes for this cycle, one of which is Oops inducing

 - miscellaneous issues

* tag 'ovl-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix lookup of indexed hardlinks with metacopy
  ovl: fix unneeded call to ovl_change_flags()
  ovl: fix mount option checks for nfs_export with no upperdir
  ovl: force read-only sb on failure to create index dir
  ovl: fix regression with re-formatted lower squashfs
  ovl: fix oops in ovl_indexdir_cleanup() with nfs_export=on
  ovl: relax WARN_ON() when decoding lower directory file handle
  ovl: remove not used argument in ovl_check_origin
  ovl: change ovl_copy_up_flags static
  ovl: inode reference leak in ovl_is_inuse true case.
2020-07-17 10:29:19 -07:00
Amir Goldstein
4518dfcf76 ovl: fix lookup of indexed hardlinks with metacopy
We recently moved setting inode flag OVL_UPPERDATA to ovl_lookup().

When looking up an overlay dentry, upperdentry may be found by index
and not by name.  In that case, we fail to read the metacopy xattr
and falsly set the OVL_UPPERDATA on the overlay inode.

This caused a regression in xfstest overlay/033 when run with
OVERLAY_MOUNT_OPTIONS="-o metacopy=on".

Fixes: 28166ab3c8 ("ovl: initialize OVL_UPPERDATA in ovl_lookup()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 07:24:47 +02:00
Amir Goldstein
81a33c1ee9 ovl: fix unneeded call to ovl_change_flags()
The check if user has changed the overlay file was wrong, causing unneeded
call to ovl_change_flags() including taking f_lock on every file access.

Fixes: d989903058 ("ovl: do not generate duplicate fsnotify events for "fake" path")
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 07:24:47 +02:00
David Howells
811f04bac1 afs: Fix interruption of operations
The afs filesystem driver allows unstarted operations to be cancelled by
signal, but most of these can easily be restarted (mkdir for example).  The
primary culprits for reproducing this are those applications that use
SIGALRM to display a progress counter.

File lock-extension operation is marked uninterruptible as we have a
limited time in which to do it, and the release op is marked
uninterruptible also as if we fail to unlock a file, we'll have to wait 20
mins before anyone can lock it again.

The store operation logs a warning if it gets interruption, e.g.:

	kAFS: Unexpected error from FS.StoreData -4

because it's run from the background - but it can also be run from
fdatasync()-type things.  However, store options aren't marked
interruptible at the moment.

Fix this in the following ways:

 (1) Mark store operations as uninterruptible.  It might make sense to
     relax this for certain situations, but I'm not sure how to make sure
     that background store ops aren't affected by signals to foreground
     processes that happen to trigger them.

 (2) In afs_get_io_locks(), where we're getting the serialisation lock for
     talking to the fileserver, return ERESTARTSYS rather than EINTR
     because a lot of the operations (e.g. mkdir) are restartable if we
     haven't yet started sending the op to the server.

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-15 15:49:04 -07:00
Amir Goldstein
f0e1266ed2 ovl: fix mount option checks for nfs_export with no upperdir
Without upperdir mount option, there is no index dir and the dependency
checks nfs_export => index for mount options parsing are incorrect.

Allow the combination nfs_export=on,index=off with no upperdir and move
the check for dependency redirect_dir=nofollow for non-upper mount case
to mount options parsing.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 00:11:15 +02:00
Amir Goldstein
470c156361 ovl: force read-only sb on failure to create index dir
With index feature enabled, on failure to create index dir, overlay is
being mounted read-only.  However, we do not forbid user to remount overlay
read-write.  Fix that by setting ofs->workdir to NULL, which prevents
remount read-write.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 00:11:15 +02:00
Amir Goldstein
a888db3101 ovl: fix regression with re-formatted lower squashfs
Commit 9df085f3c9 ("ovl: relax requirement for non null uuid of lower
fs") relaxed the requirement for non null uuid with single lower layer to
allow enabling index and nfs_export features with single lower squashfs.

Fabian reported a regression in a setup when overlay re-uses an existing
upper layer and re-formats the lower squashfs image.  Because squashfs
has no uuid, the origin xattr in upper layer are decoded from the new
lower layer where they may resolve to a wrong origin file and user may
get an ESTALE or EIO error on lookup.

To avoid the reported regression while still allowing the new features
with single lower squashfs, do not allow decoding origin with lower null
uuid unless user opted-in to one of the new features that require
following the lower inode of non-dir upper (index, xino, metacopy).

Reported-by: Fabian <godi.beat@gmx.net>
Link: https://lore.kernel.org/linux-unionfs/32532923.JtPX5UtSzP@fgdesktop/
Fixes: 9df085f3c9 ("ovl: relax requirement for non null uuid of lower fs")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 00:10:31 +02:00
Amir Goldstein
20396365a1 ovl: fix oops in ovl_indexdir_cleanup() with nfs_export=on
Mounting with nfs_export=on, xfstests overlay/031 triggers a kernel panic
since v5.8-rc1 overlayfs updates.

 overlayfs: orphan index entry (index/00fb1..., ftype=4000, nlink=2)
 BUG: kernel NULL pointer dereference, address: 0000000000000030
 RIP: 0010:ovl_cleanup_and_whiteout+0x28/0x220 [overlay]

Bisect point at commit c21c839b84 ("ovl: whiteout inode sharing")

Minimal reproducer:
--------------------------------------------------
rm -rf l u w m
mkdir -p l u w m
mkdir -p l/testdir
touch l/testdir/testfile
mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m
echo 1 > m/testdir/testfile
umount m
rm -rf u/testdir
mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m
umount m
--------------------------------------------------

When mount with nfs_export=on, and fail to verify an orphan index, we're
cleaning this index from indexdir by calling ovl_cleanup_and_whiteout().
This dereferences ofs->workdir, that was earlier set to NULL.

The design was that ovl->workdir will point at ovl->indexdir, but we are
assigning ofs->indexdir to ofs->workdir only after ovl_indexdir_cleanup().
There is no reason not to do it sooner, because once we get success from
ofs->indexdir = ovl_workdir_create(... there is no turning back.

Reported-and-tested-by: Murphy Zhou <jencce.kernel@gmail.com>
Fixes: c21c839b84 ("ovl: whiteout inode sharing")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 00:09:59 +02:00
Amir Goldstein
124c2de2c0 ovl: relax WARN_ON() when decoding lower directory file handle
Decoding a lower directory file handle to overlay path with cold
inode/dentry cache may go as follows:

1. Decode real lower file handle to lower dir path
2. Check if lower dir is indexed (was copied up)
3. If indexed, get the upper dir path from index
4. Lookup upper dir path in overlay
5. If overlay path found, verify that overlay lower is the lower dir
   from step 1

On failure to verify step 5 above, user will get an ESTALE error and a
WARN_ON will be printed.

A mismatch in step 5 could be a result of lower directory that was renamed
while overlay was offline, after that lower directory has been copied up
and indexed.

This is a scripted reproducer based on xfstest overlay/052:

  # Create lower subdir
  create_dirs
  create_test_files $lower/lowertestdir/subdir
  mount_dirs
  # Copy up lower dir and encode lower subdir file handle
  touch $SCRATCH_MNT/lowertestdir
  test_file_handles $SCRATCH_MNT/lowertestdir/subdir -p -o $tmp.fhandle
  # Rename lower dir offline
  unmount_dirs
  mv $lower/lowertestdir $lower/lowertestdir.new/
  mount_dirs
  # Attempt to decode lower subdir file handle
  test_file_handles $SCRATCH_MNT -p -i $tmp.fhandle

Since this WARN_ON() can be triggered by user we need to relax it.

Fixes: 4b91c30a5a ("ovl: lookup connected ancestor of dir in inode cache")
Cc: <stable@vger.kernel.org> # v4.16+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 00:09:17 +02:00
youngjun
d78a0dcf64 ovl: remove not used argument in ovl_check_origin
ovl_check_origin outparam 'ctrp' argument not used by caller.  So remove
this argument.

Signed-off-by: youngjun <her0gyugyu@gmail.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 00:06:16 +02:00
youngjun
5ac8e8025a ovl: change ovl_copy_up_flags static
"ovl_copy_up_flags" is used in copy_up.c.
so, change it static.

Signed-off-by: youngjun <her0gyugyu@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 00:06:16 +02:00
youngjun
24f14009b8 ovl: inode reference leak in ovl_is_inuse true case.
When "ovl_is_inuse" true case, trap inode reference not put.  plus adding
the comment explaining sequence of ovl_is_inuse after ovl_setup_trap.

Fixes: 0be0bfd2de ("ovl: fix regression caused by overlapping layers detection")
Cc: <stable@vger.kernel.org> # v4.19+
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: youngjun <her0gyugyu@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16 00:05:40 +02:00
Pavel Begunkov
681fda8d27 io_uring: fix recvmsg memory leak with buffer selection
io_recvmsg() doesn't free memory allocated for struct io_buffer. This can
causes a leak when used with automatic buffer selection.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-15 13:35:56 -06:00
Chirantan Ekbote
31070f6cce fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS
The ioctl encoding for this parameter is a long but the documentation says
it should be an int and the kernel drivers expect it to be an int.  If the
fuse driver treats this as a long it might end up scribbling over the stack
of a userspace process that only allocated enough space for an int.

This was previously discussed in [1] and a patch for fuse was proposed in
[2].  From what I can tell the patch in [2] was nacked in favor of adding
new, "fixed" ioctls and using those from userspace.  However there is still
no "fixed" version of these ioctls and the fact is that it's sometimes
infeasible to change all userspace to use the new one.

Handling the ioctls specially in the fuse driver seems like the most
pragmatic way for fuse servers to support them without causing crashes in
userspace applications that call them.

[1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/
[2]: https://sourceforge.net/p/fuse/mailman/message/31771759/

Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Fixes: 59efec7b90 ("fuse: implement ioctl support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-15 14:18:20 +02:00
Vasily Averin
7779b047a5 fuse: don't ignore errors from fuse_writepages_fill()
fuse_writepages() ignores some errors taken from fuse_writepages_fill() I
believe it is a bug: if .writepages is called with WB_SYNC_ALL it should
either guarantee that all data was successfully saved or return error.

Fixes: 26d614df1d ("fuse: Implement writepages callback")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14 14:45:42 +02:00
Miklos Szeredi
6ddf3af93e fuse: clean up condition for writepage sending
fuse_writepages_fill uses following construction:

if (wpa && ap->num_pages &&
    (A || B || C)) {
        action;
} else if (wpa && D) {
        if (E) {
                the same action;
        }
}

 - ap->num_pages check is always true and can be removed

 - "if" and "else if" calls the same action and can be merged.

Move checking A, B, C, D, E conditions to a helper, add comments.

Original-patch-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14 14:45:41 +02:00
Miklos Szeredi
b330966f79 fuse: reject options on reconfigure via fsconfig(2)
Previous patch changed handling of remount/reconfigure to ignore all
options, including those that are unknown to the fuse kernel fs.  This was
done for backward compatibility, but this likely only affects the old
mount(2) API.

The new fsconfig(2) based reconfiguration could possibly be improved.  This
would make the new API less of a drop in replacement for the old, OTOH this
is a good chance to get rid of some weirdnesses in the old API.

Several other behaviors might make sense:

 1) unknown options are rejected, known options are ignored

 2) unknown options are rejected, known options are rejected if the value
 is changed, allowed otherwise

 3) all options are rejected

Prior to the backward compatibility fix to ignore all options all known
options were accepted (1), even if they change the value of a mount
parameter; fuse_reconfigure() does not look at the config values set by
fuse_parse_param().

To fix that we'd need to verify that the value provided is the same as set
in the initial configuration (2).  The major drawback is that this is much
more complex than just rejecting all attempts at changing options (3);
i.e. all options signify initial configuration values and don't make sense
on reconfigure.

This patch opts for (3) with the rationale that no mount options are
reconfigurable in fuse.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14 14:45:41 +02:00
Miklos Szeredi
e8b20a474c fuse: ignore 'data' argument of mount(..., MS_REMOUNT)
The command

  mount -o remount -o unknownoption /mnt/fuse

succeeds on kernel versions prior to v5.4 and fails on kernel version at or
after.  This is because fuse_parse_param() rejects any unrecognised options
in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.

This causes a regression in case the fuse filesystem is in fstab, since
remount sends all options found there to the kernel; even ones that are
meant for the initial mount and are consumed by the userspace fuse server.

Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
the conversion to the new API.

Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Fixes: c30da2e981 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14 14:45:41 +02:00
Miklos Szeredi
0189a2d367 fuse: use ->reconfigure() instead of ->remount_fs()
s_op->remount_fs() is only called from legacy_reconfigure(), which is not
used after being converted to the new API.

Convert to using ->reconfigure().  This restores the previous behavior of
syncing the filesystem and rejecting MS_MANDLOCK on remount.

Fixes: c30da2e981 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14 14:45:41 +02:00
Miklos Szeredi
c146024ec4 fuse: fix warning in tree_insert() and clean up writepage insertion
fuse_writepages_fill() calls tree_insert() with ap->num_pages = 0 which
triggers the following warning:

 WARNING: CPU: 1 PID: 17211 at fs/fuse/file.c:1728 tree_insert+0xab/0xc0 [fuse]
 RIP: 0010:tree_insert+0xab/0xc0 [fuse]
 Call Trace:
  fuse_writepages_fill+0x5da/0x6a0 [fuse]
  write_cache_pages+0x171/0x470
  fuse_writepages+0x8a/0x100 [fuse]
  do_writepages+0x43/0xe0

Fix up the warning and clean up the code around rb-tree insertion:

 - Rename tree_insert() to fuse_insert_writeback() and make it return the
   conflicting entry in case of failure

 - Re-add tree_insert() as a wrapper around fuse_insert_writeback()

 - Rename fuse_writepage_in_flight() to fuse_writepage_add() and reverse
   the meaning of the return value to mean

    + "true" in case the writepage entry was successfully added

    + "false" in case it was in-fligt queued on an existing writepage
       entry's auxiliary list or the existing writepage entry's temporary
       page updated

   Switch from fuse_find_writeback() + tree_insert() to
   fuse_insert_writeback()

 - Move setting orig_pages to before inserting/updating the entry; this may
   result in the orig_pages value being discarded later in case of an
   in-flight request

 - In case of a new writepage entry use fuse_writepage_add()
   unconditionally, only set data->wpa if the entry was added.

Fixes: 6b2fb79963 ("fuse: optimize writepages search")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Original-path-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14 14:45:41 +02:00
Miklos Szeredi
69a6487ac0 fuse: move rb_erase() before tree_insert()
In fuse_writepage_end() the old writepages entry needs to be removed from
the rbtree before inserting the new one, otherwise tree_insert() would
fail.  This is a very rare codepath and no reproducer exists.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14 14:45:41 +02:00
Anna Schumaker
913fadc5b1 NFS: Fix interrupted slots by sending a solo SEQUENCE operation
We used to do this before 3453d5708b, but this was changed to better
handle the NFS4ERR_SEQ_MISORDERED error code. This commit fixed the slot
re-use case when the server doesn't receive the interrupted operation,
but if the server does receive the operation then it could still end up
replying to the client with mis-matched operations from the reply cache.

We can fix this by sending a SEQUENCE to the server while recovering from
a SEQ_MISORDERED error when we detect that we are in an interrupted slot
situation.

Fixes: 3453d5708b (NFSv4.1: Avoid false retries when RPC calls are interrupted)
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-13 10:50:41 -04:00
Linus Torvalds
4437dd6e8f Merge tag 'io_uring-5.8-2020-07-12' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "Two late fixes again:

   - Fix missing msg_name assignment in certain cases (Pavel)

   - Correct a previous fix for full coverage (Pavel)"

* tag 'io_uring-5.8-2020-07-12' of git://git.kernel.dk/linux-block:
  io_uring: fix not initialised work->flags
  io_uring: fix missing msg_name assignment
2020-07-12 12:17:58 -07:00
Linus Torvalds
72c34e8d70 Merge tag 'for-5.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
 "Two refcounting fixes and one prepartory patch for upcoming splice
  cleanup:

   - fix double put of block group with nodatacow

   - fix missing block group put when remounting with discard=async

   - explicitly set splice callback (no functional change), to ease
     integrating splice cleanup patches"

* tag 'for-5.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: wire up iter_file_splice_write
  btrfs: fix double put of block group with nocow
  btrfs: discard: add missing put when grabbing block group from unused list
2020-07-12 10:58:35 -07:00
Pavel Begunkov
16d598030a io_uring: fix not initialised work->flags
59960b9deb ("io_uring: fix lazy work init") tried to fix missing
io_req_init_async(), but left out work.flags and hash. Do it earlier.

Fixes: 7cdaf587de ("io_uring: avoid whole io_wq_work copy for requests completed inline")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-12 09:40:50 -06:00
Pavel Begunkov
dd821e0c95 io_uring: fix missing msg_name assignment
Ensure to set msg.msg_name for the async portion of send/recvmsg,
as the header copy will copy to/from it.

Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-12 09:40:25 -06:00
Linus Torvalds
5ab39e08ff Merge tag '5.8-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Four cifs/smb3 fixes: the three for stable fix problems found recently
  with change notification including a reference count leak"

* tag '5.8-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number
  cifs: fix reference leak for tlink
  smb3: fix unneeded error message on change notify
  cifs: remove the retry in cifs_poxis_lock_set
  smb3: fix access denied on change notify request to some servers
2020-07-10 21:16:48 -07:00
Linus Torvalds
a581387e41 Merge tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:

 - Fix memleak for error path in registered files (Yang)

 - Export CQ overflow state in flags, necessary to fix a case where
   liburing doesn't know if it needs to enter the kernel (Xiaoguang)

 - Fix for a regression in when user memory is accounted freed, causing
   issues with back-to-back ring exit + init if the ulimit -l setting is
   very tight.

* tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-block:
  io_uring: account user memory freed when exit has been queued
  io_uring: fix memleak in io_sqe_files_register()
  io_uring: fix memleak in __io_sqe_files_update()
  io_uring: export cq overflow status to userspace
2020-07-10 09:57:57 -07:00
Linus Torvalds
b1b11d0063 Merge tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc
Pull in-kernel read and write op cleanups from Christoph Hellwig:
 "Cleanup in-kernel read and write operations

  Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure
  all users of in-kernel file I/O use them if they don't use iov_iter
  based methods already.

  The new WARN_ONs in combination with syzcaller already found a missing
  input validation in 9p. The fix should be on your way through the
  maintainer ASAP".

[ This is prep-work for the real changes coming 5.9 ]

* tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc:
  fs: remove __vfs_read
  fs: implement kernel_read using __kernel_read
  integrity/ima: switch to using __kernel_read
  fs: add a __kernel_read helper
  fs: remove __vfs_write
  fs: implement kernel_write using __kernel_write
  fs: check FMODE_WRITE in __kernel_write
  fs: unexport __kernel_write
  bpfilter: switch to kernel_write
  autofs: switch to kernel_write
  cachefiles: switch to kernel_write
2020-07-10 09:45:15 -07:00
Linus Torvalds
d02b0478c1 Merge tag 'gfs2-v5.8-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
 "Fix gfs2 readahead deadlocks by adding a IOCB_NOIO flag that allows
  gfs2 to use the generic fiel read iterator functions without having to
  worry about being called back while holding locks".

* tag 'gfs2-v5.8-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Rework read and page fault locking
  fs: Add IOCB_NOIO flag for generic_file_read_iter
2020-07-10 08:53:21 -07:00
Jens Axboe
309fc03a32 io_uring: account user memory freed when exit has been queued
We currently account the memory after the exit work has been run, but
that leaves a gap where a process has closed its ring and until the
memory has been accounted as freed. If the memlocked ulimit is
borderline, then that can introduce spurious setup errors returning
-ENOMEM because the free work hasn't been run yet.

Account this as freed when we close the ring, as not to expose a tiny
gap where setting up a new ring can fail.

Fixes: 85faa7b834 ("io_uring: punt final io_ring_ctx wait-and-free to workqueue")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-10 09:18:35 -06:00
Yang Yingliang
667e57da35 io_uring: fix memleak in io_sqe_files_register()
I got a memleak report when doing some fuzz test:

BUG: memory leak
unreferenced object 0x607eeac06e78 (size 8):
  comm "test", pid 295, jiffies 4294735835 (age 31.745s)
  hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00                          ........
  backtrace:
    [<00000000932632e6>] percpu_ref_init+0x2a/0x1b0
    [<0000000092ddb796>] __io_uring_register+0x111d/0x22a0
    [<00000000eadd6c77>] __x64_sys_io_uring_register+0x17b/0x480
    [<00000000591b89a6>] do_syscall_64+0x56/0xa0
    [<00000000864a281d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Call percpu_ref_exit() on error path to avoid
refcount memleak.

Fixes: 05f3fb3c53 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-10 07:50:21 -06:00
Christoph Hellwig
d777659113 btrfs: wire up iter_file_splice_write
btrfs implements the iter_write op and thus can use the more efficient
iov_iter based splice implementation.  For now falling back to the less
efficient default is pretty harmless, but I have a pending series that
removes the default, and thus would cause btrfs to not support splice
at all.

Reported-by: Andy Lavr <andy.lavr@gmail.com>
Tested-by: Andy Lavr <andy.lavr@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-09 19:57:58 +02:00
Josef Bacik
230ed39743 btrfs: fix double put of block group with nocow
While debugging a patch that I wrote I was hitting use-after-free panics
when accessing block groups on unmount.  This turned out to be because
in the nocow case if we bail out of doing the nocow for whatever reason
we need to call btrfs_dec_nocow_writers() if we called the inc.  This
puts our block group, but a few error cases does

if (nocow) {
    btrfs_dec_nocow_writers();
    goto error;
}

unfortunately, error is

error:
	if (nocow)
		btrfs_dec_nocow_writers();

so we get a double put on our block group.  Fix this by dropping the
error cases calling of btrfs_dec_nocow_writers(), as it's handled at the
error label now.

Fixes: 762bf09893 ("btrfs: improve error handling in run_delalloc_nocow")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-09 17:44:26 +02:00
Steve French
a8dab63ea6 cifs: update internal module version number
To 2.28

Signed-off-by: Steve French <stfrench@microsoft.com>
2020-07-09 10:07:09 -05:00
Ronnie Sahlberg
a77592a700 cifs: fix reference leak for tlink
Don't leak a reference to tlink during the NOTIFY ioctl

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org> # v5.6+
2020-07-09 10:06:52 -05:00
Yang Yingliang
f3bd9dae37 io_uring: fix memleak in __io_sqe_files_update()
I got a memleak report when doing some fuzz test:

BUG: memory leak
unreferenced object 0xffff888113e02300 (size 488):
comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`.......
backtrace:
[<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline]
[<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101
[<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151
[<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193
[<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233
[<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
[<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74
[<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720
[<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0xffff8881152dd5e0 (size 16):
comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s)
hex dump (first 16 bytes):
01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline]
[<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline]
[<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440
[<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106
[<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151
[<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193
[<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233
[<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
[<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74
[<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720
[<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

If io_sqe_file_register() failed, we need put the file that get by fget()
to avoid the memleak.

Fixes: c3a31e6056 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08 20:16:19 -06:00
Xiaoguang Wang
6d5f904904 io_uring: export cq overflow status to userspace
For those applications which are not willing to use io_uring_enter()
to reap and handle cqes, they may completely rely on liburing's
io_uring_peek_cqe(), but if cq ring has overflowed, currently because
io_uring_peek_cqe() is not aware of this overflow, it won't enter
kernel to flush cqes, below test program can reveal this bug:

static void test_cq_overflow(struct io_uring *ring)
{
        struct io_uring_cqe *cqe;
        struct io_uring_sqe *sqe;
        int issued = 0;
        int ret = 0;

        do {
                sqe = io_uring_get_sqe(ring);
                if (!sqe) {
                        fprintf(stderr, "get sqe failed\n");
                        break;;
                }
                ret = io_uring_submit(ring);
                if (ret <= 0) {
                        if (ret != -EBUSY)
                                fprintf(stderr, "sqe submit failed: %d\n", ret);
                        break;
                }
                issued++;
        } while (ret > 0);
        assert(ret == -EBUSY);

        printf("issued requests: %d\n", issued);

        while (issued) {
                ret = io_uring_peek_cqe(ring, &cqe);
                if (ret) {
                        if (ret != -EAGAIN) {
                                fprintf(stderr, "peek completion failed: %s\n",
                                        strerror(ret));
                                break;
                        }
                        printf("left requets: %d\n", issued);
                        continue;
                }
                io_uring_cqe_seen(ring, cqe);
                issued--;
                printf("left requets: %d\n", issued);
        }
}

int main(int argc, char *argv[])
{
        int ret;
        struct io_uring ring;

        ret = io_uring_queue_init(16, &ring, 0);
        if (ret) {
                fprintf(stderr, "ring setup failed: %d\n", ret);
                return 1;
        }

        test_cq_overflow(&ring);
        return 0;
}

To fix this issue, export cq overflow status to userspace by adding new
IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as
io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08 19:17:06 -06:00
Steve French
8668115cf2 smb3: fix unneeded error message on change notify
We should not be logging a warning repeatedly on change notify.

CC: Stable <stable@vger.kernel.org> # v5.6+
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-07-08 03:59:02 -05:00
Christoph Hellwig
775802c057 fs: remove __vfs_read
Fold it into the two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:57 +02:00
Christoph Hellwig
6209dd9132 fs: implement kernel_read using __kernel_read
Consolidate the two in-kernel read helpers to make upcoming changes
easier.  The only difference are the missing call to rw_verify_area
in kernel_read, and an access_ok check that doesn't make sense for
kernel buffers to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:57 +02:00
Christoph Hellwig
61a707c543 fs: add a __kernel_read helper
This is the counterpart to __kernel_write, and skip the rw_verify_area
call compared to kernel_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:56 +02:00
Christoph Hellwig
53ad86266b fs: remove __vfs_write
Fold it into the two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:56 +02:00
Christoph Hellwig
81238b2cff fs: implement kernel_write using __kernel_write
Consolidate the two in-kernel write helpers to make upcoming changes
easier.  The only difference are the missing call to rw_verify_area
in kernel_write, and an access_ok check that doesn't make sense for
kernel buffers to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:56 +02:00
Christoph Hellwig
a01ac27be4 fs: check FMODE_WRITE in __kernel_write
Add a WARN_ON_ONCE if the file isn't actually open for write.  This
matches the check done in vfs_write, but actually warn warns as a
kernel user calling write on a file not opened for writing is a pretty
obvious programming error.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:56 +02:00
Christoph Hellwig
9db9775224 fs: unexport __kernel_write
This is a very special interface that skips sb_writes protection, and not
used by modules anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:56 +02:00