Commit Graph

64110 Commits

Author SHA1 Message Date
Brian Foster
f20192991d xfs: simplify inode flush error handling
The inode flush code has several layers of error handling between
the inode and cluster flushing code. If the inode flush fails before
acquiring the backing buffer, the inode flush is aborted. If the
cluster flush fails, the current inode flush is aborted and the
cluster buffer is failed to handle the initial inode and any others
that might have been attached before the error.

Since xfs_iflush() is the only caller of xfs_iflush_cluster(), the
error handling between the two can be condensed in the top-level
function. If we update xfs_iflush_int() to always fall through to
the log item update and attach the item completion handler to the
buffer, any errors that occur after the first call to
xfs_iflush_int() can be handled with a buffer I/O failure.

Lift the error handling from xfs_iflush_cluster() into xfs_iflush()
and consolidate with the existing error handling. This also replaces
the need to release the buffer because failing the buffer with
XBF_ASYNC drops the current reference.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-07 08:27:45 -07:00
Brian Foster
54b3b1f619 xfs: factor out buffer I/O failure code
We use the same buffer I/O failure code in a few different places.
It's not much code, but it's not necessarily self-explanatory.
Factor it into a helper and document it in one place.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-07 08:27:45 -07:00
Brian Foster
cb6ad0993e xfs: refactor failed buffer resubmission into xfsaild
Flush locked log items whose underlying buffers fail metadata
writeback are tagged with a special flag to indicate that the flush
lock is already held. This is currently implemented in the type
specific ->iop_push() callback, but the processing required for such
items is not type specific because we're only doing basic state
management on the underlying buffer.

Factor the failed log item handling out of the inode and dquot
->iop_push() callbacks and open code the buffer resubmit helper into
a single helper called from xfsaild_push_item(). This provides a
generic mechanism for handling failed metadata buffer writeback with
a bit less code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-07 08:27:45 -07:00
Darrick J. Wong
8bc3b5e4b7 xfs: clean up the error handling in xfs_swap_extents
Make sure we release resources properly if we cannot clean out the COW
extents in preparation for an extent swap.

Fixes: 96987eea53 ("xfs: cancel COW blocks before swapext")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-06 13:17:21 -07:00
Ira Weiny
840d493dff fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
nearly identical.  The only difference is that *_to_linux() is called after
inode setup and disallows changing the DAX flag.

Combining them can be done with a flag which indicates if this is the initial
setup to allow the DAX flag to be properly set only at init time.

So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
directly.

While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
use xfs_ip2xflags() to ensure future diflags are included correctly.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:43 -07:00
Ira Weiny
32dbc5655f fs/xfs: Create function xfs_inode_should_enable_dax()
xfs_inode_supports_dax() should reflect if the inode can support DAX not
that it is enabled for DAX.

Change the use of xfs_inode_supports_dax() to reflect only if the inode
and underlying storage support dax.

Add a new function xfs_inode_should_enable_dax() which reflects if the
inode should be enabled for DAX.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:43 -07:00
Ira Weiny
8d6c3446ec fs/xfs: Make DAX mount option a tri-state
As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
continues to operate the same.  We add 'always', 'never', and 'inode'
(default).

[1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:43 -07:00
Ira Weiny
606723d982 fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
In prep for the new tri-state mount option which then introduces
XFS_MOUNT_DAX_NEVER.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:43 -07:00
Ira Weiny
d45344d6c4 fs/xfs: Remove unnecessary initialization of i_rwsem
An earlier call of xfs_reinit_inode() from xfs_iget_cache_hit() already
handles initialization of i_rwsem.

Doing so again is unneeded.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:17 -07:00
Christoph Hellwig
2f88f1efd0 xfs: spell out the parameter name for ->cancel_item
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:17 -07:00
Christoph Hellwig
3ec1b26c04 xfs: use a xfs_btree_cur for the ->finish_cleanup state
Given how XFS is all based around btrees it doesn't make much sense
to offer a totally generic state when we can just use the btree cursor.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:17 -07:00
Christoph Hellwig
f09d167c20 xfs: turn dfp_done into a xfs_log_item
All defer op instance place their own extension of the log item into
the dfp_done field.  Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.

Also use the opportunity to improve the ->finish_item calling conventions
to place the done log item as the higher level structure before the
list_entry used for the individual items.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:17 -07:00
Christoph Hellwig
bb47d79750 xfs: refactor xfs_defer_finish_noroll
Split out a helper that operates on a single xfs_defer_pending structure
to untangle the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:17 -07:00
Christoph Hellwig
13a8333339 xfs: turn dfp_intent into a xfs_log_item
All defer op instance place their own extension of the log item into
the dfp_intent field.  Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
d367a868e4 xfs: merge the ->diff_items defer op into ->create_intent
This avoids a per-item indirect call, and also simplifies the interface
a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
c1f09188e8 xfs: merge the ->log_item defer op into ->create_intent
These are aways called together, and my merging them we reduce the amount
of indirect calls, improve type safety and in general clean up the code
a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
e046e94948 xfs: factor out a xfs_defer_create_intent helper
Create a helper that encapsulates the whole logic to create a defer
intent.  This reorders some of the work that was done, but none of
that has an affect on the operation as only fields that don't directly
interact are affected.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
fd9cbe5121 xfs: remove the xfs_inode_log_item_t typedef
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
c84e819090 xfs: remove the xfs_efd_log_item_t typedef
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
82ff450b2d xfs: remove the xfs_efi_log_item_t typedef
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
98b69b1285 xfs: refactor xlog_recover_buffer_pass1
Split out a xlog_add_buffer_cancelled helper which does the low-level
manipulation of the buffer cancelation table, and in that helper call
xlog_find_buffer_cancelled instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
f15ab3f60e xfs: simplify xlog_recover_inode_ra_pass2
Don't bother to allocate memory and convert the log item when we
only need the block number and the length.  Just extract them directly
and call xlog_buf_readahead separately in each branch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:15 -07:00
Christoph Hellwig
7d4894b4ce xfs: factor out a xlog_buf_readahead helper
Add a little helper to readahead a buffer if it hasn't been cancelled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:15 -07:00
Christoph Hellwig
5ce70b770d xfs: rename inode_list xlog_recover_reorder_trans
This list contains pretty much everything that is not a buffer.  The
comment calls it item_list, which is a much better name than inode
list, so switch the actual variable name to that as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:15 -07:00
Christoph Hellwig
e968350aad xfs: refactor the buffer cancellation table helpers
Replace the somewhat convoluted use of xlog_peek_buffer_cancelled and
xlog_check_buffer_cancelled with two obvious helpers:

 xlog_is_buffer_cancelled, which returns true if there is a buffer in
 the cancellation table, and
 xlog_put_buffer_cancelled, which also decrements the reference count
 of the buffer cancellation table.

Both share a little helper to look up the entry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:15 -07:00
Eric Sandeen
ec43f6da31 xfs: define printk_once variants for xfs messages
There are a couple places where we directly call printk_once() and one
of them doesn't follow the standard xfs subsystem printk format as a
result.

#define printk_once variants to go with our existing printk_ratelimited
#defines so we can do one-shot printks in a consistent manner.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:15 -07:00
Arnd Bergmann
166405f6b5 xfs: stop CONFIG_XFS_DEBUG from changing compiler flags
I ran into a linker warning in XFS that originates from a mismatch
between libelf, binutils and objtool when certain files in the kernel
are built with "gcc -g":

x86_64-linux-ld: fs/xfs/xfs_trace.o: unable to initialize decompress status for section .debug_info

After some discussion, nobody could identify why xfs sets this flag
here. CONFIG_XFS_DEBUG used to enable lots of unrelated settings, but
now its main purpose is to enable extra consistency checks and assertions
that are unrelated to the debug info.

Remove the Makefile logic to set the flag here. If anyone relies
on the debug info, this can simply be enabled again with the global
CONFIG_DEBUG_INFO option.

Dave Chinner writes:

I'm pretty sure it was needed for the original kgdb integration back
in the early 2000s. That was when SGI used to patch their XFS dev
tree with kgdb and debug symbols were needed by the custom kgdb
modules that were ported across from the Irix kernel debugger.

ISTR that the early kcrash kernel dump analysis tools (again,
originated from the Irix "icrash" kernel dump tools) had custom XFS
debug scripts that needed also the debug info to work correctly...

Which is a long way of saying "we don't need it anymore" instead of
"nobody knows why it was set"... :)

Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/lkml/20200409074130.GD21033@infradead.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:15 -07:00
Kaixu Xia
57fd2d8f61 xfs: remove unnecessary check of the variable resblks in xfs_symlink
Since the "no-allocation" reservations has been removed, the resblks
value should be larger than zero, so remove the unnecessary check.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:15 -07:00
Kaixu Xia
cd59455980 xfs: simplify the flags setting in xfs_qm_scall_quotaon
Simplify the setting of the flags value, and only consider
quota enforcement stuff here.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:14 -07:00
Kaixu Xia
7994aae851 xfs: remove unnecessary assertion from xfs_qm_vop_create_dqattach
The check XFS_IS_QUOTA_RUNNING() has been done when enter the
xfs_qm_vop_create_dqattach() function, it will return directly
if the result is false, so the followed XFS_IS_QUOTA_RUNNING()
assertion is unnecessary. If we truly care about this, the check
also can be added to the condition of next if statements.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:14 -07:00
Kaixu Xia
ea1c90403d xfs: remove unnecessary variable udqp from xfs_ioctl_setattr
The initial value of variable udqp is NULL, and we only set the
flag XFS_QMOPT_PQUOTA in xfs_qm_vop_dqalloc() function, so only
the pdqp value is initialized and the udqp value is still NULL.
Since the udqp value is NULL in the rest part of xfs_ioctl_setattr()
function, it is meaningless and do nothing. So remove it from
xfs_ioctl_setattr().

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:14 -07:00
Kaixu Xia
fb353ff19d xfs: reserve quota inode transaction space only when needed
We share an inode between gquota and pquota with the older
superblock that doesn't have separate pquotino, and for the
need_alloc == false case we don't need to call xfs_dir_ialloc()
function, so add the check if reserved free disk blocks is
needed.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:14 -07:00
Kaixu Xia
d51bafe0d2 xfs: combine two if statements with same condition
The two if statements have same condition, and the mask value
does not change in xfs_setattr_nonsize(), so combine them.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:14 -07:00
Kaixu Xia
c140735bbb xfs: trace quota allocations for all quota types
The trace event xfs_dquot_dqalloc does not depend on the
value uq, so remove the condition, and trace quota allocations
for all quota types.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:14 -07:00
Darrick J. Wong
0d2d35a33e xfs: report unrecognized log item type codes during recovery
When we're sorting recovered log items ahead of recovering them and
encounter a log item of unknown type, actually print the type code when
we're rejecting the whole transaction to aid in debugging.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-04 09:03:14 -07:00
Linus Torvalds
262f7a6b83 for-5.7-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl6u7jUACgkQxWXV+ddt
 WDu6AQ/+K1vegSRJMhG1c0U3XECeYfki7NZVizzMs+G6oCU2LxBPla+qidugc0pA
 5wAjP5AFaJQWv9JrVRyBfnvsH9HedL+9fNVmZlWZZ1ujXvZSyArdp5n9IyPCJ926
 gA39nHSlcUOYSUfkiU8OqUOTyQjh9ZzSxbqIwsc4lKK9FrcLJ8fLXtbyKjLsxx7A
 CTUYmyip6weQvMhQBWMFiN8LLle49s28BBbCfPenD+1sSF0UR6UyrFjDxBqusjkQ
 mkoFwgnVLkES6ni1fJSUdDJMOaPkCCwn9EBiTwF29ki2Kbhu/erCHUZ+OLEDUOMg
 JqIbAxWmx9+VNthVJWpVjNk9Eojr8LstpItG747DepE3S34bbtTSw9n0Ppp1lNrG
 YFAA2ZIyhv5lZaq7f/hxfKQtz3MjsnKDoXZQbVnYh+FOiIssjDrK45UB9FP4Gy5I
 nO/AejuOfaBqijz6PLLmHBA/SlsF50ejek32iiQQU+jVb9WGxCYUARXBVSh+7Iw5
 PS6KkWQgXePCn3ulIc3eeQDJhP4gY1vCqIUsY5GbM/zHlBP75bDk0qP/kIu2j4yR
 2Vrw3sG1tylBTWInjm7HiP9/9ZGy552AVSgqTeiv32VeBZ1hmQP04IbyzqYz4Clq
 Qf7TJCDmTJSBr6TfvpsYtTyARhvh0pZ7X1b4Ymm5D/laSWXevf0=
 =xn0p
 -----END PGP SIGNATURE-----

Merge tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs fixes from David Sterba:
 "A few more stability fixes, minor build warning fixes and git url
  fixup:

   - fix partial loss of prealloc extent past i_size after fsync

   - fix potential deadlock due to wrong transaction handle passing via
     journal_info

   - fix gcc 4.8 struct intialization warning

   - update git URL in MAINTAINERS entry"

* tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  MAINTAINERS: btrfs: fix git repo URL
  btrfs: fix gcc-4.8 build warning for struct initializer
  btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info
  btrfs: fix partial loss of prealloc extent past i_size after fsync
2020-05-03 11:30:08 -07:00
Linus Torvalds
f66ed1ebbf Changes for 5.7:
- Move the FIBMAP range check and warning out of the backend iomap
 implementation and into the frontend ioctl_fibmap so that the checking
 is consistent for all implementations.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl6q6lQACgkQ+H93GTRK
 tOvt4g/+NlLRvPceod9x7goJGuBAJD3gmuP/Ma7qzFi5YZE7tbbBKikvKWIgtz8l
 D4kPRepVTeOCECWzvYwbreqizk0WNr5Buc5Ia3QMPrigIUPomRygvNAcFmLIRF58
 VFKIoUupM9oxPbzc5RXLx0QHYanUFZY41AzFTTQb9EGRw+WUzpih6FUxRrra0pFp
 c5FN9pUaX7kAaUfryS5oK5f6T1ZmZWXQyaNOv+fXLdtd9eNMUxTOiBr+agZn0Ay3
 XIdYWfI2ruyDiYYvaO52NAj9+MRwP9oW0aQLnFHwThv1M4I5qxtg0Ljhl4wT6vq5
 VC2HHicETTuN0nTMQo183AU8AS9/SbSaFmgliVGrWiHp+IOyZzEYe3++damAUenH
 k9o7un6i8nISVdoGs3U2yv6hJN1vmvWOK4JE26EOU/AfjHyYE8aqNRf4XR/f5bTr
 nfD45eoN8V00iCIunL2UhluBeON1+KGUdMevn0ia948I9e5+DVMIsUm+vSf3c0ah
 F8oQlGUucApi3KzVA72nmIwG/gP7oUrtjgBKSoRE+W3/ixcy1S5mc0oUYh4I62Ia
 Sgv9pHUNwbWSVXfWIx83YmkaJpCurp5VuJy4FWsg6BNCB81lIosSKKjHpwwx3Xyi
 19WWxvPFrZ2JxxWp6M5XWvYydQS590Mc5j2ywHluZsrwOVc2UBc=
 =6rBo
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fix from Darrick Wong:
 "Hoist the check for an unrepresentable FIBMAP return value into
  ioctl_fibmap.

  The internal kernel function can handle 64-bit values (and is needed
  to fix a regression on ext4 + jbd2). It is only the userspace ioctl
  that is so old that it cannot deal"

* tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fibmap: Warn and return an error in case of block > INT_MAX
2020-05-02 11:31:12 -07:00
Linus Torvalds
29a47f456d NFS client bugfixes for Linux 5.7
Highlights include:
 
 Stable fixes
 - fix handling of backchannel binding in BIND_CONN_TO_SESSION
 
 Bugfixes
 - Fix a credential use-after-free issue in pnfs_roc()
 - Fix potential posix_acl refcnt leak in nfs3_set_acl
 - defer slow parts of rpc_free_client() to a workqueue
 - Fix an Oopsable race in __nfs_list_for_each_server()
 - Fix trace point use-after-free race
 - Regression: the RDMA client no longer responds to server disconnect requests
 - Fix return values of xdr_stream_encode_item_{present, absent}
 - _pnfs_return_layout() must always wait for layoutreturn completion
 
 Cleanups
 - Remove unreachable error conditions
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl6tczsACgkQZwvnipYK
 APKHWg//QGx2Tolj5dh2jBHa47A5/SYnJxCZAA0/fWdwRtFkW3HyyGne1jU86do2
 SMAVpBpri1WJPt5d3DH66gu4l4UxG1h84s7QP4lGfSa85EmtLh+LoZQCZRqYoDOo
 JAMzWctELu1TUpaa1N5Dhg/qMtMy6ulRMWgzTLqB9a/pQa3onugTK6W7xiut2prj
 PBfFq7N9XXmPboSeGV9bR4L8XKSbTCLEt3U1F2zAGU7UUINvDfpjEXq7BHYCewKL
 ObPW6EWZksyna16H8i/xGWoKgE4JFVjMwQAP7UdDBi+FW9RI6UpTBoR6z9N748j0
 jEocDbI21wgnwmtrVTbzsYm6ttHl4D4egoNxn7m5zjxTU4Ba/RQG2aaHUGFOYpJj
 1FI1f6V1Y5v4mJajdsEH+pGW/4vK/4YMR+7YHJ/hYU/WiXjLf7onIIifdWt4SQdo
 lvZbGcx6IAHYUA4lI7hkcvrK4bbqAnPLFq28nlUWEID5q5D+nA1ZR9iN0FToviDy
 FYyhQzyfD1kt98SV1DjWUqvDDd6IB64iDZTXGmtWvj6c2nbezGiFffvtzUL5LFxY
 QfI8lkpmUyt1EiWlZWhtOh4zsiM5yMZkJB/3RJv3RMmswizSSAHdgCKWhdLpX0bl
 TG1L8yEmcTc5ANS37EhlpcBNbfYw7oIF/OXuReTSRoMQl5hxjfY=
 =w0zk
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - fix handling of backchannel binding in BIND_CONN_TO_SESSION

  Bugfixes:
   - Fix a credential use-after-free issue in pnfs_roc()
   - Fix potential posix_acl refcnt leak in nfs3_set_acl
   - defer slow parts of rpc_free_client() to a workqueue
   - Fix an Oopsable race in __nfs_list_for_each_server()
   - Fix trace point use-after-free race
   - Regression: the RDMA client no longer responds to server disconnect
     requests
   - Fix return values of xdr_stream_encode_item_{present, absent}
   - _pnfs_return_layout() must always wait for layoutreturn completion

  Cleanups:
   - Remove unreachable error conditions"

* tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a race in __nfs_list_for_each_server()
  NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
  SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
  NFSv4: Remove unreachable error condition due to rpc_run_task()
  SUNRPC: Remove unreachable error condition
  xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}
  xprtrdma: Fix trace point use-after-free race
  xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler()
  nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
  NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
  NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
2020-05-02 11:24:01 -07:00
Linus Torvalds
cf0185308c io_uring-5.7-2020-05-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6spz8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjHjEACp2V+14XpWl1F6rJpLSq0BJZ3wCReqj7it
 tPImiZsx3fLiwslW8IFrDuT1tyCpODOECSA87vXebHjHvgmrbDayrAUJXlyYSk0N
 +zwXTg7wH9XQ0CEQbzPIA/DK3evJ/CqRgTAa8r/ZIdm1sx8jIyq2QrwAo9IX7YyG
 mQttrm37C4RrzU2dqcp0aBFhmiC6GRI34IYNK6idJ3wUFOCAg1Ur3veX9aG94gaV
 cA1P12sSYnIAIAxUko/siPIvtJJ9s1tLJ6UREpqUMzgrfSEhZTPRvyv8xQLmTIl1
 BcFj7Y3iorGde5PQUEPYoW7GXydU1LefJLH1C8GAbwRO1YyPD78Rff0sV8Bi0y9Z
 hLnnvc7GEII/z0yxqnasEYYlWxhAcusO7HQDf1NMsxfuNXy5ofn1Kfuk5FFEcvj+
 AjqvpN+sfJ9GPHrAGNT06kTMV0imCEmxuEanEc7cg1c2nfH4mJqt/vbH9tyD0aFk
 JBuOeXToYywRqHHGSGcHGPkClcDoAw6dXqqKdJj6i0ya+EIsP2Ztp40Ae9yCDqew
 AhrYQuEsJ7WJvxjogKn8fX8GSRnOJF1jb54pcNffw/e5q04e5YG/ACII+W/L1nPB
 81BDcQjzB+f6xNxDZFGh0tQKvuVDe8b//vY+g2v6YoJYcAkLUSjy2FJDpoBjhzUu
 03mYIP8kAg==
 =cZOE
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Fix for statx not grabbing the file table, making AT_EMPTY_PATH fail

 - Cover a few cases where async poll can handle retry, eliminating the
   need for an async thread

 - fallback request busy/free fix (Bijan)

 - syzbot reported SQPOLL thread exit fix for non-preempt (Xiaoguang)

 - Fix extra put of req for sync_file_range (Pavel)

 - Always punt splice async. We'll improve this for 5.8, but wanted to
   eliminate the inode mutex lock from the non-blocking path for 5.7
   (Pavel)

* tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block:
  io_uring: punt splice async because of inode mutex
  io_uring: check non-sync defer_list carefully
  io_uring: fix extra put in sync_file_range()
  io_uring: use cond_resched() in io_ring_ctx_wait_and_kill()
  io_uring: use proper references for fallback_req locking
  io_uring: only force async punt if poll based retry can't handle it
  io_uring: enable poll retry for any file with ->read_iter / ->write_iter
  io_uring: statx must grab the file table for valid fd
2020-05-01 17:03:06 -07:00
Pavel Begunkov
2fb3e82284 io_uring: punt splice async because of inode mutex
Nonblocking do_splice() still may wait for some time on an inode mutex.
Let's play safe and always punt it async.

Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-01 08:50:57 -06:00
Pavel Begunkov
4ee3631451 io_uring: check non-sync defer_list carefully
io_req_defer() do double-checked locking. Use proper helpers for that,
i.e. list_empty_careful().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-01 08:50:30 -06:00
Pavel Begunkov
7759a0bfad io_uring: fix extra put in sync_file_range()
[   40.179474] refcount_t: underflow; use-after-free.
[   40.179499] WARNING: CPU: 6 PID: 1848 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
...
[   40.179612] RIP: 0010:refcount_warn_saturate+0xae/0xf0
[   40.179617] Code: 28 44 0a 01 01 e8 d7 01 c2 ff 0f 0b 5d c3 80 3d 15 44 0a 01 00 75 91 48 c7 c7 b8 f5 75 be c6 05 05 44 0a 01 01 e8 b7 01 c2 ff <0f> 0b 5d c3 80 3d f3 43 0a 01 00 0f 85 6d ff ff ff 48 c7 c7 10 f6
[   40.179619] RSP: 0018:ffffb252423ebe18 EFLAGS: 00010286
[   40.179623] RAX: 0000000000000000 RBX: ffff98d65e929400 RCX: 0000000000000000
[   40.179625] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 00000000ffffffff
[   40.179627] RBP: ffffb252423ebe18 R08: 0000000000000001 R09: 000000000000055d
[   40.179629] R10: 0000000000000c8c R11: 0000000000000001 R12: 0000000000000000
[   40.179631] R13: ffff98d68c434400 R14: ffff98d6a9cbaa20 R15: ffff98d6a609ccb8
[   40.179634] FS:  0000000000000000(0000) GS:ffff98d6af580000(0000) knlGS:0000000000000000
[   40.179636] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   40.179638] CR2: 00000000033e3194 CR3: 000000006480a003 CR4: 00000000003606e0
[   40.179641] Call Trace:
[   40.179652]  io_put_req+0x36/0x40
[   40.179657]  io_free_work+0x15/0x20
[   40.179661]  io_worker_handle_work+0x2f5/0x480
[   40.179667]  io_wqe_worker+0x2a9/0x360
[   40.179674]  ? _raw_spin_unlock_irqrestore+0x24/0x40
[   40.179681]  kthread+0x12c/0x170
[   40.179685]  ? io_worker_handle_work+0x480/0x480
[   40.179690]  ? kthread_park+0x90/0x90
[   40.179695]  ret_from_fork+0x35/0x40
[   40.179702] ---[ end trace 85027405f00110aa ]---

Opcode handler must never put submission ref, but that's what
io_sync_file_range_finish() do. use io_steal_work() there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-01 08:50:30 -06:00
Xiaoguang Wang
3fd44c8671 io_uring: use cond_resched() in io_ring_ctx_wait_and_kill()
While working on to make io_uring sqpoll mode support syscalls that need
struct files_struct, I got cpu soft lockup in io_ring_ctx_wait_and_kill(),

    while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
        cpu_relax();

above loop never has an chance to exit, it's because preempt isn't enabled
in the kernel, and the context calling io_ring_ctx_wait_and_kill() and
io_sq_thread() run in the same cpu, if io_sq_thread calls a cond_resched()
yield cpu and another context enters above loop, then io_sq_thread() will
always in runqueue and never exit.

Use cond_resched() can fix this issue.

 Reported-by: syzbot+66243bb7126c410cefe6@syzkaller.appspotmail.com
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-30 22:24:27 -06:00
Bijan Mottahedeh
dd461af659 io_uring: use proper references for fallback_req locking
Use ctx->fallback_req address for test_and_set_bit_lock() and
clear_bit_unlock().

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-30 22:24:27 -06:00
Jens Axboe
490e89676a io_uring: only force async punt if poll based retry can't handle it
We do blocking retry from our poll handler, if the file supports polled
notifications. Only mark the request as needing an async worker if we
can't poll for it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-30 22:24:27 -06:00
Jens Axboe
af197f50ac io_uring: enable poll retry for any file with ->read_iter / ->write_iter
We can have files like eventfd where it's perfectly fine to do poll
based retry on them, right now io_file_supports_async() doesn't take
that into account.

Pass in data direction and check the f_op instead of just always needing
an async worker.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-30 22:24:22 -06:00
Trond Myklebust
9c07b75b80 NFS: Fix a race in __nfs_list_for_each_server()
The struct nfs_server gets put on the cl_superblocks list before
the server->super field has been initialised, in which case the
call to nfs_sb_active() will Oops. Add a check to ensure that
we skip such a list entry.

Fixes: 3c9e502b59 ("NFS: Add a helper nfs_client_for_each_server()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-30 15:08:26 -04:00
Ritesh Harjani
b75dfde121 fibmap: Warn and return an error in case of block > INT_MAX
We better warn the fibmap user and not return a truncated and therefore
an incorrect block map address if the bmap() returned block address
is greater than INT_MAX (since user supplied integer pointer).

It's better to pr_warn() all user of ioctl_fibmap() and return a proper
error code rather than silently letting a FS corruption happen if the
user tries to fiddle around with the returned block map address.

We fix this by returning an error code of -ERANGE and returning 0 as the
block mapping address in case if it is > INT_MAX.

Now iomap_bmap() could be called from either of these two paths.
Either when a user is calling an ioctl_fibmap() interface to get
the block mapping address or by some filesystem via use of bmap()
internal kernel API.
bmap() kernel API is well equipped with handling of u64 addresses.

WARN condition in iomap_bmap_actor() was mainly added to warn all
the fibmap users. But now that we have directly added this warning
for all fibmap users and also made sure to return 0 as block map address
in case if addr > INT_MAX.
So we can now remove this logic from iomap_bmap_actor().

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-30 07:57:46 -07:00
Arnd Bergmann
9c6c723f48 btrfs: fix gcc-4.8 build warning for struct initializer
Some older compilers like gcc-4.8 warn about mismatched curly braces in
a initializer:

fs/btrfs/backref.c: In function 'is_shared_data_backref':
fs/btrfs/backref.c:394:9: error: missing braces around
initializer [-Werror=missing-braces]
  struct prelim_ref target = {0};
         ^
fs/btrfs/backref.c:394:9: error: (near initialization for
'target.rbnode') [-Werror=missing-braces]

Use the GNU empty initializer extension to avoid this.

Fixes: ed58f2e66e ("btrfs: backref, don't add refs from shared block when resolving normal backref")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-30 12:17:49 +02:00
Linus Torvalds
96c9a7802a Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Two old bugs..."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  propagate_one(): mnt_set_mountpoint() needs mount_lock
  dlmfs_file_write(): fix the bogosity in handling non-zero *ppos
2020-04-28 14:38:39 -07:00
David Howells
dd7bc8158b Fix use after free in get_tree_bdev()
Commit 6fcf0c72e4, a fix to get_tree_bdev() put a missing blkdev_put() in
the wrong place, before a warnf() that displays the bdev under
consideration rather after it.

This results in a silent lockup in printk("%pg") called via warnf() from
get_tree_bdev() under some circumstances when there's a race with the
blockdev being frozen.  This can be caused by xfstests/tests/generic/085 in
combination with Lukas Czerner's ext4 mount API conversion patchset.  It
looks like it ought to occur with other users of get_tree_bdev() such as
XFS, but apparently doesn't.

Fix this by switching the order of the lines.

Fixes: 6fcf0c72e4 ("vfs: add missing blkdev_put() in get_tree_bdev()")
Reported-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ian Kent <raven@themaw.net>
cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-28 14:37:40 -07:00
Olga Kornievskaia
dff58530c4 NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
Currently, if the client sends BIND_CONN_TO_SESSION with
NFS4_CDFC4_FORE_OR_BOTH but only gets NFS4_CDFS4_FORE back it ignores
that it wasn't able to enable a backchannel.

To make sure, the client sends BIND_CONN_TO_SESSION as the first
operation on the connections (ie., no other session compounds haven't
been sent before), and if the client's request to bind the backchannel
is not satisfied, then reset the connection and retry.

Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-28 15:58:38 -04:00
Linus Torvalds
51184ae37e for-5.7-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl6m1fcACgkQxWXV+ddt
 WDuJAw//WLUlHd/NNlmV92pR0yAqqpBlnYSf/zHKqxFmetZWANiFx7l9+ag03g7o
 nVYLdmsj/Y38IuEbwmWP2/0K/gfErdUxs5Qq/eE/Ui10hk+53sUFAiKBMoVdWmta
 zt5WlXUc4YBqGMqU15iz7YQfjPZDuinvWgvCEBNAZ66O3cdhcdQRRZtYGGYUJbvA
 tUrIejCsTj/U9UfVwgoSC9aAsSnUPL2ef7enxT6iUA/+1bTTBd6dUX+GCzAOnvzJ
 ejDWr55wgmrUhhEkDs+0yvEiO+sBXcQM1QJCHfFLp6lddKmPI4G63LLgAT8y5FS7
 DA1d2PNVT17yJxVA5E4ahaSGabiL8WjleZFPVURVPuoT867HRDbH2YR6B8QdGNYt
 iXu9yPU1CDTnSGiMBj3Q+X6M7w6ABoWr6JEXGX7kfGlpXTFZ2JinzvC615+Ina1u
 Vufcwg8kQpF/teIDZYV1U4gXT6y1UFneJYthXCl+Y0DXIeV4pAeAPuLVjL3asSQa
 ARgO6LSgeVdYc6kRyxW3wVMBPq0Peygc5iYQo3wEv2zD5vRFlRp/2uF488VaTN4e
 OUNBrSJK8luZDUSVH5k9z5MVXH9Dz4HyFqQ9uuV4W7CzcjQlipI1R4Em/j6Ub8g1
 l09gu10XQU07LVgrZdSNesAIv4R3/zola+9F320IFimLoPL73KI=
 =Tbyq
 -----END PGP SIGNATURE-----

Merge tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - regression fixes:
     - transaction leak when deleting unused block group
     - log cleanup after transaction abort

 - fix block group leak when removing fails

 - transaction leak if relocation recovery fails

 - fix SPDX header

* tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix transaction leak in btrfs_recover_relocation
  btrfs: fix block group leak when removing fails
  btrfs: drop logs when we've aborted a transaction
  btrfs: fix memory leak of transaction when deleting unused block group
  btrfs: discard: Use the correct style for SPDX License Identifier
2020-04-27 13:32:46 -07:00
Jens Axboe
5b0bbee473 io_uring: statx must grab the file table for valid fd
Clay reports that OP_STATX fails for a test case with a valid fd
and empty path:

 -- Test 0: statx:fd 3: SUCCEED, file mode 100755
 -- Test 1: statx:path ./uring_statx: SUCCEED, file mode 100755
 -- Test 2: io_uring_statx:fd 3: FAIL, errno 9: Bad file descriptor
 -- Test 3: io_uring_statx:path ./uring_statx: SUCCEED, file mode 100755

This is due to statx not grabbing the process file table, hence we can't
lookup the fd in async context. If the fd is valid, ensure that we grab
the file table so we can grab the file from async context.

Cc: stable@vger.kernel.org # v5.6
Reported-by: Clay Harris <bugs@claycon.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-27 10:41:22 -06:00
Qu Wenruo
fcc99734d1 btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info
[BUG]
One run of btrfs/063 triggered the following lockdep warning:
  ============================================
  WARNING: possible recursive locking detected
  5.6.0-rc7-custom+ #48 Not tainted
  --------------------------------------------
  kworker/u24:0/7 is trying to acquire lock:
  ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs]

  but task is already holding lock:
  ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs]

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(sb_internal#2);
    lock(sb_internal#2);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  4 locks held by kworker/u24:0/7:
   #0: ffff88817b495948 ((wq_completion)btrfs-endio-write){+.+.}, at: process_one_work+0x557/0xb80
   #1: ffff888189ea7db8 ((work_completion)(&work->normal_work)){+.+.}, at: process_one_work+0x557/0xb80
   #2: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs]
   #3: ffff888174ca4da8 (&fs_info->reloc_mutex){+.+.}, at: btrfs_record_root_in_trans+0x83/0xd0 [btrfs]

  stack backtrace:
  CPU: 0 PID: 7 Comm: kworker/u24:0 Not tainted 5.6.0-rc7-custom+ #48
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
  Call Trace:
   dump_stack+0xc2/0x11a
   __lock_acquire.cold+0xce/0x214
   lock_acquire+0xe6/0x210
   __sb_start_write+0x14e/0x290
   start_transaction+0x66c/0x890 [btrfs]
   btrfs_join_transaction+0x1d/0x20 [btrfs]
   find_free_extent+0x1504/0x1a50 [btrfs]
   btrfs_reserve_extent+0xd5/0x1f0 [btrfs]
   btrfs_alloc_tree_block+0x1ac/0x570 [btrfs]
   btrfs_copy_root+0x213/0x580 [btrfs]
   create_reloc_root+0x3bd/0x470 [btrfs]
   btrfs_init_reloc_root+0x2d2/0x310 [btrfs]
   record_root_in_trans+0x191/0x1d0 [btrfs]
   btrfs_record_root_in_trans+0x90/0xd0 [btrfs]
   start_transaction+0x16e/0x890 [btrfs]
   btrfs_join_transaction+0x1d/0x20 [btrfs]
   btrfs_finish_ordered_io+0x55d/0xcd0 [btrfs]
   finish_ordered_fn+0x15/0x20 [btrfs]
   btrfs_work_helper+0x116/0x9a0 [btrfs]
   process_one_work+0x632/0xb80
   worker_thread+0x80/0x690
   kthread+0x1a3/0x1f0
   ret_from_fork+0x27/0x50

It's pretty hard to reproduce, only one hit so far.

[CAUSE]
This is because we're calling btrfs_join_transaction() without re-using
the current running one:

btrfs_finish_ordered_io()
|- btrfs_join_transaction()		<<< Call #1
   |- btrfs_record_root_in_trans()
      |- btrfs_reserve_extent()
	 |- btrfs_join_transaction()	<<< Call #2

Normally such btrfs_join_transaction() call should re-use the existing
one, without trying to re-start a transaction.

But the problem is, in btrfs_join_transaction() call #1, we call
btrfs_record_root_in_trans() before initializing current::journal_info.

And in btrfs_join_transaction() call #2, we're relying on
current::journal_info to avoid such deadlock.

[FIX]
Call btrfs_record_root_in_trans() after we have initialized
current::journal_info.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-27 17:16:07 +02:00
Filipe Manana
f135cea30d btrfs: fix partial loss of prealloc extent past i_size after fsync
When we have an inode with a prealloc extent that starts at an offset
lower than the i_size and there is another prealloc extent that starts at
an offset beyond i_size, we can end up losing part of the first prealloc
extent (the part that starts at i_size) and have an implicit hole if we
fsync the file and then have a power failure.

Consider the following example with comments explaining how and why it
happens.

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  # Create our test file with 2 consecutive prealloc extents, each with a
  # size of 128Kb, and covering the range from 0 to 256Kb, with a file
  # size of 0.
  $ xfs_io -f -c "falloc -k 0 128K" /mnt/foo
  $ xfs_io -c "falloc -k 128K 128K" /mnt/foo

  # Fsync the file to record both extents in the log tree.
  $ xfs_io -c "fsync" /mnt/foo

  # Now do a redudant extent allocation for the range from 0 to 64Kb.
  # This will merely increase the file size from 0 to 64Kb. Instead we
  # could also do a truncate to set the file size to 64Kb.
  $ xfs_io -c "falloc 0 64K" /mnt/foo

  # Fsync the file, so we update the inode item in the log tree with the
  # new file size (64Kb). This also ends up setting the number of bytes
  # for the first prealloc extent to 64Kb. This is done by the truncation
  # at btrfs_log_prealloc_extents().
  # This means that if a power failure happens after this, a write into
  # the file range 64Kb to 128Kb will not use the prealloc extent and
  # will result in allocation of a new extent.
  $ xfs_io -c "fsync" /mnt/foo

  # Now set the file size to 256K with a truncate and then fsync the file.
  # Since no changes happened to the extents, the fsync only updates the
  # i_size in the inode item at the log tree. This results in an implicit
  # hole for the file range from 64Kb to 128Kb, something which fsck will
  # complain when not using the NO_HOLES feature if we replay the log
  # after a power failure.
  $ xfs_io -c "truncate 256K" -c "fsync" /mnt/foo

So instead of always truncating the log to the inode's current i_size at
btrfs_log_prealloc_extents(), check first if there's a prealloc extent
that starts at an offset lower than the i_size and with a length that
crosses the i_size - if there is one, just make sure we truncate to a
size that corresponds to the end offset of that prealloc extent, so
that we don't lose the part of that extent that starts at i_size if a
power failure happens.

A test case for fstests follows soon.

Fixes: 31d11b83b9 ("Btrfs: fix duplicate extents after fsync of file with prealloc extents")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-27 17:16:07 +02:00
Al Viro
b0d3869ce9 propagate_one(): mnt_set_mountpoint() needs mount_lock
... to protect the modification of mp->m_count done by it.  Most of
the places that modify that thing also have namespace_lock held,
but not all of them can do so, so we really need mount_lock here.
Kudos to Piotr Krysiuk <piotras@gmail.com>, who'd spotted a related
bug in pivot_root(2) (fixed unnoticed in 5.3); search for other
similar turds has caught out this one.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 10:37:14 -04:00
Linus Torvalds
d4fb4bfb37 Five cifs/smb3 fixes, 3 for DFS, one for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl6lIA4ACgkQiiy9cAdy
 T1FI6QwAg4mCQPvqebKd0/OaJAPne/dzS+iDpxGhCHWjyRYfXwttSHj6HTDjbb20
 OMrvOpKR4plV8LQOXyzbI7rJvDcL1UFbcBxUQUEp9I7BuVbKhE/7CWcBPc2bMiKF
 1yJhUHUjsSMP35H4f3w8J+eKzXcJnXljsruI61FVn4kagRzsUrTOfyhtdfcobPHA
 0o0eZPPhAmoN2Vaf8jpVDEECHotbIKRr6hwN4/lPiOjVvqmHbi42RFmn06rlKqWA
 FBJqYKHK9VyL6458nTego5BXoJ4DSVf28Ow367sYFekpqA2eENfKRIHZ/feBzTH+
 GOn44GJqMcpMXkGgMuR7qMk8wi+nYTBrGXgpXjD3Yw/mHLiPbmscrudwZ30HQ5Rr
 1tgEgFd064gCzA/sm8MmAzSo5Du9oGyabuDewoatKHztNLZA9jMCO/kvuYoCtnLW
 vwlPcnedl4fUir3sdzU9JwHxhcoiAREktqQCXWVew9FGedvdfxVDuPMejayrND9k
 KK6zbll3
 =x+F1
 -----END PGP SIGNATURE-----

Merge tag '5.7-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Five cifs/smb3 fixes:two for DFS reconnect failover, one lease fix for
  stable and the others to fix a missing spinlock during reconnect"

* tag '5.7-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix uninitialised lease_key in open_shroot()
  cifs: ensure correct super block for DFS reconnect
  cifs: do not share tcons with DFS
  cifs: minor update to comments around the cifs_tcp_ses_lock mutex
  cifs: protect updating server->dstaddr with a spinlock
2020-04-26 11:44:17 -07:00
Linus Torvalds
a8a0e2a96b Driver core fixes for 5.7-rc3
Here are some small firmware/driver core/debugfs fixes for 5.7-rc3.
 
 The debugfs change is now possible as now the last users of
 debugfs_create_u32() have been fixed up in the different trees that got
 merged into 5.7-rc1, and I don't want it creeping back in.
 
 The firmware changes did cause a regression in linux-next, so the final
 patch here reverts part of that, re-exporting the symbol to resolve that
 issue.  All of these patches, with the exception of the final one, have
 been in linux-next with only that one reported issue.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXqVliw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymf6ACfS5HoPt+kWKtfKteN/mt6WUeJz6oAoMDg4Qvf
 4ncqmH9jt0lj5NAwHxFi
 =DP2q
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some small firmware/driver core/debugfs fixes for 5.7-rc3.

  The debugfs change is now possible as now the last users of
  debugfs_create_u32() have been fixed up in the different trees that
  got merged into 5.7-rc1, and I don't want it creeping back in.

  The firmware changes did cause a regression in linux-next, so the
  final patch here reverts part of that, re-exporting the symbol to
  resolve that issue. All of these patches, with the exception of the
  final one, have been in linux-next with only that one reported issue"

* tag 'driver-core-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  firmware_loader: revert removal of the fw_fallback_config export
  debugfs: remove return value of debugfs_create_u32()
  firmware_loader: remove unused exports
  firmware: imx: fix compile-testing
2020-04-26 11:04:15 -07:00
Linus Torvalds
b2768df24e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull pid leak fix from Eric Biederman:
 "Oleg noticed that put_pid(thread_pid) was not getting called when proc
  was not compiled in.

  Let's get that fixed before 5.7 is released and causes problems for
  anyone"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Put thread_pid in release_task not proc_flush_pid
2020-04-25 12:25:32 -07:00
Xiyu Yang
6e47666ef9 NFSv4: Remove unreachable error condition due to rpc_run_task()
nfs4_proc_layoutget() invokes rpc_run_task(), which return the value to
"task". Since rpc_run_task() is impossible to return an ERR pointer,
there is no need to add the IS_ERR() condition on "task" here. So we
need to remove it.

Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-25 09:20:42 -04:00
Eric W. Biederman
6ade99ec61 proc: Put thread_pid in release_task not proc_flush_pid
Oleg pointed out that in the unlikely event the kernel is compiled
with CONFIG_PROC_FS unset that release_task will now leak the pid.

Move the put_pid out of proc_flush_pid into release_task to fix this
and to guarantee I don't make that mistake again.

When possible it makes sense to keep get and put in the same function
so it can easily been seen how they pair up.

Fixes: 7bc3e6e55a ("proc: Use a list of inodes to flush from proc")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-04-24 15:49:00 -05:00
Linus Torvalds
aee1a009c9 io_uring-5.7-2020-04-24
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6jKZkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkqsEACnY1xBZfO3tw0x+XqIQW1qqtls8/buMKen
 Iqo2XOJZNMgjMO6T5naPblh1f3JxUVihR8NE3PSm8ZERIl6Xq9YesXATFsC1C+sH
 giR0O4ae7lkYRrlNNvo+K9BmS90AwzTYb73imDFmt+/BuySY67rysN4Gv0q+ySWZ
 1zDdyK8R7v/WX33h0nrP9g2zG4yrYtpWXyeR26aK/BtdVv/rJqu9EiD6Kaz3oHgh
 JI2XLmuDB4d9evUfL9rW0lGd+R0uQUBVj2r9J8x9Ff176OjVhr1cPcbU2Dc/Ldnd
 0Qe1mJ3LcSEvjHrJ84J4C0wRyFiArqbFw8Fy560VDtpgS/44V8j0W5Edh6zNGehY
 xS0NxZfTPaqM5sGKafnaqBfOnrhlZOCcqrDAGe7djsGARGrbzsERpzv4TuBOE+gJ
 hxf9MDYZdIW5QVWmKpTIqAJZfCg3h+Lv/EHhp0Dqv2lIPkWmEHDF3mggej/vcfJ1
 1YEvfIM1TdeEfQPcauqggR8Yo0vUXIfobaJw99R+BwEmowNYbvE4/jH183PgjzSn
 R9xojcDOxo2x1ITCp2YkF+GQ6k2ZXL5v4mEf9zY9C2QiCkhOdzOtecfvQ/wL4/r3
 JZlPpNd+Tw2bXRtIu6ZNq1q1/l93byv4ps6NPvEeGna1klzzCiAZnO71Ln5bWUdu
 2YJoHRfI2g==
 =L+dP
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.7-2020-04-24' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "Single fixup for a change that went into -rc2"

* tag 'io_uring-5.7-2020-04-24' of git://git.kernel.dk/linux-block:
  io_uring: only restore req->work for req that needs do completion
2020-04-24 12:58:22 -07:00
Linus Torvalds
3d29cb17ba block-5.7-2020-04-24
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6jKKUQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoYCEADA5naNC7RC7XAB90tyZrqUAGd33pUGyu86
 ZDc3xyd9V51xj21IoIUWLF7yqR+NFVnhEKcZVAHgZcTnHRAzT2opTV0NkkfseiUA
 p0ozevwJR6K++X/fefHZNYjCPcmFiC3FFTlNALBBBtTcIVKQKAYaX7fNEp/hrJOE
 njrkaujqqtq4QA4d7iPC3pXTn0mFC64+9lsBS67YG+qSKq/nM1Grjsw+eANTwKqZ
 +uBPJzDAEkqlqVQ3H16tLFb631agNEfgE0+KyLDufMNlahZ9n4+lBJWBKoeKXLCW
 2OGjhq3MeIVZbvVtpnoVJBlxmECGr+d5PfuZc9Nn+v3XPWW48RLZg15BlFlV60JQ
 uRTMWfokpTFUEYIO6Rb7J/1Jz2XWgGZzxX3SPVKwLRtk6um/vgtjloD0KFKY9j3P
 YhzMVDyORqV8URk7TYkCYRDYkiOJ7bsJ0RiSirU9i6Mt8hAtW8cMTYcFWRCA/sbA
 6N92E87YyiFLajclR5YVeZeBDjRYeZ6/6rK0MtXcqMQLTU6GfPSTb/D5tJ5BPCyi
 2XI23vPeGtq8cN6dyB39y0l1NcP7/x6wnJesja+zDbOqfkkk07BBbzQey2hD2zBl
 LbM+7G6EQLASbI9lgzCRD/2EbZXi2OkqI3CqBAvw8aYh/t2brDw9+e6ShlnEa5JU
 eQfw1WGhkg==
 =06Zn
 -----END PGP SIGNATURE-----

Merge tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few fixes/changes that should go into this release:

   - null_blk zoned fixes (Damien)

   - blkdev_close() sync improvement (Douglas)

   - Fix regression in blk-iocost that impacted (at least) systemtap
     (Waiman)

   - Comment fix, header removal (Zhiqiang, Jianpeng)"

* tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block:
  null_blk: Cleanup zoned device initialization
  null_blk: Fix zoned command handling
  block: remove unused header
  blk-iocost: Fix error on iocost_ioc_vrate_adj
  bdev: Reduce time holding bd_mutex in sync in blkdev_close()
  buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.
2020-04-24 12:44:19 -07:00
Linus Torvalds
9a19562852 AFS miscellany
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl6jDR0ACgkQ+7dXa6fL
 C2ttIg//Zz6bEpu7BAdvrXmUCfcYbI4gbVRPEFcAz4/z8c05UJXdkps2oVj1sKmb
 hLRBIxArRo7tcdziIdwwk8fckaW1i60wXfsiaAEyxPBuW+oB6fEUqoEmshUjw36u
 lzseygJnyKNKNX8B6MSYz3NQv5kaVefD6UoQ84+3m7Me/AJx9s+LZEUTrvlz5Myy
 BbE19Jnx5SlgqkVyuis6FQ0u+cXUdVleIm3LFzzbaP9syLlsleAJjXU3EPM3/mzK
 BcV77DhMGJhKZ0DhFuUkKE1EUslR4vJiV7gDMdyJKuSTlIU+1IGYWiI6XPyk/BLH
 trpSDHe8DuCCGPmQCQPM4XxfQJVlnKej+sFoUeqCShndkK9ayTuYot5eARbqGj4x
 SEVQ6PWgnLcWtSuxQDIWJBBWZPJZ8/v3yDld0ij95wbGqAywnsiVBt85XPK4Ccje
 ew3urAK52wlQxwy2U+Rn39hzLi6vCx0Z3ncJ/ak5TarcL8txQhCOcKukTB7Wa4Ie
 MKW+IANoYvLgFmbnXLlsBBxpcewNwxQhklMkSx5G+3EnWxXIOqRPumOPxV2UfYrA
 Mgv3F1PZo9Q3SU6eb8lGIYyeho0+6qV/OZzmcy6Xl8nNHeJXZ9eGsSYSlKYUQ7WI
 rum/g7UPBxni7wkyJxrn90yxirFG81Dm4216ThKGSQ6Mu5pDmBA=
 =tTG+
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20200424' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull misc AFS fixes from David Howells:
 "Three miscellaneous fixes to the afs filesystem:

   - Remove some struct members that aren't used, aren't set or aren't
     read, plus a wake up that nothing ever waits for.

   - Actually set the AFS_SERVER_FL_HAVE_EPOCH flag so that the code
     that depends on it can work.

   - Make a couple of waits uninterruptible if they're done for an
     operation that isn't supposed to be interruptible"

* tag 'afs-fixes-20200424' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate
  afs: Fix to actually set AFS_SERVER_FL_HAVE_EPOCH
  afs: Remove some unused bits
2020-04-24 10:32:40 -07:00
David Howells
c4bfda16d1 afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate
When an operation is meant to be done uninterruptibly (such as
FS.StoreData), we should not be allowing volume and server record checking
to be interrupted.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24 16:33:32 +01:00
David Howells
69cf3978f3 afs: Fix to actually set AFS_SERVER_FL_HAVE_EPOCH
AFS keeps track of the epoch value from the rxrpc protocol to note (a) when
a fileserver appears to have restarted and (b) when different endpoints of
a fileserver do not appear to be associated with the same fileserver
(ie. all probes back from a fileserver from all of its interfaces should
carry the same epoch).

However, the AFS_SERVER_FL_HAVE_EPOCH flag that indicates that we've
received the server's epoch is never set, though it is used.

Fix this to set the flag when we first receive an epoch value from a probe
sent to the filesystem client from the fileserver.

Fixes: 3bf0fb6f33 ("afs: Probe multiple fileservers simultaneously")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24 16:32:49 +01:00
David Howells
be59167c8f afs: Remove some unused bits
Remove three bits:

 (1) afs_server::no_epoch is neither set nor used.

 (2) afs_server::have_result is set and a wakeup is applied to it, but
     nothing looks at it or waits on it.

 (3) afs_vl_dump_edestaddrreq() prints afs_addr_list::probed, but nothing
     sets it for VL servers.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24 16:32:49 +01:00
Al Viro
3815f1be54 dlmfs_file_write(): fix the bogosity in handling non-zero *ppos
'count' is how much you want written, not the final position.
Moreover, it can legitimately be less than the current position...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-23 13:45:27 -04:00
Linus Torvalds
1ddd873948 Fixes:
- Address several use-after-free and memory leak bugs
 
 - Prevent a backchannel livelock
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJemdwYAAoJEDNqszNvZn+XU1oQAKOm9vypO6w252kXdhFSxAlB
 3tMxXALNDrFP3PXsKCa/sKKMRvkUkx+9pdnTuXDPvffd3ZgyB8DzJilryEtiqT4Y
 JsuoWHg2QyNeKUFGmtZ5AsefPaR8WL/aiYPTi1PUqnq4rNPjAgOGgLUv+LME2jFU
 Yx773d5CNHXDq6zv1Au0128URnQZDy/7URdfgX1FhLA8aQWjiG08fhBEGncXjV/X
 mo3RMCwE2uzNRruW7OJyCehb8d+IKBDZ0LEeZDW/ve4hNtL+Ke5eCEoemYtUN07e
 U3gRMB8Pt+55L+ZFP8KJYOtfRx2SkOTMcbASC2z/WECq5vumGmn4WovSSVJFGIUN
 5WVf8ADM2w3RmTFh11Jl5mZnziGRNY/4hAW7PrR4ZDhJxjdKA+iLLd7571kkCE63
 II6qxw/WV7Yz3T6v4BoOcDf1DOylnS1JXqmPGYia2aAhyFZgRVasOVIkB0meaaFe
 zSKzKsTrir1Ru8/xt5zIgyEQwqATp2rwzkoPuTeQZLOht0fsSIGBpD1ZWXUaMAji
 cfojhd4731cvoxMMGG27IMiHTG6rpKneaZ21Z/7/61P+cjHm/ITOLZzzRvhQMQU7
 wuskRf3KTs+3k4x6P9E0qQU1DcJkPSYGq+JDdh389Plald4MLTAZYjIK+J3X35oL
 QNnUeKzr1YhWWqgchthG
 =Zoup
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6

Pull nfsd fixes from Chuck Lever:
 "The first set of 5.7-rc fixes for NFS server issues.

  These were all unresolved at the time the 5.7 window opened, and
  needed some additional time to ensure they were correctly addressed.
  They are ready now.

  At the moment I know of one more urgent issue regarding the NFS
  server. A fix has been tested and is under review. I expect to send
  one more pull request, containing this fix (which now consists of 3
  patches).

  Fixes:

   - Address several use-after-free and memory leak bugs

   - Prevent a backchannel livelock"

* tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6:
  svcrdma: Fix leak of svc_rdma_recv_ctxt objects
  svcrdma: Fix trace point use-after-free race
  SUNRPC: Fix backchannel RPC soft lockups
  SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge
  nfsd: memory corruption in nfsd4_lock()
2020-04-23 09:33:43 -07:00
Linus Torvalds
6f8cd037a5 Description for this pull request:
- several bug fixes(broken mount discard option, remount failure, memory leak)
 - add missing MODULE_ALIAS_FS for automatically loading exfat module.
 - set s_time_gran and truncate atime with exfat timestamp granularity.
 -----BEGIN PGP SIGNATURE-----
 
 iQJMBAABCgA2FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAl6g6qYYHG5hbWphZS5q
 ZW9uQHNhbXN1bmcuY29tAAoJEGcL+wNRRCEIhJgP/jPwy087q3iB41cQKNikcN4e
 Iby7lRtzZca5QQHfQN5UQbJaHGd5uJF/hIbc9K5jom1+1/5MgLYbXsrJgPzH4tef
 3AgDz6R0ufVsToA2Cjt37RtlhvXjxAOOe5NIbL6Zrv+KE4TOKpRa+f5SMRvORjSW
 J+NxJRRSrrqHcH4Th2gbknAsX6QkSxURxFhhiKYqbfsw9Aw8ogOgsT7if0r8aAXh
 J3IU58YDcvjj5JxVdsMT5VAGe8hCMNBMDC+bjRb5/SbhxP4B8pryRSS95sdSuCAQ
 TGNsMrKFpSziseuX0lcor0OS9JHcGlxZ38cP6Eth4i6glOvrxWuU6RcG5MQATwDc
 u17gflAfvaceuAPoCw2lImSBmFXK3GUjbbksQvuXQ7LEAgzguPBju/W3ZbNLAKUa
 P3rOOO7KPVrexMll2N8tg+qyUXznbWtR18RPCGvnT9xrIRygiAqrkvviZK1ua+Yh
 nBBlFDnTLIlpX37Og3o1A/WDyAG2Uhhv0BMfzn/elIlAew0wC7C6HjcGiHn5wLnJ
 r7aac/HunakSXD0JIkpMDW7cqnq6RpByeUOBxqWbyUVRbuQrVMqfXvGmrww+NDz7
 CnWvAJ3GQfVeHV9WMqnqQKFOZns8U31MRDgAvaYqzmUKtVZy/N2O6D0eoNolCd/w
 IHIFbvJyivAG0McFVZhl
 =Ropa
 -----END PGP SIGNATURE-----

Merge tag 'for-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat fixes from Namjae Jeon:

 - several bug fixes(broken mount discard option, remount failure,
   memory leak)

 - add missing MODULE_ALIAS_FS for automatically loading exfat module.

 - set s_time_gran and truncate atime with exfat timestamp granularity.

* tag 'for-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: truncate atimes to 2s granularity
  exfat: properly set s_time_gran
  exfat: remove 'bps' mount-option
  exfat: Unify access to the boot sector
  exfat: add missing MODULE_ALIAS_FS()
  exfat: Fix discard support
2020-04-23 09:31:20 -07:00
Xiyu Yang
1402d17dfd btrfs: fix transaction leak in btrfs_recover_relocation
btrfs_recover_relocation() invokes btrfs_join_transaction(), which joins
a btrfs_trans_handle object into transactions and returns a reference of
it with increased refcount to "trans".

When btrfs_recover_relocation() returns, "trans" becomes invalid, so the
refcount should be decreased to keep refcount balanced.

The reference counting issue happens in one exception handling path of
btrfs_recover_relocation(). When read_fs_root() failed, the refcnt
increased by btrfs_join_transaction() is not decreased, causing a refcnt
leak.

Fix this issue by calling btrfs_end_transaction() on this error path
when read_fs_root() failed.

Fixes: 79787eaab4 ("btrfs: replace many BUG_ONs with proper error handling")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23 17:24:56 +02:00
Xiyu Yang
f6033c5e33 btrfs: fix block group leak when removing fails
btrfs_remove_block_group() invokes btrfs_lookup_block_group(), which
returns a local reference of the block group that contains the given
bytenr to "block_group" with increased refcount.

When btrfs_remove_block_group() returns, "block_group" becomes invalid,
so the refcount should be decreased to keep refcount balanced.

The reference counting issue happens in several exception handling paths
of btrfs_remove_block_group(). When those error scenarios occur such as
btrfs_alloc_path() returns NULL, the function forgets to decrease its
refcnt increased by btrfs_lookup_block_group() and will cause a refcnt
leak.

Fix this issue by jumping to "out_put_group" label and calling
btrfs_put_block_group() when those error scenarios occur.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23 17:24:52 +02:00
Josef Bacik
ef67963dac btrfs: drop logs when we've aborted a transaction
Dave reported a problem where we were panicing with generic/475 with
misc-5.7.  This is because we were doing IO after we had stopped all of
the worker threads, because we do the log tree cleanup on roots at drop
time.  Cleaning up the log tree will always need to do reads if we
happened to have evicted the blocks from memory.

Because of this simply add a helper to btrfs_cleanup_transaction() that
will go through and drop all of the log roots.  This gets run before we
do the close_ctree() work, and thus we are allowed to do any reads that
we would need.  I ran this through many iterations of generic/475 with
constrained memory and I did not see the issue.

  general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  CPU: 2 PID: 12359 Comm: umount Tainted: G        W 5.6.0-rc7-btrfs-next-58 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
  RIP: 0010:btrfs_queue_work+0x33/0x1c0 [btrfs]
  RSP: 0018:ffff9cfb015937d8 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff8eb5e339ed80 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffff8eb5eb33b770 RDI: ffff8eb5e37a0460
  RBP: ffff8eb5eb33b770 R08: 000000000000020c R09: ffffffff9fc09ac0
  R10: 0000000000000007 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b
  R13: ffff9cfb00229040 R14: 0000000000000008 R15: ffff8eb5d3868000
  FS:  00007f167ea022c0(0000) GS:ffff8eb5fae00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f167e5e0cb1 CR3: 0000000138c18004 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   btrfs_end_bio+0x81/0x130 [btrfs]
   __split_and_process_bio+0xaf/0x4e0 [dm_mod]
   ? percpu_counter_add_batch+0xa3/0x120
   dm_process_bio+0x98/0x290 [dm_mod]
   ? generic_make_request+0xfb/0x410
   dm_make_request+0x4d/0x120 [dm_mod]
   ? generic_make_request+0xfb/0x410
   generic_make_request+0x12a/0x410
   ? submit_bio+0x38/0x160
   submit_bio+0x38/0x160
   ? percpu_counter_add_batch+0xa3/0x120
   btrfs_map_bio+0x289/0x570 [btrfs]
   ? kmem_cache_alloc+0x24d/0x300
   btree_submit_bio_hook+0x79/0xc0 [btrfs]
   submit_one_bio+0x31/0x50 [btrfs]
   read_extent_buffer_pages+0x2fe/0x450 [btrfs]
   btree_read_extent_buffer_pages+0x7e/0x170 [btrfs]
   walk_down_log_tree+0x343/0x690 [btrfs]
   ? walk_log_tree+0x3d/0x380 [btrfs]
   walk_log_tree+0xf7/0x380 [btrfs]
   ? plist_requeue+0xf0/0xf0
   ? delete_node+0x4b/0x230
   free_log_tree+0x4c/0x130 [btrfs]
   ? wait_log_commit+0x140/0x140 [btrfs]
   btrfs_free_log+0x17/0x30 [btrfs]
   btrfs_drop_and_free_fs_root+0xb0/0xd0 [btrfs]
   btrfs_free_fs_roots+0x10c/0x190 [btrfs]
   ? do_raw_spin_unlock+0x49/0xc0
   ? _raw_spin_unlock+0x29/0x40
   ? release_extent_buffer+0x121/0x170 [btrfs]
   close_ctree+0x289/0x2e6 [btrfs]
   generic_shutdown_super+0x6c/0x110
   kill_anon_super+0xe/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x3a/0x70

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 8c38938c7b ("btrfs: move the root freeing stuff into btrfs_put_root")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23 17:23:11 +02:00
Filipe Manana
5150bf1963 btrfs: fix memory leak of transaction when deleting unused block group
When cleaning pinned extents right before deleting an unused block group,
we check if there's still a previous transaction running and if so we
increment its reference count before using it for cleaning pinned ranges
in its pinned extents iotree. However we ended up never decrementing the
reference count after using the transaction, resulting in a memory leak.

Fix it by decrementing the reference count.

Fixes: fe119a6eeb ("btrfs: switch to per-transaction pinned extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23 17:22:45 +02:00
Paulo Alcantara
0fe0781f29 cifs: fix uninitialised lease_key in open_shroot()
SMB2_open_init() expects a pre-initialised lease_key when opening a
file with a lease, so set pfid->lease_key prior to calling it in
open_shroot().

This issue was observed when performing some DFS failover tests and
the lease key was never randomly generated.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
2020-04-22 20:29:11 -05:00
Paulo Alcantara
3786f4bddc cifs: ensure correct super block for DFS reconnect
This patch is basically fixing the lookup of tcons (DFS specific) during
reconnect (smb2pdu.c:__smb2_reconnect) to update their prefix paths.

Previously, we relied on the TCP_Server_Info pointer
(misc.c:tcp_super_cb) to determine which tcon to update the prefix path

We could not rely on TCP server pointer to determine which super block
to update the prefix path when reconnecting tcons since it might map
to different tcons that share same TCP connection.

Instead, walk through all cifs super blocks and compare their DFS full
paths with the tcon being updated to.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-04-22 20:27:30 -05:00
Paulo Alcantara
65303de829 cifs: do not share tcons with DFS
This disables tcon re-use for DFS shares.

tcon->dfs_path stores the path that the tcon should connect to when
doing failing over.

If that tcon is used multiple times e.g. 2 mounts using it with
different prefixpath, each will need a different dfs_path but there is
only one tcon. The other solution would be to split the tcon in 2
tcons during failover but that is much harder.

tcons could not be shared with DFS in cifs.ko because in a
DFS namespace like:

          //domain/dfsroot -> /serverA/dfsroot, /serverB/dfsroot

          //serverA/dfsroot/link -> /serverA/target1/aa/bb

          //serverA/dfsroot/link2 -> /serverA/target1/cc/dd

you can see that link and link2 are two DFS links that both resolve to
the same target share (/serverA/target1), so cifs.ko will only contain a
single tcon for both link and link2.

The problem with that is, if we (auto)mount "link" and "link2", cifs.ko
will only contain a single tcon for both DFS links so we couldn't
perform failover or refresh the DFS cache for both links because
tcon->dfs_path was set to either "link" or "link2", but not both --
which is wrong.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-22 20:22:08 -05:00
Eric Sandeen
81df1ad406 exfat: truncate atimes to 2s granularity
The timestamp for access_time has double seconds granularity(There is no
10msIncrement field for access_time unlike create/modify_time).
exfat's atimes are restricted to only 2s granularity so after
we set an atime, round it down to the nearest 2s and set the
sub-second component of the timestamp to 0.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22 20:14:06 +09:00
Eric Sandeen
674a9985b8 exfat: properly set s_time_gran
The s_time_gran superblock field indicates the on-disk nanosecond
granularity of timestamps, and for exfat that seems to be 10ms, so
set s_time_gran to 10000000ns. Without this, in-memory timestamps
change when they get re-read from disk.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22 20:14:06 +09:00
Tetsuhiro Kohada
cbd445d9a9 exfat: remove 'bps' mount-option
remount fails because exfat_show_options() returns unsupported
option 'bps'.
> # mount -o ro,remount
> exfat: Unknown parameter 'bps'

To fix the problem, just remove 'bps' option from exfat_show_options().

Signed-off-by: Tetsuhiro Kohada <Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22 20:14:05 +09:00
Tetsuhiro Kohada
b0516833d8 exfat: Unify access to the boot sector
Unify access to boot sector via 'sbi->pbr_bh'.
This fixes vol_flags inconsistency at read failed in fs_set_vol_flags(),
and buffer_head leak in __exfat_fill_super().

Signed-off-by: Tetsuhiro Kohada <Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22 20:14:05 +09:00
Thomas Backlund
cd76ac258c exfat: add missing MODULE_ALIAS_FS()
This adds the necessary MODULE_ALIAS_FS() to exfat so the module gets
automatically loaded when an exfat filesystem is mounted.

Signed-off-by: Thomas Backlund <tmb@mageia.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22 20:14:05 +09:00
Pali Rohár
b7e038a924 exfat: Fix discard support
Discard support was always unconditionally disabled. Now it is disabled
only in the case when blk_queue_discard() returns false.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22 20:14:05 +09:00
Steve French
d92c7ce41e cifs: minor update to comments around the cifs_tcp_ses_lock mutex
Update comment to note that it protects server->dstaddr

Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-21 23:51:18 -05:00
Sudip Mukherjee
db973a7289 coredump: fix null pointer dereference on coredump
If the core_pattern is set to "|" and any process segfaults then we get
a null pointer derefernce while trying to coredump. The call stack shows:

    RIP: do_coredump+0x628/0x11c0

When the core_pattern has only "|" there is no use of trying the
coredump and we can check that while formating the corename and exit
with an error.

After this change I get:

    format_corename failed
    Aborting core

Fixes: 315c69261d ("coredump: split pipe command whitespace before expanding template")
Reported-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Wise <pabs3@bonedaddy.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200416194612.21418-1-sudipm.mukherjee@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-21 11:11:56 -07:00
Jann Horn
bdebd6a283 vmalloc: fix remap_vmalloc_range() bounds checks
remap_vmalloc_range() has had various issues with the bounds checks it
promises to perform ("This function checks that addr is a valid
vmalloc'ed area, and that it is big enough to cover the vma") over time,
e.g.:

 - not detecting pgoff<<PAGE_SHIFT overflow

 - not detecting (pgoff<<PAGE_SHIFT)+usize overflow

 - not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same
   vmalloc allocation

 - comparing a potentially wildly out-of-bounds pointer with the end of
   the vmalloc region

In particular, since commit fc9702273e ("bpf: Add mmap() support for
BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer
dereferences by calling mmap() on a BPF map with a size that is bigger
than the distance from the start of the BPF map to the end of the
address space.

This could theoretically be used as a kernel ASLR bypass, by using
whether mmap() with a given offset oopses or returns an error code to
perform a binary search over the possible address range.

To allow remap_vmalloc_range_partial() to verify that addr and
addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset
to remap_vmalloc_range_partial() instead of adding it to the pointer in
remap_vmalloc_range().

In remap_vmalloc_range_partial(), fix the check against
get_vm_area_size() by using size comparisons instead of pointer
comparisons, and add checks for pgoff.

Fixes: 833423143c ("[PATCH] mm: introduce remap_vmalloc_range()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-21 11:11:56 -07:00
Ma, Jianpeng
d56deb1e4e block: remove unused header
Dax related code already removed from this file.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-21 09:51:10 -06:00
Ronnie Sahlberg
fada37f6f6 cifs: protect updating server->dstaddr with a spinlock
We use a spinlock while we are reading and accessing the destination address for a server.
We need to also use this spinlock to protect when we are modifying this address from
reconn_set_ipaddr().

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-21 09:57:56 -05:00
Nishad Kamdar
317ddf3715 btrfs: discard: Use the correct style for SPDX License Identifier
This patch corrects the SPDX License Identifier style in header file
related to Btrfs File System support.  For C header files
Documentation/process/license-rules.rst mandates C-like comments
(opposed to C source files where C++ style should be used).

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-20 17:43:42 +02:00
Douglas Anderson
b849dd84b6 bdev: Reduce time holding bd_mutex in sync in blkdev_close()
While trying to "dd" to the block device for a USB stick, I
encountered a hung task warning (blocked for > 120 seconds).  I
managed to come up with an easy way to reproduce this on my system
(where /dev/sdb is the block device for my USB stick) with:

  while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done

With my reproduction here are the relevant bits from the hung task
detector:

 INFO: task udevd:294 blocked for more than 122 seconds.
 ...
 udevd           D    0   294      1 0x00400008
 Call trace:
  ...
  mutex_lock_nested+0x40/0x50
  __blkdev_get+0x7c/0x3d4
  blkdev_get+0x118/0x138
  blkdev_open+0x94/0xa8
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3c8
  ...

 ...
 Showing all locks held in the system:
 ...
 1 lock held by dd/2798:
  #0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204
 ...
 dd              D    0  2798   2764 0x00400208
 Call trace:
  ...
  schedule+0x8c/0xbc
  io_schedule+0x1c/0x40
  wait_on_page_bit_common+0x238/0x338
  __lock_page+0x5c/0x68
  write_cache_pages+0x194/0x500
  generic_writepages+0x64/0xa4
  blkdev_writepages+0x24/0x30
  do_writepages+0x48/0xa8
  __filemap_fdatawrite_range+0xac/0xd8
  filemap_write_and_wait+0x30/0x84
  __blkdev_put+0x88/0x204
  blkdev_put+0xc4/0xe4
  blkdev_close+0x28/0x38
  __fput+0xe0/0x238
  ____fput+0x1c/0x28
  task_work_run+0xb0/0xe4
  do_notify_resume+0xfc0/0x14bc
  work_pending+0x8/0x14

The problem appears related to the fact that my USB disk is terribly
slow and that I have a lot of RAM in my system to cache things.
Specifically my writes seem to be happening at ~15 MB/s and I've got
~4 GB of RAM in my system that can be used for buffering.  To write 4
GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds.

The 267 second number is a problem because in __blkdev_put() we call
sync_blockdev() while holding the bd_mutex.  Any other callers who
want the bd_mutex will be blocked for the whole time.

The problem is made worse because I believe blkdev_put() specifically
tells other tasks (namely udev) to go try to access the device at right
around the same time we're going to hold the mutex for a long time.

Putting some traces around this (after disabling the hung task detector),
I could confirm:
 dd:    437.608600: __blkdev_put() right before sync_blockdev() for sdb
 udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb
 dd:    661.468451: __blkdev_put() right after sync_blockdev() for sdb
 udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb

A simple fix for this is to realize that sync_blockdev() works fine if
you're not holding the mutex.  Also, it's not the end of the world if
you sync a little early (though it can have performance impacts).
Thus we can make a guess that we're going to need to do the sync and
then do it without holding the mutex.  We still do one last sync with
the mutex but it should be much, much faster.

With this, my hung task warnings for my test case are gone.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-20 09:31:20 -06:00
Andreas Gruenbacher
7648f939cb nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
nfs3_set_acl keeps track of the acl it allocated locally to determine if an acl
needs to be released at the end.  This results in a memory leak when the
function allocates an acl as well as a default acl.  Fix by releasing acls
that differ from the acl originally passed into nfs3_set_acl.

Fixes: b7fa0554cf ("[PATCH] NFS: Add support for NFSv3 ACLs")
Reported-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-20 09:58:59 -04:00
Trond Myklebust
4d8948c733 NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
If the credential returned by pnfs_prepare_layoutreturn()
does not match the credential of the RPC call, then we do
end up calling pnfs_send_layoutreturn() with that credential,
so don't free it!

Fixes: 44ea8dfce0 ("NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-19 23:53:52 -04:00
Trond Myklebust
7bcc10585b NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
We require that any outstanding layout return completes before we can
free up the inode so that the layout itself can be freed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-19 19:27:26 -04:00
Xiaoguang Wang
44575a6731 io_uring: only restore req->work for req that needs do completion
When testing io_uring IORING_FEAT_FAST_POLL feature, I got below panic:
BUG: kernel NULL pointer dereference, address: 0000000000000030
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 5 PID: 2154 Comm: io_uring_echo_s Not tainted 5.6.0+ #359
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:io_wq_submit_work+0xf/0xa0
Code: ff ff ff be 02 00 00 00 e8 ae c9 19 00 e9 58 ff ff ff 66 0f 1f
84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 2f <8b>
45 30 48 8d 9d 48 ff ff ff 25 01 01 00 00 83 f8 01 75 07 eb 2a
RSP: 0018:ffffbef543e93d58 EFLAGS: 00010286
RAX: ffffffff84364f50 RBX: ffffa3eb50f046b8 RCX: 0000000000000000
RDX: ffffa3eb0efc1840 RSI: 0000000000000006 RDI: ffffa3eb50f046b8
RBP: 0000000000000000 R08: 00000000fffd070d R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa3eb50f046b8
R13: ffffa3eb0efc2088 R14: ffffffff85b69be0 R15: ffffa3eb0effa4b8
FS:  00007fe9f69cc4c0(0000) GS:ffffa3eb5ef40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 0000000020410000 CR4: 00000000000006e0
Call Trace:
 task_work_run+0x6d/0xa0
 do_exit+0x39a/0xb80
 ? get_signal+0xfe/0xbc0
 do_group_exit+0x47/0xb0
 get_signal+0x14b/0xbc0
 ? __x64_sys_io_uring_enter+0x1b7/0x450
 do_signal+0x2c/0x260
 ? __x64_sys_io_uring_enter+0x228/0x450
 exit_to_usermode_loop+0x87/0xf0
 do_syscall_64+0x209/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x7fe9f64f8df9
Code: Bad RIP value.

task_work_run calls io_wq_submit_work unexpectedly, it's obvious that
struct callback_head's func member has been changed. After looking into
codes, I found this issue is still due to the union definition:
    union {
        /*
         * Only commands that never go async can use the below fields,
         * obviously. Right now only IORING_OP_POLL_ADD uses them, and
         * async armed poll handlers for regular commands. The latter
         * restore the work, if needed.
         */
        struct {
            struct callback_head	task_work;
            struct hlist_node	hash_node;
            struct async_poll	*apoll;
        };
        struct io_wq_work	work;
    };

When task_work_run has multiple work to execute, the work that calls
io_poll_remove_all() will do req->work restore for  non-poll request
always, but indeed if a non-poll request has been added to a new
callback_head, subsequent callback will call io_async_task_func() to
handle this request, that means we should not do the restore work
for such non-poll request. Meanwhile in io_async_task_func(), we should
drop submit ref when req has been canceled.

Fix both issues.

Fixes: b1f573bd15 ("io_uring: restore req->work when canceling poll request")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

Use io_double_put_req()

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-19 13:55:27 -06:00
Linus Torvalds
3e0dea5768 An update for the proc interface of time namespaces: Use symbolic names
instead of clockid numbers. The usability nuisance of numbers was noticed
 by Michael when polishing the man page.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6cVQsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWBjEAC0dCUHKDLoG0FeyG4tb4FEBW2iTqM8
 UFirH26K18s8QSePdvfJlaxtN2SdfNZG7UgYN7wz1fDFQy05zTz7Rek8UrDuu3rh
 mVph/UZtUJl+6ypW2Lw9x5RWpT5yzay2iowUyBPnNxU9F/0uRKvXQFju3L83Lo/z
 Z4ni7gVEw87dQi5E74tEv6iaydgPuCBpGxoMahotnHyclqMjA0QuAK6nhN5ZTcAn
 senoorS/VqkSF5qEvIUwe7+F+kkMbwQryT7merJyNwh/F49xTTXRyBmiys1MF8Og
 MTEvldXKy2pCh2UfRa/x84WWwOUVNivTXdIXjhalsblczL0j1z9MsQ8b3AOXOiLf
 S+/Ntbb2dGo4qE22jekMwZ54Pm4x5NzChCU8+3pvd6IrPWZKi6vue74Kd0RNHQg/
 0kWOlZnIP2ArVW0bFqV6jhMYkjmVdK6gm7cUpFV66L2H8zbfFuc4OlxJYEFYivye
 9Yck+rFQmMwA15ZXYIpggkd7Rf/5CGF1CiMBAvP/ILubpgbJqnn6/tGByq8tDKdy
 mqXX+NHF0M/7rJd5vr7wP6p3E5nQ9l/41rh9ii9EDLXf4jsWVO3EyobJ7fFHwprs
 5tTWGxVJymUQLq/LQPXOVVENGK+ZsXXNGn/4n8IOVroeypxADTGyhtSh122kFFhv
 jPcVHqpBUd0g4Q==
 =slEk
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull time namespace fix from Thomas Gleixner:
 "An update for the proc interface of time namespaces: Use symbolic
  names instead of clockid numbers. The usability nuisance of numbers
  was noticed by Michael when polishing the man page"

* tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets
2020-04-19 11:46:21 -07:00
Linus Torvalds
439f1da923 Miscellaneous bug fixes and cleanups for ext4, including a fix for
generic/388 in data=journal mode, removing some BUG_ON's, and cleaning
 up some compiler warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl6cj80ACgkQ8vlZVpUN
 gaOx5Qf/XY7JUEp1nGgcdZyUd8uho3dKkG4TuUU5PvGsiDb4ozGsyU51q2LnOHWF
 uzDJaE03z5uc1i8C9mQRLzjzaOC8B8kQZuKfkcQ/xI4CS3cG4qRdeNdHUz5QyfhK
 5THDzr2z1tuWDuhlp+jCPjCz1fJowHxva/7ktf1OrMVEErYlZXT8CPLIRBCeuuCX
 /07/8tJ5jJoqpI3kmy1jFotMEhIBE0vixf+sfcp2RWjdb0/1LH2JPWCytX+hhSFR
 SadWDvTIvVy/rMahLHgc/VyPP47QwLWzBmLm9CdyxmDeUaM4Qwx8Zfog4+8g78wl
 IvSuHRDdTYnOO35Qbzjl2wanhzCiQQ==
 =qzEh
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous bug fixes and cleanups for ext4, including a fix for
  generic/388 in data=journal mode, removing some BUG_ON's, and cleaning
  up some compiler warnings"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: convert BUG_ON's to WARN_ON's in mballoc.c
  ext4: increase wait time needed before reuse of deleted inode numbers
  ext4: remove set but not used variable 'es' in ext4_jbd2.c
  ext4: remove set but not used variable 'es'
  ext4: do not zeroout extents beyond i_disksize
  ext4: fix return-value types in several function comments
  ext4: use non-movable memory for superblock readahead
  ext4: use matching invalidatepage in ext4_writepage
2020-04-19 11:05:15 -07:00
Linus Torvalds
aee0314bc3 Three small smb3 fixes: two debug related, and one fixing a performance problem with 64K pages
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl6b28kACgkQiiy9cAdy
 T1EZ+wwAqHCqrIgelrLFiQwHkMg1KQMBnul3mBuCJ6qxGTyzSVLWBYsfHabLqWmC
 Ann71PFygGc+5R195CcMZ/RAHGTTEbwJP5s/wGwm3wUfqImLPOpMr/jd8rv9GvE2
 atsthBnFlPE+dY5BD9fr7JIWpZxE3yevCtVifyPjA879zzqIoT9lkFcjCNTqV37l
 tRe4JyObxKSrPUUELC30XPFoBGT/Cgcoz+I0JFL+gz8Yt9CEBXL2DKdnZJERbIpm
 t+yjKAYC9QN5eF7kew8Fide4LohH7jL2EAmllWKUTRH1pHNEKgyMbSMm3F2RzoXG
 0R/70stukgXemlsCD2+BSXDZ3smPHwoKq+FftYanHd1pamOQHJMWcQ/tCk8gg9/Z
 Qq0wwBBbVP6HOMwoDOOW53/lwiU/hoR2Re3jy7K0DOGJAFNkxo98oXfT7HJfmKeW
 q1LQvKR7ch3iFaOUkg/Tv+8o3inUuYLUgegCPvM6RkGkG0Mqs8SEkA9AyyqFmBnG
 kY1K83Ct
 =G+Rl
 -----END PGP SIGNATURE-----

Merge tag '5.7-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Three small smb3 fixes: two debug related (helping network tracing for
  SMB2 mounts, and the other removing an unintended debug line on
  signing failures), and one fixing a performance problem with 64K
  pages"

* tag '5.7-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: remove overly noisy debug line in signing errors
  cifs: improve read performance for page size 64KB & cache=strict & vers=2.1+
  cifs: dump the session id and keys also for SMB2 sessions
2020-04-19 11:00:27 -07:00
Linus Torvalds
c0d73a868d Fixes for 5.7:
- Fix a partially uninitialized variable.
 - Teach the background gc threads to apply for fsfreeze protection.
 - Fix some scaling problems when multiple threads try to flush the
   filesystem when we're about to hit ENOSPC.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl6ZSVQACgkQ+H93GTRK
 tOuPfg//XQ9HX0VAd4xYM3uAr50gNIUPMfOjrlUdZfnj+DOxJDb7IbN9t6+NXYU7
 dfVdUeSPy7vwC/JUyVVgBbTfCX1CnQoeNWtg6EAdEF0msJIlbCH4sm+pI2Vofnqp
 1VDT9fU1cmrtz/dtS6teJT49P/uCPCmKRGAcnIJn/E7FZUiDS0je2iwV8jbJtAyo
 xfTHO39t5jBxBRBLRSuJUzYYvvW1ix3zheebLUQZMolnKRkKafWPja1I2N2lRt23
 VnXwEjgFpqkT2OcDk5jljkJLbImHmNNVTc6J7SomtxZfWZDwvVfIHgMUC1OsyvW3
 tJCp/22xAqqkBQS6Gx6qoXQubnqsfka86krq8C/juz5q5Doc7TPClpc4eyY/XZ0+
 q3/67K9Z5MbudUQRmDBrNqmBBiI93qVB6DmeDLvQbBIIBDNFcWTRar0WB+/s/i3S
 V4BMTyGfwU7u6ZSVzx+W619uLfgwH1mG4uzDg4xk4b4Uia3+/3zjJkh2WzrT98eq
 N+jwQr5MbWyxmjbFtcsO6ZUqlh7X5RXmjFXBAZjauVwCQAaSvnHR2SdyAvUrD2bG
 V2ujYVJ8dAJjXeS/9ILWW+oo/tQTlmmUE898oP6ZljuSYj/ONLqM4AMUoR4Ie1Vp
 BTuRr0VkAoJH2yTK/OTXYe6mBCFSyrp2l3CEC7EDLrCRQQInbRo=
 =YkcH
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.7-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "The three commits here fix some livelocks and other clashes with
  fsfreeze, a potential corruption problem, and a minor race between
  processes freeing and allocating space when the filesystem is near
  ENOSPC.

  Summary:

   - Fix a partially uninitialized variable.

   - Teach the background gc threads to apply for fsfreeze protection.

   - Fix some scaling problems when multiple threads try to flush the
     filesystem when we're about to hit ENOSPC"

* tag 'xfs-5.7-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: move inode flush to the sync workqueue
  xfs: fix partially uninitialized structure in xfs_reflink_remap_extent
  xfs: acquire superblock freeze protection on eofblocks scans
2020-04-18 11:46:39 -07:00
Zhiqiang Liu
c4b4c2a78a buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.
free_more_memory func has been completely removed in commit bc48f001de
("buffer: eliminate the need to call free_more_memory() in __getblk_slow()")

So comment and `WB_REASON_FREE_MORE_MEM` reason about free_more_memory
are no longer needed.

Fixes: bc48f001de ("buffer: eliminate the need to call free_more_memory() in __getblk_slow()")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-17 21:38:11 -06:00