Commit Graph

57176 Commits

Author SHA1 Message Date
Brian Foster
627209fbcc xfs: create delalloc bmapi wrapper for full extent allocation
The writeback delalloc conversion code is racy with respect to
changes in the currently cached file mapping. This stems from the
fact that the bmapi allocation code requires a file range to
allocate and the writeback conversion code assumes the range of the
currently cached mapping is still valid with respect to the fork. It
may not be valid, however, because the ilock is cycled (potentially
multiple times) between the time the cached mapping was populated
and the delalloc conversion occurs.

To facilitate a solution to this problem, create a new
xfs_bmapi_delalloc() wrapper to xfs_bmapi_write() that takes a file
(FSB) offset and attempts to allocate whatever delalloc extent backs
the offset. Use a new bmapi flag to cause xfs_bmapi_write() to set
the range based on the extent backing the bno parameter unless bno
lands in a hole. If bno does land in a hole, fall back to the
current behavior (which may result in an error or quietly skipping
holes in the specified range depending on other parameters). This
patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11 16:07:01 -08:00
Brian Foster
3b35089807 xfs: remove superfluous writeback mapping eof trimming
Now that the cached writeback mapping is explicitly invalidated on
data fork changes, the EOF trimming band-aid is no longer necessary.
Remove xfs_trim_extent_eof() as well since it has no other users.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11 16:07:01 -08:00
Brian Foster
d9252d526b xfs: validate writeback mapping using data fork seq counter
The writeback code caches the current extent mapping across multiple
xfs_do_writepage() calls to avoid repeated lookups for sequential
pages backed by the same extent. This is known to be slightly racy
with extent fork changes in certain difficult to reproduce
scenarios. The cached extent is trimmed to within EOF to help avoid
the most common vector for this problem via speculative
preallocation management, but this is a band-aid that does not
address the fundamental problem.

Now that we have an xfs_ifork sequence counter mechanism used to
facilitate COW writeback, we can use the same mechanism to validate
consistency between the data fork and cached writeback mappings. On
its face, this is somewhat of a big hammer approach because any
change to the data fork invalidates any mapping currently cached by
a writeback in progress regardless of whether the data fork change
overlaps with the range under writeback. In practice, however, the
impact of this approach is minimal in most cases.

First, data fork changes (delayed allocations) caused by sustained
sequential buffered writes are amortized across speculative
preallocations. This means that a cached mapping won't be
invalidated by each buffered write of a common file copy workload,
but rather only on less frequent allocation events. Second, the
extent tree is always entirely in-core so an additional lookup of a
usable extent mostly costs a shared ilock cycle and in-memory tree
lookup. This means that a cached mapping reval is relatively cheap
compared to the I/O itself. Third, spurious invalidations don't
impact ioend construction. This means that even if the same extent
is revalidated multiple times across multiple writepage instances,
we still construct and submit the same size ioend (and bio) if the
blocks are physically contiguous.

Update struct xfs_writepage_ctx with a new field to hold the
sequence number of the data fork associated with the currently
cached mapping. Check the wpc seqno against the data fork when the
mapping is validated and reestablish the mapping whenever the fork
has changed since the mapping was cached. This ensures that
writeback always uses a valid extent mapping and thus prevents lost
writebacks and stale delalloc block problems.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11 16:07:01 -08:00
Brian Foster
9f9bc034b8 xfs: update fork seq counter on data fork changes
The sequence counter in the xfs_ifork structure is only updated on
COW forks. This is because the counter is currently only used to
optimize out repetitive COW fork checks at writeback time.

Tweak the extent code to update the seq counter regardless of the
fork type in preparation for using this counter on data forks as
well.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11 16:07:00 -08:00
Marco Benatto
d519da41e2 xfs: Introduce XFS_PTAG_VERIFIER_ERROR panic mask
Currently we have a few PTAGs in place allowing us to transform a filesystem
error in a BUG() call.  However, we don't have a panic tag for corrupt
metadata, so introduce XFS_PTAG_VERIFIER_ERROR so that the administrator can
use the fs.xfs.panic_mask sysctl knob to convert any error detected by buffer
verifiers into a kernel panic.

Signed-off-by: Marco Benatto <mbenatto@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[darrick: light editing of commit message]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11 16:07:00 -08:00
YueHaibing
e88db81645 xfs: remove duplicated xfs_defer.h
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11 16:07:00 -08:00
Darrick J. Wong
654805367d xfs: check attribute name validity
Check extended attribute entry names for invalid characters.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:40 -08:00
Darrick J. Wong
e5d7d51b34 xfs: check directory name validity
Check directory entry names for invalid characters.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:40 -08:00
Darrick J. Wong
87c9607df2 xfs: fix off-by-one error in rtbitmap cross-reference
Fix an off-by-one error in the realtime bitmap "is used" cross-reference
helper function if the realtime extent size is a single block.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:40 -08:00
Darrick J. Wong
f8c1d7023e xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t
Teach scrub to flag extent maps that exceed the range that can be mapped
with a xfs_dablk_t.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:40 -08:00
Darrick J. Wong
3258cb208c xfs: abort xattr scrub if fatal signals are pending
The extended attribute scrubber should abort the "read all attrs" loop
if there's a fatal signal pending on the process.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:39 -08:00
Darrick J. Wong
f9e63342b8 xfs: consolidate scrub dinode mapping code into a single function
Move all the confusing dinode mapping code that's split between
xchk_iallocbt_check_cluster and xchk_iallocbt_check_cluster_ifree into
the first function so that it's clearer how we find the dinode for a
given inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:39 -08:00
Darrick J. Wong
4539b8a780 xfs: scrub big block inode btrees correctly
Teach scrub how to handle the case that there are one or more inobt
records covering a given inode cluster.  This fixes the operation on big
block filesystems (e.g. 64k blocks, 512 byte inodes).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:39 -08:00
Darrick J. Wong
b9454fe056 xfs: clean up the inode cluster checking in the inobt scrub
The code to check inobt records against inode clusters is a mess of
poorly named variables and unnecessary parameters.  Clean the
unnecessary inode number parameters out of _check_cluster_freemask in
favor of computing them inside the function instead of making the caller
do it.  In xchk_iallocbt_check_cluster, rename the variables to make it
more obvious just what chunk_ino and cluster_ino represent.

Add a tracepoint to make it easier to track each inode cluster as we
scrub it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:39 -08:00
Darrick J. Wong
a1954242fa xfs: hoist inode cluster checks out of loop
Hoist the inode cluster checks out of the inobt record check loop into
a separate function in preparation for refactoring of that loop.  No
functional changes here; that's in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:39 -08:00
Darrick J. Wong
22234c62f9 xfs: check inobt record alignment on big block filesystems
On a big block filesystem, there may be multiple inobt records covering
a single inode cluster.  These records obviously won't be aligned to
cluster alignment rules, and they must cover the entire cluster.  Teach
scrub to check for these things.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:39 -08:00
Darrick J. Wong
c050fdfeb5 xfs: check the ir_startino alignment directly
In xchk_iallocbt_rec, check the alignment of ir_startino by converting
the inode cluster block alignment into units of inodes instead of the
other way around (converting ir_startino to blocks).  This prevents us
from tripping over off-by-one errors in ir_startino which are obscured
by the inode -> block conversion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:38 -08:00
Darrick J. Wong
435dcf0787 xfs: never try to scrub more than 64 inodes per inobt record
Make sure we never check more than XFS_INODES_PER_CHUNK inodes for any
given inobt record since there can be more than one inobt record mapped
to an inode cluster.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11 16:06:38 -08:00
Linus Torvalds
e5a8a11632 for-linus-20190209
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxfARoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjsgEACP8vQzbvsOZOxHKi9Vcd8ziwyjyBebNh4F
 cKOx2Blgv0ReVAqLOVp9VJOJQoVQumV1btaA2YrmevxnCMpNUBpbP6G02tAqe9Z+
 D75FSpZXy4UvcMSlhfc/iB/RMI06benI9LnuL7zbzIQtrbtu+OFRnO6fpQOVGLxT
 Qa1wt/Rgahc48L4aHnIgPn0nyBRsEvuhC6FjI2D8akDaNiaHzwtGbpx7yDdmLNml
 fCzC2uSRJ31bXsO/5/fJorinaJ56r5N8aHaINYwXDv8zd8i94nQZhITAasXub1Km
 0nyuAg/fSzIdkrGmPINTKFaGYsOfRwpS4C4vagreBhzjfolPY0z9sQEQ63gZzDrd
 mAjHPxLTd165OLlR/RxoMC8AjZCZ0/YQaucxUOPkaIHfth5/dy5BFaCkWyA/I7/Z
 VnAyq0SqeL4hgIOGxZM0HeehKx+palNdJNZTcY7vF/7MVPuh5WM6z/FWsFa8k+ss
 B9YN4wchh7I8EVbLmfz9s/eqabRWF3Agh1dE+yAKwt1KIWHaMXWZTRQnj/69fs2e
 s3pwVMiiSz6K/Xnoe12nmQ4K0XeyKNROO78IIGY/Oa0Pe/hzCAaJMRMDsLp5EcJj
 dxpoi1OfGHMGoqYhL6tx6Atq5f6CMDrS28k/D44DHfO7T1qQGVy1A9SY7ZCfM5+c
 HKxTuRh8mg==
 =tuL6
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190209' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request from Christoph, fixing namespace locking when
   dealing with the effects log, and a rapid add/remove issue (Keith)

 - blktrace tweak, ensuring requests with -1 sectors are shown (Jan)

 - link power management quirk for a Smasung SSD (Hans)

 - m68k nfblock dynamic major number fix (Chengguang)

 - series fixing blk-iolatency inflight counter issue (Liu)

 - ensure that we clear ->private when setting up the aio kiocb (Mike)

 - __find_get_block_slow() rate limit print (Tetsuo)

* tag 'for-linus-20190209' of git://git.kernel.dk/linux-block:
  blk-mq: remove duplicated definition of blk_mq_freeze_queue
  Blk-iolatency: warn on negative inflight IO counter
  blk-iolatency: fix IO hang due to negative inflight counter
  blktrace: Show requests without sector
  fs: ratelimit __find_get_block_slow() failure message.
  m68k: set proper major_num when specifying module param major_num
  libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
  nvme-pci: fix rapid add remove sequence
  nvme: lock NS list changes while handling command effects
  aio: initialize kiocb private in case any filesystems expect it.
2019-02-09 10:26:09 -08:00
Linus Torvalds
8c8e62cc98 Driver core fixes for 5.0-rc6
Here are some driver core fixes for 5.0-rc6.
 
 Well, not so much "driver core" as "debugfs".  There's a lot of
 outstanding debugfs cleanup patches coming in through different
 subsystem trees, and in that process the debugfs core was found that it
 really should return errors when something bad happens, to prevent
 random files from showing up in the root of debugfs afterward.  So
 debugfs was fixed up to handle this properly, and then two fixes for
 the relay and blk-mq code was needed as it was making invalid
 assumptions about debugfs return values.
 
 There's also a cacheinfo fix in here that resolves a tiny issue.
 
 All of these have been in linux-next for over a week with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXF069g8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk0+gCgy9PTVAJR5ZbYtWTJOTdBnd7pfqMAoMuGxc+6
 LLEbfSykLRxEf5SeOJun
 =KP8e
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some driver core fixes for 5.0-rc6.

  Well, not so much "driver core" as "debugfs". There's a lot of
  outstanding debugfs cleanup patches coming in through different
  subsystem trees, and in that process the debugfs core was found that
  it really should return errors when something bad happens, to prevent
  random files from showing up in the root of debugfs afterward. So
  debugfs was fixed up to handle this properly, and then two fixes for
  the relay and blk-mq code was needed as it was making invalid
  assumptions about debugfs return values.

  There's also a cacheinfo fix in here that resolves a tiny issue.

  All of these have been in linux-next for over a week with no reported
  problems"

* tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  blk-mq: protect debugfs_create_files() from failures
  relay: check return of create_buf_file() properly
  debugfs: debugfs_lookup() should return NULL if not found
  debugfs: return error values, not NULL
  debugfs: fix debugfs_rename parameter checking
  cacheinfo: Keep the old value if of_property_read_u32 fails
2019-02-08 10:53:44 -08:00
Linus Torvalds
bd5ff862ec Changes since last update:
- Fix cache coherency problem with writeback mappings
 - Fix buffer deadlock when shutting fs down
 - Fix a null pointer dereference when running online repair
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlxYbyEACgkQ+H93GTRK
 tOtLNg//VIiU6w1EgiQBgk8XxN3i9YQLBIzfbtgZGtLJWX8zYnGx9H4iVfZ9UDr8
 eKFoPVLvqZo5mutuot+3+ps5H4z6g9BotS048FJjFoarQwtYqnG3tcFkmLDInKW1
 jTPWBV/P7w+ODyPO082SyQ+Zn9pooyXkPoBbgA+vbQoqIsY5IF7VeFasrFMtRu21
 PEm/CpMxK5VMly+5ceOoqtdlWvRPDfczLfzW/iDZ4Qs2itUqFA6TJo5TD7kd4A/f
 yrjV6H5tWtp0uvBCBDq4W225uVUFVWC+wTrrII6qbvuDBNWfBsQ65GczYubAAu9X
 kdJdY3xj/Br1dk6jLTciCjihbjJ49xaxXfLAokNkh1pjHqHyinB5ALXC0dG4o+eo
 d5y5qo10zt3HZ8Kzr8753SzxRBjGbQhok+ytrBSpX8GckhAXmH5S6WZDFDh6PbJj
 5PSwvL7FNbS4M/Myjl+dwk3kWLVrGV2SglOJxCCsqCZPxNzopIrNf1uazLTZV+/2
 d+G7LQPXSvjK/iLfDQH/6sBIREx0nd3H/6mnmWBg/1xMD6z/Hgn8GJvAE8luRtOi
 usXYcjlkSEOSwxbUC4fCo0CrPp8DOHbrEEO4pavTN+GVIYsIen0ghq/x0HfcOCEu
 XguyRTYdQXnLZNo3zCqmnU4/C1W2L5Oce4IDznH8PEIgqAqXXrQ=
 =XsPH
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "Here are a handful of XFS fixes to fix a data corruption problem, a
  crasher bug, and a deadlock.

  Summary:

   - Fix cache coherency problem with writeback mappings

   - Fix buffer deadlock when shutting fs down

   - Fix a null pointer dereference when running online repair"

* tag 'xfs-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: set buffer ops when repair probes for btree type
  xfs: end sync buffer I/O properly on shutdown error
  xfs: eof trim writeback mapping as soon as it is cached
2019-02-08 10:46:14 -08:00
Linus Torvalds
ee6c0737a0 Two small nfsd bugfixes for 5.0, for an RDMA bug and a file clone bug.
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcXJY6AAoJECebzXlCjuG+scAP/jm9ETUxC3E9ZcVetSLs7vf1
 AxHVsr3E3qf9uViq/+1NBRJ/BKE1Porzi1Uz01ie5MdY6SHWFMxvIvZrTRuEAhbc
 VA9xeRcnX+7RkPWik3sepSUieVy8KgJDMxdE07HwwyzST14I1s5Ev79wo6XiYlTw
 3+ZdZe19Y4owmTkbDiLsxJVsI2Y8b+9BIhZ9/ICRyFzZclnyLdO15HTDr4q7E9cw
 ZEZMOoljX4cjY8cD7tqf68QECxZYm4a8Ba+L6P+oqKajq/6yUrocXA2UG65EMtIQ
 LtMIdpkC+zHFagRQBC3ymqEWTpX9ED6TA4H4ZSdh6UH8NwYXHsxQUYfI4Oetju/B
 iqWBbXpwg2jMNRDhXS/KdNezYcGjYGJZ03dezeSP/GwojoiKALD0iXxWU2zyq0Qs
 crnhmc2j1wZZl5CFXLCYwjDjHbeH6gWGfLuzAv1Q9/jQUitQ2CpC4t1MCRdOlWBt
 cqjCleF6Rd3oVk51BdYPm5OyCHyQjrrXmOsx8aHkY774p7TqsaHBjSreTBjQWhO8
 wAfnr6yS4/21Hfrji52Nf4Q6UJ1FFEWB0wXJhYAHzem1RsPGSbnEDam6Bpd6Bh0d
 ZxAU4spoVhezQPjF4JCmHfiAJ7rbegfltJa669rE5L1kbUYYE6TAAcOm8RTSWoPZ
 iGxkg/en5XiGsOGH9AyB
 =++rV
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Two small nfsd bugfixes for 5.0, for an RDMA bug and a file clone bug"

* tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux:
  svcrdma: Remove max_sge check at connect time
  nfsd: Fix error return values for nfsd4_clone_file_range()
2019-02-07 15:44:45 -07:00
Linus Torvalds
076a3f5537 fuse fixes for 5.0-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXFls+wAKCRDh3BK/laaZ
 PC+hAQDRkyJeAmMzpHwvv/IASqpJgc6HrSzH0p201lDyARcKIAD+MWxZHYP4ltAn
 WVTLIvYT1xsoqGG3plfZ/d1iNbAWcwU=
 =cL/O
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "A fix for a CUSE regression introduced in v4.20, as well as fixes for
  a couple of old bugs"

* tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: decrement NR_WRITEBACK_TEMP on the right page
  fuse: call pipe_buf_release() under pipe lock
  cuse: fix ioctl
  fuse: handle zero sized retrieve correctly
2019-02-07 07:52:08 +00:00
Trond Myklebust
e3fdc89ca4 nfsd: Fix error return values for nfsd4_clone_file_range()
If the parameter 'count' is non-zero, nfsd4_clone_file_range() will
currently clobber all errors returned by vfs_clone_file_range() and
replace them with EINVAL.

Fixes: 42ec3d4c02 ("vfs: make remap_file_range functions take and...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06 15:32:05 -05:00
Tetsuo Handa
43636c804d fs: ratelimit __find_get_block_slow() failure message.
When something let __find_get_block_slow() hit all_mapped path, it calls
printk() for 100+ times per a second. But there is no need to print same
message with such high frequency; it is just asking for stall warning, or
at least bloating log files.

  [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.873324][T15342] b_state=0x00000029, b_size=512
  [  399.878403][T15342] device loop0 blocksize: 4096
  [  399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.890400][T15342] b_state=0x00000029, b_size=512
  [  399.895595][T15342] device loop0 blocksize: 4096
  [  399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.907471][T15342] b_state=0x00000029, b_size=512
  [  399.912506][T15342] device loop0 blocksize: 4096

This patch reduces frequency to up to once per a second, in addition to
concatenating three lines into one.

  [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-06 12:58:56 -07:00
Mike Marshall
ec51f8ee1e aio: initialize kiocb private in case any filesystems expect it.
A recent optimization had left private uninitialized.

Fixes: 2bc4ca9bb6 ("aio: don't zero entire aio_kiocb aio_get_req()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-06 08:04:22 -07:00
Darrick J. Wong
add46b3b02 xfs: set buffer ops when repair probes for btree type
In xrep_findroot_block, we work out the btree type and correctness of a
given block by calling different btree verifiers on root block
candidates.  However, we leave the NULL b_ops while ->verify_read
validates the block, which means that if the verifier calls
xfs_buf_verifier_error it'll crash on the null b_ops.  Fix it to set
b_ops before calling the verifier and unsetting it if the verifier
fails.

Furthermore, improve the documentation around xfs_buf_ensure_ops, which
is the function that is responsible for cleaning up the b_ops state of
buffers that go through xrep_findroot_block but don't match anything.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-03 14:03:59 -08:00
Brian Foster
465fa17f4a xfs: end sync buffer I/O properly on shutdown error
As of commit e339dd8d8b ("xfs: use sync buffer I/O for sync delwri
queue submission"), the delwri submission code uses sync buffer I/O
for sync delwri I/O. Instead of waiting on async I/O to unlock the
buffer, it uses the underlying sync I/O completion mechanism.

If delwri buffer submission fails due to a shutdown scenario, an
error is set on the buffer and buffer completion never occurs. This
can cause xfs_buf_delwri_submit() to deadlock waiting on a
completion event.

We could check the error state before waiting on such buffers, but
that doesn't serialize against the case of an error set via a racing
I/O completion. Instead, invoke I/O completion in the shutdown case
regardless of buffer I/O type.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-03 14:03:06 -08:00
Brian Foster
aa6ee4ab69 xfs: eof trim writeback mapping as soon as it is cached
The cached writeback mapping is EOF trimmed to try and avoid races
between post-eof block management and writeback that result in
sending cached data to a stale location. The cached mapping is
currently trimmed on the validation check, which leaves a race
window between the time the mapping is cached and when it is trimmed
against the current inode size.

For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform
memory allocations, etc.  in the writeback codepath before the
associated mapping is eventually trimmed to i_size. This leaves
enough time for a post-eof truncate and file append before the
cached mapping is trimmed. The former event essentially invalidates
a range of the cached mapping and the latter bumps the inode size
such the trim on the next writepage event won't trim all of the
invalid blocks. fstest generic/464 reproduces this scenario
occasionally and causes a lost writeback and stale delalloc blocks
warning on inode inactivation.

To work around this problem, trim the cached writeback mapping as
soon as it is cached in addition to on subsequent validation checks.
This is a minor tweak to tighten the race window as much as possible
until a proper invalidation mechanism is available.

Fixes: 40214d128e ("xfs: trim writepage mapping to within eof")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-03 14:02:49 -08:00
Linus Torvalds
312b3a93dd for-5.0-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlxWtJgACgkQxWXV+ddt
 WDsDow//ZpnyDwQWvSIfF2UUQOPlcBjbHKuBA7rU0wdybW635QYGR0mqnI+1VnMj
 7ssUkeN6N0a2gQzrUG4Y+zpdzOWv2xQ4jKZ9GMOp9gwyzEFyPkcFXOnmM8UfYtVu
 e3fK65e8BZHmTeu0kGKah4Dt1g0t4fUmhsKR4Pfp5YNJC+zuuGTwUW1K/ZQHXJ+3
 kTHc7WP1lsF7wgaZ+Gl+Kvp8fVrHVdygMVTdRBW8QaBgPLa/KExvK62jW+NmCYhj
 7OIkWdew7e8IXc3Ie5IbOomHAv7IgqqgiO9VO9+n0EpyV4UobUgxrgBKJ+0yc1Ya
 eidbKhMslwUE50y00JVm+vw0gwQHkR+hZDn/mRB6xiIeI8tu/yQIJZ6AhYJXoByR
 cu8+SNO5Z5dOZ1f146ZH8lnkr24tuSnkDUhbRDR5pdb4tAHHej2ALzhbfbwbPEpF
 IverYLw5fOMGeRU/mBsjkVadpSZ4S0HVNU85ERdhLtVLK1PSaY2UkUaA+Ii5y7au
 EYDjaGMflmJ8cAVqgtgedEff2n8OKDnzRZlz4IPLI73MVSITGZkM7PmYmYsLm2Bs
 mDPnmyqR8kzcd1RRtSeZTvqOpAIZG+QUOmD2jiKrchmp54Sz0V/HJWRs3aQybD6Q
 ph0yAcbkvgp/ewe5IFgaI0kcyH7zYdL6GtiI2WUE3/8DObgrgsA=
 =E2PP
 -----END PGP SIGNATURE-----

Merge tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - regression fix: transaction commit can run away due to delayed ref
   waiting heuristic, this is not necessary now because of the proper
   reservation mechanism introduced in 5.0

 - regression fix: potential crash due to use-before-check of an ERR_PTR
   return value

 - fix for transaction abort during transaction commit that needs to
   properly clean up pending block groups

 - fix deadlock during b-tree node/leaf splitting, when this happens on
   some of the fundamental trees, we must prevent new tree block
   allocation to re-enter indirectly via the block group flushing path

 - potential memory leak after errors during mount

* tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: On error always free subvol_name in btrfs_mount
  btrfs: clean up pending block groups when transaction commit aborts
  btrfs: fix potential oops in device_list_add
  btrfs: don't end the transaction for delayed refs in throttle
  Btrfs: fix deadlock when allocating tree block during leaf/node split
2019-02-03 08:48:33 -08:00
Linus Torvalds
b9de6efed2 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "24 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
  autofs: fix error return in autofs_fill_super()
  autofs: drop dentry reference only when it is never used
  fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
  mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
  psi: clarify the Kconfig text for the default-disable option
  mm, memory_hotplug: __offline_pages fix wrong locking
  mm: hwpoison: use do_send_sig_info() instead of force_sig()
  kasan: mark file common so ftrace doesn't trace it
  init/Kconfig: fix grammar by moving a closing parenthesis
  lib/test_kmod.c: potential double free in error handling
  mm, oom: fix use-after-free in oom_kill_process
  mm/hotplug: invalid PFNs from pfn_to_online_page()
  mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
  psi: fix aggregation idle shut-off
  mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
  mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
  oom, oom_reaper: do not enqueue same task twice
  mm: migrate: make buffer_migrate_page_norefs() actually succeed
  kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
  x86_64: increase stack size for KASAN_EXTRA
  ...
2019-02-02 09:32:58 -08:00
Linus Torvalds
33640d718c SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a particularly important one for queryxattr (see xfstests 70 and 117)
-----BEGIN PGP SIGNATURE-----
 
 iQGyBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlxUf8kACgkQiiy9cAdy
 T1Fy8Qv4rlgVRBPcSJpvZKhFjMd5KskVIDnP0tOc5od+Mg+547eAg7UKKnJVDgKS
 OUtiP8uC/UErWvQ214hoXF1sMwoG+rDdTRdIOVDLD2a2gIIJzaIPADhYt4w5kiAX
 nO9NwPwk03KTzNUQmpdFPofbLK4csmJUR4kBNE+XhZTulmw9BkplGXnNkQGjsEdf
 nY6YdfqofE//DmR/KB1BylD0lxDtnfuB5zoELPmM6X4iWtD9W6pKVg23huEv+Dmh
 80LdYt77vI4RKEzOAmsYEpROdAmlCksVQC1nlh5EHOuMN/9TCdpdWRahJWWhbbKr
 /b8TuD+0QUUKEQdGvwM8DM/OCjnwoJphzI8CK1VzRpM4XkZxpQTzng2zVFPNzs4Y
 wwKCsxxTO7ePGg6hE/DTnRn9XJtKbR1nQsIjHcw6rE0ydl+P6YRZDUEehqyRU+jd
 Z09XCPS1+zeUPE6PRE9wlrIt7QdxOynufhaWRKf18b2b4cfKbNGcd+rMMZPagUQe
 x2CMkEU=
 =qX/z
 -----END PGP SIGNATURE-----

Merge tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb3 fixes from Steve French:
 "SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a
  particularly important one for queryxattr (see xfstests 70 and 117)"

* tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number
  CIFS: fix use-after-free of the lease keys
  CIFS: Do not consider -ENODATA as stat failure for reads
  CIFS: Do not count -ENODATA as failure for query directory
  CIFS: Fix trace command logging for SMB2 reads and writes
  CIFS: Fix possible oops and memory leaks in async IO
  cifs: limit amount of data we request for xattrs to CIFSMaxBufSize
  cifs: fix computation for MAX_SMB2_HDR_SIZE
2019-02-01 16:53:01 -08:00
Ian Kent
f585b283e3 autofs: fix error return in autofs_fill_super()
In autofs_fill_super() on error of get inode/make root dentry the return
should be ENOMEM as this is the only failure case of the called
functions.

Link: http://lkml.kernel.org/r/154725123240.11260.796773942606871359.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Pan Bian
63ce5f552b autofs: drop dentry reference only when it is never used
autofs_expire_run() calls dput(dentry) to drop the reference count of
dentry.  However, dentry is read via autofs_dentry_ino(dentry) after
that.  This may result in a use-free-bug.  The patch drops the reference
count of dentry only when it is never used.

Link: http://lkml.kernel.org/r/154725122396.11260.16053424107144453867.stgit@pluto-themaw-net
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Jan Kara
c27d82f52f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).

Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.

Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Alexey Dobriyan
1fde6f21d9 proc: fix /proc/net/* after setns(2)
/proc entries under /proc/net/* can't be cached into dcache because
setns(2) can change current net namespace.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: avoid vim miscolorization]
[adobriyan@gmail.com: write test, add dummy ->d_revalidate hook: necessary if /proc/net/* is pinned at setns time]
  Link: http://lkml.kernel.org/r/20190108192350.GA12034@avx2
Link: http://lkml.kernel.org/r/20190107162336.GA9239@avx2
Fixes: 1da4d377f9 ("proc: revalidate misc dentries")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Mateusz Stępień <mateusz.stepien@netrounds.com>
Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:22 -08:00
Linus Torvalds
9ace868a17 Changes since last update:
- fix page migration when using iomap for pagecache management
 - fix a use-after-free bug in the directio code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlxPK1IACgkQ+H93GTRK
 tOvzeA/+I4bWVmovfV+EGFzHSV6zRsy17v6c4ncMrdia41rhmvxl+sAJgj2+uFCb
 J37cCMpPrmAOz+JGIW7PbCt8uzmwaXOfB9p9N58wM+hSSxtlN+wZFsIaoepOUTDK
 t3e2L7QQxQjN9HXZU0RNUi/zgS3poDfzap7cZ71spBxX5hVd1zVQa0q/o5OXr7OI
 sBlZLsIOhmS8WU2TmfkwzUVi+/FR4dCgyP8eDAGho/KbwvO9sfWzLNxf0U/ORWfA
 JG+2LX42eKKjX6wo39zW1mAXwOBhnLnCqOOgKrDy8XRXPARiNAzLNn0AwhxJSAqD
 z3qE298Oag6gZo0lNJjDXIko5D9y2koWjQe7z4fJ2JYpV5mSyq8/F4XDNO2FKIap
 07p0OBGa1yfwQYmS5TrhJvYwvsHqTNs122jpowqeD3o0Xh64y2TZLEMdhhIsRjga
 +S9OSVQ15JDf/QeI5LPGK6Oc6B3JnRYgrYf7g7DYu4eqEsJ3V3pqtbXzjxGkzUjx
 5xf85ujuRUQoKCPZQ00ewmsfZMfOcaYqfhosx6LvR6ZPvPH3Ex3nLHks5ZV/eXfR
 Auusq6XMiHb5ljukfVCa0WStntUl5gMaRha5QJy1Vg5Zd1ikvTkB5CkJFlODJ8hS
 GaEIa58Gf75dkHOvm4bFEOoy2EcE3I+qdwR+0xgg+6gDlNeZZNQ=
 =KYgc
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fixes from Darrick Wong:
 "A couple of iomap fixes to eliminate some memory corruption and hang
  problems that were reported:

   - fix page migration when using iomap for pagecache management

   - fix a use-after-free bug in the directio code"

* tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: fix a use after free in iomap_dio_rw
  iomap: get/put the page in iomap_page_create/release()
2019-02-01 10:30:18 -08:00
Andreas Gruenbacher
e74c98ca2d gfs2: Revert "Fix loop in gfs2_rbm_find"
This reverts commit 2d29f6b96d.

It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone.  Let's revert this
for now to have more time for a proper fix.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-31 11:45:11 -08:00
Linus Torvalds
937108b093 NFS client fixes for Linux 5.0
Stable bugfix:
 - Fix up return value on fatal errors in nfs_page_async_flush()
 
 Other bugfix:
 - Fix NULL pointer dereference of dev_name
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlxTOEsACgkQ18tUv7Cl
 QOuS2A//U2J1xz2N8R/k9I4puMXss+DpUAfryNRrDul0qL4tsr7UhHzHezJVl17X
 coPGA/YD+voybyT+eYeACCHUhDMNN8gj2KoCMlE1ueWAbiCOxrS4NgFM2djO3lka
 dlfqgSbVS1Z7+KtEEiFGq/HiF6y0WxanMBHnfhllNbXBDE6W0/+EPdgjX7fZF3FF
 AS6QQmruXL/b1/hJasfTsF3wcHs3y+Y23RP85j4F8aYrcWLOyPUhhuzv/o6Zoh37
 fqltMxueWy+2qpn8dBE+9ILuKnUxnIsIwpF4YFhI7XrQlqMIWYMrShiqSDqYeVUP
 3qdX8LtRR2VsNCTDR9HamVtCkbi9DkJRXQA/fChVPiLA+P0W2Q2uiKsNKEijuZdl
 9fvl9aIL/+glczHrZeJTKellFSEocaZ/L5gVmpM6Fk8zyFitP0+nkO40g/qou+A0
 O77A+EK9v4XPe8z87kwrZhphT12QZK2oIPMAZDnjitktbuObip0Wva4w92KnIqK0
 QPIN081oxNF7BnWEUESCTeqXl670lV83Xek1eVHSCTnFOI68riP1YoUQlIhujV/R
 82J+y6HJYtLDj87NuJrAAXtUrtzAPDr39TJr3V2aH0kdpPajUAhkC3gLix13ORyM
 cmP3K1M3U5f3HAElrywqQrGcxaYKN/Hpfb2427vEnbxieTKElVo=
 =ZOXa
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.0-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "This addresses two bugs, one in the error code handling of
  nfs_page_async_flush() and one to fix a potential NULL pointer
  dereference in nfs_parse_devname().

  Stable bugfix:
   - Fix up return value on fatal errors in nfs_page_async_flush()

  Other bugfix:
   - Fix NULL pointer dereference of dev_name"

* tag 'nfs-for-5.0-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix up return value on fatal errors in nfs_page_async_flush()
  nfs: Fix NULL pointer dereference of dev_name
2019-01-31 10:13:05 -08:00
Steve French
b9b9378b49 cifs: update internal module version number
To 2.17

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-31 07:05:06 -06:00
Aurelien Aptel
d339adc12a CIFS: fix use-after-free of the lease keys
The request buffers are freed right before copying the pointers.
Use the func args instead which are identical and still valid.

Simple reproducer (requires KASAN enabled) on a cifs mount:

echo foo > foo ; tail -f foo & rm foo

Cc: <stable@vger.kernel.org> # 4.20
Fixes: 179e44d49c ("smb3: add tracepoint for sending lease break responses to server")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
2019-01-31 07:03:20 -06:00
Waiman Long
af0c9af1b3 fs/dcache: Track & report number of negative dentries
The current dentry number tracking code doesn't distinguish between
positive & negative dentries.  It just reports the total number of
dentries in the LRU lists.

As excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.

This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the 5th field in the
/proc/sys/fs/dentry-state file.  The number, however, does not include
negative dentries that are in flight but not in the LRU yet as well as
those in the shrinker lists which are on the way out anyway.

The number of positive dentries in the LRU lists can be roughly found by
subtracting the number of negative dentries from the unused count.

Matthew Wilcox had confirmed that since the introduction of the
dentry_stat structure in 2.1.60, the dummy array was there, probably for
future extension.  They were not replacements of pre-existing fields.
So no sane applications that read the value of /proc/sys/fs/dentry-state
will do dummy thing if the last 2 fields of the sysctl parameter are not
zero.  IOW, it will be safe to use one of the dummy array entry for
negative dentry count.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-30 11:02:11 -08:00
Waiman Long
1dbd449c99 fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.

The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused.  This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().

To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.

Fixes: 4e717f5c10 ("list_lru: remove special case function list_lru_dispose_all."
Cc: stable@kernel.org
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-30 11:02:11 -08:00
Eric W. Biederman
532b618bdf btrfs: On error always free subvol_name in btrfs_mount
The subvol_name is allocated in btrfs_parse_subvol_options and is
consumed and freed in mount_subvol.  Add a free to the error paths that
don't call mount_subvol so that it is guaranteed that subvol_name is
freed when an error happens.

Fixes: 312c89fbca ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
Cc: stable@vger.kernel.org # v4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-30 18:16:47 +01:00
David Sterba
c7cc64a985 btrfs: clean up pending block groups when transaction commit aborts
The fstests generic/475 stresses transaction aborts and can reveal
space accounting or use-after-free bugs regarding block goups.

In this case the pending block groups that remain linked to the
structures after transaction commit aborts in the middle.

The corrupted slabs lead to failures in following tests, eg. generic/476

  [ 8172.752887] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  [ 8172.755799] #PF error: [normal kernel read fault]
  [ 8172.757571] PGD 661ae067 P4D 661ae067 PUD 3db8e067 PMD 0
  [ 8172.759000] Oops: 0000 [#1] PREEMPT SMP
  [ 8172.760209] CPU: 0 PID: 39 Comm: kswapd0 Tainted: G        W         5.0.0-rc2-default #408
  [ 8172.762495] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
  [ 8172.765772] RIP: 0010:shrink_page_list+0x2f9/0xe90
  [ 8172.770453] RSP: 0018:ffff967f00663b18 EFLAGS: 00010287
  [ 8172.771184] RAX: 0000000000000000 RBX: ffff967f00663c20 RCX: 0000000000000000
  [ 8172.772850] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c0620ab20e0
  [ 8172.774629] RBP: ffff967f00663dd8 R08: 0000000000000000 R09: 0000000000000000
  [ 8172.776094] R10: ffff8c0620ab22f8 R11: ffff8c063f772688 R12: ffff967f00663b78
  [ 8172.777533] R13: ffff8c063f625600 R14: ffff8c063f625608 R15: dead000000000200
  [ 8172.778886] FS:  0000000000000000(0000) GS:ffff8c063d400000(0000) knlGS:0000000000000000
  [ 8172.780545] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8172.781787] CR2: 0000000000000058 CR3: 000000004e962000 CR4: 00000000000006f0
  [ 8172.783547] Call Trace:
  [ 8172.784112]  shrink_inactive_list+0x194/0x410
  [ 8172.784747]  shrink_node_memcg.constprop.85+0x3a5/0x6a0
  [ 8172.785472]  shrink_node+0x62/0x1e0
  [ 8172.786011]  balance_pgdat+0x216/0x460
  [ 8172.786577]  kswapd+0xe3/0x4a0
  [ 8172.787085]  ? finish_wait+0x80/0x80
  [ 8172.787795]  ? balance_pgdat+0x460/0x460
  [ 8172.788799]  kthread+0x116/0x130
  [ 8172.789640]  ? kthread_create_on_node+0x60/0x60
  [ 8172.790323]  ret_from_fork+0x24/0x30
  [ 8172.794253] CR2: 0000000000000058

or accounting errors at umount time:

  [ 8159.537251] WARNING: CPU: 2 PID: 19031 at fs/btrfs/extent-tree.c:5987 btrfs_free_block_groups+0x3d5/0x410 [btrfs]
  [ 8159.543325] CPU: 2 PID: 19031 Comm: umount Tainted: G        W         5.0.0-rc2-default #408
  [ 8159.545472] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
  [ 8159.548155] RIP: 0010:btrfs_free_block_groups+0x3d5/0x410 [btrfs]
  [ 8159.554030] RSP: 0018:ffff967f079cbde8 EFLAGS: 00010206
  [ 8159.555144] RAX: 0000000001000000 RBX: ffff8c06366cf800 RCX: 0000000000000000
  [ 8159.556730] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8c06255ad800
  [ 8159.558279] RBP: ffff8c0637ac0000 R08: 0000000000000001 R09: 0000000000000000
  [ 8159.559797] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8c0637ac0108
  [ 8159.561296] R13: ffff8c0637ac0158 R14: 0000000000000000 R15: dead000000000100
  [ 8159.562852] FS:  00007f7f693b9fc0(0000) GS:ffff8c063d800000(0000) knlGS:0000000000000000
  [ 8159.564839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8159.566160] CR2: 00007f7f68fab7b0 CR3: 000000000aec7000 CR4: 00000000000006e0
  [ 8159.567898] Call Trace:
  [ 8159.568597]  close_ctree+0x17f/0x350 [btrfs]
  [ 8159.569628]  generic_shutdown_super+0x64/0x100
  [ 8159.570808]  kill_anon_super+0x14/0x30
  [ 8159.571857]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [ 8159.573063]  deactivate_locked_super+0x29/0x60
  [ 8159.574234]  cleanup_mnt+0x3b/0x70
  [ 8159.575176]  task_work_run+0x98/0xc0
  [ 8159.576177]  exit_to_usermode_loop+0x83/0x90
  [ 8159.577315]  do_syscall_64+0x15b/0x180
  [ 8159.578339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This fix is based on 2 Josef's patches that used sideefects of
btrfs_create_pending_block_groups, this fix introduces the helper that
does what we need.

CC: stable@vger.kernel.org # 4.4+
CC: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-30 18:16:47 +01:00
Al Viro
92900e5160 btrfs: fix potential oops in device_list_add
alloc_fs_devices() can return ERR_PTR(-ENOMEM), so dereferencing its
result before the check for IS_ERR() is a bad idea.

Fixes: d1a6300282 ("btrfs: add members to fs_devices to track fsid changes")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-30 18:16:40 +01:00
Greg Kroah-Hartman
37ea7b630a debugfs: debugfs_lookup() should return NULL if not found
Lots of callers of debugfs_lookup() were just checking NULL to see if
the file/directory was found or not.  By changing this in ff9fb72bc0
("debugfs: return error values, not NULL") we caused some subsystems to
easily crash.

Fixes: ff9fb72bc0 ("debugfs: return error values, not NULL")
Reported-by: syzbot+b382ba6a802a3d242790@syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-30 12:39:49 +01:00
Pavel Shilovsky
082aaa8700 CIFS: Do not consider -ENODATA as stat failure for reads
When doing reads beyound the end of a file the server returns
error STATUS_END_OF_FILE error which is mapped to -ENODATA.
Currently we report it as a failure which confuses read stats.
Change it to not consider -ENODATA as failure for stat purposes.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-01-29 17:27:16 -06:00
Pavel Shilovsky
8e6e72aece CIFS: Do not count -ENODATA as failure for query directory
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-01-29 17:24:53 -06:00
Pavel Shilovsky
7d42e72fe8 CIFS: Fix trace command logging for SMB2 reads and writes
Currently we log success once we send an async IO request to
the server. Instead we need to analyse a response and then log
success or failure for a particular command. Also fix argument
list for read logging.

Cc: <stable@vger.kernel.org> # 4.18
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-01-29 17:19:56 -06:00