Commit Graph

4878 Commits

Author SHA1 Message Date
Darrick J. Wong
ac503a4cc9 xfs: refactor the geometry structure filling function
Refactor the geometry structure filling function to use the superblock
to fill the fields.  While we're at it, make the function less indenty
and use some whitespace to make the function easier to read.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-08 10:54:48 -08:00
Darrick J. Wong
c368ebcd4c xfs: hoist xfs_fs_geometry to libxfs
Move xfs_fs_geometry to libxfs so that we can clean up the fs geometry
reporting in xfsprogs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-08 10:54:48 -08:00
Darrick J. Wong
b872af2c87 xfs: trace log reservations at mount time
At each mount, emit the transaction reservation type information via
tracepoints.  This makes it easier to compare the log reservation info
calculated by the kernel and xfsprogs so that we can more easily diagnose
minimum log size failures on freshly formatted filesystems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-08 10:54:47 -08:00
Darrick J. Wong
9c712a1346 xfs: dump the first 128 bytes of any corrupt buffer
Increase the corrupt buffer dump to the first 128 bytes since v5
filesystems have larger block headers than before.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:47 -08:00
Darrick J. Wong
d9418ed08a xfs: teach error reporting functions to take xfs_failaddr_t
Convert the two other error reporting functions to take xfs_failaddr_t
when the caller wishes to capture a code pointer instead of the classic
void * pointer.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:47 -08:00
Darrick J. Wong
eebf3cab9c xfs: standardize quota verification function outputs
Rename xfs_dqcheck to xfs_dquot_verify and make it return an
xfs_failaddr_t like every other structure verifier function.
This enables us to check on-disk quotas in the same way that we check
everything else.  Callers are now responsible for logging errors, as
XFS_QMOPT_DOWARN goes away.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:47 -08:00
Darrick J. Wong
eeea798028 xfs: separate dquot repair into a separate function
Move the dquot repair code into a separate function and remove
XFS_QMOPT_DQREPAIR in favor of calling the helper directly.  Remove
other dead code because quotacheck is the only caller of DQREPAIR.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:47 -08:00
Darrick J. Wong
b55725974c xfs: create a new buf_ops pointer to verify structure metadata
Expose all metadata structure buffer verifier functions via buf_ops.
These will be used by the online scrub mechanism to look for problems
with buffers that are already sitting around in memory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:47 -08:00
Darrick J. Wong
8ba92d43d4 xfs: fail out of xfs_attr3_leaf_lookup_int if it looks corrupt
If the xattr leaf block looks corrupt, return -EFSCORRUPTED to userspace
instead of ASSERTing on debug kernels or running off the end of the
buffer on regular kernels.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:47 -08:00
Darrick J. Wong
9cfb9b4747 xfs: provide a centralized method for verifying inline fork data
Replace the current haphazard dir2 shortform verifier callsites with a
centralized verifier function that can be called either with the default
verifier functions or with a custom set.  This helps us strengthen
integrity checking while providing us with flexibility for repair tools.

xfs_repair wants this to be able to supply its own verifier functions
when trying to fix possibly corrupt metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:47 -08:00
Darrick J. Wong
dc042c2d8f xfs: refactor short form directory structure verifier function
Change the short form directory structure verifier function to return
the instruction pointer of a failing check or NULL if everything's ok.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:46 -08:00
Darrick J. Wong
0795e004fd xfs: create structure verifier function for short form symlinks
Create a function to check the structure of short form symlink targets.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:46 -08:00
Darrick J. Wong
1e1bbd8e7e xfs: create structure verifier function for shortform xattrs
Create a function to perform structure verification for short form
extended attributes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:46 -08:00
Darrick J. Wong
71493b839e xfs: move inode fork verifiers to xfs_dinode_verify
Consolidate the fork size and format verifiers to xfs_dinode_verify so
that we can reject bad inodes earlier and in a single place.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:46 -08:00
Darrick J. Wong
50aa90ef03 xfs: verify dinode header first
Move the v3 inode integrity information (crc, owner, metauuid) before we
look at anything else in the inode so that we don't waste time on a torn
write or a totally garbled block.  This makes xfs_dinode_verify more
consistent with the other verifiers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:46 -08:00
Darrick J. Wong
bc1a09b8e3 xfs: refactor verifier callers to print address of failing check
Refactor the callers of verifiers to print the instruction address of a
failing check.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:46 -08:00
Darrick J. Wong
a6a781a58b xfs: have buffer verifier functions report failing address
Modify each function that checks the contents of a metadata buffer to
return the instruction address of the failing test so that we can report
more precise failure errors to the log.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:46 -08:00
Darrick J. Wong
31ca03c92c xfs: refactor xfs_verifier_error and xfs_buf_ioerror
Since all verification errors also mark the buffer as having an error,
we can combine these two calls.  Later we'll add a xfs_failaddr_t
parameter to promote the idea of reporting corruption errors and the
address of the failing check to enable better debugging reports.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:45 -08:00
Darrick J. Wong
9101d3707b xfs: remove XFS_WANT_CORRUPTED_RETURN from dir3 data verifiers
Since __xfs_dir3_data_check verifies on-disk metadata, we can't have it
noisily blowing asserts and hanging the system on corrupt data coming in
off the disk.  Instead, have it return a boolean like all the other
checker functions, and only have it noisily fail if we fail in debug
mode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:45 -08:00
Darrick J. Wong
e1e55aaf1c xfs: refactor short form btree pointer verification
Now that we have xfs_verify_agbno, use it to verify short form btree
pointers instead of open-coding them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:45 -08:00
Darrick J. Wong
8368a6019d xfs: refactor long-format btree header verification routines
Create two helper functions to verify the headers of a long format
btree block.  We'll use this later for the realtime rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:45 -08:00
Darrick J. Wong
59f6fec3bd xfs: remove XFS_FSB_SANITY_CHECK
We already have a function to verify fsb pointers, so get rid of the
last users of the (less robust) macro.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:54:45 -08:00
Darrick J. Wong
d658e72b4a xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode
In xfs_scrub_get_inode, we don't do a good enough job distinguishing
EINVAL returns from xfs_iget w/ IGET_UNTRUSTED -- this can happen if the
passed in inode number is invalid (past eofs, inobt says it isn't an
inode) or if the inum is actually valid but the inode buffer fails
verifier.  In the first case we still want to return ENOENT, but in the
second case we want to capture the corruption error.

Therefore, if xfs_iget returns EINVAL, try the raw imap lookup.  If that
succeeds, we conclude it's a corruption error, otherwise we just bounce
out to userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:49:04 -08:00
Darrick J. Wong
1ad1205e71 xfs: always grab transaction when scrubbing inode
Always allocate a transaction for inode scrubbing, even if the _iget
fails.  This is something that is nice to have now for consistency with
the other scrubbers but will become critical when we get to online
repair where we'll actually use the transaction + raw buffer read to fix
the verifier errors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:49:03 -08:00
Darrick J. Wong
2b9e9b5771 xfs: xfs_scrub_bmap should use for_each_xfs_iext
Refactor xfs_scrub_bmap to use for_each_xfs_iext now that it exists.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:49:03 -08:00
Darrick J. Wong
e5b37faa93 xfs: catch a few more error codes when scrubbing secondary sb
The superblock validation routines return a variety of error codes to
reject a mount request.  For scrub we can assume that the mount
succeeded, so if we see these things appear when scrubbing secondary sb
X, we can treat them all like corruption.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:49:02 -08:00
Darrick J. Wong
5a0f433745 xfs: ignore agfl read errors when not scrubbing agfl
In xfs_scrub_ag_read_headers, if we're not scrubbing the AGFL but
hit a read error reading the AGFL, we should reset the error code
so that it doesn't propagate up into the caller.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:49:02 -08:00
Brian Foster
c017cb5ddf xfs: eliminate duplicate icreate tx reservation functions
The create transaction reservation calculation has two different
branches of code depending on whether the filesystem is a v5 format
fs or older. Each branch considers the max reservation between the
allocation case (new chunk allocation + record insert) and the
modify case (chunk exists, record modification) of inode allocation.

The modify case is the same for both superblock versions with the
exception of the finobt. The finobt helper checks the feature bit,
however, and so the modify case already shares the same code.

Now that inode chunk allocation has been refactored into a helper
that checks the superblock version to calculate the appropriate
reservation for the create transaction, the only remaining
difference between the create and icreate branches is the call to
the finobt helper. As noted above, the finobt helper is a no-op when
the feature is not enabled. Therefore, these branches are
effectively duplicate and can be condensed.

Remove the xfs_calc_create_*() branch of functions and update the
various callers to use the xfs_calc_icreate_*() variant. The latter
creates the same reservation size for v4 create transactions as the
removed branch. As such, this patch does not result in transaction
reservation changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:38 -08:00
Brian Foster
57af33e451 xfs: refactor inode chunk alloc/free tx reservation
The reservation for the various forms of inode allocation is
scattered across several different functions. This includes two
variants of chunk allocation (v5 icreate transactions vs. older
create transactions) and the inode free transaction.

To clean up some of this code and clarify the purpose of specific
allocfree reservations, continue the pattern of defining helper
functions for smaller operational units of broader transactions.
Refactor the reservation into an inode chunk alloc/free helper that
considers the various conditions based on filesystem format.

An inode chunk free involves an extent free and buffer
invalidations. The latter requires reservation for log headers only.
An inode chunk allocation modifies the free space btrees and logs
the chunk on v4 supers. v5 supers initialize the inode chunk using
ordered buffers and so do not log the chunk.

As a side effect of this refactoring, add one more allocfree res to
the ifree transaction. Technically this does not serve a specific
purpose because inode chunks are freed via deferred operations and
thus occur after a transaction roll. tr_ifree has a bit of a history
of tx overruns caused by too many agfl fixups during sustained file
deletion workloads, so add this extra reservation as a form of
padding nonetheless.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:38 -08:00
Brian Foster
f03c78f397 xfs: include an allocfree res for inobt modifications
Analysis of recent reports of log reservation overruns and code
inspection has uncovered that the reservations associated with inode
operations may not cover the worst case scenarios. In particular,
many cases only include one allocfree res. for a particular
operation even though said operations may also entail AGFL fixups
and inode btree block allocations in addition to the actual inode
chunk allocation. This can easily turn into two or three block
allocations (or frees) per operation.

In theory, the only way to define the worst case reservation is to
include an allocfree res for each individual allocation in a
transaction. Since that is impractical (we can perform multiple agfl
fixups per tx and not every allocation results in a full tree
operation), we need to find a reasonable compromise that addresses
the deficiency in practice without blowing out the size of the
transactions.

Since the inode btrees are not filled by the AGFL, record insertion
and removal can directly result in block allocations and frees
depending on the shape of the tree. These allocations and frees
occur in the same transaction context as the inobt update itself,
but are separate from the allocation/free that might be required for
an inode chunk. Therefore, it makes sense to assume that an [f]inobt
insert/remove can directly result in one or more block allocations
on behalf of the tree.

Refactor the inode transaction reservations to include one allocfree
res. per inode btree modification to cover allocations required by
the tree itself. This separates the reservation required to allocate
the inode chunk from the reservation required for inobt record
insertion/removal. Apply the same logic to the finobt. This results
in killing off the finobt modify condition because we no longer
assume that the broader transaction reservation will cover finobt
block allocations and finobt shape changes can occur in either of
the inobt allocation or modify situations.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:37 -08:00
Brian Foster
a606ebdb85 xfs: truncate transaction does not modify the inobt
The truncate transaction does not ever modify the inode btree, but
includes an associated log reservation. Update
xfs_calc_itruncate_reservation() to remove the reservation
associated with inobt updates.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:37 -08:00
Brian Foster
e8341d9f63 xfs: fix up agi unlinked list reservations
The current AGI unlinked list addition and removal reservations do
not reflect the worst case log usage. An unlinked list removal can
log up to two on-disk inode clusters but only includes reservation
for one. An unlinked list addition logs the on-disk cluster but
includes reservation for an in-core inode.

Update the AGI unlinked list reservation helpers to calculate the
correct worst case reservation for the associated operations.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:36 -08:00
Brian Foster
a6f485908d xfs: include inobt buffers in ifree tx log reservation
The tr_ifree transaction handles inode unlinks and inode chunk
frees. The current transaction calculation does not accurately
reflect worst case changes to the inode btree, however. The inobt
portion of the current transaction reservation only covers
modification of a single inobt buffer (for the particular inode
record). This is a historical artifact from the days before XFS
supported full inode chunk removal.

When support for inode chunk removal was added in commit
254f6311ed1b ("Implement deletion of inode clusters in XFS."), the
additional log reservation required for chunk removal was not added
correctly. The new reservation only considered the header overhead
of associated buffers rather than the full contents of the btrees
and AGF and AGFL buffers affected by the transaction. The
reservation for the free space btrees was subsequently fixed up in
commit 5fe6abb82f76 ("Add space for inode and allocation btrees to
ITRUNCATE log reservation"), but the res. for full inobt joins has
never been added.

Further review of the ifree reservation uncovered a couple more
problems:

- The undocumented +2 blocks are intended for the AGF and AGFL, but
  are also not sized correctly and should be logged as full sectors
  (not FSBs).
- The additional single block header is undocumented and serves no
  apparent purpose.

Update xfs_calc_ifree_reservation() to include a full inobt join in
the reservation calculation. Refactor the undocumented blocks
appropriately and fix up the comments to reflect the current
calculation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:36 -08:00
Brian Foster
2c8f626539 xfs: print transaction log reservation on overrun
The transaction dump code displays the content and reservation
consumption of a particular transaction in the event of an overrun.
It currently displays the reservation associated with the
transaction ticket, but not the original reservation attached to the
transaction.

The latter value reflects the original transaction reservation
calculation before additional reservation overhead is assigned, such
as for the CIL context header and potential split region headers.

Update xlog_print_trans() to also print the original transaction
reservation in the event of overrun. This provides a reference point
to identify how much reservation overhead was added to a particular
ticket by xfs_log_calc_unit_res().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:35 -08:00
Darrick J. Wong
29c1c123a3 xfs: scrub inode nsec fields
Check that the nanosecond fields in each timestamp aren't larger
than a billion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-08 10:41:35 -08:00
Eric Sandeen
8e63083762 xfs: move all scrub input checking to xfs_scrub_validate
There were ad-hoc checks for some scrub types but not others;
mark each scrub type with ... it's type, and use that to validate
the allowed and/or required input fields.

Moving these checks out of xfs_scrub_setup_ag_header makes it
a thin wrapper, so unwrap it in the process.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[darrick: add xfs_ prefix to enum, check scrub args after checking type]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:34 -08:00
Eric Sandeen
0a085ddf0e xfs: factor out scrub input checking
Do this before adding more core checks.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:34 -08:00
Eric Sandeen
bfb3e9b926 xfs: explicitly initialize meta_scrub_ops array by type
An implicit mapping to type by order of initialization seems
error-prone, and doesn't lend itself to cscope-ing.

Also add sanity checks about size of array vs. max types,
and a defensive check that ->scrub exists before using it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:33 -08:00
Richard Wareing
a015831596 xfs: Show realtime device stats on statfs calls if realtime flags set
- Reports realtime device free blocks in statfs calls if (realtime)
  inheritance bit is set on the inode of directory, or realtime flag
  in the case of files.  This is a bit more intuitive, especially for
  use-cases which are using a much larger device for the realtime device.
- Add XFS_IS_REALTIME_MOUNT option to gate based on the existence of a
  realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
  option.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Richard Wareing <rwareing@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08 10:41:33 -08:00
Linus Torvalds
12e971b652 Changes since last update:
- Fix resource cleanup of failed quota initialization
 - Fix integer overflow problems wrt s_maxbytes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJaS8yUAAoJEPh/dxk0SrTrrNMP+gLCitWenObhf6uA0Aysb3Vr
 EnhNFaqZA7RRLbQRwLESblvhExp9WTrtFmWOAFh1Q0ETBEIazIGkXfKDeOChxaCY
 LMPb83vQarZoV++HoiBeFbShf39dFw2ufGHyveZwvxk4kgYgQRFzIVZbRTg7CA/C
 nMLPZ9IBDBhEwnCVpH+gKJMcU6j5I9IIePwaEIKnB0o99fsEgZfnM0B4Wl0DRrzn
 nE6DOvkGZiNF4on1J2KgL2rB0r+VEyyMtBTCRs519rEaa8ACFUQDqEqoUIC92SnS
 pD/n9S2JwVH1dLX7cRoiMQcX/r4do83LlK0IvMswApMuNqYRQU6332lwosdgo7KQ
 8+antAlVKuqMAGNvhVWMy1DuaRO5gCqRwL1wpzebNHsw4eRsDD2MNkeLXbM2P2oL
 5OflIrPLMlLORlPtwbJclm8CcnQzQGMAa5yEDJcU1PIWH/urdRd+KqWQ+N0Zfj6m
 J3L4tXDY61hqwZ8BISe+/9iFDooGV/6Ri4mbez4UWiN6UfaKKokaFZzbo2n3VTb9
 Htx5KsrzslfGWAnoeIT9GnyFhT4te9IHT69jl2AorvxpmdXdfOI8TgrzS8TzuKGD
 N6TadC4IZGLLpww+rND6Bywdc8/garmFbck+/nVdMRwNAsZUE+m08OrNFMCqmYms
 p9jIA2tRh94Hu4Awi8hG
 =2rs/
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS fixes from Darrick Wong:
 "I have just a few fixes for bugs and resource cleanup problems this
  week:

   - Fix resource cleanup of failed quota initialization

   - Fix integer overflow problems wrt s_maxbytes"

* tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix s_maxbytes overflow problems
  xfs: quota: check result of register_shrinker()
  xfs: quota: fix missed destroy of qi_tree_lock
2018-01-05 12:59:32 -08:00
Darrick J. Wong
b4d8ad7fd3 xfs: fix s_maxbytes overflow problems
Fix some integer overflow problems if offset + count happen to be large
enough to cause an integer overflow.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-02 10:16:32 -08:00
Aliaksei Karaliou
3a3882ff26 xfs: quota: check result of register_shrinker()
xfs_qm_init_quotainfo() does not check result of register_shrinker()
which was tagged as __must_check recently, reported by sparse.

Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com>
[darrick: move xfs_qm_destroy_quotainos nearer xfs_qm_init_quotainos]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-02 10:16:32 -08:00
Aliaksei Karaliou
2196881566 xfs: quota: fix missed destroy of qi_tree_lock
xfs_qm_destroy_quotainfo() does not destroy quotainfo->qi_tree_lock
while destroys quotainfo->qi_quotaofflock.

Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-02 10:16:32 -08:00
Linus Torvalds
fca0e39b2b Changes since last update:
- Fix a locking problem during xattr block conversion that could lead to
   the log checkpointing thread to try to write an incomplete buffer to
   disk, which leads to a corruption shutdown
 - Fix a null pointer dereference when removing delayed allocation extents
 - Remove post-eof speculative allocations when reflinking a block past
   current inode size so that we don't just leave them there and assert on
   inode reclaim
 - Relax an assert which didn't accurately reflect the way locking works
   and would trigger under heavy io load
 - Avoid infinite loop when cancelling copy on write extents after a
   writeback failure
 - Try to avoid copy on write transaction reservation overflows when
   remapping after a successful write
 - Fix various problems with the copy-on-write reservation automatic
   garbage collection not being cleaned up properly during a ro remount
 - Fix problems with rmap log items being processed in the wrong order,
   leading to corruption shutdowns
 - Fix problems with EFI recovery wherein the "remove any rmapping if
   present" mechanism wasn't actually doing anything, which would lead
   to corruption problems later when the extent is reallocated, leading
   to multiple rmaps for the same extent
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJaO+dwAAoJEPh/dxk0SrTrY8YP/R9AXH3Wt6S2QGGjZfXURa22
 /cioJKFl8hWay00ZT8Zcj4Pdx6R+stvausj5ECDvpdWZG+d28e61c1bxg+bqRYO5
 JWXikWnAa80RQ5uEjOXHoUjAgk6u6YYuQHEuHH/xA0nL4Cw98WLSzLjqk7ZU53rx
 P17dgUWWHta/w8OpxG9UG5pxvNW3VRitiyCMWxa2gzBPncHnCk3fu9lInpDzH9S+
 xakwCRtfiAykoOG/O5pnMg6vw5r6ENwK7DymxXgqF+Vv/HzgMbeJs+9UON2eACtp
 ECHGffN4pXpqWVcGDMs5cWCOfLUEjxCrotMLYpIrdZs5DptmOcOWpQpHWl4JiaXB
 rqAxx3D0Yo+00ENponM01un8UgCXF5gqsDGyTzn99aPpDVqxCJw1XmSdOXRhcnnF
 At2raUkXF+nbqaVwL3Y7ZJuOKs1hi3HpsYwwfvClR8cTFk/BaY6sQ4QnVR0Ggkg6
 8lZxeDb8VdoUjWO11sX1edwGtR8g+p3PSHiUFSnh1JsbP2I0R+TV+j5Y9rMotxFT
 Eq6+Ehp889GeSpEBCrDpMgNIABMjBxoi5JvOwXSUNhF5Rh/1Vf//7v31nXcyVlah
 a95IhCYfQLFMtaYaGr2ElvdO+Qs1+ppsD207I4H86XotjRkvD7U+mJoYm9EaujQX
 jgUDdZEsP5h5DX524VHU
 =i51V
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "Here are some XFS fixes for 4.15-rc5. Apologies for the unusually
  large number of patches this late, but I wanted to make sure the
  corruption fixes were really ready to go.

  Changes since last update:

   - Fix a locking problem during xattr block conversion that could lead
     to the log checkpointing thread to try to write an incomplete
     buffer to disk, which leads to a corruption shutdown

   - Fix a null pointer dereference when removing delayed allocation
     extents

   - Remove post-eof speculative allocations when reflinking a block
     past current inode size so that we don't just leave them there and
     assert on inode reclaim

   - Relax an assert which didn't accurately reflect the way locking
     works and would trigger under heavy io load

   - Avoid infinite loop when cancelling copy on write extents after a
     writeback failure

   - Try to avoid copy on write transaction reservation overflows when
     remapping after a successful write

   - Fix various problems with the copy-on-write reservation automatic
     garbage collection not being cleaned up properly during a ro
     remount

   - Fix problems with rmap log items being processed in the wrong
     order, leading to corruption shutdowns

   - Fix problems with EFI recovery wherein the "remove any rmapping if
     present" mechanism wasn't actually doing anything, which would lead
     to corruption problems later when the extent is reallocated,
     leading to multiple rmaps for the same extent"

* tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: only skip rmap owner checks for unknown-owner rmap removal
  xfs: always honor OWN_UNKNOWN rmap removal requests
  xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order
  xfs: set cowblocks tag for direct cow writes too
  xfs: remove leftover CoW reservations when remounting ro
  xfs: don't be so eager to clear the cowblocks tag on truncate
  xfs: track cowblocks separately in i_flags
  xfs: allow CoW remap transactions to use reserve blocks
  xfs: avoid infinite loop when cancelling CoW blocks after writeback failure
  xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mapping
  xfs: remove dest file's post-eof preallocations before reflinking
  xfs: move xfs_iext_insert tracepoint to report useful information
  xfs: account for null transactions in bunmapi
  xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute
  xfs: add the ability to join a held buffer to a defer_ops
2017-12-22 12:27:27 -08:00
Darrick J. Wong
68c58e9b9a xfs: only skip rmap owner checks for unknown-owner rmap removal
For rmap removal, refactor the rmap owner checks into a separate
function, then skip the checks if we are performing an unknown-owner
removal.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-21 08:48:38 -08:00
Darrick J. Wong
33df3a9cf9 xfs: always honor OWN_UNKNOWN rmap removal requests
Calling xfs_rmap_free with an unknown owner is supposed to remove any
rmaps covering that range regardless of owner.  This is used by the EFI
recovery code to say "we're freeing this, it mustn't be owned by
anything anymore", but for whatever reason xfs_free_ag_extent filters
them out.

Therefore, remove the filter and make xfs_rmap_unmap actually treat it
as a wildcard owner -- free anything that's already there, and if
there's no owner at all then that's fine too.

There are two existing callers of bmap_add_free that take care the rmap
deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
cleanup; convert these to use OWN_NULL (via helpers), and now we really
require that an RUI (if any) gets added to the defer ops before any EFI.

Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
growfs will have to consult directly with the rmap to ensure that there
aren't any rmaps in the grown region.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-21 08:48:38 -08:00
Darrick J. Wong
0525e952dc xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order
Under the deferred rmap operation scheme, there's a certain order in
which the rmap deferred ops have to be queued to maintain integrity
during log replay.  For alloc/map operations that order is cui -> rui;
for free/unmap operations that order is cui -> rui -> efi.  However, the
initial refcount code got the ordering wrong in the free side of things
because it queued refcount free op and an EFI and the refcount free op
queued a rmap free op, resulting in the order cui -> efi -> rui.

If we fail before the efd finishes, the efi recovery will try to do a
wildcard rmap removal and the subsequent rui will fail to find the rmap
and blow up.  This didn't ever happen due to other screws up in handling
unknown owner rmap removals, but those other screw ups broke recovery in
other ways, so fix the ordering to follow the intended rules.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-21 08:48:38 -08:00
Darrick J. Wong
86d692bfad xfs: set cowblocks tag for direct cow writes too
If a user performs a direct CoW write, we end up loading the CoW fork
with preallocated extents.  Therefore, we must set the cowblocks tag so
that they can be cleared out if we run low on space.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-21 08:47:37 -08:00
Darrick J. Wong
10ddf64e42 xfs: remove leftover CoW reservations when remounting ro
When we're remounting the filesystem readonly, remove all CoW
preallocations prior to going ro.  If the fs goes down after the ro
remount, we never clean up the staging extents, which means xfs_check
will trip over them on a subsequent run.  Practically speaking, the next
mount will clean them up too, so this is unlikely to be seen.  Since we
shut down the cowblocks cleaner on remount-ro, we also have to make sure
we start it back up if/when we remount-rw.

Found by adding clonerange to fsstress and running xfs/017.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-21 08:47:32 -08:00
Darrick J. Wong
363e59baa4 xfs: don't be so eager to clear the cowblocks tag on truncate
Currently, xfs_itruncate_extents clears the cowblocks tag if i_cnextents
is zero.  This is wrong, since i_cnextents only tracks real extents in
the CoW fork, which means that we could have some delayed CoW
reservations still in there that will now never get cleaned.

Fix a further bug where we /don't/ clear the reflink iflag if there are
any attribute blocks -- really, it's only safe to clear the reflink flag
if there are no data fork extents and no cow fork extents.

Found by adding clonerange to fsstress in xfs/017.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-21 08:47:28 -08:00