Add a helper function to get rid of buffers that we have decided are
corrupt after the verifiers have run. This function is intended to
handle metadata checks that can't happen in the verifiers, such as
inter-block relationship checking. Note that we now mark the buffer
stale so that it will not end up on any LRU and will be purged on
release.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Use the xfs_da_get_buf_daddr function directly for the two callers
that pass a mapped disk address, and then remove the mappedbno argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Move the code for reading an already mapped block into
xfs_da3_node_read_mapped, which is the only caller ever passing a block
number in the mappedbno argument and replace the mappedbno argument with
the simple xfs_dabuf_get flags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This argument is always hard coded to -1, so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Instead of causing a relatively expensive indirect call for each
hashing and comparism of a file name in a directory just use an
inline function and a simple branch on the ASCII CI bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix unused variable warning]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Convert the last of the open coded corruption check and report idioms to
use the XFS_IS_CORRUPT macro.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There is no real need for xfs_dir2_data_freescan wrapper, so rename
xfs_dir2_data_freescan_int to xfs_dir2_data_freescan and let the
callers dereference the mount pointer from the inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Replace the ->data_get_ftype and ->data_put_ftype dir ops methods with
directly called xfs_dir2_data_get_ftype and xfs_dir2_data_put_ftype
helpers that takes care of the differences between the directory format
with and without the file type field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Replace the ->data_bestfree_p dir ops method with a directly called
xfs_dir2_data_bestfree_p helper that takes care of the differences
between the v4 and v5 on-disk format.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Move the data block fixed offsets towards our structure for dir/attr
geometry parameters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Replace the ->data_entry_tag_p dir ops method with a directly called
xfs_dir2_data_entry_tag_p helper that takes care of the differences
between the directory format with and without the file type field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Replace the ->data_entsize dir ops method with a directly called
xfs_dir2_data_entsize helper that takes care of the differences between
the directory format with and without the file type field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that the max bests value is in struct xfs_da_geometry both instances
of ->db_to_fdb and ->db_to_fdindex are identical. Replace them with
local xfs_dir2_db_to_fdb and xfs_dir2_db_to_fdindex functions in
xfs_dir2_node.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Move the max free bests count towards our structure for dir/attr
geometry parameters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Move the free header size towards our structure for dir/attr geometry
parameters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All but two callers of the ->free_bests_p dir operation already have a
struct xfs_dir3_icfree_hdr from a previous call to
xfs_dir2_free_hdr_from_disk at hand. Add a pointer to the bests to
struct xfs_dir3_icfree_hdr to clean up this pattern. To optimize this
pattern, pass the struct xfs_dir3_icfree_hdr to xfs_dir2_free_log_bests
instead of recalculating the pointer there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Return the xfs_dir3_icfree_hdr used by the helpers called from
xfs_dir2_node_addname_int to the main function to prepare for the
next round of changes where we'll use the ichdr in xfs_dir3_icfree_hdr
to avoid extra operations to find the bests pointers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Replace the ->free_hdr_to_disk dir ops method with a directly called
xfs_dir2_free_hdr_to_disk helper that takes care of the differences
between the v4 and v5 on-disk format.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Replace the ->free_hdr_from_disk dir ops method with a directly called
xfs_dir_free_hdr_from_disk helper that takes care of the differences
between the v4 and v5 on-disk format.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Move the max leaf entries count towards our structure for dir/attr
geometry parameters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Move the leaf header size towards our structure for dir/attr geometry
parameters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All callers of the ->node_tree_p dir operation already have a struct
xfs_dir3_icleaf_hdr from a previous call to xfs_da_leaf_hdr_from_disk at
hand, or just need slight changes to the calling conventions to do so.
Add a pointer to the entries to struct xfs_dir3_icleaf_hdr to clean up
this pattern. To make this possible the xfs_dir3_leaf_log_ents function
grow a new argument to pass the xfs_dir3_icleaf_hdr that call callers
already have, and xfs_dir2_leaf_lookup_int returns the
xfs_dir3_icleaf_hdr to the callers so that they can later use it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Replace the ->leaf_hdr_to_disk dir ops method with a directly called
xfs_dir_leaf_hdr_to_disk helper that takes care of the differences
between the v4 and v5 on-disk format.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Replace the ->leaf_hdr_from_disk dir ops method with a directly called
xfs_dir2_leaf_hdr_from_disk helper that takes care of the differences
between the v4 and v5 on-disk format.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Make sure we log something to dmesg whenever we return -EFSCORRUPTED up
the call stack.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When a directory is growing rapidly, new blocks tend to get added at
the end of the directory. These end up at the end of the freespace
index, and when the directory gets large finding these new
freespaces gets expensive. The code does a linear search across the
frespace index from the first block in the directory to the last,
hence meaning the newly added space is the last index searched.
Instead, do a reverse order index search, starting from the last
block and index in the freespace index. This makes most lookups for
free space on rapidly growing directories O(1) instead of O(N), but
should not have any impact on random insert workloads because the
average search length is the same regardless of which end of the
array we start at.
The result is a major improvement in large directory grow rates:
create time(sec) / rate (files/s)
File count vanilla Prev commit Patched
10k 0.41 / 24.3k 0.42 / 23.8k 0.41 / 24.3k
20k 0.74 / 27.0k 0.76 / 26.3k 0.75 / 26.7k
100k 3.81 / 26.4k 3.47 / 28.8k 3.27 / 30.6k
200k 8.58 / 23.3k 7.19 / 27.8k 6.71 / 29.8k
1M 85.69 / 11.7k 48.53 / 20.6k 37.67 / 26.5k
2M 280.31 / 7.1k 130.14 / 15.3k 79.55 / 25.2k
10M 3913.26 / 2.5k 552.89 / 18.1k
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When running a "create millions inodes in a directory" test
recently, I noticed we were spending a huge amount of time
converting freespace block headers from disk format to in-memory
format:
31.47% [kernel] [k] xfs_dir2_node_addname
17.86% [kernel] [k] xfs_dir3_free_hdr_from_disk
3.55% [kernel] [k] xfs_dir3_free_bests_p
We shouldn't be hitting the best free block scanning code so hard
when doing sequential directory creates, and it turns out there's
a highly suboptimal loop searching the the best free array in
the freespace block - it decodes the block header before checking
each entry inside a loop, instead of decoding the header once before
running the entry search loop.
This makes a massive difference to create rates. Profile now looks
like this:
13.15% [kernel] [k] xfs_dir2_node_addname
3.52% [kernel] [k] xfs_dir3_leaf_check_int
3.11% [kernel] [k] xfs_log_commit_cil
And the wall time/average file create rate differences are
just as stark:
create time(sec) / rate (files/s)
File count vanilla patched
10k 0.41 / 24.3k 0.42 / 23.8k
20k 0.74 / 27.0k 0.76 / 26.3k
100k 3.81 / 26.4k 3.47 / 28.8k
200k 8.58 / 23.3k 7.19 / 27.8k
1M 85.69 / 11.7k 48.53 / 20.6k
2M 280.31 / 7.1k 130.14 / 15.3k
The larger the directory, the bigger the performance improvement.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Simplify the logic in xfs_dir2_node_addname_int() by factoring out
the free block index lookup code that finds a block with enough free
space for the entry to be added. The code that is moved gets a major
cleanup at the same time, but there is no algorithm change here.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Factor out the code that adds a data block to a directory from
xfs_dir2_node_addname_int(). This makes the code flow cleaner and
more obvious and provides clear isolation of upcoming optimsations.
Signed-off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This gets rid of the need for a forward declaration of the static
function xfs_dir2_addname_int() and readies the code for factoring
of xfs_dir2_addname_int().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Continue our game of replacing ASSERTs for corrupt ondisk metadata with
EFSCORRUPTED returns.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.
nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this. I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We need to derive the mount pointer from a buffer in a lot of place.
Add a direct pointer to short cut the pointer chasing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The inode geometry structure isn't related to ondisk format; it's
support for the mount structure. Move it to xfs_shared.h.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Remove typedefs and consolidate local variable initialization.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
When building with -Wsometimes-uninitialized, Clang warns:
fs/xfs/libxfs/xfs_dir2_node.c:481:6: warning: variable 'lowstale' is
used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
fs/xfs/libxfs/xfs_dir2_node.c:481:6: warning: variable 'highstale' is
used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
While it isn't technically wrong, it isn't a problem in practice because
highstale and lowstale are only initialized in xfs_dir2_leafn_add when
compact is not zero then they are passed to xfs_dir3_leaf_find_entry,
where they are initialized before use when compact is zero. Regardless,
it's better not to be passing around uninitialized stack memory so zero
initialize these variables, which silences this warning.
Link: https://github.com/ClangBuiltLinux/linux/issues/393
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Most buffer verifiers have hardcoded magic value checks
conditionalized on the version of the filesystem. The magic value
field of the verifier structure facilitates abstraction of some of
this code. Populate the ->magic field of various verifiers to take
advantage of this abstraction. No functional changes.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Make use of the swap macro and remove unnecessary variable *tmp*. This
makes the code easier to read and maintain. Also, slightly refactor some
code.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Remove the verbose license text from XFS files and replace them
with SPDX tags. This does not change the license of any of the code,
merely refers to the common, up-to-date license files in LICENSES/
This change was mostly scripted. fs/xfs/Makefile and
fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
and modified by the following command:
for f in `git grep -l "GNU General" fs/xfs/` ; do
echo $f
cat $f | awk -f hdr.awk > $f.new
mv -f $f.new $f
done
And the hdr.awk script that did the modification (including
detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
is as follows:
$ cat hdr.awk
BEGIN {
hdr = 1.0
tag = "GPL-2.0"
str = ""
}
/^ \* This program is free software/ {
hdr = 2.0;
next
}
/any later version./ {
tag = "GPL-2.0+"
next
}
/^ \*\// {
if (hdr > 0.0) {
print "// SPDX-License-Identifier: " tag
print str
print $0
str=""
hdr = 0.0
next
}
print $0
next
}
/^ \* / {
if (hdr > 1.0)
next
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
next
}
/^ \*/ {
if (hdr > 0.0)
next
print $0
next
}
// {
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
}
END { }
$
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Explicitly pass the buffer length to xfs_corruption_error() instead of
assuming XFS_CORRUPTION_DUMP_LEN so that we avoid dumping off the end
of the buffer.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
In xfs_dir2_data_use_free, we examine on-disk metadata and ASSERT if
it doesn't make sense. Since a carefully crafted fuzzed image can cause
the kernel to crash after blowing a bunch of assertions, let's move
those checks into a validator function and rig everything up to return
EFSCORRUPTED to userspace. Found by lastbit fuzzing ltail.bestcount via
xfs/391.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Yet another round of playing whack-a-mole with directory code that
asserts on corrupt on-disk metadata when it really should be returning
-EFSCORRUPTED instead of ASSERTing. Found by a xfs/391 crash while
lastbit fuzzing of ltail.bestcount.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
By splitting the b_fspriv field into two different fields (b_log_item
and b_li_list). It's possible to get rid of an old ABI workaround, by
using the new b_log_item field to store xfs_buf_log_item separated from
the log items attached to the buffer, which will be linked in the new
b_li_list field.
This way, there is no more need to reorder the log items list to place
the buf_log_item at the beginning of the list, simplifying a bit the
logic to handle buffer IO.
This also opens the possibility to change buffer's log items list into a
proper list_head.
b_log_item field is still defined as a void *, because it is still used
by the log buffers to store xlog_in_core structures, and there is no
need to add an extra field on xfs_buf just for xlog_in_core.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: minor style changes]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Starting with commit 57e734423a ("vsprintf: refactor %pK code out of
pointer"), the behavior of the raw '%p' printk format specifier was
changed to print a 32-bit hash of the pointer value to avoid leaking
kernel pointers into dmesg. For most situations that's good.
This is /undesirable/ behavior when we're trying to debug XFS, however,
so define a PTR_FMT that prints the actual pointer when we're in debug
mode.
Note that %p for tracepoints still prints the raw pointer, so in the
long run we could consider rewriting some of these messages as
tracepoints.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Since %p prepends "0x" to the outputted string, we can drop the prefix.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Expose all metadata structure buffer verifier functions via buf_ops.
These will be used by the online scrub mechanism to look for problems
with buffers that are already sitting around in memory.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Refactor the callers of verifiers to print the instruction address of a
failing check.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Modify each function that checks the contents of a metadata buffer to
return the instruction address of the failing test so that we can report
more precise failure errors to the log.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Since all verification errors also mark the buffer as having an error,
we can combine these two calls. Later we'll add a xfs_failaddr_t
parameter to promote the idea of reporting corruption errors and the
address of the failing check to enable better debugging reports.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Modify the existing dir leafn lasthash function to enable us to
calculate the highest hash value of a leaf1 block. This will be used by
the directory scrubbing code to check the sanity of hashes in leaf1
directory blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>