Remove the xfs_icluster structure and replace with a radix tree lookup.
We don't need to keep a list of inodes in each cluster around anymore as
we can look them up quickly when we need to. The only time we need to do
this now is during inode writeback.
Factor the inode cluster writeback code out of xfs_iflush and convert it
to use radix_tree_gang_lookup() instead of walking a list of inodes built
when we first read in the inodes.
This remove 3 pointers from each xfs_inode structure and the xfs_icluster
structure per inode cluster. Hence we reduce the cache footprint of the
xfs_inodes by between 5-10% depending on cluster sparseness.
To be truly efficient we need a radix_tree_gang_lookup_range() call to
stop searching once we are past the end of the cluster instead of trying
to find a full cluster's worth of inodes.
Before (ia64):
$ cat /sys/slab/xfs_inode/object_size 536
After:
$ cat /sys/slab/xfs_inode/object_size 512
SGI-PV: 977460
SGI-Modid: xfs-linux-melb:xfs-kern:30502a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
If the radix_tree_preload() fails, we need to destroy the inode we just
read in before trying again. This could leak xfs_vnode structures when
there is memory pressure. Noticed by Christoph Hellwig.
SGI-PV: 977823
SGI-Modid: xfs-linux-melb:xfs-kern:30606a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
The log force added in xfs_iget_core() has been a performance issue since
it was introduced for tight loops that allocate then unlink a single file.
under heavy writeback, this can introduce unnecessary latency due tothe
log I/o getting stuck behind bulk data writes.
Fix this latency problem by avoinding the need for the log force by moving
the place we mark linux inode dirty to the transaction commit rather than
on transaction completion.
This also closes a potential hole in the sync code where a linux inode is
not dirty between the time it is modified and the time the log buffer has
been written to disk.
SGI-PV: 972753
SGI-Modid: xfs-linux-melb:xfs-kern:30007a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Get rid of vnode useage in xfs_iget.c and pass Linux inode / xfs_inode
where apropinquate. And kill some useless helpers while we're at it.
SGI-PV: 971186
SGI-Modid: xfs-linux-melb:xfs-kern:29808a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it
just duplicates fields already in xfs_inode, and there is nothing this
abstraction buys us on XFS/Linux. This patch removes it and shrinks source
and binary size of xfs aswell as shrinking the size of xfs_inode by 60/44
bytes in debug/non-debug builds.
SGI-PV: 970852
SGI-Modid: xfs-linux-melb:xfs-kern:29754a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Simplify vnode tracing calls by embedding function name & return addr in
the calling macro.
Also do a lot of vnode->inode renaming for consistency, while we're at it.
SGI-PV: 970335
SGI-Modid: xfs-linux-melb:xfs-kern:29650a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The radix tree based inode caches did away with the inode cluster hashes,
replacing them with a bunch of masking and gang lookups on the radix tree.
This masking got broken when moving the code to per-ag radix trees and
indexing by agino # rather than straight inode number. The result is
clustered inode writeback does not cluster and things can go extremely
slowly when there are lots of inodes to write.
Fix it up by comparing the agino # of the inode we just looked up to the
index of the cluster we are looking for.
Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>
SGI-PV: 972915
SGI-Modid: xfs-linux-melb:xfs-kern:30033a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Now that struct bhv_vfs doesn't have any members left we can kill it and
go directly from the super_block to the xfs_mount everywhere.
SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29509a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Also remove the now dead behavior code.
SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29505a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
It's entirely unused except for ignored arguments in the mrlock
initialization, so remove it.
SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29499a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
struct bhv_vnode is on it's way out, so move the trace buffer to the XFS
inode. Note that this makes the tracing macros rather misnamed, but this
kind of fallout will be fixed up incrementally later on.
SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29498a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
All flags previously handled at the vnode level are not in the xfs_inode
where we already have a flags mechanisms and free bits for flags
previously in the vnode.
SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29495a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
One of the perpetual scaling problems XFS has is indexing it's incore
inodes. We currently uses hashes and the default hash sizes chosen can
only ever be a tradeoff between memory consumption and the maximum
realistic size of the cache.
As a result, anyone who has millions of inodes cached on a filesystem
needs to tunes the size of the cache via the ihashsize mount option to
allow decent scalability with inode cache operations.
A further problem is the separate inode cluster hash, whose size is based
on the ihashsize but is smaller, and so under certain conditions (sparse
cluster cache population) this can become a limitation long before the
inode hash is causing issues.
The following patchset removes the inode hash and cluster hash and
replaces them with radix trees to avoid the scalability limitations of the
hashes. It also reduces the size of the inodes by 3 pointers....
SGI-PV: 969561
SGI-Modid: xfs-linux-melb:xfs-kern:29481a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The previous fixes for the use after free in xfs_iunpin left a nasty log
deadlock when xfslogd unpinned the inode and dropped the last reference to
the inode. the ->clear_inode() method can issue transactions, and if the
log was full, the transaction could push on the log and get stuck trying
to push the inode it was currently unpinning.
To fix this, we provide xfs_iunpin a guarantee that it will always have a
valid xfs_inode <-> linux inode link or a particular flag will be set on
the inode. We then use log forces during lookup to ensure transactions are
completed before we recycle the inode. This ensures that xfs_iunpin will
never use the linux inode after it is being freed, and any lookup on an
inode on the reclaim list will wait until it is safe to attach a new linux
inode to the xfs inode.
SGI-PV: 956832
SGI-Modid: xfs-linux-melb:xfs-kern:27359a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Shailendra Tripathi <stripathi@agami.com>
Signed-off-by: Takenori Nagano <t-nagano@ah.jp.nec.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
The previous attempts to fix the linux inode use-after-free in xfs_iunpin
simply made the problem harder to hit. We actually need complete exclusion
between xfs_reclaim and xfs_iunpin, as well as ensuring that the i_flags
are consistent during both of these functions. Introduce a new spinlock
for exclusion and the i_flags, and fix up xfs_iunpin to use igrab before
marking the inode dirty.
SGI-PV: 952967
SGI-Modid: xfs-linux-melb:xfs-kern:26964a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
One sema to rule them all, one sema to find them...
SGI-PV: 907752
SGI-Modid: xfs-linux-melb:xfs-kern:26911a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
one page.
SGI-PV: 955302
SGI-Modid: xfs-linux-melb:xfs-kern:26800a
Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
is check if semaphore is actually locked, which can be trivially done in
portable way. Code gets more reabable, while we are at it...
SGI-PV: 953915
SGI-Modid: xfs-linux-melb:xfs-kern:26274a
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Nathan Scott <nathans@sgi.com>
millions of inodes cached and has sparse cluster population, removing
inodes from the cluster hash consumes excessive amounts of CPU time.
Reduce the CPU cost by making removal O(1) via use of a double linked list
for the hash chains.
SGI-PV: 951551
SGI-Modid: xfs-linux-melb:xfs-kern:25683a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
fixes crashes under high nfs load
SGI-PV: 941429
SGI-Modid: xfs-linux:xfs-kern:197929a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!