To avoid having to initialise some fields of the XFS inode on every
allocation, we can use the slab init-once feature to initialise them. All
we have to guarantee is that when we free the inode, all it's entries are
in the initial state. Add asserts where possible to ensure debug kernels
check this initial state before freeing and after allocation.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31925a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
[PATCH] kill the rest of struct file propagation in block ioctls
[PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
[PATCH] get rid of blkdev_locked_ioctl()
[PATCH] get rid of blkdev_driver_ioctl()
[PATCH] sanitize blkdev_get() and friends
[PATCH] remember mode of reiserfs journal
[PATCH] propagate mode through swsusp_close()
[PATCH] propagate mode through open_bdev_excl/close_bdev_excl
[PATCH] pass fmode_t to blkdev_put()
[PATCH] kill the unused bsize on the send side of /dev/loop
[PATCH] trim file propagation in block/compat_ioctl.c
[PATCH] end of methods switch: remove the old ones
[PATCH] switch sr
[PATCH] switch sd
[PATCH] switch ide-scsi
[PATCH] switch tape_block
[PATCH] switch dcssblk
[PATCH] switch dasd
[PATCH] switch mtd_blkdevs
[PATCH] switch mmc
...
Now that we've moved the readdir hack to the nfsd code, we can
remove the local version from the XFS code.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
replace open_bdev_excl/close_bdev_excl with variants taking fmode_t.
superblock gets the value used to mount it stored in sb->s_mode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When we skip unrecognized options in xfs_fs_remount we should just break
out of the switch and not return because otherwise we may skip clearing
the xfs-internal read-only flag. This will only show up on some
operations like touch because most read-only checks are done by the VFS
which thinks this filesystem is r/w. Eventually we should replace the
XFS read-only flag with a helper that always checks the VFS flag to make
sure they can never get out of sync.
Bug reported and fix verified by Marcel Beister on #xfs.
Bug fix verified by updated xfstests/189.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Timothy Shimmin <tes@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe its been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we disable barriers as soon as we get a buffer in xlog_iodone
that has the XBF_ORDERED flag cleared. But this can be the case not only
for buffers where the barrier failed, but also the first buffer of a
split log write in case of a log wraparound. Due to the disabled
barriers we can easily get directory corruption on unclean shutdowns.
So instead of using this check add a new buffer flag for failed barrier
writes.
This is a regression vs 2.6.26 caused by patch to use the right macro
to check for the ORDERED flag, as we previously got true returned for
every buffer.
Thanks to Toei Rei for reporting the bug.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: David Chinner <david@fromorbit.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the help from some tracing I found that we try to map extents beyond
eof when doing a direct I/O read. It appears that the way to inform the
generic direct I/O path (ie do_direct_IO()) that we have breached eof is
to return an unmapped buffer from xfs_get_blocks_direct(). This will cause
do_direct_IO() to jump to the hole handling code where is will check for
eof and then abort.
This problem was found because a direct I/O read was trying to map beyond
eof and was encountering delayed allocations. The delayed allocations
beyond eof are speculative allocations and they didn't get converted when
the direct I/O flushed the file because there was only enough space in the
current AG to convert and write out the dirty pages within eof. Note that
xfs_iomap_write_allocate() wont necessarily convert all the delayed
allocation passed to it - it will return after allocating the first extent
- so if the delayed allocation extends beyond eof then it will stay that
way.
SGI-PV: 983683
SGI-Modid: xfs-linux-melb:xfs-kern:31929a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Logically we would return an error in xfs_fs_remount code to prevent users
from believing they might have changed mount options using remount which
can't be changed.
But unfortunately mount(8) adds all options from mtab and fstab to the
mount arguments in some cases so we can't blindly reject options, but have
to check for each specified option if it actually differs from the
currently set option and only reject it if that's the case.
Until that is implemented we return success for every remount request, and
silently ignore all options that we can't actually change.
SGI-PV: 985710
SGI-Modid: xfs-linux-melb:xfs-kern:31908a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
As pointed out during review d_add_ci argument order should match d_add,
so switch the dentry and inode arguments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The patches that are intended to introduce copy-on-write credentials for 2.6.28
require abstraction of access to some fields of the task structure,
particularly for the case of one task accessing another's credentials where RCU
will have to be observed.
Introduced here are trivial no-op versions of the desired accessors for current
and other tasks so that other subsystems can start to be converted over more
easily.
Wrappers are introduced into a new header (linux/cred.h) for UID/GID,
EUID/EGID, SUID/SGID, FSUID/FSGID, cap_effective and current's subscribed
user_struct. These wrappers are macros because the ordering between header
files mitigates against making them inline functions.
linux/cred.h is #included from linux/sched.h.
Further, XFS is modified such that it no longer defines and uses parameterised
versions of current_fs[ug]id(), thus getting rid of the namespace collision
otherwise incurred.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Use KM_NOFS to prevent recursion back into the filesystem which can cause
deadlocks.
In the case of xfs_iread() we hold the lock on the inode cluster buffer
while allocating memory for the trace buffers. If we recurse back into XFS
to flush data that may require a transaction to allocate extents which
needs log space. This can deadlock with the xfsaild thread which can't
push the tail of the log because it is trying to get the inode cluster
buffer lock.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31838a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
xfs_readsb is called before xfs_mount so xfs_freesb should be called after
xfs_unmountfs, too. This means it now happens after a few things during
the of xfs_unmount which all have nothing to do with the superblock.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31835a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Remove all the useless flags and code keyed off it in xfs_mountfs.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31831a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
The root inode is allocated in xfs_mountfs so it should be release in
xfs_unmountfs. For the unmount case that means we do it after the the
xfs_sync(mp, SYNC_WAIT | SYNC_CLOSE) in the forced shutdown case and the
dmapi unmount event. Note that both reference the rip variable which might
be freed by that time in case inode flushing has kicked in, so strictly
speaking this might count as a bug fix
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31830a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_ichtime updates the xfs_inode and Linux inode timestamps just fine, no
need to call file_update_time and then copy the values over to the XFS
inode. The only additional thing in file_update_time are checks not
applicable to the write path.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31829a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Port a little optmization from file_update_time to xfs_ichgtime, and only
update the timestamp and mark the inode dirty if the timestamp actually
changes in the timer tick resultion supported by the running kernel.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31827a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
In xfs_ialloc we just want to set all timestamps to the current time. We
don't need to mark the inode dirty like xfs_ichgtime does, and we don't
need nor want the opimizations in xfs_ichgtime that I will introduce in
the next patch.
So just opencode the timestamp update in xfs_ialloc, and remove the new
unused XFS_ICHGTIME_ACC case in xfs_ichgtime.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31825a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Now that all users of the sema_t are gone from XFS we can finally kill it.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31823a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
The xfs_buf_t b_iodonesema is really just a semaphore that wants to be a
completion. Change it to a completion and remove the last user of the
sema_t from XFS.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31815a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
A lot of code has been converted away from semaphores, but there are still
comments that reference semaphore behaviour. The log code is the worst
offender. Update the comments to reflect what the code really does now.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31814a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Sanitize setting up the Linux indode.
Setting up the xfs_inode <-> inode link is opencoded in xfs_iget_core now
because that's the only place it needs to be done, xfs_initialize_vnode is
renamed to xfs_setup_inode and loses all superflous paramaters. The check
for I_NEW is removed because it always is true and the di_mode check moves
into xfs_iget_core because it's only needed there.
xfs_set_inodeops and xfs_revalidate_inode are merged into xfs_setup_inode
and the whole things is moved into xfs_iops.c where it belongs.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31782a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
All remaining bhv_vnode_t instance are in code that's more or less Linux
specific. (Well, for xfs_acl.c that could be argued, but that code is on
the removal list, too). So just do an s/bhv_vnode_t/struct inode/ over the
whole tree. We can clean up variable naming and some useless helpers
later.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31781a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
In various places we can just move a VFS_I call into the argument list of
called functions/macros instead of having a local bhv_vnode_t.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31776a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Now that all direct calls to VN_HOLD/VN_RELE are gone we can implement
IHOLD/IRELE directly.
For the IHOLD case also replace igrab with a direct increment of i_count
because we are guaranteed to already have a live and referenced inode by
the VFS. Also remove the vn_hold statistic because it's been rather
meaningless for some time with most references done by other callers.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31764a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
bhv_vnode_t is just a typedef for struct inode, so there's
no need for a helper to convert between the two.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31761a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
bhv_vnode_t is just a typedef for struct inode, so there's
no need for a helper to convert between the two.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31760a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Looks like somehow xfs got missed in the conversion that took place in
e231c2ee64, "Convert ERR_PTR(PTR_ERR(p))
instances to ERR_CAST(p)
<http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit
diff;h=e231c2ee64eb1c5cd3c63c31da9dac7d888dcf7f>"
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31757a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Add a helper to free the m_fsname/m_rtname/m_logname allocations and use
it properly for all mount failure cases. Also switch the allocations for
these to kstrdup while we're at it.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31728a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_vtoi() is redundant and only unsed in small sections of code.
Replace them with widely used XFS_I() inline and kill xfs_vtoi().
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31725a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
In several places we directly convert from the XFS inode
to the linux (VFS) inode by a simple deference of ip->i_vnode.
We should not do this - a helper function should be used to
extract the VFS inode from the XFS inode.
Introduce the function VFS_I() to extract the VFS inode
from the XFS inode. The name was chosen to match XFS_I() which
is used to extract the XFS inode from the VFS inode.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31720a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
We should not access a buffer after dropping it's reference count
otherwise we could race with another thread that releases the final
reference count and frees the buffer causing us to access potentially
unmapped memory. The bug this change fixes only occured on DEBUG XFS since
the offending code was in an ASSERT.
SGI-PV: 984429
SGI-Modid: xfs-linux-melb:xfs-kern:31715a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Like the page lock change, this also requires name change, so convert the
raw test_and_set bitop to a trylock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).
This also facilitates lockdeping of page lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These days most of the attributes in struct inode are properly kept in
sync by XFS. This patch removes the need for vn_revalidate completely by:
- keeping inode.i_flags uptodate after any flags are updated in
xfs_ioctl_setattr
- keeping i_mode, i_uid and i_gid uptodate in xfs_setattr
SGI-PV: 984566
SGI-Modid: xfs-linux-melb:xfs-kern:31679a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
it can be switched to take struct iattr directly and thus simplify the
implementation greatly. Also rename the ATTR_ flags to XFS_ATTR_ to not
conflict with the ATTR_ flags used by the VFS.
SGI-PV: 984565
SGI-Modid: xfs-linux-melb:xfs-kern:31678a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
->setattr but also addition XFS-specific attributes: project id, inode
flags and extent size hint. Having these in a single function makes it
more complicated and forces to have us a bhv_vattr intermediate structure
eating up stackspace.
This patch adds a new xfs_ioctl_setattr helper for the XFS ioctls that set
these attributes and remove the code to set them through xfs_setattr.
SGI-PV: 984564
SGI-Modid: xfs-linux-melb:xfs-kern:31677a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Remount currently happily accept any option thrown at it, although the
only filesystem specific option it actually handles is barrier/nobarrier.
And it actually doesn't handle these correctly either because it only uses
the value it parsed when we're doing a ro->rw transition. In addition to
that there's also a bad bug in xfs_parseargs which doesn't touch the
actual option in the mount point except for a single one,
XFS_MOUNT_SMALL_INUMS and thus forced any filesystem that's every
remounted in some way to not support 64bit inodes with no way to recover
unless unmounted.
This patch changes xfs_fs_remount to use it's own linux/parser.h based
options parse instead of xfs_parseargs and reject all options except for
barrier/nobarrier and to the right thing in general. Eventually I'd like
to have a single big option table used for mount aswell but that can wait
for a while.
SGI-PV: 983964
SGI-Modid: xfs-linux-melb:xfs-kern:31382a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
md raid1 can pass down barriers, but does not set an ordered flag on the
queue, so xfs does not even attempt a barrier write, and will never use
barriers on these block devices.
Remove the flag check and just let the barrier write test determine
barrier support.
A possible risk here is that if something does not set an ordered flag and
also does not properly return an error on a barrier write... but if it's
any consolation jbd/ext3/reiserfs never test the flag, and don't even do a
test write, they just disable barriers the first time an actual journal
barrier write fails.
SGI-PV: 983924
SGI-Modid: xfs-linux-melb:xfs-kern:31377a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Currently the xfs module init/exit code is a mess. It's farmed out over a
lot of function with very little error checking. This patch makes sure we
propagate all initialization failures properly and clean up after them.
Various runtime initializations are replaced with compile-time
initializations where possible to make this easier. The exit path is
similarly consolidated.
There's now split out function to create/destroy the kmem zones and
alloc/free the trace buffers. I've also changed the ktrace allocations to
KM_MAYFAIL and handled errors resulting from that.
And yes, we really should replace the XFS_*_TRACE ifdefs with a single
XFS_TRACE..
SGI-PV: 976035
SGI-Modid: xfs-linux-melb:xfs-kern:31354a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_attrmulti_by_handle currently request the size based on
sizeof(attr_multiop_t) but should be using sizeof(xfs_attr_multiop_t)
because that is what it is dealing with. Despite beeing wrong this
actually harmless in practice because both structures are the same size on
all platforms.
But this sizeof was the only user of struct attr_multiop so we can just
kill it. Also move the ATTR_OP_* defines xfs_attr.h into the struct
xfs_attr_multiop defintion in xfs_fs.h because they are only used with
that structure, and are part of the user ABI for the
XFS_IOC_ATTRMULTI_BY_HANDLE ioctl.
SGI-PV: 983508
SGI-Modid: xfs-linux-melb:xfs-kern:31352a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_attrlist_by_handle should only take the ATTR_ flags for the root
namespaces. The ATTR_KERN* flags may change at anytime and expect special
preconditions that can't be guaranteed for userspace-originating requests.
For example passing down ATTR_KERNNOVAL through xfs_attrlist_by_handle
will hit an assert in debug builds currently.
SGI-PV: 983677
SGI-Modid: xfs-linux-melb:xfs-kern:31351a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>