Commit Graph

281 Commits

Author SHA1 Message Date
Benjamin Marzinski
7a9f53b3c1 [GFS2] Alternate gfs2_iget to avoid looking up inodes being freed
There is a possible deadlock between two processes on the same node, where one
process is deleting an inode, and another process is looking for allocated but
unused inodes to delete in order to create more space.

process A does an iput() on inode X, and it's i_count drops to 0. This causes
iput_final() to be called, which puts an inode into state I_FREEING at
generic_delete_inode(). There no point between when iput_final() is called, and
when I_FREEING is set where GFS2 could acquire any glocks. Once I_FREEING is
set, no other process on that node can successfully look up that inode until
the delete finishes.

process B locks the the resource group for the same inode in get_local_rgrp(),
which is called by gfs2_inplace_reserve_i()

process A tries to lock the resource group for the inode in
gfs2_dinode_dealloc(), but it's already locked by process B

process B waits in find_inode for the inode to have the I_FREEING state cleared.

Deadlock.

This patch solves the problem by adding an alternative to gfs2_iget(),
gfs2_iget_skip(), that simply skips any inodes that are in the I_FREEING
state.o The alternate test function is just like the original one, except that
it fails if the inode is being freed, and sets a skipped flag. The alternate
set function is just like the original, except that it fails if the skipped
flag is set. Only try_rgrp_unlink() calls gfs2_iget_skip() instead of
gfs2_iget().

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:29 +01:00
Wendy Cheng
e9bd2b3baf [GFS2] fix inode meta data corruption
Fix a nasty inode meta data corruption issue by keeping the buffer head in
icache array. This buffer needs to stay in memory until journal flush occurs
Otherwise, gfs2_meta_inode_buffer could do a disk read before the inode hits
disk. It ends up with meta data corruptions. The buffer will be released as
part of the existing journal flush logic.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:51 +01:00
Wendy Cheng
35dcc52e3a [GFS2] Remove i_mode passing from NFS File Handle
GFS2 has been passing i_mode within NFS File Handle. Other than the
wrong assumption that there is always room for this extra 16 bit value,
the current gfs2_get_dentry doesn't really need the i_mode to work
correctly. Note that GFS2 NFS code does go thru the same lookup code
path as direct file access route (where the mode is obtained from name
lookup) but gfs2_get_dentry() is coded for different purpose. It is not
used during lookup time. It is part of the file access procedure call.
When the call is invoked, if on-disk inode is not in-memory, it has to
be read-in. This makes i_mode passing a useless overhead.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:11 +01:00
Wendy Cheng
bb9bcf0616 [GFS2] Obtaining no_formal_ino from directory entry
GFS2 lookup code doesn't ask for inode shared glock. This implies during
in-memory inode creation for existing file, GFS2 will not disk-read in
the inode contents. This leaves no_formal_ino un-initialized during
lookup time. The un-initialized no_formal_ino is subsequently encoded
into file handle. Clients will get ESTALE error whenever it tries to
access these files.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:08 +01:00
Abhijith Das
d93cfa9884 [GFS2] Fix deallocation issues
There were two issues during deallocation of unlinked inodes. The
first was relating to the use of a "try" lock which in the case of
the inode lock wasn't trying hard enough to deallocate in all
circumstances (now changed to a normal glock) and in the case of
the iopen lock didn't wait for the demotion of the shared lock before
attempting to get the exclusive lock, and thereby sometimes (timing dependent)
not completing the deallocation when it should have done.

The second issue related to the lack of a way to invalidate dcache entries
on remote nodes (now fixed by this patch) which meant that unlinks were
taking a long time to return disk space to the fs. By adding some code to
invalidate the dcache entries across the cluster for unlinked inodes, that
is now fixed.

This patch was written jointly by Abhijith Das and Steven Whitehouse.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:36 +01:00
akpm@linux-foundation.org
037bcbb756 [GFS2] gfs2_lookupi() uninitialised var fix
fs/gfs2/inode.c: In function 'gfs2_lookupi':
fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function

Looks like a real bug to me.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:29 +01:00
Steven Whitehouse
c8cdf47937 [GFS2] Recovery for lost unlinked inodes
Under certain circumstances its possible (though rather unlikely) that
inodes which were unlinked by one node while still open on another might
get "lost" in the sense that they don't get deallocated if the node
which held the inode open crashed before it was unlinked.

This patch adds the recovery code which allows automatic deallocation of
the inode if its found during block allocation (the sensible time to
look for such inodes since we are scanning the rgrp's bitmaps anyway at
this time, so it adds no overhead to do this).

Since the inode will have had its i_nlink set to zero, all we need to
trigger recovery is a lookup and an iput(), and the normal deallocation
code takes care of the rest.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:26 +01:00
Steven Whitehouse
e1cc86037b [GFS2] Fix bug in error path of inode
This fixes a bug in the ordering of operations in the error path of
createi. Its not valid to do an iput() when holding the inode's glock
since the iput() will (in this case) result in delete_inode() being
called which needs to grab the lock itself. This was causing the
recursive lock checking code to trigger.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:22 +01:00
Steven Whitehouse
4bd91ba181 [GFS2] Add nanosecond timestamp feature
This adds a nanosecond timestamp feature to the GFS2 filesystem. Due
to the way that the on-disk format works, older filesystems will just
appear to have this field set to zero. When mounted by an older version
of GFS2, the filesystem will simply ignore the extra fields so that
it will again appear to have whole second resolution, so that its
trivially backward compatible.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:12 +01:00
Steven Whitehouse
bb8d8a6f54 [GFS2] Fix sign problem in quota/statfs and cleanup _host structures
This patch fixes some sign issues which were accidentally introduced
into the quota & statfs code during the endianess annotation process.
Also included is a general clean up which moves all of the _host
structures out of gfs2_ondisk.h (where they should not have been to
start with) and into the places where they are actually used (often only
one place). Also those _host structures which are not required any more
are removed entirely (which is the eventual plan for all of them).

The conversion routines from ondisk.c are also moved into the places
where they are actually used, which for almost every one, was just one
single place, so all those are now static functions. This also cleans up
the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

The net result is a reduction of about 100 lines of code, many functions
now marked static plus the bug fixes as mentioned above. For good
measure I ran the code through sparse after making these changes to
check that there are no warnings generated.

This fixes Red Hat bz #239686

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:10 +01:00
Steven Whitehouse
dbb7cae2a3 [GFS2] Clean up inode number handling
This patch cleans up the inode number handling code. The main difference
is that instead of looking up the inodes using a struct gfs2_inum_host
we now use just the no_addr member of this structure. The tests relating
to no_formal_ino can then be done by the calling code. This has
advantages in that we want to do different things in different code
paths if the no_formal_ino doesn't match. In the NFS patch we want to
return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
no_formal_ino doesn't match and thus we can withdraw in this case.

In order to later fix bz #201012, we need to be able to look up an inode
without knowing no_formal_ino, as the only information that is known to
us is the on-disk location of the inode in question.

This patch will also help us to fix bz #236099 at a later date by
cleaning up a lot of the code in that area.

There are no user visible changes as a result of this patch and there
are no changes to the on-disk format either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:24 +01:00
Steven Whitehouse
1be3867955 [GFS2] Fix bz 229831, lookup returns wrong inode
The following patch fixes Red Hat bz 229831. Without this patch its
possible for the wrong inode to be returned in certain cases. It is a
pretty unusual event, so that its taken some time to track down. Thanks
and due to Josef Whiter who did a lot of the testing required to thrack
this down and fix it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 14:01:53 -05:00
Wendy Cheng
fb0d3bce8e [GFS2] pass formal ino in do_filldir_main
ok, the following is the minimum changes to get NFSD going before we
settle down this issue .. would appreciate this in the tree so other NFS
related works can get done in parallel.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07 13:58:45 -05:00
Russell Cattelan
ddee76089c [GFS2] Fix unlink deadlocks
Move the glock acquisition to outside of the transactions.

Lock odering must be preserved in order to prevent ABBA
deadlocks. The current gfs2_change_nlink code would tries
to grab the glock after having started a transaction and thus is holding
the log lock. This is inconsistent with other code paths in
gfs that grab the resource group glock prior to staring
a tranactions.

One problem with this fix is that the resource group
lock is always grabbed now even if the inode still has
ref count and can not be marked for unlink.

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:38:17 -05:00
Steven Whitehouse
d7c103d0bd [GFS2] Fix recursive locking attempt with NFS
In certain cases, its possible for NFS to call the lookup code while
holding the glock (when doing a readdirplus operation) so we need to
check for that and not try and lock the glock twice. This also fixes a
typo in a previous NFS related GFS2 patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:53 -05:00
Eric Sandeen
ddfe062783 [GFS2] use CURRENT_TIME_SEC instead of get_seconds in gfs2
I was looking something else up and came across this...

I don't honestly have a good reason to change it other than to make it
like every other Linux filesystem in this regard.  ;-)  It doesn't
functionally change anything, but makes some lines shorter. :)

I'm also curious; why does gfs2 have 64-bits of on-disk timestamps, but
not in timespec_t format, and only stores second resolutions?  Seems like
you're halfway to sub-second resolutions already.

I suppose if that gets implemented then all of the below should
instead be CURRENT_TIME not CURRENT_TIME_SEC.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:38 -05:00
Adrian Bunk
03dc6a538e [GFS2] make gfs2_change_nlink_i() static
On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc3-mm1:
>...
>  git-gfs2-nmw.patch
>...
>  git trees
>...

This patch makes the needlessly globlal gfs2_change_nlink_i() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:49 -05:00
S. Wendy Cheng
87d21e07f3 [GFS2] Fix gfs2_rename deadlock
Second round of gfs2_rename lock re-ordering to allow Anaconda adding
root partition on top of gfs2. Previous to this patch the recursive
lock detector in glock.c can be triggered due to attempting to lock
the rgrp twice. This fixes it by checking to see whether the rgrp
is already locked.

This fixes Red Hat bugzilla #221237

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:31 -05:00
Russell Cattelan
6c93fd1e57 [GFS2] BZ 217008 fsfuzzer fix.
Update the quilt header comments to match the
code changes.

Change gfs2_lookup_simple to return an error in the case
of a NULL inode.
The callers of gfs2_lookup_simple do not check for NULL
in the no entry case and such would end up dereferencing a NULL ptr.

This fixes:
http://projects.info-pull.com/mokb/MOKB-15-11-2006.html

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:28 -05:00
S. Wendy Cheng
5509826f1e [GFS2] Fix change nlink deadlock
Bugzilla 215088

Fix deadlock in gfs2_change_nlink() while installing RHEL5 into GFS2
partition. The gfs2_rename() apparently needs block allocation for the
new name (into the directory) where it requires rg locks. At the same
time, while updating the nlink count for the replaced file,
gfs2_change_nlink() tries to return the inode meta-data back to resource
group where it needs rg locks too. Our logic doesn't allow process to
acquire these locks recursively by the same process  (RHEL installer)
that results a BUG call. This only happens within rename code path and
only if the destination file exists before the rename operation.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:36:15 -05:00
Steven Whitehouse
28626e2078 [GFS2] Fix glock ordering on inode creation
The lock order here should be parent -> child rather than
numeric order.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:36:33 -05:00
Steven Whitehouse
dcd2479959 [GFS2] Remove unused function from inode.c
The gfs2_glock_nq_m_atime function is unused in so far as its only
ever called with num_gh = 1, and this falls through to the
gfs2_glock_nq_atime function, so we might as well call that directly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:35:57 -05:00
Steven Whitehouse
9e2dbdac3d [GFS2] Remove gfs2_inode_attr_in
This function wasn't really doing the right thing. There was no need
to update the inode size at this point and the updating of the
i_blocks field has now been moved to the places where di_blocks is
updated. A result of this patch and some those preceeding it is that
unlocking a glock is now a much more efficient process, since there
is no longer any requirement to copy data from the gfs2 inode into
the vfs inode at this point.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:52 -05:00
Steven Whitehouse
e7c698d74f [GFS2] Inode number is constant
Since the inode number is constant, we don't need to keep updating
it everytime we refresh the other inode fields.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:48 -05:00
Steven Whitehouse
6b124d8dba [GFS2] Only set inode flags when required
We were setting the inode flags from GFS2's flags far too often, even when they
couldn't possibly have changed. This patch reduces the amount of flag
setting going on so that we do it only when the inode is read in or
when the flags have changed. The create case is covered by the "when
the inode is read in" case.

This also fixes a bug where we didn't set S_SYNC correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:45 -05:00
Steven Whitehouse
294caaa3b8 [GFS2] Tidy up 0 initialisations in inode.c
We don't need to use endian conversions for 0 initialisations
when creating a new on-disk inode.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:33 -05:00
Steven Whitehouse
bfded27ba0 [GFS2] Shrink gfs2_inode (8) - i_vn
This shrinks the size of the gfs2_inode by 8 bytes by
replacing the version counter with a one bit valid/invalid
flag.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:30 -05:00
Steven Whitehouse
a9583c7983 [GFS2] Shrink gfs2_inode (7) - di_payload_format
This is almost never used. Its there for backward
compatibility with GFS1. It doesn't need its own
field since it can always be calculated from the
inode mode & flags. This saves a bit more space
in the gfs2_inode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:26 -05:00
Steven Whitehouse
1a7b1eed58 [GFS2] Shrink gfs2_inode (6) - di_atime/di_mtime/di_ctime
Remove the di_[amc]time fields and use inode->i_[amc]time
fields instead. This saves 24 bytes from the gfs2_inode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:23 -05:00
Steven Whitehouse
4f56110a00 [GFS2] Shrink gfs2_inode (5) - di_nlink
Remove the di_nlink field in favour of inode->i_nlink and
update the nlink handling to use the proper macros. This
saves 4 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:20 -05:00
Steven Whitehouse
2933f9254a [GFS2] Shrink gfs2_inode (4) - di_uid/di_gid
Remove duplicate di_uid/di_gid fields in favour of using
inode->i_uid/inode->i_gid instead. This saves 8 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:17 -05:00
Steven Whitehouse
b60623c238 [GFS2] Shrink gfs2_inode (3) - di_mode
This removes the duplicate di_mode field in favour of using the
inode->i_mode field. This saves 4 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:14 -05:00
Steven Whitehouse
e7f14f4d09 [GFS2] Shrink gfs2_inode (2) - di_major/di_minor
This removes the device numbers from this structure by using
inode->i_rdev instead. It also cleans up the code in gfs2_mknod.
It results in shrinking the gfs2_inode by 8 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:11 -05:00
Steven Whitehouse
af339c0241 [GFS2] Shrink gfs2_inode (1) - di_header/di_num
The metadata header doesn't need to be stored in the incore
struct gfs2_inode since its constant, and this patch removes it.
Also, there is already a field for the inode's number in the
struct gfs2_inode, so we don't need one in struct gfs2_dinode_host
as well.

This saves 28 bytes of space in the struct gfs2_inode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:07 -05:00
Steven Whitehouse
4cc14f0b88 [GFS2] Change argument to gfs2_dinode_print
Change argument for gfs2_dinode_print in order to prepare
for removal of duplicate fields between struct inode and
struct gfs2_dinode_host.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:03 -05:00
Steven Whitehouse
ea744d01c6 [GFS2] Move gfs2_dinode_in to inode.c
gfs2_dinode_in() is only ever called from one place, so move it
to that place (in inode.c) and make it static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:00 -05:00
Steven Whitehouse
891ea14712 [GFS2] Change argument to gfs2_dinode_in
This is a preliminary patch to enable the removal of fields
in gfs2_dinode_host which are duplicated in struct inode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:57 -05:00
Steven Whitehouse
539e5d6b7a [GFS2] Change argument of gfs2_dinode_out
Everywhere this was called, a struct gfs2_inode was available,
but despite that, it was always called with a struct gfs2_dinode
as an argument. By making this change it paves the way to start
eliminating fields duplicated between the kernel's struct inode
and the struct gfs2_dinode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:54 -05:00
Al Viro
b44b84d765 [GFS2] gfs2 misc endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:46 -05:00
Al Viro
629a21e7ec [GFS2] split and annotate gfs2_inum
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:32 -05:00
Al Viro
e697264709 [GFS2] split and annotate gfs2_inum_range
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:11 -05:00
Al Viro
3ca68df6ee [GFS2] split gfs2_dinode into on-disk and host variants
The latter is used as part of gfs2-private part of struct inode.
It actually stores a lot of fields differently; for now the
declaration is just cloned, inode field is swtiched and changes
propagated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:32:50 -05:00
Steven Whitehouse
26d83dedf6 [GFS2] Fix OOM error handling
Fix the OOM error handling in inode.c where it was possible for
a NULL pointer to be dereferenced.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-06 08:59:42 -05:00
Ryan O'Hara
fcb47e0bd2 [GFS2] Initialize SELinux extended attributes at inode creation time.
This patch has gfs2_security_init declared as a static function, which
is correct. As a result, the declaration of this function in inode.h is
removed (and thus inode.h is unchanged). Also removed #include eaops.h,
which is not needed.

Signed-Off-By: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-03 11:57:35 -04:00
Steven Whitehouse
f92a0b6ff4 [GFS2] Mark nlink cleared so VFS sees it happen
This does nothing atm, but will be required for later support of
r/o bind mounts.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 16:01:53 -04:00
Steven Whitehouse
48516ced21 [GFS2] Remove uneeded endian conversion
In many places GFS2 was calling the endian conversion routines
for an inode even when only a single field, or a few fields might
have changed. As a result we were copying lots of data needlessly.

This patch replaces those calls with conversion of just the
required fields in each case. This should be faster and easier
to understand. There are still other places which suffer from this
problem, but this is a start in the right direction.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 12:39:19 -04:00
Theodore Ts'o
825f9075d7 [GFS2] inode-diet: Eliminate i_blksize from the inode structure
This eliminates the i_blksize field from struct inode.  Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.

Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-28 08:32:51 -04:00
Theodore Ts'o
bba9dfd835 [GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs)
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86.  (It would be more on an x86_64 system).  This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).

This patch:

The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer.  Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer.  This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-28 08:32:24 -04:00
Steven Whitehouse
907b9bceb4 [GFS2/DLM] Fix trailing whitespace
As per Andrew Morton's request, removed trailing whitespace.

Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-25 09:26:04 -04:00
Fabio Massimo Di Nitto
7d308590ae [GFS2] Export lm_interface to kernel headers
lm_interface.h has a few out of the tree clients such as GFS1
and userland tools.

Right now, these clients keeps a copy of the file in their build tree
that can go out of sync.

Move lm_interface.h to include/linux, export it to userland and
clean up fs/gfs2 to use the new location.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-19 08:45:18 -04:00
Steven Whitehouse
c26687113a [GFS2] Remove a cast, tidy gfs2_inode_attr_in
The remains of the changes for Jan Engelhardt's third email. Remove
a cast and tidy up gfs2_inode_attr_in.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-04 13:55:48 -04:00
Steven Whitehouse
cd915493fc [GFS2] Change all types to uX style
This makes all fixed size types have consistent names.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-04 12:49:07 -04:00
Steven Whitehouse
75d3b817a0 [GFS2] Tidy up bmap/inode code
As per Jan Engelhardt's third set of comments, this make various
code style changes and moves the structures from format.h into
super.c, which was the only place that format.h was actually used.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-04 11:41:31 -04:00
Steven Whitehouse
e9fc2aa091 [GFS2] Update copyright, tidy up incore.h
As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this
updates the copyright message to say "version" in full rather than
"v.2". Also incore.h has been updated to remove forward structure
declarations which are not required.

The gfs2_quota_lvb structure has now had endianess annotations added
to it. Also quota.c has been updated so that we now store the
lvb data locally in endian independant format to avoid needing
a structure in host endianess too. As a result the endianess
conversions are done as required at various points and thus the
conversion routines in lvb.[ch] are no longer required. I've
moved the one remaining constant in lvb.h thats used into lm.h
and removed the unused lvb.[ch].

I have not changed the HIF_ constants. That is left to a later patch
which I hope will unify the gh_flags and gh_iflags fields of the
struct gfs2_holder.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-01 11:05:15 -04:00
Steven Whitehouse
420b9e5e45 [GFS2] Tidy up in various files
Tidy up some files and remove an unused routine in meta_io.h. Also
added a bit of extra debugging in meta_io.h.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-07-31 15:42:17 -04:00
Abhijith Das
b2a580d87b [PATCH] patch to init di_payload_format field in gfs2_dinode
A missing initialisation when creating a new on disk inode.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-07-11 09:54:17 -04:00
Steven Whitehouse
4340fe6253 [GFS2] Add generation number
This adds a generation number for the eventual use of NFS to the
ondisk inode. Its backward compatible with the current code since
it doesn't really matter what the generation number is to start with,
and indeed since its set to zero, due to it being taken from padding
in both the inode and rgrp header, it should be fine.

The eventual plan is to use this rather than no_formal_ino in the
NFS filehandles. At that point no_formal_ino will be unused.

At the same time we also add a releasepages call back to the
"normal" address space for gfs2 inodes. Also I've removed a
one-linrer function thats not required any more.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-07-11 09:46:33 -04:00
Steven Whitehouse
29937ac6ca [GFS2] Fixes to scanning of glocks (again)
This really is the correct fix this time. We just ignore all
glocks associated with inodes until the inodes are pushed
from the inode cache. At that point the glocks are queued for
reclaim, so we don't need to do it here.

Also fix one or two other minor bugs.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-07-06 17:58:03 -04:00
Steven Whitehouse
bdd512aeea [GFS2] Remove unused flag
The flag GIF_MIN_INIT is no longer used or required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-06-22 15:26:33 -04:00
Steven Whitehouse
faf450ef4a [GFS2] Remove gfs2_repermission
gfs2_repermission is just a wrapper for permission, so remove it and
call permission directly where required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-06-22 10:59:10 -04:00
Steven Whitehouse
feaa7bba02 [GFS2] Fix unlinked file handling
This patch fixes the way we have been dealing with unlinked,
but still open files. It removes all limits (other than memory
for inodes, as per every other filesystem) on numbers of these
which we can support on GFS2. It also means that (like other
fs) its the responsibility of the last process to close the file
to deallocate the storage, rather than the person who did the
unlinking. Note that with GFS2, those two events might take place
on different nodes.

Also there are a number of other changes:

 o We use the Linux inode subsystem as it was intended to be
used, wrt allocating GFS2 inodes
 o The Linux inode cache is now the point which we use for
local enforcement of only holding one copy of the inode in
core at once (previous to this we used the glock layer).
 o We no longer use the unlinked "special" file. We just ignore it
completely. This makes unlinking more efficient.
 o We now use the 4th block allocation state. The previously unused
state is used to track unlinked but still open inodes.
 o gfs2_inoded is no longer needed
 o Several fields are now no longer needed (and removed) from the in
core struct gfs2_inode
 o Several fields are no longer needed (and removed) from the in core
superblock

There are a number of future possible optimisations and clean ups
which have been made possible by this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-06-14 15:32:57 -04:00
Steven Whitehouse
320dd101e2 [GFS2] glock debugging and inode cache changes
This adds some extra debugging to glock.c and changes
inode.c's deallocation code to call the debugging code
at a suitable moment. I'm chasing down a particular bug
to do with deallocation at the moment and the code can
go again once the bug is fixed.

Also this includes the first part of some changes to unify
the Linux struct inode and GFS2's struct gfs2_inode. This
transformation will happen in small parts over the next short
period.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 16:25:27 -04:00
Steven Whitehouse
3a8a9a1034 [GFS2] Update copyright date to 2006
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 15:09:15 -04:00
Steven Whitehouse
bd8968010a [GFS2] Remove semaphore.h from C files
We no longer use semaphores, everything has been converted to
mutex or rwsem, so we don't need to include this header any more.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-18 14:54:58 -04:00
Steven Whitehouse
64c14ea73b [GFS2] Fix ref count bug that used to bite us on umount
The ref count of certain glock's got elevated too far during unlink
which caused umount to fail. This fixes it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-16 13:37:11 -04:00
Steven Whitehouse
9801f6461e [GFS2] Remove incorrect initialisation of gh_owner
The gh_owner field shouldn't be set or reset outside the glock code.
These were left over from when recursive locking was allowed. It
isn't any more, so they are not needed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-12 14:06:02 -04:00
Steven Whitehouse
fd88de569b [GFS2] Readpages support
This adds readpages support (and also corrects a small bug in
the readpage error path at the same time). Hopefully this will
improve performance by allowing GFS to submit larger lumps of
I/O at a time.

In order to simplify the setting of BH_Boundary, it currently gets
set when we hit the end of a indirect pointer block. There is
always a boundary at this point with the current allocation code.
It doesn't get all the boundaries right though, so there is still
room for improvement in this.

See comments in fs/gfs2/ops_address.c for further information about
readpages with GFS2.

Signed-off-by: Steven Whitehouse
2006-05-05 16:59:11 -04:00
Steven Whitehouse
363275216c [GFS2] Reordering in deallocation to avoid recursive locking
Despite my earlier careful search, there was a recursive lock left
in the deallocation code. This removes it. It also should speed up
deallocation be reducing the number of locking operations which take
place by using two "try lock" operations on the two locks involved in
inode deallocation which allows us to grab the locks out of order
(compared with NFS which grabs the inode lock first and the iopen
lock later). It is ok for us to fail while doing this since if it
does fail it means that someone else is still using the inode and
thus it wouldn't be possible to deallocate anyway.

This fixes the bug reported to me by Rob Kenna.

Cc: Rob Kenna <rkenna@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:46:21 -04:00
Steven Whitehouse
190562bd84 [GFS2] Fix a bug: scheduling under a spinlock
At some stage, a mutex was added to gfs2_glock_put() without
checking all its call sites. Two of them were called from
under a spinlock causing random delays at various points and
crashes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-20 16:57:23 -04:00
Steven Whitehouse
71b86f562b [GFS2] Further updates to dir and logging code
This reduces the size of the directory code by about 3k and gets
readdir() to use the functions which were introduced in the previous
directory code update.

Two memory allocations are merged into one. Eliminates zeroing of some
buffers which were never used before they were initialised by
other data.

There is still scope for further improvement in the directory code.

On the logging side, a hand created mutex has been replaced by a
standard Linux mutex in the log allocation code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-03-28 14:14:04 -05:00
Steven Whitehouse
c752666c17 [GFS2] Fix bug in directory code and tidy up
Due to a typo, the dir leaf split operation was (for the first
split in a directory) writing the new hash vaules at the
wrong offset. This is now fixed.

Also some other tidy ups are included:

 - We use GFS2's hash function for dentries (see ops_dentry.c) so that
   we don't have to keep recalculating the hash values.
 - A lot of common code is eliminated between the various directory
   lookup routines.
 - Better error checking on directory lookup (previously different
   routines checked for different errors)
 - The leaf split operation has a couple of redundant operations
   removed from it, so it should be faster.

There is still further scope for further clean ups in the directory
code, and readdir in particular could do with slimming down a bit.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-03-20 12:30:04 -05:00
Steven Whitehouse
c9fd43078f [GFS2] Tidy up mount code.
We no longer lookup ".gfs2_admin" in the root directory in order to
find it, but instead use the inode number given in the superblock.
Both the root directory and the admin directory are now looked up using
the same routine, so the redundant code is removed.

Also, there is no longer a reference to the root inode in the
GFS2 super block. When required this can be retreived via
sb->s_root->d_inode instead.

Assuming that we introduce a metadata filesystem type for GFS, then
this is a first step towards that goal.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-03-01 15:31:02 -05:00
Steven Whitehouse
5c676f6d35 [GFS2] Macros removal in gfs2.h
As suggested by Pekka Enberg <penberg@cs.helsinki.fi>.

The DIV_RU macro is renamed DIV_ROUND_UP and and moved to kernel.h
The other macros are gone from gfs2.h as (although not requested
by Pekka Enberg) are a number of included header file which are now
included individually. The inode number comparison function is
now an inline function.

The DT2IF and IF2DT may be addressed in a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-27 17:23:27 -05:00
Steven Whitehouse
568f4c9659 [GFS2] 80 Column audit of GFS2
Requested by:
Prarit Bhargava <prarit@redhat.com>

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-27 12:00:42 -05:00
Steven Whitehouse
f55ab26a8f [GFS2] Use mutices rather than semaphores
As well as a number of minor bug fixes, this patch changes GFS
to use mutices rather than semaphores. This results in better
information in case there are any locking problems.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-21 12:51:39 +00:00
Steven Whitehouse
7359a19cc7 [GFS2] Fix for root inode ref count bug
Umount is now working correctly again. The bug was due to
not getting an extra ref count when mounting the fs. We
should have bumped it by two (once for the internal pointer
to the root inode from the super block and once for the
inode hanging off the dcache entry for root).

Also this patch tidys up the code dealing with looking up
and creating inodes. We now pass Linux inodes (with gfs2_inodes
attached) rather than the other way around and this reduces code
duplication in various places.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-13 12:27:43 +00:00
Steven Whitehouse
f42faf4fa4 [GFS2] Add gfs2_internal_read()
Add the new external read function. Its temporarily in jdata.c
even though the protoype is in ops_file.h - this will change
shortly. The current implementation will change to a page cache
one when that happens.

In order to effect the above changes, the various internal inodes
now have Linux inodes attached to them. We keep the references to
the Linux inodes, rather than the gfs2_inodes in the super block.

In order to get everything to work correctly I've had to reorder
the init sequence on mount (which I should probably have done
earlier when .gfs2_admin was made visible).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-30 18:34:10 +00:00
Steven Whitehouse
2442a098be [GFS2] Bug fix relating to endian conversion in inode.c
A two line fix to get endian conversion correct.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-30 11:49:32 +00:00
Steven Whitehouse
d4e9c4c3bf [GFS2] Add an additional argument to gfs2_trans_add_bh()
This adds an extra argument to gfs2_trans_add_bh() to indicate whether the
bh being added to the transaction is metadata or data. Its currently unused
since all existing callers set it to 1 (metadata) but following patches will
make use of it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-18 11:19:28 +00:00
Steven Whitehouse
b96ca4fa4e [GFS2] Update init_dinode() to reduce stack usage
We no longer allocate a dinode on the stack in init_dinode()
and we no longer use gfs2_dinode_out (eliminating one copy) and
gfs2_meta_header_in (eliminating another copy). The meta_header_in
fucntion is now no longer referenced from outside gfs2_ondisk.c, so
make it static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-18 10:57:10 +00:00
David Teigland
b3b94faa5f [GFS2] The core of GFS2
This patch contains all the core files for GFS2.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-16 16:50:04 +00:00