Commit Graph

109 Commits

Author SHA1 Message Date
Kurt Hackel
c27069e6cf ocfs2: continue recovery when a dead node is encountered
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:11 -07:00
Kurt Hackel
67a187412b ocfs2: remove unneccesary spin_unlock() in dlm_remaster_locks()
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:10 -07:00
Kurt Hackel
6a41321121 ocfs2: dlm_remaster_locks() should never exit without completing
We cannot restart recovery. Once we begin to recover a node, keep the state
of the recovery intact and follow through, regardless of any other node
deaths that may occur.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:09 -07:00
Kurt Hackel
c8df412e1c ocfs2: special case recovery lock in dlmlock_remote()
If the previous master of the recovery lock dies, let calc_usage take it
down completely and let the caller completely redo the dlmlock() call.
Otherwise, there will never be an opportunity to re-master the lockres and
recovery wont be able to progress.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:08 -07:00
Kurt Hackel
36407488b1 ocfs2: pending mastery asserts and migrations should block each other
Use the existing structure for blocking migrations when ASTs are pending to
achieve the same result. If we can catch the assert before it goes on the
wire, just cancel it and let the migration continue.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:08 -07:00
Kurt Hackel
c87a9ae705 ocfs2: temporarily disable automatic lock migration
Now we never change the owner of a lock resource until unmount or node
death. This will be re-enabled once some issues in the algorithm used have
been resolved.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:07 -07:00
Kurt Hackel
2abaf97e62 ocfs2: do not unconditionally purge the lockres in dlmlock_remote()
In dlmlock_remote(), do not call purge_lockres until the lock resource
actually changes. otherwise, the mastery info on the lockres will go away
underneath the caller.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:06 -07:00
Kurt Hackel
aa087b8497 ocfs2: increase backoff before waiting for recovery
When mastering non-recovery lock resources, additional time was frequently
needed to allow the disk heartbeat to catch up with the network timeout. the
recovery lock resource is time critical and avoids this path.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:05 -07:00
Kurt Hackel
f42a100b22 ocfs2: have dlm_pre_master_reco_lockres() ignore dead nodes
Recovery will spin in dlm_pre_master_reco_lockres if we do not ignore
timed-out network responses from dead nodes.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:05 -07:00
Kurt Hackel
6ff06a9391 ocfs2: give the dlm dirty list a reference on the lockres
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:04 -07:00
Kurt Hackel
e7e69eb389 ocfs2: teach dlm_restart_lock_mastery() to wait on recovery
Change behavior of dlm_restart_lock_mastery() when a node goes down.  Dump
all responses that have been collected and start over.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:03 -07:00
Kurt Hackel
e4eb03681a ocfs2: gracefully handle stale create_lock messages.
This is an error on the sending side, so gracefully error out on the
receiving end.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:02 -07:00
Kurt Hackel
ccd8b1f916 ocfs2: update lvb immediately during recovery
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:02 -07:00
Kurt Hackel
588e00902b ocfs2: do not send master requests to localhost
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:01 -07:00
Kurt Hackel
8b2198097a ocfs2: purge lockres' sooner
Immediately purge a lockress that the local node is not the master of.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:00 -07:00
Kurt Hackel
343e26a400 ocfs2: dump mismatching migrated lvbs before BUG()
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:59 -07:00
Kurt Hackel
466d1a4591 ocfs2: make dlm recovery finalization 2 stage
Makes it easier for the recovery process to deal with node death.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:58 -07:00
Kurt Hackel
69d72b066c ocfs2: dlm recovery / lockres reference count fix
Take a reference on lockres structures while they are on the recovery list.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:58 -07:00
Kurt Hackel
a9ee4c8a67 ocfs2: better error handling during assert master message
handle errors during lock assert master by either killing self or other node

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:57 -07:00
Kurt Hackel
a7f90d83ea ocfs2: dump lockres info before we BUG() on a bad reference
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:56 -07:00
Mark Fasheh
c0a8520c73 ocfs2: do LVB puts in place
Don't wait until the AST will be fired to do the LVB copy into the lock
resource.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:55 -07:00
Kurt Hackel
aa85235427 ocfs2: mle ref count debugging
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:55 -07:00
Kurt Hackel
dc2ed195dd ocfs2: allow for an assert message during lock mastery
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:54 -07:00
Kurt Hackel
2d1a868c56 ocfs2: take mle reference during migration
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:53 -07:00
Kurt Hackel
41b8c8a101 ocfs2: properly initialize the mle structure
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:52 -07:00
Kurt Hackel
da01ad0552 ocfs2: detach mle from heartbeat events
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:52 -07:00
Kurt Hackel
a2bf04774b ocfs2: mle ref counting fixes
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:51 -07:00
Kurt Hackel
958837197e ocfs2: better mle debugging
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:50 -07:00
Kurt Hackel
d6dea6e973 ocfs2: clean up recovery related messages
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:49 -07:00
Kurt Hackel
29c0fa0f56 ocfs2: handle network errors during recovery
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:49 -07:00
Kurt Hackel
c3187ce5e3 ocfs2: only recover one dead node at a time
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:48 -07:00
Kurt Hackel
ab27eb6f47 ocfs2: Better tracking for recovery state changes
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:47 -07:00
Kurt Hackel
8bc674cb48 ocfs2: Fix empty lvb check
The check for an empty lvb should check the entire buffer not just the first
byte.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:46 -07:00
Kurt Hackel
aba9aac788 ocfs2: fix inverted logic in dlm_is_node_dead
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:45 -07:00
Kurt Hackel
2580a580e0 ocfs2: recheck lockres master before sending an unlock request.
Recovery may have happened and it may now be mastered locally.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:45 -07:00
Kurt Hackel
8d79d088e8 ocfs2: add a small delay after a failed migration
Otherwise we risk starving other threads.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:44 -07:00
Mark Fasheh
685f1adb38 ocfs2: silence a compile warning in dlm_alloc_pagevec()
Reported by Andrew Morton.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:43 -07:00
Joel Becker
c8f33b6e86 [PATCH] ocfs2: Alloc at least a page for the DLM hash
The OCFS2 DLM allocates a number of pages for a hash to lookup locks.
There was a bug where a PAGE_SIZE bigger than the hash size (eg, 64K
pages) would result in zero pages allocated.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:42 -07:00
Daniel Phillips
03d864c02c ocfs2: allocate lockres hash pages in an array
This allows us to have a hash table greater than a single page which greatly
improves dlm performance on some tests.

Signed-off-by: Daniel Phillips <phillips@google.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:42 -07:00
Mark Fasheh
95c4f581d6 ocfs2: inline dlm_lockres_get()
It's called on every lookup so this might help performance a bit.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:41 -07:00
Daniel Phillips
4198985f7a [PATCH] Clean up ocfs2 hash probe and make it faster
Signed-Off-By: Daniel Phillips <phillips@google.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:40 -07:00
Mark Fasheh
a3d3329159 ocfs2: calculate lockid hash values outside of the spinlock
Fixes a performance bug - pointed out by Andrew.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:39 -07:00
Mark Fasheh
65c491d833 ocfs2: move lockres qstr next to hlist_node structure
Gains us a bit of performance on loads which heavily hit the lockres hash.
Patch suggested by Daniel Phillips <phillips@google.com>.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:39 -07:00
Akinobu Mita
f116629d03 [PATCH] fs: use list_move()
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under fs/.

Cc: Ian Kent <raven@themaw.net>
Acked-by: Joel Becker <joel.becker@oracle.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Hans Reiser <reiserfs-dev@namesys.com>
Cc: Urban Widmark <urban@teststation.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:18 -07:00
Pekka Enberg
090d2b185d [PATCH] read_mapping_page for address space
Add read_mapping_page() which is used for callers that pass
mapping->a_ops->readpage as the filler for read_cache_page.  This removes
some duplication from filesystem code.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
David Howells
726c334223 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch.  That reduced the significance of
sb->s_root, allowing NFS to place a fake root there.  However, NFS does
require a dentry to use as a target for the statfs operation.  This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells
454e2398be [PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers.  For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing.  In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

 (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
     pointer argument and return an integer, so most filesystems have to change
     very little.

 (*) If one of the convenience function is not used, then get_sb() should
     normally call simple_set_mnt() to instantiate the vfsmount. This will
     always return 0, and so can be tail-called from get_sb().

 (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
     dcache upon superblock destruction rather than shrink_dcache_anon().

     This is required because the superblock may now have multiple trees that
     aren't actually bound to s_root, but that still need to be cleaned up. The
     currently called functions assume that the whole tree is rooted at s_root,
     and that anonymous dentries are not the roots of trees which results in
     dentries being left unculled.

     However, with the way NFS superblock sharing are currently set to be
     implemented, these assumptions are violated: the root of the filesystem is
     simply a dummy dentry and inode (the real inode for '/' may well be
     inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
     with child trees.

     [*] Anonymous until discovered from another tree.

 (*) The documentation has been adjusted, including the additional bit of
     changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Sunil Mushran
afae00ab45 ocfs2: fix gfp mask in some file system paths
We were using GFP_KERNEL in a handful of places which really wanted
GFP_NOFS. Fix this.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-05-17 14:38:49 -07:00
Mark Fasheh
dd4a2c2bfe ocfs2: Don't populate uptodate cache in ocfs2_force_read_journal()
This greatly reduces the amount of memory useded during recovery.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-05-17 14:38:48 -07:00
Mark Fasheh
c4374f8a60 ocfs2: take meta data lock in ocfs2_file_aio_read()
Temporarily take the meta data lock in ocfs2_file_aio_read() to allow us to
update our inode fields.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-05-17 14:38:47 -07:00