linux/fs/gfs2
Bob Peterson f90e5b5b13 GFS2: Processes waiting on inode glock that no processes are holding
This patch fixes a race in the GFS2 glock state machine that may
result in lockups.  The symptom is that all nodes but one will
hang, waiting for a particular glock.  All the holder records
will have the "W" (Waiting) bit set.  The other node will
typically have the glock stuck in Exclusive mode (EX) with no
holder records, but the dinode will be cached.  In other words,
an entry with "I:" will appear in the glock dump for that glock,
but nothing else.

The race has to do with the glock "Pending Demote" bit, which
can be set, then immediately reset, thus losing the fact that
another node needs the glock.  The sequence of events is:

1. Something schedules the glock workqueue (e.g. glock request from fs)
2. The glock workqueue gets to the point between the test of the reply pending
bit and the spin lock:

        if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
                finish_xmote(gl, gl->gl_reply);
                drop_ref = 1;
        }
        down_read(&gfs2_umount_flush_sem);         <---- i.e. here
        spin_lock(&gl->gl_spin);

3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
            (b) the demote request which sets GLF_PENDING_DEMOTE

4. The following test is executed:

        if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
            gl->gl_state != LM_ST_UNLOCKED &&
            gl->gl_demote_state != LM_ST_EXCLUSIVE) {

This clears the pending demote flag, and gl->gl_demote_state is not equal to
exclusive.  However, because the reply from the dlm arrived after we checked
the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked.  So
although we cleared the GLF_PENDING_DEMOTE flag, we neither set the
GLF_DEMOTE flag nor reinstated the GLF_PENDING_DEMOTE flag.

The patch closes the timing window by only transitioning the
"Pending demote" bit to the "demote" flag once we know the
other conditions (not unlocked and not exclusive) are met.
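
The lost demote request can be reproduced in a small userspace model. The
sketch below is an illustration only, not the kernel code: struct model_glock,
the model_* bit helpers, check_old(), check_new() and demote_survives() are
hypothetical names, the flag and state numbering is a placeholder, and the
bit operations are plain rather than atomic. It assumes the demote request
asks for the unlocked state, and that the DLM reply is processed on the next
pass of the work function.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock states and flag bits: placeholder numbering for this model only. */
enum { LM_ST_UNLOCKED, LM_ST_EXCLUSIVE, LM_ST_SHARED };
enum { GLF_DEMOTE, GLF_PENDING_DEMOTE, GLF_REPLY_PENDING };

/* Hypothetical, heavily reduced stand-in for the glock structure. */
struct model_glock {
	unsigned long gl_flags;
	int gl_state;
	int gl_demote_state;
};

/* Non-atomic stand-ins for the kernel bit helpers. */
static bool model_test_bit(int nr, const unsigned long *flags)
{
	return (*flags >> nr) & 1UL;
}

static void model_set_bit(int nr, unsigned long *flags)
{
	*flags |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *flags)
{
	*flags &= ~(1UL << nr);
}

static bool model_test_and_clear_bit(int nr, unsigned long *flags)
{
	bool was_set = model_test_bit(nr, flags);

	model_clear_bit(nr, flags);
	return was_set;
}

/* Racy form: the pending bit is cleared before the other tests run. */
void check_old(struct model_glock *gl)
{
	if (model_test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
	    gl->gl_state != LM_ST_UNLOCKED &&
	    gl->gl_demote_state != LM_ST_EXCLUSIVE)
		model_set_bit(GLF_DEMOTE, &gl->gl_flags);
}

/* Fixed form: the pending bit is cleared only when all tests pass. */
void check_new(struct model_glock *gl)
{
	if (model_test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
	    gl->gl_state != LM_ST_UNLOCKED &&
	    gl->gl_demote_state != LM_ST_EXCLUSIVE) {
		model_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags);
		model_set_bit(GLF_DEMOTE, &gl->gl_flags);
	}
}

/* Replays steps 3 and 4; returns true if GLF_DEMOTE ends up set. */
bool demote_survives(void (*check)(struct model_glock *))
{
	/* Step 3: both bits arrived after the reply-pending test. */
	struct model_glock gl = {
		.gl_flags = (1UL << GLF_REPLY_PENDING) |
			    (1UL << GLF_PENDING_DEMOTE),
		.gl_state = LM_ST_UNLOCKED,
		.gl_demote_state = LM_ST_UNLOCKED, /* assumed target state */
	};

	check(&gl);	/* step 4: gl_state is still unlocked */

	/* Next pass: the reply is processed and the lock becomes EX. */
	if (model_test_and_clear_bit(GLF_REPLY_PENDING, &gl.gl_flags))
		gl.gl_state = LM_ST_EXCLUSIVE;
	check(&gl);

	return model_test_bit(GLF_DEMOTE, &gl.gl_flags);
}

/* The old test loses the request; the new test keeps it. */
void model_self_check(void)
{
	assert(!demote_survives(check_old));
	assert(demote_survives(check_new));
}
```

Calling model_self_check() from a test harness exercises both variants.  With
check_old() the glock ends up in EX with neither demote bit set, which matches
the stuck state described above; with check_new() the pending bit survives the
first pass and becomes GLF_DEMOTE on the second.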

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-25 10:37:11 +01:00
acl.c GFS2: Post-VFS scale update for RCU path walk 2011-01-21 09:39:24 +00:00
acl.h fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
aops.c GFS2: Improve bug trap code in ->releasepage() 2011-05-03 11:49:19 +01:00
bmap.c GFS2: Wipe directory hash table metadata when deallocating a directory 2011-05-21 14:05:58 +01:00
bmap.h GFS2: New truncate sequence 2010-09-20 11:18:16 +01:00
dentry.c gfs2: fix d_revalidate oopsen on NFS exports 2011-03-10 03:44:48 -05:00
dir.c GFS2: When adding a new dir entry, inc link count if it is a subdir 2011-05-09 16:43:53 +01:00
dir.h GFS2: When adding a new dir entry, inc link count if it is a subdir 2011-05-09 16:43:53 +01:00
export.c GFS2: Make writeback more responsive to system conditions 2011-04-20 09:01:37 +01:00
file.c GFS2: make sure fallocate bytes is a multiple of blksize 2011-05-03 11:47:42 +01:00
gfs2.h
glock.c GFS2: Processes waiting on inode glock that no processes are holding 2011-05-25 10:37:11 +01:00
glock.h GFS2: Alter point of entry to glock lru list for glocks with an address_space 2011-04-20 08:59:48 +01:00
glops.c GFS2: Move gfs2_refresh_inode() and friends into glops.c 2011-05-09 16:44:49 +01:00
glops.h GFS2: Clean up fsync() 2011-04-20 09:00:41 +01:00
incore.h GFS2: Use UUID field in generic superblock 2011-05-10 15:01:59 +01:00
inode.c GFS2: Move all locking inside the inode creation function 2011-05-13 12:11:17 +01:00
inode.h GFS2: Clean up symlink creation 2011-05-13 10:34:59 +01:00
Kconfig GFS2: No longer experimental 2010-09-20 11:18:46 +01:00
lock_dlm.c GFS2: Fix glock deallocation race 2011-03-09 10:58:04 +00:00
log.c GFS2: Wait properly when flushing the ail list 2011-05-21 19:21:07 +01:00
log.h GFS2: Make writeback more responsive to system conditions 2011-04-20 09:01:37 +01:00
lops.c GFS2: Optimise glock lru and end of life inodes 2011-04-20 09:01:17 +01:00
lops.h
main.c GFS2: Optimise glock lru and end of life inodes 2011-04-20 09:01:17 +01:00
Makefile GFS2: Rename ops_inode.c to inode.c 2011-05-10 13:12:49 +01:00
meta_io.c GFS2: Improve tracing support (adds two flags) 2011-04-20 09:00:59 +01:00
meta_io.h GFS2: Remove unused macro 2011-04-20 09:00:24 +01:00
ops_fstype.c GFS2: Use UUID field in generic superblock 2011-05-10 15:01:59 +01:00
quota.c GFS2: quota allows exceeding hard limit 2011-03-09 09:32:44 +00:00
quota.h mm: add context argument to shrinker callback 2010-07-19 14:56:17 +10:00
recovery.c GFS2: Fix spectator umount issue 2010-09-29 14:20:52 +01:00
recovery.h gfs2: use workqueue instead of slow-work 2010-07-23 13:14:25 +02:00
rgrp.c GFS2: Wipe directory hash table metadata when deallocating a directory 2011-05-21 14:05:58 +01:00
rgrp.h GFS2: deallocation performance patch 2011-02-24 12:13:48 +00:00
super.c GFS2: Move final part of inode.c into super.c 2011-05-09 16:45:38 +01:00
super.h gfs: constify xattr_handler 2010-05-21 18:31:20 -04:00
sys.c GFS2: Use UUID field in generic superblock 2011-05-10 15:01:59 +01:00
sys.h GFS2: Remove ancient, unused code 2009-01-05 07:39:13 +00:00
trace_gfs2.h GFS2: Add an AIL writeback tracepoint 2011-04-20 09:01:58 +01:00
trans.c GFS2: Various gfs2_logd improvements 2010-05-05 09:39:18 +01:00
trans.h GFS2: reserve more blocks for transactions 2010-09-28 09:44:24 +01:00
util.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
util.h GFS2: Metadata address space clean up 2010-03-01 14:07:37 +00:00
xattr.c GFS2: Clean up duplicated setattr code 2010-11-30 10:30:19 +00:00
xattr.h sanitize xattr handler prototypes 2009-12-16 12:16:49 -05:00