linux/fs/gfs2
Bob Peterson 03678a99d1 gfs2: Ignore dlm recovery requests if gfs2 is withdrawn
When a node fails, user space informs dlm of the node failure,
and dlm instructs gfs2 on the surviving nodes to perform journal
recovery. It does this by calling various callback functions in
lock_dlm.c. To mark its progress, lock_dlm keeps generation
numbers and recover bits in the lvb of a dlm "control" lock,
which all nodes read to determine which journals need to be
replayed.

The gfs2 instances on all nodes get the same recovery requests from dlm,
so they all try to do the recovery, but only one will be
granted the exclusive lock on the journal. The others fail
with a "Busy" message on their "try lock."

However, when a node is withdrawn, it cannot safely do any
recovery or replay any journals. To make matters worse,
gfs2 might withdraw as a result of attempting recovery. For
example, this might happen if the device goes offline, or if
an HBA fails. But today's gfs2 code doesn't check for being
withdrawn at any step in the recovery process. What's
worse is that these callbacks from dlm have no return code,
so there is no way to indicate failure back to dlm. We can
send a "Recovery failed" uevent eventually, but that tells
user space what happened, not dlm's kernel code.

Before this patch, lock_dlm would perform its recovery steps but
ignore the result, and eventually it would still update its
generation number in the lvb, despite the fact that it may have
withdrawn or encountered an error. The other nodes would then
see the newer generation number in the lvb and conclude that
they don't need to do recovery because the generation number
is newer than the last one they saw. They think a different
node has already recovered the journal.

This patch adds checks to several of the callbacks used by dlm
in its recovery state machine so that they are skipped if an
I/O error has occurred or if the file system is withdrawn.
That prevents the lvb bits from being updated, and
therefore dlm and user space still see the need for recovery to
take place.
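
The effect of the guard can be sketched in a small user-space model.
This is not the code from lock_dlm.c: the structs, field names, and
functions below (control_lvb, node_state, recovery_unsafe,
recover_done, needs_recovery) are hypothetical stand-ins, and the
"guard" flag exists only to compare the behavior before and after
the patch. It assumes a single generation counter in the lvb and
leaves out the recover bits and all locking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared dlm "control" lock lvb. */
struct control_lvb {
	uint32_t generation;	/* last recovery generation marked done */
};

/* Hypothetical stand-in for the per-node gfs2 state. */
struct node_state {
	bool withdrawn;		/* file system has withdrawn on this node */
	bool io_error;		/* an I/O error has been seen on this node */
};

/* True if this node must not take part in recovery. */
static bool recovery_unsafe(const struct node_state *n)
{
	return n->withdrawn || n->io_error;
}

/*
 * Model of a recovery callback. It returns void, like the dlm
 * callbacks described above, so the only way to signal failure is
 * to leave the lvb untouched. With guard == false the generation
 * is always published (the old behavior); with guard == true the
 * update is skipped on a withdrawn or failing node.
 */
static void recover_done(const struct node_state *n,
			 struct control_lvb *lvb,
			 uint32_t new_gen, bool guard)
{
	if (guard && recovery_unsafe(n))
		return;		/* skip: recovery stays pending */
	lvb->generation = new_gen;
}

/* How another node decides whether recovery is still needed. */
static bool needs_recovery(const struct control_lvb *lvb,
			   uint32_t wanted_gen)
{
	return lvb->generation < wanted_gen;
}
```

In this model, a withdrawn node that calls recover_done() without
the guard advances the generation, so needs_recovery() returns false
on every other node even though no journal was replayed. With the
guard, the generation stays put and the other nodes still see the
recovery as outstanding.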

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10 07:39:50 -06:00
..
acl.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
acl.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
aops.c gfs2: minor cleanup: remove unneeded variable ret in gfs2_jdata_writepage 2020-01-08 10:39:57 -06:00
aops.h gfs2: mark stuffed_readpage static 2019-07-03 14:45:18 +02:00
bmap.c GFS2 changes for this merge window: 2019-12-05 13:20:11 -08:00
bmap.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
dentry.c gfs2: untangle the logic in gfs2_drevalidate 2019-09-03 09:42:41 +02:00
dir.c fs/gfs2: remove unused IS_DINODE and IS_LEAF macros 2020-01-21 11:19:38 +01:00
dir.h gfs2: Delete an unnecessary check before brelse() 2019-09-04 20:22:17 +02:00
export.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
file.c gfs2: fix O_SYNC write handling 2020-02-06 18:49:41 +01:00
gfs2.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
glock.c gfs2: Rework how rgrp buffer_heads are managed 2020-02-10 07:39:48 -06:00
glock.h gfs2: Use async glocks for rename 2019-09-04 20:22:17 +02:00
glops.c gfs2: Rework how rgrp buffer_heads are managed 2020-02-10 07:39:48 -06:00
glops.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
incore.h gfs2: log error reform 2020-02-10 07:39:49 -06:00
inode.c gfs2: Avoid access time thrashing in gfs2_inode_lookup 2020-01-15 15:20:07 +01:00
inode.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
Kconfig treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
lock_dlm.c gfs2: Ignore dlm recovery requests if gfs2 is withdrawn 2020-02-10 07:39:50 -06:00
log.c gfs2: log error reform 2020-02-10 07:39:49 -06:00
log.h gfs2: eliminate ssize parameter from gfs2_struct2blk 2020-01-07 18:46:06 +01:00
lops.c gfs2: Only complain the first time an io error occurs in quota or log 2020-02-10 07:39:50 -06:00
lops.h gfs2: Remove active journal side effect from gfs2_write_log_header 2019-11-12 15:17:53 +01:00
main.c SPDX update for 5.2-rc4 2019-06-08 12:52:42 -07:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
meta_io.c gfs2: Introduce function gfs2_withdrawn 2019-11-14 19:46:18 +01:00
meta_io.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
ops_fstype.c gfs2: eliminate ssize parameter from gfs2_struct2blk 2020-01-07 18:46:06 +01:00
quota.c gfs2: Only complain the first time an io error occurs in quota or log 2020-02-10 07:39:50 -06:00
quota.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
recovery.c gfs2: Ignore dlm recovery requests if gfs2 is withdrawn 2020-02-10 07:39:50 -06:00
recovery.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
rgrp.c gfs2: Rework how rgrp buffer_heads are managed 2020-02-10 07:39:48 -06:00
rgrp.h gfs2: Rework how rgrp buffer_heads are managed 2020-02-10 07:39:48 -06:00
super.c gfs2: Abort gfs2_freeze if io error is seen 2019-11-15 17:57:30 +01:00
super.h gfs2: Convert gfs2 to fs_context 2019-09-18 22:47:05 -04:00
sys.c gfs2: Split gfs2_lm_withdraw into two functions 2020-02-10 07:39:44 -06:00
sys.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
trace_gfs2.h gfs2: eliminate rs_inum and reduce the size of gfs2 inodes 2018-06-21 07:39:31 -05:00
trans.c Revert "gfs2: eliminate tr_num_revoke_rm" 2020-01-28 15:04:53 +01:00
trans.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
util.c gfs2: Introduce concept of a pending withdraw 2020-02-10 07:39:47 -06:00
util.h gfs2: Introduce concept of a pending withdraw 2020-02-10 07:39:47 -06:00
xattr.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00
xattr.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 2019-06-05 17:37:12 +02:00