linux/fs/ocfs2
Eric Ren 439a36b8ef ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a precess
already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that are
invoked directly by vfs code.  For instance:

  const struct inode_operations ocfs2_file_iops = {
      .permission     = ocfs2_permission,
      .get_acl        = ocfs2_iop_get_acl,
      .set_acl        = ocfs2_iop_set_acl,
  };

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):

  do_sys_open
   may_open
    inode_permission
     ocfs2_permission
      ocfs2_inode_lock() <=== first time
       generic_permission
        get_acl
         ocfs2_iop_get_acl
  	ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two of
ocfs2_inode_lock().  Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of
the remote EX lock request.  Another hand, the recursive cluster lock
(the second one) will be blocked in in __ocfs2_cluster_lock() because of
OCFS2_LOCK_BLOCKED.  But, the downconvert never complete, why? because
there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.

1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track
   of the processes' pid who has taken the cluster lock of this lock
   resource;

2. introduce a new flag for ocfs2_inode_lock_full:
   OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for
   us if we've got cluster lock.

3. export a helper: ocfs2_is_locked_by_me() is used to check if we have
   got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs
routines can call into each other.

The performance penalty of processing the holder list should only be
seen at a few cases where the tracking logic is used, such as get/set
acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real
world, as far as I can see, including permission check,
(get|set)_(acl|attr), and the gfs2 code also do so.

[sfr@canb.auug.org.au remove some inlines]
Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:27 -08:00
..
cluster locking/atomic, kref: Add kref_read() 2017-01-14 11:37:18 +01:00
dlm locking/atomic, kref: Add kref_read() 2017-01-14 11:37:18 +01:00
dlmfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
acl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
acl.h ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00
alloc.c ocfs2: add newlines to some error messages 2016-12-10 12:39:45 -08:00
alloc.h ocfs2: retry on ENOSPC if sufficient space in truncate log 2016-08-02 17:31:41 -04:00
aops.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-17 18:44:00 -08:00
aops.h ocfs2: clean up unused 'page' parameter in ocfs2_write_end_nolock() 2016-12-12 18:55:06 -08:00
blockcheck.c
blockcheck.h
buffer_head_io.c block,fs: untangle fs.h and blk_types.h 2016-11-01 09:43:26 -06:00
buffer_head_io.h
dcache.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dcache.h ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock 2014-04-03 16:20:55 -07:00
dir.c ocfs2: fix not enough credit panic 2016-11-11 08:12:37 -08:00
dir.h VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dlmglue.c ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock 2017-02-22 16:41:27 -08:00
dlmglue.h ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock 2017-02-22 16:41:27 -08:00
export.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
export.h
extent_map.c ocfs2: neaten do_error, ocfs2_error and ocfs2_abort 2015-09-04 16:54:41 -07:00
extent_map.h
file.c ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features 2016-12-10 12:39:45 -08:00
file.h ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features 2016-12-10 12:39:45 -08:00
filecheck.c ocfs2: sysfile interfaces for online file check 2016-03-22 15:36:02 -07:00
filecheck.h ocfs2: sysfile interfaces for online file check 2016-03-22 15:36:02 -07:00
heartbeat.c
heartbeat.h
inode.c ocfs2: replace CURRENT_TIME macro 2016-12-12 18:55:06 -08:00
inode.h ocfs2: convert inode refcount test to a helper 2016-12-10 12:39:45 -08:00
ioctl.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
ioctl.h
journal.c ocfs2: use time64_t to represent orphan scan times 2016-12-12 18:55:06 -08:00
journal.h jbd2: add support for avoiding data writes during transaction commits 2016-04-24 00:56:07 -04:00
Kconfig
localalloc.c ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local 2016-03-25 16:37:42 -07:00
localalloc.h ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters 2014-02-06 13:48:51 -08:00
locks.c ocfs2: fix flock panic issue 2015-12-29 17:45:49 -08:00
locks.h
Makefile ocfs2: disable BUG assertions in reading blocks 2016-06-24 17:23:52 -07:00
mmap.c ocfs2: clean up unused 'page' parameter in ocfs2_write_end_nolock() 2016-12-12 18:55:06 -08:00
mmap.h
move_extents.c ocfs2: convert inode refcount test to a helper 2016-12-10 12:39:45 -08:00
move_extents.h
namei.c ocfs2: replace CURRENT_TIME macro 2016-12-12 18:55:06 -08:00
namei.h ocfs2: do not include dio entry in case of orphan scan 2015-11-05 19:34:48 -08:00
ocfs1_fs_compat.h
ocfs2_fs.h ocfs2: fix comment in struct ocfs2_extended_slot 2016-05-19 19:12:14 -07:00
ocfs2_ioctl.h
ocfs2_lockid.h
ocfs2_lockingver.h
ocfs2_trace.h switch generic_file_splice_read() to use of ->read_iter() 2016-10-05 18:23:56 -04:00
ocfs2.h ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock 2017-02-22 16:41:27 -08:00
quota_global.c ocfs2: Protect periodic quota syncing with s_umount semaphore 2016-11-30 08:36:54 +01:00
quota_local.c ocfs2: Use s_umount for quota recovery protection 2016-11-30 08:37:21 +01:00
quota.h quota: constify qtree_fmt_operations structures 2016-01-04 10:58:35 +01:00
refcounttree.c vfs: fix isize/pos/len checks for reflink & dedupe 2016-12-22 23:00:23 -05:00
refcounttree.h ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features 2016-12-10 12:39:45 -08:00
reservations.c ocfs2: make resv_lock spinlock static 2015-02-10 14:30:29 -08:00
reservations.h
resize.c ocfs2: solve a problem of crossing the boundary in updating backups 2016-03-25 16:37:42 -07:00
resize.h
slot_map.c ocfs2: clean up an unneeded goto in ocfs2_put_slot() 2016-05-19 19:12:14 -07:00
slot_map.h
stack_o2cb.c ocfs2: avoid a pointless delay in o2cb_cluster_check() 2015-04-14 16:48:57 -07:00
stack_user.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
stackglue.c ocfs2: fix crash caused by stale lvb with fsdlm plugin 2017-01-10 18:31:54 -08:00
stackglue.h ocfs2: fix crash caused by stale lvb with fsdlm plugin 2017-01-10 18:31:54 -08:00
suballoc.c ocfs2: fix double unlock in case retry after free truncate log 2016-09-19 15:36:17 -07:00
suballoc.h ocfs2: rollback alloc_dinode counts when ocfs2_block_group_set_bits() failed 2014-04-03 16:20:56 -07:00
super.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2016-12-19 08:23:53 -08:00
super.h ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local 2016-03-25 16:37:42 -07:00
symlink.c vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
symlink.h
sysfile.c ocfs2: avoid system inode ref confusion by adding mutex lock 2014-04-03 16:20:57 -07:00
sysfile.h
uptodate.c ocfs2: remove NULL assignments on static 2014-06-04 16:53:53 -07:00
uptodate.h
xattr.c ocfs2: convert inode refcount test to a helper 2016-12-10 12:39:45 -08:00
xattr.h ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00