linux/include/rdma
Jason Gunthorpe 621e55ff5b RDMA/devices: Do not deadlock during client removal
lockdep reports:

   WARNING: possible circular locking dependency detected

   modprobe/302 is trying to acquire lock:
   0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990

   but task is already holding lock:
   000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #2 (&device->client_data_rwsem){++++}:
          down_read+0x3f/0x160
          ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
          cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
          cm_process_work+0x29/0x110 [ib_cm]
          cm_req_handler+0x10f5/0x1c00 [ib_cm]
          cm_work_handler+0x54c/0x311d [ib_cm]
          process_one_work+0x4aa/0xa30
          worker_thread+0x62/0x5b0
          kthread+0x1ca/0x1f0
          ret_from_fork+0x24/0x30

   -> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
          process_one_work+0x45f/0xa30
          worker_thread+0x62/0x5b0
          kthread+0x1ca/0x1f0
          ret_from_fork+0x24/0x30

   -> #0 ((wq_completion)ib_cm){+.+.}:
          lock_acquire+0xc8/0x1d0
          flush_workqueue+0x102/0x990
          cm_remove_one+0x30e/0x3c0 [ib_cm]
          remove_client_context+0x94/0xd0 [ib_core]
          disable_device+0x10a/0x1f0 [ib_core]
          __ib_unregister_device+0x5a/0xe0 [ib_core]
          ib_unregister_device+0x21/0x30 [ib_core]
          mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
          __mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
          mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
          mlx5_remove_device+0x144/0x150 [mlx5_core]
          mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
          mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
          __x64_sys_delete_module+0x227/0x350
          do_syscall_64+0xc3/0x6a4
          entry_SYSCALL_64_after_hwframe+0x49/0xbe

Which is due to the read side of the client_data_rwsem being obtained
recursively through a work queue flush during cm client removal.

The lock is being held across the remove in remove_client_context() so
that the function is a fence, once it returns the client is removed. This
is required so that the two callers do not proceed with destruction until
the client completes removal.

Instead of using client_data_rwsem use the existing device unregistration
refcount and add a similar client unregistration (client->uses) refcount.

This will fence the two unregistration paths without holding any locks.

Cc: <stable@vger.kernel.org>
Fixes: 921eab1143 ("RDMA/devices: Re-organize device.c locking")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-01 11:44:47 -04:00
..
ib_addr.h RDMA/core: Annotate timeout as unsigned long 2018-10-16 13:34:01 -04:00
ib_cache.h RDMA/cma: Use rdma_read_gid_attr_ndev_rcu to access netdev 2019-05-03 11:10:03 -03:00
ib_cm.h RDMA/core: Annotate timeout as unsigned long 2018-10-16 13:34:01 -04:00
ib_fmr_pool.h IB/core: Make function ib_fmr_pool_unmap return void 2018-11-21 16:13:02 -07:00
ib_hdrs.h IB/hfi1: Build TID RDMA WRITE request 2019-02-05 18:07:43 -05:00
ib_mad.h RDMA: Use __packed annotation instead of __attribute__ ((packed)) 2019-03-25 21:14:12 -03:00
ib_marshall.h IB/core: Convert ah_attr from OPA to IB when copying to user 2017-08-08 14:47:18 -04:00
ib_pack.h IB/core: Fix calculation of maximum RoCE MTU 2017-10-18 12:11:36 -04:00
ib_pma.h
ib_sa.h RDMA/core: Annotate timeout as unsigned long 2018-10-16 13:34:01 -04:00
ib_smi.h RDMA: Use __packed annotation instead of __attribute__ ((packed)) 2019-03-25 21:14:12 -03:00
ib_umem_odp.h RDMA/umem: Move page_shift from ib_umem to ib_odp_umem 2019-05-21 15:23:24 -03:00
ib_umem.h RDMA/umem: Move page_shift from ib_umem to ib_odp_umem 2019-05-21 15:23:24 -03:00
ib_verbs.h RDMA/devices: Do not deadlock during client removal 2019-08-01 11:44:47 -04:00
ib.h RDMA/core: Return bool instead of int 2018-07-30 20:49:04 -06:00
iw_cm.h RDMA: Get rid of iw_cm_verbs 2019-05-03 10:56:56 -03:00
iw_portmap.h RDMA/iwpm: move kdoc comments to functions 2019-02-05 15:40:41 -07:00
mr_pool.h Linux 5.2-rc6 2019-06-28 21:18:23 -03:00
opa_addr.h include/rdma/opa_addr.h: Fix an endianness issue 2018-07-03 14:11:34 -06:00
opa_port_info.h RDMA: Use __packed annotation instead of __attribute__ ((packed)) 2019-03-25 21:14:12 -03:00
opa_smi.h RDMA: Use __packed annotation instead of __attribute__ ((packed)) 2019-03-25 21:14:12 -03:00
opa_vnic.h IB/hfi1: Add support to receive 16B bypass packets 2017-08-22 14:22:37 -04:00
rdma_cm_ib.h RDMA/{cma, ucma}: Simplify and rename rdma_set_ib_paths 2018-01-10 22:00:33 -07:00
rdma_cm.h IB/cma: Define option to set ack timeout and pack tos_set 2019-02-08 16:14:21 -07:00
rdma_counter.h RDMA/core: Make rdma_counter.h compile stand alone 2019-07-09 09:44:47 -03:00
rdma_netlink.h RDMA/netlink: Audit policy settings for netlink attributes 2019-06-25 16:26:54 -03:00
rdma_vt.h IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPs 2019-06-28 22:34:26 -03:00
rdmavt_cq.h IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is full 2019-06-28 22:34:26 -03:00
rdmavt_mr.h IB/rdmavt: Handle dereg of inuse MRs properly 2017-08-28 19:12:31 -04:00
rdmavt_qp.h IB/hfi1: Unreserve a flushed OPFN request 2019-07-22 14:57:55 -03:00
restrack.h RDMA/core: Make rdma_counter.h compile stand alone 2019-07-09 09:44:47 -03:00
rw.h Linux 5.2-rc6 2019-06-28 21:18:23 -03:00
signature.h RDMA/core: Add signature attrs element for ib_mr structure 2019-06-24 11:49:26 -03:00
tid_rdma_defs.h IB/hfi1: Build TID RDMA WRITE request 2019-02-05 18:07:43 -05:00
uverbs_ioctl.h IB/verbs: Add helper function rdma_udata_to_drv_context 2019-02-15 11:17:25 -07:00
uverbs_named_ioctl.h RDMA/uverbs: Fix typo in string concatenation macro 2018-12-06 21:11:06 -07:00
uverbs_std_types.h IB: When attrs.udata/ufile is available use that instead of uobject 2019-04-08 13:05:25 -03:00
uverbs_types.h IB: Pass uverbs_attr_bundle down uobject destroy path 2019-04-01 14:55:36 -03:00