linux/net/smc
Wen Gu 61f434b028 net/smc: Resolve the race between link group access and termination
We encountered some crashes caused by the race between the access
and the termination of link groups.

Here are some of panic stacks we met:

1) Race between smc_clc_wait_msg() and __smc_lgr_terminate()

 BUG: kernel NULL pointer dereference, address: 00000000000002f0
 Workqueue: smc_hs_wq smc_listen_work [smc]
 RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc]
 Call Trace:
  <TASK>
  ? smc_clc_send_accept+0x45/0xa0 [smc]
  ? smc_clc_send_accept+0x45/0xa0 [smc]
  smc_listen_work+0x783/0x1220 [smc]
  ? finish_task_switch+0xc4/0x2e0
  ? process_one_work+0x1ad/0x3c0
  process_one_work+0x1ad/0x3c0
  worker_thread+0x4c/0x390
  ? rescuer_thread+0x320/0x320
  kthread+0x149/0x190
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x1f/0x30
  </TASK>

smc_listen_work()                abnormal case like port error
---------------------------------------------------------------
                                | __smc_lgr_terminate()
                                |  |- smc_conn_kill()
                                |      |- smc_lgr_unregister_conn()
                                |          |- set conn->lgr = NULL
smc_clc_wait_msg()              |
 |- access conn->lgr (panic)    |

2) Race between smc_setsockopt() and __smc_lgr_terminate()

 BUG: kernel NULL pointer dereference, address: 00000000000002e8
 RIP: 0010:smc_setsockopt+0x17a/0x280 [smc]
 Call Trace:
  <TASK>
  __sys_setsockopt+0xfc/0x190
  __x64_sys_setsockopt+0x20/0x30
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  </TASK>

smc_setsockopt()                 abnormal case like port error
--------------------------------------------------------------
                                | __smc_lgr_terminate()
                                |  |- smc_conn_kill()
                                |      |- smc_lgr_unregister_conn()
                                |          |- set conn->lgr = NULL
mod_delayed_work()              |
 |- access conn->lgr (panic)    |

There are some other panic places and they are caused by the
similar reason as described above, which is accessing link
group after termination, thus getting a NULL pointer or invalid
resource.

Currently, there seems to be no synchronization between the
link group access and a sudden termination of it. This patch
tries to fix this by introducing reference count of link group
and not freeing link group until reference count is zero.

Link group might be referred to by links or smc connections. So
the operation to the link group reference count can be concluded
as follows:

object          [hold or initialized as 1]       [put]
-------------------------------------------------------------------
link group      smc_lgr_create()                 smc_lgr_free()
connections     smc_conn_create()                smc_conn_free()
links           smcr_link_init()                 smcr_link_clear()

Througth this way, we extend the life cycle of link group and
ensure it is longer than the life cycle of connections and links
above it, so that avoid invalid access to link group after its
termination.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-13 12:55:40 +00:00
..
af_smc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-01-09 17:00:17 -08:00
Kconfig treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Makefile net/smc: Introduce tracepoint for fallback 2021-11-01 13:39:14 +00:00
smc_cdc.c net/smc: fix kernel panic caused by race of smc_sock 2021-12-28 12:42:45 +00:00
smc_cdc.h net/smc: fix kernel panic caused by race of smc_sock 2021-12-28 12:42:45 +00:00
smc_clc.c net/smc: remove redundant re-assignment of pointer link 2022-01-02 12:14:10 +00:00
smc_clc.h net/smc: add v2 format of CLC decline message 2021-10-16 14:58:13 +01:00
smc_close.c net/smc: Keep smc_close_final rc during active close 2021-12-02 12:14:36 +00:00
smc_close.h net/smc: remove close abort worker 2019-10-22 11:23:44 -07:00
smc_core.c net/smc: Resolve the race between link group access and termination 2022-01-13 12:55:40 +00:00
smc_core.h net/smc: Resolve the race between link group access and termination 2022-01-13 12:55:40 +00:00
smc_diag.c net/smc: Add netlink net namespace support 2022-01-02 12:07:39 +00:00
smc_ib.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
smc_ib.h net/smc: Introduce net namespace support for linkgroup 2022-01-02 12:07:39 +00:00
smc_ism.c net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
smc_ism.h net/smc: keep static copy of system EID 2021-09-14 12:49:10 +01:00
smc_llc.c net/smc: Print net namespace in log 2022-01-02 12:07:39 +00:00
smc_llc.h net/smc: extend LLC layer for SMC-Rv2 2021-10-16 14:58:13 +01:00
smc_netlink.c net/smc: add generic netlink support for system EID 2021-09-14 12:49:10 +01:00
smc_netlink.h net/smc: add support for user defined EIDs 2021-09-14 12:49:10 +01:00
smc_netns.h net/smc: introduce list of pnetids for Ethernet devices 2020-09-28 15:19:03 -07:00
smc_pnet.c net/smc: fix possible NULL deref in smc_pnet_add_eth() 2022-01-12 14:45:29 +00:00
smc_pnet.h net/smc: determine proposed ISM devices 2020-09-28 15:19:03 -07:00
smc_rx.c net/smc: Introduce tracepoints for tx and rx msg 2021-11-01 13:39:14 +00:00
smc_rx.h smc: add support for splice() 2018-05-04 11:45:06 -04:00
smc_stats.c net/smc: Fix ENODATA tests in smc_nl_get_fback_stats() 2021-06-21 12:16:58 -07:00
smc_stats.h net/smc: Make SMC statistics network namespace aware 2021-06-16 12:54:02 -07:00
smc_tracepoint.c net/smc: Introduce tracepoint for smcr link down 2021-11-01 13:39:14 +00:00
smc_tracepoint.h net/smc: Add net namespace for tracepoints 2022-01-02 12:07:39 +00:00
smc_tx.c net/smc: Introduce tracepoints for tx and rx msg 2021-11-01 13:39:14 +00:00
smc_tx.h net/smc: eliminate cursor read and write calls 2018-07-23 10:57:14 -07:00
smc_wr.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-12-30 12:12:12 -08:00
smc_wr.h net/smc: fix kernel panic caused by race of smc_sock 2021-12-28 12:42:45 +00:00
smc.h net/smc: Resolve the race between link group access and termination 2022-01-13 12:55:40 +00:00