linux/net/smc
D. Wythe 5c15b3123f net/smc: Prevent smc_release() from long blocking
In nginx/wrk benchmark, there's a hung problem with high probability
on case likes that: (client will last several minutes to exit)

server: smc_run nginx

client: smc_run wrk -c 10000 -t 1 http://server

Client hangs with the following backtrace:

0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f
1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6
2 [ffffa7ce8Of3bcaO] schedule_timeout at ffffffff9f9e3f3c
3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de
4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3
5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc]
6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d
7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl
8 [ffffa7ce8øf3be48] __fput at ffffffff9f334f93
9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5
10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2
11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a
12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994
13 [ffffa7ce8Of3bf4O] do_syscall_64 at ffffffff9f9d4373
14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c

This issue dues to flush_work(), which is used to wait for
smc_connect_work() to finish in smc_release(). Once lots of
smc_connect_work() was pending or all executing work dangling,
smc_release() has to block until one worker comes to free, which
is equivalent to wait another smc_connnect_work() to finish.

In order to fix this, There are two changes:

1. For those idle smc_connect_work(), cancel it from the workqueue; for
   executing smc_connect_work(), waiting for it to finish. For that
   purpose, replace flush_work() with cancel_work_sync().

2. Since smc_connect() hold a reference for passive closing, if
   smc_connect_work() has been cancelled, release the reference.

Fixes: 24ac3a08e6 ("net/smc: rebuild nonblocking connect")
Reported-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 08:11:05 -08:00
..
af_smc.c net/smc: Prevent smc_release() from long blocking 2021-12-16 08:11:05 -08:00
Kconfig treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Makefile net/smc: Introduce tracepoint for fallback 2021-11-01 13:39:14 +00:00
smc_cdc.c net/smc: improved fix wait on already cleared link 2021-10-08 17:00:16 +01:00
smc_cdc.h net/smc: pre-fetch send buffer outside of send_lock 2020-05-30 18:12:25 -07:00
smc_clc.c net/smc: add v2 format of CLC decline message 2021-10-16 14:58:13 +01:00
smc_clc.h net/smc: add v2 format of CLC decline message 2021-10-16 14:58:13 +01:00
smc_close.c net/smc: Keep smc_close_final rc during active close 2021-12-02 12:14:36 +00:00
smc_close.h net/smc: remove close abort worker 2019-10-22 11:23:44 -07:00
smc_core.c net/smc: fix wrong list_del in smc_lgr_cleanup_early 2021-12-02 12:07:46 +00:00
smc_core.h net/smc: extend LLC layer for SMC-Rv2 2021-10-16 14:58:13 +01:00
smc_diag.c net/smc: Introduce SMCR get link command 2020-12-01 17:56:13 -08:00
smc_ib.c net/smc: stop links when their GID is removed 2021-10-16 14:58:13 +01:00
smc_ib.h net/smc: retrieve v2 gid from IB device 2021-10-16 14:58:13 +01:00
smc_ism.c net/smc: keep static copy of system EID 2021-09-14 12:49:10 +01:00
smc_ism.h net/smc: keep static copy of system EID 2021-09-14 12:49:10 +01:00
smc_llc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
smc_llc.h net/smc: extend LLC layer for SMC-Rv2 2021-10-16 14:58:13 +01:00
smc_netlink.c net/smc: add generic netlink support for system EID 2021-09-14 12:49:10 +01:00
smc_netlink.h net/smc: add support for user defined EIDs 2021-09-14 12:49:10 +01:00
smc_netns.h net/smc: introduce list of pnetids for Ethernet devices 2020-09-28 15:19:03 -07:00
smc_pnet.c net/smc: retrieve v2 gid from IB device 2021-10-16 14:58:13 +01:00
smc_pnet.h net/smc: determine proposed ISM devices 2020-09-28 15:19:03 -07:00
smc_rx.c net/smc: Introduce tracepoints for tx and rx msg 2021-11-01 13:39:14 +00:00
smc_rx.h smc: add support for splice() 2018-05-04 11:45:06 -04:00
smc_stats.c net/smc: Fix ENODATA tests in smc_nl_get_fback_stats() 2021-06-21 12:16:58 -07:00
smc_stats.h net/smc: Make SMC statistics network namespace aware 2021-06-16 12:54:02 -07:00
smc_tracepoint.c net/smc: Introduce tracepoint for smcr link down 2021-11-01 13:39:14 +00:00
smc_tracepoint.h net/smc: Print function name in smcr_link_down tracepoint 2021-11-05 10:14:38 +00:00
smc_tx.c net/smc: Introduce tracepoints for tx and rx msg 2021-11-01 13:39:14 +00:00
smc_tx.h net/smc: eliminate cursor read and write calls 2018-07-23 10:57:14 -07:00
smc_wr.c net/smc: extend LLC layer for SMC-Rv2 2021-10-16 14:58:13 +01:00
smc_wr.h net/smc: extend LLC layer for SMC-Rv2 2021-10-16 14:58:13 +01:00
smc.h net/smc: extend LLC layer for SMC-Rv2 2021-10-16 14:58:13 +01:00