linux/net
Dust Li 349d43127d net/smc: fix kernel panic caused by race of smc_sock
A crash occurs when smc_cdc_tx_handler() tries to access smc_sock
but smc_release() has already freed it.

[ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
[ 4570.696048] #PF: supervisor write access in kernel mode
[ 4570.696728] #PF: error_code(0x0002) - not-present page
[ 4570.697401] PGD 0 P4D 0
[ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
[ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
[ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
<...>
[ 4570.711446] Call Trace:
[ 4570.711746]  <IRQ>
[ 4570.711992]  smc_cdc_tx_handler+0x41/0xc0
[ 4570.712470]  smc_wr_tx_tasklet_fn+0x213/0x560
[ 4570.712981]  ? smc_cdc_tx_dismisser+0x10/0x10
[ 4570.713489]  tasklet_action_common.isra.17+0x66/0x140
[ 4570.714083]  __do_softirq+0x123/0x2f4
[ 4570.714521]  irq_exit_rcu+0xc4/0xf0
[ 4570.714934]  common_interrupt+0xba/0xe0

Though smc_cdc_tx_handler() checked the existence of smc connection,
smc_release() may have already dismissed and released the smc socket
before smc_cdc_tx_handler() further visits it.

smc_cdc_tx_handler()           |smc_release()
if (!conn)                     |
                               |
                               |smc_cdc_tx_dismiss_slots()
                               |      smc_cdc_tx_dismisser()
                               |
                               |sock_put(&smc->sk) <- last sock_put,
                               |                      smc_sock freed
bh_lock_sock(&smc->sk) (panic) |

To make sure we won't receive any CDC messages after we free the
smc_sock, add a refcount on the smc_connection for inflight CDC
message(posted to the QP but haven't received related CQE), and
don't release the smc_connection until all the inflight CDC messages
haven been done, for both success or failed ones.

Using refcount on CDC messages brings another problem: when the link
is going to be destroyed, smcr_link_clear() will reset the QP, which
then remove all the pending CQEs related to the QP in the CQ. To make
sure all the CQEs will always come back so the refcount on the
smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced
by smc_ib_modify_qp_error().
And remove the timeout in smc_wr_tx_wait_no_pending_sends() since we
need to wait for all pending WQEs done, or we may encounter use-after-
free when handling CQEs.

For IB device removal routine, we need to wait for all the QPs on that
device been destroyed before we can destroy CQs on the device, or
the refcount on smc_connection won't reach 0 and smc_sock cannot be
released.

Fixes: 5f08318f61 ("smc: connection data control (CDC)")
Reported-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-28 12:42:45 +00:00
..
6lowpan
9p 9p: fix a bunch of checkpatch warnings 2021-11-04 21:04:25 +09:00
802 llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
8021q net: vlan: fix underflow for the real_dev refcnt 2021-11-26 11:20:46 -08:00
appletalk
atm net: atm: use address setting helpers 2021-10-24 13:59:45 +01:00
ax25 ax25: NPD bug when detaching AX25 device 2021-12-18 12:33:56 +00:00
batman-adv Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
bluetooth bluetooth: use dev_addr_set() 2021-10-25 11:01:29 -07:00
bpf bpf: Add dummy BPF STRUCT_OPS for test purpose 2021-11-01 14:10:00 -07:00
bpfilter
bridge net: bridge: fix ioctl old_deviceless bridge argument 2021-12-23 09:53:50 -08:00
caif net: caif: get ready for const netdev->dev_addr 2021-10-24 13:59:45 +01:00
can can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM 2021-11-06 17:29:32 +01:00
ceph libceph, ceph: move ceph_osdc_copy_from() into cephfs code 2021-11-08 03:29:52 +01:00
core net/sched: flow_dissector: Fix matching on zone id for invalid conns 2021-12-17 18:06:35 -08:00
dcb
dccp tcp: switch orphan_count to bare per-cpu counters 2021-10-15 11:28:34 +01:00
decnet
dns_resolver
dsa net: dsa: tag_ocelot: use traffic class to map priority on injected header 2021-12-23 09:44:59 -08:00
ethernet eth: platform: add a helper for loading netdev->dev_addr 2021-10-08 14:54:33 +01:00
ethtool ethtool: do not perform operations on net devices being unregistered 2021-12-06 16:53:32 -08:00
hsr net: hsr: Add support for redbox supervision frames 2021-10-26 14:52:17 +01:00
ieee802154 mac802154: use dev_addr_set() - manual 2021-10-20 14:27:40 +01:00
ife
ipv4 net: udp: fix alignment problem in udp4_seq_show() 2021-12-27 14:47:14 +00:00
ipv6 ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate 2021-12-27 12:17:33 +00:00
iucv
kcm
key
l2tp net/l2tp: Fix reference count leak in l2tp_udp_recv_core 2021-09-09 11:00:20 +01:00
l3mdev
lapb
llc llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
mac80211 mac80211: fix locking in ieee80211_start_ap error path 2021-12-20 11:33:23 +00:00
mac802154 mac802154: use dev_addr_set() - manual 2021-10-20 14:27:40 +01:00
mctp mctp: Don't let RTM_DELROUTE delete local routes 2021-12-02 12:15:25 +00:00
mpls net: mpls: Remove rcu protection from nh_dev 2021-11-29 12:39:42 +00:00
mptcp mptcp: fix deadlock in __mptcp_push_pending() 2021-12-14 18:49:40 -08:00
ncsi net/ncsi : Add payload to be 32-bit aligned to fix dropped packets 2021-11-24 11:53:17 +00:00
netfilter netfilter: ctnetlink: remove expired entries first 2021-12-16 14:10:52 +01:00
netlabel net: fix NULL pointer reference in cipso_v4_doi_free 2021-08-30 12:23:18 +01:00
netlink net: netlink: af_netlink: Prevent empty skb by adding a check on len. 2021-11-30 17:45:01 -08:00
netrom ax25: constify dev_addr passing 2021-10-13 09:40:45 -07:00
nfc nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done 2021-12-09 07:50:32 -08:00
nsh
openvswitch net: openvswitch: Fix matching zone id for invalid conns arriving from tc 2021-12-17 18:06:36 -08:00
packet net/packet: rx_owner_map depends on pg_vec 2021-12-15 17:49:36 -08:00
phonet phonet/pep: refuse to enable an unbound pipe 2021-12-20 11:49:51 +00:00
psample
qrtr net: qrtr: combine nameservice into main module 2021-09-28 17:36:43 -07:00
rds rds: memory leak in __rds_conn_create() 2021-12-14 12:51:52 +00:00
rfkill
rose rose: constify dev_addr passing 2021-10-13 09:40:45 -07:00
rxrpc rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer() 2021-11-29 15:40:02 +00:00
sched net: openvswitch: Fix matching zone id for invalid conns arriving from tc 2021-12-17 18:06:36 -08:00
sctp sctp: use call_rcu to free endpoint 2021-12-25 17:13:37 +00:00
smc net/smc: fix kernel panic caused by race of smc_sock 2021-12-28 12:42:45 +00:00
strparser bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding 2021-11-09 01:05:28 +01:00
sunrpc NFS client bugfixes for Linux 5.16 2021-11-27 10:33:55 -08:00
switchdev net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device 2021-10-27 14:54:02 +01:00
tipc Revert "tipc: use consistent GFP flags" 2021-12-17 19:18:48 -08:00
tls net/tls: Fix authentication failure in CCM mode 2021-11-29 12:48:28 +00:00
unix af_unix: fix regression in read after shutdown 2021-11-20 15:10:30 +00:00
vmw_vsock virtio/vsock: fix the transport to work with VMADDR_CID_ANY 2021-12-08 15:41:50 -05:00
wireless cfg80211: Acquire wiphy mutex on regulatory work 2021-12-14 11:20:11 +01:00
x25
xdp xsk: Do not sleep in poll() when need_wakeup set 2021-12-14 15:20:54 +01:00
xfrm Core: 2021-11-02 06:20:58 -07:00
compat.c
devres.c
Kconfig net/core: disable NET_RX_BUSY_POLL on PREEMPT_RT 2021-10-01 15:45:10 -07:00
Makefile
socket.c Core: 2021-08-31 16:43:06 -07:00
sysctl_net.c sections: move and rename core_kernel_data() to is_kernel_core_data() 2021-11-09 10:02:50 -08:00