linux/net
Magnus Karlsson 18b1ab7aa7 xsk: Fix race at socket teardown
Fix a race in the xsk socket teardown code that can lead to a NULL pointer
dereference splat. The current xsk unbind code in xsk_unbind_dev() starts by
setting xs->state to XSK_UNBOUND, sets xs->dev to NULL and then waits for any
NAPI processing to terminate using synchronize_net(). After that, the release
code starts to tear down the socket state and free allocated memory.

  BUG: kernel NULL pointer dereference, address: 00000000000000c0
  PGD 8000000932469067 P4D 8000000932469067 PUD 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 25 PID: 69132 Comm: grpcpp_sync_ser Tainted: G          I       5.16.0+ #2
  Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.2.10 03/09/2015
  RIP: 0010:__xsk_sendmsg+0x2c/0x690
  [...]
  RSP: 0018:ffffa2348bd13d50 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000040 RCX: ffff8d5fc632d258
  RDX: 0000000000400000 RSI: ffffa2348bd13e10 RDI: ffff8d5fc5489800
  RBP: ffffa2348bd13db0 R08: 0000000000000000 R09: 00007ffffffff000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d5fc5489800
  R13: ffff8d5fcb0f5140 R14: ffff8d5fcb0f5140 R15: 0000000000000000
  FS:  00007f991cff9400(0000) GS:ffff8d6f1f700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000000c0 CR3: 0000000114888005 CR4: 00000000001706e0
  Call Trace:
  <TASK>
  ? aa_sk_perm+0x43/0x1b0
  xsk_sendmsg+0xf0/0x110
  sock_sendmsg+0x65/0x70
  __sys_sendto+0x113/0x190
  ? debug_smp_processor_id+0x17/0x20
  ? fpregs_assert_state_consistent+0x23/0x50
  ? exit_to_user_mode_prepare+0xa5/0x1d0
  __x64_sys_sendto+0x29/0x30
  do_syscall_64+0x3b/0xc0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

There are two problems with the current code. First, setting xs->dev to NULL
before waiting for all users to stop using the socket is not correct. The
entry to the data plane functions xsk_poll(), xsk_sendmsg(), and xsk_recvmsg()
are all guarded by a test that xs->state is in the state XSK_BOUND and if not,
it returns right away. But one process might have passed this test but still
have not gotten to the point in which it uses xs->dev in the code. In this
interim, a second process executing xsk_unbind_dev() might have set xs->dev to
NULL which will lead to a crash for the first process. The solution here is
just to get rid of this NULL assignment since it is not used anymore. Before
commit 42fddcc7c6 ("xsk: use state member for socket synchronization"),
xs->dev was the gatekeeper to admit processes into the data plane functions,
but it was replaced with the state variable xs->state in the aforementioned
commit.

The second problem is that synchronize_net() does not wait for any process in
xsk_poll(), xsk_sendmsg(), or xsk_recvmsg() to complete, which means that the
state they rely on might be cleaned up prematurely. This can happen when the
notifier gets called (at driver unload for example) as it uses xsk_unbind_dev().
Solve this by extending the RCU critical region from just the ndo_xsk_wakeup
to the whole functions mentioned above, so that both the test of xs->state ==
XSK_BOUND and the last use of any member of xs is covered by the RCU critical
section. This will guarantee that when synchronize_net() completes, there will
be no processes left executing xsk_poll(), xsk_sendmsg(), or xsk_recvmsg() and
state can be cleaned up safely. Note that we need to drop the RCU lock for the
skb xmit path as it uses functions that might sleep. Due to this, we have to
retest the xs->state after we grab the mutex that protects the skb xmit code
from, among a number of things, an xsk_unbind_dev() being executed from the
notifier at the same time.

Fixes: 42fddcc7c6 ("xsk: use state member for socket synchronization")
Reported-by: Elza Mathew <elza.mathew@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20220228094552.10134-1-magnus.karlsson@gmail.com
2022-02-28 15:39:53 +01:00
..
6lowpan
9p virtio,vdpa,qemu_fw_cfg: features, cleanups, fixes 2022-01-18 10:05:48 +02:00
802 net: 802: Use memset_startat() to clear struct fields 2021-11-19 11:23:23 +00:00
8021q vlan: move dev_put into vlan_dev_uninit 2022-02-09 13:33:39 +00:00
appletalk
atm proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
ax25 ax25: fix UAF bugs of net_device caused by rebinding operation 2022-02-09 13:30:07 +00:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-01-05 14:36:10 -08:00
bluetooth Bluetooth: hci_sync: Fix not using conn_timeout 2022-02-24 21:34:28 +01:00
bpf bpf: Add dummy BPF STRUCT_OPS for test purpose 2021-11-01 14:10:00 -07:00
bpfilter
bridge net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled 2022-02-16 20:35:00 -08:00
caif Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
can net-timestamp: convert sk->sk_tskey to atomic_t 2022-02-18 11:14:52 +00:00
ceph libceph: optionally use bounce buffer on recv path in crc mode 2022-02-02 18:50:36 +01:00
core net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends 2022-02-22 16:32:35 -08:00
dcb net: dcb: flush lingering app table entries for unregistered devices 2022-02-25 10:42:34 +00:00
dccp dccp: Inline dccp_listen_start(). 2021-11-23 20:16:22 -08:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
dns_resolver
dsa net: dsa: fix panic when removing unoffloaded port from bridge 2022-02-22 17:03:02 -08:00
ethernet gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlers 2021-11-24 17:21:42 -08:00
ethtool ethtool: use phydev variable 2022-01-06 12:33:35 +00:00
hsr net: Write lock dev_base_lock without disabling bottom halves. 2021-11-29 12:12:36 +00:00
ieee802154 net: ieee802154: Return meaningful error codes from the netlink helpers 2022-01-27 08:20:47 +01:00
ife
ipv4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2022-02-25 10:44:15 +00:00
ipv6 net: ipv6: ensure we call ipv6_mc_down() at most once 2022-02-28 11:04:45 +00:00
iucv net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
kcm net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
key xfrm: Check if_id in xfrm_migrate 2022-01-26 07:44:01 +01:00
l2tp l2tp: add netns refcount tracker to l2tp_dfs_seq_data 2021-12-10 06:38:27 -08:00
l3mdev
lapb
llc sock: Use sock_owned_by_user_nocheck() instead of sk_lock.owned. 2021-12-10 19:43:00 -08:00
mac80211 mac80211: mlme: check for null after calling kmemdup 2022-01-31 15:20:12 +01:00
mac802154 mac802154: use dev_addr_set() - manual 2021-10-20 14:27:40 +01:00
mctp mctp: fix use after free 2022-02-15 14:54:40 +00:00
mpls net: mpls: Fix GCC 12 warning 2022-02-10 15:29:39 +00:00
mptcp mptcp: Correctly set DATA_FIN timeout when number of retransmits is large 2022-02-24 21:54:54 -08:00
ncsi all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate 2022-01-15 08:47:31 -08:00
netfilter netfilter: nf_tables: fix memory leak during stateful obj update 2022-02-22 08:28:04 +01:00
netlabel lsm: security_task_getsecid_subj() -> security_current_getsecid_subj() 2021-11-22 17:52:47 -05:00
netlink net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
netrom netrom: fix api breakage in nr_setsockopt() 2022-01-07 14:11:05 +00:00
nfc Networking fixes for 5.17-rc1, including fixes from netfilter, bpf. 2022-01-20 10:57:05 +02:00
nsh
openvswitch openvswitch: Fix setting ipv6 fields causing hw csum failure 2022-02-24 09:16:21 -08:00
packet af_packet: fix data-race in packet_setsockopt / packet_setsockopt 2022-02-01 20:21:10 -08:00
phonet phonet/pep: refuse to enable an unbound pipe 2021-12-20 11:49:51 +00:00
psample
qrtr bus: mhi: core: Add an API for auto queueing buffers for DL channel 2021-12-17 17:17:14 +01:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-12-16 16:13:19 -08:00
rfkill rfkill: allow to get the software rfkill state 2021-12-20 11:02:38 +01:00
rose net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
rxrpc rxrpc: Adjust retransmission backoff 2022-01-22 02:03:24 +00:00
sched net: sched: avoid newline at end of message in NL_SET_ERR_MSG_MOD 2022-02-23 12:45:44 +00:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-01-05 14:36:10 -08:00
smc net/smc: Fix cleanup when register ULP fails 2022-02-28 11:31:49 +00:00
strparser bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding 2021-11-09 01:05:28 +01:00
sunrpc SUNRPC: lock against ->sock changing during sysfs read 2022-02-08 09:14:26 -05:00
switchdev net: switchdev: add net device refcount tracker 2021-12-07 20:44:58 -08:00
tipc tipc: Fix end of loop tests for list_for_each_entry() 2022-02-23 12:35:40 +00:00
tls net/tls: Fix another skb memory leak when running kTLS traffic 2022-01-17 13:07:47 +00:00
unix af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress 2022-01-14 18:31:37 -08:00
vmw_vsock vsock: remove vsock from connected table when connect is interrupted by a signal 2022-02-17 08:56:02 -08:00
wireless cfg80211: fix race in netlink owner interface destruction 2022-02-04 16:31:44 +01:00
x25 net: x25: drop harmless check of !more 2021-12-09 18:35:11 -08:00
xdp xsk: Fix race at socket teardown 2022-02-28 15:39:53 +01:00
xfrm xfrm: enforce validity of offload input flags 2022-02-09 09:00:40 +01:00
compat.c
devres.c
Kconfig net: kunit: add a test for dev_addr_lists 2021-11-20 12:25:57 +00:00
Kconfig.debug net: add networking namespace refcount tracker 2021-12-10 06:38:26 -08:00
Makefile
socket.c net: fix documentation for kernel_getsockname 2022-02-14 14:01:19 +00:00
sysctl_net.c sections: move and rename core_kernel_data() to is_kernel_core_data() 2021-11-09 10:02:50 -08:00