linux/net
John Fastabend 35d2b7ffff bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t
Sockmap and sockhash maps are a collection of psocks that are
objects representing a socket plus a set of metadata needed
to manage the BPF programs associated with the socket. These
maps use the stab->lock to protect from concurrent operations
on the maps, e.g. trying to insert to objects into the array
at the same time in the same slot. Additionally, a sockhash map
has a bucket lock to protect iteration and insert/delete into
the hash entry.

Each psock has a psock->link which is a linked list of all the
maps that a psock is attached to. This allows a psock (socket)
to be included in multiple sockmap and sockhash maps. This
linked list is protected the psock->link_lock.

They _must_ be nested correctly to avoid deadlock:

  lock(stab->lock)
    : do BPF map operations and psock insert/delete
    lock(psock->link_lock)
       : add map to psock linked list of maps
    unlock(psock->link_lock)
  unlock(stab->lock)

For non PREEMPT_RT kernels both raw_spin_lock_t and spin_lock_t
are guaranteed to not sleep. But, with PREEMPT_RT kernels the
spin_lock_t variants may sleep. In the current code we have
many patterns like this:

   rcu_critical_section:
      raw_spin_lock(stab->lock)
         spin_lock(psock->link_lock) <- may sleep ouch
         spin_unlock(psock->link_lock)
      raw_spin_unlock(stab->lock)
   rcu_critical_section

Nesting spin_lock() inside a raw_spin_lock() violates locking
rules for PREEMPT_RT kernels. And additionally we do alloc(GFP_ATOMICS)
inside the stab->lock, but those might sleep on PREEMPT_RT kernels.
The result is splats like this:

./test_progs -t sockmap_basic
[   33.344330] bpf_testmod: loading out-of-tree module taints kernel.
[   33.441933]
[   33.442089] =============================
[   33.442421] [ BUG: Invalid wait context ]
[   33.442763] 6.5.0-rc5-01731-gec0ded2e0282 #4958 Tainted: G           O
[   33.443320] -----------------------------
[   33.443624] test_progs/2073 is trying to lock:
[   33.443960] ffff888102a1c290 (&psock->link_lock){....}-{3:3}, at: sock_map_update_common+0x2c2/0x3d0
[   33.444636] other info that might help us debug this:
[   33.444991] context-{5:5}
[   33.445183] 3 locks held by test_progs/2073:
[   33.445498]  #0: ffff88811a208d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_map_update_elem_sys+0xff/0x330
[   33.446159]  #1: ffffffff842539e0 (rcu_read_lock){....}-{1:3}, at: sock_map_update_elem_sys+0xf5/0x330
[   33.446809]  #2: ffff88810d687240 (&stab->lock){+...}-{2:2}, at: sock_map_update_common+0x177/0x3d0
[   33.447445] stack backtrace:
[   33.447655] CPU: 10 PID

To fix observe we can't readily remove the allocations (for that
we would need to use/create something similar to bpf_map_alloc). So
convert raw_spin_lock_t to spin_lock_t. We note that sock_map_update
that would trigger the allocate and potential sleep is only allowed
through sys_bpf ops and via sock_ops which precludes hw interrupts
and low level atomic sections in RT preempt kernel. On non RT
preempt kernel there are no changes here and spin locks sections
and alloc(GFP_ATOMIC) are still not sleepable.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230830053517.166611-1-john.fastabend@gmail.com
2023-08-30 09:58:42 +02:00
..
6lowpan 6lowpan: Remove redundant initialisation. 2023-03-29 08:22:52 +01:00
9p net: annotate data-races around sock->ops 2023-08-09 15:32:43 -07:00
802
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
appletalk sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
atm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
ax25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
bluetooth Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED 2023-08-24 12:23:46 -07:00
bpf bpf: Prevent inlining of bpf_fentry_test7() 2023-08-30 08:36:17 +02:00
bpfilter net: Use umd_cleanup_helper() 2023-05-31 13:06:57 +02:00
bridge netfilter: ebtables: fix fortify warnings in size_entry_mwt() 2023-08-22 15:13:20 +02:00
caif sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
can can: raw: add missing refcount for memory leak fix 2023-08-22 17:18:50 -07:00
ceph libceph: fix potential hang in ceph_osdc_notify() 2023-08-02 09:07:34 +02:00
core bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t 2023-08-30 09:58:42 +02:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-01 21:07:46 -07:00
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-29 07:44:56 +02:00
devlink devlink: move devlink_notify_register/unregister() to dev.c 2023-08-28 08:02:24 -07:00
dns_resolver
dsa net: dsa: mark parsed interface mode for legacy switch drivers 2023-08-09 13:08:09 -07:00
ethernet
ethtool ethtool: netlink: always pass genl_info to .prepare_data 2023-08-15 15:01:03 -07:00
handshake net/handshake: Trace events for TLS Alert helpers 2023-07-28 14:07:59 -07:00
hsr net/hsr: Remove unused function declarations 2023-07-31 20:11:47 -07:00
ieee802154 genetlink: use attrs from struct genl_info 2023-08-15 15:00:45 -07:00
ife
ipv4 inet: fix IP_TRANSPARENT error handling 2023-08-28 10:27:03 +01:00
ipv6 bpf-next-for-netdev 2023-08-25 18:40:15 -07:00
iucv
kcm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
l2tp inet: introduce inet->inet_flags 2023-08-16 11:09:16 +01:00
l3mdev
lapb
llc net/llc/llc_conn.c: fix 4 instances of -Wmissing-variable-declarations 2023-08-09 15:34:28 -07:00
mac80211 wireless-next patches for v6.6 2023-08-25 18:35:09 -07:00
mac802154 Core WPAN changes: 2023-06-24 15:41:46 -07:00
mctp sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
mpls net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
mptcp mptcp: register default scheduler 2023-08-22 17:31:19 -07:00
ncsi genetlink: make genl_info->nlhdr const 2023-08-15 14:54:44 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
netlabel netlabel: Remove unused declaration netlbl_cipsov4_doi_free() 2023-08-02 12:28:22 -07:00
netlink genetlink: add a family pointer to struct genl_info 2023-08-15 15:01:03 -07:00
netrom netrom: Deny concurrent connect(). 2023-08-28 06:58:46 +01:00
nfc genetlink: use attrs from struct genl_info 2023-08-15 15:00:45 -07:00
nsh net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-10 14:10:53 -07:00
phonet sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
psample
qrtr net: qrtr: Handle IPCR control port format of older targets 2023-07-17 09:02:30 +01:00
rds net/rds: Remove unused function declarations 2023-08-13 12:25:42 +01:00
rfkill net: rfkill-gpio: Add explicit include for of.h 2023-04-06 20:36:27 +02:00
rose sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
rxrpc Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-29 07:44:56 +02:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
smc net: annotate data-races around sk->sk_lingertime 2023-08-21 07:41:57 +01:00
strparser
sunrpc Networking changes for 6.6. 2023-08-29 11:33:01 -07:00
switchdev net: switchdev: Add a helper to replay objects on a bridge port 2023-07-21 08:54:03 +01:00
tipc genetlink: use attrs from struct genl_info 2023-08-15 15:00:45 -07:00
tls tls: get cipher_name from cipher_desc in tls_set_sw_offload 2023-08-27 17:17:42 -07:00
unix net: annotate data-races around sock->ops 2023-08-09 15:32:43 -07:00
vmw_vsock vsock: Remove unused function declarations 2023-07-31 14:41:08 -07:00
wireless wifi: nl80211: Remove unused declaration nl80211_pmsr_dump_results() 2023-08-22 21:40:40 +02:00
x25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
xdp xsk: Fix xsk_build_skb() error: 'skb' dereferencing possible ERR_PTR() 2023-08-30 08:41:23 +02:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
compat.c net/compat: Update msg_control_is_user when setting a kernel pointer 2023-04-14 11:09:27 +01:00
devres.c
Kconfig bpf: Add fd-based tcx multi-prog infra with link support 2023-07-19 10:07:27 -07:00
Kconfig.debug
Makefile net/handshake: Create a NETLINK service for handling handshake requests 2023-04-19 18:48:48 -07:00
socket.c net: Avoid address overwrite in kernel_connect 2023-08-23 09:42:05 +01:00
sysctl_net.c