linux/net
Alexander Lobakin 0315a075f1 net: fix premature exit from NAPI state polling in napi_disable()
Commit 719c571970 ("net: make napi_disable() symmetric with
enable") accidentally introduced a bug sometimes leading to a kernel
BUG when bringing an iface up/down under heavy traffic load.

Prior to this commit, napi_disable() was polling n->state until
none of (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC) is set and then
always flip them. Now there's a possibility to get away with the
NAPIF_STATE_SCHE unset as 'continue' drops us to the cmpxchg()
call with an uninitialized variable, rather than straight to
another round of the state check.

Error path looks like:

napi_disable():
unsigned long val, new; /* new is uninitialized */

do {
	val = READ_ONCE(n->state); /* NAPIF_STATE_NPSVC and/or
				      NAPIF_STATE_SCHED is set */
	if (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) { /* true */
		usleep_range(20, 200);
		continue; /* go straight to the condition check */
	}
	new = val | <...>
} while (cmpxchg(&n->state, val, new) != val); /* state == val, cmpxchg()
						  writes garbage */

napi_enable():
do {
	val = READ_ONCE(n->state);
	BUG_ON(!test_bit(NAPI_STATE_SCHED, &val)); /* 50/50 boom */
<...>

while the typical BUG splat is like:

[  172.652461] ------------[ cut here ]------------
[  172.652462] kernel BUG at net/core/dev.c:6937!
[  172.656914] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  172.661966] CPU: 36 PID: 2829 Comm: xdp_redirect_cp Tainted: G          I       5.15.0 #42
[  172.670222] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[  172.680646] RIP: 0010:napi_enable+0x5a/0xd0
[  172.684832] Code: 07 49 81 cc 00 01 00 00 4c 89 e2 48 89 d8 80 e6 fb f0 48 0f b1 55 10 48 39 c3 74 10 48 8b 5d 10 f6 c7 04 75 3d f6 c3 01 75 b4 <0f> 0b 5b 5d 41 5c c3 65 ff 05 b8 e5 61 53 48 c7 c6 c0 f3 34 ad 48
[  172.703578] RSP: 0018:ffffa3c9497477a8 EFLAGS: 00010246
[  172.708803] RAX: ffffa3c96615a014 RBX: 0000000000000000 RCX: ffff8a4b575301a0
< snip >
[  172.782403] Call Trace:
[  172.784857]  <TASK>
[  172.786963]  ice_up_complete+0x6f/0x210 [ice]
[  172.791349]  ice_xdp+0x136/0x320 [ice]
[  172.795108]  ? ice_change_mtu+0x180/0x180 [ice]
[  172.799648]  dev_xdp_install+0x61/0xe0
[  172.803401]  dev_xdp_attach+0x1e0/0x550
[  172.807240]  dev_change_xdp_fd+0x1e6/0x220
[  172.811338]  do_setlink+0xee8/0x1010
[  172.814917]  rtnl_setlink+0xe5/0x170
[  172.818499]  ? bpf_lsm_binder_set_context_mgr+0x10/0x10
[  172.823732]  ? security_capable+0x36/0x50
< snip >

Fix this by replacing 'do { } while (cmpxchg())' with an "infinite"
for-loop with an explicit break.

From v1 [0]:
 - just use a for-loop to simplify both the fix and the existing
   code (Eric).

[0] https://lore.kernel.org/netdev/20211110191126.1214-1-alexandr.lobakin@intel.com

Fixes: 719c571970 ("net: make napi_disable() symmetric with enable")
Suggested-by: Eric Dumazet <edumazet@google.com> # for-loop
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211110195605.1304-1-alexandr.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-10 17:45:15 -08:00
..
6lowpan 6lowpan: iphc: Fix an off-by-one check of array index 2021-07-22 16:19:03 +02:00
9p net/9p: increase default msize to 128k 2021-09-05 08:36:44 +09:00
802 llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
8021q net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-03 14:19:23 +00:00
appletalk net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
atm net: atm: use address setting helpers 2021-10-24 13:59:45 +01:00
ax25 ax25: constify dev_addr passing 2021-10-13 09:40:45 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
bluetooth bluetooth: use dev_addr_set() 2021-10-25 11:01:29 -07:00
bpf bpf: Add dummy BPF STRUCT_OPS for test purpose 2021-11-01 14:10:00 -07:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-11-01 20:05:14 -07:00
caif net: caif: get ready for const netdev->dev_addr 2021-10-24 13:59:45 +01:00
can can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM 2021-11-06 17:29:32 +01:00
ceph Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
core net: fix premature exit from NAPI state polling in napi_disable() 2021-11-10 17:45:15 -08:00
dcb
dccp tcp: switch orphan_count to bare per-cpu counters 2021-10-15 11:28:34 +01:00
decnet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
dns_resolver
dsa net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge 2021-11-03 14:22:00 +00:00
ethernet eth: platform: add a helper for loading netdev->dev_addr 2021-10-08 14:54:33 +01:00
ethtool ethtool: fix ethtool msg len calculation for pause stats 2021-11-03 11:20:45 +00:00
hsr net: hsr: Add support for redbox supervision frames 2021-10-26 14:52:17 +01:00
ieee802154 mac802154: use dev_addr_set() - manual 2021-10-20 14:27:40 +01:00
ife
ipv4 bpf, sockmap: Fix race in ingress receive verdict with redirect to self 2021-11-09 00:58:26 +01:00
ipv6 ipv6: remove useless assignment to newinet in tcp_v6_syn_recv_sock() 2021-11-05 19:49:40 -07:00
iucv net/iucv: Replace deprecated CPU-hotplug functions. 2021-08-09 10:13:32 +01:00
kcm net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
key
l2tp net/l2tp: Fix reference count leak in l2tp_udp_recv_core 2021-09-09 11:00:20 +01:00
l3mdev
lapb
llc llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
mac802154 mac802154: use dev_addr_set() - manual 2021-10-20 14:27:40 +01:00
mctp mctp: handle the struct sockaddr_mctp_ext padding field 2021-11-04 19:17:48 -07:00
mpls mpls: defer ttl decrement in mpls_forward() 2021-07-23 17:17:56 +01:00
mptcp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
ncsi net/ncsi: add get MAC address command to get Intel i210 MAC address 2021-09-01 17:18:56 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2021-11-02 18:02:54 -07:00
netlabel net: fix NULL pointer reference in cipso_v4_doi_free 2021-08-30 12:23:18 +01:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-07 15:24:06 -07:00
netrom ax25: constify dev_addr passing 2021-10-13 09:40:45 -07:00
nfc NFC: add necessary privilege flags in netlink layer 2021-11-03 11:15:18 +00:00
nsh
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-19 18:09:18 -07:00
packet af_packet: Introduce egress hook 2021-10-14 23:06:44 +02:00
phonet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
psample
qrtr net: qrtr: combine nameservice into main module 2021-09-28 17:36:43 -07:00
rds net/rds: dma_map_sg is entitled to merge entries 2021-08-18 15:35:50 -07:00
rfkill
rose rose: constify dev_addr passing 2021-10-13 09:40:45 -07:00
rxrpc rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies() 2021-09-24 14:18:34 +01:00
sched net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_any 2021-11-09 19:16:23 -08:00
sctp sctp: remove unreachable code from sctp_sf_violation_chunk() 2021-11-07 19:32:39 +00:00
smc net/smc: fix sk_refcnt underflow on linkdown and fallback 2021-11-10 14:43:59 +00:00
strparser bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding 2021-11-09 01:05:28 +01:00
sunrpc Bug fixes for NFSD error handling paths 2021-10-07 14:11:40 -07:00
switchdev net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device 2021-10-27 14:54:02 +01:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
unix net: Implement ->sock_is_readable() for UDP and AF_UNIX 2021-10-26 12:29:33 -07:00
vmw_vsock vsock: prevent unnecessary refcnt inc for nonblocking connect 2021-11-10 14:36:11 +00:00
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
x25
xdp xsk: Fix clang build error in __xp_alloc 2021-09-29 13:59:13 +02:00
xfrm Core: 2021-11-02 06:20:58 -07:00
compat.c
devres.c
Kconfig net/core: disable NET_RX_BUSY_POLL on PREEMPT_RT 2021-10-01 15:45:10 -07:00
Makefile mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
socket.c Core: 2021-08-31 16:43:06 -07:00
sysctl_net.c