linux/net
John Fastabend 476d98018f bpf, sockmap: On cleanup we additionally need to remove cached skb
Its possible if a socket is closed and the receive thread is under memory
pressure it may have cached a skb. We need to ensure these skbs are
free'd along with the normal ingress_skb queue.

Before 799aa7f98d ("skmsg: Avoid lock_sock() in sk_psock_backlog()") tear
down and backlog processing both had sock_lock for the common case of
socket close or unhash. So it was not possible to have both running in
parrallel so all we would need is the kfree in those kernels.

But, latest kernels include the commit 799aa7f98d5e and this requires a
bit more work. Without the ingress_lock guarding reading/writing the
state->skb case its possible the tear down could run before the state
update causing it to leak memory or worse when the backlog reads the state
it could potentially run interleaved with the tear down and we might end up
free'ing the state->skb from tear down side but already have the reference
from backlog side. To resolve such races we wrap accesses in ingress_lock
on both sides serializing tear down and backlog case. In both cases this
only happens after an EAGAIN error case so having an extra lock in place
is likely fine. The normal path will skip the locks.

Note, we check state->skb before grabbing lock. This works because
we can only enqueue with the mutex we hold already. Avoiding a race
on adding state->skb after the check. And if tear down path is running
that is also fine if the tear down path then removes state->skb we
will simply set skb=NULL and the subsequent goto is skipped. This
slight complication avoids locking in normal case.

With this fix we no longer see this warning splat from tcp side on
socket close when we hit the above case with redirect to ingress self.

[224913.935822] WARNING: CPU: 3 PID: 32100 at net/core/stream.c:208 sk_stream_kill_queues+0x212/0x220
[224913.935841] Modules linked in: fuse overlay bpf_preload x86_pkg_temp_thermal intel_uncore wmi_bmof squashfs sch_fq_codel efivarfs ip_tables x_tables uas xhci_pci ixgbe mdio xfrm_algo xhci_hcd wmi
[224913.935897] CPU: 3 PID: 32100 Comm: fgs-bench Tainted: G          I       5.14.0-rc1alu+ #181
[224913.935908] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[224913.935914] RIP: 0010:sk_stream_kill_queues+0x212/0x220
[224913.935923] Code: 8b 83 20 02 00 00 85 c0 75 20 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 df e8 2b 11 fe ff eb c3 0f 0b e9 7c ff ff ff 0f 0b eb ce <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 90 0f 1f 44 00 00 41 57 41
[224913.935932] RSP: 0018:ffff88816271fd38 EFLAGS: 00010206
[224913.935941] RAX: 0000000000000ae8 RBX: ffff88815acd5240 RCX: dffffc0000000000
[224913.935948] RDX: 0000000000000003 RSI: 0000000000000ae8 RDI: ffff88815acd5460
[224913.935954] RBP: ffff88815acd5460 R08: ffffffff955c0ae8 R09: fffffbfff2e6f543
[224913.935961] R10: ffffffff9737aa17 R11: fffffbfff2e6f542 R12: ffff88815acd5390
[224913.935967] R13: ffff88815acd5480 R14: ffffffff98d0c080 R15: ffffffff96267500
[224913.935974] FS:  00007f86e6bd1700(0000) GS:ffff888451cc0000(0000) knlGS:0000000000000000
[224913.935981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[224913.935988] CR2: 000000c0008eb000 CR3: 00000001020e0005 CR4: 00000000003706e0
[224913.935994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[224913.936000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[224913.936007] Call Trace:
[224913.936016]  inet_csk_destroy_sock+0xba/0x1f0
[224913.936033]  __tcp_close+0x620/0x790
[224913.936047]  tcp_close+0x20/0x80
[224913.936056]  inet_release+0x8f/0xf0
[224913.936070]  __sock_release+0x72/0x120
[224913.936083]  sock_close+0x14/0x20

Fixes: a136678c0b ("bpf: sk_msg, zap ingress queue on psock down")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210727160500.1713554-3-john.fastabend@gmail.com
2021-07-27 14:55:21 -07:00
..
6lowpan
9p 9p/trans_virtio: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
802 net/802/garp: fix memleak in garp_request_join() 2021-07-01 11:21:57 -07:00
8021q net: vlan: pass thru all GSO_SOFTWARE in hw_enc_features 2021-06-18 11:58:03 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
atm atm: Use list_for_each_entry() to simplify code in resources.c 2021-06-10 14:08:09 -07:00
ax25
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
bluetooth TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
bpf bpf, test: fix NULL pointer dereference on invalid expected_attach_type 2021-07-12 17:13:08 +02:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge net: bridge: multicast: fix MRD advertisement router port marking race 2021-07-11 12:11:06 -07:00
caif net: fix uninit-value in caif_seqpkt_sendmsg 2021-07-15 11:08:33 -07:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
ceph Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
core bpf, sockmap: On cleanup we additionally need to remove cached skb 2021-07-27 14:55:21 -07:00
dcb net: dcb: Return the correct errno code 2021-06-01 17:01:33 -07:00
dccp net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
decnet decnet: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
dns_resolver
dsa net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() 2021-07-13 14:47:10 -07:00
ethernet of: net: pass the dst buffer to of_get_mac_address() 2021-04-13 14:35:02 -07:00
ethtool net: sock: extend SO_TIMESTAMPING for PHC binding 2021-07-01 13:08:18 -07:00
hsr net: hsr: don't check sequence number if tag removal is offloaded 2021-06-16 12:13:01 -07:00
ieee802154 ieee802154: fix error return code in ieee802154_llsec_getparams() 2021-06-03 10:59:49 +02:00
ife
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2021-07-15 14:39:45 -07:00
ipv6 ipv6: allocate enough headroom in ip6_finish_output2() 2021-07-12 11:25:12 -07:00
iucv s390: iucv: Avoid field over-reading memcpy() 2021-07-01 15:54:01 -07:00
kcm net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
key net: Remove unnecessary variables 2021-05-26 07:03:39 +02:00
l2tp l2tp: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
l3mdev
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc llc2: Remove redundant assignment to rc 2021-04-27 14:16:14 -07:00
mac80211 mac80211: Switch to a virtual time-based airtime scheduler 2021-06-23 18:12:00 +02:00
mac802154
mpls mpls: Remove redundant assignment to err 2021-04-27 14:17:00 -07:00
mptcp net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
ncsi net/ncsi: add dummy response handler for Intel boards 2021-07-08 14:16:39 -07:00
netfilter netfilter: nft_last: incorrect arithmetics when restoring last used 2021-07-06 14:15:13 +02:00
netlabel netlabel: Fix memory leak in netlbl_mgmt_add_common 2021-06-15 11:19:04 -07:00
netlink net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
netrom net: netrom: Fix fall-through warnings for Clang 2021-05-17 19:57:08 -05:00
nfc TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
nsh
openvswitch openvswitch: Optimize operation for key comparison 2021-07-01 11:13:10 -07:00
packet Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
phonet
psample
qrtr net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
rds Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
rfkill Another set of updates, all over the map: 2021-04-20 16:44:04 -07:00
rose
rxrpc Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
sched net/sched: act_ct: remove and free nf_table callbacks 2021-07-02 13:36:35 -07:00
sctp net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
smc net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
strparser net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
sunrpc NFS client updates for Linux 5.14 2021-07-09 09:43:57 -07:00
switchdev net: switchdev: add a context void pointer to struct switchdev_notifier_info 2021-06-28 14:09:03 -07:00
tipc Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
unix net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
vmw_vsock Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
wireless cfg80211: Support hidden AP discovery over 6GHz band 2021-06-23 13:05:09 +02:00
x25 net: x25: Use list_for_each_entry() to simplify code in x25_route.c 2021-06-10 14:08:09 -07:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
xfrm Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
Kconfig bpf, kconfig: Add consolidated menu entry for bpf with core options 2021-05-11 13:56:16 -07:00
Makefile
socket.c net: socket: support hardware timestamp conversion to PHC bound 2021-07-01 13:08:18 -07:00
sysctl_net.c net: Ensure net namespace isolation of sysctls 2021-04-12 13:27:11 -07:00