linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-11 05:33:09 +00:00

History

Maciej Żenczykowski 364745fbe9 bpf: Do not change gso_size during bpf_skb_change_proto() This is technically a backwards incompatible change in behaviour, but I'm going to argue that it is very unlikely to break things, and likely to fix far more then it breaks. In no particular order, various reasons follow: (a) I've long had a bug assigned to myself to debug a super rare kernel crash on Android Pixel phones which can (per stacktrace) be traced back to BPF clat IPv6 to IPv4 protocol conversion causing some sort of ugly failure much later on during transmit deep in the GSO engine, AFAICT precisely because of this change to gso_size, though I've never been able to manually reproduce it. I believe it may be related to the particular network offload support of attached USB ethernet dongle being used for tethering off of an IPv6-only cellular connection. The reason might be we end up with more segments than max permitted, or with a GSO packet with only one segment... (either way we break some assumption and hit a BUG_ON) (b) There is no check that the gso_size is > 20 when reducing it by 20, so we might end up with a negative (or underflowing) gso_size or a gso_size of 0. This can't possibly be good. Indeed this is probably somehow exploitable (or at least can result in a kernel crash) by delivering crafted packets and perhaps triggering an infinite loop or a divide by zero... As a reminder: gso_size (MSS) is related to MTU, but not directly derived from it: gso_size/MSS may be significantly smaller then one would get by deriving from local MTU. And on some NICs (which do loose MTU checking on receive, it may even potentially be larger, for example my work pc with 1500 MTU can receive 1520 byte frames [and sometimes does due to bugs in a vendor plat46 implementation]). Indeed even just going from 21 to 1 is potentially problematic because it increases the number of segments by a factor of 21 (think DoS, or some other crash due to too many segments). (c) It's always safe to not increase the gso_size, because it doesn't result in the max packet size increasing. So the skb_increase_gso_size() call was always unnecessary for correctness (and outright undesirable, see later). As such the only part which is potentially dangerous (ie. could cause backwards compatibility issues) is the removal of the skb_decrease_gso_size() call. (d) If the packets are ultimately destined to the local device, then there is absolutely no benefit to playing around with gso_size. It only matters if the packets will egress the device. ie. we're either forwarding, or transmitting from the device. (e) This logic only triggers for packets which are GSO. It does not trigger for skbs which are not GSO. It will not convert a non-GSO MTU sized packet into a GSO packet (and you don't even know what the MTU is, so you can't even fix it). As such your transmit path must already be able to handle an MTU 20 bytes larger then your receive path (for IPv4 to IPv6 translation) - and indeed 28 bytes larger due to IPv4 fragments. Thus removing the skb_decrease_gso_size() call doesn't actually increase the size of the packets your transmit side must be able to handle. ie. to handle non-GSO max-MTU packets, the IPv4/IPv6 device/ route MTUs must already be set correctly. Since for example with an IPv4 egress MTU of 1500, IPv4 to IPv6 translation will already build 1520 byte IPv6 frames, so you need a 1520 byte device MTU. This means if your IPv6 device's egress MTU is 1280, your IPv4 route must be 1260 (and actually 1252, because of the need to handle fragments). This is to handle normal non-GSO packets. Thus the reduction is simply not needed for GSO packets, because when they're correctly built, they will already be the right size. (f) TSO/GSO should be able to exactly undo GRO: the number of packets (TCP segments) should not be modified, so that TCP's MSS counting works correctly (this matters for congestion control). If protocol conversion changes the gso_size, then the number of TCP segments may increase or decrease. Packet loss after protocol conversion can result in partial loss of MSS segments that the sender sent. How's the sending TCP stack going to react to receiving ACKs/SACKs in the middle of the segments it sent? (g) skb_{decrease,increase}_gso_size() are already no-ops for GSO_BY_FRAGS case (besides triggering WARN_ON_ONCE). This means you already cannot guarantee that gso_size (and thus resulting packet MTU) is changed. ie. you must assume it won't be changed. (h) changing gso_size is outright buggy for UDP GSO packets, where framing matters (I believe that's also the case for SCTP, but it's already excluded by [g]). So the only remaining case is TCP, which also doesn't want it (see [f]). (i) see also the reasoning on the previous attempt at fixing this (commit `fa7b83bf3b`) which shows that the current behaviour causes TCP packet loss: In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the coalesced packet payload can be > MSS, but < MSS + 20. bpf_skb_proto_6_to_4() will upgrade the MSS and it can be > the payload length. After then tcp_gso_segment checks for the payload length if it is <= MSS. The condition is causing the packet to be dropped. tcp_gso_segment(): [...] mss = skb_shinfo(skb)->gso_size; if (unlikely(skb->len <= mss)) goto out; [...] Thus changing the gso_size is simply a very bad idea. Increasing is unnecessary and buggy, and decreasing can go negative. Fixes: `6578171a7f` ("bpf: add bpf_skb_change_proto helper") Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Dongseok Yi <dseok.yi@samsung.com> Cc: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/bpf/CANP3RGfjLikQ6dg=YpBU0OeHvyv7JOki7CyOUS9modaXAi-9vQ@mail.gmail.com Link: https://lore.kernel.org/bpf/20210617000953.2787453-2-zenczykowski@gmail.com		2021-06-24 15:48:17 +02:00
..
6lowpan	6lowpan: Fix some typos in nhc_udp.c	2021-03-24 17:52:11 -07:00
9p	9p/trans_virtio: Fix spelling mistakes	2021-06-02 14:01:55 -07:00
802
8021q	net: vlan: Avoid using strncpy()	2021-06-03 14:15:10 -07:00
appletalk	net: appletalk: fix some mistakes in grammar	2021-06-08 19:27:57 -07:00
atm	atm: Use list_for_each_entry() to simplify code in resources.c	2021-06-10 14:08:09 -07:00
ax25	net/ax25: Delete obsolete TODO file	2021-03-30 16:54:50 -07:00
batman-adv	batman-adv: Drop reduntant batadv interface check	2021-06-02 22:25:45 +02:00
bluetooth	Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net	2021-06-07 13:01:52 -07:00
bpf	bpf: Prepare bpf syscall to be used from kernel and user space.	2021-05-19 00:33:40 +02:00
bpfilter	net: remove redundant 'depends on NET'	2021-01-27 17:04:12 -08:00
bridge	net: bridge: mrp: Update ring transitions.	2021-06-04 14:41:28 -07:00
caif	Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net	2021-06-07 13:01:52 -07:00
can	linux-can-next-for-5.14-20210527	2021-05-27 14:39:11 -07:00
ceph	libceph: Fix spelling mistakes	2021-06-03 13:24:23 -07:00
core	bpf: Do not change gso_size during bpf_skb_change_proto()	2021-06-24 15:48:17 +02:00
dcb	net: dcb: Return the correct errno code	2021-06-01 17:01:33 -07:00
dccp	dccp: tfrc: fix doc warnings in tfrc_equation.c	2021-06-10 14:08:49 -07:00
decnet	decnet: Fix spelling mistakes	2021-06-02 14:01:55 -07:00
dns_resolver	net: remove redundant 'depends on NET'	2021-01-27 17:04:12 -08:00
dsa	net: dsa: dsa_slave_phy_connect(): extend phy's flags with port specific phy flags	2021-06-14 12:54:43 -07:00
ethernet	of: net: pass the dst buffer to of_get_mac_address()	2021-04-13 14:35:02 -07:00
ethtool	ethtool: add a stricter length check	2021-06-16 00:40:44 -07:00
hsr	net: hsr: don't check sequence number if tag removal is offloaded	2021-06-16 12:13:01 -07:00
ieee802154	ieee802154: fix error return code in ieee802154_llsec_getparams()	2021-06-03 10:59:49 +02:00
ife	net: remove redundant 'depends on NET'	2021-01-27 17:04:12 -08:00
ipv4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	2021-06-17 11:54:56 -07:00
ipv6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	2021-06-17 11:54:56 -07:00
iucv	net/af_iucv: clean up some forward declarations	2021-06-12 13:06:33 -07:00
kcm	net: kcm: fix memory leak in kcm_sendmsg	2021-06-03 14:13:26 -07:00
key
l2tp	l2tp: Fix spelling mistakes	2021-06-07 14:08:30 -07:00
l3mdev	l3mdev: Correct function names in the kerneldoc comments	2021-03-28 17:56:55 -07:00
lapb	net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c	2021-06-08 16:31:25 -07:00
llc	llc2: Remove redundant assignment to rc	2021-04-27 14:16:14 -07:00
mac80211	mac80211: extend protection against mixed key and fragment cache attacks	2021-05-11 20:14:50 +02:00
mac802154	net: mac802154: Fix general protection fault	2021-04-06 22:42:16 +02:00
mpls	mpls: Remove redundant assignment to err	2021-04-27 14:17:00 -07:00
mptcp	Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net	2021-06-07 13:01:52 -07:00
ncsi	net/ncsi: Fix spelling mistakes	2021-06-07 14:08:30 -07:00
netfilter	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next	2021-06-09 14:50:35 -07:00
netlabel	netlabel: Fix memory leak in netlbl_mgmt_add_common	2021-06-15 11:19:04 -07:00
netlink	netlink: disable IRQs for netlink_lock_table()	2021-05-17 15:31:03 -07:00
netrom	net: netrom: nr_in: Remove redundant assignment to ns	2021-04-28 13:59:08 -07:00
nfc	Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net	2021-06-07 13:01:52 -07:00
nsh
openvswitch	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-05-27 09:55:10 -07:00
packet	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-05-27 09:55:10 -07:00
phonet
psample	psample: Add additional metadata attributes	2021-03-14 15:00:43 -07:00
qrtr	net: qrtr: ns: Fix error return code in qrtr_ns_init()	2021-05-19 13:14:35 -07:00
rds	rds: Fix spelling mistakes	2021-05-31 22:45:05 -07:00
rfkill	Another set of updates, all over the map:	2021-04-20 16:44:04 -07:00
rose	net: rose: Fix fall-through warnings for Clang	2021-03-10 12:45:15 -08:00
rxrpc	rxrpc: Fix a typo	2021-06-02 14:01:55 -07:00
sched	net: sched: fix error return code in tcf_del_walker()	2021-06-17 11:36:18 -07:00
sctp	sctp: sm_statefuns: Fix spelling mistakes	2021-05-31 22:45:15 -07:00
smc	net/smc: Make SMC statistics network namespace aware	2021-06-16 12:54:02 -07:00
strparser
sunrpc	NFS client updates for Linux 5.13	2021-05-07 11:23:41 -07:00
switchdev	net: bridge: propagate extack through switchdev_port_attr_set	2021-02-14 17:38:11 -08:00
tipc	tipc:subscr.c: fix a spelling mistake	2021-06-10 13:48:43 -07:00
tls	skbuff: add a parameter to __skb_frag_unref	2021-06-07 14:11:47 -07:00
unix	af_unix: remove the repeated word "and"	2021-06-10 13:46:38 -07:00
vmw_vsock	vsock/loopback: enable SEQPACKET for transport	2021-06-11 13:32:47 -07:00
wireless	cfg80211: mitigate A-MSDU aggregation attacks	2021-05-11 20:13:13 +02:00
x25	net: x25: Use list_for_each_entry() to simplify code in x25_route.c	2021-06-10 14:08:09 -07:00
xdp	xdp: Extend xdp_redirect_map with broadcast support	2021-05-26 09:46:16 +02:00
xfrm	xfrm: ipcomp: remove unnecessary get_cpu()	2021-04-19 12:49:29 +02:00
compat.c	net: Return the correct errno code	2021-06-03 15:13:56 -07:00
devres.c	net: devres: Correct a grammatical error	2021-06-11 12:55:28 -07:00
Kconfig	bpf, kconfig: Add consolidated menu entry for bpf with core options	2021-05-11 13:56:16 -07:00
Makefile	net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile	2021-01-27 17:03:52 -08:00
socket.c	net: Fix a misspell in socket.c	2021-03-25 16:56:27 -07:00
sysctl_net.c	net: Ensure net namespace isolation of sysctls	2021-04-12 13:27:11 -07:00