linux/net
Shmulik Ladkani b8247f095e net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs
Given:
 - tap0 and vxlan0 are bridged
 - vxlan0 stacked on eth0, eth0 having small mtu (e.g. 1400)

Assume GSO skbs arriving from tap0 having a gso_size as determined by
user-provided virtio_net_hdr (e.g. 1460 corresponding to VM mtu of 1500).

After encapsulation these skbs have skb_gso_network_seglen that exceed
eth0's ip_skb_dst_mtu.

These skbs are accidentally passed to ip_finish_output2 AS IS.
Alas, each final segment (segmented either by validate_xmit_skb or by
hardware UFO) would be larger than eth0 mtu.
As a result, those above-mtu segments get dropped on certain networks.

This behavior is not aligned with the NON-GSO case:
Assume a non-gso 1500-sized IP packet arrives from tap0. After
encapsulation, the vxlan datagram is fragmented normally at the
ip_finish_output-->ip_fragment code path.

The expected behavior for the GSO case would be segmenting the
"gso-oversized" skb first, then fragmenting each segment according to
dst mtu, and finally passing the resulting fragments to ip_finish_output2.

'ip_finish_output_gso' already supports this "Slowpath" behavior,
according to the IPSKB_FRAG_SEGS flag, which is only set during ipv4
forwarding (not set in the bridged case).

In order to support the bridged case, we'll mark skbs arriving from an
ingress interface that get udp-encaspulated as "allowed to be fragmented",
causing their network_seglen to be validated by 'ip_finish_output_gso'
(and fragment if needed).

Note the TUNNEL_DONT_FRAGMENT tun_flag is still honoured (both in the
gso and non-gso cases), which serves users wishing to forbid
fragmentation at the udp tunnel endpoint.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 16:40:22 -07:00
..
6lowpan 6lowpan: ndisc: set invalid unicast short addr to unspec 2016-07-08 13:23:12 +02:00
9p remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
802
8021q net: introduce default neigh_construct/destroy ndo calls for L2 upper devices 2016-07-05 09:06:28 -07:00
appletalk
atm net: add dev arg to ndo_neigh_construct/destroy 2016-07-05 09:06:28 -07:00
ax25 AX.25: Close socket connection on session completion 2016-06-18 20:55:34 -07:00
batman-adv This feature patchset includes the following changes: 2016-07-04 23:33:59 -07:00
bluetooth Bluetooth: Increment management interface revision 2016-07-13 10:02:52 +02:00
bridge net: bridge: remove _deliver functions and consolidate forward code 2016-07-16 19:57:38 -07:00
caif caif: Remove unneeded header file 2016-06-28 05:26:14 -04:00
can can: only call can_stat_update with procfs 2016-06-23 11:23:49 +02:00
ceph libceph: use %s instead of %pE in dout()s 2016-05-30 23:00:23 +02:00
core bpf: avoid stack copy and use skb ctx for event output 2016-07-15 14:23:56 -07:00
dcb
dccp dccp: do not assume DCCP code is non preemptible 2016-05-02 17:02:25 -04:00
decnet net: fix decnet rtnexthop parsing 2016-07-05 14:08:47 -07:00
dns_resolver KEYS: Add a facility to restrict new links into a keyring 2016-04-11 22:37:37 +01:00
dsa net: dsa: Fix non static symbol warning 2016-07-12 11:34:30 -07:00
ethernet
hsr net/hsr: Use setup_timer and mod_timer. 2016-05-16 14:00:43 -04:00
ieee802154 ieee802154: 6lowpan: fix intra pan id check 2016-07-08 13:23:12 +02:00
ipv4 net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs 2016-07-19 16:40:22 -07:00
ipv6 net: ipmr/ip6mr: add support for keeping an entry age 2016-07-16 20:19:43 -07:00
ipx
irda TTY and Serial driver update for 4.7-rc1 2016-05-20 20:57:27 -07:00
iucv af_iucv: use paged SKBs for big inbound messages 2016-06-15 12:21:05 -07:00
kcm bpf: refactor bpf_prog_get and type check into helper 2016-07-01 16:00:47 -04:00
key
l2tp ipv6: use TOS marks from sockets for routing decision 2016-06-11 15:33:26 -07:00
l3mdev net: vrf: Implement get_saddr for IPv6 2016-06-17 21:25:29 -07:00
lapb net/lapb: tuse %*ph to dump buffers 2016-05-29 22:33:25 -07:00
llc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
mac80211 cfg80211: Add mesh peer AID setting API 2016-07-06 15:04:52 +02:00
mac802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
mpls mpls: allow routes on ipip and sit devices 2016-07-09 17:45:56 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2016-07-06 09:15:15 -07:00
netlabel netlabel: fix a problem with netlbl_secattr_catmap_setrng() 2016-04-05 16:10:47 -04:00
netlink net/netlink/af_netlink.h: Remove unused structure. 2016-06-09 22:26:24 -07:00
netrom
nfc nfc: nci: Add nci_nfcc_loopback to the nci core 2016-05-04 01:48:16 +02:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-07-06 10:35:22 -07:00
phonet
qrtr Merge tag 'qcom-soc-for-4.7-2' into net-next 2016-05-17 14:11:19 -04:00
rds RDS: TCP: Enable multipath RDS for TCP 2016-07-15 11:36:58 -07:00
rfkill rfkill: Use switch to demux userspace operations 2016-04-05 10:48:53 +02:00
rose
rxrpc rxrpc: checking for IS_ERR() instead of NULL 2016-07-15 14:16:25 -07:00
sched hfsc: reduce hfsc_sched to 14 cachelines 2016-07-08 23:08:39 -04:00
sctp sctp: fix GSO for IPv6 2016-07-16 22:02:09 -07:00
sunrpc rpc: share one xps between all backchannels 2016-06-15 10:32:25 -04:00
switchdev net/switchdev: Export the same parent ID service function 2016-07-14 13:34:29 -07:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-07-06 10:35:22 -07:00
unix Merge branch 'overlayfs-af_unix-fix' into overlayfs-linus 2016-06-12 12:05:21 +02:00
vmw_vsock vsock: make listener child lock ordering explicit 2016-06-27 10:44:46 -04:00
wimax
wireless cfg80211: Add mesh peer AID setting API 2016-07-06 15:04:52 +02:00
x25 net: fix a kernel infoleak in x25 module 2016-05-09 22:45:33 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
compat.c packet: compat support for sock_fprog 2016-06-09 23:41:03 -07:00
Kconfig bpf: add generic constant blinding for use in jits 2016-05-16 13:49:32 -04:00
Makefile net: Add Qualcomm IPC router 2016-05-08 23:46:14 -04:00
socket.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
sysctl_net.c