linux/net
Stefano Brivio 4cb47a8644 tunnels: PMTU discovery support for directly bridged IP packets
It's currently possible to bridge Ethernet tunnels carrying IP
packets directly to external interfaces without assigning them
addresses and routes on the bridged network itself: this is the case
for UDP tunnels bridged with a standard bridge or by Open vSwitch.

PMTU discovery is currently broken with those configurations, because
the encapsulation effectively decreases the MTU of the link, and
while we are able to account for this using PMTU discovery on the
lower layer, we don't have a way to relay ICMP or ICMPv6 messages
needed by the sender, because we don't have valid routes to it.

On the other hand, as a tunnel endpoint, we can't fragment packets
as a general approach: this is for instance clearly forbidden for
VXLAN by RFC 7348, section 4.3:

   VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
   fragment encapsulated VXLAN packets due to the larger frame size.
   The destination VTEP MAY silently discard such VXLAN fragments.

The same paragraph recommends that the MTU over the physical network
accomodates for encapsulations, but this isn't a practical option for
complex topologies, especially for typical Open vSwitch use cases.

Further, it states that:

   Other techniques like Path MTU discovery (see [RFC1191] and
   [RFC1981]) MAY be used to address this requirement as well.

Now, PMTU discovery already works for routed interfaces, we get
route exceptions created by the encapsulation device as they receive
ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
we already rebuild those messages with the appropriate MTU and route
them back to the sender.

Add the missing bits for bridged cases:

- checks in skb_tunnel_check_pmtu() to understand if it's appropriate
  to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
  RFC 4443 section 2.4 for ICMPv6. This function is already called by
  UDP tunnels

- a new function generating those ICMP or ICMPv6 replies. We can't
  reuse icmp_send() and icmp6_send() as we don't see the sender as a
  valid destination. This doesn't need to be generic, as we don't
  cover any other type of ICMP errors given that we only provide an
  encapsulation function to the sender

While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
we might receive GSO buffers here, and the passed headroom already
includes the inner MAC length, so we don't have to account for it
a second time (that would imply three MAC headers on the wire, but
there are just two).

This issue became visible while bridging IPv6 packets with 4500 bytes
of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
bytes of encapsulation headroom, we would advertise MTU as 3950, and
we would reject fragmented IPv6 datagrams of 3958 bytes size on the
wire. We're exclusively dealing with network MTU here, though, so we
could get Ethernet frames up to 3964 octets in that case.

v2:
- moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
- split IPv4/IPv6 functions (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04 13:01:45 -07:00
..
6lowpan treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
9p Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
802
8021q net: get rid of lockdep_set_class_and_subclass() 2020-06-28 21:37:23 -07:00
appletalk appletalk: Fix atalk_proc_init() return path 2020-08-03 15:48:32 -07:00
atm net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
ax25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-25 17:49:04 -07:00
batman-adv batman-adv: Introduce a configurable per interface hop penalty 2020-06-26 10:37:11 +02:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
bpf bpf: Allow to specify ifindex for skb in bpf_prog_test_run_skb 2020-08-03 23:32:23 +02:00
bpfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2020-08-03 16:03:18 -07:00
caif net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
can net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
ceph libceph: don't omit used_replica in target_copy() 2020-06-16 16:02:08 +02:00
core Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-08-03 18:27:40 -07:00
dcb dcb_doit: remove redundant skb check 2020-06-23 20:27:09 -07:00
dccp net: remove sockptr_advance 2020-07-28 13:43:40 -07:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2020-08-03 16:03:18 -07:00
dns_resolver
dsa net: dsa: stop overriding master's ndo_get_phys_port_name 2020-07-23 15:14:58 -07:00
ethernet net: move devres helpers into a separate source file 2020-05-23 16:56:17 -07:00
ethtool mlx5-updates-2020-08-03 2020-08-03 18:24:30 -07:00
hsr hsr: Use %pM format specifier for MAC addresses 2020-07-31 16:46:26 -07:00
ieee802154 net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
ife
ipv4 tunnels: PMTU discovery support for directly bridged IP packets 2020-08-04 13:01:45 -07:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-08-03 18:27:40 -07:00
iucv net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
kcm net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
l2tp l2tp: improve API documentation in l2tp_core.h 2020-07-30 16:45:31 -07:00
l3mdev l3mdev: add infrastructure for table to VRF mapping 2020-06-20 17:22:22 -07:00
lapb treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
llc net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
mac80211 mac80211: Do not report beacon loss if beacon filtering enabled 2020-08-03 13:02:06 +02:00
mac802154 treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
mpls net: Removed the device type check to add mpls support for devices 2020-07-27 11:40:47 -07:00
mptcp mptcp: fix bogus sendmsg() return code under pressure 2020-08-03 18:14:20 -07:00
ncsi net/ncsi: use eth_zero_addr() to clear mac address 2020-07-23 11:49:41 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2020-08-03 16:03:18 -07:00
netlabel net: netlabel: kerneldoc fixes 2020-07-13 17:20:40 -07:00
netlink bpf: Refactor bpf_iter_reg to have separate seq_info member 2020-07-25 20:16:32 -07:00
netrom net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-25 17:49:04 -07:00
nsh treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
openvswitch net: openvswitch: make masks cache size configurable 2020-08-03 15:17:48 -07:00
packet net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
phonet net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
psample net: psample: fix build error when CONFIG_INET is not enabled 2020-05-23 16:36:05 -07:00
qrtr Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-25 17:49:04 -07:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
rfkill
rose net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
sched net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct 2020-08-03 15:04:48 -07:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-25 17:49:04 -07:00
smc net/smc: unique reason code for exceeded max dmb count 2020-07-27 10:30:01 -07:00
strparser
sunrpc xprtrdma: fix incorrect header size calculations 2020-07-15 13:01:01 -04:00
switchdev net: switchdev: kerneldoc fixes 2020-07-13 17:20:40 -07:00
tipc tipc: Use is_broadcast_ether_addr() instead of memcmp() 2020-08-03 16:21:46 -07:00
tls net: remove sockptr_advance 2020-07-28 13:43:40 -07:00
unix net: make ->{get,set}sockopt in proto_ops optional 2020-07-19 18:16:41 -07:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-25 17:49:04 -07:00
wimax
wireless nl80211: use eth_zero_addr() to clear mac address 2020-08-03 10:56:22 +02:00
x25 net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
xdp xdp: Prevent kernel-infoleak in xsk_getsockopt() 2020-07-28 12:50:15 +02:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
compat.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
devres.c net: devres: rename the release callback of devm_register_netdev() 2020-06-30 15:57:34 -07:00
Kconfig net: ethtool: Remove PHYLIB direct dependency 2020-07-07 15:41:05 -07:00
Makefile net: move devres helpers into a separate source file 2020-05-23 16:56:17 -07:00
socket.c net: improve the user pointer check in init_user_sockptr 2020-07-28 13:43:40 -07:00
sysctl_net.c