linux/net/ipv6
Jakub Sitnicki 00bc0ef588 ipv6: Skip XFRM lookup if dst_entry in socket cache is valid
At present we perform an xfrm_lookup() for each UDPv6 message we
send. The lookup involves querying the flow cache (flow_cache_lookup)
and, in case of a cache miss, creating an XFRM bundle.

If we miss the flow cache, we can end up creating a new bundle and
deriving the path MTU (xfrm_init_pmtu) from on an already transformed
dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
to xfrm_lookup(). This can happen only if we're caching the dst_entry
in the socket, that is when we're using a connected UDP socket.

To put it another way, the path MTU shrinks each time we miss the flow
cache, which later on leads to incorrectly fragmented payload. It can
be observed with ESPv6 in transport mode:

  1) Set up a transformation and lower the MTU to trigger fragmentation
    # ip xfrm policy add dir out src ::1 dst ::1 \
      tmpl src ::1 dst ::1 proto esp spi 1
    # ip xfrm state add src ::1 dst ::1 \
      proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
    # ip link set dev lo mtu 1500

  2) Monitor the packet flow and set up an UDP sink
    # tcpdump -ni lo -ttt &
    # socat udp6-listen:12345,fork /dev/null &

  3) Send a datagram that needs fragmentation with a connected socket
    # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345
    2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
    00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
    00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
    00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
    (^ ICMPv6 Parameter Problem)
    00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136

  4) Compare it to a non-connected socket
    # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
    00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
    00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)

What happens in step (3) is:

  1) when connecting the socket in __ip6_datagram_connect(), we
     perform an XFRM lookup, miss the flow cache, create an XFRM
     bundle, and cache the destination,

  2) afterwards, when sending the datagram, we perform an XFRM lookup,
     again, miss the flow cache (due to mismatch of flowi6_iif and
     flowi6_oif, which is an issue of its own), and recreate an XFRM
     bundle based on the cached (and already transformed) destination.

To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
altogether whenever we already have a destination entry cached in the
socket. This prevents the path MTU shrinkage and brings us on par with
UDPv4.

The fix also benefits connected PINGv6 sockets, another user of
ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
twice.

Joint work with Hannes Frederic Sowa.

Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-08 11:16:06 -07:00
..
ila ila: ipv6/ila: fix nlsize calculation for lwtunnel 2016-05-10 16:00:25 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2016-06-01 17:54:19 -07:00
addrconf_core.c ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument 2015-07-31 15:21:30 -07:00
addrconf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-27 15:43:10 -04:00
addrlabel.c ipv6/addrlabel: fix ip6addrlbl_get() 2015-12-22 15:57:54 -05:00
af_inet6.c udp: Add GRO functions to UDP socket 2016-04-07 16:53:29 -04:00
ah6.c ah6: fix error return code 2015-08-25 13:37:31 -07:00
anycast.c ipv6: coding style: comparison for equality with NULL 2015-03-31 13:51:54 -04:00
datagram.c sock: propagate __sock_cmsg_send() error 2016-05-16 13:46:23 -04:00
esp6.c esp6: Switch to new AEAD interface 2015-05-28 11:23:20 +08:00
exthdrs_core.c ipv6: re-enable fragment header matching in ipv6_find_hdr 2016-03-03 16:35:20 -05:00
exthdrs_offload.c ipv6: fix exthdrs offload registration in out_rt path 2015-09-02 15:31:00 -07:00
exthdrs.c ipv6: rename IP6_INC_STATS_BH() 2016-04-27 22:48:24 -04:00
fib6_rules.c ipv6: fix the incorrect return value of throw route 2015-10-23 02:38:18 -07:00
fou6.c fou: add Kconfig options for IPv6 support 2016-05-29 22:24:21 -07:00
icmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
inet6_connection_sock.c soreuseport: fast reuseport TCP socket selection 2016-02-11 03:54:15 -05:00
inet6_hashtables.c net: rename NET_{ADD|INC}_STATS_BH() 2016-04-27 22:48:24 -04:00
ip6_checksum.c ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short 2016-03-13 23:55:13 -04:00
ip6_fib.c net: vrf: Create FIB tables on link create 2016-05-06 15:51:47 -04:00
ip6_flowlabel.c ipv6: add new struct ipcm6_cookie 2016-05-03 16:08:14 -04:00
ip6_gre.c ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit path. 2016-05-24 14:33:48 -07:00
ip6_icmp.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
ip6_input.c ipv6: Change "final" protocol processing for encapsulation 2016-05-20 18:03:16 -04:00
ip6_offload.c ip4ip6: Support for GSO/GRO 2016-05-20 18:03:17 -04:00
ip6_offload.h udp: Add GRO functions to UDP socket 2016-04-07 16:53:29 -04:00
ip6_output.c ipv6: Skip XFRM lookup if dst_entry in socket cache is valid 2016-06-08 11:16:06 -07:00
ip6_tunnel.c ipv6: Don't reset inner headers in ip6_tnl_xmit 2016-05-20 18:03:17 -04:00
ip6_udp_tunnel.c ip_tunnel: add support for setting flow label via collect metadata 2016-03-11 15:14:26 -05:00
ip6_vti.c net: replace dst_cache ip6_tunnel implementation with the generic one 2016-02-16 20:21:48 -05:00
ip6mr.c ipv6: rename IP6_INC_STATS_BH() 2016-04-27 22:48:24 -04:00
ipcomp6.c ipv6: White-space cleansing : Structure layouts 2014-08-24 22:37:52 -07:00
ipv6_sockglue.c ipv6: add new struct ipcm6_cookie 2016-05-03 16:08:14 -04:00
Kconfig fou: fix IPv6 Kconfig options 2016-05-31 14:07:49 -07:00
Makefile fou: fix IPv6 Kconfig options 2016-05-31 14:07:49 -07:00
mcast_snoop.c net: fix wrong skb_get() usage / crash in IGMP/MLD parsing code 2015-08-13 17:08:39 -07:00
mcast.c mld, igmp: Fix reserved tailroom calculation 2016-03-03 15:41:07 -05:00
mip6.c ipv6: use ktime_t for internal timestamps 2015-10-05 03:16:47 -07:00
ndisc.c ipv6: add option to drop unsolicited neighbor advertisements 2016-02-11 04:27:36 -05:00
netfilter.c ipv6: Pass struct net into ip6_route_me_harder 2015-09-29 20:21:32 +02:00
output_core.c ipv4, ipv6: Pass net into ip_local_out and ip6_local_out 2015-10-08 04:27:02 -07:00
ping.c ipv6: add new struct ipcm6_cookie 2016-05-03 16:08:14 -04:00
proc.c udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts 2014-11-07 15:45:50 -05:00
protocol.c net: Export inet_offloads and inet6_offloads 2014-09-19 17:15:31 -04:00
raw.c ipv6: add new struct ipcm6_cookie 2016-05-03 16:08:14 -04:00
reassembly.c ipv6: rename IP6_INC_STATS_BH() 2016-04-27 22:48:24 -04:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-15 13:32:48 -04:00
sit.c net: define gso types for IPx over IPv4 and IPv6 2016-05-20 18:03:15 -04:00
syncookies.c net: rename NET_{ADD|INC}_STATS_BH() 2016-04-27 22:48:24 -04:00
sysctl_net_ipv6.c ipv6: Implement different admin modes for automatic flow labels 2015-07-31 17:07:11 -07:00
tcp_ipv6.c tcp: record TLP and ER timer stats in v6 stats 2016-06-07 17:12:22 -07:00
tcpv6_offload.c tcp: cleanup static functions 2015-02-28 16:56:51 -05:00
tunnel6.c ipv6: fix tunnel error handling 2015-11-03 10:52:13 -05:00
udp_impl.h net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
udp_offload.c gso: Remove arbitrary checks for unsupported GSO 2016-05-20 18:03:15 -04:00
udp.c Possible problem with e6afc8ac ("udp: remove headers from UDP packets before queueing") 2016-06-02 18:29:49 -04:00
udplite.c net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
xfrm6_input.c netfilter: Pass struct net into the netfilter hooks 2015-09-17 17:18:37 -07:00
xfrm6_mode_beet.c xfrm: simplify xfrm_address_t use 2015-03-31 13:58:35 -04:00
xfrm6_mode_ro.c
xfrm6_mode_transport.c
xfrm6_mode_tunnel.c ipv6: update skb->csum when CE mark is propagated 2016-01-15 15:07:23 -05:00
xfrm6_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm6_policy.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2015-12-22 16:26:31 -05:00
xfrm6_protocol.c xfrm6: Properly handle unsupported protocols 2014-05-06 07:08:38 +02:00
xfrm6_state.c ipv6: White-space cleansing : Line Layouts 2014-08-24 22:37:52 -07:00
xfrm6_tunnel.c ipv6: White-space cleansing : gaps between function and symbol export 2014-08-24 22:37:52 -07:00