linux/net/ipv4
Paolo Abeni cfde141ea3 mptcp: move option parsing into mptcp_incoming_options()
The mptcp_options_received structure carries several per
packet flags (mp_capable, mp_join, etc.). Such fields must
be cleared on each packet, even on dropped ones or packet
not carrying any MPTCP options, but the current mptcp
code clears them only on TCP option reset.

On several races/corner cases we end-up with stray bits in
incoming options, leading to WARN_ON splats. e.g.:

[  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
[  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
[  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
[  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
[  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
[  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
[  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
[  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
[  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
[  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
[  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
[  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
[  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
[  171.232586] Call Trace:
[  171.233109]  <IRQ>
[  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
[  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
[  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
[  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
[  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
[  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
[  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
[  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
[  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
[  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
[  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
[  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
[  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
[  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
[  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
[  171.282358]  </IRQ>

We could address the issue clearing explicitly the relevant fields
in several places - tcp_parse_option, tcp_fast_parse_options,
possibly others.

Instead we move the MPTCP option parsing into the already existing
mptcp ingress hook, so that we need to clear the fields in a single
place.

This allows us dropping an MPTCP hook from the TCP code and
removing the quite large mptcp_options_received from the tcp_sock
struct. On the flip side, the MPTCP sockets will traverse the
option space twice (in tcp_parse_option() and in
mptcp_incoming_options(). That looks acceptable: we already
do that for syn and 3rd ack packets, plain TCP socket will
benefit from it, and even MPTCP sockets will experience better
code locality, reducing the jumps between TCP and MPTCP code.

v1 -> v2:
 - rebased on current '-net' tree

Fixes: 648ef4b886 ("mptcp: Implement MPTCP receive path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:23:22 -07:00
..
bpfilter
netfilter netfilter: Replace zero-length array with flexible-array member 2020-03-15 15:20:16 +01:00
af_inet.c mptcp: add and use MIB counter infrastructure 2020-03-29 22:14:49 -07:00
ah4.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
arp.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
bpf_tcp_ca.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-03-30 19:52:37 -07:00
cipso_ipv4.c ipv4: ensure rcu_read_lock() in cipso_v4_error() 2020-02-22 21:45:55 -08:00
datagram.c inet: stop leaking jiffies on the wire 2019-11-01 14:57:52 -07:00
devinet.c net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin 2020-04-09 10:27:23 -07:00
esp4_offload.c esp4: add gso_segment for esp4 beet mode 2020-03-26 14:51:07 +01:00
esp4.c ESP: Export esp_output_fill_trailer function 2020-02-19 13:52:32 +01:00
fib_frontend.c ipv4: fix a RCU-list lock in inet_dump_fib() 2020-03-23 20:59:40 -07:00
fib_lookup.h net: add net available in build_state 2020-03-29 22:30:57 -07:00
fib_notifier.c
fib_rules.c
fib_semantics.c ipv4: Update fib_select_default to handle nexthop objects 2020-04-22 19:57:39 -07:00
fib_trie.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-30 20:48:43 -07:00
fou.c fou: Fix IPv6 netlink policy 2020-01-23 14:32:52 +01:00
gre_demux.c gre: fix uninit-value in __iptunnel_pull_header 2020-03-08 21:25:37 -07:00
gre_offload.c net: remove the check argument from __skb_gro_checksum_convert 2020-01-03 12:24:34 -08:00
icmp.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
igmp.c igmp: remove unused macro IGMP_Vx_UNSOLICITED_REPORT_INTERVAL 2020-02-23 21:15:36 -08:00
inet_connection_sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
inet_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
inet_fragment.c
inet_hashtables.c tcp/dccp: fix possible race __inet_lookup_established() 2019-12-13 21:40:49 -08:00
inet_timewait_sock.c
inetpeer.c inetpeer: fix data-race in inet_putpeer / inet_putpeer 2019-11-07 16:15:56 -08:00
ip_forward.c
ip_fragment.c
ip_gre.c net: ip_gre: Accept IFLA_INFO_DATA-less configuration 2020-03-16 17:19:56 -07:00
ip_input.c bpf: Add socket assign support 2020-03-30 13:45:04 -07:00
ip_options.c
ip_output.c net: Fix typo of SKB_SGO_CB_OFFSET 2020-03-29 21:53:18 -07:00
ip_sockglue.c
ip_tunnel_core.c net: add net available in build_state 2020-03-29 22:30:57 -07:00
ip_tunnel.c net, ip_tunnel: fix interface lookup with no key 2020-03-29 22:16:27 -07:00
ip_vti.c vti[6]: fix packet tx through bpf_redirect() in XinY cases 2020-02-06 13:27:30 +01:00
ipcomp.c
ipconfig.c Documentation: nfsroot.rst: Fix references to nfsroot.rst 2020-03-02 13:11:46 -07:00
ipip.c
ipmr_base.c
ipmr.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
Kconfig This has been a busy cycle for documentation work. Highlights include: 2020-03-30 12:45:23 -07:00
Makefile bpf: Add sockmap hooks for UDP sockets 2020-03-09 22:34:58 +01:00
metrics.c
netfilter.c
netlink.c
nexthop.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
ping.c
proc.c mptcp: add and use MIB counter infrastructure 2020-03-29 22:14:49 -07:00
protocol.c
raw_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
raw.c raw: Add missing annotations to raw_seq_start() and raw_seq_stop() 2020-03-11 23:19:40 -07:00
route.c Remove DST_HOST 2020-03-23 21:57:44 -07:00
syncookies.c mptcp: handle tcp fallback when using syn cookies 2020-01-29 17:45:20 +01:00
sysctl_net_ipv4.c tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted. 2020-03-12 12:08:09 -07:00
tcp_bbr.c tcp_bbr: improve arithmetic division in bbr_update_bw() 2020-01-21 10:45:49 +01:00
tcp_bic.c tcp: fix stretch ACK bugs in BIC 2020-03-16 18:26:54 -07:00
tcp_bpf.c bpf, tcp: Make tcp_bpf_recvmsg static 2020-03-20 15:56:55 +01:00
tcp_cdg.c
tcp_cong.c bpf: tcp: Support tcp_congestion_ops in bpf 2020-01-09 08:46:18 -08:00
tcp_cubic.c tcp_cubic: refactor code to perform a divide only when needed 2019-12-30 14:44:27 -08:00
tcp_dctcp.c
tcp_dctcp.h
tcp_diag.c inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data 2020-02-27 18:50:19 -08:00
tcp_fastopen.c tcp: add TCP_INFO status for failed client TFO 2019-10-25 19:25:37 -07:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c mptcp: move option parsing into mptcp_incoming_options() 2020-04-30 12:23:22 -07:00
tcp_ipv4.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
tcp_lp.c
tcp_metrics.c net-tcp: Disable TCP ssthresh metrics cache by default 2019-12-09 20:17:48 -08:00
tcp_minisocks.c mptcp: Add handling of incoming MP_JOIN requests 2020-03-29 22:14:48 -07:00
tcp_nv.c
tcp_offload.c
tcp_output.c tcp: also NULL skb->dev when copy was needed 2020-03-20 19:36:38 -07:00
tcp_rate.c
tcp_recovery.c
tcp_scalable.c tcp: fix stretch ACK bugs in Scalable 2020-03-16 18:26:54 -07:00
tcp_timer.c tcp: export count for rehash attempts 2020-01-26 15:28:47 +01:00
tcp_ulp.c bpf: sockmap: Only check ULP for TCP sockets 2020-03-09 22:34:58 +01:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c tcp: fix stretch ACK bugs in Veno 2020-03-16 18:26:55 -07:00
tcp_westwood.c
tcp_yeah.c tcp: fix stretch ACK bugs in Yeah 2020-03-16 18:26:55 -07:00
tcp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-25 18:58:11 -07:00
tunnel4.c
udp_bpf.c bpf: Add sockmap hooks for UDP sockets 2020-03-09 22:34:58 +01:00
udp_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
udp_impl.h
udp_offload.c udp: initialize is_flist with 0 in udp_gro_receive 2020-03-30 10:35:03 -07:00
udp_tunnel.c
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-03-30 19:52:37 -07:00
udplite.c
xfrm4_input.c
xfrm4_output.c xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish 2020-04-22 12:32:11 -07:00
xfrm4_policy.c net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
xfrm4_protocol.c xfrm: add route lookup to xfrm4_rcv_encap 2019-12-09 09:59:07 +01:00
xfrm4_state.c
xfrm4_tunnel.c