linux/net
Guillaume Nault 891584f48a inet: frags: re-introduce skb coalescing for local delivery
Before commit d4289fcc9b ("net: IP6 defrag: use rbtrees for IPv6
defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
generating many fragments) and running over an IPsec tunnel, reported
more than 6Gbps throughput. After that patch, the same test gets only
9Mbps when receiving on a be2net nic (driver can make a big difference
here, for example, ixgbe doesn't seem to be affected).

By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
(IPv4 fragment coalescing was dropped by commit 14fe22e334 ("Revert
"ipv4: use skb coalescing in defragmentation"")).

Without fragment coalescing, be2net runs out of Rx ring entries and
starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
the netperf traffic is only composed of UDP fragments, any lost packet
prevents reassembly of the full datagram. Therefore, fragments which
have no possibility to ever get reassembled pile up in the reassembly
queue, until the memory accounting exeeds the threshold. At that point
no fragment is accepted anymore, which effectively discards all
netperf traffic.

When reassembly timeout expires, some stale fragments are removed from
the reassembly queue, so a few packets can be received, reassembled
and delivered to the netperf receiver. But the nic still drops frames
and soon the reassembly queue gets filled again with stale fragments.
These long time frames where no datagram can be received explain why
the performance drop is so significant.

Re-introducing fragment coalescing is enough to get the initial
performances again (6.6Gbps with be2net): driver doesn't drop frames
anymore (no more rx_drops_no_frags errors) and the reassembly engine
works at full speed.

This patch is quite conservative and only coalesces skbs for local
IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
forwarding). Coalescing could be extended in the future if need be, as
more scenarios would probably benefit from it.

[0]: Test configuration
Sender:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
netserver -D -L fc00:2::1

Receiver:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 15:55:10 -07:00
..
6lowpan 6lowpan: no need to check return value of debugfs_create functions 2019-07-06 12:50:01 +02:00
9p 9p pull request for inclusion in 5.13 2019-07-12 17:31:19 -07:00
802 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
appletalk treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 372 2019-06-05 17:37:10 +02:00
atm treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ax25 ax25: fix inconsistent lock state in ax25_destroy_timer 2019-06-16 14:22:37 -07:00
batman-adv batman-adv: Fix deletion of RTR(4|6) mcast list entries 2019-07-22 21:34:58 +02:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-07-11 10:55:49 -07:00
bpf treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
bpfilter Kbuild updates for v5.3 2019-07-12 16:03:16 -07:00
bridge net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER 2019-08-05 13:32:53 -07:00
caif treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 194 2019-05-30 11:29:22 -07:00
can can: gw: Fix error path of cgw_module_init 2019-07-24 11:19:03 +02:00
ceph Lots of exciting things this time! 2019-07-18 11:05:25 -07:00
core net: fix bpf_xdp_adjust_head regression for generic-XDP 2019-08-05 11:17:40 -07:00
dcb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201 2019-05-30 11:29:52 -07:00
dccp proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
decnet treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 53 2019-05-24 17:36:42 +02:00
dns_resolver Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" 2019-07-10 18:43:43 -07:00
dsa net: dsa: sja1105: Fix memory leak on meta state machine error path 2019-08-06 14:37:02 -07:00
ethernet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
hsr hsr: switch ->dellink() to ->ndo_uninit() 2019-07-11 14:37:45 -07:00
ieee802154 inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
ife treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
ipv4 inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
ipv6 inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
iucv net/af_iucv: mark expected switch fall-throughs 2019-07-29 10:26:14 -07:00
kcm treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-07-08 19:48:57 -07:00
l2tp compat_ioctl: pppoe: fix PPPOEIOCSFWD handling 2019-07-30 14:42:13 -07:00
l3mdev ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF 2019-06-23 13:24:17 -07:00
lapb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-17 20:20:36 -07:00
llc treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 281 2019-06-05 17:36:36 +02:00
mac80211 Revert "mac80211: set NETIF_F_LLTX when using intermediate tx queues" 2019-07-30 14:52:50 +02:00
mac802154 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
mpls proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
ncsi treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
netfilter netfilter: ipset: Fix rename concurrency with listing 2019-07-29 21:18:07 +02:00
netlabel treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
netlink net: remove empty netlink_tap_exit_net 2019-06-14 19:50:33 -07:00
netrom netrom: hold sock when setting skb->destructor 2019-07-24 15:49:05 -07:00
nfc nfc: fix potential illegal memory access 2019-07-08 12:46:24 -07:00
nsh treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
openvswitch ovs: datapath: hide clang frame-overflow warnings 2019-07-24 15:45:11 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-27 21:06:39 -07:00
phonet treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 336 2019-06-05 17:37:07 +02:00
psample treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
qrtr treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 284 2019-06-05 17:36:37 +02:00
rds net: rds: Fix possible null-pointer dereferences in rds_rdma_cm_event_handler_cmn() 2019-07-27 13:58:12 -07:00
rfkill treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
rose treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
rxrpc rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packet 2019-07-30 15:27:59 +01:00
sched net sched: update vlan action for batched events operations 2019-08-06 14:05:39 -07:00
sctp net: sctp: drop unneeded likely() call around IS_ERR() 2019-07-29 13:57:58 -07:00
smc net/smc: avoid fallback in case of non-blocking connect 2019-08-05 13:24:37 -07:00
strparser Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
sunrpc Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
switchdev treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tipc tipc: compat: allow tipc commands without arguments 2019-08-01 18:14:00 -04:00
tls net/tls: partially revert fix transition through disconnect with close 2019-08-05 13:15:30 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
vmw_vsock hv_sock: Fix hang when a connection is closed 2019-08-02 17:26:27 -07:00
wimax treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 268 2019-06-05 17:30:29 +02:00
wireless {nl,mac}80211: fix interface combinations on crypto controlled devices 2019-07-26 13:50:43 +02:00
x25 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 41 2019-05-24 17:27:12 +02:00
xdp xdp: fix potential deadlock on socket mutex 2019-07-12 15:02:21 +02:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-07-08 19:48:57 -07:00
compat.c uio: make import_iovec()/compat_import_iovec() return bytes on success 2019-05-31 15:30:03 -06:00
Kconfig net: ipv4: move tcp_fastopen server side code to SipHash library 2019-06-17 13:56:26 -07:00
Makefile
socket.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
sysctl_net.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00