linux/include/net/netns
Eric Dumazet 133c4c0d37 tcp: defer regular ACK while processing socket backlog
This idea came after a particular workload requested
the quickack attribute set on routes, and a performance
drop was noticed for large bulk transfers.

For high throughput flows, it is best to use one cpu
running the user thread issuing socket system calls,
and a separate cpu to process incoming packets from BH context.
(With TSO/GRO, bottleneck is usually the 'user' cpu)

Problem is the user thread can spend a lot of time while holding
the socket lock, forcing BH handler to queue most of incoming
packets in the socket backlog.

Whenever the user thread releases the socket lock, it must first
process all accumulated packets in the backlog, potentially
adding latency spikes. Due to flood mitigation, having too many
packets in the backlog increases chance of unexpected drops.

Backlog processing unfortunately shifts a fair amount of cpu cycles
from the BH cpu to the 'user' cpu, thus reducing max throughput.

This patch takes advantage of the backlog processing,
and the fact that ACK are mostly cumulative.

The idea is to detect we are in the backlog processing
and defer all eligible ACK into a single one,
sent from tcp_release_cb().

This saves cpu cycles on both sides, and network resources.

Performance of a single TCP flow on a 200Gbit NIC:

- Throughput is increased by 20% (100Gbit -> 120Gbit).
- Number of generated ACK per second shrinks from 240,000 to 40,000.
- Number of backlog drops per second shrinks from 230 to 0.

Benchmark context:
 - Regular netperf TCP_STREAM (no zerocopy)
 - Intel(R) Xeon(R) Platinum 8481C (Saphire Rapids)
 - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)

This feature is guarded by a new sysctl, and enabled by default:
 /proc/sys/net/ipv4/tcp_backlog_ack_defer

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
..
bpf.h bpf: Invert the dependency between bpf-netns.h and netns/bpf.h 2021-12-29 20:03:05 -08:00
can.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
conntrack.h netfilter: ctnetlink: make event listener tracking global 2023-02-22 00:28:47 +01:00
core.h net: make default_rps_mask a per netns attribute 2023-02-20 11:22:54 +00:00
flow_table.h netfilter: nf_flow_table: count pending offload workqueue tasks 2022-07-11 16:25:14 +02:00
generic.h netns: Replace zero-length array with DECLARE_FLEX_ARRAY() helper 2022-09-28 18:51:47 -07:00
hash.h netns: provide pure entropy for net_hash_mix() 2019-03-28 17:00:45 -07:00
ieee802154_6lowpan.h net: dynamically allocate fqdir structures 2019-05-26 14:08:05 -07:00
ipv4.h tcp: defer regular ACK while processing socket backlog 2023-09-12 19:10:01 +02:00
ipv6.h net/ipv6: convert skip_notify_on_dev_down sysctl to u8 2023-06-02 22:55:43 -07:00
mctp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
mib.h net: reorganize fields in netns_mib 2021-04-02 14:31:44 -07:00
mpls.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
netfilter.h Remove DECnet support from kernel 2022-08-22 14:26:30 +01:00
nexthop.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
nftables.h net: Remove leftover include from nftables.h 2023-08-13 14:55:25 +01:00
packet.h
sctp.h sctp: add dif and sdif check in asoc and ep lookup 2022-11-18 11:42:54 +00:00
smc.h net/smc: Unbind r/w buffer size from clcsock and make them tunable 2022-09-22 12:58:21 +02:00
unix.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
xdp.h net: xsk: Don't include <linux/rculist.h> 2022-12-06 20:04:34 -08:00
xfrm.h xfrm: rework default policy structure 2022-03-18 07:23:12 +01:00