linux/Documentation
Eric Dumazet 65466904b0 tcp: adjust TSO packet sizes based on min_rtt
Back when tcp_tso_autosize() and TCP pacing were introduced,
our focus was really to reduce burst sizes for long-distance
flows.

The simple heuristic of using sk_pacing_rate/1024 has worked
well, but it can lead to packets that are too small for hosts in
the same rack/cluster when thousands of flows compete for the
bottleneck.

Neal Cardwell had the idea of making the TSO burst size
a function of both sk_pacing_rate and tcp_min_rtt().

Indeed, for local flows, sending bigger bursts is better
to reduce CPU costs, as occasional losses can be repaired
quite fast given the small RTT.

This patch is based on Neal Cardwell's implementation,
done more than two years ago.

BBR adjusts max_pacing_rate based on measured bandwidth,
while CUBIC would overestimate max_pacing_rate.

/proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable
this new feature, in logarithmic steps.

Tested:

100Gbit NIC, two hosts in the same rack, 4K MTU.
600 flows rate-limited to 20000000 bytes per second.

Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO)

~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96005

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         65,945.29 msec task-clock                #    2.845 CPUs utilized
         1,314,632      context-switches          # 19935.279 M/sec
             5,292      cpu-migrations            #   80.249 M/sec
           940,641      page-faults               # 14264.023 M/sec
   201,117,030,926      cycles                    # 3049769.216 GHz                   (83.45%)
    17,699,435,405      stalled-cycles-frontend   #    8.80% frontend cycles idle     (83.48%)
   136,584,015,071      stalled-cycles-backend    #   67.91% backend cycles idle      (83.44%)
    53,809,530,436      instructions              #    0.27  insn per cycle
                                                  #    2.54  stalled cycles per insn  (83.36%)
     9,062,315,523      branches                  # 137422329.563 M/sec               (83.22%)
       153,008,621      branch-misses             #    1.69% of all branches          (83.32%)

      23.182970846 seconds time elapsed

TcpInSegs                       15648792           0.0
TcpOutSegs                      58659110           0.0  # Average of 3.7 4K segments per TSO packet
TcpExtTCPDelivered              58654791           0.0
TcpExtTCPDeliveredCE            19                 0.0

After patch:

~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96046

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         48,982.58 msec task-clock                #    2.104 CPUs utilized
           186,014      context-switches          # 3797.599 M/sec
             3,109      cpu-migrations            #   63.472 M/sec
           941,180      page-faults               # 19214.814 M/sec
   153,459,763,868      cycles                    # 3132982.807 GHz                   (83.56%)
    12,069,861,356      stalled-cycles-frontend   #    7.87% frontend cycles idle     (83.32%)
   120,485,917,953      stalled-cycles-backend    #   78.51% backend cycles idle      (83.24%)
    36,803,672,106      instructions              #    0.24  insn per cycle
                                                  #    3.27  stalled cycles per insn  (83.18%)
     5,947,266,275      branches                  # 121417383.427 M/sec               (83.64%)
        87,984,616      branch-misses             #    1.48% of all branches          (83.43%)

      23.281200256 seconds time elapsed

TcpInSegs                       1434706            0.0
TcpOutSegs                      58883378           0.0  # Average of 41 4K segments per TSO packet
TcpExtTCPDelivered              58878971           0.0
TcpExtTCPDeliveredCE            9664               0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:05:44 -08:00
ABI docs: ABI: Document new timecard sysfs nodes. 2022-03-03 14:42:46 +00:00
accounting - A bunch of fixes: forced idle time accounting, utilization values 2022-01-23 17:35:27 +02:00
admin-guide Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-03 17:36:16 -08:00
arc docs: ARC: Improve readability 2021-12-10 14:28:01 -07:00
arm Documentation: arm: marvell: Extend Avanta list 2022-01-27 11:22:34 -07:00
arm64 KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata 2022-02-03 09:22:30 +00:00
block docs: block: remove queue-sysfs.rst 2022-01-09 18:59:10 -07:00
bpf bpf, docs: Add a missing colon in verifier.rst 2022-02-28 18:20:35 +01:00
cdrom
core-api swiotlb: fix info leak with DMA_FROM_DEVICE 2022-02-14 10:22:28 +01:00
cpu-freq cpufreq: Reintroduce ready() callback 2022-02-09 13:18:49 +05:30
crypto
dev-tools linux-kselftest-kunit-fixes-5.17-rc4 2022-02-10 15:39:59 -08:00
devicetree dt-bindings: net: dsa: add rtl8_4 and rtl8_4t tag formats 2022-03-05 11:04:25 +00:00
doc-guide docs: discourage use of list tables 2022-01-07 09:33:13 -07:00
driver-api Three small documentation fixes. 2022-01-22 09:02:57 +02:00
fault-injection
fb
features ARM: 9158/1: leave it to core code to manage thread_info::cpu 2021-12-17 11:34:31 +00:00
filesystems netfs, cachefiles: Add a method to query presence of data in the cache 2022-02-01 10:29:18 -06:00
firmware_class
firmware-guide Device properties framework updates for 5.17-rc1 2022-01-10 20:48:19 -08:00
fpga
gpu Revert "fbcon: Disable accelerated scrolling" 2022-02-02 15:15:11 +01:00
hid
hwmon hwmon/pmbus: (ir38064) Add support for IR38060, IR38164 IR38263 2021-12-26 15:02:07 -08:00
i2c Docs: Fixes link to I2C specification 2021-12-31 14:39:28 +01:00
ia64
ide
iio
infiniband
input
isdn
kbuild doc: kbuild: fix default in imply table 2022-01-08 18:28:21 +09:00
kernel-hacking docs: fix typo in Documentation/kernel-hacking/locking.rst 2022-01-27 11:22:33 -07:00
leds
litmus-tests
livepatch Documentation: livepatch: Add livepatch API page 2021-12-23 11:35:53 +01:00
locking Documentation/locking/locktypes: Update migrate_disable() bits. 2021-11-30 15:40:31 +01:00
m68k
maintainer
mhi
mips
misc-devices
netlabel
networking tcp: adjust TSO packet sizes based on min_rtt 2022-03-09 20:05:44 -08:00
nios2
nvdimm
openrisc
parisc
PCI
pcmcia
power Merge branches 'pm-opp', 'pm-devfreq' and 'powercap' 2022-01-10 18:00:31 +01:00
powerpc
process Kbuild updates for v5.17 2022-01-19 11:15:19 +02:00
RCU Merge branches 'doc.2021.11.30c', 'exp.2021.12.07a', 'fastnohz.2021.11.30c', 'fixes.2021.11.30c', 'nocb.2021.12.09a', 'nolibc.2021.11.30c', 'tasks.2021.12.09a', 'torture.2021.12.07a' and 'torturescript.2021.11.30c' into HEAD 2021-12-09 11:38:09 -08:00
riscv riscv: Move KASAN mapping next to the kernel mapping 2022-01-19 17:54:04 -08:00
s390
scheduler docs/scheduler: fix typo and warning in sched-bwc 2021-12-06 12:15:49 -07:00
scsi
security docs: update self-protection __ro_after_init status 2021-12-10 14:02:06 -07:00
sh
sound ALSA: hda/realtek: Add new alc285-hp-amp-init model 2021-12-14 10:44:26 +01:00
sparc
sphinx docs: automarkup.py: Fix invalid HTML link output and broken URI fragments 2022-01-07 09:32:58 -07:00
sphinx-static docs: add support for RTD dark mode 2021-12-10 14:05:55 -07:00
spi spi: pxa2xx: Get rid of unused enable_loopback member 2021-11-29 12:20:00 +00:00
staging Three small documentation fixes. 2022-01-22 09:02:57 +02:00
target
timers rcu: Remove the RCU_FAST_NO_HZ Kconfig option 2021-11-30 17:24:47 -08:00
tools Tracing fixes for 5.17: 2022-02-26 12:10:17 -08:00
trace Three small documentation fixes. 2022-01-22 09:02:57 +02:00
translations cpufreq: Reintroduce ready() callback 2022-02-09 13:18:49 +05:30
tty Documentation: add TTY chapter 2021-11-26 16:27:43 +01:00
usb docs: ABI: fixed req_number desc in UAC1 2021-12-30 12:10:44 +01:00
userspace-api xen: update missing ioctl magic numers documentation 2022-02-03 08:24:34 +01:00
virt Merge branch 'kvm-ppc-cap-210' into kvm-master 2022-02-22 09:07:16 -05:00
vm docs/vm: Fix typo in *harden* 2022-01-27 11:22:34 -07:00
w1
watchdog
x86 x86/sgx: Fix minor documentation issues 2021-11-17 06:36:09 -08:00
xtensa
.gitignore
arch.rst docs: Add documentation for ARC processors 2021-11-29 14:53:11 -07:00
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py docs: add support for RTD dark mode 2021-12-10 14:05:55 -07:00
COPYING-logo
docutils.conf
dontdiff
index.rst docs: Hook the RTLA documents into the kernel docs build 2022-01-27 11:20:39 -07:00
Kconfig
logo.gif
Makefile docs: address some text issues with css/theme support 2021-12-16 15:54:12 -07:00
memory-barriers.txt asm-generic: introduce io_stop_wc() and add implementation for ARM64 2021-12-22 10:44:53 +00:00
SubmittingPatches
watch_queue.rst