linux/net
Eric Dumazet 9b462d02d6 tcp: TCP Small Queues and strange attractors
TCP Small queues tries to keep number of packets in qdisc
as small as possible, and depends on a tasklet to feed following
packets at TX completion time.
Choice of tasklet was driven by latencies requirements.

Then, TCP stack tries to avoid reorders, by locking flows with
outstanding packets in qdisc in a given TX queue.

What can happen is that many flows get attracted by a low performing
TX queue, and cpu servicing TX completion has to feed packets for all of
them, making this cpu 100% busy in softirq mode.

This became particularly visible with latest skb->xmit_more support

Strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the outstanding queue for this flow being drained
before feeding additional packets, so that skb->ooo_okay can be set
to allow select_queue() to select the optimal queue :

Incoming ACKS are normally handled by different cpus, so this patch
gives more chance for these cpus to take over the burden of feeding
qdisc with future packets.

Tested:

lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 &

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:18 AM      eth1 595448.00 1190564.00  38381.09 1760253.12      0.00      0.00      1.00
06:16:19 AM      eth1 594858.00 1189686.00  38340.76 1758952.72      0.00      0.00      0.00
06:16:20 AM      eth1 597017.00 1194019.00  38480.79 1765370.29      0.00      0.00      1.00
06:16:21 AM      eth1 595450.00 1190936.00  38380.19 1760805.05      0.00      0.00      0.00
06:16:22 AM      eth1 596385.00 1193096.00  38442.56 1763976.29      0.00      0.00      1.00
06:16:23 AM      eth1 598155.00 1195978.00  38552.97 1768264.60      0.00      0.00      0.00
06:16:24 AM      eth1 594405.00 1188643.00  38312.57 1757414.89      0.00      0.00      1.00
06:16:25 AM      eth1 593366.00 1187154.00  38252.16 1755195.83      0.00      0.00      0.00
06:16:26 AM      eth1 593188.00 1186118.00  38232.88 1753682.57      0.00      0.00      1.00
06:16:27 AM      eth1 596301.00 1192241.00  38440.94 1762733.09      0.00      0.00      0.00
Average:         eth1 595457.30 1190843.50  38381.69 1760664.84      0.00      0.00      0.50
lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
 backlog 7606336b 2513p requeues 167982
 backlog 224072b 74p requeues 566
 backlog 581376b 192p requeues 5598
 backlog 181680b 60p requeues 1070
 backlog 5305056b 1753p requeues 110166    // Here, this TX queue is attracting flows
 backlog 157456b 52p requeues 1758
 backlog 672216b 222p requeues 3025
 backlog 60560b 20p requeues 24541
 backlog 448144b 148p requeues 21258

lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect

Immediate jump to full bandwidth, and traffic is properly
shard on all tx queues.

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:46 AM      eth1 1397632.00 2795397.00  90081.87 4133031.26      0.00      0.00      1.00
06:16:47 AM      eth1 1396874.00 2793614.00  90032.99 4130385.46      0.00      0.00      0.00
06:16:48 AM      eth1 1395842.00 2791600.00  89966.46 4127409.67      0.00      0.00      1.00
06:16:49 AM      eth1 1395528.00 2791017.00  89946.17 4126551.24      0.00      0.00      0.00
06:16:50 AM      eth1 1397891.00 2795716.00  90098.74 4133497.39      0.00      0.00      1.00
06:16:51 AM      eth1 1394951.00 2789984.00  89908.96 4125022.51      0.00      0.00      0.00
06:16:52 AM      eth1 1394608.00 2789190.00  89886.90 4123851.36      0.00      0.00      1.00
06:16:53 AM      eth1 1395314.00 2790653.00  89934.33 4125983.09      0.00      0.00      0.00
06:16:54 AM      eth1 1396115.00 2792276.00  89984.25 4128411.21      0.00      0.00      1.00
06:16:55 AM      eth1 1396829.00 2793523.00  90030.19 4130250.28      0.00      0.00      0.00
Average:         eth1 1396158.40 2792297.00  89987.09 4128439.35      0.00      0.00      0.50

lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
 backlog 7900052b 2609p requeues 173287
 backlog 878120b 290p requeues 589
 backlog 1068884b 354p requeues 5621
 backlog 996212b 329p requeues 1088
 backlog 984100b 325p requeues 115316
 backlog 956848b 316p requeues 1781
 backlog 1080996b 357p requeues 3047
 backlog 975016b 322p requeues 24571
 backlog 990156b 327p requeues 21274

(All 8 TX queues get a fair share of the traffic)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 17:16:26 -04:00
..
6lowpan 6lowpan: Allow 6LoWPAN to be modular 2014-08-07 11:44:18 -07:00
9p 9P: remove unnecessary break after return 2014-07-15 16:27:00 -07:00
802 net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
8021q net: better IFF_XMIT_DST_RELEASE support 2014-10-07 13:22:11 -04:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-07-16 14:09:34 -07:00
atm net: better IFF_XMIT_DST_RELEASE support 2014-10-07 13:22:11 -04:00
ax25
batman-adv batman-adv: Fix parameter order of hlist_add_behind 2014-08-16 19:19:08 -07:00
bluetooth Bluetooth: 6lowpan: Check transmit errors for multicast packets 2014-10-02 13:41:57 +03:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-10-08 16:22:22 -04:00
caif caif_usb: use target structure member in memset 2014-10-14 16:05:45 -04:00
can
ceph libceph: do not hard code max auth ticket len 2014-09-10 20:08:36 +04:00
core net: Trap attempts to call sock_kfree_s() with a NULL pointer. 2014-10-14 17:02:37 -04:00
dcb dcbnl : Fix misleading dcb_app->priority explanation 2014-07-30 17:21:05 -07:00
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-10-08 21:40:54 -04:00
decnet af_decnet: Use time_after_eq 2014-08-22 12:23:11 -07:00
dns_resolver Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2014-08-06 08:06:39 -07:00
dsa net: dsa: do not call phy_start_aneg 2014-10-04 20:44:44 -04:00
ethernet net: Add function for parsing the header length out of linear ethernet frames 2014-09-05 17:47:02 -07:00
hsr net/hsr: Remove left-over never-true conditional code. 2014-07-11 15:04:40 -07:00
ieee802154 Merge tag 'master-2014-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2014-10-05 21:34:39 -04:00
ipv4 tcp: TCP Small Queues and strange attractors 2014-10-14 17:16:26 -04:00
ipv6 ipv6: remove aca_lock spinlock from struct ifacaddr6 2014-10-14 13:15:15 -04:00
ipx
irda irda: add __init to irlan_open 2014-09-30 17:08:06 -04:00
iucv iucv: Convert pr_warning to pr_warn 2014-09-10 12:40:10 -07:00
key af_key: remove unnecessary break after return 2014-07-15 16:27:00 -07:00
l2tp l2tp: Refactor l2tp core driver to make use of the common UDP tunnel functions 2014-09-19 15:57:15 -04:00
lapb
llc net_dma: simple removal 2014-09-28 07:05:16 -07:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-10-07 14:48:29 -04:00
mac802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-10-08 21:40:54 -04:00
mpls net: Remove gso_send_check as an offload callback 2014-09-26 00:22:47 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2014-10-10 15:01:09 -04:00
netlabel netlabel: kernel-doc warning fix 2014-10-09 01:40:05 -04:00
netlink netlink: Annotate RCU locking for seq_file walker 2014-08-14 15:13:40 -07:00
netrom net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
nfc NFC: nci: Add support for proprietary RF Protocols 2014-09-24 02:02:24 +02:00
openvswitch openvswitch: fix a sparse warning 2014-10-07 00:10:48 -04:00
packet net: Pass a "more" indication down into netdev_start_xmit() code paths. 2014-09-01 17:39:55 -07:00
phonet net: fix rcu access on phonet_routes 2014-10-06 18:16:30 -04:00
rds rds: avoid calling sock_kfree_s() on allocation failure 2014-10-14 17:00:19 -04:00
rfkill net: rfkill: kernel-doc warning fixes 2014-10-08 15:24:15 -04:00
rose rose: use %*ph specifier 2014-09-07 16:07:25 -07:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
sched net_sched: restore qdisc quota fairness limits after bulk dequeue 2014-10-09 19:12:26 -04:00
sctp net: sctp: fix remote memory pressure from excessive queueing 2014-10-14 12:46:22 -04:00
sunrpc Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux 2014-10-08 12:51:44 -04:00
tipc tipc: fix bug in multicast congestion handling 2014-10-07 14:50:15 -04:00
unix af_unix: remove 0 assignment on static 2014-10-07 17:03:14 -04:00
vmw_vsock
wimax wimax: convert printk to pr_foo() 2014-10-07 20:28:44 -04:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-10-07 14:48:29 -04:00
x25
xfrm net: cleanup and document skb fclone layout 2014-10-01 16:34:25 -04:00
compat.c net: sendmsg: fix NULL pointer dereference 2014-07-29 12:20:22 -07:00
Kconfig net: bpf: fix bpf syscall dependence on anon_inodes 2014-10-10 15:02:23 -04:00
Makefile 6lowpan: introduce new net/6lowpan directory 2014-07-12 01:53:30 +02:00
nonet.c
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
sysctl_net.c