linux/net/core
Eric Dumazet 46d3ceabd8 tcp: TCP Small Queues
This introduce TSQ (TCP Small Queues)

TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.

sk->sk_wmem_alloc not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.

As skb destructor cannot restart xmit itself ( as qdisc lock might be
taken at this point ), we delegate the work to a tasklet. We use one
tasklest per cpu for performance reasons.

If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
  but some drivers call it in their start_xmit() handler.
  These drivers should at least use BQL, or else a single TCP
  session can still fill the whole NIC TX ring, since TSQ will
  have no effect.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 18:12:59 -07:00
..
datagram.c net: skb_free_datagram_locked() doesnt drop all packets 2012-06-27 15:40:57 -07:00
dev_addr_lists.c net: addr_list: add exclusive dev_uc_add and dev_mc_add 2012-04-15 13:06:04 -04:00
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-10 23:56:33 -07:00
drop_monitor.c drop_monitor: dont sleep in atomic context 2012-06-04 11:42:01 -04:00
dst.c net: Kill dst->_neighbour, accessors, and final uses. 2012-07-05 02:42:00 -07:00
ethtool.c ethtool: Make more commands available to unprivileged processes 2012-06-12 18:51:09 -07:00
fib_rules.c ipv4: Elide fib_validate_source() completely when possible. 2012-06-29 01:36:36 -07:00
filter.c net/core: fix kernel-doc warnings 2012-06-08 22:20:58 -07:00
flow_dissector.c net: flow_dissector.c missing include linux/export.h 2012-01-24 16:03:33 -05:00
flow.c net: Add a flow_cache_flush_deferred function 2011-12-21 16:48:08 -05:00
gen_estimator.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
gen_stats.c gen_stats: Stop using NLA_PUT*(). 2012-04-02 04:33:44 -04:00
iovec.c net: get rid of some pointless casts to sockaddr 2012-03-11 19:11:22 -07:00
link_watch.c net: linkwatch: allow vlans to get carrier changes faster 2011-09-15 15:36:34 -04:00
Makefile sock_diag: Move the sock_ code to net/core/ 2011-12-06 13:58:02 -05:00
neighbour.c neigh: Convert over to dst_neigh_lookup_skb(). 2012-07-05 01:12:00 -07:00
net_namespace.c net: core: Use pr_<level> 2012-05-17 05:00:04 -04:00
net-sysfs.c wireless: remove wext sysfs 2012-06-05 15:32:15 -04:00
net-sysfs.h xps: Add CONFIG_XPS 2010-11-28 18:24:14 -08:00
net-traces.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
netevent.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
netpoll.c netpoll: fix netpoll_send_udp() bugs 2012-06-13 15:57:31 -07:00
netprio_cgroup.c net: cgroup: fix out of bounds accesses 2012-07-09 14:50:54 -07:00
pktgen.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-05-20 21:53:04 -04:00
request_sock.c ipv4:correct description for tcp_max_syn_backlog 2011-12-06 13:02:28 -05:00
rtnetlink.c net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
scm.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
secure_seq.c net: fix some sparse errors 2012-01-17 10:31:12 -05:00
skbuff.c net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
sock_diag.c netlink: add netlink_kernel_cfg parameter to netlink_kernel_create 2012-06-29 16:46:02 -07:00
sock.c tcp: TCP Small Queues 2012-07-11 18:12:59 -07:00
stream.c net: Fix the condition passed to sk_wait_event() 2010-10-03 20:41:32 -07:00
sysctl_net_core.c net: Delete all remaining instances of ctl_path 2012-04-20 21:22:30 -04:00
timestamping.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
user_dma.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
utils.c net: Fixed coding style issues relating to braces. 2012-04-12 16:35:48 -04:00