linux/net/ipv4
Eric Dumazet f54b311142 tcp: auto corking
With the introduction of TCP Small Queues, TSO auto sizing, and TCP
pacing, we can implement Automatic Corking in the kernel, to help
applications doing small write()/sendmsg() to TCP sockets.

Idea is to change tcp_push() to check if the current skb payload is
under skb optimal size (a multiple of MSS bytes)

If under 'size_goal', and at least one packet is still in Qdisc or
NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
will be delayed up to TX completion time.

This delay might allow the application to coalesce more bytes
in the skb in following write()/sendmsg()/sendfile() system calls.

The exact duration of the delay is depending on the dynamics
of the system, and might be zero if no packet for this flow
is actually held in Qdisc or NIC TX ring.

Using FQ/pacing is a way to increase the probability of
autocorking being triggered.

Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
this feature and default it to 1 (enabled)

Add a new SNMP counter : nstat -a | grep TcpExtTCPAutoCorking
This counter is incremented every time we detected skb was under used
and its flush was deferred.

Tested:

Interesting effects when using line buffered commands under ssh.

Excellent performance results in term of cpu usage and total throughput.

lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
9410.39

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

      35209.439626 task-clock                #    2.901 CPUs utilized
             2,294 context-switches          #    0.065 K/sec
               101 CPU-migrations            #    0.003 K/sec
             4,079 page-faults               #    0.116 K/sec
    97,923,241,298 cycles                    #    2.781 GHz                     [83.31%]
    51,832,908,236 stalled-cycles-frontend   #   52.93% frontend cycles idle    [83.30%]
    25,697,986,603 stalled-cycles-backend    #   26.24% backend  cycles idle    [66.70%]
   102,225,978,536 instructions              #    1.04  insns per cycle
                                             #    0.51  stalled cycles per insn [83.38%]
    18,657,696,819 branches                  #  529.906 M/sec                   [83.29%]
        91,679,646 branch-misses             #    0.49% of all branches         [83.40%]

      12.136204899 seconds time elapsed

lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
6624.89

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
      40045.864494 task-clock                #    3.301 CPUs utilized
               171 context-switches          #    0.004 K/sec
                53 CPU-migrations            #    0.001 K/sec
             4,080 page-faults               #    0.102 K/sec
   111,340,458,645 cycles                    #    2.780 GHz                     [83.34%]
    61,778,039,277 stalled-cycles-frontend   #   55.49% frontend cycles idle    [83.31%]
    29,295,522,759 stalled-cycles-backend    #   26.31% backend  cycles idle    [66.67%]
   108,654,349,355 instructions              #    0.98  insns per cycle
                                             #    0.57  stalled cycles per insn [83.34%]
    19,552,170,748 branches                  #  488.244 M/sec                   [83.34%]
       157,875,417 branch-misses             #    0.81% of all branches         [83.34%]

      12.130267788 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:51:41 -05:00
..
netfilter ipv4/ipv6: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
af_inet.c Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:30:30 +09:00
ah4.c ipv4: properly refresh rtable entries on pmtu/redirect events 2013-06-03 00:07:42 -07:00
arp.c net: neighbour: Remove CONFIG_ARPD 2013-09-03 21:41:43 -04:00
cipso_ipv4.c ipv4/ipv6: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
datagram.c ipv4: fix possible seqlock deadlock 2013-11-14 17:31:14 -05:00
devinet.c net: igmp: Allow user-space configuration of igmp unsolicited report interval 2013-08-09 11:27:46 -07:00
esp4.c net: esp{4,6}: get rid of struct esp_data 2013-10-29 06:39:42 +01:00
fib_frontend.c fib_trie: remove duplicated rcu lock 2013-10-18 13:53:59 -04:00
fib_lookup.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
fib_rules.c fib_rules: fix suppressor names and default values 2013-08-03 10:40:23 -07:00
fib_semantics.c fib: Use const struct nl_info * in rtmsg_fib 2013-10-18 14:42:15 -04:00
fib_trie.c seq_file: remove "%n" usage from seq_file users 2013-11-15 09:32:20 +09:00
gre_demux.c ipv4: generalize gre_handle_offloads 2013-10-19 19:36:18 -04:00
gre_offload.c ipip: add GSO/TSO support 2013-10-19 19:36:19 -04:00
icmp.c ipv4: processing ancillary IP_TOS or IP_TTL 2013-09-28 15:21:52 -07:00
igmp.c ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put 2013-09-30 22:28:56 -07:00
inet_connection_sock.c inet: rename ir_loc_port to ir_num 2013-10-10 14:37:35 -04:00
inet_diag.c inet_diag: use sock_gen_put() 2013-10-17 15:02:02 -04:00
inet_fragment.c inet: remove old fragmentation hash initializing 2013-10-23 17:01:41 -04:00
inet_hashtables.c inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once 2013-10-19 19:45:35 -04:00
inet_lro.c ipv4: replace ip_fast_csum with csum_replace2 2013-03-15 09:12:25 -04:00
inet_timewait_sock.c tcp/dccp: remove twchain 2013-10-08 23:19:24 -04:00
inetpeer.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
ip_forward.c ipv4: introduce rt_uses_gateway 2012-10-08 17:42:36 -04:00
ip_fragment.c ipv4: initialize ip4_frags hash secret as late as possible 2013-10-23 17:01:40 -04:00
ip_gre.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
ip_input.c net: add SNMP counters tracking incoming ECN bits 2013-08-08 22:24:59 -07:00
ip_options.c net/ipv4: Ensure that location of timestamp option is stored 2013-03-12 05:35:39 -04:00
ip_output.c ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE 2013-11-05 21:52:27 -05:00
ip_sockglue.c inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions 2013-11-23 14:46:23 -08:00
ip_tunnel_core.c ipv4: generalize gre_handle_offloads 2013-10-19 19:36:18 -04:00
ip_tunnel.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-11-19 15:50:47 -08:00
ip_vti.c xfrm: Release dst if this dst is improper for vti tunnel 2013-11-19 15:50:57 -05:00
ipcomp.c ipv4: properly refresh rtable entries on pmtu/redirect events 2013-06-03 00:07:42 -07:00
ipconfig.c ipconfig: add informative timeout messages while waiting for carrier 2013-04-02 14:35:33 -04:00
ipip.c ipip: add GSO/TSO support 2013-10-19 19:36:19 -04:00
ipmr.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
Kconfig net: neighbour: Remove CONFIG_ARPD 2013-09-03 21:41:43 -04:00
Makefile net: gre: move GSO functions to gre_offload 2013-07-03 14:37:39 -07:00
netfilter.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
ping.c inet: fix possible seqlock deadlocks 2013-11-29 16:37:36 -05:00
proc.c tcp: auto corking 2013-12-06 12:51:41 -05:00
protocol.c net: remove outdated comment for ipv4 and ipv6 protocol handler 2013-11-28 18:47:51 -05:00
raw.c inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions 2013-11-23 14:46:23 -08:00
route.c ipv4: fix race in concurrent ip_route_input_slow() 2013-11-20 15:28:44 -05:00
syncookies.c inet: split syncookie keys for ipv4 and ipv6 and initialize with net_get_random_once 2013-10-19 19:45:35 -04:00
sysctl_net_ipv4.c tcp: auto corking 2013-12-06 12:51:41 -05:00
tcp_bic.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_cong.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_cubic.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_diag.c
tcp_fastopen.c tcp: enable sockets to use MSG_FASTOPEN by default 2013-11-04 19:57:47 -05:00
tcp_highspeed.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_htcp.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_hybla.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_illinois.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_input.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_ipv4.c inet: fix possible seqlock deadlocks 2013-11-29 16:37:36 -05:00
tcp_lp.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_memcontrol.c tcp_memcg: remove useless var old_lim 2013-11-23 14:46:21 -08:00
tcp_metrics.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
tcp_minisocks.c ipv6: make lookups simpler and faster 2013-10-09 00:01:25 -04:00
tcp_offload.c gro: Clean up tcpX_gro_receive checksum verification 2013-11-23 14:46:19 -08:00
tcp_output.c tcp: optimize some skb_shinfo(skb) uses 2013-12-06 12:51:40 -05:00
tcp_probe.c ipv6: make lookups simpler and faster 2013-10-09 00:01:25 -04:00
tcp_scalable.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_timer.c tcp: temporarily disable Fast Open on SYN timeout 2013-10-29 22:50:41 -04:00
tcp_vegas.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_vegas.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
tcp_veno.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_westwood.c tcp: refactor F-RTO 2013-03-21 11:47:50 -04:00
tcp_yeah.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp.c tcp: auto corking 2013-12-06 12:51:41 -05:00
tunnel4.c
udp_diag.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
udp_impl.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
udp_offload.c ipip: add GSO/TSO support 2013-10-19 19:36:19 -04:00
udp.c inet: fix possible seqlock deadlocks 2013-11-29 16:37:36 -05:00
udplite.c
xfrm4_input.c net: Add skb_unclone() helper function. 2013-02-15 15:10:37 -05:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2013-09-30 15:24:57 -04:00
xfrm4_output.c xfrm: revert ipv4 mtu determination to dst_mtu 2013-08-26 12:40:53 +02:00
xfrm4_policy.c xfrm: Fix null pointer dereference when decoding sessions 2013-11-01 07:08:46 +01:00
xfrm4_state.c xfrm: make local error reporting more robust 2013-08-14 13:07:12 +02:00
xfrm4_tunnel.c sit: add IPv4 over IPv4 support 2013-05-31 17:19:05 -07:00