linux/net
Eric Dumazet d41a69f1d3 tcp: make tcp_sendmsg() aware of socket backlog
Large sendmsg()/write() hold socket lock for the duration of the call,
unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
are parked into socket backlog for a long time.
Critical decisions like fast retransmit might be delayed.
Receivers have to maintain a big out of order queue with additional cpu
overhead, and also possible stalls in TX once windows are full.

Bidirectional flows are particularly hurt since the backlog can become
quite big if the copy from user space triggers IO (page faults)

Some applications learnt to use sendmsg() (or sendmmsg()) with small
chunks to avoid this issue.

Kernel should know better, right ?

Add a generic sk_flush_backlog() helper and use it right
before a new skb is allocated. Typically we put 64KB of payload
per skb (unless MSG_EOR is requested) and checking socket backlog
every 64KB gives good results.

As a matter of fact, tests with TSO/GSO disabled give very nice
results, as we manage to keep a small write queue and smaller
perceived rtt.

Note that sk_flush_backlog() maintains socket ownership,
so is not equivalent to a {release_sock(sk); lock_sock(sk);},
to ensure implicit atomicity rules that sendmsg() was
giving to (possibly buggy) applications.

In this simple implementation, I chose to not call tcp_release_cb(),
but we might consider this later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 17:02:26 -04:00
..
6lowpan 6lowpan: move mac802154 header 2016-04-13 10:41:10 +02:00
9p net/9p: convert to new CQ API 2016-03-10 20:54:09 -05:00
802
8021q vlan: propagate gso_max_segs 2016-03-17 21:05:01 -04:00
appletalk appletalk: fix erroneous return value 2016-02-18 14:59:34 -05:00
atm
ax25 ax25: add link layer header validation function 2016-03-09 22:13:01 -05:00
batman-adv batman-adv: clarify CFG80211 dependency 2016-03-02 13:45:47 -05:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2016-04-26 13:15:56 -04:00
bridge ipv6: rename IP6_INC_STATS_BH() 2016-04-27 22:48:24 -04:00
caif net: caif: fix misleading indentation 2016-03-14 13:09:50 -04:00
can sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
ceph mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
core tcp: make tcp_sendmsg() aware of socket backlog 2016-05-02 17:02:26 -04:00
dcb
dccp dccp: do not assume DCCP code is non preemptible 2016-05-02 17:02:25 -04:00
decnet decnet: Do not build routes to devices without decnet private data. 2016-04-10 23:01:30 -04:00
dns_resolver
dsa net: dsa: Provide CPU port statistics to master netdev 2016-04-28 17:16:17 -04:00
ethernet eth: Pull header from first fragment via eth_get_headlen 2016-02-24 13:58:05 -05:00
hsr NLA_BINARY misuse bug in HSR 2016-04-21 13:59:08 -04:00
ieee802154 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2016-04-26 13:15:56 -04:00
ipv4 tcp: make tcp_sendmsg() aware of socket backlog 2016-05-02 17:02:26 -04:00
ipv6 udp: prepare for non BH masking at backlog processing 2016-05-02 17:02:25 -04:00
ipx
irda
iucv
kcm kcm: Add receive message timeout 2016-03-09 16:36:15 -05:00
key
l2tp l2tp: use nla_put_u64_64bit() 2016-04-25 15:09:10 -04:00
l3mdev net: l3mdev: address selection should only consider devices in L3 domain 2016-02-26 14:22:26 -05:00
lapb
llc sock: tigthen lockdep checks for sock_owned_by_user 2016-04-13 22:37:20 -04:00
mac80211 cfg80211: remove enum ieee80211_band 2016-04-12 15:56:15 +02:00
mac802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
mpls GSO: Add GSO type for fixed IPv4 ID 2016-04-14 16:23:40 -04:00
netfilter netfilter/ipvs: use nla_put_u64_64bit() 2016-04-25 15:09:11 -04:00
netlabel netlabel: do not initialise statics to NULL 2016-03-07 11:08:26 -05:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-23 18:51:33 -04:00
netrom
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
openvswitch ovs: align nlattr properly when needed 2016-04-26 12:00:48 -04:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-23 18:51:33 -04:00
phonet
rds RDS: TCP: Call pskb_extract() helper function 2016-04-25 16:54:14 -04:00
rfkill rfkill: Use switch to demux userspace operations 2016-04-05 10:48:53 +02:00
rose
rxrpc net: udp: rename UDP_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
sched net: remove NETDEV_TX_LOCKED support 2016-04-26 15:53:05 -04:00
sctp sctp: prepare for socket backlog behavior change 2016-05-02 17:02:26 -04:00
sunrpc net: udp: rename UDP_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
switchdev switchdev: Adding complete operation to deferred switchdev ops 2016-04-24 14:23:32 -04:00
tipc tipc: set 'active' state correctly for first established link 2016-05-01 19:40:22 -04:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-02-23 00:09:14 -05:00
vmw_vsock VSOCK: Only check error on skb_recv_datagram when skb is NULL 2016-04-19 20:42:01 -04:00
wimax
wireless wireless: use nla_put_u64_64bit() 2016-04-25 15:09:11 -04:00
x25
xfrm xfrm: align nlattr properly when needed 2016-04-23 20:13:25 -04:00
compat.c
Kconfig Make DST_CACHE a silent config option 2016-03-21 22:56:38 -04:00
Makefile kcm: Kernel Connection Multiplexor module 2016-03-09 16:36:14 -05:00
socket.c tcp: remove SKBTX_ACK_TSTAMP since it is redundant 2016-04-28 16:06:10 -04:00
sysctl_net.c