linux/net/core
Yaogong Wang 9f5afeae51 tcp: use an RB tree for ooo receive queue
Over the years, TCP BDP has increased by several orders of magnitude,
and some people are considering to reach the 2 Gbytes limit.

Even with current window scale limit of 14, ~1 Gbytes maps to ~740,000
MSS.

In presence of packet losses (or reorders), TCP stores incoming packets
into an out of order queue, and number of skbs sitting there waiting for
the missing packets to be received can be in the 10^5 range.

Most packets are appended to the tail of this queue, and when
packets can finally be transferred to receive queue, we scan the queue
from its head.

However, in presence of heavy losses, we might have to find an arbitrary
point in this queue, involving a linear scan for every incoming packet,
throwing away cpu caches.

This patch converts it to a RB tree, to get bounded latencies.

Yaogong wrote a preliminary patch about 2 years ago.
Eric did the rebase, added ofo_last_skb cache, polishing and tests.

Tested with network dropping between 1 and 10 % packets, with good
success (about 30 % increase of throughput in stress tests)

Next step would be to also use an RB tree for the write queue at sender
side ;)

Signed-off-by: Yaogong Wang <wygivan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-By: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:25:58 -07:00
..
datagram.c udp: enable MSG_PEEK at non-zero offset 2016-04-05 16:29:37 -04:00
dev_addr_lists.c
dev_ioctl.c
dev.c net: batch calls to flush_all_backlogs() 2016-08-30 22:17:20 -07:00
devlink.c devlink: add hardware messages tracing facility 2016-07-12 14:20:18 -07:00
drop_monitor.c drop_monitor: make genl_multicast_group const 2016-09-01 14:09:00 -07:00
dst_cache.c net: dst_cache_per_cpu_dst_set() can be static 2016-03-18 17:45:08 -04:00
dst.c net: add dst_cache to ovs vxlan lwtunnel 2016-02-16 20:21:48 -05:00
ethtool.c sctp: Add GSO support 2016-06-03 19:37:21 -04:00
fib_rules.c fib_rules: Added NLM_F_EXCL support to fib_nl_newrule 2016-06-30 08:23:19 -04:00
filter.c bpf: get rid of cgroup helper related ifdefs 2016-08-18 23:38:16 -07:00
flow_dissector.c flow_dissector: Get vlan priority in addition to vlan id 2016-08-18 23:13:13 -07:00
flow.c flowcache: Avoid OOM condition under preasure 2016-03-17 10:28:42 +01:00
gen_estimator.c net: sched: do not acquire qdisc spinlock in qdisc/class stats dump 2016-06-07 16:37:14 -07:00
gen_stats.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-10 11:52:24 -07:00
hwbm.c net: hwbm: Fix unbalanced spinlock in error case 2016-05-25 12:35:09 -07:00
link_watch.c dev: introduce dev_get_iflink() 2015-04-02 14:04:59 -04:00
lwtunnel.c net: lwtunnel: Handle fragmentation 2016-08-30 22:27:18 -07:00
Makefile net: add a hardware buffer management helper API 2016-03-14 12:19:46 -04:00
neighbour.c neigh: allow admin to set NUD_STALE 2016-08-08 15:36:38 -07:00
net_namespace.c netns: avoid disabling irq for netns id 2016-09-04 11:39:59 -07:00
net-procfs.c net: remove NETDEV_TX_LOCKED support 2016-04-26 15:53:05 -04:00
net-sysfs.c net: introduce NETDEV_CHANGE_TX_QUEUE_LEN 2016-07-01 05:32:17 -04:00
net-sysfs.h
net-traces.c net: IPv6 fib lookup tracepoint 2015-11-22 11:54:10 -05:00
netclassid_cgroup.c core: remove unneded headers for net cgroup controllers. 2016-02-17 15:31:27 -05:00
netevent.c netevent: remove automatic variable in register_netevent_notifier() 2015-05-31 00:03:21 -07:00
netpoll.c net: tracepoint napi:napi_poll add work and budget 2016-07-09 18:05:02 -04:00
netprio_cgroup.c core: remove unneded headers for net cgroup controllers. 2016-02-17 15:31:27 -05:00
pktgen.c net: pktgen: support injecting packets for qdisc testing 2016-07-04 16:07:34 -07:00
ptp_classifier.c ptp: Change ptp_class to a proper bitmask 2015-11-03 11:08:22 -05:00
request_sock.c tcp: restore fastopen operations 2015-10-05 03:19:06 -07:00
rtnetlink.c rtnetlink: fdb dump: optimize by saving last interface markers 2016-09-01 16:56:15 -07:00
scm.c unix: correctly track in-flight fds in sending process user_struct 2016-02-08 10:30:42 -05:00
secure_seq.c net: remove a sparse error in secure_dccpv6_sequence_number() 2015-05-25 22:55:37 -04:00
skbuff.c tcp: use an RB tree for ooo receive queue 2016-09-08 17:25:58 -07:00
sock_diag.c sock_diag: align nlattr properly when needed 2016-04-26 12:00:48 -04:00
sock_reuseport.c soreuseport: fix NULL ptr dereference SO_REUSEPORT after bind 2016-01-19 14:44:23 -05:00
sock.c net: remove clear_sk() method 2016-08-23 23:25:29 -07:00
stream.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
sysctl_net_core.c bpf: add generic constant blinding for use in jits 2016-05-16 13:49:32 -04:00
timestamping.c net: skb_defer_rx_timestamp should check for phydev before setting up classify 2015-07-09 14:17:15 -07:00
tso.c net: tso: add support for IPv6 2015-10-26 22:24:22 -07:00
utils.c net: the space is required before the open parenthesis '(' 2016-06-29 05:15:14 -04:00