linux/net/sched
Eric Dumazet 5640f76858 net: use a per task frag allocator
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24 16:31:37 -04:00
..
act_api.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
act_csum.c ipv6: correct the ipv6 option name - Pad0 to Pad1 2012-05-17 15:49:51 -04:00
act_gact.c net_sched: gact: Fix potential panic in tcf_gact(). 2012-08-03 16:47:24 -07:00
act_ipt.c net_sched: act: Delete estimator in error path. 2012-08-06 13:30:01 -07:00
act_mirred.c act_mirred: do not drop packets when fails to mirror it 2012-08-16 14:54:44 -07:00
act_nat.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
act_pedit.c net_sched: act: Delete estimator in error path. 2012-08-06 13:30:01 -07:00
act_police.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
act_simple.c net_sched: act: Delete estimator in error path. 2012-08-06 13:30:01 -07:00
act_skbedit.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
cls_api.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
cls_basic.c net sched: Pass the skb into change so it can access NETLINK_CB 2012-08-14 21:55:28 -07:00
cls_cgroup.c net sched: Pass the skb into change so it can access NETLINK_CB 2012-08-14 21:55:28 -07:00
cls_flow.c userns: Convert cls_flow to work with user namespaces enabled 2012-08-14 21:55:28 -07:00
cls_fw.c net sched: Pass the skb into change so it can access NETLINK_CB 2012-08-14 21:55:28 -07:00
cls_route.c net sched: Pass the skb into change so it can access NETLINK_CB 2012-08-14 21:55:28 -07:00
cls_rsvp6.c
cls_rsvp.c
cls_rsvp.h net sched: Pass the skb into change so it can access NETLINK_CB 2012-08-14 21:55:28 -07:00
cls_tcindex.c net sched: Pass the skb into change so it can access NETLINK_CB 2012-08-14 21:55:28 -07:00
cls_u32.c net sched: Pass the skb into change so it can access NETLINK_CB 2012-08-14 21:55:28 -07:00
em_canid.c net: em_canid: Ematch rule to match CAN frames according to their identifiers 2012-07-04 13:07:05 +02:00
em_cmp.c net_sched: cleanups 2011-01-19 23:31:12 -08:00
em_ipset.c net: sched: add ipset ematch 2012-07-12 07:54:46 -07:00
em_meta.c net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
em_nbyte.c net_sched: cleanups 2011-01-19 23:31:12 -08:00
em_text.c net_sched: cleanups 2011-01-19 23:31:12 -08:00
em_u32.c net_sched: cleanups 2011-01-19 23:31:12 -08:00
ematch.c net: Convert net_ratelimit uses to net_<level>_ratelimited 2012-05-15 13:45:03 -04:00
Kconfig net: sched: add ipset ematch 2012-07-12 07:54:46 -07:00
Makefile net: sched: add ipset ematch 2012-07-12 07:54:46 -07:00
sch_api.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
sch_atm.c sch_atm.c: get rid of poinless extern 2012-06-01 10:37:18 -04:00
sch_blackhole.c
sch_cbq.c net-sched: sch_cbq: avoid infinite loop 2012-09-11 22:20:43 -04:00
sch_choke.c net: sched: factorize code (qdisc_drop()) 2012-05-04 11:50:05 -04:00
sch_codel.c fq_codel: should use qdisc backlog as threshold 2012-05-16 15:30:26 -04:00
sch_drr.c net_sched: update bstats in dequeue() 2012-05-10 23:33:01 -04:00
sch_dsmark.c net: sched: factorize code (qdisc_drop()) 2012-05-04 11:50:05 -04:00
sch_fifo.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_fq_codel.c fq_codel: dont reinit flow state 2012-09-03 14:36:50 -04:00
sch_generic.c net: qdisc busylock needs lockdep annotations 2012-09-05 17:49:27 -04:00
sch_gred.c net_sched: gred: actually perform idling in WRED mode 2012-09-13 16:10:13 -04:00
sch_hfsc.c net_sched: update bstats in dequeue() 2012-05-10 23:33:01 -04:00
sch_htb.c net_sched: update bstats in dequeue() 2012-05-10 23:33:01 -04:00
sch_ingress.c net_sched: factorize qdisc stats handling 2011-01-10 16:07:54 -08:00
sch_mq.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
sch_mqprio.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_multiq.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_netem.c netem: refine early skb orphaning 2012-07-16 23:08:33 -07:00
sch_plug.c net_sched: sch_plug: plug_qdisc_ops is static 2012-02-13 16:04:40 -05:00
sch_prio.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_qfq.c sched: add missing group change to qfq_change_class 2012-08-08 16:02:05 -07:00
sch_red.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_sfb.c sch_sfb: Fix missing NULL check 2012-07-12 08:33:18 -07:00
sch_sfq.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_tbf.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_teql.c sch_teql: Convert over to dev_neigh_lookup_skb(). 2012-07-05 01:09:06 -07:00