linux/net/core
Eric Dumazet 5640f76858 net: use a per task frag allocator
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

It's done to increase the probability of coalescing small write() calls
into single segments in skbs still in the write queue (not yet sent).

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page.

It's also quite inefficient for building 64KB TSO packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit the
page allocator more often than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, that's order-3 pages on x86)
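
The refill-with-fallback idea described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code from this patch: `struct frag_cache`, `sketch_alloc_pages()`, `frag_cache_refill()` and the `max_ok_order` knob (which simulates memory pressure) are all hypothetical names, `malloc()` stands in for the page allocator, and the old buffer is simply freed, whereas the kernel would drop a page reference so that fragments still queued keep the page alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE      4096u  /* assumed base page size */
#define SKETCH_FRAG_MAX_ORDER 3      /* 4096 << 3 = 32768 bytes */

/* Hypothetical per-task state: one buffer that successive writes carve up. */
struct frag_cache {
    unsigned char *buf;   /* current backing buffer, or NULL */
    size_t size;          /* total bytes in buf */
    size_t offset;        /* first unused byte in buf */
};

/* Stand-in for the page allocator. Orders above max_ok_order fail,
 * which simulates memory pressure on high-order allocations. */
static void *sketch_alloc_pages(int order, int max_ok_order)
{
    if (order > max_ok_order)
        return NULL;
    return malloc((size_t)SKETCH_PAGE_SIZE << order);
}

/* Ensure the cache has unused room. Try the biggest order first and
 * fall back to smaller orders when the allocation fails. */
static bool frag_cache_refill(struct frag_cache *fc, int max_ok_order)
{
    assert(fc != NULL);

    if (fc->buf && fc->offset < fc->size)
        return true;              /* still room in the current buffer */

    free(fc->buf);                /* simplification: no reference counting */
    fc->buf = NULL;
    fc->size = 0;
    fc->offset = 0;

    for (int order = SKETCH_FRAG_MAX_ORDER; order >= 0; order--) {
        void *p = sketch_alloc_pages(order, max_ok_order);

        if (p) {
            fc->buf = p;
            fc->size = (size_t)SKETCH_PAGE_SIZE << order;
            return true;
        }
    }
    return false;                 /* even an order-0 allocation failed */
}
```

A caller would zero-initialize one `struct frag_cache` per task, call `frag_cache_refill()` before each copy, and advance `offset` by the number of bytes it consumed.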

This increases TCP stream performance by 20% on the loopback device,
and also benefits other network devices, since 8x fewer frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.
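
The "8x" figure follows from simple arithmetic on the fragment size. A minimal sketch, assuming a 64KB payload filled with equally sized fragments (`frags_needed()` is a hypothetical helper, not a kernel function):

```c
#include <assert.h>
#include <stddef.h>

/* Number of fragments of frag_size bytes needed to hold payload bytes
 * (ceiling division). */
static size_t frags_needed(size_t payload, size_t frag_size)
{
    assert(frag_size > 0);
    return (payload + frag_size - 1) / frag_size;
}
```

With these assumptions, `frags_needed(65536, 4096)` is 16 and `frags_needed(65536, 32768)` is 2, so there are 8 times fewer fragments to map and unmap per skb.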

It's possible some SG-enabled hardware can't cope with bigger fragments,
but their ndo_start_xmit() should already handle this by splitting a
fragment into sub-fragments, since some arches have PAGE_SIZE=65536.
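
Splitting one large fragment into sub-fragments might look roughly like the sketch below. It is not taken from any driver: `struct sub_frag`, `split_frag()` and the `max_len` per-descriptor limit are hypothetical, and a real ndo_start_xmit() would fill hardware descriptors rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one piece of a larger fragment. */
struct sub_frag {
    size_t offset;  /* start of the piece within the fragment */
    size_t len;     /* length of the piece in bytes */
};

/* Split a fragment of len bytes into pieces of at most max_len bytes.
 * Writes up to cap entries into out and returns the number of pieces
 * needed, which may exceed cap. */
static size_t split_frag(size_t len, size_t max_len,
                         struct sub_frag *out, size_t cap)
{
    size_t n = 0;
    size_t off = 0;

    assert(max_len > 0);

    while (off < len) {
        size_t chunk = (len - off < max_len) ? len - off : max_len;

        if (n < cap) {
            out[n].offset = off;
            out[n].len = chunk;
        }
        n++;
        off += chunk;
    }
    return n;
}
```

Under an assumed 4096-byte limit, a 32768-byte fragment yields 8 pieces, and the last piece of an unaligned fragment is simply shorter than the rest.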

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24 16:31:37 -04:00
..
datagram.c net: skb_free_datagram_locked() doesnt drop all packets 2012-06-27 15:40:57 -07:00
dev_addr_lists.c netdev: make address const in device address management 2012-09-19 16:35:22 -04:00
dev.c netpoll: call ->ndo_select_queue() in tx path 2012-09-19 17:19:09 -04:00
drop_monitor.c drop_monitor: dont sleep in atomic context 2012-06-04 11:42:01 -04:00
dst.c net: remove delay at device dismantle 2012-08-22 21:50:36 -07:00
ethtool.c net: provide a default dev->ethtool_ops 2012-09-19 15:40:15 -04:00
fib_rules.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
filter.c filter: add MOD operation 2012-09-10 15:44:56 -04:00
flow_dissector.c ipv6: add ipv6_addr_hash() helper 2012-07-18 11:28:46 -07:00
flow.c net: Add a flow_cache_flush_deferred function 2011-12-21 16:48:08 -05:00
gen_estimator.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
gen_stats.c gen_stats: Stop using NLA_PUT*(). 2012-04-02 04:33:44 -04:00
iovec.c net: get rid of some pointless casts to sockaddr 2012-03-11 19:11:22 -07:00
link_watch.c net: Set device operstate at registration time 2012-08-24 12:46:13 -04:00
Makefile sock_diag: Move the sock_ code to net/core/ 2011-12-06 13:58:02 -05:00
neighbour.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
net_namespace.c net: Statically initialize init_net.dev_base_head 2012-07-18 13:32:27 -07:00
net-sysfs.c net: add unknown state to sysfs NIC duplex export 2012-09-05 17:40:07 -04:00
net-sysfs.h xps: Add CONFIG_XPS 2010-11-28 18:24:14 -08:00
net-traces.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
netevent.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
netpoll.c netpoll: call ->ndo_select_queue() in tx path 2012-09-19 17:19:09 -04:00
netprio_cgroup.c netprio_cgroup: Use memcpy instead of the for-loop to copy priomap 2012-09-13 16:18:40 -04:00
pktgen.c pktgen: fix crash with vlan and packet size less than 46 2012-09-13 17:10:00 -04:00
request_sock.c tcp: TCP Fast Open Server - support TFO listeners 2012-08-31 20:02:19 -04:00
rtnetlink.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
scm.c scm: Don't use struct ucred in NETLINK_CB and struct scm_cookie. 2012-09-07 14:42:05 -04:00
secure_seq.c netfilter: ipv6: add IPv6 NAT support 2012-08-30 03:00:17 +02:00
skbuff.c net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
sock_diag.c netlink: hide struct module parameter in netlink_kernel_create 2012-09-08 18:46:30 -04:00
sock.c net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
stream.c net: Fix the condition passed to sk_wait_event() 2010-10-03 20:41:32 -07:00
sysctl_net_core.c net: Delete all remaining instances of ctl_path 2012-04-20 21:22:30 -04:00
timestamping.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
user_dma.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
utils.c net: core: add function for incremental IPv6 pseudo header checksum updates 2012-08-30 03:00:16 +02:00