linux/net/ipv6
Eric Dumazet 8b27dae5a2 tcp: add one skb cache for rx
Often, recvmsg() system calls and BH handling for a particular
TCP socket run on different cpus.

This means the incoming skb is allocated on one cpu,
but freed on another.

This incurs high spinlock contention in the slab layer for small RPCs,
and also a high number of cache line ping-pongs for larger packets.

A full size GRO packet might use 45 page fragments, meaning
that up to 45 put_page() can be involved.

Moreover, performing the __kfree_skb() in the recvmsg() context
adds latency for user applications, and increases the probability
of trapping them in backlog processing, since the BH handler
might find the socket owned by the user.

This patch, combined with the prior one, increases RPC
performance by about 10 % on servers with a large number of cores.

(tcp_rr workload with 10,000 flows and 112 threads reaches 9 Mpps
 instead of 8 Mpps)

This also increases single bulk flow performance on 40Gbit+ links,
since in this case there are often two cpus working in tandem :

 - CPU handling the NIC rx interrupts, feeding the receive queue,
  and (after this patch) freeing the skbs that were consumed.

 - CPU in recvmsg() system call, essentially 100 % busy copying out
  data to user space.

Having at most one skb in a per-socket cache has very little risk
of memory exhaustion, and since it is protected by socket lock,
its management is essentially free.

Note that if RPS/RFS is used, we do not enable this feature, because
there is a high chance that the same cpu is handling both the recvmsg()
system call and the TCP rx path, but that another cpu did the skb
allocations in the device driver right before the RPS/RFS logic.
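
The single-slot cache and the RPS/RFS opt-out described above can be
modeled in user space roughly as follows. This is only an illustrative
sketch, not the kernel code: struct buf_model, struct sock_model,
reader_consume(), producer_reap() and simulate() are hypothetical names,
malloc()/free() stand in for skb allocation and __kfree_skb(), and the
socket lock is assumed to be held by the caller rather than modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the kernel's struct sk_buff / struct sock. */
struct buf_model { int id; };

struct sock_model {
	struct buf_model *rx_cache; /* at most one consumed buffer awaiting free */
	bool rps_enabled;           /* models "RPS/RFS is used": cache is bypassed */
	int freed_by_reader;        /* frees done in the recvmsg()-like context */
	int freed_by_producer;      /* frees done in the BH-like context */
};

/* Reader side (recvmsg() analogue). Caller is assumed to hold the socket
 * lock. Parks the consumed buffer if the slot is empty and the cache is
 * enabled; otherwise frees it immediately in the reader's context. */
static void reader_consume(struct sock_model *sk, struct buf_model *b)
{
	if (!sk->rps_enabled && sk->rx_cache == NULL) {
		sk->rx_cache = b;
		return;
	}
	free(b);
	sk->freed_by_reader++;
}

/* Producer side (BH analogue). Caller is assumed to hold the socket lock.
 * Frees the parked buffer, if any, in the context that allocates buffers. */
static void producer_reap(struct sock_model *sk)
{
	if (sk->rx_cache != NULL) {
		free(sk->rx_cache);
		sk->rx_cache = NULL;
		sk->freed_by_producer++;
	}
}

/* Each round: the producer reaps the slot and queues `burst` buffers
 * (1..8), then the reader consumes all of them. The slot is drained at
 * the end so that no buffer is leaked. */
static void simulate(struct sock_model *sk, int rounds, int burst)
{
	assert(burst >= 1 && burst <= 8);
	for (int r = 0; r < rounds; r++) {
		struct buf_model *queue[8];

		producer_reap(sk);
		for (int i = 0; i < burst; i++) {
			queue[i] = malloc(sizeof(*queue[i]));
			assert(queue[i] != NULL);
			queue[i]->id = r * burst + i;
		}
		for (int i = 0; i < burst; i++)
			reader_consume(sk, queue[i]);
	}
	producer_reap(sk);
}
```

In this model, when producer and reader strictly alternate (burst of 1),
every free happens on the producer side. With a burst of 2, the single
slot absorbs only one buffer per round and the reader frees the other.
With rps_enabled set, the reader frees everything itself.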

To properly handle this case, it seems we would need to record
on which cpu the skb was allocated, and use a different channel
to give skbs back to this cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23 21:57:38 -04:00
ila genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
netfilter netfilter: nf_tables: merge ipv4 and ipv6 nat chain types 2019-03-01 14:36:59 +01:00
addrconf_core.c ipv6_stub: add ipv6_route_input stub/proxy. 2019-02-13 18:27:55 -08:00
addrconf.c net: ignore sysctl_devconf_inherit_init_net without SYSCTL 2019-03-04 13:14:34 -08:00
addrlabel.c net: ipv6: addrlabel: perform strict checks also for doit handlers 2019-01-19 10:09:59 -08:00
af_inet6.c ipv6: Add icmp_echo_ignore_anycast for ICMPv6 2019-03-20 16:29:37 -07:00
ah6.c
anycast.c net/ipv6: compute anycast address hash only if dev is null 2018-11-08 17:04:43 -08:00
calipso.c
datagram.c ipv6: fix kernel-infoleak in ipv6_local_error() 2019-01-10 09:36:41 -05:00
esp6_offload.c net: use skb_sec_path helper in more places 2018-12-19 11:21:37 -08:00
esp6.c esp: Skip TX bytes accounting when sending from a request socket 2019-01-28 11:20:58 +01:00
exthdrs_core.c
exthdrs_offload.c
exthdrs.c
fib6_notifier.c
fib6_rules.c
fou6.c fou, fou6: avoid uninit-value in gue_err() and gue6_err() 2019-03-08 15:19:53 -08:00
icmp.c ipv6: Add icmp_echo_ignore_anycast for ICMPv6 2019-03-20 16:29:37 -07:00
inet6_connection_sock.c
inet6_hashtables.c net: tcp6: prefer listeners bound to an address 2018-12-14 15:55:20 -08:00
ip6_checksum.c net: udp: fix handling of CHECKSUM_COMPLETE packets 2018-10-24 14:18:16 -07:00
ip6_fib.c ipv6: Fix dump of specific table with strict checking 2019-01-02 20:15:43 -08:00
ip6_flowlabel.c
ip6_gre.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 12:06:19 -08:00
ip6_icmp.c
ip6_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 21:43:31 -08:00
ip6_offload.c gso: validate gso_type on ipip style tunnels 2019-02-20 11:24:27 -08:00
ip6_offload.h
ip6_output.c net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATE 2019-03-03 21:05:10 -08:00
ip6_tunnel.c ip: validate header length on virtual device xmit 2019-01-01 12:05:02 -08:00
ip6_udp_tunnel.c net/ipv6/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE 2019-01-17 14:55:52 -08:00
ip6_vti.c ip: validate header length on virtual device xmit 2019-01-01 12:05:02 -08:00
ip6mr.c ip6mr: Do not call __IP6_INC_STATS() from preemptible context 2019-03-04 10:55:48 -08:00
ipcomp6.c
ipv6_sockglue.c net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATE 2019-03-03 21:05:10 -08:00
Kconfig
Makefile
mcast_snoop.c net: remove unneeded switch fall-through 2019-02-21 13:48:00 -08:00
mcast.c bridge: join all-snoopers multicast address 2019-01-22 17:18:08 -08:00
mip6.c
ndisc.c ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called 2018-10-26 15:58:06 -07:00
netfilter.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2019-02-18 11:38:30 -08:00
output_core.c
ping.c
proc.c
protocol.c
raw.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-20 11:53:36 -08:00
reassembly.c net: remove unused struct inet_frag_queue.fragments field 2019-02-26 08:27:05 -08:00
route.c ipv6: Remove fallback argument from ip6_hold_safe 2019-03-21 13:30:34 -07:00
seg6_hmac.c
seg6_iptunnel.c ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation 2019-01-30 14:06:12 -08:00
seg6_local.c bpf: add End.DT6 action to bpf_lwt_seg6_action helper 2018-07-31 09:22:48 +02:00
seg6.c genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
sit.c net: sit: fix UBSAN Undefined behaviour in check_6rd 2019-03-11 10:32:45 -07:00
syncookies.c
sysctl_net_ipv6.c
tcp_ipv6.c tcp: add one skb cache for rx 2019-03-23 21:57:38 -04:00
tcpv6_offload.c net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
tunnel6.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udp_impl.h udp6: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
udp_offload.c net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
udp.c udpv6: fix possible user after free in error handler 2019-02-22 16:05:11 -08:00
udplite.c udp6: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
xfrm6_input.c net: use skb_sec_path helper in more places 2018-12-19 11:21:37 -08:00
xfrm6_mode_beet.c
xfrm6_mode_ro.c
xfrm6_mode_transport.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm6_mode_tunnel.c
xfrm6_output.c xfrm6: call kfree_skb when skb is toobig 2018-09-03 07:37:57 +02:00
xfrm6_policy.c xfrm6: remove BUG_ON from xfrm6_dst_ifdown 2018-11-22 07:55:48 +01:00
xfrm6_protocol.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
xfrm6_state.c
xfrm6_tunnel.c xfrm: destroy xfrm_state synchronously on net exit path 2019-02-05 06:29:20 +01:00