linux/net/core
Björn Töpel 7fd3253a7d net: Introduce preferred busy-polling
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
option or system-wide using the /proc/sys/net/core/busy_read knob, is
an opportunistic. That means that if the NAPI context is not
scheduled, it will poll it. If, after busy-polling, the budget is
exceeded the busy-polling logic will schedule the NAPI onto the
regular softirq handling.

One implication of the behavior above is that a busy/heavy loaded NAPI
context will never enter/allow for busy-polling. Some applications
prefer that most NAPI processing would be done by busy-polling.

This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
in concert with the napi_defer_hard_irqs and gro_flush_timeout
knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
introduced in commit 6f8b12d661 ("net: napi: add hard irqs deferral
feature"), and allows for a user to defer interrupts to be enabled and
instead schedule the NAPI context from a watchdog timer. When a user
enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled,
and the NAPI context is being processed by a softirq, the softirq NAPI
processing will exit early to allow the busy-polling to be performed.

If the application stops performing busy-polling via a system call,
the watchdog timer defined by gro_flush_timeout will timeout, and
regular softirq handling will resume.

In summary; Heavy traffic applications that prefer busy-polling over
softirq processing should use this option.

Example usage:

  $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
  $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout

Note that the timeout should be larger than the userspace processing
window, otherwise the watchdog will timeout and fall back to regular
softirq processing.

Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com
2020-12-01 00:09:25 +01:00
..
bpf_sk_storage.c bpf: Fix the irq and nmi check in bpf_sk_storage for tracing usage 2020-11-16 16:46:01 -08:00
datagram.c net: zerocopy: combine pages in zerocopy_sg_from_iter() 2020-08-20 16:12:50 -07:00
datagram.h
dev_addr_lists.c net: core: add nested_level variable in net_device 2020-09-28 15:00:15 -07:00
dev_ioctl.c net: dev_ioctl: remove redundant initialization of variable err 2020-11-03 17:49:26 -08:00
dev.c net: Introduce preferred busy-polling 2020-12-01 00:09:25 +01:00
devlink.c devlink: Avoid overwriting port attributes of registered port 2020-11-12 08:06:57 -08:00
drop_monitor.c genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
dst_cache.c
dst.c net: Correct the comment of dst_dev_put() 2020-09-10 13:28:57 -07:00
failover.c
fib_notifier.c
fib_rules.c fib: fix fib_rule_ops indirect call wrappers when CONFIG_IPV6=m 2020-09-08 20:09:08 -07:00
filter.c net: Properly typecast int values to set sk_max_pacing_rate 2020-10-22 12:18:25 -07:00
flow_dissector.c net: core: fix spelling typo in flow_dissector.c 2020-11-07 15:52:21 -08:00
flow_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-25 17:49:04 -07:00
gen_estimator.c
gen_stats.c docs: networking: convert gen_stats.txt to ReST 2020-04-28 14:39:46 -07:00
gro_cells.c
hwbm.c
link_watch.c net: Add IF_OPER_TESTING 2020-04-20 12:43:24 -07:00
lwt_bpf.c net: add net available in build_state 2020-03-29 22:30:57 -07:00
lwtunnel.c net: ipv6: add rpl sr tunnel 2020-03-29 22:30:57 -07:00
Makefile
neighbour.c net: neighbor: add fdb extended attribute 2020-06-24 14:36:33 -07:00
net_namespace.c bpf, net: Rework cookie generator as per-cpu one 2020-09-30 11:50:35 -07:00
net-procfs.c net-sysfs: add backlog len and CPU id to softnet data 2020-09-21 13:56:37 -07:00
net-sysfs.c net-sysfs: Fix inconsistent of format with argument type in net-sysfs.c 2020-10-01 18:45:23 -07:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c
netclassid_cgroup.c cgroup, netclassid: remove double cond_resched 2020-04-21 15:44:30 -07:00
netevent.c
netpoll.c net: make sure napi_list is safe for RCU traversal 2020-09-10 13:08:46 -07:00
netprio_cgroup.c netprio_cgroup: Fix unlimited memory leak of v2 cgroups 2020-05-09 20:59:21 -07:00
page_pool.c net: page_pool: Add bulk support for ptr_ring 2020-11-14 02:29:00 +01:00
pktgen.c pktgen: Fix inconsistent of format with argument type in pktgen.c 2020-10-01 18:45:23 -07:00
ptp_classifier.c ptp: Add generic ptp v2 header parsing function 2020-08-19 16:07:49 -07:00
request_sock.c
rtnetlink.c rtnetlink: fix data overflow in rtnl_calcit() 2020-10-21 18:24:08 -07:00
scm.c fs: Add receive_fd() wrapper for __receive_fd() 2020-07-13 11:03:44 -07:00
secure_seq.c crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h 2020-05-08 15:32:17 +10:00
skbuff.c net: skb_vlan_untag(): don't reset transport offset if set by GRO layer 2020-11-09 20:03:55 -08:00
skmsg.c bpf, sockmap: Allow skipping sk_skb parser program 2020-10-11 18:09:44 -07:00
sock_diag.c bpf, net: Rework cookie generator as per-cpu one 2020-09-30 11:50:35 -07:00
sock_map.c net, sockmap: Don't call bpf_prog_put() on NULL pointer 2020-10-15 21:05:23 +02:00
sock_reuseport.c udp: Copy has_conns in reuseport_grow(). 2020-07-21 15:31:02 -07:00
sock.c net: Introduce preferred busy-polling 2020-12-01 00:09:25 +01:00
stream.c
sysctl_net_core.c net: add option to not create fall-back tunnels in root-ns as well 2020-08-28 06:52:44 -07:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: tso: add UDP segmentation support 2020-06-18 20:46:23 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-01-24 20:54:30 +01:00
xdp.c net: page_pool: Add bulk support for ptr_ring 2020-11-14 02:29:00 +01:00