linux/include
Eric Dumazet 6f8b12d661 net: napi: add hard irqs deferral feature
Back in commit 3b47d30396 ("net: gro: add a per device gro flush timer")
we added the ability to arm one high resolution timer, which we used
to keep incomplete packets in the GRO engine a bit longer, hoping that further
frames might be added to them.

Since then, we added the napi_complete_done() interface, and commit
364b605573 ("net: busy-poll: return busypolling status to drivers")
allowed drivers to avoid re-arming NIC interrupts if we made a promise
that their NAPI poll() handler would be called in the near future.

This infrastructure can be leveraged, thanks to a new device parameter,
which allows arming the napi hrtimer instead of re-arming the device
hard IRQ.

We have noticed that on some servers with 32 RX queues or more, the chit-chat
between the NIC and the host caused by IRQ delivery and re-arming could hurt
throughput by ~20% on a 100Gbit NIC.

In contrast, hrtimers are using local (percpu) resources and might have lower
cost.

The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy
as gro_flush_timeout (/sys/class/net/ethX/).

By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.

This patch does not change the prior behavior of gro_flush_timeout
if used alone: NIC hard irqs should be rearmed as before.

One concrete usage can be:

echo 20000 >/sys/class/net/eth1/gro_flush_timeout
echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs

If at least one packet is retired, then we will reset napi counter
to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
of the queue.
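
The counter behaviour described above can be sketched as a small user-space model. This is not the kernel code: struct napi_model, model_complete_done() and the gro_pending flag are hypothetical names introduced for illustration; only gro_flush_timeout and napi_defer_hard_irqs come from the commit message. It assumes the decision is taken each time a poll round completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one NAPI context (not kernel code). */
struct napi_model {
    unsigned long gro_flush_timeout_ns; /* models gro_flush_timeout, 0 = no timer */
    int napi_defer_hard_irqs;           /* models napi_defer_hard_irqs, 0 = off */
    int defer_count;                    /* remaining deferred completions */
    bool gro_pending;                   /* assumption: GRO still holds packets */
};

/*
 * Models the decision taken when a poll round completes.
 * Returns true if the driver should re-arm the NIC hard IRQ.
 * *timer_ns receives the hrtimer delay to arm, or 0 for no timer.
 */
bool model_complete_done(struct napi_model *n, int work_done,
                         unsigned long *timer_ns)
{
    bool rearm_irq = true;
    unsigned long timeout = 0;

    assert(n != 0 && timer_ns != 0 && work_done >= 0);

    if (work_done > 0) {
        /* Prior behavior: keep incomplete GRO packets a bit longer. */
        if (n->gro_pending)
            timeout = n->gro_flush_timeout_ns;
        /* At least one packet retired: reset the deferral counter. */
        n->defer_count = n->napi_defer_hard_irqs;
    }

    if (n->defer_count > 0) {
        n->defer_count--;
        timeout = n->gro_flush_timeout_ns;
        /* Only skip the IRQ re-arm if a timer will call poll() again. */
        if (timeout != 0)
            rearm_irq = false;
    }

    *timer_ns = timeout;
    return rearm_irq;
}
```

With the values from the example (20000 and 10), a completion that retires a packet followed by empty completions keeps the hard IRQ disarmed in this model until the counter reaches zero, at which point the function asks for the IRQ to be re-armed. With napi_defer_hard_irqs left at zero, the function always asks for the IRQ to be re-armed, which matches the statement that gro_flush_timeout used alone behaves as before.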

On busy queues, this should avoid NIC hard IRQ, while before this patch IRQ
avoidance was only possible if napi->poll() was exhausting its budget
and did not call napi_complete_done().

This feature can also be used to work around some non-optimal NIC irq
coalescing strategies.

Having the ability to insert XX usec delays between each napi->poll()
can increase cache efficiency, since we increase batch sizes.

It also keeps serving cpus from idling too long, reducing tail latencies.

Co-developed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 12:43:20 -07:00
..
acpi Additional ACPI updates for 5.7-rc1 2020-04-06 10:35:06 -07:00
asm-generic hyperv-fixes for 5.7-rc1 2020-04-14 11:58:04 -07:00
clocksource pwm: omap-dmtimer: Drop unused header file 2020-03-30 18:03:06 +02:00
crypto crypto: curve25519 - do not pollute dispatcher based on assembler 2020-04-09 00:01:59 +09:00
drm drm/bridge: analogix_dp: Split bind() into probe() and real bind() 2020-04-09 10:29:35 +02:00
dt-bindings RISC-V Patches for the 5.7 Merge Window, Part 1 2020-04-09 10:51:30 -07:00
keys KEYS: Don't write out to userspace while holding key semaphore 2020-03-29 12:40:41 +01:00
kunit kunit: subtests should be indented 4 spaces according to TAP 2020-03-26 14:08:41 -06:00
kvm
linux net: napi: add hard irqs deferral feature 2020-04-23 12:43:20 -07:00
math-emu
media
misc
net ipv6: Honor all IPv6 PIO Valid Lifetime values 2020-04-23 12:29:21 -07:00
pcmcia
ras
rdma IB/mlx5: Expose UAR object and its alloc/destroy commands 2020-03-27 12:59:04 -03:00
scsi SCSI misc on 20200402 2020-04-02 17:03:53 -07:00
soc net: mscc: ocelot: support 4 PTP programmable pins 2020-04-21 15:38:33 -07:00
sound ASoC: Fixes for v5.7 2020-04-08 18:08:09 +02:00
target scsi: target: fix hang when multiple threads try to destroy the same iscsi session 2020-03-26 21:47:47 -04:00
trace net: qrtr: Add tracepoint support 2020-04-22 12:55:54 -07:00
uapi net: Add IF_OPER_TESTING 2020-04-20 12:43:24 -07:00
vdso
video
xen xen: Use evtchn_type_t as a type for event channels 2020-04-07 12:12:54 +02:00