linux/include
Philo Lu 78c91ae2c6 ipv4/udp: Add 4-tuple hash for connected socket
Currently, the udp_table has two hash table, the port hash and portaddr
hash. Usually for UDP servers, all sockets have the same local port and
addr, so they are all on the same hash slot within a reuseport group.

In some applications, UDP servers use connect() to manage clients. In
particular, when firstly receiving from an unseen 4 tuple, a new socket
is created and connect()ed to the remote addr:port, and then the fd is
used exclusively by the client.

Once there are connected sks in a reuseport group, udp has to score all
sks in the same hash2 slot to find the best match. This could be
inefficient with a large number of connections, resulting in high
softirq overhead.

To solve the problem, this patch implement 4-tuple hash for connected
udp sockets. During connect(), hash4 slot is updated, as well as a
corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(),
hslot4 will be searched firstly if the counter is non-zero. Otherwise,
hslot2 is used like before. Note that only connected sockets enter this
hash4 path, while un-connected ones are not affected.

hlist_nulls is used for hash4, because we probably move to another hslot
wrongly when lookup with concurrent rehash. Then we check nulls at the
list end to see if we should restart lookup. Because udp does not use
SLAB_TYPESAFE_BY_RCU, we don't need to touch sk_refcnt when lookup.

Stress test results (with 1 cpu fully used) are shown below, in pps:
(1) _un-connected_ socket as server
    [a] w/o hash4: 1,825176
    [b] w/  hash4: 1,831750 (+0.36%)

(2) 500 _connected_ sockets as server
    [c] w/o hash4:   290860 (only 16% of [a])
    [d] w/  hash4: 1,889658 (+3.1% compared with [b])

With hash4, compute_score is skipped when lookup, so [d] is slightly
better than [b].

Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 11:56:21 +00:00
..
acpi ACPI: processor: Move arch_init_invariance_cppc() call later 2024-11-06 21:31:36 +01:00
asm-generic move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
clocksource
crypto move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
cxl cxl: Move mailbox related bits to the same context 2024-09-12 08:38:01 -07:00
drm drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic() 2024-10-31 10:31:34 +01:00
dt-bindings soc: convert ep93xx to devicetree 2024-09-26 12:00:25 -07:00
keys KEYS: Remove unused declarations 2024-09-20 18:28:26 +03:00
kunit The core clk framework is left largely untouched this time around except for 2024-09-23 15:01:48 -07:00
kvm
linux net/udp: Add 4-tuple hash list basis 2024-11-18 11:56:21 +00:00
math-emu
media
memory
misc
net ipv4/udp: Add 4-tuple hash for connected socket 2024-11-18 11:56:21 +00:00
pcmcia
ras
rdma move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
rv
scsi move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
soc soc: fsl_qbman: use be16_to_cpu() in qm_sg_entry_get_off() 2024-11-04 18:44:43 -08:00
sound ALSA: hda: fix trigger_tstamp_latched 2024-10-02 12:50:24 +02:00
target move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
trace rxrpc: Add a tracepoint for aborts being proposed 2024-11-11 15:27:46 -08:00
uapi ipsec-next-2024-11-15 2024-11-18 11:52:49 +00:00
ufs Many singleton patches - please see the various changelogs for details. 2024-09-21 08:20:50 -07:00
vdso random: vDSO: add a __vdso_getrandom prototype for all architectures 2024-09-13 17:28:35 +02:00
video fbdev: da8xx: remove the driver 2024-10-15 10:08:23 +02:00
xen xen: Remove dependency between pciback and privcmd 2024-10-18 11:59:04 +02:00