linux/net/netfilter
Julian Anastasov 705dd34440 ipvs: use kthreads for stats estimation
Estimating all entries in single list in timer context
by single CPU causes large latency with multiple IPVS rules
as reported in [1], [2], [3].

Spread the estimator structures in multiple chains and
use kthread(s) for the estimation. The chains are processed
in multiple (50) timer ticks to ensure the 2-second interval
between estimations with some accuracy. Every chain is
processed under RCU lock.

Every kthread works over its own data structure and all
such contexts are attached to array. The contexts can be
preserved while the kthread tasks are stopped or restarted.
When estimators are removed, unused kthread contexts are
released and the slots in array are left empty.

First kthread determines parameters to use, eg. maximum
number of estimators to process per kthread based on
chain's length (chain_max), allowing sub-100us cond_resched
rate and estimation taking up to 1/8 of the CPU capacity
to avoid any problems if chain_max is not correctly
calculated.

chain_max is calculated taking into account factors
such as CPU speed and memory/cache speed where the
cache_factor (4) is selected from real tests with
current generation of CPU/NUMA configurations to
correct the difference in CPU usage between
cached (during calc phase) and non-cached (working) state
of the estimated per-cpu data.

First kthread also plays the role of distributor of
added estimators to all kthreads, keeping low the
time to add estimators. The optimization is based on
the fact that newly added estimator should be estimated
after 2 seconds, so we have the time to offload the
adding to chain from controlling process to kthread 0.

The allocated kthread context may grow from 1 to 50
allocated structures for timer ticks which saves memory for
setups with small number of estimators.

We also add delayed work est_reload_work that will
make sure the kthread tasks are properly started/stopped.

ip_vs_start_estimator() is changed to report errors
which allows to safely store the estimators in
allocated structures.

Many thanks to Jiri Wiesner for his valuable comments
and for spending a lot of time reviewing and testing
the changes on different platforms with 48-256 CPUs and
1-8 NUMA nodes under different cpufreq governors.

[1] Report from Yunhong Jiang:
https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/
[2]
https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2
[3] Report from Dust:
https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: yunhong-cgl jiang <xintian1976@gmail.com>
Cc: "dust.li" <dust.li@linux.alibaba.com>
Reviewed-by: Jiri Wiesner <jwiesner@suse.de>
Tested-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-12-10 22:44:43 +01:00
..
ipset netfilter: ipset: Add support for new bitmask parameter 2022-11-30 18:55:36 +01:00
ipvs ipvs: use kthreads for stats estimation 2022-12-10 22:44:43 +01:00
core.c Remove DECnet support from kernel 2022-08-22 14:26:30 +01:00
Kconfig netfilter: nft_objref: make it builtin 2022-10-25 13:48:35 +02:00
Makefile netfilter: nft_inner: support for inner tunnel header matching 2022-10-25 13:48:42 +02:00
nf_conncount.c netfilter: nf_conncount: reduce unnecessary GC 2022-05-16 13:05:40 +02:00
nf_conntrack_acct.c netfilter: conntrack: remove extension register api 2022-02-04 06:30:28 +01:00
nf_conntrack_amanda.c
nf_conntrack_bpf.c net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c 2022-10-03 09:17:32 -07:00
nf_conntrack_broadcast.c netfilter: nf_conntrack: use rcu accessors where needed 2022-07-11 16:25:15 +02:00
nf_conntrack_core.c netfilter: conntrack: use siphash_4u64 2022-11-15 10:53:19 +01:00
nf_conntrack_ecache.c netfilter: conntrack: add nf_conntrack_events autodetect mode 2022-05-13 18:56:28 +02:00
nf_conntrack_expect.c netfilter: conntrack: convert to refcount_t api 2022-01-09 23:30:13 +01:00
nf_conntrack_extend.c netfilter: extensions: introduce extension genid count 2022-05-13 18:52:16 +02:00
nf_conntrack_ftp.c netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed 2022-09-20 23:50:03 +02:00
nf_conntrack_h323_asn1.c netfilter: Use fallthrough pseudo-keyword 2020-07-22 01:18:05 +02:00
nf_conntrack_h323_main.c netfilter: nf_ct_h323: cap packet size at 64k 2022-08-11 16:50:49 +02:00
nf_conntrack_h323_types.c
nf_conntrack_helper.c net: move add ct helper function to nf_conntrack_helper for ovs and tc 2022-11-08 12:15:19 +01:00
nf_conntrack_irc.c netfilter: nf_conntrack_irc: Tighten matching on DCC message 2022-09-07 15:55:23 +02:00
nf_conntrack_labels.c netfilter: conntrack: remove extension register api 2022-02-04 06:30:28 +01:00
nf_conntrack_netbios_ns.c netfilter: nf_conntrack_netbios_ns: fix helper module alias 2022-01-11 10:41:44 +01:00
nf_conntrack_netlink.c netfilter: remove nf_conntrack_helper sysctl and modparam toggles 2022-08-31 12:12:32 +02:00
nf_conntrack_pptp.c netfilter: nf_conntrack: add missing __rcu annotations 2022-07-11 16:25:15 +02:00
nf_conntrack_proto_dccp.c netfilter: conntrack: pass hook state to log functions 2021-06-18 14:47:43 +02:00
nf_conntrack_proto_generic.c
nf_conntrack_proto_gre.c netfilter: conntrack: nf_ct_gre_keymap_flush() removal 2021-07-02 02:07:01 +02:00
nf_conntrack_proto_icmp.c netfilter: conntrack: pass hook state to log functions 2021-06-18 14:47:43 +02:00
nf_conntrack_proto_icmpv6.c netfilter: conntrack: set icmpv6 redirects as RELATED 2022-11-30 23:01:20 +01:00
nf_conntrack_proto_sctp.c netfilter: conntrack: add sctp DATA_SENT state 2022-11-30 18:26:09 +01:00
nf_conntrack_proto_tcp.c netfilter: conntrack: reduce timeout when receiving out-of-window fin or rst 2022-09-07 16:46:03 +02:00
nf_conntrack_proto_udp.c Revert "netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY" 2022-03-03 13:35:22 +01:00
nf_conntrack_proto.c netfilter: conntrack: merge ipv4+ipv6 confirm functions 2022-11-30 18:55:30 +01:00
nf_conntrack_sane.c netfilter: nf_ct_sane: remove pseudo skb linearization 2022-08-11 16:50:25 +02:00
nf_conntrack_seqadj.c netfilter: conntrack: remove extension register api 2022-02-04 06:30:28 +01:00
nf_conntrack_sip.c netfilter: nf_conntrack_sip: fix ct_sip_walk_headers 2022-09-07 15:06:26 +02:00
nf_conntrack_snmp.c
nf_conntrack_standalone.c netfilter: conntrack: add sctp DATA_SENT state 2022-11-30 18:26:09 +01:00
nf_conntrack_tftp.c
nf_conntrack_timeout.c netfilter: nf_conntrack: use rcu accessors where needed 2022-07-11 16:25:15 +02:00
nf_conntrack_timestamp.c netfilter: conntrack: remove extension register api 2022-02-04 06:30:28 +01:00
nf_dup_netdev.c netfilter: nf_dup_netdev: add and use recursion counter 2022-06-21 10:50:41 +02:00
nf_flow_table_core.c netfilter: flowtable: fix stuck flows on cleanup due to pending work 2022-08-24 07:43:21 +02:00
nf_flow_table_inet.c netfilter: flowtable: Fix QinQ and pppoe support for inet table 2022-03-16 11:25:04 +01:00
nf_flow_table_ip.c netfilter: flowtable: add a 'default' case to flowtable datapath 2022-12-08 22:11:00 +01:00
nf_flow_table_offload.c netfilter: flowtable: fix stuck flows on cleanup due to pending work 2022-08-24 07:43:21 +02:00
nf_flow_table_procfs.c netfilter: nf_flow_table: count pending offload workqueue tasks 2022-07-11 16:25:14 +02:00
nf_hooks_lwtunnel.c netfilter: add netfilter hooks to SRv6 data plane 2021-08-30 01:51:36 +02:00
nf_internals.h netfilter: ctnetlink: add kernel side filtering for dump 2020-05-27 22:20:34 +02:00
nf_log_syslog.c netfilter: nf_log: incorrect offset to network header 2022-07-09 09:55:43 +02:00
nf_log.c netfilter: move from strlcpy with unused retval to strscpy 2022-09-07 16:46:03 +02:00
nf_nat_amanda.c netfilter: nat: move repetitive nat port reserve loop to a helper 2022-09-07 16:46:04 +02:00
nf_nat_bpf.c net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c 2022-10-03 09:17:32 -07:00
nf_nat_core.c netfilter: nf_nat: Fix possible memory leak in nf_nat_init() 2022-11-02 10:47:22 +01:00
nf_nat_ftp.c netfilter: nat: move repetitive nat port reserve loop to a helper 2022-09-07 16:46:04 +02:00
nf_nat_helper.c netfilter: nat: avoid long-running port range loop 2022-09-07 16:46:04 +02:00
nf_nat_irc.c netfilter: nat: move repetitive nat port reserve loop to a helper 2022-09-07 16:46:04 +02:00
nf_nat_masquerade.c netfilter: conntrack: add nf_ct_iter_data object for nf_ct_iterate_cleanup*() 2022-05-13 18:56:27 +02:00
nf_nat_proto.c netfilter: nat: move nf_xfrm_me_harder to where it is used 2021-04-26 03:20:07 +02:00
nf_nat_redirect.c
nf_nat_sip.c netfilter: nat: move repetitive nat port reserve loop to a helper 2022-09-07 16:46:04 +02:00
nf_nat_tftp.c
nf_queue.c netfilter: nf_queue: handle socket prefetch 2022-03-01 11:51:15 +01:00
nf_sockopt.c netfilter: switch nf_setsockopt to sockptr_t 2020-07-24 15:41:54 -07:00
nf_synproxy_core.c ip: Fix data-races around sysctl_ip_default_ttl. 2022-07-15 11:49:55 +01:00
nf_tables_api.c netfilter: nft_inner: fix IS_ERR() vs NULL check 2022-11-22 22:29:54 +01:00
nf_tables_core.c netfilter: nft_inner: support for inner tunnel header matching 2022-10-25 13:48:42 +02:00
nf_tables_offload.c netfilter: nf_tables: bail out early if hardware offload is not supported 2022-06-06 19:19:15 +02:00
nf_tables_trace.c netfilter: nf_tables: avoid skb access on nf_stolen 2022-06-27 19:22:54 +02:00
nfnetlink_acct.c netfilter: use nfnetlink_unicast() 2021-05-29 01:04:53 +02:00
nfnetlink_cthelper.c netfilter: nf_conntrack: use rcu accessors where needed 2022-07-11 16:25:15 +02:00
nfnetlink_cttimeout.c netfilter: cttimeout: fix slab-out-of-bounds read typo in cttimeout_net_exit 2022-06-17 23:31:20 +02:00
nfnetlink_hook.c Remove DECnet support from kernel 2022-08-22 14:26:30 +01:00
nfnetlink_log.c net: Get rcv tstamp if needed in nfnetlink_{log, queue}.c 2022-03-03 14:38:48 +00:00
nfnetlink_osf.c netfilter: nfnetlink_osf: fix possible bogus match in nf_osf_find() 2022-09-07 15:55:28 +02:00
nfnetlink_queue.c netfilter: nf_queue: do not allow packet truncation below transport header offset 2022-07-26 21:12:42 +02:00
nfnetlink.c netfilter: nfnetlink: fix potential dead lock in nfnetlink_rcv_msg() 2022-11-08 23:16:13 +01:00
nft_bitwise.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_byteorder.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_chain_filter.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-22 11:41:16 +01:00
nft_chain_nat.c netfilter: nf_tables: remove unused arg in nft_set_pktinfo_unspec() 2021-05-29 01:04:54 +02:00
nft_chain_route.c netfilter: nf_tables: remove unused arg in nft_set_pktinfo_unspec() 2021-05-29 01:04:54 +02:00
nft_cmp.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_compat.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_connlimit.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_counter.c netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET 2022-11-15 10:53:17 +01:00
nft_ct.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_dup_netdev.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_dynset.c netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET 2022-11-15 10:53:17 +01:00
nft_exthdr.c sctp: move SCTP_PAD4 and SCTP_TRUNC4 to linux/sctp.h 2022-11-17 21:43:34 -08:00
nft_fib_inet.c netfilter: nft_fib: add reduce support 2022-03-20 00:29:47 +01:00
nft_fib_netdev.c netfilter: nft_fib: add reduce support 2022-03-20 00:29:47 +01:00
nft_fib.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_flow_offload.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_fwd_netdev.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_hash.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_immediate.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_inner.c netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET 2022-11-15 10:53:17 +01:00
nft_last.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_limit.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_log.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_lookup.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_masq.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_meta.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_nat.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_numgen.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_objref.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_osf.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_payload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next 2022-11-15 20:33:20 -08:00
nft_queue.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_quota.c netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET 2022-11-15 10:53:17 +01:00
nft_range.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_redir.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_reject_inet.c netfilter: nf_tables: do not reduce read-only expressions 2022-03-20 00:29:46 +01:00
nft_reject_netdev.c netfilter: nf_tables: do not reduce read-only expressions 2022-03-20 00:29:46 +01:00
nft_reject.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_rt.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_set_bitmap.c netfilter: nft_set_bitmap: Fix spelling mistake 2022-07-11 16:40:37 +02:00
nft_set_hash.c netfilter: nft_dynset: restore set element counter when failing to update 2022-06-27 19:03:37 +02:00
nft_set_pipapo_avx2.c netfilter: nft_set_pipapo_avx2: remove redundant pointer lt 2021-12-24 16:58:17 +01:00
nft_set_pipapo_avx2.h netfilter: nf_tables: prefer direct calls for set lookups 2021-05-29 01:04:27 +02:00
nft_set_pipapo.c netfilter: nft_set_pipapo: release elements in clone from abort path 2022-07-02 21:04:19 +02:00
nft_set_pipapo.h netfilter: nf_tables: prefer direct calls for set lookups 2021-05-29 01:04:27 +02:00
nft_set_rbtree.c netfilter: nft_set_rbtree: overlap detection with element re-addition after deletion 2022-04-22 15:49:15 +02:00
nft_socket.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_synproxy.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_tproxy.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_tunnel.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_xfrm.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
utils.c netfilter: use actual socket sk rather than skb sk when routing harder 2020-10-30 12:57:39 +01:00
x_tables.c netfilter: move from strlcpy with unused retval to strscpy 2022-09-07 16:46:03 +02:00
xt_addrtype.c
xt_AUDIT.c netfilter: fix clang-12 fmt string warnings 2021-06-01 23:53:51 +02:00
xt_bpf.c bpf: Refactor BPF_PROG_RUN into a function 2021-08-17 00:45:07 +02:00
xt_cgroup.c
xt_CHECKSUM.c
xt_CLASSIFY.c
xt_cluster.c
xt_comment.c
xt_connbytes.c
xt_connlabel.c
xt_connlimit.c netfilter: x_tables: use correct integer types 2022-07-11 16:40:45 +02:00
xt_connmark.c netfilter: Replace HTTP links with HTTPS ones 2020-07-29 20:09:18 +02:00
xt_CONNSECMARK.c netfilter: Replace HTTP links with HTTPS ones 2020-07-29 20:09:18 +02:00
xt_conntrack.c
xt_cpu.c
xt_CT.c netfilter: nf_conntrack: use rcu accessors where needed 2022-07-11 16:25:15 +02:00
xt_dccp.c
xt_devgroup.c
xt_dscp.c
xt_DSCP.c netfilter: x_tables: use correct integer types 2022-07-11 16:40:45 +02:00
xt_ecn.c
xt_esp.c
xt_hashlimit.c proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
xt_helper.c
xt_hl.c
xt_HL.c
xt_HMARK.c netfilter: xt_HMARK: Use ip_is_fragment() helper 2020-08-28 19:55:51 +02:00
xt_IDLETIMER.c netfilter: xt_IDLETIMER: replace snprintf in show functions with sysfs_emit 2021-11-08 12:14:05 +01:00
xt_ipcomp.c
xt_iprange.c
xt_ipvs.c
xt_l2tp.c
xt_LED.c
xt_length.c
xt_limit.c netfilter: x_tables: improve limit_mt scalability 2021-05-29 01:04:52 +02:00
xt_LOG.c netfilter: log: work around missing softdep backend module 2021-09-21 03:46:56 +02:00
xt_mac.c
xt_mark.c
xt_MASQUERADE.c
xt_multiport.c
xt_nat.c netfilter: Add MODULE_DESCRIPTION entries to kernel modules 2020-06-25 00:50:31 +02:00
xt_NETMAP.c
xt_nfacct.c netfilter: Remove unnecessary conversion to bool 2020-12-01 09:45:29 +01:00
xt_NFLOG.c netfilter: log: work around missing softdep backend module 2021-09-21 03:46:56 +02:00
xt_NFQUEUE.c
xt_osf.c
xt_owner.c
xt_physdev.c
xt_pkttype.c
xt_policy.c
xt_quota.c
xt_rateest.c
xt_RATEEST.c netfilter: move from strlcpy with unused retval to strscpy 2022-09-07 16:46:03 +02:00
xt_realm.c
xt_recent.c proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
xt_REDIRECT.c
xt_repldata.h
xt_sctp.c sctp: move SCTP_PAD4 and SCTP_TRUNC4 to linux/sctp.h 2022-11-17 21:43:34 -08:00
xt_SECMARK.c netfilter: xt_SECMARK: add new revision to fix structure layout 2021-05-03 23:02:44 +02:00
xt_set.c
xt_socket.c netfilter: xt_socket: missing ifdef CONFIG_IP6_NF_IPTABLES dependency 2022-02-13 23:55:48 +01:00
xt_state.c
xt_statistic.c treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
xt_string.c
xt_tcpmss.c
xt_TCPMSS.c netfilter: x_tables: use correct integer types 2022-07-11 16:40:45 +02:00
xt_TCPOPTSTRIP.c
xt_tcpudp.c
xt_TEE.c
xt_time.c netfilter: Replace HTTP links with HTTPS ones 2020-07-29 20:09:18 +02:00
xt_TPROXY.c netfilter: xt_TPROXY: remove pr_debug invocations 2022-07-21 00:56:00 +02:00
xt_TRACE.c netfilter: nf_log: add module softdeps 2021-03-31 22:34:10 +02:00
xt_u32.c