linux/net
Florian Westphal b16ac3c4c8 netfilter: conntrack: include zone id in tuple hash again
commit deedb59039 ("netfilter: nf_conntrack: add direction support for zones")
removed the zone id from the hash value.

This has implications on hash chain lengths with overlapping tuples, which
can hit 64k entries on released kernels, before upper droplimit was added
in d7e7747ac5 ("netfilter: refuse insertion if chain has grown too large").

With that change reverted, test script coming with this series shows
linear insertion time growth:

 10000 entries in 3737 ms (now 10000 total, loop 1)
 10000 entries in 16994 ms (now 20000 total, loop 2)
 10000 entries in 47787 ms (now 30000 total, loop 3)
 10000 entries in 72731 ms (now 40000 total, loop 4)
 10000 entries in 95761 ms (now 50000 total, loop 5)
 10000 entries in 96809 ms (now 60000 total, loop 6)
 inserted 60000 entries from packet path in 333825 ms

With d7e7747ac5 in place, the test fails.

There are three supported zone use cases:
 1. Connection is in the default zone (zone 0).
    This means to special config (the default).
 2. Connection is in a different zone (1 to 2**16).
    This means rules are in place to put packets in
    the desired zone, e.g. derived from vlan id or interface.
 3. Original direction is in zone X and Reply is in zone 0.

3) allows to use of the existing NAT port collision avoidance to provide
   connectivity to internet/wan even when the various zones have overlapping
   source networks separated via policy routing.

In case the original zone is 0 all three cases are identical.

There is no way to place original direction in zone x and reply in
zone y (with y != 0).

Zones need to be assigned manually via the iptables/nftables ruleset,
before conntrack lookup occurs (raw table in iptables) using the
"CT" target conntrack template support
(-j CT --{zone,zone-orig,zone-reply} X).

Normally zone assignment happens based on incoming interface, but could
also be derived from packet mark, vlan id and so on.

This means that when case 3 is used, the ruleset will typically not even
assign a connection tracking template to the "reply" packets, so lookup
happens in zone 0.

However, it is possible that reply packets also match a ct zone
assignment rule which sets up a template for zone X (X > 0) in original
direction only.

Therefore, after making the zone id part of the hash, we need to do a
second lookup using the reply zone id if we did not find an entry on
the first lookup.

In practice, most deployments will either not use zones at all or the
origin and reply zones are the same, no second lookup is required in
either case.

After this change, packet path insertion test passes with constant
insertion times:

 10000 entries in 1064 ms (now 10000 total, loop 1)
 10000 entries in 1074 ms (now 20000 total, loop 2)
 10000 entries in 1066 ms (now 30000 total, loop 3)
 10000 entries in 1079 ms (now 40000 total, loop 4)
 10000 entries in 1081 ms (now 50000 total, loop 5)
 10000 entries in 1082 ms (now 60000 total, loop 6)
 inserted 60000 entries from packet path in 6452 ms

Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-21 03:46:55 +02:00
..
6lowpan 6lowpan: iphc: Fix an off-by-one check of array index 2021-07-22 16:19:03 +02:00
9p 9p/trans_virtio: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
802 net: 802: remove dead leftover after ipx driver removal 2021-08-13 16:30:35 -07:00
8021q dev_ioctl: split out ndo_eth_ioctl 2021-07-27 20:11:45 +01:00
appletalk net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
atm atm: Use list_for_each_entry() to simplify code in resources.c 2021-06-10 14:08:09 -07:00
ax25 ax25: use skb_expand_head 2021-08-03 11:21:39 +01:00
batman-adv Kbuild updates for v5.15 2021-09-03 15:33:47 -07:00
bluetooth TTY / Serial patches for 5.15-rc1 2021-09-01 09:51:16 -07:00
bpf bpf: Refactor BPF_PROG_RUN into a function 2021-08-17 00:45:07 +02:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge net: bridge: mcast: fix vlan port router deadlock 2021-09-03 13:43:19 +01:00
caif net: fix uninit-value in caif_seqpkt_sendmsg 2021-07-15 11:08:33 -07:00
can net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
ceph Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
core skbuff: clean up inconsistent indenting 2021-09-03 11:51:26 +01:00
dcb net: dcb: Return the correct errno code 2021-06-01 17:01:33 -07:00
dccp dccp: don't duplicate ccid when cloning dccp sock 2021-09-08 11:28:35 +01:00
decnet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
dns_resolver
dsa net: dsa: tag_rtl4_a: Fix egress tags 2021-09-01 11:48:05 +01:00
ethernet move netdev_boot_setup into Space.c 2021-08-03 13:05:26 +01:00
ethtool ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
hsr net: hsr: don't check sequence number if tag removal is offloaded 2021-06-16 12:13:01 -07:00
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-13 06:41:22 -07:00
ife
ipv4 ip_gre: validate csum_start only on pull 2021-09-05 18:59:32 +01:00
ipv6 netfilter: ip6_tables: zero-initialize fragment offset 2021-09-15 17:00:28 +02:00
iucv net/iucv: Replace deprecated CPU-hotplug functions. 2021-08-09 10:13:32 +01:00
kcm net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
key net: Remove unnecessary variables 2021-05-26 07:03:39 +02:00
l2tp l2tp: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
l3mdev
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
mac80211 mac80211: parse transmit power envelope element 2021-08-26 10:18:56 +02:00
mac802154 ieee802154: Remove redundant initialization of variable ret 2021-09-07 14:06:08 +01:00
mctp mctp: perform route destruction under RCU read lock 2021-09-08 11:29:16 +01:00
mpls mpls: defer ttl decrement in mpls_forward() 2021-07-23 17:17:56 +01:00
mptcp mptcp: Only send extra TCP acks in eligible socket states 2021-09-03 11:49:30 +01:00
ncsi net/ncsi: add get MAC address command to get Intel i210 MAC address 2021-09-01 17:18:56 -07:00
netfilter netfilter: conntrack: include zone id in tuple hash again 2021-09-21 03:46:55 +02:00
netlabel net: fix NULL pointer reference in cipso_v4_doi_free 2021-08-30 12:23:18 +01:00
netlink net: netlink: Remove unused function 2021-07-30 18:35:47 +02:00
netrom net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
nfc net: in_irq() cleanup 2021-08-13 14:09:19 -07:00
nsh
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-19 18:09:18 -07:00
packet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
phonet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
psample
qrtr net: qrtr: revert check in qrtr_endpoint_post() 2021-09-02 11:37:02 +01:00
rds net/rds: dma_map_sg is entitled to merge entries 2021-08-18 15:35:50 -07:00
rfkill Another set of updates, all over the map: 2021-04-20 16:44:04 -07:00
rose
rxrpc net: RxRPC: make dependent Kconfig symbols be shown indented 2021-08-18 10:12:11 +01:00
sched fq_codel: reject silly quantum parameters 2021-09-04 10:49:46 +01:00
sctp sctp: move the active_key update after sh_keys is added 2021-08-03 11:43:43 +01:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-13 06:41:22 -07:00
strparser net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
sunrpc NFS Client Updates for Linux 5.15 2021-09-04 10:25:26 -07:00
switchdev net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge 2021-08-04 12:35:07 +01:00
tipc tipc: clean up inconsistent indenting 2021-09-03 11:51:26 +01:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
unix af_unix: fix potential NULL deref in unix_dgram_connect() 2021-08-31 11:31:52 +01:00
vmw_vsock vsock/virtio: avoid potential deadlock when vsock device remove 2021-08-12 10:57:27 -07:00
wireless cfg80211: use wiphy DFS domain if it is self-managed 2021-08-26 11:04:55 +02:00
x25 net: x25: Use list_for_each_entry() to simplify code in x25_route.c 2021-06-10 14:08:09 -07:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ 2021-08-27 11:16:29 +01:00
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
Kconfig mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
Makefile mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
socket.c Core: 2021-08-31 16:43:06 -07:00
sysctl_net.c