mainlining shenanigans
Go to file
Florian Westphal b16ac3c4c8 netfilter: conntrack: include zone id in tuple hash again
commit deedb59039 ("netfilter: nf_conntrack: add direction support for zones")
removed the zone id from the hash value.

This has implications on hash chain lengths with overlapping tuples, which
can hit 64k entries on released kernels, before upper droplimit was added
in d7e7747ac5 ("netfilter: refuse insertion if chain has grown too large").

With that change reverted, test script coming with this series shows
linear insertion time growth:

 10000 entries in 3737 ms (now 10000 total, loop 1)
 10000 entries in 16994 ms (now 20000 total, loop 2)
 10000 entries in 47787 ms (now 30000 total, loop 3)
 10000 entries in 72731 ms (now 40000 total, loop 4)
 10000 entries in 95761 ms (now 50000 total, loop 5)
 10000 entries in 96809 ms (now 60000 total, loop 6)
 inserted 60000 entries from packet path in 333825 ms

With d7e7747ac5 in place, the test fails.

There are three supported zone use cases:
 1. Connection is in the default zone (zone 0).
    This means to special config (the default).
 2. Connection is in a different zone (1 to 2**16).
    This means rules are in place to put packets in
    the desired zone, e.g. derived from vlan id or interface.
 3. Original direction is in zone X and Reply is in zone 0.

3) allows to use of the existing NAT port collision avoidance to provide
   connectivity to internet/wan even when the various zones have overlapping
   source networks separated via policy routing.

In case the original zone is 0 all three cases are identical.

There is no way to place original direction in zone x and reply in
zone y (with y != 0).

Zones need to be assigned manually via the iptables/nftables ruleset,
before conntrack lookup occurs (raw table in iptables) using the
"CT" target conntrack template support
(-j CT --{zone,zone-orig,zone-reply} X).

Normally zone assignment happens based on incoming interface, but could
also be derived from packet mark, vlan id and so on.

This means that when case 3 is used, the ruleset will typically not even
assign a connection tracking template to the "reply" packets, so lookup
happens in zone 0.

However, it is possible that reply packets also match a ct zone
assignment rule which sets up a template for zone X (X > 0) in original
direction only.

Therefore, after making the zone id part of the hash, we need to do a
second lookup using the reply zone id if we did not find an entry on
the first lookup.

In practice, most deployments will either not use zones at all or the
origin and reply zones are the same, no second lookup is required in
either case.

After this change, packet path insertion test passes with constant
insertion times:

 10000 entries in 1064 ms (now 10000 total, loop 1)
 10000 entries in 1074 ms (now 20000 total, loop 2)
 10000 entries in 1066 ms (now 30000 total, loop 3)
 10000 entries in 1079 ms (now 40000 total, loop 4)
 10000 entries in 1081 ms (now 50000 total, loop 5)
 10000 entries in 1082 ms (now 60000 total, loop 6)
 inserted 60000 entries from packet path in 6452 ms

Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-21 03:46:55 +02:00
arch ARM: 2021-09-07 13:40:51 -07:00
block block-5.15-2021-09-05 2021-09-06 10:06:26 -07:00
certs certs: Add support for using elliptic curve keys for signing modules 2021-08-23 19:55:42 +03:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-08-30 12:57:10 -07:00
Documentation dt-bindings: net: sun8i-emac: Add compatible for D1 2021-09-08 11:27:34 +01:00
drivers net: stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume 2021-09-08 12:28:26 +01:00
fs fuse update for 5.15 2021-09-07 12:18:29 -07:00
include Networking stragglers and fixes for 5.15-rc1, including changes from netfilter, 2021-09-07 14:02:58 -07:00
init Enable '-Werror' by default for all kernel builds 2021-09-05 11:24:05 -07:00
ipc memcg: enable accounting of ipc resources 2021-09-03 09:58:12 -07:00
kernel kgdb patches for 5.15 2021-09-07 12:08:04 -07:00
lib lib/test_scanf: split up number parsing test routines 2021-09-06 11:04:03 -07:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly" 2021-09-07 11:03:45 -07:00
net netfilter: conntrack: include zone id in tuple hash again 2021-09-21 03:46:55 +02:00
samples kgdb patches for 5.15 2021-09-07 12:08:04 -07:00
scripts don't make the syscall checking produce errors from warnings 2021-09-06 09:15:09 -07:00
security Kbuild updates for v5.15 2021-09-03 15:33:47 -07:00
sound Kbuild updates for v5.15 2021-09-03 15:33:47 -07:00
tools Networking stragglers and fixes for 5.15-rc1, including changes from netfilter, 2021-09-07 14:02:58 -07:00
usr .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
virt KVM: Drop unused kvm_dirty_gfn_invalid() 2021-09-06 08:23:46 -04:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: update email address of Matthias Fuchs and Thomas Körper 2021-08-19 09:39:44 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: move Murali Karicheri to credits 2021-04-29 15:47:30 -07:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Networking stragglers and fixes for 5.15-rc1, including changes from netfilter, 2021-09-07 14:02:58 -07:00
Makefile Enable '-Werror' by default for all kernel builds 2021-09-05 11:24:05 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.