linux/include/net/netfilter
Pablo Neira Ayuso 0838aa7fcf netfilter: fix netns dependencies with conntrack templates
Quoting Daniel Borkmann:

"When adding connection tracking template rules to a netns, f.e. to
configure netfilter zones, the kernel will endlessly busy-loop as soon
as we try to delete the given netns in case there's at least one
template present, which is problematic i.e. if there is such bravery that
the priviledged user inside the netns is assumed untrusted.

Minimal example:

  ip netns add foo
  ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
  ip netns del foo

What happens is that when nf_ct_iterate_cleanup() is being called from
nf_conntrack_cleanup_net_list() for a provided netns, we always end up
with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
don't get a soft-lockup as we still have a schedule() point, but the
serving CPU spins on 100% from that point onwards.

Since templates are normally allocated with nf_conntrack_alloc(), we
also bump net->ct.count. The issue why they are not yet nf_ct_put() is
because the per netns .exit() handler from x_tables (which would eventually
invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is
called in the dependency chain at a *later* point in time than the per
netns .exit() handler for the connection tracker.

This is clearly a chicken'n'egg problem: after the connection tracker
.exit() handler, we've teared down all the connection tracking
infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
invoked at a later point in time during the netns cleanup, as that would
lead to a use-after-free. At the same time, we cannot make x_tables depend
on the connection tracker module, so that the xt_ct_tg_destroy() would
be invoked earlier in the cleanup chain."

Daniel confirms this has to do with the order in which modules are loaded or
having compiled nf_conntrack as modules while x_tables built-in. So we have no
guarantees regarding the order in which netns callbacks are executed.

Fix this by allocating the templates through kmalloc() from the respective
SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch
is marked as unlikely since conntrack templates are rarely allocated and only
from the configuration plane path.

Note that templates are not kept in any list to avoid further dependencies with
nf_conntrack anymore, thus, the tmpl larval list is removed.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
2015-07-20 14:58:19 +02:00
..
ipv4 netfilter: fix sparse warnings in reject handling 2015-03-10 15:01:32 +01:00
ipv6 netfilter: fix sparse warnings in reject handling 2015-03-10 15:01:32 +01:00
br_netfilter.h netfilter: bridge: split ipv6 code into separated file 2015-06-18 21:14:21 +02:00
nf_conntrack_acct.h
nf_conntrack_core.h netfilter: Convert print_tuple functions to return void 2014-11-05 14:10:33 -05:00
nf_conntrack_ecache.h
nf_conntrack_expect.h
nf_conntrack_extend.h
nf_conntrack_helper.h
nf_conntrack_l3proto.h netfilter: Convert print_tuple functions to return void 2014-11-05 14:10:33 -05:00
nf_conntrack_l4proto.h netfilter: Convert print_tuple functions to return void 2014-11-05 14:10:33 -05:00
nf_conntrack_labels.h
nf_conntrack_seqadj.h
nf_conntrack_synproxy.h
nf_conntrack_timeout.h
nf_conntrack_timestamp.h
nf_conntrack_tuple.h
nf_conntrack_zones.h
nf_conntrack.h netfilter: fix netns dependencies with conntrack templates 2015-07-20 14:58:19 +02:00
nf_log.h netfilter: restore rule tracing via nfnetlink_log 2015-03-19 11:14:48 +01:00
nf_nat_core.h
nf_nat_helper.h
nf_nat_l3proto.h netfilter: Pass nf_hook_state through nf_nat_ipv6_{in,out,fn,local_fn}(). 2015-04-04 12:48:08 -04:00
nf_nat_l4proto.h
nf_nat_redirect.h netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module 2014-11-27 13:08:42 +01:00
nf_nat.h netfilter: fix compilation of masquerading without IP_NF_TARGET_MASQUERADE 2014-09-11 17:02:45 +02:00
nf_queue.h netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook 2015-06-23 06:23:23 -07:00
nf_tables_bridge.h netfilter: nf_tables_bridge: export nft_reject_ip*hdr_validate functions 2014-11-27 12:58:05 +01:00
nf_tables_core.h netfilter: nf_tables: add support for dynamic set updates 2015-04-08 16:58:27 +02:00
nf_tables_ipv4.h netfilter: Pass nf_hook_state through nft_set_pktinfo*(). 2015-04-04 12:54:27 -04:00
nf_tables_ipv6.h netfilter: Pass nf_hook_state through nft_set_pktinfo*(). 2015-04-04 12:54:27 -04:00
nf_tables.h netfilter: nf_tables_netdev: unregister hooks on net_device removal 2015-06-15 23:02:35 +02:00
nfnetlink_log.h
nfnetlink_queue.h
nft_masq.h netfilter: nf_tables: restrict nat/masq expressions to nat chain type 2014-10-13 20:42:00 +02:00
nft_meta.h netfilter: nf_tables: get rid of NFT_REG_VERDICT usage 2015-04-13 17:17:07 +02:00
nft_redir.h netfilter: nf_tables: add new expression nft_redir 2014-10-27 22:49:39 +01:00
nft_reject.h netfilter: nft_reject: introduce icmp code abstraction for inet and bridge 2014-10-02 18:29:57 +02:00
xt_rateest.h