linux/net/netfilter
Daniel Borkmann deedb59039 netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.

The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.

As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.

Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.

If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).

Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.

Below a minimal, simplified collision example (script in [2]) with
netperf sessions:

  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:

  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully

  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

conntrack dump from gateway netns:

  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns

  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
                        src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
                        src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.

I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.

  [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
  [2] https://paste.fedoraproject.org/242835/65657871/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-18 01:22:50 +02:00
..
ipset netfilter: ipset: Fix coding styles reported by checkpatch.pl 2015-06-14 10:40:18 +02:00
ipvs netfilter: nf_conntrack: push zone object into functions 2015-08-11 12:29:01 +02:00
core.c netfilter: rename local nf_hook_list to hook_list 2015-07-23 16:18:35 +02:00
Kconfig netfilter: factor out packet duplication for IPv4/IPv6 2015-08-07 11:49:49 +02:00
Makefile netfilter: nf_tables: add netdev table to filter from ingress 2015-05-26 18:41:23 +02:00
nf_conntrack_acct.c netfilter: Remove uses of seq_<foo> return values 2015-03-18 10:51:35 +01:00
nf_conntrack_amanda.c net: Remove state argument from skb_find_text() 2015-02-22 15:59:54 -05:00
nf_conntrack_broadcast.c
nf_conntrack_core.c netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
nf_conntrack_ecache.c netfilter: conntrack: remove timer from ecache extension 2014-06-25 19:15:38 +02:00
nf_conntrack_expect.c netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
nf_conntrack_extend.c
nf_conntrack_ftp.c netfilter: replace strnicmp with strncasecmp 2014-10-14 02:18:24 +02:00
nf_conntrack_h323_asn1.c
nf_conntrack_h323_main.c ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST 2015-05-25 13:25:33 -04:00
nf_conntrack_h323_types.c
nf_conntrack_helper.c netfilter: fix spelling errors 2014-10-30 17:35:30 +01:00
nf_conntrack_irc.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
nf_conntrack_l3proto_generic.c netfilter: Convert print_tuple functions to return void 2014-11-05 14:10:33 -05:00
nf_conntrack_labels.c netfilter: connlabels: remove unneeded includes 2013-07-31 16:39:18 +02:00
nf_conntrack_netbios_ns.c
nf_conntrack_netlink.c netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
nf_conntrack_pptp.c netfilter: nf_conntrack: push zone object into functions 2015-08-11 12:29:01 +02:00
nf_conntrack_proto_dccp.c netfilter: Convert print_tuple functions to return void 2014-11-05 14:10:33 -05:00
nf_conntrack_proto_generic.c netfilter: conntrack: warn the user if there is a better helper to use 2015-06-12 14:06:24 +02:00
nf_conntrack_proto_gre.c netfilter: Convert print_tuple functions to return void 2014-11-05 14:10:33 -05:00
nf_conntrack_proto_sctp.c netfilter: nf_ct_sctp: minimal multihoming support 2015-07-30 12:59:25 +02:00
nf_conntrack_proto_tcp.c conntrack: RFC5961 challenge ACK confuse conntrack LAST-ACK transition 2015-05-15 20:50:56 +02:00
nf_conntrack_proto_udp.c netfilter: Convert print_tuple functions to return void 2014-11-05 14:10:33 -05:00
nf_conntrack_proto_udplite.c netfilter: Convert print_tuple functions to return void 2014-11-05 14:10:33 -05:00
nf_conntrack_proto.c netfilter: nf_conntrack: remove dead code 2014-01-03 23:41:37 +01:00
nf_conntrack_sane.c netfilter: nf_ct_helper: better logging for dropped packets 2013-02-19 02:48:05 +01:00
nf_conntrack_seqadj.c netfilter: nf_ct_seqadj: print ack seq in the right host byte order 2015-01-05 13:52:20 +01:00
nf_conntrack_sip.c netfilter: replace strnicmp with strncasecmp 2014-10-14 02:18:24 +02:00
nf_conntrack_snmp.c
nf_conntrack_standalone.c netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
nf_conntrack_tftp.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
nf_conntrack_timeout.c
nf_conntrack_timestamp.c netfilter: nf_ct_timestamp: Fix BUG_ON after netns deletion 2013-12-20 14:58:29 +01:00
nf_internals.h netfilter: nf_queue: fix nf_queue_nf_hook_drop() 2015-07-23 16:17:58 +02:00
nf_log_common.c netfilter: bridge: add helpers for fetching physin/outdev 2015-04-08 16:49:08 +02:00
nf_log.c netfilter: restore rule tracing via nfnetlink_log 2015-03-19 11:14:48 +01:00
nf_nat_amanda.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
nf_nat_core.c netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
nf_nat_ftp.c netfilter: nf_ct_helper: better logging for dropped packets 2013-02-19 02:48:05 +01:00
nf_nat_helper.c netfilter: nf_conntrack: make sequence number adjustments usuable without NAT 2013-08-28 00:26:48 +02:00
nf_nat_irc.c netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper 2014-01-06 14:17:17 +01:00
nf_nat_proto_common.c netfilter: use IS_ENABLED() macro 2014-06-30 11:38:03 +02:00
nf_nat_proto_dccp.c netfilter: use IS_ENABLED() macro 2014-06-30 11:38:03 +02:00
nf_nat_proto_sctp.c netfilter: use IS_ENABLED() macro 2014-06-30 11:38:03 +02:00
nf_nat_proto_tcp.c netfilter: use IS_ENABLED() macro 2014-06-30 11:38:03 +02:00
nf_nat_proto_udp.c netfilter: use IS_ENABLED() macro 2014-06-30 11:38:03 +02:00
nf_nat_proto_udplite.c netfilter: use IS_ENABLED() macro 2014-06-30 11:38:03 +02:00
nf_nat_proto_unknown.c
nf_nat_redirect.c netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module 2014-11-27 13:08:42 +01:00
nf_nat_sip.c netfilter: replace strnicmp with strncasecmp 2014-10-14 02:18:24 +02:00
nf_nat_tftp.c netfilter: nf_ct_helper: better logging for dropped packets 2013-02-19 02:48:05 +01:00
nf_queue.c netfilter: nf_queue: fix nf_queue_nf_hook_drop() 2015-07-23 16:17:58 +02:00
nf_sockopt.c netfilter: don't use mutex_lock_interruptible() 2014-08-08 16:47:23 +02:00
nf_synproxy_core.c netfilter: nf_conntrack: push zone object into functions 2015-08-11 12:29:01 +02:00
nf_tables_api.c netfilter: nftables: Only run the nftables chains in the proper netns 2015-07-15 18:17:36 +02:00
nf_tables_core.c netfilter: nftables: Only run the nftables chains in the proper netns 2015-07-15 18:17:36 +02:00
nf_tables_inet.c netfilter: nf_tables: fix error path in the init functions 2014-01-09 23:25:48 +01:00
nf_tables_netdev.c netfilter: nf_tables_netdev: unregister hooks on net_device removal 2015-06-15 23:02:35 +02:00
nfnetlink_acct.c netfilter: nfacct: per network namespace support 2015-08-07 11:50:56 +02:00
nfnetlink_cthelper.c netfilter: Zero the tuple in nfnl_cthelper_parse_tuple() 2015-03-12 13:07:36 +01:00
nfnetlink_cttimeout.c netfilter: cttimeout: allow to set/get default protocol timeouts 2013-10-01 13:17:39 +02:00
nfnetlink_log.c netfilter: Kill unused copies of RCV_SKB_FAIL 2015-06-18 21:14:27 +02:00
nfnetlink_queue_core.c netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook 2015-06-23 06:23:23 -07:00
nfnetlink_queue_ct.c netfilter: nf_conntrack: make sequence number adjustments usuable without NAT 2013-08-28 00:26:48 +02:00
nfnetlink.c netfilter: nfnetlink: keep going batch handling on missing modules 2015-07-02 17:59:33 +02:00
nft_bitwise.c netfilter: nf_tables: support variable sized data in nft_data_init() 2015-04-13 17:17:30 +02:00
nft_byteorder.c netfilter: nf_tables: switch registers to 32 bit addressing 2015-04-13 17:17:29 +02:00
nft_cmp.c netfilter: nf_tables: support variable sized data in nft_data_init() 2015-04-13 17:17:30 +02:00
nft_compat.c netfilter: x_tables: add context to know if extension runs from nft_compat 2015-05-15 20:14:07 +02:00
nft_counter.c netfilter: nft_counter: convert it to use per-cpu counters 2015-08-07 11:49:48 +02:00
nft_ct.c netfilter: nf_tables: switch registers to 32 bit addressing 2015-04-13 17:17:29 +02:00
nft_dynset.c netfilter: nft_dynset: dynamic stateful expression instantiation 2015-04-13 20:19:55 +02:00
nft_exthdr.c netfilter: nf_tables: switch registers to 32 bit addressing 2015-04-13 17:17:29 +02:00
nft_hash.c netfilter: nf_tables: variable sized set element keys / data 2015-04-13 17:17:31 +02:00
nft_immediate.c netfilter: nf_tables: support variable sized data in nft_data_init() 2015-04-13 17:17:30 +02:00
nft_limit.c netfilter: nft_limit: add per-byte limiting 2015-08-07 11:50:50 +02:00
nft_log.c netfilter: nf_tables: get rid of NFT_REG_VERDICT usage 2015-04-13 17:17:07 +02:00
nft_lookup.c netfilter: nf_tables: add flag to indicate set contains expressions 2015-04-13 20:12:32 +02:00
nft_masq.c netfilter: nf_tables: validate hooks in NAT expressions 2015-01-19 14:52:39 +01:00
nft_meta.c net: #ifdefify sk_classid member of struct sock 2015-07-21 16:04:30 -07:00
nft_nat.c netfilter: nf_tables: switch registers to 32 bit addressing 2015-04-13 17:17:29 +02:00
nft_payload.c netfilter: nf_tables: switch registers to 32 bit addressing 2015-04-13 17:17:29 +02:00
nft_queue.c netfilter: nf_tables: get rid of NFT_REG_VERDICT usage 2015-04-13 17:17:07 +02:00
nft_rbtree.c netfilter: nf_tables: variable sized set element keys / data 2015-04-13 17:17:31 +02:00
nft_redir.c netfilter: nf_tables: add register parsing/dumping helpers 2015-04-13 17:17:28 +02:00
nft_reject_inet.c netfilter; Add some missing default cases to switch statements in nft_reject. 2015-04-27 13:20:34 -04:00
nft_reject.c netfilter; Add some missing default cases to switch statements in nft_reject. 2015-04-27 13:20:34 -04:00
x_tables.c netfilter: add and use jump label for xt_tee 2015-07-15 18:18:06 +02:00
xt_addrtype.c ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST 2015-05-25 13:25:33 -04:00
xt_AUDIT.c netfilter: Convert uses of __constant_<foo> to <foo> 2014-03-13 14:13:19 +01:00
xt_bpf.c net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
xt_cgroup.c netfilter: x_tables: fix cgroup matching on non-full sks 2015-04-01 11:26:42 +02:00
xt_CHECKSUM.c
xt_CLASSIFY.c
xt_cluster.c net: use reciprocal_scale() helper 2014-08-23 12:21:21 -07:00
xt_comment.c
xt_connbytes.c netfilter: Convert pr_warning to pr_warn 2014-09-10 12:40:10 -07:00
xt_connlabel.c
xt_connlimit.c netfilter: nf_conntrack: push zone object into functions 2015-08-11 12:29:01 +02:00
xt_connmark.c netfilter: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
xt_CONNSECMARK.c
xt_conntrack.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
xt_cpu.c
xt_CT.c netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
xt_dccp.c
xt_devgroup.c
xt_dscp.c
xt_DSCP.c netfilter: fix various sparse warnings 2014-11-13 12:14:42 +01:00
xt_ecn.c
xt_esp.c
xt_hashlimit.c netfilter: Remove checks of seq_printf() return values 2014-11-05 14:11:02 -05:00
xt_helper.c
xt_hl.c
xt_HL.c
xt_HMARK.c net: use reciprocal_scale() helper 2014-08-23 12:21:21 -07:00
xt_IDLETIMER.c netfilter: IDLETIMER: fix lockdep warning 2015-07-13 17:23:25 +02:00
xt_ipcomp.c netfilter: xt_ipcomp: Use ntohs to ease sparse warning 2014-02-19 11:41:25 +01:00
xt_iprange.c
xt_ipvs.c
xt_l2tp.c netfilter: introduce l2tp match extension 2014-01-09 21:36:39 +01:00
xt_LED.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-08-05 18:46:26 -07:00
xt_length.c
xt_limit.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
xt_LOG.c netfilter: xt_LOG: add missing string format in nf_log_packet() 2014-06-28 18:50:35 +02:00
xt_mac.c
xt_mark.c netfilter: xt_MARK: Add ARP support 2015-05-14 13:00:27 +02:00
xt_multiport.c
xt_nat.c
xt_NETMAP.c
xt_nfacct.c netfilter: nfacct: per network namespace support 2015-08-07 11:50:56 +02:00
xt_NFLOG.c netfilter: log: netns NULL ptr bug when calling from conntrack 2013-05-15 14:11:07 +02:00
xt_NFQUEUE.c netfilter: xt_NFQUEUE: separate reusable code 2013-12-07 23:20:45 +01:00
xt_osf.c netfilter: xt_osf: Use continue to reduce indentation 2014-12-23 14:20:10 +01:00
xt_owner.c
xt_physdev.c netfilter: physdev: use helpers 2015-04-08 16:49:09 +02:00
xt_pkttype.c
xt_policy.c
xt_quota.c
xt_rateest.c net_sched: add 64bit rate estimators 2013-06-11 02:51:03 -07:00
xt_RATEEST.c net: sched: make bstats per cpu and estimator RCU safe 2014-09-30 01:02:26 -04:00
xt_realm.c
xt_recent.c netfilter: xt_recent: don't reject rule if new hitcount exceeds table max 2015-02-16 17:00:47 +01:00
xt_REDIRECT.c netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module 2014-11-27 13:08:42 +01:00
xt_repldata.h net: netfilter: LLVMLinux: vlais-netfilter 2014-06-07 11:44:39 -07:00
xt_sctp.c
xt_SECMARK.c
xt_set.c netfilter: ipset: Fix coding styles reported by checkpatch.pl 2015-06-14 10:40:18 +02:00
xt_socket.c netfilter: xt_socket: add XT_SOCKET_RESTORESKMARK flag 2015-06-18 13:05:09 +02:00
xt_state.c
xt_statistic.c net: replace macros net_random and net_srandom with direct calls to prandom 2014-01-14 15:15:25 -08:00
xt_string.c net: Remove state argument from skb_find_text() 2015-02-22 15:59:54 -05:00
xt_tcpmss.c
xt_TCPMSS.c netfilter: x_tables: add context to know if extension runs from nft_compat 2015-05-15 20:14:07 +02:00
xt_TCPOPTSTRIP.c netfilter: xt_TCPOPTSTRIP: fix possible off by one access 2013-08-01 11:45:15 +02:00
xt_tcpudp.c
xt_TEE.c netfilter: factor out packet duplication for IPv4/IPv6 2015-08-07 11:49:49 +02:00
xt_time.c
xt_TPROXY.c inet: inet_twsk_deschedule factorization 2015-07-09 15:12:20 -07:00
xt_TRACE.c
xt_u32.c