linux/net
Daniel Borkmann deedb59039 netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.

The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.

As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.

Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.

If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).

Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.

Below a minimal, simplified collision example (script in [2]) with
netperf sessions:

  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:

  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully

  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

conntrack dump from gateway netns:

  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns

  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
                           src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
                        src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
               [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
                        src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
               [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.

I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.

  [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
  [2] https://paste.fedoraproject.org/242835/65657871/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-18 01:22:50 +02:00
..
6lowpan 6lowpan: add request for ipv6 module 2015-07-23 17:10:48 +02:00
9p virtio/vhost: fixes for 4.2 2015-07-23 13:07:04 -07:00
802
8021q vlan: Add GRO support for non hardware accelerated vlan 2015-06-01 16:50:52 -07:00
appletalk net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
atm br2684: Remove unnecessary formatting macros b1 and bs 2015-07-31 15:25:52 -07:00
ax25 NET: AX.25: Stop heartbeat timer on disconnect. 2015-07-15 15:59:58 -07:00
batman-adv batman-adv: change the MAC of each VLAN upon ndo_set_mac_address 2015-06-07 17:07:20 +02:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-07-31 23:52:20 -07:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2015-08-04 23:57:45 -07:00
caif caif: fix leaks and race in caif_queue_rcv_skb() 2015-07-21 00:02:44 -07:00
can can: replace timestamp as unique skb attribute 2015-07-12 21:13:22 +02:00
ceph libceph: treat sockaddr_storage with uninitialized family as blank 2015-07-09 20:30:34 +03:00
core lwtunnel: set skb protocol and dev 2015-08-03 22:26:13 -07:00
dcb
dccp tcp: fix recv with flags MSG_WAITALL | MSG_PEEK 2015-07-27 01:06:53 -07:00
decnet net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
dns_resolver
dsa net: dsa: Add netconsole support 2015-07-31 15:45:37 -07:00
ethernet net: Add full IPv6 addresses to flow_keys 2015-06-04 15:44:30 -07:00
hsr
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-07-31 23:52:20 -07:00
ipv4 netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
ipv6 netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
ipx net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
irda irda: use msecs_to_jiffies for conversion to jiffies 2015-05-25 17:46:21 -04:00
iucv net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-06-24 16:49:49 -07:00
l2tp net: Modify sk_alloc to not reference count the netns of kernel sockets. 2015-05-11 10:50:18 -04:00
lapb
llc tcp: fix recv with flags MSG_WAITALL | MSG_PEEK 2015-07-27 01:06:53 -07:00
mac80211 cfg80211: use RTNL locked reg_can_beacon for IR-relaxation 2015-07-17 15:02:02 +02:00
mac802154 mac802154: Fix memory corruption with global deferred transmit state. 2015-07-30 14:08:55 +02:00
mpls af_mpls: add null dev check in find_outdev 2015-08-06 22:03:58 -07:00
netfilter netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
netlabel netlink: implement nla_put_in_addr and nla_put_in6_addr 2015-03-31 13:58:35 -04:00
netlink netlink: don't hold mutex in rcu callback when releasing mmapd ring 2015-07-21 22:22:56 -07:00
netrom netfilter: Remove spurios included of netfilter.h 2015-06-18 21:14:32 +02:00
nfc NFC: nci: fix mistake in uart generic driver 2015-06-15 18:10:37 +02:00
openvswitch openvswitch: Re-add CONFIG_OPENVSWITCH_VXLAN 2015-07-29 23:03:10 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-07-31 23:52:20 -07:00
phonet net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
rds Changes for 4.2-rc 2015-07-15 17:03:03 -07:00
rfkill net: rfkill: gpio: make better use of gpiod API 2015-05-29 13:13:45 +02:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-24 02:58:51 -07:00
rxrpc net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
sched netfilter: nf_conntrack: add direction support for zones 2015-08-18 01:22:50 +02:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-07-31 23:52:20 -07:00
sunrpc NFS client bugfixes for Linux 4.2 2015-07-28 09:37:44 -07:00
switchdev switchdev: add offload_fwd_mark generator helper 2015-07-20 18:32:44 -07:00
tipc ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument 2015-07-31 15:21:30 -07:00
unix net/unix: support SCM_SECURITY for stream sockets 2015-06-10 22:49:20 -07:00
vmw_vsock net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
wimax
wireless cfg80211: use RTNL locked reg_can_beacon for IR-relaxation 2015-07-17 15:02:02 +02:00
x25 net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
xfrm xfrm: Fix a typo 2015-07-21 00:28:36 -07:00
compat.c net: switch importing msghdr from userland to {compat_,}import_iovec() 2015-04-09 00:02:26 -04:00
Kconfig lwtunnel: infrastructure for handling light weight tunnels like mpls 2015-07-21 10:39:03 -07:00
Makefile
socket.c net: Add a struct net parameter to sock_create_kern 2015-05-11 10:50:17 -04:00
sysctl_net.c