linux/drivers/net
Eugene Crosser 55161e67d4 vrf: Revert "Reset skb conntrack connection..."
This reverts commit 09e856d54b.

When an interface is enslaved in a VRF, the prerouting conntrack hook is
called twice: once in the context of the original input interface, and
once in the context of the VRF interface. If no special precautions are
taken, this leads to creation of two conntrack entries instead of one,
and breaks SNAT.

The commit above was intended to avoid creation of extra conntrack
entries when the input interface is enslaved in a VRF. It did so by
resetting the conntrack-related data associated with the skb when it
enters VRF context.

However, it breaks netfilter operation. Imagine a use case where the
conntrack zone must be assigned based on the original input interface
rather than the VRF interface (which would make the original interfaces
indistinguishable). One could create netfilter rules similar to these:

        chain rawprerouting {
                type filter hook prerouting priority raw;
                iif realiface1 ct zone set 1 return
                iif realiface2 ct zone set 2 return
        }

This works before the mentioned commit, but not after: zone assignment
is "forgotten", and any subsequent NAT or filtering that is dependent
on the conntrack zone does not work.

Here is a reproducer script that demonstrates the difference in behaviour.

==========
#!/bin/sh

# This script demonstrates unexpected change of nftables behaviour
# caused by commit 09e856d54b "vrf: Reset skb conntrack
# connection on VRF rcv"
# https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1
#
# Before the commit, it was possible to assign conntrack zone to a
# packet (or mark it for `notracking`) in the prerouting chain, raw
# priority, based on the `iif` (interface from which the packet
# arrived).
# After the change, if the interface is enslaved in a VRF, such
# assignment is lost. Instead, assignment based on the `iif` matching
# the VRF master interface is honored. Thus it is impossible to
# distinguish packets based on the original interface.
#
# This script demonstrates this change of behaviour: conntrack zone 1
# or 2 is assigned depending on the match with the original interface
# or the vrf master interface. It can be observed that conntrack entry
# appears in different zone in the kernel versions before and after
# the commit.

IPIN=172.30.30.1
IPOUT=172.30.30.2
PFXL=30

ip li sh vein >/dev/null 2>&1 && ip li del vein
ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
nft list table testct >/dev/null 2>&1 && nft delete table testct

ip li add vein type veth peer veout
ip li add tvrf type vrf table 9876
ip li set veout master tvrf
ip li set vein up
ip li set veout up
ip li set tvrf up
/sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
/sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0
ip addr add $IPIN/$PFXL dev vein
ip addr add $IPOUT/$PFXL dev veout

nft -f - <<__END__
table testct {
	chain rawpre {
		type filter hook prerouting priority raw;
		iif { veout, tvrf } meta nftrace set 1
		iif veout ct zone set 1 return
		iif tvrf ct zone set 2 return
		notrack
	}
	chain rawout {
		type filter hook output priority raw;
		notrack
	}
}
__END__

uname -rv
conntrack -F
ping -W 1 -c 1 -I vein $IPOUT
conntrack -L
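
The zone a connection ended up in can be read from the `zone=` field of
the `conntrack -L` listing. The helper below is a minimal sketch for
pulling that field out, not part of the original reproducer. The name
`ct_zones` is hypothetical, and it assumes that entries outside the
default zone carry a `zone=N` token while entries without one belong to
zone 0.

```shell
# ct_zones: read `conntrack -L` style lines on stdin and print each
# distinct zone number once, in ascending order.
# Assumption: a non-default zone appears as a "zone=N" token on the
# entry line; a line without such a token is counted as zone 0.
ct_zones() {
	awk '
	NF == 0 { next }                      # skip blank lines
	{
		zone = 0
		for (i = 1; i <= NF; i++)
			if ($i ~ /^zone=[0-9]+$/)
				zone = substr($i, 6)  # text after "zone="
		print zone
	}' | sort -nu
}
```

It could be run right after the ping as `conntrack -L 2>/dev/null | ct_zones`.
Going by the description in the script header, zone 1 would mean that
the rule matching veout took effect, and zone 2 that the rule matching
tvrf did.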

Signed-off-by: Eugene Crosser <crosser@average.org>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20 11:27:19 +01:00
appletalk
arcnet
bonding bonding: 3ad: pass parameter bond_params by reference 2021-09-07 10:28:50 +01:00
caif
can can: peak_usb: pcan_usb_fd_decode_status(): remove unnecessary test on the nullity of a pointer 2021-10-17 22:51:51 +02:00
dsa net: dsa: mt7530: correct ds->num_ports 2021-10-18 13:22:21 +01:00
ethernet cavium: Fix return values of the probe function 2021-10-19 13:09:57 +01:00
fddi fddi: switch from 'pci_' to 'dma_' API 2021-08-29 10:50:24 +01:00
fjes
hamradio hamradio: baycom_epp: fix build for UML 2021-10-18 12:57:42 +01:00
hippi
hyperv
ieee802154
ipa asm-generic: build fixes for v5.15 2021-10-08 11:57:54 -07:00
ipvlan
mctp
mdio net: mdio-ipq4019: Fix the error for an optional regs resource 2021-09-28 17:28:54 -07:00
netdevsim Driver core update for 5.15-rc1 2021-09-01 08:44:42 -07:00
pcs net: pcs: xpcs: fix incorrect steps on disable EEE 2021-10-06 11:18:27 +01:00
phy net: phy: Do not shutdown PHYs in READY state 2021-10-09 13:47:37 +01:00
plip
ppp
slip
team
usb lan78xx: select CRC32 2021-10-15 14:34:35 +01:00
vmxnet3 ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
wan net: wan: wanxl: define CROSS_COMPILE_M68K 2021-09-16 14:08:04 +01:00
wireguard
wireless asm-generic: build fixes for v5.15 2021-10-08 11:57:54 -07:00
wwan Networking stragglers and fixes for 5.15-rc1, including changes from netfilter, 2021-09-07 14:02:58 -07:00
xen-netback xen-netback: correct success/error reporting for the SKB-with-fraglist case 2021-09-19 12:10:26 +01:00
bareudp.c
dummy.c
eql.c
geneve.c
gtp.c
ifb.c
Kconfig
LICENSE.SRC
loopback.c
macsec.c
macvlan.c
macvtap.c
Makefile
mdio.c
mhi_net.c drivers: net: mhi: fix error path in mhi_net_newlink 2021-09-24 14:25:05 +01:00
mii.c
net_failover.c
netconsole.c
nlmon.c
ntb_netdev.c
rionet.c
sb1000.c
Space.c
sungem_phy.c
tap.c
thunderbolt.c
tun.c ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
veth.c
virtio_net.c virtio-net: fix for skb_over_panic inside big mode 2021-10-09 13:50:33 +01:00
vrf.c vrf: Revert "Reset skb conntrack connection..." 2021-10-20 11:27:19 +01:00
vsockmon.c
vxlan.c nexthop: Fix memory leaks in nexthop notification chain listeners 2021-09-23 12:33:22 +01:00
xen-netfront.c xen/netfront: don't trust the backend response data blindly 2021-08-25 10:43:21 +01:00