linux/net
David Ahern 40867d74c3 net: Add l3mdev index to flow struct and avoid oif reset for port devices
The fundamental premise of VRF and l3mdev core code is binding a socket
to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
Legacy code resets flowi_oif to the l3mdev losing any original port
device binding. Ben (among others) has demonstrated use cases where the
original port device binding is important and needs to be retained.
This patch handles that by adding a new entry to the common flow struct
that can indicate the l3mdev index for later rule and table matching
avoiding the need to reset flowi_oif.

In addition to allowing more use cases that require port device binds,
this patch brings a few datapath simplications:

1. l3mdev_fib_rule_match is only called when walking fib rules and
   always after l3mdev_update_flow. That allows an optimization to bail
   early for non-VRF type uses cases when flowi_l3mdev is not set. Also,
   only that index needs to be checked for the FIB table id.

2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
   (e.g., VRF) device. By resetting flowi_oif only for this case the
   FLOWI_FLAG_SKIP_NH_OIF flag is not longer needed and can be removed,
   removing several checks in the datapath. The flowi_iif path can be
   simplified to only be called if the it is not loopback (loopback can
   not be assigned to an L3 domain) and the l3mdev index is not already
   set.

3. Avoid another device lookup in the output path when the fib lookup
   returns a reject failure.

Note: 2 functional tests for local traffic with reject fib rules are
updated to reflect the new direct failure at FIB lookup time for ping
rather than the failure on packet path. The current code fails like this:

    HINT: Fails since address on vrf device is out of device scope
    COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
    ping: Warning: source address might be selected on device other than: eth1
    PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.

    --- 172.16.3.1 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

where the test now directly fails:

    HINT: Fails since address on vrf device is out of device scope
    COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
    ping: connect: No route to host

Signed-off-by: David Ahern <dsahern@kernel.org>
Tested-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-15 20:20:02 -07:00
..
6lowpan net: don't include ndisc.h from ipv6.h 2022-02-04 14:15:11 -08:00
9p xen/9p: use alloc/free_pages_exact() 2022-03-07 09:48:55 +01:00
802
8021q Revert "vlan: move dev_put into vlan_dev_uninit" 2022-02-23 15:21:13 -08:00
appletalk
atm proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
ax25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-10 17:16:56 -08:00
batman-adv batman-adv: Use netif_rx(). 2022-03-07 11:40:41 +00:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-10 17:16:56 -08:00
bpf bpf, test_run: Fix overflow in XDP frags bpf_test_finish 2022-03-02 01:09:15 +01:00
bpfilter
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next 2022-03-15 11:52:25 -07:00
caif net: caif: Use netif_rx(). 2022-03-04 12:02:19 +00:00
can can: isotp: set max PDU size to 64 kByte 2022-03-10 09:23:45 +01:00
ceph libceph: optionally use bounce buffer on recv path in crc mode 2022-02-02 18:50:36 +01:00
core net: Add lockdep asserts to ____napi_schedule(). 2022-03-14 10:09:28 +00:00
dcb net: dcb: disable softirqs in dcbnl_flush_dev() 2022-03-03 08:01:55 -08:00
dccp dccp: remove max48() 2022-01-27 13:53:27 +00:00
decnet net: decnet: use time_is_before_jiffies() instead of open coding it 2022-02-28 13:21:32 +00:00
dns_resolver
dsa net: dsa: report and change port dscp priority using dcbnl 2022-03-14 10:36:15 +00:00
ethernet
ethtool ethtool: add support to set/get completion queue event size 2022-02-23 20:33:05 -08:00
hsr net: add per-cpu storage and net->core_stats 2022-03-11 23:17:24 -08:00
ieee802154 net: ipv6: Handle delivery_time in ipv6 defrag 2022-03-03 14:38:48 +00:00
ife
ipv4 net: Add l3mdev index to flow struct and avoid oif reset for port devices 2022-03-15 20:20:02 -07:00
ipv6 net: Add l3mdev index to flow struct and avoid oif reset for port devices 2022-03-15 20:20:02 -07:00
iucv s390/iucv: sort out physical vs virtual pointers usage 2022-02-22 16:09:13 -08:00
kcm net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
key xfrm: Check if_id in xfrm_migrate 2022-01-26 07:44:01 +01:00
l2tp l2tp: add netns refcount tracker to l2tp_dfs_seq_data 2021-12-10 06:38:27 -08:00
l3mdev net: Add l3mdev index to flow struct and avoid oif reset for port devices 2022-03-15 20:20:02 -07:00
lapb
llc sock: Use sock_owned_by_user_nocheck() instead of sk_lock.owned. 2021-12-10 19:43:00 -08:00
mac80211 brcmfmac 2022-03-11 13:00:17 -08:00
mac802154
mctp mctp: Avoid warning if unregister notifies twice 2022-02-25 22:23:23 -08:00
mpls net: mpls: Fix GCC 12 warning 2022-02-10 15:29:39 +00:00
mptcp mptcp: add fullmesh flag check for adding address 2022-03-08 22:06:12 -08:00
ncsi all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate 2022-01-15 08:47:31 -08:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next 2022-03-15 11:52:25 -07:00
netlabel
netlink net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
netrom netrom: fix api breakage in nr_setsockopt() 2022-01-07 14:11:05 +00:00
nfc nfc: llcp: Revert "NFC: Keep socket alive until the DISC PDU is actually sent" 2022-03-03 10:43:37 +00:00
nsh
openvswitch net: openvswitch: fix uAPI incompatibility with existing user space 2022-03-10 20:14:52 -08:00
packet net: Handle delivery_time in skb->tstamp during network tapping with af_packet 2022-03-03 14:38:48 +00:00
phonet phonet: Use netif_rx(). 2022-03-07 11:40:41 +00:00
psample
qrtr bus: mhi: core: Add an API for auto queueing buffers for DL channel 2021-12-17 17:17:14 +01:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-12-16 16:13:19 -08:00
rfkill rfkill: allow to get the software rfkill state 2021-12-20 11:02:38 +01:00
rose net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
rxrpc rxrpc: Adjust retransmission backoff 2022-01-22 02:03:24 +00:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next 2022-03-15 11:52:25 -07:00
sctp sctp: fix kernel-infoleak for SCTP sockets 2022-03-10 14:46:42 -08:00
smc net/smc: fix -Wmissing-prototypes warning when CONFIG_SYSCTL not set 2022-03-09 20:02:35 -08:00
strparser
sunrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-10 17:29:56 -08:00
switchdev net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device 2022-02-24 21:31:43 -08:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-10 17:16:56 -08:00
tls tls: cap the output scatter list to something reasonable 2022-02-04 10:14:07 +00:00
unix Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2022-01-24 15:42:29 -08:00
vmw_vsock vsock: remove vsock from connected table when connect is interrupted by a signal 2022-02-17 08:56:02 -08:00
wireless brcmfmac 2022-03-11 13:00:17 -08:00
x25 net: x25: drop harmless check of !more 2021-12-09 18:35:11 -08:00
xdp i40e: xsk: Move tmp desc array from driver to pool 2022-01-27 17:25:32 +01:00
xfrm net: add per-cpu storage and net->core_stats 2022-03-11 23:17:24 -08:00
compat.c
devres.c
Kconfig page_pool: Add allocation stats 2022-03-03 09:55:28 +00:00
Kconfig.debug net: add networking namespace refcount tracker 2021-12-10 06:38:26 -08:00
Makefile
socket.c net: fix documentation for kernel_getsockname 2022-02-14 14:01:19 +00:00
sysctl_net.c