linux/net
Vladimir Oltean 90dc8fd360 net: bridge: notify switchdev of disappearance of old FDB entry upon migration
Currently the bridge emits atomic switchdev notifications for
dynamically learnt FDB entries. Monitoring these notifications works
wonders for switchdev drivers that want to keep their hardware FDB in
sync with the bridge's FDB.

For example station A wants to talk to station B in the diagram below,
and we are concerned with the behavior of the bridge on the DUT device:

                   DUT
 +-------------------------------------+
 |                 br0                 |
 | +------+ +------+ +------+ +------+ |
 | |      | |      | |      | |      | |
 | | swp0 | | swp1 | | swp2 | | eth0 | |
 +-------------------------------------+
      |        |                  |
  Station A    |                  |
               |                  |
         +--+------+--+    +--+------+--+
         |  |      |  |    |  |      |  |
         |  | swp0 |  |    |  | swp0 |  |
 Another |  +------+  |    |  +------+  | Another
  switch |     br0    |    |     br0    | switch
         |  +------+  |    |  +------+  |
         |  |      |  |    |  |      |  |
         |  | swp1 |  |    |  | swp1 |  |
         +--+------+--+    +--+------+--+
                                  |
                              Station B

Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has
the following property: frames injected from its control interface bypass
the internal address analyzer logic, and therefore, this hardware does
not learn from the source address of packets transmitted by the network
stack through it. So, since bridging between eth0 (where Station B is
attached) and swp0 (where Station A is attached) is done in software,
the switchdev hardware will never learn the source address of Station B.
So the traffic towards that destination will be treated as unknown, i.e.
flooded.

This is where the bridge notifications come in handy. When br0 on the
DUT sees frames with Station B's MAC address on eth0, the switchdev
driver gets these notifications and can install a rule to send frames
towards Station B's address that are incoming from swp0, swp1, swp2,
only towards the control interface. This is all switchdev driver private
business, which the notification makes possible.

All is fine until someone unplugs Station B's cable and moves it to the
other switch:

                   DUT
 +-------------------------------------+
 |                 br0                 |
 | +------+ +------+ +------+ +------+ |
 | |      | |      | |      | |      | |
 | | swp0 | | swp1 | | swp2 | | eth0 | |
 +-------------------------------------+
      |        |                  |
  Station A    |                  |
               |                  |
         +--+------+--+    +--+------+--+
         |  |      |  |    |  |      |  |
         |  | swp0 |  |    |  | swp0 |  |
 Another |  +------+  |    |  +------+  | Another
  switch |     br0    |    |     br0    | switch
         |  +------+  |    |  +------+  |
         |  |      |  |    |  |      |  |
         |  | swp1 |  |    |  | swp1 |  |
         +--+------+--+    +--+------+--+
               |
           Station B

Luckily for the use cases we care about, Station B is noisy enough that
the DUT hears it (on swp1 this time). swp1 receives the frames and
delivers them to the bridge, who enters the unlikely path in br_fdb_update
of updating an existing entry. It moves the entry in the software bridge
to swp1 and emits an addition notification towards that.

As far as the switchdev driver is concerned, all that it needs to ensure
is that traffic between Station A and Station B is not forever broken.
If it does nothing, then the stale rule to send frames for Station B
towards the control interface remains in place. But Station B is no
longer reachable via the control interface, but via a port that can
offload the bridge port learning attribute. It's just that the port is
prevented from learning this address, since the rule overrides FDB
updates. So the rule needs to go. The question is via what mechanism.

It sure would be possible for this switchdev driver to keep track of all
addresses which are sent to the control interface, and then also listen
for bridge notifier events on its own ports, searching for the ones that
have a MAC address which was previously sent to the control interface.
But this is cumbersome and inefficient. Instead, with one small change,
the bridge could notify of the address deletion from the old port, in a
symmetrical manner with how it did for the insertion. Then the switchdev
driver would not be required to monitor learn/forget events for its own
ports. It could just delete the rule towards the control interface upon
bridge entry migration. This would make hardware address learning be
possible again. Then it would take a few more packets until the hardware
and software FDB would be in sync again.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:34:45 -08:00
..
6lowpan
9p 9p for 5.11-rc1 2020-12-21 10:28:02 -08:00
802
8021q
appletalk net: appletalk: fix kerneldoc warnings 2020-10-30 11:48:17 -07:00
atm atm: nicstar: Replace in_interrupt() usage 2020-11-18 16:43:55 -08:00
ax25
batman-adv batman-adv: Drop unused soft-interface.h include in fragmentation.c 2020-12-04 08:41:16 +01:00
bluetooth Bluetooth: Increment management interface revision 2020-12-07 17:02:00 +02:00
bpf bpf: fix raw_tp test run in preempt kernel 2020-09-30 08:34:08 -07:00
bpfilter Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" 2020-10-15 12:33:24 -07:00
bridge net: bridge: notify switchdev of disappearance of old FDB entry upon migration 2021-01-07 15:34:45 -08:00
caif
can can: raw: return -ERANGE when filterset does not fit into user space buffer 2021-01-06 15:15:41 +01:00
ceph libceph: align session_key and con_secret to 16 bytes 2020-12-28 20:34:33 +01:00
core udp_tunnel: hard-wire NDOs to udp_tunnel_nic_*_port() helpers 2021-01-07 12:53:29 -08:00
dcb net: dcb: Validate netlink message in DCB handler 2020-12-23 12:19:48 -08:00
dccp selinux/stable-5.11 PR 20201214 2020-12-16 11:01:04 -08:00
decnet treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
dns_resolver
dsa net: dsa: print error on invalid port index 2021-01-06 16:21:08 -08:00
ethernet net: datagram: fix some kernel-doc markups 2020-11-17 14:15:03 -08:00
ethtool ethtool: fix error paths in ethnl_set_channels() 2020-12-16 13:27:17 -08:00
hsr genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
ieee802154 treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
ife
ipv4 udp_tunnel: reshuffle NETIF_F_RX_UDP_TUNNEL_PORT checks 2021-01-07 12:53:29 -08:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2020-12-18 18:07:14 -08:00
iucv net/af_iucv: use DECLARE_SOCKADDR to cast from sockaddr 2020-12-08 15:56:53 -08:00
kcm net: kcm: Replace fput with sockfd_put 2021-01-05 16:02:32 -08:00
key
l2tp lsm,selinux: pass flowi_common instead of flowi to the LSM hooks 2020-11-23 18:36:21 -05:00
l3mdev net: l3mdev: Fix kerneldoc warning 2020-10-30 11:43:42 -07:00
lapb net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event 2021-01-04 13:42:41 -08:00
llc net: llc: Fix kerneldoc warnings 2020-10-30 11:34:09 -07:00
mac80211 A new set of wireless changes: 2020-12-12 10:07:56 -08:00
mac802154 net: mac802154: convert tasklets to use new tasklet_setup() API 2020-11-07 10:40:56 -08:00
mpls mpls: drop skb's dst in mpls_forward() 2020-11-03 12:55:53 -08:00
mptcp net: mptcp: cap forward allocation to 1M 2020-12-28 13:53:57 -08:00
ncsi net/ncsi: Use real net-device for response handler 2020-12-23 12:22:23 -08:00
netfilter netfilter: nftables: add set expression flags 2020-12-28 10:50:26 +01:00
netlabel Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-11-19 19:08:46 -08:00
netlink netlink: export policy in extended ACK 2020-10-09 20:22:32 -07:00
netrom
nfc net: nfc: nci: Change the NCI close sequence 2021-01-05 16:06:34 -08:00
nsh
openvswitch net: openvswitch: fix TTL decrement exception action execution 2020-12-14 17:18:25 -08:00
packet net: af_packet: fix procfs header for 64-bit pointers 2020-12-18 12:17:23 -08:00
phonet
psample genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
qrtr wireless-drivers-next patches for v5.11 2020-12-04 10:56:37 -08:00
rds rds: stop using dmapool 2020-11-17 15:22:06 -04:00
rfkill rfkill: add a reason to the HW rfkill state 2020-12-11 12:47:17 +01:00
rose rose: Fix Null pointer dereference in rose_send_frame() 2020-11-20 10:04:58 -08:00
rxrpc net: rxrpc: convert comma to semicolon 2020-12-09 16:23:07 -08:00
sched net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sctp sctp: Fix some typo 2020-11-23 17:44:11 -08:00
smc net/smc: fix access to parent of an ib device 2020-12-16 13:33:47 -08:00
strparser
sunrpc NFS client updates for Linux 5.11 2020-12-17 12:15:03 -08:00
switchdev
tipc net: tipc: Replace expression with offsetof() 2021-01-05 15:43:41 -08:00
tls net: fix proc_fs init handling in af_packet and tls 2020-12-14 19:39:30 -08:00
unix networking changes for the 5.10 merge window 2020-10-15 18:42:13 -07:00
vmw_vsock af_vsock: Assign the vsock transport considering the vsock address flags 2020-12-14 19:33:39 -08:00
wireless A new set of wireless changes: 2020-12-12 10:07:56 -08:00
x25 net: x25: Remove unimplemented X.25-over-LLC code stubs 2020-12-12 17:15:33 -08:00
xdp xsk: Rollback reservation at NETDEV_TX_BUSY 2020-12-18 16:10:21 +01:00
xfrm selinux/stable-5.11 PR 20201214 2020-12-16 11:01:04 -08:00
compat.c iov_iter: transparently handle compat iovecs in import_iovec 2020-10-03 00:02:13 -04:00
devres.c
Kconfig wimax: move out to staging 2020-10-29 19:27:45 +01:00
Makefile wimax: move out to staging 2020-10-29 19:27:45 +01:00
socket.c for-5.11/io_uring-2020-12-14 2020-12-16 12:44:05 -08:00
sysctl_net.c