linux/net
Vladimir Oltean 161ca59d39 net: dsa: reference count the MDB entries at the cross-chip notifier level
Ever since the cross-chip notifiers were introduced, the design was
meant to be simplistic and just get the job done without worrying too
much about dangling resources left behind.

For example, somebody installs an MDB entry on sw0p0 in this daisy chain
topology. It gets installed using ds->ops->port_mdb_add() on sw0p0,
sw1p4 and sw2p4.

                                                    |
           sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
        [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
        [   x   ] [       ] [       ] [       ] [       ]
                                          |
                                          +---------+
                                                    |
           sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
        [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
        [       ] [       ] [       ] [       ] [   x   ]
                                          |
                                          +---------+
                                                    |
           sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
        [  user ] [  user ] [  user ] [  user ] [  dsa  ]
        [       ] [       ] [       ] [       ] [   x   ]

Then the same person deletes that MDB entry. The cross-chip notifier for
deletion only matches sw0p0:

                                                    |
           sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
        [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
        [   x   ] [       ] [       ] [       ] [       ]
                                          |
                                          +---------+
                                                    |
           sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
        [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
        [       ] [       ] [       ] [       ] [       ]
                                          |
                                          +---------+
                                                    |
           sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
        [  user ] [  user ] [  user ] [  user ] [  dsa  ]
        [       ] [       ] [       ] [       ] [       ]

Why?

Because the DSA links are 'trunk' ports, if we just go ahead and delete
the MDB from sw1p4 and sw2p4 directly, we might delete those multicast
entries when they are still needed. Just consider the fact that somebody
does:

- add a multicast MAC address towards sw0p0 [ via the cross-chip
  notifiers it gets installed on the DSA links too ]
- add the same multicast MAC address towards sw0p1 (another port of that
  same switch)
- delete the same multicast MAC address from sw0p0.

At this point, if we deleted the MAC address from the DSA links, it
would be flooded, even though there is still an entry on switch 0 which
needs it not to.

So that is why deletions only match the targeted source port and nothing
on DSA links. Of course, dangling resources means that the hardware
tables will eventually run out given enough additions/removals, but hey,
at least it's simple.

But there is a bigger concern which needs to be addressed, and that is
our support for SWITCHDEV_OBJ_ID_HOST_MDB. DSA simply translates such an
object into a dsa_port_host_mdb_add() which ends up as ds->ops->port_mdb_add()
on the upstream port, and a similar thing happens on deletion:
dsa_port_host_mdb_del() will trigger ds->ops->port_mdb_del() on the
upstream port.

When there are 2 VLAN-unaware bridges spanning the same switch (which is
a use case DSA proudly supports), each bridge will install its own
SWITCHDEV_OBJ_ID_HOST_MDB entries. But upon deletion, DSA goes ahead and
emits a DSA_NOTIFIER_MDB_DEL for dp->cpu_dp, which is shared between the
user ports enslaved to br0 and the user ports enslaved to br1. Not good.
The host-trapped multicast addresses installed by br1 will be deleted
when any state changes in br0 (IGMP timers expire, or ports leave, etc).

To avoid this, we could of course go the route of the zero-sum game and
delete the DSA_NOTIFIER_MDB_DEL call for dp->cpu_dp. But the better
design is to just admit that on shared ports like DSA links and CPU
ports, we should be reference counting calls, even if this consumes some
dynamic memory which DSA has traditionally avoided. On the flip side,
the hardware tables of switches are limited in size, so it would be good
if the OS managed them properly instead of having them eventually
overflow.

To address the memory usage concern, we only apply the refcounting of
MDB entries on ports that are really shared (CPU ports and DSA links)
and not on user ports. In a typical single-switch setup, this means only
the CPU port (and the host MDB entries are not that many, really).

The name of the newly introduced data structures (dsa_mac_addr) is
chosen in such a way that will be reusable for host FDB entries (next
patch).

With this change, we can finally have the same matching logic for the
MDB additions and deletions, as well as for their host-trapped variants.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29 10:46:23 -07:00
..
6lowpan 6lowpan: Fix some typos in nhc_udp.c 2021-03-24 17:52:11 -07:00
9p 9p/trans_virtio: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
802
8021q net: vlan: pass thru all GSO_SOFTWARE in hw_enc_features 2021-06-18 11:58:03 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
atm atm: Use list_for_each_entry() to simplify code in resources.c 2021-06-10 14:08:09 -07:00
ax25 net/ax25: Delete obsolete TODO file 2021-03-30 16:54:50 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
bluetooth Bluetooth: Fix handling of HCI_LE_Advertising_Set_Terminated event 2021-06-26 07:12:45 +02:00
bpf bpf: Prepare bpf syscall to be used from kernel and user space. 2021-05-19 00:33:40 +02:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge net: bridge: allow br_fdb_replay to be called for the bridge device 2021-06-29 10:46:23 -07:00
caif net: caif: modify the label out_err to out 2021-06-18 12:07:09 -07:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
ceph libceph: Fix spelling mistakes 2021-06-03 13:24:23 -07:00
core net: update netdev_rx_csum_fault() print dump only once 2021-06-28 15:54:57 -07:00
dcb net: dcb: Return the correct errno code 2021-06-01 17:01:33 -07:00
dccp dccp: tfrc: fix doc warnings in tfrc_equation.c 2021-06-10 14:08:49 -07:00
decnet decnet: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
dns_resolver
dsa net: dsa: reference count the MDB entries at the cross-chip notifier level 2021-06-29 10:46:23 -07:00
ethernet of: net: pass the dst buffer to of_get_mac_address() 2021-04-13 14:35:02 -07:00
ethtool ethtool: Validate module EEPROM offset as part of policy 2021-06-22 10:40:54 -07:00
hsr net: hsr: don't check sequence number if tag removal is offloaded 2021-06-16 12:13:01 -07:00
ieee802154 ieee802154: fix error return code in ieee802154_llsec_getparams() 2021-06-03 10:59:49 +02:00
ife
ipv4 ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages 2021-06-28 14:29:45 -07:00
ipv6 ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages 2021-06-28 14:29:45 -07:00
iucv net/af_iucv: clean up some forward declarations 2021-06-12 13:06:33 -07:00
kcm revert "net: kcm: fix memory leak in kcm_sendmsg" 2021-06-07 13:34:37 -07:00
key net: Remove unnecessary variables 2021-05-26 07:03:39 +02:00
l2tp l2tp: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
l3mdev l3mdev: Correct function names in the kerneldoc comments 2021-03-28 17:56:55 -07:00
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc llc2: Remove redundant assignment to rc 2021-04-27 14:16:14 -07:00
mac80211 mac80211: Switch to a virtual time-based airtime scheduler 2021-06-23 18:12:00 +02:00
mac802154 net: mac802154: Fix general protection fault 2021-04-06 22:42:16 +02:00
mpls mpls: Remove redundant assignment to err 2021-04-27 14:17:00 -07:00
mptcp mptcp: fix 'masking a bool' warning 2021-06-28 13:04:06 -07:00
ncsi net/ncsi: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2021-06-23 12:31:28 -07:00
netlabel netlabel: Fix memory leak in netlbl_mgmt_add_common 2021-06-15 11:19:04 -07:00
netlink netlink: disable IRQs for netlink_lock_table() 2021-05-17 15:31:03 -07:00
netrom net: netrom: nr_in: Remove redundant assignment to ns 2021-04-28 13:59:08 -07:00
nfc Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2021-06-07 13:01:52 -07:00
nsh
openvswitch openvswitch: add trace points 2021-06-22 10:47:32 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
phonet
psample psample: Add additional metadata attributes 2021-03-14 15:00:43 -07:00
qrtr Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
rfkill Another set of updates, all over the map: 2021-04-20 16:44:04 -07:00
rose net: rose: Fix fall-through warnings for Clang 2021-03-10 12:45:15 -08:00
rxrpc rxrpc: Fix a typo 2021-06-02 14:01:55 -07:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-06-28 15:28:03 -07:00
sctp sctp: send the next probe immediately once the last one is acked 2021-06-24 12:58:03 -07:00
smc net/smc: Ensure correct state of the socket in send path 2021-06-25 11:53:51 -07:00
strparser
sunrpc xprtrdma: Revert 586a0787ce 2021-05-27 08:46:19 -04:00
switchdev net: switchdev: add a context void pointer to struct switchdev_notifier_info 2021-06-28 14:09:03 -07:00
tipc net: tipc: replace align() with ALIGN in msg.c 2021-06-28 13:31:57 -07:00
tls skbuff: add a parameter to __skb_frag_unref 2021-06-07 14:11:47 -07:00
unix __unix_find_socket_byname(): don't pass hash and type separately 2021-06-21 12:28:49 -07:00
vmw_vsock virtio/vsock: avoid NULL deref in virtio_transport_seqpacket_allow() 2021-06-22 09:49:37 -07:00
wireless cfg80211: Support hidden AP discovery over 6GHz band 2021-06-23 13:05:09 +02:00
x25 net: x25: Use list_for_each_entry() to simplify code in x25_route.c 2021-06-10 14:08:09 -07:00
xdp xdp: Add proper __rcu annotations to redirect map entries 2021-06-24 19:41:15 +02:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git 2021-06-28 13:17:16 -07:00
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
Kconfig bpf, kconfig: Add consolidated menu entry for bpf with core options 2021-05-11 13:56:16 -07:00
Makefile
socket.c net: add pf_family_names[] for protocol family 2021-06-21 14:41:54 -07:00
sysctl_net.c net: Ensure net namespace isolation of sysctls 2021-04-12 13:27:11 -07:00