linux/net
Vladimir Oltean 2b86cb8299 net: dsa: declare lockless TX feature for slave ports
Be there a platform with the following layout:

      Regular NIC
       |
       +----> DSA master for switch port
               |
               +----> DSA master for another switch port

After changing DSA back to static lockdep class keys in commit
1a33e10e4a ("net: partially revert dynamic lockdep key changes"), this
kernel splat can be seen:

[   13.361198] ============================================
[   13.366524] WARNING: possible recursive locking detected
[   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
[   13.377874] --------------------------------------------
[   13.383201] swapper/0/0 is trying to acquire lock:
[   13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.397879]
[   13.397879] but task is already holding lock:
[   13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.413593]
[   13.413593] other info that might help us debug this:
[   13.420140]  Possible unsafe locking scenario:
[   13.420140]
[   13.426075]        CPU0
[   13.428523]        ----
[   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.440924]
[   13.440924]  *** DEADLOCK ***
[   13.440924]
[   13.446860]  May be due to missing lock nesting notation
[   13.446860]
[   13.453668] 6 locks held by swapper/0/0:
[   13.457598]  #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
[   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
[   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[   13.492793]  #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[   13.512000]
[   13.512000] stack backtrace:
[   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
[   13.530421] Call trace:
[   13.532871]  dump_backtrace+0x0/0x1d8
[   13.536539]  show_stack+0x24/0x30
[   13.539862]  dump_stack+0xe8/0x150
[   13.543271]  __lock_acquire+0x1030/0x1678
[   13.547290]  lock_acquire+0xf8/0x458
[   13.550873]  _raw_spin_lock+0x44/0x58
[   13.554543]  __dev_queue_xmit+0x84c/0xbe0
[   13.558562]  dev_queue_xmit+0x24/0x30
[   13.562232]  dsa_slave_xmit+0xe0/0x128
[   13.565988]  dev_hard_start_xmit+0xf4/0x448
[   13.570182]  __dev_queue_xmit+0x808/0xbe0
[   13.574200]  dev_queue_xmit+0x24/0x30
[   13.577869]  neigh_resolve_output+0x15c/0x220
[   13.582237]  ip6_finish_output2+0x244/0xb10
[   13.586430]  __ip6_finish_output+0x1dc/0x298
[   13.590709]  ip6_output+0x84/0x358
[   13.594116]  mld_sendpack+0x2bc/0x560
[   13.597786]  mld_ifc_timer_expire+0x210/0x390
[   13.602153]  call_timer_fn+0xcc/0x400
[   13.605822]  run_timer_softirq+0x588/0x6e0
[   13.609927]  __do_softirq+0x118/0x590
[   13.613597]  irq_exit+0x13c/0x148
[   13.616918]  __handle_domain_irq+0x6c/0xc0
[   13.621023]  gic_handle_irq+0x6c/0x160
[   13.624779]  el1_irq+0xbc/0x180
[   13.627927]  cpuidle_enter_state+0xb4/0x4d0
[   13.632120]  cpuidle_enter+0x3c/0x50
[   13.635703]  call_cpuidle+0x44/0x78
[   13.639199]  do_idle+0x228/0x2c8
[   13.642433]  cpu_startup_entry+0x2c/0x48
[   13.646363]  rest_init+0x1ac/0x280
[   13.649773]  arch_call_rest_init+0x14/0x1c
[   13.653878]  start_kernel+0x490/0x4bc

Lockdep keys themselves were added in commit ab92d68fc2 ("net: core:
add generic lockdep keys"), and it's very likely that this splat existed
since then, but I have no real way to check, since this stacked platform
wasn't supported by mainline back then.

>From Taehee's own words:

  This patch was considered that all stackable devices have LLTX flag.
  But the dsa doesn't have LLTX, so this splat happened.
  After this patch, dsa shares the same lockdep class key.
  On the nested dsa interface architecture, which you illustrated,
  the same lockdep class key will be used in __dev_queue_xmit() because
  dsa doesn't have LLTX.
  So that lockdep detects deadlock because the same lockdep class key is
  used recursively although actually the different locks are used.
  There are some ways to fix this problem.

  1. using NETIF_F_LLTX flag.
  If possible, using the LLTX flag is a very clear way for it.
  But I'm so sorry I don't know whether the dsa could have LLTX or not.

  2. using dynamic lockdep again.
  It means that each interface uses a separate lockdep class key.
  So, lockdep will not detect recursive locking.
  But this way has a problem that it could consume lockdep class key
  too many.
  Currently, lockdep can have 8192 lockdep class keys.
   - you can see this number with the following command.
     cat /proc/lockdep_stats
     lock-classes:                         1251 [max: 8192]
     ...
     The [max: 8192] means that the maximum number of lockdep class keys.
  If too many lockdep class keys are registered, lockdep stops to work.
  So, using a dynamic(separated) lockdep class key should be considered
  carefully.
  In addition, updating lockdep class key routine might have to be existing.
  (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())

  3. Using lockdep subclass.
  A lockdep class key could have 8 subclasses.
  The different subclass is considered different locks by lockdep
  infrastructure.
  But "lock-classes" is not counted by subclasses.
  So, it could avoid stopping lockdep infrastructure by an overflow of
  lockdep class keys.
  This approach should also have an updating lockdep class key routine.
  (lockdep_set_subclass())

  4. Using nonvalidate lockdep class key.
  The lockdep infrastructure supports nonvalidate lockdep class key type.
  It means this lockdep is not validated by lockdep infrastructure.
  So, the splat will not happen but lockdep couldn't detect real deadlock
  case because lockdep really doesn't validate it.
  I think this should be used for really special cases.
  (lockdep_set_novalidate_class())

Further discussion here:
https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/

There appears to be no negative side-effect to declaring lockless TX for
the DSA virtual interfaces, which means they handle their own locking.
So that's what we do to make the splat go away.

Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.

Fixes: ab92d68fc2 ("net: core: add generic lockdep keys")
Suggested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27 15:09:28 -07:00
..
6lowpan
9p 9pnet: allow making incomplete read requests 2020-03-27 09:29:56 +00:00
802 net: 802: psnap.c: Use built-in RCU list checking 2020-02-24 13:02:53 -08:00
8021q net: vlan: suppress "failed to kill vid" warnings 2020-02-17 14:30:54 -08:00
appletalk
atm atm: fix a memory leak of vcc->user_back 2020-05-04 11:59:38 -07:00
ax25 ax25: fix setsockopt(SO_BINDTODEVICE) 2020-05-20 20:59:07 -07:00
batman-adv batman-adv: Fix refcnt leak in batadv_v_ogm_process 2020-04-21 10:08:05 +02:00
bluetooth Bluetooth: L2CAP: Use DEFER_SETUP to group ECRED connections 2020-03-25 22:16:08 +01:00
bpf bpf: Fix build warning regarding missing prototypes 2020-03-28 18:13:18 +01:00
bpfilter SPDX patches for 5.7-rc1. 2020-04-03 13:12:26 -07:00
bridge bridge: multicast: work around clang bug 2020-05-27 11:34:48 -07:00
caif net: caif: Add lockdep expression to RCU traversal primitive 2020-03-11 22:55:25 -07:00
can can: j1939: j1939_sk_bind(): take priv after lock is held 2019-12-08 11:52:02 +01:00
ceph libceph: directly skip to the end of redirect reply 2020-03-30 12:42:41 +02:00
core flow_dissector: Drop BPF flow dissector prog ref on netns cleanup 2020-05-21 17:52:45 -07:00
dcb
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-02-29 15:53:35 -08:00
decnet Remove DST_HOST 2020-03-23 21:57:44 -07:00
dns_resolver KEYS: Don't write out to userspace while holding key semaphore 2020-03-29 12:40:41 +01:00
dsa net: dsa: declare lockless TX feature for slave ports 2020-05-27 15:09:28 -07:00
ethernet net: remove eth_change_mtu 2020-01-27 11:09:31 +01:00
ethtool ethtool: count header size in reply size estimate 2020-05-21 16:59:19 -07:00
hsr net: hsr: fix incorrect type usage for protocol variable 2020-05-06 15:00:20 -07:00
ieee802154 nl802154: add missing attribute validation for dev_type 2020-03-03 13:28:48 -08:00
ife
ipv4 ipv4: nexthop version of fib_info_nh_uses_dev 2020-05-26 16:06:07 -07:00
ipv6 net: don't return invalid table id error when we fall back to PF_UNSPEC 2020-05-21 17:25:50 -07:00
iucv treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
kcm net: kcm: kcmproc.c: Fix RCU list suspicious usage warning 2020-03-16 17:14:02 -07:00
key
l2tp l2tp: Allow management of tunnels and session in user namespace 2020-04-08 14:30:46 -07:00
l3mdev
lapb
llc af_llc: fix if-statement empty body warning 2020-02-26 20:38:13 -08:00
mac80211 mac80211: mesh: fix discovery timer re-arming issue / crash 2020-05-25 10:31:16 +02:00
mac802154
mpls net: add net available in build_state 2020-03-29 22:30:57 -07:00
mptcp mptcp: avoid NULL-ptr derefence on fallback 2020-05-26 20:18:24 -07:00
ncsi net/ncsi: Support for multi host mellanox card 2020-01-09 18:36:22 -08:00
netfilter netfilter: nfnetlink_cthelper: unbreak userspace helper support 2020-05-25 20:39:14 +02:00
netlabel netlabel: cope with NULL catmap 2020-05-12 18:12:40 -07:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-25 18:58:11 -07:00
netrom net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node 2020-04-18 13:09:46 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
nsh
openvswitch net: openvswitch: ovs_ct_exit to be done under ovs_lock 2020-04-20 10:53:54 -07:00
packet net/packet: tpacket_rcv: avoid a producer race condition 2020-03-15 00:25:25 -07:00
phonet net: Remove redundant BUG_ON() check in phonet_pernet 2020-01-03 12:25:50 -08:00
psample net: psample: fix skb_over_panic 2019-11-26 14:40:13 -08:00
qrtr net: qrtr: Fix passing invalid reference to qrtr_local_enqueue() 2020-05-21 17:04:53 -07:00
rds net/rds: Use ERR_PTR for rds_message_alloc_sgs() 2020-04-15 12:33:29 -07:00
rfkill rfkill: Fix incorrect check to avoid NULL pointer dereference 2019-12-16 10:15:49 +01:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-01-26 10:40:21 +01:00
rxrpc rxrpc: Fix a memory leak in rxkad_verify_response() 2020-05-23 00:35:46 +01:00
sched net/sched: fix infinite loop in sch_fq_pie 2020-05-27 11:11:18 -07:00
sctp net: sctp: Fix spelling in Kconfig help 2020-05-26 20:32:06 -07:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
strparser
sunrpc NFS client bugfixes for Linux 5.7 2020-05-15 14:03:13 -07:00
switchdev net: switchdev: do not propagate bridge updates across bridges 2020-02-26 20:58:33 -08:00
tipc tipc: block BH before using dst_cache 2020-05-22 15:39:00 -07:00
tls net/tls: fix race condition causing kernel panic 2020-05-25 17:41:40 -07:00
unix net: datagram: drop 'destructor' argument from several helpers 2020-02-28 12:12:53 -08:00
vmw_vsock vsock: fix timeout in vsock_accept() 2020-05-27 11:20:23 -07:00
wimax
wireless cfg80211: fix debugfs rename crash 2020-05-25 13:12:32 +02:00
x25 net/x25: Fix null-ptr-deref in x25_disconnect 2020-04-28 14:08:59 -07:00
xdp xsk: Add missing check on user supplied headroom size 2020-04-15 13:07:18 +02:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2020-03-30 10:59:20 -07:00
compat.c net: abstract out normal and compat msghdr import 2020-03-10 09:12:49 -06:00
Kconfig net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build 2020-03-25 12:24:33 -07:00
Makefile mptcp: Add MPTCP socket stubs 2020-01-24 13:44:07 +01:00
socket.c for-5.7/io_uring-2020-03-29 2020-03-30 12:18:49 -07:00
sysctl_net.c