Commit Graph

1043918 Commits

Author SHA1 Message Date
Jakub Kicinski
cbcc5072c2 Merge branch 'fix-two-possible-memory-leak-problems-in-nfc-digital-module'
Ziyang Xuan says:

====================
Fix two possible memory leak problems in NFC digital module.
====================

Link: https://lore.kernel.org/r/cover.1634111083.git.william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13 17:44:32 -07:00
Ziyang Xuan
291c932fc3 NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
'skb' is allocated in digital_in_send_sdd_req(), but not free when
digital_in_send_cmd() failed, which will cause memory leak. Fix it
by freeing 'skb' if digital_in_send_cmd() return failed.

Fixes: 2c66daecc4 ("NFC Digital: Add NFC-A technology support")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13 17:44:29 -07:00
Ziyang Xuan
58e7dcc9ca NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
'params' is allocated in digital_tg_listen_mdaa(), but not free when
digital_send_cmd() failed, which will cause memory leak. Fix it by
freeing 'params' if digital_send_cmd() return failed.

Fixes: 1c7a4c24fb ("NFC Digital: Add target NFC-DEP support")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13 17:44:29 -07:00
Ziyang Xuan
0911ab3189 nfc: fix error handling of nfc_proto_register()
When nfc proto id is using, nfc_proto_register() return -EBUSY error
code, but forgot to unregister proto. Fix it by adding proto_unregister()
in the error handling case.

Fixes: c7fe3b52c1 ("NFC: add NFC socket family")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20211013034932.2833737-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13 17:32:38 -07:00
Vladimir Oltean
1f922d9e37 Revert "net: procfs: add seq_puts() statement for dev_mcast"
This reverts commit ec18e84554.

It turns out that there are user space programs which got broken by that
change. One example is the "ifstat" program shipped by Debian:
https://packages.debian.org/source/bullseye/ifstat
which, confusingly enough, seems to not have anything in common with the
much more familiar (at least to me) ifstat program from iproute2:
https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/misc/ifstat.c

root@debian:~# ifstat
ifstat: /proc/net/dev: unsupported format.

This change modified the header (first two lines of text) in
/proc/net/dev so that it looks like this:

root@debian:~# cat /proc/net/dev
Interface|                            Receive                                       |                                 Transmit
         |            bytes      packets errs   drop fifo frame compressed multicast|            bytes      packets errs   drop fifo colls carrier compressed
       lo:            97400         1204    0      0    0     0          0         0            97400         1204    0      0    0     0       0          0
    bond0:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
     sit0:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
     eno2:          5002206         6651    0      0    0     0          0         0        105518642      1465023    0      0    0     0       0          0
     swp0:           134531         2448    0      0    0     0          0         0         99599598      1464381    0      0    0     0       0          0
     swp1:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
     swp2:          4867675         4203    0      0    0     0          0         0            58134          631    0      0    0     0       0          0
    sw0p0:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
    sw0p1:           124739         2448    0   1422    0     0          0         0         93741184      1464369    0      0    0     0       0          0
    sw0p2:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
    sw2p0:          4850863         4203    0      0    0     0          0         0            54722          619    0      0    0     0       0          0
    sw2p1:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
    sw2p2:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
    sw2p3:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
      br0:            10508          212    0    212    0     0          0       212         61369558       958857    0      0    0     0       0          0

whereas before it looked like this:

root@debian:~# cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:   13160     164    0    0    0     0          0         0    13160     164    0    0    0     0       0          0
 bond0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  sit0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  eno2:   30824     268    0    0    0     0          0         0     3332      37    0    0    0     0       0          0
  swp0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  swp1:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  swp2:   30824     268    0    0    0     0          0         0     2428      27    0    0    0     0       0          0
 sw0p0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 sw0p1:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 sw0p2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 sw2p0:   29752     268    0    0    0     0          0         0     1564      17    0    0    0     0       0          0
 sw2p1:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 sw2p2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 sw2p3:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0

The reason why the ifstat shipped by Debian (v1.1, with a Debian patch
upgrading it to 1.1-8.1 at the time of writing) is broken is because its
"proc" driver/backend parses the header very literally:

main/drivers.c#L825
  if (!data->checked && strncmp(buf, "Inter-|", 7))
    goto badproc;

and there's no way in which the header can be changed such that programs
parsing like that would not get broken.

Even if we fix this ancient and very "lightly" maintained program to
parse the text output of /proc/net/dev in a more sensible way, this
story seems bound to repeat again with other programs, and modifying
them all could cause more trouble than it's worth. On the other hand,
the reverted patch had no other reason than an aesthetic one, so
reverting it is the simplest way out.

I don't know what other distributions would be affected; the fact that
Debian doesn't ship the iproute2 version of the program (a different
code base altogether, which uses netlink and not /proc/net/dev) is
surprising in itself.

Fixes: ec18e84554 ("net: procfs: add seq_puts() statement for dev_mcast")
Link: https://lore.kernel.org/netdev/20211009163511.vayjvtn3rrteglsu@skbuf/
Cc: Yajun Deng <yajun.deng@linux.dev>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211013001909.3164185-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13 17:24:38 -07:00
Nanyong Sun
f03dca0c9e net: encx24j600: check error in devm_regmap_init_encx24j600
devm_regmap_init may return error which caused by like out of memory,
this will results in null pointer dereference later when reading
or writing register:

general protection fault in encx24j600_spi_probe
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 0 PID: 286 Comm: spi-encx24j600- Not tainted 5.15.0-rc2-00142-g9978db750e31-dirty #11 9c53a778c1306b1b02359f3c2bbedc0222cba652
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:regcache_cache_bypass drivers/base/regmap/regcache.c:540
Code: 54 41 89 f4 55 53 48 89 fb 48 83 ec 08 e8 26 94 a8 fe 48 8d bb a0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 03 00 00 4c 8d ab b0 00 00 00 48 8b ab a0 00
RSP: 0018:ffffc900010476b8 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: 0000000000000000
RDX: 0000000000000012 RSI: ffff888002de0000 RDI: 0000000000000094
RBP: ffff888013c9a000 R08: 0000000000000000 R09: fffffbfff3f9cc6a
R10: ffffc900010476e8 R11: fffffbfff3f9cc69 R12: 0000000000000001
R13: 000000000000000a R14: ffff888013c9af54 R15: ffff888013c9ad08
FS:  00007ffa984ab580(0000) GS:ffff88801fe00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055a6384136c8 CR3: 000000003bbe6003 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 encx24j600_spi_probe drivers/net/ethernet/microchip/encx24j600.c:459
 spi_probe drivers/spi/spi.c:397
 really_probe drivers/base/dd.c:517
 __driver_probe_device drivers/base/dd.c:751
 driver_probe_device drivers/base/dd.c:782
 __device_attach_driver drivers/base/dd.c:899
 bus_for_each_drv drivers/base/bus.c:427
 __device_attach drivers/base/dd.c:971
 bus_probe_device drivers/base/bus.c:487
 device_add drivers/base/core.c:3364
 __spi_add_device drivers/spi/spi.c:599
 spi_add_device drivers/spi/spi.c:641
 spi_new_device drivers/spi/spi.c:717
 new_device_store+0x18c/0x1f1 [spi_stub 4e02719357f1ff33f5a43d00630982840568e85e]
 dev_attr_store drivers/base/core.c:2074
 sysfs_kf_write fs/sysfs/file.c:139
 kernfs_fop_write_iter fs/kernfs/file.c:300
 new_sync_write fs/read_write.c:508 (discriminator 4)
 vfs_write fs/read_write.c:594
 ksys_write fs/read_write.c:648
 do_syscall_64 arch/x86/entry/common.c:50
 entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:113

Add error check in devm_regmap_init_encx24j600 to avoid this situation.

Fixes: 04fbfce7a2 ("net: Microchip encx24j600 driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Link: https://lore.kernel.org/r/20211012125901.3623144-1-sunnanyong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13 15:25:25 -07:00
Jakub Kicinski
b70b152173 mlx5-fixes-2021-10-12
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmFl9XcACgkQSD+KveBX
 +j5oMwf/eL9WL4LcghB/CK2k8RsvxHNe10xsNCzfTY+HtLSnr9o2igVqDrPGkJO5
 Ik3V0ZqEdJ3NumYOWFK9jTi76q7qx8U4lLkiER29l03uqgHtYTK6stlc9xgGMKws
 pjPPeATJjK4aYMHT9roPXn8pPEdsKi0cUSS23hTcdRA53rybwSePbOT3jq5Wd8tb
 vH/T/MHQPQGmmmQKIAXz7PTBJmpRxgXd4eDavROWWojO+xKQK/YEampNkg/jdEe1
 D3kvfVGo3oU4f3Rz32jLFeNni7cLG3VKZfI4Pu5Iq1wq341HCO7XbTkv9MGLXtR6
 1XW+ZxNoezvyHG6jkxy1tsR5BwGDtw==
 =DlnK
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2021-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2021-10-12

* tag 'mlx5-fixes-2021-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors
  net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
  net/mlx5e: Switchdev representors are not vlan challenged
  net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path
  net/mlx5e: Allow only complete TXQs partition in MQPRIO channel mode
  net/mlx5: Fix cleanup of bridge delayed work
====================

Link: https://lore.kernel.org/r/20211012205323.20123-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13 13:39:55 -07:00
Vegard Nossum
427f974d97 net: korina: select CRC32
Fix the following build/link error by adding a dependency on the CRC32
routines:

  ld: drivers/net/ethernet/korina.o: in function `korina_multicast_list':
  korina.c:(.text+0x1af): undefined reference to `crc32_le'

Fixes: ef11291bcd ("Add support the Korina (IDT RC32434) Ethernet MAC")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Florian fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211012152509.21771-1-vegard.nossum@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13 13:28:35 -07:00
Vegard Nossum
e599ee234a net: arc: select CRC32
Fix the following build/link error by adding a dependency on the CRC32
routines:

  ld: drivers/net/ethernet/arc/emac_main.o: in function `arc_emac_set_rx_mode':
  emac_main.c:(.text+0xb11): undefined reference to `crc32_le'

The crc32_le() call comes through the ether_crc_le() call in
arc_emac_set_rx_mode().

[v2: moved the select to ARC_EMAC_CORE; the Makefile is a bit confusing,
but the error comes from emac_main.o, which is part of the arc_emac module,
which in turn is enabled by CONFIG_ARC_EMAC_CORE. Note that arc_emac is
different from emac_arc...]

Fixes: 775dd682e2 ("arc_emac: implement promiscuous mode and multicast filtering")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Link: https://lore.kernel.org/r/20211012093446.1575-1-vegard.nossum@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13 09:00:10 -07:00
Jakub Kicinski
847c6bdba8 Merge branch 'felix-dsa-driver-fixes'
Vladimir Oltean says:

====================
Felix DSA driver fixes

This is an assorted collection of fixes for issues seen on the NXP
LS1028A switch.

- PTP packet drops due to switch congestion result in catastrophic
  damage to the driver's state
- loops are not blocked by STP if using the ocelot-8021q tagger
- driver uses the wrong CPU port when two of them are defined in DT
- module autoloading is broken* with both tagging protocol drivers
  (ocelot and ocelot-8021q)

Changes in v2:
- Stop printing that we aren't going to take TX timestamps if we don't
  have TX timestamping anyway, and we are just carrying PTP frames for a
  cascaded DSA switch.
- Shorten the deferred xmit kthread name so that it fits the 16
  character limit (TASK_COMM_LEN)
====================

Link: https://lore.kernel.org/r/20211012114044.2526146-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:22 -07:00
Vladimir Oltean
8d5f7954b7 net: dsa: felix: break at first CPU port during init and teardown
The NXP LS1028A switch has two Ethernet ports towards the CPU, but only
one of them is capable of acting as an NPI port at a time (inject and
extract packets using DSA tags).

However, using the alternative ocelot-8021q tagging protocol, it should
be possible to use both CPU ports symmetrically, but for that we need to
mark both ports in the device tree as DSA masters.

In the process of doing that, it can be seen that traffic to/from the
network stack gets broken, and this is because the Felix driver iterates
through all DSA CPU ports and configures them as NPI ports. But since
there can only be a single NPI port, we effectively end up in a
situation where DSA thinks the default CPU port is the first one, but
the hardware port configured to be an NPI is the last one.

I would like to treat this as a bug, because if the updated device trees
are going to start circulating, it would be really good for existing
kernels to support them, too.

Fixes: adb3dccf09 ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:19 -07:00
Vladimir Oltean
43ba33b4f1 net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports
When setting up a bridge with stp_state 1, topology changes are not
detected and loops are not blocked. This is because the standard way of
transmitting a packet, based on VLAN IDs redirected by VCAP IS2 to the
right egress port, does not override the port STP state (in the case of
Ocelot switches, that's really the PGID_SRC masks).

To force a packet to be injected into a port that's BLOCKING, we must
send it as a control packet, which means in the case of this tagger to
send it using the manual register injection method. We already do this
for PTP frames, extend the logic to apply to any link-local MAC DA.

Fixes: 7c83a7c539 ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:19 -07:00
Vladimir Oltean
1328a88325 net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent
At present, when a PTP packet which requires TX timestamping gets
dropped under congestion by the switch, things go downhill very fast.
The driver keeps a clone of that skb in a queue of packets awaiting TX
timestamp interrupts, but interrupts will never be raised for the
dropped packets.

Moreover, matching timestamped packets to timestamps is done by a 2-bit
timestamp ID, and this can wrap around and we can match on the wrong skb.

Since with the default NPI-based tagging protocol, we get no notification
about packet drops, the best we can do is eventually recover from the
drop of a PTP frame: its skb will be dead memory until another skb which
was assigned the same timestamp ID happens to find it.

However, with the ocelot-8021q tagger which injects packets using the
manual register interface, it appears that we can check for more
information, such as:

- whether the input queue has reached the high watermark or not
- whether the injection group's FIFO can accept additional data or not

so we know that a PTP frame is likely to get dropped before actually
sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it
can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet).

But when we do that, we can also remove the skb from the timestamping
queue, because there surely won't be any timestamp that matches it.

Fixes: 0a6f17c6ae ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:18 -07:00
Vladimir Oltean
49f885b2d9 net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib
Michael reported that when using the "ocelot-8021q" tagging protocol,
the switch driver module must be manually loaded before the tagging
protocol can be loaded/is available.

This appears to be the same problem described here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
where due to the fact that DSA tagging protocols make use of symbols
exported by the switch drivers, circular dependencies appear and this
breaks module autoloading.

The ocelot_8021q driver needs the ocelot_can_inject() and
ocelot_port_inject_frame() functions from the switch library. Previously
the wrong approach was taken to solve that dependency: shims were
provided for the case where the ocelot switch library was compiled out,
but that turns out to be insufficient, because the dependency when the
switch lib _is_ compiled is problematic too.

We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
static inline functions, because these access I/O functions like
__ocelot_write_ix() which is called by ocelot_write_rix(). Making those
static inline basically means exposing the whole guts of the ocelot
switch library, not ideal...

We already have one tagging protocol driver which calls into the switch
driver during xmit but not using any exported symbol: sja1105_defer_xmit.
We can do the same thing here: create a kthread worker and one work item
per skb, and let the switch driver itself do the register accesses to
send the skb, and then consume it.

Fixes: 0a6f17c6ae ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:18 -07:00
Vladimir Oltean
deab6b1cd9 net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver
As explained here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
DSA tagging protocol drivers cannot depend on symbols exported by switch
drivers, because this creates a circular dependency that breaks module
autoloading.

The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
exported by the common ocelot switch lib. This function looks at
OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
DSA tag, for PTP timestamping (the command: one-step/two-step, and the
TX timestamp identifier).

None of that requires deep insight into the driver, it is quite
stateless, as it only depends upon the skb->cb. So let's make it a
static inline function and put it in include/linux/dsa/ocelot.h, a
file that despite its name is used by the ocelot switch driver for
populating the injection header too - since commit 40d3f295b5 ("net:
mscc: ocelot: use common tag parsing code with DSA").

With that function declared as static inline, its body is expanded
inside each call site, so the dependency is broken and the DSA tagger
can be built without the switch library, upon which the felix driver
depends.

Fixes: 39e5308b32 ("net: mscc: ocelot: support PTP Sync one-step timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:18 -07:00
Vladimir Oltean
ebb4c6a990 net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header
The sad reality is that when a PTP frame with a TX timestamping request
is transmitted, it isn't guaranteed that it will make it all the way to
the wire (due to congestion inside the switch), and that a timestamp
will be taken by the hardware and placed in the timestamp FIFO where an
IRQ will be raised for it.

The implication is that if enough PTP frames are silently dropped by the
hardware such that the timestamp ID has rolled over, it is possible to
match a timestamp to an old skb.

Furthermore, nobody will match on the real skb corresponding to this
timestamp, since we stupidly matched on a previous one that was stale in
the queue, and stopped there.

So PTP timestamping will be broken and there will be no way to recover.

It looks like the hardware parses the sequenceID from the PTP header,
and also provides that metadata for each timestamp. The driver currently
ignores this, but it shouldn't.

As an extra resiliency measure, do the following:

- check whether the PTP sequenceID also matches between the skb and the
  timestamp, treat the skb as stale otherwise and free it

- if we see a stale skb, don't stop there and try to match an skb one
  more time, chances are there's one more skb in the queue with the same
  timestamp ID, otherwise we wouldn't have ever found the stale one (it
  is by timestamp ID that we matched it).

While this does not prevent PTP packet drops, it at least prevents
the catastrophic consequences of incorrect timestamp matching.

Since we already call ptp_classify_raw in the TX path, save the result
in the skb->cb of the clone, and just use that result in the interrupt
code path.

Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:18 -07:00
Vladimir Oltean
fba01283d8 net: mscc: ocelot: deny TX timestamping of non-PTP packets
It appears that Ocelot switches cannot timestamp non-PTP frames,
I tested this using the isochron program at:
https://github.com/vladimiroltean/tsn-scripts

with the result that the driver increments the ocelot_port->ts_id
counter as expected, puts it in the REW_OP, but the hardware seems to
not timestamp these packets at all, since no IRQ is emitted.

Therefore check whether we are sending PTP frames, and refuse to
populate REW_OP otherwise.

Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:18 -07:00
Vladimir Oltean
9fde506e0c net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb
When skb_match is NULL, it means we received a PTP IRQ for a timestamp
ID that the kernel has no idea about, since there is no skb in the
timestamping queue with that timestamp ID.

This is a grave error and not something to just "continue" over.
So print a big warning in case this happens.

Also, move the check above ocelot_get_hwtimestamp(), there is no point
in reading the full 64-bit current PTP time if we're not going to do
anything with it anyway for this skb.

Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:18 -07:00
Vladimir Oltean
52849bcf00 net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO
PTP packets with 2-step TX timestamp requests are matched to packets
based on the egress port number and a 6-bit timestamp identifier.
All PTP timestamps are held in a common FIFO that is 128 entry deep.

This patch ensures that back-to-back timestamping requests cannot exceed
the hardware FIFO capacity. If that happens, simply send the packets
without requesting a TX timestamp to be taken (in the case of felix,
since the DSA API has a void return code in ds->ops->port_txtstamp) or
drop them (in the case of ocelot).

I've moved the ts_id_lock from a per-port basis to a per-switch basis,
because we need separate accounting for both numbers of PTP frames in
flight. And since we need locking to inc/dec the per-switch counter,
that also offers protection for the per-port counter and hence there is
no reason to have a per-port counter anymore.

Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:17 -07:00
Vladimir Oltean
c57fe0037a net: mscc: ocelot: make use of all 63 PTP timestamp identifiers
At present, there is a problem when user space bombards a port with PTP
event frames which have TX timestamping requests (or when a tc-taprio
offload is installed on a port, which delays the TX timestamps by a
significant amount of time). The driver will happily roll over the 2-bit
timestamp ID and this will cause incorrect matches between an skb and
the TX timestamp collected from the FIFO.

The Ocelot switches have a 6-bit PTP timestamp identifier, and the value
63 is reserved, so that leaves identifiers 0-62 to be used.

The timestamp identifiers are selected by the REW_OP packet field, and
are actually shared between CPU-injected frames and frames which match a
VCAP IS2 rule that modifies the REW_OP. The hardware supports
partitioning between the two uses of the REW_OP field through the
PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP
IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2.

The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping,
and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in
ocelot_init_timestamp() which makes all timestamp identifiers available
to CPU injection.

Therefore, we can make use of all 63 timestamp identifiers, which should
allow more timestampable packets to be in flight on each port. This is
only part of the solution, more issues will be addressed in future changes.

Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:17 -07:00
Jakub Kicinski
3af760e4d3 Merge branch 'fix-circular-dependency-between-sja1105-and-tag_sja1105'
Vladimir Oltean says:

====================
Fix circular dependency between sja1105 and tag_sja1105

As discussed here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
DSA tagging protocols cannot use symbols exported by switch drivers.

Eliminate the two instances of that from tag_sja1105, and that allows us
to have a working setup with modules again.
====================

Re-applying to net, this was mistakenly applied to net-next,
see first Link.

Link: https://lore.kernel.org/r/20211012114044.2526146-1-vladimir.oltean@nxp.com/
Link: https://lore.kernel.org/r/20210922143726.2431036-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:33:59 -07:00
Vladimir Oltean
4ac0567e40 net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver
It's nice to be able to test a tagging protocol with dsa_loop, but not
at the cost of losing the ability of building the tagging protocol and
switch driver as modules, because as things stand, there is a circular
dependency between the two. Tagging protocol drivers cannot depend on
switch drivers, that is a hard fact.

The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.

Currently the "sja1105" and "sja1110" tagging protocols only operate
with the sja1105 switch driver, just like any other tagging protocol and
switch combination. The only way to mix and match them is by modifying
the code, and this applies to dsa_loop as well (by default that uses
DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
practice there isn't one.

Until we extend dsa_loop to allow user space configuration, treat the
problem as a non-issue and just say that DSA ports found by tag_sja1105
are always sja1105 ports, which is in fact true. But keep the
dsa_port_is_sja1105 function so that it's easy to patch it during
testing, and rely on dead code elimination.

Fixes: 994d2cbb08 ("net: dsa: tag_sja1105: be dsa_loop-safe")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:33:36 -07:00
Vladimir Oltean
28da0555c3 net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver
The problem is that DSA tagging protocols really must not depend on the
switch driver, because this creates a circular dependency at insmod
time, and the switch driver will effectively not load when the tagging
protocol driver is missing.

The code was structured in the way it was for a reason, though. The DSA
driver-facing API for PTP timestamping relies on the assumption that
two-step TX timestamps are provided by the hardware in an out-of-band
manner, typically by raising an interrupt and making that timestamp
available inside some sort of FIFO which is to be accessed over
SPI/MDIO/etc.

So the API puts .port_txtstamp into dsa_switch_ops, because it is
expected that the switch driver needs to save some state (like put the
skb into a queue until its TX timestamp arrives).

On SJA1110, TX timestamps are provided by the switch as Ethernet
packets, so this makes them be received and processed by the tagging
protocol driver. This in itself is great, because the timestamps are
full 64-bit and do not require reconstruction, and since Ethernet is the
fastest I/O method available to/from the switch, PTP timestamps arrive
very quickly, no matter how bottlenecked the SPI connection is, because
SPI interaction is not needed at all.

DSA's code structure and strict isolation between the tagging protocol
driver and the switch driver break the natural code organization.

When the tagging protocol driver receives a packet which is classified
as a metadata packet containing timestamps, it passes those timestamps
one by one to the switch driver, which then proceeds to compare them
based on the recorded timestamp ID that was generated in .port_txtstamp.

The communication between the tagging protocol and the switch driver is
done through a method exported by the switch driver, sja1110_process_meta_tstamp.
To satisfy build requirements, we force a dependency to build the
tagging protocol driver as a module when the switch driver is a module.
However, as explained in the first paragraph, that causes the circular
dependency.

To solve this, move the skb queue from struct sja1105_private :: struct
sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
The latter is a data structure for which hacks have already been put
into place to be able to create persistent storage per switch that is
accessible from the tagging protocol driver (see sja1105_setup_ports).

With the skb queue directly accessible from the tagging protocol driver,
we can now move sja1110_process_meta_tstamp into the tagging driver
itself, and avoid exporting a symbol.

Fixes: 566b18c8b7 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:33:36 -07:00
Alvin Šipraga
43a4b4dbd4 net: dsa: fix spurious error message when unoffloaded port leaves bridge
Flip the sign of a return value check, thereby suppressing the following
spurious error:

  port 2 failed to notify DSA_NOTIFIER_BRIDGE_LEAVE: -EOPNOTSUPP

... which is emitted when removing an unoffloaded DSA switch port from a
bridge.

Fixes: d371b7c92d ("net: dsa: Unset vlan_filtering when ports leave the bridge")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211012112730.3429157-1-alvin@pqrs.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 16:11:23 -07:00
Baowen Zheng
60d950f443 nfp: flow_offload: move flow_indr_dev_register from app init to app start
In commit 74fc4f8287 ("net: Fix offloading indirect devices dependency
on qdisc order creation"), it adds a process to trigger the callback to
setup the bo callback when the driver regists a callback.

In our current implement, we are not ready to run the callback when nfp
call the function flow_indr_dev_register, then there will be error
message as:

kernel: Oops: 0000 [#1] SMP PTI
kernel: CPU: 0 PID: 14119 Comm: kworker/0:0 Tainted: G
kernel: Workqueue: events work_for_cpu_fn
kernel: RIP: 0010:nfp_flower_indr_setup_tc_cb+0x258/0x410
kernel: RSP: 0018:ffffbc1e02c57bf8 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: ffff9c761fabc000 RCX: 0000000000000001
kernel: RDX: 0000000000000001 RSI: fffffffffffffff0 RDI: ffffffffc0be9ef1
kernel: RBP: ffffbc1e02c57c58 R08: ffffffffc08f33aa R09: ffff9c6db7478800
kernel: R10: 0000009c003f6e00 R11: ffffbc1e02800000 R12: ffffbc1e000d9000
kernel: R13: ffffbc1e000db428 R14: ffff9c6db7478800 R15: ffff9c761e884e80
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: fffffffffffffff0 CR3: 00000009e260a004 CR4: 00000000007706f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: ? flow_indr_dev_register+0xab/0x210
kernel: ? __cond_resched+0x15/0x30
kernel: ? kmem_cache_alloc_trace+0x44/0x4b0
kernel: ? nfp_flower_setup_tc+0x1d0/0x1d0 [nfp]
kernel: flow_indr_dev_register+0x158/0x210
kernel: ? tcf_block_unbind+0xe0/0xe0
kernel: nfp_flower_init+0x40b/0x650 [nfp]
kernel: nfp_net_pci_probe+0x25f/0x960 [nfp]
kernel: ? nfp_rtsym_read_le+0x76/0x130 [nfp]
kernel: nfp_pci_probe+0x6a9/0x820 [nfp]
kernel: local_pci_probe+0x45/0x80

So we need to call flow_indr_dev_register in app start process instead of
init stage.

Fixes: 74fc4f8287 ("net: Fix offloading indirect devices dependency on qdisc order creation")
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20211012124850.13025-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 16:07:52 -07:00
Maxim Mikityanskiy
84c8a87402 net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors
Commit 846d6da1fc ("net/mlx5e: Fix division by 0 in
mlx5e_select_queue") makes mlx5e_build_nic_params assign a non-zero
initial value to priv->num_tc_x_num_ch, so that mlx5e_select_queue
doesn't fail with division by 0 if called before the first activation of
channels. However, the initialization flow of representors doesn't call
mlx5e_build_nic_params, so this bug can still happen with representors.

This commit fixes the bug by adding the missing assignment to
mlx5e_build_rep_params.

Fixes: 846d6da1fc ("net/mlx5e: Fix division by 0 in mlx5e_select_queue")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-12 13:52:03 -07:00
Aya Levin
0bc73ad46a net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
Due to current HW arch limitations, RX-FCS (scattering FCS frame field
to software) and RX-port-timestamp (improved timestamp accuracy on the
receive side) can't work together.
RX-port-timestamp is not controlled by the user and it is enabled by
default when supported by the HW/FW.
This patch sets RX-port-timestamp opposite to RX-FCS configuration.

Fixes: 102722fc68 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-12 13:51:59 -07:00
Saeed Mahameed
b2107cdc43 net/mlx5e: Switchdev representors are not vlan challenged
Before this patch, mlx5 representors advertised the
NETIF_F_VLAN_CHALLENGED bit, this could lead to missing features when
using reps with vxlan/bridge and maybe other virtual interfaces,
when such interfaces inherit this bit and block vlan usage in their
topology.

Example:
$ip link add dev bridge type bridge
 # add representor interface to the bridge
$ip link set dev pf0hpf master
$ip link add link bridge name vlan10 type vlan id 10 protocol 802.1q
Error: 8021q: VLANs not supported on device.

Reps are perfectly capable of handling vlan traffic, although they don't
implement vlan_{add,kill}_vid ndos, hence, remove
NETIF_F_VLAN_CHALLENGED advertisement.

Fixes: cb67b83292 ("net/mlx5e: Introduce SRIOV VF representors")
Reported-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
2021-10-12 13:51:56 -07:00
Valentine Fatiev
94b960b9de net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path
Prior to this patch in case mlx5_core_destroy_cq() failed it returns
without completing all destroy operations and that leads to memory leak.
Instead, complete the destroy flow before return error.

Also move mlx5_debug_cq_remove() to the beginning of mlx5_core_destroy_cq()
to be symmetrical with mlx5_core_create_cq().

kmemleak complains on:

unreferenced object 0xc000000038625100 (size 64):
  comm "ethtool", pid 28301, jiffies 4298062946 (age 785.380s)
  hex dump (first 32 bytes):
    60 01 48 94 00 00 00 c0 b8 05 34 c3 00 00 00 c0  `.H.......4.....
    02 00 00 00 00 00 00 00 00 db 7d c1 00 00 00 c0  ..........}.....
  backtrace:
    [<000000009e8643cb>] add_res_tree+0xd0/0x270 [mlx5_core]
    [<00000000e7cb8e6c>] mlx5_debug_cq_add+0x5c/0xc0 [mlx5_core]
    [<000000002a12918f>] mlx5_core_create_cq+0x1d0/0x2d0 [mlx5_core]
    [<00000000cef0a696>] mlx5e_create_cq+0x210/0x3f0 [mlx5_core]
    [<000000009c642c26>] mlx5e_open_cq+0xb4/0x130 [mlx5_core]
    [<0000000058dfa578>] mlx5e_ptp_open+0x7f4/0xe10 [mlx5_core]
    [<0000000081839561>] mlx5e_open_channels+0x9cc/0x13e0 [mlx5_core]
    [<0000000009cf05d4>] mlx5e_switch_priv_channels+0xa4/0x230
[mlx5_core]
    [<0000000042bbedd8>] mlx5e_safe_switch_params+0x14c/0x300
[mlx5_core]
    [<0000000004bc9db8>] set_pflag_tx_port_ts+0x9c/0x160 [mlx5_core]
    [<00000000a0553443>] mlx5e_set_priv_flags+0xd0/0x1b0 [mlx5_core]
    [<00000000a8f3d84b>] ethnl_set_privflags+0x234/0x2d0
    [<00000000fd27f27c>] genl_family_rcv_msg_doit+0x108/0x1d0
    [<00000000f495e2bb>] genl_family_rcv_msg+0xe4/0x1f0
    [<00000000646c5c2c>] genl_rcv_msg+0x78/0x120
    [<00000000d53e384e>] netlink_rcv_skb+0x74/0x1a0

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Valentine Fatiev <valentinef@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-12 13:51:52 -07:00
Tariq Toukan
ca20dfda05 net/mlx5e: Allow only complete TXQs partition in MQPRIO channel mode
Do not allow configurations of MQPRIO channel mode that do not
fully define and utilize the channels txqs.

Fixes: ec60c4581b ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-12 13:51:48 -07:00
Shay Drory
2266bb1e12 net/mlx5: Fix cleanup of bridge delayed work
Currently, bridge cleanup is calling to cancel_delayed_work(). When this
function is finished, there is a chance that the delayed work is still
running. Also, the delayed work is queueing itself.
As a result, we might execute the delayed work after the bridge cleanup
have finished and hit a null-ptr oops[1].

Fix it by using cancel_delayed_work_sync(), which is waiting until the
work is done and will cancel the queue work.

[1]
[ 8202.143043 ] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 8202.144438 ] #PF: supervisor write access in kernel mode
[ 8202.145476 ] #PF: error_code(0x0002) - not-present page
[ 8202.146520 ] PGD 0 P4D 0
[ 8202.147126 ] Oops: 0002 [#1] SMP NOPTI
[ 8202.147899 ] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.14.0-rc6_for_upstream_min_debug_2021_08_25_16_06 #1
[ 8202.149741 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 8202.151908 ] RIP: 0010:_raw_spin_lock+0xc/0x20
[ 8202.156234 ] RSP: 0018:ffff88846f885ea0 EFLAGS: 00010046
[ 8202.157289 ] RAX: 0000000000000000 RBX: ffff88846f880000 RCX: 0000000000000000
[ 8202.158731 ] RDX: 0000000000000001 RSI: ffff8881004000c8 RDI: 0000000000000000
[ 8202.160177 ] RBP: ffff8881fe684978 R08: ffff888100140000 R09: ffffffff824455b8
[ 8202.161569 ] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 8202.163004 ] R13: 0000000000000012 R14: 0000000000000200 R15: ffff88812992d000
[ 8202.164018 ] FS:  0000000000000000(0000) GS:ffff88846f880000(0000) knlGS:0000000000000000
[ 8202.164960 ] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8202.165634 ] CR2: 0000000000000000 CR3: 0000000108cac004 CR4: 0000000000370ea0
[ 8202.166450 ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8202.167807 ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8202.168852 ] Call Trace:
[ 8202.169421 ]  <IRQ>
[ 8202.169792 ]  __queue_work+0xf2/0x3d0
[ 8202.170481 ]  ? queue_work_node+0x40/0x40
[ 8202.171270 ]  call_timer_fn+0x2b/0x100
[ 8202.171932 ]  __run_timers.part.0+0x152/0x220
[ 8202.172717 ]  ? __hrtimer_run_queues+0x171/0x290
[ 8202.173526 ]  ? kvm_clock_get_cycles+0xd/0x10
[ 8202.174232 ]  ? ktime_get+0x35/0x90
[ 8202.174943 ]  run_timer_softirq+0x26/0x50
[ 8202.175745 ]  __do_softirq+0xc7/0x271
[ 8202.176373 ]  irq_exit_rcu+0x93/0xb0
[ 8202.176983 ]  sysvec_apic_timer_interrupt+0x72/0x90
[ 8202.177755 ]  </IRQ>
[ 8202.178245 ]  asm_sysvec_apic_timer_interrupt+0x12/0x20

Fixes: c636a0f0f3 ("net/mlx5: Bridge, dynamic entry ageing")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-12 13:51:45 -07:00
Jacob Keller
4d4a223a86 ice: fix locking for Tx timestamp tracking flush
Commit 4dd0d5c33c ("ice: add lock around Tx timestamp tracker flush")
added a lock around the Tx timestamp tracker flow which is used to
cleanup any left over SKBs and prepare for device removal.

This lock is problematic because it is being held around a call to
ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY
write command to firmware. This could lead to a deadlock if the mutex
actually sleeps, and causes the following warning on a kernel with
preemption debugging enabled:

[  715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573
[  715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod
[  715.435652] INFO: lockdep is turned off.
[  715.439591] Preemption disabled at:
[  715.439594] [<0000000000000000>] 0x0
[  715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G        W  OE     5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c
[  715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[  715.468483] Call Trace:
[  715.470940]  dump_stack_lvl+0x6a/0x9a
[  715.474613]  ___might_sleep.cold+0x224/0x26a
[  715.478895]  __mutex_lock+0xb3/0x1440
[  715.482569]  ? stack_depot_save+0x378/0x500
[  715.486763]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.494979]  ? kfree+0xc1/0x520
[  715.498128]  ? mutex_lock_io_nested+0x12a0/0x12a0
[  715.502837]  ? kasan_set_free_info+0x20/0x30
[  715.507110]  ? __kasan_slab_free+0x10b/0x140
[  715.511385]  ? slab_free_freelist_hook+0xc7/0x220
[  715.516092]  ? kfree+0xc1/0x520
[  715.519235]  ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.527359]  ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.535133]  ? pci_device_remove+0xab/0x1d0
[  715.539318]  ? __device_release_driver+0x35b/0x690
[  715.544110]  ? driver_detach+0x214/0x2f0
[  715.548035]  ? bus_remove_driver+0x11d/0x2f0
[  715.552309]  ? pci_unregister_driver+0x26/0x250
[  715.556840]  ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.564799]  ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[  715.570554]  ? do_syscall_64+0x3b/0x90
[  715.574303]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[  715.579529]  ? start_flush_work+0x542/0x8f0
[  715.583719]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.591923]  ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.599960]  ? wait_for_completion_io+0x250/0x250
[  715.604662]  ? lock_acquire+0x196/0x200
[  715.608504]  ? do_raw_spin_trylock+0xa5/0x160
[  715.612864]  ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.620813]  ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.628497]  ? __debug_check_no_obj_freed+0x1e8/0x3c0
[  715.633550]  ? trace_hardirqs_on+0x1c/0x130
[  715.637748]  ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.646220]  ? do_raw_spin_trylock+0xa5/0x160
[  715.650581]  ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.658797]  ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.667013]  ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.675403]  ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.683440]  ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.691037]  ? _raw_spin_unlock_irqrestore+0x46/0x73
[  715.696005]  pci_device_remove+0xab/0x1d0
[  715.700018]  __device_release_driver+0x35b/0x690
[  715.704637]  driver_detach+0x214/0x2f0
[  715.708389]  bus_remove_driver+0x11d/0x2f0
[  715.712489]  pci_unregister_driver+0x26/0x250
[  715.716857]  ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.724637]  __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[  715.730210]  ? free_module+0x6d0/0x6d0
[  715.733963]  ? task_work_run+0xe1/0x170
[  715.737803]  ? exit_to_user_mode_loop+0x17f/0x1d0
[  715.742509]  ? rcu_read_lock_sched_held+0x12/0x80
[  715.747215]  ? trace_hardirqs_on+0x1c/0x130
[  715.751401]  do_syscall_64+0x3b/0x90
[  715.754981]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  715.760033] RIP: 0033:0x7f4dfe59000b
[  715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48
[  715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b
[  715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918
[  715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940
[  715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0

Notice that this is the only case where we use the lock in this way. In
the cleanup kthread and work kthread the lock is only taken around the
bit accesses. This was done intentionally to avoid this kind of issue.
The way the lock is used, we only protect ordering of bit sets vs bit
clears. The Tx writers in the hot path don't need to be protected
against the entire kthread loop. The Tx queues threads only need to
ensure that they do not re-use an index that is currently in use. The
cleanup loop does not need to block all new set bits, since it will
re-queue itself if new timestamps are present.

Fix the tracker flow so that it uses the same flow as the standard
cleanup thread. In addition, ensure the in_use bitmap actually gets
cleared properly.

This fixes the warning and also avoids the potential deadlock that might
have occurred otherwise.

Fixes: 4dd0d5c33c ("ice: add lock around Tx timestamp tracker flush")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 12:10:39 +01:00
David S. Miller
7389074ced Merge branch 'ioam-fixes'
Justin Iurman says:

====================
Correct the IOAM behavior for undefined trace type bits

(@Jakub @David: there will be a conflict for #2 when merging net->net-next, due
to commit [1]. The conflict is only 5-10 lines for #2 (#1 should be fine) inside
the file tools/testing/selftests/net/ioam6.sh, so quite short though possibly
ugly. Sorry for that, I didn't expect to post this one... Had I known, I'd have
made the opposite.)

Modify both the input and output behaviors regarding the trace type when one of
the undefined bits is set. The goal is to keep the interoperability when new
fields (aka new bits inside the range 12-21) will be defined.

The draft [2] says the following:
---------------------------------------------------------------
"Bit 12-21  Undefined.  These values are available for future
       assignment in the IOAM Trace-Type Registry (Section 8.2).
       Every future node data field corresponding to one of
       these bits MUST be 4-octets long.  An IOAM encapsulating
       node MUST set the value of each undefined bit to 0.  If
       an IOAM transit node receives a packet with one or more
       of these bits set to 1, it MUST either:

       1.  Add corresponding node data filled with the reserved
           value 0xFFFFFFFF, after the node data fields for the
           IOAM-Trace-Type bits defined above, such that the
           total node data added by this node in units of
           4-octets is equal to NodeLen, or

       2.  Not add any node data fields to the packet, even for
           the IOAM-Trace-Type bits defined above."
---------------------------------------------------------------

The output behavior has been modified to respect the fact that "an IOAM encap
node MUST set the value of each undefined bit to 0" (i.e., undefined bits can't
be set anymore).

As for the input behavior, current implementation is based on the second choice
(i.e., "not add any data fields to the packet [...]"). With this solution, any
interoperability is lost (i.e., if a new bit is defined, then an "old" kernel
implementation wouldn't fill IOAM data when such new bit is set inside the trace
type).

The input behavior is therefore relaxed and these undefined bits are now allowed
to be set. It is only possible thanks to the sentence "every future node data
field corresponding to one of these bits MUST be 4-octets long". Indeed, the
default empty value (the one for 4-octet fields) is inserted whenever an
undefined bit is set.

  [1] cfbe9b0021
  [2] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:49:50 +01:00
Justin Iurman
7b1700e009 selftests: net: modify IOAM tests for undef bits
The output behavior for undefined bits is now directly tested inside the bash
script. Trying to set an undefined bit should be refused.

The input behavior for undefined bits has been removed due to the fact that we
would need another sender allowed to set undefined bits.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:49:49 +01:00
Justin Iurman
2bbc977ca6 ipv6: ioam: move the check for undefined bits
The check for undefined bits in the trace type is moved from the input side to
the output side, while the input side is relaxed and now inserts default empty
values when an undefined bit is set.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:49:49 +01:00
Arun Ramadoss
ef1100ef20 net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
When the ksz module is installed and removed using rmmod, kernel crashes
with null pointer dereferrence error. During rmmod, ksz_switch_remove
function tries to cancel the mib_read_workqueue using
cancel_delayed_work_sync routine and unregister switch from dsa.

During dsa_unregister_switch it calls ksz_mac_link_down, which in turn
reschedules the workqueue since mib_interval is non-zero.
Due to which queue executed after mib_interval and it tries to access
dp->slave. But the slave is unregistered in the ksz_switch_remove
function. Hence kernel crashes.

To avoid this crash, before canceling the workqueue, resetted the
mib_interval to 0.

v1 -> v2:
-Removed the if condition in ksz_mib_read_work

Fixes: 469b390e1b ("net: dsa: microchip: use delayed_work instead of timer + work")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:35:53 +01:00
Vegard Nossum
9973a43012 r8152: select CRC32 and CRYPTO/CRYPTO_HASH/CRYPTO_SHA256
Fix the following build/link errors by adding a dependency on
CRYPTO, CRYPTO_HASH, CRYPTO_SHA256 and CRC32:

  ld: drivers/net/usb/r8152.o: in function `rtl8152_fw_verify_checksum':
  r8152.c:(.text+0x2b2a): undefined reference to `crypto_alloc_shash'
  ld: r8152.c:(.text+0x2bed): undefined reference to `crypto_shash_digest'
  ld: r8152.c:(.text+0x2c50): undefined reference to `crypto_destroy_tfm'
  ld: drivers/net/usb/r8152.o: in function `_rtl8152_set_rx_mode':
  r8152.c:(.text+0xdcb0): undefined reference to `crc32_le'

Fixes: 9370f2d05a ("r8152: support request_firmware for RTL8153")
Fixes: ac718b6930 ("net/usb: new driver for RTL8152")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:33:32 +01:00
Maarten Zanders
4a3e0aeddf net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's
mv88e6xxx_port_ppu_updates() interpretes data in the PORT_STS
register incorrectly for internal ports (ie no PPU). In these
cases, the PHY_DETECT bit indicates link status. This results
in forcing the MAC state whenever the PHY link goes down which
is not intended. As a side effect, LED's configured to show
link status stay lit even though the physical link is down.

Add a check in mac_link_down and mac_link_up to see if it
concerns an external port and only then, look at PPU status.

Fixes: 5d5b231da7 (net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down)
Reported-by: Maarten Zanders <m.zanders@televic.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Maarten Zanders <maarten.zanders@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:32:14 +01:00
Wan Jiabing
74a3bc42fe net: mscc: ocelot: Fix dumplicated argument in ocelot
Fix the following coccicheck warning:
drivers/net/ethernet/mscc/ocelot.c:474:duplicated argument to & or |
drivers/net/ethernet/mscc/ocelot.c:476:duplicated argument to & or |
drivers/net/ethernet/mscc/ocelot_net.c:1627:duplicated argument
to & or |

These DEV_CLOCK_CFG_MAC_TX_RST are duplicate here.
Here should be DEV_CLOCK_CFG_MAC_RX_RST.

Fixes: e6e12df625 ("net: mscc: ocelot: convert to phylink")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:21:55 +01:00
Stephen Boyd
0edf0824e0 af_unix: Rename UNIX-DGRAM to UNIX to maintain backwards compatability
Then name of this protocol changed in commit 94531cfcbe ("af_unix: Add
unix_stream_proto for sockmap") because that commit added stream support
to the af_unix protocol. Renaming the existing protocol makes a ChromeOS
protocol test[1] fail now that the name has changed in
/proc/net/protocols from "UNIX" to "UNIX-DGRAM".

Let's put the name back to how it was while keeping the stream protocol
as "UNIX-STREAM" so that the procfs interface doesn't change. This fixes
the test and maintains backwards compatibility in proc.

Cc: Jiang Wang <jiang.wang@bytedance.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Dmitry Osipenko <digetx@gmail.com>
Link: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/supported_protocols.go;l=50;drc=e8b1c3f94cb40a054f4aa1ef1aff61e75dc38f18 [1]
Fixes: 94531cfcbe ("af_unix: Add unix_stream_proto for sockmap")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:16:49 +01:00
Xuan Zhuo
732b74d647 virtio-net: fix for skb_over_panic inside big mode
commit 126285651b ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
accidentally reverted the effect of
commit 1a8024239d ("virtio-net: fix for skb_over_panic inside big mode")
on drivers/net/virtio_net.c

As a result, users of crosvm (which is using large packet mode)
are experiencing crashes with 5.14-rc1 and above that do not
occur with 5.13.

Crash trace:

[   61.346677] skbuff: skb_over_panic: text:ffffffff881ae2c7 len:3762 put:3762 head:ffff8a5ec8c22000 data:ffff8a5ec8c22010 tail:0xec2 end:0xec0 dev:<NULL>
[   61.369192] kernel BUG at net/core/skbuff.c:111!
[   61.372840] invalid opcode: 0000 [#1] SMP PTI
[   61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-rc1 linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1
[   61.376450] Hardware name: ChromiumOS crosvm, BIOS 0

..

[   61.393635] Call Trace:
[   61.394127]  <IRQ>
[   61.394488]  skb_put.cold+0x10/0x10
[   61.395095]  page_to_skb+0xf7/0x410
[   61.395689]  receive_buf+0x81/0x1660
[   61.396228]  ? netif_receive_skb_list_internal+0x1ad/0x2b0
[   61.397180]  ? napi_gro_flush+0x97/0xe0
[   61.397896]  ? detach_buf_split+0x67/0x120
[   61.398573]  virtnet_poll+0x2cf/0x420
[   61.399197]  __napi_poll+0x25/0x150
[   61.399764]  net_rx_action+0x22f/0x280
[   61.400394]  __do_softirq+0xba/0x257
[   61.401012]  irq_exit_rcu+0x8e/0xb0
[   61.401618]  common_interrupt+0x7b/0xa0
[   61.402270]  </IRQ>

See
https://lore.kernel.org/r/5edaa2b7c2fe4abd0347b8454b2ac032b6694e2c.camel%40collabora.com
for the report.

Apply the original 1a8024239d ("virtio-net: fix for skb_over_panic inside big mode")
again, the original logic still holds:

In virtio-net's large packet mode, there is a hole in the space behind
buf.

    hdr_padded_len - hdr_len

We must take this into account when calculating tailroom.

Cc: Greg KH <gregkh@linuxfoundation.org>
Fixes: fb32856b16 ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Fixes: 126285651b ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: Corentin Noël <corentin.noel@collabora.com>
Tested-by: Corentin Noël <corentin.noel@collabora.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 13:50:33 +01:00
Florian Fainelli
f49823939e net: phy: Do not shutdown PHYs in READY state
In case a PHY device was probed thus in the PHY_READY state, but not
configured and with no network device attached yet, we should not be
trying to shut it down because it has been brought back into reset by
phy_device_reset() towards the end of phy_probe() and anyway we have not
configured the PHY yet.

Fixes: e2f016cf77 ("net: phy: add a shutdown procedure")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 13:47:37 +01:00
chongjiapeng
a5a14ea7b4 qed: Fix missing error code in qed_slowpath_start()
The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'rc'.

Eliminate the follow smatch warning:

drivers/net/ethernet/qlogic/qed/qed_main.c:1298 qed_slowpath_start()
warn: missing error code 'rc'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: d51e4af5c2 ("qed: aRFS infrastructure support")
Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 13:46:41 +01:00
Vladimir Oltean
1951b3f19c net: dsa: hold rtnl_lock in dsa_switch_setup_tag_protocol
It was a documented fact that ds->ops->change_tag_protocol() offered
rtnetlink mutex protection to the switch driver, since there was an
ASSERT_RTNL right before the call in dsa_switch_change_tag_proto()
(initiated from sysfs).

The blamed commit introduced another call path for
ds->ops->change_tag_protocol() which does not hold the rtnl_mutex.
This is:

dsa_tree_setup
-> dsa_tree_setup_switches
   -> dsa_switch_setup
      -> dsa_switch_setup_tag_protocol
         -> ds->ops->change_tag_protocol()
   -> dsa_port_setup
      -> dsa_slave_create
         -> register_netdevice(slave_dev)
-> dsa_tree_setup_master
   -> dsa_master_setup
      -> dev->dsa_ptr = cpu_dp

The reason why the rtnl_mutex is held in the sysfs call path is to
ensure that, once the master and all the DSA interfaces are down (which
is required so that no packets flow), they remain down during the
tagging protocol change.

The above calling order illustrates the fact that it should not be risky
to change the initial tagging protocol to the one specified in the
device tree at the given time:

- packets cannot enter the dsa_switch_rcv() packet type handler since
  netdev_uses_dsa() for the master will not yet return true, since
  dev->dsa_ptr has not yet been populated

- packets cannot enter the dsa_slave_xmit() function because no DSA
  interface has yet been registered

So from the DSA core's perspective, holding the rtnl_mutex is indeed not
necessary.

Yet, drivers may need to do things which need rtnl_mutex protection. For
example:

felix_set_tag_protocol
-> felix_setup_tag_8021q
   -> dsa_tag_8021q_register
      -> dsa_tag_8021q_setup
         -> dsa_tag_8021q_port_setup
            -> vlan_vid_add
               -> ASSERT_RTNL

These drivers do not really have a choice to take the rtnl_mutex
themselves, since in the sysfs case, the rtnl_mutex is already held.

Fixes: deff710703 ("net: dsa: Allow default tag protocol to be overridden from DT")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 13:44:05 +01:00
Zheyu Ma
6510e80a0b isdn: mISDN: Fix sleeping function called from invalid context
The driver can call card->isac.release() function from an atomic
context.

Fix this by calling this function after releasing the lock.

The following log reveals it:

[   44.168226 ] BUG: sleeping function called from invalid context at kernel/workqueue.c:3018
[   44.168941 ] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 5475, name: modprobe
[   44.169574 ] INFO: lockdep is turned off.
[   44.169899 ] irq event stamp: 0
[   44.170160 ] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[   44.170627 ] hardirqs last disabled at (0): [<ffffffff814209ed>] copy_process+0x132d/0x3e00
[   44.171240 ] softirqs last  enabled at (0): [<ffffffff81420a1a>] copy_process+0x135a/0x3e00
[   44.171852 ] softirqs last disabled at (0): [<0000000000000000>] 0x0
[   44.172318 ] Preemption disabled at:
[   44.172320 ] [<ffffffffa009b0a9>] nj_release+0x69/0x500 [netjet]
[   44.174441 ] Call Trace:
[   44.174630 ]  dump_stack_lvl+0xa8/0xd1
[   44.174912 ]  dump_stack+0x15/0x17
[   44.175166 ]  ___might_sleep+0x3a2/0x510
[   44.175459 ]  ? nj_release+0x69/0x500 [netjet]
[   44.175791 ]  __might_sleep+0x82/0xe0
[   44.176063 ]  ? start_flush_work+0x20/0x7b0
[   44.176375 ]  start_flush_work+0x33/0x7b0
[   44.176672 ]  ? trace_irq_enable_rcuidle+0x85/0x170
[   44.177034 ]  ? kasan_quarantine_put+0xaa/0x1f0
[   44.177372 ]  ? kasan_quarantine_put+0xaa/0x1f0
[   44.177711 ]  __flush_work+0x11a/0x1a0
[   44.177991 ]  ? flush_work+0x20/0x20
[   44.178257 ]  ? lock_release+0x13c/0x8f0
[   44.178550 ]  ? __kasan_check_write+0x14/0x20
[   44.178872 ]  ? do_raw_spin_lock+0x148/0x360
[   44.179187 ]  ? read_lock_is_recursive+0x20/0x20
[   44.179530 ]  ? __kasan_check_read+0x11/0x20
[   44.179846 ]  ? do_raw_spin_unlock+0x55/0x900
[   44.180168 ]  ? ____kasan_slab_free+0x116/0x140
[   44.180505 ]  ? _raw_spin_unlock_irqrestore+0x41/0x60
[   44.180878 ]  ? skb_queue_purge+0x1a3/0x1c0
[   44.181189 ]  ? kfree+0x13e/0x290
[   44.181438 ]  flush_work+0x17/0x20
[   44.181695 ]  mISDN_freedchannel+0xe8/0x100
[   44.182006 ]  isac_release+0x210/0x260 [mISDNipac]
[   44.182366 ]  nj_release+0xf6/0x500 [netjet]
[   44.182685 ]  nj_remove+0x48/0x70 [netjet]
[   44.182989 ]  pci_device_remove+0xa9/0x250

Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 13:42:51 +01:00
Shannon Nelson
5c976a5657 ionic: don't remove netdev->dev_addr when syncing uc list
Bridging, and possibly other upper stack gizmos, adds the
lower device's netdev->dev_addr to its own uc list, and
then requests it be deleted when the upper bridge device is
removed.  This delete request also happens with the bridging
vlan_filtering is enabled and then disabled.

Bonding has a similar behavior with the uc list, but since it
also uses set_mac to manage netdev->dev_addr, it doesn't have
the same the failure case.

Because we store our netdev->dev_addr in our uc list, we need
to ignore the delete request from dev_uc_sync so as to not
lose the address and all hope of communicating.  Note that
ndo_set_mac_address is expressly changing netdev->dev_addr,
so no limitation is set there.

Fixes: 2a654540be ("ionic: Add Rx filter and rx_mode ndo support")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 11:56:59 +01:00
Haiyang Zhang
be0499369d net: mana: Fix error handling in mana_create_rxq()
Fix error handling in mana_create_rxq() when
cq->gdma_id >= gc->max_num_cqs.

Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1633698691-31721-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 17:00:04 -07:00
Xiaolong Huang
1f3e2e97c0 isdn: cpai: check ctr->cnr to avoid array index out of bound
The cmtp_add_connection() would add a cmtp session to a controller
and run a kernel thread to process cmtp.

	__module_get(THIS_MODULE);
	session->task = kthread_run(cmtp_session, session, "kcmtpd_ctr_%d",
								session->num);

During this process, the kernel thread would call detach_capi_ctr()
to detach a register controller. if the controller
was not attached yet, detach_capi_ctr() would
trigger an array-index-out-bounds bug.

[   46.866069][ T6479] UBSAN: array-index-out-of-bounds in
drivers/isdn/capi/kcapi.c:483:21
[   46.867196][ T6479] index -1 is out of range for type 'capi_ctr *[32]'
[   46.867982][ T6479] CPU: 1 PID: 6479 Comm: kcmtpd_ctr_0 Not tainted
5.15.0-rc2+ #8
[   46.869002][ T6479] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.14.0-2 04/01/2014
[   46.870107][ T6479] Call Trace:
[   46.870473][ T6479]  dump_stack_lvl+0x57/0x7d
[   46.870974][ T6479]  ubsan_epilogue+0x5/0x40
[   46.871458][ T6479]  __ubsan_handle_out_of_bounds.cold+0x43/0x48
[   46.872135][ T6479]  detach_capi_ctr+0x64/0xc0
[   46.872639][ T6479]  cmtp_session+0x5c8/0x5d0
[   46.873131][ T6479]  ? __init_waitqueue_head+0x60/0x60
[   46.873712][ T6479]  ? cmtp_add_msgpart+0x120/0x120
[   46.874256][ T6479]  kthread+0x147/0x170
[   46.874709][ T6479]  ? set_kthread_struct+0x40/0x40
[   46.875248][ T6479]  ret_from_fork+0x1f/0x30
[   46.875773][ T6479]

Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211008065830.305057-1-butterflyhuangxx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 16:52:58 -07:00
Sebastian Andrzej Siewior
1413269086 mqprio: Correct stats in mqprio_dump_class_stats().
Introduction of lockless subqueues broke the class statistics.
Before the change stats were accumulated in `bstats' and `qstats'
on the stack which was then copied to struct gnet_dump.

After the change the `bstats' and `qstats' are initialized to 0
and never updated, yet still fed to gnet_dump. The code updates
the global qdisc->cpu_bstats and qdisc->cpu_qstats instead,
clobbering them. Most likely a copy-paste error from the code in
mqprio_dump().

__gnet_stats_copy_basic() and __gnet_stats_copy_queue() accumulate
the values for per-CPU case but for global stats they overwrite
the value, so only stats from the last loop iteration / tc end up
in sch->[bq]stats.

Use the on-stack [bq]stats variables again and add the stats manually
in the global case.

Fixes: ce679e8df7 ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
https://lore.kernel.org/all/20211007175000.2334713-2-bigeasy@linutronix.de/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 16:27:22 -07:00
Jakub Kicinski
bccf56c4cb Merge branch 'dsa-bridge-tx-forwarding-offload-fixes-part-1'
Vladimir Oltean says:

====================
DSA bridge TX forwarding offload fixes - part 1

This is part 1 of a series of fixes to the bridge TX forwarding offload
feature introduced for v5.15. Sadly, the other fixes are so intrusive
that they cannot be reasonably be sent to the "net" tree, as they also
include API changes. So they are left as part 2 for net-next.
====================

Link: https://lore.kernel.org/r/20211007164711.2897238-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 15:47:49 -07:00