Commit Graph

1199328 Commits

Author SHA1 Message Date
Florian Kauer
8b86f10ab6 igc: No strict mode in pure launchtime/CBS offload
The flags IGC_TXQCTL_STRICT_CYCLE and IGC_TXQCTL_STRICT_END
prevent the packet transmission over slot and cycle boundaries.
This is important for taprio offload where the slots and
cycles correspond to the slots and cycles configured for the
network.

However, the Qbv offload feature of the i225 is also used for
enabling TX launchtime / ETF offload. In that case, however,
the cycle has no meaning for the network and is only used
internally to adapt the base time register after a second has
passed.

Enabling strict mode in this case would unnecessarily prevent
the transmission of certain packets (i.e. at the boundary of a
second) and thus interferes with the ETF qdisc that promises
transmission at a certain point in time.

Similar to ETF, this also applies to CBS offload that also should
not be influenced by strict mode unless taprio offload would be
enabled at the same time.

This fully reverts
commit d8f45be01d ("igc: Use strict cycles for Qbv scheduling")
but its commit message only describes what was already implemented
before that commit. The difference to a plain revert of that commit
is that it now copes with the base_time = 0 case that was fixed with
commit e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")

In particular, enabling strict mode leads to TX hang situations
under high traffic if taprio is applied WITHOUT taprio offload
but WITH ETF offload, e.g. as in

    sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
	    num_tc 1 \
	    map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
	    queues 1@0 \
	    base-time 0 \
	    sched-entry S 01 300000 \
	    flags 0x1 \
	    txtime-delay 500000 \
	    clockid CLOCK_TAI
    sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
	    clockid CLOCK_TAI \
	    delta 500000 \
	    offload \
	    skip_sock_check

and traffic generator

    sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns

with traffic.cfg

    #define ETH_P_IP        0x0800

    {
      /* Ethernet Header */
      0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e,  # MAC Dest - adapt as needed
      0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36,  # MAC Src  - adapt as needed
      const16(ETH_P_IP),

      /* IPv4 Header */
      0b01000101, 0,   # IPv4 version, IHL, TOS
      const16(1028),   # IPv4 total length (UDP length + 20 bytes (IP header))
      const16(2),      # IPv4 ident
      0b01000000, 0,   # IPv4 flags, fragmentation off
      64,              # IPv4 TTL
      17,              # Protocol UDP
      csumip(14, 33),  # IPv4 checksum

      /* UDP Header */
      10,  0, 48, 1,   # IP Src - adapt as needed
      10,  0, 48, 10,  # IP Dest - adapt as needed
      const16(5555),   # UDP Src Port
      const16(6666),   # UDP Dest Port
      const16(1008),   # UDP length (UDP header 8 bytes + payload length)
      csumudp(14, 34), # UDP checksum

      /* Payload */
      fill('W', 1000),
    }

and the observed message with that is for example

 igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
   Tx Queue             <0>
   TDH                  <d0>
   TDT                  <f0>
   next_to_use          <f0>
   next_to_clean        <d0>
 buffer_info[next_to_clean]
   time_stamp           <ffff661f>
   next_to_watch        <00000000245a4efb>
   jiffies              <ffff6e48>
   desc.status          <1048000>

Fixes: d8f45be01d ("igc: Use strict cycles for Qbv scheduling")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-10 08:58:16 -07:00
Florian Kauer
e5d88c53d0 igc: Handle already enabled taprio offload for basetime 0
Since commit e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
it is possible to enable taprio offload with a basetime of 0.
However, the check if taprio offload is already enabled (and thus -EALREADY
should be returned for igc_save_qbv_schedule) still relied on
adapter->base_time > 0.

This can be reproduced as follows:

    # TAPRIO offload (flags == 0x2) and base-time = 0
    sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
	    num_tc 1 \
	    map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
	    queues 1@0 \
	    base-time 0 \
	    sched-entry S 01 300000 \
	    flags 0x2

    # The second call should fail with "Error: Device failed to setup taprio offload."
    # But that only happens if base-time was != 0
    sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
	    num_tc 1 \
	    map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
	    queues 1@0 \
	    base-time 0 \
	    sched-entry S 01 300000 \
	    flags 0x2

Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-10 08:40:22 -07:00
Florian Kauer
82ff5f29b7 igc: Do not enable taprio offload for invalid arguments
Only set adapter->taprio_offload_enable after validating the arguments.
Otherwise, it stays set even if the offload was not enabled.
Since the subsequent code does not get executed in case of invalid
arguments, it will not be read at first.
However, by activating and then deactivating another offload
(e.g. ETF/TX launchtime offload), taprio_offload_enable is read
and erroneously keeps the offload feature of the NIC enabled.

This can be reproduced as follows:

    # TAPRIO offload (flags == 0x2) and negative base-time leading to expected -ERANGE
    sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
	    num_tc 1 \
	    map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
	    queues 1@0 \
	    base-time -1000 \
	    sched-entry S 01 300000 \
	    flags 0x2

    # IGC_TQAVCTRL is 0x0 as expected (iomem=relaxed for reading register)
    sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1

    # Activate ETF offload
    sudo tc qdisc replace dev enp1s0 parent root handle 6666 mqprio \
	    num_tc 3 \
	    map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	    queues 1@0 1@1 2@2 \
	    hw 0
    sudo tc qdisc add dev enp1s0 parent 6666:1 etf \
	    clockid CLOCK_TAI \
	    delta 500000 \
	    offload

    # IGC_TQAVCTRL is 0x9 as expected
    sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1

    # Deactivate ETF offload again
    sudo tc qdisc delete dev enp1s0 parent 6666:1

    # IGC_TQAVCTRL should now be 0x0 again, but is observed as 0x9
    sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1

Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-10 08:40:22 -07:00
Florian Kauer
8046063df8 igc: Rename qbv_enable to taprio_offload_enable
In the current implementation the flags adapter->qbv_enable
and IGC_FLAG_TSN_QBV_ENABLED have a similar name, but do not
have the same meaning. The first one is used only to indicate
taprio offload (i.e. when igc_save_qbv_schedule was called),
while the second one corresponds to the Qbv mode of the hardware.
However, the second one is also used to support the TX launchtime
feature, i.e. ETF qdisc offload. This leads to situations where
adapter->qbv_enable is false, but the flag IGC_FLAG_TSN_QBV_ENABLED
is set. This is prone to confusion.

The rename should reduce this confusion. Since it is a pure
rename, it has no impact on functionality.

Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-10 08:40:22 -07:00
Junfeng Guo
9d0aba9831 gve: unify driver name usage
Current codebase contained the usage of two different names for this
driver (i.e., `gvnic` and `gve`), which is quite unfriendly for users
to use, especially when trying to bind or unbind the driver manually.
The corresponding kernel module is registered with the name of `gve`.
It's more reasonable to align the name of the driver with the module.

Fixes: 893ce44df5 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Cc: csully@google.com
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-10 08:29:55 +01:00
Azeem Shaikh
989b52cdc8 net: sched: Replace strlcpy with strscpy
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().

Direct replacement is safe here since return value of -errno
is used to check for truncation instead of sizeof(dest).

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89

Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-10 08:23:53 +01:00
Jiasheng Jiang
87355b7c3d net: dsa: qca8k: Add check for skb_copy
Add check for the return value of skb_copy in order to avoid NULL pointer
dereference.

Fixes: 2cd5485663 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-10 08:21:53 +01:00
Simon Horman
73c4d1b307 net: lan743x: select FIXED_PHY
The blamed commit introduces usage of fixed_phy_register() but
not a corresponding dependency on FIXED_PHY.

This can result in a build failure.

 s390-linux-ld: drivers/net/ethernet/microchip/lan743x_main.o: in function `lan743x_phy_open':
 drivers/net/ethernet/microchip/lan743x_main.c:1514: undefined reference to `fixed_phy_register'

Fixes: 624864fbff ("net: lan743x: add fixed phy support for LAN7431 device")
Cc: stable@vger.kernel.org
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/netdev/725bf1c5-b252-7d19-7582-a6809716c7d6@infradead.org/
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-09 11:23:47 +01:00
Ziyang Xuan
06a0716949 ipv6/addrconf: fix a potential refcount underflow for idev
Now in addrconf_mod_rs_timer(), reference idev depends on whether
rs_timer is not pending. Then modify rs_timer timeout.

There is a time gap in [1], during which if the pending rs_timer
becomes not pending. It will miss to hold idev, but the rs_timer
is activated. Thus rs_timer callback function addrconf_rs_timer()
will be executed and put idev later without holding idev. A refcount
underflow issue for idev can be caused by this.

	if (!timer_pending(&idev->rs_timer))
		in6_dev_hold(idev);
		  <--------------[1]
	mod_timer(&idev->rs_timer, jiffies + when);

To fix the issue, hold idev if mod_timer() return 0.

Fixes: b7b1bfce0b ("ipv6: split duplicate address detection and router solicitation timer")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-09 11:19:01 +01:00
Eric Dumazet
51d03e2f22 udp6: fix udp6_ehashfn() typo
Amit Klein reported that udp6_ehash_secret was initialized but never used.

Fixes: 1bbdceef1e ("inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once")
Reported-by: Amit Klein <aksecurity@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-08 15:20:48 +01:00
Kuniyuki Iwashima
2aaa8a15de icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev().
With some IPv6 Ext Hdr (RPL, SRv6, etc.), we can send a packet that
has the link-local address as src and dst IP and will be forwarded to
an external IP in the IPv6 Ext Hdr.

For example, the script below generates a packet whose src IP is the
link-local address and dst is updated to 11::.

  # for f in $(find /proc/sys/net/ -name *seg6_enabled*); do echo 1 > $f; done
  # python3
  >>> from socket import *
  >>> from scapy.all import *
  >>>
  >>> SRC_ADDR = DST_ADDR = "fe80::5054:ff:fe12:3456"
  >>>
  >>> pkt = IPv6(src=SRC_ADDR, dst=DST_ADDR)
  >>> pkt /= IPv6ExtHdrSegmentRouting(type=4, addresses=["11::", "22::"], segleft=1)
  >>>
  >>> sk = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW)
  >>> sk.sendto(bytes(pkt), (DST_ADDR, 0))

For such a packet, we call ip6_route_input() to look up a route for the
next destination in these three functions depending on the header type.

  * ipv6_rthdr_rcv()
  * ipv6_rpl_srh_rcv()
  * ipv6_srh_rcv()

If no route is found, ip6_null_entry is set to skb, and the following
dst_input(skb) calls ip6_pkt_drop().

Finally, in icmp6_dev(), we dereference skb_rt6_info(skb)->rt6i_idev->dev
as the input device is the loopback interface.  Then, we have to check if
skb_rt6_info(skb)->rt6i_idev is NULL or not to avoid NULL pointer deref
for ip6_null_entry.

BUG: kernel NULL pointer dereference, address: 0000000000000000
 PF: supervisor read access in kernel mode
 PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 157 Comm: python3 Not tainted 6.4.0-11996-gb121d614371c #35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
FS:  00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
 <IRQ>
 ip6_pkt_drop (net/ipv6/route.c:4513)
 ipv6_rthdr_rcv (net/ipv6/exthdrs.c:640 net/ipv6/exthdrs.c:686)
 ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
 ip6_input_finish (./include/linux/rcupdate.h:781 net/ipv6/ip6_input.c:483)
 __netif_receive_skb_one_core (net/core/dev.c:5455)
 process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5895)
 __napi_poll (net/core/dev.c:6460)
 net_rx_action (net/core/dev.c:6529 net/core/dev.c:6660)
 __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
 do_softirq (kernel/softirq.c:454 kernel/softirq.c:441)
 </IRQ>
 <TASK>
 __local_bh_enable_ip (kernel/softirq.c:381)
 __dev_queue_xmit (net/core/dev.c:4231)
 ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:135)
 rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
 sock_sendmsg (net/socket.c:725 net/socket.c:748)
 __sys_sendto (net/socket.c:2134)
 __x64_sys_sendto (net/socket.c:2146 net/socket.c:2142 net/socket.c:2142)
 do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f9dc751baea
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffe98712c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffe98712cf8 RCX: 00007f9dc751baea
RDX: 0000000000000060 RSI: 00007f9dc6460b90 RDI: 0000000000000003
RBP: 00007f9dc56e8be0 R08: 00007ffe98712d70 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f9dc6af5d1b
 </TASK>
Modules linked in:
CR2: 0000000000000000
 ---[ end trace 0000000000000000 ]---
RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
FS:  00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled

Fixes: 4832c30d54 ("net: ipv6: put host and anycast routes on device with address")
Reported-by: Wang Yufen <wangyufen@huawei.com>
Closes: https://lore.kernel.org/netdev/c41403a9-c2f6-3b7e-0c96-e1901e605cd0@huawei.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-08 10:08:54 +01:00
David S. Miller
bbffab69d0 Merge branch 's390-ism-fixes'
Niklas Schnelle says:

====================
s390/ism: Fixes to client handling

This is v2 of the patch previously titled "s390/ism: Detangle ISM client
IRQ and event forwarding". As suggested by Paolo Abeni I split the patch
up. While doing so I noticed another problem that was fixed by this patch
concerning the way the workqueues access the client structs. This means the
second patch turning the workqueues into simple direct calls also fixes
a problem. Finally I split off a third patch just for fixing
ism_unregister_client()s error path.

The code after these 3 patches is identical to the result of the v1 patch
except that I also turned the dev_err() for still registered DMBs into
a WARN().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-08 10:07:14 +01:00
Niklas Schnelle
266deeea34 s390/ism: Do not unregister clients with registered DMBs
When ism_unregister_client() is called but the client still has DMBs
registered it returns -EBUSY and prints an error. This only happens
after the client has already been unregistered however. This is
unexpected as the unregister claims to have failed. Furthermore as this
implies a client bug a WARN() is more appropriate. Thus move the
deregistration after the check and use WARN().

Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-08 10:07:14 +01:00
Niklas Schnelle
76631ffa2f s390/ism: Fix and simplify add()/remove() callback handling
Previously the clients_lock was protecting the clients array against
concurrent addition/removal of clients but was also accessed from IRQ
context. This meant that it had to be a spinlock and that the add() and
remove() callbacks in which clients need to do allocation and take
mutexes can't be called under the clients_lock. To work around this these
callbacks were moved to workqueues. This not only introduced significant
complexity but is also subtly broken in at least one way.

In ism_dev_init() and ism_dev_exit() clients[i]->tgt_ism is used to
communicate the added/removed ISM device to the work function. While
write access to client[i]->tgt_ism is protected by the clients_lock and
the code waits that there is no pending add/remove work before and after
setting clients[i]->tgt_ism this is not enough. The problem is that the
wait happens based on per ISM device counters. Thus a concurrent
ism_dev_init()/ism_dev_exit() for a different ISM device may overwrite
a clients[i]->tgt_ism between unlocking the clients_lock and the
subsequent wait for the work to finnish.

Thankfully with the clients_lock no longer held in IRQ context it can be
turned into a mutex which can be held during the calls to add()/remove()
completely removing the need for the workqueues and the associated
broken housekeeping including the per ISM device counters and the
clients[i]->tgt_ism.

Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-08 10:07:14 +01:00
Niklas Schnelle
6b5c13b591 s390/ism: Fix locking for forwarding of IRQs and events to clients
The clients array references all registered clients and is protected by
the clients_lock. Besides its use as general list of clients the clients
array is accessed in ism_handle_irq() to forward ISM device events to
clients.

While the clients_lock is taken in the IRQ handler when calling
handle_event() it is however incorrectly not held during the
client->handle_irq() call and for the preceding clients[] access leaving
it unprotected against concurrent client (un-)registration.

Furthermore the accesses to ism->sba_client_arr[] in ism_register_dmb()
and ism_unregister_dmb() are not protected by any lock. This is
especially problematic as the client ID from the ism->sba_client_arr[]
is not checked against NO_CLIENT and neither is the client pointer
checked.

Instead of expanding the use of the clients_lock further add a separate
array in struct ism_dev which references clients subscribed to the
device's events and IRQs. This array is protected by ism->lock which is
already taken in ism_handle_irq() and can be taken outside the IRQ
handler when adding/removing subscribers or the accessing
ism->sba_client_arr[]. This also means that the clients_lock is no
longer taken in IRQ context.

Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-08 10:07:14 +01:00
Paolo Abeni
c329b261af net: prevent skb corruption on frag list segmentation
Ian reported several skb corruptions triggered by rx-gro-list,
collecting different oops alike:

[   62.624003] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[   62.631083] #PF: supervisor read access in kernel mode
[   62.636312] #PF: error_code(0x0000) - not-present page
[   62.641541] PGD 0 P4D 0
[   62.644174] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   62.648629] CPU: 1 PID: 913 Comm: napi/eno2-79 Not tainted 6.4.0 #364
[   62.655162] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F, BIOS 1.7a 10/13/2022
[   62.663344] RIP: 0010:__udp_gso_segment (./include/linux/skbuff.h:2858
./include/linux/udp.h:23 net/ipv4/udp_offload.c:228 net/ipv4/udp_offload.c:261
net/ipv4/udp_offload.c:277)
[   62.687193] RSP: 0018:ffffbd3a83b4f868 EFLAGS: 00010246
[   62.692515] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
[   62.699743] RDX: ffffa124def8a000 RSI: 0000000000000079 RDI: ffffa125952a14d4
[   62.706970] RBP: ffffa124def8a000 R08: 0000000000000022 R09: 00002000001558c9
[   62.714199] R10: 0000000000000000 R11: 00000000be554639 R12: 00000000000000e2
[   62.721426] R13: ffffa125952a1400 R14: ffffa125952a1400 R15: 00002000001558c9
[   62.728654] FS:  0000000000000000(0000) GS:ffffa127efa40000(0000)
knlGS:0000000000000000
[   62.736852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   62.742702] CR2: 00000000000000c0 CR3: 00000001034b0000 CR4: 00000000003526e0
[   62.749948] Call Trace:
[   62.752498]  <TASK>
[   62.779267] inet_gso_segment (net/ipv4/af_inet.c:1398)
[   62.787605] skb_mac_gso_segment (net/core/gro.c:141)
[   62.791906] __skb_gso_segment (net/core/dev.c:3403 (discriminator 2))
[   62.800492] validate_xmit_skb (./include/linux/netdevice.h:4862
net/core/dev.c:3659)
[   62.804695] validate_xmit_skb_list (net/core/dev.c:3710)
[   62.809158] sch_direct_xmit (net/sched/sch_generic.c:330)
[   62.813198] __dev_queue_xmit (net/core/dev.c:3805 net/core/dev.c:4210)
net/netfilter/core.c:626)
[   62.821093] br_dev_queue_push_xmit (net/bridge/br_forward.c:55)
[   62.825652] maybe_deliver (net/bridge/br_forward.c:193)
[   62.829420] br_flood (net/bridge/br_forward.c:233)
[   62.832758] br_handle_frame_finish (net/bridge/br_input.c:215)
[   62.837403] br_handle_frame (net/bridge/br_input.c:298
net/bridge/br_input.c:416)
[   62.851417] __netif_receive_skb_core.constprop.0 (net/core/dev.c:5387)
[   62.866114] __netif_receive_skb_list_core (net/core/dev.c:5570)
[   62.871367] netif_receive_skb_list_internal (net/core/dev.c:5638
net/core/dev.c:5727)
[   62.876795] napi_complete_done (./include/linux/list.h:37
./include/net/gro.h:434 ./include/net/gro.h:429 net/core/dev.c:6067)
[   62.881004] ixgbe_poll (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3191)
[   62.893534] __napi_poll (net/core/dev.c:6498)
[   62.897133] napi_threaded_poll (./include/linux/netpoll.h:89
net/core/dev.c:6640)
[   62.905276] kthread (kernel/kthread.c:379)
[   62.913435] ret_from_fork (arch/x86/entry/entry_64.S:314)
[   62.917119]  </TASK>

In the critical scenario, rx-gro-list GRO-ed packets are fed, via a
bridge, both to the local input path and to an egress device (tun).

The segmentation of such packets unsafely writes to the cloned skbs
with shared heads.

This change addresses the issue by uncloning as needed the
to-be-segmented skbs.

Reported-by: Ian Kumlien <ian.kumlien@gmail.com>
Tested-by: Ian Kumlien <ian.kumlien@gmail.com>
Fixes: 3a1296a38d ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-08 10:03:26 +01:00
Rafał Miłecki
e7731194fd net: bgmac: postpone turning IRQs off to avoid SoC hangs
Turning IRQs off is done by accessing Ethernet controller registers.
That can't be done until device's clock is enabled. It results in a SoC
hang otherwise.

This bug remained unnoticed for years as most bootloaders keep all
Ethernet interfaces turned on. It seems to only affect a niche SoC
family BCM47189. It has two Ethernet controllers but CFE bootloader uses
only the first one.

Fixes: 34322615cb ("net: bgmac: Mask interrupts during probe")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-08 10:02:15 +01:00
Ivan Babrou
8139dccd46 udp6: add a missing call into udp_fail_queue_rcv_skb tracepoint
The tracepoint has existed for 12 years, but it only covered udp
over the legacy IPv4 protocol. Having it enabled for udp6 removes
the unnecessary difference in error visibility.

Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Fixes: 296f7ea75b ("udp: add tracepoints for queueing skb to rcvbuf")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-07 09:16:52 +01:00
Shannon Nelson
3a7af34fb6 ionic: remove dead device fail path
Remove the probe error path code that leaves the driver bound
to the device, but with essentially a dead device.  This was
useful maybe twice early in the driver's life and no longer
makes sense to keep.

Fixes: 30a1e6d0f8 ("ionic: keep ionic dev on lif init fail")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-07 09:14:45 +01:00
Nitya Sunkad
abfb2a58a5 ionic: remove WARN_ON to prevent panic_on_warn
Remove unnecessary early code development check and the WARN_ON
that it uses.  The irq alloc and free paths have long been
cleaned up and this check shouldn't have stuck around so long.

Fixes: 77ceb68e29 ("ionic: Add notifyq support")
Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-07 09:13:16 +01:00
Sai Krishna
7709fbd492 octeontx2-af: Move validation of ptp pointer before its usage
Moved PTP pointer validation before its use to avoid smatch warning.
Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree.

Fixes: 2ef4e45d99 ("octeontx2-af: Add PTP PPS Errata workaround on CN10K silicon")
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-07 09:08:57 +01:00
Ratheesh Kannoth
af42088bda octeontx2-af: Promisc enable/disable through mbox
In legacy silicon, promiscuous mode is only modified
through CGX mbox messages. In CN10KB silicon, it is modified
from CGX mbox and NIX. This breaks legacy application
behaviour. Fix this by removing call from NIX.

Fixes: d6c9784baf ("octeontx2-af: Invoke exact match functions if supported")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-07 09:06:50 +01:00
David S. Miller
b61aac027b Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-07-05 (igc)

This series contains updates to igc driver only.

Husaini adds check to increment Qbv change error counter only on taprio
Qbvs. He also removes delay during Tx ring configuration and
resolves Tx hang that could occur when transmitting on a gate to be
closed.

Prasad Koya reports ethtool link mode as TP (twisted pair).

Tee Min corrects value for max SDU.

Aravindhan ensures that registers for PPS are always programmed to occur
in future.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-07 08:56:12 +01:00
Junfeng Guo
0503efeadb gve: Set default duplex configuration to full
Current duplex mode was unset in the driver, resulting in the default
parameter being set to 0, which corresponds to half duplex. It might
mislead users to have incorrect expectation about the driver's
transmission capabilities.
Set the default duplex configuration to full, as the driver runs in
full duplex mode at this point.

Fixes: 7e074d5a76 ("gve: Enable Link Speed Reporting in the driver.")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Message-ID: <20230706044128.2726747-1-junfeng.guo@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-06 19:14:24 -07:00
Jakub Kicinski
41b9eff0ce Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-07-05 (ice)

This series contains updates to ice driver only.

Sridhar fixes incorrect comparison of max Tx rate limit to occur against
each TC value rather than the aggregate. He also resolves an issue with
the wrong VSI being used when setting max Tx rate when TCs are enabled.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Fix tx queue rate limit when TCs are configured
  ice: Fix max_rate check while configuring TX rate limits
====================

Link: https://lore.kernel.org/r/20230705201346.49370-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-06 19:14:16 -07:00
Jakub Kicinski
4863b57bfd mlx5-fixes-2023-07-05
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmSlrvAACgkQSD+KveBX
 +j4EcggAsZlHQHRaA9re/5Fr8VV/YNmf+eLM6/2F6CD1sLcwWEsmDXkqZpL/wqVv
 bcP/dE/ehiMd1FI6XmEb9aGZY29mQBoItwCnxeUrK75d8CQib/pH87VkQdT9Uf6W
 RVqPk2rOL778sTy6V/UZDecfmZB5XfEoOO6f0YP/j2t8HxmfdetBN0orE0iRQBmO
 +0W8X5bIfCmGyWdQzOJB4a+S7Wi1JACr6CIOdNtADuZ7C7kKiPWrvEl7DMHlyX3Y
 TyrE//piSJuxaHwdMNBHPsXK5ZYyn7ZJjqeCA+Kd+lKMJlJEO7R+TkTOP0RKE++a
 ZdDetCqLKegQPpWZgjGzei7PpSpaFA==
 =cNgk
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2023-07-05

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: RX, Fix page_pool page fragment tracking for XDP
  net/mlx5: Query hca_cap_2 only when supported
  net/mlx5e: TC, CT: Offload ct clear only once
  net/mlx5e: Check for NOT_READY flag state after locking
  net/mlx5: Register a unique thermal zone per device
  net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq
  net/mlx5e: fix memory leak in mlx5e_ptp_open
  net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create
  net/mlx5e: fix double free in mlx5e_destroy_flow_table
====================

Link: https://lore.kernel.org/r/20230705175757.284614-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-06 19:11:21 -07:00
M A Ramdhan
0323bce598 net/sched: cls_fw: Fix improper refcount update leads to use-after-free
In the event of a failure in tcf_change_indev(), fw_set_parms() will
immediately return an error after incrementing or decrementing
reference counter in tcf_bind_filter().  If attacker can control
reference counter to zero and make reference freed, leading to
use after free.

In order to prevent this, move the point of possible failure above the
point where the TC_FW_CLASSID is handled.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: M A Ramdhan <ramdhan@starlabs.sg>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Message-ID: <20230705161530.52003-1-ramdhan@starlabs.sg>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-06 19:10:49 -07:00
Quan Zhou
525c469e5d wifi: mt76: mt7921e: fix init command fail with enabled device
For some cases as below, we may encounter the unpreditable chip stats
in driver probe()
* The system reboot flow do not work properly, such as kernel oops while
  rebooting, and then the driver do not go back to default status at
  this moment.
* Similar to the flow above. If the device was enabled in BIOS or UEFI,
  the system may switch to Linux without driver fully shutdown.

To avoid the problem, force push the device back to default in probe()
* mt7921e_mcu_fw_pmctrl() : return control privilege to chip side.
* mt7921_wfsys_reset()    : cleanup chip config before resource init.

Error log
[59007.600714] mt7921e 0000:02:00.0: ASIC revision: 79220010
[59010.889773] mt7921e 0000:02:00.0: Message 00000010 (seq 1) timeout
[59010.889786] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59014.217839] mt7921e 0000:02:00.0: Message 00000010 (seq 2) timeout
[59014.217852] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59017.545880] mt7921e 0000:02:00.0: Message 00000010 (seq 3) timeout
[59017.545893] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59020.874086] mt7921e 0000:02:00.0: Message 00000010 (seq 4) timeout
[59020.874099] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59024.202019] mt7921e 0000:02:00.0: Message 00000010 (seq 5) timeout
[59024.202033] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59027.530082] mt7921e 0000:02:00.0: Message 00000010 (seq 6) timeout
[59027.530096] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59030.857888] mt7921e 0000:02:00.0: Message 00000010 (seq 7) timeout
[59030.857904] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59034.185946] mt7921e 0000:02:00.0: Message 00000010 (seq 8) timeout
[59034.185961] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59037.514249] mt7921e 0000:02:00.0: Message 00000010 (seq 9) timeout
[59037.514262] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59040.842362] mt7921e 0000:02:00.0: Message 00000010 (seq 10) timeout
[59040.842375] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59040.923845] mt7921e 0000:02:00.0: hardware init failed

Cc: stable@vger.kernel.org
Fixes: 5c14a5f944 ("mt76: mt7921: introduce mt7921e support")
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Juan Martinez <juan.martinez@amd.com>
Co-developed-by: Leon Yen <leon.yen@mediatek.com>
Signed-off-by: Leon Yen <leon.yen@mediatek.com>
Signed-off-by: Quan Zhou <quan.zhou@mediatek.com>
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Message-ID: <39fcb7cee08d4ab940d38d82f21897483212483f.1688569385.git.deren.wu@mediatek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-06 19:10:41 -07:00
Jakub Kicinski
1ce1a745b3 Merge branch 'fix-dropping-of-oversize-preemptible-frames-with-felix-dsa-driver'
Vladimir Oltean says:

====================
Fix dropping of oversize preemptible frames with felix DSA driver

It has been reported that preemptible traffic doesn't completely behave
as expected. Namely, large packets should be able to be squeezed
(through fragmentation) through taprio time slots smaller than the
transmission time of the full frame. That does not happen due to logic
in the driver (for oversize frame dropping with taprio) that was not
updated in order for this use case to work.

I am not sure whether it qualifies as "net" material, because some
structural changes are involved, and it is a "never worked" scenario.
OTOH, this is a complaint coming from users for a v6.4 kernel.
It's up to maintainers to decide whether this series can be considered;
I've submitted it as non-RFC in the optimistic case that it will be :)

Demo script illustrating the issue below.

add_taprio()
{
	local ifname=$1

	echo "Creating root taprio"
	tc qdisc replace dev $ifname handle 8001: parent root stab overhead 24 taprio \
		num_tc 8 \
		map 0 1 2 3 4 5 6 7 \
		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
		base-time 0 \
		sched-entry S 01 1216 \
		sched-entry S fe 12368 \
		fp P E E E E E E E \
		flags 0x2
}

remove_taprio()
{
	local ifname=$1

	echo "Removing taprio"
	tc qdisc del dev $ifname root
}

ip netns add ns0
ip link set eno0 netns ns0 && ip -n ns0 link set eno0 up && ip -n ns0 addr add 192.168.100.1/24 dev eno0
ip addr add 192.168.100.2/24 dev swp0 && ip link set swp0 up
ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled on verify-enabled off tx-enabled on
ethtool --set-mm swp0 pmac-enabled on verify-enabled off tx-enabled on
add_taprio swp0

ping 192.168.100.1 -s 1000 -c 5 # sent through TC0
ethtool -I --show-mm swp0 | grep MACMergeFragCountTx # should increase

ip addr flush swp0 && ip link set swp0 down
remove_taprio swp0
ethtool --set-mm swp0 pmac-enabled off verify-enabled off tx-enabled off
ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled off verify-enabled off tx-enabled off
ip netns del ns0
====================

Link: https://lore.kernel.org/r/20230705104422.49025-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-06 19:10:25 -07:00
Vladimir Oltean
c6efb4ae38 net: mscc: ocelot: fix oversize frame dropping for preemptible TCs
This switch implements Hold/Release in a strange way, with no control
from the user as required by IEEE 802.1Q-2018 through Set-And-Hold-MAC
and Set-And-Release-MAC, but rather, it emits HOLD requests implicitly
based on the schedule.

Namely, when the gate of a preemptible TC is about to close (actually
QSYS::PREEMPTION_CFG.HOLD_ADVANCE octet times in advance of this event),
the QSYS seems to emit a HOLD request pulse towards the MAC which
preempts the currently transmitted packet, and further packets are held
back in the queue system.

This allows large frames to be squeezed through small time slots,
because HOLD requests initiated by the gate events result in the frame
being segmented in multiple fragments, the bit time of which is equal to
the size of the time slot.

It has been reported that the vsc9959_tas_guard_bands_update() logic
breaks this, because it doesn't take preemptible TCs into account, and
enables oversized frame dropping when the time slot doesn't allow a full
MTU to be sent, but it does allow 2*minFragSize to be sent (128B).
Packets larger than 128B are dropped instead of being sent in multiple
fragments.

Confusingly, the manual says:

| For guard band, SDU calculation of a traffic class of a port, if
| preemption is enabled (through 'QSYS::PREEMPTION_CFG.P_QUEUES') then
| QSYS::PREEMPTION_CFG.HOLD_ADVANCE is used, otherwise
| QSYS::QMAXSDU_CFG_*.QMAXSDU_* is used.

but this only refers to the static guard band durations, and the
QMAXSDU_CFG_* registers have dual purpose - the other being oversized
frame dropping, which takes place irrespective of whether frames are
preemptible or express.

So, to fix the problem, we need to call vsc9959_tas_guard_bands_update()
from ocelot_port_update_active_preemptible_tcs(), and modify the guard
band logic to consider a different (lower) oversize limit for
preemptible traffic classes.

Fixes: 403ffc2c34 ("net: mscc: ocelot: add support for preemptible traffic classes")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Message-ID: <20230705104422.49025-4-vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-06 19:10:22 -07:00
Vladimir Oltean
c60819149b net: dsa: felix: make vsc9959_tas_guard_bands_update() visible to ocelot->ops
In a future change we will need to make
ocelot_port_update_active_preemptible_tcs() call
vsc9959_tas_guard_bands_update(), but that is currently not possible,
since the ocelot switch lib does not have access to functions private to
the DSA wrapper.

Move the pointer to vsc9959_tas_guard_bands_update() from felix->info
(which is private to the DSA driver) to ocelot->ops (which is also
visible to the ocelot switch lib).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Message-ID: <20230705104422.49025-3-vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-06 19:10:22 -07:00
Vladimir Oltean
009d30f1a7 net: mscc: ocelot: extend ocelot->fwd_domain_lock to cover ocelot->tas_lock
In a future commit we will have to call vsc9959_tas_guard_bands_update()
from ocelot_port_update_active_preemptible_tcs(), and that will be
impossible due to the AB/BA locking dependencies between
ocelot->tas_lock and ocelot->fwd_domain_lock.

Just like we did in commit 3ff468ef98 ("net: mscc: ocelot: remove
struct ocelot_mm_state :: lock"), the only solution is to expand the
scope of ocelot->fwd_domain_lock for it to also serialize changes made
to the Time-Aware Shaper, because those will have to result in a
recalculation of cut-through TCs, which is something that depends on the
forwarding domain.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Message-ID: <20230705104422.49025-2-vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-06 19:10:22 -07:00
Paolo Abeni
ceb20a3cc5 netfilter pull request 23-07-06
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmSl9YgACgkQ1V2XiooU
 IOQ4nw//f0ASAgbKUEBQTUPwVvG/wngMy99cfyxwkvWjZJ8dbgh0SBq0vK+fc06Q
 9+xixns7i+jIbaMNhgo80SWbdXSNhsAOHoB25F+jv5FkhKPyLWa/HcvjJK2WCs/8
 ri05VJ59sWVlgewxn2WFO9NXyrS3vWfgUYKp0Z0R4v7+8NnfV4pRP3UuvIyYGL6r
 4A1J7ZfDRa/391cE4uUVv8a4rgcb1CvfEpZaltnRcSZQbBKru5ysjCSnlqELJ/3F
 sPxAuTN+LAviJf9U/g7PfmLZ8U/YPRZ1bEanwDnO8hk3bMkGjWBm/NIUpBvu/bh6
 T7gzIkH0wImerogl2rpIomqt4LgUusyj4FWBePw1BC9zYTuDXXriSzYkQgiI7Vd6
 8XVWzYpFtdF9Is3SOEFxkGgu1ZkS3bkD3zTAuaGFNxfIjakUsxLZhez0BocYs97S
 l9s8x1vwQtJyGtC8PgAcLsqMlYXJZ7ur8LP9st76Ghtc88OYJUMchUI/S8gShWWi
 tv/xUx8erV2v9abW588LKz1LzUd4lMPR0GXlCLfOOKzT8gjsjHtQbc/1JTAD3s5F
 HojjAXTC6VccfMrno8EPMx5uFIZV7Kvpw34R/7yRzu6ewm9Myib+bvmcdm5Wu3V+
 j8YyO/9eqOPKKMxHeXUg8WCy20RFxH4MdxpR9kJ2VOqSFHsqLmo=
 =c4IW
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix missing overflow use refcount checks in nf_tables.

2) Do not set IPS_ASSURED for IPS_NAT_CLASH entries in GRE tracker,
   from Florian Westphal.

3) Bail out if nf_ct_helper_hash is NULL before registering helper,
   from Florent Revest.

4) Use siphash() instead siphash_4u64() to fix performance regression,
   also from Florian.

5) Do not allow to add rules to removed chains via ID,
   from Thadeu Lima de Souza Cascardo.

6) Fix oob read access in byteorder expression, also from Thadeu.

netfilter pull request 23-07-06

====================

Link: https://lore.kernel.org/r/20230705230406.52201-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-06 11:02:58 +02:00
Klaus Kudielka
21327f81db net: mvneta: fix txq_map in case of txq_number==1
If we boot with mvneta.txq_number=1, the txq_map is set incorrectly:
MVNETA_CPU_TXQ_ACCESS(1) refers to TX queue 1, but only TX queue 0 is
initialized. Fix this.

Fixes: 50bf8cb6fc ("net: mvneta: Configure XPS support")
Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://lore.kernel.org/r/20230705053712.3914-1-klaus.kudielka@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-06 10:29:39 +02:00
Thadeu Lima de Souza Cascardo
caf3ef7468 netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[   23.095215] ==================================================================
[   23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[   23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[   23.096358]
[   23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[   23.096770] Call Trace:
[   23.096910]  <IRQ>
[   23.097030]  dump_stack_lvl+0x60/0xc0
[   23.097218]  print_report+0xcf/0x630
[   23.097388]  ? nft_byteorder_eval+0x13c/0x320
[   23.097577]  ? kasan_addr_to_slab+0xd/0xc0
[   23.097760]  ? nft_byteorder_eval+0x13c/0x320
[   23.097949]  kasan_report+0xc9/0x110
[   23.098106]  ? nft_byteorder_eval+0x13c/0x320
[   23.098298]  __asan_load2+0x83/0xd0
[   23.098453]  nft_byteorder_eval+0x13c/0x320
[   23.098659]  nft_do_chain+0x1c8/0xc50
[   23.098852]  ? __pfx_nft_do_chain+0x10/0x10
[   23.099078]  ? __kasan_check_read+0x11/0x20
[   23.099295]  ? __pfx___lock_acquire+0x10/0x10
[   23.099535]  ? __pfx___lock_acquire+0x10/0x10
[   23.099745]  ? __kasan_check_read+0x11/0x20
[   23.099929]  nft_do_chain_ipv4+0xfe/0x140
[   23.100105]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
[   23.100327]  ? lock_release+0x204/0x400
[   23.100515]  ? nf_hook.constprop.0+0x340/0x550
[   23.100779]  nf_hook_slow+0x6c/0x100
[   23.100977]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
[   23.101223]  nf_hook.constprop.0+0x334/0x550
[   23.101443]  ? __pfx_ip_local_deliver_finish+0x10/0x10
[   23.101677]  ? __pfx_nf_hook.constprop.0+0x10/0x10
[   23.101882]  ? __pfx_ip_rcv_finish+0x10/0x10
[   23.102071]  ? __pfx_ip_local_deliver_finish+0x10/0x10
[   23.102291]  ? rcu_read_lock_held+0x4b/0x70
[   23.102481]  ip_local_deliver+0xbb/0x110
[   23.102665]  ? __pfx_ip_rcv+0x10/0x10
[   23.102839]  ip_rcv+0x199/0x2a0
[   23.102980]  ? __pfx_ip_rcv+0x10/0x10
[   23.103140]  __netif_receive_skb_one_core+0x13e/0x150
[   23.103362]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[   23.103647]  ? mark_held_locks+0x48/0xa0
[   23.103819]  ? process_backlog+0x36c/0x380
[   23.103999]  __netif_receive_skb+0x23/0xc0
[   23.104179]  process_backlog+0x91/0x380
[   23.104350]  __napi_poll.constprop.0+0x66/0x360
[   23.104589]  ? net_rx_action+0x1cb/0x610
[   23.104811]  net_rx_action+0x33e/0x610
[   23.105024]  ? _raw_spin_unlock+0x23/0x50
[   23.105257]  ? __pfx_net_rx_action+0x10/0x10
[   23.105485]  ? mark_held_locks+0x48/0xa0
[   23.105741]  __do_softirq+0xfa/0x5ab
[   23.105956]  ? __dev_queue_xmit+0x765/0x1c00
[   23.106193]  do_softirq.part.0+0x49/0xc0
[   23.106423]  </IRQ>
[   23.106547]  <TASK>
[   23.106670]  __local_bh_enable_ip+0xf5/0x120
[   23.106903]  __dev_queue_xmit+0x789/0x1c00
[   23.107131]  ? __pfx___dev_queue_xmit+0x10/0x10
[   23.107381]  ? find_held_lock+0x8e/0xb0
[   23.107585]  ? lock_release+0x204/0x400
[   23.107798]  ? neigh_resolve_output+0x185/0x350
[   23.108049]  ? mark_held_locks+0x48/0xa0
[   23.108265]  ? neigh_resolve_output+0x185/0x350
[   23.108514]  neigh_resolve_output+0x246/0x350
[   23.108753]  ? neigh_resolve_output+0x246/0x350
[   23.109003]  ip_finish_output2+0x3c3/0x10b0
[   23.109250]  ? __pfx_ip_finish_output2+0x10/0x10
[   23.109510]  ? __pfx_nf_hook+0x10/0x10
[   23.109732]  __ip_finish_output+0x217/0x390
[   23.109978]  ip_finish_output+0x2f/0x130
[   23.110207]  ip_output+0xc9/0x170
[   23.110404]  ip_push_pending_frames+0x1a0/0x240
[   23.110652]  raw_sendmsg+0x102e/0x19e0
[   23.110871]  ? __pfx_raw_sendmsg+0x10/0x10
[   23.111093]  ? lock_release+0x204/0x400
[   23.111304]  ? __mod_lruvec_page_state+0x148/0x330
[   23.111567]  ? find_held_lock+0x8e/0xb0
[   23.111777]  ? find_held_lock+0x8e/0xb0
[   23.111993]  ? __rcu_read_unlock+0x7c/0x2f0
[   23.112225]  ? aa_sk_perm+0x18a/0x550
[   23.112431]  ? filemap_map_pages+0x4f1/0x900
[   23.112665]  ? __pfx_aa_sk_perm+0x10/0x10
[   23.112880]  ? find_held_lock+0x8e/0xb0
[   23.113098]  inet_sendmsg+0xa0/0xb0
[   23.113297]  ? inet_sendmsg+0xa0/0xb0
[   23.113500]  ? __pfx_inet_sendmsg+0x10/0x10
[   23.113727]  sock_sendmsg+0xf4/0x100
[   23.113924]  ? move_addr_to_kernel.part.0+0x4f/0xa0
[   23.114190]  __sys_sendto+0x1d4/0x290
[   23.114391]  ? __pfx___sys_sendto+0x10/0x10
[   23.114621]  ? __pfx_mark_lock.part.0+0x10/0x10
[   23.114869]  ? lock_release+0x204/0x400
[   23.115076]  ? find_held_lock+0x8e/0xb0
[   23.115287]  ? rcu_is_watching+0x23/0x60
[   23.115503]  ? __rseq_handle_notify_resume+0x6e2/0x860
[   23.115778]  ? __kasan_check_write+0x14/0x30
[   23.116008]  ? blkcg_maybe_throttle_current+0x8d/0x770
[   23.116285]  ? mark_held_locks+0x28/0xa0
[   23.116503]  ? do_syscall_64+0x37/0x90
[   23.116713]  __x64_sys_sendto+0x7f/0xb0
[   23.116924]  do_syscall_64+0x59/0x90
[   23.117123]  ? irqentry_exit_to_user_mode+0x25/0x30
[   23.117387]  ? irqentry_exit+0x77/0xb0
[   23.117593]  ? exc_page_fault+0x92/0x140
[   23.117806]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   23.118081] RIP: 0033:0x7f744aee2bba
[   23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[   23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[   23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[   23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[   23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[   23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[   23.121617]  </TASK>
[   23.121749]
[   23.121845] The buggy address belongs to the virtual mapping at
[   23.121845]  [ffffc90000000000, ffffc90000009000) created by:
[   23.121845]  irq_init_percpu_irqstack+0x1cf/0x270
[   23.122707]
[   23.122803] The buggy address belongs to the physical page:
[   23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[   23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[   23.123998] page_type: 0xffffffff()
[   23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[   23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   23.125023] page dumped because: kasan: bad access detected
[   23.125326]
[   23.125421] Memory state around the buggy address:
[   23.125682]  ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.126072]  ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[   23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[   23.126840]                                               ^
[   23.127138]  ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[   23.127522]  ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[   23.127906] ==================================================================
[   23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-06 00:53:14 +02:00
Linus Torvalds
6843306689 Including fixes from bluetooth, bpf and wireguard.
Current release - regressions:
 
  - nvme-tcp: fix comma-related oops after sendpage changes
 
 Current release - new code bugs:
 
  - ptp: make max_phase_adjustment sysfs device attribute invisible
    when not supported
 
 Previous releases - regressions:
 
  - sctp: fix potential deadlock on &net->sctp.addr_wq_lock
 
  - mptcp:
    - ensure subflow is unhashed before cleaning the backlog
    - do not rely on implicit state check in mptcp_listen()
 
 Previous releases - always broken:
 
  - net: fix net_dev_start_xmit trace event vs skb_transport_offset()
 
  - Bluetooth:
    - fix use-bdaddr-property quirk
    - L2CAP: fix multiple UaFs
    - ISO: use hci_sync for setting CIG parameters
    - hci_event: fix Set CIG Parameters error status handling
    - hci_event: fix parsing of CIS Established Event
    - MGMT: fix marking SCAN_RSP as not connectable
 
  - wireguard: queuing: use saner cpu selection wrapping
 
  - sched: act_ipt: various bug fixes for iptables <> TC interactions
 
  - sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
 
  - dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
 
  - eth: sfc: fix null-deref in devlink port without MAE access
 
  - eth: ibmvnic: do not reset dql stats on NON_FATAL err
 
 Misc:
 
  - xsk: honor SO_BINDTODEVICE on bind
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSlu+MACgkQMUZtbf5S
 Irslgw//S7jf/GL8V6y8VL3te+/OPOZnLDTzFFOdy64/y97FE6XIacJUpyWRhtmz
 oSzcSNHETPW9U+xSGa2ZQlKhAXt6n9iRNvUegql+VBb13Iz+l7AdTeoxRv/YuwDo
 5lTOIB6cBw+ATd0oxS6wr8SyUlcvktUKBfTAItjbVM55aXfIUpXIa84+F7avJgIA
 XP1u/3PHhwItmwo/hXhHH0+P0QA8ix1q2SvRB7DAlQLBsTuQhaKjXWQkYYTKw/Nt
 dtvh8iQSs/YXaHMjTa5CK28HOD8+ywIizr+uJ9VaNqIzV0W5JE9IE8P4NFpBcY7t
 kGjTYODOph7dkNmZ5RLj3N+B6CyC57OXDzoo/tr8940UytCLVj9EVyduarLGLx57
 edqK9cUz5kWejyGoyZ4Pvlo/SKvCQ2HKMeiAJ0/nNpTJMFuygMoqGsaD6ttzkXMj
 fZLPjRUK3axd+15ZzhLEf8HyL5Qh+qPqqX9p7NljfMKwhxMWJ5fuICJfdGOSdMJR
 ndL+wPfRPFQwszZ4pbTY2Ivn29mo8ScBOSOEgQs2mOny+zFzTzmqNWz/jcFfQnjS
 cylxBEHrgudT2FuCImZ/v66TM5yakHXqIdpTGG+zsvJWQqjM96Z3I7WRvi0g9d75
 n84il+j34mnzl90j2xEutqUiK7BQ9ZpZBsutPVTKBIHKWWiortI=
 =9yzk
 -----END PGP SIGNATURE-----

Merge tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, bpf and wireguard.

  Current release - regressions:

   - nvme-tcp: fix comma-related oops after sendpage changes

  Current release - new code bugs:

   - ptp: make max_phase_adjustment sysfs device attribute invisible
     when not supported

  Previous releases - regressions:

   - sctp: fix potential deadlock on &net->sctp.addr_wq_lock

   - mptcp:
      - ensure subflow is unhashed before cleaning the backlog
      - do not rely on implicit state check in mptcp_listen()

  Previous releases - always broken:

   - net: fix net_dev_start_xmit trace event vs skb_transport_offset()

   - Bluetooth:
      - fix use-bdaddr-property quirk
      - L2CAP: fix multiple UaFs
      - ISO: use hci_sync for setting CIG parameters
      - hci_event: fix Set CIG Parameters error status handling
      - hci_event: fix parsing of CIS Established Event
      - MGMT: fix marking SCAN_RSP as not connectable

   - wireguard: queuing: use saner cpu selection wrapping

   - sched: act_ipt: various bug fixes for iptables <> TC interactions

   - sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX

   - dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging

   - eth: sfc: fix null-deref in devlink port without MAE access

   - eth: ibmvnic: do not reset dql stats on NON_FATAL err

  Misc:

   - xsk: honor SO_BINDTODEVICE on bind"

* tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
  nfp: clean mc addresses in application firmware when closing port
  selftests: mptcp: pm_nl_ctl: fix 32-bit support
  selftests: mptcp: depend on SYN_COOKIES
  selftests: mptcp: userspace_pm: report errors with 'remove' tests
  selftests: mptcp: userspace_pm: use correct server port
  selftests: mptcp: sockopt: return error if wrong mark
  selftests: mptcp: sockopt: use 'iptables-legacy' if available
  selftests: mptcp: connect: fail if nft supposed to work
  mptcp: do not rely on implicit state check in mptcp_listen()
  mptcp: ensure subflow is unhashed before cleaning the backlog
  s390/qeth: Fix vipa deletion
  octeontx-af: fix hardware timestamp configuration
  net: dsa: sja1105: always enable the send_meta options
  net: dsa: tag_sja1105: fix MAC DA patching from meta frames
  net: Replace strlcpy with strscpy
  pptp: Fix fib lookup calls.
  mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check
  net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
  xsk: Honor SO_BINDTODEVICE on bind
  ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported
  ...
2023-07-05 15:44:45 -07:00
Linus Torvalds
73a3fcdaa7 f2fs update for 6.5-rc1
In this cycle, we've mainly investigated the zoned block device support along
 with patches such as correcting write pointers between f2fs and storage, adding
 asynchronous zone reset flow, and managing the number of open zones. Other than
 them, f2fs adds another mount option, "errors=x" to specify how to handle when
 it detects an unexpected behavior at runtime.
 
 Enhancement:
  - support errors=remount-ro|continue|panic mountoption
  - enforce some inode flag policies
  - allow .tmp compression given extensions
  - add some ioctls to manage the f2fs compression
  - improve looped node chain flow
  - avoid issuing small-sized discard commands during checkpoint
  - implement an asynchronous zone reset
 
 Bug fix:
  - fix deadlock in xattr and inode page lock
  - fix and add sanity check in some error paths
  - fix to avoid NULL pointer dereference f2fs_write_end_io() along with put_super
  - set proper flags to quota files
  - fix potential deadlock due to unpaired node_write lock use
  - fix over-estimating free section during FG GC
  - fix the wrong condition to determine atomic context
 
 As usual, also there are a number of patches having code refactoring and minor
 clean-ups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmSlracACgkQQBSofoJI
 UNJETA//RhFkVGbKlZ7cAuB6xwaLPmRi+aUSn3/rBbdq6CjBOslClEYwW981Fe19
 q6VSIX7Zt5RKZg2+meHLhE28rGknLaZ4wFC/F4BDjoykN119g+KXkypjf+OVbatK
 zLImOzXVKjAWJevpYCENcUQY/xI2kNtg3csp6kq9GQoZCD2US/wzzI0QF6xCQ1q3
 WZpHHy2fi6Lry2xGEbDLKlxg9e5nDqhOSp6S/taF+w+RyUMlTcoIU2fIWT0iZ2kI
 taFe+PWHHkJg2oJcgZ+hx5ZACGb1BWdHg8n1N0MK/IiNbA6CWe+iSM9T0TtctM5V
 Gkz4233bFR976O27Bu2znqil+AH3ECZ/F2HbRbh5vTFJIhUMwNQ1GJTgDqE5Vf8h
 R4W/ejbBSLM/g4TK5boyWrrKpAAVmhFhSj+OBboIlFFKchP4RKaqL7Az+m0tGSD3
 0uCVAFtzmctBmgJ9ko+dxwwFwbbGiG6MDYeGIODMdqFQMQrcGqPoOrcWbj2hPoaW
 LRz/OzA8N0fQUfvCH6E31Ypd6cBFrG++FgtFA4lsv10KNsJUOx4MbEWxF5HobV3t
 axpcRsOOb95hAmqenY3sWyiR+IcIEAYlzx6D6QEbNPe47fuLxr9jx52Ollw6RDfj
 RvkaxzAIDeESMWWNftzJ8r8Wt6RzoBv5JjBkFIMbCz28V3v9R44=
 =TAG3
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this cycle, we've mainly investigated the zoned block device
  support along with patches such as correcting write pointers between
  f2fs and storage, adding asynchronous zone reset flow, and managing
  the number of open zones.

  Other than them, f2fs adds another mount option, "errors=x" to specify
  how to handle when it detects an unexpected behavior at runtime.

  Enhancements:
   - support 'errors=remount-ro|continue|panic' mount option
   - enforce some inode flag policies
   - allow .tmp compression given extensions
   - add some ioctls to manage the f2fs compression
   - improve looped node chain flow
   - avoid issuing small-sized discard commands during checkpoint
   - implement an asynchronous zone reset

  Bug fixes:
   - fix deadlock in xattr and inode page lock
   - fix and add sanity check in some error paths
   - fix to avoid NULL pointer dereference f2fs_write_end_io() along
     with put_super
   - set proper flags to quota files
   - fix potential deadlock due to unpaired node_write lock use
   - fix over-estimating free section during FG GC
   - fix the wrong condition to determine atomic context

  As usual, also there are a number of patches with code refactoring and
  minor clean-ups"

* tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
  f2fs: fix to do sanity check on direct node in truncate_dnode()
  f2fs: only set release for file that has compressed data
  f2fs: fix compile warning in f2fs_destroy_node_manager()
  f2fs: fix error path handling in truncate_dnode()
  f2fs: fix deadlock in i_xattr_sem and inode page lock
  f2fs: remove unneeded page uptodate check/set
  f2fs: update mtime and ctime in move file range method
  f2fs: compress tmp files given extension
  f2fs: refactor struct f2fs_attr macro
  f2fs: convert to use sbi directly
  f2fs: remove redundant assignment to variable err
  f2fs: do not issue small discard commands during checkpoint
  f2fs: check zone write pointer points to the end of zone
  f2fs: add f2fs_ioc_get_compress_blocks
  f2fs: cleanup MIN_INLINE_XATTR_SIZE
  f2fs: add helper to check compression level
  f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
  f2fs: do more sanity check on inode
  f2fs: compress: fix to check validity of i_compress_flag field
  f2fs: add sanity compress level check for compressed file
  ...
2023-07-05 14:14:37 -07:00
Linus Torvalds
bb8e7e9f0b More new code for 6.5:
* Fix some ordering problems with log items during log recovery.
  * Don't deadlock the system by trying to flush busy freed extents while
    holding on to busy freed extents.
  * Improve validation of log geometry parameters when reading the
    primary superblock.
  * Validate the length field in the AGF header.
  * Fix recordset filtering bugs when re-calling GETFSMAP to return more
    results when the resultset didn't previously fit in the caller's buffer.
  * Fix integer overflows in GETFSMAP when working with rt volumes larger
    than 2^32 fsblocks.
  * Fix GETFSMAP reporting the undefined space beyond the last rtextent.
  * Fix filtering bugs in GETFSMAP's log device backend if the log ever
    becomes longer than 2^32 fsblocks.
  * Improve validation of file offsets in the GETFSMAP range parameters.
  * Fix an off by one bug in the pmem media failure notification
    computation.
  * Validate the length field in the AGI header too.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZKL9IwAKCRBKO3ySh0YR
 prFLAQC+dp1bV5ShBPfYJMCSUS7gmZEge01QrLTqcpyu8mO5GgD/YLUdD2Iebc8t
 AS1Awj1iec7AFtCWcd3bTeNZD7vL9w0=
 =j/oi
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.5-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:

 - Fix some ordering problems with log items during log recovery

 - Don't deadlock the system by trying to flush busy freed extents while
   holding on to busy freed extents

 - Improve validation of log geometry parameters when reading the
   primary superblock

 - Validate the length field in the AGF header

 - Fix recordset filtering bugs when re-calling GETFSMAP to return more
   results when the resultset didn't previously fit in the caller's
   buffer

 - Fix integer overflows in GETFSMAP when working with rt volumes larger
   than 2^32 fsblocks

 - Fix GETFSMAP reporting the undefined space beyond the last rtextent

 - Fix filtering bugs in GETFSMAP's log device backend if the log ever
   becomes longer than 2^32 fsblocks

 - Improve validation of file offsets in the GETFSMAP range parameters

 - Fix an off by one bug in the pmem media failure notification
   computation

 - Validate the length field in the AGI header too

* tag 'xfs-6.5-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Remove unneeded semicolon
  xfs: AGI length should be bounds checked
  xfs: fix the calculation for "end" and "length"
  xfs: fix xfs_btree_query_range callers to initialize btree rec fully
  xfs: validate fsmap offsets specified in the query keys
  xfs: fix logdev fsmap query result filtering
  xfs: clean up the rtbitmap fsmap backend
  xfs: fix getfsmap reporting past the last rt extent
  xfs: fix integer overflows in the fsmap rtbitmap and logdev backends
  xfs: fix interval filtering in multi-step fsmap queries
  xfs: fix bounds check in xfs_defer_agfl_block()
  xfs: AGF length has never been bounds checked
  xfs: journal geometry is not properly bounds checked
  xfs: don't block in busy flushing when freeing extents
  xfs: allow extent free intents to be retried
  xfs: pass alloc flags through to xfs_extent_busy_flush()
  xfs: use deferred frees for btree block freeing
  xfs: don't reverse order of items in bulk AIL insertion
  xfs: remove redundant initializations of pointers drop_leaf and save_leaf
2023-07-05 14:08:03 -07:00
Linus Torvalds
ace1ba1c90 pwm: Changes for v6.5-rc1
There's a little bit of everything in here: we've got various
 improvements and cleanups to drivers, some fixes across the board and a
 bit of new hardware support.
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEEiOrDCAFJzPfAjcif3SOs138+s6EFAmSlRkwZHHRoaWVycnku
 cmVkaW5nQGdtYWlsLmNvbQAKCRDdI6zXfz6zof9ZD/9PtA1GNgkM/FCJiIcIBXpx
 FVFf6y+gHx50g+O7JDbfcNxS3b5rcbJQOwhxtTNR+9orkYUWuOkiUqWf8bUZuXCt
 tjw7xW1LzDdLnxrarCD37T0aihYtKnKLxIQwdIEwS4nntaqJIdDdF62VDX+jdsbN
 YvXN1WrMbGFq77EgjbWlHv8Bsm/T1mWPi3CNsKi33LvS4flCC1Mq28YalGUmwpZo
 ZCJmUhPFwRbmaQ74yjUOa/UST18NwjQ/P6X25jiQ5Fjg8HZZH7RDJMduWQsT9520
 U7I7UtHyZiJSGXonHTTvyhUK55wrrTVJHaXxufqvQQPTpEMG1rtlgGnBjvi/4fxt
 vPIjWi46QCFtKzRFIaazHBiIEmbS+ahs8uSERmoBhlgynHyqe6/cZBdKwRp0kjSY
 jLhSiE38ODENdbQmyEWuCQqv1yWQfRwwm1/Y5B4/GEX9TRNMFfUDPTEIYKEPxPM2
 XI/FjMM+bP52t/rEPgCHL9E26tWv3SFPNYrafacLLfpol2iLdlAzp+ni3uz6Zj3H
 f3d6LyKjVAPc1R3ICdrEu+YHRA7K10wlOQwWwXsg57dJmLtKY7gbIKJjmGVYZX63
 od8/UUPzpF+eNHbrHFR1bAlwNO1XvPqHhPaQrqTQAV6zf3+VEovljLEGQmKz7yQP
 ROWkedAsD7rwkFKzz9zulg==
 =gWkY
 -----END PGP SIGNATURE-----

Merge tag 'pwm/for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
 "There's a little bit of everything in here: we've got various
  improvements and cleanups to drivers, some fixes across the board and
  a bit of new hardware support"

* tag 'pwm/for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (22 commits)
  dt-bindings: pwm: convert pwm-bcm2835 bindings to YAML
  pwm: Add Renesas RZ/G2L MTU3a PWM driver
  pwm: mtk_disp: Fix the disable flow of disp_pwm
  dt-bindings: pwm: restrict node name suffixes
  pwm: pca9685: Switch i2c driver back to use .probe()
  pwm: ab8500: Fix error code in probe()
  MAINTAINERS: add pwm to PolarFire SoC entry
  pwm: add microchip soft ip corePWM driver
  pwm: sysfs: Do not apply state to already disabled PWMs
  pwm: imx-tpm: force 'real_period' to be zero in suspend
  pwm: meson: make full use of common clock framework
  pwm: meson: don't use hdmi/video clock as mux parent
  pwm: meson: switch to using struct clk_parent_data for mux parents
  pwm: meson: remove not needed check in meson_pwm_calc
  pwm: meson: fix handling of period/duty if greater than UINT_MAX
  pwm: meson: modify and simplify calculation in meson_pwm_get_state
  dt-bindings: pwm: Add R-Car V3U device tree bindings
  dt-bindings: pwm: imx: add i.MX8QXP compatible
  pwm: mediatek: Add support for MT7981
  dt-bindings: pwm: mediatek: Add mediatek,mt7981 compatible
  ...
2023-07-05 12:55:06 -07:00
Linus Torvalds
b986158164 Devicetree updates for v6.5, part 2:
- Whitespace clean-ups in binding examples
 
 - Restrict node name suffixes to "-[0-9]+" for cases of multiple
   instances which don't have unit-addresses
 
 - Convert brcm,kona-wdt and cdns,wdt-r1p2 watchdog bindings to DT schema
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmSlqhkACgkQ+vtdtY28
 YcMIKQ/+J31iU4ILtNYDYIpwPHplcAKPnY/tNkAClCmIaL9+9MSTJssSu4eETVZ9
 vOt11TCPJ0UVxQnDGqTNYWzGe9Z76RqxAeceV2gkLb8zWSRRG1PV2wnjJ1bdOzKd
 NU0bgL9ow8QtCjEQp+Bf+IjgyJr2dzTuke/5EzaO87khkjSvFW/EZoH31mTzaqSU
 EO8pmHJScwj+Hf0MJuYBWxUcedsRaG824MIHPeQDn9OfHecv/Sq3Diccd9JJGLWB
 RREXynLo5BAARzbZ0Jvkkpn/UUnmUMWJpAJClx5YbDCDRnPce5uEpslxuxuyVSzT
 EgRe+z4+smlNlPryQRY/F9SiYP1BymeiQDXAVto+xSkzKU820Kag2cw2DlowtKoR
 GhyNnKFfqfg0zwfAynhuKDfec+1a367EAad9Vta0c1oZVqarqjD/tu6C0VC4dLPb
 NE2spUEkWw1G1is3867Yb0RAhi50E/Oq3smYIbWuQP9UDeiZGJemFCQx9ZpdUTY4
 lvu0DTQOrySXRkkeh1kjqHhYerbGn1fgwZTfsOJfWDwL3qgnm9DFx53vemsfnSjT
 l6xj5sB56LaxPEitkMzOp6mKFK5hXQMZ6HYD/YedmiZCFKl4KuPjdBqUVo6swlfQ
 PLRI0UpoMEFvJ44LL2oGGye9ZZgrCnkrEPJUg4clxLPqne8jJeg=
 =cRW/
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull more devicetree updates from Rob Herring:

 - Whitespace clean-ups in binding examples

 - Restrict node name suffixes to "-[0-9]+" for cases of multiple
   instances which don't have unit-addresses

 - Convert brcm,kona-wdt and cdns,wdt-r1p2 watchdog bindings to DT
   schema

* tag 'devicetree-for-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: soc: qcom: stats: Update maintainer email
  dt-bindings: cleanup DTS example whitespaces
  dt-bindings: timestamp: restrict node name suffixes
  dt-bindings: slimbus: restrict node name suffixes
  dt-bindings: watchdog: restrict node name suffixes
  dt-bindings: watchdog: brcm,kona-wdt: convert txt file to yaml
  dt-bindings: watchdog: cdns,wdt-r1p2: Convert cadence watchdog to yaml
2023-07-05 12:50:27 -07:00
Aravindhan Gunasekaran
84a192e461 igc: Handle PPS start time programming for past time values
I225/6 hardware can be programmed to start PPS output once
the time in Target Time registers is reached. The time
programmed in these registers should always be into future.
Only then PPS output is triggered when SYSTIM register
reaches the programmed value. There are two modes in i225/6
hardware to program PPS, pulse and clock mode.

There were issues reported where PPS is not generated when
start time is in past.

Example 1, "echo 0 0 0 2 0 > /sys/class/ptp/ptp0/period"

In the current implementation, a value of '0' is programmed
into Target time registers and PPS output is in pulse mode.
Eventually an interrupt which is triggered upon SYSTIM
register reaching Target time is not fired. Thus no PPS
output is generated.

Example 2, "echo 0 0 0 1 0 > /sys/class/ptp/ptp0/period"

Above case, a value of '0' is programmed into Target time
registers and PPS output is in clock mode. Here, HW tries to
catch-up the current time by incrementing Target Time
register. This catch-up time seem to vary according to
programmed PPS period time as per the HW design. In my
experiments, the delay ranged between few tens of seconds to
few minutes. The PPS output is only generated after the
Target time register reaches current time.

In my experiments, I also observed PPS stopped working with
below test and could not recover until module is removed and
loaded again.

1) echo 0 <future time> 0 1 0 > /sys/class/ptp/ptp1/period
2) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period
3) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period

After this PPS did not work even if i re-program with proper
values. I could only get this back working by reloading the
driver.

This patch takes care of calculating and programming
appropriate future time value into Target Time registers.

Fixes: 5e91c72e56 ("igc: Fix PPS delta between two synchronized end-points")
Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05 11:18:56 -07:00
Tan Tee Min
25102893e4 igc: Include the length/type field and VLAN tag in queueMaxSDU
IEEE 802.1Q does not have clear definitions of what constitutes an
SDU (Service Data Unit), but IEEE Std 802.3 clause 3.1.2 does define
the MAC service primitives and clause 3.2.7 does define the MAC Client
Data for Q-tagged frames.

It shows that the mac_service_data_unit (MSDU) does NOT contain the
preamble, destination and source address, or FCS. The MSDU does contain
the length/type field, MAC client data, VLAN tag and any padding
data (prior to the FCS).

Thus, the maximum 802.3 frame size that is allowed to be transmitted
should be QueueMaxSDU (MSDU) + 16 (6 byte SA + 6 byte DA + 4 byte FCS).

Fixes: 92a0dcb842 ("igc: offload queue max SDU from tc-taprio")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05 11:18:56 -07:00
Prasad Koya
9ac3fc2f42 igc: set TP bit in 'supported' and 'advertising' fields of ethtool_link_ksettings
set TP bit in the 'supported' and 'advertising' fields. i225/226 parts
only support twisted pair copper.

Fixes: 8c5ad0dae9 ("igc: Add ethtool support")
Signed-off-by: Prasad Koya <prasad@arista.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05 11:18:43 -07:00
Yinjun Zhang
cc7eab25b1 nfp: clean mc addresses in application firmware when closing port
When moving devices from one namespace to another, mc addresses are
cleaned in software while not removed from application firmware. Thus
the mc addresses are remained and will cause resource leak.

Now use `__dev_mc_unsync` to clean mc addresses when closing port.

Fixes: e20aa071cd ("nfp: fix schedule in atomic context when sync mc address")
Cc: stable@vger.kernel.org
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Message-ID: <20230705052818.7122-1-louis.peens@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-05 10:59:12 -07:00
Jakub Kicinski
fdaff05b4a bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZKWhbwAKCRDbK58LschI
 g4XPAPsE0BO25GLziHlTxhsXceSewwNzPTPbK/77Kl7jNVVF+wEAlh2og3OlUdiS
 4RtwqyIoP7P4XZ5ii3XGylYN7wBHHgE=
 =AIOh
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-07-05

We've added 2 non-merge commits during the last 1 day(s) which contain
a total of 3 files changed, 16 insertions(+), 4 deletions(-).

The main changes are:

1) Fix BTF to warn but not returning an error for a NULL BTF to still be
   able to load modules under CONFIG_DEBUG_INFO_BTF, from SeongJae Park.

2) Fix xsk sockets to honor SO_BINDTODEVICE in bind(), from Ilya Maximets.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  xsk: Honor SO_BINDTODEVICE on bind
  bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()
====================

Link: https://lore.kernel.org/r/20230705171716.6494-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-05 10:57:13 -07:00
Dragos Tatulea
7abd955a58 net/mlx5e: RX, Fix page_pool page fragment tracking for XDP
Currently mlx5e releases pages directly to the page_pool for XDP_TX and
does page fragment counting for XDP_REDIRECT. RX pages from the
page_pool are leaking on XDP_REDIRECT because the xdp core will release
only one fragment out of MLX5E_PAGECNT_BIAS_MAX and subsequently the page
is marked as "skip release" which avoids the driver release.

A fix would be to take an extra fragment for XDP_REDIRECT and not set the
"skip release" bit so that the release on the driver side can handle the
remaining bias fragments. But this would be a shortsighted solution.
Instead, this patch converges the two XDP paths (XDP_TX and XDP_REDIRECT) to
always do fragment tracking. The "skip release" bit is no longer
necessary for XDP.

Fixes: 6f57428460 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05 10:57:04 -07:00
Maher Sanalla
6496357aa5 net/mlx5: Query hca_cap_2 only when supported
On vport enable, where fw's hca caps are queried, the driver queries
hca_caps_2 without checking if fw truly supports them, causing a false
failure of vfs vport load and blocking SRIOV enablement on old devices
such as CX4 where hca_caps_2 support is missing.

Thus, add a check for the said caps support before accessing them.

Fixes: e5b9642a33 ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05 10:57:04 -07:00
Yevgeny Kliteynik
f7a485115a net/mlx5e: TC, CT: Offload ct clear only once
Non-clear CT action causes a flow rule split, while CT clear action
doesn't and is just a header-rewrite to the current flow rule.
But ct offload is done in post_parse and is per ct action instance,
so ct clear offload is parsed multiple times, while its deleted once.

Fix this by post_parsing the ct action only once per flow attribute
(which is per flow rule) by using a offloaded ct_attr flag.

Fixes: 08fe94ec5f ("net/mlx5e: TC, Remove special handling of CT action")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05 10:57:04 -07:00
Vlad Buslov
65e64640e9 net/mlx5e: Check for NOT_READY flag state after locking
Currently the check for NOT_READY flag is performed before obtaining the
necessary lock. This opens a possibility for race condition when the flow
is concurrently removed from unready_flows list by the workqueue task,
which causes a double-removal from the list and a crash[0]. Fix the issue
by moving the flag check inside the section protected by
uplink_priv->unready_flows_lock mutex.

[0]:
[44376.389654] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP
[44376.391665] CPU: 7 PID: 59123 Comm: tc Not tainted 6.4.0-rc4+ #1
[44376.392984] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[44376.395342] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core]
[44376.396857] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06
[44376.399167] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246
[44376.399680] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00
[44376.400337] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0
[44376.401001] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001
[44376.401663] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000
[44376.402342] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000
[44376.402999] FS:  00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000
[44376.403787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44376.404343] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0
[44376.405004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[44376.405665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[44376.406339] Call Trace:
[44376.406651]  <TASK>
[44376.406939]  ? die_addr+0x33/0x90
[44376.407311]  ? exc_general_protection+0x192/0x390
[44376.407795]  ? asm_exc_general_protection+0x22/0x30
[44376.408292]  ? mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core]
[44376.408876]  __mlx5e_tc_del_fdb_peer_flow+0xbc/0xe0 [mlx5_core]
[44376.409482]  mlx5e_tc_del_flow+0x42/0x210 [mlx5_core]
[44376.410055]  mlx5e_flow_put+0x25/0x50 [mlx5_core]
[44376.410529]  mlx5e_delete_flower+0x24b/0x350 [mlx5_core]
[44376.411043]  tc_setup_cb_reoffload+0x22/0x80
[44376.411462]  fl_reoffload+0x261/0x2f0 [cls_flower]
[44376.411907]  ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core]
[44376.412481]  ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core]
[44376.413044]  tcf_block_playback_offloads+0x76/0x170
[44376.413497]  tcf_block_unbind+0x7b/0xd0
[44376.413881]  tcf_block_setup+0x17d/0x1c0
[44376.414269]  tcf_block_offload_cmd.isra.0+0xf1/0x130
[44376.414725]  tcf_block_offload_unbind+0x43/0x70
[44376.415153]  __tcf_block_put+0x82/0x150
[44376.415532]  ingress_destroy+0x22/0x30 [sch_ingress]
[44376.415986]  qdisc_destroy+0x3b/0xd0
[44376.416343]  qdisc_graft+0x4d0/0x620
[44376.416706]  tc_get_qdisc+0x1c9/0x3b0
[44376.417074]  rtnetlink_rcv_msg+0x29c/0x390
[44376.419978]  ? rep_movs_alternative+0x3a/0xa0
[44376.420399]  ? rtnl_calcit.isra.0+0x120/0x120
[44376.420813]  netlink_rcv_skb+0x54/0x100
[44376.421192]  netlink_unicast+0x1f6/0x2c0
[44376.421573]  netlink_sendmsg+0x232/0x4a0
[44376.421980]  sock_sendmsg+0x38/0x60
[44376.422328]  ____sys_sendmsg+0x1d0/0x1e0
[44376.422709]  ? copy_msghdr_from_user+0x6d/0xa0
[44376.423127]  ___sys_sendmsg+0x80/0xc0
[44376.423495]  ? ___sys_recvmsg+0x8b/0xc0
[44376.423869]  __sys_sendmsg+0x51/0x90
[44376.424226]  do_syscall_64+0x3d/0x90
[44376.424587]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[44376.425046] RIP: 0033:0x7f045134f887
[44376.425403] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[44376.426914] RSP: 002b:00007ffd63a82b98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[44376.427592] RAX: ffffffffffffffda RBX: 000000006481955f RCX: 00007f045134f887
[44376.428195] RDX: 0000000000000000 RSI: 00007ffd63a82c00 RDI: 0000000000000003
[44376.428796] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[44376.429404] R10: 00007f0451208708 R11: 0000000000000246 R12: 0000000000000001
[44376.430039] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400
[44376.430644]  </TASK>
[44376.430907] Modules linked in: mlx5_ib mlx5_core act_mirred act_tunnel_key cls_flower vxlan dummy sch_ingress openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_g
ss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core]
[44376.433936] ---[ end trace 0000000000000000 ]---
[44376.434373] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core]
[44376.434951] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06
[44376.436452] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246
[44376.436924] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00
[44376.437530] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0
[44376.438179] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001
[44376.438786] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000
[44376.439393] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000
[44376.439998] FS:  00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000
[44376.440714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44376.441225] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0
[44376.441843] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[44376.442471] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: ad86755b18 ("net/mlx5e: Protect unready flows with dedicated lock")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05 10:57:03 -07:00
Saeed Mahameed
631079e08a net/mlx5: Register a unique thermal zone per device
Prior to this patch only one "mlx5" thermal zone could have been
registered regardless of the number of individual mlx5 devices in the
system.

To fix this setup a unique name per device to register its own thermal
zone.

In order to not register a thermal zone for a virtual device (VF/SF) add
a check for PF device type.

The new name is a concatenation between "mlx5_" and "<PCI_DEV_BDF>", which
will also help associating a thermal zone with its PCI device.

$ lspci | grep ConnectX
00:04.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
00:05.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

$ cat /sys/devices/virtual/thermal/thermal_zone0/type
mlx5_0000:00:04.0
$ cat /sys/devices/virtual/thermal/thermal_zone1/type
mlx5_0000:00:05.0

Fixes: c1fef618d6 ("net/mlx5: Implement thermal zone")
CC: Sandipan Patra <spatra@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05 10:57:03 -07:00