If there is a failure during rtw89_fw_h2c_raw() rtw89_debug_priv_send_h2c
should return negative error code instead of a positive value count.
Fix this bug by returning correct error code.
Fixes: e3ec7017f6 ("rtw89: add Realtek 802.11ax driver")
Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/tencent_AD09A61BC4DA92AD1EB0790F5C850E544D07@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For some device types like TXGBE_ID_XAUI, *checksum computed in
txgbe_calc_eeprom_checksum() is larger than TXGBE_EEPROM_SUM. Remove the
limit on the size of *checksum.
Fixes: 049fe53653 ("net: txgbe: Add operations to interact with firmware")
Fixes: 5e2ea7801f ("net: txgbe: Fix unsigned comparison to zero in txgbe_calc_eeprom_checksum()")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://lore.kernel.org/r/20230711063414.3311-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet says[1]:
-------
Speaking of psched_mtu(), I see that net/sched/sch_pie.c is using it
without holding RTNL, so dev->mtu can be changed underneath.
KCSAN could issue a warning.
-------
Annotate dev->mtu with READ_ONCE() so KCSAN don't issue a warning.
[1] https://lore.kernel.org/all/CANn89iJoJO5VtaJ-2=_d2aOQhb0Xw8iBT_Cxqp2HyuS-zj6azw@mail.gmail.com/
v1 -> v2: Fix commit message
Fixes: d4b36210c2 ("net: pkt_sched: PIE AQM scheme")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230711021634.561598-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ENA adapters on our instances occasionally reset. Once recently
logged a UBSAN failure to console in the process:
UBSAN: shift-out-of-bounds in build/linux/drivers/net/ethernet/amazon/ena/ena_com.c:540:13
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 28 PID: 70012 Comm: kworker/u72:2 Kdump: loaded not tainted 5.15.117
Hardware name: Amazon EC2 c5d.9xlarge/, BIOS 1.0 10/16/2017
Workqueue: ena ena_fw_reset_device [ena]
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x36
__ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e
? __const_udelay+0x43/0x50
ena_delay_exponential_backoff_us.cold+0x16/0x1e [ena]
wait_for_reset_state+0x54/0xa0 [ena]
ena_com_dev_reset+0xc8/0x110 [ena]
ena_down+0x3fe/0x480 [ena]
ena_destroy_device+0xeb/0xf0 [ena]
ena_fw_reset_device+0x30/0x50 [ena]
process_one_work+0x22b/0x3d0
worker_thread+0x4d/0x3f0
? process_one_work+0x3d0/0x3d0
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
Apparently, the reset delays are getting so large they can trigger a
UBSAN panic.
Looking at the code, the current timeout is capped at 5000us. Using a
base value of 100us, the current code will overflow after (1<<29). Even
at values before 32, this function wraps around, perhaps
unintentionally.
Cap the value of the exponent used for this backoff at (1<<16) which is
larger than currently necessary, but large enough to support bigger
values in the future.
Cc: stable@vger.kernel.org
Fixes: 4bb7f4cf60 ("net: ena: reduce driver load time")
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230711013621.GE1926@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The simple_write_to_buffer() function is designed to handle partial
writes. It returns negatives on error, otherwise it returns the number
of bytes that were able to be copied. This code doesn't check the
return properly. We only know that the first byte is written, the rest
of the buffer might be uninitialized.
There is no need to use the simple_write_to_buffer() function.
Partial writes are prohibited by the "if (*ppos != 0)" check at the
start of the function. Just use memdup_user() and copy the whole
buffer.
Fixes: d3cbb907ae ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/7c1f950b-3a7d-4252-82a6-876e53078ef7@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The kernel does not currently validate that both the minimum and maximum
ports of a port range are specified. This can lead user space to think
that a filter matching on a port range was successfully added, when in
fact it was not. For example, with a patched (buggy) iproute2 that only
sends the minimum port, the following commands do not return an error:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
# tc filter show dev swp1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
filter protocol ip pref 1 flower chain 0 handle 0x2
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 2 ref 1 bind 1
Fix by returning an error unless both ports are specified:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
Error: Both min and max source ports must be specified.
We have an error talking to the kernel
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
Error: Both min and max destination ports must be specified.
We have an error talking to the kernel
Fixes: 5c72299fba ("net: sched: cls_flower: Classify packets using port ranges")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
igc: Fix corner cases for TSN offload
Florian Kauer says:
The igc driver supports several different offloading capabilities
relevant in the TSN context. Recent patches in this area introduced
regressions for certain corner cases that are fixed in this series.
Each of the patches (except the first one) addresses a different
regression that can be separately reproduced. Still, they have
overlapping code changes so they should not be separately applied.
Especially #4 and #6 address the same observation,
but both need to be applied to avoid TX hang occurrences in
the scenario described in the patches.
====================
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
linux-arm-msm is the list most people subscribe to in order to receive
updates about Qualcomm related drivers. Make sure changes for the
Qualcomm ethernet driver make it there.
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20230710195240.197047-1-ahalaney@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is equivalent to 'gen2', and it was always confusing to have
two identical config entries. The split config patch actually had
been originally developed after removing 'use_tfh" and didn't add
the use_tfh in the new configs as they'd later been copied to the
new files. Thus the easiest way to fix the init crash here now is
to just remove use_tfh (which is erroneously unset in most of the
configs now) and use 'gen2' in the code instead.
There's possibly still an unwind error in iwl_txq_gen2_init() as
it crashes if TXQ 0 fails to initialize, but we can deal with it
later since the original failure is due to the use_tfh confusion.
Tested-by: Xi Ruoyao <xry111@xry111.site>
Reported-and-tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Reported-and-tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217622
Link: https://lore.kernel.org/all/9274d9bd3d080a457649ff5addcc1726f08ef5b2.camel@xry111.site/
Link: https://lore.kernel.org/all/CAAJw_Zug6VCS5ZqTWaFSr9sd85k%3DtyPm9DEE%2BmV%3DAKoECZM%2BsQ@mail.gmail.com/
Fixes: 19898ce9cf ("wifi: iwlwifi: split 22000.c into multiple files")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230710145038.84186-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Quieten a gcc (11.3.0) build error or warning by checking the function
call status and returning -EBUSY if the function call failed.
This is similar to what several other wireless drivers do for the
SIOCGIWRATE ioctl call when there is a locking problem.
drivers/net/wireless/cisco/airo.c: error: 'status_rid.currentXmitRate' is used uninitialized [-Werror=uninitialized]
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/39abf2c7-24a-f167-91da-ed4c5435d1c4@linux-m68k.org
Link: https://lore.kernel.org/r/20230709133154.26206-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Due to hardware limitation, MCAM drop rule with
ether_type == 802.1Q and vlan_id == 0 is not supported. Hence rejecting
such rules.
Fixes: dce677da57 ("octeontx2-pf: Add vlan-etype to ntuple filters")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20230710103027.2244139-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Remove unnecessary of_node_put from the continue path to prevent
child node from being released twice, which could avoid resource
leak or other unexpected issues.
Signed-off-by: Lu Hongfei <luhongfei@vivo.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Fixes: de879a016a ("net: dsa: felix: add functionality when not all ports are supported")
Link: https://lore.kernel.org/r/20230710031859.36784-1-luhongfei@vivo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Fang says:
====================
net: fec: fix some issues of ndo_xdp_xmit()
We encountered some issues when testing the ndo_xdp_xmit() interface
of the fec driver on i.MX8MP and i.MX93 platforms. These issues are
easy to reproduce, and the specific reproduction steps are as follows.
step1: The ethernet port of a board (board A) is connected to the EQOS
port of i.MX8MP/i.MX93, and the FEC port of i.MX8MP/i.MX93 is connected
to another ethernet port, such as a switch port.
step2: Board A uses the pktgen_sample03_burst_single_flow.sh to generate
and send packets to i.MX8MP/i.MX93. The command is shown below.
./pktgen_sample03_burst_single_flow.sh -i eth0 -d 192.168.6.8 -m \
56:bf:0d:68:b0:9e -s 1500
step3: i.MX8MP/i.MX93 use the xdp_redirect bfp program to redirect the
XDP frames from EQOS port to FEC port. The command is shown below.
./xdp_redirect eth1 eth0
After a few moments, the warning or error logs will be printed in the
console, for more details, please refer to the commit message of each
patch.
====================
Link: https://lore.kernel.org/r/20230706081012.2278063-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In the case of heavy XDP traffic to be transmitted, the console
will print the error log continuously if there are lack of enough
BDs to accommodate the frames. The log looks like below.
[ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
Not only will this log be replicated and redundant, it will also
degrade XDP performance. So we use netdev_err_once() instead of
netdev_err() now.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When the XDP feature is enabled and with heavy XDP frames to be
transmitted, there is a considerable probability that available
tx BDs are insufficient. This will lead to some XDP frames to be
discarded and the "NOT enough BD for SG!" error log will appear
in the console (as shown below).
[ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
In the case of heavy XDP traffic, sometimes the speed of recycling
tx BDs may be slower than the speed of sending XDP frames. There
may be several specific reasons, such as the interrupt is not
responsed in time, the efficiency of the NAPI callback function is
too low due to all the queues (tx queues and rx queues) share the
same NAPI, and so on.
After trying various methods, I think that increase the size of tx
BD ring is simple and effective. Maybe the best resolution is that
allocate NAPI for each queue to improve the efficiency of the NAPI
callback, but this change is a bit big and I didn't try this method.
Perheps this method will be implemented in a future patch.
This patch also updates the tx_wake_threshold of tx ring which is
related to the size of tx ring in the previous logic. Otherwise,
the tx_wake_threshold will be too high (403 BDs), which is more
likely to impact the slow path in the case of heavy XDP traffic,
because XDP path and slow path share the tx BD rings. According
to Jakub's suggestion, the tx_wake_threshold is at least equal to
tx_stop_threshold + 2 * MAX_SKB_FRAGS, if a queue of hundreds of
entries is overflowing, we should be able to apply a hysteresis
of a few tens of entries.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Once the XDP frames have been successfully transmitted through the
ndo_xdp_xmit() interface, it's the driver responsibility to free
the frames so that the page_pool can recycle the pages and reuse
them. However, this action is not implemented in the fec driver.
This leads to a user-visible problem that the console will print
the following warning log.
[ 157.568851] page_pool_release_retry() stalled pool shutdown 1389 inflight 60 sec
[ 217.983446] page_pool_release_retry() stalled pool shutdown 1389 inflight 120 sec
[ 278.399006] page_pool_release_retry() stalled pool shutdown 1389 inflight 181 sec
[ 338.812885] page_pool_release_retry() stalled pool shutdown 1389 inflight 241 sec
[ 399.226946] page_pool_release_retry() stalled pool shutdown 1389 inflight 302 sec
Therefore, to solve this issue, we free XDP frames via xdp_return_frame()
while cleaning the tx BD ring.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When a XDP program is installed or uninstalled, fec_restart() will
be invoked to reset MAC and buffer descriptor rings. It's reasonable
not to transmit any packet during the process of reset. However, the
NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by default,
that is to say, it's possible that the fec_enet_xdp_xmit() will be
invoked even if the process of reset is not finished. In this case,
the redirected XDP frames might be dropped and available transmit BDs
may be incorrectly deemed insufficient. So this patch disable the
NETDEV_XDP_ACT_NDO_XMIT feature by default and dynamically configure
this feature when the bpf program is installed or uninstalled.
Fixes: e4ac7cc6e5 ("net: fec: turn on XDP features")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The insertion of an empty frame was introduced with
commit db0b124f02 ("igc: Enhance Qbv scheduling by using first flag bit")
in order to ensure that the current cycle has at least one packet if
there is some packet to be scheduled for the next cycle.
However, the current implementation does not properly check if
a packet is already scheduled for the current cycle. Currently,
an empty packet is always inserted if and only if
txtime >= end_of_cycle && txtime > last_tx_cycle
but since last_tx_cycle is always either the end of the current
cycle (end_of_cycle) or the end of a previous cycle, the
second part (txtime > last_tx_cycle) is always true unless
txtime == last_tx_cycle.
What actually needs to be checked here is if the last_tx_cycle
was already written within the current cycle, so an empty frame
should only be inserted if and only if
txtime >= end_of_cycle && end_of_cycle > last_tx_cycle.
This patch does not only avoid an unnecessary insertion, but it
can actually be harmful to insert an empty packet if packets
are already scheduled in the current cycle, because it can lead
to a situation where the empty packet is actually processed
as the first packet in the upcoming cycle shifting the packet
with the first_flag even one cycle into the future, finally leading
to a TX hang.
The TX hang can be reproduced on a i225 with:
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <32>
TDT <3c>
next_to_use <3c>
next_to_clean <32>
buffer_info[next_to_clean]
time_stamp <ffff26a8>
next_to_watch <00000000632a1828>
jiffies <ffff27f8>
desc.status <1048000>
Fixes: db0b124f02 ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
It is possible (verified on a running system) that frames are processed
by igc_tx_launchtime with a txtime before the start of the cycle
(baset_est).
However, the result of txtime - baset_est is written into a u32,
leading to a wrap around to a positive number. The following
launchtime > 0 check will only branch to executing launchtime = 0
if launchtime is already 0.
Fix it by using a s32 before checking launchtime > 0.
Fixes: db0b124f02 ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The flags IGC_TXQCTL_STRICT_CYCLE and IGC_TXQCTL_STRICT_END
prevent the packet transmission over slot and cycle boundaries.
This is important for taprio offload where the slots and
cycles correspond to the slots and cycles configured for the
network.
However, the Qbv offload feature of the i225 is also used for
enabling TX launchtime / ETF offload. In that case, however,
the cycle has no meaning for the network and is only used
internally to adapt the base time register after a second has
passed.
Enabling strict mode in this case would unnecessarily prevent
the transmission of certain packets (i.e. at the boundary of a
second) and thus interferes with the ETF qdisc that promises
transmission at a certain point in time.
Similar to ETF, this also applies to CBS offload that also should
not be influenced by strict mode unless taprio offload would be
enabled at the same time.
This fully reverts
commit d8f45be01d ("igc: Use strict cycles for Qbv scheduling")
but its commit message only describes what was already implemented
before that commit. The difference to a plain revert of that commit
is that it now copes with the base_time = 0 case that was fixed with
commit e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
In particular, enabling strict mode leads to TX hang situations
under high traffic if taprio is applied WITHOUT taprio offload
but WITH ETF offload, e.g. as in
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <d0>
TDT <f0>
next_to_use <f0>
next_to_clean <d0>
buffer_info[next_to_clean]
time_stamp <ffff661f>
next_to_watch <00000000245a4efb>
jiffies <ffff6e48>
desc.status <1048000>
Fixes: d8f45be01d ("igc: Use strict cycles for Qbv scheduling")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Since commit e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
it is possible to enable taprio offload with a basetime of 0.
However, the check if taprio offload is already enabled (and thus -EALREADY
should be returned for igc_save_qbv_schedule) still relied on
adapter->base_time > 0.
This can be reproduced as follows:
# TAPRIO offload (flags == 0x2) and base-time = 0
sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x2
# The second call should fail with "Error: Device failed to setup taprio offload."
# But that only happens if base-time was != 0
sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x2
Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Only set adapter->taprio_offload_enable after validating the arguments.
Otherwise, it stays set even if the offload was not enabled.
Since the subsequent code does not get executed in case of invalid
arguments, it will not be read at first.
However, by activating and then deactivating another offload
(e.g. ETF/TX launchtime offload), taprio_offload_enable is read
and erroneously keeps the offload feature of the NIC enabled.
This can be reproduced as follows:
# TAPRIO offload (flags == 0x2) and negative base-time leading to expected -ERANGE
sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time -1000 \
sched-entry S 01 300000 \
flags 0x2
# IGC_TQAVCTRL is 0x0 as expected (iomem=relaxed for reading register)
sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1
# Activate ETF offload
sudo tc qdisc replace dev enp1s0 parent root handle 6666 mqprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
hw 0
sudo tc qdisc add dev enp1s0 parent 6666:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload
# IGC_TQAVCTRL is 0x9 as expected
sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1
# Deactivate ETF offload again
sudo tc qdisc delete dev enp1s0 parent 6666:1
# IGC_TQAVCTRL should now be 0x0 again, but is observed as 0x9
sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1
Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In the current implementation the flags adapter->qbv_enable
and IGC_FLAG_TSN_QBV_ENABLED have a similar name, but do not
have the same meaning. The first one is used only to indicate
taprio offload (i.e. when igc_save_qbv_schedule was called),
while the second one corresponds to the Qbv mode of the hardware.
However, the second one is also used to support the TX launchtime
feature, i.e. ETF qdisc offload. This leads to situations where
adapter->qbv_enable is false, but the flag IGC_FLAG_TSN_QBV_ENABLED
is set. This is prone to confusion.
The rename should reduce this confusion. Since it is a pure
rename, it has no impact on functionality.
Fixes: e17090eb24 ("igc: allow BaseTime 0 enrollment for Qbv")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Current codebase contained the usage of two different names for this
driver (i.e., `gvnic` and `gve`), which is quite unfriendly for users
to use, especially when trying to bind or unbind the driver manually.
The corresponding kernel module is registered with the name of `gve`.
It's more reasonable to align the name of the driver with the module.
Fixes: 893ce44df5 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Cc: csully@google.com
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since return value of -errno
is used to check for truncation instead of sizeof(dest).
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add check for the return value of skb_copy in order to avoid NULL pointer
dereference.
Fixes: 2cd5485663 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The blamed commit introduces usage of fixed_phy_register() but
not a corresponding dependency on FIXED_PHY.
This can result in a build failure.
s390-linux-ld: drivers/net/ethernet/microchip/lan743x_main.o: in function `lan743x_phy_open':
drivers/net/ethernet/microchip/lan743x_main.c:1514: undefined reference to `fixed_phy_register'
Fixes: 624864fbff ("net: lan743x: add fixed phy support for LAN7431 device")
Cc: stable@vger.kernel.org
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/netdev/725bf1c5-b252-7d19-7582-a6809716c7d6@infradead.org/
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now in addrconf_mod_rs_timer(), reference idev depends on whether
rs_timer is not pending. Then modify rs_timer timeout.
There is a time gap in [1], during which if the pending rs_timer
becomes not pending. It will miss to hold idev, but the rs_timer
is activated. Thus rs_timer callback function addrconf_rs_timer()
will be executed and put idev later without holding idev. A refcount
underflow issue for idev can be caused by this.
if (!timer_pending(&idev->rs_timer))
in6_dev_hold(idev);
<--------------[1]
mod_timer(&idev->rs_timer, jiffies + when);
To fix the issue, hold idev if mod_timer() return 0.
Fixes: b7b1bfce0b ("ipv6: split duplicate address detection and router solicitation timer")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Klein reported that udp6_ehash_secret was initialized but never used.
Fixes: 1bbdceef1e ("inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once")
Reported-by: Amit Klein <aksecurity@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Schnelle says:
====================
s390/ism: Fixes to client handling
This is v2 of the patch previously titled "s390/ism: Detangle ISM client
IRQ and event forwarding". As suggested by Paolo Abeni I split the patch
up. While doing so I noticed another problem that was fixed by this patch
concerning the way the workqueues access the client structs. This means the
second patch turning the workqueues into simple direct calls also fixes
a problem. Finally I split off a third patch just for fixing
ism_unregister_client()s error path.
The code after these 3 patches is identical to the result of the v1 patch
except that I also turned the dev_err() for still registered DMBs into
a WARN().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When ism_unregister_client() is called but the client still has DMBs
registered it returns -EBUSY and prints an error. This only happens
after the client has already been unregistered however. This is
unexpected as the unregister claims to have failed. Furthermore as this
implies a client bug a WARN() is more appropriate. Thus move the
deregistration after the check and use WARN().
Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously the clients_lock was protecting the clients array against
concurrent addition/removal of clients but was also accessed from IRQ
context. This meant that it had to be a spinlock and that the add() and
remove() callbacks in which clients need to do allocation and take
mutexes can't be called under the clients_lock. To work around this these
callbacks were moved to workqueues. This not only introduced significant
complexity but is also subtly broken in at least one way.
In ism_dev_init() and ism_dev_exit() clients[i]->tgt_ism is used to
communicate the added/removed ISM device to the work function. While
write access to client[i]->tgt_ism is protected by the clients_lock and
the code waits that there is no pending add/remove work before and after
setting clients[i]->tgt_ism this is not enough. The problem is that the
wait happens based on per ISM device counters. Thus a concurrent
ism_dev_init()/ism_dev_exit() for a different ISM device may overwrite
a clients[i]->tgt_ism between unlocking the clients_lock and the
subsequent wait for the work to finnish.
Thankfully with the clients_lock no longer held in IRQ context it can be
turned into a mutex which can be held during the calls to add()/remove()
completely removing the need for the workqueues and the associated
broken housekeeping including the per ISM device counters and the
clients[i]->tgt_ism.
Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The clients array references all registered clients and is protected by
the clients_lock. Besides its use as general list of clients the clients
array is accessed in ism_handle_irq() to forward ISM device events to
clients.
While the clients_lock is taken in the IRQ handler when calling
handle_event() it is however incorrectly not held during the
client->handle_irq() call and for the preceding clients[] access leaving
it unprotected against concurrent client (un-)registration.
Furthermore the accesses to ism->sba_client_arr[] in ism_register_dmb()
and ism_unregister_dmb() are not protected by any lock. This is
especially problematic as the client ID from the ism->sba_client_arr[]
is not checked against NO_CLIENT and neither is the client pointer
checked.
Instead of expanding the use of the clients_lock further add a separate
array in struct ism_dev which references clients subscribed to the
device's events and IRQs. This array is protected by ism->lock which is
already taken in ism_handle_irq() and can be taken outside the IRQ
handler when adding/removing subscribers or the accessing
ism->sba_client_arr[]. This also means that the clients_lock is no
longer taken in IRQ context.
Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turning IRQs off is done by accessing Ethernet controller registers.
That can't be done until device's clock is enabled. It results in a SoC
hang otherwise.
This bug remained unnoticed for years as most bootloaders keep all
Ethernet interfaces turned on. It seems to only affect a niche SoC
family BCM47189. It has two Ethernet controllers but CFE bootloader uses
only the first one.
Fixes: 34322615cb ("net: bgmac: Mask interrupts during probe")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tracepoint has existed for 12 years, but it only covered udp
over the legacy IPv4 protocol. Having it enabled for udp6 removes
the unnecessary difference in error visibility.
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Fixes: 296f7ea75b ("udp: add tracepoints for queueing skb to rcvbuf")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the probe error path code that leaves the driver bound
to the device, but with essentially a dead device. This was
useful maybe twice early in the driver's life and no longer
makes sense to keep.
Fixes: 30a1e6d0f8 ("ionic: keep ionic dev on lif init fail")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unnecessary early code development check and the WARN_ON
that it uses. The irq alloc and free paths have long been
cleaned up and this check shouldn't have stuck around so long.
Fixes: 77ceb68e29 ("ionic: Add notifyq support")
Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved PTP pointer validation before its use to avoid smatch warning.
Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree.
Fixes: 2ef4e45d99 ("octeontx2-af: Add PTP PPS Errata workaround on CN10K silicon")
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In legacy silicon, promiscuous mode is only modified
through CGX mbox messages. In CN10KB silicon, it is modified
from CGX mbox and NIX. This breaks legacy application
behaviour. Fix this by removing call from NIX.
Fixes: d6c9784baf ("octeontx2-af: Invoke exact match functions if supported")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-05 (igc)
This series contains updates to igc driver only.
Husaini adds check to increment Qbv change error counter only on taprio
Qbvs. He also removes delay during Tx ring configuration and
resolves Tx hang that could occur when transmitting on a gate to be
closed.
Prasad Koya reports ethtool link mode as TP (twisted pair).
Tee Min corrects value for max SDU.
Aravindhan ensures that registers for PPS are always programmed to occur
in future.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Current duplex mode was unset in the driver, resulting in the default
parameter being set to 0, which corresponds to half duplex. It might
mislead users to have incorrect expectation about the driver's
transmission capabilities.
Set the default duplex configuration to full, as the driver runs in
full duplex mode at this point.
Fixes: 7e074d5a76 ("gve: Enable Link Speed Reporting in the driver.")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Message-ID: <20230706044128.2726747-1-junfeng.guo@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-05 (ice)
This series contains updates to ice driver only.
Sridhar fixes incorrect comparison of max Tx rate limit to occur against
each TC value rather than the aggregate. He also resolves an issue with
the wrong VSI being used when setting max Tx rate when TCs are enabled.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix tx queue rate limit when TCs are configured
ice: Fix max_rate check while configuring TX rate limits
====================
Link: https://lore.kernel.org/r/20230705201346.49370-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmSlrvAACgkQSD+KveBX
+j4EcggAsZlHQHRaA9re/5Fr8VV/YNmf+eLM6/2F6CD1sLcwWEsmDXkqZpL/wqVv
bcP/dE/ehiMd1FI6XmEb9aGZY29mQBoItwCnxeUrK75d8CQib/pH87VkQdT9Uf6W
RVqPk2rOL778sTy6V/UZDecfmZB5XfEoOO6f0YP/j2t8HxmfdetBN0orE0iRQBmO
+0W8X5bIfCmGyWdQzOJB4a+S7Wi1JACr6CIOdNtADuZ7C7kKiPWrvEl7DMHlyX3Y
TyrE//piSJuxaHwdMNBHPsXK5ZYyn7ZJjqeCA+Kd+lKMJlJEO7R+TkTOP0RKE++a
ZdDetCqLKegQPpWZgjGzei7PpSpaFA==
=cNgk
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2023-07-05
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: RX, Fix page_pool page fragment tracking for XDP
net/mlx5: Query hca_cap_2 only when supported
net/mlx5e: TC, CT: Offload ct clear only once
net/mlx5e: Check for NOT_READY flag state after locking
net/mlx5: Register a unique thermal zone per device
net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq
net/mlx5e: fix memory leak in mlx5e_ptp_open
net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create
net/mlx5e: fix double free in mlx5e_destroy_flow_table
====================
Link: https://lore.kernel.org/r/20230705175757.284614-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the event of a failure in tcf_change_indev(), fw_set_parms() will
immediately return an error after incrementing or decrementing
reference counter in tcf_bind_filter(). If attacker can control
reference counter to zero and make reference freed, leading to
use after free.
In order to prevent this, move the point of possible failure above the
point where the TC_FW_CLASSID is handled.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: M A Ramdhan <ramdhan@starlabs.sg>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Message-ID: <20230705161530.52003-1-ramdhan@starlabs.sg>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For some cases as below, we may encounter the unpreditable chip stats
in driver probe()
* The system reboot flow do not work properly, such as kernel oops while
rebooting, and then the driver do not go back to default status at
this moment.
* Similar to the flow above. If the device was enabled in BIOS or UEFI,
the system may switch to Linux without driver fully shutdown.
To avoid the problem, force push the device back to default in probe()
* mt7921e_mcu_fw_pmctrl() : return control privilege to chip side.
* mt7921_wfsys_reset() : cleanup chip config before resource init.
Error log
[59007.600714] mt7921e 0000:02:00.0: ASIC revision: 79220010
[59010.889773] mt7921e 0000:02:00.0: Message 00000010 (seq 1) timeout
[59010.889786] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59014.217839] mt7921e 0000:02:00.0: Message 00000010 (seq 2) timeout
[59014.217852] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59017.545880] mt7921e 0000:02:00.0: Message 00000010 (seq 3) timeout
[59017.545893] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59020.874086] mt7921e 0000:02:00.0: Message 00000010 (seq 4) timeout
[59020.874099] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59024.202019] mt7921e 0000:02:00.0: Message 00000010 (seq 5) timeout
[59024.202033] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59027.530082] mt7921e 0000:02:00.0: Message 00000010 (seq 6) timeout
[59027.530096] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59030.857888] mt7921e 0000:02:00.0: Message 00000010 (seq 7) timeout
[59030.857904] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59034.185946] mt7921e 0000:02:00.0: Message 00000010 (seq 8) timeout
[59034.185961] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59037.514249] mt7921e 0000:02:00.0: Message 00000010 (seq 9) timeout
[59037.514262] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59040.842362] mt7921e 0000:02:00.0: Message 00000010 (seq 10) timeout
[59040.842375] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59040.923845] mt7921e 0000:02:00.0: hardware init failed
Cc: stable@vger.kernel.org
Fixes: 5c14a5f944 ("mt76: mt7921: introduce mt7921e support")
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Juan Martinez <juan.martinez@amd.com>
Co-developed-by: Leon Yen <leon.yen@mediatek.com>
Signed-off-by: Leon Yen <leon.yen@mediatek.com>
Signed-off-by: Quan Zhou <quan.zhou@mediatek.com>
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Message-ID: <39fcb7cee08d4ab940d38d82f21897483212483f.1688569385.git.deren.wu@mediatek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean says:
====================
Fix dropping of oversize preemptible frames with felix DSA driver
It has been reported that preemptible traffic doesn't completely behave
as expected. Namely, large packets should be able to be squeezed
(through fragmentation) through taprio time slots smaller than the
transmission time of the full frame. That does not happen due to logic
in the driver (for oversize frame dropping with taprio) that was not
updated in order for this use case to work.
I am not sure whether it qualifies as "net" material, because some
structural changes are involved, and it is a "never worked" scenario.
OTOH, this is a complaint coming from users for a v6.4 kernel.
It's up to maintainers to decide whether this series can be considered;
I've submitted it as non-RFC in the optimistic case that it will be :)
Demo script illustrating the issue below.
add_taprio()
{
local ifname=$1
echo "Creating root taprio"
tc qdisc replace dev $ifname handle 8001: parent root stab overhead 24 taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 \
sched-entry S 01 1216 \
sched-entry S fe 12368 \
fp P E E E E E E E \
flags 0x2
}
remove_taprio()
{
local ifname=$1
echo "Removing taprio"
tc qdisc del dev $ifname root
}
ip netns add ns0
ip link set eno0 netns ns0 && ip -n ns0 link set eno0 up && ip -n ns0 addr add 192.168.100.1/24 dev eno0
ip addr add 192.168.100.2/24 dev swp0 && ip link set swp0 up
ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled on verify-enabled off tx-enabled on
ethtool --set-mm swp0 pmac-enabled on verify-enabled off tx-enabled on
add_taprio swp0
ping 192.168.100.1 -s 1000 -c 5 # sent through TC0
ethtool -I --show-mm swp0 | grep MACMergeFragCountTx # should increase
ip addr flush swp0 && ip link set swp0 down
remove_taprio swp0
ethtool --set-mm swp0 pmac-enabled off verify-enabled off tx-enabled off
ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled off verify-enabled off tx-enabled off
ip netns del ns0
====================
Link: https://lore.kernel.org/r/20230705104422.49025-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>