In order to benefit from the new napi_defer_hard_irqs feature,
we need to use napi_complete_done() variant in this driver.
RX path is already using it, this patch implements TX completion side.
mlx4_en_process_tx_cq() now returns the amount of retired packets,
instead of a boolean, so that mlx4_en_poll_tx_cq() can pass
this value to napi_complete_done().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gro_flush_timeout and napi_defer_hard_irqs can be read
from napi_complete_done() while other cpus write the value,
whithout explicit synchronization.
Use READ_ONCE()/WRITE_ONCE() to annotate the races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in commit 3b47d30396 ("net: gro: add a per device gro flush timer")
we added the ability to arm one high resolution timer, that we used
to keep not-complete packets in GRO engine a bit longer, hoping that further
frames might be added to them.
Since then, we added the napi_complete_done() interface, and commit
364b605573 ("net: busy-poll: return busypolling status to drivers")
allowed drivers to avoid re-arming NIC interrupts if we made a promise
that their NAPI poll() handler would be called in the near future.
This infrastructure can be leveraged, thanks to a new device parameter,
which allows to arm the napi hrtimer, instead of re-arming the device
hard IRQ.
We have noticed that on some servers with 32 RX queues or more, the chit-chat
between the NIC and the host caused by IRQ delivery and re-arming could hurt
throughput by ~20% on 100Gbit NIC.
In contrast, hrtimers are using local (percpu) resources and might have lower
cost.
The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy
than gro_flush_timeout (/sys/class/net/ethX/)
By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.
This patch does not change the prior behavior of gro_flush_timeout
if used alone : NIC hard irqs should be rearmed as before.
One concrete usage can be :
echo 20000 >/sys/class/net/eth1/gro_flush_timeout
echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs
If at least one packet is retired, then we will reset napi counter
to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
of the queue.
On busy queues, this should avoid NIC hard IRQ, while before this patch IRQ
avoidance was only possible if napi->poll() was exhausting its budget
and not call napi_complete_done().
This feature also can be used to work around some non-optimal NIC irq
coalescing strategies.
Having the ability to insert XX usec delays between each napi->poll()
can increase cache efficiency, since we increase batch sizes.
It also keeps serving cpus not idle too long, reducing tail latencies.
Co-developed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the absence of MC1, the size calculation function
cudbg_mem_region_size() was returing wrong MC size and
resulted in adapter crash. This patch adds new argument
to cudbg_mem_region_size() which will have actual size
and returns error to caller in the absence of MC1.
Fixes: a1c69520f7 ("cxgb4: collect MC memory dump")
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>"
Signed-off-by: David S. Miller <davem@davemloft.net>
driver already had psp_firmware_header struture to
deal with different layout of sos ucode. the sos
micorcode initialization could be common one.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
vmr ring is dedicated for sriov vf (i.e.guest driver
in sriov), which is general communication interface
between driver and psp fw accross all ip version.
it is not correct to make it as ip specific callback.
it is even worse to check specific tOS version per IP
version (like psp_v11/v12).
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sabrina Dubroca says:
====================
net: vxlan/geneve: use the correct nlattr array for extack
The ->validate callbacks for vxlan and geneve have a couple of typos
in extack, where the nlattr array for IFLA_* attributes is used
instead of the link-specific one.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
IFLA_GENEVE_* attributes are in the data array, which is correctly
used when fetching the value, but not when setting the extended
ack. Because IFLA_GENEVE_MAX < IFLA_MAX, we avoid out of bounds
array accesses, but we don't provide a pointer to the invalid
attribute to userspace.
Fixes: a025fb5f49 ("geneve: Allow configuration of DF behaviour")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFLA_VXLAN_* attributes are in the data array, which is correctly
used when fetching the value, but not when setting the extended
ack. Because IFLA_VXLAN_MAX < IFLA_MAX, we avoid out of bounds
array accesses, but we don't provide a pointer to the invalid
attribute to userspace.
Fixes: 653ef6a3e4 ("vxlan: change vxlan_[config_]validate() to use netlink_ext_ack for error reporting")
Fixes: b4d3069783 ("vxlan: Allow configuration of DF behaviour")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru says:
====================
qed*: Add support for pcie advanced error recovery.
The patch series adds qed/qede driver changes for PCIe Advanced Error
Recovery (AER) support.
Patch (1) adds qed changes to enable the device to send error messages
to root port when detected.
Patch (2) adds qede support for handling the detected errors (AERs).
Changes from previous version:
-------------------------------
v2: use pci_num_vf() instead of caching the value in edev.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The error recovery is handled by management firmware (MFW) with the help of
qed/qede drivers. Upon detecting the errors, driver informs MFW about this
event which in turn starts a recovery process. MFW sends ERROR_RECOVERY
notification to the driver which performs the required cleanup/recovery
from the driver side.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mlxsw_sp_acl_rulei_create() function is supposed to return an error
pointer from mlxsw_afa_block_create(). The problem is that these
functions both return NULL instead of error pointers. Half the callers
expect NULL and half expect error pointers so it could lead to a NULL
dereference on failure.
This patch changes both of them to return error pointers and changes all
the callers which checked for NULL to check for IS_ERR() instead.
Fixes: 4cda7d8d70 ("mlxsw: core: Introduce flexible actions support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gro_cells lib is used by different encapsulating netdevices, such as
geneve, macsec, vxlan etc. to speed up decapsulated traffic processing.
CPU tag is a sort of "encapsulation", and we can use the same mechs to
greatly improve overall DSA performance.
skbs are passed to the GRO layer after removing CPU tags, so we don't
need any new packet offload types as it was firstly proposed by me in
the first GRO-over-DSA variant [1].
The size of struct gro_cells is sizeof(void *), so hot struct
dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields
remain in one 32-byte cacheline.
The other positive side effect is that drivers for network devices
that can be shipped as CPU ports of DSA-driven switches can now use
napi_gro_frags() to pass skbs to kernel. Packets built that way are
completely non-linear and are likely being dropped without GRO.
This was tested on to-be-mainlined-soon Ethernet driver that uses
napi_gro_frags(), and the overall performance was on par with the
variant from [1], sometimes even better due to minimal overhead.
net.core.gro_normal_batch tuning may help to push it to the limit
on particular setups and platforms.
iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup
on 1.2 GHz MIPS board:
5.7-rc2 baseline:
[ID] Interval Transfer Bitrate Retr
[ 5] 0.00-120.01 sec 9.00 GBytes 644 Mbits/sec 413 sender
[ 5] 0.00-120.00 sec 8.99 GBytes 644 Mbits/sec receiver
Iface RX packets TX packets
eth0 7097731 7097702
port0 426050 6671829
port1 6671681 425862
port1.218 6671677 425851
With this patch:
[ID] Interval Transfer Bitrate Retr
[ 5] 0.00-120.01 sec 12.2 GBytes 870 Mbits/sec 122 sender
[ 5] 0.00-120.00 sec 12.2 GBytes 870 Mbits/sec receiver
Iface RX packets TX packets
eth0 9474792 9474777
port0 455200 353288
port1 9019592 455035
port1.218 353144 455024
v2:
- Add some performance examples in the commit message;
- No functional changes.
[1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/
Signed-off-by: Alexander Lobakin <bloodyreaper@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MV_V2_PORT_CTRL_SWRST bit in MV_V2_PORT_CTRL is reserved on 88E2110.
Setting SWRST on 88E2110 breaks packets transfer after interface down/up
cycle.
Fixes: 8f48c2ac85 ("net: marvell10g: soft-reset the PHY when coming out of low power")
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC4862 5.5.3 e) prevents received Router Advertisements from reducing
the Valid Lifetime of configured addresses to less than two hours, thus
preventing hosts from reacting to the information provided by a router
that has positive knowledge that a prefix has become invalid.
This patch makes hosts honor all Valid Lifetime values, as per
draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help
mitigate the problem discussed in draft-ietf-v6ops-slaac-renum.
Note: Attacks aiming at disabling an advertised prefix via a Valid
Lifetime of 0 are not really more harmful than other attacks
that can be performed via forged RA messages, such as those
aiming at completely disabling a next-hop router via an RA that
advertises a Router Lifetime of 0, or performing a Denial of
Service (DoS) attack by advertising illegitimate prefixes via
forged PIOs. In scenarios where RA-based attacks are of concern,
proper mitigations such as RA-Guard [RFC6105] [RFC7113] should
be implemented.
Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For chip like CHIP_OLAND with si enabled(amdgpu.si_support=1),
the amdgpu will expose pp_num_states to the /sys directory.
In this moment, read the pp_num_states file will excute the
amdgpu_get_pp_num_states func. In our case, the data hasn't
been initialized, so the kernel will access some ilegal
address, trigger the segmentfault and system will reboot soon:
uos@uos-PC:~$ cat /sys/devices/pci0000\:00/0000\:00\:00.0/0000\:01\:00
.0/pp_num_states
Message from syslogd@uos-PC at Apr 22 09:26:20 ...
kernel:[ 82.154129] Internal error: Oops: 96000004 [#1] SMP
This patch aims to fix this problem, avoid that reading file
triggers the kernel sementfault.
Signed-off-by: limingyu <limingyu@uniontech.com>
Signed-off-by: zhoubinbin <zhoubinbin@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c: In function amdgpu_job_submit:
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c:148:26: warning: variable priority set but not used [-Wunused-but-set-variable]
commit 33abcb1f5a ("drm/amdgpu: set compute queue priority at mqd_init")
left behind this, remove it.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
According to the current kiq read register method,
there will be race condition when using KIQ to read
register if multiple clients want to read at same time
just like the expample below:
1. client-A start to read REG-0 throguh KIQ
2. client-A poll the seqno-0
3. client-B start to read REG-1 through KIQ
4. client-B poll the seqno-1
5. the kiq complete these two read operation
6. client-A to read the register at the wb buffer and
get REG-1 value
Therefore, use amdgpu_device_wb_get() to request reg_val_offs
for each kiq read register.
v2: fix the error remove
v3: fix the print typo
v4: remove unused variables
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Do mass update of port.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of rl.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of uar.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of pd.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of pagealloc.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of mr.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of mcg.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of main.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of vxlan.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of mpfs.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of gid.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of lag.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of fw.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of fs_core to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of FPGA to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of statistics to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of eq.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of ecpf.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>