The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. These stats are only accessible through RTM_GETSTATS, and
therefore should be toggled by a RTM_SETSTATS message. Add it, and the
necessary skeleton handler.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new IFLA_STATS_LINK_OFFLOAD_XSTATS child attribute,
IFLA_OFFLOAD_XSTATS_L3_STATS, to carry statistics for traffic that takes
place in a HW router.
The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. Additionally, as a netdevice is configured, it may become or
cease being suitable for binding of a HW counter. Both of these aspects
need to be communicated to the userspace. To that end, add another child
attribute, IFLA_OFFLOAD_XSTATS_HW_S_INFO:
- attr nest IFLA_OFFLOAD_XSTATS_HW_S_INFO
- attr nest IFLA_OFFLOAD_XSTATS_L3_STATS
- attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST
- {0,1} as u8
- attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED
- {0,1} as u8
Thus this one attribute is a nest that can be used to carry information
about various types of HW statistics, and indexing is very simply done by
wrapping the information for a given statistics suite into the attribute
that carries the suite is the RTM_GETSTATS query. At the same time, because
_HW_S_INFO is nested directly below IFLA_STATS_LINK_OFFLOAD_XSTATS, it is
possible through filtering to request only the metadata about individual
statistics suites, without having to hit the HW to get the actual counters.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offloading switch device drivers may be able to collect statistics of the
traffic taking place in the HW datapath that pertains to a certain soft
netdevice, such as VLAN. Add the necessary infrastructure to allow exposing
these statistics to the offloaded netdevice in question. The API was shaped
by the following considerations:
- Collection of HW statistics is not free: there may be a finite number of
counters, and the act of counting may have a performance impact. It is
therefore necessary to allow toggling whether HW counting should be done
for any particular SW netdevice.
- As the drivers are loaded and removed, a particular device may get
offloaded and unoffloaded again. At the same time, the statistics values
need to stay monotonic (modulo the eventual 64-bit wraparound),
increasing only to reflect traffic measured in the device.
To that end, the netdevice keeps around a lazily-allocated copy of struct
rtnl_link_stats64. Device drivers then contribute to the values kept
therein at various points. Even as the driver goes away, the struct stays
around to maintain the statistics values.
- Different HW devices may be able to count different things. The
motivation behind this patch in particular is exposure of HW counters on
Nvidia Spectrum switches, where the only practical approach to counting
traffic on offloaded soft netdevices currently is to use router interface
counters, and count L3 traffic. Correspondingly that is the statistics
suite added in this patch.
Other devices may be able to measure different kinds of traffic, and for
that reason, the APIs are built to allow uniform access to different
statistics suites.
- Because soft netdevices and offloading drivers are only loosely bound, a
netdevice uses a notifier chain to communicate with the drivers. Several
new notifiers, NETDEV_OFFLOAD_XSTATS_*, have been added to carry messages
to the offloading drivers.
- Devices can have various conditions for when a particular counter is
available. As the device is configured and reconfigured, the device
offload may become or cease being suitable for counter binding. A
netdevice can use a notifier type NETDEV_OFFLOAD_XSTATS_REPORT_USED to
ping offloading drivers and determine whether anyone currently implements
a given statistics suite. This information can then be propagated to user
space.
When the driver decides to unoffload a netdevice, it can use a
newly-added function, netdev_offload_xstats_report_delta(), to record
outstanding collected statistics, before destroying the HW counter.
This patch adds a helper, call_netdevice_notifiers_info_robust(), for
dispatching a notifier with the possibility of unwind when one of the
consumers bails. Given the wish to eventually get rid of the global
notifier block altogether, this helper only invokes the per-netns notifier
block.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Obtaining stats for the IFLA_STATS_LINK_OFFLOAD_XSTATS nest involves a HW
access, and can fail for more reasons than just netlink message size
exhaustion. Therefore do not always return -EMSGSIZE on the failure path,
but respect the error code provided by the callee. Set the error explicitly
where it is reasonable to assume -EMSGSIZE as the failure reason.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Later patches add handlers for more HW-backed statistics. An extack will be
useful when communicating HW / driver errors to the client. Add the
arguments as appropriate.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The filter_mask field of RTM_GETSTATS header determines which top-level
attributes should be included in the netlink response. This saves
processing time by only including the bits that the user cares about
instead of always dumping everything. This is doubly important for
HW-backed statistics that would typically require a trip to the device to
fetch the stats.
So far there was only one HW-backed stat suite per attribute. However,
IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest, and will gain a new stat suite in
the following patches. It would therefore be advantageous to be able to
filter within that nest, and select just one or the other HW-backed
statistics suite.
Extend rtnetlink so that RTM_GETSTATS permits attributes in the payload.
The scheme is as follows:
- RTM_GETSTATS
- struct if_stats_msg
- attr nest IFLA_STATS_GET_FILTERS
- attr IFLA_STATS_LINK_OFFLOAD_XSTATS
- u32 filter_mask
This scheme reuses the existing enumerators by nesting them in a dedicated
context attribute. This is covered by policies as usual, therefore a
gradual opt-in is possible. Currently only IFLA_STATS_LINK_OFFLOAD_XSTATS
nest has filtering enabled, because for the SW counters the issue does not
seem to be that important.
rtnl_offload_xstats_get_size() and _fill() are extended to observe the
requested filters.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IFLA_STATS_LINK_OFFLOAD_XSTATS attribute is a nest whose child
attributes carry various special hardware statistics. The code that handles
this nest was written with the idea that all these statistics would be
exposed by the device driver of a physical netdevice.
In the following patches, a new attribute is added to the abovementioned
nest, which however can be defined for some soft netdevices. The NDO-based
approach to querying these does not work, because it is not the soft
netdevice driver that exposes these statistics, but an offloading NIC
driver that does so.
The current code does not scale well to this usage. Simply rewrite it back
to the pattern seen in other fill-like and get_size-like functions
elsewhere.
Extract to helpers the code that is concerned with handling specifically
NDO-backed statistics so that it can be easily reused should more such
statistics be added.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The currently used names rtnl_get_offload_stats() and
rtnl_get_offload_stats_size() do not clearly show the namespace. The former
function additionally seems to have been named this way in accordance with
the NDO name, as opposed to the naming used in the rtnetlink.c file (and
indeed elsewhere in the netlink handling code). As more and
differently-flavored attributes are introduced, a common clear prefix is
needed for all related functions.
Rename the functions to follow the rtnl_offload_xstats_* naming scheme.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today when VFs are put in promiscuous mode, they can request PF
to configure device for them to receive all VLANs traffic regardless
of what vlan is configured by the PF (via ip link) and PF allows this
config request regardless of whether VF is trusted or not.
From security POV, when VLAN is configured for VF through PF (via ip link),
honour such config requests from VF only when they are configured to be
trusted, otherwise restrict such VFs vlan promisc mode config.
Cc: stable@vger.kernel.org
Fixes: f990c82c38 ("qed*: Add support for ndo_set_vf_trust")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver does support SR-IOV VFs trust configuration but
it does not display it when queried via ip link utility.
Cc: stable@vger.kernel.org
Fixes: f990c82c38 ("qed*: Add support for ndo_set_vf_trust")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
@ 2022-03-02 10:39 Bhupesh Sharma
2022-03-02 10:39 ` [PATCH v2 1/2 net-next] net: stmmac: Add support for SM8150 Bhupesh Sharma
2022-03-02 10:39 ` [PATCH v2 2/2 net-next] net: stmmac: dwmac-qcom-ethqos: Adjust rgmii loopback_en per platform Bhupesh Sharma
0 siblings, 2 replies; 3+ messages in thread
Bhupesh Sharma says:
====================
net: stmmac: Enable support for Qualcomm SA8155p-ADP board
Changes since v1:
-----------------
- v1 can be seen here: https://lore.kernel.org/netdev/20220126221725.710167-1-bhupesh.sharma@linaro.org/t/
- Fixed review comments from Bjorn - broke the v1 series into two
separate series - one each for 'net' tree and 'arm clock/dts' tree
- so as to ease review of the same from the respective maintainers.
- This series is intended for the 'net' tree.
The SA8155p-ADP board supports on-board ethernet (Gibabit Interface),
with support for both RGMII and RMII buses.
This patchset adds the support for the same.
Note that this patchset is based on an earlier sent patchset
for adding PDC controller support on SM8150 (see [1]).
[1]. https://lore.kernel.org/linux-arm-msm/20220226184028.111566-1-bhupesh.sharma@linaro.org/T/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Not all platforms should have RGMII_CONFIG_LOOPBACK_EN and the result it
about 50% packet loss on incoming messages. So make it possile to
configure this per compatible and enable it for QCS404.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds compatible, POR config & driver data for ethernet controller
found in SM8150 SoC.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
[bhsharma: Massage the commit log and other cosmetic changes]
Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Damato says:
====================
page_pool: Add stats counters
Greetings:
Welcome to v9.
This revisions adds a commit which updates the page_pool documentation to
describe the stats API, structures, and fields.
Additionally, this revision contains a minor cosmetic change suggested by
Saeed in page_pool_recycle_in_ring in commit 2: "page_pool: Add recycle
stats", which removes an unnecessary #ifdef.
There are no functional changes in this revision.
Benchmark output from the v7 cover [1] is pasted below, as it is still
relevant since no functional changes have been made in this revision:
Benchmarks have been re-run. As always, results between runs are highly
variable; you'll find results showing that stats disabled are both faster
and slower than stats enabled in back to back benchmark runs.
Raw benchmark output with stats off [2] and stats on [3] are available for
examination.
Test system:
- 2x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
- 2 NUMA zones, with 18 cores per zone and 2 threads per core
bench_page_pool_simple results, loops=200000000
test name stats enabled stats disabled
cycles nanosec cycles nanosec
for_loop 0 0.335 0 0.336
atomic_inc 14 6.106 13 6.022
lock 30 13.365 32 13.968
no-softirq-page_pool01 75 32.884 74 32.308
no-softirq-page_pool02 79 34.696 74 32.302
no-softirq-page_pool03 110 48.005 105 46.073
tasklet_page_pool01_fast_path 14 6.156 14 6.211
tasklet_page_pool02_ptr_ring 41 18.028 39 17.391
tasklet_page_pool03_slow 107 46.646 105 46.123
bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=4:
test name stats enabled stats disabled
cycles nanosec cycles nanosec
page_pool_cross_cpu CPU(0) 3973 1731.596 4015 1750.015
page_pool_cross_cpu CPU(1) 3976 1733.217 4022 1752.864
page_pool_cross_cpu CPU(2) 3973 1731.615 4016 1750.433
page_pool_cross_cpu CPU(3) 3976 1733.218 4021 1752.806
page_pool_cross_cpu CPU(4) 994 433.305 1005 438.217
page_pool_cross_cpu average 3378 - 3415 -
bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=8:
test name stats enabled stats disabled
cycles nanosec cycles nanosec
page_pool_cross_cpu CPU(0) 6969 3037.488 6909 3011.463
page_pool_cross_cpu CPU(1) 6974 3039.469 6913 3012.961
page_pool_cross_cpu CPU(2) 6969 3037.575 6910 3011.585
page_pool_cross_cpu CPU(3) 6974 3039.415 6913 3012.961
page_pool_cross_cpu CPU(4) 6969 3037.288 6909 3011.368
page_pool_cross_cpu CPU(5) 6972 3038.732 6913 3012.920
page_pool_cross_cpu CPU(6) 6969 3037.350 6909 3011.386
page_pool_cross_cpu CPU(7) 6973 3039.356 6913 3012.921
page_pool_cross_cpu CPU(8) 871 379.934 864 376.620
page_pool_cross_cpu average 6293 - 6239 -
Thanks.
[1]: https://lore.kernel.org/all/1645810914-35485-1-git-send-email-jdamato@fastly.com/
[2]: 327dcd71d1/v7_stats_disabled
[3]: 327dcd71d1/v7_stats_enabled
v8 -> v9:
- Add documentation about the page_pool_get_stats API, stats
structures, and fields to Documentation/networking/page_pool.rst.
- Remove unnecessary #ifdef in page_pool_recycle_in_ring.
v7 -> v8:
- Rename mlx5 ethtool stats so that users have a better idea of
their meaning.
v6 -> v7:
- stats split out into two structs one single per-page pool struct
for allocation path stats and one per-cpu pointer for recycle
path stats.
- page_pool_get_stats updated to use a wrapper struct to gather
stats for allocation and recycle stats with a single argument.
- placement of structs adjusted
- mlx5 driver modified to use page_pool_get_stats API
v5 -> v6:
- Per cpu page_pool_stats struct pointer is now marked as
____cacheline_aligned_in_smp. Placement of the field in the
struct is unchanged; it is the last field.
v4 -> v5:
- Fixed the description of the kernel option in Kconfig.
- Squashed commits 1-10 from v4 into a single commit for easier
review.
- Changed the comment style of the comment for
the this_cpu_inc_alloc_stat macro.
- Changed the return type of page_pool_get_stats from struct
page_pool_stat * to bool.
v3 -> v4:
- Restructured stats to be per-cpu per-pool.
- Global stats and proc file were removed.
- Exposed an API (page_pool_get_stats) for batching the pool stats.
v2 -> v3:
- patch 8/10 ("Add stat tracking cache refill") fixed placement of
counter increment.
- patch 10/10 ("net-procfs: Show page pool stats in proc") updated:
- fix unused label warning from kernel test robot,
- fixed page_pool_seq_show to only display the refill stat
once,
- added a remove_proc_entry for page_pool_stat to
dev_proc_net_exit.
v1 -> v2:
- A new kernel config option has been added, which defaults to N,
preventing this code from being compiled in by default
- The stats structure has been converted to a per-cpu structure
- The stats are now exported via proc (/proc/net/page_pool_stat)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds support for the page_pool_get_stats API to mlx5. If the
user has enabled CONFIG_PAGE_POOL_STATS in their kernel, ethtool will
output page pool stats.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the new stats API, kernel config parameter, and stats structure
information to the page_pool documentation.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a function page_pool_get_stats which can be used by drivers to obtain
stats for a specified page_pool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add per-cpu stats tracking page pool recycling events:
- cached: recycling placed page in the page pool cache
- cache_full: page pool cache was full
- ring: page placed into the ptr ring
- ring_full: page released from page pool because the ptr ring was full
- released_refcnt: page released (and not recycled) because refcnt > 1
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add per-pool statistics counters for the allocation path of a page pool.
These stats are incremented in softirq context, so no locking or per-cpu
variables are needed.
This code is disabled by default and a kernel config option is provided for
users who wish to enable them.
The statistics added are:
- fast: successful fast path allocations
- slow: slow path order-0 allocations
- slow_high_order: slow path high order allocations
- empty: ptr ring is empty, so a slow path allocation was forced.
- refill: an allocation which triggered a refill of the cache
- waive: pages obtained from the ptr ring that cannot be added to
the cache due to a NUMA mismatch.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Network drivers such as igb or igc call eth_get_headlen() to determine the
header length for their to be constructed skbs in receive path.
When running HSR on top of these drivers, it results in triggering BUG_ON() in
skb_pull(). The reason is the skb headlen is not sufficient for HSR to work
correctly. skb_pull() notices that.
For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR which
is not correct. The problem is, the flow dissection code does not take HSR into
account. Therefore, add support for it.
Reported-by: Anthony Harivel <anthony.harivel@linutronix.de>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eliminate the following coccicheck warning:
./net/openvswitch/flow.c:379:2-3: Unneeded semicolon
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220227132208.24658-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add extack message to return exact message to user when adding invalid
filter with conflict flags for TC action.
In previous implement we just return EINVAL which is confusing for user.
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Link: https://lore.kernel.org/r/1646191769-17761-1-git-send-email-baowen.zheng@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2022-03-01
This series contains updates to iavf driver only.
Mateusz adds support for interrupt moderation for 50G and 100G speeds
as well as support for the driver to specify a request as its primary
MAC address. He also refactors VLAN V2 capability exchange into more
generic extended capabilities to ease the addition of future
capabilities. Finally, he corrects the incorrect return of iavf_status
values and removes non-inclusive language.
Minghao Chi removes unneeded variables, instead returning values
directly.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
iavf: Remove non-inclusive language
iavf: Fix incorrect use of assigning iavf_status to int
iavf: stop leaking iavf_status as "errno" values
iavf: remove redundant ret variable
iavf: Add usage of new virtchnl format to set default MAC
iavf: refactor processing of VLAN V2 capability message
iavf: Add support for 50G/100G in AIM algorithm
====================
Link: https://lore.kernel.org/r/20220301185939.3005116-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use ida_alloc_xxx()/ida_free() instead to
ida_simple_get()/ida_simple_remove().
The latter is deprecated and more verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220301131212.26348-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert sfp to use %pe for printing error codes, which can print them
as errno symbols rather than numbers.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1nOyEN-00BuuE-OB@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert phylink to use %pe for printing error codes, which can print
them as errno symbols rather than numbers.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1nOyEI-00Buu8-K9@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- bump version strings, by Simon Wunderlich
- Remove redundant 'flush_workqueue()' calls, by Christophe JAILLET
- Migrate to linux/container_of.h, by Sven Eckelmann
- Demote batadv-on-batadv skip error message, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmIfnFgWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeofBZEACtLe1VvUbNi00KMFWE7N32/C6v
X7snbt9HeoWJUAGQ8C89Eu80sAa0Jpig99qnQNNFRT6UR0T/DkFUYtUVkd5HV1TV
OwiZag6PvROck4FyN2YYde5NA96PvMm6/70NlVWL4dXB1IVWoQvGBtoWNmuom/hA
EkCIXt7IE1T1Y+OrAyeRM5KXcxK8nNYQbL2fKvampELAu8SRcq/cF7vfUQYq9OTz
7PNxTRqbZ2EOzp57A0EyYqYSzNpoKgQxyJsMjRGBZ6mooJB/GHNhj6B7qxyva/70
O942Twq9HY0F+XPhUxVDD5W2W8g2Mai1FFYpXlMpHOhiQQuVHqp9g6SLNOxGjEhC
O1UrPRHdC4KKQoEqqJdYwdyFBE7yNvkJkgF1dUIpoAAjn6xcYo9uWUq+hxItbW2k
OxmhNA9xLkiEtffT1sEJxf0rAyUj6WK88PsBVaVwxMSnSgRq87s3b926EnaxOnkx
Te7V8ZnNFk+kvJQHtAmW1ZylAeAMAOvJ7m8f3+RzS4h7C5hiYYFP4B3QRJ8uIVAO
klKohohPvGIuann1fyu3qRB2tm4Op+PurakGzusryVDkrPm70Gvtdy34M3s68vH+
y41pWZuSwz5HjBHvXrVDgXPK8Jo7KfxUM3Xrt7sd7mJJ6ik6GMbe5PkM04cjxVks
4kZKB4oCk72u8DtknA==
=OG/5
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-pullrequest-20220302' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Remove redundant 'flush_workqueue()' calls, by Christophe JAILLET
- Migrate to linux/container_of.h, by Sven Eckelmann
- Demote batadv-on-batadv skip error message, by Sven Eckelmann
* tag 'batadv-next-pullrequest-20220302' of git://git.open-mesh.org/linux-merge:
batman-adv: Demote batadv-on-batadv skip error message
batman-adv: Migrate to linux/container_of.h
batman-adv: Remove redundant 'flush_workqueue()' calls
batman-adv: Start new development cycle
====================
Link: https://lore.kernel.org/r/20220302163522.102842-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The error message "Cannot find parent device" was shown for users of
macvtap (on batadv devices) whenever the macvtap was moved to a different
netns. This happens because macvtap doesn't provide an implementation for
rtnl_link_ops->get_link_net.
The situation for which this message is printed is actually not an error
but just a warning that the optional sanity check was skipped. So demote
the message from error to warning and adjust the text to better explain
what happened.
Reported-by: Leonardo Mörlein <freifunk@irrelefant.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The commit d2a8ebbf81 ("kernel.h: split out container_of() and
typeof_member() macros") introduced a new header for the container_of
related macros from (previously) linux/kernel.h.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Daniel Braunwarth says:
====================
if_ether.h: add industrial fieldbus Ethertypes
This set of patches adds the Ethertypes for PROFINET and EtherCAT.
The defines should be used by iproute2 to extend the list of available link
layer protocols.
====================
Link: https://lore.kernel.org/r/20220228133029.100913-1-daniel@braunwarth.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages. This fixes iproute2 which otherwise resolved
the link interface to an interface in the wrong namespace.
Test commands:
ip netns add nst
ip link add dummy0 type dummy
ip link add link macvtap0 link dummy0 type macvtap
ip link set macvtap0 netns nst
ip -netns nst link show macvtap0
Before:
10: macvtap0@gre0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff
After:
10: macvtap0@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Reported-by: Leonardo Mörlein <freifunk@irrelefant.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Link: https://lore.kernel.org/r/20220228003240.1337426-1-sven@narfation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the following coccicheck warning:
./drivers/net/ethernet/netronome/nfp/flower/qos_conf.c:750:7-55: WARNING
avoid newline at end of message in NL_SET_ERR_MSG_MOD
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220301112356.1820985-1-wanjiabing@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Íñigo Huguet says:
====================
sfc: optimize RXQs count and affinities
In sfc driver one RX queue per physical core was allocated by default.
Later on, IRQ affinities were set spreading the IRQs in all NUMA local
CPUs.
However, with that default configuration it result in a non very optimal
configuration in many modern systems. Specifically, in systems with hyper
threading and 2 NUMA nodes, affinities are set in a way that IRQs are
handled by all logical cores of one same NUMA node. Handling IRQs from
both hyper threading siblings has no benefit, and setting affinities to one
queue per physical core is neither a very good idea because there is a
performance penalty for moving data across nodes (I was able to check it
with some XDP tests using pktgen).
This patches reduce the default number of channels to one per physical
core in the local NUMA node. Then, they set IRQ affinities to CPUs in
the local NUMA node only. This way we save hardware resources since
channels are limited resources. We also leave more room for XDP_TX
channels without hitting driver's limit of 32 channels per interface.
Running performance tests using iperf with a SFC9140 device showed no
performance penalty for reducing the number of channels.
RX XDP tests showed that performance can go down to less than half if
the IRQ is handled by a CPU in a different NUMA node, which doesn't
happen with the new defaults from this patches.
====================
Link: https://lore.kernel.org/r/20220228132254.25787-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Affinity hints were being set to CPUs in local NUMA node first, and then
in other CPUs. This was creating 2 unintended issues:
1. Channels created to be assigned each to a different physical core
were assigned to hyperthreading siblings because of being in same
NUMA node.
Since the patch previous to this one, this did not longer happen
with default rss_cpus modparam because less channels are created.
2. XDP channels could be assigned to CPUs in different NUMA nodes,
decreasing performance too much (to less than half in some of my
tests).
This patch sets the affinity hints spreading the channels only in local
NUMA node's CPUs. A fallback for the case that no CPU in local NUMA node
is online has been added too.
Example of CPUs being assigned in a non optimal way before this and the
previous patch (note: in this system, xdp-8 to xdp-15 are created
because num_possible_cpus == 64, but num_present_cpus == 32 so they're
never used):
$ lscpu | grep -i numa
NUMA node(s): 2
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
$ grep -H . /proc/irq/*/0000:07:00.0*/../smp_affinity_list
/proc/irq/141/0000:07:00.0-0/../smp_affinity_list:0
/proc/irq/142/0000:07:00.0-1/../smp_affinity_list:1
/proc/irq/143/0000:07:00.0-2/../smp_affinity_list:2
/proc/irq/144/0000:07:00.0-3/../smp_affinity_list:3
/proc/irq/145/0000:07:00.0-4/../smp_affinity_list:4
/proc/irq/146/0000:07:00.0-5/../smp_affinity_list:5
/proc/irq/147/0000:07:00.0-6/../smp_affinity_list:6
/proc/irq/148/0000:07:00.0-7/../smp_affinity_list:7
/proc/irq/149/0000:07:00.0-8/../smp_affinity_list:16
/proc/irq/150/0000:07:00.0-9/../smp_affinity_list:17
/proc/irq/151/0000:07:00.0-10/../smp_affinity_list:18
/proc/irq/152/0000:07:00.0-11/../smp_affinity_list:19
/proc/irq/153/0000:07:00.0-12/../smp_affinity_list:20
/proc/irq/154/0000:07:00.0-13/../smp_affinity_list:21
/proc/irq/155/0000:07:00.0-14/../smp_affinity_list:22
/proc/irq/156/0000:07:00.0-15/../smp_affinity_list:23
/proc/irq/157/0000:07:00.0-xdp-0/../smp_affinity_list:8
/proc/irq/158/0000:07:00.0-xdp-1/../smp_affinity_list:9
/proc/irq/159/0000:07:00.0-xdp-2/../smp_affinity_list:10
/proc/irq/160/0000:07:00.0-xdp-3/../smp_affinity_list:11
/proc/irq/161/0000:07:00.0-xdp-4/../smp_affinity_list:12
/proc/irq/162/0000:07:00.0-xdp-5/../smp_affinity_list:13
/proc/irq/163/0000:07:00.0-xdp-6/../smp_affinity_list:14
/proc/irq/164/0000:07:00.0-xdp-7/../smp_affinity_list:15
/proc/irq/165/0000:07:00.0-xdp-8/../smp_affinity_list:24
/proc/irq/166/0000:07:00.0-xdp-9/../smp_affinity_list:25
/proc/irq/167/0000:07:00.0-xdp-10/../smp_affinity_list:26
/proc/irq/168/0000:07:00.0-xdp-11/../smp_affinity_list:27
/proc/irq/169/0000:07:00.0-xdp-12/../smp_affinity_list:28
/proc/irq/170/0000:07:00.0-xdp-13/../smp_affinity_list:29
/proc/irq/171/0000:07:00.0-xdp-14/../smp_affinity_list:30
/proc/irq/172/0000:07:00.0-xdp-15/../smp_affinity_list:31
CPUs assignments after this and previous patch, so normal channels
created only one per core in NUMA node and affinities set only to local
NUMA node:
$ grep -H . /proc/irq/*/0000:07:00.0*/../smp_affinity_list
/proc/irq/116/0000:07:00.0-0/../smp_affinity_list:0
/proc/irq/117/0000:07:00.0-1/../smp_affinity_list:1
/proc/irq/118/0000:07:00.0-2/../smp_affinity_list:2
/proc/irq/119/0000:07:00.0-3/../smp_affinity_list:3
/proc/irq/120/0000:07:00.0-4/../smp_affinity_list:4
/proc/irq/121/0000:07:00.0-5/../smp_affinity_list:5
/proc/irq/122/0000:07:00.0-6/../smp_affinity_list:6
/proc/irq/123/0000:07:00.0-7/../smp_affinity_list:7
/proc/irq/124/0000:07:00.0-xdp-0/../smp_affinity_list:16
/proc/irq/125/0000:07:00.0-xdp-1/../smp_affinity_list:17
/proc/irq/126/0000:07:00.0-xdp-2/../smp_affinity_list:18
/proc/irq/127/0000:07:00.0-xdp-3/../smp_affinity_list:19
/proc/irq/128/0000:07:00.0-xdp-4/../smp_affinity_list:20
/proc/irq/129/0000:07:00.0-xdp-5/../smp_affinity_list:21
/proc/irq/130/0000:07:00.0-xdp-6/../smp_affinity_list:22
/proc/irq/131/0000:07:00.0-xdp-7/../smp_affinity_list:23
/proc/irq/132/0000:07:00.0-xdp-8/../smp_affinity_list:0
/proc/irq/133/0000:07:00.0-xdp-9/../smp_affinity_list:1
/proc/irq/134/0000:07:00.0-xdp-10/../smp_affinity_list:2
/proc/irq/135/0000:07:00.0-xdp-11/../smp_affinity_list:3
/proc/irq/136/0000:07:00.0-xdp-12/../smp_affinity_list:4
/proc/irq/137/0000:07:00.0-xdp-13/../smp_affinity_list:5
/proc/irq/138/0000:07:00.0-xdp-14/../smp_affinity_list:6
/proc/irq/139/0000:07:00.0-xdp-15/../smp_affinity_list:7
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Handling channels from CPUs in different NUMA node can penalize
performance, so better configure only one channel per core in the same
NUMA node than the NIC, and not per each core in the system.
Fallback to all other online cores if there are not online CPUs in local
NUMA node.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove non-inclusive language from the iavf driver.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently there are functions in iavf_virtchnl.c for polling specific
virtchnl receive events. These are all assigning iavf_status values to
int values. Fix this and explicitly assign int values if iavf_status
is not IAVF_SUCCESS.
Also, refactor a small amount of duplicated code that can be reused by
all of the previously mentioned functions.
Finally, fix some spacing errors for variable assignment and get rid of
all the goto statements in the refactored functions for clarity.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Several functions in the iAVF core files take status values of the enum
iavf_status and convert them into integer values. This leads to
confusion as functions return both Linux errno values and status codes
intermixed. Reporting status codes as if they were "errno" values can
lead to confusion when reviewing error logs. Additionally, it can lead
to unexpected behavior if a return value is not interpreted properly.
Fix this by introducing iavf_status_to_errno, a switch that explicitly
converts from the status codes into an appropriate error value. Also
introduce a virtchnl_status_to_errno function for the one case where we
were returning both virtchnl status codes and iavf_status codes in the
same function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Return value directly instead of taking this in another redundant
variable.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use new type field of VIRTCHNL_OP_ADD_ETH_ADDR and
VIRTCHNL_OP_DEL_ETH_ADDR requests to indicate that
VF wants to change its default MAC address.
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In order to handle the capability exchange necessary for
VIRTCHNL_VF_OFFLOAD_VLAN_V2, the driver must send
a VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS message. This must occur prior to
__IAVF_CONFIG_ADAPTER, and the driver must wait for the response from
the PF.
To handle this, the __IAVF_INIT_GET_OFFLOAD_VLAN_V2_CAPS state was
introduced. This state is intended to process the response from the VLAN
V2 caps message. This works ok, but is difficult to extend to adding
more extended capability exchange.
Existing (and future) AVF features are relying more and more on these
sort of extended ops for processing additional capabilities. Just like
VLAN V2, this exchange must happen prior to __IAVF_CONFIG_ADPATER.
Since we only send one outstanding AQ message at a time during init, it
is not clear where to place this state. Adding more capability specific
states becomes a mess. Instead of having the "previous" state send
a message and then transition into a capability-specific state,
introduce __IAVF_EXTENDED_CAPS state. This state will use a list of
extended_caps that determines what messages to send and receive. As long
as there are extended_caps bits still set, the driver will remain in
this state performing one send or one receive per state machine loop.
Refactor the VLAN V2 negotiation to use this new state, and remove the
capability-specific state. This makes it significantly easier to add
a new similar capability exchange going forward.
Extended capabilities are processed by having an associated SEND and
RECV extended capability bit. During __IAVF_EXTENDED_CAPS, the
driver checks these bits in order by feature, first the send bit for
a feature, then the recv bit for a feature. Each send flag will call
a function that sends the necessary response, while each receive flag
will wait for the response from the PF. If a given feature can't be
negotiated with the PF, the associated flags will be cleared in
order to skip processing of that feature.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>