ethtool rx-buf-len is for setting receive buffer size,
support setting it via ethtool -G parameter and getting
it via ethtool -g parameter.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
SUNRPC: add some netns refcount trackers
Effort started in linux-5.17
Our goal is to replace get_net()/put_net() pairs with
get_net_track()/put_net_track() to get instant notifications
of imbalance bugs in the future.
Patches were split from a bigger series sent one month ago.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
struct svc_xprt holds a long lived reference to a netns,
it is worth tracking it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
ethtool: add header/data split indication
TCP ZC Rx requires data to be placed neatly into pages, separate
from the networking headers. This is not supported by most devices
so to make deployment easy this set adds a way for the driver to
report support for this feature thru ethtool.
The larger scope of configuring splitting headers and data, or DMA
scatter seems dauntingly broad, so this set focuses specifically
on the question "is this device usable with TCP ZC Rx?".
The aim is to avoid a litany of conditions on HW platforms, features,
and firmware versions in orchestration systems when the drivers can
easily tell their SG config.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For applications running on a mix of platforms it's useful
to have a clear indication whether host's NIC supports the
geometry requirements of TCP zero-copy. TCP zero-copy Rx
requires data to be neatly placed into memory pages.
Most NICs can't do that.
This patch is adding GET support only, since the NICs
I work with either always have the feature enabled or
enable it whenever MTU is set to jumbo. In other words
I don't need SET. But adding set should be trivial.
(The only note on SET is that we will likely want
the setting to be "sticky" and use 0 / `unknown`
to reset it back to driver default.)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock says:
====================
Allow disabling KSZ switch refclock
The reference clock output from the KSZ9477 and related Microchip
switch devices is not required on all board designs. Add a device
tree property to disable it for power and EMI reasons.
Changes since v3:
-rework some code for simplicity
Changes since v2:
-check for conflicting options in DT, added note in bindings doc
Changes since v1:
-added Acked-by on patch 1, rebase to net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new microchip,synclko-disable property which can be specified
to disable the reference clock output from the device if not required
by the board design.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document the new microchip,synclko-disable property which can be
specified to disable the reference clock output from the device if not
required by the board design.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir points out that since we removed mii_lpa_to_linkmode_lpa_sgmii(),
mii_lpa_mod_linkmode_lpa_sgmii() is also no longer called.
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of unnecessary if check on tx_desc pointer in
mvneta_xdp_submit_frame routine since num_frames is always greater than
0 and tx_desc pointer is always initialized.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert sparx5 to use the mac_select_interface rather than using
phylink_set_pcs(). The intention here is to unify the approach for
PCS and eventually remove phylink_set_pcs().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov says:
====================
udp/ipv6 optimisations
Shed some weight from udp/ipv6. Zerocopy benchmarks over dummy showed
~5% tx/s improvement, should be similar for small payload non-zc
cases.
The performance comes from killing 4 atomics and a couple of big struct
memcpy/memset. 1/10 removes a pair of atomics on dst refcounting for
cork->skb setup, 9/10 saves another pair on cork init. 5/10 and 8/10
kill extra 88B memset and memcpy respectively.
====================
Link: https://lore.kernel.org/r/cover.1643243772.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Inline a part of ipv6_fixup_options() to avoid extra overhead on
function call if opt is NULL.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
udpv6_sendmsg() doesn't need dst after calling ip6_make_skb(), so
instead of taking an additional reference inside ip6_setup_cork()
and releasing the initial one afterwards, we can hand over a reference
into ip6_make_skb() saving two atomics. The only other user of
ip6_setup_cork() is ip6_append_data() and it requires an extra
dst_hold().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
udpv6_sendmsg() first initialises an on-stack 88B struct flowi6 and then
copies it into cork, which is expensive. Avoid the copy in corkless case
by initialising on-stack cork->fl directly.
The main part is a couple of lines under !corkreq check. The rest
converts fl6 variable to be a pointer.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Another preparation patch. inet_cork_full already contains a field for
iflow, so we can avoid passing a separate struct iflow6 into
__ip6_append_data() and ip6_make_skb(), and use the flow stored in
inet_cork_full. Make sure callers set cork->fl, i.e. we init it in
ip6_append_data() and before calling ip6_make_skb().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert a struct inet_cork argument in __ip6_append_data() to struct
inet_cork_full. As one struct contains another inet_cork is still can
be accessed via ->base field. It's a preparation patch making further
changes a bit cleaner.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It doesn't appear there is any reason for ip6_cork_release() to zero
cork->fl, it'll be fully filled on next initialisation. This 88 bytes
memset accounts to 0.3-0.5% of total CPU cycles.
It's also needed in following patches and allows to remove an extar flow
copy in udp_v6_push_pending_frames().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clean up ip6_setup_cork() and ip6_cork_release() adding a local variable
for v6_cork->opt. It's a preparation patch for further changes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipv6_push_nfrag_opts() doesn't change passed daddr, and so
__ip6_make_skb() doesn't actually need to keep an on-stack copy of
fl6->daddr. Set initially final_dst to fl6->daddr,
ipv6_push_nfrag_opts() will override it if needed, and get rid of extra
copies.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Corked AF_INET for ipv6 socket doesn't appear to be the hottest case,
so move it out of the common path under up->pending check to remove
overhead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
__ip6_make_skb() gets a cork->dst ref, hands it over to skb and shortly
after puts cork->dst. Save two atomics by stealing it without extra
referencing, ip6_cork_release() handles NULL cork->dst.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
mlxsw: Various updates
This patchset contains miscellaneous updates for mlxsw. No user visible
changes that I am aware of.
Patches #1-#5 rework registration of internal traps in preparation of
line cards support.
Patch #6 improves driver resilience against a misbehaving device.
Patch #7 prevents the driver from overwriting device internal actions.
See the commit message for more details.
====================
Link: https://lore.kernel.org/r/20220127090226.283442-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In Spectrum-2 and later ASICs, each TCAM region has a default action
that is executed in case a packet did not match any rule in the region.
The location of the action in the database (KVDL) is computed by adding
the region's index to a base value.
Some TCAM regions are not exposed to the host and used internally by the
device. Allocate KVDL entries for the default actions of these regions
to avoid the host from overwriting them.
With mlxsw, lookups in the internal regions are not currently performed,
but it is a good practice not to overwrite their default actions.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When processing events generated by the device's firmware, the driver
protects itself from events reported for non-existent local ports, but
not for the CPU port (local port 0), which exists, but does not have all
the fields as any local port.
This can result in a NULL pointer dereference when trying access
'struct mlxsw_sp_port' fields which are not initialized for CPU port.
Commit 63b08b1f68 ("mlxsw: spectrum: Protect driver from buggy firmware")
already handled such issue by bailing early when processing a PUDE event
reported for the CPU port.
Generalize the approach by moving the check to a common function and
making use of it in all relevant places.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For event traps which are used in core, avoid having a separate trap
group for each event. Instead of that introduce a single core event trap
group and use it for all event traps.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These functions belong to core.c alongside the functions that
register/unregister a single trap. Move it there. Make the functions
possibly usable by other parts of mlxsw code.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of initializing the trap groups used by core in spectrum.c
over op, do it directly in core.c
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The call inits the EMAD group, but other groups as well. Therefore, move
it out of EMAD init code and call it before.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of calling the same code four times, do it in a loop over array
which contains trap grups to be set.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1) Dima, adds an internal mlx5 steering callback per steering provider
(FW vs SW steering), to advertise steering capabilities implemented by
each module, this helps upper modules in mlx5 to know what is
supported and what's not without the need to tell what is the underlying
steering mode.
2nd patch is the usecase where this interface is used to implement
Vlan Push/pop for uplink with SW steering, where in FW mode it's not
supported yet.
2) Roi Dayan improves code readability and maintainability
as preparation step for multi attribute instance per flow
in mlx5 TC module
Currently the mlx5_flow object contains a single mlx5_attr instance.
However, multi table actions (e.g. CT) instantiate multiple attr instances.
This is a refactoring series in a preparation to support multiple
attribute instances per flow.
The commits prepare functions to get attr instance instead of using
flow->attr and also using attr->flags if the flag is more relevant
to be attr flag and not a flow flag considering there will be multiple
attr instances. i.e. CT and SAMPLE flags.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmHzApAACgkQSD+KveBX
+j4yXwf/ai6mtBr7TOYvb1nTy5YMnqk0hXm1jwsYFrenw3qYX4ua8oE2rAlAtzeN
BCXOdO3kPw2FZpajBD1vZIYpam9jIzf7cxZ0V7KcNEyX9ro6FpmvOp2TpfAfQZdr
8fD1z6zy9I0gXrV2HDcvRZKDvB6s7G8E7AkBP2NpTo7jLQAk53iGDMgkSH8v12zO
XrL10cVrzOe/rEP1W5DnmRgrK0xcwb3zv5PxmT3+PUdUzfdl3OFGHUFFemNz0+4G
DB8MuHpa77sgrYmuXX6r+0GUTlHYcVa12pMOJC6UnUyLOFN2/LEmgEDyncJE+Qlz
0JlF4q/tGWWyeCNrDb4vb4rpj1XVKQ==
=J8s3
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2022-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-01-27
1) Dima, adds an internal mlx5 steering callback per steering provider
(FW vs SW steering), to advertise steering capabilities implemented by
each module, this helps upper modules in mlx5 to know what is
supported and what's not without the need to tell what is the underlying
steering mode.
2nd patch is the usecase where this interface is used to implement
Vlan Push/pop for uplink with SW steering, where in FW mode it's not
supported yet.
2) Roi Dayan improves code readability and maintainability
as preparation step for multi attribute instance per flow
in mlx5 TC module
Currently the mlx5_flow object contains a single mlx5_attr instance.
However, multi table actions (e.g. CT) instantiate multiple attr instances.
This is a refactoring series in a preparation to support multiple
attribute instances per flow.
The commits prepare functions to get attr instance instead of using
flow->attr and also using attr->flags if the flag is more relevant
to be attr flag and not a flow flag considering there will be multiple
attr instances. i.e. CT and SAMPLE flags.
* tag 'mlx5-updates-2022-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: VLAN push on RX, pop on TX
net/mlx5: Introduce software defined steering capabilities
net/mlx5: Remove unused TIR modify bitmask enums
net/mlx5e: CT, Remove redundant flow args from tc ct calls
net/mlx5e: TC, Store mapped tunnel id on flow attr
net/mlx5e: Test CT and SAMPLE on flow attr
net/mlx5e: Refactor eswitch attr flags to just attr flags
net/mlx5e: CT, Don't set flow flag CT for ct clear flow
net/mlx5e: TC, Hold sample_attr on stack instead of pointer
net/mlx5e: TC, Reject rules with multiple CT actions
net/mlx5e: TC, Refactor mlx5e_tc_add_flow_mod_hdr() to get flow attr
net/mlx5e: TC, Pass attr to tc_act can_offload()
net/mlx5e: TC, Split pedit offloads verify from alloc_tc_pedit_action()
net/mlx5e: TC, Move pedit_headers_action to parse_attr
net/mlx5e: Move counter creation call to alloc_flow_attr_counter()
net/mlx5e: Pass attr arg for attaching/detaching encaps
net/mlx5e: Move code chunk setting encap dests into its own function
====================
Link: https://lore.kernel.org/r/20220127204007.146300-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some older NIC hardware isn't capable of doing VLAN push on RX and pop
on TX.
A workaround has been added in software to support it, but it has a
performance penalty since it requires a hairpin + loopback.
There's no such limitation with the newer NICs, so no need to pay the
price of the w/a. With this change the software w/a is disabled for
certain HW versions and steering modes that support it.
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There are two different internal steering modes, abstracted from the
rest of the driver. In order to keep upper layer of the driver agnostic
to the differences in capabilities of the steering modes, this patch
introduces mlx5_fs_get_capabilities() API to check if a certain software
defined capability is supported. It differs from the capabilities
exposed by the hardware, as it takes into account the flow steering mode
(SMFS/DMFS) currently enabled.
This implementation supports only two capability flags:
MLX5_FLOW_STEERING_CAP_VLAN_PUSH_ON_RX
MLX5_FLOW_STEERING_CAP_VLAN_POP_ON_TX
They map to DR_ACTION_STATE_PUSH_VLAN and DR_ACTION_STATE_POP_VLAN
actions, implemented in SW steering earlier in commit f5e22be534
("net/mlx5: DR, Split modify VLAN state to separate pop/push states").
Which enables using of pop/push vlan without restrictions, e.g. doing
vlan pop on TX and RX, compared to FW steering that supports only vlan
pop on RX and push on TX.
Other capabilities can be added in the future.
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
struct mlx5_ifc_modify_tir_bitmask_bits is used for the bitmask
of MODIFY_TIR operations.
Remove the unused bitmask enums.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The flow arg is not being used so remove it.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In preparation for multiple attr instances the tunnel_id should
be attr specific and not flow specific.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently the mlx5_flow object contains a single mlx5_attr instance.
However, multi table actions (e.g. CT) instantiate multiple attr instances.
Prepare for multiple attr instances by testing for CT or SAMPLE flag on attr
flags instead of flow flag.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The flags are flow attrs and not esw specific attr flags.
Refactor to remove the esw prefix and move from eswitch.h
to en_tc.h where struct mlx5_flow_attr exists.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
ct clear action is a normal flow with a modify header for registers to
0. there is no need for any special handling in tc_ct.c.
Parsing of ct clear action still allocates mod acts to set 0 on the
registers and the driver continue to add a normal rule with modify hdr
context.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In later commit we are going to instantiate multiple attr instances
for flow instead of single attr.
Parsing TC sample allocates a new memory but there is no symmetric
cleanup in the infrastructure.
To avoid asymmetric alloc/free use sample_attr as part of the flow attr
and not allocated and held as a pointer.
This will avoid a cleanup leak when sample action is not on the first
attr.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The driver doesn't support multiple CT actions.
Multiple CT clear actions are ok as they are redundant also with
another CT actions.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In later commit we are going to instantiate multiple attr instances
for flow instead of single attr.
Make sure mlx5e_tc_add_flow_mod_hdr() use the correct attr and not flow->attr.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In later commit we are going to instantiate multiple attr instances
for flow instead of single attr.
Make sure the parsing using correct attr and not flow->attr.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Split pedit verify part into a new subfunction for better
maintainability.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Move pedit_headers_action from flow parse_state to flow parse_attr.
In a follow up commit we are going to have multiple attr per flow
and pedit_headers_action are unique per attr.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>