Commit Graph

997003 Commits

Author SHA1 Message Date
Alexander Lobakin
4a6e7ec93a vlan/8021q: avoid retpoline overhead on GRO
The two most popular headers going after VLAN are IPv4 and IPv6.
Retpoline overhead for them is addressed only in dev_gro_receive(),
when they lie right after the outermost Ethernet header.
Use the indirect call wrappers in VLAN GRO receive code to reduce
the penalty on receiving tagged frames (when hardware stripping is
off or not available).

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:51:12 -07:00
Alexander Lobakin
86af2c82c2 gro: add combined call_gro_receive() + INDIRECT_CALL_INET() helper
call_gro_receive() is used to limit GRO recursion, but it works only
with callback pointers.
There's a combined version of call_gro_receive() + INDIRECT_CALL_2()
in <net/inet_common.h>, but it doesn't check for IPv6 modularity.
Add a similar new helper to cover both of these. It can and will be
used to avoid retpoline overhead when IP header lies behind another
offloaded proto.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:51:12 -07:00
Alexander Lobakin
e75ec151c1 gro: make net/gro.h self-contained
If some source file includes <net/gro.h>, but doesn't include
<linux/indirect_call_wrapper.h>:

In file included from net/8021q/vlan_core.c:7:
./include/net/gro.h:6:1: warning: data definition has no type or storage class
    6 | INDIRECT_CALLABLE_DECLARE(struct sk_buff *ipv6_gro_receive(struct list_head *,
      | ^~~~~~~~~~~~~~~~~~~~~~~~~
./include/net/gro.h:6:1: error: type defaults to ‘int’ in declaration of ‘INDIRECT_CALLABLE_DECLARE’ [-Werror=implicit-int]

[...]

Include <linux/indirect_call_wrapper.h> directly. It's small and
won't pull lots of dependencies.
Also add some incomplete struct declarations to be fully stacked.

Fixes: 04f00ab227 ("net/core: move gro function declarations to separate header ")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:51:12 -07:00
Bhaskar Chowdhury
1816bf1f53 Fix a typo
s/serisouly/seriously/

...and the sentence construction.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:46:19 -07:00
David S. Miller
84b9000a4b Merge branch 'ionic-fixes'
Shannon Nelson says:

====================
ionic fixes

These are a few little fixes and cleanups found while working
on other features and more testing.
====================
2021-03-18 19:16:10 -07:00
Shannon Nelson
e768929de1 ionic: protect adminq from early destroy
Don't destroy the adminq while there is an outstanding request.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:16:10 -07:00
Shannon Nelson
9e8eaf8427 ionic: stop watchdog when in broken state
Up to now we've been ignoring any error return from the
queue starting in the link status check, so we fix that here.
If the driver had to reset and couldn't get things running
properly again, for example after a Tx Timeout and the FW is
not responding to commands, don't let the link watchdog try
to restart the queues.  At this point the user can try to DOWN
and UP the device to clear the errors.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:16:10 -07:00
Shannon Nelson
8c775344c7 ionic: block actions during fw reset
Block some actions while the FW is in a reset activity
and the queues are not configured.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:16:10 -07:00
Shannon Nelson
acc606d3e4 ionic: update ethtool support bits for BASET
Add support in get_link_ksettings for a couple of
new BASET connections.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:16:10 -07:00
Shannon Nelson
9b761574fe ionic: fix unchecked reference
We can get to the counter without going through the pointer
that the robot complained about.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:16:10 -07:00
Shannon Nelson
2103ed2fab ionic: simplify the intr_index use in txq_init
The qcq->intr.index was set when the queue was allocated,
there is no need to reach around to find it.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:16:10 -07:00
Shannon Nelson
25cc5a5fac ionic: code cleanup details
Catch a couple of missing macro name uses, fix a couple
of misspellings, etc.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:16:10 -07:00
Vladimir Oltean
df291e54cc net: ocelot: support multiple bridges
The ocelot switches are a bit odd in that they do not have an STP state
to put the ports into. Instead, the forwarding configuration is delayed
from the typical port_bridge_join into stp_state_set, when the port enters
the BR_STATE_FORWARDING state.

I can only guess that the implementation of this quirk is the reason that
led to the simplification of the driver such that only one bridge could
be offloaded at a time.

We can simplify the data structures somewhat, and introduce a per-port
bridge device pointer and STP state, similar to how the LAG offload
works now (there we have a per-port bonding device pointer and TX
enabled state). This allows offloading multiple bridges with relative
ease, while still keeping in place the quirk to delay the programming of
the PGIDs.

We actually need this change now because we need to remove the bogus
restriction from ocelot_bridge_stp_state_set that ocelot->bridge_mask
needs to contain BIT(port), otherwise that function is a no-op.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:13:42 -07:00
Horatiu Vultur
d25fde64d1 net: ocelot: Fix deletetion of MRP entries from MAC table
When a MRP ring was deleted or disabled, the driver was iterating over
the ports to detect if any other MPR rings exists and in case it didn't
exist it would delete the MAC table entry. But the problem was that it
used the last iterated port to delete the MAC table entry and this could
be a NULL port.

The fix consists of using the port on which the function was called.

Fixes: 7c588c3e96 ("net: ocelot: Extend MRP")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:13:42 -07:00
Xie He
536e1004d2 net: lapbether: Close the LAPB device before its underlying Ethernet device closes
When a virtual LAPB device's underlying Ethernet device closes, the LAPB
device is also closed.

However, currently the LAPB device is closed after the Ethernet device
closes. It would be better to close it before the Ethernet device closes.
This would allow the LAPB device to transmit a last frame to notify the
other side that it is disconnecting.

Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:13:42 -07:00
Colin Ian King
0f9651bb3a octeontx2-af: Remove redundant initialization of pointer pfvf
The pointer pfvf is being initialized with a value that is
never read and it is being updated later with a new value.  The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Fixes: 56bcef528b ("octeontx2-af: Use npc_install_flow API for promisc and broadcast entries")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:13:42 -07:00
Johan Hovold
269aa03012 net: cdc_ncm: drop redundant driver-data assignment
The driver data for the data interface has already been set by
usb_driver_claim_interface() so drop the subsequent redundant
assignment.

Note that this also avoids setting the driver data three times in case
of a combined interface.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:13:42 -07:00
zuoqilin
92a310cdcf nfc/fdp: Simplify the return expression of fdp_nci_open()
Simplify the return expression.

Signed-off-by: zuoqilin <zuoqilin@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:13:42 -07:00
Xiong Zhenwu
a835f9034e /net/core/: fix misspellings using codespell tool
A typo is found out by codespell tool in 1734th line of drop_monitor.c:

$ codespell ./net/core/
./net/core/drop_monitor.c:1734: guarnateed  ==> guaranteed

Fix a typo found by codespell.

Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:13:41 -07:00
Xiong Zhenwu
7f1330c1b1 /net/hsr: fix misspellings using codespell tool
A typo is found out by codespell tool in 111th line of hsr_debugfs.c:

$ codespell ./net/hsr/

net/hsr/hsr_debugfs.c:111: Debufs  ==> Debugfs

Fix typos found by codespell.

Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:13:41 -07:00
Lee Jones
21e0b8fc16 of: of_net: Provide function name and param description
Fixes the following W=1 kernel build warning(s):

 drivers/of/of_net.c:104: warning: Function parameter or member 'np' not described in 'of_get_mac_address'
 drivers/of/of_net.c:104: warning: expecting prototype for mac(). Prototype was for of_get_mac_address() instead

Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: netdev@vger.kernel.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:13:41 -07:00
Wong, Vee Khee
76da35dc99 stmmac: intel: Add PSE and PCH PTP clock source selection
Intel mGbE variant implemented in EHL and TGL can be set to select
different clock frequency based on GPO bits in MAC_GPIO_STATUS register.

We introduce a new "void (*ptp_clk_freq_config)(void *priv)" in platform
data so that if a platform is required to configure the frequency of clock
source, in this case Intel mGBE does, the platform-specific configuration
of the PTP clock setting is done when stmmac_ptp_register() is called.

Signed-off-by: Wong, Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 19:10:51 -07:00
David S. Miller
d7417ee918 Merge branch 'mv88e6xxx-offload-bridge-flags'
Tobias Waldekranz says:

====================
net: dsa: mv88e6xxx: Offload bridge port flags

Add support for offloading learning and broadcast flooding flags. With
this in place, mv88e6xx supports offloading of all bridge port flags
that are currently supported by the bridge.

Broadcast flooding is somewhat awkward to control as there is no
per-port bit for this like there is for unknown unicast and unknown
multicast. Instead we have to update the ATU entry for the broadcast
address for all currently used FIDs.

v2 -> v3:
  - Only return a netdev from dsa_port_to_bridge_port if the port is
    currently bridged (Vladimir & Florian)

v1 -> v2:
  - Ensure that mv88e6xxx_vtu_get handles VID 0 (Vladimir)
  - Fixed off-by-one in mv88e6xxx_port_set_assoc_vector (Vladimir)
  - Fast age all entries on port when disabling learning (Vladimir)
  - Correctly detect bridge flags on LAG ports (Vladimir)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:24:06 -07:00
Tobias Waldekranz
8d1d8298eb net: dsa: mv88e6xxx: Offload bridge broadcast flooding flag
These switches have two modes of classifying broadcast:

1. Broadcast is multicast.
2. Broadcast is its own unique thing that is always flooded
   everywhere.

This driver uses the first option, making sure to load the broadcast
address into all active databases. Because of this, we can support
per-port broadcast flooding by (1) making sure to only set the subset
of ports that have it enabled whenever joining a new bridge or VLAN,
and (2) by updating all active databases whenever the setting is
changed on a port.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:24:06 -07:00
Tobias Waldekranz
041bd545e1 net: dsa: mv88e6xxx: Offload bridge learning flag
Allow a user to control automatic learning per port.

Many chips have an explicit "LearningDisable"-bit that can be used for
this, but we opt for setting/clearing the PAV instead, as it works on
all devices at least as far back as 6083.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:24:06 -07:00
Tobias Waldekranz
7b9f16fe40 net: dsa: mv88e6xxx: Flood all traffic classes on standalone ports
In accordance with the comment in dsa_port_bridge_leave, standalone
ports shall be configured to flood all types of traffic. This change
aligns the mv88e6xxx driver with that policy.

Previously a standalone port would initially not egress any unknown
traffic, but after joining and then leaving a bridge, it would.

This does not matter that much since we only ever send FROM_CPUs on
standalone ports, but it seems prudent to make sure that the initial
values match those that are applied after a bridging/unbridging cycle.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:24:06 -07:00
Tobias Waldekranz
0806dd4654 net: dsa: mv88e6xxx: Use standard helper for broadcast address
Use the conventional declaration style of a MAC address in the
kernel (u8 addr[ETH_ALEN]) for the broadcast address, then set it
using the existing helper.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:24:06 -07:00
Tobias Waldekranz
34065c5830 net: dsa: mv88e6xxx: Remove some bureaucracy around querying the VTU
The hardware has a somewhat quirky protocol for reading out the VTU
entry for a particular VID. But there is no reason why we cannot
create a better API for ourselves in the driver.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:24:06 -07:00
Tobias Waldekranz
d89ef4b8b3 net: dsa: mv88e6xxx: Provide generic VTU iterator
Move the intricacies of correctly iterating over the VTU to a common
implementation.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:24:06 -07:00
Tobias Waldekranz
ffcec3f257 net: dsa: mv88e6xxx: Avoid useless attempts to fast-age LAGs
When a port is a part of a LAG, the ATU will create dynamic entries
belonging to the LAG ID when learning is enabled. So trying to
fast-age those out using the constituent port will have no
effect. Unfortunately the hardware does not support move operations on
LAGs so there is no obvious way to transform the request to target the
LAG instead.

Instead we document this known limitation and at least avoid wasting
any time on it.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:24:06 -07:00
Tobias Waldekranz
cc76ce9e8d net: dsa: Add helper to resolve bridge port from DSA port
In order for a driver to be able to query a bridge for information
about itself, e.g. reading out port flags, it has to use a netdev that
is known to the bridge. In the simple case, that is just the netdev
representing the port, e.g. swp0 or swp1 in this example:

   br0
   / \
swp0 swp1

But in the case of an offloaded lag, this will be the bond or team
interface, e.g. bond0 in this example:

     br0
     /
  bond0
   / \
swp0 swp1

Add a helper that hides some of this complexity from the
drivers. Then, redefine dsa_port_offloads_bridge_port using the helper
to avoid double accounting of the set of possible offloaded uppers.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:24:06 -07:00
David S. Miller
44b958a686 Merge branch 'ipa-32bit'
Alex Elder says:

====================
net: ipa: support 32-bit targets

There is currently a configuration dependency that restricts IPA to
be supported only on 64-bit machines.  There are only a few things
that really require that, and those are fixed in this series.  The
last patch in the series removes the CONFIG_64BIT build dependency
for IPA.

Version 2 of this series uses upper_32_bits() rather than creating
a new function to extract bits out of a DMA address.  Version 3 of
uses lower_32_bits() as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:20:35 -07:00
Alex Elder
99e75a37bd net: ipa: relax 64-bit build requirement
We currently assume the IPA driver is built only for a 64 bit kernel.

When this constraint was put in place it eliminated some do_div()
calls, replacing them with the "/" and "%" operators.  We now only
use these operations on u32 and size_t objects.  In a 32-bit kernel
build, size_t will be 32 bits wide, so there remains no reason to
use do_div() for divide and modulo.

A few recent commits also fix some code that assumes that DMA
addresses are 64 bits wide.

With that, we can get rid of the 64-bit build requirement.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:20:35 -07:00
Alex Elder
e5d4e96b44 net: ipa: fix table alignment requirement
We currently have a build-time check to ensure that the minimum DMA
allocation alignment satisfies the constraint that IPA filter and
route tables must point to rules that are 128-byte aligned.

But what's really important is that the actual allocated DMA memory
has that alignment, even if the minimum is smaller than that.

Remove the BUILD_BUG_ON() call checking against minimim DMA alignment
and instead verify at rutime that the allocated memory is properly
aligned.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:20:34 -07:00
Alex Elder
3c54b7be5d net: ipa: use upper_32_bits()
Use upper_32_bits() to extract the high-order 32 bits of a DMA
address.  This avoids doing a 32-position shift on a DMA address
if it happens not to be 64 bits wide.  Use lower_32_bits() to
extract the low-order 32 bits (because that's what it's for).

Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:20:34 -07:00
Alex Elder
d2fd2311de net: ipa: fix assumptions about DMA address size
Some build time checks in ipa_table_validate_build() assume that a
DMA address is 64 bits wide.  That is more restrictive than it has
to be.  A route or filter table is 64 bits wide no matter what the
size of a DMA address is on the AP.  The code actually uses a
pointer to __le64 to access table entries, and a fixed constant
IPA_TABLE_ENTRY_SIZE to describe the size of those entries.

Loosen up two checks so they still verify some requirements, but
such that they do not assume the size of a DMA address is 64 bits.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:20:34 -07:00
David S. Miller
5108802abc Merge branch 's390-qeth-next'
Julian Wiedmann says:

====================
s390/qeth: updates 2021-03-18

please apply the following patch series for qeth to netdev's net-next
tree.

This brings two small optimizations (replace a hard-coded GFP_ATOMIC,
pass through the NAPI budget to enable napi_consume_skb()), and removes
some redundant VLAN filter code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:18:38 -07:00
Julian Wiedmann
d96a8c693d s390/qeth: remove RX VLAN filter stubs in L3 driver
The callbacks have been slimmed down to a level where they no longer do
any actual work. So stop pretending that we support the
NETIF_F_HW_VLAN_CTAG_FILTER feature.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:18:37 -07:00
Julian Wiedmann
ad4bbd7285 s390/qeth: enable napi_consume_skb() for pending TX buffers
Pending TX buffers are completed from the same NAPI code as normal
TX buffers. Pass the NAPI budget to qeth_tx_complete_buf() so that
the freeing of the completed skbs can be deferred.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:18:37 -07:00
Julian Wiedmann
e47ded97f9 s390/qeth: allocate initial TX Buffer structs with GFP_KERNEL
qeth_init_qdio_out_buf() is typically called during initialization, and
the GFP_ATOMIC is only needed for a very specific & rare case during TX
completion.

Allow callers to specify a gfp mask, so that the initialization path can
select GFP_KERNEL. While at it also clarify the function name.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 16:18:37 -07:00
David S. Miller
c2ed62b997 Merge branch 'net-xps-improve-the-xps-maps-handling'
Antoine Tenart says:

====================
net: xps: improve the xps maps handling

This series aims at fixing various issues with the xps code, including
out-of-bound accesses and use-after-free. While doing so we try to
improve the xps code maintainability and readability.

The main change is moving dev->num_tc and dev->nr_ids in the xps maps, to
avoid out-of-bound accesses as those two fields can be updated after the
maps have been allocated. This allows further reworks, to improve the
xps code readability and allow to stop taking the rtnl lock when
reading the maps in sysfs. The maps are moved to an array in net_device,
which simplifies the code a lot.

One future improvement may be to remove the use of xps_map_mutex from
net/core/dev.c, but that may require extra care.

Thanks!
Antoine

Since v3:
  - Removed the 3 patches about the rtnl lock and __netif_set_xps_queue
    as there are extra issues. Those patches were not tied to the
    others, and I'll see want can be done as a separate effort.
  - One small fix in patch 12.

Since v2:
  - Patches 13-16 are new to the series.
  - Fixed another issue I found while preparing v3 (use after free of
    old xps maps).
  - Kept the rtnl lock when calling netdev_get_tx_queue and
    netdev_txq_to_tc.
  - Use get_device/put_device when using the sb_dev.
  - Take the rtnl lock in mlx5 and virtio_net when calling
    netif_set_xps_queue.
  - Fixed a coding style issue.

Since v1:
  - Reordered the patches to improve readability and avoid introducing
    issues in between patches.
  - Use dev_maps->nr_ids to allocate the mask in xps_queue_show but
    still default to nr_cpu_ids/dev->num_rx_queues in xps_queue_show
    when dev_maps hasn't been allocated yet for backward
    compatibility.:w
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:23 -07:00
Antoine Tenart
75b2758abc net: NULL the old xps map entries when freeing them
In __netif_set_xps_queue, old map entries from the old dev_maps are
freed but their corresponding entry in the old dev_maps aren't NULLed.
Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart
2d05bf0153 net: fix use after free in xps
When setting up an new dev_maps in __netif_set_xps_queue, we remove and
free maps from unused CPUs/rx-queues near the end of the function; by
calling remove_xps_queue. However it's possible those maps are also part
of the old not-freed-yet dev_maps, which might be used concurrently.
When that happens, a map can be freed while its corresponding entry in
the old dev_maps table isn't NULLed, leading to: "BUG: KASAN:
use-after-free" in different places.

This fixes the map freeing logic for unused CPUs/rx-queues, to also NULL
the map entries from the old dev_maps table.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart
2db6cdaeba net-sysfs: move the xps cpus/rxqs retrieval in a common function
Most of the xps_cpus_show and xps_rxqs_show functions share the same
logic. Having it in two different functions does not help maintenance.
This patch moves their common logic into a new function, xps_queue_show,
to improve this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart
d7be87a687 net-sysfs: move the rtnl unlock up in the xps show helpers
Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU
protected, we do not have the need to protect the maps in the rtnl lock.
Move the rtnl unlock up so we reduce the rtnl locking section.

We also increase the reference count on the subordinate device if any,
as we don't want this device to be freed while we use it (now that the
rtnl lock isn't protecting it in the whole function).

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart
132f743b01 net: improve queue removal readability in __netif_set_xps_queue
Improve the readability of the loop removing tx-queue from unused
CPUs/rx-queues in __netif_set_xps_queue. The change should only be
cosmetic.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart
402fbb992e net: add an helper to copy xps maps to the new dev_maps
This patch adds an helper, xps_copy_dev_maps, to copy maps from dev_maps
to new_dev_maps at a given index. The logic should be the same, with an
improved code readability and maintenance.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart
044ab86d43 net: move the xps maps to an array
Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in
net_device. That will simplify a lot the code removing the need for lots
of if/else conditionals as the correct map will be available using its
offset in the array.

This should not modify the xps maps behaviour in any way.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart
6f36158e05 net: remove the xps possible_mask
Remove the xps possible_mask. It was an optimization but we can just
loop from 0 to nr_ids now that it is embedded in the xps dev_maps. That
simplifies the code a bit.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00
Antoine Tenart
5478fcd0f4 net: embed nr_ids in the xps maps
Embed nr_ids (the number of cpu for the xps cpus map, and the number of
rxqs for the xps cpus map) in dev_maps. That will help not accessing out
of bound memory if those values change after dev_maps was allocated.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18 14:56:22 -07:00