Commit Graph

843511 Commits

Author SHA1 Message Date
Maxim Mikityanskiy
a011b49f4e net/mlx5e: Consider XSK in XDP MTU limit calculation
Use the existing mlx5e_get_linear_rq_headroom function to calculate the
headroom for mlx5e_xdp_max_mtu. This function takes the XSK headroom
into consideration, which will be used in the following patches.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:28 +02:00
Maxim Mikityanskiy
84a0a2310d net/mlx5e: XDP_TX from UMEM support
When an XDP program returns XDP_TX, and the RQ is XSK-enabled, it
requires careful handling, because convert_to_xdp_frame creates a new
page and copies the data there, while our driver expects the xdp_frame
to point to the same memory as the xdp_buff. Handle this case
separately: map the page, and in the end unmap it and call
xdp_return_frame.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:27 +02:00
Maxim Mikityanskiy
b9673cf555 net/mlx5e: Share the XDP SQ for XDP_TX between RQs
Put the XDP SQ that is used for XDP_TX into the channel. It used to be a
part of the RQ, but with introduction of AF_XDP there will be one more
RQ that could share the same XDP SQ. This patch is a preparation for
that change.

Separate XDP_TX statistics per RQ were implemented in one of the previous
patches.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:27 +02:00
Maxim Mikityanskiy
d963fa1511 net/mlx5e: Refactor struct mlx5e_xdp_info
Currently, struct mlx5e_xdp_info has some issues that have to be cleaned
up before the upcoming AF_XDP support makes things too complicated and
messy. This structure is used both when sending the packet and on
completion. Moreover, the cleanup procedure on completion depends on the
origin of the packet (XDP_REDIRECT, XDP_TX). Adding AF_XDP support will
add new flows that use this structure even differently. To avoid
overcomplicating the code, this commit refactors the usage of this
structure in the following ways:

1. struct mlx5e_xdp_info is split into two different structures. One is
struct mlx5e_xdp_xmit_data, a transient structure that doesn't need to
be stored and is only used while sending the packet. The other is still
struct mlx5e_xdp_info that is stored in a FIFO and contains the fields
needed on completion.

2. The fields of struct mlx5e_xdp_info that are used in different flows
are put into a union. A special enum indicates the cleanup mode and
helps choose the right union member. This approach is clear and
explicit. Although it could be possible to "guess" the mode by looking
at the values of the fields and at the XDP SQ type, it wouldn't be that
clear and extendable and would require looking through the whole chain
to understand what's going on.

For the reference, there are the fields of struct mlx5e_xdp_info that
are used in different flows (including AF_XDP ones):

Packet origin          | Fields used on completion | Cleanup steps
-----------------------+---------------------------+------------------
XDP_REDIRECT,          | xdpf, dma_addr            | DMA unmap and
XDP_TX from XSK RQ     |                           | xdp_return_frame.
-----------------------+---------------------------+------------------
XDP_TX from regular RQ | di                        | Recycle page.
-----------------------+---------------------------+------------------
AF_XDP TX              | (none)                    | Increment the
                       |                           | producer index in
                       |                           | Completion Ring.

On send, the same set of mlx5e_xdp_xmit_data fields is used in all
flows: DMA and virtual addresses and length.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:27 +02:00
Maxim Mikityanskiy
ed084fb604 net/mlx5e: Allow ICO SQ to be used by multiple RQs
Prepare to creation of the XSK RQ, which will require posting UMRs, too.
The same ICO SQ will be used for both RQs and also to trigger interrupts
by posting NOPs. UMR WQEs can't be reused any more. Optimization
introduced in commit ab966d7e4f ("net/mlx5e: RX, Recycle buffer of
UMR WQEs") is reverted.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:27 +02:00
Maxim Mikityanskiy
a069e977d6 net/mlx5e: Calculate linear RX frag size considering XSK
Additional conditions introduced:

- XSK implies XDP.
- Headroom includes the XSK headroom if it exists.
- No space is reserved for struct shared_skb_info in XSK mode.
- Fragment size smaller than the XSK chunk size is not allowed.

A new auxiliary function mlx5e_get_linear_rq_headroom with the support
for XSK is introduced. Use this function in the implementation of
mlx5e_get_rq_headroom. Change headroom to u32 to match the headroom
field in struct xdp_umem.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:27 +02:00
Maxim Mikityanskiy
6ed9350fe0 net/mlx5e: Replace deprecated PCI_DMA_TODEVICE
The PCI API for DMA is deprecated, and PCI_DMA_TODEVICE is just defined
to DMA_TO_DEVICE for backward compatibility. Just use DMA_TO_DEVICE.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:27 +02:00
Maxim Mikityanskiy
4bce4e5cb6 xsk: Return the whole xdp_desc from xsk_umem_consume_tx
Some drivers want to access the data transmitted in order to implement
acceleration features of the NICs. It is also useful in AF_XDP TX flow.

Change the xsk_umem_consume_tx API to return the whole xdp_desc, that
contains the data pointer, length and DMA address, instead of only the
latter two. Adapt the implementation of i40e and ixgbe to this change.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:27 +02:00
Maxim Mikityanskiy
123e8da1d3 xsk: Change the default frame size to 4096 and allow controlling it
The typical XDP memory scheme is one packet per page. Change the AF_XDP
frame size in libbpf to 4096, which is the page size on x86, to allow
libbpf to be used with the drivers with the packet-per-page scheme.

Add a command line option -f to xdpsock to allow to specify a custom
frame size.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:26 +02:00
Maxim Mikityanskiy
2761ed4b6e libbpf: Support getsockopt XDP_OPTIONS
Query XDP_OPTIONS in libbpf to determine if the zero-copy mode is active
or not.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:26 +02:00
Maxim Mikityanskiy
2640d3c812 xsk: Add getsockopt XDP_OPTIONS
Make it possible for the application to determine whether the AF_XDP
socket is running in zero-copy mode. To achieve this, add a new
getsockopt option XDP_OPTIONS that returns flags. The only flag
supported for now is the zero-copy mode indicator.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:26 +02:00
Maxim Mikityanskiy
d57d76428a xsk: Add API to check for available entries in FQ
Add a function that checks whether the Fill Ring has the specified
amount of descriptors available. It will be useful for mlx5e that wants
to check in advance, whether it can allocate a bulk of RX descriptors,
to get the best performance.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:26 +02:00
Maxim Mikityanskiy
e18953240d net/mlx5e: Attach/detach XDP program safely
When an XDP program is set, a full reopen of all channels happens in two
cases:

1. When there was no program set, and a new one is being set.

2. When there was a program set, but it's being unset.

The full reopen is necessary, because the channel parameters may change
if XDP is enabled or disabled. However, it's performed in an unsafe way:
if the new channels fail to open, the old ones are already closed, and
the interface goes down. Use the safe way to switch channels instead.
The same way is already used for other configuration changes.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:26 +02:00
Roman Gushchin
e5c891a349 bpf: fix cgroup bpf release synchronization
Since commit 4bfc0bb2c6 ("bpf: decouple the lifetime of cgroup_bpf
from cgroup itself"), cgroup_bpf release occurs asynchronously
(from a worker context), and before the release of the cgroup itself.

This introduced a previously non-existing race between the release
and update paths. E.g. if a leaf's cgroup_bpf is released and a new
bpf program is attached to the one of ancestor cgroups at the same
time. The race may result in double-free and other memory corruptions.

To fix the problem, let's protect the body of cgroup_bpf_release()
with cgroup_mutex, as it was effectively previously, when all this
code was called from the cgroup release path with cgroup mutex held.

Also let's skip cgroups, which have no chances to invoke a bpf
program, on the update path. If the cgroup bpf refcnt reached 0,
it means that the cgroup is offline (no attached processes), and
there are no associated sockets left. It means there is no point
in updating effective progs array! And it can lead to a leak,
if it happens after the release. So, let's skip such cgroups.

Big thanks for Tejun Heo for discovering and debugging of this problem!

Fixes: 4bfc0bb2c6 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:51:58 +02:00
David S. Miller
d7ee287827 Generic DIM
From: Tal Gilboa and Yamin Fridman
 
 Implement net DIM over a generic DIM library, add RDMA DIM
 
 dim.h lib exposes an implementation of the DIM algorithm for
 dynamically-tuned interrupt moderation for networking interfaces.
 
 We want a similar functionality for other protocols, which might need to
 optimize interrupts differently. Main motivation here is DIM for NVMf
 storage protocol.
 
 Current DIM implementation prioritizes reducing interrupt overhead over
 latency. Also, in order to reduce DIM's own overhead, the algorithm might
 take some time to identify it needs to change profiles. While this is
 acceptable for networking, it might not work well on other scenarios.
 
 Here we propose a new structure to DIM. The idea is to allow a slightly
 modified functionality without the risk of breaking Net DIM behavior for
 netdev. We verified there are no degradations in current DIM behavior with
 the modified solution.
 
 Suggested solution:
 - Common logic is implemented in lib/dim/dim.c
 - Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
   the common logic in dim.c
 - Any new DIM logic will be implemented in "lib/dim/new_dim.c".
   This new implementation will expose modified versions of profiles,
   dim_step() and dim_decision().
 - DIM API is declared in include/linux/dim.h for all implementations.
 
 Pros for this solution are:
 - Zero impact on existing net_dim implementation and usage
 - Relatively more code reuse (compared to two separate solutions)
 - Increased extensibility
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0SiDAACgkQSD+KveBX
 +j4mlQgAoabbtxwIapMg62tKhnlr/3hEEyuEjiQRhzMUb5y5zOIJNRL82Nerv4K3
 oD7v76emw/Xb1ErPe4yJJRdkpbpabEAFKbqkM0R7xFPNK33ZbphlU1d9YGgUKp1W
 Uieydn5kfubzi6p7R1TC7c2qW1Wr6rE1wnNJyC/sy8zFEP/OEH0NmgPDyOs7ckWI
 xaoJ1lhNWMoGN+Qk+fxazd1VtsKdcbc9JvLD2e8ON5qWdnuNiLhawcJVoWzfsCAa
 V/Tdff+Rj9bVFsi468u20og4EdV3ceAOTzMnhCYpAtYZHVbVk6MFgh87uQYkTrfY
 sW8BDmWKkRdHtzHd3a3yQiVDnz/iuw==
 =bAZn
 -----END PGP SIGNATURE-----

Merge tag 'blk-dim-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mamameed says:

====================
Generic DIM

From: Tal Gilboa and Yamin Fridman

Implement net DIM over a generic DIM library, add RDMA DIM

dim.h lib exposes an implementation of the DIM algorithm for
dynamically-tuned interrupt moderation for networking interfaces.

We want a similar functionality for other protocols, which might need to
optimize interrupts differently. Main motivation here is DIM for NVMf
storage protocol.

Current DIM implementation prioritizes reducing interrupt overhead over
latency. Also, in order to reduce DIM's own overhead, the algorithm might
take some time to identify it needs to change profiles. While this is
acceptable for networking, it might not work well on other scenarios.

Here we propose a new structure to DIM. The idea is to allow a slightly
modified functionality without the risk of breaking Net DIM behavior for
netdev. We verified there are no degradations in current DIM behavior with
the modified solution.

Suggested solution:
- Common logic is implemented in lib/dim/dim.c
- Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
  the common logic in dim.c
- Any new DIM logic will be implemented in "lib/dim/new_dim.c".
  This new implementation will expose modified versions of profiles,
  dim_step() and dim_decision().
- DIM API is declared in include/linux/dim.h for all implementations.

Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- Relatively more code reuse (compared to two separate solutions)
- Increased extensibility
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 12:42:51 -07:00
Christian Lamparter
a653f2f538 net: dsa: qca8k: introduce reset via gpio feature
The QCA8337(N) has a RESETn signal on Pin B42 that
triggers a chip reset if the line is pulled low.
The datasheet says that: "The active low duration
must be greater than 10 ms".

This can hopefully fix some of the issues related
to pin strapping in OpenWrt for the EA8500 which
suffers from detection issues after a SoC reset.

Please note that the qca8k_probe() function does
currently require to read the chip's revision
register for identification purposes.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:17:30 -07:00
Christian Lamparter
e7dd8a8948 dt-bindings: net: dsa: qca8k: document reset-gpios property
This patch documents the qca8k's reset-gpios property that
can be used if the QCA8337N ends up in a bad state during
reset.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:17:30 -07:00
David Ahern
b2c709cce6 ipv6: Convert gateway validation to use fib6_info
Gateway validation does not need a dst_entry, it only needs the fib
entry to validate the gateway resolution and egress device. So,
convert ip6_nh_lookup_table from ip6_pol_route to fib6_table_lookup
and ip6_route_check_nh to use fib6_lookup over rt6_lookup.

ip6_pol_route is a call to fib6_table_lookup and if successful a call
to fib6_select_path. From there the exception cache is searched for an
entry or a dst_entry is created to return to the caller. The exception
entry is not relevant for gateway validation, so what matters are the
calls to fib6_table_lookup and then fib6_select_path.

Similarly, rt6_lookup can be replaced with a call to fib6_lookup with
RT6_LOOKUP_F_IFACE set in flags. Again, the exception cache search is
not relevant, only the lookup with path selection. The primary difference
in the lookup paths is the use of rt6_select with fib6_lookup versus
rt6_device_match with rt6_lookup. When you remove complexities in the
rt6_select path, e.g.,
1. saddr is not set for gateway validation, so RT6_LOOKUP_F_HAS_SADDR
   is not relevant
2. rt6_check_neigh is not called so that removes the RT6_NUD_FAIL_DO_RR
   return and round-robin logic.

the code paths are believed to be equivalent for the given use case -
validate the gateway and optionally given the device. Furthermore, it
aligns the validation with onlink code path and the lookup path actually
used for rx and tx.

Adjust the users, ip6_route_check_nh_onlink and ip6_route_check_nh to
handle a fib6_info vs a rt6_info when performing validation checks.

Existing selftests fib-onlink-tests.sh and fib_tests.sh are used to
verify the changes.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:10:23 -07:00
David S. Miller
5b1bf3f644 Merge branch 'FDB-VLAN-and-PTP-fixes-for-SJA1105-DSA'
Vladimir Oltean says:

====================
FDB, VLAN and PTP fixes for SJA1105 DSA

This patchset is an assortment of fixes for the net-next version of the
sja1105 DSA driver:
- Avoid a kernel panic when the driver fails to probe or unregisters
- Finish Arnd Bermann's idea of compiling PTP support as part of the
  main DSA driver and not separately
- Better handling of initial port-based VLAN as well as VLANs for
  dsa_8021q FDB entries
- Fix address learning for the SJA1105 P/Q/R/S family
- Make static FDB entries persistent across switch resets
- Fix reporting of statically-added FDB entries in 'bridge fdb show'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:22 -07:00
Vladimir Oltean
d763778224 net: dsa: sja1105: Implement is_static for FDB entries on E/T
The first generation switches don't tell us through the dynamic config
interface whether the dumped FDB entries are static or not (the LOCKEDS
bit from P/Q/R/S).

However, now that we're keeping a mirror of all 'bridge fdb' commands in
the static config, this is an opportunity to compare a dumped FDB entry
to the driver's private database.  After all, what makes an entry static
is that *we* added it.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:22 -07:00
Vladimir Oltean
b3ee526a88 net: dsa: sja1105: Use correct dsa_8021q VIDs for FDB commands
A FDB entry means that "frames that match this VID and DMAC must be
forwarded to this port".

In the case of dsa_8021q however, the VID is not a single one (and
neither two, as my previous patch assumed). The VID can be set either by
the CPU port (1 tx_vid), or by any of the other front-panel port (n-1
rx_vid's).

Fixes: 93647594d8 ("net: dsa: sja1105: Hide the dsa_8021q VLANs from the bridge fdb command")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:21 -07:00
Vladimir Oltean
17ae655540 net: dsa: sja1105: Populate is_static for FDB entries on P/Q/R/S
The reason why this wasn't tackled earlier is that I had hoped I
understood the user manual wrong.  But unfortunately hacks are required
in order to retrieve the static/dynamic nature of FDB entries on SJA1105
P/Q/R/S, since this info is stored in the writeback buffer of the
dynamic config command.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:21 -07:00
Vladimir Oltean
4a95078636 net: dsa: sja1105: Add a high-level overview of the dynamic config interface
When trying to add support for LOCKEDS (static FDB entries) on SJA1105
P/Q/R/S, at first I didn't remember how the abstraction I created
worked, and actually thought it works by mistake.

To avoid other people staring at the code and not making much sense out
of it, add some comments at the top of the file.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:21 -07:00
Vladimir Oltean
60f6053ff1 net: dsa: sja1105: Back up static FDB entries in kernel memory
After commit 8456721dd4 ("net: dsa: sja1105: Add support for
configuring address ageing time"), we started to reset the switch rather
often (each time the bridge core changes the ageing time on a switch
port).

The unfortunate reality is that SJA1105 doesn't have any {cold, warm,
whatever} reset mode in which it accepts a new configuration stream
without flushing the FDB.  Instead, in its world, the FDB *is* an
optional part of the static configuration.

So we play its game, and do what we also do for VLANs: for each 'bridge
fdb' command, we add the FDB entry through the dynamic interface, and we
append the in-kernel static config memory with info that we're going to
use later, when the next reset command is going to be issued.

The result is that 'bridge fdb' commands are now persistent (dynamically
learned entries are lost, but that's ok).

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:21 -07:00
Vladimir Oltean
6c56e167cc net: dsa: sja1105: Make P/Q/R/S learn MAC addresses
At the end of the commit 1da7382134 ("net: dsa: sja1105: Add FDB
operations for P/Q/R/S series") message, I said that:

    At the moment only FDB entries installed statically through 'bridge fdb'
    are visible in the dump callback - the dynamically learned ones are
    still under investigation.

It looks like the reason why they were not visible in 'bridge fdb' was
that they were never learned - always flooded.

SJA1105 P/Q/R/S manual says about the MAXADDRP[port] field:

    Specify the maximum number of MAC address dynamically learned from
    the respective port. It is used to limit the number of learned MAC
    addresses per port.

It looks like not providing a value in the static config (aka providing
zeroes) is enough for it to not store the learned addresses in the FDB.

For now we divide the 1024 entry FDB "equally" amongst the 5 ports. This
may be revisited if the situation calls for that - for now I'm happy
that learning works.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:21 -07:00
Vladimir Oltean
0803948e23 net: dsa: sja1105: Actually implement the P/Q/R/S FDB bits
In commit 1da7382134 ("net: dsa: sja1105: Add FDB operations for
P/Q/R/S series"), these bits were set in the static config, but
apparently they did not do anything.  The reason is that the packing
accessors for them were part of a patch I forgot to send.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:21 -07:00
Vladimir Oltean
e3502b8297 net: dsa: sja1105: Make vid 1 the default pvid
In SJA1105 there is no concept of 'default values' per se, everything
needs to be driver-supplied through the static configuration tables.

The issue is that the hardware manual says that 'at least the default
untagging VLAN' is mandatory to be provided through the static config.
But VLAN 0 isn't a very good initial pvid - its use is reserved for
priority-tagged frames, and the layers of the stack that care about
those already make sure that this VLAN is installed, as can be seen in
the message below:

  8021q: adding VLAN 0 to HW filter on device swp2

So change the pvid provided through the static configuration to 1, which
matches the bridge core's defaults.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:21 -07:00
Vladimir Oltean
29dd908d35 net: dsa: sja1105: Cancel PTP delayed work on unregister
Currently when the driver unloads and PTP is enabled, the delayed work
that prevents the timecounter from expiring becomes a ticking time bomb.
The kernel will schedule the work thread within 60 seconds of driver
removal, but the work handler is no longer there, leading to this
strange and inconclusive stack trace:

[   64.473112] Unable to handle kernel paging request at virtual address 79746970
[   64.480340] pgd = 008c4af9
[   64.483042] [79746970] *pgd=00000000
[   64.486620] Internal error: Oops: 80000005 [#1] SMP ARM
[   64.491820] Modules linked in:
[   64.494871] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc5-01634-ge3a2773ba9e5 #1246
[   64.503007] Hardware name: Freescale LS1021A
[   64.507259] PC is at 0x79746970
[   64.510393] LR is at call_timer_fn+0x3c/0x18c
[   64.514729] pc : [<79746970>]    lr : [<c03bd734>]    psr: 60010113
[   64.520965] sp : c1901de0  ip : 00000000  fp : c1903080
[   64.526163] r10: c1901e38  r9 : ffffe000  r8 : c19064ac
[   64.531363] r7 : 79746972  r6 : e98dd260  r5 : 00000100  r4 : c1a9e4a0
[   64.537859] r3 : c1900000  r2 : ffffa400  r1 : 79746972  r0 : e98dd260
[   64.544359] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   64.551460] Control: 10c5387d  Table: a8a2806a  DAC: 00000051
[   64.557176] Process swapper/0 (pid: 0, stack limit = 0x1ddb27f0)
[   64.563147] Stack: (0xc1901de0 to 0xc1902000)
[   64.567481] 1de0: eb6a4918 3d60d7c3 c1a9e554 e98dd260 eb6a34c0 c1a9e4a0 ffffa400 c19064ac
[   64.575616] 1e00: ffffe000 c03bd95c c1901e34 c1901e34 eb6a34c0 c1901e30 c1903d00 c186f4c0
[   64.583751] 1e20: c1906488 29e34000 c1903080 c03bdca4 00000000 eaa6f218 00000000 eb6a45c0
[   64.591886] 1e40: eb6a45c0 20010193 00000003 c03c0a68 20010193 3f7231be c1903084 00000002
[   64.600022] 1e60: 00000082 00000001 ffffe000 c1a9e0a4 00000100 c0302298 02b64722 0000000f
[   64.608157] 1e80: c186b3c8 c1877540 c19064ac 0000000a c186b350 ffffa401 c1903d00 c1107348
[   64.616292] 1ea0: 00200102 c0d87a14 ea823c00 ffffe000 00000012 00000000 00000000 ea810800
[   64.624427] 1ec0: f0803000 c1876ba8 00000000 c034c784 c18774b8 c039fb50 c1906c90 c1978aac
[   64.632562] 1ee0: f080200c f0802000 c1901f10 c0709ca8 c03091a0 60010013 ffffffff c1901f44
[   64.640697] 1f00: 00000000 c1900000 c1876ba8 c0301a8c 00000000 000070a0 eb6ac1a0 c031da60
[   64.648832] 1f20: ffffe000 c19064ac c19064f0 00000001 00000000 c1906488 c1876ba8 00000000
[   64.656967] 1f40: ffffffff c1901f60 c030919c c03091a0 60010013 ffffffff 00000051 00000000
[   64.665102] 1f60: ffffe000 c0376aa4 c1a9da37 ffffffff 00000037 3f7231be c1ab20c0 000000cc
[   64.673238] 1f80: c1906488 c1906480 ffffffff 00000037 c1ab20c0 c1ab20c0 00000001 c0376e1c
[   64.681373] 1fa0: c1ab2118 c1700ea8 ffffffff ffffffff 00000000 c1700754 c17dfa40 ebfffd80
[   64.689509] 1fc0: 00000000 c17dfa40 3f7733be 00000000 00000000 c1700330 00000051 10c0387d
[   64.697644] 1fe0: 00000000 8f000000 410fc075 10c5387d 00000000 00000000 00000000 00000000
[   64.705788] [<c03bd734>] (call_timer_fn) from [<c03bd95c>] (expire_timers+0xd8/0x144)
[   64.713579] [<c03bd95c>] (expire_timers) from [<c03bdca4>] (run_timer_softirq+0xe4/0x1dc)
[   64.721716] [<c03bdca4>] (run_timer_softirq) from [<c0302298>] (__do_softirq+0x130/0x3c8)
[   64.729854] [<c0302298>] (__do_softirq) from [<c034c784>] (irq_exit+0xbc/0xd8)
[   64.737040] [<c034c784>] (irq_exit) from [<c039fb50>] (__handle_domain_irq+0x60/0xb4)
[   64.744833] [<c039fb50>] (__handle_domain_irq) from [<c0709ca8>] (gic_handle_irq+0x58/0x9c)
[   64.753143] [<c0709ca8>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
[   64.760583] Exception stack(0xc1901f10 to 0xc1901f58)
[   64.765605] 1f00:                                     00000000 000070a0 eb6ac1a0 c031da60
[   64.773740] 1f20: ffffe000 c19064ac c19064f0 00000001 00000000 c1906488 c1876ba8 00000000
[   64.781873] 1f40: ffffffff c1901f60 c030919c c03091a0 60010013 ffffffff
[   64.788456] [<c0301a8c>] (__irq_svc) from [<c03091a0>] (arch_cpu_idle+0x38/0x3c)
[   64.795816] [<c03091a0>] (arch_cpu_idle) from [<c0376aa4>] (do_idle+0x1bc/0x298)
[   64.803175] [<c0376aa4>] (do_idle) from [<c0376e1c>] (cpu_startup_entry+0x18/0x1c)
[   64.810707] [<c0376e1c>] (cpu_startup_entry) from [<c1700ea8>] (start_kernel+0x480/0x4ac)
[   64.818839] Code: bad PC value
[   64.821890] ---[ end trace e226ed97b1c584cd ]---
[   64.826482] Kernel panic - not syncing: Fatal exception in interrupt
[   64.832807] CPU1: stopping
[   64.835501] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D           5.2.0-rc5-01634-ge3a2773ba9e5 #1246
[   64.845013] Hardware name: Freescale LS1021A
[   64.849266] [<c0312394>] (unwind_backtrace) from [<c030cc74>] (show_stack+0x10/0x14)
[   64.856972] [<c030cc74>] (show_stack) from [<c0ff4138>] (dump_stack+0xb4/0xc8)
[   64.864159] [<c0ff4138>] (dump_stack) from [<c0310854>] (handle_IPI+0x3bc/0x3dc)
[   64.871519] [<c0310854>] (handle_IPI) from [<c0709ce8>] (gic_handle_irq+0x98/0x9c)
[   64.879050] [<c0709ce8>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
[   64.886489] Exception stack(0xea8cbf60 to 0xea8cbfa8)
[   64.891514] bf60: 00000000 0000307c eb6c11a0 c031da60 ffffe000 c19064ac c19064f0 00000002
[   64.899649] bf80: 00000000 c1906488 c1876ba8 00000000 00000000 ea8cbfb0 c030919c c03091a0
[   64.907780] bfa0: 600d0013 ffffffff
[   64.911250] [<c0301a8c>] (__irq_svc) from [<c03091a0>] (arch_cpu_idle+0x38/0x3c)
[   64.918609] [<c03091a0>] (arch_cpu_idle) from [<c0376aa4>] (do_idle+0x1bc/0x298)
[   64.925967] [<c0376aa4>] (do_idle) from [<c0376e1c>] (cpu_startup_entry+0x18/0x1c)
[   64.933496] [<c0376e1c>] (cpu_startup_entry) from [<803025cc>] (0x803025cc)
[   64.940422] Rebooting in 3 seconds..

In this case, what happened is that the DSA driver failed to probe at
boot time due to a PHY issue during phylink_connect_phy:

[    2.245607] fsl-gianfar soc:ethernet@2d90000 eth2: error -19 setting up slave phy
[    2.258051] sja1105 spi0.1: failed to create slave for port 0.0

Fixes: bb77f36ac2 ("net: dsa: sja1105: Add support for the PTP clock")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:21 -07:00
Vladimir Oltean
3d64ea387c net: dsa: sja1105: Build PTP support in main DSA driver
As Arnd Bergmann pointed out in commit 78fe8a28fb ("net: dsa: sja1105:
fix ptp link error"), there is no point in having PTP support as a
separate loadable kernel module.

So remove the exported symbols and make sja1105.ko contain PTP support
or not based on CONFIG_NET_DSA_SJA1105_PTP.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:03:21 -07:00
David S. Miller
c881e10e3f Merge branch 'net-dsa-microchip-Convert-to-regmap'
Marek Vasut says:

====================
net: dsa: microchip: Convert to regmap

This patchset converts KSZ9477 switch driver to regmap.

This was tested with extra patches on KSZ8795. This was also tested
on KSZ9477 on Microchip KSZ9477EVB board, which I now have.
====================

Signed-off-by: Marek Vasut <marex@denx.de>
2019-06-27 11:00:32 -07:00
Marek Vasut
d4bcd99cd9 net: dsa: microchip: Replace ad-hoc bit manipulation with regmap
Regmap provides bit manipulation functions to set/clear bits, use those
insted of reimplementing them.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:32 -07:00
Marek Vasut
255b59ad0d net: dsa: microchip: Factor out regmap config generation into common header
The regmap config tables are rather similar for various generations of
the KSZ8xxx/KSZ9xxx switches. Introduce a macro which allows generating
those tables without duplication. Note that $regalign parameter is not
used right now, but will be used in KSZ87xx series switches.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:32 -07:00
Marek Vasut
ee394fea6f net: dsa: microchip: Dispose of ksz_io_ops
Since the driver now uses regmap , get rid of ad-hoc ksz_io_ops
abstraction, which no longer has any meaning. Moreover, since regmap
has it's own locking, get rid of the register access mutex.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:31 -07:00
Marek Vasut
46558d601c net: dsa: microchip: Initial SPI regmap support
Add basic SPI regmap support into the driver.

Previous patches unconver that ksz_spi_write() is always ever called
with len = 1, 2 or 4. We can thus drop the if (len > SPI_TX_BUF_LEN)
check and we can also drop the allocation of the txbuf which is part
of the driver data and wastes 256 bytes for no reason. Regmap covers
the whole thing now.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:31 -07:00
Marek Vasut
ff509dab43 net: dsa: microchip: Factor out register access opcode generation
Factor out the code which sends out the register read/write opcodes
to the switch, since the code differs in single bit between read and
write.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:31 -07:00
Marek Vasut
5ce9676e8b net: dsa: microchip: Use PORT_CTRL_ADDR() instead of indirect function call
The indirect function call to dev->dev_ops->get_port_addr() is expensive
especially if called for every single register access, and only returns
the value of PORT_CTRL_ADDR() macro. Use PORT_CTRL_ADDR() macro directly
instead.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:31 -07:00
Marek Vasut
bafea01f65 net: dsa: microchip: Move ksz_cfg and ksz_port_cfg to ksz9477.c
These functions are only used by the KSZ9477 code, move them from
the header into that code. Note that these functions will be soon
replaced by regmap equivalents.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:31 -07:00
Marek Vasut
860cbe92ff net: dsa: microchip: Inline ksz_spi.h
The functions in the header file are static, and the header file is
included from single C file, just inline the code into the C file.
The bonus is that it's easier to spot further content to clean up.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:31 -07:00
Marek Vasut
78e4e32fe3 net: dsa: microchip: Remove ksz_{get,set}()
These functions and callbacks are never used, remove them.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:31 -07:00
Marek Vasut
77972783fd net: dsa: microchip: Remove ksz_{read,write}24()
These functions and callbacks are never used, remove them.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <Woojung.Huh@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 11:00:31 -07:00
David S. Miller
1c57de6951 Merge branch 'net-aquantia-implement-vlan-offloads'
Igor Russkikh says:

====================
net: aquantia: implement vlan offloads

This patchset introduces hardware VLAN offload support and also does some
maintenance: we replace driver version with uts version string, add
documentation file for atlantic driver, and update maintainers
adding Igor as a maintainer.

v3: shuffle doc sections, per Andrew's comments

v2: updates in doc, gpl spdx tag cleanup
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:58:32 -07:00
Igor Russkikh
04f207fb0c net: aquantia: implement vlan offload configuration
set_features should update flags and reinit hardware if
vlan offload settings were changed.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Tested-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:58:32 -07:00
Igor Russkikh
880b3ca504 net: aquantia: vlan offloads logic in datapath
Update datapath by adding logic related to hardware assisted
vlan strip/insert behaviour.

Tested-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:58:32 -07:00
Igor Russkikh
d3ed7c5cf7 net: aquantia: adding fields and device features for vlan offload
Updating features and vlan_features with vlan HW offload.
Added vlan_tag fields to rx/tx ring_buff to track vlan related data.

Tested-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:58:32 -07:00
Igor Russkikh
161dea83f1 net: aquantia: added vlan offload related macros and functions
Register declaration macros required to work with vlan offload mode.

Tested-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:58:32 -07:00
Igor Russkikh
17f54a3bf5 net: aquantia: make all files GPL-2.0-only
It was noticed some files had -or-later, however overall driver has
-only license. Clean this up.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:58:32 -07:00
Igor Russkikh
f94551c88d maintainers: declare aquantia atlantic driver maintenance
Aquantia is resposible now for all new features and bugfixes.
Reflect that in MAINTAINERS.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:58:32 -07:00
Igor Russkikh
5a5d7a4dd4 net: aquantia: add documentation for the atlantic driver
Document contains configuration options description,
details and examples of driver various settings.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:58:32 -07:00
Igor Russkikh
2d3910c4dc net: aquantia: replace internal driver version code with uts
As it was discussed some time previously, driver is better to
report kernel version string, as it in a best way identifies
the codebase.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:58:32 -07:00
David S. Miller
0b58f64845 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2019-06-26

This series contains updates to ixgbe and i40e only.

Mauro S. M. Rodrigues update the ixgbe driver to handle transceivers who
comply with SFF-8472 but do not implement the Digital Diagnostic
Monitoring (DOM) interface.  Update the driver to check the necessary
bits to see if DOM is implemented before trying to read the additional
256 bytes in the EEPROM for DOM data.

Young Xiao fixes a potential divide by zero issue in ixgbe driver.

Aleksandr fixes i40e to recognize 2.5 and 5.0 GbE link speeds so that it
is not reported as "Unknown bps".  Fixes the driver to read the firmware
LLDP agent status during DCB initialization, and to properly log the
LLDP agent status to help with debugging when DCB fails to initialize.

Martyna fixes i40e for the missing supported and advertised link modes
information in ethtool.

Jake fixes a function header comment that was incorrect for a PTP
function in i40e.

Maciej fixes an issue for i40e when a XDP program is loaded the
descriptor count gets reset to the default value, resolve the issue by
making the current descriptor count persistent across resets.

Alice corrects a copyright date which she found to be incorrect.

Piotr adds a log entry when the traffic class 0 is added or deleted, which
was not being logged previously.

Gustavo A. R. Silva updates i40e to use struct_size() where possible.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 10:49:06 -07:00