Commit Graph

966273 Commits

Author SHA1 Message Date
Heiner Kallweit
1c470b53ec r8169: use pm_runtime_put_sync in rtl_open error path
We can safely runtime-suspend the chip if rtl_open() fails. Therefore
switch the error path to use pm_runtime_put_sync() as well.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/aa093b1e-f295-5700-1cb7-954b54dd8f17@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 16:35:59 -07:00
Heiner Kallweit
3a689e3497 r8169: remove unneeded memory barrier in rtl_tx
tp->dirty_tx isn't changed outside rtl_tx(). Therefore I see no need
to guarantee a specific order of reading tp->dirty_tx and tp->cur_tx.
Having said that we can remove the memory barrier.
In addition use READ_ONCE() when reading tp->cur_tx because it can
change in parallel to rtl_tx().
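
As a rough userspace illustration of the reasoning above (not the driver
code): the field with a single writer is read plainly, while the field that
may change concurrently is read exactly once through a volatile access.
READ_ONCE_U32(), struct ring and ring_clean() are hypothetical stand-ins,
and the sketch assumes every queued descriptor has already completed.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(), limited to uint32_t. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

struct ring {
    uint32_t cur_tx;   /* advanced by the transmit path, possibly in parallel */
    uint32_t dirty_tx; /* only ever advanced by ring_clean() below */
};

/* Reclaim descriptors between dirty_tx and cur_tx; returns how many. */
static inline uint32_t ring_clean(struct ring *r)
{
    uint32_t dirty_tx = r->dirty_tx;            /* sole writer: plain read */
    uint32_t cur_tx = READ_ONCE_U32(r->cur_tx); /* may change in parallel */
    uint32_t cleaned = cur_tx - dirty_tx;       /* wrap-safe unsigned distance */

    r->dirty_tx = cur_tx;
    return cleaned;
}
```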

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/2264563a-fa9e-11b0-2c42-31bc6b8e2790@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 16:35:49 -07:00
Armin Wolf
c24672cf59 ne2k: Fix Typo in RW-Bugfix
Correct a typo in ne.c and ne2k-pci.c which
prevented activation of the RW-Bugfix.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20201029143357.7008-1-W_Armin@gmx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 16:17:02 -07:00
Parshuram Thombare
e4e143e26c net: macb: add support for high speed interface
This patch adds support for the 10GBASE-R interface to the Linux driver for
Cadence's Ethernet controller.
This controller has separate MACs and PCSes for the low and high speed paths.
The high speed PCS supports 100M, 1G, 2.5G, 5G and 10G through a rate
adaptation implementation. However, since it doesn't support auto-negotiation,
the Linux driver is modified to support 10GBASE-R instead of USXGMII.

Signed-off-by: Parshuram Thombare <pthombar@cadence.com>
Link: https://lore.kernel.org/r/1603975627-18338-1-git-send-email-pthombar@cadence.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 16:13:20 -07:00
Karsten Graul
3752404a68 net/smc: improve return codes for SMC-Dv2
To allow better problem diagnosis the return codes for SMC-Dv2 are
improved by this patch. A few more CLC DECLINE codes are defined and
sent to the peer when an SMC connection cannot be established.
There are now multiple SMC variations that are offered by the client, and
the server may encounter problems initializing all of them.
Because only one diagnosis code can be sent to the client the decision
was made to send the first code that was encountered. Because the server
tries the variations in the order of importance (SMC-Dv2, SMC-D, SMC-R)
this makes sure that the diagnosis code of the most important variation
is sent.

v2: initialize rc in smc_listen_v2_check().

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20201031181938.69903-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 15:44:13 -07:00
Jakub Kicinski
cfb2cffafa Merge branch 'support-for-octeontx2-98xx-silcion'
Subbaraya Sundeep says:

====================
Support for OcteonTx2 98xx silicon

The OcteonTx2 series of silicon has multiple variants. The
98xx variant has two network interface controllers (NIX blocks),
each of which supports up to 100Gbps. Similarly, 98xx supports
two crypto blocks (CPT) to double the crypto performance.
The current RVU drivers support a single NIX block and a single
CPT block; this patchset adds support for multiple
blocks of the same type to be active at the same time.

Also, the number of serdes controllers (CGX) has increased
from three to five on 98xx. Each CGX block supports
up to 4 physical interfaces depending on the serdes mode, i.e.
up to 20 physical interfaces in total. At any time each CGX block
can be mapped to a single NIX. The HW configuration to map CGX
and NIX blocks is done by firmware.

NPC has two new interfaces added, NIX1_RX and NIX1_TX,
similar to the NIX0 interfaces. Also, the number of MCAM entries
is increased from 4k to 16k. To support the 16k entries, an
extended set is added in hardware, at completely different
register offsets. Fortunately, new constant registers
can be read to figure out whether the extended set is present
or not.

This patch set modifies the existing AF and PF drivers
in the order below to support 98xx:
- Prepare for supporting multiple blocks of same type.
  Functions which operate with block type to get or set
  resources count are modified to operate with block address
- Manage allocating and freeing LFs from new NIX1 and CPT1 RVU blocks.
- NIX block specific initialization and teardown for NIX1
- Based on the mapping set by Firmware, assign the NIX block
  LFs to a PF/VF.
- Multicast entries context is setup for NIX1 along with NIX0
- NPC changes to support extended set of MCAM entries, counters
  and NIX1 interfaces to NPC.
- All the mailbox changes required for the new blocks in 98xx.
- Since there are more CGX links in 98xx the hardcoded LBK
  link value needed by netdev drivers is not sufficient any
  more. Hence AF consumers need to get the number of all links
  and calculate the LBK link.
- Debugfs changes to display NIX1 contexts similar to NIX0
- Debugfs change to display mapping between CGX, NIX and PF.
====================

Link: https://lore.kernel.org/r/1603948549-781-1-git-send-email-sundeep.lkml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:50 -07:00
Rakesh Babu
e2fb373038 octeontx2-af: Display CGX, NIX and PF map in debugfs.
Unlike earlier silicon variants, OcteonTx2 98xx
silicon has 2 NIX blocks and each CGX is
mapped to one of the NIX blocks. Each NIX
block supports 100G. The mapping between NIX blocks and
CGX is done by firmware based on the CGX speed config
to get the maximum possible network bandwidth.
Since the mapping is not fixed, it's difficult
for a user to figure out. Hence add a debugfs
entry which displays the mapping between CGX LMAC,
NIX block and RVU PF.
Sample result of this entry ::

~# cat /sys/kernel/debug/octeontx2/rvu_pf_cgx_map
PCI dev         RVU PF Func     NIX block       CGX     LMAC
0002:02:00.0    0x400           NIX0            CGX0    LMAC0

Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:47 -07:00
Rakesh Babu
0f3ce484af octeontx2-af: Display NIX1 also in debugfs
If the NIX1 block is also implemented then add a new
directory for NIX1 in the debugfs root. Stats of the
NIX1 block can be read/written from/to the files
in the directory "/sys/kernel/debug/octeontx2/nix1/".

Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:47 -07:00
Subbaraya Sundeep
8bcf5ced65 octeontx2-pf: Calculate LBK link instead of hardcoding
CGX links are followed by LBK links, but the number of
CGX and LBK links varies between platforms. Hence
get the number of links present in hardware from
AF and use it to calculate the LBK link number.
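
A minimal sketch of that calculation, assuming links are numbered with all
CGX links first and the LBK links directly after them. struct link_info and
lbk_link_index() are hypothetical names for illustration, not the driver's
own.

```c
#include <assert.h>

/* Hypothetical link counts, as a PF driver might receive them from the AF. */
struct link_info {
    unsigned int cgx_links;
    unsigned int lbk_links;
};

/* CGX links occupy indices 0..cgx_links-1; LBK link n follows them. */
static inline unsigned int lbk_link_index(const struct link_info *info,
                                          unsigned int lbk)
{
    assert(lbk < info->lbk_links);
    return info->cgx_links + lbk;
}
```

With a hardcoded value, a platform with more CGX links would compute an
index that still points at a CGX link.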

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:47 -07:00
Subbaraya Sundeep
a84cdcea3b octeontx2-af: Mbox changes for 98xx
This patch puts together all mailbox changes
for 98xx silicon:

Attach ->
Modify the resource attach mailbox handler to
request LFs from a block address out of multiple
blocks of the same type. If a PF/VF needs LFs from two
blocks of the same type then the attach mbox should be
called twice.

Example:
        struct rsrc_attach *attach;
        .. Allocate memory for message ..
        attach->cptlfs = 3; /* 3 LFs from CPT0 */
        .. Send message ..
        .. Allocate memory for message ..
        attach->modify = 1;
        attach->cpt_blkaddr = BLKADDR_CPT1;
        attach->cptlfs = 2; /* 2 LFs from CPT1 */
        .. Send message ..

Detach ->
Update detach mailbox and its handler to detach
resources from CPT1 and NIX1 blocks.

MSIX ->
Updated the MSIX mailbox and its handler to return
MSIX offsets for the new block CPT1.

Free resources ->
Update free_rsrc mailbox and its handler to return
the free resources count of new blocks NIX1 and CPT1

Links ->
The number of CGX, LBK and SDP links may vary between
platforms. For example, 98xx has more CGX and LBK
links than 96xx. Hence the info about the number
of links present in hardware is useful for consumers to
request link configuration properly. This patch sends
this info in nix_lf_alloc_rsp.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:47 -07:00
Subbaraya Sundeep
1c1935c994 octeontx2-af: Add NIX1 interfaces to NPC
On 98xx silicon, the NPC block has additional
mcam entries, counters and NIX1 interfaces.
An extended set of registers is present for the
new mcam entries and counters.
This patch does the following:
- updates the register accessing macros
  to use extended set if present.
- configures the MKEX profile for NIX1 interfaces also.
- updates mcam entry write functions to use assigned
  NIX0/1 interfaces for the PF/VF.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:47 -07:00
Subbaraya Sundeep
55efcc5714 octeontx2-af: Setup MCE context for assigned NIX
Initialize MCE context for the assigned NIX0/1
block for a CGX mapped PF. Modified rvu_nix_aq_enq_inst
function to work with nix_hw so that MCE contexts
for both NIX blocks can be inited.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:47 -07:00
Subbaraya Sundeep
c5a73b632b octeontx2-af: Map NIX block from CGX connection
Firmware configures the NIX block mapping for all CGXs
to achieve maximum throughput. This patch reads
the configuration and creates a mapping between RVU
PFs and NIX blocks. For LBK VFs, assign NIX0 to
even-numbered VFs and NIX1 to odd-numbered VFs.
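
The LBK VF rule can be sketched as follows. enum nix_block and
lbk_vf_nix_block() are hypothetical names, and the sketch assumes both NIX
blocks are present.

```c
#include <assert.h>

enum nix_block { NIX0, NIX1 };

/* Even-numbered LBK VFs use NIX0, odd-numbered ones use NIX1. */
static inline enum nix_block lbk_vf_nix_block(unsigned int vf)
{
    return (vf & 1) ? NIX1 : NIX0;
}
```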

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:46 -07:00
Rakesh Babu
221f3dff29 octeontx2-af: Initialize NIX1 block
This patch modifies NIX functions to operate
with nix_hw context so that existing functions
can be used for both NIX0 and NIX1 blocks. And
the NIX blocks present in the system are initialized
during driver init and freed during exit.

Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:46 -07:00
Rakesh Babu
9932fb7250 octeontx2-af: Manage new blocks in 98xx
AF manages the tasks of allocating and freeing
LFs from RVU blocks to PFs and VFs. With the new
NIX1 and CPT1 blocks in 98xx, this patch
adds support for handling the new blocks too.

Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:46 -07:00
Subbaraya Sundeep
cdd41e8785 octeontx2-af: Update get/set resource count functions
Since multiple blocks of same type are present in
98xx, modify functions which get resource count and
which update resource count to work with individual
block address instead of block type.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:16:46 -07:00
Robert Hancock
1a02556086 net: axienet: Properly handle PCS/PMA PHY for 1000BaseX mode
Update the axienet driver to properly support the Xilinx PCS/PMA PHY
component which is used for 1000BaseX and SGMII modes, including
properly configuring the auto-negotiation mode of the PHY and reading
the negotiated state from the PHY.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Link: https://lore.kernel.org/r/20201028171429.1699922-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 14:13:46 -07:00
Alex Elder
624251b4b5 net: ipa: avoid a bogus warning
The previous commit added support for IPA having up to six source
and destination resources.  But currently nothing uses more than
four.  (Five of each are used in a newer version of the hardware.)

I find that in one of my build environments the compiler complains
about newly-added code in two spots.  Inspection shows that the
warnings have no merit, but this compiler does not recognize that.

    ipa_main.c:457:39: warning: array index 5 is past the end of the
        array (which contains 4 elements) [-Warray-bounds]
    (and the same warning at line 483)

We can make this warning go away by changing the number of elements
in the source and destination resource limit arrays--now rather than
waiting until we need it to support the newer hardware.  This change
was coming soon anyway; make it now to get rid of the warning.

Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20201031151524.32132-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 13:22:58 -07:00
Jakub Kicinski
023efb15aa Merge branch 'net-add-functionality-to-net-core-byte-packet-counters-and-use-it-in-r8169'
Heiner Kallweit says:

====================
net: add functionality to net core byte/packet counters and use it in r8169

This series adds missing functionality to the net core handling of
byte/packet counters and statistics. The extensions are then used
to remove private rx/tx byte/packet counters in r8169 driver.
====================

Link: https://lore.kernel.org/r/1fdb8ecd-be0a-755d-1d92-c62ed8399e77@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 10:23:03 -07:00
Heiner Kallweit
f1d5470594 r8169: remove no longer needed private rx/tx packet/byte counters
After switching to the net core rx/tx byte/packet counters we can
remove the now unused private version.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 10:23:02 -07:00
Heiner Kallweit
5e4cb48001 r8169: use struct pcpu_sw_netstats for rx/tx packet/byte counters
Switch to the net core rx/tx byte/packet counter infrastructure.
This simplifies the code; the only small drawback is some memory overhead,
because we use just one queue but allocate the counters per CPU.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 10:23:02 -07:00
Heiner Kallweit
81b01894d7 net: core: add devm_netdev_alloc_pcpu_stats
We have netdev_alloc_pcpu_stats(), and we have devm_alloc_percpu().
Add a managed version of netdev_alloc_pcpu_stats, e.g. for allocating
the per-cpu stats in the probe() callback of a driver. It needs to be
a macro for dealing properly with the type argument.
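
To illustrate why a type argument forces a macro, here is a userspace
sketch. alloc_stats_array() and struct pkt_stats are hypothetical, and
plain calloc() stands in for the kernel's managed per-cpu allocation, so
nothing here is freed automatically.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct pkt_stats {
    uint64_t packets;
    uint64_t bytes;
};

/*
 * A function cannot take a type name as an argument, but a macro can:
 * the type is used both for sizeof and for the resulting pointer type.
 */
#define alloc_stats_array(type, n) ((type *)calloc((n), sizeof(type)))
```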

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 10:22:49 -07:00
Heiner Kallweit
d3fd65484c net: core: add dev_sw_netstats_tx_add
Add dev_sw_netstats_tx_add(), complementing the already existing
dev_sw_netstats_rx_add(). Unlike dev_sw_netstats_rx_add(), it allows
passing the number of packets as a function argument.
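
The difference between the two helpers can be sketched in userspace as
below. The struct and function names are hypothetical stand-ins, and the
per-cpu handling and synchronization of the real helpers are left out.

```c
#include <assert.h>
#include <stdint.h>

struct sw_netstats {
    uint64_t rx_packets, rx_bytes;
    uint64_t tx_packets, tx_bytes;
};

/* RX variant: always accounts exactly one packet. */
static inline void sw_netstats_rx_add(struct sw_netstats *s, unsigned int len)
{
    s->rx_packets++;
    s->rx_bytes += len;
}

/* TX variant: the caller passes the packet count, e.g. after batch cleanup. */
static inline void sw_netstats_tx_add(struct sw_netstats *s,
                                      unsigned int packets, unsigned int len)
{
    s->tx_packets += packets;
    s->tx_bytes += len;
}
```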

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 10:21:31 -07:00
Jakub Kicinski
4e5d79bbe8 Merge branch 'in_interrupt-cleanup-part-2'
Sebastian Andrzej Siewior says:

====================
in_interrupt() cleanup, part 2

In the discussion about preempt count consistency across kernel configurations:

  https://lore.kernel.org/r/20200914204209.256266093@linutronix.de/

Linus clearly requested that code in drivers and libraries which changes
behaviour based on execution context should either be split up so that
e.g. task context invocations and BH invocations have different interfaces
or if that's not possible the context information has to be provided by the
caller which knows in which context it is executing.

This includes conditional locking, allocation mode (GFP_*) decisions and
avoidance of code paths which might sleep.

In the long run, usage of 'preemptible, in_*irq etc.' should be banned from
driver code completely.

This is part two addressing remaining drivers except for orinoco-usb.
====================

Cherry picking only Ethernet changes.

Link: https://lore.kernel.org/r/20201027225454.3492351-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 09:55:44 -07:00
Sebastian Andrzej Siewior
beca92820d net: tlan: Replace in_irq() usage
The driver uses in_irq() to determine if the tlan_priv::lock has to be
acquired in tlan_mii_read_reg() and tlan_mii_write_reg().

The interrupt handler acquires the lock outside of these functions so the
in_irq() check is meant to prevent a lock recursion deadlock. But this
check is incorrect when interrupt force threading is enabled because then
the handler runs in thread context and in_irq() correctly returns false.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

tlan_set_timer() has this conditional as well, but this function is only
invoked from task context or the timer callback itself. So it always has to
lock and the check can be removed.

tlan_mii_read_reg(), tlan_mii_write_reg() and tlan_phy_print() are invoked
from interrupt and other contexts.

Split out the actual function body into helper variants which are called
from interrupt context and make the original functions wrappers which
acquire tlan_priv::lock unconditionally.
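
The resulting pattern can be sketched in userspace roughly as follows. A
pthread mutex stands in for tlan_priv::lock, and struct dev and all
function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

struct dev {
    pthread_mutex_t lock;
    unsigned int regs[32];
};

/* Helper variant: the caller must already hold d->lock. */
static inline unsigned int dev_read_reg_locked(struct dev *d, unsigned int reg)
{
    return d->regs[reg % 32];
}

/* Wrapper for all other callers: takes the lock unconditionally. */
static inline unsigned int dev_read_reg(struct dev *d, unsigned int reg)
{
    unsigned int val;

    pthread_mutex_lock(&d->lock);
    val = dev_read_reg_locked(d, reg);
    pthread_mutex_unlock(&d->lock);
    return val;
}

/* Stand-in for the interrupt handler, which takes the lock itself. */
static inline unsigned int dev_irq_handler(struct dev *d)
{
    unsigned int val;

    pthread_mutex_lock(&d->lock);
    val = dev_read_reg_locked(d, 1); /* no context check needed */
    pthread_mutex_unlock(&d->lock);
    return val;
}
```

Neither variant inspects the execution context; the choice is made by
which function the caller invokes.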

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Samuel Chessman <chessman@tux.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 09:55:38 -07:00
Sebastian Andrzej Siewior
dc5e8bfcd1 net: forcedeth: Replace context and lock check with a lockdep_assert()
nv_update_stats() triggers a WARN_ON() when invoked from hard interrupt
context because the locks in use are not hard interrupt safe. It also has
an assert_spin_locked() which was the lock check before the lockdep era.

Lockdep has way broader locking correctness checks and covers both issues,
so replace the warning and the lock assert with lockdep_assert_held().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Rain River <rain.1986.08.12@gmail.com>
Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 09:55:30 -07:00
Sebastian Andrzej Siewior
5ce7f3f46f net: neterion: s2io: Replace in_interrupt() for context detection
wait_for_cmd_complete() uses in_interrupt() to detect whether it is safe to
sleep or not.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

in_interrupt() also is only partially correct because it fails to choose the
correct code path when just preemption or interrupts are disabled.

Add an argument 'may_sleep' to the affected functions and adjust the
callers to pass the context information.

The following call chains which end up invoking wait_for_cmd_complete()
were analyzed to be safe to sleep:

 s2io_card_up()
   s2io_set_multicast()

 init_nic()
   init_tti()

 s2io_close()
   do_s2io_delete_unicast_mc()
     do_s2io_add_mac()

 s2io_set_mac_addr()
   do_s2io_prog_unicast()
     do_s2io_add_mac()

 s2io_reset()
   do_s2io_restore_unicast_mc()
     do_s2io_add_mc()
       do_s2io_add_mac()

 s2io_open()
   do_s2io_prog_unicast()
     do_s2io_add_mac()

The following call chains which end up invoking wait_for_cmd_complete()
were analyzed to be not safe to sleep:

 __dev_set_rx_mode()
    s2io_set_multicast()

 s2io_txpic_intr_handle()
   s2io_link()
     init_tti()

Add a may_sleep argument to wait_for_cmd_complete(), s2io_set_multicast()
and init_tti() and hand the context information in from the call sites.
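
A userspace sketch of a wait loop driven by such an argument is shown
below. wait_for_clear(), the retry count and the delays are hypothetical;
nanosleep() and a spin loop stand in for the kernel's sleeping and
busy-waiting delays.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Poll *reg until busy_bit clears. The caller states whether sleeping is
 * allowed; the function never tries to detect the context on its own.
 * Returns 0 on success, -1 if the bit is still set after all retries.
 */
static inline int wait_for_clear(const volatile unsigned int *reg,
                                 unsigned int busy_bit, bool may_sleep)
{
    int tries;

    for (tries = 0; tries < 20; tries++) {
        if (!(*reg & busy_bit))
            return 0;
        if (may_sleep) {
            struct timespec ts = { 0, 1000000 }; /* 1 ms */

            nanosleep(&ts, NULL);
        } else {
            /* Atomic context stand-in: short busy-wait. */
            for (volatile int i = 0; i < 1000; i++)
                ;
        }
    }
    return -1;
}
```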

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31 09:55:16 -07:00
Jakub Kicinski
68bb4665a2 Merge branch 'l2-multicast-forwarding-for-ocelot-switch'
Vladimir Oltean says:

====================
L2 multicast forwarding for Ocelot switch

This series enables the mscc_ocelot switch to forward raw L2 (non-IP)
mdb entries as configured by the bridge driver after this patch:

https://patchwork.ozlabs.org/project/netdev/patch/20201028233831.610076-1-vladimir.oltean@nxp.com/
====================

Link: https://lore.kernel.org/r/20201029022738.722794-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 18:26:02 -07:00
Vladimir Oltean
e5d1f896fd net: mscc: ocelot: support L2 multicast entries
There is one main difference in mscc_ocelot between IP multicast and L2
multicast. With IP multicast, destination ports are encoded into the
upper bytes of the multicast MAC address. Example: to deliver the
address 01:00:5E:11:22:33 to ports 3, 8, and 9, one would need to
program the address of 00:03:08:11:22:33 into hardware. Whereas for L2
multicast, the MAC table entry points to a Port Group ID (PGID), and
that PGID contains the port mask that the packet will be forwarded to.
As to why it is this way, no clue. My guess is that not all port
combinations can be supported simultaneously with the limited number of
PGIDs, and this was somehow an issue for IP multicast but not for L2
multicast. Anyway.
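
The IP multicast example above can be reproduced with a small sketch. The
byte layout is inferred from that single example (first byte zeroed, port
mask in the next two bytes) and is an assumption, as is the helper name
encode_ports_ipv4().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Replace the three upper bytes of an IPv4 multicast MAC with a port mask. */
static inline void encode_ports_ipv4(uint8_t out[6], const uint8_t mac[6],
                                     uint32_t port_mask)
{
    memcpy(out, mac, 6);
    out[0] = 0;
    out[1] = (port_mask >> 8) & 0xff; /* ports 8..15 */
    out[2] = port_mask & 0xff;        /* ports 0..7 */
}
```

Ports 3, 8 and 9 give the mask 0x308, which turns 01:00:5E:11:22:33 into
00:03:08:11:22:33.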

Prior to this change, the raw L2 multicast code was bogus, due to the
fact that there wasn't really any way to test it using the bridge code.
There were 2 issues:
- A multicast PGID was allocated for each MDB entry, but it wasn't in
  fact programmed to hardware. It was dummy.
- In fact we don't want to reserve a multicast PGID for every single MDB
  entry. That would be odd because we can only have ~60 PGIDs, but
  thousands of MDB entries. So instead, we want to reserve a multicast
  PGID for every single port combination for multicast traffic. And
  since we can have 2 (or more) MDB entries delivered to the same port
  group (and therefore PGID), we need to reference-count the PGIDs.
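
The reference-counting idea can be sketched as below: one PGID per
distinct port mask, shared by every MDB entry that uses that mask. The
pool size, struct pgid and both helpers are hypothetical, not the driver's
data structures.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PGIDS 60 /* hypothetical pool size */

struct pgid {
    uint32_t ports;        /* port mask this PGID forwards to */
    unsigned int refcount; /* number of MDB entries using it; 0 = free */
};

/* Return a PGID index for this port mask, sharing an existing one. */
static inline int pgid_get(struct pgid pool[NUM_PGIDS], uint32_t ports)
{
    int i, free_slot = -1;

    for (i = 0; i < NUM_PGIDS; i++) {
        if (pool[i].refcount && pool[i].ports == ports) {
            pool[i].refcount++;
            return i;
        }
        if (!pool[i].refcount && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1; /* pool exhausted */
    pool[free_slot].ports = ports;
    pool[free_slot].refcount = 1;
    return free_slot;
}

/* Drop one reference; the slot becomes reusable when it reaches zero. */
static inline void pgid_put(struct pgid pool[NUM_PGIDS], int idx)
{
    assert(idx >= 0 && idx < NUM_PGIDS && pool[idx].refcount > 0);
    pool[idx].refcount--;
}
```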

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 18:25:56 -07:00
Vladimir Oltean
bb8d53fd94 net: mscc: ocelot: make entry_type a member of struct ocelot_multicast
This saves a re-classification of the MDB address on deletion.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 18:25:56 -07:00
Vladimir Oltean
728e69ae29 net: mscc: ocelot: remove the "new" variable in ocelot_port_mdb_add
It is Not Needed, a comment will suffice.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 18:25:56 -07:00
Vladimir Oltean
ebbd860e25 net: mscc: ocelot: use ether_addr_copy
Since a helper is available for copying Ethernet addresses, let's use it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 18:25:56 -07:00
Vladimir Oltean
7c31314313 net: mscc: ocelot: classify L2 mdb entries as LOCKED
ocelot.h says:

/* MAC table entry types.
 * ENTRYTYPE_NORMAL is subject to aging.
 * ENTRYTYPE_LOCKED is not subject to aging.
 * ENTRYTYPE_MACv4 is not subject to aging. For IPv4 multicast.
 * ENTRYTYPE_MACv6 is not subject to aging. For IPv6 multicast.
 */

We don't want the permanent entries added with 'bridge mdb' to be
subject to aging.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 18:25:55 -07:00
Vladimir Oltean
0e761ac08f net: bridge: explicitly convert between mdb entry state and port group flags
When creating a new multicast port group, there is implicit conversion
between the __u8 state member of struct br_mdb_entry and the unsigned
char flags member of struct net_bridge_port_group. This implicit
conversion relies on the fact that MDB_PERMANENT is equal to
MDB_PG_FLAGS_PERMANENT.

Let's be more explicit and convert the state to flags manually.
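
A sketch of what an explicit conversion looks like. The constants are
defined locally as stand-ins, with values that are assumptions, and
mdb_state_to_flags() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel constants; the values are assumptions. */
#define MDB_TEMPORARY          0
#define MDB_PERMANENT          1
#define MDB_PG_FLAGS_PERMANENT (1u << 0)

/* Translate the uapi state into port group flags, one case at a time. */
static inline unsigned char mdb_state_to_flags(uint8_t state)
{
    unsigned char flags = 0;

    if (state == MDB_PERMANENT)
        flags |= MDB_PG_FLAGS_PERMANENT;
    return flags;
}
```

Written this way, the result no longer depends on the two constants
happening to share a numeric value.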

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201028234815.613226-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:58:16 -07:00
Nikolay Aleksandrov
955062b03f net: bridge: mcast: add support for raw L2 multicast groups
Extend the bridge multicast control and data path to configure routes
for L2 (non-IP) multicast groups.

The uapi struct br_mdb_entry union u is extended with another variant,
mac_addr, which does not change the structure size, and which is valid
when the proto field is zero.
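
The size argument can be checked with a simplified model of the union.
The field types below are stand-ins for the uapi types, and struct
mdb_addr and mdb_addr_is_l2() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct mdb_addr {
    union {
        uint32_t ip4;        /* IPv4 group address */
        uint8_t ip6[16];     /* IPv6 group address */
        uint8_t mac_addr[6]; /* new variant: raw L2 group address */
    } u;
    uint16_t proto;          /* zero means u.mac_addr is valid */
};

/* The 6-byte variant fits inside the 16-byte IPv6 one, so the size stays. */
_Static_assert(sizeof(((struct mdb_addr *)0)->u) == 16,
               "adding mac_addr must not grow the union");

static inline int mdb_addr_is_l2(const struct mdb_addr *a)
{
    return a->proto == 0;
}
```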

To be compatible with the forwarding code that is already in place,
which acts as an IGMP/MLD snooping bridge with querier capabilities, we
need to declare that for L2 MDB entries (for which there exists no such
thing as IGMP/MLD snooping/querying) there is always a querier.
Otherwise, these entries would be flooded to all bridge ports and not
just to those that are members of the L2 multicast group.

Needless to say, only permanent L2 multicast groups can be installed on
a bridge port.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201028233831.610076-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:49:19 -07:00
Jakub Kicinski
8ece853d12 Merge branch 'sfc-ef100-tso-enhancements'
Edward Cree says:

====================
sfc: EF100 TSO enhancements

Support TSO over encapsulation (with GSO_PARTIAL), and over VLANs
 (which the code already handled but we didn't advertise).  Also
 correct our handling of IPID mangling.

I couldn't find documentation of exactly what shaped SKBs we can
 get given, so patch #2 is slightly guesswork, but when I tested
 TSO over both underlay and (VxLAN) overlay, the checksums came
 out correctly, so at least in those cases the edits we're making
 must be the right ones.
Similarly, I'm not 100% sure I've correctly understood how FIXEDID
 and MANGLEID are supposed to work in patch #3.
====================

Link: https://lore.kernel.org/r/6e1ea05f-faeb-18df-91ef-572445691d89@solarflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:43:03 -07:00
Edward Cree
b61e8100dc sfc: advertise our vlan features
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:42:53 -07:00
Edward Cree
dbe2f251f9 sfc: only use fixed-id if the skb asks for it
AIUI, the NETIF_F_TSO_MANGLEID flag is a signal to the stack that a
 driver may _need_ to mangle IDs in order to do TSO, and conversely
 a signal from the stack that the driver is permitted to do so.
Since we support both fixed and incrementing IPIDs, we should rely
 on the SKB_GSO_FIXEDID flag on a per-skb basis, rather than using
 the MANGLEID feature to make all TSOs fixed-id.
Includes other minor cleanups of ef100_make_tso_desc() coding style.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:42:53 -07:00
Edward Cree
806f9f23b6 sfc: implement encap TSO on EF100
The NIC only needs to know where the headers it has to edit (TCP and
 inner and outer IPv4) are, which fits GSO_PARTIAL nicely.
It also supports non-PARTIAL offload of UDP tunnels, again just
 needing to be told the outer transport offset so that it can edit
 the UDP length field.
(It's not clear to me whether the stack will ever use the non-PARTIAL
 version with the netdev feature flags we're setting here.)

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:42:53 -07:00
Edward Cree
a7a375ca56 sfc: extend bitfield macros to 17 fields
We need EFX_POPULATE_OWORD_17 for an encap TSO descriptor on EF100.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:42:53 -07:00
Jakub Kicinski
dc956588d4 Merge branch 'net-ipa-minor-bug-fixes'
Alex Elder says:

====================
net: ipa: minor bug fixes

This series fixes several bugs.  They are minor, in that the code
currently works on supported platforms even without these patches
applied, but they're bugs nevertheless and should be fixed.

Version 2 improves the commit message for the fourth patch.  It also
fixes a bug in two spots in the last patch.  Both of these changes
were suggested by Willem de Bruijn.
====================

Link: https://lore.kernel.org/r/20201028194148.6659-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:20:18 -07:00
Alex Elder
4a0d7579d4 net: ipa: avoid going past end of resource group array
The minimum and maximum limits for resources assigned to a given
resource group are programmed in pairs, with the limits for two
groups set in a single register.

If the number of supported resource groups is odd, only half of the
register that defines these limits is valid for the last group; that
group has no second group in the pair.

Currently we ignore this constraint, and it turns out to be harmless,
but it is not guaranteed to be.  This patch addresses that, and adds
support for programming the 5th resource group's limits.

Rework how the resource group limit registers are programmed by
having a single function program all group pairs rather than having
one function program each pair.  Add the programming of the 4-5
resource group pair limits to this function.  If a resource group is
not supported, pass a null pointer to ipa_resource_config_common()
for that group and have that function write zeroes in that case.
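
The pairing scheme can be sketched as follows. The register layout (one
byte each for min and max, two groups per 32-bit value) and all names are
assumptions made for illustration, not the IPA register definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct limits {
    uint8_t min;
    uint8_t max;
};

/* Pack the limits of two groups; a NULL group contributes zeroes. */
static inline uint32_t pair_reg_value(const struct limits *x,
                                      const struct limits *y)
{
    uint32_t val = 0;

    if (x)
        val |= (uint32_t)x->min | ((uint32_t)x->max << 8);
    if (y)
        val |= ((uint32_t)y->min << 16) | ((uint32_t)y->max << 24);
    return val;
}

/* One function handles every pair; out[] needs (count + 1) / 2 entries. */
static inline void program_limits(uint32_t *out, const struct limits *groups,
                                  size_t count)
{
    size_t i;

    for (i = 0; i < count; i += 2) {
        /* With an odd count the last group has no partner. */
        const struct limits *y = i + 1 < count ? &groups[i + 1] : NULL;

        out[i / 2] = pair_reg_value(&groups[i], y);
    }
}
```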

Tested-by: Sujit Kautkar <sujitka@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:20:13 -07:00
Alex Elder
8c365f747f net: ipa: distinguish between resource group types
The number of resource groups supported by the hardware can be
different for source and destination resources.  Determine the
number supported for each using separate functions.  Make the
functions inline and move their definitions into "ipa_reg.h",
because they determine whether certain register definitions are
valid.  Pass just the IPA hardware version as argument.

IPA_RESOURCE_GROUP_COUNT represents the maximum number of resource
groups the driver supports for any hardware version.  Change that
symbol to be two separate constants, one for source and the other
for destination resource groups.  Rename them to end with "_MAX"
rather than "_COUNT", to reflect their true purpose.

Tested-by: Sujit Kautkar <sujitka@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:20:10 -07:00
Alex Elder
2d2653424c net: ipa: assign endpoint to a resource group
The IPA hardware manages various resources (e.g. descriptors)
internally to perform its functions.  The resources are grouped,
allowing different endpoints to use separate resource pools.  This
way one group of endpoints can be configured to operate unaffected
by the resource use of endpoints in a different group.

Endpoints should be assigned to a resource group, but we currently
don't do that.

Define a new resource_group field in the endpoint configuration
data, and use it to assign the proper resource group to use for
each AP endpoint.

Tested-by: Sujit Kautkar <sujitka@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:20:07 -07:00
Alex Elder
d773f404c8 net: ipa: fix resource group field mask definition
The mask for the RSRC_GRP field in the INIT_RSRC_GRP endpoint
initialization register is incorrectly defined for IPA v4.2 (where
it is only one bit wide).  So we need to fix this.

The fix is not straightforward, however.  Field masks are passed to
functions like u32_encode_bits(), and for that they must be constant.

To address this, we define a new inline function that returns the
*encoded* value to use for a given RSRC_GRP field, which depends on
the IPA version.  The caller can then use something like this, to
assign a given endpoint resource id 1:

    u32 offset = IPA_REG_ENDP_INIT_RSRC_GRP_N_OFFSET(endpoint_id);
    u32 val = rsrc_grp_encoded(ipa->version, 1);

    iowrite32(val, ipa->reg_virt + offset);
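
As a rough user-space sketch of why a function helps here: each switch
case can use its own compile-time-constant mask, while the choice between
masks is made at run time from the version.  The enum, the function name
and the two-bit width used for other versions are assumptions made for
this example; only the one-bit width for IPA v4.2 comes from the text
above.

```c
#include <stdint.h>

/* Hypothetical version enum for this sketch. */
enum hw_version {
	HW_VERSION_OTHER,
	HW_VERSION_4_2,
};

/* Each mask is a constant; the field is assumed to start at bit 0. */
#define RSRC_GRP_MASK_1BIT	0x1u	/* IPA v4.2: one bit wide */
#define RSRC_GRP_MASK_2BIT	0x3u	/* assumed width elsewhere */

/* Return the encoded register value for a resource group ID. */
static inline uint32_t rsrc_grp_encoded_sketch(enum hw_version version,
					       uint32_t rsrc_grp)
{
	switch (version) {
	case HW_VERSION_4_2:
		return rsrc_grp & RSRC_GRP_MASK_1BIT;
	default:
		return rsrc_grp & RSRC_GRP_MASK_2BIT;
	}
}
```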

The next patch requires this fix.

Tested-by: Sujit Kautkar <sujitka@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:20:03 -07:00
Alex Elder
279dc95574 net: ipa: assign proper packet context base
At the end of ipa_mem_setup() we write the local packet processing
context base register to tell it where the processing context memory
is.  But we are writing the wrong value.

The value written turns out to be the offset of the modem header
memory region (assigned earlier in the function).  Fix this bug.

Tested-by: Sujit Kautkar <sujitka@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:19:58 -07:00
Moritz Fischer
c1181f42ff net: dec: tulip: de2104x: Add shutdown handler to stop NIC
The driver does not implement a shutdown handler which leads to issues
when using kexec in certain scenarios. The NIC keeps on fetching
descriptors which gets flagged by the IOMMU with errors like this:

DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000
DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000
DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000
DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000
DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000

Signed-off-by: Moritz Fischer <mdf@kernel.org>
Link: https://lore.kernel.org/r/20201028172125.496942-1-mdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:14:38 -07:00
Robert Hancock
1887023a5e net: phy: marvell: add special handling of Finisar modules with 88E1111
The Finisar FCLF8520P2BTL 1000BaseT SFP module uses a Marvell 88E1111 PHY
with a modified PHY ID. Add support for this ID using the 88E1111
methods.

By default these modules do not have 1000BaseX auto-negotiation enabled,
which is not generally desirable with Linux networking drivers. Add
handling to enable 1000BaseX auto-negotiation when these modules are
used in 1000BaseX mode. Also, some special handling is required to ensure
that 1000BaseT auto-negotiation is enabled properly when desired.

Based on existing handling in the AMD xgbe driver and the information in
the Finisar FAQ:
https://www.finisar.com/sites/default/files/resources/an-2036_1000base-t_sfp_faqreve1.pdf

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20201028171540.1700032-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 17:11:44 -07:00
Jakub Kicinski
be25f43aed Merge branch 'sctp-implement-rfc6951-udp-encapsulation-of-sctp'
Xin Long says:

====================
sctp: Implement RFC6951: UDP Encapsulation of SCTP

Description From the RFC:

   The Main Reasons:

   o  To allow SCTP traffic to pass through legacy NATs, which do not
      provide native SCTP support as specified in [BEHAVE] and
      [NATSUPP].

   o  To allow SCTP to be implemented on hosts that do not provide
      direct access to the IP layer.  In particular, applications can
      use their own SCTP implementation if the operating system does not
      provide one.

   Implementation Notes:

   UDP-encapsulated SCTP is normally communicated between SCTP stacks
   using the IANA-assigned UDP port number 9899 (sctp-tunneling) on both
   ends.  There are circumstances where other ports may be used on
   either end, and it might be required to use ports other than the
   registered port.

   Each SCTP stack uses a single local UDP encapsulation port number as
   the destination port for all its incoming SCTP packets, this greatly
   simplifies implementation design.

   An SCTP implementation supporting UDP encapsulation MUST maintain a
   remote UDP encapsulation port number per destination address for each
   SCTP association.  Again, because the remote stack may be using ports
   other than the well-known port, each port may be different from each
   stack.  However, because of remapping of ports by NATs, the remote
   ports associated with different remote IP addresses may not be
   identical, even if they are associated with the same stack.

   Because the well-known port might not be used, implementations need
   to allow other port numbers to be specified as a local or remote UDP
   encapsulation port number through APIs.
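
A minimal user-space sketch of the encapsulation step described above,
not the code in this series: struct remote_addr_state and
sctp_udp_encap() are hypothetical names, and the UDP checksum is left at
zero here for brevity, which a real stack would normally compute.  It
shows one local port used as the source and a remote port kept per
destination address.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IANA-assigned port for sctp-tunneling, as named in the text above. */
#define SCTP_TUNNELING_PORT	9899

/* Standard 8-byte UDP header; all fields are in network byte order. */
struct udp_header {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* Hypothetical per-destination-address state (host byte order). */
struct remote_addr_state {
	uint16_t encap_port;
};

/*
 * Prepend a UDP header to an SCTP packet.  Returns the total length
 * written to out, or 0 if the result does not fit.
 */
static size_t sctp_udp_encap(uint8_t *out, size_t out_size,
			     const uint8_t *sctp_pkt, size_t sctp_len,
			     uint16_t local_port,
			     const struct remote_addr_state *remote)
{
	struct udp_header uh;
	size_t total = sizeof(uh) + sctp_len;

	if (total > out_size || total > UINT16_MAX)
		return 0;

	uh.source = htons(local_port);
	uh.dest = htons(remote->encap_port);	/* may differ per address */
	uh.len = htons((uint16_t)total);
	uh.check = 0;				/* not computed in this sketch */

	memcpy(out, &uh, sizeof(uh));
	memcpy(out + sizeof(uh), sctp_pkt, sctp_len);

	return total;
}
```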

Patches:

   This patchset is using the udp4/6 tunnel APIs to implement the UDP
   Encapsulation of SCTP with not much change in SCTP protocol stack
   and with all current SCTP features kept in Linux Kernel.

   1 - 4: Fix some UDP issues that may be triggered by SCTP over UDP.
   5 - 7: Process incoming UDP encapsulated packets and ICMP packets.
   8 -10: Remote encap port's update by sysctl, sockopt and packets.
   11-14: Process outgoing packets with UDP encapsulation and their GSO.
   15-16: Add the part from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
      17: Enable this feature.

Tests:

  - lksctp-tools/src/func_tests with UDP Encapsulation enabled/disabled:

      Both make v4test and v6test passed.

  - sctp-tests with UDP Encapsulation enabled/disabled:

      repeatability/procdumps/sctpdiag/gsomtuchange/extoverflow/
      sctphashtable passed. Others failed as expected due to those
      "iptables -p sctp" rules.

  - netperf on lo/netns/virtio_net, with gso enabled/disabled and
    with ip_checksum enabled/disabled, with UDP Encapsulation
    enabled/disabled:

      No clear performance drop.

v1->v2:
  - Fix some incorrect code in the patches 5,6,8,10,11,13,14,17, suggested
    by Marcelo.
  - Append two patches 15-16 to add the Additional Considerations for UDP
    Encapsulation of SCTP from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
v2->v3:
  - remove the cleanup code in patch 2, suggested by Willem.
  - remove the patch 3 and fix the checksum in the new patch 3 after
    talking with Paolo, Marcelo and Guillaume.
  - add 'select NET_UDP_TUNNEL' in patch 4 to solve a compiling error.
  - fix __be16 type cast warning in patch 8.
  - fix the wrong endian orders when setting values in 14,16.
v3->v4:
  - add entries in ip-sysctl.rst in patch 7,16, as Marcelo Suggested.
  - not create udp socks when udp_port is set to 0 in patch 16, as
    Marcelo noticed.
v4->v5:
  - improve the description for udp_port and encap_port entries in patch
    7, 16.
  - use 0 as the default udp_port.
====================

Link: https://lore.kernel.org/r/cover.1603955040.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:24:54 -07:00
Xin Long
046c052b47 sctp: enable udp tunneling socks
This patch is to enable udp tunneling socks by calling
sctp_udp_sock_start() in sctp_ctrlsock_init(), and
sctp_udp_sock_stop() in sctp_ctrlsock_exit().

Also add sysctl udp_port to allow changing the listening
sock's port by users.

With this patch, the whole sctp over udp feature can be
enabled and used.

v1->v2:
  - Also update ctl_sock udp_port in proc_sctp_do_udp_port()
    where netns udp_port gets changed.
v2->v3:
  - Call htons() when setting sk udp_port from netns udp_port.
v3->v4:
  - Not call sctp_udp_sock_start() when new_value is 0.
  - Add udp_port entry in ip-sysctl.rst.
v4->v5:
  - Not call sctp_udp_sock_start/stop() in sctp_ctrlsock_init/exit().
  - Improve the description of udp_port in ip-sysctl.rst.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:24:49 -07:00