Commit Graph

1017154 Commits

Author SHA1 Message Date
Xin Long
7307e4fa4d sctp: enable PLPMTUD when the transport is ready
sctp_transport_pl_reset() is called whenever any of these 3 members in
transport is changed:

  - probe_interval
  - param_flags & SPP_PMTUD_ENABLE
  - state == ACTIVE

If all are true, start the PLPMTUD when it's not yet started. If any of
these is false, stop the PLPMTUD when it's already running.

sctp_transport_pl_update() is called when the transport dst has changed.
It will restart the PLPMTUD probe. Again, the pathmtu won't change but
use the dst's mtu until the Search phase is done.

Note that after using PLPMTUD, the pathmtu is only initialized with the
dst mtu when the transport dst changes. At other time it is updated by
pl.pmtu. So sctp_transport_pmtu_check() will be called only when PLPMTUD
is disabled in sctp_packet_config().

After this patch, the PLPMTUD feature from RFC8899 will be activated
and can be used by users.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:52 -07:00
Xin Long
8369640831 sctp: do state transition when receiving an icmp TOOBIG packet
PLPMTUD will short-circuit the old process for icmp TOOBIG packets.
This part is described in rfc8899#section-4.6.2 (PL_PTB_SIZE =
PTB_SIZE - other_headers_len). Note that from rfc8899#section-5.2
State Machine, each case below is for some specific states only:

  a) PL_PTB_SIZE < MIN_PLPMTU || PL_PTB_SIZE >= PROBED_SIZE,
     discard it, for any state

  b) MIN_PLPMTU < PL_PTB_SIZE < BASE_PLPMTU,
     Base -> Error, for Base state

  c) BASE_PLPMTU <= PL_PTB_SIZE < PLPMTU,
     Search -> Base or Complete -> Base, for Search and Complete states.

  d) PLPMTU < PL_PTB_SIZE < PROBED_SIZE,
     set pl.probe_size to PL_PTB_SIZE then verify it, for Search state.

The most important one is case d), which will help find the optimal
fast during searching. Like when pathmtu = 1392 for SCTP over IPv4,
the search will be (20 is iphdr_len):

  1. probe with 1200 - 20
  2. probe with 1232 - 20
  3. probe with 1264 - 20
  ...
  7. probe with 1388 - 20
  8. probe with 1420 - 20

When sending the probe with 1420 - 20, TOOBIG may come with PL_PTB_SIZE =
1392 - 20. Then it matches case d), and saves some rounds to try with the
1392 - 20 probe. But of course, PLPMTUD doesn't trust TOOBIG packets, and
it will go back to the common searching once the probe with the new size
can't be verified.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:52 -07:00
Xin Long
b87641aff9 sctp: do state transition when a probe succeeds on HB ACK recv path
As described in rfc8899#section-5.2, when a probe succeeds, there might
be the following state transitions:

  - Base -> Search, occurs when probe succeeds with BASE_PLPMTU,
    pl.pmtu is not changing,
    pl.probe_size increases by SCTP_PL_BIG_STEP,

  - Error -> Search, occurs when probe succeeds with BASE_PLPMTU,
    pl.pmtu is changed from SCTP_MIN_PLPMTU to SCTP_BASE_PLPMTU,
    pl.probe_size increases by SCTP_PL_BIG_STEP.

  - Search -> Search Complete, occurs when probe succeeds with the probe
    size SCTP_MAX_PLPMTU less than pl.probe_high,
    pl.pmtu is not changing, but update *pathmtu* with it,
    pl.probe_size is set back to pl.pmtu to double check it.

  - Search Complete -> Search, occurs when probe succeeds with the probe
    size equal to pl.pmtu,
    pl.pmtu is not changing,
    pl.probe_size increases by SCTP_PL_MIN_STEP.

So search process can be described as:

 1. When it just enters 'Search' state, *pathmtu* is not updated with
    pl.pmtu, and probe_size increases by a big step (SCTP_PL_BIG_STEP)
    each round.

 2. Until pl.probe_high is set when a probe fails, and probe_size
    decreases back to pl.pmtu, as described in the last patch.

 3. When the probe with the new size succeeds, probe_size changes to
    increase by a small step (SCTP_PL_MIN_STEP) due to pl.probe_high
    is set.

 4. Until probe_size is next to pl.probe_high, the searching finishes and
    it goes to 'Complete' state and updates *pathmtu* with pl.pmtu, and
    then probe_size is set to pl.pmtu to confirm by once more probe.

 5. This probe occurs after "30 * probe_inteval", a much longer time than
    that in Search state. Once it is done it goes to 'Search' state again
    with probe_size increased by SCTP_PL_MIN_STEP.

As we can see above, during the searching, pl.pmtu changes while *pathmtu*
doesn't. *pathmtu* is only updated when the search finishes by which it
gets an optimal value for it. A big step is used at the beginning until
it gets close to the optimal value, then it changes to a small step until
it has this optimal value.

The small step is also used in 'Complete' until it goes to 'Search' state
again and the probe with 'pmtu + the small step' succeeds, which means a
higher size could be used. Then probe_size changes to increase by a big
step again until it gets close to the next optimal value.

Note that anytime when black hole is detected, it goes directly to 'Base'
state with pl.pmtu set to SCTP_BASE_PLPMTU, as described in the last patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:52 -07:00
Xin Long
1dc68c1945 sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send path
The state transition is described in rfc8899#section-5.2,
PROBE_COUNT == MAX_PROBES means the probe fails for MAX times, and the
state transition includes:

  - Base -> Error, occurs when BASE_PLPMTU Confirmation Fails,
    pl.pmtu is set to SCTP_MIN_PLPMTU,
    probe_size is still SCTP_BASE_PLPMTU;

  - Search -> Base, occurs when Black Hole Detected,
    pl.pmtu is set to SCTP_BASE_PLPMTU,
    probe_size is set back to SCTP_BASE_PLPMTU;

  - Search Complete -> Base, occurs when Black Hole Detected
    pl.pmtu is set to SCTP_BASE_PLPMTU,
    probe_size is set back to SCTP_BASE_PLPMTU;

Note a black hole is encountered when a sender is unaware that packets
are not being delivered to the destination endpoint. So it includes the
probe failures with equal probe_size to pl.pmtu, and definitely not
include that with greater probe_size than pl.pmtu. The later one is the
normal probe failure where probe_size should decrease back to pl.pmtu
and pl.probe_high is set.  pl.probe_high would be used on HB ACK recv
path in the next patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:52 -07:00
Xin Long
fe59379b9a sctp: do the basic send and recv for PLPMTUD probe
This patch does exactly what rfc8899#section-6.2.1.2 says:

   The SCTP sender needs to be able to determine the total size of a
   probe packet.  The HEARTBEAT chunk could carry a Heartbeat
   Information parameter that includes, besides the information
   suggested in [RFC4960], the probe size to help an implementation
   associate a HEARTBEAT ACK with the size of probe that was sent.  The
   sender could also use other methods, such as sending a nonce and
   verifying the information returned also contains the corresponding
   nonce.  The length of the PAD chunk is computed by reducing the
   probing size by the size of the SCTP common header and the HEARTBEAT
   chunk.

Note that HB ACK chunk will carry back whatever HB chunk carried, including
the probe_size we put it in; We also check hbinfo->probe_size in the HB ACK
against link->pl.probe_size to validate this HB ACK chunk.

v1->v2:
  - Remove the unused 'sp' and add static for sctp_packet_bundle_pad().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:52 -07:00
Xin Long
92548ec2f1 sctp: add the probe timer in transport for PLPMTUD
There are 3 timers described in rfc8899#section-5.1.1:

  PROBE_TIMER, PMTU_RAISE_TIMER, CONFIRMATION_TIMER

This patches adds a 'probe_timer' in transport, and it works as either
PROBE_TIMER or PMTU_RAISE_TIMER. At most time, it works as PROBE_TIMER
and expires every a 'probe_interval' time to send the HB probe packet.
When transport pl enters COMPLETE state, it works as PMTU_RAISE_TIMER
and expires in 'probe_interval * 30' time to go back to SEARCH state
and do searching again.

SCTP HB is an acknowledged packet, CONFIRMATION_TIMER is not needed.

The timer will start when transport pl enters BASE state and stop
when it enters DISABLED state.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:52 -07:00
Xin Long
d9e2e410ae sctp: add the constants/variables and states and some APIs for transport
These are 4 constants described in rfc8899#section-5.1.2:

  MAX_PROBES, MIN_PLPMTU, MAX_PLPMTU, BASE_PLPMTU;

And 2 variables described in rfc8899#section-5.1.3:

  PROBED_SIZE, PROBE_COUNT;

And 5 states described in rfc8899#section-5.2:

  DISABLED, BASE, SEARCH, SEARCH_COMPLETE, ERROR;

And these 4 APIs are used to reset/update PLPMTUD, check if PLPMTUD is
enabled, and calculate the additional headers length for a transport.

Note the member 'probe_high' in transport will be set to the probe
size when a probe fails with this probe size in the next patches.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:52 -07:00
Xin Long
3190b649b4 sctp: add SCTP_PLPMTUD_PROBE_INTERVAL sockopt for sock/asoc/transport
With this socket option, users can change probe_interval for
a transport, asoc or sock after it's created.

Note that if the change is for an asoc, also apply the change
to each transport in this asoc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:51 -07:00
Xin Long
d1e462a7a5 sctp: add probe_interval in sysctl and sock/asoc/transport
PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'.
'n' is the interval for PLPMTUD probe timer in milliseconds, and it
can't be less than 5000 if it's not 0.

All asoc/transport's PLPMTUD in a new socket will be enabled by default.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:51 -07:00
Xin Long
745a32117b sctp: add pad chunk and its make function and event table
This chunk is defined in rfc4820#section-3, and used to pad an
SCTP packet. The receiver must discard this chunk and continue
processing the rest of the chunks in the packet.

Add it now, as it will be bundled with a heartbeat chunk to probe
pmtu in the following patches.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:51 -07:00
Lorenzo Bianconi
aff0824dc4 net: marvell: return csum computation result from mvneta_rx_csum/mvpp2_rx_csum
This is a preliminary patch to add hw csum hint support to
mvneta/mvpp2 xdp implementation

Tested-by: Matteo Croce <mcroce@linux.microsoft.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:55:05 -07:00
David S. Miller
f84974e75f Merge branch 'tc-testing-dnat-tuple-collision'
Marcelo Ricardo Leitner says:

====================
tc-testing: add test for ct DNAT tuple collision

That was fixed in 13c62f5371 ("net/sched: act_ct: handle DNAT tuple
collision").

For that, it requires that tdc is able to send diverse packets with
scapy, which is then done on the 2nd patch of this series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:52:40 -07:00
Marcelo Ricardo Leitner
e469056413 tc-testing: add test for ct DNAT tuple collision
When this test fails, /proc/net/nf_conntrack gets only 1 entry:
ipv4     2 tcp      6 119 SYN_SENT src=10.0.0.10 dst=10.0.0.10 sport=5000 dport=10 [UNREPLIED] src=20.0.0.1 dst=10.0.0.10 sport=10 dport=5000 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2

When it works, it gets 2 entries:
ipv4     2 tcp      6 119 SYN_SENT src=10.0.0.10 dst=10.0.0.20 sport=5000 dport=10 [UNREPLIED] src=20.0.0.1 dst=10.0.0.10 sport=10 dport=58203 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
ipv4     2 tcp      6 119 SYN_SENT src=10.0.0.10 dst=10.0.0.10 sport=5000 dport=10 [UNREPLIED] src=20.0.0.1 dst=10.0.0.10 sport=10 dport=5000 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2

The missing entry is because the 2nd packet hits a tuple collusion and the
conntrack entry doesn't get allocated.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:52:39 -07:00
Marcelo Ricardo Leitner
11f04de902 tc-testing: add support for sending various scapy packets
It can be worth sending different scapy packets on a given test, as in the
last patch of this series. For that, lets listify the scapy attribute and
simply iterate over it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:52:39 -07:00
Marcelo Ricardo Leitner
b4fd096cbb tc-testing: fix list handling
python lists don't have an 'add' method, but 'append'.

Fixes: 14e5175e9e ("tc-testing: introduce scapyPlugin for basic traffic")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:52:39 -07:00
Loic Poulain
1b134d8d75 MAINTAINERS: network: add entry for WWAN
This patch adds maintainer info for drivers/net/wwan subdir, including
WWAN core and drivers. Adding Sergey and myself as maintainers and
Johannes as reviewer.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:48:16 -07:00
Aaron Conole
c4ab7b56be openvswitch: add trace points
This makes openvswitch module use the event tracing framework
to log the upcall interface and action execution pipeline.  When
using openvswitch as the packet forwarding engine, some types of
debugging are made possible simply by using the ovs-vswitchd's
ofproto/trace command.  However, such a command has some
limitations:

  1. When trying to trace packets that go through the CT action,
     the state of the packet can't be determined, and probably
     would be potentially wrong.

  2. Deducing problem packets can sometimes be difficult as well
     even if many of the flows are known

  3. It's possible to use the openvswitch module even without
     the ovs-vswitchd (although, not common use).

Introduce the event tracing points here to make it possible for
working through these problems in kernel space.  The style is
copied from the mac80211 driver-trace / trace code for
consistency - this creates some checkpatch splats, but the
official 'guide' for adding tracepoints, as well as the existing
examples all add the same splats so it seems acceptable.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:47:32 -07:00
Dan Carpenter
b0e03950dd stmmac: dwmac-loongson: fix uninitialized variable in loongson_dwmac_probe()
The "mdio" variable is never set to false.  Also it should be a bool
type instead of int.

Fixes: 30bba69d7d ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:44:50 -07:00
David S. Miller
a4bdf76f54 Merge branch 'ethtool-eeprom'
Ido Schimmel says:

====================
ethtool: Module EEPROM API improvements

This patchset contains various improvements to recently introduced
module EEPROM netlink API. Noticed these while adding module EEPROM
write support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:55 -07:00
Ido Schimmel
88f9a87afe ethtool: Validate module EEPROM offset as part of policy
Validate the offset to read from module EEPROM as part of the netlink
policy and remove the corresponding check from the code.

This also makes it possible to query the offset range from user space:

 $ genl ctrl policy name ethtool
 ...
 ID: 0x14  policy[32]:attr[2]: type=U32 range:[0,255]
 ...

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
Ido Schimmel
0dc7dd02ba ethtool: Validate module EEPROM length as part of policy
Validate the number of bytes to read from the module EEPROM as part of
the netlink policy and remove the corresponding check from the code.

This also makes it possible to query the length range from user space:

 $ genl ctrl policy name ethtool
 ...
 ID: 0x14  policy[32]:attr[3]: type=U32 range:[1,128]
 ...

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
Ido Schimmel
b8c48be23c ethtool: Use kernel data types for internal EEPROM struct
The struct is not visible to user space and therefore should not use the
user visible data types.

Instead, use internal data types like other structures in the file.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
Ido Schimmel
37a025e839 ethtool: Document behavior when module EEPROM bank attribute is omitted
The kernel assumes bank 0 when 'ETHTOOL_MSG_MODULE_EEPROM_GET' is sent
without 'ETHTOOL_A_MODULE_EEPROM_BANK'.

Document it as part of the interface documentation.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
Ido Schimmel
f5fe211d13 ethtool: Decrease size of module EEPROM get policy array
The 'ETHTOOL_A_MODULE_EEPROM_DATA' attribute is not part of the get
request.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
Ido Schimmel
913d026fbf ethtool: Document correct attribute type
'ETHTOOL_A_MODULE_EEPROM_DATA' is a binary attribute, not a nested one.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
Ido Schimmel
78c57f22e3 ethtool: Use correct command name in title
The command is called 'ETHTOOL_MSG_MODULE_EEPROM_GET', not
'ETHTOOL_MSG_MODULE_EEPROM'.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:40:54 -07:00
gushengxian
98534fce52 bridge: cfm: remove redundant return
Return statements are not needed in Void function.

Signed-off-by: gushengxian <gushengxian@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:35:15 -07:00
Kees Cook
f2fcffe392 hv_netvsc: Avoid field-overflowing memcpy()
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.

Add flexible array to represent start of buf_info, improving readability
and avoid future warning where memcpy() thinks it is writing past the
end of the structure.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:20:51 -07:00
Florian Fainelli
64a81b2448 net: dsa: b53: Create default VLAN entry explicitly
In case CONFIG_VLAN_8021Q is not set, there will be no call down to the
b53 driver to ensure that the default PVID VLAN entry will be configured
with the appropriate untagged attribute towards the CPU port. We were
implicitly relying on dsa_slave_vlan_rx_add_vid() to do that for us,
instead make it explicit.

Reported-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:19:39 -07:00
Kees Cook
ee8e7622e0 octeontx2-af: Avoid field-overflowing memcpy()
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.

To avoid having memcpy() think a u64 "prof" is being written beyond,
adjust the prof member type by adding struct nix_bandprof_s to the union
to match the other structs. This silences the following future warning:

In file included from ./include/linux/string.h:253,
                 from ./include/linux/bitmap.h:10,
                 from ./include/linux/cpumask.h:12,
                 from ./arch/x86/include/asm/cpumask.h:5,
                 from ./arch/x86/include/asm/msr.h:11,
                 from ./arch/x86/include/asm/processor.h:22,
                 from ./arch/x86/include/asm/timex.h:5,
                 from ./include/linux/timex.h:65,
                 from ./include/linux/time32.h:13,
                 from ./include/linux/time.h:60,
                 from ./include/linux/stat.h:19,
                 from ./include/linux/module.h:13,
                 from drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:11:
In function '__fortify_memcpy_chk',
    inlined from '__fortify_memcpy' at ./include/linux/fortify-string.h:310:2,
    inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:910:5:
./include/linux/fortify-string.h:268:4: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); please use struct_group() [-Wattribute-warning]
  268 |    __write_overflow_field();
      |    ^~~~~~~~~~~~~~~~~~~~~~~~

drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:
...
                        else if (req->ctype == NIX_AQ_CTYPE_BANDPROF)
                                memcpy(&rsp->prof, ctx,
                                       sizeof(struct nix_bandprof_s));
...

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Subbaraya Sundeep<sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:19:01 -07:00
David S. Miller
78c235f9ea Merge branch 'wwan-link-creation-improvements'
Sergey Ryazanov says:

====================
net: WWAN link creation improvements

This series is intended to make the WWAN network links management easier
for WWAN device drivers.

The series begins with adding support for network links creation to the
WWAN HW simulator to facilitate code testing. Then there are a couple of
changes that prepe the WWAN core code for further modifications. The
following patches (4-6) simplify driver unregistering procedures by
performing the created links cleanup in the WWAN core. 7th patch is to
avoid the odd hold of a driver module. Next patches (8th and 9th) make
it easier for drivers to create a network interface for a default data
channel. Finally, 10th patch adds support for reporting of data link
(aka channel aka context) id to make user aware which network
interface is bound to which WWAN device data channel.

All core changes have been tested with the HW simulator. The MHI and
IOSM drivers were only compile tested as I have no access to this
hardware. So the coresponding patches require ACK from the driver
authors.

Changelog:
  v1 -> v2:
    * rebased on top of latest net-next
    * patch that reworks the creation of mhi_net default netdev was
      dropped; as Loic explained, this network device has different
      purpose depending on a driver mode; Loic has a plan to rework the
      mhi_net driver, so we will defer the default netdev creation
      reworkings
    * add a new patch that creates a default network interface for IOSM
      modems
    * 7th, 8th, 10th patches have a minor updates (see the patches for
      details)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:17 -07:00
Sergey Ryazanov
6994092403 wwan: core: add WWAN common private data for netdev
The WWAN core not only multiplex the netdev configuration data, but
process it too, and needs some space to store its private data
associated with the netdev. Add a structure to keep common WWAN core
data. The structure will be stored inside the netdev private data before
WWAN driver private data and have a field to make it easier to access
the driver data. Also add a helper function that simplifies drivers
access to their data.

At the moment we use the common WWAN private data to store the WWAN data
link (channel) id at the time the link is created, and report it back to
user using the .fill_info() RTNL callback. This should help the user to
be aware which network interface is bound to which WWAN device data
channel.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
CC: M Chetan Kumar <m.chetan.kumar@intel.com>
CC: Intel Corporation <linuxwwan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:17 -07:00
Sergey Ryazanov
83068395bb net: iosm: create default link via WWAN core
Utilize the just introduced WWAN core feature to create a default netdev
for the default data (IP MUX) channel.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
CC: M Chetan Kumar <m.chetan.kumar@intel.com>
CC: Intel Corporation <linuxwwan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:17 -07:00
Sergey Ryazanov
ca374290aa wwan: core: support default netdev creation
Most, if not each WWAN device driver will create a netdev for the
default data channel. Therefore, add an option for the WWAN netdev ops
registration function to create a default netdev for the WWAN device.

A WWAN device driver should pass a default data channel link id to the
ops registering function to request the creation of a default netdev, or
a special value WWAN_NO_DEFAULT_LINK to inform the WWAN core that the
default netdev should not be created.

For now, only wwan_hwsim utilize the default link creation option. Other
drivers will be reworked next.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
CC: M Chetan Kumar <m.chetan.kumar@intel.com>
CC: Intel Corporation <linuxwwan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:16 -07:00
Sergey Ryazanov
9f0248ea47 wwan: core: no more hold netdev ops owning module
The WWAN netdev ops owner holding was used to protect from the
unexpected memory disappear. This approach causes a dependency cycle
(driver -> core -> driver) and effectively prevents a WWAN driver
unloading. E.g. WWAN hwsim could not be unloaded until all simulated
devices are removed:

~# modprobe wwan_hwsim devices=2
~# lsmod | grep wwan
wwan_hwsim             16384  2
wwan                   20480  1 wwan_hwsim
~# rmmod wwan_hwsim
rmmod: ERROR: Module wwan_hwsim is in use
~# echo > /sys/kernel/debug/wwan_hwsim/hwsim0/destroy
~# echo > /sys/kernel/debug/wwan_hwsim/hwsim1/destroy
~# lsmod | grep wwan
wwan_hwsim             16384  0
wwan                   20480  1 wwan_hwsim
~# rmmod wwan_hwsim

For a real device driver this will cause an inability to unload module
until a served device is physically detached.

Since the last commit we are removing all child netdev(s) when a driver
unregister the netdev ops. This allows us to permit the driver
unloading, since any sane driver will call ops unregistering on a device
deinitialization. So, remove the holding of an ops owner to make it
easier to unload a driver module. The owner field has also beed removed
from the ops structure as there are no more users of this field.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:16 -07:00
Sergey Ryazanov
322a0ba99c net: iosm: drop custom netdev(s) removing
Since the last commit, the WWAN core will remove all our network
interfaces for us at the time of the WWAN netdev ops unregistering.
Therefore, we can safely drop the custom code that cleans the list of
created netdevs. Anyway it no longer removes any netdev, since all
netdevs were removed earlier in the wwan_unregister_ops() call.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com>
CC: M Chetan Kumar <m.chetan.kumar@intel.com>
CC: Intel Corporation <linuxwwan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:16 -07:00
Sergey Ryazanov
2f75238014 wwan: core: remove all netdevs on ops unregistering
We use the ops owner module hold to protect against ops memory
disappearing. But this approach does not protect us from a driver that
unregisters ops but forgets to remove netdev(s) that were created using
this ops. In such case, we are left with netdev(s), which can not be
removed since ops is gone. Moreover, batch netdevs removing on
deinitialization is a desireable option for WWAN drivers as it is a
quite common task.

Implement deletion of all created links on WWAN netdev ops unregistering
in the same way that RTNL removes all links on RTNL ops unregistering.
Simply remove all child netdevs of a device whose WWAN netdev ops is
unregistering. This way we protecting the kernel from buggy drivers and
make it easier to write a driver deinitialization code.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:16 -07:00
Sergey Ryazanov
f492fccf3d wwan: core: multiple netdevs deletion support
Use unregister_netdevice_queue() instead of simple
unregister_netdevice() if the WWAN netdev ops does not provide a dellink
callback. This will help to accelerate deletion of multiple netdevs.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:16 -07:00
Sergey Ryazanov
58c3b421c6 wwan: core: require WWAN netdev setup callback existence
The setup callback will be unconditionally passed to the
alloc_netdev_mqs(), where the NULL pointer dereference will cause the
kernel panic. So refuse to register WWAN netdev ops with warning
generation if the setup callback is not provided.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:16 -07:00
Sergey Ryazanov
355a4e7e0a wwan: core: relocate ops registering code
It is unlikely that RTNL callbacks will call WWAN ops (un-)register
functions, but it is highly likely that the ops (un-)register functions
will use RTNL link create/destroy handlers. So move the WWAN network
interface ops (un-)register functions below the RTNL callbacks to be
able to call them without forward declarations.

No functional changes, just code relocation.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:16 -07:00
Sergey Ryazanov
f842f48891 wwan_hwsim: support network interface creation
Add support for networking interface creation via the WWAN core by
registering the WWAN netdev creation ops for each simulated WWAN device.
Implemented minimalistic netdev support where the xmit callback just
consumes all egress skbs.

This should help with WWAN network interfaces creation testing.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:01:16 -07:00
David S. Miller
1a77de09b7 Merge branch 'mptcp-optimizations'
Mat Martineau says:

====================
mptcp: A few optimizations

Here is a set of patches that we've accumulated and tested in the MPTCP
tree.

Patch 1 removes the MPTCP-level tx skb cache that added complexity but
did not provide a meaningful benefit.

Patch 2 uses the fast socket lock in more places.

Patch 3 improves handling of a data-ready flag.

Patch 4 deletes an unnecessary and racy connection state check.

Patch 5 adds a MIB counter for one type of invalid MPTCP header.

Patch 6 improves self test failure output.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:57:45 -07:00
Matthieu Baerts
a4debc4772 selftests: mptcp: display proper reason to abort tests
Without this modification, we were often displaying this error messages:

  FAIL: Could not even run loopback test

But $ret could have been set to a non 0 value in many different cases:

- net.mptcp.enabled=0 is not working as expected
- setsockopt(..., TCP_ULP, "mptcp", ...) is allowed
- ping between each netns are failing
- tests between ns1 as a receiver and ns>1 are failing
- other tests not involving ns1 as a receiver are failing

So not only for the loopback test.

Now a clearer message, including the time it took to run all tests, is
displayed.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:57:45 -07:00
Paolo Abeni
06285da96a mptcp: add MIB counter for invalid mapping
Account this exceptional events for better introspection.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:57:45 -07:00
Paolo Abeni
8cfc47fc2e mptcp: drop redundant test in move_skbs_to_msk()
Currently we check the msk state to avoid enqueuing new
skbs at msk shutdown time.

Such test is racy - as we can't acquire the msk socket lock -
and useless, as the caller already checked the subflow
field 'disposable', covering the same scenario in a race
free manner - read and updated under the ssk socket lock.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:57:45 -07:00
Paolo Abeni
3c90e377a1 mptcp: don't clear MPTCP_DATA_READY in sk_wait_event()
If we don't flush entirely the receive queue, we need set
again such bit later. We can simply avoid clearing it.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:57:45 -07:00
Paolo Abeni
75e908c336 mptcp: use fast lock for subflows when possible
There are a bunch of callsite where the ssk socket
lock is acquired using the full-blown version eligible for
the fast variant. Let's move to the latter.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:57:45 -07:00
Paolo Abeni
8ce568ed06 mptcp: drop tx skb cache
The mentioned cache was introduced to reduce the number of skb
allocation in atomic context, but the required complexity is
excessive.

This change remove the mentioned cache.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:57:45 -07:00
David S. Miller
070258effa Merge branch 'marvell-mdio-ACPI'
Marcin Wojtas says:

====================
ACPI MDIO support for Marvell controllers

The third version of the patchset main change is
dropping a clock handling optimisation patch
for mvmdio driver. Other than that it sets
explicit dependency on FWNODE_MDIO for CONFIG_FSL_XGMAC_MDIO
and applies minor cosmetic improvements (please see the
'Changelog' below).

The firmware ACPI description is exposed in the public github branch:
https://github.com/semihalf-wojtas-marcin/edk2-platforms/commits/acpi-mdio-r20210613
There is also MacchiatoBin firmware binary available for testing:
https://drive.google.com/file/d/1eigP_aeM4wYQpEaLAlQzs3IN_w1-kQr0

I'm looking forward to the comments or remarks.

Best regards,
Marcin

Changelog:
v2->v3
* Rebase on top of net-next/master.
* Drop "net: mvmdio: simplify clock handling" patch.
* 1/6 - fix code block comments.
* 2/6 - unchanged
* 3/6 - add "depends on FWNODE_MDIO" for CONFIG_FSL_XGMAC_MDIO
* 4/6 - drop mention about the clocks from the commit message.
* 5/6 - unchanged
* 6/6 - add Andrew's RB.

v1->v2
* 1/7 - new patch
* 2/7 - new patch
* 3/7 - new patch
* 4/7 - new patch
* 5/7 - remove unnecessary `if (has_acpi_companion())` and rebase onto
        the new clock handling
* 6/7 - remove deprecated comment
* 7/7 - no changes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:54:55 -07:00
Marcin Wojtas
8d909440ab net: mvpp2: remove unused 'has_phy' field
The 'has_phy' field from struct mvpp2_port is no longer used.
Remove it.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:54:55 -07:00