Commit Graph

908231 Commits

Author SHA1 Message Date
Michal Kubecek
217275453b ethtool: provide coalescing parameters with COALESCE_GET request
Implement COALESCE_GET request to get coalescing parameters of a network
device. These are traditionally available via ETHTOOL_GCOALESCE ioctl
request. This commit adds only support for device coalescing parameters,
not per queue coalescing parameters.

Omit attributes with zero values unless they are declared as supported
(i.e. the corresponding bit in ethtool_ops::supported_coalesce_params is
set).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
Michal Kubecek
b51fb7711a ethtool: fix reference leak in ethnl_set_privflags()
Andrew noticed that some handlers for *_SET commands leak a netdev
reference if required ethtool_ops callbacks do not exist. One of them is
ethnl_set_privflags(), a simple reproducer would be e.g.

  ip link add veth1 type veth peer name veth2
  ethtool --set-priv-flags veth1 foo on
  ip link del veth1

Make sure dev_put() is called when ethtool_ops check fails.

Fixes: f265d79959 ("ethtool: set device private flags with PRIVFLAGS_SET request")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:32:36 -07:00
David S. Miller
96376cad35 Merge branch 'ipv6-add-rpl-source-routing'
Alexander Aring says:

====================
net: ipv6: add rpl source routing

This patch series will add handling for RPL source routing handling
and insertion (implement as lwtunnel)! I did an example prototype
implementation in rpld for using this implementation in non-storing mode:

https://github.com/linux-wpan/rpld/tree/nonstoring_mode

I will also present a talk at netdev about it:

https://netdevconf.info/0x14/session.html?talk-extend-segment-routing-for-RPL

In receive handling I add handling for IPIP encapsulation as RFC6554
describes it as possible. For reasons I didn't implemented it yet for
generating such packets because I am not really sure how/when this
should happen. So far I understand there exists a draft yet which
describes the cases (inclusive a Hop-by-Hop option which we also not
support yet).

https://tools.ietf.org/html/draft-ietf-roll-useofrplinfo-35

This is just the beginning to start implementation everything for yet,
step by step. It works for my use cases yet to have it running on a
6LOWPAN _only_ network.

I have some patches for iproute2 as well.

A sidenote: I check on local addresses if they are part of segment
routes, this is just to avoid stupid settings. A use can add addresses
afterwards what I cannot control anymore but then it's users fault to
make such thing. The receive handling checks for this as well which is
required by RFC6554, so the next hops or when it comes back should drop
it anyway.

To make this possible I added functionality to pass the net structure to
the build_state of lwtunnel (I hope I caught all lwtunnels).

Another sidenote: I set the headroom value to 0 as I figured out it will
break on interfaces with IPv6 min mtu if set to non zero for tunnels on
L3.

- Alex

changes since v3:
 - use parse_nested which isn't deprecated - Thanks David Ahern
 - change to return -1 instead errno in exthdr handling to unify
   error code
 - change function name from ipv6_rpl_srh_decompress_size to
   ipv6_rpl_srh_size

changes since v2:
 - add additional segdata length in lwtunnel build_state
 - fix build_state patch by not catching one inline noop function
   if LWTUNNEL is disabled

Alexander Aring (5):
  include: uapi: linux: add rpl sr header definition
  addrconf: add functionality to check on rpl requirements
  net: ipv6: add support for rpl sr exthdr
  net: add net available in build_state
  net: ipv6: add rpl sr tunnel
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
a7a29f9c36 net: ipv6: add rpl sr tunnel
This patch adds functionality to configure routes for RPL source routing
functionality. There is no IPIP functionality yet implemented which can
be added later when the cases when to use IPv6 encapuslation comes more
clear.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
faee676944 net: add net available in build_state
The build_state callback of lwtunnel doesn't contain the net namespace
structure yet. This patch will add it so we can check on specific
address configuration at creation time of rpl source routes.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
8610c7c6e3 net: ipv6: add support for rpl sr exthdr
This patch adds rpl source routing receive handling. Everything works
only if sysconf "rpl_seg_enabled" and source routing is enabled. Mostly
the same behaviour as IPv6 segmentation routing. To handle compression
and uncompression a rpl.c file is created which contains the necessary
functionality. The receive handling will also care about IPv6
encapsulated so far it's specified as possible nexthdr in RFC 6554.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
f37c605936 addrconf: add functionality to check on rpl requirements
This patch adds a functionality to addrconf to check on a specific RPL
address configuration. According to RFC 6554:

To detect loops in the SRH, a router MUST determine if the SRH
includes multiple addresses assigned to any interface on that
router. If such addresses appear more than once and are separated by
at least one address not assigned to that router.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Alexander Aring
cfa933d938 include: uapi: linux: add rpl sr header definition
This patch adds a uapi header for rpl struct definition. The segments
data can be accessed over rpl_segaddr or rpl_segdata macros. In case of
compri and compre is zero the segment data is not compressed and can be
accessed by rpl_segaddr. In the other case the compressed data can be
accessed by rpl_segdata and interpreted as byte array.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
William Dauchy
25629fdaff net, ip_tunnel: fix interface lookup with no key
when creating a new ipip interface with no local/remote configuration,
the lookup is done with TUNNEL_NO_KEY flag, making it impossible to
match the new interface (only possible match being fallback or metada
case interface); e.g: `ip link add tunl1 type ipip dev eth0`

To fix this case, adding a flag check before the key comparison so we
permit to match an interface with no local/remote config; it also avoids
breaking possible userland tools relying on TUNNEL_NO_KEY flag and
uninitialised key.

context being on my side, I'm creating an extra ipip interface attached
to the physical one, and moving it to a dedicated namespace.

Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:16:27 -07:00
David S. Miller
c189b5483c Merge branch 'mptcp-multiple-subflows-path-management'
Mat Martineau says:

====================
Multipath TCP part 3: Multiple subflows and path management

v2 -> v3: Remove 'inline' in .c files, fix uapi bit macros, and rebase.

v1 -> v2: Rebase on current net-next, fix for netlink limit setting,
and update .gitignore for selftest.

This patch set allows more than one TCP subflow to be established and
used for a multipath TCP connection. Subflows are added to an existing
connection using the MP_JOIN option during the 3-way handshake. With
multiple TCP subflows available, sent data is now stored in the MPTCP
socket so it may be retransmitted on any TCP subflow if there is no
DATA_ACK before a timeout. If an MPTCP-level timeout occurs, data is
retransmitted using an available subflow. Storing this sent data
requires the addition of memory accounting at the MPTCP level, which was
previously delegated to the single subflow. Incoming DATA_ACKs now free
data from the MPTCP-level retransmit buffer.

IP addresses available for new subflow connections can now be advertised
and received with the ADD_ADDR option, and the corresponding REMOVE_ADDR
option likewise advertises that an address is no longer available.

The MPTCP path manager netlink interface has commands to set in-kernel
limits for the number of concurrent subflows and control the
advertisement of IP addresses between peers.

To track and debug MPTCP connections there are new MPTCP MIB counters,
and subflow context can be requested using inet_diag. The MPTCP
self-tests now validate multiple-subflow operation and the netlink path
manager interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:15:17 -07:00
Paolo Abeni
b08fbf2410 selftests: add test-cases for MPTCP MP_JOIN
Use the pm netlink to configure the creation of several
subflows, and verify that via MIB counters.

Update the mptcp_connect program to allow reliable MP_JOIN
handshake even on small data file

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Paolo Abeni
eedbc68532 selftests: add PM netlink functional tests
This introduces basic self-tests for the PM netlink,
checking the basic APIs and possible exceptional
values.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Paolo Abeni
01cacb00b3 mptcp: add netlink-based PM
Expose a new netlink family to userspace to control the PM, setting:

 - list of local addresses to be signalled.
 - list of local addresses used to created subflows.
 - maximum number of add_addr option to react

When the msk is fully established, the PM netlink attempts to
announce the 'signal' list via the ADD_ADDR option. Since we
currently lack the ADD_ADDR echo (and related event) only the
first addr is sent.

After exhausting the 'announce' list, the PM tries to create
subflow for each addr in 'local' list, waiting for each
connection to be completed before attempting the next one.

Idea is to add an additional PM hook for ADD_ADDR echo, to allow
the PM netlink announcing multiple addresses, in sequence.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Florian Westphal
fc518953bc mptcp: add and use MIB counter infrastructure
Exported via same /proc file as the Linux TCP MIB counters, so "netstat -s"
or "nstat" will show them automatically.

The MPTCP MIB counters are allocated in a distinct pcpu area in order to
avoid bloating/wasting TCP pcpu memory.

Counters are allocated once the first MPTCP socket is created in a
network namespace and free'd on exit.

If no sockets have been allocated, all-zero mptcp counters are shown.

The MIB counter list is taken from the multipath-tcp.org kernel, but
only a few counters have been picked up so far.  The counter list can
be increased at any time later on.

v2 -> v3:
 - remove 'inline' in foo.c files (David S. Miller)

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Davide Caratti
5147dfb508 mptcp: allow dumping subflow context to userspace
add ulp-specific diagnostic functions, so that subflow information can be
dumped to userspace programs like 'ss'.

v2 -> v3:
- uapi: use bit macros appropriate for userspace

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Paolo Abeni
3b1d6210a9 mptcp: implement and use MPTCP-level retransmission
On timeout event, schedule a work queue to do the retransmission.
Retransmission code closely resembles the sendmsg() implementation and
re-uses mptcp_sendmsg_frag, providing a dummy msghdr - for flags'
sake - and peeking the relevant dfrag from the rtx head.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Paolo Abeni
3f8e0aae17 mptcp: rework mptcp_sendmsg_frag to accept optional dfrag
This will simplify mptcp-level retransmission implementation
in the next patch. If dfrag is provided by the caller, skip
kernel space memory allocation and use data and metadata
provided by the dfrag itself.

Because a peer could ack data at TCP level but refrain from
sending mptcp-level ACKs, we could grow the mptcp socket
backlog indefinitely.

We should thus block mptcp_sendmsg until the peer has acked some of the
sent data.

In order to be able to do so, increment the mptcp socket wmem_queued
counter on memory allocation and decrement it when releasing the memory
on mptcp-level ack reception.

Because TCP performns sndbuf auto-tuning up to tcp_wmem_max[2], make
this the mptcp sk_sndbuf limit.

In the future we could add experiment with autotuning as TCP does in
tcp_sndbuf_expand().

v2 -> v3:
 - remove 'inline' in foo.c files (David S. Miller)

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Florian Westphal
7948f6cc99 mptcp: allow partial cleaning of rtx head dfrag
After adding wmem accounting for the mptcp socket we could get
into a situation where the mptcp socket can't transmit more data,
and mptcp_clean_una doesn't reduce wmem even if snd_una has advanced
because it currently will only remove entire dfrags.

Allow advancing the dfrag head sequence and reduce wmem,
even though this isn't correct (as we can't release the page).

Because we will soon block on mptcp sk in case wmem is too large,
call sk_stream_write_space() in case we reduced the backlog so
userspace task blocked in sendmsg or poll will be woken up.

This isn't an issue if the send buffer is large, but it is when
SO_SNDBUF is used to reduce it to a lower value.

Note we can still get a deadlock for low SO_SNDBUF values in
case both sides of the connection write to the socket: both could
be blocked due to wmem being too small -- and current mptcp stack
will only increment mptcp ack_seq on recv.

This doesn't happen with the selftest as it uses poll() and
will always call recv if there is data to read.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Paolo Abeni
d027236c41 mptcp: implement memory accounting for mptcp rtx queue
Charge the data on the rtx queue to the master MPTCP socket, too.
Such memory in uncharged when the data is acked/dequeued.

Also account mptcp sockets inuse via a protocol specific pcpu
counter.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Paolo Abeni
b51f9b80c0 mptcp: introduce MPTCP retransmission timer
The timer will be used to schedule retransmission. It's
frequency is based on the current subflow RTO estimation and
is reset on every una_seq update

The timer is clearer for good by __mptcp_clear_xmit()

Also clean MPTCP rtx queue before each transmission.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Paolo Abeni
18b683bff8 mptcp: queue data for mptcp level retransmission
Keep the send page fragment on an MPTCP level retransmission queue.
The queue entries are allocated inside the page frag allocator,
acquiring an additional reference to the page for each list entry.

Also switch to a custom page frag refill function, to ensure that
the current page fragment can always host an MPTCP rtx queue entry.

The MPTCP rtx queue is flushed at disconnect() and close() time

Note that now we need to call __mptcp_init_sock() regardless of mptcp
enable status, as the destructor will try to walk the rtx_queue.

v2 -> v3:
 - remove 'inline' in foo.c files (David S. Miller)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Paolo Abeni
cc9d256698 mptcp: update per unacked sequence on pkt reception
So that we keep per unacked sequence number consistent; since
we update per msk data, use an atomic64 cmpxchg() to protect
against concurrent updates from multiple subflows.

Initialize the snd_una at connect()/accept() time.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Peter Krystad
926bdeab55 mptcp: Implement path manager interface commands
Fill in more path manager functionality by adding a worker function and
modifying the related stub functions to schedule the worker.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Peter Krystad
ec3edaa7ca mptcp: Add handling of outgoing MP_JOIN requests
Subflow creation may be initiated by the path manager when
the primary connection is fully established and a remote
address has been received via ADD_ADDR.

Create an in-kernel sock and use kernel_connect() to
initiate connection.

Passive sockets can't acquire the mptcp socket lock at
subflow creation time, so an additional list protected by
a new spinlock is used to track the MPJ subflows.

Such list is spliced into conn_list tail every time the msk
socket lock is acquired, so that it will not interfere
with data flow on the original connection.

Data flow and connection failover not addressed by this commit.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Peter Krystad
f296234c98 mptcp: Add handling of incoming MP_JOIN requests
Process the MP_JOIN option in a SYN packet with the same flow
as MP_CAPABLE but when the third ACK is received add the
subflow to the MPTCP socket subflow list instead of adding it to
the TCP socket accept queue.

The subflow is added at the end of the subflow list so it will not
interfere with the existing subflows operation and no data is
expected to be transmitted on it.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Peter Krystad
1b1c7a0ef7 mptcp: Add path manager interface
Add enough of a path manager interface to allow sending of ADD_ADDR
when an incoming MPTCP connection is created. Capable of sending only
a single IPv4 ADD_ADDR option. The 'pm_data' element of the connection
sock will need to be expanded to handle multiple interfaces and IPv6.
Partial processing of the incoming ADD_ADDR is included so the path
manager notification of that event happens at the proper time, which
involves validating the incoming address information.

This is a skeleton interface definition for events generated by
MPTCP.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Peter Krystad
3df523ab58 mptcp: Add ADD_ADDR handling
Add handling for sending and receiving the ADD_ADDR, ADD_ADDR6,
and RM_ADDR suboptions.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Jacob Keller
41b145024c mlx4: fix "initializer element not constant" compiler error
A recent commit e893768179 ("devlink: prepare to support region
operations") used the region_cr_space_str and region_fw_health_str
variables as initializers for the devlink_region_ops structures.

This can result in compiler errors:
drivers/net/ethernet/mellanox//mlx4/crdump.c:45:10: error: initializer
element is not constant
   .name = region_cr_space_str,
           ^
drivers/net/ethernet/mellanox//mlx4/crdump.c:45:10: note: (near
initialization for ‘region_cr_space_ops.name’)
drivers/net/ethernet/mellanox//mlx4/crdump.c:50:10: error: initializer
element is not constant
   .name = region_fw_health_str,

The variables were made to be "const char * const", indicating that both
the pointer and data were constant. This was enough to resolve this on
recent GCC (gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1) for this author).

Unfortunately this is not enough for older compilers to realize that the
variable can be treated as a constant expression.

Fix this by introducing macros for the string and use those instead of
the variable name in the region ops structures.

Reported-by: tanhuazhong <tanhuazhong@huawei.com>
Fixes: e893768179 ("devlink: prepare to support region operations")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:09:39 -07:00
Jacob Keller
9c11cc7849 devlink: don't wrap commands in rST shell blocks
The devlink-region.rst and ice-region.rst documentation files wrapped
some lines within shell code blocks due to being longer than 80 lines.

It was pointed out during review that wrapping these lines shouldn't be
done. Fix these two rST files and remove the line wrapping on these
shell command examples.

Reported-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:08:45 -07:00
René van Dorst
1d01145fd6 net: dsa: mt7530: use resolved link config in mac_link_up()
Convert the mt7530 switch driver to use the finalised link
parameters in mac_link_up() rather than the parameters in mac_config().

Signed-off-by: René van Dorst <opensource@vdorst.com>
Tested-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:05:53 -07:00
Vladimir Oltean
336aa67bd0 net: dsa: sja1105: show more ethtool statistics counters for P/Q/R/S
It looks like the P/Q/R/S series supports some more counters,
generically named "Ethernet statistics counter", which we were not
printing. Add them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:04:20 -07:00
Marcelo Ricardo Leitner
582eea2305 sctp: fix possibly using a bad saddr with a given dst
Under certain circumstances, depending on the order of addresses on the
interfaces, it could be that sctp_v[46]_get_dst() would return a dst
with a mismatched struct flowi.

For example, if when walking through the bind addresses and the first
one is not a match, it saves the dst as a fallback (added in
410f03831c), but not the flowi. Then if the next one is also not a
match, the previous dst will be returned but with the flowi information
for the 2nd address, which is wrong.

The fix is to use a locally stored flowi that can be used for such
attempts, and copy it to the parameter only in case it is a possible
match, together with the corresponding dst entry.

The patch updates IPv6 code mostly just to be in sync. Even though the issue
is also present there, it fallback is not expected to work with IPv6.

Fixes: 410f03831c ("sctp: add routing output fallback")
Reported-by: Jin Meng <meng.a.jin@nokia-sbell.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:01:43 -07:00
Julian Wiedmann
b8f14878e6 s390/qeth: support net namespaces for L3 devices
Enable the L3 driver's IPv4 address notifier to watch for events on qeth
devices that have been moved into a net namespace. We need to program
those IPs into the HW just as usual, otherwise inbound traffic won't
flow.

Fixes: 6133fb1aa1 ("[NETNS]: Disable inetaddr notifiers in namespaces other than initial.")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:58:55 -07:00
Qiujun Huang
5c3e82fe15 sctp: fix refcount bug in sctp_wfree
We should iterate over the datamsgs to move
all chunks(skbs) to newsk.

The following case cause the bug:
for the trouble SKB, it was in outq->transmitted list

sctp_outq_sack
        sctp_check_transmitted
                SKB was moved to outq->sacked list
        then throw away the sack queue
                SKB was deleted from outq->sacked
(but it was held by datamsg at sctp_datamsg_to_asoc
So, sctp_wfree was not called here)

then migrate happened

        sctp_for_each_tx_datachunk(
        sctp_clear_owner_w);
        sctp_assoc_migrate();
        sctp_for_each_tx_datachunk(
        sctp_set_owner_w);
SKB was not in the outq, and was not changed to newsk

finally

__sctp_outq_teardown
        sctp_chunk_put (for another skb)
                sctp_datamsg_put
                        __kfree_skb(msg->frag_list)
                                sctp_wfree (for SKB)
	SKB->sk was still oldsk (skb->sk != asoc->base.sk).

Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Acked-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:57:30 -07:00
Cambda Zhu
a08e7fd912 net: Fix typo of SKB_SGO_CB_OFFSET
The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
offset of the GSO in skb cb. This patch fixes the typo.

Fixes: 9207f9d45b ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:53:18 -07:00
Qian Cai
fbe4e0c1b2 ipv4: fix a RCU-list lock in fib_triestat_seq_show
fib_triestat_seq_show() calls hlist_for_each_entry_rcu(tb, head,
tb_hlist) without rcu_read_lock() will trigger a warning,

 net/ipv4/fib_trie.c:2579 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by proc01/115277:
  #0: c0000014507acf00 (&p->lock){+.+.}-{3:3}, at: seq_read+0x58/0x670

 Call Trace:
  dump_stack+0xf4/0x164 (unreliable)
  lockdep_rcu_suspicious+0x140/0x164
  fib_triestat_seq_show+0x750/0x880
  seq_read+0x1a0/0x670
  proc_reg_read+0x10c/0x1b0
  __vfs_read+0x3c/0x70
  vfs_read+0xac/0x170
  ksys_read+0x7c/0x140
  system_call+0x5c/0x68

Fix it by adding a pair of rcu_read_lock/unlock() and use
cond_resched_rcu() to avoid the situation where walking of a large
number of items  may prevent scheduling for a long time.

Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:51:28 -07:00
Yuval Basson
3b85720d3f qed: Fix race condition between scheduling and destroying the slowpath workqueue
Calling queue_delayed_work concurrently with
destroy_workqueue might race to an unexpected outcome -
scheduled task after wq is destroyed or other resources
(like ptt_pool) are freed (yields NULL pointer dereference).
cancel_delayed_work prevents the race by cancelling
the timer triggered for scheduling a new task.

Fixes: 59ccf86fe ("qed: Add driver infrastucture for handling mfw requests")
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Yuval Basson <ybason@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:50:17 -07:00
Denis Kirjanov
798dda818a net: page pool: allow to pass zero flags to page_pool_init()
page pool API can be useful for non-DMA cases like
xen-netfront driver so let's allow to pass zero flags to
page pool flags.

v2: check DMA direction only if PP_FLAG_DMA_MAP is set

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:49:20 -07:00
Jian Yang
5ef5c90e3c selftests: move timestamping selftests to net folder
For historical reasons, there are several timestamping selftest targets
in selftests/networking/timestamping. Move them to the standard
directory for networking tests: selftests/net.

Signed-off-by: Jian Yang <jianyang@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:48:30 -07:00
Philippe Schenker
1b68480b94 ARM: dts: apalis-imx6qdl: use rgmii-id instead of rgmii
Until now a PHY-fixup in mach-imx set our rgmii timing correctly. For
the PHY KSZ9131 there is no PHY-fixup in mach-imx. To support this PHY
too, use rgmii-id.
For the now used KSZ9031 nothing will change, as rgmii-id is only
implemented and supported by the KSZ9131.

Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:44:26 -07:00
Philippe Schenker
bd734a742d net: phy: micrel.c: add rgmii interface delay possibility to ksz9131
The KSZ9131 provides DLL controlled delays on RXC and TXC lines. This
patch makes use of those delays. The information which delays should
be enabled or disabled comes from the interface names, documented in
ethernet-controller.yaml:

rgmii:      Disable RXC and TXC delays
rgmii-id:   Enable RXC and TXC delays
rgmii-txid: Enable only TXC delay, disable RXC delay
rgmii-rxid: Enable onlx RXC delay, disable TXC delay

Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:44:26 -07:00
Mark Starovoytov
791bb3fcaf net: macsec: add support for specifying offload upon link creation
This patch adds new netlink attribute to allow a user to (optionally)
specify the desired offload mode immediately upon MACSec link creation.

Separate iproute patch will be required to support this from user space.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:34:21 -07:00
Johannes Berg
be8c827f50 mac80211: fix authentication with iwlwifi/mvm
The original patch didn't copy the ieee80211_is_data() condition
because on most drivers the management frames don't go through
this path. However, they do on iwlwifi/mvm, so we do need to keep
the condition here.

Cc: stable@vger.kernel.org
Fixes: ce2e1ca703 ("mac80211: Check port authorization in the ieee80211_tx_dequeue() case")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:27:44 -07:00
David S. Miller
f0b5989745 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor comment conflict in mac80211.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:25:29 -07:00
Sven Schnelle
3db81afd99 seccomp: Add missing compat_ioctl for notify
Executing the seccomp_bpf testsuite under a 64-bit kernel with 32-bit
userland (both s390 and x86) doesn't work because there's no compat_ioctl
handler defined. Add the handler.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200310123332.42255-1-svens@linux.ibm.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-03-29 21:10:51 -07:00
Reinhard Karcher
d9dac147a2 kbuild: deb-pkg: fix warning when CONFIG_DEBUG_INFO is unset
Creating a Debian package without CONFIG_DEBUG_INFO produces
a warning that no debug package was created.

This patch excludes the debug package from the control file,
if no debug package is created by this configuration.

Signed-off-by: Reinhard Karcher <reinhard.karcher@gmx.net>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-30 09:23:00 +09:00
wenxu
ef803b3cf9 netfilter: flowtable: add counter support in HW offload
Store the conntrack counters to the conntrack entry in the
HW flowtable offload.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30 02:05:40 +02:00
wenxu
9312eabab4 netfilter: conntrack: add nf_ct_acct_add()
Add nf_ct_acct_add function to update the conntrack counter
with packets and bytes.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30 02:05:39 +02:00
Pablo Neira Ayuso
d56aab2625 netfilter: nf_tables: skip set types that do not support for expressions
The bitmap set does not support for expressions, skip it from the
estimation step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30 02:05:39 +02:00
Pablo Neira Ayuso
8548bde989 netfilter: nft_dynset: validate set expression definition
If the global set expression definition mismatches the dynset
expression, then bail out.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30 02:05:38 +02:00