Commit Graph

843797 Commits

Author SHA1 Message Date
Al Viro
333f7909a8 coallocate socket_wq with socket itself
socket->wq is assign-once, set when we are initializing both
struct socket it's in and struct socket_wq it points to.  As the
matter of fact, the only reason for separate allocation was the
ability to RCU-delay freeing of socket_wq.  RCU-delaying the
freeing of socket itself gets rid of that need, so we can just
fold struct socket_wq into the end of struct socket and simplify
the life both for sock_alloc_inode() (one allocation instead of
two) and for tun/tap oddballs, where we used to embed struct socket
and struct socket_wq into the same structure (now - embedding just
the struct socket).

Note that reference to struct socket_wq in struct sock does remain
a reference - that's unchanged.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 19:25:19 -07:00
Al Viro
6d7855c54e sockfs: switch to ->free_inode()
we do have an RCU-delayed part there already (freeing the wq),
so it's not like the pipe situation; moreover, it might be
worth considering coallocating wq with the rest of struct sock_alloc.
->sk_wq in struct sock would remain a pointer as it is, but
the object it normally points to would be coallocated with
struct socket...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 19:25:19 -07:00
David S. Miller
17ccf9e31e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-07-09

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Lots of libbpf improvements: i) addition of new APIs to attach BPF
   programs to tracing entities such as {k,u}probes or tracepoints,
   ii) improve specification of BTF-defined maps by eliminating the
   need for data initialization for some of the members, iii) addition
   of a high-level API for setting up and polling perf buffers for
   BPF event output helpers, all from Andrii.

2) Add "prog run" subcommand to bpftool in order to test-run programs
   through the kernel testing infrastructure of BPF, from Quentin.

3) Improve verifier for BPF sockaddr programs to support 8-byte stores
   for user_ip6 and msg_src_ip6 members given clang tends to generate
   such stores, from Stanislav.

4) Enable the new BPF JIT zero-extension optimization for further
   riscv64 ALU ops, from Luke.

5) Fix a bpftool json JIT dump crash on powerpc, from Jiri.

6) Fix an AF_XDP race in generic XDP's receive path, from Ilya.

7) Various smaller fixes from Ilya, Yue and Arnd.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 19:14:38 -07:00
Ilya Maximets
bf0bdd1343 xdp: fix race on generic receive path
Unlike driver mode, generic xdp receive could be triggered
by different threads on different CPU cores at the same time
leading to the fill and rx queue breakage. For example, this
could happen while sending packets from two processes to the
first interface of veth pair while the second part of it is
open with AF_XDP socket.

Need to take a lock for each generic receive to avoid race.

Fixes: c497176cb2 ("xsk: add Rx receive functions and poll support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-09 01:43:26 +02:00
David S. Miller
7650b1a9bd Merge branch 'mp-inner-L3'
Stephen Suryaputra says:

====================
net: Multipath hashing on inner L3

This series extends commit 363887a2cd ("ipv4: Support multipath
hashing on inner IP pkts for GRE tunnel") to include support when the
outer L3 is IPv6 and to consider the case where the inner L3 is
different version from the outer L3, such as IPv6 tunneled by IPv4 GRE
or vice versa. It also includes kselftest scripts to test the use cases.

v2: Clarify the commit messages in the commits in this series to use the
    term tunneled by IPv4 GRE or by IPv6 GRE so that it's clear which
    one is the inner and which one is the outer (per David Miller).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:37:30 -07:00
Stephen Suryaputra
2800f24854 selftests: forwarding: Test multipath hashing on inner IP pkts for GRE tunnel
Add selftest scripts for multipath hashing on inner IP pkts when there
is a single GRE tunnel but there are multiple underlay routes to reach
the other end of the tunnel.

Four cases are covered in these scripts:
    - IPv4 inner, IPv4 outer
    - IPv6 inner, IPv4 outer
    - IPv4 inner, IPv6 outer
    - IPv6 inner, IPv6 outer

Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:37:29 -07:00
Stephen Suryaputra
d8f74f0975 ipv6: Support multipath hashing on inner IP pkts
Make the same support as commit 363887a2cd ("ipv4: Support multipath
hashing on inner IP pkts for GRE tunnel") for outer IPv6. The hashing
considers both IPv4 and IPv6 pkts when they are tunneled by IPv6 GRE.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:37:29 -07:00
Stephen Suryaputra
828b2b4421 ipv4: Multipath hashing on inner L3 needs to consider inner IPv6 pkts
Commit 363887a2cd ("ipv4: Support multipath hashing on inner IP pkts
for GRE tunnel") supports multipath policy value of 2, Layer 3 or inner
Layer 3 if present, but it only considers inner IPv4. There is a use
case of IPv6 is tunneled by IPv4 GRE, thus add the ability to hash on
inner IPv6 addresses.

Fixes: 363887a2cd ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel")
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:37:29 -07:00
Wen Yang
faf5577f24 net: pasemi: fix an use-after-free in pasemi_mac_phy_init()
The phy_dn variable is still being used in of_phy_connect() after the
of_node_put() call, which may result in use-after-free.

Fixes: 1dd2d06c04 ("net: Rework pasemi_mac driver to use of_mdio infrastructure")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:33:02 -07:00
Wen Yang
ef86ea982b net: axienet: fix a potential double free in axienet_probe()
There is a possible use-after-free issue in the axienet_probe():

1701:	np = of_parse_phandle(pdev->dev.of_node, "axistream-connected", 0);
1702:   if (np) {
...
1787:		of_node_put(np); ---> released here
1788:		lp->eth_irq = platform_get_irq(pdev, 0);
1789:	} else {
...
1801:	}
1802:	if (IS_ERR(lp->dma_regs)) {
...
1805:		of_node_put(np); ---> double released here
1806:		goto free_netdev;
1807:	}

We solve this problem by removing the unnecessary of_node_put().

Fixes: 28ef9ebdb6 ("net: axienet: make use of axistream-connected attribute optional")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Anirudha Sarangi <anirudh@xilinx.com>
Cc: John Linn <John.Linn@xilinx.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Robert Hancock <hancock@sedsystems.ca>
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:28:32 -07:00
Ilya Leoshkevich
bc2d8afecb selftests/bpf: fix test_reuseport_array on s390
Fix endianness issue: passing a pointer to 64-bit fd as a 32-bit key
does not work on big-endian architectures. So cast fd to 32-bits when
necessary.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-09 01:10:23 +02:00
Kweh Hock Leong
d4117d63a3 net: stmmac: enable clause 45 mdio support
DWMAC4 is capable to support clause 45 mdio communication.
This patch enable the feature on stmmac_mdio_write() and
stmmac_mdio_read() by following phy_write_mmd() and
phy_read_mmd() mdiobus read write implementation format.

Reviewed-by: Li, Yifan <yifan2.li@intel.com>
Signed-off-by: Kweh Hock Leong <hock.leong.kweh@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:08:55 -07:00
Taehee Yoo
44e3725943 net: openvswitch: use netif_ovs_is_port() instead of opencode
Use netif_ovs_is_port() function instead of open code.
This patch doesn't change logic.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:53:25 -07:00
Jesper Dangaard Brouer
f714ecc9cf MAINTAINERS: Add page_pool maintainer entry
In this release cycle the number of NIC drivers using page_pool
will likely reach 4 drivers.  It is about time to add a maintainer
entry.  Add myself and Ilias.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:51:00 -07:00
David S. Miller
11aef3c6da Merge branch 'mvpp2-cls-ether'
Maxime Chevallier says:

====================
net: mvpp2: Add classification based on the ETHER flow

This series adds support for classification of the ETHER flow in the
mvpp2 driver.

The first patch allows detecting when a user specifies a flow_type that
isn't supported by the driver, while the second adds support for this
flow_type by adding the mapping between the ETHER_FLOW enum value and
the relevant classifier flow entries.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:50:06 -07:00
Maxime Chevallier
f406324e50 net: mvpp2: cls: Add support for ETHER_FLOW
Users can specify classification actions based on the 'ether' flow type.
In that case, this will apply to all ethernet traffic, superseeding
flows such as 'udp4' or 'tcp6'.

Add support for this flow type in the PPv2 classifier, by mapping the
ETHER_FLOW value to the corresponding entries in the classifier.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:50:06 -07:00
Maxime Chevallier
f4f1ba1819 net: mvpp2: cls: Report an error for unsupported flow types
Add a missing check to detect flow types that we don't support, so that
user can be informed of this.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:50:06 -07:00
David S. Miller
3f4957eb6c Merge branch 'vsock-virtio-fixes'
Stefano Garzarella says:

====================
vsock/virtio: several fixes in the .probe() and .remove()

During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
before registering the driver", Stefan pointed out some possible issues
in the .probe() and .remove() callbacks of the virtio-vsock driver.

This series tries to solve these issues:
- Patch 1 adds RCU critical sections to avoid use-after-free of
  'the_virtio_vsock' pointer.
- Patch 2 stops workers before to call vdev->config->reset(vdev) to
  be sure that no one is accessing the device.
- Patch 3 moves the works flush at the end of the .remove() to avoid
  use-after-free of 'vsock' object.

v3:
- Patch 1: use rcu_dereference_protected() to get the_virtio_vosck value in
           the virtio_vsock_probe() [Jason]

v2: https://patchwork.kernel.org/cover/11022343/

v1: https://patchwork.kernel.org/cover/10964733/

Before this series the guest crashes in a few second. After this series the
test runs (~12h) without issues.
Tested on an SMP guest (-smp 4 -monitor tcp:127.0.0.1:1234,server,nowait)
with these scripts to stress the .probe()/.remove() path:

- guest
  while true; do
      cat /dev/urandom | nc-vsock -l 4321 > /dev/null &
      cat /dev/urandom | nc-vsock -l 5321 > /dev/null &
      cat /dev/urandom | nc-vsock -l 6321 > /dev/null &
      cat /dev/urandom | nc-vsock -l 7321 > /dev/null &
      wait
  done

- host
  while true; do
      cat /dev/urandom | nc-vsock 3 4321 > /dev/null &
      cat /dev/urandom | nc-vsock 3 5321 > /dev/null &
      cat /dev/urandom | nc-vsock 3 6321 > /dev/null &
      cat /dev/urandom | nc-vsock 3 7321 > /dev/null &
      sleep 2
      echo "device_del v1" | nc 127.0.0.1 1234
      sleep 1
      echo "device_add vhost-vsock-pci,id=v1,guest-cid=3" | nc 127.0.0.1 1234
      sleep 1
  done
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:35:17 -07:00
Stefano Garzarella
e226121fcc vsock/virtio: fix flush of works during the .remove()
This patch moves the flush of works after vdev->config->del_vqs(vdev),
because we need to be sure that no workers run before to free the
'vsock' object.

Since we stopped the workers using the [tx|rx|event]_run flags,
we are sure no one is accessing the device while we are calling
vdev->config->reset(vdev), so we can safely move the workers' flush.

Before the vdev->config->del_vqs(vdev), workers can be scheduled
by VQ callbacks, so we must flush them after del_vqs(), to avoid
use-after-free of 'vsock' object.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:35:17 -07:00
Stefano Garzarella
b917507e5a vsock/virtio: stop workers during the .remove()
Before to call vdev->config->reset(vdev) we need to be sure that
no one is accessing the device, for this reason, we add new variables
in the struct virtio_vsock to stop the workers during the .remove().

This patch also add few comments before vdev->config->reset(vdev)
and vdev->config->del_vqs(vdev).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:35:17 -07:00
Stefano Garzarella
0deab087b1 vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
Some callbacks used by the upper layers can run while we are in the
.remove(). A potential use-after-free can happen, because we free
the_virtio_vsock without knowing if the callbacks are over or not.

To solve this issue we move the assignment of the_virtio_vsock at the
end of .probe(), when we finished all the initialization, and at the
beginning of .remove(), before to release resources.
For the same reason, we do the same also for the vdev->priv.

We use RCU to be sure that all callbacks that use the_virtio_vsock
ended before freeing it. This is not required for callbacks that
use vdev->priv, because after the vdev->config->del_vqs() we are sure
that they are ended and will no longer be invoked.

We also take the mutex during the .remove() to avoid that .probe() can
run while we are resetting the device.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:35:17 -07:00
David S. Miller
1a2d405c00 Merge branch 'b53-docs'
Benedikt Spranger says:

====================
Document the configuration of b53

this is the third round to document the configuration of a b53 supported
switch.

v3..v2:
- fix a typo
- improve b53 configuration in DSA_TAG_PROTO_NONE showcase.
- grade up from RFC to patch for mainline inclusion.

v1..v2:
- split out generic parts of the configuration.
- target comments by Andrew Lunn and Florian Fainelli.
- make changes visible to build system
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:30:13 -07:00
Benedikt Spranger
ff2d339375 Documentation: net: dsa: b53: Describe b53 configuration
Document the different needs of documentation for the b53 driver.

Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:30:13 -07:00
Benedikt Spranger
58dd7a8d9d Documentation: net: dsa: Describe DSA switch configuration
Document DSA tagged and VLAN based switch configuration by showcases.

Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:30:13 -07:00
Wei Yongjun
31d166642c nfp: tls: fix error return code in nfp_net_tls_add()
Fix to return negative error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 1f35a56cf5 ("nfp: tls: add/delete TLS TX connections")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:27:33 -07:00
David S. Miller
107d3ce601 Merge branch 'bnxt_en-XDP_REDIRECT'
Michael Chan says:

====================
bnxt_en: Add XDP_REDIRECT support.

This patch series adds XDP_REDIRECT support by Andy Gospodarek.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:25 -07:00
Andy Gospodarek
322b87ca55 bnxt_en: add page_pool support
This removes contention over page allocation for XDP_REDIRECT actions by
adding page_pool support per queue for the driver.  The performance for
XDP_REDIRECT actions scales linearly with the number of cores performing
redirect actions when using the page pools instead of the standard page
allocator.

v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:25 -07:00
Andy Gospodarek
f18c2b77b2 bnxt_en: optimized XDP_REDIRECT support
This adds basic support for XDP_REDIRECT in the bnxt_en driver.  Next
patch adds the more optimized page pool support.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:24 -07:00
Michael Chan
c1ba92a86d bnxt_en: Refactor __bnxt_xmit_xdp().
__bnxt_xmit_xdp() is used by XDP_TX and ethtool loopback packet transmit.
Refactor it so that it can be re-used by the XDP_REDIRECT logic.
Restructure the TX interrupt handler logic to cleanly separate XDP_TX
logic in preparation for XDP_REDIRECT.

Acked-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:24 -07:00
Andy Gospodarek
52c0609258 bnxt_en: rename some xdp functions
Renaming bnxt_xmit_xdp to __bnxt_xmit_xdp to get ready for XDP_REDIRECT
support and reduce confusion/namespace collision.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:24 -07:00
David S. Miller
aa6be2b95d Merge branch 'cpsw-Add-XDP-support'
Ivan Khoronzhuk says:

====================
net: ethernet: ti: cpsw: Add XDP support

This patchset adds XDP support for TI cpsw driver and base it on
page_pool allocator. It was verified on af_xdp socket drop,
af_xdp l2f, ebpf XDP_DROP, XDP_REDIRECT, XDP_PASS, XDP_TX.

It was verified with following configs enabled:
CONFIG_JIT=y
CONFIG_BPFILTER=y
CONFIG_BPF_SYSCALL=y
CONFIG_XDP_SOCKETS=y
CONFIG_BPF_EVENTS=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_JIT=y
CONFIG_CGROUP_BPF=y

Link on previous v7:
https://lkml.org/lkml/2019/7/4/715

Also regular tests with iperf2 were done in order to verify impact on
regular netstack performance, compared with base commit:
https://pastebin.com/JSMT0iZ4

v8..v9:
- fix warnings on arm64 caused by typos in type casting

v7..v8:
- corrected dma calculation based on headroom instead of hard start
- minor comment changes

v6..v7:
- rolled back to v4 solution but with small modification
- picked up patch:
  https://www.spinics.net/lists/netdev/msg583145.html
- added changes related to netsec fix and cpsw

v5..v6:
- do changes that is rx_dev while redirect/flush cycle is kept the same
- dropped net: ethernet: ti: davinci_cpdma: return handler status
- other changes desc in patches

v4..v5:
- added two plreliminary patches:
  net: ethernet: ti: davinci_cpdma: allow desc split while down
  net: ethernet: ti: cpsw_ethtool: allow res split while down
- added xdp alocator refcnt on xdp level, avoiding page pool refcnt
- moved flush status as separate argument for cpdma_chan_process
- reworked cpsw code according to last changes to allocator
- added missed statistic counter

v3..v4:
- added page pool user counter
- use same pool for ndevs in dual mac
- restructured page pool create/destroy according to the last changes in API

v2..v3:
- each rxq and ndev has its own page pool

v1..v2:
- combined xdp_xmit functions
- used page allocation w/o refcnt juggle
- unmapped page for skb netstack
- moved rxq/page pool allocation to open/close pair
- added several preliminary patches:
  net: page_pool: add helper function to retrieve dma addresses
  net: page_pool: add helper function to unmap dma addresses
  net: ethernet: ti: cpsw: use cpsw as drv data
  net: ethernet: ti: cpsw_ethtool: simplify slave loops
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
9ed4050c0d net: ethernet: ti: cpsw: add XDP support
Add XDP support based on rx page_pool allocator, one frame per page.
Page pool allocator is used with assumption that only one rx_handler
is running simultaneously. DMA map/unmap is reused from page pool
despite there is no need to map whole page.

Due to specific of cpsw, the same TX/RX handler can be used by 2
network devices, so special fields in buffer are added to identify
an interface the frame is destined to. Thus XDP works for both
interfaces, that allows to test xdp redirect between two interfaces
easily. Also, each rx queue have own page pools, but common for both
netdevs.

XDP prog is common for all channels till appropriate changes are added
in XDP infrastructure. Also, once page_pool recycling becomes part of
skb netstack some simplifications can be added, like removing
page_pool_release_page() before skb receive.

In order to keep rx_dev while redirect, that can be somehow used in
future, do flush in rx_handler, that allows to keep rx dev the same
while redirect. It allows to conform with tracing rx_dev pointed
by Jesper.

Also, there is probability, that XDP generic code can be extended to
support multi ndev drivers like this one, using same rx queue for
several ndevs, based on switchdev for instance or else. In this case,
driver can be modified like exposed here:
https://lkml.org/lkml/2019/7/3/243

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
608ef6202f net: ethernet: ti: cpsw_ethtool: allow res split while down
That's possible to set channel num while interfaces are down. When
interface gets up it should resplit budget. This resplit can happen
after phy is up but only if speed is changed, so should be set before
this, for this allow it to happen while changing number of channels,
when interfaces are down.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
962fb61890 net: ethernet: ti: davinci_cpdma: allow desc split while down
That's possible to set ring params while interfaces are down. When
interface gets up it uses number of descs to fill rx queue and on
later on changes to create rx pools. Usually, this resplit can happen
after phy is up, but it can be needed before this, so allow it to
happen while setting number of rx descs, when interfaces are down.
Also, if no dependency on intf state, move it to cpdma layer, where
it should be.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
6670acacd5 net: ethernet: ti: davinci_cpdma: add dma mapped submit
In case if dma mapped packet needs to be sent, like with XDP
page pool, the "mapped" submit can be used. This patch adds dma
mapped submit based on regular one.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
1da4bbeffe net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
Jesper recently removed page_pool_destroy() (from driver invocation)
and moved shutdown and free of page_pool into xdp_rxq_info_unreg(),
in-order to handle in-flight packets/pages. This created an asymmetry
in drivers create/destroy pairs.

This patch reintroduce page_pool_destroy and add page_pool user
refcnt. This serves the purpose to simplify drivers error handling as
driver now drivers always calls page_pool_destroy() and don't need to
track if xdp_rxq_info_reg_mem_model() was unsuccessful.

This could be used for a special cases where a single RX-queue (with a
single page_pool) provides packets for two net_device'es, and thus
needs to register the same page_pool twice with two xdp_rxq_info
structures.

This patch is primarily to ease API usage for drivers. The recently
merged netsec driver, actually have a bug in this area, which is
solved by this API change.

This patch is a modified version of Ivan Khoronzhuk's original patch.

Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
Fixes: 5c67bf0ec4 ("net: netsec: Use page_pool API")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Arnd Bergmann
49db9228b8 macb: fix build warning for !CONFIG_OF
When CONFIG_OF is disabled, we get a harmless warning about the
newly added variable:

drivers/net/ethernet/cadence/macb_main.c:48:39: error: 'mgmt' defined but not used [-Werror=unused-variable]
 static struct sifive_fu540_macb_mgmt *mgmt;

Move the variable closer to its use inside of the #ifdef.

Fixes: c218ad5590 ("macb: Add support for SiFive FU540-C000")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:43:07 -07:00
Arnd Bergmann
0287f9ed16 gve: fix unused variable/label warnings
On unusual page sizes, we get harmless warnings:

drivers/net/ethernet/google/gve/gve_rx.c:283:6: error: unused variable 'pagecount' [-Werror,-Wunused-variable]
drivers/net/ethernet/google/gve/gve_rx.c:336:1: error: unused label 'have_skb' [-Werror,-Wunused-label]

Change the preprocessor #if to regular if() to avoid this.

Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:40:58 -07:00
Martin Habets
05cfee98c8 sfc: Remove 'PCIE error reporting unavailable'
This is only at notice level but it was pointed out that no other driver
does this.
Also there is no action the user can take as it is really a property of
the server.

Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:16:48 -07:00
David S. Miller
47cfb90406 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Move bridge keys in nft_meta to nft_meta_bridge, from wenxu.

2) Support for bridge pvid matching, from wenxu.

3) Support for bridge vlan protocol matching, also from wenxu.

4) Add br_vlan_get_pvid_rcu(), to fetch the bridge port pvid
   from packet path.

5) Prefer specific family extension in nf_tables.

6) Autoload specific family extension in case it is missing.

7) Add synproxy support to nf_tables, from Fernando Fernandez Mancera.

8) Support for GRE encapsulation in IPVS, from Vadim Fedorenko.

9) ICMP handling for GRE encapsulation, from Julian Anastasov.

10) Remove unused parameter in nf_queue, from Florian Westphal.

11) Replace seq_printf() by seq_puts() in nf_log, from Markus Elfring.

12) Rename nf_SYNPROXY.h => nf_synproxy.h before this header becomes
    public.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:13:38 -07:00
Ilias Apalodimas
bfb204129a net: netsec: Sync dma for device on buffer allocation
cd1973a921 ("net: netsec: Sync dma for device on buffer allocation")
was merged on it's v1 instead of the v3.
Merge the proper patch version

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 11:48:10 -07:00
Quentin Monnet
8fc9f8bedf tools: bpftool: add completion for bpftool prog "loadall"
Bash completion for proposing the "loadall" subcommand is missing. Let's
add it to the completion script.

Add a specific case to propose "load" and "loadall" for completing:

    $ bpftool prog load
                       ^ cursor is here

Otherwise, completion considers that $command is in load|loadall and
starts making related completions (file or directory names, as the
number of words on the command line is below 6), when the only suggested
keywords should be "load" and "loadall" until one has been picked and a
space entered after that to move to the next word.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 17:20:34 +02:00
Arnd Bergmann
bef8e26392 bpf: avoid unused variable warning in tcp_bpf_rtt()
When CONFIG_BPF is disabled, we get a warning for an unused
variable:

In file included from drivers/target/target_core_device.c:26:
include/net/tcp.h:2226:19: error: unused variable 'tp' [-Werror,-Wunused-variable]
        struct tcp_sock *tp = tcp_sk(sk);

The variable is only used in one place, so it can be
replaced with its value there to avoid the warning.

Fixes: 23729ff231 ("bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 17:18:03 +02:00
YueHaibing
6705fea0c7 bpf: cgroup: Fix build error without CONFIG_NET
If CONFIG_NET is not set and CONFIG_CGROUP_BPF=y,
gcc building fails:

kernel/bpf/cgroup.o: In function `cg_sockopt_func_proto':
cgroup.c:(.text+0x237e): undefined reference to `bpf_sk_storage_get_proto'
cgroup.c:(.text+0x2394): undefined reference to `bpf_sk_storage_delete_proto'
kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_getsockopt':
(.text+0x2a1f): undefined reference to `lock_sock_nested'
(.text+0x2ca2): undefined reference to `release_sock'
kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_setsockopt':
(.text+0x3006): undefined reference to `lock_sock_nested'
(.text+0x32bb): undefined reference to `release_sock'

Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 17:17:00 +02:00
Andrii Nakryiko
06ec0e2c49 selftests/bpf: fix test_attach_probe map definition
ef99b02b23 ("libbpf: capture value in BTF type info for BTF-defined map
defs") changed BTF-defined maps syntax, while independently merged
1e8611bbdf ("selftests/bpf: add kprobe/uprobe selftests") added new
test using outdated syntax of maps. This patch fixes this test after
corresponding patch sets were merged.

Fixes: ef99b02b23 ("libbpf: capture value in BTF type info for BTF-defined map defs")
Fixes: 1e8611bbdf ("selftests/bpf: add kprobe/uprobe selftests")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:25:58 +02:00
Daniel Borkmann
8bfec4f325 Merge branch 'bpf-sockaddr-wide-store'
Stanislav Fomichev says:

====================
Clang can generate 8-byte stores for user_ip6 & msg_src_ip6,
let's support that on the verifier side.

v3:
* fix comments spelling an -> and (Andrii Nakryiko)

v2:
* Add simple cover letter (Yonghong Song)
* Update comments (Yonghong Song)
* Remove [4] selftests (Yonghong Song)
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:22:56 +02:00
Stanislav Fomichev
76d950773c selftests/bpf: add verifier tests for wide stores
Make sure that wide stores are allowed at proper (aligned) addresses.
Note that user_ip6 is naturally aligned on 8-byte boundary, so
correct addresses are user_ip6[0] and user_ip6[2]. msg_src_ip6 is,
however, aligned on a 4-byte bondary, so only msg_src_ip6[1]
can be wide-stored.

Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:22:55 +02:00
Stanislav Fomichev
4cfacbe6df bpf: sync bpf.h to tools/
Sync user_ip6 & msg_src_ip6 comments.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:22:55 +02:00
Stanislav Fomichev
600c70bad6 bpf: allow wide (u64) aligned stores for some fields of bpf_sock_addr
Since commit cd17d77705 ("bpf/tools: sync bpf.h") clang decided
that it can do a single u64 store into user_ip6[2] instead of two
separate u32 ones:

 #  17: (18) r2 = 0x100000000000000
 #  ; ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
 #  19: (7b) *(u64 *)(r1 +16) = r2
 #  invalid bpf_context access off=16 size=8

>From the compiler point of view it does look like a correct thing
to do, so let's support it on the kernel side.

Credit to Andrii Nakryiko for a proper implementation of
bpf_ctx_wide_store_ok.

Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Fixes: cd17d77705 ("bpf/tools: sync bpf.h")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:22:55 +02:00
Daniel Borkmann
d2850ce0bd Merge branch 'bpf-libbpf-perf-rb-api'
Andrii Nakryiko says:

====================
This patchset adds a high-level API for setting up and polling perf buffers
associated with BPF_MAP_TYPE_PERF_EVENT_ARRAY map. Details of APIs are
described in corresponding commit.

Patch #1 adds a set of APIs to set up and work with perf buffer.
Patch #2 enhances libbpf to support auto-setting PERF_EVENT_ARRAY map size.
Patch #3 adds test.
Patch #4 converts bpftool map event_pipe to new API.
Patch #5 updates README to mention perf_buffer_ prefix.

v6->v7:
- __x64_ syscall prefix (Yonghong);
v5->v6:
- fix C99 for loop variable initialization usage (Yonghong);
v4->v5:
- initialize perf_buffer_raw_opts in bpftool map event_pipe (Jakub);
- add perf_buffer_ to README;
v3->v4:
- fixed bpftool event_pipe cmd error handling (Jakub);
v2->v3:
- added perf_buffer__new_raw for more low-level control;
- converted bpftool map event_pipe to new API (Daniel);
- fixed bug with error handling in create_maps (Song);
v1->v2:
- add auto-sizing of PERF_EVENT_ARRAY maps;
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 15:35:45 +02:00