Commit Graph

812921 Commits

Author SHA1 Message Date
Björn Töpel
e2c6f50e48 selftests/bpf: add "any alignment" annotation for some tests
RISC-V does, in-general, not have "efficient unaligned access". When
testing the RISC-V BPF JIT, some selftests failed in the verification
due to misaligned access. Annotate these tests with the
F_NEEDS_EFFICIENT_UNALIGNED_ACCESS flag.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 16:56:10 +01:00
Björn Töpel
e8cb0167ae bpf, doc: add RISC-V JIT to BPF documentation
Update Documentation/networking/filter.txt and
Documentation/sysctl/net.txt to mention RISC-V.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 16:56:10 +01:00
Björn Töpel
8a9e0aff88 MAINTAINERS: add RISC-V BPF JIT maintainer
Add Björn Töpel as RISC-V BPF JIT maintainer.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 16:56:10 +01:00
Björn Töpel
2353ecc6f9 bpf, riscv: add BPF JIT for RV64G
This commit adds a BPF JIT for RV64G.

The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar
to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64).

At the moment the RISC-V Linux port does not support
CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not
supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT,
BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and
BPF_PROG_TYPE_RAW_TRACEPOINT passes.

The implementation does not support "far branching" (>4KiB).

Test results:
  # modprobe test_bpf
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]

  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier
  ...
  Summary: 761 PASSED, 507 SKIPPED, 2 FAILED

Note that "test_verifier" was run with one build with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise
many of the the tests that require unaligned access were skipped.

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  0

No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  59

The two failing test_verifier tests are:
  "ld_abs: vlan + abs, test 1"
  "ld_abs: jump around ld_abs"

This is due to that "far branching" involved in those tests.

All tests where done on QEMU (QEMU emulator version 3.1.50
(v3.1.0-688-g8ae951fbc106)).

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 16:56:10 +01:00
Daniel Borkmann
31de389707 Merge branch 'bpf-btf-dedup'
Andrii Nakryiko says:

====================
This patch series adds BTF deduplication algorithm to libbpf. This algorithm
allows to take BTF type information containing duplicate per-compilation unit
information and reduce it to equivalent set of BTF types with no duplication without
loss of information. It also deduplicates strings and removes those strings that
are not referenced from any BTF type (and line information in .BTF.ext section,
if any).

Algorithm also resolves struct/union forward declarations into concrete BTF types
across multiple compilation units to facilitate better deduplication ratio. If
undesired, this resolution can be disabled through specifying corresponding options.

When applied to BTF data emitted by pahole's DWARF->BTF converter, it reduces
the overall size of .BTF section by about 65x, from about 112MB to 1.75MB, leaving
only 29247 out of initial 3073497 BTF type descriptors.

Algorithm with minor differences and preliminary results before FUNC/FUNC_PROTO
support is also described more verbosely at:

https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

v1->v2:
- rebase on latest bpf-next
- err_log/elog -> pr_debug
- btf__dedup, btf__get_strings, btf__get_nr_types listed under 0.0.2 version
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 16:52:58 +01:00
Andrii Nakryiko
9c65112744 selftests/btf: add initial BTF dedup tests
This patch sets up a new kind of tests (BTF dedup tests) and tests few aspects of
BTF dedup algorithm. More complete set of tests will come in follow up patches.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 16:52:57 +01:00
Andrii Nakryiko
d5caef5b56 btf: add BTF types deduplication algorithm
This patch implements BTF types deduplication algorithm. It allows to
greatly compress typical output of pahole's DWARF-to-BTF conversion or
LLVM's compilation output by detecting and collapsing identical types emitted in
isolation per compilation unit. Algorithm also resolves struct/union forward
declarations into concrete BTF types representing referenced struct/union. If
undesired, this resolution can be disabled through specifying corresponding options.

Algorithm itself and its application to Linux kernel's BTF types is
described in details at:
https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 16:52:57 +01:00
Andrii Nakryiko
69eaab04c6 btf: extract BTF type size calculation
This pre-patch extracts calculation of amount of space taken by BTF type descriptor
for later reuse by btf_dedup functionality.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05 16:52:57 +01:00
Lucas De Marchi
2a121030d4 drm/i915: always return something on DDI clock selection
Even if we don't have the correct clock and get a warning, we should not
skip the return.

v2: improve commit message (from Joonas)

Fixes: 1fa11ee2d9 ("drm/i915/icl: start adding the TBT pll")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125222444.19926-3-lucas.demarchi@intel.com
(cherry picked from commit 7a61a6dec3)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-02-05 15:25:27 +02:00
Ville Syrjälä
3e0b69bbed drm/i915: Fix skl srckey mask bits
We're incorrectly masking off the R/V channel enable bit from
KEYMSK. Fix it up.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Fixes: b208152556 ("drm/i915: Add plane alpha blending support, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125183846.28755-1-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 968bf969b4)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-02-05 15:18:46 +02:00
Florian Westphal
947e492c0f netfilter: nft_compat: don't use refcount_inc on newly allocated entry
When I moved the refcount to refcount_t type I missed the fact that
refcount_inc() will result in use-after-free warning with
CONFIG_REFCOUNT_FULL=y builds.

The correct fix would be to init the reference count to 1 at allocation
time, but, unfortunately we cannot do this, as we can't undo that
in case something else fails later in the batch.

So only solution I see is to special-case the 'new entry' condition
and replace refcount_inc() with a "delayed" refcount_set(1) in this case,
as done here.

The .activate callback can be removed to simplify things, we only
need to make sure that deactivate() decrements/unlinks the entry
from the list at end of transaction phase (commit or abort).

Fixes: 12c44aba66 ("netfilter: nft_compat: use refcnt_t type for nft_xt reference count")
Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-05 14:10:33 +01:00
Eli Cooper
15df03c661 netfilter: ipv6: Don't preserve original oif for loopback address
Commit 508b09046c ("netfilter: ipv6: Preserve link scope traffic
original oif") made ip6_route_me_harder() keep the original oif for
link-local and multicast packets. However, it also affected packets
for the loopback address because it used rt6_need_strict().

REDIRECT rules in the OUTPUT chain rewrite the destination to loopback
address; thus its oif should not be preserved. This commit fixes the bug
that redirected local packets are being dropped. Actually the packet was
not exactly dropped; Instead it was sent out to the original oif rather
than lo. When a packet with daddr ::1 is sent to the router, it is
effectively dropped.

Fixes: 508b09046c ("netfilter: ipv6: Preserve link scope traffic original oif")
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-05 14:10:33 +01:00
Thomas Hellstrom
9ddac734aa drm/vmwgfx: Improve on IOMMU detection
instead of relying on intel_iommu_enabled, use the fact that the
dma_map_ops::map_page != dma_direct_map_page.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
2019-02-05 13:55:16 +01:00
Thomas Hellstrom
4cbfa1e6c0 drm/vmwgfx: Fix setting of dma masks
Previously we set only the dma mask and not the coherent mask. Fix that.
Also, for clarity, make sure both are initially set to 64 bits.

Cc: <stable@vger.kernel.org>
Fixes: 0d00c488f3de: ("drm/vmwgfx: Fix the driver for large dma addresses")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
2019-02-05 13:54:21 +01:00
Deepak Rawat
479d59026f drm/vmwgfx: Also check for crtc status while checking for DU active
During modeset check it is possible to have all crtc_state's in atomic
state. Check for crtc enable status while checking for display unit
active status. Only error if enabling a crtc while display unit is not
active.

Cc: <stable@vger.kernel.org>
Fixes: 9da6e26c0aae: ("drm/vmwgfx: Fix a layout race condition")
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2019-02-05 13:53:28 +01:00
Thomas Hellstrom
51fdbeb4ca drm/vmwgfx: Fix an uninitialized fence handle value
if vmw_execbuf_fence_commands() fails, The handle value will be
uninitialized and a bogus fence handle might be copied to user-space.

Cc: <stable@vger.kernel.org>
Fixes: 2724b2d54cda: ("drm/vmwgfx: Use new validation interface for the modesetting code v2")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com> #v1
Reviewed-by: Sinclair Yeh <syeh@vmware.com> #v1
Reviewed-by: Deepak Rawat <drawat@vmware.com>
2019-02-05 13:51:57 +01:00
Thomas Hellstrom
728354c005 drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user
The function was unconditionally returning 0, and a caller would have to
rely on the returned fence pointer being NULL to detect errors. However,
the function vmw_execbuf_copy_fence_user() would expect a non-zero error
code in that case and would BUG otherwise.

So make sure we return a proper non-zero error code if the fence pointer
returned is NULL.

Cc: <stable@vger.kernel.org>
Fixes: ae2a104058e2: ("vmwgfx: Implement fence objects")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
2019-02-05 13:50:53 +01:00
Linus Walleij
5468e82f70 net: phy: fixed-phy: Drop GPIO from fixed_phy_add()
All users of the fixed_phy_add() pass -1 as GPIO number
to the fixed phy driver, and all users of fixed_phy_register()
pass -1 as GPIO number as well, except for the device
tree MDIO bus.

Any new users should create a proper device and pass the
GPIO as a descriptor associated with the device so delete
the GPIO argument from the calls and drop the code looking
requesting a GPIO in fixed_phy_add().

In fixed phy_register(), investigate the "fixed-link"
node and pick the GPIO descriptor from "link-gpios" if
this property exists. Move the corresponding code out
of of_mdio.c as the fixed phy code anyways requires
OF to be in use.

Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 18:33:36 -08:00
Marc Zyngier
c8101f7729 net: dsa: Fix lockdep false positive splat
Creating a macvtap on a DSA-backed interface results in the following
splat when lockdep is enabled:

[   19.638080] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
[   23.041198] device lan0 entered promiscuous mode
[   23.043445] device eth0 entered promiscuous mode
[   23.049255]
[   23.049557] ============================================
[   23.055021] WARNING: possible recursive locking detected
[   23.060490] 5.0.0-rc3-00013-g56c857a1b8d3 #118 Not tainted
[   23.066132] --------------------------------------------
[   23.071598] ip/2861 is trying to acquire lock:
[   23.076171] 00000000f61990cb (_xmit_ETHER){+...}, at: dev_set_rx_mode+0x1c/0x38
[   23.083693]
[   23.083693] but task is already holding lock:
[   23.089696] 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
[   23.096774]
[   23.096774] other info that might help us debug this:
[   23.103494]  Possible unsafe locking scenario:
[   23.103494]
[   23.109584]        CPU0
[   23.112093]        ----
[   23.114601]   lock(_xmit_ETHER);
[   23.117917]   lock(_xmit_ETHER);
[   23.121233]
[   23.121233]  *** DEADLOCK ***
[   23.121233]
[   23.127325]  May be due to missing lock nesting notation
[   23.127325]
[   23.134315] 2 locks held by ip/2861:
[   23.137987]  #0: 000000003b766c72 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x338/0x4e0
[   23.146231]  #1: 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
[   23.153757]
[   23.153757] stack backtrace:
[   23.158243] CPU: 0 PID: 2861 Comm: ip Not tainted 5.0.0-rc3-00013-g56c857a1b8d3 #118
[   23.166212] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[   23.172843] Call trace:
[   23.175358]  dump_backtrace+0x0/0x188
[   23.179116]  show_stack+0x14/0x20
[   23.182524]  dump_stack+0xb4/0xec
[   23.185928]  __lock_acquire+0x123c/0x1860
[   23.190048]  lock_acquire+0xc8/0x248
[   23.193724]  _raw_spin_lock_bh+0x40/0x58
[   23.197755]  dev_set_rx_mode+0x1c/0x38
[   23.201607]  dev_set_promiscuity+0x3c/0x50
[   23.205820]  dsa_slave_change_rx_flags+0x5c/0x70
[   23.210567]  __dev_set_promiscuity+0x148/0x1e0
[   23.215136]  __dev_set_rx_mode+0x74/0x98
[   23.219167]  dev_uc_add+0x54/0x70
[   23.222575]  macvlan_open+0x170/0x1d0
[   23.226336]  __dev_open+0xe0/0x160
[   23.229830]  __dev_change_flags+0x16c/0x1b8
[   23.234132]  dev_change_flags+0x20/0x60
[   23.238074]  do_setlink+0x2d0/0xc50
[   23.241658]  __rtnl_newlink+0x5f8/0x6e8
[   23.245601]  rtnl_newlink+0x50/0x78
[   23.249184]  rtnetlink_rcv_msg+0x360/0x4e0
[   23.253397]  netlink_rcv_skb+0xe8/0x130
[   23.257338]  rtnetlink_rcv+0x14/0x20
[   23.261012]  netlink_unicast+0x190/0x210
[   23.265043]  netlink_sendmsg+0x288/0x350
[   23.269075]  sock_sendmsg+0x18/0x30
[   23.272659]  ___sys_sendmsg+0x29c/0x2c8
[   23.276602]  __sys_sendmsg+0x60/0xb8
[   23.280276]  __arm64_sys_sendmsg+0x1c/0x28
[   23.284488]  el0_svc_common+0xd8/0x138
[   23.288340]  el0_svc_handler+0x24/0x80
[   23.292192]  el0_svc+0x8/0xc

This looks fairly harmless (no actual deadlock occurs), and is
fixed in a similar way to c6894dec8e ("bridge: fix lockdep
addr_list_lock false positive splat") by putting the addr_list_lock
in its own lockdep class.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 18:30:43 -08:00
Rundong Ge
17ab4f61b8 net: dsa: slave: Don't propagate flag changes on down slave interfaces
The unbalance of master's promiscuity or allmulti will happen after ifdown
and ifup a slave interface which is in a bridge.

When we ifdown a slave interface , both the 'dsa_slave_close' and
'dsa_slave_change_rx_flags' will clear the master's flags. The flags
of master will be decrease twice.
In the other hand, if we ifup the slave interface again, since the
slave's flags were cleared the 'dsa_slave_open' won't set the master's
flag, only 'dsa_slave_change_rx_flags' that triggered by 'br_add_if'
will set the master's flags. The flags of master is increase once.

Only propagating flag changes when a slave interface is up makes
sure this does not happen. The 'vlan_dev_change_rx_flags' had the
same problem and was fixed, and changes here follows that fix.

Fixes: 91da11f870 ("net: Distributed Switch Architecture protocol support")
Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 18:29:35 -08:00
Tonghao Zhang
fc9c5a4a5a net/mlx5: Fix code style issue in mlx driver
Add the tab before '}' and keep the code style consistent.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 18:01:56 -08:00
Stanislav Fomichev
a8a1f7d09c libbpf: fix libbpf_print
With the recent print rework we now have the following problem:
pr_{warning,info,debug} expand to __pr which calls libbpf_print.
libbpf_print does va_start and calls __libbpf_pr with va_list argument.
In __base_pr we again do va_start. Because the next argument is a
va_list, we don't get correct pointer to the argument (and print noting
in my case, I don't know why it doesn't crash tbh).

Fix this by changing libbpf_print_fn_t signature to accept va_list and
remove unneeded calls to va_start in the existing users.

Alternatively, this can we solved by exporting __libbpf_pr and
changing __pr macro to (and killing libbpf_print):
{
	if (__libbpf_pr)
		__libbpf_pr(level, "libbpf: " fmt, ##__VA_ARGS__)
}

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-04 17:45:31 -08:00
Santosh Shilimkar
fd261ce6a3 rds: rdma: update rdma transport for tos
For RDMA transports, RDS TOS is an extension of IB QoS(Annex A13)
to provide clients the ability to segregate traffic flows for
different type of data. RDMA CM abstract it for ULPs using
rdma_set_service_type(). Internally, each traffic flow is
represented by a connection with all of its independent resources
like that of a normal connection, and is differentiated by
service type. In other words, there can be multiple qp connections
between an IP pair and each supports a unique service type.

The feature has been added from RDSv4.1 onwards and supports
rolling upgrades. RDMA connection metadata also carries the tos
information to set up SL on end to end context. The original
code was developed by Bang Nguyen in downstream kernel back in
2.6.32 kernel days and it has evolved over period of time.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04 14:59:13 -08:00
Santosh Shilimkar
56dc8bce9f rds: add transport specific tos_map hook
RDMA transport maps user tos to underline virtual lanes(VL)
for IB or DSCP values. RDMA CM transport abstract thats for
RDS. TCP transport makes use of default priority 0 and maps
all user tos values to it.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04 14:59:13 -08:00
Santosh Shilimkar
3eb450367d rds: add type of service(tos) infrastructure
RDS Service type (TOS) is user-defined and needs to be configured
via RDS IOCTL interface. It must be set before initiating any
traffic and once set the TOS can not be changed. All out-going
traffic from the socket will be associated with its TOS.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04 14:59:12 -08:00
Santosh Shilimkar
d021fabf52 rds: rdma: add consumer reject
For legacy protocol version incompatibility with non linux RDS,
consumer reject reason being used to convey it to peer. But the
choice of reject reason value as '1' was really poor.

Anyway for interoperability reasons with shipping products,
it needs to be supported. For any future versions, properly
encoded reject reason should to be used.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04 14:59:11 -08:00
Santosh Shilimkar
cdc306a5c9 rds: make v3.1 as compat version
Mark RDSv3.1 as compat version and add v4.1 version macro's.
Subsequent patches enable TOS(Type of Service) feature which is
tied with v4.1 for RDMA transport.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
2019-02-04 14:59:11 -08:00
David S. Miller
d3ab9df53e Merge branch 'sh_eth-implement-simple-RX-checksum-offload'
Sergei Shtylyov says:

====================
sh_eth: implement simple RX checksum offload

Here's a set of 7 patches against DaveM's 'net-next.git' repo. I'm implemeting
the simple RX checksum offload (like was done for the 'ravb' driver by Simon
Horman); it has been only tested on the R8A7740 and R8A77980 SoCs, the other
SoCs should just work (according to their manuals)...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 13:31:00 -08:00
Sergei Shtylyov
997feb11b8 sh_eth: offload RX checksum on SH7763
The SH7763 SoC manual describes the Ether MAC's RX checksum offload
the same way as it's implemented in the EtherAVB MACs...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 13:31:00 -08:00
Sergei Shtylyov
06240e1b52 sh_eth: offload RX checksum on SH7734
The SH7734 SoC manual describes the Ether MAC's RX checksum offload
the same way as it's implemented in the EtherAVB MACs...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 13:31:00 -08:00
Sergei Shtylyov
0da843adee sh_eth: offload RX checksum on R8A77980
The R-Car V3H (R8A77980) SoC manual describes the Ether MAC's RX checksum
offload the same way as it's implemented in the EtherAVB MAC...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 13:31:00 -08:00
Sergei Shtylyov
040c16fd59 sh_eth: offload RX checksum on R8A7740
The R-Mobile A1 (R8A7740) SoC manual describes the Ether MAC's RX checksum
offload the same way as it's implemented in the EtherAVB MAC...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 13:31:00 -08:00
Sergei Shtylyov
48132cd0c6 sh_eth: offload RX checksum on R7S72100
The RZ/A1H (R7S721000) SoC manual describes the Ether MAC's RX checksum
offload the same way as it's implemented in the EtherAVB MACs...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 13:31:00 -08:00
Sergei Shtylyov
f8e022db50 sh_eth: RX checksum offload support
Add support for the RX checksum offload. This is enabled by default and
may be disabled and re-enabled using 'ethtool':

# ethtool -K eth0 rx off
# ethtool -K eth0 rx on

Some Ether MACs provide a simple checksumming scheme which appears to be
completely compatible with CHECKSUM_COMPLETE: sum of all packet data after
the L2 header is appended to packet data; this may be trivially read by
the driver and used to update the skb accordingly. The same checksumming
scheme is implemented in the EtherAVB MACs and now supported by the 'ravb'
driver.

In terms of performance, throughput is close to gigabit line rate with the
RX checksum offload both enabled and disabled.  The 'perf' output, however,
appears to indicate that significantly less time is spent in do_csum() --
this is as expected.

Test results with RX checksum offload enabled:

~/netperf-2.2pl4# perf record -a ./netperf -t TCP_MAERTS -H 192.168.2.4
TCP MAERTS TEST to 192.168.2.4
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    10.01     933.93
[ perf record: Woken up 8 times to write data ]
[ perf record: Captured and wrote 1.955 MB perf.data (41940 samples) ]
~/netperf-2.2pl4# perf report
Samples: 41K of event 'cycles:ppp', Event count (approx.): 9915302763
Overhead  Command          Shared Object             Symbol
   9.44%  netperf          [kernel.kallsyms]         [k] __arch_copy_to_user
   7.75%  swapper          [kernel.kallsyms]         [k] _raw_spin_unlock_irq
   6.31%  swapper          [kernel.kallsyms]         [k] default_idle_call
   5.89%  swapper          [kernel.kallsyms]         [k] arch_cpu_idle
   4.37%  swapper          [kernel.kallsyms]         [k] tick_nohz_idle_exit
   4.02%  netperf          [kernel.kallsyms]         [k] _raw_spin_unlock_irq
   2.52%  netperf          [kernel.kallsyms]         [k] preempt_count_sub
   1.81%  netperf          [kernel.kallsyms]         [k] tcp_recvmsg
   1.80%  netperf          [kernel.kallsyms]         [k] _raw_spin_unlock_irqres
   1.78%  netperf          [kernel.kallsyms]         [k] preempt_count_add
   1.36%  netperf          [kernel.kallsyms]         [k] __tcp_transmit_skb
   1.20%  netperf          [kernel.kallsyms]         [k] __local_bh_enable_ip
   1.10%  netperf          [kernel.kallsyms]         [k] sh_eth_start_xmit

Test results with RX checksum offload disabled:

~/netperf-2.2pl4# perf record -a ./netperf -t TCP_MAERTS -H 192.168.2.4
TCP MAERTS TEST to 192.168.2.4
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072  16384  16384    10.01     932.04
[ perf record: Woken up 14 times to write data ]
[ perf record: Captured and wrote 3.642 MB perf.data (78817 samples) ]
~/netperf-2.2pl4# perf report
Samples: 78K of event 'cycles:ppp', Event count (approx.): 18091442796
Overhead  Command          Shared Object       Symbol
   7.00%  swapper          [kernel.kallsyms]   [k] do_csum
   3.94%  swapper          [kernel.kallsyms]   [k] sh_eth_poll
   3.83%  ksoftirqd/0      [kernel.kallsyms]   [k] do_csum
   3.23%  swapper          [kernel.kallsyms]   [k] _raw_spin_unlock_irq
   2.87%  netperf          [kernel.kallsyms]   [k] __arch_copy_to_user
   2.86%  swapper          [kernel.kallsyms]   [k] arch_cpu_idle
   2.13%  swapper          [kernel.kallsyms]   [k] default_idle_call
   2.12%  ksoftirqd/0      [kernel.kallsyms]   [k] sh_eth_poll
   2.02%  swapper          [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
   1.84%  swapper          [kernel.kallsyms]   [k] __softirqentry_text_start
   1.64%  swapper          [kernel.kallsyms]   [k] tick_nohz_idle_exit
   1.53%  netperf          [kernel.kallsyms]   [k] _raw_spin_unlock_irq
   1.32%  netperf          [kernel.kallsyms]   [k] preempt_count_sub
   1.27%  swapper          [kernel.kallsyms]   [k] __pi___inval_dcache_area
   1.22%  swapper          [kernel.kallsyms]   [k] check_preemption_disabled
   1.01%  ksoftirqd/0      [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore

The above results collected on the R-Car V3H Starter Kit board.

Based on the commit 4d86d38186 ("ravb: RX checksum offload")...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 13:31:00 -08:00
Sergei Shtylyov
2c2ab5af7d sh_eth: rename sh_eth_cpu_data::hw_checksum
Commit 62e04b7e0e ("sh_eth: rename 'sh_eth_cpu_data::hw_crc'") renamed
the field to 'hw_checksum' for the Ether DMAC "intelligent checksum",
however some Ether MACs implement a simpler checksumming scheme, so that
name now seems misleading. Rename that field to 'csmr' as the "intelligent
checksum" is always controlled by the CSMR register.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 13:31:00 -08:00
Alexei Starovoitov
1728b11110 Merge branch 'libbpf-btf_ext'
Yonghong Song says:

====================
This patch set exposed a few functions in libbpf.
All these newly added API functions are helpful for
JIT based bpf compilation where .BTF and .BTF.ext
are available as in-memory data blobs.

Patch #1 exposed several btf_ext__* API functions which
are used to handle .BTF.ext ELF sections.
Patch #2 refactored the function bpf_map_find_btf_info()
and exposed API function btf__get_map_kv_tids() to
retrieve the map key/value type id's generated by
bpf program through BPF_ANNOTATE_KV_PAIR macro.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-04 12:48:37 -08:00
Yonghong Song
96408c4344 tools/bpf: implement libbpf btf__get_map_kv_tids() API function
Currently, to get map key/value type id's, the macro
  BPF_ANNOTATE_KV_PAIR(<map_name>, <key_type>, <value_type>)
needs to be defined in the bpf program for the
corresponding map.

During program/map loading time,
the local static function bpf_map_find_btf_info()
in libbpf.c is implemented to retrieve the key/value
type ids given the map name.

The patch refactored function bpf_map_find_btf_info()
to create an API btf__get_map_kv_tids() which includes
the bulk of implementation for the original function.
The API btf__get_map_kv_tids() can be used by bcc,
a JIT based bpf compilation system, which uses the
same BPF_ANNOTATE_KV_PAIR to record map key/value types.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-04 12:48:36 -08:00
Yonghong Song
b8dcf8d149 tools/bpf: expose functions btf_ext__* as API functions
The following set of functions, which manipulates .BTF.ext
section, are exposed as API functions:
  . btf_ext__new
  . btf_ext__free
  . btf_ext__reloc_func_info
  . btf_ext__reloc_line_info
  . btf_ext__func_info_rec_size
  . btf_ext__line_info_rec_size

These functions are useful for JIT based bpf codegen, e.g.,
bcc, to manipulate in-memory .BTF.ext sections.

The signature of function btf_ext__reloc_func_info()
is also changed to be the same as its definition in btf.c.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-04 12:48:36 -08:00
Stanislav Fomichev
7e8a590377 selftests/bpf: use localhost in tcp_{server,client}.py
Bind and connect to localhost. There is no reason for this test to
use non-localhost interface. This lets us run this test in a network
namespace.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-04 21:29:27 +01:00
Heiko Carstens
ecc15f113c s390: bpf: fix JMP32 code-gen
Commit 626a5f66da ("s390: bpf: implement jitting of JMP32") added
JMP32 code-gen support for s390. However it triggers the warning below
due to some unusual gotos in the original s390 bpf jit code.

Add a couple of additional "is_jmp32" initializations to fix this.
Also fix the wrong opcode for the "llilf" instruction that was
introduced with the same commit.

arch/s390/net/bpf_jit_comp.c: In function 'bpf_jit_insn':
arch/s390/net/bpf_jit_comp.c:248:55: warning: 'is_jmp32' may be used uninitialized in this function [-Wmaybe-uninitialized]
  _EMIT6(op1 | reg(b1, b2) << 16 | (rel & 0xffff), op2 | mask); \
                                                       ^
arch/s390/net/bpf_jit_comp.c:1211:8: note: 'is_jmp32' was declared here
   bool is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;

Fixes: 626a5f66da ("s390: bpf: implement jitting of JMP32")
Cc: Jiong Wang <jiong.wang@netronome.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-04 09:45:09 -08:00
David S. Miller
0429f237ce Merge branch 's390-qeth-fixes'
Julian Wiedmann says:

====================
s390/qeth: fixes 2019-02-04

please apply the following four fixes to -net.

Patch 1 takes care of a common resource leak in various error paths, while the
second patch fixes a misordered kfree when cleaning up after an error.
The other two patches ensure that there's no stale work dangling on workqueues
when the qeth device has already been offlined and/or removed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:43:48 -08:00
Julian Wiedmann
c0a2e4d10d s390/qeth: conclude all event processing before offlining a card
Work for Bridgeport events is currently placed on a driver-wide
workqueue. If the card is removed and freed while any such work is still
active, this causes a use-after-free.
So put the events on a per-card queue, where we can control their
lifetime. As we also don't want stale events to last beyond an
offline & online cycle, flush this queue when setting the card offline.

Fixes: b4d72c08b3 ("qeth: bridgeport support - basic control")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:43:48 -08:00
Julian Wiedmann
c2780c1a3f s390/qeth: cancel close_dev work before removing a card
A card's close_dev work is scheduled on a driver-wide workqueue. If the
card is removed and freed while the work is still active, this causes a
use-after-free.
So make sure that the work is completed before freeing the card.

Fixes: 0f54761d16 ("qeth: Support VEPA mode")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:43:48 -08:00
Julian Wiedmann
afa0c5904b s390/qeth: fix use-after-free in error path
The error path in qeth_alloc_qdio_buffers() that takes care of
cleaning up the Output Queues is buggy. It first frees the queue, but
then calls qeth_clear_outq_buffers() with that very queue struct.

Make the call to qeth_clear_outq_buffers() part of the free action
(in the correct order), and while at it fix the naming of the helper.

Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:43:48 -08:00
Julian Wiedmann
5065b2dd3e s390/qeth: release cmd buffer in error paths
Whenever we fail before/while starting an IO, make sure to release the
IO buffer. Usually qeth_irq() would do this for us, but if the IO
doesn't even start we obviously won't get an interrupt for it either.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:43:47 -08:00
Alexei Starovoitov
9fa3b47304 Merge branch 'change-libbpf-print-api'
Yonghong Song says:

====================
These are patches responding to my comments for
Magnus's patch (https://patchwork.ozlabs.org/patch/1032848/).
The goal is to make pr_* macros available to other C files
than libbpf.c, and to simplify API function libbpf_set_print().

Specifically, Patch #1 used global functions
to facilitate pr_* macros in the header files so they
are available in different C files.
Patch #2 removes the global function libbpf_print_level_available()
which is added in Patch 1.
Patch #3 simplified libbpf_set_print() which takes only one print
function with a debug level argument among others.

Changelogs:
 v3 -> v4:
   . rename libbpf internal header util.h to libbpf_util.h
   . rename libbpf internal function libbpf_debug_print() to libbpf_print()
 v2 -> v3:
   . bailed out earlier in libbpf_debug_print() if __libbpf_pr is NULL
   . added missing LIBBPF_DEBUG level check in libbpf.c __base_pr().
 v1 -> v2:
   . Renamed global function libbpf_dprint() to libbpf_debug_print()
     to be more expressive.
   . Removed libbpf_dprint_level_available() as it is used only
     once in btf.c and we can remove it by optimizing for common cases.
====================

Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-04 09:40:59 -08:00
Yonghong Song
6f1ae8b662 tools/bpf: simplify libbpf API function libbpf_set_print()
Currently, the libbpf API function libbpf_set_print()
takes three function pointer parameters for warning, info
and debug printout respectively.

This patch changes the API to have just one function pointer
parameter and the function pointer has one additional
parameter "debugging level". So if in the future, if
the debug level is increased, the function signature
won't change.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-04 09:40:59 -08:00
Yonghong Song
9d100a19ff tools/bpf: print out btf log at LIBBPF_WARN level
Currently, the btf log is allocated and printed out in case
of error at LIBBPF_DEBUG level.
Such logs from kernel are very important for debugging.
For example, bpf syscall BPF_PROG_LOAD command can get
verifier logs back to user space. In function load_program()
of libbpf.c, the log buffer is allocated unconditionally
and printed out at pr_warning() level.

Let us do the similar thing here for btf. Allocate buffer
unconditionally and print out error logs at pr_warning() level.
This can reduce one global function and
optimize for common situations where pr_warning()
is activated either by default or by user supplied
debug output function.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-04 09:40:58 -08:00
Yonghong Song
8461ef8b7e tools/bpf: move libbpf pr_* debug print functions to headers
A global function libbpf_print, which is invisible
outside the shared library, is defined to print based
on levels. The pr_warning, pr_info and pr_debug
macros are moved into the newly created header
common.h. So any .c file including common.h can
use these macros directly.

Currently btf__new and btf_ext__new API has an argument getting
__pr_debug function pointer into btf.c so the debugging information
can be printed there. This patch removed this parameter
from btf__new and btf_ext__new and directly using pr_debug in btf.c.

Another global function libbpf_print_level_available, also
invisible outside the shared library, can test
whether a particular level debug printing is
available or not. It is used in btf.c to
test whether DEBUG level debug printing is availabl or not,
based on which the log buffer will be allocated when loading
btf to the kernel.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-04 09:40:58 -08:00
Petr Machata
c1f7e02979 net: cls_flower: Remove filter from mask before freeing it
In fl_change(), when adding a new rule (i.e. fold == NULL), a driver may
reject the new rule, for example due to resource exhaustion. By that
point, the new rule was already assigned a mask, and it was added to
that mask's hash table. The clean-up path that's invoked as a result of
the rejection however neglects to undo the hash table addition, and
proceeds to free the new rule, thus leaving a dangling pointer in the
hash table.

Fix by removing fnew from the mask's hash table before it is freed.

Fixes: 35cc3cefc4 ("net/sched: cls_flower: Reject duplicated rules also under skip_sw")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:19:14 -08:00