Commit Graph

753672 Commits

Author SHA1 Message Date
Jakub Kicinski
8d93045077 samples: bpf: rename libbpf.h to bpf_insn.h
The libbpf.h file in samples is clashing with libbpf's header.
Since it only includes a subset of filter.h instruction helpers,
rename it to bpf_insn.h.  Drop the unnecessary include of bpf/bpf.h.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 22:52:10 -07:00
Jakub Kicinski
2bf3e2ef42 samples: bpf: include bpf/bpf.h instead of local libbpf.h
There are two files in the tree called libbpf.h which is becoming
problematic.  Most samples don't actually need the local libbpf.h;
they simply include it to get to bpf/bpf.h.  Include bpf/bpf.h
directly instead.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 22:52:10 -07:00
Alexei Starovoitov
fb40c9ddd6 Merge branch 'bpf-jit-cleanups'
Daniel Borkmann says:

====================
This series follows up mostly with some minor cleanups on top
of 'Move ld_abs/ld_ind to native BPF' as well as implements better
32/64 bit immediate load into register and saves tail call init on
cBPF for the arm64 JIT. Last but not least we add a couple of test
cases. For details please see individual patches. Thanks!

v1 -> v2:
  - Minor fix in i64_i16_blocks() to remove 24 shift.
  - Added last two patches.
  - Added Acks from prior round.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:46 -07:00
Daniel Borkmann
a82d8cd398 bpf: add ld64 imm test cases
Add test cases where we combine semi-random imm values, mainly for testing
JITs when they have different encoding options for 64 bit immediates in
order to reduce resulting image size.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:45 -07:00
Daniel Borkmann
56ea6a8b49 bpf, arm64: save 4 bytes in prologue when ebpf insns came from cbpf
We can trivially save 4 bytes in prologue for cBPF since tail calls
can never be used from there. The register push/pop is pairwise,
here, x25 (fp) and x26 (tcc), so no point in changing that, only
reset to zero is not needed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:45 -07:00
Daniel Borkmann
6d2eea6fb0 bpf, arm64: optimize 32/64 immediate emission
Improve the JIT to emit 64 and 32 bit immediates, the current
algorithm is not optimal and we often emit more instructions
than actually needed. arm64 has movz, movn, movk variants but
for the current 64 bit immediates we only use movz with a
series of movk when needed.

For example loading ffffffffffffabab emits the following 4
instructions in the JIT today:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: ffff, shift: 16, result: 00000000ffffabab
  * movk: ffff, shift: 32, result: 0000ffffffffabab
  * movk: ffff, shift: 48, result: ffffffffffffabab

Whereas after the patch the same load only needs a single
instruction:

  * movn: 5454, shift:  0, result: ffffffffffffabab

Another example where two extra instructions can be saved:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: 1f2f, shift: 16, result: 000000001f2fabab
  * movk: ffff, shift: 32, result: 0000ffff1f2fabab
  * movk: ffff, shift: 48, result: ffffffff1f2fabab

After the patch:

  * movn: e0d0, shift: 16, result: ffffffff1f2fffff
  * movk: abab, shift:  0, result: ffffffff1f2fabab

Another example with movz, before:

  * movz: 0000, shift:  0, result: 0000000000000000
  * movk: fea0, shift: 32, result: 0000fea000000000

After:

  * movz: fea0, shift: 32, result: 0000fea000000000

Moreover, reuse emit_a64_mov_i() for 32 bit immediates that
are loaded via emit_a64_mov_i64() which is a similar optimization
as done in 6fe8b9c1f4 ("bpf, x64: save several bytes by using
mov over movabsq when possible"). On arm64, the latter allows to
use a single instruction with movn due to zero extension where
otherwise two would be needed. And last but not least add a
missing optimization in emit_a64_mov_i() where movn is used but
the subsequent movk not needed. With some of the Cilium programs
in use, this shrinks the needed instructions by about three
percent. Tested on Cavium ThunderX CN8890.
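
The selection between movz and movn described above can be modelled
outside the kernel. The following is a minimal Python sketch, not the
JIT's C code: plan_mov_imm64() and simulate() are hypothetical names,
and the rule "start with movn when more halfwords are 0xffff than
0x0000" is an assumption that happens to reproduce the three examples
in this message. It ignores the separate 32 bit path through
emit_a64_mov_i().

```python
MASK64 = (1 << 64) - 1


def plan_mov_imm64(value):
    """Return a list of (mnemonic, imm16, shift) tuples that load `value`."""
    value &= MASK64
    chunks = [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
    # Start from all-ones (movn) when more halfwords are 0xffff than 0x0000.
    use_movn = chunks.count(0xFFFF) > chunks.count(0x0000)
    filler = 0xFFFF if use_movn else 0x0000
    # Halfwords that the initial fill does not already provide, low to high.
    pending = [i for i, chunk in enumerate(chunks) if chunk != filler]
    first = pending.pop() if pending else 0
    if use_movn:
        # movn writes the bitwise NOT of the shifted immediate.
        insns = [("movn", chunks[first] ^ 0xFFFF, 16 * first)]
    else:
        insns = [("movz", chunks[first], 16 * first)]
    for i in reversed(pending):
        insns.append(("movk", chunks[i], 16 * i))
    return insns


def simulate(insns):
    """Apply the planned instructions to a 64 bit register model."""
    reg = 0
    for op, imm, shift in insns:
        if op == "movz":
            reg = imm << shift
        elif op == "movn":
            reg = ~(imm << shift) & MASK64
        else:  # movk replaces one halfword and keeps the others intact
            reg = (reg & ~(0xFFFF << shift) & MASK64) | (imm << shift)
    return reg


for imm in (0xFFFFFFFFFFFFABAB, 0xFFFFFFFF1F2FABAB, 0x0000FEA000000000):
    plan = plan_mov_imm64(imm)
    assert simulate(plan) == imm
    for op, imm16, shift in plan:
        print(f"{op}: {imm16:04x}, shift: {shift:2d}")
```

Running the sketch prints one, two and one instruction for the three
immediates, matching the "after the patch" listings above.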

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:45 -07:00
Daniel Borkmann
09ece3d0f2 bpf, arm64: save 4 bytes of unneeded stack space
Follow-up to 816d9ef32a ("bpf, arm64: remove ld_abs/ld_ind") in
that the extra 4 byte JIT scratchpad is not needed anymore since it
was in ld_abs/ld_ind as stack buffer for bpf_load_pointer().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:45 -07:00
Daniel Borkmann
38ca930601 bpf, arm32: save 4 bytes of unneeded stack space
The extra skb_copy_bits() buffer is not used anymore, therefore
remove the extra 4 byte stack space requirement.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:45 -07:00
Daniel Borkmann
36256009b2 bpf, x64: clean up retpoline emission slightly
Make the RETPOLINE_{RA,ED}X_BPF_JIT() a bit more readable by
cleaning up the macro, aligning comments and spacing.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:45 -07:00
Daniel Borkmann
631b1e3b83 bpf, sparc: remove unused variable
Since fe83963b7c ("bpf, sparc64: remove ld_abs/ld_ind") it's not
used anymore therefore remove it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:45 -07:00
Daniel Borkmann
0631b6583f bpf, mips: remove unused function
The ool_skb_header_pointer() and size_to_len() functions are unused,
as is tmp_offset; therefore remove all of them.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:45 -07:00
Prashant Bhole
53ea24c20c samples/bpf: xdp_monitor, accept short options
Updated optstring parameter for getopt_long() to accept short options.
Also updated usage() function.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-14 23:41:59 +02:00
Daniel Borkmann
1772be37fc Merge branch 'bpf-stackmap-nmi'
Song Liu says:
====================
Changes v2 -> v3:
  Improve syntax based on suggestion by Tobin C. Harding.

Changes v1 -> v2:
  1. Rename some variables to (hopefully) reduce confusion;
  2. Check irq_work status with IRQ_WORK_BUSY (instead of work->sem);
  3. In Kconfig, let BPF_SYSCALL select IRQ_WORK;
  4. Add static to DEFINE_PER_CPU();
  5. Remove pr_info() in stack_map_init().
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-14 23:29:46 +02:00
Song Liu
13790d1cc7 bpf: add selftest for stackmap with build_id in NMI context
This new test captures stackmap with build_id with hardware event
PERF_COUNT_HW_CPU_CYCLES.

Because we only support one ips-to-build_id lookup per cpu in NMI
context, stack_amap will not be able to do the lookup in this test.
Therefore, we didn't do compare_stack_ips(), as it will always fail.

urandom_read.c is extended to run configurable cycles so that it can be
caught by the perf event.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-14 23:29:45 +02:00
Song Liu
bae77c5eb5 bpf: enable stackmap with build_id in nmi context
Currently, we cannot parse build_id in nmi context because of
up_read(&current->mm->mmap_sem), this makes stackmap with build_id
less useful. This patch enables parsing build_id in nmi by putting
the up_read() call in irq_work. To avoid memory allocation in nmi
context, we use per cpu variable for the irq_work. As a result, only
one irq_work per cpu is allowed. If the irq_work is in use, we
fall back to reporting only ips.
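
The single-slot fallback can be pictured with a toy Python model. This
is only a sketch of the control flow described above, not kernel code:
PerCpuWork, collect_stack(), run_deferred() and the lookup_build_id
callback are all hypothetical names, and the "deferred release" simply
stands in for the up_read() that the patch moves into irq_work.

```python
class PerCpuWork:
    """Toy stand-in for one preallocated per-CPU deferred-work slot."""

    def __init__(self):
        self.busy = False


def collect_stack(slot, ips, lookup_build_id):
    """Return build_id entries if the slot is free, plain ips otherwise."""
    if slot.busy:
        # Slot still owns a pending release: report instruction pointers only.
        return [("ip", ip) for ip in ips]
    # Claim the slot; the lock release is deferred to run_deferred().
    slot.busy = True
    return [("build_id", lookup_build_id(ip)) for ip in ips]


def run_deferred(slot):
    """Model the deferred work running later and freeing the slot."""
    slot.busy = False


slot = PerCpuWork()
fake_lookup = lambda ip: f"id-{ip:#x}"  # hypothetical lookup for the demo

first = collect_stack(slot, [0x1000], fake_lookup)
second = collect_stack(slot, [0x2000], fake_lookup)  # slot still busy
run_deferred(slot)
third = collect_stack(slot, [0x3000], fake_lookup)
print(first, second, third)
```

Because no memory is allocated on the sampling path, a second sample
arriving before the deferred work has run gets the degraded ips-only
result rather than blocking.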

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-14 23:29:45 +02:00
Daniel Borkmann
a84880ef43 Merge branch 'bpf-perf-rb-libbpf'
Jakub Kicinski says:

====================
This series started out as a follow up to the bpftool perf event dumping
patches.

As suggested by Daniel patch 1 makes use of PERF_SAMPLE_TIME to simplify
code and improve accuracy of timestamps.

Remaining patches are trying to move perf event loop into libbpf as
suggested by Alexei.  One user for this new function is bpftool which
links with libbpf nicely, the other, unfortunately, is in samples/bpf.
Remaining patches make samples/bpf link against full libbpf.a (not just
a handful of objects).  Once we have full power of libbpf at our disposal
we can convert some of XDP samples to use libbpf loader instead of
bpf_load.c.  My understanding is that this is the desired direction,
at least for networking code.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:48:32 +02:00
Jakub Kicinski
be5bca44aa samples: bpf: convert some XDP samples from bpf_load to libbpf
Now that we can use full powers of libbpf in BPF samples, we
should perhaps make the simplest XDP programs not depend on
bpf_load helpers.  This way newcomers will be exposed to the
recommended library from the start.

Use of bpf_prog_load_xattr() will also make it trivial to later
on request offload of the programs by simply adding ifindex to
the xattr.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:44:17 +02:00
Jakub Kicinski
17387dd5ac tools: bpf: don't complain about no kernel version for networking code
BPF programs only have to specify the target kernel version for
tracing related hooks, in networking world that requirement does
not really apply.  Loosen the checks in libbpf to reflect that.

bpf_object__open() users will continue to see the error for backward
compatibility (and because prog_type is not available there).

Error code for NULL file name is changed from ENOENT to EINVAL,
as it seems more appropriate, hopefully, that's an OK change.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:52 +02:00
Jakub Kicinski
2eb57bb8f6 tools: bpf: improve comments in libbpf.h
Fix spelling mistakes, improve and clarify the language of comments
in libbpf.h.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:52 +02:00
Jakub Kicinski
d0cabbb021 tools: bpf: move the event reading loop to libbpf
There are two copies of event reading loop - in bpftool and
trace_helpers "library".  Consolidate them and move the code
to libbpf.  Return codes from trace_helpers are kept, but
renamed to include LIBBPF prefix.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:52 +02:00
Jakub Kicinski
5f9380572b samples: bpf: compile and link against full libbpf
samples/bpf currently cherry-picks object files from tools/lib/bpf
to link against.  Just compile the full library and link statically
against it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:52 +02:00
Jakub Kicinski
74662ea5d4 samples: bpf: rename struct bpf_map_def to avoid conflict with libbpf
Both tools/lib/bpf/libbpf.h and samples/bpf/bpf_load.h define their
own version of struct bpf_map_def.  The version in bpf_load.h has
more fields.  libbpf does not support inner maps and its definition
of struct bpf_map_def lacks the related fields.  Rename the definition
in bpf_load.h (samples/bpf) to avoid conflicts.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:51 +02:00
Jakub Kicinski
e3687510fc tools: bpftool: use PERF_SAMPLE_TIME instead of reading the clock
Ask the kernel to include sample time in each event instead of
reading the clock.  This is also more accurate because our
clock reading was done when user space would dump the buffer,
not when sample was produced.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:40:51 +02:00
Prashant Bhole
cb9c28ef57 bpf: sync tools bpf.h uapi header
Sync the header from include/uapi/linux/bpf.h which was updated to add
fib lookup helper function. This fixes selftests/bpf build failure.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:35:25 +02:00
Joe Stringer
91bc07c9e8 selftests/bpf: Fix bash reference in Makefile
'|& ...' is a bash 4.0+ construct which is not guaranteed to be available
when using '$(shell ...)' in a Makefile. Fall back to the more portable
'2>&1 | ...'.

Fixes the following warning during compilation:

	/bin/sh: 1: Syntax error: "&" unexpected

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 01:32:07 +02:00
Daniel Borkmann
ff1f56d987 Merge branch 'bpf-fib-lookup-helper'
David Ahern says:

====================
Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet is expected to continue up the stack
for full processing.

The response from a FIB and neighbor lookup is either the egress index
with the bpf_fib_lookup struct filled in with dmac and gateway or
0 meaning the packet should continue up the stack. In time we can
revisit this to return the FIB lookup result errno if it is one of the
special RTN_'s such as RTN_BLACKHOLE (-EINVAL) so that the XDP
programs can do an early drop if desired.

Patches 1-6 do some more refactoring to IPv6 with the end goal of
extracting a FIB lookup function that aligns with fib_lookup for IPv4,
basically returning a fib6_info without creating a dst based entry.

Patch 7 adds lookup functions to the ipv6 stub. These are needed since
bpf is built into the kernel and ipv6 may not be built or loaded.

Patch 8 adds the bpf helper and 9 adds a sample program.

v3
- remove ETH_ALEN and in6_addr from uapi header

v2
- removed pkt_access from bpf_func_proto as noticed by Daniel
- added check that IPv6 forwarding is enabled
- added DaveM's ack on patches 1-7 and 9 based on v1 response and
  fact that no changes were made to them in v2

v1
- updated commit messages and cover letter
- added comment to sample program noting lack of verification on
  egress device supporting XDP

RFC v2
- fixed use of forward helper from cls_act as noted by Daniel
- in patch 1 rename fib6_lookup_1 as well for consistency
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:59 +02:00
David Ahern
fe616055f7 samples/bpf: Add example of ipv4 and ipv6 forwarding in XDP
Simple example of fast-path forwarding. It has a serious flaw
in not verifying the egress device index supports XDP forwarding.
If the egress device does not, packets are dropped.

Take this only as a simple example of fast-path forwarding.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
David Ahern
87f5fc7e48 bpf: Provide helper to do forwarding lookups in kernel FIB table
Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet continues up the stack.

If it is to be forwarded, the forwarding can be done directly if the
neighbor is already known. If the neighbor does not exist, the first
few packets go up the stack for neighbor resolution. Once resolved, the
xdp program provides the fast path.

On successful lookup the nexthop dmac, current device smac and egress
device index are returned.

The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6
are implemented in this patch. The API includes layer 4 parameters if
the XDP program chooses to do deep packet inspection to allow compare
against ACLs implemented as FIB rules.

Header rewrite is left to the XDP program.

The lookup takes 2 flags:
- BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
  straight to the table associated with the device (expert setting for
  those looking to maximize throughput)

- BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective.
  Default is an ingress lookup.

Initial performance numbers collected by Jesper, forwarded packets/sec:

       Full stack    XDP FIB lookup    XDP Direct lookup
IPv4   1,947,969       7,074,156          7,415,333
IPv6   1,728,000       6,165,504          7,262,720

These numbers are single CPU core forwarding on a Broadwell
E5-1650 v4 @ 3.60GHz.
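
To put the table in relative terms, the speedup over the full stack can
be computed directly from the figures above. The short Python sketch
below does only that arithmetic; the dictionary keys and the speedups()
helper are hypothetical names chosen for illustration.

```python
# Forwarded packets/sec, copied from the table above.
rates = {
    "IPv4": {"full_stack": 1_947_969, "xdp_fib": 7_074_156, "xdp_direct": 7_415_333},
    "IPv6": {"full_stack": 1_728_000, "xdp_fib": 6_165_504, "xdp_direct": 7_262_720},
}


def speedups(row):
    """Return each XDP variant's rate divided by the full-stack rate."""
    base = row["full_stack"]
    return {name: pps / base for name, pps in row.items() if name != "full_stack"}


for proto, row in rates.items():
    for name, ratio in speedups(row).items():
        print(f"{proto} {name}: {ratio:.2f}x")
```

On these numbers both XDP variants forward roughly 3.6 to 4.2 times as
many packets per second as the full stack.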

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
David Ahern
65a2022e89 net/ipv6: Add fib lookup stubs for use in bpf helper
Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.

The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
David Ahern
d4bea421f7 net/ipv6: Update fib6 tracepoint to take fib6_info
Similar to IPv4, IPv6 should use the FIB lookup result in the
tracepoint.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
David Ahern
138118ec96 net/ipv6: Add fib6_lookup
Add IPv6 equivalent to fib_lookup. Does a fib lookup, including rules,
but returns a FIB entry, fib6_info, rather than a dst based rt6_info.
fib6_lookup is anywhere from 140% (MULTIPLE_TABLES config disabled)
to 60% faster than any of the dst based lookup methods (without custom
rules) and 25% faster with custom rules (e.g., l3mdev rule).

Since the lookup function has a completely different signature,
fib6_rule_action is split into 2 paths: the existing one is
renamed __fib6_rule_action and a new one for the fib6_info path
is added. fib6_rule_action decides which to call based on the
lookup_ptr. If it is fib6_table_lookup then the new path is taken.

Caller must hold rcu lock as no reference is taken on the returned
fib entry.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
David Ahern
cc065a9eb9 net/ipv6: Refactor fib6_rule_action
Move source address lookup from fib6_rule_action to a helper. It will be
used in a later patch by a second variant for fib6_rule_action.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
David Ahern
1d053da910 net/ipv6: Extract table lookup from ip6_pol_route
ip6_pol_route is used for ingress and egress FIB lookups. Refactor it
moving the table lookup into a separate fib6_table_lookup that can be
invoked separately and export the new function.

ip6_pol_route now calls fib6_table_lookup and uses the result to generate
a dst based rt6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
David Ahern
3b290a31bb net/ipv6: Rename rt6_multipath_select
Rename rt6_multipath_select to fib6_multipath_select and export it.
A later patch wants access to it similar to IPv4's fib_select_path.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
David Ahern
6454743bc1 net/ipv6: Rename fib6_lookup to fib6_node_lookup
Rename fib6_lookup to fib6_node_lookup to better reflect what it
returns. The fib6_lookup name will be used in a later patch for
an IPv6 equivalent to IPv4's fib_lookup.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:56 +02:00
Wang YanQing
68625b7631 bpf, doc: clarification for the meaning of 'id'
For me, as a reader whose mother language isn't English, the
old words bring a little difficulty to catch the meaning, this
patch rewords the subsection in a more clarificatory way.

This patch also adds blank lines as separator at two places
to improve readability.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:07:14 +02:00
Sirio Balmelli
96112e9363 selftests/bpf: ignore build products
Update .gitignore files.

Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-10 00:41:06 +02:00
Sirio Balmelli
cd65cd9512 selftests/bpf: add architecture-agnostic headers
The BPF selftests fail to build with missing headers
'asm/bitsperlong.h' and 'asm/errno.h'.

These already exist in 'tools/arch/[arch]/include';
add architecture-agnostic header files in 'tools/include/uapi'
to reference them.

Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-10 00:40:58 +02:00
Björn Töpel
ea7e343529 xsk: fix 64-bit division
i386 builds report:
  net/xdp/xdp_umem.o: In function `xdp_umem_reg':
  xdp_umem.c:(.text+0x47e): undefined reference to `__udivdi3'

This fix uses div_u64 instead of the GCC built-in.

Fixes: c0c77d8fb7 ("xsk: add user memory registration support sockopt")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 18:12:21 +02:00
Daniel Borkmann
a46a5c1a43 Merge branch 'bpf-nfp-programmable-rss'
Jakub Kicinski says:

====================
This small series adds a feature which extends BPF offload beyond
a pure host processing offload and firmly into the realm of
heterogeneous processing.  Allowing offloaded XDP programs to set
the RX queue index opens the door for defining fully programmable
RSS/n-tuple filter replacement.  In fact the device datapath will
skip the RSS processing completely if BPF decided on the queue
already, making the XDP program replace part of the standard NIC
datapath.

We hope some day the entire NIC datapath will be defined by BPF :)
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2018-05-09 18:04:39 +02:00
Jakub Kicinski
d985888faa nfp: bpf: support setting the RX queue index
BPF has access to all internal FW datapath structures, including
the structure containing RX queue selection.  With little coordination
with the datapath we can let the offloaded BPF select the RX queue.
We just need a way to tell the datapath that queue selection has already
been done and it shouldn't overwrite it.  Define a bit to tell datapath
BPF already selected a queue (QSEL_SET), if the selected queue is not
enabled (>= number of enabled queues) datapath will perform normal RSS.

BPF queue selection on the NIC can be used to replace standard
datapath RSS with fully programmable BPF/XDP RSS.
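
The rule described above (honour the BPF choice only when QSEL_SET is
set and the queue is enabled, otherwise fall back to RSS) can be
written as a small Python model. It is a sketch under stated
assumptions, not firmware or driver code: pick_rx_queue() is a
hypothetical name, and "rss_hash % enabled_queues" is a simplified
stand-in for whatever the datapath's normal RSS does.

```python
def pick_rx_queue(qsel_set, bpf_queue, enabled_queues, rss_hash):
    """Return the RX queue index for one packet in this toy model."""
    if qsel_set and 0 <= bpf_queue < enabled_queues:
        # BPF already selected a valid queue: do not overwrite it.
        return bpf_queue
    # No selection, or the selected queue is not enabled: normal RSS.
    return rss_hash % enabled_queues


print(pick_rx_queue(True, 2, 4, 7))   # BPF choice is kept
print(pick_rx_queue(True, 9, 4, 7))   # queue 9 is not enabled, RSS decides
print(pick_rx_queue(False, 2, 4, 7))  # flag not set, RSS decides
```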

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 18:04:37 +02:00
Jakub Kicinski
0d83003256 bpf: xdp: allow offloads to store into rx_queue_index
It's fairly easy for offloaded XDP programs to select the RX queue
packets go to.  We need a way of expressing this in the software.
Allow write to the rx_queue_index field of struct xdp_md for
device-bound programs.

Skip convert_ctx_access callback entirely for offloads.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 18:04:36 +02:00
Daniel Borkmann
a1d1f07935 Merge branch 'bpf-btf-id'
Martin KaFai Lau says:

====================
This series introduces BTF ID which is exposed through
the new BPF_BTF_GET_FD_BY_ID cmd, new "struct bpf_btf_info"
and new members in the "struct bpf_map_info".

Please see individual patch for details.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 17:25:16 +02:00
Martin KaFai Lau
cd8b89280c bpf: btf: Tests for BPF_OBJ_GET_INFO_BY_FD and BPF_BTF_GET_FD_BY_ID
This patch adds test for BPF_BTF_GET_FD_BY_ID and the new
btf_id/btf_key_id/btf_value_id in the "struct bpf_map_info".

It also modifies the existing BPF_OBJ_GET_INFO_BY_FD test
to reflect the new "struct bpf_btf_info".

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 17:25:14 +02:00
Martin KaFai Lau
7a01f6a3bd bpf: btf: Update tools/include/uapi/linux/btf.h with BTF ID
This patch syncs the tools/include/uapi/linux/btf.h with
the newly introduced BTF ID support.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 17:25:14 +02:00
Martin KaFai Lau
e34d98d3b4 bpf: btf: Some test_btf clean up
This patch adds a CHECK() macro for condition checking
and error reporting, similar to the one in test_progs.c.

It also counts the number of tests passed/skipped/failed and
prints them at the end of the test run.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 17:25:14 +02:00
Martin KaFai Lau
62dab84c81 bpf: btf: Add struct bpf_btf_info
During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the current bpf_attr's
info.info is directly filled with the BTF binary data.  It is
not extensible.  In this case, we want to add BTF ID.

This patch adds "struct bpf_btf_info" which has the BTF ID as
one of its members.  The BTF binary data itself is exposed through
the "btf" and "btf_size" members.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 17:25:13 +02:00
Martin KaFai Lau
78958fca7e bpf: btf: Introduce BTF ID
This patch gives an ID to each loaded BTF.  The ID is allocated by
the idr like the existing prog-id and map-id.

The btf_put(map->btf) is moved to __bpf_map_put() so that the
userspace can stop seeing the BTF ID ASAP when the last BTF
refcnt is gone.

It also makes BTF accessible from userspace through the
1. new BPF_BTF_GET_FD_BY_ID command.  It is limited to CAP_SYS_ADMIN
   which is in line with the BPF_BTF_LOAD cmd and the existing
   BPF_[MAP|PROG]_GET_FD_BY_ID cmd.
2. new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info"

Once the BTF ID handler is accessible from userspace, freeing a BTF
object has to go through a rcu period.  The BPF_BTF_GET_FD_BY_ID cmd
can then be done under a rcu_read_lock() instead of taking
spin_lock.
[Note: A similar rcu usage can be done to the existing
       bpf_prog_get_fd_by_id() in a follow up patch]

When processing the BPF_BTF_GET_FD_BY_ID cmd,
refcount_inc_not_zero() is needed because the BTF object
could be already in the rcu dead row.  btf_get() is
removed since its usage is currently limited to btf.c
alone.  refcount_inc() is used directly instead.
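
Why an "increment unless zero" is needed for lookup by ID can be shown
with a toy Python model. This is an illustration only: the Refcount and
Btf classes and get_by_id() are hypothetical, single-threaded stand-ins,
not the kernel's refcount_t or idr, and they model just the one property
that matters here, namely that an object whose count already reached
zero must not be revived.

```python
class Refcount:
    """Minimal single-threaded model of a reference counter."""

    def __init__(self, value):
        self.value = value

    def inc_not_zero(self):
        """Take a reference only if the object is still alive."""
        if self.value == 0:
            return False
        self.value += 1
        return True

    def dec_and_test(self):
        """Drop a reference; return True when it was the last one."""
        self.value -= 1
        return self.value == 0


class Btf:
    def __init__(self, btf_id):
        self.id = btf_id
        self.refcnt = Refcount(1)


def get_by_id(table, btf_id):
    """Look up an object by ID, refusing ones that are already dying."""
    obj = table.get(btf_id)
    if obj is None or not obj.refcnt.inc_not_zero():
        return None
    return obj


table = {1: Btf(1)}
held = get_by_id(table, 1)             # succeeds, count goes 1 -> 2
held.refcnt.dec_and_test()             # lookup reference dropped
last = table[1].refcnt.dec_and_test()  # owner drops the last reference
print(last, get_by_id(table, 1))       # entry still visible, lookup refused
```

In the model the entry stays in the table after its count hits zero,
which mirrors the window in which a freed object is still waiting out
the rcu period.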

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 17:25:13 +02:00
Martin KaFai Lau
82e9697250 bpf: btf: Avoid WARN_ON when CONFIG_REFCOUNT_FULL=y
If CONFIG_REFCOUNT_FULL=y, refcount_inc() WARNs when the refcount is 0.
When creating a new btf, the initial btf->refcnt is 0, which
triggered the following:

[   34.855452] refcount_t: increment on 0; use-after-free.
[   34.856252] WARNING: CPU: 6 PID: 1857 at lib/refcount.c:153 refcount_inc+0x26/0x30
....
[   34.868809] Call Trace:
[   34.869168]  btf_new_fd+0x1af6/0x24d0
[   34.869645]  ? btf_type_seq_show+0x200/0x200
[   34.870212]  ? lock_acquire+0x3b0/0x3b0
[   34.870726]  ? security_capable+0x54/0x90
[   34.871247]  __x64_sys_bpf+0x1b2/0x310
[   34.871761]  ? __ia32_sys_bpf+0x310/0x310
[   34.872285]  ? bad_area_access_error+0x310/0x310
[   34.872894]  do_syscall_64+0x95/0x3f0

This patch uses refcount_set() instead.

Reported-by: Yonghong Song <yhs@fb.com>
Tested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 17:25:13 +02:00
Fabio Estevam
53a7bdfb2a dt-bindings: dsa: Remove unnecessary #address/#size-cells
If the example binding is used on a real dts file, the following DTC
warning is seen with W=1:

arch/arm/boot/dts/imx6q-b450v3.dtb: Warning (avoid_unnecessary_addr_size): /mdio-gpio/switch@0: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property

Remove unnecessary #address-cells/#size-cells to improve the binding
document examples.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08 20:28:44 -04:00