Commit Graph

1925 Commits

Author SHA1 Message Date
Andrii Nakryiko
b0e875bac0 libbpf: Fix memory leak in strset
Free struct strset itself, not just its internal parts.

Fixes: 90d76d3ece ("libbpf: Extract internal set-of-strings datastructure APIs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211001185910.86492-1-andrii@kernel.org
2021-10-01 22:54:38 +02:00
Jakub Kicinski
dd9a887b35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/phy/bcm7xxx.c
  d88fd1b546 ("net: phy: bcm7xxx: Fixed indirect MMD operations")
  f68d08c437 ("net: phy: bcm7xxx: Add EPHY entry for 72165")

net/sched/sch_api.c
  b193e15ac6 ("net: prevent user from passing illegal stab size")
  69508d4333 ("net_sched: Use struct_size() and flex_array_size() helpers")

Both cases trivial - adjacent code additions.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30 14:49:21 -07:00
Kumar Kartikeya Dwivedi
4729445b47 libbpf: Fix segfault in light skeleton for objects without BTF
When fed an empty BPF object, bpftool gen skeleton -L crashes at
btf__set_fd() since it assumes presence of obj->btf, however for
the sequence below clang adds no .BTF section (hence no BTF).

Reproducer:

  $ touch a.bpf.c
  $ clang -O2 -g -target bpf -c a.bpf.c
  $ bpftool gen skeleton -L a.bpf.o
  /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
  /* THIS FILE IS AUTOGENERATED! */

  struct a_bpf {
	struct bpf_loader_ctx ctx;
  Segmentation fault (core dumped)

The same occurs for files compiled without BTF info, i.e. without
clang's -g flag.

Fixes: 6723474373 (libbpf: Generate loader program out of BPF ELF file.)
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210930061634.1840768-1-memxor@gmail.com
2021-09-30 23:19:58 +02:00
Kumar Kartikeya Dwivedi
e68ac00827 libbpf: Fix skel_internal.h to set errno on loader retval < 0
When the loader indicates an internal error (result of a checked bpf
system call), it returns the result in attr.test.retval. However, tests
that rely on ASSERT_OK_PTR on NULL (returned from light skeleton) may
miss that NULL denotes an error if errno is set to 0. This would result
in skel pointer being NULL, while ASSERT_OK_PTR returning 1, leading to
a SEGV on dereference of skel, because libbpf_get_error relies on the
assumption that errno is always set in case of error for ptr == NULL.

In particular, this was observed for the ksyms_module test. When
executed using `./test_progs -t ksyms`, prior tests manipulated errno
and the test didn't crash when it failed at ksyms_module load, while
using `./test_progs -t ksyms_module` crashed due to errno being
untouched.

Fixes: 6723474373 (libbpf: Generate loader program out of BPF ELF file.)
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210927145941.1383001-11-memxor@gmail.com
2021-09-29 20:42:32 -07:00
Toke Høiland-Jørgensen
161ecd5379 libbpf: Properly ignore STT_SECTION symbols in legacy map definitions
The previous patch to ignore STT_SECTION symbols only added the ignore
condition in one of them. This fails if there's more than one map
definition in the 'maps' section, because the subsequent modulus check will
fail, resulting in error messages like:

libbpf: elf: unable to determine legacy map definition size in ./xdpdump_xdp.o

Fix this by also ignoring STT_SECTION in the first loop.

Fixes: c3e8c44a90 ("libbpf: Ignore STT_SECTION symbols in 'maps' section")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210929213837.832449-1-toke@redhat.com
2021-09-29 15:50:32 -07:00
Alexei Starovoitov
66fe332417 libbpf: Make gen_loader data aligned.
Align gen_loader data to 8 byte boundary to make sure union bpf_attr,
bpf_insns and other structs are aligned.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210927145941.1383001-9-memxor@gmail.com
2021-09-29 13:27:19 -07:00
Andrii Nakryiko
7c80c87ad5 selftests/bpf: Switch sk_lookup selftests to strict SEC("sk_lookup") use
Update "sk_lookup/" definition to be a stand-alone type specifier,
with backwards-compatible prefix match logic in non-libbpf-1.0 mode.

Currently in selftests all the "sk_lookup/<whatever>" uses just use
<whatever> for duplicated unique name encoding, which is redundant as
BPF program's name (C function name) uniquely and descriptively
identifies the intended use for such BPF programs.

With libbpf's SEC_DEF("sk_lookup") definition updated, switch existing
sk_lookup programs to use "unqualified" SEC("sk_lookup") section names,
with no random text after it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-11-andrii@kernel.org
2021-09-28 13:51:20 -07:00
Andrii Nakryiko
dd94d45cf0 libbpf: Add opt-in strict BPF program section name handling logic
Implement strict ELF section name handling for BPF programs. It utilizes
`libbpf_set_strict_mode()` framework and adds new flag: LIBBPF_STRICT_SEC_NAME.

If this flag is set, libbpf will enforce exact section name matching for
a lot of program types that previously allowed just partial prefix
match. E.g., if previously SEC("xdp_whatever_i_want") was allowed, now
in strict mode only SEC("xdp") will be accepted, which makes SEC("")
definitions cleaner and more structured. SEC() now won't be used as yet
another way to uniquely encode BPF program identifier (for that
C function name is better and is guaranteed to be unique within
bpf_object). Now SEC() is strictly BPF program type and, depending on
program type, extra load/attach parameter specification.

Libbpf completely supports multiple BPF programs in the same ELF
section, so multiple BPF programs of the same type/specification easily
co-exist together within the same bpf_object scope.

Additionally, a new (for now internal) convention is introduced: section
name that can be a stand-alone exact BPF program type specificator, but
also could have extra parameters after '/' delimiter. An example of such
section is "struct_ops", which can be specified by itself, but also
allows to specify the intended operation to be attached to, e.g.,
"struct_ops/dctcp_init". Note, that "struct_ops_some_op" is not allowed.
Such section definition is specified as "struct_ops+".

This change is part of libbpf 1.0 effort ([0], [1]).

  [0] Closes: https://github.com/libbpf/libbpf/issues/271
  [1] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handling

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-10-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko
d41ea045a6 libbpf: Complete SEC() table unification for BPF_APROG_SEC/BPF_EAPROG_SEC
Complete SEC() table refactoring towards unified form by rewriting
BPF_APROG_SEC and BPF_EAPROG_SEC definitions with
SEC_DEF(SEC_ATTACHABLE_OPT) (for optional expected_attach_type) and
SEC_DEF(SEC_ATTACHABLE) (mandatory expected_attach_type), respectively.
Drop BPF_APROG_SEC, BPF_EAPROG_SEC, and BPF_PROG_SEC_IMPL macros after
that, leaving SEC_DEF() macro as the only one used.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-9-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko
15ea31fadd libbpf: Refactor ELF section handler definitions
Refactor ELF section handler definitions table to use a set of flags and
unified SEC_DEF() macro. This allows for more succinct and table-like
set of definitions, and allows to more easily extend the logic without
adding more verbosity (this is utilized in later patches in the series).

This approach is also making libbpf-internal program pre-load callback
not rely on bpf_sec_def definition, which demonstrates that future
pluggable ELF section handlers will be able to achieve similar level of
integration without libbpf having to expose extra types and APIs.

For starters, update SEC_DEF() definitions and make them more succinct.
Also convert BPF_PROG_SEC() and BPF_APROG_COMPAT() definitions to
a common SEC_DEF() use.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-8-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko
13d35a0cf1 libbpf: Reduce reliance of attach_fns on sec_def internals
Move closer to not relying on bpf_sec_def internals that won't be part
of public API, when pluggable SEC() handlers will be allowed. Drop
pre-calculated prefix length, and in various helpers don't rely on this
prefix length availability. Also minimize reliance on knowing
bpf_sec_def's prefix for few places where section prefix shortcuts are
supported (e.g., tp vs tracepoint, raw_tp vs raw_tracepoint).

Given checking some string for having a given string-constant prefix is
such a common operation and so annoying to be done with pure C code, add
a small macro helper, str_has_pfx(), and reuse it throughout libbpf.c
where prefix comparison is performed. With __builtin_constant_p() it's
possible to have a convenient helper that checks some string for having
a given prefix, where prefix is either string literal (or compile-time
known string due to compiler optimization) or just a runtime string
pointer, which is quite convenient and saves a lot of typing and string
literal duplication.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-7-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko
12d9466d8b libbpf: Refactor internal sec_def handling to enable pluggability
Refactor internals of libbpf to allow adding custom SEC() handling logic
easily from outside of libbpf. To that effect, each SEC()-handling
registration sets mandatory program type/expected attach type for
a given prefix and can provide three callbacks called at different
points of BPF program lifetime:

  - init callback for right after bpf_program is initialized and
  prog_type/expected_attach_type is set. This happens during
  bpf_object__open() step, close to the very end of constructing
  bpf_object, so all the libbpf APIs for querying and updating
  bpf_program properties should be available;

  - pre-load callback is called right before BPF_PROG_LOAD command is
  called in the kernel. This callbacks has ability to set both
  bpf_program properties, as well as program load attributes, overriding
  and augmenting the standard libbpf handling of them;

  - optional auto-attach callback, which makes a given SEC() handler
  support auto-attachment of a BPF program through bpf_program__attach()
  API and/or BPF skeletons <skel>__attach() method.

Each callbacks gets a `long cookie` parameter passed in, which is
specified during SEC() handling. This can be used by callbacks to lookup
whatever additional information is necessary.

This is not yet completely ready to be exposed to the outside world,
mainly due to non-public nature of struct bpf_prog_load_params. Instead
of making it part of public API, we'll wait until the planned low-level
libbpf API improvements for BPF_PROG_LOAD and other typical bpf()
syscall APIs, at which point we'll have a public, probably OPTS-based,
way to fully specify BPF program load parameters, which will be used as
an interface for custom pre-load callbacks.

But this change itself is already a good first step to unify the BPF
program hanling logic even within the libbpf itself. As one example, all
the extra per-program type handling (sleepable bit, attach_btf_id
resolution, unsetting optional expected attach type) is now more obvious
and is gathered in one place.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-6-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko
9673268f03 libbpf: Add "tc" SEC_DEF which is a better name for "classifier"
As argued in [0], add "tc" ELF section definition for SCHED_CLS BPF
program type. "classifier" is a misleading terminology and should be
migrated away from.

  [0] https://lore.kernel.org/bpf/270e27b1-e5be-5b1c-b343-51bd644d0747@iogearbox.net/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-2-andrii@kernel.org
2021-09-28 13:51:19 -07:00
David S. Miller
4ccb9f03fe Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-09-28

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 14 day(s) which contain
a total of 11 files changed, 139 insertions(+), 53 deletions(-).

The main changes are:

1) Fix MIPS JIT jump code emission for too large offsets, from Piotr Krysiuk.

2) Fix x86 JIT atomic/fetch emission when dst reg maps to rax, from Johan Almbladh.

3) Fix cgroup_sk_alloc corner case when called from interrupt, from Daniel Borkmann.

4) Fix segfault in libbpf's linker for objects without BTF, from Kumar Kartikeya Dwivedi.

5) Fix bpf_jit_charge_modmem for applications with CAP_BPF, from Lorenz Bauer.

6) Fix return value handling for struct_ops BPF programs, from Hou Tao.

7) Various fixes to BPF selftests, from Jiri Benc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
,
2021-09-28 13:52:46 +01:00
Kumar Kartikeya Dwivedi
bcfd367c28 libbpf: Fix segfault in static linker for objects without BTF
When a BPF object is compiled without BTF info (without -g),
trying to link such objects using bpftool causes a SIGSEGV due to
btf__get_nr_types accessing obj->btf which is NULL. Fix this by
checking for the NULL pointer, and return error.

Reproducer:
$ cat a.bpf.c
extern int foo(void);
int bar(void) { return foo(); }
$ cat b.bpf.c
int foo(void) { return 0; }
$ clang -O2 -target bpf -c a.bpf.c
$ clang -O2 -target bpf -c b.bpf.c
$ bpftool gen obj out a.bpf.o b.bpf.o
Segmentation fault (core dumped)

After fix:
$ bpftool gen obj out a.bpf.o b.bpf.o
libbpf: failed to find BTF info for object 'a.bpf.o'
Error: failed to link 'a.bpf.o': Unknown error -22 (-22)

Fixes: a46349227c (libbpf: Add linker extern resolution support for functions and global variables)
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210924023725.70228-1-memxor@gmail.com
2021-09-28 09:29:03 +02:00
Toke Høiland-Jørgensen
c3e8c44a90 libbpf: Ignore STT_SECTION symbols in 'maps' section
When parsing legacy map definitions, libbpf would error out when
encountering an STT_SECTION symbol. This becomes a problem because some
versions of binutils will produce SECTION symbols for every section when
processing an ELF file, so BPF files run through 'strip' will end up with
such symbols, making libbpf refuse to load them.

There's not really any reason why erroring out is strictly necessary, so
change libbpf to just ignore SECTION symbols when parsing the ELF.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210927205810.715656-1-toke@redhat.com
2021-09-27 21:29:37 -07:00
Jakub Kicinski
2fcd14d0f7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/mptcp/protocol.c
  977d293e23 ("mptcp: ensure tx skbs always have the MPTCP ext")
  efe686ffce ("mptcp: ensure tx skbs always have the MPTCP ext")

same patch merged in both trees, keep net-next.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23 11:19:49 -07:00
Andrii Nakryiko
cc10623c68 libbpf: Add legacy uprobe attaching support
Similarly to recently added legacy kprobe attach interface support
through tracefs, support attaching uprobes using the legacy interface if
host kernel doesn't support newer FD-based interface.

For uprobes event name consists of "libbpf_" prefix, PID, sanitized
binary path and offset within that binary. Structuraly the code is
aligned with kprobe logic refactoring in previous patch. struct
bpf_link_perf is re-used and all the same legacy_probe_name and
legacy_is_retprobe fields are used to ensure proper cleanup on
bpf_link__destroy().

Users should be aware, though, that on old kernels which don't support
FD-based interface for kprobe/uprobe attachment, if the application
crashes before bpf_link__destroy() is called, uprobe legacy
events will be left in tracefs. This is the same limitation as with
legacy kprobe interfaces.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-5-andrii@kernel.org
2021-09-21 19:40:09 -07:00
Andrii Nakryiko
46ed5fc33d libbpf: Refactor and simplify legacy kprobe code
Refactor legacy kprobe handling code to follow the same logic as uprobe
legacy logic added in the next patchs:
  - add append_to_file() helper that makes it simpler to work with
    tracefs file-based interface for creating and deleting probes;
  - move out probe/event name generation outside of the code that
    adds/removes it, which simplifies bookkeeping significantly;
  - change the probe name format to start with "libbpf_" prefix and
    include offset within kernel function;
  - switch 'unsigned long' to 'size_t' for specifying kprobe offsets,
    which is consistent with how uprobes define that, simplifies
    printf()-ing internally, and also avoids unnecessary complications on
    architectures where sizeof(long) != sizeof(void *).

This patch also implicitly fixes the problem with invalid open() error
handling present in poke_kprobe_events(), which (the function) this
patch removes.

Fixes: ca304b40c2 ("libbpf: Introduce legacy kprobe events support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-4-andrii@kernel.org
2021-09-21 19:40:09 -07:00
Andrii Nakryiko
303a257223 libbpf: Fix memory leak in legacy kprobe attach logic
In some error scenarios legacy_probe string won't be free()'d. Fix this.
This was reported by Coverity static analysis.

Fixes: ca304b40c2 ("libbpf: Introduce legacy kprobe events support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-2-andrii@kernel.org
2021-09-21 19:40:08 -07:00
Grant Seltzer
97c140d94e libbpf: Add doc comments in libbpf.h
This adds comments above functions in libbpf.h which document
their uses. These comments are of a format that doxygen and sphinx
can pick up and render. These are rendered by libbpf.readthedocs.org

These doc comments are for:
- bpf_object__find_map_by_name()
- bpf_map__fd()
- bpf_map__is_internal()
- libbpf_get_error()
- libbpf_num_possible_cpus()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210918031457.36204-1-grantseltzer@gmail.com
2021-09-20 17:23:31 -07:00
Ian Rogers
aba5daeb64 libperf evsel: Make use of FD robust.
FD uses xyarray__entry that may return NULL if an index is out of
bounds. If NULL is returned then a segv happens as FD unconditionally
dereferences the pointer. This was happening in a case of with perf
iostat as shown below. The fix is to make FD an "int*" rather than an
int and handle the NULL case as either invalid input or a closed fd.

  $ sudo gdb --args perf stat --iostat  list
  ...
  Breakpoint 1, perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50
  50      {
  (gdb) bt
   #0  perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50
   #1  0x000055555585c188 in evsel__open_cpu (evsel=0x5555560951a0, cpus=0x555556093410,
      threads=0x555556086fb0, start_cpu=0, end_cpu=1) at util/evsel.c:1792
   #2  0x000055555585cfb2 in evsel__open (evsel=0x5555560951a0, cpus=0x0, threads=0x555556086fb0)
      at util/evsel.c:2045
   #3  0x000055555585d0db in evsel__open_per_thread (evsel=0x5555560951a0, threads=0x555556086fb0)
      at util/evsel.c:2065
   #4  0x00005555558ece64 in create_perf_stat_counter (evsel=0x5555560951a0,
      config=0x555555c34700 <stat_config>, target=0x555555c2f1c0 <target>, cpu=0) at util/stat.c:590
   #5  0x000055555578e927 in __run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0)
      at builtin-stat.c:833
   #6  0x000055555578f3c6 in run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0)
      at builtin-stat.c:1048
   #7  0x0000555555792ee5 in cmd_stat (argc=1, argv=0x7fffffffe4a0) at builtin-stat.c:2534
   #8  0x0000555555835ed3 in run_builtin (p=0x555555c3f540 <commands+288>, argc=3,
      argv=0x7fffffffe4a0) at perf.c:313
   #9  0x0000555555836154 in handle_internal_command (argc=3, argv=0x7fffffffe4a0) at perf.c:365
   #10 0x000055555583629f in run_argv (argcp=0x7fffffffe2ec, argv=0x7fffffffe2e0) at perf.c:409
   #11 0x0000555555836692 in main (argc=3, argv=0x7fffffffe4a0) at perf.c:539
  ...
  (gdb) c
  Continuing.
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_iio_0/event=0x83,umask=0x04,ch_mask=0xF,fc_mask=0x07/).
  /bin/dmesg | grep -i perf may provide additional information.

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555559b03ea in perf_evsel__close_fd_cpu (evsel=0x5555560951a0, cpu=1) at evsel.c:166
  166                     if (FD(evsel, cpu, thread) >= 0)

v3. fixes a bug in perf_evsel__run_ioctl where the sense of a branch was
    backward.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210918054440.2350466-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-18 17:43:06 -03:00
Dave Marchevsky
6c66b0e7c9 libbpf: Use static const fmt string in __bpf_printk
The __bpf_printk convenience macro was using a 'char' fmt string holder
as it predates support for globals in libbpf. Move to more efficient
'static const char', but provide a fallback to the old way via
BPF_NO_GLOBAL_DATA so users on old kernels can still use the macro.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-6-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Dave Marchevsky
c2758baa97 libbpf: Modify bpf_printk to choose helper based on arg count
Instead of being a thin wrapper which calls into bpf_trace_printk,
libbpf's bpf_printk convenience macro now chooses between
bpf_trace_printk and bpf_trace_vprintk. If the arg count (excluding
format string) is >3, use bpf_trace_vprintk, otherwise use the older
helper.

The motivation behind this added complexity - instead of migrating
entirely to bpf_trace_vprintk - is to maintain good developer experience
for users compiling against new libbpf but running on older kernels.
Users who are passing <=3 args to bpf_printk will see no change in their
bytecode.

__bpf_vprintk functions similarly to BPF_SEQ_PRINTF and BPF_SNPRINTF
macros elsewhere in the file - it allows use of bpf_trace_vprintk
without manual conversion of varargs to u64 array. Previous
implementation of bpf_printk macro is moved to __bpf_printk for use by
the new implementation.

This does change behavior of bpf_printk calls with >3 args in the "new
libbpf, old kernels" scenario. Before this patch, attempting to use 4
args to bpf_printk results in a compile-time error. After this patch,
using bpf_printk with 4 args results in a trace_vprintk helper call
being emitted and a load-time failure on older kernels.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-5-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Andrii Nakryiko
942025c9f3 libbpf: Constify all high-level program attach APIs
Attach APIs shouldn't need to modify bpf_program/bpf_map structs, so
change all struct bpf_program and struct bpf_map pointers to const
pointers. This is completely backwards compatible with no functional
change.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-8-andrii@kernel.org
2021-09-17 09:05:41 -07:00
Andrii Nakryiko
91b555d73e libbpf: Schedule open_opts.attach_prog_fd deprecation since v0.7
bpf_object_open_opts.attach_prog_fd makes a pretty strong assumption
that bpf_object contains either only single freplace BPF program or all
of BPF programs in BPF object are freplaces intended to replace
different subprograms of the same target BPF program. This seems both
a bit confusing, too assuming, and limiting.

We've had bpf_program__set_attach_target() API which allows more
fine-grained control over this, on a per-program level. As such, mark
open_opts.attach_prog_fd as deprecated starting from v0.7, so that we
have one more universal way of setting freplace targets. With previous
change to allow NULL attach_func_name argument, and especially combined
with BPF skeleton, arguable bpf_program__set_attach_target() is a more
convenient and explicit API as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-7-andrii@kernel.org
2021-09-17 09:05:41 -07:00
Andrii Nakryiko
2d5ec1c66e libbpf: Allow skipping attach_func_name in bpf_program__set_attach_target()
Allow to use bpf_program__set_attach_target to only set target attach
program FD, while letting libbpf to use target attach function name from
SEC() definition. This might be useful for some scenarios where
bpf_object contains multiple related freplace BPF programs intended to
replace different sub-programs in target BPF program. In such case all
programs will have the same attach_prog_fd, but different
attach_func_name. It's convenient to specify such target function names
declaratively in SEC() definitions, but attach_prog_fd is a dynamic
runtime setting.

To simplify such scenario, allow bpf_program__set_attach_target() to
delay BTF ID resolution till the BPF program load time by providing NULL
attach_func_name. In that case the behavior will be similar to using
bpf_object_open_opts.attach_prog_fd (which is marked deprecated since
v0.7), but has the benefit of allowing more control by user in what is
attached to what. Such setup allows having BPF programs attached to
different target attach_prog_fd with target functions still declaratively
recorded in BPF source code in SEC() definitions.

Selftests changes in the next patch should make this more obvious.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-5-andrii@kernel.org
2021-09-17 09:05:00 -07:00
Andrii Nakryiko
277641859e libbpf: Deprecated bpf_object_open_opts.relaxed_core_relocs
It's relevant and hasn't been doing anything for a long while now.
Deprecated it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-4-andrii@kernel.org
2021-09-17 09:04:12 -07:00
Andrii Nakryiko
f11f86a393 libbpf: Use pre-setup sec_def in libbpf_find_attach_btf_id()
Don't perform another search for sec_def inside
libbpf_find_attach_btf_id(), as each recognized bpf_program already has
prog->sec_def set.

Also remove unnecessary NULL check for prog->sec_name, as it can never
be NULL.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-2-andrii@kernel.org
2021-09-17 09:04:12 -07:00
Grant Seltzer
69cd823956 libbpf: Add sphinx code documentation comments
This adds comments above five functions in btf.h which document
their uses. These comments are of a format that doxygen and sphinx
can pick up and render. These are rendered by libbpf.readthedocs.org

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210915021951.117186-1-grantseltzer@gmail.com
2021-09-15 13:16:02 -07:00
Yonghong Song
5b84bd1036 libbpf: Add support for BTF_KIND_TAG
Add BTF_KIND_TAG support for parsing and dedup.
Also added sanitization for BTF_KIND_TAG. If BTF_KIND_TAG is not
supported in the kernel, sanitize it to INTs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223025.246687-1-yhs@fb.com
2021-09-14 18:45:52 -07:00
Yonghong Song
30025e8bd8 libbpf: Rename btf_{hash,equal}_int to btf_{hash,equal}_int_tag
This patch renames functions btf_{hash,equal}_int() to
btf_{hash,equal}_int_tag() so they can be reused for
BTF_KIND_TAG support. There is no functionality change for
this patch.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223020.245829-1-yhs@fb.com
2021-09-14 18:45:52 -07:00
Andrii Nakryiko
b6291a6f30 libbpf: Minimize explicit iterator of section definition array
Remove almost all the code that explicitly iterated BPF program section
definitions in favor of using find_sec_def(). The only remaining user of
section_defs is libbpf_get_type_names that has to iterate all of them to
construct its result.

Having one internal API entry point for section definitions will
simplify further refactorings around libbpf's program section
definitions parsing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-5-andrii@kernel.org
2021-09-14 15:49:24 -07:00
Andrii Nakryiko
5532dfd42e libbpf: Simplify BPF program auto-attach code
Remove the need to explicitly pass bpf_sec_def for auto-attachable BPF
programs, as it is already recorded at bpf_object__open() time for all
recognized type of BPF programs. This further reduces number of explicit
calls to find_sec_def(), simplifying further refactorings.

No functional changes are done by this patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-4-andrii@kernel.org
2021-09-14 15:49:24 -07:00
Andrii Nakryiko
91b4d1d1d5 libbpf: Ensure BPF prog types are set before relocations
Refactor bpf_object__open() sequencing to perform BPF program type
detection based on SEC() definitions before we get to relocations
collection. This allows to have more information about BPF program by
the time we get to, say, struct_ops relocation gathering. This,
subsequently, simplifies struct_ops logic and removes the need to
perform extra find_sec_def() resolution.

With this patch libbpf will require all struct_ops BPF programs to be
marked with SEC("struct_ops") or SEC("struct_ops/xxx") annotations.
Real-world applications are already doing that through something like
selftests's BPF_STRUCT_OPS() macro. This change streamlines libbpf's
internal handling of SEC() definitions and is in the sprit of
upcoming libbpf-1.0 section strictness changes ([0]).

  [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handling

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-3-andrii@kernel.org
2021-09-14 15:49:24 -07:00
Rafael David Tinoco
ca304b40c2 libbpf: Introduce legacy kprobe events support
Allow kprobe tracepoint events creation through legacy interface, as the
kprobe dynamic PMUs support, used by default, was only created in v4.17.

Store legacy kprobe name in struct bpf_perf_link, instead of creating
a new "subclass" off of bpf_perf_link. This is ok as it's just two new
fields, which are also going to be reused for legacy uprobe support in
follow up patches.

Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210912064844.3181742-1-rafaeldtinoco@gmail.com
2021-09-14 14:44:45 -07:00
Andrii Nakryiko
2f38304127 libbpf: Make libbpf_version.h non-auto-generated
Turn previously auto-generated libbpf_version.h header into a normal
header file. This prevents various tricky Makefile integration issues,
simplifies the overall build process, but also allows to further extend
it with some more versioning-related APIs in the future.

To prevent accidental out-of-sync versions as defined by libbpf.map and
libbpf_version.h, Makefile checks their consistency at build time.

Simultaneously with this change bump libbpf.map to v0.6.

Also undo adding libbpf's output directory into include path for
kernel/bpf/preload, bpftool, and resolve_btfids, which is not necessary
because libbpf_version.h is just a normal header like any other.

Fixes: 0b46b75505 ("libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210913222309.3220849-1-andrii@kernel.org
2021-09-13 15:36:47 -07:00
Quentin Monnet
0b46b75505 libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations
Introduce a macro LIBBPF_DEPRECATED_SINCE(major, minor, message) to prepare
the deprecation of two API functions. This macro marks functions as deprecated
when libbpf's version reaches the values passed as an argument.

As part of this change libbpf_version.h header is added with recorded major
(LIBBPF_MAJOR_VERSION) and minor (LIBBPF_MINOR_VERSION) libbpf version macros.
They are now part of libbpf public API and can be relied upon by user code.
libbpf_version.h is installed system-wide along other libbpf public headers.

Due to this new build-time auto-generated header, in-kernel applications
relying on libbpf (resolve_btfids, bpftool, bpf_preload) are updated to
include libbpf's output directory as part of a list of include search paths.
Better fix would be to use libbpf's make_install target to install public API
headers, but that clean up is left out as a future improvement. The build
changes were tested by building kernel (with KBUILD_OUTPUT and O= specified
explicitly), bpftool, libbpf, selftests/bpf, and resolve_btfids builds. No
problems were detected.

Note that because of the constraints of the C preprocessor we have to write
a few lines of macro magic for each version used to prepare deprecation (0.6
for now).

Also, use LIBBPF_DEPRECATED_SINCE() to schedule deprecation of
btf__get_from_id() and btf__load(), which are replaced by
btf__load_from_kernel_by_id() and btf__load_into_kernel(), respectively,
starting from future libbpf v0.6. This is part of libbpf 1.0 effort ([0]).

  [0] Closes: https://github.com/libbpf/libbpf/issues/278

Co-developed-by: Quentin Monnet <quentin@isovalent.com>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210908213226.1871016-1-andrii@kernel.org
2021-09-09 23:28:05 +02:00
Andrii Nakryiko
006a5099fc libbpf: Fix build with latest gcc/binutils with LTO
After updating to binutils 2.35, the build began to fail with an
assembler error. A bug was opened on the Red Hat Bugzilla a few days
later for the same issue.

Work around the problem by using the new `symver` attribute (introduced
in GCC 10) as needed instead of assembler directives.

This addresses Red Hat ([0]) and OpenSUSE ([1]) bug reports, as well as libbpf
issue ([2]).

  [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1863059
  [1]: https://bugzilla.opensuse.org/show_bug.cgi?id=1188749
  [2]: Closes: https://github.com/libbpf/libbpf/issues/338

Co-developed-by: Patrick McCarty <patrick.mccarty@intel.com>
Co-developed-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Patrick McCarty <patrick.mccarty@intel.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210907221023.2660953-1-andrii@kernel.org
2021-09-07 19:32:04 -07:00
Matt Smith
08a6f22ef6 libbpf: Change bpf_object_skeleton data field to const pointer
This change was necessary to enforce the implied contract
that bpf_object_skeleton->data should not be mutated.  The data
will be cast to `void *` during assignment to handle the case
where a user is compiling with older libbpf headers to avoid
a compiler warning of `const void *` data being cast to `void *`

Signed-off-by: Matt Smith <alastorze@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210901194439.3853238-2-alastorze@fb.com
2021-09-07 17:33:49 -07:00
Toke Høiland-Jørgensen
03e601f48b libbpf: Don't crash on object files with no symbol tables
If libbpf encounters an ELF file that has been stripped of its symbol
table, it will crash in bpf_object__add_programs() when trying to
dereference the obj->efile.symbols pointer.

Fix this by erroring out of bpf_object__elf_collect() if it is not able
able to find the symbol table.

v2:
  - Move check into bpf_object__elf_collect() and add nice error message

Fixes: 6245947c1b ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210901114812.204720-1-toke@redhat.com
2021-09-07 17:18:48 -07:00
Linus Torvalds
27151f1778 perf tools changes for v5.15:
New features:
 
 - Improvements for the flamegraph python script, including:
 
   - Display perf.data header
   - Display PIDs of user stacks
   - Added option to change color scheme
   - Default to blue/green color scheme to improve accessibility
   - Correctly identify kernel stacks when debuginfo is available
 
 - Improvements for 'perf bench futex':
   - Add --mlockall parameter
   - Add --broadcast and --pi to the 'requeue' sub benchmark
 
 - Add support for PMU aliases.
 
 - Introduce an ARM Coresight ETE decoder.
 
 - Add a 'perf bench' entry for evlist open/close operations, to help quantify
   improvements with multithreading 'perf record'.
 
 - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf script's
   python scripting.
 
 - Add a 'perf test' entry for PMU aliases.
 
 - Add a 'perf test' entry for 'perf record/perf report/perf script' pipe mode.
 
 Fixes:
 
 - perf script dlfilter (API for filtering via dynamically loaded shared object
   introduced in v5.14) fixes and a 'perf test' entry for it.
 
 - Fix get_current_dir_name() compilation on Android.
 
 - Fix issues with asciidoc and double dashes uses.
 
 - Fix memory leaks in the BTF handling code.
 
 - Fix leftover problems in the Documentation from the infrastructure originally
   lifted from the git codebase.
 
 - Fix *probe_vfs_getname.sh 'perf test' failures.
 
 - Handle fd gaps in 'perf test's test__dso_data_reopen().
 
 - Make sure to show disasembly warnings for 'perf annotate --stdio'.
 
 - Fix output from pipe to file and vice-versa in 'perf record/report/script'.
 
 - Correct 'perf data -h' output.
 
 - Fix wrong comm in system-wide mode with 'perf record --delay'.
 
 - Do not allow --for-each-cgroup without cpu in 'perf stat'
 
 - Make 'perf test --skip' work on shell tests.
 
 - Fix libperf's verbose printing.
 
 Misc improvements:
 
 - Preparatory patches for multithreading varios 'perf record' phases
   (synthesizing, opening, recording, etc).
 
 - Add sparse context/locking annotations in compiler-types.h, also to help with
   the multithreading effort.
 
 - Optimize the generation of the arch specific erno tables used in 'perf trace'.
 
 - Optimize libperf's perf_cpu_map__max().
 
 - Improve ARM's CoreSight warnings.
 
 - Report collisions in AUX records.
 
 - Improve warnings for the LLVM 'perf test' entry.
 
 - Improve the PMU events 'perf test' codebase.
 
 - perf test: Do not compare overheads in the zstd comp test
 
 - Better support annotation on ARM.
 
 - Update 'perf trace's cmd string table to decode sys_bpf() first arg.
 
 Vendor events:
 
 - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and Elhart Lake.
 
 - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake servers.
 
 Hardware tracing:
 
 - Improvements for the ARM hardware tracing auxtrace support.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYTO8OAAKCRCyPKLppCJ+
 J/N0AQCTAfxKsVyh9vjIY5vwHvNkYMDM4Wvs7TmlfXbVgKLP3AEA0wYlTyar/teI
 NC6jQpipxpWASFUJbLSmU8qn/XFXwQo=
 =4r0w
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Improvements for the flamegraph python script, including:
       - Display perf.data header
       - Display PIDs of user stacks
       - Added option to change color scheme
       - Default to blue/green color scheme to improve accessibility
       - Correctly identify kernel stacks when debuginfo is available

   - Improvements for 'perf bench futex':
       - Add --mlockall parameter
       - Add --broadcast and --pi to the 'requeue' sub benchmark

   - Add support for PMU aliases.

   - Introduce an ARM Coresight ETE decoder.

   - Add a 'perf bench' entry for evlist open/close operations, to help
     quantify improvements with multithreading 'perf record'.

   - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf
     script's python scripting.

   - Add a 'perf test' entry for PMU aliases.

   - Add a 'perf test' entry for 'perf record/perf report/perf script'
     pipe mode.

  Fixes:

   - perf script dlfilter (API for filtering via dynamically loaded
     shared object introduced in v5.14) fixes and a 'perf test' entry
     for it.

   - Fix get_current_dir_name() compilation on Android.

   - Fix issues with asciidoc and double dashes uses.

   - Fix memory leaks in the BTF handling code.

   - Fix leftover problems in the Documentation from the infrastructure
     originally lifted from the git codebase.

   - Fix *probe_vfs_getname.sh 'perf test' failures.

   - Handle fd gaps in 'perf test's test__dso_data_reopen().

   - Make sure to show disasembly warnings for 'perf annotate --stdio'.

   - Fix output from pipe to file and vice-versa in 'perf
     record/report/script'.

   - Correct 'perf data -h' output.

   - Fix wrong comm in system-wide mode with 'perf record --delay'.

   - Do not allow --for-each-cgroup without cpu in 'perf stat'

   - Make 'perf test --skip' work on shell tests.

   - Fix libperf's verbose printing.

  Misc improvements:

   - Preparatory patches for multithreading various 'perf record' phases
     (synthesizing, opening, recording, etc).

   - Add sparse context/locking annotations in compiler-types.h, also to
     help with the multithreading effort.

   - Optimize the generation of the arch specific erno tables used in
     'perf trace'.

   - Optimize libperf's perf_cpu_map__max().

   - Improve ARM's CoreSight warnings.

   - Report collisions in AUX records.

   - Improve warnings for the LLVM 'perf test' entry.

   - Improve the PMU events 'perf test' codebase.

   - perf test: Do not compare overheads in the zstd comp test

   - Better support annotation on ARM.

   - Update 'perf trace's cmd string table to decode sys_bpf() first
     arg.

  Vendor events:

   - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and
     Elhart Lake.

   - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake
     servers.

  Hardware tracing:

   - Improvements for the ARM hardware tracing auxtrace support"

* tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (130 commits)
  perf tests: Add test for PMU aliases
  perf pmu: Add PMU alias support
  perf session: Report collisions in AUX records
  perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event
  perf build: Report failure for testing feature libopencsd
  perf cs-etm: Show a warning for an unknown magic number
  perf cs-etm: Print the decoder name
  perf cs-etm: Create ETE decoder
  perf cs-etm: Update OpenCSD decoder for ETE
  perf cs-etm: Fix typo
  perf cs-etm: Save TRCDEVARCH register
  perf cs-etm: Refactor out ETMv4 header saving
  perf cs-etm: Initialise architecture based on TRCIDR1
  perf cs-etm: Refactor initialisation of decoder params.
  tools build: Fix feature detect clean for out of source builds
  perf evlist: Add evlist__for_each_entry_from() macro
  perf evsel: Handle precise_ip fallback in evsel__open_cpu()
  perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu()
  perf evsel: Move test_attr__open() to success path in evsel__open_cpu()
  perf evsel: Move ignore_missing_thread() to fallback code
  ...
2021-09-05 11:56:18 -07:00
Riccardo Mancini
6e93bc534f libperf cpumap: Take into advantage it is sorted to optimize perf_cpu_map__max()
From commit 7074674e73 ("perf cpumap: Maintain cpumaps ordered and
without dups"), perf_cpu_map elements are sorted in ascending order.

This patch improves the perf_cpu_map__max function by returning the last
element.

Committer notes:

Do it as a ternary to keep it in just one return line, add a comment
explaining it is sorted and what functions does it.

Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/fb79f02e7b86ea8044d563adb1e9890c906f982f.1629490974.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-31 16:17:02 -03:00
Riccardo Mancini
b75f299d69 libsubcmd: add OPT_UINTEGER_OPTARG option type
This patch adds OPT_UINTEGER_OPTARG, which is the same as OPT_UINTEGER,
but also makes it possible to use the option without any value, setting
the variable to a default value, d.

Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/c46749b3dff796729078352ff164d363457a3587.1629490974.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-31 15:44:05 -03:00
Arnaldo Carvalho de Melo
c635813fef Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-30 10:05:46 -03:00
Shunsuke Nakamura
37c3193fa4 libperf tests: Fix verbose printing
libperf's verbose printing checks the -v option every time the macro _T_ START
is called.

Since there are currently four libperf tests registered, the macro _T_ START is
called four times, but verbose printing after the second time is not output.

Resets the index of the element processed by getopt() and fix verbose printing
so that it prints in all tests.

Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Acked-by: Rob Herring <robh@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210820093908.734503-3-nakamura.shun@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-24 15:05:35 -03:00
Andrii Nakryiko
5e3b8356de libbpf: Add uprobe ref counter offset support for USDT semaphores
When attaching to uprobes through perf subsystem, it's possible to specify
offset of a so-called USDT semaphore, which is just a reference counted u16,
used by kernel to keep track of how many tracers are attached to a given
location. Support for this feature was added in [0], so just wire this through
uprobe_opts. This is important to enable implementing USDT attachment and
tracing through libbpf's bpf_program__attach_uprobe_opts() API.

  [0] a6ca88b241 ("trace_uprobe: support reference counter in fd-based uprobe")

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-16-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko
47faff3717 libbpf: Add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs
Wire through bpf_cookie for all attach APIs that use perf_event_open under the
hood:
  - for kprobes, extend existing bpf_kprobe_opts with bpf_cookie field;
  - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
    pass bpf_cookie through opts.

For kernel that don't support BPF_LINK_CREATE for perf_events, and thus
bpf_cookie is not supported either, return error and log warning for user.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-12-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko
3ec84f4b16 libbpf: Add bpf_cookie support to bpf_link_create() API
Add ability to specify bpf_cookie value when creating BPF perf link with
bpf_link_create() low-level API.

Given BPF_LINK_CREATE command is growing and keeps getting new fields that are
specific to the type of BPF_LINK, extend libbpf side of bpf_link_create() API
and corresponding OPTS struct to accomodate such changes. Add extra checks to
prevent using incompatible/unexpected combinations of fields.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-11-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko
668ace0ea5 libbpf: Use BPF perf link when supported by kernel
Detect kernel support for BPF perf link and prefer it when attaching to
perf_event, tracepoint, kprobe/uprobe. Underlying perf_event FD will be kept
open until BPF link is destroyed, at which point both perf_event FD and BPF
link FD will be closed.

This preserves current behavior in which perf_event FD is open for the
duration of bpf_link's lifetime and user is able to "disconnect" bpf_link from
underlying FD (with bpf_link__disconnect()), so that bpf_link__destroy()
doesn't close underlying perf_event FD.When BPF perf link is used, disconnect
will keep both perf_event and bpf_link FDs open, so it will be up to
(advanced) user to close them. This approach is demonstrated in bpf_cookie.c
selftests, added in this patch set.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-10-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko
d88b71d4a9 libbpf: Remove unused bpf_link's destroy operation, but add dealloc
bpf_link->destroy() isn't used by any code, so remove it. Instead, add ability
to override deallocation procedure, with default doing plain free(link). This
is necessary for cases when we want to "subclass" struct bpf_link to keep
extra information, as is the case in the next patch adding struct
bpf_link_perf.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-9-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko
61c7aa5020 libbpf: Re-build libbpf.so when libbpf.map changes
Ensure libbpf.so is re-built whenever libbpf.map is modified.  Without this,
changes to libbpf.map are not detected and versioned symbols mismatch error
will be reported until `make clean && make` is used, which is a suboptimal
developer experience.

Fixes: 306b267cb3 ("libbpf: Verify versioned symbols")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-8-andrii@kernel.org
2021-08-17 00:45:07 +02:00
Hao Luo
2211c825e7 libbpf: Support weak typed ksyms.
Currently weak typeless ksyms have default value zero, when they don't
exist in the kernel. However, weak typed ksyms are rejected by libbpf
if they can not be resolved. This means that if a bpf object contains
the declaration of a nonexistent weak typed ksym, it will be rejected
even if there is no program that references the symbol.

Nonexistent weak typed ksyms can also default to zero just like
typeless ones. This allows programs that access weak typed ksyms to be
accepted by verifier, if the accesses are guarded. For example,

extern const int bpf_link_fops3 __ksym __weak;

/* then in BPF program */

if (&bpf_link_fops3) {
   /* use bpf_link_fops3 */
}

If actual use of nonexistent typed ksym is not guarded properly,
verifier would see that register is not PTR_TO_BTF_ID and wouldn't
allow to use it for direct memory reads or passing it to BPF helpers.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210812003819.2439037-1-haoluo@google.com
2021-08-13 15:56:28 -07:00
Jakub Kicinski
f4083a752a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f2580 ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda65 ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557d ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a376 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e87 ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 06:41:22 -07:00
Jin Yao
2696d6e59c libperf: Add perf_cpu_map__default_new()
libperf already has a static function called 'cpu_map__default_new()'.

Add a new API perf_cpu_map__default_new() to export the function.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https //lore.kernel.org/r/20210723063433.7318-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-11 16:03:36 -03:00
Daniel Xu
c34c338a40 libbpf: Do not close un-owned FD 0 on errors
Before this patch, btf_new() was liable to close an arbitrary FD 0 if
BTF parsing failed. This was because:

* btf->fd was initialized to 0 through the calloc()
* btf__free() (in the `done` label) closed any FDs >= 0
* btf->fd is left at 0 if parsing fails

This issue was discovered on a system using libbpf v0.3 (without
BTF_KIND_FLOAT support) but with a kernel that had BTF_KIND_FLOAT types
in BTF. Thus, parsing fails.

While this patch technically doesn't fix any issues b/c upstream libbpf
has BTF_KIND_FLOAT support, it'll help prevent issues in the future if
more BTF types are added. It also allow the fix to be backported to
older libbpf's.

Fixes: 3289959b97 ("libbpf: Support BTF loading and raw data output in both endianness")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/5969bb991adedb03c6ae93e051fd2a00d293cf25.1627513670.git.dxu@dxuuu.xyz
2021-08-07 01:39:15 +02:00
Robin Gögge
78d14bda86 libbpf: Fix probe for BPF_PROG_TYPE_CGROUP_SOCKOPT
This patch fixes the probe for BPF_PROG_TYPE_CGROUP_SOCKOPT,
so the probe reports accurate results when used by e.g.
bpftool.

Fixes: 4cdbfb59c4 ("libbpf: support sockopt hooks")
Signed-off-by: Robin Gögge <r.goegge@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20210728225825.2357586-1-r.goegge@gmail.com
2021-08-07 01:38:52 +02:00
Hengqi Chen
a710eed386 libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf
Add two new APIs: btf__load_vmlinux_btf and btf__load_module_btf.
btf__load_vmlinux_btf is just an alias to the existing API named
libbpf_find_kernel_btf, rename to be more precisely and consistent
with existing BTF APIs. btf__load_module_btf can be used to load
module BTF, add it for completeness. These two APIs are useful for
implementing tracing tools and introspection tools. This is part
of the effort towards libbpf 1.0 ([0]).

  [0] Closes: https://github.com/libbpf/libbpf/issues/280

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210730114012.494408-1-hengqi.chen@gmail.com
2021-07-30 12:21:29 -07:00
Quentin Monnet
61fc51b1d3 libbpf: Add split BTF support for btf__load_from_kernel_by_id()
Add a new API function btf__load_from_kernel_by_id_split(), which takes
a pointer to a base BTF object in order to support split BTF objects
when retrieving BTF information from the kernel.

Reference: https://github.com/libbpf/libbpf/issues/314

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210729162028.29512-8-quentin@isovalent.com
2021-07-29 17:23:59 -07:00
Quentin Monnet
6cc93e2f2c libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
Rename function btf__get_from_id() as btf__load_from_kernel_by_id() to
better indicate what the function does. Change the new function so that,
instead of requiring a pointer to the pointer to update and returning
with an error code, it takes a single argument (the id of the BTF
object) and returns the corresponding pointer. This is more in line with
the existing constructors.

The other tools calling the (soon-to-be) deprecated btf__get_from_id()
function will be updated in a future commit.

References:

- https://github.com/libbpf/libbpf/issues/278
- https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#btfh-apis

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210729162028.29512-4-quentin@isovalent.com
2021-07-29 17:09:19 -07:00
Quentin Monnet
3c7e585906 libbpf: Rename btf__load() as btf__load_into_kernel()
As part of the effort to move towards a v1.0 for libbpf, rename
btf__load() function, used to "upload" BTF information into the kernel,
as btf__load_into_kernel(). This new name better reflects what the
function does.

References:

- https://github.com/libbpf/libbpf/issues/278
- https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#btfh-apis

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210729162028.29512-3-quentin@isovalent.com
2021-07-29 17:03:41 -07:00
Quentin Monnet
6d2d73cdd6 libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
Variable "err" is initialised to -EINVAL so that this error code is
returned when something goes wrong in libbpf_find_prog_btf_id().
However, a recent change in the function made use of the variable in
such a way that it is set to 0 if retrieving linear information on the
program is successful, and this 0 value remains if we error out on
failures at later stages.

Let's fix this by setting err to -EINVAL later in the function.

Fixes: e9fc3ce99b ("libbpf: Streamline error reporting for high-level APIs")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210729162028.29512-2-quentin@isovalent.com
2021-07-29 17:03:41 -07:00
Martynas Pumputis
043c5bb3c4 libbpf: Fix race when pinning maps in parallel
When loading in parallel multiple programs which use the same to-be
pinned map, it is possible that two instances of the loader will call
bpf_object__create_maps() at the same time. If the map doesn't exist
when both instances call bpf_object__reuse_map(), then one of the
instances will fail with EEXIST when calling bpf_map__pin().

Fix the race by retrying reusing a map if bpf_map__pin() returns
EEXIST. The fix is similar to the one in iproute2: e4c4685fd6e4 ("bpf:
Fix race condition with map pinning").

Before retrying the pinning, we don't do any special cleaning of an
internal map state. The closer code inspection revealed that it's not
required:

    - bpf_object__create_map(): map->inner_map is destroyed after a
      successful call, map->fd is closed if pinning fails.
    - bpf_object__populate_internal_map(): created map elements is
      destroyed upon close(map->fd).
    - init_map_slots(): slots are freed after their initialization.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210726152001.34845-1-m@lambda.lt
2021-07-27 14:32:03 -07:00
Jason Wang
c139e40a51 libbpf: Fix comment typo
Remove the repeated word 'the' in line 48.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210727115928.74600-1-wangborong@cdjrlc.com
2021-07-27 14:18:47 -07:00
Alexei Starovoitov
b0588390db libbpf: Split CO-RE logic into relo_core.c.
Move CO-RE logic into separate file.
The internal interface between libbpf and CO-RE is through
bpf_core_apply_relo_insn() function and few structs defined in relo_core.h.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210721000822.40958-5-alexei.starovoitov@gmail.com
2021-07-26 12:29:14 -07:00
Alexei Starovoitov
301ba4d710 libbpf: Move CO-RE types into relo_core.h.
In order to make a clean split of CO-RE logic move its types
into independent header file.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210721000822.40958-4-alexei.starovoitov@gmail.com
2021-07-26 12:19:41 -07:00
Alexei Starovoitov
3ee4f53355 libbpf: Split bpf_core_apply_relo() into bpf_program independent helper.
bpf_core_apply_relo() doesn't need to know bpf_program internals
and hashmap details.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210721000822.40958-3-alexei.starovoitov@gmail.com
2021-07-26 12:13:05 -07:00
Alexei Starovoitov
6e43b28607 libbpf: Cleanup the layering between CORE and bpf_program.
CO-RE processing functions don't need to know 'struct bpf_program' details.
Cleanup the layering to eventually be able to move CO-RE logic into a separate file.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210721000822.40958-2-alexei.starovoitov@gmail.com
2021-07-26 12:11:23 -07:00
Evgeniy Litvinenko
e244d34d0e libbpf: Add bpf_map__pin_path function
Add bpf_map__pin_path, so that the inconsistently named
bpf_map__get_pin_path can be deprecated later. This is part of the
effort towards libbpf v1.0: https://github.com/libbpf/libbpf/issues/307

Also, add a selftest for the new function.

Signed-off-by: Evgeniy Litvinenko <evgeniyl@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210723221511.803683-1-evgeniyl@fb.com
2021-07-23 16:57:03 -07:00
Jiri Olsa
da97553ec6 libbpf: Export bpf_program__attach_kprobe_opts function
Export bpf_program__attach_kprobe_opts as a public API.

Rename bpf_program_attach_kprobe_opts to bpf_kprobe_opts and turn it into OPTS
struct.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20210721215810.889975-4-jolsa@kernel.org
2021-07-22 20:09:16 -07:00
Jiri Olsa
e3f9bc35ea libbpf: Allow decimal offset for kprobes
Allow to specify decimal offset in SEC macro, like:
  SEC("kprobe/bpf_fentry_test7+5")

Add selftest for that.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20210721215810.889975-3-jolsa@kernel.org
2021-07-22 20:09:16 -07:00
Jiri Olsa
1f71a468a7 libbpf: Fix func leak in attach_kprobe
Add missing free() for func pointer in attach_kprobe function.

Fixes: a2488b5f48 ("libbpf: Allow specification of "kprobe/function+offset"")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20210721215810.889975-2-jolsa@kernel.org
2021-07-22 20:08:39 -07:00
Alan Maguire
720c29fca9 libbpf: Propagate errors when retrieving enum value for typed data display
When retrieving the enum value associated with typed data during
"is data zero?" checking in btf_dump_type_data_check_zero(), the
return value of btf_dump_get_enum_value() is not passed to the caller
if the function returns a non-zero (error) value.  Currently, 0
is returned if the function returns an error.  We should instead
propagate the error to the caller.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626770993-11073-4-git-send-email-alan.maguire@oracle.com
2021-07-20 13:49:25 -07:00
Alan Maguire
a1d3cc3c5e libbpf: Avoid use of __int128 in typed dump display
__int128 is not supported for some 32-bit platforms (arm and i386).
__int128 was used in carrying out computations on bitfields which
aid display, but the same calculations could be done with __u64
with the small effect of not supporting 128-bit bitfields.

With these changes, a big-endian issue with casting 128-bit integers
to 64-bit for enum bitfields is solved also, as we now use 64-bit
integers for bitfield calculations.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626770993-11073-2-git-send-email-alan.maguire@oracle.com
2021-07-20 13:49:25 -07:00
Martynas Pumputis
a21ab4c59e libbpf: Fix removal of inner map in bpf_object__create_map
If creating an outer map of a BTF-defined map-in-map fails (via
bpf_object__create_map()), then the previously created its inner map
won't be destroyed.

Fix this by ensuring that the destroy routines are not bypassed in the
case of a failure.

Fixes: 646f02ffdd ("libbpf: Add BTF-defined map-in-map support")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210719173838.423148-2-m@lambda.lt
2021-07-19 15:48:20 -07:00
Alan Maguire
add192f81a libbpf: Btf typed dump does not need to allocate dump data
By using the stack for this small structure, we avoid the need
for freeing memory in error paths.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626475617-25984-4-git-send-email-alan.maguire@oracle.com
2021-07-16 17:29:56 -07:00
Alan Maguire
04eb4dff6a libbpf: Fix compilation errors on ppc64le for btf dump typed data
__s64 can be defined as either long or long long, depending on the
architecture. On ppc64le it's defined as long, giving this error:

 In file included from btf_dump.c:22:
btf_dump.c: In function 'btf_dump_type_data_check_overflow':
libbpf_internal.h:111:22: error: format '%lld' expects argument of
type 'long long int', but argument 3 has type '__s64' {aka 'long int'}
[-Werror=format=]
  111 |  libbpf_print(level, "libbpf: " fmt, ##__VA_ARGS__); \
      |                      ^~~~~~~~~~
libbpf_internal.h:114:27: note: in expansion of macro '__pr'
  114 | #define pr_warn(fmt, ...) __pr(LIBBPF_WARN, fmt, ##__VA_ARGS__)
      |                           ^~~~
btf_dump.c:1992:3: note: in expansion of macro 'pr_warn'
 1992 |   pr_warn("unexpected size [%lld] for id [%u]\n",
      |   ^~~~~~~
btf_dump.c:1992:32: note: format string is defined here
 1992 |   pr_warn("unexpected size [%lld] for id [%u]\n",
      |                             ~~~^
      |                                |
      |                                long long int
      |                             %ld

Cast to size_t and use %zu instead.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626475617-25984-3-git-send-email-alan.maguire@oracle.com
2021-07-16 17:25:57 -07:00
Alan Maguire
8d44c3578b libbpf: Clarify/fix unaligned data issues for btf typed dump
If data is packed, data structures can store it outside of usual
boundaries.  For example a 4-byte int can be stored on a unaligned
boundary in a case like this:

struct s {
	char f1;
	int f2;
} __attribute((packed));

...the int is stored at an offset of one byte.  Some platforms have
problems dereferencing data that is not aligned with its size, and
code exists to handle most cases of this for BTF typed data display.
However pointer display was missed, and a simple function to test if
"ptr_is_aligned(data, data_sz)" would help clarify this code.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626475617-25984-2-git-send-email-alan.maguire@oracle.com
2021-07-16 17:24:55 -07:00
Alan Maguire
920d16af9b libbpf: BTF dumper support for typed data
Add a BTF dumper for typed data, so that the user can dump a typed
version of the data provided.

The API is

int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
                             void *data, size_t data_sz,
                             const struct btf_dump_type_data_opts *opts);

...where the id is the BTF id of the data pointed to by the "void *"
argument; for example the BTF id of "struct sk_buff" for a
"struct skb *" data pointer.  Options supported are

 - a starting indent level (indent_lvl)
 - a user-specified indent string which will be printed once per
   indent level; if NULL, tab is chosen but any string <= 32 chars
   can be provided.
 - a set of boolean options to control dump display, similar to those
   used for BPF helper bpf_snprintf_btf().  Options are
        - compact : omit newlines and other indentation
        - skip_names: omit member names
        - emit_zeroes: show zero-value members

Default output format is identical to that dumped by bpf_snprintf_btf(),
for example a "struct sk_buff" representation would look like this:

struct sk_buff){
	(union){
		(struct){
			.next = (struct sk_buff *)0xffffffffffffffff,
			.prev = (struct sk_buff *)0xffffffffffffffff,
		(union){
			.dev = (struct net_device *)0xffffffffffffffff,
			.dev_scratch = (long unsigned int)18446744073709551615,
		},
	},
...

If the data structure is larger than the *data_sz*
number of bytes that are available in *data*, as much
of the data as possible will be dumped and -E2BIG will
be returned.  This is useful as tracers will sometimes
not be able to capture all of the data associated with
a type; for example a "struct task_struct" is ~16k.
Being able to specify that only a subset is available is
important for such cases.  On success, the amount of data
dumped is returned.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626362126-27775-2-git-send-email-alan.maguire@oracle.com
2021-07-16 13:46:59 -07:00
Shuyi Cheng
18353c87e0 libbpf: Fix the possible memory leak on error
If the strdup() fails then we need to call bpf_object__close(obj) to
avoid a resource leak.

Fixes: 166750bc1d ("libbpf: Support libbpf-provided extern variables")
Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626180159-112996-3-git-send-email-chengshuyi@linux.alibaba.com
2021-07-16 13:22:47 -07:00
Shuyi Cheng
1373ff5995 libbpf: Introduce 'btf_custom_path' to 'bpf_obj_open_opts'
btf_custom_path allows developers to load custom BTF which libbpf will
subsequently use for CO-RE relocation instead of vmlinux BTF.

Having btf_custom_path in bpf_object_open_opts one can directly use the
skeleton's <objname>_bpf__open_opts() API to pass in the btf_custom_path
parameter, as opposed to using bpf_object__load_xattr() which is slated to be
deprecated ([0]).

This work continues previous work started by another developer ([1]).

  [0] https://lore.kernel.org/bpf/CAEf4BzbJZLjNoiK8_VfeVg_Vrg=9iYFv+po-38SMe=UzwDKJ=Q@mail.gmail.com/#t
  [1] https://yhbt.net/lore/all/CAEf4Bzbgw49w2PtowsrzKQNcxD4fZRE6AKByX-5-dMo-+oWHHA@mail.gmail.com/

Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626180159-112996-2-git-send-email-chengshuyi@linux.alibaba.com
2021-07-16 13:22:47 -07:00
David S. Miller
82a1ffe57e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-07-15

The following pull-request contains BPF updates for your *net-next* tree.

We've added 45 non-merge commits during the last 15 day(s) which contain
a total of 52 files changed, 3122 insertions(+), 384 deletions(-).

The main changes are:

1) Introduce bpf timers, from Alexei.

2) Add sockmap support for unix datagram socket, from Cong.

3) Fix potential memleak and UAF in the verifier, from He.

4) Add bpf_get_func_ip helper, from Jiri.

5) Improvements to generic XDP mode, from Kumar.

6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-15 22:40:10 -07:00
Alan Maguire
a2488b5f48 libbpf: Allow specification of "kprobe/function+offset"
kprobes can be placed on most instructions in a function, not
just entry, and ftrace and bpftrace support the function+offset
notification for probe placement.  Adding parsing of func_name
into func+offset to bpf_program__attach_kprobe() allows the
user to specify

SEC("kprobe/bpf_fentry_test5+0x6")

...for example, and the offset can be passed to perf_event_open_probe()
to support kprobe attachment.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210714094400.396467-8-jolsa@kernel.org
2021-07-15 17:59:14 -07:00
Jiri Olsa
ac0ed48829 libbpf: Add bpf_program__attach_kprobe_opts function
Adding bpf_program__attach_kprobe_opts that does the same
as bpf_program__attach_kprobe, but takes opts argument.

Currently opts struct holds just retprobe bool, but we will
add new field in following patch.

The function is not exported, so there's no need to add
size to the struct bpf_program_attach_kprobe_opts for now.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210714094400.396467-7-jolsa@kernel.org
2021-07-15 17:59:14 -07:00
Linus Torvalds
8096acd744 Networking fixes for 5.14-rc2, including fixes from bpf and netfilter.
Current release - regressions:
 
  - sock: fix parameter order in sock_setsockopt()
 
 Current release - new code bugs:
 
  - netfilter: nft_last:
      - fix incorrect arithmetic when restoring last used
      - honor NFTA_LAST_SET on restoration
 
 Previous releases - regressions:
 
  - udp: properly flush normal packet at GRO time
 
  - sfc: ensure correct number of XDP queues; don't allow enabling the
         feature if there isn't sufficient resources to Tx from any CPU
 
  - dsa: sja1105: fix address learning getting disabled on the CPU port
 
  - mptcp: addresses a rmem accounting issue that could keep packets
         in subflow receive buffers longer than necessary, delaying
 	MPTCP-level ACKs
 
  - ip_tunnel: fix mtu calculation for ETHER tunnel devices
 
  - do not reuse skbs allocated from skbuff_fclone_cache in the napi
    skb cache, we'd try to return them to the wrong slab cache
 
  - tcp: consistently disable header prediction for mptcp
 
 Previous releases - always broken:
 
  - bpf: fix subprog poke descriptor tracking use-after-free
 
  - ipv6:
       - allocate enough headroom in ip6_finish_output2() in case
         iptables TEE is used
       - tcp: drop silly ICMPv6 packet too big messages to avoid
         expensive and pointless lookups (which may serve as a DDOS
 	vector)
       - make sure fwmark is copied in SYNACK packets
       - fix 'disable_policy' for forwarded packets (align with IPv4)
 
  - netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state
 
  - netfilter: conntrack: do not mark RST in the reply direction coming
       after SYN packet for an out-of-sync entry
 
  - mptcp: cleanly handle error conditions with MP_JOIN and syncookies
 
  - mptcp: fix double free when rejecting a join due to port mismatch
 
  - validate lwtstate->data before returning from skb_tunnel_info()
 
  - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
 
  - mt76: mt7921: continue to probe driver when fw already downloaded
 
  - bonding: fix multiple issues with offloading IPsec to (thru?) bond
 
  - stmmac: ptp: fix issues around Qbv support and setting time back
 
  - bcmgenet: always clear wake-up based on energy detection
 
 Misc:
 
  - sctp: move 198 addresses from unusable to private scope
 
  - ptp: support virtual clocks and timestamping
 
  - openvswitch: optimize operation for key comparison
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S
 Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb
 HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw
 wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX
 GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0
 vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu
 rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo
 peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V
 xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc
 FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw
 4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ
 eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs=
 =QFnb
 -----END PGP SIGNATURE-----

Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski.
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - sock: fix parameter order in sock_setsockopt()

  Current release - new code bugs:

   - netfilter: nft_last:
       - fix incorrect arithmetic when restoring last used
       - honor NFTA_LAST_SET on restoration

  Previous releases - regressions:

   - udp: properly flush normal packet at GRO time

   - sfc: ensure correct number of XDP queues; don't allow enabling the
     feature if there isn't sufficient resources to Tx from any CPU

   - dsa: sja1105: fix address learning getting disabled on the CPU port

   - mptcp: addresses a rmem accounting issue that could keep packets in
     subflow receive buffers longer than necessary, delaying MPTCP-level
     ACKs

   - ip_tunnel: fix mtu calculation for ETHER tunnel devices

   - do not reuse skbs allocated from skbuff_fclone_cache in the napi
     skb cache, we'd try to return them to the wrong slab cache

   - tcp: consistently disable header prediction for mptcp

  Previous releases - always broken:

   - bpf: fix subprog poke descriptor tracking use-after-free

   - ipv6:
       - allocate enough headroom in ip6_finish_output2() in case
         iptables TEE is used
       - tcp: drop silly ICMPv6 packet too big messages to avoid
         expensive and pointless lookups (which may serve as a DDOS
         vector)
       - make sure fwmark is copied in SYNACK packets
       - fix 'disable_policy' for forwarded packets (align with IPv4)

   - netfilter: conntrack:
       - do not renew entry stuck in tcp SYN_SENT state
       - do not mark RST in the reply direction coming after SYN packet
         for an out-of-sync entry

   - mptcp: cleanly handle error conditions with MP_JOIN and syncookies

   - mptcp: fix double free when rejecting a join due to port mismatch

   - validate lwtstate->data before returning from skb_tunnel_info()

   - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path

   - mt76: mt7921: continue to probe driver when fw already downloaded

   - bonding: fix multiple issues with offloading IPsec to (thru?) bond

   - stmmac: ptp: fix issues around Qbv support and setting time back

   - bcmgenet: always clear wake-up based on energy detection

  Misc:

   - sctp: move 198 addresses from unusable to private scope

   - ptp: support virtual clocks and timestamping

   - openvswitch: optimize operation for key comparison"

* tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
  net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
  sfc: add logs explaining XDP_TX/REDIRECT is not available
  sfc: ensure correct number of XDP queues
  sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
  net: fddi: fix UAF in fza_probe
  net: dsa: sja1105: fix address learning getting disabled on the CPU port
  net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
  net: Use nlmsg_unicast() instead of netlink_unicast()
  octeontx2-pf: Fix uninitialized boolean variable pps
  ipv6: allocate enough headroom in ip6_finish_output2()
  net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
  net: bridge: multicast: fix MRD advertisement router port marking race
  net: bridge: multicast: fix PIM hello router port marking race
  net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
  dsa: fix for_each_child.cocci warnings
  virtio_net: check virtqueue_add_sgs() return value
  mptcp: properly account bulk freed memory
  selftests: mptcp: fix case multiple subflows limited by server
  mptcp: avoid processing packet if a subflow reset
  mptcp: fix syncookie process if mptcp can not_accept new subflow
  ...
2021-07-14 09:24:32 -07:00
Martynas Pumputis
97eb31384a libbpf: Fix reuse of pinned map on older kernel
When loading a BPF program with a pinned map, the loader checks whether
the pinned map can be reused, i.e. their properties match. To derive
such of the pinned map, the loader invokes BPF_OBJ_GET_INFO_BY_FD and
then does the comparison.

Unfortunately, on < 4.12 kernels the BPF_OBJ_GET_INFO_BY_FD is not
available, so loading the program fails with the following error:

	libbpf: failed to get map info for map FD 5: Invalid argument
	libbpf: couldn't reuse pinned map at
		'/sys/fs/bpf/tc/globals/cilium_call_policy': parameter
		mismatch"
	libbpf: map 'cilium_call_policy': error reusing pinned map
	libbpf: map 'cilium_call_policy': failed to create:
		Invalid argument(-22)
	libbpf: failed to load object 'bpf_overlay.o'

To fix this, fallback to derivation of the map properties via
/proc/$PID/fdinfo/$MAP_FD if BPF_OBJ_GET_INFO_BY_FD fails with EINVAL,
which can be used as an indicator that the kernel doesn't support
the latter.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210712125552.58705-1-m@lambda.lt
2021-07-12 16:15:27 -07:00
Jiri Olsa
afd4ad01ff libperf: Add tests for perf_evlist__set_leader()
Add a test for the newly added perf_evlist__set_leader() function.

Committer testing:

  $ cd tools/lib/perf/
  $ sudo make tests
  [sudo] password for acme:
  running static:
  - running tests/test-cpumap.c...OK
  - running tests/test-threadmap.c...OK
  - running tests/test-evlist.c...OK
  - running tests/test-evsel.c...OK
  running dynamic:
  - running tests/test-cpumap.c...OK
  - running tests/test-threadmap.c...OK
  - running tests/test-evlist.c...OK
  - running tests/test-evsel.c...OK
  $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 15:34:41 -03:00
Arnaldo Carvalho de Melo
e2c18168c3 libperf: Remove BUG_ON() from library code in get_group_fd()
We shouldn't just panic, return a value that doesn't clash with what
perf_evsel__open() was already returning in case of error, i.e. errno
when sys_perf_event_open() fails.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Link: http://lore.kernel.org/lkml/YOiOA5zOtVH9IBbE@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 15:33:39 -03:00
Jiri Olsa
3fd35de168 libperf: Add group support to perf_evsel__open()
Add support to set group_fd in perf_evsel__open() and make it follow the
group setup.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:49:47 -03:00
Jiri Olsa
2e6263ab54 libperf: Adopt evlist__set_leader() from tools/perf as perf_evlist__set_leader()
Move the implementation of evlist__set_leader() to a new libperf
perf_evlist__set_leader() function with the same functionality make it a
libperf exported API.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:32 -03:00
Jiri Olsa
3a683120d8 libperf: Move 'nr_groups' from tools/perf to evlist::nr_groups
Move evsel::nr_groups to perf_evsel::nr_groups, so we can move the group
interface to libperf.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:32 -03:00
Jiri Olsa
fba7c86601 libperf: Move 'leader' from tools/perf to perf_evsel::leader
Move evsel::leader to perf_evsel::leader, so we can move the group
interface to libperf.

Also add several evsel helpers to ease up the transition:

  struct evsel *evsel__leader(struct evsel *evsel);
  - get leader evsel

  bool evsel__has_leader(struct evsel *evsel, struct evsel *leader);
  - true if evsel has leader as leader

  bool evsel__is_leader(struct evsel *evsel);
  - true if evsel is itw own leader

  void evsel__set_leader(struct evsel *evsel, struct evsel *leader);
  - set leader for evsel

Committer notes:

Fix this when building with 'make BUILD_BPF_SKEL=1'

  tools/perf/util/bpf_counter.c

  -       if (evsel->leader->core.nr_members > 1) {
  +       if (evsel->core.leader->nr_members > 1) {

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:31 -03:00
Jiri Olsa
38fe0e0156 libperf: Move 'idx' from tools/perf to perf_evsel::idx
Move evsel::idx to perf_evsel::idx, so we can move the group interface
to libperf.

Committer notes:

Fixup evsel->idx usage in tools/perf/util/bpf_counter_cgroup.c, that
appeared in my tree in my local tree.

Also fixed up these:

$ find tools/perf/ -name "*.[ch]" | xargs grep 'evsel->idx'
tools/perf/ui/gtk/annotate.c:                      evsel->idx + i);
tools/perf/ui/gtk/annotate.c:                   evsel->idx);
$

That running 'make -C tools/perf build-test' caught.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:28 -03:00
Jiri Olsa
3d970601da libperf: Change tests to single static and shared binaries
Make tests to be two binaries 'tests_static' and 'tests_shared', so the
maintenance is easier.

Adding tests under libperf build system, so we define all the flags just
once.

Adding make-tests tule to just compile tests without running them.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Link: http://lore.kernel.org/lkml/20210706151704.73662-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-07 11:41:58 -03:00
Toke Høiland-Jørgensen
af0efa050c libbpf: Restore errno return for functions that were already returning it
The update to streamline libbpf error reporting intended to change all
functions to return the errno as a negative return value if
LIBBPF_STRICT_DIRECT_ERRS is set. However, if the flag is *not* set, the
return value changes for the two functions that were already returning a
negative errno unconditionally: bpf_link__unpin() and perf_buffer__poll().

This is a user-visible API change that breaks applications; so let's revert
these two functions back to unconditionally returning a negative errno
value.

Fixes: e9fc3ce99b ("libbpf: Streamline error reporting for high-level APIs")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210706122355.236082-1-toke@redhat.com
2021-07-06 21:13:08 -07:00
Linus Torvalds
406254918b perf tools changes for v5.14:
Tools:
 
 - Add cgroup support for 'perf top' (-G).
 
 - Add support for KVM MSRs in 'perf kvm stat'
 
 - Support probes on init functions in 'perf probe', to support the
   bootconfig format.
 
 - Improve error reporting in 'perf probe'.
 
 - No need to synthesize BUILD_ID records in 'perf inject' if the MMAP2
   records have build ids already.
 
 - Allow toggling source code ('s' hotkey) in 'perf annotate' in all
   lines.
 
 - Add itrace options support to 'perf annotate'.
 
 - Support to custom DSO filters for 'perf script'.
 
 Hardware enablement:
 
 - Support the HYBRID_TOPOLOGY and HYBRID_CPU_PMU_CAPS features in the
   perf.data file header.
 
 - Support PMU prefix for mem-load and mem-store events, to support
   hybrid (BIG little) CPUs such as Intel's Alderlake.
 
 - Support hybrid CPUs in 'perf mem' and 'perf c2c'.
 
 Hardware tracing:
 
 - Intel PT now supports tracing KVM guests.
 
 - Timestamp improvements for ARM's Coresight.
 
 Build:
 
 - Add 'make -C tools/perf build-test' entries for libopencsd/CORESIGHT=1
   and libbpf/LIBBPF_DYNAMIC=1.
 
 - Use bison's --file-prefix-map option to avoid storing full paths when
   using O= in the perf build.
 
 Tests:
 
 - Improve the 'perf test' entries for libpfm4 and BPF counters.
 
 Misc:
 
 - Sync msr-index.h, mount.h, kvm headers with the kernel originals.
 
 - Add vendor events and metrics for Intel's Icelake Server & Client.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYN5WhAAKCRCyPKLppCJ+
 J3b9APwO5iDjSMvVgKT84njXo1EqURMz6nmV3kkjBkaMo0KK2wEAvXysIEgwx1cu
 hakfFw63ztxVQctcWShOzP7jnJOOwwg=
 =PpGz
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.14-2021-07-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "Tools:

   - Add cgroup support for 'perf top' (-G).

   - Add support for KVM MSRs in 'perf kvm stat'

   - Support probes on init functions in 'perf probe', to support the
     bootconfig format.

   - Improve error reporting in 'perf probe'.

   - No need to synthesize BUILD_ID records in 'perf inject' if the
     MMAP2 records have build ids already.

   - Allow toggling source code ('s' hotkey) in 'perf annotate' in all
     lines.

   - Add itrace options support to 'perf annotate'.

   - Support to custom DSO filters for 'perf script'.

  Hardware enablement:

   - Support the HYBRID_TOPOLOGY and HYBRID_CPU_PMU_CAPS features in the
     perf.data file header.

   - Support PMU prefix for mem-load and mem-store events, to support
     hybrid (BIG little) CPUs such as Intel's Alderlake.

   - Support hybrid CPUs in 'perf mem' and 'perf c2c'.

  Hardware tracing:

   - Intel PT now supports tracing KVM guests.

   - Timestamp improvements for ARM's Coresight.

  Build:

   - Add 'make -C tools/perf build-test' entries for
     libopencsd/CORESIGHT=1 and libbpf/LIBBPF_DYNAMIC=1.

   - Use bison's --file-prefix-map option to avoid storing full paths
     when using O= in the perf build.

  Tests:

   - Improve the 'perf test' entries for libpfm4 and BPF counters.

  Misc:

   - Sync msr-index.h, mount.h, kvm headers with the kernel originals.

   - Add vendor events and metrics for Intel's Icelake Server & Client"

* tag 'perf-tools-for-v5.14-2021-07-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (123 commits)
  perf session: Add missing evlist__delete when deleting a session
  perf annotate: Allow 's' on source code lines
  perf dlfilter: Add object_code() to perf_dlfilter_fns
  perf dlfilter: Add attr() to perf_dlfilter_fns
  perf dlfilter: Add srcline() to perf_dlfilter_fns
  perf dlfilter: Add insn() to perf_dlfilter_fns
  perf dlfilter: Add resolve_address() to perf_dlfilter_fns
  perf build: Install perf_dlfilter.h
  perf script: Add option to pass arguments to dlfilters
  perf script: Add option to list dlfilters
  perf script: Add dlfilter__filter_event_early()
  perf script: Add API for filtering via dynamically loaded shared object
  perf llvm: Return -ENOMEM when asprintf() fails
  perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
  tools headers UAPI: Synch KVM's svm.h header with the kernel
  tools kvm headers arm64: Update KVM headers from the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools include UAPI: Update linux/mount.h copy
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  ...
2021-07-02 12:24:31 -07:00
Linus Torvalds
dbe69e4337 Networking changes for 5.14.
Core:
 
  - BPF:
    - add syscall program type and libbpf support for generating
      instructions and bindings for in-kernel BPF loaders (BPF loaders
      for BPF), this is a stepping stone for signed BPF programs
    - infrastructure to migrate TCP child sockets from one listener
      to another in the same reuseport group/map to improve flexibility
      of service hand-off/restart
    - add broadcast support to XDP redirect
 
  - allow bypass of the lockless qdisc to improving performance
    (for pktgen: +23% with one thread, +44% with 2 threads)
 
  - add a simpler version of "DO_ONCE()" which does not require
    jump labels, intended for slow-path usage
 
  - virtio/vsock: introduce SOCK_SEQPACKET support
 
  - add getsocketopt to retrieve netns cookie
 
  - ip: treat lowest address of a IPv4 subnet as ordinary unicast address
        allowing reclaiming of precious IPv4 addresses
 
  - ipv6: use prandom_u32() for ID generation
 
  - ip: add support for more flexible field selection for hashing
        across multi-path routes (w/ offload to mlxsw)
 
  - icmp: add support for extended RFC 8335 PROBE (ping)
 
  - seg6: add support for SRv6 End.DT46 behavior
 
  - mptcp:
     - DSS checksum support (RFC 8684) to detect middlebox meddling
     - support Connection-time 'C' flag
     - time stamping support
 
  - sctp: packetization Layer Path MTU Discovery (RFC 8899)
 
  - xfrm: speed up state addition with seq set
 
  - WiFi:
     - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
     - aggregation handling improvements for some drivers
     - minstrel improvements for no-ack frames
     - deferred rate control for TXQs to improve reaction times
     - switch from round robin to virtual time-based airtime scheduler
 
  - add trace points:
     - tcp checksum errors
     - openvswitch - action execution, upcalls
     - socket errors via sk_error_report
 
 Device APIs:
 
  - devlink: add rate API for hierarchical control of max egress rate
             of virtual devices (VFs, SFs etc.)
 
  - don't require RCU read lock to be held around BPF hooks
    in NAPI context
 
  - page_pool: generic buffer recycling
 
 New hardware/drivers:
 
  - mobile:
     - iosm: PCIe Driver for Intel M.2 Modem
     - support for Qualcomm MSM8998 (ipa)
 
  - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
 
  - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
 
  - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
 
  - NXP SJA1110 Automotive Ethernet 10-port switch
 
  - Qualcomm QCA8327 switch support (qca8k)
 
  - Mikrotik 10/25G NIC (atl1c)
 
 Driver changes:
 
  - ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP
    (our first foray into MAC/PHY description via ACPI)
 
  - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
 
  - Mellanox/Nvidia NIC (mlx5)
    - NIC VF offload of L2 bridging
    - support IRQ distribution to Sub-functions
 
  - Marvell (prestera):
     - add flower and match all
     - devlink trap
     - link aggregation
 
  - Netronome (nfp): connection tracking offload
 
  - Intel 1GE (igc): add AF_XDP support
 
  - Marvell DPU (octeontx2): ingress ratelimit offload
 
  - Google vNIC (gve): new ring/descriptor format support
 
  - Qualcomm mobile (rmnet & ipa): inline checksum offload support
 
  - MediaTek WiFi (mt76)
     - mt7915 MSI support
     - mt7915 Tx status reporting
     - mt7915 thermal sensors support
     - mt7921 decapsulation offload
     - mt7921 enable runtime pm and deep sleep
 
  - Realtek WiFi (rtw88)
     - beacon filter support
     - Tx antenna path diversity support
     - firmware crash information via devcoredump
 
  - Qualcomm 60GHz WiFi (wcn36xx)
     - Wake-on-WLAN support with magic packets and GTK rekeying
 
  - Micrel PHY (ksz886x/ksz8081): add cable test support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDb+fUACgkQMUZtbf5S
 Irs2Jg//aqN0Q8CgIvYCVhPxQw1tY7pTAbgyqgBZ01vwjyvtIOgJiWzSfFEU84mX
 M8fcpFX5eTKrOyJ9S6UFfQ/JG114n3hjAxFFT4Hxk2gC1Tg0vHuFQTDHcUl28bUE
 mTm61e1YpdorILnv2k5JVQ/wu0vs5QKDrjcYcrcPnh+j93wvnPOgAfDBV95nZzjS
 OTt4q2fR8GzLcSYWWsclMbDNkzyTG50RW/0Yd6aGjr5QGvXfrMeXfUJNz533PMf/
 w5lNyjRKv+x9mdTZJzU0+msNUrZgUdRz7W8Ey8lD3hJZRE+D6/uU7FtsE8Mi3+uc
 HWxeZUyzA3YF1MfVl/eesbxyPT7S/OkLzk4O5B35FbqP0YltaP+bOjq1/nM3ce1/
 io9Dx9pIl/2JANUgRCAtLi8Z2dkvRoqTaBxZ/nPudCCljFwDwl6joTMJ7Ow22i5Y
 5aIkcXFmZq4LbJDiHvbTlqT7yiuaEvu2UK/23bSIg/K3nF4eAmkY9Y1EgiMf60OF
 78Ttw0wk2tUegwaS5MZnCniKBKDyl9gM2F6rbZ/IxQRR2LTXFc1B6gC+ynUxgXfh
 Ub8O++6qGYGYZ0XvQH4pzco79p3qQWBTK5beIp2eu6BOAjBVIXq4AibUfoQLACsu
 hX7jMPYd0kc3WFgUnKgQP8EnjFSwbf4XiaE7fIXvWBY8hzCw2h4=
 =LvtX
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - BPF:
      - add syscall program type and libbpf support for generating
        instructions and bindings for in-kernel BPF loaders (BPF loaders
        for BPF), this is a stepping stone for signed BPF programs
      - infrastructure to migrate TCP child sockets from one listener to
        another in the same reuseport group/map to improve flexibility
        of service hand-off/restart
      - add broadcast support to XDP redirect

   - allow bypass of the lockless qdisc to improving performance (for
     pktgen: +23% with one thread, +44% with 2 threads)

   - add a simpler version of "DO_ONCE()" which does not require jump
     labels, intended for slow-path usage

   - virtio/vsock: introduce SOCK_SEQPACKET support

   - add getsocketopt to retrieve netns cookie

   - ip: treat lowest address of a IPv4 subnet as ordinary unicast
     address allowing reclaiming of precious IPv4 addresses

   - ipv6: use prandom_u32() for ID generation

   - ip: add support for more flexible field selection for hashing
     across multi-path routes (w/ offload to mlxsw)

   - icmp: add support for extended RFC 8335 PROBE (ping)

   - seg6: add support for SRv6 End.DT46 behavior

   - mptcp:
      - DSS checksum support (RFC 8684) to detect middlebox meddling
      - support Connection-time 'C' flag
      - time stamping support

   - sctp: packetization Layer Path MTU Discovery (RFC 8899)

   - xfrm: speed up state addition with seq set

   - WiFi:
      - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
      - aggregation handling improvements for some drivers
      - minstrel improvements for no-ack frames
      - deferred rate control for TXQs to improve reaction times
      - switch from round robin to virtual time-based airtime scheduler

   - add trace points:
      - tcp checksum errors
      - openvswitch - action execution, upcalls
      - socket errors via sk_error_report

  Device APIs:

   - devlink: add rate API for hierarchical control of max egress rate
     of virtual devices (VFs, SFs etc.)

   - don't require RCU read lock to be held around BPF hooks in NAPI
     context

   - page_pool: generic buffer recycling

  New hardware/drivers:

   - mobile:
      - iosm: PCIe Driver for Intel M.2 Modem
      - support for Qualcomm MSM8998 (ipa)

   - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices

   - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches

   - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)

   - NXP SJA1110 Automotive Ethernet 10-port switch

   - Qualcomm QCA8327 switch support (qca8k)

   - Mikrotik 10/25G NIC (atl1c)

  Driver changes:

   - ACPI support for some MDIO, MAC and PHY devices from Marvell and
     NXP (our first foray into MAC/PHY description via ACPI)

   - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx

   - Mellanox/Nvidia NIC (mlx5)
      - NIC VF offload of L2 bridging
      - support IRQ distribution to Sub-functions

   - Marvell (prestera):
      - add flower and match all
      - devlink trap
      - link aggregation

   - Netronome (nfp): connection tracking offload

   - Intel 1GE (igc): add AF_XDP support

   - Marvell DPU (octeontx2): ingress ratelimit offload

   - Google vNIC (gve): new ring/descriptor format support

   - Qualcomm mobile (rmnet & ipa): inline checksum offload support

   - MediaTek WiFi (mt76)
      - mt7915 MSI support
      - mt7915 Tx status reporting
      - mt7915 thermal sensors support
      - mt7921 decapsulation offload
      - mt7921 enable runtime pm and deep sleep

   - Realtek WiFi (rtw88)
      - beacon filter support
      - Tx antenna path diversity support
      - firmware crash information via devcoredump

   - Qualcomm WiFi (wcn36xx)
      - Wake-on-WLAN support with magic packets and GTK rekeying

   - Micrel PHY (ksz886x/ksz8081): add cable test support"

* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
  tcp: change ICSK_CA_PRIV_SIZE definition
  tcp_yeah: check struct yeah size at compile time
  gve: DQO: Fix off by one in gve_rx_dqo()
  stmmac: intel: set PCI_D3hot in suspend
  stmmac: intel: Enable PHY WOL option in EHL
  net: stmmac: option to enable PHY WOL with PMT enabled
  net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
  net: use netdev_info in ndo_dflt_fdb_{add,del}
  ptp: Set lookup cookie when creating a PTP PPS source.
  net: sock: add trace for socket errors
  net: sock: introduce sk_error_report
  net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
  net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
  net: dsa: include fdb entries pointing to bridge in the host fdb list
  net: dsa: include bridge addresses which are local in the host fdb list
  net: dsa: sync static FDB entries on foreign interfaces to hardware
  net: dsa: install the host MDB and FDB entries in the master's RX filter
  net: dsa: reference count the FDB addresses at the cross-chip notifier level
  net: dsa: introduce a separate cross-chip notifier type for host FDBs
  net: dsa: reference count the MDB entries at the cross-chip notifier level
  ...
2021-06-30 15:51:09 -07:00
Alexey Bayduraev
f20510d552 tools lib: Adopt bitmap_intersects() operation from the kernel sources
Adopt bitmap_intersects() routine that tests whether bitmaps bitmap1 and
bitmap2 intersects. This routine will be used during thread masks
initialization.

Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Link: http://lore.kernel.org/lkml/f75aa738d8ff8f9cffd7532d671f3ef3deb97a7c.1625065643.git.alexey.v.bayduraev@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-30 15:28:00 -03:00
Linus Torvalds
36824f198c ARM:
- Add MTE support in guests, complete with tag save/restore interface
 
 - Reduce the impact of CMOs by moving them in the page-table code
 
 - Allow device block mappings at stage-2
 
 - Reduce the footprint of the vmemmap in protected mode
 
 - Support the vGIC on dumb systems such as the Apple M1
 
 - Add selftest infrastructure to support multiple configuration
   and apply that to PMU/non-PMU setups
 
 - Add selftests for the debug architecture
 
 - The usual crop of PMU fixes
 
 PPC:
 
 - Support for the H_RPT_INVALIDATE hypercall
 
 - Conversion of Book3S entry/exit to C
 
 - Bug fixes
 
 S390:
 
 - new HW facilities for guests
 
 - make inline assembly more robust with KASAN and co
 
 x86:
 
 - Allow userspace to handle emulation errors (unknown instructions)
 
 - Lazy allocation of the rmap (host physical -> guest physical address)
 
 - Support for virtualizing TSC scaling on VMX machines
 
 - Optimizations to avoid shattering huge pages at the beginning of live migration
 
 - Support for initializing the PDPTRs without loading them from memory
 
 - Many TLB flushing cleanups
 
 - Refuse to load if two-stage paging is available but NX is not (this has
   been a requirement in practice for over a year)
 
 - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
   CR0/CR4/EFER, using the MMU mode everywhere once it is computed
   from the CPU registers
 
 - Use PM notifier to notify the guest about host suspend or hibernate
 
 - Support for passing arguments to Hyper-V hypercalls using XMM registers
 
 - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on
   AMD processors
 
 - Hide Hyper-V hypercalls that are not included in the guest CPUID
 
 - Fixes for live migration of virtual machines that use the Hyper-V
   "enlightened VMCS" optimization of nested virtualization
 
 - Bugfixes (not many)
 
 Generic:
 
 - Support for retrieving statistics without debugfs
 
 - Cleanups for the KVM selftests API
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV9UYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOIRgf/XX8fKLh24RnTOs2ldIu2AfRGVrT4
 QMrr8MxhmtukBAszk2xKvBt8/6gkUjdaIC3xqEnVjxaDaUvZaEtP7CQlF5JV45rn
 iv1zyxUKucXrnIOr+gCioIT7qBlh207zV35ArKioP9Y83cWx9uAs22pfr6g+7RxO
 h8bJZlJbSG6IGr3voANCIb9UyjU1V/l8iEHqRwhmr/A5rARPfD7g8lfMEQeGkzX6
 +/UydX2fumB3tl8e2iMQj6vLVdSOsCkehvpHK+Z33EpkKhan7GwZ2sZ05WmXV/nY
 QLAYfD10KegoNWl5Ay4GTp4hEAIYVrRJCLC+wnLdc0U8udbfCuTC31LK4w==
 =NcRh
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This covers all architectures (except MIPS) so I don't expect any
  other feature pull requests this merge window.

  ARM:

   - Add MTE support in guests, complete with tag save/restore interface

   - Reduce the impact of CMOs by moving them in the page-table code

   - Allow device block mappings at stage-2

   - Reduce the footprint of the vmemmap in protected mode

   - Support the vGIC on dumb systems such as the Apple M1

   - Add selftest infrastructure to support multiple configuration and
     apply that to PMU/non-PMU setups

   - Add selftests for the debug architecture

   - The usual crop of PMU fixes

  PPC:

   - Support for the H_RPT_INVALIDATE hypercall

   - Conversion of Book3S entry/exit to C

   - Bug fixes

  S390:

   - new HW facilities for guests

   - make inline assembly more robust with KASAN and co

  x86:

   - Allow userspace to handle emulation errors (unknown instructions)

   - Lazy allocation of the rmap (host physical -> guest physical
     address)

   - Support for virtualizing TSC scaling on VMX machines

   - Optimizations to avoid shattering huge pages at the beginning of
     live migration

   - Support for initializing the PDPTRs without loading them from
     memory

   - Many TLB flushing cleanups

   - Refuse to load if two-stage paging is available but NX is not (this
     has been a requirement in practice for over a year)

   - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
     CR0/CR4/EFER, using the MMU mode everywhere once it is computed
     from the CPU registers

   - Use PM notifier to notify the guest about host suspend or hibernate

   - Support for passing arguments to Hyper-V hypercalls using XMM
     registers

   - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap
     on AMD processors

   - Hide Hyper-V hypercalls that are not included in the guest CPUID

   - Fixes for live migration of virtual machines that use the Hyper-V
     "enlightened VMCS" optimization of nested virtualization

   - Bugfixes (not many)

  Generic:

   - Support for retrieving statistics without debugfs

   - Cleanups for the KVM selftests API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits)
  KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
  kvm: x86: disable the narrow guest module parameter on unload
  selftests: kvm: Allows userspace to handle emulation errors.
  kvm: x86: Allow userspace to handle emulation errors
  KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
  KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
  KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
  KVM: x86: Enhance comments for MMU roles and nested transition trickiness
  KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
  KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
  KVM: x86/mmu: Use MMU's role to determine PTTYPE
  KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
  KVM: x86/mmu: Add a helper to calculate root from role_regs
  KVM: x86/mmu: Add helper to update paging metadata
  KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
  KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
  KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
  KVM: x86/mmu: Get nested MMU's root level from the MMU's role
  ...
2021-06-28 15:40:51 -07:00
David S. Miller
e1289cfb63 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-06-28

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).

The main changes are:

1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.

2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.

3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.

4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.

5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.

6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.

7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.

8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.

9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.

10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.

11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.

12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28 15:28:03 -07:00
Paolo Bonzini
b8917b4ae4 KVM/arm64 updates for v5.14.
- Add MTE support in guests, complete with tag save/restore interface
 - Reduce the impact of CMOs by moving them in the page-table code
 - Allow device block mappings at stage-2
 - Reduce the footprint of the vmemmap in protected mode
 - Support the vGIC on dumb systems such as the Apple M1
 - Add selftest infrastructure to support multiple configuration
   and apply that to PMU/non-PMU setups
 - Add selftests for the debug architecture
 - The usual crop of PMU fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmDV2bEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDEr8P/ivwROx5NwGcHGmU5RfUCT3aFqhtVHHwD/lu
 jPcgoO61kz9TelOu6QRaVuK+mVHxcq3iP4R8nPq/QCkUlEXTmK2xkyhXhGXSYpH4
 6jM8+BbC3eG7iAxx6H0UM4JTl4Riwat6ZZtXpWEWs9TKqOHOQYFpMkxSttwVZ1CZ
 SjbtFvXLEdzKn6PzUWnKdBNMV/mHsdAtohZit9oJOc4ttc8072XxETQ4TFQ+MSvA
 j9zY9QPmWzgcZnotqRRu9sbTGO2vxtXuUtY3sjdD8+C9OgSe9qvpnNjymcmfwaMu
 1fBkfh65oaO4ItJBdGOUOoEcFqwN5imPiI7CB/O+ZYkO9sBCuTUPSQwPkyiwXb9r
 bUkTaQw2nZiNWsqR1x07fQ2sGYbMp5mnmgmqiV4MUWkLmFp9LZATCWYTTn24cBNS
 6SjVP6/8S0r3EhLnYjH0Pn1we5PooU1EF6RlCAd3ewYoo+9fPnwjNYwIWH5i5wB7
 +tnei44NACAw9cfbos+BYQQ/dY15OSFzLzIMomlabB7OpXOdDg3H6tJnPbFwWwXb
 9nF8XdHqxeDVVVrDCAx1BSodSXm9xqgnQM2RDGTUnpVcAfqAr3MXX6VsyKQDzj8T
 QXF9qOVCBAABv6BXAvSQ6mvMJZDUVbUPEPhf7kXzF46JsRd6A7wWoU/OnMGHQ/w7
 wjvH8HVy
 =fWBV
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for v5.14.

- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
  and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
2021-06-25 11:24:24 -04:00
Sean Christopherson
167f8a5cae KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans
Rename "nxe" to "efer_nx" so that future macro magic can use the pattern
<reg>_<bit> for all CR0, CR4, and EFER bits that included in the role.
Using "efer_nx" also makes it clear that the role bit reflects EFER.NX,
not the NX bit in the corresponding PTE.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-25-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:41 -04:00
Kumar Kartikeya Dwivedi
ee62a5c6bb libbpf: Switch to void * casting in netlink helpers
Netlink helpers I added in 8bbb77b7c7 ("libbpf: Add various netlink
helpers") used char * casts everywhere, and there were a few more that
existed from before.

Convert all of them to void * cast, as it is treated equivalently by
clang/gcc for the purposes of pointer arithmetic and to follow the
convention elsewhere in the kernel/libbpf.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210619041454.417577-2-memxor@gmail.com
2021-06-22 17:04:02 +02:00
Kumar Kartikeya Dwivedi
0ae64fb6b6 libbpf: Add request buffer type for netlink messages
Coverity complains about OOB writes to nlmsghdr. There is no OOB as we
write to the trailing buffer, but static analyzers and compilers may
rightfully be confused as the nlmsghdr pointer has subobject provenance
(and hence subobject bounds).

Fix this by using an explicit request structure containing the nlmsghdr,
struct tcmsg/ifinfomsg, and attribute buffer.

Also switch nh_tail (renamed to req_tail) to cast req * to char * so
that it can be understood as arithmetic on pointer to the representation
array (hence having same bound as request structure), which should
further appease analyzers.

As a bonus, callers don't have to pass sizeof(req) all the time now, as
size is implicitly obtained using the pointer. While at it, also reduce
the size of attribute buffer to 128 bytes (132 for ifinfomsg using
functions due to the padding).

Summary of problem:

  Even though C standard allows interconvertibility of pointer to first
  member and pointer to struct, for the purposes of alias analysis it
  would still consider the first as having pointer value "pointer to T"
  where T is type of first member hence having subobject bounds,
  allowing analyzers within reason to complain when object is accessed
  beyond the size of pointed to object.

  The only exception to this rule may be when a char * is formed to a
  member subobject. It is not possible for the compiler to be able to
  tell the intent of the programmer that it is a pointer to member
  object or the underlying representation array of the containing
  object, so such diagnosis is suppressed.

Fixes: 715c5ce454 ("libbpf: Add low level TC-BPF management API")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210619041454.417577-1-memxor@gmail.com
2021-06-22 17:03:52 +02:00
Jonathan Edwards
5c10a3dbe9 libbpf: Add extra BPF_PROG_TYPE check to bpf_object__probe_loading
eBPF has been backported for RHEL 7 w/ kernel 3.10-940+ [0]. However only
the following program types are supported [1]:

  BPF_PROG_TYPE_KPROBE
  BPF_PROG_TYPE_TRACEPOINT
  BPF_PROG_TYPE_PERF_EVENT

For libbpf this causes an EINVAL return during the bpf_object__probe_loading
call which only checks to see if programs of type BPF_PROG_TYPE_SOCKET_FILTER
can load.

The following will try BPF_PROG_TYPE_TRACEPOINT as a fallback attempt before
erroring out. BPF_PROG_TYPE_KPROBE was not a good candidate because on some
kernels it requires knowledge of the LINUX_VERSION_CODE.

  [0] https://www.redhat.com/en/blog/introduction-ebpf-red-hat-enterprise-linux-7
  [1] https://access.redhat.com/articles/3550581

Signed-off-by: Jonathan Edwards <jonathan.edwards@165gc.onmicrosoft.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210619151007.GA6963@165gc.onmicrosoft.com
2021-06-21 17:21:07 +02:00
Jakub Kicinski
adc2e56ebe Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18 19:47:02 -07:00
Grant Seltzer
f42cfb469f bpf: Add documentation for libbpf including API autogen
This patch is meant to start the initiative to document libbpf.
It includes .rst files which are text documentation describing building,
API naming convention, as well as an index to generated API documentation.

In this approach the generated API documentation is enabled by the kernels
existing kernel documentation system which uses sphinx. The resulting docs
would then be synced to kernel.org/doc

You can test this by running `make htmldocs` and serving the html in
Documentation/output. Since libbpf does not yet have comments in kernel
doc format, see kernel.org/doc/html/latest/doc-guide/kernel-doc.html for
an example so you can test this.

The advantage of this approach is to use the existing sphinx
infrastructure that the kernel has, and have libbpf docs in
the same place as everything else.

The current plan is to have the libbpf mirror sync the generated docs
and version them based on the libbpf releases which are cut on github.

This patch includes the addition of libbpf_api.rst which pulls comment
documentation from header files in libbpf under tools/lib/bpf/. The comment
docs would be of the standard kernel doc format.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210618140459.9887-2-grantseltzer@gmail.com
2021-06-18 23:06:06 +02:00
David S. Miller
a52171ae7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-06-17

The following pull-request contains BPF updates for your *net-next* tree.

We've added 50 non-merge commits during the last 25 day(s) which contain
a total of 148 files changed, 4779 insertions(+), 1248 deletions(-).

The main changes are:

1) BPF infrastructure to migrate TCP child sockets from a listener to another
   in the same reuseport group/map, from Kuniyuki Iwashima.

2) Add a provably sound, faster and more precise algorithm for tnum_mul() as
   noted in https://arxiv.org/abs/2105.05398, from Harishankar Vishwanathan.

3) Streamline error reporting changes in libbpf as planned out in the
   'libbpf: the road to v1.0' effort, from Andrii Nakryiko.

4) Add broadcast support to xdp_redirect_map(), from Hangbin Liu.

5) Extends bpf_map_lookup_and_delete_elem() functionality to 4 more map
   types, that is, {LRU_,PERCPU_,LRU_PERCPU_,}HASH, from Denis Salopek.

6) Support new LLVM relocations in libbpf to make them more linker friendly,
   also add a doc to describe the BPF backend relocations, from Yonghong Song.

7) Silence long standing KUBSAN complaints on register-based shifts in
   interpreter, from Daniel Borkmann and Eric Biggers.

8) Add dummy PT_REGS macros in libbpf to fail BPF program compilation when
   target arch cannot be determined, from Lorenz Bauer.

9) Extend AF_XDP to support large umems with 1M+ pages, from Magnus Karlsson.

10) Fix two minor libbpf tc BPF API issues, from Kumar Kartikeya Dwivedi.

11) Move libbpf BPF_SEQ_PRINTF/BPF_SNPRINTF macros that can be used by BPF
    programs to bpf_helpers.h header, from Florent Revest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17 11:54:56 -07:00
Lorenz Bauer
4a638d581a libbpf: Fail compilation if target arch is missing
bpf2go is the Go equivalent of libbpf skeleton. The convention is that
the compiled BPF is checked into the repository to facilitate distributing
BPF as part of Go packages. To make this portable, bpf2go by default
generates both bpfel and bpfeb variants of the C.

Using bpf_tracing.h is inherently non-portable since the fields of
struct pt_regs differ between platforms, so CO-RE can't help us here.
The only way of working around this is to compile for each target
platform independently. bpf2go can't do this by default since there
are too many platforms.

Define the various PT_... macros when no target can be determined and
turn them into compilation failures. This works because bpf2go always
compiles for bpf targets, so the compiler fallback doesn't kick in.
Conditionally define __BPF_MISSING_TARGET so that we can inject a
more appropriate error message at build time. The user can then
choose which platform to target explicitly.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210616083635.11434-1-lmb@cloudflare.com
2021-06-16 20:15:30 -07:00
Kuniyuki Iwashima
50501271e7 libbpf: Set expected_attach_type for BPF_PROG_TYPE_SK_REUSEPORT.
This commit introduces a new section (sk_reuseport/migrate) and sets
expected_attach_type to two each section in BPF_PROG_TYPE_SK_REUSEPORT
program.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210612123224.12525-11-kuniyu@amazon.co.jp
2021-06-15 18:01:06 +02:00
Kumar Kartikeya Dwivedi
bbf29d3a2e libbpf: Set NLM_F_EXCL when creating qdisc
This got lost during the refactoring across versions. We always use
NLM_F_EXCL when creating some TC object, so reflect what the function
says and set the flag.

Fixes: 715c5ce454 ("libbpf: Add low level TC-BPF management API")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210612023502.1283837-3-memxor@gmail.com
2021-06-15 14:00:30 +02:00
Kumar Kartikeya Dwivedi
4e164f8716 libbpf: Remove unneeded check for flags during tc detach
Coverity complained about this being unreachable code. It is right
because we already enforce flags to be unset, so a check validating
the flag value is redundant.

Fixes: 715c5ce454 ("libbpf: Add low level TC-BPF management API")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210612023502.1283837-2-memxor@gmail.com
2021-06-15 13:58:56 +02:00
Wang Hai
3b3af91cb6 libbpf: Simplify the return expression of bpf_object__init_maps function
There is no need for special treatment of the 'ret == 0' case.
This patch simplifies the return expression.

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210609115651.3392580-1-wanghai38@huawei.com
2021-06-11 15:30:52 -07:00
Michal Suchanek
edc0571c5f libbpf: Fix pr_warn type warnings on 32bit
The printed value is ptrdiff_t and is formatted wiht %ld. This works on
64bit but produces a warning on 32bit. Fix the format specifier to %td.

Fixes: 6723474373 ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210604112448.32297-1-msuchanek@suse.de
2021-06-08 22:02:10 +02:00
Kev Jackson
11fc79fc9f libbpf: Fixes incorrect rx_ring_setup_done
When calling xsk_socket__create_shared(), the logic at line 1097 marks a
boolean flag true within the xsk_umem structure to track setup progress
in order to support multiple calls to the function.  However, instead of
marking umem->tx_ring_setup_done, the code incorrectly sets
umem->rx_ring_setup_done.  This leads to improper behaviour when
creating and destroying xsk and umem structures.

Multiple calls to this function is documented as supported.

Fixes: ca7a83e248 ("libbpf: Only create rx and tx XDP rings when necessary")
Signed-off-by: Kev Jackson <foamdino@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/YL4aU4f3Aaik7CN0@linux-dev
2021-06-07 17:44:03 -07:00
Andrii Nakryiko
7d8a819dd3 libbpf: Install skel_internal.h header used from light skeletons
Light skeleton code assumes skel_internal.h header to be installed system-wide
by libbpf package. Make sure it is actually installed.

Fixes: 6723474373 ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-4-andrii@kernel.org
2021-06-03 15:50:11 +02:00
Andrii Nakryiko
232c9e8bd5 libbpf: Refactor header installation portions of Makefile
As we gradually get more headers that have to be installed, it's quite
annoying to copy/paste long $(call) commands. So extract that logic and do
a simple $(foreach) over the list of headers.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-3-andrii@kernel.org
2021-06-03 15:50:11 +02:00
Andrii Nakryiko
16cac00606 libbpf: Move few APIs from 0.4 to 0.5 version
Official libbpf 0.4 release doesn't include three APIs that were tentatively
put into 0.4 section. Fix libbpf.map and move these three APIs:

  - bpf_map__initial_value;
  - bpf_map_lookup_and_delete_elem_flags;
  - bpf_object__gen_loader.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-2-andrii@kernel.org
2021-06-03 15:50:05 +02:00
Jakub Kicinski
5ada57a9a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
cdc-wdm: s/kill_urbs/poison_urbs/ to fix build

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 09:55:10 -07:00
Florent Revest
d6a6a55518 libbpf: Move BPF_SEQ_PRINTF and BPF_SNPRINTF to bpf_helpers.h
These macros are convenient wrappers around the bpf_seq_printf and
bpf_snprintf helpers. They are currently provided by bpf_tracing.h which
targets low level tracing primitives. bpf_helpers.h is a better fit.

The __bpf_narg and __bpf_apply are needed in both files and provided
twice. __bpf_empty isn't used anywhere and is removed from bpf_tracing.h

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210526164643.2881368-1-revest@chromium.org
2021-05-26 10:45:41 -07:00
Andrii Nakryiko
e9fc3ce99b libbpf: Streamline error reporting for high-level APIs
Implement changes to error reporting for high-level libbpf APIs to make them
less surprising and less error-prone to users:
  - in all the cases when error happens, errno is set to an appropriate error
    value;
  - in libbpf 1.0 mode, all pointer-returning APIs return NULL on error and
    error code is communicated through errno; this applies both to APIs that
    already returned NULL before (so now they communicate more detailed error
    codes), as well as for many APIs that used ERR_PTR() macro and encoded
    error numbers as fake pointers.
  - in legacy (default) mode, those APIs that were returning ERR_PTR(err),
    continue doing so, but still set errno.

With these changes, errno can be always used to extract actual error,
regardless of legacy or libbpf 1.0 modes. This is utilized internally in
libbpf in places where libbpf uses it's own high-level APIs.
libbpf_get_error() is adapted to handle both cases completely transparently to
end-users (and is used by libbpf consistently as well).

More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-5-andrii@kernel.org
2021-05-25 17:32:35 -07:00
Andrii Nakryiko
f12b654327 libbpf: Streamline error reporting for low-level APIs
Ensure that low-level APIs behave uniformly across the libbpf as follows:
  - in case of an error, errno is always set to the correct error code;
  - when libbpf 1.0 mode is enabled with LIBBPF_STRICT_DIRECT_ERRS option to
    libbpf_set_strict_mode(), return -Exxx error value directly, instead of -1;
  - by default, until libbpf 1.0 is released, keep returning -1 directly.

More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-4-andrii@kernel.org
2021-05-25 17:32:35 -07:00
Andrii Nakryiko
5981881d21 libbpf: Add libbpf_set_strict_mode() API to turn on libbpf 1.0 behaviors
Add libbpf_set_strict_mode() API that allows application to simulate libbpf
1.0 breaking changes before libbpf 1.0 is released. This will help users
migrate gradually and with confidence.

For now only ALL or NONE options are available, subsequent patches will add
more flags. This patch is preliminary for selftests/bpf changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-2-andrii@kernel.org
2021-05-25 17:32:34 -07:00
Yonghong Song
9f0c317f6a libbpf: Add support for new llvm bpf relocations
LLVM patch https://reviews.llvm.org/D102712
narrowed the scope of existing R_BPF_64_64
and R_BPF_64_32 relocations, and added three
new relocations, R_BPF_64_ABS64, R_BPF_64_ABS32
and R_BPF_64_NODYLD32. The main motivation is
to make relocations linker friendly.

This change, unfortunately, breaks libbpf build,
and we will see errors like below:
  libbpf: ELF relo #0 in section #6 has unexpected type 2 in
     /home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o
  Error: failed to link
     '/home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o':
     Unknown error -22 (-22)
The new relocation R_BPF_64_ABS64 is generated
and libbpf linker sanity check doesn't understand it.
Relocation section '.rel.struct_ops' at offset 0x1410 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000018  0000000700000002 R_BPF_64_ABS64         0000000000000000 nogpltcp_init

Look at the selftests/bpf/bpf_tcp_nogpl.c,
  void BPF_STRUCT_OPS(nogpltcp_init, struct sock *sk)
  {
  }

  SEC(".struct_ops")
  struct tcp_congestion_ops bpf_nogpltcp = {
          .init           = (void *)nogpltcp_init,
          .name           = "bpf_nogpltcp",
  };
The new llvm relocation scheme categorizes 'nogpltcp_init' reference
as R_BPF_64_ABS64 instead of R_BPF_64_64 which is used to specify
ld_imm64 relocation in the new scheme.

Let us fix the linker sanity checking by including
R_BPF_64_ABS64 and R_BPF_64_ABS32. There is no need to
check R_BPF_64_NODYLD32 which is used for .BTF and .BTF.ext.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210522162341.3687617-1-yhs@fb.com
2021-05-24 21:03:05 -07:00
Denis Salopek
d59b9f2d1b bpf: Extend libbpf with bpf_map_lookup_and_delete_elem_flags
Add bpf_map_lookup_and_delete_elem_flags() libbpf API in order to use
the BPF_F_LOCK flag with the map_lookup_and_delete_elem() function.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/15b05dafe46c7e0750d110f233977372029d1f62.1620763117.git.denis.salopek@sartura.hr
2021-05-24 13:30:51 -07:00
Stanislav Fomichev
f9bceaa59c libbpf: Skip bpf_object__probe_loading for light skeleton
I'm getting the following error when running 'gen skeleton -L' as
regular user:

libbpf: Error in bpf_object__probe_loading():Operation not permitted(1).
Couldn't load trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value.

Fixes: 6723474373 ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210521030653.2626513-1-sdf@google.com
2021-05-24 12:22:09 -07:00
Alexei Starovoitov
5d67f34959 bpf: Add cmd alias BPF_PROG_RUN
Add BPF_PROG_RUN command as an alias to BPF_RPOG_TEST_RUN to better
indicate the full range of use cases done by the command.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210519014032.20908-1-alexei.starovoitov@gmail.com
2021-05-19 15:35:12 +02:00
Alexei Starovoitov
7723256bf2 libbpf: Introduce bpf_map__initial_value().
Introduce bpf_map__initial_value() to read initial contents
of mmaped data/rodata/bss maps.
Note that bpf_map__set_initial_value() doesn't allow modifying
kconfig map while bpf_map__initial_value() allows reading
its values.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-17-alexei.starovoitov@gmail.com
2021-05-19 00:40:56 +02:00
Alexei Starovoitov
30f51aedab libbpf: Cleanup temp FDs when intermediate sys_bpf fails.
Fix loader program to close temporary FDs when intermediate
sys_bpf command fails.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-16-alexei.starovoitov@gmail.com
2021-05-19 00:40:44 +02:00
Alexei Starovoitov
6723474373 libbpf: Generate loader program out of BPF ELF file.
The BPF program loading process performed by libbpf is quite complex
and consists of the following steps:
"open" phase:
- parse elf file and remember relocations, sections
- collect externs and ksyms including their btf_ids in prog's BTF
- patch BTF datasec (since llvm couldn't do it)
- init maps (old style map_def, BTF based, global data map, kconfig map)
- collect relocations against progs and maps
"load" phase:
- probe kernel features
- load vmlinux BTF
- resolve externs (kconfig and ksym)
- load program BTF
- init struct_ops
- create maps
- apply CO-RE relocations
- patch ld_imm64 insns with src_reg=PSEUDO_MAP, PSEUDO_MAP_VALUE, PSEUDO_BTF_ID
- reposition subprograms and adjust call insns
- sanitize and load progs

During this process libbpf does sys_bpf() calls to load BTF, create maps,
populate maps and finally load programs.
Instead of actually doing the syscalls generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of single map with:
- union bpf_attr(s)
- BTF bytes
- map value bytes
- insns bytes
and single bpf program that passes bpf_attr(s) and data into bpf_sys_bpf() helper.
Executing such "loader program" via bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done which will result
the same maps created and programs loaded as specified in the elf file.
The "loader program" removes libelf and majority of libbpf dependency from
program loading process.

kconfig, typeless ksym, struct_ops and CO-RE are not supported yet.

The order of relocate_data and relocate_calls had to change, so that
bpf_gen__prog_load() can see all relocations for a given program with
correct insn_idx-es.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-15-alexei.starovoitov@gmail.com
2021-05-19 00:39:40 +02:00
Alexei Starovoitov
e2fa0156a4 libbpf: Preliminary support for fd_idx
Prep libbpf to use FD_IDX kernel feature when generating loader program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-14-alexei.starovoitov@gmail.com
2021-05-19 00:33:41 +02:00
Alexei Starovoitov
9ca1f56aba libbpf: Add bpf_object pointer to kernel_supports().
Add a pointer to 'struct bpf_object' to kernel_supports() helper.
It will be used in the next patch.
No functional changes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-13-alexei.starovoitov@gmail.com
2021-05-19 00:33:40 +02:00
Alexei Starovoitov
b126882672 libbpf: Change the order of data and text relocations.
In order to be able to generate loader program in the later
patches change the order of data and text relocations.
Also improve the test to include data relos.

If the kernel supports "FD array" the map_fd relocations can be processed
before text relos since generated loader program won't need to manually
patch ld_imm64 insns with map_fd.
But ksym and kfunc relocations can only be processed after all calls
are relocated, since loader program will consist of a sequence
of calls to bpf_btf_find_by_name_kind() followed by patching of btf_id
and btf_obj_fd into corresponding ld_imm64 insns. The locations of those
ld_imm64 insns are specified in relocations.
Hence process all data relocations (maps, ksym, kfunc) together after call relos.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-12-alexei.starovoitov@gmail.com
2021-05-19 00:33:40 +02:00
Alexei Starovoitov
5452fc9a17 libbpf: Support for syscall program type
Trivial support for syscall program type.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-5-alexei.starovoitov@gmail.com
2021-05-19 00:33:40 +02:00
Kumar Kartikeya Dwivedi
715c5ce454 libbpf: Add low level TC-BPF management API
This adds functions that wrap the netlink API used for adding, manipulating,
and removing traffic control filters.

The API summary:

A bpf_tc_hook represents a location where a TC-BPF filter can be attached.
This means that creating a hook leads to creation of the backing qdisc,
while destruction either removes all filters attached to a hook, or destroys
qdisc if requested explicitly (as discussed below).

The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
query, and detach tc filters. All functions return 0 on success, and a
negative error code on failure.

bpf_tc_hook_create - Create a hook
Parameters:
	@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
		proper enum constant. Note that parent must be unset when
		attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
		that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
		valid value for attach_point.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.

bpf_tc_hook_destroy - Destroy a hook
Parameters:
	@hook - Cannot be NULL. The behaviour depends on value of
		attach_point. If BPF_TC_INGRESS, all filters attached to
		the ingress hook will be detached. If BPF_TC_EGRESS, all
		filters attached to the egress hook will be detached. If
		BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
		deleted, also detaching all filters. As before, parent must
		be unset for these attach_points, and set for BPF_TC_CUSTOM.

		It is advised that if the qdisc is operated on by many programs,
		then the program at least check that there are no other existing
		filters before deleting the clsact qdisc. An example is shown
		below:

		DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
				    .attach_point = BPF_TC_INGRESS);
		/* set opts as NULL, as we're not really interested in
		 * getting any info for a particular filter, but just
	 	 * detecting its presence.
		 */
		r = bpf_tc_query(&hook, NULL);
		if (r == -ENOENT) {
			/* no filters */
			hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
			return bpf_tc_hook_destroy(&hook);
		} else {
			/* failed or r == 0, the latter means filters do exist */
			return r;
		}

		Note that there is a small race between checking for no
		filters and deleting the qdisc. This is currently unavoidable.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.

bpf_tc_attach - Attach a filter to a hook
Parameters:
	@hook - Cannot be NULL. Represents the hook the filter will be
		attached to. Requirements for ifindex and attach_point are
		same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
		is also supported.  In that case, parent must be set to the
		handle where the filter will be attached (using BPF_TC_PARENT).
		E.g. to set parent to 1:16 like in tc command line, the
		equivalent would be BPF_TC_PARENT(1, 16).

	@opts - Cannot be NULL. The following opts are optional:
		* handle   - The handle of the filter
		* priority - The priority of the filter
			     Must be >= 0 and <= UINT16_MAX
		Note that when left unset, they will be auto-allocated by
		the kernel. The following opts must be set:
		* prog_fd - The fd of the loaded SCHED_CLS prog
		The following opts must be unset:
		* prog_id - The ID of the BPF prog
		The following opts are optional:
		* flags - Currently only BPF_TC_F_REPLACE is allowed. It
			  allows replacing an existing filter instead of
			  failing with -EEXIST.
		The following opts will be filled by bpf_tc_attach on a
		successful attach operation if they are unset:
		* handle   - The handle of the attached filter
		* priority - The priority of the attached filter
		* prog_id  - The ID of the attached SCHED_CLS prog
		This way, the user can know what the auto allocated values
		for optional opts like handle and priority are for the newly
		attached filter, if they were unset.

		Note that some other attributes are set to fixed default
		values listed below (this holds for all bpf_tc_* APIs):
		protocol as ETH_P_ALL, direct action mode, chain index of 0,
		and class ID of 0 (this can be set by writing to the
		skb->tc_classid field from the BPF program).

bpf_tc_detach
Parameters:
	@hook - Cannot be NULL. Represents the hook the filter will be
		detached from. Requirements are same as described above
		in bpf_tc_attach.

	@opts - Cannot be NULL. The following opts must be set:
		* handle, priority
		The following opts must be unset:
		* prog_fd, prog_id, flags

bpf_tc_query
Parameters:
	@hook - Cannot be NULL. Represents the hook where the filter lookup will
		be performed. Requirements are same as described above in
		bpf_tc_attach().

	@opts - Cannot be NULL. The following opts must be set:
		* handle, priority
		The following opts must be unset:
		* prog_fd, prog_id, flags
		The following fields will be filled by bpf_tc_query upon a
		successful lookup:
		* prog_id

Some usage examples (using BPF skeleton infrastructure):

BPF program (test_tc_bpf.c):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("classifier")
	int cls(struct __sk_buff *skb)
	{
		return 0;
	}

Userspace loader:

	struct test_tc_bpf *skel = NULL;
	int fd, r;

	skel = test_tc_bpf__open_and_load();
	if (!skel)
		return -ENOMEM;

	fd = bpf_program__fd(skel->progs.cls);

	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
			    if_nametoindex("lo"), .attach_point =
			    BPF_TC_INGRESS);
	/* Create clsact qdisc */
	r = bpf_tc_hook_create(&hook);
	if (r < 0)
		goto end;

	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
	r = bpf_tc_attach(&hook, &opts);
	if (r < 0)
		goto end;
	/* Print the auto allocated handle and priority */
	printf("Handle=%u", opts.handle);
	printf("Priority=%u", opts.priority);

	opts.prog_fd = opts.prog_id = 0;
	bpf_tc_detach(&hook, &opts);
end:
	test_tc_bpf__destroy(skel);

This is equivalent to doing the following using tc command line:
  # tc qdisc add dev lo clsact
  # tc filter add dev lo ingress bpf obj foo.o sec classifier da
  # tc filter del dev lo ingress handle <h> prio <p> bpf
... where the handle and priority can be found using:
  # tc filter show dev lo ingress

Another example replacing a filter (extending prior example):

	/* We can also choose both (or one), let's try replacing an
	 * existing filter.
	 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
			    opts.handle, .priority = opts.priority,
			    .prog_fd = fd);
	r = bpf_tc_attach(&hook, &replace_opts);
	if (r == -EEXIST) {
		/* Expected, now use BPF_TC_F_REPLACE to replace it */
		replace_opts.flags = BPF_TC_F_REPLACE;
		return bpf_tc_attach(&hook, &replace_opts);
	} else if (r < 0) {
		return r;
	}
	/* There must be no existing filter with these
	 * attributes, so cleanup and return an error.
	 */
	replace_opts.prog_fd = replace_opts.prog_id = 0;
	bpf_tc_detach(&hook, &replace_opts);
	return -1;

To obtain info of a particular filter:

	/* Find info for filter with handle 1 and priority 50 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
			    .priority = 50);
	r = bpf_tc_query(&hook, &info_opts);
	if (r == -ENOENT)
		printf("Filter not found");
	else if (r < 0)
		return r;
	printf("Prog ID: %u", info_opts.prog_id);
	return 0;

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> # libbpf API design
[ Daniel: also did major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-3-memxor@gmail.com
2021-05-17 17:48:45 +02:00
Kumar Kartikeya Dwivedi
8bbb77b7c7 libbpf: Add various netlink helpers
This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c. It also adds a libbpf_netlink_send_recv() that
is useful to wrap send + recv handling in a generic way. Subsequent patch
will also use this function for sending and receiving a netlink response.
The libbpf_nl_get_link() helper has been removed instead, moving socket
creation into the newly named libbpf_netlink_send_recv().

Every nested attribute's closure must happen using the helper
nlattr_end_nested(), which sets its length properly. NLA_F_NESTED is
enforced using nlattr_begin_nested() helper. Other simple attributes
can be added directly.

The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:

  struct {
	struct nlmsghdr nh;
	struct tcmsg t;
	char buf[4096];
  } req;

Then, maxsz should be sizeof(req).

This change also converts the open coded attribute preparation with these
helpers. Note that the only failure the internal call to nlattr_add()
could result in the nested helper would be -EMSGSIZE, hence that is what
we return to our caller.

The libbpf_netlink_send_recv() call takes care of opening the socket,
sending the netlink message, receiving the response, potentially invoking
callbacks, and return errors if any, and then finally close the socket.
This allows users to avoid identical socket setup code in different places.
The only user of libbpf_nl_get_link() has been converted to make use of it.
__bpf_set_link_xdp_fd_replace() has also been refactored to use it.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
[ Daniel: major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-2-memxor@gmail.com
2021-05-17 17:48:36 +02:00
Andrii Nakryiko
513f485ca5 libbpf: Reject static entry-point BPF programs
Detect use of static entry-point BPF programs (those with SEC() markings) and
emit error message. This is similar to
c1cccec9c6 ("libbpf: Reject static maps") but for BPF programs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210514195534.1440970-1-andrii@kernel.org
2021-05-14 16:11:09 -07:00
Andrii Nakryiko
c1cccec9c6 libbpf: Reject static maps
Static maps never really worked with libbpf, because all such maps were always
silently resolved to the very first map. Detect static maps (both legacy and
BTF-defined) and report user-friendly error.

Tested locally by switching few maps (legacy and BTF-defined) in selftests to
static ones and verifying that now libbpf rejects them loudly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210513233643.194711-2-andrii@kernel.org
2021-05-13 17:23:58 -07:00
David S. Miller
df6f823703 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-05-11

The following pull-request contains BPF updates for your *net* tree.

We've added 13 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 817 insertions(+), 382 deletions(-).

The main changes are:

1) Fix multiple ringbuf bugs in particular to prevent writable mmap of
   read-only pages, from Andrii Nakryiko & Thadeu Lima de Souza Cascardo.

2) Fix verifier alu32 known-const subregister bound tracking for bitwise
   operations and/or/xor, from Daniel Borkmann.

3) Reject trampoline attachment for functions with variable arguments,
   and also add a deny list of other forbidden functions, from Jiri Olsa.

4) Fix nested bpf_bprintf_prepare() calls used by various helpers by
   switching to per-CPU buffers, from Florent Revest.

5) Fix kernel compilation with BTF debug info on ppc64 due to pahole
   missing TCP-CC functions like cubictcp_init, from Martin KaFai Lau.

6) Add a kconfig entry to provide an option to disallow unprivileged
   BPF by default, from Daniel Borkmann.

7) Fix libbpf compilation for older libelf when GELF_ST_VISIBILITY()
   macro is not available, from Arnaldo Carvalho de Melo.

8) Migrate test_tc_redirect to test_progs framework as prep work
   for upcoming skb_change_head() fix & selftest, from Jussi Maki.

9) Fix a libbpf segfault in add_dummy_ksym_var() if BTF is not
   present, from Ian Rogers.

10) Fix tx_only micro-benchmark in xdpsock BPF sample with proper frame
    size, from Magnus Karlsson.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11 16:05:56 -07:00
Andrii Nakryiko
e5670fa029 libbpf: Treat STV_INTERNAL same as STV_HIDDEN for functions
Do the same global -> static BTF update for global functions with STV_INTERNAL
visibility to turn on static BPF verification mode.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-7-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Andrii Nakryiko
247b8634e6 libbpf: Fix ELF symbol visibility update logic
Fix silly bug in updating ELF symbol's visibility.

Fixes: a46349227c ("libbpf: Add linker extern resolution support for functions and global variables")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-6-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Andrii Nakryiko
fdbf5ddeb8 libbpf: Add per-file linker opts
For better future extensibility add per-file linker options. Currently
the set of available options is empty. This changes bpf_linker__add_file()
API, but it's not a breaking change as bpf_linker APIs hasn't been released
yet.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-3-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Arnaldo Carvalho de Melo
67e7ec0bd4 libbpf: Provide GELF_ST_VISIBILITY() define for older libelf
Where that macro isn't available.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/YJaspEh0qZr4LYOc@kernel.org
2021-05-11 23:07:33 +02:00
Linus Torvalds
fc858a5231 Networking fixes for 5.13-rc1, including fixes from bpf, can
and netfilter trees. Self-contained fixes, nothing risky.
 
 Current release - new code bugs:
 
  - dsa: ksz: fix a few bugs found by static-checker in the new driver
 
  - stmmac: fix frame preemption handshake not triggering after
            interface restart
 
 Previous releases - regressions:
 
  - make nla_strcmp handle more then one trailing null character
 
  - fix stack OOB reads while fragmenting IPv4 packets in openvswitch
    and net/sched
 
  - sctp: do asoc update earlier in sctp_sf_do_dupcook_a
 
  - sctp: delay auto_asconf init until binding the first addr
 
  - stmmac: clear receive all(RA) bit when promiscuous mode is off
 
  - can: mcp251x: fix resume from sleep before interface was brought up
 
 Previous releases - always broken:
 
  - bpf: fix leakage of uninitialized bpf stack under speculation
 
  - bpf: fix masking negation logic upon negative dst register
 
  - netfilter: don't assume that skb_header_pointer() will never fail
 
  - only allow init netns to set default tcp cong to a restricted algo
 
  - xsk: fix xp_aligned_validate_desc() when len == chunk_size to
         avoid false positive errors
 
  - ethtool: fix missing NLM_F_MULTI flag when dumping
 
  - can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
 
  - sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
 
  - bridge: fix NULL-deref caused by a races between assigning
            rx_handler_data and setting the IFF_BRIDGE_PORT bit
 
 Latecomer:
 
  - seg6: add counters support for SRv6 Behaviors
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmCV3YoACgkQMUZtbf5S
 IrsQ2w//Q8/qbl6wGTKUfu6DZHYUU5j5sTwiHR823PKKSgXI+okWMN0KUlZszOsz
 qnPkH6GuojRooOE1s8PFLSlt9axKhQ0y7uzMTrWYafQ+JZTtgg9/MiPxQ8fdiE5i
 uOG1ngttZ+1jlE5tMPL4GAOSegg3rWVDclzqnJTdsPPOco3MWj6SL9xN0LDPxCEL
 BDysRqL/UiOIoh4v6IXQRx2UWjsNGu4biM1po+Jfumnd9T0zKoEpzu6UN6yPShbx
 284LihZSQtughCbhGqkErBOxfjZcvpFOQrqmjEvI+Z/eYg4InfWZemt8Sa92/alE
 yAFjK76MUTaUxaAO/gk8XauhvkYOzJJwKpqhbOmlaM7oj55QdzT5/8JxMxVoA6hV
 pscHOixk15GVse49PdPV8v47cyTLc/Xi69i+/uUdNVVfuORL1wft1w1xbd0S6Pbe
 7Gqax21S7zxcDsrUli7cFheYiqtbQAL0anlIUz8tUOZFz0VQ/zPuFd4rUYZ/o38V
 Mrevdk3t6CXNxS4CRXyUW4UejYB1O6Qw12sUue31e3h73d6LiN3NAiN5Qp7SEk1/
 fvk+jfOf8vvmtimYvcUK2i0D+vqj4Ec/qRIE/XXuUDBcp22tPL9uWMfWavwTdAj1
 Se4SzksTWF+NM0lO0ItonMyPh3ZXcSLhIv/gHrZwEKuWkXCGO4M=
 =JmWS
 -----END PGP SIGNATURE-----

Merge tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.13-rc1, including fixes from bpf, can and
  netfilter trees. Self-contained fixes, nothing risky.

  Current release - new code bugs:

   - dsa: ksz: fix a few bugs found by static-checker in the new driver

   - stmmac: fix frame preemption handshake not triggering after
     interface restart

  Previous releases - regressions:

   - make nla_strcmp handle more then one trailing null character

   - fix stack OOB reads while fragmenting IPv4 packets in openvswitch
     and net/sched

   - sctp: do asoc update earlier in sctp_sf_do_dupcook_a

   - sctp: delay auto_asconf init until binding the first addr

   - stmmac: clear receive all(RA) bit when promiscuous mode is off

   - can: mcp251x: fix resume from sleep before interface was brought up

  Previous releases - always broken:

   - bpf: fix leakage of uninitialized bpf stack under speculation

   - bpf: fix masking negation logic upon negative dst register

   - netfilter: don't assume that skb_header_pointer() will never fail

   - only allow init netns to set default tcp cong to a restricted algo

   - xsk: fix xp_aligned_validate_desc() when len == chunk_size to avoid
     false positive errors

   - ethtool: fix missing NLM_F_MULTI flag when dumping

   - can: m_can: m_can_tx_work_queue(): fix tx_skb race condition

   - sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b

   - bridge: fix NULL-deref caused by a races between assigning
     rx_handler_data and setting the IFF_BRIDGE_PORT bit

  Latecomer:

   - seg6: add counters support for SRv6 Behaviors"

* tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
  atm: firestream: Use fallthrough pseudo-keyword
  net: stmmac: Do not enable RX FIFO overflow interrupts
  mptcp: fix splat when closing unaccepted socket
  i40e: Remove LLDP frame filters
  i40e: Fix PHY type identifiers for 2.5G and 5G adapters
  i40e: fix the restart auto-negotiation after FEC modified
  i40e: Fix use-after-free in i40e_client_subtask()
  i40e: fix broken XDP support
  netfilter: nftables: avoid potential overflows on 32bit arches
  netfilter: nftables: avoid overflows in nft_hash_buckets()
  tcp: Specify cmsgbuf is user pointer for receive zerocopy.
  mlxsw: spectrum_mr: Update egress RIF list before route's action
  net: ipa: fix inter-EE IRQ register definitions
  can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
  can: mcp251x: fix resume from sleep before interface was brought up
  can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
  can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe
  netfilter: nftables: Fix a memleak from userdata error path in new objects
  netfilter: remove BUG_ON() after skb_header_pointer()
  netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
  ...
2021-05-08 08:31:46 -07:00
Linus Torvalds
a48b0872e6 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "This is everything else from -mm for this merge window.

  90 patches.

  Subsystems affected by this patch series: mm (cleanups and slub),
  alpha, procfs, sysctl, misc, core-kernel, bitmap, lib, compat,
  checkpatch, epoll, isofs, nilfs2, hpfs, exit, fork, kexec, gcov,
  panic, delayacct, gdb, resource, selftests, async, initramfs, ipc,
  drivers/char, and spelling"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (90 commits)
  mm: fix typos in comments
  mm: fix typos in comments
  treewide: remove editor modelines and cruft
  ipc/sem.c: spelling fix
  fs: fat: fix spelling typo of values
  kernel/sys.c: fix typo
  kernel/up.c: fix typo
  kernel/user_namespace.c: fix typos
  kernel/umh.c: fix some spelling mistakes
  include/linux/pgtable.h: few spelling fixes
  mm/slab.c: fix spelling mistake "disired" -> "desired"
  scripts/spelling.txt: add "overflw"
  scripts/spelling.txt: Add "diabled" typo
  scripts/spelling.txt: add "overlfow"
  arm: print alloc free paths for address in registers
  mm/vmalloc: remove vwrite()
  mm: remove xlate_dev_kmem_ptr()
  drivers/char: remove /dev/kmem for good
  mm: fix some typos and code style problems
  ipc/sem.c: mundane typo fixes
  ...
2021-05-07 00:34:51 -07:00
Yury Norov
eaae7841ba tools: sync lib/find_bit implementation
Add fast paths to find_*_bit() functions as per kernel implementation.

Link: https://lkml.kernel.org/r/20210401003153.97325-12-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06 19:24:12 -07:00
Yury Norov
ea81c1ef44 tools: sync find_next_bit implementation
Sync the implementation with recent kernel changes.

Link: https://lkml.kernel.org/r/20210401003153.97325-9-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06 19:24:12 -07:00
Yury Norov
e5b9252d90 tools: bitmap: sync function declarations with the kernel
Some functions in tools/include/linux/bitmap.h declare nbits as int.  In
the kernel nbits is declared as unsigned int.

Link: https://lkml.kernel.org/r/20210401003153.97325-3-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06 19:24:11 -07:00
Ian Rogers
9683e5775c libbpf: Add NULL check to add_dummy_ksym_var
Avoids a segv if btf isn't present. Seen on the call path
__bpf_object__open calling bpf_object__collect_externs.

Fixes: 5bd022ec01 (libbpf: Support extern kernel function)
Suggested-by: Stanislav Fomichev <sdf@google.com>
Suggested-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210504234910.976501-1-irogers@google.com
2021-05-05 11:29:33 -07:00
Brendan Jackman
2a30f94406 libbpf: Fix signed overflow in ringbuf_process_ring
One of our benchmarks running in (Google-internal) CI pushes data
through the ringbuf faster htan than userspace is able to consume
it. In this case it seems we're actually able to get >INT_MAX entries
in a single ring_buffer__consume() call. ASAN detected that cnt
overflows in this case.

Fix by using 64-bit counter internally and then capping the result to
INT_MAX before converting to the int return type. Do the same for
the ring_buffer__poll().

Fixes: bf99c936f9 (libbpf: Add BPF ring buffer support)
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210429130510.1621665-1-jackmanb@google.com
2021-05-03 09:54:12 -07:00