Commit Graph

1088047 Commits

Author SHA1 Message Date
Kumar Kartikeya Dwivedi
97e6d7dab1 bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access
The commit being fixed was aiming to disallow users from incorrectly
obtaining writable pointer to memory that is only meant to be read. This
is enforced now using a MEM_RDONLY flag.

For instance, in case of global percpu variables, when the BTF type is
not struct (e.g. bpf_prog_active), the verifier marks register type as
PTR_TO_MEM | MEM_RDONLY from bpf_this_cpu_ptr or bpf_per_cpu_ptr
helpers. However, when passing such pointer to kfunc, global funcs, or
BPF helpers, in check_helper_mem_access, there is no expectation
MEM_RDONLY flag will be set, hence it is checked as pointer to writable
memory. Later, verifier sets up argument type of global func as
PTR_TO_MEM | PTR_MAYBE_NULL, so user can use a global func to get around
the limitations imposed by this flag.

This check will also cover global non-percpu variables that may be
introduced in kernel BTF in future.

Also, we update the log message for PTR_TO_BUF case to be similar to
PTR_TO_MEM case, so that the reason for error is clear to user.

Fixes: 34d3a78c68 ("bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.")
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-04-06 10:32:12 -07:00
Kumar Kartikeya Dwivedi
be77354a3d bpf: Do write access check for kfunc and global func
When passing pointer to some map value to kfunc or global func, in
verifier we are passing meta as NULL to various functions, which uses
meta->raw_mode to check whether memory is being written to. Since some
kfunc or global funcs may also write to memory pointers they receive as
arguments, we must check for write access to memory. E.g. in some case
map may be read only and this will be missed by current checks.

However meta->raw_mode allows for uninitialized memory (e.g. on stack),
since there is not enough info available through BTF, we must perform
one call for read access (raw_mode = false), and one for write access
(raw_mode = true).

Fixes: e5069b9c23 ("bpf: Support pointers in global func args")
Fixes: d583691c47 ("bpf: Introduce mem, size argument pair support for kfunc")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-04-06 10:32:12 -07:00
Artem Savkov
ebaf24c589 selftests/bpf: Use bpf_num_possible_cpus() in per-cpu map allocations
bpf_map_value_size() uses num_possible_cpus() to determine map size, but
some of the tests only allocate enough memory for online cpus. This
results in out-of-bound writes in userspace during bpf(BPF_MAP_LOOKUP_ELEM)
syscalls in cases when number of online cpus is lower than the number of
possible cpus. Fix by switching from get_nprocs_conf() to
bpf_num_possible_cpus() when determining the number of processors in
these tests (test_progs/netcnt and test_cgroup_storage).

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220406085408.339336-1-asavkov@redhat.com
2022-04-06 10:15:53 -07:00
Colin Ian King
a8d600f6bc libbpf: Fix spelling mistake "libaries" -> "libraries"
There is a spelling mistake in a pr_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220406080835.14879-1-colin.i.king@gmail.com
2022-04-06 10:14:27 -07:00
Yuntao Wang
958ddfd75d selftests/bpf: Fix issues in parse_num_list()
The function does not check that parsing_end is false after parsing
argument. Thus, if the final part of the argument is something like '4-',
which is invalid, parse_num_list() will discard it instead of returning
-EINVAL.

Before:

 $ ./test_progs -n 2,4-
 #2 atomic_bounds:OK
 Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

After:

 $ ./test_progs -n 2,4-
 Failed to parse test numbers.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220406003622.73539-1-ytcoode@gmail.com
2022-04-06 10:10:03 -07:00
Lorenzo Bianconi
1963c740dc net: netfilter: Reports ct direction in CT lookup helpers for XDP and TC-BPF
Report connection tracking tuple direction in
bpf_skb_ct_lookup/bpf_xdp_ct_lookup helpers. Direction will be used to
implement snat/dnat through xdp ebpf program.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/aa1aaac89191cfc64078ecef36c0a48c302321b9.1648908601.git.lorenzo@kernel.org
2022-04-06 09:58:30 -07:00
Yuntao Wang
2d0df01974 selftests/bpf: Fix file descriptor leak in load_kallsyms()
Currently, if sym_cnt > 0, it just returns and does not close file, fix it.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220405145711.49543-1-ytcoode@gmail.com
2022-04-05 16:49:32 -07:00
Xu Kuohai
042152c27c bpf, arm64: Sign return address for JITed code
Sign return address for JITed code when the kernel is built with pointer
authentication enabled:

1. Sign LR with paciasp instruction before LR is pushed to stack. Since
   paciasp acts like landing pads for function entry, no need to insert
   bti instruction before paciasp.

2. Authenticate LR with autiasp instruction after LR is popped from stack.

For BPF tail call, the stack frame constructed by the caller is reused by
the callee. That is, the stack frame is constructed by the caller and
destructed by the callee. Thus LR is signed and pushed to the stack in the
caller's prologue, and poped from the stack and authenticated in the
callee's epilogue.

For BPF2BPF call, the caller and callee construct their own stack frames,
and sign and authenticate their own LRs.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf
Link: https://lore.kernel.org/bpf/20220402073942.3782529-1-xukuohai@huawei.com
2022-04-06 00:04:22 +02:00
Alexei Starovoitov
9a7ef9f86b Merge branch 'Add libbpf support for USDTs'
Andrii Nakryiko says:

====================

Add libbpf support for USDT (User Statically-Defined Tracing) probes.
USDTs is important part of tracing, and BPF, ecosystem, widely used in
mission-critical production applications for observability, performance
analysis, and debugging.

And while USDTs themselves are pretty complicated abstraction built on top of
uprobes, for end-users USDT is as natural a primitive as uprobes themselves.
And thus it's important for libbpf to provide best possible user experience
when it comes to build tracing applications relying on USDTs.

USDTs historically presented a lot of challenges for libbpf's no
compilation-on-the-fly general approach to BPF tracing. BCC utilizes power of
on-the-fly source code generation and compilation using its embedded Clang
toolchain, which was impractical for more lightweight and thus more rigid
libbpf-based approach. But still, with enough diligence and BPF cookies it's
possible to implement USDT support that feels as natural as tracing any
uprobe.

This patch set is the culmination of such effort to add libbpf USDT support
following the spirit and philosophy of BPF CO-RE (even though it's not
inherently relying on BPF CO-RE much, see patch #1 for some notes regarding
this). Each respective patch has enough details and explanations, so I won't
go into details here.

In the end, I think the overall usability of libbpf's USDT support *exceeds*
the status quo set by BCC due to the elimination of awkward runtime USDT
supporting code generation. It also exceeds BCC's capabilities due to the use
of BPF cookie. This eliminates the need to determine a USDT call site (and
thus specifics about how exactly to fetch arguments) based on its *absolute IP
address*, which is impossible with shared libraries if no PID is specified (as
we then just *can't* know absolute IP at which shared library is loaded,
because it might be different for each process). With BPF cookie this is not
a problem as we record "call site ID" directly in a BPF cookie value. This
makes it possible to do a system-wide tracing of a USDT defined in a shared
library. Think about tracing some USDT in libc across any process in the
system, both running at the time of attachment and all the new processes
started *afterwards*. This is a very powerful capability that allows more
efficient observability and tracing tooling.

Once this functionality lands, the plan is to extend libbpf-bootstrap ([0])
with an USDT example. It will also become possible to start converting BCC
tools that rely on USDTs to their libbpf-based counterparts ([1]).

It's worth noting that preliminary version of this code was currently used and
tested in production code running fleet-wide observability toolkit.

Libbpf functionality is broken down into 5 mostly logically independent parts,
for ease of reviewing:
  - patch #1 adds BPF-side implementation;
  - patch #2 adds user-space APIs and wires bpf_link for USDTs;
  - patch #3 adds the most mundate pieces: handling ELF, parsing USDT notes,
    dealing with memory segments, relative vs absolute addresses, etc;
  - patch #4 adds internal ID allocation and setting up/tearing down of
    BPF-side state (spec and IP-to-ID mapping);
  - patch #5 implements x86/x86-64-specific logic of parsing USDT argument
    specifications;
  - patch #6 adds testing of various basic aspects of handling of USDT;
  - patch #7 extends the set of tests with more combinations of semaphore,
    executable vs shared library, and PID filter options.

  [0] https://github.com/libbpf/libbpf-bootstrap
  [1] https://github.com/iovisor/bcc/tree/master/libbpf-tools

v2->v3:
  - fix typos, leave link to systemtap doc, acks, etc (Dave);
  - include sys/sdt.h to avoid extra system-wide package dependencies;
v1->v2:
  - huge high-level comment describing how all the moving parts fit together
    (Alan, Alexei);
  - switched from `__hidden __weak` to `static inline __noinline` for now, as
    there is a bug in BPF linker breaking final BPF object file due to invalid
    .BTF.ext data; I want to fix it separately at which point I'll switch back
    to __hidden __weak again. The fix isn't trivial, so I don't want to block
    on that. Same for __weak variable lookup bug that Henqi reported.
  - various fixes and improvements, addressing other feedback (Alan, Hengqi);

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Hengqi Chen <hengqi.chen@gmail.com>
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-04-05 13:16:08 -07:00
Andrii Nakryiko
00a0fa2d7d selftests/bpf: Add urandom_read shared lib and USDTs
Extend urandom_read helper binary to include USDTs of 4 combinations:
semaphore/semaphoreless (refcounted and non-refcounted) and based in
executable or shared library. We also extend urandom_read with ability
to report it's own PID to parent process and wait for parent process to
ready itself up for tracing urandom_read. We utilize popen() and
underlying pipe properties for proper signaling.

Once urandom_read is ready, we add few tests to validate that libbpf's
USDT attachment handles all the above combinations of semaphore (or lack
of it) and static or shared library USDTs. Also, we validate that libbpf
handles shared libraries both with PID filter and without one (i.e., -1
for PID argument).

Having the shared library case tested with and without PID is important
because internal logic differs on kernels that don't support BPF
cookies. On such older kernels, attaching to USDTs in shared libraries
without specifying concrete PID doesn't work in principle, because it's
impossible to determine shared library's load address to derive absolute
IPs for uprobe attachments. Without absolute IPs, it's impossible to
perform correct look up of USDT spec based on uprobe's absolute IP (the
only kind available from BPF at runtime). This is not the problem on
newer kernels with BPF cookie as we don't need IP-to-ID lookup because
BPF cookie value *is* spec ID.

So having those two situations as separate subtests is good because
libbpf CI is able to test latest selftests against old kernels (e.g.,
4.9 and 5.5), so we'll be able to disable PID-less shared lib attachment
for old kernels, but will still leave PID-specific one enabled to validate
this legacy logic is working correctly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-8-andrii@kernel.org
2022-04-05 13:16:08 -07:00
Andrii Nakryiko
630301b0d5 selftests/bpf: Add basic USDT selftests
Add semaphore-based USDT to test_progs itself and write basic tests to
valicate both auto-attachment and manual attachment logic, as well as
BPF-side functionality.

Also add subtests to validate that libbpf properly deduplicates USDT
specs and handles spec overflow situations correctly, as well as proper
"rollback" of partially-attached multi-spec USDT.

BPF-side of selftest intentionally consists of two files to validate
that usdt.bpf.h header can be included from multiple source code files
that are subsequently linked into final BPF object file without causing
any symbol duplication or other issues. We are validating that __weak
maps and bpf_usdt_xxx() API functions defined in usdt.bpf.h do work as
intended.

USDT selftests utilize sys/sdt.h header that on Ubuntu systems comes
from systemtap-sdt-devel package. But to simplify everyone's life,
including CI but especially casual contributors to bpf/bpf-next that
are trying to build selftests, I've checked in sys/sdt.h header from [0]
directly. This way it will work on all architectures and distros without
having to figure it out for every relevant combination and adding any
extra implicit package dependencies.

  [0] https://sourceware.org/git?p=systemtap.git;a=blob_plain;f=includes/sys/sdt.h;h=ca0162b4dc57520b96638c8ae79ad547eb1dd3a1;hb=HEAD

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-7-andrii@kernel.org
2022-04-05 13:16:08 -07:00
Andrii Nakryiko
4c59e584d1 libbpf: Add x86-specific USDT arg spec parsing logic
Add x86/x86_64-specific USDT argument specification parsing. Each
architecture will require their own logic, as all this is arch-specific
assembly-based notation. Architectures that libbpf doesn't support for
USDTs will pr_warn() with specific error and return -ENOTSUP.

We use sscanf() as a very powerful and easy to use string parser. Those
spaces in sscanf's format string mean "skip any whitespaces", which is
pretty nifty (and somewhat little known) feature.

All this was tested on little-endian architecture, so bit shifts are
probably off on big-endian, which our CI will hopefully prove.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-6-andrii@kernel.org
2022-04-05 13:16:08 -07:00
Andrii Nakryiko
999783c8bb libbpf: Wire up spec management and other arch-independent USDT logic
Last part of architecture-agnostic user-space USDT handling logic is to
set up BPF spec and, optionally, IP-to-ID maps from user-space.
usdt_manager performs a compact spec ID allocation to utilize
fixed-sized BPF maps as efficiently as possible. We also use hashmap to
deduplicate USDT arg spec strings and map identical strings to single
USDT spec, minimizing the necessary BPF map size. usdt_manager supports
arbitrary sequences of attachment and detachment, both of the same USDT
and multiple different USDTs and internally maintains a free list of
unused spec IDs. bpf_link_usdt's logic is extended with proper setup and
teardown of this spec ID free list and supporting BPF maps.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-5-andrii@kernel.org
2022-04-05 13:16:07 -07:00
Andrii Nakryiko
74cc6311ce libbpf: Add USDT notes parsing and resolution logic
Implement architecture-agnostic parts of USDT parsing logic. The code is
the documentation in this case, it's futile to try to succinctly
describe how USDT parsing is done in any sort of concreteness. But
still, USDTs are recorded in special ELF notes section (.note.stapsdt),
where each USDT call site is described separately. Along with USDT
provider and USDT name, each such note contains USDT argument
specification, which uses assembly-like syntax to describe how to fetch
value of USDT argument. USDT arg spec could be just a constant, or
a register, or a register dereference (most common cases in x86_64), but
it technically can be much more complicated cases, like offset relative
to global symbol and stuff like that. One of the later patches will
implement most common subset of this for x86 and x86-64 architectures,
which seems to handle a lot of real-world production application.

USDT arg spec contains a compact encoding allowing usdt.bpf.h from
previous patch to handle the above 3 cases. Instead of recording which
register might be needed, we encode register's offset within struct
pt_regs to simplify BPF-side implementation. USDT argument can be of
different byte sizes (1, 2, 4, and 8) and signed or unsigned. To handle
this, libbpf pre-calculates necessary bit shifts to do proper casting
and sign-extension in a short sequences of left and right shifts.

The rest is in the code with sometimes extensive comments and references
to external "documentation" for USDTs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-4-andrii@kernel.org
2022-04-05 13:16:07 -07:00
Andrii Nakryiko
2e4913e025 libbpf: Wire up USDT API and bpf_link integration
Wire up libbpf USDT support APIs without yet implementing all the
nitty-gritty details of USDT discovery, spec parsing, and BPF map
initialization.

User-visible user-space API is simple and is conceptually very similar
to uprobe API.

bpf_program__attach_usdt() API allows to programmatically attach given
BPF program to a USDT, specified through binary path (executable or
shared lib), USDT provider and name. Also, just like in uprobe case, PID
filter is specified (0 - self, -1 - any process, or specific PID).
Optionally, USDT cookie value can be specified. Such single API
invocation will try to discover given USDT in specified binary and will
use (potentially many) BPF uprobes to attach this program in correct
locations.

Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that
represents this attachment. It is a virtual BPF link that doesn't have
direct kernel object, as it can consist of multiple underlying BPF
uprobe links. As such, attachment is not atomic operation and there can
be brief moment when some USDT call sites are attached while others are
still in the process of attaching. This should be taken into
consideration by user. But bpf_program__attach_usdt() guarantees that
in the case of success all USDT call sites are successfully attached, or
all the successfuly attachments will be detached as soon as some USDT
call sites failed to be attached. So, in theory, there could be cases of
failed bpf_program__attach_usdt() call which did trigger few USDT
program invocations. This is unavoidable due to multi-uprobe nature of
USDT and has to be handled by user, if it's important to create an
illusion of atomicity.

USDT BPF programs themselves are marked in BPF source code as either
SEC("usdt"), in which case they won't be auto-attached through
skeleton's <skel>__attach() method, or it can have a full definition,
which follows the spirit of fully-specified uprobes:
SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
attach method will attempt auto-attachment. Similarly, generic
bpf_program__attach() will have enought information to go off of for
parameterless attachment.

USDT BPF programs are actually uprobes, and as such for kernel they are
marked as BPF_PROG_TYPE_KPROBE.

Another part of this patch is USDT-related feature probing:
  - BPF cookie support detection from user-space;
  - detection of kernel support for auto-refcounting of USDT semaphore.

The latter is optional. If kernel doesn't support such feature and USDT
doesn't rely on USDT semaphores, no error is returned. But if libbpf
detects that USDT requires setting semaphores and kernel doesn't support
this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't
support poking process's memory directly to increment semaphore value,
like BCC does on legacy kernels, due to inherent raciness and danger of
such process memory manipulation. Libbpf let's kernel take care of this
properly or gives up.

Logistically, all the extra USDT-related infrastructure of libbpf is put
into a separate usdt.c file and abstracted behind struct usdt_manager.
Each bpf_object has lazily-initialized usdt_manager pointer, which is
only instantiated if USDT programs are attempted to be attached. Closing
BPF object frees up usdt_manager resources. usdt_manager keeps track of
USDT spec ID assignment and few other small things.

Subsequent patches will fill out remaining missing pieces of USDT
initialization and setup logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-3-andrii@kernel.org
2022-04-05 13:16:07 -07:00
Andrii Nakryiko
d72e2968fb libbpf: Add BPF-side of USDT support
Add BPF-side implementation of libbpf-provided USDT support. This
consists of single header library, usdt.bpf.h, which is meant to be used
from user's BPF-side source code. This header is added to the list of
installed libbpf header, along bpf_helpers.h and others.

BPF-side implementation consists of two BPF maps:
  - spec map, which contains "a USDT spec" which encodes information
    necessary to be able to fetch USDT arguments and other information
    (argument count, user-provided cookie value, etc) at runtime;
  - IP-to-spec-ID map, which is only used on kernels that don't support
    BPF cookie feature. It allows to lookup spec ID based on the place
    in user application that triggers USDT program.

These maps have default sizes, 256 and 1024, which are chosen
conservatively to not waste a lot of space, but handling a lot of common
cases. But there could be cases when user application needs to either
trace a lot of different USDTs, or USDTs are heavily inlined and their
arguments are located in a lot of differing locations. For such cases it
might be necessary to size those maps up, which libbpf allows to do by
overriding BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.

It is an important aspect to keep in mind. Single USDT (user-space
equivalent of kernel tracepoint) can have multiple USDT "call sites".
That is, single logical USDT is triggered from multiple places in user
application. This can happen due to function inlining. Each such inlined
instance of USDT invocation can have its own unique USDT argument
specification (instructions about the location of the value of each of
USDT arguments). So while USDT looks very similar to usual uprobe or
kernel tracepoint, under the hood it's actually a collection of uprobes,
each potentially needing different spec to know how to fetch arguments.

User-visible API consists of three helper functions:
  - bpf_usdt_arg_cnt(), which returns number of arguments of current USDT;
  - bpf_usdt_arg(), which reads value of specified USDT argument (by
    it's zero-indexed position) and returns it as 64-bit value;
  - bpf_usdt_cookie(), which functions like BPF cookie for USDT
    programs; this is necessary as libbpf doesn't allow specifying actual
    BPF cookie and utilizes it internally for USDT support implementation.

Each bpf_usdt_xxx() APIs expect struct pt_regs * context, passed into
BPF program. On kernels that don't support BPF cookie it is used to
fetch absolute IP address of the underlying uprobe.

usdt.bpf.h also provides BPF_USDT() macro, which functions like
BPF_PROG() and BPF_KPROBE() and allows much more user-friendly way to
get access to USDT arguments, if USDT definition is static and known to
the user. It is expected that majority of use cases won't have to use
bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover
all their needs.

Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to
detect kernel support for BPF cookie. If BPF CO-RE dependency is
undesirable, user application can redefine BPF_USDT_HAS_BPF_COOKIE to
either a boolean constant (or equivalently zero and non-zero), or even
point it to its own .rodata variable that can be specified from user's
application user-space code. It is important that
BPF_USDT_HAS_BPF_COOKIE is known to BPF verifier as static value (thus
.rodata and not just .data), as otherwise BPF code will still contain
bpf_get_attach_cookie() BPF helper call and will fail validation at
runtime, if not dead-code eliminated.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-2-andrii@kernel.org
2022-04-05 13:16:07 -07:00
Ilya Leoshkevich
568189310c libbpf: Support Debian in resolve_full_path()
attach_probe selftest fails on Debian-based distros with `failed to
resolve full path for 'libc.so.6'`. The reason is that these distros
embraced multiarch to the point where even for the "main" architecture
they store libc in /lib/<triple>.

This is configured in /etc/ld.so.conf and in theory it's possible to
replicate the loader's parsing and processing logic in libbpf, however
a much simpler solution is to just enumerate the known library paths.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404225020.51029-1-iii@linux.ibm.com
2022-04-04 16:47:16 -07:00
Ilya Leoshkevich
d298761746 selftests/bpf: Define SYS_NANOSLEEP_KPROBE_NAME for aarch64
attach_probe selftest fails on aarch64 with `failed to create kprobe
'sys_nanosleep+0x0' perf event: No such file or directory`. This is
because, like on several other architectures, nanosleep has a prefix.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404142101.27900-1-iii@linux.ibm.com
2022-04-04 14:57:29 -07:00
Andrii Nakryiko
7224a0737c Merge branch 'bpf/bpftool: add program & link type names'
Milan Landaverde says:

====================

With the addition of the syscall prog type we should now
be able to see feature probe info for that prog type:

    $ bpftool feature probe kernel
    ...
    eBPF program_type syscall is available
    ...
    eBPF helpers supported for program type syscall:
        ...
        - bpf_sys_bpf
        - bpf_sys_close

And for the link types, their names should aid in
the output.

Before:
    $ bpftool link show
    50: type 7  prog 5042
	    bpf_cookie 0
	    pids vfsstat(394433)

After:
    $ bpftool link show
    57: perf_event  prog 5058
	    bpf_cookie 0
	    pids vfsstat(394725)
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-04-04 14:54:47 -07:00
Milan Landaverde
7b53eaa656 bpftool: Handle libbpf_probe_prog_type errors
Previously [1], we were using bpf_probe_prog_type which returned a
bool, but the new libbpf_probe_bpf_prog_type can return a negative
error code on failure. This change decides for bpftool to declare
a program type is not available on probe failure.

[1] https://lore.kernel.org/bpf/20220202225916.3313522-3-andrii@kernel.org/

Signed-off-by: Milan Landaverde <milan@mdaverde.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220331154555.422506-4-milan@mdaverde.com
2022-04-04 14:54:44 -07:00
Milan Landaverde
fff3dfab17 bpftool: Add missing link types
Will display the link type names in bpftool link show output

Signed-off-by: Milan Landaverde <milan@mdaverde.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220331154555.422506-3-milan@mdaverde.com
2022-04-04 14:54:34 -07:00
Milan Landaverde
380341637e bpftool: Add syscall prog type
In addition to displaying the program type in bpftool prog show
this enables us to be able to query bpf_prog_type_syscall
availability through feature probe as well as see
which helpers are available in those programs (such as
bpf_sys_bpf and bpf_sys_close)

Signed-off-by: Milan Landaverde <milan@mdaverde.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220331154555.422506-2-milan@mdaverde.com
2022-04-04 14:52:54 -07:00
Quentin Monnet
4eeebce6ac selftests/bpf: Fix parsing of prog types in UAPI hdr for bpftool sync
The script for checking that various lists of types in bpftool remain in
sync with the UAPI BPF header uses a regex to parse enum bpf_prog_type.
If this enum contains a set of values different from the list of program
types in bpftool, it complains.

This script should have reported the addition, some time ago, of the new
BPF_PROG_TYPE_SYSCALL, which was not reported to bpftool's program types
list. It failed to do so, because it failed to parse that new type from
the enum. This is because the new value, in the BPF header, has an
explicative comment on the same line, and the regex does not support
that.

Let's update the script to support parsing enum values when they have
comments on the same line.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404140944.64744-1-quentin@isovalent.com
2022-04-04 14:46:15 -07:00
Alexander Lobakin
fc843ccd8e samples: bpf: Fix linking xdp_router_ipv4 after migration
Users of the xdp_sample_user infra should be explicitly linked
with the standard math library (`-lm`). Otherwise, the following
happens:

/usr/bin/ld: xdp_sample_user.c:(.text+0x59fc): undefined reference to `ceil'
/usr/bin/ld: xdp_sample_user.c:(.text+0x5a0d): undefined reference to `ceil'
/usr/bin/ld: xdp_sample_user.c:(.text+0x5adc): undefined reference to `floor'
/usr/bin/ld: xdp_sample_user.c:(.text+0x5b01): undefined reference to `ceil'
/usr/bin/ld: xdp_sample_user.c:(.text+0x5c1e): undefined reference to `floor'
/usr/bin/ld: xdp_sample_user.c:(.text+0x5c43): undefined reference to `ceil
[...]

That happened previously, so there's a block of linkage flags in the
Makefile. xdp_router_ipv4 has been transferred to this infra quite
recently, but hasn't been added to it. Fix.

Fixes: 85bf1f5169 ("samples: bpf: Convert xdp_router_ipv4 to XDP samples helper")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404115451.1116478-1-alexandr.lobakin@intel.com
2022-04-04 14:45:06 -07:00
Song Chen
35f91d1fe1 sample: bpf: syscall_tp_user: Print result of verify_map
At the end of the test, we already print out
    prog <prog number>: map ids <...> <...>
Value is the number read from kernel through bpf map, further print out
    verify map:<map id> val:<...>
will help users to understand the program runs successfully.

Signed-off-by: Song Chen <chensong_2000@189.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648889828-12417-1-git-send-email-chensong_2000@189.cn
2022-04-04 14:43:57 -07:00
Yuntao Wang
e93f39998d libbpf: Don't return -EINVAL if hdr_len < offsetofend(core_relo_len)
Since core relos is an optional part of the .BTF.ext ELF section, we should
skip parsing it instead of returning -EINVAL if header size is less than
offsetofend(struct btf_ext_header, core_relo_len).

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404005320.1723055-1-ytcoode@gmail.com
2022-04-03 19:56:01 -07:00
Andrii Nakryiko
88d01a5711 Merge branch 'libbpf: name-based u[ret]probe attach'
Alan Maguire says:

====================

This patch series focuses on supporting name-based attach - similar
to that supported for kprobes - for uprobe BPF programs.

Currently attach for such probes is done by determining the offset
manually, so the aim is to try and mimic the simplicity of kprobe
attach, making use of uprobe opts to specify a name string.
Patch 1 supports expansion of the binary_path argument used for
bpf_program__attach_uprobe_opts(), allowing it to determine paths
for programs and shared objects automatically, allowing for
specification of "libc.so.6" rather than the full path
"/usr/lib64/libc.so.6".

Patch 2 adds the "func_name" option to allow uprobe attach by
name; the mechanics are described there.

Having name-based support allows us to support auto-attach for
uprobes; patch 3 adds auto-attach support while attempting
to handle backwards-compatibility issues that arise.  The format
supported is

u[ret]probe/binary_path:[raw_offset|function[+offset]]

For example, to attach to libc malloc:

SEC("uprobe//usr/lib64/libc.so.6:malloc")

..or, making use of the path computation mechanisms introduced in patch 1

SEC("uprobe/libc.so.6:malloc")

Finally patch 4 add tests to the attach_probe selftests covering
attach by name, with patch 5 covering skeleton auto-attach.

Changes since v4 [1]:
- replaced strtok_r() usage with copying segments from static char *; avoids
  unneeded string allocation (Andrii, patch 1)
- switched to using access() instead of stat() when checking path-resolved
  binary (Andrii, patch 1)
- removed computation of .plt offset for instrumenting shared library calls
  within binaries.  Firstly it proved too brittle, and secondly it was somewhat
  unintuitive in that this form of instrumentation did not support function+offset
  as the "local function in binary" and "shared library function in shared library"
  cases did.  We can still instrument library calls, just need to do it in the
  library .so (patch 2)
- added binary path logging in cases where it was missing (Andrii, patch 2)
- avoid strlen() calcuation in checking name match (Andrii, patch 2)
- reword comments for func_name option (Andrii, patch 2)
- tightened SEC() name validation to support "u[ret]probe" and fail on other
  permutations that do not support auto-attach (i.e. have u[ret]probe/binary_path:func
  format (Andrii, patch 3)
- fixed selftests to fail independently rather than skip remainder on failure
  (Andrii, patches 4,5)
Changes since v3 [2]:
- reworked variable naming to fit better with libbpf conventions
  (Andrii, patch 2)
- use quoted binary path in log messages (Andrii, patch 2)
- added path determination mechanisms using LD_LIBRARY_PATH/PATH and
  standard locations (patch 1, Andrii)
- changed section lookup to be type+name (if name is specified) to
  simplify use cases (patch 2, Andrii)
- fixed .plt lookup scheme to match symbol table entries with .plt
  index via the .rela.plt table; also fix the incorrect assumption
  that the code in the .plt that does library linking is the same
  size as .plt entries (it just happens to be on x86_64)
- aligned with pluggable section support such that uprobe SEC() names
  that do not conform to auto-attach format do not cause skeleton load
  failure (patch 3, Andrii)
- no longer need to look up absolute path to libraries used by test_progs
  since we have mechanism to determine path automatically
- replaced CHECK()s with ASSERT*()s for attach_probe test (Andrii, patch 4)
- added auto-attach selftests also (Andrii, patch 5)
Changes since RFC [3]:
- used "long" for addresses instead of ssize_t (Andrii, patch 1).
- used gelf_ interfaces to avoid assumptions about 64-bit
  binaries (Andrii, patch 1)
- clarified string matching in symbol table lookups
  (Andrii, patch 1)
- added support for specification of shared object functions
  in a non-shared object binary.  This approach instruments
  the Procedure Linking Table (PLT) - malloc@PLT.
- changed logic in symbol search to check dynamic symbol table
  first, then fall back to symbol table (Andrii, patch 1).
- modified auto-attach string to require "/" separator prior
  to path prefix i.e. uprobe//path/to/binary (Andrii, patch 2)
- modified auto-attach string to use ':' separator (Andrii,
  patch 2)
- modified auto-attach to support raw offset (Andrii, patch 2)
- modified skeleton attach to interpret -ESRCH errors as
  a non-fatal "unable to auto-attach" (Andrii suggested
  -EOPNOTSUPP but my concern was it might collide with other
  instances where that value is returned and reflects a
  failure to attach a to-be-expected attachment rather than
  skip a program that does not present an auto-attachable
  section name. Admittedly -EOPNOTSUPP seems a more natural
  value here).
- moved library path retrieval code to trace_helpers (Andrii,
  patch 3)

[1] https://lore.kernel.org/bpf/1647000658-16149-1-git-send-email-alan.maguire@oracle.com/
[2] https://lore.kernel.org/bpf/1643645554-28723-1-git-send-email-alan.maguire@oracle.com/
[3] https://lore.kernel.org/bpf/1642678950-19584-1-git-send-email-alan.maguire@oracle.com/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-04-03 19:56:01 -07:00
Alan Maguire
579c3196b2 selftests/bpf: Add tests for uprobe auto-attach via skeleton
tests that verify auto-attach works for function entry/return for
local functions in program and library functions in a library.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-6-git-send-email-alan.maguire@oracle.com
2022-04-03 19:56:01 -07:00
Alan Maguire
ba7499bc9d selftests/bpf: Add tests for u[ret]probe attach by name
add tests that verify attaching by name for

1. local functions in a program
2. library functions in a shared object

...succeed for uprobe and uretprobes using new "func_name"
option for bpf_program__attach_uprobe_opts().  Also verify
auto-attach works where uprobe, path to binary and function
name are specified, but fails with -EOPNOTSUPP with a SEC
name that does not specify binary path/function.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-5-git-send-email-alan.maguire@oracle.com
2022-04-03 19:56:00 -07:00
Alan Maguire
39f8dc43b7 libbpf: Add auto-attach for uprobes based on section name
Now that u[ret]probes can use name-based specification, it makes
sense to add support for auto-attach based on SEC() definition.
The format proposed is

        SEC("u[ret]probe/binary:[raw_offset|[function_name[+offset]]")

For example, to trace malloc() in libc:

        SEC("uprobe/libc.so.6:malloc")

...or to trace function foo2 in /usr/bin/foo:

        SEC("uprobe//usr/bin/foo:foo2")

Auto-attach is done for all tasks (pid -1).  prog can be an absolute
path or simply a program/library name; in the latter case, we use
PATH/LD_LIBRARY_PATH to resolve the full path, falling back to
standard locations (/usr/bin:/usr/sbin or /usr/lib64:/usr/lib) if
the file is not found via environment-variable specified locations.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-4-git-send-email-alan.maguire@oracle.com
2022-04-03 19:55:57 -07:00
Alan Maguire
433966e3ae libbpf: Support function name-based attach uprobes
kprobe attach is name-based, using lookups of kallsyms to translate
a function name to an address.  Currently uprobe attach is done
via an offset value as described in [1].  Extend uprobe opts
for attach to include a function name which can then be converted
into a uprobe-friendly offset.  The calcualation is done in
several steps:

1. First, determine the symbol address using libelf; this gives us
   the offset as reported by objdump
2. If the function is a shared library function - and the binary
   provided is a shared library - no further work is required;
   the address found is the required address
3. Finally, if the function is local, subtract the base address
   associated with the object, retrieved from ELF program headers.

The resultant value is then added to the func_offset value passed
in to specify the uprobe attach address.  So specifying a func_offset
of 0 along with a function name "printf" will attach to printf entry.

The modes of operation supported are then

1. to attach to a local function in a binary; function "foo1" in
   "/usr/bin/foo"
2. to attach to a shared library function in a shared library -
   function "malloc" in libc.

[1] https://www.kernel.org/doc/html/latest/trace/uprobetracer.html

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-3-git-send-email-alan.maguire@oracle.com
2022-04-03 18:12:05 -07:00
Alan Maguire
1ce3a60e3c libbpf: auto-resolve programs/libraries when necessary for uprobes
bpf_program__attach_uprobe_opts() requires a binary_path argument
specifying binary to instrument.  Supporting simply specifying
"libc.so.6" or "foo" should be possible too.

Library search checks LD_LIBRARY_PATH, then /usr/lib64, /usr/lib.
This allows users to run BPF programs prefixed with
LD_LIBRARY_PATH=/path2/lib while still searching standard locations.
Similarly for non .so files, we check PATH and /usr/bin, /usr/sbin.

Path determination will be useful for auto-attach of BPF uprobe programs
using SEC() definition.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-2-git-send-email-alan.maguire@oracle.com
2022-04-03 18:11:47 -07:00
Lorenzo Bianconi
85bf1f5169 samples: bpf: Convert xdp_router_ipv4 to XDP samples helper
Rely on the libbpf skeleton facility and other utilities provided by XDP
sample helpers in xdp_router_ipv4 sample.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/7f4d98ee2c13c04d5eb924eebf79ced32fee8418.1647414711.git.lorenzo@kernel.org
2022-04-03 17:12:09 -07:00
Haiyue Wang
66df0fdb59 bpf: Correct the comment for BTF kind bitfield
The commit 8fd886911a ("bpf: Add BTF_KIND_FLOAT to uapi") has extended
the BTF kind bitfield from 4 to 5 bits, correct the comment.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220403115327.205964-1-haiyue.wang@intel.com
2022-04-03 17:06:52 -07:00
Yuntao Wang
9bbad6dab8 selftests/bpf: Fix cd_flavor_subdir() of test_progs
Currently, when we run test_progs with just executable file name, for
example 'PATH=. test_progs-no_alu32', cd_flavor_subdir() will not check
if test_progs is running as a flavored test runner and switch into
corresponding sub-directory.

This will cause test_progs-no_alu32 executed by the
'PATH=. test_progs-no_alu32' command to run in the wrong directory and
load the wrong BPF objects.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220403135245.1713283-1-ytcoode@gmail.com
2022-04-03 17:01:48 -07:00
Haowen Bai
f6d60facd9 selftests/bpf: Return true/false (not 1/0) from bool functions
Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions.  This fixes the following warnings from coccicheck:

./tools/testing/selftests/bpf/progs/test_xdp_noinline.c:567:9-10: WARNING:
return of 0/1 in function 'get_packet_dst' with return type bool
./tools/testing/selftests/bpf/progs/test_l4lb_noinline.c:221:9-10: WARNING:
return of 0/1 in function 'get_packet_dst' with return type bool

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1648779354-14700-1-git-send-email-baihaowen@meizu.com
2022-04-03 16:42:43 -07:00
Nikolay Borisov
e299bcd4d1 selftests/bpf: Fix vfs_link kprobe definition
Since commit 6521f89170 ("namei: prepare for idmapped mounts")
vfs_link's prototype was changed, the kprobe definition in
profiler selftest in turn wasn't updated. The result is that all
argument after the first are now stored in different registers. This
means that self-test has been broken ever since. Fix it by updating the
kprobe definition accordingly.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220331140949.1410056-1-nborisov@suse.com
2022-04-03 16:41:24 -07:00
Jakob Koschel
185da3da93 bpf: Replace usage of supported with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use the found variable (existed & supported)
and simply checking if the variable was set, can determine if the
break/goto was hit.

[1] https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/

Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220331091929.647057-1-jakobkoschel@gmail.com
2022-04-03 16:33:36 -07:00
Yauheni Kaliuta
891663ace7 bpf, test_offload.py: Skip base maps without names
The test fails:

  # ./test_offload.py
  [...]
  Test bpftool bound info reporting (own ns)...
  FAIL: 3 BPF maps loaded, expected 2
    File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 1177, in <module>
      check_dev_info(False, "")
    File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 645, in check_dev_info
      maps = bpftool_map_list(expected=2, ns=ns)
    File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 190, in bpftool_map_list
      fail(True, "%d BPF maps loaded, expected %d" %
    File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 86, in fail
      tb = "".join(traceback.extract_stack().format())

Some base maps do not have names and they cannot be added due to compatibility
with older kernels, see [0]. So, just skip the unnamed maps.

  [0] https://lore.kernel.org/bpf/CAEf4BzY66WPKQbDe74AKZ6nFtZjq5e+G3Ji2egcVytB9R6_sGQ@mail.gmail.com/

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220329081100.9705-1-ykaliuta@redhat.com
2022-04-01 23:06:19 +02:00
Yuntao Wang
8eb943fc5e bpf: Remove redundant assignment to smap->map.value_size
The attr->value_size is already assigned to smap->map.value_size
in bpf_map_init_from_attr(), there is no need to do it again in
stack_map_alloc().

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/bpf/20220323073626.958652-1-ytcoode@gmail.com
2022-04-01 22:44:44 +02:00
Eyal Birger
fe4625d8b0 selftests/bpf: Remove unused variable from bpf_sk_assign test
Was never used in bpf_sk_assign_test(), and was removed from handle_{tcp,udp}()
in commit 0b9ad56b1e ("selftests/bpf: Use SOCKMAP for server sockets in
bpf_sk_assign test").

Fixes: 0b9ad56b1e ("selftests/bpf: Use SOCKMAP for server sockets in bpf_sk_assign test")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220329154914.3718658-1-eyal.birger@gmail.com
2022-04-01 22:35:29 +02:00
Jiapeng Chong
11e17ae423 bpf: Use swap() instead of open coding it
Clean the following coccicheck warning:

./kernel/trace/bpf_trace.c:2263:34-35: WARNING opportunity for swap().
./kernel/trace/bpf_trace.c:2264:40-41: WARNING opportunity for swap().

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220322062149.109180-1-jiapeng.chong@linux.alibaba.com
2022-04-01 22:27:07 +02:00
Xu Kuohai
38608ee7b6 bpf, tests: Add load store test case for tail call
Add test case to enusre that the caller and callee's fp offsets are
correct during tail call (mainly asserting for arm64 JIT).

Tested on both big-endian and little-endian arm64 qemu, result:

 test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
 test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
 test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-6-xukuohai@huawei.com
2022-04-01 00:27:52 +02:00
Xu Kuohai
f516420f68 bpf, tests: Add tests for BPF_LDX/BPF_STX with different offsets
This patch adds tests to verify the behavior of BPF_LDX/BPF_STX +
BPF_B/BPF_H/BPF_W/BPF_DW with negative offset, small positive offset,
large positive offset, and misaligned offset.

Tested on both big-endian and little-endian arm64 qemu, result:

 test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]']
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
 test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-5-xukuohai@huawei.com
2022-04-01 00:27:48 +02:00
Xu Kuohai
5b3d19b9bd bpf, arm64: Adjust the offset of str/ldr(immediate) to positive number
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number.  Therefore, this patch tries to
convert the offsets.

The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.

FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.

Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.

For example, for the following bpftrace command:

  bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'

Without this patch, jited code(fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   mov     x25, sp
  1c:   mov     x26, #0x0                       // #0
  20:   bti     j
  24:   sub     sp, sp, #0x90
  28:   add     x19, x0, #0x0
  2c:   mov     x0, #0x0                        // #0
  30:   mov     x10, #0xffffffffffffff78        // #-136
  34:   str     x0, [x25, x10]
  38:   mov     x10, #0xffffffffffffff80        // #-128
  3c:   str     x0, [x25, x10]
  40:   mov     x10, #0xffffffffffffff88        // #-120
  44:   str     x0, [x25, x10]
  48:   mov     x10, #0xffffffffffffff90        // #-112
  4c:   str     x0, [x25, x10]
  50:   mov     x10, #0xffffffffffffff98        // #-104
  54:   str     x0, [x25, x10]
  58:   mov     x10, #0xffffffffffffffa0        // #-96
  5c:   str     x0, [x25, x10]
  60:   mov     x10, #0xffffffffffffffa8        // #-88
  64:   str     x0, [x25, x10]
  68:   mov     x10, #0xffffffffffffffb0        // #-80
  6c:   str     x0, [x25, x10]
  70:   mov     x10, #0xffffffffffffffb8        // #-72
  74:   str     x0, [x25, x10]
  78:   mov     x10, #0xffffffffffffffc0        // #-64
  7c:   str     x0, [x25, x10]
  80:   mov     x10, #0xffffffffffffffc8        // #-56
  84:   str     x0, [x25, x10]
  88:   mov     x10, #0xffffffffffffffd0        // #-48
  8c:   str     x0, [x25, x10]
  90:   mov     x10, #0xffffffffffffffd8        // #-40
  94:   str     x0, [x25, x10]
  98:   mov     x10, #0xffffffffffffffe0        // #-32
  9c:   str     x0, [x25, x10]
  a0:   mov     x10, #0xffffffffffffffe8        // #-24
  a4:   str     x0, [x25, x10]
  a8:   mov     x10, #0xfffffffffffffff0        // #-16
  ac:   str     x0, [x25, x10]
  b0:   mov     x10, #0xfffffffffffffff8        // #-8
  b4:   str     x0, [x25, x10]
  b8:   mov     x10, #0x8                       // #8
  bc:   ldr     x2, [x19, x10]
  [...]

With this patch, jited code(fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   stp     x27, x28, [sp, #-16]!
  1c:   mov     x25, sp
  20:   sub     x27, x25, #0x88
  24:   mov     x26, #0x0                       // #0
  28:   bti     j
  2c:   sub     sp, sp, #0x90
  30:   add     x19, x0, #0x0
  34:   mov     x0, #0x0                        // #0
  38:   str     x0, [x27]
  3c:   str     x0, [x27, #8]
  40:   str     x0, [x27, #16]
  44:   str     x0, [x27, #24]
  48:   str     x0, [x27, #32]
  4c:   str     x0, [x27, #40]
  50:   str     x0, [x27, #48]
  54:   str     x0, [x27, #56]
  58:   str     x0, [x27, #64]
  5c:   str     x0, [x27, #72]
  60:   str     x0, [x27, #80]
  64:   str     x0, [x27, #88]
  68:   str     x0, [x27, #96]
  6c:   str     x0, [x27, #104]
  70:   str     x0, [x27, #112]
  74:   str     x0, [x27, #120]
  78:   str     x0, [x27, #128]
  7c:   ldr     x2, [x19, #8]
  [...]

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-4-xukuohai@huawei.com
2022-04-01 00:27:34 +02:00
Xu Kuohai
7db6c0f1d8 bpf, arm64: Optimize BPF store/load using arm64 str/ldr(immediate offset)
The current BPF store/load instruction is translated by the JIT into two
instructions. The first instruction moves the immediate offset into a
temporary register. The second instruction uses this temporary register
to do the real store/load.

In fact, arm64 supports addressing with immediate offsets. So This patch
introduces optimization that uses arm64 str/ldr instruction with immediate
offset when the offset fits.

Example of generated instuction for r2 = *(u64 *)(r1 + 0):

without optimization:
mov x10, 0
ldr x1, [x0, x10]

with optimization:
ldr x1, [x0, 0]

If the offset is negative, or is not aligned correctly, or exceeds max
value, rollback to the use of temporary register.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-3-xukuohai@huawei.com
2022-04-01 00:27:34 +02:00
Xu Kuohai
30c90f6757 arm64, insn: Add ldr/str with immediate offset
This patch introduces ldr/str with immediate offset support to simplify
the JIT implementation of BPF LDX/STX instructions on arm64. Although
arm64 ldr/str immediate is available in pre-index, post-index and
unsigned offset forms, the unsigned offset form is sufficient for BPF,
so this patch only adds this type.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-2-xukuohai@huawei.com
2022-04-01 00:27:29 +02:00
Linus Torvalds
2975dbdc39 Networking fixes for 5.18-rc1 and rethook patches.
Features:
 
  - kprobes: rethook: x86: replace kretprobe trampoline with rethook
 
 Current release - regressions:
 
  - sfc: avoid null-deref on systems without NUMA awareness
    in the new queue sizing code
 
 Current release - new code bugs:
 
  - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices
 
  - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl
    when interface is down
 
 Previous releases - always broken:
 
  - openvswitch: correct neighbor discovery target mask field
    in the flow dump
 
  - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak
 
  - rxrpc: fix call timer start racing with call destruction
 
  - rxrpc: fix null-deref when security type is rxrpc_no_security
 
  - can: fix UAF bugs around echo skbs in multiple drivers
 
 Misc:
 
  - docs: move netdev-FAQ to the "process" section of the documentation
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmJF3S0ACgkQMUZtbf5S
 IruIvA/+NZx+c+fBBrbjOh63avRL7kYIqIDREf+v6lh4ZXmbrp22xalcjIdxgWeK
 vAiYfYmzZblWAGkilcvPG3blCBc+9b+YE+pPJXFe60Huv3eYpjKfgTKwQOg/lIeM
 8MfPP7eBwcJ/ltSTRtySRl9LYgyVcouP9rAVJavFVYrvuDYunwhfChswVfGCYon8
 42O4nRwrtkTE1MjHD8HS3YxvwGlo+iIyhsxgG/gWx8F2xeIG22H6adzjDXcCQph8
 air/awrJ4enYkVMRokGNfNppK9Z3vjJDX5xha3CREpvXNPe0F24cAE/L8XqyH7+r
 /bXP5y9VC9mmEO7x4Le3VmDhOJGbCOtR89gTlevftDRdSIrbNHffZhbPW48tR7o8
 NJFlhiSJb4HEMN0q7BmxnWaKlbZUlvLEXLuU5ytZE/G7i+nETULlunfZrCD4eNYH
 gBGYhiob2I/XotJA9QzG/RDyaFwDaC/VARsyv37PSeBAl/yrEGAeP7DsKkKX/ayg
 LM9ItveqHXK30J0xr3QJA8s49EkIYejjYR3l0hQ9esf9QvGK99dE/fo44Apf3C3A
 Lz6XpnRc5Xd7tZ9Aopwb3FqOH6WR9Hq9Qlbk0qifsL/2sRbatpuZbbDK6L3CR3Ir
 WFNcOoNbbqv85kCKFXFjj0jdpoNa9Yej8XFkMkVSkM3sHImYmYQ=
 =5Bvy
 -----END PGP SIGNATURE-----

Merge tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull more networking updates from Jakub Kicinski:
 "Networking fixes and rethook patches.

  Features:

   - kprobes: rethook: x86: replace kretprobe trampoline with rethook

  Current release - regressions:

   - sfc: avoid null-deref on systems without NUMA awareness in the new
     queue sizing code

  Current release - new code bugs:

   - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices

   - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when
     interface is down

  Previous releases - always broken:

   - openvswitch: correct neighbor discovery target mask field in the
     flow dump

   - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak

   - rxrpc: fix call timer start racing with call destruction

   - rxrpc: fix null-deref when security type is rxrpc_no_security

   - can: fix UAF bugs around echo skbs in multiple drivers

  Misc:

   - docs: move netdev-FAQ to the 'process' section of the
     documentation"

* tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices
  openvswitch: Add recirc_id to recirc warning
  rxrpc: fix some null-ptr-deref bugs in server_key.c
  rxrpc: Fix call timer start racing with call destruction
  net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
  net: hns3: fix the concurrency between functions reading debugfs
  docs: netdev: move the netdev-FAQ to the process pages
  docs: netdev: broaden the new vs old code formatting guidelines
  docs: netdev: call out the merge window in tag checking
  docs: netdev: add missing back ticks
  docs: netdev: make the testing requirement more stringent
  docs: netdev: add a question about re-posting frequency
  docs: netdev: rephrase the 'should I update patchwork' question
  docs: netdev: rephrase the 'Under review' question
  docs: netdev: shorten the name and mention msgid for patch status
  docs: netdev: note that RFC postings are allowed any time
  docs: netdev: turn the net-next closed into a Warning
  docs: netdev: move the patch marking section up
  docs: netdev: minor reword
  docs: netdev: replace references to old archives
  ...
2022-03-31 11:23:31 -07:00
Linus Torvalds
93235e3df2 This push fixes the following issues:
- Missing Kconfig dependency on arm that leads to boot failure.
 - x86 SLS fixes.
 - Reference leak in the stm32 driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmJD5lYACgkQxycdCkmx
 i6cE1w//Xp0x6/m+iMOtctbBy8dLRoKO3ADUxFI+Y+GdYYkUuRX5YKFRRBIIsRWv
 1RLv9C53g5It7O5ohtH2oMrefLiZ2jWLojfBBrv/1pvol6r1LsAxSsRN9QrFSNAB
 Bsv6RouB/HYaMjbwEroPlj9/3XUlvsbvb4aNGxSnpcNI12HifxYRh3FPlJj/mdHh
 SPvPqpSewuDSajNubHfRAAvayG3md7iOZBFx1q+fAaczHiO5NK8DslktFlyRUbeV
 KT0YosZ7VuGLWgsQD052FYKqApqRzj9GmePtO/n5F24e+K5fbo0vP1XzjpTI2KAh
 I+vZ4CvTjSz3feFSsCNjLjd+KGj+cCuG2TrTn0rhM9o2bINGw+VWwSj3Wr7EBsS5
 Gf9CzdLrlcpM+HfDW2HMEqX+MXsaGQ0eoKxWs5BeKrPAUtbWTG9Y0UNrZ/eeoLYa
 4j6r3Lr0eb6zLzy6rRkG6iKN2tBUmj3BC6KZjNJaHq+bxHTY2myU1YLtcTHZXvKc
 x6I5G6e/AyRNQwcSoGYOnAnp8PfZyPaeMRR3ydxqRL/dZiJrH7xUjF0gr4ZYLcDr
 9khwTmlMiSQA7X/FlgnmGFkVlFQdxIF1jQ5RXn5K/CrzWHgmbdoKB2rVJB/mdSMj
 TwAGCbL8r0Sr7SSkisHrgZN+mGOt4XxpPWh+IpkLUQx4iB1XI7I=
 =qKtj
 -----END PGP SIGNATURE-----

Merge tag 'v5.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - Missing Kconfig dependency on arm that leads to boot failure

 - x86 SLS fixes

 - Reference leak in the stm32 driver

* tag 'v5.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/sm3 - Fixup SLS
  crypto: x86/poly1305 - Fixup SLS
  crypto: x86/chacha20 - Avoid spurious jumps to other functions
  crypto: stm32 - fix reference leak in stm32_crc_remove
  crypto: arm/aes-neonbs-cbc - Select generic cbc and aes
2022-03-31 11:17:39 -07:00
Eric Dumazet
9d570741ae vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices
vxlan_vnifilter_dump_dev() assumes it is called only
for vxlan devices. Make sure it is the case.

BUG: KASAN: slab-out-of-bounds in vxlan_vnifilter_dump_dev+0x9a0/0xb40 drivers/net/vxlan/vxlan_vnifilter.c:349
Read of size 4 at addr ffff888060d1ce70 by task syz-executor.3/17662

CPU: 0 PID: 17662 Comm: syz-executor.3 Tainted: G        W         5.17.0-syzkaller-12888-g77c9387c0c5b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0xeb/0x495 mm/kasan/report.c:313
 print_report mm/kasan/report.c:429 [inline]
 kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
 vxlan_vnifilter_dump_dev+0x9a0/0xb40 drivers/net/vxlan/vxlan_vnifilter.c:349
 vxlan_vnifilter_dump+0x3ff/0x650 drivers/net/vxlan/vxlan_vnifilter.c:428
 netlink_dump+0x4b5/0xb70 net/netlink/af_netlink.c:2270
 __netlink_dump_start+0x647/0x900 net/netlink/af_netlink.c:2375
 netlink_dump_start include/linux/netlink.h:245 [inline]
 rtnetlink_rcv_msg+0x70c/0xb80 net/core/rtnetlink.c:5953
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:725
 ____sys_sendmsg+0x6e2/0x800 net/socket.c:2413
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f87b8e89049

Fixes: f9c4bb0b24 ("vxlan: vni filtering support on collect metadata device")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Link: https://lore.kernel.org/r/20220330194643.2706132-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-31 08:53:01 -07:00