Commit Graph

1123204 Commits

Author SHA1 Message Date
Deming Wang
b59cc7fcba samples/bpf: Fix typo in xdp_router_ipv4 sample
Fix typo in xdp_router_ipv4 sample.

Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220927192527.8722-1-wangdeming@inspur.com
2022-09-30 15:04:23 -07:00
Andrii Nakryiko
2ade1cd95b Merge branch 'tools: bpftool: Remove unused struct'
Yuan Can says:

====================
This series contains two cleanup patches, remove unused struct.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-09-30 14:09:16 -07:00
Yuan Can
f95a479797 bpftool: Remove unused struct event_ring_info
After commit 9b190f185d ("tools/bpftool: switch map event_pipe to
libbpf's perf_buffer"), struct event_ring_info is not used any more and
can be removed as well.

Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220928090440.79637-3-yuancan@huawei.com
2022-09-30 14:09:05 -07:00
Yuan Can
d863f42930 bpftool: Remove unused struct btf_attach_point
After commit 2828d0d75b ("bpftool: Switch to libbpf's hashmap for
programs/maps in BTF listing"), struct btf_attach_point is not used
anymore and can be removed as well.

Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220928090440.79637-2-yuancan@huawei.com
2022-09-30 14:08:40 -07:00
Dave Thaler
5a8921ba96 bpf, docs: Add TOC and fix formatting.
Add TOC and fix formatting.

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Link: https://lore.kernel.org/r/20220927185958.14995-5-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-30 13:41:36 -07:00
Dave Thaler
ee159bdbdb bpf, docs: Add Clang note about BPF_ALU
Add Clang note about BPF_ALU.

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Link: https://lore.kernel.org/r/20220927185958.14995-4-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-30 13:41:25 -07:00
Dave Thaler
6c7aaffb24 bpf, docs: Move Clang notes to a separate file
Move Clang notes to a separate file.

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Link: https://lore.kernel.org/r/20220927185958.14995-3-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-30 13:41:13 -07:00
Dave Thaler
9a0bf21337 bpf, docs: Linux byteswap note
Add Linux byteswap note.

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Link: https://lore.kernel.org/r/20220927185958.14995-2-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-30 13:40:59 -07:00
Dave Thaler
6166da0a02 bpf, docs: Move legacy packet instructions to a separate file
Move legacy packet instructions to a separate file.

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Link: https://lore.kernel.org/r/20220927185958.14995-1-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-30 13:40:45 -07:00
Alexei Starovoitov
5ee35abb46 Merge branch 'bpf: Remove recursion check for struct_ops prog'
Martin KaFai Lau says:

====================

From: Martin KaFai Lau <martin.lau@kernel.org>

The struct_ops is sharing the tracing-trampoline's enter/exit
function which tracks prog->active to avoid recursion.  It turns
out the struct_ops bpf prog will hit this prog->active and
unnecessarily skipped running the struct_ops prog.  eg.  The
'.ssthresh' may run in_task() and then interrupted by softirq
that runs the same '.ssthresh'.

The kernel does not call the tcp-cc's ops in a recursive way,
so this set is to remove the recursion check for struct_ops prog.

v3:
- Clear the bpf_chg_cc_inprogress from the newly cloned tcp_sock
  in tcp_create_openreq_child() because the listen sk can
  be cloned without lock being held. (Eric Dumazet)

v2:
- v1 [0] turned into a long discussion on a few cases and also
  whether it needs to follow the bpf_run_ctx chain if there is
  tracing bpf_run_ctx (kprobe/trace/trampoline) running in between.

  It is a good signal that it is not obvious enough to reason
  about it and needs a tradeoff for a more straight forward approach.

  This revision uses one bit out of an existing 1 byte hole
  in the tcp_sock.  It is in Patch 4.

  [0]: https://lore.kernel.org/bpf/20220922225616.3054840-1-kafai@fb.com/T/#md98d40ac5ec295fdadef476c227a3401b2b6b911
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:48 -07:00
Martin KaFai Lau
3411c5b6f8 selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)
This patch changes the bpf_dctcp test to ensure the recurred
bpf_setsockopt(TCP_CONGESTION) returns -EBUSY.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220929070407.965581-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Martin KaFai Lau
061ff04071 bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
When a bad bpf prog '.init' calls
bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:

.init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
... => .init => bpf_setsockopt(tcp_cc).

It was prevented by the prog->active counter before but the prog->active
detection cannot be used in struct_ops as explained in the earlier
patch of the set.

In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
in order to break the loop.  This is done by using a bit of
an existing 1 byte hole in tcp_sock to check if there is
on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.

Note that this essentially limits only the first '.init' can
call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
does not support ECN) and the second '.init' cannot fallback to
another cc.  This applies even the second
bpf_setsockopt(TCP_CONGESTION) will not cause a loop.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220929070407.965581-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Martin KaFai Lau
1e7d217faa bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function
This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into
another function.  The next patch will add extra logic to avoid
recursion and this will make the latter patch easier to follow.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220929070407.965581-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Martin KaFai Lau
37cfbe0bf2 bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
The check on the tcp-cc, "cdg", is done in the bpf_sk_setsockopt which is
used by the bpf_tcp_ca, bpf_lsm, cg_sockopt, and tcp_iter hooks.
However, it is not done for cg sock_ddr, cg sockops, and some of
the bpf_lsm_cgroup hooks.

The tcp-cc "cdg" should have very limited usage.  This patch is to
move the "cdg" check to the common sol_tcp_sockopt() so that all
hooks have a consistent behavior.   The motivation to make
this check consistent now is because the latter patch will
refactor the bpf_setsockopt(TCP_CONGESTION) into another function,
so it is better to take this chance to refactor this piece
also.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220929070407.965581-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Martin KaFai Lau
64696c40d0 bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
The struct_ops prog is to allow using bpf to implement the functions in
a struct (eg. kernel module).  The current usage is to implement the
tcp_congestion.  The kernel does not call the tcp-cc's ops (ie.
the bpf prog) in a recursive way.

The struct_ops is sharing the tracing-trampoline's enter/exit
function which tracks prog->active to avoid recursion.  It is
needed for tracing prog.  However, it turns out the struct_ops
bpf prog will hit this prog->active and unnecessarily skipped
running the struct_ops prog.  eg.  The '.ssthresh' may run in_task()
and then interrupted by softirq that runs the same '.ssthresh'.
Skip running the '.ssthresh' will end up returning random value
to the caller.

The patch adds __bpf_prog_{enter,exit}_struct_ops for the
struct_ops trampoline.  They do not track the prog->active
to detect recursion.

One exception is when the tcp_congestion's '.init' ops is doing
bpf_setsockopt(TCP_CONGESTION) and then recurs to the same
'.init' ops.  This will be addressed in the following patches.

Fixes: ca06f55b90 ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220929070407.965581-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29 09:25:47 -07:00
Andrii Nakryiko
8526f0d613 Merge branch 'bpf/selftests: convert some tests to ASSERT_* macros'
Wang Yufen says:

====================

Convert some tests to use the preferred ASSERT_* macros instead of the
deprecated CHECK().
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-09-28 17:34:24 -07:00
Wang Yufen
1fddca3d36 selftests/bpf: Convert udp_limit test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-12-git-send-email-wangyufen@huawei.com
2022-09-28 17:34:19 -07:00
Wang Yufen
9d0b05bdfb selftests/bpf: Convert tcpbpf_user test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-11-git-send-email-wangyufen@huawei.com
2022-09-28 17:34:14 -07:00
Wang Yufen
8dda32ac58 selftests/bpf: Convert tcp_rtt test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-10-git-send-email-wangyufen@huawei.com
2022-09-28 17:34:09 -07:00
Wang Yufen
3082f8cd4b selftests/bpf: Convert tcp_hdr_options test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-9-git-send-email-wangyufen@huawei.com
2022-09-28 17:34:04 -07:00
Wang Yufen
a0a1729671 selftests/bpf: Convert tcp_estats test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-8-git-send-email-wangyufen@huawei.com
2022-09-28 17:33:59 -07:00
Wang Yufen
f19708dfa0 selftests/bpf: Convert sockopt_sk test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-7-git-send-email-wangyufen@huawei.com
2022-09-28 17:33:53 -07:00
Wang Yufen
a605a6bbcc selftests/bpf: Convert sockopt_multi test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-6-git-send-email-wangyufen@huawei.com
2022-09-28 17:33:47 -07:00
Wang Yufen
675bc8abe1 selftests/bpf: Convert sockopt_inherit test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-5-git-send-email-wangyufen@huawei.com
2022-09-28 17:33:38 -07:00
Wang Yufen
099763e7da selftests/bpf: Convert sockopt test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-4-git-send-email-wangyufen@huawei.com
2022-09-28 17:33:29 -07:00
Wang Yufen
d155fcb3ff selftests/bpf: Convert sockmap_ktls test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-3-git-send-email-wangyufen@huawei.com
2022-09-28 17:33:21 -07:00
Wang Yufen
38e35e1d0c selftests/bpf: Convert sockmap_basic test to ASSERT_* macros
Convert the selftest to use the preferred ASSERT_* macros instead of the
deprecated CHECK().

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1664169131-32405-2-git-send-email-wangyufen@huawei.com
2022-09-28 17:33:07 -07:00
Andrii Nakryiko
aa55dfd378 Merge branch 'Parameterize task iterators.'
Kui-Feng Lee says:

====================

Allow creating an iterator that loops through resources of one task/thread.

People could only create iterators to loop through all resources of
files, vma, and tasks in the system, even though they were interested in only the
resources of a specific task or process.  Passing the additional
parameters, people can now create an iterator to go through all
resources or only the resources of a task.

Major Changes:

 - Add new parameters in bpf_iter_link_info to indicate to go through
   all tasks or to go through a specific task.

 - Change the implementations of BPF iterators of vma, files, and
   tasks to allow going through only the resources of a specific task.

 - Provide the arguments of parameterized task iterators in
   bpf_link_info.

Differences from v10:

 - Check pid_alive() to avoid potential errors.

Differences from v9:

  - Fix the boundary check of computing page_shift.

  - Rewording the reason of checking and returning the same task.

Differences from v8:

 - Fix uninitialized variable.

 - Avoid redundant work of getting task from pid.

 - Change format string to use %u instead of %d.

 - Use the value of page_shift to compute correct offset in
   bpf_iter_vm_offset.c.

Differences from v7:

 - Travel the tasks of a process through task_group linked list
   instead of traveling through the whole namespace.

Differences from v6:

 - Add part 5 to make bpftool show the value of parameters.

 - Change of wording of show_fdinfo() to show pid or tid instead of
   always pid.

 - Simplify error handling and naming of test cases.

Differences from v5:

 - Use user-space tid/pid terminologies in bpf_iter_link_info and
   bpf_link_info.

 - Fix reference count

 - Merge all variants to one 'u32 pid' in internal structs.
   (bpf_iter_aux_info and bpf_iter_seq_task_common)

 - Compare the result of get_uprobe_offset() with the implementation
   with the vma iterators.

 - Implement show_fdinfo.

Differences from v4:

 - Remove 'type' from bpf_iter_link_info and bpf_link_info.

v10: https://lore.kernel.org/all/20220831181039.2680134-1-kuifeng@fb.com/
v9: https://lore.kernel.org/bpf/20220829192317.486946-1-kuifeng@fb.com/
v8: https://lore.kernel.org/bpf/20220829192317.486946-1-kuifeng@fb.com/
v7: https://lore.kernel.org/bpf/20220826003712.2810158-1-kuifeng@fb.com/
v6: https://lore.kernel.org/bpf/20220819220927.3409575-1-kuifeng@fb.com/
v5: https://lore.kernel.org/bpf/20220811001654.1316689-1-kuifeng@fb.com/
v4: https://lore.kernel.org/bpf/20220809195429.1043220-1-kuifeng@fb.com/
v3: https://lore.kernel.org/bpf/20220809063501.667610-1-kuifeng@fb.com/
v2: https://lore.kernel.org/bpf/20220801232649.2306614-1-kuifeng@fb.com/
v1: https://lore.kernel.org/bpf/20220726051713.840431-1-kuifeng@fb.com/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-09-28 16:30:12 -07:00
Kui-Feng Lee
6bdb6d6be0 bpftool: Show parameters of BPF task iterators.
Show tid or pid of iterators if giving an argument of tid or pid

For example, the command `bpftool link list` may list following
lines.

1: iter  prog 2  target_name bpf_map
2: iter  prog 3  target_name bpf_prog
33: iter  prog 225  target_name task_file  tid 1644
        pids test_progs(1644)

Link 33 is a task_file iterator with tid 1644.  For now, only targets
of task, task_file and task_vma may be with tid or pid to filter out
tasks other than those belonging to a process (pid) or a thread (tid).

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20220926184957.208194-6-kuifeng@fb.com
2022-09-28 16:30:08 -07:00
Kui-Feng Lee
b3e1331eb9 selftests/bpf: Test parameterized task BPF iterators.
Test iterators of vma, files and tasks.

Ensure the API works appropriately to visit all tasks,
tasks in a process, or a particular task.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20220926184957.208194-5-kuifeng@fb.com
2022-09-28 16:30:03 -07:00
Kui-Feng Lee
2c4fe44fb0 bpf: Handle show_fdinfo for the parameterized task BPF iterators
Show information of iterators in the respective files under
/proc/<pid>/fdinfo/.

For example, for a task file iterator with 1723 as the value of tid
parameter, its fdinfo would look like the following lines.

    pos:    0
    flags:  02000000
    mnt_id: 14
    ino:    38
    link_type:      iter
    link_id:        51
    prog_tag:       a590ac96db22b825
    prog_id:        299
    target_name:    task_file
    task_type:      TID
    tid: 1723

This patch add the last three fields.  task_type is the type of the
task parameter.  TID means the iterator visit only the thread
specified by tid.  The value of tid in the above example is 1723.  For
the case of PID task_type, it means the iterator visits only threads
of a process and will show the pid value of the process instead of a
tid.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20220926184957.208194-4-kuifeng@fb.com
2022-09-28 16:30:00 -07:00
Kui-Feng Lee
21fb6f2aa3 bpf: Handle bpf_link_info for the parameterized task BPF iterators.
Add new fields to bpf_link_info that users can query it through
bpf_obj_get_info_by_fd().

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20220926184957.208194-3-kuifeng@fb.com
2022-09-28 16:29:55 -07:00
Kui-Feng Lee
f0d74c4da1 bpf: Parameterize task iterators.
Allow creating an iterator that loops through resources of one
thread/process.

People could only create iterators to loop through all resources of
files, vma, and tasks in the system, even though they were interested
in only the resources of a specific task or process.  Passing the
additional parameters, people can now create an iterator to go
through all resources or only the resources of a task.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20220926184957.208194-2-kuifeng@fb.com
2022-09-28 16:29:47 -07:00
Andrii Nakryiko
87dbdc230d libbpf: Don't require full struct enum64 in UAPI headers
Drop the requirement for system-wide kernel UAPI headers to provide full
struct btf_enum64 definition. This is an unexpected requirement that
slipped in libbpf 1.0 and put unnecessary pressure ([0]) on users to have
a bleeding-edge kernel UAPI header from unreleased Linux 6.0.

To achieve this, we forward declare struct btf_enum64. But that's not
enough as there is btf_enum64_value() helper that expects to know the
layout of struct btf_enum64. So we get a bit creative with
reinterpreting memory layout as array of __u32 and accesing lo32/hi32
fields as array elements. Alternative way would be to have a local
pointer variable for anonymous struct with exactly the same layout as
struct btf_enum64, but that gets us into C++ compiler errors complaining
about invalid type casts. So play it safe, if ugly.

  [0] Closes: https://github.com/libbpf/libbpf/issues/562

Fixes: d90ec262b3 ("libbpf: Add enum64 support for btf_dump")
Reported-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/bpf/20220927042940.147185-1-andrii@kernel.org
2022-09-27 20:45:17 +02:00
Yauheni Kaliuta
2702c78999 selftests/bpf: Fix passing arguments via function in test_kmod.sh
Since the tests are run in a function $@ there actually contains the
function arguments, not the script ones.

Pass "$@" to the function as well.

Fixes: 272d1f4cfa ("selftests: bpf: test_kmod.sh: Pass parameters to the module")
Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220926092320.564631-1-ykaliuta@redhat.com
2022-09-27 15:57:54 +02:00
Jon Doron
6a4ab8869d libbpf: Fix the case of running as non-root with capabilities
When running rootless with special capabilities like:
FOWNER / DAC_OVERRIDE / DAC_READ_SEARCH

The "access" API will not make the proper check if there is really
access to a file or not.

>From the access man page:
"
The check is done using the calling process's real UID and GID, rather
than the effective IDs as is done when actually attempting an operation
(e.g., open(2)) on the file.  Similarly, for the root user, the check
uses the set of permitted capabilities  rather than the set of effective
capabilities; ***and for non-root users, the check uses an empty set of
capabilities.***
"

What that means is that for non-root user the access API will not do the
proper validation if the process really has permission to a file or not.

To resolve this this patch replaces all the access API calls with
faccessat with AT_EACCESS flag.

Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220925070431.1313680-1-arilou@gmail.com
2022-09-26 21:38:32 -07:00
Alexei Starovoitov
9666a70265 Merge branch 'enforce W^X for trampoline and dispatcher'
Song Liu says:

====================

Changes v1 => v2:
1. Update arch_prepare_bpf_dispatcher to use a RO image and a RW buffer.
   (Alexei) Note: I haven't found an existing test to cover this part, so
   this part was tested manually (comparing the generated dispatcher is
   the same).

Jeff Layton reported CPA W^X warning linux-next [1]. It turns out to be
W^X issue with bpf trampoline and bpf dispatcher. Fix these by:

1. Use bpf_prog_pack for bpf_dispatcher;
2. Set memory permission properly with bpf trampoline.

[1] https://lore.kernel.org/lkml/c84cc27c1a5031a003039748c3c099732a718aec.camel@kernel.org/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:40:44 -07:00
Song Liu
5b0d1c7bd5 bpf: Enforce W^X for bpf trampoline
Mark the trampoline as RO+X after arch_prepare_bpf_trampoline, so that
the trampoine follows W^X rule strictly. This will turn off warnings like

CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ...

Also remove bpf_jit_alloc_exec_page(), since it is not used any more.

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20220926184739.3512547-3-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:40:43 -07:00
Song Liu
19c02415da bpf: use bpf_prog_pack for bpf_dispatcher
Allocate bpf_dispatcher with bpf_prog_pack_alloc so that bpf_dispatcher
can share pages with bpf programs.

arch_prepare_bpf_dispatcher() is updated to provide a RW buffer as working
area for arch code to write to.

This also fixes CPA W^X warnning like:

CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ...

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20220926184739.3512547-2-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:40:43 -07:00
Alexei Starovoitov
30b8fdbbe3 Merge branch 'bpf: Fixes for CONFIG_X86_KERNEL_IBT'
Jiri Olsa says:

====================
Martynas reported bpf_get_func_ip returning +4 address when
CONFIG_X86_KERNEL_IBT option is enabled and I found there are
some failing bpf tests when this option is enabled.

The CONFIG_X86_KERNEL_IBT option adds endbr instruction at the
function entry, so the idea is to 'fix' entry ip for kprobe_multi
and trampoline probes, because they are placed on the function
entry.

v5 changes:
  - updated uapi/linux/bpf.h headers with comment for
    bpf_get_func_ip returning 0 [Andrii]
  - added acks

v4 changes:
  - used get_kernel_nofault to read previous instruction [Peter]
  - used movabs instruction in trampoline comment [Peter]
  - renamed fentry_ip argument in kprobe_multi_link_handler [Peter]

v3 changes:
  - using 'unused' bpf function to get IBT config option
    into selftest skeleton
  - rebased to current bpf-next/master
  - added ack/review from Masami

v2 changes:
  - change kprobes get_func_ip to return zero for kprobes
    attached within the function body [Andrii]
  - detect IBT config and properly test kprobe with offset
    [Andrii]

v1 changes:
  - read previous instruction in kprobe_multi link handler
    and adjust entry_ip for CONFIG_X86_KERNEL_IBT option
  - split first patch into 2 separate changes
  - update changelogs
====================

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:31:02 -07:00
Jiri Olsa
738c345b74 selftests/bpf: Fix get_func_ip offset test for CONFIG_X86_KERNEL_IBT
With CONFIG_X86_KERNEL_IBT enabled the test for kprobe with offset
won't work because of the extra endbr instruction.

As suggested by Andrii adding CONFIG_X86_KERNEL_IBT detection
and using appropriate offset value based on that.

Also removing test7 program, because it does the same as test6.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:30:40 -07:00
Jiri Olsa
0e253f7e55 bpf: Return value in kprobe get_func_ip only for entry address
Changing return value of kprobe's version of bpf_get_func_ip
to return zero if the attach address is not on the function's
entry point.

For kprobes attached in the middle of the function we can't easily
get to the function address especially now with the CONFIG_X86_KERNEL_IBT
support.

If user cares about current IP for kprobes attached within the
function body, they can get it with PT_REGS_IP(ctx).

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:30:40 -07:00
Jiri Olsa
c09eb2e578 bpf: Adjust kprobe_multi entry_ip for CONFIG_X86_KERNEL_IBT
Martynas reported bpf_get_func_ip returning +4 address when
CONFIG_X86_KERNEL_IBT option is enabled.

When CONFIG_X86_KERNEL_IBT is enabled we'll have endbr instruction
at the function entry, which screws return value of bpf_get_func_ip()
helper that should return the function address.

There's short term workaround for kprobe_multi bpf program made by
Alexei [1], but we need this fixup also for bpf_get_attach_cookie,
that returns cookie based on the entry_ip value.

Moving the fixup in the fprobe handler, so both bpf_get_func_ip
and bpf_get_attach_cookie get expected function address when
CONFIG_X86_KERNEL_IBT option is enabled.

Also renaming kprobe_multi_link_handler entry_ip argument to fentry_ip
so it's clearer this is an ftrace __fentry__ ip.

[1] commit 7f0059b58f ("selftests/bpf: Fix kprobe_multi test.")

Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:30:39 -07:00
Jiri Olsa
4d854f4f31 bpf: Use given function address for trampoline ip arg
Using function address given at the generation time as the trampoline
ip argument. This way we get directly the function address that we
need, so we don't need to:
  - read the ip from the stack
  - subtract X86_PATCH_SIZE
  - subtract ENDBR_INSN_SIZE if CONFIG_X86_KERNEL_IBT is enabled
    which is not even implemented yet ;-)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:30:39 -07:00
Jiri Olsa
9d68c19c57 ftrace: Keep the resolved addr in kallsyms_callback
Keeping the resolved 'addr' in kallsyms_callback, instead of taking
ftrace_location value, because we depend on symbol address in the
cookie related code.

With CONFIG_X86_KERNEL_IBT option the ftrace_location value differs
from symbol address, which screwes the symbol address cookies matching.

There are 2 users of this function:
- bpf_kprobe_multi_link_attach
    for which this fix is for

- get_ftrace_locations
    which is used by register_fprobe_syms

    this function needs to get symbols resolved to addresses,
    but does not need 'ftrace location addresses' at this point
    there's another ftrace location translation in the path done
    by ftrace_set_filter_ips call:

     register_fprobe_syms
       addrs = get_ftrace_locations

       register_fprobe_ips(addrs)
         ...
         ftrace_set_filter_ips
           ...
             __ftrace_match_addr
               ip = ftrace_location(ip);
               ...

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:30:39 -07:00
Jiri Olsa
bf7a87f107 kprobes: Add new KPROBE_FLAG_ON_FUNC_ENTRY kprobe flag
Adding KPROBE_FLAG_ON_FUNC_ENTRY kprobe flag to indicate that
attach address is on function entry. This is used in following
changes in get_func_ip helper to return correct function address.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220926153340.1621984-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26 20:30:39 -07:00
Liu Jian
bec217197b skmsg: Schedule psock work if the cached skb exists on the psock
In sk_psock_backlog function, for ingress direction skb, if no new data
packet arrives after the skb is cached, the cached skb does not have a
chance to be added to the receive queue of psock. As a result, the cached
skb cannot be received by the upper-layer application. Fix this by reschedule
the psock work to dispose the cached skb in sk_msg_recvmsg function.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220907071311.60534-1-liujian56@huawei.com
2022-09-26 17:48:05 +02:00
Liu Jian
043a7356db selftests/bpf: Add wait send memory test for sockmap redirect
Add one test for wait redirect sock's send memory test for sockmap.

Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220823133755.314697-3-liujian56@huawei.com
2022-09-26 17:45:31 +02:00
Liu Jian
3f8ef65af9 net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory
Fixes the below NULL pointer dereference:

  [...]
  [   14.471200] Call Trace:
  [   14.471562]  <TASK>
  [   14.471882]  lock_acquire+0x245/0x2e0
  [   14.472416]  ? remove_wait_queue+0x12/0x50
  [   14.473014]  ? _raw_spin_lock_irqsave+0x17/0x50
  [   14.473681]  _raw_spin_lock_irqsave+0x3d/0x50
  [   14.474318]  ? remove_wait_queue+0x12/0x50
  [   14.474907]  remove_wait_queue+0x12/0x50
  [   14.475480]  sk_stream_wait_memory+0x20d/0x340
  [   14.476127]  ? do_wait_intr_irq+0x80/0x80
  [   14.476704]  do_tcp_sendpages+0x287/0x600
  [   14.477283]  tcp_bpf_push+0xab/0x260
  [   14.477817]  tcp_bpf_sendmsg_redir+0x297/0x500
  [   14.478461]  ? __local_bh_enable_ip+0x77/0xe0
  [   14.479096]  tcp_bpf_send_verdict+0x105/0x470
  [   14.479729]  tcp_bpf_sendmsg+0x318/0x4f0
  [   14.480311]  sock_sendmsg+0x2d/0x40
  [   14.480822]  ____sys_sendmsg+0x1b4/0x1c0
  [   14.481390]  ? copy_msghdr_from_user+0x62/0x80
  [   14.482048]  ___sys_sendmsg+0x78/0xb0
  [   14.482580]  ? vmf_insert_pfn_prot+0x91/0x150
  [   14.483215]  ? __do_fault+0x2a/0x1a0
  [   14.483738]  ? do_fault+0x15e/0x5d0
  [   14.484246]  ? __handle_mm_fault+0x56b/0x1040
  [   14.484874]  ? lock_is_held_type+0xdf/0x130
  [   14.485474]  ? find_held_lock+0x2d/0x90
  [   14.486046]  ? __sys_sendmsg+0x41/0x70
  [   14.486587]  __sys_sendmsg+0x41/0x70
  [   14.487105]  ? intel_pmu_drain_pebs_core+0x350/0x350
  [   14.487822]  do_syscall_64+0x34/0x80
  [   14.488345]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [...]

The test scenario has the following flow:

thread1                               thread2
-----------                           ---------------
 tcp_bpf_sendmsg
  tcp_bpf_send_verdict
   tcp_bpf_sendmsg_redir              sock_close
    tcp_bpf_push_locked                 __sock_release
     tcp_bpf_push                         //inet_release
      do_tcp_sendpages                    sock->ops->release
       sk_stream_wait_memory          	   // tcp_close
          sk_wait_event                      sk->sk_prot->close
           release_sock(__sk);
            ***
                                                lock_sock(sk);
                                                  __tcp_close
                                                    sock_orphan(sk)
                                                      sk->sk_wq  = NULL
                                                release_sock
            ****
           lock_sock(__sk);
          remove_wait_queue(sk_sleep(sk), &wait);
             sk_sleep(sk)
             //NULL pointer dereference
             &rcu_dereference_raw(sk->sk_wq)->wait

While waiting for memory in thread1, the socket is released with its wait
queue because thread2 has closed it. This caused by tcp_bpf_send_verdict
didn't increase the f_count of psock->sk_redir->sk_socket->file in thread1.

We should check if SOCK_DEAD flag is set on wakeup in sk_stream_wait_memory
before accessing the wait queue.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20220823133755.314697-2-liujian56@huawei.com
2022-09-26 17:43:43 +02:00
Alexei Starovoitov
230bf137e7 Merge branch 'veristat: further usability improvements'
Andrii Nakryiko says:

====================

A small patch set adding few usability improvements and features making
veristat a more convenient tool to be used for work on BPF verifier:

  - patch #2 speeds up and makes stats parsing from BPF verifier log more
    robust;

  - patch #3 makes veristat less strict about input object files; veristat
    will ignore non-BPF ELF files;

  - patch #4 adds progress log, by default, so that user doing
    mass-verification is aware that veristat is not stuck;

  - patch #5 allows to tune requested BPF verifier log level, which makes
    veristat a simplest way to get BPF verifier log, especially successfully
    verified ones.

v1->v2:
  - don't emit progress in non-table mode, as it breaks CSV output.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-23 18:14:45 -07:00