linux

Author	SHA1	Message	Date
Masami Hiramatsu	75caf33eda	rethook: x86: Add rethook x86 implementation Add rethook for x86 implementation. Most of the code has been copied from kretprobes on x86. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735286243.1084943.7477055110527046644.stgit@devnote2	2022-03-17 20:16:35 -07:00
Masami Hiramatsu	54ecbe6f1e	rethook: Add a generic return hook Add a return hook framework which hooks the function return. Most of the logic came from the kretprobe, but this is independent from kretprobe. Note that this is expected to be used with other function entry hooking feature, like ftrace, fprobe, adn kprobes. Eventually this will replace the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment, this is just an additional hook. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2	2022-03-17 20:16:29 -07:00
Masami Hiramatsu	cad9931f64	fprobe: Add ftrace based probe APIs The fprobe is a wrapper API for ftrace function tracer. Unlike kprobes, this probes only supports the function entry, but this can probe multiple functions by one fprobe. The usage is similar, user will set their callback to fprobe::entry_handler and call register_fprobe() with probed functions. There are 3 registration interfaces, - register_fprobe() takes filtering patterns of the functin names. - register_fprobe_ips() takes an array of ftrace-location addresses. - register_fprobe_syms() takes an array of function names. The registered fprobes can be unregistered with unregister_fprobe(). e.g. struct fprobe fp = { .entry_handler = user_handler }; const char targets[] = { "func1", "func2", "func3"}; ... ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets)); ... unregister_fprobe(&fp); Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735283857.1084943.1154436951479395551.stgit@devnote2	2022-03-17 20:16:15 -07:00
Jiri Olsa	4f554e9556	ftrace: Add ftrace_set_filter_ips function Adding ftrace_set_filter_ips function to be able to set filter on multiple ip addresses at once. With the kprobe multi attach interface we have cases where we need to initialize ftrace_ops object with thousands of functions, so having single function diving into ftrace_hash_move_and_update_ops with ftrace_lock is faster. The functions ips are passed as unsigned long array with count. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735282673.1084943.18310504594134769804.stgit@devnote2	2022-03-17 20:15:17 -07:00
Kaixi Fan	e0999c8e59	selftests/bpf: Fix tunnel remote IP comments In namespace at_ns0, the IP address of tnl dev is 10.1.1.100 which is the overlay IP, and the ip address of veth0 is 172.16.1.100 which is the vtep IP. When doing 'ping 10.1.1.100' from root namespace, the remote_ip should be 172.16.1.100. Fixes: `933a741e3b` ("selftests/bpf: bpf tunnel test.") Signed-off-by: Kaixi Fan <fankaixi.li@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220313164116.5889-1-fankaixi.li@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-03-17 16:08:02 -07:00
Lorenzo Bianconi	7cda76d858	veth: Allow jumbo frames in xdp mode Allow increasing the MTU over page boundaries on veth devices if the attached xdp program declares to support xdp fragments. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/d5dc039c3d4123426e7023a488c449181a7bc57f.1646989407.git.lorenzo@kernel.org	2022-03-17 20:33:52 +01:00
Lorenzo Bianconi	718a18a0c8	veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb Introduce veth_convert_skb_to_xdp_buff routine in order to convert a non-linear skb into a xdp buffer. If the received skb is cloned or shared, veth_convert_skb_to_xdp_buff will copy it in a new skb composed by order-0 pages for the linear and the fragmented area. Moreover veth_convert_skb_to_xdp_buff guarantees we have enough headroom for xdp. This is a preliminary patch to allow attaching xdp programs with frags support on veth devices. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/8d228b106bc1903571afd1d77e797bffe9a5ea7c.1646989407.git.lorenzo@kernel.org	2022-03-17 20:33:52 +01:00
Lorenzo Bianconi	5142239a22	net: veth: Account total xdp_frame len running ndo_xdp_xmit Even if this is a theoretical issue since it is not possible to perform XDP_REDIRECT on a non-linear xdp_frame, veth driver does not account paged area in ndo_xdp_xmit function pointer. Introduce xdp_get_frame_len utility routine to get the xdp_frame full length and account total frame size running XDP_REDIRECT of a non-linear xdp frame into a veth device. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/54f9fd3bb65d190daf2c0bbae2f852ff16cfbaa0.1646989407.git.lorenzo@kernel.org	2022-03-17 20:33:52 +01:00
Hou Tao	ad13baf456	selftests/bpf: Test subprog jit when toggle bpf_jit_harden repeatedly When bpf_jit_harden is toggled between 0 and 2, subprog jit may fail due to inconsistent twice read values of bpf_jit_harden during jit. So add a test to ensure the problem is fixed. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220309123321.2400262-5-houtao1@huawei.com	2022-03-16 15:13:36 -07:00
Hou Tao	d2a3b7c5be	bpf: Fix net.core.bpf_jit_harden race It is the bpf_jit_harden counterpart to commit `60b58afc96` ("bpf: fix net.core.bpf_jit_enable race"). bpf_jit_harden will be tested twice for each subprog if there are subprogs in bpf program and constant blinding may increase the length of program, so when running "./test_progs -t subprogs" and toggling bpf_jit_harden between 0 and 2, jit_subprogs may fail because constant blinding increases the length of subprog instructions during extra passs. So cache the value of bpf_jit_blinding_enabled() during program allocation, and use the cached value during constant blinding, subprog JITing and args tracking of tail call. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220309123321.2400262-4-houtao1@huawei.com	2022-03-16 15:13:36 -07:00
Hou Tao	73e14451f3	bpf, x86: Fall back to interpreter mode when extra pass fails Extra pass for subprog jit may fail (e.g. due to bpf_jit_harden race), but bpf_func is not cleared for the subprog and jit_subprogs will succeed. The running of the bpf program may lead to oops because the memory for the jited subprog image has already been freed. So fall back to interpreter mode by clearing bpf_func/jited/jited_len when extra pass fails. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220309123321.2400262-2-houtao1@huawei.com	2022-03-16 15:12:18 -07:00
Alexei Starovoitov	aaccdf9c93	Merge branch 'Remove libcap dependency from bpf selftests' Martin KaFai Lau says: ==================== After upgrading to the newer libcap (>= 2.60), the libcap commit aca076443591 ("Make cap_t operations thread safe.") added a "__u8 mutex;" to the "struct _cap_struct". It caused a few byte shift that breaks the assumption made in the "struct libcap" definition in test_verifier.c. This set is to remove the libcap dependency from the bpf selftests. v2: - Define CAP_PERFMON and CAP_BPF when the older <linux/capability.h> does not have them. (Andrii) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-03-16 15:07:50 -07:00
Martin KaFai Lau	82cb2b3077	bpf: selftests: Remove libcap usage from test_progs This patch removes the libcap usage from test_progs. bind_perm.c is the only user. cap_*_effective() helpers added in the earlier patch are directly used instead. No other selftest binary is using libcap, so '-lcap' is also removed from the Makefile. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220316173835.2039334-1-kafai@fb.com	2022-03-16 15:07:49 -07:00
Martin KaFai Lau	b1c2768a82	bpf: selftests: Remove libcap usage from test_verifier This patch removes the libcap usage from test_verifier. The cap_*_effective() helpers added in the earlier patch are used instead. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220316173829.2038682-1-kafai@fb.com	2022-03-16 15:07:49 -07:00
Martin KaFai Lau	663af70aab	bpf: selftests: Add helpers to directly use the capget and capset syscall After upgrading to the newer libcap (>= 2.60), the libcap commit aca076443591 ("Make cap_t operations thread safe.") added a "__u8 mutex;" to the "struct _cap_struct". It caused a few byte shift that breaks the assumption made in the "struct libcap" definition in test_verifier.c. The bpf selftest usage only needs to enable and disable the effective caps of the running task. It is easier to directly syscall the capget and capset instead. It can also remove the libcap library dependency. The cap_helpers.{c,h} is added. One __u64 is used for all CAP_* bits instead of two __u32. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220316173823.2036955-1-kafai@fb.com	2022-03-16 15:07:49 -07:00
Daniel Xu	6585abea98	bpftool: man: Add missing top level docs The top-level (bpftool.8) man page was missing docs for a few subcommands and their respective sub-sub-commands. This commit brings the top level man page up to date. Note that I've kept the ordering of the subcommands the same as in `bpftool help`. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/3049ef5dc509c0d1832f0a8b2dba2ccaad0af688.1647213551.git.dxu@dxuuu.xyz	2022-03-15 15:51:41 -07:00
Dmitrii Dolgov	cbdaf71f7e	bpftool: Add bpf_cookie to link output Commit `82e6b1eee6` ("bpf: Allow to specify user-provided bpf_cookie for BPF perf links") introduced the concept of user specified bpf_cookie, which could be accessed by BPF programs using bpf_get_attach_cookie(). For troubleshooting purposes it is convenient to expose bpf_cookie via bpftool as well, so there is no need to meddle with the target BPF program itself. Implemented using the pid iterator BPF program to actually fetch bpf_cookies, which allows constraining code changes only to bpftool. $ bpftool link 1: type 7 prog 5 bpf_cookie 123 pids bootstrap(81) Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220309163112.24141-1-9erthalion6@gmail.com	2022-03-15 15:07:27 -07:00
Guo Zhengkui	f98d6dd1e7	selftests/bpf: Clean up array_size.cocci warnings Clean up the array_size.cocci warnings under tools/testing/selftests/bpf/: Use `ARRAY_SIZE(arr)` instead of forms like `sizeof(arr)/sizeof(arr[0])`. tools/testing/selftests/bpf/test_cgroup_storage.c uses ARRAY_SIZE() defined in tools/include/linux/kernel.h (sys/sysinfo.h -> linux/kernel.h), while others use ARRAY_SIZE() in bpf_util.h. Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220315130143.2403-1-guozhengkui@vivo.com	2022-03-15 17:03:10 +01:00
Niklas Söderlund	8fa42d78f6	samples/bpf, xdpsock: Fix race when running for fix duration of time When running xdpsock for a fix duration of time before terminating using --duration=<n>, there is a race condition that may cause xdpsock to terminate immediately. When running for a fixed duration of time the check to determine when to terminate execution is in is_benchmark_done() and is being executed in the context of the poller thread, if (opt_duration > 0) { unsigned long dt = (get_nsecs() - start_time); if (dt >= opt_duration) benchmark_done = true; } However start_time is only set after the poller thread have been created. This leaves a small window when the poller thread is starting and calls is_benchmark_done() for the first time that start_time is not yet set. In that case start_time have its initial value of 0 and the duration check fails as it do not correlate correctly for the applications start time and immediately sets benchmark_done which in turn terminates the xdpsock application. Fix this by setting start_time before creating the poller thread. Fixes: `d3f11b018f` ("samples/bpf: xdpsock: Add duration option to specify how long to run") Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220315102948.466436-1-niklas.soderlund@corigine.com	2022-03-15 16:53:37 +01:00
Wang Yufen	2486ab434b	bpf, sockmap: Fix double uncharge the mem of sk_msg If tcp_bpf_sendmsg is running during a tear down operation, psock may be freed. tcp_bpf_sendmsg() tcp_bpf_send_verdict() sk_msg_return() tcp_bpf_sendmsg_redir() unlikely(!psock)) sk_msg_free() The mem of msg has been uncharged in tcp_bpf_send_verdict() by sk_msg_return(), and would be uncharged by sk_msg_free() again. When psock is null, we can simply returning an error code, this would then trigger the sk_msg_free_nocharge in the error path of __SK_REDIRECT and would have the side effect of throwing an error up to user space. This would be a slight change in behavior from user side but would look the same as an error if the redirect on the socket threw an error. This issue can cause the following info: WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260 Call Trace: <TASK> __sk_destruct+0x24/0x1f0 sk_psock_destroy+0x19b/0x1c0 process_one_work+0x1b3/0x3c0 worker_thread+0x30/0x350 ? process_one_work+0x3c0/0x3c0 kthread+0xe6/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220304081145.2037182-5-wangyufen@huawei.com	2022-03-15 16:43:31 +01:00
Wang Yufen	84472b436e	bpf, sockmap: Fix more uncharged while msg has more_data In tcp_bpf_send_verdict(), if msg has more data after tcp_bpf_sendmsg_redir(): tcp_bpf_send_verdict() tosend = msg->sg.size //msg->sg.size = 22220 case __SK_REDIRECT: sk_msg_return() //uncharged msg->sg.size(22220) sk->sk_forward_alloc tcp_bpf_sendmsg_redir() //after tcp_bpf_sendmsg_redir, msg->sg.size=11000 goto more_data; tosend = msg->sg.size //msg->sg.size = 11000 case __SK_REDIRECT: sk_msg_return() //uncharged msg->sg.size(11000) to sk->sk_forward_alloc The msg->sg.size(11000) has been uncharged twice, to fix we can charge the remaining msg->sg.size before goto more data. This issue can cause the following info: WARNING: CPU: 0 PID: 9860 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0 Call Trace: <TASK> inet_csk_destroy_sock+0x55/0x110 __tcp_close+0x279/0x470 tcp_close+0x1f/0x60 inet_release+0x3f/0x80 __sock_release+0x3d/0xb0 sock_close+0x11/0x20 __fput+0x92/0x250 task_work_run+0x6a/0xa0 do_exit+0x33b/0xb60 do_group_exit+0x2f/0xa0 get_signal+0xb6/0x950 arch_do_signal_or_restart+0xac/0x2a0 ? vfs_write+0x237/0x290 exit_to_user_mode_prepare+0xa9/0x200 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x46/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260 Call Trace: <TASK> __sk_destruct+0x24/0x1f0 sk_psock_destroy+0x19b/0x1c0 process_one_work+0x1b3/0x3c0 worker_thread+0x30/0x350 ? process_one_work+0x3c0/0x3c0 kthread+0xe6/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220304081145.2037182-4-wangyufen@huawei.com	2022-03-15 16:43:31 +01:00
Wang Yufen	9c34e38c4a	bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full If tcp_bpf_sendmsg() is running while sk msg is full. When sk_msg_alloc() returns -ENOMEM error, tcp_bpf_sendmsg() goes to wait_for_memory. If partial memory has been alloced by sk_msg_alloc(), that is, msg_tx->sg.size is greater than osize after sk_msg_alloc(), memleak occurs. To fix we use sk_msg_trim() to release the allocated memory, then goto wait for memory. Other call paths of sk_msg_alloc() have the similar issue, such as tls_sw_sendmsg(), so handle sk_msg_trim logic inside sk_msg_alloc(), as Cong Wang suggested. This issue can cause the following info: WARNING: CPU: 3 PID: 7950 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0 Call Trace: <TASK> inet_csk_destroy_sock+0x55/0x110 __tcp_close+0x279/0x470 tcp_close+0x1f/0x60 inet_release+0x3f/0x80 __sock_release+0x3d/0xb0 sock_close+0x11/0x20 __fput+0x92/0x250 task_work_run+0x6a/0xa0 do_exit+0x33b/0xb60 do_group_exit+0x2f/0xa0 get_signal+0xb6/0x950 arch_do_signal_or_restart+0xac/0x2a0 exit_to_user_mode_prepare+0xa9/0x200 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x46/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> WARNING: CPU: 3 PID: 2094 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260 Call Trace: <TASK> __sk_destruct+0x24/0x1f0 sk_psock_destroy+0x19b/0x1c0 process_one_work+0x1b3/0x3c0 kthread+0xe6/0x110 ret_from_fork+0x22/0x30 </TASK> Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220304081145.2037182-3-wangyufen@huawei.com	2022-03-15 16:43:31 +01:00
Wang Yufen	938d3480b9	bpf, sockmap: Fix memleak in sk_psock_queue_msg If tcp_bpf_sendmsg is running during a tear down operation we may enqueue data on the ingress msg queue while tear down is trying to free it. sk1 (redirect sk2) sk2 ------------------- --------------- tcp_bpf_sendmsg() tcp_bpf_send_verdict() tcp_bpf_sendmsg_redir() bpf_tcp_ingress() sock_map_close() lock_sock() lock_sock() ... blocking sk_psock_stop sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); release_sock(sk); lock_sock() sk_mem_charge() get_page() sk_psock_queue_msg() sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED); drop_sk_msg() release_sock() While drop_sk_msg(), the msg has charged memory form sk by sk_mem_charge and has sg pages need to put. To fix we use sk_msg_free() and then kfee() msg. This issue can cause the following info: WARNING: CPU: 0 PID: 9202 at net/core/stream.c:205 sk_stream_kill_queues+0xc8/0xe0 Call Trace: <IRQ> inet_csk_destroy_sock+0x55/0x110 tcp_rcv_state_process+0xe5f/0xe90 ? sk_filter_trim_cap+0x10d/0x230 ? tcp_v4_do_rcv+0x161/0x250 tcp_v4_do_rcv+0x161/0x250 tcp_v4_rcv+0xc3a/0xce0 ip_protocol_deliver_rcu+0x3d/0x230 ip_local_deliver_finish+0x54/0x60 ip_local_deliver+0xfd/0x110 ? ip_protocol_deliver_rcu+0x230/0x230 ip_rcv+0xd6/0x100 ? ip_local_deliver+0x110/0x110 __netif_receive_skb_one_core+0x85/0xa0 process_backlog+0xa4/0x160 __napi_poll+0x29/0x1b0 net_rx_action+0x287/0x300 __do_softirq+0xff/0x2fc do_softirq+0x79/0x90 </IRQ> WARNING: CPU: 0 PID: 531 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x175/0x1b0 Call Trace: <TASK> __sk_destruct+0x24/0x1f0 sk_psock_destroy+0x19b/0x1c0 process_one_work+0x1b3/0x3c0 ? process_one_work+0x3c0/0x3c0 worker_thread+0x30/0x350 ? process_one_work+0x3c0/0x3c0 kthread+0xe6/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Fixes: `9635720b7c` ("bpf, sockmap: Fix memleak on ingress msg enqueue") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220304081145.2037182-2-wangyufen@huawei.com	2022-03-15 16:43:31 +01:00
Yonghong Song	d3b351f65b	selftests/bpf: Fix a clang compilation error for send_signal.c Building selftests/bpf with latest clang compiler (clang15 built from source), I hit the following compilation error: /.../prog_tests/send_signal.c:43:16: error: variable 'j' set but not used [-Werror,-Wunused-but-set-variable] volatile int j = 0; ^ 1 error generated. The problem also exists with clang13 and clang14. clang12 is okay. In send_signal.c, we have the following code ... volatile int j = 0; [...] for (int i = 0; i < 100000000 && !sigusr1_received; i++) j /= i + 1; ... to burn CPU cycles so bpf_send_signal() helper can be tested in NMI mode. Slightly changing 'j /= i + 1' to 'j /= i + j + 1' or 'j++' can fix the problem. Further investigation indicated this should be a clang bug ([1]). The upstream fix will be proposed later. But it is a good idea to workaround the issue to unblock people who build kernel/selftests with clang. [1] https://discourse.llvm.org/t/strange-clang-unused-but-set-variable-error-with-volatile-variables/60841 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220311003721.2177170-1-yhs@fb.com	2022-03-11 22:18:13 +01:00
Toke Høiland-Jørgensen	c09df4bd3a	selftests/bpf: Add a test for maximum packet size in xdp_do_redirect This adds an extra test to the xdp_do_redirect selftest for XDP live packet mode, which verifies that the maximum permissible packet size is accepted without any errors, and that a too big packet is correctly rejected. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220310225621.53374-2-toke@redhat.com	2022-03-11 22:01:26 +01:00
Toke Høiland-Jørgensen	b6f1f780b3	bpf, test_run: Fix packet size check for live packet mode The live packet mode uses some extra space at the start of each page to cache data structures so they don't have to be rebuilt at every repetition. This space wasn't correctly accounted for in the size checking of the arguments supplied to userspace. In addition, the definition of the frame size should include the size of the skb_shared_info (as there is other logic that subtracts the size of this). Together, these mistakes resulted in userspace being able to trip the XDP_WARN() in xdp_update_frame_from_buff(), which syzbot discovered in short order. Fix this by changing the frame size define and adding the extra headroom to the bpf_prog_test_run_xdp() function. Also drop the max_len parameter to the page_pool init, since this is related to DMA which is not used for the page pool instance in PROG_TEST_RUN. Fixes: `b530e9e106` ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Reported-by: syzbot+0e91362d99386dc5de99@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220310225621.53374-1-toke@redhat.com	2022-03-11 22:01:26 +01:00
Hao Luo	6789ab9668	compiler_types: Refactor the use of btf_type_tag attribute. Previous patches have introduced the compiler attribute btf_type_tag for __user and __percpu. The availability of this attribute depends on some CONFIGs and compiler support. This patch refactors the use of btf_type_tag by introducing BTF_TYPE_TAG, which hides all the dependencies. No functional change. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220310211655.3173786-1-haoluo@google.com	2022-03-10 19:13:12 -08:00
Alexei Starovoitov	a77c2cfd4e	Merge branch 'bpf-lsm: Extend interoperability with IMA' Roberto Sassu says: ==================== Extend the interoperability with IMA, to give wider flexibility for the implementation of integrity-focused LSMs based on eBPF. Patch 1 fixes some style issues. Patches 2-6 give the ability to eBPF-based LSMs to take advantage of the measurement capability of IMA without needing to setup a policy in IMA (those LSMs might implement the policy capability themselves). Patches 7-9 allow eBPF-based LSMs to evaluate files read by the kernel. Changelog v2: - Add better description to patch 1 (suggested by Shuah) - Recalculate digest if it is not fresh (when IMA_COLLECTED flag not set) - Move declaration of bpf_ima_file_hash() at the end (suggested by Yonghong) - Add tests to check if the digest has been recalculated - Add deny test for bpf_kernel_read_file() - Add description to tests v1: - Modify ima_file_hash() only and allow the usage of the function with the modified behavior by eBPF-based LSMs through the new function bpf_ima_file_hash() (suggested by Mimi) - Make bpf_lsm_kernel_read_file() sleepable so that bpf_ima_inode_hash() and bpf_ima_file_hash() can be called inside the implementation of eBPF-based LSMs for this hook ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-03-10 19:10:13 -08:00
Roberto Sassu	7bae42b68d	selftests/bpf: Check that bpf_kernel_read_file() denies reading IMA policy Check that bpf_kernel_read_file() denies the reading of an IMA policy, by ensuring that ima_setup.sh exits with an error. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-10-roberto.sassu@huawei.com	2022-03-10 18:57:55 -08:00
Roberto Sassu	e6dcf7bbf3	selftests/bpf: Add test for bpf_lsm_kernel_read_file() Test the ability of bpf_lsm_kernel_read_file() to call the sleepable functions bpf_ima_inode_hash() or bpf_ima_file_hash() to obtain a measurement of a loaded IMA policy. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-9-roberto.sassu@huawei.com	2022-03-10 18:57:55 -08:00
Roberto Sassu	df6b3039fa	bpf-lsm: Make bpf_lsm_kernel_read_file() as sleepable Make bpf_lsm_kernel_read_file() as sleepable, so that bpf_ima_inode_hash() or bpf_ima_file_hash() can be called inside the implementation of this hook. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-8-roberto.sassu@huawei.com	2022-03-10 18:57:55 -08:00
Roberto Sassu	91e8fa254d	selftests/bpf: Check if the digest is refreshed after a file write Verify that bpf_ima_inode_hash() returns a non-fresh digest after a file write, and that bpf_ima_file_hash() returns a fresh digest. Verification is done by requesting the digest from the bprm_creds_for_exec hook, called before ima_bprm_check(). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-7-roberto.sassu@huawei.com	2022-03-10 18:57:54 -08:00
Roberto Sassu	27a77d0d46	selftests/bpf: Add test for bpf_ima_file_hash() Add new test to ensure that bpf_ima_file_hash() returns the digest of the executed files. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-6-roberto.sassu@huawei.com	2022-03-10 18:57:54 -08:00
Roberto Sassu	2746de3c53	selftests/bpf: Move sample generation code to ima_test_common() Move sample generator code to ima_test_common() so that the new function can be called by multiple LSM hooks. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-5-roberto.sassu@huawei.com	2022-03-10 18:57:54 -08:00
Roberto Sassu	174b16946e	bpf-lsm: Introduce new helper bpf_ima_file_hash() ima_file_hash() has been modified to calculate the measurement of a file on demand, if it has not been already performed by IMA or the measurement is not fresh. For compatibility reasons, ima_inode_hash() remains unchanged. Keep the same approach in eBPF and introduce the new helper bpf_ima_file_hash() to take advantage of the modified behavior of ima_file_hash(). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-4-roberto.sassu@huawei.com	2022-03-10 18:57:54 -08:00
Roberto Sassu	280fe8367b	ima: Always return a file measurement in ima_file_hash() __ima_inode_hash() checks if a digest has been already calculated by looking for the integrity_iint_cache structure associated to the passed inode. Users of ima_file_hash() (e.g. eBPF) might be interested in obtaining the information without having to setup an IMA policy so that the digest is always available at the time they call this function. In addition, they likely expect the digest to be fresh, e.g. recalculated by IMA after a file write. Although getting the digest from the bprm_committed_creds hook (as in the eBPF test) ensures that the digest is fresh, as the IMA hook is executed before that hook, this is not always the case (e.g. for the mmap_file hook). Call ima_collect_measurement() in __ima_inode_hash(), if the file descriptor is available (passed by ima_file_hash()) and the digest is not available/not fresh, and store the file measurement in a temporary integrity_iint_cache structure. This change does not cause memory usage increase, due to using the temporary integrity_iint_cache structure, and due to freeing the ima_digest_data structure inside integrity_iint_cache before exiting from __ima_inode_hash(). For compatibility reasons, the behavior of ima_inode_hash() remains unchanged. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Link: https://lore.kernel.org/bpf/20220302111404.193900-3-roberto.sassu@huawei.com	2022-03-10 18:56:24 -08:00
Roberto Sassu	bae60eefb9	ima: Fix documentation-related warnings in ima_main.c Fix the following warnings in ima_main.c, displayed with W=n make argument: security/integrity/ima/ima_main.c:432: warning: Function parameter or member 'vma' not described in 'ima_file_mprotect' security/integrity/ima/ima_main.c:636: warning: Function parameter or member 'inode' not described in 'ima_post_create_tmpfile' security/integrity/ima/ima_main.c:636: warning: Excess function parameter 'file' description in 'ima_post_create_tmpfile' security/integrity/ima/ima_main.c:843: warning: Function parameter or member 'load_id' not described in 'ima_post_load_data' security/integrity/ima/ima_main.c:843: warning: Excess function parameter 'id' description in 'ima_post_load_data' Also, fix some style issues in the description of ima_post_create_tmpfile() and ima_post_path_mknod(). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Link: https://lore.kernel.org/bpf/20220302111404.193900-2-roberto.sassu@huawei.com	2022-03-10 18:56:23 -08:00
Chris J Arges	357b3cc3c0	bpftool: Ensure bytes_memlock json output is correct If a BPF map is created over 2^32 the memlock value as displayed in JSON format will be incorrect. Use atoll instead of atoi so that the correct number is displayed. ``` $ bpftool map create /sys/fs/bpf/test_bpfmap type hash key 4 \ value 1024 entries 4194304 name test_bpfmap $ bpftool map list 1: hash name test_bpfmap flags 0x0 key 4B value 1024B max_entries 4194304 memlock 4328521728B $ sudo bpftool map list -j \| jq .[].bytes_memlock 33554432 ``` Signed-off-by: Chris J Arges <carges@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/b6601087-0b11-33cc-904a-1133d1500a10@cloudflare.com	2022-03-11 00:06:11 +01:00
Yuntao Wang	1b773d0003	bpf: Use offsetofend() to simplify macro definition Use offsetofend() instead of offsetof() + sizeof() to simplify MIN_BPF_LINEINFO_SIZE macro definition. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/bpf/20220310161518.534544-1-ytcoode@gmail.com	2022-03-10 23:04:47 +01:00
Hengqi Chen	5861701440	bpf: Fix comment for helper bpf_current_task_under_cgroup() Fix the descriptions of the return values of helper bpf_current_task_under_cgroup(). Fixes: `c6b5fb8690` ("bpf: add documentation for eBPF helpers (42-50)") Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220310155335.1278783-1-hengqi.chen@gmail.com	2022-03-10 23:00:43 +01:00
Daniel Borkmann	60695896e4	Merge branch 'bpf-tstamp-follow-ups' Martin KaFai Lau says: ==================== This set is a follow up on the bpf side based on discussion [0]. Patch 1 is to remove some skbuff macros that are used in bpf filter.c. Patch 2 and 3 are to simplify the bpf insn rewrite on __sk_buff->tstamp. Patch 4 is to simplify the bpf uapi by modeling the __sk_buff->tstamp and __sk_buff->tstamp_type (was delivery_time_type) the same as its kernel counter part skb->tstamp and skb->mono_delivery_time. Patch 5 is to adjust the bpf selftests due to changes in patch 4. [0]: https://lore.kernel.org/bpf/419d994e-ff61-7c11-0ec7-11fefcb0186e@iogearbox.net/ ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2022-03-10 22:57:09 +01:00
Martin KaFai Lau	3daf0896f3	bpf: selftests: Update tests after s/delivery_time/tstamp/ change in bpf.h The previous patch made the follow changes: - s/delivery_time_type/tstamp_type/ - s/bpf_skb_set_delivery_time/bpf_skb_set_tstamp/ - BPF_SKB_DELIVERY_TIME_* to BPF_SKB_TSTAMP_* This patch is to change the test_tc_dtime.c to reflect the above. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220309090515.3712742-1-kafai@fb.com	2022-03-10 22:57:05 +01:00
Martin KaFai Lau	9bb984f28d	bpf: Remove BPF_SKB_DELIVERY_TIME_NONE and rename s/delivery_time_/tstamp_/ This patch is to simplify the uapi bpf.h regarding to the tstamp type and use a similar way as the kernel to describe the value stored in __sk_buff->tstamp. My earlier thought was to avoid describing the semantic and clock base for the rcv timestamp until there is more clarity on the use case, so the __sk_buff->delivery_time_type naming instead of __sk_buff->tstamp_type. With some thoughts, it can reuse the UNSPEC naming. This patch first removes BPF_SKB_DELIVERY_TIME_NONE and also rename BPF_SKB_DELIVERY_TIME_UNSPEC to BPF_SKB_TSTAMP_UNSPEC and BPF_SKB_DELIVERY_TIME_MONO to BPF_SKB_TSTAMP_DELIVERY_MONO. The semantic of BPF_SKB_TSTAMP_DELIVERY_MONO is the same: __sk_buff->tstamp has delivery time in mono clock base. BPF_SKB_TSTAMP_UNSPEC means __sk_buff->tstamp has the (rcv) tstamp at ingress and the delivery time at egress. At egress, the clock base could be found from skb->sk->sk_clockid. __sk_buff->tstamp == 0 naturally means NONE, so NONE is not needed. With BPF_SKB_TSTAMP_UNSPEC for the rcv tstamp at ingress, the __sk_buff->delivery_time_type is also renamed to __sk_buff->tstamp_type which was also suggested in the earlier discussion: https://lore.kernel.org/bpf/b181acbe-caf8-502d-4b7b-7d96b9fc5d55@iogearbox.net/ The above will then make __sk_buff->tstamp and __sk_buff->tstamp_type the same as its kernel skb->tstamp and skb->mono_delivery_time counter part. The internal kernel function bpf_skb_convert_dtime_type_read() is then renamed to bpf_skb_convert_tstamp_type_read() and it can be simplified with the BPF_SKB_DELIVERY_TIME_NONE gone. A BPF_ALU32_IMM(BPF_AND) insn is also saved by using BPF_JMP32_IMM(BPF_JSET). The bpf helper bpf_skb_set_delivery_time() is also renamed to bpf_skb_set_tstamp(). The arg name is changed from dtime to tstamp also. It only allows setting tstamp 0 for BPF_SKB_TSTAMP_UNSPEC and it could be relaxed later if there is use case to change mono delivery time to non mono. prog->delivery_time_access is also renamed to prog->tstamp_type_access. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220309090509.3712315-1-kafai@fb.com	2022-03-10 22:57:05 +01:00
Martin KaFai Lau	9d90db97e4	bpf: Simplify insn rewrite on BPF_WRITE __sk_buff->tstamp BPF_JMP32_IMM(BPF_JSET) is used to save a BPF_ALU32_IMM(BPF_AND). The skb->tc_at_ingress and skb->mono_delivery_time are at the same offset, so only one BPF_LDX_MEM(BPF_B) is needed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220309090502.3711982-1-kafai@fb.com	2022-03-10 22:57:05 +01:00
Martin KaFai Lau	539de9328e	bpf: Simplify insn rewrite on BPF_READ __sk_buff->tstamp The skb->tc_at_ingress and skb->mono_delivery_time are at the same byte offset. Thus, only one BPF_LDX_MEM(BPF_B) is needed and both bits can be tested together. /* BPF_READ: a = __sk_buff->tstamp */ if (skb->tc_at_ingress && skb->mono_delivery_time) a = 0; else a = skb->tstamp; Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220309090456.3711530-1-kafai@fb.com	2022-03-10 22:57:05 +01:00
Martin KaFai Lau	3b5d4ddf8f	bpf: net: Remove TC_AT_INGRESS_OFFSET and SKB_MONO_DELIVERY_TIME_OFFSET macro This patch removes the TC_AT_INGRESS_OFFSET and SKB_MONO_DELIVERY_TIME_OFFSET macros. Instead, PKT_VLAN_PRESENT_OFFSET is used because all of them are at the same offset. Comment is added to make it clear that changing the position of tc_at_ingress or mono_delivery_time will require to adjust the defined macros. The earlier discussion can be found here: https://lore.kernel.org/bpf/419d994e-ff61-7c11-0ec7-11fefcb0186e@iogearbox.net/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220309090450.3710955-1-kafai@fb.com	2022-03-10 22:57:05 +01:00
Yihao Han	743bec1b78	bpf, test_run: Use kvfree() for memory allocated with kvmalloc() It is allocated with kvmalloc(), the corresponding release function should not be kfree(), use kvfree() instead. Generated by: scripts/coccinelle/api/kfree_mismatch.cocci Fixes: `b530e9e106` ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Signed-off-by: Yihao Han <hanyihao@vivo.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20220310092828.13405-1-hanyihao@vivo.com	2022-03-10 16:20:02 +01:00
Toke Høiland-Jørgensen	eecbfd976e	bpf: Initialise retval in bpf_prog_test_run_xdp() The kernel test robot pointed out that the newly added bpf_test_run_xdp_live() runner doesn't set the retval in the caller (by design), which means that the variable can be passed unitialised to bpf_test_finish(). Fix this by initialising the variable properly. Fixes: `b530e9e106` ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220310110228.161869-1-toke@redhat.com	2022-03-10 16:12:10 +01:00
Niklas Söderlund	f655c088e7	bpftool: Restore support for BPF offload-enabled feature probing Commit `1a56c18e6c` ("bpftool: Stop supporting BPF offload-enabled feature probing") removed the support to probe for BPF offload features. This is still something that is useful for NFP NIC that can support offloading of BPF programs. The reason for the dropped support was that libbpf starting with v1.0 would drop support for passing the ifindex to the BPF prog/map/helper feature probing APIs. In order to keep this useful feature for NFP restore the functionality by moving it directly into bpftool. The code restored is a simplified version of the code that existed in libbpf which supposed passing the ifindex. The simplification is that it only targets the cases where ifindex is given and call into libbpf for the cases where it's not. Before restoring support for probing offload features: # bpftool feature probe dev ens4np0 Scanning system call availability... bpf() syscall is available Scanning eBPF program types... Scanning eBPF map types... Scanning eBPF helper functions... eBPF helpers supported for program type sched_cls: eBPF helpers supported for program type xdp: Scanning miscellaneous eBPF features... Large program size limit is NOT available Bounded loop support is NOT available ISA extension v2 is NOT available ISA extension v3 is NOT available With support for probing offload features restored: # bpftool feature probe dev ens4np0 Scanning system call availability... bpf() syscall is available Scanning eBPF program types... eBPF program_type sched_cls is available eBPF program_type xdp is available Scanning eBPF map types... eBPF map_type hash is available eBPF map_type array is available Scanning eBPF helper functions... eBPF helpers supported for program type sched_cls: - bpf_map_lookup_elem - bpf_get_prandom_u32 - bpf_perf_event_output eBPF helpers supported for program type xdp: - bpf_map_lookup_elem - bpf_get_prandom_u32 - bpf_perf_event_output - bpf_xdp_adjust_head - bpf_xdp_adjust_tail Scanning miscellaneous eBPF features... Large program size limit is NOT available Bounded loop support is NOT available ISA extension v2 is NOT available ISA extension v3 is NOT available Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220310121846.921256-1-niklas.soderlund@corigine.com	2022-03-10 16:09:47 +01:00
Alexei Starovoitov	de55c9a196	Merge branch 'Add support for transmitting packets using XDP in bpf_prog_run()' Toke Høiland-Jørgensen says: ==================== This series adds support for transmitting packets using XDP in bpf_prog_run(), by enabling a new mode "live packet" mode which will handle the XDP program return codes and redirect the packets to the stack or other devices. The primary use case for this is testing the redirect map types and the ndo_xdp_xmit driver operation without an external traffic generator. But it turns out to also be useful for creating a programmable traffic generator in XDP, as well as injecting frames into the stack. A sample traffic generator, which was included in previous versions of the series, but now moved to xdp-tools, transmits up to 9 Mpps/core on my test machine. To transmit the frames, the new mode instantiates a page_pool structure in bpf_prog_run() and initialises the pages to contain XDP frames with the data passed in by userspace. These frames can then be handled as though they came from the hardware XDP path, and the existing page_pool code takes care of returning and recycling them. The setup is optimised for high performance with a high number of repetitions to support stress testing and the traffic generator use case; see patch 1 for details. v11: - Fix override of return code in xdp_test_run_batch() - Add Martin's ACKs to remaining patches v10: - Only propagate memory allocation errors from xdp_test_run_batch() - Get rid of BPF_F_TEST_XDP_RESERVED; batch_size can be used to probe - Check that batch_size is unset in non-XDP test_run funcs - Lower the number of repetitions in the selftest to 10k - Count number of recycled pages in the selftest - Fix a few other nits from Martin, carry forward ACKs v9: - XDP_DROP packets in the selftest to ensure pages are recycled - Fix a few issues reported by the kernel test robot - Rewrite the documentation of the batch size to make it a bit clearer - Rebase to newest bpf-next v8: - Make the batch size configurable from userspace - Don't interrupt the packet loop on errors in do_redirect (this can be caught from the tracepoint) - Add documentation of the feature - Add reserved flag userspace can use to probe for support (kernel didn't check flags previously) - Rebase to newest bpf-next, disallow live mode for jumbo frames v7: - Extend the local_bh_disable() to cover the full test run loop, to prevent running concurrently with the softirq. Fixes a deadlock with veth xmit. - Reinstate the forwarding sysctl setting in the selftest, and bump up the number of packets being transmitted to trigger the above bug. - Update commit message to make it clear that user space can select the ingress interface. v6: - Fix meta vs data pointer setting and add a selftest for it - Add local_bh_disable() around code passing packets up the stack - Create a new netns for the selftest and use a TC program instead of the forwarding hack to count packets being XDP_PASS'ed from the test prog. - Check for the correct ingress ifindex in the selftest - Rebase and drop patches 1-5 that were already merged v5: - Rebase to current bpf-next v4: - Fix a few code style issues (Alexei) - Also handle the other return codes: XDP_PASS builds skbs and injects them into the stack, and XDP_TX is turned into a redirect out the same interface (Alexei). - Drop the last patch adding an xdp_trafficgen program to samples/bpf; this will live in xdp-tools instead (Alexei). - Add a separate bpf_test_run_xdp_live() function to test_run.c instead of entangling the new mode in the existing bpf_test_run(). v3: - Reorder patches to make sure they all build individually (Patchwork) - Remove a couple of unused variables (Patchwork) - Remove unlikely() annotation in slow path and add back John's ACK that I accidentally dropped for v2 (John) v2: - Split up up __xdp_do_redirect to avoid passing two pointers to it (John) - Always reset context pointers before each test run (John) - Use get_mac_addr() from xdp_sample_user.h instead of rolling our own (Kumar) - Fix wrong offset for metadata pointer ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-03-09 14:19:23 -08:00

1 2 3 4 5 ...

1075905 Commits