Commit Graph

Alexei Starovoitov
8e07bb9ebc bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu.
Convert bpf_cpumask to bpf_mem_cache_free_rcu.
Note that migrate_disable() in bpf_cpumask_release() is still necessary, since
bpf_cpumask_release() is a dtor. bpf_obj_free_fields() can be converted to do
migrate_disable() there in a follow up.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-14-alexei.starovoitov@gmail.com
2023-07-12 23:45:23 +02:00
Alexei Starovoitov
5af6807bdb bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().
Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
Unlike bpf_mem_[cache_]free(), which links objects for immediate reuse into a
per-cpu free list, the _rcu() flavor waits for an RCU grace period and then moves
objects into the free_by_rcu_ttrace list, where they wait for an RCU
tasks trace grace period before being freed into slab.

The life cycle of objects:
alloc: dequeue free_llist
free: enqueue free_llist
free_rcu: enqueue free_by_rcu -> waiting_for_gp
free_llist above high watermark -> free_by_rcu_ttrace
after RCU GP waiting_for_gp -> free_by_rcu_ttrace
free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
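
A minimal usage sketch of the two flavors (illustrative only; 'struct foo' and the
call sites are assumptions, not code from this patch):

  struct bpf_mem_alloc ma;
  struct foo *obj;

  /* size == 0 sets up the any-size allocator (per-size buckets) */
  bpf_mem_alloc_init(&ma, 0, false /* !percpu */);

  obj = bpf_mem_alloc(&ma, sizeof(*obj));
  if (!obj)
          return;
  /* ... use obj ... */
  /* bpf_mem_free(&ma, obj) would allow immediate reuse on this cpu;
   * the new flavor defers reuse until an RCU grace period has passed: */
  bpf_mem_free_rcu(&ma, obj);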

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-13-alexei.starovoitov@gmail.com
2023-07-12 23:45:23 +02:00
Alexei Starovoitov
f76faa65c9 selftests/bpf: Improve test coverage of bpf_mem_alloc.
bpf_obj_new() calls bpf_mem_alloc(), but doing alloc/free of 8 elements
does not trigger the watermark conditions in bpf_mem_alloc.
Increase to 200 elements to make sure alloc_bulk/free_bulk is exercised.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-12-alexei.starovoitov@gmail.com
2023-07-12 23:45:23 +02:00
Paul E. McKenney
43a89baecf rcu: Export rcu_request_urgent_qs_task()
If a CPU is executing a long series of non-sleeping system calls,
RCU grace periods can be delayed for on the order of a couple hundred
milliseconds.  This is normally not a problem, but if each system call
does a call_rcu(), those callbacks can stack up.  RCU will eventually
notice this callback storm, but use of rcu_request_urgent_qs_task()
allows the code invoking call_rcu() to give RCU a heads up.

This function is not for general use, not yet, anyway.
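
A hedged sketch of the intended call pattern (object and callback names are
illustrative; the real caller is the bpf memory allocator later in this series):

  /* a burst of callbacks queued from a long run of system calls ... */
  call_rcu(&obj->rcu, free_obj_cb);
  ...
  /* ... so give RCU a heads up that the current task would like the
   * grace period to end sooner rather than later */
  rcu_request_urgent_qs_task(current);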

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230706033447.54696-11-alexei.starovoitov@gmail.com
2023-07-12 23:45:23 +02:00
Alexei Starovoitov
04fabf00b4 bpf: Allow reuse from waiting_for_gp_ttrace list.
alloc_bulk() can reuse elements from free_by_rcu_ttrace.
Let it reuse from waiting_for_gp_ttrace as well to avoid unnecessary kmalloc().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230706033447.54696-10-alexei.starovoitov@gmail.com
2023-07-12 23:45:23 +02:00
Alexei Starovoitov
822fb26bdb bpf: Add a hint to allocated objects.
To address the OOM issue when one cpu is allocating and another cpu is freeing, add
a target bpf_mem_cache hint to allocated objects, and when the local cpu free_llist
overflows, free to that bpf_mem_cache. The hint addresses the OOM while
maintaining the same performance for the common case when alloc/free are done on the
same cpu.

Note that do_call_rcu_ttrace() now has to check the 'draining' flag in one more case,
since do_call_rcu_ttrace() is called not only for the current cpu.
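
An illustrative sketch of the technique (not the literal kernel diff): the 8 bytes
holding the object's llist_node while it sits on a free list are unused while the
object is live, so they can carry a pointer back to the cache that allocated it:

  /* on alloc: remember the owning cache in the (currently unused) header */
  *(struct bpf_mem_cache **)llnode = c;

  /* on free: read the hint back; when the local free_llist overflows,
   * surplus objects are drained towards tgt instead of the local cache */
  tgt = *(struct bpf_mem_cache **)llnode;

Draining towards tgt rather than the local cache is what bounds memory when
allocation and freeing happen on different cpus.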

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-9-alexei.starovoitov@gmail.com
2023-07-12 23:45:23 +02:00
Alexei Starovoitov
d114dde245 bpf: Change bpf_mem_cache draining process.
The next patch will introduce cross-cpu llist access, and the existing
irq_work_sync() + drain_mem_cache() + rcu_barrier_tasks_trace() mechanism will
not be enough, since irq_work_sync() + drain_mem_cache() on cpu A won't
guarantee that the llists on cpu A are empty. free_bulk() on cpu B might add
objects back to a llist of cpu A. Add a 'bool draining' flag.
The modified sequence looks like:
for_each_cpu:
  WRITE_ONCE(c->draining, true); // do_call_rcu_ttrace() won't be doing call_rcu() any more
  irq_work_sync(); // wait for irq_work callback (free_bulk) to finish
  drain_mem_cache(); // free all objects
rcu_barrier_tasks_trace(); // wait for RCU callbacks to execute

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-8-alexei.starovoitov@gmail.com
2023-07-12 23:45:22 +02:00
Alexei Starovoitov
7468048237 bpf: Further refactor alloc_bulk().
In certain scenarios alloc_bulk() might be taking free objects mainly from
the free_by_rcu_ttrace list. In such a case get_memcg() and set_active_memcg() are
redundant, but they show up in the perf profile. Split the loop and only set the
memcg when allocating from slab. No performance difference in this patch alone, but
it helps in combination with further patches.
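
A hedged sketch of the resulting shape of alloc_bulk() (simplified; the
add_obj_to_free_list()/__alloc() helper names are assumptions about the
surrounding memalloc code, not part of this patch's description):

  /* first, refill from free_by_rcu_ttrace without touching memcg */
  for (i = 0; i < cnt; i++) {
          obj = __llist_del_first(&c->free_by_rcu_ttrace);
          if (!obj)
                  break;
          add_obj_to_free_list(c, obj);
  }
  if (i >= cnt)
          return;

  /* only when falling back to slab set the active memcg */
  memcg = get_memcg(c);
  old_memcg = set_active_memcg(memcg);
  for (; i < cnt; i++) {
          obj = __alloc(c, node, gfp);
          if (!obj)
                  break;
          add_obj_to_free_list(c, obj);
  }
  set_active_memcg(old_memcg);
  mem_cgroup_put(memcg);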

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-7-alexei.starovoitov@gmail.com
2023-07-12 23:45:22 +02:00
Alexei Starovoitov
18e027b1c7 bpf: Factor out inc/dec of active flag into helpers.
Factor out local_inc/dec_return(&c->active) into helpers.
No functional changes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-6-alexei.starovoitov@gmail.com
2023-07-12 23:45:22 +02:00
Alexei Starovoitov
05ae68656a bpf: Refactor alloc_bulk().
Factor out the inner body of alloc_bulk() into a separate helper.
No functional changes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-5-alexei.starovoitov@gmail.com
2023-07-12 23:45:22 +02:00
Alexei Starovoitov
9de3e81521 bpf: Let free_all() return the number of freed elements.
Let free_all() helper return the number of freed elements.
It's not used in this patch, but helps in debug/development of bpf_mem_alloc.

For example this diff for __free_rcu():
-       free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
+       printk("cpu %d freed %d objs after tasks trace\n", raw_smp_processor_id(),
+       	free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size));

would show how busy RCU tasks trace is.
In an artificial benchmark where one cpu is allocating and a different cpu is
freeing, the RCU tasks trace won't be able to keep up and the list of objects
will keep growing from thousands to millions, eventually OOMing.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-4-alexei.starovoitov@gmail.com
2023-07-12 23:45:22 +02:00
Alexei Starovoitov
a80672d7e1 bpf: Simplify code of destroy_mem_alloc() with kmemdup().
Use kmemdup() to simplify the code.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-3-alexei.starovoitov@gmail.com
2023-07-12 23:45:22 +02:00
Alexei Starovoitov
12c8d0f4c8 bpf: Rename few bpf_mem_alloc fields.
Rename:
-       struct rcu_head rcu;
-       struct llist_head free_by_rcu;
-       struct llist_head waiting_for_gp;
-       atomic_t call_rcu_in_progress;
+       struct llist_head free_by_rcu_ttrace;
+       struct llist_head waiting_for_gp_ttrace;
+       struct rcu_head rcu_ttrace;
+       atomic_t call_rcu_ttrace_in_progress;
...
-	static void do_call_rcu(struct bpf_mem_cache *c)
+	static void do_call_rcu_ttrace(struct bpf_mem_cache *c)

to better indicate intended use.

The 'tasks trace' is shortened to 'ttrace' to reduce verbosity.
No functional changes.

Later patches will add free_by_rcu/waiting_for_gp fields to be used with normal RCU.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230706033447.54696-2-alexei.starovoitov@gmail.com
2023-07-12 23:45:22 +02:00
Andrii Nakryiko
c21de5fc5f selftests/bpf: extend existing map resize tests for per-cpu use case
Add a per-cpu array resizing use case and demonstrate how
bpf_get_smp_processor_id() can be used to directly access proper data
with no extra checks.
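
A hedged sketch of the pattern (names are illustrative and not the actual
selftest; the global array is assumed to be resized from userspace, e.g. via
bpf_map__set_value_size(), to one slot per possible CPU before load):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  /* resized to nr_cpus entries by the loader before bpf_object__load() */
  __u64 counters[1] SEC(".data.counters");

  SEC("raw_tp/sys_enter")
  int count_on_this_cpu(void *ctx)
  {
          __u32 cpu = bpf_get_smp_processor_id();

          /* no explicit bound check: the verifier now knows the result
           * is smaller than the number of possible CPUs */
          counters[cpu] += 1;
          return 0;
  }

  char _license[] SEC("license") = "GPL";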

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230711232400.1658562-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-12 07:57:18 -07:00
Andrii Nakryiko
f42bcd168d bpf: teach verifier actual bounds of bpf_get_smp_processor_id() result
The bpf_get_smp_processor_id() helper returns the current CPU on which the BPF
program runs. It can't return a value bigger than the maximum allowed
number of CPUs (minus one, due to zero indexing). Teach the BPF verifier to
recognize that. This makes it possible to use the bpf_get_smp_processor_id()
result to index into arrays without extra checks, as demonstrated in the
subsequent selftests/bpf patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230711232400.1658562-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-12 07:57:18 -07:00
Alexei Starovoitov
87e098e623 Merge branch 'bpf: Support ->fill_link_info for kprobe_multi and perf_event links'
Yafang Shao says:

====================
This patchset enhances the usability of the kprobe_multi program by introducing
support for ->fill_link_info. This allows users to easily determine the
probed functions associated with a kprobe_multi program. While
`bpftool perf show` already provides information about functions probed by
perf_event programs, supporting ->fill_link_info ensures consistent access
to this information across all bpf links.

In addition, this patch extends support to generic perf events, which are
currently not covered by `bpftool perf show`. While userspace is exposed to
only the perf type and config, other attributes such as sample_period and
sample_freq are disregarded.

To ensure accurate identification of probed functions, it is preferable to
expose the address directly rather than relying solely on the symbol name.
However, this implementation respects the kptr_restrict setting and avoids
exposing the address if it is not permitted.

v6->v7:
- From Daniel
  - No need to explicitly cast in many places
  - Use ptr_to_u64() instead of the cast
  - return -ENOMEM when calloc fails
  - Simplify the code in bpf_get_kprobe_info() further
  - Squash #9 with #8
  - And other coding style improvement
- From Andrii
  - Comment improvement
  - Use ENOSPC instead of E2BIG
  - Use strlen only when buf is not NULL
- Clear probe_addr in bpf_get_uprobe_info()

v5->v6:
- From Andrii
  - if ucount is too small, copy ucount items and return -E2BIG
  - zero out kmulti_link->cnt elements if it is not permitted by kptr
  - avoid leaking information when ucount is greater than kmulti_link->cnt
  - drop the flags, and add BPF_PERF_EVENT_[UK]RETPROBE
- From Quentin
  - use jsonw_null instead when we have no module name
  - add explanation on perf_type_name in the commit log
  - avoid the unnecessary out label

v4->v5:
- Print "func [module]" in the kprobe_multi header (Andrii)
- Remove MAX_BPF_PERF_EVENT_TYPE (Alexei)
- Add padding field for future reuse (Yonghong)

v3->v4:
- From Quentin
  - Rename MODULE_NAME_LEN to MODULE_MAX_NAME
  - Convert retprobe to boolean for json output
  - Trim the square brackets around module names for json output
  - Move perf names into link.c
  - Use a generic helper to get perf names
  - Show address before func name, for consistency
  - Use switch-case instead of if-else
  - Increase the buff len to PATH_MAX
  - Move macros to the top of the file
- From Andrii
  - kprobe_multi flags should always be returned
  - Keep it single line if it fits in under 100 characters
  - Change the output format when showing kprobe_multi
  - Improve the format of perf_event names
  - Rename struct perf_link to struct perf_event, and change the names of
    the enum accordingly
- From Yonghong
  - Avoid disallowing extensions for all structs in the big union
- From Jiri
  - Add flags to bpf_kprobe_multi_link
  - Report kprobe_multi selftests errors
  - Rename bpf_perf_link_fill_name and make it a separate patch
  - Avoid breaking compilation when CONFIG_KPROBE_EVENTS or
    CONFIG_UPROBE_EVENTS options are not defined

v2->v3:
- Expose flags instead of retprobe (Andrii)
- Simplify the check on kmulti_link->cnt (Andrii)
- Use kallsyms_show_value() instead (Andrii)
- Show also the module name for kprobe_multi (Andrii)
- Add new enum bpf_perf_link_type (Andrii)
- Move perf event names into bpftool (Andrii, Quentin, Jiri)
- Keep perf event names in sync with perf tools (Jiri)

v1->v2:
- Fix sparse warning (Stanislav, lkp@intel.com)
- Fix BPF CI build error
- Reuse kernel_syms_load() (Alexei)
- Print 'name' instead of 'func' (Alexei)
- Show whether the probe is retprobe or not (Andrii)
- Add comment for the meaning of perf_event name (Andrii)
- Add support for generic perf event
- Adhere to the kptr_restrict setting

RFC->v1:
- Use a single copy_to_user() instead (Jiri)
- Show also the symbol name in bpftool (Quentin, Alexei)
- Use calloc() instead of malloc() in bpftool (Quentin)
- Avoid having conditional entries in the JSON output (Quentin)
- Drop ->show_fdinfo (Alexei)
- Use __u64 instead of __aligned_u64 for the field addr (Alexei)
- Avoid the contradiction in perf_event name length (Alexei)
- Address a build warning reported by kernel test robot <lkp@intel.com>
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:57 -07:00
Yafang Shao
88d6160737 bpftool: Show perf link info
Enhance bpftool to display comprehensive information about exposed
perf_event links, covering uprobe, kprobe, tracepoint, and generic perf
event. The resulting output will include the following details:

$ tools/bpf/bpftool/bpftool link show
3: perf_event  prog 14
        event software:cpu-clock
        bpf_cookie 0
        pids perf_event(19483)
4: perf_event  prog 14
        event hw-cache:LLC-load-misses
        bpf_cookie 0
        pids perf_event(19483)
5: perf_event  prog 14
        event hardware:cpu-cycles
        bpf_cookie 0
        pids perf_event(19483)
6: perf_event  prog 19
        tracepoint sched_switch
        bpf_cookie 0
        pids tracepoint(20947)
7: perf_event  prog 26
        uprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
        bpf_cookie 0
        pids uprobe(21973)
8: perf_event  prog 27
        uretprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
        bpf_cookie 0
        pids uprobe(21973)
10: perf_event  prog 43
        kprobe ffffffffb70a9660 kernel_clone
        bpf_cookie 0
        pids kprobe(35275)
11: perf_event  prog 41
        kretprobe ffffffffb70a9660 kernel_clone
        bpf_cookie 0
        pids kprobe(35275)

$ tools/bpf/bpftool/bpftool link show -j
[{"id":3,"type":"perf_event","prog_id":14,"event_type":"software","event_config":"cpu-clock","bpf_cookie":0,"pids":[{"pid":19483,"comm":"perf_event"}]},{"id":4,"type":"perf_event","prog_id":14,"event_type":"hw-cache","event_config":"LLC-load-misses","bpf_cookie":0,"pids":[{"pid":19483,"comm":"perf_event"}]},{"id":5,"type":"perf_event","prog_id":14,"event_type":"hardware","event_config":"cpu-cycles","bpf_cookie":0,"pids":[{"pid":19483,"comm":"perf_event"}]},{"id":6,"type":"perf_event","prog_id":19,"tracepoint":"sched_switch","bpf_cookie":0,"pids":[{"pid":20947,"comm":"tracepoint"}]},{"id":7,"type":"perf_event","prog_id":26,"retprobe":false,"file":"/home/dev/waken/bpf/uprobe/a.out","offset":4920,"bpf_cookie":0,"pids":[{"pid":21973,"comm":"uprobe"}]},{"id":8,"type":"perf_event","prog_id":27,"retprobe":true,"file":"/home/dev/waken/bpf/uprobe/a.out","offset":4920,"bpf_cookie":0,"pids":[{"pid":21973,"comm":"uprobe"}]},{"id":10,"type":"perf_event","prog_id":43,"retprobe":false,"addr":18446744072485508704,"func":"kernel_clone","offset":0,"bpf_cookie":0,"pids":[{"pid":35275,"comm":"kprobe"}]},{"id":11,"type":"perf_event","prog_id":41,"retprobe":true,"addr":18446744072485508704,"func":"kernel_clone","offset":0,"bpf_cookie":0,"pids":[{"pid":35275,"comm":"kprobe"}]}]

For generic perf events, the displayed information in bpftool is limited to
the type and configuration, while other attributes such as sample_period,
sample_freq, etc., are not included.

The kernel function address won't be exposed if it is not permitted by
kptr_restrict. The result is as follows when kptr_restrict is 2.

$ tools/bpf/bpftool/bpftool link show
3: perf_event  prog 14
        event software:cpu-clock
4: perf_event  prog 14
        event hw-cache:LLC-load-misses
5: perf_event  prog 14
        event hardware:cpu-cycles
6: perf_event  prog 19
        tracepoint sched_switch
7: perf_event  prog 26
        uprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
8: perf_event  prog 27
        uretprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
10: perf_event  prog 43
        kprobe kernel_clone
11: perf_event  prog 41
        kretprobe kernel_clone

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-11-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:51 -07:00
Yafang Shao
62b57e3ddd bpftool: Add perf event names
Add new functions and macros to get perf event names. These names, except
perf_type_name, are all copied from
tools/perf/util/{parse-events,evsel}.c, so that in the future we have a good
chance of using the same code.

Suggested-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-10-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:51 -07:00
Yafang Shao
1b715e1b0e bpf: Support ->fill_link_info for perf_event
By introducing support for ->fill_link_info to the perf_event link, users
gain the ability to inspect it using `bpftool link show`. While the current
approach involves accessing this information via `bpftool perf show`,
consolidating link information for all link types in one place offers
greater convenience. Additionally, this patch extends support to the
generic perf event, which is not currently accommodated by
`bpftool perf show`. While only the perf type and config are exposed to
userspace, other attributes such as sample_period and sample_freq are
ignored. It's important to note that if exposing the address is not permitted
by the kptr_restrict setting, the probed address will not be exposed, maintaining
security.

A new enum bpf_perf_event_type is introduced to help the user understand
which struct is relevant.
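
A hedged userspace sketch of consuming the new info (enum and field names are
assumed from this series' UAPI additions and may not match exactly):

  struct bpf_link_info info = {};
  __u32 len = sizeof(info);

  if (bpf_link_get_info_by_fd(link_fd, &info, &len))
          return -1;
  if (info.type == BPF_LINK_TYPE_PERF_EVENT &&
      info.perf_event.type == BPF_PERF_EVENT_EVENT)
          printf("generic perf event: type %u config %llu\n",
                 info.perf_event.event.type,
                 (unsigned long long)info.perf_event.event.config);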

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-9-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:51 -07:00
Yafang Shao
57d4853765 bpf: Add a common helper bpf_copy_to_user()
Add a common helper bpf_copy_to_user(), which will be used at multiple
places.
No functional change.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-8-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:51 -07:00
Yafang Shao
cd3910d005 bpf: Expose symbol's respective address
Since different symbols can share the same name, it is insufficient to only
expose the symbol name. It is essential to also expose the symbol address
so that users can accurately identify which one is being probed.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-7-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:51 -07:00
Yafang Shao
5125e757e6 bpf: Clear the probe_addr for uprobe
To avoid returning uninitialized or random values when querying the file
descriptor (fd) and accessing probe_addr, it is necessary to clear the
variable prior to its use.

Fixes: 41bdc4b40e ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-6-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:51 -07:00
Yafang Shao
f1a414537e bpf: Protect probed address based on kptr_restrict setting
The probed address can be accessed by userspace through querying the task
file descriptor (fd). However, it is crucial to adhere to the kptr_restrict
setting and refrain from exposing the address if it is not permitted.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-5-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:51 -07:00
Yafang Shao
edd7f49bb8 bpftool: Show kprobe_multi link info
Show the already exposed kprobe_multi link info in bpftool. The result is as
follows:

$ tools/bpf/bpftool/bpftool link show
91: kprobe_multi  prog 244
        kprobe.multi  func_cnt 7
        addr             func [module]
        ffffffff98c44f20 schedule_timeout_interruptible
        ffffffff98c44f60 schedule_timeout_killable
        ffffffff98c44fa0 schedule_timeout_uninterruptible
        ffffffff98c44fe0 schedule_timeout_idle
        ffffffffc075b8d0 xfs_trans_get_efd [xfs]
        ffffffffc0768a10 xfs_trans_get_buf_map [xfs]
        ffffffffc076c320 xfs_trans_get_dqtrx [xfs]
        pids kprobe_multi(188367)
92: kprobe_multi  prog 244
        kretprobe.multi  func_cnt 7
        addr             func [module]
        ffffffff98c44f20 schedule_timeout_interruptible
        ffffffff98c44f60 schedule_timeout_killable
        ffffffff98c44fa0 schedule_timeout_uninterruptible
        ffffffff98c44fe0 schedule_timeout_idle
        ffffffffc075b8d0 xfs_trans_get_efd [xfs]
        ffffffffc0768a10 xfs_trans_get_buf_map [xfs]
        ffffffffc076c320 xfs_trans_get_dqtrx [xfs]
        pids kprobe_multi(188367)

$ tools/bpf/bpftool/bpftool link show -j
[{"id":91,"type":"kprobe_multi","prog_id":244,"retprobe":false,"func_cnt":7,"funcs":[{"addr":18446744071977586464,"func":"schedule_timeout_interruptible","module":null},{"addr":18446744071977586528,"func":"schedule_timeout_killable","module":null},{"addr":18446744071977586592,"func":"schedule_timeout_uninterruptible","module":null},{"addr":18446744071977586656,"func":"schedule_timeout_idle","module":null},{"addr":18446744072643524816,"func":"xfs_trans_get_efd","module":"xfs"},{"addr":18446744072643578384,"func":"xfs_trans_get_buf_map","module":"xfs"},{"addr":18446744072643592992,"func":"xfs_trans_get_dqtrx","module":"xfs"}],"pids":[{"pid":188367,"comm":"kprobe_multi"}]},{"id":92,"type":"kprobe_multi","prog_id":244,"retprobe":true,"func_cnt":7,"funcs":[{"addr":18446744071977586464,"func":"schedule_timeout_interruptible","module":null},{"addr":18446744071977586528,"func":"schedule_timeout_killable","module":null},{"addr":18446744071977586592,"func":"schedule_timeout_uninterruptible","module":null},{"addr":18446744071977586656,"func":"schedule_timeout_idle","module":null},{"addr":18446744072643524816,"func":"xfs_trans_get_efd","module":"xfs"},{"addr":18446744072643578384,"func":"xfs_trans_get_buf_map","module":"xfs"},{"addr":18446744072643592992,"func":"xfs_trans_get_dqtrx","module":"xfs"}],"pids":[{"pid":188367,"comm":"kprobe_multi"}]}]

When kptr_restrict is 2, the result is,

$ tools/bpf/bpftool/bpftool link show
91: kprobe_multi  prog 244
        kprobe.multi  func_cnt 7
92: kprobe_multi  prog 244
        kretprobe.multi  func_cnt 7

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:50 -07:00
Yafang Shao
dc6519445b bpftool: Dump the kernel symbol's module name
If the kernel symbol is in a module, we will dump the module name as
well. The square brackets around the module name are trimmed.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:50 -07:00
Yafang Shao
7ac8d0d261 bpf: Support ->fill_link_info for kprobe_multi
With the addition of support for fill_link_info to the kprobe_multi link,
users gain the ability to inspect it conveniently using
`bpftool link show`. This enhancement provides valuable information to the
user, including the count of probed functions and their respective
addresses. It's important to note that if exposing addresses is not permitted
by the kptr_restrict setting, the probed addresses will not be exposed, ensuring
security.
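
A hedged userspace sketch of querying the new info (the kprobe_multi field
names are assumed from this patch's UAPI additions):

  __u64 addrs[16];
  struct bpf_link_info info = {};
  __u32 len = sizeof(info);

  /* hand the kernel a buffer and its capacity; it reports the real count */
  info.kprobe_multi.addrs = (__u64)(unsigned long)addrs;
  info.kprobe_multi.count = 16;
  if (!bpf_link_get_info_by_fd(link_fd, &info, &len))
          printf("%u functions probed\n", info.kprobe_multi.count);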

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11 20:07:50 -07:00
Rong Tao
07018b5706 samples/bpf: syscall_tp: Aarch64 no open syscall
__NR_open has never existed on AArch64.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_C6AD4AD72BEFE813228FC188905F96C6A506@qq.com
2023-07-11 10:02:42 -07:00
John Sanpe
a3e7e6b179 libbpf: Remove HASHMAP_INIT static initialization helper
Remove the wrong HASHMAP_INIT. It's not used anywhere in libbpf.

Signed-off-by: John Sanpe <sanpeqf@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230711070712.2064144-1-sanpeqf@gmail.com
2023-07-11 09:40:05 -07:00
Andrii Nakryiko
8a0260dbf6 libbpf: Fix realloc API handling in zero-sized edge cases
realloc() and reallocarray() can either return NULL or a special
non-NULL pointer if their size argument is zero. This requires a bit
more care to handle the NULL-as-valid-result situation differently from
the NULL-as-error case. This has caused real issues before ([0]), and just
recently bit again in production when performing bpf_program__attach_usdt().

This patch fixes 4 places that do or potentially could suffer from this
mishandling of NULL, including the reported USDT-related one.

There are many other places where realloc()/reallocarray() is used and
NULL is always treated as an error value, but all those have guarantees
that their size is always non-zero, so those spots don't need any extra
handling.

  [0] d08ab82f59 ("libbpf: Fix double-free when linker processes empty sections")
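
The fixed pattern, as a hedged standalone sketch (variable names are
illustrative, not libbpf's exact code):

  void *tmp;

  if (new_cnt == 0) {
          /* zero-sized result: NULL (or a special non-NULL pointer) is a
           * valid return here, not an out-of-memory error */
          free(arr);
          arr = NULL;
          cap = 0;
  } else {
          tmp = realloc(arr, new_cnt * elem_sz);
          if (!tmp)
                  return -ENOMEM;   /* only now does NULL mean failure */
          arr = tmp;
          cap = new_cnt;
  }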

Fixes: 999783c8bb ("libbpf: Wire up spec management and other arch-independent USDT logic")
Fixes: b63b3c490e ("libbpf: Add bpf_program__set_insns function")
Fixes: 697f104db8 ("libbpf: Support custom SEC() handlers")
Fixes: b126882672 ("libbpf: Change the order of data and text relocations.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230711024150.1566433-1-andrii@kernel.org
2023-07-11 09:32:00 +02:00
David Vernet
4d496be9ca bpf,docs: Create new standardization subdirectory
The BPF standardization effort is actively underway with the IETF. As
described in the BPF Working Group (WG) charter in [0], there are a
number of proposed documents, some informational and some proposed
standards, that will be drafted as part of the standardization effort.

[0]: https://datatracker.ietf.org/wg/bpf/about/

Though the specific documents that will formally be standardized will
exist as Internet Drafts (I-D) and WG documents in the BPF WG
datatracker page, the source of truth from where those documents will be
generated will reside in the kernel documentation tree (originating in
the bpf-next tree).

Because these documents will be used to generate the I-D and WG
documents which will be standardized with the IETF, they are a bit
special as far as kernel-tree documentation goes:

- They will be dual licensed with LGPL-2.1 OR BSD-2-Clause
- IETF I-D and WG documents (the documents which will actually be
  standardized) will be auto-generated from these documents.

In order to keep things clearly organized in the BPF documentation tree,
and to make it abundantly clear where standards-related documentation
needs to go, we should move standards-relevant documents into a separate
standardization/ subdirectory.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230710183027.15132-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-10 18:12:50 -07:00
Andrii Nakryiko
19f4b53234 Merge branch 'bpftool: Fix skeletons compilation for older kernels'
Quentin Monnet says:

====================
At runtime, bpftool may run its own BPF programs to get the pids of
processes referencing BPF programs, or to profile programs. The skeletons
for these programs rely on a vmlinux.h header and may fail to compile when
building bpftool on hosts running older kernels, where some structs or
enums are not defined. In this set, we address this issue by using local
definitions for struct perf_event, struct bpf_perf_link,
BPF_LINK_TYPE_PERF_EVENT (pids.bpf.c) and struct bpf_perf_event_value
(profiler.bpf.c).

This set contains patches 1 to 3 from Alexander Lobakin's series, "bpf:
random unpopular userspace fixes (32 bit et al)" (v2) [0], from April 2022.
An additional patch defines a local version of BPF_LINK_TYPE_PERF_EVENT in
bpftool's pids.bpf.c.

[0] https://lore.kernel.org/bpf/20220421003152.339542-1-alobakin@pm.me/

v2: Fixed description (CO-RE for container_of()) in patch 2.

Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Michal Suchánek <msuchanek@suse.de>

Alexander Lobakin (3):
  bpftool: use a local copy of perf_event to fix accessing ::bpf_cookie
  bpftool: define a local bpf_perf_link to fix accessing its fields
  bpftool: use a local bpf_perf_event_value to fix accessing its fields
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-07-10 17:01:41 -07:00
Alexander Lobakin
658ac06801 bpftool: Use a local bpf_perf_event_value to fix accessing its fields
Fix the following error when building bpftool:

  CLANG   profiler.bpf.o
  CLANG   pid_iter.bpf.o
skeleton/profiler.bpf.c:18:21: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_perf_event_value'
        __uint(value_size, sizeof(struct bpf_perf_event_value));
                           ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helpers.h:13:39: note: expanded from macro '__uint'
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helper_defs.h:7:8: note: forward declaration of 'struct bpf_perf_event_value'
struct bpf_perf_event_value;
       ^

struct bpf_perf_event_value is used in the kernel only when
CONFIG_BPF_EVENTS is enabled, so otherwise it has no BTF entry.
Define struct bpf_perf_event_value___local with the
`preserve_access_index` attribute inside the pid_iter BPF prog to
allow compiling on any config. It is a full mirror of the UAPI
structure, so it is compatible both with and without CO-RE.
bpf_perf_event_read_value() requires a pointer of the original type,
so a cast is needed.
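
The shape of the fix, sketched (the local mirror of the UAPI struct plus the
cast; the 'events' map name is an assumption, not the actual bpftool code):

  struct bpf_perf_event_value___local {
          __u64 counter;
          __u64 enabled;
          __u64 running;
  } __attribute__((preserve_access_index));

  ...
  struct bpf_perf_event_value___local buf = {};
  long err;

  err = bpf_perf_event_read_value(&events, BPF_F_CURRENT_CPU,
                                  (void *)&buf, sizeof(buf));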

Fixes: 47c09d6a9f ("bpftool: Introduce "prog profile" command")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-5-quentin@isovalent.com
2023-07-10 15:29:21 -07:00
Quentin Monnet
44ba7b30e8 bpftool: Use a local copy of BPF_LINK_TYPE_PERF_EVENT in pid_iter.bpf.c
In order to allow the BPF program in bpftool's pid_iter.bpf.c to compile
correctly on hosts where vmlinux.h does not define
BPF_LINK_TYPE_PERF_EVENT (running kernel versions lower than 5.15, for
example), define and use a local copy of the enum value. This requires
LLVM 12 or newer to build the BPF program.

Fixes: cbdaf71f7e ("bpftool: Add bpf_cookie to link output")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-4-quentin@isovalent.com
2023-07-10 15:29:20 -07:00
Alexander Lobakin
67a43462ee bpftool: Define a local bpf_perf_link to fix accessing its fields
When building bpftool with !CONFIG_PERF_EVENTS:

skeleton/pid_iter.bpf.c:47:14: error: incomplete definition of type 'struct bpf_perf_link'
        perf_link = container_of(link, struct bpf_perf_link, link);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helpers.h:74:22: note: expanded from macro 'container_of'
                ((type *)(__mptr - offsetof(type, member)));    \
                                   ^~~~~~~~~~~~~~~~~~~~~~
tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_helpers.h:68:60: note: expanded from macro 'offsetof'
 #define offsetof(TYPE, MEMBER)  ((unsigned long)&((TYPE *)0)->MEMBER)
                                                  ~~~~~~~~~~~^
skeleton/pid_iter.bpf.c:44:9: note: forward declaration of 'struct bpf_perf_link'
        struct bpf_perf_link *perf_link;
               ^

struct bpf_perf_link is defined and used in the kernel only under the ifdef.
Define struct bpf_perf_link___local with the `preserve_access_index`
attribute inside the pid_iter BPF prog to allow compiling on any
config. CO-RE will substitute it with the real struct bpf_perf_link
accesses later on.
container_of() uses offsetof(), which does the necessary CO-RE
relocation if the field is specified with `preserve_access_index` - as
is the case for struct bpf_perf_link___local.
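
A sketch of the local definition and its use (mirroring the kernel's
struct bpf_perf_link; illustrative, see pid_iter.bpf.c for the real code):

  struct bpf_perf_link___local {
          struct bpf_link link;
          struct file *perf_file;
  } __attribute__((preserve_access_index));

  ...
  struct bpf_perf_link___local *perf_link;

  perf_link = container_of(link, struct bpf_perf_link___local, link);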

Fixes: cbdaf71f7e ("bpftool: Add bpf_cookie to link output")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-3-quentin@isovalent.com
2023-07-10 15:29:20 -07:00
Alexander Lobakin
4cbeeb0dc0 bpftool: use a local copy of perf_event to fix accessing ::bpf_cookie
When CONFIG_PERF_EVENTS is not set, struct perf_event remains empty.
However, the structure is being used by bpftool indirectly via BTF.
This leads to:

skeleton/pid_iter.bpf.c:49:30: error: no member named 'bpf_cookie' in 'struct perf_event'
        return BPF_CORE_READ(event, bpf_cookie);
               ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~

...

skeleton/pid_iter.bpf.c:49:9: error: returning 'void' from a function with incompatible result type '__u64' (aka 'unsigned long long')
        return BPF_CORE_READ(event, bpf_cookie);
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tools and samples can't use any CONFIG_ definitions, so the fields
used there should always be present.
Define struct perf_event___local with the `preserve_access_index`
attribute inside the pid_iter BPF prog to allow compiling on any
config. CO-RE will substitute it with the real struct perf_event
accesses later on.
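
A sketch of the local definition (only the field the program actually reads
needs to be declared; CO-RE relocates the access against the running kernel):

  struct perf_event___local {
          __u64 bpf_cookie;
  } __attribute__((preserve_access_index));

  ...
  struct perf_event___local *event;

  return BPF_CORE_READ(event, bpf_cookie);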

Fixes: cbdaf71f7e ("bpftool: Add bpf_cookie to link output")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230707095425.168126-2-quentin@isovalent.com
2023-07-10 15:29:20 -07:00
Andrii Nakryiko
c628747cc8 libbpf: only reset sec_def handler when necessary
Don't reset recorded sec_def handler unconditionally on
bpf_program__set_type(). There are two situations where this is wrong.

First, if the program type didn't actually change. In that case original
SEC handler should work just fine.

Second, catch-all custom SEC handler is supposed to work with any BPF
program type and SEC() annotation, so it also doesn't make sense to
reset that.

This patch fixes both issues. This was reported recently ([0]) in the context
of breaking the perf tool, which uses a custom catch-all handler for fancy BPF
prologue generation logic. This patch should fix the issue.

  [0] https://lore.kernel.org/linux-perf-users/ab865e6d-06c5-078e-e404-7f90686db50d@amd.com/

Fixes: d6e6286a12 ("libbpf: disassociate section handler on explicit bpf_program__set_type() call")
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230707231156.1711948-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-08 18:29:53 -07:00
Lu Hongfei
856fe03d92 selftests/bpf: Correct two typos
When wrapping code, using ';' is better than using ',', which is more in line
with the coding habits of most engineers.

Signed-off-by: Lu Hongfei <luhongfei@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230707081253.34638-1-luhongfei@vivo.com
2023-07-07 19:36:04 +02:00
Jackie Liu
56baeeba0a libbpf: Use available_filter_functions_addrs with multi-kprobes
Now that the kernel provides a new available_filter_functions_addrs file,
which can help us avoid the need to cross-validate
available_filter_functions and kallsyms, we can improve the efficiency of
multi-attach kprobes. For example, on my device, the start time of the
sample program [1]:

$ sudo ./funccount "tcp_*"

before   after
1.2s     1.0s

  [1]: https://github.com/JackieLiu1/ketones/tree/master/src/funccount

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230705091209.3803873-2-liu.yun@linux.dev
2023-07-06 16:05:08 -07:00
Jackie Liu
8a3fe76f87 libbpf: Cross-join available_filter_functions and kallsyms for multi-kprobes
When using regular expression matching with "kprobe multi", it scans all
the functions under "/proc/kallsyms" that can be matched. However, not all
of them can be traced by kprobe.multi, and if any one of the functions fails
to be traced, the whole attachment fails. The best approach is to filter out
the functions that cannot be traced to ensure proper tracking of the
remaining functions.

Closes: https://lore.kernel.org/oe-kbuild-all/202307030355.TdXOHklM-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230705091209.3803873-1-liu.yun@linux.dev
2023-07-06 16:04:50 -07:00
Björn Töpel
e76a014334 selftests/bpf: Bump and validate MAX_SYMS
BPF tests that load /proc/kallsyms, e.g. bpf_cookie, will perform a
buffer overrun if the number of syms on the system is larger than
MAX_SYMS.

Bump the MAX_SYMS to 400000, and add a runtime check that bails out if
the maximum is reached.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230706142228.1128452-1-bjorn@kernel.org
2023-07-06 13:39:40 -07:00
Alexei Starovoitov
b625030c90 Merge branch 'bpf: add percpu stats for bpf_map'
Anton Protopopov says:

====================
This series adds a mechanism for maps to populate per-cpu counters on
insertions/deletions. The sum of these counters can be accessed by a new kfunc
from map iterator and tracing programs.

The following patches are present in the series:

  * Patch 1 adds a generic per-cpu counter to struct bpf_map
  * Patch 2 adds a new kfunc to access the sum of per-cpu counters
  * Patch 3 utilizes this mechanism for hash-based maps
  * Patch 4 extends the preloaded map iterator to dump the sum
  * Patch 5 adds a self-test for the change

The reason for adding this functionality in our case (Cilium) is to get signals
about how full some heavily used maps are and what the actual dynamic profile of
map capacity is. In the case of LRU maps it is impossible to get this
information any other way. The original presentation can be found here [1].

  [1] https://lpc.events/event/16/contributions/1368/

v4 -> v5:
* don't pass useless empty opts when creating a link, pass NULL (Hou)
* add a debug message (Hou)
* make code more readable (Alexei)
* remove the selftest which only checked that elem_count != NULL

v3 -> v4:
* fix selftests:
  * added test code for batch map operations
  * added a test for BPF_MAP_TYPE_HASH_OF_MAPS (Hou)
  * added tests for BPF_MAP_TYPE_LRU* with BPF_F_NO_COMMON_LRU (Hou)
  * map_info was called multiple times unnecessarily (Hou)
  * small fixes + some memory leaks (Hou)
* fixed wrong error path for freeing a non-prealloc map (Hou)
* fixed counters for batch delete operations (Hou)

v2 -> v3:
- split commits to better represent update logic (Alexei)
- remove filter from kfunc to allow all tracing programs (Alexei)
- extend selftests (Alexei)

v1 -> v2:
- make the counters generic part of struct bpf_map (Alexei)
- don't use map_info and /proc/self/fdinfo in favor of a kfunc (Alexei)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-06 12:42:31 -07:00
Anton Protopopov
6c1b8cb6a7 selftests/bpf: test map percpu stats
Add a new map test, map_percpu_stats.c, which checks the correctness of a
map's percpu element counters.  For supported maps the test upserts a number
of elements, checks the correctness of the counters, then deletes all the
elements and checks again that the counters' sum drops down to zero.

The following map types are tested:

    * BPF_MAP_TYPE_HASH, BPF_F_NO_PREALLOC
    * BPF_MAP_TYPE_PERCPU_HASH, BPF_F_NO_PREALLOC
    * BPF_MAP_TYPE_HASH,
    * BPF_MAP_TYPE_PERCPU_HASH,
    * BPF_MAP_TYPE_LRU_HASH
    * BPF_MAP_TYPE_LRU_PERCPU_HASH
    * BPF_MAP_TYPE_LRU_HASH, BPF_F_NO_COMMON_LRU
    * BPF_MAP_TYPE_LRU_PERCPU_HASH, BPF_F_NO_COMMON_LRU
    * BPF_MAP_TYPE_HASH_OF_MAPS

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230706133932.45883-6-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-06 12:42:25 -07:00
Anton Protopopov
515ee52b22 bpf: make preloaded map iterators to display map elements count
Add another column to the /sys/fs/bpf/maps.debug iterator to display
cur_entries, the current number of entries in the map as returned
by the bpf_map_sum_elem_count kfunc. Also fix formatting.

Example:

    # cat /sys/fs/bpf/maps.debug
      id name             max_entries  cur_entries
       2 iterator.rodata            1            0
     125 cilium_auth_map       524288          666
     126 cilium_runtime_          256            0
     127 cilium_signals            32            0
     128 cilium_node_map        16384         1344
     129 cilium_events             32            0
     ...

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230706133932.45883-5-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-06 12:42:25 -07:00
Anton Protopopov
9bc421b6be bpf: populate the per-cpu insertions/deletions counters for hashmaps
Initialize and utilize the per-cpu insertions/deletions counters for hash-based
maps. Non-trivial changes only apply to the preallocated maps, for which the
{inc,dec}_elem_count functions are not called, as there's no need to count
elements to sustain proper map operations.

To increase/decrease percpu counters for preallocated maps we add raw calls to
the bpf_map_{inc,dec}_elem_count functions so that the impact is minimal. For
dynamically allocated maps we add corresponding calls to the existing
{inc,dec}_elem_count functions.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230706133932.45883-4-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-06 12:42:25 -07:00
Anton Protopopov
803370d3d3 bpf: add a new kfunc to return current bpf_map elements count
A bpf_map_sum_elem_count kfunc was added to simplify getting the sum of the map
per-cpu element counters. If a map doesn't implement the counter, then the
function will always return 0.
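
A hedged sketch of calling the kfunc from a map iterator program (the extern
declaration is an assumption about the kfunc's exact signature; names are
otherwise illustrative):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  extern s64 bpf_map_sum_elem_count(const struct bpf_map *map) __ksym;

  s64 last_count = 0;

  SEC("iter/bpf_map")
  int read_elem_count(struct bpf_iter__bpf_map *ctx)
  {
          struct bpf_map *map = ctx->map;

          if (map)
                  last_count = bpf_map_sum_elem_count(map);
          return 0;
  }

  char _license[] SEC("license") = "GPL";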

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230706133932.45883-3-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-06 12:42:25 -07:00
Anton Protopopov
2595473046 bpf: add percpu stats for bpf_map elements insertions/deletions
Add generic percpu stats for bpf_map element insertions/deletions in order
to keep track of both the current (approximate) number of elements in a map
and per-cpu statistics on update/delete operations.

To expose these stats a particular map implementation should initialize the
counter and adjust it as needed using the 'bpf_map_*_elem_count' helpers
provided by this commit.
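
A hedged sketch of how a map implementation wires the helpers in (the
init/free helper names are assumptions; call sites are illustrative):

  /* at map creation: allocate the percpu counter */
  err = bpf_map_init_elem_count(&htab->map);

  /* on a successful insert or delete of an element */
  bpf_map_inc_elem_count(&htab->map);
  bpf_map_dec_elem_count(&htab->map);

  /* at map destruction */
  bpf_map_free_elem_count(&htab->map);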

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230706133932.45883-2-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-06 12:42:25 -07:00
Hou Tao
fd283ab196 selftests/bpf: Add benchmark for bpf memory allocator
The benchmark could be used to compare the performance of hash map
operations and the memory usage between different flavors of bpf memory
allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma). It could also
be used to check the performance improvement or the memory saving
provided by an optimization.

The benchmark creates a non-preallocated hash map which uses bpf memory
allocator and shows the operation performance and the memory usage of
the hash map under different use cases:
(1) overwrite
Each CPU overwrites a nonoverlapping part of the hash map. When each CPU
completes overwriting of 64 elements in the hash map, it increases the
op_count.
(2) batch_add_batch_del
Each CPU adds and then deletes a nonoverlapping part of the hash map in batch.
When each CPU adds and deletes 64 elements in the hash map, it increases
the op_count twice.
(3) add_del_on_diff_cpu
Each two-CPU pair adds and deletes a nonoverlapping part of the map
cooperatively. When each CPU adds or deletes 64 elements in the hash map,
it increases the op_count.

The following is the benchmark results when comparing between different
flavors of bpf memory allocator. These tests are conducted on a KVM guest
with 8 CPUs and 16 GB memory. The command line below is used to do all
the following benchmarks:

  ./bench htab-mem --use-case $name ${OPTS} -w3 -d10 -a -p8

These results show that the preallocated hash map has both better performance
and a smaller memory footprint.

(1) non-preallocated + no bpf memory allocator (v6.0.19)
use kmalloc() + call_rcu

overwrite            per-prod-op: 11.24 ± 0.07k/s, avg mem: 82.64 ± 26.32MiB, peak mem: 119.18MiB
batch_add_batch_del  per-prod-op: 18.45 ± 0.10k/s, avg mem: 50.47 ± 14.51MiB, peak mem: 94.96MiB
add_del_on_diff_cpu  per-prod-op: 14.50 ± 0.03k/s, avg mem: 4.64 ± 0.73MiB, peak mem: 7.20MiB

(2) preallocated
OPTS=--preallocated

overwrite            per-prod-op: 191.42 ± 0.09k/s, avg mem: 1.24 ± 0.00MiB, peak mem: 1.49MiB
batch_add_batch_del  per-prod-op: 221.83 ± 0.17k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB
add_del_on_diff_cpu  per-prod-op: 39.66 ± 0.31k/s, avg mem: 1.47 ± 0.13MiB, peak mem: 1.75MiB

(3) normal bpf memory allocator

overwrite            per-prod-op: 126.59 ± 0.02k/s, avg mem: 2.26 ± 0.00MiB, peak mem: 2.74MiB
batch_add_batch_del  per-prod-op: 83.37 ± 0.20k/s, avg mem: 2.14 ± 0.17MiB, peak mem: 2.74MiB
add_del_on_diff_cpu  per-prod-op: 21.25 ± 0.24k/s, avg mem: 17.50 ± 3.32MiB, peak mem: 28.87MiB

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230704025039.938914-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-05 18:36:19 -07:00
Björn Töpel
21be9e477f selftests/bpf: Honor $(O) when figuring out paths
When building the kselftests out-of-tree, e.g. ...

  | make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- \
  |   O=/tmp/kselftest headers
  | make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- \
  |  O=/tmp/kselftest HOSTCC=gcc FORMAT= \
  |  SKIP_TARGETS="arm64 ia64 powerpc sparc64 x86 sgx" \
  |  -C tools/testing/selftests gen_tar

... the kselftest build would not pick up the correct GENDIR path, and
therefore not include autoconf.h.

Correct that by taking $(O) into consideration when figuring out the
GENDIR path.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230705113926.751791-3-bjorn@kernel.org
2023-07-05 14:34:33 +02:00
Björn Töpel
ce1f289f54 selftests/bpf: Add F_NEEDS_EFFICIENT_UNALIGNED_ACCESS to some tests
Some verifier tests were missing F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
which made them fail. Add the flag where needed.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230705113926.751791-2-bjorn@kernel.org
2023-07-05 14:34:23 +02:00
Hou Tao
cf6eeb8f9d bpf: Remove unnecessary ring buffer size check
The theoretical maximum size of the ring buffer is about 64GB, but the
size of the ring buffer is specified by max_entries in bpf_attr, whose
maximum value is (4GB - 1), so an overflow is not possible.

So just remove the unnecessary size check in ringbuf_map_alloc(), but
keep the comments for a possible extension in the future.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/bpf/9c636a63-1f3d-442d-9223-96c2dccb9469@moroto.mountain
Link: https://lore.kernel.org/bpf/20230704074014.216616-1-houtao@huaweicloud.com
2023-07-05 14:09:45 +02:00