linux

Author	SHA1	Message	Date
Magnus Karlsson	d18b09bf67	selftests: xsk: Remove color mode Remove color mode since it does not add any value and having less code means less maintenance which is a good thing. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825093722.10219-2-magnus.karlsson@gmail.com	2021-08-25 12:21:59 -07:00
Alexei Starovoitov	35cba2988f	Merge branch 'bpf: Add bpf_task_pt_regs() helper' Daniel Xu says: ==================== The motivation behind this helper is to access userspace pt_regs in a kprobe handler. uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a kprobe handler. The final case (kernelspace pt_regs in uprobe) is pretty rare (usermode helper) so I think that can be solved later if necessary. More concretely, this helper is useful in doing BPF-based DWARF stack unwinding. Currently the kernel can only do framepointer based stack unwinds for userspace code. This is because the DWARF state machines are too fragile to be computed in kernelspace [0]. The idea behind DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace stack (while in prog context) and send it up to userspace for unwinding (probably with libunwind) [1]. This would effectively enable profiling applications with -fomit-frame-pointer using kprobes and uprobes. [0]: https://lkml.org/lkml/2012/2/10/356 [1]: https://github.com/danobi/bpf-dwarf-walk Changes from v1: - Conwolidate BTF_ID decls for task_struct - Enable bpf_get_current_task_btf() for all prog types - Enable bpf_task_pt_regs() for all prog types - Use ASSERT_* macros instead of CHECK ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-08-25 10:37:06 -07:00
Daniel Xu	576d47bb1a	bpf: selftests: Add bpf_task_pt_regs() selftest This test retrieves the uprobe's pt_regs in two different ways and compares the contents in an arch-agnostic way. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/5581eb8800f6625ec8813fe21e9dce1fbdef4937.1629772842.git.dxu@dxuuu.xyz	2021-08-25 10:37:05 -07:00
Daniel Xu	dd6e10fbd9	bpf: Add bpf_task_pt_regs() helper The motivation behind this helper is to access userspace pt_regs in a kprobe handler. uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a kprobe handler. The final case (kernelspace pt_regs in uprobe) is pretty rare (usermode helper) so I think that can be solved later if necessary. More concretely, this helper is useful in doing BPF-based DWARF stack unwinding. Currently the kernel can only do framepointer based stack unwinds for userspace code. This is because the DWARF state machines are too fragile to be computed in kernelspace [0]. The idea behind DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace stack (while in prog context) and send it up to userspace for unwinding (probably with libunwind) [1]. This would effectively enable profiling applications with -fomit-frame-pointer using kprobes and uprobes. [0]: https://lkml.org/lkml/2012/2/10/356 [1]: https://github.com/danobi/bpf-dwarf-walk Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/e2718ced2d51ef4268590ab8562962438ab82815.1629772842.git.dxu@dxuuu.xyz	2021-08-25 10:37:05 -07:00
Daniel Xu	a396eda551	bpf: Extend bpf_base_func_proto helpers with bpf_get_current_task_btf() bpf_get_current_task() is already supported so it's natural to also include the _btf() variant for btf-powered helpers. This is required for non-tracing progs to use bpf_task_pt_regs() in the next commit. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/f99870ed5f834c9803d73b3476f8272b1bb987c0.1629772842.git.dxu@dxuuu.xyz	2021-08-25 10:37:05 -07:00
Daniel Xu	33c5cb3601	bpf: Consolidate task_struct BTF_ID declarations No need to have it defined 5 times. Once is enough. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/6dcefa5bed26fe1226f26683f36819bb53ec19a2.1629772842.git.dxu@dxuuu.xyz	2021-08-25 10:37:05 -07:00
Daniel Xu	1b07d00a15	bpf: Add BTF_ID_LIST_GLOBAL_SINGLE macro Same as BTF_ID_LIST_SINGLE macro except defines a global ID. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/a867a97517df42fd3953eeb5454402b57e74538f.1629772842.git.dxu@dxuuu.xyz	2021-08-25 10:37:05 -07:00
Alexei Starovoitov	3bbc8ee7c3	Merge branch 'Improve XDP samples usability and output' Kumar Kartikeya says: ==================== This set revamps XDP samples related to redirection to show better output and implement missing features consolidating all their differences and giving them a consistent look and feel, by implementing common features and command line options. Some of the TODO items like reporting redirect error numbers (ENETDOWN, EINVAL, ENOSPC, etc.) have also been implemented. Some of the features are: * Received packet statistics * xdp_redirect/xdp_redirect_map tracepoint statistics * xdp_redirect_err/xdp_redirect_map_err tracepoint statistics (with support for showing exact errno) * xdp_cpumap_enqueue/xdp_cpumap_kthread tracepoint statistics * xdp_devmap_xmit tracepoint statistics * xdp_exception tracepoint statistics * Per ifindex pair devmap_xmit stats shown dynamically (for xdp_monitor) to decompose the total. * Use of BPF skeleton and BPF static linking to share BPF programs. * Use of vmlinux.h and tp_btf for raw_tracepoint support. * Removal of redundant -N/--native-mode option (enforced by default now) * ... and massive cleanups all over the place. All tracepoints also use raw_tp now, and tracepoints like xdp_redirect are only enabled when requested explicitly to capture successful redirection statistics. The set of programs converted as part of this series are: * xdp_redirect_cpu * xdp_redirect_map_multi * xdp_redirect_map * xdp_redirect * xdp_monitor Explanation of the output: There is now a concise output mode by default that shows primarily four fields: rx/s Number of packets received per second redir/s Number of packets successfully redirected per second err,drop/s Aggregated count of errors per second (including dropped packets) xmit/s Number of packets transmitted on the output device per second Some examples: ; sudo ./xdp_redirect_map veth0 veth1 -s Redirecting from veth0 (ifindex 15; driver veth) to veth1 (ifindex 14; driver veth) veth0->veth1 0 rx/s 0 redir/s 0 err,drop/s 0 xmit/s veth0->veth1 9,998,660 rx/s 9,998,658 redir/s 0 err,drop/s 9,998,654 xmit/s ... There is also a verbose mode, that can also be enabled by default using -v (--verbose). The output mode can be switched dynamically at runtime using Ctrl + \ (SIGQUIT). To make the concise output more useful, the errors that occur are expanded inline (as if verbose mode was enabled) to let the user pin down the source of the problem without having to clutter output (or possibly miss it) or always use verbose mode. For instance, let's consider a case where the output device link state is set to down while redirection is happening: [...] veth0->veth1 24,503,376 rx/s 0 err,drop/s 24,503,372 xmit/s veth0->veth1 25,044,775 rx/s 0 err,drop/s 25,044,783 xmit/s veth0->veth1 25,263,046 rx/s 4 err,drop/s 25,263,028 xmit/s redirect_err 4 error/s ENETDOWN 4 error/s [...] The same holds for xdp_exception actions. An example of how a complete xdp_redirect_map session would look: ; sudo ./xdp_redirect_map veth0 veth1 Redirecting from veth0 (ifindex 5; driver veth) to veth1 (ifindex 4; driver veth) veth0->veth1 7,411,506 rx/s 0 err,drop/s 7,411,470 xmit/s veth0->veth1 8,931,770 rx/s 0 err,drop/s 8,931,771 xmit/s ^\ veth0->veth1 8,787,295 rx/s 0 err,drop/s 8,787,325 xmit/s receive total 8,787,295 pkt/s 0 drop/s 0 error/s cpu:7 8,787,295 pkt/s 0 drop/s 0 error/s redirect_err 0 error/s xdp_exception 0 hit/s xmit veth0->veth1 8,787,325 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg cpu:7 8,787,325 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg veth0->veth1 8,842,610 rx/s 0 err,drop/s 8,842,606 xmit/s receive total 8,842,610 pkt/s 0 drop/s 0 error/s cpu:7 8,842,610 pkt/s 0 drop/s 0 error/s redirect_err 0 error/s xdp_exception 0 hit/s xmit veth0->veth1 8,842,606 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg cpu:7 8,842,606 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg ^C Packets received : 33,973,181 Average packets/s : 4,246,648 Packets transmitted : 33,973,172 Average transmit/s : 4,246,647 The xdp_redirect tracepoint (for success stats) needs to be enabled explicitly using --stats/-s. Documentation for entire output and options is provided when user specifies --help/-h with a sample. Changelog: ---------- v3 -> v4: v3: https://lore.kernel.org/bpf/20210728165552.435050-1-memxor@gmail.com * Address all feedback from Daniel * Use READ_ONCE/WRITE_ONCE from linux/compiler.h (cannot directly include due to conflicts with vmlinux.h) * Fix MAX_CPUS hardcoding by switching to mmapable array maps, that are resized based on the value of libbpf_num_possible_cpus * s/ELEMENTS_OF/ARRAY_SIZE/g * Use tools/include/linux/hashtable.h * Coding style fixes * Remove hyperlinks for tracepoints * Split into smaller reviewable changes * Restore support for specifying custom xdp_redirect_cpu cpumap prog with some enhancements, including built-in programs for common actions (pass, drop, redirect). By default, cpumap prog is now disabled. * Misc bug fixes all over the place The printing stuff is a lot more basic without hyperlink support, hence it has not been exported into a more general facility. v2 -> v3 v2: https://lore.kernel.org/bpf/20210721212833.701342-1-memxor@gmail.com * Address all feedback from Andrii * Replace usage of libbpf hashmap (internal API) with custom one * Rename ATOMIC_* macros to NO_TEAR_* to better reflect their use * Use size_t as a portable word sized data type * Set libbpf_set_strict_mode * Invert conditions in BPF programs to exit early and reduce nesting * Use canonical SEC("xdp") naming for all XDP BPF progams * Add missing help description for cpumap enqueue and kthread tracepoints * Move private struct declarations from xdp_sample_user.h to .c file * Improve help output for cpumap enqueue and cpumap kthread tracepoints * Fix a bug where keys array for BPF_MAP_LOOKUP_BATCH is overallocated * Fix some conditions for printing stats (earlier only checked pps, now pps, drop, err and print if any is greater than zero) * Fix alloc_stats_record to properly return and cleanup allocated memory on allocation failure instead of calling exit(3) * Bump bpf_map_lookup_batch count to 32 to reduce lookup time with multiple devices in map * Fix a bug where devmap_xmit_multi stats are not printed when previous record is missing (i.e. when the first time stats are printed), by simply using a dummy record that is zeroed out * Also print per-CPU counts for devmap_xmit_multi which we collect already * Change mac_map to be BPF_MAP_TYPE_HASH instead of array to prevent resizing to a large size when max_ifindex is high, in xdp_redirect_map_multi * Fix instance of strerror(errno) in sample_install_xdp to use saved errno * Provide a usage function from samples helper * Provide a fix where incorrect stats are shown for parallel sessions of xdp_redirect_* samples by introducing matching support for input device(s), output device(s) and cpumap map id for enqueue and kthread stats. Only xdp_monitor doesn't filter stats, all others do. RFC (v1) -> v2 RFC (v1): https://lore.kernel.org/bpf/20210528235250.2635167-1-memxor@gmail.com * Address all feedback from Andrii * Use BPF static linking * Use vmlinux.h * Use BPF_PROG macro * Use global variables instead of maps * Use of tp_btf for raw_tracepoint progs * Switch to timerfd for polling * Use libbpf hashmap for maintaing device sets for per ifindex pair devmap_xmit stats * Fix Makefile to specify object dependencies properly * Use in-tree bpftool * ... misc fixes and cleanups all over the place ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-08-24 14:48:43 -07:00
Kumar Kartikeya Dwivedi	594a116b2a	samples: bpf: Convert xdp_redirect_map_multi to XDP samples helper Use the libbpf skeleton facility and other utilities provided by XDP samples helper. Also adapt to change of type of mac address map, so that no resizing is required. Add a new flag for sample mask that skips priting the from_device->to_device heading for each line, as xdp_redirect_map_multi may have two devices but the flow of data may be bidirectional, so the output would be confusing. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-23-memxor@gmail.com	2021-08-24 14:48:42 -07:00
Kumar Kartikeya Dwivedi	a29b3ca17e	samples: bpf: Convert xdp_redirect_map_multi_kern.o to XDP samples helper One of the notable changes is using a BPF_MAP_TYPE_HASH instead of array map to store mac addresses of devices, as the resizing behavior was based on max_ifindex, which unecessarily maximized the capacity of map beyond what was needed. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-22-memxor@gmail.com	2021-08-24 14:48:42 -07:00
Kumar Kartikeya Dwivedi	bbe65865aa	samples: bpf: Convert xdp_redirect_map to XDP samples helper Use the libbpf skeleton facility and other utilities provided by XDP samples helper. Since get_mac_addr is already provided by XDP samples helper, we drop it. Also convert to XDP samples helper similar to prior samples to minimize duplication of code. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-21-memxor@gmail.com	2021-08-24 14:48:42 -07:00
Kumar Kartikeya Dwivedi	54af769db9	samples: bpf: Convert xdp_redirect_map_kern.o to XDP samples helper Also update it to use consistent SEC("xdp") and SEC("xdp_devmap") naming, and use global variable instead of BPF map for copying the mac address. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-20-memxor@gmail.com	2021-08-24 14:48:42 -07:00
Kumar Kartikeya Dwivedi	e531a220cc	samples: bpf: Convert xdp_redirect_cpu to XDP samples helper Use the libbpf skeleton facility and other utilities provided by XDP samples helper. Similar to xdp_monitor, xdp_redirect_cpu was quite featureful except a few minor omissions (e.g. redirect errno reporting). All of these have been moved to XDP samples helper, hence drop the unneeded code and convert to usage of helpers provided by it. One of the important changes here is dropping of mprog-disable option, as we make that the default. Also, we support built-in programs for some common actions on the packet when it reaches kthread (pass, drop, redirect to device). If the user still needs to install a custom program, they can still supply a BPF object, however the program should be suitably tagged with SEC("xdp_cpumap") annotation so that the expected attach type is correct when updating our cpumap map element. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-19-memxor@gmail.com	2021-08-24 14:48:42 -07:00
Kumar Kartikeya Dwivedi	79ccf4529e	samples: bpf: Convert xdp_redirect_cpu_kern.o to XDP samples helper Similar to xdp_monitor_kern, a lot of these BPF programs have been reimplemented properly consolidating missing features from other XDP samples. Hence, drop the unneeded code and rename to .bpf.c suffix. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-18-memxor@gmail.com	2021-08-24 14:48:42 -07:00
Kumar Kartikeya Dwivedi	b926c55d85	samples: bpf: Convert xdp_redirect to XDP samples helper Use the libbpf skeleton facility and other utilities provided by XDP samples helper. One important note: The XDP samples helper handles ownership of installed XDP programs on devices, including responding to SIGINT and SIGTERM, so drop the code here and use the helpers we provide going forward for all xdp_redirect* conversions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-17-memxor@gmail.com	2021-08-24 14:48:42 -07:00
Kumar Kartikeya Dwivedi	66fc4ca85d	samples: bpf: Convert xdp_redirect_kern.o to XDP samples helper We moved swap_src_dst_mac to xdp_sample.bpf.h to be shared with other potential users, so drop it while moving code to the new file. Also, consistently use SEC("xdp") naming instead. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-16-memxor@gmail.com	2021-08-24 14:48:42 -07:00
Kumar Kartikeya Dwivedi	6e1051a54e	samples: bpf: Convert xdp_monitor to XDP samples helper Use the libbpf skeleton facility and other utilities provided by XDP samples helper. A lot of the code in xdp_monitor and xdp_redirect_cpu has been moved to the xdp_sample_user.o helper, so we remove the duplicate functions here that are no longer needed. Thanks to BPF skeleton, we no longer depend on order of tracepoints to uninstall them on startup. Instead, the sample mask is used to install the needed tracepoints. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-15-memxor@gmail.com	2021-08-24 14:48:41 -07:00
Kumar Kartikeya Dwivedi	3f19956010	samples: bpf: Convert xdp_monitor_kern.o to XDP samples helper We already moved all the functionality it provided in XDP samples helper userspace and kernel BPF object, so just delete the unneeded code. We also add generation of BPF skeleton and compilation using clang -target bpf for files ending with .bpf.c suffix (to denote that they use vmlinux.h). Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-14-memxor@gmail.com	2021-08-24 14:48:41 -07:00
Kumar Kartikeya Dwivedi	384b6b3bbf	samples: bpf: Add vmlinux.h generation support Also, take this opportunity to depend on in-tree bpftool, so that we can use static linking support in subsequent commits for XDP samples BPF helper object. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-13-memxor@gmail.com	2021-08-24 14:48:41 -07:00
Kumar Kartikeya Dwivedi	af93d58c27	samples: bpf: Add devmap_xmit tracepoint statistics support This adds support for retrieval and printing for devmap_xmit total and mutli mode tracepoint. For multi mode, we keep a hash map entry for each redirection stream, such that we can dynamically add and remove entries on output. The from_match and to_match will be set by individual samples when setting up the XDP program on these devices. The multi mode tracepoint is also handy for xdp_redirect_map_multi, where up to 32 devices can be specified. Also add samples_init_pre_load macro to finally set up the resized maps and mmap them in place for low overhead stats retrieval. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-12-memxor@gmail.com	2021-08-24 14:48:41 -07:00
Kumar Kartikeya Dwivedi	5f116212f4	samples: bpf: Add BPF support for devmap_xmit tracepoint This adds support for the devmap_xmit tracepoint, and its multi device variant that can be used to obtain streams for each individual net_device to net_device redirection. This is useful for decomposing total xmit stats in xdp_monitor. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-11-memxor@gmail.com	2021-08-24 14:48:41 -07:00
Kumar Kartikeya Dwivedi	d771e21750	samples: bpf: Add cpumap tracepoint statistics support This consolidates retrieval and printing into the XDP sample helper. For the kthread stats, it expands xdp_stats separately with its own per-CPU stats. For cpumap enqueue, we display FROM->TO stats also with its per-CPU stats. The help out explains in detail the various aspects of the output. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-10-memxor@gmail.com	2021-08-24 14:48:41 -07:00
Kumar Kartikeya Dwivedi	0cf3c2fc4b	samples: bpf: Add BPF support for cpumap tracepoints These are invoked in two places, when the XDP frame or SKB (for generic XDP) enqueued to the ptr_ring (cpumap_enqueue) and when kthread processes the frame after invoking the CPUMAP program for it (returning stats for the batch). We use cpumap_map_id to filter on the map_id as a way to avoid printing incorrect stats for parallel sessions of xdp_redirect_cpu. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-9-memxor@gmail.com	2021-08-24 14:48:41 -07:00
Kumar Kartikeya Dwivedi	82c450803a	samples: bpf: Add xdp_exception tracepoint statistics support This implements the retrieval and printing, as well the help output. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-8-memxor@gmail.com	2021-08-24 14:48:41 -07:00
Kumar Kartikeya Dwivedi	451588764e	samples: bpf: Add BPF support for xdp_exception tracepoint This would allow us to store stats for each XDP action, including their per-CPU counts. Consolidating this here allows all redirect samples to detect xdp_exception events. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-7-memxor@gmail.com	2021-08-24 14:48:40 -07:00
Kumar Kartikeya Dwivedi	1d930fd2cd	samples: bpf: Add redirect tracepoint statistics support This implements per-errno reporting (for the ones we explicitly recognize), adds some help output, and implements the stats retrieval and printing functions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-6-memxor@gmail.com	2021-08-24 14:48:40 -07:00
Kumar Kartikeya Dwivedi	3231403894	samples: bpf: Add BPF support for redirect tracepoint This adds the shared BPF file that will be used going forward for sharing tracepoint programs among XDP redirect samples. Since vmlinux.h conflicts with tools/include for READ_ONCE/WRITE_ONCE and ARRAY_SIZE, they are copied in to xdp_sample.bpf.h along with other helpers that will be required. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-5-memxor@gmail.com	2021-08-24 14:48:40 -07:00
Kumar Kartikeya Dwivedi	156f886cf6	samples: bpf: Add basic infrastructure for XDP samples This file implements some common helpers to consolidate differences in features and functionality between the various XDP samples and give them a consistent look, feel, and reporting capabilities. This commit only adds support for receive statistics, which does not rely on any tracepoint, but on the XDP program installed on the device by each XDP redirect sample. Some of the key features are: * A concise output format accompanied by helpful text explaining its fields. * An elaborate output format building upon the concise one, and folding out details in case of errors and staying out of view otherwise. * Printing driver names for devices redirecting packets. * Getting mac address for interface. * Printing summarized total statistics for the entire session. * Ability to dynamically switch between concise and verbose mode, using SIGQUIT (Ctrl + \). In later patches, the support will be extended for each tracepoint with its own custom output in concise and verbose mode. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-4-memxor@gmail.com	2021-08-24 14:48:40 -07:00
Kumar Kartikeya Dwivedi	f2e85d4a75	tools: include: Add ethtool_drvinfo definition to UAPI header Instead of copying the whole header in, just add the struct definitions we need for now. In the future it can be synced as a copy of in-tree header if required. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-3-memxor@gmail.com	2021-08-24 14:48:40 -07:00
Kumar Kartikeya Dwivedi	50b796e645	samples: bpf: Fix a couple of warnings cookie_uid_helper_example.c: In function ‘main’: cookie_uid_helper_example.c:178:69: warning: ‘ -j ACCEPT’ directive writing 10 bytes into a region of size between 8 and 58 [-Wformat-overflow=] 178 \| sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT", \| ^~~~~~~~~~ /home/kkd/src/linux/samples/bpf/cookie_uid_helper_example.c:178:9: note: ‘sprintf’ output between 53 and 103 bytes into a destination of size 100 178 \| sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 179 \| file); \| ~~~~~ Fix by using snprintf and a sufficiently sized buffer. tracex4_user.c:35:15: warning: ‘write’ reading 12 bytes from a region of size 11 [-Wstringop-overread] 35 \| key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */ \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use size as 11. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-2-memxor@gmail.com	2021-08-24 14:48:40 -07:00
Andrey Ignatov	d7af7e497f	bpf: Fix possible out of bound write in narrow load handling Fix a verifier bug found by smatch static checker in [0]. This problem has never been seen in prod to my best knowledge. Fixing it still seems to be a good idea since it's hard to say for sure whether it's possible or not to have a scenario where a combination of convert_ctx_access() and a narrow load would lead to an out of bound write. When narrow load is handled, one or two new instructions are added to insn_buf array, but before it was only checked that cnt >= ARRAY_SIZE(insn_buf) And it's safe to add a new instruction to insn_buf[cnt++] only once. The second try will lead to out of bound write. And this is what can happen if `shift` is set. Fix it by making sure that if the BPF_RSH instruction has to be added in addition to BPF_AND then there is enough space for two more instructions in insn_buf. The full report [0] is below: kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array kernel/bpf/verifier.c 12282 12283 insn->off = off & ~(size_default - 1); 12284 insn->code = BPF_LDX \| BPF_MEM \| size_code; 12285 } 12286 12287 target_size = 0; 12288 cnt = convert_ctx_access(type, insn, insn_buf, env->prog, 12289 &target_size); 12290 if (cnt == 0 \|\| cnt >= ARRAY_SIZE(insn_buf) \|\| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Bounds check. 12291 (ctx_field_size && !target_size)) { 12292 verbose(env, "bpf verifier is misconfigured\n"); 12293 return -EINVAL; 12294 } 12295 12296 if (is_narrower_load && size < target_size) { 12297 u8 shift = bpf_ctx_narrow_access_offset( 12298 off, size, size_default) * 8; 12299 if (ctx_field_size <= 4) { 12300 if (shift) 12301 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH, ^^^^^ increment beyond end of array 12302 insn->dst_reg, 12303 shift); --> 12304 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg, ^^^^^ out of bounds write 12305 (1 << size * 8) - 1); 12306 } else { 12307 if (shift) 12308 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH, 12309 insn->dst_reg, 12310 shift); 12311 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg, ^^^^^^^^^^^^^^^ Same. 12312 (1ULL << size * 8) - 1); 12313 } 12314 } 12315 12316 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); 12317 if (!new_prog) 12318 return -ENOMEM; 12319 12320 delta += cnt - 1; 12321 12322 /* keep walking new program and skip insns we just inserted */ 12323 env->prog = new_prog; 12324 insn = new_prog->insnsi + i + delta; 12325 } 12326 12327 return 0; 12328 } [0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/ v1->v2: - clarify that problem was only seen by static checker but not in prod; Fixes: `46f53a65d2` ("bpf: Allow narrow loads with offset > 0") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820163935.1902398-1-rdna@fb.com	2021-08-24 14:32:26 -07:00
Alexei Starovoitov	f63693e3ae	Merge branch 'bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SK_MSG' Xu Liu says: ==================== We'd like to be able to identify netns from sk_msg hooks to accelerate local process communication form different netns. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-08-24 14:17:53 -07:00
Xu Liu	6cbca1ee0d	selftests/bpf: Test for get_netns_cookie Add test to use get_netns_cookie() from BPF_PROG_TYPE_SK_MSG. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820071712.52852-3-liuxu623@gmail.com	2021-08-24 14:17:53 -07:00
Xu Liu	fab60e29fc	bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SK_MSG We'd like to be able to identify netns from sk_msg hooks to accelerate local process communication form different netns. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820071712.52852-2-liuxu623@gmail.com	2021-08-24 14:17:53 -07:00
Alexei Starovoitov	8c0bb89e8e	Merge branch 'selftests/bpf: minor fixups' Li Zhijian says: ==================== Fix a few issues reported by 0Day/LKP during runing selftests/bpf. Changelog: V2: - folded previous similar standalone patch to [1/5], and add acked tag from Song Liu - add acked tag to [2/5], [3/5] from Song Liu - [4/5]: move test_bpftool.py to TEST_PROGS_EXTENDED, files in TEST_GEN_PROGS_EXTENDED are generated by make. Otherwise, it will break out-of-tree install: 'make O=/kselftest-build SKIP_TARGETS= V=1 -C tools/testing/selftests install INSTALL_PATH=/kselftest-install' - [5/5]: new patch ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-08-24 14:01:11 -07:00
Li Zhijian	00e1116031	selftests/bpf: Exit with KSFT_SKIP if no Makefile found This would happend when we run the tests after install kselftests root@lkp-skl-d01 ~# /kselftests/run_kselftest.sh -t bpf:test_doc_build.sh TAP version 13 1..1 # selftests: bpf: test_doc_build.sh perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = (unset), LC_ALL = (unset), LC_ADDRESS = "en_US.UTF-8", LC_NAME = "en_US.UTF-8", LC_MONETARY = "en_US.UTF-8", LC_PAPER = "en_US.UTF-8", LC_IDENTIFICATION = "en_US.UTF-8", LC_TELEPHONE = "en_US.UTF-8", LC_MEASUREMENT = "en_US.UTF-8", LC_TIME = "en_US.UTF-8", LC_NUMERIC = "en_US.UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). # skip: bpftool files not found! # ok 1 selftests: bpf: test_doc_build.sh # SKIP Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820025549.28325-1-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Li Zhijian	404bd9ff5d	selftests/bpf: Add missing files required by test_bpftool.sh for installing test_bpftool.sh relies on bpftool and test_bpftool.py. 'make install' will install bpftool to INSTALL_PATH/bpf/bpftool, and export it to PATH so that it can be used after installing. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820015556.23276-5-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Li Zhijian	7a3bdca20b	selftests/bpf: Add default bpftool built by selftests to PATH For 'make run_tests': selftests will build bpftool into tools/testing/selftests/bpf/tools/sbin/bpftool by default. ================== root@lkp-skl-d01 /opt/rootfs/v5.14-rc4# make -C tools/testing/selftests/bpf run_tests make: Entering directory '/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf' MKDIR include MKDIR libbpf MKDIR bpftool [...] GEN /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/profiler.skel.h CC /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/prog.o GEN /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pid_iter.skel.h CC /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pids.o LINK /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/bpftool INSTALL bpftool GEN vmlinux.h [...] # test_feature_dev_json (test_bpftool.TestBpftool) ... ERROR # test_feature_kernel (test_bpftool.TestBpftool) ... ERROR # test_feature_kernel_full (test_bpftool.TestBpftool) ... ERROR # test_feature_kernel_full_vs_not_full (test_bpftool.TestBpftool) ... ERROR # test_feature_macros (test_bpftool.TestBpftool) ... Error: bug: failed to retrieve CAP_BPF status: Invalid argument # ERROR # # ====================================================================== # ERROR: test_feature_dev_json (test_bpftool.TestBpftool) # ---------------------------------------------------------------------- # Traceback (most recent call last): # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 57, in wrapper # return f(args, iface, kwargs) # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 82, in test_feature_dev_json # res = bpftool_json(["feature", "probe", "dev", iface]) # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 42, in bpftool_json # res = _bpftool(args) # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 34, in _bpftool # return subprocess.check_output(_args) # File "/usr/lib/python3.7/subprocess.py", line 395, in check_output # *kwargs).stdout # File "/usr/lib/python3.7/subprocess.py", line 487, in run # output=stdout, stderr=stderr) # subprocess.CalledProcessError: Command '['bpftool', '-j', 'feature', 'probe', 'dev', 'dummy0']' returned non-zero exit status 255. # ================== Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210820015556.23276-4-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Li Zhijian	5a980b5baf	selftests/bpf: Make test_doc_build.sh work from script directory Previously, it fails as below: ------------- root@lkp-skl-d01 /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf# ./test_doc_build.sh ++ realpath --relative-to=/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf ./test_doc_build.sh + SCRIPT_REL_PATH=test_doc_build.sh ++ dirname test_doc_build.sh + SCRIPT_REL_DIR=. ++ realpath /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/./../../../../ + KDIR_ROOT_DIR=/opt/rootfs/v5.14-rc4 + cd /opt/rootfs/v5.14-rc4 + for tgt in docs docs-clean + make -s -C /opt/rootfs/v5.14-rc4/. docs make: * No rule to make target 'docs'. Stop. + for tgt in docs docs-clean + make -s -C /opt/rootfs/v5.14-rc4/. docs-clean make: * No rule to make target 'docs-clean'. Stop. ----------- Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210820015556.23276-3-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Li Zhijian	2d82d73da3	selftests/bpf: Enlarge select() timeout for test_maps 0Day robot observed that it's easily timeout on a heavy load host. ------------------- # selftests: bpf: test_maps # Fork 1024 tasks to 'test_update_delete' # Fork 1024 tasks to 'test_update_delete' # Fork 100 tasks to 'test_hashmap' # Fork 100 tasks to 'test_hashmap_percpu' # Fork 100 tasks to 'test_hashmap_sizes' # Fork 100 tasks to 'test_hashmap_walk' # Fork 100 tasks to 'test_arraymap' # Fork 100 tasks to 'test_arraymap_percpu' # Failed sockmap unexpected timeout not ok 3 selftests: bpf: test_maps # exit=1 # selftests: bpf: test_lru_map # nr_cpus:8 ------------------- Since this test will be scheduled by 0Day to a random host that could have only a few cpus(2-8), enlarge the timeout to avoid a false NG report. In practice, i tried to pin it to only one cpu by 'taskset 0x01 ./test_maps', and knew 10S is likely enough, but i still perfer to a larger value 30. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210820015556.23276-2-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Yucong Sun	a6258837c8	selftests/bpf: Reduce flakyness in timer_mim This patch extends wait time in timer_mim. As observed in slow CI environment, it is possible to have interrupt/preemption long enough to cause the test to fail, almost 1 failure in 5 runs. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210823213629.3519641-1-fallentree@fb.com	2021-08-23 18:01:47 -07:00
Alexei Starovoitov	4ed589a278	Merge branch 'Refactor cgroup_bpf internals to use more specific attach_type' Dave Marchevsky says: ==================== The cgroup_bpf struct has a few arrays (effective, progs, and flags) of size MAX_BPF_ATTACH_TYPE. These are meant to separate progs by their attach type, currently represented by the bpf_attach_type enum. There are some bpf_attach_type values which are not valid attach types for cgroup bpf programs. Programs with these attach types will never be handled by cgroup_bpf_{attach,detach} and thus will never be held in cgroup_bpf structs. Even if such programs did make it into their reserved slot in those arrays, they would never be executed. Accordingly we can migrate to a new internal cgroup_bpf-specific enum for these arrays, saving some bytes per cgroup and making it more obvious which BPF programs belong there. netns_bpf_attach_type is an existing example of this pattern, let's do similar for cgroup_bpf. v1->v2: Address Daniel's comments * Reverse xmas tree ordering for def changes * Helper macro to reduce to_cgroup_bpf_attach_type boilerplate * checkpatch.pl complains: "ERROR: Macros with complex values should be enclosed in parentheses". Found some existing macros (do 'git grep "define case"') which get same complaint. Think it's fine to keep as-is since it's immediately undef'd. * Remove CG_BPF_ prefix from cgroup_bpf_attach_type * Although I agree that the prefix is redundant, the de-prefixed names feel a bit too 'general' given the internal use of the enum. e.g. when someone sees CGROUP_INET6_BIND it's not obvious that it should only be used in certain ways internally. * Don't feel strongly about this, just my thoughts as a noob to the internals. * Rebase onto latest bpf-next/master * No significant conflicts, some small boilerplate adjustments needed to catch up to Andrii's "bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions" change ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-08-23 17:50:24 -07:00
Dave Marchevsky	6fc88c354f	bpf: Migrate cgroup_bpf to internal cgroup_bpf_attach_type enum Add an enum (cgroup_bpf_attach_type) containing only valid cgroup_bpf attach types and a function to map bpf_attach_type values to the new enum. Inspired by netns_bpf_attach_type. Then, migrate cgroup_bpf to use cgroup_bpf_attach_type wherever possible. Functionality is unchanged as attach_type_to_prog_type switches in bpf/syscall.c were preventing non-cgroup programs from making use of the invalid cgroup_bpf array slots. As a result struct cgroup_bpf uses 504 fewer bytes relative to when its arrays were sized using MAX_BPF_ATTACH_TYPE. bpf_cgroup_storage is notably not migrated as struct bpf_cgroup_storage_key is part of uapi and contains a bpf_attach_type member which is not meant to be opaque. Similarly, bpf_cgroup_link continues to report its bpf_attach_type member to userspace via fdinfo and bpf_link_info. To ease disambiguation, bpf_attach_type variables are renamed from 'type' to 'atype' when changed to cgroup_bpf_attach_type. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210819092420.1984861-2-davemarchevsky@fb.com	2021-08-23 17:50:24 -07:00
Jiang Wang	d359902d5c	af_unix: Fix NULL pointer bug in unix_shutdown Commit `94531cfcbe` ("af_unix: Add unix_stream_proto for sockmap") introduced a bug for af_unix SEQPACKET type. In unix_shutdown, the unhash function will call prot->unhash(), which is NULL for SEQPACKET. And kernel will panic. On ARM32, it will show following messages: (it likely affects x86 too). Fix the bug by checking the prot->unhash is NULL or not first. Kernel log: <--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = 2fba1ffb *pgd=00000000 Internal error: Oops: 80000005 [#1] PREEMPT SMP THUMB2 Modules linked in: CPU: 1 PID: 1999 Comm: falkon Tainted: G W 5.14.0-rc5-01175-g94531cfcbe79-dirty #9240 Hardware name: NVIDIA Tegra SoC (Flattened Device Tree) PC is at 0x0 LR is at unix_shutdown+0x81/0x1a8 pc : [<00000000>] lr : [<c08f3311>] psr: 600f0013 sp : e45aff70 ip : e463a3c0 fp : beb54f04 r10: 00000125 r9 : e45ae000 r8 : c4a56664 r7 : 00000001 r6 : c4a56464 r5 : 00000001 r4 : c4a56400 r3 : 00000000 r2 : c5a6b180 r1 : 00000000 r0 : c4a56400 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 50c5387d Table: 05aa804a DAC: 00000051 Register r0 information: slab PING start c4a56400 pointer offset 0 Register r1 information: NULL pointer Register r2 information: slab task_struct start c5a6b180 pointer offset 0 Register r3 information: NULL pointer Register r4 information: slab PING start c4a56400 pointer offset 0 Register r5 information: non-paged memory Register r6 information: slab PING start c4a56400 pointer offset 100 Register r7 information: non-paged memory Register r8 information: slab PING start c4a56400 pointer offset 612 Register r9 information: non-slab/vmalloc memory Register r10 information: non-paged memory Register r11 information: non-paged memory Register r12 information: slab filp start e463a3c0 pointer offset 0 Process falkon (pid: 1999, stack limit = 0x9ec48895) Stack: (0xe45aff70 to 0xe45b0000) ff60: e45ae000 c5f26a00 00000000 00000125 ff80: c0100264 c07f7fa3 beb54f04 fffffff7 00000001 e6f3fc0e b5e5e9ec beb54ec4 ffa0: b5da0ccc c010024b b5e5e9ec beb54ec4 0000000f 00000000 00000000 beb54ebc ffc0: b5e5e9ec beb54ec4 b5da0ccc 00000125 beb54f58 00785238 beb5529c beb54f04 ffe0: b5da1e24 beb54eac b301385c b62b6ee8 600f0030 0000000f 00000000 00000000 [<c08f3311>] (unix_shutdown) from [<c07f7fa3>] (__sys_shutdown+0x2f/0x50) [<c07f7fa3>] (__sys_shutdown) from [<c010024b>] (__sys_trace_return+0x1/0x16) Exception stack(0xe45affa8 to 0xe45afff0) Fixes: `94531cfcbe` ("af_unix: Add unix_stream_proto for sockmap") Reported-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/bpf/20210821180738.1151155-1-jiang.wang@bytedance.com	2021-08-23 14:56:01 +02:00
Prankur Gupta	f2a6ee924d	selftests/bpf: Add tests for {set\|get} socket option from setsockopt BPF Adding selftests for the newly added functionality to call bpf_setsockopt() and bpf_getsockopt() from setsockopt BPF programs. Test Details: 1. BPF Program Checks for changes in IPV6_TCLASS(SOL_IPV6) via setsockopt If the cca for the socket is not cubic do nothing If the newly set value for IPV6_TCLASS is 45 (0x2d) (as per our use-case) then change the cc from cubic to reno 2. User Space Program Creates an AF_INET6 socket and set the cca for that to be "cubic" Attach the program and set the IPV6_TCLASS to 0x2d using setsockopt Verify the cca for the socket changed to reno Signed-off-by: Prankur Gupta <prankgup@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210817224221.3257826-3-prankgup@fb.com	2021-08-20 01:10:01 +02:00
Prankur Gupta	2c531639de	bpf: Add support for {set\|get} socket options from setsockopt BPF Add logic to call bpf_setsockopt() and bpf_getsockopt() from setsockopt BPF programs. An example use case is when the user sets the IPV6_TCLASS socket option, we would also like to change the tcp-cc for that socket. We don't have any use case for calling bpf_setsockopt() from supposedly read- only sys_getsockopt(), so it is made available to BPF_CGROUP_SETSOCKOPT only at this point. Signed-off-by: Prankur Gupta <prankgup@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210817224221.3257826-2-prankgup@fb.com	2021-08-20 01:04:52 +02:00
Stanislav Fomichev	44779a4b85	bpf: Use kvmalloc for map keys in syscalls Same as previous patch but for the keys. memdup_bpfptr is renamed to kvmemdup_bpfptr (and converted to kvmalloc). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818235216.1159202-2-sdf@google.com	2021-08-20 00:09:49 +02:00
Stanislav Fomichev	f0dce1d9b7	bpf: Use kvmalloc for map values in syscall Use kvmalloc/kvfree for temporary value when manipulating a map via syscall. kmalloc might not be sufficient for percpu maps where the value is big (and further multiplied by hundreds of CPUs). Can be reproduced with netcnt test on qemu with "-smp 255". Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818235216.1159202-1-sdf@google.com	2021-08-20 00:09:38 +02:00
Yucong Sun	3666b167ea	selftests/bpf: Adding delay in socketmap_listen to reduce flakyness This patch adds a 1ms delay to reduce flakyness of the test. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210819163609.2583758-1-fallentree@fb.com	2021-08-19 12:28:20 -07:00
Yonghong Song	594286b757	bpf: Fix NULL event->prog pointer access in bpf_overflow_handler Andrii reported that libbpf CI hit the following oops when running selftest send_signal: [ 1243.160719] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 1243.161066] #PF: supervisor read access in kernel mode [ 1243.161066] #PF: error_code(0x0000) - not-present page [ 1243.161066] PGD 0 P4D 0 [ 1243.161066] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 1243.161066] CPU: 1 PID: 882 Comm: new_name Tainted: G O 5.14.0-rc5 #1 [ 1243.161066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 1243.161066] RIP: 0010:bpf_overflow_handler+0x9a/0x1e0 [ 1243.161066] Code: 5a 84 c0 0f 84 06 01 00 00 be 66 02 00 00 48 c7 c7 6d 96 07 82 48 8b ab 18 05 00 00 e8 df 55 eb ff 66 90 48 8d 75 48 48 89 e7 <ff> 55 30 41 89 c4 e8 fb c1 f0 ff 84 c0 0f 84 94 00 00 00 e8 6e 0f [ 1243.161066] RSP: 0018:ffffc900000c0d80 EFLAGS: 00000046 [ 1243.161066] RAX: 0000000000000002 RBX: ffff8881002e0dd0 RCX: 00000000b4b47cf8 [ 1243.161066] RDX: ffffffff811dcb06 RSI: 0000000000000048 RDI: ffffc900000c0d80 [ 1243.161066] RBP: 0000000000000000 R08: 0000000000000000 R09: 1a9d56bb00000000 [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: 0000000000000000 [ 1243.161066] R13: ffffc900000c0e00 R14: ffffc900001c3c68 R15: 0000000000000082 [ 1243.161066] FS: 00007fc0be2d3380(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 [ 1243.161066] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1243.161066] CR2: 0000000000000030 CR3: 0000000104f8e000 CR4: 00000000000006e0 [ 1243.161066] Call Trace: [ 1243.161066] <IRQ> [ 1243.161066] __perf_event_overflow+0x4f/0xf0 [ 1243.161066] perf_swevent_hrtimer+0x116/0x130 [ 1243.161066] ? __lock_acquire+0x378/0x2730 [ 1243.161066] ? __lock_acquire+0x372/0x2730 [ 1243.161066] ? lock_is_held_type+0xd5/0x130 [ 1243.161066] ? find_held_lock+0x2b/0x80 [ 1243.161066] ? lock_is_held_type+0xd5/0x130 [ 1243.161066] ? perf_event_groups_first+0x80/0x80 [ 1243.161066] ? perf_event_groups_first+0x80/0x80 [ 1243.161066] __hrtimer_run_queues+0x1a3/0x460 [ 1243.161066] hrtimer_interrupt+0x110/0x220 [ 1243.161066] __sysvec_apic_timer_interrupt+0x8a/0x260 [ 1243.161066] sysvec_apic_timer_interrupt+0x89/0xc0 [ 1243.161066] </IRQ> [ 1243.161066] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 1243.161066] RIP: 0010:finish_task_switch+0xaf/0x250 [ 1243.161066] Code: 31 f6 68 90 2a 09 81 49 8d 7c 24 18 e8 aa d6 03 00 4c 89 e7 e8 12 ff ff ff 4c 89 e7 e8 ca 9c 80 00 e8 35 af 0d 00 fb 4d 85 f6 <58> 74 1d 65 48 8b 04 25 c0 6d 01 00 4c 3b b0 a0 04 00 00 74 37 f0 [ 1243.161066] RSP: 0018:ffffc900001c3d18 EFLAGS: 00000282 [ 1243.161066] RAX: 000000000000031f RBX: ffff888104cf4980 RCX: 0000000000000000 [ 1243.161066] RDX: 0000000000000000 RSI: ffffffff82095460 RDI: ffffffff820adc4e [ 1243.161066] RBP: ffffc900001c3d58 R08: 0000000000000001 R09: 0000000000000001 [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: ffff88813bd2bc80 [ 1243.161066] R13: ffff8881002e8000 R14: ffff88810022ad80 R15: 0000000000000000 [ 1243.161066] ? finish_task_switch+0xab/0x250 [ 1243.161066] ? finish_task_switch+0x70/0x250 [ 1243.161066] __schedule+0x36b/0xbb0 [ 1243.161066] ? _raw_spin_unlock_irqrestore+0x2d/0x50 [ 1243.161066] ? lockdep_hardirqs_on+0x79/0x100 [ 1243.161066] schedule+0x43/0xe0 [ 1243.161066] pipe_read+0x30b/0x450 [ 1243.161066] ? wait_woken+0x80/0x80 [ 1243.161066] new_sync_read+0x164/0x170 [ 1243.161066] vfs_read+0x122/0x1b0 [ 1243.161066] ksys_read+0x93/0xd0 [ 1243.161066] do_syscall_64+0x35/0x80 [ 1243.161066] entry_SYSCALL_64_after_hwframe+0x44/0xae The oops can also be reproduced with the following steps: ./vmtest.sh -s # at qemu shell cd /root/bpf && while true; do ./test_progs -t send_signal Further analysis showed that the failure is introduced with commit `b89fbfbb85` ("bpf: Implement minimal BPF perf link"). With the above commit, the following scenario becomes possible: cpu1 cpu2 hrtimer_interrupt -> bpf_overflow_handler (due to closing link_fd) bpf_perf_link_release -> perf_event_free_bpf_prog -> perf_event_free_bpf_handler -> WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler) event->prog = NULL bpf_prog_run(event->prog, &ctx) In the above case, the event->prog is NULL for bpf_prog_run, hence causing oops. To fix the issue, check whether event->prog is NULL or not. If it is, do not call bpf_prog_run. This seems working as the above reproducible step runs more than one hour and I didn't see any failures. Fixes: `b89fbfbb85` ("bpf: Implement minimal BPF perf link") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210819155209.1927994-1-yhs@fb.com	2021-08-19 11:49:19 -07:00

1 2 3 4 5 ...

1032001 Commits