linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-03 17:41:22 +00:00

Author	SHA1	Message	Date
Jakub Kicinski	e2ae4ca266	bpf: verifier: hard wire branches to dead code Loading programs with dead code becomes more and more common, as people begin to patch constants at load time. Turn conditional jumps to unconditional ones, to avoid potential branch misprediction penalty. This optimization is enabled for privileged users only. For branches which just fall through we could just mark them as not seen and have dead code removal take care of them, but that seems less clean. v0.2: - don't call capable(CAP_SYS_ADMIN) twice (Jiong). v3: - fix GCC warning; Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-23 17:35:31 -08:00
Jakub Kicinski	2cbd95a5c4	bpf: change parameters of call/branch offset adjustment In preparation for code removal change parameters to branch and call adjustment functions to be more universal. The current parameters assume we are patching a single instruction with a longer set. A diagram may help reading the change, this is for the patch single case, patching instruction 1 with a replacement of 4: ____ 0 \|____\| 1 \|____\| <-- pos ^ 2 \| \| <-- end old ^ \| 3 \| \| \| delta \| len 4 \|____\| \| \| (patch region) 5 \| \| <-- end new v v 6 \|____\| end_old = pos + 1 end_new = pos + delta + 1 If we are before the patch region - curr variable and the target are fully in old coordinates (hence comparing against end_old). If we are after the region curr is in new coordinates (hence the comparison to end_new) but target is in mixed coordinates, so we just check if it falls before end_new, and if so it needs the adjustment. Note that we will not fix up branches which land in removed region in case of removal, which should be okay, as we are only going to remove dead code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-23 17:35:31 -08:00
Stanislav Fomichev	bbebce8eb9	selftests/bpf: don't hardcode iptables/nc path in test_tcpnotify_user system() is calling shell which should find the appropriate full path via $PATH. On some systems, full path to iptables and/or nc might be different that we one we have hardcoded. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-23 12:56:30 +01:00
Taeung Song	c76e4c228b	libbpf: Show supported ELF section names when failing to guess prog/attach type We need to let users check their wrong ELF section name with proper ELF section names when they fail to get a prog/attach type from it. Because users can't realize libbpf guess prog/attach types from given ELF section names. For example, when a 'cgroup' section name of a BPF program is used, show available ELF section names(types). Before: $ bpftool prog load bpf-prog.o /sys/fs/bpf/prog1 Error: failed to guess program type based on ELF section name cgroup After: libbpf: failed to guess program type based on ELF section name 'cgroup' libbpf: supported section(type) names are: socket kprobe/ kretprobe/ classifier action tracepoint/ raw_tracepoint/ xdp perf_event lwt_in lwt_out lwt_xmit lwt_seg6local cgroup_skb/ingress cgroup_skb/egress cgroup/skb cgroup/sock cgroup/post_bind4 cgroup/post_bind6 cgroup/dev sockops sk_skb/stream_parser sk_skb/stream_verdict sk_skb sk_msg lirc_mode2 flow_dissector cgroup/bind4 cgroup/bind6 cgroup/connect4 cgroup/connect6 cgroup/sendmsg4 cgroup/sendmsg6 Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Quentin Monnet <quentin.monnet@netronome.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Andrey Ignatov <rdna@fb.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-23 12:27:04 +01:00
Yonghong Song	ffcf7ce933	bpf: btf: add btf documentation This patch added documentation for BTF (BPF Debug Format). The document is placed under linux:Documentation/bpf directory. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 23:50:02 -08:00
Alexei Starovoitov	cbeaad9028	Merge branch 'bpftool-probes' Quentin Monnet says: ==================== Hi, This set adds a new command to bpftool in order to dump a list of eBPF-related parameters for the system (or for a specific network device) to the console. Once again, this is based on a suggestion from Daniel. At this time, output includes: - Availability of bpf() system call - Availability of bpf() system call for unprivileged users - JIT status (enabled or not, with or without debugging traces) - JIT hardening status - JIT kallsyms exports status - Global memory limit for JIT compiler for unprivileged users - Status of kernel compilation options related to BPF features - Availability of known eBPF program types - Availability of known eBPF map types - Availability of known eBPF helper functions There are three different ways to dump this information at this time: - Plain output dumps probe results in plain text. It is the most flexible options for providing descriptive output to the user, but should not be relied upon for parsing the output. - JSON output is supported. - A third mode, available through the "macros" keyword appended to the command line, dumps some of those parameters (not all) as a series of "#define" directives, that can be included into a C header file for example. Probes for supported program and map types, and supported helpers, are directly added to libbpf, so that other applications (or selftests) can reuse them as necessary. If the user does not have root privileges (or more precisely, the CAP_SYS_ADMIN capability) detection will be erroneous for most parameters. Therefore, forbid non-root users to run the command. v5: - Move exported symbols to a new LIBBPF_0.0.2 section in libbpf.map (patches 4 to 6). - Minor fixes on patches 3 and 4. v4: - Probe bpf_jit_limit parameter (patch 2). - Probe some additional kernel config options (patch 3). - Minor fixes on patch 6. v3: - Do not probe kernel version in bpftool (just retrieve it to probe support for kprobes in libbpf). - Change the way results for helper support is displayed: now one list of compatible helpers for each program type (and C-style output gets a HAVE_PROG_TYPE_HELPER(prog_type, helper) macro to help with tests. See patches 6, 7. - Address other comments from feedback from v2 (please refer to individual patches' history). v2 (please also refer to individual patches' history): - Move probes for prog/map types, helpers, from bpftool to libbpf. - Move C-style output as a separate patch, and restrict it to a subset of collected information (bpf() availability, prog/map types, helpers). - Now probe helpers with all supported program types, and display a list of compatible program types (as supported on the system) for each helper. - NOT addressed: grouping compilation options for kernel into subsections (patch 3) (I don't see an easy way of grouping them at the moment, please see also the discussion on v1 thread). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:41 -08:00
Quentin Monnet	948703e808	tools: bpftool: add bash completion for bpftool probes Add the bash completion related to the newly introduced "bpftool feature probe" command. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:40 -08:00
Quentin Monnet	f9499fedf2	tools: bpftool: add probes for a network device bpftool gained support for probing the current system in order to see what program and map types, and what helpers are available on that system. This patch adds the possibility to pass an interface index to libbpf (and hence to the kernel) when trying to load the programs or to create the maps, in order to see what items a given network device can support. A new keyword "dev <ifname>" can be used as an alternative to "kernel" to indicate that the given device should be tested. If no target ("dev" or "kernel") is specified bpftool defaults to probing the kernel. Sample output: # bpftool -p feature probe dev lo { "syscall_config": { "have_bpf_syscall": true }, "program_types": { "have_sched_cls_prog_type": false, "have_xdp_prog_type": false }, ... } As the target is a network device, /proc/ parameters and kernel configuration are NOT dumped. Availability of the bpf() syscall is still probed, so we can return early if that syscall is not usable (since there is no point in attempting the remaining probes in this case). Among the program types, only the ones that can be offloaded are probed. All map types are probed, as there is no specific rule telling which one could or could not be supported by a device in the future. All helpers are probed (but only for offload-able program types). Caveat: as bpftool does not attempt to attach programs to the device at the moment, probes do not entirely reflect what the device accepts: typically, for Netronome's nfp, results will announce that TC cls offload is available even if support has been deactivated (with e.g. ethtool -K eth1 hw-tc-offload off). v2: - All helpers are probed, whereas previous version would only probe the ones compatible with an offload-able program type. This is because we do not keep a default compatible program type for each helper anymore. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:40 -08:00
Quentin Monnet	d267cff467	tools: bpftool: add C-style "#define" output for probes Make bpftool able to dump a subset of the parameters collected by probing the system as a listing of C-style #define macros, so that external projects can reuse the result of this probing and build BPF-based project in accordance with the features available on the system. The new "macros" keyword is used to select this output. An additional "prefix" keyword is added so that users can select a custom prefix for macro names, in order to avoid any namespace conflict. Sample output: # bpftool feature probe kernel macros prefix FOO_ /* System call availability / #define FOO_HAVE_BPF_SYSCALL / eBPF program types / #define FOO_HAVE_SOCKET_FILTER_PROG_TYPE #define FOO_HAVE_KPROBE_PROG_TYPE #define FOO_HAVE_SCHED_CLS_PROG_TYPE ... / eBPF map types / #define FOO_HAVE_HASH_MAP_TYPE #define FOO_HAVE_ARRAY_MAP_TYPE #define FOO_HAVE_PROG_ARRAY_MAP_TYPE ... / eBPF helper functions / / * Use FOO_HAVE_PROG_TYPE_HELPER(prog_type_name, helper_name) * to determine if <helper_name> is available for <prog_type_name>, * e.g. * #if FOO_HAVE_PROG_TYPE_HELPER(xdp, bpf_redirect) * // do stuff with this helper * #elif * // use a workaround * #endif */ #define FOO_HAVE_PROG_TYPE_HELPER(prog_type, helper) \ FOO_BPF__PROG_TYPE_ ## prog_type ## __HELPER_ ## helper ... #define FOO_BPF__PROG_TYPE_socket_filter__HELPER_bpf_probe_read 0 #define FOO_BPF__PROG_TYPE_socket_filter__HELPER_bpf_ktime_get_ns 1 #define FOO_BPF__PROG_TYPE_socket_filter__HELPER_bpf_trace_printk 1 ... v3: - Change output for helpers again: add a HAVE_PROG_TYPE_HELPER(type, helper) macro that can be used to tell if <helper> is available for program <type>. v2: - #define-based output added as a distinct patch. - "HAVE_" prefix appended to macro names. - Output limited to bpf() syscall availability, BPF prog and map types, helper functions. In this version kernel config options, procfs parameter or kernel version are intentionally left aside. - Following the change on helper probes, format for helper probes in this output style has changed (now a list of compatible program types). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:40 -08:00
Quentin Monnet	2d3ea5e85d	tools: bpftool: add probes for eBPF helper functions Similarly to what was done for program types and map types, add a set of probes to test the availability of the different eBPF helper functions on the current system. For each known program type, all known helpers are tested, in order to establish a compatibility matrix. Output is provided as a set of lists of available helpers, one per program type. Sample output: # bpftool feature probe kernel ... Scanning eBPF helper functions... eBPF helpers supported for program type socket_filter: - bpf_map_lookup_elem - bpf_map_update_elem - bpf_map_delete_elem ... eBPF helpers supported for program type kprobe: - bpf_map_lookup_elem - bpf_map_update_elem - bpf_map_delete_elem ... # bpftool --json --pretty feature probe kernel { ... "helpers": { "socket_filter_available_helpers": ["bpf_map_lookup_elem", \ "bpf_map_update_elem","bpf_map_delete_elem", ... ], "kprobe_available_helpers": ["bpf_map_lookup_elem", \ "bpf_map_update_elem","bpf_map_delete_elem", ... ], ... } } v5: - In libbpf.map, move global symbol to the new LIBBPF_0.0.2 section. v4: - Use "enum bpf_func_id" instead of "__u32" in bpf_probe_helper() declaration for the type of the argument used to pass the id of the helper to probe. - Undef BPF_HELPER_MAKE_ENTRY after using it. v3: - Do not pass kernel version from bpftool to libbpf probes (kernel version for testing program with kprobes is retrieved directly from libbpf). - Dump one list of available helpers per program type (instead of one list of compatible program types per helper). v2: - Move probes from bpftool to libbpf. - Test all program types for each helper, print a list of working prog types for each helper. - Fall back on include/uapi/linux/bpf.h for names and ids of helpers. - Remove C-style macros output from this patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:40 -08:00
Quentin Monnet	f99e166397	tools: bpftool: add probes for eBPF map types Add new probes for eBPF map types, to detect what are the ones available on the system. Try creating one map of each type, and see if the kernel complains. Sample output: # bpftool feature probe kernel ... Scanning eBPF map types... eBPF map_type hash is available eBPF map_type array is available eBPF map_type prog_array is available ... # bpftool --json --pretty feature probe kernel { ... "map_types": { "have_hash_map_type": true, "have_array_map_type": true, "have_prog_array_map_type": true, ... } } v5: - In libbpf.map, move global symbol to the new LIBBPF_0.0.2 section. v3: - Use a switch with all enum values for setting specific map parameters, so that gcc complains at compile time (-Wswitch-enum) if new map types were added to the kernel but libbpf was not updated. v2: - Move probes from bpftool to libbpf. - Remove C-style macros output from this patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:40 -08:00
Quentin Monnet	1bf4b05810	tools: bpftool: add probes for eBPF program types Introduce probes for supported BPF program types in libbpf, and call it from bpftool to test what types are available on the system. The probe simply consists in loading a very basic program of that type and see if the verifier complains or not. Sample output: # bpftool feature probe kernel ... Scanning eBPF program types... eBPF program_type socket_filter is available eBPF program_type kprobe is available eBPF program_type sched_cls is available ... # bpftool --json --pretty feature probe kernel { ... "program_types": { "have_socket_filter_prog_type": true, "have_kprobe_prog_type": true, "have_sched_cls_prog_type": true, ... } } v5: - In libbpf.map, move global symbol to a new LIBBPF_0.0.2 section. - Rename (non-API function) prog_load() as probe_load(). v3: - Get kernel version for checking kprobes availability from libbpf instead of from bpftool. Do not pass kernel_version as an argument when calling libbpf probes. - Use a switch with all enum values for setting specific program parameters just before probing, so that gcc complains at compile time (-Wswitch-enum) if new prog types were added to the kernel but libbpf was not updated. - Add a comment in libbpf.h about setrlimit() usage to allow many consecutive probe attempts. v2: - Move probes from bpftool to libbpf. - Remove C-style macros output from this patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:40 -08:00
Quentin Monnet	4567b983f7	tools: bpftool: add probes for kernel configuration options Add probes to dump a number of options set (or not set) for compiling the kernel image. These parameters provide information about what BPF components should be available on the system. A number of them are not directly related to eBPF, but are in fact used in the kernel as conditions on which to compile, or not to compile, some of the eBPF helper functions. Sample output: # bpftool feature probe kernel Scanning system configuration... ... CONFIG_BPF is set to y CONFIG_BPF_SYSCALL is set to y CONFIG_HAVE_EBPF_JIT is set to y ... # bpftool --pretty --json feature probe kernel { "system_config": { ... "CONFIG_BPF": "y", "CONFIG_BPF_SYSCALL": "y", "CONFIG_HAVE_EBPF_JIT": "y", ... } } v5: - Declare options[] array in probe_kernel_image_config() as static. v4: - Add some options to the list: - CONFIG_TRACING - CONFIG_KPROBE_EVENTS - CONFIG_UPROBE_EVENTS - CONFIG_FTRACE_SYSCALLS - Add comments about those options in the source code. v3: - Add a comment about /proc/config.gz not being supported as a path for the config file at this time. - Use p_info() instead of p_err() on failure to get options from config file, as bpftool keeps probing other parameters and that would possibly create duplicate "error" entries for JSON. v2: - Remove C-style macros output from this patch. - NOT addressed: grouping of those config options into subsections (I don't see an easy way of grouping them at the moment, please see also the discussion on v1 thread). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:40 -08:00
Quentin Monnet	7a4522bbef	tools: bpftool: add probes for /proc/ eBPF parameters Add a set of probes to dump the eBPF-related parameters available from /proc/: availability of bpf() syscall for unprivileged users, JIT compiler status and hardening status, kallsyms exports status. Sample output: # bpftool feature probe kernel Scanning system configuration... bpf() syscall for unprivileged users is enabled JIT compiler is disabled JIT compiler hardening is disabled JIT compiler kallsyms exports are disabled Global memory limit for JIT compiler for unprivileged users \ is 264241152 bytes ... # bpftool --json --pretty feature probe kernel { "system_config": { "unprivileged_bpf_disabled": 0, "bpf_jit_enable": 0, "bpf_jit_harden": 0, "bpf_jit_kallsyms": 0, "bpf_jit_limit": 264241152 }, ... } These probes are skipped if procfs is not mounted. v4: - Add bpf_jit_limit parameter. v2: - Remove C-style macros output from this patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:40 -08:00
Quentin Monnet	49eb7ab3b2	tools: bpftool: add basic probe capability, probe syscall availability Add a new component and command for bpftool, in order to probe the system to dump a set of eBPF-related parameters so that users can know what features are available on the system. Parameters are dumped in plain or JSON output (with -j/-p options). The current patch introduces probing of one simple parameter: availability of the bpf() system call. Later commits will add other probes. Sample output: # bpftool feature probe kernel Scanning system call availability... bpf() syscall is available # bpftool --json --pretty feature probe kernel { "syscall_config": { "have_bpf_syscall": true } } The optional "kernel" keyword enforces probing of the current system, which is the only possible behaviour at this stage. It can be safely omitted. The feature comes with the relevant man page, but bash completion will come in a dedicated commit. v3: - Do not probe kernel version. Contrarily to what is written below for v2, we can have the kernel version retrieved in libbpf instead of bpftool (in the patch adding probing for program types). v2: - Remove C-style macros output from this patch. - Even though kernel version is no longer needed for testing kprobes availability, note that we still collect it in this patch so that bpftool gets able to probe (in next patches) older kernels as well. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-01-22 22:15:40 -08:00
Peter Oskolkov	d0b2818efb	bpf: fix a (false) compiler warning An older GCC compiler complains: kernel/bpf/verifier.c: In function 'bpf_check': kernel/bpf/verifier.c:4*:13: error: 'prev_offset' may be used uninitialized in this function [-Werror=maybe-uninitialized] } else if (krecord[i].insn_offset <= prev_offset) { ^ kernel/bpf/verifier.c:4*:38: note: 'prev_offset' was declared here u32 i, nfuncs, urec_size, min_size, prev_offset; Although the compiler is wrong here, the patch makes sure that prev_offset is always initialized, just to silence the warning. v2: fix a spelling error in the commit message. Signed-off-by: Peter Oskolkov <posk@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-17 10:40:16 +01:00
Daniel Borkmann	4edc01b846	Merge branch 'bpf-bpftool-queue-stack' Stanislav Fomichev says: ==================== This patch series add support for queue/stack manipulations. It goes like this: commands by permitting empty keys. v2: * removed unneeded jsonw_null from patch #6 * improved bash completions (and moved them into separate patch #7) ==================== Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-17 10:30:35 +01:00
Stanislav Fomichev	55c70bffc7	bpftool: add bash completion for peek/push/enqueue/pop/dequeue bpftool map peek id <TAB> - suggests only queue and stack map ids bpftool map pop id <TAB> - suggests only stack map ids bpftool map dequeue id <TAB> - suggests only queue map ids bpftool map push id <TAB> - suggests only stack map ids bpftool map enqueue id <TAB> - suggests only queue map ids bpftool map push id 1 <TAB> - suggests 'value', not 'key' bpftool map enqueue id 2 <TAB> - suggests 'value', not 'key' bpftool map update id <stack/queue type> - suggests 'value', not 'key' bpftool map lookup id <stack/queue type> - suggests nothing Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-17 10:30:32 +01:00
Stanislav Fomichev	74f312ef84	bpftool: add pop and dequeue commands This is intended to be used with queues and stacks, it pops and prints the last element via bpf_map_lookup_and_delete_elem. Example: bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q bpftool map push pinned /sys/fs/bpf/q value 0 1 2 3 bpftool map pop pinned /sys/fs/bpf/q value: 00 01 02 03 bpftool map pop pinned /sys/fs/bpf/q Error: empty map bpftool map create /sys/fs/bpf/s type stack value 4 entries 10 name s bpftool map enqueue pinned /sys/fs/bpf/s value 0 1 2 3 bpftool map dequeue pinned /sys/fs/bpf/s value: 00 01 02 03 bpftool map dequeue pinned /sys/fs/bpf/s Error: empty map Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-17 10:30:32 +01:00
Stanislav Fomichev	549d4d3da7	bpftool: add push and enqueue commands This is intended to be used with queues and stacks and be more user-friendly than 'update' without the key. Example: bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q bpftool map push pinned /sys/fs/bpf/q value 0 1 2 3 bpftool map peek pinned /sys/fs/bpf/q value: 00 01 02 03 bpftool map create /sys/fs/bpf/s type stack value 4 entries 10 name s bpftool map enqueue pinned /sys/fs/bpf/s value 0 1 2 3 bpftool map peek pinned /sys/fs/bpf/s value: 00 01 02 03 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-17 10:30:31 +01:00
Stanislav Fomichev	66cf6e0b12	bpftool: add peek command This is intended to be used with queues and stacks and be more user-friendly than 'lookup' without key/value. Example: bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3 bpftool map peek pinned /sys/fs/bpf/q value: 00 01 02 03 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-17 10:30:31 +01:00
Stanislav Fomichev	04a5d323e6	bpftool: don't print empty key/value for maps When doing dump or lookup, don't print key if key_size == 0 or value if value_size == 0. The initial usecase is queue and stack, where we have only values. This is for regular output only, json still has all the fields. Before: bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3 bpftool map lookup pinned /sys/fs/bpf/q key: value: 00 01 02 03 After: bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3 bpftool map lookup pinned /sys/fs/bpf/q value: 00 01 02 03 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-17 10:30:31 +01:00
Stanislav Fomichev	8a89fff60a	bpftool: make key optional in lookup command Bpftool expects key for 'lookup' operations. For some map types, key should not be specified. Support looking up those map types. Before: bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3 bpftool map lookup pinned /sys/fs/bpf/q Error: did not find key After: bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3 bpftool map lookup pinned /sys/fs/bpf/q key: value: 00 01 02 03 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-17 10:30:31 +01:00
Stanislav Fomichev	7d7209cb9a	bpftool: make key and value optional in update command Bpftool expects both key and value for 'update' operations. For some map types, key should not be specified. Support updating those map types. Before: bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3 Error: did not find key After: bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-17 10:30:30 +01:00
Daniel Borkmann	e13279e211	Merge branch 'bpf-int128-btf' Yonghong Song says: ==================== Previous maximum supported integer bit width is 64. But the __int128 type has been supported by most (if not all) 64bit architectures including bpf for both gcc and clang. The kernel itself uses __int128 for x64 and arm64. Some bcc tools are using __int128 in bpf programs to describe ipv6 addresses. Without 128bit int support, the vmlinux BTF won't work and those bpf programs using __int128 cannot utilize BTF. This patch set therefore implements BTF __int128 support. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-16 22:53:46 +01:00
Yonghong Song	e86e513854	tools/bpf: support __int128 in bpftool map pretty dumper For formatted output, currently when json is enabled, the decimal number is required. Similar to kernel bpffs printout, for int128 numbers, only hex numbers are dumped, which are quoted as strings. The below is an example to show plain and json pretty print based on the map in test_btf pretty print test. $ bpftool m s 75: hash name pprint_test_has flags 0x0 key 4B value 112B max_entries 4 memlock 4096B $ bpftool m d id 75 ...... { "key": 3, "value": { "ui32": 3, "ui16": 0, "si32": -3, "unused_bits2a": 0x3, "bits28": 0x3, "unused_bits2b": 0x3, "": { "ui64": 3, "ui8a": [3,0,0,0,0,0,0,0 ] }, "aenum": 3, "ui32b": 4, "bits2c": 0x1, "si128a": 0x3, "si128b": 0xfffffffd, "bits3": 0x3, "bits80": 0x10000000000000003, "ui128": 0x20000000000000003 } }, ...... $ bptfool -p -j m d id 75 ...... { "key": ["0x03","0x00","0x00","0x00" ], "value": ["0x03","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0xfd","0xff","0xff","0xff","0x0f","0x00","0x00","0xc0", "0x03","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0x03","0x00","0x00","0x00","0x04","0x00","0x00","0x00", "0x01","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0x00","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0x03","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0x00","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0xfd","0xff","0xff","0xff","0x00","0x00","0x00","0x00", "0x00","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0x1b","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0x08","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0x03","0x00","0x00","0x00","0x00","0x00","0x00","0x00", "0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00" ], "formatted": { "key": 3, "value": { "ui32": 3, "ui16": 0, "si32": -3, "unused_bits2a": "0x3", "bits28": "0x3", "unused_bits2b": "0x3", "": { "ui64": 3, "ui8a": [3,0,0,0,0,0,0,0 ] }, "aenum": 3, "ui32b": 4, "bits2c": "0x1", "si128a": "0x3", "si128b": "0xfffffffd", "bits3": "0x3", "bits80": "0x10000000000000003", "ui128": "0x20000000000000003" } } } ...... Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-16 22:53:44 +01:00
Yonghong Song	4df3a1d0a5	tools/bpf: add bpffs pretty print test for int128 The bpffs pretty print test is extended to cover int128 types. Tested on an x64 machine. $ test_btf -p ...... BTF pretty print array(#3)......OK PASS:9 SKIP:0 FAIL:0 Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-16 22:53:44 +01:00
Yonghong Song	ce6ec47a10	tools/bpf: refactor test_btf pretty printing for multiple map value formats The test_btf pretty print is refactored in order to easily support multiple map value formats. The next patch will add __int128 type tests which needs macro guard __SIZEOF_INT128__. There is no functionality change with this patch. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-16 22:53:44 +01:00
Yonghong Song	a80eba20ed	tools/bpf: add int128 raw test in test_btf Several int128 raw type tests are added to test_btf. Currently these tests are enabled only for x64 and arm64 for which kernel has CONFIG_ARCH_SUPPORTS_INT128 set. $ test_btf ...... BTF raw test[106] (128-bit int): OK BTF raw test[107] (struct, 128-bit int member): OK BTF raw test[108] (struct, 120-bit int member bitfield): OK BTF raw test[109] (struct, kind_flag, 128-bit int member): OK BTF raw test[110] (struct, kind_flag, 120-bit int member bitfield): OK ...... Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-16 22:53:44 +01:00
Yonghong Song	b1e8818cab	bpf: btf: support 128 bit integer type Currently, btf only supports up to 64-bit integer. On the other hand, 128bit support for gcc and clang has existed for a long time. For example, both gcc 4.8 and llvm 3.7 supports types "__int128" and "unsigned __int128" for virtually all 64bit architectures including bpf. The requirement for __int128 support comes from two areas: . bpf program may use __int128. For example, some bcc tools (https://github.com/iovisor/bcc/tree/master/tools), mostly tcp v6 related, tcpstates.py, tcpaccept.py, etc., are using __int128 to represent the ipv6 addresses. . linux itself is using __int128 types. Hence supporting __int128 type in BTF is required for vmlinux BTF, which will be used by "compile once and run everywhere" and other projects. For 128bit integer, instead of base-10, hex numbers are pretty printed out as large decimal number is hard to decipher, e.g., for ipv6 addresses. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-16 22:53:44 +01:00
Stanislav Fomichev	eeedd3527d	libbpf: don't define CC and AR We are already including tools/scripts/Makefile.include which correctly handles CROSS_COMPILE, no need to define our own vars. See related commit `7ed1c1901f` ("tools: fix cross-compile var clobbering") for more details. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-01-16 22:47:43 +01:00
Randy Dunlap	ae5220c672	networking: Documentation: fix snmp_counters.rst Sphinx warnings Fix over 100 documentation warnings in snmp_counter.rst by extending the underline string lengths and inserting a blank line after bullet items. Examples: Documentation/networking/snmp_counter.rst:1: WARNING: Title overline too short. Documentation/networking/snmp_counter.rst:14: WARNING: Bullet list ends without a blank line; unexpected unindent. Fixes: `2b96547223` ("add document for TCP OFO, PAWS and skip ACK counters") Fixes: `8e2ea53a83` ("add snmp counters document") Fixes: `712ee16c23` ("add documents for snmp counters") Fixes: `80cc49507b` ("net: Add part of TCP counts explanations in snmp_counters.rst") Fixes: `b08794a922` ("documentation of some IP/ICMP snmp counters") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: yupeng <yupeng0921@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 13:29:54 -08:00
Gustavo A. R. Silva	bb3e16ad8b	net, decnet: use struct_size() in kzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 13:22:10 -08:00
Gustavo A. R. Silva	faa311e950	mlxsw: spectrum_nve: Use struct_size() in kzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 09:12:23 -08:00
Gustavo A. R. Silva	2285ec872d	mlxsw: spectrum_acl_bloom_filter: use struct_size() in kzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void entry[]; }; instance = kzalloc(sizeof(struct foo) + sizeof(void ) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This issue was detected with the help of Coccinelle. Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 09:12:23 -08:00
Sergio Paracuellos	590ce401c2	dt-bindings: net: dsa: ksz9477: fix indentation for switch spi bindings Switch bindings for spi managed mode are using spaces instead of tabs. Fix them to get a file with a proper kernel indentation style. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-15 22:29:16 -08:00
Lepton Wu	a22d325142	Fix ERROR:do not initialise statics to 0 in af_vsock.c Found by scripts/checkpatch.pl Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-15 20:38:29 -08:00
David S. Miller	9dde6da512	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2019-01-15 This series contains updates to the ice driver only. Bruce fixes an unused variable build warning, which was introduced with the commit `2fd527b72b` ("net: ndo_bridge_setlink: Add extack"). Added ethtool support for get_eeprom and get_eeprom_len operations. Added support for bringing down the PHY link optional when the interface is administratively downed. Anirudh refactors the transmit scheduler functions, which results in reduced code duplication and adds a helper function, which all the scheduler functions call instead. Added an LED blinking handler to ethtool. Reworked the queue management code to allow for reuse in future XDP feature support. Updates the driver to be able to preserve the aggregator list after reset by moving it out of port_info and into ice_hw. Added the ability to offload SCTP checksum calculation to the hardware. Added support for new PHY types, which support higher link speeds. Md Fahad makes sure that RSS lookup table and hash key get configured during the rebuild path after a reset. Brett updates the driver to set the physical link state according to the netdev state (up/down). Added support for adaptive/dynamic interrupt moderation in the ice driver, along with the ethtool operations needed. Tony adds software timestamping support by using ethtool_op_get_ts_info(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-15 15:48:00 -08:00
Jacob Keller	d671e3e0da	ice: add const qualifier to mac_addr parameter The function ice_aq_manage_mac_write takes a pointer to a MAC address. The parameter is not marked const, even though the function doesn't need to modify it. This prevents passing a parameter that is already marked const. Update the function prototype to take a const pointer, to allow passing constant pointers to this function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 12:42:38 -08:00
Anirudh Venkataramanan	aef74145f0	ice: Add support for new PHY types This patch adds code for the detection and operation of several additional PHY types that support higher link speeds. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 12:38:44 -08:00
Anirudh Venkataramanan	cf909e19ac	ice: Offload SCTP checksum This patch adds the ability to offload SCTP checksum calculations to the NIC. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 12:02:27 -08:00
Tony Nguyen	a8939784a1	ice: Allow for software timestamping Use ethtool_op_get_ts_info to provide software timestamping. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 11:56:26 -08:00
Brett Creeley	67fe64d78c	ice: Implement getting and setting ethtool coalesce This patch includes the following ethtool operations: 1. get_coalesce 2. set_coalesce 3. get_per_q_coalesce 4. set_per_q_coalesce Each ITR value (current_itr/target_itr) are stored on a per ice_ring_container basis. This is because each valid ice_ring_container can have 1 or more rings that are tied to the same q_vector ITR index. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 11:50:05 -08:00
Brett Creeley	63f545ed12	ice: Add support for adaptive interrupt moderation Currently the driver does not support adaptive/dynamic interrupt moderation. This patch adds support for this. Also, adaptive/dynamic interrupt moderation is turned on by default upon driver load. In order to support adaptive interrupt moderation, two functions were added, ice_update_itr() and ice_itr_divisor(). These are used to determine the current packet load and to determine a divisor based on link speed respectively. This patch also adds the ICE_ITR_GRAN_S define that is used in the hot-path when setting a new ITR value. The shift is used to pet two birds with one hand, set the ITR value while re-enabling the interrupt. Also, the ICE_ITR_GRAN_S is defined as 1 because the device has a ITR granularity of 2usecs. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 11:29:16 -08:00
Anirudh Venkataramanan	9be1d6f8c3	ice: Move aggregator list into ice_hw instance The aggregator list needs to be preserved for use after a reset. This patch moves it out of the port_info instance and into the ice_hw instance. Signed-off-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 11:21:13 -08:00
Anirudh Venkataramanan	03f7a98668	ice: Rework queue management code for reuse This patch reworks the queue management code to allow for reuse with the XDP feature (to be added in a future patch). Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 11:11:10 -08:00
Bruce Allan	ab4ab73fc1	ice: Add ethtool private flag to make forcing link down optional Add new infrastructure for implementing ethtool private flags using the existing pf->flags bitmap to store them, and add the link-down-on-close ethtool private flag to optionally bring down the PHY link when the interface is administratively downed. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 10:32:59 -08:00
Brett Creeley	b6f934f027	ice: Set physical link up/down when an interface is set up/down When a netdev is set up/down we need to set the phsyical link state accordingly. This patch adds that functionality by calling ice_force_phys_link_state(vsi, link_up) in both the ice_stop() and ice_open() paths. In order to force link, ice_force_phys_link_state(vsi, link_up) will first determine the current phy capabilities. If link has not changed there is nothing to do. If link has changed, previous PHY capabilities are saved and the "Enable Automatic Link Update" and "Link Establishment State Machine (LESM)" enable bits are set. Then the new PHY config is saved. The "Enable Automatic Link Update" will force the FW to execute Setup link and restart auto-negotiation. This should then result in a "Link Status Event (LSE)" which will cause the driver to get the current link status. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 10:27:18 -08:00
Bruce Allan	4c98ab550c	ice: Implement support for normal get_eeprom[_len] ethtool ops Add support for get_eeprom and get_eeprom_len ethtool ops Specification states that PF software accesses NVM (shadow-ram) via AQ commands (e.g. NVM Read, NVM Write) in the range 0x000000-0x00FFFF (64KB), so the get_eeprom_len op should return 64KB. If additional regions of the 16MB NVM must be read, another access method must be used. The ethtool kernel code, by default, will ask for multiple page-size hunks of the NVM not to exceed the value returned by ice_get_eeprom_len(). ice_read_sr_buf() deals with arch page sizes different than 4KB. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 10:20:43 -08:00
Anirudh Venkataramanan	8e151d50a1	ice: Add ethtool set_phys_id handler Add led blinking handler to ethtool. Since led blinking is controlled by FW/HW only ETHTOOL_ID_ACTIVE and ETHTOOL_ID_INACTIVE are really needed. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-01-15 10:07:08 -08:00

1 2 3 4 5 ...

810758 Commits