Commit Graph

1088896 Commits

Author SHA1 Message Date
Alan Maguire
0f8619929c libbpf: Usdt aarch64 arg parsing support
Parsing of USDT arguments is architecture-specific. On aarch64 it is
relatively easy since registers used are x[0-31] and sp. Format is
slightly different compared to x86_64. Possible forms are:

- "size@[reg[,offset]]" for dereferences, e.g. "-8@[sp,76]" and "-4@[sp]";
- "size@reg" for register values, e.g. "-4@x0";
- "size@value" for raw values, e.g. "-8@1".

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649690496-1902-2-git-send-email-alan.maguire@oracle.com
2022-04-11 15:32:28 -07:00
Yuntao Wang
aa1b02e674 bpf: Remove redundant assignment to meta.seq in __task_seq_show()
The seq argument is assigned to meta.seq twice, the second one is
redundant, remove it.

This patch also removes a redundant space in bpf_iter_link_attach().

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220410060020.307283-1-ytcoode@gmail.com
2022-04-11 21:14:34 +02:00
Geliang Tang
f4fd706f73 selftests/bpf: Drop duplicate max/min definitions
Drop duplicate macros min() and MAX() definitions in prog_tests and use
MIN() or MAX() in sys/param.h instead.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/1ae276da9925c2de59b5bdc93b693b4c243e692e.1649462033.git.geliang.tang@suse.com
2022-04-11 17:18:09 +02:00
Pu Lehui
dd642ccb45 riscv, bpf: Implement more atomic operations for RV64
This patch implement more BPF atomic operations for RV64. The newly
added operations are shown below:

  atomic[64]_[fetch_]add
  atomic[64]_[fetch_]and
  atomic[64]_[fetch_]or
  atomic[64]_xchg
  atomic[64]_cmpxchg

Since riscv specification does not provide AMO instruction for CAS
operation, we use lr/sc instruction for cmpxchg operation, and AMO
instructions for the rest ops.

Tests "test_bpf.ko" and "test_progs -t atomic" have passed, as well
as "test_verifier" with no new failure cases.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20220410101246.232875-1-pulehui@huawei.com
2022-04-11 16:54:54 +02:00
Andrii Nakryiko
33fc250c3e Merge branch 'bpf: RLIMIT_MEMLOCK cleanups'
Yafang Shao says:

====================

We have switched to memcg-based memory accouting and thus the rlimit is
not needed any more. LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK was introduced in
libbpf for backward compatibility, so we can use it instead now.

This patchset cleanups the usage of RLIMIT_MEMLOCK in tools/bpf/,
tools/testing/selftests/bpf and samples/bpf. The file
tools/testing/selftests/bpf/bpf_rlimit.h is removed. The included header
sys/resource.h is removed from many files as it is useless in these files.

- v4: Squash patches and use customary subject prefixes. (Andrii)
- v3: Get rid of bpf_rlimit.h and fix some typos (Andrii)
- v2: Use libbpf_set_strict_mode instead. (Andrii)
- v1: https://lore.kernel.org/bpf/20220320060815.7716-2-laoar.shao@gmail.com/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-04-10 20:17:17 -07:00
Yafang Shao
451b5fbc2c tools/runqslower: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK
Explicitly set libbpf 1.0 API mode, then we can avoid using the deprecated
RLIMIT_MEMLOCK.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409125958.92629-5-laoar.shao@gmail.com
2022-04-10 20:17:16 -07:00
Yafang Shao
a777e18f1b bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK
We have switched to memcg-based memory accouting and thus the rlimit is
not needed any more. LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK was introduced in
libbpf for backward compatibility, so we can use it instead now.

libbpf_set_strict_mode always return 0, so we don't need to check whether
the return value is 0 or not.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409125958.92629-4-laoar.shao@gmail.com
2022-04-10 20:17:16 -07:00
Yafang Shao
b858ba8c52 selftests/bpf: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK
We have switched to memcg-based memory accouting and thus the rlimit is
not needed any more. LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK was introduced in
libbpf for backward compatibility, so we can use it instead now. After
this change, the header tools/testing/selftests/bpf/bpf_rlimit.h can be
removed.

This patch also removes the useless header sys/resource.h from many files
in tools/testing/selftests/bpf/.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409125958.92629-3-laoar.shao@gmail.com
2022-04-10 20:17:16 -07:00
Yafang Shao
b25acdafd3 samples/bpf: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK
We have switched to memcg-based memory accouting and thus the rlimit is
not needed any more. LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK was introduced in
libbpf for backward compatibility, so we can use it instead now.

This patch also removes the useless header sys/resource.h from many files
in samples/bpf.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409125958.92629-2-laoar.shao@gmail.com
2022-04-10 20:17:15 -07:00
Runqing Yang
d252a4a499 libbpf: Fix a bug with checking bpf_probe_read_kernel() support in old kernels
Background:
Libbpf automatically replaces calls to BPF bpf_probe_read_{kernel,user}
[_str]() helpers with bpf_probe_read[_str](), if libbpf detects that
kernel doesn't support new APIs. Specifically, libbpf invokes the
probe_kern_probe_read_kernel function to load a small eBPF program into
the kernel in which bpf_probe_read_kernel API is invoked and lets the
kernel checks whether the new API is valid. If the loading fails, libbpf
considers the new API invalid and replaces it with the old API.

static int probe_kern_probe_read_kernel(void)
{
	struct bpf_insn insns[] = {
		BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),	/* r1 = r10 (fp) */
		BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),	/* r1 += -8 */
		BPF_MOV64_IMM(BPF_REG_2, 8),		/* r2 = 8 */
		BPF_MOV64_IMM(BPF_REG_3, 0),		/* r3 = 0 */
		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
		BPF_EXIT_INSN(),
	};
	int fd, insn_cnt = ARRAY_SIZE(insns);

	fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL,
                           "GPL", insns, insn_cnt, NULL);
	return probe_fd(fd);
}

Bug:
On older kernel versions [0], the kernel checks whether the version
number provided in the bpf syscall, matches the LINUX_VERSION_CODE.
If not matched, the bpf syscall fails. eBPF However, the
probe_kern_probe_read_kernel code does not set the kernel version
number provided to the bpf syscall, which causes the loading process
alwasys fails for old versions. It means that libbpf will replace the
new API with the old one even the kernel supports the new one.

Solution:
After a discussion in [1], the solution is using BPF_PROG_TYPE_TRACEPOINT
program type instead of BPF_PROG_TYPE_KPROBE because kernel does not
enfoce version check for tracepoint programs. I test the patch in old
kernels (4.18 and 4.19) and it works well.

  [0] https://elixir.bootlin.com/linux/v4.19/source/kernel/bpf/syscall.c#L1360
  [1] Closes: https://github.com/libbpf/libbpf/issues/473

Signed-off-by: Runqing Yang <rainkin1993@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409144928.27499-1-rainkin1993@gmail.com
2022-04-10 20:15:22 -07:00
Mykola Lysenko
61ddff373f selftests/bpf: Improve by-name subtest selection logic in prog_tests
Improve subtest selection logic when using -t/-a/-d parameters.
In particular, more than one subtest can be specified or a
combination of tests / subtests.

-a send_signal -d send_signal/send_signal_nmi* - runs send_signal
test without nmi tests

-a send_signal/send_signal_nmi*,find_vma - runs two send_signal
subtests and find_vma test

-a 'send_signal*' -a find_vma -d send_signal/send_signal_nmi* -
runs 2 send_signal test and find_vma test. Disables two send_signal
nmi subtests

-t send_signal -t find_vma - runs two *send_signal* tests and one
*find_vma* test

This will allow us to have granular control over which subtests
to disable in the CI system instead of disabling whole tests.

Also, add new selftest to avoid possible regression when
changing prog_test test name selection logic.

Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409001750.529930-1-mykolal@fb.com
2022-04-10 20:08:20 -07:00
Vladimir Isaev
0738599856 libbpf: Add ARC support to bpf_tracing.h
Add PT_REGS macros suitable for ARCompact and ARCv2.

Signed-off-by: Vladimir Isaev <isaev@synopsys.com>
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220408224442.599566-1-geomatsi@gmail.com
2022-04-10 18:53:37 -07:00
Jakub Kicinski
34ba23b44c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-04-09

We've added 63 non-merge commits during the last 9 day(s) which contain
a total of 68 files changed, 4852 insertions(+), 619 deletions(-).

The main changes are:

1) Add libbpf support for USDT (User Statically-Defined Tracing) probes.
   USDTs are an abstraction built on top of uprobes, critical for tracing
   and BPF, and widely used in production applications, from Andrii Nakryiko.

2) While Andrii was adding support for x86{-64}-specific logic of parsing
   USDT argument specification, Ilya followed-up with USDT support for s390
   architecture, from Ilya Leoshkevich.

3) Support name-based attaching for uprobe BPF programs in libbpf. The format
   supported is `u[ret]probe/binary_path:[raw_offset|function[+offset]]`, e.g.
   attaching to libc malloc can be done in BPF via SEC("uprobe/libc.so.6:malloc")
   now, from Alan Maguire.

4) Various load/store optimizations for the arm64 JIT to shrink the image
   size by using arm64 str/ldr immediate instructions. Also enable pointer
   authentication to verify return address for JITed code, from Xu Kuohai.

5) BPF verifier fixes for write access checks to helper functions, e.g.
   rd-only memory from bpf_*_cpu_ptr() must not be passed to helpers that
   write into passed buffers, from Kumar Kartikeya Dwivedi.

6) Fix overly excessive stack map allocation for its base map structure and
   buckets which slipped-in from cleanups during the rlimit accounting removal
   back then, from Yuntao Wang.

7) Extend the unstable CT lookup helpers for XDP and tc/BPF to report netfilter
   connection tracking tuple direction, from Lorenzo Bianconi.

8) Improve bpftool dump to show BPF program/link type names, Milan Landaverde.

9) Minor cleanups all over the place from various others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits)
  bpf: Fix excessive memory allocation in stack_map_alloc()
  selftests/bpf: Fix return value checks in perf_event_stackmap test
  selftests/bpf: Add CO-RE relos into linked_funcs selftests
  libbpf: Use weak hidden modifier for USDT BPF-side API functions
  libbpf: Don't error out on CO-RE relos for overriden weak subprogs
  samples, bpf: Move routes monitor in xdp_router_ipv4 in a dedicated thread
  libbpf: Allow WEAK and GLOBAL bindings during BTF fixup
  libbpf: Use strlcpy() in path resolution fallback logic
  libbpf: Add s390-specific USDT arg spec parsing logic
  libbpf: Make BPF-side of USDT support work on big-endian machines
  libbpf: Minor style improvements in USDT code
  libbpf: Fix use #ifdef instead of #if to avoid compiler warning
  libbpf: Potential NULL dereference in usdt_manager_attach_usdt()
  selftests/bpf: Uprobe tests should verify param/return values
  libbpf: Improve string parsing for uprobe auto-attach
  libbpf: Improve library identification for uprobe binary path resolution
  selftests/bpf: Test for writes to map key from BPF helpers
  selftests/bpf: Test passing rdonly mem to global func
  bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access
  bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access
  ...
====================

Link: https://lore.kernel.org/r/20220408231741.19116-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08 17:07:29 -07:00
Yuntao Wang
b45043192b bpf: Fix excessive memory allocation in stack_map_alloc()
The 'n_buckets * (value_size + sizeof(struct stack_map_bucket))' part of the
allocated memory for 'smap' is never used after the memlock accounting was
removed, thus get rid of it.

[ Note, Daniel:

Commit b936ca643a ("bpf: rework memlock-based memory accounting for maps")
moved `cost += n_buckets * (value_size + sizeof(struct stack_map_bucket))`
up and therefore before the bpf_map_area_alloc() allocation, sigh. In a later
step commit c85d69135a ("bpf: move memory size checks to bpf_map_charge_init()"),
and the overflow checks of `cost >= U32_MAX - PAGE_SIZE` moved into
bpf_map_charge_init(). And then 370868107b ("bpf: Eliminate rlimit-based
memory accounting for stackmap maps") finally removed the bpf_map_charge_init().
Anyway, the original code did the allocation same way as /after/ this fix. ]

Fixes: b936ca643a ("bpf: rework memlock-based memory accounting for maps")
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220407130423.798386-1-ytcoode@gmail.com
2022-04-09 00:28:21 +02:00
Bert Kenward
bd4a2697e5 sfc: use hardware tx timestamps for more than PTP
The 8000 series and newer NICs all get hardware timestamps from the MAC
 and can provide timestamps on a normal TX queue, rather than via a slow
 path through the MC. As such we can use this path for any packet where a
 hardware timestamp is requested.
This also enables support for PTP over transports other than IPv4+UDP.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: Edward Cree <ecree@xilinx.com>
Link: https://lore.kernel.org/r/510652dc-54b4-0e11-657e-e37ee3ca26a9@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08 14:43:10 -07:00
Marek Vasut
58389c00d4 net: phy: micrel: ksz9031/ksz9131: add cabletest support
Add cable test support for Micrel KSZ9x31 PHYs.

Tested on i.MX8M Mini with KSZ9131RNX in 100/Full mode with pairs shuffled
before magnetics:
(note: Cable test started/completed messages are omitted)

  mx8mm-ksz9131-a-d-connected$ ethtool --cable-test eth0
  Pair A code OK
  Pair B code Short within Pair
  Pair B, fault length: 0.80m
  Pair C code Short within Pair
  Pair C, fault length: 0.80m
  Pair D code OK

  mx8mm-ksz9131-a-b-connected$ ethtool --cable-test eth0
  Pair A code OK
  Pair B code OK
  Pair C code Short within Pair
  Pair C, fault length: 0.00m
  Pair D code Short within Pair
  Pair D, fault length: 0.00m

Tested on R8A77951 Salvator-XS with KSZ9031RNX and all four pairs connected:
(note: Cable test started/completed messages are omitted)

  r8a7795-ksz9031-all-connected$ ethtool --cable-test eth0
  Pair A code OK
  Pair B code OK
  Pair C code OK
  Pair D code OK

The CTRL1000 CTL1000_ENABLE_MASTER and CTL1000_AS_MASTER bits are not
restored by calling phy_init_hw(), they must be manually cached in
ksz9x31_cable_test_start() and restored at the end of
ksz9x31_cable_test_get_status().

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Oleksij Rempel <linux@rempel-privat.de>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220407105534.85833-1-marex@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08 14:15:37 -07:00
Yuntao Wang
658d87687c selftests/bpf: Fix return value checks in perf_event_stackmap test
The bpf_get_stackid() function may also return 0 on success as per UAPI BPF
helper documentation. Therefore, correct checks from 'val > 0' to 'val >= 0'
to ensure that they cover all possible success return values.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408041452.933944-1-ytcoode@gmail.com
2022-04-08 22:38:17 +02:00
Andrii Nakryiko
8555defe48 selftests/bpf: Add CO-RE relos into linked_funcs selftests
Add CO-RE relocations into __weak subprogs for multi-file linked_funcs
selftest to make sure libbpf handles such combination well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408181425.2287230-4-andrii@kernel.org
2022-04-08 22:24:15 +02:00
Andrii Nakryiko
2fa5b0f290 libbpf: Use weak hidden modifier for USDT BPF-side API functions
Use __weak __hidden for bpf_usdt_xxx() APIs instead of much more
confusing `static inline __noinline`. This was previously impossible due
to libbpf erroring out on CO-RE relocations pointing to eliminated weak
subprogs. Now that previous patch fixed this issue, switch back to
__weak __hidden as it's a more direct way of specifying the desired
behavior.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408181425.2287230-3-andrii@kernel.org
2022-04-08 22:24:15 +02:00
Andrii Nakryiko
e89d57d938 libbpf: Don't error out on CO-RE relos for overriden weak subprogs
During BPF static linking, all the ELF relocations and .BTF.ext
information (including CO-RE relocations) are preserved for __weak
subprograms that were logically overriden by either previous weak
subprogram instance or by corresponding "strong" (non-weak) subprogram.
This is just how native user-space linkers work, nothing new.

But libbpf is over-zealous when processing CO-RE relocation to error out
when CO-RE relocation belonging to such eliminated weak subprogram is
encountered. Instead of erroring out on this expected situation, log
debug-level message and skip the relocation.

Fixes: db2b8b0642 ("libbpf: Support CO-RE relocations for multi-prog sections")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408181425.2287230-2-andrii@kernel.org
2022-04-08 22:24:15 +02:00
Lorenzo Bianconi
587323cf6a samples, bpf: Move routes monitor in xdp_router_ipv4 in a dedicated thread
In order to not miss any netlink message from the kernel, move routes
monitor to a dedicated thread.

Fixes: 85bf1f5169 ("samples: bpf: Convert xdp_router_ipv4 to XDP samples helper")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/e364b817c69ded73be24b677ab47a157f7c21b64.1649167911.git.lorenzo@kernel.org
2022-04-08 22:10:57 +02:00
Andrii Nakryiko
3a06ec0a99 libbpf: Allow WEAK and GLOBAL bindings during BTF fixup
During BTF fix up for global variables, global variable can be global
weak and will have STB_WEAK binding in ELF. Support such global
variables in addition to non-weak ones.

This is not the problem when using BPF static linking, as BPF static
linker "fixes up" BTF during generation so that libbpf doesn't have to
do it anymore during bpf_object__open(), which led to this not being
noticed for a while, along with a pretty rare (currently) use of __weak
variables and maps.

Reported-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220407230446.3980075-2-andrii@kernel.org
2022-04-08 09:16:09 -07:00
Andrii Nakryiko
3c0dfe6e4c libbpf: Use strlcpy() in path resolution fallback logic
Coverity static analyzer complains that strcpy() can cause buffer
overflow. Use libbpf_strlcpy() instead to be 100% sure this doesn't
happen.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220407230446.3980075-1-andrii@kernel.org
2022-04-08 09:16:09 -07:00
Andrii Nakryiko
700a6ef1fa Merge branch 'Add USDT support for s390'
Ilya Leoshkevich says:

====================

This series adds USDT support for s390, making the "usdt" test pass
there. Patch 1 is a collection of minor cleanups, patch 2 adds
BPF-side support, patch 3 adds userspace-side support.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-04-08 07:04:25 -07:00
Ilya Leoshkevich
bd022685bd libbpf: Add s390-specific USDT arg spec parsing logic
The logic is superficially similar to that of x86, but the small
differences (no need for register table and dynamic allocation of
register names, no $ sign before constants) make maintaining a common
implementation too burdensome. Therefore simply add a s390x-specific
version of parse_usdt_arg().

Note that while bcc supports index registers, this patch does not. This
should not be a problem in most cases, since s390 uses a default value
"nor" for STAP_SDT_ARG_CONSTRAINT.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220407214411.257260-4-iii@linux.ibm.com
2022-04-08 07:04:20 -07:00
David S. Miller
85b15c268f Merge branch 'net-sched-offload-failure-error-reporting'
Ido Schimmel says:

====================
net/sched: Better error reporting for offload failures

This patchset improves error reporting to user space when offload fails
during the flow action setup phase. That is, when failures occur in the
actions themselves, even before calling device drivers. Requested /
reported in [1].

This is done by passing extack to the offload_act_setup() callback and
making use of it in the various actions.

Patches #1-#2 change matchall and flower to log error messages to user
space in accordance with the verbose flag.

Patch #3 passes extack to the offload_act_setup() callback from the
various call sites, including matchall and flower.

Patches #4-#11 make use of extack in the various actions to report
offload failures.

Patch #12 adds an error message when the action does not support offload
at all.

Patches #13-#14 change matchall and flower to stop overwriting more
specific error messages.

[1] https://lore.kernel.org/netdev/20220317185249.5mff5u2x624pjewv@skbuf/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:44 +01:00
Ido Schimmel
fd23e0e250 net/sched: flower: Avoid overwriting error messages
The various error paths of tc_setup_offload_action() now report specific
error messages. Remove the generic messages to avoid overwriting the
more specific ones.

Before:

 # tc filter add dev dummy0 ingress pref 1 proto ip flower skip_sw dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 Error: cls_flower: Failed to setup flow action.
 We have an error talking to the kernel

After:

 # tc filter add dev dummy0 ingress pref 1 proto ip flower skip_sw dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 Error: act_police: Offload not supported when conform/exceed action is "reclassify".
 We have an error talking to the kernel

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
0cba5c34b8 net/sched: matchall: Avoid overwriting error messages
The various error paths of tc_setup_offload_action() now report specific
error messages. Remove the generic messages to avoid overwriting the
more specific ones.

Before:

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

After:

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000
 Error: act_police: Offload not supported when conform/exceed action is "reclassify".
 We have an error talking to the kernel

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
c440615ffb net/sched: cls_api: Add extack message for unsupported action offload
For better error reporting to user space, add an extack message when the
requested action does not support offload.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action nat ingress 192.0.2.1 198.51.100.1
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-181     [000] b..1.    88.406093: netlink_extack: msg=Action does not support offload
       tc-181     [000] .....    88.406108: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
f8fab31694 net/sched: act_vlan: Add extack message for offload failure
For better error reporting to user space, add an extack message when
vlan action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than the current
set of modes.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
ee367d44b9 net/sched: act_tunnel_key: Add extack message for offload failure
For better error reporting to user space, add an extack message when
tunnel_key action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than set/release
modes.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
a9c64939b6 net/sched: act_skbedit: Add extack messages for offload failure
For better error reporting to user space, add extack messages when
skbedit action offload fails.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action skbedit queue_mapping 1234
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-185     [002] b..1.    31.802414: netlink_extack: msg=act_skbedit: Offload not supported when "queue_mapping" option is used
       tc-185     [002] .....    31.802418: netlink_extack: msg=cls_matchall: Failed to setup flow action

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action skbedit inheritdsfield
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-187     [002] b..1.    45.985145: netlink_extack: msg=act_skbedit: Offload not supported when "inheritdsfield" option is used
       tc-187     [002] .....    45.985160: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
b50e462bc2 net/sched: act_police: Add extack messages for offload failure
For better error reporting to user space, add extack messages when
police action offload fails.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-182     [000] b..1.    21.592969: netlink_extack: msg=act_police: Offload not supported when conform/exceed action is "reclassify"
       tc-182     [000] .....    21.592982: netlink_extack: msg=cls_matchall: Failed to setup flow action

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000 conform-exceed drop/continue
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-184     [000] b..1.    38.882579: netlink_extack: msg=act_police: Offload not supported when conform/exceed action is "continue"
       tc-184     [000] .....    38.882593: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
bf3b99e4f9 net/sched: act_pedit: Add extack message for offload failure
For better error reporting to user space, add an extack message when
pedit action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than set/add
commands.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
bca3821d19 net/sched: act_mpls: Add extack messages for offload failure
For better error reporting to user space, add extack messages when mpls
action offload fails.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action mpls dec_ttl
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-182     [000] b..1.    18.693915: netlink_extack: msg=act_mpls: Offload not supported when "dec_ttl" option is used
       tc-182     [000] .....    18.693921: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
4dcaa50d02 net/sched: act_mirred: Add extack message for offload failure
For better error reporting to user space, add an extack message when
mirred action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than ingress/egress
mirror/redirect.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
69642c2ab2 net/sched: act_gact: Add extack messages for offload failure
For better error reporting to user space, add extack messages when gact
action offload fails.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action continue
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-181     [002] b..1.   105.493450: netlink_extack: msg=act_gact: Offload of "continue" action is not supported
       tc-181     [002] .....   105.493466: netlink_extack: msg=cls_matchall: Failed to setup flow action

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action reclassify
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-183     [002] b..1.   124.126477: netlink_extack: msg=act_gact: Offload of "reclassify" action is not supported
       tc-183     [002] .....   124.126489: netlink_extack: msg=cls_matchall: Failed to setup flow action

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action pipe action drop
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-185     [002] b..1.   137.097791: netlink_extack: msg=act_gact: Offload of "pipe" action is not supported
       tc-185     [002] .....   137.097804: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
c2ccf84ecb net/sched: act_api: Add extack to offload_act_setup() callback
The callback is used by various actions to populate the flow action
structure prior to offload. Pass extack to this callback so that the
various actions will be able to report accurate error messages to user
space.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
11c95317bc net/sched: flower: Take verbose flag into account when logging error messages
The verbose flag was added in commit 81c7288b17 ("sched: cls: enable
verbose logging") to avoid suppressing logging of error messages that
occur "when the rule is not to be exclusively executed by the hardware".

However, such error messages are currently suppressed when setup of flow
action fails. Take the verbose flag into account to avoid suppressing
error messages. This is done by using the extack pointer initialized by
tc_cls_common_offload_init(), which performs the necessary checks.

Before:

 # tc filter add dev dummy0 ingress pref 1 proto ip flower dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 # tc filter add dev dummy0 ingress pref 2 proto ip flower verbose dst_ip 198.51.100.1 action police rate 100Mbit burst 10000

After:

 # tc filter add dev dummy0 ingress pref 1 proto ip flower dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 # tc filter add dev dummy0 ingress pref 2 proto ip flower verbose dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 Warning: cls_flower: Failed to setup flow action.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
4c096ea2d6 net/sched: matchall: Take verbose flag into account when logging error messages
The verbose flag was added in commit 81c7288b17 ("sched: cls: enable
verbose logging") to avoid suppressing logging of error messages that
occur "when the rule is not to be exclusively executed by the hardware".

However, such error messages are currently suppressed when setup of flow
action fails. Take the verbose flag into account to avoid suppressing
error messages. This is done by using the extack pointer initialized by
tc_cls_common_offload_init(), which performs the necessary checks.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:42 +01:00
David S. Miller
4a778f3d53 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-04-07

Alexander Lobakin says:

This hunts down several places around packet templates/dummies for
switch rules which are either repetitive, fragile or just not
really readable code.
It's a common need to add new packet templates and to review such
changes as well, try to simplify both with the help of a pair
macros and aliases.
ice_find_dummy_packet() became very complex at this point with tons
of nested if-elses. It clearly showed this approach does not scale,
so convert its logics to the simple mask-match + static const array.

bloat-o-meter is happy about that (built w/ LLVM 13):

add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056)
Function                                     old     new   delta
ice_fill_adv_dummy_packet                    289     291      +2
ice_adv_add_update_vsi_list                  201       -    -201
ice_add_adv_rule                            2950    2093    -857
Total: Before=414512, After=413456, chg -0.25%
add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672)
RO Data                                      old     new   delta
ice_dummy_pkt_profiles                         -     672    +672
Total: Before=37895, After=38567, chg +1.77%

Diffstat also looks nice, and adding new packet templates now takes
less lines.

We'll probably come out with dynamic template crafting in a while,
but for now let's improve what we have currently.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:41:31 +01:00
David S. Miller
e89006be0b Merge branch 'aspeed-mdio-c45'
Potin Lai says:

====================
mdio: aspeed: Add Clause 45 support for Aspeed MDIO

This patch series add Clause 45 support for Aspeed MDIO driver, and
separate c22 and c45 implementation into different functions.

LINK: [v1] https://lore.kernel.org/all/20220329161949.19762-1-potin.lai@quantatw.com/
LINK: [v2] https://lore.kernel.org/all/20220406170055.28516-1-potin.lai@quantatw.com/

Changes v2 --> v3:
 - sort local variable sequence in reverse Christmas tree format.

Changes v1 --> v2:
 - add C45 to probe_capabilities
 - break one patch into 3 small patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 12:20:52 +01:00
Potin Lai
e6df1b4a27 net: mdio: aspeed: Add c45 support
Add Clause 45 support for Aspeed mdio driver.

Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 12:20:52 +01:00
Potin Lai
eb05719323 net: mdio: aspeed: Introduce read write function for c22 and c45
Add following additional functions to move out the implementation from
aspeed_mdio_read() and aspeed_mdio_write().

c22:
 - aspeed_mdio_read_c22()
 - aspeed_mdio_write_c22()

c45:
 - aspeed_mdio_read_c45()
 - aspeed_mdio_write_c45()

Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 12:20:52 +01:00
Potin Lai
737ca35256 net: mdio: aspeed: move reg accessing part into separate functions
Add aspeed_mdio_op() and aseed_mdio_get_data() for register accessing.

aspeed_mdio_op() handles operations, write command to control register,
then check and wait operations is finished (bit 31 is cleared).

aseed_mdio_get_data() fetchs the result value of operation from data
register.

Signed-off-by: Potin Lai <potin.lai@quantatw.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 12:20:52 +01:00
Jakub Kicinski
e05afd0848 net: atm: remove the ambassador driver
The driver for ATM Ambassador devices spews build warnings on
microblaze. The virt_to_bus() calls discard the volatile keyword.
The right thing to do would be to migrate this driver to a modern
DMA API but it seems unlikely anyone is actually using it.
There had been no fixes or functional changes here since
the git era begun.

In fact it sounds like the FW loading was broken from 2008
'til 2012 - see commit fcdc90b025 ("atm: forever loop loading
ambassador firmware").

Let's remove this driver, there isn't much changing in the APIs,
if users come forward we can apologize and revert.

Link: https://lore.kernel.org/all/20220321144013.440d7fc0@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 12:06:11 +01:00
David S. Miller
6e8805de30 Merge branch 'bnxt-xdp-multi-buffer'
Michael Chan says:

====================
bnxt: Support XDP multi buffer

This series adds XDP multi buffer support, allowing MTU to go beyond
the page size limit.

v4: Rebase with latest net-next
v3: Simplify page mode buffer size calculation
    Check to make sure XDP program supports multipage packets
v2: Fix uninitialized variable warnings in patch 1 and 10.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:48 +01:00
Andy Gospodarek
9f4b28301c bnxt: XDP multibuffer enablement
Allow aggregation buffers to be in place in the receive path and
allow XDP programs to be attached when using a larger than 4k MTU.

v3: Add a check to sure XDP program supports multipage packets.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:48 +01:00
Andy Gospodarek
a7559bc8c1 bnxt: support transmit and free of aggregation buffers
This patch adds the following features:
- Support for XDP_TX and XDP_DROP action when using xdp_buff
  with frags
- Support for freeing all frags attached to an xdp_buff
- Cleanup of TX ring buffers after transmits complete
- Slight change in definition of bnxt_sw_tx_bd since nr_frags
  and RX producer may both need to be used
- Clear out skb_shared_info at the end of the buffer

v2: Fix uninitialized variable warning in bnxt_xdp_buff_frags_free().

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:48 +01:00
Andy Gospodarek
1dc4c557bf bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff
Since we have an xdp_buff with frags there needs to be a way to
convert that into a valid sk_buff in the event that XDP_PASS is
the resulting operation.  This adds a new rx_skb_func when the
netdev has an MTU that prevents the packets from sitting in a
single page.

This also make sure that GRO/LRO stay disabled even when using
the aggregation ring for large buffers.

v3: Use BNXT_PAGE_MODE_BUF_SIZE for build_skb

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:48 +01:00