linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-25 21:51:40 +00:00

A mirror of the official Linux kernel repository just in case

Alexei Starovoitov fec56f5890 bpf: Introduce BPF trampoline	2019-11-15 23:41:51 +01:00

Introduce BPF trampoline concept to allow kernel code to call into BPF programs with practically zero overhead. The trampoline generation logic is architecture dependent. It converts the native calling convention into the BPF calling convention. BPF ISA is 64-bit (even on 32-bit architectures). The registers R1 to R5 are used to pass arguments into BPF functions. The main BPF program accepts only a single argument "ctx" in R1, whereas the CPU native calling convention is different. x86-64 passes the first 6 arguments in registers and the rest on the stack. x86-32 passes the first 3 arguments in registers. sparc64 passes the first 6 in registers. And so on.

The trampolines between BPF and kernel already exist. BPF_CALL_x macros in include/linux/filter.h statically compile trampolines from BPF into kernel helpers. They convert up to five u64 arguments into kernel C pointers and integers. On 64-bit architectures these BPF_to_kernel trampolines are nops. On 32-bit architectures they're meaningful.

The opposite job, kernel_to_BPF trampolines, is done by CAST_TO_U64 macros and __bpf_trace_##call() shim functions in include/trace/bpf_probe.h. They convert kernel function arguments into an array of u64s that the BPF program consumes via the R1=ctx pointer.

This patch set does the same job as the __bpf_trace_##call() static trampolines, but dynamically for any kernel function. There are ~22k global kernel functions that are attachable via nop at function entry. The function arguments and types are described in BTF. The job of the btf_distill_func_proto() function is to extract useful information from BTF into a "function model" that architecture dependent trampoline generators will use to generate assembly code to cast kernel function arguments into an array of u64s. For example, the kernel function eth_type_trans takes two pointers. They will be cast to u64 and stored into the stack of the generated trampoline. The pointer to that stack space will be passed into the BPF program in R1. On x86-64 such a generated trampoline will consume 16 bytes of stack and perform two stores, of %rdi and %rsi, into the stack. The verifier will make sure that only two u64 are accessed read-only by the BPF program. The verifier will also recognize the precise type of the pointers being accessed and will not allow typecasting of the pointer to a different type within the BPF program.

The tracing use case in the datacenter demonstrated that certain key kernel functions (like tcp_retransmit_skb) have 2 or more kprobes that are always active. Other functions have both kprobe and kretprobe. So it is essential to keep both kernel code and BPF programs executing at maximum speed. Hence the generated BPF trampoline is re-generated every time a new program is attached or detached to maintain maximum performance.

To avoid the high cost of retpoline the attached BPF programs are called directly. __bpf_prog_enter/exit() are used to support per-program execution stats. In the future this logic will be optimized further by adding support for bpf_stats_enabled_key inside generated assembly code. Introduction of preemptible and sleepable BPF programs will completely remove the need to call __bpf_prog_enter/exit().

Detach of a BPF program from the trampoline should not fail. To avoid memory allocation in the detach path, half of the page is used as a reserve and flipped after each attach/detach. 2k bytes is enough to call 40+ BPF programs directly, which is enough for BPF tracing use cases. This limit can be increased in the future.

BPF_TRACE_FENTRY programs have access to raw kernel function arguments while BPF_TRACE_FEXIT programs have access to the kernel return value as well. Often a kprobe BPF program remembers function arguments in a map while a kretprobe fetches arguments from the map and analyzes them together with the return value. BPF_TRACE_FEXIT accelerates this typical use case.

Recursion prevention for kprobe BPF programs is done via the per-cpu bpf_prog_active counter. In practice that turned out to be a mistake. It caused programs to randomly skip execution. The tracing tools missed results they were looking for. Hence BPF trampoline doesn't provide builtin recursion prevention. It's the job of the BPF program itself and will be addressed in follow up patches.

BPF trampoline is intended to be used beyond tracing and fentry/fexit use cases in the future. For example, to remove retpoline cost from XDP programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org
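
The kernel_to_BPF argument conversion described in the commit message can be illustrated with a small userspace C sketch. This is not the kernel's implementation: the real trampoline is machine code generated per architecture, and every name below (`fake_skb`, `fake_dev`, `bpf_prog_fn`, `demo_prog`, `trampoline_two_args`) is a hypothetical stand-in chosen for illustration. The sketch only shows the idea of casting two pointer arguments to u64, storing them in 16 bytes of stack, and handing the program a single ctx pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two pointer arguments of a traced
 * kernel function such as eth_type_trans; not the real kernel types. */
struct fake_skb { int len; };
struct fake_dev { int ifindex; };

/* Assumed model of a BPF program: it takes one argument, ctx (R1),
 * pointing at an array of u64 slots that hold the traced arguments. */
typedef uint64_t (*bpf_prog_fn)(const uint64_t *ctx);

/* Example program: reads both slots and treats them as read-only. */
static uint64_t demo_prog(const uint64_t *ctx)
{
    const struct fake_skb *skb = (const struct fake_skb *)(uintptr_t)ctx[0];
    const struct fake_dev *dev = (const struct fake_dev *)(uintptr_t)ctx[1];

    return (uint64_t)skb->len + (uint64_t)dev->ifindex;
}

/* Conceptual equivalent of a trampoline for a two-pointer function:
 * cast each native argument to u64, store it on the stack, and call
 * the program directly with a pointer to that stack space. */
static uint64_t trampoline_two_args(bpf_prog_fn prog,
                                    const void *arg1, const void *arg2)
{
    uint64_t args[2]; /* 2 * 8 = 16 bytes of stack */

    assert(prog != NULL);
    args[0] = (uint64_t)(uintptr_t)arg1; /* on x86-64 this would be %rdi */
    args[1] = (uint64_t)(uintptr_t)arg2; /* on x86-64 this would be %rsi */
    return prog(args);
}
```

Calling `trampoline_two_args(demo_prog, &skb, &dev)` with a `fake_skb` of length 60 and a `fake_dev` with ifindex 2 returns 62. In the kernel, the slot count and types would come from the BTF-derived function model instead of being hard-coded as they are here.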
arch	bpf: Introduce BPF trampoline	2019-11-15 23:41:51 +01:00
block	iocost: don't nest spin_lock_irq in ioc_weight_write()	2019-10-31 11:40:57 -06:00
certs	PKCS#7: Refactor verify_pkcs7_signature()	2019-08-05 18:40:18 -04:00
crypto	Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security	2019-09-28 08:14:15 -07:00
Documentation	bpf, doc: Change right arguments for JIT example code	2019-11-15 22:36:35 +01:00
drivers	mlx5-updates-2019-11-01	2019-11-03 19:23:49 -08:00
fs	NFS Client Bugfixes for Linux 5.4-rc6	2019-11-01 17:37:44 -07:00
include	bpf: Introduce BPF trampoline	2019-11-15 23:41:51 +01:00
init	init: Support mounting root file systems over SMB	2019-10-02 12:15:15 -04:00
ipc	ipc/sem.c: convert to use built-in RCU list checking	2019-09-25 17:51:41 -07:00
kernel	bpf: Introduce BPF trampoline	2019-11-15 23:41:51 +01:00
lib	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	2019-11-02 15:29:58 -07:00
LICENSES	LICENSES: Rename other to deprecated	2019-05-03 06:34:32 -06:00
mm	uaccess: Add strict non-pagefault kernel-space read function	2019-11-02 12:39:12 -07:00
net	net: icmp6: provide input address for traceroute6	2019-11-03 17:26:53 -08:00
samples	samples/bpf: Add missing option to xdpsock usage	2019-11-15 22:32:10 +01:00
scripts	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-11-02 13:54:56 -07:00
security	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-11-02 13:54:56 -07:00
sound	ALSA: timer: Fix mutex deadlock at releasing card	2019-10-30 22:54:56 +01:00
tools	bpf, testing: Add missing object file to TEST_FILES	2019-11-11 22:35:23 +01:00
usr	kbuild: update compile-test header list for v5.4-rc2	2019-10-05 15:29:49 +09:00
virt	kvm: call kvm_arch_destroy_vm if vm creation fails	2019-10-31 12:13:16 +01:00
.clang-format	clang-format: Update with the latest for_each macro list	2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore	Opt out of scripts/get_maintainer.pl	2019-05-16 10:53:40 -07:00
.gitattributes	.gitattributes: set git diff driver for C source code files	2016-10-07 18:46:30 -07:00
.gitignore	Modules updates for v5.4	2019-09-22 10:34:46 -07:00
.mailmap	A few MIPS fixes:	2019-10-26 19:43:12 -04:00
COPYING	COPYING: use the new text with points to the license files	2018-03-23 12:41:45 -06:00
CREDITS	MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer	2019-10-10 08:12:51 -07:00
Kbuild	kbuild: do not descend to ./Kbuild when cleaning	2019-08-21 21:03:58 +09:00
Kconfig	docs: kbuild: convert docs to ReST and rename to *.rst	2019-06-14 14:21:21 -06:00
MAINTAINERS	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-11-02 13:54:56 -07:00
Makefile	Linux 5.4-rc5	2019-10-27 13:19:19 -04:00
README	Drop all 00-INDEX files from Documentation/	2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.