A mirror of the official Linux kernel repository just in case
Go to file
Alexei Starovoitov c1bc51f85c Merge branch 'bpf-support-private-stack-for-bpf-progs'
Yonghong Song says:

====================
bpf: Support private stack for bpf progs

The main motivation for private stack comes from nested scheduler in
sched-ext from Tejun. The basic idea is that
 - each cgroup will its own associated bpf program,
 - bpf program with parent cgroup will call bpf programs
   in immediate child cgroups.

Let us say we have the following cgroup hierarchy:
  root_cg (prog0):
    cg1 (prog1):
      cg11 (prog11):
        cg111 (prog111)
        cg112 (prog112)
      cg12 (prog12):
        cg121 (prog121)
        cg122 (prog122)
    cg2 (prog2):
      cg21 (prog21)
      cg22 (prog22)
      cg23 (prog23)

In the above example, prog0 will call a kfunc which will call prog1 and
prog2 to get sched info for cg1 and cg2 and then the information is
summarized and sent back to prog0. Similarly, prog11 and prog12 will be
invoked in the kfunc and the result will be summarized and sent back to
prog1, etc. The following illustrates a possible call sequence:
   ... -> bpf prog A -> kfunc -> ops.<callback_fn> (bpf prog B) ...

Currently, for each thread, the x86 kernel allocate 16KB stack. Each
bpf program (including its subprograms) has maximum 512B stack size to
avoid potential stack overflow. Nested bpf programs further increase the
risk of stack overflow. To avoid potential stack overflow caused by bpf
programs, this patch set supported private stack and bpf program stack
space is allocated during jit time. Using private stack for bpf progs
can reduce or avoid potential kernel stack overflow.

Currently private stack is applied to tracing programs like kprobe/uprobe,
perf_event, tracepoint, raw tracepoint and struct_ops progs.
Tracing progs enable private stack if any subprog stack size is more
than a threshold (i.e. 64B). Struct-ops progs enable private stack
based on particular struct op implementation which can enable private
stack before verification at per-insn level. Struct-ops progs have
the same treatment as tracing progs w.r.t when to enable private stack.

For all these progs, the kernel will do recursion check (no nesting for
per prog per cpu) to ensure that private stack won't be overwritten.
The bpf_prog_aux struct has a callback func recursion_detected() which
can be implemented by kernel subsystem to synchronously detect recursion,
report error, etc.

Only x86_64 arch supports private stack now. It can be extended to other
archs later. Please see each individual patch for details.

Change logs:
  v11 -> v12:
    - v11 link: https://lore.kernel.org/bpf/20241109025312.148539-1-yonghong.song@linux.dev/
    - Fix a bug where allocated percpu space is less than actual private stack.
    - Add guard memory (before and after actual prog stack) to detect potential
      underflow/overflow.
  v10 -> v11:
    - v10 link: https://lore.kernel.org/bpf/20241107024138.3355687-1-yonghong.song@linux.dev/
    - Use two bool variables, priv_stack_requested (used by struct-ops only) and
      jits_use_priv_stack, in order to make code cleaner.
    - Set env->prog->aux->jits_use_priv_stack to true if any subprog uses private stack.
      This is for struct-ops use case to kick in recursion protection.
  v9 -> v10:
    - v9 link: https://lore.kernel.org/bpf/20241104193455.3241859-1-yonghong.song@linux.dev/
    - Simplify handling async cbs by making those async cb related progs using normal
      kernel stack.
    - Do percpu allocation in jit instead of verifier.
  v8 -> v9:
    - v8 link: https://lore.kernel.org/bpf/20241101030950.2677215-1-yonghong.song@linux.dev/
    - Use enum to express priv stack mode.
    - Use bits in bpf_subprog_info struct to do subprog recursion check between
      main/async and async subprogs.
    - Fix potential memory leak.
    - Rename recursion detection func from recursion_skipped() to recursion_detected().
  v7 -> v8:
    - v7 link: https://lore.kernel.org/bpf/20241029221637.264348-1-yonghong.song@linux.dev/
    - Add recursion_skipped() callback func to bpf_prog->aux structure such that if
      a recursion miss happened and bpf_prog->aux->recursion_skipped is not NULL, the
      callback fn will be called so the subsystem can do proper action based on their
      respective design.
  v6 -> v7:
    - v6 link: https://lore.kernel.org/bpf/20241020191341.2104841-1-yonghong.song@linux.dev/
    - Going back to do private stack allocation per prog instead per subtree. This can
      simplify implementation and avoid verifier complexity.
    - Handle potential nested subprog run if async callback exists.
    - Use struct_ops->check_member() callback to set whether a particular struct-ops
      prog wants private stack or not.
  v5 -> v6:
    - v5 link: https://lore.kernel.org/bpf/20241017223138.3175885-1-yonghong.song@linux.dev/
    - Instead of using (or not using) private stack at struct_ops level,
      each prog in struct_ops can decide whether to use private stack or not.
  v4 -> v5:
    - v4 link: https://lore.kernel.org/bpf/20241010175552.1895980-1-yonghong.song@linux.dev/
    - Remove bpf_prog_call() related implementation.
    - Allow (opt-in) private stack for sched-ext progs.
  v3 -> v4:
    - v3 link: https://lore.kernel.org/bpf/20240926234506.1769256-1-yonghong.song@linux.dev/
      There is a long discussion in the above v3 link trying to allow private
      stack to be used by kernel functions in order to simplify implementation.
      But unfortunately we didn't find a workable solution yet, so we return
      to the approach where private stack is only used by bpf programs.
    - Add bpf_prog_call() kfunc.
  v2 -> v3:
    - Instead of per-subprog private stack allocation, allocate private
      stacks at main prog or callback entry prog. Subprogs not main or callback
      progs will increment the inherited stack pointer to be their
      frame pointer.
    - Private stack allows each prog max stack size to be 512 bytes, intead
      of the whole prog hierarchy to be 512 bytes.
    - Add some tests.
====================

Link: https://lore.kernel.org/r/20241112163902.2223011-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12 16:26:25 -08:00
arch bpf, x86: Support private stack in jit 2024-11-12 16:26:24 -08:00
block block-6.12-20241018 2024-10-18 15:53:00 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto This push fixes the following issues: 2024-10-16 08:42:54 -07:00
Documentation bpf: Remove trailing whitespace in verifier.rst 2024-11-11 08:17:48 -08:00
drivers Including fixes from netfiler, xfrm and bluetooth. 2024-10-24 16:43:50 -07:00
fs for-6.12-rc4-tag 2024-10-24 13:04:15 -07:00
include bpf: Support private stack for struct_ops progs 2024-11-12 16:26:25 -08:00
init cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS 2024-10-13 22:23:13 +02:00
io_uring io_uring/rw: fix wrong NOWAIT check in io_rw_init_file() 2024-10-19 09:25:45 -06:00
ipc struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
kernel bpf: Support private stack for struct_ops progs 2024-11-12 16:26:25 -08:00
lib Probes fixes for v6.12-rc4(2): 2024-10-24 13:51:58 -07:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2024-10-24 18:47:28 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2024-10-24 18:47:28 -07:00
rust Driver core fix for 6.12-rc3 2024-10-13 09:10:52 -07:00
samples samples/bpf: remove obsolete tracing related tests 2024-10-11 09:51:31 -07:00
scripts kbuild,bpf: Pass make jobs' value to pahole 2024-11-11 20:00:21 -08:00
security ipe: fallback to platform keyring also if key in trusted keyring is rejected 2024-10-18 12:14:53 -07:00
sound ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2 2024-10-16 10:29:57 +02:00
tools selftests/bpf: Add struct_ops prog private stack tests 2024-11-12 16:26:25 -08:00
usr initramfs: shorten cmd_initfs in usr/Makefile 2024-07-16 01:07:52 +09:00
virt ARM64: 2024-10-21 11:22:04 -07:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Kbuild updates for v6.12 2024-09-24 13:02:06 -07:00
.mailmap Including fixes from netfiler, xfrm and bluetooth. 2024-10-24 16:43:50 -07:00
.rustfmt.toml
COPYING
CREDITS CREDITS: sort alphabetically by name 2024-10-09 12:47:19 -07:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig
MAINTAINERS Including fixes from netfiler, xfrm and bluetooth. 2024-10-24 16:43:50 -07:00
Makefile Linux 6.12-rc4 2024-10-20 15:19:38 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.