linux

mainlining shenanigans


Andrii Nakryiko 1e0bd5a091
bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails

Commit 92117d8443 ("bpf: fix refcnt overflow") turned refcounting of bpf_map into a potentially failing operation once the refcount reaches the BPF_MAX_REFCNT limit (32k). Because a 32-bit counter is used, it is possible in practice to overflow the refcount and make it wrap around to 0, causing an erroneous map free while there are still references to it, and thus use-after-free problems.

But failing refcounting operations are problematic in some cases. One example is the mmap() interface. After establishing the initial memory mapping, the user is allowed to arbitrarily map/remap/unmap parts of the mapped memory, splitting it into multiple non-contiguous regions. All this happens without any control from the users of the mmap subsystem. Instead, the mmap subsystem notifies the original creator of the memory mapping through open/close callbacks, which are optionally specified when the initial memory mapping is created. These callbacks are used to maintain an accurate refcount for bpf_map (see next patch in this series). The problem is that the open() callback is not supposed to fail, because the memory-mapped resource is set up and properly referenced. This poses a problem for using memory mapping with BPF maps.

One solution is to maintain a separate refcount just for memory mappings and do a single bpf_map_inc/bpf_map_put when it goes from/to zero, respectively. There are similar use cases in current work on tcp-bpf, necessitating an extra counter as well. This seems like a rather unfortunate and ugly solution that doesn't scale well to various new use cases.

Another approach is to use the non-failing refcount_t type, which uses a 32-bit counter internally but, once it reaches the overflow state at UINT_MAX, stays there. This ultimately causes a memory leak, but prevents use-after-free.
But given that refcounting is not the most performance-critical operation on BPF maps (it's not used from running BPF program code), we can also just switch to a 64-bit counter that can't overflow in practice, potentially disadvantaging 32-bit platforms a tiny bit. This simplifies semantics and frees the scenarios described above from having to handle a failing refcount increment.

In terms of struct bpf_map size, we are still good and use the same amount of space:

BEFORE (3 cache lines, 8 bytes of padding at the end):

struct bpf_map {
        const struct bpf_map_ops  * ops __attribute__((__aligned__(64))); /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        int                        spin_lock_off;        /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        u32                        btf_key_type_id;      /*    56     4 */
        u32                        btf_value_type_id;    /*    60     4 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct btf *               btf;                  /*    64     8 */
        struct bpf_map_memory      memory;               /*    72    16 */
        bool                       unpriv_array;         /*    88     1 */
        bool                       frozen;               /*    89     1 */

        /* XXX 38 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        atomic_t                   refcnt __attribute__((__aligned__(64))); /*   128     4 */
        atomic_t                   usercnt;              /*   132     4 */
        struct work_struct         work;                 /*   136    32 */
        char                       name[16];             /*   168    16 */

        /* size: 192, cachelines: 3, members: 21 */
        /* sum members: 146, holes: 1, sum holes: 38 */
        /* padding: 8 */
        /* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));

AFTER (same 3 cache lines, no extra padding now):

struct bpf_map {
        const struct bpf_map_ops  * ops __attribute__((__aligned__(64))); /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        int                        spin_lock_off;        /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        u32                        btf_key_type_id;      /*    56     4 */
        u32                        btf_value_type_id;    /*    60     4 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct btf *               btf;                  /*    64     8 */
        struct bpf_map_memory      memory;               /*    72    16 */
        bool                       unpriv_array;         /*    88     1 */
        bool                       frozen;               /*    89     1 */

        /* XXX 38 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        atomic64_t                 refcnt __attribute__((__aligned__(64))); /*   128     8 */
        atomic64_t                 usercnt;              /*   136     8 */
        struct work_struct         work;                 /*   144    32 */
        char                       name[16];             /*   176    16 */

        /* size: 192, cachelines: 3, members: 21 */
        /* sum members: 154, holes: 1, sum holes: 38 */
        /* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));

This patch, while modifying all users of bpf_map_inc, also cleans up its interface to match bpf_map_put, with separate operations bpf_map_inc and bpf_map_inc_with_uref (matching bpf_map_put and bpf_map_put_with_uref, respectively). Also, given there are no users of bpf_map_inc_not_zero specifying uref=true, remove the uref flag and default to uref=false internally.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-2-andriin@fb.com
2019-11-18 11:41:59 +01:00
arch	bpf: Support attaching tracing BPF program to other BPF programs	2019-11-15 23:45:24 +01:00
block	iocost: don't nest spin_lock_irq in ioc_weight_write()	2019-10-31 11:40:57 -06:00
certs	PKCS#7: Refactor verify_pkcs7_signature()	2019-08-05 18:40:18 -04:00
crypto	Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security	2019-09-28 08:14:15 -07:00
Documentation	bpf, doc: Change right arguments for JIT example code	2019-11-15 22:36:35 +01:00
drivers	bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails	2019-11-18 11:41:59 +01:00
fs	NFS Client Bugfixes for Linux 5.4-rc6	2019-11-01 17:37:44 -07:00
include	bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails	2019-11-18 11:41:59 +01:00
init	init: Support mounting root file systems over SMB	2019-10-02 12:15:15 -04:00
ipc	ipc/sem.c: convert to use built-in RCU list checking	2019-09-25 17:51:41 -07:00
kernel	bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails	2019-11-18 11:41:59 +01:00
lib	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	2019-11-02 15:29:58 -07:00
LICENSES	LICENSES: Rename other to deprecated	2019-05-03 06:34:32 -06:00
mm	uaccess: Add strict non-pagefault kernel-space read function	2019-11-02 12:39:12 -07:00
net	bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails	2019-11-18 11:41:59 +01:00
samples	samples/bpf: Add missing option to xdpsock usage	2019-11-15 22:32:10 +01:00
scripts	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-11-02 13:54:56 -07:00
security	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-11-02 13:54:56 -07:00
sound	ALSA: timer: Fix mutex deadlock at releasing card	2019-10-30 22:54:56 +01:00
tools	selftests/bpf: Add a test for attaching BPF prog to another BPF prog and subprog	2019-11-15 23:46:09 +01:00
usr	kbuild: update compile-test header list for v5.4-rc2	2019-10-05 15:29:49 +09:00
virt	kvm: call kvm_arch_destroy_vm if vm creation fails	2019-10-31 12:13:16 +01:00
.clang-format	clang-format: Update with the latest for_each macro list	2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore	Opt out of scripts/get_maintainer.pl	2019-05-16 10:53:40 -07:00
.gitattributes	.gitattributes: set git diff driver for C source code files	2016-10-07 18:46:30 -07:00
.gitignore	Modules updates for v5.4	2019-09-22 10:34:46 -07:00
.mailmap	A few MIPS fixes:	2019-10-26 19:43:12 -04:00
COPYING	COPYING: use the new text with points to the license files	2018-03-23 12:41:45 -06:00
CREDITS	MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer	2019-10-10 08:12:51 -07:00
Kbuild	kbuild: do not descend to ./Kbuild when cleaning	2019-08-21 21:03:58 +09:00
Kconfig	docs: kbuild: convert docs to ReST and rename to *.rst	2019-06-14 14:21:21 -06:00
MAINTAINERS	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-11-02 13:54:56 -07:00
Makefile	Linux 5.4-rc5	2019-10-27 13:19:19 -04:00
README	Drop all 00-INDEX files from Documentation/	2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems that may result from upgrading your kernel.