linux

mainlining shenanigans

Serge E. Hallyn 8db6c34f1d Introduce v3 namespaced file capabilities	2017-09-01 14:57:15 -05:00

Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr. If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable. Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege. For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise. The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

This patch introduces v3 of the security.capability xattr. It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data. This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.

When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change. Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid. This is done during the
execution of setxattr() to catch user-space-initiated capability writes.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.

Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.

If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace. The kernel will
translate that rootid to an absolute uid, and write that to disk. After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.

Only a single security.capability xattr may exist at a time for a given
file. A task may overwrite an existing xattr so long as it is
privileged over the inode. Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr. This check can be re-added if deemed useful.

This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.

Example using tar:

 $ cp /bin/sleep sleepx
 $ mkdir b1 b2
 $ lxc-usernsexec -m b:0:100000:1 -m b:1:$(id -u):1 -- chown 0:0 b1
 $ lxc-usernsexec -m b:0:100001:1 -m b:1:$(id -u):1 -- chown 0:0 b2
 $ lxc-usernsexec -m b:0:100000:1000 -- tar --xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx
 $ lxc-usernsexec -m b:0:100001:1000 -- tar --xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar
 $ lxc-usernsexec -m b:0:100001:1000 -- getcap b2/sleepx
 b2/sleepx = cap_sys_admin+ep
 # /opt/ltp/testcases/bin/getv3xattr b2/sleepx
 v3 xattr, rootid is 100001

A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp

Changelog:
   Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
   Nov 07 2016: convert rootid from and to fs user_ns
   (From ebiederm: mar 28 2017)
     commoncap.c: fix typos - s/v4/v3
     get_vfs_caps_from_disk: clarify the fs_ns root access check
     nsfscaps: change the code split for cap_inode_setxattr()
   Apr 09 2017:
     don't return v3 cap for caps owned by current root.
     return a v2 cap for a true v2 cap in non-init ns
   Apr 18 2017:
     . Change the flow of fscap writing to support s_user_ns writing.
     . Remove refuse_fcap_overwrite(). The value of the previous
       xattr doesn't matter.
   Apr 24 2017:
     . incorporate Eric's incremental diff
     . move cap_convert_nscap to setxattr and simplify its usage
   May 8, 2017:
     . fix leaking dentry refcount in cap_inode_getsecurity

Signed-off-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
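
The commit message describes the v3 xattr as the v2 layout with a rootid appended. The following is a minimal user-space sketch of how a raw security.capability value could be classified under that description. It is not code from the patch: the helper names `read_le32` and `fscap_revision` and the `FSCAP_*_SIZE` macros are hypothetical, and the revision constants and the 20-byte/24-byte sizes are assumptions about the layout (a 32-bit `magic_etc`, two permitted/inheritable pairs, then a 32-bit rootid, all little endian).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constants, intended to mirror the kernel's capability header. */
#define VFS_CAP_REVISION_MASK 0xFF000000u
#define VFS_CAP_REVISION_2    0x02000000u
#define VFS_CAP_REVISION_3    0x03000000u

/* Hypothetical size macros for this sketch. */
#define FSCAP_V2_SIZE 20u /* magic_etc + 2 * (permitted, inheritable) */
#define FSCAP_V3_SIZE 24u /* v2 layout + trailing rootid */

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: inspect a raw security.capability value.
 * Returns 2 or 3 for the xattr revision, or -1 if the revision is not
 * recognized or the length does not match it.
 * For v3 the stored rootid is written to *rootid; for v2, which carries
 * no rootid field, *rootid is set to 0.
 */
static int fscap_revision(const unsigned char *buf, size_t len,
                          uint32_t *rootid)
{
    uint32_t revision;

    if (len < 4)
        return -1;
    revision = read_le32(buf) & VFS_CAP_REVISION_MASK;

    if (revision == VFS_CAP_REVISION_2 && len == FSCAP_V2_SIZE) {
        *rootid = 0;
        return 2;
    }
    if (revision == VFS_CAP_REVISION_3 && len == FSCAP_V3_SIZE) {
        *rootid = read_le32(buf + FSCAP_V2_SIZE);
        return 3;
    }
    return -1;
}
```

A caller could fill `buf` with `getxattr(path, "security.capability", buf, sizeof buf)` from `<sys/xattr.h>` and pass the returned length. Per the commit message, a v3 capability is presented as v2 when it applies to the caller's namespace, so the stored rootid would presumably only be visible to a reader outside that namespace, as in the root-prompt `getv3xattr` step of the tar example.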
arch	signal: Remove kernel interal si_code magic	2017-07-24 14:30:28 -05:00
block	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	2017-07-11 15:36:52 -07:00
certs	modsign: add markers to endif-statements in certs/Makefile	2017-07-14 11:01:37 +10:00
crypto	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	2017-07-14 22:49:50 -07:00
Documentation	This series converts a number of top-level documents to the RST format	2017-07-15 12:58:58 -07:00
drivers	Add wait_for_random_bytes() and get_random_*_wait() functions so that	2017-07-15 12:44:02 -07:00
firmware	firmware/Makefile: force recompilation if makefile changes	2017-05-08 17:15:10 -07:00
fs	Introduce v3 namespaced file capabilities	2017-09-01 14:57:15 -05:00
include	Introduce v3 namespaced file capabilities	2017-09-01 14:57:15 -05:00
init	random: do not ignore early device randomness	2017-07-12 16:26:00 -07:00
ipc	ipc/util.h: update documentation for ipc_getref() and ipc_putref()	2017-07-12 16:26:02 -07:00
kernel	signal: Fix sending signals with siginfo	2017-07-24 14:39:37 -05:00
lib	Add wait_for_random_bytes() and get_random_*_wait() functions so that	2017-07-15 12:44:02 -07:00
mm	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
net	Add wait_for_random_bytes() and get_random_*_wait() functions so that	2017-07-15 12:44:02 -07:00
samples	Merge branch 'akpm' (patches from Andrew)	2017-07-13 12:38:49 -07:00
scripts	Kbuild updates for v4.13 (2nd)	2017-07-13 13:37:57 -07:00
security	Introduce v3 namespaced file capabilities	2017-09-01 14:57:15 -05:00
sound	sound fixes for 4.13-rc1	2017-07-14 12:44:00 -07:00
tools	signal/testing: Don't look for __SI_FAULT in userspace	2017-07-19 19:13:15 -05:00
usr	ramfs: clarify help text that compression applies to ramfs as well as legacy ramdisk.	2017-07-06 16:24:30 -07:00
virt	Second batch of KVM updates for v4.13	2017-07-15 10:18:16 -07:00
.cocciconfig	scripts: add Linux .cocciconfig for coccinelle	2016-07-22 12:13:39 +02:00
.get_maintainer.ignore
.gitattributes	.gitattributes: set git diff driver for C source code files	2016-10-07 18:46:30 -07:00
.gitignore	kbuild: Add support to generate LLVM assembly files	2017-04-25 08:13:52 +09:00
.mailmap	power supply and reset changes for the v4.12 series (part 2)	2017-05-12 12:02:21 -07:00
COPYING
CREDITS	avr32: remove support for AVR32 architecture	2017-05-01 09:27:15 +02:00
Kbuild	kbuild: Consolidate header generation from ASM offset information	2017-04-13 05:43:37 +09:00
Kconfig
MAINTAINERS	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus	2017-07-15 10:59:54 -07:00
Makefile	Linux v4.13-rc1	2017-07-15 15:22:10 -07:00
README	README: add a new README file, pointing to the Documentation/	2016-10-24 08:12:35 -02:00

README

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.