linux/Documentation/filesystems
Linus Torvalds 6aee4badd8 Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull openat2 support from Al Viro:
 "This is the openat2() series from Aleksa Sarai.

  I'm afraid that the rest of namei stuff will have to wait - it got
  zero review the last time I'd posted #work.namei, and there had been a
  leak in the posted series I'd caught only last weekend. I was going to
  repost it on Monday, but the window opened and the odds of getting any
  review during that... Oh, well.

  Anyway, openat2 part should be ready; that _did_ get sane amount of
  review and public testing, so here it comes"

From Aleksa's description of the series:
 "For a very long time, extending openat(2) with new features has been
  incredibly frustrating. This stems from the fact that openat(2) is
  possibly the most famous counter-example to the mantra "don't silently
  accept garbage from userspace" -- it doesn't check whether unknown
  flags are present[1].

  This means that (generally) the addition of new flags to openat(2) has
  been fraught with backwards-compatibility issues (O_TMPFILE has to be
  defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
  kernels gave errors, since it's insecure to silently ignore the
  flag[2]). All new security-related flags therefore have a tough road
  to being added to openat(2).

  Furthermore, the need for some sort of control over VFS's path
  resolution (to avoid malicious paths resulting in inadvertent
  breakouts) has been a very long-standing desire of many userspace
  applications.

  This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
  (which was a variant of David Drysdale's O_BENEATH patchset[4] which
  was a spin-off of the Capsicum project[5]) with a few additions and
  changes made based on the previous discussion within [6] as well as
  others I felt were useful.

  In line with the conclusions of the original discussion of
  AT_NO_JUMPS, the flag has been split up into separate flags. However,
  instead of being an openat(2) flag it is provided through a new
  syscall openat2(2) which provides several other improvements to the
  openat(2) interface (see the patch description for more details). The
  following new LOOKUP_* flags are added:

  LOOKUP_NO_XDEV:

     Blocks all mountpoint crossings (upwards, downwards, or through
     absolute links). Absolute pathnames alone in openat(2) do not
     trigger this. Magic-link traversal which implies a vfsmount jump is
     also blocked (though magic-link jumps on the same vfsmount are
     permitted).

  LOOKUP_NO_MAGICLINKS:

     Blocks resolution through /proc/$pid/fd-style links. This is done
     by blocking the usage of nd_jump_link() during resolution in a
     filesystem. The term "magic-links" is used to match with the only
     reference to these links in Documentation/, but I'm happy to change
     the name.

     It should be noted that this is different to the scope of
     ~LOOKUP_FOLLOW in that it applies to all path components. However,
     you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
     will *not* fail (assuming that no parent component was a
     magic-link), and you will have an fd for the magic-link.

     In order to correctly detect magic-links, the introduction of a new
     LOOKUP_MAGICLINK_JUMPED state flag was required.

  LOOKUP_BENEATH:

     Disallows escapes to outside the starting dirfd's
     tree, using techniques such as ".." or absolute links. Absolute
     paths in openat(2) are also disallowed.

     Conceptually this flag is to ensure you "stay below" a certain
     point in the filesystem tree -- but this requires some additional
     to protect against various races that would allow escape using
     "..".

     Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
     can trivially beam you around the filesystem (breaking the
     protection). In future, there might be similar safety checks done
     as in LOOKUP_IN_ROOT, but that requires more discussion.

  In addition, two new flags are added that expand on the above ideas:

  LOOKUP_NO_SYMLINKS:

     Does what it says on the tin. No symlink resolution is allowed at
     all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
     can still be used with NOFOLLOW to open an fd for the symlink as
     long as no parent path had a symlink component.

  LOOKUP_IN_ROOT:

     This is an extension of LOOKUP_BENEATH that, rather than blocking
     attempts to move past the root, forces all such movements to be
     scoped to the starting point. This provides chroot(2)-like
     protection but without the cost of a chroot(2) for each filesystem
     operation, as well as being safe against race attacks that
     chroot(2) is not.

     If a race is detected (as with LOOKUP_BENEATH) then an error is
     generated, and similar to LOOKUP_BENEATH it is not permitted to
     cross magic-links with LOOKUP_IN_ROOT.

     The primary need for this is from container runtimes, which
     currently need to do symlink scoping in userspace[7] when opening
     paths in a potentially malicious container.

     There is a long list of CVEs that could have bene mitigated by
     having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
     CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
     few).

  In order to make all of the above more usable, I'm working on
  libpathrs[8] which is a C-friendly library for safe path resolution.
  It features a userspace-emulated backend if the kernel doesn't support
  openat2(2). Hopefully we can get userspace to switch to using it, and
  thus get openat2(2) support for free once it's ready.

  Future work would include implementing things like
  RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
  programs to be sure they don't hit DoSes though stale NFS handles)"

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29 11:20:24 -08:00
..
caching Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
cifs various cifs/smb3 fixes (including for share deleted cases) and features including improved encrypted read performance, and various debugging improvements 2019-09-19 10:32:16 -07:00
configfs configfs: fix wrong name of struct in documentation 2018-12-20 08:41:38 -07:00
ext4 Added new ext4 debugging ioctls to allow userspace to get information 2019-09-21 13:37:39 -07:00
nfs docs: fs: convert docs without extension to ReST 2019-07-31 13:31:05 -06:00
9p.txt
adfs.txt
affs.txt
afs.txt afs: Implement @sys substitution handling 2018-04-09 21:12:31 +01:00
api-summary.rst docs: no structured comments in fs/file_table.c 2019-05-24 15:03:39 -06:00
autofs-mount-control.txt autofs: update mount control expire desription with AUTOFS_EXP_FORCED 2019-05-14 19:52:50 -07:00
autofs.rst docs: filesystems: Add mount map description in Content 2019-11-18 12:19:59 -07:00
automount-support.txt autofs: use autofs instead of autofs4 in documentation 2018-06-07 17:34:39 -07:00
befs.txt
bfs.txt Tigran has moved 2017-05-12 15:57:15 -07:00
btrfs.txt
ceph.txt ceph: auto reconnect after blacklisted 2019-09-16 12:06:24 +02:00
coda.txt coda: Fix typo in the struct CodaCred documentation 2019-07-30 14:19:41 -06:00
cramfs.txt cramfs: rehabilitate it 2017-10-15 00:47:23 -04:00
dax.txt Documentation: filesystem: Convert xfs.txt to ReST 2019-07-15 09:15:09 -07:00
debugfs.txt debugfs: Add debugfs_create_xul() for hexadecimal unsigned long 2019-11-03 18:08:53 +01:00
devpts.txt
directory-locking.rst docs: fs: convert docs without extension to ReST 2019-07-31 13:31:05 -06:00
dlmfs.txt
dnotify.txt Documentation: fix selftests related file refs 2017-10-19 12:58:21 -06:00
ecryptfs.txt
efivarfs.txt
erofs.txt erofs: update documentation 2019-12-08 21:37:01 +08:00
ext2.txt doc: ext2: update description of quota options for ext2 2019-05-20 10:50:48 +02:00
ext3.txt
f2fs.txt f2fs: expose main_blkaddr in sysfs 2019-11-25 10:01:27 -08:00
fiemap.txt
files.txt
fscrypt.rst fscrypt: improve format of no-key names 2020-01-22 14:50:03 -08:00
fsverity.rst docs: fs-verity: mention statx() support 2019-11-13 12:15:34 -08:00
fuse-io.txt fuse: add writeback documentation 2018-03-20 17:11:45 +01:00
fuse.txt
gfs2-glocks.txt GFS2: Minor improvements to comments and documentation 2018-04-12 10:07:51 -07:00
gfs2-uevents.txt
gfs2.txt
hfs.txt
hfsplus.txt
hpfs.txt
index.rst docs: filesystems: convert autofs.txt to reST 2019-11-18 12:17:17 -07:00
inotify.txt
isofs.txt
journalling.rst docs: Bring some order to filesystem documentation 2019-03-06 09:46:10 -07:00
locking.rst Documentation: atomic_open called with shared lock on non-O_CREAT open 2019-11-07 13:17:25 -07:00
locks.txt docs: fix locations of several documents that got moved 2016-10-24 08:12:35 -02:00
mandatory-locking.txt locks: print a warning when mount fails due to lack of "mand" support 2019-08-16 12:13:48 -04:00
mount_api.txt vfs: Update mount API docs 2019-03-28 08:54:20 -07:00
nilfs2.txt MAINTAINERS, nilfs2: change project home URLs 2018-01-13 10:42:48 -08:00
ntfs.txt
ocfs2-online-filecheck.txt
ocfs2.txt
omfs.txt
orangefs.txt Orangefs: documentation updates 2018-04-04 14:05:48 -04:00
overlayfs.rst docs: filesystems: overlayfs: Fix restview warnings 2019-12-10 16:00:55 +01:00
path-lookup.rst Documentation: path-lookup: include new LOOKUP flags 2020-01-18 09:19:28 -05:00
path-lookup.txt
porting.rst docs: fs: porting.rst: fix a broken reference to another doc 2019-07-31 14:30:23 -06:00
proc.txt mm: thp: fix false negative of shmem vma's THP eligibility 2019-07-18 17:08:06 -07:00
qnx6.txt Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
quota.txt scripts/spelling.txt: add "an user" pattern and fix typo instances 2017-02-27 18:43:46 -08:00
ramfs-rootfs-initramfs.txt docs: early-userspace: move to driver-api guide 2019-07-15 11:03:01 -03:00
relay.txt Documentation : Update relay function types 2018-07-10 15:11:00 -06:00
romfs.txt
seq_file.txt fs/seq_file.c: simplify seq_file iteration code and interface 2018-08-17 16:20:28 -07:00
sharedsubtree.txt
splice.rst docs: Bring some order to filesystem documentation 2019-03-06 09:46:10 -07:00
spufs.txt Documentation: fix spelling mistake, EACCESS -> EACCES 2018-11-07 15:28:55 -07:00
squashfs.txt
sysfs-pci.txt PCI: Add pci_mmap_resource_range() and use it for ARM64 2017-04-20 08:47:47 -05:00
sysfs-tagging.txt
sysfs.txt docs: driver-model: move it to the driver-api book 2019-07-15 11:03:02 -03:00
sysv-fs.txt
tmpfs.txt docs: cgroup-v1: add it to the admin-guide book 2019-07-15 11:03:02 -03:00
ubifs-authentication.rst docs: ubifs-authentication.md: convert to ReST 2019-07-31 13:25:22 -06:00
ubifs.txt ubifs: Enable authentication support 2018-10-23 13:49:01 +02:00
udf.txt udf: Remove never implemented mount options 2018-02-27 10:25:33 +01:00
vfat.txt Documentation/filesystems/vfat.txt: fix a remark that implies UCS2 2017-12-21 13:39:28 -07:00
vfs.rst docs: fs: convert docs without extension to ReST 2019-07-31 13:31:05 -06:00
virtiofs.rst virtio-fs: add Documentation/filesystems/virtiofs.rst 2019-09-18 15:09:34 +02:00
xfs-delayed-logging-design.txt Documentation: xfs: Fix typo 2019-06-07 11:42:20 -06:00
xfs-self-describing-metadata.txt xfs: add struct xfs_mount pointer to struct xfs_buf 2019-06-28 19:27:29 -07:00