linux/fs/proc
Christian Brauner d80b065bb1
Merge patch series "proc: restrict overmounting of ephemeral entities"
Christian Brauner <brauner@kernel.org> says:

It is currently possible to mount on top of various ephemeral entities
in procfs. This specifically includes magic links. To recap, magic links
are links of the form /proc/<pid>/fd/<nr>. They serve as references to
a target file and during path lookup they cause a jump to the target
path. Such magic links disappear if the corresponding file descriptor is
closed.
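
The behaviour described above can be observed from userspace. The
following is a minimal sketch, assuming a Linux system with procfs
mounted at /proc and a writable /tmp; fd_link_target() and
magic_link_demo() are hypothetical helper names used only for
illustration, not kernel or libc APIs.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Read the target of the magic link /proc/self/fd/<fd> into buf.
 * Returns 0 on success, -1 on failure (errno is set by readlink()).
 * Hypothetical helper for illustration only.
 */
static int fd_link_target(int fd, char *buf, size_t len)
{
        char link[64];
        ssize_t n;

        snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
        n = readlink(link, buf, len - 1);
        if (n < 0)
                return -1;
        buf[n] = '\0';
        return 0;
}

/*
 * Open a temporary file, check that its magic link names the same
 * path, then close the descriptor and check that the link is gone.
 * Returns 0 if both observations hold, -1 otherwise.
 */
static int magic_link_demo(void)
{
        char tmpl[] = "/tmp/magic-link-XXXXXX";
        char expected[PATH_MAX];
        char target[PATH_MAX];
        int fd, ok;

        fd = mkstemp(tmpl);
        if (fd < 0)
                return -1;

        /* Canonicalize the path in case /tmp is itself a symlink. */
        ok = realpath(tmpl, expected) != NULL;

        /* While fd is open, the link resolves to the file's path. */
        ok = ok && fd_link_target(fd, target, sizeof(target)) == 0 &&
             strcmp(target, expected) == 0;

        close(fd);

        /* After close() the magic link no longer exists. */
        ok = ok && fd_link_target(fd, target, sizeof(target)) < 0;

        unlink(tmpl);
        return ok ? 0 : -1;
}
```

Calling magic_link_demo() from main() and checking for a return value
of 0 is enough to try this out; it needs no privileges.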

Currently it is possible to overmount such magic links:

int fd = open("/mnt/foo", O_RDONLY);
sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
int fd2 = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
mount("/mnt/bar", path, "", MS_BIND, 0);

Arguably, this is nonsensical and is mostly interesting for an attacker
that wants to somehow trick a process into e.g., reopening something
that they didn't intend to reopen or to hide a malicious file
descriptor.

It also risks leaking mounts in long-running processes. When a magic
link is overmounted as above, the mount is not detached when the file
descriptor is closed. Only the target mountpoint disappears, which makes
it impossible to unmount that mount afterwards. The mount therefore
sticks around until the process exits and the /proc/<pid>/ directory is
cleaned up during proc_flush_pid(), when the dentries are pruned and
invalidated.

That in turn means it's possible for a program to accidentally leak
mounts, and it's also possible to make a task leak mounts without its
knowledge if the attacker just keeps overmounting things under
/proc/<pid>/fd/<nr>.
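
One way to look for such mounts is to scan a mountinfo file for mount
points of the form /proc/<pid>/fd/<nr>. The following is a rough sketch
only. It assumes these mounts are listed in /proc/self/mountinfo under
that path, which the text above does not state, and how an entry is
rendered once the file descriptor is closed is likewise an assumption.
All three function names are hypothetical. The mount point is the fifth
whitespace-separated field of each mountinfo line.

```c
#include <stdio.h>

#define MOUNT_POINT_MAX 4096

/*
 * Copy the mount point (5th field) of one mountinfo line into out,
 * which must hold MOUNT_POINT_MAX bytes. Returns 0 on success.
 */
static int mountinfo_mount_point(const char *line, char *out)
{
        /* Skip mount ID, parent ID, major:minor and root. */
        return sscanf(line, "%*s %*s %*s %*s %4095s", out) == 1 ? 0 : -1;
}

/* Return 1 if path starts with /proc/<pid>/fd/<nr>, else 0. */
static int is_proc_fd_entry(const char *path)
{
        int pid, fd;

        return sscanf(path, "/proc/%d/fd/%d", &pid, &fd) == 2;
}

/* Count entries whose mount point is a /proc/<pid>/fd/<nr> path. */
static int count_proc_fd_mounts(FILE *f)
{
        char line[8192], mp[MOUNT_POINT_MAX];
        int count = 0;

        while (fgets(line, sizeof(line), f))
                if (mountinfo_mount_point(line, mp) == 0 &&
                    is_proc_fd_entry(mp))
                        count++;
        return count;
}
```

A caller would pass a stream opened with fopen("/proc/self/mountinfo",
"r") to count_proc_fd_mounts() and treat a non-zero result as a hint
that something has been mounted over a file descriptor entry.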

I think it's wrong to try to fix this by playing games in close() or
elsewhere to undo these mounts when the file descriptor is closed. The
fact that we allow overmounting of such magic links is simply a bug and
one that we need to fix.

Similar things can be said about entries under fdinfo/ and map_files/ so
those are restricted as well.

I have a further, more aggressive patch that gets out the big hammer and
makes everything under /proc/<pid>/*, as well as immediate symlinks such
as /proc/self, /proc/thread-self, /proc/mounts, /proc/net that point
into /proc/<pid>/, not overmountable. Imho, all of this should be blocked
if we can get away with it. It's only useful to hide exploits such as in [1].

And again, overmounting of any global procfs files remains unaffected
and is an existing and supported use-case.

Link: https://righteousit.com/2024/07/24/hiding-linux-processes-with-bind-mounts [1]

// Note that repro uses the traditional way of just mounting over
// /proc/<pid>/fd/<nr>. This could also all be achieved just based on
// file descriptors using move_mount(). So /proc/<pid>/fd/<nr> isn't the
// only entry vector here. It's also possible to e.g., mount directly
// onto /proc/<pid>/map_files/* without going over /proc/<pid>/fd/<nr>.
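#define _GNU_SOURCE /* for O_PATH */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

int main(int argc, char *argv[])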
{
        char path[PATH_MAX];

        creat("/mnt/foo", 0777);
        creat("/mnt/bar", 0777);

        /*
         * For illustration use a bunch of file descriptors in the upper
         * range that are unused.
         */
        for (int i = 10000; i >= 256; i--) {
                printf("I'm: /proc/%d/\n", getpid());

                int fd2 = open("/mnt/foo", O_RDONLY);
                if (fd2 < 0) {
                        printf("%m - Failed to open\n");
                        _exit(1);
                }

                int newfd = dup2(fd2, i);
                if (newfd < 0) {
                        printf("%m - Failed to dup\n");
                        _exit(1);
                }
                close(fd2);

                sprintf(path, "/proc/%d/fd/%d", getpid(), newfd);
                int fd = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
                if (fd < 0) {
                        printf("%m - Failed to open\n");
                        _exit(3);
                }

                sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
                printf("Mounting on top of %s\n", path);
                if (mount("/mnt/bar", path, "", MS_BIND, 0)) {
                        printf("%m - Failed to mount\n");
                        _exit(4);
                }

                close(newfd);
                close(fd);
        }

        /*
         * Give some time to look at things. The mounts now linger until
         * the process exits.
         */
        sleep(10000);
        _exit(0);
}

* patches from https://lore.kernel.org/r/20240806-work-procfs-v1-0-fb04e1d09f0c@kernel.org:
  proc: block mounting on top of /proc/<pid>/fdinfo/*
  proc: block mounting on top of /proc/<pid>/fd/*
  proc: block mounting on top of /proc/<pid>/map_files/*
  proc: add proc_splice_unmountable()
  proc: proc_readfdinfo() -> proc_fdinfo_iterate()
  proc: proc_readfd() -> proc_fd_iterate()

Link: https://lore.kernel.org/r/20240806-work-procfs-v1-0-fb04e1d09f0c@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30 08:22:13 +02:00
Name | Last commit | Date
array.c | fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats | 2024-02-07 21:20:33 -08:00
base.c | Merge patch series "proc: restrict overmounting of ephemeral entities" | 2024-08-30 08:22:13 +02:00
bootconfig.c | fs/proc: Skip bootloader comment if no embedded kernel parameters | 2024-04-09 23:36:18 +09:00
cmdline.c | proc: mark /proc/cmdline as permanent | 2023-02-02 22:50:02 -08:00
consoles.c | proc: consoles: use console_list_lock for list iteration | 2022-12-02 11:25:02 +01:00
cpuinfo.c | x86/aperfmperf: Replace aperfmperf_get_khz() | 2022-04-27 20:22:19 +02:00
devices.c | proc: mark more files as permanent | 2022-10-03 14:21:45 -07:00
fd.c | proc: block mounting on top of /proc/<pid>/fdinfo/* | 2024-08-30 08:22:13 +02:00
fd.h | fs: port ->permission() to pass mnt_idmap | 2023-01-19 09:24:28 +01:00
generic.c | proc: Remove usage of the deprecated ida_simple_xx() API | 2024-06-25 11:15:47 +02:00
inode.c | mm: switch mm->get_unmapped_area() to a flag | 2024-04-25 20:56:25 -07:00
internal.h | proc: add proc_splice_unmountable() | 2024-08-30 08:22:12 +02:00
interrupts.c
Kconfig | crash: split vmcoreinfo exporting code out from crash_core.c | 2024-02-23 17:48:22 -08:00
kcore.c | crash: split vmcoreinfo exporting code out from crash_core.c | 2024-02-23 17:48:22 -08:00
kmsg.c | printk changes for 6.1 | 2022-10-10 11:24:19 -07:00
loadavg.c | proc: mark more files as permanent | 2022-10-03 14:21:45 -07:00
Makefile | kbuild: make -Woverride-init warnings more consistent | 2024-03-31 11:32:26 +09:00
meminfo.c | mm: zswap: optimize zswap pool size tracking | 2024-04-25 20:55:47 -07:00
namespaces.c
nommu.c | fs: create helper file_user_path() for user displayed mapped file path | 2023-10-19 11:03:15 +02:00
page.c | kpageflags: detect isolated KPF_THP folios | 2024-07-12 15:52:21 -07:00
proc_net.c | fs: Add kernel-doc comments to proc_create_net_data_write() | 2024-03-26 09:01:18 +01:00
proc_sysctl.c | sysctl: Warn on an empty procname element | 2024-06-13 10:50:52 +02:00
proc_tty.c | proc: delete unused <linux/uaccess.h> includes | 2022-07-17 17:31:39 -07:00
root.c | procfs: make freeing proc_fs_info rcu-delayed | 2024-02-25 02:10:32 -05:00
self.c | proc: convert to new timestamp accessors | 2023-10-18 14:08:26 +02:00
softirqs.c | proc: mark more files as permanent | 2022-10-03 14:21:45 -07:00
stat.c | proc/stat: remove arch_idle_time() | 2023-04-18 16:39:33 -07:00
task_mmu.c | Random number generator updates for Linux 6.11-rc1. | 2024-07-24 10:29:50 -07:00
task_nommu.c | vfs-6.7.misc | 2023-10-30 09:14:19 -10:00
thread_self.c | proc: convert to new timestamp accessors | 2023-10-18 14:08:26 +02:00
uptime.c | proc: mark more files as permanent | 2022-10-03 14:21:45 -07:00
util.c
version.c | proc: mark more files as permanent | 2022-10-03 14:21:45 -07:00
vmcore.c | fs/proc: fix softlockup in __read_vmcore | 2024-05-11 15:51:44 -07:00