linux/kernel
Peter Xu cefdca0a86 userfaultfd/sysctl: add vm.unprivileged_userfaultfd
Userfaultfd can be misued to make it easier to exploit existing
use-after-free (and similar) bugs that might otherwise only make a
short window or race condition available.  By using userfaultfd to
stall a kernel thread, a malicious program can keep some state that it
wrote, stable for an extended period, which it can then access using an
existing exploit.  While it doesn't cause the exploit itself, and while
it's not the only thing that can stall a kernel thread when accessing a
memory location, it's one of the few that never needs privilege.

We can add a flag, allowing userfaultfd to be restricted, so that in
general it won't be useable by arbitrary user programs, but in
environments that require userfaultfd it can be turned back on.

Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
whether userfaultfd is allowed by unprivileged users.  When this is
set to zero, only privileged users (root user, or users with the
CAP_SYS_PTRACE capability) will be able to use the userfaultfd
syscalls.

Andrea said:

: The only difference between the bpf sysctl and the userfaultfd sysctl
: this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
: requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
: because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
: already if it's doing other kind of tracking on processes runtime, in
: addition of userfaultfd.  In other words both syscalls works only for
: root, when the two sysctl are opt-in set to 1.

[dgilbert@redhat.com: changelog additions]
[akpm@linux-foundation.org: documentation tweak, per Mike]
Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
cgroup Merge branch 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2019-05-09 13:52:12 -07:00
configs kvm_config: add CONFIG_VIRTIO_MENU 2018-10-24 20:55:56 -04:00
debug kdb: use bool for binary state indicators 2018-12-30 08:31:52 +00:00
dma DMA mapping updates for 5.2 2019-05-09 08:40:55 -07:00
events Printk changes for 5.2 2019-05-07 09:18:12 -07:00
gcov kernel/gcov/gcc_3_4.c: use struct_size() in kzalloc() 2019-03-07 18:32:02 -08:00
irq Driver core/kobject patches for 5.2-rc1 2019-05-07 13:01:40 -07:00
livepatch Driver core/kobject patches for 5.2-rc1 2019-05-07 13:01:40 -07:00
locking Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) 2019-05-06 16:57:52 -07:00
power Power management updates for 5.2-rc1 2019-05-06 19:40:31 -07:00
printk fbdev changes for v5.1: 2019-03-15 14:22:59 -07:00
rcu Printk changes for 5.2 2019-05-07 09:18:12 -07:00
sched Driver core/kobject patches for 5.2-rc1 2019-05-07 13:01:40 -07:00
time Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
trace This is the bulk of the GPIO changes for the v5.2 kernel cycle: 2019-05-11 10:54:43 -04:00
.gitignore Provide in-kernel headers to make extending kernel easier 2019-04-29 16:48:03 +02:00
acct.c acct_on(): don't mess with freeze protection 2019-04-04 21:04:13 -04:00
async.c treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively 2019-04-09 14:19:06 +02:00
audit_fsnotify.c audit_compare_dname_path(): switch to const struct qstr * 2019-04-28 20:33:43 -04:00
audit_tree.c fsnotify: switch send_to_group() and ->handle_event to const struct qstr * 2019-04-26 13:51:03 -04:00
audit_watch.c audit_compare_dname_path(): switch to const struct qstr * 2019-04-28 20:33:43 -04:00
audit.c audit: connect LOGIN record to its syscall record 2019-03-20 20:57:48 -04:00
audit.h audit_compare_dname_path(): switch to const struct qstr * 2019-04-28 20:33:43 -04:00
auditfilter.c Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-05-07 20:03:32 -07:00
auditsc.c Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-05-07 20:03:32 -07:00
backtracetest.c backtrace-test: Simplify stack trace handling 2019-04-29 12:37:47 +02:00
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-10-31 08:54:14 -07:00
capability.c LSM: add SafeSetID module that gates setid calls 2019-01-25 11:22:43 -08:00
compat.c time: make adjtime compat handling available for 32 bit 2019-02-07 00:13:27 +01:00
configs.c kernel/configs: use .incbin directive to embed config_data.gz 2019-03-07 18:32:02 -08:00
context_tracking.c
cpu_pm.c
cpu.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-05-06 14:50:46 -07:00
crash_core.c kexec: export PG_offline to VMCOREINFO 2019-03-05 21:07:14 -08:00
crash_dump.c
cred.c SELinux: Remove cred security blob poisoning 2019-01-08 13:18:44 -08:00
delayacct.c delayacct: track delays from thrashing cache pages 2018-10-26 16:26:32 -07:00
dma.c
elfcore.c
exec_domain.c
exit.c Merge branch 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2019-03-07 10:11:41 -08:00
extable.c
fail_function.c treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively 2019-04-09 14:19:06 +02:00
fork.c fork: do not release lock that wasn't taken 2019-05-10 14:26:12 +02:00
freezer.c
futex.c locking/futex: Allow low-level atomic operations to return -EAGAIN 2019-04-26 13:57:31 +01:00
gen_ikh_data.sh Provide in-kernel headers to make extending kernel easier 2019-04-29 16:48:03 +02:00
groups.c
hung_task.c kernel/hung_task.c: Use continuously blocked time when reporting. 2019-03-07 18:31:59 -08:00
iomem.c mm/resource: Use resource_overlaps() to simplify region_intersects() 2019-04-19 12:59:36 +02:00
irq_work.c irq_work: Do not raise an IPI when queueing work on the local CPU 2019-04-18 14:07:52 +02:00
jump_label.c locking/static_key: Don't take sleeping locks in __static_key_slow_dec_deferred() 2019-04-29 08:29:21 +02:00
kallsyms.c bpf: Add module name [bpf] to ksymbols for bpf programs 2019-01-21 17:38:56 -03:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) 2019-05-06 16:57:52 -07:00
Kconfig.preempt kconfig: warn no new line at end of file 2018-12-15 17:44:35 +09:00
kcov.c kcov: convert kcov.refcount to refcount_t 2019-03-07 18:32:02 -08:00
kexec_core.c power/suspend: Add function to disable secondaries for suspend 2019-05-03 19:42:41 +02:00
kexec_file.c crypto: shash - remove shash_desc::flags 2019-04-25 15:38:12 +08:00
kexec_internal.h
kexec.c
kheaders.c Provide in-kernel headers to make extending kernel easier 2019-04-29 16:48:03 +02:00
kmod.c
kprobes.c kprobes: Fix error check when reusing optimized probes 2019-04-16 09:38:16 +02:00
ksysfs.c
kthread.c Merge branch 'akpm' (patches from Andrew) 2019-03-06 10:31:36 -08:00
latencytop.c latency_top: Simplify stack trace handling 2019-04-29 12:37:48 +02:00
Makefile Driver core/kobject patches for 5.2-rc1 2019-05-07 13:01:40 -07:00
memremap.c mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm 2018-12-28 12:11:52 -08:00
module_signing.c modsign: use all trusted keys to verify module signature 2018-11-07 14:41:41 +01:00
module-internal.h
module.c Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-05-09 12:54:40 -07:00
notifier.c
nsproxy.c
padata.c padata: Replace padata_attr_type default_attrs field with groups 2019-04-25 22:06:11 +02:00
panic.c s390: simplify disabled_wait 2019-05-02 13:54:11 +02:00
params.c
pid_namespace.c
pid.c Fix failure path in alloc_pid() 2018-12-28 12:42:30 -08:00
profile.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
ptrace.c ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK 2019-03-29 10:01:37 -07:00
range.c
reboot.c
relay.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 13:27:20 -07:00
resource.c mm/resource: Use resource_overlaps() to simplify region_intersects() 2019-04-19 12:59:36 +02:00
rseq.c rseq: Remove superfluous rseq_len from task_struct 2019-04-19 12:39:32 +02:00
seccomp.c audit/stable-5.2 PR 20190507 2019-05-07 19:06:04 -07:00
signal.c Merge branch 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2019-05-09 13:52:12 -07:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-01-30 19:27:00 +01:00
smpboot.c
smpboot.h
softirq.c softirq: Remove tasklet_hrtimer 2019-03-22 14:36:02 +01:00
stackleak.c stackleak: Mark stackleak_track_stack() as notrace 2018-12-05 19:31:44 -08:00
stacktrace.c stacktrace: Provide common infrastructure 2019-04-29 12:37:57 +02:00
stop_machine.c treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively 2019-04-09 14:19:06 +02:00
sys_ni.c signal: support CLONE_PIDFD with pidfd_send_signal 2019-05-07 14:31:03 +02:00
sys.c kernel/sys.c: prctl: fix false positive in validate_prctl_map() 2019-05-14 09:47:44 -07:00
sysctl_binary.c kernel/sysctl: add panic_print into sysctl 2019-01-04 13:13:47 -08:00
sysctl.c userfaultfd/sysctl: add vm.unprivileged_userfaultfd 2019-05-14 09:47:45 -07:00
task_work.c
taskstats.c genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
test_kprobes.c
torture.c torture: Don't try to offline the last CPU 2019-03-26 14:42:53 -07:00
tracepoint.c tracing: Replace synchronize_sched() and call_rcu_sched() 2018-11-27 09:21:41 -08:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c umh: add exit routine for UMH process 2019-01-11 18:05:40 -08:00
up.c
user_namespace.c userns: also map extents in the reverse map to kernel IDs 2018-11-07 23:51:16 -06:00
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog_hld.c kernel/watchdog_hld.c: hard lockup message should end with a newline 2019-04-19 09:46:05 -07:00
watchdog.c watchdog: Fix typo in comment 2019-04-18 14:05:51 +02:00
workqueue_internal.h sched/core, workqueues: Distangle worker accounting from rq lock 2019-04-16 16:55:15 +02:00
workqueue.c Merge branch 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2019-05-09 13:48:52 -07:00