linux/kernel
Linus Torvalds 83fa805bcb threads-v5.6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXjFo8wAKCRCRxhvAZXjc
 omaGAQDVwCHQekqxp2eC8EJH4Pkt+Bn1BLrA25stlTo93YBPHgEAsPVUCRNcrZAl
 VncYmxCfpt3Yu0S/MTVXu5xrRiIXPQk=
 =uqTN
 -----END PGP SIGNATURE-----

Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread management updates from Christian Brauner:
 "Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
  syscall.

  This syscall allows for the retrieval of file descriptors of a process
  based on its pidfd. A task needs to have ptrace_may_access()
  permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
  Andy) on the target.

  One of the main use-cases is in combination with seccomp's user
  notification feature. As a reminder, seccomp's user notification
  feature was made available in v5.0. It allows a task to retrieve a
  file descriptor for its seccomp filter. The file descriptor is usually
  handed of to a more privileged supervising process. The supervisor can
  then listen for syscall events caught by the seccomp filter of the
  supervisee and perform actions in lieu of the supervisee, usually
  emulating syscalls. pidfd_getfd() is needed to expand its uses.

  There are currently two major users that wait on pidfd_getfd() and one
  future user:

   - Netflix, Sargun said, is working on a service mesh where users
     should be able to connect to a dns-based VIP. When a user connects
     to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
     redirected to an envoy process. This service mesh uses seccomp user
     notifications and pidfd to intercept all connect calls and instead
     of connecting them to 1.2.3.4:80 connects them to e.g.
     127.0.0.1:8080.

   - LXD uses the seccomp notifier heavily to intercept and emulate
     mknod() and mount() syscalls for unprivileged containers/processes.
     With pidfd_getfd() more uses-cases e.g. bridging socket connections
     will be possible.

   - The patchset has also seen some interest from the browser corner.
     Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
     broker process. In the future glibc will start blocking all signals
     during dlopen() rendering this type of sandbox impossible. Hence,
     in the future Firefox will switch to a seccomp-user-nofication
     based sandbox which also makes use of file descriptor retrieval.
     The thread for this can be found at
     https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html

  With pidfd_getfd() it is e.g. possible to bridge socket connections
  for the supervisee (binding to a privileged port) and taking actions
  on file descriptors on behalf of the supervisee in general.

  Sargun's first version was using an ioctl on pidfds but various people
  pushed for it to be a proper syscall which he duely implemented as
  well over various review cycles. Selftests are of course included.
  I've also added instructions how to deal with merge conflicts below.

  There's also a small fix coming from the kernel mentee project to
  correctly annotate struct sighand_struct with __rcu to fix various
  sparse warnings. We've received a few more such fixes and even though
  they are mostly trivial I've decided to postpone them until after -rc1
  since they came in rather late and I don't want to risk introducing
  build warnings.

  Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
  needed to avoid allocation recursions triggerable by storage drivers
  that have userspace parts that run in the IO path (e.g. dm-multipath,
  iscsi, etc). These allocation recursions deadlock the device.

  The new prctl() allows such privileged userspace components to avoid
  allocation recursions by setting the PF_MEMALLOC_NOIO and
  PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
  relevant maintainers and is routed here as part of prctl()
  thread-management."

* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
  sched.h: Annotate sighand_struct with __rcu
  test: Add test for pidfd getfd
  arch: wire up pidfd_getfd syscall
  pid: Implement pidfd_getfd syscall
  vfs, fdtable: Add fget_task helper
2020-01-29 19:38:34 -08:00
..
bpf Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-01-29 11:20:24 -08:00
cgroup Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-01-28 16:02:33 -08:00
configs
debug kdb: Tweak escape handling for vi users 2019-10-28 12:08:29 +00:00
dma lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr 2019-12-04 19:44:13 -08:00
events Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-01-29 11:20:24 -08:00
gcov Revert "um: Enable CONFIG_CONSTRUCTORS" 2020-01-19 22:42:06 +01:00
irq irqchip updates for Linux 5.6: 2020-01-24 20:08:51 +01:00
livepatch New tracing features: 2019-11-27 11:42:01 -08:00
locking Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-01-28 09:33:25 -08:00
power Merge back new material related to system-wide PM for v5.6. 2020-01-23 16:00:56 +01:00
printk printk: fix exclusive_console replaying 2020-01-02 16:15:04 +01:00
rcu rcu: Forgive slow expedited grace periods at boot time 2020-01-25 12:00:40 -08:00
sched Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-01-28 10:07:09 -08:00
time y2038: core, driver and file system changes 2020-01-29 14:55:47 -08:00
trace Kprobe events added "ustring" to distinguish reading strings from kernel space 2020-01-28 18:26:09 -08:00
.gitignore
acct.c acct: stop using get_seconds() 2019-12-18 18:07:31 +01:00
async.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
audit_fsnotify.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
audit_tree.c
audit_watch.c audit_get_nd(): don't unlock parent too early 2019-11-10 11:56:55 -05:00
audit.c audit: Add __rcu annotation to RCU pointer 2019-12-09 15:19:03 -05:00
audit.h audit/stable-5.3 PR 20190702 2019-07-08 18:55:42 -07:00
auditfilter.c audit/stable-5.3 PR 20190702 2019-07-08 18:55:42 -07:00
auditsc.c Revert "bpf: Emit audit messages upon successful prog load and unload" 2019-11-23 09:56:02 -08:00
backtracetest.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
bounds.c
capability.c
compat.c y2038: itimer: compat handling to itimer.c 2019-11-15 14:38:30 +01:00
configs.c kernel/configs: Replace GPL boilerplate code with SPDX identifier 2019-07-30 18:34:15 +02:00
context_tracking.c context_tracking: Rename context_tracking_is_enabled() => context_tracking_enabled() 2019-10-29 10:01:12 +01:00
cpu_pm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282 2019-06-05 17:36:37 +02:00
cpu.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-01-28 10:07:09 -08:00
crash_core.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230 2019-06-19 17:09:06 +02:00
crash_dump.c
cred.c Merge branch 'dhowells' (patches from DavidH) 2020-01-14 09:56:31 -08:00
delayacct.c
dma.c
elfcore.c kernel/elfcore.c: include proper prototypes 2019-09-25 17:51:39 -07:00
exec_domain.c
exit.c for-linus-2020-01-03 2020-01-03 11:17:14 -08:00
extable.c bpf: Allow to resolve bpf trampoline and dispatcher in unwind 2020-01-25 07:12:40 -08:00
fail_function.c fail_function: no need to check return value of debugfs_create functions 2019-06-03 15:49:06 +02:00
fork.c The timekeeping and timers departement provides: 2020-01-27 16:47:05 -08:00
freezer.c Revert "libata, freezer: avoid block device removal while system is frozen" 2019-10-06 09:11:37 -06:00
futex.c futex: Fix kernel-doc notation warning 2020-01-09 13:23:40 +01:00
gen_kheaders.sh kheaders: explain why include/config/autoconf.h is excluded from md5sum 2019-11-11 20:10:01 +09:00
groups.c
hung_task.c
iomem.c mm/nvdimm: add is_ioremap_addr and use that to check ioremap address 2019-07-12 11:05:40 -07:00
irq_work.c irq_work: Fix IRQ_WORK_BUSY bit clearing 2019-11-15 10:48:37 +01:00
jump_label.c jump_label: Don't warn on __exit jump entries 2019-08-29 15:10:10 +01:00
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-08-27 16:19:56 +01:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks sched/rt, locking: Use CONFIG_PREEMPTION 2019-12-08 14:37:36 +01:00
Kconfig.preempt sched/Kconfig: Fix spelling mistake in user-visible help text 2019-11-12 11:35:32 +01:00
kcov.c kcov: remote coverage support 2019-12-04 19:44:14 -08:00
kexec_core.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec_elf.c kexec_elf: support 32 bit ELF files 2019-09-06 23:58:44 +02:00
kexec_file.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec_internal.h kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kheaders.c
kmod.c
kprobes.c kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic 2020-01-09 12:40:13 +01:00
ksysfs.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 170 2019-05-30 11:26:39 -07:00
kthread.c kthread: make __kthread_queue_delayed_work static 2019-10-16 09:20:58 -07:00
latencytop.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
Makefile Kbuild updates for v5.5 2019-12-02 17:35:04 -08:00
module_signature.c MODSIGN: Export module signature definitions 2019-08-05 18:39:56 -04:00
module_signing.c MODSIGN: Export module signature definitions 2019-08-05 18:39:56 -04:00
module-internal.h
module.c Merge branch 'core/kprobes' into perf/core, to pick up a completed branch 2019-12-25 10:43:08 +01:00
notifier.c kernel/notifier.c: remove blocking_notifier_chain_cond_register() 2019-12-04 19:44:12 -08:00
nsproxy.c ns: Introduce Time Namespace 2020-01-14 12:20:48 +01:00
padata.c padata: update documentation 2019-12-11 16:37:02 +08:00
panic.c locking/refcount: Remove unused 'refcount_error_report()' function 2019-11-25 09:15:42 +01:00
params.c lockdown: Lock down module params that specify hardware parameters (eg. ioport) 2019-08-19 21:54:16 -07:00
pid_namespace.c fork: extend clone3() to support setting a PID 2019-11-15 23:49:22 +01:00
pid.c pid: Implement pidfd_getfd syscall 2020-01-13 21:49:36 +01:00
profile.c kernel/profile.c: use cpumask_available to check for NULL cpumask 2019-12-04 19:44:12 -08:00
ptrace.c ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() 2020-01-18 13:51:39 +01:00
range.c
reboot.c
relay.c
resource.c mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range() 2019-09-24 15:54:09 -07:00
rseq.c rseq: Reject unknown flags on rseq unregister 2019-12-25 10:41:20 +01:00
seccomp.c seccomp: Check that seccomp_notif is zeroed out by the user 2020-01-02 13:03:45 -08:00
signal.c sched.h: Annotate sighand_struct with __rcu 2020-01-26 10:54:47 +01:00
smp.c smp: Remove allocation mask from on_each_cpu_cond.*() 2020-01-24 20:40:09 +01:00
smpboot.c
smpboot.h
softirq.c Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 11:01:13 -07:00
stackleak.c
stacktrace.c stacktrace: Get rid of unneeded '!!' pattern 2019-11-11 10:30:59 +01:00
stop_machine.c stop_machine: Make stop_cpus() static 2020-01-17 10:19:21 +01:00
sys_ni.c y2038: allow disabling time32 system calls 2019-11-15 14:38:30 +01:00
sys.c prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim 2020-01-28 10:09:51 +01:00
sysctl_binary.c sysctl: Remove the sysctl system call 2019-11-26 13:03:56 -06:00
sysctl-test.c kunit: allow kunit tests to be loaded as a module 2020-01-09 16:42:29 -07:00
sysctl.c rcu: Make PREEMPT_RCU be a modifier to TREE_RCU 2019-12-09 12:37:51 -08:00
task_work.c
taskstats.c taskstats: fix data-race 2019-12-04 15:18:39 +01:00
test_kprobes.c
torture.c torture: Remove exporting of internal functions 2019-08-01 14:30:22 -07:00
tracepoint.c The main changes in this release include: 2019-07-18 11:51:00 -07:00
tsacct.c tsacct: add 64-bit btime field 2019-12-18 18:07:31 +01:00
ucount.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
uid16.c
uid16.h
umh.c
up.c smp: Remove allocation mask from on_each_cpu_cond.*() 2020-01-24 20:40:09 +01:00
user_namespace.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
user-return-notifier.c
user.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
utsname_sysctl.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
utsname.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
watchdog_hld.c
watchdog.c watchdog/softlockup: Enforce that timestamp is valid on boot 2020-01-17 11:19:22 +01:00
workqueue_internal.h
workqueue.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-01-28 10:07:09 -08:00