linux/include
Adrian Reber 49cb2fc42c fork: extend clone3() to support setting a PID
The main motivation to add set_tid to clone3() is CRIU.

To restore a process with the same PID/TID CRIU currently uses
/proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
ns_last_pid and then (quickly) does a clone(). This works most of the
time, but it is racy. It is also slow as it requires multiple syscalls.

Extending clone3() to support *set_tid makes it possible restore a
process using CRIU without accessing /proc/sys/kernel/ns_last_pid and
race free (as long as the desired PID/TID is available).

This clone3() extension places the same restrictions (CAP_SYS_ADMIN)
on clone3() with *set_tid as they are currently in place for ns_last_pid.

The original version of this change was using a single value for
set_tid. At the 2019 LPC, after presenting set_tid, it was, however,
decided to change set_tid to an array to enable setting the PID of a
process in multiple PID namespaces at the same time. If a process is
created in a PID namespace it is possible to influence the PID inside
and outside of the PID namespace. Details also in the corresponding
selftest.

To create a process with the following PIDs:

      PID NS level         Requested PID
        0 (host)              31496
        1                        42
        2                         1

For that example the two newly introduced parameters to struct
clone_args (set_tid and set_tid_size) would need to be:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid[2] = 31496;
  set_tid_size = 3;

If only the PIDs of the two innermost nested PID namespaces should be
defined it would look like this:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid_size = 2;

The PID of the newly created process would then be the next available
free PID in the PID namespace level 0 (host) and 42 in the PID namespace
at level 1 and the PID of the process in the innermost PID namespace
would be 1.

The set_tid array is used to specify the PID of a process starting
from the innermost nested PID namespaces up to set_tid_size PID namespaces.

set_tid_size cannot be larger then the current PID namespace level.

Signed-off-by: Adrian Reber <areber@redhat.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20191115123621.142252-1-areber@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-11-15 23:49:22 +01:00
..
acpi ACPI updates for 5.4-rc1 2019-09-17 19:31:36 -07:00
asm-generic Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
clocksource
crypto Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity 2019-09-27 19:37:27 -07:00
drm drm: Measure Self Refresh Entry/Exit times to avoid thrashing 2019-09-19 10:03:32 -04:00
dt-bindings Main MIPS changes for v5.4: 2019-09-22 09:30:30 -07:00
keys
kvm
linux fork: extend clone3() to support setting a PID 2019-11-15 23:49:22 +01:00
math-emu nds32: Mark expected switch fall-throughs 2019-08-29 11:06:56 -05:00
media
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2019-09-27 20:15:00 +02:00
pcmcia
ras
rdma RDMA subsystem updates for 5.4 2019-09-21 10:26:24 -07:00
scsi SCSI misc on 20190919 2019-09-21 10:50:15 -07:00
soc Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
sound ASoC: Updates for v5.4 2019-09-10 13:03:08 +02:00
target
trace Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-10-05 08:50:15 -07:00
uapi fork: extend clone3() to support setting a PID 2019-11-15 23:49:22 +01:00
vdso
video
xen xen: fixes and cleanups for 5.4-rc2 2019-10-04 11:13:09 -07:00
Kbuild - New Drivers 2019-09-23 19:37:49 -07:00