linux/kernel
Eric W. Biederman 63f818f46a proc: Use a dedicated lock in struct pid
syzbot wrote:
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 5.6.0-syzkaller #0 Not tainted
> --------------------------------------------------------
> swapper/1/0 just changed the state of lock:
> ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigurg+0x9f/0x320 fs/fcntl.c:840
> but this lock took another, SOFTIRQ-unsafe lock in the past:
>  (&pid->wait_pidfd){+.+.}-{2:2}
>
>
> and interrupts could create inverse lock ordering between them.
>
>
> other info that might help us debug this:
>  Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&pid->wait_pidfd);
>                                local_irq_disable();
>                                lock(tasklist_lock);
>                                lock(&pid->wait_pidfd);
>   <Interrupt>
>     lock(tasklist_lock);
>
>  *** DEADLOCK ***
>
> 4 locks held by swapper/1/0:

The problem is that because wait_pidfd.lock is taken under the tasklist
lock.  It must always be taken with irqs disabled as tasklist_lock can be
taken from interrupt context and if wait_pidfd.lock was already taken this
would create a lock order inversion.

Oleg suggested just disabling irqs where I have added extra calls to
wait_pidfd.lock.  That should be safe and I think the code will eventually
do that.  It was rightly pointed out by Christian that sharing the
wait_pidfd.lock was a premature optimization.

It is also true that my pre-merge window testing was insufficient.  So
remove the premature optimization and give struct pid a dedicated lock of
it's own for struct pid things.  I have verified that lockdep sees all 3
paths where we take the new pid->lock and lockdep does not complain.

It is my current day dream that one day pid->lock can be used to guard the
task lists as well and then the tasklist_lock won't need to be held to
deliver signals.  That will require taking pid->lock with irqs disabled.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/00000000000011d66805a25cd73f@google.com/
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: syzbot+343f75cdeea091340956@syzkaller.appspotmail.com
Reported-by: syzbot+832aabf700bc3ec920b9@syzkaller.appspotmail.com
Reported-by: syzbot+f675f964019f884dbd0f@syzkaller.appspotmail.com
Reported-by: syzbot+a9fb1457d720a55d6dc5@syzkaller.appspotmail.com
Fixes: 7bc3e6e55a ("proc: Use a list of inodes to flush from proc")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-04-09 12:15:35 -05:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-02-08 17:15:08 -08:00
cgroup Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2020-02-10 17:07:05 -08:00
configs
debug Revert "kdb: Get rid of confusing diag msg from "rd" if current task has no regs" 2020-02-06 11:40:09 +00:00
dma lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr 2019-12-04 19:44:13 -08:00
events perf: Use new infrastructure to fix deadlocks in execve 2020-03-25 10:04:01 -05:00
gcov Revert "um: Enable CONFIG_CONSTRUCTORS" 2020-01-19 22:42:06 +01:00
irq A set of fixes for X86: 2020-02-09 12:11:12 -08:00
livepatch New tracing features: 2019-11-27 11:42:01 -08:00
locking proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
power ACPI: PM: s2idle: Avoid possible race related to the EC GPE 2020-02-11 10:11:02 +01:00
printk printk: fix exclusive_console replaying 2020-01-02 16:15:04 +01:00
rcu rcu: Forgive slow expedited grace periods at boot time 2020-01-25 12:00:40 -08:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-02-15 12:51:22 -08:00
time Two small fixes for the time(r) subsystem: 2020-02-09 12:00:12 -08:00
trace Various fixes: 2020-02-11 16:39:18 -08:00
.gitignore
acct.c acct: stop using get_seconds() 2019-12-18 18:07:31 +01:00
async.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
audit_fsnotify.c
audit_tree.c
audit_watch.c audit_get_nd(): don't unlock parent too early 2019-11-10 11:56:55 -05:00
audit.c audit: Add __rcu annotation to RCU pointer 2019-12-09 15:19:03 -05:00
audit.h audit/stable-5.3 PR 20190702 2019-07-08 18:55:42 -07:00
auditfilter.c audit/stable-5.3 PR 20190702 2019-07-08 18:55:42 -07:00
auditsc.c Revert "bpf: Emit audit messages upon successful prog load and unload" 2019-11-23 09:56:02 -08:00
backtracetest.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
bounds.c
capability.c
compat.c y2038: itimer: compat handling to itimer.c 2019-11-15 14:38:30 +01:00
configs.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
context_tracking.c context_tracking: Rename context_tracking_is_enabled() => context_tracking_enabled() 2019-10-29 10:01:12 +01:00
cpu_pm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282 2019-06-05 17:36:37 +02:00
cpu.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-01-28 10:07:09 -08:00
crash_core.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230 2019-06-19 17:09:06 +02:00
crash_dump.c
cred.c kernel: doc: remove outdated comment cred.c 2020-03-25 10:04:01 -05:00
delayacct.c
dma.c
elfcore.c kernel/elfcore.c: include proper prototypes 2019-09-25 17:51:39 -07:00
exec_domain.c
exit.c proc: Use a list of inodes to flush from proc 2020-02-24 10:14:44 -06:00
extable.c bpf: Allow to resolve bpf trampoline and dispatcher in unwind 2020-01-25 07:12:40 -08:00
fail_function.c fail_function: no need to check return value of debugfs_create functions 2019-06-03 15:49:06 +02:00
fork.c exec: Fix a deadlock in strace 2020-03-25 10:04:00 -05:00
freezer.c Revert "libata, freezer: avoid block device removal while system is frozen" 2019-10-06 09:11:37 -06:00
futex.c futex: Fix kernel-doc notation warning 2020-01-09 13:23:40 +01:00
gen_kheaders.sh kheaders: explain why include/config/autoconf.h is excluded from md5sum 2019-11-11 20:10:01 +09:00
groups.c
hung_task.c
iomem.c mm/nvdimm: add is_ioremap_addr and use that to check ioremap address 2019-07-12 11:05:40 -07:00
irq_work.c irq_work: Fix IRQ_WORK_BUSY bit clearing 2019-11-15 10:48:37 +01:00
jump_label.c jump_label: Don't warn on __exit jump entries 2019-08-29 15:10:10 +01:00
kallsyms.c Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
kcmp.c kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve 2020-03-25 10:04:01 -05:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks sched/rt, locking: Use CONFIG_PREEMPTION 2019-12-08 14:37:36 +01:00
Kconfig.preempt sched/Kconfig: Fix spelling mistake in user-visible help text 2019-11-12 11:35:32 +01:00
kcov.c kcov: remote coverage support 2019-12-04 19:44:14 -08:00
kexec_core.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec_elf.c kexec_elf: support 32 bit ELF files 2019-09-06 23:58:44 +02:00
kexec_file.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec_internal.h kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kheaders.c
kmod.c
kprobes.c kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic 2020-01-09 12:40:13 +01:00
ksysfs.c
kthread.c kthread: make __kthread_queue_delayed_work static 2019-10-16 09:20:58 -07:00
latencytop.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
Makefile kcov: ignore fault-inject and stacktrace 2020-01-31 10:30:41 -08:00
module_signature.c MODSIGN: Export module signature definitions 2019-08-05 18:39:56 -04:00
module_signing.c MODSIGN: Export module signature definitions 2019-08-05 18:39:56 -04:00
module-internal.h
module.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
notifier.c kernel/notifier.c: remove blocking_notifier_chain_cond_register() 2019-12-04 19:44:12 -08:00
nsproxy.c ns: Introduce Time Namespace 2020-01-14 12:20:48 +01:00
padata.c padata: update documentation 2019-12-11 16:37:02 +08:00
panic.c locking/refcount: Remove unused 'refcount_error_report()' function 2019-11-25 09:15:42 +01:00
params.c lockdown: Lock down module params that specify hardware parameters (eg. ioport) 2019-08-19 21:54:16 -07:00
pid_namespace.c pid: Improve the comment about waiting in zap_pid_ns_processes 2020-02-28 16:29:12 -06:00
pid.c proc: Use a dedicated lock in struct pid 2020-04-09 12:15:35 -05:00
profile.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
ptrace.c ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() 2020-01-18 13:51:39 +01:00
range.c
reboot.c
relay.c
resource.c mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range() 2019-09-24 15:54:09 -07:00
rseq.c rseq: Reject unknown flags on rseq unregister 2019-12-25 10:41:20 +01:00
seccomp.c seccomp: Check that seccomp_notif is zeroed out by the user 2020-01-02 13:03:45 -08:00
signal.c signal: Extend exec_id to 64bits 2020-04-01 12:04:24 -05:00
smp.c smp: Remove superfluous cond_func check in smp_call_function_many_cond() 2020-01-28 15:43:00 +01:00
smpboot.c
smpboot.h
softirq.c Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 11:01:13 -07:00
stackleak.c
stacktrace.c stacktrace: Get rid of unneeded '!!' pattern 2019-11-11 10:30:59 +01:00
stop_machine.c stop_machine: Make stop_cpus() static 2020-01-17 10:19:21 +01:00
sys_ni.c y2038: allow disabling time32 system calls 2019-11-15 14:38:30 +01:00
sys.c prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim 2020-01-28 10:09:51 +01:00
sysctl_binary.c sysctl: Remove the sysctl system call 2019-11-26 13:03:56 -06:00
sysctl-test.c kunit: allow kunit tests to be loaded as a module 2020-01-09 16:42:29 -07:00
sysctl.c rcu: Make PREEMPT_RCU be a modifier to TREE_RCU 2019-12-09 12:37:51 -08:00
task_work.c
taskstats.c taskstats: fix data-race 2019-12-04 15:18:39 +01:00
test_kprobes.c
torture.c torture: Remove exporting of internal functions 2019-08-01 14:30:22 -07:00
tracepoint.c The main changes in this release include: 2019-07-18 11:51:00 -07:00
tsacct.c tsacct: add 64-bit btime field 2019-12-18 18:07:31 +01:00
ucount.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
uid16.c
uid16.h
umh.c
up.c smp/up: Make smp_call_function_single() match SMP semantics 2020-02-07 15:34:12 +01:00
user_namespace.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
user-return-notifier.c
user.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
utsname_sysctl.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
utsname.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
watchdog_hld.c
watchdog.c watchdog/softlockup: Enforce that timestamp is valid on boot 2020-01-17 11:19:22 +01:00
workqueue_internal.h
workqueue.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-01-28 10:07:09 -08:00