linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-27 22:51:35 +00:00

Paul E. McKenney 80b3fd474c rcu: Make rcu_barrier() no longer block CPU-hotplug operations

This commit removes the cpus_read_lock() and cpus_read_unlock() calls from rcu_barrier(), thus allowing CPUs to come and go during the course of rcu_barrier() execution. Posting of the ->barrier_head callbacks does synchronize with portions of RCU's CPU-hotplug notifiers, but these locks are held for short time periods on both sides. Thus, full CPU-hotplug operations could both start and finish during the execution of a given rcu_barrier() invocation.

Additional synchronization is provided by a global ->barrier_lock. Since the ->barrier_lock is only used during rcu_barrier() execution and during onlining/offlining a CPU, the contention for this lock should be low. It might be tempting to make use of a per-CPU lock just on general principles, but straightforward attempts to do this have the problems shown below.

Initial state: 3 CPUs present, CPU0 and CPU1 do not have any callback and CPU2 has callbacks.

1. CPU0 calls rcu_barrier().

2. CPU1 starts offlining for CPU2. CPU1 calls rcutree_migrate_callbacks(). rcu_barrier_entrain() is called from rcutree_migrate_callbacks(), with CPU2's rdp->barrier_lock. It does not entrain ->barrier_head for CPU2, as rcu_barrier() on CPU0 hasn't started the barrier sequence (by calling rcu_seq_start(&rcu_state.barrier_sequence)) yet.

3. CPU0 starts new barrier sequence. It iterates over CPU0 and CPU1, after acquiring their per-cpu ->barrier_lock and finds 0 segcblist length. It updates ->barrier_seq_snap for CPU0 and CPU1 and continues loop iteration to CPU2.

    for_each_possible_cpu(cpu) {
        raw_spin_lock_irqsave(&rdp->barrier_lock, flags);
        if (!rcu_segcblist_n_cbs(&rdp->cblist)) {
            WRITE_ONCE(rdp->barrier_seq_snap, gseq);
            raw_spin_unlock_irqrestore(&rdp->barrier_lock, flags);
            rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence);
            continue;
        }

4. rcutree_migrate_callbacks() completes execution on CPU1. Segcblist len for CPU2 becomes 0.

5. The loop iteration on CPU0, checks rcu_segcblist_n_cbs(&rdp->cblist) for CPU2 and completes the loop iteration after setting ->barrier_seq_snap.

6. As there isn't any ->barrier_head callback entrained; at this point, rcu_barrier() in CPU0 returns.

7. The callbacks, which migrated from CPU2 to CPU1, execute.

Straightforward per-CPU locking is also subject to the following race condition noted by Boqun Feng:

1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking rcu_seq_start() and init_completion(), but does not yet initialize rcu_state.barrier_cpu_count.

2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(), which in turn calls rcu_barrier_entrain() holding CPU2's rdp->barrier_lock. It then entrains ->barrier_head for CPU2 and atomically increments rcu_state.barrier_cpu_count, which is unfortunately not yet initialized to the value 2.

3. The just-entrained RCU callback is invoked. It atomically decrements rcu_state.barrier_cpu_count and sees that it is now zero. This callback therefore invokes complete().

4. CPU0 continues executing rcu_barrier(), but is not blocked by its call to wait_for_completion(). This results in rcu_barrier() returning before all pre-existing callbacks have been invoked, which is a bug.

Therefore, synchronization is provided by rcu_state.barrier_lock, which is also held across the initialization sequence, especially the rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count to the value 2. In addition, this lock is held when entraining the rcu_barrier() callback, when deciding whether or not a CPU has callbacks that rcu_barrier() must wait on, when setting the ->qsmaskinitnext for incoming CPUs, and when migrating callbacks from a CPU that is going offline.

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2022-02-08 10:12:28 -08:00
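
The second race in the commit message above (the one noted by Boqun Feng) comes down to the order in which a counter is initialized, incremented, and decremented. The sketch below replays that ordering in plain user-space C as a single-threaded, deterministic model. It is not kernel code: `struct barrier_model` and every `model_*` function are hypothetical names invented for illustration, and the starting count of 0 in the racy case is an assumption standing in for the "not yet initialized" state the commit describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the rcu_state fields named in the commit. */
struct barrier_model {
	int barrier_cpu_count;	/* models rcu_state.barrier_cpu_count */
	bool completed;		/* models the completion rcu_barrier() waits on */
};

/* Models the atomic_set() that sets the count to 2 during initialization. */
static void model_init_count(struct barrier_model *m)
{
	m->barrier_cpu_count = 2;
}

/* Models entraining a ->barrier_head callback: the count goes up by one. */
static void model_entrain(struct barrier_model *m)
{
	m->barrier_cpu_count++;
}

/* Models the entrained callback running: decrement, complete on zero. */
static void model_barrier_callback(struct barrier_model *m)
{
	assert(m->barrier_cpu_count > 0);
	if (--m->barrier_cpu_count == 0)
		m->completed = true;
}

/*
 * Ordering from steps 1-4 of the second race: the entrain and the callback
 * both run before the initiator sets the count.
 * Returns true if the completion fired prematurely.
 */
static bool model_racy_order(void)
{
	/* Assumption: the not-yet-initialized count happens to be 0. */
	struct barrier_model m = { .barrier_cpu_count = 0, .completed = false };
	bool premature;

	model_entrain(&m);		/* step 2: 0 -> 1 */
	model_barrier_callback(&m);	/* step 3: 1 -> 0, completion fires */
	premature = m.completed;
	model_init_count(&m);		/* step 4: too late, waiter is not blocked */
	return premature;
}

/*
 * Ordering enforced when one lock covers both initialization and entraining:
 * the count is set first, so a single callback cannot drive it to zero.
 */
static bool model_locked_order(void)
{
	struct barrier_model m = { .barrier_cpu_count = 0, .completed = false };

	model_init_count(&m);		/* count = 2 */
	model_entrain(&m);		/* 2 -> 3 */
	model_barrier_callback(&m);	/* 3 -> 2, completion does not fire */
	return m.completed;
}
```

In this model, `model_racy_order()` returns true (the completion fires before initialization finishes) and `model_locked_order()` returns false. The model only captures ordering; it says nothing about how the kernel actually takes or releases rcu_state.barrier_lock.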
..
bpf	bpf: Fix ringbuf memory type confusion when passing to helpers	2022-01-19 01:21:46 +01:00
cgroup	Merge branch 'for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2022-01-11 09:14:37 -08:00
configs	configs: introduce debug.config for CI-like setup	2022-01-20 08:52:55 +02:00
debug	kdb: Adopt scheduler's task classification	2021-11-03 17:21:37 +00:00
dma	hyperv-next for 5.17	2022-01-16 15:53:00 +02:00
entry	entry: Snapshot thread flags	2021-12-01 00:06:43 +01:00
events	Peter Zijlstra says:	2022-01-12 16:26:58 -08:00
futex	Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2022-01-17 05:49:30 +02:00
gcov	gcov: Remove compiler version check	2021-12-02 17:25:21 +09:00
irq	proc: remove PDE_DATA() completely	2022-01-22 08:33:37 +02:00
kcsan	KCSAN updates for v5.17	2022-01-11 09:51:26 -08:00
livepatch	Livepatching changes for 5.17	2022-01-16 10:08:13 +02:00
locking	locking/rwlocks: introduce write_lock_nested	2022-01-22 08:33:37 +02:00
power	PM: hibernate: Allow ACPI hardware signature to be honoured	2021-12-08 16:06:10 +01:00
printk	printk: fix build warning when CONFIG_PRINTK=n	2022-01-22 08:33:36 +02:00
rcu	rcu: Make rcu_barrier() no longer block CPU-hotplug operations	2022-02-08 10:12:28 -08:00
sched	Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2022-01-17 05:49:30 +02:00
time	bitmap patches for 5.17-rc1	2022-01-23 06:20:44 +02:00
trace	ftrace: Fix s390 breakage from sorting mcount tables	2022-01-23 08:07:02 +02:00
.gitignore	.gitignore: prefix local generated files with a slash	2021-05-02 00:43:35 +09:00
acct.c	kernel: remove spurious blkdev.h includes	2021-10-18 06:17:01 -06:00
async.c	kernel/async.c: remove async_unregister_domain()	2021-05-07 00:26:33 -07:00
audit_fsnotify.c	fsnotify: clarify contract for create event hooks	2021-10-27 12:32:34 +02:00
audit_tree.c	audit: use struct_size() helper in kmalloc()	2021-12-14 17:39:42 -05:00
audit_watch.c	\n	2021-11-06 16:43:20 -07:00
audit.c	audit/stable-5.17 PR 20220110	2022-01-11 13:08:21 -08:00
audit.h	audit/stable-5.16 PR 20211101	2021-11-01 21:17:39 -07:00
auditfilter.c	audit/stable-5.17 PR 20220110	2022-01-11 13:08:21 -08:00
auditsc.c	lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()	2021-11-22 17:52:47 -05:00
backtracetest.c
bounds.c
capability.c
cfi.c	cfi: Use rcu_read_{un}lock_sched_notrace	2021-08-11 13:11:12 -07:00
compat.c	arch: remove compat_alloc_user_space	2021-09-08 15:32:35 -07:00
configs.c
context_tracking.c
cpu_pm.c	PM: cpu: Make notifier chain use a raw_spinlock_t	2021-08-16 18:55:32 +02:00
cpu.c	sched/scs: Reset task stack state in bringup_cpu()	2021-11-24 12:20:27 +01:00
crash_core.c	kernel/crash_core: suppress unknown crashkernel parameter warning	2021-12-25 12:20:55 -08:00
crash_dump.c
cred.c	ucounts: In set_cred_ucounts assume new->ucounts is non-NULL	2021-10-20 10:45:34 -05:00
delayacct.c	delayacct: track delays from memory compact	2022-01-20 08:52:55 +02:00
dma.c
exec_domain.c
exit.c	exit: Fix the exit_code for wait_task_zombie	2022-01-08 12:43:57 -06:00
extable.c	extable: use is_kernel_text() helper	2021-11-09 10:02:51 -08:00
fail_function.c
fork.c	Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2022-01-17 05:49:30 +02:00
freezer.c	sched: Add get_current_state()	2021-06-18 11:43:08 +02:00
gen_kheaders.sh	kbuild: clean up ${quiet} checks in shell scripts	2021-05-27 04:01:50 +09:00
groups.c
hung_task.c	hung_task: move hung_task sysctl interface to hung_task.c	2022-01-22 08:33:34 +02:00
iomem.c
irq_work.c	irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT	2021-10-15 11:25:18 +02:00
jump_label.c	jump_label: Fix jump_label_text_reserved() vs __init	2021-07-05 10:46:20 +02:00
kallsyms.c	Livepatching changes for 5.17	2022-01-16 10:08:13 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks	locking/rwlock: Provide RT variant	2021-08-17 17:50:51 +02:00
Kconfig.preempt	preempt: Restore preemption model selection configs	2021-11-11 13:09:33 +01:00
kcov.c	kcov: replace local_irq_save() with a local_lock_t	2021-11-09 10:02:52 -08:00
kexec_core.c	exit: Move oops specific logic from do_exit into make_task_dead	2021-12-13 12:04:45 -06:00
kexec_elf.c
kexec_file.c	memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED	2021-11-06 13:30:42 -07:00
kexec_internal.h
kexec.c	kexec: avoid compat_alloc_user_space	2021-09-08 15:32:34 -07:00
kheaders.c
kmod.c	modules: add CONFIG_MODPROBE_PATH	2021-05-07 00:26:33 -07:00
kprobes.c	kprobe: move sysctl_kprobes_optimization to kprobes.c	2022-01-22 08:33:36 +02:00
ksysfs.c
kthread.c	Merge branch 'akpm' (patches from Andrew)	2022-01-20 10:41:01 +02:00
latencytop.c
Makefile	module: add in-kernel support for decompressing	2022-01-11 18:45:02 -08:00
module_decompress.c	kernel: Fix spelling mistake "compresser" -> "compressor"	2022-01-13 07:17:47 -08:00
module_signature.c
module_signing.c
module-internal.h	module: add in-kernel support for decompressing	2022-01-11 18:45:02 -08:00
module.c	Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux	2022-01-17 07:32:51 +02:00
notifier.c	notifier: Return an error when a callback has already been registered	2021-12-29 10:37:33 +01:00
nsproxy.c	memcg: enable accounting for new namesapces and struct nsproxy	2021-09-03 09:58:12 -07:00
padata.c	padata: Remove repeated verbose license text	2021-08-27 16:30:18 +08:00
panic.c	panic: remove oops_id	2022-01-20 08:52:55 +02:00
params.c	kobject: remove kset from struct kset_uevent_ops callbacks	2021-12-28 11:26:18 +01:00
pid_namespace.c	memcg: enable accounting for new namesapces and struct nsproxy	2021-09-03 09:58:12 -07:00
pid.c	pid: add pidfd_get_task() helper	2021-10-14 13:29:18 +02:00
profile.c	exit: Remove profile_handoff_task	2022-01-08 12:43:57 -06:00
ptrace.c	ptrace: Remove second setting of PT_SEIZED in ptrace_attach	2022-01-08 12:43:57 -06:00
range.c
reboot.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input	2021-11-12 11:53:16 -08:00
regset.c
relay.c
resource_kunit.c
resource.c	proc: remove PDE_DATA() completely	2022-01-22 08:33:37 +02:00
rseq.c	KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest	2021-09-22 10:24:01 -04:00
scftorture.c	scftorture: Always log error message	2021-12-07 16:36:17 -08:00
scs.c	scs: Release kasan vmalloc poison in scs_free process	2021-09-30 09:37:27 +01:00
seccomp.c	Merge branch 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2021-09-01 14:52:05 -07:00
signal.c	Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2022-01-17 05:49:30 +02:00
smp.c	sched: Improve wake_up_all_idle_cpus() take #2	2021-10-22 15:32:46 +02:00
smpboot.c	smpboot: Replace deprecated CPU-hotplug functions.	2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c	timers/nohz: Last resort update jiffies on nohz_full IRQ entry	2021-12-02 15:07:22 +01:00
stackleak.c	stackleak: move stack_erasing sysctl to stackleak.c	2022-01-22 08:33:35 +02:00
stacktrace.c	stacktrace: move filter_irq_stacks() to kernel/stacktrace.c	2021-11-06 13:30:43 -07:00
static_call.c	static_call: Fix static_call_text_reserved() vs __init	2021-07-05 10:46:33 +02:00
stop_machine.c
sys_ni.c	mm/mempolicy: wire up syscall set_mempolicy_home_node	2022-01-15 16:30:30 +02:00
sys.c	Merge branch 'akpm' (patches from Andrew)	2022-01-20 10:41:01 +02:00
sysctl-test.c	kernel/sysctl-test: Remove some casts which are no-longer required	2021-06-23 16:41:24 -06:00
sysctl.c	sysctl: returns -EINVAL when a negative value is passed to proc_doulongvec_minmax	2022-01-22 08:33:37 +02:00
task_work.c	kasan: record task_work_add() call stack	2021-04-30 11:20:42 -07:00
taskstats.c
torture.c	locktorture,rcutorture,torture: Always log error message	2021-12-07 16:36:17 -08:00
tracepoint.c	tracepoint: Fix kerneldoc comments	2021-08-16 11:39:51 -04:00
tsacct.c	taskstats: Cleanup the use of task->exit_code	2022-01-08 12:43:57 -06:00
ucount.c	ucounts: Fix rlimit max values check	2021-12-09 15:37:18 -06:00
uid16.c
uid16.h
umh.c	kernel/umh.c: fix some spelling mistakes	2021-05-07 00:26:34 -07:00
up.c	A set of locking related fixes and updates:	2021-05-09 13:07:03 -07:00
user_namespace.c	memcg: enable accounting for new namesapces and struct nsproxy	2021-09-03 09:58:12 -07:00
user-return-notifier.c
user.c	fs/epoll: use a per-cpu counter for user's watches count	2021-09-08 11:50:27 -07:00
usermode_driver.c	Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2021-07-03 11:41:14 -07:00
utsname_sysctl.c
utsname.c
watch_queue.c
watchdog_hld.c
watchdog.c	watchdog: move watchdog sysctl interface to watchdog.c	2022-01-22 08:33:34 +02:00
workqueue_internal.h	workqueue: Assign a color to barrier work items	2021-08-17 07:49:10 -10:00
workqueue.c	Merge branch 'workqueue/for-5.16-fixes' into workqueue/for-5.17	2022-01-10 07:54:04 -10:00