linux


Tejun Heo	29187a9eea	2015-01-16 14:21:16 -05:00

workqueue: fix subtle pool management issue which can stall whole worker_pool

A worker_pool's forward progress is guaranteed by the fact that the last idle worker assumes the manager role before proceeding to execute work items. As manager, it creates more workers and summons the rescuers if creating workers doesn't succeed in a timely manner.

This manager role is implemented in manage_workers(), whose return value indicates whether the worker may proceed to work item execution. This is necessary because multiple workers may contend for the manager role, and, if there already is a manager, the others should proceed to work item execution.

Unfortunately, the function also indicates that the worker may proceed to work item execution if need_to_create_worker() is false at the head of the function. need_to_create_worker() tests the following conditions:

    pending work items && !nr_running && !nr_idle

The first and third conditions are protected by pool->lock and thus won't change while pool->lock is held. However, nr_running can change asynchronously as other workers block and resume. It is likely to be zero, since someone woke this worker up in the first place, but some other workers could have become runnable in between, making it non-zero.

If this happens, manage_workers() could return false even with zero nr_idle, making the worker, the last idle one, proceed to execute work items. If all workers of the pool then end up blocking on a resource which can only be released by a work item pending on that pool, the whole pool can deadlock, as there's no one left to create more workers or summon the rescuers.

This patch fixes the problem by removing the early exit condition from maybe_create_worker() and making manage_workers() return false iff there's already another manager. This ensures that the last worker doesn't start executing work items.

We could leave the early exit condition alone and just ignore the return value. However, the only reason it was put there is that manage_workers() used to perform both creation and destruction of workers, so the function could be invoked while the pool was trying to reduce the number of workers. Now that manage_workers() is called only when more workers are needed, this early exit condition is triggered only by rare race conditions, which renders it pointless.

Tested with a simulated workload and modified workqueue code which trigger the pool deadlock reliably without this patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Eric Sandeen <sandeen@sandeen.net>
Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
Cc: Dave Chinner <david@fromorbit.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: stable@vger.kernel.org
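The return-value change can be illustrated with a deliberately simplified, single-threaded model. Everything in this sketch is hypothetical and only mirrors the logic the commit message describes. That includes struct pool_model, its fields, manage_workers_old(), manage_workers_new() and proceeds_to_execute(). None of it is the kernel's struct worker_pool, and the real manage_workers() and worker_thread() differ. The nr_running field stands in for the asynchronously changing counter: setting it to 1 simulates another worker becoming runnable just before the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pool state named in the commit message
 * (NOT the kernel's struct worker_pool). */
struct pool_model {
    int nr_pending;   /* work items queued on the pool */
    int nr_running;   /* running workers; may change asynchronously */
    int nr_idle;      /* idle workers other than the caller */
    bool has_manager; /* another worker already holds the manager role */
};

/* pending work items && !nr_running && !nr_idle */
static bool need_to_create_worker(const struct pool_model *p)
{
    assert(p->nr_pending >= 0 && p->nr_running >= 0 && p->nr_idle >= 0);
    return p->nr_pending > 0 && p->nr_running == 0 && p->nr_idle == 0;
}

/* Pre-fix behaviour: returning false means "go execute work items".
 * The early exit also returns false when nr_running briefly became
 * non-zero, even though nr_idle == 0 (the subtle bug). */
static bool manage_workers_old(struct pool_model *p)
{
    if (p->has_manager)
        return false;
    if (!need_to_create_worker(p))
        return false; /* early exit: last idle worker walks away */
    /* ... create workers or summon rescuers (not modelled) ... */
    return true;
}

/* Post-fix behaviour: false iff another manager already exists. */
static bool manage_workers_new(struct pool_model *p)
{
    if (p->has_manager)
        return false;
    /* ... create workers or summon rescuers (not modelled) ... */
    return true;
}

/* Returns true if the worker would go straight to executing work items. */
static bool proceeds_to_execute(struct pool_model *p,
                                bool (*manage)(struct pool_model *))
{
    if (p->nr_idle > 0)
        return true;   /* another idle worker remains to manage the pool */
    return !manage(p); /* true from manage means "recheck, don't execute" */
}
```

With nr_pending = 1, nr_running = 1, nr_idle = 0 and no existing manager, the old variant lets the last idle worker proceed to execution. The new variant keeps it in the manager role. When has_manager is set, both variants let the worker proceed, which matches the contention case the commit message describes.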
bpf	bpf: verifier: add checks for BPF_ABS | BPF_IND instructions	2014-12-05 21:47:32 -08:00
configs	x86: Add "make tinyconfig" to configure the tiniest possible kernel	2014-08-08 16:30:24 -07:00
debug	kdb: replace strnicmp with strncasecmp	2014-10-14 02:18:25 +02:00
events	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2014-12-19 13:15:24 -08:00
gcov	gcov: enable GCOV_PROFILE_ALL from ARCH Kconfigs	2014-12-13 12:42:51 -08:00
irq	genirq: Prevent proc race against freeing of irq descriptors	2014-12-13 13:33:07 +01:00
locking	locking/mutex: Don't assume TASK_RUNNING	2014-10-28 10:55:08 +01:00
power	PM: Eliminate CONFIG_PM_RUNTIME	2014-12-19 22:55:06 +01:00
printk	This code is a fork from the trace-3.19 pull as it needed the trace_seq	2014-12-13 14:04:41 -08:00
rcu	Merge branches 'torture.2014.11.03a', 'cpu.2014.11.03a', 'doc.2014.11.13a', 'fixes.2014.11.13a', 'signal.2014.10.29a' and 'rt.2014.10.29a' into HEAD	2014-11-13 10:39:04 -08:00
sched	sched_show_task: fix unsafe usage of ->real_parent	2014-12-10 17:41:09 -08:00
time	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2014-12-19 13:29:20 -08:00
trace	More ACPI and power management updates for 3.19-rc1	2014-12-18 20:28:33 -08:00
.gitignore
acct.c	acct: eliminate compile warning	2014-10-09 22:26:04 -04:00
async.c	kernel/async.c: switch to pr_foo()	2014-10-09 22:26:04 -04:00
audit_tree.c	fsnotify: unify inode and mount marks handling	2014-12-13 12:42:53 -08:00
audit_watch.c	audit: invalid op= values for rules	2014-09-23 16:37:53 -04:00
audit.c	Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit	2014-12-13 13:41:28 -08:00
audit.h	audit: reduce scope of audit_log_fcaps	2014-09-23 16:37:51 -04:00
auditfilter.c	Merge git://git.infradead.org/users/eparis/audit	2014-10-19 16:25:56 -07:00
auditsc.c	new helper: audit_file()	2014-11-19 13:01:26 -05:00
backtracetest.c	kernel/backtracetest.c: replace no level printk by pr_info()	2014-06-04 16:54:14 -07:00
bounds.c	page-cgroup: get rid of NR_PCG_FLAGS	2014-08-08 15:57:18 -07:00
capability.c	CAPABILITIES: remove undefined caps from all processes	2014-07-24 21:53:47 +10:00
cgroup_freezer.c	cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes	2014-07-15 11:05:09 -04:00
cgroup.c	cgroup: implement cgroup_get_e_css()	2014-11-18 02:49:52 -05:00
compat.c	compat: nanosleep: Clarify error handling	2014-09-06 12:58:18 +02:00
configs.c
context_tracking.c	sched: stop the unbound recursion in preempt_schedule_context()	2014-10-28 10:46:05 +01:00
cpu_pm.c
cpu.c	cpu: Avoid puts_pending overflow	2014-11-03 19:21:01 -08:00
cpuset.c	Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2014-12-11 18:57:19 -08:00
crash_dump.c	crash_dump: Make is_kdump_kernel() accessible from modules	2014-08-25 15:42:19 -07:00
cred.c
delayacct.c	delayacct: Remove braindamaged type conversions	2014-07-23 10:18:06 -07:00
dma.c
elfcore.c
exec_domain.c	kernel/exec_domain.c: code clean-up	2014-06-04 16:54:15 -07:00
exit.c	TTY/Serial driver patches for 3.19-rc1	2014-12-14 15:23:32 -08:00
extable.c	ftrace/x86/extable: Add is_ftrace_trampoline() function	2014-11-19 15:25:26 -05:00
fork.c	mm: use new helper functions around the i_mmap_mutex	2014-12-13 12:42:45 -08:00
freezer.c	freezer: remove obsolete comments in __thaw_task()	2014-10-21 23:44:20 +02:00
futex_compat.c
futex.c	futex: Fix a race condition between REQUEUE_PI and task death	2014-10-26 16:16:18 +01:00
groups.c	userns: Don't allow setgroups until a gid mapping has been established	2014-12-09 16:58:40 -06:00
hung_task.c	kernel/hung_task.c: convert simple_strtoul to kstrtouint	2014-06-04 16:54:15 -07:00
irq_work.c	percpu: Convert remaining __get_cpu_var uses in 3.18-rcX	2014-10-29 11:18:18 -04:00
jump_label.c
kallsyms.c	kernel/kallsyms.c: use __seq_open_private()	2014-10-14 02:18:16 +02:00
kcmp.c	kcmp: fix standard comparison bug	2014-09-10 15:42:12 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks	locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER	2014-07-16 14:57:13 +02:00
Kconfig.preempt
kexec.c	kexec: remove unnecessary KERN_ERR from kexec.c	2014-12-13 12:42:51 -08:00
kmod.c	usermodehelper: kill the kmod_thread_locker logic	2014-12-10 17:41:17 -08:00
kprobes.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux	2014-12-11 17:30:55 -08:00
ksysfs.c	kobject: Make support for uevent_helper optional.	2014-04-25 12:00:49 -07:00
kthread.c	kernel/kthread.c: partial revert of 81c98869fa ("kthread: ensure locality of task_struct allocations")	2014-10-09 22:25:51 -04:00
latencytop.c	kernel/latencytop.c: convert seq_printf to seq_puts	2014-06-04 16:54:15 -07:00
Makefile	kernel: res_counter: remove the unused API	2014-12-10 17:41:04 -08:00
module_signing.c
module-internal.h
module.c	The exciting thing here is the getting rid of stop_machine on module	2014-12-18 20:55:41 -08:00
notifier.c	kprobes, notifier: Use NOKPROBE_SYMBOL macro in notifier	2014-04-24 10:26:39 +02:00
nsproxy.c	bury struct proc_ns in fs/proc	2014-12-04 14:34:54 -05:00
padata.c
panic.c	kernel: add panic_on_warn	2014-12-10 17:41:10 -08:00
params.c	param: do not set store func without write perm	2014-12-18 12:38:51 +10:30
pid_namespace.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2014-12-16 15:53:03 -08:00
pid.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2014-12-16 15:53:03 -08:00
profile.c	kernel/profile.c: use static const char instead of static char	2014-06-06 16:08:13 -07:00
ptrace.c	exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()	2014-12-10 17:41:10 -08:00
range.c
reboot.c	kernel: add support for kernel restart handler call chain	2014-09-26 00:00:06 -07:00
relay.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2014-04-12 14:49:50 -07:00
resource.c	x86: optimize resource lookups for ioremap	2014-10-14 02:18:22 +02:00
seccomp.c	Merge branch 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2014-10-14 02:27:06 +02:00
signal.c	Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2014-12-10 09:34:43 -08:00
smp.c	Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu	2014-10-15 07:48:18 +02:00
smpboot.c	sched, smp: Correctly deal with nested sleeps	2014-10-28 10:56:24 +01:00
smpboot.h
softirq.c	rcu: Remove "cpu" argument to rcu_note_context_switch()	2014-11-03 19:20:34 -08:00
stacktrace.c	stacktrace: introduce snprint_stack_trace for buffer output	2014-12-13 12:42:48 -08:00
stop_machine.c	kernel/stop_machine.c: kernel-doc warning fix	2014-06-04 16:54:15 -07:00
sys_ni.c	syscalls: implement execveat() system call	2014-12-13 12:42:51 -08:00
sys.c	x86, mpx: On-demand kernel allocation of bounds tables	2014-11-18 00:58:53 +01:00
sysctl_binary.c	kernel: add panic_on_warn	2014-12-10 17:41:10 -08:00
sysctl.c	As the merge window is still open, and this code was not as complex	2014-12-16 12:53:59 -08:00
system_certificates.S
system_keyring.c	KEYS: validate certificate trust only with builtin keys	2014-07-17 09:35:17 -04:00
task_work.c
taskstats.c	kill f_dentry uses	2014-11-19 13:01:25 -05:00
test_kprobes.c	kernel/test_kprobes.c: use current logging functions	2014-08-08 15:57:18 -07:00
torture.c	torture: Address race in module cleanup	2014-09-16 13:41:06 -07:00
tracepoint.c	tracing: syscall_regfunc() should not skip kernel threads	2014-06-21 00:15:26 -04:00
tsacct.c	sched: Make task->start_time nanoseconds based	2014-07-23 10:18:05 -07:00
uid16.c	groups: Consolidate the setgroups permission checks	2014-12-05 17:19:27 -06:00
up.c	smp: Rename __smp_call_function_single() to smp_call_function_single_async()	2014-02-24 14:47:15 -08:00
user_namespace.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2014-12-17 12:31:40 -08:00
user-return-notifier.c	scheduler: Replace __get_cpu_var with this_cpu_ptr	2014-08-26 13:45:45 -04:00
user.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2014-12-17 12:31:40 -08:00
utsname_sysctl.c	sysctl: convert use of typedef ctl_table to struct ctl_table	2014-06-06 16:08:16 -07:00
utsname.c	copy address of proc_ns_ops into ns_common	2014-12-04 14:34:47 -05:00
watchdog.c	Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu	2014-10-15 07:48:18 +02:00
workqueue_internal.h	workqueue: rename manager_mutex to attach_mutex	2014-05-20 10:59:32 -04:00
workqueue.c	workqueue: fix subtle pool management issue which can stall whole worker_pool	2015-01-16 14:21:16 -05:00