linux/kernel
Oleg Nesterov 1445c08d06 sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq()
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.

Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(), we must recheck
the task was not migrated after we take the lock and other callers do not
need this lock.

To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.

Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.

Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091010.GA9131@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:12:01 +02:00
..
gcov microblaze: Enable GCOV_PROFILE_ALL 2009-09-21 14:29:21 +02:00
irq Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-26 15:09:06 -07:00
power mm/pm: force GFP_NOIO during suspend/hibernation and resume 2010-03-06 11:26:26 -08:00
time Merge branch 'linus' into sched/core 2010-04-02 20:03:08 +02:00
trace Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-26 15:10:13 -07:00
.gitignore
acct.c copy_signal() cleanup: kill taskstats_tgid_init() and acct_init_pacct() 2010-03-12 15:52:39 -08:00
async.c
audit_tree.c new helper: iterate_mounts() 2010-03-03 14:07:57 -05:00
audit_watch.c Audit: reorganize struct audit_watch to save 8 bytes 2009-09-24 03:50:25 -04:00
audit.c Fix misspelling of "should" and "shouldn't" in comments. 2010-02-05 12:22:30 +01:00
audit.h Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
auditfilter.c Audit: clean up all op= output to include string quoting 2009-06-24 00:00:52 -04:00
auditsc.c Lose the first argument of audit_inode_child() 2010-02-08 14:38:36 -05:00
backtracetest.c
bounds.c kbuild: move bounds.h to include/generated 2009-12-12 13:08:14 +01:00
capability.c sched: Remove remaining USER_SCHED code 2010-04-02 20:12:00 +02:00
cgroup_freezer.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
cgroup.c cgroups: remove duplicate include 2010-03-24 16:31:19 -07:00
compat.c
configs.c
cpu.c kernel/cpu.c: delete deprecated definition in cpu_up() 2010-03-06 11:26:28 -08:00
cpuset.c sched: Kill the broken and deadlockable cpuset_lock/cpuset_cpus_allowed_locked code 2010-04-02 20:12:01 +02:00
cred.c sched: Remove remaining USER_SCHED code 2010-04-02 20:12:00 +02:00
delayacct.c headers: taskstats_kern.h trim 2009-09-18 09:48:52 -07:00
dma.c
early_res.c x86: Do not free zero sized per cpu areas 2010-03-29 18:55:40 +02:00
elfcore.c elf coredump: add extended numbering support 2010-03-06 11:26:46 -08:00
exec_domain.c
exit.c sched: Remove remaining USER_SCHED code 2010-04-02 20:12:00 +02:00
extable.c
fork.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-13 14:43:01 -08:00
freezer.c sched: fix nr_uninterruptible accounting of frozen tasks really 2009-07-18 14:19:53 +02:00
futex_compat.c futex: Protect pid lookup in compat code with RCU 2009-12-09 14:22:14 +01:00
futex.c futex: Handle futex value corruption gracefully 2010-02-03 15:13:22 +01:00
groups.c
hrtimer.c hrtimers: Convert to raw_spinlocks 2009-12-14 23:55:34 +01:00
hung_task.c softlockup: Fix hung_task_check_count sysctl 2009-11-27 06:21:57 +01:00
hw_breakpoint.c Merge branch 'perf/core' into perf/urgent 2010-03-04 11:47:52 +01:00
itimer.c itimers: Fix racy writes to cpu_itimer fields 2009-11-18 16:32:12 +01:00
kallsyms.c hw-breakpoints: Fix broken hw-breakpoint sample module 2009-11-10 11:23:29 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks mutex: Better control mutex adaptive spinning config 2009-12-03 11:50:11 +01:00
Kconfig.preempt
kexec.c percpu: add __percpu sparse annotations to core kernel subsystems 2010-02-17 11:17:38 +09:00
kfifo.c kfifo: Don't use integer as NULL pointer 2010-02-16 15:11:08 -08:00
kgdb.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-02-04 16:07:41 -08:00
kmod.c kmod: fix resource leak in call_usermodehelper_pipe() 2010-01-11 09:34:04 -08:00
kprobes.c kprobes: Calculate the index correctly when freeing the out-of-line execution slot 2010-03-11 14:06:16 +01:00
ksysfs.c Merge branch 'for-next' into for-linus 2010-03-08 16:55:37 +01:00
kthread.c cpuset: fix the problem that cpuset_mem_spread_node() returns an offline node 2010-03-24 16:31:21 -07:00
latencytop.c
lockdep_internals.h lockdep: BFS cleanup 2009-07-24 10:53:29 +02:00
lockdep_proc.c seq_file: constify seq_operations 2009-09-23 07:39:29 -07:00
lockdep_states.h
lockdep.c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-18 16:52:46 -07:00
Makefile elf coredump: replace ELF_CORE_EXTRA_* macros by functions 2010-03-06 11:26:45 -08:00
module.c sysfs: Use sysfs_attr_init and sysfs_bin_attr_init on module dynamic attributes 2010-03-07 17:04:51 -08:00
mutex-debug.c headers: remove sched.h from interrupt.h 2009-10-11 11:20:58 -07:00
mutex-debug.h locking: Implement new raw_spinlock 2009-12-14 23:55:32 +01:00
mutex.c mutex: Better control mutex adaptive spinning config 2009-12-03 11:50:11 +01:00
mutex.h
notifier.c sched: Use lockdep-based checking on rcu_dereference() 2010-02-25 10:34:26 +01:00
ns_cgroup.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
nsproxy.c nsproxy: remove INIT_NSPROXY() 2010-03-12 15:52:40 -08:00
padata.c padata: Allocate the cpumask for the padata instance 2010-03-04 13:30:22 +08:00
panic.c panic: fix panic_timeout accuracy when running on a hypervisor 2010-03-06 11:26:33 -08:00
params.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-03-12 16:04:50 -08:00
perf_event.c perf: Fix unexported generic perf_arch_fetch_caller_regs 2010-03-17 12:26:49 +01:00
pid_namespace.c pid_ns: zap_pid_ns_processes: use SEND_SIG_NOINFO instead of force_sig() 2010-03-12 15:52:40 -08:00
pid.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-13 14:43:01 -08:00
pm_qos_params.c pm_qos: clean up racy global "name" variable 2009-10-14 15:31:10 +02:00
posix-cpu-timers.c Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-26 15:10:38 -07:00
posix-timers.c posix-timers.c: Don't export local functions 2010-02-05 14:54:10 +01:00
printk.c printk: avoid warning when CONFIG_PRINTK is disabled 2010-03-06 11:26:33 -08:00
profile.c kernel/profile.c: Switch /proc/irq/prof_cpu_mask to seq_file 2009-09-20 20:15:40 +02:00
ptrace.c ptrace: Fix ptrace_regset() comments and diagnose errors specifically 2010-02-23 13:45:26 -08:00
range.c x86: Change range end to start+size 2010-02-10 17:47:17 -08:00
rcupdate.c rcu: Make rcu_read_lock_bh_held() allow for disabled BH 2010-03-16 09:57:49 +01:00
rcutiny.c rcu: Eliminate unneeded function wrapping 2009-11-22 18:58:16 +01:00
rcutorture.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2010-03-03 07:34:18 -08:00
rcutree_plugin.h rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU 2010-02-28 09:17:42 +01:00
rcutree_trace.c rcu: Stop overflowing signed integers 2010-02-25 10:34:57 +01:00
rcutree.c rcu: Fix accelerated grace periods for last non-dynticked CPU 2010-02-27 09:53:52 +01:00
rcutree.h rcu: Increase RCU CPU stall timeouts if PROVE_RCU 2010-03-11 13:38:01 +01:00
relay.c splice: comparing unsigned int < 0 2010-03-06 11:26:32 -08:00
res_counter.c memcg: some modification to softlimit under hierarchical memory reclaim. 2009-10-01 16:11:13 -07:00
resource.c resources: add interfaces that return conflict information 2010-03-23 13:33:50 -07:00
rtmutex_common.h
rtmutex-debug.c sched: Convert pi_lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c rtmutes: Convert rtmutex.lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex.h
rwsem.c
sched_clock.c sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK 2009-12-15 09:04:36 +01:00
sched_cpupri.c Merge branch 'for-next' into for-linus 2010-03-08 16:55:37 +01:00
sched_cpupri.h sched: Convert cpupri lock to raw_spinlock 2009-12-14 23:55:33 +01:00
sched_debug.c sched: Remove remaining USER_SCHED code 2010-04-02 20:12:00 +02:00
sched_fair.c Merge branch 'linus' into sched/core 2010-04-02 20:03:08 +02:00
sched_features.h sched: Remove ASYM_GRAN feature 2010-03-11 18:32:53 +01:00
sched_idletask.c sched: Remove the sched_class load_balance methods 2010-01-21 13:40:09 +01:00
sched_rt.c Merge branch 'linus' into sched/core 2010-04-02 20:03:08 +02:00
sched_stats.h
sched.c sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq() 2010-04-02 20:12:01 +02:00
seccomp.c
semaphore.c
signal.c kernel core: use helpers for rlimits 2010-03-06 11:26:33 -08:00
slow-work-debugfs.c SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
slow-work.c slow-work: use get_ref wrapper instead of directly calling get_ref 2010-03-29 09:13:30 -07:00
slow-work.h SLOW_WORK: CONFIG_SLOW_WORK_PROC should be CONFIG_SLOW_WORK_DEBUG 2010-03-29 09:14:47 -07:00
smp.c generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data 2010-01-18 09:02:59 +01:00
softirq.c hrtimer, softirq: Fix hrtimer->softirq trampoline 2010-02-03 18:17:40 +01:00
softlockup.c softlockup: Stop spurious softlockup messages due to overflow 2010-03-21 19:30:13 +01:00
spinlock.c locking: Cleanup the name space completely 2009-12-14 23:55:33 +01:00
srcu.c rcu: Introduce lockdep-based checking to RCU read-side primitives 2010-02-25 09:40:59 +01:00
stacktrace.c
stop_machine.c percpu: add __percpu sparse annotations to core kernel subsystems 2010-02-17 11:17:38 +09:00
sys_ni.c Add generic sys_ipc wrapper 2010-03-12 15:52:32 -08:00
sys.c Add generic sys_olduname() 2010-03-12 15:52:32 -08:00
sysctl_binary.c Switch may_open() and break_lease() to passing O_... 2010-03-03 13:00:21 -05:00
sysctl_check.c ipv4 05/05: add sysctl to accept packets with local source addresses 2009-12-03 12:14:38 -08:00
sysctl.c sysctl extern cleanup: lockdep 2010-03-12 15:53:10 -08:00
taskstats.c const: struct nla_policy 2010-02-18 14:30:18 -08:00
test_kprobes.c
time.c Revert "time: Remove xtime_cache" 2009-12-22 14:10:37 -08:00
timeconst.pl
timer.c timer stats: Fix del_timer_sync() and try_to_del_timer_sync() 2010-03-12 19:11:29 +01:00
tracepoint.c trivial: fix typo "to to" in multiple files 2009-09-21 15:14:55 +02:00
tsacct.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
uid16.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
up.c
user_namespace.c
user-return-notifier.c core: Clean up user return notifers use of per_cpu 2009-12-02 10:22:59 +01:00
user.c sched: Remove remaining USER_SCHED code 2010-04-02 20:12:00 +02:00
utsname_sysctl.c sysctl kernel: Remove binary sysctl logic 2009-11-12 02:04:55 -08:00
utsname.c utsns: extract creeate_uts_ns() 2009-06-18 13:03:55 -07:00
wait.c locking, sched: Give waitqueue spinlocks their own lockdep classes 2009-08-10 14:43:09 +02:00
workqueue.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2009-12-10 09:35:44 -08:00