linux/kernel
Gregory Haskins 73fe6aae84 sched: add RT-balance cpu-weight
Some RT tasks (particularly kthreads) are bound to one specific CPU.
It is fairly common for two or more bound tasks to get queued up at the
same time.  Consider, for instance, softirq_timer and softirq_sched.  A
timer goes off in an ISR which schedules softirq_thread to run at RT50.
Then the timer handler determines that it's time to smp-rebalance the
system so it schedules softirq_sched to run.  So we are in a situation
where we have two RT50 tasks queued, and the system will go into
rt-overload condition to request other CPUs for help.

This causes two problems in the current code:

1) If a high-priority bound task and a low-priority unbounded task queue
   up behind the running task, we will fail to ever relocate the unbounded
   task because we terminate the search on the first unmovable task.

2) We spend precious futile cycles in the fast-path trying to pull
   overloaded tasks over.  It is therefore optimial to strive to avoid the
   overhead all together if we can cheaply detect the condition before
   overload even occurs.

This patch tries to achieve this optimization by utilizing the hamming
weight of the task->cpus_allowed mask.  A weight of 1 indicates that
the task cannot be migrated.  We will then utilize this information to
skip non-migratable tasks and to eliminate uncessary rebalance attempts.

We introduce a per-rq variable to count the number of migratable tasks
that are currently running.  We only go into overload if we have more
than one rt task, AND at least one of them is migratable.

In addition, we introduce a per-task variable to cache the cpus_allowed
weight, since the hamming calculation is probably relatively expensive.
We only update the cached value when the mask is updated which should be
relatively infrequent, especially compared to scheduling frequency
in the fast path.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:07 +01:00
..
irq genirq: revert lazy irq disable for simple irqs 2007-12-18 18:05:58 +01:00
power driver core: make /sys/power a kobject 2008-01-24 20:40:25 -08:00
time Driver core: change sysdev classes to use dynamic kobject names 2008-01-24 20:40:40 -08:00
.gitignore
acct.c acct: real_parent ppid 2008-01-07 14:55:37 -08:00
audit_tree.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit.h [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditfilter.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditsc.c auditsc: fix kernel-doc param warnings 2007-10-22 19:40:02 -07:00
capability.c Uninline find_pid etc set of functions 2007-10-19 11:53:41 -07:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
cgroup.c Improve cgroup printks 2007-11-14 18:45:37 -08:00
compat.c Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt 2007-10-18 15:12:41 -07:00
configs.c use simple_read_from_buffer in kernel/ 2007-05-09 12:30:49 -07:00
cpu.c cpu-hotplug: replace per-subsystem mutexes with get_online_cpus() 2008-01-25 21:08:02 +01:00
cpuset.c cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus() 2008-01-25 21:08:02 +01:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c wait_task_stopped(): pass correct exit_code to wait_noreap_copyout() 2007-11-29 09:24:55 -08:00
extable.c
fork.c sched: add RT-balance cpu-weight 2008-01-25 21:08:07 +01:00
futex_compat.c [FUTEX] Fix address computation in compat code. 2007-11-09 16:13:08 -08:00
futex.c futex: Prevent stale futex owner when interrupted/timeout 2008-01-08 16:21:39 -08:00
hrtimer.c hrtimer: fix section mismatch 2008-01-21 19:39:41 -08:00
itimer.c whitespace fixes: interval timers 2007-10-18 14:37:26 -07:00
kallsyms.c FRV: fix the extern declaration of kallsyms_num_syms 2007-11-29 09:24:54 -08:00
Kconfig.hz [PATCH] HZ: 300Hz support 2006-12-07 08:39:36 -08:00
Kconfig.instrumentation Tiny clean-up of OPROFILE/KPROBES configuration 2007-12-06 09:41:12 -08:00
Kconfig.preempt Move PREEMPT_NOTIFIERS into an always-included Kconfig 2007-10-17 08:42:55 -07:00
kexec.c vmcoreinfo: add the array length of "free_list" for filtering free pages 2008-01-08 16:10:36 -08:00
kfifo.c is_power_of_2: kernel/kfifo.c 2007-07-16 09:05:50 -07:00
kmod.c Fix unbalanced helper_lock in kernel/kmod.c 2008-01-17 15:38:59 -08:00
kprobes.c kprobes: support kretprobe blacklist 2007-10-16 09:43:10 -07:00
ksysfs.c Kobject: convert remaining kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
kthread.c kthread: silence bogus section mismatch warning 2007-07-31 15:39:42 -07:00
latency.c [PATCH] severing module.h->sched.h 2006-12-04 02:00:22 -05:00
lockdep_internals.h [PATCH] lockdep: more chains 2006-12-07 08:39:43 -08:00
lockdep_proc.c lockdep: Avoid /proc/lockdep & lock_stat infinite output 2007-10-11 22:11:11 +02:00
lockdep.c softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks 2008-01-25 21:08:02 +01:00
Makefile revert "Task Control Groups: example CPU accounting subsystem" 2007-11-14 18:45:40 -08:00
marker.c Linux Kernel Markers: fix marker mutex not taken upon module load 2007-11-14 18:45:40 -08:00
module.c Kobject: convert remaining kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
mutex-debug.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
mutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
mutex.c lockdep: fixup mutex annotations 2007-10-11 22:11:12 +02:00
mutex.h [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
notifier.c Add kernel/notifier.c 2007-10-19 11:53:34 -07:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c pid namespaces: allow cloning of new namespace 2007-10-19 11:53:39 -07:00
panic.c debug: add end-of-oops marker 2007-12-20 15:01:17 +01:00
params.c Modules: remove unneeded release function 2008-01-24 20:40:39 -08:00
pid.c pidns: Place under CONFIG_EXPERIMENTAL 2007-11-14 18:45:43 -08:00
posix-cpu-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
posix-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
printk.c sched: remove printk_clock() 2008-01-25 21:07:59 +01:00
profile.c sched: document profile=sleep requiring CONFIG_SCHEDSTATS 2007-10-24 18:23:50 +02:00
ptrace.c ptrace: Call arch_ptrace_attach() when request=PTRACE_TRACEME 2008-01-25 08:31:39 +01:00
rcupdate.c rcu: fix section mismatch 2008-01-22 09:17:48 -08:00
rcutorture.c cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus() 2008-01-25 21:08:02 +01:00
relay.c whitespace fixes: relayfs 2007-10-18 14:37:24 -07:00
resource.c Add IORESOUCE_BUSY flag for System RAM 2007-11-14 18:45:39 -08:00
rtmutex_common.h FUTEX: Tidy up the code 2007-07-16 09:05:49 -07:00
rtmutex-debug.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rtmutex-tester.c Driver core: change sysdev classes to use dynamic kobject names 2008-01-24 20:40:40 -08:00
rtmutex.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rwsem.c sched: mark rwsem functions as __sched for wchan/profiling 2007-12-18 15:21:13 +01:00
sched_debug.c sched: fix gcc warnings 2007-12-30 17:24:35 +01:00
sched_fair.c sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups 2008-01-25 21:08:00 +01:00
sched_idletask.c sched: isolate SMP balancing code a bit more 2007-10-24 18:23:51 +02:00
sched_rt.c sched: add RT-balance cpu-weight 2008-01-25 21:08:07 +01:00
sched_stats.h sched: clean up kernel/sched_stat.h 2007-11-28 15:52:56 +01:00
sched.c sched: add RT-balance cpu-weight 2008-01-25 21:08:07 +01:00
seccomp.c make seccomp zerocost in schedule 2007-07-16 09:05:50 -07:00
signal.c sigwait eats blocked default-ignore signals 2007-11-12 16:05:23 -08:00
softirq.c [KERNEL]: Unexport raise_softirq_irqoff 2007-10-10 16:49:18 -07:00
softlockup.c softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks 2008-01-25 21:08:02 +01:00
spinlock.c lockstat: hook into spinlock_t, rwlock_t, rwsem and mutex 2007-07-19 10:04:49 -07:00
srcu.c [PATCH] SRCU: report out-of-memory errors 2006-10-04 07:55:30 -07:00
stacktrace.c [PATCH] lockdep: stacktrace subsystem, core 2006-07-03 15:27:02 -07:00
stop_machine.c cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus() 2008-01-25 21:08:02 +01:00
sys_ni.c [COMPAT]: Fix build on COMPAT platforms when CONFIG_NET is disabled. 2007-10-30 21:29:56 -07:00
sys.c x86: ignore the sys_getcpu() tcache parameter 2007-11-17 16:27:00 +01:00
sysctl_check.c sysctl: fix ax25 checks 2007-12-17 19:28:17 -08:00
sysctl.c softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks 2008-01-25 21:08:02 +01:00
taskstats.c kernel/taskstats.c: fix bogus nlmsg_free() 2007-11-14 18:45:44 -08:00
time.c whitespace fixes: time syscalls 2007-10-18 14:37:24 -07:00
timer.c timer: fix section mismatch 2008-01-21 19:39:41 -08:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
user_namespace.c Fix user namespace exiting OOPs 2007-09-19 11:24:18 -07:00
user.c Kobject: convert kernel/user.c to use kobject_init/add_ng() 2008-01-24 20:40:31 -08:00
utsname_sysctl.c Isolate the UTS namespace's domainname and hostname back 2007-11-29 09:24:53 -08:00
utsname.c Fix UTS corruption during clone(CLONE_NEWUTS) 2007-09-19 11:24:17 -07:00
wait.c Fix occurrences of "the the " 2007-05-09 08:57:56 +02:00
workqueue.c cpu-hotplug: replace per-subsystem mutexes with get_online_cpus() 2008-01-25 21:08:02 +01:00