linux/kernel
Peter Zijlstra 866ab43efd sched: Fix the group_imb logic
On a 2*6*2 machine something like:

 taskset -c 3-11 bash -c 'for ((i=0;i<9;i++)) do while :; do :; done & done'

_should_ result in 9 busy CPUs, each running 1 task.

However it didn't quite work reliably, most of the time one cpu of the
second socket (6-11) would be idle and one cpu of the first socket
(0-5) would have two tasks on it.

The group_imb logic is supposed to deal with this and detect when a
particular group is imbalanced (like in our case, 0-2 are idle but 3-5
will have 4 tasks on it).

The detection phase needed a bit of a tweak as it was too weak and
required more than 2 avg weight tasks difference between idle and busy
cpus in the group which won't trigger for our test-case. So cure that
to be one or more avg task weight difference between cpus.

Once the detection phase worked, it was then defeated by the f_b_g()
tests trying to avoid ping-pongs. In particular, this_load >= max_load
triggered because the pulling cpu (the (first) idle cpu in on the
second socket, say 6) would find this_load to be 5 and max_load to be
4 (there'd be 5 tasks running on our socket and only 4 on the other
socket).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nikhil Rao <ncrao@google.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-23 11:33:57 +01:00
..
debug Merge branch 'master' into for-next 2010-12-22 18:57:02 +01:00
gcov llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
irq genirq: Prevent irq storm on migration 2011-02-02 22:15:08 +01:00
power Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2011-02-18 12:36:06 -08:00
time timer debug: Hide kernel addresses via %pK in /proc/timer_list 2011-02-12 14:11:56 +01:00
trace Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block 2011-02-09 11:45:21 -08:00
.gitignore
acct.c pass a struct path to vfs_statfs 2010-08-09 16:48:42 -04:00
async.c async: use workqueue for worker pool 2010-07-14 11:29:46 +02:00
audit_tree.c in untag_chunk() we need to do alloc_chunk() a bit earlier 2010-10-30 02:18:32 -04:00
audit_watch.c audit: make functions static 2010-10-30 01:42:19 -04:00
audit.c audit: error message typo correction 2010-11-03 13:49:58 -04:00
audit.h audit: make functions static 2010-10-30 01:42:19 -04:00
auditfilter.c Audit: add support to match lsm labels on user audit messages 2010-10-30 01:41:57 -04:00
auditsc.c audit mmap 2010-10-30 08:45:43 -04:00
backtracetest.c
bounds.c
capability.c security: add cred argument to security_capable() 2011-02-11 17:41:58 +11:00
cgroup_freezer.c cgroup_freezer: update_freezer_state() does incorrect state transitions 2010-10-27 18:03:08 -07:00
cgroup.c Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin 2011-01-14 09:08:29 -08:00
compat.c compat: Make compat_alloc_user_space() incorporate the access_ok() 2010-09-14 16:08:45 -07:00
configs.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
cpu.c Merge branches 'x86-alternatives-for-linus', 'x86-fpu-for-linus', 'x86-hwmon-for-linus', 'x86-paravirt-for-linus', 'core-locking-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-06 11:11:50 -08:00
cpuset.c convert cgroup and cpuset 2010-10-29 04:17:06 -04:00
cred.c CRED: Fix memory and refcount leaks upon security_prepare_creds() failure 2011-02-07 14:04:00 -08:00
delayacct.c
dma.c
elfcore.c
exec_domain.c sys_personality: remove the bogus checks in sys_personality()->__set_personality() path 2010-08-09 20:45:05 -07:00
exit.c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-11 11:02:13 -08:00
extable.c
fork.c thp: khugepaged 2011-01-13 17:32:43 -08:00
freezer.c Freezer: Fix a race during freezing of TASK_STOPPED tasks 2010-12-24 15:02:40 +01:00
futex_compat.c futex: Address compiler warnings in exit_robust_list 2010-11-10 13:27:50 +01:00
futex.c Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', 'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-15 12:45:00 -08:00
groups.c kernel/groups.c: fix integer overflow in groups_search 2010-09-09 18:57:24 -07:00
hrtimer.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
hung_task.c lockup detector: Fix grammar by adding a missing "to" in the comments 2010-08-17 09:11:52 +02:00
hw_breakpoint.c perf: Dynamic pmu types 2010-12-16 11:36:43 +01:00
irq_work.c irq_work: Use per cpu atomics instead of regular atomics 2010-12-18 15:54:48 +01:00
itimer.c
jump_label.c jump label: Make arch_jump_label_text_poke_early() optional 2010-10-29 12:56:13 -04:00
kallsyms.c Revert "kernel: make /proc/kallsyms mode 400 to reduce ease of attacking" 2010-11-19 11:54:40 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c tree-wide: fix comment/printk typos 2010-11-01 15:38:34 -04:00
kfifo.c kfifo: fix scatterlist usage 2010-10-01 10:50:58 -07:00
kmod.c Make do_execve() take a const filename pointer 2010-08-17 18:07:43 -07:00
kprobes.c Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-01-07 17:02:58 -08:00
ksysfs.c sysfs: add struct file* to bin_attr callbacks 2010-05-21 09:37:31 -07:00
kthread.c sched: Constify function scope static struct sched_param usage 2011-01-07 15:55:45 +01:00
latencytop.c fs/proc/base.c, kernel/latencytop.c: convert sprintf_symbol() to %ps 2011-01-13 08:03:16 -08:00
lockdep_internals.h lockdep: No need to disable preemption in debug atomic ops 2010-05-04 05:38:16 +02:00
lockdep_proc.c locking, lockdep: Convert sprintf_symbol to %pS 2010-11-10 10:23:58 +01:00
lockdep_states.h
lockdep.c lockdep: Move early boot local IRQ enable/disable status to init/main.c 2011-01-20 13:32:33 +01:00
Makefile kernel: clean up USE_GENERIC_SMP_HELPERS 2011-01-13 08:03:08 -08:00
module.c tracepoints: Fix section alignment using pointer array 2011-02-03 09:28:46 -05:00
mutex-debug.c
mutex-debug.h
mutex.c mutexes, sched: Introduce arch_mutex_cpu_relax() 2010-11-26 15:05:34 +01:00
mutex.h
notifier.c
ns_cgroup.c cgroup: notify ns_cgroup deprecated 2010-10-27 18:03:09 -07:00
nsproxy.c
padata.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2010-08-04 15:23:14 -07:00
panic.c ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support 2011-01-12 03:06:19 -05:00
params.c module: show version information for built-in modules in sysfs 2011-01-24 14:32:51 +10:30
perf_event.c perf: Fix reading in perf_event_read() 2011-02-03 12:15:46 +01:00
pid_namespace.c
pid.c Add RCU check for find_task_by_vpid(). 2010-08-19 17:18:02 -07:00
pm_qos_params.c PM / PM QoS: Fix reversed min and max 2010-11-15 22:45:22 +01:00
posix-cpu-timers.c posix-cpu-timers: Rcu_read_lock/unlock protect find_task_by_vpid call 2010-11-10 13:07:06 +01:00
posix-timers.c posix-timers: Annotate lock_timer() 2010-10-21 17:30:06 +02:00
printk.c cap_syslog: accept CAP_SYS_ADMIN for now 2011-02-10 17:53:55 -08:00
profile.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
ptrace.c ptrace: use safer wake up on ptrace_detach() 2011-02-11 16:12:19 -08:00
range.c kernel/range.c: fix clean_sort_range() for the case of full array 2010-11-12 07:55:31 -08:00
rcupdate.c Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu 2010-10-07 09:43:11 +02:00
rcutiny_plugin.h rcu: Distinguish between boosting and boosted 2010-11-29 22:01:56 -08:00
rcutiny.c rcu: avoid pointless blocked-task warnings 2011-01-14 04:58:08 -08:00
rcutorture.c rcu: add priority-inversion testing to rcutorture 2010-10-07 10:41:08 -07:00
rcutree_plugin.h rcu: increase synchronize_sched_expedited() batching 2010-12-17 12:34:08 -08:00
rcutree_trace.c rcu,cleanup: simplify the code when cpu is dying 2010-11-29 22:01:58 -08:00
rcutree.c Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-01-07 17:02:58 -08:00
rcutree.h rcu: limit rcu_node leaf-level fanout 2010-12-17 12:34:20 -08:00
relay.c Clean up relay_alloc_page_array() slightly by using vzalloc rather than vmalloc and memset 2010-11-05 08:21:34 -07:00
res_counter.c
resource.c resources: add arch hook for preventing allocation in reserved areas 2010-12-17 10:01:09 -08:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c rtmutex-tester: make it build without BKL 2010-10-19 11:29:56 +02:00
rtmutex.c
rtmutex.h
rwsem.c
sched_autogroup.c sched, autogroup: Fix CONFIG_RT_GROUP_SCHED sched_setscheduler() failure 2011-01-18 15:09:42 +01:00
sched_autogroup.h sched, autogroup: Fix CONFIG_RT_GROUP_SCHED sched_setscheduler() failure 2011-01-18 15:09:42 +01:00
sched_clock.c sched: Add some clock info to sched_debug 2010-11-23 10:29:08 +01:00
sched_cpupri.c sched: No need for bootmem special cases 2010-07-17 12:06:22 +02:00
sched_cpupri.h sched: No need for bootmem special cases 2010-07-17 12:06:22 +02:00
sched_debug.c sched: Use a buddy to implement yield_task_fair() 2011-02-03 14:20:33 +01:00
sched_fair.c sched: Fix the group_imb logic 2011-02-23 11:33:57 +01:00
sched_features.h sched: Rewrite tg_shares_up) 2010-11-18 13:27:46 +01:00
sched_idletask.c sched: Fix switch_from_fair() 2011-01-26 12:33:22 +01:00
sched_rt.c Merge commit 'v2.6.38-rc5' into sched/core 2011-02-16 13:31:55 +01:00
sched_stats.h sched_stat: Update sched_info_queue/dequeue() code comments 2010-10-24 13:29:01 +02:00
sched_stoptask.c sched: Fix switch_from_fair() 2011-01-26 12:33:22 +01:00
sched.c sched: Add yield_to(task, preempt) functionality 2011-02-03 14:20:33 +01:00
seccomp.c
semaphore.c
signal.c signals: annotate lock context change on ptrace_stop() 2010-10-27 18:03:12 -07:00
smp.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-20 18:30:37 -08:00
softirq.c softirqs: Free up pf flag PF_KSOFTIRQD 2011-01-26 12:33:20 +01:00
spinlock.c
srcu.c rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status 2011-01-14 04:56:49 -08:00
stacktrace.c
stop_machine.c stop_machine: convert cpu notifier to return encapsulate errno value 2010-10-26 16:52:15 -07:00
sys_ni.c powerpc: define a compat_sys_recv cond_syscall 2010-09-23 17:03:55 +10:00
sys.c Fix prlimit64 for suid/sgid processes 2011-01-31 13:01:27 +10:00
sysctl_binary.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
sysctl_check.c sysctl: min/max bounds are optional 2010-10-15 14:42:24 -07:00
sysctl.c Merge commit 'v2.6.38-rc5' into sched/core 2011-02-16 13:31:55 +01:00
taskstats.c taskstats: use better ifdef for alignment 2011-01-13 08:03:19 -08:00
test_kprobes.c kprobes: Fix selftest to clear flags field for reusing probes 2010-10-14 08:55:27 +02:00
time.c time: Add nsecs_to_cputime64 interface for asm-generic 2011-01-26 12:33:20 +01:00
timeconst.pl
timer.c Revert "lockdep, timer: Fix del_timer_sync() annotation" 2011-02-08 16:18:39 +01:00
tracepoint.c tracepoints: Fix section alignment using pointer array 2011-02-03 09:28:46 -05:00
tsacct.c taskstats: use real microsecond granularity for CPU times 2010-10-27 18:03:17 -07:00
uid16.c
up.c
user_namespace.c user_ns: improve the user_ns on-the-slab packaging 2011-01-13 08:03:18 -08:00
user-return-notifier.c
user.c fix freeing user_struct in user cache 2010-12-29 11:31:38 -08:00
utsname_sysctl.c
utsname.c
wait.c docbook: add more wait/wake/completion to device-drivers docbook 2010-10-26 17:32:41 -07:00
watchdog.c watchdog, nmi: Lower the severity of error messages 2011-02-10 13:21:59 +01:00
workqueue_sched.h workqueue: implement concurrency managed dynamic worker pool 2010-06-29 10:07:14 +02:00
workqueue.c workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long 2011-02-16 18:10:19 +01:00