linux/kernel/sched
Vincent Guittot 0b0695f2b3 sched/fair: Rework load_balance()
The load_balance() algorithm contains some heuristics which have become
meaningless since the rework of the scheduler's metrics like the
introduction of PELT.

Furthermore, load is an ill-suited metric for solving certain task
placement imbalance scenarios.

For instance, in the presence of idle CPUs, we should simply try to get at
least one task per CPU, whereas the current load-based algorithm can actually
leave idle CPUs alone simply because the load is somewhat balanced.

The current algorithm ends up creating virtual and meaningless values like
the avg_load_per_task or tweaks the state of a group to make it overloaded
whereas it's not, in order to try to migrate tasks.

load_balance() should better qualify the imbalance of the group and clearly
define what has to be moved to fix this imbalance.

The type of sched_group has been extended to better reflect the type of
imbalance. We now have:

	group_has_spare
	group_fully_busy
	group_misfit_task
	group_asym_packing
	group_imbalanced
	group_overloaded

Based on the type of sched_group, load_balance now sets what it wants to
move in order to fix the imbalance. It can be some load as before but also
some utilization, a number of task or a type of task:

	migrate_task
	migrate_util
	migrate_load
	migrate_misfit

This new load_balance() algorithm fixes several pending wrong tasks
placement:

 - the 1 task per CPU case with asymmetric system
 - the case of cfs task preempted by other class
 - the case of tasks not evenly spread on groups with spare capacity

Also the load balance decisions have been consolidated in the 3 functions
below after removing the few bypasses and hacks of the current code:

 - update_sd_pick_busiest() select the busiest sched_group.
 - find_busiest_group() checks if there is an imbalance between local and
   busiest group.
 - calculate_imbalance() decides what have to be moved.

Finally, the now unused field total_running of struct sd_lb_stats has been
removed.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hdanton@sina.com
Cc: parth@linux.ibm.com
Cc: pauld@redhat.com
Cc: quentin.perret@arm.com
Cc: riel@surriel.com
Cc: srikar@linux.vnet.ibm.com
Cc: valentin.schneider@arm.com
Link: https://lkml.kernel.org/r/1571405198-27570-5-git-send-email-vincent.guittot@linaro.org
[ Small readability and spelling updates. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-21 09:40:53 +02:00
..
autogroup.c sched/autogroup: Make autogroup_path() always available 2019-06-24 19:23:40 +02:00
autogroup.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
clock.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
completion.c sched/Documentation: Update wake_up() & co. memory-barrier guarantees 2018-07-17 09:30:34 +02:00
core.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-28 12:39:07 -07:00
cpuacct.c sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
cpudeadline.c Linux 5.2-rc5 2019-06-17 12:12:27 +02:00
cpudeadline.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
cpufreq_schedutil.c Power management updates for 5.4-rc1 2019-09-17 19:15:14 -07:00
cpufreq.c sched/cpufreq: Annotate cpufreq_update_util_data pointer with __rcu 2019-04-03 12:34:31 +02:00
cpupri.c Linux 5.2-rc5 2019-06-17 12:12:27 +02:00
cpupri.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
cputime.c sched/cputime: Spare a seqcount lock/unlock cycle on context switch 2019-10-09 12:39:26 +02:00
deadline.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-17 12:35:15 -07:00
debug.c Linux 5.2-rc6 2019-06-24 19:19:53 +02:00
fair.c sched/fair: Rework load_balance() 2019-10-21 09:40:53 +02:00
features.h sched/fair: Replace source_load() & target_load() with weighted_cpuload() 2019-06-03 11:49:39 +02:00
idle.c mm: remove quicklist page table caches 2019-09-24 15:54:09 -07:00
isolation.c sched/isolation: Prefer housekeeping CPU in local node 2019-07-25 15:51:55 +02:00
loadavg.c sched: loadavg: make calc_load_n() public 2018-10-26 16:26:32 -07:00
Makefile psi: pressure stall information for CPU, memory, and IO 2018-10-26 16:26:32 -07:00
membarrier.c membarrier: Fix RCU locking bug caused by faulty merge 2019-10-01 21:27:50 +02:00
pelt.c sched/debug: Add new tracepoint to track PELT at se level 2019-06-24 19:23:42 +02:00
pelt.h sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity() 2019-06-24 19:23:39 +02:00
psi.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-16 17:25:49 -07:00
rt.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-17 12:35:15 -07:00
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-06-17 12:15:58 +02:00
sched.h sched/membarrier: Fix p->mm->membarrier_state racy load 2019-09-25 17:42:30 +02:00
stats.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
stats.h sched/stats: Fix unlikely() use of sched_info_on() 2019-07-25 15:51:55 +02:00
stop_task.c sched: Rework pick_next_task() slow-path 2019-08-08 09:09:31 +02:00
swait.c kernel/sched/: remove caller signal_pending branch predictions 2019-01-04 13:13:48 -08:00
topology.c sched/topology: Don't set SD_BALANCE_WAKE on cpuset domain relax 2019-10-17 21:31:54 +02:00
wait_bit.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
wait.c sched/wait: Deduplicate code with do-while 2019-06-24 19:23:40 +02:00