linux

Vincent Guittot ea67821b9a sched: Replace capacity_factor by usage

The scheduler tries to compute how many tasks a group of CPUs can handle by assuming that a task's load is SCHED_LOAD_SCALE and a CPU's capacity is SCHED_CAPACITY_SCALE.

'struct sg_lb_stats:group_capacity_factor' divides the capacity of the group by SCHED_LOAD_SCALE to estimate how many tasks can run in the group. Then, it compares this value with the sum of nr_running to decide if the group is overloaded or not.

But the 'group_capacity_factor' concept hardly works for SMT systems; it sometimes works for big cores but fails to do the right thing for little cores.

Below are two examples to illustrate the problem that this patch solves:

1 - If the original capacity of a CPU is less than SCHED_CAPACITY_SCALE (640 as an example), a group of 3 CPUs will have a max capacity_factor of 2 (div_round_closest(3x640/1024) = 2), which means that it will be seen as overloaded even if we have only one task per CPU.

2 - If the original capacity of a CPU is greater than SCHED_CAPACITY_SCALE (1512 as an example), a group of 4 CPUs will have a capacity_factor of 4 (at max and thanks to the fix [0] for SMT systems that prevents the appearance of ghost CPUs), but if one CPU is fully used by rt tasks (and its capacity is reduced to nearly nothing), the capacity factor of the group will still be 4 (div_round_closest(3*1512/1024) = 5 which is capped to 4 with [0]).

So, this patch tries to solve this issue by removing capacity_factor and replacing it with the 2 following metrics:

- The available CPU's capacity for CFS tasks, which is already used by load_balance().
- The usage of the CPU by the CFS tasks. For the latter, utilization_avg_contrib has been re-introduced to compute the usage of a CPU by CFS tasks.

'group_capacity_factor' and 'group_has_free_capacity' have been removed and replaced by 'group_no_capacity'.

We compare the number of tasks with the number of CPUs and we evaluate the level of utilization of the CPUs to define if a group is overloaded or if a group has capacity to handle more tasks.

For SD_PREFER_SIBLING, a group is tagged overloaded if it has more than 1 task so it will be selected in priority (among the overloaded groups). Since [1], SD_PREFER_SIBLING is no longer concerned by the computation of 'load_above_capacity' because local is not overloaded.

[1] `9a5d9ba6a3` ("sched/fair: Allow calculate_imbalance() to move idle cpus")

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Morten.Rasmussen@arm.com
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1425052454-25797-9-git-send-email-vincent.guittot@linaro.org
[ Tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2015-03-27 09:36:04 +01:00
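The arithmetic in the two examples of the commit message can be reproduced with a small user-space sketch. This is not the kernel code: `capacity_factor()`, `old_overloaded()`, `struct group_stats`, and `new_overloaded()` are hypothetical names chosen for illustration, and the `imbalance_pct` margin in the usage-based check is an assumption about how "level of utilization" might be thresholded. Only the rounding helper mirrors the usual round-to-closest integer division.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Round-to-closest integer division for unsigned operands. */
static unsigned long div_round_closest(unsigned long x, unsigned long d)
{
	assert(d != 0);
	return (x + d / 2) / d;
}

/*
 * Old heuristic (simplified, hypothetical helper): estimate how many
 * tasks a group can run, capped at the number of CPUs in the group.
 */
static unsigned long capacity_factor(unsigned long group_capacity,
				     unsigned long nr_cpus)
{
	unsigned long f = div_round_closest(group_capacity,
					    SCHED_CAPACITY_SCALE);

	return f < nr_cpus ? f : nr_cpus;
}

/* Old test: overloaded when more tasks run than the factor allows. */
static bool old_overloaded(unsigned long nr_running, unsigned long factor)
{
	return nr_running > factor;
}

/* Hypothetical per-group statistics for the usage-based sketch. */
struct group_stats {
	unsigned long capacity;   /* capacity available to CFS tasks */
	unsigned long usage;      /* usage of the CPUs by CFS tasks */
	unsigned long nr_running; /* runnable tasks in the group */
	unsigned long nr_cpus;    /* CPUs in the group */
};

/*
 * Usage-based test (assumed form): a group is overloaded only if it has
 * more tasks than CPUs AND its usage, scaled by a margin in percent,
 * exceeds its capacity.
 */
static bool new_overloaded(const struct group_stats *g,
			   unsigned long imbalance_pct)
{
	if (g->nr_running <= g->nr_cpus)
		return false;

	return g->capacity * 100 < g->usage * imbalance_pct;
}
```

With these helpers, example 1 gives `capacity_factor(3 * 640, 3) == 2`, so three tasks on three little CPUs are flagged as overloaded by the old test, while the usage-based sketch does not flag them because the task count does not exceed the CPU count. For example 2, `capacity_factor(3 * 1512, 4)` evaluates to 4 under round-to-closest (4536 / 1024 is about 4.43); the message states 5 before capping, but the capped result is 4 either way, so the factor does not reflect the CPU consumed by rt tasks.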
..
auto_group.c	sched/autogroup: Fix failure to set cpu.rt_runtime_us	2015-02-18 16:17:20 +01:00
auto_group.h
clock.c	kernel/sched/clock.c: add another clock for use with the soft lockup watchdog	2015-02-12 18:54:13 -08:00
completion.c	sched/completion: Serialize completion_done() with complete()	2015-02-18 14:27:40 +01:00
core.c	sched: Add struct rq::cpu_capacity_orig	2015-03-27 09:36:02 +01:00
cpuacct.c	cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes	2014-07-15 11:05:09 -04:00
cpuacct.h	sched/cpuacct: Initialize root cpuacct earlier	2013-04-10 13:54:20 +02:00
cpudeadline.c	sched/deadline: Remove cpu_active_mask from cpudl_find()	2015-02-04 07:52:29 +01:00
cpudeadline.h	sched/deadline: Modify cpudl::free_cpus to reflect rd->online	2015-01-30 19:39:16 +01:00
cpupri.c	Merge commit '3cf2f34' into sched/core, to fix build error	2014-06-12 13:46:37 +02:00
cpupri.h	sched/cpupri: Remove unnecessary definitions in cpupri.h	2014-11-16 10:58:59 +01:00
cputime.c	sched, time: Fix build error with 64 bit cputime_t on 32 bit systems	2014-10-03 05:46:55 +02:00
deadline.c	sched/deadline: Add rq->clock update skip for dl task yield	2015-03-10 05:46:50 +01:00
debug.c	sched: Track group sched_entity usage contributions	2015-03-27 09:35:58 +01:00
fair.c	sched: Replace capacity_factor by usage	2015-03-27 09:36:04 +01:00
features.h	sched/rt: Use IPI to trigger RT task push migration instead of pulling	2015-03-23 10:55:22 +01:00
idle_task.c	sched: Provide update_curr callbacks for stop/idle scheduling classes	2014-11-23 14:14:40 -08:00
idle.c	cpuidle / sleep: Use broadcast timer for states that stop local timer	2015-03-05 23:13:19 +01:00
Makefile	ftrace: allow architectures to specify ftrace compile options	2015-01-29 09:19:19 +01:00
proc.c	cpuidle: menu: Lookup CPU runqueues less	2014-08-06 21:17:45 +02:00
rt.c	sched/rt: Use IPI to trigger RT task push migration instead of pulling	2015-03-23 10:55:22 +01:00
sched.h	sched: Add struct rq::cpu_capacity_orig	2015-03-27 09:36:02 +01:00
stats.c	sched: use %*pb[l] to print bitmaps including cpumasks and nodemasks	2015-02-13 21:21:37 -08:00
stats.h	sched: Micro-optimize by dropping unnecessary task_rq() calls	2013-09-25 13:51:06 +02:00
stop_task.c	sched: Provide update_curr callbacks for stop/idle scheduling classes	2014-11-23 14:14:40 -08:00
wait.c	sched/wait: Fix a kthread race with wait_woken()	2014-11-04 07:17:44 +01:00