linux/kernel/sched
Vincent Guittot 89ee048f3c sched/core: Fix group_entity's share update
The update of the share of a cfs_rq is done when its load_avg is updated
but before the group_entity's load_avg has been updated for the past time
slot. This generates wrong load_avg accounting which can be significant
when small tasks are involved in the scheduling.

Let take the example of a task a that is dequeued of its task group A:
   root
  (cfs_rq)
    \
    (se)
     A
    (cfs_rq)
      \
      (se)
       a

Task "a" was the only task in task group A which becomes idle when a is
dequeued.

We have the sequence:

- dequeue_entity a->se
    - update_load_avg(a->se)
    - dequeue_entity_load_avg(A->cfs_rq, a->se)
    - update_cfs_shares(A->cfs_rq)
	A->cfs_rq->load.weight == 0
        A->se->load.weight is updated with the new share (0 in this case)
- dequeue_entity A->se
    - update_load_avg(A->se) but its weight is now null so the last time
      slot (up to a tick) will be accounted with a weight of 0 instead of
      its real weight during the time slot. The last time slot will be
      accounted as an idle one whereas it was a running one.

If the running time of task a is short enough that no tick happens when it
runs, all running time of group entity A->se will be accounted as idle
time.

Instead, we should update the share of a cfs_rq (in fact the weight of its
group entity) only after having updated the load_avg of the group_entity.

update_cfs_shares() now takes the sched_entity as a parameter instead of the
cfs_rq, and the weight of the group_entity is updated only once its load_avg
has been synced with current time.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/1482335426-7664-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:30:02 +01:00
..
auto_group.c sched/autogroup: Fix 64-bit kernel nice level adjustment 2016-11-24 05:45:02 +01:00
auto_group.h
clock.c sched/clock: Provide better clock continuity 2017-01-14 11:30:00 +01:00
completion.c sched/completions: Fix complete_all() semantics 2017-01-14 11:30:01 +01:00
core.c sched/clock: Delay switching sched_clock to stable 2017-01-14 11:29:59 +01:00
cpuacct.c sched/cpuacct: Avoid %lld seq_printf warning 2016-11-16 10:29:03 +01:00
cpuacct.h sched/cpuacct: Simplify the cpuacct code 2016-03-21 11:00:28 +01:00
cpudeadline.c sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear() 2016-09-05 13:29:43 +02:00
cpudeadline.h sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear() 2016-09-05 13:29:43 +02:00
cpufreq_schedutil.c cpufreq: schedutil: Rectify comment in sugov_irq_work() function 2016-11-24 21:50:59 +01:00
cpufreq.c cpufreq / sched: Pass flags to cpufreq_update_util() 2016-08-16 22:14:55 +02:00
cpupri.c sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed 2016-05-12 09:55:35 +02:00
cpupri.h
cputime.c sched/cputime: Rename vtime_account_user() to vtime_flush() 2017-01-14 09:54:13 +01:00
deadline.c sched/core: Add wrappers for lockdep_(un)pin_lock() 2017-01-14 11:29:30 +01:00
debug.c sched/deadline: Show leftover runtime and abs deadline in /proc/*/sched 2017-01-14 11:30:00 +01:00
fair.c sched/core: Fix group_entity's share update 2017-01-14 11:30:02 +01:00
features.h sched/fair: Convert arch_scale_cpu_capacity() from weak function to #define 2015-09-13 09:52:55 +02:00
idle_task.c sched/core: Add wrappers for lockdep_(un)pin_lock() 2017-01-14 11:29:30 +01:00
idle.c sched/idle: Add support for tasks that inject idle 2016-11-29 14:02:21 +01:00
loadavg.c sched/core: Correct off by one bug in load migration calculation 2016-07-13 14:58:20 +02:00
Makefile cpufreq: schedutil: New governor based on scheduler utilization data 2016-04-02 01:09:12 +02:00
rt.c sched/core: Add wrappers for lockdep_(un)pin_lock() 2017-01-14 11:29:30 +01:00
sched.h sched/core: Add debugging code to catch missing update_rq_clock() calls 2017-01-14 11:29:35 +01:00
stats.c
stats.h sched/debug: Rename 'schedstat_val()' -> 'schedstat_val_or_zero()' 2016-09-05 13:29:46 +02:00
stop_task.c sched/core: Add wrappers for lockdep_(un)pin_lock() 2017-01-14 11:29:30 +01:00
swait.c wait.[ch]: Introduce the simple waitqueue (swait) implementation 2016-02-25 11:27:16 +01:00
wait.c mm: remove per-zone hashtable of bitlock waitqueues 2016-10-27 09:27:57 -07:00