linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-04 01:51:34 +00:00

Huaixin Chang f4183717b3 sched/fair: Introduce the burstable CFS controller	2021-06-24 09:07:50 +02:00

The CFS bandwidth controller limits CPU requests of a task group to quota during each period. However, parallel workloads might be bursty so that they get throttled even when their average utilization is under quota. And they are latency sensitive at the same time so that throttling them is undesired.

We borrow time now against our future underrun, at the cost of increased interference against the other system users. All nicely bounded.

Traditional (UP-EDF) bandwidth control is something like:

  (U = \Sum u_i) <= 1

This guarantees both that every deadline is met and that the system is stable. After all, if U were > 1, then for every second of walltime, we'd have to run more than a second of program time, and obviously miss our deadline, but the next deadline will be further out still, there is never time to catch up, unbounded fail.

This work observes that a workload doesn't always execute the full quota; this enables one to describe u_i as a statistical distribution.

For example, have u_i = {x,e}_i, where x is the p(95) and x+e p(100) (the traditional WCET). This effectively allows u to be smaller, increasing the efficiency (we can pack more tasks in the system), but at the cost of missing deadlines when all the odds line up. However, it does maintain stability, since every overrun must be paired with an underrun as long as our x is above the average.

That is, suppose we have 2 tasks, both specify a p(95) value, then we have a p(95)p(95) = 90.25% chance both tasks are within their quota and everything is good. At the same time we have a p(5)p(5) = 0.25% chance both tasks will exceed their quota at the same time (guaranteed deadline fail). Somewhere in between there's a threshold where one exceeds and the other doesn't underrun enough to compensate; this depends on the specific CDFs.

At the same time, we can say that the worst case deadline miss will be \Sum e_i; that is, there is a bounded tardiness (under the assumption that x+e is indeed WCET).

The benefit of burst is seen when testing with schbench. The default values of kernel.sched_cfs_bandwidth_slice_us (5ms) and CONFIG_HZ (1000) are used.

	mkdir /sys/fs/cgroup/cpu/test
	echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us

	./schbench -m 1 -t 3 -r 20 -c 80000 -R 10

The average CPU usage is at 80%. I ran this 10 times, and got long tail latency 6 times and got throttled 8 times.

Tail latencies are shown below, and it wasn't the worst case.

	Latency percentiles (usec)
		50.0000th: 19872
		75.0000th: 21344
		90.0000th: 22176
		95.0000th: 22496
		99.0000th: 22752
		99.5000th: 22752
		99.9000th: 22752
		min=0, max=22727
	rps: 9.90 p95 (usec) 22496 p99 (usec) 22752 p95/cputime 28.12% p99/cputime 28.44%

The interference when using burst is evaluated by the probability of missing the deadline and the average WCET. Test results showed that when there are many cgroups or the CPU is underutilized, the interference is limited. More details are shown in:
https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210621092800.23714-2-changhuaixin@linux.alibaba.com
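The arithmetic in the commit message can be checked with a small sketch. This is an illustration only, not the kernel code: the function names (joint_quota_odds, worst_case_tardiness, simulate_periods) are hypothetical, tasks are assumed to be statistically independent, and the period model is a toy in which unused runtime carries over up to burst and unserved demand is queued for the next period.

```python
def joint_quota_odds(p_within, n_tasks=2):
    """Odds that all tasks stay within quota, and that all exceed it.

    Assumes the tasks are independent (an assumption of this sketch).
    """
    all_within = p_within ** n_tasks
    all_exceed = (1.0 - p_within) ** n_tasks
    return all_within, all_exceed


def worst_case_tardiness(excesses):
    """Bound on the deadline miss: the sum of the e_i terms."""
    return sum(excesses)


def simulate_periods(demands_us, quota_us, burst_us):
    """Toy per-period model; returns the unserved backlog after each period.

    Each period refills runtime by quota_us, capped at quota_us + burst_us,
    so at most burst_us of unused runtime is carried forward.
    """
    runtime = 0
    backlog = 0
    unserved = []
    for demand in demands_us:
        runtime = min(runtime + quota_us, quota_us + burst_us)
        wanted = backlog + demand
        ran = min(wanted, runtime)
        runtime -= ran
        backlog = wanted - ran
        unserved.append(backlog)
    return unserved


# Two tasks, each specifying a p(95) value.
both_within, both_exceed = joint_quota_odds(0.95, 2)
print(f"both within quota: {both_within:.2%}, both exceed: {both_exceed:.2%}")

# A bursty demand pattern whose average equals the quota (values in usec).
demands = [60000, 140000, 100000]
print(simulate_periods(demands, 100000, 0))       # [0, 40000, 40000]
print(simulate_periods(demands, 100000, 100000))  # [0, 0, 0]
```

With no burst, the 140000 usec period leaves a backlog that never drains even though average demand equals quota; with burst set equal to quota, the 40000 usec left unused in the first period covers the overrun.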
..
autogroup.c	sched/autogroup: Make autogroup_path() always available	2019-06-24 19:23:40 +02:00
autogroup.h
clock.c	sched: Fix various typos	2021-03-22 00:11:52 +01:00
completion.c	completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()	2020-03-23 18:40:25 +01:00
core_sched.c	sched: prctl() core-scheduling interface	2021-05-12 11:43:31 +02:00
core.c	sched/fair: Introduce the burstable CFS controller	2021-06-24 09:07:50 +02:00
cpuacct.c	sched: Wrap rq::lock access	2021-05-12 11:43:26 +02:00
cpudeadline.c	sched,rt: Use the full cpumask for balancing	2020-11-10 18:39:00 +01:00
cpudeadline.h
cpufreq_schedutil.c	sched/cpufreq: Consider reduced CPU capacity in energy calculation	2021-06-17 14:11:43 +02:00
cpufreq.c	cpufreq: Avoid leaving stale IRQ work items during CPU offline	2019-12-12 17:59:43 +01:00
cpupri.c	sched: Fix various typos	2021-03-22 00:11:52 +01:00
cpupri.h	sched/cpupri: Add CPUPRI_HIGHER	2020-10-29 11:00:30 +01:00
cputime.c	Scheduler updates for this cycle are:	2021-04-28 13:33:57 -07:00
deadline.c	sched/rt: Fix Deadline utilization tracking during policy change	2021-06-22 16:41:59 +02:00
debug.c	Merge branch 'sched/urgent' into sched/core, to resolve conflicts	2021-06-18 11:31:25 +02:00
fair.c	sched/fair: Introduce the burstable CFS controller	2021-06-24 09:07:50 +02:00
features.h	sched: Warn on long periods of pending need_resched	2021-04-21 13:55:41 +02:00
idle.c	sched: Trivial forced-newidle balancer	2021-05-12 11:43:30 +02:00
isolation.c	sched/isolation: Reconcile rcu_nocbs= and nohz_full=	2021-05-13 14:12:47 +02:00
loadavg.c	sched: Make multiple runqueue task counters 32-bit	2021-05-12 21:34:17 +02:00
Makefile	sched: Trivial core scheduling cookie management	2021-05-12 11:43:31 +02:00
membarrier.c	sched/membarrier: fix missing local execution of ipi_sync_rq_state()	2021-03-06 12:40:21 +01:00
pelt.c	sched: Fix various typos	2021-03-22 00:11:52 +01:00
pelt.h	Merge branch 'sched/urgent' into sched/core, to resolve conflicts	2021-06-18 11:31:25 +02:00
psi.c	psi: Fix psi state corruption when schedule() races with cgroup move	2021-05-06 15:33:26 +02:00
rt.c	sched/rt: Fix RT utilization tracking during policy change	2021-06-22 16:41:59 +02:00
sched-pelt.h	sched/fair: Fix "runnable_avg_yN_inv" not used warnings	2019-06-17 12:15:58 +02:00
sched.h	sched/fair: Introduce the burstable CFS controller	2021-06-24 09:07:50 +02:00
smp.h	sched/headers: Split out open-coded prototypes into kernel/sched/smp.h	2020-05-28 11:03:20 +02:00
stats.c	sched: Fix various typos	2021-03-22 00:11:52 +01:00
stats.h	sched: Introduce task_is_running()	2021-06-18 11:43:07 +02:00
stop_task.c	sched: Introduce sched_class::pick_task()	2021-05-12 11:43:28 +02:00
swait.c	sched/swait: Prepare usage in completions	2020-03-21 16:00:23 +01:00
topology.c	sched: Wrap rq::lock access	2021-05-12 11:43:26 +02:00
wait_bit.c	sched/wait: fix ___wait_var_event(exclusive)	2019-12-17 13:32:50 +01:00
wait.c	sched/wait: Add add_wait_queue_priority()	2020-11-15 09:49:09 -05:00