linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-27 14:41:39 +00:00

Clement Courbet 1e17fb8edc sched: Optimize __calc_delta()		2021-03-10 09:51:49 +01:00

A significant portion of __calc_delta() time is spent in the loop shifting a u64 by 32 bits. Use `fls` instead of iterating.

This is ~7x faster on benchmarks.

The generic `fls` implementation (`generic_fls`) is still ~4x faster than the loop. Architectures that have a better implementation will make use of it. For example, on x86 we get an additional factor 2 in speed without dedicated implementation.

On GCC, the asm versions of `fls` are about the same speed as the builtin. On Clang, the versions that use fls are more than twice as slow as the builtin. This is because the way the `fls` function is written, clang puts the value in memory: https://godbolt.org/z/EfMbYe. This bug is filed at https://bugs.llvm.org/show_bug.cgi?id=49406.

```
name                                   cpu/op
BM_Calc<__calc_delta_loop>             9.57ms ±12%
BM_Calc<__calc_delta_generic_fls>      2.36ms ±13%
BM_Calc<__calc_delta_asm_fls>          2.45ms ±13%
BM_Calc<__calc_delta_asm_fls_nomem>    1.66ms ±12%
BM_Calc<__calc_delta_asm_fls64>        2.46ms ±13%
BM_Calc<__calc_delta_asm_fls64_nomem>  1.34ms ±15%
BM_Calc<__calc_delta_builtin>          1.32ms ±11%
```

Signed-off-by: Clement Courbet <courbet@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210303224653.2579656-1-joshdon@google.com
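The idea described in the commit message, replacing a one-bit-at-a-time shift loop with a single `fls`-sized shift, can be sketched in standalone C roughly as follows. This is an illustration under stated assumptions, not the kernel's code: `fls32`, `reduce_loop`, `reduce_fls`, and `agree` are hypothetical names, and `fls32` is a userspace stand-in modeled on the binary-search style of a generic `fls` (1-based index of the highest set bit, 0 for an input of 0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for fls(): 1-based position of the highest
 * set bit in x, or 0 when x == 0. Uses a binary search over the bits. */
static int fls32(uint32_t x)
{
    int r = 32;

    if (!x)
        return 0;
    if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
    if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }
    if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }
    if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }
    if (!(x & 0x80000000u)) { r -= 1; }
    return r;
}

/* Loop variant: shift right one bit at a time until the value fits
 * in 32 bits, decrementing *shift once per iteration. */
static uint64_t reduce_loop(uint64_t fact, int *shift)
{
    while (fact >> 32) {
        fact >>= 1;
        (*shift)--;
    }
    return fact;
}

/* fls variant: compute the number of excess bits in the upper half
 * once, then shift by that amount in a single step. */
static uint64_t reduce_fls(uint64_t fact, int *shift)
{
    uint32_t hi = (uint32_t)(fact >> 32);

    if (hi) {
        int fs = fls32(hi);   /* 1..32, so the 64-bit shift is well defined */

        *shift -= fs;
        fact >>= fs;
    }
    return fact;
}

/* Returns 1 when both variants yield the same value and shift count. */
static int agree(uint64_t fact)
{
    int s_loop = 32, s_fls = 32;
    uint64_t a = reduce_loop(fact, &s_loop);
    uint64_t b = reduce_fls(fact, &s_fls);

    return a == b && s_loop == s_fls;
}
```

Both variants stop at the same point because the loop runs exactly once per significant bit in the upper 32 bits, which is the quantity `fls32(hi)` returns directly.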
autogroup.c	sched/autogroup: Make autogroup_path() always available	2019-06-24 19:23:40 +02:00
autogroup.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
clock.c	sched/clock: Use static_branch_likely() with sched_clock_running	2019-11-29 08:10:54 +01:00
completion.c	completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()	2020-03-23 18:40:25 +01:00
core.c	psi: Use ONCPU state tracking machinery to detect reclaim	2021-03-06 12:40:22 +01:00
cpuacct.c	sched/cpuacct: Fix charge cpuacct.usage_sys	2020-05-19 20:34:14 +02:00
cpudeadline.c	sched,rt: Use the full cpumask for balancing	2020-11-10 18:39:00 +01:00
cpudeadline.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpufreq_schedutil.c	More power management updates for 5.12-rc1	2021-02-23 14:59:46 -08:00
cpufreq.c	cpufreq: Avoid leaving stale IRQ work items during CPU offline	2019-12-12 17:59:43 +01:00
cpupri.c	Merge branch 'sched/migrate-disable'	2020-11-10 18:39:04 +01:00
cpupri.h	sched/cpupri: Add CPUPRI_HIGHER	2020-10-29 11:00:30 +01:00
cputime.c	irqtime: Move irqtime entry accounting after irq offset incrementation	2020-12-02 20:20:05 +01:00
deadline.c	sched/features: Distinguish between NORMAL and DEADLINE hrtick	2021-02-17 14:12:42 +01:00
debug.c	sched: Use task_current() instead of 'rq->curr == p'	2021-01-14 11:20:11 +01:00
fair.c	sched: Optimize __calc_delta()	2021-03-10 09:51:49 +01:00
features.h	sched/features: Distinguish between NORMAL and DEADLINE hrtick	2021-02-17 14:12:42 +01:00
idle.c	sched/fair: Trigger the update of blocked load on newly idle cpu	2021-03-06 12:40:22 +01:00
isolation.c	isolcpus: Affine unbound kernel threads to housekeeping cpus	2020-06-15 14:10:03 +02:00
loadavg.c	sched: nohz: stop passing around unused "ticks" parameter.	2020-07-22 10:22:04 +02:00
Makefile	kcsan: Improve various small stylistic details	2019-11-20 10:47:23 +01:00
membarrier.c	sched/membarrier: fix missing local execution of ipi_sync_rq_state()	2021-03-06 12:40:21 +01:00
pelt.c	sched: Add a tracepoint to track rq->nr_running	2020-07-08 11:39:02 +02:00
pelt.h	sched/pelt: Cleanup PELT divider	2020-06-15 14:10:06 +02:00
psi.c	psi: Optimize task switch inside shared cgroups	2021-03-06 12:40:23 +01:00
rt.c	sched: Use task_current() instead of 'rq->curr == p'	2021-01-14 11:20:11 +01:00
sched-pelt.h	sched/fair: Fix "runnable_avg_yN_inv" not used warnings	2019-06-17 12:15:58 +02:00
sched.h	sched: Optimize __calc_delta()	2021-03-10 09:51:49 +01:00
smp.h	sched/headers: Split out open-coded prototypes into kernel/sched/smp.h	2020-05-28 11:03:20 +02:00
stats.c	proc: introduce proc_create_seq{,_data}	2018-05-16 07:23:35 +02:00
stats.h	psi: Optimize task switch inside shared cgroups	2021-03-06 12:40:23 +01:00
stop_task.c	sched: Remove select_task_rq()'s sd_flag parameter	2020-11-10 18:39:06 +01:00
swait.c	sched/swait: Prepare usage in completions	2020-03-21 16:00:23 +01:00
topology.c	sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2	2021-03-06 12:40:22 +01:00
wait_bit.c	sched/wait: fix ___wait_var_event(exclusive)	2019-12-17 13:32:50 +01:00
wait.c	sched/wait: Add add_wait_queue_priority()	2020-11-15 09:49:09 -05:00