linux/kernel/sched
Clement Courbet 1e17fb8edc sched: Optimize __calc_delta()
A significant portion of __calc_delta() time is spent in the loop
shifting a u64 by 32 bits. Use `fls` instead of iterating.

This is ~7x faster on benchmarks.

The generic `fls` implementation (`generic_fls`) is still ~4x faster
than the loop.
Architectures that have a better implementation will make use of it. For
example, on x86 we get an additional factor 2 in speed without dedicated
implementation.

On GCC, the asm versions of `fls` are about the same speed as the
builtin. On Clang, the versions that use fls are more than twice as
slow as the builtin. This is because the way the `fls` function is
written, clang puts the value in memory:
https://godbolt.org/z/EfMbYe. This bug is filed at
https://bugs.llvm.org/show_bug.cgi?idI406.

```
name                                   cpu/op
BM_Calc<__calc_delta_loop>             9.57ms Â=B112%
BM_Calc<__calc_delta_generic_fls>      2.36ms Â=B113%
BM_Calc<__calc_delta_asm_fls>          2.45ms Â=B113%
BM_Calc<__calc_delta_asm_fls_nomem>    1.66ms Â=B112%
BM_Calc<__calc_delta_asm_fls64>        2.46ms Â=B113%
BM_Calc<__calc_delta_asm_fls64_nomem>  1.34ms Â=B115%
BM_Calc<__calc_delta_builtin>          1.32ms Â=B111%
```

Signed-off-by: Clement Courbet <courbet@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210303224653.2579656-1-joshdon@google.com
2021-03-10 09:51:49 +01:00
..
autogroup.c sched/autogroup: Make autogroup_path() always available 2019-06-24 19:23:40 +02:00
autogroup.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
clock.c sched/clock: Use static_branch_likely() with sched_clock_running 2019-11-29 08:10:54 +01:00
completion.c completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() 2020-03-23 18:40:25 +01:00
core.c psi: Use ONCPU state tracking machinery to detect reclaim 2021-03-06 12:40:22 +01:00
cpuacct.c sched/cpuacct: Fix charge cpuacct.usage_sys 2020-05-19 20:34:14 +02:00
cpudeadline.c sched,rt: Use the full cpumask for balancing 2020-11-10 18:39:00 +01:00
cpudeadline.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
cpufreq_schedutil.c More power management updates for 5.12-rc1 2021-02-23 14:59:46 -08:00
cpufreq.c cpufreq: Avoid leaving stale IRQ work items during CPU offline 2019-12-12 17:59:43 +01:00
cpupri.c Merge branch 'sched/migrate-disable' 2020-11-10 18:39:04 +01:00
cpupri.h sched/cpupri: Add CPUPRI_HIGHER 2020-10-29 11:00:30 +01:00
cputime.c irqtime: Move irqtime entry accounting after irq offset incrementation 2020-12-02 20:20:05 +01:00
deadline.c sched/features: Distinguish between NORMAL and DEADLINE hrtick 2021-02-17 14:12:42 +01:00
debug.c sched: Use task_current() instead of 'rq->curr == p' 2021-01-14 11:20:11 +01:00
fair.c sched: Optimize __calc_delta() 2021-03-10 09:51:49 +01:00
features.h sched/features: Distinguish between NORMAL and DEADLINE hrtick 2021-02-17 14:12:42 +01:00
idle.c sched/fair: Trigger the update of blocked load on newly idle cpu 2021-03-06 12:40:22 +01:00
isolation.c isolcpus: Affine unbound kernel threads to housekeeping cpus 2020-06-15 14:10:03 +02:00
loadavg.c sched: nohz: stop passing around unused "ticks" parameter. 2020-07-22 10:22:04 +02:00
Makefile kcsan: Improve various small stylistic details 2019-11-20 10:47:23 +01:00
membarrier.c sched/membarrier: fix missing local execution of ipi_sync_rq_state() 2021-03-06 12:40:21 +01:00
pelt.c sched: Add a tracepoint to track rq->nr_running 2020-07-08 11:39:02 +02:00
pelt.h sched/pelt: Cleanup PELT divider 2020-06-15 14:10:06 +02:00
psi.c psi: Optimize task switch inside shared cgroups 2021-03-06 12:40:23 +01:00
rt.c sched: Use task_current() instead of 'rq->curr == p' 2021-01-14 11:20:11 +01:00
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-06-17 12:15:58 +02:00
sched.h sched: Optimize __calc_delta() 2021-03-10 09:51:49 +01:00
smp.h sched/headers: Split out open-coded prototypes into kernel/sched/smp.h 2020-05-28 11:03:20 +02:00
stats.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
stats.h psi: Optimize task switch inside shared cgroups 2021-03-06 12:40:23 +01:00
stop_task.c sched: Remove select_task_rq()'s sd_flag parameter 2020-11-10 18:39:06 +01:00
swait.c sched/swait: Prepare usage in completions 2020-03-21 16:00:23 +01:00
topology.c sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2 2021-03-06 12:40:22 +01:00
wait_bit.c sched/wait: fix ___wait_var_event(exclusive) 2019-12-17 13:32:50 +01:00
wait.c sched/wait: Add add_wait_queue_priority() 2020-11-15 09:49:09 -05:00