linux/arch/x86/kernel
Giovanni Gherdovich 2a0abc5969 x86, sched: Add support for frequency invariance on SKYLAKE_X
The scheduler needs the ratio freq_curr/freq_max for frequency-invariant
accounting. On SKYLAKE_X CPUs set freq_max to the highest frequency that can
be sustained by a group of at least 4 cores.

From the changelog of commit 31e07522be ("tools/power turbostat: fix
decoding for GLM, DNV, SKX turbo-ratio limits"):

 >   Newer processors do not hard-code the the number of cpus in each bin
 >   to {1, 2, 3, 4, 5, 6, 7, 8}  Rather, they can specify any number
 >   of CPUS in each of the 8 bins:
 >
 >   eg.
 >
 >   ...
 >   37 * 100.0 = 3600.0 MHz max turbo 4 active cores
 >   38 * 100.0 = 3700.0 MHz max turbo 3 active cores
 >   39 * 100.0 = 3800.0 MHz max turbo 2 active cores
 >   39 * 100.0 = 3900.0 MHz max turbo 1 active cores
 >
 >   could now look something like this:
 >
 >   ...
 >   37 * 100.0 = 3600.0 MHz max turbo 16 active cores
 >   38 * 100.0 = 3700.0 MHz max turbo 8 active cores
 >   39 * 100.0 = 3800.0 MHz max turbo 4 active cores
 >   39 * 100.0 = 3900.0 MHz max turbo 2 active cores

This encoding of turbo levels applies to both SKYLAKE_X and GOLDMONT/GOLDMONT_D,
but we treat these two classes in separate commits because their freq_max
values need to be different. For SKX we prefer a lower freq_max in the ratio
freq_curr/freq_max, allowing load and utilization to overshoot and the
schedutil governor to be more performance-oriented. Models from the Atom
series (such as GOLDMONT*) are handled in a forthcoming commit as they have to
favor power-efficiency over performance.

Results from a performance evaluation follow.

1. TEST SETUP
2. NEUTRAL BENCHMARKS
3. NON-NEUTRAL BENCHMARKS
4. DETAILED TABLES

1. TEST SETUP
-------------

Test machine:

CPU Model   : Intel Xeon Platinum 8260L CPU @ 2.40GHz (a.k.a. Cascade Lake)
Fam/Mod/Ste : 6:85:6
Topology    : 2 sockets, 24 cores / 48 threads each socket
Memory      : 192G
Storage     : SSD, XFS filesystem

Max EFFICiency, BASE frequency and available turbo levels (MHz):

    EFFIC   1000 |**********
    BASE    2400 |************************
    24C     3100 |*******************************
    20C     3300 |*********************************
    16C     3600 |************************************
    12C     3600 |************************************
    8C      3600 |************************************
    4C      3700 |*************************************
    2C      3900 |***************************************

Tested kernels:

Baseline      : v5.2,              intel_pstate passive,  schedutil
Comparison #1 : v5.2,              intel_pstate active ,  powersave+HWP
Comparison #2 : v5.2, this patch,  intel_pstate passive,  schedutil

2. NEUTRAL BENCHMARKS
---------------------

* pgbench read/write
* NASA Parallel Benchmarks (NPB), MPI or OpenMP for message-passing
* hackbench
* netperf

3. NON-NEUTRAL BENCHMARKS
-------------------------

comparison ratio with baseline; 1.00 means neutral, higher is better:

                      I_PSTATE      FREQ-INV
    ----------------------------------------
    pgbench read-only     1.10             ~
    tbench                1.82          1.14

comparison ratio with baseline; 1.00 means neutral, lower is better:

                      I_PSTATE      FREQ-INV
    ----------------------------------------
    dbench                   ~          0.97
    kernbench             0.88          0.78
    gitsource[*]             ~          0.46

[*] "gitsource" consists in running git's unit tests
tilde (~) means 1.00, ie result identical to baseline

Performance per watt:

performance-per-watt ratios with baseline; 1.00 means neutral, higher is better:

		      I_PSTATE      FREQ-INV
    ----------------------------------------
    dbench                0.92          0.91
    tbench                1.26          1.04
    kernbench             0.95          0.96
    gitsource             1.03          1.30

Similarly to earlier Xeons, measurable performance gains over non-invariant
schedutil are observed on dbench, tbench, kernel compilation and running the
git unit tests suite. Looking at the detailed tables show that the patch
scores the largest difference when the machine is lightly loaded. Power
efficiency suffers lightly on kernbench and a bit more on dbench, but largely
improves on gitsource (which also runs considerably faster). For reference, we
also report results using active intel_pstate with powersave and HWP; the
largest gap between non-invariant schedutil and intel_pstate+powersave is
still tbench, which runs 82% better and with 26% improved efficiency on the
latter configuration -- this divide isn't closed yet by frequency-invariant
schedutil.

4. DETAILED TABLES
------------------

Benchmark          : tbench4 (i.e. dbench4 over the network, actually loopback)
Varying parameter  : number of clients
Unit               : MB/sec (higher is better)

                     5.2.0 vanilla (BASELINE)            5.2.0 intel_pstate/HWP                    5.2.0 freq-inv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Hmean   1         183.56  +- 0.21% (        )       516.12  +- 0.57% ( 181.18%)       185.59  +- 0.59% (   1.11%)
Hmean   2         365.75  +- 0.25% (        )      1015.14  +- 0.33% ( 177.55%)       402.59  +- 4.48% (  10.07%)
Hmean   4         720.99  +- 0.44% (        )      1951.75  +- 0.28% ( 170.70%)       738.39  +- 1.72% (   2.41%)
Hmean   8        1449.93  +- 0.34% (        )      3830.56  +- 0.24% ( 164.19%)      1750.36  +- 4.65% (  20.72%)
Hmean   16       2874.26  +- 0.57% (        )      7381.62  +- 0.53% ( 156.82%)      4348.35  +- 2.22% (  51.29%)
Hmean   32       6116.17  +- 5.10% (        )     13013.05  +- 0.08% ( 112.76%)      8980.35  +- 0.66% (  46.83%)
Hmean   64      14485.04  +- 3.46% (        )     17835.12  +- 0.35% (  23.13%)     16540.73  +- 0.51% (  14.19%)
Hmean   128     30779.16  +- 3.20% (        )     32796.94  +- 2.13% (   6.56%)     31512.58  +- 0.20% (   2.38%)
Hmean   256     34664.66  +- 0.81% (        )     34604.67  +- 0.46% (  -0.17%)     34943.70  +- 0.25% (   0.80%)
Hmean   384     33957.51  +- 0.11% (        )     34091.50  +- 0.14% (   0.39%)     33921.41  +- 0.09% (  -0.11%)

Benchmark          : kernbench (kernel compilation)
Varying parameter  : number of jobs
Unit               : seconds (lower is better)

                    5.2.0 vanilla (BASELINE)             5.2.0 intel_pstate/HWP                     5.2.0 freq-inv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean   2        332.94  +- 0.40% (        )        260.16  +- 0.45% (  21.86%)        233.56  +- 0.21% (  29.85%)
Amean   4        173.04  +- 0.43% (        )        138.76  +- 0.03% (  19.81%)        123.59  +- 0.11% (  28.58%)
Amean   8         89.65  +- 0.20% (        )         73.54  +- 0.09% (  17.97%)         65.69  +- 0.10% (  26.72%)
Amean   16        48.08  +- 1.41% (        )         41.64  +- 1.61% (  13.40%)         36.00  +- 1.80% (  25.11%)
Amean   32        28.78  +- 0.72% (        )         26.61  +- 1.99% (   7.55%)         23.19  +- 1.68% (  19.43%)
Amean   64        20.46  +- 1.85% (        )         19.76  +- 0.35% (   3.42%)         17.38  +- 0.92% (  15.06%)
Amean   128       18.69  +- 1.70% (        )         17.59  +- 1.04% (   5.90%)         15.73  +- 1.40% (  15.85%)
Amean   192       18.82  +- 1.01% (        )         17.76  +- 0.77% (   5.67%)         15.57  +- 1.80% (  17.28%)

Benchmark          : gitsource (time to run the git unit test suite)
Varying parameter  : none
Unit               : seconds (lower is better)

                 5.2.0 vanilla (BASELINE)           5.2.0 intel_pstate/HWP                    5.2.0 freq-inv
- - - - - - - -  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean         792.49  +- 0.20% (        )      779.35  +- 0.24% (   1.66%)      427.14  +- 0.16% (   46.10%)

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200122151617.531-3-ggherdovich@suse.cz
2020-01-28 21:37:01 +01:00
..
acpi x86/asm/32: Add ENDs to some functions and relabel with SYM_CODE_* 2019-10-18 11:58:33 +02:00
apic Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 09:52:37 -08:00
cpu x86/mce: Fix possibly incorrect severity calculation on AMD 2019-12-17 09:39:53 +01:00
fpu treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
kprobes Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 15:04:47 -08:00
.gitignore
alternative.c x86/alternatives: Teach text_poke_bp() to emulate instructions 2019-11-15 14:07:01 -08:00
amd_gart_64.c dma-mapping updates for 5.5-rc1 2019-11-28 11:16:43 -08:00
amd_nb.c x86/amd_nb: Add PCI device IDs for family 17h, model 70h 2019-09-03 12:47:17 -07:00
apb_timer.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
aperture_64.c x86/gart: Exclude GART aperture from kcore 2019-03-23 12:11:49 +01:00
apm_32.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118 2019-05-24 17:39:02 +02:00
asm-offsets_32.c x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler 2018-07-20 01:11:36 +02:00
asm-offsets_64.c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-17 12:04:39 -07:00
asm-offsets.c x86/paravirt: Make read_cr2() CALLEE_SAVE 2019-07-17 23:17:37 +02:00
audit_64.c
bootflag.c
check.c x86/headers: Fix -Wmissing-prototypes warning 2018-11-23 07:59:59 +01:00
cpuid.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 142 2019-05-30 11:25:17 -07:00
crash_dump_32.c
crash_dump_64.c fs/core/vmcore: Move sev_active() reference to x86 arch code 2019-08-09 22:52:10 +10:00
crash.c x86/crash: Align function arguments on opening braces 2019-11-14 18:24:55 +01:00
devicetree.c x86/headers: Fix -Wmissing-prototypes warning 2018-11-23 07:59:59 +01:00
doublefault_32.c x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit 2019-11-26 22:00:04 +01:00
dumpstack_32.c x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area 2019-11-26 21:53:34 +01:00
dumpstack_64.c x86/dumpstack/64: Don't evaluate exception stacks before setup 2019-11-05 00:51:35 +01:00
dumpstack.c x86/dumpstack: Indicate PREEMPT_RT in dumps 2019-07-31 19:03:36 +02:00
e820.c ACPI updates for 5.5-rc1 2019-11-26 19:25:25 -08:00
early_printk.c efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation 2019-02-04 08:27:30 +01:00
early-quirks.c x86/intel: Disable HPET on Intel Ice Lake platforms 2019-11-29 12:17:58 +01:00
ebda.c
eisa.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 243 2019-06-19 17:09:07 +02:00
espfix_64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
ftrace_32.S x86/ftrace: Get rid of function_hook 2019-10-25 10:52:22 +02:00
ftrace_64.S New tracing features: 2019-11-27 11:42:01 -08:00
ftrace.c ftrace: Fix function_graph tracer interaction with BPF trampoline 2019-12-10 13:53:59 -05:00
head32.c x86/boot: Mostly revert commit ae7e1238e6 ("Add ACPI RSDP address to setup_header") 2018-11-20 09:43:10 +01:00
head64.c x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area 2019-10-11 18:38:15 +02:00
head_32.S Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 10:42:40 -08:00
head_64.S x86/asm/64: Change all ENTRY+END to SYM_CODE_* 2019-10-18 11:58:26 +02:00
hpet.c x86/hpet: Undo the early counter is counting check 2019-07-25 12:21:32 +02:00
hw_breakpoint.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
i8237.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
i8253.c x86/timer: Skip PIT initialization on modern chipsets 2019-06-29 11:35:35 +02:00
i8259.c x86: Don't include linux/irq.h from asm/hardirq.h 2018-08-05 09:53:13 +02:00
idt.c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 11:22:57 -07:00
ima_arch.c Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
io_delay.c x86/io_delay: Define IO_DELAY macros in C instead of Kconfig 2019-05-24 08:46:06 +02:00
ioport.c x86/ioperm: Extend IOPL config to control ioperm() as well 2019-11-16 11:24:06 +01:00
irq_32.c x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code 2019-08-19 23:19:06 +02:00
irq_64.c x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code 2019-08-19 23:19:06 +02:00
irq_work.c
irq.c x86/irq: Check for VECTOR_UNUSED directly 2019-08-19 23:19:07 +02:00
irqflags.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
irqinit.c x86/irq/32: Handle irq stack allocation failure proper 2019-04-17 15:31:42 +02:00
itmt.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
jailhouse.c x86/jailhouse: Only enable platform UARTs if available 2019-10-10 15:43:59 +02:00
jump_label.c x86/alternatives: Teach text_poke_bp() to emulate instructions 2019-11-15 14:07:01 -08:00
kdebugfs.c x86/boot: Introduce setup_indirect 2019-11-12 16:21:15 +01:00
kexec-bzimage64.c Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
kgdb.c x86/apic: Provide and use helper for send_IPI_allbutself() 2019-07-25 16:12:00 +02:00
ksysfs.c x86/boot: Introduce setup_indirect 2019-11-12 16:21:15 +01:00
kvm.c x86/kvm: Fix -Wmissing-prototypes warnings 2019-10-25 14:01:14 +02:00
kvmclock.c x86: kvmguest: use TSC clocksource if invariant TSC is exposed 2019-02-20 22:48:52 +01:00
ldt.c x86: Convert some slow-path static_cpu_has() callers to boot_cpu_has() 2019-04-08 12:13:34 +02:00
livepatch.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
machine_kexec_32.c x86/mm: Remove set_pages_x() and set_pages_nx() 2019-09-03 09:26:37 +02:00
machine_kexec_64.c x86/kdump: Remove the backup region handling 2019-11-14 18:24:43 +01:00
Makefile x86/doublefault/32: Rename doublefault.c to doublefault_32.c 2019-11-26 21:53:34 +01:00
mmconf-fam10h_64.c
module.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
mpparse.c x86/boot: Fix memory leak in default_get_smp_config() 2019-07-16 23:13:48 +02:00
msr.c x86/msr: Restrict MSR access when the kernel is locked down 2019-08-19 21:54:16 -07:00
nmi_selftest.c
nmi.c x86/hotplug: Silence APIC and NMI when CPU is dead 2019-07-25 16:11:59 +02:00
paravirt_patch.c x86/paravirt: Standardize 'insn_buff' variable names 2019-04-29 16:05:49 +02:00
paravirt-spinlocks.c x86/paravirt: Use a single ops structure 2018-09-03 16:50:35 +02:00
paravirt.c x86/iopl: Remove legacy IOPL option 2019-11-16 11:24:05 +01:00
pci-dma.c dma-mapping updates for 5.5-rc1 2019-11-28 11:16:43 -08:00
pci-iommu_table.c x86/iommu: Use NULL instead of 0 2018-08-02 14:33:19 +02:00
pci-swiotlb.c dma-mapping: fix filename references 2019-09-03 08:36:30 +02:00
pcspeaker.c x86/platform/pcspeaker: Use PTR_ERR_OR_ZERO() to fix ptr_ret.cocci warning 2018-07-24 09:46:42 +02:00
perf_regs.c perf/x86/regs: Check reserved bits 2019-06-24 19:19:24 +02:00
platform-quirks.c
pmem.c
probe_roms.c
process_32.c x86/iopl: Remove legacy IOPL option 2019-11-16 11:24:05 +01:00
process_64.c x86/iopl: Remove legacy IOPL option 2019-11-16 11:24:05 +01:00
process.c x86/ioperm: Save an indentation level in tss_update_io_bitmap() 2019-11-30 18:06:56 +01:00
process.h x86: Use the correct SPDX License Identifier in headers 2019-10-01 20:31:35 +02:00
ptrace.c x86/ptrace: Document FSBASE and GSBASE ABI oddities 2019-11-26 22:00:12 +01:00
pvclock.c x86/vdso: Switch to generic vDSO implementation 2019-06-22 21:21:10 +02:00
quirks.c x86/PCI: Remove superfluous returns from void functions 2019-08-20 09:54:36 +02:00
reboot_fixups_32.c
reboot.c x86/apic: Provide and use helper for send_IPI_allbutself() 2019-07-25 16:12:00 +02:00
relocate_kernel_32.S x86/asm: Annotate relocate_kernel_{32,64}.c 2019-10-18 09:53:19 +02:00
relocate_kernel_64.S x86/asm: Annotate relocate_kernel_{32,64}.c 2019-10-18 09:53:19 +02:00
resource.c
rtc.c
setup_percpu.c x86: Use pr_warn instead of pr_warning 2019-10-18 15:00:18 +02:00
setup.c ACPI updates for 5.5-rc1 2019-11-26 19:25:25 -08:00
signal_compat.c
signal.c x86: use static_cpu_has in uaccess region to avoid instrumentation 2019-07-12 11:05:42 -07:00
smp.c x86/smp: Move smp_function_call implementations into IPI code 2019-07-25 16:12:01 +02:00
smpboot.c x86, sched: Add support for frequency invariance on SKYLAKE_X 2020-01-28 21:37:01 +01:00
stacktrace.c x86/stacktrace: Prevent access_ok() warnings in arch_stack_walk_user() 2019-07-22 10:42:36 +02:00
step.c
sys_x86_64.c x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT 2018-11-01 12:59:25 +01:00
sysfb_efi.c x86/sysfb_efi: Add quirks for some devices with swapped width and height 2019-07-22 10:47:11 +02:00
sysfb_simplefb.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
sysfb.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tboot.c x86: Use pr_warn instead of pr_warning 2019-10-18 15:00:18 +02:00
time.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 16:59:34 -07:00
tls.c x86/tls: Fix possible spectre-v1 in do_get_thread_area() 2019-06-27 23:48:04 +02:00
tls.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193 2019-05-30 11:29:21 -07:00
topology.c x86/topology: Make DEBUG_HOTPLUG_CPU0 pr_info() more descriptive 2019-04-19 19:42:57 +02:00
trace_clock.c
tracepoint.c x86/kernel: Fix more -Wmissing-prototypes warnings 2018-12-08 12:24:35 +01:00
traps.c x86/traps: die() instead of panicking on a double fault 2019-11-26 22:00:12 +01:00
tsc_msr.c x86/cpu: Update init data for new Airmont CPU model 2019-09-06 07:30:40 +02:00
tsc_sync.c x86: Use pr_warn instead of pr_warning 2019-10-18 15:00:18 +02:00
tsc.c x86/tsc: Respect tsc command line paraemeter for clocksource_tsc_early 2019-11-05 01:24:56 +01:00
umip.c Merge branches 'x86-cpu-for-linus' and 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 08:58:08 -08:00
unwind_frame.c x86/stackframe/32: Provide consistent pt_regs 2019-06-25 10:23:47 +02:00
unwind_guess.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
unwind_orc.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 16:59:34 -07:00
uprobes.c x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments 2019-10-27 09:00:28 +01:00
verify_cpu.S x86/asm: Annotate local pseudo-functions 2019-10-18 10:04:04 +02:00
vm86_32.c signal: Remove task parameter from force_sig 2019-05-27 09:36:28 -05:00
vmlinux.lds.S x86/vmlinux: Use INT3 instead of NOP for linker fill bytes 2019-11-04 19:10:08 +01:00
vsmp_64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 346 2019-06-05 17:37:08 +02:00
x86_init.c x86/init: Allow DT configured systems to disable RTC at boot time 2019-11-12 15:46:53 +01:00