On platforms with C8-C10 support, the additional C-states cause
turbostat to overrun its output buffer of 128 bytes per CPU. Increase
this to 256 bytes per CPU.
[ As a bugfix, this should go into 3.10; however, since the C8-C10
support didn't go in until after 3.9, this need not go into any stable
kernel. ]
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull idle update from Len Brown:
"Add support for new Haswell-ULT CPU idle power states"
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
intel_idle: initial C8, C9, C10 support
tools/power turbostat: display C8, C9, C10 residency
Display residency in the new C-states, C8, C9, C10.
C8, C9, C10 are present on some:
"Fourth Generation Intel(R) Core(TM) Processors",
which are based on Intel(R) microarchitecture code name Haswell.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Correct spelling typos in various part of printk.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The SMI counter is popular -- so display it by default
rather than requiring an option. What the heck,
we've blown the 80 column budget on many systems already...
Note that the value displayed is the delta
during the measurement interval.
The absolute value of the counter can still be seen with
the generic 32-bit MSR option, ie. -m 0x34
Signed-off-by: Len Brown <len.brown@intel.com>
When verbose is enabled, print the C1E-Enable
bit in MSR_IA32_POWER_CTL.
also delete some redundant tests on the verbose variable.
Signed-off-by: Len Brown <len.brown@intel.com>
This patch enables turbostat to run properly on the
next-generation Intel(R) Microarchitecture, code named "Haswell" (HSW).
HSW supports the BCLK and counters found in SNB.
Signed-off-by: Len Brown <len.brown@intel.com>
Change the default location to install acpidump into from /usr/bin
to /usr/sbin, as this tool needs to be run as root.
[rjw: Subject and changelog]
Signed-off-by: Thomas Renninger <trenn@suse.de>
Tested-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull powertool update from Len Brown:
"This updates the tree w/ the latest version of turbostat, which
reports temperature and - on SNB and later - Watts."
Fix up semantic merge conflict as per Len.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools: Allow tools to be installed in a user specified location
tools/power: turbostat: make Makefile a bit more capable
tools/power x86_energy_perf_policy: close /proc/stat in for_every_cpu()
tools/power turbostat: v3.0: monitor Watts and Temperature
tools/power turbostat: fix output buffering issue
tools/power turbostat: prevent infinite loop on migration error path
x86 power: define RAPL MSRs
tools/power/x86/turbostat: share kernel MSR #defines
When building x86_energy_perf_policy or turbostat within the confines of
a packaging system such as RPM, we need to be able to have it install to
the buildroot and not the root filesystem of the build machine. This
adds a DESTDIR variable that when set will act as a prefix for the
install location of these tools.
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The turbostat Makefile is pretty simple, its output is placed in the
same directory as the source, the install rule has no concept of a
prefix or sysroot, and you can set CC to use a specific compiler but
not use the more familiar CROSS_COMPILE. By making a few minor changes
these limitations are removed while leaving the default behavior
matching what it used to be.
Example build with these changes:
make CROSS_COMPILE=i686-wrs-linux-gnu- DESTDIR=/tmp install
or from the tools directory
make CROSS_COMPILE=i686-wrs-linux-gnu- DESTDIR=/tmp turbostat_install
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Instead of returning out of for_every_cpu() we should break out of the loop=
which will then tidy up correctly by closing the file /proc/stat.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Show power in Watts and temperature in Celsius
when hardware support is present.
Intel's Sandy Bridge and Ivy Bridge processor generations support RAPL
(Run-Time-Average-Power-Limiting). Per the Intel SDM
(Intel® 64 and IA-32 Architectures Software Developer Manual)
RAPL provides hardware energy counters and power control MSRs
(Model Specific Registers). RAPL MSRs are designed primarily
as a method to implement power capping. However, they are useful
for monitoring system power whether or not power capping is used.
In addition, Turbostat now shows temperature from DTS
(Digital Thermal Sensor) and PTM (Package Thermal Monitor) hardware,
if present.
As before, turbostat reads MSRs, and never writes MSRs.
New columns are present in turbostat output:
The Pkg_W column shows Watts for each package (socket) in the system.
On multi-socket systems, the system summary on the 1st row shows the sum
for all sockets together.
The Cor_W column shows Watts due to processors cores.
Note that Core_W is included in Pkg_W.
The optional GFX_W column shows Watts due to the graphics "un-core".
Note that GFX_W is included in Pkg_W.
The optional RAM_W column on server processors shows Watts due to DRAM DIMMS.
As DRAM DIMMs are outside the processor package, RAM_W is not included in Pkg_W.
The optional PKG_% and RAM_% columns on server processors shows the % of time
in the measurement interval that RAPL power limiting is in effect on the
package and on DRAM.
Note that the RAPL energy counters have some limitations.
First, hardware updates the counters about once every milli-second.
This is fine for typical turbostat measurement intervals > 1 sec.
However, when turbostat is used to measure events that approach
1ms, the counters are less useful.
Second, the 32-bit energy counters are subject to wrapping.
For example, a counter incrementing 15 micro-Joule units
on a 130 Watt TDP server processor could (in theory)
roll over in about 9 minutes. Turbostat detects and handles
up to 1 counter overflow per measurement interval.
But when the measurement interval exceeds the guaranteed
counter range, we can't detect if more than 1 overflow occured.
So in this case turbostat indicates that the results are
in question by replacing the fractional part of the Watts
in the output with "**":
Pkg_W Cor_W GFX_W
3** 0** 0**
Third, the RAPL counters are energy (Joule) counters -- they sum up
weighted events in the package to estimate energy consumed. They are
not analong power (Watt) meters. In practice, they tend to under-count
because they don't cover every possible use of energy in the package.
The accuracy of the RAPL counters will vary between product generations,
and between SKU's in the same product generation, and with temperature.
turbostat's -v (verbose) option now displays more power and thermal configuration
information -- as shown on the turbostat.8 manual page.
For example, it now displays the Package and DRAM Thermal Design Power (TDP):
cpu0: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.)
cpu0: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.)
cpu8: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.)
cpu8: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.)
Signed-off-by: Len Brown <len.brown@intel.com>
In periodic mode, turbostat writes to stdout,
but users were un-able to re-direct stdout, eg.
turbostat > outputfile
would result in an empty outputfile.
Signed-off-by: Len Brown <len.brown@intel.com>
If an MSR based monitor is run in parallel this is not needed. This is the
default case on all/most Intel machines.
But when only sysfs info is read via cpupower monitor -m Idle_Stats (typically
the case for non root users) or when other monitors are PCI based (AMD),
Idle_Stats, read from sysfs can be totally bogus:
cpupower monitor -m Idle_Stats
PKG |CORE|CPU | POLL | C1-N | C3-N | C6-N
0| 0| 0| 0.00| 0.00| 0.24| 99.81
0| 0| 32| 0.00| 0.00| 0.00| 100.7
...
0| 17| 20| 0.00| 0.00| 0.00| 173.1
0| 17| 52| 0.00| 0.00| 0.07| 173.0
0| 18| 68| 0.00| 0.00| 0.00| 0.00
0| 18| 76| 0.00| 0.00| 0.00| 0.00
...
With the -c option all cores are woken up and the kernel
did update cpuidle statistics before reading out sysfs.
This causes some overhead. Therefore avoid if possible, use
if needed:
cpupower monitor -c -m Idle_Stats
PKG |CORE|CPU | POLL | C1-N | C3-N | C6-N
0| 0| 0| 0.00| 0.00| 0.00| 100.2
0| 0| 32| 0.00| 0.00| 0.00| 100.2
...
0| 8| 8| 0.00| 0.00| 0.00| 99.82
0| 8| 40| 0.00| 0.00| 0.00| 99.81
0| 9| 24| 0.00| 0.00| 0.00| 100.3
0| 9| 56| 0.00| 0.00| 0.00| 100.2
0| 16| 4| 0.00| 0.00| 0.00| 99.75
0| 16| 36| 0.00| 0.00| 0.00| 99.38
...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The pkgs member of cpupower_topology is being used as the number of
cpu packages. As the comment in get_cpu_topology notes, the package ids
are not guaranteed to be contiguous. So, simply setting pkgs to the value
of the highest physical_package_id doesn't actually provide a count of
the number of cpu packages. Instead, calculate pkgs by setting it to
the number of distinct physical_packge_id values which is pretty easy
to do after the core_info structs are sorted. Calculating pkgs this
way also has the nice benefit of getting rid of a sign comparison warning
that GCC 4.6 was reporting.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpu_info member of cpupower_topology was being declared as an unnamed
structure. This member was then being malloced using the size of the
parent cpupower_topology * the number of cpus. This works
because cpu_info is smaller than cpupower_topology. However, there is
no guarantee that will always be the case. Making cpu_info its own
top level structure (named cpuid_core_info) allows for mallocing the actual
size of this structure. This also lets us get rid of a redefinition of
the structure in topology.c with slightly different field names.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix a variety of issues with sysfs_topology_read_file:
* The return value of sysfs_topology_read_file function was not properly
being checked for failure.
* The function was reading int valued sysfs variables and then returning
their value. So, even if a function was trying to check the return value
of this function, a caller would not be able to tell an failure code apart
from reading a negative value. This also conflicted with the comment on the
function which said that a return value of 0 indicated success.
* The function was parsing int valued sysfs values with strtoul instead of
strtol.
* The function was non-static even though it was only used in the
file it was declared in.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix minor warnings reported with GCC 4.6:
* The sysfs_write_file function is unused - remove it.
* The pr_mon_len in the print_header function is unsed - remove it.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The files generated by the Makefiles in the debug directories aren't listed
in the .gitignore file in the root of the cpupower tool which causes these
files to show up in the output of 'git status'.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The clean targets from the cpupower tools' Makefiles use brace expansion to
remove some generated files. However, the default shells on many systems do
not support this feature resulting in some generated files not being removed
by clean.
Signed-off-by: Palmer Cox <p@lmercox.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Turbostat assumed if it can't migrate to a CPU, then the CPU
must have gone off-line and turbostat should re-initialize
with the new topology.
But if turbostat can not migrate because it is restricted by
a cpuset, then it will fail to migrate even after re-initialization,
resulting in an infinite loop.
Spit out a warning when we can't migrate
and endure only 2 re-initialize cycles in a row
before giving up and exiting.
Signed-off-by: Len Brown <len.brown@intel.com>
Now that turbostat is built in the kernel tree,
it can share MSR #defines with the kernel.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Remove duplicated include.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Len Brown <len.brown@intel.com>
Pull kbuild fixes from Michal Marek:
"Here are two fixes I intended to send after v3.6-rc7, but failed to do
so. So please pull them for v3.7-rc1 and they will be picked up by
stable.
The first one fixes gcc -x <language> syntax in various build-time
tests, which icecream and possible other gcc wrappers did not
understand (and yes, icecream is going to be fixed as well).
The second one fixes make tar-pkg so that unpacking the tarball does
not replace the /lib -> /usr/lib symlink on recent Fedora releases."
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: Fix gcc -x syntax
kbuild: Do not package /boot and /lib in make tar-pkg
Counting SMIs is popular, so add a dedicated "-s" option to do it,
and juggle some of the other option letters.
-S is now system summary (was -s)
-c is 32 bit counter (was -d)
-C is 64-bit counter (was -D)
-p is 1st thread in core (was -c)
-P is 1st thread in package (was -p)
bump the minor version number
Signed-off-by: Len Brown <len.brown@intel.com>
The correct syntax for gcc -x is "gcc -x assembler", not
"gcc -xassembler". Even though the latter happens to work, the former
is what is documented in the manual page and thus what gcc wrappers
such as icecream do expect.
This isn't a cosmetic change. The missing space prevents icecream from
recognizing compilation tasks it can't handle, leading to silent kernel
miscompilations.
Besides me, credits go to Michael Matz and Dirk Mueller for
investigating the miscompilation issue and tracking it down to this
incorrect -x parameter syntax.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Cc: Bernhard Walle <bernhard@bwalle.de>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
# turbostat -d 0x34
is useful for printing the number of SMI's within an interval
on Nehalem and newer processors.
where
# turbostat -m 0x34
will simply print out the total SMI count since reset.
Suggested-by: Andi Kleen
Signed-off-by: Len Brown <len.brown@intel.com>
The -M option dumps the specified 64-bit MSR with every sample.
Previously it was output at the end of each line.
However, with the v2 style of printing, the lines are now staggered,
making MSR output hard to read.
So move the MSR output column to the left where things are aligned.
Signed-off-by: Len Brown <len.brown@intel.com>
The "turbo-limit" is the maximum opportunistic processor
speed, assuming no electrical or thermal constraints.
For a given processor, the turbo-limit varies, depending
on the number of active cores. Generally, there is more
opportunity when fewer cores are active.
Under the "-v" verbose option, turbostat would
print the turbo-limits for the four cases
of 1 to 4 cores active.
Expand that capability to cover the cases of turbo
opportunities with up to 16 cores active.
Note that not all hardware platforms supply this information,
and that sometimes a valid limit may be specified for
a core which is not actually present.
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20101221, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
This version finds dynamic tables exported by Linux in
/sys/firmware/acpi/tables/dynamic
Signed-off-by: Yakui Zhao <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20101221, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
This version finds dynamic tables exported by Linux in
/sys/firmware/acpi/tables/dynamic
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20071116, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20070714, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20060606, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
Signed-off-by: Len Brown <len.brown@intel.com>
This is unchanged version 20051111, plus a small bit in
DEFINE_ALTERNATE_TYPES to enable building with latest kernel headers.
Signed-off-by: Len Brown <len.brown@intel.com>
Under some conditions, c1% was displayed as very large number,
much higher than 100%.
c1% is not measured, it is derived as "that, which is left over"
from other counters. However, the other counters are not collected
atomically, and so it is possible for c1% to be calaculagted as
a small negative number -- displayed as very large positive.
There was a check for mperf vs tsc for this already,
but it needed to also include the other counters
that are used to calculate c1.
Signed-off-by: Len Brown <len.brown@intel.com>
Measuring large profoundly-idle configurations
requires turbostat to be more lightweight.
Otherwise, the operation of turbostat itself
can interfere with the measurements.
This re-write makes turbostat topology aware.
Hardware is accessed in "topology order".
Redundant hardware accesses are deleted.
Redundant output is deleted.
Also, output is buffered and
local RDTSC use replaces remote MSR access for TSC.
From a feature point of view, the output
looks different since redundant figures are absent.
Also, there are now -c and -p options -- to restrict
output to the 1st thread in each core, and the 1st
thread in each package, respectively. This is helpful
to reduce output on big systems, where more detail
than the "-s" system summary is desired.
Finally, periodic mode output is now on stdout, not stderr.
Turbostat v2 is also slightly more robust in
handling run-time CPU online/offline events,
as it now checks the actual map of on-line cpus rather
than just the total number of on-line cpus.
Signed-off-by: Len Brown <len.brown@intel.com>
Initial IVB support went into turbostat in Linux-3.1:
553575f1ae
(tools turbostat: recognize and run properly on IVB)
However, when running on IVB, turbostat would fail
to report the new couters added with SNB, c7, pc2 and pc7.
So in scenarios where these counters are non-zero on IVB,
turbostat would report erroneous residencey results.
In particular c7 time would be added to c1 time,
since c1 time is calculated as "that which is left over".
Also, turbostat reports MHz capabilities when passed
the "-v" option, and it would incorrectly report 133MHz
bclk instead of 100MHz bclk for IVB, which would inflate
GHz reported with that option.
This patch is a backport of a fix already included in turbostat v2.
Signed-off-by: Len Brown <len.brown@intel.com>
Linux 3.4 included a modification to turbostat to
lower cross-call overhead by using scheduler affinity:
15aaa34654
(tools turbostat: reduce measurement overhead due to IPIs)
In the use-case where turbostat forks a child program,
that change had the un-intended side-effect of binding
the child to the last cpu in the system.
This change removed the binding before forking the child.
This is a back-port of a fix already included in turbostat v2.
Signed-off-by: Len Brown <len.brown@intel.com>
It's been broken forever (i.e. it's not scheduling in a power
aware fashion), as reported by Suresh and others sending
patches, and nobody cares enough to fix it properly ...
so remove it to make space free for something better.
There's various problems with the code as it stands today, first
and foremost the user interface which is bound to topology
levels and has multiple values per level. This results in a
state explosion which the administrator or distro needs to
master and almost nobody does.
Furthermore large configuration state spaces aren't good, it
means the thing doesn't just work right because it's either
under so many impossibe to meet constraints, or even if
there's an achievable state workloads have to be aware of
it precisely and can never meet it for dynamic workloads.
So pushing this kind of decision to user-space was a bad idea
even with a single knob - it's exponentially worse with knobs
on every node of the topology.
There is a proposal to replace the user interface with a single
3 state knob:
sched_balance_policy := { performance, power, auto }
where 'auto' would be the preferred default which looks at things
like Battery/AC mode and possible cpufreq state or whatever the hw
exposes to show us power use expectations - but there's been no
progress on it in the past many months.
Aside from that, the actual implementation of the various knobs
is known to be broken. There have been sporadic attempts at
fixing things but these always stop short of reaching a mergable
state.
Therefore this wholesale removal with the hopes of spurring
people who care to come forward once again and work on a
coherent replacement.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull ACPI & Power Management changes from Len Brown:
- ACPI 5.0 after-ripples, ACPICA/Linux divergence cleanup
- cpuidle evolving, more ARM use
- thermal sub-system evolving, ditto
- assorted other PM bits
Fix up conflicts in various cpuidle implementations due to ARM cpuidle
cleanups (ARM at91 self-refresh and cpu idle code rewritten into
"standby" in asm conflicting with the consolidation of cpuidle time
keeping), trivial SH include file context conflict and RCU tracing fixes
in generic code.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (77 commits)
ACPI throttling: fix endian bug in acpi_read_throttling_status()
Disable MCP limit exceeded messages from Intel IPS driver
ACPI video: Don't start video device until its associated input device has been allocated
ACPI video: Harden video bus adding.
ACPI: Add support for exposing BGRT data
ACPI: export acpi_kobj
ACPI: Fix logic for removing mappings in 'acpi_unmap'
CPER failed to handle generic error records with multiple sections
ACPI: Clean redundant codes in scan.c
ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed()
ACPI: consistently use should_use_kmap()
PNPACPI: Fix device ref leaking in acpi_pnp_match
ACPI: Fix use-after-free in acpi_map_lsapic
ACPI: processor_driver: add missing kfree
ACPI, APEI: Fix incorrect APEI register bit width check and usage
Update documentation for parameter *notrigger* in einj.txt
ACPI, APEI, EINJ, new parameter to control trigger action
ACPI, APEI, EINJ, limit the range of einj_param
ACPI, APEI, Fix ERST header length check
cpuidle: power_usage should be declared signed integer
...
Sometimes users have turbostat running in interval mode
when they take processors offline/online.
Previously, turbostat would survive, but not gracefully.
Tighten up the error checking so turbostat notices
changesn sooner, and print just 1 line on change:
turbostat: re-initialized with num_cpus %d
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat uses /dev/cpu/*/msr interface to read MSRs.
For modern systems, it reads 10 MSR/CPU. This can
be observed as 10 "Function Call Interrupts"
per CPU per sample added to /proc/interrupts.
This overhead is measurable on large idle systems,
and as Yoquan Song pointed out, it can even trick
cpuidle into thinking the system is busy.
Here turbostat re-schedules itself in-turn to each
CPU so that its MSR reads will always be local.
This replaces the 10 "Function Call Interrupts"
with a single "Rescheduling interrupt" per sample
per CPU.
On an idle 32-CPU system, this shifts some residency from
the shallow c1 state to the deeper c7 state:
# ./turbostat.old -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.29 2.29 0.95 0.02 0.00 98.77 20.23 0.00 77.41 0.00
0.25 1.24 2.29 0.98 0.02 0.00 98.75 20.34 0.03 77.74 0.00
0.27 1.22 2.29 0.54 0.00 0.00 99.18 20.64 0.00 77.70 0.00
0.26 1.22 2.29 1.22 0.00 0.00 98.52 20.22 0.00 77.74 0.00
0.26 1.38 2.29 0.78 0.02 0.00 98.95 20.51 0.05 77.56 0.00
^C
i# ./turbostat.new -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.20 2.29 0.24 0.01 0.00 99.49 20.58 0.00 78.20 0.00
0.27 1.22 2.29 0.25 0.00 0.00 99.48 20.79 0.00 77.85 0.00
0.27 1.20 2.29 0.25 0.02 0.00 99.46 20.71 0.03 77.89 0.00
0.28 1.26 2.29 0.25 0.01 0.00 99.46 20.89 0.02 77.67 0.00
0.27 1.20 2.29 0.24 0.01 0.00 99.48 20.65 0.00 78.04 0.00
cc: Youquan Song <youquan.song@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat -s
cuts down on the amount of output, per user request.
also treak some output whitespace and the man page.
Signed-off-by: Len Brown <len.brown@intel.com>
This patch allows cpupower tool to generate its output files in a
seperate directory. This is now possible by passing the 'O=<path>' to
the command line.
This can be usefull for a normal user if the kernel source code is
located in a read only location.
This is patch stole some bits of the perf makefile.
[linux@dominikbrodowski.net: fix commit message]
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Use the '-p' and '-o' switches to specify the pathname of the output
file to xgettext(1). This avoids to move manually the output file if
xgettext(1) succeeds.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
UTIL_BINS and IDLE_OBJS variables are not defined at all, so
there's no need to remove their content from the 'clean' target.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Looks like some not needed debug code slipped in.
Also this code:
tmp = sysfs_get_idlestate_name(cpu, idlestates - 1);
performs a strdup and the mem was not freed again.
-> delete it.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The number of idle states was wrong.
The POLL idle state (on X86) was missed out:
Number of idle states: 4
Available idle states: C1-NHM C3-NHM C6-NHM
While the POLL is not a real idle state, its
statistics should still be shown. It's now also
explained in a detailed manpage.
This should fix a bug of missing the first idle
state on other archs.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
cpupower-frequency-* manpages slightly differed from the others.
- Use uppercase letters in the title
- Show cpupower Manual in the header
- Remove Mattia from left down corner of the manpage, he is already
listed as author
- Remove --help, prints this message -> not needed
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The last missing manpage for cpupower tools.
More info about other architecture's sleep state specialities would be great.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The name of the monitor is updated at runtime to the name of the
CPU type.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
AMD's BKDG (Bios and Kernel Developers Guide) talks in the CPU spec of their
CPU families about PCI registers defined by "device" (slot) and func(tion).
Assuming that CPU specific configuration PCI devices are always on domain
and bus zero a pci_slot_func_init() func which gets the slot and func of
the desired PCI device passed looks like the most convenient way.
This also obsoletes the PCI device id maintenance.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
This includes initial support for the recently published ACPI 5.0 spec.
In particular, support for the "hardware-reduced" bit that eliminates
the dependency on legacy hardware.
APEI has patches resulting from testing on real hardware.
Plus other random fixes.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits)
acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec
intel_idle: Split up and provide per CPU initialization func
ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2
ACPI processor: Remove unneeded cpuidle_unregister_driver call
intel idle: Make idle driver more robust
intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle
ACPI: kernel-parameters.txt : Add intel_idle.max_cstate
intel_idle: remove redundant local_irq_disable() call
ACPI processor: Fix error path, also remove sysdev link
ACPI: processor: fix acpi_get_cpuid for UP processor
intel_idle: fix API misuse
ACPI APEI: Convert atomicio routines
ACPI: Export interfaces for ioremapping/iounmapping ACPI registers
ACPI: Fix possible alignment issues with GAS 'address' references
ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64)
ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64)
ACPI: Store SRAT table revision
ACPI, APEI, Resolve false conflict between ACPI NVS and APEI
ACPI, Record ACPI NVS regions
ACPI, APEI, EINJ, Refine the fix of resource conflict
...
Field names were shortened: "pkg" is now "pk", "core" is now "cr"
Signed-off-by: Arun Thomas <arun.thomas@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Instead of printing something non-formatted to stdout, call
man(1) to show the man page for the proper subcommand.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Loosely based on a patch for cpufrequtils, submittted by
Sergey Dryabzhinsky <sergey.dryabzhinsky@gmail.com> and
signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
This allows for example:
cpupower -c 2-4,6 monitor -m Mperf
|Mperf
PKG |CORE|CPU | C0 | Cx | Freq
0| 8| 4| 2.42| 97.58| 1353
0| 16| 2| 14.38| 85.62| 1928
0| 24| 6| 1.76| 98.24| 1442
1| 16| 3| 15.53| 84.47| 1650
CPUs always get resorted for package, core then cpu id if it could get read out
(or however you name these topology levels...).
Still this is a nice way to keep the overview if a test binary is bound to
a specific CPU or if one wants to show all CPUs inside a package or similar.
Still missing: Do not measure not available cores to reduce the overhead
and achieve better results.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Before, checking for offlined CPUs was done dirty and
it was checked whether topology parsing returned -1 values.
But this is a valid case on a Xen (and possibly other) kernels.
Do proper online/offline checking, also take CONFIG_HOTPLUG_CPU
option into account (no /sys/devices/../cpuX/online file).
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Which makes the implementation independent from cpufreq drivers.
Therefore this would also work on a Xen kernel where the hypervisor
is doing frequency switching and idle entering.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
* 'tools-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
tools/power turbostat: fit output into 80 columns on snb-ep
tools/power x86_energy_perf_policy: fix print of uninitialized string
Reduce columns for package number to 1.
If you can afford more than 9 packages,
you can also afford a terminal with more than 80 columns:-)
Also shave a column also off the package C-states
Signed-off-by: Len Brown <len.brown@intel.com>
IA32-Intel Devel guide Volume 3A - 14.3.2.1
-------------------------------------------
...
Opportunistic processor performance operation can be disabled by setting bit 38 of
IA32_MISC_ENABLES. This mechanism is intended for BIOS only. If
IA32_MISC_ENABLES[38] is set, CPUID.06H:EAX[1] will return 0.
Better detect things via cpuid, this cleans up the code a bit
and the MSR parts were not working correctly anyway.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: lenb@kernel.org
CC: linux@dominikbrodowski.net
CC: cpufreq@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
This adds the last piece missing from turbostat (if called with -v).
It shows on Intel machines supporting Turbo Boost how many cores
have to be active/idle to enter which boost mode (frequency).
Whether the HW really enters these boost modes can be verified via
./cpupower monitor.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: lenb@kernel.org
CC: linux@dominikbrodowski.net
CC: cpufreq@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
larger sysfs data (>255 bytes) was truncated and thus used improperly
[linux@dominikbrodowski.net: adapted to cpupowerutils]
Signed-off-by: Roman Vasiyarov <rvasiyarov@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
As cpupowerutils is intended to be included into the kernel sources,
use the kernel versioning instead of a custom version.
The script utils/version-gen.sh is largely based on the script already
found in tools/perf/util/PERF-VERSION-GEN .
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Use the quiet/verbose mechanism found in kernel tools, without
relying on the special tool "ccdv"
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
CPU power consumption vs performance tuning is no longer
limited to CPU frequency switching anymore: deep sleep states,
traditional dynamic frequency scaling and hidden turbo/boost
frequencies are tied close together and depend on each other.
The first two exist on different architectures like PPC, Itanium and
ARM, the latter (so far) only on X86. On X86 the APU (CPU+GPU) will
only run most efficiently if CPU and GPU has proper power management
in place.
Users and Developers want to have *one* tool to get an overview what
their system supports and to monitor and debug CPU power management
in detail. The tool should compile and work on as many architectures
as possible.
Once this tool stabilizes a bit, it is intended to replace the
Intel-specific tools in tools/power/x86
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Follow kernel coding style traditions more closely.
Delete typedef, re-name "per cpu counters" to
simply be counters etc.
This patch changes no functionality.
Suggested-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
bug could cause false positive on indicating
presence of invarient TSC or APERF support.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
It is expected to become increasingly important in subsequent generations.
x86_energy_perf_policy is a user-space utility to set the
hardware energy vs performance policy hint in the processor.
Most systems would benefit from "x86_energy_perf_policy normal"
at system startup, as the hardware default is maximum performance
at the expense of energy efficiency.
See x86_energy_perf_policy.8 man page for more information.
Background:
Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
without actually modifying the MSR.
In March, 2010, Venkatesh Pallipadi proposed a small driver
that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
the cpufreq governor in use. It also offered
a boot-time cmdline option to override.
http://lkml.org/lkml/2010/3/4/457
But hiding the hardware policy behind the
governor choice was deemed "kinda icky".
In June, 2010, I proposed a generic user/kernel API to
generalize the power/performance policy trade-off.
"RFC: /sys/power/policy_preference"
http://lkml.org/lkml/2010/6/16/399
That is my preference for implementing this capability,
but I received no support on the list.
So in September, 2010, I sent x86_energy_perf_policy.c to LKML,
a user-space utility that scribbles directly to the MSR.
http://lkml.org/lkml/2010/9/28/246
Here is that same utility, after responding to some review feedback,
to live in tools/power/, where it is easily found.
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat is a Linux tool to observe proper operation
of Intel(R) Turbo Boost Technology.
turbostat displays the actual processor frequency
on x86 processors that include APERF and MPERF MSRs.
Note that turbostat is of limited utility on Linux
kernels 2.6.29 and older, as acpi_cpufreq cleared
APERF/MPERF up through that release.
On Intel Core i3/i5/i7 (Nehalem) and newer processors,
turbostat also displays residency in idle power saving states,
which are necessary for diagnosing any cpuidle issues
that may have an effect on turbo-mode.
See the turbostat.8 man page for example usage.
Signed-off-by: Len Brown <len.brown@intel.com>