cpufreq: User/admin documentation update and consolidation
The user/admin documentation of cpufreq is badly outdated. It contains stale and/or inaccurate information along with things that are not particularly useful. Also, some of the important pieces are missing from it.

For this reason, add a new user/admin document for cpufreq containing current information to admin-guide and drop the old outdated .txt documents it is replacing.

Since there will be more PM documents in admin-guide going forward, create a separate directory for them and put the cpufreq document in there right away.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent 8fa1bb506f
commit 2a0e492798
@@ -60,6 +60,7 @@ configure specific aspects of kernel behavior to your liking.
    mono
    java
    ras
+   pm/index

 .. only:: subproject and html

700
Documentation/admin-guide/pm/cpufreq.rst
Normal file
@@ -0,0 +1,700 @@
|
|||||||
|
.. |struct cpufreq_policy| replace:: :c:type:`struct cpufreq_policy <cpufreq_policy>`
|
||||||
|
|
||||||
|
=======================
|
||||||
|
CPU Performance Scaling
|
||||||
|
=======================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
Copyright (c) 2017 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||||||
|
|
||||||
|
The Concept of CPU Performance Scaling
|
||||||
|
======================================
|
||||||
|
|
||||||
|
The majority of modern processors are capable of operating in a number of
|
||||||
|
different clock frequency and voltage configurations, often referred to as
|
||||||
|
Operating Performance Points or P-states (in ACPI terminology). As a rule,
|
||||||
|
the higher the clock frequency and the higher the voltage, the more instructions
|
||||||
|
can be retired by the CPU over a unit of time, but also the higher the clock
|
||||||
|
frequency and the higher the voltage, the more energy is consumed over a unit of
|
||||||
|
time (or the more power is drawn) by the CPU in the given P-state. Therefore
|
||||||
|
there is a natural tradeoff between the CPU capacity (the number of instructions
|
||||||
|
that can be executed over a unit of time) and the power drawn by the CPU.
|
||||||
|
|
||||||
|
In some situations it is desirable or even necessary to run the program as fast
|
||||||
|
as possible and then there is no reason to use any P-states different from the
|
||||||
|
highest one (i.e. the highest-performance frequency/voltage configuration
|
||||||
|
available). In some other cases, however, it may not be necessary to execute
|
||||||
|
instructions so quickly and maintaining the highest available CPU capacity for a
|
||||||
|
relatively long time without utilizing it entirely may be regarded as wasteful.
|
||||||
|
It also may not be physically possible to maintain maximum CPU capacity for too
|
||||||
|
long for thermal or power supply capacity reasons or similar. To cover those
|
||||||
|
cases, there are hardware interfaces allowing CPUs to be switched between
|
||||||
|
different frequency/voltage configurations or (in the ACPI terminology) to be
|
||||||
|
put into different P-states.
|
||||||
|
|
||||||
|
Typically, they are used along with algorithms to estimate the required CPU
|
||||||
|
capacity, so as to decide which P-states to put the CPUs into. Of course, since
|
||||||
|
the utilization of the system generally changes over time, that has to be done
|
||||||
|
repeatedly on a regular basis. The activity by which this happens is referred
|
||||||
|
to as CPU performance scaling or CPU frequency scaling (because it involves
|
||||||
|
adjusting the CPU clock frequency).
|
||||||
|
|
||||||
|
|
||||||
|
CPU Performance Scaling in Linux
|
||||||
|
================================
|
||||||
|
|
||||||
|
The Linux kernel supports CPU performance scaling by means of the ``CPUFreq``
|
||||||
|
(CPU Frequency scaling) subsystem that consists of three layers of code: the
|
||||||
|
core, scaling governors and scaling drivers.
|
||||||
|
|
||||||
|
The ``CPUFreq`` core provides the common code infrastructure and user space
|
||||||
|
interfaces for all platforms that support CPU performance scaling. It defines
|
||||||
|
the basic framework in which the other components operate.
|
||||||
|
|
||||||
|
Scaling governors implement algorithms to estimate the required CPU capacity.
|
||||||
|
As a rule, each governor implements one, possibly parametrized, scaling
|
||||||
|
algorithm.
|
||||||
|
|
||||||
|
Scaling drivers talk to the hardware. They provide scaling governors with
|
||||||
|
information on the available P-states (or P-state ranges in some cases) and
|
||||||
|
access platform-specific hardware interfaces to change CPU P-states as requested
|
||||||
|
by scaling governors.
|
||||||
|
|
||||||
|
In principle, all available scaling governors can be used with every scaling
|
||||||
|
driver. That design is based on the observation that the information used by
|
||||||
|
performance scaling algorithms for P-state selection can be represented in a
|
||||||
|
platform-independent form in the majority of cases, so it should be possible
|
||||||
|
to use the same performance scaling algorithm implemented in exactly the same
|
||||||
|
way regardless of which scaling driver is used. Consequently, the same set of
|
||||||
|
scaling governors should be suitable for every supported platform.
|
||||||
|
|
||||||
|
However, that observation may not hold for performance scaling algorithms
|
||||||
|
based on information provided by the hardware itself, for example through
|
||||||
|
feedback registers, as that information is typically specific to the hardware
|
||||||
|
interface it comes from and may not be easily represented in an abstract,
|
||||||
|
platform-independent way. For this reason, ``CPUFreq`` allows scaling drivers
|
||||||
|
to bypass the governor layer and implement their own performance scaling
|
||||||
|
algorithms. That is done by the ``intel_pstate`` scaling driver.
|
||||||
|
|
||||||
|
|
||||||
|
``CPUFreq`` Policy Objects
|
||||||
|
==========================
|
||||||
|
|
||||||
|
In some cases the hardware interface for P-state control is shared by multiple
|
||||||
|
CPUs. That is, for example, the same register (or set of registers) is used to
|
||||||
|
control the P-state of multiple CPUs at the same time and writing to it affects
|
||||||
|
all of those CPUs simultaneously.
|
||||||
|
|
||||||
|
Sets of CPUs sharing hardware P-state control interfaces are represented by
|
||||||
|
``CPUFreq`` as |struct cpufreq_policy| objects. For consistency,
|
||||||
|
|struct cpufreq_policy| is also used when there is only one CPU in the given
|
||||||
|
set.
|
||||||
|
|
||||||
|
The ``CPUFreq`` core maintains a pointer to a |struct cpufreq_policy| object for
|
||||||
|
every CPU in the system, including CPUs that are currently offline. If multiple
|
||||||
|
CPUs share the same hardware P-state control interface, all of the pointers
|
||||||
|
corresponding to them point to the same |struct cpufreq_policy| object.
|
||||||
|
|
||||||
|
``CPUFreq`` uses |struct cpufreq_policy| as its basic data type and the design
|
||||||
|
of its user space interface is based on the policy concept.
|
||||||
|
|
||||||
|
|
||||||
|
CPU Initialization
|
||||||
|
==================
|
||||||
|
|
||||||
|
First of all, a scaling driver has to be registered for ``CPUFreq`` to work.
|
||||||
|
It is only possible to register one scaling driver at a time, so the scaling
|
||||||
|
driver is expected to be able to handle all CPUs in the system.
|
||||||
|
|
||||||
|
The scaling driver may be registered before or after CPU registration. If
|
||||||
|
CPUs are registered earlier, the driver core invokes the ``CPUFreq`` core to
|
||||||
|
take a note of all of the already registered CPUs during the registration of the
|
||||||
|
scaling driver. In turn, if any CPUs are registered after the registration of
|
||||||
|
the scaling driver, the ``CPUFreq`` core will be invoked to take note of them
|
||||||
|
at their registration time.
|
||||||
|
|
||||||
|
In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it
|
||||||
|
has not seen so far as soon as it is ready to handle that CPU. [Note that the
|
||||||
|
logical CPU may be a physical single-core processor, or a single core in a
|
||||||
|
multicore processor, or a hardware thread in a physical processor or processor
|
||||||
|
core. In what follows "CPU" always means "logical CPU" unless explicitly stated
|
||||||
|
otherwise and the word "processor" is used to refer to the physical part
|
||||||
|
possibly including multiple logical CPUs.]
|
||||||
|
|
||||||
|
Once invoked, the ``CPUFreq`` core checks if the policy pointer is already set
|
||||||
|
for the given CPU and if so, it skips the policy object creation. Otherwise,
|
||||||
|
a new policy object is created and initialized, which involves the creation of
|
||||||
|
a new policy directory in ``sysfs``, and the policy pointer corresponding to
|
||||||
|
the given CPU is set to the new policy object's address in memory.
|
||||||
|
|
||||||
|
Next, the scaling driver's ``->init()`` callback is invoked with the policy
|
||||||
|
pointer of the new CPU passed to it as the argument. That callback is expected
|
||||||
|
to initialize the performance scaling hardware interface for the given CPU (or,
|
||||||
|
more precisely, for the set of CPUs sharing the hardware interface it belongs
|
||||||
|
to, represented by its policy object) and, if the policy object it has been
|
||||||
|
called for is new, to set parameters of the policy, like the minimum and maximum
|
||||||
|
frequencies supported by the hardware, the table of available frequencies (if
|
||||||
|
the set of supported P-states is not a continuous range), and the mask of CPUs
|
||||||
|
that belong to the same policy (including both online and offline CPUs). That
|
||||||
|
mask is then used by the core to populate the policy pointers for all of the
|
||||||
|
CPUs in it.
|
||||||
|
|
||||||
|
The next major initialization step for a new policy object is to attach a
|
||||||
|
scaling governor to it (to begin with, that is the default scaling governor
|
||||||
|
determined by the kernel configuration, but it may be changed later
|
||||||
|
via ``sysfs``). First, a pointer to the new policy object is passed to the
|
||||||
|
governor's ``->init()`` callback which is expected to initialize all of the
|
||||||
|
data structures necessary to handle the given policy and, possibly, to add
|
||||||
|
a governor ``sysfs`` interface to it. Next, the governor is started by
|
||||||
|
invoking its ``->start()`` callback.
|
||||||
|
|
||||||
|
That callback is expected to register per-CPU utilization update callbacks for
|
||||||
|
all of the online CPUs belonging to the given policy with the CPU scheduler.
|
||||||
|
The utilization update callbacks will be invoked by the CPU scheduler on
|
||||||
|
important events, like task enqueue and dequeue, on every iteration of the
|
||||||
|
scheduler tick or generally whenever the CPU utilization may change (from the
|
||||||
|
scheduler's perspective). They are expected to carry out computations needed
|
||||||
|
to determine the P-state to use for the given policy going forward and to
|
||||||
|
invoke the scaling driver to make changes to the hardware in accordance with
|
||||||
|
the P-state selection. The scaling driver may be invoked directly from
|
||||||
|
scheduler context or asynchronously, via a kernel thread or workqueue, depending
|
||||||
|
on the configuration and capabilities of the scaling driver and the governor.
|
||||||
|
|
||||||
|
Similar steps are taken for policy objects that are not new, but were "inactive"
|
||||||
|
previously, meaning that all of the CPUs belonging to them were offline. The
|
||||||
|
only practical difference in that case is that the ``CPUFreq`` core will attempt
|
||||||
|
to use the scaling governor previously used with the policy that became
|
||||||
|
"inactive" (and is re-initialized now) instead of the default governor.
|
||||||
|
|
||||||
|
In turn, if a previously offline CPU is being brought back online, but some
|
||||||
|
other CPUs sharing the policy object with it are online already, there is no
|
||||||
|
need to re-initialize the policy object at all. In that case, it only is
|
||||||
|
necessary to restart the scaling governor so that it can take the new online CPU
|
||||||
|
into account. That is achieved by invoking the governor's ``->stop()`` and
|
||||||
|
``->start()`` callbacks, in this order, for the entire policy.
|
||||||
|
|
||||||
|
As mentioned before, the ``intel_pstate`` scaling driver bypasses the scaling
|
||||||
|
governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
|
||||||
|
Consequently, if ``intel_pstate`` is used, scaling governors are not attached to
|
||||||
|
new policy objects. Instead, the driver's ``->setpolicy()`` callback is invoked
|
||||||
|
to register per-CPU utilization update callbacks for each policy. These
|
||||||
|
callbacks are invoked by the CPU scheduler in the same way as for scaling
|
||||||
|
governors, but in the ``intel_pstate`` case they both determine the P-state to
|
||||||
|
use and change the hardware configuration accordingly in one go from scheduler
|
||||||
|
context.
|
||||||
|
|
||||||
|
The policy objects created during CPU initialization and other data structures
|
||||||
|
associated with them are torn down when the scaling driver is unregistered
|
||||||
|
(which happens when the kernel module containing it is unloaded, for example) or
|
||||||
|
when the last CPU belonging to the given policy is unregistered.
|
||||||
|
|
||||||
|
|
||||||
|
Policy Interface in ``sysfs``
|
||||||
|
=============================
|
||||||
|
|
||||||
|
During the initialization of the kernel, the ``CPUFreq`` core creates a
|
||||||
|
``sysfs`` directory (kobject) called ``cpufreq`` under
|
||||||
|
:file:`/sys/devices/system/cpu/`.
|
||||||
|
|
||||||
|
That directory contains a ``policyX`` subdirectory (where ``X`` represents an
|
||||||
|
integer number) for every policy object maintained by the ``CPUFreq`` core.
|
||||||
|
Each ``policyX`` directory is pointed to by ``cpufreq`` symbolic links
|
||||||
|
under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer
|
||||||
|
that may be different from the one represented by ``X``) for all of the CPUs
|
||||||
|
associated with (or belonging to) the given policy. The ``policyX`` directories
|
||||||
|
in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific
|
||||||
|
attributes (files) to control ``CPUFreq`` behavior for the corresponding policy
|
||||||
|
objects (that is, for all of the CPUs associated with them).
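
The relationship between the per-CPU ``cpufreq`` symbolic links and the
``policyX`` directories can be inspected from a shell. The following is only a
minimal sketch: the ``cpu_policy`` function name and its argument are
hypothetical, and it assumes that a ``readlink`` utility is available.

```shell
# cpu_policy CPU_DIR
# Hypothetical helper: print the name of the policy directory that the
# "cpufreq" symbolic link under CPU_DIR points to
# (for example, CPU_DIR may be /sys/devices/system/cpu/cpu0).
cpu_policy() {
    link="$1/cpufreq"
    # Fail if the CPU has no cpufreq link (no policy is associated with it).
    [ -L "$link" ] || return 1
    # The link target ends with the policy directory name, e.g. "policy0".
    basename "$(readlink "$link")"
}
```

If two CPUs share a hardware P-state control interface, calling ``cpu_policy``
for each of them is expected to print the same ``policyX`` name.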
|
||||||
|
|
||||||
|
Some of those attributes are generic. They are created by the ``CPUFreq`` core
|
||||||
|
and their behavior generally does not depend on what scaling driver is in use
|
||||||
|
and what scaling governor is attached to the given policy. Some scaling drivers
|
||||||
|
also add driver-specific attributes to the policy directories in ``sysfs`` to
|
||||||
|
control policy-specific aspects of driver behavior.
|
||||||
|
|
||||||
|
The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/`
|
||||||
|
are the following:
|
||||||
|
|
||||||
|
``affected_cpus``
|
||||||
|
List of online CPUs belonging to this policy (i.e. sharing the hardware
|
||||||
|
performance scaling interface represented by the ``policyX`` policy
|
||||||
|
object).
|
||||||
|
|
||||||
|
``bios_limit``
|
||||||
|
If the platform firmware (BIOS) tells the OS to apply an upper limit to
|
||||||
|
CPU frequencies, that limit will be reported through this attribute (if
|
||||||
|
present).
|
||||||
|
|
||||||
|
The existence of the limit may be a result of some (often unintentional)
|
||||||
|
BIOS settings, restrictions coming from a service processor or other
|
||||||
|
BIOS/HW-based mechanisms.
|
||||||
|
|
||||||
|
This does not cover ACPI thermal limitations which can be discovered
|
||||||
|
through a generic thermal driver.
|
||||||
|
|
||||||
|
This attribute is not present if the scaling driver in use does not
|
||||||
|
support it.
|
||||||
|
|
||||||
|
``cpuinfo_max_freq``
|
||||||
|
Maximum possible operating frequency the CPUs belonging to this policy
|
||||||
|
can run at (in kHz).
|
||||||
|
|
||||||
|
``cpuinfo_min_freq``
|
||||||
|
Minimum possible operating frequency the CPUs belonging to this policy
|
||||||
|
can run at (in kHz).
|
||||||
|
|
||||||
|
``cpuinfo_transition_latency``
|
||||||
|
The time it takes to switch the CPUs belonging to this policy from one
|
||||||
|
P-state to another, in nanoseconds.
|
||||||
|
|
||||||
|
If unknown or if known to be so high that the scaling driver does not
|
||||||
|
work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`)
|
||||||
|
will be returned by reads from this attribute.
|
||||||
|
|
||||||
|
``related_cpus``
|
||||||
|
List of all (online and offline) CPUs belonging to this policy.
|
||||||
|
|
||||||
|
``scaling_available_governors``
|
||||||
|
List of ``CPUFreq`` scaling governors present in the kernel that can
|
||||||
|
be attached to this policy or (if the ``intel_pstate`` scaling driver is
|
||||||
|
in use) list of scaling algorithms provided by the driver that can be
|
||||||
|
applied to this policy.
|
||||||
|
|
||||||
|
[Note that some governors are modular and it may be necessary to load a
|
||||||
|
kernel module for the governor held by it to become available and be
|
||||||
|
listed by this attribute.]
|
||||||
|
|
||||||
|
``scaling_cur_freq``
|
||||||
|
Current frequency of all of the CPUs belonging to this policy (in kHz).
|
||||||
|
|
||||||
|
For the majority of scaling drivers, this is the frequency of the last
|
||||||
|
P-state requested by the driver from the hardware using the scaling
|
||||||
|
interface provided by it, which may or may not reflect the frequency
|
||||||
|
the CPU is actually running at (due to hardware design and other
|
||||||
|
limitations).
|
||||||
|
|
||||||
|
Some scaling drivers (e.g. ``intel_pstate``) attempt to provide
|
||||||
|
information more precisely reflecting the current CPU frequency through
|
||||||
|
this attribute, but that still may not be the exact current CPU
|
||||||
|
frequency as seen by the hardware at the moment.
|
||||||
|
|
||||||
|
``scaling_driver``
|
||||||
|
The scaling driver currently in use.
|
||||||
|
|
||||||
|
``scaling_governor``
|
||||||
|
The scaling governor currently attached to this policy or (if the
|
||||||
|
``intel_pstate`` scaling driver is in use) the scaling algorithm
|
||||||
|
provided by the driver that is currently applied to this policy.
|
||||||
|
|
||||||
|
This attribute is read-write and writing to it will cause a new scaling
|
||||||
|
governor to be attached to this policy or a new scaling algorithm
|
||||||
|
provided by the scaling driver to be applied to it (in the
|
||||||
|
``intel_pstate`` case), as indicated by the string written to this
|
||||||
|
attribute (which must be one of the names listed by the
|
||||||
|
``scaling_available_governors`` attribute described above).
|
||||||
|
|
||||||
|
``scaling_max_freq``
|
||||||
|
Maximum frequency the CPUs belonging to this policy are allowed to be
|
||||||
|
running at (in kHz).
|
||||||
|
|
||||||
|
This attribute is read-write and writing a string representing an
|
||||||
|
integer to it will cause a new limit to be set (it must not be lower
|
||||||
|
than the value of the ``scaling_min_freq`` attribute).
|
||||||
|
|
||||||
|
``scaling_min_freq``
|
||||||
|
Minimum frequency the CPUs belonging to this policy are allowed to be
|
||||||
|
running at (in kHz).
|
||||||
|
|
||||||
|
This attribute is read-write and writing a string representing a
|
||||||
|
non-negative integer to it will cause a new limit to be set (it must not
|
||||||
|
be higher than the value of the ``scaling_max_freq`` attribute).
|
||||||
|
|
||||||
|
``scaling_setspeed``
|
||||||
|
This attribute is functional only if the `userspace`_ scaling governor
|
||||||
|
is attached to the given policy.
|
||||||
|
|
||||||
|
It returns the last frequency requested by the governor (in kHz) or can
|
||||||
|
be written to in order to set a new frequency for the policy.
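
The generic attributes described above are plain text files, so they can be
read with standard tools. The sketch below prints, for every policy, the CPUs
belonging to it and the CPUs that are currently online. The ``list_policies``
name and its optional directory argument are hypothetical and exist only to
keep the example self-contained.

```shell
# list_policies [CPUFREQ_DIR]
# Hypothetical helper: print one line per policy directory found under
# CPUFREQ_DIR (default: /sys/devices/system/cpu/cpufreq).
list_policies() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for dir in "$root"/policy*; do
        # Skip the unexpanded pattern if there are no policy directories.
        [ -d "$dir" ] || continue
        # related_cpus: all CPUs in the policy; affected_cpus: online ones.
        printf '%s: related=%s online=%s\n' \
            "$(basename "$dir")" \
            "$(cat "$dir/related_cpus")" \
            "$(cat "$dir/affected_cpus")"
    done
}
```

Running ``list_policies`` without arguments inspects the running system, and
passing a directory makes it read from a copy of the ``cpufreq`` directory
instead.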
|
||||||
|
|
||||||
|
|
||||||
|
Generic Scaling Governors
|
||||||
|
=========================
|
||||||
|
|
||||||
|
``CPUFreq`` provides generic scaling governors that can be used with all
|
||||||
|
scaling drivers. As stated before, each of them implements a single, possibly
|
||||||
|
parametrized, performance scaling algorithm.
|
||||||
|
|
||||||
|
Scaling governors are attached to policy objects and different policy objects
|
||||||
|
can be handled by different scaling governors at the same time (although that
|
||||||
|
may lead to suboptimal results in some cases).
|
||||||
|
|
||||||
|
The scaling governor for a given policy object can be changed at any time with
|
||||||
|
the help of the ``scaling_governor`` policy attribute in ``sysfs``.
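
Changing the governor amounts to writing one of the names listed by
``scaling_available_governors`` to ``scaling_governor``. A minimal sketch of
that, with a hypothetical ``set_governor`` helper that checks the name first,
might look as follows (writing to the real attribute requires sufficient
privileges).

```shell
# set_governor POLICY_DIR GOVERNOR
# Hypothetical helper: attach GOVERNOR to the policy represented by
# POLICY_DIR, but only if scaling_available_governors lists it.
set_governor() {
    policy_dir="$1"
    governor="$2"
    # scaling_available_governors holds a space-separated list of names.
    for name in $(cat "$policy_dir/scaling_available_governors"); do
        if [ "$name" = "$governor" ]; then
            printf '%s\n' "$governor" > "$policy_dir/scaling_governor"
            return $?
        fi
    done
    echo "governor '$governor' is not available for $policy_dir" >&2
    return 1
}
```

For example, ``set_governor /sys/devices/system/cpu/cpufreq/policy0 schedutil``
would request the ``schedutil`` governor for ``policy0``, assuming that policy
exists and the governor is listed as available.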
|
||||||
|
|
||||||
|
Some governors expose ``sysfs`` attributes to control or fine-tune the scaling
|
||||||
|
algorithms implemented by them. Those attributes, referred to as governor
|
||||||
|
tunables, can be either global (system-wide) or per-policy, depending on the
|
||||||
|
scaling driver in use. If the driver requires governor tunables to be
|
||||||
|
per-policy, they are located in a subdirectory of each policy directory.
|
||||||
|
Otherwise, they are located in a subdirectory under
|
||||||
|
:file:`/sys/devices/system/cpu/cpufreq/`. In either case the name of the
|
||||||
|
subdirectory containing the governor tunables is the name of the governor
|
||||||
|
providing them.
|
||||||
|
|
||||||
|
``performance``
|
||||||
|
---------------
|
||||||
|
|
||||||
|
When attached to a policy object, this governor causes the highest frequency,
|
||||||
|
within the ``scaling_max_freq`` policy limit, to be requested for that policy.
|
||||||
|
|
||||||
|
The request is made once at the time the governor for the policy is set to
|
||||||
|
``performance`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
|
||||||
|
policy limits change after that.
|
||||||
|
|
||||||
|
``powersave``
|
||||||
|
-------------
|
||||||
|
|
||||||
|
When attached to a policy object, this governor causes the lowest frequency,
|
||||||
|
within the ``scaling_min_freq`` policy limit, to be requested for that policy.
|
||||||
|
|
||||||
|
The request is made once at the time the governor for the policy is set to
|
||||||
|
``powersave`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
|
||||||
|
policy limits change after that.
|
||||||
|
|
||||||
|
``userspace``
|
||||||
|
-------------
|
||||||
|
|
||||||
|
This governor does not do anything by itself. Instead, it allows user space
|
||||||
|
to set the CPU frequency for the policy it is attached to by writing to the
|
||||||
|
``scaling_setspeed`` attribute of that policy.
|
||||||
|
|
||||||
|
``schedutil``
|
||||||
|
-------------
|
||||||
|
|
||||||
|
This governor uses CPU utilization data available from the CPU scheduler. It
|
||||||
|
generally is regarded as a part of the CPU scheduler, so it can access the
|
||||||
|
scheduler's internal data structures directly.
|
||||||
|
|
||||||
|
It runs entirely in scheduler context, although in some cases it may need to
|
||||||
|
invoke the scaling driver asynchronously when it decides that the CPU frequency
|
||||||
|
should be changed for a given policy (that depends on whether or not the driver
|
||||||
|
is capable of changing the CPU frequency from scheduler context).
|
||||||
|
|
||||||
|
The actions of this governor for a particular CPU depend on the scheduling class
|
||||||
|
invoking its utilization update callback for that CPU. If it is invoked by the
|
||||||
|
RT or deadline scheduling classes, the governor will increase the frequency to
|
||||||
|
the allowed maximum (that is, the ``scaling_max_freq`` policy limit). In turn,
|
||||||
|
if it is invoked by the CFS scheduling class, the governor will use the
|
||||||
|
Per-Entity Load Tracking (PELT) metric for the root control group of the
|
||||||
|
given CPU as the CPU utilization estimate (see the `Per-entity load tracking`_
|
||||||
|
LWN.net article for a description of the PELT mechanism). Then, the new
|
||||||
|
CPU frequency to apply is computed in accordance with the formula
|
||||||
|
|
||||||
|
f = 1.25 * ``f_0`` * ``util`` / ``max``
|
||||||
|
|
||||||
|
where ``util`` is the PELT number, ``max`` is the theoretical maximum of
|
||||||
|
``util``, and ``f_0`` is either the maximum possible CPU frequency for the given
|
||||||
|
policy (if the PELT number is frequency-invariant), or the current CPU frequency
|
||||||
|
(otherwise).
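
To get a feeling for the numbers produced by the formula above, it can be
evaluated with shell integer arithmetic. This is only an illustration, not the
governor's actual code: the ``schedutil_target`` name is hypothetical,
frequencies are taken to be in kHz, and the result is not clamped to the
policy limits.

```shell
# schedutil_target F0_KHZ UTIL MAX
# Hypothetical helper: evaluate f = 1.25 * f_0 * util / max in integers.
schedutil_target() {
    f0="$1"
    util="$2"
    max="$3"
    # 1.25 * f_0 is computed as f_0 + f_0 / 4 to avoid fractions.
    echo $(( (f0 + f0 / 4) * util / max ))
}
```

With ``f_0`` equal to 2000000 kHz, ``schedutil_target 2000000 512 1024``
prints 1250000, and ``schedutil_target 2000000 800 1000`` prints 2000000. In
other words, the factor of 1.25 causes ``f_0`` itself to be selected when
``util`` reaches 80% of ``max``.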
|
||||||
|
|
||||||
|
This governor also employs a mechanism allowing it to temporarily bump up the
|
||||||
|
CPU frequency for tasks that have been waiting on I/O most recently, called
|
||||||
|
"IO-wait boosting". That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag
|
||||||
|
is passed by the scheduler to the governor callback which causes the frequency
|
||||||
|
to go up to the allowed maximum immediately and then draw back to the value
|
||||||
|
returned by the above formula over time.
|
||||||
|
|
||||||
|
This governor exposes only one tunable:
|
||||||
|
|
||||||
|
``rate_limit_us``
|
||||||
|
Minimum time (in microseconds) that has to pass between two consecutive
|
||||||
|
runs of governor computations (default: 1000 times the scaling driver's
|
||||||
|
transition latency).
|
||||||
|
|
||||||
|
The purpose of this tunable is to reduce the scheduler context overhead
|
||||||
|
of the governor which might be excessive without it.
|
||||||
|
|
||||||
|
This governor generally is regarded as a replacement for the older `ondemand`_
|
||||||
|
and `conservative`_ governors (described below), as it is simpler and more
|
||||||
|
tightly integrated with the CPU scheduler, its overhead in terms of CPU context
|
||||||
|
switches and similar is less significant, and it uses the scheduler's own CPU
|
||||||
|
utilization metric, so in principle its decisions should not contradict the
|
||||||
|
decisions made by the other parts of the scheduler.
|
||||||
|
|
||||||
|
``ondemand``
|
||||||
|
------------
|
||||||
|
|
||||||
|
This governor uses CPU load as a CPU frequency selection metric.
|
||||||
|
|
||||||
|
In order to estimate the current CPU load, it measures the time elapsed between
|
||||||
|
consecutive invocations of its worker routine and computes the fraction of that
|
||||||
|
time in which the given CPU was not idle. The ratio of the non-idle (active)
|
||||||
|
time to the total CPU time is taken as an estimate of the load.
|
||||||
|
|
||||||
|
If this governor is attached to a policy shared by multiple CPUs, the load is
|
||||||
|
estimated for all of them and the greatest result is taken as the load estimate
|
||||||
|
for the entire policy.
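
The load estimation described above can be sketched with two small shell
functions. The function names are hypothetical and the times are assumed to be
given in microseconds. The first function computes the load of one CPU in
percent and the second one picks the greatest per-CPU load for a policy.

```shell
# cpu_load WALL_US IDLE_US
# Hypothetical helper: percentage of the WALL_US interval in which the CPU
# was not idle.
cpu_load() {
    wall="$1"
    idle="$2"
    echo $(( 100 * (wall - idle) / wall ))
}

# policy_load LOAD...
# Hypothetical helper: the greatest of the per-CPU load values passed to it.
policy_load() {
    greatest=0
    for load in "$@"; do
        if [ "$load" -gt "$greatest" ]; then
            greatest="$load"
        fi
    done
    echo "$greatest"
}
```

For instance, a CPU that was idle for 2500 us out of 10000 us has a load of 75
(``cpu_load 10000 2500``), and a policy whose CPUs have loads of 10, 75 and 40
is treated as having a load of 75 (``policy_load 10 75 40``).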
|
||||||
|
|
||||||
|
The worker routine of this governor has to run in process context, so it is
|
||||||
|
invoked asynchronously (via a workqueue) and CPU P-states are updated from
|
||||||
|
there if necessary. As a result, the scheduler context overhead from this
|
||||||
|
governor is minimal, but it causes additional CPU context switches to happen
|
||||||
|
relatively often and the CPU P-state updates triggered by it can be relatively
|
||||||
|
irregular. Also, it affects its own CPU load metric by running code that
|
||||||
|
reduces the CPU idle time (even though the CPU idle time is only reduced very
|
||||||
|
slightly by it).
|
||||||
|
|
||||||
|
It generally selects CPU frequencies proportional to the estimated load, so that
|
||||||
|
the value of the ``cpuinfo_max_freq`` policy attribute corresponds to the load of
|
||||||
|
1 (or 100%), and the value of the ``cpuinfo_min_freq`` policy attribute
|
||||||
|
corresponds to the load of 0, unless the load exceeds a (configurable)
|
||||||
|
speedup threshold, in which case it will go straight for the highest frequency
|
||||||
|
it is allowed to use (the ``scaling_max_freq`` policy limit).
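
The frequency selection rule described above can be illustrated as follows.
This is a simplified sketch with a hypothetical function name and argument
order. The load and the threshold are in percent, frequencies are in kHz, and
clamping of the proportional result to the policy limits is left out.

```shell
# ondemand_target LOAD UP_THRESHOLD CPUINFO_MIN CPUINFO_MAX SCALING_MAX
# Hypothetical helper: frequency selected for the given load.
ondemand_target() {
    load="$1"
    up="$2"
    fmin="$3"
    fmax="$4"
    smax="$5"
    if [ "$load" -gt "$up" ]; then
        # Above the speedup threshold: go straight to the allowed maximum.
        echo "$smax"
    else
        # Otherwise map load 0..100 linearly onto cpuinfo_min..cpuinfo_max.
        echo $(( fmin + load * (fmax - fmin) / 100 ))
    fi
}
```

For a CPU with ``cpuinfo_min_freq`` of 800000 and ``cpuinfo_max_freq`` of
3200000, ``ondemand_target 50 95 800000 3200000 3200000`` prints 2000000,
whereas a load of 96 with the same threshold yields the ``scaling_max_freq``
value passed as the last argument.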
|
||||||
|
|
||||||
|
This governor exposes the following tunables:
|
||||||
|
|
||||||
|
``sampling_rate``
|
||||||
|
This is how often the governor's worker routine should run, in
|
||||||
|
microseconds.
|
||||||
|
|
||||||
|
Typically, it is set to values of the order of 10000 (10 ms). Its
|
||||||
|
default value is equal to the value of ``cpuinfo_transition_latency``
|
||||||
|
for each policy this governor is attached to (but since the unit here
|
||||||
|
is greater by 1000, this means that the time represented by
|
||||||
|
``sampling_rate`` is 1000 times greater than the transition latency by
|
||||||
|
default).
|
||||||
|
|
||||||
|
If this tunable is per-policy, the following shell command sets the time
|
||||||
|
represented by it to be 750 times as high as the transition latency::
|
||||||
|
|
||||||
|
# echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
|
||||||
|
|
||||||
|
|
||||||
|
``min_sampling_rate``
|
||||||
|
The minimum value of ``sampling_rate``.
|
||||||
|
|
||||||
|
Equal to 10000 (10 ms) if :c:macro:`CONFIG_NO_HZ_COMMON` and
|
||||||
|
:c:data:`tick_nohz_active` are both set or to 20 times the value of
|
||||||
|
:c:data:`jiffies` in microseconds otherwise.
|
||||||
|
|
||||||
|
``up_threshold``
|
||||||
|
If the estimated CPU load is above this value (in percent), the governor
|
||||||
|
        will set the frequency to the maximum value allowed for the policy.
        Otherwise, the selected frequency will be proportional to the estimated
        CPU load.

``ignore_nice_load``
        If set to 1 (default 0), it will cause the CPU load estimation code to
        treat the CPU time spent on executing tasks with "nice" levels greater
        than 0 as CPU idle time.

        This may be useful if there are tasks in the system that should not be
        taken into account when deciding what frequency to run the CPUs at.
        Then, to make that happen it is sufficient to increase the "nice" level
        of those tasks above 0 and set this attribute to 1.

``sampling_down_factor``
        Temporary multiplier, between 1 (default) and 100 inclusive, to apply to
        the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.

        This causes the next execution of the governor's worker routine (after
        setting the frequency to the allowed maximum) to be delayed, so the
        frequency stays at the maximum level for a longer time.

        Frequency fluctuations in some bursty workloads may be avoided this way
        at the cost of additional energy spent on maintaining the maximum CPU
        capacity.

``powersave_bias``
        Reduction factor to apply to the original frequency target of the
        governor (including the maximum value used when the ``up_threshold``
        value is exceeded by the estimated CPU load) or sensitivity threshold
        for the AMD frequency sensitivity powersave bias driver
        (:file:`drivers/cpufreq/amd_freq_sensitivity.c`), between 0 and 1000
        inclusive.

        If the AMD frequency sensitivity powersave bias driver is not loaded,
        the effective frequency to apply is given by

                f * (1 - ``powersave_bias`` / 1000)

        where f is the governor's original frequency target. The default value
        of this attribute is 0 in that case.

        If the AMD frequency sensitivity powersave bias driver is loaded, the
        value of this attribute is 400 by default and it is used in a different
        way.

        On Family 16h (and later) AMD processors there is a mechanism to get a
        measured workload sensitivity, between 0 and 100% inclusive, from the
        hardware. That value can be used to estimate how the performance of the
        workload running on a CPU will change in response to frequency changes.

        The performance of a workload with the sensitivity of 0 (memory-bound or
        IO-bound) is not expected to increase at all as a result of increasing
        the CPU frequency, whereas workloads with the sensitivity of 100%
        (CPU-bound) are expected to perform much better if the CPU frequency is
        increased.

        If the workload sensitivity is less than the threshold represented by
        the ``powersave_bias`` value, the sensitivity powersave bias driver
        will cause the governor to select a frequency lower than its original
        target, so as to avoid over-provisioning workloads that will not benefit
        from running at higher CPU frequencies.
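
The arithmetic behind the ``sampling_down_factor`` and ``powersave_bias``
tunables can be illustrated with a small Python model. This is only a sketch,
not the kernel code: the function names are hypothetical, frequencies are
assumed to be in kHz, and the mapping of a ``powersave_bias`` of 400 to a
sensitivity threshold of 40% is taken from the older governors.txt description
of this attribute. Rounding to frequencies the hardware actually supports is
ignored.

```python
def next_sample_delay_us(sampling_rate_us, sampling_down_factor, load, up_threshold):
    """Model of the delay before the governor's worker routine runs again."""
    if not 1 <= sampling_down_factor <= 100:
        raise ValueError("sampling_down_factor must be between 1 and 100")
    if load > up_threshold:
        # The frequency has just been set to the allowed maximum, so the
        # next evaluation is deferred by the temporary multiplier.
        return sampling_rate_us * sampling_down_factor
    return sampling_rate_us


def biased_frequency_khz(target_khz, powersave_bias):
    """Apply f * (1 - powersave_bias / 1000) to the original target f."""
    if not 0 <= powersave_bias <= 1000:
        raise ValueError("powersave_bias must be between 0 and 1000")
    # Integer arithmetic avoids floating-point rounding surprises.
    return target_khz * (1000 - powersave_bias) // 1000


def amd_prefers_lower_frequency(sensitivity_percent, powersave_bias=400):
    """True if the measured sensitivity is below the threshold.

    Assumption: powersave_bias is the threshold in tenths of a percent,
    so the default of 400 corresponds to 40%.
    """
    return sensitivity_percent * 10 < powersave_bias
```

For example, with ``powersave_bias`` set to 100 an original target of
1000000 kHz becomes 900000 kHz, which matches the "10% shaved off" reading of
the formula.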

``conservative``
----------------

This governor uses CPU load as a CPU frequency selection metric.

It estimates the CPU load in the same way as the `ondemand`_ governor described
above, but the CPU frequency selection algorithm implemented by it is different.

Namely, it avoids changing the frequency significantly over short time intervals
which may not be suitable for systems with limited power supply capacity (e.g.
battery-powered). To achieve that, it changes the frequency in relatively
small steps, one step at a time, up or down - depending on whether or not a
(configurable) threshold has been exceeded by the estimated CPU load.

This governor exposes the following tunables:

``freq_step``
        Frequency step in percent of the maximum frequency the governor is
        allowed to set (the ``scaling_max_freq`` policy limit), between 0 and
        100 (5 by default).

        This is how much the frequency is allowed to change in one go. Setting
        it to 0 will cause the default frequency step (5 percent) to be used
        and setting it to 100 effectively causes the governor to periodically
        switch the frequency between the ``scaling_min_freq`` and
        ``scaling_max_freq`` policy limits.

``down_threshold``
        Threshold value (in percent, 20 by default) used to determine the
        frequency change direction.

        If the estimated CPU load is greater than this value, the frequency will
        go up (by ``freq_step``). If the load is less than this value (and the
        ``sampling_down_factor`` mechanism is not in effect), the frequency will
        go down. Otherwise, the frequency will not be changed.

``sampling_down_factor``
        Frequency decrease deferral factor, between 1 (default) and 10
        inclusive.

        It effectively causes the frequency to go down ``sampling_down_factor``
        times slower than it ramps up.
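
The effect of ``freq_step`` can be illustrated with a short Python model. It is
a sketch under stated assumptions rather than the governor's implementation:
the function names are hypothetical, frequencies are assumed to be in kHz, and
the decision whether to step up or down (which depends on the thresholds
described above) is left to the caller.

```python
DEFAULT_FREQ_STEP = 5  # percent, used when freq_step is set to 0


def step_khz(freq_step, scaling_max_khz):
    """Size of one frequency step, as a share of scaling_max_freq."""
    if not 0 <= freq_step <= 100:
        raise ValueError("freq_step must be between 0 and 100")
    if freq_step == 0:
        freq_step = DEFAULT_FREQ_STEP
    return scaling_max_khz * freq_step // 100


def step_up(current_khz, freq_step, scaling_min_khz, scaling_max_khz):
    """Raise the frequency by one step, staying within the policy limits."""
    step = step_khz(freq_step, scaling_max_khz)
    return max(scaling_min_khz, min(current_khz + step, scaling_max_khz))


def step_down(current_khz, freq_step, scaling_min_khz, scaling_max_khz):
    """Lower the frequency by one step, staying within the policy limits."""
    step = step_khz(freq_step, scaling_max_khz)
    return min(scaling_max_khz, max(current_khz - step, scaling_min_khz))
```

With ``freq_step`` equal to 100 a single step covers the whole range, so the
modeled frequency can only alternate between the two policy limits, as stated
in the description of that tunable.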


Frequency Boost Support
=======================

Background
----------

Some processors support a mechanism to raise the operating frequency of some
cores in a multicore package temporarily (and above the sustainable frequency
threshold for the whole package) under certain conditions, for example if the
whole chip is not fully utilized and below its intended thermal or power budget.

Different names are used by different vendors to refer to this functionality.
For Intel processors it is referred to as "Turbo Boost", AMD calls it
"Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on.
As a rule, it also is implemented differently by different vendors. The simple
term "frequency boost" is used here for brevity to refer to all of those
implementations.

The frequency boost mechanism may be either hardware-based or software-based.
If it is hardware-based (e.g. on x86), the decision to trigger the boosting is
made by the hardware (although in general it requires the hardware to be put
into a special state in which it can control the CPU frequency within certain
limits). If it is software-based (e.g. on ARM), the scaling driver decides
whether or not to trigger boosting and when to do that.

The ``boost`` File in ``sysfs``
-------------------------------

This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
the "boost" setting for the whole system. It is not present if the underlying
scaling driver does not support the frequency boost mechanism (or supports it,
but provides a driver-specific interface for controlling it, like
``intel_pstate``).

If the value in this file is 1, the frequency boost mechanism is enabled. This
means that either the hardware can be put into states in which it is able to
trigger boosting (in the hardware-based case), or the software is allowed to
trigger boosting (in the software-based case). It does not mean that boosting
is actually in use at the moment on any CPUs in the system. It only means a
permission to use the frequency boost mechanism (which still may never be used
for other reasons).

If the value in this file is 0, the frequency boost mechanism is disabled and
cannot be used at all.

The only values that can be written to this file are 0 and 1.
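
Reading and writing this file from a script might look like the following
Python sketch. The helper names ``read_boost()`` and ``write_boost()`` are
hypothetical. The only facts taken from the text above are the location of the
file, the meaning of 0 and 1, and the possibility that the file is absent.
Writing is assumed to require root privileges on a real system.

```python
from pathlib import Path

BOOST_PATH = Path("/sys/devices/system/cpu/cpufreq/boost")


def read_boost(path=BOOST_PATH):
    """Return True (1), False (0), or None if the boost file is absent."""
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        # The scaling driver does not expose the global boost knob.
        return None
    return text.strip() == "1"


def write_boost(enabled, path=BOOST_PATH):
    """Write 1 (boosting allowed) or 0 (boosting disabled) to the file."""
    knob = Path(path)
    if not knob.exists():
        # Never create the file: it only exists if the driver provides it.
        raise FileNotFoundError(f"{knob}: boost control not available")
    knob.write_text("1\n" if enabled else "0\n")
```

A return value of ``True`` from ``read_boost()`` only means that boosting is
permitted, not that any CPU is boosting at that moment.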

Rationale for Boost Control Knob
--------------------------------

The frequency boost mechanism is generally intended to help to achieve optimum
CPU performance on time scales below software resolution (e.g. below the
scheduler tick interval) and it is demonstrably suitable for many workloads, but
it may lead to problems in certain situations.

For this reason, many systems make it possible to disable the frequency boost
mechanism in the platform firmware (BIOS) setup, but that requires the system to
be restarted for the setting to be adjusted as desired, which may not be
practical at least in some cases. For example:

1. Boosting means overclocking the processor, although under controlled
   conditions. Generally, the processor's energy consumption increases
   as a result of increasing its frequency and voltage, even temporarily.
   That may not be desirable on systems that switch to power sources of
   limited capacity, such as batteries, so the ability to disable the boost
   mechanism while the system is running may help there (but that depends on
   the workload too).

2. In some situations deterministic behavior is more important than
   performance or energy consumption (or both) and the ability to disable
   boosting while the system is running may be useful then.

3. To examine the impact of the frequency boost mechanism itself, it is useful
   to be able to run tests with and without boosting, preferably without
   restarting the system in the meantime.

4. Reproducible results are important when running benchmarks. Since
   the boosting functionality depends on the load of the whole package,
   single-thread performance may vary because of it, which may lead to
   unreproducible results sometimes. That can be avoided by disabling the
   frequency boost mechanism before running benchmarks sensitive to that
   issue.
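
For the benchmarking and testing cases listed above, the global knob can be
switched off for the duration of a run and restored afterwards. The following
Python sketch shows one way to do that. The ``boost_disabled()`` helper is
hypothetical, and the code assumes that the caller may write to the ``boost``
file (normally root only) and that a missing file simply means there is nothing
to control.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def boost_disabled(path="/sys/devices/system/cpu/cpufreq/boost"):
    """Disable frequency boost inside a ``with`` block, then restore it.

    Yields True if the knob was found and set to 0, False otherwise.
    """
    knob = Path(path)
    if not knob.exists():
        # No global boost knob on this system: run the block unchanged.
        yield False
        return
    previous = knob.read_text().strip()
    knob.write_text("0\n")
    try:
        yield True
    finally:
        # Restore the earlier setting even if the benchmark raised.
        knob.write_text(previous + "\n")
```

A benchmark would then be wrapped as ``with boost_disabled(): run()``, where
``run`` stands for whatever workload is being measured.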

Legacy AMD ``cpb`` Knob
-----------------------

The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to
the global ``boost`` one. It is used for disabling/enabling the "Core
Performance Boost" feature of some AMD processors.

If present, that knob is located in every ``CPUFreq`` policy directory in
``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called
``cpb``, which indicates a more fine-grained control interface. The actual
implementation, however, works on a system-wide basis and setting that knob
for one policy causes the same value of it to be set for all of the other
policies at the same time.
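
Because the knob appears once per policy but acts system-wide, all of its
copies are expected to hold the same value. A minimal Python sketch for
inspecting them follows. The ``read_cpb_knobs()`` helper is hypothetical and
relies only on the ``policyX/cpb`` layout described above.

```python
from pathlib import Path


def read_cpb_knobs(cpufreq_dir="/sys/devices/system/cpu/cpufreq"):
    """Map each policy directory name to the content of its cpb file.

    Policies without a cpb file are skipped, so the result is empty on
    systems where the knob is not present.
    """
    knobs = {}
    for knob in sorted(Path(cpufreq_dir).glob("policy*/cpb")):
        knobs[knob.parent.name] = knob.read_text().strip()
    return knobs
```

On a system where the knob exists, ``set(read_cpb_knobs().values())`` should
contain a single element if the system-wide behavior described above holds.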

That knob is still supported on AMD processors that support its underlying
hardware feature, but it may be configured out of the kernel (via the
:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option) and the global
``boost`` knob is present regardless. Thus it is always possible to use the
``boost`` knob instead of the ``cpb`` one, which is highly recommended, as that
is more consistent with what all of the other systems do (and the ``cpb`` knob
may not be supported any more in the future).

The ``cpb`` knob is never present for any processors without the underlying
hardware feature (e.g. all Intel ones), even if the
:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option is set.


.. _Per-entity load tracking: https://lwn.net/Articles/531853/
15
Documentation/admin-guide/pm/index.rst
Normal file
@ -0,0 +1,15 @@
================
Power Management
================

.. toctree::
   :maxdepth: 2

   cpufreq

.. only:: subproject and html

   Indices
   =======

   * :ref:`genindex`
@ -1,93 +0,0 @@
Processor boosting control

- information for users -

Quick guide for the impatient:
--------------------
/sys/devices/system/cpu/cpufreq/boost
controls the boost setting for the whole system. You can read and write
that file with either "0" (boosting disabled) or "1" (boosting allowed).
Reading or writing 1 does not mean that the system is boosting at this
very moment, but only that the CPU _may_ raise the frequency at it's
discretion.
--------------------

Introduction
-------------
Some CPUs support a functionality to raise the operating frequency of
some cores in a multi-core package if certain conditions apply, mostly
if the whole chip is not fully utilized and below it's intended thermal
budget. The decision about boost disable/enable is made either at hardware
(e.g. x86) or software (e.g ARM).
On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core",
in technical documentation "Core performance boost". In Linux we use
the term "boost" for convenience.

Rationale for disable switch
----------------------------

Though the idea is to just give better performance without any user
intervention, sometimes the need arises to disable this functionality.
Most systems offer a switch in the (BIOS) firmware to disable the
functionality at all, but a more fine-grained and dynamic control would
be desirable:
1. While running benchmarks, reproducible results are important. Since
   the boosting functionality depends on the load of the whole package,
   single thread performance can vary. By explicitly disabling the boost
   functionality at least for the benchmark's run-time the system will run
   at a fixed frequency and results are reproducible again.
2. To examine the impact of the boosting functionality it is helpful
   to do tests with and without boosting.
3. Boosting means overclocking the processor, though under controlled
   conditions. By raising the frequency and the voltage the processor
   will consume more power than without the boosting, which may be
   undesirable for instance for mobile users. Disabling boosting may
   save power here, though this depends on the workload.


User controlled switch
----------------------

To allow the user to toggle the boosting functionality, the cpufreq core
driver exports a sysfs knob to enable or disable it. There is a file:
/sys/devices/system/cpu/cpufreq/boost
which can either read "0" (boosting disabled) or "1" (boosting enabled).
The file is exported only when cpufreq driver supports boosting.
Explicitly changing the permissions and writing to that file anyway will
return EINVAL.

On supported CPUs one can write either a "0" or a "1" into this file.
This will either disable the boost functionality on all cores in the
whole system (0) or will allow the software or hardware to boost at will
(1).

Writing a "1" does not explicitly boost the system, but just allows the
CPU to boost at their discretion. Some implementations take external
factors like the chip's temperature into account, so boosting once does
not necessarily mean that it will occur every time even using the exact
same software setup.


AMD legacy cpb switch
---------------------
The AMD powernow-k8 driver used to support a very similar switch to
disable or enable the "Core Performance Boost" feature of some AMD CPUs.
This switch was instantiated in each CPU's cpufreq directory
(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb".
Though the per CPU existence hints at a more fine grained control, the
actual implementation only supported a system-global switch semantics,
which was simply reflected into each CPU's file. Writing a 0 or 1 into it
would pull the other CPUs to the same state.
For compatibility reasons this file and its behavior is still supported
on AMD CPUs, though it is now protected by a config switch
(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created,
even with the config option set.
This functionality is considered legacy and will be removed in some future
kernel version.

More fine grained boosting control
----------------------------------

Technically it is possible to switch the boosting functionality at least
on a per package basis, for some CPUs even per core. Currently the driver
does not support it, but this may be implemented in the future.
@ -1,301 +0,0 @@
CPU frequency and voltage scaling code in the Linux(TM) kernel


L i n u x    C P U F r e q

C P U F r e q    G o v e r n o r s

- information for users and developers -


Dominik Brodowski <linux@brodo.de>
some additions and corrections by Nico Golde <nico@ngolde.de>
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar <viresh.kumar@linaro.org>



Clock scaling allows you to change the clock speed of the CPUs on the
fly. This is a nice method to save battery power, because the lower
the clock speed, the less power the CPU consumes.


Contents:
---------
1. What is a CPUFreq Governor?

2. Governors In the Linux Kernel
2.1 Performance
2.2 Powersave
2.3 Userspace
2.4 Ondemand
2.5 Conservative
2.6 Schedutil

3. The Governor Interface in the CPUfreq Core

4. References


1. What Is A CPUFreq Governor?
==============================

Most cpufreq drivers (except the intel_pstate and longrun) or even most
cpu frequency scaling algorithms only allow the CPU frequency to be set
to predefined fixed values. In order to offer dynamic frequency
scaling, the cpufreq core must be able to tell these drivers of a
"target frequency". So these specific drivers will be transformed to
offer a "->target/target_index/fast_switch()" call instead of the
"->setpolicy()" call. For set_policy drivers, all stays the same,
though.

How to decide what frequency within the CPUfreq policy should be used?
That's done using "cpufreq governors".

Basically, it's the following flow graph:

CPU can be set to switch independently   |         CPU can only be set
      within specific "limits"           |       to specific frequencies

                                 "CPUfreq policy"
                consists of frequency limits (policy->{min,max})
                     and CPUfreq governor to be used
                         /                    \
                        /                      \
                       /                       the cpufreq governor decides
                      /                        (dynamically or statically)
                     /                         what target_freq to set within
                    /                          the limits of policy->{min,max}
                   /                                \
                  /                                  \
        Using the ->setpolicy call,              Using the ->target/target_index/fast_switch call,
            the limits and the                    the frequency closest
             "policy" is set.                     to target_freq is set.
                                                  It is assured that it
                                                  is within policy->{min,max}


2. Governors In the Linux Kernel
================================

2.1 Performance
---------------

The CPUfreq governor "performance" sets the CPU statically to the
highest frequency within the borders of scaling_min_freq and
scaling_max_freq.


2.2 Powersave
-------------

The CPUfreq governor "powersave" sets the CPU statically to the
lowest frequency within the borders of scaling_min_freq and
scaling_max_freq.


2.3 Userspace
-------------

The CPUfreq governor "userspace" allows the user, or any userspace
program running with UID "root", to set the CPU to a specific frequency
by making a sysfs file "scaling_setspeed" available in the CPU-device
directory.


2.4 Ondemand
------------

The CPUfreq governor "ondemand" sets the CPU frequency depending on the
current system load. Load estimation is triggered by the scheduler
through the update_util_data->func hook; when triggered, cpufreq checks
the CPU-usage statistics over the last period and the governor sets the
CPU accordingly. The CPU must have the capability to switch the
frequency very quickly.

Sysfs files:

* sampling_rate:

  Measured in uS (10^-6 seconds), this is how often you want the kernel
  to look at the CPU usage and to make decisions on what to do about the
  frequency. Typically this is set to values of around '10000' or more.
  It's default value is (cmp. with users-guide.txt): transition_latency
  * 1000. Be aware that transition latency is in ns and sampling_rate
  is in us, so you get the same sysfs value by default. Sampling rate
  should always get adjusted considering the transition latency to set
  the sampling rate 750 times as high as the transition latency in the
  bash (as said, 1000 is default), do:

  $ echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate

* sampling_rate_min:

  The sampling rate is limited by the HW transition latency:
  transition_latency * 100

  Or by kernel restrictions:
  - If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
  - If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is
    used, the limits depend on the CONFIG_HZ option:
    HZ=1000: min=20000us  (20ms)
    HZ=250:  min=80000us  (80ms)
    HZ=100:  min=200000us (200ms)

  The highest value of kernel and HW latency restrictions is shown and
  used as the minimum sampling rate.

* up_threshold:

  This defines what the average CPU usage between the samplings of
  'sampling_rate' needs to be for the kernel to make a decision on
  whether it should increase the frequency. For example when it is set
  to its default value of '95' it means that between the checking
  intervals the CPU needs to be on average more than 95% in use to then
  decide that the CPU frequency needs to be increased.

* ignore_nice_load:

  This parameter takes a value of '0' or '1'. When set to '0' (its
  default), all processes are counted towards the 'cpu utilisation'
  value. When set to '1', the processes that are run with a 'nice'
  value will not count (and thus be ignored) in the overall usage
  calculation. This is useful if you are running a CPU intensive
  calculation on your laptop that you do not care how long it takes to
  complete as you can 'nice' it and prevent it from taking part in the
  deciding process of whether to increase your CPU frequency.

* sampling_down_factor:

  This parameter controls the rate at which the kernel makes a decision
  on when to decrease the frequency while running at top speed. When set
  to 1 (the default) decisions to reevaluate load are made at the same
  interval regardless of current clock speed. But when set to greater
  than 1 (e.g. 100) it acts as a multiplier for the scheduling interval
  for reevaluating load when the CPU is at its top speed due to high
  load. This improves performance by reducing the overhead of load
  evaluation and helping the CPU stay at its top speed when truly busy,
  rather than shifting back and forth in speed. This tunable has no
  effect on behavior at lower speeds/lower CPU loads.

* powersave_bias:

  This parameter takes a value between 0 to 1000. It defines the
  percentage (times 10) value of the target frequency that will be
  shaved off of the target. For example, when set to 100 -- 10%, when
  ondemand governor would have targeted 1000 MHz, it will target
  1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0
  (disabled) by default.

  When AMD frequency sensitivity powersave bias driver --
  drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter
  defines the workload frequency sensitivity threshold in which a lower
  frequency is chosen instead of ondemand governor's original target.
  The frequency sensitivity is a hardware reported (on AMD Family 16h
  Processors and above) value between 0 to 100% that tells software how
  the performance of the workload running on a CPU will change when
  frequency changes. A workload with sensitivity of 0% (memory/IO-bound)
  will not perform any better on higher core frequency, whereas a
  workload with sensitivity of 100% (CPU-bound) will perform better
  higher the frequency. When the driver is loaded, this is set to 400 by
  default -- for CPUs running workloads with sensitivity value below
  40%, a lower frequency is chosen. Unloading the driver or writing 0
  will disable this feature.


2.5 Conservative
----------------

The CPUfreq governor "conservative", much like the "ondemand"
governor, sets the CPU frequency depending on the current usage. It
differs in behaviour in that it gracefully increases and decreases the
CPU speed rather than jumping to max speed the moment there is any load
on the CPU. This behaviour is more suitable in a battery powered
environment. The governor is tweaked in the same manner as the
"ondemand" governor through sysfs with the addition of:

* freq_step:

  This describes what percentage steps the cpu freq should be increased
  and decreased smoothly by. By default the cpu frequency will increase
  in 5% chunks of your maximum cpu frequency. You can change this value
  to anywhere between 0 and 100 where '0' will effectively lock your CPU
  at a speed regardless of its load whilst '100' will, in theory, make
  it behave identically to the "ondemand" governor.

* down_threshold:

  Same as the 'up_threshold' found for the "ondemand" governor but for
  the opposite direction. For example when set to its default value of
  '20' it means that if the CPU usage needs to be below 20% between
  samples to have the frequency decreased.

* sampling_down_factor:

  Similar functionality as in "ondemand" governor. But in
  "conservative", it controls the rate at which the kernel makes a
  decision on when to decrease the frequency while running in any speed.
  Load for frequency increase is still evaluated every sampling rate.


2.6 Schedutil
-------------

The "schedutil" governor aims at better integration with the Linux
kernel scheduler. Load estimation is achieved through the scheduler's
Per-Entity Load Tracking (PELT) mechanism, which also provides
information about the recent load [1]. This governor currently does
load based DVFS only for tasks managed by CFS. RT and DL scheduler tasks
are always run at the highest frequency. Unlike all the other
governors, the code is located under the kernel/sched/ directory.

Sysfs files:

* rate_limit_us:

  This contains a value in microseconds. The governor waits for
  rate_limit_us time before reevaluating the load again, after it has
  evaluated the load once.

For an in-depth comparison with the other governors refer to [2].


3. The Governor Interface in the CPUfreq Core
=============================================

A new governor must register itself with the CPUfreq core using
"cpufreq_register_governor". The struct cpufreq_governor, which has to
be passed to that function, must contain the following values:

governor->name - A unique name for this governor.
governor->owner - .THIS_MODULE for the governor module (if appropriate).

plus a set of hooks to the functions implementing the governor's logic.

The CPUfreq governor may call the CPU processor driver using one of
these two functions:

int cpufreq_driver_target(struct cpufreq_policy *policy,
                          unsigned int target_freq,
                          unsigned int relation);

int __cpufreq_driver_target(struct cpufreq_policy *policy,
                            unsigned int target_freq,
                            unsigned int relation);

target_freq must be within policy->min and policy->max, of course.
What's the difference between these two functions? When your governor is
in a direct code path of a call to governor callbacks, like
governor->start(), the policy->rwsem is still held in the cpufreq core,
and there's no need to lock it again (in fact, this would cause a
deadlock). So use __cpufreq_driver_target only in these cases. In all
other cases (for example, when there's a "daemonized" function that
wakes up every second), use cpufreq_driver_target to take policy->rwsem
before the command is passed to the cpufreq driver.

4. References
=============

[1] Per-entity load tracking: https://lwn.net/Articles/531853/
[2] Improvements in CPU frequency management: https://lwn.net/Articles/682391/

@ -21,8 +21,6 @@ Documents in this directory:
 
 amd-powernow.txt - AMD powernow driver specific file.
 
-boost.txt - Frequency boosting support.
-
 core.txt - General description of the CPUFreq core and
 of CPUFreq notifiers.
 
@ -32,17 +30,12 @@ cpufreq-nforce2.txt - nVidia nForce2 platform specific file.
 
 cpufreq-stats.txt - General description of sysfs cpufreq stats.
 
-governors.txt - What are cpufreq governors and how to
-implement them?
-
 index.txt - File index, Mailing list and Links (this document)
 
 intel-pstate.txt - Intel pstate cpufreq driver specific file.
 
 pcc-cpufreq.txt - PCC cpufreq driver specific file.
 
-user-guide.txt - User Guide to CPUFreq
-
 
 Mailing List
 ------------
@@ -1,228 +0,0 @@
CPU frequency and voltage scaling code in the Linux(TM) kernel


L i n u x   C P U F r e q

U S E R   G U I D E


Dominik Brodowski <linux@brodo.de>



Clock scaling allows you to change the clock speed of the CPUs on the
fly. This is a nice method to save battery power, because the lower
the clock speed, the less power the CPU consumes.


Contents:
---------
1. Supported Architectures and Processors
1.1 ARM and ARM64
1.2 x86
1.3 sparc64
1.4 ppc
1.5 SuperH
1.6 Blackfin

2. "Policy" / "Governor"?
2.1 Policy
2.2 Governor

3. How to change the CPU cpufreq policy and/or speed
3.1 Preferred interface: sysfs



1. Supported Architectures and Processors
=========================================

1.1 ARM and ARM64
-----------------

Almost all ARM and ARM64 platforms support CPU frequency scaling.

1.2 x86
-------

The following processors for the x86 architecture are supported by cpufreq:

AMD Elan - SC400, SC410
AMD mobile K6-2+
AMD mobile K6-3+
AMD mobile Duron
AMD mobile Athlon
AMD Opteron
AMD Athlon 64
Cyrix Media GXm
Intel mobile PIII and Intel mobile PIII-M on certain chipsets
Intel Pentium 4, Intel Xeon
Intel Pentium M (Centrino)
National Semiconductors Geode GX
Transmeta Crusoe
Transmeta Efficeon
VIA Cyrix 3 / C3
various processors on some ACPI 2.0-compatible systems [*]
And many more

[*] Only if "ACPI Processor Performance States" are available
to the ACPI<->BIOS interface.


1.3 sparc64
-----------

The following processors for the sparc64 architecture are supported by
cpufreq:

UltraSPARC-III


1.4 ppc
-------

Several "PowerBook" and "iBook2" notebooks are supported.
The following POWER processors are supported in powernv mode:
POWER8
POWER9

1.5 SuperH
----------

All SuperH processors supporting rate rounding through the clock
framework are supported by cpufreq.

1.6 Blackfin
------------

The following Blackfin processors are supported by cpufreq:

BF522, BF523, BF524, BF525, BF526, BF527, Rev 0.1 or higher
BF531, BF532, BF533, Rev 0.3 or higher
BF534, BF536, BF537, Rev 0.2 or higher
BF561, Rev 0.3 or higher
BF542, BF544, BF547, BF548, BF549, Rev 0.1 or higher


2. "Policy" / "Governor" ?
==========================

Some CPU frequency scaling-capable processor switch between various
frequencies and operating voltages "on the fly" without any kernel or
user involvement. This guarantees very fast switching to a frequency
which is high enough to serve the user's needs, but low enough to save
power.


2.1 Policy
----------

On these systems, all you can do is select the lower and upper
frequency limit as well as whether you want more aggressive
power-saving or more instantly available processing power.


2.2 Governor
------------

On all other cpufreq implementations, these boundaries still need to
be set. Then, a "governor" must be selected. Such a "governor" decides
what speed the processor shall run within the boundaries. One such
"governor" is the "userspace" governor. This one allows the user - or
a yet-to-implement userspace program - to decide what specific speed
the processor shall run at.


3. How to change the CPU cpufreq policy and/or speed
====================================================

3.1 Preferred Interface: sysfs
------------------------------

The preferred interface is located in the sysfs filesystem. If you
mounted it at /sys, the cpufreq interface is located in a subdirectory
"cpufreq" within the cpu-device directory
(e.g. /sys/devices/system/cpu/cpu0/cpufreq/ for the first CPU).

affected_cpus :                 List of Online CPUs that require software
                                coordination of frequency.

cpuinfo_cur_freq :              Current frequency of the CPU as obtained from
                                the hardware, in KHz. This is the frequency
                                the CPU actually runs at.

cpuinfo_min_freq :              this file shows the minimum operating
                                frequency the processor can run at(in kHz)

cpuinfo_max_freq :              this file shows the maximum operating
                                frequency the processor can run at(in kHz)

cpuinfo_transition_latency      The time it takes on this CPU to
                                switch between two frequencies in nano
                                seconds. If unknown or known to be
                                that high that the driver does not
                                work with the ondemand governor, -1
                                (CPUFREQ_ETERNAL) will be returned.
                                Using this information can be useful
                                to choose an appropriate polling
                                frequency for a kernel governor or
                                userspace daemon. Make sure to not
                                switch the frequency too often
                                resulting in performance loss.

related_cpus :                  List of Online + Offline CPUs that need software
                                coordination of frequency.

scaling_available_frequencies : List of available frequencies, in KHz.

scaling_available_governors :   this file shows the CPUfreq governors
                                available in this kernel. You can see the
                                currently activated governor in

scaling_cur_freq :              Current frequency of the CPU as determined by
                                the governor and cpufreq core, in KHz. This is
                                the frequency the kernel thinks the CPU runs
                                at.

scaling_driver :                this file shows what cpufreq driver is
                                used to set the frequency on this CPU

scaling_governor,               and by "echoing" the name of another
                                governor you can change it. Please note
                                that some governors won't load - they only
                                work on some specific architectures or
                                processors.

scaling_min_freq and
scaling_max_freq                show the current "policy limits" (in
                                kHz). By echoing new values into these
                                files, you can change these limits.
                                NOTE: when setting a policy you need to
                                first set scaling_max_freq, then
                                scaling_min_freq.

scaling_setspeed                This can be read to get the currently programmed
                                value by the governor. This can be written to
                                change the current frequency for a group of
                                CPUs, represented by a policy. This is supported
                                currently only by the userspace governor.

bios_limit :                    If the BIOS tells the OS to limit a CPU to
                                lower frequencies, the user can read out the
                                maximum available frequency from this file.
                                This typically can happen through (often not
                                intended) BIOS settings, restrictions
                                triggered through a service processor or other
                                BIOS/HW based implementations.
                                This does not cover thermal ACPI limitations
                                which can be detected through the generic
                                thermal driver.

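As a rough illustration of the attribute files listed above, the following Python sketch reads an attribute from a policy directory and updates the policy limits, writing scaling_max_freq before scaling_min_freq as the NOTE above asks. The helper names read_attr and set_policy_limits and the overridable base argument are assumptions made for this example and are not part of cpufreq. Writing to the real sysfs files normally requires root privileges.

```python
from pathlib import Path

# Location for the first CPU, as given in the text above.
CPU0_CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_attr(name, base=CPU0_CPUFREQ):
    """Return the stripped contents of one cpufreq attribute file."""
    return (Path(base) / name).read_text().strip()


def set_policy_limits(min_khz, max_khz, base=CPU0_CPUFREQ):
    """Write new policy limits in kHz (hypothetical helper).

    scaling_max_freq is written first, then scaling_min_freq,
    following the ordering given in the NOTE above.
    """
    if min_khz > max_khz:
        raise ValueError("min_khz must not exceed max_khz")
    base = Path(base)
    (base / "scaling_max_freq").write_text(f"{max_khz}\n")
    (base / "scaling_min_freq").write_text(f"{min_khz}\n")
```

For instance, read_attr("scaling_governor") would return the name of the currently active governor on a system where that file exists. Values passed to set_policy_limits are in kHz, matching the units of the files.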
If you have selected the "userspace" governor which allows you to
set the CPU operating frequency to a specific value, you can read out
the current frequency in

scaling_setspeed.               By "echoing" a new frequency into this
                                you can change the speed of the CPU,
                                but only within the limits of
                                scaling_min_freq and scaling_max_freq.
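
The scaling_setspeed usage described above could be sketched in Python as follows. The function name set_speed and its base parameter are hypothetical. The governor and range checks are a choice made in this sketch so that mistakes are reported early; they do not describe how the kernel itself treats a write that falls outside the limits.

```python
from pathlib import Path


def set_speed(khz, base="/sys/devices/system/cpu/cpu0/cpufreq"):
    """Request a fixed frequency in kHz through scaling_setspeed.

    Refuses to write unless the userspace governor is active and the
    value lies within scaling_min_freq..scaling_max_freq.
    """
    base = Path(base)
    governor = (base / "scaling_governor").read_text().strip()
    if governor != "userspace":
        raise RuntimeError(f"governor is {governor!r}, not 'userspace'")
    lo = int((base / "scaling_min_freq").read_text())
    hi = int((base / "scaling_max_freq").read_text())
    if not lo <= khz <= hi:
        raise ValueError(f"{khz} kHz is outside the policy limits {lo}..{hi}")
    (base / "scaling_setspeed").write_text(f"{khz}\n")
```

Reading scaling_setspeed back after the call gives the value that was programmed, as the description of that file says.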