This patch adds the rk3328 compatible string for supporting
the generic cpufreq driver on RK3328.
Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the current implementation, the response latency between seeing
SCHED_CPUFREQ_IOWAIT set and the actual P-state adjustment can be up
to 10ms. It can be reduced by bumping up the P-state to the max at
the time SCHED_CPUFREQ_IOWAIT is passed to intel_pstate_update_util().
With this change, the IO performance improves significantly.
For a simple "grep -r . linux" (Here linux is the kernel source
folder) with caches dropped every time on a Broadwell Xeon workstation
with per-core P-states, the user and system time is shorter by as much
as 30% - 40%.
The same performance difference was not observed on clients that don't
support per-core P-state.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On many platforms, CPUs can do DVFS across cpufreq policies. i.e CPU
from policy-A can change frequency of CPUs belonging to policy-B.
This is quite common in case of ARM platforms where we don't
configure any per-cpu register.
Add a flag to identify such platforms and update
cpufreq_can_do_remote_dvfs() to allow remote callbacks if this flag is
set.
Also enable the flag for cpufreq-dt driver which is used only on ARM
platforms currently.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With Android UI and benchmarks the latency of cpufreq response to
certain scheduling events can become very critical. Currently, callbacks
into cpufreq governors are only made from the scheduler if the target
CPU of the event is the same as the current CPU. This means there are
certain situations where a target CPU may not run the cpufreq governor
for some time.
One testcase to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that the new tasks should receive maximum
demand initially, this should result in CPU0 increasing frequency
immediately. But because of the above mentioned limitation though, this
does not occur.
This patch updates the scheduler core to call the cpufreq callbacks for
remote CPUs as well.
The schedutil, ondemand and conservative governors are updated to
process cpufreq utilization update hooks called for remote CPUs where
the remote CPU is managed by the cpufreq policy of the local CPU.
The intel_pstate driver is updated to always reject remote callbacks.
This is tested with couple of usecases (Android: hackbench, recentfling,
galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey board (64 bit
octa-core, single policy). Only galleryfling showed minor improvements,
while others didn't had much deviation.
The reason being that this patch only targets a corner case, where
following are required to be true to improve performance and that
doesn't happen too often with these tests:
- Task is migrated to another CPU.
- The task has high demand, and should take the target CPU to higher
OPPs.
- And the target CPU doesn't call into the cpufreq governor until the
next tick.
Based on initial work from Steve Muckle.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit 62611cb912 (intel_pstate: delete scheduler hook in HWP
mode) the INTEL_PSTATE_HWP_SAMPLING_INTERVAL is not used anywhere in
the code, so drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ->get callback in the intel_pstate structure was mostly there
for the scaling_cur_freq sysfs attribute to work, but after commit
f8475cef90 (x86: use common aperfmperf_khz_on_cpu() to calculate
KHz using APERF/MPERF) that attribute uses arch_freq_get_on_cpu()
provided by the x86 arch code on all processors supported by
intel_pstate, so it doesn't need the ->get callback from the
driver any more.
Moreover, the very presence of the ->get callback in the intel_pstate
structure causes the cpuinfo_cur_freq attribute to be present when
intel_pstate operates in the active mode, which is bogus, because
the role of that attribute is to return the current CPU frequency
as seen by the hardware. For intel_pstate, though, this is just an
average frequency and not really current, but computed for the
previous sampling interval (the actual current frequency may be
way different at the point this value is obtained by reading from
cpuinfo_cur_freq), and after commit 82b4e03e01 (intel_pstate: skip
scheduler hook when in "performance" mode) the value in
cpuinfo_cur_freq may be stale or just 0, depending on the driver's
operation mode. In fact, however, on the hardware supported by
intel_pstate there is no way to read the current CPU frequency
from it, so the cpuinfo_cur_freq attribute should not be present
at all when this driver is in use.
For this reason, drop intel_pstate_get() and clear the ->get
callback pointer pointing to it, so that the cpuinfo_cur_freq is
not present for intel_pstate in the active mode any more.
Fixes: 82b4e03e01 (intel_pstate: skip scheduler hook when in "performance" mode)
Reported-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
for_each_compatible_node performs an of_node_get on each iteration, so a
return from the loop requires an of_node_put.
The semantic patch that fixes this problem is as follows
(http://coccinelle.lip6.fr):
// <smpl>
@@
local idexpression n;
expression e,e1,e2;
statement S;
iterator i1;
iterator name for_each_compatible_node;
@@
for_each_compatible_node(n,e1,e2) {
...
(
of_node_put(n);
|
e = n
|
return n;
|
i1(...,n,...) S
|
+ of_node_put(n);
? return ...;
)
...
}
// </smpl>
Additionally, call of_node_put on the previous value of np, obtained from
of_find_compatible_node, that is no longer accessible at the point of the
for_each_compatible_node.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
All systems use the same P-state selection "powersave" algorithm
in the active mode if HWP is not used, so there's no need to provide
a pointer for it in struct pstate_funcs any more.
Drop ->update_util from struct pstate_funcs and make
intel_pstate_set_update_util_hook() use intel_pstate_update_util()
directly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
All systems with a defined ACPI preferred profile that are not
"servers" have been using the load-based P-state selection algorithm
in intel_pstate since 4.12-rc1 (mobile systems and laptops have been
using it since 4.10-rc1) and no problems with it have been reported
to date. In particular, no regressions with respect to the PID-based
P-state selection have been reported. Also testing indicates that
the P-state selection algorithm based on CPU load is generally on par
with the PID-based algorithm performance-wise, and for some workloads
it turns out to be better than the other one, while being more
straightforward and easier to understand at the same time.
Moreover, the PID-based P-state selection algorithm in intel_pstate
is known to be unstable in some situation and generally problematic,
the issues with it are hard to address and it has become a
significant maintenance burden.
For these reasons, make intel_pstate use the "powersave" P-state
selection algorithm based on CPU load in the active mode on all
systems and drop the PID-based P-state selection code along with
all things related to it from the driver. Also update the
documentation accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With the recent updates, CPUFREQ_ETERNAL is only used by the drivers
which don't know their transition latency but want to use dynamic
switching.
Anyway, the routine cpufreq_policy_transition_delay_us() caps the value
of transition latency to 10 ms now and that can be used safely with such
platforms.
Remove the check from cpufreq_init_governor() and allow dynamic
switching for such configurations as well.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The policy->transition_latency field is used for multiple purposes
today and its not straight forward at all. This is how it is used:
A. Set the correct transition_latency value.
B. Set it to CPUFREQ_ETERNAL because:
1. We don't want automatic dynamic switching (with
ondemand/conservative) to happen at all.
2. We don't know the transition latency.
This patch handles the B.1. case in a more readable way. A new flag for
the cpufreq drivers is added to disallow use of cpufreq governors which
have dynamic_switching flag set.
All the current cpufreq drivers which are setting transition_latency
unconditionally to CPUFREQ_ETERNAL are updated to use it. They don't
need to set transition_latency anymore.
There shouldn't be any functional change after this patch.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no limitation in the ondemand or conservative governors which
disallow the transition_latency to be greater than 10 ms.
The max_transition_latency field is rather used to disallow automatic
dynamic frequency switching for platforms which didn't wanted these
governors to run.
Replace max_transition_latency with a boolean (dynamic_switching) and
check for transition_latency == CPUFREQ_ETERNAL along with that. This
makes it pretty straight forward to read/understand now.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
All users of arm_big_little driver are defining it and there is no need
to keep it optional.
Make it mandatory to remove the always true conditional statement.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The transition_latency field isn't used for drivers with ->setpolicy()
callback present and there is no point setting it from the drivers.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The policy->transition_delay_us field is used only by the schedutil
governor currently, and this field describes how fast the driver wants
the cpufreq governor to change CPUs frequency. It should rather be a
common thing across all governors, as it doesn't have any schedutil
dependency here.
Create a new helper cpufreq_policy_transition_delay_us() to get the
transition delay across all governors.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq core and governors aren't supposed to set a limit on how
fast we want to try changing the frequency. This is currently done for
the legacy governors with help of min_sampling_rate.
At worst, we may end up setting the sampling rate to a value lower than
the rate at which frequency can be changed and then one of the CPUs in
the policy will be only changing frequency for ever.
But that is something for the user to decide and there is no need to
have special handling for such cases in the core. Leave it for the user
to figure out.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On tango platforms, firmware configures the CPU clock, and Linux is
then only allowed to use the cpu_clk_divider to change the frequency.
Build the OPP table dynamically at init, in order to support whatever
firmware throws at us.
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
MT2701/MT7623 is a 32-bit ARMv7 based quad-core (4 * Cortex-A7) with
single cluster and this hardware is also compatible with the existing
driver through enabling CPU frequency feature with operating-points-v2
bindings. Also, this driver actually supports all MediaTek SoCs, the
Kconfig menu entry and file name itself should be updated with more
generic name to drop "MT8173"
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add zynqmp to the cpufreq dt platform device.
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove unnecessary static on local variable hostbridge.
Such variable is initialized before being used,
on every execution path throughout the function.
The static has no benefit and, removing it reduces
the code size.
This issue was detected using Coccinelle and the following semantic patch:
@bad exists@
position p;
identifier x;
type T;
@@
static T x@p;
...
x = <+...x...+>
@@
identifier x;
expression e;
type T;
position p != bad.p;
@@
-static
T x@p;
... when != x
when strict
?x = e;
In the following log you can see the difference in the code size. Also,
there is a significant difference in the bss segment. This log is the
output of the size command, before and after the code change:
before:
text data bss dec hex filename
5084 3392 256 8732 221c drivers/cpufreq/speedstep-ich.o
after:
text data bss dec hex filename
5062 3304 192 8558 216e drivers/cpufreq/speedstep-ich.o
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Avoid clearing the PCI PME Enable bit for devices as a result of
config space restoration which confuses AML executed afterward and
causes wakeup events to be lost on some systems (Rafael Wysocki).
- Fix the native PCIe PME interrupts handling in the cases when the
PME IRQ is set up as a system wakeup one so that runtime PM remote
wakeup works as expected after system resume on systems where that
happens (Rafael Wysocki).
- Fix the device PM QoS sysfs interface to handle invalid user input
correctly instead of using an unititialized variable value as the
latency tolerance for the device at hand (Dan Carpenter).
- Get rid of one more rounding error from intel_pstate computations
(Srinivas Pandruvada).
- Fix the schedutil cpufreq governor to prevent it from possibly
accessing unititialized data structures from governor callbacks in
some cases on systems when multiple CPUs share a single cpufreq
policy object (Vikram Mulukutla).
- Fix the return values of probe routines in two devfreq drivers
(Gustavo Silva).
- Constify an attribute_group structure in devfreq (Arvind Yadav).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZaLe2AAoJEILEb/54YlRxbi8P/jbQkFdtZinL8eR5DNlUt9jn
ZzOnPNNJL0xj2dRJ8qpmHYT1PAQQGIhWyiXavbJqLeZeO5f4AFnFa8Uya+oq6UfP
rv73RIk+qaogUccdqfa7Y3IcBhuER9q2baSIguLEt4w7+szyiWO+XonK640iTRNz
moUcf2MCA9EacvwlmANQbnimB7mvwz4Tupgn6zK6zh2BJEBYlkWRbqXE1Zm6tJXb
+jYwKY0W/hsJbLAUfhbz0Iz6FhvE/ix46NTRw33gWyjmmsUSn4KvIF6mq1+RplD9
6Rvka6pilqSIWoy3Wr4irAQkaOA8WecvwKGtmTh6mkfQC8TyNbQEHwD0EBSsht9n
G1OHaWLv7m8PKaxmaLMvQEd8gYWmKAF3EZHA6zT2qN+LCPkMKzab/dEhsU/rxuR2
Nda57D5iNsGIETfVws9FBeYKOw64gb6TOQi8bunLPQbg15n4XWuL5IjtgnPwHFcU
xkaxE5UbAmSLIDM8drevIQGIgrEsDDCgezvnVBV8vCYwUyBbzuBb+T6jibPMdNDM
t0DiF8QwQEGJcxYXEd5FpPamS3rmeKxcf234kzf9lHq0Msq6lMFdhihoJvZJ6rw/
F18ZkAT3ni546CRmknJrUmeg7FjwHsTgJo7K7MArIcHBLhsA59+Bv2Mh+UIH//yT
57c1OquHgPXx1uTULMC3
=G9eQ
-----END PGP SIGNATURE-----
Merge tag 'pm-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a recently exposed issue in the PCI device wakeup code and
one older problem related to PCI device wakeup that has been reported
recently, modify one more piece of computations in intel_pstate to get
rid of a rounding error, fix a possible race in the schedutil cpufreq
governor, fix the device PM QoS sysfs interface to correctly handle
invalid user input, fix return values of two probe routines in devfreq
drivers and constify an attribute_group structure in devfreq.
Specifics:
- Avoid clearing the PCI PME Enable bit for devices as a result of
config space restoration which confuses AML executed afterward and
causes wakeup events to be lost on some systems (Rafael Wysocki).
- Fix the native PCIe PME interrupts handling in the cases when the
PME IRQ is set up as a system wakeup one so that runtime PM remote
wakeup works as expected after system resume on systems where that
happens (Rafael Wysocki).
- Fix the device PM QoS sysfs interface to handle invalid user input
correctly instead of using an unititialized variable value as the
latency tolerance for the device at hand (Dan Carpenter).
- Get rid of one more rounding error from intel_pstate computations
(Srinivas Pandruvada).
- Fix the schedutil cpufreq governor to prevent it from possibly
accessing unititialized data structures from governor callbacks in
some cases on systems when multiple CPUs share a single cpufreq
policy object (Vikram Mulukutla).
- Fix the return values of probe routines in two devfreq drivers
(Gustavo Silva).
- Constify an attribute_group structure in devfreq (Arvind Yadav)"
* tag 'pm-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI / PM: Fix native PME handling during system suspend/resume
PCI / PM: Restore PME Enable after config space restoration
cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race
PM / QoS: return -EINVAL for bogus strings
cpufreq: intel_pstate: Fix ratio setting for min_perf_pct
PM / devfreq: constify attribute_group structures.
PM / devfreq: tegra: fix error return code in tegra_devfreq_probe()
PM / devfreq: rk3399_dmc: fix error return code in rk3399_dmcfreq_probe()
Pull thermal management updates from Zhang Rui:
- Improve thermal cpu_cooling interaction with cpufreq core.
The cpu_cooling driver is designed to use CPU frequency scaling to
avoid high thermal states for a platform. But it wasn't glued really
well with cpufreq core.
For example clipped-cpus is copied from the policy structure and its
much better to use the policy->cpus (or related_cpus) fields directly
as they may have got updated. Not that things were broken before this
series, but they can be optimized a bit more.
This series tries to improve interactions between cpufreq core and
cpu_cooling driver and does some fixes/cleanups to the cpu_cooling
driver. (Viresh Kumar)
- A couple of fixes and cleanups in thermal core and imx, hisilicon,
bcm_2835, int340x thermal drivers. (Arvind Yadav, Dan Carpenter,
Sumeet Pawnikar, Srinivas Pandruvada, Willy WOLFF)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (24 commits)
thermal: bcm2835: fix an error code in probe()
thermal: hisilicon: Handle return value of clk_prepare_enable
thermal: imx: Handle return value of clk_prepare_enable
thermal: int340x: check for sensor when PTYP is missing
Thermal/int340x: Fix few typos and kernel-doc style
thermal: fix source code documentation for parameters
thermal: cpu_cooling: Replace kmalloc with kmalloc_array
thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device
thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power()
thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev
thermal: cpu_cooling: get_level() can't fail
thermal: cpu_cooling: create structure for idle time stats
thermal: cpu_cooling: merge frequency and power tables
thermal: cpu_cooling: get rid of 'allowed_cpus'
thermal: cpu_cooling: OPPs are registered for all CPUs
thermal: cpu_cooling: store cpufreq policy
cpufreq: create cpufreq_table_count_valid_entries()
thermal: cpu_cooling: use cpufreq_policy to register cooling device
thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state()
thermal: cpu_cooling: remove cpufreq_cooling_get_level()
...
The busy percent calculated for the Knights Landing (KNL) platform
is 1024 times smaller than the correct busy value. This causes
performance to get stuck at the lowest ratio.
The scaling algorithm used for KNL is performance-based, but it still
looks at the CPU load to set the scaled busy factor to 0 when the
load is less than 1 percent. In this case, since the computed load
is 1024x smaller than it should be, the scaled busy factor will
always be 0, irrespective of CPU business.
This needs a fix similar to the turbostat one in commit b2b34dfe4d
(tools/power turbostat: KNL workaround for %Busy and Avg_MHz).
For this reason, add one more callback to processor-specific
callbacks to specify an MPERF multiplier represented by a number of
bit positions to shift the value of that register to the left to
copmensate for its rate difference with respect to the TSC. This
shift value is used during CPU busy calculations.
Fixes: ffb810563c (intel_pstate: Avoid getting stuck in high P-states when idle)
Reported-and-tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the minimum performance limit percentage is set to the power-up
default, it is possible that minimum performance ratio is off by one.
In the set_policy() callback the minimum ratio is calculated by
applying global.min_perf_pct to turbo_ratio and rounding up, but the
power-up default global.min_perf_pct is already rounded up to the
next percent in min_perf_pct_min(). That results in two round up
operations, so for the default min_perf_pct one of them is not
required.
It is better to remove rounding up in min_perf_pct_min() as this
matches the displayed min_perf_pct prior to commit c5a2ee7dde
(cpufreq: intel_pstate: Active mode P-state limits rework) in 4.12.
For example on a platform with max turbo ratio of 37 and minimum
ratio of 10, the min_perf_pct resulted in 28 with the above commit.
Before this commit it was 27 and it will be the same after this
change.
Fixes: 1a4fe38add (cpufreq: intel_pstate: Remove max/min fractions to limit performance)
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Revert a recent change in the generic power domains (genpd)
framework that led to regressions and turned out the be misguided
(Rafael Wysocki).
- Fix a recently introduced build issue in the generic power domains
(genpd) framework (Arnd Bergmann).
- Constify attribute_group structures in the PM core, the cpufreq
stats code and in intel_pstate (Arvind Yadav).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZY/L6AAoJEILEb/54YlRxMBUP/0lCziyBNAUPm8+gpC7RA5gv
tZwtGPnDr2toWg3+te0Aqc3/LOKt4CFtQEui+IGGPA5ghoZZ53lPuxZH8MCEcv/I
LhoHmNK+2D088JViiaXlkanLgjcbtkgWKEgRQXOm75XlbaReW3wKmiPkc8iLTRde
tytdf82GUN0AKKWCsUMiiEWDCYs9mpM3MPX2GOS+ZPBZfX2cyubKZ9STPzzFDwFf
NfP+NuzJmYfEonBXproTa6ZqAq2UVGPeolyYe1lwV8QVCU8Z4W6GRAKbSzk0Hq6N
wcdpNaNmkQytjDQ1hZ0NNFecTH4qjStOkc9OwNZJwoSbC31sQGHyKnxWP8Re1+hU
UmpIAuNBc6eKJmkyoOE9GfIB08AvvuB4s7B3X8ffpWGqQmASYAY9DhEKDlPmmkhD
NV+HTUkebw+gZoJp6VGL072KGARrNEmodKrcmXA/T4T8ZwoHFbnQbzDaODzW7rzx
1UxwCtUa/Jl5hOPngo0XuLnbeM7AAG1MjaJSKDqoUl4WbjdYG3f7yRxs6T+JS+dk
1+NpVJiIKBM1bqP7Jf+v9xrbYG31w5blikxhCpjA601ztV0vgtiiojssKpNwjkpv
Myh1BavAaLcnMCkCfHppLlXv2bnLHFANMyMcPU+EuLzPDTsxxxfhmbBHBur4r7BA
DXmpRQvWCoAqlQzzvZoZ
=B+R7
-----END PGP SIGNATURE-----
Merge tag 'pm-extra-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These revert one recent change in the generic power domains
framework, fix a recently introduced build issue in there and
constify attribute_group structures in some places.
Specifics:
- Revert a recent change in the generic power domains (genpd)
framework that led to regressions and turned out the be misguided
(Rafael Wysocki).
- Fix a recently introduced build issue in the generic power domains
(genpd) framework (Arnd Bergmann).
- Constify attribute_group structures in the PM core, the cpufreq
stats code and in intel_pstate (Arvind Yadav)"
* tag 'pm-extra-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: constify attribute_group structures
cpufreq: cpufreq_stats: constify attribute_group structures
PM / sleep: constify attribute_group structures
PM / Domains: provide pm_genpd_poweroff_noirq() stub
Revert "PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device"
- New SoC specific drivers
- NVIDIA Tegra PM Domain support for newer SoCs (Tegra186 and later)
based on the "BPMP" firmware
- Clocksource and system controller drivers for the newly added
Action Semi platforms (both arm and arm64).
- Reset subsystem, merged through arm-soc by tradition:
- New drivers for Altera Stratix10, TI Keystone and Cortina Gemini SoCs
- Various subsystem-wide cleanups
- Updates for existing SoC-specific drivers
- TI GPMC (General Purpose Memory Controller)
- Mediatek "scpsys" system controller support for MT6797
- Broadcom "brcmstb_gisb" bus arbitrer
- ARM SCPI firmware
- Renesas "SYSC" system controller
One more driver update was submitted for the Freescale/NXP DPAA
data path acceleration that has previously been used on PowerPC
chips. I ended up postponing the merge until some API questions
for its unusual MMIO access are resolved.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAWVpDFmCrR//JCVInAQKoihAAosgC3+3IHppOhHid+HepkOcp2teyKknw
42fSXVpTdfWa8Oe7q80Kzmz2CPNfFq2SzHz6oXb9WCcDFqSGr0b9aSE7NnksRjTf
2euHVJ4MnJpkRewvorRmcpK8dPXDcHwEw/8hU3yZtJUGI0IKtlrqXis+evgkz9cn
YDynuVdAZgZiEfiNeSeadyNLcxaQCc3x9ovvsBXxBa1/x1pfeRnTbp+6hiHilCJu
Szts/yAzZzZE9Jkf9dLKfNlsT6m2SgtjukqqOR+zHAhi7/BdTFSVUP6L8u7QjrR+
+ijTICg8FMJGiWLAOe6ED2qZVByN92EJ2AGwriYlSles9ouoGfRkJ2rwxyjbete7
avy0HP+PSBFXWdwbOcq8HX8CrbuBltagl5fkMokct6biWLLMshNZ33WWdQ6/DsM2
b9mAAZuhbs0g1qWUBD3+q6qBytSuGme6Px6fMoVEc4GQ2YVFUQOoEfZOGKRv1U1o
aMWGt/6qeF8SG288rYAnQ/TuYWpOLtksV6yhotA00HUHhkTCy0xVCdyWGZtNyKhG
o/x4YnhWFzHsXmqKcR1sM7LzfZY/WNmbrOLvK6i83Z0P7QptqrdaLAylL3iSPEyX
ZDUgExf6PYXkWIewc3KwC5sJjuD05z3ZCgIR+mCezwbuD+3Z2fOdjodY/VBZ74hq
URcM/BqtuWE=
=5L6n
-----END PGP SIGNATURE-----
Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Arnd Bergmann:
"New SoC specific drivers:
- NVIDIA Tegra PM Domain support for newer SoCs (Tegra186 and later)
based on the "BPMP" firmware
- Clocksource and system controller drivers for the newly added
Action Semi platforms (both arm and arm64).
Reset subsystem, merged through arm-soc by tradition:
- New drivers for Altera Stratix10, TI Keystone and Cortina Gemini
SoCs
- Various subsystem-wide cleanups
Updates for existing SoC-specific drivers
- TI GPMC (General Purpose Memory Controller)
- Mediatek "scpsys" system controller support for MT6797
- Broadcom "brcmstb_gisb" bus arbitrer
- ARM SCPI firmware
- Renesas "SYSC" system controller
One more driver update was submitted for the Freescale/NXP DPAA data
path acceleration that has previously been used on PowerPC chips. I
ended up postponing the merge until some API questions for its unusual
MMIO access are resolved"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (35 commits)
clocksource: owl: Add S900 support
clocksource: Add Owl timer
soc: renesas: rcar-sysc: Use GENPD_FLAG_ALWAYS_ON
firmware: tegra: Fix locking bugs in BPMP
soc/tegra: flowctrl: Fix error handling
soc/tegra: bpmp: Implement generic PM domains
soc/tegra: bpmp: Update ABI header
PM / Domains: Allow overriding the ->xlate() callback
soc: brcmstb: enable drivers for ARM64 and BMIPS
soc: renesas: Rework Kconfig and Makefile logic
reset: Add the TI SCI reset driver
dt-bindings: reset: Add TI SCI reset binding
reset: use kref for reference counting
soc: qcom: smsm: Improve error handling, quiesce probe deferral
cpufreq: scpi: use new scpi_ops functions to remove duplicate code
firmware: arm_scpi: add support to populate OPPs and get transition latency
dt-bindings: reset: Add reset manager offsets for Stratix10
memory: omap-gpmc: add error message if bank-width property is absent
memory: omap-gpmc: make dts snippet include semicolon
reset: Add a Gemini reset controller
...
- Rework suspend-to-idle to allow it to take wakeup events signaled
by the EC into account on ACPI-based platforms in order to properly
support power button wakeup from suspend-to-idle on recent Dell
laptops (Rafael Wysocki).
That includes the core suspend-to-idle code rework, support for
the Low Power S0 _DSM interface, and support for the ACPI INT0002
Virtual GPIO device from Hans de Goede (required for USB keyboard
wakeup from suspend-to-idle to work on some machines).
- Stop trying to export the current CPU frequency via /proc/cpuinfo
on x86 as that is inaccurate and confusing (Len Brown).
- Rework the way in which the current CPU frequency is exported by
the kernel (over the cpufreq sysfs interface) on x86 systems with
the APERF and MPERF registers by always using values read from
these registers, when available, to compute the current frequency
regardless of which cpufreq driver is in use (Len Brown).
- Rework the PCI/ACPI device wakeup infrastructure to remove the
questionable and artificial distinction between "devices that
can wake up the system from sleep states" and "devices that can
generate wakeup signals in the working state" from it, which
allows the code to be simplified quite a bit (Rafael Wysocki).
- Fix the wakeup IRQ framework by making it use SRCU instead of
RCU which doesn't allow sleeping in the read-side critical
sections, but which in turn is expected to be allowed by the
IRQ bus locking infrastructure (Thomas Gleixner).
- Modify some computations in the intel_pstate driver to avoid
rounding errors resulting from them (Srinivas Pandruvada).
- Reduce the overhead of the intel_pstate driver in the HWP
(hardware-managed P-states) mode and when the "performance"
P-state selection algorithm is in use by making it avoid
registering scheduler callbacks in those cases (Len Brown).
- Rework the energy_performance_preference sysfs knob in
intel_pstate by changing the values that correspond to
different symbolic hint names used by it (Len Brown).
- Make it possible to use more than one cpuidle driver at the same
time on ARM (Daniel Lezcano).
- Make it possible to prevent the cpuidle menu governor from using
the 0 state by disabling it via sysfs (Nicholas Piggin).
- Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1
on AMD systems (Yazen Ghannam).
- Make the CPPC cpufreq driver take the lowest nonlinear performance
information into account (Prashanth Prakash).
- Add support for hi3660 to the cpufreq-dt driver, fix the
imx6q driver and clean up the sfi, exynos5440 and intel_pstate
drivers (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila,
Rafael Wysocki, Tao Wang).
- Fix a few minor issues in the generic power domains (genpd)
framework and clean it up somewhat (Krzysztof Kozlowski,
Mikko Perttunen, Viresh Kumar).
- Fix a couple of minor issues in the operating performance points
(OPP) framework and clean it up somewhat (Viresh Kumar).
- Fix a CONFIG dependency in the hibernation core and clean it up
slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).
- Add rk3228 support to the rockchip-io adaptive voltage scaling
(AVS) driver (David Wu).
- Fix an incorrect bit shift operation in the RAPL power capping
driver (Adam Lessnau).
- Add support for the EPP field in the HWP (hardware managed
P-states) control register, HWP.EPP, to the x86_energy_perf_policy
tool and update msr-index.h with HWP.EPP values (Len Brown).
- Fix some minor issues in the turbostat tool (Len Brown).
- Add support for AMD family 0x17 CPUs to the cpupower tool and fix
a minor issue in it (Sherry Hurwitz).
- Assorted cleanups, mostly related to the constification of some
data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
Kozlowski).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZWrICAAoJEILEb/54YlRxZYMQAIRhfbyDxKq+ByvSilUS8kTA
AItwJ8FFzykhiwN75Cqabg4rAGyWma7IRs1vzU7zeC1aEQIn+bTQtvk+utZNI+g2
ANFlDha20q/sXsP/CDMMTIAdW9tSOC0TOvFI9s2V2Y8dJZhoekO4ctx34FAfUS5d
Ao6rwSAWCMsCXcGaTAlqTA+TEJmBG7u6Iq6hq6ngltoFwOv3mWWBVn52VVaJ7SMp
9/IPbbLGMFAedrgEBRGCR+MME1xZZpvcZIJaTt1Mgn7Cx3cJaysIUAvqY/SsvFGq
5FcUTcF2qpK3+AGawiAxZIjvOBsGRtIwqKinNIzYWs/NjiIdzmgVAmTeuPtTqp+5
HFehUdtkFcnuDnLqSNzAaZUa7tw84cJkwnbVMnesx0MkG6rZ1SeL22E2Sabpcdsh
3Yo1ThzJSxi59DhiiE92EQnNCEjmCldRy+8q5Ag035muxl6EJYvuNBMnZv/BMCUn
ltSNOrmps1DlN+Col8ORIeNzQ1YjYzWMqKAYzSbyccm4ug/iSHx0/DuESmQ4GTlF
YCwkmqyWiHrBwpl51jc+4a7SGlMmKRqU+MJes0CjagaaqoUAb8qeBOpzEJ0yNwjZ
wtI41l6blE6kbMD3yqGdCfiB2S7GlPVoxa15eX1wRyLH3fLjwwrzJirEaiBS86tI
1PzHZEOmBlh3DYC6DBKA
=Wsph
-----END PGP SIGNATURE-----
Merge tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The big ticket items here are the rework of suspend-to-idle in order
to add proper support for power button wakeup from it on recent Dell
laptops and the rework of interfaces exporting the current CPU
frequency on x86.
In addition to that, support for a few new pieces of hardware is
added, the PCI/ACPI device wakeup infrastructure is simplified
significantly and the wakeup IRQ framework is fixed to unbreak the IRQ
bus locking infrastructure.
Also, there are some functional improvements for intel_pstate, tools
updates and small fixes and cleanups all over.
Specifics:
- Rework suspend-to-idle to allow it to take wakeup events signaled
by the EC into account on ACPI-based platforms in order to properly
support power button wakeup from suspend-to-idle on recent Dell
laptops (Rafael Wysocki).
That includes the core suspend-to-idle code rework, support for the
Low Power S0 _DSM interface, and support for the ACPI INT0002
Virtual GPIO device from Hans de Goede (required for USB keyboard
wakeup from suspend-to-idle to work on some machines).
- Stop trying to export the current CPU frequency via /proc/cpuinfo
on x86 as that is inaccurate and confusing (Len Brown).
- Rework the way in which the current CPU frequency is exported by
the kernel (over the cpufreq sysfs interface) on x86 systems with
the APERF and MPERF registers by always using values read from
these registers, when available, to compute the current frequency
regardless of which cpufreq driver is in use (Len Brown).
- Rework the PCI/ACPI device wakeup infrastructure to remove the
questionable and artificial distinction between "devices that can
wake up the system from sleep states" and "devices that can
generate wakeup signals in the working state" from it, which allows
the code to be simplified quite a bit (Rafael Wysocki).
- Fix the wakeup IRQ framework by making it use SRCU instead of RCU
which doesn't allow sleeping in the read-side critical sections,
but which in turn is expected to be allowed by the IRQ bus locking
infrastructure (Thomas Gleixner).
- Modify some computations in the intel_pstate driver to avoid
rounding errors resulting from them (Srinivas Pandruvada).
- Reduce the overhead of the intel_pstate driver in the HWP
(hardware-managed P-states) mode and when the "performance" P-state
selection algorithm is in use by making it avoid registering
scheduler callbacks in those cases (Len Brown).
- Rework the energy_performance_preference sysfs knob in intel_pstate
by changing the values that correspond to different symbolic hint
names used by it (Len Brown).
- Make it possible to use more than one cpuidle driver at the same
time on ARM (Daniel Lezcano).
- Make it possible to prevent the cpuidle menu governor from using
the 0 state by disabling it via sysfs (Nicholas Piggin).
- Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on
AMD systems (Yazen Ghannam).
- Make the CPPC cpufreq driver take the lowest nonlinear performance
information into account (Prashanth Prakash).
- Add support for hi3660 to the cpufreq-dt driver, fix the imx6q
driver and clean up the sfi, exynos5440 and intel_pstate drivers
(Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael
Wysocki, Tao Wang).
- Fix a few minor issues in the generic power domains (genpd)
framework and clean it up somewhat (Krzysztof Kozlowski, Mikko
Perttunen, Viresh Kumar).
- Fix a couple of minor issues in the operating performance points
(OPP) framework and clean it up somewhat (Viresh Kumar).
- Fix a CONFIG dependency in the hibernation core and clean it up
slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).
- Add rk3228 support to the rockchip-io adaptive voltage scaling
(AVS) driver (David Wu).
- Fix an incorrect bit shift operation in the RAPL power capping
driver (Adam Lessnau).
- Add support for the EPP field in the HWP (hardware managed
P-states) control register, HWP.EPP, to the x86_energy_perf_policy
tool and update msr-index.h with HWP.EPP values (Len Brown).
- Fix some minor issues in the turbostat tool (Len Brown).
- Add support for AMD family 0x17 CPUs to the cpupower tool and fix a
minor issue in it (Sherry Hurwitz).
- Assorted cleanups, mostly related to the constification of some
data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
Kozlowski)"
* tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits)
cpufreq: Update scaling_cur_freq documentation
cpufreq: intel_pstate: Clean up after performance governor changes
PM: hibernate: constify attribute_group structures.
cpuidle: menu: allow state 0 to be disabled
intel_idle: Use more common logging style
PM / Domains: Fix missing default_power_down_ok comment
PM / Domains: Fix unsafe iteration over modified list of domains
PM / Domains: Fix unsafe iteration over modified list of domain providers
PM / Domains: Fix unsafe iteration over modified list of device links
PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device
PM / Domains: Call driver's noirq callbacks
PM / core: Drop run_wake flag from struct dev_pm_info
PCI / PM: Simplify device wakeup settings code
PCI / PM: Drop pme_interrupt flag from struct pci_dev
ACPI / PM: Consolidate device wakeup settings code
ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags
PM / QoS: constify *_attribute_group.
PM / AVS: rockchip-io: add io selectors and supplies for rk3228
powercap/RAPL: prevent overridding bits outside of the mask
PM / sysfs: Constify attribute groups
...
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
15197 2552 40 17789 457d drivers/cpufreq/intel_pstate.o
File size After adding 'const':
text data bss dec hex filename
15261 2488 40 17789 457d drivers/cpufreq/intel_pstate.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
1655 256 4 1915 77b drivers/cpufreq/cpufreq_stats.o
File size After adding 'const':
text data bss dec hex filename
1695 192 4 1891 763 drivers/cpufreq/cpufreq_stats.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull SMP hotplug updates from Thomas Gleixner:
"This update is primarily a cleanup of the CPU hotplug locking code.
The hotplug locking mechanism is an open coded RWSEM, which allows
recursive locking. The main problem with that is the recursive nature
as it evades the full lockdep coverage and hides potential deadlocks.
The rework replaces the open coded RWSEM with a percpu RWSEM and
establishes full lockdep coverage that way.
The bulk of the changes fix up recursive locking issues and address
the now fully reported potential deadlocks all over the place. Some of
these deadlocks have been observed in the RT tree, but on mainline the
probability was low enough to hide them away."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
cpu/hotplug: Constify attribute_group structures
powerpc: Only obtain cpu_hotplug_lock if called by rtasd
ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
cpu/hotplug: Remove unused check_for_tasks() function
perf/core: Don't release cred_guard_mutex if not taken
cpuhotplug: Link lock stacks for hotplug callbacks
acpi/processor: Prevent cpu hotplug deadlock
sched: Provide is_percpu_thread() helper
cpu/hotplug: Convert hotplug locking to percpu rwsem
s390: Prevent hotplug rwsem recursion
arm: Prevent hotplug rwsem recursion
arm64: Prevent cpu hotplug rwsem recursion
kprobes: Cure hotplug lock ordering issues
jump_label: Reorder hotplug lock and jump_label_lock
perf/tracing/cpuhotplug: Fix locking order
ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
PCI: Replace the racy recursion prevention
PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
x86/perf: Drop EXPORT of perf_check_microcode
...
* pm-cpufreq:
cpufreq / CPPC: Initialize policy->min to lowest nonlinear performance
cpufreq: sfi: make freq_table static
cpufreq: exynos5440: Fix inconsistent indenting
cpufreq: imx6q: imx6ull should use the same flow as imx6ul
cpufreq: dt: Add support for hi3660
* intel_pstate:
cpufreq: Update scaling_cur_freq documentation
cpufreq: intel_pstate: Clean up after performance governor changes
intel_pstate: skip scheduler hook when in "performance" mode
intel_pstate: delete scheduler hook in HWP mode
x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
cpufreq: intel_pstate: Remove max/min fractions to limit performance
x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"
* pm-cpuidle:
cpuidle: menu: allow state 0 to be disabled
intel_idle: Use more common logging style
x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems
ARM: cpuidle: Support asymmetric idle definition
* pm-tools:
cpupower: Add support for new AMD family 0x17
cpupower: Fix bug where return value was not used
tools/power turbostat: update version number
tools/power turbostat: decode MSR_IA32_MISC_ENABLE only on Intel
tools/power turbostat: stop migrating, unless '-m'
tools/power turbostat: if --debug, print sampling overhead
tools/power turbostat: hide SKL counters, when not requested
intel_pstate: use updated msr-index.h HWP.EPP values
tools/power x86_energy_perf_policy: support HWP.EPP
x86: msr-index.h: fix shifts to ULL results in HWP macros.
x86: msr-index.h: define HWP.EPP values
x86: msr-index.h: define EPB mid-points
After commit 82b4e03e01 (intel_pstate: skip scheduler hook when in
"performance" mode) get_target_pstate_use_performance() and
get_target_pstate_use_cpu_load() are never called if scaling_governor
is "performance", so drop the CPUFREQ_POLICY_PERFORMANCE checks from
them as they will never trigger anyway.
Moreover, the documentation needs to be updated to reflect the change
made by the above commit, so do that too.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Description of Lowest Perfomance in ACPI 6.1 specification states:
"Lowest Performance is the absolute lowest performance level of
the platform. Selecting a performance level lower than the lowest
nonlinear performance level may actually cause an efficiency penalty,
but should reduce the instantaneous power consumption of the processor.
In traditional terms, this represents the T-state range of performance
levels."
Set the default value of policy->min to Lowest Nonlinear Performance
to avoid any potential efficiency penalty.
Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the governor is set to "performance", intel_pstate does not
need the scheduler hook for doing any calculations. Under these
conditions, its only purpose is to continue to maintain
cpufreq/scaling_cur_freq.
The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.
So in "performance" governor mode, the scheduler hook can be skipped.
This applies to both in Software and Hardware P-state control modes.
Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.
In HWP mode, maintaining that value was the sole purpose of
the scheduler hook, intel_pstate_update_util_hwp(),
so it can now be removed.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.
Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.
Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver driver-specific cpufreq_driver.get() routine.
MHz is computed like so:
MHz = base_MHz * delta_APERF / delta_MPERF
MHz is the average frequency of the busy processor
over a measurement interval. The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to to
happen on-demand when users read sysfs attribute
cpufreq/scaling_cur_freq.
As with previous methods of calculating MHz,
idle time is excluded.
base_MHz above is from TSC calibration global "cpu_khz".
This x86 native method to calculate MHz returns a meaningful result
no matter if P-states are controlled by hardware or firmware
and/or if the Linux cpufreq sub-system is or is-not installed.
When this routine is invoked more frequently, the measurement
interval becomes shorter. However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.
Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the current model the max/min perf limits are a fraction of current
user space limits to the allowed max_freq or 100% for global limits.
This results in wrong ratio limits calculation because of rounding
issues for some user space limits.
Initially we tried to solve this issue by issue by having more shift
bits to increase precision. Still there are isolated cases where we still
have error.
This can be avoided by using ratios all together. Since the way we get
cpuinfo.max_freq is by multiplying scaling factor to max ratio, we can
easily keep the max/min ratios in terms of ratios and not fractions.
For example:
if the max ratio = 36
cpuinfo.max_freq = 36 * 100000 = 3600000
Suppose user space sets a limit of 1200000, then we can calculate
max ratio limit as
= 36 * 1200000 / 3600000
= 12
This will be correct for any user limits.
The other advantage is that, we don't need to do any calculation in the
fast path as ratio limit is already calculated via set_policy() callback.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
pointer freq_table can be made static as it does not need to be in
global scope.
Cleans up sparse warning:
"symbol 'freq_table' was not declared. Should it be static?"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix inconsistent indenting and unneeded white space in assignment.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This fixes an issue with imx6ull where setting the frequency to 528Mhz
would actually set the ARM clock to 324Mhz.
Signed-off-by: Octavian Purdila <octavian.purdila@nxp.com>
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add the compatible string for supporting the generic device tree cpufreq-dt
driver on Hisilicon's 3660 SoC.
Signed-off-by: Tao Wang <kevin.wangtao@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Conflicts:
kernel/sched/Makefile
Pick up the waitqueue related renames - it didn't get much feedback,
so it appears to be uncontroversial. Famous last words? ;-)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Adds support to get DVFS transition latency and OPP for any device whose
DVFS are managed by SCPI. This avoids code duplication in both cpufreq
and devfreq SCPI drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJZQmv9AAoJEABBurwxfuKYerEP/jaBOlgvAomhBreaERMX5Oud
He+7VIj2qjuapptE3Su2+t+IKCyebc2hTPSi+PhFNn4Upqk/ZchE5X+dbvXNG3nf
eZzBCSsvGPJoIXVeU3ACB5rT+Apv0qm0PSYUPnu9ZyCl46/qGZFECVufDJN1TdQc
FEdYZDi7mQotYL27Mx7q4i7vay8Z6MkJs9EH86BnB9M6dZa3C5kFY3jkm2tu1Rca
1ythsWMoy/Wrc7kHex8Dk90hbHBJ7RhJIYYxxv2IJ+67SkIdTd/+HpBcdLDpqVNt
e9ElLAw9t4ym1PdgGcQjDcHlpqRkxucgOJhSticrxTNdGThTH6hx7HZhz+CqCJ7k
e9my66sfKFH+wzz0C8agd1H9sMbaqoVoHUucj2kR6PR/rO/VQ9ugTDnaZfkRoPvh
+87XEv+d7Gf/B4jfEUwxsfcLSDe1vpghSt02FM4BrYtdoTZFse9koPMt0jKGNnjO
5jzk7VQVoKLpB5Xg8X9ePv808GWlUWTaxuPYfZMqThPdNHPsnAJrWe47s42Yogqw
ZswExEDY/MdixtgzZssi96r5m69QVycSCvs+QiBNDPFqRgsuwTC1CDtvF4/3vZks
mLtIxQ9vPXgHY7SfPGVZH3RuE07YC3QA+0m22xBCytogeriuMtxsyzHAkB3z2YFf
xSNL5FIu5jOGQRVZ+PKQ
=8QF6
-----END PGP SIGNATURE-----
Merge tag 'scpi-updates-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into next/drivers
SCPI update for v4.13
Adds support to get DVFS transition latency and OPP for any device whose
DVFS are managed by SCPI. This avoids code duplication in both cpufreq
and devfreq SCPI drivers.
* tag 'scpi-updates-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
cpufreq: scpi: use new scpi_ops functions to remove duplicate code
firmware: arm_scpi: add support to populate OPPs and get transition latency
Signed-off-by: Olof Johansson <olof@lixom.net>
Commit 27ed3cd2eb (cpufreq: conservative: Fix the logic in frequency
decrease checking) removed the 10 point substraction when comparing the
load against down_threshold but did not remove the related limit for the
down_threshold value. As a result, down_threshold lower than 11 is not
allowed even though values from 1 to 10 do work correctly too. The
comment ("cannot be lower than 11 otherwise freq will not fall") also
does not apply after removing the substraction.
For this reason, allow down_threshold to take any value from 1 to 99
and fix the related comment.
Fixes: 27ed3cd2eb (cpufreq: conservative: Fix the logic in frequency decrease checking)
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit c5a2ee7dde (cpufreq: intel_pstate: Active mode P-state
limits rework) incorrectly assumed that pstate.turbo_pstate would
always be nonzero for CPU0 in min_perf_pct_min() if
cpufreq_register_driver() had succeeded which may not be the case
in virtualized environments.
If that assumption doesn't hold, it leads to an early crash on boot
in intel_pstate_register_driver(), so add a sanity check to
min_perf_pct_min() to prevent the crash from happening.
Fixes: c5a2ee7dde (cpufreq: intel_pstate: Active mode P-state limits rework)
Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
scpi_ops now provide APIs to get the transition_latency and to add
OPPs to the devices making those logic redundant here.
This patch makes use of those APIs and removes the redundant code in
this driver.
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
* pm-cpufreq:
cpufreq: kirkwood-cpufreq:- Handle return value of clk_prepare_enable()
cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
clk_prepare_enable() can fail here and we must check its return value.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For a driver that does not set the CPUFREQ_STICKY flag, if all of the
->init() calls fail, cpufreq_register_driver() should return an error.
This will prevent the driver from loading.
Fixes: ce1bcfe94d (cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs)
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We need such a routine at two places already, lets create one.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
The CPU cooling driver uses the cpufreq policy, to get clip_cpus, the
frequency table, etc. Most of the callers of CPU cooling driver's
registration routines have the cpufreq policy with them, but they only
pass the policy->related_cpus cpumask. The __cpufreq_cooling_register()
routine then gets the policy by itself and uses it.
It would be much better if the callers can pass the policy instead
directly. This also fixes a basic design flaw, where the policy can be
freed while the CPU cooling driver is still active.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
cpufreq holds get_online_cpus() while invoking cpuhp_setup_state_nocalls()
to make subsys_interface_register() and the registration of hotplug calls
atomic versus cpu hotplug.
cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is
correct, but prevents the conversion of the hotplug locking to a percpu
rwsem.
Use cpuhp_setup/remove_state_nocalls_cpuslocked() to avoid the nested
call. Convert *_online_cpus() to the new interfaces while at it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170524081547.731628408@linutronix.de
To enable smp_processor_id() and might_sleep() debug checks earlier, it's
required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.
Adjust the system_state check in pas_cpufreq_cpu_exit() to handle the extra
states.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170516184735.620023128@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* intel_pstate:
cpufreq: intel_pstate: Document the current behavior and user interface
* pm-cpufreq:
cpufreq: dbx500: add a Kconfig symbol
* pm-cpufreq-sched:
cpufreq: schedutil: use now as reference when aggregating shared policy requests
Moving the cooling code into the cpufreq driver caused a possible build failure
when the cpu_thermal helper code is a loadable module or disabled:
drivers/cpufreq/dbx500-cpufreq.o: In function `dbx500_cpufreq_ready':
dbx500-cpufreq.c:(.text.dbx500_cpufreq_ready+0x4): undefined reference to `cpufreq_cooling_register'
This adds the same dependency that we have in other cpufreq drivers,
forcing the driver to be disabled when we can't possibly link it.
Fixes: 19678ffb9f (cpufreq: dbx500: Manage cooling device from cpufreq driver)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
intel_pstate exports sysfs attributes for setting and observing HWP.EPP.
These attributes use strings to describe 4 operating states, and
inside the driver, these strings are mapped to numerical register
values.
The authorative mapping between the strings and numerical HWP.EPP values
are now globally defined in msr-index.h, replacing the out-dated
mapping that were open-coded into intel_pstate.c
new old string
--- --- ------
0 0 performance
128 64 balance_performance
192 128 balance_power
255 192 power
Note that the HW and BIOS default value on most system is 128,
which intel_pstate will now call "balance_performance"
while it used to call it "balance_power".
Signed-off-by: Len Brown <len.brown@intel.com>
-----BEGIN PGP SIGNATURE-----
iQIVAwUAWPiW6vSw1s6N8H32AQLOrw/+NTqGf7bjq+64YKS6NfR0XDgE+wNJltGO
ck7zJW3NHIg76RNu8s0I9xg5aVmwizz3Z5DGROZquaolnezux4tQihZ3AFyxIzLc
+Y3WHYagcML7yFfjl/WznCLRD5EW3yPln4lCvQO0nW/xICRYeRI057JaIbi2Dtek
BhcXt3c4AjXDLdYJkgtHV3p2R2mt8hcdFdWqqx6s7JaIThZNRGNzxAgtbcB9k5IW
HVG9ZEIL73VBYWHrYivzjHYF5rBnNCPt87eOwDQeTOSkhv8te+u9k+bH8vxZw1T0
XUtDrLBndKiuVo2GUfLkkF8LItx3Q9eLCJYy0joaIliyPqTEsPx9KjQ+Af0cxS9s
ZPCZ5SYf96stKmDeL5xaMfrAmeyVHJ4lc4JTOqdzbIT8blsOSfYO/03p0ALShSDv
/RQLaKGlf8Bjoy8PwKFcXb4sIDufcd/U1Av/EMFXxOfgN/u2JUkGKq6EaIM5B68L
fHPje+aR9VNELPmPjwNOWtmN4I79EH3EItQf7zv0KG+UeKhcHLx/EAcSJ3ZRKEkH
Lathg7pPOEJGArPiVO79TZzBG01ADn1aiwv65XObMzNZ+54xI/mN/Y1DNF/kL5jU
XzvNzEjFt8mwMIZGVNdAt4+pDyMfIZGZSyUkSRKFnaQZMIvQrfQIU9RLBYLX5eOx
+/p0VkIwDpg=
=lbS7
-----END PGP SIGNATURE-----
Merge tag 'hwparam-20170420' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull hw lockdown support from David Howells:
"Annotation of module parameters that configure hardware resources
including ioports, iomem addresses, irq lines and dma channels.
This allows a future patch to prohibit the use of such module
parameters to prevent that hardware from being abused to gain access
to the running kernel image as part of locking the kernel down under
UEFI secure boot conditions.
Annotations are made by changing:
module_param(n, t, p)
module_param_named(n, v, t, p)
module_param_array(n, t, m, p)
to:
module_param_hw(n, t, hwtype, p)
module_param_hw_named(n, v, t, hwtype, p)
module_param_hw_array(n, t, hwtype, m, p)
where the module parameter refers to a hardware setting
hwtype specifies the type of the resource being configured. This can
be one of:
ioport Module parameter configures an I/O port
iomem Module parameter configures an I/O mem address
ioport_or_iomem Module parameter could be either (runtime set)
irq Module parameter configures an I/O port
dma Module parameter configures a DMA channel
dma_addr Module parameter configures a DMA buffer address
other Module parameter configures some other value
Note that the hwtype is compile checked, but not currently stored (the
lockdown code probably won't require it). It is, however, there for
future use.
A bonus is that the hwtype can also be used for grepping.
The intention is for the kernel to ignore or reject attempts to set
annotated module parameters if lockdown is enabled. This applies to
options passed on the boot command line, passed to insmod/modprobe or
direct twiddling in /sys/module/ parameter files.
The module initialisation then needs to handle the parameter not being
set, by (1) giving an error, (2) probing for a value or (3) using a
reasonable default.
What I can't do is just reject a module out of hand because it may
take a hardware setting in the module parameters. Some important
modules, some ipmi stuff for instance, both probe for hardware and
allow hardware to be manually specified; if the driver is aborts with
any error, you don't get any ipmi hardware.
Further, trying to do this entirely in the module initialisation code
doesn't protect against sysfs twiddling.
[!] Note that in and of itself, this series of patches should have no
effect on the the size of the kernel or code execution - that is
left to a patch in the next series to effect. It does mark
annotated kernel parameters with a KERNEL_PARAM_FL_HWPARAM flag in
an already existing field"
* tag 'hwparam-20170420' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (38 commits)
Annotate hardware config module parameters in sound/pci/
Annotate hardware config module parameters in sound/oss/
Annotate hardware config module parameters in sound/isa/
Annotate hardware config module parameters in sound/drivers/
Annotate hardware config module parameters in fs/pstore/
Annotate hardware config module parameters in drivers/watchdog/
Annotate hardware config module parameters in drivers/video/
Annotate hardware config module parameters in drivers/tty/
Annotate hardware config module parameters in drivers/staging/vme/
Annotate hardware config module parameters in drivers/staging/speakup/
Annotate hardware config module parameters in drivers/staging/media/
Annotate hardware config module parameters in drivers/scsi/
Annotate hardware config module parameters in drivers/pcmcia/
Annotate hardware config module parameters in drivers/pci/hotplug/
Annotate hardware config module parameters in drivers/parport/
Annotate hardware config module parameters in drivers/net/wireless/
Annotate hardware config module parameters in drivers/net/wan/
Annotate hardware config module parameters in drivers/net/irda/
Annotate hardware config module parameters in drivers/net/hamradio/
Annotate hardware config module parameters in drivers/net/ethernet/
...
While examining output from trial builds with -Wformat-security enabled,
many strings were found that should be defined as "const", or as a char
array instead of char pointer. This makes some static analysis easier,
by producing fewer false positives.
As these are all trivial changes, it seemed best to put them all in a
single patch rather than chopping them up per maintainer.
Link: http://lkml.kernel.org/r/20170405214711.GA5711@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jes Sorensen <jes@trained-monkey.org> [runner.c]
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Patrice Chotard <patrice.chotard@st.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kejian Yan <yankejian@huawei.com>
Cc: Daode Huang <huangdaode@hisilicon.com>
Cc: Qianqian Xie <xieqianqian@huawei.com>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Christian Gromm <christian.gromm@microchip.com>
Cc: Andrey Shvetsov <andrey.shvetsov@k2l.de>
Cc: Jason Litzinger <jlitzingerdev@gmail.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This typo is quite common. Fix it and add it to the spelling file so
that checkpatch catches it earlier.
Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the kernel is running in secure boot mode, we lock down the kernel to
prevent userspace from modifying the running kernel image. Whilst this
includes prohibiting access to things like /dev/mem, it must also prevent
access by means of configuring driver modules in such a way as to cause a
device to access or modify the kernel image.
To this end, annotate module_param* statements that refer to hardware
configuration and indicate for future reference what type of parameter they
specify. The parameter parser in the core sees this information and can
skip such parameters with an error message if the kernel is locked down.
The module initialisation then runs as normal, but just sees whatever the
default values for those parameters is.
Note that we do still need to do the module initialisation because some
drivers have viable defaults set in case parameters aren't specified and
some drivers support automatic configuration (e.g. PNP or PCI) in addition
to manually coded parameters.
This patch annotates drivers in drivers/cpufreq/.
Suggested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
cc: linux-pm@vger.kernel.org
Add a new cpufreq driver for Tegra186 (and likely later).
The CPUs are organized into two clusters, Denver and A57,
with two and four cores respectively. CPU frequency can be
adjusted by writing the desired rate divisor and a voltage
hint to a special per-core register.
The frequency of each core can be set individually; however,
this is just a hint as all CPUs in a cluster will run at
the maximum rate of non-idle CPUs in the cluster.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
According to the previous error handling code, it is likely that
'goto out_free_opp' is expected here in order to avoid a memory leak in
error handling path.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the cpufreq driver tries to modify voltage/freq during suspend/resume
it might need to control an external PMIC via I2C or SPI but those
devices might be already suspended. This issue is likely to happen
whenever the LDOs have their vin-supply set.
To avoid this scenario we just increase cpufreq to the maximum before
suspend.
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If there are any errors in getting the cpu0 regulators, the driver returns
-ENOENT. In case the regulators are not yet available, the devm_regulator_get
calls will return -EPROBE_DEFER, so that the driver can be probed later.
If we return -ENOENT, the driver will fail its initialization and will
not try to probe again (when the regulators become available).
Return the actual error received from regulator_get in probe. Print a
differentiated message in case we need to probe the device later and
in case we actually failed. Also add a message to inform when the
driver has been successfully registered.
Signed-off-by: Irina Tirdea <irina.tirdea@nxp.com>
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).
That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.
Make intel_pstate set transition_delay_us to 500.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
The access to the HBIRD_ESTAR_MODE register in the cpu frequency control
functions must happen on the target CPU. This is achieved by temporarily
setting the affinity of the calling user space thread to the requested CPU
and reset it to the original affinity afterwards.
That's racy vs. CPU hotplug and concurrent affinity settings for that
thread resulting in code executing on the wrong CPU and overwriting the
new affinity setting.
Replace it by a straight forward smp function call.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704131020280.2408@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The access to the safari config register in the CPU frequency functions
must be executed on the target CPU. This is achieved by temporarily setting
the affinity of the calling user space thread to the requested CPU and
reset it to the original affinity afterwards.
That's racy vs. CPU hotplug and concurrent affinity settings for that
thread resulting in code executing on the wrong CPU and overwriting the
new affinity setting.
Replace it by a straight forward smp function call.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170412201043.047558840@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The target() callback must run on the affected cpu. This is achieved by
temporarily setting the affinity of the calling thread to the requested CPU
and reset it to the original affinity afterwards.
That's racy vs. concurrent affinity settings for that thread resulting in
code executing on the wrong CPU.
Replace it by work_on_cpu(). All call pathes which invoke the callbacks are
already protected against CPU hotplug.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170412201042.958216363@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The get() and target() callbacks must run on the affected cpu. This is
achieved by temporarily setting the affinity of the calling thread to the
requested CPU and reset it to the original affinity afterwards.
That's racy vs. concurrent affinity settings for that thread resulting in
code executing on the wrong CPU and overwriting the new affinity setting.
Replace it by work_on_cpu(). All call pathes which invoke the callbacks are
already protected against CPU hotplug.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: linux-pm@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704122231100.2548@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There is a report that after commit 27622b061e ("cpufreq: Convert
to hotplug state machine"), the normal CPU offline/online cycle
fails on some platforms.
According to the ftrace result, this problem was triggered on
platforms using acpi-cpufreq as the default cpufreq driver,
and due to the lack of some ACPI freq method (eg. _PCT),
cpufreq_online() failed and returned a negative value, so the CPU
hotplug state machine rolled back the CPU online process. Actually,
from the user's perspective, the failure of cpufreq_online() should
not prevent that CPU from being brought up, although cpufreq might
not work on that CPU.
BTW, during system startup cpufreq_online() is not invoked via CPU
online but by the cpufreq device creation process, so the APs can be
brought up even though cpufreq_online() fails in that stage.
This patch ignores the return value of cpufreq_online/offline() and
lets the cpufreq framework deal with the failure. cpufreq_online()
itself will do a proper rollback in that case and if _PCT is missing,
the ACPI cpufreq driver will print a warning if the corresponding
debug options have been enabled.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581
Fixes: 27622b061e ("cpufreq: Convert to hotplug state machine")
Reported-and-tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is pure mystery to me why we need to be on a specific CPU while
looking up a value in an array.
My best shot at this is that before commit d4019f0a92 ("cpufreq: move
freq change notifications to cpufreq core") it was required to invoke
cpufreq_notify_transition() on a special CPU.
Since it looks like a waste, remove it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: tglx@linutronix.de
Cc: linux-pm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15888/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Use same parameters as INTEL_FAM6_ATOM_GOLDMONT to enable
Gemini Lake.
Signed-off-by: Box, David E <david.e.box@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some computations in intel_pstate_get_min_max() are not necessary
and one of its two callers doesn't even use the full result.
First off, the fixed-point value of cpu->max_perf represents a
non-negative number between 0 and 1 inclusive and cpu->min_perf
cannot be greater than cpu->max_perf. It is not necessary to check
those conditions every time the numbers in question are used.
Moreover, since intel_pstate_max_within_limits() only needs the
upper boundary, it doesn't make sense to compute the lower one in
there and returning min and max from intel_pstate_get_min_max()
via pointers doesn't look particularly nice.
For the above reasons, drop intel_pstate_get_min_max(), add a helper
to get the base P-state for min/max computations and carry out them
directly in the previous callers of intel_pstate_get_min_max().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
intel_pstate_hwp_set() is the only function walking policy->cpus
in intel_pstate. The rest of the code simply assumes one CPU per
policy, including the initialization code.
Therefore it doesn't make sense for intel_pstate_hwp_set() to
walk policy->cpus as it is guaranteed to have only one bit set
for policy->cpu.
For this reason, rearrange intel_pstate_hwp_set() to take the CPU
number as the argument and drop the loop over policy->cpus from it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a new function pid_in_use() to return the information on whether
or not the PID-based P-state selection algorithm is in use.
That allows a couple of complicated conditions in the code to be
reduced to simple checks against the new function's return value.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpu_defaults structure is redundant, because it only contains
one member of type struct pstate_funcs which can be used directly
instead of struct cpu_defaults.
For this reason, drop struct cpu_defaults, use struct pstate_funcs
directly instead of it where applicable and rename all of the
variables of that type accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the definitions of the cpu_defaults structures after the
definitions of utilization update callback routines to avoid
extra declarations of the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Avoid using extra function pointers during P-state selection by
dropping the get_target_pstate member from struct pstate_funcs,
adding a new update_util callback to it (to be registered with
the CPU scheduler as the utilization update callback in the active
mode) and reworking the utilization update callback routines to
invoke specific P-state selection functions directly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Notice that some overhead in the utilization update callbacks
registered by intel_pstate in the active mode can be avoided if
those callbacks are tailored to specific configurations of the
driver. For example, the utilization update callback for the HWP
enabled case only needs to update the average CPU performance
periodically whereas the utilization update callback for the
PID-based algorithm does not need to take IO-wait boosting into
account and so on.
With that in mind, define three utilization update callbacks for
three different use cases: HWP enabled, the CPU load "powersave"
P-state selection algorithm and the PID-based "powersave" P-state
selection algorithm and modify the driver initialization to
choose the callback matching its current configuration.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
One of the checks in intel_pstate_update_status() implicitly relies
on the information that there are only two struct cpufreq_driver
objects available, but it is better to do it directly against the
value it really is about (to make the code easier to follow if
nothing else).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The driver_registered variable in intel_pstate is used for checking
whether or not the driver has been registered, but intel_pstate_driver
can be used for that too (with the rule that the driver is not
registered as long as it is NULL).
That is a bit more straightforward and the code may be simplified
a bit this way, so modify the driver accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PID controller parameters only need to be initialized if the
get_target_pstate_use_performance() P-state selection routine
is going to be used. It is not necessary to initialize them
otherwise, so don't do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>