Commit Graph

3123 Commits

Author SHA1 Message Date
Bjorn Andersson
4f774c4a65 cpufreq: Reintroduce ready() callback
This effectively revert '4bf8e582119e ("cpufreq: Remove ready()
callback")', in order to reintroduce the ready callback.

This is needed in order to be able to leave the thermal pressure
interrupts in the Qualcomm CPUfreq driver disabled during
initialization, so that it doesn't fire while related_cpus are still 0.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[ Viresh: Added the Chinese translation as well and updated commit msg ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-02-09 13:18:49 +05:30
Huang Rui
a2e6840b37 cpufreq: amd-pstate: Fix Kconfig dependencies for AMD P-State
The AMD P-State driver is based on ACPI CPPC function, so ACPI should be
dependence of this driver in the kernel config.

In file included from ../drivers/cpufreq/amd-pstate.c:40:0:
../include/acpi/processor.h:226:2: error: unknown type name ‘phys_cpuid_t’
  phys_cpuid_t phys_id; /* CPU hardware ID such as APIC ID for x86 */
  ^~~~~~~~~~~~
../include/acpi/processor.h:355:1: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
 phys_cpuid_t acpi_get_phys_id(acpi_handle, int type, u32 acpi_id);
 ^~~~~~~~~~~~
 phys_addr_t
  CC      drivers/rtc/rtc-rv3029c2.o
../include/acpi/processor.h:356:1: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
 phys_cpuid_t acpi_map_madt_entry(u32 acpi_id);
 ^~~~~~~~~~~~
 phys_addr_t
../include/acpi/processor.h:357:20: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
 int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id);
                    ^~~~~~~~~~~~
                    phys_addr_t

See https://lore.kernel.org/lkml/20e286d4-25d7-fb6e-31a1-4349c805aae3@infradead.org/.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-06 18:31:33 +01:00
Yang Li
bdc4fd3d48 cpufreq: amd-pstate: Fix struct amd_cpudata kernel-doc comment
Add the description of @req and @boost_supported in struct amd_cpudata
kernel-doc comment to remove warnings found by running scripts/kernel-doc,
which is caused by using 'make W=1'.

drivers/cpufreq/amd-pstate.c:104: warning: Function parameter or member
'req' not described in 'amd_cpudata'
drivers/cpufreq/amd-pstate.c:104: warning: Function parameter or member
'boost_supported' not described in 'amd_cpudata'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-06 18:28:26 +01:00
Huang Rui
3ad7fde16a cpufreq: amd-pstate: Add AMD P-State performance attributes
Introduce sysfs attributes to get the different level AMD P-State
performances.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 18:51:40 +01:00
Huang Rui
ec4e3326a9 cpufreq: amd-pstate: Add AMD P-State frequencies attributes
Introduce sysfs attributes to get the different level processor
frequencies.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 18:51:40 +01:00
Huang Rui
41271016df cpufreq: amd-pstate: Add boost mode support for AMD P-State
If the sbios supports the boost mode of AMD P-State, let's switch to
boost enabled by default.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 18:51:39 +01:00
Huang Rui
60e10f896d cpufreq: amd-pstate: Add trace for AMD P-State module
Add trace event to monitor the performance value changes which is
controlled by cpu governors.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 18:51:39 +01:00
Huang Rui
e059c184da cpufreq: amd-pstate: Introduce the support for the processors with shared memory solution
In some of Zen2 and Zen3 based processors, they are using the shared
memory that exposed from ACPI SBIOS. In this kind of the processors,
there is no MSR support, so we add acpi cppc function as the backend for
them.

It is using a module param (shared_mem) to enable related processors
manually. We will enable this by default once we address performance
issue on this solution.

Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 18:51:39 +01:00
Huang Rui
1d215f0319 cpufreq: amd-pstate: Add fast switch function for AMD P-State
Introduce the fast switch function for AMD P-State on the AMD processors
which support the full MSR register control. It's able to decrease the
latency on interrupt context.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 18:51:39 +01:00
Huang Rui
ec437d71db cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors
AMD P-State is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism on AMD Zen based CPU series in Linux
kernel. The new mechanism is based on Collaborative processor
performance control (CPPC) which is finer grain frequency management
than legacy ACPI hardware P-States. Current AMD CPU platforms are using
the ACPI P-states driver to manage CPU frequency and clocks with
switching only in 3 P-states. AMD P-State is to replace the ACPI
P-states controls, allows a flexible, low-latency interface for the
Linux kernel to directly communicate the performance hints to hardware.

AMD P-State leverages the Linux kernel governors such as *schedutil*,
*ondemand*, etc. to manage the performance hints which are provided by CPPC
hardware functionality. The first version for AMD P-State is to support one
of the Zen3 processors, and we will support more in future after we verify
the hardware and SBIOS functionalities.

There are two types of hardware implementations for AMD P-State: one is full
MSR support and another is shared memory support. It can use
X86_FEATURE_CPPC feature flag to distinguish the different types.

Using the new AMD P-State method + kernel governors (*schedutil*,
*ondemand*, ...) to manage the frequency update is the most appropriate
bridge between AMD Zen based hardware processor and Linux kernel, the
processor is able to adjust to the most efficiency frequency according to
the kernel scheduler loading.

Please check the detailed CPU feature and MSR register description in
Processor Programming Reference (PPR) for AMD Family 19h Model 51h,
Revision A1 Processors:

https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 18:51:39 +01:00
Rafael J. Wysocki
5ee22fa4a9 Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq updates for 5.17-rc1 from Viresh Kumar:

"- Qcom cpufreq driver updates improve irq support (Ard Biesheuvel, Stephen Boyd,
   and Vladimir Zapolskiy).

 - Fixes double devm_remap for mediatek driver (Hector Yuan).

 - Introduces thermal pressure helpers (Lukasz Luba)."

* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: mediatek-hw: Fix double devm_remap in hotplug case
  cpufreq: qcom-hw: Use optional irq API
  cpufreq: qcom-hw: Set CPU affinity of dcvsh interrupts
  cpufreq: qcom-hw: Fix probable nested interrupt handling
  cpufreq: qcom-cpufreq-hw: Avoid stack buffer for IRQ name
  arch_topology: Remove unused topology_set_thermal_pressure() and related
  cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
  cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
  thermal: cpufreq_cooling: Use new thermal pressure update function
  arch_topology: Introduce thermal pressure update function
2021-12-30 15:49:54 +01:00
Greg Kroah-Hartman
fe262d5c1f cpufreq: use default_groups in kobj_type
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.  Move the cpufreq code to use default_groups field which has been
the preferred way since aa30f47cf6 ("kobject: Add support for default
attribute groups to kobj_type") so that we can soon get rid of the
obsolete default_attrs field.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-28 19:13:12 +01:00
Hector.Yuan
d776790a55 cpufreq: mediatek-hw: Fix double devm_remap in hotplug case
When hotpluging policy cpu, cpu policy init will be called multiple times.
Unplug CPU7 -> CPU6 -> CPU5 -> CPU4, then plug CPU4 again.
In this case, devm_remap will double remap and resource allocate fail.
So replace devm_remap to ioremap and release resources in cpu policy exit.

Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-12-27 09:44:53 +05:30
Rafael J. Wysocki
dfeeedc1bf cpufreq: intel_pstate: Update cpuinfo.max_freq on HWP_CAP changes
With HWP enabled, when the turbo range of performance levels is
disabled by the platform firmware, the CPU capacity is given by
the "guaranteed performance" field in MSR_HWP_CAPABILITIES which
is generally dynamic.  When it changes, the kernel receives an HWP
notification interrupt handled by notify_hwp_interrupt().

When the "guaranteed performance" value changes in the above
configuration, the CPU performance scaling needs to be adjusted so
as to use the new CPU capacity in computations, which means that
the cpuinfo.max_freq value needs to be updated for that CPU.

Accordingly, modify intel_pstate_notify_work() to read
MSR_HWP_CAPABILITIES and update cpuinfo.max_freq to reflect the
new configuration (this update can be carried out even if the
configuration doesn't actually change, because it simply doesn't
matter then and it takes less time to update it than to do extra
checks to decide whether or not a change has really occurred).

Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-22 18:36:07 +01:00
Rafael J. Wysocki
521223d8b3 cpufreq: Fix initialization of min and max frequency QoS requests
The min and max frequency QoS requests in the cpufreq core are
initialized to whatever the current min and max frequency values are
at the init time, but if any of these values change later (for
example, cpuinfo.max_freq is updated by the driver), these initial
request values will be limiting the CPU frequency unnecessarily
unless they are changed by user space via sysfs.

To address this, initialize min_freq_req and max_freq_req to
FREQ_QOS_MIN_DEFAULT_VALUE and FREQ_QOS_MAX_DEFAULT_VALUE,
respectively, so they don't really limit anything until user
space updates them.

Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-17 16:45:53 +01:00
Srinivas Pandruvada
b6e6f8beec cpufreq: intel_pstate: Update EPP for AlderLake mobile
There is an expectation from users that they can get frequency specified
by cpufreq/cpuinfo_max_freq when conditions permit. But with AlderLake
mobile it may not be possible. This is possible that frequency is clipped
based on the system power-up EPP value. In this case users can update
cpufreq/energy_performance_preference to some performance oriented EPP to
limit clipping of frequencies.

To get out of box behavior as the prior generations of CPUs, update EPP
for AlderLake mobile CPUs on boot. On prior generations of CPUs EPP = 128
was enough to get maximum frequency, but with AlderLake mobile the
equivalent EPP is 102. Since EPP is model specific, this is possible that
they have different meaning on each generation of CPU.

The current EPP string "balance_performance" corresponds to EPP = 128.
Change the EPP corresponding to "balance_performance" to 102 for only
AlderLake mobile CPUs and update this on each CPU during boot.

To implement reuse epp_values[] array and update the modified EPP at the
index for BALANCE_PERFORMANCE. Add a dummy EPP_INDEX_DEFAULT to
epp_values[] to match indexes in the energy_perf_strings[].

After HWP PM is enabled also update EPP when "balance_performance" is
redefined for the very first time after the boot on each CPU. On
subsequent suspend/resume or offline/online the old EPP is restored,
so no specific action is needed.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-17 16:44:15 +01:00
Rafael J. Wysocki
458b03f81a cpufreq: intel_pstate: Drop redundant intel_pstate_get_hwp_cap() call
It is not necessary to call intel_pstate_get_hwp_cap() from
intel_pstate_update_perf_limits(), because it gets called from
intel_pstate_verify_cpu_policy() which is either invoked directly
right before intel_pstate_update_perf_limits(), in
intel_cpufreq_verify_policy() in the passive mode, or called
from driver callbacks in a sequence that causes it to be followed
by an immediate intel_pstate_update_perf_limits().

Namely, in the active mode intel_cpufreq_verify_policy() is called
by intel_pstate_verify_policy() which is the ->verify() callback
routine of intel_pstate and gets called by the cpufreq core right
before intel_pstate_set_policy(), which is the driver's ->setoplicy()
callback routine, where intel_pstate_update_perf_limits() is called.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-17 16:26:50 +01:00
Stephen Boyd
8f5783ad9e cpufreq: qcom-hw: Use optional irq API
Use platform_get_irq_optional() to avoid a noisy error message when the
irq isn't specified. The irq is definitely optional given that we only
care about errors that are -EPROBE_DEFER here.

Cc: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-12-03 10:16:51 +05:30
Tang Yizhou
1e81d3e06d cpufreq: Fix a comment in cpufreq_policy_free
Make the comment in blocking_notifier_call_chain() easier to
understand.

Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-01 20:02:11 +01:00
Xiongfeng Wang
2c1b5a8466 cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()
When I hot added a CPU, I found 'cpufreq' directory was not created
below /sys/devices/system/cpu/cpuX/.

It is because get_cpu_device() failed in add_cpu_dev_symlink().

cpufreq_add_dev() is the .add_dev callback of a CPU subsys interface.
It will be called when the CPU device registered into the system.
The call chain is as follows:

  register_cpu()
  ->device_register()
   ->device_add()
    ->bus_probe_device()
     ->cpufreq_add_dev()

But only after the CPU device has been registered, we can get the
CPU device by get_cpu_device(), otherwise it will return NULL.

Since we already have the CPU device in cpufreq_add_dev(), pass
it to add_cpu_dev_symlink().

I noticed that the 'kobj' of the CPU device has been added into
the system before cpufreq_add_dev().

Fixes: 2f0ba790df ("cpufreq: Fix creation of symbolic links to policy directories")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-01 19:50:56 +01:00
Vladimir Zapolskiy
3ed6dfbd3b cpufreq: qcom-hw: Set CPU affinity of dcvsh interrupts
In runtime CPU cluster specific dcvsh interrupts may be handled on
unrelated CPU cores, it leads to an issue of too excessive number of
received and handled interrupts, but this is not observed, if CPU
affinity of the interrupt handler is set in accordance to CPU clusters.

The change reduces a number of received interrupts in about 10-100 times.

Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-11-25 12:19:38 +05:30
Vladimir Zapolskiy
e0e27c3d4e cpufreq: qcom-hw: Fix probable nested interrupt handling
Re-enabling an interrupt from its own interrupt handler may cause
an interrupt storm, if there is a pending interrupt and because its
handling is disabled due to already done entrance into the handler
above in the stack.

Also, apparently it is improper to lock a mutex in an interrupt contex.

Fixes: 275157b367 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-11-25 12:19:38 +05:30
Ard Biesheuvel
be6592ed56 cpufreq: qcom-cpufreq-hw: Avoid stack buffer for IRQ name
Registering an IRQ requires the string buffer containing the name to
remain allocated, as the name is not copied into another buffer.

So let's add a irq_name field to the data struct instead, which is
guaranteed to have the appropriate lifetime.

Cc: Thara Gopinath <thara.gopinath@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Andy Gross <agross@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-11-25 12:19:37 +05:30
Srinivas Pandruvada
03c83982a0 cpufreq: intel_pstate: ITMT support for overclocked system
On systems with overclocking enabled, CPPC Highest Performance can be
hard coded to 0xff. In this case even if we have cores with different
highest performance, ITMT can't be enabled as the current implementation
depends on CPPC Highest Performance.

On such systems we can use MSR_HWP_CAPABILITIES maximum performance field
when CPPC.Highest Performance is 0xff.

Due to legacy reasons, we can't solely depend on MSR_HWP_CAPABILITIES as
in some older systems CPPC Highest Performance is the only way to identify
different performing cores.

Reported-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-23 14:11:18 +01:00
Rafael J. Wysocki
ed38eb49d1 cpufreq: intel_pstate: Fix active mode offline/online EPP handling
After commit 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and
->online callbacks") the EPP value set by the "performance" scaling
algorithm in the active mode is not restored after an offline/online
cycle which replaces it with the saved EPP value coming from user
space.

Address this issue by forcing intel_pstate_hwp_set() to set a new
EPP value when it runs first time after online.

Fixes: 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
Link: https://lore.kernel.org/linux-pm/adc7132c8655bd4d1c8b6129578e931a14fe1db2.camel@linux.intel.com/
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-23 14:02:11 +01:00
Adamos Ttofari
cd23f02f16 cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
Commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers
support in no-HWP mode") enabled the use of Intel P-State driver
for Ice Lake servers.

But it doesn't cover the case when OS can't control P-States.

Therefore, for Ice Lake server, if MSR_MISC_PWR_MGMT bits 8 or 18
are enabled, then the Intel P-State driver should exit as OS can't
control P-States.

Fixes: fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-23 14:02:10 +01:00
Lukasz Luba
0258cb19c7 cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new API and remove old local conversion code.

The new arch_update_thermal_pressure() also accepts boost frequencies,
which solves issue in the driver code with wrong reduced capacity
calculation. The reduced capacity was calculated wrongly due to
'policy->cpuinfo.max_freq' used as a divider. The value present there was
actually the boost frequency. Thus, even a normal maximum frequency value
which corresponds to max CPU capacity (arch_scale_cpu_capacity(cpu_id))
is not able to remove the capping.

The second side effect which is solved is that the reduced frequency wasn't
properly translated into the right reduced capacity,
e.g.
boost frequency = 3000MHz (stored in policy->cpuinfo.max_freq)
max normal frequency = 2500MHz (which is 1024 capacity)
2nd highest frequency = 2000MHz (which translates to 819 capacity)

Then in a scenario when the 'throttled_freq' max allowed frequency was
2000MHz the driver translated it into 682 capacity:
capacity = 1024 * 2000 / 3000 = 682
Then set the pressure value bigger than actually applied by the HW:
max_capacity - capacity => 1024 - 682 = 342 (<- thermal pressure)
Which was causing higher throttling and misleading task scheduler
about available CPU capacity.
A proper calculation in such case should be:
capacity = 1024 * 2000 / 2500 = 819
1024 - 819 = 205 (<- thermal pressure)

This patch relies on the new arch_update_thermal_pressure() handling
correctly such use case (with boost frequencies).

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-11-23 15:10:26 +05:30
Lukasz Luba
93d9e6f93e cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline won't. Unfortunately, when the CPU
is back online, the value read from per-cpu variable might be wrong
(stale data).  This might affect the scheduler decisions, since it
sees the CPU capacity differently than what is actually available.

Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when there is throttling
or throttling is removed.

Fixes: 275157b367 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-11-23 15:10:26 +05:30
Srinivas Pandruvada
074d0cdfbb cpufreq: intel_pstate: Clear HWP Status during HWP Interrupt enable
It is possible that some performance excursions happened before OS boot
or enable HWP interrupts. So clear MSR_HWP_STATUS bits when we enable
HWP interrupt. In this way a next excursion will results in a HWP
interrupt.

The status bits of MSR_HWP_STATUS must be cleared (0) by software so
that a new status condition change will cause the hardware to set the
bit again and issue the notification.

Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:48:47 +01:00
Srinivas Pandruvada
5521055670 cpufreq: intel_pstate: Fix unchecked MSR 0x773 access
It is possible that on some platforms HWP interrupts are disabled. In
that case accessing MSR 0x773 will result in warning.

So check X86_FEATURE_HWP_NOTIFY feature to access MSR 0x773. The other
places in code where this MSR is accessed, already checks this feature
except during disable path called during cpufreq offline and suspend
callbacks.

Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:48:47 +01:00
Rafael J. Wysocki
dbea75fe18 cpufreq: intel_pstate: Clear HWP desired on suspend/shutdown and offline
Commit a365ab6b9d ("cpufreq: intel_pstate: Implement the
->adjust_perf() callback") caused intel_pstate to use nonzero HWP
desired values in certain usage scenarios, but it did not prevent
them from being leaked into the confugirations in which HWP desired
is expected to be 0.

The failing scenarios are switching the driver from the passive
mode to the active mode and starting a new kernel via kexec() while
intel_pstate is running in the passive mode.

To address this issue, ensure that HWP desired will be cleared on
offline and suspend/shutdown.

Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Tested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:48:47 +01:00
Rafael J. Wysocki
bf56b90797 Merge branches 'pm-em' and 'powercap'
Merge Energy Model and power capping updates for 5.16-rc1:

 - Add support for inefficient operating performance points to the
   Energy Model and modify cpufreq to use them properly (Vincent
   Donnefort).

 - Rearrange the DTPM framework code to simplify it and make it easier
   to follow (Daniel Lezcano).

 - Fix power intialization in DTPM (Daniel Lezcano).

 - Add CPU load consideration when estimating the instaneous power
   consumption in DTPM (Daniel Lezcano).

* pm-em:
  cpufreq: mediatek-hw: Fix cpufreq_table_find_index_dl() call
  PM: EM: Mark inefficiencies in CPUFreq
  cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
  cpufreq: Introducing CPUFREQ_RELATION_E
  cpufreq: Add an interface to mark inefficient frequencies
  cpufreq: Make policy min/max hard requirements
  PM: EM: Allow skipping inefficient states
  PM: EM: Extend em_perf_domain with a flag field
  PM: EM: Mark inefficient states
  PM: EM: Fix inefficient states detection

* powercap:
  powercap/drivers/dtpm: Fix power limit initialization
  powercap/drivers/dtpm: Scale the power with the load
  powercap/drivers/dtpm: Use container_of instead of a private data field
  powercap/drivers/dtpm: Simplify the dtpm table
  powercap/drivers/dtpm: Encapsulate even more the code
2021-11-02 19:31:28 +01:00
Rafael J. Wysocki
19ea8a0dd4 Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq updates for 5.16-rc1 from Viresh Kumar:

"- Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).

 - Fix the parameter usage of the newly added perf-domain API (Hector
   Yuan).

 - Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
   Guenter Roeck, and Arnd Bergmann)."

* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: Fix parameter in parse_perf_domain()
  cpufreq: tegra186/tegra194: Handle errors in BPMP response
  cpufreq: remove useless INIT_LIST_HEAD()
  cpufreq: s3c244x: add fallthrough comments for switch
  cpufreq: vexpress: Drop unused variable
2021-11-02 17:55:31 +01:00
Zhang Rui
c72bcf0ab8 cpufreq: intel_pstate: Fix cpu->pstate.turbo_freq initialization
Fix a problem in active mode that cpu->pstate.turbo_freq is initialized
only if HWP-to-frequency scaling factor is refined.

In passive mode, this problem is not exposed, because
cpu->pstate.turbo_freq is set again, later in
intel_cpufreq_cpu_init()->intel_pstate_get_hwp_cap().

Fixes: eb3693f052 ("cpufreq: intel_pstate: hybrid: CPU-specific scaling factor")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-26 16:00:50 +02:00
Vincent Donnefort
6215a5de9e cpufreq: mediatek-hw: Fix cpufreq_table_find_index_dl() call
The new cpufreq table flag RELATION_E introduced a new "efficient"
parameter for the cpufreq_table_find*() functions.

Fixes: 1f39fa0dcc (cpufreq: Introducing CPUFREQ_RELATION_E)
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-07 19:21:54 +02:00
Vincent Donnefort
b894d20e68 cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
Let the governors schedutil, conservative and ondemand to work, if possible
on efficient frequencies only.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Vincent Donnefort
1f39fa0dcc cpufreq: Introducing CPUFREQ_RELATION_E
This newly introduced flag can be applied by a governor to a CPUFreq
relation, when looking for a frequency within the policy table. The
resolution would then only walk through efficient frequencies.

Even with the flag set, the policy max limit will still be honoured. If no
efficient frequencies can be found within the limits of the policy, an
inefficient one would be returned.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Vincent Donnefort
1517176906 cpufreq: Make policy min/max hard requirements
When applying the policy min/max limits, the requested frequency is
simply clamped to not be out of range. It means, however, if one of the
boundaries isn't an available frequency, the frequency resolution can
return a value out of those limits, depending on the relation used.

e.g. freq{0,1,2} being available frequencies.

          freq0  policy->min  freq1  policy->max   freq2
            |        |          |        |           |
          17kHz     18kHz     19kHz     20kHz      21kHz

     __resolve_freq(21kHz, CPUFREQ_RELATION_L) -> 21kHz (out of bounds)
     __resolve_freq(17kHz, CPUFREQ_RELATION_H) -> 17kHz (out of bounds)

If, during the policy init, we resolve the requested min/max to existing
frequencies, we ensure that any CPUFREQ_RELATION_* would resolve to a
frequency which is inside the policy min/max range.

Making the policy limits rigid helps to introduce the inefficient
frequencies support. Resolving an inefficient frequency to an efficient
one should not transgress policy->max (which can be set for thermal
reason) and having a value we can trust simplify this comparison.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Srinivas Pandruvada
57577c996d cpufreq: intel_pstate: Process HWP Guaranteed change notification
It is possible that HWP guaranteed ratio is changed in response to
change in power and thermal limits. For example when Intel Speed Select
performance profile is changed or there is change in TDP, hardware can
send notifications. It is possible that the guaranteed ratio is
increased. This creates an issue when turbo is disabled, as the old
limits set in MSR_HWP_REQUEST are still lower and hardware will clip
to older limits.

This change enables HWP interrupt and process HWP interrupts. When
guaranteed is changed, calls cpufreq_update_policy() so that driver
callbacks are called to update to new HWP limits. This callback
is called from a delayed workqueue of 10ms to avoid frequent updates.

Although the scope of IA32_HWP_INTERRUPT is per logical cpu, on some
plaforms interrupt is generated on all CPUs. This is particularly a
problem during initialization, when the driver didn't allocated
data for other CPUs. So this change uses a cpumask of enabled CPUs and
process interrupts on those CPUs only.

When the cpufreq offline() or suspend() callback is called, HWP interrupt
is disabled on those CPUs and also cancels any pending work item.

Spin lock is used to protect data and processing shared with interrupt
handler. Here READ_ONCE(), WRITE_ONCE() macros are used to designate
shared data, even though spin lock act as an optimization barrier here.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: pablomh@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 15:30:44 +02:00
Mikko Perttunen
c2ace21f93 cpufreq: tegra186/tegra194: Handle errors in BPMP response
The return value from tegra_bpmp_transfer indicates the success or
failure of the IPC transaction with BPMP. If the transaction
succeeded, we also need to check the actual command's result code.
Add code to do this.

While at it, explicitly handle missing CPU clusters, which can
occur on floorswept chips. This worked before as well, but
possibly only by accident.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-10-04 12:31:36 +05:30
Han Wang
6065a67267 cpufreq: remove useless INIT_LIST_HEAD()
list cpu_data_list has been inited staticly through LIST_HEAD,
so there's no need to call another INIT_LIST_HEAD. Simply remove
it from cppc_cpufreq_init.

Signed-off-by: Han Wang <zjuwanghan@outlook.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-10-04 12:27:44 +05:30
Arnd Bergmann
08ef8d35a8 cpufreq: s3c244x: add fallthrough comments for switch
Apparently nobody has so far caught this warning, I hit it in randconfig
build testing:

drivers/cpufreq/s3c2440-cpufreq.c: In function 's3c2440_cpufreq_setdivs':
drivers/cpufreq/s3c2440-cpufreq.c:175:10: error: this statement may fall through [-Werror=implicit-fallthrough=]
   camdiv |= S3C2440_CAMDIVN_HCLK3_HALF;
          ^
drivers/cpufreq/s3c2440-cpufreq.c:176:2: note: here
  case 3:
  ^~~~
drivers/cpufreq/s3c2440-cpufreq.c:181:10: error: this statement may fall through [-Werror=implicit-fallthrough=]
   camdiv |= S3C2440_CAMDIVN_HCLK4_HALF;
          ^
drivers/cpufreq/s3c2440-cpufreq.c:182:2: note: here
  case 4:
  ^~~~

Both look like the fallthrough is intentional, so add the new
"fallthrough;" keyword.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-10-04 11:43:14 +05:30
Guenter Roeck
45b2bb6620 cpufreq: vexpress: Drop unused variable
arm:allmodconfig fails to build with the following error.

drivers/cpufreq/vexpress-spc-cpufreq.c:454:13: error:
					unused variable 'cur_cluster'

Remove the unused variable.

Fixes: bb8c26d938 ("cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag")
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-10-04 11:25:21 +05:30
Linus Torvalds
4357f03d66 Power management fixes for 5.15-rc2
- Prevent intel_pstate from avoiding to use HWP, even if instructed
    to do so via the kernel command line, when HWP has been enabled
    already by the platform firmware (Doug Smythies).
 
  - Prevent use-after-free from occurring in the schedutil cpufreq
    governor on exit by fixing a core helper function that attempts
    to access memory associated with a kobject after calling
    kobject_put() on it (James Morse).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmFE2wsSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx050P/0tsaaPhfNuguwORa/rV6UCWOmFtIuSA
 Td9tmkwee7ff3WKDCOFqMU+peKC443C/jVGLy7lPs/v9vDfD5QaJTcfL4Ua1ifjP
 gW+2Rf1omOdFQuB31JZNO9HV3/FRgo5KX0hUA+3qh+MsPGOF9BlWf96N1rDrwtyA
 Vsgx66LcOmAJg67YX0MZhtNm/RMBV4aFVArU6GgEGaURoHH5OgVg4yuRgeKarZin
 weyVPqY7rOEMCUq5gXl5qpMlLw6O8P89lsQJrqoIWeOh1zmz5CPrzcE5QT0pVWcd
 6m704TYC5ajpqUc6XaMkgkQXSIwHbcPe3IDB5SvP1eQE/duhD+gw2osyHKqE5C3c
 pVtS2z1PFnhByVnRbUP53znwaYyyp63vriyq7zGxs5SwNmgENEavMMu6/C0loTOp
 fpfGvwZbVItxrRVHm+Y49pzSoSwy+aNQq5wTT0rVD670TqpZREKl/HcACihyZLQO
 E23zvOH4OyAW5j8SdPmZMU9m/aJqjfVCduapf5iLe72pOhB28rVQ+DWupJ5395RP
 5qoLKeGxPewy1qtc2PF41lztSfah9pDSrtxBv5ebVDu5uOLt71svaF1fdsixWBlo
 +MHMS+njeJzq5C0iXbfLvUBGw7XCUauax6CkkNl5ooYaFZFp0Gb+YHEs6fiXA8Oh
 ZlkT9hVE9ARj
 =ytGW
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix two cpufreq issues, one in the intel_pstate driver and one
  in the core.

  Specifics:

   - Prevent intel_pstate from avoiding to use HWP, even if instructed
     to do so via the kernel command line, when HWP has been enabled
     already by the platform firmware (Doug Smythies).

   - Prevent use-after-free from occurring in the schedutil cpufreq
     governor on exit by fixing a core helper function that attempts to
     access memory associated with a kobject after calling kobject_put()
     on it (James Morse)"

* tag 'pm-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
  cpufreq: intel_pstate: Override parameters if HWP forced by BIOS
2021-09-17 12:05:04 -07:00
Guenter Roeck
b60cee5bae cpufreq: vexpress: Drop unused variable
arm:allmodconfig fails to build with the following error.

  drivers/cpufreq/vexpress-spc-cpufreq.c:454:13: error:
					unused variable 'cur_cluster'

Remove the unused variable.

Fixes: bb8c26d938 ("cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag")
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-16 11:29:27 -07:00
James Morse
cdef119660 cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
Since commit e5c6b312ce ("cpufreq: schedutil: Use kobject release()
method to free sugov_tunables") kobject_put() has kfree()d the
attr_set before gov_attr_set_put() returns.

kobject_put() isn't the last user of attr_set in gov_attr_set_put(),
the subsequent mutex_destroy() triggers a use-after-free:
| BUG: KASAN: use-after-free in mutex_is_locked+0x20/0x60
| Read of size 8 at addr ffff000800ca4250 by task cpuhp/2/20
|
| CPU: 2 PID: 20 Comm: cpuhp/2 Not tainted 5.15.0-rc1 
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development
| Platform, BIOS EDK II Jul 30 2018
| Call trace:
|  dump_backtrace+0x0/0x380
|  show_stack+0x1c/0x30
|  dump_stack_lvl+0x8c/0xb8
|  print_address_description.constprop.0+0x74/0x2b8
|  kasan_report+0x1f4/0x210
|  kasan_check_range+0xfc/0x1a4
|  __kasan_check_read+0x38/0x60
|  mutex_is_locked+0x20/0x60
|  mutex_destroy+0x80/0x100
|  gov_attr_set_put+0xfc/0x150
|  sugov_exit+0x78/0x190
|  cpufreq_offline.isra.0+0x2c0/0x660
|  cpuhp_cpufreq_offline+0x14/0x24
|  cpuhp_invoke_callback+0x430/0x6d0
|  cpuhp_thread_fun+0x1b0/0x624
|  smpboot_thread_fn+0x5e0/0xa6c
|  kthread+0x3a0/0x450
|  ret_from_fork+0x10/0x20

Swap the order of the calls.

Fixes: e5c6b312ce ("cpufreq: schedutil: Use kobject release() method to free sugov_tunables")
Cc: 4.7+ <stable@vger.kernel.org> # 4.7+
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-14 19:01:36 +02:00
Doug Smythies
d9a7e9df73 cpufreq: intel_pstate: Override parameters if HWP forced by BIOS
If HWP has been already been enabled by BIOS, it may be
necessary to override some kernel command line parameters.
Once it has been enabled it requires a reset to be disabled.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-13 19:26:08 +02:00
Rafael J. Wysocki
be2d24336f Merge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em'
* pm-cpufreq:
  cpufreq: intel_pstate: hybrid: Rework HWP calibration
  ACPI: CPPC: Introduce cppc_get_nominal_perf()

* pm-sleep:
  PM: sleep: core: Avoid setting power.must_resume to false
  PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()

* pm-em:
  Documentation: power: include kernel-doc in Energy Model doc
  PM: EM: fix kernel-doc comments
2021-09-10 20:26:08 +02:00
Linus Torvalds
30f3490978 More power management updates for 5.15-rc1
- Add new cpufreq driver for the MediaTek MT6779 platform called
    mediatek-hw along with corresponding DT bindings (Hector.Yuan).
 
  - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
    Gopinath).
 
  - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
    policy flag (Taniya Das).
 
  - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
    Andersson).
 
  - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
    flag (Viresh Kumar).
 
  - Add new cpufreq driver callback to allow drivers to register
    with the Energy Model in a consistent way and make several
    drivers use it (Viresh Kumar).
 
  - Change the remaining users of the .ready() cpufreq driver callback
    to move the code from it elsewhere and drop it from the cpufreq
    core (Viresh Kumar).
 
  - Revert recent intel_pstate change adding HWP guaranteed performance
    change notification support to it that led to problems, because
    the notification in question is triggered prematurely on some
    systems (Rafael Wysocki).
 
  - Convert the OPP DT bindings to DT schema and clean them up while
    at it (Rob Herring).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmE41VgSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxCN4P+gMjMrrZmuU6gZsbbvpDlaBhCd2Xq3TD
 xR/DMDi7znkh3TUX3uwL+xnr+k0krIH0jBIeUQUE7NeNIoT6wgbjJ4Ty5rFq76qB
 AODmmZ4vO7lmnupSyqUQbHfYohDmyICSKiStf8UOEj1o+jSWNmrgUYUv0tDtDUH+
 Cn0vByah8gJAnoZX8Y8BM1jmRc3YoNHWpvtTQIhIBPkVZ//+NOKvDZvwUUPZFb+M
 1PzMSfX7WsIDiUrUHpdvtZsoBniaMk0WS1EqVBRvEprqUXad1eHF19yuhtLxeUPH
 8xh/7o8kYzjqVJvs7blTT8DztxRDScWHeGKSVdwoEJupbCwc5R3qfGaD6PWyhI1x
 9R5Swsp64nLptTCwH7ZmgdJbC9IqN3cz1Nadd5v2Q2wr21KvZnj7zI2ijkPKGnZo
 kqYQHghqnDkGPFVjdls/RKUXGCQIXZFQb+FeuyCvpVlz9Ol8+DnTYtgFjjc6VcU1
 kApIqsE8V8GQEyzmm/OIAf6xtsA+mEUUh3Qds16KNCwBCRglC8I6v5IWnBc8PEJz
 a+wtwjx+tyKSSlAMEvcDNtrWVN+3JNrwCFG+Q+QjMrwLiAgACrtJZ6e6PmLj9sZv
 FPbZM8rzbzN7Zqd7XVNY37KHjqNs7zzDScAnATTUCKThp8ijDgtmG3ZhExmOPayc
 74aQLham4TBO
 =WSpr
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These are mostly ARM cpufreq driver updates, including one new
  MediaTek driver that has just passed all of the reviews, with the
  addition of a revert of a recent intel_pstate commit, some core
  cpufreq changes and a DT-related update of the operating performance
  points (OPP) support code.

  Specifics:

   - Add new cpufreq driver for the MediaTek MT6779 platform called
     mediatek-hw along with corresponding DT bindings (Hector.Yuan).

   - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
     Gopinath).

   - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
     policy flag (Taniya Das).

   - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
     Andersson).

   - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
     flag (Viresh Kumar).

   - Add new cpufreq driver callback to allow drivers to register with
     the Energy Model in a consistent way and make several drivers use
     it (Viresh Kumar).

   - Change the remaining users of the .ready() cpufreq driver callback
     to move the code from it elsewhere and drop it from the cpufreq
     core (Viresh Kumar).

   - Revert recent intel_pstate change adding HWP guaranteed performance
     change notification support to it that led to problems, because the
     notification in question is triggered prematurely on some systems
     (Rafael Wysocki).

   - Convert the OPP DT bindings to DT schema and clean them up while at
     it (Rob Herring)"

* tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
  cpufreq: mediatek-hw: Add support for CPUFREQ HW
  cpufreq: Add of_perf_domain_get_sharing_cpumask
  dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
  cpufreq: Remove ready() callback
  cpufreq: sh: Remove sh_cpufreq_cpu_ready()
  cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
  cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
  cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
  cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
  cpufreq: scmi: Use .register_em() to register with energy model
  cpufreq: vexpress: Use .register_em() to register with energy model
  cpufreq: scpi: Use .register_em() to register with energy model
  dt-bindings: opp: Convert to DT schema
  dt-bindings: Clean-up OPP binding node names in examples
  ARM: dts: omap: Drop references to opp.txt
  cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
  cpufreq: omap: Use .register_em() to register with energy model
  cpufreq: mediatek: Use .register_em() to register with energy model
  cpufreq: imx6q: Use .register_em() to register with energy model
  ...
2021-09-08 16:38:25 -07:00
Rafael J. Wysocki
46573fd636 cpufreq: intel_pstate: hybrid: Rework HWP calibration
The current HWP calibration for hybrid processors in intel_pstate is
fragile, because it depends too much on the information provided by
the platform firmware via CPPC which may not be reliable enough.  It
also need not be so complicated.

In order to improve that mechanism and make it more resistant to
platform firmware issues, make it only use the CPPC nominal_perf
values to compute the HWP-to-frequency scaling factors for all
CPUs and possibly use the HWP_CAP highest_perf values to recompute
them if the ones derived from the CPPC nominal_perf values alone
appear to be too high.

Namely, fetch CPC.nominal_perf for all CPUs present in the system,
find the minimum one and use it as a reference for computing all of
the CPUs' scaling factors (using the observation that for the CPUs
having the minimum CPC.nominal_perf the HWP range of available
performance levels should be the same as the range of available
"legacy" P-states and so the HWP-to-frequency scaling factor for
them should be the same as the corresponding scaling factor used
for representing the P-state values in kHz).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
2021-09-07 21:15:16 +02:00