Commit Graph

251 Commits

Author SHA1 Message Date
Uwe Kleine-König
52b43bbdb6 powercap: intel_rapl: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
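The point of the conversion is that the platform core only warns about a non-zero return from .remove() and otherwise ignores it. The userspace sketch below models that behavior with hypothetical mock types (mock_platform_device, mock_platform_driver, sketch_core_remove()). It is not the real driver core or the real RAPL driver. It only illustrates why a callback that always returns 0 can become a void callback without any change in behavior.

```c
#include <stdio.h>

/* Hypothetical userspace stand-ins; these are NOT the real kernel
 * definitions from <linux/platform_device.h>. */
struct mock_platform_device {
	const char *name;
	int resources_held;
};

struct mock_platform_driver {
	int (*remove)(struct mock_platform_device *pdev);      /* old: int is ignored */
	void (*remove_new)(struct mock_platform_device *pdev); /* new: returns void */
};

/* Old style: always returned 0, so the int carried no information. */
static int sketch_remove_old(struct mock_platform_device *pdev)
{
	pdev->resources_held = 0;
	return 0;
}

/* Converted style: identical teardown, no misleading return value. */
static void sketch_remove_new(struct mock_platform_device *pdev)
{
	pdev->resources_held = 0;
}

/* Rough model of the core: prefer .remove_new(); for .remove(), a non-zero
 * return value only produces a warning and is otherwise ignored. */
static void sketch_core_remove(const struct mock_platform_driver *drv,
			       struct mock_platform_device *pdev)
{
	if (drv->remove_new) {
		drv->remove_new(pdev);
	} else if (drv->remove) {
		int ret = drv->remove(pdev);

		if (ret)
			fprintf(stderr, "%s: remove callback returned %d, ignored\n",
				pdev->name, ret);
	}
}
```

Both variants leave the device in the same state. The void variant simply removes the illusion that an error could be reported.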

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-13 20:45:54 +01:00
Linus Torvalds
07abb19a9b Power management updates for 6.9-rc1

Merge tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "From the functional perspective, the most significant change here is
  the addition of support for Energy Models that can be updated
  dynamically at run time.

  There is also the addition of LZ4 compression support for hibernation,
  the new preferred core support in amd-pstate, new platforms support in
  the Intel RAPL driver, new model-specific EPP handling in intel_pstate
  and more.

  Apart from that, the cpufreq default transition delay is reduced from
  10 ms to 2 ms (along with some related adjustments), the system
  suspend statistics code undergoes a significant rework and there is a
  usual bunch of fixes and code cleanups all over.

  Specifics:

   - Allow the Energy Model to be updated dynamically (Lukasz Luba)

   - Add support for LZ4 compression algorithm to the hibernation image
     creation and loading code (Nikhil V)

   - Fix and clean up system suspend statistics collection (Rafael
     Wysocki)

   - Simplify device suspend and resume handling in the power management
     core code (Rafael Wysocki)

   - Fix PCI hibernation support description (Yiwei Lin)

   - Make hibernation take set_memory_ro() return values into account as
     appropriate (Christophe Leroy)

   - Set mem_sleep_current during kernel command line setup to avoid an
     ordering issue with handling it (Maulik Shah)

   - Fix wake IRQs handling when pm_runtime_force_suspend() is used as a
     driver's system suspend callback (Qingliang Li)

   - Simplify pm_runtime_get_if_active() usage and add a replacement for
     pm_runtime_put_autosuspend() (Sakari Ailus)

   - Add a tracepoint for runtime_status changes tracking (Vilas Bhat)

   - Fix section title markdown in the runtime PM documentation (Yiwei
     Lin)

   - Enable preferred core support in the amd-pstate cpufreq driver
     (Meng Li)

   - Fix min_perf assignment in amd_pstate_adjust_perf() and make the
     min/max limit perf values in amd-pstate always stay within the
     (highest perf, lowest perf) range (Tor Vic, Meng Li)

   - Allow intel_pstate to assign model-specific values to strings used
     in the EPP sysfs interface and make it do so on Meteor Lake
     (Srinivas Pandruvada)

   - Drop long-unused cpudata::prev_cummulative_iowait from the
     intel_pstate cpufreq driver (Jiri Slaby)

   - Prevent scaling_cur_freq from exceeding scaling_max_freq when the
     latter is an inefficient frequency (Shivnandan Kumar)

   - Change default transition delay in cpufreq to 2ms (Qais Yousef)

   - Remove references to 10ms minimum sampling rate from comments in
     the cpufreq code (Pierre Gondois)

   - Honour transition_latency over transition_delay_us in cpufreq (Qais
     Yousef)

   - Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar)

   - General enhancements / cleanups to ARM cpufreq drivers (tianyu2,
     Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia
     Belova)

   - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan)

   - Make the SCMI cpufreq driver get a transition delay value from
     firmware (Pierre Gondois)

   - Prevent the haltpoll cpuidle governor from shrinking guest
     poll_limit_ns below grow_start (Parshuram Sangle)

   - Avoid potential overflow in integer multiplication when computing
     cpuidle state parameters (C Cheng)

   - Adjust MWAIT hint target C-state computation in the ACPI cpuidle
     driver and in intel_idle to return a correct value for C0 (He
     Rongguang)

   - Address multiple issues in the TPMI RAPL driver and add support for
     new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui)

   - Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel
     Lezcano)

   - Fix kernel-doc for dtpm_create_hierarchy() (Yang Li)

   - Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth
     Norway Ananda)

   - Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil)

   - Fix a couple of warnings in the OPP core code related to W=1 builds
     (Viresh Kumar)

   - Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh
     Kumar)

   - Extend dev_pm_opp_data with turbo support (Sibi Sankar)

   - dt-bindings: drop maxItems from inner items (David Heidelberg)"

* tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits)
  dt-bindings: opp: drop maxItems from inner items
  OPP: debugfs: Fix warning around icc_get_name()
  OPP: debugfs: Fix warning with W=1 builds
  cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h
  OPP: Extend dev_pm_opp_data with turbo support
  Fix cpupower-frequency-info.1 man page typo
  cpufreq: scmi: Set transition_delay_us
  firmware: arm_scmi: Populate fast channel rate_limit
  firmware: arm_scmi: Populate perf commands rate_limit
  cpuidle: ACPI/intel: fix MWAIT hint target C-state computation
  PM: sleep: wakeirq: fix wake irq warning in system suspend
  powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function
  cpufreq: Don't unregister cpufreq cooling on CPU hotplug
  PM: suspend: Set mem_sleep_current during kernel command line setup
  cpufreq: Honour transition_latency over transition_delay_us
  cpufreq: Limit resolving a frequency to policy min/max
  Documentation: PM: Fix runtime_pm.rst markdown syntax
  cpufreq: amd-pstate: adjust min/max limit perf
  cpufreq: Remove references to 10ms min sampling rate
  cpufreq: intel_pstate: Update default EPPs for Meteor Lake
  ...
2024-03-13 11:40:06 -07:00
Rafael J. Wysocki
3bd834640b Merge branch 'pm-em'
Merge Energy Model changes for 6.9-rc1:

 - Allow the Energy Model to be updated dynamically (Lukasz Luba).

* pm-em: (24 commits)
  PM: EM: Fix nr_states warnings in static checks
  Documentation: EM: Update with runtime modification design
  PM: EM: Add em_dev_compute_costs()
  PM: EM: Remove old table
  PM: EM: Change debugfs configuration to use runtime EM table data
  drivers/thermal/devfreq_cooling: Use new Energy Model interface
  drivers/thermal/cpufreq_cooling: Use new Energy Model interface
  powercap/dtpm_devfreq: Use new Energy Model interface to get table
  powercap/dtpm_cpu: Use new Energy Model interface to get table
  PM: EM: Optimize em_cpu_energy() and remove division
  PM: EM: Support late CPUs booting and capacity adjustment
  PM: EM: Add performance field to struct em_perf_state and optimize
  PM: EM: Add em_perf_state_from_pd() to get performance states table
  PM: EM: Introduce em_dev_update_perf_domain() for EM updates
  PM: EM: Add functions for memory allocations for new EM tables
  PM: EM: Use runtime modified EM for CPUs energy estimation in EAS
  PM: EM: Introduce runtime modifiable table
  PM: EM: Split the allocation and initialization of the EM table
  PM: EM: Check if the get_cost() callback is present in em_compute_costs()
  PM: EM: Introduce em_compute_costs()
  ...
2024-03-11 15:59:51 +01:00
Yang Li
44c9cf9aaa powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function
The existing comment block above the dtpm_create_hierarchy function
does not conform to the kernel-doc standard. This patch fixes the
documentation to match the expected kernel-doc format, which includes
a structured documentation header with param and return value.
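For reference, a kernel-doc function header generally has the shape shown below. It starts with a `name() - summary` line, continues with one `@param:` line per argument, and ends with a `Return:` section. The function dtpm_sketch_scale() is hypothetical and exists only to carry the comment. This is not the actual dtpm_create_hierarchy() documentation.

```c
#include <errno.h>

/**
 * dtpm_sketch_scale() - Scale a per-CPU power value by a number of CPUs
 * @power_uw: Power of a single CPU, in micro-Watts
 * @nr_cpus: Number of CPUs sharing the power domain
 *
 * Hypothetical function used only to illustrate the kernel-doc layout.
 *
 * Return: The scaled power in micro-Watts, or -EINVAL if @nr_cpus is not
 * positive.
 */
static long long dtpm_sketch_scale(long long power_uw, int nr_cpus)
{
	if (nr_cpus <= 0)
		return -EINVAL;
	return power_uw * nr_cpus;
}
```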

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-01 21:39:10 +01:00
Daniel Lezcano
b50155cb0d powercap: dtpm_cpu: Fix error check against freq_qos_add_request()
The caller of the function freq_qos_add_request() checks against a nonzero
value, but freq_qos_add_request() can return '1' if the request
already exists. Therefore, the setup function fails even though the QoS
request did not actually fail.

Fix that by changing the check against a negative value like all the
other callers of the function.
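The difference between the two checks can be sketched in userspace. mock_freq_qos_add_request() is a hypothetical stand-in that returns 0 or 1 on success and a negative errno on failure, as the commit describes. It is not the kernel function, and the two setup helpers are illustrative only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for freq_qos_add_request(): 0 or 1 on success,
 * negative errno on failure. */
static int mock_freq_qos_add_request(bool fail, bool returns_one)
{
	if (fail)
		return -EINVAL;
	return returns_one ? 1 : 0;
}

/* Buggy pattern: any non-zero value is treated as a failure, so a
 * successful return of 1 is propagated as if it were an error. */
static int setup_nonzero_check(bool fail, bool returns_one)
{
	int ret = mock_freq_qos_add_request(fail, returns_one);

	if (ret)
		return ret;
	return 0;
}

/* Fixed pattern: only negative values are failures. */
static int setup_negative_check(bool fail, bool returns_one)
{
	int ret = mock_freq_qos_add_request(fail, returns_one);

	if (ret < 0)
		return ret;
	return 0;
}
```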

Fixes: 0e8f68d7f0 ("Add CPU energy model based support")
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-16 20:02:31 +01:00
Thomas Gleixner
bd745d1c41 x86/cpu/topology: Rename topology_max_die_per_package()
The plural of die is dies.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210253.065874205@linutronix.de
2024-02-15 22:07:45 +01:00
Sumeet Pawnikar
4add6e841a powercap: intel_rapl: Add support for Arrow Lake
Add support for Arrow Lake platform to the RAPL common driver.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13 17:31:48 +01:00
Zhang Rui
876ed77fbe powercap: intel_rapl: Add support for Lunar Lake-M platform
Add support for Lunar Lake-M platform to the RAPL common driver.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13 17:31:48 +01:00
Zhang Rui
903eb9fb85 powercap: intel_rapl_tpmi: Fix System Domain probing
Only domain root packages can enumerate the System (Psys) domain.
Whether a package is a domain root is indicated by bit 0 of the
Domain Info register.

Add support for the Domain Info register and fix the System domain probing
accordingly.
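The root check amounts to testing bit 0 of the Domain Info value before enumerating Psys. The macro and helper names below are hypothetical, and the sketch does not reproduce how the TPMI driver actually reads the register.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro name: bit 0 of Domain Info marks a domain root package. */
#define SKETCH_DOMAIN_INFO_ROOT (UINT64_C(1) << 0)

static bool sketch_is_domain_root(uint64_t domain_info)
{
	return (domain_info & SKETCH_DOMAIN_INFO_ROOT) != 0;
}

/* Count the packages that would enumerate a System (Psys) domain when
 * only domain root packages are allowed to do so. */
static size_t sketch_count_psys(const uint64_t *domain_info, size_t npkgs)
{
	size_t n = 0;

	for (size_t i = 0; i < npkgs; i++)
		if (sketch_is_domain_root(domain_info[i]))
			n++;
	return n;
}
```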

Fixes: 9eef7f9da9 ("powercap: intel_rapl: Introduce RAPL TPMI interface driver")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.5+ <stable@vger.kernel.org> # 6.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13 17:31:48 +01:00
Zhang Rui
faa9130ce7 powercap: intel_rapl_tpmi: Fix a register bug
Add the missing Domain Info register. This also fixes the bogus
definition of the Interrupt register.

Neither of these two registers was used previously.

Fixes: 9eef7f9da9 ("powercap: intel_rapl: Introduce RAPL TPMI interface driver")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.5+ <stable@vger.kernel.org> # 6.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13 17:31:48 +01:00
Zhang Rui
1aa09b9379 powercap: intel_rapl: Fix locking in TPMI RAPL
The RAPL framework uses CPU hotplug locking to protect the rapl_packages
list and rp->lead_cpu to guarantee that

 1. the RAPL package device is not unprobed and freed
 2. the cached rp->lead_cpu is always valid

for operations like powercap sysfs accesses.

The current RAPL APIs assume that they are called from CPU hotplug
callbacks, which hold the CPU hotplug lock, but the TPMI RAPL driver
invokes them from its .probe() function without acquiring the CPU
hotplug lock.

Fix the problem by providing both locked and lockless versions of RAPL
APIs.
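The "locked and lockless versions" pattern can be sketched as a pair of functions. The lockless one requires the caller to hold the lock already, as CPU hotplug callbacks do. The locked one takes the lock itself for callers such as a .probe() function. In this userspace sketch a pthread mutex stands in for the CPU hotplug lock (a read lock in the kernel), and all names are hypothetical rather than the functions the fix actually adds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NPKG 4

/* Userspace stand-in for the CPU hotplug lock. */
static pthread_mutex_t sketch_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static bool sketch_lock_held; /* crude substitute for a lockdep assertion */

struct sketch_pkg {
	bool in_use;
	int id;
	int lead_cpu;
};

static struct sketch_pkg sketch_packages[SKETCH_NPKG];

/* Lockless variant: the caller must already hold the lock.
 * Returns -1 if no package matches. */
static int sketch_lead_cpu_locked(int id)
{
	assert(sketch_lock_held);
	for (size_t i = 0; i < SKETCH_NPKG; i++)
		if (sketch_packages[i].in_use && sketch_packages[i].id == id)
			return sketch_packages[i].lead_cpu;
	return -1;
}

/* Locked variant: takes the lock around the lockless variant, so the
 * package list and the cached lead_cpu cannot change underneath it. */
static int sketch_lead_cpu(int id)
{
	int cpu;

	pthread_mutex_lock(&sketch_hotplug_lock);
	sketch_lock_held = true;
	cpu = sketch_lead_cpu_locked(id);
	sketch_lock_held = false;
	pthread_mutex_unlock(&sketch_hotplug_lock);
	return cpu;
}
```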

Fixes: 9eef7f9da9 ("powercap: intel_rapl: Introduce RAPL TPMI interface driver")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.5+ <stable@vger.kernel.org> # 6.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13 17:31:48 +01:00
Zhang Rui
2d1f5006ff powercap: intel_rapl: Fix a NULL pointer dereference
A NULL pointer dereference is triggered when probing the MMIO RAPL
driver on platforms with CPU ID not listed in intel_rapl_common CPU
model list.

This is because the intel_rapl_common module still probes on such
platforms even if 'defaults_msr' is not set after commit 1488ac990a
("powercap: intel_rapl: Allow probing without CPUID match"). Thus the
MMIO RAPL rp->priv->defaults is NULL when registering with the RAPL framework.

Fix the problem by adding a sanity check to ensure rp->priv->rapl_defaults
is always valid.
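A minimal sketch of such a sanity check, assuming simplified hypothetical structures that only loosely mirror the rp->priv->defaults relationship described above. The -ENODEV error code is chosen for illustration and is not taken from the actual fix.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified structures. */
struct sketch_defaults {
	unsigned int power_unit_bits;
};

struct sketch_priv {
	const struct sketch_defaults *defaults; /* NULL if no CPU model matched */
};

struct sketch_package {
	struct sketch_priv *priv;
	unsigned int power_unit_bits;
};

/* Refuse to register instead of dereferencing a NULL defaults pointer. */
static int sketch_register_package(struct sketch_package *rp)
{
	if (!rp->priv || !rp->priv->defaults)
		return -ENODEV; /* error code chosen for illustration */

	rp->power_unit_bits = rp->priv->defaults->power_unit_bits;
	return 0;
}
```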

Fixes: 1488ac990a ("powercap: intel_rapl: Allow probing without CPUID match")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.5+ <stable@vger.kernel.org> # 6.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13 17:31:48 +01:00
Lukasz Luba
27d2c37e7d powercap/dtpm_devfreq: Use new Energy Model interface to get table
The Energy Model framework supports runtime modification of the power
values. Use the new EM table API, which is protected with RCU. Align the
code so that this RCU read section is short.

This change is not expected to alter the general functionality.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08 15:00:31 +01:00
Lukasz Luba
e20b7a8172 powercap/dtpm_cpu: Use new Energy Model interface to get table
The Energy Model framework supports runtime modification of the power
values. Use the new EM table API, which is protected with RCU. Align the
code so that this RCU read section is short.

This change is not expected to alter the general functionality.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08 15:00:31 +01:00
Lukasz Luba
bdefd9913b powercap: DTPM: Fix missing cpufreq_cpu_put() calls
The policy returned by cpufreq_cpu_get() has to be released with
the help of cpufreq_cpu_put() to balance its kobject reference counter
properly.

Add the missing calls to cpufreq_cpu_put() in the code.
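The rule is that every successful get must be balanced by exactly one put on every exit path, including early returns. The sketch below models this with a plain counter. sketch_policy_get() and sketch_policy_put() are hypothetical stand-ins for cpufreq_cpu_get() and cpufreq_cpu_put(), which in the kernel manage a kobject reference on struct cpufreq_policy.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real cpufreq policy structure. */
struct sketch_policy {
	int refcount;
	unsigned int max_khz;
};

static struct sketch_policy *sketch_policy_get(struct sketch_policy *p)
{
	if (p)
		p->refcount++;
	return p;
}

static void sketch_policy_put(struct sketch_policy *p)
{
	if (p)
		p->refcount--;
}

/* Every successful get is balanced by a put, including on early returns. */
static int sketch_read_max_khz(struct sketch_policy *src, unsigned int *out)
{
	struct sketch_policy *policy = sketch_policy_get(src);

	if (!policy)
		return -1; /* nothing was taken, so nothing to put */

	if (policy->max_khz == 0) {
		sketch_policy_put(policy); /* easy to forget on error paths */
		return -1;
	}

	*out = policy->max_khz;
	sketch_policy_put(policy);
	return 0;
}
```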

Fixes: 0aea2e4ec2 ("powercap/dtpm_cpu: Reset per_cpu variable in the release function")
Fixes: 0e8f68d7f0 ("powercap/drivers/dtpm: Add CPU energy model based support")
Cc: v5.16+ <stable@vger.kernel.org> # v5.16+
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-05 20:51:24 +01:00
Lukasz Luba
b817f1488f powercap: DTPM: Fix unneeded conversions to micro-Watts
The power values coming from the Energy Model are already in uW.

The PowerCap and DTPM frameworks operate on uW, so all places should
just use the values from the EM.
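To see the size of the error, assume the leftover conversions were mW-to-uW multiplications by 1000 applied to values that were already in uW. In that case a 250 mW (250,000 uW) per-CPU value on 4 CPUs should give 1,000,000 uW (1 W), but the double-scaled result is 1,000,000,000 uW (1 kW). The structure and helper names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical EM entry; per the commit, EM power values are already in uW. */
struct sketch_em_state {
	uint64_t power_uw;
};

/* Buggy: scales a uW value by 1000 as if it were mW, inflating it 1000x. */
static uint64_t sketch_limit_double_scaled(const struct sketch_em_state *s,
					   unsigned int nr_cpus)
{
	return s->power_uw * 1000 * nr_cpus;
}

/* Fixed: use the EM value as-is, since PowerCap and DTPM also work in uW. */
static uint64_t sketch_limit_uw(const struct sketch_em_state *s,
				unsigned int nr_cpus)
{
	return s->power_uw * nr_cpus;
}
```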

Fix the code by removing all of the remaining conversions to uW.

Fixes: ae6ccaa650 (PM: EM: convert power field to micro-Watts precision and align drivers)
Cc: 5.19+ <stable@vger.kernel.org> # v5.19+
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-28 15:15:14 +01:00
Ville Syrjälä
a60ec4485f powercap: intel_rapl: Downgrade BIOS locked limits pr_warn() to pr_debug()
Before the refactoring, the pr_warn() only triggered when
someone explicitly tried to write to a BIOS locked limit.
After the refactoring, the warning also triggers during
system resume. The user can't do anything about this, so
printing scary warnings doesn't make sense.

Keep the printk but make it pr_debug() instead of pr_warn()
to make it clear it's not a serious issue.

Fixes: 9050a9cd5e ("powercap: intel_rapl: Cleanup Power Limits support")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: 6.5+ <stable@vger.kernel.org> # 6.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-24 22:07:07 +02:00
Srinivas Pandruvada
081690e941 powercap: intel_rapl: Fix invalid setting of Power Limit 4
The system runs at minimum performance once the powercap RAPL package
domain enabled flag is changed from 1 to 0 and back to 1.

Setting the RAPL package domain enabled flag to 0 results in setting the
power limit 4 (PL4) MSR 0x601 to 0, which disables the PL4 limit. The PL4
limit controls the peak power, so setting it to 0 results in undesirable
performance behavior that depends on the hardware implementation.

Even worse, setting the enabled flag to 1 again sets the PL4 MSR value to
0x01, which reduces the peak power to 0.125W. This forces the system to
run at the lowest possible performance on every system that supports PL4.

Setting the enabled flag should only affect the "enable" bit, not other
bits. Here, it changes the power limit instead.

This is caused by a change which assumes that there is an enable bit in
the PL4 MSR like in the other power limits. Although a PL4 enable/disable
bit is present with the TPMI RAPL interface, it is not present with the
MSR interface.

There is a rapl_primitive_info defined for the nonexistent PL4 enable bit
and then it is used with the commit 9050a9cd5e ("powercap: intel_rapl:
Cleanup Power Limits support") to enable PL4. This is wrong, hence remove
this rapl primitive for PL4. Also in the function
rapl_detect_powerlimit(), PL_ENABLE is used to check for the presence of
power limits. Replace PL_ENABLE with PL_LIMIT, as PL_LIMIT must be
present. Without this change, PL4 controls will not be available in the
sysfs once rapl primitive for PL4 is removed.
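The two rules above can be sketched as follows. Detect a power limit by its limit field, and let toggling "enabled" touch only the enable bit, doing nothing when the limit has no enable bit. The descriptor type, masks, and helper names are hypothetical and illustrative, not the real MSR layout or rapl_primitive_info. The unit conversion assumes a power unit of 1/8 W, which is what the 0x01 = 0.125W figure above implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical power-limit descriptor: a mask of 0 means "field absent". */
struct sketch_pl {
	uint64_t limit_mask;
	uint64_t enable_mask;
};

/* Per the commit, PL4 over the MSR interface has a limit but no enable bit. */
static const struct sketch_pl sketch_pl4_msr = { 0x1fff, 0 };

/* Detect a power limit by its LIMIT field, which must be present. */
static bool sketch_pl_present(const struct sketch_pl *pl)
{
	return pl->limit_mask != 0;
}

/* Touch only the enable bit; if there is none, leave the register alone. */
static uint64_t sketch_pl_set_enable(const struct sketch_pl *pl, uint64_t reg,
				     bool on)
{
	if (!pl->enable_mask)
		return reg;
	return on ? (reg | pl->enable_mask) : (reg & ~pl->enable_mask);
}

/* Convert a raw limit to mW, given a power unit of 1/2^unit_bits W. */
static uint64_t sketch_raw_to_mw(uint64_t raw, unsigned int unit_bits)
{
	return (raw * 1000) >> unit_bits;
}
```

With unit_bits = 3, sketch_raw_to_mw(0x01, 3) gives 125 mW, matching the 0.125W peak power described above.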

Fixes: 9050a9cd5e ("powercap: intel_rapl: Cleanup Power Limits support")
Suggested-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: 6.5+ <stable@vger.kernel.org> # 6.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-06 22:21:22 +02:00
Linus Torvalds
ccc5e98177 Power management updates for 6.6-rc1

Merge tag 'pm-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These rework cpuidle governors to call tick_nohz_get_sleep_length()
  less often and fix one of them, rework hibernation to avoid storing
  pages filled with zeros in hibernation images, switch over some
  cpufreq drivers to use void remove callbacks, fix and clean up
  multiple cpufreq drivers, fix the devfreq core, update the cpupower
  utility and make other assorted improvements.

  Specifics:

   - Rework the menu and teo cpuidle governors to avoid calling
     tick_nohz_get_sleep_length(), which is likely to become quite
     expensive going forward, too often and improve making decisions
     regarding whether or not to stop the scheduler tick in the teo
     governor (Rafael Wysocki)

   - Improve the performance of cpufreq_stats_create_table() in some
     cases (Liao Chang)

   - Fix two issues in the amd-pstate-ut cpufreq driver (Swapnil Sapkal)

   - Use clamp() helper macro to improve the code readability in
     cpufreq_verify_within_limits() (Liao Chang)

   - Set stale CPU frequency to minimum in intel_pstate (Doug Smythies)

   - Migrate cpufreq drivers for various platforms to use void remove
     callback (Yangtao Li)

   - Add online/offline/exit hooks for Tegra driver (Sumit Gupta)

   - Explicitly include correct DT includes in cpufreq (Rob Herring)

   - Frequency domain updates for qcom-hw driver (Neil Armstrong)

   - Modify the AMD pstate driver to return the highest_perf value (Meng Li)

   - Generic cleanups for cppc, mediatek and powernow driver (Liao
     Chang, Konrad Dybcio)

   - Add more platforms to cpufreq-arm driver's blocklist
     (AngeloGioacchino Del Regno and Konrad Dybcio)

   - brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva)

   - Add device PM helpers to allow a device to remain powered-on during
     system-wide transitions (Ulf Hansson)

   - Rework hibernation memory snapshotting to avoid storing pages
     filled with zeros in hibernation image files (Brian Geffon)

   - Add check to make sure that CPU latency QoS constraints do not use
     negative values (Clive Lin)

   - Optimize rp->domains memory allocation in the Intel RAPL power
     capping driver (xiongxin)

   - Remove recursion while parsing zones in the arm_scmi power capping
     driver (Cristian Marussi)

   - Fix memory leak in devfreq_dev_release() (Boris Brezillon)

   - Rewrite devfreq_monitor_start() kerneldoc comment (Manivannan
     Sadhasivam)

   - Explicitly include correct DT includes in devfreq (Rob Herring)

   - Remove unused pm_runtime_update_max_time_suspended() extern
     declaration (YueHaibing)

   - Add turbo-boost support to cpupower (Wyes Karny)

   - Add support for amd_pstate mode change to cpupower (Wyes Karny)

   - Fix 'cpupower idle_set' command to accept only numeric values of
     arguments (Likhitha Korrapati)

   - Clean up OPP code and add new frequency related APIs to it (Viresh
     Kumar, Manivannan Sadhasivam)

   - Convert ti cpufreq/opp bindings to json schema (Nishanth Menon)"

* tag 'pm-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
  cpufreq: tegra194: remove opp table in exit hook
  cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
  cpufreq: tegra194: add online/offline hooks
  cpuidle: teo: Avoid unnecessary variable assignments
  cpufreq: qcom-cpufreq-hw: add support for 4 freq domains
  dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain
  cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver
  cpufreq: amd-pstate-ut: Remove module parameter access
  cpufreq: Use clamp() helper macro to improve the code readability
  PM: sleep: Add helpers to allow a device to remain powered-on
  PM: QoS: Add check to make sure CPU latency is non-negative
  PM: runtime: Remove unsued extern declaration of pm_runtime_update_max_time_suspended()
  cpufreq: intel_pstate: set stale CPU frequency to minimum
  cpufreq: stats: Improve the performance of cpufreq_stats_create_table()
  dt-bindings: cpufreq: Convert ti-cpufreq to json schema
  dt-bindings: opp: Convert ti-omap5-opp-supply to json schema
  OPP: Fix argument name in doc comment
  cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases
  cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie
  cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.
  ...
2023-08-28 18:04:39 -07:00
Linus Torvalds
1a7c611546 Perf events changes for v6.6:
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event updates from Ingo Molnar:

 - AMD IBS improvements

 - Intel PMU driver updates

 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events

 - Micro-optimize software events and the ring-buffer code

 - Misc cleanups & fixes

* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
  perf/x86/intel: Add Crestmont PMU
  x86/cpu: Update Hybrids
  x86/cpu: Fix Crestmont uarch
  x86/cpu: Fix Gracemont uarch
  perf: Remove unused extern declaration arch_perf_get_page_size()
  perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
  locking/arch: Avoid variable shadowing in local_try_cmpxchg()
  perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
  perf/x86: Use local64_try_cmpxchg
  perf/amd: Prevent grouping of IBS events
2023-08-28 16:35:01 -07:00
Peter Zijlstra
882cdb06b6 x86/cpu: Fix Gracemont uarch
Alder Lake N is an E-core only product using the Gracemont
micro-architecture. It fits the pre-existing naming scheme perfectly
fine, so adhere to it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230807150405.686834933@infradead.org
2023-08-09 21:51:06 +02:00
Rafael J. Wysocki
9784239529 Merge back earlier power capping changes for v6.6. 2023-08-04 22:48:58 +02:00
xiongxin
2fa00769b1 powercap: intel_rapl: Optimize rp->domains memory allocation
rapl_detect_domains() allocates rp->domains with room for one more
struct rapl_domain than is needed. Optimize the allocation to save
sizeof(struct rapl_domain) bytes of memory.

Tested on an Intel NUC (i5-1135G7).
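A userspace sketch of the sizing rule, assuming the detected domains are tracked as a bit mask and that the array should hold exactly one element per detected domain. `struct domain_sketch`, `alloc_domains()` and the mask layout are hypothetical, and the GCC/Clang builtin `__builtin_popcount()` stands in for the kernel's bit-counting helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct rapl_domain. */
struct domain_sketch {
	int id;
	unsigned long long energy_raw;
};

/*
 * Allocate exactly one entry per bit set in domain_map, with no spare
 * trailing element, and record each domain's bit position as its id.
 * Returns NULL (leaving *nr_domains untouched) on allocation failure.
 */
struct domain_sketch *alloc_domains(unsigned int domain_map, int *nr_domains)
{
	int n = __builtin_popcount(domain_map);	/* kernel code could use hweight32() */
	struct domain_sketch *d = calloc(n > 0 ? (size_t)n : 1, sizeof(*d));
	int k = 0;

	if (!d)
		return NULL;
	for (int bit = 0; bit < 32; bit++)
		if (domain_map & (1u << bit))
			d[k++].id = bit;
	assert(k == n);
	*nr_domains = n;
	return d;
}
```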

Signed-off-by: xiongxin <xiongxin@kylinos.cn>
Tested-by: xiongxin <xiongxin@kylinos.cn>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-01 13:54:00 +02:00
Zhang Rui
16e95a62ee powercap: intel_rapl: Fix a sparse warning in TPMI interface
Depending on the interface used, the RAPL registers can be either MSR
indexes or memory mapped IO addresses. The current RAPL common code uses
u64 to save both MSR and memory mapped IO registers. As a result,
handling a register address with an __iomem annotation triggers a
sparse warning like the one below:

sparse warnings: (new ones prefixed by >>)
>> drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected unsigned long long [usertype] *tpmi_rapl_regs @@     got void [noderef] __iomem * @@
   drivers/powercap/intel_rapl_tpmi.c:141:41: sparse:     expected unsigned long long [usertype] *tpmi_rapl_regs
   drivers/powercap/intel_rapl_tpmi.c:141:41: sparse:     got void [noderef] __iomem *

Fix the problem by using a union to save the registers instead.
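A plain-C sketch of the union approach. The names `union rapl_reg_sketch`, `.mmio`, `.msr` and `.val` are assumptions for illustration; in the kernel the MMIO member would also carry the `__iomem` annotation that sparse checks, which has no userspace equivalent here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One storage slot that holds either an MSR index or an MMIO address,
 * so an MMIO pointer no longer has to be squeezed through a plain u64.
 */
union rapl_reg_sketch {
	void *mmio;	/* kernel: void __iomem *mmio */
	uint32_t msr;	/* MSR index for the MSR interface */
	uint64_t val;	/* raw view, e.g. for zero checks */
};

/* Interface-specific accessors only ever touch their own member. */
uint64_t read_mmio_reg(union rapl_reg_sketch reg)
{
	return *(volatile uint64_t *)reg.mmio;
}

uint32_t msr_index(union rapl_reg_sketch reg)
{
	return reg.msr;
}
```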

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307031405.dy3druuy-lkp@intel.com/
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-01 13:45:08 +02:00
Cristian Marussi
3e767d6850 powercap: arm_scmi: Remove recursion while parsing zones
Powercap zones can be arranged in a hierarchy of trees. When zones are
registered with powercap_register_zone(), the kernel powercap subsystem
expects registration to proceed from the root zones down to the leaves;
conversely, de-registration by powercap_deregister_zone() must begin
from the leaf zones.

Available SCMI powercap zones are retrieved dynamically from the platform
at probe time and, while any hierarchy defined between the zones is
described properly in the zones descriptor, the platform returns the
available zones in no particular well-defined order: as a consequence,
the trees possibly composing the hierarchy of zones have to be walked
so that the retrieved zones are registered starting from the roots.

Currently the ARM SCMI Powercap driver walks the zones using a recursive
algorithm; this approach, even though correct and tested, can lead to a
kernel stack overflow when processing a returned hierarchy of zones
composed of particularly deep trees.

Avoid possible kernel stack overflow by substituting the recursive approach
with an iterative one supported by a dynamically allocated stack-like data
structure.
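A minimal sketch of an iterative, root-first walk with an explicitly allocated stack, assuming each zone records the index of its parent (-1 for a root). `zones_register_order()` is hypothetical and only computes an order; it does not call powercap_register_zone() and is not the driver's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * parent[i] is the index of zone i's parent, or -1 for a root zone.
 * Fill order[] so that every parent precedes its children, walking the
 * trees with a heap-allocated stack instead of recursion, so a deep
 * hierarchy no longer grows the call stack.
 * Returns the number of zones placed, or -1 if the allocation fails.
 */
int zones_register_order(const int *parent, int n, int *order)
{
	int *stack = malloc(sizeof(*stack) * (size_t)(n > 0 ? n : 1));
	int top = 0, placed = 0;

	if (!stack)
		return -1;

	for (int root = 0; root < n; root++) {
		if (parent[root] != -1)
			continue;
		stack[top++] = root;
		while (top > 0) {
			int z = stack[--top];

			/* z's parent was placed earlier, so z can be "registered" now. */
			order[placed++] = z;
			for (int c = 0; c < n; c++) {
				if (parent[c] == z) {
					assert(top < n);	/* each zone is pushed at most once */
					stack[top++] = c;
				}
			}
		}
	}
	free(stack);
	return placed;
}
```

De-registration can then walk `order[]` from the last entry to the first, which visits every child before its parent.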

Fixes: b55eef5226 ("powercap: arm_scmi: Add SCMI Powercap based driver")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-20 20:27:10 +02:00
Linus Torvalds
e4c8d01865 ARM: SoC drivers for 6.5
Nothing surprising in the SoC specific drivers, with the usual updates:
 
  * Added or improved SoC driver support for Tegra234, Exynos4121, RK3588,
    as well as multiple Mediatek and Qualcomm chips
 
  * SCMI firmware gains support for multiple SMC/HVC transport and version
    3.2 of the protocol
 
  * Cleanups and minor changes for the reset controller, memory controller,
    firmware and sram drivers
 
  * Minor changes to amd/xilinx, samsung, tegra, nxp, ti, qualcomm,
    amlogic and renesas SoC specific drivers

Merge tag 'soc-drivers-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "Nothing surprising in the SoC specific drivers, with the usual
  updates:

   - Added or improved SoC driver support for Tegra234, Exynos4121,
     RK3588, as well as multiple Mediatek and Qualcomm chips

   - SCMI firmware gains support for multiple SMC/HVC transport and
     version 3.2 of the protocol

   - Cleanups and minor changes for the reset controller, memory
     controller, firmware and sram drivers

   - Minor changes to amd/xilinx, samsung, tegra, nxp, ti, qualcomm,
     amlogic and renesas SoC specific drivers"

* tag 'soc-drivers-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (118 commits)
  dt-bindings: interrupt-controller: Convert Amlogic Meson GPIO interrupt controller binding
  MAINTAINERS: add PHY-related files to Amlogic SoC file list
  drivers: meson: secure-pwrc: always enable DMA domain
  tee: optee: Use kmemdup() to replace kmalloc + memcpy
  soc: qcom: geni-se: Do not bother about enable/disable of interrupts in secondary sequencer
  dt-bindings: sram: qcom,imem: document qdu1000
  soc: qcom: icc-bwmon: Fix MSM8998 count unit
  dt-bindings: soc: qcom,rpmh-rsc: Require power-domains
  soc: qcom: socinfo: Add Soc ID for IPQ5300
  dt-bindings: arm: qcom,ids: add SoC ID for IPQ5300
  soc: qcom: Fix a IS_ERR() vs NULL bug in probe
  soc: qcom: socinfo: Add support for new fields in revision 19
  soc: qcom: socinfo: Add support for new fields in revision 18
  dt-bindings: firmware: scm: Add compatible for SDX75
  soc: qcom: mdt_loader: Fix split image detection
  dt-bindings: memory-controllers: drop unneeded quotes
  soc: rockchip: dtpm: use C99 array init syntax
  firmware: tegra: bpmp: Add support for DRAM MRQ GSCs
  soc/tegra: pmc: Use devm_clk_notifier_register()
  soc/tegra: pmc: Simplify debugfs initialization
  ...
2023-06-29 15:22:19 -07:00
Dan Carpenter
49776c712e powercap: RAPL: Fix a NULL vs IS_ERR() bug
The devm_ioremap_resource() function returns error pointers on error,
it never returns NULL.  Update the check accordingly.
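The pattern can be shown with a userspace re-creation of the error-pointer helpers from `include/linux/err.h`. `fake_ioremap()` and `probe_sketch()` are hypothetical stand-ins for devm_ioremap_resource() and the driver's probe path, and the sketch assumes a Linux target where `unsigned long` is pointer-sized.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses encode negative errno values. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for devm_ioremap_resource(): never returns NULL. */
static char fake_regs[64];
void *fake_ioremap(bool fail)
{
	return fail ? ERR_PTR(-ENOMEM) : fake_regs;
}

/* Correct caller pattern: test with IS_ERR(), propagate with PTR_ERR(). */
int probe_sketch(bool fail, void **out)
{
	void *base = fake_ioremap(fail);

	if (IS_ERR(base))	/* a "!base" check would never trigger here */
		return (int)PTR_ERR(base);
	*out = base;
	return 0;
}
```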

Fixes: 9eef7f9da9 ("powercap: intel_rapl: Introduce RAPL TPMI interface driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 19:51:21 +02:00
Zhang Rui
4658fe81b3 powercap: RAPL: Fix CONFIG_IOSF_MBI dependency
After commit 3382388d71 ("intel_rapl: abstract RAPL common code"),
the IOSF_MBI interface is accessed from the RAPL common code.

Thus it is CONFIG_INTEL_RAPL_CORE that depends on CONFIG_IOSF_MBI,
while CONFIG_INTEL_RAPL_MSR does not.

This problem was not exposed previously because all of the previous
users of the RAPL common code, i.e. the RAPL MSR and MMIO I/F drivers,
have CONFIG_IOSF_MBI selected.

Fix the CONFIG_IOSF_MBI dependency in the RAPL code. This also fixes a
build-time failure when the RAPL TPMI I/F driver is introduced without
selecting CONFIG_IOSF_MBI.

x86_64-linux-ld: vmlinux.o: in function `set_floor_freq_atom':
intel_rapl_common.c:(.text+0x2dac9b8): undefined reference to `iosf_mbi_write'
x86_64-linux-ld: intel_rapl_common.c:(.text+0x2daca66): undefined reference to `iosf_mbi_read'

Reference to iosf_mbi.h is also removed from the RAPL MSR I/F driver.

Fixes: 3382388d71 ("intel_rapl: abstract RAPL common code")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20230601213246.3271412-1-arnd@kernel.org
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 19:48:40 +02:00
Sumeet Pawnikar
d05b5e0baf powercap: RAPL: fix invalid initialization for pl4_supported field
The current initialization of struct x86_cpu_id via pl4_support_ids[]
is partial and wrong: it initializes the "stepping" field with
X86_FEATURE_ANY instead of the "feature" field.

Use the X86_MATCH_INTEL_FAM6_MODEL macro instead of initializing
each field of struct x86_cpu_id for the pl4_supported list of CPUs.
X86_MATCH_INTEL_FAM6_MODEL internally uses another macro,
X86_MATCH_VENDOR_FAM_MODEL_FEATURE, which initializes the fields used
for x86 CPU matching with appropriate values.
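The difference between positional and designated initialization can be shown with a reduced struct. The field order is assumed to mirror struct x86_cpu_id (vendor, family, model, steppings, feature, driver_data); the constants and `MATCH_FAM6_MODEL_SKETCH()` are hypothetical, with distinct made-up values so that a misplaced field is observable.

```c
#include <assert.h>
#include <stdint.h>

/* Field order assumed to mirror struct x86_cpu_id. */
struct cpu_id_sketch {
	uint16_t vendor;
	uint16_t family;
	uint16_t model;
	uint16_t steppings;
	uint16_t feature;
	unsigned long driver_data;
};

/* Illustrative constants only; not the kernel's values. */
#define VENDOR_INTEL_SKETCH	0
#define FEATURE_ANY_SKETCH	0x7777
#define STEPPINGS_ANY_SKETCH	0x5555

/* Buggy style: positional init puts the 4th value into .steppings. */
static const struct cpu_id_sketch buggy = {
	VENDOR_INTEL_SKETCH, 6, 0x42, FEATURE_ANY_SKETCH
};

/* Fixed style: a macro that names every field it sets. */
#define MATCH_FAM6_MODEL_SKETCH(m) {				\
	.vendor = VENDOR_INTEL_SKETCH, .family = 6,		\
	.model = (m), .steppings = STEPPINGS_ANY_SKETCH,	\
	.feature = FEATURE_ANY_SKETCH }

static const struct cpu_id_sketch fixed = MATCH_FAM6_MODEL_SKETCH(0x42);
```

Because the macro names each field, a value can no longer silently land in the wrong slot.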

Reported-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/lkml/28ead36b-2d9e-1a36-6f4e-04684e420260@intel.com
Fixes: eb52bc2ae5 ("powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC")
Fixes: b08b95cf30 ("powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P")
Fixes: 5157559069 ("powercap: RAPL: Add Power Limit4 support for RaptorLake")
Fixes: 1cc5b9a411 ("powercap: Add Power Limit4 support for Alder Lake SoC")
Fixes: 8365a898fe ("powercap: Add Power Limit4 support")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 19:44:52 +02:00
Cristian Marussi
aaffb4cacd powercap: arm_scmi: Add support for disabling powercaps on a zone
Add support for disabling and enabling powercapping on a zone.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20230531152039.2363181-4-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2023-06-06 14:05:10 +01:00
Zhang Rui
9eef7f9da9 powercap: intel_rapl: Introduce RAPL TPMI interface driver
The TPMI (Topology Aware Register and PM Capsule Interface) provides a
flexible, extendable and PCIe enumerable MMIO interface for PM features.

Intel RAPL (Running Average Power Limit) is one of the features that
benefit from this. Using the TPMI interface has an advantage over the
traditional MSR (Model Specific Register) interface, where a thread needs
to be scheduled on the target CPU to read or write. Also, the RAPL
features vary between CPU models, hence a lot of model-specific code.
TPMI instead provides an architectural interface through hierarchical
tables and fields, which does not need any model-specific implementation.

The TPMI interface uses a PCI VSEC structure to expose the location of the MMIO
interface for PM feature enumeration and control.

The Intel VSEC driver parses VSEC structures present in the PCI
configuration space of the given device and creates an auxiliary device
object for each of them. In particular, it creates an auxiliary device
object representing TPMI that can be bound to by an auxiliary driver.

Then the TPMI enumeration driver binds to the TPMI auxiliary device
object created by the Intel VSEC driver, parses the PM Feature Structure
(PFS) present in the TPMI MMIO region and creates device nodes for PM
features described in the PFS.

This RAPL TPMI Interface driver binds to the RAPL auxiliary device
created by the TPMI enumeration driver and exposes the RAPL controls to
userspace via the powercap sysfs class.

RAPL TPMI details are published in the following document:
https://github.com/intel/tpmi_power_management/blob/main/RAPL_TPMI_public_disclosure_FINAL.docx

Note that, for now, the RAPL TPMI Interface and the RAPL MSR Interface
cannot co-exist on the same platform (the RAPL TPMI Interface is not
supported on any platform in the CPU model list of the RAPL MSR
Interface). Thus register the RAPL TPMI powercap control type with the
name "intel-rapl", the same as the RAPL MSR Interface, so that the change
is transparent to userspace.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
e12dee18b8 powercap: intel_rapl: Introduce core support for TPMI interface
Compared with the existing RAPL MSR/MMIO Interface, the RAPL TPMI Interface
1. has a register per Power Limit, and thus a Lock and an Enable bit per
   Power Limit.
2. does not have a Power Limit Clamp bit.
3. places the Power Limit Lock and Enable bits at different bit offsets.
This means that the RAPL TPMI Interface needs its own primitive information.

The RAPL TPMI Interface also has a per-domain unit register, but with a
different register layout. This requires a TPMI-specific rapl_defaults
callback to decode the unit register.

Introduce the RAPL core support for TPMI Interface.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
b4288ce788 powercap: intel_rapl: Introduce RAPL I/F type
Different RAPL Interfaces may have different primitive information and
rapl_defaults callbacks.

To better distinguish this difference in the RAPL framework code,
introduce a new enum to represent different types of RAPL Interfaces.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
bf44b9011d powercap: intel_rapl: Make cpu optional for rapl_package
MSR RAPL Interface always removes a rapl_package when all the CPUs in
that rapl_package are offlined. This is because it relies on an online
CPU to access the MSR.

But for a RAPL Interface using MMIO registers, when all the CPUs within
the rapl_package are offlined,
1. the registers can still be accessed
2. monitoring and setting the Power Limits for the rapl_package is still
   meaningful because of uncore power.

This means that a valid rapl_package does not rely on any of its CPUs
being online.

For this reason, make the CPU optional for rapl_package. A rapl_package
can be registered either with a CPU id representing the physical
package/die, or with the physical package id directly.
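A reduced sketch of the two ways a package can be looked up, assuming a fixed CPU-to-package table. `struct package_sketch`, `cpu_to_pkg[]` and `find_package()` are hypothetical; the point is only that the package-id path keeps working when `lead_cpu` is -1 because every CPU of that package is offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct rapl_package. */
struct package_sketch {
	int id;		/* physical package/die id, always valid */
	int lead_cpu;	/* -1 when no CPU of the package is online */
};

/* Hypothetical CPU -> package topology for a 2-package, 4-CPU system. */
static const int cpu_to_pkg[] = { 0, 0, 1, 1 };

/*
 * Look a package up either by a CPU id (MSR-style, needs that CPU's
 * topology) or directly by package id (usable with all CPUs offline).
 */
struct package_sketch *find_package(struct package_sketch *pkgs, int n,
				    int id, bool id_is_cpu)
{
	int pkg_id = id;
	int nr_cpus = (int)(sizeof(cpu_to_pkg) / sizeof(cpu_to_pkg[0]));

	if (id_is_cpu) {
		if (id < 0 || id >= nr_cpus)
			return NULL;
		pkg_id = cpu_to_pkg[id];
	}
	for (int i = 0; i < n; i++)
		if (pkgs[i].id == pkg_id)
			return &pkgs[i];
	return NULL;
}
```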

Note that the thermal throttling interrupt is not disabled via
MSR_IA32_PACKAGE_THERM_INTERRUPT for such rapl_packages at the moment.
If this is still needed in the future, it can be achieved by selecting
an online CPU using the physical package id.

Note that processor_thermal_rapl, the current MMIO RAPL Interface
driver, could also be converted to register using a package id instead.
This is not done right now because the processor_thermal_rapl driver
works on single-package systems only, where offlining the only package
will not happen. So keep the previous logic.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
693c1d7868 powercap: intel_rapl: Remove redundant cpu parameter
For rapl_packages that rely on online CPUs to work, rp->lead_cpu always
has a valid CPU id.

Remove the redundant cpu parameter in rapl_check_domain(),
rapl_detect_domains() and .check_unit() callbacks.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
f442bd2742 powercap: intel_rapl: Add support for lock bit per Power Limit
With the RAPL MSR/MMIO Interface, each RAPL domain has one Power Limit
register. Each Power Limit register has one lock bit, which tells the OS
whether the power limit register can be used.
Depending on the number of power limits supported by the power limit
register, the lock bit may apply to one or more power limits.

With the RAPL TPMI Interface, each RAPL domain has multiple Power Limits,
and each Power Limit has its own register with its own lock bit.

To handle this, introduce support for lock bit per Power Limit.

For existing RAPL MSR/MMIO Interfaces, the lock bit in the Power Limit
register applies to all the Power Limits controlled by this register.

Remove the per domain DOMAIN_STATE_BIOS_LOCKED flag at the same time
because it can be replaced by the per Power Limit lock.
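A sketch of per-Power-Limit lock state under both register layouts. The enum, struct and helper names are hypothetical; the MSR/MMIO case copies one lock bit to every Power Limit that the register controls, while the TPMI case sets each Power Limit's lock on its own.

```c
#include <assert.h>
#include <stdbool.h>

enum pl_sketch { PL1, PL2, PL4, NR_PL_SKETCH };

struct domain_lock_sketch {
	bool locked[NR_PL_SKETCH];	/* one lock state per Power Limit */
};

/*
 * MSR/MMIO style: one register and one lock bit, shared by every Power
 * Limit that register controls (given as a bit mask of PL ids).
 */
void set_shared_lock(struct domain_lock_sketch *d, unsigned int pls_in_reg,
		     bool lock_bit)
{
	for (int pl = 0; pl < NR_PL_SKETCH; pl++)
		if (pls_in_reg & (1u << pl))
			d->locked[pl] = lock_bit;
}

/* TPMI style: each Power Limit has its own register and lock bit. */
void set_own_lock(struct domain_lock_sketch *d, enum pl_sketch pl,
		  bool lock_bit)
{
	d->locked[pl] = lock_bit;
}

/* Callers ask per Power Limit instead of checking a per-domain flag. */
bool pl_is_locked(const struct domain_lock_sketch *d, enum pl_sketch pl)
{
	return d->locked[pl];
}
```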

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
9050a9cd5e powercap: intel_rapl: Cleanup Power Limits support
The same set of operations is shared by different Power Limits,
including Power Limit get/set, Power Limit enable/disable, clamping
enable/disable, time window get/set, max power get/set, etc.

But the same operation uses different primitives for different Power
Limits because they use different registers/register bits.

A lot of dirty/duplicate code was introduced to handle this difference.

Introduce a universal way to issue Power Limit operations.
Instead of using hardcoded primitive names directly, use the Power Limit
id + operation type, and hide all the Power Limit differences in one
central place, get_pl_prim(). Two helpers, rapl_read_pl_data() and
rapl_write_pl_data(), are introduced at the same time to simplify the
code for issuing Power Limit operations.
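A minimal sketch of a central (Power Limit id, operation) to primitive mapping with one generic read helper. The enum values, `get_pl_prim_sketch()`, `read_pl_data_sketch()` and the backing `prim_vals[]` array are hypothetical reductions, not the driver's definitions; which combinations return PRIM_NONE here is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Power Limit ids double as the first primitives. */
enum prim_sketch {
	POWER_LIMIT1, POWER_LIMIT2, POWER_LIMIT4,
	PL1_ENABLE, PL2_ENABLE, PL4_ENABLE,
	PL1_CLAMP, PL2_CLAMP,
	TIME_WINDOW1, TIME_WINDOW2,
	NR_PRIM_SKETCH,
	PRIM_NONE = -1
};

enum pl_op_sketch { PL_LIMIT, PL_ENABLE, PL_CLAMP, PL_TIME_WINDOW };

/* Central place that knows which primitive backs (Power Limit, op). */
int get_pl_prim_sketch(int pl, enum pl_op_sketch op)
{
	if (pl < POWER_LIMIT1 || pl > POWER_LIMIT4)
		return PRIM_NONE;

	switch (op) {
	case PL_LIMIT:
		return pl;	/* PL ids are primitive ids */
	case PL_ENABLE:
		return pl == POWER_LIMIT1 ? PL1_ENABLE :
		       pl == POWER_LIMIT2 ? PL2_ENABLE : PL4_ENABLE;
	case PL_CLAMP:		/* this sketch models no PL4 clamp */
		return pl == POWER_LIMIT1 ? PL1_CLAMP :
		       pl == POWER_LIMIT2 ? PL2_CLAMP : PRIM_NONE;
	case PL_TIME_WINDOW:	/* nor a PL4 time window */
		return pl == POWER_LIMIT1 ? TIME_WINDOW1 :
		       pl == POWER_LIMIT2 ? TIME_WINDOW2 : PRIM_NONE;
	}
	return PRIM_NONE;
}

/* Hypothetical primitive storage standing in for register accesses. */
static uint64_t prim_vals[NR_PRIM_SKETCH];

/* One helper for every Power Limit instead of per-PL copies. */
int read_pl_data_sketch(int pl, enum pl_op_sketch op, uint64_t *out)
{
	int prim = get_pl_prim_sketch(pl, op);

	if (prim < 0)
		return -1;
	*out = prim_vals[prim];
	return 0;
}
```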

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
a38f300bb2 powercap: intel_rapl: Use bitmap for Power Limits
Currently, a RAPL package is registered with the number of Power Limits
supported in each RAPL domain. But this doesn't tell which Power Limits
are available. Using the number of Power Limits supported to guess the
availability of each Power Limit is fragile.

Use a bitmap to represent the availability of each Power Limit.

Note that PL1 is mandatory, so it does not need to be set explicitly by
the RAPL Interface drivers.
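A sketch of bitmap-based availability, with hypothetical names. PL1 is forced into the mask as described above, and the example registers PL1 and PL4 without PL2, a combination for which a plain count of two Power Limits would wrongly suggest PL1 and PL2.

```c
#include <assert.h>
#include <stdbool.h>

enum { PL1_ID, PL2_ID, PL4_ID, NR_PL_IDS };

#define PL_BIT(pl) (1u << (pl))

/* PL1 is mandatory, so it is always present whatever the driver passes. */
unsigned int pl_bitmap_sketch(unsigned int driver_mask)
{
	return (driver_mask | PL_BIT(PL1_ID)) & (PL_BIT(NR_PL_IDS) - 1);
}

/* Availability is a bit test, not a guess based on a count. */
bool pl_available(unsigned int bitmap, int pl)
{
	return pl >= 0 && pl < NR_PL_IDS && (bitmap & PL_BIT(pl));
}
```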

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
045610c383 powercap: intel_rapl: Change primitive order
The same set of operations is shared by different Power Limits,
including Power Limit get/set, Power Limit enable/disable, clamping
enable/disable, time window get/set, max power get/set, etc.

But the same operation uses different primitives for different Power
Limits because they use different registers/register bits.

A lot of dirty/duplicate code was introduced to handle this difference.

Instead of using hardcoded primitive names directly, using the Power
Limit id + operation type is much cleaner.

For this reason, move POWER_LIMIT1/POWER_LIMIT2/POWER_LIMIT4 to the
beginning of enum rapl_primitives so that they can be reused as
Power Limit ids.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
11edbe5c66 powercap: intel_rapl: Use index to initialize primitive information
Currently, the RAPL primitive information array is required to be
initialized in the order of enum rapl_primitives.
This can break easily, especially when different RAPL Interfaces may
support different sets of primitives.

Convert the code to initialize the primitive information using explicit
array indexes.
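C designated array initializers are one way to index entries explicitly. The enum, struct and field values below are illustrative assumptions, not the driver's primitive table; they only show that each entry lands at its enum index regardless of the order in which it is written.

```c
#include <assert.h>
#include <string.h>

enum prim_idx_sketch { ENERGY_COUNTER_S, PL1_S, PL1_ENABLE_S, NR_PRIMS_S };

struct prim_info_sketch {
	const char *name;
	unsigned long long mask;
	int shift;
};

/*
 * Designated array initializers: each entry is placed at its enum index
 * even though the source lists the entries out of order.
 */
static const struct prim_info_sketch prim_info[NR_PRIMS_S] = {
	[PL1_ENABLE_S]     = { "pl1_enable", 0x8000ULL, 15 },
	[ENERGY_COUNTER_S] = { "energy", 0xffffffffULL, 0 },
	[PL1_S]            = { "pl1", 0x7fffULL, 0 },
};
```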

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
cb532e728e powercap: intel_rapl: Support per domain energy/power/time unit
The RAPL MSR/MMIO Interface has a package-scope unit register, but some
RAPL domains like Dram/Psys may use a fixed energy unit value instead of
the default unit value on certain platforms.
The RAPL TPMI Interface supports a per-domain unit register.

For the above reasons, add support for a per-domain unit register and
per-domain energy/power/time units.

When a per-domain unit register is not available, use the package-scope
unit register as the per-domain unit register for each RAPL domain so
that this change is transparent to the MSR/MMIO Interface.
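A sketch of the fallback, assuming the unit register uses the MSR_RAPL_POWER_UNIT layout described in the Intel SDM (power units in bits 3:0, energy status units in bits 12:8, time units in bits 19:16, each a power-of-two divisor). The TPMI unit register layout differs and is not modeled; all function and struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct units_sketch {
	unsigned int power_shift;	/* power unit  = 1/2^power_shift W */
	unsigned int energy_shift;	/* energy unit = 1/2^energy_shift J */
	unsigned int time_shift;	/* time unit   = 1/2^time_shift s */
};

struct units_sketch decode_units(uint64_t reg)
{
	struct units_sketch u = {
		.power_shift  = (unsigned int)(reg & 0xf),
		.energy_shift = (unsigned int)((reg >> 8) & 0x1f),
		.time_shift   = (unsigned int)((reg >> 16) & 0xf),
	};
	return u;
}

/* Use the domain's own unit register if it has one, else the package's. */
struct units_sketch domain_units(const uint64_t *domain_reg, uint64_t pkg_reg)
{
	return decode_units(domain_reg ? *domain_reg : pkg_reg);
}

/* Convert a raw energy counter to microjoules (no overflow handling). */
uint64_t energy_to_uj(uint64_t raw, struct units_sketch u)
{
	return (raw * 1000000ULL) >> u.energy_shift;
}
```

For example, a package register value of 0xA0E03 decodes to power, energy and time shifts of 3, 14 and 10.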

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
98ff639a72 powercap: intel_rapl: Support per Interface primitive information
RAPL primitive information is Interface specific.

Although the current MSR and MMIO Interfaces share the same RAPL
primitives, a new Interface like TPMI has its own RAPL primitive information.

Save the primitive information in the Interface private structure.

Also, using the variable name "rp" for struct rapl_primitive_info is
confusing because "rp" is also used for struct rapl_package.
Use "rpi" as the variable name for struct rapl_primitive_info, and rename
the previous rpi[] array to avoid a conflict.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
e8e28c2af1 powercap: intel_rapl: Support per Interface rapl_defaults
rapl_defaults is Interface specific.

Although the current MSR and MMIO Interfaces share the same rapl_defaults,
a new Interface like TPMI needs its own rapl_defaults callbacks.

Save the rapl_defaults information in the Interface private structure.
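A sketch of per-Interface private data carrying its own primitive table and defaults callbacks. The interface enum, tables, bit offsets and the `check_unit` decoders are all hypothetical; in particular the TPMI values are made up and only show that the two Interfaces can differ.

```c
#include <assert.h>
#include <stddef.h>

enum rapl_if_type_sketch { IF_MSR, IF_MMIO, IF_TPMI };

struct prim_info_s { const char *name; unsigned int shift; };
struct defaults_s { unsigned int (*check_unit)(unsigned int raw); };

/* Made-up unit decoders: same field width, different bit offsets. */
static unsigned int msr_unit(unsigned int raw)  { return raw & 0x1f; }
static unsigned int tpmi_unit(unsigned int raw) { return (raw >> 6) & 0x1f; }

static const struct prim_info_s msr_prims[]  = { { "pl1_enable", 15 } };
static const struct prim_info_s tpmi_prims[] = { { "pl1_enable", 62 } };

static const struct defaults_s msr_defaults  = { .check_unit = msr_unit };
static const struct defaults_s tpmi_defaults = { .check_unit = tpmi_unit };

/* Each Interface's private data carries its own tables. */
struct if_priv_sketch {
	enum rapl_if_type_sketch type;
	const struct prim_info_s *rpi;
	const struct defaults_s *defaults;
};

struct if_priv_sketch make_priv(enum rapl_if_type_sketch type)
{
	struct if_priv_sketch p = { .type = type };

	if (type == IF_TPMI) {
		p.rpi = tpmi_prims;
		p.defaults = &tpmi_defaults;
	} else {
		/* In this sketch the MSR and MMIO Interfaces share one set. */
		p.rpi = msr_prims;
		p.defaults = &msr_defaults;
	}
	return p;
}
```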

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
1488ac990a powercap: intel_rapl: Allow probing without CPUID match
Currently, a CPU model check is used to
1. get proper rapl_defaults callbacks for RAPL MSR/MMIO Interface.
2. create a platform device node for the intel_rapl_msr driver to probe.

Both of these are mandatory only for the RAPL MSR/MMIO Interface.

Make the CPUID match optional.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Linus Torvalds
c8b4accf86 More power management updates for 6.3-rc1
  - Fix error handling in the apple-soc cpufreq driver (Dan Carpenter).
 
  - Change the log level of a message in the amd-pstate cpufreq driver
    so it is more visible to users (Kai-Heng Feng).
 
  - Adjust the balance_performance EPP value for Sapphire Rapids in the
    intel_pstate cpufreq driver (Srinivas Pandruvada).
 
  - Remove MODULE_LICENSE from 3 pieces of non-modular code (Nick Alcock).
 
  - Make a read-only kobj_type structure in the schedutil cpufreq governor
    constant (Thomas Weißschuh).
 
  - Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL
    power capping driver (Sumeet Pawnikar).

Merge tag 'pm-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These update power capping (new hardware support and cleanup) and
  cpufreq (bug fixes, cleanups and intel_pstate adjustment for a new
  platform).

  Specifics:

   - Fix error handling in the apple-soc cpufreq driver (Dan Carpenter)

   - Change the log level of a message in the amd-pstate cpufreq driver
     so it is more visible to users (Kai-Heng Feng)

   - Adjust the balance_performance EPP value for Sapphire Rapids in the
     intel_pstate cpufreq driver (Srinivas Pandruvada)

   - Remove MODULE_LICENSE from 3 pieces of non-modular code (Nick
     Alcock)

   - Make a read-only kobj_type structure in the schedutil cpufreq
     governor constant (Thomas Weißschuh)

   - Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL
     power capping driver (Sumeet Pawnikar)"

* tag 'pm-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: apple-soc: Fix an IS_ERR() vs NULL check
  powercap: remove MODULE_LICENSE in non-modules
  cpufreq: intel_pstate: remove MODULE_LICENSE in non-modules
  powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC
  cpufreq: amd-pstate: remove MODULE_LICENSE in non-modules
  cpufreq: schedutil: make kobj_type structure constant
  cpufreq: amd-pstate: Let user know amd-pstate is disabled
  cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire Rapids
2023-03-03 10:30:58 -08:00
Nick Alcock
d25d01b4e5 powercap: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454 ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as
modules.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-28 21:32:05 +01:00
Sumeet Pawnikar
eb52bc2ae5 powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC
Add Meteor Lake SoC to the list of processor models for which
Power Limit4 is supported by the Intel RAPL driver.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-23 20:06:57 +01:00
Linus Torvalds
1b72607d73 Thermal control updates for 6.3-rc1
  - Rework a large bunch of drivers to use the generic thermal trip
    structure and use the opportunity to do more cleanups by removing
    unused functions from the OF code (Daniel Lezcano).
 
  - Remove core header inclusion from drivers (Daniel Lezcano).
 
  - Fix some locking issues related to the generic thermal trip rework
    (Johan Hovold).
 
  - Fix a crash when requesting the critical temperature on tegra, which
    is related to the generic trip point work (Jon Hunter).
 
  - Clean up thermal device unregistration code (Viresh Kumar).
 
  - Fix and clean up thermal control core initialization error code
    paths (Daniel Lezcano).
 
  - Relocate the trip points handling code into a separate file (Daniel
    Lezcano).
 
  - Make the thermal core fail registration of thermal zones and cooling
    devices if the thermal class has not been registered (Rafael Wysocki).
 
  - Add trip point initialization helper functions for ACPI-defined trip
    points and modify two thermal drivers to use them (Rafael Wysocki,
    Daniel Lezcano).
 
  - Make the core thermal control code use sysfs_emit_at() instead of
    scnprintf() where applicable (ye xingchen).
 
  - Consolidate code accessing the Intel TCC (Thermal Control Circuitry)
    MSRs by introducing library functions for that and making the
    TCC-related code in thermal drivers use them (Zhang Rui).
 
  - Enhance the x86_pkg_temp_thermal driver to support dynamic tjmax
    changes (Zhang Rui).
 
  - Address an "unsigned expression compared with zero" warning in the
    intel_soc_dts_iosf thermal driver (Yang Li).
 
  - Update comments regarding two functions in the Intel Menlow thermal
    driver (Deming Wang).
 
  - Use sysfs_emit_at() instead of scnprintf() in the int340x thermal
    driver (ye xingchen).
 
  - Make the intel_pch thermal driver support the Wellsburg PCH (Tim
    Zimmermann).
 
  - Modify the intel_pch and processor_thermal_device_pci thermal drivers
    to use generic trip point tables instead of thermal zone trip point
    callbacks (Daniel Lezcano).
 
  - Add a production mode sysfs attribute to the int340x thermal
    driver (Srinivas Pandruvada).
 
  - Rework dynamic trip point updates handling and locking in the int340x
    thermal driver (Rafael Wysocki).
 
  - Make the int340x thermal driver use a generic trip points table
    instead of thermal zone trip point callbacks (Rafael Wysocki, Daniel
    Lezcano).
 
  - Clean up and improve the int340x thermal driver (Rafael Wysocki).
 
  - Simplify and clean up the intel_pch thermal driver (Rafael Wysocki).
 
  - Fix the Intel powerclamp thermal driver and make it use the common
    idle injection framework (Srinivas Pandruvada).
 
  - Add two module parameters, cpumask and max_idle, to the Intel powerclamp
    thermal driver to allow it to affect only a specific subset of CPUs
    instead of all of them (Srinivas Pandruvada).
 
  - Make the Intel quark_dts thermal driver use generic trip point
    objects instead of its own trip point representation (Daniel
    Lezcano).
 
  - Add a toctree entry for thermal documents and fix two issues in the
    Intel powerclamp driver documentation (Bagas Sanjaya).
 
  - Use strscpy() instead of strncpy() in the thermal core (Xu Panda).
 
  - Fix thermal_sampling_exit() in libthermal (Vincent Guittot).
 
  - Add Mediatek Low Voltage Thermal Sensor (LVTS) driver (Balsam Chihi).
 
  - Add r8a779g0 RCar support to the rcar_gen3 thermal driver (Geert
    Uytterhoeven).
 
  - Avoid a useless call to set_trips() when resuming in the rcar_gen3
    thermal control driver and add interrupt support detection to it at
    init time (Niklas Söderlund).
 
  - Fix memory corruption in the hi3660 thermal driver (Yongqin Liu).
 
  - Fix the include path for libnl3 in the pkg-config file for libthermal
    (Vibhav Pant).
 
  - Remove the syscfg-based driver for ST, as the platform is no longer
    supported (Alain Volmat).

Merge tag 'thermal-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control updates from Rafael Wysocki:
 "The majority of changes here are related to the general switch-over to
  using arrays of generic trip point structures registered along with a
  thermal zone instead of trip point callbacks (this has been done
  mostly by Daniel Lezcano with some help from yours truly on the Intel
  drivers front).

  Apart from that and the related reorganization of code, there are some
  enhancements of existing drivers and a new Mediatek Low Voltage
  Thermal Sensor (LVTS) driver. The Intel powerclamp driver undergoes a
  major rework so that it will use the generic idle_inject facility for
  CPU idle time injection going forward, and it will take additional
  module parameters for specifying the subset of CPUs to be affected by
  it (work done by Srinivas Pandruvada).

  Also included are assorted fixes and a whole bunch of cleanups.

  Specifics:

   - Rework a large bunch of drivers to use the generic thermal trip
     structure and use the opportunity to do more cleanups by removing
     unused functions from the OF code (Daniel Lezcano)

   - Remove core header inclusion from drivers (Daniel Lezcano)

   - Fix some locking issues related to the generic thermal trip rework
     (Johan Hovold)

   - Fix a crash when requesting the critical temperature on Tegra,
     which is related to the generic trip point work (Jon Hunter)

   - Clean up thermal device unregistration code (Viresh Kumar)

   - Fix and clean up thermal control core initialization error code
     paths (Daniel Lezcano)

   - Relocate the trip points handling code into a separate file (Daniel
     Lezcano)

   - Make the thermal core fail registration of thermal zones and
     cooling devices if the thermal class has not been registered
     (Rafael Wysocki)

   - Add trip point initialization helper functions for ACPI-defined
     trip points and modify two thermal drivers to use them (Rafael
     Wysocki, Daniel Lezcano)

   - Make the core thermal control code use sysfs_emit_at() instead of
     scnprintf() where applicable (ye xingchen)

   - Consolidate code accessing the Intel TCC (Thermal Control
     Circuitry) MSRs by introducing library functions for that and
     making the TCC-related code in thermal drivers use them (Zhang Rui)

   - Enhance the x86_pkg_temp_thermal driver to support dynamic tjmax
     changes (Zhang Rui)

   - Address an "unsigned expression compared with zero" warning in the
     intel_soc_dts_iosf thermal driver (Yang Li)

   - Update comments regarding two functions in the Intel Menlow thermal
     driver (Deming Wang)

   - Use sysfs_emit_at() instead of scnprintf() in the int340x thermal
     driver (ye xingchen)

   - Make the intel_pch thermal driver support the Wellsburg PCH (Tim
     Zimmermann)

   - Modify the intel_pch and processor_thermal_device_pci thermal
     drivers to use generic trip point tables instead of thermal zone
     trip point callbacks (Daniel Lezcano)

   - Add a production mode sysfs attribute to the int340x thermal
     driver (Srinivas Pandruvada)

   - Rework dynamic trip point updates handling and locking in the
     int340x thermal driver (Rafael Wysocki)

   - Make the int340x thermal driver use a generic trip points table
     instead of thermal zone trip point callbacks (Rafael Wysocki,
     Daniel Lezcano)

   - Clean up and improve the int340x thermal driver (Rafael Wysocki)

   - Simplify and clean up the intel_pch thermal driver (Rafael Wysocki)

   - Fix the Intel powerclamp thermal driver and make it use the common
     idle injection framework (Srinivas Pandruvada)

   - Add two module parameters, cpumask and max_idle, to the Intel
     powerclamp thermal driver to allow it to affect only a specific
     subset of CPUs instead of all of them (Srinivas Pandruvada)

   - Make the Intel quark_dts thermal driver use generic trip point
     objects instead of its own trip point representation (Daniel
     Lezcano)

   - Add a toctree entry for thermal documents and fix two issues in the
     Intel powerclamp driver documentation (Bagas Sanjaya)

   - Use strscpy() instead of strncpy() in the thermal core (Xu
     Panda)

   - Fix thermal_sampling_exit() in libthermal (Vincent Guittot)

   - Add Mediatek Low Voltage Thermal Sensor (LVTS) driver (Balsam
     Chihi)

   - Add r8a779g0 RCar support to the rcar_gen3 thermal driver (Geert
     Uytterhoeven)

   - Avoid a useless call to set_trips() when resuming in the rcar_gen3
     thermal control driver and add interrupt support detection to it
     at init time (Niklas Söderlund)

   - Fix memory corruption in the hi3660 thermal driver (Yongqin Liu)

   - Fix the include path for libnl3 in the pkg-config file for
     libthermal (Vibhav Pant)

   - Remove the syscfg-based driver for ST, as the platform is no
     longer supported (Alain Volmat)"

* tag 'thermal-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (135 commits)
  thermal/drivers/st: Remove syscfg based driver
  thermal: Remove core header inclusion from drivers
  tools/lib/thermal: Fix include path for libnl3 in pkg-config file.
  thermal/drivers/hisi: Drop second sensor hi3660
  thermal/drivers/rcar_gen3_thermal: Fix device initialization
  thermal/drivers/rcar_gen3_thermal: Create device local ops struct
  thermal/drivers/rcar_gen3_thermal: Do not call set_trips() when resuming
  thermal/drivers/rcar_gen3: Add support for R-Car V4H
  dt-bindings: thermal: rcar-gen3-thermal: Add r8a779g0 support
  thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver
  dt-bindings: thermal: mediatek: Add LVTS thermal controllers
  thermal/drivers/mediatek: Relocate driver to mediatek folder
  tools/lib/thermal: Fix thermal_sampling_exit()
  Documentation: powerclamp: Fix numbered lists formatting
  Documentation: powerclamp: Escape wildcard in cpumask description
  Documentation: admin-guide: Add toctree entry for thermal docs
  thermal: intel: powerclamp: Add two module parameters
  Documentation: admin-guide: Move intel_powerclamp documentation
  thermal: core: Use sysfs_emit_at() instead of scnprintf()
  thermal: intel: powerclamp: Fix duration module parameter
  ...
2023-02-21 12:32:05 -08:00
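The central idea of the trip point switch-over described in this merge, a driver handing the core a table of generic trip descriptions at registration time instead of answering per-trip callbacks, can be sketched in user-space C. The types and functions below (`struct trip`, `struct zone`, `zone_get_crit_temp()`, `zone_trips_crossed()`) are hypothetical simplifications, not the kernel's `struct thermal_trip` or the thermal core API; the sketch only illustrates why a table lets the core answer questions such as "what is the critical temperature?" without any driver involvement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a generic trip point table. */
enum trip_type { TRIP_PASSIVE, TRIP_HOT, TRIP_CRITICAL };

struct trip {
	int temperature;          /* millidegrees Celsius */
	int hysteresis;           /* millidegrees Celsius */
	enum trip_type type;
};

/* The driver registers the table once; the core owns lookups from then on. */
struct zone {
	const char *name;
	const struct trip *trips;
	int num_trips;
};

/*
 * Core-side lookup: with a table there is no need for a per-driver
 * "get critical temperature" callback; the core just scans the entries.
 */
static int zone_get_crit_temp(const struct zone *tz, int *temp)
{
	assert(tz != NULL && temp != NULL);
	for (int i = 0; i < tz->num_trips; i++) {
		if (tz->trips[i].type == TRIP_CRITICAL) {
			*temp = tz->trips[i].temperature;
			return 0;
		}
	}
	return -1;                /* no critical trip in this zone */
}

/* Count the trips whose temperature has been reached at 'temp'. */
static int zone_trips_crossed(const struct zone *tz, int temp)
{
	int n = 0;

	for (int i = 0; i < tz->num_trips; i++)
		if (temp >= tz->trips[i].temperature)
			n++;
	return n;
}
```

For a zone whose table holds passive (85000), hot (95000) and critical (105000) entries, `zone_get_crit_temp()` reports 105000 and a reading of 96000 has crossed two trips; a zone without a critical entry makes the lookup return -1.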
Zhang Rui
cf835b005b powercap: intel_rapl: Fix handling for large time window
When setting the power limit time window, software updates the 'y' bits
and 'f' bits in the power limit register, and the value the hardware
uses follows this formula:

	Time window = 2 ^ y * (1 + f / 4) * Time_Unit

When handling large time window values from userspace, the left shifts
break in two cases:

 1. When ilog2(value) is greater than 31, the expression "1 << y"
    shifts by more than 31 bits, which is undefined behavior. This breaks
    'y'. For example, on an Alder Lake platform, "1 << 32" returns 1.

 2. When ilog2(value) equals 31, "1 << 31" returns a negative value
    because the literal '1' is a signed int. This breaks 'f'.

Given that 'y' is a 5-bit field and the hardware can never take a value
larger than 31, fix the first problem by clamping the time window to the
maximum value that the hardware can take.

Fix the second problem by using an unsigned left shift.

Note that the hardware has its own maximum time window limitation, which
may be lower than the time window value retrieved from the power limit
register. When this happens, the hardware clamps the input to its
maximum time window. That is why a software clamp is preferred to
handle the problem at hand.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Adjusted the comment added by this change ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-13 17:01:31 +01:00
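The encoding and both fixes described in the commit above can be illustrated with a small, self-contained C sketch. It assumes the requested window has already been divided by Time_Unit, and that the result is packed into a 7-bit value with the 5-bit exponent 'y' in bits 0-4 and a 2-bit fraction 'f' in bits 5-6 (the 2-bit width of 'f' is an assumption consistent with the "/ 4" in the formula). The helper names are hypothetical; this is not the intel_rapl driver's code. It clamps when 'y' would exceed 31 and uses 64-bit unsigned shifts (`1ULL << y`) so that neither case from the commit message can misbehave.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of log2(v) for v > 0 (hypothetical stand-in for ilog2()). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/*
 * Encode a window given in Time_Unit multiples as
 * units ~= 2^y * (1 + f / 4); returns y | (f << 5).
 */
static uint32_t encode_time_window(uint64_t units)
{
	unsigned int y;
	uint64_t f;

	if (units == 0)
		return 0;

	y = ilog2_u64(units);
	if (y > 31)               /* 'y' is 5 bits wide: clamp to the maximum */
		return 0x7f;

	/* Unsigned 64-bit shifts avoid the signed-int "1 << 31" problem. */
	f = (4 * (units - (1ULL << y))) >> y;   /* always 0..3 */
	return (y & 0x1f) | ((uint32_t)(f & 0x3) << 5);
}

/* Decode: window = 2^y * (1 + f / 4) = 2^y * (4 + f) / 4 time units. */
static uint64_t decode_time_window(uint32_t field)
{
	uint64_t y = field & 0x1f;
	uint64_t f = (field >> 5) & 0x3;

	return ((1ULL << y) * (4 + f)) / 4;
}
```

For example, 6 time units encode as y = 2, f = 2 (field value 66) and decode back to 6, while 2^32 time units would need y = 32 and are clamped to 0x7f, the largest encodable window.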
Srinivas Pandruvada
acbc661032 powercap: idle_inject: Add update callback
The powercap/idle_inject core uses play_idle_precise() to inject idle
time. But play_idle_precise() can't ensure that the CPU is fully idle
for the specified duration because of wakeups due to interrupts. To
compensate for the idle time lost to these wakeups, the caller can
adjust the requested idle time for the next cycle.

The goal of idle injection is to keep the system at a given idle
percentage on average, so it is fine to overshoot or undershoot the
instantaneous idle time.

The idle inject core provides the idle_inject_set_duration() interface
to set the idle and runtime durations.

Some architectures provide an interface to get the actual idle time
observed by the hardware, so the effective idle percentage can be
adjusted using hardware feedback. For example, Intel CPUs provide
package idle counters, which are currently used by the Intel powerclamp
driver to readjust the runtime duration.

When the caller's desired idle time over a period is less than or
greater than the actual CPU idle time observed by the hardware, the
caller can readjust the idle and runtime durations for the next cycle.

Currently, the only way to do this is to monitor the hardware idle time
from a separate software thread and readjust the idle and runtime
durations using idle_inject_set_duration().

This can be avoided by adding a callback that callers can register and
use to readjust the durations.

Add the capability to register an optional update() callback, which is
called by the idle inject core before waking up CPUs for idle injection.
This callback can be registered via a new interface:
idle_inject_register_full().

While the idle and runtime durations are constantly adjusted this way,
there can be cases where the actual idle time exceeds the desired idle
time. In that case, idle injection can be skipped for a cycle: if the
update() callback returns false, the idle inject core skips waking up
CPUs for idle injection.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-02 21:08:32 +01:00
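The control flow added by the commit above, consulting an optional update() hook before each injection cycle and skipping the cycle when it returns false, can be modelled in user-space C. Everything here is a hypothetical simplification: `struct idle_injector`, `injector_tick()`, the `measured_pct` feedback variable and the proportional adjustment policy in `demo_update()` are illustrative assumptions, not the kernel's idle_inject core (whose callback signature differs) or the powerclamp driver's algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an idle-injection loop (not the kernel's). */
struct idle_injector {
	unsigned int idle_us;                      /* idle time per period */
	unsigned int run_us;                       /* run time per period */
	bool (*update)(struct idle_injector *ii);  /* optional; may be NULL */
	unsigned int injections;                   /* cycles actually injected */
};

/* One period: consult update() first and skip the cycle if it says no. */
static void injector_tick(struct idle_injector *ii)
{
	if (ii->update != NULL && !ii->update(ii))
		return;           /* enough idle time already observed */
	ii->injections++;         /* stand-in for waking CPUs to play idle */
}

/* Hypothetical hardware feedback: idle percentage seen in the last period. */
static unsigned int measured_pct;
static const unsigned int target_pct = 50;

/* Example policy: skip when over target, else add the shortfall to idle. */
static bool demo_update(struct idle_injector *ii)
{
	unsigned int period = ii->idle_us + ii->run_us;
	unsigned int extra;

	assert(period > 0);
	if (measured_pct > target_pct)
		return false;

	extra = (target_pct - measured_pct) * period / 100;
	ii->idle_us += extra;
	if (ii->idle_us > period)
		ii->idle_us = period;
	ii->run_us = period - ii->idle_us;  /* keep the period constant */
	return true;
}
```

With idle_us = run_us = 5000 and a 50% target, a measured 60% idle makes demo_update() return false so the cycle is skipped, while a measured 40% stretches the next idle slot to 6000 us and shortens the run slot to 4000 us.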