The thermal subsystem registers a hwmon device without providing chip
information or sysfs attribute groups. While undesirable, it would be
difficult to change. On the other side, it abuses the
hwmon_device_register_with_info API by not providing that information.
Use new API specifically created for the thermal subsystem instead to
let us enforce the 'chip' parameter for other callers of
hwmon_device_register_with_info().
Acked-by: Rafael J . Wysocki <rafael@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Previously, during suspend, intel_pch_thermal driver logs for every
cooling iteration, about the current PCH temperature and number of cooling
iterations that have been tried, like below
[ 100.955526] intel_pch_thermal 0000:00:14.2: CPU-PCH current temp [53C] higher than the threshold temp [50C], sleep 1 times for 100 ms duration
[ 101.064156] intel_pch_thermal 0000:00:14.2: CPU-PCH current temp [53C] higher than the threshold temp [50C], sleep 2 times for 100 ms duration
After changing the default delay_cnt to 600, in practice, it is common to
see tens of the above messages if the system is suspended when PCH
overheats. Thus, change this log message from dev_warn to dev_dbg because
it is only useful when we want to check the temperature trend.
At the same time, there is always a one-line message given by the driver
with the patch applied, with below four possibilities.
1. PCH is cool, no cooling delay needed
[ 1791.902853] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [48C]
2. PCH overheats and becomes cool after the cooling delays
[ 1475.511617] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [49C] after 30700 ms delay
3. PCH still overheats after the overall cooling timeout
[ 2250.157487] intel_pch_thermal 0000:00:12.0: CPU-PCH is hot [60C] after 60000 ms delay. S0ix might fail
4. PCH aborts cooling because of wakeup event detected during the delay
[ 1933.639509] intel_pch_thermal 0000:00:12.0: Wakeup event detected, abort cooling
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit ef63b043ac ("thermal: intel: pch: fix S0ix failure due to PCH
temperature above threshold") introduces delay loop mechanism that allows
PCH temperature to go down below threshold during suspend so it won't
block S0ix. And the default overall delay timeout is 1 second.
However, in practice, we found that the time it takes to cool the PCH down
below threshold highly depends on the initial PCH temperature when the
delay starts, as well as the ambient temperature.
And in some cases, the 1 second delay is not sufficient. As a result, the
system stays in a shallower power state like PCx instead of S0ix, and
drains the battery power, without user' notice.
To make sure S0ix is not blocked by the PCH overheating, we
1. expand the default overall timeout to 60 seconds.
2. make sure the temperature is below threshold rather than equal to it.
At the same time, as the cooling delay can be much longer and many wakeup
events (ACPI Power Button press, USB mouse move, etc) becomes valid in the
suspend_noirq phase, add detection of wakeup event so that the driver
does not delay blindly when the system suspend is likely to abort soon.
This patch may introduce longer suspend time, but only in the cases when
the system overheats and Linux used to enter a shallower S2idle state,
say, PCx instead of S0ix.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the PCH Thermal driver suspend callback to suspend_noirq to do
cooling while the system is more quiescent.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add VTM thermal support. In the Voltage Thermal Management
Module(VTM), K3 J72XX supplies a voltage reference and a temperature
sensor feature that are gathered in the band gap voltage and
temperature sensor (VBGAPTS) module. The band gap provides current and
voltage reference for its internal circuits and other analog IP
blocks. The analog-to-digital converter (ADC) produces an output value
that is proportional to the silicon temperature.
Currently reading temperatures only is supported. There are no
active/passive cooling agent supported.
J721e SoCs have errata i2128: https://www.ti.com/lit/pdf/sprz455
The VTM Temperature Monitors (TEMPSENSORs) are trimmed during production,
with the resulting values stored in software-readable registers. Software
should use these register values when translating the Temperature
Monitor output codes to temperature values.
It has an involved workaround. Software needs to read the error codes for
-40C, 30C, 125C from the efuse for each device & derive a new look up table
for adc to temperature conversion. Involved calculating slopes & constants
using 3 different straight line equations with adc refernce codes as the
y-axis & error codes in the x-axis.
-40C to 30C
30C to 125C
125C to 150C
With the above 2 line equations we derive the full look-up table to
workaround the errata i2128 for j721e SoC.
Tested temperature reading on J721e SoC & J7200 SoC.
[daniel.lezcano@linaro.org: Generate look-up tables run-time]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Link: https://lore.kernel.org/r/20220517172920.10857-3-j-keerthy@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
of_find_node_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.
Add missing of_node_put() to avoid refcount leak.
Fixes: e20db70dba ("thermal: imx_sc: add i.MX system controller thermal support")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220517055121.18092-1-linmq006@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The LMh instances in the Qualcomm SC8180X platform looks to behave
similar to those in SM8150, add additional compatibles to allow
platform specific behavior to be added if needed.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220502164504.3972938-1-bjorn.andersson@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
As per the latest RZ/G2L Hardware User's Manual (Rev.1.10 Apr, 2022),
the bit 31 of TSU OTP Calibration Register(OTPTSUTRIM) indicates
whether bit [11:0] of OTPTSUTRIM is valid or invalid.
This patch updates the code to reflect this change.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20220428093346.7552-1-biju.das.jz@bp.renesas.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Add a missing s to __thermal_bind_param kernel doc comment.
This fixes the following sparse warnings:
drivers/thermal/thermal_of.c:50: warning: expecting prototype for struct __thermal_bind_param. Prototype was for struct __thermal_bind_params instead
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Link: https://lore.kernel.org/r/20220426064113.3787826-1-clabbe@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The sensor driver which register through thermal_of interface doesn't
have an option to get thermal zone mode change notification from
thermal core.
Add support for change_mode ops in thermal_of interface so that sensor
driver can use this ops for mode change notification.
Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
Link: https://lore.kernel.org/r/1646767586-31908-1-git-send-email-quic_manafm@quicinc.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The thermal sensor on BCM2711 is capable of negative temperatures, so don't
clamp the measurements at zero. Since this was the only use for variable t,
drop it.
This change based on a patch by Dom Cobley, who also tested the fix.
Fixes: 59b781352d ("thermal: Add BCM2711 thermal driver")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220412195423.104511-1-stefan.wahren@i2se.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
On apq8064 (msm8960) platforms the tsens device is created manually by
the gcc driver. Prepare the tsens driver for the qcom,msm8960-tsens
device instantiated from the device tree.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20220406002648.393486-3-dmitry.baryshkov@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Expose the thermal sensors on K3 AM654 as hwmon devices, so that
temperatures could be read using lm-sensors.
Signed-off-by: Massimiliano Minella <massimiliano.minella@gmail.com>
Link: https://lore.kernel.org/r/20220401151656.913166-1-massimiliano.minella@se.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Add support for PMIC5 Gen2 ADC_TM, used on PMIC7 chips. It is a
close counterpart of PMIC7 ADC and has the same functionality as
PMIC5 ADC_TM, for threshold monitoring and interrupt generation.
It is present on PMK8350 alone, like PMIC7 ADC and can be used
to monitor up to 8 ADC channels, from any of the PMIC7 PMICs
having ADC on a target, through PBS(Programmable Boot Sequence).
Signed-off-by: Jishnu Prakash <quic_jprakash@quicinc.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/1648991869-20899-5-git-send-email-quic_jprakash@quicinc.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Refactor code to support multiple generations of ADC_TM devices
by defining gen number, irq name and disable, configure, isr and
init APIs in the individual data structs.
Signed-off-by: Jishnu Prakash <quic_jprakash@quicinc.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/1648991869-20899-4-git-send-email-quic_jprakash@quicinc.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.
In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq_optional().
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20220110144039.5810-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
container_of() will never return NULL, so remove useless code.
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With the new OS handshake introduced by commit: "c7ff29763989 ("thermal:
int340x: Update OS policy capability handshake")", the "enabled" thermal
zone mode doesn't work in the same way as previously.
The "enabled" mode fails with -EINVAL when the new handshake is used.
To address this issue, when the new OS UUID mask is set:
- When the mode is "enabled", return 0 as the firmware already has the
latest policy mask.
- When the mode is "disabled", update the firmware with the UUID mask
of zero.
This way, the firmware can take over the thermal control.
Also reset the OS UUID mask, which allows user space to update with new
set of policies.
Fixes: c7ff297639 ("thermal: int340x: Update OS policy capability handshake")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits, removed unneeded parens ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge a fix for the attr.show callback prototype in the int340x thermal
driver (Kees Cook).
* thermal-int340x:
thermal: int340x: Fix attr.show callback prototype
The userspace governor is still in use on production systems and the
deprecating warning is scary.
Even if we want to get rid of the userspace governor, it is too soon
yet as the alternatives are not yet adopted.
Change the deprecated warning by an information message suggesting to
switch to the netlink thermal events.
Fixes: 0275c9fb0e ("thermal/core: Make the userspace governor deprecated")
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This reverts commit a67a46af4a.
It has been reported the warning is annoying as the cooling device
state is still needed on some production system.
Meanwhile we provide a way to consolidate the thermal framework to
prevent multiple actors acting on the cooling devices with conflicting
decisions, let's revert this warning.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Control Flow Integrity (CFI) instrumentation of the kernel noticed that
the caller, dev_attr_show(), and the callback, odvp_show(), did not have
matching function prototypes, which would cause a CFI exception to be
raised. Correct the prototype by using struct device_attribute instead
of struct kobj_attribute.
Reported-and-tested-by: Joao Moreira <joao@overdrivepizza.com>
Link: https://lore.kernel.org/lkml/067ce8bd4c3968054509831fa2347f4f@overdrivepizza.com/
Fixes: 006f006f1e ("thermal/int340x_thermal: Export OEM vendor variables")
Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix access illegal address problem in following condition:
There are multiple devfreq cooling devices in system, some of them has
EM model but others do not. Energy model ops such as state2power will
append to global devfreq_cooling_ops when the cooling device with
EM model is registered. It makes the cooling device without EM model
also use devfreq_cooling_ops after appending when registered later by
of_devfreq_cooling_register_power() or of_devfreq_cooling_register().
The IPA governor regards the cooling devices without EM model as a power
actor, because they also have energy model ops, and will access illegal
address at dfc->em_pd when execute cdev->ops->get_requested_power,
cdev->ops->state2power or cdev->ops->power2state.
Fixes: 615510fe13 ("thermal: devfreq_cooling: remove old power model and use EM")
Cc: 5.13+ <stable@vger.kernel.org> # 5.13+
Signed-off-by: Kant Fan <kant@allwinnertech.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cleaning up the driver to use pm_sleep_ptr() macro instead of #ifdef
guards is simpler and allows the compiler to remove those functions
if built without CONFIG_PM_SLEEP support.
Signed-off-by: Hesham Almatary <hesham.almatary@huawei.com>
Reviewed-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Energy Model can now be artificial, which means the power values
are mathematically generated to leverage EAS while not expected to be on
an uniform scale with other devices providing power information. If this
EM type is in use, the thermal governor IPA should not be allowed to
operate, since the relation between cooling devices is not properly
defined. Thus, it might be possible that big GPU has lower power values
than a Little CPU. To mitigate a misbehaviour of the thermal control
algorithm, simply do not register the cooling device as IPA's power
actor.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that the UUID is already sanitized by the caller,
lets trivially clean up some of the context arming.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce a single point of freeing/exit after ensuring no error in
int3400_setup_gddv().
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is the caller's responsibility to free only upon ACPI_SUCCESS.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge Intel Hardware Feedback Interface (HFI) thermal driver for
5.18-rc1 and update the intel-speed-select utility to support that
driver.
* thermal-hfi:
tools/power/x86/intel-speed-select: v1.12 release
tools/power/x86/intel-speed-select: HFI support
tools/power/x86/intel-speed-select: OOB daemon mode
thermal: intel: hfi: INTEL_HFI_THERMAL depends on NET
thermal: netlink: Fix parameter type of thermal_genl_cpu_capability_event() stub
thermal: intel: hfi: Notify user space for HFI events
thermal: netlink: Add a new event to notify CPU capabilities change
thermal: intel: hfi: Enable notification interrupt
thermal: intel: hfi: Handle CPU hotplug events
thermal: intel: hfi: Minimally initialize the Hardware Feedback Interface
x86/cpu: Add definitions for the Intel Hardware Feedback Interface
x86/Documentation: Describe the Intel Hardware Feedback Interface
Merge powerclamp thermal driver changes, int340x thermal driver changes
and thermal documentation changes for 5.18-rc1:
- Don't use bitmap_weight() in end_power_clamp() in the powerclamp
driver (Yury Norov).
- Update the OS policy capabilities handshake in the int340x thermal
driver (Srinivas Pandruvada).
- Increase the policies bitmap size in int340x (Srinivas Pandruvada).
- Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
int340x thermal driver (Rafael Wysocki).
- Check for NULL after calling kmemdup() in int340x (Jiasheng Jiang).
- Add Intel Dynamic Power and Thermal Framework (DPTF) kernel interface
documentation (Srinivas Pandruvada).
- Fix bullet list warning in the thermal documentation (Randy Dunlap).
* thermal-powerclamp:
thermal: intel_powerclamp: don't use bitmap_weight() in end_power_clamp()
* thermal-int340x:
thermal: int340x: Update OS policy capability handshake
thermal: int340x: Increase bitmap size
thermal: Replace acpi_bus_get_device()
thermal: int340x: Check for NULL after calling kmemdup()
* thermal-docs:
Documentation: thermal: DPTF Documentation
thermal: fix Documentation bullet list warning
Update the firmware with OS supported policies mask, so that firmware can
relinquish its internal controls. Without this update several Tiger Lake
laptops gets performance limited with in few seconds of executing in
turbo region.
The existing way of enumerating firmware policies via IDSP method and
selecting policy by directly writing those policy UUIDS via _OSC method
is not supported in newer generation of hardware.
There is a new UUID "B23BA85D-C8B7-3542-88DE-8DE2FFCFD698" is defined for
updating policy capabilities. As part of ACPI _OSC method:
Arg0 - UUID: B23BA85D-C8B7-3542-88DE-8DE2FFCFD698
Arg1 - Rev ID: 1
Arg2 - Count: 2
Arg3 - Capability buffers: Array of Arg2 DWORDS
DWORD1: As defined in the ACPI 5.0 Specification
- Bit 0: Query Flag
- Bits 1-3: Always 0
- Bits 4-31: Reserved
DWORD2 and beyond:
- Bit0: set to 1 to indicate Intel(R) Dynamic Tuning is active, 0 to
indicate it is disabled and legacy thermal mechanism should
be enabled.
- Bit1: set to 1 to indicate Intel(R) Dynamic Tuning is controlling
active cooling, 0 to indicate bios shall enable legacy thermal
zone with active trip point.
- Bit2: set to 1 to indicate Intel(R) Dynamic Tuning is controlling
passive cooling, 0 to indicate bios shall enable legacy thermal
zone with passive trip point.
- Bit3: set to 1 to indicate Intel(R) Dynamic Tuning is handling
critical trip point, 0 to indicate bios shall enable legacy
thermal zone with critical trip point.
- Bits 4:31: Reserved
From sysfs interface, there is an existing interface to update policy
UUID using attribute "current_uuid". User space can write the same UUID
for ACTIVE, PASSIVE and CRITICAL policy. Driver converts these UUIDs to
DWORD2 Bit 1 to Bit 3. When any of the policy is activated by user
space it is assumed that dynamic tuning is active.
For example
$cd /sys/bus/platform/devices/INTC1040:00/uuids
To support active policy
$echo "3A95C389-E4B8-4629-A526-C52C88626BAE" > current_uuid
To support passive policy
$echo "42A441D6-AE6A-462b-A84B-4A8CE79027D3" > current_uuid
To support critical policy
$echo "97C68AE7-15FA-499c-B8C9-5DA81D606E0A" > current_uuid
To check all the supported policies
$cat current_uuid
3A95C389-E4B8-4629-A526-C52C88626BAE
42A441D6-AE6A-462b-A84B-4A8CE79027D3
97C68AE7-15FA-499c-B8C9-5DA81D606E0A
To match the bit format for DWORD2, rearranged enum int3400_thermal_uuid
and int3400_thermal_uuids[] by swapping current INT3400_THERMAL_ACTIVE
and INT3400_THERMAL_PASSIVE_1.
If the policies are enumerated via IDSP method then legacy method is
used, if not the new method is used to update policy support.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The number of policies are 10, so can't be supported by the bitmap size
of u8.
Even though there are no platfoms with these many policies, but
for correctness increase to u32.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: 16fc8eca19 ("thermal/int340x_thermal: Add additional UUIDs")
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Utilize platform_get_irq_optional() to silence these messages:
brcmstb_thermal a581500.thermal: IRQ index 0 not found
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220301181412.2008044-1-f.fainelli@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The return value from tegra_bpmp_transfer indicates the success or
failure of the IPC transaction with BPMP. If the transaction
succeeded, we also need to check the actual command's result code.
Add code to do this.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20210915085517.1669675-1-mperttunen@nvidia.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Expose ti-soc-thermal thermal sensors as HWMON devices.
# sensors
cpu_thermal-virtual-0
Adapter: Virtual device
temp1: +54.2 C (crit = +105.0 C)
dspeve_thermal-virtual-0
Adapter: Virtual device
temp1: +51.4 C (crit = +105.0 C)
gpu_thermal-virtual-0
Adapter: Virtual device
temp1: +54.2 C (crit = +105.0 C)
iva_thermal-virtual-0
Adapter: Virtual device
temp1: +54.6 C (crit = +105.0 C)
core_thermal-virtual-0
Adapter: Virtual device
temp1: +52.6 C (crit = +105.0 C)
Similar to imx_sc_thermal d2bc4dd91d.
No need to take care of thermal_remove_hwmon_sysfs() since
devm_thermal_add_hwmon_sysfs() (a wrapper around devres) is
used. See c7fc403e40.
Signed-off-by: Romain Naour <romain.naour@smile.fr>
Link: https://lore.kernel.org/r/20220218104725.2718904-1-romain.naour@smile.fr
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Add compatible to support LMh for sm8150 SoC.
sm8150 does not require explicit enabling for various LMh subsystems.
Add a variable indicating the same as match data which is set for sdm845.
Execute the piece of code enabling various LMh subsystems only if
enable algorithm match data is present.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220106173138.411097-2-thara.gopinath@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Register thermal zones as hwmon sensors to let userspace read
CPU temperatures using standard hwmon interface.
Acked-by: Amit Kucheria <amitk@kernel.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20220129180750.1882310-1-dmitry.baryshkov@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Do not call get_trip_hyst() from thermal_genl_cmd_tz_get_trip() if
the thermal zone does not define one.
Fixes: 1ce50e7d40 ("thermal: core: genetlink support for events/cmd/sampling")
Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
THERMAL_NETLINK depends on NET and since 'select' does not follow
any dependency chain, INTEL_HFI_THERMAL also should depend on NET.
Fix one Kconfig warning and 48 subsequent build errors:
WARNING: unmet direct dependencies detected for THERMAL_NETLINK
Depends on [n]: THERMAL [=y] && NET [=n]
Selected by [y]:
- INTEL_HFI_THERMAL [=y] && THERMAL [=y] && (X86 [=y] || X86_INTEL_QUARK [=n] || COMPILE_TEST [=y]) && CPU_SUP_INTEL [=y] && X86_THERMAL_VECTOR [=y]
Fixes: bd30cdfd9b ("thermal: intel: hfi: Notify user space for HFI events")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When building with CONFIG_THERMAL_NETLINK=n, there is a spew of warnings
along the lines of:
In file included from drivers/thermal/thermal_core.c:27:
In file included from drivers/thermal/thermal_core.h:15:
drivers/thermal/thermal_netlink.h:113:71: warning: declaration of 'struct cpu_capability' will not be visible outside of this function [-Wvisibility]
static inline int thermal_genl_cpu_capability_event(int count, struct cpu_capability *caps)
^
1 warning generated.
'struct cpu_capability' is not forward declared anywhere in the header.
As it turns out, this should really be 'struct thermal_genl_cpu_caps',
which silences the warning and makes the parameter types of the stub
match the full function.
Fixes: e4b1eb24ce ("thermal: netlink: Add a new event to notify CPU capabilities change")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace acpi_bus_get_device() that is going to be dropped with
acpi_fetch_acpi_dev().
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Don't call bitmap_weight() if the following code can get by
without it.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As the potential failure of the allocation, kmemdup() may return NULL.
Then, 'bin_attr_data_vault.private' will be NULL, but
'bin_attr_data_vault.size' is not 0, which is not consistent.
Therefore, it is better to check the return value of kmemdup() to
avoid the confusion.
Fixes: 0ba13c763a ("thermal/int340x_thermal: Export GDDV")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the hardware issues an HFI event, relay a notification to user space.
This allows user space to respond by reading performance and efficiency of
each CPU and take appropriate action.
For example, when the performance and efficiency of a CPU is 0, user space
can either offline the CPU or inject idle. Also, if user space notices a
downward trend in performance, it may proactively adjust power limits to
avoid future situations in which performance drops to 0.
To avoid excessive notifications, the rate is limited by one HZ per event.
To limit the netlink message size, send parameters for up to 16 CPUs in a
single message. If there are more than 16 CPUs, issue as many messages as
needed to notify the status of all CPUs.
In the HFI specification, both performance and efficiency capabilities are
defined in the [0, 255] range. The existing implementations of HFI hardware
do not scale the maximum values to 255. Since userspace cares about
capability values that are either 0 or show a downward/upward trend, this
fact does not matter much. Relative changes in capabilities are enough. To
comply with the thermal netlink ABI, scale both performance and efficiency
capabilities to the [0, 1023] interval.
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a new netlink event to notify change in CPU capabilities in terms of
performance and efficiency.
Firmware may change CPU capabilities as a result of thermal events in the
system or to account for changes in the TDP (thermal design power) level.
This notification type will allow user space to avoid running workloads
on certain CPUs or proactively adjust power limits to avoid future events.
The netlink message consists of a nested attribute
(THERMAL_GENL_ATTR_CPU_CAPABILITY) with three attributes:
* THERMAL_GENL_ATTR_CPU_CAPABILITY_ID (type u32):
-- logical CPU number
* THERMAL_GENL_ATTR_CPU_CAPABILITY_PERFORMANCE (type u32):
-- Scaled performance from 0-1023
* THERMAL_GENL_ATTR_CPU_CAPABILITY_EFFICIENCY (type u32):
-- Scaled efficiency from 0-1023
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When hardware wants to inform the operating system about updates in the HFI
table, it issues a package-level thermal event interrupt. For this,
hardware has new interrupt and status bits in the IA32_PACKAGE_THERM_
INTERRUPT and IA32_PACKAGE_THERM_STATUS registers. The existing thermal
throttle driver already handles thermal event interrupts: it initializes
the thermal vector of the local APIC as well as per-CPU and package-level
interrupt reporting. It also provides routines to service such interrupts.
Extend its functionality to also handle HFI interrupts.
The frequency of the thermal HFI interrupt is specific to each processor
model. On some processors, a single interrupt happens as soon as the HFI is
enabled and hardware will never update HFI capabilities afterwards. On
other processors, thermal and power constraints may cause thermal HFI
interrupts every tens of milliseconds.
To not overwhelm consumers of the HFI data, use delayed work to throttle
the rate at which HFI updates are processed. Use a dedicated workqueue to
not overload system_wq if hardware issues many HFI updates.
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
All CPUs in a package are represented in an HFI table. There exists an
HFI table per package. Thus, CPUs in a package need to coordinate to
initialize and access the table. Do such coordination during CPU hotplug.
Use the first CPU to come online in a package to initialize the HFI
instance and the data structure representing it. Other CPUs in the same
package need only to register or unregister themselves in that data
structure.
The HFI depends on both the package-level thermal management and the local
APIC thermal local vector. Thus, to ensure that a CPU coming online has an
associated HFI instance when the hardware issues an HFI event, enable the
HFI only after having enabled the local APIC thermal vector. The thermal
throttle driver takes care of the needed package-level initialization.
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Intel Hardware Feedback Interface provides guidance to the operating
system about the performance and energy efficiency capabilities of each
CPU in the system. Capabilities are numbers between 0 and 255 where a
higher number represents a higher capability. For each CPU, energy
efficiency and performance are reported as separate capabilities.
Hardware computes these capabilities based on the operating conditions of
the system such as power and thermal limits. These capabilities are shared
with the operating system in a table resident in memory. Each package in
the system has its own HFI instance. Every logical CPU in the package is
represented in the table. More than one logical CPUs may be represented in
a single table entry. When the hardware updates the table, it generates a
package-level thermal interrupt.
The size and format of the HFI table depend on the supported features and
can only be determined at runtime. To minimally initialize the HFI, parse
its features and allocate one instance per package of a data structure with
the necessary parameters to read and navigate a local copy (i.e., owned by
the driver) of individual HFI tables.
A subsequent changeset will provide per-CPU initialization and interrupt
handling.
Reviewed-by: Len Brown <len.brown@intel.com>
Co-developed by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add Raptor Lake PCI ID for processor thermal device.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add Raptor Lake ACPI IDs for DPTF devices.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Add new TSU driver and DT bindings for the Renesas RZ/G2L platform
(Biju Das).
- Fix missing check when calling reset_control_deassert() in the
rz2gl thermal driver (Biju Das).
- In preparation for FORTIFY_SOURCE performing compile-time and
run-time field bounds checking for memcpy(), avoid intentionally
writing across neighboring fields in the int340x thermal control
driver (Kees Cook).
- Fix RFIM mailbox write commands handling in the int340x thermal
control driver (Sumeet Pawnikar).
- Fix PM issue occurring in the iMX thermal control driver during
suspend/resume by implementing PM runtime support in it (Oleksij
Rempel).
- Add 'const' annotation to thermal_cooling_ops in the Intel
powerclamp driver (Rikard Falkeborn).
- Fix missing ADC bit set in the iMX8MP thermal driver to enable the
sensor (Paul Gerber).
- Drop unused local variable definition from tmon (ran jianping).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmHcgsISHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxGJ4P/1S6lW2SUA98YfEVt0V6HnnXRCFXRE+5
OANgcjKqrDy57ltHM56sFqym8KQb9If5vki9yKq9/crmUVg9YMFsaN4RCfuUIRE5
B+v/NHg+s8VTrcDgF1fINXbBtYVgUoXhbjTH6gdCLSiEDhTgFkEQFasrcrTQN7IX
uqaC33ZxoSk7e89Wb/roJushHfjylqpqxlFX8ME3aQyRxIkMVN2p3qddg33XwzMS
lQjtTLpL4fOtOZET+KwKxOG97VBDuGcH6bHdzNWHlcKWjI6cZYMS9OrxyMGNggUi
JF9N4Y4xxwQWLQ32YQPXv+xUS92A2fsGTGqw1g50PmZHh2hvRsODJEPopjO8jTd0
0QtmIT+GzfNaEcBbbS31eb0ePYc/DobbBlpJ8kMoe/WskWANl4I6TrFMTDceZj9X
iuzMzGvpr/3/L7tRDIY7qM0iKX1WNeMuOXNIDuD7cmM6of82hzjwOBvhAOf338nP
InZB6o0fRdM38Wypw0E17CHBAJTgXh2vCAgRvfOuDCqavZM21h1BBwNpKHfxa+5z
LsO8WRifkDHHNctSBsh5UoRUO/L6RnCB9SVaQUb8k19HfIY8ZoNjM+CPMAojCAxs
KOEqHYxF7hG7eEkDAIF624INJlxkntPEuQDI9TSj6wi2wc5cISG1tMi1p8M01yND
XVisUkJoZKVy
=Crok
-----END PGP SIGNATURE-----
Merge tag 'thermal-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add a new driver for Renesas RZ/G2L TSU, update a few existing
thermal control drivers and clean up the tmon utility.
Specifics:
- Add new TSU driver and DT bindings for the Renesas RZ/G2L platform
(Biju Das).
- Fix missing check when calling reset_control_deassert() in the
rz2gl thermal driver (Biju Das).
- In preparation for FORTIFY_SOURCE performing compile-time and
run-time field bounds checking for memcpy(), avoid intentionally
writing across neighboring fields in the int340x thermal control
driver (Kees Cook).
- Fix RFIM mailbox write commands handling in the int340x thermal
control driver (Sumeet Pawnikar).
- Fix PM issue occurring in the iMX thermal control driver during
suspend/resume by implementing PM runtime support in it (Oleksij
Rempel).
- Add 'const' annotation to thermal_cooling_ops in the Intel
powerclamp driver (Rikard Falkeborn).
- Fix missing ADC bit set in the iMX8MP thermal driver to enable the
sensor (Paul Gerber).
- Drop unused local variable definition from tmon (ran jianping)"
* tag 'thermal-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/drivers/int340x: Fix RFIM mailbox write commands
thermal/drivers/rz2gl: Add error check for reset_control_deassert()
thermal/drivers/imx8mm: Enable ADC when enabling monitor
thermal/drivers: Add TSU driver for RZ/G2L
dt-bindings: thermal: Document Renesas RZ/G2L TSU
thermal/drivers/intel_powerclamp: Constify static thermal_cooling_device_ops
thermal/drivers/imx: Implement runtime PM support
thermal: tools: tmon: remove unneeded local variable
thermal: int340x: Use struct_group() for memcpy() region
The existing mail mechanism only supports writing of workload types.
However, mailbox command for RFIM (cmd = 0x08) also requires write
operation which is ignored. This results in failing to store RFI
restriction.
Fixint this requires enhancing mailbox writes for non workload
commands too, so remove the check for MBOX_CMD_WORKLOAD_TYPE_WRITE
in mailbox write to allow this other write commands to be supoorted.
At the same time, however, we have to make sure that there is no
impact on read commands, by avoiding to write anything into the
mailbox data register.
To properly implement that, add two separate functions for mbox read
and write commands for the processor thermal workload command type.
This helps to distinguish the read and write workload command types
from each other while sending mbox commands.
Fixes: 5d6fbc96bd ("thermal/drivers/int340x: processor_thermal: Export additional attributes")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: 5.14+ <stable@vger.kernel.org> # 5.14+
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM cpufreq updates for 5.17-rc1 from Viresh Kumar:
"- Qcom cpufreq driver updates improve irq support (Ard Biesheuvel, Stephen Boyd,
and Vladimir Zapolskiy).
- Fixes double devm_remap for mediatek driver (Hector Yuan).
- Introduces thermal pressure helpers (Lukasz Luba)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: mediatek-hw: Fix double devm_remap in hotplug case
cpufreq: qcom-hw: Use optional irq API
cpufreq: qcom-hw: Set CPU affinity of dcvsh interrupts
cpufreq: qcom-hw: Fix probable nested interrupt handling
cpufreq: qcom-cpufreq-hw: Avoid stack buffer for IRQ name
arch_topology: Remove unused topology_set_thermal_pressure() and related
cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
thermal: cpufreq_cooling: Use new thermal pressure update function
arch_topology: Introduce thermal pressure update function
implementing PM runtime support (Oleksij Rempel)
- Add 'const' annotation to the thermal_cooling_ops in the Intel
powerclamp driver (Rikard Falkeborn)
- Add TSU driver and bindings for the RZ/G2L platform (Biju Das)
- Fix the missing ADC bit set on iMX8MP to enable the sensor (Paul
Gerber)
- Fix missing check when calling reset_control_deassert() (Biju Das)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmHEmTsACgkQqDIjiipP
6E/KMgf+MdzK/OTuxP+pf0yVqpC3WpH6lrNhBRC+mO4S0yAe8ZlYCFVVtvMftSrn
M2H8uxw74XlVkD8yi1kSYa1iVeD9jF+5ZViOybc83SKwryYEFI7tjxsjpaPFotxe
ZnL8y55hDC63YpwNeafGgW9M/JFehBP09zafakdVsvzzNGP8XKoYQcgyeofwNIrX
CRwU95K85k0eS2h9gPFPRWaNwZS4ubVEYr857oiGArX4/hEycvJ2UO+cH+zEmGjQ
QKJIEh5gQEBs2lKgcqUwzNbitm5lUG+rYRWD9omQ4QLtHK/fxZw2CMMVnNso/G5z
lslc8pLLwLh5QdR3v9tX4+OAxsV38A==
=uIVF
-----END PGP SIGNATURE-----
Merge tag 'thermal-v5.17-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal control material for 5.17-rc1 from Daniel Lezcano:
- Fix PM issue on the iMX driver when suspend/resume is happening by
implementing PM runtime support (Oleksij Rempel)
- Add 'const' annotation to the thermal_cooling_ops in the Intel
powerclamp driver (Rikard Falkeborn)
- Add TSU driver and bindings for the RZ/G2L platform (Biju Das)
- Fix missing ADC bit set on iMX8MP to enable the sensor (Paul Gerber)
- Fix missing check when calling reset_control_deassert() (Biju Das)
* tag 'thermal-v5.17-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal/drivers/rz2gl: Add error check for reset_control_deassert()
thermal/drivers/imx8mm: Enable ADC when enabling monitor
thermal/drivers: Add TSU driver for RZ/G2L
dt-bindings: thermal: Document Renesas RZ/G2L TSU
thermal/drivers/intel_powerclamp: Constify static thermal_cooling_device_ops
thermal/drivers/imx: Implement runtime PM support
If reset_control_deassert() fails, then we won't be able to access
the device registers. Therefore check the return code of
reset_control_deassert() and bail out in case of error.
While at it replace the parameter "&pdev->dev" -> "dev" in
devm_reset_control_get_exclusive().
Suggested-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/r/20211208164010.4130-1-biju.das.jz@bp.renesas.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The VCoRefLow CPU FIVR register definition for Tiger Lake is incorrect.
Current implementation reads it from MMIO offset 0x5A18 and bit
offset [12:14], but the actual correct register definition is from
bit offset [11:13].
Update to fix the bit offset.
Fixes: 473be51142 ("thermal: int340x: processor_thermal: Add RFIM driver")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: 5.14+ <stable@vger.kernel.org> # 5.14+
[ rjw: New subject, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The i.MX 8MP has a ADC_PD bit in the TMU_TER register that controls the
operating mode of the ADC:
* 0 means normal operating mode
* 1 means power down mode
When enabling/disabling the TMU, the ADC operating mode must be set
accordingly.
i.MX 8M Mini & Nano are lacking this bit.
Signed-off-by: Paul Gerber <Paul.Gerber@tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Fixes: 2b8f1f0337 ("thermal: imx8mm: Add i.MX8MP support")
Link: https://lore.kernel.org/r/20211122114225.196280-1-alexander.stein@ew.tq-group.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The RZ/G2L SoC incorporates a thermal sensor unit (TSU) that measures the
temperature inside the LSI.
The thermal sensor in this unit measures temperatures in the range from
−40 degree Celsius to 125 degree Celsius with an accuracy of ±3°C. The
TSU repeats measurement at 20 microseconds intervals and automatically
updates the results of measurement.
The TSU has no interrupts as well as no external pins.
This patch adds Thermal Sensor Unit(TSU) driver for RZ/G2L SoC.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20211130155757.17837-3-biju.das.jz@bp.renesas.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The only usage of powerclamp_cooling_ops is to pass its address to
thermal_cooling_device_register(), which takes a pointer to const struct
thermal_cooling_device_ops. Make it const to allow the compiler to put
it in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Link: https://lore.kernel.org/r/20211128214641.30953-1-rikard.falkeborn@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Starting with commit d92ed2c9d3 ("thermal: imx: Use driver's local
data to decide whether to run a measurement") this driver stared using
irq_enabled flag to make decision to power on/off the thermal
core. This triggered a regression, where after reaching critical
temperature, alarm IRQ handler set irq_enabled to false, disabled
thermal core and was not able read temperature and disable cooling
sequence.
In case the cooling device is "CPU/GPU freq", the system will run with
reduce performance until next reboot.
To solve this issue, we need to move all parts implementing hand made
runtime power management and let it handle actual runtime PM framework.
Fixes: d92ed2c9d3 ("thermal: imx: Use driver's local data to decide whether to run a measurement")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Petr Beneš <petr.benes@ysoft.com>
Link: https://lore.kernel.org/r/20211117103426.81813-1-o.rempel@pengutronix.de
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
In preparation for FORTIFY_SOURCE performing compile-time and
run-time field bounds checking for memcpy(), avoid intentionally
writing across neighboring fields.
Use struct_group() in struct art around members weight, and
ac[0-9]_max, so they can be referenced together. This will allow
memcpy() and sizeof() to more easily reason about sizes, improve
readability, and avoid future warnings about writing beyond the
end of weight.
"pahole" shows no size nor member offset changes to struct art.
"objdump -d" shows no meaningful object code changes (i.e. only
source line number induced differences).
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new function and remove old conversion code.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
During the suspend is in process, thermal_zone_device_update bails out
thermal zone re-evaluation for any sensor trip violation without
setting next valid trip to that sensor. It assumes during resume
it will re-evaluate same thermal zone and update trip. But when it is
in suspend temperature goes down and on resume path while updating
thermal zone if temperature is less than previously violated trip,
thermal zone set trip function evaluates the same previous high and
previous low trip as new high and low trip. Since there is no change
in high/low trip, it bails out from thermal zone set trip API without
setting any trip. It leads to a case where sensor high trip or low
trip is disabled forever even though thermal zone has a valid high
or low trip.
During thermal zone device init, reset thermal zone previous high
and low trip. It resolves above mentioned scenario.
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manafm@codeaurora.org>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
32-bit processors cannot generally access 64-bit MMIO registers
atomically, and it is unknown in which order the two halves of
this registers would need to be read:
drivers/thermal/intel/int340x_thermal/processor_thermal_mbox.c: In function 'send_mbox_cmd':
drivers/thermal/intel/int340x_thermal/processor_thermal_mbox.c:79:37: error: implicit declaration of function 'readq'; did you mean 'readl'? [-Werror=implicit-function-declaration]
79 | *cmd_resp = readq((void __iomem *) (proc_priv->mmio_base + MBOX_OFFSET_DATA));
| ^~~~~
| readl
The driver already does not build for anything other than x86,
so limit it further to x86-64.
Fixes: aeb58c860d ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit aeb58c860d ("thermal/drivers/int340x: processor_thermal: Suppot
64 bit RFIM responses") started using 'readq()' to read 64-bit status
responses from the int340x hardware.
That's all fine and good, but on 32-bit targets a 64-bit 'readq()' is
ambiguous, since it's no longer an atomic access. Some hardware might
require 64-bit accesses, and other hardware might want low word first or
high word first.
It's quite likely that the driver isn't relevant in a 32-bit environment
any more, and there's a patch floating around to just make it depend on
X86_64, but let's make it buildable on x86-32 anyway.
The driver previously just read the low 32 bits, so the hardware
certainly is ok with 32-bit reads, and in a little-endian environment
the low word first model is the natural one.
So just add the include for the 'io-64-nonatomic-lo-hi.h' version.
Fixes: aeb58c860d ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use pr_warn_once() instead of pr_warn() to print the user space
governor deprecation message in user_space_bind() to reduce the
kernel log noise.
Fixes: 0275c9fb0e ("thermal/core: Make the userspace governor deprecated")
Reported-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
of_parse_thermal_zones() parses the thermal-zones node and registers a
thermal_zone device for each subnode. However, if a thermal zone is
consuming a thermal sensor and that thermal sensor device hasn't probed
yet, an attempt to set trip_point_*_temp for that thermal zone device
can cause a NULL pointer dereference. Fix it.
console:/sys/class/thermal/thermal_zone87 # echo 120000 > trip_point_0_temp
...
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
...
Call trace:
of_thermal_set_trip_temp+0x40/0xc4
trip_point_temp_store+0xc0/0x1dc
dev_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec
__arm64_sys_write+0x20/0x30
el0_svc_common.llvm.7279915941325364641+0xbc/0x1bc
do_el0_svc+0x28/0xa0
el0_svc+0x14/0x24
el0_sync_handler+0x88/0xec
el0_sync+0x1c0/0x200
While at it, fix the possible NULL pointer dereference in other
functions as well: of_thermal_get_temp(), of_thermal_set_emul_temp(),
of_thermal_get_trend().
Suggested-by: David Collins <quic_collinsd@quicinc.com>
Signed-off-by: Subbaraman Narayanamurthy <quic_subbaram@quicinc.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some of the RFIM mail box command returns 64 bit values. So enhance
mailbox interface to return 64 bit values and use them for RFIM
commands.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: 5d6fbc96bd ("thermal/drivers/int340x: processor_thermal: Export additional attributes")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cooling devices have their cooling device set_cur_state
read-writable all the time in the sysfs directory, thus allowing the
userspace to act on it.
The thermal framework is wrongly used by userspace as a power capping
framework by acting on the cooling device opaque state. This one then
competes with the in-kernel governor decision.
We have seen in out-of-tree kernels, a big number of devices which are
abusely declaring themselves as cooling device just to act on their
power.
The role of the thermal framework is to protect the junction
temperature of the silicon. Letting the userspace to play with a
cooling device is invalid and potentially dangerous.
The powercap framework is the right framework to do power capping and
moreover it deals with the aggregation via the dev pm qos.
As the userspace governor is marked deprecated and about to be
removed, there is no point to keep this file writable also in the
future.
Emit a warning and deprecate the interface.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20211019163506.2831454-2-daniel.lezcano@linaro.org
The userspace governor is sending temperature when polling is active
and trip point crossed events. Nothing else.
AFAICT, this governor is used with custom kernels making the userspace
governor co-existing with another governor on the same thermal zone
because there was no notification mechanism, implying a hack in the
framework to support this configuration.
The new netlink thermal notification is able to provide more
information than the userspace governor and give the opportunity to
the users of this governor to replace it by a dedicated notification
framework.
The userspace governor will be removed as its usage is no longer
needed.
Add a warning message to tell the userspace governor is deprecated.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20211019163506.2831454-1-daniel.lezcano@linaro.org
When the driver resumes, the tcc offset is set back to its previous
value. But this only works if the value was user defined as otherwise
the offset isn't saved. This asymmetric logic is harder to maintain and
introduced some issues.
Improve the logic by saving the tcc offset in a suspend op, so the right
value is always restored after a resume.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pI andruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20210909085613.5577-3-atenart@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The tsadc node in rk356x.dtsi has more resets then currently supported
by the rockchip_thermal.c driver, so use
devm_reset_control_array_get() to reset them all.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20210930110517.14323-3-jbx6244@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The function can loop and lock the system if for whatever reason the bit
for the target sensor is NEVER valid. This is the case if a sensor is
disabled by the factory and the valid bit is never reported as actually
valid. Add a timeout check and exit if a timeout occurs. As this is
a very rare condition, handle the timeout only if the first read fails.
While at it also rework the function to improve readability and convert
to poll_timeout generic macro.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211007172859.583-1-ansuelsmth@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
When device_register() return failed, program will goto out_kfree_type
to release 'cdev->device' by put_device(). That will call thermal_release()
to free 'cdev'. But the follow-up processes access 'cdev' continually.
That trggers the UAF bug.
====================================================================
BUG: KASAN: use-after-free in __thermal_cooling_device_register+0x75b/0xa90
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
dump_stack_lvl+0xe2/0x152
print_address_description.constprop.0+0x21/0x140
? __thermal_cooling_device_register+0x75b/0xa90
kasan_report.cold+0x7f/0x11b
? __thermal_cooling_device_register+0x75b/0xa90
__thermal_cooling_device_register+0x75b/0xa90
? memset+0x20/0x40
? __sanitizer_cov_trace_pc+0x1d/0x50
? __devres_alloc_node+0x130/0x180
devm_thermal_of_cooling_device_register+0x67/0xf0
max6650_probe.cold+0x557/0x6aa
......
Freed by task 258:
kasan_save_stack+0x1b/0x40
kasan_set_track+0x1c/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0x109/0x140
kfree+0x117/0x4c0
thermal_release+0xa0/0x110
device_release+0xa7/0x240
kobject_put+0x1ce/0x540
put_device+0x20/0x30
__thermal_cooling_device_register+0x731/0xa90
devm_thermal_of_cooling_device_register+0x67/0xf0
max6650_probe.cold+0x557/0x6aa [max6650]
Do not use 'cdev' again after put_device() to fix the problem like doing
in thermal_zone_device_register().
[dlezcano]: as requested by Rafael, change the affectation into two statements.
Fixes: 5848376181 ("thermal/drivers/core: Use a char pointer for the cooling device name")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20211015024504.947520-1-william.xuanziyang@huawei.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
If both dev_set_name() and device_register() failed, then null pointer
dereference occurs in thermal_release() which will use strncmp() to
compare the name.
So fix it by adding dev_set_name() return value check.
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Link: https://lore.kernel.org/r/20211015083230.67658-1-songyuanzheng@huawei.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
In production hardware the calibration values used to convert register
values to temperatures can be read from hardware. While pre-production
hardware still depends on pseudo values hard-coded in the driver.
Add support for reading out calibration values from hardware if it's
fused. The presence of fused calibration is indicated in the THSCP
register.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20211014103816.1939782-3-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Prepare for reading the THCODE and PTAT values from hardware fuses by
storing the values used during calculations in the drivers private
data structures.
As the values are now stored directly in the private data structures
there is no need to keep track of the TSC channel id as its only usage
was to lookup the THCODE row, drop it.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20211014103816.1939782-2-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The variant of the ADC Thermal Monitor block found in e.g. PM8998 is
"HC", add support for this variant to the ADC TM5 driver in order to
support using VADC channels as thermal_zones on SDM845 et al.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20211005032531.2251928-3-bjorn.andersson@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The slope of the temperature increase or decrease can be high and when
the temperature crosses the trip point, there could be a significant
difference between the trip temperature and the measured temperatures.
That forces the userspace to read the temperature back right after
receiving a trip violation notification.
In order to be efficient, give the temperature which resulted in the
trip violation.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20211001223323.1836640-1-daniel.lezcano@linaro.org
The only usage of thermal_mmio_ops is to pass its address to
devm_thermal_zone_of_sensor_register(), which has a pointer to const
struct thermal_zone_of_device_ops as argument. Make it const to allow
the compiler to put it in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Acked-by: Talel Shenhar <talel@amazon.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210920203849.32136-1-rikard.falkeborn@gmail.com
This check has a signedness bug and does not work. If "length" is
larger than "PAGE_SIZE" then "PAGE_SIZE - length" is not negative
but instead it is a large unsigned value. Fortunately, Takashi Iwai
changed this code to use scnprint() instead of snprintf() so now
"length" is never larger than "PAGE_SIZE - 1" and the check can be
removed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
'cpu_clamping_mask' is a bitmap. So use 'bitmap_zalloc()' and
'bitmap_free()' to simplify code, improve the semantic of the code and
avoid some open-coded arithmetic in allocator arguments.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some devices can have some thermal sensors disabled from the
factory. The current two irq handler functions check all the sensor by
default and the check if the sensor was actually registered is
wrong. The tzd is actually never set if the registration fails hence
the IS_ERR check is wrong.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210907212543.20220-1-ansuelsmth@gmail.com
After printing the list of thermal governors, then this function prints
a newline character. The problem is that "size" has not been updated
after printing the last governor. This means that it can write one
character (the NUL terminator) beyond the end of the buffer.
Get rid of the "size" variable and just use "PAGE_SIZE - count" directly.
Fixes: 1b4f48494e ("thermal: core: group functions related to governor handling")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210916131342.GB25094@kili
After upgrading to Linux 5.13.3 I noticed my laptop would shutdown due
to overheat (when it should not). It turned out this was due to commit
fe6a6de669 ("thermal/drivers/int340x/processor_thermal: Fix tcc setting").
What happens is this drivers uses a global variable to keep track of the
tcc offset (tcc_offset_save) and uses it on resume. The issue is this
variable is initialized to 0, but is only set in
tcc_offset_degree_celsius_store, i.e. when the tcc offset is explicitly
set by userspace. If that does not happen, the resume path will set the
offset to 0 (in my case the h/w default being 3, the offset would become
too low after a suspend/resume cycle).
The issue did not arise before commit fe6a6de669, as the function
setting the offset would return if the offset was 0. This is no longer
the case (rightfully).
Fix this by not applying the offset if it wasn't saved before, reverting
back to the old logic. A better approach will come later, but this will
be easier to apply to stable kernels.
The logic to restore the offset after a resume was there long before
commit fe6a6de669, but as a value of 0 was considered invalid I'm
referencing the commit that made the issue possible in the Fixes tag
instead.
Fixes: fe6a6de669 ("thermal/drivers/int340x/processor_thermal: Fix tcc setting")
Cc: stable@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pI andruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20210909085613.5577-2-atenart@kernel.org
tegra by adding a dependency on ARCH_TEGRA along with COMPILE_TEST
(Dmitry Osipenko)
- Fix the error code for the exynos when devm_get_clk() fails (Dan
Carpenter)
- Add the TCC cooling support for AlderLake platform (Sumeet Pawnikar)
- Add support for hardware trip points for the rcar gen3 thermal
driver and store TSC id as unsigned int (Niklas Söderlund)
- Replace the deprecated CPU-hotplug functions get_online_cpus() and
put_online_cpus (Sebastian Andrzej Siewior)
- Add the thermal tools directory in the MAINTAINERS file (Daniel
Lezcano)
- Fix the Makefile and the cross compilation flags for the userspace
'tmon' tool (Rolf Eike Beer)
- Allow to use the IMOK independently from the GDDV on Int340x (Sumeet
Pawnikar)
- Fix the stub thermal_cooling_device_register() function prototype
which does not match the real function (Arnd Bergmann)
- Make the thermal trip point optional in the DT bindings (Maxime
Ripard)
- Fix a typo in a comment in the core code (Geert Uytterhoeven)
- Reduce the verbosity of the trace in the SoC thermal tegra driver
(Dmitry Osipenko)
- Add the support for the LMh (Limit Management hardware) driver on
the QCom platforms (Thara Gopinath)
- Allow processing of HWP interrupt by adding a weak function in the
Intel driver (Srinivas Pandruvada)
- Prevent an abort of the sensor probe is a channel is not used
(Matthias Kaehlcke)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmE7yAcACgkQqDIjiipP
6E8WKwgAmI+kdOURCz1CtCIv6FnQFx20suxZidfATPULBPZ8zcHqdjxJECXJpvDz
y8T8dGAPRqXmMBJm2vF/qVraHaDJvscHXaflhsurhyEp0tiGptNeugBosso0jDEQ
JFrQ6Rny/NCmNMOkWW4YlSfkik0zQ43xlHocy3UfOKfcshhYWGsPTbIXdA6wH/zh
3tFW8j327tkyenrKZiaioJOSX88Oyy7Ft8l5k0HP+hPYkrvihWDJuEfEEmpfkdA7
Q70Lacf2QtFP8mUfGKpQxURSgndjphDPU7tzP5NqqWvLuT94aTspvJx4ywVwK8Iv
o+jNvR33yBIs5FXXB7B1ymkNE1h8uQ==
=blIQ
-----END PGP SIGNATURE-----
Merge tag 'thermal-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Add the tegra3 thermal sensor and fix the compilation testing on
tegra by adding a dependency on ARCH_TEGRA along with COMPILE_TEST
(Dmitry Osipenko)
- Fix the error code for the exynos when devm_get_clk() fails (Dan
Carpenter)
- Add the TCC cooling support for AlderLake platform (Sumeet Pawnikar)
- Add support for hardware trip points for the rcar gen3 thermal driver
and store TSC id as unsigned int (Niklas Söderlund)
- Replace the deprecated CPU-hotplug functions get_online_cpus() and
put_online_cpus (Sebastian Andrzej Siewior)
- Add the thermal tools directory in the MAINTAINERS file (Daniel
Lezcano)
- Fix the Makefile and the cross compilation flags for the userspace
'tmon' tool (Rolf Eike Beer)
- Allow to use the IMOK independently from the GDDV on Int340x (Sumeet
Pawnikar)
- Fix the stub thermal_cooling_device_register() function prototype
which does not match the real function (Arnd Bergmann)
- Make the thermal trip point optional in the DT bindings (Maxime
Ripard)
- Fix a typo in a comment in the core code (Geert Uytterhoeven)
- Reduce the verbosity of the trace in the SoC thermal tegra driver
(Dmitry Osipenko)
- Add the support for the LMh (Limit Management hardware) driver on the
QCom platforms (Thara Gopinath)
- Allow processing of HWP interrupt by adding a weak function in the
Intel driver (Srinivas Pandruvada)
- Prevent an abort of the sensor probe is a channel is not used
(Matthias Kaehlcke)
* tag 'thermal-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal/drivers/qcom/spmi-adc-tm5: Don't abort probing if a sensor is not used
thermal/drivers/intel: Allow processing of HWP interrupt
dt-bindings: thermal: Add dt binding for QCOM LMh
thermal/drivers/qcom: Add support for LMh driver
firmware: qcom_scm: Introduce SCM calls to access LMh
thermal/drivers/tegra-soctherm: Silence message about clamped temperature
thermal: Spelling s/scallbacks/callbacks/
dt-bindings: thermal: Make trips node optional
thermal/core: Fix thermal_cooling_device_register() prototype
thermal/drivers/int340x: Use IMOK independently
tools/thermal/tmon: Add cross compiling support
thermal/tools/tmon: Improve the Makefile
MAINTAINERS: Add missing userspace thermal tools to the thermal section
thermal/drivers/intel_powerclamp: Replace deprecated CPU-hotplug functions.
thermal/drivers/rcar_gen3_thermal: Store TSC id as unsigned int
thermal/drivers/rcar_gen3_thermal: Add support for hardware trip points
drivers/thermal/intel: Add TCC cooling support for AlderLake platform
thermal/drivers/exynos: Fix an error code in exynos_tmu_probe()
thermal/drivers/tegra: Correct compile-testing of drivers
thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
adc_tm5_register_tzd() registers the thermal zone sensors for all
channels of the thermal monitor. If the registration of one channel
fails the function skips the processing of the remaining channels
and returns an error, which results in _probe() being aborted.
One of the reasons the registration could fail is that none of the
thermal zones is using the channel/sensor, which hardly is a critical
error (if it is an error at all). If this case is detected emit a
warning and continue with processing the remaining channels.
Fixes: ca66dca5ed ("thermal: qcom: add support for adc-tm5 PMIC thermal monitor")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reported-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210823134726.1.I1dd23ddf77e5b3568625d80d6827653af071ce19@changeid
Add a weak function to process HWP (Hardware P-states) notifications and
move updating HWP_STATUS MSR to this function.
This allows HWP interrupts to be processed by the intel_pstate driver in
HWP mode by overriding the implementation.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210820024006.2347720-1-srinivas.pandruvada@linux.intel.com
Merge more updates from Andrew Morton:
"147 patches, based on 7d2a07b769.
Subsystems affected by this patch series: mm (memory-hotplug, rmap,
ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
selftests, ipc, and scripts"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
scripts: check_extable: fix typo in user error message
mm/workingset: correct kernel-doc notations
ipc: replace costly bailout check in sysvipc_find_ipc()
selftests/memfd: remove unused variable
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
configs: remove the obsolete CONFIG_INPUT_POLLDEV
prctl: allow to setup brk for et_dyn executables
pid: cleanup the stale comment mentioning pidmap_init().
kernel/fork.c: unexport get_{mm,task}_exe_file
coredump: fix memleak in dump_vma_snapshot()
fs/coredump.c: log if a core dump is aborted due to changed file permissions
nilfs2: use refcount_dec_and_lock() to fix potential UAF
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
trap: cleanup trap_init()
init: move usermodehelper_enable() to populate_rootfs()
...
HZ unit conversion macros are available in units.h, use them and remove
the duplicate definition.
The new macro uses a unsigned long type which is already the type in the
current code via the 'freq' variable.
Link: https://lkml.kernel.org/r/20210816114732.1834145-4-daniel.lezcano@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Christian Eggers <ceggers@arri.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Cc: Peter Meerwald <pmeerw@pmeerw.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Highlights:
- Move all the Intel drivers into their own subdir(s) (mostly Kate's work)
- New meraki-mx100 platform driver
- Asus WMI driver enhancements, including
/sys/firmware/acpi/platform_profile support
- New BIOS SAR driver for Intel M.2 WWAM modems
- Alder Lake support for the Intel PMC driver
- A whole bunch of cleanups + fixes all over the place
The following is an automated git shortlog grouped by driver:
BIOS SAR driver for Intel M.2 Modem:
- BIOS SAR driver for Intel M.2 Modem
ISST:
- use semi-colons instead of commas
- Fix optimization with use of numa
Replace deprecated CPU-hotplug functions.:
- Replace deprecated CPU-hotplug functions.
Update Mario Limonciello's email address in the docs:
- Update Mario Limonciello's email address in the docs
acer-wmi:
- Add Turbo Mode support for Acer PH315-53
add meraki-mx100 platform driver:
- add meraki-mx100 platform driver
asus-nb-wmi:
- Add tablet_mode_sw=lid-flip quirk for the TP200s
- Allow configuring SW_TABLET_MODE method with a module option
asus-wmi:
- Fix "unsigned 'retval' is never less than zero" smatch warning
- Delete impossible condition
- Add support for platform_profile
- Add egpu enable method
- Add dgpu disable method
- Add panel overdrive functionality
dell-smbios:
- Remove unused dmi_system_id table
dell-smbios-wmi:
- Add missing kfree in error-exit from run_smbios_call
- Avoid false-positive memcpy() warning
dell-smo8800:
- Convert to be a platform driver
dual_accel_detect:
- Use the new i2c_acpi_client_count() helper
gigabyte-wmi:
- add support for B450M S2H V2
- add support for X570 GAMING X
hp_accel:
- Convert to be a platform driver
- Remove _INI method call
i2c:
- acpi: Add an i2c_acpi_client_count() helper function
i2c-multi-instantiate:
- Use the new i2c_acpi_client_count() helper
ideapad-laptop:
- Fix Legion 5 Fn lock LED
intel-hid:
- Move to intel sub-directory
intel-rst:
- Move to intel sub-directory
intel-smartconnect:
- Move to intel sub-directory
intel-uncore-frequency:
- Move to intel sub-directory
intel-vbtn:
- Move to intel sub-directory
intel-wmi-sbl-fw-update:
- Move to intel sub-directory
intel-wmi-thunderbolt:
- Move to intel sub-directory
intel_atomisp2:
- Move to intel sub-directory
intel_bxtwc_tmu:
- Move to intel sub-directory
intel_cht_int33fe:
- Use the new i2c_acpi_client_count() helper
intel_chtdc_ti_pwrbtn:
- Move to intel sub-directory
intel_int0002_vgpio:
- Move to intel sub-directory
intel_mrfld_pwrbtn:
- Move to intel sub-directory
intel_oaktrail:
- Move to intel sub-directory
intel_pmc_core:
- Move to intel sub-directory
- Prevent possibile overflow
intel_pmt_telemetry:
- Ignore zero sized entries
intel_punit_ipc:
- Move to intel sub-directory
intel_scu_ipc:
- Fix doc of intel_scu_ipc_dev_command_with_size()
intel_speed_select_if:
- Move to intel sub-directory
intel_telemetry:
- Move to intel sub-directory
intel_turbo_max_3:
- Move to intel sub-directory
lg-laptop:
- Use correct event for keyboard backlight FN-key
- Use correct event for touchpad toggle FN-key
- Support for battery charge limit on newer models
platform/mellanox:
- mlxbf-pmc: fix kernel-doc notation
platform/surface:
- aggregator: Use y instead of objs in Makefile
- surface3_power: Use i2c_acpi_get_i2c_resource() helper
platform/x86/intel:
- pmc/core: Add GBE Package C10 fix for Alder Lake PCH
- pmc/core: Add Alder Lake low power mode support for pmc core
- pmc/core: Add Latency Tolerance Reporting (LTR) support to Alder Lake
- pmc/core: Add Alderlake support to pmc core driver
- int3472: Use y instead of objs in Makefile
- pmt: Use y instead of objs in Makefile
- int33fe: Use y instead of objs in Makefile
- Move Intel PMT drivers to new subfolder
thermal/drivers/intel:
- Move intel_menlow to thermal drivers
think-lmi:
- add debug_cmd
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmEw1KAUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9wk7Qf/dsaaDgx7aC6DfKzdcMgfeLIdTaGm
a6svNXM2t/JFdvhjYzxA+4QlQgco7zkN06iRlWbObEonSUsHlGlwEOHX60VgcopO
qJaqnznmfdXUocFTnA+5acJXabNaw7xkKHS0K61UWgk+mm6aMuygpKxULnNTa4X+
p3HoU6uXFckpoA/Jstzo5UfegNYhg11bflNd7XN4F3rMCbbNHAsWlf4oVr2YsEHa
wECW+1e8wZl4BInUzoXQhilRoybJWXWJ8sLsvQfDXLs9aNoLdDqu9p0MuXEW5QqE
wNt26SNNAP2L49BD6kaJszV5Ry/jNSEtVkWwkrzHbGTZoOEyMYas8pm6Uw==
=iK1W
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"Highlights:
- Move all the Intel drivers into their own subdir(s) (mostly Kate's
work)
- New meraki-mx100 platform driver
- Asus WMI driver enhancements, including support for
/sys/firmware/acpi/platform_profile
- New BIOS SAR driver for Intel M.2 WWAM modems
- Alder Lake support for the Intel PMC driver
- A whole bunch of cleanups + fixes all over the place"
* tag 'platform-drivers-x86-v5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (65 commits)
platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call
platform/x86: dell-smbios-wmi: Avoid false-positive memcpy() warning
platform/x86: ISST: use semi-colons instead of commas
platform/x86: asus-wmi: Fix "unsigned 'retval' is never less than zero" smatch warning
platform/x86: asus-wmi: Delete impossible condition
platform/x86: hp_accel: Convert to be a platform driver
platform/x86: hp_accel: Remove _INI method call
platform/mellanox: mlxbf-pmc: fix kernel-doc notation
platform/x86/intel: pmc/core: Add GBE Package C10 fix for Alder Lake PCH
platform/x86/intel: pmc/core: Add Alder Lake low power mode support for pmc core
platform/x86/intel: pmc/core: Add Latency Tolerance Reporting (LTR) support to Alder Lake
platform/x86/intel: pmc/core: Add Alderlake support to pmc core driver
platform/x86: intel-wmi-thunderbolt: Move to intel sub-directory
platform/x86: intel-wmi-sbl-fw-update: Move to intel sub-directory
platform/x86: intel-vbtn: Move to intel sub-directory
platform/x86: intel_oaktrail: Move to intel sub-directory
platform/x86: intel_int0002_vgpio: Move to intel sub-directory
platform/x86: intel-hid: Move to intel sub-directory
platform/x86: intel_atomisp2: Move to intel sub-directory
platform/x86: intel_speed_select_if: Move to intel sub-directory
...
Add a weak function to process HWP (Hardware P-states) notifications and
move updating HWP_STATUS MSR to this function.
This allows HWP interrupts to be processed by the intel_pstate driver in
HWP mode by overriding the implementation.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Driver enabling various pieces of Limits Management Hardware(LMh) for cpu
cluster0 and cpu cluster1 namely kick starting monitoring of temperature,
current, battery current violations, enabling reliability algorithm and
setting up various temperature limits.
The following has been explained in the cover letter. I am including this
here so that this remains in the commit message as well.
LMh is a hardware infrastructure on some Qualcomm SoCs that can enforce
temperature and current limits as programmed by software for certain IPs
like CPU. On many newer LMh is configured by firmware/TZ and no programming
is needed from the kernel side. But on certain SoCs like sdm845 the
firmware does not do a complete programming of the h/w. On such soc's
kernel software has to explicitly set up the temperature limits and turn on
various monitoring and enforcing algorithms on the hardware.
Tested-by: Steev Klimaszewski <steev@kali.org> # Lenovo Yoga C630
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210809191605.3742979-3-thara.gopinath@linaro.org
Moved drivers/platform/x86/intel_menlow.c to drivers/thermal/intel.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20210816035356.1955982-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The Tegra soctherm driver prints message about the clamped temperature
trip each time when thermal core disables the low/high trip. The message
is confusing and creates illusion that driver is malfunctioning. Turn that
noisy info message into a debug message.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210712002353.17276-1-digetx@gmail.com
Some chrome platform requires IMOK method in coreboot. But these platforms
don't use GDDV data vault in coreboot. As per current code flow, to enable
and use IMOK only, we need to have GDDV support as well in coreboot. This
patch removes the dependency for IMOK from GDDV to enable and use IMOK
independently.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210716163946.3142-1-sumeet.r.pawnikar@intel.com
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amitk@kernel.org>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210803141621.780504-20-bigeasy@linutronix.de
The TSC id and number of TSC ids should be stored as unsigned int as
they can't be negative. Fix the datatype of the loop counter 'i' and
rcar_gen3_thermal_tsc.id to reflect this.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210804091818.2196806-3-niklas.soderlund+renesas@ragnatech.se
All supported hardware except V3U is capable of generating interrupts
to the CPU when the temperature go below or above a set value. Use this
to implement support for the set_trip() feature of the thermal core on
supported hardware.
The V3U have its interrupts routed to the ECM module and therefore can
not be used to implement set_trip() as the driver can't be made aware of
when the interrupt triggers.
Each TSC is capable of tracking up-to three different temperatures while
only two are needed to implement the tracking of the thermal window.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210804091818.2196806-2-niklas.soderlund+renesas@ragnatech.se
This error path return success but it should propagate the negative
error code from devm_clk_get().
Fixes: 6c247393cf ("thermal: exynos: Add TMU support for Exynos7 SoC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210810084413.GA23810@kili
All Tegra thermal drivers support compile-testing, but the drivers are
not available for compile-testing because the whole Kconfig meny entry
depends on ARCH_TEGRA, missing the alternative COMPILE_TEST dependency
option. Correct the Kconfig entry.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210617072403.3487-1-digetx@gmail.com
All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
monitors temperature and voltage of the SoC. Sensors control CPU frequency
throttling, which is activated by hardware once preprogrammed temperature
level is breached, they also send signal to Power Management controller to
perform emergency shutdown on a critical overheat of the SoC die. Add
driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor.
Tested-by: Andreas Westman Dorcsak <hedmoo@yahoo.com> # Asus TF700T
Tested-by: Maxim Schwalm <maxim.schwalm@gmail.com> # Asus TF700T
Tested-by: Svyatoslav Ryhel <clamor95@gmail.com> # Asus TF201T
Tested-by: Ihor Didenko <tailormoon@rambler.ru> # Asus TF300T
Tested-by: Ion Agorria <ion@agorria.com> # Asus TF201T
Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya
Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210616190417.32214-4-digetx@gmail.com
- Add missing MODULE_DEVICE_TABLE for the Spreadtrum sensor (Chunyan
Zhang)
- Export additionnal attributes for the int340x thermal processor
(Srinivas Pandruvada)
- Add SC7280 compatible for the tsens driver (Rajeshwari Ravindra
Kamble)
- Fix kernel documentation for thermal_zone_device_unregister() and
use devm_platform_get_and_ioremap_resource() (Yang Yingliang)
- Fix coefficient calculations for the rcar_gen3 sensor driver (Niklas
Söderlund)
- Fix shadowing variable rcar_gen3_ths_tj_1 (Geert Uytterhoeven)
- Add missing of_node_put() for the iMX and Spreadtrum sensors
(Krzysztof Kozlowski)
- Add tegra3 thermal sensor DT bindings (Dmitry Osipenko)
- Stop the thermal zone monitoring when unregistering it to prevent a
temperature update without the 'get_temp' callback (Dmitry Osipenko)
- Add rk3568 DT bindings, convert bindings to yaml schemas and add the
corresponding compatible in the Rockchip sensor (Ezequiel Garcia)
- Add the sc8180x compatible for the Qualcomm tsensor (Bjorn Andersson)
- Use the find_first_zero_bit() function instead of custom code (Andy
Shevchenko)
- Fix the kernel doc for the device cooling device (Yang Li)
- Reorg the processor thermal int340x to set the scene for the PCI
mmio driver (Srinivas Pandruvada)
- Add PCI MMIO driver for the int340x processor thermal driver
(Srinivas Pandruvada)
- Add hwmon sensors for the mediatek sensor (Frank Wunderlich)
- Fix warning for return value reported by Smatch for the int340x
thermal processor (Srinivas Pandruvada)
- Fix wrong register access and decoding for the int340x thermal
processor (Srinivas Pandruvada)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmDkbdAACgkQqDIjiipP
6E8pTwgAgc4RMdSBdNMVWP1Lc3Gprmg8uXMhma9ZlmJD+l3h4b4P+Zm7HjU+SPHI
emvvqHgiWGw/ta/Fuhx9XnqTUjZznG3gMYohnfKe7jPgVTxmud+Yf0/E3dRrDNWl
WNolS8rv4dLf1mjR+vGZ63KasK0Rz5Z4YxDV4kOvh+/VUhqg3XpZ1OTKOh/B9IWX
pUTedq7hIZ44ZMINcwObvLNTeaoEhPNpgA4Rs07vQPYugY0V61HszqR/Mu+l8Rgp
LiE8NUBANJ+8+wydHe/vP6lvOthh7YGSx4091rUe+X17qgfBDKCTsOyECnZ/UX+r
aB1MaTAnr1H0dfM49yeoFcMSPc1rGA==
=q0nG
-----END PGP SIGNATURE-----
Merge tag 'thermal-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Add rk3568 sensor support (Finley Xiao)
- Add missing MODULE_DEVICE_TABLE for the Spreadtrum sensor (Chunyan
Zhang)
- Export additionnal attributes for the int340x thermal processor
(Srinivas Pandruvada)
- Add SC7280 compatible for the tsens driver (Rajeshwari Ravindra
Kamble)
- Fix kernel documentation for thermal_zone_device_unregister() and use
devm_platform_get_and_ioremap_resource() (Yang Yingliang)
- Fix coefficient calculations for the rcar_gen3 sensor driver (Niklas
Söderlund)
- Fix shadowing variable rcar_gen3_ths_tj_1 (Geert Uytterhoeven)
- Add missing of_node_put() for the iMX and Spreadtrum sensors
(Krzysztof Kozlowski)
- Add tegra3 thermal sensor DT bindings (Dmitry Osipenko)
- Stop the thermal zone monitoring when unregistering it to prevent a
temperature update without the 'get_temp' callback (Dmitry Osipenko)
- Add rk3568 DT bindings, convert bindings to yaml schemas and add the
corresponding compatible in the Rockchip sensor (Ezequiel Garcia)
- Add the sc8180x compatible for the Qualcomm tsensor (Bjorn Andersson)
- Use the find_first_zero_bit() function instead of custom code (Andy
Shevchenko)
- Fix the kernel doc for the device cooling device (Yang Li)
- Reorg the processor thermal int340x to set the scene for the PCI mmio
driver (Srinivas Pandruvada)
- Add PCI MMIO driver for the int340x processor thermal driver
(Srinivas Pandruvada)
- Add hwmon sensors for the mediatek sensor (Frank Wunderlich)
- Fix warning for return value reported by Smatch for the int340x
thermal processor (Srinivas Pandruvada)
- Fix wrong register access and decoding for the int340x thermal
processor (Srinivas Pandruvada)
* tag 'thermal-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (23 commits)
thermal/drivers/int340x/processor_thermal: Fix tcc setting
thermal/drivers/int340x/processor_thermal: Fix warning for return value
thermal/drivers/mediatek: Add sensors-support
thermal/drivers/int340x/processor_thermal: Add PCI MMIO based thermal driver
thermal/drivers/int340x/processor_thermal: Split enumeration and processing part
thermal: devfreq_cooling: Fix kernel-doc
thermal/drivers/intel/intel_soc_dts_iosf: Switch to use find_first_zero_bit()
dt-bindings: thermal: tsens: Add sc8180x compatible
dt-bindings: rockchip-thermal: Support the RK3568 SoC compatible
dt-bindings: thermal: convert rockchip-thermal to json-schema
thermal/core/thermal_of: Stop zone device before unregistering it
dt-bindings: thermal: Add binding for Tegra30 thermal sensor
thermal/drivers/sprd: Add missing of_node_put for loop iteration
thermal/drivers/imx_sc: Add missing of_node_put for loop iteration
thermal/drivers/rcar_gen3_thermal: Do not shadow rcar_gen3_ths_tj_1
thermal/drivers/rcar_gen3_thermal: Fix coefficient calculations
thermal/drivers/st: Use devm_platform_get_and_ioremap_resource()
thermal/core: Correct function name thermal_zone_device_unregister()
dt-bindings: thermal: tsens: Add compatible string to TSENS binding for SC7280
thermal/drivers/int340x: processor_thermal: Export additional attributes
...
The following fixes are done for tcc sysfs interface:
- TCC is 6 bits only from bit 29-24
- TCC of 0 is valid
- When BIT(31) is set, this register is read only
- Check for invalid tcc value
- Error for negative values
Fixes: fdf4f2fb8e ("drivers: thermal: processor_thermal_device: Export sysfs interface for TCC offset")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210628215803.75038-1-srinivas.pandruvada@linux.intel.com
Add a new PCI driver which register a thermal zone and allows to get
notification for threshold violation by a RW trip point. These
notifications are delivered from the device using MSI based
interrupt.
The main difference between this new PCI driver and the existing
one is that the temperature and trip points directly use PCI
MMIO instead of using ACPI methods.
This driver registers a thermal zone "TCPU_PCI" in addition to the
legacy processor thermal device, which uses ACPI companion device
to set name, temperature and trips.
This driver is enabled for AlderLake.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210525204811.3793651-3-srinivas.pandruvada@linux.intel.com
Remove enumeration part from the processor_thermal_device to two
different modules. One for ACPI and one for PCI:
ACPI enumeration: int3401_thermal
PCI part: processor_thermal_device_pci_legacy
The current processor_thermal_device now just implements interface
functions to be used by the ACPI and PCI enumeration module. This is
done by:
1. Make functions proc_thermal_add() and proc_thermal_remove() non static
and export them for usage in other processor_thermal_device_pci_legacy.c
and in int3401_thermal.c.
2. Move the sysfs file creation for TCC offset and power limit attribute
group to the proc_thermal_add() from the individual enumeration callbacks
for PCI and ACPI.
3. Create new interface functions proc_thermal_mmio_add() and
proc_thermal_mmio_remove() which will be called from the
processor_thermal_device_pci_legacy module.
4. Export proc_thermal_resume(), so that it can be used by power
management callbacks.
5. Remove special check for double enumeration as it never happens.
While here, fix some cleanup on error conditions in proc_thermal_add().
No functional changes are expected with this change.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210525204811.3793651-2-srinivas.pandruvada@linux.intel.com
Fix function name in devfreq_cooling.c comment to remove a
warning found by kernel-doc.
drivers/thermal/devfreq_cooling.c:479: warning: expecting prototype for
devfreq_cooling_em_register_power(). Prototype was for
devfreq_cooling_em_register() instead.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1623223350-128104-1-git-send-email-yang.lee@linux.alibaba.com
Zone device is enabled after thermal_zone_of_sensor_register() completion,
but it's not disabled before senor is unregistered, leaving temperature
polling active. This results in accessing a disabled zone device and
produces a warning about this problem. Stop zone device before
unregistering it in order to fix this "use-after-free" problem.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210616190417.32214-3-digetx@gmail.com
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow
the flexible utilization of SMT siblings, without exposing
untrusted domains to information leaks & side channels, plus
to ensure more deterministic computing performance on SMT
systems used by heterogenous workloads.
There's new prctls to set core scheduling groups, which
allows more flexible management of workloads that can share
siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve
'memcache'-like workloads.
- "Age" (decay) average idle time, to better track & improve workloads
such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked
via /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework assymetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable
it at runtime if tooling needs it. Use static keys and
other optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZcPoRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g3yw//WfhIqy7Psa9d/MBMjQDRGbTuO4+w22Dj
vmWFU44Q4KJxQHWeIgUlrK+dzvYWvNmflUs2CUUOiDVzxFTHMIyBtL4qCBUbx4Ns
vKAcB9wsWZge2o3WzZqpProRhdoRaSKw8egUr2q7rACVBkckY7eGP/OjWxXU8BdA
b7D0LPWwuIBFfN4pFYeCDLn32Dqr9s6Chyj+ZecabdG7EE6Gu+f1diVcxy7JE/mc
4WWL0D1RqdgpGrBEuMJIxPYekdrZiuy4jtEbztz5gbTBteN1cj3BLfqn0Pc/e6rO
Vyuc5mXCAmzRVi18z6g6bsVl+IA/nrbErENB2OHOhOYtqiZxqGTd4GPWZszMyY17
5AsEO5+5pcaBsy4gyp09qURggBu9zhJnMVmOI3rIHZkmkhwzc6uUJlyhDCTiFWOz
3ZF3LjbZEyCKodMD8qMHbs3axIBpIfZqjzkvSKyFnvfXEGVytVse7NUuWtQ36u92
GnURxVeYY1TDVXvE1Y8owNKMxknKQ6YRlypP7Dtbeo/qG6hShp0xmS7qDLDi0ybZ
ZlK+bDECiVoDf3nvJo+8v5M82IJ3CBt4UYldeRJsa1YCK/FsbK8tp91fkEfnXVue
+U6LPX0AmMpXacR5HaZfb3uBIKRw/QMdP/7RFtBPhpV6jqCrEmuqHnpPQiEVtxwO
UmG7bt94Trk=
=3VDr
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler udpates from Ingo Molnar:
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow the
flexible utilization of SMT siblings, without exposing untrusted
domains to information leaks & side channels, plus to ensure more
deterministic computing performance on SMT systems used by
heterogenous workloads.
There are new prctls to set core scheduling groups, which allows
more flexible management of workloads that can share siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve 'memcache'-like
workloads.
- "Age" (decay) average idle time, to better track & improve
workloads such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked via
/sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework assymetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable it at
runtime if tooling needs it. Use static keys and other
optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/doc: Update the CPU capacity asymmetry bits
sched/topology: Rework CPU capacity asymmetry detection
sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
psi: Fix race between psi_trigger_create/destroy
sched/fair: Introduce the burstable CFS controller
sched/uclamp: Fix uclamp_tg_restrict()
sched/rt: Fix Deadline utilization tracking during policy change
sched/rt: Fix RT utilization tracking during policy change
sched: Change task_struct::state
sched,arch: Remove unused TASK_STATE offsets
sched,timer: Use __set_current_state()
sched: Add get_current_state()
sched,perf,kvm: Fix preemption condition
sched: Introduce task_is_running()
sched: Unbreak wakeups
sched/fair: Age the average idle time
sched/cpufreq: Consider reduced CPU capacity in energy calculation
sched/fair: Take thermal pressure into account while estimating energy
thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
sched/fair: Return early from update_tg_cfs_load() if delta == 0
...
This commit in sched/urgent moved the cfs_rq_is_decayed() function:
a7b359fc6a: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
and this fresh commit in sched/core modified it in the old location:
9e077b52d8: ("sched/pelt: Check that *_avg are null when *_sum are")
Merge the two variants.
Conflicts:
kernel/sched/fair.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline won't. Unfortunately, when the CPU
is back online, the value read from per-cpu variable might be wrong
(stale data). This might affect the scheduler decisions, since it
sees the CPU capacity differently than what is actually available.
Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when thermal framework sets
capping.
Fixes: f12e4f66ab ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20210614191030.22241-1-lukasz.luba@arm.com
Early exits from for_each_available_child_of_node() should decrement the
node reference counter. Reported by Coccinelle:
drivers/thermal/sprd_thermal.c:387:1-23: WARNING:
Function "for_each_child_of_node" should have of_node_put() before goto around lines 391.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Acked-by: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210614192230.19248-2-krzysztof.kozlowski@canonical.com
Early exits from for_each_available_child_of_node() should decrement the
node reference counter. Reported by Coccinelle:
drivers/thermal/imx_sc_thermal.c:93:1-33: WARNING:
Function "for_each_available_child_of_node" should have of_node_put() before return around line 97.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Reviewed-by: Jacky Bai <ping.bai@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210614192230.19248-1-krzysztof.kozlowski@canonical.com
With -Wshadow:
drivers/thermal/rcar_gen3_thermal.c: In function ‘rcar_gen3_thermal_probe’:
drivers/thermal/rcar_gen3_thermal.c:310:13: warning: declaration of ‘rcar_gen3_ths_tj_1’ shadows a global declaration [-Wshadow]
310 | const int *rcar_gen3_ths_tj_1 = of_device_get_match_data(dev);
| ^~~~~~~~~~~~~~~~~~
drivers/thermal/rcar_gen3_thermal.c:246:18: note: shadowed declaration is here
246 | static const int rcar_gen3_ths_tj_1 = 126;
| ^~~~~~~~~~~~~~~~~~
To add to the confusion, the local variable has a different type.
Fix the shadowing by renaming the local variable to ths_tj_1.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/9ea7e65d0331daba96f9a7925cb3d12d2170efb1.1623076804.git.geert+renesas@glider.be
The fixed value of 157 used in the calculations are only correct for
M3-W, on other Gen3 SoC it should be 167. The constant can be derived
correctly from the static TJ_3 constant and the SoC specific TJ_1 value.
Update the calculation be correct on all Gen3 SoCs.
Fixes: 4eb39f79ef ("thermal: rcar_gen3_thermal: Update value of Tj_1")
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210605085211.564909-1-niklas.soderlund+renesas@ragnatech.se
Use devm_platform_get_and_ioremap_resource() to simplify
code and remove error message which within
devm_platform_get_and_ioremap_resource() already.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210605120205.2459578-1-yangyingliang@huawei.com
Fix the following make W=1 kernel build warning:
drivers/thermal/thermal_core.c:1376: warning: expecting prototype for thermal_device_unregister(). Prototype was for thermal_zone_device_unregister() instead
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210517051020.3463536-1-yangyingliang@huawei.com
Export additional attributes:
ddr_data_rate (RO) : Show current DDR (Double Data Rate) data rate.
rfi_restriction (RW) : Show or set current state for RFI (Radio
Frequency Interference) protection.
These attributes use mailbox commands to get/set information. Here
command codes are:
0x0007: Read RFI restriction
0x0107: Read DDR data rate
0x0008: Write RFI restriction
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210517061441.1921901-3-srinivas.pandruvada@linux.intel.com
MODULE_DEVICE_TABLE is used to extract the device information out of the
driver and builds a table when being compiled. If using this macro,
kernel can find the driver if available when the device is plugged in,
and then loads that driver and initializes the device.
Fixes: 554fdbaf19 ("thermal: sprd: Add Spreadtrum thermal driver support")
Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210512093752.243168-1-zhang.lyra@gmail.com
The RK3568 SoCs have two Temperature Sensors, channel 0 is for CPU,
channel 1 is for GPU.
Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210506175514.168365-5-ezequiel@collabora.com
MSR_AMD64_SEV even though the spec clearly states so, and check CPUID
bits first.
- Send only one signal to a task when it is a SEGV_PKUERR si_code type.
- Do away with all the wankery of reserving X amount of memory in
the first megabyte to prevent BIOS corrupting it and simply and
unconditionally reserve the whole first megabyte.
- Make alternatives NOP optimization work at an arbitrary position
within the patched sequence because the compiler can put single-byte
NOPs for alignment anywhere in the sequence (32-bit retpoline), vs our
previous assumption that the NOPs are only appended.
- Force-disable ENQCMD[S] instructions support and remove update_pasid()
because of insufficient protection against FPU state modification in an
interrupt context, among other xstate horrors which are being addressed
at the moment. This one limits the fallout until proper enablement.
- Use cpu_feature_enabled() in the idxd driver so that it can be
build-time disabled through the defines in .../asm/disabled-features.h.
- Fix LVT thermal setup for SMI delivery mode by making sure the APIC
LVT value is read before APIC initialization so that softlockups during
boot do not happen at least on one machine.
- Mark all legacy interrupts as legacy vectors when the IO-APIC is
disabled and when all legacy interrupts are routed through the PIC.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmC8fdEACgkQEsHwGGHe
VUqO5A/+IbIo8myl8VPjw6HRnHgY8rsYRjxdtmVhbaMi5XOmTMfVA9zJ6QALxseo
Mar8bmWcezEs0/FmNvk1vEOtIgZvRVy5RqXbu3W2EgWICuzRWbj822q+KrkbY0tH
1GWjcZQO8VlgeuQsukyj5QHaBLffpn3Fh1XB8r0cktZvwciM+LRNMnK8d6QjqxNM
ctTX4wdI6kc076pOi7MhKxSe+/xo5Wnf27lClLMOcsO/SS42KqgeRM5psWqxihhL
j6Y3Oe+Nm+7GKF8y841PUSlwjgWmlZa6UkR6DBTP7DGnHDa5hMpzxYvHOquq/SbA
leV9OLqI0iWs56kSzbEcXo7do1kld62KjsA2KtUhJfVAtm+igQLh5G0jESBwrWca
TBWaE5kt6s8wP7LXeg26o4U8XD8vqEH88Tmsjlgqb/t/PKDV9PMGvNpF00dPZFo6
Jhj2yntJYjLQYoAQLuQm5pfnKhZy3KKvk7ViGcnp3iN9i4eU9HzawIiXnliNOrTI
ohQ9KoRhy1Cx0UfLkR+cdK4ks0u26DC2/Ewt0CE5AP/CQ1rX6Zbv2gFLjSpy7yQo
6A99HEpbaLuy3kDt5vn91viPNUlOveuIXIdHp6u+zgFfx88eLUoEvfR135aV/Gyh
p5PJm/BO99KByQzFCnilkp7nBeKtnKYSmUojA6JsZKjzJimSPYo=
=zRI1
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"A bunch of x86/urgent stuff accumulated for the last two weeks so
lemme unload it to you.
It should be all totally risk-free, of course. :-)
- Fix out-of-spec hardware (1st gen Hygon) which does not implement
MSR_AMD64_SEV even though the spec clearly states so, and check
CPUID bits first.
- Send only one signal to a task when it is a SEGV_PKUERR si_code
type.
- Do away with all the wankery of reserving X amount of memory in the
first megabyte to prevent BIOS corrupting it and simply and
unconditionally reserve the whole first megabyte.
- Make alternatives NOP optimization work at an arbitrary position
within the patched sequence because the compiler can put
single-byte NOPs for alignment anywhere in the sequence (32-bit
retpoline), vs our previous assumption that the NOPs are only
appended.
- Force-disable ENQCMD[S] instructions support and remove
update_pasid() because of insufficient protection against FPU state
modification in an interrupt context, among other xstate horrors
which are being addressed at the moment. This one limits the
fallout until proper enablement.
- Use cpu_feature_enabled() in the idxd driver so that it can be
build-time disabled through the defines in disabled-features.h.
- Fix LVT thermal setup for SMI delivery mode by making sure the APIC
LVT value is read before APIC initialization so that softlockups
during boot do not happen at least on one machine.
- Mark all legacy interrupts as legacy vectors when the IO-APIC is
disabled and when all legacy interrupts are routed through the PIC"
* tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Check SME/SEV support in CPUID first
x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
x86/setup: Always reserve the first 1M of RAM
x86/alternative: Optimize single-byte NOPs at an arbitrary position
x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
dmaengine: idxd: Use cpu_feature_enabled()
x86/thermal: Fix LVT thermal setup for SMI delivery mode
x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):
[ 0.033858] read lvtthmr: 0x330, val: 0x200
which means:
- bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
- bits [10:8] have 010b which means SMI delivery mode.
Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:
setup_local_APIC:
...
/*
* If this comes from kexec/kcrash the APIC might be enabled in
* SPIV. Soft disable it before doing further initialization.
*/
value = apic_read(APIC_SPIV);
value &= ~APIC_SPIV_APIC_ENABLED;
apic_write(APIC_SPIV, value);
which means (from the SDM):
"10.4.7.2 Local APIC State After It Has Been Software Disabled
...
* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."
And this happens too:
[ 0.124111] APIC: Switch to symmetric I/O mode setup
[ 0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
[ 0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0
This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...
Now, before
4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.
But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.
Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.
Fixes: 4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney <james@nurealm.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
Return -EINVAL when args is invalid instead of 'ret' which is set to
zero by a previous successful call to a function.
Fixes: ca66dca5ed ("thermal: qcom: add support for adc-tm5 PMIC thermal monitor")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210527092640.2070555-1-yangyingliang@huawei.com
Fix function name in ti-bandgap.c kernel-doc comment
to remove a warning.
drivers/thermal/ti-soc-thermal/ti-bandgap.c:787: warning: expecting
prototype for ti_bandgap_alert_init(). Prototype was for
ti_bandgap_talert_init() instead.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1621851963-36548-1-git-send-email-yang.lee@linux.alibaba.com
After commit 81ad4276b5 ("Thermal: Ignore invalid trip points") all
user_space governor notifications via RW trip point is broken in intel
thermal drivers. This commits marks trip_points with value of 0 during
call to thermal_zone_device_register() as invalid. RW trip points can be
0 as user space will set the correct trip temperature later.
During driver init, x86_package_temp and all int340x drivers sets RW trip
temperature as 0. This results in all these trips marked as invalid by
the thermal core.
To fix this initialize RW trips to THERMAL_TEMP_INVALID instead of 0.
Cc: <stable@vger.kernel.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210430122343.1789899-1-srinivas.pandruvada@linux.intel.com
- Fix spellos in comments for the tegra and sun8i (Bhaskar Chowdhury)
- Add the missing fifth node on the rcar_gen3 sensor (Niklas
Söderlund)
- Remove duplicate include in ti-bandgap (Zhang Yunkai)
- Assign error code in the error path in the function
thermal_of_populate_bind_params() (Jia-Ju Bai)
- Fix spelling mistake in a comment 'disabed' -> 'disabled' (Colin Ian
King)
- Use the device name instead of auto-numbering for a better
identification of the cooling device (Daniel Lezcano)
- Improve a bit the division accuracy in the power allocator governor
(Jeson Gao)
- Enable the missing third sensor on msm8976 (Konrad Dybcio)
- Add QCom tsens driver co-maintainer (Thara Gopinath)
- Fix memory leak and use after free errors in the core code (Daniel
Lezcano)
- Add the MDM9607 compatible bindings (Konrad Dybcio)
- Fix trivial spello in the copyright name for Hisilicon (Hao Fang)
- Fix negative index array access when converting the frequency to
power in the energy model (Brian-sy Yang)
- Add support for Gen2 new PMIC support for Qcom SPMI (David Collins)
- Update maintainer file for CPU cooling device section (Lukasz Luba)
- Fix missing put_device on error in the Qcom tsens driver (Guangqing
Zhu)
- Add compatible DT binding for sm8350 (Robert Foss)
- Add support for the MDM9607's tsens driver (Konrad Dybcio)
- Remove duplicate error messages in thermal_mmio and the bcm2835
driver (Ruiqi Gong)
- Add the Thermal Temperature Cooling driver (Zhang Rui)
- Remove duplicate error messages in the Hisilicon sensor driver (Ye
Bin)
- Use the devm_platform_ioremap_resource_byname() function instead of
a couple of corresponding calls (dingsenjie)
- Sort the headers alphabetically in the ti-bandgap driver (Zhen Lei)
- Add missing property in the DT thermal sensor binding (Rafał
Miłecki)
- Remove dead code in the ti-bandgap sensor driver (Lin Ruizhe)
- Convert the BRCM DT bindings to the yaml schema (Rafał Miłecki)
- Replace the thermal_notify_framework() call by a call to the
thermal_zone_device_update() function. Remove the function as well
as the corresponding documentation (Thara Gopinath)
- Add support for the ipq8064-tsens sensor along with a set of
cleanups and code preparation (Ansuel Smith)
- Add a lockless __thermal_cdev_update() function to improve the
locking scheme in the core code and governors (Lukasz Luba)
- Fix multiple cooling device notification changes (Lukasz Luba)
- Remove unneeded variable initialization (Colin Ian King)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmCRqDIACgkQqDIjiipP
6E8O2Qf5AQvSVoN9WYRBLo1+a4mkGsJ/wHQMEsOA4FVHft5/QVkRtpMNbSiyq00O
YTpNuoBqiYm/tSTyzK/5Oh+0ucgm/ef4c4dTyPjZYw2GB+3rYNRAXdX/tB6Ggjl/
oUArUCoSQZjOU6Y573B05rcHp1PVM/XL9LgD1uX76tXA1MaGvsyC0cyPRAdOANke
W83BWI0XMhv8B1bZwHVB2Oft5x6HhqWBl3HKbNOmPEMtwkqqBCFAqB0wNEH88ZTf
2hyBjBoZQHdMkJsC0piMvIyAjHZiIjQB47VWz31EvKB3/E28xCqRqPViPq9QbrA5
got0+oDbxI96T024ndXRomc0SSxZnw==
=5THg
-----END PGP SIGNATURE-----
Merge tag 'thermal-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Remove duplicate error message for the amlogic driver (Tang Bin)
- Fix spellos in comments for the tegra and sun8i (Bhaskar Chowdhury)
- Add the missing fifth node on the rcar_gen3 sensor (Niklas Söderlund)
- Remove duplicate include in ti-bandgap (Zhang Yunkai)
- Assign error code in the error path in the function
thermal_of_populate_bind_params() (Jia-Ju Bai)
- Fix spelling mistake in a comment 'disabed' -> 'disabled' (Colin Ian
King)
- Use the device name instead of auto-numbering for a better
identification of the cooling device (Daniel Lezcano)
- Improve a bit the division accuracy in the power allocator governor
(Jeson Gao)
- Enable the missing third sensor on msm8976 (Konrad Dybcio)
- Add QCom tsens driver co-maintainer (Thara Gopinath)
- Fix memory leak and use after free errors in the core code (Daniel
Lezcano)
- Add the MDM9607 compatible bindings (Konrad Dybcio)
- Fix trivial spello in the copyright name for Hisilicon (Hao Fang)
- Fix negative index array access when converting the frequency to
power in the energy model (Brian-sy Yang)
- Add support for Gen2 new PMIC support for Qcom SPMI (David Collins)
- Update maintainer file for CPU cooling device section (Lukasz Luba)
- Fix missing put_device on error in the Qcom tsens driver (Guangqing
Zhu)
- Add compatible DT binding for sm8350 (Robert Foss)
- Add support for the MDM9607's tsens driver (Konrad Dybcio)
- Remove duplicate error messages in thermal_mmio and the bcm2835
driver (Ruiqi Gong)
- Add the Thermal Temperature Cooling driver (Zhang Rui)
- Remove duplicate error messages in the Hisilicon sensor driver (Ye
Bin)
- Use the devm_platform_ioremap_resource_byname() function instead of a
couple of corresponding calls (dingsenjie)
- Sort the headers alphabetically in the ti-bandgap driver (Zhen Lei)
- Add missing property in the DT thermal sensor binding (Rafał Miłecki)
- Remove dead code in the ti-bandgap sensor driver (Lin Ruizhe)
- Convert the BRCM DT bindings to the yaml schema (Rafał Miłecki)
- Replace the thermal_notify_framework() call by a call to the
thermal_zone_device_update() function. Remove the function as well as
the corresponding documentation (Thara Gopinath)
- Add support for the ipq8064-tsens sensor along with a set of cleanups
and code preparation (Ansuel Smith)
- Add a lockless __thermal_cdev_update() function to improve the
locking scheme in the core code and governors (Lukasz Luba)
- Fix multiple cooling device notification changes (Lukasz Luba)
- Remove unneeded variable initialization (Colin Ian King)
* tag 'thermal-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (55 commits)
thermal/drivers/mtk_thermal: Remove redundant initializations of several variables
thermal/core/power allocator: Use the lockless __thermal_cdev_update() function
thermal/core/fair share: Use the lockless __thermal_cdev_update() function
thermal/core/fair share: Lock the thermal zone while looping over instances
thermal/core/power_allocator: Update once cooling devices when temp is low
thermal/core/power_allocator: Maintain the device statistics from going stale
thermal/core: Create a helper __thermal_cdev_update() without a lock
dt-bindings: thermal: tsens: Document ipq8064 bindings
thermal/drivers/tsens: Add support for ipq8064-tsens
thermal/drivers/tsens: Drop unused define for msm8960
thermal/drivers/tsens: Replace custom 8960 apis with generic apis
thermal/drivers/tsens: Fix bug in sensor enable for msm8960
thermal/drivers/tsens: Use init_common for msm8960
thermal/drivers/tsens: Add VER_0 tsens version
thermal/drivers/tsens: Convert msm8960 to reg_field
thermal/drivers/tsens: Don't hardcode sensor slope
Documentation: driver-api: thermal: Remove thermal_notify_framework from documentation
thermal/core: Remove thermal_notify_framework
iwlwifi: mvm: tt: Replace thermal_notify_framework
dt-bindings: thermal: brcm,ns-thermal: Convert to the json-schema
...
Several variables are being initialized with values that is never
read and being updated later with a new value. The initializations
are redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210422120412.246291-1-colin.king@canonical.com
Use the new helper function and avoid unnecessery second lock/unlock,
which was present in old approach with thermal_cdev_update().
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210422153624.6074-4-lukasz.luba@arm.com
Use the new helper function and avoid unnecessery second lock/unlock,
which was present in old approach with thermal_cdev_update().
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210422153624.6074-3-lukasz.luba@arm.com