Commit Graph

2981 Commits

Author SHA1 Message Date
Rafael J. Wysocki
0a0a40d71c thermal: sysfs: Use the dev argument in instance-related show/store
Two sysfs show/store functions for attributes representing thermal
instances, trip_point_show() and weight_store(), retrieve the thermal
zone pointer from the instance object at hand, but they may also get
it from their dev argument, which is more consistent with what the
other thermal sysfs functions do, so make them do so.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/1987669.PYKUYFuaPT@rjwysocki.net
2024-08-22 17:43:14 +02:00
Rafael J. Wysocki
eb3591cde1 thermal: core: Drop redundant thermal instance checks
Because the trip and cdev pointers are sufficient to identify a thermal
instance holding them unambiguously, drop the additional thermal zone
checks from two loops walking the list of thermal instances in a
thermal zone.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/10527734.nUPlyArG6x@rjwysocki.net
2024-08-22 17:43:14 +02:00
Rafael J. Wysocki
b4e6d39817 thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
It is not necessary to look up the thermal zone and the cooling device
in the respective global lists to check whether or not they are
registered.  It is sufficient to check whether or not their respective
list nodes are empty for this purpose.

Use the above observation to simplify thermal_bind_cdev_to_trip().  In
addition, eliminate an unnecessary ternary operator from it.

Moreover, add lockdep_assert_held() for thermal_list_lock to it because
that lock must be held by its callers when it is running.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/3324214.44csPzL39Z@rjwysocki.net
2024-08-22 17:43:14 +02:00
Rafael J. Wysocki
a8bbe6f10f thermal: core: Fold two functions into their respective callers
Fold bind_cdev() into __thermal_cooling_device_register() and bind_tz()
into thermal_zone_device_register_with_trips() to reduce code bloat and
make it somewhat easier to follow the code flow.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/2962184.e9J7NaK4W3@rjwysocki.net
2024-08-22 17:43:13 +02:00
Rafael J. Wysocki
f6a034f2df thermal: Introduce a debugfs-based testing facility
Introduce a facility allowing the thermal core functionality to be
exercised in a controlled way in order to verify its behavior, without
affecting its regular users noticeably.

It is based on the idea of preparing thermal zone templates along with
their trip points by writing to files in debugfs.  When ready, those
templates can be used for registering test thermal zones with the
thermal core.

The temperature of a test thermal zone created this way can be adjusted
via debugfs, which also triggers a __thermal_zone_device_update() call
for it.  By manipulating the temperature of a test thermal zone, one can
check if the thermal core reacts to the changes of it as expected.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/6065927.lOV4Wx5bFT@rjwysocki.net
[ rjw: Fixed ordering of kcalloc() arguments ]
[ rjw: Fixed debugfs_create_dir() return value checks ]
[ rjw: Fixed two kerneldoc comments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-22 17:42:57 +02:00
Daniel Lezcano
f9ba1e0517 thermal/core: Compute low and high boundaries in thermal_zone_device_update()
In order to set the scene for the thresholds support which have to
manipulate the low and high temperature boundaries for the interrupt
support, we must pass the low and high values to the incoming
thresholds routine.

The variables are set from the thermal_zone_set_trips() where the
function loops the thermal trips to figure out the next and the
previous temperatures to set the interrupt to be triggered when they
are crossed.

These variables will be needed by the function in charge of handling
the thresholds in the incoming changes but they are local to the
aforementioned function thermal_zone_set_trips().

Move the low and high boundaries computation out of the function in
thermal_zone_device_update() so they are accessible from there.

The positive side effect is they are computed in the same loop as
handle_thermal_trip(), so we remove one loop.

Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240816081241.1925221-2-daniel.lezcano@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-19 16:06:58 +02:00
Rafael J. Wysocki
5ae98b5a9f Merge back thermal core material for 6.12. 2024-08-19 14:42:19 +02:00
Rafael J. Wysocki
6e6f58a170 thermal: gov_bang_bang: Use governor_data to reduce overhead
After running once, the for_each_trip_desc() loop in
bang_bang_manage() is pure needless overhead because it is not going to
make any changes unless a new cooling device has been bound to one of
the trips in the thermal zone or the system is resuming from sleep.

For this reason, make bang_bang_manage() set governor_data for the
thermal zone and check it upfront to decide whether or not it needs to
do anything.

However, governor_data needs to be reset in some cases to let
bang_bang_manage() know that it should walk the trips again, so add an
.update_tz() callback to the governor and make the core additionally
invoke it during system resume.

To avoid affecting the other users of that callback unnecessarily, add
a special notification reason for system resume, THERMAL_TZ_RESUME, and
also pass it to __thermal_zone_device_update() called during system
resume for consistency.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
2024-08-16 13:13:59 +02:00
Rafael J. Wysocki
5f64b4a1ab thermal: gov_bang_bang: Add .manage() callback
After recent changes, the Bang-bang governor may not adjust the
initial configuration of cooling devices to the actual situation.

Namely, if a cooling device bound to a certain trip point starts in
the "on" state and the thermal zone temperature is below the threshold
of that trip point, the trip point may never be crossed on the way up
in which case the state of the cooling device will never be adjusted
because the thermal core will never invoke the governor's
.trip_crossed() callback.  [Note that there is no issue if the zone
temperature is at the trip threshold or above it to start with because
.trip_crossed() will be invoked then to indicate the start of thermal
mitigation for the given trip.]

To address this, add a .manage() callback to the Bang-bang governor
and use it to ensure that all of the thermal instances managed by the
governor have been initialized properly and the states of all of the
cooling devices involved have been adjusted to the current zone
temperature as appropriate.

Fixes: 530c932bdf ("thermal: gov_bang_bang: Use .trip_crossed() instead of .throttle()")
Link: https://lore.kernel.org/linux-pm/1bfbbae5-42b0-4c7d-9544-e98855715294@piie.net/
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/8419356.T7Z3S40VBb@rjwysocki.net
2024-08-16 13:13:49 +02:00
Rafael J. Wysocki
84248e35d9 thermal: gov_bang_bang: Split bang_bang_control()
Move the setting of the thermal instance target state from
bang_bang_control() into a separate function that will be also called
in a different place going forward.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/3313587.aeNJFYEL58@rjwysocki.net
2024-08-16 13:13:42 +02:00
Rafael J. Wysocki
b9b6ee6fe2 thermal: gov_bang_bang: Call __thermal_cdev_update() directly
Instead of clearing the "updated" flag for each cooling device
affected by the trip point crossing in bang_bang_control() and
walking all thermal instances to run thermal_cdev_update() for all
of the affected cooling devices, call __thermal_cdev_update()
directly for each of them.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/13583081.uLZWGnKmhe@rjwysocki.net
2024-08-16 13:13:33 +02:00
Rafael J. Wysocki
107280e137 thermal: sysfs: Refine the handling of trip hysteresis changes
Change trip_point_hyst_store() and replace thermal_zone_trip_updated()
with thermal_zone_set_trip_hyst() to follow the trip_point_temp_store()
code pattern for more consistency.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5508466.Sb9uPGUboI@rjwysocki.net
2024-08-13 20:54:17 +02:00
Rafael J. Wysocki
afd84fb10c thermal: sysfs: Get to trips via attribute pointers
The  _store() and _show() functions for sysfs attributes corresponding
to trip point parameters (type, temperature and hysteresis) read the
trip ID from the attribute name and then use the trip ID as the index
in the given thermal zone's trips table to get to the trip object they
want.

Instead of doing this, make them use the attribute pointer they get
as the second argument to get to the trip object embedded in the same
struct thermal_trip_desc as the struct device_attribute pointed to by
it, which is much more straightforward and less overhead.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/114841552.nniJfEyVGO@rjwysocki.net
2024-08-13 20:53:18 +02:00
Rafael J. Wysocki
66b263306a thermal: core: Store trip sysfs attributes in thermal_trip_desc
Instead of allocating memory for trip point sysfs attributes separately,
store them in struct thermal_trip_desc for each trip individually which
allows a few extra memory allocations to be avoided.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/46571375.fMDQidcC6G@rjwysocki.net
2024-08-13 20:50:57 +02:00
Rafael J. Wysocki
8ecd953ca5 thermal: trip: Drop thermal_zone_get_trip()
There are no more callers of thermal_zone_get_trip() in the tree, so
drop it.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2220301.Mh6RI2rZIc@rjwysocki.net
2024-08-02 14:01:12 +02:00
Rafael J. Wysocki
79f194dd54 thermal: trip: Get rid of thermal_zone_get_num_trips()
The only existing caller of thermal_zone_get_num_trips(), which is
rcar_gen3_thermal_probe(), uses this function to put the number of
trip points into a kernel log message, but this information is also
available from the thermal sysfs interface.

For this reason, remove the thermal_zone_get_num_trips() call from
rcar_gen3_thermal_probe() and drop the former altogether.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2636988.Lt9SDvczpP@rjwysocki.net
2024-08-02 13:59:50 +02:00
Rafael J. Wysocki
96d819908d thermal: helpers: Drop get_thermal_instance()
There are no more users of get_thermal_instance(), so drop it.

While at it, replace get_instance() returning a pointer to struct
thermal_instance with thermal_instance_present() returning a bool
which is more straightforward.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2014591.usQuhbGJ8B@rjwysocki.net
[ rjw: Dropped get_thermal_instance() documentation ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-02 13:57:04 +02:00
Rafael J. Wysocki
5136f99b9a thermal: tegra: Use thermal_zone_for_each_trip() for walking trip points
It is generally inefficient to iterate over trip indices and call
thermal_zone_get_trip() every time to get the struct thermal_trip
corresponding to the given trip index, so modify the Tegra thermal
drivers to use thermal_zone_for_each_trip() for walking trips.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/1819430.VLH7GnMWUR@rjwysocki.net
[ rjw: Dropped an unused local variable ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-02 13:50:04 +02:00
Rafael J. Wysocki
289a54f611 thermal: tegra: Introduce struct trip_temps for critical and hot trips
Introduce a helper structure, struct trip_temps, for storing the
temperatures of the critical and hot trip points.

This helps to make the code in tegra_tsensor_get_hw_channel_trips()
somewhat cleaner and will be useful subsequently in eliminating
iteration over trip indices from the driver.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/9333090.CDJkKcVGEf@rjwysocki.net
2024-08-02 13:48:03 +02:00
Rafael J. Wysocki
ab446887ea thermal: qcom: Use thermal_zone_get_crit_temp() in qpnp_tm_init()
Modify qpnp_tm_init() to use thermal_zone_get_crit_temp() to get the
critical trip temperature instead of iterating over trip indices and
using thermal_zone_get_trip() to get a struct thermal_trip pointer
from a trip index until it finds the critical one.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/7712228.EvYhyI6sBW@rjwysocki.net
2024-08-02 13:46:10 +02:00
Rafael J. Wysocki
1ac1503cff thermal: hisi: Use thermal_zone_for_each_trip() in hisi_thermal_register_sensor()
Modify hisi_thermal_register_sensor() to use thermal_zone_for_each_trip()
for walking trip points instead of iterating over trip indices and using
thermal_zone_get_trip() to get a struct thermal_trip pointer from a trip
index.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/1994088.PYKUYFuaPT@rjwysocki.net
[ rjw: Dropped two unused local variables ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-02 13:41:52 +02:00
Rafael J. Wysocki
5df9809c42 thermal: broadcom: Use thermal_zone_get_crit_temp() in bcm2835_thermal_probe()
Modify the bcm2835 thermal driver to use thermal_zone_get_crit_temp() in
bcm2835_thermal_probe() instead of relying on the assumption that the
critical trip index will always be 0.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/3322893.aeNJFYEL58@rjwysocki.net
2024-08-02 13:32:09 +02:00
Rafael J. Wysocki
d955d7cecb Merge branch 'thermal-intel'
Merge fixes for the int340x thermal driver handling of MSI IRQs:

 - Fix MSI error path cleanup in int340x, allow it to work with a
   subset of thermal MSI IRQs if some of them are not working and
   make it free all MSI IRQs on module exit (Srinivas Pandruvada).

* thermal-intel:
  thermal: intel: int340x: Free MSI IRQ vectors on module exit
  thermal: intel: int340x: Allow limited thermal MSI support
  thermal: intel: int340x: Fix kernel warning during MSI cleanup
2024-07-31 12:31:27 +02:00
Rafael J. Wysocki
f844793f2d thermal: trip: Avoid skipping trips in thermal_zone_set_trips()
Say there are 3 trip points A, B, C sorted in ascending temperature
order with no hysteresis.  If the zone temerature is exactly equal to
B, thermal_zone_set_trips() will set the boundaries to A and C and the
hardware will not catch any crossing of B (either way) until either A
or C is crossed and the boundaries are changed.

To avoid that, use non-strict inequalities when comparing the trip
threshold to the zone temperature in thermal_zone_set_trips().

In the example above, it will cause both boundaries to be set to B,
which is desirable because an interrupt will trigger when the zone
temperature becomes different from B regardless of which way it goes.
That will allow a new interval to be set depending on the direction of
the zone temperature change.

Fixes: 893bae9223 ("thermal: trip: Make thermal_zone_set_trips() use trip thresholds")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12509184.O9o76ZdvQC@rjwysocki.net
2024-07-31 12:30:58 +02:00
Srinivas Pandruvada
f8ce49be27 thermal: intel: int340x: Free MSI IRQ vectors on module exit
On module exit call proc_thermal_free_msi() to free vectors allocated by
pci_alloc_irq_vectors().

Fixes: 7a9a8c5faf ("thermal: intel: int340x: Support MSI interrupt for Lunar Lake")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20240723140228.865919-4-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-30 16:36:58 +02:00
Srinivas Pandruvada
b85a2d300a thermal: intel: int340x: Allow limited thermal MSI support
On some Lunar Lake pre-production systems, not all the MSI thermal
vectors are valid. In that case instead of failing module load, continue
with partial thermal interrupt support.

pci_alloc_irq_vectors() can return less than expected maximum vectors.
In that case call devm_request_threaded_irq() only for current maximum
vectors.

Fixes: 7a9a8c5faf ("thermal: intel: int340x: Support MSI interrupt for Lunar Lake")
Reported-by: Yijun Shen <Yijun.Shen@dell.com>
Tested-by: Yijun Shen <Yijun.Shen@dell.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20240723140228.865919-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-30 16:36:51 +02:00
Srinivas Pandruvada
b630a04121 thermal: intel: int340x: Fix kernel warning during MSI cleanup
On some pre-production Lunar Lake systems, there is a kernel
warning:

remove_proc_entry: removing non-empty directory 'irq/172'
WARNING: CPU: 0 PID: 501 at fs/proc/generic.c:717
	remove_proc_entry+0x1b4/0x1e0
...
...
remove_proc_entry+0x1b4/0x1e0
report_bug+0x182/0x1b0
handle_bug+0x51/0xa0
exc_invalid_op+0x18/0x80
asm_exc_invalid_op+0x1b/0x20
remove_proc_entry+0x1b4/0x1e0
remove_proc_entry+0x1b4/0x1e0
unregister_irq_proc+0xf2/0x120
free_desc+0x41/0xe0
irq_domain_free_irqs+0x138/0x1c0
irq_free_descs+0x52/0x80
irq_domain_free_irqs+0x151/0x1c0
msi_domain_free_locked.part.0+0x17e/0x1c0
msi_domain_free_irqs_all_locked+0x74/0xc0
pci_msi_teardown_msi_irqs+0x50/0x60
pci_free_msi_irqs+0x12/0x40
pci_free_irq_vectors+0x58/0x70

On these systems, not all the MSI thermal vectors are valid. This causes
devm_request_threaded_irq() to fail for some vectors. As part of the
clean up on this error, pci_free_irq_vectors() is called without calling
devm_free_irq(). This causes the above warning.

Add a function proc_thermal_free_msi() to call devm_free_irq() for all
successfully registered IRQ handlers, then call pci_free_irq_vectors().
Call this function for MSI cleanup.

Fixes: 7a9a8c5faf ("thermal: intel: int340x: Support MSI interrupt for Lunar Lake")
Reported-by: Yijun Shen <Yijun.shen@dell.com>
Tested-by: Yijun Shen <Yijun.shen@dell.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20240723140228.865919-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-30 16:36:39 +02:00
Rafael J. Wysocki
f7c1b0e4ae thermal: core: Back off when polling thermal zones on errors
Commit a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone
temperature is invalid") introduced a polling mechanism by which the
thermal core attampts to get a valid temperature value for thermal zones
where the .get_temp() callback returns errors to start with (for
example, due to initialization ordering woes).  However, this polling is
carried out periodically ad infinitum and every iteration of it causes
a message to be printed to the kernel log which means a lot of log noise
on systems where there are thermal zones that never get ready for some
reason.  It is also not really useful to continuously poll thermal zones
that never respond.

To address this, modify the thermal core to increase the delay between
consecutive thermal zone temperature checks after every check that fails
until it reaches a certain maximum value.  At that point, the thermal
zone in question will be disabled, but user space will be able to
reenable it if it believes that the failure is transient.

Also change the code to print messages regarding failed temperature
checks to the kernel log only twice, once when the thermal zone's
.get_temp() callback returns an error for the first time and once when
disabling the given thermal zone.  In addition, a dev_crit() message
will be printed at that point if the given thermal zone contains a
critical trip point to notify the system operator about the situation.

Fixes: a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Link: https://lore.kernel.org/linux-acpi/CAGnHSE=RyPK++UG0-wAtVKgeJxe0uzFYgLxm+RUOKKoQquW=Ow@mail.gmail.com/
Reported-by: Tom Yan <tom.ty89@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2962033.e9J7NaK4W3@rjwysocki.net
2024-07-24 12:40:23 +02:00
Rafael J. Wysocki
e5f98896ef thermal: trip: Split thermal_zone_device_set_mode()
Pull a wrapper around thermal zone .change_mode() callback out of
thermal_zone_device_set_mode() because it will be used elsewhere
subsequently.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2206793.irdbgypaU6@rjwysocki.net
2024-07-23 14:18:15 +02:00
Rafael J. Wysocki
e528be3c87 thermal: core: Allow thermal zones to tell the core to ignore them
The iwlwifi wireless driver registers a thermal zone that is only needed
when the network interface handled by it is up and it wants that thermal
zone to be effectively ignored by the core otherwise.

Before commit a8a2617744 ("thermal: core: Call monitor_thermal_zone()
if zone temperature is invalid") that could be achieved by returning
an error code from the thermal zone's .get_temp() callback because the
core did not really handle errors returned by it almost at all.
However, commit a8a2617744 made the core attempt to recover from the
situation in which the temperature of a thermal zone cannot be
determined due to errors returned by its .get_temp() and is always
invalid from the core's perspective.

That was done because there are thermal zones in which .get_temp()
returns errors to start with due to some difficulties related to the
initialization ordering, but then it will start to produce valid
temperature values at one point.

Unfortunately, the simple approach taken by commit a8a2617744,
which is to poll the thermal zone periodically until its .get_temp()
callback starts to return valid temperature values, is at odds with
the special thermal zone in iwlwifi in which .get_temp() may always
return an error because its network interface may always be down.  If
that happens, every attempt to invoke the thermal zone's .get_temp()
callback resulting in an error causes the thermal core to print a
dev_warn() message to the kernel log which is super-noisy.

To address this problem, make the core handle the case in which
.get_temp() returns 0, but the temperature value returned by it
is not actually valid, in a special way.  Namely, make the core
completely ignore the invalid temperature value coming from
.get_temp() in that case, which requires folding in
update_temperature() into its caller and a few related changes.

On the iwlwifi side, modify iwl_mvm_tzone_get_temp() to return 0
and put THERMAL_TEMP_INVALID into the temperature return memory
location instead of returning an error when the firmware is not
running or it is not of the right type.

Also, to clearly separate the handling of invalid temperature
values from the thermal zone initialization, introduce a special
THERMAL_TEMP_INIT value specifically for the latter purpose.

Fixes: a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Closes: https://lore.kernel.org/linux-pm/20240715044527.GA1544@sol.localdomain/
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201761
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4950004.31r3eYUQgx@rjwysocki.net
[ rjw: Rebased on top of the current mainline ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-18 13:35:55 +02:00
Rafael J. Wysocki
281cfec53b Merge branch 'thermal-intel'
Merge updates of Intel thermal drivers for 6.11-rc1:

 - Switch Intel thermal drivers to new Intel CPU model defines (Tony
   Luck).

 - Clean up the int3400 and int3403 drivers (Erick Archer and David Alan
   Gilbert).

 - Improve intel_pch_thermal kernel log messages printed during suspend
   to idle (Zhang Rui).

 - Make the intel_tcc_cooling driver use a model-specific bitmask for
   TCC offset (Ricardo Neri).

 - Add DLVR and MSI interrupt support for the Lunar Lake platform to the
   int340x thermal driver (Srinivas Pandruvada).

 - Enable workload type hints (WLT) support and power floor interrupt
   support for the Lunar Lake platform in int340x ((Srinivas Pandruvada).

 - Make the HFI thermal driver use package scope for HFI instances as
   per the Intel SDM (Zhang Rui).

* thermal-intel:
  thermal: intel: hfi: Give HFI instances package scope
  thermal: intel: int340x: Enable WLT and power floor support for Lunar Lake
  thermal: intel: int340x: Support MSI interrupt for Lunar Lake
  thermal: intel: int340x: Remove unnecessary calls to free irq
  thermal: intel: int340x: Add DLVR support for Lunar Lake
  thermal: intel: int340x: Capability to map user space to firmware values
  thermal: intel: int340x: Cleanup of DLVR sysfs on driver remove
  thermal: intel: intel_tcc_cooling: Use a model-specific bitmask for TCC offset
  thermal: intel: intel_tcc: Add model checks for temperature registers
  thermal: intel: intel_pch: Improve cooling log
  thermal: int3403: remove unused struct 'int3403_performance_state'
  thermal: int3400: Use sizeof(*pointer) instead of sizeof(type)
  thermal: intel: intel_soc_dts_thermal: Switch to new Intel CPU model defines
  thermal: intel: intel_tcc_cooling: Switch to new Intel CPU model defines
2024-07-15 20:44:31 +02:00
Rafael J. Wysocki
ab33da3a22 Merge branch 'thermal-core'
Merge updates related to the thermal core for 6.11-rc1:

 - Redesign the .set_trip_temp() thermal zone callback to take a trip
   pointer instead of a trip ID and update its users (Rafael Wysocki).

 - Avoid using invalid combinations of polling_delay and passive_delay
   thermal zone parameters (Rafael Wysocki).

 - Update a cooling device registration function to take a const
   argument (Krzysztof Kozlowski).

 - Make the uniphier thermal driver use thermal_zone_for_each_trip() for
   walking trip points (Rafael Wysocki).

* thermal-core:
  thermal: core: Add sanity checks for polling_delay and passive_delay
  thermal: trip: Fold __thermal_zone_get_trip() into its caller
  thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callback
  thermal: imx: Drop critical trip check from imx_set_trip_temp()
  thermal: trip: Add conversion macros for thermal trip priv field
  thermal: helpers: Introduce thermal_trip_is_bound_to_cdev()
  thermal: core: Change passive_delay and polling_delay data type
  thermal: core: constify 'type' in devm_thermal_of_cooling_device_register()
  thermal: uniphier: Use thermal_zone_for_each_trip() for walking trip points
2024-07-15 20:43:21 +02:00
Raphael Gallais-Pou
e61cc85edb thermal/drivers/sti: Cleanup code related to stih416
"st,stih416-mpe-thermal" compatible seems to appear nowhere in the
device-tree nor in the documentation.
Remove compatible and related code.

Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com>
Link: https://lore.kernel.org/r/20240708161840.102004-1-rgallaispou@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
Krzysztof Kozlowski
d5c38eec5d thermal/drivers/generic-adc: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-12-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
Krzysztof Kozlowski
f637bfe26c thermal/drivers/generic-adc: Simplify probe() with local dev variable
Simplify the probe() function by using local 'dev' instead of
&pdev->dev.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-11-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
Krzysztof Kozlowski
bc55630c65 thermal/drivers/qcom-tsens: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-10-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
Krzysztof Kozlowski
ecfee9176b thermal/drivers/qcom-spmi-adc-tm5: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-9-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
Krzysztof Kozlowski
d0b297e76b thermal/drivers/imx: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-8-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
e9ac90242b thermal/drivers/imx: Simplify probe() with local dev variable
Simplify the probe() function by using local 'dev' instead of
&pdev->dev.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-7-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
3e1a0680bb thermal/drivers/hisi: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-6-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
ca6176693f thermal/drivers/exynos: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-5-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
4a6cf76edf thermal/drivers/exynos: Simplify probe() with local dev variable
Simplify the probe() function by using local 'dev' instead of
&pdev->dev.  While touching devm_kzalloc(), use preferred sizeof(*)
syntax.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-4-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
9d55cb3ba3 thermal/drivers/broadcom: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-3-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
fd972a1745 thermal/drivers/broadcom: Simplify probe() with local dev variable
Simplify the probe() function by using local 'dev' instead of
&pdev->dev.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-2-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
e90c369cc2 thermal/drivers/broadcom: Fix race between removal and clock disable
During the probe, driver enables clocks necessary to access registers
(in get_temp()) and then registers thermal zone with managed-resources
(devm) interface.  Removal of device is not done in reversed order,
because:
1. Clock will be disabled in driver remove() callback - thermal zone is
   still registered and accessible to users,
2. devm interface will unregister thermal zone.

This leaves short window between (1) and (2) for accessing the
get_temp() callback with disabled clock.

Fix this by enabling clock also via devm-interface, so entire cleanup
path will be in proper, reversed order.

Fixes: 8454c8c09c ("thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-1-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Chen-Yu Tsai
a5d4afb92e thermal/drivers/mediatek/lvts_thermal: Provide default calibration data
On some pre-production hardware, the SoCs do not contain calibration
data for the thermal sensors. The downstream drivers provide default
values that sort of work, instead of having the thermal sensors not
work at all.

Port the default values to the upstream driver. These values are from
the ChromeOS kernels, which sadly do not cover the MT7988.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20240620092306.2352606-1-wenst@chromium.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:39 +02:00
Julien Panis
be3e224ec5 dt-bindings: thermal: mediatek: Fix thermal zone definitions for MT8188
Fix thermal zone names for consistency with the other SoCs:
- GPU0 must be used as the first GPU item.
- SOCx deal with audio DSP, video, and infra subsystems.

The naming must be fixed "atomically" so compilation does not break.
As a result, the change is made in the dt-bindings and in the LVTS
driver within a single commit, despite the checkpatch warning.

The definitions can be safely modified here because they are used only
in the LVTS driver, which is modified accordingly, and have not yet
been included in a released kernel.

Fixes: 78c88534e5 ("dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8188")
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Link: https://lore.kernel.org/r/20240603-mtk-thermal-mt818x-dtsi-v7-2-8c8e3c7a3643@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:39 +02:00
Julien Panis
6b04928e83 dt-bindings: thermal: mediatek: Fix thermal zone definition for MT8186
Fix a thermal zone name for consistency with the other SoCs:
MFG contains GPU, the latter is more specific and must be used here.

The naming must be fixed "atomically" so compilation does not break.
As a result, the change is made in the dt-bindings and in the LVTS
driver within a single commit, despite the checkpatch warning.

The definition can be safely modified here because it is used only
in the LVTS driver, which is modified accordingly, and has not yet
been included in a released kernel.

Fixes: a2ca202350 ("dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8186")
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Link: https://lore.kernel.org/r/20240603-mtk-thermal-mt818x-dtsi-v7-1-8c8e3c7a3643@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:39 +02:00
Théo Lebrun
854a8e208c thermal/drivers/k3_j72xx_bandgap: Implement suspend/resume support
This add suspend-to-ram support.

The derived_table is kept-as is, so the resume is only about
pm_runtime_* calls and restoring the same registers as the probe.

Extract the hardware initialization procedure to a function called at
both probe-time & resume-time.

The probe-time loop is split in two to ensure doing the hardware
initialization before registering thermal zones. That ensures our
callbacks cannot be called while in bad state.

The 100ms delay in the hardware initialization sequence was removed.
It was initially added to be sure the thresholds are programmed before
enabling the interrupt, but in fact it's not needed (tested on J7200
platform).

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Acked-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
Link: https://lore.kernel.org/r/20240425153238.498750-1-thomas.richard@bootlin.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:39 +02:00
Niklas Söderlund
f996e2b17a thermal/drivers/renesas/rcar: Add dependency on OF
The R-Car thermal driver depends on OF, describe this.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240506154011.344324-3-niklas.soderlund+renesas@ragnatech.se
2024-07-15 13:31:39 +02:00