mirror of
https://github.com/torvalds/linux.git
synced 2024-12-21 10:31:54 +00:00
02b26774af
Currently the use of pm_runtime_put_sync() is not safe from interrupts-disabled context because rpm_idle() will release the spinlock and enable interrupts for the idle callbacks. This enables interrupts during a time where interrupts were expected to be disabled, and can have strange side effects on drivers that expected interrupts to be disabled. This is not a bug since the documentation clearly states that only _put_sync_suspend() is safe in IRQ-safe mode. However, pm_runtime_put_sync() could be made safe when in IRQ-safe mode by releasing the spinlock but not re-enabling interrupts, which is what this patch aims to do. Problem was found when using some buggy drivers that set pm_runtime_irq_safe() and used _put_sync() in interrupts-disabled context. Reported-by: Colin Cross <ccross@google.com> Tested-by: Nishanth Menon <nm@ti.com> Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
858 lines
40 KiB
Plaintext
858 lines
40 KiB
Plaintext
Runtime Power Management Framework for I/O Devices
|
|
|
|
(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
|
|
(C) 2010 Alan Stern <stern@rowland.harvard.edu>
|
|
|
|
1. Introduction
|
|
|
|
Support for runtime power management (runtime PM) of I/O devices is provided
|
|
at the power management core (PM core) level by means of:
|
|
|
|
* The power management workqueue pm_wq in which bus types and device drivers can
|
|
put their PM-related work items. It is strongly recommended that pm_wq be
|
|
used for queuing all work items related to runtime PM, because this allows
|
|
them to be synchronized with system-wide power transitions (suspend to RAM,
|
|
hibernation and resume from system sleep states). pm_wq is declared in
|
|
include/linux/pm_runtime.h and defined in kernel/power/main.c.
|
|
|
|
* A number of runtime PM fields in the 'power' member of 'struct device' (which
|
|
is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
|
|
be used for synchronizing runtime PM operations with one another.
|
|
|
|
* Three device runtime PM callbacks in 'struct dev_pm_ops' (defined in
|
|
include/linux/pm.h).
|
|
|
|
* A set of helper functions defined in drivers/base/power/runtime.c that can be
|
|
used for carrying out runtime PM operations in such a way that the
|
|
synchronization between them is taken care of by the PM core. Bus types and
|
|
device drivers are encouraged to use these functions.
|
|
|
|
The runtime PM callbacks present in 'struct dev_pm_ops', the device runtime PM
|
|
fields of 'struct dev_pm_info' and the core helper functions provided for
|
|
runtime PM are described below.
|
|
|
|
2. Device Runtime PM Callbacks
|
|
|
|
There are three device runtime PM callbacks defined in 'struct dev_pm_ops':
|
|
|
|
struct dev_pm_ops {
|
|
...
|
|
int (*runtime_suspend)(struct device *dev);
|
|
int (*runtime_resume)(struct device *dev);
|
|
int (*runtime_idle)(struct device *dev);
|
|
...
|
|
};
|
|
|
|
The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks are
|
|
executed by the PM core for either the device type, or the class (if the device
|
|
type's struct dev_pm_ops object does not exist), or the bus type (if the
|
|
device type's and class' struct dev_pm_ops objects do not exist) of the given
|
|
device (this allows device types to override callbacks provided by bus types or
|
|
classes if necessary). The bus type, device type and class callbacks are
|
|
referred to as subsystem-level callbacks in what follows.
|
|
|
|
By default, the callbacks are always invoked in process context with interrupts
|
|
enabled. However, subsystems can use the pm_runtime_irq_safe() helper function
|
|
to tell the PM core that a device's ->runtime_suspend() and ->runtime_resume()
|
|
callbacks should be invoked in atomic context with interrupts disabled.
|
|
This implies that these callback routines must not block or sleep, but it also
|
|
means that the synchronous helper functions listed at the end of Section 4 can
|
|
be used within an interrupt handler or in an atomic context.
|
|
|
|
The subsystem-level suspend callback is _entirely_ _responsible_ for handling
|
|
the suspend of the device as appropriate, which may, but need not include
|
|
executing the device driver's own ->runtime_suspend() callback (from the
|
|
PM core's point of view it is not necessary to implement a ->runtime_suspend()
|
|
callback in a device driver as long as the subsystem-level suspend callback
|
|
knows what to do to handle the device).
|
|
|
|
* Once the subsystem-level suspend callback has completed successfully
|
|
for given device, the PM core regards the device as suspended, which need
|
|
not mean that the device has been put into a low power state. It is
|
|
supposed to mean, however, that the device will not process data and will
|
|
not communicate with the CPU(s) and RAM until the subsystem-level resume
|
|
callback is executed for it. The runtime PM status of a device after
|
|
successful execution of the subsystem-level suspend callback is 'suspended'.
|
|
|
|
* If the subsystem-level suspend callback returns -EBUSY or -EAGAIN,
|
|
the device's runtime PM status is 'active', which means that the device
|
|
_must_ be fully operational afterwards.
|
|
|
|
* If the subsystem-level suspend callback returns an error code different
|
|
from -EBUSY or -EAGAIN, the PM core regards this as a fatal error and will
|
|
refuse to run the helper functions described in Section 4 for the device,
|
|
until the status of it is directly set either to 'active', or to 'suspended'
|
|
(the PM core provides special helper functions for this purpose).
|
|
|
|
In particular, if the driver requires remote wake-up capability (i.e. hardware
|
|
mechanism allowing the device to request a change of its power state, such as
|
|
PCI PME) for proper functioning and device_run_wake() returns 'false' for the
|
|
device, then ->runtime_suspend() should return -EBUSY. On the other hand, if
|
|
device_run_wake() returns 'true' for the device and the device is put into a low
|
|
power state during the execution of the subsystem-level suspend callback, it is
|
|
expected that remote wake-up will be enabled for the device. Generally, remote
|
|
wake-up should be enabled for all input devices put into a low power state at
|
|
run time.
|
|
|
|
The subsystem-level resume callback is _entirely_ _responsible_ for handling the
|
|
resume of the device as appropriate, which may, but need not include executing
|
|
the device driver's own ->runtime_resume() callback (from the PM core's point of
|
|
view it is not necessary to implement a ->runtime_resume() callback in a device
|
|
driver as long as the subsystem-level resume callback knows what to do to handle
|
|
the device).
|
|
|
|
* Once the subsystem-level resume callback has completed successfully, the PM
|
|
core regards the device as fully operational, which means that the device
|
|
_must_ be able to complete I/O operations as needed. The runtime PM status
|
|
of the device is then 'active'.
|
|
|
|
* If the subsystem-level resume callback returns an error code, the PM core
|
|
regards this as a fatal error and will refuse to run the helper functions
|
|
described in Section 4 for the device, until its status is directly set
|
|
either to 'active' or to 'suspended' (the PM core provides special helper
|
|
functions for this purpose).
|
|
|
|
The subsystem-level idle callback is executed by the PM core whenever the device
|
|
appears to be idle, which is indicated to the PM core by two counters, the
|
|
device's usage counter and the counter of 'active' children of the device.
|
|
|
|
* If any of these counters is decreased using a helper function provided by
|
|
the PM core and it turns out to be equal to zero, the other counter is
|
|
checked. If that counter also is equal to zero, the PM core executes the
|
|
subsystem-level idle callback with the device as an argument.
|
|
|
|
The action performed by a subsystem-level idle callback is totally dependent on
|
|
the subsystem in question, but the expected and recommended action is to check
|
|
if the device can be suspended (i.e. if all of the conditions necessary for
|
|
suspending the device are satisfied) and to queue up a suspend request for the
|
|
device in that case. The value returned by this callback is ignored by the PM
|
|
core.
|
|
|
|
The helper functions provided by the PM core, described in Section 4, guarantee
|
|
that the following constraints are met with respect to the bus type's runtime
|
|
PM callbacks:
|
|
|
|
(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
|
|
->runtime_suspend() in parallel with ->runtime_resume() or with another
|
|
instance of ->runtime_suspend() for the same device) with the exception that
|
|
->runtime_suspend() or ->runtime_resume() can be executed in parallel with
|
|
->runtime_idle() (although ->runtime_idle() will not be started while any
|
|
of the other callbacks is being executed for the same device).
|
|
|
|
(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
|
|
devices (i.e. the PM core will only execute ->runtime_idle() or
|
|
->runtime_suspend() for the devices the runtime PM status of which is
|
|
'active').
|
|
|
|
(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
|
|
the usage counter of which is equal to zero _and_ either the counter of
|
|
'active' children of which is equal to zero, or the 'power.ignore_children'
|
|
flag of which is set.
|
|
|
|
(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
|
|
PM core will only execute ->runtime_resume() for the devices the runtime
|
|
PM status of which is 'suspended').
|
|
|
|
Additionally, the helper functions provided by the PM core obey the following
|
|
rules:
|
|
|
|
* If ->runtime_suspend() is about to be executed or there's a pending request
|
|
to execute it, ->runtime_idle() will not be executed for the same device.
|
|
|
|
* A request to execute or to schedule the execution of ->runtime_suspend()
|
|
will cancel any pending requests to execute ->runtime_idle() for the same
|
|
device.
|
|
|
|
* If ->runtime_resume() is about to be executed or there's a pending request
|
|
to execute it, the other callbacks will not be executed for the same device.
|
|
|
|
* A request to execute ->runtime_resume() will cancel any pending or
|
|
scheduled requests to execute the other callbacks for the same device,
|
|
except for scheduled autosuspends.
|
|
|
|
3. Runtime PM Device Fields
|
|
|
|
The following device runtime PM fields are present in 'struct dev_pm_info', as
|
|
defined in include/linux/pm.h:
|
|
|
|
struct timer_list suspend_timer;
|
|
- timer used for scheduling (delayed) suspend and autosuspend requests
|
|
|
|
unsigned long timer_expires;
|
|
- timer expiration time, in jiffies (if this is different from zero, the
|
|
timer is running and will expire at that time, otherwise the timer is not
|
|
running)
|
|
|
|
struct work_struct work;
|
|
- work structure used for queuing up requests (i.e. work items in pm_wq)
|
|
|
|
wait_queue_head_t wait_queue;
|
|
- wait queue used if any of the helper functions needs to wait for another
|
|
one to complete
|
|
|
|
spinlock_t lock;
|
|
- lock used for synchronisation
|
|
|
|
atomic_t usage_count;
|
|
- the usage counter of the device
|
|
|
|
atomic_t child_count;
|
|
- the count of 'active' children of the device
|
|
|
|
unsigned int ignore_children;
|
|
- if set, the value of child_count is ignored (but still updated)
|
|
|
|
unsigned int disable_depth;
|
|
- used for disabling the helper funcions (they work normally if this is
|
|
equal to zero); the initial value of it is 1 (i.e. runtime PM is
|
|
initially disabled for all devices)
|
|
|
|
unsigned int runtime_error;
|
|
- if set, there was a fatal error (one of the callbacks returned error code
|
|
as described in Section 2), so the helper funtions will not work until
|
|
this flag is cleared; this is the error code returned by the failing
|
|
callback
|
|
|
|
unsigned int idle_notification;
|
|
- if set, ->runtime_idle() is being executed
|
|
|
|
unsigned int request_pending;
|
|
- if set, there's a pending request (i.e. a work item queued up into pm_wq)
|
|
|
|
enum rpm_request request;
|
|
- type of request that's pending (valid if request_pending is set)
|
|
|
|
unsigned int deferred_resume;
|
|
- set if ->runtime_resume() is about to be run while ->runtime_suspend() is
|
|
being executed for that device and it is not practical to wait for the
|
|
suspend to complete; means "start a resume as soon as you've suspended"
|
|
|
|
unsigned int run_wake;
|
|
- set if the device is capable of generating runtime wake-up events
|
|
|
|
enum rpm_status runtime_status;
|
|
- the runtime PM status of the device; this field's initial value is
|
|
RPM_SUSPENDED, which means that each device is initially regarded by the
|
|
PM core as 'suspended', regardless of its real hardware status
|
|
|
|
unsigned int runtime_auto;
|
|
- if set, indicates that the user space has allowed the device driver to
|
|
power manage the device at run time via the /sys/devices/.../power/control
|
|
interface; it may only be modified with the help of the pm_runtime_allow()
|
|
and pm_runtime_forbid() helper functions
|
|
|
|
unsigned int no_callbacks;
|
|
- indicates that the device does not use the runtime PM callbacks (see
|
|
Section 8); it may be modified only by the pm_runtime_no_callbacks()
|
|
helper function
|
|
|
|
unsigned int irq_safe;
|
|
- indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
|
|
will be invoked with the spinlock held and interrupts disabled
|
|
|
|
unsigned int use_autosuspend;
|
|
- indicates that the device's driver supports delayed autosuspend (see
|
|
Section 9); it may be modified only by the
|
|
pm_runtime{_dont}_use_autosuspend() helper functions
|
|
|
|
unsigned int timer_autosuspends;
|
|
- indicates that the PM core should attempt to carry out an autosuspend
|
|
when the timer expires rather than a normal suspend
|
|
|
|
int autosuspend_delay;
|
|
- the delay time (in milliseconds) to be used for autosuspend
|
|
|
|
unsigned long last_busy;
|
|
- the time (in jiffies) when the pm_runtime_mark_last_busy() helper
|
|
function was last called for this device; used in calculating inactivity
|
|
periods for autosuspend
|
|
|
|
All of the above fields are members of the 'power' member of 'struct device'.
|
|
|
|
4. Runtime PM Device Helper Functions
|
|
|
|
The following runtime PM helper functions are defined in
|
|
drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
|
|
|
void pm_runtime_init(struct device *dev);
|
|
- initialize the device runtime PM fields in 'struct dev_pm_info'
|
|
|
|
void pm_runtime_remove(struct device *dev);
|
|
- make sure that the runtime PM of the device will be disabled after
|
|
removing the device from device hierarchy
|
|
|
|
int pm_runtime_idle(struct device *dev);
|
|
- execute the subsystem-level idle callback for the device; returns 0 on
|
|
success or error code on failure, where -EINPROGRESS means that
|
|
->runtime_idle() is already being executed
|
|
|
|
int pm_runtime_suspend(struct device *dev);
|
|
- execute the subsystem-level suspend callback for the device; returns 0 on
|
|
success, 1 if the device's runtime PM status was already 'suspended', or
|
|
error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
|
|
to suspend the device again in future and -EACCES means that
|
|
'power.disable_depth' is different from 0
|
|
|
|
int pm_runtime_autosuspend(struct device *dev);
|
|
- same as pm_runtime_suspend() except that the autosuspend delay is taken
|
|
into account; if pm_runtime_autosuspend_expiration() says the delay has
|
|
not yet expired then an autosuspend is scheduled for the appropriate time
|
|
and 0 is returned
|
|
|
|
int pm_runtime_resume(struct device *dev);
|
|
- execute the subsystem-level resume callback for the device; returns 0 on
|
|
success, 1 if the device's runtime PM status was already 'active' or
|
|
error code on failure, where -EAGAIN means it may be safe to attempt to
|
|
resume the device again in future, but 'power.runtime_error' should be
|
|
checked additionally, and -EACCES means that 'power.disable_depth' is
|
|
different from 0
|
|
|
|
int pm_request_idle(struct device *dev);
|
|
- submit a request to execute the subsystem-level idle callback for the
|
|
device (the request is represented by a work item in pm_wq); returns 0 on
|
|
success or error code if the request has not been queued up
|
|
|
|
int pm_request_autosuspend(struct device *dev);
|
|
- schedule the execution of the subsystem-level suspend callback for the
|
|
device when the autosuspend delay has expired; if the delay has already
|
|
expired then the work item is queued up immediately
|
|
|
|
int pm_schedule_suspend(struct device *dev, unsigned int delay);
|
|
- schedule the execution of the subsystem-level suspend callback for the
|
|
device in future, where 'delay' is the time to wait before queuing up a
|
|
suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
|
|
item is queued up immediately); returns 0 on success, 1 if the device's PM
|
|
runtime status was already 'suspended', or error code if the request
|
|
hasn't been scheduled (or queued up if 'delay' is 0); if the execution of
|
|
->runtime_suspend() is already scheduled and not yet expired, the new
|
|
value of 'delay' will be used as the time to wait
|
|
|
|
int pm_request_resume(struct device *dev);
|
|
- submit a request to execute the subsystem-level resume callback for the
|
|
device (the request is represented by a work item in pm_wq); returns 0 on
|
|
success, 1 if the device's runtime PM status was already 'active', or
|
|
error code if the request hasn't been queued up
|
|
|
|
void pm_runtime_get_noresume(struct device *dev);
|
|
- increment the device's usage counter
|
|
|
|
int pm_runtime_get(struct device *dev);
|
|
- increment the device's usage counter, run pm_request_resume(dev) and
|
|
return its result
|
|
|
|
int pm_runtime_get_sync(struct device *dev);
|
|
- increment the device's usage counter, run pm_runtime_resume(dev) and
|
|
return its result
|
|
|
|
void pm_runtime_put_noidle(struct device *dev);
|
|
- decrement the device's usage counter
|
|
|
|
int pm_runtime_put(struct device *dev);
|
|
- decrement the device's usage counter; if the result is 0 then run
|
|
pm_request_idle(dev) and return its result
|
|
|
|
int pm_runtime_put_autosuspend(struct device *dev);
|
|
- decrement the device's usage counter; if the result is 0 then run
|
|
pm_request_autosuspend(dev) and return its result
|
|
|
|
int pm_runtime_put_sync(struct device *dev);
|
|
- decrement the device's usage counter; if the result is 0 then run
|
|
pm_runtime_idle(dev) and return its result
|
|
|
|
int pm_runtime_put_sync_suspend(struct device *dev);
|
|
- decrement the device's usage counter; if the result is 0 then run
|
|
pm_runtime_suspend(dev) and return its result
|
|
|
|
int pm_runtime_put_sync_autosuspend(struct device *dev);
|
|
- decrement the device's usage counter; if the result is 0 then run
|
|
pm_runtime_autosuspend(dev) and return its result
|
|
|
|
void pm_runtime_enable(struct device *dev);
|
|
- decrement the device's 'power.disable_depth' field; if that field is equal
|
|
to zero, the runtime PM helper functions can execute subsystem-level
|
|
callbacks described in Section 2 for the device
|
|
|
|
int pm_runtime_disable(struct device *dev);
|
|
- increment the device's 'power.disable_depth' field (if the value of that
|
|
field was previously zero, this prevents subsystem-level runtime PM
|
|
callbacks from being run for the device), make sure that all of the pending
|
|
runtime PM operations on the device are either completed or canceled;
|
|
returns 1 if there was a resume request pending and it was necessary to
|
|
execute the subsystem-level resume callback for the device to satisfy that
|
|
request, otherwise 0 is returned
|
|
|
|
int pm_runtime_barrier(struct device *dev);
|
|
- check if there's a resume request pending for the device and resume it
|
|
(synchronously) in that case, cancel any other pending runtime PM requests
|
|
regarding it and wait for all runtime PM operations on it in progress to
|
|
complete; returns 1 if there was a resume request pending and it was
|
|
necessary to execute the subsystem-level resume callback for the device to
|
|
satisfy that request, otherwise 0 is returned
|
|
|
|
void pm_suspend_ignore_children(struct device *dev, bool enable);
|
|
- set/unset the power.ignore_children flag of the device
|
|
|
|
int pm_runtime_set_active(struct device *dev);
|
|
- clear the device's 'power.runtime_error' flag, set the device's runtime
|
|
PM status to 'active' and update its parent's counter of 'active'
|
|
children as appropriate (it is only valid to use this function if
|
|
'power.runtime_error' is set or 'power.disable_depth' is greater than
|
|
zero); it will fail and return error code if the device has a parent
|
|
which is not active and the 'power.ignore_children' flag of which is unset
|
|
|
|
void pm_runtime_set_suspended(struct device *dev);
|
|
- clear the device's 'power.runtime_error' flag, set the device's runtime
|
|
PM status to 'suspended' and update its parent's counter of 'active'
|
|
children as appropriate (it is only valid to use this function if
|
|
'power.runtime_error' is set or 'power.disable_depth' is greater than
|
|
zero)
|
|
|
|
bool pm_runtime_suspended(struct device *dev);
|
|
- return true if the device's runtime PM status is 'suspended' and its
|
|
'power.disable_depth' field is equal to zero, or false otherwise
|
|
|
|
bool pm_runtime_status_suspended(struct device *dev);
|
|
- return true if the device's runtime PM status is 'suspended'
|
|
|
|
void pm_runtime_allow(struct device *dev);
|
|
- set the power.runtime_auto flag for the device and decrease its usage
|
|
counter (used by the /sys/devices/.../power/control interface to
|
|
effectively allow the device to be power managed at run time)
|
|
|
|
void pm_runtime_forbid(struct device *dev);
|
|
- unset the power.runtime_auto flag for the device and increase its usage
|
|
counter (used by the /sys/devices/.../power/control interface to
|
|
effectively prevent the device from being power managed at run time)
|
|
|
|
void pm_runtime_no_callbacks(struct device *dev);
|
|
- set the power.no_callbacks flag for the device and remove the runtime
|
|
PM attributes from /sys/devices/.../power (or prevent them from being
|
|
added when the device is registered)
|
|
|
|
void pm_runtime_irq_safe(struct device *dev);
|
|
- set the power.irq_safe flag for the device, causing the runtime-PM
|
|
suspend and resume callbacks (but not the idle callback) to be invoked
|
|
with interrupts disabled
|
|
|
|
void pm_runtime_mark_last_busy(struct device *dev);
|
|
- set the power.last_busy field to the current time
|
|
|
|
void pm_runtime_use_autosuspend(struct device *dev);
|
|
- set the power.use_autosuspend flag, enabling autosuspend delays
|
|
|
|
void pm_runtime_dont_use_autosuspend(struct device *dev);
|
|
- clear the power.use_autosuspend flag, disabling autosuspend delays
|
|
|
|
void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
|
|
- set the power.autosuspend_delay value to 'delay' (expressed in
|
|
milliseconds); if 'delay' is negative then runtime suspends are
|
|
prevented
|
|
|
|
unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
|
|
- calculate the time when the current autosuspend delay period will expire,
|
|
based on power.last_busy and power.autosuspend_delay; if the delay time
|
|
is 1000 ms or larger then the expiration time is rounded up to the
|
|
nearest second; returns 0 if the delay period has already expired or
|
|
power.use_autosuspend isn't set, otherwise returns the expiration time
|
|
in jiffies
|
|
|
|
It is safe to execute the following helper functions from interrupt context:
|
|
|
|
pm_request_idle()
|
|
pm_request_autosuspend()
|
|
pm_schedule_suspend()
|
|
pm_request_resume()
|
|
pm_runtime_get_noresume()
|
|
pm_runtime_get()
|
|
pm_runtime_put_noidle()
|
|
pm_runtime_put()
|
|
pm_runtime_put_autosuspend()
|
|
pm_runtime_enable()
|
|
pm_suspend_ignore_children()
|
|
pm_runtime_set_active()
|
|
pm_runtime_set_suspended()
|
|
pm_runtime_suspended()
|
|
pm_runtime_mark_last_busy()
|
|
pm_runtime_autosuspend_expiration()
|
|
|
|
If pm_runtime_irq_safe() has been called for a device then the following helper
|
|
functions may also be used in interrupt context:
|
|
|
|
pm_runtime_suspend()
|
|
pm_runtime_autosuspend()
|
|
pm_runtime_resume()
|
|
pm_runtime_get_sync()
|
|
pm_runtime_put_sync()
|
|
pm_runtime_put_sync_suspend()
|
|
|
|
5. Runtime PM Initialization, Device Probing and Removal
|
|
|
|
Initially, the runtime PM is disabled for all devices, which means that the
|
|
majority of the runtime PM helper funtions described in Section 4 will return
|
|
-EAGAIN until pm_runtime_enable() is called for the device.
|
|
|
|
In addition to that, the initial runtime PM status of all devices is
|
|
'suspended', but it need not reflect the actual physical state of the device.
|
|
Thus, if the device is initially active (i.e. it is able to process I/O), its
|
|
runtime PM status must be changed to 'active', with the help of
|
|
pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
|
|
|
|
However, if the device has a parent and the parent's runtime PM is enabled,
|
|
calling pm_runtime_set_active() for the device will affect the parent, unless
|
|
the parent's 'power.ignore_children' flag is set. Namely, in that case the
|
|
parent won't be able to suspend at run time, using the PM core's helper
|
|
functions, as long as the child's status is 'active', even if the child's
|
|
runtime PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
|
|
the child yet or pm_runtime_disable() has been called for it). For this reason,
|
|
once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
|
|
should be called for it too as soon as reasonably possible or its runtime PM
|
|
status should be changed back to 'suspended' with the help of
|
|
pm_runtime_set_suspended().
|
|
|
|
If the default initial runtime PM status of the device (i.e. 'suspended')
|
|
reflects the actual state of the device, its bus type's or its driver's
|
|
->probe() callback will likely need to wake it up using one of the PM core's
|
|
helper functions described in Section 4. In that case, pm_runtime_resume()
|
|
should be used. Of course, for this purpose the device's runtime PM has to be
|
|
enabled earlier by calling pm_runtime_enable().
|
|
|
|
If the device bus type's or driver's ->probe() callback runs
|
|
pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
|
|
they will fail returning -EAGAIN, because the device's usage counter is
|
|
incremented by the driver core before executing ->probe(). Still, it may be
|
|
desirable to suspend the device as soon as ->probe() has finished, so the driver
|
|
core uses pm_runtime_put_sync() to invoke the subsystem-level idle callback for
|
|
the device at that time.
|
|
|
|
Moreover, the driver core prevents runtime PM callbacks from racing with the bus
|
|
notifier callback in __device_release_driver(), which is necessary, because the
|
|
notifier is used by some subsystems to carry out operations affecting the
|
|
runtime PM functionality. It does so by calling pm_runtime_get_sync() before
|
|
driver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications. This
|
|
resumes the device if it's in the suspended state and prevents it from
|
|
being suspended again while those routines are being executed.
|
|
|
|
To allow bus types and drivers to put devices into the suspended state by
|
|
calling pm_runtime_suspend() from their ->remove() routines, the driver core
|
|
executes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER
|
|
notifications in __device_release_driver(). This requires bus types and
|
|
drivers to make their ->remove() callbacks avoid races with runtime PM directly,
|
|
but also it allows of more flexibility in the handling of devices during the
|
|
removal of their drivers.
|
|
|
|
The user space can effectively disallow the driver of the device to power manage
|
|
it at run time by changing the value of its /sys/devices/.../power/control
|
|
attribute to "on", which causes pm_runtime_forbid() to be called. In principle,
|
|
this mechanism may also be used by the driver to effectively turn off the
|
|
runtime power management of the device until the user space turns it on.
|
|
Namely, during the initialization the driver can make sure that the runtime PM
|
|
status of the device is 'active' and call pm_runtime_forbid(). It should be
|
|
noted, however, that if the user space has already intentionally changed the
|
|
value of /sys/devices/.../power/control to "auto" to allow the driver to power
|
|
manage the device at run time, the driver may confuse it by using
|
|
pm_runtime_forbid() this way.
|
|
|
|
6. Runtime PM and System Sleep
|
|
|
|
Runtime PM and system sleep (i.e., system suspend and hibernation, also known
|
|
as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of
|
|
ways. If a device is active when a system sleep starts, everything is
|
|
straightforward. But what should happen if the device is already suspended?
|
|
|
|
The device may have different wake-up settings for runtime PM and system sleep.
|
|
For example, remote wake-up may be enabled for runtime suspend but disallowed
|
|
for system sleep (device_may_wakeup(dev) returns 'false'). When this happens,
|
|
the subsystem-level system suspend callback is responsible for changing the
|
|
device's wake-up setting (it may leave that to the device driver's system
|
|
suspend routine). It may be necessary to resume the device and suspend it again
|
|
in order to do so. The same is true if the driver uses different power levels
|
|
or other settings for runtime suspend and system sleep.
|
|
|
|
During system resume, the simplest approach is to bring all devices back to full
|
|
power, even if they had been suspended before the system suspend began. There
|
|
are several reasons for this, including:
|
|
|
|
* The device might need to switch power levels, wake-up settings, etc.
|
|
|
|
* Remote wake-up events might have been lost by the firmware.
|
|
|
|
* The device's children may need the device to be at full power in order
|
|
to resume themselves.
|
|
|
|
* The driver's idea of the device state may not agree with the device's
|
|
physical state. This can happen during resume from hibernation.
|
|
|
|
* The device might need to be reset.
|
|
|
|
* Even though the device was suspended, if its usage counter was > 0 then most
|
|
likely it would need a runtime resume in the near future anyway.
|
|
|
|
If the device had been suspended before the system suspend began and it's
|
|
brought back to full power during resume, then its runtime PM status will have
|
|
to be updated to reflect the actual post-system sleep status. The way to do
|
|
this is:
|
|
|
|
pm_runtime_disable(dev);
|
|
pm_runtime_set_active(dev);
|
|
pm_runtime_enable(dev);
|
|
|
|
The PM core always increments the runtime usage counter before calling the
|
|
->suspend() callback and decrements it after calling the ->resume() callback.
|
|
Hence disabling runtime PM temporarily like this will not cause any runtime
|
|
suspend attempts to be permanently lost. If the usage count goes to zero
|
|
following the return of the ->resume() callback, the ->runtime_idle() callback
|
|
will be invoked as usual.
|
|
|
|
On some systems, however, system sleep is not entered through a global firmware
|
|
or hardware operation. Instead, all hardware components are put into low-power
|
|
states directly by the kernel in a coordinated way. Then, the system sleep
|
|
state effectively follows from the states the hardware components end up in
|
|
and the system is woken up from that state by a hardware interrupt or a similar
|
|
mechanism entirely under the kernel's control. As a result, the kernel never
|
|
gives control away and the states of all devices during resume are precisely
|
|
known to it. If that is the case and none of the situations listed above takes
|
|
place (in particular, if the system is not waking up from hibernation), it may
|
|
be more efficient to leave the devices that had been suspended before the system
|
|
suspend began in the suspended state.
|
|
|
|
The PM core does its best to reduce the probability of race conditions between
|
|
the runtime PM and system suspend/resume (and hibernation) callbacks by carrying
|
|
out the following operations:
|
|
|
|
* During system suspend it calls pm_runtime_get_noresume() and
|
|
pm_runtime_barrier() for every device right before executing the
|
|
subsystem-level .suspend() callback for it. In addition to that it calls
|
|
pm_runtime_disable() for every device right after executing the
|
|
subsystem-level .suspend() callback for it.
|
|
|
|
* During system resume it calls pm_runtime_enable() and pm_runtime_put_sync()
|
|
for every device right before and right after executing the subsystem-level
|
|
.resume() callback for it, respectively.
|
|
|
|
7. Generic subsystem callbacks
|
|
|
|
Subsystems may wish to conserve code space by using the set of generic power
|
|
management callbacks provided by the PM core, defined in
|
|
driver/base/power/generic_ops.c:
|
|
|
|
int pm_generic_runtime_idle(struct device *dev);
|
|
- invoke the ->runtime_idle() callback provided by the driver of this
|
|
device, if defined, and call pm_runtime_suspend() for this device if the
|
|
return value is 0 or the callback is not defined
|
|
|
|
int pm_generic_runtime_suspend(struct device *dev);
|
|
- invoke the ->runtime_suspend() callback provided by the driver of this
|
|
device and return its result, or return -EINVAL if not defined
|
|
|
|
int pm_generic_runtime_resume(struct device *dev);
|
|
- invoke the ->runtime_resume() callback provided by the driver of this
|
|
device and return its result, or return -EINVAL if not defined
|
|
|
|
int pm_generic_suspend(struct device *dev);
|
|
- if the device has not been suspended at run time, invoke the ->suspend()
|
|
callback provided by its driver and return its result, or return 0 if not
|
|
defined
|
|
|
|
int pm_generic_suspend_noirq(struct device *dev);
|
|
- if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq()
|
|
callback provided by the device's driver and return its result, or return
|
|
0 if not defined
|
|
|
|
int pm_generic_resume(struct device *dev);
|
|
- invoke the ->resume() callback provided by the driver of this device and,
|
|
if successful, change the device's runtime PM status to 'active'
|
|
|
|
int pm_generic_resume_noirq(struct device *dev);
|
|
- invoke the ->resume_noirq() callback provided by the driver of this device
|
|
|
|
int pm_generic_freeze(struct device *dev);
|
|
- if the device has not been suspended at run time, invoke the ->freeze()
|
|
callback provided by its driver and return its result, or return 0 if not
|
|
defined
|
|
|
|
int pm_generic_freeze_noirq(struct device *dev);
|
|
- if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq()
|
|
callback provided by the device's driver and return its result, or return
|
|
0 if not defined
|
|
|
|
int pm_generic_thaw(struct device *dev);
|
|
- if the device has not been suspended at run time, invoke the ->thaw()
|
|
callback provided by its driver and return its result, or return 0 if not
|
|
defined
|
|
|
|
int pm_generic_thaw_noirq(struct device *dev);
|
|
- if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq()
|
|
callback provided by the device's driver and return its result, or return
|
|
0 if not defined
|
|
|
|
int pm_generic_poweroff(struct device *dev);
|
|
- if the device has not been suspended at run time, invoke the ->poweroff()
|
|
callback provided by its driver and return its result, or return 0 if not
|
|
defined
|
|
|
|
int pm_generic_poweroff_noirq(struct device *dev);
|
|
- if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq()
|
|
callback provided by the device's driver and return its result, or return
|
|
0 if not defined
|
|
|
|
int pm_generic_restore(struct device *dev);
|
|
- invoke the ->restore() callback provided by the driver of this device and,
|
|
if successful, change the device's runtime PM status to 'active'
|
|
|
|
int pm_generic_restore_noirq(struct device *dev);
|
|
- invoke the ->restore_noirq() callback provided by the device's driver
|
|
|
|
These functions can be assigned to the ->runtime_idle(), ->runtime_suspend(),
|
|
->runtime_resume(), ->suspend(), ->suspend_noirq(), ->resume(),
|
|
->resume_noirq(), ->freeze(), ->freeze_noirq(), ->thaw(), ->thaw_noirq(),
|
|
->poweroff(), ->poweroff_noirq(), ->restore(), ->restore_noirq() callback
|
|
pointers in the subsystem-level dev_pm_ops structures.
|
|
|
|
If a subsystem wishes to use all of them at the same time, it can simply assign
|
|
the GENERIC_SUBSYS_PM_OPS macro, defined in include/linux/pm.h, to its
|
|
dev_pm_ops structure pointer.
|
|
|
|
Device drivers that wish to use the same function as a system suspend, freeze,
|
|
poweroff and runtime suspend callback, and similarly for system resume, thaw,
|
|
restore, and runtime resume, can achieve this with the help of the
|
|
UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
|
|
last argument to NULL).
|
|
|
|
8. "No-Callback" Devices
|
|
|
|
Some "devices" are only logical sub-devices of their parent and cannot be
|
|
power-managed on their own. (The prototype example is a USB interface. Entire
|
|
USB devices can go into low-power mode or send wake-up requests, but neither is
|
|
possible for individual interfaces.) The drivers for these devices have no
|
|
need of runtime PM callbacks; if the callbacks did exist, ->runtime_suspend()
|
|
and ->runtime_resume() would always return 0 without doing anything else and
|
|
->runtime_idle() would always call pm_runtime_suspend().
|
|
|
|
Subsystems can tell the PM core about these devices by calling
|
|
pm_runtime_no_callbacks(). This should be done after the device structure is
|
|
initialized and before it is registered (although after device registration is
|
|
also okay). The routine will set the device's power.no_callbacks flag and
|
|
prevent the non-debugging runtime PM sysfs attributes from being created.
|
|
|
|
When power.no_callbacks is set, the PM core will not invoke the
|
|
->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks.
|
|
Instead it will assume that suspends and resumes always succeed and that idle
|
|
devices should be suspended.
|
|
|
|
As a consequence, the PM core will never directly inform the device's subsystem
|
|
or driver about runtime power changes. Instead, the driver for the device's
|
|
parent must take responsibility for telling the device's driver when the
|
|
parent's power state changes.
|
|
|
|
9. Autosuspend, or automatically-delayed suspends
|
|
|
|
Changing a device's power state isn't free; it requires both time and energy.
|
|
A device should be put in a low-power state only when there's some reason to
|
|
think it will remain in that state for a substantial time. A common heuristic
|
|
says that a device which hasn't been used for a while is liable to remain
|
|
unused; following this advice, drivers should not allow devices to be suspended
|
|
at runtime until they have been inactive for some minimum period. Even when
|
|
the heuristic ends up being non-optimal, it will still prevent devices from
|
|
"bouncing" too rapidly between low-power and full-power states.
|
|
|
|
The term "autosuspend" is an historical remnant. It doesn't mean that the
|
|
device is automatically suspended (the subsystem or driver still has to call
|
|
the appropriate PM routines); rather it means that runtime suspends will
|
|
automatically be delayed until the desired period of inactivity has elapsed.
|
|
|
|
Inactivity is determined based on the power.last_busy field. Drivers should
|
|
call pm_runtime_mark_last_busy() to update this field after carrying out I/O,
|
|
typically just before calling pm_runtime_put_autosuspend(). The desired length
|
|
of the inactivity period is a matter of policy. Subsystems can set this length
|
|
initially by calling pm_runtime_set_autosuspend_delay(), but after device
|
|
registration the length should be controlled by user space, using the
|
|
/sys/devices/.../power/autosuspend_delay_ms attribute.
|
|
|
|
In order to use autosuspend, subsystems or drivers must call
|
|
pm_runtime_use_autosuspend() (preferably before registering the device), and
|
|
thereafter they should use the various *_autosuspend() helper functions instead
|
|
of the non-autosuspend counterparts:
|
|
|
|
Instead of: pm_runtime_suspend use: pm_runtime_autosuspend;
|
|
Instead of: pm_schedule_suspend use: pm_request_autosuspend;
|
|
Instead of: pm_runtime_put use: pm_runtime_put_autosuspend;
|
|
Instead of: pm_runtime_put_sync use: pm_runtime_put_sync_autosuspend.
|
|
|
|
Drivers may also continue to use the non-autosuspend helper functions; they
|
|
will behave normally, not taking the autosuspend delay into account.
|
|
Similarly, if the power.use_autosuspend field isn't set then the autosuspend
|
|
helper functions will behave just like the non-autosuspend counterparts.
|
|
|
|
The implementation is well suited for asynchronous use in interrupt contexts.
|
|
However such use inevitably involves races, because the PM core can't
|
|
synchronize ->runtime_suspend() callbacks with the arrival of I/O requests.
|
|
This synchronization must be handled by the driver, using its private lock.
|
|
Here is a schematic pseudo-code example:
|
|
|
|
foo_read_or_write(struct foo_priv *foo, void *data)
|
|
{
|
|
lock(&foo->private_lock);
|
|
add_request_to_io_queue(foo, data);
|
|
if (foo->num_pending_requests++ == 0)
|
|
pm_runtime_get(&foo->dev);
|
|
if (!foo->is_suspended)
|
|
foo_process_next_request(foo);
|
|
unlock(&foo->private_lock);
|
|
}
|
|
|
|
foo_io_completion(struct foo_priv *foo, void *req)
|
|
{
|
|
lock(&foo->private_lock);
|
|
if (--foo->num_pending_requests == 0) {
|
|
pm_runtime_mark_last_busy(&foo->dev);
|
|
pm_runtime_put_autosuspend(&foo->dev);
|
|
} else {
|
|
foo_process_next_request(foo);
|
|
}
|
|
unlock(&foo->private_lock);
|
|
/* Send req result back to the user ... */
|
|
}
|
|
|
|
int foo_runtime_suspend(struct device *dev)
|
|
{
|
|
struct foo_priv foo = container_of(dev, ...);
|
|
int ret = 0;
|
|
|
|
lock(&foo->private_lock);
|
|
if (foo->num_pending_requests > 0) {
|
|
ret = -EBUSY;
|
|
} else {
|
|
/* ... suspend the device ... */
|
|
foo->is_suspended = 1;
|
|
}
|
|
unlock(&foo->private_lock);
|
|
return ret;
|
|
}
|
|
|
|
int foo_runtime_resume(struct device *dev)
|
|
{
|
|
struct foo_priv foo = container_of(dev, ...);
|
|
|
|
lock(&foo->private_lock);
|
|
/* ... resume the device ... */
|
|
foo->is_suspended = 0;
|
|
pm_runtime_mark_last_busy(&foo->dev);
|
|
if (foo->num_pending_requests > 0)
|
|
foo_process_requests(foo);
|
|
unlock(&foo->private_lock);
|
|
return 0;
|
|
}
|
|
|
|
The important point is that after foo_io_completion() asks for an autosuspend,
|
|
the foo_runtime_suspend() callback may race with foo_read_or_write().
|
|
Therefore foo_runtime_suspend() has to check whether there are any pending I/O
|
|
requests (while holding the private lock) before allowing the suspend to
|
|
proceed.
|
|
|
|
In addition, the power.autosuspend_delay field can be changed by user space at
|
|
any time. If a driver cares about this, it can call
|
|
pm_runtime_autosuspend_expiration() from within the ->runtime_suspend()
|
|
callback while holding its private lock. If the function returns a nonzero
|
|
value then the delay has not yet expired and the callback should return
|
|
-EAGAIN.
|