forked from Minki/linux
Merge branch 'pci/docs'
- Convert docs to reST (Changbin Du) - Convert PM docs to reST (Mauro Carvalho Chehab) * pci/docs: docs: power: convert docs to ReST and rename to *.rst Documentation: PCI: convert endpoint/pci-test-howto.txt to reST Documentation: PCI: convert endpoint/pci-test-function.txt to reST Documentation: PCI: convert endpoint/pci-endpoint-cfs.txt to reST Documentation: PCI: convert endpoint/pci-endpoint.txt to reST Documentation: PCI: convert pcieaer-howto.txt to reST Documentation: PCI: convert pci-error-recovery.txt to reST Documentation: PCI: convert acpi-info.txt to reST Documentation: PCI: convert MSI-HOWTO.txt to reST Documentation: PCI: convert pci-iov-howto.txt to reST Documentation: PCI: convert PCIEBUS-HOWTO.txt to reST Documentation: PCI: convert pci.txt to reST Documentation: add Linux PCI to Sphinx TOC tree
This commit is contained in:
commit
b6a001c0cb
@ -5,7 +5,7 @@ Contact: linux-pm@vger.kernel.org
|
||||
Description:
|
||||
The powercap/ class sub directory belongs to the power cap
|
||||
subsystem. Refer to
|
||||
Documentation/power/powercap/powercap.txt for details.
|
||||
Documentation/power/powercap/powercap.rst for details.
|
||||
|
||||
What: /sys/class/powercap/<control type>
|
||||
Date: September 2013
|
||||
|
@ -1,4 +1,8 @@
|
||||
ACPI considerations for PCI host bridges
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========================================
|
||||
ACPI considerations for PCI host bridges
|
||||
========================================
|
||||
|
||||
The general rule is that the ACPI namespace should describe everything the
|
||||
OS might use unless there's another way for the OS to find it [1, 2].
|
||||
@ -131,12 +135,13 @@ address always corresponds to bus 0, even if the bus range below the bridge
|
||||
|
||||
[4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
|
||||
QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
|
||||
General Flags: Bit [0] Ignored
|
||||
General Flags: Bit [0] Ignored
|
||||
|
||||
Extended Address Space Descriptor (.4)
|
||||
General Flags: Bit [0] Consumer/Producer:
|
||||
1–This device consumes this resource
|
||||
0–This device produces and consumes this resource
|
||||
General Flags: Bit [0] Consumer/Producer:
|
||||
|
||||
* 1 – This device consumes this resource
|
||||
* 0 – This device produces and consumes this resource
|
||||
|
||||
[5] ACPI 6.2, sec 19.6.43:
|
||||
ResourceUsage specifies whether the Memory range is consumed by
|
13
Documentation/PCI/endpoint/index.rst
Normal file
13
Documentation/PCI/endpoint/index.rst
Normal file
@ -0,0 +1,13 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================
|
||||
PCI Endpoint Framework
|
||||
======================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
pci-endpoint
|
||||
pci-endpoint-cfs
|
||||
pci-test-function
|
||||
pci-test-howto
|
@ -1,41 +1,51 @@
|
||||
CONFIGURING PCI ENDPOINT USING CONFIGFS
|
||||
Kishon Vijay Abraham I <kishon@ti.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================================
|
||||
Configuring PCI Endpoint Using CONFIGFS
|
||||
=======================================
|
||||
|
||||
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||
|
||||
The PCI Endpoint Core exposes configfs entry (pci_ep) to configure the
|
||||
PCI endpoint function and to bind the endpoint function
|
||||
with the endpoint controller. (For introducing other mechanisms to
|
||||
configure the PCI Endpoint Function refer to [1]).
|
||||
|
||||
*) Mounting configfs
|
||||
Mounting configfs
|
||||
=================
|
||||
|
||||
The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs
|
||||
directory. configfs can be mounted using the following command.
|
||||
directory. configfs can be mounted using the following command::
|
||||
|
||||
mount -t configfs none /sys/kernel/config
|
||||
|
||||
*) Directory Structure
|
||||
Directory Structure
|
||||
===================
|
||||
|
||||
The pci_ep configfs has two directories at its root: controllers and
|
||||
functions. Every EPC device present in the system will have an entry in
|
||||
the *controllers* directory and and every EPF driver present in the system
|
||||
will have an entry in the *functions* directory.
|
||||
::
|
||||
|
||||
/sys/kernel/config/pci_ep/
|
||||
.. controllers/
|
||||
.. functions/
|
||||
/sys/kernel/config/pci_ep/
|
||||
.. controllers/
|
||||
.. functions/
|
||||
|
||||
*) Creating EPF Device
|
||||
Creating EPF Device
|
||||
===================
|
||||
|
||||
Every registered EPF driver will be listed in controllers directory. The
|
||||
entries corresponding to EPF driver will be created by the EPF core.
|
||||
::
|
||||
|
||||
/sys/kernel/config/pci_ep/functions/
|
||||
.. <EPF Driver1>/
|
||||
... <EPF Device 11>/
|
||||
... <EPF Device 21>/
|
||||
.. <EPF Driver2>/
|
||||
... <EPF Device 12>/
|
||||
... <EPF Device 22>/
|
||||
/sys/kernel/config/pci_ep/functions/
|
||||
.. <EPF Driver1>/
|
||||
... <EPF Device 11>/
|
||||
... <EPF Device 21>/
|
||||
.. <EPF Driver2>/
|
||||
... <EPF Device 12>/
|
||||
... <EPF Device 22>/
|
||||
|
||||
In order to create a <EPF device> of the type probed by <EPF Driver>, the
|
||||
user has to create a directory inside <EPF DriverN>.
|
||||
@ -44,34 +54,37 @@ Every <EPF device> directory consists of the following entries that can be
|
||||
used to configure the standard configuration header of the endpoint function.
|
||||
(These entries are created by the framework when any new <EPF Device> is
|
||||
created)
|
||||
::
|
||||
|
||||
.. <EPF Driver1>/
|
||||
... <EPF Device 11>/
|
||||
... vendorid
|
||||
... deviceid
|
||||
... revid
|
||||
... progif_code
|
||||
... subclass_code
|
||||
... baseclass_code
|
||||
... cache_line_size
|
||||
... subsys_vendor_id
|
||||
... subsys_id
|
||||
... interrupt_pin
|
||||
.. <EPF Driver1>/
|
||||
... <EPF Device 11>/
|
||||
... vendorid
|
||||
... deviceid
|
||||
... revid
|
||||
... progif_code
|
||||
... subclass_code
|
||||
... baseclass_code
|
||||
... cache_line_size
|
||||
... subsys_vendor_id
|
||||
... subsys_id
|
||||
... interrupt_pin
|
||||
|
||||
*) EPC Device
|
||||
EPC Device
|
||||
==========
|
||||
|
||||
Every registered EPC device will be listed in controllers directory. The
|
||||
entries corresponding to EPC device will be created by the EPC core.
|
||||
::
|
||||
|
||||
/sys/kernel/config/pci_ep/controllers/
|
||||
.. <EPC Device1>/
|
||||
... <Symlink EPF Device11>/
|
||||
... <Symlink EPF Device12>/
|
||||
... start
|
||||
.. <EPC Device2>/
|
||||
... <Symlink EPF Device21>/
|
||||
... <Symlink EPF Device22>/
|
||||
... start
|
||||
/sys/kernel/config/pci_ep/controllers/
|
||||
.. <EPC Device1>/
|
||||
... <Symlink EPF Device11>/
|
||||
... <Symlink EPF Device12>/
|
||||
... start
|
||||
.. <EPC Device2>/
|
||||
... <Symlink EPF Device21>/
|
||||
... <Symlink EPF Device22>/
|
||||
... start
|
||||
|
||||
The <EPC Device> directory will have a list of symbolic links to
|
||||
<EPF Device>. These symbolic links should be created by the user to
|
||||
@ -81,7 +94,7 @@ The <EPC Device> directory will also have a *start* field. Once
|
||||
"1" is written to this field, the endpoint device will be ready to
|
||||
establish the link with the host. This is usually done after
|
||||
all the EPF devices are created and linked with the EPC device.
|
||||
|
||||
::
|
||||
|
||||
| controllers/
|
||||
| <Directory: EPC name>/
|
||||
@ -102,4 +115,4 @@ all the EPF devices are created and linked with the EPC device.
|
||||
| interrupt_pin
|
||||
| function
|
||||
|
||||
[1] -> Documentation/PCI/endpoint/pci-endpoint.txt
|
||||
[1] :doc:`pci-endpoint`
|
@ -1,11 +1,13 @@
|
||||
PCI ENDPOINT FRAMEWORK
|
||||
Kishon Vijay Abraham I <kishon@ti.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||
|
||||
This document is a guide to use the PCI Endpoint Framework in order to create
|
||||
endpoint controller driver, endpoint function driver, and using configfs
|
||||
interface to bind the function driver to the controller driver.
|
||||
|
||||
1. Introduction
|
||||
Introduction
|
||||
============
|
||||
|
||||
Linux has a comprehensive PCI subsystem to support PCI controllers that
|
||||
operates in Root Complex mode. The subsystem has capability to scan PCI bus,
|
||||
@ -19,26 +21,30 @@ add endpoint mode support in Linux. This will help to run Linux in an
|
||||
EP system which can have a wide variety of use cases from testing or
|
||||
validation, co-processor accelerator, etc.
|
||||
|
||||
2. PCI Endpoint Core
|
||||
PCI Endpoint Core
|
||||
=================
|
||||
|
||||
The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller
|
||||
library, the Endpoint Function library, and the configfs layer to bind the
|
||||
endpoint function with the endpoint controller.
|
||||
|
||||
2.1 PCI Endpoint Controller(EPC) Library
|
||||
PCI Endpoint Controller(EPC) Library
|
||||
------------------------------------
|
||||
|
||||
The EPC library provides APIs to be used by the controller that can operate
|
||||
in endpoint mode. It also provides APIs to be used by function driver/library
|
||||
in order to implement a particular endpoint function.
|
||||
|
||||
2.1.1 APIs for the PCI controller Driver
|
||||
APIs for the PCI controller Driver
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||
by the PCI controller driver.
|
||||
|
||||
*) devm_pci_epc_create()/pci_epc_create()
|
||||
* devm_pci_epc_create()/pci_epc_create()
|
||||
|
||||
The PCI controller driver should implement the following ops:
|
||||
|
||||
* write_header: ops to populate configuration space header
|
||||
* set_bar: ops to configure the BAR
|
||||
* clear_bar: ops to reset the BAR
|
||||
@ -51,110 +57,116 @@ by the PCI controller driver.
|
||||
The PCI controller driver can then create a new EPC device by invoking
|
||||
devm_pci_epc_create()/pci_epc_create().
|
||||
|
||||
*) devm_pci_epc_destroy()/pci_epc_destroy()
|
||||
* devm_pci_epc_destroy()/pci_epc_destroy()
|
||||
|
||||
The PCI controller driver can destroy the EPC device created by either
|
||||
devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
|
||||
pci_epc_destroy().
|
||||
|
||||
*) pci_epc_linkup()
|
||||
* pci_epc_linkup()
|
||||
|
||||
In order to notify all the function devices that the EPC device to which
|
||||
they are linked has established a link with the host, the PCI controller
|
||||
driver should invoke pci_epc_linkup().
|
||||
|
||||
*) pci_epc_mem_init()
|
||||
* pci_epc_mem_init()
|
||||
|
||||
Initialize the pci_epc_mem structure used for allocating EPC addr space.
|
||||
|
||||
*) pci_epc_mem_exit()
|
||||
* pci_epc_mem_exit()
|
||||
|
||||
Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
|
||||
|
||||
2.1.2 APIs for the PCI Endpoint Function Driver
|
||||
|
||||
APIs for the PCI Endpoint Function Driver
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||
by the PCI endpoint function driver.
|
||||
|
||||
*) pci_epc_write_header()
|
||||
* pci_epc_write_header()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_write_header() to
|
||||
write the standard configuration header to the endpoint controller.
|
||||
|
||||
*) pci_epc_set_bar()
|
||||
* pci_epc_set_bar()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_set_bar() to configure
|
||||
the Base Address Register in order for the host to assign PCI addr space.
|
||||
Register space of the function driver is usually configured
|
||||
using this API.
|
||||
|
||||
*) pci_epc_clear_bar()
|
||||
* pci_epc_clear_bar()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_clear_bar() to reset
|
||||
the BAR.
|
||||
|
||||
*) pci_epc_raise_irq()
|
||||
* pci_epc_raise_irq()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_raise_irq() to raise
|
||||
Legacy Interrupt, MSI or MSI-X Interrupt.
|
||||
|
||||
*) pci_epc_mem_alloc_addr()
|
||||
* pci_epc_mem_alloc_addr()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_mem_alloc_addr(), to
|
||||
allocate memory address from EPC addr space which is required to access
|
||||
RC's buffer
|
||||
|
||||
*) pci_epc_mem_free_addr()
|
||||
* pci_epc_mem_free_addr()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_mem_free_addr() to
|
||||
free the memory space allocated using pci_epc_mem_alloc_addr().
|
||||
|
||||
2.1.3 Other APIs
|
||||
Other APIs
|
||||
~~~~~~~~~~
|
||||
|
||||
There are other APIs provided by the EPC library. These are used for binding
|
||||
the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
|
||||
using these APIs.
|
||||
|
||||
*) pci_epc_get()
|
||||
* pci_epc_get()
|
||||
|
||||
Get a reference to the PCI endpoint controller based on the device name of
|
||||
the controller.
|
||||
|
||||
*) pci_epc_put()
|
||||
* pci_epc_put()
|
||||
|
||||
Release the reference to the PCI endpoint controller obtained using
|
||||
pci_epc_get()
|
||||
|
||||
*) pci_epc_add_epf()
|
||||
* pci_epc_add_epf()
|
||||
|
||||
Add a PCI endpoint function to a PCI endpoint controller. A PCIe device
|
||||
can have up to 8 functions according to the specification.
|
||||
|
||||
*) pci_epc_remove_epf()
|
||||
* pci_epc_remove_epf()
|
||||
|
||||
Remove the PCI endpoint function from PCI endpoint controller.
|
||||
|
||||
*) pci_epc_start()
|
||||
* pci_epc_start()
|
||||
|
||||
The PCI endpoint function driver should invoke pci_epc_start() once it
|
||||
has configured the endpoint function and wants to start the PCI link.
|
||||
|
||||
*) pci_epc_stop()
|
||||
* pci_epc_stop()
|
||||
|
||||
The PCI endpoint function driver should invoke pci_epc_stop() to stop
|
||||
the PCI LINK.
|
||||
|
||||
2.2 PCI Endpoint Function(EPF) Library
|
||||
|
||||
PCI Endpoint Function(EPF) Library
|
||||
----------------------------------
|
||||
|
||||
The EPF library provides APIs to be used by the function driver and the EPC
|
||||
library to provide endpoint mode functionality.
|
||||
|
||||
2.2.1 APIs for the PCI Endpoint Function Driver
|
||||
APIs for the PCI Endpoint Function Driver
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||
by the PCI endpoint function driver.
|
||||
|
||||
*) pci_epf_register_driver()
|
||||
* pci_epf_register_driver()
|
||||
|
||||
The PCI Endpoint Function driver should implement the following ops:
|
||||
* bind: ops to perform when a EPC device has been bound to EPF device
|
||||
@ -166,50 +178,54 @@ by the PCI endpoint function driver.
|
||||
The PCI Function driver can then register the PCI EPF driver by using
|
||||
pci_epf_register_driver().
|
||||
|
||||
*) pci_epf_unregister_driver()
|
||||
* pci_epf_unregister_driver()
|
||||
|
||||
The PCI Function driver can unregister the PCI EPF driver by using
|
||||
pci_epf_unregister_driver().
|
||||
|
||||
*) pci_epf_alloc_space()
|
||||
* pci_epf_alloc_space()
|
||||
|
||||
The PCI Function driver can allocate space for a particular BAR using
|
||||
pci_epf_alloc_space().
|
||||
|
||||
*) pci_epf_free_space()
|
||||
* pci_epf_free_space()
|
||||
|
||||
The PCI Function driver can free the allocated space
|
||||
(using pci_epf_alloc_space) by invoking pci_epf_free_space().
|
||||
|
||||
2.2.2 APIs for the PCI Endpoint Controller Library
|
||||
APIs for the PCI Endpoint Controller Library
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||
by the PCI endpoint controller library.
|
||||
|
||||
*) pci_epf_linkup()
|
||||
* pci_epf_linkup()
|
||||
|
||||
The PCI endpoint controller library invokes pci_epf_linkup() when the
|
||||
EPC device has established the connection to the host.
|
||||
|
||||
2.2.2 Other APIs
|
||||
Other APIs
|
||||
~~~~~~~~~~
|
||||
|
||||
There are other APIs provided by the EPF library. These are used to notify
|
||||
the function driver when the EPF device is bound to the EPC device.
|
||||
pci-ep-cfs.c can be used as reference for using these APIs.
|
||||
|
||||
*) pci_epf_create()
|
||||
* pci_epf_create()
|
||||
|
||||
Create a new PCI EPF device by passing the name of the PCI EPF device.
|
||||
This name will be used to bind the the EPF device to a EPF driver.
|
||||
|
||||
*) pci_epf_destroy()
|
||||
* pci_epf_destroy()
|
||||
|
||||
Destroy the created PCI EPF device.
|
||||
|
||||
*) pci_epf_bind()
|
||||
* pci_epf_bind()
|
||||
|
||||
pci_epf_bind() should be invoked when the EPF device has been bound to
|
||||
a EPC device.
|
||||
|
||||
*) pci_epf_unbind()
|
||||
* pci_epf_unbind()
|
||||
|
||||
pci_epf_unbind() should be invoked when the binding between EPC device
|
||||
and EPF device is lost.
|
@ -1,5 +1,10 @@
|
||||
PCI TEST
|
||||
Kishon Vijay Abraham I <kishon@ti.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=================
|
||||
PCI Test Function
|
||||
=================
|
||||
|
||||
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||
|
||||
Traditionally PCI RC has always been validated by using standard
|
||||
PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards.
|
||||
@ -23,65 +28,76 @@ The PCI endpoint test device has the following registers:
|
||||
8) PCI_ENDPOINT_TEST_IRQ_TYPE
|
||||
9) PCI_ENDPOINT_TEST_IRQ_NUMBER
|
||||
|
||||
*) PCI_ENDPOINT_TEST_MAGIC
|
||||
* PCI_ENDPOINT_TEST_MAGIC
|
||||
|
||||
This register will be used to test BAR0. A known pattern will be written
|
||||
and read back from MAGIC register to verify BAR0.
|
||||
|
||||
*) PCI_ENDPOINT_TEST_COMMAND:
|
||||
* PCI_ENDPOINT_TEST_COMMAND
|
||||
|
||||
This register will be used by the host driver to indicate the function
|
||||
that the endpoint device must perform.
|
||||
|
||||
Bitfield Description:
|
||||
Bit 0 : raise legacy IRQ
|
||||
Bit 1 : raise MSI IRQ
|
||||
Bit 2 : raise MSI-X IRQ
|
||||
Bit 3 : read command (read data from RC buffer)
|
||||
Bit 4 : write command (write data to RC buffer)
|
||||
Bit 5 : copy command (copy data from one RC buffer to another
|
||||
RC buffer)
|
||||
======== ================================================================
|
||||
Bitfield Description
|
||||
======== ================================================================
|
||||
Bit 0 raise legacy IRQ
|
||||
Bit 1 raise MSI IRQ
|
||||
Bit 2 raise MSI-X IRQ
|
||||
Bit 3 read command (read data from RC buffer)
|
||||
Bit 4 write command (write data to RC buffer)
|
||||
Bit 5 copy command (copy data from one RC buffer to another RC buffer)
|
||||
======== ================================================================
|
||||
|
||||
*) PCI_ENDPOINT_TEST_STATUS
|
||||
* PCI_ENDPOINT_TEST_STATUS
|
||||
|
||||
This register reflects the status of the PCI endpoint device.
|
||||
|
||||
Bitfield Description:
|
||||
Bit 0 : read success
|
||||
Bit 1 : read fail
|
||||
Bit 2 : write success
|
||||
Bit 3 : write fail
|
||||
Bit 4 : copy success
|
||||
Bit 5 : copy fail
|
||||
Bit 6 : IRQ raised
|
||||
Bit 7 : source address is invalid
|
||||
Bit 8 : destination address is invalid
|
||||
======== ==============================
|
||||
Bitfield Description
|
||||
======== ==============================
|
||||
Bit 0 read success
|
||||
Bit 1 read fail
|
||||
Bit 2 write success
|
||||
Bit 3 write fail
|
||||
Bit 4 copy success
|
||||
Bit 5 copy fail
|
||||
Bit 6 IRQ raised
|
||||
Bit 7 source address is invalid
|
||||
Bit 8 destination address is invalid
|
||||
======== ==============================
|
||||
|
||||
*) PCI_ENDPOINT_TEST_SRC_ADDR
|
||||
* PCI_ENDPOINT_TEST_SRC_ADDR
|
||||
|
||||
This register contains the source address (RC buffer address) for the
|
||||
COPY/READ command.
|
||||
|
||||
*) PCI_ENDPOINT_TEST_DST_ADDR
|
||||
* PCI_ENDPOINT_TEST_DST_ADDR
|
||||
|
||||
This register contains the destination address (RC buffer address) for
|
||||
the COPY/WRITE command.
|
||||
|
||||
*) PCI_ENDPOINT_TEST_IRQ_TYPE
|
||||
* PCI_ENDPOINT_TEST_IRQ_TYPE
|
||||
|
||||
This register contains the interrupt type (Legacy/MSI) triggered
|
||||
for the READ/WRITE/COPY and raise IRQ (Legacy/MSI) commands.
|
||||
|
||||
Possible types:
|
||||
- Legacy : 0
|
||||
- MSI : 1
|
||||
- MSI-X : 2
|
||||
|
||||
*) PCI_ENDPOINT_TEST_IRQ_NUMBER
|
||||
====== ==
|
||||
Legacy 0
|
||||
MSI 1
|
||||
MSI-X 2
|
||||
====== ==
|
||||
|
||||
* PCI_ENDPOINT_TEST_IRQ_NUMBER
|
||||
|
||||
This register contains the triggered ID interrupt.
|
||||
|
||||
Admissible values:
|
||||
- Legacy : 0
|
||||
- MSI : [1 .. 32]
|
||||
- MSI-X : [1 .. 2048]
|
||||
|
||||
====== ===========
|
||||
Legacy 0
|
||||
MSI [1 .. 32]
|
||||
MSI-X [1 .. 2048]
|
||||
====== ===========
|
@ -1,38 +1,51 @@
|
||||
PCI TEST USERGUIDE
|
||||
Kishon Vijay Abraham I <kishon@ti.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===================
|
||||
PCI Test User Guide
|
||||
===================
|
||||
|
||||
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||
|
||||
This document is a guide to help users use pci-epf-test function driver
|
||||
and pci_endpoint_test host driver for testing PCI. The list of steps to
|
||||
be followed in the host side and EP side is given below.
|
||||
|
||||
1. Endpoint Device
|
||||
Endpoint Device
|
||||
===============
|
||||
|
||||
1.1 Endpoint Controller Devices
|
||||
Endpoint Controller Devices
|
||||
---------------------------
|
||||
|
||||
To find the list of endpoint controller devices in the system:
|
||||
To find the list of endpoint controller devices in the system::
|
||||
|
||||
# ls /sys/class/pci_epc/
|
||||
51000000.pcie_ep
|
||||
|
||||
If PCI_ENDPOINT_CONFIGFS is enabled
|
||||
If PCI_ENDPOINT_CONFIGFS is enabled::
|
||||
|
||||
# ls /sys/kernel/config/pci_ep/controllers
|
||||
51000000.pcie_ep
|
||||
|
||||
1.2 Endpoint Function Drivers
|
||||
|
||||
To find the list of endpoint function drivers in the system:
|
||||
Endpoint Function Drivers
|
||||
-------------------------
|
||||
|
||||
To find the list of endpoint function drivers in the system::
|
||||
|
||||
# ls /sys/bus/pci-epf/drivers
|
||||
pci_epf_test
|
||||
|
||||
If PCI_ENDPOINT_CONFIGFS is enabled
|
||||
If PCI_ENDPOINT_CONFIGFS is enabled::
|
||||
|
||||
# ls /sys/kernel/config/pci_ep/functions
|
||||
pci_epf_test
|
||||
|
||||
1.3 Creating pci-epf-test Device
|
||||
|
||||
Creating pci-epf-test Device
|
||||
----------------------------
|
||||
|
||||
PCI endpoint function device can be created using the configfs. To create
|
||||
pci-epf-test device, the following commands can be used
|
||||
pci-epf-test device, the following commands can be used::
|
||||
|
||||
# mount -t configfs none /sys/kernel/config
|
||||
# cd /sys/kernel/config/pci_ep/
|
||||
@ -42,7 +55,7 @@ The "mkdir func1" above creates the pci-epf-test function device that will
|
||||
be probed by pci_epf_test driver.
|
||||
|
||||
The PCI endpoint framework populates the directory with the following
|
||||
configurable fields.
|
||||
configurable fields::
|
||||
|
||||
# ls functions/pci_epf_test/func1
|
||||
baseclass_code interrupt_pin progif_code subsys_id
|
||||
@ -51,67 +64,83 @@ configurable fields.
|
||||
|
||||
The PCI endpoint function driver populates these entries with default values
|
||||
when the device is bound to the driver. The pci-epf-test driver populates
|
||||
vendorid with 0xffff and interrupt_pin with 0x0001
|
||||
vendorid with 0xffff and interrupt_pin with 0x0001::
|
||||
|
||||
# cat functions/pci_epf_test/func1/vendorid
|
||||
0xffff
|
||||
# cat functions/pci_epf_test/func1/interrupt_pin
|
||||
0x0001
|
||||
|
||||
1.4 Configuring pci-epf-test Device
|
||||
|
||||
Configuring pci-epf-test Device
|
||||
-------------------------------
|
||||
|
||||
The user can configure the pci-epf-test device using configfs entry. In order
|
||||
to change the vendorid and the number of MSI interrupts used by the function
|
||||
device, the following commands can be used.
|
||||
device, the following commands can be used::
|
||||
|
||||
# echo 0x104c > functions/pci_epf_test/func1/vendorid
|
||||
# echo 0xb500 > functions/pci_epf_test/func1/deviceid
|
||||
# echo 16 > functions/pci_epf_test/func1/msi_interrupts
|
||||
# echo 8 > functions/pci_epf_test/func1/msix_interrupts
|
||||
|
||||
1.5 Binding pci-epf-test Device to EP Controller
|
||||
|
||||
Binding pci-epf-test Device to EP Controller
|
||||
--------------------------------------------
|
||||
|
||||
In order for the endpoint function device to be useful, it has to be bound to
|
||||
a PCI endpoint controller driver. Use the configfs to bind the function
|
||||
device to one of the controller driver present in the system.
|
||||
device to one of the controller driver present in the system::
|
||||
|
||||
# ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/
|
||||
|
||||
Once the above step is completed, the PCI endpoint is ready to establish a link
|
||||
with the host.
|
||||
|
||||
1.6 Start the Link
|
||||
|
||||
Start the Link
|
||||
--------------
|
||||
|
||||
In order for the endpoint device to establish a link with the host, the _start_
|
||||
field should be populated with '1'.
|
||||
field should be populated with '1'::
|
||||
|
||||
# echo 1 > controllers/51000000.pcie_ep/start
|
||||
|
||||
2. RootComplex Device
|
||||
|
||||
2.1 lspci Output
|
||||
RootComplex Device
|
||||
==================
|
||||
|
||||
Note that the devices listed here correspond to the value populated in 1.4 above
|
||||
lspci Output
|
||||
------------
|
||||
|
||||
Note that the devices listed here correspond to the value populated in 1.4
|
||||
above::
|
||||
|
||||
00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
|
||||
01:00.0 Unassigned class [ff00]: Texas Instruments Device b500
|
||||
|
||||
2.2 Using Endpoint Test function Device
|
||||
|
||||
Using Endpoint Test function Device
|
||||
-----------------------------------
|
||||
|
||||
pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
|
||||
tests. To compile this tool the following commands should be used:
|
||||
tests. To compile this tool the following commands should be used::
|
||||
|
||||
# cd <kernel-dir>
|
||||
# make -C tools/pci
|
||||
|
||||
or if you desire to compile and install in your system:
|
||||
or if you desire to compile and install in your system::
|
||||
|
||||
# cd <kernel-dir>
|
||||
# make -C tools/pci install
|
||||
|
||||
The tool and script will be located in <rootfs>/usr/bin/
|
||||
|
||||
2.2.1 pcitest.sh Output
|
||||
|
||||
pcitest.sh Output
|
||||
~~~~~~~~~~~~~~~~~
|
||||
::
|
||||
|
||||
# pcitest.sh
|
||||
BAR tests
|
||||
|
18
Documentation/PCI/index.rst
Normal file
18
Documentation/PCI/index.rst
Normal file
@ -0,0 +1,18 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================
|
||||
Linux PCI Bus Subsystem
|
||||
=======================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:numbered:
|
||||
|
||||
pci
|
||||
picebus-howto
|
||||
pci-iov-howto
|
||||
msi-howto
|
||||
acpi-info
|
||||
pci-error-recovery
|
||||
pcieaer-howto
|
||||
endpoint/index
|
@ -1,13 +1,16 @@
|
||||
The MSI Driver Guide HOWTO
|
||||
Tom L Nguyen tom.l.nguyen@intel.com
|
||||
10/03/2003
|
||||
Revised Feb 12, 2004 by Martine Silbermann
|
||||
email: Martine.Silbermann@hp.com
|
||||
Revised Jun 25, 2004 by Tom L Nguyen
|
||||
Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com>
|
||||
Copyright 2003, 2008 Intel Corporation
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
1. About this guide
|
||||
==========================
|
||||
The MSI Driver Guide HOWTO
|
||||
==========================
|
||||
|
||||
:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox
|
||||
|
||||
:Copyright: 2003, 2008 Intel Corporation
|
||||
|
||||
About this guide
|
||||
================
|
||||
|
||||
This guide describes the basics of Message Signaled Interrupts (MSIs),
|
||||
the advantages of using MSI over traditional interrupt mechanisms, how
|
||||
@ -15,7 +18,8 @@ to change your driver to use MSI or MSI-X and some basic diagnostics to
|
||||
try if a device doesn't support MSIs.
|
||||
|
||||
|
||||
2. What are MSIs?
|
||||
What are MSIs?
|
||||
==============
|
||||
|
||||
A Message Signaled Interrupt is a write from the device to a special
|
||||
address which causes an interrupt to be received by the CPU.
|
||||
@ -29,7 +33,8 @@ Devices may support both MSI and MSI-X, but only one can be enabled at
|
||||
a time.
|
||||
|
||||
|
||||
3. Why use MSIs?
|
||||
Why use MSIs?
|
||||
=============
|
||||
|
||||
There are three reasons why using MSIs can give an advantage over
|
||||
traditional pin-based interrupts.
|
||||
@ -61,14 +66,16 @@ Other possible designs include giving one interrupt to each packet queue
|
||||
in a network card or each port in a storage controller.
|
||||
|
||||
|
||||
4. How to use MSIs
|
||||
How to use MSIs
|
||||
===============
|
||||
|
||||
PCI devices are initialised to use pin-based interrupts. The device
|
||||
driver has to set up the device to use MSI or MSI-X. Not all machines
|
||||
support MSIs correctly, and for those machines, the APIs described below
|
||||
will simply fail and the device will continue to use pin-based interrupts.
|
||||
|
||||
4.1 Include kernel support for MSIs
|
||||
Include kernel support for MSIs
|
||||
-------------------------------
|
||||
|
||||
To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
|
||||
option enabled. This option is only available on some architectures,
|
||||
@ -76,14 +83,15 @@ and it may depend on some other options also being set. For example,
|
||||
on x86, you must also enable X86_UP_APIC or SMP in order to see the
|
||||
CONFIG_PCI_MSI option.
|
||||
|
||||
4.2 Using MSI
|
||||
Using MSI
|
||||
---------
|
||||
|
||||
Most of the hard work is done for the driver in the PCI layer. The driver
|
||||
simply has to request that the PCI layer set up the MSI capability for this
|
||||
device.
|
||||
|
||||
To automatically use MSI or MSI-X interrupt vectors, use the following
|
||||
function:
|
||||
function::
|
||||
|
||||
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
|
||||
unsigned int max_vecs, unsigned int flags);
|
||||
@ -101,12 +109,12 @@ any possible kind of interrupt. If the PCI_IRQ_AFFINITY flag is set,
|
||||
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
|
||||
|
||||
To get the Linux IRQ numbers passed to request_irq() and free_irq() and the
|
||||
vectors, use the following function:
|
||||
vectors, use the following function::
|
||||
|
||||
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
|
||||
|
||||
Any allocated resources should be freed before removing the device using
|
||||
the following function:
|
||||
the following function::
|
||||
|
||||
void pci_free_irq_vectors(struct pci_dev *dev);
|
||||
|
||||
@ -126,7 +134,7 @@ The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
|
||||
as possible, likely up to the limit supported by the device. If nvec is
|
||||
larger than the number supported by the device it will automatically be
|
||||
capped to the supported limit, so there is no need to query the number of
|
||||
vectors supported beforehand:
|
||||
vectors supported beforehand::
|
||||
|
||||
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES)
|
||||
if (nvec < 0)
|
||||
@ -135,7 +143,7 @@ vectors supported beforehand:
|
||||
If a driver is unable or unwilling to deal with a variable number of MSI
|
||||
interrupts it can request a particular number of interrupts by passing that
|
||||
number to pci_alloc_irq_vectors() function as both 'min_vecs' and
|
||||
'max_vecs' parameters:
|
||||
'max_vecs' parameters::
|
||||
|
||||
ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
|
||||
if (ret < 0)
|
||||
@ -143,23 +151,24 @@ number to pci_alloc_irq_vectors() function as both 'min_vecs' and
|
||||
|
||||
The most notorious example of the request type described above is enabling
|
||||
the single MSI mode for a device. It could be done by passing two 1s as
|
||||
'min_vecs' and 'max_vecs':
|
||||
'min_vecs' and 'max_vecs'::
|
||||
|
||||
ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
|
||||
if (ret < 0)
|
||||
goto out_err;
|
||||
|
||||
Some devices might not support using legacy line interrupts, in which case
|
||||
the driver can specify that only MSI or MSI-X is acceptable:
|
||||
the driver can specify that only MSI or MSI-X is acceptable::
|
||||
|
||||
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
|
||||
if (nvec < 0)
|
||||
goto out_err;
|
||||
|
||||
4.3 Legacy APIs
|
||||
Legacy APIs
|
||||
-----------
|
||||
|
||||
The following old APIs to enable and disable MSI or MSI-X interrupts should
|
||||
not be used in new code:
|
||||
not be used in new code::
|
||||
|
||||
pci_enable_msi() /* deprecated */
|
||||
pci_disable_msi() /* deprecated */
|
||||
@ -174,9 +183,11 @@ number of vectors. If you have a legitimate special use case for the count
|
||||
of vectors we might have to revisit that decision and add a
|
||||
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.
|
||||
|
||||
4.4 Considerations when using MSIs
|
||||
Considerations when using MSIs
|
||||
------------------------------
|
||||
|
||||
4.4.1 Spinlocks
|
||||
Spinlocks
|
||||
~~~~~~~~~
|
||||
|
||||
Most device drivers have a per-device spinlock which is taken in the
|
||||
interrupt handler. With pin-based interrupts or a single MSI, it is not
|
||||
@ -188,7 +199,8 @@ acquire the spinlock. Such deadlocks can be avoided by using
|
||||
spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
|
||||
and acquire the lock (see Documentation/kernel-hacking/locking.rst).
|
||||
|
||||
4.5 How to tell whether MSI/MSI-X is enabled on a device
|
||||
How to tell whether MSI/MSI-X is enabled on a device
|
||||
----------------------------------------------------
|
||||
|
||||
Using 'lspci -v' (as root) may show some devices with "MSI", "Message
|
||||
Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
|
||||
@ -196,7 +208,8 @@ has an 'Enable' flag which is followed with either "+" (enabled)
|
||||
or "-" (disabled).
|
||||
|
||||
|
||||
5. MSI quirks
|
||||
MSI quirks
|
||||
==========
|
||||
|
||||
Several PCI chipsets or devices are known not to support MSIs.
|
||||
The PCI stack provides three ways to disable MSIs:
|
||||
@ -205,7 +218,8 @@ The PCI stack provides three ways to disable MSIs:
|
||||
2. on all devices behind a specific bridge
|
||||
3. on a single device
|
||||
|
||||
5.1. Disabling MSIs globally
|
||||
Disabling MSIs globally
|
||||
-----------------------
|
||||
|
||||
Some host chipsets simply don't support MSIs properly. If we're
|
||||
lucky, the manufacturer knows this and has indicated it in the ACPI
|
||||
@ -219,7 +233,8 @@ on the kernel command line to disable MSIs on all devices. It would be
|
||||
in your best interests to report the problem to linux-pci@vger.kernel.org
|
||||
including a full 'lspci -v' so we can add the quirks to the kernel.
|
||||
|
||||
5.2. Disabling MSIs below a bridge
|
||||
Disabling MSIs below a bridge
|
||||
-----------------------------
|
||||
|
||||
Some PCI bridges are not able to route MSIs between busses properly.
|
||||
In this case, MSIs must be disabled on all devices behind the bridge.
|
||||
@ -230,7 +245,7 @@ as the nVidia nForce and Serverworks HT2000). As with host chipsets,
|
||||
Linux mostly knows about them and automatically enables MSIs if it can.
|
||||
If you have a bridge unknown to Linux, you can enable
|
||||
MSIs in configuration space using whatever method you know works, then
|
||||
enable MSIs on that bridge by doing:
|
||||
enable MSIs on that bridge by doing::
|
||||
|
||||
echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
|
||||
|
||||
@ -244,7 +259,8 @@ below this bridge.
|
||||
Again, please notify linux-pci@vger.kernel.org of any bridges that need
|
||||
special handling.
|
||||
|
||||
5.3. Disabling MSIs on a single device
|
||||
Disabling MSIs on a single device
|
||||
---------------------------------
|
||||
|
||||
Some devices are known to have faulty MSI implementations. Usually this
|
||||
is handled in the individual device driver, but occasionally it's necessary
|
||||
@ -252,7 +268,8 @@ to handle this with a quirk. Some drivers have an option to disable use
|
||||
of MSI. While this is a convenient workaround for the driver author,
|
||||
it is not good practice, and should not be emulated.
|
||||
|
||||
5.4. Finding why MSIs are disabled on a device
|
||||
Finding why MSIs are disabled on a device
|
||||
-----------------------------------------
|
||||
|
||||
From the above three sections, you can see that there are many reasons
|
||||
why MSIs may not be enabled for a given device. Your first step should
|
||||
@ -260,8 +277,8 @@ be to examine your dmesg carefully to determine whether MSIs are enabled
|
||||
for your machine. You should also check your .config to be sure you
|
||||
have enabled CONFIG_PCI_MSI.
|
||||
|
||||
Then, 'lspci -t' gives the list of bridges above a device. Reading
|
||||
/sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1)
|
||||
Then, 'lspci -t' gives the list of bridges above a device. Reading
|
||||
`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
|
||||
or disabled (0). If 0 is found in any of the msi_bus files belonging
|
||||
to bridges between the PCI root and the device, MSIs are disabled.
|
||||
|
@ -1,12 +1,13 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
PCI Error Recovery
|
||||
------------------
|
||||
February 2, 2006
|
||||
==================
|
||||
PCI Error Recovery
|
||||
==================
|
||||
|
||||
Current document maintainer:
|
||||
Linas Vepstas <linasvepstas@gmail.com>
|
||||
updated by Richard Lary <rlary@us.ibm.com>
|
||||
and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009
|
||||
|
||||
:Authors: - Linas Vepstas <linasvepstas@gmail.com>
|
||||
- Richard Lary <rlary@us.ibm.com>
|
||||
- Mike Mason <mmlnx@us.ibm.com>
|
||||
|
||||
|
||||
Many PCI bus controllers are able to detect a variety of hardware
|
||||
@ -63,7 +64,8 @@ mechanisms for dealing with SCSI bus errors and SCSI bus resets.
|
||||
|
||||
|
||||
Detailed Design
|
||||
---------------
|
||||
===============
|
||||
|
||||
Design and implementation details below, based on a chain of
|
||||
public email discussions with Ben Herrenschmidt, circa 5 April 2005.
|
||||
|
||||
@ -73,30 +75,33 @@ pci_driver. A driver that fails to provide the structure is "non-aware",
|
||||
and the actual recovery steps taken are platform dependent. The
|
||||
arch/powerpc implementation will simulate a PCI hotplug remove/add.
|
||||
|
||||
This structure has the form:
|
||||
struct pci_error_handlers
|
||||
{
|
||||
int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
|
||||
int (*mmio_enabled)(struct pci_dev *dev);
|
||||
int (*slot_reset)(struct pci_dev *dev);
|
||||
void (*resume)(struct pci_dev *dev);
|
||||
};
|
||||
This structure has the form::
|
||||
|
||||
The possible channel states are:
|
||||
enum pci_channel_state {
|
||||
pci_channel_io_normal, /* I/O channel is in normal state */
|
||||
pci_channel_io_frozen, /* I/O to channel is blocked */
|
||||
pci_channel_io_perm_failure, /* PCI card is dead */
|
||||
};
|
||||
struct pci_error_handlers
|
||||
{
|
||||
int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
|
||||
int (*mmio_enabled)(struct pci_dev *dev);
|
||||
int (*slot_reset)(struct pci_dev *dev);
|
||||
void (*resume)(struct pci_dev *dev);
|
||||
};
|
||||
|
||||
Possible return values are:
|
||||
enum pci_ers_result {
|
||||
PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
|
||||
PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
|
||||
PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
|
||||
PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
|
||||
PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
|
||||
};
|
||||
The possible channel states are::
|
||||
|
||||
enum pci_channel_state {
|
||||
pci_channel_io_normal, /* I/O channel is in normal state */
|
||||
pci_channel_io_frozen, /* I/O to channel is blocked */
|
||||
pci_channel_io_perm_failure, /* PCI card is dead */
|
||||
};
|
||||
|
||||
Possible return values are::
|
||||
|
||||
enum pci_ers_result {
|
||||
PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
|
||||
PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
|
||||
PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
|
||||
PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
|
||||
PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
|
||||
};
|
||||
|
||||
A driver does not have to implement all of these callbacks; however,
|
||||
if it implements any, it must implement error_detected(). If a callback
|
||||
@ -134,16 +139,17 @@ shouldn't do any new IOs. Called in task context. This is sort of a
|
||||
|
||||
All drivers participating in this system must implement this call.
|
||||
The driver must return one of the following result codes:
|
||||
- PCI_ERS_RESULT_CAN_RECOVER:
|
||||
Driver returns this if it thinks it might be able to recover
|
||||
the HW by just banging IOs or if it wants to be given
|
||||
a chance to extract some diagnostic information (see
|
||||
mmio_enable, below).
|
||||
- PCI_ERS_RESULT_NEED_RESET:
|
||||
Driver returns this if it can't recover without a
|
||||
slot reset.
|
||||
- PCI_ERS_RESULT_DISCONNECT:
|
||||
Driver returns this if it doesn't want to recover at all.
|
||||
|
||||
- PCI_ERS_RESULT_CAN_RECOVER
|
||||
Driver returns this if it thinks it might be able to recover
|
||||
the HW by just banging IOs or if it wants to be given
|
||||
a chance to extract some diagnostic information (see
|
||||
mmio_enable, below).
|
||||
- PCI_ERS_RESULT_NEED_RESET
|
||||
Driver returns this if it can't recover without a
|
||||
slot reset.
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Driver returns this if it doesn't want to recover at all.
|
||||
|
||||
The next step taken will depend on the result codes returned by the
|
||||
drivers.
|
||||
@ -159,25 +165,27 @@ then recovery proceeds to STEP 4 (Slot Reset).
|
||||
If the platform is unable to recover the slot, the next step
|
||||
is STEP 6 (Permanent Failure).
|
||||
|
||||
>>> The current powerpc implementation assumes that a device driver will
|
||||
>>> *not* schedule or semaphore in this routine; the current powerpc
|
||||
>>> implementation uses one kernel thread to notify all devices;
|
||||
>>> thus, if one device sleeps/schedules, all devices are affected.
|
||||
>>> Doing better requires complex multi-threaded logic in the error
|
||||
>>> recovery implementation (e.g. waiting for all notification threads
|
||||
>>> to "join" before proceeding with recovery.) This seems excessively
|
||||
>>> complex and not worth implementing.
|
||||
.. note::
|
||||
|
||||
>>> The current powerpc implementation doesn't much care if the device
|
||||
>>> attempts I/O at this point, or not. I/O's will fail, returning
|
||||
>>> a value of 0xff on read, and writes will be dropped. If more than
|
||||
>>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
|
||||
>>> assumes that the device driver has gone into an infinite loop
|
||||
>>> and prints an error to syslog. A reboot is then required to
|
||||
>>> get the device working again.
|
||||
The current powerpc implementation assumes that a device driver will
|
||||
*not* schedule or semaphore in this routine; the current powerpc
|
||||
implementation uses one kernel thread to notify all devices;
|
||||
thus, if one device sleeps/schedules, all devices are affected.
|
||||
Doing better requires complex multi-threaded logic in the error
|
||||
recovery implementation (e.g. waiting for all notification threads
|
||||
to "join" before proceeding with recovery.) This seems excessively
|
||||
complex and not worth implementing.
|
||||
|
||||
The current powerpc implementation doesn't much care if the device
|
||||
attempts I/O at this point, or not. I/O's will fail, returning
|
||||
a value of 0xff on read, and writes will be dropped. If more than
|
||||
EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
|
||||
assumes that the device driver has gone into an infinite loop
|
||||
and prints an error to syslog. A reboot is then required to
|
||||
get the device working again.
|
||||
|
||||
STEP 2: MMIO Enabled
|
||||
-------------------
|
||||
--------------------
|
||||
The platform re-enables MMIO to the device (but typically not the
|
||||
DMA), and then calls the mmio_enabled() callback on all affected
|
||||
device drivers.
|
||||
@ -192,34 +200,36 @@ link reset was performed by the HW. If the platform can't just re-enable IOs
|
||||
without a slot reset or a link reset, it will not call this callback, and
|
||||
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
|
||||
|
||||
>>> The following is proposed; no platform implements this yet:
|
||||
>>> Proposal: All I/O's should be done _synchronously_ from within
|
||||
>>> this callback, errors triggered by them will be returned via
|
||||
>>> the normal pci_check_whatever() API, no new error_detected()
|
||||
>>> callback will be issued due to an error happening here. However,
|
||||
>>> such an error might cause IOs to be re-blocked for the whole
|
||||
>>> segment, and thus invalidate the recovery that other devices
|
||||
>>> on the same segment might have done, forcing the whole segment
|
||||
>>> into one of the next states, that is, link reset or slot reset.
|
||||
.. note::
|
||||
|
||||
The following is proposed; no platform implements this yet:
|
||||
Proposal: All I/O's should be done _synchronously_ from within
|
||||
this callback, errors triggered by them will be returned via
|
||||
the normal pci_check_whatever() API, no new error_detected()
|
||||
callback will be issued due to an error happening here. However,
|
||||
such an error might cause IOs to be re-blocked for the whole
|
||||
segment, and thus invalidate the recovery that other devices
|
||||
on the same segment might have done, forcing the whole segment
|
||||
into one of the next states, that is, link reset or slot reset.
|
||||
|
||||
The driver should return one of the following result codes:
|
||||
- PCI_ERS_RESULT_RECOVERED
|
||||
Driver returns this if it thinks the device is fully
|
||||
functional and thinks it is ready to start
|
||||
normal driver operations again. There is no
|
||||
guarantee that the driver will actually be
|
||||
allowed to proceed, as another driver on the
|
||||
same segment might have failed and thus triggered a
|
||||
slot reset on platforms that support it.
|
||||
- PCI_ERS_RESULT_RECOVERED
|
||||
Driver returns this if it thinks the device is fully
|
||||
functional and thinks it is ready to start
|
||||
normal driver operations again. There is no
|
||||
guarantee that the driver will actually be
|
||||
allowed to proceed, as another driver on the
|
||||
same segment might have failed and thus triggered a
|
||||
slot reset on platforms that support it.
|
||||
|
||||
- PCI_ERS_RESULT_NEED_RESET
|
||||
Driver returns this if it thinks the device is not
|
||||
recoverable in its current state and it needs a slot
|
||||
reset to proceed.
|
||||
- PCI_ERS_RESULT_NEED_RESET
|
||||
Driver returns this if it thinks the device is not
|
||||
recoverable in its current state and it needs a slot
|
||||
reset to proceed.
|
||||
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Same as above. Total failure, no recovery even after
|
||||
reset driver dead. (To be defined more precisely)
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Same as above. Total failure, no recovery even after
|
||||
reset driver dead. (To be defined more precisely)
|
||||
|
||||
The next step taken depends on the results returned by the drivers.
|
||||
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
|
||||
@ -293,31 +303,33 @@ device will be considered "dead" in this case.
|
||||
Drivers for multi-function cards will need to coordinate among
|
||||
themselves as to which driver instance will perform any "one-shot"
|
||||
or global device initialization. For example, the Symbios sym53cxx2
|
||||
driver performs device init only from PCI function 0:
|
||||
driver performs device init only from PCI function 0::
|
||||
|
||||
+ if (PCI_FUNC(pdev->devfn) == 0)
|
||||
+ sym_reset_scsi_bus(np, 0);
|
||||
+ if (PCI_FUNC(pdev->devfn) == 0)
|
||||
+ sym_reset_scsi_bus(np, 0);
|
||||
|
||||
Result codes:
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Same as above.
|
||||
Result codes:
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Same as above.
|
||||
|
||||
Drivers for PCI Express cards that require a fundamental reset must
|
||||
set the needs_freset bit in the pci_dev structure in their probe function.
|
||||
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
|
||||
PCI card types:
|
||||
PCI card types::
|
||||
|
||||
+ /* Set EEH reset type to fundamental if required by hba */
|
||||
+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
|
||||
+ pdev->needs_freset = 1;
|
||||
+
|
||||
+ /* Set EEH reset type to fundamental if required by hba */
|
||||
+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
|
||||
+ pdev->needs_freset = 1;
|
||||
+
|
||||
|
||||
Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
|
||||
Failure).
|
||||
|
||||
>>> The current powerpc implementation does not try a power-cycle
|
||||
>>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
|
||||
>>> However, it probably should.
|
||||
.. note::
|
||||
|
||||
The current powerpc implementation does not try a power-cycle
|
||||
reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
|
||||
However, it probably should.
|
||||
|
||||
|
||||
STEP 5: Resume Operations
|
||||
@ -370,44 +382,43 @@ The current policy is to turn this into a platform policy.
|
||||
That is, the recovery API only requires that:
|
||||
|
||||
- There is no guarantee that interrupt delivery can proceed from any
|
||||
device on the segment starting from the error detection and until the
|
||||
slot_reset callback is called, at which point interrupts are expected
|
||||
to be fully operational.
|
||||
device on the segment starting from the error detection and until the
|
||||
slot_reset callback is called, at which point interrupts are expected
|
||||
to be fully operational.
|
||||
|
||||
- There is no guarantee that interrupt delivery is stopped, that is,
|
||||
a driver that gets an interrupt after detecting an error, or that detects
|
||||
an error within the interrupt handler such that it prevents proper
|
||||
ack'ing of the interrupt (and thus removal of the source) should just
|
||||
return IRQ_NOTHANDLED. It's up to the platform to deal with that
|
||||
condition, typically by masking the IRQ source during the duration of
|
||||
the error handling. It is expected that the platform "knows" which
|
||||
interrupts are routed to error-management capable slots and can deal
|
||||
with temporarily disabling that IRQ number during error processing (this
|
||||
isn't terribly complex). That means some IRQ latency for other devices
|
||||
sharing the interrupt, but there is simply no other way. High end
|
||||
platforms aren't supposed to share interrupts between many devices
|
||||
anyway :)
|
||||
a driver that gets an interrupt after detecting an error, or that detects
|
||||
an error within the interrupt handler such that it prevents proper
|
||||
ack'ing of the interrupt (and thus removal of the source) should just
|
||||
return IRQ_NOTHANDLED. It's up to the platform to deal with that
|
||||
condition, typically by masking the IRQ source during the duration of
|
||||
the error handling. It is expected that the platform "knows" which
|
||||
interrupts are routed to error-management capable slots and can deal
|
||||
with temporarily disabling that IRQ number during error processing (this
|
||||
isn't terribly complex). That means some IRQ latency for other devices
|
||||
sharing the interrupt, but there is simply no other way. High end
|
||||
platforms aren't supposed to share interrupts between many devices
|
||||
anyway :)
|
||||
|
||||
>>> Implementation details for the powerpc platform are discussed in
|
||||
>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
|
||||
.. note::
|
||||
|
||||
>>> As of this writing, there is a growing list of device drivers with
|
||||
>>> patches implementing error recovery. Not all of these patches are in
|
||||
>>> mainline yet. These may be used as "examples":
|
||||
>>>
|
||||
>>> drivers/scsi/ipr
|
||||
>>> drivers/scsi/sym53c8xx_2
|
||||
>>> drivers/scsi/qla2xxx
|
||||
>>> drivers/scsi/lpfc
|
||||
>>> drivers/next/bnx2.c
|
||||
>>> drivers/next/e100.c
|
||||
>>> drivers/net/e1000
|
||||
>>> drivers/net/e1000e
|
||||
>>> drivers/net/ixgb
|
||||
>>> drivers/net/ixgbe
|
||||
>>> drivers/net/cxgb3
|
||||
>>> drivers/net/s2io.c
|
||||
>>> drivers/net/qlge
|
||||
Implementation details for the powerpc platform are discussed in
|
||||
the file Documentation/powerpc/eeh-pci-error-recovery.txt
|
||||
|
||||
The End
|
||||
-------
|
||||
As of this writing, there is a growing list of device drivers with
|
||||
patches implementing error recovery. Not all of these patches are in
|
||||
mainline yet. These may be used as "examples":
|
||||
|
||||
- drivers/scsi/ipr
|
||||
- drivers/scsi/sym53c8xx_2
|
||||
- drivers/scsi/qla2xxx
|
||||
- drivers/scsi/lpfc
|
||||
- drivers/next/bnx2.c
|
||||
- drivers/next/e100.c
|
||||
- drivers/net/e1000
|
||||
- drivers/net/e1000e
|
||||
- drivers/net/ixgb
|
||||
- drivers/net/ixgbe
|
||||
- drivers/net/cxgb3
|
||||
- drivers/net/s2io.c
|
||||
- drivers/net/qlge
|
@ -1,14 +1,19 @@
|
||||
PCI Express I/O Virtualization Howto
|
||||
Copyright (C) 2009 Intel Corporation
|
||||
Yu Zhao <yu.zhao@intel.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
Update: November 2012
|
||||
-- sysfs-based SRIOV enable-/disable-ment
|
||||
Donald Dutile <ddutile@redhat.com>
|
||||
====================================
|
||||
PCI Express I/O Virtualization Howto
|
||||
====================================
|
||||
|
||||
1. Overview
|
||||
:Copyright: |copy| 2009 Intel Corporation
|
||||
:Authors: - Yu Zhao <yu.zhao@intel.com>
|
||||
- Donald Dutile <ddutile@redhat.com>
|
||||
|
||||
1.1 What is SR-IOV
|
||||
Overview
|
||||
========
|
||||
|
||||
What is SR-IOV
|
||||
--------------
|
||||
|
||||
Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
|
||||
capability which makes one physical device appear as multiple virtual
|
||||
@ -23,9 +28,11 @@ Memory Space, which is used to map its register set. VF device driver
|
||||
operates on the register set so it can be functional and appear as a
|
||||
real existing PCI device.
|
||||
|
||||
2. User Guide
|
||||
User Guide
|
||||
==========
|
||||
|
||||
2.1 How can I enable SR-IOV capability
|
||||
How can I enable SR-IOV capability
|
||||
----------------------------------
|
||||
|
||||
Multiple methods are available for SR-IOV enablement.
|
||||
In the first method, the device driver (PF driver) will control the
|
||||
@ -43,105 +50,123 @@ checks, e.g., check numvfs == 0 if enabling VFs, ensure
|
||||
numvfs <= totalvfs.
|
||||
The second method is the recommended method for new/future VF devices.
|
||||
|
||||
2.2 How can I use the Virtual Functions
|
||||
How can I use the Virtual Functions
|
||||
-----------------------------------
|
||||
|
||||
The VF is treated as hot-plugged PCI devices in the kernel, so they
|
||||
should be able to work in the same way as real PCI devices. The VF
|
||||
requires device driver that is same as a normal PCI device's.
|
||||
|
||||
3. Developer Guide
|
||||
Developer Guide
|
||||
===============
|
||||
|
||||
3.1 SR-IOV API
|
||||
SR-IOV API
|
||||
----------
|
||||
|
||||
To enable SR-IOV capability:
|
||||
(a) For the first method, in the driver:
|
||||
|
||||
(a) For the first method, in the driver::
|
||||
|
||||
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
|
||||
'nr_virtfn' is number of VFs to be enabled.
|
||||
(b) For the second method, from sysfs:
|
||||
|
||||
'nr_virtfn' is number of VFs to be enabled.
|
||||
|
||||
(b) For the second method, from sysfs::
|
||||
|
||||
echo 'nr_virtfn' > \
|
||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
|
||||
|
||||
To disable SR-IOV capability:
|
||||
(a) For the first method, in the driver:
|
||||
|
||||
(a) For the first method, in the driver::
|
||||
|
||||
void pci_disable_sriov(struct pci_dev *dev);
|
||||
(b) For the second method, from sysfs:
|
||||
|
||||
(b) For the second method, from sysfs::
|
||||
|
||||
echo 0 > \
|
||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
|
||||
|
||||
To enable auto probing VFs by a compatible driver on the host, run
|
||||
command below before enabling SR-IOV capabilities. This is the
|
||||
default behavior.
|
||||
::
|
||||
|
||||
echo 1 > \
|
||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
|
||||
|
||||
To disable auto probing VFs by a compatible driver on the host, run
|
||||
command below before enabling SR-IOV capabilities. Updating this
|
||||
entry will not affect VFs which are already probed.
|
||||
::
|
||||
|
||||
echo 0 > \
|
||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
|
||||
|
||||
3.2 Usage example
|
||||
Usage example
|
||||
-------------
|
||||
|
||||
Following piece of code illustrates the usage of the SR-IOV API.
|
||||
::
|
||||
|
||||
static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
|
||||
{
|
||||
pci_enable_sriov(dev, NR_VIRTFN);
|
||||
static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
|
||||
{
|
||||
pci_enable_sriov(dev, NR_VIRTFN);
|
||||
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void dev_remove(struct pci_dev *dev)
|
||||
{
|
||||
pci_disable_sriov(dev);
|
||||
|
||||
...
|
||||
}
|
||||
|
||||
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
|
||||
{
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int dev_resume(struct pci_dev *dev)
|
||||
{
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void dev_shutdown(struct pci_dev *dev)
|
||||
{
|
||||
...
|
||||
}
|
||||
|
||||
static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
|
||||
{
|
||||
if (numvfs > 0) {
|
||||
...
|
||||
pci_enable_sriov(dev, numvfs);
|
||||
...
|
||||
return numvfs;
|
||||
}
|
||||
if (numvfs == 0) {
|
||||
....
|
||||
pci_disable_sriov(dev);
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
static struct pci_driver dev_driver = {
|
||||
.name = "SR-IOV Physical Function driver",
|
||||
.id_table = dev_id_table,
|
||||
.probe = dev_probe,
|
||||
.remove = dev_remove,
|
||||
.suspend = dev_suspend,
|
||||
.resume = dev_resume,
|
||||
.shutdown = dev_shutdown,
|
||||
.sriov_configure = dev_sriov_configure,
|
||||
};
|
||||
static void dev_remove(struct pci_dev *dev)
|
||||
{
|
||||
pci_disable_sriov(dev);
|
||||
|
||||
...
|
||||
}
|
||||
|
||||
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
|
||||
{
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int dev_resume(struct pci_dev *dev)
|
||||
{
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void dev_shutdown(struct pci_dev *dev)
|
||||
{
|
||||
...
|
||||
}
|
||||
|
||||
static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
|
||||
{
|
||||
if (numvfs > 0) {
|
||||
...
|
||||
pci_enable_sriov(dev, numvfs);
|
||||
...
|
||||
return numvfs;
|
||||
}
|
||||
if (numvfs == 0) {
|
||||
....
|
||||
pci_disable_sriov(dev);
|
||||
...
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
static struct pci_driver dev_driver = {
|
||||
.name = "SR-IOV Physical Function driver",
|
||||
.id_table = dev_id_table,
|
||||
.probe = dev_probe,
|
||||
.remove = dev_remove,
|
||||
.suspend = dev_suspend,
|
||||
.resume = dev_resume,
|
||||
.shutdown = dev_shutdown,
|
||||
.sriov_configure = dev_sriov_configure,
|
||||
};
|
@ -1,10 +1,12 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
How To Write Linux PCI Drivers
|
||||
==============================
|
||||
How To Write Linux PCI Drivers
|
||||
==============================
|
||||
|
||||
by Martin Mares <mj@ucw.cz> on 07-Feb-2000
|
||||
updated by Grant Grundler <grundler@parisc-linux.org> on 23-Dec-2006
|
||||
:Authors: - Martin Mares <mj@ucw.cz>
|
||||
- Grant Grundler <grundler@parisc-linux.org>
|
||||
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The world of PCI is vast and full of (mostly unpleasant) surprises.
|
||||
Since each CPU architecture implements different chip-sets and PCI devices
|
||||
have different requirements (erm, "features"), the result is the PCI support
|
||||
@ -15,8 +17,7 @@ PCI device drivers.
|
||||
A more complete resource is the third edition of "Linux Device Drivers"
|
||||
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
|
||||
LDD3 is available for free (under Creative Commons License) from:
|
||||
|
||||
http://lwn.net/Kernel/LDD3/
|
||||
http://lwn.net/Kernel/LDD3/.
|
||||
|
||||
However, keep in mind that all documents are subject to "bit rot".
|
||||
Refer to the source code if things are not working as described here.
|
||||
@ -25,9 +26,8 @@ Please send questions/comments/patches about Linux PCI API to the
|
||||
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.
|
||||
|
||||
|
||||
|
||||
0. Structure of PCI drivers
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Structure of PCI drivers
|
||||
========================
|
||||
PCI drivers "discover" PCI devices in a system via pci_register_driver().
|
||||
Actually, it's the other way around. When the PCI generic code discovers
|
||||
a new device, the driver with a matching "description" will be notified.
|
||||
@ -42,24 +42,25 @@ pointers and thus dictates the high level structure of a driver.
|
||||
Once the driver knows about a PCI device and takes ownership, the
|
||||
driver generally needs to perform the following initialization:
|
||||
|
||||
Enable the device
|
||||
Request MMIO/IOP resources
|
||||
Set the DMA mask size (for both coherent and streaming DMA)
|
||||
Allocate and initialize shared control data (pci_allocate_coherent())
|
||||
Access device configuration space (if needed)
|
||||
Register IRQ handler (request_irq())
|
||||
Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
|
||||
Enable DMA/processing engines
|
||||
- Enable the device
|
||||
- Request MMIO/IOP resources
|
||||
- Set the DMA mask size (for both coherent and streaming DMA)
|
||||
- Allocate and initialize shared control data (pci_allocate_coherent())
|
||||
- Access device configuration space (if needed)
|
||||
- Register IRQ handler (request_irq())
|
||||
- Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
|
||||
- Enable DMA/processing engines
|
||||
|
||||
When done using the device, and perhaps the module needs to be unloaded,
|
||||
the driver needs to take the follow steps:
|
||||
Disable the device from generating IRQs
|
||||
Release the IRQ (free_irq())
|
||||
Stop all DMA activity
|
||||
Release DMA buffers (both streaming and coherent)
|
||||
Unregister from other subsystems (e.g. scsi or netdev)
|
||||
Release MMIO/IOP resources
|
||||
Disable the device
|
||||
|
||||
- Disable the device from generating IRQs
|
||||
- Release the IRQ (free_irq())
|
||||
- Stop all DMA activity
|
||||
- Release DMA buffers (both streaming and coherent)
|
||||
- Unregister from other subsystems (e.g. scsi or netdev)
|
||||
- Release MMIO/IOP resources
|
||||
- Disable the device
|
||||
|
||||
Most of these topics are covered in the following sections.
|
||||
For the rest look at LDD3 or <linux/pci.h> .
|
||||
@ -70,99 +71,38 @@ completely empty or just returning an appropriate error codes to avoid
|
||||
lots of ifdefs in the drivers.
|
||||
|
||||
|
||||
pci_register_driver() call
|
||||
==========================
|
||||
|
||||
1. pci_register_driver() call
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
PCI device drivers call pci_register_driver() during their
|
||||
PCI device drivers call ``pci_register_driver()`` during their
|
||||
initialization with a pointer to a structure describing the driver
|
||||
(struct pci_driver):
|
||||
(``struct pci_driver``):
|
||||
|
||||
field name Description
|
||||
---------- ------------------------------------------------------
|
||||
id_table Pointer to table of device ID's the driver is
|
||||
interested in. Most drivers should export this
|
||||
table using MODULE_DEVICE_TABLE(pci,...).
|
||||
.. kernel-doc:: include/linux/pci.h
|
||||
:functions: pci_driver
|
||||
|
||||
probe This probing function gets called (during execution
|
||||
of pci_register_driver() for already existing
|
||||
devices or later if a new device gets inserted) for
|
||||
all PCI devices which match the ID table and are not
|
||||
"owned" by the other drivers yet. This function gets
|
||||
passed a "struct pci_dev *" for each device whose
|
||||
entry in the ID table matches the device. The probe
|
||||
function returns zero when the driver chooses to
|
||||
take "ownership" of the device or an error code
|
||||
(negative number) otherwise.
|
||||
The probe function always gets called from process
|
||||
context, so it can sleep.
|
||||
|
||||
remove The remove() function gets called whenever a device
|
||||
being handled by this driver is removed (either during
|
||||
deregistration of the driver or when it's manually
|
||||
pulled out of a hot-pluggable slot).
|
||||
The remove function always gets called from process
|
||||
context, so it can sleep.
|
||||
|
||||
suspend Put device into low power state.
|
||||
suspend_late Put device into low power state.
|
||||
|
||||
resume_early Wake device from low power state.
|
||||
resume Wake device from low power state.
|
||||
|
||||
(Please see Documentation/power/pci.txt for descriptions
|
||||
of PCI Power Management and the related functions.)
|
||||
|
||||
shutdown Hook into reboot_notifier_list (kernel/sys.c).
|
||||
Intended to stop any idling DMA operations.
|
||||
Useful for enabling wake-on-lan (NIC) or changing
|
||||
the power state of a device before reboot.
|
||||
e.g. drivers/net/e100.c.
|
||||
|
||||
err_handler See Documentation/PCI/pci-error-recovery.txt
|
||||
|
||||
|
||||
The ID table is an array of struct pci_device_id entries ending with an
|
||||
The ID table is an array of ``struct pci_device_id`` entries ending with an
|
||||
all-zero entry. Definitions with static const are generally preferred.
|
||||
|
||||
Each entry consists of:
|
||||
.. kernel-doc:: include/linux/mod_devicetable.h
|
||||
:functions: pci_device_id
|
||||
|
||||
vendor,device Vendor and device ID to match (or PCI_ANY_ID)
|
||||
|
||||
subvendor, Subsystem vendor and device ID to match (or PCI_ANY_ID)
|
||||
subdevice,
|
||||
|
||||
class Device class, subclass, and "interface" to match.
|
||||
See Appendix D of the PCI Local Bus Spec or
|
||||
include/linux/pci_ids.h for a full list of classes.
|
||||
Most drivers do not need to specify class/class_mask
|
||||
as vendor/device is normally sufficient.
|
||||
|
||||
class_mask limit which sub-fields of the class field are compared.
|
||||
See drivers/scsi/sym53c8xx_2/ for example of usage.
|
||||
|
||||
driver_data Data private to the driver.
|
||||
Most drivers don't need to use driver_data field.
|
||||
Best practice is to use driver_data as an index
|
||||
into a static list of equivalent device types,
|
||||
instead of using it as a pointer.
|
||||
|
||||
|
||||
Most drivers only need PCI_DEVICE() or PCI_DEVICE_CLASS() to set up
|
||||
Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
|
||||
a pci_device_id table.
|
||||
|
||||
New PCI IDs may be added to a device driver pci_ids table at runtime
|
||||
as shown below:
|
||||
as shown below::
|
||||
|
||||
echo "vendor device subvendor subdevice class class_mask driver_data" > \
|
||||
/sys/bus/pci/drivers/{driver}/new_id
|
||||
echo "vendor device subvendor subdevice class class_mask driver_data" > \
|
||||
/sys/bus/pci/drivers/{driver}/new_id
|
||||
|
||||
All fields are passed in as hexadecimal values (no leading 0x).
|
||||
The vendor and device fields are mandatory, the others are optional. Users
|
||||
need pass only as many optional fields as necessary:
|
||||
o subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
|
||||
o class and classmask fields default to 0
|
||||
o driver_data defaults to 0UL.
|
||||
|
||||
- subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
|
||||
- class and classmask fields default to 0
|
||||
- driver_data defaults to 0UL.
|
||||
|
||||
Note that driver_data must match the value used by any of the pci_device_id
|
||||
entries defined in the driver. This makes the driver_data field mandatory
|
||||
@ -175,29 +115,31 @@ When the driver exits, it just calls pci_unregister_driver() and the PCI layer
|
||||
automatically calls the remove hook for all devices handled by the driver.
|
||||
|
||||
|
||||
1.1 "Attributes" for driver functions/data
|
||||
"Attributes" for driver functions/data
|
||||
--------------------------------------
|
||||
|
||||
Please mark the initialization and cleanup functions where appropriate
|
||||
(the corresponding macros are defined in <linux/init.h>):
|
||||
|
||||
====== =================================================
|
||||
__init Initialization code. Thrown away after the driver
|
||||
initializes.
|
||||
__exit Exit code. Ignored for non-modular drivers.
|
||||
====== =================================================
|
||||
|
||||
Tips on when/where to use the above attributes:
|
||||
o The module_init()/module_exit() functions (and all
|
||||
- The module_init()/module_exit() functions (and all
|
||||
initialization functions called _only_ from these)
|
||||
should be marked __init/__exit.
|
||||
|
||||
o Do not mark the struct pci_driver.
|
||||
- Do not mark the struct pci_driver.
|
||||
|
||||
o Do NOT mark a function if you are not sure which mark to use.
|
||||
- Do NOT mark a function if you are not sure which mark to use.
|
||||
Better to not mark the function than mark the function wrong.
|
||||
|
||||
|
||||
|
||||
2. How to find PCI devices manually
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
How to find PCI devices manually
|
||||
================================
|
||||
|
||||
PCI drivers should have a really good reason for not using the
|
||||
pci_register_driver() interface to search for PCI devices.
|
||||
@ -207,17 +149,17 @@ E.g. combined serial/parallel port/floppy controller.
|
||||
|
||||
A manual search may be performed using the following constructs:
|
||||
|
||||
Searching by vendor and device ID:
|
||||
Searching by vendor and device ID::
|
||||
|
||||
struct pci_dev *dev = NULL;
|
||||
while (dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev))
|
||||
configure_device(dev);
|
||||
|
||||
Searching by class ID (iterate in a similar way):
|
||||
Searching by class ID (iterate in a similar way)::
|
||||
|
||||
pci_get_class(CLASS_ID, dev)
|
||||
|
||||
Searching by both vendor/device and subsystem vendor/device ID:
|
||||
Searching by both vendor/device and subsystem vendor/device ID::
|
||||
|
||||
pci_get_subsys(VENDOR_ID,DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev).
|
||||
|
||||
@ -230,21 +172,20 @@ the pci_dev that they return. You must eventually (possibly at module unload)
|
||||
decrement the reference count on these devices by calling pci_dev_put().
|
||||
|
||||
|
||||
|
||||
3. Device Initialization Steps
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Device Initialization Steps
|
||||
===========================
|
||||
|
||||
As noted in the introduction, most PCI drivers need the following steps
|
||||
for device initialization:
|
||||
|
||||
Enable the device
|
||||
Request MMIO/IOP resources
|
||||
Set the DMA mask size (for both coherent and streaming DMA)
|
||||
Allocate and initialize shared control data (pci_allocate_coherent())
|
||||
Access device configuration space (if needed)
|
||||
Register IRQ handler (request_irq())
|
||||
Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
|
||||
Enable DMA/processing engines.
|
||||
- Enable the device
|
||||
- Request MMIO/IOP resources
|
||||
- Set the DMA mask size (for both coherent and streaming DMA)
|
||||
- Allocate and initialize shared control data (pci_allocate_coherent())
|
||||
- Access device configuration space (if needed)
|
||||
- Register IRQ handler (request_irq())
|
||||
- Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
|
||||
- Enable DMA/processing engines.
|
||||
|
||||
The driver can access PCI config space registers at any time.
|
||||
(Well, almost. When running BIST, config space can go away...but
|
||||
@ -252,26 +193,29 @@ that will just result in a PCI Bus Master Abort and config reads
|
||||
will return garbage).
|
||||
|
||||
|
||||
3.1 Enable the PCI device
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Enable the PCI device
|
||||
---------------------
|
||||
Before touching any device registers, the driver needs to enable
|
||||
the PCI device by calling pci_enable_device(). This will:
|
||||
o wake up the device if it was in suspended state,
|
||||
o allocate I/O and memory regions of the device (if BIOS did not),
|
||||
o allocate an IRQ (if BIOS did not).
|
||||
|
||||
NOTE: pci_enable_device() can fail! Check the return value.
|
||||
- wake up the device if it was in suspended state,
|
||||
- allocate I/O and memory regions of the device (if BIOS did not),
|
||||
- allocate an IRQ (if BIOS did not).
|
||||
|
||||
[ OS BUG: we don't check resource allocations before enabling those
|
||||
resources. The sequence would make more sense if we called
|
||||
pci_request_resources() before calling pci_enable_device().
|
||||
Currently, the device drivers can't detect the bug when when two
|
||||
devices have been allocated the same range. This is not a common
|
||||
problem and unlikely to get fixed soon.
|
||||
.. note::
|
||||
pci_enable_device() can fail! Check the return value.
|
||||
|
||||
.. warning::
|
||||
OS BUG: we don't check resource allocations before enabling those
|
||||
resources. The sequence would make more sense if we called
|
||||
pci_request_resources() before calling pci_enable_device().
|
||||
Currently, the device drivers can't detect the bug when when two
|
||||
devices have been allocated the same range. This is not a common
|
||||
problem and unlikely to get fixed soon.
|
||||
|
||||
This has been discussed before but not changed as of 2.6.19:
|
||||
http://lkml.org/lkml/2006/3/2/194
|
||||
|
||||
This has been discussed before but not changed as of 2.6.19:
|
||||
http://lkml.org/lkml/2006/3/2/194
|
||||
]
|
||||
|
||||
pci_set_master() will enable DMA by setting the bus master bit
|
||||
in the PCI_COMMAND register. It also fixes the latency timer value if
|
||||
@ -288,8 +232,8 @@ pci_try_set_mwi() to have the system do its best effort at enabling
|
||||
Mem-Wr-Inval.
|
||||
|
||||
|
||||
3.2 Request MMIO/IOP resources
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Request MMIO/IOP resources
|
||||
--------------------------
|
||||
Memory (MMIO), and I/O port addresses should NOT be read directly
|
||||
from the PCI device config space. Use the values in the pci_dev structure
|
||||
as the PCI "bus address" might have been remapped to a "host physical"
|
||||
@ -304,9 +248,10 @@ Conversely, drivers should call pci_release_region() AFTER
|
||||
calling pci_disable_device().
|
||||
The idea is to prevent two devices colliding on the same address range.
|
||||
|
||||
[ See OS BUG comment above. Currently (2.6.19), The driver can only
|
||||
determine MMIO and IO Port resource availability _after_ calling
|
||||
pci_enable_device(). ]
|
||||
.. tip::
|
||||
See OS BUG comment above. Currently (2.6.19), The driver can only
|
||||
determine MMIO and IO Port resource availability _after_ calling
|
||||
pci_enable_device().
|
||||
|
||||
Generic flavors of pci_request_region() are request_mem_region()
|
||||
(for MMIO ranges) and request_region() (for IO Port ranges).
|
||||
@ -316,12 +261,13 @@ BARs.
|
||||
Also see pci_request_selected_regions() below.
|
||||
|
||||
|
||||
3.3 Set the DMA mask size
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
[ If anything below doesn't make sense, please refer to
|
||||
Documentation/DMA-API.txt. This section is just a reminder that
|
||||
drivers need to indicate DMA capabilities of the device and is not
|
||||
an authoritative source for DMA interfaces. ]
|
||||
Set the DMA mask size
|
||||
---------------------
|
||||
.. note::
|
||||
If anything below doesn't make sense, please refer to
|
||||
Documentation/DMA-API.txt. This section is just a reminder that
|
||||
drivers need to indicate DMA capabilities of the device and is not
|
||||
an authoritative source for DMA interfaces.
|
||||
|
||||
While all drivers should explicitly indicate the DMA capability
|
||||
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
|
||||
@ -342,23 +288,23 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
|
||||
("consistent") data.
|
||||
|
||||
|
||||
3.4 Setup shared control data
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Setup shared control data
|
||||
-------------------------
|
||||
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
|
||||
memory. See Documentation/DMA-API.txt for a full description of
|
||||
the DMA APIs. This section is just a reminder that it needs to be done
|
||||
before enabling DMA on the device.
|
||||
|
||||
|
||||
3.5 Initialize device registers
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Initialize device registers
|
||||
---------------------------
|
||||
Some drivers will need specific "capability" fields programmed
|
||||
or other "vendor specific" register initialized or reset.
|
||||
E.g. clearing pending interrupts.
|
||||
|
||||
|
||||
3.6 Register IRQ handler
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Register IRQ handler
|
||||
--------------------
|
||||
While calling request_irq() is the last step described here,
|
||||
this is often just another intermediate step to initialize a device.
|
||||
This step can often be deferred until the device is opened for use.
|
||||
@ -396,6 +342,7 @@ and msix_enabled flags in the pci_dev structure after calling
|
||||
pci_alloc_irq_vectors.
|
||||
|
||||
There are (at least) two really good reasons for using MSI:
|
||||
|
||||
1) MSI is an exclusive interrupt vector by definition.
|
||||
This means the interrupt handler doesn't have to verify
|
||||
its device caused the interrupt.
|
||||
@ -410,24 +357,23 @@ See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples
|
||||
of MSI/MSI-X usage.
|
||||
|
||||
|
||||
|
||||
4. PCI device shutdown
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
PCI device shutdown
|
||||
===================
|
||||
|
||||
When a PCI device driver is being unloaded, most of the following
|
||||
steps need to be performed:
|
||||
|
||||
Disable the device from generating IRQs
|
||||
Release the IRQ (free_irq())
|
||||
Stop all DMA activity
|
||||
Release DMA buffers (both streaming and consistent)
|
||||
Unregister from other subsystems (e.g. scsi or netdev)
|
||||
Disable device from responding to MMIO/IO Port addresses
|
||||
Release MMIO/IO Port resource(s)
|
||||
- Disable the device from generating IRQs
|
||||
- Release the IRQ (free_irq())
|
||||
- Stop all DMA activity
|
||||
- Release DMA buffers (both streaming and consistent)
|
||||
- Unregister from other subsystems (e.g. scsi or netdev)
|
||||
- Disable device from responding to MMIO/IO Port addresses
|
||||
- Release MMIO/IO Port resource(s)
|
||||
|
||||
|
||||
4.1 Stop IRQs on the device
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Stop IRQs on the device
|
||||
-----------------------
|
||||
How to do this is chip/device specific. If it's not done, it opens
|
||||
the possibility of a "screaming interrupt" if (and only if)
|
||||
the IRQ is shared with another device.
|
||||
@ -446,16 +392,16 @@ MSI and MSI-X are defined to be exclusive interrupts and thus
|
||||
are not susceptible to the "screaming interrupt" problem.
|
||||
|
||||
|
||||
4.2 Release the IRQ
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
Release the IRQ
|
||||
---------------
|
||||
Once the device is quiesced (no more IRQs), one can call free_irq().
|
||||
This function will return control once any pending IRQs are handled,
|
||||
"unhook" the drivers IRQ handler from that IRQ, and finally release
|
||||
the IRQ if no one else is using it.
|
||||
|
||||
|
||||
4.3 Stop all DMA activity
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Stop all DMA activity
|
||||
---------------------
|
||||
It's extremely important to stop all DMA operations BEFORE attempting
|
||||
to deallocate DMA control data. Failure to do so can result in memory
|
||||
corruption, hangs, and on some chip-sets a hard crash.
|
||||
@ -467,8 +413,8 @@ While this step sounds obvious and trivial, several "mature" drivers
|
||||
didn't get this step right in the past.
|
||||
|
||||
|
||||
4.4 Release DMA buffers
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Release DMA buffers
|
||||
-------------------
|
||||
Once DMA is stopped, clean up streaming DMA first.
|
||||
I.e. unmap data buffers and return buffers to "upstream"
|
||||
owners if there is one.
|
||||
@ -478,8 +424,8 @@ Then clean up "consistent" buffers which contain the control data.
|
||||
See Documentation/DMA-API.txt for details on unmapping interfaces.
|
||||
|
||||
|
||||
4.5 Unregister from other subsystems
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Unregister from other subsystems
|
||||
--------------------------------
|
||||
Most low level PCI device drivers support some other subsystem
|
||||
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
|
||||
driver isn't losing resources from that other subsystem.
|
||||
@ -487,31 +433,30 @@ If this happens, typically the symptom is an Oops (panic) when
|
||||
the subsystem attempts to call into a driver that has been unloaded.
|
||||
|
||||
|
||||
4.6 Disable Device from responding to MMIO/IO Port addresses
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Disable Device from responding to MMIO/IO Port addresses
|
||||
--------------------------------------------------------
|
||||
io_unmap() MMIO or IO Port resources and then call pci_disable_device().
|
||||
This is the symmetric opposite of pci_enable_device().
|
||||
Do not access device registers after calling pci_disable_device().
|
||||
|
||||
|
||||
4.7 Release MMIO/IO Port Resource(s)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Release MMIO/IO Port Resource(s)
|
||||
--------------------------------
|
||||
Call pci_release_region() to mark the MMIO or IO Port range as available.
|
||||
Failure to do so usually results in the inability to reload the driver.
|
||||
|
||||
|
||||
How to access PCI config space
|
||||
==============================
|
||||
|
||||
5. How to access PCI config space
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
You can use pci_(read|write)_config_(byte|word|dword) to access the config
|
||||
space of a device represented by struct pci_dev *. All these functions return 0
|
||||
when successful or an error code (PCIBIOS_...) which can be translated to a text
|
||||
string by pcibios_strerror. Most drivers expect that accesses to valid PCI
|
||||
You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
|
||||
space of a device represented by `struct pci_dev *`. All these functions return
|
||||
0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
|
||||
text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
|
||||
devices don't fail.
|
||||
|
||||
If you don't have a struct pci_dev available, you can call
|
||||
pci_bus_(read|write)_config_(byte|word|dword) to access a given device
|
||||
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
|
||||
and function on that bus.
|
||||
|
||||
If you access fields in the standard portion of the config header, please
|
||||
@ -522,10 +467,10 @@ pci_find_capability() for the particular capability and it will find the
|
||||
corresponding register block for you.
|
||||
|
||||
|
||||
Other interesting functions
|
||||
===========================
|
||||
|
||||
6. Other interesting functions
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
============================= ================================================
|
||||
pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
|
||||
bus and slot and number. If the device is
|
||||
found, its reference count is increased.
|
||||
@ -539,11 +484,11 @@ pci_set_drvdata() Set private driver data pointer for a pci_dev
|
||||
pci_get_drvdata() Return private driver data pointer for a pci_dev
|
||||
pci_set_mwi() Enable Memory-Write-Invalidate transactions.
|
||||
pci_clear_mwi() Disable Memory-Write-Invalidate transactions.
|
||||
============================= ================================================
|
||||
|
||||
|
||||
|
||||
7. Miscellaneous hints
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
Miscellaneous hints
|
||||
===================
|
||||
|
||||
When displaying PCI device names to the user (for example when a driver wants
|
||||
to tell the user what card has it found), please use pci_name(pci_dev).
|
||||
@ -559,9 +504,8 @@ on the bus need to be capable of doing it, so this is something which needs
|
||||
to be handled by platform and generic code, not individual drivers.
|
||||
|
||||
|
||||
|
||||
8. Vendor and device identifications
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Vendor and device identifications
|
||||
=================================
|
||||
|
||||
Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
|
||||
are shared across multiple drivers. You can add private definitions in
|
||||
@ -575,28 +519,27 @@ There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
|
||||
and https://github.com/pciutils/pciids.
|
||||
|
||||
|
||||
|
||||
9. Obsolete functions
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
Obsolete functions
|
||||
==================
|
||||
|
||||
There are several functions which you might come across when trying to
|
||||
port an old driver to the new PCI interface. They are no longer present
|
||||
in the kernel as they aren't compatible with hotplug or PCI domains or
|
||||
having sane locking.
|
||||
|
||||
================= ===========================================
|
||||
pci_find_device() Superseded by pci_get_device()
|
||||
pci_find_subsys() Superseded by pci_get_subsys()
|
||||
pci_find_slot() Superseded by pci_get_domain_bus_and_slot()
|
||||
pci_get_slot() Superseded by pci_get_domain_bus_and_slot()
|
||||
|
||||
================= ===========================================
|
||||
|
||||
The alternative is the traditional PCI device driver that walks PCI
|
||||
device lists. This is still possible but discouraged.
|
||||
|
||||
|
||||
|
||||
10. MMIO Space and "Write Posting"
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
MMIO Space and "Write Posting"
|
||||
==============================
|
||||
|
||||
Converting a driver from using I/O Port space to using MMIO space
|
||||
often requires some additional changes. Specifically, "write posting"
|
||||
@ -609,14 +552,14 @@ the CPU before the transaction has reached its destination.
|
||||
|
||||
Thus, timing sensitive code should add readl() where the CPU is
|
||||
expected to wait before doing other work. The classic "bit banging"
|
||||
sequence works fine for I/O Port space:
|
||||
sequence works fine for I/O Port space::
|
||||
|
||||
for (i = 8; --i; val >>= 1) {
|
||||
outb(val & 1, ioport_reg); /* write bit */
|
||||
udelay(10);
|
||||
}
|
||||
|
||||
The same sequence for MMIO space should be:
|
||||
The same sequence for MMIO space should be::
|
||||
|
||||
for (i = 8; --i; val >>= 1) {
|
||||
writeb(val & 1, mmio_reg); /* write bit */
|
||||
@ -633,4 +576,3 @@ handle the PCI master abort on all platforms if the PCI device is
|
||||
expected to not respond to a readl(). Most x86 platforms will allow
|
||||
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
|
||||
(e.g. ~0). But many RISC platforms will crash (a.k.a."Hard Fail").
|
||||
|
@ -1,21 +1,29 @@
|
||||
The PCI Express Advanced Error Reporting Driver Guide HOWTO
|
||||
T. Long Nguyen <tom.l.nguyen@intel.com>
|
||||
Yanmin Zhang <yanmin.zhang@intel.com>
|
||||
07/29/2006
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
===========================================================
|
||||
The PCI Express Advanced Error Reporting Driver Guide HOWTO
|
||||
===========================================================
|
||||
|
||||
1. Overview
|
||||
:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com>
|
||||
- Yanmin Zhang <yanmin.zhang@intel.com>
|
||||
|
||||
1.1 About this guide
|
||||
:Copyright: |copy| 2006 Intel Corporation
|
||||
|
||||
Overview
|
||||
===========
|
||||
|
||||
About this guide
|
||||
----------------
|
||||
|
||||
This guide describes the basics of the PCI Express Advanced Error
|
||||
Reporting (AER) driver and provides information on how to use it, as
|
||||
well as how to enable the drivers of endpoint devices to conform with
|
||||
PCI Express AER driver.
|
||||
|
||||
1.2 Copyright (C) Intel Corporation 2006.
|
||||
|
||||
1.3 What is the PCI Express AER Driver?
|
||||
What is the PCI Express AER Driver?
|
||||
-----------------------------------
|
||||
|
||||
PCI Express error signaling can occur on the PCI Express link itself
|
||||
or on behalf of transactions initiated on the link. PCI Express
|
||||
@ -30,17 +38,19 @@ The PCI Express AER driver provides the infrastructure to support PCI
|
||||
Express Advanced Error Reporting capability. The PCI Express AER
|
||||
driver provides three basic functions:
|
||||
|
||||
- Gathers the comprehensive error information if errors occurred.
|
||||
- Reports error to the users.
|
||||
- Performs error recovery actions.
|
||||
- Gathers the comprehensive error information if errors occurred.
|
||||
- Reports error to the users.
|
||||
- Performs error recovery actions.
|
||||
|
||||
AER driver only attaches root ports which support PCI-Express AER
|
||||
capability.
|
||||
|
||||
|
||||
2. User Guide
|
||||
User Guide
|
||||
==========
|
||||
|
||||
2.1 Include the PCI Express AER Root Driver into the Linux Kernel
|
||||
Include the PCI Express AER Root Driver into the Linux Kernel
|
||||
-------------------------------------------------------------
|
||||
|
||||
The PCI Express AER Root driver is a Root Port service driver attached
|
||||
to the PCI Express Port Bus driver. If a user wants to use it, the driver
|
||||
@ -48,7 +58,8 @@ has to be compiled. Option CONFIG_PCIEAER supports this capability. It
|
||||
depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and
|
||||
CONFIG_PCIEAER = y.
|
||||
|
||||
2.2 Load PCI Express AER Root Driver
|
||||
Load PCI Express AER Root Driver
|
||||
--------------------------------
|
||||
|
||||
Some systems have AER support in firmware. Enabling Linux AER support at
|
||||
the same time the firmware handles AER may result in unpredictable
|
||||
@ -56,30 +67,34 @@ behavior. Therefore, Linux does not handle AER events unless the firmware
|
||||
grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
|
||||
Specification for details regarding _OSC usage.
|
||||
|
||||
2.3 AER error output
|
||||
AER error output
|
||||
----------------
|
||||
|
||||
When a PCIe AER error is captured, an error message will be output to
|
||||
console. If it's a correctable error, it is output as a warning.
|
||||
Otherwise, it is printed as an error. So users could choose different
|
||||
log level to filter out correctable error messages.
|
||||
|
||||
Below shows an example:
|
||||
0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
|
||||
0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
|
||||
0000:50:00.0: [20] Unsupported Request (First)
|
||||
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
|
||||
Below shows an example::
|
||||
|
||||
0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
|
||||
0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
|
||||
0000:50:00.0: [20] Unsupported Request (First)
|
||||
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
|
||||
|
||||
In the example, 'Requester ID' means the ID of the device who sends
|
||||
the error message to root port. Pls. refer to pci express specs for
|
||||
other fields.
|
||||
|
||||
2.4 AER Statistics / Counters
|
||||
AER Statistics / Counters
|
||||
-------------------------
|
||||
|
||||
When PCIe AER errors are captured, the counters / statistics are also exposed
|
||||
in the form of sysfs attributes which are documented at
|
||||
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
|
||||
|
||||
3. Developer Guide
|
||||
Developer Guide
|
||||
===============
|
||||
|
||||
To enable AER aware support requires a software driver to configure
|
||||
the AER capability structure within its device and to provide callbacks.
|
||||
@ -120,7 +135,8 @@ hierarchy and links. These errors do not include any device specific
|
||||
errors because device specific errors will still get sent directly to
|
||||
the device driver.
|
||||
|
||||
3.1 Configure the AER capability structure
|
||||
Configure the AER capability structure
|
||||
--------------------------------------
|
||||
|
||||
AER aware drivers of PCI Express component need change the device
|
||||
control registers to enable AER. They also could change AER registers,
|
||||
@ -128,9 +144,11 @@ including mask and severity registers. Helper function
|
||||
pci_enable_pcie_error_reporting could be used to enable AER. See
|
||||
section 3.3.
|
||||
|
||||
3.2. Provide callbacks
|
||||
Provide callbacks
|
||||
-----------------
|
||||
|
||||
3.2.1 callback reset_link to reset pci express link
|
||||
callback reset_link to reset pci express link
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This callback is used to reset the pci express physical link when a
|
||||
fatal error happens. The root port aer service driver provides a
|
||||
@ -140,13 +158,15 @@ upstream ports should provide their own reset_link functions.
|
||||
|
||||
In struct pcie_port_service_driver, a new pointer, reset_link, is
|
||||
added.
|
||||
::
|
||||
|
||||
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
|
||||
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
|
||||
|
||||
Section 3.2.2.2 provides more detailed info on when to call
|
||||
reset_link.
|
||||
|
||||
3.2.2 PCI error-recovery callbacks
|
||||
PCI error-recovery callbacks
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The PCI Express AER Root driver uses error callbacks to coordinate
|
||||
with downstream device drivers associated with a hierarchy in question
|
||||
@ -161,7 +181,8 @@ definitions of the callbacks.
|
||||
|
||||
Below sections specify when to call the error callback functions.
|
||||
|
||||
3.2.2.1 Correctable errors
|
||||
Correctable errors
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Correctable errors pose no impacts on the functionality of
|
||||
the interface. The PCI Express protocol can recover without any
|
||||
@ -169,13 +190,16 @@ software intervention or any loss of data. These errors do not
|
||||
require any recovery actions. The AER driver clears the device's
|
||||
correctable error status register accordingly and logs these errors.
|
||||
|
||||
3.2.2.2 Non-correctable (non-fatal and fatal) errors
|
||||
Non-correctable (non-fatal and fatal) errors
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
If an error message indicates a non-fatal error, performing link reset
|
||||
at upstream is not required. The AER driver calls error_detected(dev,
|
||||
pci_channel_io_normal) to all drivers associated within a hierarchy in
|
||||
question. for example,
|
||||
EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort.
|
||||
question. for example::
|
||||
|
||||
EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort
|
||||
|
||||
If Upstream port A captures an AER error, the hierarchy consists of
|
||||
Downstream port B and EndPoint.
|
||||
|
||||
@ -199,53 +223,72 @@ function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
|
||||
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
|
||||
to mmio_enabled.
|
||||
|
||||
3.3 helper functions
|
||||
helper functions
|
||||
----------------
|
||||
::
|
||||
|
||||
int pci_enable_pcie_error_reporting(struct pci_dev *dev);
|
||||
|
||||
3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
|
||||
pci_enable_pcie_error_reporting enables the device to send error
|
||||
messages to root port when an error is detected. Note that devices
|
||||
don't enable the error reporting by default, so device drivers need
|
||||
call this function to enable it.
|
||||
|
||||
3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
|
||||
::
|
||||
|
||||
int pci_disable_pcie_error_reporting(struct pci_dev *dev);
|
||||
|
||||
pci_disable_pcie_error_reporting disables the device to send error
|
||||
messages to root port when an error is detected.
|
||||
|
||||
3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
|
||||
::
|
||||
|
||||
int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);`
|
||||
|
||||
pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
|
||||
error status register.
|
||||
|
||||
3.4 Frequent Asked Questions
|
||||
Frequent Asked Questions
|
||||
------------------------
|
||||
|
||||
Q: What happens if a PCI Express device driver does not provide an
|
||||
error recovery handler (pci_driver->err_handler is equal to NULL)?
|
||||
Q:
|
||||
What happens if a PCI Express device driver does not provide an
|
||||
error recovery handler (pci_driver->err_handler is equal to NULL)?
|
||||
|
||||
A: The devices attached with the driver won't be recovered. If the
|
||||
error is fatal, kernel will print out warning messages. Please refer
|
||||
to section 3 for more information.
|
||||
A:
|
||||
The devices attached with the driver won't be recovered. If the
|
||||
error is fatal, kernel will print out warning messages. Please refer
|
||||
to section 3 for more information.
|
||||
|
||||
Q: What happens if an upstream port service driver does not provide
|
||||
callback reset_link?
|
||||
Q:
|
||||
What happens if an upstream port service driver does not provide
|
||||
callback reset_link?
|
||||
|
||||
A: Fatal error recovery will fail if the errors are reported by the
|
||||
upstream ports who are attached by the service driver.
|
||||
A:
|
||||
Fatal error recovery will fail if the errors are reported by the
|
||||
upstream ports who are attached by the service driver.
|
||||
|
||||
Q: How does this infrastructure deal with driver that is not PCI
|
||||
Express aware?
|
||||
Q:
|
||||
How does this infrastructure deal with driver that is not PCI
|
||||
Express aware?
|
||||
|
||||
A: This infrastructure calls the error callback functions of the
|
||||
driver when an error happens. But if the driver is not aware of
|
||||
PCI Express, the device might not report its own errors to root
|
||||
port.
|
||||
A:
|
||||
This infrastructure calls the error callback functions of the
|
||||
driver when an error happens. But if the driver is not aware of
|
||||
PCI Express, the device might not report its own errors to root
|
||||
port.
|
||||
|
||||
Q: What modifications will that driver need to make it compatible
|
||||
with the PCI Express AER Root driver?
|
||||
Q:
|
||||
What modifications will that driver need to make it compatible
|
||||
with the PCI Express AER Root driver?
|
||||
|
||||
A: It could call the helper functions to enable AER in devices and
|
||||
cleanup uncorrectable status register. Pls. refer to section 3.3.
|
||||
A:
|
||||
It could call the helper functions to enable AER in devices and
|
||||
cleanup uncorrectable status register. Pls. refer to section 3.3.
|
||||
|
||||
|
||||
4. Software error injection
|
||||
Software error injection
|
||||
========================
|
||||
|
||||
Debugging PCIe AER error recovery code is quite difficult because it
|
||||
is hard to trigger real hardware errors. Software based error
|
||||
@ -261,6 +304,7 @@ After reboot with new kernel or insert the module, a device file named
|
||||
|
||||
Then, you need a user space tool named aer-inject, which can be gotten
|
||||
from:
|
||||
|
||||
https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
|
||||
|
||||
More information about aer-inject can be found in the document comes
|
@ -1,16 +1,23 @@
|
||||
The PCI Express Port Bus Driver Guide HOWTO
|
||||
Tom L Nguyen tom.l.nguyen@intel.com
|
||||
11/03/2004
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
1. About this guide
|
||||
===========================================
|
||||
The PCI Express Port Bus Driver Guide HOWTO
|
||||
===========================================
|
||||
|
||||
:Author: Tom L Nguyen tom.l.nguyen@intel.com 11/03/2004
|
||||
:Copyright: |copy| 2004 Intel Corporation
|
||||
|
||||
About this guide
|
||||
================
|
||||
|
||||
This guide describes the basics of the PCI Express Port Bus driver
|
||||
and provides information on how to enable the service drivers to
|
||||
register/unregister with the PCI Express Port Bus Driver.
|
||||
|
||||
2. Copyright 2004 Intel Corporation
|
||||
|
||||
3. What is the PCI Express Port Bus Driver
|
||||
What is the PCI Express Port Bus Driver
|
||||
=======================================
|
||||
|
||||
A PCI Express Port is a logical PCI-PCI Bridge structure. There
|
||||
are two types of PCI Express Port: the Root Port and the Switch
|
||||
@ -30,7 +37,8 @@ support (AER), and virtual channel support (VC). These services may
|
||||
be handled by a single complex driver or be individually distributed
|
||||
and handled by corresponding service drivers.
|
||||
|
||||
4. Why use the PCI Express Port Bus Driver?
|
||||
Why use the PCI Express Port Bus Driver?
|
||||
========================================
|
||||
|
||||
In existing Linux kernels, the Linux Device Driver Model allows a
|
||||
physical device to be handled by only a single driver. The PCI
|
||||
@ -51,28 +59,31 @@ PCI Express Ports and distributes all provided service requests
|
||||
to the corresponding service drivers as required. Some key
|
||||
advantages of using the PCI Express Port Bus driver are listed below:
|
||||
|
||||
- Allow multiple service drivers to run simultaneously on
|
||||
a PCI-PCI Bridge Port device.
|
||||
- Allow multiple service drivers to run simultaneously on
|
||||
a PCI-PCI Bridge Port device.
|
||||
|
||||
- Allow service drivers implemented in an independent
|
||||
staged approach.
|
||||
- Allow service drivers implemented in an independent
|
||||
staged approach.
|
||||
|
||||
- Allow one service driver to run on multiple PCI-PCI Bridge
|
||||
Port devices.
|
||||
- Allow one service driver to run on multiple PCI-PCI Bridge
|
||||
Port devices.
|
||||
|
||||
- Manage and distribute resources of a PCI-PCI Bridge Port
|
||||
device to requested service drivers.
|
||||
- Manage and distribute resources of a PCI-PCI Bridge Port
|
||||
device to requested service drivers.
|
||||
|
||||
5. Configuring the PCI Express Port Bus Driver vs. Service Drivers
|
||||
Configuring the PCI Express Port Bus Driver vs. Service Drivers
|
||||
===============================================================
|
||||
|
||||
5.1 Including the PCI Express Port Bus Driver Support into the Kernel
|
||||
Including the PCI Express Port Bus Driver Support into the Kernel
|
||||
-----------------------------------------------------------------
|
||||
|
||||
Including the PCI Express Port Bus driver depends on whether the PCI
|
||||
Express support is included in the kernel config. The kernel will
|
||||
automatically include the PCI Express Port Bus driver as a kernel
|
||||
driver when the PCI Express support is enabled in the kernel.
|
||||
|
||||
5.2 Enabling Service Driver Support
|
||||
Enabling Service Driver Support
|
||||
-------------------------------
|
||||
|
||||
PCI device drivers are implemented based on Linux Device Driver Model.
|
||||
All service drivers are PCI device drivers. As discussed above, it is
|
||||
@ -89,9 +100,11 @@ header file /include/linux/pcieport_if.h, before calling these APIs.
|
||||
Failure to do so will result an identity mismatch, which prevents
|
||||
the PCI Express Port Bus driver from loading a service driver.
|
||||
|
||||
5.2.1 pcie_port_service_register
|
||||
pcie_port_service_register
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
::
|
||||
|
||||
int pcie_port_service_register(struct pcie_port_service_driver *new)
|
||||
int pcie_port_service_register(struct pcie_port_service_driver *new)
|
||||
|
||||
This API replaces the Linux Driver Model's pci_register_driver API. A
|
||||
service driver should always calls pcie_port_service_register at
|
||||
@ -99,69 +112,76 @@ module init. Note that after service driver being loaded, calls
|
||||
such as pci_enable_device(dev) and pci_set_master(dev) are no longer
|
||||
necessary since these calls are executed by the PCI Port Bus driver.
|
||||
|
||||
5.2.2 pcie_port_service_unregister
|
||||
pcie_port_service_unregister
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
::
|
||||
|
||||
void pcie_port_service_unregister(struct pcie_port_service_driver *new)
|
||||
void pcie_port_service_unregister(struct pcie_port_service_driver *new)
|
||||
|
||||
pcie_port_service_unregister replaces the Linux Driver Model's
|
||||
pci_unregister_driver. It's always called by service driver when a
|
||||
module exits.
|
||||
|
||||
5.2.3 Sample Code
|
||||
Sample Code
|
||||
~~~~~~~~~~~
|
||||
|
||||
Below is sample service driver code to initialize the port service
|
||||
driver data structure.
|
||||
::
|
||||
|
||||
static struct pcie_port_service_id service_id[] = { {
|
||||
.vendor = PCI_ANY_ID,
|
||||
.device = PCI_ANY_ID,
|
||||
.port_type = PCIE_RC_PORT,
|
||||
.service_type = PCIE_PORT_SERVICE_AER,
|
||||
}, { /* end: all zeroes */ }
|
||||
};
|
||||
static struct pcie_port_service_id service_id[] = { {
|
||||
.vendor = PCI_ANY_ID,
|
||||
.device = PCI_ANY_ID,
|
||||
.port_type = PCIE_RC_PORT,
|
||||
.service_type = PCIE_PORT_SERVICE_AER,
|
||||
}, { /* end: all zeroes */ }
|
||||
};
|
||||
|
||||
static struct pcie_port_service_driver root_aerdrv = {
|
||||
.name = (char *)device_name,
|
||||
.id_table = &service_id[0],
|
||||
static struct pcie_port_service_driver root_aerdrv = {
|
||||
.name = (char *)device_name,
|
||||
.id_table = &service_id[0],
|
||||
|
||||
.probe = aerdrv_load,
|
||||
.remove = aerdrv_unload,
|
||||
.probe = aerdrv_load,
|
||||
.remove = aerdrv_unload,
|
||||
|
||||
.suspend = aerdrv_suspend,
|
||||
.resume = aerdrv_resume,
|
||||
};
|
||||
.suspend = aerdrv_suspend,
|
||||
.resume = aerdrv_resume,
|
||||
};
|
||||
|
||||
Below is a sample code for registering/unregistering a service
|
||||
driver.
|
||||
::
|
||||
|
||||
static int __init aerdrv_service_init(void)
|
||||
{
|
||||
int retval = 0;
|
||||
static int __init aerdrv_service_init(void)
|
||||
{
|
||||
int retval = 0;
|
||||
|
||||
retval = pcie_port_service_register(&root_aerdrv);
|
||||
if (!retval) {
|
||||
/*
|
||||
* FIX ME
|
||||
*/
|
||||
}
|
||||
return retval;
|
||||
}
|
||||
retval = pcie_port_service_register(&root_aerdrv);
|
||||
if (!retval) {
|
||||
/*
|
||||
* FIX ME
|
||||
*/
|
||||
}
|
||||
return retval;
|
||||
}
|
||||
|
||||
static void __exit aerdrv_service_exit(void)
|
||||
{
|
||||
pcie_port_service_unregister(&root_aerdrv);
|
||||
}
|
||||
static void __exit aerdrv_service_exit(void)
|
||||
{
|
||||
pcie_port_service_unregister(&root_aerdrv);
|
||||
}
|
||||
|
||||
module_init(aerdrv_service_init);
|
||||
module_exit(aerdrv_service_exit);
|
||||
module_init(aerdrv_service_init);
|
||||
module_exit(aerdrv_service_exit);
|
||||
|
||||
6. Possible Resource Conflicts
|
||||
Possible Resource Conflicts
|
||||
===========================
|
||||
|
||||
Since all service drivers of a PCI-PCI Bridge Port device are
|
||||
allowed to run simultaneously, below lists a few of possible resource
|
||||
conflicts with proposed solutions.
|
||||
|
||||
6.1 MSI and MSI-X Vector Resource
|
||||
MSI and MSI-X Vector Resource
|
||||
-----------------------------
|
||||
|
||||
Once MSI or MSI-X interrupts are enabled on a device, it stays in this
|
||||
mode until they are disabled again. Since service drivers of the same
|
||||
@ -179,7 +199,8 @@ driver. Service drivers should use (struct pcie_device*)dev->irq to
|
||||
call request_irq/free_irq. In addition, the interrupt mode is stored
|
||||
in the field interrupt_mode of struct pcie_device.
|
||||
|
||||
6.3 PCI Memory/IO Mapped Regions
|
||||
PCI Memory/IO Mapped Regions
|
||||
----------------------------
|
||||
|
||||
Service drivers for PCI Express Power Management (PME), Advanced
|
||||
Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access
|
||||
@ -188,7 +209,8 @@ registers accessed are independent of each other. This patch assumes
|
||||
that all service drivers will be well behaved and not overwrite
|
||||
other service driver's configuration settings.
|
||||
|
||||
6.4 PCI Config Registers
|
||||
PCI Config Registers
|
||||
--------------------
|
||||
|
||||
Each service driver runs its PCI config operations on its own
|
||||
capability structure except the PCI Express capability structure, in
|
@ -13,7 +13,7 @@
|
||||
For ARM64, ONLY "acpi=off", "acpi=on" or "acpi=force"
|
||||
are available
|
||||
|
||||
See also Documentation/power/runtime_pm.txt, pci=noacpi
|
||||
See also Documentation/power/runtime_pm.rst, pci=noacpi
|
||||
|
||||
acpi_apic_instance= [ACPI, IOAPIC]
|
||||
Format: <int>
|
||||
@ -223,7 +223,7 @@
|
||||
acpi_sleep= [HW,ACPI] Sleep options
|
||||
Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
|
||||
old_ordering, nonvs, sci_force_enable, nobl }
|
||||
See Documentation/power/video.txt for information on
|
||||
See Documentation/power/video.rst for information on
|
||||
s3_bios and s3_mode.
|
||||
s3_beep is for debugging; it makes the PC's speaker beep
|
||||
as soon as the kernel's real-mode entry point is called.
|
||||
@ -4108,7 +4108,7 @@
|
||||
Specify the offset from the beginning of the partition
|
||||
given by "resume=" at which the swap header is located,
|
||||
in <PAGE_SIZE> units (needed only for swap files).
|
||||
See Documentation/power/swsusp-and-swap-files.txt
|
||||
See Documentation/power/swsusp-and-swap-files.rst
|
||||
|
||||
resumedelay= [HIBERNATION] Delay (in seconds) to pause before attempting to
|
||||
read the resume files
|
||||
|
@ -95,7 +95,7 @@ flags - flags of the cpufreq driver
|
||||
|
||||
3. CPUFreq Table Generation with Operating Performance Point (OPP)
|
||||
==================================================================
|
||||
For details about OPP, see Documentation/power/opp.txt
|
||||
For details about OPP, see Documentation/power/opp.rst
|
||||
|
||||
dev_pm_opp_init_cpufreq_table -
|
||||
This function provides a ready to use conversion routine to translate
|
||||
|
@ -225,7 +225,7 @@ system-wide transition to a sleep state even though its :c:member:`runtime_auto`
|
||||
flag is clear.
|
||||
|
||||
For more information about the runtime power management framework, refer to
|
||||
:file:`Documentation/power/runtime_pm.txt`.
|
||||
:file:`Documentation/power/runtime_pm.rst`.
|
||||
|
||||
|
||||
Calling Drivers to Enter and Leave System Sleep States
|
||||
@ -728,7 +728,7 @@ it into account in any way.
|
||||
|
||||
Devices may be defined as IRQ-safe which indicates to the PM core that their
|
||||
runtime PM callbacks may be invoked with disabled interrupts (see
|
||||
:file:`Documentation/power/runtime_pm.txt` for more information). If an
|
||||
:file:`Documentation/power/runtime_pm.rst` for more information). If an
|
||||
IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be
|
||||
disallowed, unless the domain itself is defined as IRQ-safe. However, it
|
||||
makes sense to define a PM domain as IRQ-safe only if all the devices in it
|
||||
@ -795,7 +795,7 @@ so on) and the final state of the device must reflect the "active" runtime PM
|
||||
status in that case.
|
||||
|
||||
During system-wide resume from a sleep state it's easiest to put devices into
|
||||
the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
|
||||
the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
|
||||
[Refer to that document for more information regarding this particular issue as
|
||||
well as for information on the device runtime power management framework in
|
||||
general.]
|
||||
|
@ -46,7 +46,7 @@ device is turned off while the system as a whole remains running, we
|
||||
call it a "dynamic suspend" (also known as a "runtime suspend" or
|
||||
"selective suspend"). This document concentrates mostly on how
|
||||
dynamic PM is implemented in the USB subsystem, although system PM is
|
||||
covered to some extent (see ``Documentation/power/*.txt`` for more
|
||||
covered to some extent (see ``Documentation/power/*.rst`` for more
|
||||
information about system PM).
|
||||
|
||||
System PM support is present only if the kernel was built with
|
||||
|
@ -101,6 +101,7 @@ needed).
|
||||
filesystems/index
|
||||
vm/index
|
||||
bpf/index
|
||||
PCI/index
|
||||
misc-devices/index
|
||||
|
||||
Architecture-specific documentation
|
||||
|
@ -1,5 +1,7 @@
|
||||
============
|
||||
APM or ACPI?
|
||||
------------
|
||||
============
|
||||
|
||||
If you have a relatively recent x86 mobile, desktop, or server system,
|
||||
odds are it supports either Advanced Power Management (APM) or
|
||||
Advanced Configuration and Power Interface (ACPI). ACPI is the newer
|
||||
@ -28,5 +30,7 @@ and be sure that they are started sometime in the system boot process.
|
||||
Go ahead and start both. If ACPI or APM is not available on your
|
||||
system the associated daemon will exit gracefully.
|
||||
|
||||
apmd: http://ftp.debian.org/pool/main/a/apmd/
|
||||
acpid: http://acpid.sf.net/
|
||||
===== =======================================
|
||||
apmd http://ftp.debian.org/pool/main/a/apmd/
|
||||
acpid http://acpid.sf.net/
|
||||
===== =======================================
|
@ -1,12 +1,16 @@
|
||||
=================================
|
||||
Debugging hibernation and suspend
|
||||
=================================
|
||||
|
||||
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
||||
|
||||
1. Testing hibernation (aka suspend to disk or STD)
|
||||
===================================================
|
||||
|
||||
To check if hibernation works, you can try to hibernate in the "reboot" mode:
|
||||
To check if hibernation works, you can try to hibernate in the "reboot" mode::
|
||||
|
||||
# echo reboot > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
# echo reboot > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
|
||||
and the system should create a hibernation image, reboot, resume and get back to
|
||||
the command prompt where you have started the transition. If that happens,
|
||||
@ -15,20 +19,21 @@ test at least a couple of times in a row for confidence. [This is necessary,
|
||||
because some problems only show up on a second attempt at suspending and
|
||||
resuming the system.] Moreover, hibernating in the "reboot" and "shutdown"
|
||||
modes causes the PM core to skip some platform-related callbacks which on ACPI
|
||||
systems might be necessary to make hibernation work. Thus, if your machine fails
|
||||
to hibernate or resume in the "reboot" mode, you should try the "platform" mode:
|
||||
systems might be necessary to make hibernation work. Thus, if your machine
|
||||
fails to hibernate or resume in the "reboot" mode, you should try the
|
||||
"platform" mode::
|
||||
|
||||
# echo platform > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
# echo platform > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
|
||||
which is the default and recommended mode of hibernation.
|
||||
|
||||
Unfortunately, the "platform" mode of hibernation does not work on some systems
|
||||
with broken BIOSes. In such cases the "shutdown" mode of hibernation might
|
||||
work:
|
||||
work::
|
||||
|
||||
# echo shutdown > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
# echo shutdown > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
|
||||
(it is similar to the "reboot" mode, but it requires you to press the power
|
||||
button to make the system resume).
|
||||
@ -37,6 +42,7 @@ If neither "platform" nor "shutdown" hibernation mode works, you will need to
|
||||
identify what goes wrong.
|
||||
|
||||
a) Test modes of hibernation
|
||||
----------------------------
|
||||
|
||||
To find out why hibernation fails on your system, you can use a special testing
|
||||
facility available if the kernel is compiled with CONFIG_PM_DEBUG set. Then,
|
||||
@ -44,36 +50,38 @@ there is the file /sys/power/pm_test that can be used to make the hibernation
|
||||
core run in a test mode. There are 5 test modes available:
|
||||
|
||||
freezer
|
||||
- test the freezing of processes
|
||||
- test the freezing of processes
|
||||
|
||||
devices
|
||||
- test the freezing of processes and suspending of devices
|
||||
- test the freezing of processes and suspending of devices
|
||||
|
||||
platform
|
||||
- test the freezing of processes, suspending of devices and platform
|
||||
global control methods(*)
|
||||
- test the freezing of processes, suspending of devices and platform
|
||||
global control methods [1]_
|
||||
|
||||
processors
|
||||
- test the freezing of processes, suspending of devices, platform
|
||||
global control methods(*) and the disabling of nonboot CPUs
|
||||
- test the freezing of processes, suspending of devices, platform
|
||||
global control methods [1]_ and the disabling of nonboot CPUs
|
||||
|
||||
core
|
||||
- test the freezing of processes, suspending of devices, platform global
|
||||
control methods(*), the disabling of nonboot CPUs and suspending of
|
||||
platform/system devices
|
||||
- test the freezing of processes, suspending of devices, platform global
|
||||
control methods\ [1]_, the disabling of nonboot CPUs and suspending
|
||||
of platform/system devices
|
||||
|
||||
(*) the platform global control methods are only available on ACPI systems
|
||||
.. [1]
|
||||
|
||||
the platform global control methods are only available on ACPI systems
|
||||
and are only tested if the hibernation mode is set to "platform"
|
||||
|
||||
To use one of them it is necessary to write the corresponding string to
|
||||
/sys/power/pm_test (eg. "devices" to test the freezing of processes and
|
||||
suspending devices) and issue the standard hibernation commands. For example,
|
||||
to use the "devices" test mode along with the "platform" mode of hibernation,
|
||||
you should do the following:
|
||||
you should do the following::
|
||||
|
||||
# echo devices > /sys/power/pm_test
|
||||
# echo platform > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
# echo devices > /sys/power/pm_test
|
||||
# echo platform > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
|
||||
Then, the kernel will try to freeze processes, suspend devices, wait a few
|
||||
seconds (5 by default, but configurable by the suspend.pm_test_delay module
|
||||
@ -108,11 +116,12 @@ If the "devices" test fails, most likely there is a driver that cannot suspend
|
||||
or resume its device (in the latter case the system may hang or become unstable
|
||||
after the test, so please take that into consideration). To find this driver,
|
||||
you can carry out a binary search according to the rules:
|
||||
|
||||
- if the test fails, unload a half of the drivers currently loaded and repeat
|
||||
(that would probably involve rebooting the system, so always note what drivers
|
||||
have been loaded before the test),
|
||||
(that would probably involve rebooting the system, so always note what drivers
|
||||
have been loaded before the test),
|
||||
- if the test succeeds, load a half of the drivers you have unloaded most
|
||||
recently and repeat.
|
||||
recently and repeat.
|
||||
|
||||
Once you have found the failing driver (there can be more than just one of
|
||||
them), you have to unload it every time before hibernation. In that case please
|
||||
@ -146,6 +155,7 @@ indicates a serious problem that very well may be related to the hardware, but
|
||||
please report it anyway.
|
||||
|
||||
b) Testing minimal configuration
|
||||
--------------------------------
|
||||
|
||||
If all of the hibernation test modes work, you can boot the system with the
|
||||
"init=/bin/bash" command line parameter and attempt to hibernate in the
|
||||
@ -165,14 +175,15 @@ Again, if you find the offending module(s), it(they) must be unloaded every time
|
||||
before hibernation, and please report the problem with it(them).
|
||||
|
||||
c) Using the "test_resume" hibernation option
|
||||
---------------------------------------------
|
||||
|
||||
/sys/power/disk generally tells the kernel what to do after creating a
|
||||
hibernation image. One of the available options is "test_resume" which
|
||||
causes the just created image to be used for immediate restoration. Namely,
|
||||
after doing:
|
||||
after doing::
|
||||
|
||||
# echo test_resume > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
# echo test_resume > /sys/power/disk
|
||||
# echo disk > /sys/power/state
|
||||
|
||||
a hibernation image will be created and a resume from it will be triggered
|
||||
immediately without involving the platform firmware in any way.
|
||||
@ -190,6 +201,7 @@ to resume may be related to the differences between the restore and image
|
||||
kernels.
|
||||
|
||||
d) Advanced debugging
|
||||
---------------------
|
||||
|
||||
In case that hibernation does not work on your system even in the minimal
|
||||
configuration and compiling more drivers as modules is not practical or some
|
||||
@ -200,9 +212,10 @@ kernel messages using the serial console. This may provide you with some
|
||||
information about the reasons of the suspend (resume) failure. Alternatively,
|
||||
it may be possible to use a FireWire port for debugging with firescope
|
||||
(http://v3.sk/~lkundrak/firescope/). On x86 it is also possible to
|
||||
use the PM_TRACE mechanism documented in Documentation/power/s2ram.txt .
|
||||
use the PM_TRACE mechanism documented in Documentation/power/s2ram.rst .
|
||||
|
||||
2. Testing suspend to RAM (STR)
|
||||
===============================
|
||||
|
||||
To verify that the STR works, it is generally more convenient to use the s2ram
|
||||
tool available from http://suspend.sf.net and documented at
|
||||
@ -230,7 +243,8 @@ you will have to unload them every time before an STR transition (ie. before
|
||||
you run s2ram), and please report the problems with them.
|
||||
|
||||
There is a debugfs entry which shows the suspend to RAM statistics. Here is an
|
||||
example of its output.
|
||||
example of its output::
|
||||
|
||||
# mount -t debugfs none /sys/kernel/debug
|
||||
# cat /sys/kernel/debug/suspend_stats
|
||||
success: 20
|
||||
@ -248,6 +262,7 @@ example of its output.
|
||||
-16
|
||||
last_failed_step: suspend
|
||||
suspend
|
||||
|
||||
Field success means the success number of suspend to RAM, and field fail means
|
||||
the failure number. Others are the failure number of different steps of suspend
|
||||
to RAM. suspend_stats just lists the last 2 failed devices, error number and
|
@ -1,4 +1,7 @@
|
||||
===============
|
||||
Charger Manager
|
||||
===============
|
||||
|
||||
(C) 2011 MyungJoo Ham <myungjoo.ham@samsung.com>, GPL
|
||||
|
||||
Charger Manager provides in-kernel battery charger management that
|
||||
@ -55,41 +58,39 @@ Charger Manager supports the following:
|
||||
notification to users with UEVENT.
|
||||
|
||||
2. Global Charger-Manager Data related with suspend_again
|
||||
========================================================
|
||||
=========================================================
|
||||
In order to setup Charger Manager with suspend-again feature
|
||||
(in-suspend monitoring), the user should provide charger_global_desc
|
||||
with setup_charger_manager(struct charger_global_desc *).
|
||||
with setup_charger_manager(`struct charger_global_desc *`).
|
||||
This charger_global_desc data for in-suspend monitoring is global
|
||||
as the name suggests. Thus, the user needs to provide only once even
|
||||
if there are multiple batteries. If there are multiple batteries, the
|
||||
multiple instances of Charger Manager share the same charger_global_desc
|
||||
and it will manage in-suspend monitoring for all instances of Charger Manager.
|
||||
|
||||
The user needs to provide all the three entries properly in order to activate
|
||||
in-suspend monitoring:
|
||||
The user needs to provide all the three entries to `struct charger_global_desc`
|
||||
properly in order to activate in-suspend monitoring:
|
||||
|
||||
struct charger_global_desc {
|
||||
|
||||
char *rtc_name;
|
||||
: The name of rtc (e.g., "rtc0") used to wakeup the system from
|
||||
`char *rtc_name;`
|
||||
The name of rtc (e.g., "rtc0") used to wakeup the system from
|
||||
suspend for Charger Manager. The alarm interrupt (AIE) of the rtc
|
||||
should be able to wake up the system from suspend. Charger Manager
|
||||
saves and restores the alarm value and use the previously-defined
|
||||
alarm if it is going to go off earlier than Charger Manager so that
|
||||
Charger Manager does not interfere with previously-defined alarms.
|
||||
|
||||
bool (*rtc_only_wakeup)(void);
|
||||
: This callback should let CM know whether
|
||||
`bool (*rtc_only_wakeup)(void);`
|
||||
This callback should let CM know whether
|
||||
the wakeup-from-suspend is caused only by the alarm of "rtc" in the
|
||||
same struct. If there is any other wakeup source triggered the
|
||||
wakeup, it should return false. If the "rtc" is the only wakeup
|
||||
reason, it should return true.
|
||||
|
||||
bool assume_timer_stops_in_suspend;
|
||||
: if true, Charger Manager assumes that
|
||||
`bool assume_timer_stops_in_suspend;`
|
||||
if true, Charger Manager assumes that
|
||||
the timer (CM uses jiffies as timer) stops during suspend. Then, CM
|
||||
assumes that the suspend-duration is same as the alarm length.
|
||||
};
|
||||
|
||||
|
||||
3. How to setup suspend_again
|
||||
=============================
|
||||
@ -109,26 +110,28 @@ if the system was woken up by Charger Manager and the polling
|
||||
=============================================
|
||||
For each battery charged independently from other batteries (if a series of
|
||||
batteries are charged by a single charger, they are counted as one independent
|
||||
battery), an instance of Charger Manager is attached to it.
|
||||
battery), an instance of Charger Manager is attached to it. The following
|
||||
|
||||
struct charger_desc {
|
||||
struct charger_desc elements:
|
||||
|
||||
char *psy_name;
|
||||
: The power-supply-class name of the battery. Default is
|
||||
`char *psy_name;`
|
||||
The power-supply-class name of the battery. Default is
|
||||
"battery" if psy_name is NULL. Users can access the psy entries
|
||||
at "/sys/class/power_supply/[psy_name]/".
|
||||
|
||||
enum polling_modes polling_mode;
|
||||
: CM_POLL_DISABLE: do not poll this battery.
|
||||
CM_POLL_ALWAYS: always poll this battery.
|
||||
CM_POLL_EXTERNAL_POWER_ONLY: poll this battery if and only if
|
||||
an external power source is attached.
|
||||
CM_POLL_CHARGING_ONLY: poll this battery if and only if the
|
||||
battery is being charged.
|
||||
`enum polling_modes polling_mode;`
|
||||
CM_POLL_DISABLE:
|
||||
do not poll this battery.
|
||||
CM_POLL_ALWAYS:
|
||||
always poll this battery.
|
||||
CM_POLL_EXTERNAL_POWER_ONLY:
|
||||
poll this battery if and only if an external power
|
||||
source is attached.
|
||||
CM_POLL_CHARGING_ONLY:
|
||||
poll this battery if and only if the battery is being charged.
|
||||
|
||||
unsigned int fullbatt_vchkdrop_ms;
|
||||
unsigned int fullbatt_vchkdrop_uV;
|
||||
: If both have non-zero values, Charger Manager will check the
|
||||
`unsigned int fullbatt_vchkdrop_ms; / unsigned int fullbatt_vchkdrop_uV;`
|
||||
If both have non-zero values, Charger Manager will check the
|
||||
battery voltage drop fullbatt_vchkdrop_ms after the battery is fully
|
||||
charged. If the voltage drop is over fullbatt_vchkdrop_uV, Charger
|
||||
Manager will try to recharge the battery by disabling and enabling
|
||||
@ -136,50 +139,52 @@ unsigned int fullbatt_vchkdrop_uV;
|
||||
condition) is needed to be implemented with hardware interrupts from
|
||||
fuel gauges or charger devices/chips.
|
||||
|
||||
unsigned int fullbatt_uV;
|
||||
: If specified with a non-zero value, Charger Manager assumes
|
||||
`unsigned int fullbatt_uV;`
|
||||
If specified with a non-zero value, Charger Manager assumes
|
||||
that the battery is full (capacity = 100) if the battery is not being
|
||||
charged and the battery voltage is equal to or greater than
|
||||
fullbatt_uV.
|
||||
|
||||
unsigned int polling_interval_ms;
|
||||
: Required polling interval in ms. Charger Manager will poll
|
||||
`unsigned int polling_interval_ms;`
|
||||
Required polling interval in ms. Charger Manager will poll
|
||||
this battery every polling_interval_ms or more frequently.
|
||||
|
||||
enum data_source battery_present;
|
||||
: CM_BATTERY_PRESENT: assume that the battery exists.
|
||||
CM_NO_BATTERY: assume that the battery does not exists.
|
||||
CM_FUEL_GAUGE: get battery presence information from fuel gauge.
|
||||
CM_CHARGER_STAT: get battery presence from chargers.
|
||||
`enum data_source battery_present;`
|
||||
CM_BATTERY_PRESENT:
|
||||
assume that the battery exists.
|
||||
CM_NO_BATTERY:
|
||||
assume that the battery does not exists.
|
||||
CM_FUEL_GAUGE:
|
||||
get battery presence information from fuel gauge.
|
||||
CM_CHARGER_STAT:
|
||||
get battery presence from chargers.
|
||||
|
||||
char **psy_charger_stat;
|
||||
: An array ending with NULL that has power-supply-class names of
|
||||
`char **psy_charger_stat;`
|
||||
An array ending with NULL that has power-supply-class names of
|
||||
chargers. Each power-supply-class should provide "PRESENT" (if
|
||||
battery_present is "CM_CHARGER_STAT"), "ONLINE" (shows whether an
|
||||
external power source is attached or not), and "STATUS" (shows whether
|
||||
the battery is {"FULL" or not FULL} or {"FULL", "Charging",
|
||||
"Discharging", "NotCharging"}).
|
||||
|
||||
int num_charger_regulators;
|
||||
struct regulator_bulk_data *charger_regulators;
|
||||
: Regulators representing the chargers in the form for
|
||||
`int num_charger_regulators; / struct regulator_bulk_data *charger_regulators;`
|
||||
Regulators representing the chargers in the form for
|
||||
regulator framework's bulk functions.
|
||||
|
||||
char *psy_fuel_gauge;
|
||||
: Power-supply-class name of the fuel gauge.
|
||||
`char *psy_fuel_gauge;`
|
||||
Power-supply-class name of the fuel gauge.
|
||||
|
||||
int (*temperature_out_of_range)(int *mC);
|
||||
bool measure_battery_temp;
|
||||
: This callback returns 0 if the temperature is safe for charging,
|
||||
`int (*temperature_out_of_range)(int *mC); / bool measure_battery_temp;`
|
||||
This callback returns 0 if the temperature is safe for charging,
|
||||
a positive number if it is too hot to charge, and a negative number
|
||||
if it is too cold to charge. With the variable mC, the callback returns
|
||||
the temperature in 1/1000 of centigrade.
|
||||
The source of temperature can be battery or ambient one according to
|
||||
the value of measure_battery_temp.
|
||||
};
|
||||
|
||||
|
||||
5. Notify Charger-Manager of charger events: cm_notify_event()
|
||||
=========================================================
|
||||
==============================================================
|
||||
If there is an charger event is required to notify
|
||||
Charger Manager, a charger device driver that triggers the event can call
|
||||
cm_notify_event(psy, type, msg) to notify the corresponding Charger Manager.
|
@ -1,7 +1,11 @@
|
||||
====================================================
|
||||
Testing suspend and resume support in device drivers
|
||||
====================================================
|
||||
|
||||
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
||||
|
||||
1. Preparing the test system
|
||||
============================
|
||||
|
||||
Unfortunately, to effectively test the support for the system-wide suspend and
|
||||
resume transitions in a driver, it is necessary to suspend and resume a fully
|
||||
@ -14,19 +18,20 @@ the machine's BIOS.
|
||||
Of course, for this purpose the test system has to be known to suspend and
|
||||
resume without the driver being tested. Thus, if possible, you should first
|
||||
resolve all suspend/resume-related problems in the test system before you start
|
||||
testing the new driver. Please see Documentation/power/basic-pm-debugging.txt
|
||||
testing the new driver. Please see Documentation/power/basic-pm-debugging.rst
|
||||
for more information about the debugging of suspend/resume functionality.
|
||||
|
||||
2. Testing the driver
|
||||
=====================
|
||||
|
||||
Once you have resolved the suspend/resume-related problems with your test system
|
||||
without the new driver, you are ready to test it:
|
||||
|
||||
a) Build the driver as a module, load it and try the test modes of hibernation
|
||||
(see: Documentation/power/basic-pm-debugging.txt, 1).
|
||||
(see: Documentation/power/basic-pm-debugging.rst, 1).
|
||||
|
||||
b) Load the driver and attempt to hibernate in the "reboot", "shutdown" and
|
||||
"platform" modes (see: Documentation/power/basic-pm-debugging.txt, 1).
|
||||
"platform" modes (see: Documentation/power/basic-pm-debugging.rst, 1).
|
||||
|
||||
c) Compile the driver directly into the kernel and try the test modes of
|
||||
hibernation.
|
||||
@ -34,12 +39,12 @@ c) Compile the driver directly into the kernel and try the test modes of
|
||||
d) Attempt to hibernate with the driver compiled directly into the kernel
|
||||
in the "reboot", "shutdown" and "platform" modes.
|
||||
|
||||
e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.txt,
|
||||
e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.rst,
|
||||
2). [As far as the STR tests are concerned, it should not matter whether or
|
||||
not the driver is built as a module.]
|
||||
|
||||
f) Attempt to suspend to RAM using the s2ram tool with the driver loaded
|
||||
(see: Documentation/power/basic-pm-debugging.txt, 2).
|
||||
(see: Documentation/power/basic-pm-debugging.rst, 2).
|
||||
|
||||
Each of the above tests should be repeated several times and the STD tests
|
||||
should be mixed with the STR tests. If any of them fails, the driver cannot be
|
@ -1,6 +1,6 @@
|
||||
====================
|
||||
Energy Model of CPUs
|
||||
====================
|
||||
====================
|
||||
Energy Model of CPUs
|
||||
====================
|
||||
|
||||
1. Overview
|
||||
-----------
|
||||
@ -20,7 +20,7 @@ kernel, hence enabling to avoid redundant work.
|
||||
|
||||
The figure below depicts an example of drivers (Arm-specific here, but the
|
||||
approach is applicable to any architecture) providing power costs to the EM
|
||||
framework, and interested clients reading the data from it.
|
||||
framework, and interested clients reading the data from it::
|
||||
|
||||
+---------------+ +-----------------+ +---------------+
|
||||
| Thermal (IPA) | | Scheduler (EAS) | | Other |
|
||||
@ -58,15 +58,17 @@ micro-architectures.
|
||||
2. Core APIs
|
||||
------------
|
||||
|
||||
2.1 Config options
|
||||
2.1 Config options
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
|
||||
|
||||
|
||||
2.2 Registration of performance domains
|
||||
2.2 Registration of performance domains
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Drivers are expected to register performance domains into the EM framework by
|
||||
calling the following API:
|
||||
calling the following API::
|
||||
|
||||
int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
|
||||
struct em_data_callback *cb);
|
||||
@ -80,7 +82,8 @@ callback, and kernel/power/energy_model.c for further documentation on this
|
||||
API.
|
||||
|
||||
|
||||
2.3 Accessing performance domains
|
||||
2.3 Accessing performance domains
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Subsystems interested in the energy model of a CPU can retrieve it using the
|
||||
em_cpu_get() API. The energy model tables are allocated once upon creation of
|
||||
@ -99,46 +102,46 @@ More details about the above APIs can be found in include/linux/energy_model.h.
|
||||
This section provides a simple example of a CPUFreq driver registering a
|
||||
performance domain in the Energy Model framework using the (fake) 'foo'
|
||||
protocol. The driver implements an est_power() function to be provided to the
|
||||
EM framework.
|
||||
EM framework::
|
||||
|
||||
-> drivers/cpufreq/foo_cpufreq.c
|
||||
-> drivers/cpufreq/foo_cpufreq.c
|
||||
|
||||
01 static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
|
||||
02 {
|
||||
03 long freq, power;
|
||||
04
|
||||
05 /* Use the 'foo' protocol to ceil the frequency */
|
||||
06 freq = foo_get_freq_ceil(cpu, *KHz);
|
||||
07 if (freq < 0);
|
||||
08 return freq;
|
||||
09
|
||||
10 /* Estimate the power cost for the CPU at the relevant freq. */
|
||||
11 power = foo_estimate_power(cpu, freq);
|
||||
12 if (power < 0);
|
||||
13 return power;
|
||||
14
|
||||
15 /* Return the values to the EM framework */
|
||||
16 *mW = power;
|
||||
17 *KHz = freq;
|
||||
18
|
||||
19 return 0;
|
||||
20 }
|
||||
21
|
||||
22 static int foo_cpufreq_init(struct cpufreq_policy *policy)
|
||||
23 {
|
||||
24 struct em_data_callback em_cb = EM_DATA_CB(est_power);
|
||||
25 int nr_opp, ret;
|
||||
26
|
||||
27 /* Do the actual CPUFreq init work ... */
|
||||
28 ret = do_foo_cpufreq_init(policy);
|
||||
29 if (ret)
|
||||
30 return ret;
|
||||
31
|
||||
32 /* Find the number of OPPs for this policy */
|
||||
33 nr_opp = foo_get_nr_opp(policy);
|
||||
34
|
||||
35 /* And register the new performance domain */
|
||||
36 em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
|
||||
37
|
||||
38 return 0;
|
||||
39 }
|
||||
01 static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
|
||||
02 {
|
||||
03 long freq, power;
|
||||
04
|
||||
05 /* Use the 'foo' protocol to ceil the frequency */
|
||||
06 freq = foo_get_freq_ceil(cpu, *KHz);
|
||||
07 if (freq < 0);
|
||||
08 return freq;
|
||||
09
|
||||
10 /* Estimate the power cost for the CPU at the relevant freq. */
|
||||
11 power = foo_estimate_power(cpu, freq);
|
||||
12 if (power < 0);
|
||||
13 return power;
|
||||
14
|
||||
15 /* Return the values to the EM framework */
|
||||
16 *mW = power;
|
||||
17 *KHz = freq;
|
||||
18
|
||||
19 return 0;
|
||||
20 }
|
||||
21
|
||||
22 static int foo_cpufreq_init(struct cpufreq_policy *policy)
|
||||
23 {
|
||||
24 struct em_data_callback em_cb = EM_DATA_CB(est_power);
|
||||
25 int nr_opp, ret;
|
||||
26
|
||||
27 /* Do the actual CPUFreq init work ... */
|
||||
28 ret = do_foo_cpufreq_init(policy);
|
||||
29 if (ret)
|
||||
30 return ret;
|
||||
31
|
||||
32 /* Find the number of OPPs for this policy */
|
||||
33 nr_opp = foo_get_nr_opp(policy);
|
||||
34
|
||||
35 /* And register the new performance domain */
|
||||
36 em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
|
||||
37
|
||||
38 return 0;
|
||||
39 }
|
@ -1,13 +1,18 @@
|
||||
=================
|
||||
Freezing of tasks
|
||||
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
||||
=================
|
||||
|
||||
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
||||
|
||||
I. What is the freezing of tasks?
|
||||
=================================
|
||||
|
||||
The freezing of tasks is a mechanism by which user space processes and some
|
||||
kernel threads are controlled during hibernation or system-wide suspend (on some
|
||||
architectures).
|
||||
|
||||
II. How does it work?
|
||||
=====================
|
||||
|
||||
There are three per-task flags used for that, PF_NOFREEZE, PF_FROZEN
|
||||
and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have
|
||||
@ -41,7 +46,7 @@ explicitly in suitable places or use the wait_event_freezable() or
|
||||
wait_event_freezable_timeout() macros (defined in include/linux/freezer.h)
|
||||
that combine interruptible sleep with checking if the task is to be frozen and
|
||||
calling try_to_freeze(). The main loop of a freezable kernel thread may look
|
||||
like the following one:
|
||||
like the following one::
|
||||
|
||||
set_freezable();
|
||||
do {
|
||||
@ -65,7 +70,7 @@ order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that
|
||||
have been frozen leave __refrigerator() and continue running.
|
||||
|
||||
|
||||
Rationale behind the functions dealing with freezing and thawing of tasks:
|
||||
Rationale behind the functions dealing with freezing and thawing of tasks
|
||||
-------------------------------------------------------------------------
|
||||
|
||||
freeze_processes():
|
||||
@ -86,6 +91,7 @@ thaw_processes():
|
||||
|
||||
|
||||
III. Which kernel threads are freezable?
|
||||
========================================
|
||||
|
||||
Kernel threads are not freezable by default. However, a kernel thread may clear
|
||||
PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
|
||||
@ -93,37 +99,39 @@ directly is not allowed). From this point it is regarded as freezable
|
||||
and must call try_to_freeze() in a suitable place.
|
||||
|
||||
IV. Why do we do that?
|
||||
======================
|
||||
|
||||
Generally speaking, there is a couple of reasons to use the freezing of tasks:
|
||||
|
||||
1. The principal reason is to prevent filesystems from being damaged after
|
||||
hibernation. At the moment we have no simple means of checkpointing
|
||||
filesystems, so if there are any modifications made to filesystem data and/or
|
||||
metadata on disks, we cannot bring them back to the state from before the
|
||||
modifications. At the same time each hibernation image contains some
|
||||
filesystem-related information that must be consistent with the state of the
|
||||
on-disk data and metadata after the system memory state has been restored from
|
||||
the image (otherwise the filesystems will be damaged in a nasty way, usually
|
||||
making them almost impossible to repair). We therefore freeze tasks that might
|
||||
cause the on-disk filesystems' data and metadata to be modified after the
|
||||
hibernation image has been created and before the system is finally powered off.
|
||||
The majority of these are user space processes, but if any of the kernel threads
|
||||
may cause something like this to happen, they have to be freezable.
|
||||
hibernation. At the moment we have no simple means of checkpointing
|
||||
filesystems, so if there are any modifications made to filesystem data and/or
|
||||
metadata on disks, we cannot bring them back to the state from before the
|
||||
modifications. At the same time each hibernation image contains some
|
||||
filesystem-related information that must be consistent with the state of the
|
||||
on-disk data and metadata after the system memory state has been restored
|
||||
from the image (otherwise the filesystems will be damaged in a nasty way,
|
||||
usually making them almost impossible to repair). We therefore freeze
|
||||
tasks that might cause the on-disk filesystems' data and metadata to be
|
||||
modified after the hibernation image has been created and before the
|
||||
system is finally powered off. The majority of these are user space
|
||||
processes, but if any of the kernel threads may cause something like this
|
||||
to happen, they have to be freezable.
|
||||
|
||||
2. Next, to create the hibernation image we need to free a sufficient amount of
|
||||
memory (approximately 50% of available RAM) and we need to do that before
|
||||
devices are deactivated, because we generally need them for swapping out. Then,
|
||||
after the memory for the image has been freed, we don't want tasks to allocate
|
||||
additional memory and we prevent them from doing that by freezing them earlier.
|
||||
[Of course, this also means that device drivers should not allocate substantial
|
||||
amounts of memory from their .suspend() callbacks before hibernation, but this
|
||||
is a separate issue.]
|
||||
memory (approximately 50% of available RAM) and we need to do that before
|
||||
devices are deactivated, because we generally need them for swapping out.
|
||||
Then, after the memory for the image has been freed, we don't want tasks
|
||||
to allocate additional memory and we prevent them from doing that by
|
||||
freezing them earlier. [Of course, this also means that device drivers
|
||||
should not allocate substantial amounts of memory from their .suspend()
|
||||
callbacks before hibernation, but this is a separate issue.]
|
||||
|
||||
3. The third reason is to prevent user space processes and some kernel threads
|
||||
from interfering with the suspending and resuming of devices. A user space
|
||||
process running on a second CPU while we are suspending devices may, for
|
||||
example, be troublesome and without the freezing of tasks we would need some
|
||||
safeguards against race conditions that might occur in such a case.
|
||||
from interfering with the suspending and resuming of devices. A user space
|
||||
process running on a second CPU while we are suspending devices may, for
|
||||
example, be troublesome and without the freezing of tasks we would need some
|
||||
safeguards against race conditions that might occur in such a case.
|
||||
|
||||
Although Linus Torvalds doesn't like the freezing of tasks, he said this in one
|
||||
of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
|
||||
@ -132,7 +140,7 @@ of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
|
||||
|
||||
Linus: In many ways, 'at all'.
|
||||
|
||||
I _do_ realize the IO request queue issues, and that we cannot actually do
|
||||
I **do** realize the IO request queue issues, and that we cannot actually do
|
||||
s2ram with some devices in the middle of a DMA. So we want to be able to
|
||||
avoid *that*, there's no question about that. And I suspect that stopping
|
||||
user threads and then waiting for a sync is practically one of the easier
|
||||
@ -150,17 +158,18 @@ thawed after the driver's .resume() callback has run, so it won't be accessing
|
||||
the device while it's suspended.
|
||||
|
||||
4. Another reason for freezing tasks is to prevent user space processes from
|
||||
realizing that hibernation (or suspend) operation takes place. Ideally, user
|
||||
space processes should not notice that such a system-wide operation has occurred
|
||||
and should continue running without any problems after the restore (or resume
|
||||
from suspend). Unfortunately, in the most general case this is quite difficult
|
||||
to achieve without the freezing of tasks. Consider, for example, a process
|
||||
that depends on all CPUs being online while it's running. Since we need to
|
||||
disable nonboot CPUs during the hibernation, if this process is not frozen, it
|
||||
may notice that the number of CPUs has changed and may start to work incorrectly
|
||||
because of that.
|
||||
realizing that hibernation (or suspend) operation takes place. Ideally, user
|
||||
space processes should not notice that such a system-wide operation has
|
||||
occurred and should continue running without any problems after the restore
|
||||
(or resume from suspend). Unfortunately, in the most general case this
|
||||
is quite difficult to achieve without the freezing of tasks. Consider,
|
||||
for example, a process that depends on all CPUs being online while it's
|
||||
running. Since we need to disable nonboot CPUs during the hibernation,
|
||||
if this process is not frozen, it may notice that the number of CPUs has
|
||||
changed and may start to work incorrectly because of that.
|
||||
|
||||
V. Are there any problems related to the freezing of tasks?
|
||||
===========================================================
|
||||
|
||||
Yes, there are.
|
||||
|
||||
@ -172,11 +181,12 @@ may be undesirable. That's why kernel threads are not freezable by default.
|
||||
|
||||
Second, there are the following two problems related to the freezing of user
|
||||
space processes:
|
||||
|
||||
1. Putting processes into an uninterruptible sleep distorts the load average.
|
||||
2. Now that we have FUSE, plus the framework for doing device drivers in
|
||||
userspace, it gets even more complicated because some userspace processes are
|
||||
now doing the sorts of things that kernel threads do
|
||||
(https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).
|
||||
userspace, it gets even more complicated because some userspace processes are
|
||||
now doing the sorts of things that kernel threads do
|
||||
(https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).
|
||||
|
||||
The problem 1. seems to be fixable, although it hasn't been fixed so far. The
|
||||
other one is more serious, but it seems that we can work around it by using
|
||||
@ -201,6 +211,7 @@ requested early enough using the suspend notifier API described in
|
||||
Documentation/driver-api/pm/notifiers.rst.
|
||||
|
||||
VI. Are there any precautions to be taken to prevent freezing failures?
|
||||
=======================================================================
|
||||
|
||||
Yes, there are.
|
||||
|
||||
@ -226,6 +237,8 @@ So, to summarize, use [un]lock_system_sleep() instead of directly using
|
||||
mutex_[un]lock(&system_transition_mutex). That would prevent freezing failures.
|
||||
|
||||
V. Miscellaneous
|
||||
================
|
||||
|
||||
/sys/power/pm_freeze_timeout controls how long it will cost at most to freeze
|
||||
all user space processes or all freezable kernel threads, in unit of millisecond.
|
||||
The default value is 20000, with range of unsigned integer.
|
46
Documentation/power/index.rst
Normal file
46
Documentation/power/index.rst
Normal file
@ -0,0 +1,46 @@
|
||||
:orphan:
|
||||
|
||||
================
|
||||
Power Management
|
||||
================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
apm-acpi
|
||||
basic-pm-debugging
|
||||
charger-manager
|
||||
drivers-testing
|
||||
energy-model
|
||||
freezing-of-tasks
|
||||
interface
|
||||
opp
|
||||
pci
|
||||
pm_qos_interface
|
||||
power_supply_class
|
||||
runtime_pm
|
||||
s2ram
|
||||
suspend-and-cpuhotplug
|
||||
suspend-and-interrupts
|
||||
swsusp-and-swap-files
|
||||
swsusp-dmcrypt
|
||||
swsusp
|
||||
video
|
||||
tricks
|
||||
|
||||
userland-swsusp
|
||||
|
||||
powercap/powercap
|
||||
|
||||
regulator/consumer
|
||||
regulator/design
|
||||
regulator/machine
|
||||
regulator/overview
|
||||
regulator/regulator
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
@ -1,4 +1,6 @@
|
||||
===========================================
|
||||
Power Management Interface for System Sleep
|
||||
===========================================
|
||||
|
||||
Copyright (c) 2016 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||||
|
||||
@ -11,10 +13,10 @@ mounted at /sys).
|
||||
|
||||
Reading from it returns a list of supported sleep states, encoded as:
|
||||
|
||||
'freeze' (Suspend-to-Idle)
|
||||
'standby' (Power-On Suspend)
|
||||
'mem' (Suspend-to-RAM)
|
||||
'disk' (Suspend-to-Disk)
|
||||
- 'freeze' (Suspend-to-Idle)
|
||||
- 'standby' (Power-On Suspend)
|
||||
- 'mem' (Suspend-to-RAM)
|
||||
- 'disk' (Suspend-to-Disk)
|
||||
|
||||
Suspend-to-Idle is always supported. Suspend-to-Disk is always supported
|
||||
too as long the kernel has been configured to support hibernation at all
|
||||
@ -32,18 +34,18 @@ Specifically, it tells the kernel what to do after creating a hibernation image.
|
||||
|
||||
Reading from it returns a list of supported options encoded as:
|
||||
|
||||
'platform' (put the system into sleep using a platform-provided method)
|
||||
'shutdown' (shut the system down)
|
||||
'reboot' (reboot the system)
|
||||
'suspend' (trigger a Suspend-to-RAM transition)
|
||||
'test_resume' (resume-after-hibernation test mode)
|
||||
- 'platform' (put the system into sleep using a platform-provided method)
|
||||
- 'shutdown' (shut the system down)
|
||||
- 'reboot' (reboot the system)
|
||||
- 'suspend' (trigger a Suspend-to-RAM transition)
|
||||
- 'test_resume' (resume-after-hibernation test mode)
|
||||
|
||||
The currently selected option is printed in square brackets.
|
||||
|
||||
The 'platform' option is only available if the platform provides a special
|
||||
mechanism to put the system to sleep after creating a hibernation image (ACPI
|
||||
does that, for example). The 'suspend' option is available if Suspend-to-RAM
|
||||
is supported. Refer to Documentation/power/basic-pm-debugging.txt for the
|
||||
is supported. Refer to Documentation/power/basic-pm-debugging.rst for the
|
||||
description of the 'test_resume' option.
|
||||
|
||||
To select an option, write the string representing it to /sys/power/disk.
|
||||
@ -71,7 +73,7 @@ If /sys/power/pm_trace contains '1', the fingerprint of each suspend/resume
|
||||
event point in turn will be stored in the RTC memory (overwriting the actual
|
||||
RTC information), so it will survive a system crash if one occurs right after
|
||||
storing it and it can be used later to identify the driver that caused the crash
|
||||
to happen (see Documentation/power/s2ram.txt for more information).
|
||||
to happen (see Documentation/power/s2ram.rst for more information).
|
||||
|
||||
Initially it contains '0' which may be changed to '1' by writing a string
|
||||
representing a nonzero integer into it.
|
@ -1,20 +1,23 @@
|
||||
==========================================
|
||||
Operating Performance Points (OPP) Library
|
||||
==========================================
|
||||
|
||||
(C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated
|
||||
|
||||
Contents
|
||||
--------
|
||||
1. Introduction
|
||||
2. Initial OPP List Registration
|
||||
3. OPP Search Functions
|
||||
4. OPP Availability Control Functions
|
||||
5. OPP Data Retrieval Functions
|
||||
6. Data Structures
|
||||
.. Contents
|
||||
|
||||
1. Introduction
|
||||
2. Initial OPP List Registration
|
||||
3. OPP Search Functions
|
||||
4. OPP Availability Control Functions
|
||||
5. OPP Data Retrieval Functions
|
||||
6. Data Structures
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
|
||||
1.1 What is an Operating Performance Point (OPP)?
|
||||
-------------------------------------------------
|
||||
|
||||
Complex SoCs of today consists of a multiple sub-modules working in conjunction.
|
||||
In an operational system executing varied use cases, not all modules in the SoC
|
||||
@ -28,16 +31,19 @@ the device will support per domain are called Operating Performance Points or
|
||||
OPPs.
|
||||
|
||||
As an example:
|
||||
|
||||
Let us consider an MPU device which supports the following:
|
||||
{300MHz at minimum voltage of 1V}, {800MHz at minimum voltage of 1.2V},
|
||||
{1GHz at minimum voltage of 1.3V}
|
||||
|
||||
We can represent these as three OPPs as the following {Hz, uV} tuples:
|
||||
{300000000, 1000000}
|
||||
{800000000, 1200000}
|
||||
{1000000000, 1300000}
|
||||
|
||||
- {300000000, 1000000}
|
||||
- {800000000, 1200000}
|
||||
- {1000000000, 1300000}
|
||||
|
||||
1.2 Operating Performance Points Library
|
||||
----------------------------------------
|
||||
|
||||
OPP library provides a set of helper functions to organize and query the OPP
|
||||
information. The library is located in drivers/base/power/opp.c and the header
|
||||
@ -46,9 +52,10 @@ CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on
|
||||
CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to
|
||||
optionally boot at a certain OPP without needing cpufreq.
|
||||
|
||||
Typical usage of the OPP library is as follows:
|
||||
(users) -> registers a set of default OPPs -> (library)
|
||||
SoC framework -> modifies on required cases certain OPPs -> OPP layer
|
||||
Typical usage of the OPP library is as follows::
|
||||
|
||||
(users) -> registers a set of default OPPs -> (library)
|
||||
SoC framework -> modifies on required cases certain OPPs -> OPP layer
|
||||
-> queries to search/retrieve information ->
|
||||
|
||||
OPP layer expects each domain to be represented by a unique device pointer. SoC
|
||||
@ -57,8 +64,9 @@ list is expected to be an optimally small number typically around 5 per device.
|
||||
This initial list contains a set of OPPs that the framework expects to be safely
|
||||
enabled by default in the system.
|
||||
|
||||
Note on OPP Availability:
|
||||
------------------------
|
||||
Note on OPP Availability
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
As the system proceeds to operate, SoC framework may choose to make certain
|
||||
OPPs available or not available on each device based on various external
|
||||
factors. Example usage: Thermal management or other exceptional situations where
|
||||
@ -88,7 +96,8 @@ registering the OPPs is maintained by OPP library throughout the device
|
||||
operation. The SoC framework can subsequently control the availability of the
|
||||
OPPs dynamically using the dev_pm_opp_enable / disable functions.
|
||||
|
||||
dev_pm_opp_add - Add a new OPP for a specific domain represented by the device pointer.
|
||||
dev_pm_opp_add
|
||||
Add a new OPP for a specific domain represented by the device pointer.
|
||||
The OPP is defined using the frequency and voltage. Once added, the OPP
|
||||
is assumed to be available and control of it's availability can be done
|
||||
with the dev_pm_opp_enable/disable functions. OPP library internally stores
|
||||
@ -96,9 +105,11 @@ dev_pm_opp_add - Add a new OPP for a specific domain represented by the device p
|
||||
used by SoC framework to define a optimal list as per the demands of
|
||||
SoC usage environment.
|
||||
|
||||
WARNING: Do not use this function in interrupt context.
|
||||
WARNING:
|
||||
Do not use this function in interrupt context.
|
||||
|
||||
Example::
|
||||
|
||||
Example:
|
||||
soc_pm_init()
|
||||
{
|
||||
/* Do things */
|
||||
@ -125,12 +136,15 @@ Callers of these functions shall call dev_pm_opp_put() after they have used the
|
||||
OPP. Otherwise the memory for the OPP will never get freed and result in
|
||||
memleak.
|
||||
|
||||
dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
|
||||
dev_pm_opp_find_freq_exact
|
||||
Search for an OPP based on an *exact* frequency and
|
||||
availability. This function is especially useful to enable an OPP which
|
||||
is not available by default.
|
||||
Example: In a case when SoC framework detects a situation where a
|
||||
higher frequency could be made available, it can use this function to
|
||||
find the OPP prior to call the dev_pm_opp_enable to actually make it available.
|
||||
find the OPP prior to call the dev_pm_opp_enable to actually make
|
||||
it available::
|
||||
|
||||
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
|
||||
dev_pm_opp_put(opp);
|
||||
/* dont operate on the pointer.. just do a sanity check.. */
|
||||
@ -141,27 +155,34 @@ dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
|
||||
dev_pm_opp_enable(dev,1000000000);
|
||||
}
|
||||
|
||||
NOTE: This is the only search function that operates on OPPs which are
|
||||
not available.
|
||||
NOTE:
|
||||
This is the only search function that operates on OPPs which are
|
||||
not available.
|
||||
|
||||
dev_pm_opp_find_freq_floor - Search for an available OPP which is *at most* the
|
||||
dev_pm_opp_find_freq_floor
|
||||
Search for an available OPP which is *at most* the
|
||||
provided frequency. This function is useful while searching for a lesser
|
||||
match OR operating on OPP information in the order of decreasing
|
||||
frequency.
|
||||
Example: To find the highest opp for a device:
|
||||
Example: To find the highest opp for a device::
|
||||
|
||||
freq = ULONG_MAX;
|
||||
opp = dev_pm_opp_find_freq_floor(dev, &freq);
|
||||
dev_pm_opp_put(opp);
|
||||
|
||||
dev_pm_opp_find_freq_ceil - Search for an available OPP which is *at least* the
|
||||
dev_pm_opp_find_freq_ceil
|
||||
Search for an available OPP which is *at least* the
|
||||
provided frequency. This function is useful while searching for a
|
||||
higher match OR operating on OPP information in the order of increasing
|
||||
frequency.
|
||||
Example 1: To find the lowest opp for a device:
|
||||
Example 1: To find the lowest opp for a device::
|
||||
|
||||
freq = 0;
|
||||
opp = dev_pm_opp_find_freq_ceil(dev, &freq);
|
||||
dev_pm_opp_put(opp);
|
||||
Example 2: A simplified implementation of a SoC cpufreq_driver->target:
|
||||
|
||||
Example 2: A simplified implementation of a SoC cpufreq_driver->target::
|
||||
|
||||
soc_cpufreq_target(..)
|
||||
{
|
||||
/* Do stuff like policy checks etc. */
|
||||
@ -184,12 +205,15 @@ fine grained dynamic control of which sets of OPPs are operationally available.
|
||||
These functions are intended to *temporarily* remove an OPP in conditions such
|
||||
as thermal considerations (e.g. don't use OPPx until the temperature drops).
|
||||
|
||||
WARNING: Do not use these functions in interrupt context.
|
||||
WARNING:
|
||||
Do not use these functions in interrupt context.
|
||||
|
||||
dev_pm_opp_enable - Make a OPP available for operation.
|
||||
dev_pm_opp_enable
|
||||
Make a OPP available for operation.
|
||||
Example: Lets say that 1GHz OPP is to be made available only if the
|
||||
SoC temperature is lower than a certain threshold. The SoC framework
|
||||
implementation might choose to do something as follows:
|
||||
implementation might choose to do something as follows::
|
||||
|
||||
if (cur_temp < temp_low_thresh) {
|
||||
/* Enable 1GHz if it was disabled */
|
||||
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
|
||||
@ -201,10 +225,12 @@ dev_pm_opp_enable - Make a OPP available for operation.
|
||||
goto try_something_else;
|
||||
}
|
||||
|
||||
dev_pm_opp_disable - Make an OPP to be not available for operation
|
||||
dev_pm_opp_disable
|
||||
Make an OPP to be not available for operation
|
||||
Example: Lets say that 1GHz OPP is to be disabled if the temperature
|
||||
exceeds a threshold value. The SoC framework implementation might
|
||||
choose to do something as follows:
|
||||
choose to do something as follows::
|
||||
|
||||
if (cur_temp > temp_high_thresh) {
|
||||
/* Disable 1GHz if it was enabled */
|
||||
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, true);
|
||||
@ -223,11 +249,13 @@ information from the OPP structure is necessary. Once an OPP pointer is
|
||||
retrieved using the search functions, the following functions can be used by SoC
|
||||
framework to retrieve the information represented inside the OPP layer.
|
||||
|
||||
dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer.
|
||||
dev_pm_opp_get_voltage
|
||||
Retrieve the voltage represented by the opp pointer.
|
||||
Example: At a cpufreq transition to a different frequency, SoC
|
||||
framework requires to set the voltage represented by the OPP using
|
||||
the regulator framework to the Power Management chip providing the
|
||||
voltage.
|
||||
voltage::
|
||||
|
||||
soc_switch_to_freq_voltage(freq)
|
||||
{
|
||||
/* do things */
|
||||
@ -239,10 +267,12 @@ dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer.
|
||||
/* do other things */
|
||||
}
|
||||
|
||||
dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer.
|
||||
dev_pm_opp_get_freq
|
||||
Retrieve the freq represented by the opp pointer.
|
||||
Example: Lets say the SoC framework uses a couple of helper functions
|
||||
we could pass opp pointers instead of doing additional parameters to
|
||||
handle quiet a bit of data parameters.
|
||||
handle quiet a bit of data parameters::
|
||||
|
||||
soc_cpufreq_target(..)
|
||||
{
|
||||
/* do things.. */
|
||||
@ -264,9 +294,11 @@ dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer.
|
||||
/* do things.. */
|
||||
}
|
||||
|
||||
dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
|
||||
dev_pm_opp_get_opp_count
|
||||
Retrieve the number of available opps for a device
|
||||
Example: Lets say a co-processor in the SoC needs to know the available
|
||||
frequencies in a table, the main processor can notify as following:
|
||||
frequencies in a table, the main processor can notify as following::
|
||||
|
||||
soc_notify_coproc_available_frequencies()
|
||||
{
|
||||
/* Do things */
|
||||
@ -289,54 +321,59 @@ dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
|
||||
==================
|
||||
Typically an SoC contains multiple voltage domains which are variable. Each
|
||||
domain is represented by a device pointer. The relationship to OPP can be
|
||||
represented as follows:
|
||||
SoC
|
||||
|- device 1
|
||||
| |- opp 1 (availability, freq, voltage)
|
||||
| |- opp 2 ..
|
||||
... ...
|
||||
| `- opp n ..
|
||||
|- device 2
|
||||
...
|
||||
`- device m
|
||||
represented as follows::
|
||||
|
||||
SoC
|
||||
|- device 1
|
||||
| |- opp 1 (availability, freq, voltage)
|
||||
| |- opp 2 ..
|
||||
... ...
|
||||
| `- opp n ..
|
||||
|- device 2
|
||||
...
|
||||
`- device m
|
||||
|
||||
OPP library maintains a internal list that the SoC framework populates and
|
||||
accessed by various functions as described above. However, the structures
|
||||
representing the actual OPPs and domains are internal to the OPP library itself
|
||||
to allow for suitable abstraction reusable across systems.
|
||||
|
||||
struct dev_pm_opp - The internal data structure of OPP library which is used to
|
||||
struct dev_pm_opp
|
||||
The internal data structure of OPP library which is used to
|
||||
represent an OPP. In addition to the freq, voltage, availability
|
||||
information, it also contains internal book keeping information required
|
||||
for the OPP library to operate on. Pointer to this structure is
|
||||
provided back to the users such as SoC framework to be used as a
|
||||
identifier for OPP in the interactions with OPP layer.
|
||||
|
||||
WARNING: The struct dev_pm_opp pointer should not be parsed or modified by the
|
||||
users. The defaults of for an instance is populated by dev_pm_opp_add, but the
|
||||
availability of the OPP can be modified by dev_pm_opp_enable/disable functions.
|
||||
WARNING:
|
||||
The struct dev_pm_opp pointer should not be parsed or modified by the
|
||||
users. The defaults of for an instance is populated by
|
||||
dev_pm_opp_add, but the availability of the OPP can be modified
|
||||
by dev_pm_opp_enable/disable functions.
|
||||
|
||||
struct device - This is used to identify a domain to the OPP layer. The
|
||||
struct device
|
||||
This is used to identify a domain to the OPP layer. The
|
||||
nature of the device and it's implementation is left to the user of
|
||||
OPP library such as the SoC framework.
|
||||
|
||||
Overall, in a simplistic view, the data structure operations is represented as
|
||||
following:
|
||||
following::
|
||||
|
||||
Initialization / modification:
|
||||
+-----+ /- dev_pm_opp_enable
|
||||
dev_pm_opp_add --> | opp | <-------
|
||||
| +-----+ \- dev_pm_opp_disable
|
||||
\-------> domain_info(device)
|
||||
Initialization / modification:
|
||||
+-----+ /- dev_pm_opp_enable
|
||||
dev_pm_opp_add --> | opp | <-------
|
||||
| +-----+ \- dev_pm_opp_disable
|
||||
\-------> domain_info(device)
|
||||
|
||||
Search functions:
|
||||
/-- dev_pm_opp_find_freq_ceil ---\ +-----+
|
||||
domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
|
||||
\-- dev_pm_opp_find_freq_floor ---/ +-----+
|
||||
Search functions:
|
||||
/-- dev_pm_opp_find_freq_ceil ---\ +-----+
|
||||
domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
|
||||
\-- dev_pm_opp_find_freq_floor ---/ +-----+
|
||||
|
||||
Retrieval functions:
|
||||
+-----+ /- dev_pm_opp_get_voltage
|
||||
| opp | <---
|
||||
+-----+ \- dev_pm_opp_get_freq
|
||||
Retrieval functions:
|
||||
+-----+ /- dev_pm_opp_get_voltage
|
||||
| opp | <---
|
||||
+-----+ \- dev_pm_opp_get_freq
|
||||
|
||||
domain_info <- dev_pm_opp_get_opp_count
|
||||
domain_info <- dev_pm_opp_get_opp_count
|
@ -1,4 +1,6 @@
|
||||
====================
|
||||
PCI Power Management
|
||||
====================
|
||||
|
||||
Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
|
||||
|
||||
@ -9,14 +11,14 @@ management. Based on previous work by Patrick Mochel <mochel@transmeta.com>
|
||||
This document only covers the aspects of power management specific to PCI
|
||||
devices. For general description of the kernel's interfaces related to device
|
||||
power management refer to Documentation/driver-api/pm/devices.rst and
|
||||
Documentation/power/runtime_pm.txt.
|
||||
Documentation/power/runtime_pm.rst.
|
||||
|
||||
---------------------------------------------------------------------------
|
||||
.. contents:
|
||||
|
||||
1. Hardware and Platform Support for PCI Power Management
|
||||
2. PCI Subsystem and Device Power Management
|
||||
3. PCI Device Drivers and Power Management
|
||||
4. Resources
|
||||
1. Hardware and Platform Support for PCI Power Management
|
||||
2. PCI Subsystem and Device Power Management
|
||||
3. PCI Device Drivers and Power Management
|
||||
4. Resources
|
||||
|
||||
|
||||
1. Hardware and Platform Support for PCI Power Management
|
||||
@ -24,6 +26,7 @@ Documentation/power/runtime_pm.txt.
|
||||
|
||||
1.1. Native and Platform-Based Power Management
|
||||
-----------------------------------------------
|
||||
|
||||
In general, power management is a feature allowing one to save energy by putting
|
||||
devices into states in which they draw less power (low-power states) at the
|
||||
price of reduced functionality or performance.
|
||||
@ -67,6 +70,7 @@ mechanisms have to be used simultaneously to obtain the desired result.
|
||||
|
||||
1.2. Native PCI Power Management
|
||||
--------------------------------
|
||||
|
||||
The PCI Bus Power Management Interface Specification (PCI PM Spec) was
|
||||
introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a
|
||||
standard interface for performing various operations related to power
|
||||
@ -134,6 +138,7 @@ sufficiently active to generate a wakeup signal.
|
||||
|
||||
1.3. ACPI Device Power Management
|
||||
---------------------------------
|
||||
|
||||
The platform firmware support for the power management of PCI devices is
|
||||
system-specific. However, if the system in question is compliant with the
|
||||
Advanced Configuration and Power Interface (ACPI) Specification, like the
|
||||
@ -194,6 +199,7 @@ enabled for the device to be able to generate wakeup signals.
|
||||
|
||||
1.4. Wakeup Signaling
|
||||
---------------------
|
||||
|
||||
Wakeup signals generated by PCI devices, either as native PCI PMEs, or as
|
||||
a result of the execution of the _DSW (or _PSW) ACPI control method before
|
||||
putting the device into a low-power state, have to be caught and handled as
|
||||
@ -265,14 +271,15 @@ the native PCI Express PME signaling cannot be used by the kernel in that case.
|
||||
|
||||
2.1. Device Power Management Callbacks
|
||||
--------------------------------------
|
||||
|
||||
The PCI Subsystem participates in the power management of PCI devices in a
|
||||
number of ways. First of all, it provides an intermediate code layer between
|
||||
the device power management core (PM core) and PCI device drivers.
|
||||
Specifically, the pm field of the PCI subsystem's struct bus_type object,
|
||||
pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing
|
||||
pointers to several device power management callbacks:
|
||||
pointers to several device power management callbacks::
|
||||
|
||||
const struct dev_pm_ops pci_dev_pm_ops = {
|
||||
const struct dev_pm_ops pci_dev_pm_ops = {
|
||||
.prepare = pci_pm_prepare,
|
||||
.complete = pci_pm_complete,
|
||||
.suspend = pci_pm_suspend,
|
||||
@ -290,7 +297,7 @@ const struct dev_pm_ops pci_dev_pm_ops = {
|
||||
.runtime_suspend = pci_pm_runtime_suspend,
|
||||
.runtime_resume = pci_pm_runtime_resume,
|
||||
.runtime_idle = pci_pm_runtime_idle,
|
||||
};
|
||||
};
|
||||
|
||||
These callbacks are executed by the PM core in various situations related to
|
||||
device power management and they, in turn, execute power management callbacks
|
||||
@ -299,9 +306,9 @@ involving some standard configuration registers of PCI devices that device
|
||||
drivers need not know or care about.
|
||||
|
||||
The structure representing a PCI device, struct pci_dev, contains several fields
|
||||
that these callbacks operate on:
|
||||
that these callbacks operate on::
|
||||
|
||||
struct pci_dev {
|
||||
struct pci_dev {
|
||||
...
|
||||
pci_power_t current_state; /* Current operating state. */
|
||||
int pm_cap; /* PM capability offset in the
|
||||
@ -315,13 +322,14 @@ struct pci_dev {
|
||||
unsigned int wakeup_prepared:1; /* Device prepared for wake up */
|
||||
unsigned int d3_delay; /* D3->D0 transition time in ms */
|
||||
...
|
||||
};
|
||||
};
|
||||
|
||||
They also indirectly use some fields of the struct device that is embedded in
|
||||
struct pci_dev.
|
||||
|
||||
2.2. Device Initialization
|
||||
--------------------------
|
||||
|
||||
The PCI subsystem's first task related to device power management is to
|
||||
prepare the device for power management and initialize the fields of struct
|
||||
pci_dev used for this purpose. This happens in two functions defined in
|
||||
@ -348,10 +356,11 @@ during system-wide transitions to a sleep state and back to the working state.
|
||||
|
||||
2.3. Runtime Device Power Management
|
||||
------------------------------------
|
||||
|
||||
The PCI subsystem plays a vital role in the runtime power management of PCI
|
||||
devices. For this purpose it uses the general runtime power management
|
||||
(runtime PM) framework described in Documentation/power/runtime_pm.txt.
|
||||
Namely, it provides subsystem-level callbacks:
|
||||
(runtime PM) framework described in Documentation/power/runtime_pm.rst.
|
||||
Namely, it provides subsystem-level callbacks::
|
||||
|
||||
pci_pm_runtime_suspend()
|
||||
pci_pm_runtime_resume()
|
||||
@ -425,13 +434,14 @@ to the given subsystem before the next phase begins. These phases always run
|
||||
after tasks have been frozen.
|
||||
|
||||
2.4.1. System Suspend
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
When the system is going into a sleep state in which the contents of memory will
|
||||
be preserved, such as one of the ACPI sleep states S1-S3, the phases are:
|
||||
|
||||
prepare, suspend, suspend_noirq.
|
||||
|
||||
The following PCI bus type's callbacks, respectively, are used in these phases:
|
||||
The following PCI bus type's callbacks, respectively, are used in these phases::
|
||||
|
||||
pci_pm_prepare()
|
||||
pci_pm_suspend()
|
||||
@ -492,6 +502,7 @@ this purpose). PCI device drivers are not encouraged to do that, but in some
|
||||
rare cases doing that in the driver may be the optimum approach.
|
||||
|
||||
2.4.2. System Resume
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
When the system is undergoing a transition from a sleep state in which the
|
||||
contents of memory have been preserved, such as one of the ACPI sleep states
|
||||
@ -500,7 +511,7 @@ S1-S3, into the working state (ACPI S0), the phases are:
|
||||
resume_noirq, resume, complete.
|
||||
|
||||
The following PCI bus type's callbacks, respectively, are executed in these
|
||||
phases:
|
||||
phases::
|
||||
|
||||
pci_pm_resume_noirq()
|
||||
pci_pm_resume()
|
||||
@ -539,6 +550,7 @@ The pci_pm_complete() routine only executes the device driver's pm->complete()
|
||||
callback, if defined.
|
||||
|
||||
2.4.3. System Hibernation
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
System hibernation is more complicated than system suspend, because it requires
|
||||
a system image to be created and written into a persistent storage medium. The
|
||||
@ -551,7 +563,7 @@ to be free) in the following three phases:
|
||||
|
||||
prepare, freeze, freeze_noirq
|
||||
|
||||
that correspond to the PCI bus type's callbacks:
|
||||
that correspond to the PCI bus type's callbacks::
|
||||
|
||||
pci_pm_prepare()
|
||||
pci_pm_freeze()
|
||||
@ -580,7 +592,7 @@ back to the fully functional state and this is done in the following phases:
|
||||
|
||||
thaw_noirq, thaw, complete
|
||||
|
||||
using the following PCI bus type's callbacks:
|
||||
using the following PCI bus type's callbacks::
|
||||
|
||||
pci_pm_thaw_noirq()
|
||||
pci_pm_thaw()
|
||||
@ -608,7 +620,7 @@ three phases:
|
||||
|
||||
where the prepare phase is exactly the same as for system suspend. The other
|
||||
two phases are analogous to the suspend and suspend_noirq phases, respectively.
|
||||
The PCI subsystem-level callbacks they correspond to
|
||||
The PCI subsystem-level callbacks they correspond to::
|
||||
|
||||
pci_pm_poweroff()
|
||||
pci_pm_poweroff_noirq()
|
||||
@ -618,6 +630,7 @@ although they don't attempt to save the device's standard configuration
|
||||
registers.
|
||||
|
||||
2.4.4. System Restore
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
System restore requires a hibernation image to be loaded into memory and the
|
||||
pre-hibernation memory contents to be restored before the pre-hibernation system
|
||||
@ -653,7 +666,7 @@ phases:
|
||||
|
||||
The first two of these are analogous to the resume_noirq and resume phases
|
||||
described above, respectively, and correspond to the following PCI subsystem
|
||||
callbacks:
|
||||
callbacks::
|
||||
|
||||
pci_pm_restore_noirq()
|
||||
pci_pm_restore()
|
||||
@ -671,6 +684,7 @@ resume.
|
||||
|
||||
3.1. Power Management Callbacks
|
||||
-------------------------------
|
||||
|
||||
PCI device drivers participate in power management by providing callbacks to be
|
||||
executed by the PCI subsystem's power management routines described above and by
|
||||
controlling the runtime power management of their devices.
|
||||
@ -698,6 +712,7 @@ defined, though, they are expected to behave as described in the following
|
||||
subsections.
|
||||
|
||||
3.1.1. prepare()
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
The prepare() callback is executed during system suspend, during hibernation
|
||||
(when a hibernation image is about to be created), during power-off after
|
||||
@ -716,6 +731,7 @@ preallocated earlier, for example in a suspend/hibernate notifier as described
|
||||
in Documentation/driver-api/pm/notifiers.rst).
|
||||
|
||||
3.1.2. suspend()
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
The suspend() callback is only executed during system suspend, after prepare()
|
||||
callbacks have been executed for all devices in the system.
|
||||
@ -742,6 +758,7 @@ operations relying on the driver's ability to handle interrupts should be
|
||||
carried out in this callback.
|
||||
|
||||
3.1.3. suspend_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The suspend_noirq() callback is only executed during system suspend, after
|
||||
suspend() callbacks have been executed for all devices in the system and
|
||||
@ -753,6 +770,7 @@ suspend_noirq() can carry out operations that would cause race conditions to
|
||||
arise if they were performed in suspend().
|
||||
|
||||
3.1.4. freeze()
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
The freeze() callback is hibernation-specific and is executed in two situations,
|
||||
during hibernation, after prepare() callbacks have been executed for all devices
|
||||
@ -770,6 +788,7 @@ or put it into a low-power state. Still, either it or freeze_noirq() should
|
||||
save the device's standard configuration registers using pci_save_state().
|
||||
|
||||
3.1.5. freeze_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The freeze_noirq() callback is hibernation-specific. It is executed during
|
||||
hibernation, after prepare() and freeze() callbacks have been executed for all
|
||||
@ -786,6 +805,7 @@ The difference between freeze_noirq() and freeze() is analogous to the
|
||||
difference between suspend_noirq() and suspend().
|
||||
|
||||
3.1.6. poweroff()
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
The poweroff() callback is hibernation-specific. It is executed when the system
|
||||
is about to be powered off after saving a hibernation image to a persistent
|
||||
@ -802,6 +822,7 @@ into a low-power state, respectively, but it need not save the device's standard
|
||||
configuration registers.
|
||||
|
||||
3.1.7. poweroff_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The poweroff_noirq() callback is hibernation-specific. It is executed after
|
||||
poweroff() callbacks have been executed for all devices in the system.
|
||||
@ -814,6 +835,7 @@ The difference between poweroff_noirq() and poweroff() is analogous to the
|
||||
difference between suspend_noirq() and suspend().
|
||||
|
||||
3.1.8. resume_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The resume_noirq() callback is only executed during system resume, after the
|
||||
PM core has enabled the non-boot CPUs. The driver's interrupt handler will not
|
||||
@ -827,6 +849,7 @@ it should only be used for performing operations that would lead to race
|
||||
conditions if carried out by resume().
|
||||
|
||||
3.1.9. resume()
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
The resume() callback is only executed during system resume, after
|
||||
resume_noirq() callbacks have been executed for all devices in the system and
|
||||
@ -837,6 +860,7 @@ device and bringing it back to the fully functional state. The device should be
|
||||
able to process I/O in a usual way after resume() has returned.
|
||||
|
||||
3.1.10. thaw_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The thaw_noirq() callback is hibernation-specific. It is executed after a
|
||||
system image has been created and the non-boot CPUs have been enabled by the PM
|
||||
@ -851,6 +875,7 @@ freeze() and freeze_noirq(), so in general it does not need to modify the
|
||||
contents of the device's registers.
|
||||
|
||||
3.1.11. thaw()
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
The thaw() callback is hibernation-specific. It is executed after thaw_noirq()
|
||||
callbacks have been executed for all devices in the system and after device
|
||||
@ -860,6 +885,7 @@ This callback is responsible for restoring the pre-freeze configuration of
|
||||
the device, so that it will work in a usual way after thaw() has returned.
|
||||
|
||||
3.1.12. restore_noirq()
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The restore_noirq() callback is hibernation-specific. It is executed in the
|
||||
restore_noirq phase of hibernation, when the boot kernel has passed control to
|
||||
@ -875,6 +901,7 @@ For the vast majority of PCI device drivers there is no difference between
|
||||
resume_noirq() and restore_noirq().
|
||||
|
||||
3.1.13. restore()
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
The restore() callback is hibernation-specific. It is executed after
|
||||
restore_noirq() callbacks have been executed for all devices in the system and
|
||||
@ -888,14 +915,17 @@ For the vast majority of PCI device drivers there is no difference between
|
||||
resume() and restore().
|
||||
|
||||
3.1.14. complete()
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The complete() callback is executed in the following situations:
|
||||
|
||||
- during system resume, after resume() callbacks have been executed for all
|
||||
devices,
|
||||
- during hibernation, before saving the system image, after thaw() callbacks
|
||||
have been executed for all devices,
|
||||
- during system restore, when the system is going back to its pre-hibernation
|
||||
state, after restore() callbacks have been executed for all devices.
|
||||
|
||||
It also may be executed if the loading of a hibernation image into memory fails
|
||||
(in that case it is run after thaw() callbacks have been executed for all
|
||||
devices that have drivers in the boot kernel).
|
||||
@ -904,6 +934,7 @@ This callback is entirely optional, although it may be necessary if the
|
||||
prepare() callback performs operations that need to be reversed.
|
||||
|
||||
3.1.15. runtime_suspend()
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The runtime_suspend() callback is specific to device runtime power management
|
||||
(runtime PM). It is executed by the PM core's runtime PM framework when the
|
||||
@ -915,6 +946,7 @@ put into a low-power state, but it must allow the PCI subsystem to perform all
|
||||
of the PCI-specific actions necessary for suspending the device.
|
||||
|
||||
3.1.16. runtime_resume()
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The runtime_resume() callback is specific to device runtime PM. It is executed
|
||||
by the PM core's runtime PM framework when the device is about to be resumed
|
||||
@ -927,6 +959,7 @@ The device is expected to be able to process I/O in the usual way after
|
||||
runtime_resume() has returned.
|
||||
|
||||
3.1.17. runtime_idle()
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The runtime_idle() callback is specific to device runtime PM. It is executed
|
||||
by the PM core's runtime PM framework whenever it may be desirable to suspend
|
||||
@ -939,6 +972,7 @@ PCI subsystem will call pm_runtime_suspend() for the device, which in turn will
|
||||
cause the driver's runtime_suspend() callback to be executed.
|
||||
|
||||
3.1.18. Pointing Multiple Callback Pointers to One Routine
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Although in principle each of the callbacks described in the previous
|
||||
subsections can be defined as a separate function, it often is convenient to
|
||||
@ -962,6 +996,7 @@ dev_pm_ops to indicate that one suspend routine is to be pointed to by the
|
||||
be pointed to by the .resume(), .thaw(), and .restore() members.
|
||||
|
||||
3.1.19. Driver Flags for Power Management
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The PM core allows device drivers to set flags that influence the handling of
|
||||
power management for the devices by the core itself and by middle layer code
|
||||
@ -1007,6 +1042,7 @@ it.
|
||||
|
||||
3.2. Device Runtime Power Management
|
||||
------------------------------------
|
||||
|
||||
In addition to providing device power management callbacks PCI device drivers
|
||||
are responsible for controlling the runtime power management (runtime PM) of
|
||||
their devices.
|
||||
@ -1073,22 +1109,27 @@ device the PM core automatically queues a request to check if the device is
|
||||
idle), device drivers are generally responsible for queuing power management
|
||||
requests for their devices. For this purpose they should use the runtime PM
|
||||
helper functions provided by the PM core, discussed in
|
||||
Documentation/power/runtime_pm.txt.
|
||||
Documentation/power/runtime_pm.rst.
|
||||
|
||||
Devices can also be suspended and resumed synchronously, without placing a
|
||||
request into pm_wq. In the majority of cases this also is done by their
|
||||
drivers that use helper functions provided by the PM core for this purpose.
|
||||
|
||||
For more information on the runtime PM of devices refer to
|
||||
Documentation/power/runtime_pm.txt.
|
||||
Documentation/power/runtime_pm.rst.
|
||||
|
||||
|
||||
4. Resources
|
||||
============
|
||||
|
||||
PCI Local Bus Specification, Rev. 3.0
|
||||
|
||||
PCI Bus Power Management Interface Specification, Rev. 1.2
|
||||
|
||||
Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b
|
||||
|
||||
PCI Express Base Specification, Rev. 2.0
|
||||
|
||||
Documentation/driver-api/pm/devices.rst
|
||||
Documentation/power/runtime_pm.txt
|
||||
|
||||
Documentation/power/runtime_pm.rst
|
@ -1,4 +1,6 @@
|
||||
PM Quality Of Service Interface.
|
||||
===============================
|
||||
PM Quality Of Service Interface
|
||||
===============================
|
||||
|
||||
This interface provides a kernel and user mode interface for registering
|
||||
performance expectations by drivers, subsystems and user space applications on
|
||||
@ -11,6 +13,7 @@ memory_bandwidth.
|
||||
constraints and PM QoS flags.
|
||||
|
||||
Each parameters have defined units:
|
||||
|
||||
* latency: usec
|
||||
* timeout: usec
|
||||
* throughput: kbs (kilo bit / sec)
|
||||
@ -18,6 +21,7 @@ Each parameters have defined units:
|
||||
|
||||
|
||||
1. PM QoS framework
|
||||
===================
|
||||
|
||||
The infrastructure exposes multiple misc device nodes one per implemented
|
||||
parameter. The set of parameters implement is defined by pm_qos_power_init()
|
||||
@ -37,38 +41,39 @@ reading the aggregated value does not require any locking mechanism.
|
||||
From kernel mode the use of this interface is simple:
|
||||
|
||||
void pm_qos_add_request(handle, param_class, target_value):
|
||||
Will insert an element into the list for that identified PM QoS class with the
|
||||
target value. Upon change to this list the new target is recomputed and any
|
||||
registered notifiers are called only if the target value is now different.
|
||||
Clients of pm_qos need to save the returned handle for future use in other
|
||||
pm_qos API functions.
|
||||
Will insert an element into the list for that identified PM QoS class with the
|
||||
target value. Upon change to this list the new target is recomputed and any
|
||||
registered notifiers are called only if the target value is now different.
|
||||
Clients of pm_qos need to save the returned handle for future use in other
|
||||
pm_qos API functions.
|
||||
|
||||
void pm_qos_update_request(handle, new_target_value):
|
||||
Will update the list element pointed to by the handle with the new target value
|
||||
and recompute the new aggregated target, calling the notification tree if the
|
||||
target is changed.
|
||||
Will update the list element pointed to by the handle with the new target value
|
||||
and recompute the new aggregated target, calling the notification tree if the
|
||||
target is changed.
|
||||
|
||||
void pm_qos_remove_request(handle):
|
||||
Will remove the element. After removal it will update the aggregate target and
|
||||
call the notification tree if the target was changed as a result of removing
|
||||
the request.
|
||||
Will remove the element. After removal it will update the aggregate target and
|
||||
call the notification tree if the target was changed as a result of removing
|
||||
the request.
|
||||
|
||||
int pm_qos_request(param_class):
|
||||
Returns the aggregated value for a given PM QoS class.
|
||||
Returns the aggregated value for a given PM QoS class.
|
||||
|
||||
int pm_qos_request_active(handle):
|
||||
Returns if the request is still active, i.e. it has not been removed from a
|
||||
PM QoS class constraints list.
|
||||
Returns if the request is still active, i.e. it has not been removed from a
|
||||
PM QoS class constraints list.
|
||||
|
||||
int pm_qos_add_notifier(param_class, notifier):
|
||||
Adds a notification callback function to the PM QoS class. The callback is
|
||||
called when the aggregated value for the PM QoS class is changed.
|
||||
Adds a notification callback function to the PM QoS class. The callback is
|
||||
called when the aggregated value for the PM QoS class is changed.
|
||||
|
||||
int pm_qos_remove_notifier(int param_class, notifier):
|
||||
Removes the notification callback function for the PM QoS class.
|
||||
Removes the notification callback function for the PM QoS class.
|
||||
|
||||
|
||||
From user mode:
|
||||
|
||||
Only processes can register a pm_qos request. To provide for automatic
|
||||
cleanup of a process, the interface requires the process to register its
|
||||
parameter requests in the following way:
|
||||
@ -89,6 +94,7 @@ node.
|
||||
|
||||
|
||||
2. PM QoS per-device latency and flags framework
|
||||
================================================
|
||||
|
||||
For each device, there are three lists of PM QoS requests. Two of them are
|
||||
maintained along with the aggregated targets of resume latency and active
|
||||
@ -107,73 +113,80 @@ the aggregated value does not require any locking mechanism.
|
||||
From kernel mode the use of this interface is the following:
|
||||
|
||||
int dev_pm_qos_add_request(device, handle, type, value):
|
||||
Will insert an element into the list for that identified device with the
|
||||
target value. Upon change to this list the new target is recomputed and any
|
||||
registered notifiers are called only if the target value is now different.
|
||||
Clients of dev_pm_qos need to save the handle for future use in other
|
||||
dev_pm_qos API functions.
|
||||
Will insert an element into the list for that identified device with the
|
||||
target value. Upon change to this list the new target is recomputed and any
|
||||
registered notifiers are called only if the target value is now different.
|
||||
Clients of dev_pm_qos need to save the handle for future use in other
|
||||
dev_pm_qos API functions.
|
||||
|
||||
int dev_pm_qos_update_request(handle, new_value):
|
||||
Will update the list element pointed to by the handle with the new target value
|
||||
and recompute the new aggregated target, calling the notification trees if the
|
||||
target is changed.
|
||||
Will update the list element pointed to by the handle with the new target
|
||||
value and recompute the new aggregated target, calling the notification
|
||||
trees if the target is changed.
|
||||
|
||||
int dev_pm_qos_remove_request(handle):
|
||||
Will remove the element. After removal it will update the aggregate target and
|
||||
call the notification trees if the target was changed as a result of removing
|
||||
the request.
|
||||
Will remove the element. After removal it will update the aggregate target
|
||||
and call the notification trees if the target was changed as a result of
|
||||
removing the request.
|
||||
|
||||
s32 dev_pm_qos_read_value(device):
|
||||
Returns the aggregated value for a given device's constraints list.
|
||||
Returns the aggregated value for a given device's constraints list.
|
||||
|
||||
enum pm_qos_flags_status dev_pm_qos_flags(device, mask)
|
||||
Check PM QoS flags of the given device against the given mask of flags.
|
||||
The meaning of the return values is as follows:
|
||||
PM_QOS_FLAGS_ALL: All flags from the mask are set
|
||||
PM_QOS_FLAGS_SOME: Some flags from the mask are set
|
||||
PM_QOS_FLAGS_NONE: No flags from the mask are set
|
||||
PM_QOS_FLAGS_UNDEFINED: The device's PM QoS structure has not been
|
||||
initialized or the list of requests is empty.
|
||||
Check PM QoS flags of the given device against the given mask of flags.
|
||||
The meaning of the return values is as follows:
|
||||
|
||||
PM_QOS_FLAGS_ALL:
|
||||
All flags from the mask are set
|
||||
PM_QOS_FLAGS_SOME:
|
||||
Some flags from the mask are set
|
||||
PM_QOS_FLAGS_NONE:
|
||||
No flags from the mask are set
|
||||
PM_QOS_FLAGS_UNDEFINED:
|
||||
The device's PM QoS structure has not been initialized
|
||||
or the list of requests is empty.
|
||||
|
||||
int dev_pm_qos_add_ancestor_request(dev, handle, type, value)
|
||||
Add a PM QoS request for the first direct ancestor of the given device whose
|
||||
power.ignore_children flag is unset (for DEV_PM_QOS_RESUME_LATENCY requests)
|
||||
or whose power.set_latency_tolerance callback pointer is not NULL (for
|
||||
DEV_PM_QOS_LATENCY_TOLERANCE requests).
|
||||
Add a PM QoS request for the first direct ancestor of the given device whose
|
||||
power.ignore_children flag is unset (for DEV_PM_QOS_RESUME_LATENCY requests)
|
||||
or whose power.set_latency_tolerance callback pointer is not NULL (for
|
||||
DEV_PM_QOS_LATENCY_TOLERANCE requests).
|
||||
|
||||
int dev_pm_qos_expose_latency_limit(device, value)
|
||||
Add a request to the device's PM QoS list of resume latency constraints and
|
||||
create a sysfs attribute pm_qos_resume_latency_us under the device's power
|
||||
directory allowing user space to manipulate that request.
|
||||
Add a request to the device's PM QoS list of resume latency constraints and
|
||||
create a sysfs attribute pm_qos_resume_latency_us under the device's power
|
||||
directory allowing user space to manipulate that request.
|
||||
|
||||
void dev_pm_qos_hide_latency_limit(device)
|
||||
Drop the request added by dev_pm_qos_expose_latency_limit() from the device's
|
||||
PM QoS list of resume latency constraints and remove sysfs attribute
|
||||
pm_qos_resume_latency_us from the device's power directory.
|
||||
Drop the request added by dev_pm_qos_expose_latency_limit() from the device's
|
||||
PM QoS list of resume latency constraints and remove sysfs attribute
|
||||
pm_qos_resume_latency_us from the device's power directory.
|
||||
|
||||
int dev_pm_qos_expose_flags(device, value)
|
||||
Add a request to the device's PM QoS list of flags and create sysfs attribute
|
||||
pm_qos_no_power_off under the device's power directory allowing user space to
|
||||
change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.
|
||||
Add a request to the device's PM QoS list of flags and create sysfs attribute
|
||||
pm_qos_no_power_off under the device's power directory allowing user space to
|
||||
change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.
|
||||
|
||||
void dev_pm_qos_hide_flags(device)
|
||||
Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
|
||||
of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
|
||||
directory.
|
||||
Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
|
||||
of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
|
||||
directory.
|
||||
|
||||
Notification mechanisms:
|
||||
|
||||
The per-device PM QoS framework has a per-device notification tree.
|
||||
|
||||
int dev_pm_qos_add_notifier(device, notifier):
|
||||
Adds a notification callback function for the device.
|
||||
The callback is called when the aggregated value of the device constraints list
|
||||
is changed (for resume latency device PM QoS only).
|
||||
Adds a notification callback function for the device.
|
||||
The callback is called when the aggregated value of the device constraints list
|
||||
is changed (for resume latency device PM QoS only).
|
||||
|
||||
int dev_pm_qos_remove_notifier(device, notifier):
|
||||
Removes the notification callback function for the device.
|
||||
Removes the notification callback function for the device.
|
||||
|
||||
|
||||
Active state latency tolerance
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
This device PM QoS type is used to support systems in which hardware may switch
|
||||
to energy-saving operation modes on the fly. In those systems, if the operation
|
282
Documentation/power/power_supply_class.rst
Normal file
282
Documentation/power/power_supply_class.rst
Normal file
@ -0,0 +1,282 @@
|
||||
========================
|
||||
Linux power supply class
|
||||
========================
|
||||
|
||||
Synopsis
|
||||
~~~~~~~~
|
||||
Power supply class used to represent battery, UPS, AC or DC power supply
|
||||
properties to user-space.
|
||||
|
||||
It defines core set of attributes, which should be applicable to (almost)
|
||||
every power supply out there. Attributes are available via sysfs and uevent
|
||||
interfaces.
|
||||
|
||||
Each attribute has well defined meaning, up to unit of measure used. While
|
||||
the attributes provided are believed to be universally applicable to any
|
||||
power supply, specific monitoring hardware may not be able to provide them
|
||||
all, so any of them may be skipped.
|
||||
|
||||
Power supply class is extensible, and allows to define drivers own attributes.
|
||||
The core attribute set is subject to the standard Linux evolution (i.e.
|
||||
if it will be found that some attribute is applicable to many power supply
|
||||
types or their drivers, it can be added to the core set).
|
||||
|
||||
It also integrates with LED framework, for the purpose of providing
|
||||
typically expected feedback of battery charging/fully charged status and
|
||||
AC/USB power supply online status. (Note that specific details of the
|
||||
indication (including whether to use it at all) are fully controllable by
|
||||
user and/or specific machine defaults, per design principles of LED
|
||||
framework).
|
||||
|
||||
|
||||
Attributes/properties
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
Power supply class has predefined set of attributes, this eliminates code
|
||||
duplication across drivers. Power supply class insist on reusing its
|
||||
predefined attributes *and* their units.
|
||||
|
||||
So, userspace gets predictable set of attributes and their units for any
|
||||
kind of power supply, and can process/present them to a user in consistent
|
||||
manner. Results for different power supplies and machines are also directly
|
||||
comparable.
|
||||
|
||||
See drivers/power/supply/ds2760_battery.c and drivers/power/supply/pda_power.c
|
||||
for the example how to declare and handle attributes.
|
||||
|
||||
|
||||
Units
|
||||
~~~~~
|
||||
Quoting include/linux/power_supply.h:
|
||||
|
||||
All voltages, currents, charges, energies, time and temperatures in µV,
|
||||
µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
|
||||
stated. It's driver's job to convert its raw values to units in which
|
||||
this class operates.
|
||||
|
||||
|
||||
Attributes/properties detailed
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
+--------------------------------------------------------------------------+
|
||||
| **Charge/Energy/Capacity - how to not confuse** |
|
||||
+--------------------------------------------------------------------------+
|
||||
| **Because both "charge" (µAh) and "energy" (µWh) represents "capacity" |
|
||||
| of battery, this class distinguish these terms. Don't mix them!** |
|
||||
| |
|
||||
| - `CHARGE_*` |
|
||||
| attributes represents capacity in µAh only. |
|
||||
| - `ENERGY_*` |
|
||||
| attributes represents capacity in µWh only. |
|
||||
| - `CAPACITY` |
|
||||
| attribute represents capacity in *percents*, from 0 to 100. |
|
||||
+--------------------------------------------------------------------------+
|
||||
|
||||
Postfixes:
|
||||
|
||||
_AVG
|
||||
*hardware* averaged value, use it if your hardware is really able to
|
||||
report averaged values.
|
||||
_NOW
|
||||
momentary/instantaneous values.
|
||||
|
||||
STATUS
|
||||
this attribute represents operating status (charging, full,
|
||||
discharging (i.e. powering a load), etc.). This corresponds to
|
||||
`BATTERY_STATUS_*` values, as defined in battery.h.
|
||||
|
||||
CHARGE_TYPE
|
||||
batteries can typically charge at different rates.
|
||||
This defines trickle and fast charges. For batteries that
|
||||
are already charged or discharging, 'n/a' can be displayed (or
|
||||
'unknown', if the status is not known).
|
||||
|
||||
AUTHENTIC
|
||||
indicates the power supply (battery or charger) connected
|
||||
to the platform is authentic(1) or non authentic(0).
|
||||
|
||||
HEALTH
|
||||
represents health of the battery, values corresponds to
|
||||
POWER_SUPPLY_HEALTH_*, defined in battery.h.
|
||||
|
||||
VOLTAGE_OCV
|
||||
open circuit voltage of the battery.
|
||||
|
||||
VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN
|
||||
design values for maximal and minimal power supply voltages.
|
||||
Maximal/minimal means values of voltages when battery considered
|
||||
"full"/"empty" at normal conditions. Yes, there is no direct relation
|
||||
between voltage and battery capacity, but some dumb
|
||||
batteries use voltage for very approximated calculation of capacity.
|
||||
Battery driver also can use this attribute just to inform userspace
|
||||
about maximal and minimal voltage thresholds of a given battery.
|
||||
|
||||
VOLTAGE_MAX, VOLTAGE_MIN
|
||||
same as _DESIGN voltage values except that these ones should be used
|
||||
if hardware could only guess (measure and retain) the thresholds of a
|
||||
given power supply.
|
||||
|
||||
VOLTAGE_BOOT
|
||||
Reports the voltage measured during boot
|
||||
|
||||
CURRENT_BOOT
|
||||
Reports the current measured during boot
|
||||
|
||||
CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN
|
||||
design charge values, when battery considered full/empty.
|
||||
|
||||
ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN
|
||||
same as above but for energy.
|
||||
|
||||
CHARGE_FULL, CHARGE_EMPTY
|
||||
These attributes means "last remembered value of charge when battery
|
||||
became full/empty". It also could mean "value of charge when battery
|
||||
considered full/empty at given conditions (temperature, age)".
|
||||
I.e. these attributes represents real thresholds, not design values.
|
||||
|
||||
ENERGY_FULL, ENERGY_EMPTY
|
||||
same as above but for energy.
|
||||
|
||||
CHARGE_COUNTER
|
||||
the current charge counter (in µAh). This could easily
|
||||
be negative; there is no empty or full value. It is only useful for
|
||||
relative, time-based measurements.
|
||||
|
||||
PRECHARGE_CURRENT
|
||||
the maximum charge current during precharge phase of charge cycle
|
||||
(typically 20% of battery capacity).
|
||||
|
||||
CHARGE_TERM_CURRENT
|
||||
Charge termination current. The charge cycle terminates when battery
|
||||
voltage is above recharge threshold, and charge current is below
|
||||
this setting (typically 10% of battery capacity).
|
||||
|
||||
CONSTANT_CHARGE_CURRENT
|
||||
constant charge current programmed by charger.
|
||||
|
||||
|
||||
CONSTANT_CHARGE_CURRENT_MAX
|
||||
maximum charge current supported by the power supply object.
|
||||
|
||||
CONSTANT_CHARGE_VOLTAGE
|
||||
constant charge voltage programmed by charger.
|
||||
CONSTANT_CHARGE_VOLTAGE_MAX
|
||||
maximum charge voltage supported by the power supply object.
|
||||
|
||||
INPUT_CURRENT_LIMIT
|
||||
input current limit programmed by charger. Indicates
|
||||
the current drawn from a charging source.
|
||||
|
||||
CHARGE_CONTROL_LIMIT
|
||||
current charge control limit setting
|
||||
CHARGE_CONTROL_LIMIT_MAX
|
||||
maximum charge control limit setting
|
||||
|
||||
CALIBRATE
|
||||
battery or coulomb counter calibration status
|
||||
|
||||
CAPACITY
|
||||
capacity in percents.
|
||||
CAPACITY_ALERT_MIN
|
||||
minimum capacity alert value in percents.
|
||||
CAPACITY_ALERT_MAX
|
||||
maximum capacity alert value in percents.
|
||||
CAPACITY_LEVEL
|
||||
capacity level. This corresponds to POWER_SUPPLY_CAPACITY_LEVEL_*.
|
||||
|
||||
TEMP
|
||||
temperature of the power supply.
|
||||
TEMP_ALERT_MIN
|
||||
minimum battery temperature alert.
|
||||
TEMP_ALERT_MAX
|
||||
maximum battery temperature alert.
|
||||
TEMP_AMBIENT
|
||||
ambient temperature.
|
||||
TEMP_AMBIENT_ALERT_MIN
|
||||
minimum ambient temperature alert.
|
||||
TEMP_AMBIENT_ALERT_MAX
|
||||
maximum ambient temperature alert.
|
||||
TEMP_MIN
|
||||
minimum operatable temperature
|
||||
TEMP_MAX
|
||||
maximum operatable temperature
|
||||
|
||||
TIME_TO_EMPTY
|
||||
seconds left for battery to be considered empty
|
||||
(i.e. while battery powers a load)
|
||||
TIME_TO_FULL
|
||||
seconds left for battery to be considered full
|
||||
(i.e. while battery is charging)
|
||||
|
||||
|
||||
Battery <-> external power supply interaction
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Often power supplies are acting as supplies and supplicants at the same
|
||||
time. Batteries are good example. So, batteries usually care if they're
|
||||
externally powered or not.
|
||||
|
||||
For that case, power supply class implements notification mechanism for
|
||||
batteries.
|
||||
|
||||
External power supply (AC) lists supplicants (batteries) names in
|
||||
"supplied_to" struct member, and each power_supply_changed() call
|
||||
issued by external power supply will notify supplicants via
|
||||
external_power_changed callback.
|
||||
|
||||
|
||||
Devicetree battery characteristics
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Drivers should call power_supply_get_battery_info() to obtain battery
|
||||
characteristics from a devicetree battery node, defined in
|
||||
Documentation/devicetree/bindings/power/supply/battery.txt. This is
|
||||
implemented in drivers/power/supply/bq27xxx_battery.c.
|
||||
|
||||
Properties in struct power_supply_battery_info and their counterparts in the
|
||||
battery node have names corresponding to elements in enum power_supply_property,
|
||||
for naming consistency between sysfs attributes and battery node properties.
|
||||
|
||||
|
||||
QA
|
||||
~~
|
||||
|
||||
Q:
|
||||
Where is POWER_SUPPLY_PROP_XYZ attribute?
|
||||
A:
|
||||
If you cannot find attribute suitable for your driver needs, feel free
|
||||
to add it and send patch along with your driver.
|
||||
|
||||
The attributes available currently are the ones currently provided by the
|
||||
drivers written.
|
||||
|
||||
Good candidates to add in future: model/part#, cycle_time, manufacturer,
|
||||
etc.
|
||||
|
||||
|
||||
Q:
|
||||
I have some very specific attribute (e.g. battery color), should I add
|
||||
this attribute to standard ones?
|
||||
A:
|
||||
Most likely, no. Such attribute can be placed in the driver itself, if
|
||||
it is useful. Of course, if the attribute in question applicable to
|
||||
large set of batteries, provided by many drivers, and/or comes from
|
||||
some general battery specification/standard, it may be a candidate to
|
||||
be added to the core attribute set.
|
||||
|
||||
|
||||
Q:
|
||||
Suppose, my battery monitoring chip/firmware does not provides capacity
|
||||
in percents, but provides charge_{now,full,empty}. Should I calculate
|
||||
percentage capacity manually, inside the driver, and register CAPACITY
|
||||
attribute? The same question about time_to_empty/time_to_full.
|
||||
A:
|
||||
Most likely, no. This class is designed to export properties which are
|
||||
directly measurable by the specific hardware available.
|
||||
|
||||
Inferring not available properties using some heuristics or mathematical
|
||||
model is not subject of work for a battery driver. Such functionality
|
||||
should be factored out, and in fact, apm_power, the driver to serve
|
||||
legacy APM API on top of power supply class, uses a simple heuristic of
|
||||
approximating remaining battery capacity based on its charge, current,
|
||||
voltage and so on. But full-fledged battery model is likely not subject
|
||||
for kernel at all, as it would require floating point calculation to deal
|
||||
with things like differential equations and Kalman filters. This is
|
||||
better be handled by batteryd/libbattery, yet to be written.
|
@ -1,231 +0,0 @@
|
||||
Linux power supply class
|
||||
========================
|
||||
|
||||
Synopsis
|
||||
~~~~~~~~
|
||||
Power supply class used to represent battery, UPS, AC or DC power supply
|
||||
properties to user-space.
|
||||
|
||||
It defines core set of attributes, which should be applicable to (almost)
|
||||
every power supply out there. Attributes are available via sysfs and uevent
|
||||
interfaces.
|
||||
|
||||
Each attribute has well defined meaning, up to unit of measure used. While
|
||||
the attributes provided are believed to be universally applicable to any
|
||||
power supply, specific monitoring hardware may not be able to provide them
|
||||
all, so any of them may be skipped.
|
||||
|
||||
Power supply class is extensible, and allows to define drivers own attributes.
|
||||
The core attribute set is subject to the standard Linux evolution (i.e.
|
||||
if it will be found that some attribute is applicable to many power supply
|
||||
types or their drivers, it can be added to the core set).
|
||||
|
||||
It also integrates with LED framework, for the purpose of providing
|
||||
typically expected feedback of battery charging/fully charged status and
|
||||
AC/USB power supply online status. (Note that specific details of the
|
||||
indication (including whether to use it at all) are fully controllable by
|
||||
user and/or specific machine defaults, per design principles of LED
|
||||
framework).
|
||||
|
||||
|
||||
Attributes/properties
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
Power supply class has predefined set of attributes, this eliminates code
|
||||
duplication across drivers. Power supply class insist on reusing its
|
||||
predefined attributes *and* their units.
|
||||
|
||||
So, userspace gets predictable set of attributes and their units for any
|
||||
kind of power supply, and can process/present them to a user in consistent
|
||||
manner. Results for different power supplies and machines are also directly
|
||||
comparable.
|
||||
|
||||
See drivers/power/supply/ds2760_battery.c and drivers/power/supply/pda_power.c
|
||||
for the example how to declare and handle attributes.
|
||||
|
||||
|
||||
Units
|
||||
~~~~~
|
||||
Quoting include/linux/power_supply.h:
|
||||
|
||||
All voltages, currents, charges, energies, time and temperatures in µV,
|
||||
µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
|
||||
stated. It's driver's job to convert its raw values to units in which
|
||||
this class operates.
|
||||
|
||||
|
||||
Attributes/properties detailed
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~
|
||||
~ ~
|
||||
~ Because both "charge" (µAh) and "energy" (µWh) represents "capacity" ~
|
||||
~ of battery, this class distinguish these terms. Don't mix them! ~
|
||||
~ ~
|
||||
~ CHARGE_* attributes represents capacity in µAh only. ~
|
||||
~ ENERGY_* attributes represents capacity in µWh only. ~
|
||||
~ CAPACITY attribute represents capacity in *percents*, from 0 to 100. ~
|
||||
~ ~
|
||||
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
|
||||
|
||||
Postfixes:
|
||||
_AVG - *hardware* averaged value, use it if your hardware is really able to
|
||||
report averaged values.
|
||||
_NOW - momentary/instantaneous values.
|
||||
|
||||
STATUS - this attribute represents operating status (charging, full,
|
||||
discharging (i.e. powering a load), etc.). This corresponds to
|
||||
BATTERY_STATUS_* values, as defined in battery.h.
|
||||
|
||||
CHARGE_TYPE - batteries can typically charge at different rates.
|
||||
This defines trickle and fast charges. For batteries that
|
||||
are already charged or discharging, 'n/a' can be displayed (or
|
||||
'unknown', if the status is not known).
|
||||
|
||||
AUTHENTIC - indicates the power supply (battery or charger) connected
|
||||
to the platform is authentic(1) or non authentic(0).
|
||||
|
||||
HEALTH - represents health of the battery, values corresponds to
|
||||
POWER_SUPPLY_HEALTH_*, defined in battery.h.
|
||||
|
||||
VOLTAGE_OCV - open circuit voltage of the battery.
|
||||
|
||||
VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and
|
||||
minimal power supply voltages. Maximal/minimal means values of voltages
|
||||
when battery considered "full"/"empty" at normal conditions. Yes, there is
|
||||
no direct relation between voltage and battery capacity, but some dumb
|
||||
batteries use voltage for very approximated calculation of capacity.
|
||||
Battery driver also can use this attribute just to inform userspace
|
||||
about maximal and minimal voltage thresholds of a given battery.
|
||||
|
||||
VOLTAGE_MAX, VOLTAGE_MIN - same as _DESIGN voltage values except that
|
||||
these ones should be used if hardware could only guess (measure and
|
||||
retain) the thresholds of a given power supply.
|
||||
|
||||
VOLTAGE_BOOT - Reports the voltage measured during boot
|
||||
|
||||
CURRENT_BOOT - Reports the current measured during boot
|
||||
|
||||
CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when
|
||||
battery considered full/empty.
|
||||
|
||||
ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy.
|
||||
|
||||
CHARGE_FULL, CHARGE_EMPTY - These attributes means "last remembered value
|
||||
of charge when battery became full/empty". It also could mean "value of
|
||||
charge when battery considered full/empty at given conditions (temperature,
|
||||
age)". I.e. these attributes represents real thresholds, not design values.
|
||||
|
||||
ENERGY_FULL, ENERGY_EMPTY - same as above but for energy.
|
||||
|
||||
CHARGE_COUNTER - the current charge counter (in µAh). This could easily
|
||||
be negative; there is no empty or full value. It is only useful for
|
||||
relative, time-based measurements.
|
||||
|
||||
PRECHARGE_CURRENT - the maximum charge current during precharge phase
|
||||
of charge cycle (typically 20% of battery capacity).
|
||||
CHARGE_TERM_CURRENT - Charge termination current. The charge cycle
|
||||
terminates when battery voltage is above recharge threshold, and charge
|
||||
current is below this setting (typically 10% of battery capacity).
|
||||
|
||||
CONSTANT_CHARGE_CURRENT - constant charge current programmed by charger.
|
||||
CONSTANT_CHARGE_CURRENT_MAX - maximum charge current supported by the
|
||||
power supply object.
|
||||
|
||||
CONSTANT_CHARGE_VOLTAGE - constant charge voltage programmed by charger.
|
||||
CONSTANT_CHARGE_VOLTAGE_MAX - maximum charge voltage supported by the
|
||||
power supply object.
|
||||
|
||||
INPUT_CURRENT_LIMIT - input current limit programmed by charger. Indicates
|
||||
the current drawn from a charging source.
|
||||
|
||||
CHARGE_CONTROL_LIMIT - current charge control limit setting
|
||||
CHARGE_CONTROL_LIMIT_MAX - maximum charge control limit setting
|
||||
|
||||
CALIBRATE - battery or coulomb counter calibration status
|
||||
|
||||
CAPACITY - capacity in percents.
|
||||
CAPACITY_ALERT_MIN - minimum capacity alert value in percents.
|
||||
CAPACITY_ALERT_MAX - maximum capacity alert value in percents.
|
||||
CAPACITY_LEVEL - capacity level. This corresponds to
|
||||
POWER_SUPPLY_CAPACITY_LEVEL_*.
|
||||
|
||||
TEMP - temperature of the power supply.
|
||||
TEMP_ALERT_MIN - minimum battery temperature alert.
|
||||
TEMP_ALERT_MAX - maximum battery temperature alert.
|
||||
TEMP_AMBIENT - ambient temperature.
|
||||
TEMP_AMBIENT_ALERT_MIN - minimum ambient temperature alert.
|
||||
TEMP_AMBIENT_ALERT_MAX - maximum ambient temperature alert.
|
||||
TEMP_MIN - minimum operatable temperature
|
||||
TEMP_MAX - maximum operatable temperature
|
||||
|
||||
TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e.
|
||||
while battery powers a load)
|
||||
TIME_TO_FULL - seconds left for battery to be considered full (i.e.
|
||||
while battery is charging)
|
||||
|
||||
|
||||
Battery <-> external power supply interaction
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Often power supplies are acting as supplies and supplicants at the same
|
||||
time. Batteries are good example. So, batteries usually care if they're
|
||||
externally powered or not.
|
||||
|
||||
For that case, power supply class implements notification mechanism for
|
||||
batteries.
|
||||
|
||||
External power supply (AC) lists supplicants (batteries) names in
|
||||
"supplied_to" struct member, and each power_supply_changed() call
|
||||
issued by external power supply will notify supplicants via
|
||||
external_power_changed callback.
|
||||
|
||||
|
||||
Devicetree battery characteristics
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Drivers should call power_supply_get_battery_info() to obtain battery
|
||||
characteristics from a devicetree battery node, defined in
|
||||
Documentation/devicetree/bindings/power/supply/battery.txt. This is
|
||||
implemented in drivers/power/supply/bq27xxx_battery.c.
|
||||
|
||||
Properties in struct power_supply_battery_info and their counterparts in the
|
||||
battery node have names corresponding to elements in enum power_supply_property,
|
||||
for naming consistency between sysfs attributes and battery node properties.
|
||||
|
||||
|
||||
QA
|
||||
~~
|
||||
Q: Where is POWER_SUPPLY_PROP_XYZ attribute?
|
||||
A: If you cannot find attribute suitable for your driver needs, feel free
|
||||
to add it and send patch along with your driver.
|
||||
|
||||
The attributes available currently are the ones currently provided by the
|
||||
drivers written.
|
||||
|
||||
Good candidates to add in future: model/part#, cycle_time, manufacturer,
|
||||
etc.
|
||||
|
||||
|
||||
Q: I have some very specific attribute (e.g. battery color), should I add
|
||||
this attribute to standard ones?
|
||||
A: Most likely, no. Such attribute can be placed in the driver itself, if
|
||||
it is useful. Of course, if the attribute in question applicable to
|
||||
large set of batteries, provided by many drivers, and/or comes from
|
||||
some general battery specification/standard, it may be a candidate to
|
||||
be added to the core attribute set.
|
||||
|
||||
|
||||
Q: Suppose, my battery monitoring chip/firmware does not provides capacity
|
||||
in percents, but provides charge_{now,full,empty}. Should I calculate
|
||||
percentage capacity manually, inside the driver, and register CAPACITY
|
||||
attribute? The same question about time_to_empty/time_to_full.
|
||||
A: Most likely, no. This class is designed to export properties which are
|
||||
directly measurable by the specific hardware available.
|
||||
|
||||
Inferring not available properties using some heuristics or mathematical
|
||||
model is not subject of work for a battery driver. Such functionality
|
||||
should be factored out, and in fact, apm_power, the driver to serve
|
||||
legacy APM API on top of power supply class, uses a simple heuristic of
|
||||
approximating remaining battery capacity based on its charge, current,
|
||||
voltage and so on. But full-fledged battery model is likely not subject
|
||||
for kernel at all, as it would require floating point calculation to deal
|
||||
with things like differential equations and Kalman filters. This is
|
||||
better be handled by batteryd/libbattery, yet to be written.
|
257
Documentation/power/powercap/powercap.rst
Normal file
257
Documentation/power/powercap/powercap.rst
Normal file
@ -0,0 +1,257 @@
|
||||
=======================
|
||||
Power Capping Framework
|
||||
=======================
|
||||
|
||||
The power capping framework provides a consistent interface between the kernel
|
||||
and the user space that allows power capping drivers to expose the settings to
|
||||
user space in a uniform way.
|
||||
|
||||
Terminology
|
||||
===========
|
||||
|
||||
The framework exposes power capping devices to user space via sysfs in the
|
||||
form of a tree of objects. The objects at the root level of the tree represent
|
||||
'control types', which correspond to different methods of power capping. For
|
||||
example, the intel-rapl control type represents the Intel "Running Average
|
||||
Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
|
||||
corresponds to the use of idle injection for controlling power.
|
||||
|
||||
Power zones represent different parts of the system, which can be controlled and
|
||||
monitored using the power capping method determined by the control type the
|
||||
given zone belongs to. They each contain attributes for monitoring power, as
|
||||
well as controls represented in the form of power constraints. If the parts of
|
||||
the system represented by different power zones are hierarchical (that is, one
|
||||
bigger part consists of multiple smaller parts that each have their own power
|
||||
controls), those power zones may also be organized in a hierarchy with one
|
||||
parent power zone containing multiple subzones and so on to reflect the power
|
||||
control topology of the system. In that case, it is possible to apply power
|
||||
capping to a set of devices together using the parent power zone and if more
|
||||
fine grained control is required, it can be applied through the subzones.
|
||||
|
||||
|
||||
Example sysfs interface tree::
|
||||
|
||||
/sys/devices/virtual/powercap
|
||||
└──intel-rapl
|
||||
├──intel-rapl:0
|
||||
│ ├──constraint_0_name
|
||||
│ ├──constraint_0_power_limit_uw
|
||||
│ ├──constraint_0_time_window_us
|
||||
│ ├──constraint_1_name
|
||||
│ ├──constraint_1_power_limit_uw
|
||||
│ ├──constraint_1_time_window_us
|
||||
│ ├──device -> ../../intel-rapl
|
||||
│ ├──energy_uj
|
||||
│ ├──intel-rapl:0:0
|
||||
│ │ ├──constraint_0_name
|
||||
│ │ ├──constraint_0_power_limit_uw
|
||||
│ │ ├──constraint_0_time_window_us
|
||||
│ │ ├──constraint_1_name
|
||||
│ │ ├──constraint_1_power_limit_uw
|
||||
│ │ ├──constraint_1_time_window_us
|
||||
│ │ ├──device -> ../../intel-rapl:0
|
||||
│ │ ├──energy_uj
|
||||
│ │ ├──max_energy_range_uj
|
||||
│ │ ├──name
|
||||
│ │ ├──enabled
|
||||
│ │ ├──power
|
||||
│ │ │ ├──async
|
||||
│ │ │ []
|
||||
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||
│ │ └──uevent
|
||||
│ ├──intel-rapl:0:1
|
||||
│ │ ├──constraint_0_name
|
||||
│ │ ├──constraint_0_power_limit_uw
|
||||
│ │ ├──constraint_0_time_window_us
|
||||
│ │ ├──constraint_1_name
|
||||
│ │ ├──constraint_1_power_limit_uw
|
||||
│ │ ├──constraint_1_time_window_us
|
||||
│ │ ├──device -> ../../intel-rapl:0
|
||||
│ │ ├──energy_uj
|
||||
│ │ ├──max_energy_range_uj
|
||||
│ │ ├──name
|
||||
│ │ ├──enabled
|
||||
│ │ ├──power
|
||||
│ │ │ ├──async
|
||||
│ │ │ []
|
||||
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||
│ │ └──uevent
|
||||
│ ├──max_energy_range_uj
|
||||
│ ├──max_power_range_uw
|
||||
│ ├──name
|
||||
│ ├──enabled
|
||||
│ ├──power
|
||||
│ │ ├──async
|
||||
│ │ []
|
||||
│ ├──subsystem -> ../../../../../class/power_cap
|
||||
│ ├──enabled
|
||||
│ ├──uevent
|
||||
├──intel-rapl:1
|
||||
│ ├──constraint_0_name
|
||||
│ ├──constraint_0_power_limit_uw
|
||||
│ ├──constraint_0_time_window_us
|
||||
│ ├──constraint_1_name
|
||||
│ ├──constraint_1_power_limit_uw
|
||||
│ ├──constraint_1_time_window_us
|
||||
│ ├──device -> ../../intel-rapl
|
||||
│ ├──energy_uj
|
||||
│ ├──intel-rapl:1:0
|
||||
│ │ ├──constraint_0_name
|
||||
│ │ ├──constraint_0_power_limit_uw
|
||||
│ │ ├──constraint_0_time_window_us
|
||||
│ │ ├──constraint_1_name
|
||||
│ │ ├──constraint_1_power_limit_uw
|
||||
│ │ ├──constraint_1_time_window_us
|
||||
│ │ ├──device -> ../../intel-rapl:1
|
||||
│ │ ├──energy_uj
|
||||
│ │ ├──max_energy_range_uj
|
||||
│ │ ├──name
|
||||
│ │ ├──enabled
|
||||
│ │ ├──power
|
||||
│ │ │ ├──async
|
||||
│ │ │ []
|
||||
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||
│ │ └──uevent
|
||||
│ ├──intel-rapl:1:1
|
||||
│ │ ├──constraint_0_name
|
||||
│ │ ├──constraint_0_power_limit_uw
|
||||
│ │ ├──constraint_0_time_window_us
|
||||
│ │ ├──constraint_1_name
|
||||
│ │ ├──constraint_1_power_limit_uw
|
||||
│ │ ├──constraint_1_time_window_us
|
||||
│ │ ├──device -> ../../intel-rapl:1
|
||||
│ │ ├──energy_uj
|
||||
│ │ ├──max_energy_range_uj
|
||||
│ │ ├──name
|
||||
│ │ ├──enabled
|
||||
│ │ ├──power
|
||||
│ │ │ ├──async
|
||||
│ │ │ []
|
||||
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||
│ │ └──uevent
|
||||
│ ├──max_energy_range_uj
|
||||
│ ├──max_power_range_uw
|
||||
│ ├──name
|
||||
│ ├──enabled
|
||||
│ ├──power
|
||||
│ │ ├──async
|
||||
│ │ []
|
||||
│ ├──subsystem -> ../../../../../class/power_cap
|
||||
│ ├──uevent
|
||||
├──power
|
||||
│ ├──async
|
||||
│ []
|
||||
├──subsystem -> ../../../../class/power_cap
|
||||
├──enabled
|
||||
└──uevent
|
||||
|
||||
The above example illustrates a case in which the Intel RAPL technology,
|
||||
available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one
|
||||
control type called intel-rapl which contains two power zones, intel-rapl:0 and
|
||||
intel-rapl:1, representing CPU packages. Each of these power zones contains
|
||||
two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
|
||||
"core" and the "uncore" parts of the given CPU package, respectively. All of
|
||||
the zones and subzones contain energy monitoring attributes (energy_uj,
|
||||
max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
|
||||
to be applied (the constraints in the 'package' power zones apply to the whole
|
||||
CPU packages and the subzone constraints only apply to the respective parts of
|
||||
the given package individually). Since Intel RAPL doesn't provide instantaneous
|
||||
power value, there is no power_uw attribute.
|
||||
|
||||
In addition to that, each power zone contains a name attribute, allowing the
|
||||
part of the system represented by that zone to be identified.
|
||||
For example::
|
||||
|
||||
cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name
|
||||
|
||||
package-0
|
||||
---------
|
||||
|
||||
The Intel RAPL technology allows two constraints, short term and long term,
|
||||
with two different time windows to be applied to each power zone. Thus for
|
||||
each zone there are 2 attributes representing the constraint names, 2 power
|
||||
limits and 2 attributes representing the sizes of the time windows. Such that,
|
||||
constraint_j_* attributes correspond to the jth constraint (j = 0,1).
|
||||
|
||||
For example::
|
||||
|
||||
constraint_0_name
|
||||
constraint_0_power_limit_uw
|
||||
constraint_0_time_window_us
|
||||
constraint_1_name
|
||||
constraint_1_power_limit_uw
|
||||
constraint_1_time_window_us
|
||||
|
||||
Power Zone Attributes
|
||||
=====================
|
||||
|
||||
Monitoring attributes
|
||||
---------------------
|
||||
|
||||
energy_uj (rw)
|
||||
Current energy counter in micro joules. Write "0" to reset.
|
||||
If the counter can not be reset, then this attribute is read only.
|
||||
|
||||
max_energy_range_uj (ro)
|
||||
Range of the above energy counter in micro-joules.
|
||||
|
||||
power_uw (ro)
|
||||
Current power in micro watts.
|
||||
|
||||
max_power_range_uw (ro)
|
||||
Range of the above power value in micro-watts.
|
||||
|
||||
name (ro)
|
||||
Name of this power zone.
|
||||
|
||||
It is possible that some domains have both power ranges and energy counter ranges;
|
||||
however, only one is mandatory.
|
||||
|
||||
Constraints
|
||||
-----------
|
||||
|
||||
constraint_X_power_limit_uw (rw)
|
||||
Power limit in micro watts, which should be applicable for the
|
||||
time window specified by "constraint_X_time_window_us".
|
||||
|
||||
constraint_X_time_window_us (rw)
|
||||
Time window in micro seconds.
|
||||
|
||||
constraint_X_name (ro)
|
||||
An optional name of the constraint
|
||||
|
||||
constraint_X_max_power_uw(ro)
|
||||
Maximum allowed power in micro watts.
|
||||
|
||||
constraint_X_min_power_uw(ro)
|
||||
Minimum allowed power in micro watts.
|
||||
|
||||
constraint_X_max_time_window_us(ro)
|
||||
Maximum allowed time window in micro seconds.
|
||||
|
||||
constraint_X_min_time_window_us(ro)
|
||||
Minimum allowed time window in micro seconds.
|
||||
|
||||
Except power_limit_uw and time_window_us other fields are optional.
|
||||
|
||||
Common zone and control type attributes
|
||||
---------------------------------------
|
||||
|
||||
enabled (rw): Enable/Disable controls at zone level or for all zones using
|
||||
a control type.
|
||||
|
||||
Power Cap Client Driver Interface
|
||||
=================================
|
||||
|
||||
The API summary:
|
||||
|
||||
Call powercap_register_control_type() to register control type object.
|
||||
Call powercap_register_zone() to register a power zone (under a given
|
||||
control type), either as a top-level power zone or as a subzone of another
|
||||
power zone registered earlier.
|
||||
The number of constraints in a power zone and the corresponding callbacks have
|
||||
to be defined prior to calling powercap_register_zone() to register that zone.
|
||||
|
||||
To Free a power zone call powercap_unregister_zone().
|
||||
To free a control type object call powercap_unregister_control_type().
|
||||
Detailed API can be generated using kernel-doc on include/linux/powercap.h.
|
@ -1,236 +0,0 @@
|
||||
Power Capping Framework
|
||||
==================================
|
||||
|
||||
The power capping framework provides a consistent interface between the kernel
|
||||
and the user space that allows power capping drivers to expose the settings to
|
||||
user space in a uniform way.
|
||||
|
||||
Terminology
|
||||
=========================
|
||||
The framework exposes power capping devices to user space via sysfs in the
|
||||
form of a tree of objects. The objects at the root level of the tree represent
|
||||
'control types', which correspond to different methods of power capping. For
|
||||
example, the intel-rapl control type represents the Intel "Running Average
|
||||
Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
|
||||
corresponds to the use of idle injection for controlling power.
|
||||
|
||||
Power zones represent different parts of the system, which can be controlled and
|
||||
monitored using the power capping method determined by the control type the
|
||||
given zone belongs to. They each contain attributes for monitoring power, as
|
||||
well as controls represented in the form of power constraints. If the parts of
|
||||
the system represented by different power zones are hierarchical (that is, one
|
||||
bigger part consists of multiple smaller parts that each have their own power
|
||||
controls), those power zones may also be organized in a hierarchy with one
|
||||
parent power zone containing multiple subzones and so on to reflect the power
|
||||
control topology of the system. In that case, it is possible to apply power
|
||||
capping to a set of devices together using the parent power zone and if more
|
||||
fine grained control is required, it can be applied through the subzones.
|
||||
|
||||
|
||||
Example sysfs interface tree:
|
||||
|
||||
/sys/devices/virtual/powercap
|
||||
??? intel-rapl
|
||||
??? intel-rapl:0
|
||||
? ??? constraint_0_name
|
||||
? ??? constraint_0_power_limit_uw
|
||||
? ??? constraint_0_time_window_us
|
||||
? ??? constraint_1_name
|
||||
? ??? constraint_1_power_limit_uw
|
||||
? ??? constraint_1_time_window_us
|
||||
? ??? device -> ../../intel-rapl
|
||||
? ??? energy_uj
|
||||
? ??? intel-rapl:0:0
|
||||
? ? ??? constraint_0_name
|
||||
? ? ??? constraint_0_power_limit_uw
|
||||
? ? ??? constraint_0_time_window_us
|
||||
? ? ??? constraint_1_name
|
||||
? ? ??? constraint_1_power_limit_uw
|
||||
? ? ??? constraint_1_time_window_us
|
||||
? ? ??? device -> ../../intel-rapl:0
|
||||
? ? ??? energy_uj
|
||||
? ? ??? max_energy_range_uj
|
||||
? ? ??? name
|
||||
? ? ??? enabled
|
||||
? ? ??? power
|
||||
? ? ? ??? async
|
||||
? ? ? []
|
||||
? ? ??? subsystem -> ../../../../../../class/power_cap
|
||||
? ? ??? uevent
|
||||
? ??? intel-rapl:0:1
|
||||
? ? ??? constraint_0_name
|
||||
? ? ??? constraint_0_power_limit_uw
|
||||
? ? ??? constraint_0_time_window_us
|
||||
? ? ??? constraint_1_name
|
||||
? ? ??? constraint_1_power_limit_uw
|
||||
? ? ??? constraint_1_time_window_us
|
||||
? ? ??? device -> ../../intel-rapl:0
|
||||
? ? ??? energy_uj
|
||||
? ? ??? max_energy_range_uj
|
||||
? ? ??? name
|
||||
? ? ??? enabled
|
||||
? ? ??? power
|
||||
? ? ? ??? async
|
||||
? ? ? []
|
||||
? ? ??? subsystem -> ../../../../../../class/power_cap
|
||||
? ? ??? uevent
|
||||
? ??? max_energy_range_uj
|
||||
? ??? max_power_range_uw
|
||||
? ??? name
|
||||
? ??? enabled
|
||||
? ??? power
|
||||
? ? ??? async
|
||||
? ? []
|
||||
? ??? subsystem -> ../../../../../class/power_cap
|
||||
? ??? enabled
|
||||
? ??? uevent
|
||||
??? intel-rapl:1
|
||||
? ??? constraint_0_name
|
||||
? ??? constraint_0_power_limit_uw
|
||||
? ??? constraint_0_time_window_us
|
||||
? ??? constraint_1_name
|
||||
? ??? constraint_1_power_limit_uw
|
||||
? ??? constraint_1_time_window_us
|
||||
? ??? device -> ../../intel-rapl
|
||||
? ??? energy_uj
|
||||
? ??? intel-rapl:1:0
|
||||
? ? ??? constraint_0_name
|
||||
? ? ??? constraint_0_power_limit_uw
|
||||
? ? ??? constraint_0_time_window_us
|
||||
? ? ??? constraint_1_name
|
||||
? ? ??? constraint_1_power_limit_uw
|
||||
? ? ??? constraint_1_time_window_us
|
||||
? ? ??? device -> ../../intel-rapl:1
|
||||
? ? ??? energy_uj
|
||||
? ? ??? max_energy_range_uj
|
||||
? ? ??? name
|
||||
? ? ??? enabled
|
||||
? ? ??? power
|
||||
? ? ? ??? async
|
||||
? ? ? []
|
||||
? ? ??? subsystem -> ../../../../../../class/power_cap
|
||||
? ? ??? uevent
|
||||
? ??? intel-rapl:1:1
|
||||
? ? ??? constraint_0_name
|
||||
? ? ??? constraint_0_power_limit_uw
|
||||
? ? ??? constraint_0_time_window_us
|
||||
? ? ??? constraint_1_name
|
||||
? ? ??? constraint_1_power_limit_uw
|
||||
? ? ??? constraint_1_time_window_us
|
||||
? ? ??? device -> ../../intel-rapl:1
|
||||
? ? ??? energy_uj
|
||||
? ? ??? max_energy_range_uj
|
||||
? ? ??? name
|
||||
? ? ??? enabled
|
||||
? ? ??? power
|
||||
? ? ? ??? async
|
||||
? ? ? []
|
||||
? ? ??? subsystem -> ../../../../../../class/power_cap
|
||||
? ? ??? uevent
|
||||
? ??? max_energy_range_uj
|
||||
? ??? max_power_range_uw
|
||||
? ??? name
|
||||
? ??? enabled
|
||||
? ??? power
|
||||
? ? ??? async
|
||||
? ? []
|
||||
? ??? subsystem -> ../../../../../class/power_cap
|
||||
? ??? uevent
|
||||
??? power
|
||||
? ??? async
|
||||
? []
|
||||
??? subsystem -> ../../../../class/power_cap
|
||||
??? enabled
|
||||
??? uevent
|
||||
|
||||
The above example illustrates a case in which the Intel RAPL technology,
|
||||
available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one
|
||||
control type called intel-rapl which contains two power zones, intel-rapl:0 and
|
||||
intel-rapl:1, representing CPU packages. Each of these power zones contains
|
||||
two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
|
||||
"core" and the "uncore" parts of the given CPU package, respectively. All of
|
||||
the zones and subzones contain energy monitoring attributes (energy_uj,
|
||||
max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
|
||||
to be applied (the constraints in the 'package' power zones apply to the whole
|
||||
CPU packages and the subzone constraints only apply to the respective parts of
|
||||
the given package individually). Since Intel RAPL doesn't provide instantaneous
|
||||
power value, there is no power_uw attribute.
|
||||
|
||||
In addition to that, each power zone contains a name attribute, allowing the
|
||||
part of the system represented by that zone to be identified.
|
||||
For example:
|
||||
|
||||
cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name
|
||||
package-0
|
||||
|
||||
The Intel RAPL technology allows two constraints, short term and long term,
|
||||
with two different time windows to be applied to each power zone. Thus for
|
||||
each zone there are 2 attributes representing the constraint names, 2 power
|
||||
limits and 2 attributes representing the sizes of the time windows. Such that,
|
||||
constraint_j_* attributes correspond to the jth constraint (j = 0,1).
|
||||
|
||||
For example:
|
||||
constraint_0_name
|
||||
constraint_0_power_limit_uw
|
||||
constraint_0_time_window_us
|
||||
constraint_1_name
|
||||
constraint_1_power_limit_uw
|
||||
constraint_1_time_window_us
|
||||
|
||||
Power Zone Attributes
|
||||
=================================
|
||||
Monitoring attributes
|
||||
----------------------
|
||||
|
||||
energy_uj (rw): Current energy counter in micro joules. Write "0" to reset.
|
||||
If the counter can not be reset, then this attribute is read only.
|
||||
|
||||
max_energy_range_uj (ro): Range of the above energy counter in micro-joules.
|
||||
|
||||
power_uw (ro): Current power in micro watts.
|
||||
|
||||
max_power_range_uw (ro): Range of the above power value in micro-watts.
|
||||
|
||||
name (ro): Name of this power zone.
|
||||
|
||||
It is possible that some domains have both power ranges and energy counter ranges;
|
||||
however, only one is mandatory.
|
||||
|
||||
Constraints
|
||||
----------------
|
||||
constraint_X_power_limit_uw (rw): Power limit in micro watts, which should be
|
||||
applicable for the time window specified by "constraint_X_time_window_us".
|
||||
|
||||
constraint_X_time_window_us (rw): Time window in micro seconds.
|
||||
|
||||
constraint_X_name (ro): An optional name of the constraint
|
||||
|
||||
constraint_X_max_power_uw(ro): Maximum allowed power in micro watts.
|
||||
|
||||
constraint_X_min_power_uw(ro): Minimum allowed power in micro watts.
|
||||
|
||||
constraint_X_max_time_window_us(ro): Maximum allowed time window in micro seconds.
|
||||
|
||||
constraint_X_min_time_window_us(ro): Minimum allowed time window in micro seconds.
|
||||
|
||||
Except power_limit_uw and time_window_us other fields are optional.
|
||||
|
||||
Common zone and control type attributes
|
||||
----------------------------------------
|
||||
enabled (rw): Enable/Disable controls at zone level or for all zones using
|
||||
a control type.
|
||||
|
||||
Power Cap Client Driver Interface
|
||||
==================================
|
||||
The API summary:
|
||||
|
||||
Call powercap_register_control_type() to register control type object.
|
||||
Call powercap_register_zone() to register a power zone (under a given
|
||||
control type), either as a top-level power zone or as a subzone of another
|
||||
power zone registered earlier.
|
||||
The number of constraints in a power zone and the corresponding callbacks have
|
||||
to be defined prior to calling powercap_register_zone() to register that zone.
|
||||
|
||||
To Free a power zone call powercap_unregister_zone().
|
||||
To free a control type object call powercap_unregister_control_type().
|
||||
Detailed API can be generated using kernel-doc on include/linux/powercap.h.
|
@ -1,3 +1,4 @@
|
||||
===================================
|
||||
Regulator Consumer Driver Interface
|
||||
===================================
|
||||
|
||||
@ -8,73 +9,77 @@ Please see overview.txt for a description of the terms used in this text.
|
||||
1. Consumer Regulator Access (static & dynamic drivers)
|
||||
=======================================================
|
||||
|
||||
A consumer driver can get access to its supply regulator by calling :-
|
||||
A consumer driver can get access to its supply regulator by calling ::
|
||||
|
||||
regulator = regulator_get(dev, "Vcc");
|
||||
regulator = regulator_get(dev, "Vcc");
|
||||
|
||||
The consumer passes in its struct device pointer and power supply ID. The core
|
||||
then finds the correct regulator by consulting a machine specific lookup table.
|
||||
If the lookup is successful then this call will return a pointer to the struct
|
||||
regulator that supplies this consumer.
|
||||
|
||||
To release the regulator the consumer driver should call :-
|
||||
To release the regulator the consumer driver should call ::
|
||||
|
||||
regulator_put(regulator);
|
||||
regulator_put(regulator);
|
||||
|
||||
Consumers can be supplied by more than one regulator e.g. codec consumer with
|
||||
analog and digital supplies :-
|
||||
analog and digital supplies ::
|
||||
|
||||
digital = regulator_get(dev, "Vcc"); /* digital core */
|
||||
analog = regulator_get(dev, "Avdd"); /* analog */
|
||||
digital = regulator_get(dev, "Vcc"); /* digital core */
|
||||
analog = regulator_get(dev, "Avdd"); /* analog */
|
||||
|
||||
The regulator access functions regulator_get() and regulator_put() will
|
||||
usually be called in your device drivers probe() and remove() respectively.
|
||||
|
||||
|
||||
2. Regulator Output Enable & Disable (static & dynamic drivers)
|
||||
====================================================================
|
||||
===============================================================
|
||||
|
||||
A consumer can enable its power supply by calling:-
|
||||
|
||||
int regulator_enable(regulator);
|
||||
A consumer can enable its power supply by calling::
|
||||
|
||||
NOTE: The supply may already be enabled before regulator_enabled() is called.
|
||||
This may happen if the consumer shares the regulator or the regulator has been
|
||||
previously enabled by bootloader or kernel board initialization code.
|
||||
int regulator_enable(regulator);
|
||||
|
||||
A consumer can determine if a regulator is enabled by calling :-
|
||||
NOTE:
|
||||
The supply may already be enabled before regulator_enabled() is called.
|
||||
This may happen if the consumer shares the regulator or the regulator has been
|
||||
previously enabled by bootloader or kernel board initialization code.
|
||||
|
||||
int regulator_is_enabled(regulator);
|
||||
A consumer can determine if a regulator is enabled by calling::
|
||||
|
||||
int regulator_is_enabled(regulator);
|
||||
|
||||
This will return > zero when the regulator is enabled.
|
||||
|
||||
|
||||
A consumer can disable its supply when no longer needed by calling :-
|
||||
A consumer can disable its supply when no longer needed by calling::
|
||||
|
||||
int regulator_disable(regulator);
|
||||
int regulator_disable(regulator);
|
||||
|
||||
NOTE: This may not disable the supply if it's shared with other consumers. The
|
||||
regulator will only be disabled when the enabled reference count is zero.
|
||||
NOTE:
|
||||
This may not disable the supply if it's shared with other consumers. The
|
||||
regulator will only be disabled when the enabled reference count is zero.
|
||||
|
||||
Finally, a regulator can be forcefully disabled in the case of an emergency :-
|
||||
Finally, a regulator can be forcefully disabled in the case of an emergency::
|
||||
|
||||
int regulator_force_disable(regulator);
|
||||
int regulator_force_disable(regulator);
|
||||
|
||||
NOTE: this will immediately and forcefully shutdown the regulator output. All
|
||||
consumers will be powered off.
|
||||
NOTE:
|
||||
this will immediately and forcefully shutdown the regulator output. All
|
||||
consumers will be powered off.
|
||||
|
||||
|
||||
3. Regulator Voltage Control & Status (dynamic drivers)
|
||||
======================================================
|
||||
=======================================================
|
||||
|
||||
Some consumer drivers need to be able to dynamically change their supply
|
||||
voltage to match system operating points. e.g. CPUfreq drivers can scale
|
||||
voltage along with frequency to save power, SD drivers may need to select the
|
||||
correct card voltage, etc.
|
||||
|
||||
Consumers can control their supply voltage by calling :-
|
||||
Consumers can control their supply voltage by calling::
|
||||
|
||||
int regulator_set_voltage(regulator, min_uV, max_uV);
|
||||
int regulator_set_voltage(regulator, min_uV, max_uV);
|
||||
|
||||
Where min_uV and max_uV are the minimum and maximum acceptable voltages in
|
||||
microvolts.
|
||||
@ -84,47 +89,50 @@ when enabled, then the voltage changes instantly, otherwise the voltage
|
||||
configuration changes and the voltage is physically set when the regulator is
|
||||
next enabled.
|
||||
|
||||
The regulators configured voltage output can be found by calling :-
|
||||
The regulators configured voltage output can be found by calling::
|
||||
|
||||
int regulator_get_voltage(regulator);
|
||||
int regulator_get_voltage(regulator);
|
||||
|
||||
NOTE: get_voltage() will return the configured output voltage whether the
|
||||
regulator is enabled or disabled and should NOT be used to determine regulator
|
||||
output state. However this can be used in conjunction with is_enabled() to
|
||||
determine the regulator physical output voltage.
|
||||
NOTE:
|
||||
get_voltage() will return the configured output voltage whether the
|
||||
regulator is enabled or disabled and should NOT be used to determine regulator
|
||||
output state. However this can be used in conjunction with is_enabled() to
|
||||
determine the regulator physical output voltage.
|
||||
|
||||
|
||||
4. Regulator Current Limit Control & Status (dynamic drivers)
|
||||
===========================================================
|
||||
=============================================================
|
||||
|
||||
Some consumer drivers need to be able to dynamically change their supply
|
||||
current limit to match system operating points. e.g. LCD backlight driver can
|
||||
change the current limit to vary the backlight brightness, USB drivers may want
|
||||
to set the limit to 500mA when supplying power.
|
||||
|
||||
Consumers can control their supply current limit by calling :-
|
||||
Consumers can control their supply current limit by calling::
|
||||
|
||||
int regulator_set_current_limit(regulator, min_uA, max_uA);
|
||||
int regulator_set_current_limit(regulator, min_uA, max_uA);
|
||||
|
||||
Where min_uA and max_uA are the minimum and maximum acceptable current limit in
|
||||
microamps.
|
||||
|
||||
NOTE: this can be called when the regulator is enabled or disabled. If called
|
||||
when enabled, then the current limit changes instantly, otherwise the current
|
||||
limit configuration changes and the current limit is physically set when the
|
||||
regulator is next enabled.
|
||||
NOTE:
|
||||
this can be called when the regulator is enabled or disabled. If called
|
||||
when enabled, then the current limit changes instantly, otherwise the current
|
||||
limit configuration changes and the current limit is physically set when the
|
||||
regulator is next enabled.
|
||||
|
||||
A regulators current limit can be found by calling :-
|
||||
A regulators current limit can be found by calling::
|
||||
|
||||
int regulator_get_current_limit(regulator);
|
||||
int regulator_get_current_limit(regulator);
|
||||
|
||||
NOTE: get_current_limit() will return the current limit whether the regulator
|
||||
is enabled or disabled and should not be used to determine regulator current
|
||||
load.
|
||||
NOTE:
|
||||
get_current_limit() will return the current limit whether the regulator
|
||||
is enabled or disabled and should not be used to determine regulator current
|
||||
load.
|
||||
|
||||
|
||||
5. Regulator Operating Mode Control & Status (dynamic drivers)
|
||||
=============================================================
|
||||
==============================================================
|
||||
|
||||
Some consumers can further save system power by changing the operating mode of
|
||||
their supply regulator to be more efficient when the consumers operating state
|
||||
@ -135,9 +143,9 @@ Regulator operating mode can be changed indirectly or directly.
|
||||
Indirect operating mode control.
|
||||
--------------------------------
|
||||
Consumer drivers can request a change in their supply regulator operating mode
|
||||
by calling :-
|
||||
by calling::
|
||||
|
||||
int regulator_set_load(struct regulator *regulator, int load_uA);
|
||||
int regulator_set_load(struct regulator *regulator, int load_uA);
|
||||
|
||||
This will cause the core to recalculate the total load on the regulator (based
|
||||
on all its consumers) and change operating mode (if necessary and permitted)
|
||||
@ -153,12 +161,13 @@ consumers.
|
||||
|
||||
Direct operating mode control.
|
||||
------------------------------
|
||||
|
||||
Bespoke or tightly coupled drivers may want to directly control regulator
|
||||
operating mode depending on their operating point. This can be achieved by
|
||||
calling :-
|
||||
calling::
|
||||
|
||||
int regulator_set_mode(struct regulator *regulator, unsigned int mode);
|
||||
unsigned int regulator_get_mode(struct regulator *regulator);
|
||||
int regulator_set_mode(struct regulator *regulator, unsigned int mode);
|
||||
unsigned int regulator_get_mode(struct regulator *regulator);
|
||||
|
||||
Direct mode will only be used by consumers that *know* about the regulator and
|
||||
are not sharing the regulator with other consumers.
|
||||
@ -166,24 +175,26 @@ are not sharing the regulator with other consumers.
|
||||
|
||||
6. Regulator Events
|
||||
===================
|
||||
|
||||
Regulators can notify consumers of external events. Events could be received by
|
||||
consumers under regulator stress or failure conditions.
|
||||
|
||||
Consumers can register interest in regulator events by calling :-
|
||||
Consumers can register interest in regulator events by calling::
|
||||
|
||||
int regulator_register_notifier(struct regulator *regulator,
|
||||
struct notifier_block *nb);
|
||||
int regulator_register_notifier(struct regulator *regulator,
|
||||
struct notifier_block *nb);
|
||||
|
||||
Consumers can unregister interest by calling :-
|
||||
Consumers can unregister interest by calling::
|
||||
|
||||
int regulator_unregister_notifier(struct regulator *regulator,
|
||||
struct notifier_block *nb);
|
||||
int regulator_unregister_notifier(struct regulator *regulator,
|
||||
struct notifier_block *nb);
|
||||
|
||||
Regulators use the kernel notifier framework to send event to their interested
|
||||
consumers.
|
||||
|
||||
7. Regulator Direct Register Access
|
||||
===================================
|
||||
|
||||
Some kinds of power management hardware or firmware are designed such that
|
||||
they need to do low-level hardware access to regulators, with no involvement
|
||||
from the kernel. Examples of such devices are:
|
||||
@ -199,20 +210,20 @@ to it. The regulator framework provides the following helpers for querying
|
||||
these details.
|
||||
|
||||
Bus-specific details, like I2C addresses or transfer rates are handled by the
|
||||
regmap framework. To get the regulator's regmap (if supported), use :-
|
||||
regmap framework. To get the regulator's regmap (if supported), use::
|
||||
|
||||
struct regmap *regulator_get_regmap(struct regulator *regulator);
|
||||
struct regmap *regulator_get_regmap(struct regulator *regulator);
|
||||
|
||||
To obtain the hardware register offset and bitmask for the regulator's voltage
|
||||
selector register, use :-
|
||||
selector register, use::
|
||||
|
||||
int regulator_get_hardware_vsel_register(struct regulator *regulator,
|
||||
unsigned *vsel_reg,
|
||||
unsigned *vsel_mask);
|
||||
int regulator_get_hardware_vsel_register(struct regulator *regulator,
|
||||
unsigned *vsel_reg,
|
||||
unsigned *vsel_mask);
|
||||
|
||||
To convert a regulator framework voltage selector code (used by
|
||||
regulator_list_voltage) to a hardware-specific voltage selector that can be
|
||||
directly written to the voltage selector register, use :-
|
||||
directly written to the voltage selector register, use::
|
||||
|
||||
int regulator_list_hardware_vsel(struct regulator *regulator,
|
||||
unsigned selector);
|
||||
int regulator_list_hardware_vsel(struct regulator *regulator,
|
||||
unsigned selector);
|
@ -1,3 +1,4 @@
|
||||
==========================
|
||||
Regulator API design notes
|
||||
==========================
|
||||
|
||||
@ -14,7 +15,9 @@ Safety
|
||||
have different power requirements, and not all components with power
|
||||
requirements are visible to software.
|
||||
|
||||
=> The API should make no changes to the hardware state unless it has
|
||||
.. note::
|
||||
|
||||
The API should make no changes to the hardware state unless it has
|
||||
specific knowledge that these changes are safe to perform on this
|
||||
particular system.
|
||||
|
||||
@ -28,6 +31,8 @@ Consumer use cases
|
||||
- Many of the power supplies in the system will be shared between many
|
||||
different consumers.
|
||||
|
||||
=> The consumer API should be structured so that these use cases are
|
||||
.. note::
|
||||
|
||||
The consumer API should be structured so that these use cases are
|
||||
very easy to handle and so that consumers will work with shared
|
||||
supplies without any additional effort.
|
@ -1,10 +1,11 @@
|
||||
==================================
|
||||
Regulator Machine Driver Interface
|
||||
===================================
|
||||
==================================
|
||||
|
||||
The regulator machine driver interface is intended for board/machine specific
|
||||
initialisation code to configure the regulator subsystem.
|
||||
|
||||
Consider the following machine :-
|
||||
Consider the following machine::
|
||||
|
||||
Regulator-1 -+-> Regulator-2 --> [Consumer A @ 1.8 - 2.0V]
|
||||
|
|
||||
@ -13,31 +14,31 @@ Consider the following machine :-
|
||||
The drivers for consumers A & B must be mapped to the correct regulator in
|
||||
order to control their power supplies. This mapping can be achieved in machine
|
||||
initialisation code by creating a struct regulator_consumer_supply for
|
||||
each regulator.
|
||||
each regulator::
|
||||
|
||||
struct regulator_consumer_supply {
|
||||
struct regulator_consumer_supply {
|
||||
const char *dev_name; /* consumer dev_name() */
|
||||
const char *supply; /* consumer supply - e.g. "vcc" */
|
||||
};
|
||||
};
|
||||
|
||||
e.g. for the machine above
|
||||
e.g. for the machine above::
|
||||
|
||||
static struct regulator_consumer_supply regulator1_consumers[] = {
|
||||
static struct regulator_consumer_supply regulator1_consumers[] = {
|
||||
REGULATOR_SUPPLY("Vcc", "consumer B"),
|
||||
};
|
||||
};
|
||||
|
||||
static struct regulator_consumer_supply regulator2_consumers[] = {
|
||||
static struct regulator_consumer_supply regulator2_consumers[] = {
|
||||
REGULATOR_SUPPLY("Vcc", "consumer A"),
|
||||
};
|
||||
};
|
||||
|
||||
This maps Regulator-1 to the 'Vcc' supply for Consumer B and maps Regulator-2
|
||||
to the 'Vcc' supply for Consumer A.
|
||||
|
||||
Constraints can now be registered by defining a struct regulator_init_data
|
||||
for each regulator power domain. This structure also maps the consumers
|
||||
to their supply regulators :-
|
||||
to their supply regulators::
|
||||
|
||||
static struct regulator_init_data regulator1_data = {
|
||||
static struct regulator_init_data regulator1_data = {
|
||||
.constraints = {
|
||||
.name = "Regulator-1",
|
||||
.min_uV = 3300000,
|
||||
@ -46,7 +47,7 @@ static struct regulator_init_data regulator1_data = {
|
||||
},
|
||||
.num_consumer_supplies = ARRAY_SIZE(regulator1_consumers),
|
||||
.consumer_supplies = regulator1_consumers,
|
||||
};
|
||||
};
|
||||
|
||||
The name field should be set to something that is usefully descriptive
|
||||
for the board for configuration of supplies for other regulators and
|
||||
@ -57,9 +58,9 @@ name is provided then the subsystem will choose one.
|
||||
Regulator-1 supplies power to Regulator-2. This relationship must be registered
|
||||
with the core so that Regulator-1 is also enabled when Consumer A enables its
|
||||
supply (Regulator-2). The supply regulator is set by the supply_regulator
|
||||
field below and co:-
|
||||
field below and co::
|
||||
|
||||
static struct regulator_init_data regulator2_data = {
|
||||
static struct regulator_init_data regulator2_data = {
|
||||
.supply_regulator = "Regulator-1",
|
||||
.constraints = {
|
||||
.min_uV = 1800000,
|
||||
@ -69,11 +70,11 @@ static struct regulator_init_data regulator2_data = {
|
||||
},
|
||||
.num_consumer_supplies = ARRAY_SIZE(regulator2_consumers),
|
||||
.consumer_supplies = regulator2_consumers,
|
||||
};
|
||||
};
|
||||
|
||||
Finally the regulator devices must be registered in the usual manner.
|
||||
Finally the regulator devices must be registered in the usual manner::
|
||||
|
||||
static struct platform_device regulator_devices[] = {
|
||||
static struct platform_device regulator_devices[] = {
|
||||
{
|
||||
.name = "regulator",
|
||||
.id = DCDC_1,
|
||||
@ -88,9 +89,9 @@ static struct platform_device regulator_devices[] = {
|
||||
.platform_data = ®ulator2_data,
|
||||
},
|
||||
},
|
||||
};
|
||||
/* register regulator 1 device */
|
||||
platform_device_register(®ulator_devices[0]);
|
||||
};
|
||||
/* register regulator 1 device */
|
||||
platform_device_register(®ulator_devices[0]);
|
||||
|
||||
/* register regulator 2 device */
|
||||
platform_device_register(®ulator_devices[1]);
|
||||
/* register regulator 2 device */
|
||||
platform_device_register(®ulator_devices[1]);
|
@ -1,3 +1,4 @@
|
||||
=============================================
|
||||
Linux voltage and current regulator framework
|
||||
=============================================
|
||||
|
||||
@ -13,26 +14,30 @@ regulators (where voltage output is controllable) and current sinks (where
|
||||
current limit is controllable).
|
||||
|
||||
(C) 2008 Wolfson Microelectronics PLC.
|
||||
|
||||
Author: Liam Girdwood <lrg@slimlogic.co.uk>
|
||||
|
||||
|
||||
Nomenclature
|
||||
============
|
||||
|
||||
Some terms used in this document:-
|
||||
Some terms used in this document:
|
||||
|
||||
o Regulator - Electronic device that supplies power to other devices.
|
||||
- Regulator
|
||||
- Electronic device that supplies power to other devices.
|
||||
Most regulators can enable and disable their output while
|
||||
some can control their output voltage and or current.
|
||||
|
||||
Input Voltage -> Regulator -> Output Voltage
|
||||
|
||||
|
||||
o PMIC - Power Management IC. An IC that contains numerous regulators
|
||||
and often contains other subsystems.
|
||||
- PMIC
|
||||
- Power Management IC. An IC that contains numerous
|
||||
regulators and often contains other subsystems.
|
||||
|
||||
|
||||
o Consumer - Electronic device that is supplied power by a regulator.
|
||||
- Consumer
|
||||
- Electronic device that is supplied power by a regulator.
|
||||
Consumers can be classified into two types:-
|
||||
|
||||
Static: consumer does not change its supply voltage or
|
||||
@ -44,46 +49,48 @@ Some terms used in this document:-
|
||||
current limit to meet operation demands.
|
||||
|
||||
|
||||
o Power Domain - Electronic circuit that is supplied its input power by the
|
||||
- Power Domain
|
||||
- Electronic circuit that is supplied its input power by the
|
||||
output power of a regulator, switch or by another power
|
||||
domain.
|
||||
|
||||
The supply regulator may be behind a switch(s). i.e.
|
||||
The supply regulator may be behind a switch(s). i.e.::
|
||||
|
||||
Regulator -+-> Switch-1 -+-> Switch-2 --> [Consumer A]
|
||||
| |
|
||||
| +-> [Consumer B], [Consumer C]
|
||||
|
|
||||
+-> [Consumer D], [Consumer E]
|
||||
Regulator -+-> Switch-1 -+-> Switch-2 --> [Consumer A]
|
||||
| |
|
||||
| +-> [Consumer B], [Consumer C]
|
||||
|
|
||||
+-> [Consumer D], [Consumer E]
|
||||
|
||||
That is one regulator and three power domains:
|
||||
|
||||
Domain 1: Switch-1, Consumers D & E.
|
||||
Domain 2: Switch-2, Consumers B & C.
|
||||
Domain 3: Consumer A.
|
||||
- Domain 1: Switch-1, Consumers D & E.
|
||||
- Domain 2: Switch-2, Consumers B & C.
|
||||
- Domain 3: Consumer A.
|
||||
|
||||
and this represents a "supplies" relationship:
|
||||
|
||||
Domain-1 --> Domain-2 --> Domain-3.
|
||||
|
||||
A power domain may have regulators that are supplied power
|
||||
by other regulators. i.e.
|
||||
by other regulators. i.e.::
|
||||
|
||||
Regulator-1 -+-> Regulator-2 -+-> [Consumer A]
|
||||
|
|
||||
+-> [Consumer B]
|
||||
Regulator-1 -+-> Regulator-2 -+-> [Consumer A]
|
||||
|
|
||||
+-> [Consumer B]
|
||||
|
||||
This gives us two regulators and two power domains:
|
||||
|
||||
Domain 1: Regulator-2, Consumer B.
|
||||
Domain 2: Consumer A.
|
||||
- Domain 1: Regulator-2, Consumer B.
|
||||
- Domain 2: Consumer A.
|
||||
|
||||
and a "supplies" relationship:
|
||||
|
||||
Domain-1 --> Domain-2
|
||||
|
||||
|
||||
o Constraints - Constraints are used to define power levels for performance
|
||||
- Constraints
|
||||
- Constraints are used to define power levels for performance
|
||||
and hardware protection. Constraints exist at three levels:
|
||||
|
||||
Regulator Level: This is defined by the regulator hardware
|
||||
@ -141,7 +148,7 @@ relevant to non SoC devices and is split into the following four interfaces:-
|
||||
limit. This also compiles out if not in use so drivers can be reused in
|
||||
systems with no regulator based power control.
|
||||
|
||||
See Documentation/power/regulator/consumer.txt
|
||||
See Documentation/power/regulator/consumer.rst
|
||||
|
||||
2. Regulator driver interface.
|
||||
|
||||
@ -149,7 +156,7 @@ relevant to non SoC devices and is split into the following four interfaces:-
|
||||
operations to the core. It also has a notifier call chain for propagating
|
||||
regulator events to clients.
|
||||
|
||||
See Documentation/power/regulator/regulator.txt
|
||||
See Documentation/power/regulator/regulator.rst
|
||||
|
||||
3. Machine interface.
|
||||
|
||||
@ -160,7 +167,7 @@ relevant to non SoC devices and is split into the following four interfaces:-
|
||||
allows the creation of a regulator tree whereby some regulators are
|
||||
supplied by others (similar to a clock tree).
|
||||
|
||||
See Documentation/power/regulator/machine.txt
|
||||
See Documentation/power/regulator/machine.rst
|
||||
|
||||
4. Userspace ABI.
|
||||
|
32
Documentation/power/regulator/regulator.rst
Normal file
32
Documentation/power/regulator/regulator.rst
Normal file
@ -0,0 +1,32 @@
|
||||
==========================
|
||||
Regulator Driver Interface
|
||||
==========================
|
||||
|
||||
The regulator driver interface is relatively simple and designed to allow
|
||||
regulator drivers to register their services with the core framework.
|
||||
|
||||
|
||||
Registration
|
||||
============
|
||||
|
||||
Drivers can register a regulator by calling::
|
||||
|
||||
struct regulator_dev *regulator_register(struct regulator_desc *regulator_desc,
|
||||
const struct regulator_config *config);
|
||||
|
||||
This will register the regulator's capabilities and operations to the regulator
|
||||
core.
|
||||
|
||||
Regulators can be unregistered by calling::
|
||||
|
||||
void regulator_unregister(struct regulator_dev *rdev);
|
||||
|
||||
|
||||
Regulator Events
|
||||
================
|
||||
|
||||
Regulators can send events (e.g. overtemperature, undervoltage, etc) to
|
||||
consumer drivers by calling::
|
||||
|
||||
int regulator_notifier_call_chain(struct regulator_dev *rdev,
|
||||
unsigned long event, void *data);
|
@ -1,30 +0,0 @@
|
||||
Regulator Driver Interface
|
||||
==========================
|
||||
|
||||
The regulator driver interface is relatively simple and designed to allow
|
||||
regulator drivers to register their services with the core framework.
|
||||
|
||||
|
||||
Registration
|
||||
============
|
||||
|
||||
Drivers can register a regulator by calling :-
|
||||
|
||||
struct regulator_dev *regulator_register(struct regulator_desc *regulator_desc,
|
||||
const struct regulator_config *config);
|
||||
|
||||
This will register the regulator's capabilities and operations to the regulator
|
||||
core.
|
||||
|
||||
Regulators can be unregistered by calling :-
|
||||
|
||||
void regulator_unregister(struct regulator_dev *rdev);
|
||||
|
||||
|
||||
Regulator Events
|
||||
================
|
||||
Regulators can send events (e.g. overtemperature, undervoltage, etc) to
|
||||
consumer drivers by calling :-
|
||||
|
||||
int regulator_notifier_call_chain(struct regulator_dev *rdev,
|
||||
unsigned long event, void *data);
|
@ -1,10 +1,15 @@
|
||||
==================================================
|
||||
Runtime Power Management Framework for I/O Devices
|
||||
==================================================
|
||||
|
||||
(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
|
||||
|
||||
(C) 2010 Alan Stern <stern@rowland.harvard.edu>
|
||||
|
||||
(C) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
|
||||
Support for runtime power management (runtime PM) of I/O devices is provided
|
||||
at the power management core (PM core) level by means of:
|
||||
@ -33,16 +38,17 @@ fields of 'struct dev_pm_info' and the core helper functions provided for
|
||||
runtime PM are described below.
|
||||
|
||||
2. Device Runtime PM Callbacks
|
||||
==============================
|
||||
|
||||
There are three device runtime PM callbacks defined in 'struct dev_pm_ops':
|
||||
There are three device runtime PM callbacks defined in 'struct dev_pm_ops'::
|
||||
|
||||
struct dev_pm_ops {
|
||||
struct dev_pm_ops {
|
||||
...
|
||||
int (*runtime_suspend)(struct device *dev);
|
||||
int (*runtime_resume)(struct device *dev);
|
||||
int (*runtime_idle)(struct device *dev);
|
||||
...
|
||||
};
|
||||
};
|
||||
|
||||
The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
|
||||
are executed by the PM core for the device's subsystem that may be either of
|
||||
@ -112,7 +118,7 @@ low-power state during the execution of the suspend callback, it is expected
|
||||
that remote wakeup will be enabled for the device. Generally, remote wakeup
|
||||
should be enabled for all input devices put into low-power states at run time.
|
||||
|
||||
The subsystem-level resume callback, if present, is _entirely_ _responsible_ for
|
||||
The subsystem-level resume callback, if present, is **entirely responsible** for
|
||||
handling the resume of the device as appropriate, which may, but need not
|
||||
include executing the device driver's own ->runtime_resume() callback (from the
|
||||
PM core's point of view it is not necessary to implement a ->runtime_resume()
|
||||
@ -197,95 +203,96 @@ rules:
|
||||
except for scheduled autosuspends.
|
||||
|
||||
3. Runtime PM Device Fields
|
||||
===========================
|
||||
|
||||
The following device runtime PM fields are present in 'struct dev_pm_info', as
|
||||
defined in include/linux/pm.h:
|
||||
|
||||
struct timer_list suspend_timer;
|
||||
`struct timer_list suspend_timer;`
|
||||
- timer used for scheduling (delayed) suspend and autosuspend requests
|
||||
|
||||
unsigned long timer_expires;
|
||||
`unsigned long timer_expires;`
|
||||
- timer expiration time, in jiffies (if this is different from zero, the
|
||||
timer is running and will expire at that time, otherwise the timer is not
|
||||
running)
|
||||
|
||||
struct work_struct work;
|
||||
`struct work_struct work;`
|
||||
- work structure used for queuing up requests (i.e. work items in pm_wq)
|
||||
|
||||
wait_queue_head_t wait_queue;
|
||||
`wait_queue_head_t wait_queue;`
|
||||
- wait queue used if any of the helper functions needs to wait for another
|
||||
one to complete
|
||||
|
||||
spinlock_t lock;
|
||||
`spinlock_t lock;`
|
||||
- lock used for synchronization
|
||||
|
||||
atomic_t usage_count;
|
||||
`atomic_t usage_count;`
|
||||
- the usage counter of the device
|
||||
|
||||
atomic_t child_count;
|
||||
`atomic_t child_count;`
|
||||
- the count of 'active' children of the device
|
||||
|
||||
unsigned int ignore_children;
|
||||
`unsigned int ignore_children;`
|
||||
- if set, the value of child_count is ignored (but still updated)
|
||||
|
||||
unsigned int disable_depth;
|
||||
`unsigned int disable_depth;`
|
||||
- used for disabling the helper functions (they work normally if this is
|
||||
equal to zero); the initial value of it is 1 (i.e. runtime PM is
|
||||
initially disabled for all devices)
|
||||
|
||||
int runtime_error;
|
||||
`int runtime_error;`
|
||||
- if set, there was a fatal error (one of the callbacks returned error code
|
||||
as described in Section 2), so the helper functions will not work until
|
||||
this flag is cleared; this is the error code returned by the failing
|
||||
callback
|
||||
|
||||
unsigned int idle_notification;
|
||||
`unsigned int idle_notification;`
|
||||
- if set, ->runtime_idle() is being executed
|
||||
|
||||
unsigned int request_pending;
|
||||
`unsigned int request_pending;`
|
||||
- if set, there's a pending request (i.e. a work item queued up into pm_wq)
|
||||
|
||||
enum rpm_request request;
|
||||
`enum rpm_request request;`
|
||||
- type of request that's pending (valid if request_pending is set)
|
||||
|
||||
unsigned int deferred_resume;
|
||||
`unsigned int deferred_resume;`
|
||||
- set if ->runtime_resume() is about to be run while ->runtime_suspend() is
|
||||
being executed for that device and it is not practical to wait for the
|
||||
suspend to complete; means "start a resume as soon as you've suspended"
|
||||
|
||||
enum rpm_status runtime_status;
|
||||
`enum rpm_status runtime_status;`
|
||||
- the runtime PM status of the device; this field's initial value is
|
||||
RPM_SUSPENDED, which means that each device is initially regarded by the
|
||||
PM core as 'suspended', regardless of its real hardware status
|
||||
|
||||
unsigned int runtime_auto;
|
||||
`unsigned int runtime_auto;`
|
||||
- if set, indicates that the user space has allowed the device driver to
|
||||
power manage the device at run time via the /sys/devices/.../power/control
|
||||
interface; it may only be modified with the help of the pm_runtime_allow()
|
||||
`interface;` it may only be modified with the help of the pm_runtime_allow()
|
||||
and pm_runtime_forbid() helper functions
|
||||
|
||||
unsigned int no_callbacks;
|
||||
`unsigned int no_callbacks;`
|
||||
- indicates that the device does not use the runtime PM callbacks (see
|
||||
Section 8); it may be modified only by the pm_runtime_no_callbacks()
|
||||
helper function
|
||||
|
||||
unsigned int irq_safe;
|
||||
`unsigned int irq_safe;`
|
||||
- indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
|
||||
will be invoked with the spinlock held and interrupts disabled
|
||||
|
||||
unsigned int use_autosuspend;
|
||||
`unsigned int use_autosuspend;`
|
||||
- indicates that the device's driver supports delayed autosuspend (see
|
||||
Section 9); it may be modified only by the
|
||||
pm_runtime{_dont}_use_autosuspend() helper functions
|
||||
|
||||
unsigned int timer_autosuspends;
|
||||
`unsigned int timer_autosuspends;`
|
||||
- indicates that the PM core should attempt to carry out an autosuspend
|
||||
when the timer expires rather than a normal suspend
|
||||
|
||||
int autosuspend_delay;
|
||||
`int autosuspend_delay;`
|
||||
- the delay time (in milliseconds) to be used for autosuspend
|
||||
|
||||
unsigned long last_busy;
|
||||
`unsigned long last_busy;`
|
||||
- the time (in jiffies) when the pm_runtime_mark_last_busy() helper
|
||||
function was last called for this device; used in calculating inactivity
|
||||
periods for autosuspend
|
||||
@ -293,37 +300,38 @@ defined in include/linux/pm.h:
|
||||
All of the above fields are members of the 'power' member of 'struct device'.
|
||||
|
||||
4. Runtime PM Device Helper Functions
|
||||
=====================================
|
||||
|
||||
The following runtime PM helper functions are defined in
|
||||
drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||
|
||||
void pm_runtime_init(struct device *dev);
|
||||
`void pm_runtime_init(struct device *dev);`
|
||||
- initialize the device runtime PM fields in 'struct dev_pm_info'
|
||||
|
||||
void pm_runtime_remove(struct device *dev);
|
||||
`void pm_runtime_remove(struct device *dev);`
|
||||
- make sure that the runtime PM of the device will be disabled after
|
||||
removing the device from device hierarchy
|
||||
|
||||
int pm_runtime_idle(struct device *dev);
|
||||
`int pm_runtime_idle(struct device *dev);`
|
||||
- execute the subsystem-level idle callback for the device; returns an
|
||||
error code on failure, where -EINPROGRESS means that ->runtime_idle() is
|
||||
already being executed; if there is no callback or the callback returns 0
|
||||
then run pm_runtime_autosuspend(dev) and return its result
|
||||
|
||||
int pm_runtime_suspend(struct device *dev);
|
||||
`int pm_runtime_suspend(struct device *dev);`
|
||||
- execute the subsystem-level suspend callback for the device; returns 0 on
|
||||
success, 1 if the device's runtime PM status was already 'suspended', or
|
||||
error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
|
||||
to suspend the device again in future and -EACCES means that
|
||||
'power.disable_depth' is different from 0
|
||||
|
||||
int pm_runtime_autosuspend(struct device *dev);
|
||||
`int pm_runtime_autosuspend(struct device *dev);`
|
||||
- same as pm_runtime_suspend() except that the autosuspend delay is taken
|
||||
into account; if pm_runtime_autosuspend_expiration() says the delay has
|
||||
`into account;` if pm_runtime_autosuspend_expiration() says the delay has
|
||||
not yet expired then an autosuspend is scheduled for the appropriate time
|
||||
and 0 is returned
|
||||
|
||||
int pm_runtime_resume(struct device *dev);
|
||||
`int pm_runtime_resume(struct device *dev);`
|
||||
- execute the subsystem-level resume callback for the device; returns 0 on
|
||||
success, 1 if the device's runtime PM status was already 'active' or
|
||||
error code on failure, where -EAGAIN means it may be safe to attempt to
|
||||
@ -331,17 +339,17 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||
checked additionally, and -EACCES means that 'power.disable_depth' is
|
||||
different from 0
|
||||
|
||||
int pm_request_idle(struct device *dev);
|
||||
`int pm_request_idle(struct device *dev);`
|
||||
- submit a request to execute the subsystem-level idle callback for the
|
||||
device (the request is represented by a work item in pm_wq); returns 0 on
|
||||
success or error code if the request has not been queued up
|
||||
|
||||
int pm_request_autosuspend(struct device *dev);
|
||||
`int pm_request_autosuspend(struct device *dev);`
|
||||
- schedule the execution of the subsystem-level suspend callback for the
|
||||
device when the autosuspend delay has expired; if the delay has already
|
||||
expired then the work item is queued up immediately
|
||||
|
||||
int pm_schedule_suspend(struct device *dev, unsigned int delay);
|
||||
`int pm_schedule_suspend(struct device *dev, unsigned int delay);`
|
||||
- schedule the execution of the subsystem-level suspend callback for the
|
||||
device in future, where 'delay' is the time to wait before queuing up a
|
||||
suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
|
||||
@ -351,58 +359,58 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||
->runtime_suspend() is already scheduled and not yet expired, the new
|
||||
value of 'delay' will be used as the time to wait
|
||||
|
||||
int pm_request_resume(struct device *dev);
|
||||
`int pm_request_resume(struct device *dev);`
|
||||
- submit a request to execute the subsystem-level resume callback for the
|
||||
device (the request is represented by a work item in pm_wq); returns 0 on
|
||||
success, 1 if the device's runtime PM status was already 'active', or
|
||||
error code if the request hasn't been queued up
|
||||
|
||||
void pm_runtime_get_noresume(struct device *dev);
|
||||
`void pm_runtime_get_noresume(struct device *dev);`
|
||||
- increment the device's usage counter
|
||||
|
||||
int pm_runtime_get(struct device *dev);
|
||||
`int pm_runtime_get(struct device *dev);`
|
||||
- increment the device's usage counter, run pm_request_resume(dev) and
|
||||
return its result
|
||||
|
||||
int pm_runtime_get_sync(struct device *dev);
|
||||
`int pm_runtime_get_sync(struct device *dev);`
|
||||
- increment the device's usage counter, run pm_runtime_resume(dev) and
|
||||
return its result
|
||||
|
||||
int pm_runtime_get_if_in_use(struct device *dev);
|
||||
`int pm_runtime_get_if_in_use(struct device *dev);`
|
||||
- return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
|
||||
runtime PM status is RPM_ACTIVE and the runtime PM usage counter is
|
||||
nonzero, increment the counter and return 1; otherwise return 0 without
|
||||
changing the counter
|
||||
|
||||
void pm_runtime_put_noidle(struct device *dev);
|
||||
`void pm_runtime_put_noidle(struct device *dev);`
|
||||
- decrement the device's usage counter
|
||||
|
||||
int pm_runtime_put(struct device *dev);
|
||||
`int pm_runtime_put(struct device *dev);`
|
||||
- decrement the device's usage counter; if the result is 0 then run
|
||||
pm_request_idle(dev) and return its result
|
||||
|
||||
int pm_runtime_put_autosuspend(struct device *dev);
|
||||
`int pm_runtime_put_autosuspend(struct device *dev);`
|
||||
- decrement the device's usage counter; if the result is 0 then run
|
||||
pm_request_autosuspend(dev) and return its result
|
||||
|
||||
int pm_runtime_put_sync(struct device *dev);
|
||||
`int pm_runtime_put_sync(struct device *dev);`
|
||||
- decrement the device's usage counter; if the result is 0 then run
|
||||
pm_runtime_idle(dev) and return its result
|
||||
|
||||
int pm_runtime_put_sync_suspend(struct device *dev);
|
||||
`int pm_runtime_put_sync_suspend(struct device *dev);`
|
||||
- decrement the device's usage counter; if the result is 0 then run
|
||||
pm_runtime_suspend(dev) and return its result
|
||||
|
||||
int pm_runtime_put_sync_autosuspend(struct device *dev);
|
||||
`int pm_runtime_put_sync_autosuspend(struct device *dev);`
|
||||
- decrement the device's usage counter; if the result is 0 then run
|
||||
pm_runtime_autosuspend(dev) and return its result
|
||||
|
||||
void pm_runtime_enable(struct device *dev);
|
||||
`void pm_runtime_enable(struct device *dev);`
|
||||
- decrement the device's 'power.disable_depth' field; if that field is equal
|
||||
to zero, the runtime PM helper functions can execute subsystem-level
|
||||
callbacks described in Section 2 for the device
|
||||
|
||||
int pm_runtime_disable(struct device *dev);
|
||||
`int pm_runtime_disable(struct device *dev);`
|
||||
- increment the device's 'power.disable_depth' field (if the value of that
|
||||
field was previously zero, this prevents subsystem-level runtime PM
|
||||
callbacks from being run for the device), make sure that all of the
|
||||
@ -411,7 +419,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||
necessary to execute the subsystem-level resume callback for the device
|
||||
to satisfy that request, otherwise 0 is returned
|
||||
|
||||
int pm_runtime_barrier(struct device *dev);
|
||||
`int pm_runtime_barrier(struct device *dev);`
|
||||
- check if there's a resume request pending for the device and resume it
|
||||
(synchronously) in that case, cancel any other pending runtime PM requests
|
||||
regarding it and wait for all runtime PM operations on it in progress to
|
||||
@ -419,10 +427,10 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||
necessary to execute the subsystem-level resume callback for the device to
|
||||
satisfy that request, otherwise 0 is returned
|
||||
|
||||
void pm_suspend_ignore_children(struct device *dev, bool enable);
|
||||
`void pm_suspend_ignore_children(struct device *dev, bool enable);`
|
||||
- set/unset the power.ignore_children flag of the device
|
||||
|
||||
int pm_runtime_set_active(struct device *dev);
|
||||
`int pm_runtime_set_active(struct device *dev);`
|
||||
- clear the device's 'power.runtime_error' flag, set the device's runtime
|
||||
PM status to 'active' and update its parent's counter of 'active'
|
||||
children as appropriate (it is only valid to use this function if
|
||||
@ -430,61 +438,61 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||
zero); it will fail and return error code if the device has a parent
|
||||
which is not active and the 'power.ignore_children' flag of which is unset
|
||||
|
||||
void pm_runtime_set_suspended(struct device *dev);
|
||||
`void pm_runtime_set_suspended(struct device *dev);`
|
||||
- clear the device's 'power.runtime_error' flag, set the device's runtime
|
||||
PM status to 'suspended' and update its parent's counter of 'active'
|
||||
children as appropriate (it is only valid to use this function if
|
||||
'power.runtime_error' is set or 'power.disable_depth' is greater than
|
||||
zero)
|
||||
|
||||
bool pm_runtime_active(struct device *dev);
|
||||
`bool pm_runtime_active(struct device *dev);`
|
||||
- return true if the device's runtime PM status is 'active' or its
|
||||
'power.disable_depth' field is not equal to zero, or false otherwise
|
||||
|
||||
bool pm_runtime_suspended(struct device *dev);
|
||||
`bool pm_runtime_suspended(struct device *dev);`
|
||||
- return true if the device's runtime PM status is 'suspended' and its
|
||||
'power.disable_depth' field is equal to zero, or false otherwise
|
||||
|
||||
bool pm_runtime_status_suspended(struct device *dev);
|
||||
`bool pm_runtime_status_suspended(struct device *dev);`
|
||||
- return true if the device's runtime PM status is 'suspended'
|
||||
|
||||
void pm_runtime_allow(struct device *dev);
|
||||
`void pm_runtime_allow(struct device *dev);`
|
||||
- set the power.runtime_auto flag for the device and decrease its usage
|
||||
counter (used by the /sys/devices/.../power/control interface to
|
||||
effectively allow the device to be power managed at run time)
|
||||
|
||||
void pm_runtime_forbid(struct device *dev);
|
||||
`void pm_runtime_forbid(struct device *dev);`
|
||||
- unset the power.runtime_auto flag for the device and increase its usage
|
||||
counter (used by the /sys/devices/.../power/control interface to
|
||||
effectively prevent the device from being power managed at run time)
|
||||
|
||||
void pm_runtime_no_callbacks(struct device *dev);
|
||||
`void pm_runtime_no_callbacks(struct device *dev);`
|
||||
- set the power.no_callbacks flag for the device and remove the runtime
|
||||
PM attributes from /sys/devices/.../power (or prevent them from being
|
||||
added when the device is registered)
|
||||
|
||||
void pm_runtime_irq_safe(struct device *dev);
|
||||
`void pm_runtime_irq_safe(struct device *dev);`
|
||||
- set the power.irq_safe flag for the device, causing the runtime-PM
|
||||
callbacks to be invoked with interrupts off
|
||||
|
||||
bool pm_runtime_is_irq_safe(struct device *dev);
|
||||
`bool pm_runtime_is_irq_safe(struct device *dev);`
|
||||
- return true if power.irq_safe flag was set for the device, causing
|
||||
the runtime-PM callbacks to be invoked with interrupts off
|
||||
|
||||
void pm_runtime_mark_last_busy(struct device *dev);
|
||||
`void pm_runtime_mark_last_busy(struct device *dev);`
|
||||
- set the power.last_busy field to the current time
|
||||
|
||||
void pm_runtime_use_autosuspend(struct device *dev);
|
||||
`void pm_runtime_use_autosuspend(struct device *dev);`
|
||||
- set the power.use_autosuspend flag, enabling autosuspend delays; call
|
||||
pm_runtime_get_sync if the flag was previously cleared and
|
||||
power.autosuspend_delay is negative
|
||||
|
||||
void pm_runtime_dont_use_autosuspend(struct device *dev);
|
||||
`void pm_runtime_dont_use_autosuspend(struct device *dev);`
|
||||
- clear the power.use_autosuspend flag, disabling autosuspend delays;
|
||||
decrement the device's usage counter if the flag was previously set and
|
||||
power.autosuspend_delay is negative; call pm_runtime_idle
|
||||
|
||||
void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
|
||||
`void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);`
|
||||
- set the power.autosuspend_delay value to 'delay' (expressed in
|
||||
milliseconds); if 'delay' is negative then runtime suspends are
|
||||
prevented; if power.use_autosuspend is set, pm_runtime_get_sync may be
|
||||
@ -493,7 +501,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||
changed to or from a negative value; if power.use_autosuspend is clear,
|
||||
pm_runtime_idle is called
|
||||
|
||||
unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
|
||||
`unsigned long pm_runtime_autosuspend_expiration(struct device *dev);`
|
||||
- calculate the time when the current autosuspend delay period will expire,
|
||||
based on power.last_busy and power.autosuspend_delay; if the delay time
|
||||
is 1000 ms or larger then the expiration time is rounded up to the
|
||||
@ -503,36 +511,37 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||
|
||||
It is safe to execute the following helper functions from interrupt context:
|
||||
|
||||
pm_request_idle()
|
||||
pm_request_autosuspend()
|
||||
pm_schedule_suspend()
|
||||
pm_request_resume()
|
||||
pm_runtime_get_noresume()
|
||||
pm_runtime_get()
|
||||
pm_runtime_put_noidle()
|
||||
pm_runtime_put()
|
||||
pm_runtime_put_autosuspend()
|
||||
pm_runtime_enable()
|
||||
pm_suspend_ignore_children()
|
||||
pm_runtime_set_active()
|
||||
pm_runtime_set_suspended()
|
||||
pm_runtime_suspended()
|
||||
pm_runtime_mark_last_busy()
|
||||
pm_runtime_autosuspend_expiration()
|
||||
- pm_request_idle()
|
||||
- pm_request_autosuspend()
|
||||
- pm_schedule_suspend()
|
||||
- pm_request_resume()
|
||||
- pm_runtime_get_noresume()
|
||||
- pm_runtime_get()
|
||||
- pm_runtime_put_noidle()
|
||||
- pm_runtime_put()
|
||||
- pm_runtime_put_autosuspend()
|
||||
- pm_runtime_enable()
|
||||
- pm_suspend_ignore_children()
|
||||
- pm_runtime_set_active()
|
||||
- pm_runtime_set_suspended()
|
||||
- pm_runtime_suspended()
|
||||
- pm_runtime_mark_last_busy()
|
||||
- pm_runtime_autosuspend_expiration()
|
||||
|
||||
If pm_runtime_irq_safe() has been called for a device then the following helper
|
||||
functions may also be used in interrupt context:
|
||||
|
||||
pm_runtime_idle()
|
||||
pm_runtime_suspend()
|
||||
pm_runtime_autosuspend()
|
||||
pm_runtime_resume()
|
||||
pm_runtime_get_sync()
|
||||
pm_runtime_put_sync()
|
||||
pm_runtime_put_sync_suspend()
|
||||
pm_runtime_put_sync_autosuspend()
|
||||
- pm_runtime_idle()
|
||||
- pm_runtime_suspend()
|
||||
- pm_runtime_autosuspend()
|
||||
- pm_runtime_resume()
|
||||
- pm_runtime_get_sync()
|
||||
- pm_runtime_put_sync()
|
||||
- pm_runtime_put_sync_suspend()
|
||||
- pm_runtime_put_sync_autosuspend()
|
||||
|
||||
5. Runtime PM Initialization, Device Probing and Removal
|
||||
========================================================
|
||||
|
||||
Initially, the runtime PM is disabled for all devices, which means that the
|
||||
majority of the runtime PM helper functions described in Section 4 will return
|
||||
@ -608,6 +617,7 @@ manage the device at run time, the driver may confuse it by using
|
||||
pm_runtime_forbid() this way.
|
||||
|
||||
6. Runtime PM and System Sleep
|
||||
==============================
|
||||
|
||||
Runtime PM and system sleep (i.e., system suspend and hibernation, also known
|
||||
as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of
|
||||
@ -647,9 +657,9 @@ brought back to full power during resume, then its runtime PM status will have
|
||||
to be updated to reflect the actual post-system sleep status. The way to do
|
||||
this is:
|
||||
|
||||
pm_runtime_disable(dev);
|
||||
pm_runtime_set_active(dev);
|
||||
pm_runtime_enable(dev);
|
||||
- pm_runtime_disable(dev);
|
||||
- pm_runtime_set_active(dev);
|
||||
- pm_runtime_enable(dev);
|
||||
|
||||
The PM core always increments the runtime usage counter before calling the
|
||||
->suspend() callback and decrements it after calling the ->resume() callback.
|
||||
@ -705,66 +715,66 @@ Subsystems may wish to conserve code space by using the set of generic power
|
||||
management callbacks provided by the PM core, defined in
|
||||
driver/base/power/generic_ops.c:
|
||||
|
||||
int pm_generic_runtime_suspend(struct device *dev);
|
||||
`int pm_generic_runtime_suspend(struct device *dev);`
|
||||
- invoke the ->runtime_suspend() callback provided by the driver of this
|
||||
device and return its result, or return 0 if not defined
|
||||
|
||||
int pm_generic_runtime_resume(struct device *dev);
|
||||
`int pm_generic_runtime_resume(struct device *dev);`
|
||||
- invoke the ->runtime_resume() callback provided by the driver of this
|
||||
device and return its result, or return 0 if not defined
|
||||
|
||||
int pm_generic_suspend(struct device *dev);
|
||||
`int pm_generic_suspend(struct device *dev);`
|
||||
- if the device has not been suspended at run time, invoke the ->suspend()
|
||||
callback provided by its driver and return its result, or return 0 if not
|
||||
defined
|
||||
|
||||
int pm_generic_suspend_noirq(struct device *dev);
|
||||
`int pm_generic_suspend_noirq(struct device *dev);`
|
||||
- if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq()
|
||||
callback provided by the device's driver and return its result, or return
|
||||
0 if not defined
|
||||
|
||||
int pm_generic_resume(struct device *dev);
|
||||
`int pm_generic_resume(struct device *dev);`
|
||||
- invoke the ->resume() callback provided by the driver of this device and,
|
||||
if successful, change the device's runtime PM status to 'active'
|
||||
|
||||
int pm_generic_resume_noirq(struct device *dev);
|
||||
`int pm_generic_resume_noirq(struct device *dev);`
|
||||
- invoke the ->resume_noirq() callback provided by the driver of this device
|
||||
|
||||
int pm_generic_freeze(struct device *dev);
|
||||
`int pm_generic_freeze(struct device *dev);`
|
||||
- if the device has not been suspended at run time, invoke the ->freeze()
|
||||
callback provided by its driver and return its result, or return 0 if not
|
||||
defined
|
||||
|
||||
int pm_generic_freeze_noirq(struct device *dev);
|
||||
`int pm_generic_freeze_noirq(struct device *dev);`
|
||||
- if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq()
|
||||
callback provided by the device's driver and return its result, or return
|
||||
0 if not defined
|
||||
|
||||
int pm_generic_thaw(struct device *dev);
|
||||
`int pm_generic_thaw(struct device *dev);`
|
||||
- if the device has not been suspended at run time, invoke the ->thaw()
|
||||
callback provided by its driver and return its result, or return 0 if not
|
||||
defined
|
||||
|
||||
int pm_generic_thaw_noirq(struct device *dev);
|
||||
`int pm_generic_thaw_noirq(struct device *dev);`
|
||||
- if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq()
|
||||
callback provided by the device's driver and return its result, or return
|
||||
0 if not defined
|
||||
|
||||
int pm_generic_poweroff(struct device *dev);
|
||||
`int pm_generic_poweroff(struct device *dev);`
|
||||
- if the device has not been suspended at run time, invoke the ->poweroff()
|
||||
callback provided by its driver and return its result, or return 0 if not
|
||||
defined
|
||||
|
||||
int pm_generic_poweroff_noirq(struct device *dev);
|
||||
`int pm_generic_poweroff_noirq(struct device *dev);`
|
||||
- if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq()
|
||||
callback provided by the device's driver and return its result, or return
|
||||
0 if not defined
|
||||
|
||||
int pm_generic_restore(struct device *dev);
|
||||
`int pm_generic_restore(struct device *dev);`
|
||||
- invoke the ->restore() callback provided by the driver of this device and,
|
||||
if successful, change the device's runtime PM status to 'active'
|
||||
|
||||
int pm_generic_restore_noirq(struct device *dev);
|
||||
`int pm_generic_restore_noirq(struct device *dev);`
|
||||
- invoke the ->restore_noirq() callback provided by the device's driver
|
||||
|
||||
These functions are the defaults used by the PM core, if a subsystem doesn't
|
||||
@ -781,6 +791,7 @@ UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
|
||||
last argument to NULL).
|
||||
|
||||
8. "No-Callback" Devices
|
||||
========================
|
||||
|
||||
Some "devices" are only logical sub-devices of their parent and cannot be
|
||||
power-managed on their own. (The prototype example is a USB interface. Entire
|
||||
@ -807,6 +818,7 @@ parent must take responsibility for telling the device's driver when the
|
||||
parent's power state changes.
|
||||
|
||||
9. Autosuspend, or automatically-delayed suspends
|
||||
=================================================
|
||||
|
||||
Changing a device's power state isn't free; it requires both time and energy.
|
||||
A device should be put in a low-power state only when there's some reason to
|
||||
@ -832,8 +844,8 @@ registration the length should be controlled by user space, using the
|
||||
|
||||
In order to use autosuspend, subsystems or drivers must call
|
||||
pm_runtime_use_autosuspend() (preferably before registering the device), and
|
||||
thereafter they should use the various *_autosuspend() helper functions instead
|
||||
of the non-autosuspend counterparts:
|
||||
thereafter they should use the various `*_autosuspend()` helper functions
|
||||
instead of the non-autosuspend counterparts::
|
||||
|
||||
Instead of: pm_runtime_suspend use: pm_runtime_autosuspend;
|
||||
Instead of: pm_schedule_suspend use: pm_request_autosuspend;
|
||||
@ -858,7 +870,7 @@ The implementation is well suited for asynchronous use in interrupt contexts.
|
||||
However such use inevitably involves races, because the PM core can't
|
||||
synchronize ->runtime_suspend() callbacks with the arrival of I/O requests.
|
||||
This synchronization must be handled by the driver, using its private lock.
|
||||
Here is a schematic pseudo-code example:
|
||||
Here is a schematic pseudo-code example::
|
||||
|
||||
foo_read_or_write(struct foo_priv *foo, void *data)
|
||||
{
|
@ -1,7 +1,9 @@
|
||||
How to get s2ram working
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
2006 Linus Torvalds
|
||||
2006 Pavel Machek
|
||||
========================
|
||||
How to get s2ram working
|
||||
========================
|
||||
|
||||
2006 Linus Torvalds
|
||||
2006 Pavel Machek
|
||||
|
||||
1) Check suspend.sf.net, program s2ram there has long whitelist of
|
||||
"known ok" machines, along with tricks to use on each one.
|
||||
@ -12,8 +14,8 @@
|
||||
|
||||
3) You can use Linus' TRACE_RESUME infrastructure, described below.
|
||||
|
||||
Using TRACE_RESUME
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
Using TRACE_RESUME
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
I've been working at making the machines I have able to STR, and almost
|
||||
always it's a driver that is buggy. Thank God for the suspend/resume
|
||||
@ -27,7 +29,7 @@ machine that doesn't boot) is:
|
||||
|
||||
- enable PM_DEBUG, and PM_TRACE
|
||||
|
||||
- use a script like this:
|
||||
- use a script like this::
|
||||
|
||||
#!/bin/sh
|
||||
sync
|
||||
@ -38,7 +40,7 @@ machine that doesn't boot) is:
|
||||
|
||||
- if it doesn't come back up (which is usually the problem), reboot by
|
||||
holding the power button down, and look at the dmesg output for things
|
||||
like
|
||||
like::
|
||||
|
||||
Magic number: 4:156:725
|
||||
hash matches drivers/base/power/resume.c:28
|
||||
@ -52,7 +54,7 @@ machine that doesn't boot) is:
|
||||
If no device matches the hash (or any matches appear to be false positives),
|
||||
the culprit may be a device from a loadable kernel module that is not loaded
|
||||
until after the hash is checked. You can check the hash against the current
|
||||
devices again after more modules are loaded using sysfs:
|
||||
devices again after more modules are loaded using sysfs::
|
||||
|
||||
cat /sys/power/pm_trace_dev_match
|
||||
|
@ -1,10 +1,15 @@
|
||||
====================================================================
|
||||
Interaction of Suspend code (S3) with the CPU hotplug infrastructure
|
||||
====================================================================
|
||||
|
||||
(C) 2011 - 2014 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
|
||||
(C) 2011 - 2014 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
|
||||
|
||||
|
||||
I. How does the regular CPU hotplug code differ from how the Suspend-to-RAM
|
||||
infrastructure uses it internally? And where do they share common code?
|
||||
I. Differences between CPU hotplug and Suspend-to-RAM
|
||||
======================================================
|
||||
|
||||
How does the regular CPU hotplug code differ from how the Suspend-to-RAM
|
||||
infrastructure uses it internally? And where do they share common code?
|
||||
|
||||
Well, a picture is worth a thousand words... So ASCII art follows :-)
|
||||
|
||||
@ -16,13 +21,13 @@ of describing where they take different paths and where they share code.
|
||||
What happens when regular CPU hotplug and Suspend-to-RAM race with each other
|
||||
is not depicted here.]
|
||||
|
||||
On a high level, the suspend-resume cycle goes like this:
|
||||
On a high level, the suspend-resume cycle goes like this::
|
||||
|
||||
|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
|
||||
|tasks | | cpus | | | | cpus | |tasks|
|
||||
|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
|
||||
|tasks | | cpus | | | | cpus | |tasks|
|
||||
|
||||
|
||||
More details follow:
|
||||
More details follow::
|
||||
|
||||
Suspend call path
|
||||
-----------------
|
||||
@ -87,7 +92,9 @@ More details follow:
|
||||
|
||||
Resuming back is likewise, with the counterparts being (in the order of
|
||||
execution during resume):
|
||||
* enable_nonboot_cpus() which involves:
|
||||
|
||||
* enable_nonboot_cpus() which involves::
|
||||
|
||||
| Acquire cpu_add_remove_lock
|
||||
| Decrease cpu_hotplug_disabled, thereby enabling regular cpu hotplug
|
||||
| Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
|
||||
@ -103,6 +110,8 @@ It is to be noted here that the system_transition_mutex lock is acquired at the
|
||||
beginning, when we are just starting out to suspend, and then released only
|
||||
after the entire cycle is complete (i.e., suspend + resume).
|
||||
|
||||
::
|
||||
|
||||
|
||||
|
||||
Regular CPU hotplug call path
|
||||
@ -152,16 +161,16 @@ with the 'tasks_frozen' argument set to 1.
|
||||
|
||||
|
||||
Important files and functions/entry points:
|
||||
------------------------------------------
|
||||
-------------------------------------------
|
||||
|
||||
kernel/power/process.c : freeze_processes(), thaw_processes()
|
||||
kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
|
||||
kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
|
||||
- kernel/power/process.c : freeze_processes(), thaw_processes()
|
||||
- kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
|
||||
- kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
|
||||
|
||||
|
||||
|
||||
II. What are the issues involved in CPU hotplug?
|
||||
-------------------------------------------
|
||||
------------------------------------------------
|
||||
|
||||
There are some interesting situations involving CPU hotplug and microcode
|
||||
update on the CPUs, as discussed below:
|
||||
@ -243,8 +252,11 @@ d. Handling microcode update during suspend/hibernate:
|
||||
cycles).
|
||||
|
||||
|
||||
III. Are there any known problems when regular CPU hotplug and suspend race
|
||||
with each other?
|
||||
III. Known problems
|
||||
===================
|
||||
|
||||
Are there any known problems when regular CPU hotplug and suspend race
|
||||
with each other?
|
||||
|
||||
Yes, they are listed below:
|
||||
|
@ -1,4 +1,6 @@
|
||||
====================================
|
||||
System Suspend and Device Interrupts
|
||||
====================================
|
||||
|
||||
Copyright (C) 2014 Intel Corp.
|
||||
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
@ -1,4 +1,7 @@
|
||||
===============================================
|
||||
Using swap files with software suspend (swsusp)
|
||||
===============================================
|
||||
|
||||
(C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
|
||||
|
||||
The Linux kernel handles swap files almost in the same way as it handles swap
|
||||
@ -21,20 +24,20 @@ units.
|
||||
|
||||
In order to use a swap file with swsusp, you need to:
|
||||
|
||||
1) Create the swap file and make it active, eg.
|
||||
1) Create the swap file and make it active, eg.::
|
||||
|
||||
# dd if=/dev/zero of=<swap_file_path> bs=1024 count=<swap_file_size_in_k>
|
||||
# mkswap <swap_file_path>
|
||||
# swapon <swap_file_path>
|
||||
# dd if=/dev/zero of=<swap_file_path> bs=1024 count=<swap_file_size_in_k>
|
||||
# mkswap <swap_file_path>
|
||||
# swapon <swap_file_path>
|
||||
|
||||
2) Use an application that will bmap the swap file with the help of the
|
||||
FIBMAP ioctl and determine the location of the file's swap header, as the
|
||||
offset, in <PAGE_SIZE> units, from the beginning of the partition which
|
||||
holds the swap file.
|
||||
|
||||
3) Add the following parameters to the kernel command line:
|
||||
3) Add the following parameters to the kernel command line::
|
||||
|
||||
resume=<swap_file_partition> resume_offset=<swap_file_offset>
|
||||
resume=<swap_file_partition> resume_offset=<swap_file_offset>
|
||||
|
||||
where <swap_file_partition> is the partition on which the swap file is located
|
||||
and <swap_file_offset> is the offset of the swap header determined by the
|
||||
@ -46,7 +49,7 @@ OR
|
||||
|
||||
Use a userland suspend application that will set the partition and offset
|
||||
with the help of the SNAPSHOT_SET_SWAP_AREA ioctl described in
|
||||
Documentation/power/userland-swsusp.txt (this is the only method to suspend
|
||||
Documentation/power/userland-swsusp.rst (this is the only method to suspend
|
||||
to a swap file allowing the resume to be initiated from an initrd or initramfs
|
||||
image).
|
||||
|
@ -1,13 +1,15 @@
|
||||
=======================================
|
||||
How to use dm-crypt and swsusp together
|
||||
=======================================
|
||||
|
||||
Author: Andreas Steinmetz <ast@domdv.de>
|
||||
|
||||
|
||||
How to use dm-crypt and swsusp together:
|
||||
========================================
|
||||
|
||||
Some prerequisites:
|
||||
You know how dm-crypt works. If not, visit the following web page:
|
||||
http://www.saout.de/misc/dm-crypt/
|
||||
You have read Documentation/power/swsusp.txt and understand it.
|
||||
You have read Documentation/power/swsusp.rst and understand it.
|
||||
You did read Documentation/admin-guide/initrd.rst and know how an initrd works.
|
||||
You know how to create or how to modify an initrd.
|
||||
|
||||
@ -29,23 +31,23 @@ a way that the swap device you suspend to/resume from has
|
||||
always the same major/minor within the initrd as well as
|
||||
within your running system. The easiest way to achieve this is
|
||||
to always set up this swap device first with dmsetup, so that
|
||||
it will always look like the following:
|
||||
it will always look like the following::
|
||||
|
||||
brw------- 1 root root 254, 0 Jul 28 13:37 /dev/mapper/swap0
|
||||
brw------- 1 root root 254, 0 Jul 28 13:37 /dev/mapper/swap0
|
||||
|
||||
Now set up your kernel to use /dev/mapper/swap0 as the default
|
||||
resume partition, so your kernel .config contains:
|
||||
resume partition, so your kernel .config contains::
|
||||
|
||||
CONFIG_PM_STD_PARTITION="/dev/mapper/swap0"
|
||||
CONFIG_PM_STD_PARTITION="/dev/mapper/swap0"
|
||||
|
||||
Prepare your boot loader to use the initrd you will create or
|
||||
modify. For lilo the simplest setup looks like the following
|
||||
lines:
|
||||
lines::
|
||||
|
||||
image=/boot/vmlinuz
|
||||
initrd=/boot/initrd.gz
|
||||
label=linux
|
||||
append="root=/dev/ram0 init=/linuxrc rw"
|
||||
image=/boot/vmlinuz
|
||||
initrd=/boot/initrd.gz
|
||||
label=linux
|
||||
append="root=/dev/ram0 init=/linuxrc rw"
|
||||
|
||||
Finally you need to create or modify your initrd. Lets assume
|
||||
you create an initrd that reads the required dm-crypt setup
|
||||
@ -53,66 +55,66 @@ from a pcmcia flash disk card. The card is formatted with an ext2
|
||||
fs which resides on /dev/hde1 when the card is inserted. The
|
||||
card contains at least the encrypted swap setup in a file
|
||||
named "swapkey". /etc/fstab of your initrd contains something
|
||||
like the following:
|
||||
like the following::
|
||||
|
||||
/dev/hda1 /mnt ext3 ro 0 0
|
||||
none /proc proc defaults,noatime,nodiratime 0 0
|
||||
none /sys sysfs defaults,noatime,nodiratime 0 0
|
||||
/dev/hda1 /mnt ext3 ro 0 0
|
||||
none /proc proc defaults,noatime,nodiratime 0 0
|
||||
none /sys sysfs defaults,noatime,nodiratime 0 0
|
||||
|
||||
/dev/hda1 contains an unencrypted mini system that sets up all
|
||||
of your crypto devices, again by reading the setup from the
|
||||
pcmcia flash disk. What follows now is a /linuxrc for your
|
||||
initrd that allows you to resume from encrypted swap and that
|
||||
continues boot with your mini system on /dev/hda1 if resume
|
||||
does not happen:
|
||||
does not happen::
|
||||
|
||||
#!/bin/sh
|
||||
PATH=/sbin:/bin:/usr/sbin:/usr/bin
|
||||
mount /proc
|
||||
mount /sys
|
||||
mapped=0
|
||||
noresume=`grep -c noresume /proc/cmdline`
|
||||
if [ "$*" != "" ]
|
||||
then
|
||||
noresume=1
|
||||
fi
|
||||
dmesg -n 1
|
||||
/sbin/cardmgr -q
|
||||
for i in 1 2 3 4 5 6 7 8 9 0
|
||||
do
|
||||
if [ -f /proc/ide/hde/media ]
|
||||
#!/bin/sh
|
||||
PATH=/sbin:/bin:/usr/sbin:/usr/bin
|
||||
mount /proc
|
||||
mount /sys
|
||||
mapped=0
|
||||
noresume=`grep -c noresume /proc/cmdline`
|
||||
if [ "$*" != "" ]
|
||||
then
|
||||
usleep 500000
|
||||
mount -t ext2 -o ro /dev/hde1 /mnt
|
||||
if [ -f /mnt/swapkey ]
|
||||
noresume=1
|
||||
fi
|
||||
dmesg -n 1
|
||||
/sbin/cardmgr -q
|
||||
for i in 1 2 3 4 5 6 7 8 9 0
|
||||
do
|
||||
if [ -f /proc/ide/hde/media ]
|
||||
then
|
||||
dmsetup create swap0 /mnt/swapkey > /dev/null 2>&1 && mapped=1
|
||||
usleep 500000
|
||||
mount -t ext2 -o ro /dev/hde1 /mnt
|
||||
if [ -f /mnt/swapkey ]
|
||||
then
|
||||
dmsetup create swap0 /mnt/swapkey > /dev/null 2>&1 && mapped=1
|
||||
fi
|
||||
umount /mnt
|
||||
break
|
||||
fi
|
||||
umount /mnt
|
||||
break
|
||||
fi
|
||||
usleep 500000
|
||||
done
|
||||
killproc /sbin/cardmgr
|
||||
dmesg -n 6
|
||||
if [ $mapped = 1 ]
|
||||
then
|
||||
if [ $noresume != 0 ]
|
||||
usleep 500000
|
||||
done
|
||||
killproc /sbin/cardmgr
|
||||
dmesg -n 6
|
||||
if [ $mapped = 1 ]
|
||||
then
|
||||
mkswap /dev/mapper/swap0 > /dev/null 2>&1
|
||||
if [ $noresume != 0 ]
|
||||
then
|
||||
mkswap /dev/mapper/swap0 > /dev/null 2>&1
|
||||
fi
|
||||
echo 254:0 > /sys/power/resume
|
||||
dmsetup remove swap0
|
||||
fi
|
||||
echo 254:0 > /sys/power/resume
|
||||
dmsetup remove swap0
|
||||
fi
|
||||
umount /sys
|
||||
mount /mnt
|
||||
umount /proc
|
||||
cd /mnt
|
||||
pivot_root . mnt
|
||||
mount /proc
|
||||
umount -l /mnt
|
||||
umount /proc
|
||||
exec chroot . /sbin/init $* < dev/console > dev/console 2>&1
|
||||
umount /sys
|
||||
mount /mnt
|
||||
umount /proc
|
||||
cd /mnt
|
||||
pivot_root . mnt
|
||||
mount /proc
|
||||
umount -l /mnt
|
||||
umount /proc
|
||||
exec chroot . /sbin/init $* < dev/console > dev/console 2>&1
|
||||
|
||||
Please don't mind the weird loop above, busybox's msh doesn't know
|
||||
the let statement. Now, what is happening in the script?
|
501
Documentation/power/swsusp.rst
Normal file
501
Documentation/power/swsusp.rst
Normal file
@ -0,0 +1,501 @@
|
||||
============
|
||||
Swap suspend
|
||||
============
|
||||
|
||||
Some warnings, first.
|
||||
|
||||
.. warning::
|
||||
|
||||
**BIG FAT WARNING**
|
||||
|
||||
If you touch anything on disk between suspend and resume...
|
||||
...kiss your data goodbye.
|
||||
|
||||
If you do resume from initrd after your filesystems are mounted...
|
||||
...bye bye root partition.
|
||||
|
||||
[this is actually same case as above]
|
||||
|
||||
If you have unsupported ( ) devices using DMA, you may have some
|
||||
problems. If your disk driver does not support suspend... (IDE does),
|
||||
it may cause some problems, too. If you change kernel command line
|
||||
between suspend and resume, it may do something wrong. If you change
|
||||
your hardware while system is suspended... well, it was not good idea;
|
||||
but it will probably only crash.
|
||||
|
||||
( ) suspend/resume support is needed to make it safe.
|
||||
|
||||
If you have any filesystems on USB devices mounted before software suspend,
|
||||
they won't be accessible after resume and you may lose data, as though
|
||||
you have unplugged the USB devices with mounted filesystems on them;
|
||||
see the FAQ below for details. (This is not true for more traditional
|
||||
power states like "standby", which normally don't turn USB off.)
|
||||
|
||||
Swap partition:
|
||||
You need to append resume=/dev/your_swap_partition to kernel command
|
||||
line or specify it using /sys/power/resume.
|
||||
|
||||
Swap file:
|
||||
If using a swapfile you can also specify a resume offset using
|
||||
resume_offset=<number> on the kernel command line or specify it
|
||||
in /sys/power/resume_offset.
|
||||
|
||||
After preparing then you suspend by::
|
||||
|
||||
echo shutdown > /sys/power/disk; echo disk > /sys/power/state
|
||||
|
||||
- If you feel ACPI works pretty well on your system, you might try::
|
||||
|
||||
echo platform > /sys/power/disk; echo disk > /sys/power/state
|
||||
|
||||
- If you would like to write hibernation image to swap and then suspend
|
||||
to RAM (provided your platform supports it), you can try::
|
||||
|
||||
echo suspend > /sys/power/disk; echo disk > /sys/power/state
|
||||
|
||||
- If you have SATA disks, you'll need recent kernels with SATA suspend
|
||||
support. For suspend and resume to work, make sure your disk drivers
|
||||
are built into kernel -- not modules. [There's way to make
|
||||
suspend/resume with modular disk drivers, see FAQ, but you probably
|
||||
should not do that.]
|
||||
|
||||
If you want to limit the suspend image size to N bytes, do::
|
||||
|
||||
echo N > /sys/power/image_size
|
||||
|
||||
before suspend (it is limited to around 2/5 of available RAM by default).
|
||||
|
||||
- The resume process checks for the presence of the resume device,
|
||||
if found, it then checks the contents for the hibernation image signature.
|
||||
If both are found, it resumes the hibernation image.
|
||||
|
||||
- The resume process may be triggered in two ways:
|
||||
|
||||
1) During lateinit: If resume=/dev/your_swap_partition is specified on
|
||||
the kernel command line, lateinit runs the resume process. If the
|
||||
resume device has not been probed yet, the resume process fails and
|
||||
bootup continues.
|
||||
2) Manually from an initrd or initramfs: May be run from
|
||||
the init script by using the /sys/power/resume file. It is vital
|
||||
that this be done prior to remounting any filesystems (even as
|
||||
read-only) otherwise data may be corrupted.
|
||||
|
||||
Article about goals and implementation of Software Suspend for Linux
|
||||
====================================================================
|
||||
|
||||
Author: Gábor Kuti
|
||||
Last revised: 2003-10-20 by Pavel Machek
|
||||
|
||||
Idea and goals to achieve
|
||||
-------------------------
|
||||
|
||||
Nowadays it is common in several laptops that they have a suspend button. It
|
||||
saves the state of the machine to a filesystem or to a partition and switches
|
||||
to standby mode. Later resuming the machine the saved state is loaded back to
|
||||
ram and the machine can continue its work. It has two real benefits. First we
|
||||
save ourselves the time machine goes down and later boots up, energy costs
|
||||
are real high when running from batteries. The other gain is that we don't have
|
||||
to interrupt our programs so processes that are calculating something for a long
|
||||
time shouldn't need to be written interruptible.
|
||||
|
||||
swsusp saves the state of the machine into active swaps and then reboots or
|
||||
powerdowns. You must explicitly specify the swap partition to resume from with
|
||||
`resume=` kernel option. If signature is found it loads and restores saved
|
||||
state. If the option `noresume` is specified as a boot parameter, it skips
|
||||
the resuming. If the option `hibernate=nocompress` is specified as a boot
|
||||
parameter, it saves hibernation image without compression.
|
||||
|
||||
In the meantime while the system is suspended you should not add/remove any
|
||||
of the hardware, write to the filesystems, etc.
|
||||
|
||||
Sleep states summary
|
||||
====================
|
||||
|
||||
There are three different interfaces you can use, /proc/acpi should
|
||||
work like this:
|
||||
|
||||
In a really perfect world::
|
||||
|
||||
echo 1 > /proc/acpi/sleep # for standby
|
||||
echo 2 > /proc/acpi/sleep # for suspend to ram
|
||||
echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative
|
||||
echo 4 > /proc/acpi/sleep # for suspend to disk
|
||||
echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system
|
||||
|
||||
and perhaps::
|
||||
|
||||
echo 4b > /proc/acpi/sleep # for suspend to disk via s4bios
|
||||
|
||||
Frequently Asked Questions
|
||||
==========================
|
||||
|
||||
Q:
|
||||
well, suspending a server is IMHO a really stupid thing,
|
||||
but... (Diego Zuccato):
|
||||
|
||||
A:
|
||||
You bought new UPS for your server. How do you install it without
|
||||
bringing machine down? Suspend to disk, rearrange power cables,
|
||||
resume.
|
||||
|
||||
You have your server on UPS. Power died, and UPS is indicating 30
|
||||
seconds to failure. What do you do? Suspend to disk.
|
||||
|
||||
|
||||
Q:
|
||||
Maybe I'm missing something, but why don't the regular I/O paths work?
|
||||
|
||||
A:
|
||||
We do use the regular I/O paths. However we cannot restore the data
|
||||
to its original location as we load it. That would create an
|
||||
inconsistent kernel state which would certainly result in an oops.
|
||||
Instead, we load the image into unused memory and then atomically copy
|
||||
it back to it original location. This implies, of course, a maximum
|
||||
image size of half the amount of memory.
|
||||
|
||||
There are two solutions to this:
|
||||
|
||||
* require half of memory to be free during suspend. That way you can
|
||||
read "new" data onto free spots, then cli and copy
|
||||
|
||||
* assume we had special "polling" ide driver that only uses memory
|
||||
between 0-640KB. That way, I'd have to make sure that 0-640KB is free
|
||||
during suspending, but otherwise it would work...
|
||||
|
||||
suspend2 shares this fundamental limitation, but does not include user
|
||||
data and disk caches into "used memory" by saving them in
|
||||
advance. That means that the limitation goes away in practice.
|
||||
|
||||
Q:
|
||||
Does linux support ACPI S4?
|
||||
|
||||
A:
|
||||
Yes. That's what echo platform > /sys/power/disk does.
|
||||
|
||||
Q:
|
||||
What is 'suspend2'?
|
||||
|
||||
A:
|
||||
suspend2 is 'Software Suspend 2', a forked implementation of
|
||||
suspend-to-disk which is available as separate patches for 2.4 and 2.6
|
||||
kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB
|
||||
highmem and preemption. It also has a extensible architecture that
|
||||
allows for arbitrary transformations on the image (compression,
|
||||
encryption) and arbitrary backends for writing the image (eg to swap
|
||||
or an NFS share[Work In Progress]). Questions regarding suspend2
|
||||
should be sent to the mailing list available through the suspend2
|
||||
website, and not to the Linux Kernel Mailing List. We are working
|
||||
toward merging suspend2 into the mainline kernel.
|
||||
|
||||
Q:
|
||||
What is the freezing of tasks and why are we using it?
|
||||
|
||||
A:
|
||||
The freezing of tasks is a mechanism by which user space processes and some
|
||||
kernel threads are controlled during hibernation or system-wide suspend (on some
|
||||
architectures). See freezing-of-tasks.txt for details.
|
||||
|
||||
Q:
|
||||
What is the difference between "platform" and "shutdown"?
|
||||
|
||||
A:
|
||||
shutdown:
|
||||
save state in linux, then tell bios to powerdown
|
||||
|
||||
platform:
|
||||
save state in linux, then tell bios to powerdown and blink
|
||||
"suspended led"
|
||||
|
||||
"platform" is actually right thing to do where supported, but
|
||||
"shutdown" is most reliable (except on ACPI systems).
|
||||
|
||||
Q:
|
||||
I do not understand why you have such strong objections to idea of
|
||||
selective suspend.
|
||||
|
||||
A:
|
||||
Do selective suspend during runtime power management, that's okay. But
|
||||
it's useless for suspend-to-disk. (And I do not see how you could use
|
||||
it for suspend-to-ram, I hope you do not want that).
|
||||
|
||||
Lets see, so you suggest to
|
||||
|
||||
* SUSPEND all but swap device and parents
|
||||
* Snapshot
|
||||
* Write image to disk
|
||||
* SUSPEND swap device and parents
|
||||
* Powerdown
|
||||
|
||||
Oh no, that does not work, if swap device or its parents uses DMA,
|
||||
you've corrupted data. You'd have to do
|
||||
|
||||
* SUSPEND all but swap device and parents
|
||||
* FREEZE swap device and parents
|
||||
* Snapshot
|
||||
* UNFREEZE swap device and parents
|
||||
* Write
|
||||
* SUSPEND swap device and parents
|
||||
|
||||
Which means that you still need that FREEZE state, and you get more
|
||||
complicated code. (And I have not yet introduce details like system
|
||||
devices).
|
||||
|
||||
Q:
|
||||
There don't seem to be any generally useful behavioral
|
||||
distinctions between SUSPEND and FREEZE.
|
||||
|
||||
A:
|
||||
Doing SUSPEND when you are asked to do FREEZE is always correct,
|
||||
but it may be unnecessarily slow. If you want your driver to stay simple,
|
||||
slowness may not matter to you. It can always be fixed later.
|
||||
|
||||
For devices like disk it does matter, you do not want to spindown for
|
||||
FREEZE.
|
||||
|
||||
Q:
|
||||
After resuming, system is paging heavily, leading to very bad interactivity.
|
||||
|
||||
A:
|
||||
Try running::
|
||||
|
||||
cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u | while read file
|
||||
do
|
||||
test -f "$file" && cat "$file" > /dev/null
|
||||
done
|
||||
|
||||
after resume. swapoff -a; swapon -a may also be useful.
|
||||
|
||||
Q:
|
||||
What happens to devices during swsusp? They seem to be resumed
|
||||
during system suspend?
|
||||
|
||||
A:
|
||||
That's correct. We need to resume them if we want to write image to
|
||||
disk. Whole sequence goes like
|
||||
|
||||
**Suspend part**
|
||||
|
||||
running system, user asks for suspend-to-disk
|
||||
|
||||
user processes are stopped
|
||||
|
||||
suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
|
||||
with state snapshot
|
||||
|
||||
state snapshot: copy of whole used memory is taken with interrupts disabled
|
||||
|
||||
resume(): devices are woken up so that we can write image to swap
|
||||
|
||||
write image to swap
|
||||
|
||||
suspend(PMSG_SUSPEND): suspend devices so that we can power off
|
||||
|
||||
turn the power off
|
||||
|
||||
**Resume part**
|
||||
|
||||
(is actually pretty similar)
|
||||
|
||||
running system, user asks for suspend-to-disk
|
||||
|
||||
user processes are stopped (in common case there are none,
|
||||
but with resume-from-initrd, no one knows)
|
||||
|
||||
read image from disk
|
||||
|
||||
suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
|
||||
with image restoration
|
||||
|
||||
image restoration: rewrite memory with image
|
||||
|
||||
resume(): devices are woken up so that system can continue
|
||||
|
||||
thaw all user processes
|
||||
|
||||
Q:
|
||||
What is this 'Encrypt suspend image' for?
|
||||
|
||||
A:
|
||||
First of all: it is not a replacement for dm-crypt encrypted swap.
|
||||
It cannot protect your computer while it is suspended. Instead it does
|
||||
protect from leaking sensitive data after resume from suspend.
|
||||
|
||||
Think of the following: you suspend while an application is running
|
||||
that keeps sensitive data in memory. The application itself prevents
|
||||
the data from being swapped out. Suspend, however, must write these
|
||||
data to swap to be able to resume later on. Without suspend encryption
|
||||
your sensitive data are then stored in plaintext on disk. This means
|
||||
that after resume your sensitive data are accessible to all
|
||||
applications having direct access to the swap device which was used
|
||||
for suspend. If you don't need swap after resume these data can remain
|
||||
on disk virtually forever. Thus it can happen that your system gets
|
||||
broken in weeks later and sensitive data which you thought were
|
||||
encrypted and protected are retrieved and stolen from the swap device.
|
||||
To prevent this situation you should use 'Encrypt suspend image'.
|
||||
|
||||
During suspend a temporary key is created and this key is used to
|
||||
encrypt the data written to disk. When, during resume, the data was
|
||||
read back into memory the temporary key is destroyed which simply
|
||||
means that all data written to disk during suspend are then
|
||||
inaccessible so they can't be stolen later on. The only thing that
|
||||
you must then take care of is that you call 'mkswap' for the swap
|
||||
partition used for suspend as early as possible during regular
|
||||
boot. This asserts that any temporary key from an oopsed suspend or
|
||||
from a failed or aborted resume is erased from the swap device.
|
||||
|
||||
As a rule of thumb use encrypted swap to protect your data while your
|
||||
system is shut down or suspended. Additionally use the encrypted
|
||||
suspend image to prevent sensitive data from being stolen after
|
||||
resume.
|
||||
|
||||
Q:
|
||||
Can I suspend to a swap file?
|
||||
|
||||
A:
|
||||
Generally, yes, you can. However, it requires you to use the "resume=" and
|
||||
"resume_offset=" kernel command line parameters, so the resume from a swap file
|
||||
cannot be initiated from an initrd or initramfs image. See
|
||||
swsusp-and-swap-files.txt for details.
|
||||
|
||||
Q:
|
||||
Is there a maximum system RAM size that is supported by swsusp?
|
||||
|
||||
A:
|
||||
It should work okay with highmem.
|
||||
|
||||
Q:
|
||||
Does swsusp (to disk) use only one swap partition or can it use
|
||||
multiple swap partitions (aggregate them into one logical space)?
|
||||
|
||||
A:
|
||||
Only one swap partition, sorry.
|
||||
|
||||
Q:
|
||||
If my application(s) causes lots of memory & swap space to be used
|
||||
(over half of the total system RAM), is it correct that it is likely
|
||||
to be useless to try to suspend to disk while that app is running?
|
||||
|
||||
A:
|
||||
No, it should work okay, as long as your app does not mlock()
|
||||
it. Just prepare big enough swap partition.
|
||||
|
||||
Q:
|
||||
What information is useful for debugging suspend-to-disk problems?
|
||||
|
||||
A:
|
||||
Well, last messages on the screen are always useful. If something
|
||||
is broken, it is usually some kernel driver, therefore trying with as
|
||||
little as possible modules loaded helps a lot. I also prefer people to
|
||||
suspend from console, preferably without X running. Booting with
|
||||
init=/bin/bash, then swapon and starting suspend sequence manually
|
||||
usually does the trick. Then it is good idea to try with latest
|
||||
vanilla kernel.
|
||||
|
||||
Q:
|
||||
How can distributions ship a swsusp-supporting kernel with modular
|
||||
disk drivers (especially SATA)?
|
||||
|
||||
A:
|
||||
Well, it can be done, load the drivers, then do echo into
|
||||
/sys/power/resume file from initrd. Be sure not to mount
|
||||
anything, not even read-only mount, or you are going to lose your
|
||||
data.
|
||||
|
||||
Q:
|
||||
How do I make suspend more verbose?
|
||||
|
||||
A:
|
||||
If you want to see any non-error kernel messages on the virtual
|
||||
terminal the kernel switches to during suspend, you have to set the
|
||||
kernel console loglevel to at least 4 (KERN_WARNING), for example by
|
||||
doing::
|
||||
|
||||
# save the old loglevel
|
||||
read LOGLEVEL DUMMY < /proc/sys/kernel/printk
|
||||
# set the loglevel so we see the progress bar.
|
||||
# if the level is higher than needed, we leave it alone.
|
||||
if [ $LOGLEVEL -lt 5 ]; then
|
||||
echo 5 > /proc/sys/kernel/printk
|
||||
fi
|
||||
|
||||
IMG_SZ=0
|
||||
read IMG_SZ < /sys/power/image_size
|
||||
echo -n disk > /sys/power/state
|
||||
RET=$?
|
||||
#
|
||||
# the logic here is:
|
||||
# if image_size > 0 (without kernel support, IMG_SZ will be zero),
|
||||
# then try again with image_size set to zero.
|
||||
if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size
|
||||
echo 0 > /sys/power/image_size
|
||||
echo -n disk > /sys/power/state
|
||||
RET=$?
|
||||
fi
|
||||
|
||||
# restore previous loglevel
|
||||
echo $LOGLEVEL > /proc/sys/kernel/printk
|
||||
exit $RET
|
||||
|
||||
Q:
|
||||
Is this true that if I have a mounted filesystem on a USB device and
|
||||
I suspend to disk, I can lose data unless the filesystem has been mounted
|
||||
with "sync"?
|
||||
|
||||
A:
|
||||
That's right ... if you disconnect that device, you may lose data.
|
||||
In fact, even with "-o sync" you can lose data if your programs have
|
||||
information in buffers they haven't written out to a disk you disconnect,
|
||||
or if you disconnect before the device finished saving data you wrote.
|
||||
|
||||
Software suspend normally powers down USB controllers, which is equivalent
|
||||
to disconnecting all USB devices attached to your system.
|
||||
|
||||
Your system might well support low-power modes for its USB controllers
|
||||
while the system is asleep, maintaining the connection, using true sleep
|
||||
modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the
|
||||
/sys/power/state file; write "standby" or "mem".) We've not seen any
|
||||
hardware that can use these modes through software suspend, although in
|
||||
theory some systems might support "platform" modes that won't break the
|
||||
USB connections.
|
||||
|
||||
Remember that it's always a bad idea to unplug a disk drive containing a
|
||||
mounted filesystem. That's true even when your system is asleep! The
|
||||
safest thing is to unmount all filesystems on removable media (such USB,
|
||||
Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
|
||||
before suspending; then remount them after resuming.
|
||||
|
||||
There is a work-around for this problem. For more information, see
|
||||
Documentation/driver-api/usb/persist.rst.
|
||||
|
||||
Q:
|
||||
Can I suspend-to-disk using a swap partition under LVM?
|
||||
|
||||
A:
|
||||
Yes and No. You can suspend successfully, but the kernel will not be able
|
||||
to resume on its own. You need an initramfs that can recognize the resume
|
||||
situation, activate the logical volume containing the swap volume (but not
|
||||
touch any filesystems!), and eventually call::
|
||||
|
||||
echo -n "$major:$minor" > /sys/power/resume
|
||||
|
||||
where $major and $minor are the respective major and minor device numbers of
|
||||
the swap volume.
|
||||
|
||||
uswsusp works with LVM, too. See http://suspend.sourceforge.net/
|
||||
|
||||
Q:
|
||||
I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
|
||||
compiled with the similar configuration files. Anyway I found that
|
||||
suspend to disk (and resume) is much slower on 2.6.16 compared to
|
||||
2.6.15. Any idea for why that might happen or how can I speed it up?
|
||||
|
||||
A:
|
||||
This is because the size of the suspend image is now greater than
|
||||
for 2.6.15 (by saving more data we can get more responsive system
|
||||
after resume).
|
||||
|
||||
There's the /sys/power/image_size knob that controls the size of the
|
||||
image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as
|
||||
root), the 2.6.15 behavior should be restored. If it is still too
|
||||
slow, take a look at suspend.sf.net -- userland suspend is faster and
|
||||
supports LZF compression to speed it up further.
|
@ -1,446 +0,0 @@
|
||||
Some warnings, first.
|
||||
|
||||
* BIG FAT WARNING *********************************************************
|
||||
*
|
||||
* If you touch anything on disk between suspend and resume...
|
||||
* ...kiss your data goodbye.
|
||||
*
|
||||
* If you do resume from initrd after your filesystems are mounted...
|
||||
* ...bye bye root partition.
|
||||
* [this is actually same case as above]
|
||||
*
|
||||
* If you have unsupported (*) devices using DMA, you may have some
|
||||
* problems. If your disk driver does not support suspend... (IDE does),
|
||||
* it may cause some problems, too. If you change kernel command line
|
||||
* between suspend and resume, it may do something wrong. If you change
|
||||
* your hardware while system is suspended... well, it was not good idea;
|
||||
* but it will probably only crash.
|
||||
*
|
||||
* (*) suspend/resume support is needed to make it safe.
|
||||
*
|
||||
* If you have any filesystems on USB devices mounted before software suspend,
|
||||
* they won't be accessible after resume and you may lose data, as though
|
||||
* you have unplugged the USB devices with mounted filesystems on them;
|
||||
* see the FAQ below for details. (This is not true for more traditional
|
||||
* power states like "standby", which normally don't turn USB off.)
|
||||
|
||||
Swap partition:
|
||||
You need to append resume=/dev/your_swap_partition to kernel command
|
||||
line or specify it using /sys/power/resume.
|
||||
|
||||
Swap file:
|
||||
If using a swapfile you can also specify a resume offset using
|
||||
resume_offset=<number> on the kernel command line or specify it
|
||||
in /sys/power/resume_offset.
|
||||
|
||||
After preparing then you suspend by
|
||||
|
||||
echo shutdown > /sys/power/disk; echo disk > /sys/power/state
|
||||
|
||||
. If you feel ACPI works pretty well on your system, you might try
|
||||
|
||||
echo platform > /sys/power/disk; echo disk > /sys/power/state
|
||||
|
||||
. If you would like to write hibernation image to swap and then suspend
|
||||
to RAM (provided your platform supports it), you can try
|
||||
|
||||
echo suspend > /sys/power/disk; echo disk > /sys/power/state
|
||||
|
||||
. If you have SATA disks, you'll need recent kernels with SATA suspend
|
||||
support. For suspend and resume to work, make sure your disk drivers
|
||||
are built into kernel -- not modules. [There's way to make
|
||||
suspend/resume with modular disk drivers, see FAQ, but you probably
|
||||
should not do that.]
|
||||
|
||||
If you want to limit the suspend image size to N bytes, do
|
||||
|
||||
echo N > /sys/power/image_size
|
||||
|
||||
before suspend (it is limited to around 2/5 of available RAM by default).
|
||||
|
||||
. The resume process checks for the presence of the resume device,
|
||||
if found, it then checks the contents for the hibernation image signature.
|
||||
If both are found, it resumes the hibernation image.
|
||||
|
||||
. The resume process may be triggered in two ways:
|
||||
1) During lateinit: If resume=/dev/your_swap_partition is specified on
|
||||
the kernel command line, lateinit runs the resume process. If the
|
||||
resume device has not been probed yet, the resume process fails and
|
||||
bootup continues.
|
||||
2) Manually from an initrd or initramfs: May be run from
|
||||
the init script by using the /sys/power/resume file. It is vital
|
||||
that this be done prior to remounting any filesystems (even as
|
||||
read-only) otherwise data may be corrupted.
|
||||
|
||||
Article about goals and implementation of Software Suspend for Linux
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Author: Gábor Kuti
|
||||
Last revised: 2003-10-20 by Pavel Machek
|
||||
|
||||
Idea and goals to achieve
|
||||
|
||||
Nowadays it is common in several laptops that they have a suspend button. It
|
||||
saves the state of the machine to a filesystem or to a partition and switches
|
||||
to standby mode. Later resuming the machine the saved state is loaded back to
|
||||
ram and the machine can continue its work. It has two real benefits. First we
|
||||
save ourselves the time machine goes down and later boots up, energy costs
|
||||
are real high when running from batteries. The other gain is that we don't have to
|
||||
interrupt our programs so processes that are calculating something for a long
|
||||
time shouldn't need to be written interruptible.
|
||||
|
||||
swsusp saves the state of the machine into active swaps and then reboots or
|
||||
powerdowns. You must explicitly specify the swap partition to resume from with
|
||||
``resume='' kernel option. If signature is found it loads and restores saved
|
||||
state. If the option ``noresume'' is specified as a boot parameter, it skips
|
||||
the resuming. If the option ``hibernate=nocompress'' is specified as a boot
|
||||
parameter, it saves hibernation image without compression.
|
||||
|
||||
In the meantime while the system is suspended you should not add/remove any
|
||||
of the hardware, write to the filesystems, etc.
|
||||
|
||||
Sleep states summary
|
||||
====================
|
||||
|
||||
There are three different interfaces you can use, /proc/acpi should
|
||||
work like this:
|
||||
|
||||
In a really perfect world:
|
||||
echo 1 > /proc/acpi/sleep # for standby
|
||||
echo 2 > /proc/acpi/sleep # for suspend to ram
|
||||
echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative
|
||||
echo 4 > /proc/acpi/sleep # for suspend to disk
|
||||
echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system
|
||||
|
||||
and perhaps
|
||||
echo 4b > /proc/acpi/sleep # for suspend to disk via s4bios
|
||||
|
||||
Frequently Asked Questions
|
||||
==========================
|
||||
|
||||
Q: well, suspending a server is IMHO a really stupid thing,
|
||||
but... (Diego Zuccato):
|
||||
|
||||
A: You bought new UPS for your server. How do you install it without
|
||||
bringing machine down? Suspend to disk, rearrange power cables,
|
||||
resume.
|
||||
|
||||
You have your server on UPS. Power died, and UPS is indicating 30
|
||||
seconds to failure. What do you do? Suspend to disk.
|
||||
|
||||
|
||||
Q: Maybe I'm missing something, but why don't the regular I/O paths work?
|
||||
|
||||
A: We do use the regular I/O paths. However we cannot restore the data
|
||||
to its original location as we load it. That would create an
|
||||
inconsistent kernel state which would certainly result in an oops.
|
||||
Instead, we load the image into unused memory and then atomically copy
|
||||
it back to it original location. This implies, of course, a maximum
|
||||
image size of half the amount of memory.
|
||||
|
||||
There are two solutions to this:
|
||||
|
||||
* require half of memory to be free during suspend. That way you can
|
||||
read "new" data onto free spots, then cli and copy
|
||||
|
||||
* assume we had special "polling" ide driver that only uses memory
|
||||
between 0-640KB. That way, I'd have to make sure that 0-640KB is free
|
||||
during suspending, but otherwise it would work...
|
||||
|
||||
suspend2 shares this fundamental limitation, but does not include user
|
||||
data and disk caches into "used memory" by saving them in
|
||||
advance. That means that the limitation goes away in practice.
|
||||
|
||||
Q: Does linux support ACPI S4?
|
||||
|
||||
A: Yes. That's what echo platform > /sys/power/disk does.
|
||||
|
||||
Q: What is 'suspend2'?
|
||||
|
||||
A: suspend2 is 'Software Suspend 2', a forked implementation of
|
||||
suspend-to-disk which is available as separate patches for 2.4 and 2.6
|
||||
kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB
|
||||
highmem and preemption. It also has a extensible architecture that
|
||||
allows for arbitrary transformations on the image (compression,
|
||||
encryption) and arbitrary backends for writing the image (eg to swap
|
||||
or an NFS share[Work In Progress]). Questions regarding suspend2
|
||||
should be sent to the mailing list available through the suspend2
|
||||
website, and not to the Linux Kernel Mailing List. We are working
|
||||
toward merging suspend2 into the mainline kernel.
|
||||
|
||||
Q: What is the freezing of tasks and why are we using it?
|
||||
|
||||
A: The freezing of tasks is a mechanism by which user space processes and some
|
||||
kernel threads are controlled during hibernation or system-wide suspend (on some
|
||||
architectures). See freezing-of-tasks.txt for details.
|
||||
|
||||
Q: What is the difference between "platform" and "shutdown"?
|
||||
|
||||
A:
|
||||
|
||||
shutdown: save state in linux, then tell bios to powerdown
|
||||
|
||||
platform: save state in linux, then tell bios to powerdown and blink
|
||||
"suspended led"
|
||||
|
||||
"platform" is actually right thing to do where supported, but
|
||||
"shutdown" is most reliable (except on ACPI systems).
|
||||
|
||||
Q: I do not understand why you have such strong objections to idea of
|
||||
selective suspend.
|
||||
|
||||
A: Do selective suspend during runtime power management, that's okay. But
|
||||
it's useless for suspend-to-disk. (And I do not see how you could use
|
||||
it for suspend-to-ram, I hope you do not want that).
|
||||
|
||||
Lets see, so you suggest to
|
||||
|
||||
* SUSPEND all but swap device and parents
|
||||
* Snapshot
|
||||
* Write image to disk
|
||||
* SUSPEND swap device and parents
|
||||
* Powerdown
|
||||
|
||||
Oh no, that does not work, if swap device or its parents uses DMA,
|
||||
you've corrupted data. You'd have to do
|
||||
|
||||
* SUSPEND all but swap device and parents
|
||||
* FREEZE swap device and parents
|
||||
* Snapshot
|
||||
* UNFREEZE swap device and parents
|
||||
* Write
|
||||
* SUSPEND swap device and parents
|
||||
|
||||
Which means that you still need that FREEZE state, and you get more
|
||||
complicated code. (And I have not yet introduce details like system
|
||||
devices).
|
||||
|
||||
Q: There don't seem to be any generally useful behavioral
|
||||
distinctions between SUSPEND and FREEZE.
|
||||
|
||||
A: Doing SUSPEND when you are asked to do FREEZE is always correct,
|
||||
but it may be unnecessarily slow. If you want your driver to stay simple,
|
||||
slowness may not matter to you. It can always be fixed later.
|
||||
|
||||
For devices like disk it does matter, you do not want to spindown for
|
||||
FREEZE.
|
||||
|
||||
Q: After resuming, system is paging heavily, leading to very bad interactivity.
|
||||
|
||||
A: Try running
|
||||
|
||||
cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u | while read file
|
||||
do
|
||||
test -f "$file" && cat "$file" > /dev/null
|
||||
done
|
||||
|
||||
after resume. swapoff -a; swapon -a may also be useful.
|
||||
|
||||
Q: What happens to devices during swsusp? They seem to be resumed
|
||||
during system suspend?
|
||||
|
||||
A: That's correct. We need to resume them if we want to write image to
|
||||
disk. Whole sequence goes like
|
||||
|
||||
Suspend part
|
||||
~~~~~~~~~~~~
|
||||
running system, user asks for suspend-to-disk
|
||||
|
||||
user processes are stopped
|
||||
|
||||
suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
|
||||
with state snapshot
|
||||
|
||||
state snapshot: copy of whole used memory is taken with interrupts disabled
|
||||
|
||||
resume(): devices are woken up so that we can write image to swap
|
||||
|
||||
write image to swap
|
||||
|
||||
suspend(PMSG_SUSPEND): suspend devices so that we can power off
|
||||
|
||||
turn the power off
|
||||
|
||||
Resume part
|
||||
~~~~~~~~~~~
|
||||
(is actually pretty similar)
|
||||
|
||||
running system, user asks for suspend-to-disk
|
||||
|
||||
user processes are stopped (in common case there are none, but with resume-from-initrd, no one knows)
|
||||
|
||||
read image from disk
|
||||
|
||||
suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
|
||||
with image restoration
|
||||
|
||||
image restoration: rewrite memory with image
|
||||
|
||||
resume(): devices are woken up so that system can continue
|
||||
|
||||
thaw all user processes
|
||||
|
||||
Q: What is this 'Encrypt suspend image' for?
|
||||
|
||||
A: First of all: it is not a replacement for dm-crypt encrypted swap.
|
||||
It cannot protect your computer while it is suspended. Instead it does
|
||||
protect from leaking sensitive data after resume from suspend.
|
||||
|
||||
Think of the following: you suspend while an application is running
|
||||
that keeps sensitive data in memory. The application itself prevents
|
||||
the data from being swapped out. Suspend, however, must write these
|
||||
data to swap to be able to resume later on. Without suspend encryption
|
||||
your sensitive data are then stored in plaintext on disk. This means
|
||||
that after resume your sensitive data are accessible to all
|
||||
applications having direct access to the swap device which was used
|
||||
for suspend. If you don't need swap after resume these data can remain
|
||||
on disk virtually forever. Thus it can happen that your system gets
|
||||
broken in weeks later and sensitive data which you thought were
|
||||
encrypted and protected are retrieved and stolen from the swap device.
|
||||
To prevent this situation you should use 'Encrypt suspend image'.
|
||||
|
||||
During suspend a temporary key is created and this key is used to
|
||||
encrypt the data written to disk. When, during resume, the data was
|
||||
read back into memory the temporary key is destroyed which simply
|
||||
means that all data written to disk during suspend are then
|
||||
inaccessible so they can't be stolen later on. The only thing that
|
||||
you must then take care of is that you call 'mkswap' for the swap
|
||||
partition used for suspend as early as possible during regular
|
||||
boot. This asserts that any temporary key from an oopsed suspend or
|
||||
from a failed or aborted resume is erased from the swap device.
|
||||
|
||||
As a rule of thumb use encrypted swap to protect your data while your
|
||||
system is shut down or suspended. Additionally use the encrypted
|
||||
suspend image to prevent sensitive data from being stolen after
|
||||
resume.
|
||||
|
||||
Q: Can I suspend to a swap file?
|
||||
|
||||
A: Generally, yes, you can. However, it requires you to use the "resume=" and
|
||||
"resume_offset=" kernel command line parameters, so the resume from a swap file
|
||||
cannot be initiated from an initrd or initramfs image. See
|
||||
swsusp-and-swap-files.txt for details.
|
||||
|
||||
Q: Is there a maximum system RAM size that is supported by swsusp?
|
||||
|
||||
A: It should work okay with highmem.
|
||||
|
||||
Q: Does swsusp (to disk) use only one swap partition or can it use
|
||||
multiple swap partitions (aggregate them into one logical space)?
|
||||
|
||||
A: Only one swap partition, sorry.
|
||||
|
||||
Q: If my application(s) causes lots of memory & swap space to be used
|
||||
(over half of the total system RAM), is it correct that it is likely
|
||||
to be useless to try to suspend to disk while that app is running?
|
||||
|
||||
A: No, it should work okay, as long as your app does not mlock()
|
||||
it. Just prepare big enough swap partition.
|
||||
|
||||
Q: What information is useful for debugging suspend-to-disk problems?
|
||||
|
||||
A: Well, last messages on the screen are always useful. If something
|
||||
is broken, it is usually some kernel driver, therefore trying with as
|
||||
little as possible modules loaded helps a lot. I also prefer people to
|
||||
suspend from console, preferably without X running. Booting with
|
||||
init=/bin/bash, then swapon and starting suspend sequence manually
|
||||
usually does the trick. Then it is good idea to try with latest
|
||||
vanilla kernel.
|
||||
|
||||
Q: How can distributions ship a swsusp-supporting kernel with modular
|
||||
disk drivers (especially SATA)?
|
||||
|
||||
A: Well, it can be done, load the drivers, then do echo into
|
||||
/sys/power/resume file from initrd. Be sure not to mount
|
||||
anything, not even read-only mount, or you are going to lose your
|
||||
data.
|
||||
|
||||
Q: How do I make suspend more verbose?
|
||||
|
||||
A: If you want to see any non-error kernel messages on the virtual
|
||||
terminal the kernel switches to during suspend, you have to set the
|
||||
kernel console loglevel to at least 4 (KERN_WARNING), for example by
|
||||
doing
|
||||
|
||||
# save the old loglevel
|
||||
read LOGLEVEL DUMMY < /proc/sys/kernel/printk
|
||||
# set the loglevel so we see the progress bar.
|
||||
# if the level is higher than needed, we leave it alone.
|
||||
if [ $LOGLEVEL -lt 5 ]; then
|
||||
echo 5 > /proc/sys/kernel/printk
|
||||
fi
|
||||
|
||||
IMG_SZ=0
|
||||
read IMG_SZ < /sys/power/image_size
|
||||
echo -n disk > /sys/power/state
|
||||
RET=$?
|
||||
#
|
||||
# the logic here is:
|
||||
# if image_size > 0 (without kernel support, IMG_SZ will be zero),
|
||||
# then try again with image_size set to zero.
|
||||
if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size
|
||||
echo 0 > /sys/power/image_size
|
||||
echo -n disk > /sys/power/state
|
||||
RET=$?
|
||||
fi
|
||||
|
||||
# restore previous loglevel
|
||||
echo $LOGLEVEL > /proc/sys/kernel/printk
|
||||
exit $RET
|
||||
|
||||
Q: Is this true that if I have a mounted filesystem on a USB device and
|
||||
I suspend to disk, I can lose data unless the filesystem has been mounted
|
||||
with "sync"?
|
||||
|
||||
A: That's right ... if you disconnect that device, you may lose data.
|
||||
In fact, even with "-o sync" you can lose data if your programs have
|
||||
information in buffers they haven't written out to a disk you disconnect,
|
||||
or if you disconnect before the device finished saving data you wrote.
|
||||
|
||||
Software suspend normally powers down USB controllers, which is equivalent
|
||||
to disconnecting all USB devices attached to your system.
|
||||
|
||||
Your system might well support low-power modes for its USB controllers
|
||||
while the system is asleep, maintaining the connection, using true sleep
|
||||
modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the
|
||||
/sys/power/state file; write "standby" or "mem".) We've not seen any
|
||||
hardware that can use these modes through software suspend, although in
|
||||
theory some systems might support "platform" modes that won't break the
|
||||
USB connections.
|
||||
|
||||
Remember that it's always a bad idea to unplug a disk drive containing a
|
||||
mounted filesystem. That's true even when your system is asleep! The
|
||||
safest thing is to unmount all filesystems on removable media (such USB,
|
||||
Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
|
||||
before suspending; then remount them after resuming.
|
||||
|
||||
There is a work-around for this problem. For more information, see
|
||||
Documentation/driver-api/usb/persist.rst.
|
||||
|
||||
Q: Can I suspend-to-disk using a swap partition under LVM?
|
||||
|
||||
A: Yes and No. You can suspend successfully, but the kernel will not be able
|
||||
to resume on its own. You need an initramfs that can recognize the resume
|
||||
situation, activate the logical volume containing the swap volume (but not
|
||||
touch any filesystems!), and eventually call
|
||||
|
||||
echo -n "$major:$minor" > /sys/power/resume
|
||||
|
||||
where $major and $minor are the respective major and minor device numbers of
|
||||
the swap volume.
|
||||
|
||||
uswsusp works with LVM, too. See http://suspend.sourceforge.net/
|
||||
|
||||
Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
|
||||
compiled with the similar configuration files. Anyway I found that
|
||||
suspend to disk (and resume) is much slower on 2.6.16 compared to
|
||||
2.6.15. Any idea for why that might happen or how can I speed it up?
|
||||
|
||||
A: This is because the size of the suspend image is now greater than
|
||||
for 2.6.15 (by saving more data we can get more responsive system
|
||||
after resume).
|
||||
|
||||
There's the /sys/power/image_size knob that controls the size of the
|
||||
image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as
|
||||
root), the 2.6.15 behavior should be restored. If it is still too
|
||||
slow, take a look at suspend.sf.net -- userland suspend is faster and
|
||||
supports LZF compression to speed it up further.
|
@ -1,5 +1,7 @@
|
||||
swsusp/S3 tricks
|
||||
~~~~~~~~~~~~~~~~
|
||||
================
|
||||
swsusp/S3 tricks
|
||||
================
|
||||
|
||||
Pavel Machek <pavel@ucw.cz>
|
||||
|
||||
If you want to trick swsusp/S3 into working, you might want to try:
|
@ -1,4 +1,7 @@
|
||||
=====================================================
|
||||
Documentation for userland software suspend interface
|
||||
=====================================================
|
||||
|
||||
(C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
|
||||
|
||||
First, the warnings at the beginning of swsusp.txt still apply.
|
||||
@ -30,13 +33,16 @@ called.
|
||||
|
||||
The ioctl() commands recognized by the device are:
|
||||
|
||||
SNAPSHOT_FREEZE - freeze user space processes (the current process is
|
||||
SNAPSHOT_FREEZE
|
||||
freeze user space processes (the current process is
|
||||
not frozen); this is required for SNAPSHOT_CREATE_IMAGE
|
||||
and SNAPSHOT_ATOMIC_RESTORE to succeed
|
||||
|
||||
SNAPSHOT_UNFREEZE - thaw user space processes frozen by SNAPSHOT_FREEZE
|
||||
SNAPSHOT_UNFREEZE
|
||||
thaw user space processes frozen by SNAPSHOT_FREEZE
|
||||
|
||||
SNAPSHOT_CREATE_IMAGE - create a snapshot of the system memory; the
|
||||
SNAPSHOT_CREATE_IMAGE
|
||||
create a snapshot of the system memory; the
|
||||
last argument of ioctl() should be a pointer to an int variable,
|
||||
the value of which will indicate whether the call returned after
|
||||
creating the snapshot (1) or after restoring the system memory state
|
||||
@ -45,48 +51,59 @@ SNAPSHOT_CREATE_IMAGE - create a snapshot of the system memory; the
|
||||
has been created the read() operation can be used to transfer
|
||||
it out of the kernel
|
||||
|
||||
SNAPSHOT_ATOMIC_RESTORE - restore the system memory state from the
|
||||
SNAPSHOT_ATOMIC_RESTORE
|
||||
restore the system memory state from the
|
||||
uploaded snapshot image; before calling it you should transfer
|
||||
the system memory snapshot back to the kernel using the write()
|
||||
operation; this call will not succeed if the snapshot
|
||||
image is not available to the kernel
|
||||
|
||||
SNAPSHOT_FREE - free memory allocated for the snapshot image
|
||||
SNAPSHOT_FREE
|
||||
free memory allocated for the snapshot image
|
||||
|
||||
SNAPSHOT_PREF_IMAGE_SIZE - set the preferred maximum size of the image
|
||||
SNAPSHOT_PREF_IMAGE_SIZE
|
||||
set the preferred maximum size of the image
|
||||
(the kernel will do its best to ensure the image size will not exceed
|
||||
this number, but if it turns out to be impossible, the kernel will
|
||||
create the smallest image possible)
|
||||
|
||||
SNAPSHOT_GET_IMAGE_SIZE - return the actual size of the hibernation image
|
||||
SNAPSHOT_GET_IMAGE_SIZE
|
||||
return the actual size of the hibernation image
|
||||
|
||||
SNAPSHOT_AVAIL_SWAP_SIZE - return the amount of available swap in bytes (the
|
||||
SNAPSHOT_AVAIL_SWAP_SIZE
|
||||
return the amount of available swap in bytes (the
|
||||
last argument should be a pointer to an unsigned int variable that will
|
||||
contain the result if the call is successful).
|
||||
|
||||
SNAPSHOT_ALLOC_SWAP_PAGE - allocate a swap page from the resume partition
|
||||
SNAPSHOT_ALLOC_SWAP_PAGE
|
||||
allocate a swap page from the resume partition
|
||||
(the last argument should be a pointer to a loff_t variable that
|
||||
will contain the swap page offset if the call is successful)
|
||||
|
||||
SNAPSHOT_FREE_SWAP_PAGES - free all swap pages allocated by
|
||||
SNAPSHOT_FREE_SWAP_PAGES
|
||||
free all swap pages allocated by
|
||||
SNAPSHOT_ALLOC_SWAP_PAGE
|
||||
|
||||
SNAPSHOT_SET_SWAP_AREA - set the resume partition and the offset (in <PAGE_SIZE>
|
||||
SNAPSHOT_SET_SWAP_AREA
|
||||
set the resume partition and the offset (in <PAGE_SIZE>
|
||||
units) from the beginning of the partition at which the swap header is
|
||||
located (the last ioctl() argument should point to a struct
|
||||
resume_swap_area, as defined in kernel/power/suspend_ioctls.h,
|
||||
containing the resume device specification and the offset); for swap
|
||||
partitions the offset is always 0, but it is different from zero for
|
||||
swap files (see Documentation/power/swsusp-and-swap-files.txt for
|
||||
swap files (see Documentation/power/swsusp-and-swap-files.rst for
|
||||
details).
|
||||
|
||||
SNAPSHOT_PLATFORM_SUPPORT - enable/disable the hibernation platform support,
|
||||
SNAPSHOT_PLATFORM_SUPPORT
|
||||
enable/disable the hibernation platform support,
|
||||
depending on the argument value (enable, if the argument is nonzero)
|
||||
|
||||
SNAPSHOT_POWER_OFF - make the kernel transition the system to the hibernation
|
||||
SNAPSHOT_POWER_OFF
|
||||
make the kernel transition the system to the hibernation
|
||||
state (eg. ACPI S4) using the platform (eg. ACPI) driver
|
||||
|
||||
SNAPSHOT_S2RAM - suspend to RAM; using this call causes the kernel to
|
||||
SNAPSHOT_S2RAM
|
||||
suspend to RAM; using this call causes the kernel to
|
||||
immediately enter the suspend-to-RAM state, so this call must always
|
||||
be preceded by the SNAPSHOT_FREEZE call and it is also necessary
|
||||
to use the SNAPSHOT_UNFREEZE call after the system wakes up. This call
|
||||
@ -98,10 +115,11 @@ SNAPSHOT_S2RAM - suspend to RAM; using this call causes the kernel to
|
||||
|
||||
The device's read() operation can be used to transfer the snapshot image from
|
||||
the kernel. It has the following limitations:
|
||||
|
||||
- you cannot read() more than one virtual memory page at a time
|
||||
- read()s across page boundaries are impossible (ie. if you read() 1/2 of
|
||||
a page in the previous call, you will only be able to read()
|
||||
_at_ _most_ 1/2 of the page in the next call)
|
||||
a page in the previous call, you will only be able to read()
|
||||
**at most** 1/2 of the page in the next call)
|
||||
|
||||
The device's write() operation is used for uploading the system memory snapshot
|
||||
into the kernel. It has the same limitations as the read() operation.
|
||||
@ -143,8 +161,10 @@ preferably using mlockall(), before calling SNAPSHOT_FREEZE.
|
||||
The suspending utility MUST check the value stored by SNAPSHOT_CREATE_IMAGE
|
||||
in the memory location pointed to by the last argument of ioctl() and proceed
|
||||
in accordance with it:
|
||||
|
||||
1. If the value is 1 (ie. the system memory snapshot has just been
|
||||
created and the system is ready for saving it):
|
||||
|
||||
(a) The suspending utility MUST NOT close the snapshot device
|
||||
_unless_ the whole suspend procedure is to be cancelled, in
|
||||
which case, if the snapshot image has already been saved, the
|
||||
@ -158,6 +178,7 @@ in accordance with it:
|
||||
called. However, it MAY mount a file system that was not
|
||||
mounted at that time and perform some operations on it (eg.
|
||||
use it for saving the image).
|
||||
|
||||
2. If the value is 0 (ie. the system state has just been restored from
|
||||
the snapshot image), the suspending utility MUST close the snapshot
|
||||
device. Afterwards it will be treated as a regular userland process,
|
@ -1,7 +1,8 @@
|
||||
===========================
|
||||
Video issues with S3 resume
|
||||
===========================
|
||||
|
||||
Video issues with S3 resume
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
2003-2006, Pavel Machek
|
||||
2003-2006, Pavel Machek
|
||||
|
||||
During S3 resume, hardware needs to be reinitialized. For most
|
||||
devices, this is easy, and kernel driver knows how to do
|
||||
@ -41,37 +42,37 @@ There are a few types of systems where video works after S3 resume:
|
||||
(1) systems where video state is preserved over S3.
|
||||
|
||||
(2) systems where it is possible to call the video BIOS during S3
|
||||
resume. Unfortunately, it is not correct to call the video BIOS at
|
||||
that point, but it happens to work on some machines. Use
|
||||
acpi_sleep=s3_bios.
|
||||
resume. Unfortunately, it is not correct to call the video BIOS at
|
||||
that point, but it happens to work on some machines. Use
|
||||
acpi_sleep=s3_bios.
|
||||
|
||||
(3) systems that initialize video card into vga text mode and where
|
||||
the BIOS works well enough to be able to set video mode. Use
|
||||
acpi_sleep=s3_mode on these.
|
||||
the BIOS works well enough to be able to set video mode. Use
|
||||
acpi_sleep=s3_mode on these.
|
||||
|
||||
(4) on some systems s3_bios kicks video into text mode, and
|
||||
acpi_sleep=s3_bios,s3_mode is needed.
|
||||
acpi_sleep=s3_bios,s3_mode is needed.
|
||||
|
||||
(5) radeon systems, where X can soft-boot your video card. You'll need
|
||||
a new enough X, and a plain text console (no vesafb or radeonfb). See
|
||||
http://www.doesi.gmxhome.de/linux/tm800s3/s3.html for more information.
|
||||
Alternatively, you should use vbetool (6) instead.
|
||||
a new enough X, and a plain text console (no vesafb or radeonfb). See
|
||||
http://www.doesi.gmxhome.de/linux/tm800s3/s3.html for more information.
|
||||
Alternatively, you should use vbetool (6) instead.
|
||||
|
||||
(6) other radeon systems, where vbetool is enough to bring system back
|
||||
to life. It needs text console to be working. Do vbetool vbestate
|
||||
save > /tmp/delme; echo 3 > /proc/acpi/sleep; vbetool post; vbetool
|
||||
vbestate restore < /tmp/delme; setfont <whatever>, and your video
|
||||
should work.
|
||||
to life. It needs text console to be working. Do vbetool vbestate
|
||||
save > /tmp/delme; echo 3 > /proc/acpi/sleep; vbetool post; vbetool
|
||||
vbestate restore < /tmp/delme; setfont <whatever>, and your video
|
||||
should work.
|
||||
|
||||
(7) on some systems, it is possible to boot most of kernel, and then
|
||||
POSTing bios works. Ole Rohne has patch to do just that at
|
||||
http://dev.gentoo.org/~marineam/patch-radeonfb-2.6.11-rc2-mm2.
|
||||
POSTing bios works. Ole Rohne has patch to do just that at
|
||||
http://dev.gentoo.org/~marineam/patch-radeonfb-2.6.11-rc2-mm2.
|
||||
|
||||
(8) on some systems, you can use the video_post utility and or
|
||||
do echo 3 > /sys/power/state && /usr/sbin/video_post - which will
|
||||
initialize the display in console mode. If you are in X, you can switch
|
||||
to a virtual terminal and back to X using CTRL+ALT+F1 - CTRL+ALT+F7 to get
|
||||
the display working in graphical mode again.
|
||||
(8) on some systems, you can use the video_post utility and or
|
||||
do echo 3 > /sys/power/state && /usr/sbin/video_post - which will
|
||||
initialize the display in console mode. If you are in X, you can switch
|
||||
to a virtual terminal and back to X using CTRL+ALT+F1 - CTRL+ALT+F7 to get
|
||||
the display working in graphical mode again.
|
||||
|
||||
Now, if you pass acpi_sleep=something, and it does not work with your
|
||||
bios, you'll get a hard crash during resume. Be careful. Also it is
|
||||
@ -87,99 +88,126 @@ chance of working.
|
||||
|
||||
Table of known working notebooks:
|
||||
|
||||
|
||||
=============================== ===============================================
|
||||
Model hack (or "how to do it")
|
||||
------------------------------------------------------------------------------
|
||||
=============================== ===============================================
|
||||
Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI
|
||||
Acer TM 230 s3_bios (2)
|
||||
Acer TM 242FX vbetool (6)
|
||||
Acer TM C110 video_post (8)
|
||||
Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8)
|
||||
Acer TM C300 vga=normal (only suspend on console, not in X),
|
||||
vbetool (6) or video_post (8)
|
||||
Acer TM 4052LCi s3_bios (2)
|
||||
Acer TM 636Lci s3_bios,s3_mode (4)
|
||||
Acer TM 650 (Radeon M7) vga=normal plus boot-radeon (5) gets text console back
|
||||
Acer TM 660 ??? (*)
|
||||
Acer TM 800 vga=normal, X patches, see webpage (5) or vbetool (6)
|
||||
Acer TM 803 vga=normal, X patches, see webpage (5) or vbetool (6)
|
||||
Acer TM 650 (Radeon M7) vga=normal plus boot-radeon (5) gets text
|
||||
console back
|
||||
Acer TM 660 ??? [#f1]_
|
||||
Acer TM 800 vga=normal, X patches, see webpage (5)
|
||||
or vbetool (6)
|
||||
Acer TM 803 vga=normal, X patches, see webpage (5)
|
||||
or vbetool (6)
|
||||
Acer TM 803LCi vga=normal, vbetool (6)
|
||||
Arima W730a vbetool needed (6)
|
||||
Asus L2400D s3_mode (3)(***) (S1 also works OK)
|
||||
Asus L2400D s3_mode (3) [#f2]_ (S1 also works OK)
|
||||
Asus L3350M (SiS 740) (6)
|
||||
Asus L3800C (Radeon M7) s3_bios (2) (S1 also works OK)
|
||||
Asus M6887Ne vga=normal, s3_bios (2), use radeon driver instead of fglrx in x.org
|
||||
Asus M6887Ne vga=normal, s3_bios (2), use radeon driver
|
||||
instead of fglrx in x.org
|
||||
Athlon64 desktop prototype s3_bios (2)
|
||||
Compal CL-50 ??? (*)
|
||||
Compal CL-50 ??? [#f1]_
|
||||
Compaq Armada E500 - P3-700 none (1) (S1 also works OK)
|
||||
Compaq Evo N620c vga=normal, s3_bios (2)
|
||||
Dell 600m, ATI R250 Lf none (1), but needs xorg-x11-6.8.1.902-1
|
||||
Dell D600, ATI RV250 vga=normal and X, or try vbestate (6)
|
||||
Dell D610 vga=normal and X (possibly vbestate (6) too, but not tested)
|
||||
Dell Inspiron 4000 ??? (*)
|
||||
Dell Inspiron 500m ??? (*)
|
||||
Dell D610 vga=normal and X (possibly vbestate (6) too,
|
||||
but not tested)
|
||||
Dell Inspiron 4000 ??? [#f1]_
|
||||
Dell Inspiron 500m ??? [#f1]_
|
||||
Dell Inspiron 510m ???
|
||||
Dell Inspiron 5150 vbetool needed (6)
|
||||
Dell Inspiron 600m ??? (*)
|
||||
Dell Inspiron 8200 ??? (*)
|
||||
Dell Inspiron 8500 ??? (*)
|
||||
Dell Inspiron 8600 ??? (*)
|
||||
eMachines athlon64 machines vbetool needed (6) (someone please get me model #s)
|
||||
HP NC6000 s3_bios, may not use radeonfb (2); or vbetool (6)
|
||||
HP NX7000 ??? (*)
|
||||
HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X
|
||||
Dell Inspiron 600m ??? [#f1]_
|
||||
Dell Inspiron 8200 ??? [#f1]_
|
||||
Dell Inspiron 8500 ??? [#f1]_
|
||||
Dell Inspiron 8600 ??? [#f1]_
|
||||
eMachines athlon64 machines vbetool needed (6) (someone please get
|
||||
me model #s)
|
||||
HP NC6000 s3_bios, may not use radeonfb (2);
|
||||
or vbetool (6)
|
||||
HP NX7000 ??? [#f1]_
|
||||
HP Pavilion ZD7000 vbetool post needed, need open-source nv
|
||||
driver for X
|
||||
HP Omnibook XE3 athlon version none (1)
|
||||
HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV
|
||||
HP Omnibook XE3L-GF vbetool (6)
|
||||
HP Omnibook 5150 none (1), (S1 also works OK)
|
||||
IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work.
|
||||
IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(]
|
||||
IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294
|
||||
Savage/IX-MV, vesafb gets "interesting"
|
||||
but X work.
|
||||
IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with
|
||||
BIOS 1.04 2002-08-23, but not at all with
|
||||
BIOS 1.11 2004-11-05 :-(]
|
||||
IBM TP R32 / Type 2658-MMG none (1)
|
||||
IBM TP R40 2722B3G ??? (*)
|
||||
IBM TP R40 2722B3G ??? [#f1]_
|
||||
IBM TP R50p / Type 1832-22U s3_bios (2)
|
||||
IBM TP R51 none (1)
|
||||
IBM TP T30 236681A ??? (*)
|
||||
IBM TP T30 236681A ??? [#f1]_
|
||||
IBM TP T40 / Type 2373-MU4 none (1)
|
||||
IBM TP T40p none (1)
|
||||
IBM TP R40p s3_bios (2)
|
||||
IBM TP T41p s3_bios (2), switch to X after resume
|
||||
IBM TP T42 s3_bios (2)
|
||||
IBM ThinkPad T42p (2373-GTG) s3_bios (2)
|
||||
IBM TP X20 ??? (*)
|
||||
IBM TP X20 ??? [#f1]_
|
||||
IBM TP X30 s3_bios, s3_mode (4)
|
||||
IBM TP X31 / Type 2672-XXH none (1), use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
|
||||
IBM TP X32 none (1), but backlight is on and video is trashed after long suspend. s3_bios,s3_mode (4) works too. Perhaps that gets better results?
|
||||
IBM TP X31 / Type 2672-XXH none (1), use radeontool
|
||||
(http://fdd.com/software/radeon/) to
|
||||
turn off backlight.
|
||||
IBM TP X32 none (1), but backlight is on and video is
|
||||
trashed after long suspend. s3_bios,
|
||||
s3_mode (4) works too. Perhaps that gets
|
||||
better results?
|
||||
IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4)
|
||||
IBM TP 600e none(1), but a switch to console and back to X is needed
|
||||
Medion MD4220 ??? (*)
|
||||
IBM TP 600e none(1), but a switch to console and
|
||||
back to X is needed
|
||||
Medion MD4220 ??? [#f1]_
|
||||
Samsung P35 vbetool needed (6)
|
||||
Sharp PC-AR10 (ATI rage) none (1), backlight does not switch off
|
||||
Sony Vaio PCG-C1VRX/K s3_bios (2)
|
||||
Sony Vaio PCG-F403 ??? (*)
|
||||
Sony Vaio PCG-F403 ??? [#f1]_
|
||||
Sony Vaio PCG-GRT995MP none (1), works with 'nv' X driver
|
||||
Sony Vaio PCG-GR7/K none (1), but needs radeonfb, use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
|
||||
Sony Vaio PCG-N505SN ??? (*)
|
||||
Sony Vaio PCG-GR7/K none (1), but needs radeonfb, use
|
||||
radeontool (http://fdd.com/software/radeon/)
|
||||
to turn off backlight.
|
||||
Sony Vaio PCG-N505SN ??? [#f1]_
|
||||
Sony Vaio vgn-s260 X or boot-radeon can init it (5)
|
||||
Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X.
|
||||
Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will
|
||||
be blank unless you return to X.
|
||||
Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4)
|
||||
Toshiba Libretto L5 none (1)
|
||||
Toshiba Libretto 100CT/110CT vbetool (6)
|
||||
Toshiba Portege 3020CT s3_mode (3)
|
||||
Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK)
|
||||
Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK)
|
||||
Toshiba Satellite 4090XCDT ??? (*)
|
||||
Toshiba Satellite P10-554 s3_bios,s3_mode (4)(****)
|
||||
Toshiba Satellite 4090XCDT ??? [#f1]_
|
||||
Toshiba Satellite P10-554 s3_bios,s3_mode (4)[#f3]_
|
||||
Toshiba M30 (2) xor X with nvidia driver using internal AGP
|
||||
Uniwill 244IIO ??? (*)
|
||||
Uniwill 244IIO ??? [#f1]_
|
||||
=============================== ===============================================
|
||||
|
||||
Known working desktop systems
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
=================== ============================= ========================
|
||||
Mainboard Graphics card hack (or "how to do it")
|
||||
------------------------------------------------------------------------------
|
||||
=================== ============================= ========================
|
||||
Asus A7V8X nVidia RIVA TNT2 model 64 s3_bios,s3_mode (4)
|
||||
=================== ============================= ========================
|
||||
|
||||
|
||||
(*) from https://wiki.ubuntu.com/HoaryPMResults, not sure
|
||||
which options to use. If you know, please tell me.
|
||||
.. [#f1] from https://wiki.ubuntu.com/HoaryPMResults, not sure
|
||||
which options to use. If you know, please tell me.
|
||||
|
||||
(***) To be tested with a newer kernel.
|
||||
.. [#f2] To be tested with a newer kernel.
|
||||
|
||||
(****) Not with SMP kernel, UP only.
|
||||
.. [#f3] Not with SMP kernel, UP only.
|
@ -117,7 +117,7 @@ PM support:
|
||||
implemented") error. You should also try to make sure that your
|
||||
driver uses as little power as possible when it's not doing
|
||||
anything. For the driver testing instructions see
|
||||
Documentation/power/drivers-testing.txt and for a relatively
|
||||
Documentation/power/drivers-testing.rst and for a relatively
|
||||
complete overview of the power management issues related to
|
||||
drivers see :ref:`Documentation/driver-api/pm/devices.rst <driverapi_pm_devices>`.
|
||||
|
||||
|
@ -22,7 +22,7 @@ the highest.
|
||||
|
||||
The actual EM used by EAS is _not_ maintained by the scheduler, but by a
|
||||
dedicated framework. For details about this framework and what it provides,
|
||||
please refer to its documentation (see Documentation/power/energy-model.txt).
|
||||
please refer to its documentation (see Documentation/power/energy-model.rst).
|
||||
|
||||
|
||||
2. Background and Terminology
|
||||
@ -81,7 +81,7 @@ through the arch_scale_cpu_capacity() callback.
|
||||
|
||||
The rest of platform knowledge used by EAS is directly read from the Energy
|
||||
Model (EM) framework. The EM of a platform is composed of a power cost table
|
||||
per 'performance domain' in the system (see Documentation/power/energy-model.txt
|
||||
per 'performance domain' in the system (see Documentation/power/energy-model.rst
|
||||
for futher details about performance domains).
|
||||
|
||||
The scheduler manages references to the EM objects in the topology code when the
|
||||
@ -352,7 +352,7 @@ could be amended in the future if proven otherwise.
|
||||
EAS uses the EM of a platform to estimate the impact of scheduling decisions on
|
||||
energy. So, your platform must provide power cost tables to the EM framework in
|
||||
order to make EAS start. To do so, please refer to documentation of the
|
||||
independent EM framework in Documentation/power/energy-model.txt.
|
||||
independent EM framework in Documentation/power/energy-model.rst.
|
||||
|
||||
Please also note that the scheduling domains need to be re-built after the
|
||||
EM has been registered in order to start EAS.
|
||||
|
@ -151,7 +151,7 @@ At the runtime you can disable idle states with below methods:
|
||||
|
||||
It is possible to disable CPU idle states by way of the PM QoS
|
||||
subsystem, more specifically by using the "/dev/cpu_dma_latency"
|
||||
interface (see Documentation/power/pm_qos_interface.txt for more
|
||||
interface (see Documentation/power/pm_qos_interface.rst for more
|
||||
details). As specified in the PM QoS documentation the requested
|
||||
parameter will stay in effect until the file descriptor is released.
|
||||
For example:
|
||||
|
@ -97,7 +97,7 @@ Linux 2.6:
|
||||
函数定义成返回 -ENOSYS(功能未实现)错误。你还应该尝试确
|
||||
保你的驱动在什么都不干的情况下将耗电降到最低。要获得驱动
|
||||
程序测试的指导,请参阅
|
||||
Documentation/power/drivers-testing.txt。有关驱动程序电
|
||||
Documentation/power/drivers-testing.rst。有关驱动程序电
|
||||
源管理问题相对全面的概述,请参阅
|
||||
Documentation/driver-api/pm/devices.rst。
|
||||
|
||||
|
@ -6446,7 +6446,7 @@ M: "Rafael J. Wysocki" <rjw@rjwysocki.net>
|
||||
M: Pavel Machek <pavel@ucw.cz>
|
||||
L: linux-pm@vger.kernel.org
|
||||
S: Supported
|
||||
F: Documentation/power/freezing-of-tasks.txt
|
||||
F: Documentation/power/freezing-of-tasks.rst
|
||||
F: include/linux/freezer.h
|
||||
F: kernel/freezer.c
|
||||
|
||||
@ -11764,7 +11764,7 @@ S: Maintained
|
||||
T: git git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git
|
||||
F: drivers/opp/
|
||||
F: include/linux/pm_opp.h
|
||||
F: Documentation/power/opp.txt
|
||||
F: Documentation/power/opp.rst
|
||||
F: Documentation/devicetree/bindings/opp/
|
||||
|
||||
OPL4 DRIVER
|
||||
@ -12143,7 +12143,7 @@ M: Sam Bobroff <sbobroff@linux.ibm.com>
|
||||
M: Oliver O'Halloran <oohall@gmail.com>
|
||||
L: linuxppc-dev@lists.ozlabs.org
|
||||
S: Supported
|
||||
F: Documentation/PCI/pci-error-recovery.txt
|
||||
F: Documentation/PCI/pci-error-recovery.rst
|
||||
F: drivers/pci/pcie/aer.c
|
||||
F: drivers/pci/pcie/dpc.c
|
||||
F: drivers/pci/pcie/err.c
|
||||
@ -12156,7 +12156,7 @@ PCI ERROR RECOVERY
|
||||
M: Linas Vepstas <linasvepstas@gmail.com>
|
||||
L: linux-pci@vger.kernel.org
|
||||
S: Supported
|
||||
F: Documentation/PCI/pci-error-recovery.txt
|
||||
F: Documentation/PCI/pci-error-recovery.rst
|
||||
|
||||
PCI MSI DRIVER FOR ALTERA MSI IP
|
||||
M: Ley Foon Tan <lftan@altera.com>
|
||||
|
@ -2447,7 +2447,7 @@ menuconfig APM
|
||||
machines with more than one CPU.
|
||||
|
||||
In order to use APM, you will need supporting software. For location
|
||||
and more information, read <file:Documentation/power/apm-acpi.txt>
|
||||
and more information, read <file:Documentation/power/apm-acpi.rst>
|
||||
and the Battery Powered Linux mini-HOWTO, available from
|
||||
<http://www.tldp.org/docs.html#howto>.
|
||||
|
||||
|
@ -1175,7 +1175,7 @@ struct skl_wm_params {
|
||||
* to be disabled. This shouldn't happen and we'll print some error messages in
|
||||
* case it happens.
|
||||
*
|
||||
* For more, read the Documentation/power/runtime_pm.txt.
|
||||
* For more, read the Documentation/power/runtime_pm.rst.
|
||||
*/
|
||||
struct i915_runtime_pm {
|
||||
atomic_t wakeref_count;
|
||||
|
@ -10,4 +10,4 @@ config PM_OPP
|
||||
OPP layer organizes the data internally using device pointers
|
||||
representing individual voltage domains and provides SOC
|
||||
implementations a ready to use framework to manage OPPs.
|
||||
For more information, read <file:Documentation/power/opp.txt>
|
||||
For more information, read <file:Documentation/power/opp.rst>
|
||||
|
@ -607,7 +607,7 @@ int power_supply_get_battery_info(struct power_supply *psy,
|
||||
|
||||
/* The property and field names below must correspond to elements
|
||||
* in enum power_supply_property. For reasoning, see
|
||||
* Documentation/power/power_supply_class.txt.
|
||||
* Documentation/power/power_supply_class.rst.
|
||||
*/
|
||||
|
||||
of_property_read_u32(battery_np, "energy-full-design-microwatt-hours",
|
||||
|
@ -52,7 +52,7 @@
|
||||
* irq line disabled until the threaded handler has been run.
|
||||
* IRQF_NO_SUSPEND - Do not disable this IRQ during suspend. Does not guarantee
|
||||
* that this interrupt will wake the system from a suspended
|
||||
* state. See Documentation/power/suspend-and-interrupts.txt
|
||||
* state. See Documentation/power/suspend-and-interrupts.rst
|
||||
* IRQF_FORCE_RESUME - Force enable it on resume even if IRQF_NO_SUSPEND is set
|
||||
* IRQF_NO_THREAD - Interrupt cannot be threaded
|
||||
* IRQF_EARLY_RESUME - Resume IRQ early during syscore instead of at device
|
||||
|
@ -16,6 +16,25 @@ typedef unsigned long kernel_ulong_t;
|
||||
|
||||
#define PCI_ANY_ID (~0)
|
||||
|
||||
/**
|
||||
* struct pci_device_id - PCI device ID structure
|
||||
* @vendor: Vendor ID to match (or PCI_ANY_ID)
|
||||
* @device: Device ID to match (or PCI_ANY_ID)
|
||||
* @subvendor: Subsystem vendor ID to match (or PCI_ANY_ID)
|
||||
* @subdevice: Subsystem device ID to match (or PCI_ANY_ID)
|
||||
* @class: Device class, subclass, and "interface" to match.
|
||||
* See Appendix D of the PCI Local Bus Spec or
|
||||
* include/linux/pci_ids.h for a full list of classes.
|
||||
* Most drivers do not need to specify class/class_mask
|
||||
* as vendor/device is normally sufficient.
|
||||
* @class_mask: Limit which sub-fields of the class field are compared.
|
||||
* See drivers/scsi/sym53c8xx_2/ for example of usage.
|
||||
* @driver_data: Data private to the driver.
|
||||
* Most drivers don't need to use driver_data field.
|
||||
* Best practice is to use driver_data as an index
|
||||
* into a static list of equivalent device types,
|
||||
* instead of using it as a pointer.
|
||||
*/
|
||||
struct pci_device_id {
|
||||
__u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID*/
|
||||
__u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */
|
||||
@ -257,17 +276,17 @@ struct pcmcia_device_id {
|
||||
__u16 match_flags;
|
||||
|
||||
__u16 manf_id;
|
||||
__u16 card_id;
|
||||
__u16 card_id;
|
||||
|
||||
__u8 func_id;
|
||||
__u8 func_id;
|
||||
|
||||
/* for real multi-function devices */
|
||||
__u8 function;
|
||||
__u8 function;
|
||||
|
||||
/* for pseudo multi-function devices */
|
||||
__u8 device_no;
|
||||
__u8 device_no;
|
||||
|
||||
__u32 prod_id_hash[4];
|
||||
__u32 prod_id_hash[4];
|
||||
|
||||
/* not matched against in kernelspace */
|
||||
const char * prod_id[4];
|
||||
|
@ -151,6 +151,8 @@ static inline const char *pci_power_name(pci_power_t state)
|
||||
#define PCI_PM_BUS_WAIT 50
|
||||
|
||||
/**
|
||||
* typedef pci_channel_state_t
|
||||
*
|
||||
* The pci_channel state describes connectivity between the CPU and
|
||||
* the PCI device. If some PCI bus between here and the PCI device
|
||||
* has crashed or locked up, this info is reflected here.
|
||||
@ -775,6 +777,50 @@ struct pci_error_handlers {
|
||||
|
||||
|
||||
struct module;
|
||||
|
||||
/**
|
||||
* struct pci_driver - PCI driver structure
|
||||
* @node: List of driver structures.
|
||||
* @name: Driver name.
|
||||
* @id_table: Pointer to table of device IDs the driver is
|
||||
* interested in. Most drivers should export this
|
||||
* table using MODULE_DEVICE_TABLE(pci,...).
|
||||
* @probe: This probing function gets called (during execution
|
||||
* of pci_register_driver() for already existing
|
||||
* devices or later if a new device gets inserted) for
|
||||
* all PCI devices which match the ID table and are not
|
||||
* "owned" by the other drivers yet. This function gets
|
||||
* passed a "struct pci_dev \*" for each device whose
|
||||
* entry in the ID table matches the device. The probe
|
||||
* function returns zero when the driver chooses to
|
||||
* take "ownership" of the device or an error code
|
||||
* (negative number) otherwise.
|
||||
* The probe function always gets called from process
|
||||
* context, so it can sleep.
|
||||
* @remove: The remove() function gets called whenever a device
|
||||
* being handled by this driver is removed (either during
|
||||
* deregistration of the driver or when it's manually
|
||||
* pulled out of a hot-pluggable slot).
|
||||
* The remove function always gets called from process
|
||||
* context, so it can sleep.
|
||||
* @suspend: Put device into low power state.
|
||||
* @suspend_late: Put device into low power state.
|
||||
* @resume_early: Wake device from low power state.
|
||||
* @resume: Wake device from low power state.
|
||||
* (Please see Documentation/power/pci.rst for descriptions
|
||||
* of PCI Power Management and the related functions.)
|
||||
* @shutdown: Hook into reboot_notifier_list (kernel/sys.c).
|
||||
* Intended to stop any idling DMA operations.
|
||||
* Useful for enabling wake-on-lan (NIC) or changing
|
||||
* the power state of a device before reboot.
|
||||
* e.g. drivers/net/e100.c.
|
||||
* @sriov_configure: Optional driver callback to allow configuration of
|
||||
* number of VFs to enable via sysfs "sriov_numvfs" file.
|
||||
* @err_handler: See Documentation/PCI/pci-error-recovery.rst
|
||||
* @groups: Sysfs attribute groups.
|
||||
* @driver: Driver model structure.
|
||||
* @dynids: List of dynamically added device IDs.
|
||||
*/
|
||||
struct pci_driver {
|
||||
struct list_head node;
|
||||
const char *name;
|
||||
@ -2206,7 +2252,7 @@ static inline u8 pci_vpd_srdt_tag(const u8 *srdt)
|
||||
|
||||
/**
|
||||
* pci_vpd_info_field_size - Extracts the information field length
|
||||
* @lrdt: Pointer to the beginning of an information field header
|
||||
* @info_field: Pointer to the beginning of an information field header
|
||||
*
|
||||
* Returns the extracted information field length.
|
||||
*/
|
||||
|
@ -284,7 +284,7 @@ typedef struct pm_message {
|
||||
* actions to be performed by a device driver's callbacks generally depend on
|
||||
* the platform and subsystem the device belongs to.
|
||||
*
|
||||
* Refer to Documentation/power/runtime_pm.txt for more information about the
|
||||
* Refer to Documentation/power/runtime_pm.rst for more information about the
|
||||
* role of the @runtime_suspend(), @runtime_resume() and @runtime_idle()
|
||||
* callbacks in device runtime power management.
|
||||
*/
|
||||
|
@ -65,7 +65,7 @@ config HIBERNATION
|
||||
need to run mkswap against the swap partition used for the suspend.
|
||||
|
||||
It also works with swap files to a limited extent (for details see
|
||||
<file:Documentation/power/swsusp-and-swap-files.txt>).
|
||||
<file:Documentation/power/swsusp-and-swap-files.rst>).
|
||||
|
||||
Right now you may boot without resuming and resume later but in the
|
||||
meantime you cannot use the swap partition(s)/file(s) involved in
|
||||
@ -74,7 +74,7 @@ config HIBERNATION
|
||||
MOUNT any journaled filesystems mounted before the suspend or they
|
||||
will get corrupted in a nasty way.
|
||||
|
||||
For more information take a look at <file:Documentation/power/swsusp.txt>.
|
||||
For more information take a look at <file:Documentation/power/swsusp.rst>.
|
||||
|
||||
config ARCH_SAVE_PAGE_KEYS
|
||||
bool
|
||||
@ -255,7 +255,7 @@ config APM_EMULATION
|
||||
notification of APM "events" (e.g. battery status change).
|
||||
|
||||
In order to use APM, you will need supporting software. For location
|
||||
and more information, read <file:Documentation/power/apm-acpi.txt>
|
||||
and more information, read <file:Documentation/power/apm-acpi.rst>
|
||||
and the Battery Powered Linux mini-HOWTO, available from
|
||||
<http://www.tldp.org/docs.html#howto>.
|
||||
|
||||
|
@ -165,7 +165,7 @@ config CFG80211_DEFAULT_PS
|
||||
|
||||
If this causes your applications to misbehave you should fix your
|
||||
applications instead -- they need to register their network
|
||||
latency requirement, see Documentation/power/pm_qos_interface.txt.
|
||||
latency requirement, see Documentation/power/pm_qos_interface.rst.
|
||||
|
||||
config CFG80211_DEBUGFS
|
||||
bool "cfg80211 DebugFS entries"
|
||||
|
Loading…
Reference in New Issue
Block a user