=========================
CPU hotplug in the Kernel
=========================

:Date: December, 2016
:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
          Rusty Russell <rusty@rustcorp.com.au>,
          Srivatsa Vaddagiri <vatsa@in.ibm.com>,
          Ashok Raj <ashok.raj@intel.com>,
          Joel Schopp <jschopp@austin.ibm.com>

Introduction
============

Modern system architectures have introduced advanced error reporting and
correction capabilities in processors. Several OEMs also support NUMA
hardware that is hot-pluggable, where physical node insertion and removal
require support for CPU hotplug.

Such advances require that CPUs available to a kernel can be removed, either
for provisioning reasons or for RAS purposes, to keep an offending CPU off the
system execution path. Hence the need for CPU hotplug support in the
Linux kernel.

A more novel use of CPU hotplug support is in suspend/resume support for SMP.
With dual-core and HT support, even laptops run SMP kernels, which previously
did not support suspend and resume.

Command Line Switches
=====================
``maxcpus=n``
    Restrict boot time CPUs to *n*. For example, if you have four CPUs, using
    ``maxcpus=2`` will only boot two. You can choose to bring the
    other CPUs online later.

``nr_cpus=n``
    Restrict the total number of CPUs the kernel will support. If the number
    supplied here is lower than the number of physically available CPUs, then
    those CPUs cannot be brought online later.

``additional_cpus=n``
    Use this to limit hotpluggable CPUs. This option sets
    ``cpu_possible_mask = cpu_present_mask + additional_cpus``

    This option is limited to the IA64 architecture.

``possible_cpus=n``
    This option sets ``possible_cpus`` bits in ``cpu_possible_mask``.

    This option is limited to the X86 and S390 architectures.

``cpu0_hotplug``
    Allow CPU0 to be shut down.

    This option is limited to the X86 architecture.

CPU maps
========

``cpu_possible_mask``
    Bitmap of possible CPUs that can ever be available in the
    system. This is used to allocate some boot time memory for per_cpu variables
    that aren't designed to grow/shrink as CPUs are made available or removed.
    Once set during the boot time discovery phase, the map is static, i.e. no
    bits are added or removed at any time. Trimming it accurately for your
    system needs upfront can save some boot time memory.

``cpu_online_mask``
    Bitmap of all CPUs currently online. It is set in ``__cpu_up()``
    after a CPU is available for kernel scheduling and ready to receive
    interrupts from devices. It is cleared when a CPU is brought down using
    ``__cpu_disable()``, before which all OS services including interrupts are
    migrated to another target CPU.

``cpu_present_mask``
    Bitmap of CPUs currently present in the system. Not all
    of them may be online. When physical hotplug is processed by the relevant
    subsystem (e.g. ACPI), the map can change: a bit is either added to or
    removed from it, depending on whether the event is a hot-add or a
    hot-remove. There are currently no locking rules. Typical usage is to
    init topology during boot, at which time hotplug is disabled.

You really don't need to manipulate any of the system CPU maps. They should
be read-only for most uses. When setting up per-cpu resources, almost always use
``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.

Never use anything other than ``cpumask_t`` to represent a bitmap of CPUs.
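
The relationship between the three maps can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the kernel
represents these maps with ``cpumask_t`` and its own iterators, whereas
``toy_mask_t``, ``toy_for_each_cpu()`` and the other ``toy_`` names below are
hypothetical and exist only for this example, which is limited to 64 CPUs.
It models the online map as a subset of the present map, the present map as a
subset of the possible map, and counts how many per-CPU slots each map implies:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
    typedef uint64_t toy_mask_t;

    #define TOY_NR_CPUS 64

    static inline bool toy_test_cpu(toy_mask_t mask, unsigned int cpu)
    {
        return cpu < TOY_NR_CPUS && ((mask >> cpu) & 1u);
    }

    /* Iterate over every CPU whose bit is set in (mask). */
    #define toy_for_each_cpu(cpu, mask)                   \
        for ((cpu) = 0; (cpu) < TOY_NR_CPUS; (cpu)++)     \
            if (toy_test_cpu((mask), (cpu)))

    /* Count set bits, i.e. how many per-CPU slots a given map would need. */
    static unsigned int toy_weight(toy_mask_t mask)
    {
        unsigned int cpu, n = 0;

        toy_for_each_cpu(cpu, mask)
            n++;
        return n;
    }

    /* online must be a subset of present, and present a subset of possible. */
    static bool toy_masks_consistent(toy_mask_t possible, toy_mask_t present,
                                     toy_mask_t online)
    {
        return (online & ~present) == 0 && (present & ~possible) == 0;
    }

With a possible map of ``0xFF``, a present map of ``0x0F`` and an online map of
``0x0B``, storage sized by the possible map has eight slots, so bringing a
further CPU online never requires the storage to grow.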

Using CPU hotplug
=================
The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
available on multiple architectures including ARM, MIPS, PowerPC and X86. The
configuration is done via the sysfs interface: ::

    $ ls -lh /sys/devices/system/cpu
    total 0
    drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu0
    drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu1
    drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu2
    drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu3
    drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu4
    drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu5
    drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu6
    drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu7
    drwxr-xr-x  2 root root    0 Dec 21 16:33 hotplug
    -r--r--r--  1 root root 4.0K Dec 21 16:33 offline
    -r--r--r--  1 root root 4.0K Dec 21 16:33 online
    -r--r--r--  1 root root 4.0K Dec 21 16:33 possible
    -r--r--r--  1 root root 4.0K Dec 21 16:33 present

The files *offline*, *online*, *possible* and *present* represent the CPU masks.
Each CPU folder contains an *online* file which controls the logical on (1) and
off (0) state. To logically shut down CPU4: ::

    $ echo 0 > /sys/devices/system/cpu/cpu4/online
    smpboot: CPU 4 is now offline

Once the CPU is shut down, it will be removed from */proc/interrupts* and
*/proc/cpuinfo*, and should no longer be shown by the *top* command. To
bring CPU4 back online: ::

    $ echo 1 > /sys/devices/system/cpu/cpu4/online
    smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs, although CPU0 is often
special and excluded from CPU hotplug. On X86 the kernel option
*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
shut down CPU0. Alternatively, the kernel command line option *cpu0_hotplug*
can be used. Some known dependencies of CPU0:

* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.

Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
on CPU0.
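
The mask files listed at the start of this section report CPUs as a
comma-separated list of numbers and ranges, for example ``0-3,5``. The
following user-space sketch parses such a string into a bitmap. The function
name ``parse_cpulist()`` and the limit of 64 CPUs are assumptions made for
this example; it is not a kernel or libc interface:

.. code-block:: c

    #include <ctype.h>
    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Parse a CPU list such as "0-3,5" into a 64-bit mask.
     * Returns 0 on success and -1 on malformed input or a CPU number >= 64.
     * An empty list yields an empty mask.
     */
    static int parse_cpulist(const char *s, uint64_t *mask)
    {
        *mask = 0;

        while (*s && *s != '\n') {
            char *end;
            unsigned long lo, hi;

            if (!isdigit((unsigned char)*s))
                return -1;
            lo = strtoul(s, &end, 10);
            hi = lo;
            if (*end == '-') {
                /* A range: parse its upper bound. */
                s = end + 1;
                if (!isdigit((unsigned char)*s))
                    return -1;
                hi = strtoul(s, &end, 10);
            }
            if (lo > hi || hi >= 64)
                return -1;
            for (unsigned long cpu = lo; cpu <= hi; cpu++)
                *mask |= UINT64_C(1) << cpu;

            if (*end == ',')
                end++;              /* another entry follows */
            else if (*end && *end != '\n')
                return -1;          /* unexpected trailing character */
            s = end;
        }
        return 0;
    }

Passing the contents of ``/sys/devices/system/cpu/online`` to this function
gives a mask in which bit *n* is set if CPU *n* is online.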

The CPU hotplug coordination
============================

The offline case
----------------
Once a CPU has been logically shut down, the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:

* If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
  will be set to true.
* All processes are migrated away from this outgoing CPU to new CPUs.
  The new CPU is chosen from each process' current cpuset, which may be
  a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU.
* Timers are also migrated to a new CPU.
* Once all services are migrated, the kernel calls an arch-specific routine
  ``__cpu_disable()`` to perform arch-specific cleanup.

Using the hotplug API
---------------------
It is possible to receive notifications once a CPU is offlined or onlined. This
might be important to certain drivers which need to perform some kind of setup
or clean up functions based on the number of available CPUs: ::

    #include <linux/cpuhotplug.h>

    ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "X/Y:online",
                            Y_online, Y_prepare_down);

*X* is the subsystem and *Y* the particular driver. The *Y_online* callback
will be invoked during registration on all online CPUs. If an error
occurs during the online callback the *Y_prepare_down* callback will be
invoked on all CPUs on which the online callback was previously invoked.
After registration has completed, the *Y_online* callback will be invoked
once a CPU is brought online and *Y_prepare_down* will be invoked when a
CPU is shut down. All resources which were previously allocated in
*Y_online* should be released in *Y_prepare_down*.
The return value *ret* is negative if an error occurred during the
registration process. Otherwise a positive value is returned which
contains the allocated hotplug state for dynamically allocated states
(*CPUHP_AP_ONLINE_DYN*). It will return zero for predefined states.

The callback can be removed by invoking ``cpuhp_remove_state()``. In case of a
dynamically allocated state (*CPUHP_AP_ONLINE_DYN*) use the returned state.
During the removal of a hotplug state the teardown callback will be invoked.
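
The registration behaviour described above can be sketched as a user-space
model. This is not the kernel implementation: ``toy_setup_state()``, the
8-CPU limit and the example callbacks are hypothetical names and values chosen
for this illustration, and the order in which the model undoes earlier CPUs is
an arbitrary choice. It only shows the rule that a failing online callback
causes the teardown callback to run on the CPUs that were already set up:

.. code-block:: c

    #include <stdint.h>

    #define TOY_NR_CPUS 8

    typedef int (*toy_cb_t)(unsigned int cpu);

    /*
     * Toy model of registering a state with callbacks: run startup() on every
     * CPU set in online_mask. If startup() fails on some CPU, run teardown()
     * on the CPUs where startup() already succeeded and report the error.
     */
    static int toy_setup_state(uint8_t online_mask, toy_cb_t startup,
                               toy_cb_t teardown)
    {
        for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
            int ret;

            if (!(online_mask & (1u << cpu)))
                continue;
            ret = startup(cpu);
            if (ret == 0)
                continue;

            /* Undo the CPUs handled before the failing one. */
            for (unsigned int done = 0; done < cpu; done++)
                if (online_mask & (1u << done))
                    teardown(done);
            return ret;
        }
        return 0;
    }

    /* Example callbacks: track which CPUs currently hold a resource. */
    static uint8_t toy_active;      /* bit n set: CPU n is set up */
    static int toy_fail_cpu = -1;   /* CPU on which startup should fail */

    static int toy_online(unsigned int cpu)
    {
        if ((int)cpu == toy_fail_cpu)
            return -1;
        toy_active |= 1u << cpu;
        return 0;
    }

    static int toy_prepare_down(unsigned int cpu)
    {
        toy_active &= ~(1u << cpu);
        return 0;
    }

If ``toy_fail_cpu`` is set to 2 and the online mask is ``0x0F``, the model
sets up CPU0 and CPU1, fails on CPU2, tears CPU0 and CPU1 down again and
returns the error, leaving no CPU set up.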

Multiple instances
~~~~~~~~~~~~~~~~~~
If a driver has multiple instances and each instance needs to perform the
callback independently then it is likely that a "multi-state" should be used.
First a multi-state state needs to be registered: ::

    ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "X/Y:online",
                                  Y_online, Y_prepare_down);
    Y_hp_online = ret;

The ``cpuhp_setup_state_multi()`` behaves similarly to ``cpuhp_setup_state()``
except it prepares the callbacks for a multi state and does not invoke
the callbacks. This is a one time setup.
Once a new instance is allocated, you need to register this new instance: ::

    ret = cpuhp_state_add_instance(Y_hp_online, &d->node);

This function will add this instance to your previously allocated
*Y_hp_online* state and invoke the previously registered callback
(*Y_online*) on all online CPUs. The *node* element is a
``struct hlist_node`` member of your per-instance data structure.

On removal of the instance: ::

    cpuhp_state_remove_instance(Y_hp_online, &d->node)

should be invoked which will invoke the teardown callback on all online
CPUs.

Manual setup
~~~~~~~~~~~~
It is usually handy to invoke setup and teardown callbacks on registration or
removal of a state because the operation normally needs to be performed once a
CPU goes online (offline) and during initial setup (shutdown) of the driver.
However each registration and removal function is also available with a
``_nocalls`` suffix which does not invoke the provided callbacks if the
invocation of the callbacks is not desired. During the manual setup (or
teardown) the functions ``get_online_cpus()`` and ``put_online_cpus()`` should
be used to inhibit CPU hotplug operations.

The ordering of the events
--------------------------
The hotplug states are defined in ``include/linux/cpuhotplug.h``:

* The states *CPUHP_OFFLINE* … *CPUHP_AP_OFFLINE* are invoked before the
  CPU is up.
* The states *CPUHP_AP_OFFLINE* … *CPUHP_AP_ONLINE* are invoked
  just after the CPU has been brought up. The interrupts are off and
  the scheduler is not yet active on this CPU. Starting with *CPUHP_AP_OFFLINE*
  the callbacks are invoked on the target CPU.
* The states between *CPUHP_AP_ONLINE_DYN* and *CPUHP_AP_ONLINE_DYN_END* are
  reserved for the dynamic allocation.
* The states are invoked in the reverse order on CPU shutdown starting with
  *CPUHP_ONLINE* and stopping at *CPUHP_OFFLINE*. Here the callbacks are
  invoked on the CPU that will be shut down until *CPUHP_AP_OFFLINE*.

A dynamically allocated state via *CPUHP_AP_ONLINE_DYN* is often enough.
However if an earlier invocation during the bring up or shutdown is required
then an explicit state should be acquired. An explicit state might also be
required if the hotplug event requires specific ordering with respect to
another hotplug event.
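
The ordering rules can be illustrated with a toy state machine in user space.
Everything in it is an assumption made for the example: the five states, the
``toy_`` names and the log format are hypothetical, and the real state table
and callback signatures live in the kernel. The sketch invokes startup
callbacks in ascending state order, teardown callbacks in descending order,
and undoes the already completed steps when a startup callback fails:

.. code-block:: c

    #include <stddef.h>
    #include <string.h>

    #define TOY_NR_STATES 5

    struct toy_step {
        const char *name;
        int (*startup)(void);   /* may be NULL */
        int (*teardown)(void);  /* may be NULL */
    };

    static char toy_log[128];   /* records the order of invoked callbacks */
    static int toy_fail_c;      /* non-zero: startup of state "c" fails */

    static void toy_note(const char *what)
    {
        strncat(toy_log, what, sizeof(toy_log) - strlen(toy_log) - 1);
    }

    static int up_a(void)   { toy_note("+a"); return 0; }
    static int down_a(void) { toy_note("-a"); return 0; }
    static int up_b(void)   { toy_note("+b"); return 0; }
    static int down_b(void) { toy_note("-b"); return 0; }
    static int down_c(void) { toy_note("-c"); return 0; }

    static int up_c(void)
    {
        if (toy_fail_c) {
            toy_note("!c");
            return -1;
        }
        toy_note("+c");
        return 0;
    }

    static const struct toy_step toy_steps[TOY_NR_STATES] = {
        { "offline", NULL, NULL },
        { "a",       up_a, down_a },
        { "b",       up_b, down_b },
        { "c",       up_c, down_c },
        { "online",  NULL, NULL },
    };

    /* Tear down from state cur to target; returns the state reached. */
    static int toy_down(int cur, int target)
    {
        while (cur > target) {
            if (toy_steps[cur].teardown)
                toy_steps[cur].teardown();
            cur--;
        }
        return cur;
    }

    /*
     * Bring up from state cur to target. If a startup callback fails, undo
     * the steps completed so far and return the state we started from.
     */
    static int toy_up(int cur, int target)
    {
        int start = cur;

        while (cur < target) {
            int next = cur + 1;

            if (toy_steps[next].startup && toy_steps[next].startup() != 0)
                return toy_down(cur, start);
            cur = next;
        }
        return cur;
    }

In this model a state number means that the startup callbacks of all states up
to and including it have run, so moving down to a target state runs the
teardown callbacks of the states above the target only.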

Testing of hotplug states
=========================
One way to verify whether a custom state is working as expected or not is to
shut down a CPU and then put it online again. It is also possible to put the CPU
into a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*. This would simulate an error one state after *CPUHP_AP_ONLINE*
which would lead to a rollback to the online state.

All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states``: ::

    $ tail /sys/devices/system/cpu/hotplug/states
    138: mm/vmscan:online
    139: mm/vmstat:online
    140: lib/percpu_cnt:online
    141: acpi/cpu-drv:online
    142: base/cacheinfo:online
    143: virtio/net:online
    144: x86/mce:online
    145: printk:online
    168: sched:active
    169: online

To roll back CPU4 to ``lib/percpu_cnt:online`` and then bring it back online, just issue: ::

    $ cat /sys/devices/system/cpu/cpu4/hotplug/state
    169
    $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
    $ cat /sys/devices/system/cpu/cpu4/hotplug/state
    140

It is important to note that at this point the teardown callbacks of the states
above 140 have been invoked. And now get back online: ::

    $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
    $ cat /sys/devices/system/cpu/cpu4/hotplug/state
    169
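
The same steps can be driven from a program instead of a shell. The following
is a minimal user-space sketch, assuming the sysfs layout shown above; the
helper names ``cpu_attr_path()`` and ``write_attr()`` are made up for this
example. Writing to these files requires root privileges, and state numbers
such as 140 differ between kernel versions and configurations, so they should
be looked up in the *states* file first:

.. code-block:: c

    #include <stdio.h>

    /*
     * Build "<root>/cpu<N>/<attr>", e.g. ".../cpu4/hotplug/target".
     * Returns 0 on success, -1 if the buffer is too small.
     */
    static int cpu_attr_path(char *buf, size_t len, const char *root,
                             unsigned int cpu, const char *attr)
    {
        int n = snprintf(buf, len, "%s/cpu%u/%s", root, cpu, attr);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /*
     * Write a decimal value to a sysfs-style attribute file.
     * Returns 0 on success, -1 if the file cannot be opened or written.
     */
    static int write_attr(const char *path, int value)
    {
        FILE *f = fopen(path, "w");
        int ok;

        if (!f)
            return -1;
        ok = fprintf(f, "%d\n", value) > 0;
        /* Buffered data is flushed on close, so check that result too. */
        if (fclose(f) != 0)
            ok = 0;
        return ok ? 0 : -1;
    }

For instance, building the path with root ``/sys/devices/system/cpu``, CPU 4
and attribute ``hotplug/target`` and then calling ``write_attr()`` with the
value 140 corresponds to the ``echo 140`` command above.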

With trace events enabled, the individual steps are visible, too: ::

    #  TASK-PID   CPU#    TIMESTAMP  FUNCTION
    #     | |       |        |         |
        bash-394  [001]  22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
     cpuhp/4-31   [004]  22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
     cpuhp/4-31   [004]  22.990: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
     cpuhp/4-31   [004]  22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
     cpuhp/4-31   [004]  22.992: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
     cpuhp/4-31   [004]  22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
     cpuhp/4-31   [004]  22.994: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
     cpuhp/4-31   [004]  22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
     cpuhp/4-31   [004]  22.996: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
        bash-394  [001]  22.997: cpuhp_exit:  cpu: 0004  state: 140 step: 169 ret: 0
        bash-394  [005]  95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
     cpuhp/4-31   [004]  95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
     cpuhp/4-31   [004]  95.542: cpuhp_exit:  cpu: 0004  state: 141 step: 141 ret: 0
     cpuhp/4-31   [004]  95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
     cpuhp/4-31   [004]  95.544: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
     cpuhp/4-31   [004]  95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
     cpuhp/4-31   [004]  95.546: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
     cpuhp/4-31   [004]  95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
     cpuhp/4-31   [004]  95.548: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
     cpuhp/4-31   [004]  95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
     cpuhp/4-31   [004]  95.550: cpuhp_exit:  cpu: 0004  state: 145 step: 145 ret: 0
     cpuhp/4-31   [004]  95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
     cpuhp/4-31   [004]  95.552: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
        bash-394  [005]  95.553: cpuhp_exit:  cpu: 0004  state: 169 step: 140 ret: 0

As can be seen, CPU4 went down until timestamp 22.996 and then back up until
95.552. All invoked callbacks including their return codes are visible in the
trace.

Architecture's requirements
===========================
The following functions and configurations are required:

``CONFIG_HOTPLUG_CPU``
    This entry needs to be enabled in Kconfig.

``__cpu_up()``
    Arch interface to bring up a CPU.

``__cpu_disable()``
    Arch interface to shut down a CPU. No more interrupts can be handled by the
    kernel after the routine returns. This includes the shutdown of the timer.

``__cpu_die()``
    This is supposed to ensure the death of the CPU. Look at some example code
    in other architectures that implement CPU hotplug. The processor is taken
    down from the ``idle()`` loop for that specific architecture. ``__cpu_die()``
    typically waits for some per_cpu state to be set, to positively ensure that
    the processor's dead routine has been called.

User Space Notification
=======================
After a CPU has been successfully onlined or offlined, udev events are sent. A udev rule like: ::

    SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"

will receive all events. A script like: ::

    #!/bin/sh

    if [ "${ACTION}" = "offline" ]
    then
        echo "CPU ${DEVPATH##*/} offline"

    elif [ "${ACTION}" = "online" ]
    then
        echo "CPU ${DEVPATH##*/} online"

    fi

can process the event further.

Kernel Inline Documentations Reference
======================================

.. kernel-doc:: include/linux/cpuhotplug.h