linux


Neeraj Upadhyay 683954e55c rcu: Check and report missed fqs timer wakeup on RCU stall

For a new grace period request, the RCU GP kthread transitions through the following states:

a. [RCU_GP_WAIT_GPS] -> [RCU_GP_DONE_GPS]

The RCU_GP_WAIT_GPS state is where the GP kthread waits for a request for a new GP. Once it receives a request (for example, when a new RCU callback is queued), the GP kthread transitions to RCU_GP_DONE_GPS.

b. [RCU_GP_DONE_GPS] -> [RCU_GP_ONOFF]

Grace period initialization starts in rcu_gp_init(), which records the start of the new GP in rcu_state.gp_seq and transitions to RCU_GP_ONOFF.

c. [RCU_GP_ONOFF] -> [RCU_GP_INIT]

The purpose of the RCU_GP_ONOFF state is to apply the online/offline information that was buffered for any CPUs that recently came online or went offline. This state is maintained in per-leaf rcu_node bitmasks, with the buffered state in ->qsmaskinitnext and the state for the upcoming GP in ->qsmaskinit. At the end of this RCU_GP_ONOFF state, each bit in ->qsmaskinit will correspond to a CPU that must pass through a quiescent state before the upcoming grace period is allowed to complete.

However, a leaf rcu_node structure with an all-zeroes ->qsmaskinit cannot necessarily be ignored. In preemptible RCU, there might well be tasks still in RCU read-side critical sections that were first preempted while running on one of the CPUs managed by this structure. Such tasks will be queued on this structure's ->blkd_tasks list. Only after this list fully drains can this leaf rcu_node structure be ignored, and even then only if none of its CPUs have come back online in the meantime. Once that happens, the ->qsmaskinit masks further up the tree will be updated to exclude this leaf rcu_node structure.

Once the ->qsmaskinitnext and ->qsmaskinit fields have been updated as needed, the GP kthread transitions to RCU_GP_INIT.

d. [RCU_GP_INIT] -> [RCU_GP_WAIT_FQS]

The purpose of the RCU_GP_INIT state is to copy each ->qsmaskinit to the ->qsmask field within each rcu_node structure. This copying is done breadth-first from the root to the leaves. Why not just copy directly from ->qsmaskinitnext to ->qsmask? Because the ->qsmaskinitnext masks can change in the meantime as additional CPUs come online or go offline. Such changes would result in inconsistencies in the ->qsmask fields up and down the tree, which could in turn result in too-short grace periods or grace-period hangs. These issues are avoided by snapshotting the leaf rcu_node structures' ->qsmaskinitnext fields into their ->qsmaskinit counterparts, generating a consistent set of ->qsmaskinit fields throughout the tree, and only then copying these consistent ->qsmaskinit fields to their ->qsmask counterparts.

Once this initialization step is complete, the GP kthread transitions to RCU_GP_WAIT_FQS, where it waits to do a force-quiescent-state scan on the one hand or for the end of the grace period on the other.

e. [RCU_GP_WAIT_FQS] -> [RCU_GP_DOING_FQS]

The RCU_GP_WAIT_FQS state waits for one of three things:

   (1) An explicit request to do a force-quiescent-state scan,
   (2) The end of the grace period, or
   (3) A short interval of time, after which it will do a force-quiescent-state (FQS) scan.

The explicit request can come from rcutorture or from any CPU that has too many RCU callbacks queued (see the qhimark kernel parameter and the RCU_GP_FLAG_OVLD flag). The aforementioned "short period of time" is specified by the jiffies_till_first_fqs boot parameter for a given grace period's first FQS scan and by the jiffies_till_next_fqs boot parameter for later FQS scans.

Either way, once the wait is over, the GP kthread transitions to RCU_GP_DOING_FQS.

f. [RCU_GP_DOING_FQS] -> [RCU_GP_CLEANUP]

The RCU_GP_DOING_FQS state performs an FQS scan. Each such scan carries out two functions for any CPU whose bit is still set in its leaf rcu_node structure's ->qsmask field, that is, for any CPU that has not yet reported a quiescent state for the current grace period:

   i. Report quiescent states on behalf of CPUs that have been observed to be idle (from an RCU perspective) since the beginning of the grace period.
   ii. If the current grace period is too old, take various actions to encourage holdout CPUs to pass through quiescent states, including enlisting the aid of any calls to cond_resched() and might_sleep(), and even including IPIing the holdout CPUs.

These checks are skipped for any leaf rcu_node structure with an all-zero ->qsmask field; however, such structures are subject to RCU priority boosting if there are tasks on a given structure blocking the current grace period.

The end of the grace period is detected when the root rcu_node structure's ->qsmask is zero and when there are no longer any preempted tasks blocking the current grace period. (No, this last check is not redundant. To see this, consider an rcu_node tree having exactly one structure that serves as both root and leaf.)

Once the end of the grace period is detected, the GP kthread transitions to RCU_GP_CLEANUP.

g. [RCU_GP_CLEANUP] -> [RCU_GP_CLEANED]

The RCU_GP_CLEANUP state marks the end of the grace period by updating the rcu_state structure's ->gp_seq field and also all rcu_node structures' ->gp_seq field. As before, the rcu_node tree is traversed in breadth-first order.

Once this update is complete, the GP kthread transitions to the RCU_GP_CLEANED state.

i. [RCU_GP_CLEANED] -> [RCU_GP_INIT]

Once in the RCU_GP_CLEANED state, the GP kthread immediately transitions into the RCU_GP_INIT state.

j. The role of timers.

If there is at least one idle CPU, and if timers are not firing, the transition from RCU_GP_DOING_FQS to RCU_GP_CLEANUP will never happen. Timers can fail to fire for a number of reasons, including issues in timer configuration, issues in the timer framework, and failure to handle softirqs (for example, when there is a storm of interrupts). Whatever the reason, if the timers fail to fire, the GP kthread will never be awakened, resulting in RCU CPU stall warnings and eventually in OOM.

However, an RCU CPU stall warning has a large number of potential causes, as documented in Documentation/RCU/stallwarn.rst. This commit therefore adds analysis to the RCU CPU stall-warning code to emit an additional message if the cause of the stall is likely to be timer failure.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2021-01-06 16:54:11 -08:00
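
The state walk in steps a through i, and the timer concern in step j, can be sketched as a small user-space C model. This is only an illustration under stated assumptions, not the kernel's code: the enum values, gp_next_state(), and fqs_timer_wakeup_missed() are hypothetical names invented here, the fall-back from the FQS scan to the FQS wait when the grace period has not yet ended is an assumption implied by the text rather than spelled out in it, and the "deadline plus slack" test is a guess at the general shape of such a check rather than the condition this commit actually adds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * State names mirror the commit message. The numeric values are
 * arbitrary and are NOT the kernel's definitions.
 */
enum gp_state {
	GP_WAIT_GPS,
	GP_DONE_GPS,
	GP_ONOFF,
	GP_INIT,
	GP_WAIT_FQS,
	GP_DOING_FQS,
	GP_CLEANUP,
	GP_CLEANED,
	GP_NR_STATES
};

/*
 * Hypothetical helper: successor of each state, following steps a-i
 * of the commit message as written. gp_ended matters only in
 * GP_DOING_FQS, where an unfinished grace period is assumed to send
 * the kthread back to wait for the next FQS scan.
 */
static enum gp_state gp_next_state(enum gp_state s, bool gp_ended)
{
	assert(s >= GP_WAIT_GPS && s < GP_NR_STATES);

	switch (s) {
	case GP_WAIT_GPS:  return GP_DONE_GPS;	/* step a */
	case GP_DONE_GPS:  return GP_ONOFF;	/* step b */
	case GP_ONOFF:     return GP_INIT;	/* step c */
	case GP_INIT:      return GP_WAIT_FQS;	/* step d */
	case GP_WAIT_FQS:  return GP_DOING_FQS;	/* step e */
	case GP_DOING_FQS:			/* step f */
		return gp_ended ? GP_CLEANUP : GP_WAIT_FQS;
	case GP_CLEANUP:   return GP_CLEANED;	/* step g */
	case GP_CLEANED:   return GP_INIT;	/* step i, as stated */
	default:           return GP_WAIT_GPS;	/* not reached */
	}
}

/*
 * Hypothetical heuristic for step j: the kthread is still waiting
 * for its FQS timer even though the wake-up deadline passed more
 * than "slack" ticks ago. The signed difference keeps the comparison
 * correct across tick-counter wraparound.
 */
static bool fqs_timer_wakeup_missed(enum gp_state s, unsigned long now,
				    unsigned long wake_deadline,
				    unsigned long slack)
{
	return s == GP_WAIT_FQS &&
	       (long)(now - (wake_deadline + slack)) > 0;
}
```

A stall-warning path built along these lines would call fqs_timer_wakeup_missed() with the kthread's current state and the recorded wake-up deadline, and print the extra "timer wakeup likely missed" message only when it returns true.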
..
ABI	More power management updates for 5.11-rc1	2020-12-22 14:12:10 -08:00
accounting
admin-guide	A small set of late-arriving, small documentation fixes.	2020-12-24 14:20:33 -08:00
arm	ARM updates for 5.11:	2020-12-22 13:34:27 -08:00
arm64	ARM:	2020-12-20 10:44:05 -08:00
block
bpf
cdrom
core-api	Generic interrupt and irqchips subsystem:	2020-12-15 15:03:31 -08:00
cpu-freq
crypto
dev-tools	Merge branch 'akpm' (patches from Andrew)	2020-12-22 13:38:17 -08:00
devicetree	Devicetree fixes for v5.11, take 1:	2020-12-24 12:09:48 -08:00
doc-guide	kernel-doc: Fix example in Nested structs/unions	2020-12-08 10:23:17 -07:00
driver-api	RTC for 5.11	2020-12-20 10:12:06 -08:00
fault-injection
fb
features	ARM updates for 5.11:	2020-12-22 13:34:27 -08:00
filesystems	Various bug fixes and cleanups for ext4; no new features this cycle.	2020-12-24 14:16:02 -08:00
firmware_class
firmware-guide	Documentation: ACPI: enumeration: add PCI hierarchy representation	2020-11-23 17:59:49 +01:00
fpga
gpu	drm/docs: Fix todo.rst	2020-11-18 11:51:58 +01:00
hid	Merge branch 'for-5.11/core' into for-linus	2020-12-16 11:38:38 +01:00
hwmon	hwmon: (sbtsi) Add documentation	2020-12-12 08:34:29 -08:00
i2c
ia64	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
ide
iio
infiniband
input	Input: document inhibiting	2020-12-02 22:10:37 -08:00
isdn
kbuild	Kconfig updates for v5.11	2020-12-22 14:04:25 -08:00
kernel-hacking
leds	Changes for 5.11-rc1. Small cleanups/fixes mostly thanks to Marek,	2020-12-16 14:56:29 -08:00
litmus-tests
livepatch
locking	Documentation: seqlock: s/LOCKTYPE/LOCKNAME/g	2020-12-09 17:08:49 +01:00
m68k	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
maintainer
mhi
mips	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
misc-devices	Documentation: remove mic/index from misc-devices/index.rst	2020-11-04 11:38:32 +01:00
netlabel
networking	ARM: SoC drivers for v5.11	2020-12-16 16:38:41 -08:00
nios2	docs: nios2: add missing ReST file	2020-12-07 08:35:21 -07:00
nvdimm
openrisc	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
parisc	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
PCI
pcmcia
power	PM: EM: Update Energy Model with new flag indicating power scale	2020-11-10 20:29:28 +01:00
powerpc	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
process	A small set of late-arriving, small documentation fixes.	2020-12-24 14:20:33 -08:00
RCU	rcu: Check and report missed fqs timer wakeup on RCU stall	2021-01-06 16:54:11 -08:00
riscv	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
s390	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
scheduler	Power management updates for 5.11-rc1	2020-12-15 16:30:31 -08:00
scsi
security
sh	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
sound	ALSA: usb-audio: Add implicit_fb module option	2020-11-23 15:17:24 +01:00
sparc	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
sphinx	Kbuild updates for v5.11	2020-12-22 14:02:39 -08:00
sphinx-static
spi
staging
target	tweewide: Fix most Shebang lines	2020-12-08 23:30:04 +09:00
timers
trace	Kbuild updates for v5.11	2020-12-22 14:02:39 -08:00
translations	Networking updates for 5.11	2020-12-15 13:22:29 -08:00
usb
userspace-api	"Intel SGX is new hardware functionality that can be used by	2020-12-14 13:14:57 -08:00
virt	ARM:	2020-12-20 10:44:05 -08:00
vm	mm/lru: revise the comments of lru_lock	2020-12-15 14:48:04 -08:00
w1	w1: w1_therm: Rename conflicting sysfs attribute 'eeprom' to 'eeprom_cmd'	2020-11-12 08:50:13 +01:00
watchdog
x86	A much quieter cycle for documentation (happily), with, one hopes, the bulk	2020-12-14 16:55:54 -08:00
xtensa	A much quieter cycle for documentation (happily), with, one hopes, the bulk	2020-12-14 16:55:54 -08:00
.gitignore
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py	docs: Note that sphinx 1.7 will be required soon	2020-12-11 13:53:38 -07:00
COPYING-logo
docutils.conf
dontdiff
index.rst	docs: archis: add a per-architecture features list	2020-12-03 15:10:15 -07:00
Kconfig
logo.gif
Makefile
memory-barriers.txt	docs/memory-barriers.txt: Fix a typo in CPU MEMORY BARRIERS section	2020-11-06 17:24:51 -08:00
SubmittingPatches
watch_queue.rst