Commit Graph

982122 Commits

Author SHA1 Message Date
Paul E. McKenney
0d2460ba61 Merge branches 'doc.2021.01.06a', 'fixes.2021.01.04b', 'kfree_rcu.2021.01.04a', 'mmdumpobj.2021.01.22a', 'nocb.2021.01.06a', 'rt.2021.01.04a', 'stall.2021.01.06a', 'torture.2021.01.12a' and 'tortureall.2021.01.06a' into HEAD
doc.2021.01.06a: Documentation updates.
fixes.2021.01.04b: Miscellaneous fixes.
kfree_rcu.2021.01.04a: kfree_rcu() updates.
mmdumpobj.2021.01.22a: Dump allocation point for memory blocks.
nocb.2021.01.06a: RCU callback offload updates and cblist segment lengths.
rt.2021.01.04a: Real-time updates.
stall.2021.01.06a: RCU CPU stall warning updates.
torture.2021.01.12a: Torture-test updates and polling SRCU grace-period API.
tortureall.2021.01.06a: Torture-test script updates.
2021-01-22 15:26:44 -08:00
Paul E. McKenney
3375efeddf percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
Reference-count underflow for percpu_ref is detected in the RCU callback
percpu_ref_switch_to_atomic_rcu(), and the resulting warning does not
print anything allowing easy identification of which percpu_ref use
case is underflowing.  This is of course not normally a problem when
developing a new percpu_ref use case because it is most likely that
the problem resides in this new use case.  However, when deploying a
new kernel to a large set of servers, the underflow might well be a new
corner case in any of the old percpu_ref use cases.

This commit therefore calls mem_dump_obj() to dump out any additional
available information on the underflowing percpu_ref instance.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-22 15:24:23 -08:00
Paul E. McKenney
b4b7914a6a rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
The debug-object double-free checks in __call_rcu() print out the
RCU callback function, which is usually sufficient to track down the
double free.  However, all uses of things like queue_rcu_work() will
have the same RCU callback function (rcu_work_rcufn() in this case),
so a diagnostic message for a double queue_rcu_work() needs more than
just the callback function.

This commit therefore calls mem_dump_obj() to dump out any additional
available information on the double-freed callback.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-22 15:24:16 -08:00
Paul E. McKenney
bd34dcd412 mm: Make mem_obj_dump() vmalloc() dumps include start and length
This commit adds the starting address and number of pages to the vmalloc()
information dumped by way of vmalloc_dump_obj().

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-22 15:24:10 -08:00
Paul E. McKenney
98f180837a mm: Make mem_dump_obj() handle vmalloc() memory
This commit adds vmalloc() support to mem_dump_obj().  Note that the
vmalloc_dump_obj() function combines the checking and dumping, in
contrast with the split between kmem_valid_obj() and kmem_dump_obj().
The reason for the difference is that the checking in the vmalloc()
case involves acquiring a global lock, and redundant acquisitions of
global locks should be avoided, even on not-so-fast paths.

Note that this change causes on-stack variables to be reported as
vmalloc() storage from kernel_clone() or similar, depending on the degree
of inlining that your compiler does.  This is likely more helpful than
the earlier "non-paged (local) memory".

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-22 15:24:04 -08:00
Paul E. McKenney
b70fa3b12f mm: Make mem_dump_obj() handle NULL and zero-sized pointers
This commit makes mem_dump_obj() call out NULL and zero-sized pointers
specially instead of classifying them as non-paged memory.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-22 15:23:57 -08:00
Paul E. McKenney
8e7f37f2aa mm: Add mem_dump_obj() to print source of memory block
There are kernel facilities such as per-CPU reference counts that give
error messages in generic handlers or callbacks, whose messages are
unenlightening.  In the case of per-CPU reference-count underflow, this
is not a problem when creating a new use of this facility because in that
case the bug is almost certainly in the code implementing that new use.
However, trouble arises when deploying across many systems, which might
exercise corner cases that were not seen during development and testing.
Here, it would be really nice to get some kind of hint as to which of
several uses the underflow was caused by.

This commit therefore exposes a mem_dump_obj() function that takes
a pointer to memory (which must still be allocated if it has been
dynamically allocated) and prints available information on where that
memory came from.  This pointer can reference the middle of the block as
well as the beginning of the block, as needed by things like RCU callback
functions and timer handlers that might not know where the beginning of
the memory block is.  These functions and handlers can use mem_dump_obj()
to print out better hints as to where the problem might lie.

The information printed can depend on kernel configuration.  For example,
the allocation return address can be printed only for slab and slub,
and even then only when the necessary debug has been enabled.  For slab,
build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
to the next power of two or use the SLAB_STORE_USER when creating the
kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
to enable printing of the allocation-time stack trace.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
[ paulmck: Convert to printing and change names per Joonsoo Kim. ]
[ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
[ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
[ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
[ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
[ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-22 15:16:01 -08:00
Paul E. McKenney
d945f797e4 rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01
RCU's rcutree.use_softirq=0 kernel boot parameter substitutes the per-CPU
rcuc kthreads for softirq, which is used in real-time installations.
However, none of the rcutorture scenarios test this parameter.
This commit therefore adds rcutree.use_softirq=0 to the RUDE01 and
TASKS01 rcutorture scenarios, both of which indirectly exercise RCU.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-12 09:55:23 -08:00
Paul E. McKenney
1afb95fee0 torture: Maintain torture-specific set of CPUs-online books
The TREE01 rcutorture scenario intentionally creates confusion as to the
number of available CPUs by specifying the "maxcpus=8 nr_cpus=43" kernel
boot parameters.  This can disable rcutorture's load shedding, which
currently uses num_online_cpus(), which would count the extra 35 CPUs.
However, the rcutorture guest OS will be provisioned with only 8 CPUs,
which means that rcutorture will present full load even when all but one
of the original 8 CPUs are offline.  This can result in spurious errors
due to extreme overloading of that single remaining CPU.

This commit therefore keeps a separate set of books on the number of
usable online CPUs, so that torture_num_online_cpus() is used for load
shedding instead of num_online_cpus().  Note that initial sizing must
use num_online_cpus() because torture_num_online_cpus() will return
NR_CPUS until shortly after torture_onoff_init() is invoked.

Reported-by: Frederic Weisbecker <frederic@kernel.org>
[ paulmck: Apply feedback from kernel test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:22 -08:00
Paul E. McKenney
0b962c8fe0 torture: Clean up after torture-test CPU hotplugging
This commit puts all CPUs back online at the end of a torture test,
and also unconditionally puts them online at the beginning of the test,
rather than just in the case of built-in tests.  This allows torture tests
to behave in a predictable manner, whether built-in or based on modules.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:22 -08:00
Paul E. McKenney
edf7b84178 rcutorture: Make object_debug also double call_rcu() heap object
This commit provides a test for call_rcu() printing the allocation address
of a double-freed callback by double-freeing a callback allocated via
kmalloc().  However, this commit does not depend on any other commit.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:21 -08:00
Paul E. McKenney
8a67a20bf2 torture: Throttle VERBOSE_TOROUT_*() output
This commit adds kernel boot parameters torture.verbose_sleep_frequency
and torture.verbose_sleep_duration, which allow VERBOSE_TOROUT_*() output
to be throttled with periodic sleeps on large systems.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:21 -08:00
Paul E. McKenney
414c116e01 torture: Make refscale throttle high-rate printk()s
This commit adds a short delay for verbose_batched-throttled printk()s
to further decrease console flooding.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:20 -08:00
Paul E. McKenney
1eba0ef981 rcutorture: Use hrtimers for reader and writer delays
This commit replaces schedule_timeout_uninterruptible() and
schedule_timeout_interruptible() with torture_hrtimeout_us() and
torture_hrtimeout_jiffies() to avoid timer-wheel synchronization.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:20 -08:00
Paul E. McKenney
ed24affa71 torture: Make stutter use torture_hrtimeout_*() functions
This commit saves a few lines of code by making the stutter_wait()
and torture_stutter() functions use torture_hrtimeout_jiffies() and
torture_hrtimeout_us().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:20 -08:00
Paul E. McKenney
ea31fd9ca8 rcutorture: Use torture_hrtimeout_jiffies() to avoid busy-waits
Because rcu_torture_writer() and rcu_torture_fakewriter() predate
hrtimers, they do timer-wheel-decoupled timed waits by using the
timer-wheel-based schedule_timeout_interruptible() functions in
conjunction with a random udelay()-based wait.  This latter unnecessarily
burns CPU time, so this commit instead uses torture_hrtimeout_jiffies()
to decouple from the timer wheels without busy-waiting.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:19 -08:00
Paul E. McKenney
ae19aaafae torture: Add fuzzed hrtimer-based sleep functions
This commit adds torture_hrtimeout_ns(), torture_hrtimeout_us(),
torture_hrtimeout_ms(), torture_hrtimeout_jiffies(), and
torture_hrtimeout_s(), each of which uses hrtimers to block for a fuzzed
time interval.  These functions are intended to be used by the various
torture tests to decouple wakeups from the timer wheel, thus providing
more opportunity for Murphy to insert destructive race conditions.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:19 -08:00
Paul E. McKenney
682189a3f8 rcutorture: Make rcu_torture_fakewriter() use blocking wait primitives
Full testing of the new SRCU polling API requires that the fake
writers also use it in order to test concurrent calls to all of the API
members, especially start_poll_synchronize_srcu().  This commit makes
rcu_torture_fakewriter() use all available blocking grace-period-wait
primitives available from the RCU flavor under test.

Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:18 -08:00
Paul E. McKenney
18fbf307b7 rcutorture: Make synctype[] and nsynctype be static global
Full testing of the new SRCU polling API requires that the fake writers
also use it in order to test concurrent calls to all of the API members,
especially start_poll_synchronize_srcu().  This commit prepares the ground
for this by making the synctype[] and nsynctype variables be static
globals so that the rcu_torture_fakewriter() function can access them.
Initialization of these variables is moved from rcu_torture_writer()
to a new rcu_torture_write_types() function that is invoked from
rcu_torture_init() just before the first writer kthread is spawned.

Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:17:10 -08:00
Paul E. McKenney
12a910e3cd rcutorture: Require entire stutter period be post-boot
Currently, the rcu_torture_writer() function checks that all required
grace periods elapse during a stutter interval, which is a multi-second
time period during which the test load is removed.  However, this check
is suppressed during early boot (that is, before init is spawned) in
order to avoid false positives that otherwise occur due to heavy load
on the single boot CPU.

Unfortunately, this approach is insufficient.  It is possible that the
stutter interval might end just as init is spawned, so that early boot
conditions prevailed during almost the entire stutter interval.

This commit therefore takes a snapshot of boot-complete state just
before the stutter interval, thus suppressing the check for failure to
complete grace periods unless the entire stutter interval took place
after early boot.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:08:12 -08:00
Paul E. McKenney
e76506f0e8 refscale: Allow summarization of verbose output
The refscale test prints enough per-kthread console output to provoke RCU
CPU stall warnings on large systems.  This commit therefore allows this
output to be summarized.  For example, the refscale.verbose_batched=32
boot parameter would causes only every 32nd line of output to be logged.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:08:09 -08:00
Paul E. McKenney
e3e1a99787 torture: Compress KASAN vmlinux files
The sizes of vmlinux files built with KASAN enabled can approach a full
gigabyte, which can result in disk overflow sooner rather than later.
Fortunately, the xz command compresses them by almost an order of
magnitude.  This commit therefore uses xz to compress vmlinux file built
by torture.sh with KASAN enabled.

However, xz is not the fastest thing in the world.  In fact, it is way
slower than rotating-rust mass storage.  This commit therefore also adds a
--compress-kasan-vmlinux argument to specify the degree of xz concurrency,
which defaults to using all available CPUs if there are that many files in
need of compression.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:46 -08:00
Paul E. McKenney
c54e413822 torture: Add --kcsan-kmake-arg to torture.sh for KCSAN
In 2020, running KCSAN often requires careful choice of compiler.
This commit therefore adds a --kcsan-kmake-arg parameter to torture.sh
to allow specifying (for example) "CC=clang" to the kernel build process
to correctly build a KCSAN-enabled kernel.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:46 -08:00
Paul E. McKenney
c66c0f94b3 torture: Add command and results directory to torture.sh log
This commit adds the command and arguments to the torture.sh log file, and
also outputs the results directory.  This latter allows impatient users
to quickly find the results that are being generated by the current run.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:45 -08:00
Paul E. McKenney
8847bd4988 torture: Allow scenarios to be specified to torture.sh
This commit adds --configs-rcutorture, --configs-locktorture, and
--configs-scftorture arguments to torture.sh, allowing the desired
set of scenarios to be passed to each.  The default for each has been
changed from a large-system-appropriate set to just CFLIST for each.
Users are encouraged to create scripts that provide appropriate settings
for their specific systems.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:45 -08:00
Paul E. McKenney
5ae5f7453f torture: Drop log.long generation from torture.sh
Now that kvm.sh puts all the relevant details in the "log" file,
there is no need for torture.sh to generate a separate "log.long"
file.  This commit therefore drops this from torture.sh.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:45 -08:00
Paul E. McKenney
c679d90b21 torture: Make torture.sh refuse to do zero-length runs
This commit causes torture.sh to check for zero-length runs and to take
the cowardly option of refusing to run them, logging its cowardice for
later inspection.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:44 -08:00
Paul E. McKenney
d97addc419 torture: Make torture.sh throttle VERBOSE_TOROUT_*() for refscale
This commit causes torture.sh to use the torture.verbose_sleep_frequency
kernel boot parameter to throttle verbose refscale output on large systems.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:44 -08:00
Paul E. McKenney
1fe9cef42b torture: Make torture.sh allmodconfig retain and label output
This commit places "---" markers in the torture.sh script's allmodconfig
output, and uses "<<" to avoid overwriting earlier output from this
build test.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:43 -08:00
Paul E. McKenney
c9a9d8e8f2 torture: Create doyesno helper function for torture.sh
This commit saves a few lines of code by creating a doyesno helper bash
function for argument parsing.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:43 -08:00
Paul E. McKenney
264da4832b torture: Make torture.sh refscale runs use verbose_batched module parameter
On large systems, the refscale printk() rate can overrun the file system's
ability to accept console log messages.  This commit therefore uses the
new verbose_batched module parameter to rate-limit some of the higher-rate
printk() calls.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:42 -08:00
Paul E. McKenney
7a99487c76 torture: Make torture.sh rcuscale and refscale deal with allmodconfig
The .mod.c files created by allmodconfig builds interfers with the approach
torture.sh uses to enumerate types of rcuscale and refscale runs.  This
commit therefore tightens the pattern matching to avoid this interference.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:42 -08:00
Paul E. McKenney
532017b119 torture: Enable torture.sh argument checking
This commit uncomments the argument checking for the --duration argument
to torture.sh.  While in the area, it also corrects the duration units
from seconds to minutes.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:41 -08:00
Paul E. McKenney
69d2b33e3f torture: Auto-size SCF and scaling runs based on number of CPUs
This commit improves torture.sh flexibility by autoscaling the number
of CPUs to be used in variable-CPUs torture tests, including scftorture,
refscale, rcuscale, and kvfree.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:41 -08:00
Paul E. McKenney
a115a775a8 torture: Add "make allmodconfig" to torture.sh
This commit adds the ability to do "make allmodconfig" to torture.sh,
given that normal rcutorture runs do not normally catch missing exports.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:41 -08:00
Paul E. McKenney
197220d4a3 torture: Remove use of "eval" in torture.sh
The bash "eval" command enables Bobby Tables attacks, which might not
be a concern in torture testing by themselves, but one could imagine
these combined with a cut-and-paste attack.  This commit therefore gets
rid of them.  This comes at a price in terms of bash quoting not working
nicely, so the "--bootargs" argument lists are now passed to torture_one
via a bash-variable side channel.  This might be a bit ugly, but it will
also allow torture.sh to grow its own --bootargs parameter.

While in the area, add proper header comments for the bash functions.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:40 -08:00
Paul E. McKenney
1adb5d6b52 torture: Make torture.sh use common time-duration bash functions
This commit makes torture.sh use the new bash functions get_starttime()
and get_starttime_duration() created for kvm.sh.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 17:03:37 -08:00
Paul E. McKenney
bfc19c13d2 torture: Add torture.sh torture-everything script
Although tailoring a specific set of kvm.sh runs has served rcutorture
testing well over many years, it requires a relatively distraction-free
environment, which is not always available.  This commit therefore
adds a prototype torture.sh script that by default tortures pretty much
everything the rcutorture scripting is designed to torture, and which
can be given command-line arguments to take a more focused approach.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:59:47 -08:00
Neeraj Upadhyay
683954e55c rcu: Check and report missed fqs timer wakeup on RCU stall
For a new grace period request, the RCU GP kthread transitions through
following states:

a. [RCU_GP_WAIT_GPS] -> [RCU_GP_DONE_GPS]

The RCU_GP_WAIT_GPS state is where the GP kthread waits for a request
for a new GP.  Once it receives a request (for example, when a new RCU
callback is queued), the GP kthread transitions to RCU_GP_DONE_GPS.

b. [RCU_GP_DONE_GPS] -> [RCU_GP_ONOFF]

Grace period initialization starts in rcu_gp_init(), which records the
start of new GP in rcu_state.gp_seq and transitions to RCU_GP_ONOFF.

c. [RCU_GP_ONOFF] -> [RCU_GP_INIT]

The purpose of the RCU_GP_ONOFF state is to apply the online/offline
information that was buffered for any CPUs that recently came online or
went offline.  This state is maintained in per-leaf rcu_node bitmasks,
with the buffered state in ->qsmaskinitnext and the state for the upcoming
GP in ->qsmaskinit.  At the end of this RCU_GP_ONOFF state, each bit in
->qsmaskinit will correspond to a CPU that must pass through a quiescent
state before the upcoming grace period is allowed to complete.

However, a leaf rcu_node structure with an all-zeroes ->qsmaskinit
cannot necessarily be ignored.  In preemptible RCU, there might well be
tasks still in RCU read-side critical sections that were first preempted
while running on one of the CPUs managed by this structure.  Such tasks
will be queued on this structure's ->blkd_tasks list.  Only after this
list fully drains can this leaf rcu_node structure be ignored, and even
then only if none of its CPUs have come back online in the meantime.
Once that happens, the ->qsmaskinit masks further up the tree will be
updated to exclude this leaf rcu_node structure.

Once the ->qsmaskinitnext and ->qsmaskinit fields have been updated
as needed, the GP kthread transitions to RCU_GP_INIT.

d. [RCU_GP_INIT] -> [RCU_GP_WAIT_FQS]

The purpose of the RCU_GP_INIT state is to copy each ->qsmaskinit to
the ->qsmask field within each rcu_node structure.  This copying is done
breadth-first from the root to the leaves.  Why not just copy directly
from ->qsmaskinitnext to ->qsmask?  Because the ->qsmaskinitnext masks
can change in the meantime as additional CPUs come online or go offline.
Such changes would result in inconsistencies in the ->qsmask fields up and
down the tree, which could in turn result in too-short grace periods or
grace-period hangs.  These issues are avoided by snapshotting the leaf
rcu_node structures' ->qsmaskinitnext fields into their ->qsmaskinit
counterparts, generating a consistent set of ->qsmaskinit fields
throughout the tree, and only then copying these consistent ->qsmaskinit
fields to their ->qsmask counterparts.

Once this initialization step is complete, the GP kthread transitions
to RCU_GP_WAIT_FQS, where it waits to do a force-quiescent-state scan
on the one hand or for the end of the grace period on the other.

e. [RCU_GP_WAIT_FQS] -> [RCU_GP_DOING_FQS]

The RCU_GP_WAIT_FQS state waits for one of three things:  (1) An
explicit request to do a force-quiescent-state scan, (2) The end of
the grace period, or (3) A short interval of time, after which it
will do a force-quiescent-state (FQS) scan.  The explicit request can
come from rcutorture or from any CPU that has too many RCU callbacks
queued (see the qhimark kernel parameter and the RCU_GP_FLAG_OVLD
flag).  The aforementioned "short period of time" is specified by the
jiffies_till_first_fqs boot parameter for a given grace period's first
FQS scan and by the jiffies_till_next_fqs for later FQS scans.

Either way, once the wait is over, the GP kthread transitions to
RCU_GP_DOING_FQS.

f. [RCU_GP_DOING_FQS] -> [RCU_GP_CLEANUP]

The RCU_GP_DOING_FQS state performs an FQS scan.  Each such scan carries
out two functions for any CPU whose bit is still set in its leaf rcu_node
structure's ->qsmask field, that is, for any CPU that has not yet reported
a quiescent state for the current grace period:

  i.  Report quiescent states on behalf of CPUs that have been observed
      to be idle (from an RCU perspective) since the beginning of the
      grace period.

  ii. If the current grace period is too old, take various actions to
      encourage holdout CPUs to pass through quiescent states, including
      enlisting the aid of any calls to cond_resched() and might_sleep(),
      and even including IPIing the holdout CPUs.

These checks are skipped for any leaf rcu_node structure with a all-zero
->qsmask field, however such structures are subject to RCU priority
boosting if there are tasks on a given structure blocking the current
grace period.  The end of the grace period is detected when the root
rcu_node structure's ->qsmask is zero and when there are no longer any
preempted tasks blocking the current grace period.  (No, this last check
is not redundant.  To see this, consider an rcu_node tree having exactly
one structure that serves as both root and leaf.)

Once the end of the grace period is detected, the GP kthread transitions
to RCU_GP_CLEANUP.

g. [RCU_GP_CLEANUP] -> [RCU_GP_CLEANED]

The RCU_GP_CLEANUP state marks the end of grace period by updating the
rcu_state structure's ->gp_seq field and also all rcu_node structures'
->gp_seq field.  As before, the rcu_node tree is traversed in breadth
first order.  Once this update is complete, the GP kthread transitions
to the RCU_GP_CLEANED state.

i. [RCU_GP_CLEANED] -> [RCU_GP_INIT]

Once in the RCU_GP_CLEANED state, the GP kthread immediately transitions
into the RCU_GP_INIT state.

j. The role of timers.

If there is at least one idle CPU, and if timers are not firing, the
transition from RCU_GP_DOING_FQS to RCU_GP_CLEANUP will never happen.
Timers can fail to fire for a number of reasons, including issues in
timer configuration, issues in the timer framework, and failure to handle
softirqs (for example, when there is a storm of interrupts).  Whatever the
reason, if the timers fail to fire, the GP kthread will never be awakened,
resulting in RCU CPU stall warnings and eventually in OOM.

However, an RCU CPU stall warning has a large number of potential causes,
as documented in Documentation/RCU/stallwarn.rst.  This commit therefore
adds analysis to the RCU CPU stall-warning code to emit an additional
message if the cause of the stall is likely to be timer failure.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:54:11 -08:00
Paul E. McKenney
147c6852d3 rcu: Do any deferred nocb wakeups at CPU offline time
Because the need to wake a nocb GP kthread ("rcuog") is sometimes
detected when wakeups cannot be done, these wakeups can be deferred.
The wakeups are then carried out by calls to do_nocb_deferred_wakeup()
at various safe points in the code, including RCU's idle hooks.  However,
when a CPU goes offline, it invokes arch_cpu_idle_dead() without invoking
any of RCU's idle hooks.

This commit therefore adds a call to do_nocb_deferred_wakeup() in
rcu_report_dead() in order to handle any deferred wakeups that have been
requested by the outgoing CPU.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:50:24 -08:00
Paul E. McKenney
f759081e8f rcu/nocb: Code-style nits in callback-offloading toggling
This commit addresses a few code-style nits in callback-offloading
toggling, including one that predates this toggling.

Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:47:55 -08:00
Paul E. McKenney
3d0cef50f3 rcu/nocb: Add nocb CB kthread list to show_rcu_nocb_state() output
This commit improves debuggability by indicating laying out the order
in which rcuoc kthreads appear in the ->nocb_next_cb_rdp list.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:25:00 -08:00
Paul E. McKenney
341690611f rcu/nocb: Add grace period and task state to show_rcu_nocb_state() output
This commit improves debuggability by indicating which grace period each
batch of nocb callbacks is waiting on and by showing the task state and
last CPU for reach nocb kthread.

[ paulmck: Handle !SMP CB offloading per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:25:00 -08:00
Frederic Weisbecker
70e8088b97 tools/rcutorture: Support nocb toggle in TREE01
This commit adds periodic toggling of 7 of 8 CPUs every second to TREE01
in order to test NOCB toggle code.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00
Paul E. McKenney
2c4319bd1d rcutorture: Test runtime toggling of CPUs' callback offloading
Frederic Weisbecker is adding the ability to change the rcu_nocbs state
of CPUs at runtime, that is, to offload and deoffload their RCU callback
processing without the need to reboot.  As the old saying goes, "if it
ain't tested, it don't work", so this commit therefore adds prototype
rcutorture testing for this capability.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
2021-01-06 16:24:59 -08:00
Frederic Weisbecker
dcd42591eb timer: Add timer_curr_running()
This commit adds a timer_curr_running() function that verifies that the
current code is running in the context of the specified timer's handler.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00
Frederic Weisbecker
43759fe5a1 cpu/hotplug: Add lockdep_is_cpus_held()
This commit adds a lockdep_is_cpus_held() function to verify that the
proper locks are held and that various operations are running in the
correct context.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00
Frederic Weisbecker
634954c2db rcu/nocb: Locally accelerate callbacks as long as offloading isn't complete
The local callbacks processing checks if any callbacks need acceleration.
This commit carries out this checking under nocb lock protection in
the middle of toggle operations, during which time rcu_core() executes
concurrently with GP/CB kthreads.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00
Frederic Weisbecker
32aa2f4170 rcu/nocb: Process batch locally as long as offloading isn't complete
This commit makes sure to process the callbacks locally (via either
RCU_SOFTIRQ or the rcuc kthread) whenever the segcblist isn't entirely
offloaded.  This ensures that callbacks are invoked one way or another
while a CPU is in the middle of a toggle operation.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00
Frederic Weisbecker
e3abe959fb rcu/nocb: Only cond_resched() from actual offloaded batch processing
During a toggle operations, rcu_do_batch() may be invoked concurrently
by softirqs and offloaded processing for a given CPU's callbacks.
This commit therefore makes sure cond_resched() is invoked only from
the offloaded context.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Inspired-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06 16:24:59 -08:00