linux/kernel/rcu
Byungchul Park a35d16905e rcu: Add basic support for kfree_rcu() batching
Recently a discussion about stability and performance of a system
involving a high rate of kfree_rcu() calls surfaced on the list [1]
which led to another discussion how to prepare for this situation.

This patch adds basic batching support for kfree_rcu(). It is "basic"
because we do none of the slab management, dynamic allocation, code
moving or any of the other things, some of which previous attempts did
[2]. These fancier improvements can be follow-up patches and there are
different ideas being discussed in those regards. This is an effort to
start simple, and build up from there. In the future, an extension to
use kfree_bulk and possibly per-slab batching could be done to further
improve performance due to cache-locality and slab-specific bulk free
optimizations. By using an array of pointers, the worker thread
processing the work would need to read lesser data since it does not
need to deal with large rcu_head(s) any longer.

Torture tests follow in the next patch and show improvements of around
5x reduction in number of  grace periods on a 16 CPU system. More
details and test data are in that patch.

There is an implication with rcu_barrier() with this patch. Since the
kfree_rcu() calls can be batched, and may not be handed yet to the RCU
machinery in fact, the monitor may not have even run yet to do the
queue_rcu_work(), there seems no easy way of implementing rcu_barrier()
to wait for those kfree_rcu()s that are already made. So this means a
kfree_rcu() followed by an rcu_barrier() does not imply that memory will
be freed once rcu_barrier() returns.

Another implication is higher active memory usage (although not
run-away..) until the kfree_rcu() flooding ends, in comparison to
without batching. More details about this are in the second patch which
adds an rcuperf test.

Finally, in the near future we will get rid of kfree_rcu() special casing
within RCU such as in rcu_do_batch and switch everything to just
batching. Currently we don't do that since timer subsystem is not yet up
and we cannot schedule the kfree_rcu() monitor as the timer subsystem's
lock are not initialized. That would also mean getting rid of
kfree_call_rcu_nobatch() entirely.

[1] http://lore.kernel.org/lkml/20190723035725-mutt-send-email-mst@kernel.org
[2] https://lkml.org/lkml/2017/12/19/824

Cc: kernel-team@android.com
Cc: kernel-team@lge.com
Co-developed-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Applied 0day and Paul Walmsley feedback on ->monitor_todo. ]
[ paulmck: Make it work during early boot. ]
[ paulmck: Add a crude early boot self-test. ]
[ paulmck: Style adjustments and experimental docbook structure header. ]
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.9999.1908161931110.32497@viisi.sifive.com/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:17:03 -08:00
..
Kconfig rcu: Use CONFIG_PREEMPTION 2019-07-31 19:03:35 +02:00
Kconfig.debug rcu: Add support for consolidated-RCU reader checking 2019-08-09 11:00:35 -07:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
rcu_segcblist.c rcu: Several rcu_segcblist functions can be static 2019-10-30 08:33:22 -07:00
rcu_segcblist.h rcu/nocb: Add bypass callback queueing 2019-08-13 14:37:32 -07:00
rcu.h Merge branches 'doc.2019.10.29a', 'fixes.2019.10.30a', 'nohz.2019.10.28a', 'replace.2019.10.30a', 'torture.2019.10.05a' and 'lkmm.2019.10.05a' into HEAD 2019-10-30 08:47:13 -07:00
rcuperf.c rcu: Remove unused variable rcu_perf_writer_state 2019-10-05 11:49:36 -07:00
rcutorture.c Merge branches 'doc.2019.10.29a', 'fixes.2019.10.30a', 'nohz.2019.10.28a', 'replace.2019.10.30a', 'torture.2019.10.05a' and 'lkmm.2019.10.05a' into HEAD 2019-10-30 08:47:13 -07:00
srcutiny.c srcu: Remove cleanup_srcu_struct_quiesced() 2019-03-26 14:39:24 -07:00
srcutree.c srcu: Avoid srcutorture security-based pointer obfuscation 2019-08-01 14:05:51 -07:00
sync.c rcu/sync: Simplify the state machine 2019-05-28 09:05:23 -07:00
tiny.c rcu: rcu_qs -- Use raise_softirq_irqoff to not save irqs twice 2019-03-26 14:37:49 -07:00
tree_exp.h rcu: Fix spelling mistake "greate"->"great" 2019-08-12 11:25:06 -07:00
tree_plugin.h rcu: Fix uninitialized variable in nocb_gp_wait() 2019-10-30 08:34:53 -07:00
tree_stall.h Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-16 17:25:49 -07:00
tree.c rcu: Add basic support for kfree_rcu() batching 2020-01-24 10:17:03 -08:00
tree.h rcu: Force tick on for nohz_full CPUs not reaching quiescent states 2019-10-28 07:02:21 -07:00
update.c rcu: Add basic support for kfree_rcu() batching 2020-01-24 10:17:03 -08:00