linux/kernel/rcu
Uladzislau Rezki (Sony) cc37d52076 rcu/kvfree: Use a polled API to speedup a reclaim process
Currently all objects placed into a batch wait for a full grace period
to elapse after that batch is ready to send to RCU.  However, this
can unnecessarily delay freeing of the first objects that were added
to the batch.  After all, several RCU grace periods might have elapsed
since those objects were added, and if so, there is no point in further
deferring their freeing.

This commit therefore adds per-page grace-period snapshots which are
obtained from get_state_synchronize_rcu().  When the batch is ready
to be passed to call_rcu(), each page's snapshot is checked by passing
it to poll_state_synchronize_rcu().  If a given page's RCU grace period
has already elapsed, its objects are freed immediately by kvfree_rcu_bulk().
Otherwise, these objects are freed after a call to synchronize_rcu().

This approach requires that the pages be traversed in reverse order,
that is, the oldest ones first.

Test example:

kvm.sh --memory 10G --torture rcuscale --allcpus --duration 1 \
  --kconfig CONFIG_NR_CPUS=64 \
  --kconfig CONFIG_RCU_NOCB_CPU=y \
  --kconfig CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y \
  --kconfig CONFIG_RCU_LAZY=n \
  --bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 \
  rcuscale.holdoff=20 rcuscale.kfree_loops=10000 \
  torture.disable_onoff_at_boot" --trust-make

Before this commit:

Total time taken by all kfree'ers: 8535693700 ns, loops: 10000, batches: 1188, memory footprint: 2248MB
Total time taken by all kfree'ers: 8466933582 ns, loops: 10000, batches: 1157, memory footprint: 2820MB
Total time taken by all kfree'ers: 5375602446 ns, loops: 10000, batches: 1130, memory footprint: 6502MB
Total time taken by all kfree'ers: 7523283832 ns, loops: 10000, batches: 1006, memory footprint: 3343MB
Total time taken by all kfree'ers: 6459171956 ns, loops: 10000, batches: 1150, memory footprint: 6549MB

After this commit:

Total time taken by all kfree'ers: 8560060176 ns, loops: 10000, batches: 1787, memory footprint: 61MB
Total time taken by all kfree'ers: 8573885501 ns, loops: 10000, batches: 1777, memory footprint: 93MB
Total time taken by all kfree'ers: 8320000202 ns, loops: 10000, batches: 1727, memory footprint: 66MB
Total time taken by all kfree'ers: 8552718794 ns, loops: 10000, batches: 1790, memory footprint: 75MB
Total time taken by all kfree'ers: 8601368792 ns, loops: 10000, batches: 1724, memory footprint: 62MB

The reduction in memory footprint is well in excess of an order of
magnitude.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03 17:48:41 -08:00
..
Kconfig printk changes for 6.2 2022-12-12 09:01:36 -08:00
Kconfig.debug rcu: Make SRCU mandatory 2022-11-29 15:00:06 -08:00
Makefile rcuperf: Change rcuperf to rcuscale 2020-08-24 18:39:24 -07:00
rcu_segcblist.c rcu: Clarify fill-the-gap comment in rcu_segcblist_advance() 2022-04-11 17:28:48 -07:00
rcu_segcblist.h rcu: Mark writes to the rcu_segcblist structure's ->flags field 2022-02-14 10:36:58 -08:00
rcu.h printk changes for 6.2 2022-12-12 09:01:36 -08:00
rcuscale.c rcu/rcuscale: Use call_rcu_hurry() for async reader test 2022-11-29 14:04:33 -08:00
rcutorture.c Merge branches 'doc.2022.10.20a', 'fixes.2022.10.21a', 'lazy.2022.11.30a', 'srcunmisafe.2022.11.09a', 'torture.2022.10.18c' and 'torturescript.2022.10.20a' into HEAD 2022-11-30 13:20:05 -08:00
refscale.c refscale: Convert test_lock spinlock to raw_spinlock 2022-06-21 15:57:04 -07:00
srcutiny.c srcu: Make Tiny synchronize_srcu() check for readers 2022-12-01 15:49:12 -08:00
srcutree.c srcu: Debug NMI safety even on archs that don't require it 2022-10-21 10:44:11 -07:00
sync.c rcu/sync: Use call_rcu_hurry() instead of call_rcu 2022-11-29 14:04:33 -08:00
tasks.h Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
tiny.c rcu: Refactor kvfree_call_rcu() and high-level helpers 2023-01-03 17:48:40 -08:00
tree_exp.h rcu: Make call_rcu() lazy to save power 2022-11-29 14:02:23 -08:00
tree_nocb.h rcu: Shrinker for lazy rcu 2022-11-29 14:02:52 -08:00
tree_plugin.h rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity() 2022-10-18 14:59:57 -07:00
tree_stall.h sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task() 2022-08-31 05:03:14 -07:00
tree.c rcu/kvfree: Use a polled API to speedup a reclaim process 2023-01-03 17:48:41 -08:00
tree.h rcu: Make call_rcu() lazy to save power 2022-11-29 14:02:23 -08:00
update.c rcu: Make SRCU mandatory 2022-11-29 15:00:06 -08:00