mirror of
https://github.com/torvalds/linux.git
synced 2024-11-26 14:12:06 +00:00
9294a037c0
Patch series "mm/damon: let users feed and tame/auto-tune DAMOS".
Introduce Aim-oriented Feedback-driven DAMOS Aggressiveness Auto-tuning.
It makes DAMOS self-tuned with periodic simple user feedback.
Background: DAMOS Control Difficulty
====================================
DAMOS helps users easily implement access pattern aware system operations.
However, controlling DAMOS in the wild is not that easy.
The basic way for DAMOS control is specifying the target access pattern.
In this approach, the user is assumed to well understand the access
pattern and the characteristics of the system and the workloads. Though
there are useful tools for that, it takes time and effort depending on the
complexity and the dynamicity of the system and the workloads. After all,
the access pattern consists of three ranges, namely the size, the access
rate, and the age of the regions. It means users need to tune six
parameters, which is anyway not a simple task.
One of the worst cases would be DAMOS being too aggressive like a
berserker, and therefore consuming too much system resource and making
unwanted radical system operations. To let users avoid such cases, DAMOS
allows users to set the upper-limit of the schemes' aggressiveness, namely
DAMOS quota. DAMOS further provides its best-effort under the limit by
prioritizing regions based on the access pattern of the regions. For
example, users can ask DAMOS to page out up to 100 MiB of memory regions
per second. Then DAMOS pages out regions that are not accessed for a
longer time (colder) first under the limit. This allows users to set the
target access pattern a bit naive with wider ranges, and focus on tuning
only one parameter, the quota. In other words, the number of parameters
to tune can be reduced from six to one.
Still, however, the optimum value for the quota depends on the system and
the workloads' characteristics, so not that simple. The number of
parameters to tune can also increase again if the user needs to run
multiple schemes.
Aim-oriented Feedback-driven DAMOS Aggressiveness Auto Tuning
=============================================================
Users would use DAMOS since they want to achieve something with it. They
will likely have measurable metrics representing the achievement and the
target number of the metric like SLO, and continuously measure that
anyway. While the additional cost of getting the information is nearly
zero, it could be useful for DAMOS to understand how appropriate its
current aggressiveness is set, and adjust it on its own to make the metric
value more close to the target.
Based on this idea, we introduce a new way of tuning DAMOS with nearly
zero additional effort, namely Aim-oriented Feedback-driven DAMOS
Aggressiveness Auto Tuning. It asks users to provide feedback
representing how well DAMOS is doing relative to the users' aim. Then
DAMOS adjusts its aggressiveness, specifically the quota that provides
the best effort result under the limit, based on the current level of
the aggressiveness and the users' feedback.
Implementation
==============
The implementation asks users to represent the feedback with score
numbers. The scores could be anything including user-space specific
metrics including latency and throughput of special user-space workloads,
and system metrics including free memory ratio, memory pressure stall time
(PSI), and active to inactive LRU lists size ratio. The feedback scores
and the aggressiveness of the given DAMOS scheme are assumed to be
positively proportional, though. Selecting metrics of the assumption is
the users' responsibility.
The core logic uses the below simple feedback loop algorithm to calculate
the next aggressiveness level of the scheme from the current
aggressiveness level and the current feedback (target_score and
current_score). It calculates the compensation for next aggressiveness as
a proportion of current aggressiveness and distance to the target score.
As a result, it arrives at the near-goal state in a short time using big
steps when it's far from the goal, but avoids making unnecessarily radical
changes that could turn out to be a bad decision using small steps when
its near to the goal.
f(n) = max(1, f(n - 1) * ((target_score - current_score) / target_score + 1))
Note that the compensation value becomes negative when it's over
achieving the goal. That's why the feedback metric and the
aggressiveness of the scheme should be positively proportional. The
distance-adaptive speed manipulation is simply applied.
Example Use Cases
=================
If users want to reduce the memory footprint of the system as much as
possible as long as the time spent for handling the resulting memory
pressure is within a threshold, they could use DAMOS scheme that reclaims
cold memory regions aiming for a little level of memory pressure stall
time.
If users want the active/inactive LRU lists well balanced to reduce the
performance impact due to possible future memory pressure, they could use
two schemes. The first one would be set to locate hot pages in the active
LRU list, aiming for a specific active-to-inactive LRU list size ratio,
say, 70%. The second one would be to locate cold pages in the inactive
LRU list, aiming for a specific inactive-to-active LRU list size ratio,
say, 30%. Then, DAMOS will balance the two schemes based on the goal and
feedback.
This aim-oriented auto tuning could also be useful for general
balancing-required access aware system operations such as system memory
auto scaling[3] and tiered memory management[4]. These two example usages
are not what current DAMOS implementation is already supporting, but
require additional DAMOS action developments, though.
Evaluation: subtle memory pressure aiming proactive reclamation
===============================================================
To show if the implementation works as expected, we prepare four different
system configurations on AWS i3.metal instances. The first setup
(original) runs the workload without any DAMOS scheme. The second setup
(not-tuned) runs the workload with a virtual address space-based proactive
reclamation scheme that pages out memory regions that are not accessed for
five seconds or more. The third setup (offline-tuned) runs the same
proactive reclamation DAMOS scheme, but after making it tuned for each
workload offline, using our previous user-space driven automatic tuning
approach, namely DAMOOS[1]. The fourth and final setup (AFDAA) runs the
scheme that is the same as that of 'not-tuned' setup, but aims to keep
0.5% of 'some' memory pressure stall time (PSI) for the last 10 seconds
using the aiming-oriented auto tuning.
For each setup, we run realistic workloads from PARSEC3 and SPLASH-2X
benchmark suites. For each run, we measure RSS and runtime of the
workload, and 'some' memory pressure stall time (PSI) of the system. We
repeat the runs five times and use averaged measurements.
For simple comparison of the results, we normalize the measurements to
those of 'original'. In the case of the PSI, though, the measurement for
'original' was zero, so we normalize the value to that of 'not-tuned'
scheme's result. The normalized results are shown below.
Not-tuned Offline-tuned AFDAA
RSS 0.622688178226118 0.787950678944904 0.740093483278979
runtime 1.11767826657912 1.0564674983585 1.0910833880499
PSI 1 0.727521443794069 0.308498846350299
The 'not-tuned' scheme achieves about 38.7% memory saving but incur about
11.7% runtime slowdown. The 'offline-tuned' scheme achieves about 22.2%
memory saving with about 5.5% runtime slowdown. It also achieves about
28.2% memory pressure stall time saving. AFDAA achieves about 26% memory
saving with about 9.1% runtime slowdown. It also achieves about 69.1%
memory pressure stall time saving. We repeat this test multiple times,
and get consistent results. AFDAA is now integrated in our daily DAMON
performance test setup.
Apparently the aggressiveness of 'AFDAA' setup is somewhere between those
of 'not-tuned' and 'offline-tuned' setup, since its memory saving and
runtime overhead are between those of the other two setups. Actually we
set the memory pressure stall time goal aiming for this middle
aggressiveness. The difference in the two metrics are not significant,
though. However, it shows significant saving of the memory pressure stall
time, which was the goal of the auto-tuning, over the two variants.
Hence, we conclude the automatic tuning is working as expected.
Please note that the AFDAA setup is only for the evaluation, and
therefore intentionally set a bit aggressive. It might not be
appropriate for production environments.
The test code is also available[2], so you could reproduce it on your
system and workloads.
Patches Sequence
================
The first four patches implement the core logic and user interfaces for
the auto tuning. The first patch implements the core logic for the auto
tuning, and the API for DAMOS users in the kernel space. The second
patch implements basic file operations of DAMON sysfs directories and
files that will be used for setting the goals and providing the
feedback. The third patch connects the quota goals files inputs to the
DAMOS core logic. Finally the fourth patch implements a dedicated DAMOS
sysfs command for efficiently committing the quota goals feedback.
Two patches for simple tests of the logic and interfaces follow. The
fifth patch implements the core logic unit test. The sixth patch
implements a selftest for the DAMON Sysfs interface for the goals.
Finally, three patches for documentation follows. The seventh patch
documents the design of the feature. The eighth patch updates the API
doc for the new sysfs files. The final eighth patch updates the usage
document for the features.
References
==========
[1] DAOS paper:
https://www.amazon.science/publications/daos-data-access-aware-operating-system
[2] Evaluation code:
|
||
---|---|---|
.. | ||
damon | ||
kasan | ||
kfence | ||
kmsan | ||
backing-dev.c | ||
balloon_compaction.c | ||
bootmem_info.c | ||
cma_debug.c | ||
cma_sysfs.c | ||
cma.c | ||
cma.h | ||
compaction.c | ||
debug_page_alloc.c | ||
debug_page_ref.c | ||
debug_vm_pgtable.c | ||
debug.c | ||
dmapool_test.c | ||
dmapool.c | ||
early_ioremap.c | ||
fadvise.c | ||
fail_page_alloc.c | ||
failslab.c | ||
filemap.c | ||
folio-compat.c | ||
gup_test.c | ||
gup_test.h | ||
gup.c | ||
highmem.c | ||
hmm.c | ||
huge_memory.c | ||
hugetlb_cgroup.c | ||
hugetlb_vmemmap.c | ||
hugetlb_vmemmap.h | ||
hugetlb.c | ||
hwpoison-inject.c | ||
init-mm.c | ||
internal.h | ||
interval_tree.c | ||
io-mapping.c | ||
ioremap.c | ||
Kconfig | ||
Kconfig.debug | ||
khugepaged.c | ||
kmemleak.c | ||
ksm.c | ||
list_lru.c | ||
maccess.c | ||
madvise.c | ||
Makefile | ||
mapping_dirty_helpers.c | ||
memblock.c | ||
memcontrol.c | ||
memfd.c | ||
memory_hotplug.c | ||
memory-failure.c | ||
memory-tiers.c | ||
memory.c | ||
mempolicy.c | ||
mempool.c | ||
memremap.c | ||
memtest.c | ||
migrate_device.c | ||
migrate.c | ||
mincore.c | ||
mlock.c | ||
mm_init.c | ||
mm_slot.h | ||
mmap_lock.c | ||
mmap.c | ||
mmu_gather.c | ||
mmu_notifier.c | ||
mmzone.c | ||
mprotect.c | ||
mremap.c | ||
msync.c | ||
nommu.c | ||
oom_kill.c | ||
page_alloc.c | ||
page_counter.c | ||
page_ext.c | ||
page_idle.c | ||
page_io.c | ||
page_isolation.c | ||
page_owner.c | ||
page_poison.c | ||
page_reporting.c | ||
page_reporting.h | ||
page_table_check.c | ||
page_vma_mapped.c | ||
page-writeback.c | ||
pagewalk.c | ||
percpu-internal.h | ||
percpu-km.c | ||
percpu-stats.c | ||
percpu-vm.c | ||
percpu.c | ||
pgalloc-track.h | ||
pgtable-generic.c | ||
process_vm_access.c | ||
ptdump.c | ||
readahead.c | ||
rmap.c | ||
rodata_test.c | ||
secretmem.c | ||
shmem_quota.c | ||
shmem.c | ||
show_mem.c | ||
shrinker_debug.c | ||
shrinker.c | ||
shuffle.c | ||
shuffle.h | ||
slab_common.c | ||
slab.c | ||
slab.h | ||
slub.c | ||
sparse-vmemmap.c | ||
sparse.c | ||
swap_cgroup.c | ||
swap_slots.c | ||
swap_state.c | ||
swap.c | ||
swap.h | ||
swapfile.c | ||
truncate.c | ||
usercopy.c | ||
userfaultfd.c | ||
util.c | ||
vmalloc.c | ||
vmpressure.c | ||
vmscan.c | ||
vmstat.c | ||
workingset.c | ||
z3fold.c | ||
zbud.c | ||
zpool.c | ||
zsmalloc.c | ||
zswap.c |