linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-26 14:12:06 +00:00

SeongJae Park 9294a037c0 mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning

Patch series "mm/damon: let users feed and tame/auto-tune DAMOS".

Introduce Aim-oriented Feedback-driven DAMOS Aggressiveness Auto-tuning. It makes DAMOS self-tuned with periodic simple user feedback.

Background: DAMOS Control Difficulty
====================================

DAMOS helps users easily implement access pattern aware system operations. However, controlling DAMOS in the wild is not that easy.

The basic way for DAMOS control is specifying the target access pattern. In this approach, the user is assumed to well understand the access pattern and the characteristics of the system and the workloads. Though there are useful tools for that, it takes time and effort depending on the complexity and the dynamicity of the system and the workloads. After all, the access pattern consists of three ranges, namely the size, the access rate, and the age of the regions. It means users need to tune six parameters, which is anyway not a simple task.

One of the worst cases would be DAMOS being too aggressive like a berserker, and therefore consuming too much system resource and making unwanted radical system operations. To let users avoid such cases, DAMOS allows users to set the upper-limit of the schemes' aggressiveness, namely DAMOS quota. DAMOS further provides its best-effort under the limit by prioritizing regions based on the access pattern of the regions. For example, users can ask DAMOS to page out up to 100 MiB of memory regions per second. Then DAMOS pages out regions that are not accessed for a longer time (colder) first under the limit. This allows users to set the target access pattern a bit naive with wider ranges, and focus on tuning only one parameter, the quota. In other words, the number of parameters to tune can be reduced from six to one.

Still, however, the optimum value for the quota depends on the system and the workloads' characteristics, so it is not that simple. The number of parameters to tune can also increase again if the user needs to run multiple schemes.

Aim-oriented Feedback-driven DAMOS Aggressiveness Auto Tuning
=============================================================

Users would use DAMOS since they want to achieve something with it. They will likely have measurable metrics representing the achievement and the target number of the metric like SLO, and continuously measure that anyway. While the additional cost of getting the information is nearly zero, it could be useful for DAMOS to understand how appropriate its current aggressiveness is set, and adjust it on its own to make the metric value closer to the target.

Based on this idea, we introduce a new way of tuning DAMOS with nearly zero additional effort, namely Aim-oriented Feedback-driven DAMOS Aggressiveness Auto Tuning. It asks users to provide feedback representing how well DAMOS is doing relative to the users' aim. Then DAMOS adjusts its aggressiveness, specifically the quota that provides the best effort result under the limit, based on the current level of the aggressiveness and the users' feedback.

Implementation
==============

The implementation asks users to represent the feedback with score numbers. The scores could be anything, including user-space specific metrics such as latency and throughput of special user-space workloads, and system metrics such as free memory ratio, memory pressure stall time (PSI), and active to inactive LRU lists size ratio. The feedback scores and the aggressiveness of the given DAMOS scheme are assumed to be positively proportional, though. Selecting metrics that satisfy this assumption is the users' responsibility.

The core logic uses the below simple feedback loop algorithm to calculate the next aggressiveness level of the scheme from the current aggressiveness level and the current feedback (target_score and current_score). It calculates the compensation for next aggressiveness as a proportion of current aggressiveness and distance to the target score. As a result, it arrives at the near-goal state in a short time using big steps when it's far from the goal, but avoids making unnecessarily radical changes that could turn out to be a bad decision using small steps when it's near the goal.

    f(n) = max(1, f(n - 1) * ((target_score - current_score) / target_score + 1))

Note that the compensation value becomes negative when it's over achieving the goal. That's why the feedback metric and the aggressiveness of the scheme should be positively proportional. The distance-adaptive speed manipulation is simply applied.

Example Use Cases
=================

If users want to reduce the memory footprint of the system as much as possible as long as the time spent for handling the resulting memory pressure is within a threshold, they could use a DAMOS scheme that reclaims cold memory regions aiming for a little level of memory pressure stall time.

If users want the active/inactive LRU lists well balanced to reduce the performance impact due to possible future memory pressure, they could use two schemes. The first one would be set to locate hot pages in the active LRU list, aiming for a specific active-to-inactive LRU list size ratio, say, 70%. The second one would be to locate cold pages in the inactive LRU list, aiming for a specific inactive-to-active LRU list size ratio, say, 30%. Then, DAMOS will balance the two schemes based on the goal and feedback.

This aim-oriented auto tuning could also be useful for general balancing-required access aware system operations such as system memory auto scaling[3] and tiered memory management[4].
These two example usages are not yet supported by the current DAMOS implementation, though; they require additional DAMOS action development.

Evaluation: subtle memory pressure aiming proactive reclamation
===============================================================

To show if the implementation works as expected, we prepare four different system configurations on AWS i3.metal instances. The first setup (original) runs the workload without any DAMOS scheme. The second setup (not-tuned) runs the workload with a virtual address space-based proactive reclamation scheme that pages out memory regions that are not accessed for five seconds or more. The third setup (offline-tuned) runs the same proactive reclamation DAMOS scheme, but after making it tuned for each workload offline, using our previous user-space driven automatic tuning approach, namely DAMOOS[1]. The fourth and final setup (AFDAA) runs the scheme that is the same as that of 'not-tuned' setup, but aims to keep 0.5% of 'some' memory pressure stall time (PSI) for the last 10 seconds using the aim-oriented auto tuning.

For each setup, we run realistic workloads from PARSEC3 and SPLASH-2X benchmark suites. For each run, we measure RSS and runtime of the workload, and 'some' memory pressure stall time (PSI) of the system. We repeat the runs five times and use averaged measurements.

For simple comparison of the results, we normalize the measurements to those of 'original'. In the case of the PSI, though, the measurement for 'original' was zero, so we normalize the value to that of 'not-tuned' scheme's result. The normalized results are shown below.

            Not-tuned          Offline-tuned      AFDAA
    RSS     0.622688178226118  0.787950678944904  0.740093483278979
    runtime 1.11767826657912   1.0564674983585    1.0910833880499
    PSI     1                  0.727521443794069  0.308498846350299

The 'not-tuned' scheme achieves about 37.7% memory saving but incurs about 11.7% runtime slowdown.
The 'offline-tuned' scheme achieves about 21.2% memory saving with about 5.6% runtime slowdown. It also achieves about 27.2% memory pressure stall time saving. AFDAA achieves about 26% memory saving with about 9.1% runtime slowdown. It also achieves about 69.1% memory pressure stall time saving. We repeat this test multiple times, and get consistent results. AFDAA is now integrated in our daily DAMON performance test setup.

Apparently the aggressiveness of 'AFDAA' setup is somewhere between those of 'not-tuned' and 'offline-tuned' setup, since its memory saving and runtime overhead are between those of the other two setups. Actually we set the memory pressure stall time goal aiming for this middle aggressiveness. The difference in the two metrics is not significant, though. However, it shows significant saving of the memory pressure stall time, which was the goal of the auto-tuning, over the two variants. Hence, we conclude the automatic tuning is working as expected.

Please note that the AFDAA setup is only for the evaluation, and therefore intentionally set a bit aggressive. It might not be appropriate for production environments.

The test code is also available[2], so you could reproduce it on your system and workloads.

Patches Sequence
================

The first four patches implement the core logic and user interfaces for the auto tuning. The first patch implements the core logic for the auto tuning, and the API for DAMOS users in the kernel space. The second patch implements basic file operations of DAMON sysfs directories and files that will be used for setting the goals and providing the feedback. The third patch connects the quota goals files inputs to the DAMOS core logic. Finally the fourth patch implements a dedicated DAMOS sysfs command for efficiently committing the quota goals feedback.

Two patches for simple tests of the logic and interfaces follow. The fifth patch implements the core logic unit test.
The sixth patch implements a selftest for the DAMON Sysfs interface for the goals.

Finally, three patches for documentation follow. The seventh patch documents the design of the feature. The eighth patch updates the API doc for the new sysfs files. The final ninth patch updates the usage document for the features.

References
==========

[1] DAOS paper: https://www.amazon.science/publications/daos-data-access-aware-operating-system
[2] Evaluation code: `3f884e6119`
[3] Memory auto scaling RFC idea: https://lore.kernel.org/damon/20231112195114.61474-1-sj@kernel.org/
[4] DAMON-based tiered memory management RFC idea: https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org/

This patch (of 9):

Users can effectively control the upper-limit aggressiveness of DAMOS schemes using the quota feature. The quota provides the best result under the limit by prioritizing regions based on the access pattern. That said, finding the best value, which could depend on dynamic characteristics of the system and the workloads, is still challenging.

Implement a simple feedback-driven tuning mechanism and use it for automatic tuning of DAMOS quota. The implementation allows users to provide the feedback by setting a feedback score returning callback function. Then DAMOS periodically calls the function back and adjusts the quota based on the return value of the callback and current quota value.

Note that the absolute-value based time/size quotas still work as the maximum hard limits of the scheme's aggressiveness. The feedback-driven auto-tuned quota is applied only if it is not exceeding the manually set maximum limits. Same for the scheme-target access pattern and filters like other features.

[sj@kernel.org: document get_score_arg field of struct damos_quota]
  Link: https://lkml.kernel.org/r/20231204170106.60992-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20231130023652.50284-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20231130023652.50284-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2023-12-12 10:57:03 -08:00
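
The feedback loop quoted in the commit message above, f(n) = max(1, f(n - 1) * ((target_score - current_score) / target_score + 1)), can be illustrated with a small user-space sketch. This is not the kernel's C implementation. The function names `next_aggressiveness` and `effective_quota`, the use of floating-point arithmetic, and the toy workload in which the score equals quota / 20 are all assumptions made for illustration only.

```python
from typing import Optional


def next_aggressiveness(prev: float, target_score: float,
                        current_score: float) -> float:
    """One step of the loop from the commit message:
    f(n) = max(1, f(n - 1) * ((target - current) / target + 1)).
    """
    if target_score <= 0:
        raise ValueError("target_score must be positive")
    # Relative distance to the goal; negative when over-achieving.
    compensation_ratio = (target_score - current_score) / target_score
    # Never let the aggressiveness fall below 1.
    return max(1.0, prev * (compensation_ratio + 1.0))


def effective_quota(auto_tuned: float,
                    hard_limit: Optional[float]) -> float:
    """Hypothetical helper: the manually set quota stays the hard maximum,
    so the auto-tuned value is used only while it does not exceed it."""
    if hard_limit is None:
        return auto_tuned
    return min(auto_tuned, hard_limit)


if __name__ == "__main__":
    # Toy model (assumption): the score grows linearly with the quota,
    # which satisfies the "positively proportional" requirement.
    quota = 100.0
    target = 10.0
    for step in range(6):
        score = quota / 20.0
        quota = next_aggressiveness(quota, target, score)
        print(step, round(score, 4), round(quota, 4))
```

In this toy model the steps are large while the score is far from the target and shrink as it approaches, which matches the distance-adaptive behaviour the commit message describes. A metric that falls as the quota rises would drive the loop in the wrong direction, which is why the positive-proportionality assumption matters.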
..
damon	mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning	2023-12-12 10:57:03 -08:00
kasan	kasan: record and report more information	2023-12-10 16:51:55 -08:00
kfence	LoongArch changes for v6.6	2023-09-08 12:16:52 -07:00
kmsan	kmsan: use stack_depot_save instead of __stack_depot_save	2023-12-10 16:51:46 -08:00
backing-dev.c	writeback: remove redundant checks for root memcg	2023-08-21 13:37:48 -07:00
balloon_compaction.c
bootmem_info.c	bootmem: use kmemleak_free_part_phys in put_page_bootmem	2023-10-25 16:47:13 -07:00
cma_debug.c
cma_sysfs.c	mm: cma: make kobj_type structure constant	2023-03-28 16:20:06 -07:00
cma.c	mm/cma: use nth_page() in place of direct struct page manipulation	2023-10-04 10:32:29 -07:00
cma.h
compaction.c	mm/compaction: factor out code to test if we should run compaction for target order	2023-10-04 10:32:19 -07:00
debug_page_alloc.c	mm: page_alloc: split out DEBUG_PAGEALLOC	2023-06-09 16:25:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c	mm: fix multiple typos in multiple files	2023-10-25 16:47:14 -07:00
debug.c	mm: update validate_mm() to use vma iterator	2023-06-09 16:25:31 -07:00
dmapool_test.c	dmapool: add alloc/free performance test	2023-04-05 19:42:38 -07:00
dmapool.c	dmapool: create/destroy cleanup	2023-06-09 16:25:17 -07:00
early_ioremap.c	mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup()	2023-06-09 16:25:56 -07:00
fadvise.c	mm: remove unnecessary pagevec includes	2023-06-23 16:59:31 -07:00
fail_page_alloc.c	mm: page_alloc: split out FAIL_PAGE_ALLOC	2023-06-09 16:25:23 -07:00
failslab.c	mm: fix unexpected changes to {failslab|fail_page_alloc}.attr	2022-11-22 18:50:44 -08:00
filemap.c	mm/filemap: increase usage of folio_next_index() helper	2023-12-10 16:51:35 -08:00
folio-compat.c	mm: return void from folio_start_writeback() and related functions	2023-12-10 16:51:37 -08:00
gup_test.c	Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.	2023-06-23 16:58:19 -07:00
gup_test.h	mm/gup_test: start/stop/read functionality for PIN LONGTERM test	2022-11-08 17:37:15 -08:00
gup.c	mm/gup: fix follow_devmap_p[mu]d() on page==NULL handling	2023-12-10 16:51:52 -08:00
highmem.c	mm: ptep_get() conversion	2023-06-19 16:19:25 -07:00
hmm.c	mm: enable page walking API to lock vmas during the walk	2023-08-21 13:07:20 -07:00
huge_memory.c	mm: huge_memory: batch tlb flush when splitting a pte-mapped THP	2023-12-10 16:51:34 -08:00
hugetlb_cgroup.c	mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER	2023-10-18 14:34:17 -07:00
hugetlb_vmemmap.c	mm: hugetlb_vmemmap: convert page to folio	2023-12-10 16:51:54 -08:00
hugetlb_vmemmap.h	mm: hugetlb_vmemmap: fix reference to nonexistent file	2023-10-25 16:47:14 -07:00
hugetlb.c	hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write	2023-12-06 16:12:43 -08:00
hwpoison-inject.c
init-mm.c	mm: move dummy_vm_ops out of a header	2023-08-21 13:37:46 -07:00
internal.h	maple_tree: separate ma_state node from status	2023-12-12 10:56:58 -08:00
interval_tree.c
io-mapping.c
ioremap.c	mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed	2023-08-18 10:12:36 -07:00
Kconfig	zswap: shrink zswap pool based on memory pressure	2023-12-12 10:57:02 -08:00
Kconfig.debug	mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM	2023-05-29 16:14:28 +01:00
khugepaged.c	As usual, lots of singleton and doubleton patches all over the tree and	2023-11-02 20:53:31 -10:00
kmemleak.c	kmemleak: add checksum to backtrace report	2023-12-10 16:51:43 -08:00
ksm.c	mm/ksm: use kmap_local_page() in calc_checksum()	2023-12-10 16:51:49 -08:00
list_lru.c	list_lru: allow explicit memcg and NUMA node selection	2023-12-12 10:57:01 -08:00
maccess.c	mm: Fix copy_from_user_nofault().	2023-04-12 17:36:23 -07:00
madvise.c	mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()	2023-12-06 16:12:50 -08:00
Makefile	mm: vmscan: move shrinker-related code into a separate file	2023-10-04 10:32:23 -07:00
mapping_dirty_helpers.c	mm: fix clean_record_shared_mapping_range kernel-doc	2023-08-24 16:20:30 -07:00
memblock.c	NUMA: optimize detection of memory with no node id assigned by firmware	2023-12-10 16:51:34 -08:00
memcontrol.c	mm: memcg: add per-memcg zswap writeback stat	2023-12-12 10:57:02 -08:00
memfd.c	memfd: drop warning for missing exec-related flags	2023-10-04 10:32:22 -07:00
memory_hotplug.c	mm/memory_hotplug: split memmap_on_memory requests across memblocks	2023-12-10 16:51:34 -08:00
memory-failure.c	fs: convert error_remove_page to error_remove_folio	2023-12-10 16:51:42 -08:00
memory-tiers.c	dax, kmem: calculate abstract distance with general interface	2023-10-16 15:44:39 -07:00
memory.c	mm/memory: use kmap_local_page() in __wp_page_copy_user()	2023-12-10 16:51:49 -08:00
mempolicy.c	Many singleton patches against the MM code. The patch series which are	2023-11-02 19:38:47 -10:00
mempool.c	mm/mempool: replace kmap_atomic() with kmap_local_page()	2023-12-10 16:51:49 -08:00
memremap.c	mm: use vmem_altmap code without CONFIG_ZONE_DEVICE	2023-12-10 16:51:48 -08:00
memtest.c	mm: memtest: convert to memtest_report_meminfo()	2023-08-21 13:37:47 -07:00
migrate_device.c	Add x86 shadow stack support	2023-08-31 12:20:12 -07:00
migrate.c	mm: migrate: record the mlocked page status to remove unnecessary lru drain	2023-10-25 16:47:14 -07:00
mincore.c	mm: enable page walking API to lock vmas during the walk	2023-08-21 13:07:20 -07:00
mlock.c	mm: mlock: avoid folio_within_range() on KSM pages	2023-10-25 16:47:14 -07:00
mm_init.c	mm/mm_init.c: append newline to the unavailable ranges log-message	2023-12-10 16:51:51 -08:00
mm_slot.h
mmap_lock.c
mmap.c	mmap: remove the IA64-specific vma expansion implementation	2023-12-10 16:51:39 -08:00
mmu_gather.c	mm: fix kernel-doc warning from tlb_flush_rmaps()	2023-08-24 16:20:30 -07:00
mmu_notifier.c	mmu_notifiers: rename invalidate_range notifier	2023-08-18 10:12:41 -07:00
mmzone.c	zswap: shrink zswap pool based on memory pressure	2023-12-12 10:57:02 -08:00
mprotect.c	mm: mprotect: use a folio in change_pte_range()	2023-10-25 16:47:12 -07:00
mremap.c	mm: abstract VMA merge and extend into vma_merge_extend() helper	2023-10-18 14:34:18 -07:00
msync.c
nommu.c	Many singleton patches against the MM code. The patch series which are	2023-11-02 19:38:47 -10:00
oom_kill.c	mm, oom:dump_tasks add rss detailed information printing	2023-12-10 16:51:53 -08:00
page_alloc.c	mm: page_alloc: unreserve highatomic page blocks before oom	2023-12-10 16:51:52 -08:00
page_counter.c
page_ext.c	mm/page_ext: move functions around for minor cleanups to page_ext	2023-08-18 10:12:31 -07:00
page_idle.c	mm: page_idle: convert page idle to use a folio	2023-01-18 17:12:52 -08:00
page_io.c	mm: memcg: add THP swap out info for anonymous reclaim	2023-10-04 10:32:27 -07:00
page_isolation.c	mm/hugetlb: get rid of page_hstate()	2023-08-18 10:12:39 -07:00
page_owner.c	mm/page_owner: record and dump free_pid and free_tgid	2023-12-10 16:51:40 -08:00
page_poison.c	mm/page_poison: replace kmap_atomic() with kmap_local_page()	2023-12-10 16:51:50 -08:00
page_reporting.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c	mm: convert page_table_check_pte_set() to page_table_check_ptes_set()	2023-08-24 16:20:18 -07:00
page_vma_mapped.c	mm: correct stale comment of function check_pte	2023-08-18 10:12:13 -07:00
page-writeback.c	mm: return void from folio_start_writeback() and related functions	2023-12-10 16:51:37 -08:00
pagewalk.c	mm: pagewalk: assert write mmap lock only for walking the user page tables	2023-12-10 16:51:53 -08:00
percpu-internal.h	percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing	2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c	Many singleton patches against the MM code. The patch series which are	2023-11-02 19:38:47 -10:00
pgalloc-track.h
pgtable-generic.c	mm/pgtable: notes on pte_offset_map[_lock]()	2023-08-18 10:12:25 -07:00
process_vm_access.c	mm: fix process_vm_rw page counts	2023-12-10 16:51:39 -08:00
ptdump.c	mm: ptdump should use ptep_get_lockless()	2023-06-19 16:19:24 -07:00
readahead.c	vfs: fix readahead(2) on block devices	2023-10-19 11:02:49 +02:00
rmap.c	mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()	2023-10-18 14:34:14 -07:00
rodata_test.c
secretmem.c	mm/secretmem: use a folio in secretmem_fault()	2023-08-21 13:38:02 -07:00
shmem_quota.c	shmem: Add default quota limit mount options	2023-08-09 09:15:40 +02:00
shmem.c	fs: convert error_remove_page to error_remove_folio	2023-12-10 16:51:42 -08:00
show_mem.c	mm: refactor si_mem_available()	2023-10-04 10:32:19 -07:00
shrinker_debug.c	mm: shrinker: convert shrinker_rwsem to mutex	2023-10-04 10:32:26 -07:00
shrinker.c	mm: shrinker: convert shrinker_rwsem to mutex	2023-10-04 10:32:26 -07:00
shuffle.c
shuffle.h	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
slab_common.c	RCU pull request for v6.7	2023-10-30 18:01:41 -10:00
slab.c	Randomized slab caches for kmalloc()	2023-07-18 10:07:47 +02:00
slab.h	mm: kmem: scoped objcg protection	2023-10-25 16:47:11 -07:00
slub.c	slub, kasan: improve interaction of KASAN and slub_debug poisoning	2023-12-10 16:51:48 -08:00
sparse-vmemmap.c	mm/vmemmap: allow architectures to override how vmemmap optimization works	2023-08-18 10:12:53 -07:00
sparse.c	mm/sparse: remove redundant judgments from macro for_each_present_section_nr	2023-08-18 10:12:14 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c	zswap: shrink zswap pool based on memory pressure	2023-12-12 10:57:02 -08:00
swap.c	mm: remove references to pagevec	2023-06-23 16:59:30 -07:00
swap.h	zswap: make shrinking memcg-aware	2023-12-12 10:57:01 -08:00
swapfile.c	mm/swapfile: replace kmap_atomic() with kmap_local_page()	2023-12-10 16:51:53 -08:00
truncate.c	fs: convert error_remove_page to error_remove_folio	2023-12-10 16:51:42 -08:00
usercopy.c	mm: Fix copy_from_user_nofault().	2023-04-12 17:36:23 -07:00
userfaultfd.c	mm: more ptep_get() conversion	2023-11-15 15:30:09 -08:00
util.c	mm/util: use kmap_local_page() in memcmp_pages()	2023-12-10 16:51:49 -08:00
vmalloc.c	mm/vmalloc: fix the unchecked dereference warning in vread_iter()	2023-11-01 12:38:35 -07:00
vmpressure.c	net-memcg: Fix scope of sockmem pressure indicators	2023-08-16 12:21:32 +01:00
vmscan.c	mm/vmstat: move pgdemote_* to per-node stats	2023-12-10 16:51:31 -08:00
vmstat.c	mm: memcg: add per-memcg zswap writeback stat	2023-12-12 10:57:02 -08:00
workingset.c	list_lru: allow explicit memcg and NUMA node selection	2023-12-12 10:57:01 -08:00
z3fold.c	mm/z3fold: remove obsolete comment for struct z3fold_pool	2023-08-21 13:37:51 -07:00
zbud.c	mm: zswap: remove shrink from zpool interface	2023-06-19 16:19:27 -07:00
zpool.c	mm: zswap: remove shrink from zpool interface	2023-06-19 16:19:27 -07:00
zsmalloc.c	zsmalloc: use copy_page for full page copy	2023-10-18 14:34:16 -07:00
zswap.c	zswap: shrink zswap pool based on memory pressure	2023-12-12 10:57:02 -08:00