linux/mm
Michal Hocko 688eb988d1 vmscan: memcg: always use swappiness of the reclaimed memcg
Memory reclaim always uses swappiness of the reclaim target memcg
(origin of the memory pressure) or vm_swappiness for global memory
reclaim.  This behavior was consistent (except for difference between
global and hard limit reclaim) because swappiness was enforced to be
consistent within each memcg hierarchy.

After "mm: memcontrol: remove hierarchy restrictions for swappiness and
oom_control" each memcg can have its own swappiness independent of
hierarchical parents, though, so the consistency guarantee is gone.
This can lead to an unexpected behavior.  Say that a group is explicitly
configured to not swapout by memory.swappiness=0 but its memory gets
swapped out anyway when the memory pressure comes from its parent with a
It is also unexpected that the knob is meaningless without setting the
hard limit which would trigger the reclaim and enforce the swappiness.
There are setups where the hard limit is configured higher in the
hierarchy by an administrator and children groups are under control of
somebody else who is interested in the swapout behavior but not
necessarily about the memory limit.

From a semantic point of view swappiness is an attribute defining anon
vs.
 file proportional scanning of LRU which is memcg specific (unlike
charges which are propagated up the hierarchy) so it should be applied
to the particular memcg's LRU regardless where the memory pressure comes
from.

This patch removes vmscan_swappiness() and stores the swappiness into
the scan_control structure.  mem_cgroup_swappiness is then used to
provide the correct value before shrink_lruvec is called.  The global
vm_swappiness is used for the root memcg.

[hughd@google.com: oopses immediately when booted with cgroup_disable=memory]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:17 -07:00
..
backing-dev.c arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
balloon_compaction.c mm: print more details for bad_page() 2014-01-23 16:36:50 -08:00
bootmem.c mm/bootmem.c: remove unused local `map' 2013-11-13 12:09:09 +09:00
cleancache.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
compaction.c mm, compaction: properly signal and act upon lock and need_sched() contention 2014-06-04 16:54:11 -07:00
debug-pagealloc.c
dmapool.c mm/dmapool.c: reuse devres_release() to free resources 2014-06-04 16:54:08 -07:00
early_ioremap.c mm: create generic early_ioremap() support 2014-04-07 16:36:15 -07:00
fadvise.c
failslab.c
filemap_xip.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
filemap.c mm: avoid unnecessary atomic operations during end_page_writeback() 2014-06-04 16:54:10 -07:00
fremap.c mm: softdirty: make freshly remapped file pages being softdirty unconditionally 2014-06-04 16:53:56 -07:00
frontswap.c swap: change swap_list_head to plist, add swap_avail_head 2014-06-04 16:54:07 -07:00
gup.c mm: cleanup __get_user_pages() 2014-06-04 16:54:05 -07:00
highmem.c
huge_memory.c mm/huge_memory.c: complete conversion to pr_foo() 2014-06-04 16:53:58 -07:00
hugetlb_cgroup.c cgroup: drop const from @buffer of cftype->write_string() 2014-03-19 10:23:54 -04:00
hugetlb.c hugetlb: rename hugepage_migration_support() to ..._supported() 2014-06-04 16:54:12 -07:00
hwpoison-inject.c mm/hwpoison: add '#' to hwpoison_inject 2014-01-21 16:19:48 -08:00
init-mm.c
internal.h mm, compaction: properly signal and act upon lock and need_sched() contention 2014-06-04 16:54:11 -07:00
interval_tree.c
iov_iter.c take iov_iter stuff to mm/iov_iter.c 2014-04-01 23:19:30 -04:00
Kconfig mm/zsmalloc: make zsmalloc module-buildable 2014-06-04 16:54:14 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mem-hotplug: implement get/put_online_mems 2014-06-04 16:53:59 -07:00
ksm.c mm: close PageTail race 2014-03-04 07:55:47 -08:00
list_lru.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
maccess.c
madvise.c mm: madvise: fix MADV_WILLNEED on shmem swapouts 2014-05-23 09:37:29 -07:00
Makefile mm: move get_user_pages()-related code to separate file 2014-06-04 16:54:04 -07:00
memblock.c mm/memblock.c: use PFN_DOWN 2014-06-04 16:54:02 -07:00
memcontrol.c vmscan: memcg: always use swappiness of the reclaimed memcg 2014-06-06 16:08:17 -07:00
memory_hotplug.c mm, migration: add destination page freeing callback 2014-06-04 16:54:06 -07:00
memory-failure.c mm/memory-failure.c: support use of a dedicated thread to handle SIGBUS(BUS_MCEERR_AO) 2014-06-04 16:54:13 -07:00
memory.c mm: document do_fault_around() feature 2014-06-04 16:54:12 -07:00
mempolicy.c mm, migration: add destination page freeing callback 2014-06-04 16:54:06 -07:00
mempool.c mm/mempool: warn about __GFP_ZERO usage 2014-06-04 16:53:58 -07:00
migrate.c hugetlb: rename hugepage_migration_support() to ..._supported() 2014-06-04 16:54:12 -07:00
mincore.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
mlock.c mm: try_to_unmap_cluster() should lock_page() before mlocking 2014-04-07 16:35:57 -07:00
mm_init.c mm: bring back /sys/kernel/mm 2014-01-27 21:02:39 -08:00
mmap.c Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-05 08:05:29 -07:00
mmu_context.c sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm 2014-02-21 08:50:17 +01:00
mmu_notifier.c mm: audit/fix non-modular users of module_init in core code 2014-01-23 16:36:52 -08:00
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c mm: move mmu notifier call from change_protection to change_pmd_range 2014-04-07 16:35:50 -07:00
mremap.c mm, thp: close race between mremap() and split_huge_page() 2014-05-11 17:55:48 +09:00
msync.c mm/msync.c: sync only the requested range in msync() 2014-06-04 16:54:11 -07:00
nobootmem.c mm/nobootmem.c: mark function as static 2014-04-03 16:21:02 -07:00
nommu.c mm: fix 'ERROR: do not initialise globals to 0 or NULL' and coding style 2014-04-07 16:35:55 -07:00
oom_kill.c mm, oom: base root bonus on current usage 2014-01-30 16:56:56 -08:00
page_alloc.c mm: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
page_cgroup.c mm/page_cgroup.c: mark functions as static 2014-04-03 16:21:02 -07:00
page_io.c swap: use bdev_read_page() / bdev_write_page() 2014-06-04 16:54:02 -07:00
page_isolation.c mm: memory-hotplug: enable memory hotplug to handle hugepage 2013-09-11 15:57:48 -07:00
page-writeback.c mm: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c
percpu.c percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree() 2014-04-14 16:18:06 -04:00
pgtable-generic.c mm: fix TLB flush race between migration, and change_protection_range 2013-12-18 19:04:51 -08:00
process_vm_access.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
quicklist.c
readahead.c mm/readahead.c: inline ra_submit 2014-04-07 16:35:58 -07:00
rmap.c mm/rmap.c: cleanup ttu_flags 2014-06-04 16:54:12 -07:00
shmem.c mm: non-atomically mark page accessed during page cache allocation where possible 2014-06-04 16:54:10 -07:00
slab_common.c slab: delete cache from list after __kmem_cache_shutdown succeeds 2014-06-04 16:54:08 -07:00
slab.c memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab 2014-06-04 16:54:01 -07:00
slab.h memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab 2014-06-04 16:54:01 -07:00
slob.c slab: get_online_mems for kmem_cache_{create,destroy,shrink} 2014-06-04 16:53:59 -07:00
slub.c slub: search partial list on numa_mem_id(), instead of numa_node_id() 2014-06-06 16:08:06 -07:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c mm: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:35:54 -07:00
swap_state.c mm: page_alloc: convert hot/cold parameter and immediate callers to bool 2014-06-04 16:54:09 -07:00
swap.c mm: non-atomically mark page accessed during page cache allocation where possible 2014-06-04 16:54:10 -07:00
swapfile.c mm/swapfile.c: delete the "last_in_cluster < scan_base" loop in the body of scan_swap_map() 2014-06-04 16:54:12 -07:00
truncate.c mm: filemap: update find_get_pages_tag() to deal with shadow entries 2014-05-06 13:04:59 -07:00
util.c nick kvfree() from apparmor 2014-05-06 14:02:53 -04:00
vmacache.c mm,vmacache: optimize overflow system-wide flushing 2014-06-04 16:53:57 -07:00
vmalloc.c mm/vmalloc.c: export unmap_kernel_range() 2014-06-04 16:54:14 -07:00
vmpressure.c arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
vmscan.c vmscan: memcg: always use swappiness of the reclaimed memcg 2014-06-06 16:08:17 -07:00
vmstat.c mm: use the light version __mod_zone_page_state in mlocked_vma_newpage() 2014-06-04 16:54:07 -07:00
workingset.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
zbud.c mm/zbud.c: make size unsigned like unique callsite 2014-06-04 16:54:13 -07:00
zsmalloc.c zsmalloc: fixup trivial zs size classes value in comments 2014-06-04 16:54:13 -07:00
zswap.c mm/zswap: NUMA aware allocation for zswap_dstmem 2014-06-04 16:54:14 -07:00