mirror of
https://github.com/torvalds/linux.git
synced 2024-12-27 05:11:48 +00:00
894bc31041
When the system contains lots of mlocked or otherwise unevictable pages, the pageout code (kswapd) can spend lots of time scanning over these pages. Worse still, the presence of lots of unevictable pages can confuse kswapd into thinking that more aggressive pageout modes are required, resulting in all kinds of bad behaviour. Infrastructure to manage pages excluded from reclaim--i.e., hidden from vmscan. Based on a patch by Larry Woodman of Red Hat. Reworked to maintain "unevictable" pages on a separate per-zone LRU list, to "hide" them from vmscan. Kosaki Motohiro added the support for the memory controller unevictable lru list. Pages on the unevictable list have both PG_unevictable and PG_lru set. Thus, PG_unevictable is analogous to and mutually exclusive with PG_active--it specifies which LRU list the page is on. The unevictable infrastructure is enabled by a new mm Kconfig option [CONFIG_]UNEVICTABLE_LRU. A new function 'page_evictable(page, vma)' in vmscan.c tests whether or not a page may be evictable. Subsequent patches will add the various !evictable tests. We'll want to keep these tests light-weight for use in shrink_active_list() and, possibly, the fault path. To avoid races between tasks putting pages [back] onto an LRU list and tasks that might be moving the page from non-evictable to evictable state, the new function 'putback_lru_page()' -- inverse to 'isolate_lru_page()' -- tests the "evictability" of a page after placing it on the LRU, before dropping the reference. If the page has become unevictable, putback_lru_page() will redo the 'putback', thus moving the page to the unevictable list. This way, we avoid "stranding" evictable pages on the unevictable list. [akpm@linux-foundation.org: fix fallout from out-of-order merge] [riel@redhat.com: fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build] [nishimura@mxp.nes.nec.co.jp: remove redundant mapping check] [kosaki.motohiro@jp.fujitsu.com: unevictable-lru-infrastructure: putback_lru_page()/unevictable page handling rework] [kosaki.motohiro@jp.fujitsu.com: kill unnecessary lock_page() in vmscan.c] [kosaki.motohiro@jp.fujitsu.com: revert migration change of unevictable lru infrastructure] [kosaki.motohiro@jp.fujitsu.com: revert to unevictable-lru-infrastructure-kconfig-fix.patch] [kosaki.motohiro@jp.fujitsu.com: restore patch failure of vmstat-unevictable-and-mlocked-pages-vm-events.patch] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Debugged-by: Benjamin Kidwell <benjkidwell@yahoo.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
225 lines
6.7 KiB
Plaintext
225 lines
6.7 KiB
Plaintext
config SELECT_MEMORY_MODEL
|
|
def_bool y
|
|
depends on EXPERIMENTAL || ARCH_SELECT_MEMORY_MODEL
|
|
|
|
choice
|
|
prompt "Memory model"
|
|
depends on SELECT_MEMORY_MODEL
|
|
default DISCONTIGMEM_MANUAL if ARCH_DISCONTIGMEM_DEFAULT
|
|
default SPARSEMEM_MANUAL if ARCH_SPARSEMEM_DEFAULT
|
|
default FLATMEM_MANUAL
|
|
|
|
config FLATMEM_MANUAL
|
|
bool "Flat Memory"
|
|
depends on !(ARCH_DISCONTIGMEM_ENABLE || ARCH_SPARSEMEM_ENABLE) || ARCH_FLATMEM_ENABLE
|
|
help
|
|
This option allows you to change some of the ways that
|
|
Linux manages its memory internally. Most users will
|
|
only have one option here: FLATMEM. This is normal
|
|
and a correct option.
|
|
|
|
Some users of more advanced features like NUMA and
|
|
memory hotplug may have different options here.
|
|
DISCONTIGMEM is an more mature, better tested system,
|
|
but is incompatible with memory hotplug and may suffer
|
|
decreased performance over SPARSEMEM. If unsure between
|
|
"Sparse Memory" and "Discontiguous Memory", choose
|
|
"Discontiguous Memory".
|
|
|
|
If unsure, choose this option (Flat Memory) over any other.
|
|
|
|
config DISCONTIGMEM_MANUAL
|
|
bool "Discontiguous Memory"
|
|
depends on ARCH_DISCONTIGMEM_ENABLE
|
|
help
|
|
This option provides enhanced support for discontiguous
|
|
memory systems, over FLATMEM. These systems have holes
|
|
in their physical address spaces, and this option provides
|
|
more efficient handling of these holes. However, the vast
|
|
majority of hardware has quite flat address spaces, and
|
|
can have degraded performance from the extra overhead that
|
|
this option imposes.
|
|
|
|
Many NUMA configurations will have this as the only option.
|
|
|
|
If unsure, choose "Flat Memory" over this option.
|
|
|
|
config SPARSEMEM_MANUAL
|
|
bool "Sparse Memory"
|
|
depends on ARCH_SPARSEMEM_ENABLE
|
|
help
|
|
This will be the only option for some systems, including
|
|
memory hotplug systems. This is normal.
|
|
|
|
For many other systems, this will be an alternative to
|
|
"Discontiguous Memory". This option provides some potential
|
|
performance benefits, along with decreased code complexity,
|
|
but it is newer, and more experimental.
|
|
|
|
If unsure, choose "Discontiguous Memory" or "Flat Memory"
|
|
over this option.
|
|
|
|
endchoice
|
|
|
|
config DISCONTIGMEM
|
|
def_bool y
|
|
depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) || DISCONTIGMEM_MANUAL
|
|
|
|
config SPARSEMEM
|
|
def_bool y
|
|
depends on SPARSEMEM_MANUAL
|
|
|
|
config FLATMEM
|
|
def_bool y
|
|
depends on (!DISCONTIGMEM && !SPARSEMEM) || FLATMEM_MANUAL
|
|
|
|
config FLAT_NODE_MEM_MAP
|
|
def_bool y
|
|
depends on !SPARSEMEM
|
|
|
|
#
|
|
# Both the NUMA code and DISCONTIGMEM use arrays of pg_data_t's
|
|
# to represent different areas of memory. This variable allows
|
|
# those dependencies to exist individually.
|
|
#
|
|
config NEED_MULTIPLE_NODES
|
|
def_bool y
|
|
depends on DISCONTIGMEM || NUMA
|
|
|
|
config HAVE_MEMORY_PRESENT
|
|
def_bool y
|
|
depends on ARCH_HAVE_MEMORY_PRESENT || SPARSEMEM
|
|
|
|
#
|
|
# SPARSEMEM_EXTREME (which is the default) does some bootmem
|
|
# allocations when memory_present() is called. If this cannot
|
|
# be done on your architecture, select this option. However,
|
|
# statically allocating the mem_section[] array can potentially
|
|
# consume vast quantities of .bss, so be careful.
|
|
#
|
|
# This option will also potentially produce smaller runtime code
|
|
# with gcc 3.4 and later.
|
|
#
|
|
config SPARSEMEM_STATIC
|
|
bool
|
|
|
|
#
|
|
# Architecture platforms which require a two level mem_section in SPARSEMEM
|
|
# must select this option. This is usually for architecture platforms with
|
|
# an extremely sparse physical address space.
|
|
#
|
|
config SPARSEMEM_EXTREME
|
|
def_bool y
|
|
depends on SPARSEMEM && !SPARSEMEM_STATIC
|
|
|
|
config SPARSEMEM_VMEMMAP_ENABLE
|
|
bool
|
|
|
|
config SPARSEMEM_VMEMMAP
|
|
bool "Sparse Memory virtual memmap"
|
|
depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE
|
|
default y
|
|
help
|
|
SPARSEMEM_VMEMMAP uses a virtually mapped memmap to optimise
|
|
pfn_to_page and page_to_pfn operations. This is the most
|
|
efficient option when sufficient kernel resources are available.
|
|
|
|
# eventually, we can have this option just 'select SPARSEMEM'
|
|
config MEMORY_HOTPLUG
|
|
bool "Allow for memory hot-add"
|
|
depends on SPARSEMEM || X86_64_ACPI_NUMA
|
|
depends on HOTPLUG && !HIBERNATION && ARCH_ENABLE_MEMORY_HOTPLUG
|
|
depends on (IA64 || X86 || PPC64 || SUPERH || S390)
|
|
|
|
comment "Memory hotplug is currently incompatible with Software Suspend"
|
|
depends on SPARSEMEM && HOTPLUG && HIBERNATION
|
|
|
|
config MEMORY_HOTPLUG_SPARSE
|
|
def_bool y
|
|
depends on SPARSEMEM && MEMORY_HOTPLUG
|
|
|
|
config MEMORY_HOTREMOVE
|
|
bool "Allow for memory hot remove"
|
|
depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
|
|
depends on MIGRATION
|
|
|
|
#
|
|
# If we have space for more page flags then we can enable additional
|
|
# optimizations and functionality.
|
|
#
|
|
# Regular Sparsemem takes page flag bits for the sectionid if it does not
|
|
# use a virtual memmap. Disable extended page flags for 32 bit platforms
|
|
# that require the use of a sectionid in the page flags.
|
|
#
|
|
config PAGEFLAGS_EXTENDED
|
|
def_bool y
|
|
depends on 64BIT || SPARSEMEM_VMEMMAP || !NUMA || !SPARSEMEM
|
|
|
|
# Heavily threaded applications may benefit from splitting the mm-wide
|
|
# page_table_lock, so that faults on different parts of the user address
|
|
# space can be handled with less contention: split it at this NR_CPUS.
|
|
# Default to 4 for wider testing, though 8 might be more appropriate.
|
|
# ARM's adjust_pte (unused if VIPT) depends on mm-wide page_table_lock.
|
|
# PA-RISC 7xxx's spinlock_t would enlarge struct page from 32 to 44 bytes.
|
|
#
|
|
config SPLIT_PTLOCK_CPUS
|
|
int
|
|
default "4096" if ARM && !CPU_CACHE_VIPT
|
|
default "4096" if PARISC && !PA20
|
|
default "4"
|
|
|
|
#
|
|
# support for page migration
|
|
#
|
|
config MIGRATION
|
|
bool "Page migration"
|
|
def_bool y
|
|
depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE
|
|
help
|
|
Allows the migration of the physical location of pages of processes
|
|
while the virtual addresses are not changed. This is useful for
|
|
example on NUMA systems to put pages nearer to the processors accessing
|
|
the page.
|
|
|
|
config RESOURCES_64BIT
|
|
bool "64 bit Memory and IO resources (EXPERIMENTAL)" if (!64BIT && EXPERIMENTAL)
|
|
default 64BIT
|
|
help
|
|
This option allows memory and IO resources to be 64 bit.
|
|
|
|
config PHYS_ADDR_T_64BIT
|
|
def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
|
|
|
|
config ZONE_DMA_FLAG
|
|
int
|
|
default "0" if !ZONE_DMA
|
|
default "1"
|
|
|
|
config BOUNCE
|
|
def_bool y
|
|
depends on BLOCK && MMU && (ZONE_DMA || HIGHMEM)
|
|
|
|
config NR_QUICK
|
|
int
|
|
depends on QUICKLIST
|
|
default "2" if SUPERH || AVR32
|
|
default "1"
|
|
|
|
config VIRT_TO_BUS
|
|
def_bool y
|
|
depends on !ARCH_NO_VIRT_TO_BUS
|
|
|
|
config UNEVICTABLE_LRU
|
|
bool "Add LRU list to track non-evictable pages"
|
|
default y
|
|
depends on MMU
|
|
help
|
|
Keeps unevictable pages off of the active and inactive pageout
|
|
lists, so kswapd will not waste CPU time or have its balancing
|
|
algorithms thrown off by scanning these pages. Selecting this
|
|
will use one page flag and increase the code size a little,
|
|
say Y unless you know what you are doing.
|
|
|
|
config MMU_NOTIFIER
|
|
bool
|