linux/arch/x86/mm
Tang Chen b959ed6c73 x86/mem-hotplug: support initialize page tables in bottom-up
The Linux kernel cannot migrate pages used by the kernel.  As a result,
kernel pages cannot be hot-removed.  So we cannot allocate hotpluggable
memory for the kernel.

In a memory hotplug system, any NUMA node the kernel resides in should be
unhotpluggable.  On a modern server, each node may well have 16GB of
memory or more, so the memory around the kernel image is very likely to
be unhotpluggable.

ACPI SRAT (System Resource Affinity Table) contains the memory hotplug
info.  But before SRAT is parsed, memblock has already started to allocate
memory for the kernel.  So we need to prevent memblock from handing out
hotpluggable memory to the kernel during that window.

Setting up the direct memory mapping page tables is one such case:
init_mem_mapping() is called before SRAT is parsed.  To prevent page
tables from being allocated within hotpluggable memory, we allocate them
in the bottom-up direction, from the end of the kernel image towards
higher memory.
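
As a rough illustration of the bottom-up direction described above, the
following user-space C sketch plans a walk of a physical range in
ascending address order, starting just past a kernel image.  It is not the
kernel's code: plan_bottom_up(), struct chunk, the initial step and the
growth shift are all hypothetical names and values chosen for this
example.  The idea it tries to show is that each chunk is handled only
after the chunks below it, so tables for a new chunk could come from
memory that already lies close to the kernel image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One planned mapping step, covering the half-open range [start, end). */
struct chunk {
    uint64_t start;
    uint64_t end;
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical planner: split [map_start, map_end) into chunks in
 * ascending address order.  The first chunk ends at the next multiple of
 * 'step'; after every chunk the step grows by 'shift' bits, so later
 * chunks cover more memory.  Returns the number of chunks written to out.
 */
static size_t plan_bottom_up(uint64_t map_start, uint64_t map_end,
                             uint64_t step, unsigned int shift,
                             struct chunk *out, size_t max)
{
    size_t n = 0;
    uint64_t start = map_start;

    assert(step != 0 && (step & (step - 1)) == 0);
    assert(shift < 64);

    while (start < map_end && n < max) {
        uint64_t next = round_up_pow2(start + 1, step);

        /* Clamp to the end of the range (also covers wrap-around). */
        if (next > map_end || next <= start)
            next = map_end;

        out[n].start = start;
        out[n].end = next;
        n++;

        start = next;
        /* Grow the step unless that would overflow 64 bits. */
        if (step <= (UINT64_MAX >> shift))
            step <<= shift;
    }
    return n;
}
```

With an assumed range from 16MB to 1GB, an initial step of 2MB and a
shift of 5, this sketch produces three chunks: [16MB, 18MB),
[18MB, 64MB) and [64MB, 1GB).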

Note:
As for allocating page tables in lower memory, TJ said:

: This is an optional behavior which is triggered by a very specific kernel
: boot param, which I suspect is gonna need to stick around to support
: memory hotplug in the current setup unless we add another layer of address
: translation to support memory hotplug.

As for the concern that page tables may occupy too much lower memory when
4K mapping is used (CONFIG_DEBUG_PAGEALLOC and CONFIG_KMEMCHECK both
disable the use of >4k pages), TJ said:

: But as I said in the same paragraph, parsing SRAT earlier doesn't solve
: the problem in itself either.  Ignoring the option if 4k mapping is
: required and memory consumption would be prohibitive should work, no?
: Something like that would be necessary if we're gonna worry about cases
: like this no matter how we implement it, but, frankly, I'm not sure this
: is something worth worrying about.
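
To put a number on the 4K mapping concern, the following C sketch
estimates how much memory the lowest-level page tables take for a given
amount of mapped memory.  It assumes the x86-64 layout of 4KB table pages
holding 512 eight-byte entries, counts only the lowest level (the upper
levels are much smaller), and uses hypothetical names (leaf_table_bytes(),
div_round_up()) that do not come from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512ULL   /* 4096-byte table / 8-byte entries */
#define TABLE_BYTES       4096ULL  /* one table occupies one 4KB page */

/* Integer division rounding up; d must be non-zero. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/*
 * Bytes of lowest-level page tables needed to map mem_bytes of memory
 * using pages of page_bytes each (4KB, 2MB or 1GB on x86-64).
 */
static uint64_t leaf_table_bytes(uint64_t mem_bytes, uint64_t page_bytes)
{
    uint64_t entries, tables;

    assert(page_bytes != 0);

    entries = div_round_up(mem_bytes, page_bytes);     /* one entry per page */
    tables = div_round_up(entries, ENTRIES_PER_TABLE); /* 512 entries each */
    return tables * TABLE_BYTES;
}
```

Under these assumptions, mapping one 16GB node with 4KB pages needs 32MB
of lowest-level tables, while mapping it with 2MB pages needs 64KB.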

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:08 +09:00
kmemcheck bug.h: add include of it to various implicit C users 2012-02-29 17:15:08 -05:00
amdtopology.c x86/mm/numa: Simplify some bit mangling 2013-04-10 19:06:26 +02:00
dump_pagetables.c
extable.c x86, extable: Switch to relative exception table entries 2012-04-20 17:22:34 -07:00
fault.c perf/x86: Further optimize copy_from_user_nmi() 2013-10-29 12:02:54 +01:00
gup.c thp: add compound tail page _mapcount when mapped 2011-12-09 07:50:28 -08:00
highmem_32.c mm: accurately calculate zone->managed_pages for highmem zones 2013-07-03 16:07:33 -07:00
hugetlbpage.c mm: migrate: check movability of hugepage in unmap_and_move_huge_page() 2013-09-11 15:57:49 -07:00
init_32.c mm/x86: prepare for removing num_physpages and simplify mem_init() 2013-07-03 16:07:38 -07:00
init_64.c mm/x86: prepare for removing num_physpages and simplify mem_init() 2013-07-03 16:07:38 -07:00
init.c x86/mem-hotplug: support initialize page tables in bottom-up 2013-11-13 12:09:08 +09:00
iomap_32.c
ioremap.c mm: Remove unused variable idx0 in __early_ioremap() 2013-08-13 11:46:36 +02:00
kmmio.c
Makefile
memtest.c x86/memtest: Shorten time for tests 2013-02-18 09:28:42 +01:00
mm_internal.h x86, mm: Move after_bootmem to mm_internel.h 2012-11-17 11:59:45 -08:00
mmap.c x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member 2013-08-22 10:19:35 -07:00
mmio-mod.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
numa_32.c mm/x86: prepare for removing num_physpages and simplify mem_init() 2013-07-03 16:07:38 -07:00
numa_64.c x86, mm: kill numa_free_all_bootmem() 2012-11-17 11:59:47 -08:00
numa_emulation.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
numa_internal.h x86-32, mm: Rip out x86_32 NUMA remapping code 2013-01-31 14:12:30 -08:00
numa.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
pageattr-test.c x86: rename random32() to prandom_u32() 2013-04-29 18:28:42 -07:00
pageattr.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-04-30 08:40:35 -07:00
pat_internal.h
pat_rbtree.c rbtree: move augmented rbtree functionality to rbtree_augmented.h 2012-10-09 16:22:40 +09:00
pat.c x86: Do not try to sync identity map for non-mapped pages 2013-03-07 13:23:28 -08:00
pf_in.c
pf_in.h
pgtable_32.c Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
pgtable.c mm/pgtable: don't accumulate addr during pgd prepopulate pmd 2013-07-09 10:33:23 -07:00
physaddr.c x86, mm: Make DEBUG_VIRTUAL work earlier in boot 2013-01-25 16:33:22 -08:00
physaddr.h
setup_nx.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
srat.c ACPI / x86: Print Hot-Pluggable Field in SRAT. 2013-08-14 23:24:01 +02:00
testmmiotrace.c
tlb.c mm: vmstats: track TLB flush stats on UP too 2013-09-11 15:57:09 -07:00