linux/arch
Tang Chen b959ed6c73 x86/mem-hotplug: support initialize page tables in bottom-up
The Linux kernel cannot migrate pages used by the kernel.  As a result,
kernel pages cannot be hot-removed.  So we cannot allocate hotpluggable
memory for the kernel.

In a memory hotplug system, any NUMA node the kernel resides in should be
unhotpluggable.  On a modern server, each node may have 16GB of memory or
more, so the memory around the kernel image is very likely to be
unhotpluggable.

ACPI SRAT (System Resource Affinity Table) contains the memory hotplug
info.  But before SRAT is parsed, memblock has already started to allocate
memory for the kernel.  So we need to prevent memblock from handing out
hotpluggable memory to the kernel during that window.

Setting up the direct memory mapping page tables is one such case:
init_mem_mapping() is called before SRAT is parsed.  To prevent page
tables from being allocated within hotpluggable memory, we allocate them
in the bottom-up direction, from the end of the kernel image toward
higher memory.
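
The bottom-up direction described above can be illustrated with a small
userspace sketch.  This is not the kernel's memblock code: struct
free_range, align_up() and alloc_bottom_up() are hypothetical names made
up for this example, and the free ranges are assumed to be sorted by
start address.  It only shows the idea of picking the lowest free address
at or above the end of the kernel image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free physical range [start, end); not a kernel structure. */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bottom-up search: return the lowest aligned address at or above
 * kernel_end where size bytes fit inside a free range, or 0 on failure.
 * The ranges array must be sorted by ascending start address.
 */
static uint64_t alloc_bottom_up(const struct free_range *ranges, size_t n,
				uint64_t kernel_end, uint64_t size,
				uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		/* Never go below the end of the kernel image. */
		uint64_t base = ranges[i].start > kernel_end ?
				ranges[i].start : kernel_end;

		base = align_up(base, align);
		if (base < ranges[i].end && ranges[i].end - base >= size)
			return base;
	}
	return 0;
}
```

Because the search starts right after the kernel image, the result stays
close to the kernel, which is the memory most likely to be unhotpluggable.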

Note:
As for allocating page tables in lower memory, TJ said:

: This is an optional behavior which is triggered by a very specific kernel
: boot param, which I suspect is gonna need to stick around to support
: memory hotplug in the current setup unless we add another layer of address
: translation to support memory hotplug.

As for the concern that page tables may occupy too much lower memory when
4K mappings are used (CONFIG_DEBUG_PAGEALLOC and CONFIG_KMEMCHECK both
disable the use of pages larger than 4K), TJ said:

: But as I said in the same paragraph, parsing SRAT earlier doesn't solve
: the problem in itself either.  Ignoring the option if 4k mapping is
: required and memory consumption would be prohibitive should work, no?
: Something like that would be necessary if we're gonna worry about cases
: like this no matter how we implement it, but, frankly, I'm not sure this
: is something worth worrying about.
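
To put the 4K mapping concern in numbers, here is a rough estimate of
how much memory the direct mapping page tables need.  It is only a
sketch under stated assumptions: x86-64 4-level paging with 512 entries
per 4K table, the top-level table not counted, and a hypothetical helper
name page_table_bytes() that does not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_BYTES	4096ULL			/* size of one page table */
#define SZ_2M		(2ULL << 20)		/* covered by one PTE table */
#define SZ_1G		(1ULL << 30)		/* covered by one PMD table */
#define SZ_512G		(512ULL << 30)		/* covered by one PUD table */

static uint64_t div_round_up(uint64_t a, uint64_t b)
{
	return (a + b - 1) / b;
}

/*
 * Estimate the bytes of page tables needed to direct-map mem_bytes.
 * With use_4k the leaf entries live in PTE tables; otherwise 2M pages
 * are assumed and the PMD entries are the leaves, so no PTE tables
 * are needed.
 */
static uint64_t page_table_bytes(uint64_t mem_bytes, bool use_4k)
{
	uint64_t pte_tables = use_4k ? div_round_up(mem_bytes, SZ_2M) : 0;
	uint64_t pmd_tables = div_round_up(mem_bytes, SZ_1G);
	uint64_t pud_tables = div_round_up(mem_bytes, SZ_512G);

	return (pte_tables + pmd_tables + pud_tables) * TABLE_BYTES;
}
```

For a 16GB node this estimate gives about 32MB of page tables with 4K
mappings, against 68KB with 2M mappings.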

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:08 +09:00
alpha sched, arch: Create asm/preempt.h 2013-09-25 14:07:50 +02:00
arc DeviceTree updates for 3.13. This is a bit larger pull request than 2013-11-12 16:52:17 +09:00
arm mm/arch: use NUMA_NO_NODE 2013-11-13 12:09:05 +09:00
arm64 mm/arch: use NUMA_NO_NODE 2013-11-13 12:09:05 +09:00
avr32 Linux 3.12-rc4 2013-10-09 12:36:13 +02:00
blackfin Main pin control pull request for the v3.13 cycle: 2013-11-12 15:40:03 +09:00
c6x DeviceTree updates for 3.13. This is a bit larger pull request than 2013-11-12 16:52:17 +09:00
cris cris: media platform drivers: fix build 2013-11-13 12:08:59 +09:00
frv sched, arch: Create asm/preempt.h 2013-09-25 14:07:50 +02:00
hexagon DeviceTree updates for 3.13. This is a bit larger pull request than 2013-11-12 16:52:17 +09:00
ia64 mm: use pgdat_end_pfn() to simplify the code in arch 2013-11-13 12:09:03 +09:00
m32r sched, arch: Create asm/preempt.h 2013-09-25 14:07:50 +02:00
m68k Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 10:20:12 +09:00
metag mm: use pgdat_end_pfn() to simplify the code in arch 2013-11-13 12:09:03 +09:00
microblaze mm/arch: use __free_reserved_page() to simplify the code 2013-11-13 12:09:03 +09:00
mips DeviceTree updates for 3.13. This is a bit larger pull request than 2013-11-12 16:52:17 +09:00
mn10300 sched, arch: Create asm/preempt.h 2013-09-25 14:07:50 +02:00
openrisc DeviceTree updates for 3.13. This is a bit larger pull request than 2013-11-12 16:52:17 +09:00
parisc mm/arch: use NUMA_NO_NODE 2013-11-13 12:09:05 +09:00
powerpc mm: use pgdat_end_pfn() to simplify the code in arch 2013-11-13 12:09:03 +09:00
s390 s390/mmap: randomize mmap base for bottom up direction 2013-11-13 12:09:08 +09:00
score Linux 3.12-rc4 2013-10-09 12:36:13 +02:00
sh mm: use pgdat_end_pfn() to simplify the code in arch 2013-11-13 12:09:03 +09:00
sparc mm/arch: use NUMA_NO_NODE 2013-11-13 12:09:05 +09:00
tile Merge branch 'core/urgent' into sched/core 2013-10-11 07:39:37 +02:00
um Merge branch 'linus' into sched/core 2013-11-01 08:24:41 +01:00
unicore32 sched, arch: Create asm/preempt.h 2013-09-25 14:07:50 +02:00
x86 x86/mem-hotplug: support initialize page tables in bottom-up 2013-11-13 12:09:08 +09:00
xtensa DeviceTree updates for 3.13. This is a bit larger pull request than 2013-11-12 16:52:17 +09:00
.gitignore
Kconfig Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 10:36:00 +09:00