linux/arch/x86/mm
Tejun Heo 1e01979c8f x86, numa: Implement pfn -> nid mapping granularity check
SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use
a sections array to map pfn to nid, which limits the granularity of
that mapping.  If NUMA nodes are laid out such that the mapping cannot
be accurate, boot fails, triggering the BUG_ON() in
mminit_verify_page_links().
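
To illustrate the limitation: a section-granular pfn -> nid map boils
down to one nid slot per aligned block of PAGES_PER_SECTION page
frames, roughly like the sketch below (the names are made up for the
example; the in-kernel data structures are different).

  #define SECTION_PFN_SHIFT 17            /* e.g. 512MiB sections w/ 4KiB pages */
  #define PAGES_PER_SECTION (1UL << SECTION_PFN_SHIFT)

  static int section_to_nid[1024];        /* one nid per section */

  static int pfn_to_nid(unsigned long pfn)
  {
          /* Every pfn inside a section resolves to the same nid, so a
           * node boundary that is not section-aligned cannot be
           * described: its section would need two nids at once. */
          return section_to_nid[pfn >> SECTION_PFN_SHIFT];
  }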

On 32bit the granularity is 512MiB w/ PAE and SPARSEMEM.  This seems
to have been granular enough until commit 2706a0bf7b (x86, NUMA:
Enable CONFIG_AMD_NUMA on 32bit too).  Apparently there is a machine
which aligns NUMA nodes to 128MiB and has only AMD NUMA but no SRAT.
This led to the following BUG_ON().

 On node 0 totalpages: 2096615
   DMA zone: 32 pages used for memmap
   DMA zone: 0 pages reserved
   DMA zone: 3927 pages, LIFO batch:0
   Normal zone: 1740 pages used for memmap
   Normal zone: 220978 pages, LIFO batch:31
   HighMem zone: 16405 pages used for memmap
   HighMem zone: 1853533 pages, LIFO batch:31
 BUG: Int 6: CR2   (null)
      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
 Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
 Call Trace:
  [<c136b1e5>] ? early_fault+0x2e/0x2e
  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
  [<c1620613>] ? memmap_init_zone+0xaf/0x10c
  [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
  [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
  [<c1601d80>] ? paging_init+0x112/0x118
  [<c15f578d>] ? setup_arch+0x791/0x82f
  [<c15f43d9>] ? start_kernel+0x6a/0x257
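
For reference, the numbers at play on that machine, assuming 4KiB
pages (a back-of-the-envelope sketch, not kernel code):

  #include <stdio.h>

  int main(void)
  {
          unsigned long page_size         = 4UL << 10;                  /* 4KiB */
          unsigned long pages_per_section = (512UL << 20) / page_size;  /* 512MiB section = 131072 pages */
          unsigned long node_align        = (128UL << 20) / page_size;  /* 128MiB alignment = 32768 pages */

          /* 32768 < 131072: a node boundary can land in the middle of a
           * section, which would then need two nids at once. */
          printf("pages/section=%lu, node alignment=%lu pages -> %s\n",
                 pages_per_section, node_align,
                 node_align < pages_per_section ? "not representable" : "ok");
          return 0;
  }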

This patch implements node_map_pfn_alignment(), which determines the
maximum internode alignment, and updates numa_register_memblks() to
reject the NUMA configuration if that alignment is finer than the
pfn -> nid mapping granularity of the memory model, as determined by
PAGES_PER_SECTION.
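
The idea behind the check, as a rough standalone sketch (this is not
the in-kernel implementation; struct node_range and the helper below
are made up for illustration):

  #include <stdio.h>

  /* One node's memory range, in page frames. */
  struct node_range {
          unsigned long start_pfn;
          unsigned long end_pfn;          /* exclusive */
          int nid;
  };

  /*
   * Coarsest power-of-two block size (in pages) at which every
   * internode boundary is still representable, i.e. no aligned block
   * contains pages from two different nodes.  If this is smaller than
   * PAGES_PER_SECTION, the section-granular pfn -> nid map cannot
   * describe the layout and the NUMA config has to be rejected.
   */
  static unsigned long max_internode_alignment(const struct node_range *r, int n)
  {
          unsigned long align = ~0UL;     /* no internode boundary seen yet */
          int i;

          for (i = 1; i < n; i++) {
                  unsigned long a = 1;

                  if (r[i].nid == r[i - 1].nid)
                          continue;
                  /*
                   * Grow the block size while an aligned block holding
                   * this node's first pfn still stays clear of the
                   * previous node's last pfn.
                   */
                  while (a < (1UL << 30) &&
                         (r[i].start_pfn & ~(2 * a - 1)) >= r[i - 1].end_pfn)
                          a *= 2;
                  if (a < align)
                          align = a;
          }
          return align;
  }

  int main(void)
  {
          /* Two nodes back to back on a 128MiB boundary, 4KiB pages. */
          struct node_range map[] = {
                  {     0, 32768, 0 },    /* node 0:   0 - 128MiB */
                  { 32768, 65536, 1 },    /* node 1: 128 - 256MiB */
          };
          unsigned long pages_per_section = 1UL << 17;    /* 512MiB sections */
          unsigned long align = max_internode_alignment(map, 2);

          if (align < pages_per_section)
                  printf("alignment %lu < %lu pages/section, rejecting NUMA config\n",
                         align, pages_per_section);
          return 0;
  }

numa_register_memblks() then makes the equivalent comparison against
PAGES_PER_SECTION and rejects the layout before the section map is
ever asked to represent it.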

This makes the problematic machine boot w/ flatmem by rejecting its
NUMA config, and guards against similarly pathological NUMA
configurations in general.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org
LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 21:58:29 -07:00
kmemcheck x86: Eliminate bp argument from the stack tracing routines 2010-11-18 14:37:34 +01:00
amdtopology.c x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too 2011-05-02 17:24:48 +02:00
dump_pagetables.c x86, mm: Create symbolic index into address_markers array 2010-07-20 16:56:19 -07:00
extable.c x86, 64-bit: Move K8 B step iret fixup to fault entry asm 2009-10-12 18:29:46 +02:00
fault.c x86: Move do_page_fault()'s error path under unlikely() 2011-05-26 13:54:03 +02:00
gup.c thp: mmu_notifier_test_young 2011-01-13 17:32:46 -08:00
highmem_32.c mm: fix race in kunmap_atomic() 2010-10-27 18:03:05 -07:00
hugetlbpage.c mm: Convert i_mmap_lock to a mutex 2011-05-25 08:39:18 -07:00
init_32.c x86, mm: Allow ZONE_DMA to be configurable 2011-05-16 14:03:28 -07:00
init_64.c x86, mm: Allow ZONE_DMA to be configurable 2011-05-16 14:03:28 -07:00
init.c mm: now that all old mmu_gather code is gone, remove the storage 2011-05-25 08:39:16 -07:00
iomap_32.c mm: fix race in kunmap_atomic() 2010-10-27 18:03:05 -07:00
ioremap.c ioremap: Delay sanity check until after a successful mapping 2011-04-29 08:02:47 +02:00
kmmio.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
Makefile x86, NUMA: Rename amdtopology_64.c to amdtopology.c 2011-05-02 17:24:48 +02:00
memblock.c x86, efi: Do not reserve boot services regions within reserved areas 2011-06-18 22:48:49 +02:00
memtest.c x86, memblock: Replace e820_/_early string with memblock_ 2010-08-27 11:13:47 -07:00
mmap.c x86: Use helpers for rlimits 2010-01-27 15:17:31 -08:00
mmio-mod.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
numa_32.c x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ 2011-07-12 21:58:11 -07:00
numa_64.c x86, NUMA: Move NUMA init logic from numa_64.c to numa.c 2011-05-02 14:18:53 +02:00
numa_emulation.c x86, NUMA: Enable emulation on 32bit too 2011-05-02 17:24:48 +02:00
numa_internal.h x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() 2011-05-02 14:18:54 +02:00
numa.c x86, numa: Implement pfn -> nid mapping granularity check 2011-07-12 21:58:29 -07:00
pageattr-test.c
pageattr.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
pat_internal.h x86, pat: Fix memory leak in free_memtype 2010-05-26 11:26:04 -07:00
pat_rbtree.c rbtree: Undo augmented trees performance damage and regression 2010-07-05 14:43:50 +02:00
pat.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-08-06 10:17:52 -07:00
pf_in.c x86: Eliminate various 'set but not used' warnings 2011-05-21 19:10:33 +02:00
pf_in.h
pgtable_32.c x86: remove last traces of quicklist usage 2010-05-24 13:33:31 -07:00
pgtable.c x86: Flush TLB if PGD entry is changed in i386 PAE mode 2011-03-18 11:44:01 +01:00
physaddr.c x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
physaddr.h x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
setup_nx.c x86, cpu: Only CPU features determine NX capabilities 2010-11-10 15:43:15 -08:00
srat.c x86, NUMA: make srat.c 32bit safe 2011-05-02 14:18:52 +02:00
testmmiotrace.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
tlb.c x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others() 2011-03-15 08:30:34 +01:00