Allows emulating more interesting NUMA configurations like a quad
socket AMD Magny-Cour:
"numa=fake=8:10,16,16,22,16,22,16,22,
16,10,22,16,22,16,22,16,
16,22,10,16,16,22,16,22,
22,16,16,10,22,16,22,16,
16,22,16,22,10,16,16,22,
22,16,22,16,16,10,22,16,
16,22,16,22,16,22,10,16,
22,16,22,16,22,16,16,10"
Which has a non-fully-connected topology.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/n/tip-e1136ef7kdffj7yf9tjhydln@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 "urgent" leftovers from Ingo Molnar:
"Pending x86/urgent bits that were not high prio enough to warrant
-rc-less v3.3-final inclusion."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: Fix pointer math issue in handle_ramdisks()
x86/ioapic: Add register level checks to detect bogus io-apic entries
x86, mce: Fix rcu splat in drain_mce_log_buffer()
x86, memblock: Move mem_hole_size() to .init
Without this fix the cpumask_of_node() for a fake=numa=2 is:
cpumask 0 ff
cpumask 1 ff
with the fix it's correct and it's set to:
cpumask 0 55
cpumask 1 aa
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mem_hole_size() is being called only from __init-marked functions, and as
such should be moved to .init section as well. Fixes this warning:
WARNING: vmlinux.o(.text+0x35511): Section mismatch in reference from the function mem_hole_size() to the function .init.text:absent_pages_in_range()
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1202281614450.31150@pobox.suse.cz
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Other than sanity check and debug message, the x86 specific version of
memblock reserve/free functions are simple wrappers around the generic
versions - memblock_reserve/free().
This patch adds debug messages with caller identification to the
generic versions and replaces x86 specific ones and kills them.
arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty
after this change and removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
memblock_x86_hole_size() calculates the total size of holes in a given
range according to memblock and is used by numa emulation code and
numa_meminfo_cover_memory().
Since conversion to MEMBLOCK_NODE_MAP, absent_pages_in_range() also
uses memblock and gives the same result. This patch replaces
memblock_x86_hole_size() uses with absent_pages_in_range(). After the
conversion the x86 function doesn't have any user left and is killed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-12-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
25818f0f28 (memblock: Make MEMBLOCK_ERROR be 0) thankfully made
MEMBLOCK_ERROR 0 and there already are codes which expect error return
to be 0. There's no point in keeping MEMBLOCK_ERROR around. End its
misery.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310457490-3356-6-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Now that NUMA init path is unified, NUMA emulation can be enabled on
32bit. Make numa_emluation.c safe on 32bit by doing the followings.
* Define MAX_DMA32_PFN on 32bit too.
* Include bootmem.h for max_pfn declaration.
* Use u64 explicitly and always use PFN_PHYS() when converting page
number to address.
* Avoid __udivdi3() generation on 32bit by doing number of pages
calculation instead in split_nodes_interleave().
And drop X86_64 dependency from Kconfig.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y
when NUMA emulation is enabled is currently broken because it does
not iterate through every emulated node and bind cpus that have
affinity to it.
NUMA emulation should bind each cpu to every local node to
accurately represent the true NUMA topology of the underlying
machine.
debug_cpumask_set_cpu() needs to be fixed at the same time so
that the debugging information that it emits shows the new
cpumask of the node being assigned when the cpu is being added
or removed.
It can now take responsibility of setting or clearing the cpu
itself to remove the need for duplicate code.
Also change its last parameter, "enable", to have the correct bool
type since it can only be true or false.
-v2: Fix the return statements, by Kosaki Motohiro
Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The distance transforming in numa_emulation() used to call
numa_set_distance() for all MAX_NUMNODES * MAX_NUMNODES node
combinations regardless of which are enabled. As numa_set_distance()
ignores all out-of-bound distance settings, this doesn't cause any
problem other than looping unnecessarily many times during boot.
However, as MAX_NUMNODES * MAX_NUMNODES can be pretty high, update the
code such that it iterates through only the enabled combinations.
Yinghai Lu identified the issue and provided an initial patch to
address the issue; however, the patch was incorrect in that it didn't
build emulated distance table when there's no physical distance table
and unnecessarily complex.
http://thread.gmane.org/gmane.linux.kernel/1107986/focus=1107988
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Undetermined entries in emu_nid_to_phys[] are filled with zero
assuming that physical node 0 is always online; however, this might
not be true depending on hardware configuration. Find a physical node
which is actually online and use it instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1103020628210.31626@chino.kir.corp.google.com>
Handling of out-of-bounds distances and allocation failure can use
better documentation. Add it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
NUMA distance table handling has the following problems.
* numa_reset_distance() uses numa_distance * sizeof(numa_distance[0])
as the table size when it should be using the square of
numa_distance.
* The same size miscalculation when allocation space for phys_dist in
numa_emulation().
* In numa_emulation(), phys_dist must be reserved; otherwise, the new
emulated distance table may overlap it.
Fix them and, while at it, take numa_distance_cnt resetting in
numa_reset_distance() out of the if block to simplify the code a bit.
David Rientjes reported incorrect handling of distance table during
emulation.
-tj: Edited out numa_alloc_distance() related changes which weren't
necessary and rewrote patch description.
-v2: Ingo was unhappy with 80-column limit induced linebreaks. Let
lines run over 80-column.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: David Rientjes <rientjes@google.com>
Create numa_emulation.c and move all NUMA emulation code there. The
definitions of struct numa_memblk and numa_meminfo are moved to
numa_64.h. Also, numa_remove_memblk_from(), numa_cleanup_meminfo(),
numa_reset_distance() along with numa_emulation() are made global.
- v2: Internal declarations moved to numa_internal.h as suggested by
Yinghai.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>