Commit Graph

2412 Commits

Author SHA1 Message Date
Yinghai Lu
d49c428840 x86: make generic arch support NUMAQ
... so it could fall back to normal numa and we'd reduce the impact of the
NUMAQ subarch.

NUMAQ depends on GENERICARCH
also decouple genericarch numa from acpi.
also make it fall back to bigsmp if apicid > 8.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:34:42 +02:00
Yinghai Lu
e0da336468 x86: introduce max_physical_apicid for bigsmp switching
a multi-socket test-system with 3 or 4 ioapics, when 4 dualcore cpus or
2 quadcore cpus installed, needs to switch to bigsmp or physflat.

CPU apic id is [4,11] instead of [0,7], and we need to check max apic
id instead of cpu numbers.

also add check for 32 bit when acpi is not compiled in or acpi=off.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:32:09 +02:00
Yinghai Lu
c3ff01672a x86: fix boot failure with 64GB+ system with numa 32-bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:59 +02:00
Yinghai Lu
9043f00796 x86, numa, 32-bit: use find_e820_area() to find KVA RAM on node
don't assume we can use RAM near the end of every node.
Esp systems that have few memory and they could have
kva address and kva RAM all below max_low_pfn.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:52 +02:00
Yinghai Lu
cc1a9d86ce mm, x86: shrink_active_range() should check all
Now we are using register_e820_active_regions() instead of
add_active_range() directly. So end_pfn could be different between the
value in early_node_map to node_end_pfn.

So we need to make shrink_active_range() smarter.

shrink_active_range() is a generic MM function in mm/page_alloc.c but
it is only used on 32-bit x86. Should we move it back to some file in
arch/x86?

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:44 +02:00
Yinghai Lu
db3660c190 x86: remove all active memory ranges before registering them again after trimming - 64bit
this way we keep the early_node_map all right.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:30:47 +02:00
Ingo Molnar
2377494285 Revert "x86, 32-bit: SRAT fix"
This reverts commit ea57a5a6db, a better
fix will be merged.
2008-06-09 10:57:16 +02:00
Ingo Molnar
ea57a5a6db x86, 32-bit: SRAT fix
we were adding reserved BIOS ranges as general memory as well => not good.

solves this crash:

[   20.068075] hostname used greatest stack depth: 6464 bytes left
[   20.121404] BUG: unable to handle kernel <1>BUG: unable to handle kernel NULL pointer dereference at 00000b8c
[   20.121404] IP: [<c01160ae>] kmap_atomic_prot+0x2d/0x1c3
[   20.121404] *pdpt = 00000000367eb001 *pde = 0000000000000000
[   20.121404] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   20.121404]
[   20.121404] Pid: 2061, comm: rc.sysinit Not tainted (2.6.26-rc3 #2440)
[   20.121404] EIP: 0060:[<c01160ae>] EFLAGS: 00010206 CPU: 0
[   20.121404] EIP is at kmap_atomic_prot+0x2d/0x1c3

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-06 16:38:35 +02:00
Huang, Ying
c45a707dbe x86: linked list of setup_data for i386
This patch adds linked list of struct setup_data supported for i386.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
0c51a965ed x86: extract common part of head32.c and head64.c into head.c
This patch extracts the common part of head32.c and head64.c into head.c.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
ecacf09f7d x86: reserve EFI memory map with reserve_early
This patch reserves the EFI memory map with reserve_early(). Because EFI
memory map is allocated by bootloader, if it is not reserved by
reserved_early(), it may be overwritten through address returned by
find_e820_area().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
d0ec2c6f2c x86: reserve highmem pages via reserve_early
This patch makes early reserved highmem pages become reserved
pages. This can be used for highmem pages allocated by bootloader such
as EFI memory map, linked list of setup_data, etc.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
d3fbe5ea95 x86: split out common code into find_overlapped_early()
This patch clean up reserve_early() family functions by extracting the
common part of reserve_early(), free_early() and bad_addr() into
find_overlapped_early().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:01 +02:00
Yinghai Lu
bd70e522af x86: e820 max_arch_pfn typo fix for 64 bit
should use right shift

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 23:15:33 +02:00
Yinghai Lu
7b2a0a6c48 x86: make 32-bit use e820_register_active_regions()
this way 32-bit is more similar to 64-bit, and smarter e820 and numa.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:58 +02:00
Yinghai Lu
ee0c80fadf x86: move e820_register_active() to e820.c
to prepare 32-bit to use it.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:57 +02:00
Yinghai Lu
84b56fa46b x86, numa, 32-bit: make sure get we kva space
when 1/3 user/kernel split is used, and less memory is installed, or if
we have a big hole below 4g, max_low_pfn is still using 3g-128m

try to go down from max_low_pfn until we get it. otherwise will panic.

need to make 32-bit code to use register_e820_active_regions ... later.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:55 +02:00
Yinghai Lu
ab530e1f78 x86: early check if a system is numaq
so we could fall back to one node numa.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:06 +02:00
Yinghai Lu
f19dc2f22a x86: change propagate_e820_map() back to find_max_pfn(), 32-bit, fix
add memory_present() calls for sparse and non-numa.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:00:38 +02:00
Yinghai Lu
c8c034ce79 x86: clean up max_pfn_mapped usage - 64-bit
on 64-bit we only get valid max_pfn_mapped after init_memory_mapping().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:29 +02:00
Yinghai Lu
6af61a7614 x86: clean up max_pfn_mapped usage - 32-bit
on 32-bit in head_32.S after initial page table is done, we get initial
max_pfn_mapped, and then kernel_physical_mapping_init will give us
a final one.

We need to use that to make sure find_e820_area will get valid addresses
for boot_map and for NODE_DATA(0) on numa32.

XEN PV and lguest may need to assign max_pfn_mapped too.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
Yinghai Lu
287572cb38 x86, numa, 32-bit: avoid clash between ramdisk and kva
use find_e820_area to get address space...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
Yinghai Lu
2944e16b25 x86: update mptable
make mptable to be consistent with acpi routing, so we could:

1. kexec kernel with acpi=off
2. work around BIOSes where acpi routing is working, but mptable is
   not right, so can use kernel/kexec to start other OSes that don't have
   good acpi support.

command line: update_mptable

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:27 +02:00
Yinghai Lu
e8c27ac919 x86, numa, 32-bit: print out debug info on all kvas
also fix the print out of node_remap_end_vaddr

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:26 +02:00
Yinghai Lu
0596152388 x86, 32-bit: change propagate_e820_map() back to find_max_pfn()
we don't need to call memory_present that early.
numa and sparse will call memory_present later and might
even fail, it will call memory_present for the full range.

also for sparse it will call alloc_bootmem ... before we set up bootmem.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu
b66cd72073 x86: set node_remap_size[0] in fallback path
... otherwise alloc_remap will not get node_mem_map from kva area, and
alloc_node_mem_map has to alloc_bootmem_node to get mem_map.
It will use two low address copies ...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu
ba924c81dd x86, numa, 32-bit: increase max_elements to 1024
so every element will represent 64M instead of 256M.

AMD opteron could have HW memory hole remapping, so could have
[0, 8g + 64M) on node0. Reduce element size to 64M to keep that on node 0

Later we need to use find_e820_area() to allocate memory_node_map like
on 64-bit. But need to move memory_present out of populate_mem_map...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:24 +02:00
Yinghai Lu
9a73aa81ff x86: 32bit numa srat fix early_ioremap leak
on two node system (16g RAM) with numa config I got this crash:

get_memcfg_from_srat: assigning address to rsdp
RSD PTR  v0 [ACPIAM]
ACPI: Too big length in RSDT: 92
failed to get NUMA memory information from SRAT table
NUMA - single node, flat memory mode
Node: 0, start_pfn: 0, end_pfn: 153
 Setting physnode_map array to node 0 for pfns:
 0
...
Pid: 0, comm: swapper Not tainted 2.6.26-rc4 #4
 [<80b41289>] hlt_loop+0x0/0x3
 [<8011efa0>] ? alloc_remap+0x50/0x70
 [<8079e32e>] alloc_node_mem_map+0x5e/0xa0
 [<8012e77b>] ? printk+0x1b/0x20
 [<80b590f6>] free_area_init_node+0xc6/0x470
 [<80b588fc>] ? __alloc_bootmem_node+0x2c/0x50
 [<80b58ad8>] ? find_min_pfn_for_node+0x38/0x70
 [<8012e77b>] ? printk+0x1b/0x20
 [<80b597c4>] free_area_init_nodes+0x254/0x2d0
 [<80b544d7>] zone_sizes_init+0x97/0xa0
 [<80b48a03>] setup_arch+0x383/0x530
 [<8012e77b>] ? printk+0x1b/0x20
 [<80b41aa4>] start_kernel+0x64/0x350
 [<80b412d8>] i386_start_kernel+0x8/0x10
 =======================

this patch increases the acpi table limit to 32.
Also match early_ioremap() with early_iounmap().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:56 +02:00
Yinghai Lu
a5481280b2 x86: extend e820 early_res support 32bit -fix #5
reserve early numa kva, so it will not clash with new RAMDISK

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:53 +02:00
Yinghai Lu
163872950d x86: extend e820 early_res support 32bit -fix #4
reserve_early pgdata for 32bit numa

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:50 +02:00
Yinghai Lu
f0d43100f1 x86: extend e820 early_res support 32bit -fix #3
introduce init_pg_table_start, so xen PV could specify the value.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:47 +02:00
Yinghai Lu
3945e2c9ab x86: extend e820 ealy_res support 32bit - fix #2
remove extra -1 in reseve_early calling
    panic if can not find space for new RAMDISK

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:20:12 +02:00
Thomas Gleixner
85cc35fa72 x86: fix mpparse fallout
UP builds with LOCAL_APIC=y and IO_APIC=n fail with a missing
reference to mp_bus_not_pci. Distangle the mpparse code some more and
move the ioapic specific bus check into a separate function.

This code needs sume urgent un#ifdef surgery all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 21:21:36 +02:00
Alexey Starikovskiy
136ef671df x86: allow MPPARSE to be deselected in SMP configs 2008-05-25 12:01:26 +02:00
Alexey Starikovskiy
8732fc4b23 x86: move mp_bus_not_pci from mpparse.c 2008-05-25 12:01:26 +02:00
Alexey Starikovskiy
ce6444d39f x86: mp_bus_id_to_pci_bus is not needed 2008-05-25 12:01:25 +02:00
Alexey Starikovskiy
bab4b27c00 x86: move smp_found_config 2008-05-25 12:01:25 +02:00
Alexey Starikovskiy
f391835290 x86: move pic_mode to apic_32.c 2008-05-25 12:01:25 +02:00
Alexey Starikovskiy
b3e2416465 x86: Set pic_mode only if local apic code is present 2008-05-25 12:01:25 +02:00
Yinghai Lu
bf62f3981c x86: move e820_mark_nosave_regions to e820.c
and make e820_mark_nosave_regions to take limit_pfn to use max_low_pfn
for 32bit and end_pfn for 64bit

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 11:35:53 +02:00
Alexey Starikovskiy
aafbdf71f1 x86: fix mpparse/acpi interaction
Sitsofe Wheeler reported boot problems on linux-next.

It looks like the same issue as found by Soeren Sandman in 7575217f656a93,
"x86: initialize all fields of mp_irqs[mp_irq_entries]".

But his fix is also not complete, as dstapic is used before it assigned.

Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:13 +02:00
Soeren Sandmann
59f4519ad7 x86: initialize all fields of mp_irqs[mp_irq_entries]
Commit "x86: make config_irqsrc not MPspec specific" introduced some uses
of uninitialized fields in mp_config_acpi_legacy_irqs(). I need the
following patch to get sched-devel/master to boot.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:13 +02:00
Alexey Starikovskiy
2fddb6e28e x86: make config_irqsrc not MPspec specific
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:13 +02:00
Alexey Starikovskiy
ec2cd0a22e x86: make struct config_ioapic not MPspec specific
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:13 +02:00
Alexey Starikovskiy
5f8951487d x86: make mp_ioapic_routing definition local
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Alexey Starikovskiy
11113f84c7 x86: complete move ACPI from mpparse.c
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Alexey Starikovskiy
32c5061265 x86: move es7000_plat out of mpparse.c
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Yinghai Lu
11a62a0560 x86: cleanup print out for mptable
the new output is:

 MPTABLE: OEM ID: SUN
 MPTABLE: Product ID: 4600 M2
 MPTABLE: APIC at: 0x

instead of it all in one line with <6> and double Product ID...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:12 +02:00
Thomas Gleixner
4a139a7fde x86: include pci.h in e820_64.c
global pci_mem_start needs a declaration. include pci.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Thomas Gleixner
a91eea6df3 x86: fix shadow variables of global end_pnf in e820_64.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00