linux/arch/x86/kernel
Lukasz Anaczkowski d81056b527 x86, ACPI: Handle apic/x2apic entries in MADT in correct order
ACPI specifies the following rules when listing APIC IDs:
(1) Boot processor is listed first
(2) For multi-threaded processors, BIOS should list the first logical
    processor of each of the individual multi-threaded processors in MADT
    before listing any of the second logical processors.
(3) APIC IDs < 0xFF should be listed in APIC subtable, APIC IDs >= 0xFF
    should be listed in X2APIC subtable

Because of above, when there's more than 0xFF logical CPUs, BIOS
interleaves APIC/X2APIC subtables.

Assuming, there's 72 cores, 72 hyper-threads each, 288 CPUs total,
listing is like this:

APIC (0,4,8, .., 252)
X2APIC (258,260,264, .. 284)
APIC (1,5,9,...,253)
X2APIC (259,261,265,...,285)
APIC (2,6,10,...,254)
X2APIC (260,262,266,..,286)
APIC (3,7,11,...,251)
X2APIC (255,261,262,266,..,287)

Now, before this patch, due to how ACPI MADT subtables were parsed (BSP
then X2APIC then APIC), kernel enumerated CPUs in reverted order (i.e.
high APIC IDs were getting low logical IDs, and low APIC IDs were
getting high logical IDs).
This is wrong for the following reasons:
() it's hard to predict how cores and threads are enumerated
() when it's hard to predict, s/w threads cannot be properly affinitized
   causing significant performance impact due to e.g. inproper cache
   sharing
() enumeration is inconsistent with how threads are enumerated on
   other Intel Xeon processors

So, order in which MADT APIC/X2APIC handlers are passed is
reverse and both handlers are passed to be called during same MADT
table to walk to achieve correct CPU enumeration.

In scenario when someone boots kernel with options 'maxcpus=72 nox2apic',
in result less cores may be booted, since some of the CPUs the kernel
will try to use will have APIC ID >= 0xFF. In such case, one
should not pass 'nox2apic'.

Disclimer: code parsing MADT APIC/X2APIC has not been touched since 2009,
when X2APIC support was initially added. I do not know why MADT parsing
code was added in the reversed order in the first place.
I guess it didn't matter at that time since nobody cared about cores
with APIC IDs >= 0xFF, right?

This patch is based on work of "Yinghai Lu <yinghai@kernel.org>"
previously published at https://lkml.org/lkml/2013/1/21/563

Here's the explanation why parsing interface needs to be changed
and why simpler approach will not work https://lkml.org/lkml/2015/9/7/285

Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de> (commit message)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15 01:31:06 +02:00
..
acpi x86, ACPI: Handle apic/x2apic entries in MADT in correct order 2015-10-15 01:31:06 +02:00
apic Merge branch 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2015-09-08 12:28:10 -07:00
cpu watchdog: rename watchdog_suspend() and watchdog_resume() 2015-09-04 16:54:41 -07:00
fpu x86/fpu/math-emu: Fix crash in fork() 2015-08-22 10:23:03 +02:00
kprobes Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-14 14:37:47 -07:00
.gitignore
alternative.c Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-06-22 17:59:09 -07:00
amd_gart_64.c
amd_nb.c x86/gart: Check for GART support before accessing GART registers 2015-05-06 11:15:53 +02:00
apb_timer.c x86/asm/tsc: Rename native_read_tsc() to rdtsc() 2015-07-06 15:23:28 +02:00
aperture_64.c x86/gart: Check for GART support before accessing GART registers 2015-05-06 11:15:53 +02:00
apm_32.c apm32: Fix cputime == jiffies assumption 2015-07-29 15:44:58 +02:00
asm-offsets_32.c x86: Remove unused TI_cpu 2015-05-05 20:48:02 +02:00
asm-offsets_64.c x86/asm/entry: (Re-)rename __NR_entry_INT80_compat_max to __NR_syscall_compat_max 2015-06-08 23:43:38 +02:00
asm-offsets.c x86: Merge common 32-bit values in asm-offsets.c 2015-05-05 20:48:02 +02:00
audit_64.c
bootflag.c x86: don't use module_init for non-modular core bootflag code 2015-06-16 14:12:34 -04:00
check.c Linux 4.2-rc8 2015-08-25 09:59:19 +02:00
cpuid.c x86: Drop bogus __ref / __refdata annotations 2015-07-20 18:57:20 +02:00
crash_dump_32.c
crash_dump_64.c
crash.c x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h> 2015-06-03 12:02:00 +02:00
devicetree.c Replace module_init with equivalent device_initcall in non modules. 2015-07-02 10:30:48 -07:00
doublefault.c
dumpstack_32.c Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 13:23:34 -07:00
dumpstack_64.c
dumpstack.c Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 13:23:34 -07:00
e820.c The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 2015-06-29 10:34:42 -07:00
early_printk.c x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8' 2015-07-06 17:33:47 +02:00
early-quirks.c Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux 2015-06-26 13:18:51 -07:00
espfix_64.c Merge branch 'x86/urgent' into x86/asm, before applying dependent patches 2015-07-31 10:23:35 +02:00
ftrace.c
head32.c
head64.c x86/kasan: Fix KASAN shadow region page tables 2015-07-06 14:53:13 +02:00
head_32.S Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-06-22 17:59:09 -07:00
head_64.S x86/kasan: Fix KASAN shadow region page tables 2015-07-06 14:53:13 +02:00
head.c
hpet.c Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 15:42:28 -07:00
hw_breakpoint.c perf/x86/hw_breakpoints: Fix check for kernel-space breakpoints 2015-08-04 10:16:55 +02:00
i386_ksyms_32.c preempt: Use preempt_schedule_context() as the official tracing preemption point 2015-06-07 15:57:42 +02:00
i8237.c
i8253.c clockevents/drivers/i8253: Migrate to new 'set-state' interface 2015-08-10 11:40:30 +02:00
i8259.c x86/asm/entry/irq: Clean up IRQn_VECTOR macros 2015-05-10 12:34:28 +02:00
io_delay.c
ioport.c x86/asm/entry: Rename 'init_tss' to 'cpu_tss' 2015-03-06 08:32:58 +01:00
irq_32.c x86/irq: Do not dereference irq descriptor before checking it 2015-08-28 10:30:15 +02:00
irq_64.c x86/irq: Store irq descriptor in vector array 2015-08-06 00:14:59 +02:00
irq_work.c x86: Consolidate irq entering inlines 2015-05-15 16:04:49 +02:00
irq.c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 15:20:51 -07:00
irqinit.c x86/irq: Store irq descriptor in vector array 2015-08-06 00:14:59 +02:00
jump_label.c jump_label: Rename JUMP_LABEL_{EN,DIS}ABLE to JUMP_LABEL_{JMP,NOP} 2015-08-03 11:34:12 +02:00
kdebugfs.c
kexec-bzimage64.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2015-09-08 12:41:25 -07:00
kgdb.c Linux 4.0-rc7 2015-04-08 09:01:54 +02:00
ksysfs.c
kvm.c The bulk of the changes here is for x86. And for once it's not 2015-06-24 09:36:49 -07:00
kvmclock.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
ldt.c x86/ldt: Make modify_ldt synchronous 2015-07-31 10:23:23 +02:00
livepatch.c
machine_kexec_32.c
machine_kexec_64.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching 2015-06-23 14:07:26 -07:00
Makefile kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
mcount_64.S
mmconf-fam10h_64.c
module.c x86/mm/KASLR: Propagate KASLR status to kernel proper 2015-04-03 15:26:15 +02:00
mpparse.c x86: Cleanup irq_domain ops 2015-04-24 15:36:55 +02:00
msr.c
nmi_selftest.c
nmi.c Merge branch 'x86/urgent' into x86/asm, before applying dependent patches 2015-07-31 10:23:35 +02:00
paravirt_patch_32.c x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooks 2015-07-06 15:23:26 +02:00
paravirt_patch_64.c Merge branch 'locking/core' into x86/core, to prepare for dependent patch 2015-06-03 10:07:35 +02:00
paravirt-spinlocks.c locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS 2015-05-11 09:52:09 +02:00
paravirt.c x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooks 2015-07-06 15:23:26 +02:00
pci-calgary_64.c
pci-dma.c dma-mapping: consolidate dma_set_mask 2015-09-10 13:29:01 -07:00
pci-iommu_table.c
pci-nommu.c
pci-swiotlb.c x86/swiotlb: Try coherent allocations with __GFP_NOWARN 2015-06-11 08:28:38 +02:00
pcspeaker.c
perf_regs.c perf/x86/64: Report regs_user->ax too in get_regs_user() 2015-04-11 13:08:53 +02:00
pmem.c libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option 2015-08-19 00:34:34 -04:00
probe_roms.c
process_32.c x86/vm86: Clean up vm86.h includes 2015-07-31 13:31:10 +02:00
process_64.c x86/ldt: Make modify_ldt() optional 2015-07-31 13:30:45 +02:00
process.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
ptrace.c x86/entry: Move C entry and exit code to arch/x86/entry/common.c 2015-07-07 10:59:05 +02:00
pvclock.c x86: pvclock: Really remove the sched notifier for cross-cpu migrations 2015-04-27 15:49:30 +02:00
quirks.c
reboot_fixups_32.c
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relocate_kernel_32.S x86/asm: Optimize unnecessarily wide TEST instructions 2015-03-07 11:12:43 +01:00
relocate_kernel_64.S x86/asm: Replace "MOVQ $imm, %reg" with MOVL 2015-04-01 13:17:39 +02:00
resource.c
rtc.c
setup_percpu.c
setup.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
signal_compat.c x86/compat: Move copy_siginfo_*_user32() to signal_compat.c 2015-07-06 15:28:55 +02:00
signal.c Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes 2015-08-18 09:39:47 +02:00
smp.c x86/mce: Clear Local MCE opt-in before kexec 2015-08-13 10:12:52 +02:00
smpboot.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 09:33:26 -07:00
stacktrace.c
step.c Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes 2015-08-18 09:39:47 +02:00
sys_x86_64.c x86/mm: Improve AMD Bulldozer ASLR workaround 2015-03-31 10:01:17 +02:00
sysfb_efi.c
sysfb_simplefb.c
sysfb.c
tboot.c
tce_64.c
test_nx.c
test_rodata.c treewide: Fix typo in printk messages 2015-03-06 23:05:39 +01:00
time.c x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()' 2015-03-23 11:14:17 +01:00
tls.c
tls.h
topology.c x86: Drop bogus __ref / __refdata annotations 2015-07-20 18:57:20 +02:00
trace_clock.c x86/asm/tsc: Add rdtsc_ordered() and use it in trivial call sites 2015-07-06 15:23:29 +02:00
tracepoint.c
traps.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
tsc_msr.c
tsc_sync.c x86/asm/tsc/sync: Use rdtsc_ordered() in check_tsc_warp() and drop extra barriers 2015-07-06 15:23:29 +02:00
tsc.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-03 15:46:07 -07:00
uprobes.c uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever 2015-07-31 10:38:06 +02:00
verify_cpu.S
vm86_32.c x86/vm86: Rename vm86->v86flags and v86mask 2015-07-31 13:31:11 +02:00
vmlinux.lds.S kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
vsmp_64.c x86: replace __init_or_module with __init in non-modular vsmp_64.c 2015-06-16 14:12:41 -04:00
x86_init.c PCI changes for the v4.2 merge window: 2015-06-23 13:41:24 -07:00
x8664_ksyms_64.c preempt: Use preempt_schedule_context() as the official tracing preemption point 2015-06-07 15:57:42 +02:00