linux/arch/x86/kernel
Dave Hansen 5dfd486c47 x86, kvm: Fix kvm's use of __pa() on percpu areas
In short, it is illegal to call __pa() on an address holding
a percpu variable.  This replaces those __pa() calls with
slow_virt_to_phys().  All of the cases in this patch are
in boot time (or CPU hotplug time at worst) code, so the
slow pagetable walking in slow_virt_to_phys() is not expected
to have a performance impact.
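
As an illustration only (this is not the kernel code), the difference between the two translations can be sketched in a userspace toy model. It assumes the percpu area is mapped outside the kernel's linear map, which is the case a page-table walk handles and plain arithmetic does not. Every name and constant below (toy_pa, toy_slow_virt_to_phys, TOY_PAGE_OFFSET, the page-table contents) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: all names and constants here are hypothetical. */
#define TOY_PAGE_SHIFT    12
#define TOY_PAGE_SIZE     (1u << TOY_PAGE_SHIFT)
#define TOY_PAGE_OFFSET   0xC0000000u /* start of the linear map */
#define TOY_REMAP_BASE    0xF0000000u /* region mapped only via page tables */
#define TOY_REMAP_NPAGES  4

/* Page table for the remapped region: index is the virtual page
 * number within the region, value is the physical frame number. */
static const uint32_t toy_pte[TOY_REMAP_NPAGES] = {
	0x1234, 0x0042, 0x0777, 0x0100
};

/* Analogue of __pa(): pure arithmetic, only valid in the linear map. */
static uint32_t toy_pa(uint32_t vaddr)
{
	return vaddr - TOY_PAGE_OFFSET;
}

/* Analogue of slow_virt_to_phys(): look the frame up, then add the
 * offset within the page.  Linear-map addresses still translate by
 * arithmetic, because their page tables agree with it. */
static uint32_t toy_slow_virt_to_phys(uint32_t vaddr)
{
	uint32_t idx, off;

	if (vaddr < TOY_REMAP_BASE)
		return vaddr - TOY_PAGE_OFFSET;

	idx = (vaddr - TOY_REMAP_BASE) >> TOY_PAGE_SHIFT;
	off = vaddr & (TOY_PAGE_SIZE - 1);
	assert(idx < TOY_REMAP_NPAGES);
	return (toy_pte[idx] << TOY_PAGE_SHIFT) | off;
}
```

In this model both helpers agree for a linear-map address such as 0xC0002000, but for 0xF0001010 the walk follows the page table while toy_pa() returns an address unrelated to where the data lives.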

The times when this actually matters are pretty obscure
(certain 32-bit NUMA systems), but it _does_ happen.  It is
important to keep KVM guests working on these systems because
the real hardware is getting harder and harder to find.

This bug manifested first by me seeing a plain hang at boot
after this message:

	CPU 0 irqstacks, hard=f3018000 soft=f301a000

or, sometimes, it would actually make it out to the console:

[    0.000000] BUG: unable to handle kernel paging request at ffffffff

I eventually traced it down to the KVM async pagefault code.
This can be worked around by disabling that code either at
compile-time, or on the kernel command-line.

The kvm async pagefault code was injecting page faults into
the guest which the guest misinterpreted because its
"reason" was not being properly sent from the host.

The guest passes the physical address of a per-cpu async page
fault structure via an MSR to the host.  Since __pa() is
broken on percpu data, the physical address it sent was
basically bogus and the host went scribbling on random data.
The guest never saw the real reason for the page fault (it
was injected by the host), assumed that the kernel had taken
a _real_ page fault, and panic()'d.  The behavior varied,
though, depending on what got corrupted by the bad write.
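
The failure sequence described above can be sketched with a small userspace toy model. It is an assumption-laden illustration, not the KVM code: the struct, the function names and the reason value are all hypothetical, and guest memory is reduced to a byte array. A zero reason word stands for "not injected by the host".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model: every name and value here is hypothetical. */
#define TOY_MEM_SIZE                 64
#define TOY_REASON_PAGE_NOT_PRESENT  1u

struct toy_guest {
	uint8_t  mem[TOY_MEM_SIZE]; /* guest "physical" memory */
	uint32_t reason_pa;         /* where the reason word really lives */
	uint32_t msr_pa;            /* address the guest reported to the host */
};

static uint32_t toy_read_word(const struct toy_guest *g, uint32_t pa)
{
	uint32_t val;

	memcpy(&val, &g->mem[pa], sizeof(val));
	return val;
}

/* Guest side: report an address for the reason word "via the MSR". */
static void toy_guest_register(struct toy_guest *g, uint32_t real_pa,
			       uint32_t reported_pa)
{
	memset(g->mem, 0, sizeof(g->mem));
	g->reason_pa = real_pa;
	g->msr_pa = reported_pa;
}

/* Host side: write the reason at whatever address the guest reported. */
static void toy_host_inject(struct toy_guest *g, uint32_t reason)
{
	memcpy(&g->mem[g->msr_pa], &reason, sizeof(reason));
}

/* Guest fault handler: returns 1 if the fault is recognised as
 * host-injected, 0 if it is taken for a real page fault. */
static int toy_guest_fault_is_async(const struct toy_guest *g)
{
	return toy_read_word(g, g->reason_pa) != 0;
}
```

With matching addresses the guest sees the reason and treats the fault as host-injected.  With a bogus reported address the reason lands elsewhere in guest memory, and the guest reads zero and concludes the fault was real.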

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130122212435.4905663F@kernel.stglabs.ibm.com
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25 16:34:55 -08:00
acpi Linux 3.8-rc5 2013-01-25 16:31:21 -08:00
apic Linux 3.8-rc5 2013-01-25 16:31:21 -08:00
cpu Linux 3.8-rc5 2013-01-25 16:31:21 -08:00
.gitignore
alternative.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 10:47:45 -07:00
amd_gart_64.c X86 & IA64: adapt for dma_map_ops changes 2012-03-28 16:36:31 +02:00
amd_nb.c Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-07-22 16:07:45 -07:00
apb_timer.c Merge branch 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-07-23 10:34:47 -07:00
aperture_64.c x86/gart: Fix kmemleak warning 2012-06-06 11:58:38 +02:00
apm_32.c x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> 2012-06-06 09:17:22 +02:00
asm-offsets_32.c x86: Generate system call tables and unistd_*.h from tables 2011-11-17 13:35:37 -08:00
asm-offsets_64.c x32: If configured, add x32 system calls to system call tables 2012-02-20 12:52:06 -08:00
asm-offsets.c x86, um/x86: switch to generic sys_execve and kernel_execve 2012-09-30 22:53:32 -04:00
audit_64.c
bootflag.c
check.c x86: kernel/check.c simple_strtoul cleanup 2012-05-15 15:36:41 -07:00
cpuid.c Use get_online_cpus to avoid races involving CPU hotplug 2012-09-23 07:43:56 -07:00
crash_dump_32.c x86: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:15 +08:00
crash_dump_64.c
crash.c x86/kexec: crash_vmclear_local_vmcss needs __rcu 2012-12-11 19:55:23 -02:00
devicetree.c x86: dt: Use linear irq domain for ioapic(s) 2012-08-21 22:16:57 +02:00
doublefault_32.c
dumpstack_32.c x86: Move call to print_modules() out of show_regs() 2012-06-20 14:33:48 +02:00
dumpstack_64.c x86: Move call to print_modules() out of show_regs() 2012-06-20 14:33:48 +02:00
dumpstack.c x86: Move call to print_modules() out of show_regs() 2012-06-20 14:33:48 +02:00
e820.c x86, mm: Trim memory in memblock to be page aligned 2012-10-24 11:52:21 -07:00
early_printk.c Revert "x86/early_printk: Replace obsolete simple_strtoul() usage with kstrtoint()" 2012-07-22 15:47:52 +02:00
early-quirks.c
entry_32.S Fixes: 2013-01-18 12:02:52 -08:00
entry_64.S Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-12-20 18:05:28 -08:00
ftrace.c x86/ftrace: Use __pa_symbol instead of __pa on C visible symbols 2012-11-16 16:42:09 -08:00
head32.c x86: Fix warning about cast from pointer to integer of different size 2012-11-19 10:45:19 -08:00
head64.c x86: Fix warning about cast from pointer to integer of different size 2012-11-19 10:45:19 -08:00
head_32.S Merge branch 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-11 19:56:33 -08:00
head_64.S x86-64, hotplug: Add start_cpu0() entry point to head_64.S 2012-11-14 09:39:51 -08:00
head.c memblock, x86: Replace memblock_x86_reserve/free_range() with generic ones 2011-07-14 11:47:53 -07:00
hpet.c x86: hpet: Fix masking of MSI interrupts 2012-11-02 22:53:27 +01:00
hw_breakpoint.c
i386_ksyms_32.c
i387.c x86/i387.c: Initialize thread xstate only on CPU0 only once 2012-11-14 15:28:11 -08:00
i8237.c
i8253.c x86: Use common i8253 clockevent 2011-07-01 10:37:14 +02:00
i8259.c x86/irq/i8259: Fix incorrect comment 2012-08-22 09:34:24 +02:00
io_delay.c
ioport.c
irq_32.c x86: Use common threadinfo allocator 2012-05-08 14:08:44 +02:00
irq_64.c x86: Add stack top margin for stack overflow checking 2011-12-07 09:27:11 +01:00
irq_work.c
irq.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 11:13:33 -07:00
irqinit.c x86, 386 removal: Remove support for IRQ 13 FPU error reporting 2012-12-17 11:42:40 -08:00
jump_label.c jump_label, x86: Fix section mismatch 2011-12-06 20:41:02 +01:00
kdebugfs.c arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case 2012-07-26 15:07:20 +02:00
kgdb.c kgdb,x86: fix warning about unused variable 2012-10-12 06:37:34 -05:00
kprobes-common.h x86/kprobes: Split out optprobe related code to kprobes-opt.c 2012-03-06 09:49:49 +01:00
kprobes-opt.c x86/kprobes: Split out optprobe related code to kprobes-opt.c 2012-03-06 09:49:49 +01:00
kprobes.c kprobes/x86: Move skip_singlestep up 2012-09-20 14:48:16 +02:00
kvm.c x86, kvm: Fix kvm's use of __pa() on percpu areas 2013-01-25 16:34:55 -08:00
kvmclock.c x86, kvm: Fix kvm's use of __pa() on percpu areas 2013-01-25 16:34:55 -08:00
ldt.c Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
machine_kexec_32.c Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
machine_kexec_64.c
Makefile tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
microcode_amd.c x86, microcode, AMD: Add support for family 16h processors 2012-11-20 22:23:28 -08:00
microcode_core.c Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 11:15:17 -07:00
microcode_intel.c x86, microcode: Add a refresh firmware flag to ->request_microcode_fw 2012-08-22 16:15:58 -07:00
mmconf-fam10h_64.c
module.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-07-24 13:34:56 -07:00
mpparse.c Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-05-29 20:14:53 -07:00
msr.c Use get_online_cpus to avoid races involving CPU hotplug 2012-09-23 07:43:56 -07:00
nmi_selftest.c x86/nmi: Clean up register_nmi_handler() usage 2012-06-20 14:23:17 +02:00
nmi.c x86: Save cr2 in NMI in case NMIs take a page fault (for i386) 2012-06-08 18:51:12 -04:00
paravirt_patch_32.c
paravirt_patch_64.c
paravirt-spinlocks.c
paravirt.c x86, pvops: Remove hooks for {rd,wr}msr_safe_regs 2012-06-07 11:41:08 -07:00
pci-calgary_64.c x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> 2012-06-06 09:17:22 +02:00
pci-dma.c X86: drivers: remove __dev* attributes. 2013-01-03 15:57:04 -08:00
pci-iommu_table.c
pci-nommu.c X86: integrate CMA with DMA-mapping subsystem 2012-05-21 15:09:38 +02:00
pci-swiotlb.c X86 & IA64: adapt for dma_map_ops changes 2012-03-28 16:36:31 +02:00
pcspeaker.c
perf_regs.c perf: Fix off by one test in perf_reg_value() 2012-09-19 17:08:40 +02:00
probe_roms.c x86/pci/probe_roms: Add missing __iomem annotation to pci_map_biosrom() 2012-09-05 10:52:25 +02:00
process_32.c flagday: don't pass regs to copy_thread() 2012-11-28 23:43:42 -05:00
process_64.c flagday: don't pass regs to copy_thread() 2012-11-28 23:43:42 -05:00
process.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-12-12 12:22:13 -08:00
ptrace.c Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2012-12-03 06:27:05 +01:00
pvclock.c x86: pvclock: generic pvclock vsyscall initialization 2012-11-27 23:29:09 -02:00
quirks.c X86: drivers: remove __dev* attributes. 2013-01-03 15:57:04 -08:00
reboot_fixups_32.c
reboot.c x86/reboot: Remove quirk entry for SBC FITPC 2012-10-04 12:22:32 +02:00
relocate_kernel_32.S kexec, x86: Fix incorrect jump back address if not preserving context 2011-07-21 11:19:28 +02:00
relocate_kernel_64.S kexec, x86: Fix incorrect jump back address if not preserving context 2011-07-21 11:19:28 +02:00
resource.c
rtc.c x86: Allow tracing of functions in arch/x86/kernel/rtc.c 2012-10-24 13:14:22 +02:00
setup_percpu.c x86: Add read_mostly declaration/definition to variables from smp.h 2012-06-14 12:42:11 +02:00
setup.c Linux 3.8-rc5 2013-01-25 16:31:21 -08:00
signal.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-12-20 18:05:28 -08:00
smp.c x86/reboot: Update nonmi_ipi parameter 2012-05-14 11:49:38 +02:00
smpboot.c Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-11 19:58:29 -08:00
stacktrace.c
step.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
sys_x86_64.c mm: fix cache coloring on x86_64 architecture 2012-12-11 17:22:25 -08:00
syscall_32.c x86, syscall: Re-fix typo in comment 2011-11-18 16:25:07 -08:00
syscall_64.c x32: If configured, add x32 system calls to system call tables 2012-02-20 12:52:06 -08:00
tboot.c Revert "x86-64/efi: Use EFI to deal with platform wall clock (again)" 2012-12-15 15:20:41 -08:00
tce_64.c Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
test_nx.c
test_rodata.c x86, extable: Remove open-coded exception table entries in arch/x86/kernel/test_rodata.c 2012-04-20 13:51:38 -07:00
time.c MCA: delete all remaining traces of microchannel bus support. 2012-05-17 19:06:13 -04:00
tls.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-03-29 14:28:26 -07:00
tls.h
topology.c x86, topology: Debug CPU0 hotplug 2012-11-14 15:28:11 -08:00
trace_clock.c tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
traps.c Merge branch 'x86/nuke386' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-19 13:02:23 -08:00
tsc_sync.c x86/tsc: Reduce the TSC sync check time for core-siblings 2012-02-22 11:49:40 +01:00
tsc.c x86: Allow tracing of functions in arch/x86/kernel/rtc.c 2012-10-24 13:14:22 +02:00
uprobes.c uprobes/x86: Cleanup the single-stepping code 2012-11-03 17:15:12 +01:00
verify_cpu.S
vm86_32.c thp: change split_huge_page_pmd() interface 2012-12-12 17:38:31 -08:00
vmlinux.lds.S x86, realmode: Move ACPI wakeup to unified realmode code 2012-05-08 11:46:05 -07:00
vsmp_64.c x86/apic/x2apic: Limit the vector reservation to the user specified mask 2012-07-06 11:00:22 +02:00
vsyscall_64.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2012-12-16 15:40:50 -08:00
vsyscall_emu_64.S x86-64: Rework vsyscall emulation and add vsyscall= parameter 2011-08-10 19:26:46 -05:00
vsyscall_trace.h x86-64: Add vsyscall:emulate_vsyscall trace event 2011-08-04 16:13:53 -07:00
x86_init.c x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done() 2012-09-12 15:33:06 +02:00
x8664_ksyms_64.c x86: Improve __phys_addr performance by making use of carry flags and inlining 2012-11-16 16:42:08 -08:00
xsave.c x86, smap: Do not abuse the [f][x]rstor_checking() functions for user space 2012-09-25 15:42:18 -07:00