linux/arch/x86/kernel
Thomas Gleixner 551adc6057 x86/irq: Cure live lock in fixup_irqs()
Harry reported that he's able to trigger a system freeze with cpu hot
unplug. The freeze turned out to be a live lock caused by recent changes in
irq_force_complete_move().

When fixup_irqs() and from there irq_force_complete_move() is called on the
dying cpu, then all other cpus are in stop machine and wait for the dying cpu
to complete the teardown. If there is a move of an interrupt pending then
irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
mask and waits for them to clear the mask. That's obviously impossible as
those cpus are firmly stuck in stop machine with interrupts disabled.
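
The wait described above can be modelled in a few lines of user-space C. This
is only a sketch under assumptions, not the kernel code: struct cpu_model and
wait_for_cleanup() are hypothetical names, and the spin limit exists solely so
the model terminates where the real wait would spin forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a cpu: it can only handle an IPI with irqs enabled. */
struct cpu_model {
	bool irqs_enabled;
};

/*
 * Model of "send cleanup IPI, then wait for old_domain to become empty".
 * A cpu clears its own bit only if it can take the IPI. Returns true if
 * the mask emptied within max_spins iterations, false otherwise.
 */
static bool wait_for_cleanup(uint32_t *old_domain,
			     const struct cpu_model *cpus,
			     unsigned int nr_cpus, unsigned int max_spins)
{
	unsigned int spin, cpu;

	assert(nr_cpus <= 32);	/* the mask is a single 32-bit word here */

	for (spin = 0; spin < max_spins; spin++) {
		for (cpu = 0; cpu < nr_cpus; cpu++) {
			uint32_t bit = UINT32_C(1) << cpu;

			/* IPI is handled only with interrupts enabled */
			if ((*old_domain & bit) && cpus[cpu].irqs_enabled)
				*old_domain &= ~bit;
		}
		if (*old_domain == 0)
			return true;
	}
	return false;	/* the unbounded version of this loop never exits */
}
```

With every cpu in old_domain marked as having interrupts disabled, which is
the stop machine situation, the function gives up with the mask unchanged.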

I should have known that, but I completely overlooked it, being concentrated on
the locking issues around the vectors. And the existence of the call to
__irq_complete_move() in the code, which actually sends the cleanup IPI, made
it reasonable to wait for that cleanup to complete. That call was bogus even
before the recent changes as it was just a pointless distraction.

We have to look at two cases:

1) The move_in_progress flag of the interrupt is set

   This means the ioapic has been updated with the new vector, but it has not
   fired yet. In theory there is a race:

   set_ioapic(new_vector) <-- Interrupt is raised before update is effective,
                              i.e. it's raised on the old vector.

   So if the target cpu cannot handle that interrupt before the old vector is
   cleaned up, we get a spurious interrupt and in the worst case the ioapic
   irq line becomes stale, but my experiments so far have only resulted in
   spurious interrupts.

   But in case of cpu hotplug this should be a non-issue because if the
   affinity update happens right before all cpus rendezvous in stop machine,
   there is no way that the interrupt can be blocked on the target cpu because
   all cpus loop first with interrupts enabled in stop machine, so the old
   vector is not yet cleaned up when the interrupt fires.

   So the only way to run into this issue is if the delivery of the interrupt
   on the apic/system bus would be delayed beyond the point where the target
   cpu disables interrupts in stop machine. I doubt that it can happen, but at
   least there is a theoretical chance. Virtualization might be able to
   expose this, but AFAICT the IOAPIC emulation is not as stupid as the real
   hardware.

   I've spent quite some time over the weekend trying to force that situation,
   but I was not able to trigger the delayed case.

2) The move_in_progress flag is not set and the old_domain cpu mask is not
   empty.

   That means that an interrupt was delivered after the change and the
   cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
   responded to it yet.

In both cases we can assume that the next interrupt will arrive on the new
vector, so we can clean up the old vectors on the cpus in the old_domain cpu
mask.
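
The resulting handling of both cases can be sketched as a user-space C model.
It is an illustration under assumptions and not the actual patch: struct
irq_model, vector_table, model_assign() and model_force_complete_move() are
hypothetical stand-ins for the kernel's per-interrupt and per-cpu vector
state. The point it shows is that the old vectors are released directly,
without sending an IPI and without waiting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS		8
#define NR_VECTORS	256

/* Hypothetical, simplified per-interrupt vector state. */
struct irq_model {
	uint8_t old_vector;
	uint8_t new_vector;
	uint32_t old_domain;	/* cpus which still hold old_vector */
	bool move_in_progress;
};

/* Hypothetical per-cpu vector tables; NULL means the slot is free. */
static const struct irq_model *vector_table[NR_CPUS][NR_VECTORS];

static void model_assign(const struct irq_model *m, unsigned int cpu,
			 uint8_t vector)
{
	vector_table[cpu][vector] = m;
}

/*
 * Case 1: move_in_progress is set.
 * Case 2: move_in_progress is clear, but old_domain is not empty.
 * In both cases the old vector is freed on every cpu in old_domain
 * right away. Returns the number of slots released.
 */
static unsigned int model_force_complete_move(struct irq_model *m)
{
	unsigned int cpu, released = 0;

	/* Neither case applies: nothing is pending */
	if (!m->move_in_progress && m->old_domain == 0)
		return 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(m->old_domain & (UINT32_C(1) << cpu)))
			continue;
		if (vector_table[cpu][m->old_vector] == m) {
			vector_table[cpu][m->old_vector] = NULL;
			released++;
		}
	}

	/* The move is now complete from the bookkeeping point of view */
	m->old_domain = 0;
	m->move_in_progress = false;
	return released;
}
```

Calling model_force_complete_move() a second time on the same interrupt is a
no-op in this model, because both the flag and the mask have been cleared.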

Fixes: 98229aa36c "x86/irq: Plug vector cleanup race"
Reported-by: Harry Junior <harryjr@outlook.fr>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603140931430.3657@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-18 14:51:06 +01:00
..
acpi Merge branch 'x86/cleanups' into x86/urgent 2016-03-17 09:44:57 +01:00
apic x86/irq: Cure live lock in fixup_irqs() 2016-03-18 14:51:06 +01:00
cpu Merge branch 'x86/cleanups' into x86/urgent 2016-03-17 09:44:57 +01:00
fpu Merge branch 'x86/cleanups' into x86/urgent 2016-03-17 09:44:57 +01:00
kprobes x86/mm: Expand the exception table logic to allow new handling options 2016-02-18 09:21:46 +01:00
.gitignore
alternative.c x86/alternatives: Make optimize_nops() interrupt safe and synced 2015-09-03 21:27:47 +02:00
amd_gart_64.c
amd_nb.c x86/gart: Check for GART support before accessing GART registers 2015-05-06 11:15:53 +02:00
apb_timer.c x86/asm/tsc: Rename native_read_tsc() to rdtsc() 2015-07-06 15:23:28 +02:00
aperture_64.c x86/gart: Check for GART support before accessing GART registers 2015-05-06 11:15:53 +02:00
apm_32.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
asm-offsets_32.c x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup 2016-03-10 09:48:14 +01:00
asm-offsets_64.c x86/syscalls: Add syscall entry qualifiers 2016-01-29 09:46:38 +01:00
asm-offsets.c x86/asm-offsets: Remove PARAVIRT_enabled 2016-03-08 14:16:44 +01:00
audit_64.c
bootflag.c x86: don't use module_init for non-modular core bootflag code 2015-06-16 14:12:34 -04:00
check.c Linux 4.2-rc8 2015-08-25 09:59:19 +02:00
cpuid.c new helpers: no_seek_end_llseek{,_size}() 2015-12-23 10:41:31 -05:00
crash_dump_32.c
crash_dump_64.c
crash.c x86/kexec: Remove walk_iomem_res() call with GART type 2016-01-30 09:49:59 +01:00
devicetree.c Replace module_init with equivalent device_initcall in non modules. 2015-07-02 10:30:48 -07:00
doublefault.c
dumpstack_32.c
dumpstack_64.c
dumpstack.c
e820.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 09:32:27 -07:00
early_printk.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
early-quirks.c Linux 4.4-rc2 2015-11-23 09:04:05 +01:00
espfix_64.c Merge branch 'x86/urgent' into x86/asm, before applying dependent patches 2015-07-31 10:23:35 +02:00
ftrace.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 09:32:27 -07:00
head32.c
head64.c Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 10:02:25 -07:00
head_32.S Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 10:45:39 -07:00
head_64.S Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 10:02:25 -07:00
head.c
hpet.c x86/cpufeature: Carve out X86_FEATURE_* 2016-01-30 11:22:17 +01:00
hw_breakpoint.c x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros 2015-12-19 11:49:55 +01:00
i386_ksyms_32.c preempt: Use preempt_schedule_context() as the official tracing preemption point 2015-06-07 15:57:42 +02:00
i8237.c
i8253.c clockevents/drivers/i8253: Migrate to new 'set-state' interface 2015-08-10 11:40:30 +02:00
i8259.c x86/irq: Probe for PIC presence before allocating descs for legacy IRQs 2015-11-07 10:37:37 +01:00
io_delay.c
ioport.c x86/iopl: Fix iopl capability check on Xen PV 2016-03-17 09:49:27 +01:00
irq_32.c genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
irq_64.c x86/irq: Drop unlikely before IS_ERR_OR_NULL 2015-10-01 11:08:56 +02:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
irq.c x86/irq: Call irq_force_move_complete with irq descriptor 2016-01-15 13:44:01 +01:00
irqinit.c x86/irq: Store irq descriptor in vector array 2015-08-06 00:14:59 +02:00
jump_label.c jump_label: Rename JUMP_LABEL_{EN,DIS}ABLE to JUMP_LABEL_{JMP,NOP} 2015-08-03 11:34:12 +02:00
kdebugfs.c
kexec-bzimage64.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
kgdb.c Merge branch 'x86/cleanups' into x86/urgent 2016-03-17 09:44:57 +01:00
ksysfs.c
kvm.c The bulk of the changes here is for x86. And for once it's not 2015-06-24 09:36:49 -07:00
kvmclock.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
ldt.c x86/ldt: Fix small LDT allocation for Xen 2015-09-14 12:10:50 +02:00
livepatch.c livepatch: Cleanup module page permission changes 2015-12-04 22:51:07 +01:00
machine_kexec_32.c
machine_kexec_64.c kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
Makefile kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
mcount_64.S x86/ftrace, x86/asm: Kill ftrace_caller_end label 2016-02-17 08:47:22 +01:00
mmconf-fam10h_64.c
module.c
mpparse.c x86/cpufeature: Use enum cpuid_leafs instead of magic numbers 2016-02-01 10:46:48 +01:00
msr.c x86/cpufeature: Carve out X86_FEATURE_* 2016-01-30 11:22:17 +01:00
nmi_selftest.c
nmi.c x86/nmi: Mark 'ignore_nmis' as __read_mostly 2016-03-08 12:48:19 +01:00
paravirt_patch_32.c x86/paravirt: Remove the unused irq_enable_sysexit pv op 2015-11-23 10:48:16 +01:00
paravirt_patch_64.c x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op 2015-11-23 10:48:16 +01:00
paravirt-spinlocks.c locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS 2015-05-11 09:52:09 +02:00
paravirt.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-11 16:26:03 -08:00
pci-calgary_64.c x86/platform/calgary: Constify cal_chipset_ops structures 2015-11-29 08:50:58 +01:00
pci-dma.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
pci-iommu_table.c
pci-nommu.c
pci-swiotlb.c x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN 2015-12-06 12:46:31 +01:00
pcspeaker.c
perf_regs.c
pmem.c x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search 2016-01-30 09:49:59 +01:00
probe_roms.c
process_32.c sched/core, sched/x86: Kill thread_info::saved_preempt_count 2015-10-06 17:08:18 +02:00
process_64.c x86/iopl/64: Properly context-switch IOPL on Xen PV 2016-03-17 09:49:26 +01:00
process.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 09:32:27 -07:00
ptrace.c arch/x86/kernel/ptrace.c: Remove unused arg_offs_table 2015-12-29 11:35:34 +01:00
pvclock.c x86/vdso: Remove pvclock fixmap machinery 2015-12-11 08:56:03 +01:00
quirks.c timers/x86/hpet: Type adjustments 2015-10-21 11:17:32 +02:00
reboot_fixups_32.c
reboot.c x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[] 2016-01-12 12:27:36 +01:00
relocate_kernel_32.S
relocate_kernel_64.S
resource.c
rtc.c x86/paravirt: Prevent rtc_cmos platform device init on PV guests 2015-12-19 21:35:13 +01:00
setup_percpu.c
setup.c x86/e820: Set System RAM type and descriptor 2016-01-30 09:49:57 +01:00
signal_compat.c x86/compat: Move copy_siginfo_*_user32() to signal_compat.c 2015-07-06 15:28:55 +02:00
signal.c x86/signal/64: Re-add support for SS in the 64-bit signal context 2016-02-17 08:32:11 +01:00
smp.c x86/smp: Remove single IPI wrapper 2015-11-05 13:07:54 +01:00
smpboot.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 13:50:29 -07:00
stacktrace.c
step.c Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes 2015-08-18 09:39:47 +02:00
sys_x86_64.c
sysfb_efi.c
sysfb_simplefb.c
sysfb.c
tboot.c
tce_64.c
test_nx.c x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option 2016-02-22 08:51:38 +01:00
test_rodata.c x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option 2016-02-22 08:51:38 +01:00
time.c
tls.c
tls.h
topology.c x86: Drop bogus __ref / __refdata annotations 2015-07-20 18:57:20 +02:00
trace_clock.c x86/asm/tsc: Add rdtsc_ordered() and use it in trivial call sites 2015-07-06 15:23:29 +02:00
tracepoint.c
traps.c Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 10:23:56 -07:00
tsc_msr.c
tsc_sync.c x86/asm/tsc/sync: Use rdtsc_ordered() in check_tsc_warp() and drop extra barriers 2015-07-06 15:23:29 +02:00
tsc.c x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known() 2016-03-18 14:51:06 +01:00
uprobes.c uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever 2015-07-31 10:38:06 +02:00
verify_cpu.S x86/cpufeature: Carve out X86_FEATURE_* 2016-01-30 11:22:17 +01:00
vm86_32.c x86/cpufeature: Replace the old static_cpu_has() with safe variant 2016-01-30 11:22:18 +01:00
vmlinux.lds.S Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 09:32:27 -07:00
vsmp_64.c x86: replace __init_or_module with __init in non-modular vsmp_64.c 2015-06-16 14:12:41 -04:00
x86_init.c x86/tsc: Remove unused tsc_pre_init() hook 2015-11-19 11:03:13 +01:00
x8664_ksyms_64.c x86/mm, x86/mce: Add memcpy_mcsafe() 2016-03-08 17:54:38 +01:00