linux/arch/x86
Chuck Anderson fc78d343fa xen/smp: initialize IPI vectors before marking CPU online
An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:

	kernel BUG at drivers/xen/events.c:1328!

RCU has detected that a CPU has not entered a quiescent state within the
grace period.  It needs to send the CPU a reschedule IPI if it is not
offline.  rcu_implicit_offline_qs() does this check:

	/*
	 * If the CPU is offline, it is in a quiescent state.  We can
	 * trust its state not to change because interrupts are disabled.
	 */
	if (cpu_is_offline(rdp->cpu)) {
		rdp->offline_fqs++;
		return 1;
	}

	Else the CPU is online.  Send it a reschedule IPI.

The CPU is in the middle of being hot-plugged and has been marked online
(!cpu_is_offline()).  See start_secondary():

	set_cpu_online(smp_processor_id(), true);
	...
	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;

start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
mark it as active:

	/*
	 * Wait until the cpu which brought this one up marked it
	 * online before enabling interrupts. If we don't do that then
	 * we can end up waking up the softirq thread before this cpu
	 * reached the active state, which makes the scheduler unhappy
	 * and schedule the softirq thread on the wrong cpu. This is
	 * only observable with forced threaded interrupts, but in
	 * theory it could also happen w/o them. It's just way harder
	 * to achieve.
	 */
	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
		cpu_relax();

	/* enable local interrupts */
	local_irq_enable();

The CPU being hot-plugged will be marked active after it has been fully
initialized by the CPU managing the hot-plug.  In the Xen PVHVM case
xen_smp_intr_init() is called to set up the hot-plugged vCPU's
XEN_RESCHEDULE_VECTOR.

The hot-plugging CPU is marked online, not marked active and does not have
its IPI vectors set up.  rcu_implicit_offline_qs() sees the hot-plugging
cpu is !cpu_is_offline() and tries to send it a reschedule IPI:
This will lead to:

	kernel BUG at drivers/xen/events.c:1328!

	xen_send_IPI_one()
	xen_smp_send_reschedule()
	rcu_implicit_offline_qs()
	rcu_implicit_dynticks_qs()
	force_qs_rnp()
	force_quiescent_state()
	__rcu_process_callbacks()
	rcu_process_callbacks()
	__do_softirq()
	call_softirq()
	do_softirq()
	irq_exit()
	xen_evtchn_do_upcall()

because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
the XEN_RESCHEDULE_VECTOR.

There is at least one other place that has caused the same crash:

	xen_smp_send_reschedule()
	wake_up_idle_cpu()
	add_timer_on()
	clocksource_watchdog()
	call_timer_fn()
	run_timer_softirq()
	__do_softirq()
	call_softirq()
	do_softirq()
	irq_exit()
	xen_evtchn_do_upcall()
	xen_hvm_callback_vector()

clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
a watchdog timer:

	/*
	 * Cycle through CPUs to check if the CPUs stay synchronized
	 * to each other.
	 */
	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
	if (next_cpu >= nr_cpu_ids)
		next_cpu = cpumask_first(cpu_online_mask);
	watchdog_timer.expires += WATCHDOG_INTERVAL;
	add_timer_on(&watchdog_timer, next_cpu);

This resulted in an attempt to send an IPI to a hot-plugging CPU that
had not initialized its reschedule vector. One option would be to make
the RCU code check to not check for CPU offline but for CPU active.
As becoming active is done after a CPU is online (in older kernels).

But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
completely reworked - in the online path, cpu_active is set *before* cpu_online,
and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up
path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
is better left to the scheduler alone, since it has the intelligence to use it
judiciously."

The conclusion was that:
"
1. At the IPI sender side:

   It is incorrect to send an IPI to an offline CPU (cpu not present in
   the cpu_online_mask). There are numerous places where we check this
   and warn/complain.

2. At the IPI receiver side:

   It is incorrect to let the world know of our presence (by setting
   ourselves in global bitmasks) until our initialization steps are complete
   to such an extent that we can handle the consequences (such as
   receiving interrupts without crashing the sender etc.)
" (from Srivatsa)

As the native code enables the interrupts at some point we need to be
able to service them. In other words a CPU must have valid IPI vectors
if it has been marked online.

It doesn't need to handle the IPI (interrupts may be disabled) but needs
to have valid IPI vectors because another CPU may find it in cpu_online_mask
and attempt to send it an IPI.

This patch will change the order of the Xen vCPU bring-up functions so that
Xen vectors have been set up before start_secondary() is called.
It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
to initialize it.

Orabug 13823853
Signed-off-by Chuck Anderson <chuck.anderson@oracle.com>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-08-20 10:13:05 -04:00
..
boot Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-04-30 08:34:07 -07:00
configs x86: Default to ARCH=x86 to avoid overriding CONFIG_64BIT 2012-12-20 14:37:18 -08:00
crypto crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher 2013-04-25 21:09:07 +08:00
ia32 unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
include Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux 2013-05-11 15:23:17 -07:00
kernel x86: Fix bit corruption at CPU resume time 2013-05-20 11:36:03 -07:00
kvm KVM: emulator: emulate SALC 2013-05-09 13:15:08 +03:00
lguest Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-04-30 08:41:21 -07:00
lib Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS 2013-04-30 17:04:09 -07:00
math-emu
mm x86/mm: Add missing comments for initial kernel direct mapping 2013-05-10 12:00:35 +02:00
net filter: bpf_jit_comp: refactor and unify BPF JIT image dump output 2013-03-21 17:25:56 -04:00
oprofile
pci Bug-fixes: 2013-05-11 16:19:30 -07:00
platform Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
power x86, gdt, hibernate: Store/load GDT for hibernate path. 2013-05-02 11:27:35 -07:00
realmode x86, relocs: Refactor the relocs tool to merge 32- and 64-bit ELF 2013-04-16 16:02:58 -07:00
syscalls unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
tools x86, relocs: Refactor the relocs tool to merge 32- and 64-bit ELF 2013-04-16 16:02:58 -07:00
um make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
vdso timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet() 2013-02-15 12:13:18 +01:00
video
xen xen/smp: initialize IPI vectors before marking CPU online 2013-08-20 10:13:05 -04:00
.gitignore
Kbuild
Kconfig Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-05-15 14:05:17 -07:00
Kconfig.cpu x86, 386 removal: Document Nx586 as a 386 and thus unsupported 2012-11-29 13:28:39 -08:00
Kconfig.debug Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS 2013-04-30 17:04:09 -07:00
Makefile Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-19 19:12:03 -08:00
Makefile_32.cpu x86, 386 removal: Remove CONFIG_M386 from Kconfig 2012-11-29 13:23:01 -08:00
Makefile.um