linux/arch
Michael Wang 9a0133613e power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
Since v1:
	Edited the comment according to Srivatsa's suggestion.

During the testing, we encounter below WARN followed by Oops:

	WARNING: at kernel/sched/core.c:6218
	...
	NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
	LR [c000000000101358] .build_sched_domains+0xec8/0x1200
	PACATMSCRATCH [800000000000f032]
	Call Trace:
	[c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
	...
	Oops: Kernel access of bad area, sig: 11 [#1]
	...
	NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
	LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
	PACATMSCRATCH [8000000000029032]
	Call Trace:
	[c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
	[c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
	...

This was caused by that 'sd->groups == NULL' after building groups, which
was caused by the empty 'sd->span'.

The cpu's domain contained nothing because the cpu was assigned to a wrong
node, due to the following unfortunate sequence of events:

1. The hypervisor sent a topology update to the guest OS, to notify changes
   to the cpu-node mapping. However, the update was actually redundant - i.e.,
   the "new" mapping was exactly the same as the old one.

2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting
   the 'for-loop' in arch_update_cpu_topology().

3. So we ended up calling stop-machine() with an empty cpumask list, which made
   stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as
   the cpu to run the payload (the update_cpu_topology() function).

4. This causes update_cpu_topology() to be run by CPU0. And since 'updates'
   is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology()
   finds update->cpu as well as update->new_nid to be 0. In other words, we
   end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly.

Along with the following wrong updating, it causes the sched-domain rebuild
code to break and crash the system.

Fix this by skipping the topology update in cases where we find that
the topology has not actually changed in reality (ie., spurious updates).

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Robert Jennings <rcj@linux.vnet.ibm.com>
CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
CC: Alistair Popple <alistair@popple.id.au>
Suggested-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-09 12:54:17 +10:00
..
alpha PCI changes for the v3.15 merge window: 2014-04-01 15:14:04 -07:00
arc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-04-02 16:23:38 -07:00
arm IOMMU Upates for Linux v3.15 2014-04-05 18:46:26 -07:00
arm64 Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm 2014-04-02 14:50:10 -07:00
avr32 Char/Misc driver patches for 3.15-rc1 2014-04-01 16:13:21 -07:00
blackfin Most of the changes were largely clean ups, and some documentation. 2014-04-03 10:26:31 -07:00
c6x Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 10:59:39 -07:00
cris Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-04-02 16:23:38 -07:00
frv PCI changes for the v3.15 merge window: 2014-04-01 15:14:04 -07:00
hexagon Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel 2014-04-05 13:19:32 -07:00
ia64 Most of the changes were largely clean ups, and some documentation. 2014-04-03 10:26:31 -07:00
m32r arch: Remove stub cputime.h headers 2014-03-13 16:09:30 +01:00
m68k Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu 2014-04-05 13:18:52 -07:00
metag Most of the changes were largely clean ups, and some documentation. 2014-04-03 10:26:31 -07:00
microblaze Most of the changes were largely clean ups, and some documentation. 2014-04-03 10:26:31 -07:00
mips Most of the changes were largely clean ups, and some documentation. 2014-04-03 10:26:31 -07:00
mn10300 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:22:57 -07:00
openrisc locking/mcs: Allow architecture specific asm files to be used for contended case 2014-02-09 21:18:52 +01:00
parisc Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:22:57 -07:00
powerpc power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update 2014-04-09 12:54:17 +10:00
s390 Most of the changes were largely clean ups, and some documentation. 2014-04-03 10:26:31 -07:00
score score: remove unused CPU_SCORE7 Kconfig parameter 2014-04-03 16:20:52 -07:00
sh Merge branch 'akpm' (incoming from Andrew) 2014-04-03 16:22:16 -07:00
sparc Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-04-03 13:05:42 -07:00
tile Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile 2014-04-06 08:11:57 -07:00
um Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-04-02 20:53:45 -07:00
unicore32 locking/mcs: Allow architecture specific asm files to be used for contended case 2014-02-09 21:18:52 +01:00
x86 A number of cleanups plus support for the RDSEED instruction, which 2014-04-04 21:25:28 -07:00
xtensa Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:22:57 -07:00
.gitignore
Kconfig uprobes: Kconfig dependency fix 2014-03-18 16:39:33 -04:00