linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-06 03:51:48 +00:00

Michael Wang 9a0133613e power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update

Since v1:
    Edited the comment according to Srivatsa's suggestion.

During the testing, we encounter below WARN followed by Oops:

    WARNING: at kernel/sched/core.c:6218
    ...
    NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
    LR [c000000000101358] .build_sched_domains+0xec8/0x1200
    PACATMSCRATCH [800000000000f032]
    Call Trace:
    [c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
    [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
    [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
    [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
    ...
    Oops: Kernel access of bad area, sig: 11 [#1]
    ...
    NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
    LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
    PACATMSCRATCH [8000000000029032]
    Call Trace:
    [c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
    [c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
    [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
    [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
    [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
    ...

This was caused by that 'sd->groups == NULL' after building groups, which was caused by the empty 'sd->span'.

The cpu's domain contained nothing because the cpu was assigned to a wrong node, due to the following unfortunate sequence of events:

1. The hypervisor sent a topology update to the guest OS, to notify changes to the cpu-node mapping. However, the update was actually redundant - i.e., the "new" mapping was exactly the same as the old one.

2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting the 'for-loop' in arch_update_cpu_topology().

3. So we ended up calling stop-machine() with an empty cpumask list, which made stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as the cpu to run the payload (the update_cpu_topology() function).

4. This causes update_cpu_topology() to be run by CPU0. And since 'updates' is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology() finds update->cpu as well as update->new_nid to be 0. In other words, we end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly.

Along with the following wrong updating, it causes the sched-domain rebuild code to break and crash the system.

Fix this by skipping the topology update in cases where we find that the topology has not actually changed in reality (ie., spurious updates).

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Robert Jennings <rcj@linux.vnet.ibm.com>
CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
CC: Alistair Popple <alistair@popple.id.au>
Suggested-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

2014-04-09 12:54:17 +10:00
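
The failure sequence and the fix described in the commit message above can be modelled in user space. The following is only a sketch under stated assumptions, not the kernel code: NR_CPUS, cpu_node, set_mapping(), node_of(), update_topology() and the skip_spurious flag are hypothetical names made up for the illustration, a plain bitmask stands in for the 'updated_cpus' cpumask, and a memset() stands in for kzalloc(). The "payload" loop imitates stop-machine running the update once on the first online cpu when the mask is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 4                       /* hypothetical toy system size */

/* Hypothetical stand-in for one pending cpu-to-node change. */
struct topology_update {
    int cpu;
    int new_nid;
};

static int cpu_node[NR_CPUS];           /* toy cpu -> node table */

/* Install a known cpu -> node mapping. */
void set_mapping(const int *nids)
{
    memcpy(cpu_node, nids, sizeof(cpu_node));
}

/* Look up the node a cpu is currently assigned to. */
int node_of(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return cpu_node[cpu];
}

/*
 * Apply the mapping reported by the (simulated) hypervisor.
 * Returns 1 if the caller should rebuild sched domains, 0 otherwise.
 * skip_spurious == true models the fix; false models the old behaviour.
 */
int update_topology(const int *reported, bool skip_spurious)
{
    struct topology_update updates[NR_CPUS];
    unsigned int updated_cpus = 0;      /* bitmask standing in for a cpumask */
    int n = 0;

    memset(updates, 0, sizeof(updates)); /* zero-filled, like kzalloc() */

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (reported[cpu] == cpu_node[cpu])
            continue;                   /* redundant entry, nothing to record */
        updates[n].cpu = cpu;
        updates[n].new_nid = reported[cpu];
        n++;
        updated_cpus |= 1u << cpu;
    }

    /* The fix: a spurious update changes nothing, so stop here. */
    if (updated_cpus == 0 && skip_spurious)
        return 0;

    /*
     * Payload model: with an empty mask the payload still runs once
     * and reads the zero-filled first entry (cpu 0, node 0).
     */
    int runs = n > 0 ? n : 1;
    for (int i = 0; i < runs; i++)
        cpu_node[updates[i].cpu] = updates[i].new_nid;

    return 1;
}
```

Calling update_topology() with a reported mapping identical to the current one and skip_spurious set to false moves CPU0 to node 0 in this model even though nothing changed; with skip_spurious set to true the function returns early and the table is left alone.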
..
boot	powerpc: T4240: Add ina220 node in dts	2014-03-19 16:57:25 -05:00
configs	Merge remote-tracking branch 'scott/next' into next	2014-03-24 10:26:10 +11:00
crypto	powerpc: Fix compile of sha1-powerpc-asm.S on 32-bit	2013-03-05 16:56:26 +11:00
include	powerpc/opal: Add missing include	2014-04-09 12:53:36 +10:00
kernel	powerpc: Add lq/stq emulation	2014-04-09 12:53:28 +10:00
kvm	Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm	2014-04-02 14:50:10 -07:00
lib	selftests/powerpc: Import Anton's memcpy / copy_tofrom_user tests	2014-03-07 15:53:12 +11:00
math-emu	powerpc: Correct emulated mtfsf instruction	2014-04-07 10:33:11 +10:00
mm	power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update	2014-04-09 12:54:17 +10:00
net	net: filter: add jited flag to indicate jit compiled filters	2014-03-31 00:45:08 -04:00
oprofile	cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE|RESUMECHANGE}	2014-03-19 14:10:24 +01:00
perf	powerpc/perf: Fix handling of L3 events with bank == 1	2014-03-24 09:48:33 +11:00
platforms	arch/powerpc: Use RCU_INIT_POINTER(x, NULL) in platforms/cell/spu_syscalls.c	2014-04-09 12:53:40 +10:00
sysdev	powerpc: Use of_node_init() for the fakenode in msi_bitmap.c	2014-04-09 12:53:07 +10:00
xmon	powerpc: Fix xmon disassembler for little-endian	2014-03-07 15:50:12 +11:00
Kconfig	Devicetree changes for v3.15	2014-04-02 14:27:15 -07:00
Kconfig.debug	Merge branch 'kconfig-diet' from Dave Hansen	2013-07-04 11:25:51 -07:00
Makefile	powerpc/le: Avoid creatng R_PPC64_TOCSAVE relocations for modules.	2014-04-09 12:53:44 +10:00
relocs_check.pl	Fix warning typo "CONFIG_RELCOATABLE"	2013-05-29 15:11:30 +02:00