linux

Michael Wang 9a0133613e power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update		2014-04-09 12:54:17 +10:00

Since v1:
	Edited the comment according to Srivatsa's suggestion.

During the testing, we encounter below WARN followed by Oops:

WARNING: at kernel/sched/core.c:6218
...
NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
LR [c000000000101358] .build_sched_domains+0xec8/0x1200
PACATMSCRATCH [800000000000f032]
Call Trace:
[c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
...
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
PACATMSCRATCH [8000000000029032]
Call Trace:
[c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
[c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
...

This was caused by that 'sd->groups == NULL' after building groups, which was caused by the empty 'sd->span'.

The cpu's domain contained nothing because the cpu was assigned to a wrong node, due to the following unfortunate sequence of events:

1. The hypervisor sent a topology update to the guest OS, to notify changes to the cpu-node mapping. However, the update was actually redundant - i.e., the "new" mapping was exactly the same as the old one.

2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting the 'for-loop' in arch_update_cpu_topology().

3. So we ended up calling stop-machine() with an empty cpumask list, which made stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as the cpu to run the payload (the update_cpu_topology() function).

4. This causes update_cpu_topology() to be run by CPU0. And since 'updates' is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology() finds update->cpu as well as update->new_nid to be 0. In other words, we end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly.

Along with the following wrong updating, it causes the sched-domain rebuild code to break and crash the system.

Fix this by skipping the topology update in cases where we find that the topology has not actually changed in reality (ie., spurious updates).

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Robert Jennings <rcj@linux.vnet.ibm.com>
CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
CC: Alistair Popple <alistair@popple.id.au>
Suggested-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
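
The guard this commit message describes (skip the update when the set of changed CPUs is empty) can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: the names `topo_sketch`, `topo_change`, `mask_weight` and `update_topology_sketch` are hypothetical, a plain 32-bit mask stands in for the kernel's cpumask, and no stop-machine call is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8

/* Hypothetical stand-in for the cpu-node mapping. */
struct topo_sketch {
    int node_of_cpu[SKETCH_NR_CPUS];
};

/* One requested change, loosely modelled on the update->cpu and
 * update->new_nid fields mentioned in the commit message. */
struct topo_change {
    int cpu;
    int new_nid;
};

/* Count the set bits in the mask (the number of CPUs it names). */
static unsigned int mask_weight(uint32_t mask)
{
    unsigned int n = 0;

    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Returns 1 if the mapping changed, 0 if the request was spurious. */
int update_topology_sketch(struct topo_sketch *t,
                           const struct topo_change *req, size_t n)
{
    uint32_t updated_cpus = 0;
    size_t i;

    assert(t != NULL);

    /* Collect only the CPUs whose node assignment really differs. */
    for (i = 0; i < n; i++) {
        int cpu = req[i].cpu;

        if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
            continue;
        if (req[i].new_nid != t->node_of_cpu[cpu])
            updated_cpus |= (uint32_t)1 << cpu;
    }

    /* The guard: nothing changed, so do not run the update step at all.
     * Without it, the update step would act on zero-initialised data
     * and wrongly move CPU 0 to node 0. */
    if (mask_weight(updated_cpus) == 0)
        return 0;

    /* Apply the changes for the CPUs recorded in the mask. */
    for (i = 0; i < n; i++) {
        int cpu = req[i].cpu;

        if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
            continue;
        if (updated_cpus & ((uint32_t)1 << cpu))
            t->node_of_cpu[cpu] = req[i].new_nid;
    }
    return 1;
}
```

Calling `update_topology_sketch()` with a request that repeats the current mapping returns 0 and leaves the mapping untouched, which corresponds to the redundant hypervisor update in step 1 of the sequence above.
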
40x_mmu.c	memblock: Remove rmo_size, burry it in arch/powerpc where it belongs	2010-08-05 12:56:08 +10:00
44x_mmu.c	powerpc: Delete __cpuinit usage from all users	2013-07-01 11:10:36 +10:00
dma-noncoherent.c	mm/arch: use __free_reserved_page() to simplify the code	2013-11-13 12:09:03 +09:00
fault.c	arch: mm: pass userspace fault flag to generic fault handler	2013-09-12 15:38:01 -07:00
fsl_booke_mmu.c	powerpc/e6500: TLB miss handler with hardware tablewalk support	2014-01-09 17:52:19 -06:00
gup.c	powerpc/mm: NUMA pte should be handled via slow path in get_user_pages_fast()	2014-04-09 12:53:03 +10:00
hash_low_32.S	powerpc: Use CURRENT_THREAD_INFO instead of open coded assembly	2012-07-11 14:18:22 +10:00
hash_low_64.S	powerpc/mm: Free up _PAGE_COHERENCE for numa fault use later	2013-12-09 11:40:28 +11:00
hash_native_64.c	powerpc: Book 3S MMU little endian support	2013-10-11 16:48:26 +11:00
hash_utils_64.c	powerpc: Fix kdump hang issue on p8 with relocation on exception enabled.	2014-02-11 11:24:47 +11:00
highmem.c	mm: fix race in kunmap_atomic()	2010-10-27 18:03:05 -07:00
hugepage-hash64.c	powerpc/mm: Free up _PAGE_COHERENCE for numa fault use later	2013-12-09 11:40:28 +11:00
hugetlbpage-book3e.c	powerpc/fsl-book3e-64: Use paca for hugetlb TLB1 entry selection	2014-01-09 17:52:20 -06:00
hugetlbpage-hash64.c	powerpc/mm: Free up _PAGE_COHERENCE for numa fault use later	2013-12-09 11:40:28 +11:00
hugetlbpage.c	powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var	2014-01-29 17:02:26 +11:00
icswx_pid.c	powerpc: Split ICSWX ACOP and PID processing	2011-11-25 14:11:27 +11:00
icswx.c	powerpc: Fix typo "CONFIG_ICSWX_PID"	2013-04-18 13:03:54 +10:00
icswx.h	powerpc/icswx: Fix race condition with IPI setting ACOP	2012-03-07 17:06:09 +11:00
init_32.c	powerpc/8xx: Fixing memory init issue with CONFIG_PIN_TLB	2013-10-28 21:11:22 -05:00
init_64.c	powerpc: Prepare to support kernel handling of IOMMU map/unmap	2013-10-11 17:24:39 +11:00
Makefile	powerpc/THP: Add code to handle HPTE faults for hugepages	2013-06-21 16:01:56 +10:00
mem.c	powerpc/pseries: Use remove_memory() to remove memory	2014-03-07 15:53:13 +11:00
mmap.c	mm: remove free_area_cache	2013-07-10 18:11:34 -07:00
mmu_context_hash32.c	powerpc: include export.h for files using EXPORT_SYMBOL/THIS_MODULE	2011-10-31 19:30:38 -04:00
mmu_context_hash64.c	powerpc: Reduce PTE table memory wastage	2013-04-30 16:00:07 +10:00
mmu_context_nohash.c	powerpc: Delete __cpuinit usage from all users	2013-07-01 11:10:36 +10:00
mmu_decl.h	powerpc/fsl_booke: smp support for booting a relocatable kernel above 64M	2014-01-09 17:52:18 -06:00
numa.c	power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update	2014-04-09 12:54:17 +10:00
pgtable_32.c	powerpc: add barrier after writing kernel PTE	2014-01-09 17:52:19 -06:00
pgtable_64.c	powerpc/mm: Make sure a local_irq_disable prevent a parallel THP split	2014-03-24 09:48:34 +11:00
pgtable.c	powerpc: Delete non-required instances of include <linux/init.h>	2014-01-15 13:46:44 +11:00
ppc_mmu_32.c	memblock: Remove rmo_size, burry it in arch/powerpc where it belongs	2010-08-05 12:56:08 +10:00
slb_low.S	powerpc: Rename USER_ESID_BITS* to ESID_BITS*	2013-03-17 12:45:44 +11:00
slb.c	powerpc: Fix little endian lppaca, slb_shadow and dtl_entry	2013-08-14 15:33:35 +10:00
slice.c	powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space	2014-01-29 17:02:25 +11:00
stab.c	powerpc/mm: Remove uses of abs_to_virt() and virt_to_abs()	2012-09-05 15:19:31 +10:00
subpage-prot.c	powerpc/mm: Add new "set" flag argument to pte/pmd update function	2014-02-17 11:19:35 +11:00
tlb_hash32.c	powerpc: include export.h for files using EXPORT_SYMBOL/THIS_MODULE	2011-10-31 19:30:38 -04:00
tlb_hash64.c	powerpc: Delete non-required instances of include <linux/init.h>	2014-01-15 13:46:44 +11:00
tlb_low_64e.S	powerpc/booke64: Use SPRG_TLB_EXFRAME on bolted handlers	2014-03-19 19:57:15 -05:00
tlb_nohash_low.S	powerpc/fsl_booke: smp support for booting a relocatable kernel above 64M	2014-01-09 17:52:18 -06:00
tlb_nohash.c	powerpc/booke64: Critical and machine check exception support	2014-03-19 19:57:27 -05:00