linux

Michal Hocko 46a3679b81 mm, memory_hotplug: do not clear numa_node association after hot_remove

Per-cpu numa_node provides a default node for each possible cpu. The
association gets initialized during the boot when the architecture
specific code explores cpu->NUMA affinity. When the whole NUMA node is
removed though we are clearing this association

try_offline_node
  check_and_unmap_cpu_on_node
    unmap_cpu_on_node
      numa_clear_node
        numa_set_node(cpu, NUMA_NO_NODE)

This means that whoever calls cpu_to_node for a cpu associated with such
a node will get NUMA_NO_NODE. This is problematic for two reasons.
First it is fragile because __alloc_pages_node would simply blow up on
an out-of-bound access. We have encountered this when loading kvm
module

  BUG: unable to handle kernel paging request at 00000000000021c0
  IP: __alloc_pages_nodemask+0x93/0xb70
  PGD 800000ffe853e067 PUD 7336bbc067 PMD 0
  Oops: 0000 [#1] SMP
  [...]
  CPU: 88 PID: 1223749 Comm: modprobe Tainted: G W 4.4.156-94.64-default #1
  RIP: __alloc_pages_nodemask+0x93/0xb70
  RSP: 0018:ffff887354493b40 EFLAGS: 00010202
  RAX: 00000000000021c0 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00000000014000c0
  RBP: 00000000014000c0 R08: ffffffffffffffff R09: 0000000000000000
  R10: ffff88fffc89e790 R11: 0000000000014000 R12: 0000000000000101
  R13: ffffffffa0772cd4 R14: ffffffffa0769ac0 R15: 0000000000000000
  FS: 00007fdf2f2f1700(0000) GS:ffff88fffc880000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000021c0 CR3: 00000077205ee000 CR4: 0000000000360670
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   alloc_vmcs_cpu+0x3d/0x90 [kvm_intel]
   hardware_setup+0x781/0x849 [kvm_intel]
   kvm_arch_hardware_setup+0x28/0x190 [kvm]
   kvm_init+0x7c/0x2d0 [kvm]
   vmx_init+0x1e/0x32c [kvm_intel]
   do_one_initcall+0xca/0x1f0
   do_init_module+0x5a/0x1d7
   load_module+0x1393/0x1c90
   SYSC_finit_module+0x70/0xa0
   entry_SYSCALL_64_fastpath+0x1e/0xb7
  DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb7

on an older kernel but the code is basically the same in the current
Linus tree as well. alloc_vmcs_cpu could use alloc_pages_nodemask which
would recognize NUMA_NO_NODE and use alloc_pages_node which would
translate it to numa_mem_id but that is wrong as well because it would
use a cpu affinity of the local CPU which might be quite far from the
original node. It is also reasonable to expect that cpu_to_node will
provide a sane value and there might be many more callers like that.

The second problem is that __register_one_node relies on cpu_to_node to
properly associate cpus back to the node when it is onlined. We do not
want to lose that link as there is no arch independent way to get it
from the early boot time AFAICS.

Drop the whole check_and_unmap_cpu_on_node machinery and keep the
association to fix both issues. The NODE_DATA(nid) is not deallocated
so it will stay in place and if anybody wants to allocate from that node
then a fallback node will be used.

Thanks to Vlastimil Babka for his live system debugging skills that
helped debugging the issue.

Link: http://lkml.kernel.org/r/20181108100413.966-1-mhocko@kernel.org
Fixes: `e13fe8695c` ("cpu-hotplug,memory-hotplug: clear cpu_to_node() when offlining the node")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2018-12-28 12:11:47 -08:00

kasan	kasan: add SPDX-License-Identifier mark to source files	2018-12-28 12:11:44 -08:00
backing-dev.c	blkcg: delay blkg destruction until after writeback has finished	2018-08-31 14:48:56 -06:00
balloon_compaction.c	virtio_balloon: fix deadlock on OOM	2017-11-14 23:57:38 +02:00
cleancache.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
cma_debug.c	mm/cma: remove unsupported gfp_mask parameter from cma_alloc()	2018-08-17 16:20:32 -07:00
cma.c	kasan, mm, arm64: tag non slab memory allocated via pagealloc	2018-12-28 12:11:44 -08:00
cma.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
compaction.c	psi: pressure stall information for CPU, memory, and IO	2018-10-26 16:26:32 -07:00
debug_page_ref.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
debug.c	mm: lower the printk loglevel for __dump_page messages	2018-12-28 12:11:46 -08:00
dmapool.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
early_ioremap.c	mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep	2017-12-11 14:54:44 +01:00
fadvise.c	vfs: implement readahead(2) using POSIX_FADV_WILLNEED	2018-08-30 20:01:32 +02:00
failslab.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
filemap.c	vfs: rework data cloning infrastructure	2018-11-02 09:33:08 -07:00
frame_vector.c	mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'	2017-12-14 16:00:48 -08:00
frontswap.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
gup_benchmark.c	mm/gup_benchmark.c: prevent integer overflow in ioctl	2018-10-31 08:54:12 -07:00
gup.c	mm/gup: finish consolidating error handling	2018-11-30 14:56:13 -08:00
highmem.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
hmm.c	mm/hmm: invalidate device page table at start of invalidation	2018-10-31 08:54:12 -07:00
huge_memory.c	mm: thp: fix flags for pmd migration when split	2018-12-21 14:51:18 -08:00
hugetlb_cgroup.c	mm: rename page_counter's count/limit into usage/max	2018-06-07 17:34:35 -07:00
hugetlb.c	hugetlbfs: call VM_BUG_ON_PAGE earlier in free_huge_page()	2018-12-14 15:05:45 -08:00
hwpoison-inject.c	mm/memory_failure: Remove unused trapno from memory_failure	2018-01-23 12:17:42 -06:00
init-mm.c	mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids	2018-07-17 09:35:30 +02:00
internal.h	memblock: rename __free_pages_bootmem to memblock_free_pages	2018-10-31 08:54:16 -07:00
interval_tree.c	mm/interval_tree.c: use vma_pages() helper	2018-01-31 17:18:37 -08:00
Kconfig	ksm: replace jhash2 with xxhash	2018-12-28 12:11:46 -08:00
Kconfig.debug	mm: clarify CONFIG_PAGE_POISONING and usage	2018-08-22 10:52:44 -07:00
khugepaged.c	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu	2018-12-04 07:52:30 +01:00
kmemleak-test.c
kmemleak.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
ksm.c	ksm: replace jhash2 with xxhash	2018-12-28 12:11:46 -08:00
list_lru.c	mm/list_lru: introduce list_lru_shrink_walk_irq()	2018-08-17 16:20:32 -07:00
maccess.c	x86/fault: BUG() when uaccess helpers fault on kernel addresses	2018-09-03 15:12:09 +02:00
madvise.c	Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax	2018-10-28 11:35:40 -07:00
Makefile	mm: remove nobootmem	2018-10-31 08:54:16 -07:00
memblock.c	memblock: annotate memblock_is_reserved() with __init_memblock	2018-12-14 15:05:45 -08:00
memcontrol.c	mm: handle no memcg case in memcg_kmem_charge() properly	2018-11-03 10:09:37 -07:00
memfd.c	memfd: Convert memfd_tag_pins to XArray	2018-10-21 10:46:41 -04:00
memory_hotplug.c	mm, memory_hotplug: do not clear numa_node association after hot_remove	2018-12-28 12:11:47 -08:00
memory-failure.c	dax: Fix unlock mismatch with updated API	2018-12-04 21:32:00 -08:00
memory.c	mm: Fix warning in insert_pfn()	2018-10-31 08:54:17 -07:00
mempolicy.c	Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"	2018-12-08 10:26:20 -08:00
mempool.c	mm/mempool.c: add missing parameter description	2018-08-22 10:52:44 -07:00
memtest.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
migrate.c	Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax	2018-10-28 11:35:40 -07:00
mincore.c	xarray: Replace exceptional entries	2018-09-29 22:47:49 -04:00
mlock.c	dax: remove VM_MIXEDMAP for fsdax and device dax	2018-08-17 16:20:27 -07:00
mm_init.c	mm: access zone->node via zone_to_nid() and zone_set_nid()	2018-08-22 10:52:45 -07:00
mmap.c	mm/mmap.c: remove verify_mm_writelocked()	2018-12-28 12:11:47 -08:00
mmu_context.c
mmu_gather.c	mm: Replace call_rcu_sched() with call_rcu()	2018-11-27 09:21:46 -08:00
mmu_notifier.c	mm/mmu_notifier.c: remove mmu_notifier_synchronize()	2018-12-28 12:11:46 -08:00
mmzone.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mprotect.c	x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings	2018-06-20 19:10:01 +02:00
mremap.c	mm: mremap: downgrade mmap_sem to read when shrinking	2018-10-26 16:26:35 -07:00
msync.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
nommu.c	mm/gup: cache dev_pagemap while pinning pages	2018-10-26 16:38:15 -07:00
oom_kill.c	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2018-10-24 11:22:39 +01:00
page_alloc.c	mm: only report isolation failures when offlining memory	2018-12-28 12:11:46 -08:00
page_counter.c	memcg: introduce memory.min	2018-06-07 17:34:36 -07:00
page_ext.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
page_idle.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
page_io.c	for-linus-20181102	2018-11-02 11:25:48 -07:00
page_isolation.c	mm: only report isolation failures when offlining memory	2018-12-28 12:11:46 -08:00
page_owner.c	mm/page_owner: clamp read count to PAGE_SIZE	2018-12-28 12:11:46 -08:00
page_poison.c	virtio, vhost: fixes, tweaks	2018-11-01 14:42:49 -07:00
page_vma_mapped.c	mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly	2018-10-31 08:54:11 -07:00
page-writeback.c	Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax	2018-10-28 11:35:40 -07:00
pagewalk.c	mm: kernel-doc: add missing parameter descriptions	2018-04-05 21:36:27 -07:00
percpu-internal.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
percpu-km.c	percpu: allow select gfp to be passed to underlying allocators	2018-02-18 05:33:01 -08:00
percpu-stats.c	treewide: Use array_size() in vmalloc()	2018-06-12 16:19:22 -07:00
percpu-vm.c	percpu: allow select gfp to be passed to underlying allocators	2018-02-18 05:33:01 -08:00
percpu.c	Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu	2018-11-01 09:27:57 -07:00
pgtable-generic.c	x86/mm: Page size aware flush_tlb_mm_range()	2018-10-09 16:51:11 +02:00
process_vm_access.c	mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors	2018-02-06 18:32:48 -08:00
quicklist.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
readahead.c	mm/readahead.c: simplify get_next_ra_size()	2018-12-28 12:11:46 -08:00
rmap.c	mm/huge_memory: rename freeze_page() to unmap_page()	2018-11-30 14:56:14 -08:00
rodata_test.c	mm: fix RODATA_TEST failure "rodata_test: test data was not read only"	2017-10-03 17:54:24 -07:00
shmem.c	drm pull request for 4.21-rc1	2018-12-25 11:48:26 -08:00
slab_common.c	mm, slab: remove unnecessary unlikely()	2018-12-28 12:11:46 -08:00
slab.c	kasan, mm, arm64: tag non slab memory allocated via pagealloc	2018-12-28 12:11:44 -08:00
slab.h	kasan, mm: change hooks signatures	2018-12-28 12:11:43 -08:00
slob.c	slab: __GFP_ZERO is incompatible with a constructor	2018-06-07 17:34:34 -07:00
slub.c	mm/slub.c: record final state of slub action in deactivate_slab()	2018-12-28 12:11:46 -08:00
sparse-vmemmap.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
sparse.c	mm/hotplug: optimize clear_hwpoisoned_pages()	2018-12-28 12:11:46 -08:00
swap_cgroup.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
swap_slots.c	mm, swap, get_swap_pages: use entry_size instead of cluster in parameter	2018-08-22 10:52:44 -07:00
swap_state.c	Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax	2018-10-28 11:35:40 -07:00
swap.c	mm: Replace spin_is_locked() with lockdep	2018-11-12 09:06:22 -08:00
swapfile.c	mm/swapfile.c: use kvzalloc for swap_info_struct allocation	2018-11-18 10:15:09 -08:00
truncate.c	mm: cleancache: fix corruption on missed inode invalidation	2018-11-30 14:56:14 -08:00
usercopy.c	usercopy: Allow boot cmdline disabling of hardening	2018-07-04 08:04:52 -07:00
userfaultfd.c	userfaultfd: shmem: add i_size checks	2018-11-30 14:56:14 -08:00
util.c	kvfree(): fix misleading comment	2018-10-26 16:26:33 -07:00
vmacache.c	mm: get rid of vmacache_flush_all() entirely	2018-09-13 15:18:04 -10:00
vmalloc.c	vfree: add debug might_sleep()	2018-10-26 16:26:33 -07:00
vmpressure.c	mm/vmpressure.c: convert to use match_string() helper	2018-06-07 17:34:36 -07:00
vmscan.c	Merge drm/drm-next into drm-intel-next-queued	2018-11-20 13:14:08 +02:00
vmstat.c	mm/vmstat.c: fix NUMA statistics updates	2018-11-18 10:15:10 -08:00
workingset.c	Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax	2018-10-28 11:35:40 -07:00
z3fold.c	z3fold: fix possible reclaim races	2018-11-18 10:15:09 -08:00
zbud.c	mm: docs: fix parameter names mismatch	2018-02-06 18:32:48 -08:00
zpool.c	mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc	2018-02-21 15:35:43 -08:00
zsmalloc.c	mm/zsmalloc.c: fix fall-through annotation	2018-10-26 16:26:35 -07:00
zswap.c	zswap: re-check zswap_is_full() after do zswap_shrink()	2018-07-26 19:38:03 -07:00