linux/mm
Waiman Long 64977918c2 mm/kmemleak: skip unlikely objects in kmemleak_scan() without taking lock
There are 3 RCU-based object iteration loops in kmemleak_scan().  Because
of the need to take RCU read lock, we can't insert cond_resched() into the
loop like other parts of the function.  As there can be millions of
objects to be scanned, it takes a while to iterate all of them.  The
kmemleak functionality is usually enabled in a debug kernel which is much
slower than a non-debug kernel.  With sufficient number of kmemleak
objects, the time to iterate them all may exceed 22s causing soft lockup.

  watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kmemleak:625]

In this particular bug report, the soft lockup happen in the 2nd iteration
loop.

In the 2nd and 3rd loops, most of the objects are checked and then skipped
under the object lock.  Only a selected fews are modified.  Those objects
certainly need lock protection.  However, the lock/unlock operation is
slow especially with interrupt disabling and enabling included.

We can actually do some basic check like color_white() without taking the
lock and skip the object accordingly.  Of course, this kind of check is
racy and may miss objects that are being modified concurrently.  The cost
of missed objects, however, is just that they will be discovered in the
next scan instead.  The advantage of doing so is that iteration can be
done much faster especially with LOCKDEP enabled in a debug kernel.

With a debug kernel running on a 2-socket 96-thread x86-64 system
(HZ=1000), the 2nd and 3rd iteration loops speedup with this patch on the
first kmemleak_scan() call after bootup is shown in the table below.

                   Before patch                    After patch
  Loop #    # of objects  Elapsed time     # of objects  Elapsed time
  ------    ------------  ------------     ------------  ------------
    2        2,599,850      2.392s          2,596,364       0.266s
    3        2,600,176      2.171s          2,597,061       0.260s

This patch reduces loop iteration times by about 88%.  This will greatly
reduce the chance of a soft lockup happening in the 2nd or 3rd iteration
loops.

Even though the first loop runs a little bit faster, it can still be
problematic if many kmemleak objects are there.  As the object count has
to be modified in every object, we cannot avoid taking the object lock. 
So other way to prevent soft lockup will be needed.

Link: https://lkml.kernel.org/r/20220614220359.59282-3-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16 19:48:32 -07:00
..
damon mm: damon: use HPAGE_PMD_SIZE 2022-05-19 14:08:55 -07:00
kasan mm: kasan: fix input of vmalloc_to_page() 2022-05-27 09:33:46 -07:00
kfence Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
backing-dev.c blk-cgroup: remove unneeded includes from <linux/blk-cgroup.h> 2022-05-02 14:06:20 -06:00
balloon_compaction.c mm/balloon_compaction: make balloon page compaction callbacks static 2022-03-28 16:52:57 -04:00
bootmem_info.c
cma_debug.c
cma_sysfs.c
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-05-13 15:11:26 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c mm, compaction: fast_find_migrateblock() should return pfn in the target zone 2022-05-13 16:48:57 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: add tests for __HAVE_ARCH_PTE_SWP_EXCLUSIVE 2022-05-09 18:20:45 -07:00
debug.c mm: unexport page_init_poison 2022-03-24 19:06:45 -07:00
dmapool.c
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
failslab.c mm: fix missing handler for __GFP_NOWARN 2022-05-19 14:08:55 -07:00
filemap.c filemap: Cache the value of vm_flags 2022-06-09 16:24:25 -04:00
folio-compat.c fs: Remove aop flags parameter from grab_cache_page_write_begin() 2022-05-08 14:28:19 -04:00
frontswap.c
gup_test.c
gup_test.h
gup.c mm: avoid unnecessary page fault retires on shared memory types 2022-06-16 19:48:27 -07:00
highmem.c highmem: fix checks in __kmap_local_sched_{in,out} 2022-04-08 14:20:36 -10:00
hmm.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
huge_memory.c mm/huge_memory: Fix xarray node memory leak 2022-06-09 16:24:25 -04:00
hugetlb_cgroup.c
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2022-06-01 15:57:16 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* 2022-04-28 23:16:15 -07:00
hugetlb.c delayacct: track delays from write-protect copy 2022-06-01 15:55:25 -07:00
hwpoison-inject.c mm/hwpoison: disable hwpoison filter during removing 2022-05-13 07:20:19 -07:00
init-mm.c
internal.h mm: split free page with properly free memory accounting and without race 2022-05-27 09:33:43 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig mm: Kconfig: reorganize misplaced mm options 2022-05-27 09:33:47 -07:00
Kconfig.debug Two followon fixes for the post-5.19 series "Use pageblock_order for cma 2022-05-27 11:40:49 -07:00
khugepaged.c mm: khugepaged: introduce khugepaged_enter_vma() helper 2022-05-19 14:08:50 -07:00
kmemleak.c mm/kmemleak: skip unlikely objects in kmemleak_scan() without taking lock 2022-06-16 19:48:32 -07:00
ksm.c ksm: fix typo in comment 2022-05-25 10:47:48 -07:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c mm: filter out swapin error entry in shmem mapping 2022-05-27 09:33:46 -07:00
Makefile mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* 2022-04-28 23:16:15 -07:00
mapping_dirty_helpers.c
memblock.c mm: kmemleak: remove kmemleak_not_leak_phys() and the min_count argument to kmemleak_alloc_phys() 2022-06-16 19:48:30 -07:00
memcontrol.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 11:08:32 -08:00
memory_hotplug.c mm/memory_hotplug: drop 'reason' argument from check_pfn_span() 2022-06-16 19:48:28 -07:00
memory-failure.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
memory.c mm: avoid unnecessary page fault retires on shared memory types 2022-06-16 19:48:27 -07:00
mempolicy.c mm/mempolicy: fix uninit-value in mpol_rebind_policy() 2022-05-19 14:08:54 -07:00
mempool.c mm/mempool: use might_alloc() 2022-06-16 19:48:30 -07:00
memremap.c mm/memremap: fix memunmap_pages() race with get_dev_pagemap() 2022-06-16 19:48:31 -07:00
memtest.c
migrate_device.c mm: remember exclusively mapped anonymous pages with PG_anon_exclusive 2022-05-09 18:20:44 -07:00
migrate.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
mincore.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
mlock.c mm/munlock: protect the per-CPU pagevec by a local_lock_t 2022-04-01 11:46:09 -07:00
mm_init.c
mmap_lock.c
mmap.c powerpc updates for 5.19 2022-05-28 11:27:17 -07:00
mmu_gather.c mm/mmu_gather: limit free batch count and add schedule point in tlb_batch_pages_flush 2022-04-28 23:16:12 -07:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mprotect.c mm/hugetlb: handle UFFDIO_WRITEPROTECT 2022-05-13 07:20:11 -07:00
mremap.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
msync.c
nommu.c no-MMU: expose vmalloc_huge() for alloc_large_system_hash() 2022-04-25 10:11:49 -07:00
oom_kill.c mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery 2022-06-01 15:57:16 -07:00
page_alloc.c mm/page_alloc: use might_alloc() 2022-06-16 19:48:29 -07:00
page_counter.c
page_ext.c mm: use for_each_online_node and node_online instead of open coding 2022-04-29 14:36:58 -07:00
page_idle.c mm: don't be stuck to rmap lock on reclaim path 2022-05-19 14:08:54 -07:00
page_io.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
page_isolation.c mm: page_isolation: use compound_nr() correctly in isolate_single_pageblock() 2022-06-01 15:57:16 -07:00
page_owner.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c Six hotfixes. One from Miaohe Lin is considered a minor thing so it isn't 2022-05-27 11:29:35 -07:00
page_vma_mapped.c mm: pvmw: add support for walking devmap pages 2022-04-28 23:16:10 -07:00
page-writeback.c sysctl changes for v5.19-rc1 2022-05-26 16:57:20 -07:00
pagewalk.c
percpu-internal.h percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
percpu-km.c
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c
percpu.c percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
pgalloc-track.h
pgtable-generic.c mm: avoid unnecessary flush on change_huge_pmd() 2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c mm: sparsemem: use page table lock to protect kernel pmd operations 2022-03-22 15:57:08 -07:00
readahead.c filemap: Don't release a locked folio 2022-06-09 16:24:25 -04:00
rmap.c mm: don't be stuck to rmap lock on reclaim path 2022-05-19 14:08:54 -07:00
rodata_test.c
secretmem.c secretmem: Convert to free_folio 2022-05-09 23:12:53 -04:00
shmem.c mm/shmem.c: clean up comment of shmem_swapin_folio 2022-06-16 19:48:28 -07:00
shuffle.c
shuffle.h
slab_common.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
slab.c mm/slab: delete cache_alloc_debugcheck_before() 2022-06-16 19:48:29 -07:00
slab.h slab changes for 5.19 2022-05-25 10:24:04 -07:00
slob.c mm: make minimum slab alignment a runtime property 2022-05-13 07:20:07 -07:00
slub.c slab changes for 5.19 2022-05-25 10:24:04 -07:00
sparse-vmemmap.c mm/sparse-vmemmap.c: remove unwanted initialization in vmemmap_populate_compound_pages() 2022-06-16 19:48:32 -07:00
sparse.c mm/memory-failure.c: move clear_hwpoisoned_pages 2022-05-13 07:20:19 -07:00
swap_cgroup.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
swap_slots.c mm/swap: remove buggy cache->nr check in refill_swap_slots_cache 2022-05-19 14:08:51 -07:00
swap_state.c mm: filter out swapin error entry in shmem mapping 2022-05-27 09:33:46 -07:00
swap.c mm/swap: fix the comment of get_kernel_pages 2022-05-19 14:08:52 -07:00
swap.h swap: convert add_to_swap() to take a folio 2022-05-13 07:20:15 -07:00
swapfile.c Two followon fixes for the post-5.19 series "Use pageblock_order for cma 2022-05-27 11:40:49 -07:00
truncate.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
usercopy.c mm: usercopy: move the virt_addr_valid() below the is_vmalloc_addr() 2022-05-16 16:02:21 -07:00
userfaultfd.c mm/uffd: enable write protection for shmem & hugetlbfs 2022-05-13 07:20:11 -07:00
util.c powerpc updates for 5.19 2022-05-28 11:27:17 -07:00
vmacache.c
vmalloc.c mm/vmalloc: add code comment for find_vmap_area_exceed_addr() 2022-06-16 19:48:29 -07:00
vmpressure.c
vmscan.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
vmstat.c Bitmap patches for 5.19-rc1 2022-06-04 14:04:27 -07:00
workingset.c memcg: sync flush only if periodic flush is delayed 2022-04-21 20:01:09 -07:00
z3fold.c mm/z3fold: fix z3fold_page_migrate races with z3fold_map 2022-05-27 09:33:44 -07:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-05-13 15:11:26 -07:00
zswap.c zswap: memcg accounting 2022-05-19 14:08:53 -07:00