linux/mm
Joonsoo Kim b03a017beb mm/slab: introduce new slab management type, OBJFREELIST_SLAB
SLAB needs an array to manage freed objects in a slab.  It is only used
if some objects are freed so we can use free object itself as this
array.  This requires additional branch in somewhat critical lock path
to check if it is first freed object or not but that's all we need.
Benefits is that we can save extra memory usage and reduce some
computational overhead by allocating a management array when new slab is
created.

Code change is rather complex than what we can expect from the idea, in
order to handle debugging feature efficiently.  If you want to see core
idea only, please remove '#if DEBUG' block in the patch.

Although this idea can apply to all caches whose size is larger than
management array size, it isn't applied to caches which have a
constructor.  If such cache's object is used for management array,
constructor should be called for it before that object is returned to
user.  I guess that overhead overwhelm benefit in that case so this idea
doesn't applied to them at least now.

For summary, from now on, slab management type is determined by
following logic.

1) if management array size is smaller than object size and no ctor, it
   becomes OBJFREELIST_SLAB.

2) if management array size is smaller than leftover, it becomes
   NORMAL_SLAB which uses leftover as a array.

3) if OFF_SLAB help to save memory than way 4), it becomes OFF_SLAB.
   It allocate a management array from the other cache so memory waste
   happens.

4) others become NORMAL_SLAB.  It uses dedicated internal memory in a
   slab as a management array so it causes memory waste.

In my system, without enabling CONFIG_DEBUG_SLAB, Almost caches become
OBJFREELIST_SLAB and NORMAL_SLAB (using leftover) which doesn't waste
memory.  Following is the result of number of caches with specific slab
management type.

TOTAL = OBJFREELIST + NORMAL(leftover) + NORMAL + OFF

/Before/
126 = 0 + 60 + 25 + 41

/After/
126 = 97 + 12 + 15 + 2

Result shows that number of caches that doesn't waste memory increase
from 60 to 109.

I did some benchmarking and it looks that benefit are more than loss.

Kmalloc: Repeatedly allocate then free test

/Before/
[    0.286809] 1. Kmalloc: Repeatedly allocate then free test
[    1.143674] 100000 times kmalloc(32) -> 116 cycles kfree -> 78 cycles
[    1.441726] 100000 times kmalloc(64) -> 121 cycles kfree -> 80 cycles
[    1.815734] 100000 times kmalloc(128) -> 168 cycles kfree -> 85 cycles
[    2.380709] 100000 times kmalloc(256) -> 287 cycles kfree -> 95 cycles
[    3.101153] 100000 times kmalloc(512) -> 370 cycles kfree -> 117 cycles
[    3.942432] 100000 times kmalloc(1024) -> 413 cycles kfree -> 156 cycles
[    5.227396] 100000 times kmalloc(2048) -> 622 cycles kfree -> 248 cycles
[    7.519793] 100000 times kmalloc(4096) -> 1102 cycles kfree -> 452 cycles

/After/
[    1.205313] 100000 times kmalloc(32) -> 117 cycles kfree -> 78 cycles
[    1.510526] 100000 times kmalloc(64) -> 124 cycles kfree -> 81 cycles
[    1.827382] 100000 times kmalloc(128) -> 130 cycles kfree -> 84 cycles
[    2.226073] 100000 times kmalloc(256) -> 177 cycles kfree -> 92 cycles
[    2.814747] 100000 times kmalloc(512) -> 286 cycles kfree -> 112 cycles
[    3.532952] 100000 times kmalloc(1024) -> 344 cycles kfree -> 141 cycles
[    4.608777] 100000 times kmalloc(2048) -> 519 cycles kfree -> 210 cycles
[    6.350105] 100000 times kmalloc(4096) -> 789 cycles kfree -> 391 cycles

In fact, I tested another idea implementing OBJFREELIST_SLAB with
extendable linked array through another freed object.  It can remove
memory waste completely but it causes more computational overhead in
critical lock path and it seems that overhead outweigh benefit.  So, this
patch doesn't include it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
..
kasan kasan: add functions to clear stack poison 2016-03-09 15:43:42 -08:00
backing-dev.c mm/backing-dev.c: fix error path in wb_init() 2016-02-11 18:35:48 -08:00
balloon_compaction.c virtio_balloon: fix race between migration and ballooning 2016-01-12 20:47:06 +02:00
bootmem.c x86/mm: Introduce max_possible_pfn 2015-12-06 12:46:31 +01:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
cma.c mm/cma.c: suppress warning 2015-11-05 19:34:48 -08:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
compaction.c mm/compaction.c: __compact_pgdat() code cleanuup 2016-01-14 16:00:49 -08:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c mm: rework mapcount accounting to enable 4k mapping of THPs 2016-01-15 17:56:32 -08:00
dmapool.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
early_ioremap.c mm/early_ioremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
fadvise.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c mm: __delete_from_page_cache show Bad page if mapped 2016-03-09 15:43:42 -08:00
frame_vector.c mm: fix docbook comment for get_vaddr_frames() 2015-11-05 19:34:48 -08:00
frontswap.c frontswap: allow multiple backends 2015-06-24 17:49:45 -07:00
gup.c mm: retire GUP WARN_ON_ONCE that outlived its usefulness 2016-02-03 08:57:14 -08:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c thp: call pmdp_invalidate() with correct virtual address 2016-02-24 10:46:30 -08:00
hugetlb_cgroup.c mm: make compound_head() robust 2015-11-06 17:50:42 -08:00
hugetlb.c mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers 2016-03-09 15:43:42 -08:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h mm: polish virtual memory accounting 2016-02-03 08:28:43 -08:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
Kconfig mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT 2016-02-05 18:10:40 -08:00
Kconfig.debug mm/debug_pagealloc: remove obsolete Kconfig options 2015-01-08 15:10:52 -08:00
kmemcheck.c mm: kmemcheck skip object if slab allocation failed 2016-03-15 16:55:16 -07:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c Revert "gfp: add __GFP_NOACCOUNT" 2016-01-14 16:00:49 -08:00
ksm.c mm/ksm.c: mark stable page dirty 2016-01-15 17:56:32 -08:00
list_lru.c mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
maccess.c mm/maccess.c: actually return -EFAULT from strncpy_from_unsafe 2015-11-05 19:34:48 -08:00
madvise.c mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called 2016-01-15 17:56:32 -08:00
Makefile media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
memblock.c memblock: don't mark memblock_phys_mem_size() as __init 2016-02-05 18:10:40 -08:00
memcontrol.c thp: change pmd_trans_huge_lock() interface to return ptl 2016-01-21 17:20:51 -08:00
memory_hotplug.c xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM 2016-01-30 09:49:58 +01:00
memory-failure.c mm: soft-offline: exit with failure for non anonymous thp 2016-01-15 17:56:32 -08:00
memory.c mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED 2016-02-27 10:28:52 -08:00
mempolicy.c mm, thp: fix migration of PTE-mapped transparent huge pages 2016-03-09 15:43:42 -08:00
mempool.c mm/mempool: avoid KASAN marking mempool poison checks as use-after-free 2016-03-11 16:17:47 -08:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c mm: numa: quickly fail allocations for NUMA balancing on full nodes 2016-02-27 10:28:52 -08:00
mincore.c thp: change pmd_trans_huge_lock() interface to return ptl 2016-01-21 17:20:51 -08:00
mlock.c mm: fix mlock accouting 2016-01-21 17:20:51 -08:00
mm_init.c mm: meminit: remove mminit_verify_page_links 2015-06-30 19:44:56 -07:00
mmap.c mm: fix regression in remap_file_pages() emulation 2016-02-18 16:23:24 -08:00
mmu_context.c
mmu_notifier.c mmu-notifier: add clear_young callback 2015-09-10 13:29:01 -07:00
mmzone.c mm/mmzone.c: memmap_valid_within() can be boolean 2016-01-14 16:00:49 -08:00
mprotect.c mm, dax: check for pmd_none() after split_huge_pmd() 2016-02-11 18:35:48 -08:00
mremap.c mm, dax: check for pmd_none() after split_huge_pmd() 2016-02-11 18:35:48 -08:00
msync.c mm/msync: use offset_in_page macro 2015-11-05 19:34:48 -08:00
nobootmem.c x86/mm: Introduce max_possible_pfn 2015-12-06 12:46:31 +01:00
nommu.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
oom_kill.c mm, shmem: add internal shmem resident memory accounting 2016-01-14 16:00:49 -08:00
page_alloc.c mm, hugetlb: don't require CMA for runtime gigantic pages 2016-02-05 18:10:40 -08:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_idle.c mm: add page_check_address_transhuge() helper 2016-01-15 17:56:32 -08:00
page_io.c fs: use helper bio_add_page() instead of open coding on bi_io_vec 2015-08-13 12:32:00 -06:00
page_isolation.c mm/page_isolation: do some cleanup in "undo_isolate_page_range" 2016-01-15 17:56:32 -08:00
page_owner.c mm/page_owner: set correct gfp_mask on page_owner 2015-07-17 16:39:54 -07:00
page-writeback.c mm: page_alloc: generalize the dirty balance reserve 2016-01-14 16:00:49 -08:00
pagewalk.c thp: rename split_huge_page_pmd() to split_huge_pmd() 2016-01-15 17:56:32 -08:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c tree wide: use kvfree() than conditional kfree()/vfree() 2016-01-22 17:02:18 -08:00
pgtable-generic.c mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE 2016-02-11 18:35:48 -08:00
process_vm_access.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
quicklist.c
readahead.c mm: move lru_to_page to mm_inline.h 2016-01-14 16:00:49 -08:00
rmap.c mm: fix locking order in mm_take_all_locks() 2016-01-15 17:56:32 -08:00
shmem.c make sure that freeing shmem fast symlinks is RCU-delayed 2016-01-22 18:08:52 -05:00
slab_common.c mm: new API kfree_bulk() for SLAB+SLUB allocators 2016-03-15 16:55:16 -07:00
slab.c mm/slab: introduce new slab management type, OBJFREELIST_SLAB 2016-03-15 16:55:16 -07:00
slab.h mm: fix some spelling 2016-03-15 16:55:16 -07:00
slob.c mm: slab: free kmem_cache_node after destroy sysfs file 2016-02-18 16:23:24 -08:00
slub.c mm: new API kfree_bulk() for SLAB+SLUB allocators 2016-03-15 16:55:16 -07:00
sparse-vmemmap.c x86, mm: introduce vmem_altmap to augment vmemmap_populate() 2016-01-15 17:56:32 -08:00
sparse.c x86, mm: introduce vmem_altmap to augment vmemmap_populate() 2016-01-15 17:56:32 -08:00
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c mm: memcontrol: charge swap to cgroup2 2016-01-20 17:09:18 -08:00
swap.c mm, x86: get_user_pages() for dax mappings 2016-01-15 17:56:32 -08:00
swapfile.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
truncate.c dax: support dirty DAX entries in radix tree 2016-01-22 17:02:18 -08:00
userfaultfd.c memcg: adjust to support new THP refcounting 2016-01-15 17:56:32 -08:00
util.c proc: revert /proc/<pid>/maps [stack:TID] annotation 2016-02-03 08:28:43 -08:00
vmacache.c mm/vmacache: inline vmacache_valid_mm() 2015-11-05 19:34:48 -08:00
vmalloc.c mm/vmalloc.c: use macro IS_ALIGNED to judge the aligment 2016-01-15 17:56:32 -08:00
vmpressure.c mm/vmpressure.c: fix subtree pressure detection 2016-02-03 08:28:43 -08:00
vmscan.c mm: downgrade VM_BUG in isolate_lru_page() to warning 2016-02-05 18:10:40 -08:00
vmstat.c vmstat: make vmstat_update deferrable 2016-02-05 18:10:40 -08:00
workingset.c dax: support dirty DAX entries in radix tree 2016-01-22 17:02:18 -08:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c zsmalloc: fix migrate_zspage-zs_free race condition 2016-01-20 17:09:18 -08:00
zswap.c mm/zswap: change incorrect strncmp use to strcmp 2015-12-18 14:25:40 -08:00