linux/mm
Tejun Heo 10d53c748b memcg: ratify and consolidate over-charge handling
try_charge() is the main charging logic of memcg.  When it hits the limit
but either can't fail the allocation due to __GFP_NOFAIL or the task is
likely to free memory very soon, being OOM killed, has SIGKILL pending or
exiting, it "bypasses" the charge to the root memcg and returns -EINTR.
While this is one approach which can be taken for these situations, it has
several issues.

* It unnecessarily lies about the reality.  The number itself doesn't
  go over the limit but the actual usage does.  memcg is either forced
  to or actively chooses to go over the limit because that is the
  right behavior under the circumstances, which is completely fine,
  but, if at all avoidable, it shouldn't be misrepresenting what's
  happening by sneaking the charges into the root memcg.

* Despite trying, we already do over-charge.  kmemcg can't deal with
  switching over to the root memcg by the point try_charge() returns
  -EINTR, so it open-codes over-charing.

* It complicates the callers.  Each try_charge() user has to handle
  the weird -EINTR exception.  memcg_charge_kmem() does the manual
  over-charging.  mem_cgroup_do_precharge() performs unnecessary
  uncharging of root memcg, which BTW is inconsistent with what
  memcg_charge_kmem() does but not broken as [un]charging are noops on
  root memcg.  mem_cgroup_try_charge() needs to switch the returned
  cgroup to the root one.

The reality is that in memcg there are cases where we are forced and/or
willing to go over the limit.  Each such case needs to be scrutinized and
justified but there definitely are situations where that is the right
thing to do.  We alredy do this but with a superficial and inconsistent
disguise which leads to unnecessary complications.

This patch updates try_charge() so that it over-charges and returns 0 when
deemed necessary.  -EINTR return is removed along with all special case
handling in the callers.

While at it, remove the local variable @ret, which was initialized to zero
and never changed, along with done: label which just returned the always
zero @ret.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
..
kasan kasan: fix last shadow judgement in memory_is_poisoned_16() 2015-09-17 21:16:07 -07:00
backing-dev.c writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in cgwb_bdi_destroy() 2015-10-21 08:17:29 -06:00
balloon_compaction.c mm/balloon_compaction: fix deflation when compaction is disabled 2014-10-29 16:33:15 -07:00
bootmem.c bootmem: avoid freeing to bootmem after bootmem is done 2015-09-08 15:35:28 -07:00
cleancache.c cleancache: remove limit on the number of cleancache enabled filesystems 2015-04-14 16:49:03 -07:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
cma.c mm: cma: fix incorrect type conversion for size during dma allocation 2015-10-23 17:55:10 +09:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
compaction.c mm/compaction: correct to flush migrated pages if pageblock skip happens 2015-09-08 15:35:28 -07:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
dmapool.c dmapool: fix overflow condition in pool_find_page() 2015-10-01 21:42:35 -04:00
early_ioremap.c mm/early_ioremap: add explicit #include of asm/early_ioremap.h 2015-09-11 15:21:34 -07:00
fadvise.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
failslab.c debugfs: Pass bool pointer to debugfs_create_bool() 2015-10-04 11:36:07 +01:00
filemap.c mm: make sendfile(2) killable 2015-10-23 17:55:10 +09:00
frame_vector.c [media] mm: Provide new get_vaddr_frames() helper 2015-08-16 13:02:47 -03:00
frontswap.c frontswap: allow multiple backends 2015-06-24 17:49:45 -07:00
gup.c mm: make GUP handle pfn mapping unless FOLL_GET is requested 2015-09-04 16:54:41 -07:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c - Support for new MM features in ARCv2 cores (THP, PAE40) 2015-11-03 13:21:09 -08:00
hugetlb_cgroup.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
hugetlb.c mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault 2015-10-01 21:42:35 -04:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h mm/compaction: correct to flush migrated pages if pageblock skip happens 2015-09-08 15:35:28 -07:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
Kconfig media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
Kconfig.debug mm/debug_pagealloc: remove obsolete Kconfig options 2015-01-08 15:10:52 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c mm/kmemleak.c: remove unneeded initialization of object to NULL 2015-11-05 19:34:48 -08:00
ksm.c mm: remove rest of ACCESS_ONCE() usages 2015-04-15 16:35:18 -07:00
list_lru.c list_lru: don't call list_lru_from_kmem if the list_head is empty 2015-09-08 15:35:28 -07:00
maccess.c uaccess: reimplement probe_kernel_address() using probe_kernel_read() 2015-11-05 19:34:48 -08:00
madvise.c mm: madvise allow remove operation for hugetlbfs 2015-09-08 15:35:28 -07:00
Makefile media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
memblock.c mm/memblock.c: fix comment in __next_mem_range() 2015-09-08 15:35:28 -07:00
memcontrol.c memcg: ratify and consolidate over-charge handling 2015-11-05 19:34:48 -08:00
memory_hotplug.c mm: memory hotplug with an existing resource 2015-10-23 14:19:58 +01:00
memory-failure.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
memory.c mm, dax: fix DAX deadlocks 2015-10-16 11:42:28 -07:00
mempolicy.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
mempool.c mm/mempool: allow NULL `pool' pointer in mempool_destroy() 2015-09-08 15:35:28 -07:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c memcg: fix dirty page migration 2015-10-01 21:42:35 -04:00
mincore.c mincore: apply page table walker on do_mincore() 2015-02-11 17:06:06 -08:00
mlock.c mm/mlock.c: reorganize mlockall() return values and remove goto-out label 2015-11-05 19:34:48 -08:00
mm_init.c mm: meminit: remove mminit_verify_page_links 2015-06-30 19:44:56 -07:00
mmap.c mm/mmap.c: remove useless statement "vma = NULL" in find_vma() 2015-11-05 19:34:48 -08:00
mmu_context.c
mmu_notifier.c mmu-notifier: add clear_young callback 2015-09-10 13:29:01 -07:00
mmzone.c mm: microoptimize zonelist operations 2015-02-11 17:06:02 -08:00
mprotect.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mremap.c mremap: simplify the "overlap" check in mremap_to() 2015-09-04 16:54:41 -07:00
msync.c mm: remove rest usage of VM_NONLINEAR and pte_file() 2015-02-10 14:30:31 -08:00
nobootmem.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
nommu.c mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() 2015-09-10 13:29:01 -07:00
oom_kill.c mm, oom: remove unnecessary variable 2015-09-08 15:35:28 -07:00
page_alloc.c debugfs: Pass bool pointer to debugfs_create_bool() 2015-10-04 11:36:07 +01:00
page_counter.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
page_ext.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_idle.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_io.c fs: use helper bio_add_page() instead of open coding on bi_io_vec 2015-08-13 12:32:00 -06:00
page_isolation.c mm, page_isolation: make set/unset_migratetype_isolate() file-local 2015-09-08 15:35:28 -07:00
page_owner.c mm/page_owner: set correct gfp_mask on page_owner 2015-07-17 16:39:54 -07:00
page-writeback.c writeback: fix incorrect calculation of available memory for memcg domains 2015-10-12 10:31:13 -06:00
pagewalk.c mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers 2015-03-25 16:20:30 -07:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: clean up of schunk->map[] assignment in pcpu_setup_first_chunk 2015-07-21 11:31:00 -04:00
pgtable-generic.c mm,thp: introduce flush_pmd_tlb_range 2015-10-17 17:48:20 +05:30
process_vm_access.c process_vm_access: switch to {compat_,}import_iovec() 2015-04-11 22:27:12 -04:00
quicklist.c
readahead.c mm, fs: obey gfp_mapping for add_to_page_cache() 2015-10-16 11:42:28 -07:00
rmap.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
shmem.c shmem: recalculate file inode when fstat 2015-09-08 15:35:28 -07:00
slab_common.c mm/slab_common.c: initialize kmem_cache pointer to NULL 2015-11-05 19:34:48 -08:00
slab.c mm: slab: only move management objects off-slab for sizes larger than KMALLOC_MIN_SIZE 2015-11-05 19:34:48 -08:00
slab.h mm/slab_common.c: clear pointers to per memcg caches on destroy 2015-11-05 19:34:48 -08:00
slob.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
slub.c mm/slub: calculate start order with reserved in consideration 2015-11-05 19:34:48 -08:00
sparse-vmemmap.c
sparse.c
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c mm: swap: zswap: maybe_preload & refactoring 2015-09-08 15:35:28 -07:00
swap.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
swapfile.c mm: /proc/pid/smaps:: show proportional swap share of the mapping 2015-09-08 15:35:28 -07:00
truncate.c memcg: add per cgroup dirty page accounting 2015-06-02 08:33:33 -06:00
userfaultfd.c userfaultfd: avoid mmap_sem read recursion in mcopy_atomic 2015-09-04 16:54:41 -07:00
util.c mm: uninline and cleanup page-mapping related helpers 2015-04-15 16:35:19 -07:00
vmacache.c mm,vmacache: count number of system-wide flushes 2014-12-13 12:42:48 -08:00
vmalloc.c mm: get rid of 'vmalloc_info' from /proc/meminfo 2015-11-01 17:09:15 -08:00
vmpressure.c mm/vmpressure.c: fix race in vmpressure_work_fn() 2014-12-02 17:32:07 -08:00
vmscan.c vmscan: fix sane_reclaim helper for legacy memcg 2015-09-22 15:09:53 -07:00
vmstat.c vmstat: explicitly schedule per-cpu work on the CPU we need it to run on 2015-10-15 13:01:50 -07:00
workingset.c list_lru: add helpers to isolate items 2015-02-12 18:54:10 -08:00
zbud.c mm: zbud: constify the zbud_ops 2015-09-08 15:35:28 -07:00
zpool.c zpool: add zpool_has_pool() 2015-09-10 13:29:01 -07:00
zsmalloc.c mm: zpool: constify the zpool_ops 2015-09-08 15:35:28 -07:00
zswap.c zswap: change zpool/compressor at runtime 2015-09-10 13:29:01 -07:00