linux/mm
Hugh Dickins d189922862 shmem: fix negative rss in memcg memory.stat
When adding the page_private checks before calling shmem_replace_page(), I
did realize that there is a further race, but thought it too unlikely to
need a hurried fix.

But independently I've been chasing why a mem cgroup's memory.stat
sometimes shows negative rss after all tasks have gone: I expected it to
be a stats gathering bug, but actually it's shmem swapping's fault.

It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
the page may have been removed from swapcache before getting the lock; or
it may have been freed and reused and be back in swapcache; and it can
even be using the same swap location as before (page_private same).

The swapoff case is already secure against this (swap cannot be reused
until the whole area has been swapped off, and a new swapped on); and
shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for
the expected radix_tree entry - but a little too late.

By that time, we might have already decided to shmem_replace_page(): I
don't know of a problem from that, but I'd feel more at ease not to do so
spuriously.  And we have already done mem_cgroup_cache_charge(), on
perhaps the wrong mem cgroup: and this charge is not then undone on the
error path, because PageSwapCache ends up preventing that.

It's this last case which causes the occasional negative rss in
memory.stat: the page is charged here as cache, but (sometimes) found to
be anon when eventually it's uncharged - and in between, it's an
undeserved charge on the wrong memcg.
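
The accounting mismatch described above can be shown with a toy userspace model. This is only a sketch under stated assumptions: `struct memcg_stat`, `charge_cache()` and `uncharge()` are hypothetical names invented for illustration and are not the kernel's memcg interfaces. It shows only that a page charged as cache but uncharged as anon leaves `rss` below zero and `cache` above zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-cgroup counters (hypothetical; a real memcg tracks far more). */
struct memcg_stat {
	long cache;	/* pages charged as page cache, which includes shmem */
	long rss;	/* pages charged as anonymous memory */
};

/* Charge one page as cache, as the shmem swapin path is described to do. */
static void charge_cache(struct memcg_stat *m)
{
	assert(m != NULL);
	m->cache++;
}

/* Uncharge one page, classified by what it looks like at uncharge time. */
static void uncharge(struct memcg_stat *m, bool page_is_anon)
{
	assert(m != NULL);
	if (page_is_anon)
		m->rss--;	/* the wrong counter if the charge was cache */
	else
		m->cache--;
}
```

Calling `charge_cache()` once and then `uncharge()` with `page_is_anon` set to true leaves the two counters unbalanced in opposite directions, which is the symptom the message reports in memory.stat.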

Fix this by adding an earlier check on the radix_tree entry: it's
inelegant to descend the tree twice, but swapping is not the fast path,
and a better solution would need a pair (try+commit) of memcg calls, and a
rework of shmem_replace_page() to keep out of the swapcache.

We can use the added shmem_confirm_swap() function to replace the
find_get_page+page_cache_release we were already doing on the error path.
And add a comment on that -EEXIST: it seems a peculiar errno to be using,
but originates from its use in radix_tree_insert().
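
The ordering of the fix (confirm the entry first, charge only afterwards) might be sketched in userspace as follows. Everything here is an assumption made for illustration: `struct toy_mapping`, `toy_confirm_swap()` and `toy_swapin()` are hypothetical stand-ins, a flat array replaces the radix tree, and a counter replaces the memcg charge. It is not the code of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

/* Toy mapping: each slot holds a swap entry value, or 0 once it is gone. */
struct toy_mapping {
	unsigned long slot[NSLOTS];
};

/* Does the slot at index still hold the swap entry we looked up? */
static bool toy_confirm_swap(const struct toy_mapping *map, size_t index,
			     unsigned long swap)
{
	assert(map != NULL && index < NSLOTS);
	return map->slot[index] == swap;
}

/*
 * Swap-in with the early check. *charges counts charges taken.
 * Returns 0 on success, -EEXIST if the entry changed under us.
 */
static int toy_swapin(struct toy_mapping *map, size_t index,
		      unsigned long swap, long *charges)
{
	if (!toy_confirm_swap(map, index, swap))
		return -EEXIST;	/* bail out before charging anything */

	(*charges)++;		/* stands in for the memcg charge */
	map->slot[index] = 0;	/* the swap entry is consumed by the swap-in */
	return 0;
}
```

With a stale or mismatched entry the sketch returns `-EEXIST` and leaves the charge counter untouched, so there is nothing to undo on the error path.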

[It can be surprising to see positive rss left in a memcg's memory.stat
after all tasks have gone, since it is supposed to count anonymous but not
shmem.  Aside from sharing anon pages via fork with a task in some other
memcg, it often happens after swapping: because a swap page can't be freed
while under writeback, nor while locked.  So it's not an error, and these
residual pages are easily freed once pressure demands.]

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:48 -07:00
backing-dev.c backing-dev: fix wakeup timer races with bdi_unregister() 2012-02-01 16:52:49 +08:00
bootmem.c mm/bootmem.c: cleanup on addition to bootmem data list 2012-05-29 16:22:24 -07:00
bounce.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c mm, thp: abort compaction if migration page cannot be charged to memcg 2012-07-11 16:04:43 -07:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c
fadvise.c fadvise: only initiate writeback for specified range with FADV_DONTNEED 2012-01-10 16:30:43 -08:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap_xip.c fs: introduce inode operation ->update_time 2012-06-01 12:07:25 -04:00
filemap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-06-01 10:34:35 -07:00
fremap.c
frontswap.c frontswap: s/put_page/store/g s/get_page/load 2012-05-15 11:34:08 -04:00
highmem.c
huge_memory.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
hugetlb.c mm: fix vma_resv_map() NULL pointer 2012-05-30 08:48:13 -07:00
hwpoison-inject.c HWPOISON: Clean up memory_failure() vs. __memory_failure() 2012-01-03 12:06:32 -08:00
init-mm.c
internal.h Revert "mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks" 2012-06-03 20:05:57 -07:00
Kconfig Frontswap provides a "transcendent memory" interface for swap pages. 2012-06-04 12:28:45 -07:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c ksm: cleanup: introduce find_mergeable_vma() 2012-03-21 17:54:59 -07:00
maccess.c
madvise.c mm: Hold a file reference in madvise_remove 2012-07-06 10:34:38 -07:00
Makefile Frontswap provides a "transcendent memory" interface for swap pages. 2012-06-04 12:28:45 -07:00
memblock.c mm/memblock: fix overlapping allocation when doubling reserved array 2012-06-20 14:39:36 -07:00
memcontrol.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
memory_hotplug.c mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails 2012-07-11 16:04:46 -07:00
memory-failure.c mm/memory_failure: let the compiler add the function name 2012-05-29 16:22:18 -07:00
memory.c mm/memory.c: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
mempolicy.c mm, mempolicy: fix mbind() to do synchronous migration 2012-06-20 22:10:42 -07:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c mm: fix warning in __set_page_dirty_nobuffers 2012-06-03 20:05:47 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c
mmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-06-01 10:34:35 -07:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c
mmzone.c mm: add link from struct lruvec to struct zone 2012-05-29 16:22:26 -07:00
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c move security_mmap_addr() to saner place 2012-06-01 10:37:16 -04:00
msync.c
nobootmem.c mm: remove sparsemem allocation details from the bootmem allocator 2012-05-29 16:22:22 -07:00
nommu.c nommu: fix compilation of nommu.c 2012-06-04 17:17:31 -04:00
oom_kill.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
page_alloc.c Revert "mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks" 2012-06-03 20:05:57 -07:00
page_cgroup.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
page_io.c frontswap: s/put_page/store/g s/get_page/load 2012-05-15 11:34:08 -04:00
page_isolation.c mm: page_isolation: MIGRATE_CMA isolation functions added 2012-05-21 15:09:33 +02:00
page-writeback.c writeback: initialize global_dirty_limit 2012-05-06 13:41:58 +08:00
pagewalk.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu-km.c
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c kmemleak: Fix the kmemleak tracking of the percpu areas with !SMP 2012-05-09 10:13:29 -07:00
pgtable-generic.c arch/tile: allow building Linux with transparent huge pages enabled 2012-05-25 12:48:21 -04:00
prio_tree.c
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c
readahead.c mm: move readahead syscall to mm/readahead.c 2012-05-29 16:22:23 -07:00
rmap.c mm: remove swap token code 2012-05-29 16:22:19 -07:00
shmem.c shmem: fix negative rss in memcg memory.stat 2012-07-11 16:04:48 -07:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-03-28 15:04:26 -07:00
slob.c
slub.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-06-01 16:50:23 -07:00
sparse-vmemmap.c
sparse.c mm: remove sparsemem allocation details from the bootmem allocator 2012-05-29 16:22:22 -07:00
swap_state.c mm: fix s390 BUG by __set_page_dirty_no_writeback on swap 2012-04-23 18:19:22 -07:00
swap.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
swapfile.c swap: fix shmem swapping when more than 8 areas 2012-06-15 21:48:14 -07:00
truncate.c mm/fs: remove truncate_range 2012-05-29 16:22:23 -07:00
util.c new helper: vm_mmap_pgoff() 2012-06-01 10:37:18 -04:00
vmalloc.c mm: fix faulty initialization in vmalloc_init() 2012-05-29 16:22:24 -07:00
vmscan.c memory hotplug: fix invalid memory access caused by stale kswapd pointer 2012-07-11 16:04:41 -07:00
vmstat.c mm/vmstat.c: remove debug fs entries on failure of file creation and made extfrag_debug_root dentry local 2012-05-29 16:22:19 -07:00