linux/mm
Daisuke Nishimura 3c776e6466 memcg: charge swapcache to proper memcg
memcg_test.txt says at 4.1:

	This swap-in is one of the most complicated operations. In do_swap_page(),
	the following events occur when the pte is unchanged.

	(1) the page (SwapCache) is looked up.
	(2) lock_page()
	(3) try_charge_swapin()
	(4) reuse_swap_page() (may call delete_from_swap_cache())
	(5) commit_charge_swapin()
	(6) swap_free().

	Consider the following situations, for example.

	(A) The page has not been charged before (2) and reuse_swap_page()
	    doesn't call delete_from_swap_cache().
	(B) The page has not been charged before (2) and reuse_swap_page()
	    calls delete_from_swap_cache().
	(C) The page has been charged before (2) and reuse_swap_page() doesn't
	    call delete_from_swap_cache().
	(D) The page has been charged before (2) and reuse_swap_page() calls
	    delete_from_swap_cache().

	memory.usage/memsw.usage changes to this page/swp_entry will be:

	 Case           (A)      (B)      (C)      (D)
	 Event
	 Before (2)     0/ 1     0/ 1     1/ 1     1/ 1
	 ==============================================
	 (3)           +1/+1    +1/+1    +1/+1    +1/+1
	 (4)             -       0/ 0      -      -1/ 0
	 (5)            0/-1     0/ 0    -1/-1     0/ 0
	 (6)             -       0/-1      -       0/-1
	 ==============================================
	 Result         1/ 1     1/ 1     1/ 1     1/ 1

	In any case, charges to this page should be 1/ 1.

In case (D), mem_cgroup_try_get_from_swapcache() returns NULL
(because lookup_swap_cgroup() returns NULL), so the "+1/+1" at (3) is
charged to the memcg ("foo") to which "current" belongs.
OTOH, the "-1/0" at (4) and the "0/-1" at (6) are uncharged from the memcg
("baa") to which the page has been charged.

So, if "foo" and "baa" are different (for example, because of a task move),
this charge will be moved from "baa" to "foo".

I think this is unexpected behavior.

This patch fixes this by modifying mem_cgroup_try_get_from_swapcache()
to return the memcg to which the swapcache has been charged if the PCG_USED
bit is set.
IIUC, checking the PCG_USED bit of a swapcache page is safe under the page lock.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:04:56 -07:00
allocpercpu.c cpumask: use new cpumask_ functions in core code. 2009-03-30 22:05:16 +10:30
backing-dev.c Move the default_backing_dev_info out of readahead.c and into backing-dev.c 2009-03-26 11:01:33 +01:00
bootmem.c bootmem, x86: further fixes for arch-specific bootmem wrapping 2009-03-01 16:06:56 +09:00
bounce.c bounce: don't rely on a zeroed bio_vec list 2008-12-29 08:29:52 +01:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c [CVE-2009-0029] System call wrapper special cases 2009-01-14 14:15:18 +01:00
failslab.c SLUB: failslab support 2008-12-29 11:27:46 +02:00
filemap_xip.c mm: do_xip_mapping_read: fix length calculation 2009-04-02 19:04:49 -07:00
filemap.c x86, mm: dont use non-temporal stores in pagecache accesses 2009-03-02 11:06:49 +01:00
fremap.c Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
highmem.c mm: introduce debug_kmap_atomic 2009-04-01 08:59:14 -07:00
hugetlb.c hugetlb: chg cannot become less than 0 2009-04-01 08:59:13 -07:00
internal.h nommu: there is no mlock() for NOMMU, so don't provide the bits 2009-04-01 08:59:14 -07:00
Kconfig nommu: make CONFIG_UNEVICTABLE_LRU available when CONFIG_MMU=n 2009-04-01 08:59:15 -07:00
Kconfig.debug generic debug pagealloc: build fix 2009-04-02 19:04:48 -07:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c [CVE-2009-0029] System call wrappers part 14 2009-01-14 14:15:24 +01:00
Makefile generic debug pagealloc 2009-04-01 08:59:13 -07:00
memcontrol.c memcg: charge swapcache to proper memcg 2009-04-02 19:04:56 -07:00
memory_hotplug.c mm: remove GFP_HIGHUSER_PAGECACHE 2009-01-06 15:59:01 -08:00
memory.c mm: page_mkwrite change prototype to match fault 2009-04-01 08:59:14 -07:00
mempolicy.c [CVE-2009-0029] System call wrappers part 28 2009-01-14 14:15:30 +01:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c migration: migrate_vmas should check "vma" 2009-02-11 14:25:34 -08:00
mincore.c [CVE-2009-0029] System call wrappers part 14 2009-01-14 14:15:24 +01:00
mlock.c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-02-17 14:27:39 -08:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c nommu: fix a number of issues with the per-MM VMA patch 2009-04-02 19:04:48 -07:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: mark the correct zone as full when scanning zonelists 2008-09-13 14:41:52 -07:00
mprotect.c Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
mremap.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
msync.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
nommu.c nommu: fix a number of issues with the per-MM VMA patch 2009-04-02 19:04:48 -07:00
oom_kill.c memcg: show memcg information during OOM 2009-04-02 19:04:55 -07:00
page_alloc.c vmscan: fix it to take care of nodemask 2009-04-01 08:59:15 -07:00
page_cgroup.c memcg: use __GFP_NOWARN in page cgroup allocation 2009-02-11 14:25:35 -08:00
page_io.c block: fix bad definition of BIO_RW_SYNC 2009-02-18 10:32:00 +01:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c mm: fix proc_dointvec_userhz_jiffies "breakage" 2009-04-01 08:59:13 -07:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL 2009-03-30 22:05:11 +10:30
percpu.c percpu: generalize embedding first chunk setup helper 2009-03-10 16:27:48 +09:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c mm: size of quicklists shouldn't be proportional to the number of CPUs 2008-09-02 19:21:38 -07:00
readahead.c Move the default_backing_dev_info out of readahead.c and into backing-dev.c 2009-03-26 11:01:33 +01:00
rmap.c mm: fix mlocked page counter mismatch 2009-02-11 14:25:35 -08:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
shmem.c shmem: writepage directly to swap 2009-04-01 08:59:15 -07:00
slab.c workqueue: add to_delayed_work() helper function 2009-04-02 19:04:50 -07:00
slob.c Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-03-30 17:17:35 -07:00
slub.c Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-03-30 17:17:35 -07:00
sparse-vmemmap.c vmemmap: warn about page_structs with remote distance 2008-11-06 15:41:19 -08:00
sparse.c mm: mminit_validate_memmodel_limits(): remove redundant test 2009-04-01 08:59:11 -07:00
swap_state.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
swap.c mm: remove pagevec_swap_free() 2009-04-01 08:59:13 -07:00
swapfile.c PM/hibernate: fix "swap breaks after hibernation failures" 2009-02-21 14:17:17 -08:00
thrash.c
truncate.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
util.c memdup_user(): introduce 2009-04-01 08:59:13 -07:00
vmalloc.c vmap: remove needless lock and list in vmap 2009-04-01 08:59:11 -07:00
vmscan.c vmscan: fix it to take care of nodemask 2009-04-01 08:59:15 -07:00
vmstat.c mm: align vmstat_work's timer 2009-04-02 19:04:48 -07:00