linux/mm
KAMEZAWA Hiroyuki 08e552c69c memcg: synchronized LRU
A big patch for changing memcg's LRU semantics.

Now,
  - page_cgroup is linked to mem_cgroup's its own LRU (per zone).

  - LRU of page_cgroup is not synchronous with global LRU.

  - page and page_cgroup is one-to-one and statically allocated.

  - To find page_cgroup is on what LRU, you have to check pc->mem_cgroup as
    - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);

  - SwapCache is handled.

And, when we handle LRU list of page_cgroup, we do following.

	pc = lookup_page_cgroup(page);
	lock_page_cgroup(pc); .....................(1)
	mz = page_cgroup_zoneinfo(pc);
	spin_lock(&mz->lru_lock);
	.....add to LRU
	spin_unlock(&mz->lru_lock);
	unlock_page_cgroup(pc);

But (1) is spin_lock and we have to be afraid of dead-lock with zone->lru_lock.
So, trylock() is used at (1), now. Without (1), we can't trust "mz" is correct.

This is a trial to remove this dirty nesting of locks.
This patch changes mz->lru_lock to be zone->lru_lock.
Then, above sequence will be written as

        spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
	mem_cgroup_add/remove/etc_lru() {
		pc = lookup_page_cgroup(page);
		mz = page_cgroup_zoneinfo(pc);
		if (PageCgroupUsed(pc)) {
			....add to LRU
		}
        spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU

This is much simpler.
(*) We're safe even if we don't take lock_page_cgroup(pc). Because..
    1. When pc->mem_cgroup can be modified.
       - at charge.
       - at account_move().
    2. at charge
       the PCG_USED bit is not set before pc->mem_cgroup is fixed.
    3. at account_move()
       the page is isolated and not on LRU.

Pros.
  - easy for maintenance.
  - memcg can make use of laziness of pagevec.
  - we don't have to duplicated LRU/Active/Unevictable bit in page_cgroup.
  - LRU status of memcg will be synchronized with global LRU's one.
  - # of locks are reduced.
  - account_move() is simplified very much.
Cons.
  - may increase cost of LRU rotation.
    (no impact if memcg is not configured.)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
..
allocpercpu.c mm/allocpercpu.c: make 4 functions static 2008-07-26 12:00:12 -07:00
backing-dev.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-06 17:10:04 -08:00
bootmem.c bootmem: print request details before BUG_ON(them) 2009-01-06 15:59:10 -08:00
bounce.c bounce: don't rely on a zeroed bio_vec list 2008-12-29 08:29:52 +01:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
failslab.c SLUB: failslab support 2008-12-29 11:27:46 +02:00
filemap_xip.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
filemap.c mm: pagecache gfp flags fix 2009-01-06 15:59:09 -08:00
fremap.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
highmem.c x86, pat: avoid highmem cache attribute aliasing 2008-08-15 17:22:57 +02:00
hugetlb.c mm: hugetlb: remove redundant `if' operation 2009-01-06 15:59:10 -08:00
internal.h mm: make get_user_pages() interruptible 2009-01-06 15:59:08 -08:00
Kconfig Remove obsolete CONFIG_RESOURCES_64BIT 2009-01-06 15:59:14 -08:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c madvise: update function comment of madvise_dontneed 2008-07-30 09:41:45 -07:00
Makefile shmem: unify regular and tiny shmem 2009-01-06 15:59:08 -08:00
memcontrol.c memcg: synchronized LRU 2009-01-08 08:31:05 -08:00
memory_hotplug.c mm: remove GFP_HIGHUSER_PAGECACHE 2009-01-06 15:59:01 -08:00
memory.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
mempolicy.c Merge branch 'master' into next 2008-11-14 11:29:12 +11:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c memcg: simple migration handling 2009-01-08 08:31:04 -08:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c mm: make get_user_pages() interruptible 2009-01-06 15:59:08 -08:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mm: check for no mmaps in exit_mmap() 2009-01-06 15:59:10 -08:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: mark the correct zone as full when scanning zonelists 2008-09-13 14:41:52 -07:00
mprotect.c mm: cleanup: remove #ifdef CONFIG_MIGRATION 2009-01-06 15:59:00 -08:00
mremap.c mm: update my address 2009-01-05 17:44:42 -08:00
msync.c add a vfs_fsync helper 2009-01-05 11:54:28 -05:00
nommu.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
oom_kill.c oom: print triggering task's cpuset and mems allowed 2009-01-06 15:58:59 -08:00
page_alloc.c mm: remove CONFIG_OUT_OF_LINE_PFN_TO_PAGE 2009-01-06 15:59:10 -08:00
page_cgroup.c memcg: synchronized LRU 2009-01-08 08:31:05 -08:00
page_io.c mm: try_to_free_swap replaces remove_exclusive_swap_page 2009-01-06 15:59:03 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c mm: add dirty_background_bytes and dirty_bytes sysctls 2009-01-06 15:59:03 -08:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c mm: size of quicklists shouldn't be proportional to the number of CPUs 2008-09-02 19:21:38 -07:00
readahead.c vmscan: split LRU lists into anon & file sets 2008-10-20 08:50:25 -07:00
rmap.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
shmem.c memcg: handle swap caches 2009-01-08 08:31:05 -08:00
slab.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
slob.c slob: do not pass the SLAB flags as GFP in kmem_cache_create() 2008-12-15 16:27:06 -08:00
slub.c trivial: fix an -> a typos in documentation and comments 2009-01-06 11:28:07 +01:00
sparse-vmemmap.c vmemmap: warn about page_structs with remote distance 2008-11-06 15:41:19 -08:00
sparse.c meminit section warnings 2008-11-30 10:03:35 -08:00
swap_state.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
swap.c memcg: synchronized LRU 2009-01-08 08:31:05 -08:00
swapfile.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
thrash.c
truncate.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
util.c mm: Make generic weak get_user_pages_fast and EXPORT_GPL it 2008-08-12 17:52:53 +10:00
vmalloc.c mm: vmalloc make lazy unmapping configurable 2009-01-06 15:59:01 -08:00
vmscan.c memcg: synchronized LRU 2009-01-08 08:31:05 -08:00
vmstat.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30