linux/include
KAMEZAWA Hiroyuki ab936cbcd0 memcg: add mem_cgroup_replace_page_cache() to fix LRU issue
Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a
function replace_page_cache_page().  This function replaces a page in the
radix-tree with a new page.  WHen doing this, memory cgroup needs to fix
up the accounting information.  memcg need to check PCG_USED bit etc.

In some(many?) cases, 'newpage' is on LRU before calling
replace_page_cache().  So, memcg's LRU accounting information should be
fixed, too.

This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
 In that function, old pages will be unaccounted without touching
res_counter and new page will be accounted to the memcg (of old page).
WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid
races with LRU handling.

Background:
  replace_page_cache_page() is called by FUSE code in its splice() handling.
  Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated
  page and may be on LRU. LRU mis-accounting will be critical for memory cgroup
  because rmdir() checks the whole LRU is empty and there is no account leak.
  If a page is on the other LRU than it should be, rmdir() will fail.

This bug was added in March 2011, but no bug report yet.  I guess there
are not many people who use memcg and FUSE at the same time with upstream
kernels.

The result of this bug is that admin cannot destroy a memcg because of
account leak.  So, no panic, no deadlock.  And, even if an active cgroup
exist, umount can succseed.  So no problem at shutdown.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:04 -08:00
..
acpi
asm-generic lib: use generic pci_iomap on all architectures 2012-01-10 18:04:27 -08:00
crypto
drm Merge branch 'drm-core-next' of git://people.freedesktop.org/~airlied/linux 2012-01-10 11:04:36 -08:00
keys
linux memcg: add mem_cgroup_replace_page_cache() to fix LRU issue 2012-01-12 20:13:04 -08:00
math-emu
media Merge branch 'depends/rmk/restart' into next/cleanup 2012-01-07 11:51:28 +00:00
misc
mtd mtd: document that MEMWRITE ioctl is NAND-specific 2012-01-09 18:18:36 +00:00
net Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs 2012-01-10 15:09:01 -08:00
pcmcia
rdma rdma/core: Fix sparse warnings 2012-01-04 09:17:45 -08:00
rxrpc
scsi SCSI updates for post 3.2 merge window 2012-01-10 10:36:08 -08:00
sound Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound 2012-01-12 08:00:30 -08:00
target
trace Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux 2012-01-10 16:59:59 -08:00
video
xen Merge branch 'stable/for-linus-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen 2012-01-10 10:09:59 -08:00
Kbuild