linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-28 07:01:32 +00:00

Hugh Dickins 4035c07a89 ksm: take keyhole reference to page

There's a lamentable flaw in KSM swapping: the stable_node holds a reference to the ksm page, so the page to be freed cannot actually be freed until ksmd works its way around to removing the last rmap_item from its stable_node. Which in some configurations may take minutes: not quite responsive enough for memory reclaim. And we don't want to twist KSM and its locking more tightly into the rest of mm. What a pity.

But although the stable_node needs to hold a pointer to the ksm page, does it actually need to raise the reference count of that page? No. It would need to do so if struct pages were ordinary kmalloc'ed objects; but they are more stable than that, and reused in particular ways according to particular rules.

Access to stable_node from its pointer in struct page is no problem, so long as we never free a stable_node before the ksm page itself has been freed. Access to struct page from its pointer in stable_node: reintroduce get_ksm_page(), and let that peep out through its keyhole (the stable_node pointer to ksm page), to see if that struct page still holds the right key to open it (the ksm page mapping pointer back to this stable_node).

This relies upon the established way in which free_hot_cold_page() sets an anon (including ksm) page->mapping to NULL; and relies upon no other user of a struct page to put something which looks like the original stable_node pointer (with two low bits also set) into page->mapping. It also needs get_page_unless_zero() technique pioneered by speculative pagecache; and uses rcu_read_lock() to keep the guarantees that gives.

There are several drivers which put pointers of their own into page->mapping; but none of those could coincide with our stable_node pointers, since KSM won't free a stable_node until it sees that the page has gone.

The only problem case found is the pagetable spinlock USE_SPLIT_PTLOCKS places in struct page (my own abuse): to accommodate GENERIC_LOCKBREAK's break_lock on 32-bit, that spans both page->private and page->mapping. Since break_lock is only 0 or 1, again no confusion for get_ksm_page().

But what of DEBUG_SPINLOCK on 64-bit bigendian? When owner_cpu is 3 (matching PageKsm low bits), it might see 0xdead4ead00000003 in page->mapping, which might coincide? We could get around that by... but a better answer is to suppress USE_SPLIT_PTLOCKS when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, to stop bloating sizeof(struct page) in their case - already proposed in an earlier mm/Kconfig patch.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Wright <chrisw@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:19 -08:00
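
The keyhole check described in the commit message can be illustrated with a small userspace sketch. This is not the code in ksm.c. The struct layouts, the KSM_FLAGS constant, and the put_page() helper are simplified stand-ins assumed for illustration, C11 atomics replace the kernel's primitives, and the rcu_read_lock() protection the commit relies on is left out. The sketch only shows the ordering: compare page->mapping against the expected key, take a reference only if the count is nonzero, then compare again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct page {
    atomic_int refcount;        /* 0 means the page has been freed */
    _Atomic uintptr_t mapping;  /* stable_node pointer plus flag bits, or 0 */
};

struct stable_node {
    struct page *page;          /* plain pointer: holds no reference */
};

/* Assumed encoding: two low bits set mark a ksm mapping. */
#define KSM_FLAGS ((uintptr_t)3)

/* The "key" a live ksm page must carry for this stable_node. */
static uintptr_t expected_mapping(const struct stable_node *node)
{
    /* The low bits must be free, so the node must be 4-byte aligned. */
    assert(((uintptr_t)node & KSM_FLAGS) == 0);
    return (uintptr_t)node | KSM_FLAGS;
}

/* Take a reference only if the count has not already dropped to zero. */
static bool get_page_unless_zero(struct page *page)
{
    int old = atomic_load(&page->refcount);

    while (old != 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; the final put clears mapping, as freeing would. */
static void put_page(struct page *page)
{
    if (atomic_fetch_sub(&page->refcount, 1) == 1)
        atomic_store(&page->mapping, 0);
}

/* Peep through the keyhole: return the page with a reference, or NULL. */
static struct page *get_ksm_page(struct stable_node *node)
{
    struct page *page = node->page;
    uintptr_t key = expected_mapping(node);

    if (atomic_load(&page->mapping) != key)
        return NULL;                    /* already freed or reused */
    if (!get_page_unless_zero(page))
        return NULL;                    /* lost the race with freeing */
    if (atomic_load(&page->mapping) != key) {
        put_page(page);                 /* reused between the two checks */
        return NULL;
    }
    return page;
}
```

A caller that gets a non-NULL page back holds a reference and must drop it with put_page(). Once the last reference is gone and mapping has been cleared, later get_ksm_page() calls on the same stable_node return NULL, which is how the stale pointer is detected without the stable_node pinning the page.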
..
backing-dev.c	flusher: Fix PF_FROZEN race	2009-12-03 13:49:43 +01:00
bootmem.c	bootmem: Add free_bootmem_late()	2009-11-10 12:31:43 +01:00
bounce.c
debug-pagealloc.c
dmapool.c	dmapools: protect page_list walk in show_pools()	2009-06-30 18:56:00 -07:00
fadvise.c	readahead: move max_sane_readahead() calls into force_page_cache_readahead()	2009-06-16 19:47:28 -07:00
failslab.c
filemap_xip.c	const: mark struct vm_struct_operations	2009-09-27 11:39:25 -07:00
filemap.c	kill wait_on_page_writeback_range	2009-12-10 15:02:50 +01:00
fremap.c
highmem.c	highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE	2009-11-10 04:15:47 +01:00
hugetlb.c	mm: add gfp flags for NODEMASK_ALLOC slab allocations	2009-12-15 08:53:13 -08:00
hwpoison-inject.c	HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs	2009-09-16 11:50:17 +02:00
init-mm.c	mm: consolidate init_mm definition	2009-06-16 19:47:28 -07:00
internal.h	ksm: fix mlockfreed to munlocked	2009-12-15 08:53:19 -08:00
Kconfig	mm: stop ptlock enlarging struct page	2009-12-15 08:53:17 -08:00
Kconfig.debug	trivial: improve help text for mm debug config options	2009-09-21 15:14:57 +02:00
kmemcheck.c
kmemleak-test.c	percpu: clean up percpu variable definitions	2009-06-24 15:13:48 +09:00
kmemleak.c	tree-wide: fix typos "aquire" -> "acquire", "cumsumed" -> "consumed"	2009-11-09 09:40:57 +01:00
ksm.c	ksm: take keyhole reference to page	2009-12-15 08:53:19 -08:00
maccess.c
madvise.c	Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6	2009-09-24 07:53:22 -07:00
Makefile	percpu: kill legacy percpu allocator	2009-10-02 13:29:29 +09:00
memcontrol.c	tree-wide: fix assorted typos all over the place	2009-12-04 15:39:55 +01:00
memory_hotplug.c	mm: clear node in N_HIGH_MEMORY and stop kswapd when all memory is offlined	2009-12-15 08:53:13 -08:00
memory-failure.c	mm: CONFIG_MMU for PG_mlocked	2009-12-15 08:53:17 -08:00
memory.c	ksm: let shared pages be swappable	2009-12-15 08:53:19 -08:00
mempolicy.c	hugetlb: derive huge pages nodes allowed from task mempolicy	2009-12-15 08:53:12 -08:00
mempool.c	mm: remove broken 'kzalloc' mempool	2009-09-22 07:17:35 -07:00
migrate.c	mm: define PAGE_MAPPING_FLAGS	2009-12-15 08:53:17 -08:00
mincore.c
mlock.c	ksm: fix mlockfreed to munlocked	2009-12-15 08:53:19 -08:00
mm_init.c
mmap.c	mmap: don't return ENOMEM when mapcount is temporarily exceeded in munmap()	2009-12-15 08:53:11 -08:00
mmu_context.c	mm: reduce atomic use on use_mm fast path	2009-09-22 07:17:42 -07:00
mmu_notifier.c	ksm: add mmu_notifier set_pte_at_notify()	2009-09-22 07:17:31 -07:00
mmzone.c
mprotect.c	perf: Do the big rename: Performance Counters -> Performance Events	2009-09-21 14:28:04 +02:00
mremap.c	Take arch_mmap_check() into get_unmapped_area()	2009-12-11 06:44:58 -05:00
msync.c
nommu.c	NOMMU: Don't pass NULL pointers to fput() in do_mmap_pgoff()	2009-10-31 12:11:37 -07:00
oom_kill.c	oom: dump stack and VM state when oom killer panics	2009-12-15 08:53:10 -08:00
page_alloc.c	mm: CONFIG_MMU for PG_mlocked	2009-12-15 08:53:17 -08:00
page_cgroup.c	memory hotplug: alloc page from other node in memory online	2009-09-22 07:17:26 -07:00
page_io.c	swap: rework map_swap_page() again	2009-12-15 08:53:16 -08:00
page_isolation.c
page-writeback.c	writeback: remove unused nonblocking and congestion checks	2009-12-03 13:54:25 +01:00
pagewalk.c
percpu.c	Merge branch 'for-linus' into for-next	2009-12-08 10:02:12 +09:00
prio_tree.c
quicklist.c	cpumask: use new-style cpumask ops in mm/quicklist.	2009-09-24 09:34:52 +09:30
readahead.c	readahead: introduce context readahead algorithm	2009-06-16 19:47:30 -07:00
rmap.c	ksm: hold anon_vma in rmap_item	2009-12-15 08:53:19 -08:00
shmem_acl.c	shmfs: use 'check_acl' instead of 'permission'	2009-09-08 11:08:46 -07:00
shmem.c	swap_info: note SWAP_MAP_SHMEM	2009-12-15 08:53:16 -08:00
slab.c	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-12-14 10:13:22 -08:00
slob.c	slab: remove duplicate kmem_cache_init_late() declarations	2009-08-06 11:36:25 +03:00
slub.c	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-12-14 10:13:22 -08:00
sparse-vmemmap.c	memory hotplug: alloc page from other node in memory online	2009-09-22 07:17:26 -07:00
sparse.c	memory hotplug: alloc page from other node in memory online	2009-09-22 07:17:26 -07:00
swap_state.c	mm: add_to_swap_cache() does not return -EEXIST	2009-09-22 07:17:35 -07:00
swap.c	mm: replace various uses of num_physpages by totalram_pages	2009-09-22 07:17:38 -07:00
swapfile.c	ksm: let shared pages be swappable	2009-12-15 08:53:19 -08:00
thrash.c	mm: pass mm to grab_swap_token	2009-06-23 12:50:05 -07:00
truncate.c	mm: fix comments for invalidate_inode_pages2()	2009-12-04 15:39:48 +01:00
util.c	fix a struct file leak in do_mmap_pgoff()	2009-12-11 06:44:57 -05:00
vmalloc.c	vmalloc(): adjust gfp mask passed on nested vmalloc() invocation	2009-12-15 08:53:13 -08:00
vmscan.c	vmscan: make consistent of reclaim bale out between do_try_to_free_page and shrink_zone	2009-12-15 08:53:18 -08:00
vmstat.c	vmscan: stop kswapd waiting on congestion when the min watermark is not being met	2009-12-15 08:53:16 -08:00