linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-01 01:31:44 +00:00

Lee Schermerhorn 252c5f94d9 mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()

We noticed very erratic behavior [throughput] with the AIM7 shared workload running on recent distro [SLES11] and mainline kernels on an 8-socket, 32-core, 256GB x86_64 platform. On the SLES11 kernel [2.6.27.19+] with Barcelona processors, as we increased the load [10s of thousands of tasks], the throughput would vary between two "plateaus"--one at ~65K jobs per minute and one at ~130K jpm. The simple patch below causes the results to smooth out at the ~130k plateau.

But wait, there's more:

We do not see this behavior on smaller platforms--e.g., 4 socket/8 core. This could be the result of the larger number of cpus on the larger platform--a scalability issue--or it could be the result of the larger number of interconnect "hops" between some nodes in this platform and how the tasks for a given load end up distributed over the nodes' cpus and memories--a stochastic NUMA effect.

The variability in the results is less pronounced [on the same platform] with Shanghai processors and with mainline kernels. With 31-rc6 on Shanghai processors and 288 file systems on 288 fibre attached storage volumes, the curves [jpm vs load] are both quite flat with the patched kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K jpm] than the unpatched kernel.

Profiling indicated that the "slow" runs were incurring high[er] contention on an anon_vma lock in vma_adjust(), apparently called from the sbrk() system call.

The patch:

A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the anon_vma lock when we're only adjusting the end of a vma, as is the case for brk(). The comment questions whether it's worth while to optimize for this case. Apparently, on the newer, larger x86_64 platforms, with interesting NUMA topologies, it is worth while--especially considering that the patch [if correct!] is quite simple.

We can detect this condition--no overlap with next vma--by noting a NULL "importer". The anon_vma pointer will also be NULL in this case, so simply avoid loading vma->anon_vma to avoid the lock.

However, we DO need to take the anon_vma lock when we're inserting a vma ['insert' non-NULL] even when we have no overlap [NULL "importer"], so we need to check for 'insert', as well. And Hugh points out that we should also take it when adjusting vm_start (so that rmap.c can rely upon vma_address() while it holds the anon_vma lock).

akpm: Zhang Yanmin reports a 150% throughput improvement with aim7, so it might be -stable material even though this isn't a regression: "this issue is not clear on dual socket Nehalem machine (2*4*2 cpu), but is severe on large machine (4*8*2 cpu)"

[hugh.dickins@tiscali.co.uk: test vma start too]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Eric Whitney <eric.whitney@hp.com>
Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2009-09-22 07:17:41 -07:00
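
The locking rule described in the commit message above can be sketched as a small userspace model. This is only an illustration, not the kernel code: `struct vma_sketch`, `struct anon_vma_sketch`, and `anon_vma_to_lock()` are hypothetical names, and `insert` and `importer` are passed in as parameters here, whereas vma_adjust() works them out itself. The helper returns the anon_vma whose lock would have to be taken, or NULL when the lock can be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for the decision are modelled. */
struct anon_vma_sketch {
    int lock_placeholder;   /* stands in for the real lock */
};

struct vma_sketch {
    unsigned long vm_start;
    unsigned long vm_end;
    struct anon_vma_sketch *anon_vma;
};

/*
 * Decide whether the anon_vma lock is needed for an adjustment.
 *
 * The lock is skipped only when the vma's end alone changes:
 *   - no vma is being inserted        (insert == NULL)
 *   - no overlap with the next vma    (importer == NULL)
 *   - the start address is unchanged  (new_start == vma->vm_start)
 *
 * Returns the anon_vma to lock, or NULL if no lock is required.
 */
static struct anon_vma_sketch *
anon_vma_to_lock(const struct vma_sketch *vma, unsigned long new_start,
                 const struct vma_sketch *insert,
                 const struct vma_sketch *importer)
{
    assert(vma != NULL);

    if (vma->anon_vma != NULL &&
        (insert != NULL || importer != NULL || new_start != vma->vm_start))
        return vma->anon_vma;

    return NULL;
}
```

In this model a brk()-style call, which only moves `vm_end`, passes NULL for both `insert` and `importer` and leaves the start unchanged, so the helper returns NULL and the contended lock is never touched.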
..
allocpercpu.c	percpu: use dynamic percpu allocator as the default percpu allocator	2009-06-24 15:13:35 +09:00
backing-dev.c	writeback: splice dirty inode entries to default bdi on bdi_destroy()	2009-09-16 15:18:52 +02:00
bootmem.c	kmemleak: Do not report alloc_bootmem blocks as leaks	2009-08-27 14:29:17 +01:00
bounce.c	block: remove some includings of blktrace_api.h	2009-06-16 11:19:36 +02:00
debug-pagealloc.c	generic debug pagealloc	2009-04-01 08:59:13 -07:00
dmapool.c	dmapools: protect page_list walk in show_pools()	2009-06-30 18:56:00 -07:00
fadvise.c	readahead: move max_sane_readahead() calls into force_page_cache_readahead()	2009-06-16 19:47:28 -07:00
failslab.c	kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c	2009-04-03 12:23:01 +02:00
filemap_xip.c	mm: do_xip_mapping_read: fix length calculation	2009-04-02 19:04:49 -07:00
filemap.c	mm: oom analysis: add shmem vmstat	2009-09-22 07:17:27 -07:00
fremap.c	Do not account for the address space used by hugetlbfs using VM_ACCOUNT	2009-02-10 10:48:42 -08:00
highmem.c	block: remove some includings of blktrace_api.h	2009-06-16 11:19:36 +02:00
hugetlb.c	mm: hugetlbfs_pagecache_present	2009-09-22 07:17:41 -07:00
init-mm.c	mm: consolidate init_mm definition	2009-06-16 19:47:28 -07:00
internal.h	mm: move highest_memmap_pfn	2009-09-22 07:17:41 -07:00
Kconfig	ksm: add some documentation	2009-09-22 07:17:33 -07:00
Kconfig.debug	kmemcheck: enable in the x86 Kconfig	2009-06-15 15:49:15 +02:00
kmemcheck.c	kmemcheck: add hooks for the page allocator	2009-06-15 15:48:33 +02:00
kmemleak-test.c	percpu: clean up percpu variable definitions	2009-06-24 15:13:48 +09:00
kmemleak.c	kmemleak: Improve the "Early log buffer exceeded" error message	2009-09-11 10:42:09 +01:00
ksm.c	ksm: unmerge is an origin of OOMs	2009-09-22 07:17:33 -07:00
maccess.c	[S390] maccess: add weak attribute to probe_kernel_write	2009-06-12 10:27:37 +02:00
madvise.c	ksm: the mm interface to ksm	2009-09-22 07:17:31 -07:00
Makefile	ksm: the mm interface to ksm	2009-09-22 07:17:31 -07:00
memcontrol.c	mm: drop unneeded double negations	2009-09-22 07:17:35 -07:00
memory_hotplug.c	memory hotplug: fix updating of num_physpages for hot plugged memory	2009-09-22 07:17:38 -07:00
memory.c	mm: move highest_memmap_pfn	2009-09-22 07:17:41 -07:00
mempolicy.c	mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware	2009-08-07 10:39:55 -07:00
mempool.c	mm: remove broken 'kzalloc' mempool	2009-09-22 07:17:35 -07:00
migrate.c	mm: return boolean from page_has_private()	2009-09-22 07:17:38 -07:00
mincore.c	[CVE-2009-0029] System call wrappers part 14	2009-01-14 14:15:24 +01:00
mlock.c	mm: m(un)lock avoid ZERO_PAGE	2009-09-22 07:17:40 -07:00
mm_init.c	mm: mminit_loglevel cannot be __meminitdata anymore	2008-08-20 15:40:30 -07:00
mmap.c	mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()	2009-09-22 07:17:41 -07:00
mmu_notifier.c	ksm: add mmu_notifier set_pte_at_notify()	2009-09-22 07:17:31 -07:00
mmzone.c	[ARM] Double check memmap is actually valid with a memmap has unexpected holes V2	2009-05-18 11:22:24 +01:00
mprotect.c	perf: Do the big rename: Performance Counters -> Performance Events	2009-09-21 14:28:04 +02:00
mremap.c	ksm: mremap use err from ksm_madvise	2009-09-22 07:17:33 -07:00
msync.c	[CVE-2009-0029] System call wrappers part 13	2009-01-14 14:15:23 +01:00
nommu.c	mm: FOLL flags for GUP flags	2009-09-22 07:17:40 -07:00
oom_kill.c	oom: oom_kill doesn't kill vfork parent (or child)	2009-09-22 07:17:39 -07:00
page_alloc.c	mm: move highest_memmap_pfn	2009-09-22 07:17:41 -07:00
page_cgroup.c	memory hotplug: alloc page from other node in memory online	2009-09-22 07:17:26 -07:00
page_io.c	mm: remove file argument from swap_readpage()	2009-06-16 19:47:44 -07:00
page_isolation.c	memory hotplug: fix page_zone() calculation in test_pages_isolated()	2008-11-06 15:41:19 -08:00
page-writeback.c	mm: count only reclaimable lru pages	2009-09-22 07:17:30 -07:00
pagewalk.c	pagemap: pass mm into pagewalkers	2008-06-12 18:05:41 -07:00
percpu.c	Merge branch 'for-next' into for-linus	2009-09-15 09:57:19 +09:00
prio_tree.c	spelling fixes: mm/	2007-10-20 01:27:18 +02:00
quicklist.c	percpu: cleanup percpu array definitions	2009-06-24 15:13:45 +09:00
readahead.c	readahead: introduce context readahead algorithm	2009-06-16 19:47:30 -07:00
rmap.c	ksm: no debug in page_dup_rmap()	2009-09-22 07:17:31 -07:00
shmem_acl.c	shmfs: use 'check_acl' instead of 'permission'	2009-09-08 11:08:46 -07:00
shmem.c	tmpfs: depend on shmem	2009-09-22 07:17:41 -07:00
slab.c	mm: replace various uses of num_physpages by totalram_pages	2009-09-22 07:17:38 -07:00
slob.c	slab: remove duplicate kmem_cache_init_late() declarations	2009-08-06 11:36:25 +03:00
slub.c	mm: kmem_cache_create(): make it easier to catch NULL cache names	2009-09-22 07:17:33 -07:00
sparse-vmemmap.c	memory hotplug: alloc page from other node in memory online	2009-09-22 07:17:26 -07:00
sparse.c	memory hotplug: alloc page from other node in memory online	2009-09-22 07:17:26 -07:00
swap_state.c	mm: add_to_swap_cache() does not return -EEXIST	2009-09-22 07:17:35 -07:00
swap.c	mm: replace various uses of num_physpages by totalram_pages	2009-09-22 07:17:38 -07:00
swapfile.c	ksm: unmerge is an origin of OOMs	2009-09-22 07:17:33 -07:00
thrash.c	mm: pass mm to grab_swap_token	2009-06-23 12:50:05 -07:00
truncate.c	mm: remove __invalidate_mapping_pages variant	2009-06-16 19:47:43 -07:00
util.c	Merge branches 'slab/documentation', 'slab/fixes', 'slob/cleanups' and 'slub/fixes' into for-linus	2009-06-17 08:30:15 +03:00
vmalloc.c	mm: replace various uses of num_physpages by totalram_pages	2009-09-22 07:17:38 -07:00
vmscan.c	mm/vmscan: remove page_queue_congested() comment	2009-09-22 07:17:39 -07:00
vmstat.c	mm: vmstat: add isolate pages	2009-09-22 07:17:29 -07:00