linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-15 07:33:56 +00:00

Andrea Arcangeli 37a1c49a91 thp: mremap support and TLB optimization

This adds THP support to mremap (decreases the number of split_huge_page()
calls).

Here are also some benchmarks with a proggy like this:

===
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define SIZE (5UL*1024*1024*1024)

int main()
{
	static struct timeval oldstamp, newstamp;
	long diffsec;
	char *p, *p2, *p3, *p4;
	if (posix_memalign((void **)&p, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p2, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p3, 2*1024*1024, 4096))
		perror("memalign"), exit(1);

	memset(p, 0xff, SIZE);
	memset(p2, 0xff, SIZE);
	memset(p3, 0x77, 4096);
	gettimeofday(&oldstamp, NULL);
	p4 = mremap(p, SIZE, SIZE, MREMAP_FIXED|MREMAP_MAYMOVE, p3);
	gettimeofday(&newstamp, NULL);
	diffsec = newstamp.tv_sec - oldstamp.tv_sec;
	diffsec = newstamp.tv_usec - oldstamp.tv_usec + 1000000 * diffsec;
	printf("usec %ld\n", diffsec);
	if (p == MAP_FAILED || p4 != p3)
	//if (p == MAP_FAILED)
		perror("mremap"), exit(1);
	if (memcmp(p4, p2, SIZE))
		printf("mremap bug\n"), exit(1);
	printf("ok\n");

	return 0;
}
===

THP on

 Performance counter stats for './largepage13' (3 runs):

          69195836  dTLB-loads          ( +-   3.546% )  (scaled from 50.30%)
             60708  dTLB-load-misses    ( +-  11.776% )  (scaled from 52.62%)
         676266476  dTLB-stores         ( +-   5.654% )  (scaled from 69.54%)
             29856  dTLB-store-misses   ( +-   4.081% )  (scaled from 89.22%)
        1055848782  iTLB-loads          ( +-   4.526% )  (scaled from 80.18%)
              8689  iTLB-load-misses    ( +-   2.987% )  (scaled from 58.20%)

       7.314454164  seconds time elapsed   ( +-   0.023% )

THP off

 Performance counter stats for './largepage13' (3 runs):

        1967379311  dTLB-loads          ( +-   0.506% )  (scaled from 60.59%)
           9238687  dTLB-load-misses    ( +-  22.547% )  (scaled from 61.87%)
        2014239444  dTLB-stores         ( +-   0.692% )  (scaled from 60.40%)
           3312335  dTLB-store-misses   ( +-   7.304% )  (scaled from 67.60%)
        6764372065  iTLB-loads          ( +-   0.925% )  (scaled from 79.00%)
              8202  iTLB-load-misses    ( +-   0.475% )  (scaled from 70.55%)

       9.693655243  seconds time elapsed   ( +-   0.069% )

grep thp /proc/vmstat
thp_fault_alloc 35849
thp_fault_fallback 0
thp_collapse_alloc 3
thp_collapse_alloc_failed 0
thp_split 0

thp_split 0 confirms no thp split despite plenty of hugepages allocated.

The measurement of only the mremap time (so excluding the 3 long
memset and final long 10GB memory accessing memcmp):

THP on

usec 14824
usec 14862
usec 14859

THP off

usec 256416
usec 255981
usec 255847

With an older kernel without the mremap optimizations (the below patch
optimizes the non THP version too).

THP on

usec 392107
usec 390237
usec 404124

THP off

usec 444294
usec 445237
usec 445820

I guess with a threaded program that sends more IPI on large SMP it'd
create an even larger difference.

All debug options are off except DEBUG_VM to avoid skewing the
results.

The only problem for native 2M mremap like it happens above both the
source and destination address must be 2M aligned or the hugepmd can't be
moved without a split but that is an hardware limitation.

[akpm@linux-foundation.org: coding-style nitpicking]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2011-10-31 17:30:48 -07:00
..
acpi	PCI hotplug: acpiphp: Prevent deadlock on PCI-to-PCI bridge remove	2011-10-14 09:05:31 -07:00
asm-generic	dma-mapping: fix sync_single_range_* DMA debugging	2011-10-31 17:30:44 -07:00
crypto
drm	Revert "drm/ttm: add a way to bo_wait for either the last read or last write"	2011-10-27 18:28:37 +02:00
keys
linux	thp: mremap support and TLB optimization	2011-10-31 17:30:48 -07:00
math-emu
media	doc: fix broken references	2011-09-27 18:08:04 +02:00
mtd
net	ipv6: tcp: fix TCLASS value in ACK messages sent from TIME_WAIT	2011-10-27 00:44:35 -04:00
pcmcia
rdma	net: consolidate and fix ethtool_ops->get_settings calling	2011-09-15 17:32:26 -04:00
rxrpc
scsi	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6	2011-10-28 16:44:18 -07:00
sound	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound	2011-10-28 14:25:01 -07:00
target	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2011-10-25 12:11:02 +02:00
trace	mm: change isolate mode from #define to bitwise type	2011-10-31 17:30:44 -07:00
video	Merge branch 'for-florian' of git://gitorious.org/linux-omap-dss2/linux into fbdev-next	2011-10-15 00:19:52 +00:00
xen	Merge branches 'stable/drivers-3.2', 'stable/drivers.bugfixes-3.2' and 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen	2011-10-25 09:19:36 +02:00
Kbuild