linux/arch/x86
Shaohua Li 70e4a36973 x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32
Make the maxium TLB invalidate vectors depend on NR_CPUS linearly,
with a maximum of 32 vectors.

We currently only have 8 vectors for TLB invalidation and that is clearly
inadequate. If we have a lot of CPUs, the CPUs need share the 8 vectors and
tlbstate_lock is used to protect them. flush_tlb_page() is
heavily used in page reclaim, which will cause a lot of lock
contention for tlbstate_lock.

Andi Kleen suggested increasing the vectors number to 32, which should be
good for current typical systems to reduce the tlbstate_lock contention.

My test system has 4 sockets and 64G memory, and 64 CPUs. My
workload creates 64 processes. Each process mmap reads a big
empty sparse file. The total size of the files are 2*total_mem,
so this will cause a lot of page reclaim.

Below is the result I get from perf call-graph profiling:

 without the patch:
 ------------------

    24.25%           usemem  [kernel]                                   [k] _raw_spin_lock
                     |
                     --- _raw_spin_lock
                        |
                        |--42.15%-- native_flush_tlb_others

 with the patch:
 ------------------

    14.96%           usemem  [kernel]                                   [k] _raw_spin_lock
                     |
                     --- _raw_spin_lock
                        |--13.89%-- native_flush_tlb_others

So this heavily reduces the tlbstate_lock contention.

Suggested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1295232727.1949.709.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 13:03:08 +01:00
..
boot x86: support XZ-compressed kernel 2011-01-13 08:03:25 -08:00
configs defconfig reduction 2010-08-14 22:26:53 +02:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2011-01-13 10:25:58 -08:00
ia32 BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
include/asm x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32 2011-02-14 13:03:08 +01:00
kernel x86: Allocate 32 tlb_invalidate_interrupt handler stubs 2011-02-14 13:03:08 +01:00
kvm KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index 2011-02-09 18:31:36 +02:00
lguest LGUEST_GUEST: fix unmet direct dependencies (VIRTUALIZATION && VIRTIO) 2011-01-20 21:37:30 +10:30
lib x86: udelay: Use this_cpu_read to avoid address calculation 2011-01-04 06:08:55 +01:00
math-emu
mm x86, nx: Don't force pages RW when setting NX bits 2011-02-02 16:02:36 -08:00
oprofile Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-11 11:02:13 -08:00
pci Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 2011-01-14 09:29:05 -08:00
platform x86: OLPC: convert olpc-xo1 driver from pci device to platform device 2011-01-14 12:38:15 +01:00
power x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states 2010-08-20 14:59:02 +02:00
tools
vdso x86, gcc-4.6: Use gcc -m options when building vdso 2010-12-13 16:08:37 -08:00
video
xen xen/setup: Route halt operations to safe_halt pvop. 2011-01-27 12:00:24 -05:00
.gitignore
Kbuild x86: Add platform directory 2010-10-27 14:30:01 +02:00
Kconfig kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
Kconfig.cpu kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
Kconfig.debug kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
Makefile Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-21 13:06:00 -07:00
Makefile_32.cpu jump label: Add work around to i386 gcc asm goto bug 2010-10-29 14:45:29 -04:00