linux/arch/x86
Waiman Long dd0fd8bca1 x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64
It was found when running fio sequential write test with a XFS ramdisk
on a KVM guest running on a 2-socket x86-64 system, the %CPU times
as reported by perf were as follows:

 69.75%  0.59%  fio  [k] down_write
 69.15%  0.01%  fio  [k] call_rwsem_down_write_failed
 67.12%  1.12%  fio  [k] rwsem_down_write_failed
 63.48% 52.77%  fio  [k] osq_lock
  9.46%  7.88%  fio  [k] __raw_callee_save___kvm_vcpu_is_preempt
  3.93%  3.93%  fio  [k] __kvm_vcpu_is_preempted

Making vcpu_is_preempted() a callee-save function has a relatively
high cost on x86-64 primarily due to at least one more cacheline of
data access from the saving and restoring of registers (8 of them)
to and from stack as well as one more level of function call.

To reduce this performance overhead, an optimized assembly version
of the the __raw_callee_save___kvm_vcpu_is_preempt() function is
provided for x86-64.

With this patch applied on a KVM guest on a 2-socket 16-core 32-thread
system with 16 parallel jobs (8 on each socket), the aggregrate
bandwidth of the fio test on an XFS ramdisk were as follows:

   I/O Type      w/o patch    with patch
   --------      ---------    ----------
   random read   8141.2 MB/s  8497.1 MB/s
   seq read      8229.4 MB/s  8304.2 MB/s
   random write  1675.5 MB/s  1701.5 MB/s
   seq write     1681.3 MB/s  1699.9 MB/s

There are some increases in the aggregated bandwidth because of
the patch.

The perf data now became:

 70.78%  0.58%  fio  [k] down_write
 70.20%  0.01%  fio  [k] call_rwsem_down_write_failed
 69.70%  1.17%  fio  [k] rwsem_down_write_failed
 59.91% 55.42%  fio  [k] osq_lock
 10.14% 10.14%  fio  [k] __kvm_vcpu_is_preempted

The assembly code was verified by using a test kernel module to
compare the output of C __kvm_vcpu_is_preempted() and that of assembly
__raw_callee_save___kvm_vcpu_is_preempt() to verify that they matched.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-21 12:48:35 +01:00
..
boot x86/boot: Add missing declaration of string functions 2017-01-09 11:53:05 +01:00
configs IOMMU Updates for Linux v4.9 2016-10-11 12:52:41 -07:00
crypto crypto: aesni - Fix failure when built-in with modular pcbc 2016-12-30 18:20:45 +08:00
entry x86/entry: Fix the end of the stack for newly forked tasks 2017-01-12 09:28:29 +01:00
events Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-01-18 11:13:41 -08:00
ia32 Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
include x86/paravirt: Change vcp_is_preempted() arg type to long 2017-02-21 12:48:06 +01:00
kernel x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 2017-02-21 12:48:35 +01:00
kvm KVM: VMX: use correct vmcs_read/write for guest segment selector/base 2017-02-21 12:45:49 +01:00
lguest clocksource: Use a plain u64 instead of cycle_t 2016-12-25 11:04:12 +01:00
lib Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
math-emu Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mm x86/mpx: Use compatible types in comparison to fix sparse error 2017-01-14 09:32:06 +01:00
net bpf: change back to orig prog on too many passes 2017-01-08 17:00:18 -05:00
oprofile x86/oprofile/nmi: Convert to hotplug state machine 2016-11-22 23:34:43 +01:00
pci x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F 2017-01-11 09:11:15 -06:00
platform Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-01-15 12:03:11 -08:00
power Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-18 13:59:10 -08:00
purgatory x86/kexec: add -fno-PIE 2016-11-09 22:28:09 +01:00
ras x86/RAS: Add TSC timestamp to the injected MCE 2016-11-08 17:10:13 +01:00
realmode x86/build: Don't use $(LINUXINCLUDE) twice 2016-11-28 07:49:17 +01:00
tools x86/tools: Fix gcc-7 warning in relocs.c 2016-12-19 11:50:24 +01:00
um Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
video
xen Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb 2017-01-06 10:53:21 -08:00
.gitignore
Kbuild
Kconfig Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-22 09:25:45 -08:00
Kconfig.cpu
Kconfig.debug
Makefile lib/raid6: Add AVX512 optimized gen_syndrome functions 2016-09-21 09:09:44 -07:00
Makefile_32.cpu
Makefile.um