linux/arch/x86
Jussi Kivilinna f94a73f8dd crypto: twofish-avx - tune assembler code for more performance
Patch replaces 'movb' instructions with 'movzbl' to break false register
dependencies and interleaves instructions better for out-of-order scheduling.

Tested on Intel Core i5-2450M and AMD FX-8100.

tcrypt ECB results:

Intel Core i5-2450M:

size    old-vs-new      new-vs-3way     old-vs-3way
        enc     dec     enc     dec     enc     dec
256     1.12x   1.13x   1.36x   1.37x   1.21x   1.22x
1k      1.14x   1.14x   1.48x   1.49x   1.29x   1.31x
8k      1.14x   1.14x   1.50x   1.52x   1.32x   1.33x

AMD FX-8100:

size    old-vs-new      new-vs-3way     old-vs-3way
        enc     dec     enc     dec     enc     dec
256     1.10x   1.11x   1.01x   1.01x   0.92x   0.91x
1k      1.11x   1.12x   1.08x   1.07x   0.97x   0.96x
8k      1.11x   1.13x   1.10x   1.08x   0.99x   0.97x

[v2]
 - Do instruction interleaving another way to avoid adding new FPU<=>CPU
   register moves as these cause performance drop on Bulldozer.
 - Further interleaving improvements for better out-of-order scheduling.

Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:04 +08:00
..
boot Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-07-26 13:13:25 -07:00
configs x86/kconfig: Remove CONFIG_TR=y from the defconfigs 2012-03-24 08:18:03 +01:00
crypto crypto: twofish-avx - tune assembler code for more performance 2012-09-07 04:17:04 +08:00
ia32 x86, compat: Use test_thread_flag(TIF_IA32) in compat signal delivery 2012-06-14 18:16:04 -07:00
include/asm Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-07-31 15:34:13 -07:00
kernel Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-07-31 15:34:13 -07:00
kvm KVM updates for the 3.6 merge window 2012-07-24 12:01:20 -07:00
lguest
lib Merge branch 'x86/cpu' into perf/core 2012-07-05 21:12:11 +02:00
math-emu x86: Rename trap_no to trap_nr in thread_struct 2012-03-13 06:24:09 +01:00
mm Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-07-26 13:17:17 -07:00
net x86 bpf_jit: support BPF_S_ANC_ALU_XOR_X instruction 2012-06-06 09:42:44 -07:00
oprofile perf/x86/amd: Unify AMD's generic and family 15h pmus 2012-07-05 21:19:41 +02:00
pci Merge branch 'pci/myron-pcibios_setup' into next 2012-07-05 15:31:05 -06:00
platform Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-07-26 13:17:17 -07:00
power x86, kvm: Call restore_sched_clock_state() only after %gs is initialized 2012-04-02 13:53:00 +02:00
realmode x86-64, reboot: Be more paranoid in 64-bit reboot=bios 2012-06-21 10:25:03 -07:00
syscalls syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
tools x86/decoder: Fix bsr/bsf/jmpe decoding with operand-size prefix 2012-06-06 08:54:18 +02:00
um x86, um: Correct syscall table type attributes breaking gcc 4.8 2012-06-09 12:51:09 -07:00
vdso x86, cpu: Rename checking_wrmsrl() to wrmsrl_safe() 2012-06-07 13:32:04 -07:00
video x86: Use vga_default_device() when determining whether an fb is primary 2012-04-24 09:50:17 +01:00
xen Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-07-26 13:17:17 -07:00
.gitignore
Kbuild x86, realmode: realmode.bin infrastructure 2012-05-08 11:41:48 -07:00
Kconfig ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00
Kconfig.cpu x86: Tighten dependencies of CPU_SUP_*_32 2012-03-08 10:57:34 +01:00
Kconfig.debug x86/tlb: add tlb_flushall_shift knob into debugfs 2012-06-27 19:29:10 -07:00
Makefile x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported 2012-06-23 19:25:22 -07:00
Makefile_32.cpu
Makefile.um um: fix linker script generation 2012-04-09 13:59:00 -04:00