linux/arch
Jussi Kivilinna f94a73f8dd crypto: twofish-avx - tune assembler code for more performance
Patch replaces 'movb' instructions with 'movzbl' to break false register
dependencies and interleaves instructions better for out-of-order scheduling.

Tested on Intel Core i5-2450M and AMD FX-8100.

tcrypt ECB results:

Intel Core i5-2450M:

size    old-vs-new      new-vs-3way     old-vs-3way
        enc     dec     enc     dec     enc     dec
256     1.12x   1.13x   1.36x   1.37x   1.21x   1.22x
1k      1.14x   1.14x   1.48x   1.49x   1.29x   1.31x
8k      1.14x   1.14x   1.50x   1.52x   1.32x   1.33x

AMD FX-8100:

size    old-vs-new      new-vs-3way     old-vs-3way
        enc     dec     enc     dec     enc     dec
256     1.10x   1.11x   1.01x   1.01x   0.92x   0.91x
1k      1.11x   1.12x   1.08x   1.07x   0.97x   0.96x
8k      1.11x   1.13x   1.10x   1.08x   0.99x   0.97x

[v2]
 - Do instruction interleaving another way to avoid adding new FPU<=>CPU
   register moves as these cause performance drop on Bulldozer.
 - Further interleaving improvements for better out-of-order scheduling.

Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:04 +08:00
..
alpha ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00
arm arm/crypto: Add optimized AES and SHA1 routines 2012-09-07 04:17:02 +08:00
avr32 ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00
blackfin Merge branch 'akpm' (Andrew's patch-bomb) 2012-07-30 17:25:34 -07:00
c6x Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2012-07-24 10:01:50 -07:00
cris ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00
frv Merge branch 'akpm' (Andrew's patch-bomb) 2012-07-30 17:25:34 -07:00
h8300 ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00
hexagon Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2012-07-30 11:24:53 -07:00
ia64 Merge branch 'akpm' (Andrew's patch-bomb) 2012-07-31 19:25:39 -07:00
m32r ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00
m68k ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00
microblaze ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00
mips mips: zero out pg_data_t when it's allocated 2012-07-31 18:42:49 -07:00
mn10300 Merge branch 'akpm' (Andrew's patch-bomb) 2012-07-30 17:25:34 -07:00
openrisc Remove useless wrappers of asm-generic/rmap.h 2012-06-28 11:29:26 +02:00
parisc PCI changes for the 3.6 merge window: 2012-07-24 16:17:07 -07:00
powerpc powerpc/crypto: add compression support to arch vec 2012-08-01 17:47:53 +08:00
s390 crypto: arch/s390 - cleanup - remove unneeded cra_list initialization 2012-08-01 17:47:29 +08:00
score
sh memcg: rename config variables 2012-07-31 18:42:43 -07:00
sparc This patch series contains a major revamp of how we collect entropy 2012-07-31 19:07:42 -07:00
tile memcg: rename config variables 2012-07-31 18:42:43 -07:00
um Merge branch 'akpm' (Andrew's patch-bomb) 2012-07-31 19:25:39 -07:00
unicore32 PCI changes for the 3.6 merge window: 2012-07-24 16:17:07 -07:00
x86 crypto: twofish-avx - tune assembler code for more performance 2012-09-07 04:17:04 +08:00
xtensa xtensa: select generic atomic64_t support 2012-07-31 18:42:39 -07:00
.gitignore
Kconfig ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00