linux

Eric Biggers 1862eb0073 crypto: arm/blake2b - add NEON-accelerated BLAKE2b	2021-01-03 08:41:39 +11:00

Add a NEON-accelerated implementation of BLAKE2b.

On Cortex-A7 (which these days is the most common ARM processor that doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as SHA-256, and slightly faster than SHA-1. It is also almost three times as fast as the generic implementation of BLAKE2b:

    Algorithm            Cycles per byte (on 4096-byte messages)
    ===================  =======================================
    blake2b-256-neon     14.0
    sha1-neon            16.3
    blake2s-256-arm      18.8
    sha1-asm             20.8
    blake2s-256-generic  26.0
    sha256-neon          28.9
    sha256-asm           32.0
    blake2b-256-generic  38.9

This implementation isn't directly based on any other implementation, but it borrows some ideas from previous NEON code I've written as well as from chacha-neon-core.S. At least on Cortex-A7, it is faster than the other NEON implementations of BLAKE2b I'm aware of (the implementation in the BLAKE2 official repository using intrinsics, and Andrew Moon's implementation which can be found in SUPERCOP). It does only one block at a time, so it performs well on short messages too.

NEON-accelerated BLAKE2b is useful because there is interest in using BLAKE2b-256 for dm-verity on low-end Android devices (specifically, devices that lack the ARMv8 Crypto Extensions) to replace SHA-1. On these devices, the performance cost of upgrading to SHA-256 may be unacceptable, whereas BLAKE2b-256 would actually improve performance.

Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which is intended for 32-bit platforms), on 32-bit ARM processors with NEON, BLAKE2b is actually faster than BLAKE2s. This is because NEON supports 64-bit operations, and because BLAKE2s's block size is too small for NEON to be helpful for it. The best I've been able to do with BLAKE2s on Cortex-A7 is 18.8 cpb with an optimized scalar implementation.

(I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but they're more complex as they require running multiple hashes at once. Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7, so I expect that any speedup from BLAKE2sp or BLAKE3 would come only from the smaller number of rounds, not from the extra parallelism.)

For now this BLAKE2b implementation is only wired up to the shash API, since there is no library API for BLAKE2b yet. However, I've tried to keep things consistent with BLAKE2s, e.g. by defining blake2b_compress_arch() which is analogous to blake2s_compress_arch() and could be exported for use by the library API later if needed.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
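
The relative-speed claims in the commit message ("over twice as fast as SHA-256", "slightly faster than SHA-1", "almost three times as fast as the generic implementation") can be checked against its cycles-per-byte figures. The following is a minimal sketch in Python, not part of the kernel tree; the dictionary values are copied from the commit message, while the names `CYCLES_PER_BYTE` and `speedup` are hypothetical helpers introduced only for illustration.

```python
# Cycles per byte on 4096-byte messages (Cortex-A7), copied from the
# benchmark figures quoted in the commit message. Lower is faster.
CYCLES_PER_BYTE = {
    "blake2b-256-neon": 14.0,
    "sha1-neon": 16.3,
    "blake2s-256-arm": 18.8,
    "sha1-asm": 20.8,
    "blake2s-256-generic": 26.0,
    "sha256-neon": 28.9,
    "sha256-asm": 32.0,
    "blake2b-256-generic": 38.9,
}


def speedup(baseline: str, candidate: str = "blake2b-256-neon") -> float:
    """Return how many times faster `candidate` is than `baseline`.

    Hypothetical helper: the ratio of cycles per byte, so a value above
    1.0 means `candidate` needs fewer cycles per byte than `baseline`.
    """
    return CYCLES_PER_BYTE[baseline] / CYCLES_PER_BYTE[candidate]


if __name__ == "__main__":
    # Compare the NEON BLAKE2b code against the algorithms the commit
    # message singles out.
    for name in ("sha256-neon", "sha1-neon", "blake2b-256-generic"):
        print(f"blake2b-256-neon vs {name}: {speedup(name):.2f}x")
```

These ratios only restate the quoted measurements for one CPU and one message size; they say nothing about other cores or about short messages.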
..
.gitignore	SPDX patches for 5.7-rc1.	2020-04-03 13:12:26 -07:00
aes-ce-core.S	crypto: arm/aes-ce - work around Cortex-A57/A72 silion errata	2020-12-04 18:13:14 +11:00
aes-ce-glue.c	crypto: remove CRYPTO_TFM_RES_BAD_KEY_LEN	2020-01-09 11:30:53 +08:00
aes-cipher-core.S	crypto: arm/aes-cipher - switch to shared AES inverse Sbox	2019-07-26 14:58:37 +10:00
aes-cipher-glue.c	crypto: arm/aes-scalar - unexport en/decryption routines	2019-07-26 14:58:38 +10:00
aes-neonbs-core.S	crypto: arm/aes-neonbs - avoid loading reorder argument on encryption	2020-09-25 17:48:15 +10:00
aes-neonbs-glue.c	crypto: remove cipher routines from public crypto API	2021-01-03 08:41:35 +11:00
blake2b-neon-core.S	crypto: arm/blake2b - add NEON-accelerated BLAKE2b	2021-01-03 08:41:39 +11:00
blake2b-neon-glue.c	crypto: arm/blake2b - add NEON-accelerated BLAKE2b	2021-01-03 08:41:39 +11:00
blake2s-core.S	crypto: arm/blake2s - add ARM scalar optimized BLAKE2s	2021-01-03 08:41:39 +11:00
blake2s-glue.c	crypto: arm/blake2s - add ARM scalar optimized BLAKE2s	2021-01-03 08:41:39 +11:00
chacha-glue.c	crypto: arm/chacha-neon - add missing counter increment	2021-01-03 08:35:35 +11:00
chacha-neon-core.S	crypto: arm/chacha-neon - optimize for non-block size multiples	2020-11-13 20:38:44 +11:00
chacha-scalar-core.S	crypto: arm/chacha - remove dependency on generic ChaCha driver	2019-11-17 09:02:40 +08:00
crc32-ce-core.S	crypto: Replace HTTP links with HTTPS ones	2020-07-23 17:34:20 +10:00
crc32-ce-glue.c	crypto: remove CRYPTO_TFM_RES_BAD_KEY_LEN	2020-01-09 11:30:53 +08:00
crct10dif-ce-core.S	crypto: arm - use Kconfig based compiler checks for crypto opcodes	2019-10-23 19:46:56 +11:00
crct10dif-ce-glue.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500	2019-06-19 17:09:55 +02:00
curve25519-core.S	crypto: arm/curve25519 - wire up NEON implementation	2019-11-17 09:02:44 +08:00
curve25519-glue.c	crypto: arm/curve25519 - include <linux/scatterlist.h>	2020-08-25 11:24:07 +10:00
ghash-ce-core.S	crypto: arm/ghash-ce - define fpu before fpu registers are referenced	2020-03-06 12:28:25 +11:00
ghash-ce-glue.c	crypto: arm/ghash - use variably sized key struct	2020-07-09 22:14:33 +10:00
Kconfig	crypto: arm/blake2b - add NEON-accelerated BLAKE2b	2021-01-03 08:41:39 +11:00
Makefile	crypto: arm/blake2b - add NEON-accelerated BLAKE2b	2021-01-03 08:41:39 +11:00
nh-neon-core.S	crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305	2018-11-20 14:26:56 +08:00
nhpoly1305-neon-glue.c	crypto: arch/nhpoly1305 - process in explicit 4k chunks	2020-04-30 15:16:59 +10:00
poly1305-armv4.pl	crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation	2019-11-17 09:02:42 +08:00
poly1305-core.S_shipped	crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation	2019-11-17 09:02:42 +08:00
poly1305-glue.c	crypto: arm/poly1305 - Add prototype for poly1305_blocks_neon	2020-09-04 17:57:14 +10:00
sha1_glue.c	crypto: sha - split sha.h into sha1.h and sha2.h	2020-11-20 14:45:33 +11:00
sha1_neon_glue.c	crypto: sha - split sha.h into sha1.h and sha2.h	2020-11-20 14:45:33 +11:00
sha1-armv4-large.S	crypto: Replace HTTP links with HTTPS ones	2020-07-23 17:34:20 +10:00
sha1-armv7-neon.S	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
sha1-ce-core.S	crypto: arm - use Kconfig based compiler checks for crypto opcodes	2019-10-23 19:46:56 +11:00
sha1-ce-glue.c	crypto: sha - split sha.h into sha1.h and sha2.h	2020-11-20 14:45:33 +11:00
sha1.h	crypto: sha - split sha.h into sha1.h and sha2.h	2020-11-20 14:45:33 +11:00
sha2-ce-core.S	crypto: arm - use Kconfig based compiler checks for crypto opcodes	2019-10-23 19:46:56 +11:00
sha2-ce-glue.c	crypto: sha - split sha.h into sha1.h and sha2.h	2020-11-20 14:45:33 +11:00
sha256_glue.c	crypto: sha - split sha.h into sha1.h and sha2.h	2020-11-20 14:45:33 +11:00
sha256_glue.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
sha256_neon_glue.c	crypto: sha - split sha.h into sha1.h and sha2.h	2020-11-20 14:45:33 +11:00
sha256-armv4.pl	crypto: arm/sha256-neon - avoid ADRL pseudo instruction	2020-09-25 17:48:13 +10:00
sha256-core.S_shipped	crypto: arm/sha256-neon - avoid ADRL pseudo instruction	2020-09-25 17:48:13 +10:00
sha512-armv4.pl	crypto: arm/sha512-neon - avoid ADRL pseudo instruction	2020-09-25 17:48:14 +10:00
sha512-core.S_shipped	crypto: arm/sha512-neon - avoid ADRL pseudo instruction	2020-09-25 17:48:14 +10:00
sha512-glue.c	crypto: sha - split sha.h into sha1.h and sha2.h	2020-11-20 14:45:33 +11:00
sha512-neon-glue.c	crypto: sha - split sha.h into sha1.h and sha2.h	2020-11-20 14:45:33 +11:00
sha512.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00