linux/arch/arm64
Ard Biesheuvel 11031c0d7d crypto: arm64/gcm-ce - implement 4 way interleave
To improve performance on cores with deep pipelines such as ThunderX2,
reimplement gcm(aes) using a 4-way interleave rather than the 2-way
interleave we use currently.

This comes down to a complete rewrite of the GCM part of the combined
GCM/GHASH driver, and instead of interleaving two invocations of AES
with the GHASH handling at the instruction level, the new version
uses a more coarse grained approach where each chunk of 64 bytes is
encrypted first and then ghashed (or ghashed and then decrypted in
the converse case).

The core NEON routine is now able to consume inputs of any size,
and tail blocks of less than 64 bytes are handled using overlapping
loads and stores, and processed by the same 4-way encryption and
hashing routines. This gets rid of most of the branches, and avoids
having to return to the C code to handle the tail block using a
stack buffer.

The table below compares the performance of the old driver and the new
one on various micro-architectures and running in various modes.

        |     AES-128      |     AES-192      |     AES-256      |
 #bytes | 512 | 1500 |  4k | 512 | 1500 |  4k | 512 | 1500 |  4k |
 -------+-----+------+-----+-----+------+-----+-----+------+-----+
    TX2 | 35% |  23% | 11% | 34% |  20% |  9% | 38% |  25% | 16% |
   EMAG | 11% |   6% |  3% | 12% |   4% |  2% | 11% |   4% |  2% |
    A72 |  8% |   5% | -4% |  9% |   4% | -5% |  7% |   4% | -5% |
    A53 | 11% |   6% | -1% | 10% |   8% | -1% | 10% |   8% | -2% |

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-10-05 01:04:31 +10:00
..
boot pci-v5.4-changes 2019-09-23 19:16:01 -07:00
configs arm64: defconfig: Enable Qualcomm QUSB2 PHY 2019-09-04 17:15:29 +02:00
crypto crypto: arm64/gcm-ce - implement 4 way interleave 2019-10-05 01:04:31 +10:00
include mm: treewide: clarify pgtable_page_{ctor,dtor}() naming 2019-09-26 10:10:44 -07:00
kernel arm64, mm: make randomization selected by generic topdown mmap layout 2019-09-24 15:54:11 -07:00
kvm * s390: ioctl hardening, selftests 2019-09-18 09:49:13 -07:00
lib Merge branch 'for-next/atomics' into for-next/core 2019-08-30 12:55:39 +01:00
mm mm: treewide: clarify pgtable_page_{ctor,dtor}() naming 2019-09-26 10:10:44 -07:00
net arm64: bpf: optimize modulo operation 2019-09-03 15:44:40 +02:00
xen treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Kbuild arm64: add arch/arm64/Kbuild 2019-08-21 18:47:15 +01:00
Kconfig Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
Kconfig.debug treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Kconfig.platforms arm64: exynos: Enable exynos-chipid driver 2019-09-04 22:43:26 +02:00
Makefile Kbuild updates for v5.4 2019-09-20 08:36:47 -07:00