linux/arch/arm64/lib
Ard Biesheuvel efdb25efc7 arm64/lib: improve CRC32 performance for deep pipelines
Improve the performance of the crc32() asm routines by getting rid of
most of the branches and small-sized loads on the common path.

Instead, use a branchless code path involving overlapping 16-byte
loads to process the first (length % 32) bytes, and process the
remainder using a loop that processes 32 bytes at a time.
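
The length split described above can be sketched in portable C. This is
only an illustration under stated assumptions: the names crc32_le_bytes
and crc32_le_sketch are hypothetical, a bitwise software CRC stands in
for the arm64 CRC32 instructions, and the branchless overlapping-load
handling of the head bytes is not reproduced. Only the
"first (length % 32) bytes, then 32 bytes per iteration" structure is
shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Portable stand-in for the hardware CRC32 instructions (assumption:
 * reflected polynomial 0xEDB88320, no pre/post inversion, matching the
 * crc32_le() convention). Processes one byte at a time.
 */
static uint32_t crc32_le_bytes(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return crc;
}

/*
 * Hypothetical sketch of the split only: the head is handled once, then
 * the remainder is consumed in fixed 32-byte steps.
 */
static uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
    size_t head = len % 32;   /* leading bytes, handled before the loop */
    size_t body = len - head; /* what is left is a multiple of 32 */

    assert(body % 32 == 0);

    crc = crc32_le_bytes(crc, p, head);
    p += head;

    /* Main loop: exactly 32 bytes per iteration, no tail handling needed. */
    for (; body != 0; body -= 32, p += 32)
        crc = crc32_le_bytes(crc, p, 32);

    return crc;
}
```

Because the head is consumed first, the loop never needs a length check
other than "is anything left", which is what keeps the common path free
of small loads and extra branches in the real routine.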

Tested using the following test program:

  #include <stdlib.h>

  extern unsigned int crc32_le(unsigned int, char const *, unsigned long);

  int main(void)
  {
    static const char buf[4096];

    srand(20181126);

    for (int i = 0; i < 100 * 1000 * 1000; i++)
      crc32_le(0, buf, rand() % 1024);

    return 0;
  }

On Cortex-A53 and Cortex-A57, performance regresses, but only very
slightly. On Cortex-A72, however, performance improves from

  $ time ./crc32

  real  0m10.149s
  user  0m10.149s
  sys   0m0.000s

to

  $ time ./crc32

  real  0m7.915s
  user  0m7.915s
  sys   0m0.000s

Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:58:04 +00:00
atomic_ll_sc.c arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics 2015-07-27 15:28:50 +01:00
clear_page.S
clear_user.S arm64: uaccess: Mask __user pointers for __arch_{clear, copy_*}_user 2018-02-06 22:53:40 +00:00
copy_from_user.S arm64: kpti: Fix the interaction between ASID switching and software PAN 2018-01-16 17:37:48 +00:00
copy_in_user.S arm64: uaccess: Mask __user pointers for __arch_{clear, copy_*}_user 2018-02-06 22:53:40 +00:00
copy_page.S arm64/lib: copy_page: use consistent prefetch stride 2017-07-25 10:04:42 +01:00
copy_template.S scripts/spelling.txt: add "overwritting" pattern and fix typo instances 2017-02-27 18:43:47 -08:00
copy_to_user.S arm64: kpti: Fix the interaction between ASID switching and software PAN 2018-01-16 17:37:48 +00:00
crc32.S arm64/lib: improve CRC32 performance for deep pipelines 2018-11-30 13:58:04 +00:00
delay.c arm64: use WFE for long delays 2017-10-13 18:56:15 +01:00
Makefile arm64: lse: remove -fcall-used-x0 flag 2018-09-24 10:56:24 +01:00
memchr.S arm64: lib: use C string functions with KASAN enabled 2018-10-26 16:25:18 -07:00
memcmp.S arm64: lib: use C string functions with KASAN enabled 2018-10-26 16:25:18 -07:00
memcpy.S arm64: add KASAN support 2015-10-12 17:46:36 +01:00
memmove.S arm64: add KASAN support 2015-10-12 17:46:36 +01:00
memset.S arm64: add KASAN support 2015-10-12 17:46:36 +01:00
strchr.S arm64: lib: use C string functions with KASAN enabled 2018-10-26 16:25:18 -07:00
strcmp.S arm64: lib: use C string functions with KASAN enabled 2018-10-26 16:25:18 -07:00
strlen.S arm64: lib: use C string functions with KASAN enabled 2018-10-26 16:25:18 -07:00
strncmp.S arm64: lib: use C string functions with KASAN enabled 2018-10-26 16:25:18 -07:00
strnlen.S arm64: lib: use C string functions with KASAN enabled 2018-10-26 16:25:18 -07:00
strrchr.S arm64: lib: use C string functions with KASAN enabled 2018-10-26 16:25:18 -07:00
tishift.S arm64: export tishift functions to modules 2018-05-21 19:00:48 +01:00
uaccess_flushcache.c arm64: uaccess: Add the uaccess_flushcache.c file 2017-08-10 10:49:21 +01:00