linux/include
Xiao Wang a43fe27d65
riscv: Optimize crc32 with Zbc extension
As suggested by the B-ext spec, the Zbc (carry-less multiplication)
instructions can be used to accelerate CRC calculations. Currently, the
crc32 is the most widely used crc function inside kernel, so this patch
focuses on the optimization of just the crc32 APIs.

Compared with the current table-lookup based optimization, Zbc based
optimization can also achieve large stride during CRC calculation loop,
meantime, it avoids the memory access latency of the table-lookup based
implementation and it reduces memory footprint.

If Zbc feature is not supported in a runtime environment, then the
table-lookup based implementation would serve as fallback via alternative
mechanism.

By inspecting the vmlinux built by gcc v12.2.0 with default optimization
level (-O2), we can see below instruction count change for each 8-byte
stride in the CRC32 loop:

rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13)
rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16)

The compile target CPU is little endian, extra effort is needed for byte
swapping for the crc32_be API, thus, the instruction count change is not
as significant as that in the *_le cases.

This patch is tested on QEMU VM with the kernel CRC32 selftest for both
rv64 and rv32. Running the CRC32 selftest on a real hardware (SpacemiT K1)
with Zbc extension shows 65% and 125% performance improvement respectively
on crc32_test() and crc32c_test().

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-10 13:19:50 -07:00
..
acpi The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
asm-generic asm-generic cleanups for 6.10 2024-05-20 15:18:34 -07:00
clocksource
crypto This push fixes a bug in the new ecc P521 code as well as a buggy 2024-05-20 08:47:54 -07:00
drm drm fixes for 6.10-rc1 2024-05-24 17:28:02 -07:00
dt-bindings - Core Frameworks 2024-05-22 10:49:54 -07:00
keys Hi, 2024-05-13 10:40:15 -07:00
kunit kunit: Print last test location on fault 2024-05-06 14:22:02 -06:00
kvm Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next 2024-05-08 16:41:50 +01:00
linux riscv: Optimize crc32 with Zbc extension 2024-07-10 13:19:50 -07:00
math-emu
media media: cec.h: Fix kerneldoc 2024-05-04 10:19:59 +02:00
memory
misc
net more s390 updates for 6.10 merge window 2024-05-21 12:09:36 -07:00
pcmcia
ras tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
rdma The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
rv
scsi SCSI misc on 20240514 2024-05-14 18:25:53 -07:00
soc I'm actually surprised this time. There aren't any new Qualcomm SoC clk 2024-05-18 12:48:37 -07:00
sound ASoC: Fixes for v6.10 2024-05-23 13:29:27 +02:00
target
trace block-6.10-20240523 2024-05-23 13:44:47 -07:00
uapi drm fixes for 6.10-rc1 2024-05-24 17:28:02 -07:00
ufs
vdso
video
xen