The only remaining user of the fallback implementation of 64x64
polynomial multiplication using 8x8 PMULL instructions is the final
reduction from a 16-byte vector to a 16-bit CRC.
The fallback code is complicated and messy, and this reduction has
little impact on the overall performance, so instead, let's calculate
the final CRC by passing the 16-byte vector to the generic CRC-T10DIF
implementation when running the fallback version.
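A minimal sketch of the fallback's final step (wrapper name and seed
handling illustrative; the real glue code differs):

#include <linux/crc-t10dif.h>

/* Reduce the 16-byte folded vector to the final 16-bit CRC by feeding
 * it to the generic table-driven implementation.  The seed of 0 assumes
 * the folding stage has already absorbed the running CRC. */
static u16 fold_to_crc(const u8 fold[16])
{
	return crc_t10dif_generic(0, fold, 16);
}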
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CRC-T10DIF implementation for arm64 has a version that uses 8x8
polynomial multiplication, for cores that lack the crypto extensions,
which cover the 64x64 polynomial multiplication instruction that the
algorithm was built around.
This fallback version rather naively adopted the 64x64 polynomial
multiplication algorithm that I ported from ARM for the GHASH driver,
which needs 8 PMULL8 instructions to implement one PMULL64. This is
reasonable, given that each 8-bit vector element needs to be multiplied
with each element in the other vector, producing 8 vectors with partial
results that need to be combined to yield the correct result.
However, most PMULL64 invocations in the CRC-T10DIF code involve
multiplication by a pair of 16-bit folding coefficients, and so all the
partial results from higher order bytes will be zero, and there is no
need to calculate them to begin with.
Then, the CRC-T10DIF algorithm always XORs the output values of the
PMULL64 instructions being issued in pairs, and so there is no need to
faithfully implement each individual PMULL64 instruction, as long as
XORing the results pairwise produces the expected result.
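As a scalar model of both observations (illustration only, not the NEON
code; uses the GCC/Clang __int128 extension):

#include <stdint.h>

/* Carry-less (polynomial) multiplication over GF(2).  When b is a
 * 16-bit folding coefficient, only bits 0..15 can be set, so the
 * partial products from the six high-order bytes of b are all zero
 * and need not be computed. */
static unsigned __int128 clmul64(uint64_t a, uint64_t b)
{
	unsigned __int128 acc = 0;
	int i;

	for (i = 0; i < 64; i++)
		if (b & (1ULL << i))
			acc ^= (unsigned __int128)a << i;
	return acc;
}

Likewise, because only clmul64(a, b) ^ clmul64(c, d) is ever consumed,
the partial products of the paired multiplications can be accumulated
together before the final combining step instead of materializing each
128-bit product separately.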
Implementing these improvements results in a speedup of 3.3x on low-end
platforms such as Raspberry Pi 4 (Cortex-A72).
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a partial revert of commit fc754c024a, which moved the logic
into C code that ensures kernel mode NEON code does not hog the CPU for
too long.
This is no longer needed now that kernel mode NEON no longer disables
preemption, so the logic can be dropped.
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The ahash_init functions may fail. ahash_hmac_init() should not return
success when ahash_init() returns an error. For example, ahash_init()
will return -ENOMEM when memory allocation fails.
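A minimal sketch of the corrected pattern (the HMAC-specific body is
elided):

static int ahash_hmac_init(struct ahash_request *req)
{
	int ret;

	ret = ahash_init(req);
	if (ret)
		return ret;	/* propagate e.g. -ENOMEM */

	/* ... HMAC-specific initialization ... */
	return 0;
}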
Fixes: 9d12ba86f8 ("crypto: brcm - Add Broadcom SPU driver")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
caam_rsa_set_priv_key_form() did not check for memory allocation errors.
Add the missing checks to caam_rsa_set_priv_key_form().
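A sketch of the fixed pattern (field names illustrative): each
duplication is checked, and everything allocated so far is unwound on
failure.

	rsa_key->p = kmemdup(raw_key->p, raw_key->p_sz, GFP_KERNEL);
	if (!rsa_key->p)
		return;

	rsa_key->q = kmemdup(raw_key->q, raw_key->q_sz, GFP_KERNEL);
	if (!rsa_key->q)
		goto free_p;

	/* ... remaining form-2 fields, each with the same check ... */
	return;

free_p:
	kfree_sensitive(rsa_key->p);
	rsa_key->p = NULL;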
Fixes: 52e26d77b8 ("crypto: caam - add support for RSA key form 2")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a driver for the random number generator present on the Broadcom
BCM74110 SoC.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a binding for the random number generator used on the BCM74110.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In commit 24cc57d8fa ("padata: Honor the caller's alignment in case of
chunk_size 0"), the line 'ps.chunk_size = max(ps.chunk_size, 1ul)' was
added, making 'ps.chunk_size = 1U' redundant and never executed.
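A sketch of the relevant lines (simplified; the surrounding code is
paraphrased):

	/* clamp added by commit 24cc57d8fa: */
	ps.chunk_size = max(ps.chunk_size, 1ul);

	/* the old zero-size fallback below it can no longer fire: */
	if (!ps.chunk_size)
		ps.chunk_size = 1U;	/* removed */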
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 320406cb60 ("crypto: inside-secure - Replace generic aes
with libaes") replaced crypto_alloc_cipher() with kmalloc(), but did not
modify the handling of the return value. When kmalloc() returns NULL,
PTR_ERR_OR_ZERO(NULL) returns 0, but in fact, the memory allocation has
failed, and -ENOMEM should be returned.
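A sketch of the bug and the fix (variable names illustrative):

	/* Before: kmalloc() returns NULL on failure, and
	 * PTR_ERR_OR_ZERO(NULL) evaluates to 0, so the failure was
	 * silently treated as success. */
	ctx->aes = kmalloc(sizeof(*ctx->aes), GFP_KERNEL);
	ret = PTR_ERR_OR_ZERO(ctx->aes);

	/* After: a NULL return must be checked explicitly. */
	ctx->aes = kmalloc(sizeof(*ctx->aes), GFP_KERNEL);
	if (!ctx->aes)
		return -ENOMEM;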
Fixes: 320406cb60 ("crypto: inside-secure - Replace generic aes with libaes")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Acked-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
adf_init_aer() does not destroy device_reset_wq when alloc_workqueue()
for device_sriov_wq fails. Add a destroy_workqueue() call for
device_reset_wq to fix this issue.
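A sketch of the missing unwind (simplified from the init path; names
and flags illustrative):

	device_reset_wq = alloc_workqueue("qat_device_reset_wq",
					  WQ_MEM_RECLAIM, 0);
	if (!device_reset_wq)
		return -EFAULT;

	device_sriov_wq = alloc_workqueue("qat_device_sriov_wq", 0, 0);
	if (!device_sriov_wq) {
		destroy_workqueue(device_reset_wq);	/* the added cleanup */
		return -EFAULT;
	}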
Fixes: 4469f9b234 ("crypto: qat - re-enable sriov after pf reset")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 1e562deace ("crypto: rsassa-pkcs1 - Migrate to sig_alg backend")
enforced that rsassa-pkcs1 sign/verify operations specify a hash
algorithm. That is necessary because per RFC 8017 sec 8.2, a hash
algorithm identifier must be prepended to the hash before generating or
verifying the signature ("Full Hash Prefix").
However the commit went too far in that it changed user space behavior:
KEYCTL_PKEY_QUERY system calls now return -EINVAL unless they specify a
hash algorithm. Intel Wireless Daemon (iwd) is one application issuing
such system calls (for EAP-TLS).
Closer analysis of the Embedded Linux Library (ell) used by iwd reveals
that the problem runs even deeper: When iwd uses TLS 1.1 or earlier, it
not only queries for keys, but performs sign/verify operations without
specifying a hash algorithm. These legacy TLS versions concatenate an
MD5 to a SHA-1 hash and omit the Full Hash Prefix:
https://git.kernel.org/pub/scm/libs/ell/ell.git/tree/ell/tls-suites.c#n97
TLS 1.1 was deprecated in 2021 by RFC 8996, but removal of support was
inadvertent in this case. It probably should be coordinated with iwd
maintainers first.
So reinstate support for such legacy protocols by defaulting to hash
algorithm "none" which uses an empty Full Hash Prefix.
If it is later on decided to remove TLS 1.1 support but still allow
KEYCTL_PKEY_QUERY without a hash algorithm, that can be achieved by
reverting the present commit and replacing it with the following patch:
https://lore.kernel.org/r/ZxalYZwH5UiGX5uj@wunner.de/
It's worth noting that Python's cryptography library gained support for
such legacy use cases very recently, so they do seem to still be a thing.
The Python developers identified IKE version 1 as another protocol
omitting the Full Hash Prefix:
https://github.com/pyca/cryptography/issues/10226
https://github.com/pyca/cryptography/issues/5495
The author of those issues, Zoltan Kelemen, spent considerable effort
searching for test vectors but only found one in a 2019 blog post by
Kevin Jones. Add it to testmgr.h to verify correctness of this feature.
Examination of wpa_supplicant as well as various IKE daemons (libreswan,
strongswan, isakmpd, raccoon) has determined that none of them seems to
use the kernel's Key Retention Service, so iwd is the only affected user
space application known so far.
Fixes: 1e562deace ("crypto: rsassa-pkcs1 - Migrate to sig_alg backend")
Reported-by: Klara Modin <klarasmodin@gmail.com>
Tested-by: Klara Modin <klarasmodin@gmail.com>
Closes: https://lore.kernel.org/r/2ed09a22-86c0-4cf0-8bda-ef804ccb3413@gmail.com/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If an error indicating that the device needs to be reset is reported,
disable error reporting until the device reset is complete, then
re-enable it once the reset has completed, to prevent the same error
from being reported repeatedly.
Fixes: eaebf4c3b1 ("crypto: hisilicon - Unify hardware error init/uninit into QM")
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Query the capability register status of accelerator devices
(SEC, HPRE and ZIP) through the debugfs interface, for example:
cat cap_regs. The purpose is to improve the robustness of hardware
devices and drivers and to make problems easier to locate.
Signed-off-by: Qi Tao <taoqi10@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
encrypt_blob(), decrypt_blob() and create_signature() were some of the
functions added in 2018 by
commit 5a30771832 ("KEYS: Provide missing asymmetric key subops for new
key type ops [ver #2]");
however, they have never been used.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
After commit 0edb555a65 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.
Convert all platform drivers below drivers/char/hw_random to use
.remove(), with the eventual goal to drop struct
platform_driver::remove_new(). As .remove() and .remove_new() have the
same prototypes, conversion is done by just changing the structure
member name in the driver initializer.
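The change per driver is a one-liner, e.g. for a hypothetical driver:

static struct platform_driver foo_rng_driver = {
	.probe	= foo_rng_probe,
	.remove	= foo_rng_remove,	/* was: .remove_new = foo_rng_remove */
	.driver	= {
		.name = "foo-rng",
	},
};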
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The explicit crypto_engine_stop() call is not needed, as it is already
called internally by crypto_engine_exit().
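A sketch of the simplification in the remove path (handle name
illustrative):

	/* Before: */
	crypto_engine_stop(dd->engine);
	crypto_engine_exit(dd->engine);

	/* After: crypto_engine_exit() stops the engine itself. */
	crypto_engine_exit(dd->engine);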
Signed-off-by: Ovidiu Panait <ovidiu.panait.oss@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The explicit crypto_engine_stop() call is not needed, as it is already
called internally by crypto_engine_exit().
Signed-off-by: Ovidiu Panait <ovidiu.panait.oss@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move crypto_simd_disabled_for_test to lib/ so that crypto_simd_usable()
can be used by library code.
This was discussed previously
(https://lore.kernel.org/linux-crypto/20220716062920.210381-4-ebiggers@kernel.org/)
but was not done because there was no use case yet. However, this is
now needed for the arm64 CRC32 library code.
Tested with:
export ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
echo CONFIG_CRC32=y > .config
echo CONFIG_MODULES=y >> .config
echo CONFIG_CRYPTO=m >> .config
echo CONFIG_DEBUG_KERNEL=y >> .config
echo CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n >> .config
echo CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y >> .config
make olddefconfig
make -j$(nproc)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The while loop breaks on the first iteration because of an incorrect
if condition, which also makes the statements after the if appear dead.
Fix this by changing the condition from if (timeout--) to
if (!timeout--).
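A sketch of the pattern (simplified; device_ready() is an illustrative
stand-in for the actual status check):

	while (!device_ready(cpt)) {
		if (!timeout--)	/* was: if (timeout--), which broke
				 * out on the very first pass */
			break;
		udelay(30);
	}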
This bug was reported by Coverity Scan.
Report:
CID 1600859: (#1 of 1): Logically dead code (DEADCODE)
dead_error_line: Execution cannot reach this statement: udelay(30UL);
Fixes: 9e2c7d9994 ("crypto: cavium - Add Support for Octeon-tx CPT Engine")
Signed-off-by: Everest K.C. <everestkc@everestkc.com.np>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Document the crypto engine on the SA8775P Platform.
Signed-off-by: Yuvaraj Ranganathan <quic_yrangana@quicinc.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for the Airoha TRNG. The Airoha SoC provides a True RNG
module that can output 4 bytes of raw data at a time.
The module makes use of various noise sources to provide true random
number generation.
On probe, the module is reset so that it runs its health test and
verifies correct operation.
The module can also provide DRBG functionality, but the execution modes
are mutually exclusive: running as a TRNG does not permit also running
it as a DRBG.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for the Airoha EN7581 True Random Number Generator.
This module can generate up to 4 bytes of raw data at a time and
supports a self health test at startup. The module gathers noise for
randomness from various sources, such as the ADC, AP, dedicated clocks
and other devices attached to the SoC, producing true random numbers.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fix a spelling mistake in comments: 'accelaration' should be
'acceleration'.
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove returns that are immediately followed by another return.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stop using FRAME_BEGIN and FRAME_END in the AEGIS assembly functions,
since all these functions are now leaf functions. This eliminates some
unnecessary instructions.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Update a caller of aegis128_aesni_ad() to round down the length to a
block boundary. After that, aegis128_aesni_ad(), aegis128_aesni_enc(),
and aegis128_aesni_dec() are only passed whole blocks. Update the
assembly code to take advantage of that, which eliminates some unneeded
instructions. For aegis128_aesni_enc() and aegis128_aesni_dec(), the
length is also always nonzero, so stop checking for zero length.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Optimize the code that loads and stores partial blocks, taking advantage
of SSE4.1. The code is adapted from that in aes-gcm-aesni-x86_64.S.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Adjust the prototypes of the AEGIS assembly functions (a sketch of the
resulting style follows this list):
- Use proper types instead of 'void *', when applicable.
- Move the length parameter to after the buffers it describes rather
than before, to match the usual convention. Also shorten its name to
just len (which is the name used in the assembly code).
- Declare register aliases at the beginning of each function rather than
once per file. This was necessary because len was moved, but also it
allows adding some aliases where raw registers were used before.
- Put assoclen and cryptlen in the correct order when declaring the
finalization function in the .c file.
- Remove the unnecessary "crypto_" prefix.
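A sketch of the resulting declaration style (reconstructed from the
points above, not copied from the final code):

asmlinkage void aegis128_aesni_enc(struct aegis_state *state,
				   const u8 *src, u8 *dst,
				   unsigned int len);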
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Start using SSE4.1 instructions in the AES-NI AEGIS code, with the first
use case being preparing the length block in fewer instructions.
In practice this does not reduce the set of CPUs on which the code can
run, because all Intel and AMD CPUs with AES-NI also have SSE4.1.
Upgrade the existing SSE2 feature check to SSE4.1, though it seems this
check is not strictly necessary; the aesni-intel module has been getting
away with using SSE4.1 despite checking for AES-NI only.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the AEGIS assembly code paths that were "optimized" to operate on
16-byte aligned data using movdqa, and instead just use the code paths
that use movdqu and can handle data with any alignment.
This does not reduce performance. movdqa is basically a historical
artifact; on aligned data, movdqu and movdqa have had the same
performance since Intel Nehalem (2008) and AMD Bulldozer (2011). And
code that requires AES-NI cannot run on CPUs older than those anyway.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of using a struct of function pointers to decide whether to call
the encryption or decryption assembly functions, use a conditional
branch on a bool. Force-inline the functions to avoid actually
generating the branch. This improves performance slightly since
indirect calls are slow. Remove the now-unnecessary CFI stubs.
Note that just force-inlining the existing functions might cause the
compiler to optimize out the indirect branches, but that would not be a
reliable way to do it and the CFI stubs would still be required.
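A sketch of the approach (wrapper name illustrative; the assembly entry
points are those mentioned in the earlier patches):

static __always_inline void crypt_blocks(struct aegis_state *state,
					 const u8 *src, u8 *dst,
					 unsigned int len, bool enc)
{
	if (enc)
		aegis128_aesni_enc(state, src, dst, len);
	else
		aegis128_aesni_dec(state, src, dst, len);
}

Since every caller passes a compile-time-constant enc, force-inlining
lets the compiler resolve the branch and emit direct calls.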
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Don't bother providing empty stubs for the init and exit methods in
struct aead_alg, since they are optional anyway.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fix the AEGIS assembly code to access 'unsigned int' arguments as 32-bit
values instead of 64-bit, since the upper bits of the corresponding
64-bit registers are not guaranteed to be zero.
Note: there haven't been any reports of this bug actually causing
incorrect behavior. Neither gcc nor clang guarantee zero-extension to
64 bits, but zero-extension is likely to happen in practice because most
instructions that operate on 32-bit registers zero-extend to 64 bits.
Fixes: 1d373d4e8e ("crypto: x86 - Add optimized AEGIS implementations")
Cc: stable@vger.kernel.org
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crc32c-generic is currently backed by the architecture's CRC-32c library
code, which may offer a variety of implementations depending on the
capabilities of the platform. These are not covered by the crypto
subsystem's fuzz testing capabilities because crc32c-generic is the
reference driver that the fuzzing logic uses as a source of truth.
Fix this by providing a crc32c-arch implementation which is based on the
arch library code if available, and modify crc32c-generic so it is
always based on the generic C implementation. If the arch has no CRC-32c
library code, this change does nothing.
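A sketch of the shape of the change (context-struct and library
entry-point names illustrative):

struct chksum_desc_ctx {
	u32 crc;
};

/* crc32c-generic: always bound to the generic C implementation. */
static int chksum_update(struct shash_desc *desc, const u8 *data,
			 unsigned int length)
{
	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);

	ctx->crc = crc32c_le_base(ctx->crc, data, length);
	return 0;
}

/* crc32c-arch: bound to the (possibly accelerated) library routine. */
static int chksum_update_arch(struct shash_desc *desc, const u8 *data,
			      unsigned int length)
{
	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);

	ctx->crc = __crc32c_le(ctx->crc, data, length);
	return 0;
}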
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crc32-generic is currently backed by the architecture's CRC-32 library
code, which may offer a variety of implementations depending on the
capabilities of the platform. These are not covered by the crypto
subsystem's fuzz testing capabilities because crc32-generic is the
reference driver that the fuzzing logic uses as a source of truth.
Fix this by providing a crc32-arch implementation which is based on the
arch library code if available, and modify crc32-generic so it is
always based on the generic C implementation. If the arch has no CRC-32
library code, this change does nothing.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove hard-coded strings by using the helper functions str_true_false()
and str_enabled_disabled().
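For example (helpers from <linux/string_choices.h>):

	/* Before: */
	seq_printf(m, "enabled: %s\n", enabled ? "enabled" : "disabled");

	/* After: */
	seq_printf(m, "enabled: %s\n", str_enabled_disabled(enabled));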
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The RNG max clock frequency can be updated to 48 MHz for STM32MP1x
platforms, according to the latest specifications.
Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement support for STM32MP25x platforms. On this platform, a
security clock is shared between some hardware blocks. For the RNG,
it is the RNG kernel clock. Therefore, the gate is no longer shared
between the RNG bus and kernel clocks as on STM32MP1x platforms, and
the bus clock has to be managed on its own.
Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add the RNG compatible for STM32MP25x platforms. Update the clock
properties management to support all versions.
Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently there is an unnecessary error check on ret without a preceding
assignment to ret that needs checking. The check is redundant and can be
removed.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Akhil R <akhilrajeev@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rename devdata_mutex to devdata_spinlock to accurately reflect its
implementation as a spinlock.
[1] v1 https://lore.kernel.org/all/ZwyqD-w5hEhrnqTB@linux.ibm.com
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Since commit 8f4f68e788 ("crypto: pcrypt - Fix hungtask for
PADATA_RESET"), the pcrypt encryption and decryption operations return
-EAGAIN when a CPU goes online or offline. In alg_test(), a WARN is
generated when pcrypt_aead_decrypt() or pcrypt_aead_encrypt() returns
-EAGAIN, and an unnecessary panic occurs when panic_on_warn is set to 1.
Fix this issue by calling the crypto layer directly, without
parallelization, in that case.
Fixes: 8f4f68e788 ("crypto: pcrypt - Fix hungtask for PADATA_RESET")
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instances of 'struct pm_status_row' are not modified in this driver.
Constifying this structure moves some data to a read-only section, which
increases overall security.
Update the prototypes of some functions accordingly.
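The change is of this shape (array name illustrative):

	/* Before: the table lives in .data. */
	static struct pm_status_row pm_fuse_rows[] = { /* ... */ };

	/* After: the table moves to a read-only section. */
	static const struct pm_status_row pm_fuse_rows[] = { /* ... */ };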
On x86_64, with allmodconfig, as an example:
Before:
======
text data bss dec hex filename
4400 1059 0 5459 1553 drivers/crypto/intel/qat/qat_common/adf_gen4_pm_debugfs.o
After:
=====
text data bss dec hex filename
5216 243 0 5459 1553 drivers/crypto/intel/qat/qat_common/adf_gen4_pm_debugfs.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The Marvell Armada RNG uses the same Inside Secure IP as TI and is
already using the binding. The only missing part is the
"marvell,armada-8k-rng" compatible string.
Rename the binding to inside-secure,safexcel-eip76.yaml to better
reflect that it is multi-vendor, licensed IP, and to follow the naming
convention based on the compatible string.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit a7d45ba77d ("crypto: ecdsa - Register NIST P521 and extend test
suite") added support for ECDSA signature verification using NIST P521,
but forgot to amend the Kconfig help text. Fix it.
Fixes: a7d45ba77d ("crypto: ecdsa - Register NIST P521 and extend test suite")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit a2471684da ("crypto: ecdsa - Move X9.62 signature size
calculation into template") introduced ->max_size() and ->digest_size()
callbacks to struct sig_alg. They return an algorithm's maximum
signature size and digest size, respectively.
For algorithms which lack these callbacks, crypto_register_sig() was
amended to use the ->key_size() callback instead.
However the commit neglected to also amend sig_register_instance().
As a result, the ->max_size() and ->digest_size() callbacks remain NULL
pointers if instances do not define them. A KEYCTL_PKEY_QUERY system
call results in an oops for such instances:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Call Trace:
software_key_query+0x169/0x370
query_asymmetric_key+0x67/0x90
keyctl_pkey_query+0x86/0x120
__do_sys_keyctl+0x428/0x480
do_syscall_64+0x4b/0x110
The only instances affected by this are "pkcs1(rsa, ...)".
Fix by moving the callback checks from crypto_register_sig() to
sig_prepare_alg(), which is also invoked by sig_register_instance().
Change the return type of sig_prepare_alg() from void to int to be able
to return errors. This matches other algorithm types, see e.g.
aead_prepare_alg() or ahash_prepare_alg().
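A sketch of the resulting helper (simplified; the real function also
prepares the base algorithm fields):

static int sig_prepare_alg(struct sig_alg *alg)
{
	if (!alg->key_size)
		return -EINVAL;
	if (!alg->max_size)
		alg->max_size = alg->key_size;
	if (!alg->digest_size)
		alg->digest_size = alg->key_size;

	return 0;
}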
Fixes: a2471684da ("crypto: ecdsa - Move X9.62 signature size calculation into template")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crc32c-pcl-intel-asm_64.S has a loop with 1 to 127 iterations fully
unrolled and uses a jump table to jump into the correct location. This
optimization is misguided, as it bloats the binary code size and
introduces an indirect call. x86_64 CPUs can predict loops well, so it
is fine to just use a loop instead. Loop bookkeeping instructions can
compete with the crc instructions for the ALUs, but this is easily
mitigated by unrolling the loop by a smaller amount, such as 4 times.
Therefore, re-roll the loop and make related tweaks to the code.
This reduces the binary code size of crc_pclmul() from 4546 bytes to 418
bytes, a 91% reduction. In general it also makes the code faster, with
some large improvements seen when retpoline is enabled.
More detailed performance results are shown below. They are given as
percent improvement in throughput (negative means regressed) for CPU
microarchitecture vs. input length in bytes. E.g. an improvement from
40 GB/s to 50 GB/s would be listed as 25%.
Table 1: Results with retpoline enabled (the default):
                     |  512  |  833  | 1024  | 2000  | 3173  | 4096  |
---------------------+-------+-------+-------+-------+-------+-------+
Intel Haswell        | 35.0% | 20.7% | 17.8% |  9.7% | -0.2% |  4.4% |
Intel Emerald Rapids | 66.8% | 45.2% | 36.3% | 19.3% |  0.0% |  5.4% |
AMD Zen 2            | 29.5% | 17.2% | 13.5% |  8.6% | -0.5% |  2.8% |
Table 2: Results with retpoline disabled:
                     |  512  |  833  | 1024  | 2000  | 3173  | 4096  |
---------------------+-------+-------+-------+-------+-------+-------+
Intel Haswell        |  3.3% |  4.8% |  4.5% |  0.9% | -2.9% |  0.3% |
Intel Emerald Rapids |  7.5% |  6.4% |  5.2% |  2.3% | -0.0% |  0.6% |
AMD Zen 2            | 11.8% |  1.4% |  0.2% |  1.3% | -0.9% | -0.2% |
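For illustration, a C model of the re-rolled structure (the actual code
is assembly and additionally interleaves three independent CRC streams;
compile with -msse4.2):

#include <stddef.h>
#include <stdint.h>
#include <nmmintrin.h>	/* _mm_crc32_u64 */

/* Process n qwords with a loop unrolled 4x, instead of a fully
 * unrolled 127-iteration body reached through a jump table.
 * Assumes n is a multiple of 4 for brevity. */
static uint64_t crc32c_qwords(uint64_t crc, const uint64_t *p, size_t n)
{
	size_t i;

	for (i = 0; i < n; i += 4) {
		crc = _mm_crc32_u64(crc, p[i]);
		crc = _mm_crc32_u64(crc, p[i + 1]);
		crc = _mm_crc32_u64(crc, p[i + 2]);
		crc = _mm_crc32_u64(crc, p[i + 3]);
	}
	return crc;
}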
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fix crc32c-pcl-intel-asm_64.S to access 32-bit arguments as 32-bit
values instead of 64-bit, since the upper bits of the corresponding
64-bit registers are not guaranteed to be zero. Also update the type of
the length argument to be unsigned int rather than int, as the assembly
code treats it as unsigned.
Note: there haven't been any reports of this bug actually causing
incorrect behavior. Neither gcc nor clang guarantee zero-extension to
64 bits, but zero-extension is likely to happen in practice because most
instructions that operate on 32-bit registers zero-extend to 64 bits.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The assembly code in crc32c-pcl-intel-asm_64.S is invoked only for
lengths >= 512, due to the overhead of saving and restoring FPU state.
Therefore, it is unnecessary for this code to be excessively "optimized"
for lengths < 200. Eliminate the excessive unrolling of this part of
the code and use a more straightforward qword-at-a-time loop.
Note: the part of the code in question is not entirely redundant, as it
is still used to process any remainder mod 24, as well as any remaining
data when fewer than 200 bytes remain after at least one 3072-byte chunk.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>