Extends the x86_64 ChaCha20 implementation by a function processing eight
ChaCha20 blocks in parallel using AVX2.
For large messages, throughput increases by ~55-70% compared to four block
SSSE3:
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 41999675 operations in 10 seconds (671994800 bytes)
test 1 (256 bit key, 64 byte blocks): 45805908 operations in 10 seconds (2931578112 bytes)
test 2 (256 bit key, 256 byte blocks): 32814947 operations in 10 seconds (8400626432 bytes)
test 3 (256 bit key, 1024 byte blocks): 19777167 operations in 10 seconds (20251819008 bytes)
test 4 (256 bit key, 8192 byte blocks): 2279321 operations in 10 seconds (18672197632 bytes)
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
four ChaCha20 blocks in parallel. This avoids the word shuffling needed
in the single block variant, further increasing throughput.
For large messages, throughput increases by ~110% compared to single block
SSSE3:
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
single block variant works on a single state matrix using SSE instructions.
It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate
operations.
For large messages, throughput increases by ~65% compared to
chacha20-generic:
testing speed of chacha20 (chacha20-generic) encryption
test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes)
test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes)
test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes)
test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes)
test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes)
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As architecture specific drivers need a software fallback, export a
ChaCha20 en-/decryption function together with some helpers in a header
file.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Adds individual ChaCha20 and Poly1305 and a combined rfc7539esp AEAD speed
test using mode numbers 214, 321 and 213. For Poly1305 we add a specific
speed template, as it expects the key prepended to the input data.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch converts rfc7539 and rfc7539esp to the new AEAD interface.
The test vectors for rfc7539esp have also been updated to include
the IV.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Martin Willi <martin@strongswan.org>
Introduce constrains for RSA keys lengths.
Only key lengths of 512, 1024, 1536, 2048, 3072, and 4096 bits
will be supported.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add RSA support to QAT driver.
Removed unused RNG rings.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Load Modular Math Processor(MMP) firmware into QAT devices to support
public key algorithm acceleration.
Signed-off-by: Pingchao Yang <pingchao.yang@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch converts the ARM64 aes-ce-ccm implementation to the
new AEAD interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
This patch disables the rfc4309 test while the conversion to the
new seqiv calling convention takes place. It also replaces the
rfc4309 test vectors with ones that will work with the new IV
convention.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
vmx-crypto driver make use of some VSX instructions which are
only available if VSX is enabled. Running in cases where VSX
are not enabled vmx-crypto fails in a VSX exception.
In order to fix this enable_kernel_vsx() was added to turn on
VSX instructions for vmx-crypto.
Signed-off-by: Leonidas S. Barbosa <leosilva@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
enable_kernel_vsx() function was commented since anything was using
it. However, vmx-crypto driver uses VSX instructions which are
only available if VSX is enable. Otherwise it rises an exception oops.
This patch uncomment enable_kernel_vsx() routine and makes it available.
Signed-off-by: Leonidas S. Barbosa <leosilva@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
platform_driver does not need to set an owner because
platform_driver_register() will set it.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch converts rfc4106 to the new calling convention where
the IV is now part of the AD and needs to be skipped.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch converts rfc4106 to the new calling convention where
the IV is now part of the AD and needs to be skipped. This patch
also makes use of type-safe AEAD functions where possible.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch converts rfc4106 to the new calling convention where
the IV is now part of the AD and needs to be skipped. This patch
also makes use of the new type-safe way of freeing instances.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch converts rfc4106 to the new calling convention where
the IV is now in the AD and needs to be skipped.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch allows the AEAD speed tests to cope with the new seqiv
calling convention as well as the old one.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch disables the rfc4106 test while the conversion to the
new seqiv calling convention takes place. It also converts the
rfc4106 test vectors to the new format.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch replaces the seqniv generator with seqiv when the
underlying algorithm understands the new calling convention.
This not only makes more sense as now seqiv is solely responsible
for IV generation rather than also determining how the IV is going
to be used, it also allows for optimisations in the underlying
implementation. For example, the space for the IV could be used
to add padding for authentication.
This patch also removes the unnecessary copying of IV to dst
during seqiv decryption as the IV is part of the AD and not cipher
text.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes a bug where we were incorrectly including the
IV in the AD during encryption. The IV must remain in the plain
text for it to be encrypted.
During decryption there is no need to copy the IV to dst because
it's now part of the AD.
This patch removes an unncessary check on authsize which would be
performed by the underlying decrypt call.
Finally this patch makes use of the type-safe init/exit functions.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch allows the CRYPTO_ALG_AEAD_NEW flag to be propagated.
It also restores the ASYNC bit that went missing during the AEAD
conversion.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a type-safe function for freeing AEAD instances
to struct aead_instance. This replaces the existing free function
in struct crypto_template which does not know the type of the
instance that it's freeing.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently the task of freeing an instance is given to the crypto
template. However, it has no type information on the instance so
we have to resort to checking type information at runtime.
This patch introduces a free function to crypto_type that will be
used to free an instance. This can then be used to free an instance
in a type-safe manner.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The transform context is shared memory and must not be written
to without locking. This patch adds locking to nx-842 to prevent
context corruption.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AEAD speed tests doesn't do a wait_for_completition,
if the return value is EINPROGRESS or EBUSY.
Fixing it here.
Also add a test case for gcm(aes).
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use BIT()/GENMASK() macros for all register definitions instead of
hand-writing bit masks.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AES_CTRL_REG is used to configure AES mode. Before configuring
any mode we need to make sure all other modes are reset or else
driver will misbehave. So mask all modes before configuring
any AES mode.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Increasing the priority of omap-aes hw algos, in order to take
precedence over sw algos.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Algo self tests are failing for CTR mode with omap-aes driver,
giving the following error:
[ 150.053644] omap_aes_crypt: request size is not exact amount of AES blocks
[ 150.061262] alg: skcipher: encryption failed on test 5 for ctr-aes-omap: ret=22
This is because the input length is not aligned with AES_BLOCK_SIZE.
Adding support for omap-aes driver for inputs with length not aligned
with AES_BLOCK_SIZE.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes a host of reentrancy bugs in the nx driver. The
following algorithms are affected:
* CCM
* GCM
* CTR
* XCBC
* SHA256
* SHA512
The crypto API allows a single transform to be used by multiple
threads simultaneously. For example, IPsec will use a single tfm
to process packets for a given SA. As packets may arrive on
multiple CPUs that tfm must be reentrant.
The nx driver does try to deal with this by using a spin lock.
Unfortunately only the basic AES/CBC/ECB algorithms do this in
the correct way.
The symptom of these bugs may range from the generation of incorrect
output to memory corruption.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
While we never would successfully load on the wrong machine type, there
is extra output by default regardless of machine type.
For instance, on a PowerVM LPAR, we see the following:
nx_compress_powernv: loading
nx_compress_powernv: no coprocessors found
even though those coprocessors could never be found.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Dan Streetman <ddstreet@us.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
All tests for cbc(aes) use only blocks of data with a multiple of 4.
This test adds a test with some odd SG size.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AEAD version of cryptd uses the same context for its own state
as well as that of the child. In doing so it did not maintain the
proper ordering, thus resulting in potential state corruption where
the child will overwrite the state stored by cryptd.
This patch fixes and also sets the request size properly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If the device-tree indicates the nx-842 device's status is 'disabled',
we emit two messages:
nx_compress_pseries ibm,compression-v1: nx842_OF_upd_status: status 'disabled' is not 'okay'.
nx_compress_pseries ibm,compression-v1: nx842_OF_upd: device disabled
Given that 'disabled' is a valid state, and we are going to emit that
the device is disabled, only print out a non-'okay' status if it is not
'disabled'.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
While there is no technical reason that both nx-842.c and
nx-842-pseries.c can have the same name for the init/exit functions, it
is a bit confusing with initcall_debug. Rename the pseries specific
functions appropriately
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current documention mentions explicitly that EINVAL should be
returned if the device is not available, but nx842_OF_upd_status()
always returns 0. However, nx842_probe() specifically checks for
non-ENODEV returns from nx842_of_upd() (which in turn calls
nx842_OF_upd_status()) and emits an extra error in that case. It seems
like the proper return code of a disabled device is ENODEV.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add the necessary module device tables to the platform support to allow
for autoloading of the CCP driver. This will allow for the CCP's hwrng
support to be available without having to manually load the driver. The
module device table entry for the pci support is already present.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>