Commit Graph

347 Commits

Author SHA1 Message Date
tim
5dda42fc89 crypto: x86/sha - Restructure x86 sha256 glue code to expose all the available sha256 transforms
Restructure the x86 sha256 glue code so we will expose sha256 transforms
based on SSSE3, AVX, AVX2 or SHA-NI extension as separate individual
drivers when cpu provides such support. This will make it easy for
alternative algorithms to be used if desired and makes the code cleaner
and easier to maintain.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:11 +08:00
tim
85c66ecd6f crypto: x86/sha - Restructure x86 sha1 glue code to expose all the available sha1 transforms
Restructure the x86 sha1 glue code so we will expose sha1 transforms based
on SSSE3, AVX, AVX2 or SHA-NI extension as separate individual drivers
when cpu provides such support. This will make it easy for alternative
algorithms to be used if desired and makes the code cleaner and easier
to maintain.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:11 +08:00
tim
e38b6b7fcf crypto: x86/sha - Add build support for Intel SHA Extensions optimized SHA1 and SHA256
This patch provides the configuration and build support to
include and build the optimized SHA1 and SHA256 update transforms
for the kernel's crypto library.

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:06 +08:00
tim
95fca7df0b crypto: x86/sha - glue code for Intel SHA extensions optimized SHA1 & SHA256
This patch adds the glue code to detect and utilize the Intel SHA
extensions optimized SHA1 and SHA256 update transforms when available.

This code has been tested on Broxton for functionality.

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:06 +08:00
tim
600a2334e8 crypto: x86/sha - Intel SHA Extensions optimized SHA256 transform function
This patch includes the Intel SHA Extensions optimized implementation
of SHA-256 update function. This function has been tested on Broxton
platform and measured a speed up of 3.6x over the SSSE3 implementiation
for 4K blocks.

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:06 +08:00
tim
c356a7e975 crypto: x86/sha - Intel SHA Extensions optimized SHA1 transform function
This patch includes the Intel SHA Extensions optimized implementation
of SHA-1 update function. This function has been tested on Broxton
platform and measured a speed up of 3.6x over the SSSE3 implementiation
for 4K blocks.

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:05 +08:00
Dave Hansen
d91cab7813 x86/fpu: Rename XSAVE macros
There are two concepts that have some confusing naming:
 1. Extended State Component numbers (currently called
    XFEATURE_BIT_*)
 2. Extended State Component masks (currently called XSTATE_*)

The numbers are (currently) from 0-9.  State component 3 is the
bounds registers for MPX, for instance.

But when we want to enable "state component 3", we go set a bit
in XCR0.  The bit we set is 1<<3.  We can check to see if a
state component feature is enabled by looking at its bit.

The current 'xfeature_bit's are at best xfeature bit _numbers_.
Calling them bits is at best inconsistent with ending the enum
list with 'XFEATURES_NR_MAX'.

This patch renames the enum to be 'xfeature'.  These also
happen to be what the Intel documentation calls a "state
component".

We also want to differentiate these from the "XSTATE_*" macros.
The "XSTATE_*" macros are a mask, and we rename them to match.

These macros are reasonably widely used so this patch is a
wee bit big, but this really is just a rename.

The only non-mechanical part of this is the

	s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/

We need a better name for it, but that's another patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com
[ Ported to v4.3-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:46 +02:00
Andrey Ryabinin
71c6da846b crypto: ghash-clmulni: specify context size for ghash async algorithm
Currently context size (cra_ctxsize) doesn't specified for
ghash_async_alg. Which means it's zero. Thus crypto_create_tfm()
doesn't allocate needed space for ghash_async_ctx, so any
read/write to ctx (e.g. in ghash_async_init_tfm()) is not valid.

Cc: stable@vger.kernel.org
Signed-off-by: Andrey Ryabinin <aryabinin@odin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-04 23:21:07 +08:00
Herbert Xu
5e4b8c1fcc crypto: aead - Remove CRYPTO_ALG_AEAD_NEW flag
This patch removes the CRYPTO_ALG_AEAD_NEW flag now that everyone
has been converted.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-08-17 16:53:53 +08:00
Martin Willi
b1ccc8f4b6 crypto: poly1305 - Add a four block AVX2 variant for x86_64
Extends the x86_64 Poly1305 authenticator by a function processing four
consecutive Poly1305 blocks in parallel using AVX2 instructions.

For large messages, throughput increases by ~15-45% compared to two
block SSE2:

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3809514 opers/sec,  365713411 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5973423 opers/sec,  573448627 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9446779 opers/sec,  906890803 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1364814 opers/sec,  393066691 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2045780 opers/sec,  589184697 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711946 opers/sec, 1069040592 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  573686 opers/sec,  605812732 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1647802 opers/sec, 1740079440 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  292970 opers/sec,  609378224 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  943229 opers/sec, 1961916528 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  494623 opers/sec, 2041804569 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  254045 opers/sec, 2089271014 bytes/sec

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3826224 opers/sec,  367317552 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5948638 opers/sec,  571069267 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9439110 opers/sec,  906154627 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1367756 opers/sec,  393913872 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2056881 opers/sec,  592381958 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711153 opers/sec, 1068812179 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  574940 opers/sec,  607136745 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1948830 opers/sec, 2057964585 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  293308 opers/sec,  610082096 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates): 1235224 opers/sec, 2569267792 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  684405 opers/sec, 2825226316 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  367101 opers/sec, 3019039446 bytes/sec

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:29 +08:00
Martin Willi
da35b22df3 crypto: poly1305 - Add a two block SSE2 variant for x86_64
Extends the x86_64 SSE2 Poly1305 authenticator by a function processing two
consecutive Poly1305 blocks in parallel using a derived key r^2. Loop
unrolling can be more effectively mapped to SSE instructions, further
increasing throughput.

For large messages, throughput increases by ~45-65% compared to single
block SSE2:

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3809514 opers/sec,  365713411 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5973423 opers/sec,  573448627 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9446779 opers/sec,  906890803 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1364814 opers/sec,  393066691 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2045780 opers/sec,  589184697 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711946 opers/sec, 1069040592 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  573686 opers/sec,  605812732 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1647802 opers/sec, 1740079440 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  292970 opers/sec,  609378224 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  943229 opers/sec, 1961916528 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  494623 opers/sec, 2041804569 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  254045 opers/sec, 2089271014 bytes/sec

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:28 +08:00
Martin Willi
c70f4abef0 crypto: poly1305 - Add a SSE2 SIMD variant for x86_64
Implements an x86_64 assembler driver for the Poly1305 authenticator. This
single block variant holds the 130-bit integer in 5 32-bit words, but uses
SSE to do two multiplications/additions in parallel.

When calling updates with small blocks, the overhead for kernel_fpu_begin/
kernel_fpu_end() negates the perfmance gain. We therefore use the
poly1305-generic fallback for small updates.

For large messages, throughput increases by ~5-10% compared to
poly1305-generic:

testing speed of poly1305 (poly1305-generic)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 4080026 opers/sec,  391682496 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 6221094 opers/sec,  597225024 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9609750 opers/sec,  922536057 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1459379 opers/sec,  420301267 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2115179 opers/sec,  609171609 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3729874 opers/sec, 1074203856 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  593000 opers/sec,  626208000 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1081536 opers/sec, 1142102332 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  302077 opers/sec,  628320576 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  554384 opers/sec, 1153120176 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  278715 opers/sec, 1150536345 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  140202 opers/sec, 1153022070 bytes/sec

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:27 +08:00
Martin Willi
3d1e93cdf1 crypto: chacha20 - Add an eight block AVX2 variant for x86_64
Extends the x86_64 ChaCha20 implementation by a function processing eight
ChaCha20 blocks in parallel using AVX2.

For large messages, throughput increases by ~55-70% compared to four block
SSSE3:

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 41999675 operations in 10 seconds (671994800 bytes)
test 1 (256 bit key, 64 byte blocks): 45805908 operations in 10 seconds (2931578112 bytes)
test 2 (256 bit key, 256 byte blocks): 32814947 operations in 10 seconds (8400626432 bytes)
test 3 (256 bit key, 1024 byte blocks): 19777167 operations in 10 seconds (20251819008 bytes)
test 4 (256 bit key, 8192 byte blocks): 2279321 operations in 10 seconds (18672197632 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:25 +08:00
Martin Willi
274f938e0a crypto: chacha20 - Add a four block SSSE3 variant for x86_64
Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
four ChaCha20 blocks in parallel. This avoids the word shuffling needed
in the single block variant, further increasing throughput.

For large messages, throughput increases by ~110% compared to single block
SSSE3:

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:25 +08:00
Martin Willi
c9320b6dcb crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64
Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
single block variant works on a single state matrix using SSE instructions.
It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate
operations.

For large messages, throughput increases by ~65% compared to
chacha20-generic:

testing speed of chacha20 (chacha20-generic) encryption
test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes)
test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes)
test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes)
test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes)
test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:24 +08:00
Herbert Xu
e9b8d2c20a crypto: aesni - Use new IV convention
This patch converts rfc4106 to the new calling convention where
the IV is now in the AD and needs to be skipped.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-14 14:56:47 +08:00
Tadeusz Struk
0fbafd06bd crypto: aesni - fix failing setkey for rfc4106-gcm-aesni
rfc4106(gcm(aes)) uses ctr(aes) to generate hash key. ctr(aes) needs
chainiv, but the chainiv gets initialized after aesni_intel when both
are statically linked so the setkey fails.
This patch forces aesni_intel to be initialized after chainiv.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-29 16:06:30 +08:00
Linus Torvalds
44d21c3f3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.2:

  API:

   - Convert RNG interface to new style.

   - New AEAD interface with one SG list for AD and plain/cipher text.
     All external AEAD users have been converted.

   - New asymmetric key interface (akcipher).

  Algorithms:

   - Chacha20, Poly1305 and RFC7539 support.

   - New RSA implementation.

   - Jitter RNG.

   - DRBG is now seeded with both /dev/random and Jitter RNG.  If kernel
     pool isn't ready then DRBG will be reseeded when it is.

   - DRBG is now the default crypto API RNG, replacing krng.

   - 842 compression (previously part of powerpc nx driver).

  Drivers:

   - Accelerated SHA-512 for arm64.

   - New Marvell CESA driver that supports DMA and more algorithms.

   - Updated powerpc nx 842 support.

   - Added support for SEC1 hardware to talitos"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (292 commits)
  crypto: marvell/cesa - remove COMPILE_TEST dependency
  crypto: algif_aead - Temporarily disable all AEAD algorithms
  crypto: af_alg - Forbid the use internal algorithms
  crypto: echainiv - Only hold RNG during initialisation
  crypto: seqiv - Add compatibility support without RNG
  crypto: eseqiv - Offer normal cipher functionality without RNG
  crypto: chainiv - Offer normal cipher functionality without RNG
  crypto: user - Add CRYPTO_MSG_DELRNG
  crypto: user - Move cryptouser.h to uapi
  crypto: rng - Do not free default RNG when it becomes unused
  crypto: skcipher - Allow givencrypt to be NULL
  crypto: sahara - propagate the error on clk_disable_unprepare() failure
  crypto: rsa - fix invalid select for AKCIPHER
  crypto: picoxcell - Update to the current clk API
  crypto: nx - Check for bogus firmware properties
  crypto: marvell/cesa - add DT bindings documentation
  crypto: marvell/cesa - add support for Kirkwood and Dove SoCs
  crypto: marvell/cesa - add support for Orion SoCs
  crypto: marvell/cesa - add allhwsupport module parameter
  crypto: marvell/cesa - add support for all armada SoCs
  ...
2015-06-22 21:04:48 -07:00
Jeremiah Mahler
de1e00871d crypto: aesni - fix crypto_fpu_exit() section mismatch
The '__init aesni_init()' function calls the '__exit crypto_fpu_exit()'
function directly.  Since they are in different sections, this generates
a warning.

  make CONFIG_DEBUG_SECTION_MISMATCH=y
  ...
  WARNING: arch/x86/crypto/aesni-intel.o(.init.text+0x12b): Section
  mismatch in reference from the function init_module() to the function
  .exit.text:crypto_fpu_exit()
  The function __init init_module() references
  a function __exit crypto_fpu_exit().
  This is often seen when error handling in the init function
  uses functionality in the exit path.
  The fix is often to remove the __exit annotation of
  crypto_fpu_exit() so it may be used outside an exit section.

Fix the warning by removing the __exit annotation.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-15 18:15:58 +08:00
Herbert Xu
b7c89d9e2f crypto: aesni - Convert rfc4106 to new AEAD interface
This patch converts the low-level __gcm-aes-aesni algorithm to
the new AEAD interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-03 10:51:24 +08:00
Herbert Xu
af05b3009b crypto: aesni - Convert top-level rfc4106 algorithm to new interface
This patch converts rfc4106-gcm-aesni to the new AEAD interface.
The low-level interface remains as is for now because we can't
touch it until cryptd itself is upgraded.

In the conversion I've also removed the duplicate copy of the
context in the top-level algorithm.  Now all processing is carried
out in the low-level __driver-gcm-aes-aesni algorithm.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-03 10:48:36 +08:00
Herbert Xu
6d7e3d8995 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merge the crypto tree for 4.1 to pull in the changeset that disables
algif_aead.
2015-05-28 11:16:41 +08:00
Ingo Molnar
b54b4bbbf5 x86/fpu, crypto: Fix AVX2 feature tests
For some CPU models I broke the AVX2 feature detection in:

  7bc371faa9 ("x86/fpu, crypto x86/camellia_aesni_avx2: Simplify the camellia_aesni_init() xfeature checks")
  534ff06e39 ("x86/fpu, crypto x86/serpent_avx2: Simplify the init() xfeature checks")

... because I did not realize that it's possible for a CPU to support
the xstate necessary for AVX2 execution (XSTATE_YMM), but not have
the AVX2 instructions themselves.

Restore the necessary CPUID checks as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-22 10:58:45 +02:00
Ingo Molnar
57dd083e0c x86/fpu, crypto x86/sha1_mb: Remove FPU internal headers from sha1_mb.c
This file only uses the public FPU APIs, so remove the xcr.h, fpu/xstate.h
and fpu/internal.h headers and add the fpu/api.h include.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:59 +02:00
Ingo Molnar
534ff06e39 x86/fpu, crypto x86/serpent_avx2: Simplify the init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:59 +02:00
Ingo Molnar
d1e509660c x86/fpu, crypto x86/sha1_ssse3: Simplify the sha1_ssse3_mod_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:59 +02:00
Ingo Molnar
1debf7db2b x86/fpu, crypto x86/cast6_avx: Simplify the cast6_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:58 +02:00
Ingo Molnar
c93b8a3963 x86/fpu, crypto x86/sha512_ssse3: Simplify the sha512_ssse3_mod_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:58 +02:00
Ingo Molnar
d5d34d98d2 x86/fpu, crypto x86/cast5_avx: Simplify the cast5_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:58 +02:00
Ingo Molnar
c1c23f7e5e x86/fpu, crypto x86/serpent_avx: Simplify the serpent_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:57 +02:00
Ingo Molnar
4eecd2616d x86/fpu, crypto x86/twofish_avx: Simplify the twofish_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:57 +02:00
Ingo Molnar
7bc371faa9 x86/fpu, crypto x86/camellia_aesni_avx2: Simplify the camellia_aesni_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:57 +02:00
Ingo Molnar
70d51eb65d x86/fpu, crypto x86/sha256_ssse3: Simplify the sha256_ssse3_mod_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:57 +02:00
Ingo Molnar
ce4f5f9b65 x86/fpu, crypto x86/camellia_aesni_avx: Simplify the camellia_aesni_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit:

     text    data     bss     dec     hex filename
     2128    2896       0    5024    13a0 camellia_aesni_avx_glue.o.before
     2067    2896       0    4963    1363 camellia_aesni_avx_glue.o.after

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:56 +02:00
Ingo Molnar
669ebabb79 x86/fpu: Rename fpu/xsave.h to fpu/xstate.h
'xsave' is an x86 instruction name to most people - but xsave.h is
about a lot more than just the XSAVE instruction: it includes
definitions and support, both internal and external, related to
xstate and xfeatures support.

As a first step in cleaning up the various xstate uses rename this
header to 'fpu/xstate.h' to better reflect what this header file
is about.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:54 +02:00
Ingo Molnar
78f7f1e54b x86/fpu: Rename fpu-internal.h to fpu/internal.h
This unifies all the FPU related header files under a unified, hiearchical
naming scheme:

 - asm/fpu/types.h:      FPU related data types, needed for 'struct task_struct',
                         widely included in almost all kernel code, and hence kept
                         as small as possible.

 - asm/fpu/api.h:        FPU related 'public' methods exported to other subsystems.

 - asm/fpu/internal.h:   FPU subsystem internal methods

 - asm/fpu/xsave.h:      XSAVE support internal methods

(Also standardize the header guard in asm/fpu/internal.h.)

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:31 +02:00
Ingo Molnar
a137fb6bbf x86/fpu: Move xsave.h to fpu/xsave.h
Move the xsave.h header file to the FPU directory as well.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:30 +02:00
Ingo Molnar
df6b35f409 x86/fpu: Rename i387.h to fpu/api.h
We already have fpu/types.h, move i387.h to fpu/api.h.

The file name has become a misnomer anyway: it offers generic FPU APIs,
but is not limited to i387 functionality.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:30 +02:00
Ingo Molnar
f89e32e0a3 x86/fpu: Fix header file dependencies of fpu-internal.h
Fix a minor header file dependency bug in asm/fpu-internal.h: it
relies on i387.h but does not include it. All users of fpu-internal.h
included it explicitly.

Also remove unnecessary includes, to reduce compilation time.

This also makes it easier to use it as a standalone header file
for FPU internals, such as an upcoming C module in arch/x86/kernel/fpu/.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:16 +02:00
Herbert Xu
a5a2b4da01 crypto: aesni - Use crypto_aead_set_reqsize helper
This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-13 10:31:43 +08:00
firo yang
21a6dd5b39 crypto: sha1-mb - Remove pointless cast
Since kzalloc() returns a void pointer, we don't need to cast the
return value in arch/x86/crypto/sha-mb/sha1_mb.c::sha1_mb_mod_init().

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-04-26 14:33:17 +08:00
Ard Biesheuvel
00425bb181 crypto: x86/sha512_ssse3 - fixup for asm function prototype change
Patch e68410ebf6 ("crypto: x86/sha512_ssse3 - move SHA-384/512
SSSE3 implementation to base layer") changed the prototypes of the
core asm SHA-512 implementations so that they are compatible with
the prototype used by the base layer.

However, in one instance, the register that was used for passing the
input buffer was reused as a scratch register later on in the code,
and since the input buffer param changed places with the digest param
-which needs to be written back before the function returns- this
resulted in the scratch register to be dereferenced in a memory write
operation, causing a GPF.

Fix this by changing the scratch register to use the same register as
the input buffer param again.

Fixes: e68410ebf6 ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer")
Reported-By: Bobby Powers <bobbypowers@gmail.com>
Tested-By: Bobby Powers <bobbypowers@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-04-24 20:09:01 +08:00
Linus Torvalds
cb906953d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.1:

  New interfaces:
   - user-space interface for AEAD
   - user-space interface for RNG (i.e., pseudo RNG)

  New hashes:
   - ARMv8 SHA1/256
   - ARMv8 AES
   - ARMv8 GHASH
   - ARM assembler and NEON SHA256
   - MIPS OCTEON SHA1/256/512
   - MIPS img-hash SHA1/256 and MD5
   - Power 8 VMX AES/CBC/CTR/GHASH
   - PPC assembler AES, SHA1/256 and MD5
   - Broadcom IPROC RNG driver

  Cleanups/fixes:
   - prevent internal helper algos from being exposed to user-space
   - merge common code from assembly/C SHA implementations
   - misc fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (169 commits)
  crypto: arm - workaround for building with old binutils
  crypto: arm/sha256 - avoid sha256 code on ARMv7-M
  crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer
  crypto: x86/sha256_ssse3 - move SHA-224/256 SSSE3 implementation to base layer
  crypto: x86/sha1_ssse3 - move SHA-1 SSSE3 implementation to base layer
  crypto: arm64/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer
  crypto: arm64/sha1-ce - move SHA-1 ARMv8 implementation to base layer
  crypto: arm/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer
  crypto: arm/sha256 - move SHA-224/256 ASM/NEON implementation to base layer
  crypto: arm/sha1-ce - move SHA-1 ARMv8 implementation to base layer
  crypto: arm/sha1_neon - move SHA-1 NEON implementation to base layer
  crypto: arm/sha1 - move SHA-1 ARM asm implementation to base layer
  crypto: sha512-generic - move to generic glue implementation
  crypto: sha256-generic - move to generic glue implementation
  crypto: sha1-generic - move to generic glue implementation
  crypto: sha512 - implement base layer for SHA-512
  crypto: sha256 - implement base layer for SHA-256
  crypto: sha1 - implement base layer for SHA-1
  crypto: api - remove instance when test failed
  crypto: api - Move alg ref count init to crypto_check_alg
  ...
2015-04-15 10:42:15 -07:00
Ard Biesheuvel
e68410ebf6 crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.  It also changes the
prototypes of the core asm functions to be compatible with the base
prototype

  void (sha512_block_fn)(struct sha256_state *sst, u8 const *src, int blocks)

so that they can be passed to the base layer directly.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-04-10 21:39:48 +08:00
Ard Biesheuvel
1631030ae6 crypto: x86/sha256_ssse3 - move SHA-224/256 SSSE3 implementation to base layer
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer. It also changes the
prototypes of the core asm functions to be compatible with the base
prototype

  void (sha256_block_fn)(struct sha256_state *sst, u8 const *src, int blocks)

so that they can be passed to the base layer directly.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-04-10 21:39:47 +08:00
Ard Biesheuvel
824b43763c crypto: x86/sha1_ssse3 - move SHA-1 SSSE3 implementation to base layer
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-04-10 21:39:47 +08:00
Denys Vlasenko
a734b4a23e x86/asm: Replace "MOVQ $imm, %reg" with MOVL
There is no reason to use MOVQ to load a non-negative immediate
constant value into a 64-bit register. MOVL does the same, since
the upper 32 bits are zero-extended by the CPU.

This makes the code a bit smaller, while leaving functionality
unchanged.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-8-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 13:17:39 +02:00
Stephan Mueller
555fa17b2b crypto: sha-mb - mark Multi buffer SHA1 helper cipher
Flag all Multi buffer SHA1 helper ciphers as internal ciphers
to prevent them from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:13 +08:00
Stephan Mueller
4dda66f62e crypto: twofish_avx - mark Twofish AVX helper ciphers
Flag all Twofish AVX helper ciphers as internal ciphers to prevent
them from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:11 +08:00
Stephan Mueller
748be1f1bf crypto: serpent_sse2 - mark Serpent SSE2 helper ciphers
Flag all Serpent SSE2 helper ciphers as internal ciphers to prevent
them from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:10 +08:00
Stephan Mueller
65aed53941 crypto: serpent_avx - mark Serpent AVX helper ciphers
Flag all Serpent AVX helper ciphers as internal ciphers to prevent
them from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:10 +08:00
Stephan Mueller
f82419acd8 crypto: serpent_avx2 - mark Serpent AVX2 helper ciphers
Flag all Serpent AVX2 helper ciphers as internal ciphers to prevent
them from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:09 +08:00
Stephan Mueller
e69b8a46ca crypto: cast6_avx - mark CAST6 helper ciphers
Flag all CAST6 helper ciphers as internal ciphers to prevent them
from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:09 +08:00
Stephan Mueller
7d2c31dd70 crypto: camellia_aesni_avx - mark AVX Camellia helper ciphers
Flag all AVX Camellia helper ciphers as internal ciphers to prevent
them from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:08 +08:00
Stephan Mueller
680574e8b3 crypto: cast5_avx - mark CAST5 helper ciphers
Flag all CAST5 helper ciphers as internal ciphers to prevent them
from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:08 +08:00
Stephan Mueller
a62356a978 crypto: camellia_aesni_avx2 - mark AES-NI Camellia helper ciphers
Flag all AES-NI Camellia helper ciphers as internal ciphers to
prevent them from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:07 +08:00
Stephan Mueller
6a9b52b7fa crypto: clmulni - mark ghash clmulni helper ciphers
Flag all ash clmulni helper ciphers as internal ciphers to prevent them
from being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:06 +08:00
Stephan Mueller
eabdc320ec crypto: aesni - mark AES-NI helper ciphers
Flag all AES-NI helper ciphers as internal ciphers to prevent them from
being called by normal users.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-31 21:21:05 +08:00
Ameen Ali
c42e9902f3 crypto: sha1-mb - Syntax error
fixing a syntax-error .

Signed-off-by: Ameen Ali <AmeenAli023@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-16 21:47:58 +11:00
Julia Lawall
05713ba905 crypto: don't export static symbol
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
type T;
identifier f;
@@

static T f (...) { ... }

@@
identifier r.f;
declarer name EXPORT_SYMBOL_GPL;
@@

-EXPORT_SYMBOL_GPL(f);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-13 21:37:15 +11:00
Stephan Mueller
ccfe8c3f7e crypto: aesni - fix memory usage in GCM decryption
The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.

The RFC4106 GCM decryption operation tries to overwrite cryptlen memory
in req->dst. As the destination buffer for decryption only needs to hold
the plaintext memory but cryptlen references the input buffer holding
(ciphertext || authentication tag), the assumption of the destination
buffer length in RFC4106 GCM operation leads to a too large size. This
patch simply uses the already calculated plaintext size.

In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag. Thus, the tag does not need to be added. With the addition, the AAD
will be written beyond the already allocated buffer.

Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.

Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.

[1] http://www.chronox.de/libkcapi.html

CC: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-13 21:32:21 +11:00
Tadeusz Struk
81e397d937 crypto: aesni - make driver-gcm-aes-aesni helper a proper aead alg
Changed the __driver-gcm-aes-aesni to be a proper aead algorithm.
This required a valid setkey and setauthsize functions to be added and also
some changes to make sure that math context is not corrupted when the alg is
used directly.
Note that the __driver-gcm-aes-aesni should not be used directly by modules
that can use it in interrupt context as we don't have a good fallback mechanism
in this case.

Signed-off-by: Adrian Hoban <adrian.hoban@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-02-28 23:31:35 +13:00
Lad, Prabhakar
66c046b407 crypto: sha-mb - Fix big integer constant sparse warning
this patch fixes following sparse warning:

sha1_mb_mgr_init_avx2.c:59:31: warning: constant 0xF76543210 is so big it is long

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-02-27 22:48:49 +13:00
Linus Torvalds
fee5429e02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 3.20:

   - Added 192/256-bit key support to aesni GCM.
   - Added MIPS OCTEON MD5 support.
   - Fixed hwrng starvation and race conditions.
   - Added note that memzero_explicit is not a subsitute for memset.
   - Added user-space interface for crypto_rng.
   - Misc fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
  crypto: tcrypt - do not allocate iv on stack for aead speed tests
  crypto: testmgr - limit IV copy length in aead tests
  crypto: tcrypt - fix buflen reminder calculation
  crypto: testmgr - mark rfc4106(gcm(aes)) as fips_allowed
  crypto: caam - fix resource clean-up on error path for caam_jr_init
  crypto: caam - pair irq map and dispose in the same function
  crypto: ccp - terminate ccp_support array with empty element
  crypto: caam - remove unused local variable
  crypto: caam - remove dead code
  crypto: caam - don't emit ICV check failures to dmesg
  hwrng: virtio - drop extra empty line
  crypto: replace scatterwalk_sg_next with sg_next
  crypto: atmel - Free memory in error path
  crypto: doc - remove colons in comments
  crypto: seqiv - Ensure that IV size is at least 8 bytes
  crypto: cts - Weed out non-CBC algorithms
  MAINTAINERS: add linux-crypto to hw random
  crypto: cts - Remove bogus use of seqiv
  crypto: qat - don't need qat_auth_state struct
  crypto: algif_rng - fix sparse non static symbol warning
  ...
2015-02-14 09:47:01 -08:00
Timothy McCaffrey
e31ac32d3b crypto: aesni - Add support for 192 & 256 bit keys to AESNI RFC4106
These patches fix the RFC4106 implementation in the aesni-intel
module so it supports 192 & 256 bit keys.

Since the AVX support that was added to this module also only
supports 128 bit keys, and this patch only affects the SSE
implementation, changes were also made to use the SSE version
if key sizes other than 128 are specified.

RFC4106 specifies that 192 & 256 bit keys must be supported (section
8.4).

Also, this should fix Strongswan issue 341 where the aesni module
needs to be unloaded if 256 bit keys are used:

http://wiki.strongswan.org/issues/341

This patch has been tested with Sandy Bridge and Haswell processors.
With 128 bit keys and input buffers > 512 bytes a slight performance
degradation was noticed (~1%).  For input buffers of less than 512
bytes there was no performance impact.  Compared to 128 bit keys,
256 bit key size performance is approx. .5 cycles per byte slower
on Sandy Bridge, and .37 cycles per byte slower on Haswell (vs.
SSE code).

This patch has also been tested with StrongSwan IPSec connections
where it worked correctly.

I created this diff from a git clone of crypto-2.6.git.

Any questions, please feel free to contact me.

Signed-off-by: Timothy McCaffrey <timothy.mccaffrey@unisys.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-01-14 21:56:51 +11:00
Mathias Krause
d8219f52a7 crypto: x86/des3_ede - drop bogus module aliases
This module implements variations of "des3_ede" only. Drop the bogus
module aliases for "des".

Cc: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-01-13 22:30:52 +11:00
Mathias Krause
3e14dcf7cb crypto: add missing crypto module aliases
Commit 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
changed the automatic module loading when requesting crypto algorithms
to prefix all module requests with "crypto-". This requires all crypto
modules to have a crypto specific module alias even if their file name
would otherwise match the requested crypto algorithm.

Even though commit 5d26a105b5 added those aliases for a vast amount of
modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO
annotations to those files to make them get loaded automatically, again.
This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work
with kernels v3.18 and below.

Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former
won't work for crypto modules any more.

Fixes: 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-01-13 22:29:11 +11:00
Vinson Lee
0b8c960cf6 crypto: sha-mb - Add avx2_supported check.
This patch fixes this allyesconfig target build error with older
binutils.

  LD      arch/x86/crypto/built-in.o
ld: arch/x86/crypto/sha-mb/built-in.o: No such file: No such file or directory

Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Vinson Lee <vlee@twitter.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-01-05 21:35:02 +11:00
Mathias Krause
0b1e95b2fa crypto: aesni - fix "by8" variant for 128 bit keys
The "by8" counter mode optimization is broken for 128 bit keys with
input data longer than 128 bytes. It uses the wrong key material for
en- and decryption.

The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved
in case we're handling more than 128 bytes of input data -- they won't
get reloaded after the initial load. They must therefore be (a) loaded
on the first iteration and (b) be preserved for the latter ones. The
implementation for 128 bit keys does not comply with (a) nor (b).

Fix this by bringing the implementation back to its original source
and correctly load the key registers and preserve their values by
*not* re-using the registers for other purposes.

Kudos to James for reporting the issue and providing a test case
showing the discrepancies.

Reported-by: James Yonan <james@openvpn.net>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.18
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-01-05 21:35:02 +11:00
Julia Lawall
a6326ba025 crypto: sha - replace memset by memzero_explicit
Memset on a local variable may be removed when it is called just before the
variable goes out of scope.  Using memzero_explicit defeats this
optimization.  A simplified version of the semantic patch that makes this
change is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
type T;
@@

{
... when any
T x[...];
... when any
    when exists
- memset
+ memzero_explicit
  (x,
-0,
  ...)
... when != x
    when strict
}
// </smpl>

This change was suggested by Daniel Borkmann <dborkman@redhat.com>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-12-02 22:55:49 +08:00
Kees Cook
4943ba16bb crypto: include crypto- module prefix in template
This adds the module loading prefix "crypto-" to the template lookup
as well.

For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly
includes the "crypto-" prefix at every level, correctly rejecting "vfat":

	net-pf-38
	algif-hash
	crypto-vfat(blowfish)
	crypto-vfat(blowfish)-all
	crypto-vfat

Reported-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-26 20:06:30 +08:00
Dan Carpenter
5d1b3c98ec crypto: sha-mb - remove a bogus NULL check
This can't be NULL and we dereferenced it earlier.  Smatch used to
ignore these things where the pointer was obviously non-NULL but I've
found that sometimes the intention was to check something else so we
were maybe missing bugs.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-25 22:50:43 +08:00
Kees Cook
5d26a105b5 crypto: prefix module autoloading with "crypto-"
This prefixes all crypto module loading with "crypto-" so we never run
the risk of exposing module auto-loading to userspace via a crypto API,
as demonstrated by Mathias Krause:

https://lkml.org/lkml/2013/3/4/70

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-24 22:43:57 +08:00
Valentin Rothberg
304576a776 crypto: aesni - remove unnecessary #define
The CPP identifier 'HAS_PCBC' is defined when the Kconfig
option CRYPTO_PCBC is set as 'y' or 'm', and is further
used in two ifdef blocks to conditionally compile source
code. This indirection hides the actual Kconfig dependency
and complicates readability. Moreover, it's inconsistent
with the rest of the ifdef blocks in the file, which
directly reference Kconfig options.

This patch removes 'HAS_PCBC' and replaces its occurrences
with the actual dependency on 'CRYPTO_PCBC' being set as
'y' or 'm'.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-06 23:14:59 +08:00
Mathias Krause
5cfed7b335 Revert "crypto: aesni - disable "by8" AVX CTR optimization"
This reverts commit 7da4b29d49.

Now, that the issue is fixed, we can re-enable the code.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-10-02 14:40:28 +08:00
Herbert Xu
9561dccb45 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merging the crypto tree for 3.17 to pull in the "by8" AVX CTR revert.
2014-10-02 14:37:20 +08:00
Mathias Krause
e3b3bb5ac1 crypto: aesni - remove unused defines in "by8" variant
The defines for xkey3, xkey6 and xkey9 are not used in the code. They're
probably left overs from merging the three source files for 128, 192 and
256 bit AES. They can safely be removed.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-10-02 14:35:03 +08:00
Mathias Krause
80dca4734b crypto: aesni - fix counter overflow handling in "by8" variant
The "by8" CTR AVX implementation fails to propperly handle counter
overflows. That was the reason it got disabled in commit 7da4b29d49
("crypto: aesni - disable "by8" AVX CTR optimization").

Fix the overflow handling by incrementing the counter block as a double
quad word, i.e. a 128 bit, and testing for overflows afterwards. We need
to use VPTEST to do so as VPADD* does not set the flags itself and
silently drops the carry bit.

As this change adds branches to the hot path, minor performance
regressions  might be a side effect. But, OTOH, we now have a conforming
implementation -- the preferable goal.

A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical
numbers for the old and this version with differences within the noise
range. A dm-crypt test with the fixed version gave even slightly better
results for this version. So the performance impact might not be as big
as expected.

Tested-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-10-02 14:35:03 +08:00
Mathias Krause
7da4b29d49 crypto: aesni - disable "by8" AVX CTR optimization
The "by8" implementation introduced in commit 22cddcc7df ("crypto: aes
- AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it
handles counter block overflows differently. It only accounts the right
most 32 bit as a counter -- not the whole block as all other
implementations do. This makes it fail the cryptomgr test #4 that
specifically tests this corner case.

As we're quite late in the release cycle, just disable the "by8" variant
for now.

Reported-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-09-24 21:15:31 +08:00
Fengguang Wu
4c1948fc47 crypto: sha-mb - sha1_mb_alg_state can be static
CC: Tim Chen <tim.c.chen@linux.intel.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-08-26 14:40:52 +08:00
Tim Chen
ad61e042e9 crypto: sha-mb - SHA1 multibuffer job manager and glue code
This patch introduces the multi-buffer job manager which is responsible
for submitting scatter-gather buffers from several SHA1 jobs to the
multi-buffer algorithm.  It also contains the flush routine to that's
called by the crypto daemon to complete the job when no new jobs arrive
before the deadline of maximum latency of a SHA1 crypto job.

The SHA1 multi-buffer crypto algorithm is defined and initialized in
this patch.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-08-25 20:32:30 +08:00
Tim Chen
12d2513d5f crypto: sha-mb - SHA1 multibuffer crypto computation (x8 AVX2)
This patch introduces the assembly routines to do SHA1 computation on
buffers belonging to serveral jobs at once.  The assembly routines are
optimized with AVX2 instructions that have 8 data lanes and using AVX2
registers.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-08-25 20:32:29 +08:00
Tim Chen
2249cbb53e crypto: sha-mb - SHA1 multibuffer submit and flush routines for AVX2
This patch introduces the routines used to submit and flush buffers
belonging to SHA1 crypto jobs to the SHA1 multibuffer algorithm.  It is
implemented mostly in assembly optimized with AVX2 instructions.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-08-25 20:32:28 +08:00
Tim Chen
1161777823 crypto: sha-mb - SHA1 multibuffer algorithm data structures
This patch introduces the data structures and prototypes of functions
needed for computing SHA1 hash using multi-buffer.  Included are the
structures of the multi-buffer SHA1 job, job scheduler in C and x86
assembly.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-08-25 20:32:26 +08:00
Linus Torvalds
3e7a716a92 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 - CTR(AES) optimisation on x86_64 using "by8" AVX.
 - arm64 support to ccp
 - Intel QAT crypto driver
 - Qualcomm crypto engine driver
 - x86-64 assembly optimisation for 3DES
 - CTR(3DES) speed test
 - move FIPS panic from module.c so that it only triggers on crypto
   modules
 - SP800-90A Deterministic Random Bit Generator (drbg).
 - more test vectors for ghash.
 - tweak self tests to catch partial block bugs.
 - misc fixes.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (94 commits)
  crypto: drbg - fix failure of generating multiple of 2**16 bytes
  crypto: ccp - Do not sign extend input data to CCP
  crypto: testmgr - add missing spaces to drbg error strings
  crypto: atmel-tdes - Switch to managed version of kzalloc
  crypto: atmel-sha - Switch to managed version of kzalloc
  crypto: testmgr - use chunks smaller than algo block size in chunk tests
  crypto: qat - Fixed SKU1 dev issue
  crypto: qat - Use hweight for bit counting
  crypto: qat - Updated print outputs
  crypto: qat - change ae_num to ae_id
  crypto: qat - change slice->regions to slice->region
  crypto: qat - use min_t macro
  crypto: qat - remove unnecessary parentheses
  crypto: qat - remove unneeded header
  crypto: qat - checkpatch blank lines
  crypto: qat - remove unnecessary return codes
  crypto: Resolve shadow warnings
  crypto: ccp - Remove "select OF" from Kconfig
  crypto: caam - fix DECO RSR polling
  crypto: qce - Let 'DEV_QCE' depend on both HAS_DMA and HAS_IOMEM
  ...
2014-08-04 09:52:51 -07:00
Jussi Kivilinna
cfe82d4f45 crypto: sha512_ssse3 - fix byte count to bit count conversion
Byte-to-bit-count computation is only partly converted to big-endian and is
mixing in CPU-endian values. Problem was noticed by sparce with warning:

  CHECK   arch/x86/crypto/sha512_ssse3_glue.c
arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer
arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types)
arch/x86/crypto/sha512_ssse3_glue.c:144:17:    expected restricted __be64 <noident>
arch/x86/crypto/sha512_ssse3_glue.c:144:17:    got unsigned long long

Cc: <stable@vger.kernel.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-06-25 21:55:02 +08:00
Jussi Kivilinna
5e50d43d65 crypto: des3_ede-x86_64 - fix parse warning
Patch fixes following sparse warning:

  CHECK   arch/x86/crypto/des3_ede_glue.c
arch/x86/crypto/des3_ede_glue.c:308:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:309:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:310:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:326:44: warning: restricted __be64 degrades to integer

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-06-25 21:38:43 +08:00
chandramouli narayanan
22cddcc7df crypto: aes - AES CTR x86_64 "by8" AVX optimization
This patch introduces "by8" AES CTR mode AVX optimization inspired by
Intel Optimized IPSEC Cryptograhpic library. For additional information,
please see:
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972

The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
aes_ctr_enc_256_avx_by8() are adapted from
Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
are enabled in a platform, the glue code in AESNI module overrieds the
existing "by4" CTR mode en/decryption with the "by8"
AES CTR mode en/decryption.

On a Haswell desktop, with turbo disabled and all cpus running
at maximum frequency, the "by8" CTR mode optimization
shows better performance results across data & key sizes
as measured by tcrypt.

The average performance improvement of the "by8" version over the "by4"
version is as follows:

For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement.

A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8"
optimization shows the following results:

tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop:
---------------------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 bytes)

tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop:
---------------------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 539 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1153 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8458 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 353 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 360 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 512 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1277 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 8745 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 348 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 335 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 451 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1030 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6611 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 354 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 346 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 488 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1154 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8390 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 357 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 362 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 515 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1284 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 8681 cycles (8192 bytes)

crypto: Incorporate feed back to AES CTR mode optimization patch

Specifically, the following:
a) alignment around main loop in aes_ctrby8_avx_x86_64.S
b) .rodata around data constants used in the assembely code.
c) the use of CONFIG_AVX in the glue code.
d) fix up white space.
e) informational message for "by8" AES CTR mode optimization
f) "by8" AES CTR mode optimization can be simply enabled
if the platform supports both AES and AVX features. The
optimization works superbly on Sandybridge as well.

Testing on Haswell shows no performance change since the last.

Testing on Sandybridge shows that the "by8" AES CTR mode optimization
greatly improves performance.

tcrypt log with "by4" AES CTR mode optimization on Sandybridge
--------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 408 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 707 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1864 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 12813 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 395 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 432 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 780 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 2132 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 15765 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 416 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 438 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 842 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 2383 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 16945 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 389 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 409 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 704 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1865 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 12783 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 409 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 434 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 792 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 2151 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 15804 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 421 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 444 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 840 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 2394 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 16928 cycles (8192 bytes)

tcrypt log with "by8" AES CTR mode optimization on Sandybridge
--------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 401 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 522 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1136 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7046 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 394 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 418 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 559 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1263 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9072 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 408 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 428 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1385 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 9224 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 390 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 402 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 530 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1135 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7079 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 414 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 417 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 572 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1312 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9073 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 415 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 454 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 598 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1407 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 9288 cycles (8192 bytes)

crypto: Fix redundant checks

a) Fix the redundant check for cpu_has_aes
b) Fix the key length check when invoking the CTR mode "by8"
encryptor/decryptor.

crypto: fix typo in AES ctr mode transform

Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com>
Reviewed-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-06-20 21:27:58 +08:00
Jussi Kivilinna
6574e6c64e crypto: des_3des - add x86-64 assembly implementation
Patch adds x86_64 assembly implementation of Triple DES EDE cipher algorithm.
Two assembly implementations are provided. First is regular 'one-block at
time' encrypt/decrypt function. Second is 'three-blocks at time' function that
gains performance increase on out-of-order CPUs.

tcrypt test results:

Intel Core i5-4570:

des3_ede-asm vs des3_ede-generic:
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B     1.21x   1.22x   1.27x   1.36x   1.25x   1.25x
64B     1.98x   1.96x   1.23x   2.04x   2.01x   2.00x
256B    2.34x   2.37x   1.21x   2.40x   2.38x   2.39x
1024B   2.50x   2.47x   1.22x   2.51x   2.52x   2.51x
8192B   2.51x   2.53x   1.21x   2.56x   2.54x   2.55x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-06-20 21:27:58 +08:00
George Spelvin
473946e674 crypto: crc32c-pclmul - Shrink K_table to 32-bit words
There's no need for the K_table to be made of 64-bit words.  For some
reason, the original authors didn't fully reduce the values modulo the
CRC32C polynomial, and so had some 33-bit values in there.  They can
all be reduced to 32 bits.

Doing that cuts the table size in half.  Since the code depends on both
pclmulq and crc32, SSE 4.1 is obviously present, so we can use pmovzxdq
to fetch it in the correct format.

This adds (measured on Ivy Bridge) 1 cycle per main loop iteration
(CRC of up to 3K bytes), less than 0.2%.  The hope is that the reduced
D-cache footprint will make up the loss in other code.

Two other related fixes:
* K_table is read-only, so belongs in .rodata, and
* There's no need for more than 8-byte alignment

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-06-20 21:27:57 +08:00
Herbert Xu
0ea481466d crypto: ghash-clmulni-intel - Use u128 instead of be128 for internal key
The internal key isn't actually in big-endian format so let's switch
to u128 which also happens to allow us to remove a sparse warning.

Based on suggestion by Ard Biesheuvel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2014-04-04 21:06:14 +08:00
Ard Biesheuvel
8ceee72808 crypto: ghash-clmulni-intel - use C implementation for setkey()
The GHASH setkey() function uses SSE registers but fails to call
kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and
then having to deal with the restriction that they cannot be called from
interrupt context, move the setkey() implementation to the C domain.

Note that setkey() does not use any particular SSE features and is not
expected to become a performance bottleneck.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Fixes: 0e1227d356 (crypto: ghash - Add PCLMULQDQ accelerated implementation)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-04-01 17:22:47 +08:00
Mathias Krause
37b2894717 crypto: x86/sha1 - reduce size of the AVX2 asm implementation
There is really no need to page align sha1_transform_avx2. The default
alignment is just fine. This is not the hot code but only the entry
point, after all.

Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-03-25 20:25:43 +08:00
Mathias Krause
6c8c17cc7a crypto: x86/sha1 - fix stack alignment of AVX2 variant
The AVX2 implementation might waste up to a page of stack memory because
of a wrong alignment calculation. This will, in the worst case, increase
the stack usage of sha1_transform_avx2() alone to 5.4 kB -- way to big
for a kernel function. Even worse, it might also allocate *less* bytes
than needed if the stack pointer is already aligned bacause in that case
the 'sub %rbx, %rsp' is effectively moving the stack pointer upwards,
not downwards.

Fix those issues by changing and simplifying the alignment calculation
to use a 32 byte alignment, the alignment really needed.

Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-03-25 20:25:43 +08:00
Mathias Krause
6ca5afb8c2 crypto: x86/sha1 - re-enable the AVX variant
Commit 7c1da8d0d0 "crypto: sha - SHA1 transform x86_64 AVX2"
accidentally disabled the AVX variant by making the avx_usable() test
not only fail in case the CPU doesn't support AVX or OSXSAVE but also
if it doesn't support AVX2.

Fix that regression by splitting up the AVX/AVX2 test into two
functions. Also test for the BMI1 extension in the avx2_usable() test
as the AVX2 implementation not only makes use of BMI2 but also BMI1
instructions.

Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-03-25 20:25:42 +08:00
chandramouli narayanan
7c1da8d0d0 crypto: sha - SHA1 transform x86_64 AVX2
This git patch adds x86_64 AVX2 optimization of SHA1
transform to crypto support. The patch has been tested with 3.14.0-rc1
kernel.

On a Haswell desktop, with turbo disabled and all cpus running
at maximum frequency, tcrypt shows AVX2 performance improvement
from 3% for 256 bytes update to 16% for 1024 bytes update over
AVX implementation.

This patch adds sha1_avx2_transform(), the glue, build and
configuration changes needed for AVX2 optimization of
SHA1 transform to crypto support.

sha1-ssse3 is one module which adds the necessary optimization
support (SSSE3/AVX/AVX2) for the low-level SHA1 transform function.
With better optimization support, transform function is overridden
as the case may be. In the case of AVX2, due to performance reasons
across datablock sizes, the AVX or AVX2 transform function is used
at run-time as it suits best. The Makefile change therefore appends
the necessary objects to the linkage. Due to this, the patch merely
appends AVX2 transform to the existing build mix and Kconfig support
and leaves the configuration build support as is.

Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-03-21 21:54:30 +08:00
Dan Carpenter
b3bd5869fd crypto: remove a duplicate checks in __cbc_decrypt()
We checked "nbytes < bsize" before so it can't happen here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: Johannes Götzfried <johannes.goetzfried@cs.fau.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-02-27 05:56:54 +08:00
Tim Chen
79ba451d66 crypto: aesni - fix build on x86 (32bit)
We rename aesni-intel_avx.S to aesni-intel_avx-x86_64.S to indicate
that it is only used by x86_64 architecture.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-01-15 11:36:34 +08:00
Andy Shevchenko
8610d7bf60 crypto: aesni - fix build on x86 (32bit)
It seems commit d764593a "crypto: aesni - AVX and AVX2 version of AESNI-GCM
encode and decode" breaks a build on x86_32 since it's designed only for
x86_64. This patch makes a compilation unit conditional to CONFIG_64BIT and
functions usage to CONFIG_X86_64.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-12-31 19:47:46 +08:00
Tim Chen
d764593af9 crypto: aesni - AVX and AVX2 version of AESNI-GCM encode and decode
We have added AVX and AVX2 routines that optimize AESNI-GCM encode/decode.
These routines are optimized for encrypt and decrypt of large buffers.
In tests we have seen up to 6% speedup for 1K, 11% speedup for 2K and
18% speedup for 8K buffer over the existing SSE version.  These routines
should provide even better speedup for future Intel x86_64 cpus.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-12-20 20:06:24 +08:00