| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
algorithms.
tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:
block-size bytes/update old-vs-new
16 16 2.99x
64 16 2.67x
64 64 3.00x
256 16 2.64x
256 64 3.06x
256 256 3.33x
1024 16 2.53x
1024 256 3.39x
1024 1024 3.52x
2048 16 2.50x
2048 256 3.41x
2048 1024 3.54x
2048 2048 3.57x
4096 16 2.49x
4096 256 3.42x
4096 1024 3.56x
4096 4096 3.59x
8192 16 2.48x
8192 256 3.42x
8192 1024 3.56x
8192 4096 3.60x
8192 8192 3.60x
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds ARM NEON assembly implementation of SHA-1 algorithm.
tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:
block-size bytes/update old-vs-new
16 16 1.04x
64 16 1.02x
64 64 1.05x
256 16 1.03x
256 64 1.04x
256 256 1.30x
1024 16 1.03x
1024 256 1.36x
1024 1024 1.52x
2048 16 1.03x
2048 256 1.39x
2048 1024 1.55x
2048 2048 1.59x
4096 16 1.03x
4096 256 1.40x
4096 1024 1.57x
4096 4096 1.62x
8192 16 1.03x
8192 256 1.40x
8192 1024 1.58x
8192 4096 1.63x
8192 8192 1.63x
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption
and around 25% for decryption. This implementation of the AES algorithm
does not rely on any lookup tables so it is believed to be invulnerable
to cache timing attacks.
This algorithm processes up to 8 blocks in parallel in constant time. This
means that it is not usable by chaining modes that are strictly sequential
in nature, such as CBC encryption. CBC decryption, however, can benefit from
this implementation and runs about 25% faster. The other chaining modes
implemented in this module, XTS and CTR, can execute fully in parallel in
both directions.
The core code has been adopted from the OpenSSL project (in collaboration
with the original author, on cc). For ease of maintenance, this version is
identical to the upstream OpenSSL code, i.e., all modifications that were
required to make it suitable for inclusion into the kernel have been made
upstream. The original can be found here:
http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130
Note to integrators:
While this implementation is significantly faster than the existing table
based ones (generic or ARM asm), especially in CTR mode, the effects on
power efficiency are unclear as of yet. This code does fundamentally more
work, by calculating values that the table based code obtains by a simple
lookup; only by doing all of that work in a SIMD fashion, it manages to
perform better.
Cc: Andy Polyakov <appro@openssl.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
|
|
Add assembler versions of AES and SHA1 for ARM platforms. This has provided
up to a 50% improvement in IPsec/TCP throughout for tunnels using AES128/SHA1.
Platform CPU SPeed Endian Before (bps) After (bps) Improvement
IXP425 533 MHz big 11217042 15566294 ~38%
KS8695 166 MHz little 3828549 5795373 ~51%
Signed-off-by: David McCullough <ucdevel@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|