Compare commits

...

24 Commits

Author SHA1 Message Date
Jay D Dee
bc5a5c6df8 v3.23.3 2023-09-28 18:43:18 -04:00
Jay D Dee
be88afc349 v3.23.2 2023-09-21 12:34:06 -04:00
Jay D Dee
d6b5750362 v3.23.1 2023-09-13 11:48:52 -04:00
Jay D Dee
4378d2f841 v3.23.0 2023-08-30 20:15:48 -04:00
Jay D Dee
57a6b7b58b v3.22.3 2023-06-14 11:07:40 -04:00
Jay D Dee
de564ccbde v3.22.2 2023-04-06 13:38:37 -04:00
Jay D Dee
fcd7727b0d v3.22.1 2023-03-24 18:29:42 -04:00
Jay D Dee
3dd6787531 v3.22.0 2023-03-21 17:12:51 -04:00
Jay D Dee
cae1ce2ab7 v3.21.5 2023-03-15 12:27:04 -04:00
Jay D Dee
7a91c41d74 v3.21.4 2023-03-13 14:54:38 -04:00
Jay D Dee
c6bc9d67fb v3.21.3 Unreleased 2023-03-13 03:20:13 -04:00
Jay D Dee
b339450898 v3.21.3 2023-03-11 14:54:49 -05:00
Jay D Dee
fb93160641 v3.21.2 2023-03-03 12:38:31 -05:00
Jay D Dee
520d4d5384 v3.21.1 2023-02-08 22:11:05 -05:00
Jay D Dee
da7030faa8 v3.21.0 2022-12-21 13:09:14 -05:00
Jay D Dee
bd84f199fe v3.20.3 2022-10-21 23:12:18 -04:00
Jay D Dee
58030e2788 v3.20.2 2022-08-01 20:21:05 -04:00
Jay D Dee
1321ac474c v3.20.1 2022-07-26 18:36:40 -04:00
Jay D Dee
40d07c0097 v3.20.0 2022-07-17 13:30:50 -04:00
Jay D Dee
f552f2b1e8 v3.19.9 2022-07-10 11:04:00 -04:00
Jay D Dee
26b8927632 v3.19.8 2022-05-27 18:12:30 -04:00
Jay D Dee
db76d3865f v3.19.7 2022-04-02 12:44:57 -04:00
Jay D Dee
5b678d2481 v3.19.6 2022-02-21 23:14:24 -05:00
Jay D Dee
90137b391e v3.19.5 2022-01-30 20:59:54 -05:00
271 changed files with 30349 additions and 27888 deletions

View File

@@ -1,4 +1,6 @@
These instructions may be out of date, see the Wiki for the latest...
https://github.com/JayDDee/cpuminer-opt/wiki/Compiling-from-source
1. Requirements:
---------------
@@ -35,7 +37,7 @@ SHA support on AMD Ryzen CPUs requires gcc version 5 or higher and
openssl 1.1.0e or higher.
znver1 and znver2 should be recognized on most recent version of GCC and
znver3 is expected with GCC 11. GCC 11 also includes rocketlake support.
znver3 is available with GCC 11. GCC 11 also includes rocketlake support.
In the meantime here are some suggestions to compile with new CPUs:
"-march=native" is usually the best choice, used by build.sh.

View File

@@ -1,158 +1,4 @@
Instructions for compiling cpuminer-opt for Windows.
Thwaw intructions nay be out of date. Please consult the wiki for
the latest:
Please consult the wiki for Windows compile instructions.
https://github.com/JayDDee/cpuminer-opt/wiki/Compiling-from-source
Windows compilation using Visual Studio is not supported. Mingw64 is
used on a Linux system (bare metal or virtual machine) to cross-compile
cpuminer-opt executable binaries for Windows.
These instructions were written for Debian and Ubuntu compatible distributions
but should work on other major distributions as well. However some of the
package names or file paths may be different.
It is assumed a Linux system is already available and running. And the user
has enough Linux knowledge to find and install packages and follow these
instructions.
First it is a good idea to create new user specifically for cross compiling.
It keeps all mingw stuff contained and isolated from the rest of the system.
Step by step...
1. Install necessary packages from the distribution's repositories.
Refer to Linux compile instructions and install required packages.
Additionally, install mingw-w64.
sudo apt-get install mingw-w64 libz-mingw-w64-dev
2. Create a local library directory for packages to be compiled in the next
step. Suggested location is $HOME/usr/lib/
$ mkdir $HOME/usr/lib
3. Download and build other packages for mingw that don't have a mingw64
version available in the repositories.
Download the following source code packages from their respective and
respected download locations, copy them to $HOME/usr/lib/ and uncompress them.
openssl: https://github.com/openssl/openssl/releases
curl: https://github.com/curl/curl/releases
gmp: https://gmplib.org/download/gmp/
In most cases the latest version is ok but it's safest to download the same major and minor version as included in your distribution. The following uses versions from Ubuntu 20.04. Change version numbers as required.
Run the following commands or follow the supplied instructions. Do not run "make install" unless you are using /usr/lib, which isn't recommended.
Some instructions insist on running "make check". If make check fails it may still work, YMMV.
You can speed up "make" by using all CPU cores available with "-j n" where n is the number of CPU threads you want to use.
openssl:
$ ./Configure mingw64 shared --cross-compile-prefix=x86_64-w64-mingw32-
$ make
Make may fail with an ld error, just ensure libcrypto-1_1-x64.dll is created.
curl:
$ ./configure --with-winssl --with-winidn --host=x86_64-w64-mingw32
$ make
gmp:
$ ./configure --host=x86_64-w64-mingw32
$ make
4. Tweak the environment.
This step is required everytime you login or the commands can be added to .bashrc.
Define some local variables to point to local library.
$ export LOCAL_LIB="$HOME/usr/lib"
$ export LDFLAGS="-L$LOCAL_LIB/curl/lib/.libs -L$LOCAL_LIB/gmp/.libs -L$LOCAL_LIB/openssl"
$ export CONFIGURE_ARGS="--with-curl=$LOCAL_LIB/curl --with-crypto=$LOCAL_LIB/openssl --host=x86_64-w64-mingw32"
Adjust for gcc version:
$ export GCC_MINGW_LIB="/usr/lib/gcc/x86_64-w64-mingw32/9.3-win32"
Create a release directory and copy some dll files previously built. This can be done outside of cpuminer-opt and only needs to be done once. If the release directory is in cpuminer-opt directory it needs to be recreated every time a source package is decompressed.
$ mkdir release
$ cp /usr/x86_64-w64-mingw32/lib/zlib1.dll release/
$ cp /usr/x86_64-w64-mingw32/lib/libwinpthread-1.dll release/
$ cp $GCC_MINGW_LIB/libstdc++-6.dll release/
$ cp $GCC_MINGW_LIB/libgcc_s_seh-1.dll release/
$ cp $LOCAL_LIB/openssl/libcrypto-1_1-x64.dll release/
$ cp $LOCAL_LIB/curl/lib/.libs/libcurl-4.dll release/
The following steps need to be done every time a new source package is
opened.
5. Download cpuminer-opt
Download the latest source code package of cpumuner-opt to your desired
location. .zip or .tar.gz, your choice.
https://github.com/JayDDee/cpuminer-opt/releases
Decompress and change to the cpuminer-opt directory.
6. compile
Create a link to the locally compiled version of gmp.h
$ ln -s $LOCAL_LIB/gmp-version/gmp.h ./gmp.h
$ ./autogen.sh
Configure the compiler for the CPU architecture of the host machine:
CFLAGS="-O3 -march=native -Wall" ./configure $CONFIGURE_ARGS
or cross compile for a specific CPU architecture:
CFLAGS="-O3 -march=znver1 -Wall" ./configure $CONFIGURE_ARGS
This will compile for AMD Ryzen.
You can compile more generically for a set of specific CPU features if you know what features you want:
CFLAGS="-O3 -maes -msse4.2 -Wall" ./configure $CONFIGURE_ARGS
This will compile for an older CPU that does not have AVX.
You can find several examples in README.txt
If you have a CPU with more than 64 threads and Windows 7 or higher you can enable the CPU Groups feature by adding the following to CFLAGS:
"-D_WIN32_WINNT=0x0601"
Once you have run configure successfully run the compiler with n CPU threads:
$ make -j n
Copy cpuminer.exe to the release directory, compress and copy the release directory to a Windows system and run cpuminer.exe from the command line.
Run cpuminer
In a command windows change directories to the unzipped release folder. To get a list of all options:
cpuminer.exe --help
Command options are specific to where you mine. Refer to the pool's instructions on how to set them.

View File

@@ -36,28 +36,21 @@ cpuminer_SOURCES = \
algo/argon2/argon2d/argon2d/argon2d_thread.c \
algo/argon2/argon2d/argon2d/encoding.c \
algo/blake/sph_blake.c \
algo/blake/blake256-hash-4way.c \
algo/blake/blake512-hash-4way.c \
algo/blake/blake256-hash.c \
algo/blake/blake512-hash.c \
algo/blake/blake-gate.c \
algo/blake/blake.c \
algo/blake/blake-4way.c \
algo/blake/sph_blake2b.c \
algo/blake/sph-blake2s.c \
algo/blake/blake2s-hash-4way.c \
algo/blake/blake2s-hash.c \
algo/blake/blake2s.c \
algo/blake/blake2s-gate.c \
algo/blake/blake2s-4way.c \
algo/blake/blake2b-hash-4way.c \
algo/blake/blake2b-hash.c \
algo/blake/blake2b.c \
algo/blake/blake2b-gate.c \
algo/blake/blake2b-4way.c \
algo/blake/blakecoin-gate.c \
algo/blake/mod_blakecoin.c \
algo/blake/blakecoin.c \
algo/blake/blakecoin-4way.c \
algo/blake/decred-gate.c \
algo/blake/decred.c \
algo/blake/decred-4way.c \
algo/blake/pentablake-gate.c \
algo/blake/pentablake-4way.c \
algo/blake/pentablake.c \
@@ -166,8 +159,6 @@ cpuminer_SOURCES = \
algo/sha/sph_sha2big.c \
algo/sha/sha256-hash-4way.c \
algo/sha/sha512-hash-4way.c \
algo/sha/sha256-hash-opt.c \
algo/sha/sha256-hash-2way-ni.c \
algo/sha/hmac-sha256-hash.c \
algo/sha/hmac-sha256-hash-4way.c \
algo/sha/sha256d.c \
@@ -175,9 +166,10 @@ cpuminer_SOURCES = \
algo/sha/sha256d-4way.c \
algo/sha/sha256t-gate.c \
algo/sha/sha256t-4way.c \
algo/sha/sha256t.c \
algo/sha/sha256q-4way.c \
algo/sha/sha256q.c \
algo/sha/sha512256d-4way.c \
algo/sha/sha256dt.c \
algo/shabal/sph_shabal.c \
algo/shabal/shabal-hash-4way.c \
algo/shavite/sph_shavite.c \
@@ -205,7 +197,6 @@ cpuminer_SOURCES = \
algo/verthash/tiny_sha3/sha3.c \
algo/verthash/tiny_sha3/sha3-4way.c \
algo/whirlpool/sph_whirlpool.c \
algo/whirlpool/whirlpool-hash-4way.c \
algo/whirlpool/whirlpool-gate.c \
algo/whirlpool/whirlpool.c \
algo/whirlpool/whirlpoolx.c \
@@ -285,11 +276,9 @@ cpuminer_SOURCES = \
algo/x22/x22i-gate.c \
algo/x22/x25x.c \
algo/x22/x25x-4way.c \
algo/yescrypt/yescrypt.c \
algo/yescrypt/yescrypt-best.c \
algo/yespower/yespower-gate.c \
algo/yespower/yespower-blake2b.c \
algo/yespower/crypto/blake2b-yp.c \
algo/yespower/crypto/hmac-blake2b.c \
algo/yespower/yescrypt-r8g.c \
algo/yespower/yespower-opt.c
@@ -298,10 +287,10 @@ disable_flags =
if USE_ASM
cpuminer_SOURCES += asm/neoscrypt_asm.S
if ARCH_x86
cpuminer_SOURCES += asm/sha2-x86.S asm/scrypt-x86.S asm/aesb-x86.S
cpuminer_SOURCES += asm/sha2-x86.S asm/scrypt-x86.S
endif
if ARCH_x86_64
cpuminer_SOURCES += asm/sha2-x64.S asm/scrypt-x64.S asm/aesb-x64.S
cpuminer_SOURCES += asm/sha2-x64.S asm/scrypt-x64.S
endif
if ARCH_ARM
cpuminer_SOURCES += asm/sha2-arm.S asm/scrypt-arm.S

View File

@@ -40,17 +40,25 @@ Requirements
Intel Core2 and newer and AMD equivalents. Further optimizations are available
on some algoritms for CPUs with AES, AVX, AVX2, SHA, AVX512 and VAES.
Older CPUs are supported by cpuminer-multi by TPruvot but at reduced
performance.
32 bit CPUs are not supported.
Other CPU architectures such as ARM, Raspberry Pi, RISC-V, Xeon Phi, etc,
are not supported.
ARM and Aarch64 CPUs are not supported.
Mobile CPUs like laptop computers are not recommended because they aren't
designed for extreme heat of operating at full load for extended periods of
time.
Older CPUs and ARM architecture may be supported by cpuminer-multi by TPruvot.
2. 64 bit Linux or Windows OS. Ubuntu and Fedora based distributions,
including Mint and Centos, are known to work and have all dependencies
in their repositories. Others may work but may require more effort. Older
versions such as Centos 6 don't work due to missing features.
64 bit Windows OS is supported with mingw_w64 and msys or pre-built binaries.
Windows 7 or newer is supported with mingw_w64 and msys or using the pre-built
binaries. WindowsXP 64 bit is YMMV.
FreeBSD is not actively tested but should work, YMMV.
MacOS, OSx and Android are not supported.
3. Stratum pool supporting stratum+tcp:// or stratum+ssl:// protocols or
@@ -66,53 +74,50 @@ Supported Algorithms
argon2d250 argon2d-crds, Credits (CRDS)
argon2d500 argon2d-dyn, Dynamic (DYN)
argon2d4096 argon2d-uis, Unitus, (UIS)
axiom Shabal-256 MemoHash
blake Blake-256 (SFR)
blake2b Blake2b 256
blake2s Blake-2 S
blake Blake-256
blake2b Blake2-512
blake2s Blake2-256
blakecoin blake256r8
bmw BMW 256
bmw512 BMW 512
c11 Chaincoin
c11
decred
deep Deepcoin (DCN)
dmd-gr Diamond-Groestl
groestl Groestl coin
hex x16r-hex
hmq1725 Espers
hmq1725
hodl Hodlcoin
jha Jackpotcoin
keccak Maxcoin
keccakc Creative coin
lbry LBC, LBRY Credits
luffa Luffa
lyra2h Hppcoin
lyra2h
lyra2re lyra2
lyra2rev2 lyra2v2
lyra2rev3 lyrav2v3
lyra2z
lyra2z330 Lyra2 330 rows, Zoin (ZOI)
m7m Magi (XMG)
minotaur Ringcoin (RNG)
lyra2z330
m7m
minotaur
minotaurx
myr-gr Myriad-Groestl
neoscrypt NeoScrypt(128, 2, 1)
nist5 Nist5
pentablake Pentablake
phi1612 phi
phi2 Luxcoin (LUX)
phi2-lux identical to phi2
pluck Pluck:128 (Supcoin)
phi2
polytimos Ninja
power2b MicroBitcoin (MBC)
quark Quark
qubit Qubit
scrypt scrypt(1024, 1, 1) (default)
scrypt:N scrypt(N, 1, 1)
scryptn2 scrypt(1048576, 1, 1)
sha256d Double SHA-256
sha256q Quad SHA-256, Pyrite (PYE)
sha256t Triple SHA-256, Onecoin (OC)
sha256q Quad SHA-256
sha256t Triple SHA-256
sha3d Double keccak256 (BSHA3)
shavite3 Shavite3
skein Skein+Sha (Skeincoin)
skein2 Double Skein (Woodcoin)
skunk Signatum (SIGT)
@@ -128,17 +133,17 @@ Supported Algorithms
x11 Dash
x11evo Revolvercoin
x11gost sib (SibCoin)
x12 Galaxie Cash (GCH)
x13 X13
x12
x13
x13bcd bcd
x13sm3 hsr (Hshare)
x14 X14
x15 X15
x14
x15
x16r
x16rv2
x16rt Gincoin (GIN)
x16rt-veil Veil (VEIL)
x16s Pigeoncoin (PGN)
x16rt
x16rt-veil veil
x16s
x17
x21s
x22i

View File

@@ -1,12 +1,22 @@
This file is included in the Windows binary package. Compile instructions
for Linux and Windows can be found in RELEASE_NOTES.
This package is officially avalable only from:
cpuminer-opt is open source and free of any fees. Many forks exist that are
closed source and contain usage fees. support open source free software.
This package is officially avalaible only from:
https://github.com/JayDDee/cpuminer-opt
No other sources should be trusted.
cpuminer is a console program that is executed from a DOS or Powershell
prompt. There is no GUI and no mouse support.
command prompt. There is no GUI and no mouse support.
New users are encouraged to consult the cpuminer-opt Wiki for detailed
information on usage:
https://github.com/JayDDee/cpuminer-opt/wiki
Miner programs are often flagged as malware by antivirus programs. This is
a false positive, they are flagged simply because they are cryptocurrency
@@ -43,12 +53,11 @@ cpuminer-avx2.exe Haswell, Skylake, Kabylake, Coffeelake, Cometlake
cpuminer-avx2-sha.exe AMD Zen1, Zen2
cpuminer-avx2-sha-vaes.exe Intel Alderlake*, AMD Zen3
cpuminer-avx512.exe Intel HEDT Skylake-X, Cascadelake
cpuminer-avx512-sha-vaes.exe Icelake, Tigerlake, Rocketlake
cpuminer-avx512-sha-vaes.exe AMD Zen4, Intel Rocketlake, Icelake
* Alderlake is a hybrid architecture. With the E-cores disabled it may be
possible to enable AVX512 on the the P-cores and use the avx512-sha-vaes
build. This is not officially supported by Intel at time of writing.
Check for current information.
* Alderlake is a hybrid architecture with a mix of E-cores & P-cores. Although
the P-cores can support AVX512 the E-cores can't so Intel decided to disable
AVX512 on the the P-cores.
Notes about included DLL files:
@@ -59,9 +68,10 @@ source code obtained from the author's official repository. The exact
procedure is documented in the build instructions for Windows:
https://github.com/JayDDee/cpuminer-opt/wiki/Compiling-from-source
Some DLL filess may already be installed on the system by Windows or third
party packages. They often will work and may be used instead of the included
file.
Some included DLL files may already be installed on the system by Windows or
third party packages. They often will work and may be used instead of the
included version of the files.
If you like this software feel free to donate:

View File

@@ -22,7 +22,7 @@ required.
Compile Instructions
--------------------
See INSTALL_LINUX or INSTALL_WINDOWS for compile instruuctions
See INSTALL_LINUX or INSTALL_WINDOWS for compile instructions
Requirements
------------
@@ -65,31 +65,198 @@ If not what makes it happen or not happen?
Change Log
----------
v3.23.3
#400: Removed excessive thread restarts when mining solo.
Fixed build_msys2.sh for gcc-13 by removing unsupported option "--param=evrp-mode=legacy" from CFLAGS.
Added CPUID detection and reporting of CPUs and SW builds supporting SHA512 extension.
Added prototype of sha-512 using SHA512 intrinsics, untested.
Other improvements and code cleanup.
v3.23.2
sha256dt, sha256t & sha256d +10% with SHA, small improvement with AVX2.
Other small improvements and code cleanup.
v3.23.1
#349: Fix sha256t low difficulty shares and low effective hash rate.
Faster sha256dt: AVX512 +7%, SHA +200%, AVX2 +5%.
Faster blakecoin & vanilla: AVX2 +30%, AVX512 +110%.
Other small improvements and code cleanup.
v3.23.0
#398: Prevent GBT fallback to Getwork on network error.
#398: Prevent excessive logs when conditional mining is paused when mining solo.
Fix a false start if stratum doesn't immediately send a new job after connecting.
Tweak diagonal shuffle in Blake2b & Blake256 1-way SIMD to reduce latency.
CPUID support for AVX10.
Initial changes to AVX2 targeted code in preparation for AVX10.
Code cleanup and miscellaneous small improvements.
v3.22.3
Data interleaving and byte swap optimizations with AVX2, AVX512 & AVX512VBMI.
Faster Luffa with AVX2 & AVX512.
Other small optimizations.
Some code cleanup.
v3.22.2
Added sha512256d & sha256dt algos.
Fixed intermittant invalid shares lyra2v2 AVX512.
Removed application limits on the number of CPUs and threads, HW and OS limits still apply.
Added a log warning if more threads are defined than active CPUs in affinity mask.
Improved merkle tree memory management for stratum.
Added transaction count to New Work log.
Other small improvements.
v3.22.1
#393 fixed segfault in GBT, regression from v3.22.0.
More efficient 32 bit data interleaving.
v3.22.0
Stratum: faster netdiff calculation.
Merged a few updates from Pooler/cpuminer:
Use CURLOPT_POSTFIELDS in json_rpc_call,
Use CURLINFO_ACTIVESOCKET when supported,
JSONRPC speedup,
Speed up hex2bin function.
Small log improvements, notably more frequent hash rate reports.
Removed decred algo.
v3.21.5
All issues with v3.21.3 & v3.21.4 should be resolved.
Changes since v3.21.2:
#392 #379 #389 Fixed misaligned address segfault solo mining.
#392 Fixed stats for myr-gr algo, and a few others, for CPUs without AVX2.
#392 Fixed conditional mining.
#392 Fixed cpu affinity on Ryzen CPUs using Windows binaries,
Windows binaries no longer support CPU groups,
Windows binaries support CPUs with up to 64 threads.
Small optimizations to serialized vectoring.
v3.21.4 CANCELLED
Reapply selected changes from v3.21.3.
#392 #379 #389 Fixed misaligned address segfault solo mining.
#392 Fixed conditional mining.
#392 Fixed cpu affinity on Ryzen CPUs using Windows binaries,
Windows binaries no longer support CPU groups,
Windows binaries support CPUs with up to 64 threads.
v3.21.3.1 UNRELEASED
Revert to 3.21.2
v3.21.3 CANCELLED
#392 #379 #389 Fixed misaligned address segfault solo mining.
#392 Fixed stats for myr-gr algo, and a few others, for CPUs without AVX2.
#392 Fixed conditional mining.
#392 Fixed cpu affinity on Ryzen CPUs using Windows binaries,
Windows binaries no longer support CPU groups,
Windows binaries support CPUs with up to 64 threads.
Midstate prehash is now centralized, done only once instead of by every thread
for selected algos.
Small optimizations to serialized vectoring.
v3.21.2
Faster SALSA SIMD shuffle for yespower, yescrypt & scryptn2.
Fixed a couple of compiler warnings with gcc-12.
v3.21.1
Fixed a segfault in some obsolete algos.
Small optimizations to Hamsi & Shabal AVX2 & AVX512.
v3.21.0
Added minotaurx algo for stratum only.
Blake256 & sha256 prehash optimized to ignore zero-padded data for AVX2 & AVX512.
Other small improvements.
v3.20.3
Faster c11 algo: AVX512 6%, AVX2 4%, AVX2+VAES 15%.
Faster AVX2+VAES for anime 14%, hmq1725 6%.
Small optimizations to Luffa AVX2 & AVX512.
v3.20.2
Bit rotation optimizations to Blake256, Blake512, Blake2b, Blake2s & Lyra2-blake2b for SSE2 & AVX2.
Removed old unused yescrypt library and other unused code.
v3.20.1
sph_blake2b optimized 1-way SSSE3 & AVX2.
Removed duplicate Blake2b used by Power2b algo, will now use optimized sph_blake2b.
Removed imprecise hash & target display from rejected share log.
Share and target difficulty is now displayed only for low difficulty shares.
Updated configure.ac to check for AVX512 asm support.
Small optimization to Lyra2 SSE2.
v3.20.0
#375 Fixed segfault in algos using Groestl VAES due to use of uninitialized data.
v3.19.9
More Blake256, Blake512, Luffa & Cubehash prehash optimizations.
Relaxed some excessively strict data alignment that was negatively affecting performance.
v3.19.8
#370 "stratum+ssl", in addition to "stratum+tcps", is now recognized as a valid
url protocol specifier for requesting a secure stratum connection.
The full url, including the protocol, is now displayed in the stratum connect
log and the periodic summary log.
Small optimizations to Cubehash, AVX2 & AVX512.
Byte order and prehash optimizations for Blake256 & Blake512, AVX2 & AVX512.
v3.19.7
#369 Fixed time limited mining, --time-limit.
Fixed a potential compile error when using optimization below -O3.
v3.19.6
#363 Fixed a stratum bug where the first job may be ignored delaying start of hashing
Fixed handling of nonce exhaust when hashing a fast algo with extranonce disabled
Small optimization to Shavite.
v3.19.5
Enhanced stratum-keepalive preemptively resets the stratum connection
before the server to avoid lost shares.
Added build-msys2.sh shell script for easier compiling on Windows, see Wiki for details.
X16RT: eliminate unnecessary recalculations of the hash order.
Fix a few compiler warnings.
Fixed log colour error when a block is solved.
v3.19.4
#359: Fix verthash memory allocation for non-hugepages, broken in v3.19.3.
New option stratum-keepalive prevents stratum timeouts when no shares are
submitted for several minutes due to high difficulty.
Fixed a bug displaying optimizations for some algos.
v3.19.3
Linux: Faster verthash (+25%), scryptn2 (+2%) when huge pages are available.
Small speed up for Hamsi AVX2 & AVX512, Keccak AVX512.
v3.19.2
Fixed log displaying incorrect memory usage for scrypt, broken in v3.19.1.
Reduce log noise when replies to submitted shares are lost due to stratum errors.
Fugue prehash optimization for X16r family AVX2 & AVX512.
Small speed improvement for Hamsi AVX2 & AVX512.
Win: With CPU groups enabled the number of CPUs displayed in the ASCII art
affinity map is the number of CPUs in a CPU group, was number of CPUs up to 64.
@@ -101,7 +268,6 @@ Changes to Windows binaries package:
- zen build renamed to avx2-sha, supports Zen1 & Zen2,
- avx512-sha build removed, Rocketlake CPUs can use avx512-sha-vaes,
- see README.txt for compatibility details.
Fixed a few compiler warnings that are new in GCC 11.
Other minor fixes.
@@ -115,22 +281,17 @@ Changes to cpu-affinity:
- streamlined code for more efficient initialization of miner threads,
- precise affining of each miner thread to a specific CPU,
- added an option to disable CPU affinity with "--cpu-affinity 0"
Faster sha256t with AVX512 & AVX2.
Added stratum error count to stats log, reported only when non-zero.
v3.18.2
Issue #342, fixed Groestl AES on Windows, broken in v3.18.0.
AVX512 for sha256d.
SSE42 and AVX may now be displayed as mining features at startup.
This is hard coded for each algo, and is only implemented for scrypt
at this time as it is the only algo with significant performance differences
with those features.
Fixed an issue where a high hashrate algo could cause excessive invalid hash
rate log reports when starting up in benchmark mode.
@@ -141,9 +302,7 @@ More speed for scrypt:
- AVX2 is now used by default on CPUS with SHA but not AVX512,
- scrypt:1024 performance lost in v3.18.0 is restored,
- AVX512 & AVX2 improvements to scrypt:1024.
Big speedup for SwiFFTx AVX2 & SSE4.1: x22i +55%, x25x +22%.
Issue #337: fixed a problem that could display negative stats values in the
first summary report if the report was forced prematurely due to a stratum
diff change. The stats will still be invalid but should display zeros.
@@ -156,26 +315,19 @@ Complete rewrite of Scrypt code, optimized for large N factor (scryptn2):
- memory requirements reduced 30-60% depending on CPU architecture,
- memory usage displayed at startup,
- scrypt, default N=1024 (LTC), will likely perform slower.
Improved stale share detection and handling for Scrypt with large N factor:
- abort and discard partially computed hash when new work is detected,
- quicker response to new job, less time wasted mining stale job.
Improved stale share handling for all algorithms:
- report possible stale share when new work received with a previously
submitted share still pending,
- when new work is detected report the submission of an already completed,
otherwise valid, but likely stale, share,
- fixed incorrect block height in stale share log.
Small performance improvements to sha, bmw, cube & hamsi for AVX512 & AVX2.
When stratum disconnects miner threads go to idle until reconnected.
Colour changes to some logs.
Some low level function name changes for clarity and consistency.
The reference hashrate in the summary log and the benchmark total hashrate
are now the mean hashrate for the session.
@@ -288,7 +440,6 @@ Fixed neoscrypt BUG log.
v3.14.3
#265: more mutex changes to reduce blocking with high thread count.
#267: fixed hodl algo potential memory alignment issue,
add warning when thread count is not valid for mining hodl algo.

83
aclocal.m4 vendored
View File

@@ -1,6 +1,6 @@
# generated automatically by aclocal 1.16.1 -*- Autoconf -*-
# generated automatically by aclocal 1.16.5 -*- Autoconf -*-
# Copyright (C) 1996-2018 Free Software Foundation, Inc.
# Copyright (C) 1996-2021 Free Software Foundation, Inc.
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -14,13 +14,13 @@
m4_ifndef([AC_CONFIG_MACRO_DIRS], [m4_defun([_AM_CONFIG_MACRO_DIRS], [])m4_defun([AC_CONFIG_MACRO_DIRS], [_AM_CONFIG_MACRO_DIRS($@)])])
m4_ifndef([AC_AUTOCONF_VERSION],
[m4_copy([m4_PACKAGE_VERSION], [AC_AUTOCONF_VERSION])])dnl
m4_if(m4_defn([AC_AUTOCONF_VERSION]), [2.69],,
[m4_warning([this file was generated for autoconf 2.69.
m4_if(m4_defn([AC_AUTOCONF_VERSION]), [2.71],,
[m4_warning([this file was generated for autoconf 2.71.
You have another version of autoconf. It may work, but is not guaranteed to.
If you have problems, you may need to regenerate the build system entirely.
To do so, use the procedure documented by the package, typically 'autoreconf'.])])
# Copyright (C) 2002-2018 Free Software Foundation, Inc.
# Copyright (C) 2002-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -35,7 +35,7 @@ AC_DEFUN([AM_AUTOMAKE_VERSION],
[am__api_version='1.16'
dnl Some users find AM_AUTOMAKE_VERSION and mistake it for a way to
dnl require some minimum version. Point them to the right macro.
m4_if([$1], [1.16.1], [],
m4_if([$1], [1.16.5], [],
[AC_FATAL([Do not call $0, use AM_INIT_AUTOMAKE([$1]).])])dnl
])
@@ -51,14 +51,14 @@ m4_define([_AM_AUTOCONF_VERSION], [])
# Call AM_AUTOMAKE_VERSION and AM_AUTOMAKE_VERSION so they can be traced.
# This function is AC_REQUIREd by AM_INIT_AUTOMAKE.
AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION],
[AM_AUTOMAKE_VERSION([1.16.1])dnl
[AM_AUTOMAKE_VERSION([1.16.5])dnl
m4_ifndef([AC_AUTOCONF_VERSION],
[m4_copy([m4_PACKAGE_VERSION], [AC_AUTOCONF_VERSION])])dnl
_AM_AUTOCONF_VERSION(m4_defn([AC_AUTOCONF_VERSION]))])
# Figure out how to run the assembler. -*- Autoconf -*-
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
# Copyright (C) 2001-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -78,7 +78,7 @@ _AM_IF_OPTION([no-dependencies],, [_AM_DEPENDENCIES([CCAS])])dnl
# AM_AUX_DIR_EXPAND -*- Autoconf -*-
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
# Copyright (C) 2001-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -130,7 +130,7 @@ am_aux_dir=`cd "$ac_aux_dir" && pwd`
# AM_CONDITIONAL -*- Autoconf -*-
# Copyright (C) 1997-2018 Free Software Foundation, Inc.
# Copyright (C) 1997-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -161,7 +161,7 @@ AC_CONFIG_COMMANDS_PRE(
Usually this means the macro was only invoked conditionally.]])
fi])])
# Copyright (C) 1999-2018 Free Software Foundation, Inc.
# Copyright (C) 1999-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -352,7 +352,7 @@ _AM_SUBST_NOTMAKE([am__nodep])dnl
# Generate code to set up dependency tracking. -*- Autoconf -*-
# Copyright (C) 1999-2018 Free Software Foundation, Inc.
# Copyright (C) 1999-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -391,7 +391,9 @@ AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS],
done
if test $am_rc -ne 0; then
AC_MSG_FAILURE([Something went wrong bootstrapping makefile fragments
for automatic dependency tracking. Try re-running configure with the
for automatic dependency tracking. If GNU make was not used, consider
re-running the configure script with MAKE="gmake" (or whatever is
necessary). You can also try re-running configure with the
'--disable-dependency-tracking' option to at least be able to build
the package (albeit without support for automatic dependency tracking).])
fi
@@ -418,7 +420,7 @@ AC_DEFUN([AM_OUTPUT_DEPENDENCY_COMMANDS],
# Do all the work for Automake. -*- Autoconf -*-
# Copyright (C) 1996-2018 Free Software Foundation, Inc.
# Copyright (C) 1996-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -446,6 +448,10 @@ m4_defn([AC_PROG_CC])
# release and drop the old call support.
AC_DEFUN([AM_INIT_AUTOMAKE],
[AC_PREREQ([2.65])dnl
m4_ifdef([_$0_ALREADY_INIT],
[m4_fatal([$0 expanded multiple times
]m4_defn([_$0_ALREADY_INIT]))],
[m4_define([_$0_ALREADY_INIT], m4_expansion_stack)])dnl
dnl Autoconf wants to disallow AM_ names. We explicitly allow
dnl the ones we care about.
m4_pattern_allow([^AM_[A-Z]+FLAGS$])dnl
@@ -482,7 +488,7 @@ m4_ifval([$3], [_AM_SET_OPTION([no-define])])dnl
[_AM_SET_OPTIONS([$1])dnl
dnl Diagnose old-style AC_INIT with new-style AM_AUTOMAKE_INIT.
m4_if(
m4_ifdef([AC_PACKAGE_NAME], [ok]):m4_ifdef([AC_PACKAGE_VERSION], [ok]),
m4_ifset([AC_PACKAGE_NAME], [ok]):m4_ifset([AC_PACKAGE_VERSION], [ok]),
[ok:ok],,
[m4_fatal([AC_INIT should be called with package and version arguments])])dnl
AC_SUBST([PACKAGE], ['AC_PACKAGE_TARNAME'])dnl
@@ -534,6 +540,20 @@ AC_PROVIDE_IFELSE([AC_PROG_OBJCXX],
[m4_define([AC_PROG_OBJCXX],
m4_defn([AC_PROG_OBJCXX])[_AM_DEPENDENCIES([OBJCXX])])])dnl
])
# Variables for tags utilities; see am/tags.am
if test -z "$CTAGS"; then
CTAGS=ctags
fi
AC_SUBST([CTAGS])
if test -z "$ETAGS"; then
ETAGS=etags
fi
AC_SUBST([ETAGS])
if test -z "$CSCOPE"; then
CSCOPE=cscope
fi
AC_SUBST([CSCOPE])
AC_REQUIRE([AM_SILENT_RULES])dnl
dnl The testsuite driver may need to know about EXEEXT, so add the
dnl 'am__EXEEXT' conditional if _AM_COMPILER_EXEEXT was seen. This
@@ -615,7 +635,7 @@ for _am_header in $config_headers :; do
done
echo "timestamp for $_am_arg" >`AS_DIRNAME(["$_am_arg"])`/stamp-h[]$_am_stamp_count])
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
# Copyright (C) 2001-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -636,7 +656,7 @@ if test x"${install_sh+set}" != xset; then
fi
AC_SUBST([install_sh])])
# Copyright (C) 2003-2018 Free Software Foundation, Inc.
# Copyright (C) 2003-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -658,7 +678,7 @@ AC_SUBST([am__leading_dot])])
# Add --enable-maintainer-mode option to configure. -*- Autoconf -*-
# From Jim Meyering
# Copyright (C) 1996-2018 Free Software Foundation, Inc.
# Copyright (C) 1996-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -693,7 +713,7 @@ AC_MSG_CHECKING([whether to enable maintainer-specific portions of Makefiles])
# Check to see how 'make' treats includes. -*- Autoconf -*-
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
# Copyright (C) 2001-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -736,7 +756,7 @@ AC_SUBST([am__quote])])
# Fake the existence of programs that GNU maintainers use. -*- Autoconf -*-
# Copyright (C) 1997-2018 Free Software Foundation, Inc.
# Copyright (C) 1997-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -757,12 +777,7 @@ AC_DEFUN([AM_MISSING_HAS_RUN],
[AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl
AC_REQUIRE_AUX_FILE([missing])dnl
if test x"${MISSING+set}" != xset; then
case $am_aux_dir in
*\ * | *\ *)
MISSING="\${SHELL} \"$am_aux_dir/missing\"" ;;
*)
MISSING="\${SHELL} $am_aux_dir/missing" ;;
esac
MISSING="\${SHELL} '$am_aux_dir/missing'"
fi
# Use eval to expand $SHELL
if eval "$MISSING --is-lightweight"; then
@@ -775,7 +790,7 @@ fi
# Helper functions for option handling. -*- Autoconf -*-
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
# Copyright (C) 2001-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -804,7 +819,7 @@ AC_DEFUN([_AM_SET_OPTIONS],
AC_DEFUN([_AM_IF_OPTION],
[m4_ifset(_AM_MANGLE_OPTION([$1]), [$2], [$3])])
# Copyright (C) 1999-2018 Free Software Foundation, Inc.
# Copyright (C) 1999-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -851,7 +866,7 @@ AC_LANG_POP([C])])
# For backward compatibility.
AC_DEFUN_ONCE([AM_PROG_CC_C_O], [AC_REQUIRE([AC_PROG_CC])])
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
# Copyright (C) 2001-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -870,7 +885,7 @@ AC_DEFUN([AM_RUN_LOG],
# Check to make sure that the build environment is sane. -*- Autoconf -*-
# Copyright (C) 1996-2018 Free Software Foundation, Inc.
# Copyright (C) 1996-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -951,7 +966,7 @@ AC_CONFIG_COMMANDS_PRE(
rm -f conftest.file
])
# Copyright (C) 2009-2018 Free Software Foundation, Inc.
# Copyright (C) 2009-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -1011,7 +1026,7 @@ AC_SUBST([AM_BACKSLASH])dnl
_AM_SUBST_NOTMAKE([AM_BACKSLASH])dnl
])
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
# Copyright (C) 2001-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -1039,7 +1054,7 @@ fi
INSTALL_STRIP_PROGRAM="\$(install_sh) -c -s"
AC_SUBST([INSTALL_STRIP_PROGRAM])])
# Copyright (C) 2006-2018 Free Software Foundation, Inc.
# Copyright (C) 2006-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
@@ -1058,7 +1073,7 @@ AC_DEFUN([AM_SUBST_NOTMAKE], [_AM_SUBST_NOTMAKE($@)])
# Check how to create a tarball. -*- Autoconf -*-
# Copyright (C) 2004-2018 Free Software Foundation, Inc.
# Copyright (C) 2004-2021 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,

View File

@@ -67,7 +67,6 @@ void do_nothing () {}
bool return_true () { return true; }
bool return_false () { return false; }
void *return_null () { return NULL; }
void call_error () { printf("ERR: Uninitialized function pointer\n"); }
void algo_not_tested()
{
@@ -95,7 +94,8 @@ int null_scanhash()
return 0;
}
// Default generic scanhash can be used in many cases.
// Default generic scanhash can be used in many cases. Not to be used when
// prehashing can be done or when byte swapping the data can be avoided.
int scanhash_generic( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
@@ -152,6 +152,9 @@ int scanhash_4way_64in_32out( struct work *work, uint32_t max_nonce,
const bool bench = opt_benchmark;
mm256_bswap32_intrlv80_4x64( vdata, pdata );
// overwrite byte swapped nonce with original byte order for proper
// incrementing. The nonce only needs to byte swapped if it is to be
// sumbitted.
*noncev = mm256_intrlv_blend_32(
_mm256_set_epi32( n+3, 0, n+2, 0, n+1, 0, n, 0 ), *noncev );
do
@@ -168,7 +171,7 @@ int scanhash_4way_64in_32out( struct work *work, uint32_t max_nonce,
}
}
*noncev = _mm256_add_epi32( *noncev,
m256_const1_64( 0x0000000400000000 ) );
_mm256_set1_epi64x( 0x0000000400000000 ) );
n += 4;
} while ( likely( ( n <= last_nonce ) && !work_restart[thr_id].restart ) );
pdata[19] = n;
@@ -224,7 +227,7 @@ int scanhash_8way_64in_32out( struct work *work, uint32_t max_nonce,
}
}
*noncev = _mm512_add_epi32( *noncev,
m512_const1_64( 0x0000000800000000 ) );
_mm512_set1_epi64( 0x0000000800000000 ) );
n += 8;
} while ( likely( ( n < last_nonce ) && !work_restart[thr_id].restart ) );
pdata[19] = n;
@@ -245,7 +248,7 @@ int null_hash()
return 0;
};
void init_algo_gate( algo_gate_t* gate )
static void init_algo_gate( algo_gate_t* gate )
{
gate->miner_thread_init = (void*)&return_true;
gate->scanhash = (void*)&scanhash_generic;
@@ -260,8 +263,6 @@ void init_algo_gate( algo_gate_t* gate )
gate->build_block_header = (void*)&std_build_block_header;
gate->build_extraheader = (void*)&std_build_extraheader;
gate->set_work_data_endian = (void*)&do_nothing;
gate->calc_network_diff = (void*)&std_calc_network_diff;
gate->ready_to_mine = (void*)&std_ready_to_mine;
gate->resync_threads = (void*)&do_nothing;
gate->do_this_thread = (void*)&return_true;
gate->longpoll_rpc_call = (void*)&std_longpoll_rpc_call;
@@ -305,7 +306,6 @@ bool register_algo_gate( int algo, algo_gate_t *gate )
case ALGO_BLAKECOIN: rc = register_blakecoin_algo ( gate ); break;
case ALGO_BMW512: rc = register_bmw512_algo ( gate ); break;
case ALGO_C11: rc = register_c11_algo ( gate ); break;
case ALGO_DECRED: rc = register_decred_algo ( gate ); break;
case ALGO_DEEP: rc = register_deep_algo ( gate ); break;
case ALGO_DMD_GR: rc = register_dmd_gr_algo ( gate ); break;
case ALGO_GROESTL: rc = register_groestl_algo ( gate ); break;
@@ -324,6 +324,7 @@ bool register_algo_gate( int algo, algo_gate_t *gate )
case ALGO_LYRA2Z330: rc = register_lyra2z330_algo ( gate ); break;
case ALGO_M7M: rc = register_m7m_algo ( gate ); break;
case ALGO_MINOTAUR: rc = register_minotaur_algo ( gate ); break;
case ALGO_MINOTAURX: rc = register_minotaur_algo ( gate ); break;
case ALGO_MYR_GR: rc = register_myriad_algo ( gate ); break;
case ALGO_NEOSCRYPT: rc = register_neoscrypt_algo ( gate ); break;
case ALGO_NIST5: rc = register_nist5_algo ( gate ); break;
@@ -336,9 +337,11 @@ bool register_algo_gate( int algo, algo_gate_t *gate )
case ALGO_QUBIT: rc = register_qubit_algo ( gate ); break;
case ALGO_SCRYPT: rc = register_scrypt_algo ( gate ); break;
case ALGO_SHA256D: rc = register_sha256d_algo ( gate ); break;
case ALGO_SHA256DT: rc = register_sha256dt_algo ( gate ); break;
case ALGO_SHA256Q: rc = register_sha256q_algo ( gate ); break;
case ALGO_SHA256T: rc = register_sha256t_algo ( gate ); break;
case ALGO_SHA3D: rc = register_sha3d_algo ( gate ); break;
case ALGO_SHA512256D: rc = register_sha512256d_algo ( gate ); break;
case ALGO_SHAVITE3: rc = register_shavite_algo ( gate ); break;
case ALGO_SKEIN: rc = register_skein_algo ( gate ); break;
case ALGO_SKEIN2: rc = register_skein2_algo ( gate ); break;
@@ -371,15 +374,11 @@ bool register_algo_gate( int algo, algo_gate_t *gate )
case ALGO_X22I: rc = register_x22i_algo ( gate ); break;
case ALGO_X25X: rc = register_x25x_algo ( gate ); break;
case ALGO_XEVAN: rc = register_xevan_algo ( gate ); break;
case ALGO_YESCRYPT: rc = register_yescrypt_05_algo ( gate ); break;
// case ALGO_YESCRYPT: register_yescrypt_algo ( gate ); break;
case ALGO_YESCRYPTR8: rc = register_yescryptr8_05_algo ( gate ); break;
// case ALGO_YESCRYPTR8: register_yescryptr8_algo ( gate ); break;
case ALGO_YESCRYPT: rc = register_yescrypt_algo ( gate ); break;
case ALGO_YESCRYPTR8: rc = register_yescryptr8_algo ( gate ); break;
case ALGO_YESCRYPTR8G: rc = register_yescryptr8g_algo ( gate ); break;
case ALGO_YESCRYPTR16: rc = register_yescryptr16_05_algo( gate ); break;
// case ALGO_YESCRYPTR16: register_yescryptr16_algo ( gate ); break;
case ALGO_YESCRYPTR32: rc = register_yescryptr32_05_algo( gate ); break;
// case ALGO_YESCRYPTR32: register_yescryptr32_algo ( gate ); break;
case ALGO_YESCRYPTR16: rc = register_yescryptr16_algo ( gate ); break;
case ALGO_YESCRYPTR32: rc = register_yescryptr32_algo ( gate ); break;
case ALGO_YESPOWER: rc = register_yespower_algo ( gate ); break;
case ALGO_YESPOWERR16: rc = register_yespowerr16_algo ( gate ); break;
case ALGO_YESPOWER_B2B: rc = register_yespower_b2b_algo ( gate ); break;
@@ -427,7 +426,6 @@ const char* const algo_alias_map[][2] =
{ "blake256r8", "blakecoin" },
{ "blake256r8vnl", "vanilla" },
{ "blake256r14", "blake" },
{ "blake256r14dcr", "decred" },
{ "diamond", "dmd-gr" },
{ "espers", "hmq1725" },
{ "flax", "c11" },

View File

@@ -94,10 +94,14 @@ typedef uint32_t set_t;
#define SSE42_OPT 4
#define AVX_OPT 8 // Sandybridge
#define AVX2_OPT 0x10 // Haswell, Zen1
#define SHA_OPT 0x20 // Zen1, Icelake (sha256)
#define AVX512_OPT 0x40 // Skylake-X (AVX512[F,VL,DQ,BW])
#define VAES_OPT 0x80 // Icelake (VAES & AVX512)
#define SHA_OPT 0x20 // Zen1, Icelake (deprecated)
#define AVX512_OPT 0x40 // Skylake-X, Zen4 (AVX512[F,VL,DQ,BW])
#define VAES_OPT 0x80 // Icelake, Zen3
#define SHA512_OPT 0x100 // Lunar Lake, Arrow Lake
// AVX10 does not have explicit algo features:
// AVX10_512 is compatible with AVX512 + VAES
// AVX10_256 is compatible with AVX2 + VAES
// return set containing all elements from sets a & b
inline set_t set_union ( set_t a, set_t b ) { return a | b; }
@@ -144,7 +148,7 @@ void ( *gen_merkle_root ) ( char*, struct stratum_ctx* );
void ( *build_extraheader ) ( struct work*, struct stratum_ctx* );
void ( *build_block_header ) ( struct work*, uint32_t, uint32_t*,
uint32_t*, uint32_t, uint32_t,
uint32_t*, uint32_t, uint32_t,
unsigned char* );
// Build mining.submit message
@@ -155,19 +159,13 @@ char* ( *malloc_txs_request ) ( struct work* );
// Big endian or little endian
void ( *set_work_data_endian ) ( struct work* );
double ( *calc_network_diff ) ( struct work* );
// Wait for first work
bool ( *ready_to_mine ) ( struct work*, struct stratum_ctx*, int );
// Diverge mining threads
bool ( *do_this_thread ) ( int );
// After do_this_thread
void ( *resync_threads ) ( int, struct work* );
// No longer needed
json_t* (*longpoll_rpc_call) ( CURL*, int*, char* );
json_t* ( *longpoll_rpc_call ) ( CURL*, int*, char* );
set_t optimizations;
int ( *get_work_data_size ) ();
@@ -270,7 +268,9 @@ void std_get_new_work( struct work *work, struct work *g_work, int thr_id,
uint32_t* end_nonce_ptr );
void sha256d_gen_merkle_root( char *merkle_root, struct stratum_ctx *sctx );
void SHA256_gen_merkle_root ( char *merkle_root, struct stratum_ctx *sctx );
void sha256_gen_merkle_root ( char *merkle_root, struct stratum_ctx *sctx );
// OpenSSL sha256 deprecated
//void SHA256_gen_merkle_root ( char *merkle_root, struct stratum_ctx *sctx );
bool std_le_work_decode( struct work *work );
bool std_be_work_decode( struct work *work );
@@ -286,8 +286,6 @@ char* std_malloc_txs_request( struct work *work );
// Default is do_nothing, little endian is assumed
void set_work_data_big_endian( struct work *work );
double std_calc_network_diff( struct work *work );
void std_build_block_header( struct work* g_work, uint32_t version,
uint32_t *prevhash, uint32_t *merkle_root,
uint32_t ntime, uint32_t nbits,
@@ -297,9 +295,6 @@ void std_build_extraheader( struct work *work, struct stratum_ctx *sctx );
json_t* std_longpoll_rpc_call( CURL *curl, int *err, char *lp_url );
bool std_ready_to_mine( struct work* work, struct stratum_ctx* stratum,
int thr_id );
int std_get_work_data_size();
// Gate admin functions

View File

@@ -344,7 +344,7 @@ static size_t
detect_cpu(void) {
//union { uint8_t s[12]; uint32_t i[3]; } vendor_string;
//cpu_vendors_x86 vendor = cpu_nobody;
x86_regs regs;
x86_regs regs; regs.eax = regs.ebx = regs.ecx = 0;
uint32_t max_level, max_ext_level;
size_t cpu_flags = 0;
#if defined(X86ASM_AVX) || defined(X86_64ASM_AVX)
@@ -460,4 +460,4 @@ get_top_cpuflag_desc(size_t flag) {
#endif
#endif
#endif /* defined(CPU_X86) || defined(CPU_X86_64) */
#endif /* defined(CPU_X86) || defined(CPU_X86_64) */

View File

@@ -4,11 +4,12 @@ typedef void (FASTCALL *scrypt_ROMixfn)(scrypt_mix_word_t *X/*[chunkWords]*/, sc
#endif
/* romix pre/post nop function */
/*
static void asm_calling_convention
scrypt_romix_nop(scrypt_mix_word_t *blocks, size_t nblocks) {
(void)blocks; (void)nblocks;
}
*/
/* romix pre/post endian conversion function */
static void asm_calling_convention
scrypt_romix_convert_endian(scrypt_mix_word_t *blocks, size_t nblocks) {

View File

@@ -77,7 +77,7 @@ bool register_argon2_algo( algo_gate_t* gate )
gate->optimizations = SSE2_OPT | AVX_OPT | AVX2_OPT;
gate->scanhash = (void*)&scanhash_argon2;
gate->hash = (void*)&argon2hash;
gate->gen_merkle_root = (void*)&SHA256_gen_merkle_root;
gate->gen_merkle_root = (void*)&sha256_gen_merkle_root;
opt_target_factor = 65536.0;
return true;

View File

@@ -1,5 +1,5 @@
#include "blake-gate.h"
#include "blake-hash-4way.h"
#include "blake256-hash.h"
#include <string.h>
#include <stdint.h>
#include <memory.h>

View File

@@ -1,192 +0,0 @@
/* $Id: sph_blake.h 252 2011-06-07 17:55:14Z tp $ */
/**
* BLAKE interface. BLAKE is a family of functions which differ by their
* output size; this implementation defines BLAKE for output sizes 224,
* 256, 384 and 512 bits. This implementation conforms to the "third
* round" specification.
*
* ==========================(LICENSE BEGIN)============================
*
* Copyright (c) 2007-2010 Projet RNRT SAPHIR
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* ===========================(LICENSE END)=============================
*
* @file sph_blake.h
* @author Thomas Pornin <thomas.pornin@cryptolog.com>
*/
#ifndef __BLAKE_HASH_4WAY__
#define __BLAKE_HASH_4WAY__ 1
#ifdef __cplusplus
extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "simd-utils.h"
#define SPH_SIZE_blake256 256
#define SPH_SIZE_blake512 512
//////////////////////////
//
// Blake-256 4 way SSE2
typedef struct {
unsigned char buf[64<<2];
uint32_t H[8<<2];
size_t ptr;
uint32_t T0, T1;
int rounds; // 14 for blake, 8 for blakecoin & vanilla
} blake_4way_small_context __attribute__ ((aligned (64)));
// Default, 14 rounds, blake, decred
typedef blake_4way_small_context blake256_4way_context;
void blake256_4way_init(void *ctx);
void blake256_4way_update(void *ctx, const void *data, size_t len);
void blake256_4way_close(void *ctx, void *dst);
// 14 rounds, blake, decred
typedef blake_4way_small_context blake256r14_4way_context;
void blake256r14_4way_init(void *cc);
void blake256r14_4way_update(void *cc, const void *data, size_t len);
void blake256r14_4way_close(void *cc, void *dst);
// 8 rounds, blakecoin, vanilla
typedef blake_4way_small_context blake256r8_4way_context;
void blake256r8_4way_init(void *cc);
void blake256r8_4way_update(void *cc, const void *data, size_t len);
void blake256r8_4way_close(void *cc, void *dst);
#ifdef __AVX2__
//////////////////////////
//
// Blake-256 8 way AVX2
typedef struct {
__m256i buf[16] __attribute__ ((aligned (64)));
__m256i H[8];
size_t ptr;
sph_u32 T0, T1;
int rounds; // 14 for blake, 8 for blakecoin & vanilla
} blake_8way_small_context;
// Default 14 rounds
typedef blake_8way_small_context blake256_8way_context;
void blake256_8way_init(void *cc);
void blake256_8way_update(void *cc, const void *data, size_t len);
void blake256_8way_close(void *cc, void *dst);
// 14 rounds, blake, decred
typedef blake_8way_small_context blake256r14_8way_context;
void blake256r14_8way_init(void *cc);
void blake256r14_8way_update(void *cc, const void *data, size_t len);
void blake256r14_8way_close(void *cc, void *dst);
// 8 rounds, blakecoin, vanilla
typedef blake_8way_small_context blake256r8_8way_context;
void blake256r8_8way_init(void *cc);
void blake256r8_8way_update(void *cc, const void *data, size_t len);
void blake256r8_8way_close(void *cc, void *dst);
// Blake-512 4 way AVX2
typedef struct {
__m256i buf[16];
__m256i H[8];
__m256i S[4];
size_t ptr;
sph_u64 T0, T1;
} blake_4way_big_context __attribute__ ((aligned (128)));
typedef blake_4way_big_context blake512_4way_context;
void blake512_4way_init( blake_4way_big_context *sc );
void blake512_4way_update( void *cc, const void *data, size_t len );
void blake512_4way_close( void *cc, void *dst );
void blake512_4way_full( blake_4way_big_context *sc, void * dst,
const void *data, size_t len );
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
////////////////////////////
//
// Blake-256 16 way AVX512
typedef struct {
__m512i buf[16];
__m512i H[8];
size_t ptr;
uint32_t T0, T1;
int rounds; // 14 for blake, 8 for blakecoin & vanilla
} blake_16way_small_context __attribute__ ((aligned (128)));
// Default 14 rounds
typedef blake_16way_small_context blake256_16way_context;
void blake256_16way_init(void *cc);
void blake256_16way_update(void *cc, const void *data, size_t len);
void blake256_16way_close(void *cc, void *dst);
// 14 rounds, blake, decred
typedef blake_16way_small_context blake256r14_16way_context;
void blake256r14_16way_init(void *cc);
void blake256r14_16way_update(void *cc, const void *data, size_t len);
void blake256r14_16way_close(void *cc, void *dst);
// 8 rounds, blakecoin, vanilla
typedef blake_16way_small_context blake256r8_16way_context;
void blake256r8_16way_init(void *cc);
void blake256r8_16way_update(void *cc, const void *data, size_t len);
void blake256r8_16way_close(void *cc, void *dst);
////////////////////////////
//
//// Blake-512 8 way AVX512
typedef struct {
__m512i buf[16];
__m512i H[8];
__m512i S[4];
size_t ptr;
sph_u64 T0, T1;
} blake_8way_big_context __attribute__ ((aligned (128)));
typedef blake_8way_big_context blake512_8way_context;
void blake512_8way_init( blake_8way_big_context *sc );
void blake512_8way_update( void *cc, const void *data, size_t len );
void blake512_8way_close( void *cc, void *dst );
void blake512_8way_full( blake_8way_big_context *sc, void * dst,
const void *data, size_t len );
void blake512_8way_hash_le80( void *hash, const void *data );
#endif // AVX512
#endif // AVX2
#ifdef __cplusplus
}
#endif
#endif // BLAKE_HASH_4WAY_H__

File diff suppressed because it is too large Load Diff

2599
algo/blake/blake256-hash.c Normal file

File diff suppressed because it is too large Load Diff

124
algo/blake/blake256-hash.h Normal file
View File

@@ -0,0 +1,124 @@
#ifndef BLAKE256_HASH__
#define BLAKE256_HASH__ 1
#include <stddef.h>
#include "simd-utils.h"
/////////////////////////
//
// Blake-256 1 way SSE2
void blake256_transform_le( uint32_t *H, const uint32_t *buf,
const uint32_t T0, const uint32_t T1, int rounds );
//////////////////////////
//
// Blake-256 4 way SSE2
typedef struct {
unsigned char buf[64<<2];
uint32_t H[8<<2];
size_t ptr;
uint32_t T0, T1;
int rounds; // 14 for blake, 8 for blakecoin & vanilla
} blake_4way_small_context __attribute__ ((aligned (64)));
// Default, 14 rounds
typedef blake_4way_small_context blake256_4way_context;
void blake256_4way_init(void *ctx);
void blake256_4way_update(void *ctx, const void *data, size_t len);
void blake256_4way_close(void *ctx, void *dst);
// 14 rounds
typedef blake_4way_small_context blake256r14_4way_context;
void blake256r14_4way_init(void *cc);
void blake256r14_4way_update(void *cc, const void *data, size_t len);
void blake256r14_4way_close(void *cc, void *dst);
// 8 rounds, blakecoin, vanilla
typedef blake_4way_small_context blake256r8_4way_context;
void blake256r8_4way_init(void *cc);
void blake256r8_4way_update(void *cc, const void *data, size_t len);
void blake256r8_4way_close(void *cc, void *dst);
#ifdef __AVX2__
//////////////////////////
//
// Blake-256 8 way AVX2
typedef struct {
__m256i buf[16] __attribute__ ((aligned (64)));
__m256i H[8];
size_t ptr;
uint32_t T0, T1;
int rounds; // 14 for blake, 8 for blakecoin & vanilla
} blake_8way_small_context;
// Default 14 rounds
typedef blake_8way_small_context blake256_8way_context;
void blake256_8way_init(void *cc);
void blake256_8way_update(void *cc, const void *data, size_t len);
void blake256_8way_close(void *cc, void *dst);
void blake256_8way_update_le(void *cc, const void *data, size_t len);
void blake256_8way_close_le(void *cc, void *dst);
void blake256_8way_round0_prehash_le( void *midstate, const void *midhash,
void *data );
void blake256_8way_final_rounds_le( void *final_hash, const void *midstate,
const void *midhash, const void *data, const int rounds );
// 14 rounds, blake, decred
typedef blake_8way_small_context blake256r14_8way_context;
void blake256r14_8way_init(void *cc);
void blake256r14_8way_update(void *cc, const void *data, size_t len);
void blake256r14_8way_close(void *cc, void *dst);
// 8 rounds, blakecoin, vanilla
typedef blake_8way_small_context blake256r8_8way_context;
void blake256r8_8way_init(void *cc);
void blake256r8_8way_update(void *cc, const void *data, size_t len);
void blake256r8_8way_close(void *cc, void *dst);
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
////////////////////////////
//
// Blake-256 16 way AVX512
typedef struct {
__m512i buf[16];
__m512i H[8];
size_t ptr;
uint32_t T0, T1;
int rounds; // 14 for blake, 8 for blakecoin & vanilla
} blake_16way_small_context __attribute__ ((aligned (128)));
// Default 14 rounds
typedef blake_16way_small_context blake256_16way_context;
void blake256_16way_init(void *cc);
void blake256_16way_update(void *cc, const void *data, size_t len);
void blake256_16way_close(void *cc, void *dst);
// Expects data in little endian order, no byte swap needed
void blake256_16way_update_le(void *cc, const void *data, size_t len);
void blake256_16way_close_le(void *cc, void *dst);
void blake256_16way_round0_prehash_le( void *midstate, const void *midhash,
void *data );
void blake256_16way_final_rounds_le( void *final_hash, const void *midstate,
const void *midhash, const void *data, const int rounds );
// 14 rounds, blake, decred
typedef blake_16way_small_context blake256r14_16way_context;
void blake256r14_16way_init(void *cc);
void blake256r14_16way_update(void *cc, const void *data, size_t len);
void blake256r14_16way_close(void *cc, void *dst);
// 8 rounds, blakecoin, vanilla
typedef blake_16way_small_context blake256r8_16way_context;
void blake256r8_16way_init(void *cc);
void blake256r8_16way_update(void *cc, const void *data, size_t len);
void blake256r8_16way_close(void *cc, void *dst);
#endif // AVX512
#endif // AVX2
#endif // BLAKE256_HASH_H__

View File

@@ -1,113 +0,0 @@
/**
* Blake2-B Implementation
* tpruvot@github 2015-2016
*/
#include "blake2b-gate.h"
#include <string.h>
#include <stdint.h>
#include "blake2b-hash-4way.h"
#if defined(BLAKE2B_8WAY)
int scanhash_blake2b_8way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t hash[8*8] __attribute__ ((aligned (128)));;
uint32_t vdata[20*8] __attribute__ ((aligned (64)));;
uint32_t lane_hash[8] __attribute__ ((aligned (64)));
blake2b_8way_ctx ctx __attribute__ ((aligned (64)));
uint32_t *hash7 = &(hash[49]); // 3*16+1
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
int thr_id = mythr->id;
__m512i *noncev = (__m512i*)vdata + 9; // aligned
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
uint32_t n = first_nonce;
mm512_bswap32_intrlv80_8x64( vdata, pdata );
do {
*noncev = mm512_intrlv_blend_32( mm512_bswap_32(
_mm512_set_epi32( n+7, 0, n+6, 0, n+5, 0, n+4, 0,
n+3, 0, n+2, 0, n+1, 0, n , 0 ) ), *noncev );
blake2b_8way_init( &ctx );
blake2b_8way_update( &ctx, vdata, 80 );
blake2b_8way_final( &ctx, hash );
for ( int lane = 0; lane < 8; lane++ )
if ( hash7[ lane<<1 ] <= Htarg )
{
extr_lane_8x64( lane_hash, hash, lane, 256 );
if ( fulltest( lane_hash, ptarget ) && !opt_benchmark )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 8;
} while ( (n < max_nonce-8) && !work_restart[thr_id].restart);
*hashes_done = n - first_nonce + 1;
return 0;
}
#elif defined(BLAKE2B_4WAY)
// Function not used, code inlined.
void blake2b_4way_hash(void *output, const void *input)
{
blake2b_4way_ctx ctx;
blake2b_4way_init( &ctx );
blake2b_4way_update( &ctx, input, 80 );
blake2b_4way_final( &ctx, output );
}
int scanhash_blake2b_4way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t hash[8*4] __attribute__ ((aligned (64)));;
uint32_t vdata[20*4] __attribute__ ((aligned (32)));;
uint32_t lane_hash[8] __attribute__ ((aligned (32)));
blake2b_4way_ctx ctx __attribute__ ((aligned (32)));
uint32_t *hash7 = &(hash[25]); // 3*8+1
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
int thr_id = mythr->id;
__m256i *noncev = (__m256i*)vdata + 9; // aligned
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
uint32_t n = first_nonce;
mm256_bswap32_intrlv80_4x64( vdata, pdata );
do {
*noncev = mm256_intrlv_blend_32( mm256_bswap_32(
_mm256_set_epi32( n+3, 0, n+2, 0, n+1, 0, n, 0 ) ), *noncev );
blake2b_4way_init( &ctx );
blake2b_4way_update( &ctx, vdata, 80 );
blake2b_4way_final( &ctx, hash );
for ( int lane = 0; lane < 4; lane++ )
if ( hash7[ lane<<1 ] <= Htarg )
{
extr_lane_4x64( lane_hash, hash, lane, 256 );
if ( fulltest( lane_hash, ptarget ) && !opt_benchmark )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 4;
} while ( (n < max_nonce-4) && !work_restart[thr_id].restart);
*hashes_done = n - first_nonce + 1;
return 0;
}
#endif

View File

@@ -1,20 +0,0 @@
#include "blake2b-gate.h"
bool register_blake2b_algo( algo_gate_t* gate )
{
#if defined(BLAKE2B_8WAY)
gate->scanhash = (void*)&scanhash_blake2b_8way;
// gate->hash = (void*)&blake2b_8way_hash;
#elif defined(BLAKE2B_4WAY)
gate->scanhash = (void*)&scanhash_blake2b_4way;
gate->hash = (void*)&blake2b_4way_hash;
#else
gate->scanhash = (void*)&scanhash_blake2b;
gate->hash = (void*)&blake2b_hash;
#endif
gate->optimizations = AVX2_OPT | AVX512_OPT;
return true;
};

View File

@@ -1,34 +0,0 @@
#ifndef __BLAKE2B_GATE_H__
#define __BLAKE2B_GATE_H__ 1
#include <stdint.h>
#include "algo-gate-api.h"
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
#define BLAKE2B_8WAY
#elif defined(__AVX2__)
#define BLAKE2B_4WAY
#endif
bool register_blake2b_algo( algo_gate_t* gate );
#if defined(BLAKE2B_8WAY)
//void blake2b_8way_hash( void *state, const void *input );
int scanhash_blake2b_8way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#elif defined(BLAKE2B_4WAY)
void blake2b_4way_hash( void *state, const void *input );
int scanhash_blake2b_4way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#else
void blake2b_hash( void *state, const void *input );
int scanhash_blake2b( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#endif
#endif

View File

@@ -31,7 +31,7 @@
#include <stdint.h>
#include <string.h>
#include "blake2b-hash-4way.h"
#include "blake2b-hash.h"
#if defined(__AVX2__)
@@ -52,6 +52,180 @@ static const uint8_t sigma[12][16] =
};
#define Z00 0
#define Z01 1
#define Z02 2
#define Z03 3
#define Z04 4
#define Z05 5
#define Z06 6
#define Z07 7
#define Z08 8
#define Z09 9
#define Z0A A
#define Z0B B
#define Z0C C
#define Z0D D
#define Z0E E
#define Z0F F
#define Z10 E
#define Z11 A
#define Z12 4
#define Z13 8
#define Z14 9
#define Z15 F
#define Z16 D
#define Z17 6
#define Z18 1
#define Z19 C
#define Z1A 0
#define Z1B 2
#define Z1C B
#define Z1D 7
#define Z1E 5
#define Z1F 3
#define Z20 B
#define Z21 8
#define Z22 C
#define Z23 0
#define Z24 5
#define Z25 2
#define Z26 F
#define Z27 D
#define Z28 A
#define Z29 E
#define Z2A 3
#define Z2B 6
#define Z2C 7
#define Z2D 1
#define Z2E 9
#define Z2F 4
#define Z30 7
#define Z31 9
#define Z32 3
#define Z33 1
#define Z34 D
#define Z35 C
#define Z36 B
#define Z37 E
#define Z38 2
#define Z39 6
#define Z3A 5
#define Z3B A
#define Z3C 4
#define Z3D 0
#define Z3E F
#define Z3F 8
#define Z40 9
#define Z41 0
#define Z42 5
#define Z43 7
#define Z44 2
#define Z45 4
#define Z46 A
#define Z47 F
#define Z48 E
#define Z49 1
#define Z4A B
#define Z4B C
#define Z4C 6
#define Z4D 8
#define Z4E 3
#define Z4F D
#define Z50 2
#define Z51 C
#define Z52 6
#define Z53 A
#define Z54 0
#define Z55 B
#define Z56 8
#define Z57 3
#define Z58 4
#define Z59 D
#define Z5A 7
#define Z5B 5
#define Z5C F
#define Z5D E
#define Z5E 1
#define Z5F 9
#define Z60 C
#define Z61 5
#define Z62 1
#define Z63 F
#define Z64 E
#define Z65 D
#define Z66 4
#define Z67 A
#define Z68 0
#define Z69 7
#define Z6A 6
#define Z6B 3
#define Z6C 9
#define Z6D 2
#define Z6E 8
#define Z6F B
#define Z70 D
#define Z71 B
#define Z72 7
#define Z73 E
#define Z74 C
#define Z75 1
#define Z76 3
#define Z77 9
#define Z78 5
#define Z79 0
#define Z7A F
#define Z7B 4
#define Z7C 8
#define Z7D 6
#define Z7E 2
#define Z7F A
#define Z80 6
#define Z81 F
#define Z82 E
#define Z83 9
#define Z84 B
#define Z85 3
#define Z86 0
#define Z87 8
#define Z88 C
#define Z89 2
#define Z8A D
#define Z8B 7
#define Z8C 1
#define Z8D 4
#define Z8E A
#define Z8F 5
#define Z90 A
#define Z91 2
#define Z92 8
#define Z93 4
#define Z94 7
#define Z95 6
#define Z96 1
#define Z97 5
#define Z98 F
#define Z99 B
#define Z9A 9
#define Z9B E
#define Z9C 3
#define Z9D C
#define Z9E D
#define Z9F 0
#define Mx(r, i) Mx_(Z ## r ## i)
#define Mx_(n) Mx__(n)
#define Mx__(n) M ## n
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
#define B2B8W_G(a, b, c, d, x, y) \
@@ -78,17 +252,17 @@ static void blake2b_8way_compress( blake2b_8way_ctx *ctx, int last )
v[ 5] = ctx->h[5];
v[ 6] = ctx->h[6];
v[ 7] = ctx->h[7];
v[ 8] = m512_const1_64( 0x6A09E667F3BCC908 );
v[ 9] = m512_const1_64( 0xBB67AE8584CAA73B );
v[10] = m512_const1_64( 0x3C6EF372FE94F82B );
v[11] = m512_const1_64( 0xA54FF53A5F1D36F1 );
v[12] = m512_const1_64( 0x510E527FADE682D1 );
v[13] = m512_const1_64( 0x9B05688C2B3E6C1F );
v[14] = m512_const1_64( 0x1F83D9ABFB41BD6B );
v[15] = m512_const1_64( 0x5BE0CD19137E2179 );
v[ 8] = v512_64( 0x6A09E667F3BCC908 );
v[ 9] = v512_64( 0xBB67AE8584CAA73B );
v[10] = v512_64( 0x3C6EF372FE94F82B );
v[11] = v512_64( 0xA54FF53A5F1D36F1 );
v[12] = v512_64( 0x510E527FADE682D1 );
v[13] = v512_64( 0x9B05688C2B3E6C1F );
v[14] = v512_64( 0x1F83D9ABFB41BD6B );
v[15] = v512_64( 0x5BE0CD19137E2179 );
v[12] = _mm512_xor_si512( v[12], _mm512_set1_epi64( ctx->t[0] ) );
v[13] = _mm512_xor_si512( v[13], _mm512_set1_epi64( ctx->t[1] ) );
v[12] = _mm512_xor_si512( v[12], v512_64( ctx->t[0] ) );
v[13] = _mm512_xor_si512( v[13], v512_64( ctx->t[1] ) );
if ( last )
v[14] = mm512_not( v[14] );
@@ -136,16 +310,16 @@ int blake2b_8way_init( blake2b_8way_ctx *ctx )
{
size_t i;
ctx->h[0] = m512_const1_64( 0x6A09E667F3BCC908 );
ctx->h[1] = m512_const1_64( 0xBB67AE8584CAA73B );
ctx->h[2] = m512_const1_64( 0x3C6EF372FE94F82B );
ctx->h[3] = m512_const1_64( 0xA54FF53A5F1D36F1 );
ctx->h[4] = m512_const1_64( 0x510E527FADE682D1 );
ctx->h[5] = m512_const1_64( 0x9B05688C2B3E6C1F );
ctx->h[6] = m512_const1_64( 0x1F83D9ABFB41BD6B );
ctx->h[7] = m512_const1_64( 0x5BE0CD19137E2179 );
ctx->h[0] = v512_64( 0x6A09E667F3BCC908 );
ctx->h[1] = v512_64( 0xBB67AE8584CAA73B );
ctx->h[2] = v512_64( 0x3C6EF372FE94F82B );
ctx->h[3] = v512_64( 0xA54FF53A5F1D36F1 );
ctx->h[4] = v512_64( 0x510E527FADE682D1 );
ctx->h[5] = v512_64( 0x9B05688C2B3E6C1F );
ctx->h[6] = v512_64( 0x1F83D9ABFB41BD6B );
ctx->h[7] = v512_64( 0x5BE0CD19137E2179 );
ctx->h[0] = _mm512_xor_si512( ctx->h[0], m512_const1_64( 0x01010020 ) );
ctx->h[0] = _mm512_xor_si512( ctx->h[0], v512_64( 0x01010020 ) );
ctx->t[0] = 0;
ctx->t[1] = 0;
@@ -214,11 +388,11 @@ void blake2b_8way_final( blake2b_8way_ctx *ctx, void *out )
#define B2B_G(a, b, c, d, x, y) \
{ \
v[a] = _mm256_add_epi64( _mm256_add_epi64( v[a], v[b] ), x ); \
v[d] = mm256_ror_64( _mm256_xor_si256( v[d], v[a] ), 32 ); \
v[d] = mm256_swap64_32( _mm256_xor_si256( v[d], v[a] ) ); \
v[c] = _mm256_add_epi64( v[c], v[d] ); \
v[b] = mm256_ror_64( _mm256_xor_si256( v[b], v[c] ), 24 ); \
v[b] = mm256_shuflr64_24( _mm256_xor_si256( v[b], v[c] ) ); \
v[a] = _mm256_add_epi64( _mm256_add_epi64( v[a], v[b] ), y ); \
v[d] = mm256_ror_64( _mm256_xor_si256( v[d], v[a] ), 16 ); \
v[d] = mm256_shuflr64_16( _mm256_xor_si256( v[d], v[a] ) ); \
v[c] = _mm256_add_epi64( v[c], v[d] ); \
v[b] = mm256_ror_64( _mm256_xor_si256( v[b], v[c] ), 63 ); \
}
@@ -245,17 +419,17 @@ static void blake2b_4way_compress( blake2b_4way_ctx *ctx, int last )
v[ 5] = ctx->h[5];
v[ 6] = ctx->h[6];
v[ 7] = ctx->h[7];
v[ 8] = m256_const1_64( 0x6A09E667F3BCC908 );
v[ 9] = m256_const1_64( 0xBB67AE8584CAA73B );
v[10] = m256_const1_64( 0x3C6EF372FE94F82B );
v[11] = m256_const1_64( 0xA54FF53A5F1D36F1 );
v[12] = m256_const1_64( 0x510E527FADE682D1 );
v[13] = m256_const1_64( 0x9B05688C2B3E6C1F );
v[14] = m256_const1_64( 0x1F83D9ABFB41BD6B );
v[15] = m256_const1_64( 0x5BE0CD19137E2179 );
v[ 8] = v256_64( 0x6A09E667F3BCC908 );
v[ 9] = v256_64( 0xBB67AE8584CAA73B );
v[10] = v256_64( 0x3C6EF372FE94F82B );
v[11] = v256_64( 0xA54FF53A5F1D36F1 );
v[12] = v256_64( 0x510E527FADE682D1 );
v[13] = v256_64( 0x9B05688C2B3E6C1F );
v[14] = v256_64( 0x1F83D9ABFB41BD6B );
v[15] = v256_64( 0x5BE0CD19137E2179 );
v[12] = _mm256_xor_si256( v[12], _mm256_set1_epi64x( ctx->t[0] ) );
v[13] = _mm256_xor_si256( v[13], _mm256_set1_epi64x( ctx->t[1] ) );
v[12] = _mm256_xor_si256( v[12], v256_64( ctx->t[0] ) );
v[13] = _mm256_xor_si256( v[13], v256_64( ctx->t[1] ) );
if ( last )
v[14] = mm256_not( v[14] );
@@ -303,16 +477,16 @@ int blake2b_4way_init( blake2b_4way_ctx *ctx )
{
size_t i;
ctx->h[0] = m256_const1_64( 0x6A09E667F3BCC908 );
ctx->h[1] = m256_const1_64( 0xBB67AE8584CAA73B );
ctx->h[2] = m256_const1_64( 0x3C6EF372FE94F82B );
ctx->h[3] = m256_const1_64( 0xA54FF53A5F1D36F1 );
ctx->h[4] = m256_const1_64( 0x510E527FADE682D1 );
ctx->h[5] = m256_const1_64( 0x9B05688C2B3E6C1F );
ctx->h[6] = m256_const1_64( 0x1F83D9ABFB41BD6B );
ctx->h[7] = m256_const1_64( 0x5BE0CD19137E2179 );
ctx->h[0] = v256_64( 0x6A09E667F3BCC908 );
ctx->h[1] = v256_64( 0xBB67AE8584CAA73B );
ctx->h[2] = v256_64( 0x3C6EF372FE94F82B );
ctx->h[3] = v256_64( 0xA54FF53A5F1D36F1 );
ctx->h[4] = v256_64( 0x510E527FADE682D1 );
ctx->h[5] = v256_64( 0x9B05688C2B3E6C1F );
ctx->h[6] = v256_64( 0x1F83D9ABFB41BD6B );
ctx->h[7] = v256_64( 0x5BE0CD19137E2179 );
ctx->h[0] = _mm256_xor_si256( ctx->h[0], m256_const1_64( 0x01010020 ) );
ctx->h[0] = _mm256_xor_si256( ctx->h[0], v256_64( 0x01010020 ) );
ctx->t[0] = 0;
ctx->t[1] = 0;

View File

@@ -1,64 +1,175 @@
/**
* Blake2-B Implementation
* tpruvot@github 2015-2016
*/
#include "blake2b-gate.h"
#if !defined(BLAKE2B_8WAY) && !defined(BLAKE2B_4WAY)
#include "algo-gate-api.h"
#include <string.h>
#include <stdint.h>
#include "algo/blake/sph_blake2b.h"
#include "blake2b-hash.h"
#define MIDLEN 76
#define A 64
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
#define BLAKE2B_8WAY
#elif defined(__AVX2__)
#define BLAKE2B_4WAY
#endif
void blake2b_hash(void *output, const void *input)
#if defined(BLAKE2B_8WAY)
int scanhash_blake2b_8way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint8_t _ALIGN(A) hash[32];
sph_blake2b_ctx ctx __attribute__ ((aligned (64)));
uint32_t hash[8*8] __attribute__ ((aligned (128)));;
uint32_t vdata[20*8] __attribute__ ((aligned (64)));;
uint32_t lane_hash[8] __attribute__ ((aligned (64)));
blake2b_8way_ctx ctx __attribute__ ((aligned (64)));
uint32_t *hash7 = &(hash[49]); // 3*16+1
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
int thr_id = mythr->id;
__m512i *noncev = (__m512i*)vdata + 9; // aligned
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
sph_blake2b_init(&ctx, 32, NULL, 0);
sph_blake2b_update(&ctx, input, 80);
sph_blake2b_final(&ctx, hash);
uint32_t n = first_nonce;
memcpy(output, hash, 32);
mm512_bswap32_intrlv80_8x64( vdata, pdata );
do {
*noncev = mm512_intrlv_blend_32( mm512_bswap_32(
_mm512_set_epi32( n+7, 0, n+6, 0, n+5, 0, n+4, 0,
n+3, 0, n+2, 0, n+1, 0, n , 0 ) ), *noncev );
blake2b_8way_init( &ctx );
blake2b_8way_update( &ctx, vdata, 80 );
blake2b_8way_final( &ctx, hash );
for ( int lane = 0; lane < 8; lane++ )
if ( hash7[ lane<<1 ] <= Htarg )
{
extr_lane_8x64( lane_hash, hash, lane, 256 );
if ( fulltest( lane_hash, ptarget ) && !opt_benchmark )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 8;
} while ( (n < max_nonce-8) && !work_restart[thr_id].restart);
*hashes_done = n - first_nonce + 1;
return 0;
}
int scanhash_blake2b( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
#elif defined(BLAKE2B_4WAY)
// Function not used, code inlined.
void blake2b_4way_hash(void *output, const void *input)
{
uint32_t _ALIGN(A) vhashcpu[8];
uint32_t _ALIGN(A) endiandata[20];
blake2b_4way_ctx ctx;
blake2b_4way_init( &ctx );
blake2b_4way_update( &ctx, input, 80 );
blake2b_4way_final( &ctx, output );
}
int scanhash_blake2b_4way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t hash[8*4] __attribute__ ((aligned (64)));;
uint32_t vdata[20*4] __attribute__ ((aligned (32)));;
uint32_t lane_hash[8] __attribute__ ((aligned (32)));
blake2b_4way_ctx ctx __attribute__ ((aligned (32)));
uint32_t *hash7 = &(hash[25]); // 3*8+1
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
int thr_id = mythr->id; // thr_id arg is deprecated
int thr_id = mythr->id;
__m256i *noncev = (__m256i*)vdata + 9; // aligned
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
uint32_t n = first_nonce;
for (int i=0; i < 19; i++) {
be32enc(&endiandata[i], pdata[i]);
}
mm256_bswap32_intrlv80_4x64( vdata, pdata );
do {
be32enc(&endiandata[19], n);
blake2b_hash(vhashcpu, endiandata);
*noncev = mm256_intrlv_blend_32( mm256_bswap_32(
_mm256_set_epi32( n+3, 0, n+2, 0, n+1, 0, n, 0 ) ), *noncev );
if (vhashcpu[7] <= Htarg && fulltest(vhashcpu, ptarget))
blake2b_4way_init( &ctx );
blake2b_4way_update( &ctx, vdata, 80 );
blake2b_4way_final( &ctx, hash );
for ( int lane = 0; lane < 4; lane++ )
if ( hash7[ lane<<1 ] <= Htarg )
{
pdata[19] = n;
submit_solution( work, vhashcpu, mythr );
extr_lane_4x64( lane_hash, hash, lane, 256 );
if ( fulltest( lane_hash, ptarget ) && !opt_benchmark )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 4;
} while ( (n < max_nonce-4) && !work_restart[thr_id].restart);
*hashes_done = n - first_nonce + 1;
return 0;
}
#else
#include "algo/blake/sph_blake2b.h"
void blake2b_hash(void *output, const void *input)
{
uint8_t _ALIGN(32) hash[32];
sph_blake2b_ctx ctx __attribute__ ((aligned (32)));
sph_blake2b_init(&ctx, 32, NULL, 0);
sph_blake2b_update(&ctx, input, 80);
sph_blake2b_final(&ctx, hash);
memcpy(output, hash, 32);
}
int scanhash_blake2b( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t _ALIGN(32) hash64[8];
uint32_t _ALIGN(32) endiandata[20];
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
int thr_id = mythr->id;
const uint32_t first_nonce = pdata[19];
uint32_t n = first_nonce;
mm128_bswap32_80( endiandata, pdata );
do {
endiandata[19] = n;
blake2b_hash( hash64, endiandata );
if ( unlikely( valid_hash( hash64, ptarget ) ) && !opt_benchmark )
{
pdata[19] = bswap_32( n );
submit_solution( work, hash64, mythr );
}
n++;
} while (n < max_nonce && !work_restart[thr_id].restart);
*hashes_done = n - first_nonce + 1;
pdata[19] = n;
} while (n < max_nonce && !work_restart[thr_id].restart);
*hashes_done = n - first_nonce + 1;
pdata[19] = n;
return 0;
return 0;
}
#endif
bool register_blake2b_algo( algo_gate_t* gate )
{
#if defined(BLAKE2B_8WAY)
gate->scanhash = (void*)&scanhash_blake2b_8way;
#elif defined(BLAKE2B_4WAY)
gate->scanhash = (void*)&scanhash_blake2b_4way;
gate->hash = (void*)&blake2b_4way_hash;
#else
gate->scanhash = (void*)&scanhash_blake2b;
gate->hash = (void*)&blake2b_hash;
#endif
gate->optimizations = AVX2_OPT | AVX512_OPT;
return true;
};

View File

@@ -1,170 +0,0 @@
#include "blake2s-gate.h"
#include "blake2s-hash-4way.h"
#include <string.h>
#include <stdint.h>
#if defined(BLAKE2S_16WAY)
static __thread blake2s_16way_state blake2s_16w_ctx;
void blake2s_16way_hash( void *output, const void *input )
{
blake2s_16way_state ctx;
memcpy( &ctx, &blake2s_16w_ctx, sizeof ctx );
blake2s_16way_update( &ctx, input + (64<<4), 16 );
blake2s_16way_final( &ctx, output, BLAKE2S_OUTBYTES );
}
int scanhash_blake2s_16way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t vdata[20*16] __attribute__ ((aligned (128)));
uint32_t hash[8*16] __attribute__ ((aligned (64)));
uint32_t lane_hash[8] __attribute__ ((aligned (64)));
uint32_t *hash7 = &(hash[7<<4]);
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
__m512i *noncev = (__m512i*)vdata + 19; // aligned
uint32_t n = first_nonce;
int thr_id = mythr->id;
mm512_bswap32_intrlv80_16x32( vdata, pdata );
blake2s_16way_init( &blake2s_16w_ctx, BLAKE2S_OUTBYTES );
blake2s_16way_update( &blake2s_16w_ctx, vdata, 64 );
do {
*noncev = mm512_bswap_32( _mm512_set_epi32(
n+15, n+14, n+13, n+12, n+11, n+10, n+ 9, n+ 8,
n+ 7, n+ 6, n+ 5, n+ 4, n+ 3, n+ 2, n+ 1, n ) );
pdata[19] = n;
blake2s_16way_hash( hash, vdata );
for ( int lane = 0; lane < 16; lane++ )
if ( unlikely( hash7[lane] <= Htarg ) )
{
extr_lane_16x32( lane_hash, hash, lane, 256 );
if ( likely( fulltest( lane_hash, ptarget ) && !opt_benchmark ) )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 16;
} while ( (n < max_nonce-16) && !work_restart[thr_id].restart );
*hashes_done = n - first_nonce + 1;
return 0;
}
#elif defined(BLAKE2S_8WAY)
static __thread blake2s_8way_state blake2s_8w_ctx;
void blake2s_8way_hash( void *output, const void *input )
{
blake2s_8way_state ctx;
memcpy( &ctx, &blake2s_8w_ctx, sizeof ctx );
blake2s_8way_update( &ctx, input + (64<<3), 16 );
blake2s_8way_final( &ctx, output, BLAKE2S_OUTBYTES );
}
int scanhash_blake2s_8way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t vdata[20*8] __attribute__ ((aligned (64)));
uint32_t hash[8*8] __attribute__ ((aligned (32)));
uint32_t lane_hash[8] __attribute__ ((aligned (32)));
uint32_t *hash7 = &(hash[7<<3]);
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
__m256i *noncev = (__m256i*)vdata + 19; // aligned
uint32_t n = first_nonce;
int thr_id = mythr->id;
mm256_bswap32_intrlv80_8x32( vdata, pdata );
blake2s_8way_init( &blake2s_8w_ctx, BLAKE2S_OUTBYTES );
blake2s_8way_update( &blake2s_8w_ctx, vdata, 64 );
do {
*noncev = mm256_bswap_32( _mm256_set_epi32( n+7, n+6, n+5, n+4,
n+3, n+2, n+1, n ) );
pdata[19] = n;
blake2s_8way_hash( hash, vdata );
for ( int lane = 0; lane < 8; lane++ )
if ( unlikely( hash7[lane] <= Htarg ) )
{
extr_lane_8x32( lane_hash, hash, lane, 256 );
if ( likely( fulltest( lane_hash, ptarget ) && !opt_benchmark ) )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 8;
} while ( (n < max_nonce) && !work_restart[thr_id].restart );
*hashes_done = n - first_nonce + 1;
return 0;
}
#elif defined(BLAKE2S_4WAY)
static __thread blake2s_4way_state blake2s_4w_ctx;
void blake2s_4way_hash( void *output, const void *input )
{
blake2s_4way_state ctx;
memcpy( &ctx, &blake2s_4w_ctx, sizeof ctx );
blake2s_4way_update( &ctx, input + (64<<2), 16 );
blake2s_4way_final( &ctx, output, BLAKE2S_OUTBYTES );
}
int scanhash_blake2s_4way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t vdata[20*4] __attribute__ ((aligned (64)));
uint32_t hash[8*4] __attribute__ ((aligned (32)));
uint32_t lane_hash[8] __attribute__ ((aligned (32)));
uint32_t *hash7 = &(hash[7<<2]);
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
__m128i *noncev = (__m128i*)vdata + 19; // aligned
uint32_t n = first_nonce;
int thr_id = mythr->id;
mm128_bswap32_intrlv80_4x32( vdata, pdata );
blake2s_4way_init( &blake2s_4w_ctx, BLAKE2S_OUTBYTES );
blake2s_4way_update( &blake2s_4w_ctx, vdata, 64 );
do {
*noncev = mm128_bswap_32( _mm_set_epi32( n+3, n+2, n+1, n ) );
pdata[19] = n;
blake2s_4way_hash( hash, vdata );
for ( int lane = 0; lane < 4; lane++ ) if ( hash7[lane] <= Htarg )
{
extr_lane_4x32( lane_hash, hash, lane, 256 );
if ( fulltest( lane_hash, ptarget ) && !opt_benchmark )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 4;
} while ( (n < max_nonce) && !work_restart[thr_id].restart );
*hashes_done = n - first_nonce + 1;
return 0;
}
#endif

View File

@@ -1,23 +0,0 @@
#include "blake2s-gate.h"
bool register_blake2s_algo( algo_gate_t* gate )
{
#if defined(BLAKE2S_16WAY)
gate->scanhash = (void*)&scanhash_blake2s_16way;
gate->hash = (void*)&blake2s_16way_hash;
#elif defined(BLAKE2S_8WAY)
//#if defined(BLAKE2S_8WAY)
gate->scanhash = (void*)&scanhash_blake2s_8way;
gate->hash = (void*)&blake2s_8way_hash;
#elif defined(BLAKE2S_4WAY)
gate->scanhash = (void*)&scanhash_blake2s_4way;
gate->hash = (void*)&blake2s_4way_hash;
#else
gate->scanhash = (void*)&scanhash_blake2s;
gate->hash = (void*)&blake2s_hash;
#endif
gate->optimizations = SSE2_OPT | AVX2_OPT | AVX512_OPT;
return true;
};

View File

@@ -1,46 +0,0 @@
#ifndef __BLAKE2S_GATE_H__
#define __BLAKE2S_GATE_H__ 1
#include <stdint.h>
#include "algo-gate-api.h"
#if defined(__SSE2__)
#define BLAKE2S_4WAY
#endif
#if defined(__AVX2__)
#define BLAKE2S_8WAY
#endif
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
#define BLAKE2S_16WAY
#endif
bool register_blake2s_algo( algo_gate_t* gate );
#if defined(BLAKE2S_16WAY)
void blake2s_16way_hash( void *state, const void *input );
int scanhash_blake2s_16way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#elif defined (BLAKE2S_8WAY)
void blake2s_8way_hash( void *state, const void *input );
int scanhash_blake2s_8way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#elif defined (BLAKE2S_4WAY)
void blake2s_4way_hash( void *state, const void *input );
int scanhash_blake2s_4way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#else
void blake2s_hash( void *state, const void *input );
int scanhash_blake2s( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#endif
#endif

View File

@@ -11,7 +11,7 @@
* this software. If not, see <http://creativecommons.org/publicdomain/zero/1.0/>.
*/
#include "blake2s-hash-4way.h"
#include "blake2s-hash.h"
#include <stdint.h>
#include <string.h>
@@ -62,23 +62,23 @@ int blake2s_4way_init( blake2s_4way_state *S, const uint8_t outlen )
memset( S, 0, sizeof( blake2s_4way_state ) );
S->h[0] = m128_const1_64( 0x6A09E6676A09E667ULL );
S->h[1] = m128_const1_64( 0xBB67AE85BB67AE85ULL );
S->h[2] = m128_const1_64( 0x3C6EF3723C6EF372ULL );
S->h[3] = m128_const1_64( 0xA54FF53AA54FF53AULL );
S->h[4] = m128_const1_64( 0x510E527F510E527FULL );
S->h[5] = m128_const1_64( 0x9B05688C9B05688CULL );
S->h[6] = m128_const1_64( 0x1F83D9AB1F83D9ABULL );
S->h[7] = m128_const1_64( 0x5BE0CD195BE0CD19ULL );
S->h[0] = v128_64( 0x6A09E6676A09E667ULL );
S->h[1] = v128_64( 0xBB67AE85BB67AE85ULL );
S->h[2] = v128_64( 0x3C6EF3723C6EF372ULL );
S->h[3] = v128_64( 0xA54FF53AA54FF53AULL );
S->h[4] = v128_64( 0x510E527F510E527FULL );
S->h[5] = v128_64( 0x9B05688C9B05688CULL );
S->h[6] = v128_64( 0x1F83D9AB1F83D9ABULL );
S->h[7] = v128_64( 0x5BE0CD195BE0CD19ULL );
// for( int i = 0; i < 8; ++i )
// S->h[i] = _mm_set1_epi32( blake2s_IV[i] );
// S->h[i] = v128_32( blake2s_IV[i] );
uint32_t *p = ( uint32_t * )( P );
/* IV XOR ParamBlock */
for ( size_t i = 0; i < 8; ++i )
S->h[i] = _mm_xor_si128( S->h[i], _mm_set1_epi32( p[i] ) );
S->h[i] = _mm_xor_si128( S->h[i], v128_32( p[i] ) );
return 0;
}
@@ -90,29 +90,29 @@ int blake2s_4way_compress( blake2s_4way_state *S, const __m128i* block )
memcpy_128( m, block, 16 );
memcpy_128( v, S->h, 8 );
v[ 8] = m128_const1_64( 0x6A09E6676A09E667ULL );
v[ 9] = m128_const1_64( 0xBB67AE85BB67AE85ULL );
v[10] = m128_const1_64( 0x3C6EF3723C6EF372ULL );
v[11] = m128_const1_64( 0xA54FF53AA54FF53AULL );
v[12] = _mm_xor_si128( _mm_set1_epi32( S->t[0] ),
m128_const1_64( 0x510E527F510E527FULL ) );
v[13] = _mm_xor_si128( _mm_set1_epi32( S->t[1] ),
m128_const1_64( 0x9B05688C9B05688CULL ) );
v[14] = _mm_xor_si128( _mm_set1_epi32( S->f[0] ),
m128_const1_64( 0x1F83D9AB1F83D9ABULL ) );
v[15] = _mm_xor_si128( _mm_set1_epi32( S->f[1] ),
m128_const1_64( 0x5BE0CD195BE0CD19ULL ) );
v[ 8] = v128_64( 0x6A09E6676A09E667ULL );
v[ 9] = v128_64( 0xBB67AE85BB67AE85ULL );
v[10] = v128_64( 0x3C6EF3723C6EF372ULL );
v[11] = v128_64( 0xA54FF53AA54FF53AULL );
v[12] = _mm_xor_si128( v128_32( S->t[0] ),
v128_64( 0x510E527F510E527FULL ) );
v[13] = _mm_xor_si128( v128_32( S->t[1] ),
v128_64( 0x9B05688C9B05688CULL ) );
v[14] = _mm_xor_si128( v128_32( S->f[0] ),
v128_64( 0x1F83D9AB1F83D9ABULL ) );
v[15] = _mm_xor_si128( v128_32( S->f[1] ),
v128_64( 0x5BE0CD195BE0CD19ULL ) );
#define G4W( sigma0, sigma1, a, b, c, d ) \
do { \
uint8_t s0 = sigma0; \
uint8_t s1 = sigma1; \
a = _mm_add_epi32( _mm_add_epi32( a, b ), m[ s0 ] ); \
d = mm128_ror_32( _mm_xor_si128( d, a ), 16 ); \
d = mm128_swap32_16( _mm_xor_si128( d, a ) ); \
c = _mm_add_epi32( c, d ); \
b = mm128_ror_32( _mm_xor_si128( b, c ), 12 ); \
a = _mm_add_epi32( _mm_add_epi32( a, b ), m[ s1 ] ); \
d = mm128_ror_32( _mm_xor_si128( d, a ), 8 ); \
d = mm128_shuflr32_8( _mm_xor_si128( d, a ) ); \
c = _mm_add_epi32( c, d ); \
b = mm128_ror_32( _mm_xor_si128( b, c ), 7 ); \
} while(0)
@@ -269,35 +269,35 @@ int blake2s_8way_compress( blake2s_8way_state *S, const __m256i *block )
memcpy_256( m, block, 16 );
memcpy_256( v, S->h, 8 );
v[ 8] = m256_const1_64( 0x6A09E6676A09E667ULL );
v[ 9] = m256_const1_64( 0xBB67AE85BB67AE85ULL );
v[10] = m256_const1_64( 0x3C6EF3723C6EF372ULL );
v[11] = m256_const1_64( 0xA54FF53AA54FF53AULL );
v[12] = _mm256_xor_si256( _mm256_set1_epi32( S->t[0] ),
m256_const1_64( 0x510E527F510E527FULL ) );
v[ 8] = v256_64( 0x6A09E6676A09E667ULL );
v[ 9] = v256_64( 0xBB67AE85BB67AE85ULL );
v[10] = v256_64( 0x3C6EF3723C6EF372ULL );
v[11] = v256_64( 0xA54FF53AA54FF53AULL );
v[12] = _mm256_xor_si256( v256_32( S->t[0] ),
v256_64( 0x510E527F510E527FULL ) );
v[13] = _mm256_xor_si256( _mm256_set1_epi32( S->t[1] ),
m256_const1_64( 0x9B05688C9B05688CULL ) );
v[13] = _mm256_xor_si256( v256_32( S->t[1] ),
v256_64( 0x9B05688C9B05688CULL ) );
v[14] = _mm256_xor_si256( _mm256_set1_epi32( S->f[0] ),
m256_const1_64( 0x1F83D9AB1F83D9ABULL ) );
v[14] = _mm256_xor_si256( v256_32( S->f[0] ),
v256_64( 0x1F83D9AB1F83D9ABULL ) );
v[15] = _mm256_xor_si256( _mm256_set1_epi32( S->f[1] ),
m256_const1_64( 0x5BE0CD195BE0CD19ULL ) );
v[15] = _mm256_xor_si256( v256_32( S->f[1] ),
v256_64( 0x5BE0CD195BE0CD19ULL ) );
/*
v[ 8] = _mm256_set1_epi32( blake2s_IV[0] );
v[ 9] = _mm256_set1_epi32( blake2s_IV[1] );
v[10] = _mm256_set1_epi32( blake2s_IV[2] );
v[11] = _mm256_set1_epi32( blake2s_IV[3] );
v[12] = _mm256_xor_si256( _mm256_set1_epi32( S->t[0] ),
_mm256_set1_epi32( blake2s_IV[4] ) );
v[13] = _mm256_xor_si256( _mm256_set1_epi32( S->t[1] ),
_mm256_set1_epi32( blake2s_IV[5] ) );
v[14] = _mm256_xor_si256( _mm256_set1_epi32( S->f[0] ),
_mm256_set1_epi32( blake2s_IV[6] ) );
v[15] = _mm256_xor_si256( _mm256_set1_epi32( S->f[1] ),
_mm256_set1_epi32( blake2s_IV[7] ) );
v[ 8] = v256_32( blake2s_IV[0] );
v[ 9] = v256_32( blake2s_IV[1] );
v[10] = v256_32( blake2s_IV[2] );
v[11] = v256_32( blake2s_IV[3] );
v[12] = _mm256_xor_si256( v256_32( S->t[0] ),
v256_32( blake2s_IV[4] ) );
v[13] = _mm256_xor_si256( v256_32( S->t[1] ),
v256_32( blake2s_IV[5] ) );
v[14] = _mm256_xor_si256( v256_32( S->f[0] ),
v256_32( blake2s_IV[6] ) );
v[15] = _mm256_xor_si256( v256_32( S->f[1] ),
v256_32( blake2s_IV[7] ) );
#define G8W(r,i,a,b,c,d) \
@@ -320,11 +320,11 @@ do { \
uint8_t s0 = sigma0; \
uint8_t s1 = sigma1; \
a = _mm256_add_epi32( _mm256_add_epi32( a, b ), m[ s0 ] ); \
d = mm256_ror_32( _mm256_xor_si256( d, a ), 16 ); \
d = mm256_swap32_16( _mm256_xor_si256( d, a ) ); \
c = _mm256_add_epi32( c, d ); \
b = mm256_ror_32( _mm256_xor_si256( b, c ), 12 ); \
a = _mm256_add_epi32( _mm256_add_epi32( a, b ), m[ s1 ] ); \
d = mm256_ror_32( _mm256_xor_si256( d, a ), 8 ); \
d = mm256_shuflr32_8( _mm256_xor_si256( d, a ) ); \
c = _mm256_add_epi32( c, d ); \
b = mm256_ror_32( _mm256_xor_si256( b, c ), 7 ); \
} while(0)
@@ -391,24 +391,24 @@ int blake2s_8way_init( blake2s_8way_state *S, const uint8_t outlen )
memset( P->personal, 0, sizeof( P->personal ) );
memset( S, 0, sizeof( blake2s_8way_state ) );
S->h[0] = m256_const1_64( 0x6A09E6676A09E667ULL );
S->h[1] = m256_const1_64( 0xBB67AE85BB67AE85ULL );
S->h[2] = m256_const1_64( 0x3C6EF3723C6EF372ULL );
S->h[3] = m256_const1_64( 0xA54FF53AA54FF53AULL );
S->h[4] = m256_const1_64( 0x510E527F510E527FULL );
S->h[5] = m256_const1_64( 0x9B05688C9B05688CULL );
S->h[6] = m256_const1_64( 0x1F83D9AB1F83D9ABULL );
S->h[7] = m256_const1_64( 0x5BE0CD195BE0CD19ULL );
S->h[0] = v256_64( 0x6A09E6676A09E667ULL );
S->h[1] = v256_64( 0xBB67AE85BB67AE85ULL );
S->h[2] = v256_64( 0x3C6EF3723C6EF372ULL );
S->h[3] = v256_64( 0xA54FF53AA54FF53AULL );
S->h[4] = v256_64( 0x510E527F510E527FULL );
S->h[5] = v256_64( 0x9B05688C9B05688CULL );
S->h[6] = v256_64( 0x1F83D9AB1F83D9ABULL );
S->h[7] = v256_64( 0x5BE0CD195BE0CD19ULL );
// for( int i = 0; i < 8; ++i )
// S->h[i] = _mm256_set1_epi32( blake2s_IV[i] );
// S->h[i] = v256_32( blake2s_IV[i] );
uint32_t *p = ( uint32_t * )( P );
/* IV XOR ParamBlock */
for ( size_t i = 0; i < 8; ++i )
S->h[i] = _mm256_xor_si256( S->h[i], _mm256_set1_epi32( p[i] ) );
S->h[i] = _mm256_xor_si256( S->h[i], v256_32( p[i] ) );
return 0;
}
@@ -510,21 +510,21 @@ int blake2s_16way_compress( blake2s_16way_state *S, const __m512i *block )
memcpy_512( m, block, 16 );
memcpy_512( v, S->h, 8 );
v[ 8] = m512_const1_64( 0x6A09E6676A09E667ULL );
v[ 9] = m512_const1_64( 0xBB67AE85BB67AE85ULL );
v[10] = m512_const1_64( 0x3C6EF3723C6EF372ULL );
v[11] = m512_const1_64( 0xA54FF53AA54FF53AULL );
v[12] = _mm512_xor_si512( _mm512_set1_epi32( S->t[0] ),
m512_const1_64( 0x510E527F510E527FULL ) );
v[ 8] = v512_64( 0x6A09E6676A09E667ULL );
v[ 9] = v512_64( 0xBB67AE85BB67AE85ULL );
v[10] = v512_64( 0x3C6EF3723C6EF372ULL );
v[11] = v512_64( 0xA54FF53AA54FF53AULL );
v[12] = _mm512_xor_si512( v512_32( S->t[0] ),
v512_64( 0x510E527F510E527FULL ) );
v[13] = _mm512_xor_si512( _mm512_set1_epi32( S->t[1] ),
m512_const1_64( 0x9B05688C9B05688CULL ) );
v[13] = _mm512_xor_si512( v512_32( S->t[1] ),
v512_64( 0x9B05688C9B05688CULL ) );
v[14] = _mm512_xor_si512( _mm512_set1_epi32( S->f[0] ),
m512_const1_64( 0x1F83D9AB1F83D9ABULL ) );
v[14] = _mm512_xor_si512( v512_32( S->f[0] ),
v512_64( 0x1F83D9AB1F83D9ABULL ) );
v[15] = _mm512_xor_si512( _mm512_set1_epi32( S->f[1] ),
m512_const1_64( 0x5BE0CD195BE0CD19ULL ) );
v[15] = _mm512_xor_si512( v512_32( S->f[1] ),
v512_64( 0x5BE0CD195BE0CD19ULL ) );
#define G16W( sigma0, sigma1, a, b, c, d) \
@@ -589,20 +589,20 @@ int blake2s_16way_init( blake2s_16way_state *S, const uint8_t outlen )
memset( P->personal, 0, sizeof( P->personal ) );
memset( S, 0, sizeof( blake2s_16way_state ) );
S->h[0] = m512_const1_64( 0x6A09E6676A09E667ULL );
S->h[1] = m512_const1_64( 0xBB67AE85BB67AE85ULL );
S->h[2] = m512_const1_64( 0x3C6EF3723C6EF372ULL );
S->h[3] = m512_const1_64( 0xA54FF53AA54FF53AULL );
S->h[4] = m512_const1_64( 0x510E527F510E527FULL );
S->h[5] = m512_const1_64( 0x9B05688C9B05688CULL );
S->h[6] = m512_const1_64( 0x1F83D9AB1F83D9ABULL );
S->h[7] = m512_const1_64( 0x5BE0CD195BE0CD19ULL );
S->h[0] = v512_64( 0x6A09E6676A09E667ULL );
S->h[1] = v512_64( 0xBB67AE85BB67AE85ULL );
S->h[2] = v512_64( 0x3C6EF3723C6EF372ULL );
S->h[3] = v512_64( 0xA54FF53AA54FF53AULL );
S->h[4] = v512_64( 0x510E527F510E527FULL );
S->h[5] = v512_64( 0x9B05688C9B05688CULL );
S->h[6] = v512_64( 0x1F83D9AB1F83D9ABULL );
S->h[7] = v512_64( 0x5BE0CD195BE0CD19ULL );
uint32_t *p = ( uint32_t * )( P );
/* IV XOR ParamBlock */
for ( size_t i = 0; i < 8; ++i )
S->h[i] = _mm512_xor_si512( S->h[i], _mm512_set1_epi32( p[i] ) );
S->h[i] = _mm512_xor_si512( S->h[i], v512_32( p[i] ) );
return 0;
}

View File

@@ -1,75 +1,252 @@
#include "blake2s-gate.h"
#if !defined(BLAKE2S_16WAY) && !defined(BLAKE2S_8WAY) && !defined(BLAKE2S)
#include "algo-gate-api.h"
#include "blake2s-hash.h"
#include <string.h>
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
#define BLAKE2S_16WAY
#elif defined(__AVX2__)
#define BLAKE2S_8WAY
#elif defined(__SSE2__)
#define BLAKE2S_4WAY
#endif
#if defined(BLAKE2S_16WAY)
static __thread blake2s_16way_state blake2s_16w_ctx;
void blake2s_16way_hash( void *output, const void *input )
{
blake2s_16way_state ctx;
memcpy( &ctx, &blake2s_16w_ctx, sizeof ctx );
blake2s_16way_update( &ctx, input + (64<<4), 16 );
blake2s_16way_final( &ctx, output, BLAKE2S_OUTBYTES );
}
int scanhash_blake2s_16way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t vdata[20*16] __attribute__ ((aligned (128)));
uint32_t hash[8*16] __attribute__ ((aligned (64)));
uint32_t lane_hash[8] __attribute__ ((aligned (64)));
uint32_t *hash7 = &(hash[7<<4]);
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
__m512i *noncev = (__m512i*)vdata + 19; // aligned
uint32_t n = first_nonce;
int thr_id = mythr->id;
mm512_bswap32_intrlv80_16x32( vdata, pdata );
blake2s_16way_init( &blake2s_16w_ctx, BLAKE2S_OUTBYTES );
blake2s_16way_update( &blake2s_16w_ctx, vdata, 64 );
do {
*noncev = mm512_bswap_32( _mm512_set_epi32(
n+15, n+14, n+13, n+12, n+11, n+10, n+ 9, n+ 8,
n+ 7, n+ 6, n+ 5, n+ 4, n+ 3, n+ 2, n+ 1, n ) );
pdata[19] = n;
blake2s_16way_hash( hash, vdata );
for ( int lane = 0; lane < 16; lane++ )
if ( unlikely( hash7[lane] <= Htarg ) )
{
extr_lane_16x32( lane_hash, hash, lane, 256 );
if ( likely( fulltest( lane_hash, ptarget ) && !opt_benchmark ) )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 16;
} while ( (n < max_nonce-16) && !work_restart[thr_id].restart );
*hashes_done = n - first_nonce + 1;
return 0;
}
#elif defined(BLAKE2S_8WAY)
static __thread blake2s_8way_state blake2s_8w_ctx;
void blake2s_8way_hash( void *output, const void *input )
{
blake2s_8way_state ctx;
memcpy( &ctx, &blake2s_8w_ctx, sizeof ctx );
blake2s_8way_update( &ctx, input + (64<<3), 16 );
blake2s_8way_final( &ctx, output, BLAKE2S_OUTBYTES );
}
int scanhash_blake2s_8way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t vdata[20*8] __attribute__ ((aligned (64)));
uint32_t hash[8*8] __attribute__ ((aligned (32)));
uint32_t lane_hash[8] __attribute__ ((aligned (32)));
uint32_t *hash7 = &(hash[7<<3]);
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
__m256i *noncev = (__m256i*)vdata + 19; // aligned
uint32_t n = first_nonce;
int thr_id = mythr->id;
mm256_bswap32_intrlv80_8x32( vdata, pdata );
blake2s_8way_init( &blake2s_8w_ctx, BLAKE2S_OUTBYTES );
blake2s_8way_update( &blake2s_8w_ctx, vdata, 64 );
do {
*noncev = mm256_bswap_32( _mm256_set_epi32( n+7, n+6, n+5, n+4,
n+3, n+2, n+1, n ) );
pdata[19] = n;
blake2s_8way_hash( hash, vdata );
for ( int lane = 0; lane < 8; lane++ )
if ( unlikely( hash7[lane] <= Htarg ) )
{
extr_lane_8x32( lane_hash, hash, lane, 256 );
if ( likely( fulltest( lane_hash, ptarget ) && !opt_benchmark ) )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 8;
} while ( (n < max_nonce) && !work_restart[thr_id].restart );
*hashes_done = n - first_nonce + 1;
return 0;
}
#elif defined(BLAKE2S_4WAY)
static __thread blake2s_4way_state blake2s_4w_ctx;
void blake2s_4way_hash( void *output, const void *input )
{
blake2s_4way_state ctx;
memcpy( &ctx, &blake2s_4w_ctx, sizeof ctx );
blake2s_4way_update( &ctx, input + (64<<2), 16 );
blake2s_4way_final( &ctx, output, BLAKE2S_OUTBYTES );
}
int scanhash_blake2s_4way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t vdata[20*4] __attribute__ ((aligned (64)));
uint32_t hash[8*4] __attribute__ ((aligned (32)));
uint32_t lane_hash[8] __attribute__ ((aligned (32)));
uint32_t *hash7 = &(hash[7<<2]);
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
__m128i *noncev = (__m128i*)vdata + 19; // aligned
uint32_t n = first_nonce;
int thr_id = mythr->id;
mm128_bswap32_intrlv80_4x32( vdata, pdata );
blake2s_4way_init( &blake2s_4w_ctx, BLAKE2S_OUTBYTES );
blake2s_4way_update( &blake2s_4w_ctx, vdata, 64 );
do {
*noncev = mm128_bswap_32( _mm_set_epi32( n+3, n+2, n+1, n ) );
pdata[19] = n;
blake2s_4way_hash( hash, vdata );
for ( int lane = 0; lane < 4; lane++ ) if ( hash7[lane] <= Htarg )
{
extr_lane_4x32( lane_hash, hash, lane, 256 );
if ( fulltest( lane_hash, ptarget ) && !opt_benchmark )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
n += 4;
} while ( (n < max_nonce) && !work_restart[thr_id].restart );
*hashes_done = n - first_nonce + 1;
return 0;
}
#else
#include "sph-blake2s.h"
static __thread blake2s_state blake2s_ctx;
//static __thread blake2s_state s_ctx;
#define MIDLEN 76
void blake2s_hash( void *output, const void *input )
{
unsigned char _ALIGN(64) hash[BLAKE2S_OUTBYTES];
blake2s_state ctx __attribute__ ((aligned (64)));
unsigned char _ALIGN(32) hash[BLAKE2S_OUTBYTES];
blake2s_state ctx __attribute__ ((aligned (32)));
memcpy( &ctx, &blake2s_ctx, sizeof ctx );
blake2s_update( &ctx, input+64, 16 );
// blake2s_init(&ctx, BLAKE2S_OUTBYTES);
// blake2s_update(&ctx, input, 80);
blake2s_final( &ctx, hash, BLAKE2S_OUTBYTES );
blake2s_final( &ctx, hash, BLAKE2S_OUTBYTES );
memcpy(output, hash, 32);
memcpy(output, hash, 32);
}
/*
static void blake2s_hash_end(uint32_t *output, const uint32_t *input)
int scanhash_blake2s( struct work *work,uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
s_ctx.buflen = MIDLEN;
memcpy(&s_ctx, &s_midstate, 32 + 16 + MIDLEN);
blake2s_update(&s_ctx, (uint8_t*) &input[MIDLEN/4], 80 - MIDLEN);
blake2s_final(&s_ctx, (uint8_t*) output, BLAKE2S_OUTBYTES);
uint32_t *pdata = work->data;
const uint32_t *ptarget = work->target;
uint32_t _ALIGN(32) hash32[8];
uint32_t _ALIGN(32) endiandata[20];
const int thr_id = mythr->id;
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
uint32_t n = first_nonce;
mm128_bswap32_80( endiandata, pdata );
// midstate
blake2s_init( &blake2s_ctx, BLAKE2S_OUTBYTES );
blake2s_update( &blake2s_ctx, (uint8_t*) endiandata, 64 );
do
{
endiandata[19] = n;
blake2s_hash( hash32, endiandata );
if ( unlikely( valid_hash( hash32, ptarget ) ) && !opt_benchmark )
{
pdata[19] = bswap_32( n );
submit_solution( work, hash32, mythr );
}
n++;
} while (n < max_nonce && !work_restart[thr_id].restart);
*hashes_done = n - first_nonce + 1;
pdata[19] = n;
return 0;
}
*/
int scanhash_blake2s( struct work *work,
uint32_t max_nonce, uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
uint32_t _ALIGN(64) hash64[8];
uint32_t _ALIGN(64) endiandata[20];
int thr_id = mythr->id; // thr_id arg is deprecated
const uint32_t Htarg = ptarget[7];
const uint32_t first_nonce = pdata[19];
uint32_t n = first_nonce;
swab32_array( endiandata, pdata, 20 );
// midstate
blake2s_init( &blake2s_ctx, BLAKE2S_OUTBYTES );
blake2s_update( &blake2s_ctx, (uint8_t*) endiandata, 64 );
do {
be32enc(&endiandata[19], n);
blake2s_hash( hash64, endiandata );
if (hash64[7] <= Htarg && fulltest(hash64, ptarget)) {
*hashes_done = n - first_nonce + 1;
pdata[19] = n;
return true;
}
n++;
} while (n < max_nonce && !work_restart[thr_id].restart);
*hashes_done = n - first_nonce + 1;
pdata[19] = n;
return 0;
}
#endif
bool register_blake2s_algo( algo_gate_t* gate )
{
#if defined(BLAKE2S_16WAY)
gate->scanhash = (void*)&scanhash_blake2s_16way;
gate->hash = (void*)&blake2s_16way_hash;
#elif defined(BLAKE2S_8WAY)
gate->scanhash = (void*)&scanhash_blake2s_8way;
gate->hash = (void*)&blake2s_8way_hash;
#elif defined(BLAKE2S_4WAY)
gate->scanhash = (void*)&scanhash_blake2s_4way;
gate->hash = (void*)&blake2s_4way_hash;
#else
gate->scanhash = (void*)&scanhash_blake2s;
gate->hash = (void*)&blake2s_hash;
#endif
gate->optimizations = SSE2_OPT | AVX2_OPT | AVX512_OPT;
return true;
};

File diff suppressed because it is too large Load Diff

1877
algo/blake/blake512-hash.c Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,83 @@
#ifndef BLAKE512_HASH__
#define BLAKE512_HASH__ 1
#include <stddef.h>
#include "simd-utils.h"
/////////////////////////
//
// Blake-512 1 way SSE2 & AVX2
typedef struct {
unsigned char buf[128]; /* first field, for alignment */
uint64_t H[8];
uint64_t T0, T1;
size_t ptr;
} blake512_context __attribute__ ((aligned (32)));
void blake512_transform( uint64_t *H, const uint64_t *buf,
const uint64_t T0, const uint64_t T1 );
void blake512_init( blake512_context *sc );
void blake512_update( blake512_context *sc, const void *data, size_t len );
void blake512_close( blake512_context *sc, void *dst );
void blake512_full( blake512_context *sc, void *dst, const void *data,
size_t len );
#ifdef __AVX2__
// Blake-512 4 way AVX2
typedef struct {
__m256i buf[16];
__m256i H[8];
__m256i S[4];
size_t ptr;
uint64_t T0, T1;
} blake_4way_big_context __attribute__ ((aligned (64)));
typedef blake_4way_big_context blake512_4way_context;
void blake512_4way_init( blake_4way_big_context *sc );
void blake512_4way_update( void *cc, const void *data, size_t len );
void blake512_4way_close( void *cc, void *dst );
void blake512_4way_full( blake_4way_big_context *sc, void * dst,
const void *data, size_t len );
void blake512_4way_full_le( blake_4way_big_context *sc, void * dst,
const void *data, size_t len );
void blake512_4way_prehash_le( blake_4way_big_context *sc, __m256i *midstate,
const void *data );
void blake512_4way_final_le( blake_4way_big_context *sc, void *hash,
const __m256i nonce, const __m256i *midstate );
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
////////////////////////////
//
//// Blake-512 8 way AVX512
typedef struct {
__m512i buf[16];
__m512i H[8];
__m512i S[4];
size_t ptr;
uint64_t T0, T1;
} blake_8way_big_context __attribute__ ((aligned (128)));
typedef blake_8way_big_context blake512_8way_context;
void blake512_8way_init( blake_8way_big_context *sc );
void blake512_8way_update( void *cc, const void *data, size_t len );
void blake512_8way_close( void *cc, void *dst );
void blake512_8way_full( blake_8way_big_context *sc, void * dst,
const void *data, size_t len );
void blake512_8way_full_le( blake_8way_big_context *sc, void * dst,
const void *data, size_t len );
void blake512_8way_prehash_le( blake_8way_big_context *sc, __m512i *midstate,
const void *data );
void blake512_8way_final_le( blake_8way_big_context *sc, void *hash,
const __m512i nonce, const __m512i *midstate );
#endif // AVX512
#endif // AVX2
#endif // BLAKE512_HASH_H__

View File

@@ -1,10 +1,152 @@
#include "blakecoin-gate.h"
#include "blake-hash-4way.h"
#include "blake256-hash.h"
#include <string.h>
#include <stdint.h>
#include <memory.h>
#if defined (BLAKECOIN_4WAY)
#define rounds 8
#if defined (BLAKECOIN_16WAY)
int scanhash_blakecoin_16way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t hash32[8*16] __attribute__ ((aligned (64)));
uint32_t midstate_vars[16*16] __attribute__ ((aligned (64)));
__m512i block0_hash[8] __attribute__ ((aligned (64)));
__m512i block_buf[16] __attribute__ ((aligned (64)));
uint32_t lane_hash[8] __attribute__ ((aligned (32)));
uint32_t *hash32_d7 = (uint32_t*)&( ((__m512i*)hash32)[7] );
uint32_t *pdata = work->data;
const uint32_t *ptarget = work->target;
const uint32_t targ32_d7 = ptarget[7];
uint32_t phash[8] __attribute__ ((aligned (64))) =
{
0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19
};
uint32_t n = pdata[19];
const uint32_t first_nonce = (const uint32_t) n;
const uint32_t last_nonce = max_nonce - 16;
const int thr_id = mythr->id;
const bool bench = opt_benchmark;
const __m512i sixteen = v512_32( 16 );
// Prehash first block
blake256_transform_le( phash, pdata, 512, 0, rounds );
block0_hash[0] = v512_32( phash[0] );
block0_hash[1] = v512_32( phash[1] );
block0_hash[2] = v512_32( phash[2] );
block0_hash[3] = v512_32( phash[3] );
block0_hash[4] = v512_32( phash[4] );
block0_hash[5] = v512_32( phash[5] );
block0_hash[6] = v512_32( phash[6] );
block0_hash[7] = v512_32( phash[7] );
// Build vectored second block, interleave last 16 bytes of data using
// unique nonces.
block_buf[0] = v512_32( pdata[16] );
block_buf[1] = v512_32( pdata[17] );
block_buf[2] = v512_32( pdata[18] );
block_buf[3] =
_mm512_set_epi32( n+15, n+14, n+13, n+12, n+11, n+10, n+ 9, n+ 8,
n+ 7, n+ 6, n+ 5, n+ 4, n+ 3, n+ 2, n +1, n );
// Partialy prehash second block without touching nonces in block_buf[3].
blake256_16way_round0_prehash_le( midstate_vars, block0_hash, block_buf );
do {
blake256_16way_final_rounds_le( hash32, midstate_vars, block0_hash,
block_buf, rounds );
for ( int lane = 0; lane < 16; lane++ )
if ( unlikely( hash32_d7[ lane ] <= targ32_d7 ) )
{
extr_lane_16x32( lane_hash, hash32, lane, 256 );
if ( likely( valid_hash( lane_hash, ptarget ) && !bench ) )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
block_buf[3] = _mm512_add_epi32( block_buf[3], sixteen );
n += 16;
} while ( likely( (n < last_nonce) && !work_restart[thr_id].restart) );
pdata[19] = n;
*hashes_done = n - first_nonce;
return 0;
}
#elif defined (BLAKECOIN_8WAY)
int scanhash_blakecoin_8way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t hash32[8*8] __attribute__ ((aligned (64)));
uint32_t midstate_vars[16*8] __attribute__ ((aligned (32)));
__m256i block0_hash[8] __attribute__ ((aligned (32)));
__m256i block_buf[16] __attribute__ ((aligned (32)));
uint32_t lane_hash[8] __attribute__ ((aligned (32)));
uint32_t *hash32_d7 = (uint32_t*)&( ((__m256i*)hash32)[7] );
uint32_t *pdata = work->data;
const uint32_t *ptarget = work->target;
const uint32_t targ32_d7 = ptarget[7];
uint32_t phash[8] __attribute__ ((aligned (32))) =
{
0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19
};
uint32_t n = pdata[19];
const uint32_t first_nonce = (const uint32_t) n;
const uint32_t last_nonce = max_nonce - 8;
const int thr_id = mythr->id;
const bool bench = opt_benchmark;
const __m256i eight = v256_32( 8 );
// Prehash first block
blake256_transform_le( phash, pdata, 512, 0, rounds );
block0_hash[0] = v256_32( phash[0] );
block0_hash[1] = v256_32( phash[1] );
block0_hash[2] = v256_32( phash[2] );
block0_hash[3] = v256_32( phash[3] );
block0_hash[4] = v256_32( phash[4] );
block0_hash[5] = v256_32( phash[5] );
block0_hash[6] = v256_32( phash[6] );
block0_hash[7] = v256_32( phash[7] );
// Build vectored second block, interleave last 16 bytes of data using
// unique nonces.
block_buf[0] = v256_32( pdata[16] );
block_buf[1] = v256_32( pdata[17] );
block_buf[2] = v256_32( pdata[18] );
block_buf[3] = _mm256_set_epi32( n+7, n+6, n+5, n+4, n+3, n+2, n+1, n );
// Partialy prehash second block without touching nonces in block_buf[3].
blake256_8way_round0_prehash_le( midstate_vars, block0_hash, block_buf );
do {
blake256_8way_final_rounds_le( hash32, midstate_vars, block0_hash,
block_buf, rounds );
for ( int lane = 0; lane < 8; lane++ )
if ( unlikely( hash32_d7[ lane ] <= targ32_d7 ) )
{
extr_lane_8x32( lane_hash, hash32, lane, 256 );
if ( likely( valid_hash( lane_hash, ptarget ) && !bench ) )
{
pdata[19] = n + lane;
submit_solution( work, lane_hash, mythr );
}
}
block_buf[3] = _mm256_add_epi32( block_buf[3], eight );
n += 8;
} while ( likely( (n < last_nonce) && !work_restart[thr_id].restart) );
pdata[19] = n;
*hashes_done = n - first_nonce;
return 0;
}
#elif defined (BLAKECOIN_4WAY)
blake256r8_4way_context blakecoin_4w_ctx;
@@ -61,61 +203,3 @@ int scanhash_blakecoin_4way( struct work *work, uint32_t max_nonce,
#endif
#if defined(BLAKECOIN_8WAY)
blake256r8_8way_context blakecoin_8w_ctx;
void blakecoin_8way_hash( void *state, const void *input )
{
uint32_t vhash[8*8] __attribute__ ((aligned (64)));
blake256r8_8way_context ctx;
memcpy( &ctx, &blakecoin_8w_ctx, sizeof ctx );
blake256r8_8way_update( &ctx, input + (64<<3), 16 );
blake256r8_8way_close( &ctx, vhash );
dintrlv_8x32( state, state+ 32, state+ 64, state+ 96, state+128,
state+160, state+192, state+224, vhash, 256 );
}
int scanhash_blakecoin_8way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t vdata[20*8] __attribute__ ((aligned (64)));
uint32_t hash[8*8] __attribute__ ((aligned (32)));
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t first_nonce = pdata[19];
uint32_t HTarget = ptarget[7];
uint32_t n = first_nonce;
__m256i *noncev = (__m256i*)vdata + 19; // aligned
int thr_id = mythr->id; // thr_id arg is deprecated
if ( opt_benchmark )
HTarget = 0x7f;
mm256_bswap32_intrlv80_8x32( vdata, pdata );
blake256r8_8way_init( &blakecoin_8w_ctx );
blake256r8_8way_update( &blakecoin_8w_ctx, vdata, 64 );
do {
*noncev = mm256_bswap_32( _mm256_set_epi32( n+7, n+6, n+5, n+4,
n+3, n+2, n+1, n ) );
pdata[19] = n;
blakecoin_8way_hash( hash, vdata );
for ( int i = 0; i < 8; i++ )
if ( (hash+(i<<3))[7] <= HTarget && fulltest( hash+(i<<3), ptarget )
&& !opt_benchmark )
{
pdata[19] = n+i;
submit_solution( work, hash+(i<<3), mythr );
}
n += 8;
} while ( (n < max_nonce) && !work_restart[thr_id].restart );
*hashes_done = n - first_nonce + 1;
return 0;
}
#endif

View File

@@ -4,10 +4,10 @@
// vanilla uses default gen merkle root, otherwise identical to blakecoin
bool register_vanilla_algo( algo_gate_t* gate )
{
#if defined(BLAKECOIN_8WAY)
#if defined(BLAKECOIN_16WAY)
gate->scanhash = (void*)&scanhash_blakecoin_16way;
#elif defined(BLAKECOIN_8WAY)
gate->scanhash = (void*)&scanhash_blakecoin_8way;
gate->hash = (void*)&blakecoin_8way_hash;
#elif defined(BLAKECOIN_4WAY)
gate->scanhash = (void*)&scanhash_blakecoin_4way;
gate->hash = (void*)&blakecoin_4way_hash;
@@ -15,14 +15,14 @@ bool register_vanilla_algo( algo_gate_t* gate )
gate->scanhash = (void*)&scanhash_blakecoin;
gate->hash = (void*)&blakecoinhash;
#endif
gate->optimizations = SSE42_OPT | AVX2_OPT;
gate->optimizations = SSE2_OPT | AVX2_OPT | AVX512_OPT;
return true;
}
bool register_blakecoin_algo( algo_gate_t* gate )
{
register_vanilla_algo( gate );
gate->gen_merkle_root = (void*)&SHA256_gen_merkle_root;
gate->gen_merkle_root = (void*)&sha256_gen_merkle_root;
return true;
}

View File

@@ -1,30 +1,36 @@
#ifndef __BLAKECOIN_GATE_H__
#define __BLAKECOIN_GATE_H__ 1
#ifndef BLAKECOIN_GATE_H__
#define BLAKECOIN_GATE_H__ 1
#include "algo-gate-api.h"
#include <stdint.h>
#if defined(__SSE4_2__)
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
#define BLAKECOIN_16WAY
#elif defined(__AVX2__)
#define BLAKECOIN_8WAY
#elif defined(__SSE2__) // always true
#define BLAKECOIN_4WAY
#endif
#if defined(__AVX2__)
#define BLAKECOIN_8WAY
#endif
#if defined (BLAKECOIN_8WAY)
void blakecoin_8way_hash(void *state, const void *input);
#if defined (BLAKECOIN_16WAY)
int scanhash_blakecoin_16way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#elif defined (BLAKECOIN_8WAY)
//void blakecoin_8way_hash(void *state, const void *input);
int scanhash_blakecoin_8way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#endif
#if defined (BLAKECOIN_4WAY)
#elif defined (BLAKECOIN_4WAY)
void blakecoin_4way_hash(void *state, const void *input);
int scanhash_blakecoin_4way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#endif
#else // never used
void blakecoinhash( void *state, const void *input );
int scanhash_blakecoin( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#endif
#endif

View File

@@ -1,6 +1,6 @@
#include "blakecoin-gate.h"
#if !defined(BLAKECOIN_8WAY) && !defined(BLAKECOIN_4WAY)
#if !defined(BLAKECOIN_16WAY) && !defined(BLAKECOIN_8WAY) && !defined(BLAKECOIN_4WAY)
#define BLAKE32_ROUNDS 8
#include "sph_blake.h"
@@ -12,7 +12,6 @@ void blakecoin_close(void *cc, void *dst);
#include <string.h>
#include <stdint.h>
#include <memory.h>
#include <openssl/sha.h>
// context management is staged for efficiency.
// 1. global initial ctx cached on startup
@@ -35,8 +34,8 @@ void blakecoinhash( void *state, const void *input )
uint8_t hash[64] __attribute__ ((aligned (32)));
uint8_t *ending = (uint8_t*) input + 64;
// copy cached midstate
memcpy( &ctx, &blake_mid_ctx, sizeof ctx );
// copy cached midstate
memcpy( &ctx, &blake_mid_ctx, sizeof ctx );
blakecoin( &ctx, ending, 16 );
blakecoin_close( &ctx, hash );
memcpy( state, hash, 32 );
@@ -45,8 +44,8 @@ void blakecoinhash( void *state, const void *input )
int scanhash_blakecoin( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t first_nonce = pdata[19];
uint32_t HTarget = ptarget[7];
int thr_id = mythr->id; // thr_id arg is deprecated
@@ -60,10 +59,10 @@ int scanhash_blakecoin( struct work *work, uint32_t max_nonce,
HTarget = 0x7f;
// we need big endian data...
for (int kk=0; kk < 19; kk++)
be32enc(&endiandata[kk], ((uint32_t*)pdata)[kk]);
for (int kk=0; kk < 19; kk++)
be32enc(&endiandata[kk], ((uint32_t*)pdata)[kk]);
blake_midstate_init( endiandata );
blake_midstate_init( endiandata );
#ifdef DEBUG_ALGO
applog(LOG_DEBUG,"[%d] Target=%08x %08x", thr_id, ptarget[6], ptarget[7]);

View File

@@ -1,74 +0,0 @@
#include "decred-gate.h"
#include "blake-hash-4way.h"
#include <string.h>
#include <stdint.h>
#include <memory.h>
#include <unistd.h>
#if defined (DECRED_4WAY)
static __thread blake256_4way_context blake_mid;
void decred_hash_4way( void *state, const void *input )
{
uint32_t vhash[8*4] __attribute__ ((aligned (64)));
// uint32_t hash0[8] __attribute__ ((aligned (32)));
// uint32_t hash1[8] __attribute__ ((aligned (32)));
// uint32_t hash2[8] __attribute__ ((aligned (32)));
// uint32_t hash3[8] __attribute__ ((aligned (32)));
const void *tail = input + ( DECRED_MIDSTATE_LEN << 2 );
int tail_len = 180 - DECRED_MIDSTATE_LEN;
blake256_4way_context ctx __attribute__ ((aligned (64)));
memcpy( &ctx, &blake_mid, sizeof(blake_mid) );
blake256_4way_update( &ctx, tail, tail_len );
blake256_4way_close( &ctx, vhash );
dintrlv_4x32( state, state+32, state+64, state+96, vhash, 256 );
}
int scanhash_decred_4way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t vdata[48*4] __attribute__ ((aligned (64)));
uint32_t hash[8*4] __attribute__ ((aligned (32)));
uint32_t _ALIGN(64) edata[48];
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t first_nonce = pdata[DECRED_NONCE_INDEX];
uint32_t n = first_nonce;
const uint32_t HTarget = opt_benchmark ? 0x7f : ptarget[7];
int thr_id = mythr->id; // thr_id arg is deprecated
// copy to buffer guaranteed to be aligned.
memcpy( edata, pdata, 180 );
// use the old way until new way updated for size.
mm128_intrlv_4x32x( vdata, edata, edata, edata, edata, 180*8 );
blake256_4way_init( &blake_mid );
blake256_4way_update( &blake_mid, vdata, DECRED_MIDSTATE_LEN );
uint32_t *noncep = vdata + DECRED_NONCE_INDEX * 4;
do {
* noncep = n;
*(noncep+1) = n+1;
*(noncep+2) = n+2;
*(noncep+3) = n+3;
decred_hash_4way( hash, vdata );
for ( int i = 0; i < 4; i++ )
if ( (hash+(i<<3))[7] <= HTarget )
if ( fulltest( hash+(i<<3), ptarget ) && !opt_benchmark )
{
pdata[DECRED_NONCE_INDEX] = n+i;
submit_solution( work, hash+(i<<3), mythr );
}
n += 4;
} while ( (n < max_nonce) && !work_restart[thr_id].restart );
*hashes_done = n - first_nonce + 1;
return 0;
}
#endif

View File

@@ -1,168 +0,0 @@
#include "decred-gate.h"
#include <unistd.h>
#include <memory.h>
#include <string.h>
uint32_t *decred_get_nonceptr( uint32_t *work_data )
{
return &work_data[ DECRED_NONCE_INDEX ];
}
long double decred_calc_network_diff( struct work* work )
{
// sample for diff 43.281 : 1c05ea29
// todo: endian reversed on longpoll could be zr5 specific...
uint32_t nbits = work->data[ DECRED_NBITS_INDEX ];
uint32_t bits = ( nbits & 0xffffff );
int16_t shift = ( swab32(nbits) & 0xff ); // 0x1c = 28
int m;
long double d = (long double)0x0000ffff / (long double)bits;
for ( m = shift; m < 29; m++ )
d *= 256.0;
for ( m = 29; m < shift; m++ )
d /= 256.0;
if ( shift == 28 )
d *= 256.0; // testnet
if ( opt_debug_diff )
applog( LOG_DEBUG, "net diff: %f -> shift %u, bits %08x", (double)d,
shift, bits );
return net_diff;
}
void decred_decode_extradata( struct work* work, uint64_t* net_blocks )
{
// some random extradata to make the work unique
work->data[ DECRED_XNONCE_INDEX ] = (rand()*4);
work->height = work->data[32];
if (!have_longpoll && work->height > *net_blocks + 1)
{
char netinfo[64] = { 0 };
if ( net_diff > 0. )
{
if (net_diff != work->targetdiff)
sprintf(netinfo, ", diff %.3f, target %.1f", net_diff,
work->targetdiff);
else
sprintf(netinfo, ", diff %.3f", net_diff);
}
applog(LOG_BLUE, "%s block %d%s", algo_names[opt_algo], work->height,
netinfo);
*net_blocks = work->height - 1;
}
}
void decred_be_build_stratum_request( char *req, struct work *work,
struct stratum_ctx *sctx )
{
unsigned char *xnonce2str;
uint32_t ntime, nonce;
char ntimestr[9], noncestr[9];
be32enc( &ntime, work->data[ DECRED_NTIME_INDEX ] );
be32enc( &nonce, work->data[ DECRED_NONCE_INDEX ] );
bin2hex( ntimestr, (char*)(&ntime), sizeof(uint32_t) );
bin2hex( noncestr, (char*)(&nonce), sizeof(uint32_t) );
xnonce2str = abin2hex( (char*)( &work->data[ DECRED_XNONCE_INDEX ] ),
sctx->xnonce1_size );
snprintf( req, JSON_BUF_LEN,
"{\"method\": \"mining.submit\", \"params\": [\"%s\", \"%s\", \"%s\", \"%s\", \"%s\"], \"id\":4}",
rpc_user, work->job_id, xnonce2str, ntimestr, noncestr );
free(xnonce2str);
}
#define min(a,b) (a>b ? (b) :(a))
void decred_build_extraheader( struct work* g_work, struct stratum_ctx* sctx )
{
uchar merkle_root[64] = { 0 };
uint32_t extraheader[32] = { 0 };
int headersize = 0;
uint32_t* extradata = (uint32_t*) sctx->xnonce1;
int i;
// getwork over stratum, getwork merkle + header passed in coinb1
memcpy(merkle_root, sctx->job.coinbase, 32);
headersize = min((int)sctx->job.coinbase_size - 32,
sizeof(extraheader) );
memcpy( extraheader, &sctx->job.coinbase[32], headersize );
// Assemble block header
memset( g_work->data, 0, sizeof(g_work->data) );
g_work->data[0] = le32dec( sctx->job.version );
for ( i = 0; i < 8; i++ )
g_work->data[1 + i] = swab32(
le32dec( (uint32_t *) sctx->job.prevhash + i ) );
for ( i = 0; i < 8; i++ )
g_work->data[9 + i] = swab32( be32dec( (uint32_t *) merkle_root + i ) );
// for ( i = 0; i < 8; i++ ) // prevhash
// g_work->data[1 + i] = swab32( g_work->data[1 + i] );
// for ( i = 0; i < 8; i++ ) // merkle
// g_work->data[9 + i] = swab32( g_work->data[9 + i] );
for ( i = 0; i < headersize/4; i++ ) // header
g_work->data[17 + i] = extraheader[i];
// extradata
for ( i = 0; i < sctx->xnonce1_size/4; i++ )
g_work->data[ DECRED_XNONCE_INDEX + i ] = extradata[i];
for ( i = DECRED_XNONCE_INDEX + sctx->xnonce1_size/4; i < 45; i++ )
g_work->data[i] = 0;
g_work->data[37] = (rand()*4) << 8;
// block header suffix from coinb2 (stake version)
memcpy( &g_work->data[44],
&sctx->job.coinbase[ sctx->job.coinbase_size-4 ], 4 );
sctx->block_height = g_work->data[32];
//applog_hex(work->data, 180);
//applog_hex(&work->data[36], 36);
}
#undef min
bool decred_ready_to_mine( struct work* work, struct stratum_ctx* stratum,
int thr_id )
{
if ( have_stratum && strcmp(stratum->job.job_id, work->job_id) )
// need to regen g_work..
return false;
if ( have_stratum && !work->data[0] && !opt_benchmark )
{
sleep(1);
return false;
}
// extradata: prevent duplicates
work->data[ DECRED_XNONCE_INDEX ] += 1;
work->data[ DECRED_XNONCE_INDEX + 1 ] |= thr_id;
return true;
}
int decred_get_work_data_size() { return DECRED_DATA_SIZE; }
bool register_decred_algo( algo_gate_t* gate )
{
#if defined(DECRED_4WAY)
four_way_not_tested();
gate->scanhash = (void*)&scanhash_decred_4way;
gate->hash = (void*)&decred_hash_4way;
#else
gate->scanhash = (void*)&scanhash_decred;
gate->hash = (void*)&decred_hash;
#endif
gate->optimizations = AVX2_OPT;
// gate->get_nonceptr = (void*)&decred_get_nonceptr;
gate->decode_extra_data = (void*)&decred_decode_extradata;
gate->build_stratum_request = (void*)&decred_be_build_stratum_request;
gate->work_decode = (void*)&std_be_work_decode;
gate->submit_getwork_result = (void*)&std_be_submit_getwork_result;
gate->build_extraheader = (void*)&decred_build_extraheader;
gate->ready_to_mine = (void*)&decred_ready_to_mine;
gate->nbits_index = DECRED_NBITS_INDEX;
gate->ntime_index = DECRED_NTIME_INDEX;
gate->nonce_index = DECRED_NONCE_INDEX;
gate->get_work_data_size = (void*)&decred_get_work_data_size;
gate->work_cmp_size = DECRED_WORK_COMPARE_SIZE;
allow_mininginfo = false;
have_gbt = false;
return true;
}

View File

@@ -1,36 +0,0 @@
#ifndef __DECRED_GATE_H__
#define __DECRED_GATE_H__
#include "algo-gate-api.h"
#include <stdint.h>
#define DECRED_NBITS_INDEX 29
#define DECRED_NTIME_INDEX 34
#define DECRED_NONCE_INDEX 35
#define DECRED_XNONCE_INDEX 36
#define DECRED_DATA_SIZE 192
#define DECRED_WORK_COMPARE_SIZE 140
#define DECRED_MIDSTATE_LEN 128
#if defined (__AVX2__)
//void blakehash_84way(void *state, const void *input);
//int scanhash_blake_8way( struct work *work, uint32_t max_nonce,
// uint64_t *hashes_done );
#endif
#if defined(__SSE4_2__)
#define DECRED_4WAY
#endif
#if defined (DECRED_4WAY)
void decred_hash_4way(void *state, const void *input);
int scanhash_decred_4way( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#endif
void decred_hash( void *state, const void *input );
int scanhash_decred( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr );
#endif

View File

@@ -1,282 +0,0 @@
#include "decred-gate.h"
#if !defined(DECRED_8WAY) && !defined(DECRED_4WAY)
#include "sph_blake.h"
#include <string.h>
#include <stdint.h>
#include <memory.h>
#include <unistd.h>
/*
#ifndef min
#define min(a,b) (a>b ? b : a)
#endif
#ifndef max
#define max(a,b) (a<b ? b : a)
#endif
*/
/*
#define DECRED_NBITS_INDEX 29
#define DECRED_NTIME_INDEX 34
#define DECRED_NONCE_INDEX 35
#define DECRED_XNONCE_INDEX 36
#define DECRED_DATA_SIZE 192
#define DECRED_WORK_COMPARE_SIZE 140
*/
static __thread sph_blake256_context blake_mid;
static __thread bool ctx_midstate_done = false;
void decred_hash(void *state, const void *input)
{
// #define MIDSTATE_LEN 128
sph_blake256_context ctx __attribute__ ((aligned (64)));
uint8_t *ending = (uint8_t*) input;
ending += DECRED_MIDSTATE_LEN;
if (!ctx_midstate_done) {
sph_blake256_init(&blake_mid);
sph_blake256(&blake_mid, input, DECRED_MIDSTATE_LEN);
ctx_midstate_done = true;
}
memcpy(&ctx, &blake_mid, sizeof(blake_mid));
sph_blake256(&ctx, ending, (180 - DECRED_MIDSTATE_LEN));
sph_blake256_close(&ctx, state);
}
void decred_hash_simple(void *state, const void *input)
{
sph_blake256_context ctx;
sph_blake256_init(&ctx);
sph_blake256(&ctx, input, 180);
sph_blake256_close(&ctx, state);
}
int scanhash_decred( struct work *work, uint32_t max_nonce,
uint64_t *hashes_done, struct thr_info *mythr )
{
uint32_t _ALIGN(64) endiandata[48];
uint32_t _ALIGN(64) hash32[8];
uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
int thr_id = mythr->id; // thr_id arg is deprecated
// #define DCR_NONCE_OFT32 35
const uint32_t first_nonce = pdata[DECRED_NONCE_INDEX];
const uint32_t HTarget = opt_benchmark ? 0x7f : ptarget[7];
uint32_t n = first_nonce;
ctx_midstate_done = false;
#if 1
memcpy(endiandata, pdata, 180);
#else
for (int k=0; k < (180/4); k++)
be32enc(&endiandata[k], pdata[k]);
#endif
do {
//be32enc(&endiandata[DCR_NONCE_OFT32], n);
endiandata[DECRED_NONCE_INDEX] = n;
decred_hash(hash32, endiandata);
if (hash32[7] <= HTarget && fulltest(hash32, ptarget))
{
pdata[DECRED_NONCE_INDEX] = n;
submit_solution( work, hash32, mythr );
}
n++;
} while (n < max_nonce && !work_restart[thr_id].restart);
*hashes_done = n - first_nonce + 1;
pdata[DECRED_NONCE_INDEX] = n;
return 0;
}
/*
uint32_t *decred_get_nonceptr( uint32_t *work_data )
{
return &work_data[ DECRED_NONCE_INDEX ];
}
double decred_calc_network_diff( struct work* work )
{
// sample for diff 43.281 : 1c05ea29
// todo: endian reversed on longpoll could be zr5 specific...
uint32_t nbits = work->data[ DECRED_NBITS_INDEX ];
uint32_t bits = ( nbits & 0xffffff );
int16_t shift = ( swab32(nbits) & 0xff ); // 0x1c = 28
int m;
double d = (double)0x0000ffff / (double)bits;
for ( m = shift; m < 29; m++ )
d *= 256.0;
for ( m = 29; m < shift; m++ )
d /= 256.0;
if ( shift == 28 )
d *= 256.0; // testnet
if ( opt_debug_diff )
applog( LOG_DEBUG, "net diff: %f -> shift %u, bits %08x", d,
shift, bits );
return net_diff;
}
void decred_decode_extradata( struct work* work, uint64_t* net_blocks )
{
// some random extradata to make the work unique
work->data[ DECRED_XNONCE_INDEX ] = (rand()*4);
work->height = work->data[32];
if (!have_longpoll && work->height > *net_blocks + 1)
{
char netinfo[64] = { 0 };
if (net_diff > 0.)
{
if (net_diff != work->targetdiff)
sprintf(netinfo, ", diff %.3f, target %.1f", net_diff,
work->targetdiff);
else
sprintf(netinfo, ", diff %.3f", net_diff);
}
applog(LOG_BLUE, "%s block %d%s", algo_names[opt_algo], work->height,
netinfo);
*net_blocks = work->height - 1;
}
}
void decred_be_build_stratum_request( char *req, struct work *work,
struct stratum_ctx *sctx )
{
unsigned char *xnonce2str;
uint32_t ntime, nonce;
char ntimestr[9], noncestr[9];
be32enc( &ntime, work->data[ DECRED_NTIME_INDEX ] );
be32enc( &nonce, work->data[ DECRED_NONCE_INDEX ] );
bin2hex( ntimestr, (char*)(&ntime), sizeof(uint32_t) );
bin2hex( noncestr, (char*)(&nonce), sizeof(uint32_t) );
xnonce2str = abin2hex( (char*)( &work->data[ DECRED_XNONCE_INDEX ] ),
sctx->xnonce1_size );
snprintf( req, JSON_BUF_LEN,
"{\"method\": \"mining.submit\", \"params\": [\"%s\", \"%s\", \"%s\", \"%s\", \"%s\"], \"id\":4}",
rpc_user, work->job_id, xnonce2str, ntimestr, noncestr );
free(xnonce2str);
}
*/
/*
// data shared between gen_merkle_root and build_extraheader.
__thread uint32_t decred_extraheader[32] = { 0 };
__thread int decred_headersize = 0;
void decred_gen_merkle_root( char* merkle_root, struct stratum_ctx* sctx )
{
// getwork over stratum, getwork merkle + header passed in coinb1
memcpy(merkle_root, sctx->job.coinbase, 32);
decred_headersize = min((int)sctx->job.coinbase_size - 32,
sizeof(decred_extraheader) );
memcpy( decred_extraheader, &sctx->job.coinbase[32], decred_headersize);
}
*/
/*
#define min(a,b) (a>b ? (b) :(a))
void decred_build_extraheader( struct work* g_work, struct stratum_ctx* sctx )
{
uchar merkle_root[64] = { 0 };
uint32_t extraheader[32] = { 0 };
int headersize = 0;
uint32_t* extradata = (uint32_t*) sctx->xnonce1;
size_t t;
int i;
// getwork over stratum, getwork merkle + header passed in coinb1
memcpy(merkle_root, sctx->job.coinbase, 32);
headersize = min((int)sctx->job.coinbase_size - 32,
sizeof(extraheader) );
memcpy( extraheader, &sctx->job.coinbase[32], headersize );
// Increment extranonce2
for ( t = 0; t < sctx->xnonce2_size && !( ++sctx->job.xnonce2[t] ); t++ );
// Assemble block header
memset( g_work->data, 0, sizeof(g_work->data) );
g_work->data[0] = le32dec( sctx->job.version );
for ( i = 0; i < 8; i++ )
g_work->data[1 + i] = swab32(
le32dec( (uint32_t *) sctx->job.prevhash + i ) );
for ( i = 0; i < 8; i++ )
g_work->data[9 + i] = swab32( be32dec( (uint32_t *) merkle_root + i ) );
// for ( i = 0; i < 8; i++ ) // prevhash
// g_work->data[1 + i] = swab32( g_work->data[1 + i] );
// for ( i = 0; i < 8; i++ ) // merkle
// g_work->data[9 + i] = swab32( g_work->data[9 + i] );
for ( i = 0; i < headersize/4; i++ ) // header
g_work->data[17 + i] = extraheader[i];
// extradata
for ( i = 0; i < sctx->xnonce1_size/4; i++ )
g_work->data[ DECRED_XNONCE_INDEX + i ] = extradata[i];
for ( i = DECRED_XNONCE_INDEX + sctx->xnonce1_size/4; i < 45; i++ )
g_work->data[i] = 0;
g_work->data[37] = (rand()*4) << 8;
// block header suffix from coinb2 (stake version)
memcpy( &g_work->data[44],
&sctx->job.coinbase[ sctx->job.coinbase_size-4 ], 4 );
sctx->bloc_height = g_work->data[32];
//applog_hex(work->data, 180);
//applog_hex(&work->data[36], 36);
}
#undef min
bool decred_ready_to_mine( struct work* work, struct stratum_ctx* stratum,
int thr_id )
{
if ( have_stratum && strcmp(stratum->job.job_id, work->job_id) )
// need to regen g_work..
return false;
if ( have_stratum && !work->data[0] && !opt_benchmark )
{
sleep(1);
return false;
}
// extradata: prevent duplicates
work->data[ DECRED_XNONCE_INDEX ] += 1;
work->data[ DECRED_XNONCE_INDEX + 1 ] |= thr_id;
return true;
}
bool register_decred_algo( algo_gate_t* gate )
{
gate->optimizations = SSE2_OPT;
gate->scanhash = (void*)&scanhash_decred;
gate->hash = (void*)&decred_hash;
gate->get_nonceptr = (void*)&decred_get_nonceptr;
gate->decode_extra_data = (void*)&decred_decode_extradata;
gate->build_stratum_request = (void*)&decred_be_build_stratum_request;
gate->work_decode = (void*)&std_be_work_decode;
gate->submit_getwork_result = (void*)&std_be_submit_getwork_result;
gate->build_extraheader = (void*)&decred_build_extraheader;
gate->ready_to_mine = (void*)&decred_ready_to_mine;
gate->nbits_index = DECRED_NBITS_INDEX;
gate->ntime_index = DECRED_NTIME_INDEX;
gate->nonce_index = DECRED_NONCE_INDEX;
gate->work_data_size = DECRED_DATA_SIZE;
gate->work_cmp_size = DECRED_WORK_COMPARE_SIZE;
allow_mininginfo = false;
have_gbt = false;
return true;
}
*/
#endif

View File

@@ -1,14 +1,12 @@
#include "pentablake-gate.h"
#if defined (__AVX2__)
#if defined(PENTABLAKE_4WAY)
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include "blake-hash-4way.h"
#include "sph_blake.h"
#include "blake512-hash.h"
extern void pentablakehash_4way( void *output, const void *input )
{

View File

@@ -4,9 +4,10 @@
#include "algo-gate-api.h"
#include <stdint.h>
#if defined(__AVX2__)
#define PENTABLAKE_4WAY
#endif
// 4way is broken
//#if defined(__AVX2__)
// #define PENTABLAKE_4WAY
//#endif
#if defined(PENTABLAKE_4WAY)
void pentablakehash_4way( void *state, const void *input );

View File

@@ -14,8 +14,9 @@
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include "algo/sha/sph_types.h"
#include "simd-utils.h"
#include "compat/sph_types.h"
#include "compat.h"
#include "sph-blake2s.h"
static const uint32_t blake2s_IV[8] =
@@ -208,8 +209,8 @@ int blake2s_init_key( blake2s_state *S, const uint8_t outlen, const void *key, c
int blake2s_compress( blake2s_state *S, const uint8_t block[BLAKE2S_BLOCKBYTES] )
{
uint32_t m[16];
uint32_t v[16];
uint32_t _ALIGN(32) m[16];
uint32_t _ALIGN(32) v[16];
for( size_t i = 0; i < 16; ++i )
m[i] = load32( block + i * sizeof( m[i] ) );
@@ -225,6 +226,58 @@ int blake2s_compress( blake2s_state *S, const uint8_t block[BLAKE2S_BLOCKBYTES]
v[13] = S->t[1] ^ blake2s_IV[5];
v[14] = S->f[0] ^ blake2s_IV[6];
v[15] = S->f[1] ^ blake2s_IV[7];
#if defined(__SSE2__)
__m128i *V = (__m128i*)v;
#define BLAKE2S_ROUND( r ) \
V[0] = _mm_add_epi32( V[0], _mm_add_epi32( V[1], _mm_set_epi32( \
m[blake2s_sigma[r][ 6]], m[blake2s_sigma[r][ 4]], \
m[blake2s_sigma[r][ 2]], m[blake2s_sigma[r][ 0]] ) ) ); \
V[3] = mm128_swap32_16( _mm_xor_si128( V[3], V[0] ) ); \
V[2] = _mm_add_epi32( V[2], V[3] ); \
V[1] = mm128_ror_32( _mm_xor_si128( V[1], V[2] ), 12 ); \
V[0] = _mm_add_epi32( V[0], _mm_add_epi32( V[1], _mm_set_epi32( \
m[blake2s_sigma[r][ 7]], m[blake2s_sigma[r][ 5]], \
m[blake2s_sigma[r][ 3]], m[blake2s_sigma[r][ 1]] ) ) ); \
V[3] = mm128_shuflr32_8( _mm_xor_si128( V[3], V[0] ) ); \
V[2] = _mm_add_epi32( V[2], V[3] ); \
V[1] = mm128_ror_32( _mm_xor_si128( V[1], V[2] ), 7 ); \
V[0] = mm128_shufll_32( V[0] ); \
V[3] = mm128_swap_64( V[3] ); \
V[2] = mm128_shuflr_32( V[2] ); \
V[0] = _mm_add_epi32( V[0], _mm_add_epi32( V[1], _mm_set_epi32( \
m[blake2s_sigma[r][12]], m[blake2s_sigma[r][10]], \
m[blake2s_sigma[r][ 8]], m[blake2s_sigma[r][14]] ) ) ); \
V[3] = mm128_swap32_16( _mm_xor_si128( V[3], V[0] ) ); \
V[2] = _mm_add_epi32( V[2], V[3] ); \
V[1] = mm128_ror_32( _mm_xor_si128( V[1], V[2] ), 12 ); \
V[0] = _mm_add_epi32( V[0], _mm_add_epi32( V[1], _mm_set_epi32( \
m[blake2s_sigma[r][13]], m[blake2s_sigma[r][11]], \
m[blake2s_sigma[r][ 9]], m[blake2s_sigma[r][15]] ) ) ); \
V[3] = mm128_shuflr32_8( _mm_xor_si128( V[3], V[0] ) ); \
V[2] = _mm_add_epi32( V[2], V[3] ); \
V[1] = mm128_ror_32( _mm_xor_si128( V[1], V[2] ), 7 ); \
V[0] = mm128_shuflr_32( V[0] ); \
V[3] = mm128_swap_64( V[3] ); \
V[2] = mm128_shufll_32( V[2] )
BLAKE2S_ROUND(0);
BLAKE2S_ROUND(1);
BLAKE2S_ROUND(2);
BLAKE2S_ROUND(3);
BLAKE2S_ROUND(4);
BLAKE2S_ROUND(5);
BLAKE2S_ROUND(6);
BLAKE2S_ROUND(7);
BLAKE2S_ROUND(8);
BLAKE2S_ROUND(9);
#undef BLAKE2S_ROUND
#else
#define G(r,i,a,b,c,d) \
do { \
a = a + b + m[blake2s_sigma[r][2*i+0]]; \
@@ -236,6 +289,7 @@ int blake2s_compress( blake2s_state *S, const uint8_t block[BLAKE2S_BLOCKBYTES]
c = c + d; \
b = SPH_ROTR32(b ^ c, 7); \
} while(0)
#define ROUND(r) \
do { \
G(r,0,v[ 0],v[ 4],v[ 8],v[12]); \
@@ -247,7 +301,8 @@ int blake2s_compress( blake2s_state *S, const uint8_t block[BLAKE2S_BLOCKBYTES]
G(r,6,v[ 2],v[ 7],v[ 8],v[13]); \
G(r,7,v[ 3],v[ 4],v[ 9],v[14]); \
} while(0)
ROUND( 0 );
ROUND( 0 );
ROUND( 1 );
ROUND( 2 );
ROUND( 3 );
@@ -258,6 +313,8 @@ int blake2s_compress( blake2s_state *S, const uint8_t block[BLAKE2S_BLOCKBYTES]
ROUND( 8 );
ROUND( 9 );
#endif
for( size_t i = 0; i < 8; ++i )
S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];

View File

@@ -630,6 +630,69 @@ static const sph_u64 CB[16] = {
H7 ^= S3 ^ V7 ^ VF; \
} while (0)
#define COMPRESS32_LE do { \
sph_u32 M0, M1, M2, M3, M4, M5, M6, M7; \
sph_u32 M8, M9, MA, MB, MC, MD, ME, MF; \
sph_u32 V0, V1, V2, V3, V4, V5, V6, V7; \
sph_u32 V8, V9, VA, VB, VC, VD, VE, VF; \
V0 = H0; \
V1 = H1; \
V2 = H2; \
V3 = H3; \
V4 = H4; \
V5 = H5; \
V6 = H6; \
V7 = H7; \
V8 = S0 ^ CS0; \
V9 = S1 ^ CS1; \
VA = S2 ^ CS2; \
VB = S3 ^ CS3; \
VC = T0 ^ CS4; \
VD = T0 ^ CS5; \
VE = T1 ^ CS6; \
VF = T1 ^ CS7; \
M0 = *((uint32_t*)(buf + 0)); \
M1 = *((uint32_t*)(buf + 4)); \
M2 = *((uint32_t*)(buf + 8)); \
M3 = *((uint32_t*)(buf + 12)); \
M4 = *((uint32_t*)(buf + 16)); \
M5 = *((uint32_t*)(buf + 20)); \
M6 = *((uint32_t*)(buf + 24)); \
M7 = *((uint32_t*)(buf + 28)); \
M8 = *((uint32_t*)(buf + 32)); \
M9 = *((uint32_t*)(buf + 36)); \
MA = *((uint32_t*)(buf + 40)); \
MB = *((uint32_t*)(buf + 44)); \
MC = *((uint32_t*)(buf + 48)); \
MD = *((uint32_t*)(buf + 52)); \
ME = *((uint32_t*)(buf + 56)); \
MF = *((uint32_t*)(buf + 60)); \
ROUND_S(0); \
ROUND_S(1); \
ROUND_S(2); \
ROUND_S(3); \
ROUND_S(4); \
ROUND_S(5); \
ROUND_S(6); \
ROUND_S(7); \
if (BLAKE32_ROUNDS == 14) { \
ROUND_S(8); \
ROUND_S(9); \
ROUND_S(0); \
ROUND_S(1); \
ROUND_S(2); \
ROUND_S(3); \
} \
H0 ^= S0 ^ V0 ^ V8; \
H1 ^= S1 ^ V1 ^ V9; \
H2 ^= S2 ^ V2 ^ VA; \
H3 ^= S3 ^ V3 ^ VB; \
H4 ^= S0 ^ V4 ^ VC; \
H5 ^= S1 ^ V5 ^ VD; \
H6 ^= S2 ^ V6 ^ VE; \
H7 ^= S3 ^ V7 ^ VF; \
} while (0)
#endif
#if SPH_64
@@ -843,6 +906,45 @@ blake32(sph_blake_small_context *sc, const void *data, size_t len)
sc->ptr = ptr;
}
static void
blake32_le(sph_blake_small_context *sc, const void *data, size_t len)
{
unsigned char *buf;
size_t ptr;
DECL_STATE32
buf = sc->buf;
ptr = sc->ptr;
if (len < (sizeof sc->buf) - ptr) {
memcpy(buf + ptr, data, len);
ptr += len;
sc->ptr = ptr;
return;
}
READ_STATE32(sc);
while (len > 0) {
size_t clen;
clen = (sizeof sc->buf) - ptr;
if (clen > len)
clen = len;
memcpy(buf + ptr, data, clen);
ptr += clen;
data = (const unsigned char *)data + clen;
len -= clen;
if (ptr == sizeof sc->buf) {
if ((T0 = SPH_T32(T0 + 512)) < 512)
T1 = SPH_T32(T1 + 1);
COMPRESS32_LE;
ptr = 0;
}
}
WRITE_STATE32(sc);
sc->ptr = ptr;
}
static void
blake32_close(sph_blake_small_context *sc,
unsigned ub, unsigned n, void *dst, size_t out_size_w32)
@@ -1050,6 +1152,12 @@ sph_blake256(void *cc, const void *data, size_t len)
blake32(cc, data, len);
}
void
sph_blake256_update_le(void *cc, const void *data, size_t len)
{
blake32_le(cc, data, len);
}
/* see sph_blake.h */
void
sph_blake256_close(void *cc, void *dst)

View File

@@ -42,7 +42,7 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
/**
* Output size (in bits) for BLAKE-224.
@@ -198,6 +198,7 @@ void sph_blake256_init(void *cc);
* @param len the input data length (in bytes)
*/
void sph_blake256(void *cc, const void *data, size_t len);
void sph_blake256_update_le(void *cc, const void *data, size_t len);
/**
* Terminate the current BLAKE-256 computation and output the result into

View File

@@ -30,18 +30,11 @@
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include "algo/sha/sph_types.h"
#include "simd-utils.h"
#include "compat/sph_types.h"
#include "sph_blake2b.h"
// Cyclic right rotation.
#ifndef ROTR64
#define ROTR64(x, y) (((x) >> (y)) ^ ((x) << (64 - (y))))
#endif
// Little-endian byte access.
#define B2B_GET64(p) \
(((uint64_t) ((uint8_t *) (p))[0]) ^ \
(((uint64_t) ((uint8_t *) (p))[1]) << 8) ^ \
@@ -52,47 +45,160 @@
(((uint64_t) ((uint8_t *) (p))[6]) << 48) ^ \
(((uint64_t) ((uint8_t *) (p))[7]) << 56))
// G Mixing function.
#if defined(__AVX2__)
#define B2B_G(a, b, c, d, x, y) { \
v[a] = v[a] + v[b] + x; \
v[d] = ROTR64(v[d] ^ v[a], 32); \
v[c] = v[c] + v[d]; \
v[b] = ROTR64(v[b] ^ v[c], 24); \
v[a] = v[a] + v[b] + y; \
v[d] = ROTR64(v[d] ^ v[a], 16); \
v[c] = v[c] + v[d]; \
v[b] = ROTR64(v[b] ^ v[c], 63); }
#define BLAKE2B_G( Sa, Sb, Sc, Sd, Se, Sf, Sg, Sh ) \
{ \
V[0] = _mm256_add_epi64( V[0], _mm256_add_epi64( V[1], \
_mm256_set_epi64x( m[ sigmaR[ Sg ] ], m[ sigmaR[ Se ] ], \
m[ sigmaR[ Sc ] ], m[ sigmaR[ Sa ] ] ) ) ); \
V[3] = mm256_swap64_32( _mm256_xor_si256( V[3], V[0] ) ); \
V[2] = _mm256_add_epi64( V[2], V[3] ); \
V[1] = mm256_shuflr64_24( _mm256_xor_si256( V[1], V[2] ) ); \
\
V[0] = _mm256_add_epi64( V[0], _mm256_add_epi64( V[1], \
_mm256_set_epi64x( m[ sigmaR[ Sh ] ], m[ sigmaR[ Sf ] ], \
m[ sigmaR[ Sd ] ], m[ sigmaR[ Sb ] ] ) ) ); \
V[3] = mm256_shuflr64_16( _mm256_xor_si256( V[3], V[0] ) ); \
V[2] = _mm256_add_epi64( V[2], V[3] ); \
V[1] = mm256_ror_64( _mm256_xor_si256( V[1], V[2] ), 63 ); \
}
// Pivot about V[1] instead of V[0] reduces latency.
#define BLAKE2B_ROUND( R ) \
{ \
__m256i *V = (__m256i*)v; \
const uint8_t *sigmaR = sigma[R]; \
BLAKE2B_G( 0, 1, 2, 3, 4, 5, 6, 7 ); \
V[0] = mm256_shufll_64( V[0] ); \
V[3] = mm256_swap_128( V[3] ); \
V[2] = mm256_shuflr_64( V[2] ); \
BLAKE2B_G( 14, 15, 8, 9, 10, 11, 12, 13 ); \
V[0] = mm256_shuflr_64( V[0] ); \
V[3] = mm256_swap_128( V[3] ); \
V[2] = mm256_shufll_64( V[2] ); \
}
/*
#define BLAKE2B_ROUND( R ) \
{ \
__m256i *V = (__m256i*)v; \
const uint8_t *sigmaR = sigma[R]; \
BLAKE2B_G( 0, 1, 2, 3, 4, 5, 6, 7 ); \
V[3] = mm256_shufll_64( V[3] ); \
V[2] = mm256_swap_128( V[2] ); \
V[1] = mm256_shuflr_64( V[1] ); \
BLAKE2B_G( 8, 9, 10, 11, 12, 13, 14, 15 ); \
V[3] = mm256_shuflr_64( V[3] ); \
V[2] = mm256_swap_128( V[2] ); \
V[1] = mm256_shufll_64( V[1] ); \
}
*/
#elif defined(__SSE2__)
// always true
#define BLAKE2B_G( Va, Vb, Vc, Vd, Sa, Sb, Sc, Sd ) \
{ \
Va = _mm_add_epi64( Va, _mm_add_epi64( Vb, \
_mm_set_epi64x( m[ sigmaR[ Sc ] ], m[ sigmaR[ Sa ] ] ) ) ); \
Vd = mm128_swap64_32( _mm_xor_si128( Vd, Va ) ); \
Vc = _mm_add_epi64( Vc, Vd ); \
Vb = mm128_shuflr64_24( _mm_xor_si128( Vb, Vc ) ); \
\
Va = _mm_add_epi64( Va, _mm_add_epi64( Vb, \
_mm_set_epi64x( m[ sigmaR[ Sd ] ], m[ sigmaR[ Sb ] ] ) ) ); \
Vd = mm128_shuflr64_16( _mm_xor_si128( Vd, Va ) ); \
Vc = _mm_add_epi64( Vc, Vd ); \
Vb = mm128_ror_64( _mm_xor_si128( Vb, Vc ), 63 ); \
}
#define BLAKE2B_ROUND( R ) \
{ \
__m128i *V = (__m128i*)v; \
__m128i V2, V3, V6, V7; \
const uint8_t *sigmaR = sigma[R]; \
BLAKE2B_G( V[0], V[2], V[4], V[6], 0, 1, 2, 3 ); \
BLAKE2B_G( V[1], V[3], V[5], V[7], 4, 5, 6, 7 ); \
V2 = mm128_alignr_64( V[3], V[2], 1 ); \
V3 = mm128_alignr_64( V[2], V[3], 1 ); \
V6 = mm128_alignr_64( V[6], V[7], 1 ); \
V7 = mm128_alignr_64( V[7], V[6], 1 ); \
BLAKE2B_G( V[0], V2, V[5], V6, 8, 9, 10, 11 ); \
BLAKE2B_G( V[1], V3, V[4], V7, 12, 13, 14, 15 ); \
V[2] = mm128_alignr_64( V2, V3, 1 ); \
V[3] = mm128_alignr_64( V3, V2, 1 ); \
V[6] = mm128_alignr_64( V7, V6, 1 ); \
V[7] = mm128_alignr_64( V6, V7, 1 ); \
}
#else
// never used, SSE2 is always available
#ifndef ROTR64
#define ROTR64(x, y) (((x) >> (y)) ^ ((x) << (64 - (y))))
#endif
#define BLAKE2B_G( R, Va, Vb, Vc, Vd, Sa, Sb ) \
{ \
Va = Va + Vb + m[ sigma[R][Sa] ]; \
Vd = ROTR64( Vd ^ Va, 32 ); \
Vc = Vc + Vd; \
Vb = ROTR64( Vb ^ Vc, 24 ); \
\
Va = Va + Vb + m[ sigma[R][Sb] ]; \
Vd = ROTR64( Vd ^ Va, 16 ); \
Vc = Vc + Vd; \
Vb = ROTR64( Vb ^ Vc, 63 ); \
}
#define BLAKE2B_ROUND( R ) \
{ \
BLAKE2B_G( R, v[ 0], v[ 4], v[ 8], v[12], 0, 1 ); \
BLAKE2B_G( R, v[ 1], v[ 5], v[ 9], v[13], 2, 3 ); \
BLAKE2B_G( R, v[ 2], v[ 6], v[10], v[14], 4, 5 ); \
BLAKE2B_G( R, v[ 3], v[ 7], v[11], v[15], 6, 7 ); \
BLAKE2B_G( R, v[ 0], v[ 5], v[10], v[15], 8, 9 ); \
BLAKE2B_G( R, v[ 1], v[ 6], v[11], v[12], 10, 11 ); \
BLAKE2B_G( R, v[ 2], v[ 7], v[ 8], v[13], 12, 13 ); \
BLAKE2B_G( R, v[ 3], v[ 4], v[ 9], v[14], 14, 15 ); \
}
#endif
// Initialization Vector.
static const uint64_t blake2b_iv[8] = {
static const uint64_t blake2b_iv[8] __attribute__ ((aligned (32))) =
{
0x6A09E667F3BCC908, 0xBB67AE8584CAA73B,
0x3C6EF372FE94F82B, 0xA54FF53A5F1D36F1,
0x510E527FADE682D1, 0x9B05688C2B3E6C1F,
0x1F83D9ABFB41BD6B, 0x5BE0CD19137E2179
};
static const uint8_t sigma[12][16] __attribute__ ((aligned (32))) =
{
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
{ 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 },
{ 11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4 },
{ 7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8 },
{ 9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13 },
{ 2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9 },
{ 12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11 },
{ 13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10 },
{ 6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5 },
{ 10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0 },
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
{ 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 }
};
// Compression function. "last" flag indicates last block.
static void blake2b_compress( sph_blake2b_ctx *ctx, int last )
{
const uint8_t sigma[12][16] = {
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
{ 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 },
{ 11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4 },
{ 7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8 },
{ 9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13 },
{ 2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9 },
{ 12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11 },
{ 13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10 },
{ 6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5 },
{ 10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0 },
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
{ 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 }
};
int i;
uint64_t v[16], m[16];
uint64_t v[16] __attribute__ ((aligned (32)));
uint64_t m[16] __attribute__ ((aligned (32)));
int i;
for (i = 0; i < 8; i++) { // init work variables
v[i] = ctx->h[i];
@@ -106,16 +212,8 @@ static void blake2b_compress( sph_blake2b_ctx *ctx, int last )
for (i = 0; i < 16; i++) // get little-endian words
m[i] = B2B_GET64(&ctx->b[8 * i]);
for (i = 0; i < 12; i++) { // twelve rounds
B2B_G( 0, 4, 8, 12, m[sigma[i][ 0]], m[sigma[i][ 1]]);
B2B_G( 1, 5, 9, 13, m[sigma[i][ 2]], m[sigma[i][ 3]]);
B2B_G( 2, 6, 10, 14, m[sigma[i][ 4]], m[sigma[i][ 5]]);
B2B_G( 3, 7, 11, 15, m[sigma[i][ 6]], m[sigma[i][ 7]]);
B2B_G( 0, 5, 10, 15, m[sigma[i][ 8]], m[sigma[i][ 9]]);
B2B_G( 1, 6, 11, 12, m[sigma[i][10]], m[sigma[i][11]]);
B2B_G( 2, 7, 8, 13, m[sigma[i][12]], m[sigma[i][13]]);
B2B_G( 3, 4, 9, 14, m[sigma[i][14]], m[sigma[i][15]]);
}
for (i = 0; i < 12; i++)
BLAKE2B_ROUND( i );
for( i = 0; i < 8; ++i )
ctx->h[i] ^= v[i] ^ v[i + 8];

View File

@@ -41,8 +41,6 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "simd-utils.h"
#define SPH_SIZE_bmw256 256
@@ -57,7 +55,7 @@ typedef struct {
__m128i buf[64];
__m128i H[16];
size_t ptr;
sph_u32 bit_count; // assume bit_count fits in 32 bits
uint32_t bit_count; // assume bit_count fits in 32 bits
} bmw_4way_small_context;
typedef bmw_4way_small_context bmw256_4way_context;
@@ -144,7 +142,7 @@ typedef struct {
__m256i buf[16];
__m256i H[16];
size_t ptr;
sph_u64 bit_count;
uint64_t bit_count;
} bmw_4way_big_context __attribute__((aligned(128)));
typedef bmw_4way_big_context bmw512_4way_context;

View File

@@ -109,7 +109,7 @@ static const uint32_t IV256[] = {
_mm_sub_epi32( _mm_add_epi32( rol_off_32( M, j, 0 ), \
rol_off_32( M, j, 3 ) ), \
rol_off_32( M, j, 10 ) ), \
_mm_set1_epi32( ( (j)+16 ) * SPH_C32(0x05555555UL) ) ), \
_mm_set1_epi32( ( (j)+16 ) * 0x05555555UL ) ), \
H[ ( (j)+7 ) & 0xF ] )
@@ -451,22 +451,22 @@ static const __m128i final_s[16] =
*/
void bmw256_4way_init( bmw256_4way_context *ctx )
{
ctx->H[ 0] = m128_const1_64( 0x4041424340414243 );
ctx->H[ 1] = m128_const1_64( 0x4445464744454647 );
ctx->H[ 2] = m128_const1_64( 0x48494A4B48494A4B );
ctx->H[ 3] = m128_const1_64( 0x4C4D4E4F4C4D4E4F );
ctx->H[ 4] = m128_const1_64( 0x5051525350515253 );
ctx->H[ 5] = m128_const1_64( 0x5455565754555657 );
ctx->H[ 6] = m128_const1_64( 0x58595A5B58595A5B );
ctx->H[ 7] = m128_const1_64( 0x5C5D5E5F5C5D5E5F );
ctx->H[ 8] = m128_const1_64( 0x6061626360616263 );
ctx->H[ 9] = m128_const1_64( 0x6465666764656667 );
ctx->H[10] = m128_const1_64( 0x68696A6B68696A6B );
ctx->H[11] = m128_const1_64( 0x6C6D6E6F6C6D6E6F );
ctx->H[12] = m128_const1_64( 0x7071727370717273 );
ctx->H[13] = m128_const1_64( 0x7475767774757677 );
ctx->H[14] = m128_const1_64( 0x78797A7B78797A7B );
ctx->H[15] = m128_const1_64( 0x7C7D7E7F7C7D7E7F );
ctx->H[ 0] = _mm_set1_epi64x( 0x4041424340414243 );
ctx->H[ 1] = _mm_set1_epi64x( 0x4445464744454647 );
ctx->H[ 2] = _mm_set1_epi64x( 0x48494A4B48494A4B );
ctx->H[ 3] = _mm_set1_epi64x( 0x4C4D4E4F4C4D4E4F );
ctx->H[ 4] = _mm_set1_epi64x( 0x5051525350515253 );
ctx->H[ 5] = _mm_set1_epi64x( 0x5455565754555657 );
ctx->H[ 6] = _mm_set1_epi64x( 0x58595A5B58595A5B );
ctx->H[ 7] = _mm_set1_epi64x( 0x5C5D5E5F5C5D5E5F );
ctx->H[ 8] = _mm_set1_epi64x( 0x6061626360616263 );
ctx->H[ 9] = _mm_set1_epi64x( 0x6465666764656667 );
ctx->H[10] = _mm_set1_epi64x( 0x68696A6B68696A6B );
ctx->H[11] = _mm_set1_epi64x( 0x6C6D6E6F6C6D6E6F );
ctx->H[12] = _mm_set1_epi64x( 0x7071727370717273 );
ctx->H[13] = _mm_set1_epi64x( 0x7475767774757677 );
ctx->H[14] = _mm_set1_epi64x( 0x78797A7B78797A7B );
ctx->H[15] = _mm_set1_epi64x( 0x7C7D7E7F7C7D7E7F );
// for ( int i = 0; i < 16; i++ )
@@ -485,7 +485,7 @@ bmw32_4way(bmw_4way_small_context *sc, const void *data, size_t len)
size_t ptr;
const int buf_size = 64; // bytes of one lane, compatible with len
sc->bit_count += (sph_u32)len << 3;
sc->bit_count += (uint32_t)len << 3;
buf = sc->buf;
ptr = sc->ptr;
h1 = sc->H;
@@ -529,7 +529,7 @@ bmw32_4way_close(bmw_4way_small_context *sc, unsigned ub, unsigned n,
buf = sc->buf;
ptr = sc->ptr;
buf[ ptr>>2 ] = m128_const1_64( 0x0000008000000080 );
buf[ ptr>>2 ] = _mm_set1_epi64x( 0x0000008000000080 );
ptr += 4;
h = sc->H;
@@ -959,22 +959,22 @@ static const __m256i final_s8[16] =
void bmw256_8way_init( bmw256_8way_context *ctx )
{
ctx->H[ 0] = m256_const1_64( 0x4041424340414243 );
ctx->H[ 1] = m256_const1_64( 0x4445464744454647 );
ctx->H[ 2] = m256_const1_64( 0x48494A4B48494A4B );
ctx->H[ 3] = m256_const1_64( 0x4C4D4E4F4C4D4E4F );
ctx->H[ 4] = m256_const1_64( 0x5051525350515253 );
ctx->H[ 5] = m256_const1_64( 0x5455565754555657 );
ctx->H[ 6] = m256_const1_64( 0x58595A5B58595A5B );
ctx->H[ 7] = m256_const1_64( 0x5C5D5E5F5C5D5E5F );
ctx->H[ 8] = m256_const1_64( 0x6061626360616263 );
ctx->H[ 9] = m256_const1_64( 0x6465666764656667 );
ctx->H[10] = m256_const1_64( 0x68696A6B68696A6B );
ctx->H[11] = m256_const1_64( 0x6C6D6E6F6C6D6E6F );
ctx->H[12] = m256_const1_64( 0x7071727370717273 );
ctx->H[13] = m256_const1_64( 0x7475767774757677 );
ctx->H[14] = m256_const1_64( 0x78797A7B78797A7B );
ctx->H[15] = m256_const1_64( 0x7C7D7E7F7C7D7E7F );
ctx->H[ 0] = _mm256_set1_epi64x( 0x4041424340414243 );
ctx->H[ 1] = _mm256_set1_epi64x( 0x4445464744454647 );
ctx->H[ 2] = _mm256_set1_epi64x( 0x48494A4B48494A4B );
ctx->H[ 3] = _mm256_set1_epi64x( 0x4C4D4E4F4C4D4E4F );
ctx->H[ 4] = _mm256_set1_epi64x( 0x5051525350515253 );
ctx->H[ 5] = _mm256_set1_epi64x( 0x5455565754555657 );
ctx->H[ 6] = _mm256_set1_epi64x( 0x58595A5B58595A5B );
ctx->H[ 7] = _mm256_set1_epi64x( 0x5C5D5E5F5C5D5E5F );
ctx->H[ 8] = _mm256_set1_epi64x( 0x6061626360616263 );
ctx->H[ 9] = _mm256_set1_epi64x( 0x6465666764656667 );
ctx->H[10] = _mm256_set1_epi64x( 0x68696A6B68696A6B );
ctx->H[11] = _mm256_set1_epi64x( 0x6C6D6E6F6C6D6E6F );
ctx->H[12] = _mm256_set1_epi64x( 0x7071727370717273 );
ctx->H[13] = _mm256_set1_epi64x( 0x7475767774757677 );
ctx->H[14] = _mm256_set1_epi64x( 0x78797A7B78797A7B );
ctx->H[15] = _mm256_set1_epi64x( 0x7C7D7E7F7C7D7E7F );
ctx->ptr = 0;
ctx->bit_count = 0;
}
@@ -1030,7 +1030,7 @@ void bmw256_8way_close( bmw256_8way_context *ctx, void *dst )
buf = ctx->buf;
ptr = ctx->ptr;
buf[ ptr>>2 ] = m256_const1_64( 0x0000008000000080 );
buf[ ptr>>2 ] = _mm256_set1_epi64x( 0x0000008000000080 );
ptr += 4;
h = ctx->H;
@@ -1460,22 +1460,22 @@ static const __m512i final_s16[16] =
void bmw256_16way_init( bmw256_16way_context *ctx )
{
ctx->H[ 0] = m512_const1_64( 0x4041424340414243 );
ctx->H[ 1] = m512_const1_64( 0x4445464744454647 );
ctx->H[ 2] = m512_const1_64( 0x48494A4B48494A4B );
ctx->H[ 3] = m512_const1_64( 0x4C4D4E4F4C4D4E4F );
ctx->H[ 4] = m512_const1_64( 0x5051525350515253 );
ctx->H[ 5] = m512_const1_64( 0x5455565754555657 );
ctx->H[ 6] = m512_const1_64( 0x58595A5B58595A5B );
ctx->H[ 7] = m512_const1_64( 0x5C5D5E5F5C5D5E5F );
ctx->H[ 8] = m512_const1_64( 0x6061626360616263 );
ctx->H[ 9] = m512_const1_64( 0x6465666764656667 );
ctx->H[10] = m512_const1_64( 0x68696A6B68696A6B );
ctx->H[11] = m512_const1_64( 0x6C6D6E6F6C6D6E6F );
ctx->H[12] = m512_const1_64( 0x7071727370717273 );
ctx->H[13] = m512_const1_64( 0x7475767774757677 );
ctx->H[14] = m512_const1_64( 0x78797A7B78797A7B );
ctx->H[15] = m512_const1_64( 0x7C7D7E7F7C7D7E7F );
ctx->H[ 0] = _mm512_set1_epi64( 0x4041424340414243 );
ctx->H[ 1] = _mm512_set1_epi64( 0x4445464744454647 );
ctx->H[ 2] = _mm512_set1_epi64( 0x48494A4B48494A4B );
ctx->H[ 3] = _mm512_set1_epi64( 0x4C4D4E4F4C4D4E4F );
ctx->H[ 4] = _mm512_set1_epi64( 0x5051525350515253 );
ctx->H[ 5] = _mm512_set1_epi64( 0x5455565754555657 );
ctx->H[ 6] = _mm512_set1_epi64( 0x58595A5B58595A5B );
ctx->H[ 7] = _mm512_set1_epi64( 0x5C5D5E5F5C5D5E5F );
ctx->H[ 8] = _mm512_set1_epi64( 0x6061626360616263 );
ctx->H[ 9] = _mm512_set1_epi64( 0x6465666764656667 );
ctx->H[10] = _mm512_set1_epi64( 0x68696A6B68696A6B );
ctx->H[11] = _mm512_set1_epi64( 0x6C6D6E6F6C6D6E6F );
ctx->H[12] = _mm512_set1_epi64( 0x7071727370717273 );
ctx->H[13] = _mm512_set1_epi64( 0x7475767774757677 );
ctx->H[14] = _mm512_set1_epi64( 0x78797A7B78797A7B );
ctx->H[15] = _mm512_set1_epi64( 0x7C7D7E7F7C7D7E7F );
ctx->ptr = 0;
ctx->bit_count = 0;
}
@@ -1531,7 +1531,7 @@ void bmw256_16way_close( bmw256_16way_context *ctx, void *dst )
buf = ctx->buf;
ptr = ctx->ptr;
buf[ ptr>>2 ] = m512_const1_64( 0x0000008000000080 );
buf[ ptr>>2 ] = _mm512_set1_epi64( 0x0000008000000080 );
ptr += 4;
h = ctx->H;

View File

@@ -45,15 +45,15 @@ extern "C"{
#define LPAR (
static const sph_u64 IV512[] = {
SPH_C64(0x8081828384858687), SPH_C64(0x88898A8B8C8D8E8F),
SPH_C64(0x9091929394959697), SPH_C64(0x98999A9B9C9D9E9F),
SPH_C64(0xA0A1A2A3A4A5A6A7), SPH_C64(0xA8A9AAABACADAEAF),
SPH_C64(0xB0B1B2B3B4B5B6B7), SPH_C64(0xB8B9BABBBCBDBEBF),
SPH_C64(0xC0C1C2C3C4C5C6C7), SPH_C64(0xC8C9CACBCCCDCECF),
SPH_C64(0xD0D1D2D3D4D5D6D7), SPH_C64(0xD8D9DADBDCDDDEDF),
SPH_C64(0xE0E1E2E3E4E5E6E7), SPH_C64(0xE8E9EAEBECEDEEEF),
SPH_C64(0xF0F1F2F3F4F5F6F7), SPH_C64(0xF8F9FAFBFCFDFEFF)
static const uint64_t IV512[] = {
0x8081828384858687, 0x88898A8B8C8D8E8F,
0x9091929394959697, 0x98999A9B9C9D9E9F,
0xA0A1A2A3A4A5A6A7, 0xA8A9AAABACADAEAF,
0xB0B1B2B3B4B5B6B7, 0xB8B9BABBBCBDBEBF,
0xC0C1C2C3C4C5C6C7, 0xC8C9CACBCCCDCECF,
0xD0D1D2D3D4D5D6D7, 0xD8D9DADBDCDDDEDF,
0xE0E1E2E3E4E5E6E7, 0xE8E9EAEBECEDEEEF,
0xF0F1F2F3F4F5F6F7, 0xF8F9FAFBFCFDFEFF
};
#if defined(__SSE2__)
@@ -594,9 +594,6 @@ void bmw512_2way_close( bmw_2way_big_context *ctx, void *dst )
#define rb6(x) mm256_rol_64( x, 43 )
#define rb7(x) mm256_rol_64( x, 53 )
#define rol_off_64( M, j ) \
mm256_rol_64( M[ (j) & 0xF ], ( (j) & 0xF ) + 1 )
#define add_elt_b( mj0, mj3, mj10, h, K ) \
_mm256_xor_si256( h, _mm256_add_epi64( K, \
_mm256_sub_epi64( _mm256_add_epi64( mj0, mj3 ), mj10 ) ) )
@@ -732,41 +729,58 @@ void compress_big( const __m256i *M, const __m256i H[16], __m256i dH[16] )
qt[15] = _mm256_add_epi64( sb0( Wb15), H[ 0] );
__m256i mj[16];
for ( i = 0; i < 16; i++ )
mj[i] = rol_off_64( M, i );
qt[16] = add_elt_b( mj[ 0], mj[ 3], mj[10], H[ 7],
(const __m256i)_mm256_set1_epi64x( 16 * 0x0555555555555555ULL ) );
qt[17] = add_elt_b( mj[ 1], mj[ 4], mj[11], H[ 8],
(const __m256i)_mm256_set1_epi64x( 17 * 0x0555555555555555ULL ) );
qt[18] = add_elt_b( mj[ 2], mj[ 5], mj[12], H[ 9],
(const __m256i)_mm256_set1_epi64x( 18 * 0x0555555555555555ULL ) );
qt[19] = add_elt_b( mj[ 3], mj[ 6], mj[13], H[10],
(const __m256i)_mm256_set1_epi64x( 19 * 0x0555555555555555ULL ) );
qt[20] = add_elt_b( mj[ 4], mj[ 7], mj[14], H[11],
(const __m256i)_mm256_set1_epi64x( 20 * 0x0555555555555555ULL ) );
qt[21] = add_elt_b( mj[ 5], mj[ 8], mj[15], H[12],
(const __m256i)_mm256_set1_epi64x( 21 * 0x0555555555555555ULL ) );
qt[22] = add_elt_b( mj[ 6], mj[ 9], mj[ 0], H[13],
(const __m256i)_mm256_set1_epi64x( 22 * 0x0555555555555555ULL ) );
qt[23] = add_elt_b( mj[ 7], mj[10], mj[ 1], H[14],
(const __m256i)_mm256_set1_epi64x( 23 * 0x0555555555555555ULL ) );
qt[24] = add_elt_b( mj[ 8], mj[11], mj[ 2], H[15],
(const __m256i)_mm256_set1_epi64x( 24 * 0x0555555555555555ULL ) );
qt[25] = add_elt_b( mj[ 9], mj[12], mj[ 3], H[ 0],
(const __m256i)_mm256_set1_epi64x( 25 * 0x0555555555555555ULL ) );
qt[26] = add_elt_b( mj[10], mj[13], mj[ 4], H[ 1],
(const __m256i)_mm256_set1_epi64x( 26 * 0x0555555555555555ULL ) );
qt[27] = add_elt_b( mj[11], mj[14], mj[ 5], H[ 2],
(const __m256i)_mm256_set1_epi64x( 27 * 0x0555555555555555ULL ) );
qt[28] = add_elt_b( mj[12], mj[15], mj[ 6], H[ 3],
(const __m256i)_mm256_set1_epi64x( 28 * 0x0555555555555555ULL ) );
qt[29] = add_elt_b( mj[13], mj[ 0], mj[ 7], H[ 4],
(const __m256i)_mm256_set1_epi64x( 29 * 0x0555555555555555ULL ) );
qt[30] = add_elt_b( mj[14], mj[ 1], mj[ 8], H[ 5],
(const __m256i)_mm256_set1_epi64x( 30 * 0x0555555555555555ULL ) );
qt[31] = add_elt_b( mj[15], mj[ 2], mj[ 9], H[ 6],
(const __m256i)_mm256_set1_epi64x( 31 * 0x0555555555555555ULL ) );
mj[ 0] = mm256_rol_64( M[ 0], 1 );
mj[ 1] = mm256_rol_64( M[ 1], 2 );
mj[ 2] = mm256_rol_64( M[ 2], 3 );
mj[ 3] = mm256_rol_64( M[ 3], 4 );
mj[ 4] = mm256_rol_64( M[ 4], 5 );
mj[ 5] = mm256_rol_64( M[ 5], 6 );
mj[ 6] = mm256_rol_64( M[ 6], 7 );
mj[ 7] = mm256_rol_64( M[ 7], 8 );
mj[ 8] = mm256_rol_64( M[ 8], 9 );
mj[ 9] = mm256_rol_64( M[ 9], 10 );
mj[10] = mm256_rol_64( M[10], 11 );
mj[11] = mm256_rol_64( M[11], 12 );
mj[12] = mm256_rol_64( M[12], 13 );
mj[13] = mm256_rol_64( M[13], 14 );
mj[14] = mm256_rol_64( M[14], 15 );
mj[15] = mm256_rol_64( M[15], 16 );
__m256i K = _mm256_set1_epi64x( 16 * 0x0555555555555555ULL );
const __m256i Kincr = _mm256_set1_epi64x( 0x0555555555555555ULL );
qt[16] = add_elt_b( mj[ 0], mj[ 3], mj[10], H[ 7], K );
K = _mm256_add_epi64( K, Kincr );
qt[17] = add_elt_b( mj[ 1], mj[ 4], mj[11], H[ 8], K );
K = _mm256_add_epi64( K, Kincr );
qt[18] = add_elt_b( mj[ 2], mj[ 5], mj[12], H[ 9], K );
K = _mm256_add_epi64( K, Kincr );
qt[19] = add_elt_b( mj[ 3], mj[ 6], mj[13], H[10], K );
K = _mm256_add_epi64( K, Kincr );
qt[20] = add_elt_b( mj[ 4], mj[ 7], mj[14], H[11], K );
K = _mm256_add_epi64( K, Kincr );
qt[21] = add_elt_b( mj[ 5], mj[ 8], mj[15], H[12], K );
K = _mm256_add_epi64( K, Kincr );
qt[22] = add_elt_b( mj[ 6], mj[ 9], mj[ 0], H[13], K );
K = _mm256_add_epi64( K, Kincr );
qt[23] = add_elt_b( mj[ 7], mj[10], mj[ 1], H[14], K );
K = _mm256_add_epi64( K, Kincr );
qt[24] = add_elt_b( mj[ 8], mj[11], mj[ 2], H[15], K );
K = _mm256_add_epi64( K, Kincr );
qt[25] = add_elt_b( mj[ 9], mj[12], mj[ 3], H[ 0], K );
K = _mm256_add_epi64( K, Kincr );
qt[26] = add_elt_b( mj[10], mj[13], mj[ 4], H[ 1], K );
K = _mm256_add_epi64( K, Kincr );
qt[27] = add_elt_b( mj[11], mj[14], mj[ 5], H[ 2], K );
K = _mm256_add_epi64( K, Kincr );
qt[28] = add_elt_b( mj[12], mj[15], mj[ 6], H[ 3], K );
K = _mm256_add_epi64( K, Kincr );
qt[29] = add_elt_b( mj[13], mj[ 0], mj[ 7], H[ 4], K );
K = _mm256_add_epi64( K, Kincr );
qt[30] = add_elt_b( mj[14], mj[ 1], mj[ 8], H[ 5], K );
K = _mm256_add_epi64( K, Kincr );
qt[31] = add_elt_b( mj[15], mj[ 2], mj[ 9], H[ 6], K );
qt[16] = _mm256_add_epi64( qt[16], expand1_b( qt, 16 ) );
qt[17] = _mm256_add_epi64( qt[17], expand1_b( qt, 17 ) );
@@ -880,24 +894,24 @@ static const __m256i final_b[16] =
};
static void
bmw64_4way_init( bmw_4way_big_context *sc, const sph_u64 *iv )
bmw64_4way_init( bmw_4way_big_context *sc, const uint64_t *iv )
{
sc->H[ 0] = m256_const1_64( 0x8081828384858687 );
sc->H[ 1] = m256_const1_64( 0x88898A8B8C8D8E8F );
sc->H[ 2] = m256_const1_64( 0x9091929394959697 );
sc->H[ 3] = m256_const1_64( 0x98999A9B9C9D9E9F );
sc->H[ 4] = m256_const1_64( 0xA0A1A2A3A4A5A6A7 );
sc->H[ 5] = m256_const1_64( 0xA8A9AAABACADAEAF );
sc->H[ 6] = m256_const1_64( 0xB0B1B2B3B4B5B6B7 );
sc->H[ 7] = m256_const1_64( 0xB8B9BABBBCBDBEBF );
sc->H[ 8] = m256_const1_64( 0xC0C1C2C3C4C5C6C7 );
sc->H[ 9] = m256_const1_64( 0xC8C9CACBCCCDCECF );
sc->H[10] = m256_const1_64( 0xD0D1D2D3D4D5D6D7 );
sc->H[11] = m256_const1_64( 0xD8D9DADBDCDDDEDF );
sc->H[12] = m256_const1_64( 0xE0E1E2E3E4E5E6E7 );
sc->H[13] = m256_const1_64( 0xE8E9EAEBECEDEEEF );
sc->H[14] = m256_const1_64( 0xF0F1F2F3F4F5F6F7 );
sc->H[15] = m256_const1_64( 0xF8F9FAFBFCFDFEFF );
sc->H[ 0] = _mm256_set1_epi64x( 0x8081828384858687 );
sc->H[ 1] = _mm256_set1_epi64x( 0x88898A8B8C8D8E8F );
sc->H[ 2] = _mm256_set1_epi64x( 0x9091929394959697 );
sc->H[ 3] = _mm256_set1_epi64x( 0x98999A9B9C9D9E9F );
sc->H[ 4] = _mm256_set1_epi64x( 0xA0A1A2A3A4A5A6A7 );
sc->H[ 5] = _mm256_set1_epi64x( 0xA8A9AAABACADAEAF );
sc->H[ 6] = _mm256_set1_epi64x( 0xB0B1B2B3B4B5B6B7 );
sc->H[ 7] = _mm256_set1_epi64x( 0xB8B9BABBBCBDBEBF );
sc->H[ 8] = _mm256_set1_epi64x( 0xC0C1C2C3C4C5C6C7 );
sc->H[ 9] = _mm256_set1_epi64x( 0xC8C9CACBCCCDCECF );
sc->H[10] = _mm256_set1_epi64x( 0xD0D1D2D3D4D5D6D7 );
sc->H[11] = _mm256_set1_epi64x( 0xD8D9DADBDCDDDEDF );
sc->H[12] = _mm256_set1_epi64x( 0xE0E1E2E3E4E5E6E7 );
sc->H[13] = _mm256_set1_epi64x( 0xE8E9EAEBECEDEEEF );
sc->H[14] = _mm256_set1_epi64x( 0xF0F1F2F3F4F5F6F7 );
sc->H[15] = _mm256_set1_epi64x( 0xF8F9FAFBFCFDFEFF );
sc->ptr = 0;
sc->bit_count = 0;
}
@@ -912,7 +926,7 @@ bmw64_4way( bmw_4way_big_context *sc, const void *data, size_t len )
size_t ptr;
const int buf_size = 128; // bytes of one lane, compatible with len
sc->bit_count += (sph_u64)len << 3;
sc->bit_count += (uint64_t)len << 3;
buf = sc->buf;
ptr = sc->ptr;
h1 = sc->H;
@@ -953,7 +967,7 @@ bmw64_4way_close(bmw_4way_big_context *sc, unsigned ub, unsigned n,
buf = sc->buf;
ptr = sc->ptr;
buf[ ptr>>3 ] = m256_const1_64( 0x80 );
buf[ ptr>>3 ] = _mm256_set1_epi64x( 0x80 );
ptr += 8;
h = sc->H;
@@ -1034,9 +1048,6 @@ bmw512_4way_addbits_and_close(void *cc, unsigned ub, unsigned n, void *dst)
#define r8b6(x) mm512_rol_64( x, 43 )
#define r8b7(x) mm512_rol_64( x, 53 )
#define rol8w_off_64( M, j ) \
mm512_rol_64( M[ (j) & 0xF ], ( (j) & 0xF ) + 1 )
#define add_elt_b8( mj0, mj3, mj10, h, K ) \
_mm512_xor_si512( h, _mm512_add_epi64( K, \
_mm512_sub_epi64( _mm512_add_epi64( mj0, mj3 ), mj10 ) ) )
@@ -1171,41 +1182,58 @@ void compress_big_8way( const __m512i *M, const __m512i H[16],
qt[15] = _mm512_add_epi64( s8b0( W8b15), H[ 0] );
__m512i mj[16];
for ( i = 0; i < 16; i++ )
mj[i] = rol8w_off_64( M, i );
mj[ 0] = mm512_rol_64( M[ 0], 1 );
mj[ 1] = mm512_rol_64( M[ 1], 2 );
mj[ 2] = mm512_rol_64( M[ 2], 3 );
mj[ 3] = mm512_rol_64( M[ 3], 4 );
mj[ 4] = mm512_rol_64( M[ 4], 5 );
mj[ 5] = mm512_rol_64( M[ 5], 6 );
mj[ 6] = mm512_rol_64( M[ 6], 7 );
mj[ 7] = mm512_rol_64( M[ 7], 8 );
mj[ 8] = mm512_rol_64( M[ 8], 9 );
mj[ 9] = mm512_rol_64( M[ 9], 10 );
mj[10] = mm512_rol_64( M[10], 11 );
mj[11] = mm512_rol_64( M[11], 12 );
mj[12] = mm512_rol_64( M[12], 13 );
mj[13] = mm512_rol_64( M[13], 14 );
mj[14] = mm512_rol_64( M[14], 15 );
mj[15] = mm512_rol_64( M[15], 16 );
qt[16] = add_elt_b8( mj[ 0], mj[ 3], mj[10], H[ 7],
(const __m512i)_mm512_set1_epi64( 16 * 0x0555555555555555ULL ) );
qt[17] = add_elt_b8( mj[ 1], mj[ 4], mj[11], H[ 8],
(const __m512i)_mm512_set1_epi64( 17 * 0x0555555555555555ULL ) );
qt[18] = add_elt_b8( mj[ 2], mj[ 5], mj[12], H[ 9],
(const __m512i)_mm512_set1_epi64( 18 * 0x0555555555555555ULL ) );
qt[19] = add_elt_b8( mj[ 3], mj[ 6], mj[13], H[10],
(const __m512i)_mm512_set1_epi64( 19 * 0x0555555555555555ULL ) );
qt[20] = add_elt_b8( mj[ 4], mj[ 7], mj[14], H[11],
(const __m512i)_mm512_set1_epi64( 20 * 0x0555555555555555ULL ) );
qt[21] = add_elt_b8( mj[ 5], mj[ 8], mj[15], H[12],
(const __m512i)_mm512_set1_epi64( 21 * 0x0555555555555555ULL ) );
qt[22] = add_elt_b8( mj[ 6], mj[ 9], mj[ 0], H[13],
(const __m512i)_mm512_set1_epi64( 22 * 0x0555555555555555ULL ) );
qt[23] = add_elt_b8( mj[ 7], mj[10], mj[ 1], H[14],
(const __m512i)_mm512_set1_epi64( 23 * 0x0555555555555555ULL ) );
qt[24] = add_elt_b8( mj[ 8], mj[11], mj[ 2], H[15],
(const __m512i)_mm512_set1_epi64( 24 * 0x0555555555555555ULL ) );
qt[25] = add_elt_b8( mj[ 9], mj[12], mj[ 3], H[ 0],
(const __m512i)_mm512_set1_epi64( 25 * 0x0555555555555555ULL ) );
qt[26] = add_elt_b8( mj[10], mj[13], mj[ 4], H[ 1],
(const __m512i)_mm512_set1_epi64( 26 * 0x0555555555555555ULL ) );
qt[27] = add_elt_b8( mj[11], mj[14], mj[ 5], H[ 2],
(const __m512i)_mm512_set1_epi64( 27 * 0x0555555555555555ULL ) );
qt[28] = add_elt_b8( mj[12], mj[15], mj[ 6], H[ 3],
(const __m512i)_mm512_set1_epi64( 28 * 0x0555555555555555ULL ) );
qt[29] = add_elt_b8( mj[13], mj[ 0], mj[ 7], H[ 4],
(const __m512i)_mm512_set1_epi64( 29 * 0x0555555555555555ULL ) );
qt[30] = add_elt_b8( mj[14], mj[ 1], mj[ 8], H[ 5],
(const __m512i)_mm512_set1_epi64( 30 * 0x0555555555555555ULL ) );
qt[31] = add_elt_b8( mj[15], mj[ 2], mj[ 9], H[ 6],
(const __m512i)_mm512_set1_epi64( 31 * 0x0555555555555555ULL ) );
__m512i K = _mm512_set1_epi64( 16 * 0x0555555555555555ULL );
const __m512i Kincr = _mm512_set1_epi64( 0x0555555555555555ULL );
qt[16] = add_elt_b8( mj[ 0], mj[ 3], mj[10], H[ 7], K );
K = _mm512_add_epi64( K, Kincr );
qt[17] = add_elt_b8( mj[ 1], mj[ 4], mj[11], H[ 8], K );
K = _mm512_add_epi64( K, Kincr );
qt[18] = add_elt_b8( mj[ 2], mj[ 5], mj[12], H[ 9], K );
K = _mm512_add_epi64( K, Kincr );
qt[19] = add_elt_b8( mj[ 3], mj[ 6], mj[13], H[10], K );
K = _mm512_add_epi64( K, Kincr );
qt[20] = add_elt_b8( mj[ 4], mj[ 7], mj[14], H[11], K );
K = _mm512_add_epi64( K, Kincr );
qt[21] = add_elt_b8( mj[ 5], mj[ 8], mj[15], H[12], K );
K = _mm512_add_epi64( K, Kincr );
qt[22] = add_elt_b8( mj[ 6], mj[ 9], mj[ 0], H[13], K );
K = _mm512_add_epi64( K, Kincr );
qt[23] = add_elt_b8( mj[ 7], mj[10], mj[ 1], H[14], K );
K = _mm512_add_epi64( K, Kincr );
qt[24] = add_elt_b8( mj[ 8], mj[11], mj[ 2], H[15], K );
K = _mm512_add_epi64( K, Kincr );
qt[25] = add_elt_b8( mj[ 9], mj[12], mj[ 3], H[ 0], K );
K = _mm512_add_epi64( K, Kincr );
qt[26] = add_elt_b8( mj[10], mj[13], mj[ 4], H[ 1], K );
K = _mm512_add_epi64( K, Kincr );
qt[27] = add_elt_b8( mj[11], mj[14], mj[ 5], H[ 2], K );
K = _mm512_add_epi64( K, Kincr );
qt[28] = add_elt_b8( mj[12], mj[15], mj[ 6], H[ 3], K );
K = _mm512_add_epi64( K, Kincr );
qt[29] = add_elt_b8( mj[13], mj[ 0], mj[ 7], H[ 4], K );
K = _mm512_add_epi64( K, Kincr );
qt[30] = add_elt_b8( mj[14], mj[ 1], mj[ 8], H[ 5], K );
K = _mm512_add_epi64( K, Kincr );
qt[31] = add_elt_b8( mj[15], mj[ 2], mj[ 9], H[ 6], K );
qt[16] = _mm512_add_epi64( qt[16], expand1_b8( qt, 16 ) );
qt[17] = _mm512_add_epi64( qt[17], expand1_b8( qt, 17 ) );
@@ -1349,24 +1377,24 @@ static const __m512i final_b8[16] =
void bmw512_8way_init( bmw512_8way_context *ctx )
//bmw64_4way_init( bmw_4way_big_context *sc, const sph_u64 *iv )
//bmw64_4way_init( bmw_4way_big_context *sc, const uint64_t *iv )
{
ctx->H[ 0] = m512_const1_64( 0x8081828384858687 );
ctx->H[ 1] = m512_const1_64( 0x88898A8B8C8D8E8F );
ctx->H[ 2] = m512_const1_64( 0x9091929394959697 );
ctx->H[ 3] = m512_const1_64( 0x98999A9B9C9D9E9F );
ctx->H[ 4] = m512_const1_64( 0xA0A1A2A3A4A5A6A7 );
ctx->H[ 5] = m512_const1_64( 0xA8A9AAABACADAEAF );
ctx->H[ 6] = m512_const1_64( 0xB0B1B2B3B4B5B6B7 );
ctx->H[ 7] = m512_const1_64( 0xB8B9BABBBCBDBEBF );
ctx->H[ 8] = m512_const1_64( 0xC0C1C2C3C4C5C6C7 );
ctx->H[ 9] = m512_const1_64( 0xC8C9CACBCCCDCECF );
ctx->H[10] = m512_const1_64( 0xD0D1D2D3D4D5D6D7 );
ctx->H[11] = m512_const1_64( 0xD8D9DADBDCDDDEDF );
ctx->H[12] = m512_const1_64( 0xE0E1E2E3E4E5E6E7 );
ctx->H[13] = m512_const1_64( 0xE8E9EAEBECEDEEEF );
ctx->H[14] = m512_const1_64( 0xF0F1F2F3F4F5F6F7 );
ctx->H[15] = m512_const1_64( 0xF8F9FAFBFCFDFEFF );
ctx->H[ 0] = _mm512_set1_epi64( 0x8081828384858687 );
ctx->H[ 1] = _mm512_set1_epi64( 0x88898A8B8C8D8E8F );
ctx->H[ 2] = _mm512_set1_epi64( 0x9091929394959697 );
ctx->H[ 3] = _mm512_set1_epi64( 0x98999A9B9C9D9E9F );
ctx->H[ 4] = _mm512_set1_epi64( 0xA0A1A2A3A4A5A6A7 );
ctx->H[ 5] = _mm512_set1_epi64( 0xA8A9AAABACADAEAF );
ctx->H[ 6] = _mm512_set1_epi64( 0xB0B1B2B3B4B5B6B7 );
ctx->H[ 7] = _mm512_set1_epi64( 0xB8B9BABBBCBDBEBF );
ctx->H[ 8] = _mm512_set1_epi64( 0xC0C1C2C3C4C5C6C7 );
ctx->H[ 9] = _mm512_set1_epi64( 0xC8C9CACBCCCDCECF );
ctx->H[10] = _mm512_set1_epi64( 0xD0D1D2D3D4D5D6D7 );
ctx->H[11] = _mm512_set1_epi64( 0xD8D9DADBDCDDDEDF );
ctx->H[12] = _mm512_set1_epi64( 0xE0E1E2E3E4E5E6E7 );
ctx->H[13] = _mm512_set1_epi64( 0xE8E9EAEBECEDEEEF );
ctx->H[14] = _mm512_set1_epi64( 0xF0F1F2F3F4F5F6F7 );
ctx->H[15] = _mm512_set1_epi64( 0xF8F9FAFBFCFDFEFF );
ctx->ptr = 0;
ctx->bit_count = 0;
}
@@ -1420,7 +1448,7 @@ void bmw512_8way_close( bmw512_8way_context *ctx, void *dst )
buf = ctx->buf;
ptr = ctx->ptr;
buf[ ptr>>3 ] = m512_const1_64( 0x80 );
buf[ ptr>>3 ] = _mm512_set1_epi64( 0x80 );
ptr += 8;
h = ctx->H;
@@ -1455,22 +1483,22 @@ void bmw512_8way_full( bmw512_8way_context *ctx, void *out, const void *data,
// Init
H[ 0] = m512_const1_64( 0x8081828384858687 );
H[ 1] = m512_const1_64( 0x88898A8B8C8D8E8F );
H[ 2] = m512_const1_64( 0x9091929394959697 );
H[ 3] = m512_const1_64( 0x98999A9B9C9D9E9F );
H[ 4] = m512_const1_64( 0xA0A1A2A3A4A5A6A7 );
H[ 5] = m512_const1_64( 0xA8A9AAABACADAEAF );
H[ 6] = m512_const1_64( 0xB0B1B2B3B4B5B6B7 );
H[ 7] = m512_const1_64( 0xB8B9BABBBCBDBEBF );
H[ 8] = m512_const1_64( 0xC0C1C2C3C4C5C6C7 );
H[ 9] = m512_const1_64( 0xC8C9CACBCCCDCECF );
H[10] = m512_const1_64( 0xD0D1D2D3D4D5D6D7 );
H[11] = m512_const1_64( 0xD8D9DADBDCDDDEDF );
H[12] = m512_const1_64( 0xE0E1E2E3E4E5E6E7 );
H[13] = m512_const1_64( 0xE8E9EAEBECEDEEEF );
H[14] = m512_const1_64( 0xF0F1F2F3F4F5F6F7 );
H[15] = m512_const1_64( 0xF8F9FAFBFCFDFEFF );
H[ 0] = _mm512_set1_epi64( 0x8081828384858687 );
H[ 1] = _mm512_set1_epi64( 0x88898A8B8C8D8E8F );
H[ 2] = _mm512_set1_epi64( 0x9091929394959697 );
H[ 3] = _mm512_set1_epi64( 0x98999A9B9C9D9E9F );
H[ 4] = _mm512_set1_epi64( 0xA0A1A2A3A4A5A6A7 );
H[ 5] = _mm512_set1_epi64( 0xA8A9AAABACADAEAF );
H[ 6] = _mm512_set1_epi64( 0xB0B1B2B3B4B5B6B7 );
H[ 7] = _mm512_set1_epi64( 0xB8B9BABBBCBDBEBF );
H[ 8] = _mm512_set1_epi64( 0xC0C1C2C3C4C5C6C7 );
H[ 9] = _mm512_set1_epi64( 0xC8C9CACBCCCDCECF );
H[10] = _mm512_set1_epi64( 0xD0D1D2D3D4D5D6D7 );
H[11] = _mm512_set1_epi64( 0xD8D9DADBDCDDDEDF );
H[12] = _mm512_set1_epi64( 0xE0E1E2E3E4E5E6E7 );
H[13] = _mm512_set1_epi64( 0xE8E9EAEBECEDEEEF );
H[14] = _mm512_set1_epi64( 0xF0F1F2F3F4F5F6F7 );
H[15] = _mm512_set1_epi64( 0xF8F9FAFBFCFDFEFF );
// Update
@@ -1502,7 +1530,7 @@ void bmw512_8way_full( bmw512_8way_context *ctx, void *out, const void *data,
__m512i h1[16], h2[16];
size_t u, v;
buf[ ptr>>3 ] = m512_const1_64( 0x80 );
buf[ ptr>>3 ] = _mm512_set1_epi64( 0x80 );
ptr += 8;
if ( ptr > (buf_size - 8) )

View File

@@ -41,7 +41,7 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
/**
* Output size (in bits) for BMW-224.

View File

@@ -54,14 +54,12 @@ static void transform_4way( cube_4way_context *sp )
x5 = _mm512_add_epi32( x1, x5 );
x6 = _mm512_add_epi32( x2, x6 );
x7 = _mm512_add_epi32( x3, x7 );
y0 = x0;
y1 = x1;
x0 = mm512_rol_32( x2, 7 );
x1 = mm512_rol_32( x3, 7 );
x2 = mm512_rol_32( y0, 7 );
x3 = mm512_rol_32( y1, 7 );
x0 = _mm512_xor_si512( x0, x4 );
x1 = _mm512_xor_si512( x1, x5 );
y0 = mm512_rol_32( x2, 7 );
y1 = mm512_rol_32( x3, 7 );
x2 = mm512_rol_32( x0, 7 );
x3 = mm512_rol_32( x1, 7 );
x0 = _mm512_xor_si512( y0, x4 );
x1 = _mm512_xor_si512( y1, x5 );
x2 = _mm512_xor_si512( x2, x6 );
x3 = _mm512_xor_si512( x3, x7 );
x4 = mm512_swap128_64( x4 );
@@ -72,15 +70,13 @@ static void transform_4way( cube_4way_context *sp )
x5 = _mm512_add_epi32( x1, x5 );
x6 = _mm512_add_epi32( x2, x6 );
x7 = _mm512_add_epi32( x3, x7 );
y0 = x0;
y1 = x2;
x0 = mm512_rol_32( x1, 11 );
x1 = mm512_rol_32( y0, 11 );
x2 = mm512_rol_32( x3, 11 );
x3 = mm512_rol_32( y1, 11 );
x0 = _mm512_xor_si512( x0, x4 );
y0 = mm512_rol_32( x1, 11 );
x1 = mm512_rol_32( x0, 11 );
y1 = mm512_rol_32( x3, 11 );
x3 = mm512_rol_32( x2, 11 );
x0 = _mm512_xor_si512( y0, x4 );
x1 = _mm512_xor_si512( x1, x5 );
x2 = _mm512_xor_si512( x2, x6 );
x2 = _mm512_xor_si512( y1, x6 );
x3 = _mm512_xor_si512( x3, x7 );
x4 = mm512_swap64_32( x4 );
x5 = mm512_swap64_32( x5 );
@@ -131,83 +127,67 @@ static void transform_4way_2buf( cube_4way_2buf_context *sp )
{
x4 = _mm512_add_epi32( x0, x4 );
y4 = _mm512_add_epi32( y0, y4 );
tx0 = x0;
ty0 = y0;
x5 = _mm512_add_epi32( x1, x5 );
y5 = _mm512_add_epi32( y1, y5 );
tx1 = x1;
ty1 = y1;
x0 = mm512_rol_32( x2, 7 );
y0 = mm512_rol_32( y2, 7 );
tx0 = mm512_rol_32( x2, 7 );
ty0 = mm512_rol_32( y2, 7 );
tx1 = mm512_rol_32( x3, 7 );
ty1 = mm512_rol_32( y3, 7 );
x6 = _mm512_add_epi32( x2, x6 );
y6 = _mm512_add_epi32( y2, y6 );
x1 = mm512_rol_32( x3, 7 );
y1 = mm512_rol_32( y3, 7 );
y6 = _mm512_add_epi32( y2, y6 );
x7 = _mm512_add_epi32( x3, x7 );
y7 = _mm512_add_epi32( y3, y7 );
x2 = mm512_rol_32( tx0, 7 );
y2 = mm512_rol_32( ty0, 7 );
x0 = _mm512_xor_si512( x0, x4 );
y0 = _mm512_xor_si512( y0, y4 );
x2 = mm512_rol_32( x0, 7 );
y2 = mm512_rol_32( y0, 7 );
x3 = mm512_rol_32( x1, 7 );
y3 = mm512_rol_32( y1, 7 );
x0 = _mm512_xor_si512( tx0, x4 );
y0 = _mm512_xor_si512( ty0, y4 );
x1 = _mm512_xor_si512( tx1, x5 );
y1 = _mm512_xor_si512( ty1, y5 );
x4 = mm512_swap128_64( x4 );
x3 = mm512_rol_32( tx1, 7 );
y3 = mm512_rol_32( ty1, 7 );
y4 = mm512_swap128_64( y4 );
x1 = _mm512_xor_si512( x1, x5 );
y1 = _mm512_xor_si512( y1, y5 );
x5 = mm512_swap128_64( x5 );
y5 = mm512_swap128_64( y5 );
x2 = _mm512_xor_si512( x2, x6 );
y2 = _mm512_xor_si512( y2, y6 );
y5 = mm512_swap128_64( y5 );
x3 = _mm512_xor_si512( x3, x7 );
y3 = _mm512_xor_si512( y3, y7 );
x6 = mm512_swap128_64( x6 );
y6 = mm512_swap128_64( y6 );
x7 = mm512_swap128_64( x7 );
y7 = mm512_swap128_64( y7 );
x4 = _mm512_add_epi32( x0, x4 );
y4 = _mm512_add_epi32( y0, y4 );
y6 = mm512_swap128_64( y6 );
x5 = _mm512_add_epi32( x1, x5 );
y5 = _mm512_add_epi32( y1, y5 );
x7 = mm512_swap128_64( x7 );
tx0 = mm512_rol_32( x1, 11 );
ty0 = mm512_rol_32( y1, 11 );
tx1 = mm512_rol_32( x3, 11 );
ty1 = mm512_rol_32( y3, 11 );
x6 = _mm512_add_epi32( x2, x6 );
y6 = _mm512_add_epi32( y2, y6 );
tx0 = x0;
ty0 = y0;
y7 = mm512_swap128_64( y7 );
tx1 = x2;
ty1 = y2;
x0 = mm512_rol_32( x1, 11 );
y0 = mm512_rol_32( y1, 11 );
x7 = _mm512_add_epi32( x3, x7 );
y7 = _mm512_add_epi32( y3, y7 );
x1 = mm512_rol_32( tx0, 11 );
y1 = mm512_rol_32( ty0, 11 );
x0 = _mm512_xor_si512( x0, x4 );
x4 = mm512_swap64_32( x4 );
y0 = _mm512_xor_si512( y0, y4 );
x2 = mm512_rol_32( x3, 11 );
y4 = mm512_swap64_32( y4 );
y2 = mm512_rol_32( y3, 11 );
x1 = mm512_rol_32( x0, 11 );
y1 = mm512_rol_32( y0, 11 );
x3 = mm512_rol_32( x2, 11 );
y3 = mm512_rol_32( y2, 11 );
x0 = _mm512_xor_si512( tx0, x4 );
y0 = _mm512_xor_si512( ty0, y4 );
x1 = _mm512_xor_si512( x1, x5 );
x5 = mm512_swap64_32( x5 );
y1 = _mm512_xor_si512( y1, y5 );
x3 = mm512_rol_32( tx1, 11 );
x4 = mm512_swap64_32( x4 );
y4 = mm512_swap64_32( y4 );
x5 = mm512_swap64_32( x5 );
y5 = mm512_swap64_32( y5 );
y3 = mm512_rol_32( ty1, 11 );
x2 = _mm512_xor_si512( x2, x6 );
x6 = mm512_swap64_32( x6 );
y2 = _mm512_xor_si512( y2, y6 );
y6 = mm512_swap64_32( y6 );
x2 = _mm512_xor_si512( tx1, x6 );
y2 = _mm512_xor_si512( ty1, y6 );
x3 = _mm512_xor_si512( x3, x7 );
x7 = mm512_swap64_32( x7 );
y3 = _mm512_xor_si512( y3, y7 );
x6 = mm512_swap64_32( x6 );
y6 = mm512_swap64_32( y6 );
x7 = mm512_swap64_32( x7 );
y7 = mm512_swap64_32( y7 );
}
@@ -241,22 +221,14 @@ int cube_4way_init( cube_4way_context *sp, int hashbitlen, int rounds,
sp->rounds = rounds;
sp->pos = 0;
h[ 0] = m512_const1_128( iv[0] );
h[ 1] = m512_const1_128( iv[1] );
h[ 2] = m512_const1_128( iv[2] );
h[ 3] = m512_const1_128( iv[3] );
h[ 4] = m512_const1_128( iv[4] );
h[ 5] = m512_const1_128( iv[5] );
h[ 6] = m512_const1_128( iv[6] );
h[ 7] = m512_const1_128( iv[7] );
h[ 0] = m512_const1_128( iv[0] );
h[ 1] = m512_const1_128( iv[1] );
h[ 2] = m512_const1_128( iv[2] );
h[ 3] = m512_const1_128( iv[3] );
h[ 4] = m512_const1_128( iv[4] );
h[ 5] = m512_const1_128( iv[5] );
h[ 6] = m512_const1_128( iv[6] );
h[ 7] = m512_const1_128( iv[7] );
h[ 0] = mm512_bcast_m128( iv[0] );
h[ 1] = mm512_bcast_m128( iv[1] );
h[ 2] = mm512_bcast_m128( iv[2] );
h[ 3] = mm512_bcast_m128( iv[3] );
h[ 4] = mm512_bcast_m128( iv[4] );
h[ 5] = mm512_bcast_m128( iv[5] );
h[ 6] = mm512_bcast_m128( iv[6] );
h[ 7] = mm512_bcast_m128( iv[7] );
return 0;
}
@@ -287,11 +259,11 @@ int cube_4way_close( cube_4way_context *sp, void *output )
// pos is zero for 64 byte data, 1 for 80 byte data.
sp->h[ sp->pos ] = _mm512_xor_si512( sp->h[ sp->pos ],
m512_const2_64( 0, 0x0000000000000080 ) );
mm512_bcast128lo_64( 0x0000000000000080 ) );
transform_4way( sp );
sp->h[7] = _mm512_xor_si512( sp->h[7],
m512_const2_64( 0x0000000100000000, 0 ) );
mm512_bcast128hi_64( 0x0000000100000000 ) );
for ( i = 0; i < 10; ++i )
transform_4way( sp );
@@ -311,14 +283,14 @@ int cube_4way_full( cube_4way_context *sp, void *output, int hashbitlen,
sp->rounds = 16;
sp->pos = 0;
h[ 0] = m512_const1_128( iv[0] );
h[ 1] = m512_const1_128( iv[1] );
h[ 2] = m512_const1_128( iv[2] );
h[ 3] = m512_const1_128( iv[3] );
h[ 4] = m512_const1_128( iv[4] );
h[ 5] = m512_const1_128( iv[5] );
h[ 6] = m512_const1_128( iv[6] );
h[ 7] = m512_const1_128( iv[7] );
h[ 0] = mm512_bcast_m128( iv[0] );
h[ 1] = mm512_bcast_m128( iv[1] );
h[ 2] = mm512_bcast_m128( iv[2] );
h[ 3] = mm512_bcast_m128( iv[3] );
h[ 4] = mm512_bcast_m128( iv[4] );
h[ 5] = mm512_bcast_m128( iv[5] );
h[ 6] = mm512_bcast_m128( iv[6] );
h[ 7] = mm512_bcast_m128( iv[7] );
const int len = size >> 4;
const __m512i *in = (__m512i*)data;
@@ -338,11 +310,11 @@ int cube_4way_full( cube_4way_context *sp, void *output, int hashbitlen,
// pos is zero for 64 byte data, 1 for 80 byte data.
sp->h[ sp->pos ] = _mm512_xor_si512( sp->h[ sp->pos ],
m512_const2_64( 0, 0x0000000000000080 ) );
mm512_bcast128lo_64( 0x0000000000000080 ) );
transform_4way( sp );
sp->h[7] = _mm512_xor_si512( sp->h[7],
m512_const2_64( 0x0000000100000000, 0 ) );
mm512_bcast128hi_64( 0x0000000100000000 ) );
for ( i = 0; i < 10; ++i )
transform_4way( sp );
@@ -364,14 +336,14 @@ int cube_4way_2buf_full( cube_4way_2buf_context *sp,
sp->rounds = 16;
sp->pos = 0;
h1[0] = h0[0] = m512_const1_128( iv[0] );
h1[1] = h0[1] = m512_const1_128( iv[1] );
h1[2] = h0[2] = m512_const1_128( iv[2] );
h1[3] = h0[3] = m512_const1_128( iv[3] );
h1[4] = h0[4] = m512_const1_128( iv[4] );
h1[5] = h0[5] = m512_const1_128( iv[5] );
h1[6] = h0[6] = m512_const1_128( iv[6] );
h1[7] = h0[7] = m512_const1_128( iv[7] );
h1[0] = h0[0] = mm512_bcast_m128( iv[0] );
h1[1] = h0[1] = mm512_bcast_m128( iv[1] );
h1[2] = h0[2] = mm512_bcast_m128( iv[2] );
h1[3] = h0[3] = mm512_bcast_m128( iv[3] );
h1[4] = h0[4] = mm512_bcast_m128( iv[4] );
h1[5] = h0[5] = mm512_bcast_m128( iv[5] );
h1[6] = h0[6] = mm512_bcast_m128( iv[6] );
h1[7] = h0[7] = mm512_bcast_m128( iv[7] );
const int len = size >> 4;
const __m512i *in0 = (__m512i*)data0;
@@ -393,13 +365,13 @@ int cube_4way_2buf_full( cube_4way_2buf_context *sp,
}
// pos is zero for 64 byte data, 1 for 80 byte data.
__m512i tmp = m512_const2_64( 0, 0x0000000000000080 );
__m512i tmp = mm512_bcast128lo_64( 0x0000000000000080 );
sp->h0[ sp->pos ] = _mm512_xor_si512( sp->h0[ sp->pos ], tmp );
sp->h1[ sp->pos ] = _mm512_xor_si512( sp->h1[ sp->pos ], tmp );
transform_4way_2buf( sp );
tmp = m512_const2_64( 0x0000000100000000, 0 );
tmp = mm512_bcast128hi_64( 0x0000000100000000 );
sp->h0[7] = _mm512_xor_si512( sp->h0[7], tmp );
sp->h1[7] = _mm512_xor_si512( sp->h1[7], tmp );
@@ -412,7 +384,6 @@ int cube_4way_2buf_full( cube_4way_2buf_context *sp,
return 0;
}
int cube_4way_update_close( cube_4way_context *sp, void *output,
const void *data, size_t size )
{
@@ -434,11 +405,11 @@ int cube_4way_update_close( cube_4way_context *sp, void *output,
// pos is zero for 64 byte data, 1 for 80 byte data.
sp->h[ sp->pos ] = _mm512_xor_si512( sp->h[ sp->pos ],
m512_const2_64( 0, 0x0000000000000080 ) );
mm512_bcast128lo_64( 0x0000000000000080 ) );
transform_4way( sp );
sp->h[7] = _mm512_xor_si512( sp->h[7],
m512_const2_64( 0x0000000100000000, 0 ) );
mm512_bcast128hi_64( 0x0000000100000000 ) );
for ( i = 0; i < 10; ++i )
transform_4way( sp );
@@ -452,21 +423,6 @@ int cube_4way_update_close( cube_4way_context *sp, void *output,
// 2 way 128
// This isn't expected to be used with AVX512 so HW rotate intruction
// is assumed not avaiable.
// Use double buffering to optimize serial bit rotations. Full double
// buffering isn't practical because it needs twice as many registers
// with AVX2 having only half as many as AVX512.
#define ROL2( out0, out1, in0, in1, c ) \
{ \
__m256i t0 = _mm256_slli_epi32( in0, c ); \
__m256i t1 = _mm256_slli_epi32( in1, c ); \
out0 = _mm256_srli_epi32( in0, 32-(c) ); \
out1 = _mm256_srli_epi32( in1, 32-(c) ); \
out0 = _mm256_or_si256( out0, t0 ); \
out1 = _mm256_or_si256( out1, t1 ); \
}
static void transform_2way( cube_2way_context *sp )
{
int r;
@@ -489,33 +445,33 @@ static void transform_2way( cube_2way_context *sp )
x5 = _mm256_add_epi32( x1, x5 );
x6 = _mm256_add_epi32( x2, x6 );
x7 = _mm256_add_epi32( x3, x7 );
y0 = x0;
y1 = x1;
ROL2( x0, x1, x2, x3, 7 );
ROL2( x2, x3, y0, y1, 7 );
x0 = _mm256_xor_si256( x0, x4 );
y0 = mm256_rol_32( x2, 7 );
y1 = mm256_rol_32( x3, 7 );
x2 = mm256_rol_32( x0, 7 );
x3 = mm256_rol_32( x1, 7 );
x0 = _mm256_xor_si256( y0, x4 );
x1 = _mm256_xor_si256( y1, x5 );
x2 = _mm256_xor_si256( x2, x6 );
x3 = _mm256_xor_si256( x3, x7 );
x4 = mm256_swap128_64( x4 );
x1 = _mm256_xor_si256( x1, x5 );
x2 = _mm256_xor_si256( x2, x6 );
x5 = mm256_swap128_64( x5 );
x3 = _mm256_xor_si256( x3, x7 );
x4 = _mm256_add_epi32( x0, x4 );
x6 = mm256_swap128_64( x6 );
y0 = x0;
x5 = _mm256_add_epi32( x1, x5 );
x7 = mm256_swap128_64( x7 );
x4 = _mm256_add_epi32( x0, x4 );
x5 = _mm256_add_epi32( x1, x5 );
x6 = _mm256_add_epi32( x2, x6 );
y1 = x2;
ROL2( x0, x1, x1, y0, 11 );
x7 = _mm256_add_epi32( x3, x7 );
ROL2( x2, x3, x3, y1, 11 );
x0 = _mm256_xor_si256( x0, x4 );
x4 = mm256_swap64_32( x4 );
y0 = mm256_rol_32( x1, 11 );
x1 = mm256_rol_32( x0, 11 );
y1 = mm256_rol_32( x3, 11 );
x3 = mm256_rol_32( x2, 11 );
x0 = _mm256_xor_si256( y0, x4 );
x1 = _mm256_xor_si256( x1, x5 );
x5 = mm256_swap64_32( x5 );
x2 = _mm256_xor_si256( x2, x6 );
x6 = mm256_swap64_32( x6 );
x2 = _mm256_xor_si256( y1, x6 );
x3 = _mm256_xor_si256( x3, x7 );
x4 = mm256_swap64_32( x4 );
x5 = mm256_swap64_32( x5 );
x6 = mm256_swap64_32( x6 );
x7 = mm256_swap64_32( x7 );
}
@@ -540,27 +496,18 @@ int cube_2way_init( cube_2way_context *sp, int hashbitlen, int rounds,
sp->rounds = rounds;
sp->pos = 0;
h[ 0] = m256_const1_128( iv[0] );
h[ 1] = m256_const1_128( iv[1] );
h[ 2] = m256_const1_128( iv[2] );
h[ 3] = m256_const1_128( iv[3] );
h[ 4] = m256_const1_128( iv[4] );
h[ 5] = m256_const1_128( iv[5] );
h[ 6] = m256_const1_128( iv[6] );
h[ 7] = m256_const1_128( iv[7] );
h[ 0] = m256_const1_128( iv[0] );
h[ 1] = m256_const1_128( iv[1] );
h[ 2] = m256_const1_128( iv[2] );
h[ 3] = m256_const1_128( iv[3] );
h[ 4] = m256_const1_128( iv[4] );
h[ 5] = m256_const1_128( iv[5] );
h[ 6] = m256_const1_128( iv[6] );
h[ 7] = m256_const1_128( iv[7] );
h[ 0] = mm256_bcast_m128( iv[0] );
h[ 1] = mm256_bcast_m128( iv[1] );
h[ 2] = mm256_bcast_m128( iv[2] );
h[ 3] = mm256_bcast_m128( iv[3] );
h[ 4] = mm256_bcast_m128( iv[4] );
h[ 5] = mm256_bcast_m128( iv[5] );
h[ 6] = mm256_bcast_m128( iv[6] );
h[ 7] = mm256_bcast_m128( iv[7] );
return 0;
}
int cube_2way_update( cube_2way_context *sp, const void *data, size_t size )
{
const int len = size >> 4;
@@ -587,13 +534,14 @@ int cube_2way_close( cube_2way_context *sp, void *output )
// pos is zero for 64 byte data, 1 for 80 byte data.
sp->h[ sp->pos ] = _mm256_xor_si256( sp->h[ sp->pos ],
m256_const2_64( 0, 0x0000000000000080 ) );
mm256_bcast128lo_64( 0x0000000000000080 ) );
transform_2way( sp );
sp->h[7] = _mm256_xor_si256( sp->h[7],
m256_const2_64( 0x0000000100000000, 0 ) );
mm256_bcast128hi_64( 0x0000000100000000 ) );
for ( i = 0; i < 10; ++i ) transform_2way( sp );
for ( i = 0; i < 10; ++i )
transform_2way( sp );
memcpy( hash, sp->h, sp->hashlen<<5 );
return 0;
@@ -620,13 +568,14 @@ int cube_2way_update_close( cube_2way_context *sp, void *output,
// pos is zero for 64 byte data, 1 for 80 byte data.
sp->h[ sp->pos ] = _mm256_xor_si256( sp->h[ sp->pos ],
m256_const2_64( 0, 0x0000000000000080 ) );
mm256_bcast128lo_64( 0x0000000000000080 ) );
transform_2way( sp );
sp->h[7] = _mm256_xor_si256( sp->h[7],
m256_const2_64( 0x0000000100000000, 0 ) );
mm256_bcast128hi_64( 0x0000000100000000 ) );
for ( i = 0; i < 10; ++i ) transform_2way( sp );
for ( i = 0; i < 10; ++i )
transform_2way( sp );
memcpy( hash, sp->h, sp->hashlen<<5 );
return 0;
@@ -643,14 +592,14 @@ int cube_2way_full( cube_2way_context *sp, void *output, int hashbitlen,
sp->rounds = 16;
sp->pos = 0;
h[ 0] = m256_const1_128( iv[0] );
h[ 1] = m256_const1_128( iv[1] );
h[ 2] = m256_const1_128( iv[2] );
h[ 3] = m256_const1_128( iv[3] );
h[ 4] = m256_const1_128( iv[4] );
h[ 5] = m256_const1_128( iv[5] );
h[ 6] = m256_const1_128( iv[6] );
h[ 7] = m256_const1_128( iv[7] );
h[ 0] = mm256_bcast_m128( iv[0] );
h[ 1] = mm256_bcast_m128( iv[1] );
h[ 2] = mm256_bcast_m128( iv[2] );
h[ 3] = mm256_bcast_m128( iv[3] );
h[ 4] = mm256_bcast_m128( iv[4] );
h[ 5] = mm256_bcast_m128( iv[5] );
h[ 6] = mm256_bcast_m128( iv[6] );
h[ 7] = mm256_bcast_m128( iv[7] );
const int len = size >> 4;
const __m256i *in = (__m256i*)data;
@@ -670,13 +619,14 @@ int cube_2way_full( cube_2way_context *sp, void *output, int hashbitlen,
// pos is zero for 64 byte data, 1 for 80 byte data.
sp->h[ sp->pos ] = _mm256_xor_si256( sp->h[ sp->pos ],
m256_const2_64( 0, 0x0000000000000080 ) );
mm256_bcast128lo_64( 0x0000000000000080 ) );
transform_2way( sp );
sp->h[7] = _mm256_xor_si256( sp->h[7],
m256_const2_64( 0x0000000100000000, 0 ) );
mm256_bcast128hi_64( 0x0000000100000000 ) );
for ( i = 0; i < 10; ++i ) transform_2way( sp );
for ( i = 0; i < 10; ++i )
transform_2way( sp );
memcpy( hash, sp->h, sp->hashlen<<5 );
return 0;

View File

@@ -9,7 +9,6 @@
#include <immintrin.h>
#endif
#include "cubehash_sse2.h"
#include "algo/sha/sha3-defs.h"
#include <stdbool.h>
#include <unistd.h>
#include <memory.h>
@@ -32,7 +31,7 @@ static void transform( cubehashParam *sp )
{
x1 = _mm512_add_epi32( x0, x1 );
x0 = mm512_swap_256( x0 );
x0 = mm512_rol_32( x0, 7 );
x0 = mm512_rol_32( x0, 7 );
x0 = _mm512_xor_si512( x0, x1 );
x1 = mm512_swap128_64( x1 );
x1 = _mm512_add_epi32( x0, x1 );
@@ -58,19 +57,18 @@ static void transform( cubehashParam *sp )
{
x2 = _mm256_add_epi32( x0, x2 );
x3 = _mm256_add_epi32( x1, x3 );
y0 = x0;
x0 = mm256_rol_32( x1, 7 );
x1 = mm256_rol_32( y0, 7 );
x0 = _mm256_xor_si256( x0, x2 );
x1 = _mm256_xor_si256( x1, x3 );
y0 = mm256_rol_32( x1, 7 );
y1 = mm256_rol_32( x0, 7 );
x0 = _mm256_xor_si256( y0, x2 );
x1 = _mm256_xor_si256( y1, x3 );
x2 = mm256_swap128_64( x2 );
x3 = mm256_swap128_64( x3 );
x2 = _mm256_add_epi32( x0, x2 );
x3 = _mm256_add_epi32( x1, x3 );
y0 = mm256_swap_128( x0 );
y1 = mm256_swap_128( x1 );
x0 = mm256_rol_32( y0, 11 );
x1 = mm256_rol_32( y1, 11 );
x0 = mm256_swap_128( x0 );
x1 = mm256_swap_128( x1 );
x0 = mm256_rol_32( x0, 11 );
x1 = mm256_rol_32( x1, 11 );
x0 = _mm256_xor_si256( x0, x2 );
x1 = _mm256_xor_si256( x1, x3 );
x2 = mm256_swap64_32( x2 );
@@ -94,47 +92,48 @@ static void transform( cubehashParam *sp )
x6 = _mm_load_si128( (__m128i*)sp->x + 6 );
x7 = _mm_load_si128( (__m128i*)sp->x + 7 );
for (r = 0; r < rounds; ++r) {
x4 = _mm_add_epi32(x0, x4);
x5 = _mm_add_epi32(x1, x5);
x6 = _mm_add_epi32(x2, x6);
x7 = _mm_add_epi32(x3, x7);
y0 = x2;
y1 = x3;
y2 = x0;
y3 = x1;
x0 = _mm_xor_si128(_mm_slli_epi32(y0, 7), _mm_srli_epi32(y0, 25));
x1 = _mm_xor_si128(_mm_slli_epi32(y1, 7), _mm_srli_epi32(y1, 25));
x2 = _mm_xor_si128(_mm_slli_epi32(y2, 7), _mm_srli_epi32(y2, 25));
x3 = _mm_xor_si128(_mm_slli_epi32(y3, 7), _mm_srli_epi32(y3, 25));
x0 = _mm_xor_si128(x0, x4);
x1 = _mm_xor_si128(x1, x5);
x2 = _mm_xor_si128(x2, x6);
x3 = _mm_xor_si128(x3, x7);
x4 = _mm_shuffle_epi32(x4, 0x4e);
x5 = _mm_shuffle_epi32(x5, 0x4e);
x6 = _mm_shuffle_epi32(x6, 0x4e);
x7 = _mm_shuffle_epi32(x7, 0x4e);
x4 = _mm_add_epi32(x0, x4);
x5 = _mm_add_epi32(x1, x5);
x6 = _mm_add_epi32(x2, x6);
x7 = _mm_add_epi32(x3, x7);
y0 = x1;
y1 = x0;
y2 = x3;
y3 = x2;
x0 = _mm_xor_si128(_mm_slli_epi32(y0, 11), _mm_srli_epi32(y0, 21));
x1 = _mm_xor_si128(_mm_slli_epi32(y1, 11), _mm_srli_epi32(y1, 21));
x2 = _mm_xor_si128(_mm_slli_epi32(y2, 11), _mm_srli_epi32(y2, 21));
x3 = _mm_xor_si128(_mm_slli_epi32(y3, 11), _mm_srli_epi32(y3, 21));
x0 = _mm_xor_si128(x0, x4);
x1 = _mm_xor_si128(x1, x5);
x2 = _mm_xor_si128(x2, x6);
x3 = _mm_xor_si128(x3, x7);
x4 = _mm_shuffle_epi32(x4, 0xb1);
x5 = _mm_shuffle_epi32(x5, 0xb1);
x6 = _mm_shuffle_epi32(x6, 0xb1);
x7 = _mm_shuffle_epi32(x7, 0xb1);
for ( r = 0; r < rounds; ++r )
{
x4 = _mm_add_epi32( x0, x4 );
x5 = _mm_add_epi32( x1, x5 );
x6 = _mm_add_epi32( x2, x6 );
x7 = _mm_add_epi32( x3, x7 );
y0 = x2;
y1 = x3;
y2 = x0;
y3 = x1;
x0 = mm128_rol_32( y0, 7 );
x1 = mm128_rol_32( y1, 7 );
x2 = mm128_rol_32( y2, 7 );
x3 = mm128_rol_32( y3, 7 );
x0 = _mm_xor_si128( x0, x4 );
x1 = _mm_xor_si128( x1, x5 );
x2 = _mm_xor_si128( x2, x6 );
x3 = _mm_xor_si128( x3, x7 );
x4 = _mm_shuffle_epi32( x4, 0x4e );
x5 = _mm_shuffle_epi32( x5, 0x4e );
x6 = _mm_shuffle_epi32( x6, 0x4e );
x7 = _mm_shuffle_epi32( x7, 0x4e );
x4 = _mm_add_epi32( x0, x4 );
x5 = _mm_add_epi32( x1, x5 );
x6 = _mm_add_epi32( x2, x6 );
x7 = _mm_add_epi32( x3, x7 );
y0 = x1;
y1 = x0;
y2 = x3;
y3 = x2;
x0 = mm128_rol_32( y0, 11 );
x1 = mm128_rol_32( y1, 11 );
x2 = mm128_rol_32( y2, 11 );
x3 = mm128_rol_32( y3, 11 );
x0 = _mm_xor_si128( x0, x4 );
x1 = _mm_xor_si128( x1, x5 );
x2 = _mm_xor_si128( x2, x6 );
x3 = _mm_xor_si128( x3, x7 );
x4 = _mm_shuffle_epi32( x4, 0xb1 );
x5 = _mm_shuffle_epi32( x5, 0xb1 );
x6 = _mm_shuffle_epi32( x6, 0xb1 );
x7 = _mm_shuffle_epi32( x7, 0xb1 );
}
_mm_store_si128( (__m128i*)sp->x, x0 );
@@ -180,25 +179,25 @@ int cubehashInit(cubehashParam *sp, int hashbitlen, int rounds, int blockbytes)
if ( hashbitlen == 512 )
{
x[0] = m128_const_64( 0x4167D83E2D538B8B, 0x50F494D42AEA2A61 );
x[1] = m128_const_64( 0x50AC5695CC39968E, 0xC701CF8C3FEE2313 );
x[2] = m128_const_64( 0x825B453797CF0BEF, 0xA647A8B34D42C787 );
x[3] = m128_const_64( 0xA23911AED0E5CD33, 0xF22090C4EEF864D2 );
x[4] = m128_const_64( 0xB64445321B017BEF, 0x148FE485FCD398D9 );
x[5] = m128_const_64( 0x0DBADEA991FA7934, 0x2FF5781C6A536159 );
x[6] = m128_const_64( 0xBC796576B1C62456, 0xA5A70E75D65C8A2B );
x[7] = m128_const_64( 0xD43E3B447795D246, 0xE7989AF11921C8F7 );
x[0] = _mm_set_epi64x( 0x4167D83E2D538B8B, 0x50F494D42AEA2A61 );
x[1] = _mm_set_epi64x( 0x50AC5695CC39968E, 0xC701CF8C3FEE2313 );
x[2] = _mm_set_epi64x( 0x825B453797CF0BEF, 0xA647A8B34D42C787 );
x[3] = _mm_set_epi64x( 0xA23911AED0E5CD33, 0xF22090C4EEF864D2 );
x[4] = _mm_set_epi64x( 0xB64445321B017BEF, 0x148FE485FCD398D9 );
x[5] = _mm_set_epi64x( 0x0DBADEA991FA7934, 0x2FF5781C6A536159 );
x[6] = _mm_set_epi64x( 0xBC796576B1C62456, 0xA5A70E75D65C8A2B );
x[7] = _mm_set_epi64x( 0xD43E3B447795D246, 0xE7989AF11921C8F7 );
}
else
{
x[0] = m128_const_64( 0x35481EAE63117E71, 0xCCD6F29FEA2BD4B4 );
x[1] = m128_const_64( 0xF4CC12BE7E624131, 0xE5D94E6322512D5B );
x[2] = m128_const_64( 0x3361DA8CD0720C35, 0x42AF2070C2D0B696 );
x[3] = m128_const_64( 0x40E5FBAB4680AC00, 0x8EF8AD8328CCECA4 );
x[4] = m128_const_64( 0xF0B266796C859D41, 0x6107FBD5D89041C3 );
x[5] = m128_const_64( 0x93CB628565C892FD, 0x5FA2560309392549 );
x[6] = m128_const_64( 0x85254725774ABFDD, 0x9E4B4E602AF2B5AE );
x[7] = m128_const_64( 0xD6032C0A9CDAF8AF, 0x4AB6AAD615815AEB );
x[0] = _mm_set_epi64x( 0x35481EAE63117E71, 0xCCD6F29FEA2BD4B4 );
x[1] = _mm_set_epi64x( 0xF4CC12BE7E624131, 0xE5D94E6322512D5B );
x[2] = _mm_set_epi64x( 0x3361DA8CD0720C35, 0x42AF2070C2D0B696 );
x[3] = _mm_set_epi64x( 0x40E5FBAB4680AC00, 0x8EF8AD8328CCECA4 );
x[4] = _mm_set_epi64x( 0xF0B266796C859D41, 0x6107FBD5D89041C3 );
x[5] = _mm_set_epi64x( 0x93CB628565C892FD, 0x5FA2560309392549 );
x[6] = _mm_set_epi64x( 0x85254725774ABFDD, 0x9E4B4E602AF2B5AE );
x[7] = _mm_set_epi64x( 0xD6032C0A9CDAF8AF, 0x4AB6AAD615815AEB );
}
return SUCCESS;
@@ -234,10 +233,10 @@ int cubehashDigest( cubehashParam *sp, byte *digest )
// pos is zero for 64 byte data, 1 for 80 byte data.
sp->x[ sp->pos ] = _mm_xor_si128( sp->x[ sp->pos ],
m128_const_64( 0, 0x80 ) );
_mm_set_epi64x( 0, 0x80 ) );
transform( sp );
sp->x[7] = _mm_xor_si128( sp->x[7], m128_const_64( 0x100000000, 0 ) );
sp->x[7] = _mm_xor_si128( sp->x[7], _mm_set_epi64x( 0x100000000, 0 ) );
transform( sp );
transform( sp );
transform( sp );
@@ -279,10 +278,10 @@ int cubehashUpdateDigest( cubehashParam *sp, byte *digest,
// pos is zero for 64 byte data, 1 for 80 byte data.
sp->x[ sp->pos ] = _mm_xor_si128( sp->x[ sp->pos ],
m128_const_64( 0, 0x80 ) );
_mm_set_epi64x( 0, 0x80 ) );
transform( sp );
sp->x[7] = _mm_xor_si128( sp->x[7], m128_const_64( 0x100000000, 0 ) );
sp->x[7] = _mm_xor_si128( sp->x[7], _mm_set_epi64x( 0x100000000, 0 ) );
transform( sp );
transform( sp );
@@ -313,25 +312,25 @@ int cubehash_full( cubehashParam *sp, byte *digest, int hashbitlen,
if ( hashbitlen == 512 )
{
x[0] = m128_const_64( 0x4167D83E2D538B8B, 0x50F494D42AEA2A61 );
x[1] = m128_const_64( 0x50AC5695CC39968E, 0xC701CF8C3FEE2313 );
x[2] = m128_const_64( 0x825B453797CF0BEF, 0xA647A8B34D42C787 );
x[3] = m128_const_64( 0xA23911AED0E5CD33, 0xF22090C4EEF864D2 );
x[4] = m128_const_64( 0xB64445321B017BEF, 0x148FE485FCD398D9 );
x[5] = m128_const_64( 0x0DBADEA991FA7934, 0x2FF5781C6A536159 );
x[6] = m128_const_64( 0xBC796576B1C62456, 0xA5A70E75D65C8A2B );
x[7] = m128_const_64( 0xD43E3B447795D246, 0xE7989AF11921C8F7 );
x[0] = _mm_set_epi64x( 0x4167D83E2D538B8B, 0x50F494D42AEA2A61 );
x[1] = _mm_set_epi64x( 0x50AC5695CC39968E, 0xC701CF8C3FEE2313 );
x[2] = _mm_set_epi64x( 0x825B453797CF0BEF, 0xA647A8B34D42C787 );
x[3] = _mm_set_epi64x( 0xA23911AED0E5CD33, 0xF22090C4EEF864D2 );
x[4] = _mm_set_epi64x( 0xB64445321B017BEF, 0x148FE485FCD398D9 );
x[5] = _mm_set_epi64x( 0x0DBADEA991FA7934, 0x2FF5781C6A536159 );
x[6] = _mm_set_epi64x( 0xBC796576B1C62456, 0xA5A70E75D65C8A2B );
x[7] = _mm_set_epi64x( 0xD43E3B447795D246, 0xE7989AF11921C8F7 );
}
else
{
x[0] = m128_const_64( 0x35481EAE63117E71, 0xCCD6F29FEA2BD4B4 );
x[1] = m128_const_64( 0xF4CC12BE7E624131, 0xE5D94E6322512D5B );
x[2] = m128_const_64( 0x3361DA8CD0720C35, 0x42AF2070C2D0B696 );
x[3] = m128_const_64( 0x40E5FBAB4680AC00, 0x8EF8AD8328CCECA4 );
x[4] = m128_const_64( 0xF0B266796C859D41, 0x6107FBD5D89041C3 );
x[5] = m128_const_64( 0x93CB628565C892FD, 0x5FA2560309392549 );
x[6] = m128_const_64( 0x85254725774ABFDD, 0x9E4B4E602AF2B5AE );
x[7] = m128_const_64( 0xD6032C0A9CDAF8AF, 0x4AB6AAD615815AEB );
x[0] = _mm_set_epi64x( 0x35481EAE63117E71, 0xCCD6F29FEA2BD4B4 );
x[1] = _mm_set_epi64x( 0xF4CC12BE7E624131, 0xE5D94E6322512D5B );
x[2] = _mm_set_epi64x( 0x3361DA8CD0720C35, 0x42AF2070C2D0B696 );
x[3] = _mm_set_epi64x( 0x40E5FBAB4680AC00, 0x8EF8AD8328CCECA4 );
x[4] = _mm_set_epi64x( 0xF0B266796C859D41, 0x6107FBD5D89041C3 );
x[5] = _mm_set_epi64x( 0x93CB628565C892FD, 0x5FA2560309392549 );
x[6] = _mm_set_epi64x( 0x85254725774ABFDD, 0x9E4B4E602AF2B5AE );
x[7] = _mm_set_epi64x( 0xD6032C0A9CDAF8AF, 0x4AB6AAD615815AEB );
}
@@ -358,10 +357,10 @@ int cubehash_full( cubehashParam *sp, byte *digest, int hashbitlen,
// pos is zero for 64 byte data, 1 for 80 byte data.
sp->x[ sp->pos ] = _mm_xor_si128( sp->x[ sp->pos ],
m128_const_64( 0, 0x80 ) );
_mm_set_epi64x( 0, 0x80 ) );
transform( sp );
sp->x[7] = _mm_xor_si128( sp->x[7], m128_const_64( 0x100000000, 0 ) );
sp->x[7] = _mm_xor_si128( sp->x[7], _mm_set_epi64x( 0x100000000, 0 ) );
transform( sp );
transform( sp );

View File

@@ -3,7 +3,7 @@
#include "compat.h"
#include <stdint.h>
#include "algo/sha/sha3-defs.h"
#include "compat/sha3-defs.h"
#define OPTIMIZE_SSE2
@@ -15,11 +15,11 @@
struct _cubehashParam
{
__m128i _ALIGN(64) x[8]; // aligned for __m512i
int hashlen; // __m128i
int rounds;
int blocksize; // __m128i
int pos; // number of __m128i read into x from current block
__m128i _ALIGN(64) x[8]; // aligned for __m256i
};
typedef struct _cubehashParam cubehashParam;

View File

@@ -42,7 +42,7 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
/**
* Output size (in bits) for CubeHash-224.

View File

@@ -566,16 +566,16 @@ HashReturn echo_full( hashState_echo *state, BitSequence *hashval,
state->uHashSize = 256;
state->uBlockLength = 192;
state->uRounds = 8;
state->hashsize = m128_const_64( 0, 0x100 );
state->const1536 = m128_const_64( 0, 0x600 );
state->hashsize = _mm_set_epi64x( 0, 0x100 );
state->const1536 = _mm_set_epi64x( 0, 0x600 );
break;
case 512:
state->uHashSize = 512;
state->uBlockLength = 128;
state->uRounds = 10;
state->hashsize = m128_const_64( 0, 0x200 );
state->const1536 = m128_const_64( 0, 0x400 );
state->hashsize = _mm_set_epi64x( 0, 0x200 );
state->const1536 = _mm_set_epi64x( 0, 0x400 );
break;
default:

View File

@@ -22,7 +22,7 @@
#endif
#include "algo/sha/sha3_common.h"
#include "compat/sha3_common.h"
#include <emmintrin.h>

View File

@@ -162,9 +162,9 @@ void echo_4way_compress( echo_4way_context *ctx, const __m512i *pmsg,
unsigned int r, b, i, j;
__m512i t1, t2, s2, k1;
__m512i _state[4][4], _state2[4][4], _statebackup[4][4];
__m512i one = m512_one_128;
__m512i mul2mask = m512_const2_64( 0, 0x00001b00 );
__m512i lsbmask = m512_const1_32( 0x01010101 );
const __m512i one = mm512_bcast128lo_64( 1 );
const __m512i mul2mask = mm512_bcast128lo_64( 0x00001b00 );
const __m512i lsbmask = _mm512_set1_epi32( 0x01010101 );
_state[ 0 ][ 0 ] = ctx->state[ 0 ][ 0 ];
_state[ 0 ][ 1 ] = ctx->state[ 0 ][ 1 ];
@@ -264,16 +264,16 @@ int echo_4way_init( echo_4way_context *ctx, int nHashSize )
ctx->uHashSize = 256;
ctx->uBlockLength = 192;
ctx->uRounds = 8;
ctx->hashsize = m512_const2_64( 0, 0x100 );
ctx->const1536 = m512_const2_64( 0, 0x600 );
ctx->hashsize = mm512_bcast128lo_64( 0x100 );
ctx->const1536 = mm512_bcast128lo_64( 0x600 );
break;
case 512:
ctx->uHashSize = 512;
ctx->uBlockLength = 128;
ctx->uRounds = 10;
ctx->hashsize = m512_const2_64( 0, 0x200 );
ctx->const1536 = m512_const2_64( 0, 0x400);
ctx->hashsize = mm512_bcast128lo_64( 0x200 );
ctx->const1536 = mm512_bcast128lo_64( 0x400);
break;
default:
@@ -305,7 +305,7 @@ int echo_4way_update_close( echo_4way_context *state, void *hashval,
{
echo_4way_compress( state, data, 1 );
state->processed_bits = 1024;
remainingbits = m512_const2_64( 0, -1024 );
remainingbits = mm512_bcast128lo_64( -1024 );
vlen = 0;
}
else
@@ -313,13 +313,15 @@ int echo_4way_update_close( echo_4way_context *state, void *hashval,
vlen = databitlen / 128; // * 4 lanes / 128 bits per lane
memcpy_512( state->buffer, data, vlen );
state->processed_bits += (unsigned int)( databitlen );
remainingbits = m512_const2_64( 0, (uint64_t)databitlen );
remainingbits = mm512_bcast128lo_64( (uint64_t)databitlen );
}
state->buffer[ vlen ] = m512_const2_64( 0, 0x80 );
state->buffer[ vlen ] = mm512_bcast128lo_64( 0x80 );
memset_zero_512( state->buffer + vlen + 1, vblen - vlen - 2 );
state->buffer[ vblen-2 ] = m512_const2_64( (uint64_t)state->uHashSize << 48, 0 );
state->buffer[ vblen-1 ] = m512_const2_64( 0, state->processed_bits);
state->buffer[ vblen-2 ] =
mm512_bcast128hi_64( (uint64_t)state->uHashSize << 48 );
state->buffer[ vblen-1 ] =
mm512_bcast128lo_64( state->processed_bits );
state->k = _mm512_add_epi64( state->k, remainingbits );
state->k = _mm512_sub_epi64( state->k, state->const1536 );
@@ -352,16 +354,16 @@ int echo_4way_full( echo_4way_context *ctx, void *hashval, int nHashSize,
ctx->uHashSize = 256;
ctx->uBlockLength = 192;
ctx->uRounds = 8;
ctx->hashsize = m512_const2_64( 0, 0x100 );
ctx->const1536 = m512_const2_64( 0, 0x600 );
ctx->hashsize = mm512_bcast128lo_64( 0x100 );
ctx->const1536 = mm512_bcast128lo_64( 0x600 );
break;
case 512:
ctx->uHashSize = 512;
ctx->uBlockLength = 128;
ctx->uRounds = 10;
ctx->hashsize = m512_const2_64( 0, 0x200 );
ctx->const1536 = m512_const2_64( 0, 0x400 );
ctx->hashsize = mm512_bcast128lo_64( 0x200 );
ctx->const1536 = mm512_bcast128lo_64( 0x400 );
break;
default:
@@ -388,7 +390,7 @@ int echo_4way_full( echo_4way_context *ctx, void *hashval, int nHashSize,
{
echo_4way_compress( ctx, data, 1 );
ctx->processed_bits = 1024;
remainingbits = m512_const2_64( 0, -1024 );
remainingbits = mm512_bcast128lo_64( -1024 );
vlen = 0;
}
else
@@ -396,14 +398,14 @@ int echo_4way_full( echo_4way_context *ctx, void *hashval, int nHashSize,
vlen = databitlen / 128; // * 4 lanes / 128 bits per lane
memcpy_512( ctx->buffer, data, vlen );
ctx->processed_bits += (unsigned int)( databitlen );
remainingbits = m512_const2_64( 0, databitlen );
remainingbits = mm512_bcast128lo_64( databitlen );
}
ctx->buffer[ vlen ] = m512_const2_64( 0, 0x80 );
ctx->buffer[ vlen ] = mm512_bcast128lo_64( 0x80 );
memset_zero_512( ctx->buffer + vlen + 1, vblen - vlen - 2 );
ctx->buffer[ vblen-2 ] =
m512_const2_64( (uint64_t)ctx->uHashSize << 48, 0 );
ctx->buffer[ vblen-1 ] = m512_const2_64( 0, ctx->processed_bits);
mm512_bcast128hi_64( (uint64_t)ctx->uHashSize << 48 );
ctx->buffer[ vblen-1 ] = mm512_bcast128lo_64( ctx->processed_bits);
ctx->k = _mm512_add_epi64( ctx->k, remainingbits );
ctx->k = _mm512_sub_epi64( ctx->k, ctx->const1536 );
@@ -425,9 +427,9 @@ int echo_4way_full( echo_4way_context *ctx, void *hashval, int nHashSize,
// AVX2 + VAES
#define mul2mask_2way m256_const2_64( 0, 0x0000000000001b00 )
#define mul2mask_2way mm256_bcast128lo_64( 0x0000000000001b00 )
#define lsbmask_2way m256_const1_32( 0x01010101 )
#define lsbmask_2way _mm256_set1_epi32( 0x01010101 )
#define ECHO_SUBBYTES4_2WAY( state, j ) \
state[0][j] = _mm256_aesenc_epi128( state[0][j], k1 ); \
@@ -467,8 +469,7 @@ int echo_4way_full( echo_4way_context *ctx, void *hashval, int nHashSize,
t1 = _mm256_and_si256( t1, lsbmask_2way ); \
t2 = _mm256_shuffle_epi8( mul2mask_2way, t1 ); \
s2 = _mm256_xor_si256( s2, t2 );\
state2[ 0 ][ j ] = _mm256_xor_si256( state2[ 0 ][ j ], \
_mm256_xor_si256( s2, state1[ 1 ][ j1 ] ) ); \
state2[ 0 ][ j ] = mm256_xor3( state2[ 0 ][ j ], s2, state1[ 1 ][ j1 ] ); \
state2[ 1 ][ j ] = _mm256_xor_si256( state2[ 1 ][ j ], s2 ); \
state2[ 2 ][ j ] = _mm256_xor_si256( state2[ 2 ][ j ], state1[ 1 ][ j1 ] ); \
state2[ 3 ][ j ] = _mm256_xor_si256( state2[ 3 ][ j ], state1[ 1 ][ j1 ] ); \
@@ -478,8 +479,7 @@ int echo_4way_full( echo_4way_context *ctx, void *hashval, int nHashSize,
t2 = _mm256_shuffle_epi8( mul2mask_2way, t1 ); \
s2 = _mm256_xor_si256( s2, t2 ); \
state2[ 0 ][ j ] = _mm256_xor_si256( state2[ 0 ][ j ], state1[ 2 ][ j2 ] ); \
state2[ 1 ][ j ] = _mm256_xor_si256( state2[ 1 ][ j ], \
_mm256_xor_si256( s2, state1[ 2 ][ j2 ] ) ); \
state2[ 1 ][ j ] = mm256_xor3( state2[ 1 ][ j ], s2, state1[ 2 ][ j2 ] ); \
state2[ 2 ][ j ] = _mm256_xor_si256( state2[ 2 ][ j ], s2 ); \
state2[ 3 ][ j ] = _mm256_xor_si256( state2[ 3][ j ], state1[ 2 ][ j2 ] ); \
s2 = _mm256_add_epi8( state1[ 3 ][ j3 ], state1[ 3 ][ j3 ] ); \
@@ -489,8 +489,7 @@ int echo_4way_full( echo_4way_context *ctx, void *hashval, int nHashSize,
s2 = _mm256_xor_si256( s2, t2 ); \
state2[ 0 ][ j ] = _mm256_xor_si256( state2[ 0 ][ j ], state1[ 3 ][ j3 ] ); \
state2[ 1 ][ j ] = _mm256_xor_si256( state2[ 1 ][ j ], state1[ 3 ][ j3 ] ); \
state2[ 2 ][ j ] = _mm256_xor_si256( state2[ 2 ][ j ], \
_mm256_xor_si256( s2, state1[ 3 ][ j3] ) ); \
state2[ 2 ][ j ] = mm256_xor3( state2[ 2 ][ j ], s2, state1[ 3 ][ j3] ); \
state2[ 3 ][ j ] = _mm256_xor_si256( state2[ 3 ][ j ], s2 ); \
} while(0)
@@ -679,16 +678,16 @@ int echo_2way_init( echo_2way_context *ctx, int nHashSize )
ctx->uHashSize = 256;
ctx->uBlockLength = 192;
ctx->uRounds = 8;
ctx->hashsize = m256_const2_64( 0, 0x100 );
ctx->const1536 = m256_const2_64( 0, 0x600 );
ctx->hashsize = mm256_bcast128lo_64( 0x100 );
ctx->const1536 = mm256_bcast128lo_64( 0x600 );
break;
case 512:
ctx->uHashSize = 512;
ctx->uBlockLength = 128;
ctx->uRounds = 10;
ctx->hashsize = m256_const2_64( 0, 0x200 );
ctx->const1536 = m256_const2_64( 0, 0x400 );
ctx->hashsize = mm256_bcast128lo_64( 0x200 );
ctx->const1536 = mm256_bcast128lo_64( 0x400 );
break;
default:
@@ -720,20 +719,20 @@ int echo_2way_update_close( echo_2way_context *state, void *hashval,
{
echo_2way_compress( state, data, 1 );
state->processed_bits = 1024;
remainingbits = m256_const2_64( 0, -1024 );
remainingbits = mm256_bcast128lo_64( -1024 );
vlen = 0;
}
else
{
memcpy_256( state->buffer, data, vlen );
state->processed_bits += (unsigned int)( databitlen );
remainingbits = m256_const2_64( 0, databitlen );
remainingbits = mm256_bcast128lo_64( databitlen );
}
state->buffer[ vlen ] = m256_const2_64( 0, 0x80 );
state->buffer[ vlen ] = mm256_bcast128lo_64( 0x80 );
memset_zero_256( state->buffer + vlen + 1, vblen - vlen - 2 );
state->buffer[ vblen-2 ] = m256_const2_64( (uint64_t)state->uHashSize << 48, 0 );
state->buffer[ vblen-1 ] = m256_const2_64( 0, state->processed_bits );
state->buffer[ vblen-2 ] = mm256_bcast128hi_64( (uint64_t)state->uHashSize << 48 );
state->buffer[ vblen-1 ] = mm256_bcast128lo_64( state->processed_bits );
state->k = _mm256_add_epi64( state->k, remainingbits );
state->k = _mm256_sub_epi64( state->k, state->const1536 );
@@ -766,16 +765,16 @@ int echo_2way_full( echo_2way_context *ctx, void *hashval, int nHashSize,
ctx->uHashSize = 256;
ctx->uBlockLength = 192;
ctx->uRounds = 8;
ctx->hashsize = m256_const2_64( 0, 0x100 );
ctx->const1536 = m256_const2_64( 0, 0x600 );
ctx->hashsize = mm256_bcast128lo_64( 0x100 );
ctx->const1536 = mm256_bcast128lo_64( 0x600 );
break;
case 512:
ctx->uHashSize = 512;
ctx->uBlockLength = 128;
ctx->uRounds = 10;
ctx->hashsize = m256_const2_64( 0, 0x200 );
ctx->const1536 = m256_const2_64( 0, 0x400 );
ctx->hashsize = mm256_bcast128lo_64( 0x200 );
ctx->const1536 = mm256_bcast128lo_64( 0x400 );
break;
default:
@@ -798,7 +797,7 @@ int echo_2way_full( echo_2way_context *ctx, void *hashval, int nHashSize,
{
echo_2way_compress( ctx, data, 1 );
ctx->processed_bits = 1024;
remainingbits = m256_const2_64( 0, -1024 );
remainingbits = mm256_bcast128lo_64( -1024 );
vlen = 0;
}
else
@@ -806,13 +805,13 @@ int echo_2way_full( echo_2way_context *ctx, void *hashval, int nHashSize,
vlen = databitlen / 128; // * 4 lanes / 128 bits per lane
memcpy_256( ctx->buffer, data, vlen );
ctx->processed_bits += (unsigned int)( databitlen );
remainingbits = m256_const2_64( 0, databitlen );
remainingbits = mm256_bcast128lo_64( databitlen );
}
ctx->buffer[ vlen ] = m256_const2_64( 0, 0x80 );
ctx->buffer[ vlen ] = mm256_bcast128lo_64( 0x80 );
memset_zero_256( ctx->buffer + vlen + 1, vblen - vlen - 2 );
ctx->buffer[ vblen-2 ] = m256_const2_64( (uint64_t)ctx->uHashSize << 48, 0 );
ctx->buffer[ vblen-1 ] = m256_const2_64( 0, ctx->processed_bits );
ctx->buffer[ vblen-2 ] = mm256_bcast128hi_64( (uint64_t)ctx->uHashSize << 48 );
ctx->buffer[ vblen-1 ] = mm256_bcast128lo_64( ctx->processed_bits );
ctx->k = _mm256_add_epi64( ctx->k, remainingbits );
ctx->k = _mm256_sub_epi64( ctx->k, ctx->const1536 );

View File

@@ -73,7 +73,7 @@ extern "C"{
#endif
#define AES_BIG_ENDIAN 0
#include "algo/sha/aes_helper.c"
#include "compat/aes_helper.c"
#if SPH_ECHO_64

View File

@@ -43,7 +43,7 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
/**
* Output size (in bits) for ECHO-224.

View File

@@ -33,11 +33,11 @@ MYALIGN const unsigned long long _supermix4b[] = {0x07020d08080e0d0d, 0x07070908
MYALIGN const unsigned long long _supermix4c[] = {0x0706050403020000, 0x0302000007060504};
MYALIGN const unsigned long long _supermix7a[] = {0x010c0b060d080702, 0x0904030e03000104};
MYALIGN const unsigned long long _supermix7b[] = {0x8080808080808080, 0x0504070605040f06};
MYALIGN const unsigned long long _k_n[] = {0x4E4E4E4E4E4E4E4E, 0x1B1B1B1B0E0E0E0E};
MYALIGN const unsigned char _shift_one_mask[] = {7, 4, 5, 6, 11, 8, 9, 10, 15, 12, 13, 14, 3, 0, 1, 2};
MYALIGN const unsigned char _shift_four_mask[] = {13, 14, 15, 12, 1, 2, 3, 0, 5, 6, 7, 4, 9, 10, 11, 8};
MYALIGN const unsigned char _shift_seven_mask[] = {10, 11, 8, 9, 14, 15, 12, 13, 2, 3, 0, 1, 6, 7, 4, 5};
MYALIGN const unsigned char _aes_shift_rows[] = {0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11};
//MYALIGN const unsigned long long _k_n[] = {0x4E4E4E4E4E4E4E4E, 0x1B1B1B1B0E0E0E0E};
//MYALIGN const unsigned char _shift_one_mask[] = {7, 4, 5, 6, 11, 8, 9, 10, 15, 12, 13, 14, 3, 0, 1, 2};
//MYALIGN const unsigned char _shift_four_mask[] = {13, 14, 15, 12, 1, 2, 3, 0, 5, 6, 7, 4, 9, 10, 11, 8};
//MYALIGN const unsigned char _shift_seven_mask[] = {10, 11, 8, 9, 14, 15, 12, 13, 2, 3, 0, 1, 6, 7, 4, 5};
//MYALIGN const unsigned char _aes_shift_rows[] = {0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11};
MYALIGN const unsigned int _inv_shift_rows[] = {0x070a0d00, 0x0b0e0104, 0x0f020508, 0x0306090c};
MYALIGN const unsigned int _mul2mask[] = {0x1b1b0000, 0x00000000, 0x00000000, 0x00000000};
MYALIGN const unsigned int _mul4mask[] = {0x2d361b00, 0x00000000, 0x00000000, 0x00000000};
@@ -131,7 +131,7 @@ MYALIGN const unsigned int _IV512[] = {
t1 = _mm_srli_epi16(t0, 6);\
t1 = _mm_and_si128(t1, M128(_lsbmask2));\
t3 = _mm_xor_si128(t3, _mm_shuffle_epi8(M128(_mul2mask), t1));\
t0 = _mm_xor_si128(t4, _mm_shuffle_epi8(M128(_mul4mask), t1))
t0 = _mm_xor_si128(t4, _mm_shuffle_epi8(M128(_mul4mask), t1))
/*
#define PRESUPERMIX(x, t1, s1, s2, t2)\

View File

@@ -20,7 +20,7 @@
#error "Unsupported configuration, AES needs SSE4.1. Compile without AES."
#endif
#include "algo/sha/sha3_common.h"
#include "compat/sha3_common.h"
#include "simd-utils.h"

View File

@@ -2,7 +2,7 @@
#define SPH_FUGUE_H__
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
#ifdef __cplusplus
extern "C"{

View File

@@ -41,7 +41,7 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
/**
* Output size (in bits) for GOST-256.

View File

@@ -139,7 +139,7 @@ static const __m128i SUBSH_MASK7 = { 0x06090c0f0205080b, 0x0e0104070a0d0003 };
\
/* compute z_i : double x_i using temp xmm8 and 1B xmm9 */\
/* compute w_i : add y_{i+4} */\
b1 = m128_const1_64( 0x1b1b1b1b1b1b1b1b );\
b1 = _mm_set1_epi64x( 0x1b1b1b1b1b1b1b1b );\
MUL2(a0, b0, b1);\
a0 = _mm_xor_si128(a0, TEMP0);\
MUL2(a1, b0, b1);\
@@ -237,7 +237,7 @@ static const __m128i SUBSH_MASK7 = { 0x06090c0f0205080b, 0x0e0104070a0d0003 };
\
/* compute z_i : double x_i using temp xmm8 and 1B xmm9 */\
/* compute w_i : add y_{i+4} */\
b1 = m128_const1_64( 0x1b1b1b1b1b1b1b1b );\
b1 = _mm_set1_epi64x( 0x1b1b1b1b1b1b1b1b );\
MUL2(a0, b0, b1);\
a0 = _mm_xor_si128(a0, TEMP0);\
MUL2(a1, b0, b1);\

View File

@@ -128,7 +128,7 @@ static const __m128i SUBSH_MASK7 = { 0x090c000306080b07, 0x02050f0a0d01040e };
\
/* compute z_i : double x_i using temp xmm8 and 1B xmm9 */\
/* compute w_i : add y_{i+4} */\
b1 = m128_const1_64( 0x1b1b1b1b1b1b1b1b );\
b1 = _mm_set1_epi64x( 0x1b1b1b1b1b1b1b1b );\
MUL2(a0, b0, b1);\
a0 = _mm_xor_si128(a0, TEMP0);\
MUL2(a1, b0, b1);\
@@ -226,7 +226,7 @@ static const __m128i SUBSH_MASK7 = { 0x090c000306080b07, 0x02050f0a0d01040e };
\
/* compute z_i : double x_i using temp xmm8 and 1B xmm9 */\
/* compute w_i : add y_{i+4} */\
b1 = m128_const1_64( 0x1b1b1b1b1b1b1b1b );\
b1 = _mm_set1_epi64x( 0x1b1b1b1b1b1b1b1b );\
MUL2(a0, b0, b1);\
a0 = _mm_xor_si128(a0, TEMP0);\
MUL2(a1, b0, b1);\
@@ -275,7 +275,7 @@ static const __m128i SUBSH_MASK7 = { 0x090c000306080b07, 0x02050f0a0d01040e };
*/
#define ROUND(i, a0, a1, a2, a3, a4, a5, a6, a7, b0, b1, b2, b3, b4, b5, b6, b7){\
/* AddRoundConstant */\
b1 = m128_const_64( 0xffffffffffffffff, 0 ); \
b1 = _mm_set_epi64x( 0xffffffffffffffff, 0 ); \
a0 = _mm_xor_si128( a0, casti_m128i( round_const_l0, i ) ); \
a1 = _mm_xor_si128( a1, b1 ); \
a2 = _mm_xor_si128( a2, b1 ); \

View File

@@ -24,9 +24,6 @@ HashReturn_gr init_groestl( hashState_groestl* ctx, int hashlen )
ctx->hashlen = hashlen;
if (ctx->chaining == NULL || ctx->buffer == NULL)
return FAIL_GR;
for ( i = 0; i < SIZE512; i++ )
{
ctx->chaining[i] = _mm_setzero_si128();
@@ -34,7 +31,7 @@ HashReturn_gr init_groestl( hashState_groestl* ctx, int hashlen )
}
// The only non-zero in the IV is len. It can be hard coded.
ctx->chaining[ 6 ] = m128_const_64( 0x0200000000000000, 0 );
ctx->chaining[ 6 ] = _mm_set_epi64x( 0x0200000000000000, 0 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
@@ -46,15 +43,12 @@ HashReturn_gr reinit_groestl( hashState_groestl* ctx )
{
int i;
if (ctx->chaining == NULL || ctx->buffer == NULL)
return FAIL_GR;
for ( i = 0; i < SIZE512; i++ )
{
ctx->chaining[i] = _mm_setzero_si128();
ctx->buffer[i] = _mm_setzero_si128();
}
ctx->chaining[ 6 ] = m128_const_64( 0x0200000000000000, 0 );
ctx->chaining[ 6 ] = _mm_set_epi64x( 0x0200000000000000, 0 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
@@ -122,7 +116,7 @@ HashReturn_gr final_groestl( hashState_groestl* ctx, void* output )
else
{
// add first padding
ctx->buffer[rem_ptr] = m128_const_64( 0, 0x80 );
ctx->buffer[rem_ptr] = _mm_set_epi64x( 0, 0x80 );
// add zero padding
for ( i = rem_ptr + 1; i < SIZE512 - 1; i++ )
ctx->buffer[i] = _mm_setzero_si128();
@@ -154,16 +148,14 @@ int groestl512_full( hashState_groestl* ctx, void* output,
ctx->chaining[i] = _mm_setzero_si128();
ctx->buffer[i] = _mm_setzero_si128();
}
ctx->chaining[ 6 ] = m128_const_64( 0x0200000000000000, 0 );
ctx->chaining[ 6 ] = _mm_set_epi64x( 0x0200000000000000, 0 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
// --- update ---
const int len = (int)databitlen / 128;
const int hashlen_m128i = ctx->hashlen / 16; // bytes to __m128i
const int hash_offset = SIZE512 - hashlen_m128i;
int rem = ctx->rem_ptr;
uint64_t blocks = len / SIZE512;
__m128i* in = (__m128i*)input;
@@ -175,8 +167,8 @@ int groestl512_full( hashState_groestl* ctx, void* output,
// copy any remaining data to buffer, it may already contain data
// from a previous update for a midstate precalc
for ( i = 0; i < len % SIZE512; i++ )
ctx->buffer[ rem + i ] = in[ ctx->buf_ptr + i ];
i += rem; // use i as rem_ptr in final
ctx->buffer[ i ] = in[ ctx->buf_ptr + i ];
// use i as rem_ptr in final
//--- final ---
@@ -190,7 +182,7 @@ int groestl512_full( hashState_groestl* ctx, void* output,
else
{
// add first padding
ctx->buffer[i] = m128_const_64( 0, 0x80 );
ctx->buffer[i] = _mm_set_epi64x( 0, 0x80 );
// add zero padding
for ( i += 1; i < SIZE512 - 1; i++ )
ctx->buffer[i] = _mm_setzero_si128();
@@ -247,7 +239,7 @@ HashReturn_gr update_and_final_groestl( hashState_groestl* ctx, void* output,
else
{
// add first padding
ctx->buffer[i] = m128_const_64( 0, 0x80 );
ctx->buffer[i] = _mm_set_epi64x( 0, 0x80 );
// add zero padding
for ( i += 1; i < SIZE512 - 1; i++ )
ctx->buffer[i] = _mm_setzero_si128();

View File

@@ -20,8 +20,8 @@
#define LENGTH (512)
#include "brg_endian.h"
#define NEED_UINT_64T
#include "algo/sha/brg_types.h"
//#define NEED_UINT_64T
#include "compat/brg_types.h"
/* some sizes (number of bytes) */
#define ROWS (8)

View File

@@ -22,9 +22,6 @@ HashReturn_gr init_groestl256( hashState_groestl256* ctx, int hashlen )
ctx->hashlen = hashlen;
if (ctx->chaining == NULL || ctx->buffer == NULL)
return FAIL_GR;
for ( i = 0; i < SIZE256; i++ )
{
ctx->chaining[i] = _mm_setzero_si128();
@@ -43,19 +40,14 @@ HashReturn_gr reinit_groestl256(hashState_groestl256* ctx)
{
int i;
if (ctx->chaining == NULL || ctx->buffer == NULL)
return FAIL_GR;
for ( i = 0; i < SIZE256; i++ )
{
ctx->chaining[i] = _mm_setzero_si128();
ctx->buffer[i] = _mm_setzero_si128();
}
ctx->chaining[ 3 ] = m128_const_64( 0, 0x0100000000000000 );
ctx->chaining[ 3 ] = _mm_set_epi64x( 0, 0x0100000000000000 );
// ((u64*)ctx->chaining)[COLS-1] = U64BIG((u64)LENGTH);
// INIT256(ctx->chaining);
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
@@ -227,12 +219,10 @@ int groestl256_full( hashState_groestl256* ctx,
((u64*)ctx->chaining)[COLS-1] = U64BIG((u64)LENGTH);
INIT256( ctx->chaining );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
const int len = (int)databitlen / 128;
const int hashlen_m128i = ctx->hashlen / 16; // bytes to __m128i
const int hash_offset = SIZE256 - hashlen_m128i;
int rem = ctx->rem_ptr;
int blocks = len / SIZE256;
__m128i* in = (__m128i*)input;
@@ -245,7 +235,7 @@ int groestl256_full( hashState_groestl256* ctx,
// cryptonight has 200 byte input, an odd number of __m128i
// remainder is only 8 bytes, ie u64.
if ( databitlen % 128 !=0 )
if ( databitlen % 128 != 0 )
{
// must be cryptonight, copy 64 bits of data
*(uint64_t*)(ctx->buffer) = *(uint64_t*)(&in[ ctx->buf_ptr ] );
@@ -255,8 +245,8 @@ int groestl256_full( hashState_groestl256* ctx,
{
// Copy any remaining data to buffer for final transform
for ( i = 0; i < len % SIZE256; i++ )
ctx->buffer[ rem + i ] = in[ ctx->buf_ptr + i ];
i += rem; // use i as rem_ptr in final
ctx->buffer[ i ] = in[ ctx->buf_ptr + i ];
// use i as rem_ptr in final
}
//--- final ---

View File

@@ -34,8 +34,7 @@ typedef crypto_uint64 u64;
//#define LENGTH (512)
#include "brg_endian.h"
#define NEED_UINT_64T
#include "algo/sha/brg_types.h"
#include "compat/brg_types.h"
#ifdef IACA_TRACE
#include IACA_MARKS

View File

@@ -17,7 +17,7 @@ bool register_dmd_gr_algo( algo_gate_t *gate )
bool register_groestl_algo( algo_gate_t* gate )
{
register_dmd_gr_algo( gate );
gate->gen_merkle_root = (void*)&SHA256_gen_merkle_root;
gate->gen_merkle_root = (void*)&sha256_gen_merkle_root;
return true;
};

View File

@@ -26,9 +26,6 @@ int groestl256_4way_init( groestl256_4way_context* ctx, uint64_t hashlen )
ctx->hashlen = hashlen;
if (ctx->chaining == NULL || ctx->buffer == NULL)
return 1;
for ( i = 0; i < SIZE256; i++ )
{
ctx->chaining[i] = m512_zero;
@@ -36,8 +33,7 @@ int groestl256_4way_init( groestl256_4way_context* ctx, uint64_t hashlen )
}
// The only non-zero in the IV is len. It can be hard coded.
ctx->chaining[ 3 ] = m512_const2_64( 0, 0x0100000000000000 );
ctx->chaining[ 3 ] = mm512_bcast128lo_64( 0x0100000000000000 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
@@ -50,14 +46,10 @@ int groestl256_4way_full( groestl256_4way_context* ctx, void* output,
const int len = (int)datalen >> 4;
const int hashlen_m128i = 32 >> 4; // bytes to __m128i
const int hash_offset = SIZE256 - hashlen_m128i;
int rem = ctx->rem_ptr;
uint64_t blocks = len / SIZE256;
__m512i* in = (__m512i*)input;
int i;
if (ctx->chaining == NULL || ctx->buffer == NULL)
return 1;
for ( i = 0; i < SIZE256; i++ )
{
ctx->chaining[i] = m512_zero;
@@ -65,9 +57,8 @@ int groestl256_4way_full( groestl256_4way_context* ctx, void* output,
}
// The only non-zero in the IV is len. It can be hard coded.
ctx->chaining[ 3 ] = m512_const2_64( 0, 0x0100000000000000 );
ctx->chaining[ 3 ] = mm512_bcast128lo_64( 0x0100000000000000 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
// --- update ---
@@ -76,11 +67,10 @@ int groestl256_4way_full( groestl256_4way_context* ctx, void* output,
TF512_4way( ctx->chaining, &in[ i * SIZE256 ] );
ctx->buf_ptr = blocks * SIZE256;
// copy any remaining data to buffer, it may already contain data
// from a previous update for a midstate precalc
// copy any remaining data to buffer
for ( i = 0; i < len % SIZE256; i++ )
ctx->buffer[ rem + i ] = in[ ctx->buf_ptr + i ];
i += rem; // use i as rem_ptr in final
ctx->buffer[ i ] = in[ ctx->buf_ptr + i ];
// use i as rem_ptr in final
//--- final ---
@@ -89,18 +79,18 @@ int groestl256_4way_full( groestl256_4way_context* ctx, void* output,
if ( i == SIZE256 - 1 )
{
// only 1 vector left in buffer, all padding at once
ctx->buffer[i] = m512_const2_64( blocks << 56, 0x80 );
ctx->buffer[i] = mm512_set2_64( blocks << 56, 0x80 );
}
else
{
// add first padding
ctx->buffer[i] = m512_const2_64( 0, 0x80 );
ctx->buffer[i] = mm512_bcast128lo_64( 0x80 );
// add zero padding
for ( i += 1; i < SIZE256 - 1; i++ )
ctx->buffer[i] = m512_zero;
// add length padding, second last byte is zero unless blocks > 255
ctx->buffer[i] = m512_const2_64( blocks << 56, 0 );
ctx->buffer[i] = mm512_bcast128hi_64( blocks << 56 );
}
// digest final padding block and do output transform
@@ -146,18 +136,18 @@ int groestl256_4way_update_close( groestl256_4way_context* ctx, void* output,
if ( i == SIZE256 - 1 )
{
// only 1 vector left in buffer, all padding at once
ctx->buffer[i] = m512_const2_64( blocks << 56, 0x80 );
ctx->buffer[i] = mm512_set2_64( blocks << 56, 0x80 );
}
else
{
// add first padding
ctx->buffer[i] = m512_const2_64( 0, 0x80 );
ctx->buffer[i] = mm512_bcast128lo_64( 0x80 );
// add zero padding
for ( i += 1; i < SIZE256 - 1; i++ )
ctx->buffer[i] = m512_zero;
// add length padding, second last byte is zero unless blocks > 255
ctx->buffer[i] = m512_const2_64( blocks << 56, 0 );
ctx->buffer[i] = mm512_bcast128hi_64( blocks << 56 );
}
// digest final padding block and do output transform
@@ -182,8 +172,8 @@ int groestl256_2way_init( groestl256_2way_context* ctx, uint64_t hashlen )
ctx->hashlen = hashlen;
if (ctx->chaining == NULL || ctx->buffer == NULL)
return 1;
// if (ctx->chaining == NULL || ctx->buffer == NULL)
// return 1;
for ( i = 0; i < SIZE256; i++ )
{
@@ -192,7 +182,7 @@ int groestl256_2way_init( groestl256_2way_context* ctx, uint64_t hashlen )
}
// The only non-zero in the IV is len. It can be hard coded.
ctx->chaining[ 3 ] = m256_const2_64( 0, 0x0100000000000000 );
ctx->chaining[ 3 ] = mm256_bcast128lo_64( 0x0100000000000000 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
@@ -206,14 +196,10 @@ int groestl256_2way_full( groestl256_2way_context* ctx, void* output,
const int len = (int)datalen >> 4;
const int hashlen_m128i = 32 >> 4; // bytes to __m128i
const int hash_offset = SIZE256 - hashlen_m128i;
int rem = ctx->rem_ptr;
uint64_t blocks = len / SIZE256;
__m256i* in = (__m256i*)input;
int i;
if (ctx->chaining == NULL || ctx->buffer == NULL)
return 1;
for ( i = 0; i < SIZE256; i++ )
{
ctx->chaining[i] = m256_zero;
@@ -221,9 +207,8 @@ int groestl256_2way_full( groestl256_2way_context* ctx, void* output,
}
// The only non-zero in the IV is len. It can be hard coded.
ctx->chaining[ 3 ] = m256_const2_64( 0, 0x0100000000000000 );
ctx->chaining[ 3 ] = mm256_bcast128lo_64( 0x0100000000000000 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
// --- update ---
@@ -232,11 +217,10 @@ int groestl256_2way_full( groestl256_2way_context* ctx, void* output,
TF512_2way( ctx->chaining, &in[ i * SIZE256 ] );
ctx->buf_ptr = blocks * SIZE256;
// copy any remaining data to buffer, it may already contain data
// from a previous update for a midstate precalc
// copy any remaining data to buffer
for ( i = 0; i < len % SIZE256; i++ )
ctx->buffer[ rem + i ] = in[ ctx->buf_ptr + i ];
i += rem; // use i as rem_ptr in final
ctx->buffer[ i ] = in[ ctx->buf_ptr + i ];
// use i as rem_ptr in final
//--- final ---
@@ -245,18 +229,18 @@ int groestl256_2way_full( groestl256_2way_context* ctx, void* output,
if ( i == SIZE256 - 1 )
{
// only 1 vector left in buffer, all padding at once
ctx->buffer[i] = m256_const2_64( blocks << 56, 0x80 );
ctx->buffer[i] = mm256_set2_64( blocks << 56, 0x80 );
}
else
{
// add first padding
ctx->buffer[i] = m256_const2_64( 0, 0x80 );
ctx->buffer[i] = mm256_bcast128lo_64( 0x80 );
// add zero padding
for ( i += 1; i < SIZE256 - 1; i++ )
ctx->buffer[i] = m256_zero;
// add length padding, second last byte is zero unless blocks > 255
ctx->buffer[i] = m256_const2_64( blocks << 56, 0 );
ctx->buffer[i] = mm256_bcast128hi_64( blocks << 56 );
}
// digest final padding block and do output transform
@@ -301,23 +285,22 @@ int groestl256_2way_update_close( groestl256_2way_context* ctx, void* output,
if ( i == SIZE256 - 1 )
{
// only 1 vector left in buffer, all padding at once
ctx->buffer[i] = m256_const2_64( blocks << 56, 0x80 );
ctx->buffer[i] = mm256_set2_64( blocks << 56, 0x80 );
}
else
{
// add first padding
ctx->buffer[i] = m256_const2_64( 0, 0x80 );
ctx->buffer[i] = mm256_bcast128lo_64( 0x80 );
// add zero padding
for ( i += 1; i < SIZE256 - 1; i++ )
ctx->buffer[i] = m256_zero;
// add length padding, second last byte is zero unless blocks > 255
ctx->buffer[i] = m256_const2_64( blocks << 56, 0 );
ctx->buffer[i] = mm256_bcast128hi_64( blocks << 56 );
}
// digest final padding block and do output transform
TF512_2way( ctx->chaining, ctx->buffer );
OF512_2way( ctx->chaining );
// store hash result in output

View File

@@ -22,10 +22,6 @@
#define LENGTH (256)
//#include "brg_endian.h"
//#define NEED_UINT_64T
//#include "algo/sha/brg_types.h"
/* some sizes (number of bytes) */
#define ROWS (8)
#define LENGTHFIELDLEN (ROWS)

View File

@@ -165,7 +165,7 @@ static const __m512i SUBSH_MASK7 = { 0x090c000306080b07, 0x02050f0a0d01040e,
\
/* compute z_i : double x_i using temp xmm8 and 1B xmm9 */\
/* compute w_i : add y_{i+4} */\
b1 = m512_const1_64( 0x1b1b1b1b1b1b1b1b ); \
b1 = _mm512_set1_epi64( 0x1b1b1b1b1b1b1b1b ); \
MUL2( a0, b0, b1 ); \
a0 = _mm512_xor_si512( a0, TEMP0 ); \
MUL2( a1, b0, b1 ); \
@@ -205,116 +205,18 @@ static const __m512i SUBSH_MASK7 = { 0x090c000306080b07, 0x02050f0a0d01040e,
b1 = _mm512_xor_si512( b1, a4 ); \
}/*MixBytes*/
#if 0
#define MixBytes(a0, a1, a2, a3, a4, a5, a6, a7, b0, b1, b2, b3, b4, b5, b6, b7){\
/* t_i = a_i + a_{i+1} */\
b6 = a0;\
b7 = a1;\
a0 = _mm512_xor_si512(a0, a1);\
b0 = a2;\
a1 = _mm512_xor_si512(a1, a2);\
b1 = a3;\
a2 = _mm512_xor_si512(a2, a3);\
b2 = a4;\
a3 = _mm512_xor_si512(a3, a4);\
b3 = a5;\
a4 = _mm512_xor_si512(a4, a5);\
b4 = a6;\
a5 = _mm512_xor_si512(a5, a6);\
b5 = a7;\
a6 = _mm512_xor_si512(a6, a7);\
a7 = _mm512_xor_si512(a7, b6);\
\
/* build y4 y5 y6 ... in regs xmm8, xmm9, xmm10 by adding t_i*/\
b0 = _mm512_xor_si512(b0, a4);\
b6 = _mm512_xor_si512(b6, a4);\
b1 = _mm512_xor_si512(b1, a5);\
b7 = _mm512_xor_si512(b7, a5);\
b2 = _mm512_xor_si512(b2, a6);\
b0 = _mm512_xor_si512(b0, a6);\
/* spill values y_4, y_5 to memory */\
TEMP0 = b0;\
b3 = _mm512_xor_si512(b3, a7);\
b1 = _mm512_xor_si512(b1, a7);\
TEMP1 = b1;\
b4 = _mm512_xor_si512(b4, a0);\
b2 = _mm512_xor_si512(b2, a0);\
/* save values t0, t1, t2 to xmm8, xmm9 and memory */\
b0 = a0;\
b5 = _mm512_xor_si512(b5, a1);\
b3 = _mm512_xor_si512(b3, a1);\
b1 = a1;\
b6 = _mm512_xor_si512(b6, a2);\
b4 = _mm512_xor_si512(b4, a2);\
TEMP2 = a2;\
b7 = _mm512_xor_si512(b7, a3);\
b5 = _mm512_xor_si512(b5, a3);\
\
/* compute x_i = t_i + t_{i+3} */\
a0 = _mm512_xor_si512(a0, a3);\
a1 = _mm512_xor_si512(a1, a4);\
a2 = _mm512_xor_si512(a2, a5);\
a3 = _mm512_xor_si512(a3, a6);\
a4 = _mm512_xor_si512(a4, a7);\
a5 = _mm512_xor_si512(a5, b0);\
a6 = _mm512_xor_si512(a6, b1);\
a7 = _mm512_xor_si512(a7, TEMP2);\
\
/* compute z_i : double x_i using temp xmm8 and 1B xmm9 */\
/* compute w_i : add y_{i+4} */\
b1 = m512_const1_64( 0x1b1b1b1b1b1b1b1b );\
MUL2(a0, b0, b1);\
a0 = _mm512_xor_si512(a0, TEMP0);\
MUL2(a1, b0, b1);\
a1 = _mm512_xor_si512(a1, TEMP1);\
MUL2(a2, b0, b1);\
a2 = _mm512_xor_si512(a2, b2);\
MUL2(a3, b0, b1);\
a3 = _mm512_xor_si512(a3, b3);\
MUL2(a4, b0, b1);\
a4 = _mm512_xor_si512(a4, b4);\
MUL2(a5, b0, b1);\
a5 = _mm512_xor_si512(a5, b5);\
MUL2(a6, b0, b1);\
a6 = _mm512_xor_si512(a6, b6);\
MUL2(a7, b0, b1);\
a7 = _mm512_xor_si512(a7, b7);\
\
/* compute v_i : double w_i */\
/* add to y_4 y_5 .. v3, v4, ... */\
MUL2(a0, b0, b1);\
b5 = _mm512_xor_si512(b5, a0);\
MUL2(a1, b0, b1);\
b6 = _mm512_xor_si512(b6, a1);\
MUL2(a2, b0, b1);\
b7 = _mm512_xor_si512(b7, a2);\
MUL2(a5, b0, b1);\
b2 = _mm512_xor_si512(b2, a5);\
MUL2(a6, b0, b1);\
b3 = _mm512_xor_si512(b3, a6);\
MUL2(a7, b0, b1);\
b4 = _mm512_xor_si512(b4, a7);\
MUL2(a3, b0, b1);\
MUL2(a4, b0, b1);\
b0 = TEMP0;\
b1 = TEMP1;\
b0 = _mm512_xor_si512(b0, a3);\
b1 = _mm512_xor_si512(b1, a4);\
}/*MixBytes*/
#endif
#define MASK_NOT( a ) _mm512_mask_ternarylogic_epi64( a, 0xaa, a, a, 1 )
#define ROUND(i, a0, a1, a2, a3, a4, a5, a6, a7, b0, b1, b2, b3, b4, b5, b6, b7){\
/* AddRoundConstant */\
b1 = m512_const2_64( 0xffffffffffffffff, 0 ); \
a0 = _mm512_xor_si512( a0, m512_const1_128( round_const_l0[i] ) );\
a1 = _mm512_xor_si512( a1, b1 );\
a2 = _mm512_xor_si512( a2, b1 );\
a3 = _mm512_xor_si512( a3, b1 );\
a4 = _mm512_xor_si512( a4, b1 );\
a5 = _mm512_xor_si512( a5, b1 );\
a6 = _mm512_xor_si512( a6, b1 );\
a7 = _mm512_xor_si512( a7, m512_const1_128( round_const_l7[i] ) );\
a0 = _mm512_xor_si512( a0, mm512_bcast_m128( round_const_l0[i] ) );\
a1 = MASK_NOT( a1 ); \
a2 = MASK_NOT( a2 ); \
a3 = MASK_NOT( a3 ); \
a4 = MASK_NOT( a4 ); \
a5 = MASK_NOT( a5 ); \
a6 = MASK_NOT( a6 ); \
a7 = _mm512_xor_si512( a7, mm512_bcast_m128( round_const_l7[i] ) );\
\
/* ShiftBytes + SubBytes (interleaved) */\
b0 = _mm512_xor_si512( b0, b0 );\
@@ -450,7 +352,7 @@ static const __m512i SUBSH_MASK7 = { 0x090c000306080b07, 0x02050f0a0d01040e,
* outputs: (i0-7) = (0|S)
*/
#define Matrix_Transpose_O_B(i0, i1, i2, i3, i4, i5, i6, i7, t0){\
t0 = _mm512_xor_si512( t0, t0 );\
t0 = m512_zero;\
i1 = i0;\
i3 = i2;\
i5 = i4;\
@@ -481,11 +383,11 @@ static const __m512i SUBSH_MASK7 = { 0x090c000306080b07, 0x02050f0a0d01040e,
void TF512_4way( __m512i* chaining, __m512i* message )
{
static __m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
static __m512i TEMP0;
static __m512i TEMP1;
static __m512i TEMP2;
__m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m512i TEMP0;
__m512i TEMP1;
__m512i TEMP2;
/* load message into registers xmm12 - xmm15 */
xmm12 = message[0];
@@ -547,11 +449,11 @@ void TF512_4way( __m512i* chaining, __m512i* message )
void OF512_4way( __m512i* chaining )
{
static __m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
static __m512i TEMP0;
static __m512i TEMP1;
static __m512i TEMP2;
__m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m512i TEMP0;
__m512i TEMP1;
__m512i TEMP2;
/* load CV into registers xmm8, xmm10, xmm12, xmm14 */
xmm8 = chaining[0];
@@ -637,7 +539,7 @@ static const __m256i SUBSH_MASK7_2WAY =
j = _mm256_cmpgt_epi8(j, i );\
i = _mm256_add_epi8(i, i);\
j = _mm256_and_si256(j, k);\
i = _mm256_xor_si256(i, j);\
i = mm256_xorand( i, j, k );\
}
#define MixBytes_2way(a0, a1, a2, a3, a4, a5, a6, a7, b0, b1, b2, b3, b4, b5, b6, b7){\
@@ -648,7 +550,7 @@ static const __m256i SUBSH_MASK7_2WAY =
b0 = a2;\
a1 = _mm256_xor_si256(a1, a2);\
b1 = a3;\
a2 = _mm256_xor_si256(a2, a3);\
TEMP2 = _mm256_xor_si256(a2, a3);\
b2 = a4;\
a3 = _mm256_xor_si256(a3, a4);\
b3 = a5;\
@@ -660,34 +562,20 @@ static const __m256i SUBSH_MASK7_2WAY =
a7 = _mm256_xor_si256(a7, b6);\
\
/* build y4 y5 y6 ... in regs xmm8, xmm9, xmm10 by adding t_i*/\
b0 = _mm256_xor_si256(b0, a4);\
b6 = _mm256_xor_si256(b6, a4);\
b1 = _mm256_xor_si256(b1, a5);\
b7 = _mm256_xor_si256(b7, a5);\
b2 = _mm256_xor_si256(b2, a6);\
b0 = _mm256_xor_si256(b0, a6);\
/* spill values y_4, y_5 to memory */\
TEMP0 = b0;\
b3 = _mm256_xor_si256(b3, a7);\
b1 = _mm256_xor_si256(b1, a7);\
TEMP1 = b1;\
b4 = _mm256_xor_si256(b4, a0);\
b2 = _mm256_xor_si256(b2, a0);\
/* save values t0, t1, t2 to xmm8, xmm9 and memory */\
b0 = a0;\
b5 = _mm256_xor_si256(b5, a1);\
b3 = _mm256_xor_si256(b3, a1);\
b1 = a1;\
b6 = _mm256_xor_si256(b6, a2);\
b4 = _mm256_xor_si256(b4, a2);\
TEMP2 = a2;\
b7 = _mm256_xor_si256(b7, a3);\
b5 = _mm256_xor_si256(b5, a3);\
\
TEMP0 = mm256_xor3( b0, a4, a6 ); \
TEMP1 = mm256_xor3( b1, a5, a7 ); \
b2 = mm256_xor3( b2, a6, a0 ); \
b0 = a0; \
b3 = mm256_xor3( b3, a7, a1 ); \
b1 = a1; \
b6 = mm256_xor3( b6, a4, TEMP2 ); \
b4 = mm256_xor3( b4, a0, TEMP2 ); \
b7 = mm256_xor3( b7, a5, a3 ); \
b5 = mm256_xor3( b5, a1, a3 ); \
/* compute x_i = t_i + t_{i+3} */\
a0 = _mm256_xor_si256(a0, a3);\
a1 = _mm256_xor_si256(a1, a4);\
a2 = _mm256_xor_si256(a2, a5);\
a2 = _mm256_xor_si256( TEMP2, a5);\
a3 = _mm256_xor_si256(a3, a6);\
a4 = _mm256_xor_si256(a4, a7);\
a5 = _mm256_xor_si256(a5, b0);\
@@ -696,7 +584,7 @@ static const __m256i SUBSH_MASK7_2WAY =
\
/* compute z_i : double x_i using temp xmm8 and 1B xmm9 */\
/* compute w_i : add y_{i+4} */\
b1 = m256_const1_64( 0x1b1b1b1b1b1b1b1b );\
b1 = _mm256_set1_epi64x( 0x1b1b1b1b1b1b1b1b );\
MUL2_2WAY(a0, b0, b1);\
a0 = _mm256_xor_si256(a0, TEMP0);\
MUL2_2WAY(a1, b0, b1);\
@@ -738,15 +626,15 @@ static const __m256i SUBSH_MASK7_2WAY =
#define ROUND_2WAY(i, a0, a1, a2, a3, a4, a5, a6, a7, b0, b1, b2, b3, b4, b5, b6, b7){\
/* AddRoundConstant */\
b1 = m256_const2_64( 0xffffffffffffffff, 0 ); \
a0 = _mm256_xor_si256( a0, m256_const1_128( round_const_l0[i] ) );\
b1 = mm256_bcast_m128( mm128_mask_32( m128_neg1, 0x3 ) ); \
a0 = _mm256_xor_si256( a0, mm256_bcast_m128( round_const_l0[i] ) );\
a1 = _mm256_xor_si256( a1, b1 );\
a2 = _mm256_xor_si256( a2, b1 );\
a3 = _mm256_xor_si256( a3, b1 );\
a4 = _mm256_xor_si256( a4, b1 );\
a5 = _mm256_xor_si256( a5, b1 );\
a6 = _mm256_xor_si256( a6, b1 );\
a7 = _mm256_xor_si256( a7, m256_const1_128( round_const_l7[i] ) );\
a7 = _mm256_xor_si256( a7, mm256_bcast_m128( round_const_l7[i] ) );\
\
/* ShiftBytes + SubBytes (interleaved) */\
b0 = _mm256_xor_si256( b0, b0 );\
@@ -769,7 +657,6 @@ static const __m256i SUBSH_MASK7_2WAY =
\
/* MixBytes */\
MixBytes_2way(a0, a1, a2, a3, a4, a5, a6, a7, b0, b1, b2, b3, b4, b5, b6, b7);\
\
}
/* 10 rounds, P and Q in parallel */
@@ -850,7 +737,7 @@ static const __m256i SUBSH_MASK7_2WAY =
}/**/
#define Matrix_Transpose_O_B_2way(i0, i1, i2, i3, i4, i5, i6, i7, t0){\
t0 = _mm256_xor_si256( t0, t0 );\
t0 = m256_zero;\
i1 = i0;\
i3 = i2;\
i5 = i4;\
@@ -874,11 +761,11 @@ static const __m256i SUBSH_MASK7_2WAY =
void TF512_2way( __m256i* chaining, __m256i* message )
{
static __m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
static __m256i TEMP0;
static __m256i TEMP1;
static __m256i TEMP2;
__m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m256i TEMP0;
__m256i TEMP1;
__m256i TEMP2;
/* load message into registers xmm12 - xmm15 */
xmm12 = message[0];
@@ -940,11 +827,11 @@ void TF512_2way( __m256i* chaining, __m256i* message )
void OF512_2way( __m256i* chaining )
{
static __m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
static __m256i TEMP0;
static __m256i TEMP1;
static __m256i TEMP2;
__m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m256i TEMP0;
__m256i TEMP1;
__m256i TEMP2;
/* load CV into registers xmm8, xmm10, xmm12, xmm14 */
xmm8 = chaining[0];

View File

@@ -21,15 +21,11 @@
int groestl512_4way_init( groestl512_4way_context* ctx, uint64_t hashlen )
{
if (ctx->chaining == NULL || ctx->buffer == NULL)
return 1;
memset_zero_512( ctx->chaining, SIZE512 );
memset_zero_512( ctx->buffer, SIZE512 );
// The only non-zero in the IV is len. It can be hard coded.
ctx->chaining[ 6 ] = m512_const2_64( 0x0200000000000000, 0 );
ctx->chaining[ 6 ] = mm512_bcast128hi_64( 0x0200000000000000 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
@@ -64,14 +60,14 @@ int groestl512_4way_update_close( groestl512_4way_context* ctx, void* output,
if ( i == SIZE512 - 1 )
{
// only 1 vector left in buffer, all padding at once
ctx->buffer[i] = m512_const2_64( blocks << 56, 0x80 );
ctx->buffer[i] = mm512_set2_64( blocks << 56, 0x80 );
}
else
{
ctx->buffer[i] = m512_const2_64( 0, 0x80 );
ctx->buffer[i] = mm512_bcast128lo_64( 0x80 );
for ( i += 1; i < SIZE512 - 1; i++ )
ctx->buffer[i] = m512_zero;
ctx->buffer[i] = m512_const2_64( blocks << 56, 0 );
ctx->buffer[i] = mm512_bcast128hi_64( blocks << 56 );
}
TF1024_4way( ctx->chaining, ctx->buffer );
@@ -97,9 +93,8 @@ int groestl512_4way_full( groestl512_4way_context* ctx, void* output,
memset_zero_512( ctx->chaining, SIZE512 );
memset_zero_512( ctx->buffer, SIZE512 );
ctx->chaining[ 6 ] = m512_const2_64( 0x0200000000000000, 0 );
ctx->chaining[ 6 ] = mm512_bcast128hi_64( 0x0200000000000000 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
// --- update ---
@@ -108,8 +103,7 @@ int groestl512_4way_full( groestl512_4way_context* ctx, void* output,
ctx->buf_ptr = blocks * SIZE512;
for ( i = 0; i < len % SIZE512; i++ )
ctx->buffer[ ctx->rem_ptr + i ] = in[ ctx->buf_ptr + i ];
i += ctx->rem_ptr;
ctx->buffer[ i ] = in[ ctx->buf_ptr + i ];
// --- close ---
@@ -118,14 +112,14 @@ int groestl512_4way_full( groestl512_4way_context* ctx, void* output,
if ( i == SIZE512 - 1 )
{
// only 1 vector left in buffer, all padding at once
ctx->buffer[i] = m512_const2_64( blocks << 56, 0x80 );
ctx->buffer[i] = mm512_set2_64( blocks << 56, 0x80 );
}
else
{
ctx->buffer[i] = m512_const2_64( 0, 0x80 );
ctx->buffer[i] = mm512_bcast128lo_64( 0x80 );
for ( i += 1; i < SIZE512 - 1; i++ )
ctx->buffer[i] = m512_zero;
ctx->buffer[i] = m512_const2_64( blocks << 56, 0 );
ctx->buffer[i] = mm512_bcast128hi_64( blocks << 56 );
}
TF1024_4way( ctx->chaining, ctx->buffer );
@@ -144,14 +138,11 @@ int groestl512_4way_full( groestl512_4way_context* ctx, void* output,
int groestl512_2way_init( groestl512_2way_context* ctx, uint64_t hashlen )
{
if (ctx->chaining == NULL || ctx->buffer == NULL)
return 1;
memset_zero_256( ctx->chaining, SIZE512 );
memset_zero_256( ctx->buffer, SIZE512 );
// The only non-zero in the IV is len. It can be hard coded.
ctx->chaining[ 6 ] = m256_const2_64( 0x0200000000000000, 0 );
ctx->chaining[ 6 ] = mm256_bcast128hi_64( 0x0200000000000000 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
@@ -187,14 +178,14 @@ int groestl512_2way_update_close( groestl512_2way_context* ctx, void* output,
if ( i == SIZE512 - 1 )
{
// only 1 vector left in buffer, all padding at once
ctx->buffer[i] = m256_const2_64( blocks << 56, 0x80 );
ctx->buffer[i] = mm256_set2_64( blocks << 56, 0x80 );
}
else
{
ctx->buffer[i] = m256_const2_64( 0, 0x80 );
ctx->buffer[i] = mm256_bcast128lo_64( 0x80 );
for ( i += 1; i < SIZE512 - 1; i++ )
ctx->buffer[i] = m256_zero;
ctx->buffer[i] = m256_const2_64( blocks << 56, 0 );
ctx->buffer[i] = mm256_bcast128hi_64( blocks << 56 );
}
TF1024_2way( ctx->chaining, ctx->buffer );
@@ -220,9 +211,8 @@ int groestl512_2way_full( groestl512_2way_context* ctx, void* output,
memset_zero_256( ctx->chaining, SIZE512 );
memset_zero_256( ctx->buffer, SIZE512 );
ctx->chaining[ 6 ] = m256_const2_64( 0x0200000000000000, 0 );
ctx->chaining[ 6 ] = mm256_bcast128hi_64( 0x0200000000000000 );
ctx->buf_ptr = 0;
ctx->rem_ptr = 0;
// --- update ---
@@ -231,8 +221,7 @@ int groestl512_2way_full( groestl512_2way_context* ctx, void* output,
ctx->buf_ptr = blocks * SIZE512;
for ( i = 0; i < len % SIZE512; i++ )
ctx->buffer[ ctx->rem_ptr + i ] = in[ ctx->buf_ptr + i ];
i += ctx->rem_ptr;
ctx->buffer[ i ] = in[ ctx->buf_ptr + i ];
// --- close ---
@@ -241,14 +230,14 @@ int groestl512_2way_full( groestl512_2way_context* ctx, void* output,
if ( i == SIZE512 - 1 )
{
// only 1 vector left in buffer, all padding at once
ctx->buffer[i] = m256_const2_64( blocks << 56, 0x80 );
ctx->buffer[i] = mm256_set2_64( blocks << 56, 0x80 );
}
else
{
ctx->buffer[i] = m256_const2_64( 0, 0x80 );
ctx->buffer[i] = mm256_bcast128lo_64( 0x80 );
for ( i += 1; i < SIZE512 - 1; i++ )
ctx->buffer[i] = m256_zero;
ctx->buffer[i] = m256_const2_64( blocks << 56, 0 );
ctx->buffer[i] = mm256_bcast128hi_64( blocks << 56 );
}
TF1024_2way( ctx->chaining, ctx->buffer );

View File

@@ -174,7 +174,7 @@ static const __m512i SUBSH_MASK7 = { 0x06090c0f0205080b, 0x0e0104070a0d0003,
\
/* compute z_i : double x_i using temp xmm8 and 1B xmm9 */\
/* compute w_i : add y_{i+4} */\
b1 = m512_const1_64( 0x1b1b1b1b1b1b1b1b ); \
b1 = _mm512_set1_epi64( 0x1b1b1b1b1b1b1b1b ); \
MUL2( a0, b0, b1 ); \
a0 = _mm512_xor_si512( a0, TEMP0 ); \
MUL2( a1, b0, b1 ); \
@@ -238,7 +238,7 @@ static const __m512i SUBSH_MASK7 = { 0x06090c0f0205080b, 0x0e0104070a0d0003,
for ( round_counter = 0; round_counter < 14; round_counter += 2 ) \
{ \
/* AddRoundConstant P1024 */\
xmm8 = _mm512_xor_si512( xmm8, m512_const1_128( \
xmm8 = _mm512_xor_si512( xmm8, mm512_bcast_m128( \
casti_m128i( round_const_p, round_counter ) ) ); \
/* ShiftBytes P1024 + pre-AESENCLAST */\
xmm8 = _mm512_shuffle_epi8( xmm8, SUBSH_MASK0 ); \
@@ -253,7 +253,7 @@ static const __m512i SUBSH_MASK7 = { 0x06090c0f0205080b, 0x0e0104070a0d0003,
SUBMIX(xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15, xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7);\
\
/* AddRoundConstant P1024 */\
xmm0 = _mm512_xor_si512( xmm0, m512_const1_128( \
xmm0 = _mm512_xor_si512( xmm0, mm512_bcast_m128( \
casti_m128i( round_const_p, round_counter+1 ) ) ); \
/* ShiftBytes P1024 + pre-AESENCLAST */\
xmm0 = _mm512_shuffle_epi8( xmm0, SUBSH_MASK0 );\
@@ -282,7 +282,7 @@ static const __m512i SUBSH_MASK7 = { 0x06090c0f0205080b, 0x0e0104070a0d0003,
xmm12 = _mm512_xor_si512( xmm12, xmm1 );\
xmm13 = _mm512_xor_si512( xmm13, xmm1 );\
xmm14 = _mm512_xor_si512( xmm14, xmm1 );\
xmm15 = _mm512_xor_si512( xmm15, m512_const1_128( \
xmm15 = _mm512_xor_si512( xmm15, mm512_bcast_m128( \
casti_m128i( round_const_q, round_counter ) ) ); \
/* ShiftBytes Q1024 + pre-AESENCLAST */\
xmm8 = _mm512_shuffle_epi8( xmm8, SUBSH_MASK1 );\
@@ -305,7 +305,7 @@ static const __m512i SUBSH_MASK7 = { 0x06090c0f0205080b, 0x0e0104070a0d0003,
xmm4 = _mm512_xor_si512( xmm4, xmm9 );\
xmm5 = _mm512_xor_si512( xmm5, xmm9 );\
xmm6 = _mm512_xor_si512( xmm6, xmm9 );\
xmm7 = _mm512_xor_si512( xmm7, m512_const1_128( \
xmm7 = _mm512_xor_si512( xmm7, mm512_bcast_m128( \
casti_m128i( round_const_q, round_counter+1 ) ) ); \
/* ShiftBytes Q1024 + pre-AESENCLAST */\
xmm0 = _mm512_shuffle_epi8( xmm0, SUBSH_MASK1 );\
@@ -471,8 +471,8 @@ static const __m512i SUBSH_MASK7 = { 0x06090c0f0205080b, 0x0e0104070a0d0003,
void INIT_4way( __m512i* chaining )
{
static __m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
/* load IV into registers xmm8 - xmm15 */
xmm8 = chaining[0];
@@ -500,12 +500,12 @@ void INIT_4way( __m512i* chaining )
void TF1024_4way( __m512i* chaining, const __m512i* message )
{
static __m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
static __m512i QTEMP[8];
static __m512i TEMP0;
static __m512i TEMP1;
static __m512i TEMP2;
__m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m512i QTEMP[8];
__m512i TEMP0;
__m512i TEMP1;
__m512i TEMP2;
/* load message into registers xmm8 - xmm15 (Q = message) */
xmm8 = message[0];
@@ -606,11 +606,11 @@ void TF1024_4way( __m512i* chaining, const __m512i* message )
void OF1024_4way( __m512i* chaining )
{
static __m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
static __m512i TEMP0;
static __m512i TEMP1;
static __m512i TEMP2;
__m512i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m512i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m512i TEMP0;
__m512i TEMP1;
__m512i TEMP2;
/* load CV into registers xmm8 - xmm15 */
xmm8 = chaining[0];
@@ -710,7 +710,7 @@ static const __m256i SUBSH_MASK7_2WAY =
b0 = a2;\
a1 = _mm256_xor_si256(a1, a2);\
b1 = a3;\
a2 = _mm256_xor_si256(a2, a3);\
TEMP2 = _mm256_xor_si256(a2, a3);\
b2 = a4;\
a3 = _mm256_xor_si256(a3, a4);\
b3 = a5;\
@@ -722,34 +722,23 @@ static const __m256i SUBSH_MASK7_2WAY =
a7 = _mm256_xor_si256(a7, b6);\
\
/* build y4 y5 y6 ... in regs xmm8, xmm9, xmm10 by adding t_i*/\
b0 = _mm256_xor_si256(b0, a4);\
b6 = _mm256_xor_si256(b6, a4);\
b1 = _mm256_xor_si256(b1, a5);\
b7 = _mm256_xor_si256(b7, a5);\
b2 = _mm256_xor_si256(b2, a6);\
b0 = _mm256_xor_si256(b0, a6);\
TEMP0 = mm256_xor3( b0, a4, a6 ); \
/* spill values y_4, y_5 to memory */\
TEMP0 = b0;\
b3 = _mm256_xor_si256(b3, a7);\
b1 = _mm256_xor_si256(b1, a7);\
TEMP1 = b1;\
b4 = _mm256_xor_si256(b4, a0);\
b2 = _mm256_xor_si256(b2, a0);\
TEMP1 = mm256_xor3( b1, a5, a7 ); \
b2 = mm256_xor3( b2, a6, a0 ); \
/* save values t0, t1, t2 to xmm8, xmm9 and memory */\
b0 = a0;\
b5 = _mm256_xor_si256(b5, a1);\
b3 = _mm256_xor_si256(b3, a1);\
b1 = a1;\
b6 = _mm256_xor_si256(b6, a2);\
b4 = _mm256_xor_si256(b4, a2);\
TEMP2 = a2;\
b7 = _mm256_xor_si256(b7, a3);\
b5 = _mm256_xor_si256(b5, a3);\
b0 = a0; \
b3 = mm256_xor3( b3, a7, a1 ); \
b1 = a1; \
b6 = mm256_xor3( b6, a4, TEMP2 ); \
b4 = mm256_xor3( b4, a0, TEMP2 ); \
b7 = mm256_xor3( b7, a5, a3 ); \
b5 = mm256_xor3( b5, a1, a3 ); \
\
/* compute x_i = t_i + t_{i+3} */\
a0 = _mm256_xor_si256(a0, a3);\
a1 = _mm256_xor_si256(a1, a4);\
a2 = _mm256_xor_si256(a2, a5);\
a2 = _mm256_xor_si256( TEMP2, a5);\
a3 = _mm256_xor_si256(a3, a6);\
a4 = _mm256_xor_si256(a4, a7);\
a5 = _mm256_xor_si256(a5, b0);\
@@ -758,7 +747,7 @@ static const __m256i SUBSH_MASK7_2WAY =
\
/* compute z_i : double x_i using temp xmm8 and 1B xmm9 */\
/* compute w_i : add y_{i+4} */\
b1 = m256_const1_64( 0x1b1b1b1b1b1b1b1b );\
b1 = _mm256_set1_epi64x( 0x1b1b1b1b1b1b1b1b );\
MUL2_2WAY(a0, b0, b1);\
a0 = _mm256_xor_si256(a0, TEMP0);\
MUL2_2WAY(a1, b0, b1);\
@@ -822,7 +811,7 @@ static const __m256i SUBSH_MASK7_2WAY =
for ( round_counter = 0; round_counter < 14; round_counter += 2 ) \
{ \
/* AddRoundConstant P1024 */\
xmm8 = _mm256_xor_si256( xmm8, m256_const1_128( \
xmm8 = _mm256_xor_si256( xmm8, mm256_bcast_m128( \
casti_m128i( round_const_p, round_counter ) ) ); \
/* ShiftBytes P1024 + pre-AESENCLAST */\
xmm8 = _mm256_shuffle_epi8( xmm8, SUBSH_MASK0_2WAY ); \
@@ -837,7 +826,7 @@ static const __m256i SUBSH_MASK7_2WAY =
SUBMIX_2WAY(xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15, xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7);\
\
/* AddRoundConstant P1024 */\
xmm0 = _mm256_xor_si256( xmm0, m256_const1_128( \
xmm0 = _mm256_xor_si256( xmm0, mm256_bcast_m128( \
casti_m128i( round_const_p, round_counter+1 ) ) ); \
/* ShiftBytes P1024 + pre-AESENCLAST */\
xmm0 = _mm256_shuffle_epi8( xmm0, SUBSH_MASK0_2WAY );\
@@ -866,7 +855,7 @@ static const __m256i SUBSH_MASK7_2WAY =
xmm12 = _mm256_xor_si256( xmm12, xmm1 );\
xmm13 = _mm256_xor_si256( xmm13, xmm1 );\
xmm14 = _mm256_xor_si256( xmm14, xmm1 );\
xmm15 = _mm256_xor_si256( xmm15, m256_const1_128( \
xmm15 = _mm256_xor_si256( xmm15, mm256_bcast_m128( \
casti_m128i( round_const_q, round_counter ) ) ); \
/* ShiftBytes Q1024 + pre-AESENCLAST */\
xmm8 = _mm256_shuffle_epi8( xmm8, SUBSH_MASK1_2WAY );\
@@ -889,7 +878,7 @@ static const __m256i SUBSH_MASK7_2WAY =
xmm4 = _mm256_xor_si256( xmm4, xmm9 );\
xmm5 = _mm256_xor_si256( xmm5, xmm9 );\
xmm6 = _mm256_xor_si256( xmm6, xmm9 );\
xmm7 = _mm256_xor_si256( xmm7, m256_const1_128( \
xmm7 = _mm256_xor_si256( xmm7, mm256_bcast_m128( \
casti_m128i( round_const_q, round_counter+1 ) ) ); \
/* ShiftBytes Q1024 + pre-AESENCLAST */\
xmm0 = _mm256_shuffle_epi8( xmm0, SUBSH_MASK1_2WAY );\
@@ -1040,8 +1029,8 @@ static const __m256i SUBSH_MASK7_2WAY =
void INIT_2way( __m256i *chaining )
{
static __m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
/* load IV into registers xmm8 - xmm15 */
xmm8 = chaining[0];
@@ -1069,12 +1058,12 @@ void INIT_2way( __m256i *chaining )
void TF1024_2way( __m256i *chaining, const __m256i *message )
{
static __m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
static __m256i QTEMP[8];
static __m256i TEMP0;
static __m256i TEMP1;
static __m256i TEMP2;
__m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m256i QTEMP[8];
__m256i TEMP0;
__m256i TEMP1;
__m256i TEMP2;
/* load message into registers xmm8 - xmm15 (Q = message) */
xmm8 = message[0];
@@ -1175,11 +1164,11 @@ void TF1024_2way( __m256i *chaining, const __m256i *message )
void OF1024_2way( __m256i* chaining )
{
static __m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
static __m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
static __m256i TEMP0;
static __m256i TEMP1;
static __m256i TEMP2;
__m256i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7;
__m256i xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
__m256i TEMP0;
__m256i TEMP1;
__m256i TEMP2;
/* load CV into registers xmm8 - xmm15 */
xmm8 = chaining[0];

View File

@@ -73,11 +73,11 @@ int scanhash_myriad( struct work *work, uint32_t max_nonce,
be32enc(&endiandata[19], nonce);
myriad_hash(hash, endiandata);
if (hash[7] <= Htarg && fulltest(hash, ptarget))
if (hash[7] <= Htarg )
if ( fulltest(hash, ptarget) && !opt_benchmark )
{
pdata[19] = nonce;
*hashes_done = pdata[19] - first_nonce;
return 1;
submit_solution( work, hash, mythr );
}
nonce++;

View File

@@ -4,7 +4,7 @@
#include <stdint.h>
#include <string.h>
#include "aes_ni/hash-groestl.h"
#include "algo/sha/sha-hash-4way.h"
#include "algo/sha/sha256-hash.h"
#if defined(__VAES__)
#include "groestl512-hash-4way.h"
#endif

View File

@@ -40,7 +40,7 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
#if !defined(__AES__)
/**

File diff suppressed because it is too large Load Diff

View File

@@ -36,44 +36,64 @@
#define HAMSI_4WAY_H__
#include <stddef.h>
#include "algo/sha/sph_types.h"
#if defined (__AVX2__)
#include "simd-utils.h"
#ifdef __cplusplus
extern "C"{
#endif
#define SPH_SIZE_hamsi512 512
// Hamsi-512 4x64
// Partial is only scalar but needs pointer ref for hamsi-helper
// deprecate partial_len
typedef struct {
typedef struct
{
__m256i h[8];
__m256i buf[1];
size_t partial_len;
sph_u32 count_high, count_low;
uint32_t count_high, count_low;
} hamsi_4way_big_context;
typedef hamsi_4way_big_context hamsi512_4way_context;
void hamsi512_4way_init( hamsi512_4way_context *sc );
void hamsi512_4way_update( hamsi512_4way_context *sc, const void *data,
size_t len );
//#define hamsi512_4way hamsi512_4way_update
void hamsi512_4way_close( hamsi512_4way_context *sc, void *dst );
#define hamsi512_4x64_context hamsi512_4way_context
#define hamsi512_4x64_init hamsi512_4way_init
#define hamsi512_4x64_update hamsi512_4way_update
#define hamsi512_4x64_close hamsi512_4way_close
// Hamsi-512 8x32
typedef struct
{
__m256i h[16];
__m256i buf[2];
size_t partial_len;
uint32_t count_high, count_low;
} hamsi_8x32_big_context;
typedef hamsi_8x32_big_context hamsi512_8x32_context;
void hamsi512_8x32_init( hamsi512_8x32_context *sc );
void hamsi512_8x32_update( hamsi512_8x32_context *sc, const void *data,
size_t len );
void hamsi512_8x32_close( hamsi512_8x32_context *sc, void *dst );
void hamsi512_8x32_full( hamsi512_8x32_context *sc, void *dst, const void *data,
size_t len );
#endif
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
// Hamsi-512 8x64
typedef struct {
__m512i h[8];
__m512i buf[1];
size_t partial_len;
sph_u32 count_high, count_low;
uint32_t count_high, count_low;
} hamsi_8way_big_context;
typedef hamsi_8way_big_context hamsi512_8way_context;
void hamsi512_8way_init( hamsi512_8way_context *sc );
@@ -81,15 +101,29 @@ void hamsi512_8way_update( hamsi512_8way_context *sc, const void *data,
size_t len );
void hamsi512_8way_close( hamsi512_8way_context *sc, void *dst );
#define hamsi512_8x64_context hamsi512_8way_context
#define hamsi512_8x64_init hamsi512_8way_init
#define hamsi512_8x64_update hamsi512_8way_update
#define hamsi512_8x64_close hamsi512_8way_close
#endif
#ifdef __cplusplus
}
#endif
#endif
// Hamsi-512 16x32
typedef struct
{
__m512i h[16];
__m512i buf[2];
size_t partial_len;
uint32_t count_high, count_low;
} hamsi_16x32_big_context;
typedef hamsi_16x32_big_context hamsi512_16x32_context;
void hamsi512_16x32_init( hamsi512_16x32_context *sc );
void hamsi512_16x32_update( hamsi512_16x32_context *sc, const void *data,
size_t len );
void hamsi512_16way_close( hamsi512_16x32_context *sc, void *dst );
void hamsi512_16x32_full( hamsi512_16x32_context *sc, void *dst,
const void *data, size_t len );
#endif // AVX512
#endif

View File

@@ -36,7 +36,7 @@
#define SPH_HAMSI_H__
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
#ifdef __cplusplus
extern "C"{

View File

@@ -0,0 +1,115 @@
/* $Id: haval_helper.c 218 2010-06-08 17:06:34Z tp $ */
/*
* Helper code, included (three times !) by HAVAL implementation.
*
* TODO: try to merge this with md_helper.c.
*
* ==========================(LICENSE BEGIN)============================
*
* Copyright (c) 2007-2010 Projet RNRT SAPHIR
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* ===========================(LICENSE END)=============================
*
* @author Thomas Pornin <thomas.pornin@cryptolog.com>
*/
#undef SPH_XCAT
#define SPH_XCAT(a, b) SPH_XCAT_(a, b)
#undef SPH_XCAT_
#define SPH_XCAT_(a, b) a ## b
static void
SPH_XCAT(SPH_XCAT(haval, PASSES), _16way_update)
( haval_16way_context *sc, const void *data, size_t len )
{
__m512i *vdata = (__m512i*)data;
unsigned current;
current = (unsigned)sc->count_low & 127U;
while ( len > 0 )
{
unsigned clen;
uint32_t clow, clow2;
clen = 128U - current;
if ( clen > len )
clen = len;
memcpy_512( sc->buf + (current>>2), vdata, clen>>2 );
vdata += clen>>2;
current += clen;
len -= clen;
if ( current == 128U )
{
DSTATE_16W;
IN_PREPARE_16W(sc->buf);
RSTATE_16W;
SPH_XCAT(CORE_16W, PASSES)(INW_16W);
WSTATE_16W;
current = 0;
}
clow = sc->count_low;
clow2 = clow + clen;
sc->count_low = clow2;
if ( clow2 < clow )
sc->count_high ++;
}
}
static void
SPH_XCAT(SPH_XCAT(haval, PASSES), _16way_close)( haval_16way_context *sc,
void *dst)
{
unsigned current;
DSTATE_16W;
current = (unsigned)sc->count_low & 127UL;
sc->buf[ current>>2 ] = v512_32( 1 );
current += 4;
RSTATE_16W;
if ( current > 116UL )
{
memset_zero_512( sc->buf + ( current>>2 ), (128UL-current) >> 2 );
do
{
IN_PREPARE_16W(sc->buf);
SPH_XCAT(CORE_16W, PASSES)(INW_16W);
} while (0);
current = 0;
}
uint32_t t1, t2;
memset_zero_512( sc->buf + ( current>>2 ), (116UL-current) >> 2 );
t1 = 0x01 | (PASSES << 3);
t2 = sc->olen << 3;
sc->buf[ 116>>2 ] = v512_32( ( t1 << 16 ) | ( t2 << 24 ) );
sc->buf[ 120>>2 ] = v512_32( sc->count_low << 3 );
sc->buf[ 124>>2 ] = v512_32( (sc->count_high << 3)
| (sc->count_low >> 29) );
do
{
IN_PREPARE_16W(sc->buf);
SPH_XCAT(CORE_16W, PASSES)(INW_16W);
} while (0);
WSTATE_16W;
haval_16way_out( sc, dst );
}

View File

@@ -48,7 +48,7 @@ SPH_XCAT(SPH_XCAT(haval, PASSES), _4way_update)
while ( len > 0 )
{
unsigned clen;
sph_u32 clow, clow2;
uint32_t clow, clow2;
clen = 128U - current;
if ( clen > len )
@@ -67,7 +67,7 @@ SPH_XCAT(SPH_XCAT(haval, PASSES), _4way_update)
current = 0;
}
clow = sc->count_low;
clow2 = SPH_T32(clow + clen);
clow2 = clow + clen;
sc->count_low = clow2;
if ( clow2 < clow )
sc->count_high ++;
@@ -83,7 +83,7 @@ SPH_XCAT(SPH_XCAT(haval, PASSES), _4way_close)( haval_4way_context *sc,
current = (unsigned)sc->count_low & 127UL;
sc->buf[ current>>2 ] = m128_one_32;
sc->buf[ current>>2 ] = v128_32( 1 );
current += 4;
RSTATE;
if ( current > 116UL )

View File

@@ -83,7 +83,7 @@ SPH_XCAT(SPH_XCAT(haval, PASSES), _8way_close)( haval_8way_context *sc,
current = (unsigned)sc->count_low & 127UL;
sc->buf[ current>>2 ] = m256_one_32;
sc->buf[ current>>2 ] = v256_32( 1 );
current += 4;
RSTATE_8W;
if ( current > 116UL )
@@ -101,9 +101,9 @@ SPH_XCAT(SPH_XCAT(haval, PASSES), _8way_close)( haval_8way_context *sc,
memset_zero_256( sc->buf + ( current>>2 ), (116UL-current) >> 2 );
t1 = 0x01 | (PASSES << 3);
t2 = sc->olen << 3;
sc->buf[ 116>>2 ] = _mm256_set1_epi32( ( t1 << 16 ) | ( t2 << 24 ) );
sc->buf[ 120>>2 ] = _mm256_set1_epi32( sc->count_low << 3 );
sc->buf[ 124>>2 ] = _mm256_set1_epi32( (sc->count_high << 3)
sc->buf[ 116>>2 ] = v256_32( ( t1 << 16 ) | ( t2 << 24 ) );
sc->buf[ 120>>2 ] = v256_32( sc->count_low << 3 );
sc->buf[ 124>>2 ] = v256_32( (sc->count_high << 3)
| (sc->count_low >> 29) );
do
{

View File

@@ -52,6 +52,56 @@ extern "C"{
#define SPH_SMALL_FOOTPRINT_HAVAL 1
//#endif
#if defined(__AVX512VL__)
// ( ~( a ^ b ) ) & c
#define mm128_andnotxor( a, b, c ) \
_mm_ternarylogic_epi32( a, b, c, 0x82 )
#else
#define mm128_andnotxor( a, b, c ) \
_mm_andnot_si128( _mm_xor_si128( a, b ), c )
#endif
#define F1(x6, x5, x4, x3, x2, x1, x0) \
mm128_xor3( x0, mm128_andxor( x1, x0, x4 ), \
_mm_xor_si128( _mm_and_si128( x2, x5 ), \
_mm_and_si128( x3, x6 ) ) ) \
#define F2(x6, x5, x4, x3, x2, x1, x0) \
mm128_xor3( mm128_andxor( x2, _mm_andnot_si128( x3, x1 ), \
mm128_xor3( _mm_and_si128( x4, x5 ), x6, x0 ) ), \
mm128_andxor( x4, x1, x5 ), \
mm128_xorand( x0, x3, x5 ) ) \
#define F3(x6, x5, x4, x3, x2, x1, x0) \
mm128_xor3( x0, \
_mm_and_si128( x3, \
mm128_xor3( _mm_and_si128( x1, x2 ), x6, x0 ) ), \
_mm_xor_si128( _mm_and_si128( x1, x4 ), \
_mm_and_si128( x2, x5 ) ) )
#define F4(x6, x5, x4, x3, x2, x1, x0) \
mm128_xor3( \
mm128_andxor( x3, x5, \
_mm_xor_si128( _mm_and_si128( x1, x2 ), \
_mm_or_si128( x4, x6 ) ) ), \
_mm_and_si128( x4, \
mm128_xor3( x0, _mm_andnot_si128( x2, x5 ), \
_mm_xor_si128( x1, x6 ) ) ), \
mm128_xorand( x0, x2, x6 ) )
#define F5(x6, x5, x4, x3, x2, x1, x0) \
_mm_xor_si128( \
mm128_andnotxor( mm128_and3( x1, x2, x3 ), x5, x0 ), \
mm128_xor3( _mm_and_si128( x1, x4 ), \
_mm_and_si128( x2, x5 ), \
_mm_and_si128( x3, x6 ) ) )
/*
#define F1(x6, x5, x4, x3, x2, x1, x0) \
_mm_xor_si128( x0, \
_mm_xor_si128( _mm_and_si128(_mm_xor_si128( x0, x4 ), x1 ), \
@@ -96,6 +146,7 @@ extern "C"{
_mm_xor_si128( _mm_xor_si128( _mm_and_si128( x1, x4 ), \
_mm_and_si128( x2, x5 ) ), \
_mm_and_si128( x3, x6 ) ) )
*/
/*
* The macros below integrate the phi() permutations, depending on the
@@ -138,7 +189,14 @@ do { \
__m128i t = FP ## n ## _ ## p(x6, x5, x4, x3, x2, x1, x0); \
x7 = _mm_add_epi32( _mm_add_epi32( mm128_ror_32( t, 7 ), \
mm128_ror_32( x7, 11 ) ), \
_mm_add_epi32( w, _mm_set1_epi32( c ) ) ); \
_mm_add_epi32( w, v128_32( c ) ) ); \
} while (0)
#define STEP1(n, p, x7, x6, x5, x4, x3, x2, x1, x0, w) \
do { \
__m128i t = FP ## n ## _ ## p(x6, x5, x4, x3, x2, x1, x0); \
x7 = _mm_add_epi32( _mm_add_epi32( mm128_ror_32( t, 7 ), \
mm128_ror_32( x7, 11 ) ), w ); \
} while (0)
/*
@@ -152,22 +210,22 @@ do { \
#define PASS1(n, in) do { \
unsigned pass_count; \
for (pass_count = 0; pass_count < 32; pass_count += 8) { \
STEP(n, 1, s7, s6, s5, s4, s3, s2, s1, s0, \
in(pass_count + 0), SPH_C32(0x00000000)); \
STEP(n, 1, s6, s5, s4, s3, s2, s1, s0, s7, \
in(pass_count + 1), SPH_C32(0x00000000)); \
STEP(n, 1, s5, s4, s3, s2, s1, s0, s7, s6, \
in(pass_count + 2), SPH_C32(0x00000000)); \
STEP(n, 1, s4, s3, s2, s1, s0, s7, s6, s5, \
in(pass_count + 3), SPH_C32(0x00000000)); \
STEP(n, 1, s3, s2, s1, s0, s7, s6, s5, s4, \
in(pass_count + 4), SPH_C32(0x00000000)); \
STEP(n, 1, s2, s1, s0, s7, s6, s5, s4, s3, \
in(pass_count + 5), SPH_C32(0x00000000)); \
STEP(n, 1, s1, s0, s7, s6, s5, s4, s3, s2, \
in(pass_count + 6), SPH_C32(0x00000000)); \
STEP(n, 1, s0, s7, s6, s5, s4, s3, s2, s1, \
in(pass_count + 7), SPH_C32(0x00000000)); \
STEP1(n, 1, s7, s6, s5, s4, s3, s2, s1, s0, \
in(pass_count + 0) ); \
STEP1(n, 1, s6, s5, s4, s3, s2, s1, s0, s7, \
in(pass_count + 1) ); \
STEP1(n, 1, s5, s4, s3, s2, s1, s0, s7, s6, \
in(pass_count + 2) ); \
STEP1(n, 1, s4, s3, s2, s1, s0, s7, s6, s5, \
in(pass_count + 3) ); \
STEP1(n, 1, s3, s2, s1, s0, s7, s6, s5, s4, \
in(pass_count + 4) ); \
STEP1(n, 1, s2, s1, s0, s7, s6, s5, s4, s3, \
in(pass_count + 5) ); \
STEP1(n, 1, s1, s0, s7, s6, s5, s4, s3, s2, \
in(pass_count + 6) ); \
STEP1(n, 1, s0, s7, s6, s5, s4, s3, s2, s1, \
in(pass_count + 7) ); \
} \
} while (0)
@@ -234,7 +292,9 @@ static const unsigned MP5[32] = {
2, 23, 16, 22, 4, 1, 25, 15
};
static const sph_u32 RK2[32] = {
#define SPH_C32(x) (x)
static const uint32_t RK2[32] = {
SPH_C32(0x452821E6), SPH_C32(0x38D01377),
SPH_C32(0xBE5466CF), SPH_C32(0x34E90C6C),
SPH_C32(0xC0AC29B7), SPH_C32(0xC97C50DD),
@@ -253,7 +313,7 @@ static const sph_u32 RK2[32] = {
SPH_C32(0x7B54A41D), SPH_C32(0xC25A59B5)
};
static const sph_u32 RK3[32] = {
static const uint32_t RK3[32] = {
SPH_C32(0x9C30D539), SPH_C32(0x2AF26013),
SPH_C32(0xC5D1B023), SPH_C32(0x286085F0),
SPH_C32(0xCA417918), SPH_C32(0xB8DB38EF),
@@ -272,7 +332,7 @@ static const sph_u32 RK3[32] = {
SPH_C32(0xAFD6BA33), SPH_C32(0x6C24CF5C)
};
static const sph_u32 RK4[32] = {
static const uint32_t RK4[32] = {
SPH_C32(0x7A325381), SPH_C32(0x28958677),
SPH_C32(0x3B8F4898), SPH_C32(0x6B4BB9AF),
SPH_C32(0xC4BFE81B), SPH_C32(0x66282193),
@@ -291,7 +351,7 @@ static const sph_u32 RK4[32] = {
SPH_C32(0x6EEF0B6C), SPH_C32(0x137A3BE4)
};
static const sph_u32 RK5[32] = {
static const uint32_t RK5[32] = {
SPH_C32(0xBA3BF050), SPH_C32(0x7EFB2A98),
SPH_C32(0xA1F1651D), SPH_C32(0x39AF0176),
SPH_C32(0x66CA593E), SPH_C32(0x82430E88),
@@ -411,14 +471,14 @@ do { \
static void
haval_4way_init( haval_4way_context *sc, unsigned olen, unsigned passes )
{
sc->s0 = _mm_set1_epi32( 0x243F6A88UL );
sc->s1 = _mm_set1_epi32( 0x85A308D3UL );
sc->s2 = _mm_set1_epi32( 0x13198A2EUL );
sc->s3 = _mm_set1_epi32( 0x03707344UL );
sc->s4 = _mm_set1_epi32( 0xA4093822UL );
sc->s5 = _mm_set1_epi32( 0x299F31D0UL );
sc->s6 = _mm_set1_epi32( 0x082EFA98UL );
sc->s7 = _mm_set1_epi32( 0xEC4E6C89UL );
sc->s0 = v128_32( 0x243F6A88UL );
sc->s1 = v128_32( 0x85A308D3UL );
sc->s2 = v128_32( 0x13198A2EUL );
sc->s3 = v128_32( 0x03707344UL );
sc->s4 = v128_32( 0xA4093822UL );
sc->s5 = v128_32( 0x299F31D0UL );
sc->s6 = v128_32( 0x082EFA98UL );
sc->s7 = v128_32( 0xEC4E6C89UL );
sc->olen = olen;
sc->passes = passes;
sc->count_high = 0;
@@ -602,28 +662,35 @@ do { \
__m256i t = FP ## n ## _ ## p ## _8W(x6, x5, x4, x3, x2, x1, x0); \
x7 = _mm256_add_epi32( _mm256_add_epi32( mm256_ror_32( t, 7 ), \
mm256_ror_32( x7, 11 ) ), \
_mm256_add_epi32( w, _mm256_set1_epi32( c ) ) ); \
_mm256_add_epi32( w, v256_32( c ) ) ); \
} while (0)
#define STEP1_8W(n, p, x7, x6, x5, x4, x3, x2, x1, x0, w) \
do { \
__m256i t = FP ## n ## _ ## p ## _8W(x6, x5, x4, x3, x2, x1, x0); \
x7 = _mm256_add_epi32( _mm256_add_epi32( mm256_ror_32( t, 7 ), \
mm256_ror_32( x7, 11 ) ), w ); \
} while (0)
#define PASS1_8W(n, in) do { \
unsigned pass_count; \
for (pass_count = 0; pass_count < 32; pass_count += 8) { \
STEP_8W(n, 1, s7, s6, s5, s4, s3, s2, s1, s0, \
in(pass_count + 0), SPH_C32(0x00000000)); \
STEP_8W(n, 1, s6, s5, s4, s3, s2, s1, s0, s7, \
in(pass_count + 1), SPH_C32(0x00000000)); \
STEP_8W(n, 1, s5, s4, s3, s2, s1, s0, s7, s6, \
in(pass_count + 2), SPH_C32(0x00000000)); \
STEP_8W(n, 1, s4, s3, s2, s1, s0, s7, s6, s5, \
in(pass_count + 3), SPH_C32(0x00000000)); \
STEP_8W(n, 1, s3, s2, s1, s0, s7, s6, s5, s4, \
in(pass_count + 4), SPH_C32(0x00000000)); \
STEP_8W(n, 1, s2, s1, s0, s7, s6, s5, s4, s3, \
in(pass_count + 5), SPH_C32(0x00000000)); \
STEP_8W(n, 1, s1, s0, s7, s6, s5, s4, s3, s2, \
in(pass_count + 6), SPH_C32(0x00000000)); \
STEP_8W(n, 1, s0, s7, s6, s5, s4, s3, s2, s1, \
in(pass_count + 7), SPH_C32(0x00000000)); \
STEP1_8W(n, 1, s7, s6, s5, s4, s3, s2, s1, s0, \
in(pass_count + 0) ); \
STEP1_8W(n, 1, s6, s5, s4, s3, s2, s1, s0, s7, \
in(pass_count + 1) ); \
STEP1_8W(n, 1, s5, s4, s3, s2, s1, s0, s7, s6, \
in(pass_count + 2) ); \
STEP1_8W(n, 1, s4, s3, s2, s1, s0, s7, s6, s5, \
in(pass_count + 3) ); \
STEP1_8W(n, 1, s3, s2, s1, s0, s7, s6, s5, s4, \
in(pass_count + 4) ); \
STEP1_8W(n, 1, s2, s1, s0, s7, s6, s5, s4, s3, \
in(pass_count + 5) ); \
STEP1_8W(n, 1, s1, s0, s7, s6, s5, s4, s3, s2, \
in(pass_count + 6) ); \
STEP1_8W(n, 1, s0, s7, s6, s5, s4, s3, s2, s1, \
in(pass_count + 7) ); \
} \
} while (0)
@@ -726,14 +793,14 @@ do { \
static void
haval_8way_init( haval_8way_context *sc, unsigned olen, unsigned passes )
{
sc->s0 = m256_const1_32( 0x243F6A88UL );
sc->s1 = m256_const1_32( 0x85A308D3UL );
sc->s2 = m256_const1_32( 0x13198A2EUL );
sc->s3 = m256_const1_32( 0x03707344UL );
sc->s4 = m256_const1_32( 0xA4093822UL );
sc->s5 = m256_const1_32( 0x299F31D0UL );
sc->s6 = m256_const1_32( 0x082EFA98UL );
sc->s7 = m256_const1_32( 0xEC4E6C89UL );
sc->s0 = v256_32( 0x243F6A88UL );
sc->s1 = v256_32( 0x85A308D3UL );
sc->s2 = v256_32( 0x13198A2EUL );
sc->s3 = v256_32( 0x03707344UL );
sc->s4 = v256_32( 0xA4093822UL );
sc->s5 = v256_32( 0x299F31D0UL );
sc->s6 = v256_32( 0x082EFA98UL );
sc->s7 = v256_32( 0xEC4E6C89UL );
sc->olen = olen;
sc->passes = passes;
sc->count_high = 0;
@@ -812,10 +879,300 @@ do { \
#define INMSG_8W(i) msg[i]
#endif // AVX2
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
// ( ~( a ^ b ) ) & c
#define mm512_andnotxor( a, b, c ) \
_mm512_ternarylogic_epi32( a, b, c, 0x82 )
#define F1_16W(x6, x5, x4, x3, x2, x1, x0) \
mm512_xor3( x0, mm512_andxor( x1, x0, x4 ), \
_mm512_xor_si512( _mm512_and_si512( x2, x5 ), \
_mm512_and_si512( x3, x6 ) ) ) \
#define F2_16W(x6, x5, x4, x3, x2, x1, x0) \
mm512_xor3( mm512_andxor( x2, _mm512_andnot_si512( x3, x1 ), \
mm512_xor3( _mm512_and_si512( x4, x5 ), x6, x0 ) ), \
mm512_andxor( x4, x1, x5 ), \
mm512_xorand( x0, x3, x5 ) ) \
#define F3_16W(x6, x5, x4, x3, x2, x1, x0) \
mm512_xor3( x0, \
_mm512_and_si512( x3, \
mm512_xor3( _mm512_and_si512( x1, x2 ), x6, x0 ) ), \
_mm512_xor_si512( _mm512_and_si512( x1, x4 ), \
_mm512_and_si512( x2, x5 ) ) )
#define F4_16W(x6, x5, x4, x3, x2, x1, x0) \
mm512_xor3( \
mm512_andxor( x3, x5, \
_mm512_xor_si512( _mm512_and_si512( x1, x2 ), \
_mm512_or_si512( x4, x6 ) ) ), \
_mm512_and_si512( x4, \
mm512_xor3( x0, _mm512_andnot_si512( x2, x5 ), \
_mm512_xor_si512( x1, x6 ) ) ), \
mm512_xorand( x0, x2, x6 ) )
#define F5_16W(x6, x5, x4, x3, x2, x1, x0) \
_mm512_xor_si512( \
mm512_andnotxor( mm512_and3( x1, x2, x3 ), x5, x0 ), \
mm512_xor3( _mm512_and_si512( x1, x4 ), \
_mm512_and_si512( x2, x5 ), \
_mm512_and_si512( x3, x6 ) ) )
#define FP3_1_16W(x6, x5, x4, x3, x2, x1, x0) \
F1_16W(x1, x0, x3, x5, x6, x2, x4)
#define FP3_2_16W(x6, x5, x4, x3, x2, x1, x0) \
F2_16W(x4, x2, x1, x0, x5, x3, x6)
#define FP3_3_16W(x6, x5, x4, x3, x2, x1, x0) \
F3_16W(x6, x1, x2, x3, x4, x5, x0)
#define FP4_1_16W(x6, x5, x4, x3, x2, x1, x0) \
F1_16W(x2, x6, x1, x4, x5, x3, x0)
#define FP4_2_16W(x6, x5, x4, x3, x2, x1, x0) \
F2_16W(x3, x5, x2, x0, x1, x6, x4)
#define FP4_3_16W(x6, x5, x4, x3, x2, x1, x0) \
F3_16W(x1, x4, x3, x6, x0, x2, x5)
#define FP4_4_16W(x6, x5, x4, x3, x2, x1, x0) \
F4_16W(x6, x4, x0, x5, x2, x1, x3)
#define FP5_1_16W(x6, x5, x4, x3, x2, x1, x0) \
F1_16W(x3, x4, x1, x0, x5, x2, x6)
#define FP5_2_16W(x6, x5, x4, x3, x2, x1, x0) \
F2_16W(x6, x2, x1, x0, x3, x4, x5)
#define FP5_3_16W(x6, x5, x4, x3, x2, x1, x0) \
F3_16W(x2, x6, x0, x4, x3, x1, x5)
#define FP5_4_16W(x6, x5, x4, x3, x2, x1, x0) \
F4_16W(x1, x5, x3, x2, x0, x4, x6)
#define FP5_5_16W(x6, x5, x4, x3, x2, x1, x0) \
F5_16W(x2, x5, x0, x6, x4, x3, x1)
#define STEP_16W(n, p, x7, x6, x5, x4, x3, x2, x1, x0, w, c) \
do { \
__m512i t = FP ## n ## _ ## p ## _16W(x6, x5, x4, x3, x2, x1, x0); \
x7 = _mm512_add_epi32( _mm512_add_epi32( mm512_ror_32( t, 7 ), \
mm512_ror_32( x7, 11 ) ), \
_mm512_add_epi32( w, v512_32( c ) ) ); \
} while (0)
#define STEP1_16W(n, p, x7, x6, x5, x4, x3, x2, x1, x0, w) \
do { \
__m512i t = FP ## n ## _ ## p ## _16W(x6, x5, x4, x3, x2, x1, x0); \
x7 = _mm512_add_epi32( _mm512_add_epi32( mm512_ror_32( t, 7 ), \
mm512_ror_32( x7, 11 ) ), w ); \
} while (0)
#define PASS1_16W(n, in) do { \
unsigned pass_count; \
for (pass_count = 0; pass_count < 32; pass_count += 8) { \
STEP1_16W(n, 1, s7, s6, s5, s4, s3, s2, s1, s0, \
in(pass_count + 0) ); \
STEP1_16W(n, 1, s6, s5, s4, s3, s2, s1, s0, s7, \
in(pass_count + 1) ); \
STEP1_16W(n, 1, s5, s4, s3, s2, s1, s0, s7, s6, \
in(pass_count + 2) ); \
STEP1_16W(n, 1, s4, s3, s2, s1, s0, s7, s6, s5, \
in(pass_count + 3) ); \
STEP1_16W(n, 1, s3, s2, s1, s0, s7, s6, s5, s4, \
in(pass_count + 4) ); \
STEP1_16W(n, 1, s2, s1, s0, s7, s6, s5, s4, s3, \
in(pass_count + 5) ); \
STEP1_16W(n, 1, s1, s0, s7, s6, s5, s4, s3, s2, \
in(pass_count + 6) ); \
STEP1_16W(n, 1, s0, s7, s6, s5, s4, s3, s2, s1, \
in(pass_count + 7) ); \
} \
} while (0)
#define PASSG_16W(p, n, in) do { \
unsigned pass_count; \
for (pass_count = 0; pass_count < 32; pass_count += 8) { \
STEP_16W(n, p, s7, s6, s5, s4, s3, s2, s1, s0, \
in(MP ## p[pass_count + 0]), \
RK ## p[pass_count + 0]); \
STEP_16W(n, p, s6, s5, s4, s3, s2, s1, s0, s7, \
in(MP ## p[pass_count + 1]), \
RK ## p[pass_count + 1]); \
STEP_16W(n, p, s5, s4, s3, s2, s1, s0, s7, s6, \
in(MP ## p[pass_count + 2]), \
RK ## p[pass_count + 2]); \
STEP_16W(n, p, s4, s3, s2, s1, s0, s7, s6, s5, \
in(MP ## p[pass_count + 3]), \
RK ## p[pass_count + 3]); \
STEP_16W(n, p, s3, s2, s1, s0, s7, s6, s5, s4, \
in(MP ## p[pass_count + 4]), \
RK ## p[pass_count + 4]); \
STEP_16W(n, p, s2, s1, s0, s7, s6, s5, s4, s3, \
in(MP ## p[pass_count + 5]), \
RK ## p[pass_count + 5]); \
STEP_16W(n, p, s1, s0, s7, s6, s5, s4, s3, s2, \
in(MP ## p[pass_count + 6]), \
RK ## p[pass_count + 6]); \
STEP_16W(n, p, s0, s7, s6, s5, s4, s3, s2, s1, \
in(MP ## p[pass_count + 7]), \
RK ## p[pass_count + 7]); \
} \
} while (0)
#define PASS2_16W(n, in) PASSG_16W(2, n, in)
#define PASS3_16W(n, in) PASSG_16W(3, n, in)
#define PASS4_16W(n, in) PASSG_16W(4, n, in)
#define PASS5_16W(n, in) PASSG_16W(5, n, in)
#define SAVE_STATE_16W \
__m512i u0, u1, u2, u3, u4, u5, u6, u7; \
do { \
u0 = s0; \
u1 = s1; \
u2 = s2; \
u3 = s3; \
u4 = s4; \
u5 = s5; \
u6 = s6; \
u7 = s7; \
} while (0)
#define UPDATE_STATE_16W \
do { \
s0 = _mm512_add_epi32( s0, u0 ); \
s1 = _mm512_add_epi32( s1, u1 ); \
s2 = _mm512_add_epi32( s2, u2 ); \
s3 = _mm512_add_epi32( s3, u3 ); \
s4 = _mm512_add_epi32( s4, u4 ); \
s5 = _mm512_add_epi32( s5, u5 ); \
s6 = _mm512_add_epi32( s6, u6 ); \
s7 = _mm512_add_epi32( s7, u7 ); \
} while (0)
#define CORE_16W5(in) do { \
SAVE_STATE_16W; \
PASS1_16W(5, in); \
PASS2_16W(5, in); \
PASS3_16W(5, in); \
PASS4_16W(5, in); \
PASS5_16W(5, in); \
UPDATE_STATE_16W; \
} while (0)
#define DSTATE_16W __m512i s0, s1, s2, s3, s4, s5, s6, s7
#define RSTATE_16W \
do { \
s0 = sc->s0; \
s1 = sc->s1; \
s2 = sc->s2; \
s3 = sc->s3; \
s4 = sc->s4; \
s5 = sc->s5; \
s6 = sc->s6; \
s7 = sc->s7; \
} while (0)
#define WSTATE_16W \
do { \
sc->s0 = s0; \
sc->s1 = s1; \
sc->s2 = s2; \
sc->s3 = s3; \
sc->s4 = s4; \
sc->s5 = s5; \
sc->s6 = s6; \
sc->s7 = s7; \
} while (0)
static void
haval_16way_init( haval_16way_context *sc, unsigned olen, unsigned passes )
{
sc->s0 = v512_32( 0x243F6A88UL );
sc->s1 = v512_32( 0x85A308D3UL );
sc->s2 = v512_32( 0x13198A2EUL );
sc->s3 = v512_32( 0x03707344UL );
sc->s4 = v512_32( 0xA4093822UL );
sc->s5 = v512_32( 0x299F31D0UL );
sc->s6 = v512_32( 0x082EFA98UL );
sc->s7 = v512_32( 0xEC4E6C89UL );
sc->olen = olen;
sc->passes = passes;
sc->count_high = 0;
sc->count_low = 0;
}
#define IN_PREPARE_16W(indata) const __m512i *const load_ptr_16w = (indata)
#define INW_16W(i) load_ptr_16w[ i ]
static void
haval_16way_out( haval_16way_context *sc, void *dst )
{
__m512i *buf = (__m512i*)dst;
DSTATE_16W;
RSTATE_16W;
buf[0] = s0;
buf[1] = s1;
buf[2] = s2;
buf[3] = s3;
buf[4] = s4;
buf[5] = s5;
buf[6] = s6;
buf[7] = s7;
}
#undef PASSES
#define PASSES 5
#include "haval-16way-helper.c"
#define API_16W(xxx, y) \
void \
haval ## xxx ## _ ## y ## _16way_init(void *cc) \
{ \
haval_16way_init(cc, xxx >> 5, y); \
} \
\
void \
haval ## xxx ## _ ## y ## _16way_update (void *cc, const void *data, size_t len) \
{ \
haval ## y ## _16way_update(cc, data, len); \
} \
\
void \
haval ## xxx ## _ ## y ## _16way_close(void *cc, void *dst) \
{ \
haval ## y ## _16way_close(cc, dst); \
} \
API_16W(256, 5)
#define RVAL_16W \
do { \
s0 = val[0]; \
s1 = val[1]; \
s2 = val[2]; \
s3 = val[3]; \
s4 = val[4]; \
s5 = val[5]; \
s6 = val[6]; \
s7 = val[7]; \
} while (0)
#define WVAL_16W \
do { \
val[0] = s0; \
val[1] = s1; \
val[2] = s2; \
val[3] = s3; \
val[4] = s4; \
val[5] = s5; \
val[6] = s6; \
val[7] = s7; \
} while (0)
#define INMSG_16W(i) msg[i]
#endif
#ifdef __cplusplus
}
#endif

View File

@@ -68,7 +68,6 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "simd-utils.h"
#define SPH_SIZE_haval256_5 256
@@ -77,7 +76,7 @@ typedef struct {
__m128i buf[32];
__m128i s0, s1, s2, s3, s4, s5, s6, s7;
unsigned olen, passes;
sph_u32 count_high, count_low;
uint32_t count_high, count_low;
} haval_4way_context;
typedef haval_4way_context haval256_5_4way_context;
@@ -108,6 +107,25 @@ void haval256_5_8way_close( void *cc, void *dst );
#endif // AVX2
#if defined(__AVX512F__) && defined(__AVX512VL__) && defined(__AVX512DQ__) && defined(__AVX512BW__)
typedef struct {
__m512i buf[32];
__m512i s0, s1, s2, s3, s4, s5, s6, s7;
unsigned olen, passes;
uint32_t count_high, count_low;
} haval_16way_context __attribute__ ((aligned (64)));
typedef haval_16way_context haval256_5_16way_context;
void haval256_5_16way_init( void *cc );
void haval256_5_16way_update( void *cc, const void *data, size_t len );
void haval256_5_16way_close( void *cc, void *dst );
#endif // AVX512
#ifdef __cplusplus
}
#endif

View File

@@ -66,7 +66,7 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
/**
* Output size (in bits) for HAVAL-128/3.

View File

@@ -1,382 +0,0 @@
/*
* HEFTY1 cryptographic hash function
*
* Copyright (c) 2014, dbcc14 <BM-NBx4AKznJuyem3dArgVY8MGyABpihRy5>
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
* ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* The views and conclusions contained in the software and documentation are those
* of the authors and should not be interpreted as representing official policies,
* either expressed or implied, of the FreeBSD Project.
*/
#include <assert.h>
#include <string.h>
#ifdef _MSC_VER
#define inline __inline
#endif
#include "sph_hefty1.h"
#define Min(A, B) (A <= B ? A : B)
#define RoundFunc(ctx, A, B, C, D, E, F, G, H, W, K) \
{ \
/* To thwart parallelism, Br modifies itself each time it's \
* called. This also means that calling it in different \
* orders yeilds different results. In C the order of \
* evaluation of function arguments and + operands are \
* unspecified (and depends on the compiler), so we must make \
* the order of Br calls explicit. \
*/ \
uint32_t brG = Br(ctx, G); \
uint32_t tmp1 = Ch(E, Br(ctx, F), brG) + H + W + K; \
uint32_t tmp2 = tmp1 + Sigma1(Br(ctx, E)); \
uint32_t brC = Br(ctx, C); \
uint32_t brB = Br(ctx, B); \
uint32_t tmp3 = Ma(Br(ctx, A), brB, brC); \
uint32_t tmp4 = tmp3 + Sigma0(Br(ctx, A)); \
H = G; \
G = F; \
F = E; \
E = D + Br(ctx, tmp2); \
D = C; \
C = B; \
B = A; \
A = tmp2 + tmp4; \
} \
/* Nothing up my sleeve constants */
const static uint32_t K[64] = {
0x428a2f98UL, 0x71374491UL, 0xb5c0fbcfUL, 0xe9b5dba5UL,
0x3956c25bUL, 0x59f111f1UL, 0x923f82a4UL, 0xab1c5ed5UL,
0xd807aa98UL, 0x12835b01UL, 0x243185beUL, 0x550c7dc3UL,
0x72be5d74UL, 0x80deb1feUL, 0x9bdc06a7UL, 0xc19bf174UL,
0xe49b69c1UL, 0xefbe4786UL, 0x0fc19dc6UL, 0x240ca1ccUL,
0x2de92c6fUL, 0x4a7484aaUL, 0x5cb0a9dcUL, 0x76f988daUL,
0x983e5152UL, 0xa831c66dUL, 0xb00327c8UL, 0xbf597fc7UL,
0xc6e00bf3UL, 0xd5a79147UL, 0x06ca6351UL, 0x14292967UL,
0x27b70a85UL, 0x2e1b2138UL, 0x4d2c6dfcUL, 0x53380d13UL,
0x650a7354UL, 0x766a0abbUL, 0x81c2c92eUL, 0x92722c85UL,
0xa2bfe8a1UL, 0xa81a664bUL, 0xc24b8b70UL, 0xc76c51a3UL,
0xd192e819UL, 0xd6990624UL, 0xf40e3585UL, 0x106aa070UL,
0x19a4c116UL, 0x1e376c08UL, 0x2748774cUL, 0x34b0bcb5UL,
0x391c0cb3UL, 0x4ed8aa4aUL, 0x5b9cca4fUL, 0x682e6ff3UL,
0x748f82eeUL, 0x78a5636fUL, 0x84c87814UL, 0x8cc70208UL,
0x90befffaUL, 0xa4506cebUL, 0xbef9a3f7UL, 0xc67178f2UL
};
/* Initial hash values */
const static uint32_t H[HEFTY1_STATE_WORDS] = {
0x6a09e667UL,
0xbb67ae85UL,
0x3c6ef372UL,
0xa54ff53aUL,
0x510e527fUL,
0x9b05688cUL,
0x1f83d9abUL,
0x5be0cd19UL
};
static inline uint32_t Rr(uint32_t X, uint8_t n)
{
return (X >> n) | (X << (32 - n));
}
static inline uint32_t Ch(uint32_t E, uint32_t F, uint32_t G)
{
return (E & F) ^ (~E & G);
}
static inline uint32_t Sigma1(uint32_t E)
{
return Rr(E, 6) ^ Rr(E, 11) ^ Rr(E, 25);
}
static inline uint32_t sigma1(uint32_t X)
{
return Rr(X, 17) ^ Rr(X, 19) ^ (X >> 10);
}
static inline uint32_t Ma(uint32_t A, uint32_t B, uint32_t C)
{
return (A & B) ^ (A & C) ^ (B & C);
}
static inline uint32_t Sigma0(uint32_t A)
{
return Rr(A, 2) ^ Rr(A, 13) ^ Rr(A, 22);
}
static inline uint32_t sigma0(uint32_t X)
{
return Rr(X, 7) ^ Rr(X, 18) ^ (X >> 3);
}
static inline uint32_t Reverse32(uint32_t n)
{
#if BYTE_ORDER == LITTLE_ENDIAN
return n << 24 | (n & 0x0000ff00) << 8 | (n & 0x00ff0000) >> 8 | n >> 24;
#else
return n;
#endif
}
static inline uint64_t Reverse64(uint64_t n)
{
#if BYTE_ORDER == LITTLE_ENDIAN
uint32_t a = n >> 32;
uint32_t b = (n << 32) >> 32;
return (uint64_t)Reverse32(b) << 32 | Reverse32(a);
#else
return n;
#endif
}
/* Smoosh byte into nibble */
static inline uint8_t Smoosh4(uint8_t X)
{
return (X >> 4) ^ (X & 0xf);
}
/* Smoosh 32-bit word into 2-bits */
static inline uint8_t Smoosh2(uint32_t X)
{
uint16_t w = (X >> 16) ^ (X & 0xffff);
uint8_t n = Smoosh4((w >> 8) ^ (w & 0xff));
return (n >> 2) ^ (n & 0x3);
}
static void Mangle(uint32_t *S)
{
uint32_t *R = S;
uint32_t *C = &S[1];
uint8_t r0 = Smoosh4(R[0] >> 24);
uint8_t r1 = Smoosh4(R[0] >> 16);
uint8_t r2 = Smoosh4(R[0] >> 8);
uint8_t r3 = Smoosh4(R[0] & 0xff);
int i;
/* Diffuse */
uint32_t tmp = 0;
for (i = 0; i < HEFTY1_SPONGE_WORDS - 1; i++) {
uint8_t r = Smoosh2(tmp);
switch (r) {
case 0:
C[i] ^= Rr(R[0], i + r0);
break;
case 1:
C[i] += Rr(~R[0], i + r1);
break;
case 2:
C[i] &= Rr(~R[0], i + r2);
break;
case 3:
C[i] ^= Rr(R[0], i + r3);
break;
}
tmp ^= C[i];
}
/* Compress */
tmp = 0;
for (i = 0; i < HEFTY1_SPONGE_WORDS - 1; i++)
if (i % 2)
tmp ^= C[i];
else
tmp += C[i];
R[0] ^= tmp;
}
static void Absorb(uint32_t *S, uint32_t X)
{
uint32_t *R = S;
R[0] ^= X;
Mangle(S);
}
static uint32_t Squeeze(uint32_t *S)
{
uint32_t Y = S[0];
Mangle(S);
return Y;
}
/* Branch, compress and serialize function */
static inline uint32_t Br(HEFTY1_CTX *ctx, uint32_t X)
{
uint32_t R = Squeeze(ctx->sponge);
uint8_t r0 = R >> 8;
uint8_t r1 = R & 0xff;
uint32_t Y = 1 << (r0 % 32);
switch (r1 % 4)
{
case 0:
/* Do nothing */
break;
case 1:
return X & ~Y;
case 2:
return X | Y;
case 3:
return X ^ Y;
}
return X;
}
static void HashBlock(HEFTY1_CTX *ctx)
{
uint32_t A, B, C, D, E, F, G, H;
uint32_t W[HEFTY1_BLOCK_BYTES];
assert(ctx);
A = ctx->h[0];
B = ctx->h[1];
C = ctx->h[2];
D = ctx->h[3];
E = ctx->h[4];
F = ctx->h[5];
G = ctx->h[6];
H = ctx->h[7];
int t = 0;
for (; t < 16; t++) {
W[t] = Reverse32(((uint32_t *)&ctx->block[0])[t]); /* To host byte order */
Absorb(ctx->sponge, W[t] ^ K[t]);
}
for (t = 0; t < 16; t++) {
Absorb(ctx->sponge, D ^ H);
RoundFunc(ctx, A, B, C, D, E, F, G, H, W[t], K[t]);
}
for (t = 16; t < 64; t++) {
Absorb(ctx->sponge, H + D);
W[t] = sigma1(W[t - 2]) + W[t - 7] + sigma0(W[t - 15]) + W[t - 16];
RoundFunc(ctx, A, B, C, D, E, F, G, H, W[t], K[t]);
}
ctx->h[0] += A;
ctx->h[1] += B;
ctx->h[2] += C;
ctx->h[3] += D;
ctx->h[4] += E;
ctx->h[5] += F;
ctx->h[6] += G;
ctx->h[7] += H;
A = 0;
B = 0;
C = 0;
D = 0;
E = 0;
F = 0;
G = 0;
H = 0;
memset(W, 0, sizeof(W));
}
/* Public interface */
void HEFTY1_Init(HEFTY1_CTX *ctx)
{
assert(ctx);
memcpy(ctx->h, H, sizeof(ctx->h));
memset(ctx->block, 0, sizeof(ctx->block));
ctx->written = 0;
memset(ctx->sponge, 0, sizeof(ctx->sponge));
}
void HEFTY1_Update(HEFTY1_CTX *ctx, const void *buf, size_t len)
{
assert(ctx);
uint64_t read = 0;
while (len) {
size_t end = (size_t)(ctx->written % HEFTY1_BLOCK_BYTES);
size_t count = Min(len, HEFTY1_BLOCK_BYTES - end);
memcpy(&ctx->block[end], &((unsigned char *)buf)[read], count);
len -= count;
read += count;
ctx->written += count;
if (!(ctx->written % HEFTY1_BLOCK_BYTES))
HashBlock(ctx);
}
}
void HEFTY1_Final(unsigned char *digest, HEFTY1_CTX *ctx)
{
assert(digest);
assert(ctx);
/* Pad message (FIPS 180 Section 5.1.1) */
size_t used = (size_t)(ctx->written % HEFTY1_BLOCK_BYTES);
ctx->block[used++] = 0x80; /* Append 1 to end of message */
if (used > HEFTY1_BLOCK_BYTES - 8) {
/* We have already written into the last 64bits, so
* we must continue into the next block. */
memset(&ctx->block[used], 0, HEFTY1_BLOCK_BYTES - used);
HashBlock(ctx);
used = 0; /* Create a new block (below) */
}
/* All remaining bits to zero */
memset(&ctx->block[used], 0, HEFTY1_BLOCK_BYTES - 8 - used);
/* The last 64bits encode the length (in network byte order) */
uint64_t *len = (uint64_t *)&ctx->block[HEFTY1_BLOCK_BYTES - 8];
*len = Reverse64(ctx->written*8);
HashBlock(ctx);
/* Convert back to network byte order */
int i = 0;
for (; i < HEFTY1_STATE_WORDS; i++)
ctx->h[i] = Reverse32(ctx->h[i]);
memcpy(digest, ctx->h, sizeof(ctx->h));
memset(ctx, 0, sizeof(HEFTY1_CTX));
}
unsigned char* HEFTY1(const unsigned char *buf, size_t len, unsigned char *digest)
{
HEFTY1_CTX ctx;
static unsigned char m[HEFTY1_DIGEST_BYTES];
if (!digest)
digest = m;
HEFTY1_Init(&ctx);
HEFTY1_Update(&ctx, buf, len);
HEFTY1_Final(digest, &ctx);
return digest;
}

View File

@@ -1,66 +0,0 @@
/*
* HEFTY1 cryptographic hash function
*
* Copyright (c) 2014, dbcc14 <BM-NBx4AKznJuyem3dArgVY8MGyABpihRy5>
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
* ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* The views and conclusions contained in the software and documentation are those
* of the authors and should not be interpreted as representing official policies,
* either expressed or implied, of the FreeBSD Project.
*/
#ifndef __HEFTY1_H__
#define __HEFTY1_H__
#ifdef __cplusplus
extern "C" {
#endif
#ifndef WIN32
#include <sys/types.h>
#endif
#include <inttypes.h>
#define HEFTY1_DIGEST_BYTES 32
#define HEFTY1_BLOCK_BYTES 64
#define HEFTY1_STATE_WORDS 8
#define HEFTY1_SPONGE_WORDS 4
typedef struct HEFTY1_CTX {
uint32_t h[HEFTY1_STATE_WORDS];
uint8_t block[HEFTY1_BLOCK_BYTES];
uint64_t written;
uint32_t sponge[HEFTY1_SPONGE_WORDS];
} HEFTY1_CTX;
void HEFTY1_Init(HEFTY1_CTX *cxt);
void HEFTY1_Update(HEFTY1_CTX *cxt, const void *data, size_t len);
void HEFTY1_Final(unsigned char *digest, HEFTY1_CTX *cxt);
unsigned char* HEFTY1(const unsigned char *data, size_t len, unsigned char *digest);
#ifdef __cplusplus
}
#endif
#endif /* __HEFTY1_H__ */

View File

@@ -45,6 +45,6 @@ void sha512Compute32b_parallel(
uint64_t *data[SHA512_PARALLEL_N],
uint64_t *digest[SHA512_PARALLEL_N]);
void sha512ProcessBlock(Sha512Context *context);
void sha512ProcessBlock(Sha512Context contexti[2] );
#endif

View File

@@ -49,12 +49,11 @@ extern "C"{
#define Sb_8W(x0, x1, x2, x3, c) \
do { \
__m512i cc = _mm512_set1_epi64( c ); \
x3 = mm512_not( x3 ); \
const __m512i cc = _mm512_set1_epi64( c ); \
x0 = mm512_xorandnot( x0, x2, cc ); \
tmp = mm512_xorand( cc, x0, x1 ); \
x0 = mm512_xorand( x0, x2, x3 ); \
x3 = mm512_xorandnot( x3, x1, x2 ); \
x0 = mm512_xorandnot( x0, x3, x2 ); \
x3 = _mm512_ternarylogic_epi64( x3, x1, x2, 0x2d ); /* ~x3 ^ (~x1 & x2) */\
x1 = mm512_xorand( x1, x0, x2 ); \
x2 = mm512_xorandnot( x2, x3, x0 ); \
x0 = mm512_xoror( x0, x1, x3 ); \
@@ -77,19 +76,31 @@ do { \
#endif
#if defined(__AVX512VL__)
//TODO enable for AVX10_256, not used with AVX512VL
#define notxorandnot( a, b, c ) \
_mm256_ternarylogic_epi64( a, b, c, 0x2d )
#else
#define notxorandnot( a, b, c ) \
_mm256_xor_si256( mm256_not( a ), _mm256_andnot_si256( b, c ) )
#endif
#define Sb(x0, x1, x2, x3, c) \
do { \
__m256i cc = _mm256_set1_epi64x( c ); \
x3 = mm256_not( x3 ); \
x0 = _mm256_xor_si256( x0, _mm256_andnot_si256( x2, cc ) ); \
tmp = _mm256_xor_si256( cc, _mm256_and_si256( x0, x1 ) ); \
x0 = _mm256_xor_si256( x0, _mm256_and_si256( x2, x3 ) ); \
x3 = _mm256_xor_si256( x3, _mm256_andnot_si256( x1, x2 ) ); \
x1 = _mm256_xor_si256( x1, _mm256_and_si256( x0, x2 ) ); \
x2 = _mm256_xor_si256( x2, _mm256_andnot_si256( x3, x0 ) ); \
x0 = _mm256_xor_si256( x0, _mm256_or_si256( x1, x3 ) ); \
x3 = _mm256_xor_si256( x3, _mm256_and_si256( x1, x2 ) ); \
x1 = _mm256_xor_si256( x1, _mm256_and_si256( tmp, x0 ) ); \
const __m256i cc = _mm256_set1_epi64x( c ); \
x0 = mm256_xorandnot( x0, x2, cc ); \
tmp = mm256_xorand( cc, x0, x1 ); \
x0 = mm256_xorandnot( x0, x3, x2 ); \
x3 = notxorandnot( x3, x1, x2 ); \
x1 = mm256_xorand( x1, x0, x2 ); \
x2 = mm256_xorandnot( x2, x3, x0 ); \
x0 = mm256_xoror( x0, x1, x3 ); \
x3 = mm256_xorand( x3, x1, x2 ); \
x1 = mm256_xorand( x1, tmp, x0 ); \
x2 = _mm256_xor_si256( x2, tmp ); \
} while (0)
@@ -97,11 +108,11 @@ do { \
do { \
x4 = _mm256_xor_si256( x4, x1 ); \
x5 = _mm256_xor_si256( x5, x2 ); \
x6 = _mm256_xor_si256( x6, _mm256_xor_si256( x3, x0 ) ); \
x6 = mm256_xor3( x6, x3, x0 ); \
x7 = _mm256_xor_si256( x7, x0 ); \
x0 = _mm256_xor_si256( x0, x5 ); \
x1 = _mm256_xor_si256( x1, x6 ); \
x2 = _mm256_xor_si256( x2, _mm256_xor_si256( x7, x4 ) ); \
x2 = mm256_xor3( x2, x7, x4 ); \
x3 = _mm256_xor_si256( x3, x4 ); \
} while (0)
@@ -324,12 +335,12 @@ do { \
} while (0)
#define W80(x) Wz_8W(x, m512_const1_64( 0x5555555555555555 ), 1 )
#define W81(x) Wz_8W(x, m512_const1_64( 0x3333333333333333 ), 2 )
#define W82(x) Wz_8W(x, m512_const1_64( 0x0F0F0F0F0F0F0F0F ), 4 )
#define W83(x) Wz_8W(x, m512_const1_64( 0x00FF00FF00FF00FF ), 8 )
#define W84(x) Wz_8W(x, m512_const1_64( 0x0000FFFF0000FFFF ), 16 )
#define W85(x) Wz_8W(x, m512_const1_64( 0x00000000FFFFFFFF ), 32 )
#define W80(x) Wz_8W(x, _mm512_set1_epi64( 0x5555555555555555 ), 1 )
#define W81(x) Wz_8W(x, _mm512_set1_epi64( 0x3333333333333333 ), 2 )
#define W82(x) Wz_8W(x, _mm512_set1_epi64( 0x0F0F0F0F0F0F0F0F ), 4 )
#define W83(x) Wz_8W(x, _mm512_set1_epi64( 0x00FF00FF00FF00FF ), 8 )
#define W84(x) Wz_8W(x, _mm512_set1_epi64( 0x0000FFFF0000FFFF ), 16 )
#define W85(x) Wz_8W(x, _mm512_set1_epi64( 0x00000000FFFFFFFF ), 32 )
#define W86(x) \
do { \
__m512i t = x ## h; \
@@ -353,12 +364,12 @@ do { \
x ## l = _mm256_or_si256( _mm256_and_si256((x ## l >> (n)), (c)), t ); \
} while (0)
#define W0(x) Wz(x, m256_const1_64( 0x5555555555555555 ), 1 )
#define W1(x) Wz(x, m256_const1_64( 0x3333333333333333 ), 2 )
#define W2(x) Wz(x, m256_const1_64( 0x0F0F0F0F0F0F0F0F ), 4 )
#define W3(x) Wz(x, m256_const1_64( 0x00FF00FF00FF00FF ), 8 )
#define W4(x) Wz(x, m256_const1_64( 0x0000FFFF0000FFFF ), 16 )
#define W5(x) Wz(x, m256_const1_64( 0x00000000FFFFFFFF ), 32 )
#define W0(x) Wz(x, _mm256_set1_epi64x( 0x5555555555555555 ), 1 )
#define W1(x) Wz(x, _mm256_set1_epi64x( 0x3333333333333333 ), 2 )
#define W2(x) Wz(x, _mm256_set1_epi64x( 0x0F0F0F0F0F0F0F0F ), 4 )
#define W3(x) Wz(x, _mm256_set1_epi64x( 0x00FF00FF00FF00FF ), 8 )
#define W4(x) Wz(x, _mm256_set1_epi64x( 0x0000FFFF0000FFFF ), 16 )
#define W5(x) Wz(x, _mm256_set1_epi64x( 0x00000000FFFFFFFF ), 32 )
#define W6(x) \
do { \
__m256i t = x ## h; \
@@ -625,22 +636,22 @@ static const sph_u64 IV512[] = {
void jh256_8way_init( jh_8way_context *sc )
{
// bswapped IV256
sc->H[ 0] = m512_const1_64( 0xebd3202c41a398eb );
sc->H[ 1] = m512_const1_64( 0xc145b29c7bbecd92 );
sc->H[ 2] = m512_const1_64( 0xfac7d4609151931c );
sc->H[ 3] = m512_const1_64( 0x038a507ed6820026 );
sc->H[ 4] = m512_const1_64( 0x45b92677269e23a4 );
sc->H[ 5] = m512_const1_64( 0x77941ad4481afbe0 );
sc->H[ 6] = m512_const1_64( 0x7a176b0226abb5cd );
sc->H[ 7] = m512_const1_64( 0xa82fff0f4224f056 );
sc->H[ 8] = m512_const1_64( 0x754d2e7f8996a371 );
sc->H[ 9] = m512_const1_64( 0x62e27df70849141d );
sc->H[10] = m512_const1_64( 0x948f2476f7957627 );
sc->H[11] = m512_const1_64( 0x6c29804757b6d587 );
sc->H[12] = m512_const1_64( 0x6c0d8eac2d275e5c );
sc->H[13] = m512_const1_64( 0x0f7a0557c6508451 );
sc->H[14] = m512_const1_64( 0xea12247067d3e47b );
sc->H[15] = m512_const1_64( 0x69d71cd313abe389 );
sc->H[ 0] = _mm512_set1_epi64( 0xebd3202c41a398eb );
sc->H[ 1] = _mm512_set1_epi64( 0xc145b29c7bbecd92 );
sc->H[ 2] = _mm512_set1_epi64( 0xfac7d4609151931c );
sc->H[ 3] = _mm512_set1_epi64( 0x038a507ed6820026 );
sc->H[ 4] = _mm512_set1_epi64( 0x45b92677269e23a4 );
sc->H[ 5] = _mm512_set1_epi64( 0x77941ad4481afbe0 );
sc->H[ 6] = _mm512_set1_epi64( 0x7a176b0226abb5cd );
sc->H[ 7] = _mm512_set1_epi64( 0xa82fff0f4224f056 );
sc->H[ 8] = _mm512_set1_epi64( 0x754d2e7f8996a371 );
sc->H[ 9] = _mm512_set1_epi64( 0x62e27df70849141d );
sc->H[10] = _mm512_set1_epi64( 0x948f2476f7957627 );
sc->H[11] = _mm512_set1_epi64( 0x6c29804757b6d587 );
sc->H[12] = _mm512_set1_epi64( 0x6c0d8eac2d275e5c );
sc->H[13] = _mm512_set1_epi64( 0x0f7a0557c6508451 );
sc->H[14] = _mm512_set1_epi64( 0xea12247067d3e47b );
sc->H[15] = _mm512_set1_epi64( 0x69d71cd313abe389 );
sc->ptr = 0;
sc->block_count = 0;
}
@@ -648,22 +659,22 @@ void jh256_8way_init( jh_8way_context *sc )
void jh512_8way_init( jh_8way_context *sc )
{
// bswapped IV512
sc->H[ 0] = m512_const1_64( 0x17aa003e964bd16f );
sc->H[ 1] = m512_const1_64( 0x43d5157a052e6a63 );
sc->H[ 2] = m512_const1_64( 0x0bef970c8d5e228a );
sc->H[ 3] = m512_const1_64( 0x61c3b3f2591234e9 );
sc->H[ 4] = m512_const1_64( 0x1e806f53c1a01d89 );
sc->H[ 5] = m512_const1_64( 0x806d2bea6b05a92a );
sc->H[ 6] = m512_const1_64( 0xa6ba7520dbcc8e58 );
sc->H[ 7] = m512_const1_64( 0xf73bf8ba763a0fa9 );
sc->H[ 8] = m512_const1_64( 0x694ae34105e66901 );
sc->H[ 9] = m512_const1_64( 0x5ae66f2e8e8ab546 );
sc->H[10] = m512_const1_64( 0x243c84c1d0a74710 );
sc->H[11] = m512_const1_64( 0x99c15a2db1716e3b );
sc->H[12] = m512_const1_64( 0x56f8b19decf657cf );
sc->H[13] = m512_const1_64( 0x56b116577c8806a7 );
sc->H[14] = m512_const1_64( 0xfb1785e6dffcc2e3 );
sc->H[15] = m512_const1_64( 0x4bdd8ccc78465a54 );
sc->H[ 0] = _mm512_set1_epi64( 0x17aa003e964bd16f );
sc->H[ 1] = _mm512_set1_epi64( 0x43d5157a052e6a63 );
sc->H[ 2] = _mm512_set1_epi64( 0x0bef970c8d5e228a );
sc->H[ 3] = _mm512_set1_epi64( 0x61c3b3f2591234e9 );
sc->H[ 4] = _mm512_set1_epi64( 0x1e806f53c1a01d89 );
sc->H[ 5] = _mm512_set1_epi64( 0x806d2bea6b05a92a );
sc->H[ 6] = _mm512_set1_epi64( 0xa6ba7520dbcc8e58 );
sc->H[ 7] = _mm512_set1_epi64( 0xf73bf8ba763a0fa9 );
sc->H[ 8] = _mm512_set1_epi64( 0x694ae34105e66901 );
sc->H[ 9] = _mm512_set1_epi64( 0x5ae66f2e8e8ab546 );
sc->H[10] = _mm512_set1_epi64( 0x243c84c1d0a74710 );
sc->H[11] = _mm512_set1_epi64( 0x99c15a2db1716e3b );
sc->H[12] = _mm512_set1_epi64( 0x56f8b19decf657cf );
sc->H[13] = _mm512_set1_epi64( 0x56b116577c8806a7 );
sc->H[14] = _mm512_set1_epi64( 0xfb1785e6dffcc2e3 );
sc->H[15] = _mm512_set1_epi64( 0x4bdd8ccc78465a54 );
sc->ptr = 0;
sc->block_count = 0;
}
@@ -722,7 +733,7 @@ jh_8way_close( jh_8way_context *sc, unsigned ub, unsigned n, void *dst,
size_t numz, u;
uint64_t l0, l1;
buf[0] = m512_const1_64( 0x80ULL );
buf[0] = _mm512_set1_epi64( 0x80ULL );
if ( sc->ptr == 0 )
numz = 48;
@@ -773,22 +784,22 @@ jh512_8way_close(void *cc, void *dst)
void jh256_4way_init( jh_4way_context *sc )
{
// bswapped IV256
sc->H[ 0] = m256_const1_64( 0xebd3202c41a398eb );
sc->H[ 1] = m256_const1_64( 0xc145b29c7bbecd92 );
sc->H[ 2] = m256_const1_64( 0xfac7d4609151931c );
sc->H[ 3] = m256_const1_64( 0x038a507ed6820026 );
sc->H[ 4] = m256_const1_64( 0x45b92677269e23a4 );
sc->H[ 5] = m256_const1_64( 0x77941ad4481afbe0 );
sc->H[ 6] = m256_const1_64( 0x7a176b0226abb5cd );
sc->H[ 7] = m256_const1_64( 0xa82fff0f4224f056 );
sc->H[ 8] = m256_const1_64( 0x754d2e7f8996a371 );
sc->H[ 9] = m256_const1_64( 0x62e27df70849141d );
sc->H[10] = m256_const1_64( 0x948f2476f7957627 );
sc->H[11] = m256_const1_64( 0x6c29804757b6d587 );
sc->H[12] = m256_const1_64( 0x6c0d8eac2d275e5c );
sc->H[13] = m256_const1_64( 0x0f7a0557c6508451 );
sc->H[14] = m256_const1_64( 0xea12247067d3e47b );
sc->H[15] = m256_const1_64( 0x69d71cd313abe389 );
sc->H[ 0] = _mm256_set1_epi64x( 0xebd3202c41a398eb );
sc->H[ 1] = _mm256_set1_epi64x( 0xc145b29c7bbecd92 );
sc->H[ 2] = _mm256_set1_epi64x( 0xfac7d4609151931c );
sc->H[ 3] = _mm256_set1_epi64x( 0x038a507ed6820026 );
sc->H[ 4] = _mm256_set1_epi64x( 0x45b92677269e23a4 );
sc->H[ 5] = _mm256_set1_epi64x( 0x77941ad4481afbe0 );
sc->H[ 6] = _mm256_set1_epi64x( 0x7a176b0226abb5cd );
sc->H[ 7] = _mm256_set1_epi64x( 0xa82fff0f4224f056 );
sc->H[ 8] = _mm256_set1_epi64x( 0x754d2e7f8996a371 );
sc->H[ 9] = _mm256_set1_epi64x( 0x62e27df70849141d );
sc->H[10] = _mm256_set1_epi64x( 0x948f2476f7957627 );
sc->H[11] = _mm256_set1_epi64x( 0x6c29804757b6d587 );
sc->H[12] = _mm256_set1_epi64x( 0x6c0d8eac2d275e5c );
sc->H[13] = _mm256_set1_epi64x( 0x0f7a0557c6508451 );
sc->H[14] = _mm256_set1_epi64x( 0xea12247067d3e47b );
sc->H[15] = _mm256_set1_epi64x( 0x69d71cd313abe389 );
sc->ptr = 0;
sc->block_count = 0;
}
@@ -796,22 +807,22 @@ void jh256_4way_init( jh_4way_context *sc )
void jh512_4way_init( jh_4way_context *sc )
{
// bswapped IV512
sc->H[ 0] = m256_const1_64( 0x17aa003e964bd16f );
sc->H[ 1] = m256_const1_64( 0x43d5157a052e6a63 );
sc->H[ 2] = m256_const1_64( 0x0bef970c8d5e228a );
sc->H[ 3] = m256_const1_64( 0x61c3b3f2591234e9 );
sc->H[ 4] = m256_const1_64( 0x1e806f53c1a01d89 );
sc->H[ 5] = m256_const1_64( 0x806d2bea6b05a92a );
sc->H[ 6] = m256_const1_64( 0xa6ba7520dbcc8e58 );
sc->H[ 7] = m256_const1_64( 0xf73bf8ba763a0fa9 );
sc->H[ 8] = m256_const1_64( 0x694ae34105e66901 );
sc->H[ 9] = m256_const1_64( 0x5ae66f2e8e8ab546 );
sc->H[10] = m256_const1_64( 0x243c84c1d0a74710 );
sc->H[11] = m256_const1_64( 0x99c15a2db1716e3b );
sc->H[12] = m256_const1_64( 0x56f8b19decf657cf );
sc->H[13] = m256_const1_64( 0x56b116577c8806a7 );
sc->H[14] = m256_const1_64( 0xfb1785e6dffcc2e3 );
sc->H[15] = m256_const1_64( 0x4bdd8ccc78465a54 );
sc->H[ 0] = _mm256_set1_epi64x( 0x17aa003e964bd16f );
sc->H[ 1] = _mm256_set1_epi64x( 0x43d5157a052e6a63 );
sc->H[ 2] = _mm256_set1_epi64x( 0x0bef970c8d5e228a );
sc->H[ 3] = _mm256_set1_epi64x( 0x61c3b3f2591234e9 );
sc->H[ 4] = _mm256_set1_epi64x( 0x1e806f53c1a01d89 );
sc->H[ 5] = _mm256_set1_epi64x( 0x806d2bea6b05a92a );
sc->H[ 6] = _mm256_set1_epi64x( 0xa6ba7520dbcc8e58 );
sc->H[ 7] = _mm256_set1_epi64x( 0xf73bf8ba763a0fa9 );
sc->H[ 8] = _mm256_set1_epi64x( 0x694ae34105e66901 );
sc->H[ 9] = _mm256_set1_epi64x( 0x5ae66f2e8e8ab546 );
sc->H[10] = _mm256_set1_epi64x( 0x243c84c1d0a74710 );
sc->H[11] = _mm256_set1_epi64x( 0x99c15a2db1716e3b );
sc->H[12] = _mm256_set1_epi64x( 0x56f8b19decf657cf );
sc->H[13] = _mm256_set1_epi64x( 0x56b116577c8806a7 );
sc->H[14] = _mm256_set1_epi64x( 0xfb1785e6dffcc2e3 );
sc->H[15] = _mm256_set1_epi64x( 0x4bdd8ccc78465a54 );
sc->ptr = 0;
sc->block_count = 0;
}
@@ -870,7 +881,7 @@ jh_4way_close( jh_4way_context *sc, unsigned ub, unsigned n, void *dst,
size_t numz, u;
uint64_t l0, l1;
buf[0] = m256_const1_64( 0x80ULL );
buf[0] = _mm256_set1_epi64x( 0x80ULL );
if ( sc->ptr == 0 )
numz = 48;

View File

@@ -6,7 +6,7 @@
#if defined(JHA_4WAY)
#include "algo/blake/blake-hash-4way.h"
#include "algo/blake/blake512-hash.h"
#include "algo/skein/skein-hash-4way.h"
#include "algo/jh/jh-hash-4way.h"
#include "algo/keccak/keccak-hash-4way.h"

View File

@@ -41,7 +41,7 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
/**
* Output size (in bits) for JH-224.

View File

@@ -2,7 +2,6 @@
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include "sph_keccak.h"
#include "keccak-hash-4way.h"
#if defined(KECCAK_8WAY)
@@ -49,7 +48,7 @@ int scanhash_keccak_8way( struct work *work, uint32_t max_nonce,
}
}
*noncev = _mm512_add_epi32( *noncev,
m512_const1_64( 0x0000000800000000 ) );
_mm512_set1_epi64( 0x0000000800000000 ) );
n += 8;
} while ( (n < max_nonce-8) && !work_restart[thr_id].restart);
@@ -101,7 +100,7 @@ int scanhash_keccak_4way( struct work *work, uint32_t max_nonce,
}
}
*noncev = _mm256_add_epi32( *noncev,
m256_const1_64( 0x0000000400000000 ) );
_mm256_set1_epi64x( 0x0000000400000000 ) );
n += 4;
} while ( (n < max_nonce-4) && !work_restart[thr_id].restart);
pdata[19] = n;

View File

@@ -9,7 +9,7 @@ int hard_coded_eb = 1;
bool register_keccak_algo( algo_gate_t* gate )
{
gate->optimizations = AVX2_OPT | AVX512_OPT;
gate->gen_merkle_root = (void*)&SHA256_gen_merkle_root;
gate->gen_merkle_root = (void*)&sha256_gen_merkle_root;
opt_target_factor = 128.0;
#if defined (KECCAK_8WAY)
gate->scanhash = (void*)&scanhash_keccak_8way;

View File

@@ -72,11 +72,11 @@ static const uint64_t RC[] = {
// Targetted macros, keccak-macros.h is included for each target.
#define DECL64(x) __m512i x
#define XOR(d, a, b) (d = _mm512_xor_si512(a,b))
#define XOR64 XOR
#define XOR(d, a, b) (d = _mm512_xor_si512(a,b))
#define XOR64 XOR
#define AND64(d, a, b) (d = _mm512_and_si512(a,b))
#define OR64(d, a, b) (d = _mm512_or_si512(a,b))
#define NOT64(d, s) (d = _mm512_xor_si512(s,m512_neg1))
#define NOT64(d, s) (d = mm512_not( s ) )
#define ROL64(d, v, n) (d = mm512_rol_64(v, n))
#define XOROR(d, a, b, c) (d = mm512_xoror(a, b, c))
#define XORAND(d, a, b, c) (d = mm512_xorand(a, b, c))
@@ -180,15 +180,15 @@ static void keccak64_8way_close( keccak64_ctx_m512i *kc, void *dst,
if ( kc->ptr == (lim - 8) )
{
const uint64_t t = eb | 0x8000000000000000;
u.tmp[0] = m512_const1_64( t );
u.tmp[0] = _mm512_set1_epi64( t );
j = 8;
}
else
{
j = lim - kc->ptr;
u.tmp[0] = m512_const1_64( eb );
u.tmp[0] = _mm512_set1_epi64( eb );
memset_zero_512( u.tmp + 1, (j>>3) - 2 );
u.tmp[ (j>>3) - 1] = m512_const1_64( 0x8000000000000000 );
u.tmp[ (j>>3) - 1] = _mm512_set1_epi64( 0x8000000000000000 );
}
keccak64_8way_core( kc, u.tmp, j, lim );
/* Finalize the "lane complement" */
@@ -257,15 +257,15 @@ keccak512_8way_close(void *cc, void *dst)
kc->w[j ] = _mm256_xor_si256( kc->w[j], buf[j] ); \
} while (0)
#define DECL64(x) __m256i x
#define XOR(d, a, b) (d = _mm256_xor_si256(a,b))
#define XOR64 XOR
#define AND64(d, a, b) (d = _mm256_and_si256(a,b))
#define OR64(d, a, b) (d = _mm256_or_si256(a,b))
#define NOT64(d, s) (d = _mm256_xor_si256(s,m256_neg1))
#define ROL64(d, v, n) (d = mm256_rol_64(v, n))
#define XOROR(d, a, b, c) (d = _mm256_xor_si256(a, _mm256_or_si256(b, c)))
#define XORAND(d, a, b, c) (d = _mm256_xor_si256(a, _mm256_and_si256(b, c)))
#define DECL64(x) __m256i x
#define XOR(d, a, b) (d = _mm256_xor_si256(a,b))
#define XOR64 XOR
#define AND64(d, a, b) (d = _mm256_and_si256(a,b))
#define OR64(d, a, b) (d = _mm256_or_si256(a,b))
#define NOT64(d, s) (d = mm256_not( s ) )
#define ROL64(d, v, n) (d = mm256_rol_64(v, n))
#define XOROR(d, a, b, c) (d = mm256_xoror( a, b, c ) )
#define XORAND(d, a, b, c) (d = mm256_xorand( a, b, c ) )
#define XOR3( d, a, b, c ) (d = mm256_xor3( a, b, c ))
#include "keccak-macros.c"
@@ -368,15 +368,15 @@ static void keccak64_close( keccak64_ctx_m256i *kc, void *dst, size_t byte_len,
if ( kc->ptr == (lim - 8) )
{
const uint64_t t = eb | 0x8000000000000000;
u.tmp[0] = m256_const1_64( t );
u.tmp[0] = _mm256_set1_epi64x( t );
j = 8;
}
else
{
j = lim - kc->ptr;
u.tmp[0] = m256_const1_64( eb );
u.tmp[0] = _mm256_set1_epi64x( eb );
memset_zero_256( u.tmp + 1, (j>>3) - 2 );
u.tmp[ (j>>3) - 1] = m256_const1_64( 0x8000000000000000 );
u.tmp[ (j>>3) - 1] = _mm256_set1_epi64x( 0x8000000000000000 );
}
keccak64_core( kc, u.tmp, j, lim );
/* Finalize the "lane complement" */

View File

@@ -1,45 +1,6 @@
/* $Id: sph_keccak.h 216 2010-06-08 09:46:57Z tp $ */
/**
* Keccak interface. This is the interface for Keccak with the
* recommended parameters for SHA-3, with output lengths 224, 256,
* 384 and 512 bits.
*
* ==========================(LICENSE BEGIN)============================
*
* Copyright (c) 2007-2010 Projet RNRT SAPHIR
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* ===========================(LICENSE END)=============================
*
* @file sph_keccak.h
* @author Thomas Pornin <thomas.pornin@cryptolog.com>
*/
#ifndef KECCAK_HASH_4WAY_H__
#define KECCAK_HASH_4WAY_H__
#ifdef __cplusplus
extern "C"{
#endif
#ifdef __AVX2__
#include <stddef.h>
@@ -100,8 +61,4 @@ void keccak512_4way_addbits_and_close(
#endif
#ifdef __cplusplus
}
#endif
#endif

View File

@@ -2,7 +2,6 @@
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include "sph_keccak.h"
#include "keccak-hash-4way.h"
#if defined(KECCAK_8WAY)
@@ -56,7 +55,7 @@ int scanhash_sha3d_8way( struct work *work, uint32_t max_nonce,
}
}
*noncev = _mm512_add_epi32( *noncev,
m512_const1_64( 0x0000000800000000 ) );
_mm512_set1_epi64( 0x0000000800000000 ) );
n += 8;
} while ( likely( (n < last_nonce) && !work_restart[thr_id].restart ) );
@@ -115,7 +114,7 @@ int scanhash_sha3d_4way( struct work *work, uint32_t max_nonce,
}
}
*noncev = _mm256_add_epi32( *noncev,
m256_const1_64( 0x0000000400000000 ) );
_mm256_set1_epi64x( 0x0000000400000000 ) );
n += 4;
} while ( likely( (n < last_nonce) && !work_restart[thr_id].restart ) );
pdata[19] = n;

View File

@@ -41,7 +41,7 @@ extern "C"{
#endif
#include <stddef.h>
#include "algo/sha/sph_types.h"
#include "compat/sph_types.h"
/**
* Output size (in bits) for Keccak-224.

View File

@@ -23,7 +23,6 @@
#define LANE_H
#include <string.h>
//#include "algo/sha/sha3-defs.h"
#include <stdint.h>
typedef unsigned char BitSequence;

Some files were not shown because too many files have changed in this diff Show More