mirror of
https://github.com/JayDDee/cpuminer-opt.git
synced 2025-09-17 23:44:27 +00:00
Updated Support for AArch64 (markdown)
@@ -11,11 +11,9 @@ Requirements:

## Status

**cpuminer-opt-23.6 is released, all users should upgrade**
**cpuminer-opt-23.7 is released, all users should upgrade**

Changes from v23.5 are **highlighted**.

2 way parallel hash will be implemented on applicable algorithms for NEON and will also benefit x86_64 CPUs without AVX2.
Highlights: more 2 way parallel implementations, AES for shavite.

Development environment:

* Raspberry Pi-4B 8 GB
@@ -34,11 +32,11 @@ The miner compiles and runs on Raspberry Pi 4B, and compiles for all version of
What works:
* All algorithms except Verthash and Hodl should be working.
* Allium, Lyra2z, Lyra2z330, Argon2d are fully optimized for NEON, Allium also for AES (untested).
* Yespower, Yescrypt, **Scrypt, ScryptN2** are fully optimized, SHA is enabled but untested.
* **Sha256dt, Sha256t, Sha256d are fully optimized**, SHA is enabled but untested.
* X17 is the only X* to be optimized in this release.
* Yespower, Yescrypt, Scrypt, ScryptN2 are fully optimized, SHA is enabled but untested.
* Sha256dt, Sha256t, Sha256d are fully optimized, SHA2 is also working.
* More optimizations for X17.
* MinotaurX is partially optimized.
* AES & SHA2 are enabled but untested.
* AES is working for Shavite.
* stratum+ssl and stratum+tcp are working, GBT is untested but expected to work.
* CPU and SW feature detection and reporting is working, algo features in progress, CPU brand not yet implemented.
* CPU temperature and clock frequency are working.
@@ -50,31 +48,19 @@ Known problems:
* No detection of ARM architecture minor version number.
* NEON may not be displayed in algo features for some algos that may support it.
* Algos may show support for NEON even if it's disabled or not yet implemented.
* AES & SHA2 are enabled but untested. Subsequent testing has shown SHA2 has a bug that caused 50% rejects; AES works for Shavite but not Groestl or Echo.
* Several parallel hash functions are disabled on ARM although they work on x86_64.
* X17, MinotaurX are partially optimized.
* Blake256, Blake512, Blake2s, Blake2b N-way parallel hash not working, using linear when possible, unoptimized otherwise.
* Simd: Multiple issues with NEON, using unoptimized.
* Luffa: NEON not working, using unoptimized.
* Simd: NEON parallel hash not enabled, using unoptimized.
* Fugue: Multiple issues with NEON & AES, using unoptimized.
* SWIFFTX: Multiple issues with NEON, using unoptimized.
* Algos not mentioned have either been deferred or have not been analyzed. They may or may not work on ARM.

Short term plan:

New: get it to build and work on Mac using Clang. There appear to be 3 issues that may or may not be related and may be covering up other underlying issues.
* gmp is missing. There doesn't seem to be a gmp package available for MacOS. This requires disabling m7m on MacOS. This should be a straightforward workaround but it seems to cause other problems on all OSes.
* aclocal is missing. Automake needs to be installed, but homebrew needs to be installed first.
* An error trying to link a nonexistent include library "yes/include". There is no reference to this library in any cpuminer code, it seems to appear from nowhere in the config logs, and since the library doesn't exist, the build fails. This only happens on MacOS. It seems to be related to jansson, which has its own config and make files. There are no direct references to yes/include. The name is suspicious: is it a truncation of yescrypt, or did some string manipulation create it from one of the many "yes" strings that exist for handling yes/no conditions?

Until these issues are resolved MacOS isn't supported.

Continue fixing parallel hash functions for x17 before propagating them to the rest of the X family.
Figure out what's going on with verthash.
Extend support to x21s, x22i, x25x.
Add support for the short algos like skein2, keccak, blake2s, etc.
Complete any other work needed to bring parity with SSE2.
Test AES & SHA, HW permitting.
Performance testing.

Medium term:
@@ -99,7 +85,7 @@ X86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source d

`uint64x2_t = uint32x2_t * uint32x2_t`

Most uses are the x86_64 dormat requiring a workaround for ARM.
Most uses are the x86_64 format requiring a workaround for ARM.
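The lane difference described here can be sketched in C. This is only an illustration under stated assumptions, not code taken from cpuminer-opt: the helper names `mul_lanes_0_2` and `mul_lanes_0_1` are hypothetical, the NEON path assumes a little-endian target, and a scalar fallback is included so the sketch also builds on machines without NEON. The first helper reproduces the x86_64 convention (32-bit lanes 0 and 2 of a 128-bit source) by narrowing each 64-bit lane before the widening multiply; the second shows the native ARM form on lanes 0 and 1.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: x86_64-style widening multiply.
 * Multiplies 32-bit lanes 0 and 2 of a and b into two 64-bit products. */
static void mul_lanes_0_2(const uint32_t a[4], const uint32_t b[4],
                          uint64_t out[2])
{
#if defined(__ARM_NEON)
    uint32x4_t va = vld1q_u32(a);
    uint32x4_t vb = vld1q_u32(b);
    /* vmovn_u64 keeps the low 32 bits of each 64-bit lane, which are
     * 32-bit lanes 0 and 2 on a little-endian target (assumed here). */
    uint32x2_t a02 = vmovn_u64(vreinterpretq_u64_u32(va));
    uint32x2_t b02 = vmovn_u64(vreinterpretq_u64_u32(vb));
    vst1q_u64(out, vmull_u32(a02, b02));
#else
    /* Scalar fallback so the sketch runs without NEON. */
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
#endif
}

/* Hypothetical helper: native ARM form.
 * Multiplies 32-bit lanes 0 and 1 of two 64-bit sources. */
static void mul_lanes_0_1(const uint32_t a[2], const uint32_t b[2],
                          uint64_t out[2])
{
#if defined(__ARM_NEON)
    vst1q_u64(out, vmull_u32(vld1_u32(a), vld1_u32(b)));
#else
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[1] * b[1];
#endif
}
```

The extra narrowing step in `mul_lanes_0_2` is the kind of workaround the x86_64 format needs on ARM; code written for lanes 0 and 1 from the start would avoid it.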

NEON has some fancy load instructions that combine load with another operation like byte swap. These may provide optimizations that SSE can't.
Exploring these is part of the longer term plans once the existing problems are solved and ARM code is up to the same level of optimization as x86_64.