mirror of https://github.com/JayDDee/cpuminer-opt.git, synced 2025-09-17 23:44:27 +00:00
Updated Support for AArch64 (markdown)
@@ -1,4 +1,4 @@
|
|||||||
Support for AArch64 with AES and SHA2 is at release candidate status.
|
Support for AArch64 with AES and SHA2 is now fully supported with most algos optimized for NEON, AES & SHA with exceptions noted below.
|
||||||
|
|
||||||
This is provided as source code only and may be built on native Linux by following the existing procedure subject to any modifications described below.
|
This is provided as source code only and may be built on native Linux by following the existing procedure subject to any modifications described below.
|
||||||
|
|
||||||
@@ -12,11 +12,10 @@ Requirements:
|
|||||||
|
|
||||||
## Status
|
## Status
|
||||||
|
|
||||||
**cpuminer-opt-23.11 is released, all users should upgrade**
|
**cpuminer-opt-23.12 is released**
|
||||||
|
|
||||||
Highlights from this release:
|
Highlights from this release:
|
||||||
Important fixes to x25x, hmq1725.
|
Multiple fixes to X16R family of algorithms.
|
||||||
Most SHA3 algos now optimized with 2-way for NEON.
|
|
||||||
|
|
||||||
Development environment:
|
Development environment:
|
||||||
* Orange Pi 5 Plus 16 GB, Rockchip 8 core CPU with AES & SHA2
|
* Orange Pi 5 Plus 16 GB, Rockchip 8 core CPU with AES & SHA2
|
||||||
@@ -40,29 +39,28 @@ The miner has been tested on Raspberry Pi 4B, Orange Pi 5 Plus, and Mac Mini fro
|
|||||||
It compiles for all minor versions of armv8.x with or without AES, or SHA2, or both.
|
It compiles for all minor versions of armv8.x with or without AES, or SHA2, or both.
|
||||||
|
|
||||||
What works:
|
What works:
|
||||||
* Most algorithms are working with Neon optimizations.
|
|
||||||
|
* Most algorithms are working with Neon optimizations, 2-way parallel when applicable.
|
||||||
* AES is working for Shavite and Echo.
|
* AES is working for Shavite and Echo.
|
||||||
* SHA is working for all algos.
|
* SHA is working for all algos.
|
||||||
* stratum+ssl and stratum+tcp are working, GBT is untested but expected to work.
|
* stratum+ssl and stratum+tcp are working, GBT is untested but expected to work.
|
||||||
* all configurations ooptions work as usual.
|
* all configurations options work as usual.
|
||||||
|
|
||||||
Known problems:
|
Known problems:
|
||||||
|
|
||||||
* Verthash algo is not working.
|
* Verthash algo is not working.
|
||||||
* MacOS is not working natively, workaround with linux VM.
|
* MacOS is not working natively, workaround with linux VM.
|
||||||
* CPU and feature detection and reporting is incomplete.
|
* CPU and feature detection and reporting is incomplete.
|
||||||
* Groestl, Fugue: multiple issues not AES related, using unoptimized.
|
* Groestl, Fugue: multiple issues not AES related, using unoptimized.
|
||||||
* Some inavlid shares for certain permutations of x16*.
|
* Some algorithms too difficult to test with a CPU are not optimized for NEON.
|
||||||
|
|
||||||
Short term plan:
|
Short term plan:
|
||||||
|
|
||||||
* Figure out what's going on with verthash.
|
* Figure out what's going on with verthash.
|
||||||
* Complete any other work needed to bring parity with SSE2.
|
* Groestl & Fugue AES.
|
||||||
* Performance testing.
|
|
||||||
* Full support for ARM.
|
|
||||||
|
|
||||||
Medium term:
|
Medium term:
|
||||||
|
|
||||||
* Verthash
|
|
||||||
* Groestl & Fugue AES.
|
|
||||||
* Detection of ARM CPU model and architecture minor version.
|
* Detection of ARM CPU model and architecture minor version.
|
||||||
* Find NEON optimization opportunities that exploit it's architecture and instruction set.
|
* Find NEON optimization opportunities that exploit it's architecture and instruction set.
|
||||||
* Apply lessons learned to x86_64.
|
* Apply lessons learned to x86_64.
|
||||||
@@ -85,10 +83,7 @@ X86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source d
|
|||||||
|
|
||||||
`uint64x2_t = uint32x2_t * uint32x2_t`
|
`uint64x2_t = uint32x2_t * uint32x2_t`
|
||||||
|
|
||||||
Most uses are the x86_64 format requiring a workaround for ARM. Te curent workaround seems to be functioning correctly where needed.
|
Most uses are the x86_64 format requiring a workaround for ARM. The curent workaround seems to be functioning correctly where needed.
|
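The lane difference described above can be modelled in plain C. This is only a scalar sketch, not the project's code: the function names are hypothetical, and the gather step in `mul_x86_style_on_arm` is an assumed form of workaround, since the actual one is not shown here. It assumes the x86_64 form multiplies 32-bit lanes 0 and 2 of a four-lane source (as SSE2 `_mm_mul_epu32` does) and the ARM form multiplies lanes 0 and 1 of a two-lane source (as NEON `vmull_u32` does).

```c
#include <stdint.h>

/* Scalar model of the x86_64 form: widening multiply of 32-bit
   lanes 0 and 2 of each four-lane source (hypothetical name). */
static void mul_lanes_0_2(const uint32_t a[4], const uint32_t b[4],
                          uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
}

/* Scalar model of the ARM form: widening multiply of 32-bit
   lanes 0 and 1 of each two-lane source (hypothetical name). */
static void mul_lanes_0_1(const uint32_t a[2], const uint32_t b[2],
                          uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[1] * b[1];
}

/* Assumed workaround: gather lanes 0 and 2 into two-lane vectors,
   then use the ARM-style multiply. On real NEON the gather might be
   a narrowing move; that detail is an assumption, not taken from
   the project. */
static void mul_x86_style_on_arm(const uint32_t a[4], const uint32_t b[4],
                                 uint64_t out[2])
{
    uint32_t a02[2] = { a[0], a[2] };
    uint32_t b02[2] = { b[0], b[2] };
    mul_lanes_0_1(a02, b02, out);
}
```

Calling `mul_lanes_0_2` and `mul_x86_style_on_arm` on the same inputs should give identical results, which is the property any real workaround has to preserve.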
||||||
|
|
||||||
NEON has some fancy load instructions that combine load with another oeration like byte swap. These may provide optimizatins that SSE can't.
|
|
||||||
Exploring these is part of the longer term plans once the existing problems are solved and ARM code is up to he same level of optimization level as x86_64.
|
|
||||||
|
|
||||||
NEON has no blend instruction but can emulate one compatible with x86_64 blendv using boolean algebra, but not very efficiently.
|
NEON has no blend instruction but can emulate one compatible with x86_64 blendv using boolean algebra, but not very efficiently.
|
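The boolean-algebra emulation mentioned above can be sketched in scalar C. This is an illustrative model under stated assumptions, not the project's implementation: the function names are hypothetical, and it assumes the byte-wise blendv convention in which the top bit of each mask byte selects the second operand.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand the top bit of a mask byte to a full-byte mask:
   0x80..0xFF -> 0xFF, 0x00..0x7F -> 0x00 (hypothetical helper). */
static uint8_t sign_to_mask(uint8_t m)
{
    return (uint8_t)(0u - (unsigned)(m >> 7));
}

/* Byte-wise blend: out[i] = b[i] if the top bit of mask[i] is set,
   otherwise a[i]. The select is built from AND, NOT and OR only. */
static void blendv_bytes(const uint8_t *a, const uint8_t *b,
                         const uint8_t *mask, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t m = sign_to_mask(mask[i]);
        out[i] = (uint8_t)((a[i] & (uint8_t)~m) | (b[i] & m));
    }
}
```

The extra mask expansion plus three logical operations per element gives a sense of why this emulation costs more than a single blend instruction.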
||||||
|
|
||||||
|