Updated Support for AArch64 (markdown)

JayDDee
2023-11-20 22:33:44 -05:00
parent abfad2240c
commit 228ad6d8a3

@@ -1,4 +1,4 @@
Support for AArch64 with AES and SHA2 is at release candidate status.
AArch64 with AES and SHA2 is now fully supported, with most algos optimized for NEON, AES & SHA, except as noted below.
This is provided as source code only and may be built on native Linux by following the existing procedure subject to any modifications described below.
@@ -12,11 +12,10 @@ Requirements:
## Status
**cpuminer-opt-23.11 is released, all users should upgrade**
**cpuminer-opt-23.12 is released**
Highlights from this release:
Important fixes to x25x, hmq1725.
Most SHA3 algos are now optimized for NEON with 2-way parallelism.
Multiple fixes to X16R family of algorithms.
Development environment:
* Orange Pi 5 Plus 16 GB, Rockchip 8 core CPU with AES & SHA2
@@ -40,29 +39,28 @@ The miner has been tested on Raspberry Pi 4B, Orange Pi 5 Plus, and Mac Mini fro
It compiles for all minor versions of armv8.x with or without AES, or SHA2, or both.
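The compile-time side of this can be sketched with the Arm C Language Extensions (ACLE) feature macros. This is a minimal illustration, not cpuminer-opt's code: `build_features` and `get_build_features` are hypothetical names. It assumes a GCC or Clang toolchain, where options such as `-march=armv8-a+crypto` cause `__ARM_FEATURE_AES` and `__ARM_FEATURE_SHA2` to be defined. Older compilers may only define `__ARM_FEATURE_CRYPTO`, so the sketch accepts that macro as a fallback. On a non-Arm host every flag reports 0.

```c
/* Hypothetical helper: reports which optional AArch64 features this
 * translation unit was compiled for, using ACLE predefined macros.
 * The result reflects compiler flags, not the CPU it runs on. */
struct build_features {
    int neon;
    int aes;
    int sha2;
};

static struct build_features get_build_features(void)
{
    struct build_features f = { 0, 0, 0 };
#if defined(__ARM_NEON)
    f.neon = 1;
#endif
    /* __ARM_FEATURE_CRYPTO is the older combined AES + SHA2 macro. */
#if defined(__ARM_FEATURE_AES) || defined(__ARM_FEATURE_CRYPTO)
    f.aes = 1;
#endif
#if defined(__ARM_FEATURE_SHA2) || defined(__ARM_FEATURE_CRYPTO)
    f.sha2 = 1;
#endif
    return f;
}
```

Selecting code paths with `#if` in this way is what lets one source tree build for armv8.x targets with or without AES and SHA2.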
What works:
* Most algorithms are working with Neon optimizations.
* Most algorithms are working with Neon optimizations, 2-way parallel when applicable.
* AES is working for Shavite and Echo.
* SHA is working for all algos.
* stratum+ssl and stratum+tcp are working; GBT is untested but expected to work.
* all configurations ooptions work as usual.
* all configurations options work as usual.
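In 2-way code, a 128-bit NEON register holds two 64-bit words, one from each of two independent hashes (for example two nonces), so each vector instruction advances both hashes. The following is a minimal portable sketch of that data layout, assuming 64-bit words. `interleave_2x64`, `deinterleave_2x64` and `rotl_2x64` are hypothetical names, and the miner's own helpers may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Interleave two streams of 64-bit words: dst[2*i] is word i of stream a and
 * dst[2*i+1] is word i of stream b. A 128-bit load of dst[2*i] then yields
 * word i of both streams, one per lane. */
static void interleave_2x64(uint64_t *dst, const uint64_t *a,
                            const uint64_t *b, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        dst[2 * i]     = a[i];
        dst[2 * i + 1] = b[i];
    }
}

/* Inverse of interleave_2x64: lane 0 goes back to a, lane 1 to b. */
static void deinterleave_2x64(uint64_t *a, uint64_t *b,
                              const uint64_t *src, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        a[i] = src[2 * i];
        b[i] = src[2 * i + 1];
    }
}

/* 64-bit rotate left; the mask keeps r == 0 well defined. */
static uint64_t rotl64(uint64_t x, unsigned r)
{
    return (x << r) | (x >> ((64 - r) & 63));
}

/* Scalar model of one lane-wise step: each pair (v[2*i], v[2*i+1]) stands
 * for one 2-lane vector, and the same rotate is applied to both lanes. */
static void rotl_2x64(uint64_t *v, size_t nwords, unsigned r)
{
    for (size_t i = 0; i < 2 * nwords; i++)
        v[i] = rotl64(v[i], r);
}
```

Interleaving once before hashing and deinterleaving once after keeps every step of the hash itself lane-wise, which is what makes a 2-way implementation possible.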
Known problems:
* Verthash algo is not working.
* macOS is not working natively; as a workaround, use a Linux VM.
* CPU and feature detection and reporting is incomplete.
* Groestl, Fugue: multiple issues not related to AES; unoptimized code is used for now.
* Some invalid shares for certain permutations of x16*.
* Some algorithms too difficult to test with a CPU are not optimized for NEON.
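Runtime feature reporting on Linux can be sketched with glibc's `getauxval(AT_HWCAP)`. The bits for Advanced SIMD (NEON), AES and SHA2 are defined by the kernel's arm64 `<asm/hwcap.h>` as `HWCAP_ASIMD`, `HWCAP_AES` and `HWCAP_SHA2`. This sketch is not the miner's code: the `HW_*` constants are local copies of those kernel values so the decoder builds on any host, and `decode_hwcap` / `detect_cpu_features` are hypothetical names. It does not identify the CPU model or the armv8.x minor version, which would need other sources such as `/proc/cpuinfo`.

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* Local copies of Linux/arm64 AT_HWCAP bits (HWCAP_ASIMD, HWCAP_AES,
 * HWCAP_SHA2 in the kernel's uapi <asm/hwcap.h>). */
#define HW_ASIMD (1UL << 1)
#define HW_AES   (1UL << 3)
#define HW_SHA2  (1UL << 6)

struct cpu_features {
    int neon;
    int aes;
    int sha2;
};

/* Decode an AT_HWCAP value into feature flags. */
static struct cpu_features decode_hwcap(unsigned long hwcap)
{
    struct cpu_features f;
    f.neon = (hwcap & HW_ASIMD) != 0;
    f.aes  = (hwcap & HW_AES) != 0;
    f.sha2 = (hwcap & HW_SHA2) != 0;
    return f;
}

/* Ask the running kernel which features the CPU has; on any other
 * platform, report none. */
static struct cpu_features detect_cpu_features(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return decode_hwcap(getauxval(AT_HWCAP));
#else
    return decode_hwcap(0);
#endif
}
```

Keeping the decoding separate from the query makes the bit handling testable on any machine, while only `detect_cpu_features` depends on the platform.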
Short term plan:
* Figure out what's going on with verthash.
* Complete any other work needed to bring parity with SSE2.
* Performance testing.
* Full support for ARM.
* Groestl & Fugue AES.
Medium term:
* Verthash
* Groestl & Fugue AES.
* Detection of ARM CPU model and architecture minor version.
* Find NEON optimization opportunities that exploit its architecture and instruction set.
* Apply lessons learned to x86_64.
@@ -85,10 +83,7 @@ X86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source d
`uint64x2_t = uint32x2_t * uint32x2_t`
Most uses are the x86_64 format requiring a workaround for ARM. Te curent workaround seems to be functioning correctly where needed.
NEON has some fancy load instructions that combine a load with another operation, like byte swap. These may provide optimizations that SSE can't.
Exploring these is part of the longer-term plans, once the existing problems are solved and the ARM code is up to the same level of optimization as x86_64.
Most uses are the x86_64 format, requiring a workaround for ARM. The current workaround seems to be functioning correctly where needed.
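The lane mismatch can be modelled in portable C. On x86_64, `_mm_mul_epu32` multiplies the low 32 bits of each 64-bit lane, which are 32-bit lanes 0 and 2. NEON's `vmull_u32` multiplies the two lanes of a `uint32x2_t`, which are lanes 0 and 1 of the source data. One possible workaround, not necessarily the one cpuminer-opt uses, is to first gather lanes 0 and 2 into lanes 0 and 1. With intrinsics that should correspond to `vmull_u32(vmovn_u64(a), vmovn_u64(b))`. The `v128` types and function names below are hypothetical scalar stand-ins for the vector registers, assuming little-endian lane order.

```c
#include <stdint.h>

/* Scalar stand-ins for a 128-bit register: four 32-bit lanes in,
 * two 64-bit lanes out. */
typedef struct { uint32_t u32[4]; } v128;
typedef struct { uint64_t u64[2]; } v128x64;

/* x86_64 style (_mm_mul_epu32): multiply 32-bit lanes 0 and 2,
 * the low halves of the two 64-bit lanes. */
static v128x64 mul_lanes_0_2(v128 a, v128 b)
{
    v128x64 r;
    r.u64[0] = (uint64_t)a.u32[0] * b.u32[0];
    r.u64[1] = (uint64_t)a.u32[2] * b.u32[2];
    return r;
}

/* ARM style (vmull_u32 on the low 64 bits): multiply 32-bit lanes 0 and 1. */
static v128x64 mul_lanes_0_1(v128 a, v128 b)
{
    v128x64 r;
    r.u64[0] = (uint64_t)a.u32[0] * b.u32[0];
    r.u64[1] = (uint64_t)a.u32[1] * b.u32[1];
    return r;
}

/* Gather lanes 0 and 2 into lanes 0 and 1, like narrowing each 64-bit
 * lane to its low 32 bits with vmovn_u64. */
static v128 narrow_lanes_0_2(v128 a)
{
    v128 r = { { a.u32[0], a.u32[2], 0, 0 } };
    return r;
}

/* x86_64 semantics built from the ARM-style multiply. */
static v128x64 mul_epu32_emulated(v128 a, v128 b)
{
    return mul_lanes_0_1(narrow_lanes_0_2(a), narrow_lanes_0_2(b));
}
```

The extra narrowing step is the cost of the workaround. Code written natively for ARM could arrange its data so the operands are already in lanes 0 and 1.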
NEON has no blend instruction, but one compatible with x86_64 blendv can be emulated using boolean algebra, though not very efficiently.
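The boolean-algebra emulation can be sketched per byte. x86_64 `_mm_blendv_epi8` takes each byte from `b` when the top bit of the corresponding mask byte is set, and from `a` otherwise. The emulation broadcasts that top bit to a full byte mask `m`, which is an arithmetic shift right by 7 on a vector unit, and then computes `(a & ~m) | (b & m)`. This is a scalar model, not NEON code, and `blendv_bytes` is a hypothetical name.

```c
#include <stdint.h>

/* Scalar model of _mm_blendv_epi8 on 16 bytes: r[i] = b[i] if the top bit
 * of mask[i] is set, else a[i]. */
static void blendv_bytes(uint8_t r[16], const uint8_t a[16],
                         const uint8_t b[16], const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++) {
        /* Broadcast the mask's top bit to the whole byte: 0x00 or 0xFF.
         * (mask >> 7) is 0 or 1, and 0 - 1 wraps to all ones. */
        uint8_t m = (uint8_t)(0u - (unsigned)(mask[i] >> 7));
        /* Boolean-algebra select: (a AND NOT m) OR (b AND m). */
        r[i] = (uint8_t)((a[i] & (uint8_t)~m) | (b[i] & m));
    }
}
```

The shift, AND, AND-NOT and OR steps replace a single x86_64 instruction, which is why the emulation is less efficient.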