Updated Support for AArch64 (markdown)

2026-02-23 08:53:08 +00:00 · 2023-11-20 22:33:44 -05:00
parent abfad2240c
commit 228ad6d8a3
1 changed files with 11 additions and 16 deletions
--- a/Support-for-AArch64.md
+++ b/Support-for-AArch64.md
@@ -1,4 +1,4 @@
-Support for AArch64 with AES and SHA2 is at release candidate status.
+AArch64 with AES and SHA2 is now fully supported, with most algos optimized for NEON, AES & SHA. Exceptions are noted below.

 This is provided as source code only and may be built on native Linux by following the existing procedure subject to any modifications described below.

@@ -12,11 +12,10 @@ Requirements:

 ## Status

-**cpuminer-opt-23.11 is released, all users should upgrade**
+**cpuminer-opt-23.12 is released**

 Highlights from this release:
-Important fixes to x25x, hmq1725.
-Most SHA3 algos now optimized with 2-way for NEON.
+Multiple fixes to X16R family of algorithms.

 Development environment:
 *  Orange Pi 5 Plus 16 GB, Rockchip 8 core CPU with AES & SHA2
@@ -40,29 +39,28 @@ The miner has been tested on Raspberry Pi 4B, Orange Pi 5 Plus, and Mac Mini fro
 It compiles for all minor versions of armv8.x with or without AES, or SHA2, or both.

 What works:
-* Most algorithms are working with Neon optimizations.
+
+* Most algorithms are working with Neon optimizations, 2-way parallel when applicable.
 * AES is working for Shavite and Echo.
 * SHA is working for all algos.
 * stratum+ssl and stratum+tcp are working, GBT is untested but expected to work.
-* all configurations ooptions work as usual.
+* All configuration options work as usual.

 Known problems:
+
 * Verthash algo is not working.
 * macOS is not working natively; the workaround is a Linux VM.
 * CPU and feature detection and reporting is incomplete.
 * Groestl, Fugue: multiple issues not AES related, using unoptimized.
-* Some inavlid shares for certain permutations of x16*.
+* Some algorithms too difficult to test with a CPU are not optimized for NEON.

 Short term plan:
+
 * Figure out what's going on with verthash.
-* Complete any other work needed to bring parity with SSE2.
-* Performance testing.
-* Full support for ARM.
+* Groestl & Fugue AES.

 Medium term:

-* Verthash
-* Groestl & Fugue AES.
 * Detection of ARM CPU model and architecture minor version.
 * Find NEON optimization opportunities that exploit its architecture and instruction set.
 * Apply lessons learned to x86_64.
@@ -85,10 +83,7 @@ X86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source d

 `uint64x2_t = uint32x2_t * uint32x2_t`

-Most uses are the x86_64 format requiring a workaround for ARM. Te curent workaround seems to be functioning correctly where needed.
-
-NEON has some fancy load instructions that combine load with another oeration like byte swap. These may provide optimizatins that SSE can't.
-Exploring these is part of the longer term plans once the existing problems are solved and ARM code is up to he same level of optimization level as x86_64. 
+Most uses follow the x86_64 format, which requires a workaround on ARM. The current workaround seems to be functioning correctly where needed.
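
The lane difference can be illustrated with a scalar model in plain C. This is only a sketch, not the miner's actual workaround: it assumes the instructions being compared are the widening multiplies exposed as `_mm_mul_epu32` on x86_64 and `vmull_u32` on NEON, and the function names `mul_x86_model`, `mul_neon_model` and `mul_x86_on_neon_model` are hypothetical.

```c
#include <stdint.h>

/* Scalar model of the x86_64 form: 32-bit lanes 0 and 2 of two
   4-lane sources are multiplied into two 64-bit products. */
static void mul_x86_model(const uint32_t a[4], const uint32_t b[4],
                          uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
}

/* Scalar model of the ARM form, uint64x2_t = uint32x2_t * uint32x2_t:
   lanes 0 and 1 of two 2-lane sources are multiplied. */
static void mul_neon_model(const uint32_t a[2], const uint32_t b[2],
                           uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[1] * b[1];
}

/* One possible workaround (an assumption, not taken from the source):
   gather lanes 0 and 2 into 2-lane vectors, then use the ARM form. */
static void mul_x86_on_neon_model(const uint32_t a[4], const uint32_t b[4],
                                  uint64_t out[2])
{
    const uint32_t a_even[2] = { a[0], a[2] };
    const uint32_t b_even[2] = { b[0], b[2] };
    mul_neon_model(a_even, b_even, out);
}
```

With real intrinsics, the gather step would cost one or more extra instructions per multiply, which is why code written for the x86_64 layout does not map directly onto ARM.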

 NEON has no blend instruction but can emulate one compatible with x86_64 blendv using boolean algebra, but not very efficiently.
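
A minimal scalar sketch of the boolean-algebra emulation follows. It assumes the 32-bit-lane flavour of blendv, where the sign bit of each mask lane selects the second operand. The names `blendv32_model` and `blendv32x4_model` are hypothetical and the code is not taken from the miner.

```c
#include <stdint.h>

/* Per-lane model: if the sign bit of mask is set pick b, otherwise a. */
static uint32_t blendv32_model(uint32_t a, uint32_t b, uint32_t mask)
{
    /* Broadcast the sign bit to every bit: 0x00000000 or 0xFFFFFFFF. */
    uint32_t m = (uint32_t)0 - (mask >> 31);
    /* Boolean select: keep a where m is 0, take b where m is 1. */
    return (a & ~m) | (b & m);
}

/* Apply the per-lane select across a 4-lane vector. */
static void blendv32x4_model(const uint32_t a[4], const uint32_t b[4],
                             const uint32_t mask[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = blendv32_model(a[i], b[i], mask[i]);
}
```

The extra cost comes from the broadcast plus the AND, AND-NOT and OR steps, which on x86_64 are a single blendv instruction.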