Updated Support for AArch64 (markdown)

JayDDee
2023-11-20 22:33:44 -05:00
parent abfad2240c
commit 228ad6d8a3

@@ -1,4 +1,4 @@
-Support for AArch64 with AES and SHA2 is at release candidate status.
+Support for AArch64 with AES and SHA2 is now fully supported with most algos optimized for NEON, AES & SHA with exceptions noted below.
 This is provided as source code only and may be built on native Linux by following the existing procedure subject to any modifications described below.
@@ -12,11 +12,10 @@ Requirements:
 ## Status
-**cpuminer-opt-23.11 is released, all users should upgrade**
+**cpuminer-opt-23.12 is released**
 Highlights from this release:
-Important fixes to x25x, hmq1725.
-Most SHA3 algos now optimized with 2-way for NEON.
+Multiple fixes to X16R family of algorithms.
 Development environment:
 * Orange Pi 5 Plus 16 GB, Rockchip 8 core CPU with AES & SHA2
@@ -40,29 +39,28 @@ The miner has been tested on Raspberry Pi 4B, Orange Pi 5 Plus, and Mac Mini fro
 It compiles for all minor versions of armv8.x with or without AES, or SHA2, or both.
 What works:
-* Most algorithms are working with Neon optimizations.
+* Most algorithms are working with Neon optimizations, 2-way parallel when applicable.
 * AES is working for Shavite and Echo.
 * SHA is working for all algos.
 * stratum+ssl and stratum+tcp are working, GBT is untested but expected to work.
-* all configurations ooptions work as usual.
+* all configurations options work as usual.
 Known problems:
 * Verthash algo is not working.
 * MacOS is not working natively, workaround with linux VM.
 * CPU and feature detection and reporting is incomplete.
 * Groestl, Fugue: multiple issues not AES related, using unoptimized.
-* Some inavlid shares for certain permutations of x16*.
+* Some algorithms too difficult to test with a CPU are not optimized for NEON.
 Short term plan:
 * Figure out what's going on with verthash.
-* Complete any other work needed to bring parity with SSE2.
-* Performance testing.
-* Full support for ARM.
+* Groestl & Fugue AES.
 Medium term:
-* Verthash
-* Groestl & Fugue AES.
 * Detection of ARM CPU model and architecture minor version.
 * Find NEON optimization opportunities that exploit it's architecture and instruction set.
 * Apply lessons learned to x86_64.
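Building for armv8.x with or without AES, SHA2, or both is usually driven by compiler-predefined feature macros. The following is a minimal sketch. It assumes a compiler that defines the ACLE macros `__ARM_NEON`, `__ARM_FEATURE_AES` and `__ARM_FEATURE_SHA2`. The names `arm_build_features`, `describe_build_features` and the `FEAT_*` bits are hypothetical and do not come from cpuminer-opt. The sketch only reports what the binary was compiled for. It does not detect what the running CPU supports, which is the separate runtime detection listed as incomplete.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical feature bits for this sketch, not cpuminer-opt identifiers. */
enum {
    FEAT_NEON = 1 << 0,
    FEAT_AES  = 1 << 1,
    FEAT_SHA2 = 1 << 2
};

/* Return the optional AArch64 features this translation unit was compiled
 * for, based only on ACLE predefined macros. On non-ARM targets it is 0. */
int arm_build_features(void)
{
    int f = 0;
#if defined(__ARM_NEON)
    f |= FEAT_NEON;
#endif
#if defined(__ARM_FEATURE_AES)
    f |= FEAT_AES;
#endif
#if defined(__ARM_FEATURE_SHA2)
    f |= FEAT_SHA2;
#endif
    return f;
}

/* Format the compile-time feature set, e.g. "NEON:yes AES:yes SHA2:no". */
void describe_build_features(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int f = arm_build_features();
    snprintf(buf, len, "NEON:%s AES:%s SHA2:%s",
             (f & FEAT_NEON) ? "yes" : "no",
             (f & FEAT_AES)  ? "yes" : "no",
             (f & FEAT_SHA2) ? "yes" : "no");
}
```

On an x86_64 host none of the macros are defined, so the string reads `NEON:no AES:no SHA2:no`.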
@@ -85,10 +83,7 @@ X86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source d
 `uint64x2_t = uint32x2_t * uint32x2_t`
-Most uses are the x86_64 format requiring a workaround for ARM. Te curent workaround seems to be functioning correctly where needed.
-NEON has some fancy load instructions that combine load with another oeration like byte swap. These may provide optimizatins that SSE can't.
-Exploring these is part of the longer term plans once the existing problems are solved and ARM code is up to he same level of optimization level as x86_64.
+Most uses are the x86_64 format requiring a workaround for ARM. The curent workaround seems to be functioning correctly where needed.
 NEON has no blend instruction but can emulate one compatible with x86_64 blendv using boolean algebra, but not very efficiently.
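A portable scalar model can illustrate two points from this hunk. First, x86_64 multiplies 32-bit lanes 0 & 2 while ARM's widening multiply uses lanes 0 & 1. Second, blendv can be emulated with boolean algebra. This is a sketch, not the miner's code. The function names are hypothetical, and the comments name the NEON intrinsics (`vmovn_u64`, `vmull_u32`, `vshrq_n_s8`, `vbslq_u8`) that a vector version would likely use. Gathering lanes 0 & 2 before the multiply is one common form of the workaround, assumed here rather than taken from the source.

```c
#include <assert.h>
#include <stdint.h>

/* x86_64 _mm_mul_epu32 semantics: widening multiply of 32-bit lanes 0 and 2. */
void mul_epu32_x86_model(const uint32_t a[4], const uint32_t b[4], uint64_t r[2])
{
    r[0] = (uint64_t)a[0] * b[0];
    r[1] = (uint64_t)a[2] * b[2];
}

/* NEON vmull_u32 semantics: widening multiply of both lanes of a uint32x2_t. */
void vmull_u32_model(const uint32_t a[2], const uint32_t b[2], uint64_t r[2])
{
    r[0] = (uint64_t)a[0] * b[0];
    r[1] = (uint64_t)a[1] * b[1];
}

/* Workaround: gather lanes 0 and 2 first, then use the ARM-style multiply.
 * In vector code, vmovn_u64 on the 64-bit view of the register keeps the low
 * 32 bits of each 64-bit lane, which are 32-bit lanes 0 and 2. */
void mul_epu32_arm_model(const uint32_t a[4], const uint32_t b[4], uint64_t r[2])
{
    const uint32_t a02[2] = { a[0], a[2] };
    const uint32_t b02[2] = { b[0], b[2] };
    vmull_u32_model(a02, b02, r);
}

/* x86_64 blendv semantics: byte i comes from b if the top bit of mask[i] is
 * set, otherwise from a. */
void blendv_epi8_model(const uint8_t a[16], const uint8_t b[16],
                       const uint8_t mask[16], uint8_t r[16])
{
    for (int i = 0; i < 16; i++) {
        /* Broadcast the top bit to a full byte: 0x00 or 0xFF
         * (vshrq_n_s8(mask, 7) on NEON). */
        uint8_t m = (mask[i] & 0x80) ? 0xFF : 0x00;
        /* Bitwise select: (b & m) | (a & ~m), which vbslq_u8(m, b, a) computes. */
        r[i] = (uint8_t)((b[i] & m) | (a[i] & (uint8_t)~m));
    }
}
```

In vector form the multiply workaround costs one extra narrowing instruction per operand. The blend needs a shift plus a select where SSE4.1 uses a single blendv, which is consistent with the "not very efficiently" remark.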