From 228ad6d8a3567eb81bc5fa7219259b247da4adb0 Mon Sep 17 00:00:00 2001
From: JayDDee
Date: Mon, 20 Nov 2023 22:33:44 -0500
Subject: [PATCH] Updated Support for AArch64 (markdown)

---
 Support-for-AArch64.md | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/Support-for-AArch64.md b/Support-for-AArch64.md
index 9306263..ef54fef 100644
--- a/Support-for-AArch64.md
+++ b/Support-for-AArch64.md
@@ -1,4 +1,4 @@
-Support for AArch64 with AES and SHA2 is at release candidate status.
+Support for AArch64 with AES and SHA2 is now fully supported with most algos optimized for NEON, AES & SHA with exceptions noted below.

 This is provided as source code only and may be built on native Linux by following the existing procedure subject to any modifications described below.

@@ -12,11 +12,10 @@ Requirements:

 ## Status

-**cpuminer-opt-23.11 is released, all users should upgrade**
+**cpuminer-opt-23.12 is released**

 Highlights from this release:
-Important fixes to x25x, hmq1725.
-Most SHA3 algos now optimized with 2-way for NEON.
+Multiple fixes to X16R family of algorithms.

 Development environment:
 * Orange Pi 5 Plus 16 GB, Rockchip 8 core CPU with AES & SHA2
@@ -40,29 +39,28 @@ The miner has been tested on Raspberry Pi 4B, Orange Pi 5 Plus, and Mac Mini fro
 It compiles for all minor versions of armv8.x with or without AES, or SHA2, or both.

 What works:
-* Most algorithms are working with Neon optimizations.
+
+* Most algorithms are working with Neon optimizations, 2-way parallel when applicable.
 * AES is working for Shavite and Echo.
 * SHA is working for all algos.
 * stratum+ssl and stratum+tcp are working, GBT is untested but expected to work.
-* all configurations ooptions work as usual.
+* all configurations options work as usual.

 Known problems:
+
 * Verthash algo is not working.
 * MacOS is not working natively, workaround with linux VM.
 * CPU and feature detection and reporting is incomplete.
 * Groestl, Fugue: multiple issues not AES related, using unoptimized.
-* Some inavlid shares for certain permutations of x16*.
+* Some algorithms too difficult to test with a CPU are not optimized for NEON.

 Short term plan:
+
 * Figure out what's going on with verthash.
-* Complete any other work needed to bring parity with SSE2.
-* Performance testing.
-* Full support for ARM.
+* Groestl & Fugue AES.

 Medium term:
-* Verthash
-* Groestl & Fugue AES.
 * Detection of ARM CPU model and architecture minor version.
 * Find NEON optimization opportunities that exploit it's architecture and instruction set.
 * Apply lessons learned to x86_64.
@@ -85,10 +83,7 @@ X86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source d

 `uint64x2_t = uint32x2_t * uint32x2_t`

-Most uses are the x86_64 format requiring a workaround for ARM. Te curent workaround seems to be functioning correctly where needed.
-
-NEON has some fancy load instructions that combine load with another oeration like byte swap. These may provide optimizatins that SSE can't.
-Exploring these is part of the longer term plans once the existing problems are solved and ARM code is up to he same level of optimization level as x86_64.
+Most uses are the x86_64 format requiring a workaround for ARM. The curent workaround seems to be functioning correctly where needed.

 NEON has no blend instruction but can emulate one compatible with x86_64 blendv using boolean algebra, but not very efficiently.
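
The final context line of the patch says a blendv-compatible blend can be emulated with boolean algebra. The following is a minimal scalar sketch of that idea in portable C, not code taken from cpuminer-opt. The helper name `blendv_bytes` is hypothetical, and the sketch assumes the byte-wise x86_64 blendv rule, where the high bit of each mask byte selects the second source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only; not part of cpuminer-opt.
 * Emulates a byte-wise blendv: for each byte, the result is b[i] when the
 * high bit of mask[i] is set, otherwise a[i]. */
static void blendv_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                         const uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Broadcast the mask byte's high bit to all 8 bits: 0x00 or 0xFF. */
        uint8_t m = (uint8_t)(0u - (uint32_t)(mask[i] >> 7));
        /* Boolean-algebra select: keep a where m is clear, b where m is set. */
        dst[i] = (uint8_t)((a[i] & (uint8_t)~m) | (b[i] & m));
    }
}
```

The extra step of expanding the high bit into a full-width mask before the AND/OR select is one plausible reason the emulation is "not very efficient" compared with a single blend instruction. On NEON the select itself could be written with a bitwise-select intrinsic such as `vbslq_u8`; whether cpuminer-opt does so is not stated in the patch.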