From 9487e773b240950e15ce28ccad0406e29bd58c96 Mon Sep 17 00:00:00 2001
From: JayDDee
Date: Tue, 28 Nov 2023 02:53:55 -0500
Subject: [PATCH] Updated Support for AArch64 (markdown)

---
 Support-for-AArch64.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/Support-for-AArch64.md b/Support-for-AArch64.md
index eb49348..ef81326 100644
--- a/Support-for-AArch64.md
+++ b/Support-for-AArch64.md
@@ -68,8 +68,6 @@ Some notable observations about the problems observed:
 
 Verthash is a mystery, it only produces rejects on ARM even with no targtetted code, only compiled C. The same C source works on x86_64 but not on AArch64. Tried with -O3 & -O2. In all other cases falling back to C was always successful. Verthash data file creation and verification work. Verthash has one unique feature in the data-file. No other algo has that and no other algo fails with unoptimized code.
 
-There are a few cases where translating from SSE2 to NEON is diffiult or the workaround kills performance. NEON, being RISC, has no microcode so no programmable shuffle instruction. The only shuffling I can find is sub-vector word & sub-word bit, shift, rotate & reverse. Notably SSE2 can't do bit reversal but can shuffle bytes any which way. Notably Groestl AES, despite not working, is currently slower on ARM that the SPH version.
-
 Multiplications are implemented differently, particularly widening multiplcatiom where the product is twice the bit width of the souces. X86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source data. In effect x86_64 assumes the data is pre-widened and discards lanes 1 & 3 leaving 2 zero extended 64 bit source integers. With ARM the source arguments are packed into a smaller vector and the product is widened to 64 bits upon multiplication: