Updated Support for AArch64 (markdown)

JayDDee
2023-11-28 02:53:55 -05:00
parent 8443d015b6
commit 9487e773b2

@@ -68,8 +68,6 @@ Some notable observations about the problems observed:
Verthash is a mystery: it only produces rejects on ARM even with no targeted code, only compiled C. The same C source works on x86_64 but not on AArch64. Tried with -O3 & -O2. In all other cases falling back to C was always successful. Verthash data file creation and verification work. Verthash has one unique feature in the data file. No other algo has that and no other algo fails with unoptimized code.
There are a few cases where translating from SSE2 to NEON is difficult or the workaround kills performance. NEON, being RISC, has no microcode so no programmable shuffle instruction. The only shuffling I can find is sub-vector word & sub-word bit, shift, rotate & reverse. Notably SSE2 can't do bit reversal but can shuffle bytes any which way. Notably Groestl AES, despite not working, is currently slower on ARM than the SPH version.
Multiplications are implemented differently, particularly widening multiplication where the product is twice the bit width of the sources.
x86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source data. In effect x86_64 assumes the data is pre-widened and discards lanes 1 & 3, leaving 2 zero-extended 64-bit source integers. With ARM the source arguments are packed into a smaller vector and the product is widened to 64 bits upon multiplication:
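
The lane difference can be modelled in plain scalar C so it runs on either architecture. This is only a sketch: the function names are hypothetical and not taken from the cpuminer-opt source, and it assumes the instructions being compared are the unsigned 32x32 to 64 bit multiplies (`_mm_mul_epu32` on x86_64, `vmull_u32` on AArch64).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the x86_64 behaviour (assumed: _mm_mul_epu32).
   Sources are 4 x 32-bit lanes; only lanes 0 & 2 are used,
   lanes 1 & 3 are ignored. */
static void mul_widen_x86_model( const uint32_t a[4], const uint32_t b[4],
                                 uint64_t prod[2] )
{
   prod[0] = (uint64_t)a[0] * b[0];
   prod[1] = (uint64_t)a[2] * b[2];
}

/* Scalar model of the ARM behaviour (assumed: vmull_u32).
   Sources are packed 2 x 32-bit lanes; lanes 0 & 1 are used and
   the product is widened to 64 bits. */
static void mul_widen_arm_model( const uint32_t a[2], const uint32_t b[2],
                                 uint64_t prod[2] )
{
   prod[0] = (uint64_t)a[0] * b[0];
   prod[1] = (uint64_t)a[1] * b[1];
}

/* Hypothetical helper: pack lanes 0 & 2 of a 4-lane vector into a
   2-lane vector, the step needed before the ARM style multiply can
   reproduce the x86_64 result. */
static void pack_even_lanes( const uint32_t v[4], uint32_t out[2] )
{
   out[0] = v[0];
   out[1] = v[2];
}

/* Checks that packing followed by the ARM model matches the x86 model. */
static void self_check( void )
{
   const uint32_t a[4] = { 3, 111, 0xffffffffu, 222 };
   const uint32_t b[4] = { 5, 333, 0xffffffffu, 444 };
   uint32_t a2[2], b2[2];
   uint64_t px[2], pa[2];

   mul_widen_x86_model( a, b, px );
   pack_even_lanes( a, a2 );
   pack_even_lanes( b, b2 );
   mul_widen_arm_model( a2, b2, pa );

   assert( px[0] == pa[0] && px[1] == pa[1] );
}
```

Feeding the first two lanes of the x86_64 layout straight into the ARM style multiply would instead use lane 1, which x86_64 discards, so a direct one-to-one translation gives a different second product.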