From 9487e773b240950e15ce28ccad0406e29bd58c96 Mon Sep 17 00:00:00 2001
From: JayDDee <jayddee246@gmail.com>
Date: Tue, 28 Nov 2023 02:53:55 -0500
Subject: [PATCH] Updated Support for AArch64 (markdown)

---
 Support-for-AArch64.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/Support-for-AArch64.md b/Support-for-AArch64.md
index eb49348..ef81326 100644
--- a/Support-for-AArch64.md
+++ b/Support-for-AArch64.md
@@ -68,8 +68,6 @@ Some notable observations about the problems observed:
 
 Verthash is a mystery: it produces only rejects on ARM even with no targeted code, only compiled C. The same C source works on x86_64 but not on AArch64. Tried with -O3 and -O2. In all other cases falling back to C was successful. Verthash data file creation and verification work. Verthash has one unique feature: the data file. No other algo has that, and no other algo fails with unoptimized code.
 
-There are a few cases where translating from SSE2 to NEON is difficult or the workaround kills performance. NEON, being RISC, has no microcode so no programmable shuffle instruction. The only shuffling I can find is sub-vector word & sub-word bit, shift, rotate & reverse. Notably SSE2 can't do bit reversal but can shuffle bytes any which way. Notably Groestl AES, despite not working, is currently slower on ARM than the SPH version.
-
 Multiplications are implemented differently, particularly widening multiplication, where the product is twice the bit width of the sources.
 x86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source data. In effect x86_64 assumes the data is pre-widened and discards lanes 1 & 3, leaving two zero-extended 64-bit source integers. With ARM the source arguments are packed into a smaller vector and the product is widened to 64 bits upon multiplication: