mirror of
https://github.com/JayDDee/cpuminer-opt.git
synced 2025-09-17 23:44:27 +00:00
Updated Support for AArch64 (markdown)
@@ -62,7 +62,9 @@ There are a few cases where translating from SSE2 to NEON is difficult or the wor
Multiplications are implemented differently, particularly widening multiplication, where the product is twice the bit width of the sources.
x86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source data. In effect, x86_64 assumes the data is pre-widened and discards lanes 1 & 3, leaving two zero-extended 64-bit source integers. With ARM, the source arguments are packed into a smaller vector and the product is widened to 64 bits upon multiplication:
`uint64x2_t = uint32x2_t * uint32x2_t`
Most uses follow the x86_64 format, which requires a workaround on ARM.
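
One possible workaround is sketched below; it is not necessarily the one used in this project. It assumes a little-endian target, and the names `mul_even_lanes` and `mul_even_lanes_ref` are hypothetical. On NEON, `vmovn_u64` narrows each 64-bit lane to its low 32 bits, which selects 32-bit lanes 0 & 2, and `vmull_u32` then performs the widening multiply. On x86_64 the same result comes directly from `_mm_mul_epu32`.

```c
#include <stdint.h>

/* Scalar reference for the x86_64 convention: multiply 32-bit lanes 0 and 2,
   ignore lanes 1 and 3, and produce two 64-bit products. */
static inline void mul_even_lanes_ref(const uint32_t a[4], const uint32_t b[4],
                                      uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
}

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* NEON: reinterpret the 4x32 vector as 2x64, narrow each 64-bit lane to its
   low 32 bits (lanes 0 and 2 on little-endian), then widen-multiply. */
static inline void mul_even_lanes(const uint32_t a[4], const uint32_t b[4],
                                  uint64_t out[2])
{
    uint32x2_t a02 = vmovn_u64(vreinterpretq_u64_u32(vld1q_u32(a)));
    uint32x2_t b02 = vmovn_u64(vreinterpretq_u64_u32(vld1q_u32(b)));
    vst1q_u64(out, vmull_u32(a02, b02));
}

#elif defined(__SSE2__)
#include <emmintrin.h>

/* SSE2: _mm_mul_epu32 already reads lanes 0 and 2 and ignores lanes 1 and 3. */
static inline void mul_even_lanes(const uint32_t a[4], const uint32_t b[4],
                                  uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mul_epu32(va, vb));
}

#else

/* No SIMD available: fall back to the scalar reference. */
static inline void mul_even_lanes(const uint32_t a[4], const uint32_t b[4],
                                  uint64_t out[2])
{
    mul_even_lanes_ref(a, b, out);
}

#endif
```

The narrowing step costs extra instructions on ARM, so code written for NEON from the start may prefer to keep the operands packed in `uint32x2_t` and call `vmull_u32` directly.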
NEON has some fancy load instructions that combine a load with another operation, such as a byte swap. These may provide optimizations that SSE can't.
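
As one illustration of a load combined with another operation, the sketch below uses the de-interleaving load `vld2q_u32`, which reads eight 32-bit words and splits the even-indexed and odd-indexed words into two vectors in a single intrinsic. It does not show a byte-swapping load. The function names `load_deinterleave` and `load_deinterleave_ref` are hypothetical, and non-NEON targets simply fall back to the scalar model.

```c
#include <stdint.h>

/* Scalar model of a de-interleaving load: even-indexed words go to `even`,
   odd-indexed words go to `odd`. */
static inline void load_deinterleave_ref(const uint32_t src[8],
                                         uint32_t even[4], uint32_t odd[4])
{
    for (int i = 0; i < 4; i++) {
        even[i] = src[2 * i];
        odd[i]  = src[2 * i + 1];
    }
}

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* NEON: vld2q_u32 loads and de-interleaves in one step. */
static inline void load_deinterleave(const uint32_t src[8],
                                     uint32_t even[4], uint32_t odd[4])
{
    uint32x4x2_t v = vld2q_u32(src);
    vst1q_u32(even, v.val[0]);   /* src[0], src[2], src[4], src[6] */
    vst1q_u32(odd,  v.val[1]);   /* src[1], src[3], src[5], src[7] */
}

#else

/* Other targets: use the scalar model. */
static inline void load_deinterleave(const uint32_t src[8],
                                     uint32_t even[4], uint32_t odd[4])
{
    load_deinterleave_ref(src, even, odd);
}

#endif
```

With SSE2 the same split would typically need two plain loads followed by shuffles.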