mirror of
https://github.com/JayDDee/cpuminer-opt.git
synced 2025-09-17 23:44:27 +00:00
Updated Support for AArch64 (markdown)
@@ -61,8 +61,12 @@ Verthash is a mystery, it only produces rejects on ARM even with no targtetted c
|
||||
There are a few cases where translating from SSE2 to NEON is difficult or the workaround kills performance. NEON, being RISC, has no microcode, so it has no programmable shuffle instruction. The only shuffling I can find is sub-vector word, sub-word bit, shift, rotate & reverse. Notably SSE2 can't do bit reversal but can shuffle bytes any which way.
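To make the bit reversal point concrete, here is a minimal sketch of what has to be done with shifts and masks when no bit-reverse instruction is available. The function names are hypothetical and not taken from cpuminer-opt. The guarded AArch64 helper uses the `vrbitq_u8` intrinsic, which reverses the bits of every byte in a vector; the scalar function models the same result for one byte.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
/* AArch64 only: one intrinsic reverses the bits in each of the 16 bytes. */
static inline uint8x16_t neon_reverse_bits_bytes(uint8x16_t v)
{
    return vrbitq_u8(v);
}
#endif

/* Portable model: reverse the bit order of one byte by swapping
 * nibbles, then bit pairs, then adjacent bits. */
uint8_t reverse_bits_u8(uint8_t x)
{
    x = (uint8_t)(((x & 0xF0u) >> 4) | ((x & 0x0Fu) << 4));
    x = (uint8_t)(((x & 0xCCu) >> 2) | ((x & 0x33u) << 2));
    x = (uint8_t)(((x & 0xAAu) >> 1) | ((x & 0x55u) << 1));
    return x;
}

/* Hypothetical usage example: reversing twice gives back the input. */
void reverse_bits_example(void)
{
    assert(reverse_bits_u8(0x01u) == 0x80u);
    assert(reverse_bits_u8(reverse_bits_u8(0x12u)) == 0x12u);
}
```

A vectorised x86_64 version would apply the same three mask-and-shift steps to a whole register, which is where the extra cost comes from.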
|
||||
|
||||
Multiplications are implemented differently, particularly widening multiplication, where the product is twice the bit width of the sources.
|
||||
X86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source data. In effect x86_64 assumes the data is pre-widened and discards lanes 1 & 3, leaving two zero-extended 64-bit source integers. With ARM the source arguments are packed into a smaller vector (uint32x2_t * uint32x2_t = uint64x2_t) and the product is widened to 64 bits upon multiplication. Most uses are the x86_64 format, requiring a workaround for ARM.
|
||||
X86_64 operates on lanes 0 & 2 while ARM operates on lanes 0 & 1 of the source data. In effect x86_64 assumes the data is pre-widened and discards lanes 1 & 3, leaving two zero-extended 64-bit source integers. With ARM the source arguments are packed into a smaller vector and the product is widened to 64 bits upon multiplication:
|
||||
`uint64x2_t = uint32x2_t * uint32x2_t`
|
||||
Most uses are the x86_64 format, requiring a workaround for ARM.
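The lane difference can be sketched in plain C. The scalar functions below model the two behaviours described above: the x86_64 style (as in `_mm_mul_epu32`) and the ARM style (as in `vmull_u32`). All function names are hypothetical. The guarded NEON helper shows one possible workaround, narrowing each 64-bit lane to its low 32 bits before the widening multiply; it assumes little-endian lane order and is not necessarily what cpuminer-opt does.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
/* One possible workaround (an assumption, not the project's code):
 * vmovn_u64 keeps the low 32 bits of each 64-bit lane, i.e. 32-bit
 * lanes 0 and 2, which is the packed input vmull_u32 expects. */
static inline uint64x2_t neon_mul_even_lanes(uint32x4_t a, uint32x4_t b)
{
    uint32x2_t a02 = vmovn_u64(vreinterpretq_u64_u32(a));
    uint32x2_t b02 = vmovn_u64(vreinterpretq_u64_u32(b));
    return vmull_u32(a02, b02);
}
#endif

/* Model of the x86_64 behaviour: multiply 32-bit lanes 0 and 2,
 * ignore lanes 1 and 3, produce two 64-bit products. */
void mul_even_lanes(const uint32_t a[4], const uint32_t b[4], uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
}

/* Model of the ARM behaviour: sources are already packed into two
 * lanes, and both lanes are multiplied and widened. */
void mul_packed_lanes(const uint32_t a[2], const uint32_t b[2], uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[1] * b[1];
}

/* Model of the narrowing step: keep lanes 0 and 2, drop lanes 1 and 3. */
void pack_even_lanes(const uint32_t v[4], uint32_t out[2])
{
    out[0] = v[0];
    out[1] = v[2];
}

/* Hypothetical usage example: pack first, then the ARM-style multiply
 * matches the x86_64-style result. */
void widening_mul_example(void)
{
    const uint32_t a[4] = { 3u, 0xFFFFFFFFu, 0xFFFFFFFFu, 7u };
    const uint32_t b[4] = { 5u, 9u, 2u, 11u };
    uint32_t a2[2], b2[2];
    uint64_t x86_style[2], arm_style[2];

    mul_even_lanes(a, b, x86_style);
    pack_even_lanes(a, a2);
    pack_even_lanes(b, b2);
    mul_packed_lanes(a2, b2, arm_style);

    assert(x86_style[0] == arm_style[0]);
    assert(x86_style[1] == arm_style[1]);
}
```

The extra narrowing step on each source is the cost of the workaround whenever the data arrives in the x86_64 layout.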
|
||||
|
||||
NEON has some fancy load instructions that combine a load with another operation like byte swap. These may provide optimizations that SSE can't.
|
||||
Exploring these is part of the longer-term plans once the existing problems are solved and the ARM code is up to the same level of optimization as x86_64.
|
||||
|
||||
NEON has no blend instruction. One compatible with x86_64 blendv can be emulated using boolean algebra, but not very efficiently.
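A minimal sketch of the boolean algebra involved, assuming 32-bit lanes as in `_mm_blendv_ps`, where the sign bit of each mask lane picks the second source. The sign bit is first spread across the whole lane, then the result is `(a & ~mask) | (b & mask)`. Function names are hypothetical, and the guarded NEON helper is one way to write it, not necessarily the project's.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
/* Sketch only: an arithmetic shift by 31 turns the sign bit into an
 * all-ones or all-zeros lane, then AND-NOT / AND / OR do the select. */
static inline uint32x4_t neon_blendv_u32(uint32x4_t a, uint32x4_t b, uint32x4_t m)
{
    uint32x4_t mask =
        vreinterpretq_u32_s32(vshrq_n_s32(vreinterpretq_s32_u32(m), 31));
    return vorrq_u32(vbicq_u32(a, mask), vandq_u32(b, mask));
}
#endif

/* Portable model for four 32-bit lanes: take b[i] where the sign bit
 * of m[i] is set, otherwise a[i]. */
void blendv_u32_model(const uint32_t a[4], const uint32_t b[4],
                      const uint32_t m[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* 0 - 1 wraps to all ones, 0 - 0 stays zero. */
        uint32_t mask = (uint32_t)(0u - (m[i] >> 31));
        out[i] = (a[i] & ~mask) | (b[i] & mask);
    }
}

/* Hypothetical usage example: only the sign bit of the mask matters. */
void blendv_example(void)
{
    const uint32_t a[4] = { 1u, 2u, 3u, 4u };
    const uint32_t b[4] = { 10u, 20u, 30u, 40u };
    const uint32_t m[4] = { 0x80000000u, 0x7FFFFFFFu, 0xFFFFFFFFu, 0u };
    uint32_t out[4];

    blendv_u32_model(a, b, m, out);
    assert(out[0] == 10u && out[1] == 2u && out[2] == 30u && out[3] == 4u);
}
```

The mask expansion plus three logical operations per blend is the overhead compared with a single blendv on x86_64.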
|
||||
|
||||
|