Jay D Dee
2024-09-13 14:14:57 -04:00
parent 47e24b50e8
commit 8e91bfbe19
16 changed files with 2727 additions and 1880 deletions

View File

@@ -32,6 +32,14 @@
// Intrinsics automatically promote from REX to VEX when AVX is available
// but ASM needs to be done manually.
//
// APX supports EGPR which adds 16 more GPRs and 3-operand instructions.
// This may affect ASM that includes instructions that are superseded by APX
// versions and are therefore incompatible with APX.
// As a result GCC-14 disables EGPR in inline ASM by default; it can be enabled
// with "-mapx-inline-asm-use-gpr32".
//TODO
// Some ASM functions may need to be updated to support EGPR with APX.
//
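// A hedged sketch (hypothetical helper, not from this codebase) of the interaction:
// by default GCC-14 keeps inline ASM operands in the legacy r0-r15, so a plain "r"
// constraint stays encodable; with "-mapx-inline-asm-use-gpr32" the allocator may
// also hand the ASM an EGPR (r16-r31), so every instruction in the ASM string must
// then have an APX compatible encoding.
static inline uint64_t asm_add64( uint64_t a, const uint64_t b )
{
   // "r" may resolve to r16-r31 only when EGPR use is enabled for inline ASM.
   __asm__( "addq %1, %0" : "+r"(a) : "r"(b) );
   return a;
}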
///////////////////////////////////////////////////////////////////////////////
// New architecturally agnostic syntax:
@@ -164,7 +172,7 @@ typedef union
// necessary the cvt, set, or set1 intrinsics can be used allowing the
// compiler to exploit new features to produce optimum code.
// Currently only used internally and by Luffa.
// It also has implications for the APX EGPR feature.
#define v128_mov64 _mm_cvtsi64_si128
#define v128_mov32 _mm_cvtsi32_si128
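// A hedged usage sketch (hypothetical helper, not part of this file): move a 64 bit
// scalar into the low lane of a 128 bit vector; _mm_cvtsi64_si128 zero extends, so
// the upper 64 bits are cleared.
static inline __m128i v128_from_u64( const uint64_t a )
{
   return v128_mov64( (long long)a );
}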

View File

@@ -125,7 +125,7 @@ static inline __m512i mm512_perm_128( const __m512i v, const int c )
// Pseudo constants.
#define m512_zero _mm512_setzero_si512()
// use asm to avoid compiler warning for uninitialized local
static inline __m512i mm512_neg1_fn()
{
__m512i v;
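// A sketch of the remainder, assuming the usual all-ones idiom the comment above
// describes: vpternlogq with immediate 0xff writes all ones to its output-only
// operand, so the compiler never sees a read of the uninitialized local.
asm( "vpternlogq $0xff, %0, %0, %0\n\t" : "=v"(v) );
return v;
}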

View File

@@ -10,7 +10,18 @@
// This code is not used anywhere and likely never will be. Its intent was
// to support 2 way parallel hashing using MMX, or NEON, for 32 bit hash
// functions, but it was never implemented.
//
//
// MMX is being deprecated by compilers; all intrinsics will be converted to use SSE
// registers and instructions. MMX will still be available using ASM.
// For backward compatibility it's likely the compiler won't allow mixing explicit SSE
// with promoted MMX. It is therefore preferable to implement all 64 bit vector code
// using explicit SSE with the upper 64 bits being ignored.
// Using SSE for 64 bit vectors will complicate loading arrays from memory, which will
// always load 128 bits. Odd indexes will need to be extracted from the upper 64 bits
// of the even index SSE register (see the sketch below).
// In most cases the existing 4x32 SSE code can be used with 2 lanes being ignored,
// making this file obsolete.
#define v64_t __m64
#define v64u32_t v64_t
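// A hedged sketch (hypothetical helper, not part of this file) of the load scheme
// described above: a 128 bit load always fetches an even/odd pair of 64 bit elements,
// so an odd index has to be shifted down from the upper half of the load made at the
// preceding even index.
static inline __m128i v64_load( const uint64_t *p, const int i )
{
   __m128i v = _mm_loadu_si128( (const __m128i*)( p + ( i & ~1 ) ) );
   // even index: element is already in the low 64 bits, the upper lane is ignored.
   // odd index:  element is in the upper 64 bits, shift it down to the low lane.
   return ( i & 1 ) ? _mm_srli_si128( v, 8 ) : v;
}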

25
simd-utils/simd-sve.h Normal file
View File

@@ -0,0 +1,25 @@
// Placeholder for now.
//
// This file will hold AArch64 SVE code, a replacement for NEON that uses vector length
// agnostic instructions. This means the same code can be used on CPUs with different
// SVE vector register lengths. This is not good for vectorized hashing.
// Optimum hashing is sensitive to the vector register length, with different code
// used for different register sizes. On X86_64 the vector length is tied to the CPU
// feature making it simple and efficient to handle different lengths although it
// results in multiple executables. Theoretically SVE could use a single executable for
// any vector length.
//
// With the SVE vector length only known at run time it results in run time overhead
// to test the vector length. Theoretically it could be tested at program loading and
// appropriate libraries loaded. However I don't know if this can be done and if so
// how to do it.
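// A hedged sketch (hypothetical helper, not part of this file) of the run time test:
// ACLE exposes the register size through counters such as svcntb(), the number of
// bytes in one SVE vector register.
#include <arm_sve.h>
static inline uint64_t sve_vector_bits( void )
{
   return svcntb() * 8;   // bytes per register converted to bits
}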
//
// SVE is not expected to be used for 128 bit vectors as it does not provide any
// advantages over NEON. However, it may be implemented for testing purposes
// because CPUs with registers larger than 128 bits are currently very rare and are
// very expensive server class CPUs.
//
// N-way parallel hashing could be the best use of SVE, using the same code for all
// vector lengths with the only variable being the number of lanes. This will still
// require run time checking but should be lighter than substituting functions.
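// A hedged sketch (hypothetical function, not part of this file, reusing the
// <arm_sve.h> include from the sketch above) of lane count agnostic code: the
// predicate and the loop stride adapt to whatever register length the CPU
// implements, so the same code runs unchanged for any SVE vector length.
static void sve_add_u32( uint32_t *c, const uint32_t *a, const uint32_t *b,
                         const uint64_t n )
{
   for ( uint64_t i = 0; i < n; i += svcntw() )     // svcntw() = 32 bit lanes per vector
   {
      svbool_t pg = svwhilelt_b32_u64( i, n );      // mask off lanes past the end
      svuint32_t va = svld1_u32( pg, a + i );
      svuint32_t vb = svld1_u32( pg, b + i );
      svst1_u32( pg, c + i, svadd_u32_x( pg, va, vb ) );
   }
}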