mirror of https://github.com/JayDDee/cpuminer-opt.git
v24.5
@@ -32,6 +32,14 @@
// Intrinsics automatically promote from REX to VEX when AVX is available,
// but ASM needs to be done manually.
//
// APX supports EGPR, which adds 16 more GPRs, and 3-operand instructions.
// This may affect ASM that includes instructions that are superseded by APX
// versions and are therefore incompatible with APX.
// As a result GCC-14 disables EGPR by default; it can be enabled with
// "-mapx-inline-asm-use-gpr32".
//TODO
// Some ASM functions may need to be updated to support EGPR with APX.
//
///////////////////////////////////////////////////////////////////////////////
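
To make the hazard concrete, here is a minimal sketch (not from this codebase) of inline asm that is safe while operands stay in the legacy 16 GPRs, but could fail to assemble if the compiler satisfied the "r" constraint with an EGPR (r16-r31) that the hand-written mnemonic's encoding cannot reach:

#include <stdint.h>

// Hand-written legacy-encoded rotate. If EGPR allocation were enabled for
// inline asm, the compiler might place x in r16-r31, which this encoding
// may not be able to address. Leaving EGPR disabled for inline asm (the
// GCC-14 default) sidesteps the problem.
static inline uint64_t ror64_asm( uint64_t x, int c )
{
   __asm__( "rorq %%cl, %0" : "+r" (x) : "c" (c) );
   return x;
}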

// New architecturally agnostic syntax:
@@ -164,7 +172,7 @@ typedef union
// necessary the cvt, set, or set1 intrinsics can be used, allowing the
// compiler to exploit new features to produce optimum code.
// Currently only used internally and by Luffa.

// It also has implications for the APX EGPR feature.

#define v128_mov64 _mm_cvtsi64_si128
#define v128_mov32 _mm_cvtsi32_si128
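
A minimal usage sketch of these wrappers; the function name is illustrative, not part of the codebase. Using the cvt intrinsic leaves MOVQ vs VMOVQ selection to the compiler:

#include <stdint.h>
#include <immintrin.h>

// Move a 64-bit GPR value into the low lane of a 128-bit vector.
// _mm_cvtsi64_si128 lets the compiler emit movq or vmovq as available.
static inline __m128i load_nonce( uint64_t n )
{
   return _mm_cvtsi64_si128( n );   // i.e. v128_mov64( n )
}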
@@ -125,7 +125,7 @@ static inline __m512i mm512_perm_128( const __m512i v, const int c )
// Pseudo constants.
#define m512_zero _mm512_setzero_si512()

-// use asm to avoid compiler warning for unitialized local
+// use asm to avoid compiler warning for uninitialized local
static inline __m512i mm512_neg1_fn()
{
   __m512i v;
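
The hunk ends before the function body. As context, a sketch of the usual form of this trick, assuming the common vpternlogq idiom rather than quoting the file: the local is written as an output-only asm operand, so the compiler never sees an uninitialized read.

// Sketch only; assumes AVX512F compile flags.
static inline __m512i mm512_neg1_sketch()
{
   __m512i v;
   // vpternlogq with immediate 0xff sets every bit regardless of the
   // (nominally uninitialized) input; the output-only "=v" constraint
   // declares no input, so no uninitialized-local warning is raised.
   __asm__( "vpternlogq $0xff, %0, %0, %0\n\t" : "=v" (v) );
   return v;
}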
@@ -10,7 +10,18 @@
// This code is not used anywhere and likely never will be. Its intent was
// to support 2 way parallel hashing using MMX, or NEON, for 32 bit hash
-// functions, but hasn't been implemented.
+// functions, but was never implemented.
//
// MMX is being deprecated by compilers; all intrinsics will be converted to use SSE
// registers and instructions. MMX will still be available using ASM.
// For backward compatibility it's likely the compiler won't allow mixing explicit SSE
// with promoted MMX. It is therefore preferable to implement all 64 bit vector code
// using explicit SSE with the upper 64 bits being ignored.
// Using SSE for 64 bit vectors will complicate loading arrays from memory, which will
// always load 128 bits. Odd indexes will need to be extracted from the upper 64 bits
// of the even index SSE register (see the sketch below).
// In most cases the existing 4x32 SSE code can be used with 2 lanes being ignored,
// making this file obsolete.

#define v64_t __m64
#define v64u32_t v64_t
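
A sketch of the odd-index extraction described above; the helper name is illustrative, not part of the codebase:

#include <stdint.h>
#include <immintrin.h>

// Load 64-bit element i from an array using only 128-bit SSE loads.
// Even indexes land in the low 64 bits directly; odd indexes are moved
// down from the high 64 bits of the even-index 128-bit load. The upper
// lane of the result is ignored by the caller.
static inline __m128i v64_load( const uint64_t *a, int i )
{
   __m128i v = _mm_loadu_si128( (const __m128i*)( a + ( i & ~1 ) ) );
   return ( i & 1 ) ? _mm_unpackhi_epi64( v, v ) : v;
}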
simd-utils/simd-sve.h (new file)
@@ -0,0 +1,25 @@
// Placeholder for now.
//
// This file will hold AArch64 SVE code, a replacement for NEON that uses vector length
// agnostic instructions. This means the same code can be used on CPUs with different
// SVE vector register lengths. This is not good for vectorized hashing.
// Optimum hash is sensitive to the vector register length, with different code
// used for different register sizes. On X86_64 the vector length is tied to the CPU
// feature, making it simple and efficient to handle different lengths, although it
// results in multiple executables. Theoretically SVE could use a single executable for
// any vector length.
//
// With the SVE vector length only known at run time, there is run time overhead
// to test the vector length. Theoretically it could be tested at program load and the
// appropriate libraries loaded. However, I don't know if this can be done, and if so,
// how to do it.
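
A sketch of what that load-time selection could look like, with hypothetical function names; svcntb() is the ACLE intrinsic that returns the SVE vector length in bytes:

#include <arm_sve.h>

void hash_sve_vla( void *out, const void *in );   // vector length agnostic
void hash_sve256( void *out, const void *in );    // tuned for 256-bit SVE

// Resolve once at startup (or via an ifunc) instead of testing per call.
static void (*resolve_hash( void ))( void *, const void * )
{
   return ( svcntb() == 32 ) ? hash_sve256 : hash_sve_vla;
}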
//
// SVE is not expected to be used for 128 bit vectors as it does not provide any
// advantages over NEON. However, it may be implemented for testing purposes
// because CPUs with registers larger than 128 bits are currently very rare and
// very expensive server class CPUs.
//
// N-way parallel hashing could be the best use of SVE, using the same code for all
// vector lengths with the only variable being the number of lanes. This will still
// require run time checking but should be lighter than substituting functions.
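
A sketch of that N-way, vector length agnostic pattern using standard ACLE intrinsics; the function is illustrative, not from this codebase. The lane count is read at run time, so the same loop serves any register width:

#include <stdint.h>
#include <arm_sve.h>

// XOR two word arrays, svcntw() 32-bit lanes per iteration. The same
// binary handles 128-bit (4 lanes) through 2048-bit (64 lanes) SVE.
static void xor_words( uint32_t *d, const uint32_t *a, const uint32_t *b,
                       int len )
{
   for ( int i = 0; i < len; i += svcntw() )
   {
      svbool_t pg = svwhilelt_b32_s32( i, len );   // tail predicate
      svuint32_t va = svld1_u32( pg, a + i );
      svuint32_t vb = svld1_u32( pg, b + i );
      svst1_u32( pg, d + i, sveor_u32_x( pg, va, vb ) );
   }
}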