mirror of
https://github.com/JayDDee/cpuminer-opt.git
synced 2025-09-17 23:44:27 +00:00
Included with yespower is the "benchmark" program, which is built by
simply invoking "make". When invoked without parameters, it tests
yespower 0.5 at N = 2048, r = 8, which appears to be the lowest setting
in use by existing cryptocurrencies. On an i7-4770K with 4x DDR3-1600
(on two memory channels) running CentOS 7 for x86-64 (and built with
CentOS 7's default version of gcc) and with thread affinity set, this
reports between 3700 and 3800 hashes per second for both SSE2 and AVX
builds, e.g.:

$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark
version=0.5 N=2048 r=8
Will use 2048.00 KiB RAM
a5 9f ec 4c 4f dd a1 6e 3b 14 05 ad da 66 d5 25 b6 8e 7c ad fc fe 6a c0 66 c7 ad 11 8c d8 05 90
Benchmarking 1 thread ...
1018 H/s real, 1018 H/s virtual (2047 hashes in 2.01 seconds)
Benchmarking 4 threads ...
3773 H/s real, 950 H/s virtual (8188 hashes in 2.17 seconds)
min 0.984 ms, avg 1.052 ms, max 1.074 ms

Running 8 threads (to match the logical rather than the physical CPU
core count) results in very slightly worse performance on this system,
but this might be the other way around on another system and/or with
other parameters. With yespower 1.0, performance at these parameters
improves to almost 4000 hashes per second:

$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark 10
version=1.0 N=2048 r=8
Will use 2048.00 KiB RAM
d0 78 cd d4 cf 3f 5a a8 4e 3c 4a 58 66 29 81 d8 2d 27 e5 67 36 37 c4 be 77 63 61 32 24 c1 8a 93
Benchmarking 1 thread ...
1080 H/s real, 1080 H/s virtual (4095 hashes in 3.79 seconds)
Benchmarking 4 threads ...
3995 H/s real, 1011 H/s virtual (16380 hashes in 4.10 seconds)
min 0.923 ms, avg 0.989 ms, max 1.137 ms

Running 8 threads results in a substantial slowdown with this new
version (to between 3200 and 3400 hashes per second) because of cache
thrashing.

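The thread-scaling figures quoted so far can be checked directly from the benchmark output. The following is a minimal sketch, assuming the output format shown in the first example; the function names parse_benchmark and scaling are hypothetical helpers, not part of yespower.

```python
import re

# Matches result lines such as:
#   "3773 H/s real, 950 H/s virtual (8188 hashes in 2.17 seconds)"
RESULT_RE = re.compile(r"^(\d+) H/s real, (\d+) H/s virtual")
# Matches "Benchmarking 1 thread ..." and "Benchmarking 4 threads ..."
HEADER_RE = re.compile(r"^Benchmarking (\d+) threads? \.\.\.")


def parse_benchmark(text):
    """Return {thread_count: real_hashes_per_second} from benchmark output."""
    rates = {}
    threads = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER_RE.match(line)
        if header:
            threads = int(header.group(1))
            continue
        result = RESULT_RE.match(line)
        if result and threads is not None:
            rates[threads] = int(result.group(1))
            threads = None
    return rates


def scaling(rates, threads):
    """Return (speedup over one thread, per-thread efficiency)."""
    speedup = rates[threads] / rates[1]
    return speedup, speedup / threads


# Copied from the yespower 0.5 run at N = 2048, r = 8 above.
SAMPLE = """\
Benchmarking 1 thread ...
1018 H/s real, 1018 H/s virtual (2047 hashes in 2.01 seconds)
Benchmarking 4 threads ...
3773 H/s real, 950 H/s virtual (8188 hashes in 2.17 seconds)
min 0.984 ms, avg 1.052 ms, max 1.074 ms
"""

if __name__ == "__main__":
    rates = parse_benchmark(SAMPLE)
    speedup, efficiency = scaling(rates, 4)
    print(f"{speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

For the sample above this gives a speedup of roughly 3.7x on 4 threads. Feeding it the output of an 8-thread run on the same machine would show whether the extra logical CPUs help or hurt.
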
For higher settings such as those using 8 MiB of RAM instead of the
2 MiB above, this system performs at around 800 hashes per second for
yespower 0.5 and at around 830 hashes per second for yespower 1.0:

$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark 5 2048 32
version=0.5 N=2048 r=32
Will use 8192.00 KiB RAM
56 0a 89 1b 5c a2 e1 c6 36 11 1a 9f f7 c8 94 a5 d0 a2 60 2f 43 fd cf a5 94 9b 95 e2 2f e4 46 1e
Benchmarking 1 thread ...
265 H/s real, 265 H/s virtual (1023 hashes in 3.85 seconds)
Benchmarking 4 threads ...
803 H/s real, 200 H/s virtual (4092 hashes in 5.09 seconds)
min 4.924 ms, avg 4.980 ms, max 5.074 ms

$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark 10 2048 32
version=1.0 N=2048 r=32
Will use 8192.00 KiB RAM
f7 69 26 ae 4a dc 56 53 90 2f f0 22 78 ea aa 39 eb 99 84 11 ac 3e a6 24 2e 19 6d fb c4 3d 68 25
Benchmarking 1 thread ...
275 H/s real, 275 H/s virtual (1023 hashes in 3.71 seconds)
Benchmarking 4 threads ...
831 H/s real, 209 H/s virtual (4092 hashes in 4.92 seconds)
min 3.614 ms, avg 4.769 ms, max 5.011 ms

Again, running 8 threads results in a slowdown, albeit not as severe as
the one seen at the lower settings.

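The "Will use ... KiB RAM" lines in the runs above (2048.00 KiB at r = 8, 8192.00 KiB at r = 32, both with N = 2048) are consistent with a working set of 128 * r * N bytes, the scrypt-style block array size. A minimal sketch under that assumption follows; the helper name yespower_ram_kib is hypothetical, and any smaller auxiliary buffers the implementation may allocate are ignored.

```python
def yespower_ram_kib(n, r):
    """Approximate working-set size in KiB, assuming 128 * r * N bytes.

    This formula is an assumption inferred from the benchmark output,
    not taken from the yespower source.
    """
    return 128 * r * n / 1024


if __name__ == "__main__":
    for n, r in ((2048, 8), (2048, 32)):
        print(f"N={n} r={r}: Will use {yespower_ram_kib(n, r):.2f} KiB RAM")
```

Comparing this per-thread figure, multiplied by the thread count, against the CPU's cache sizes gives a rough idea of when running more threads is likely to cause cache thrashing.
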
On x86(-64), the following code versions may reasonably be built: SSE2,
AVX, and XOP. (There's no reason to build for AVX2 and higher, which are
unsuitable for and thus unused by current yespower anyway. There's also
no reason to build yespower as-is for SSE4, although there's a
32-bit-specific SSE4 code version, disabled by default, that may be
re-enabled and given a try if someone is so inclined; it may perform
slightly slower or slightly faster across different systems.)

yescrypt and especially yespower 1.0 have been designed to fit the SSE2
instruction set almost perfectly, so there's very little benefit from
the AVX and XOP builds, yet even at yespower 1.0 there may be
performance differences between SSE2, AVX, and XOP builds within 2% or
so (and it is unclear which is the fastest on a given system until
tested, except that where XOP is supported it is almost always faster
than AVX).

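Since the fastest build can only be found by testing, a first step is to narrow the candidates down to those the CPU supports. The sketch below assumes a Linux system where /proc/cpuinfo has a "flags" line containing the tokens "sse2", "avx", and "xop"; the function name candidate_builds is a hypothetical helper, and the sample flags line is illustrative only.

```python
def candidate_builds(cpuinfo_text):
    """Return the yespower build variants worth benchmarking on this CPU.

    Looks at the first "flags" line of /proc/cpuinfo-style text and keeps
    the SSE2, AVX, and XOP variants whose flag is present.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            break
    wanted = (("SSE2", "sse2"), ("AVX", "avx"), ("XOP", "xop"))
    # Set membership means "avx2" does not count as "avx".
    return [name for name, flag in wanted if flag in flags]


# Illustrative flags line for a CPU with AVX but without XOP.
SAMPLE = "flags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 avx avx2\n"

if __name__ == "__main__":
    print(candidate_builds(SAMPLE))
```

Each candidate would then be built and run through the benchmark program, keeping whichever reports the highest "H/s real" figure.
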
Proper setting of thread affinities to run exactly one thread per
physical CPU core is non-trivial. In the above examples, it so happened
that the first 4 logical CPU numbers corresponded to different physical
cores, but this won't always be the case. This can vary even between
apparently similar systems. On Linux, the mapping of logical CPUs to
physical cores may be obtained from /proc/cpuinfo (on x86[-64] and MIC)
or sysfs, which an optimized implementation of e.g. a cryptocurrency
miner could use. If you do not bother obtaining this information from
the operating system, you might be better off not setting thread
affinities at all (in order to avoid the risk of doing this incorrectly,
which would have a greater negative performance impact) and/or running
as many threads as there are logical CPUs. Also, there's no certainty
whether different and future CPUs will run yespower faster using one or
maybe more threads per physical core.
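
The mapping described above can be sketched as follows. This is a minimal example, assuming x86 Linux /proc/cpuinfo text with "processor", "physical id", and "core id" fields separated by blank lines; the function names are hypothetical helpers, and the sample layout is illustrative rather than taken from a real system.

```python
def one_cpu_per_core(cpuinfo_text):
    """Return the lowest-numbered logical CPU of each physical core."""
    lowest = {}  # (physical id, core id) -> logical CPU number
    entry = {}
    # The extra "" flushes the last entry if the text lacks a final blank line.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if "processor" in entry:
                cpu = int(entry["processor"])
                # Without a "core id" field, treat each CPU as its own core.
                core_id = entry.get("core id", ("cpu", cpu))
                core = (entry.get("physical id", "0"), core_id)
                lowest[core] = min(lowest.get(core, cpu), cpu)
            entry = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            entry[key.strip()] = value.strip()
    return sorted(lowest.values())


def affinity_string(cpus):
    """Format a CPU list as a space-separated GOMP_CPU_AFFINITY value."""
    return " ".join(str(cpu) for cpu in cpus)


def sample_cpuinfo(core_ids):
    """Build illustrative /proc/cpuinfo-style text for one CPU package."""
    return "\n\n".join(
        f"processor\t: {cpu}\nphysical id\t: 0\ncore id\t\t: {core}"
        for cpu, core in enumerate(core_ids)
    )


if __name__ == "__main__":
    # Hypothetical layout where sibling hyperthreads are numbered adjacently,
    # so logical CPUs 0-3 would cover only two physical cores.
    cpus = one_cpu_per_core(sample_cpuinfo([0, 0, 1, 1, 2, 2, 3, 3]))
    print(f'GOMP_CPU_AFFINITY="{affinity_string(cpus)}" '
          f"OMP_NUM_THREADS={len(cpus)} ./benchmark")
```

On a real system the text would come from reading /proc/cpuinfo. With the adjacent layout used here, the selected CPUs are 0, 2, 4, and 6, whereas with the interleaved layout of the examples above they would be 0 through 3.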