Prev: my-program-is-too-slow Next: measuring-memory
CPU performance is hard to measure, due to the following techniques:
Latency is hard to measure for the same reasons, and power saving makes it harder still. Many modern CPUs are heterogeneous, mixing performance cores, which are faster but draw more energy, with efficiency cores, which are slower but draw less. The same code can therefore show different latencies depending on which core it runs on.
In addition, the CPU clock frequency can be lowered dynamically to save power, so a “cycle” is not a fixed unit of time.
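Thus, to measure the latency of a single operation you have to run it many times in a loop, time the whole loop, and divide by the number of iterations. One operation on its own is far too short to time reliably.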
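The run-it-many-times idea can be sketched as below. This is only an illustration and not the book's code: the helper name `average_ns_per_iteration` is made up for these notes, and it uses `std::chrono::steady_clock` rather than a cycle counter, so it reports nanoseconds, not cycles.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the book): calls `body` `iters` times and
// returns the average wall-clock time per call in nanoseconds.
// The result includes loop overhead, so it is an upper bound on the cost
// of `body`, not an exact latency.
template <typename Fn>
double average_ns_per_iteration(Fn body, std::uint64_t iters) {
    assert(iters > 0);  // avoid dividing by zero below
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::uint64_t i = 0; i < iters; ++i) {
        body();
    }
    const auto stop = Clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}
```

Calling it with a lambda, for example `average_ns_per_iteration([&] { ++counter; }, 1000000)`, gives one averaged number. Running it several times and comparing the results is a cheap way to see how noisy the measurement is.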
When compiled with optimization, the code shown in the book appears to have no latency at all, because the compiler optimizes the loop away.
The compiler can also remove dead code and reorder code, so you have to be careful when benchmarking. The compiled assembly isn’t the final word either, because an x86 CPU decodes each instruction at run time into internal RISC-like micro-ops, and those are what actually execute.
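A minimal sketch of the dead-code problem, assuming a typical optimizing compiler at -O2. The names `g_sink`, `sum_discarded`, and `sum_kept` are made up for these notes and do not come from mystery1.cc.

```cpp
#include <cstdint>

// Hypothetical sink: a store to a volatile object is an observable side
// effect, so the compiler has to produce the value that is stored.
volatile std::uint64_t g_sink = 0;

// The result is never used, so an optimizing compiler is free to delete
// the whole loop. Timing this function may measure nothing.
void sum_discarded(std::uint64_t iters, std::uint64_t incr) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < iters; ++i) {
        sum += incr;
    }
    (void)sum;  // silences the unused-variable warning; does NOT keep the loop alive
}

// The result is stored to the volatile sink and returned, so the value
// must be computed. Note the compiler may still replace the loop with a
// single multiply (iters * incr); keeping the result live is necessary
// but not sufficient for an honest benchmark.
std::uint64_t sum_kept(std::uint64_t iters, std::uint64_t incr) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < iters; ++i) {
        sum += incr;
    }
    g_sink = sum;
    return sum;
}
```

Comparing the generated assembly for the two functions at -O0 and -O2 is a quick way to check what the compiler actually kept.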
2.1 What is the latency of add?
2.2 Compile and run mystery1.cc both unoptimized (-O0) and optimized (-O2), and explain the differences:
2.3 Uncomment the fprintf, see what changes:
2.4 Declare incr as volatile and run again:
2.5 Make your own copy of mystery1.cc and modify it to measure the latency, in cycles, of a 64-bit integer add. Write down your answer.
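One possible shape for such a measurement, as a sketch only. This is not the book's mystery1.cc: `AddTiming`, `time_add_chain`, and `ns_to_cycles` are names invented for these notes, the timer is `std::chrono::steady_clock` instead of a cycle counter, and the clock frequency has to be supplied by hand and is assumed constant during the run.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result type for this sketch.
struct AddTiming {
    std::uint64_t sum;   // returned so the adds cannot be treated as dead code
    double ns_per_add;   // average nanoseconds per loop iteration
};

// Times a chain of dependent 64-bit adds: each add needs the previous sum,
// so the adds cannot overlap and the loop time approximates add latency
// (plus whatever loop overhead the CPU cannot hide).
AddTiming time_add_chain(std::uint64_t iters) {
    assert(iters > 0);
    // volatile: incr is reloaded every iteration, so the compiler cannot
    // fold the loop into a single multiply.
    volatile std::uint64_t incr = 1;
    std::uint64_t sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iters; ++i) {
        sum += incr;
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return {sum, elapsed.count() / static_cast<double>(iters)};
}

// Converts nanoseconds to cycles for an ASSUMED constant clock in GHz.
// Frequency scaling makes this only approximate.
double ns_to_cycles(double ns, double clock_ghz) {
    return ns * clock_ghz;
}
```

For example, a measured 0.5 ns per add on a machine assumed to run at 3.0 GHz corresponds to `ns_to_cycles(0.5, 3.0)`, which is 1.5 cycles.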
2.6 Experiment with the number of loop iterations. Explain why some values do not produce meaningful results.
2.7 Write down your guesses for the latency of 64-bit integer multiply and divide, and of floating-point multiply and divide.
Integer multiply: ~15 cycles
Integer divide: ~20 cycles
FP multiply: ~30 cycles?
FP divide: ~40 cycles?
2.8 Now verify your guesses:
Integer multiply: 3-4 cycles
Integer divide: 9-17 cycles
FP multiply: 6-7 cycles
FP divide: 15 cycles
2.9 Let your double-precision values drift into the underflow and overflow ranges. Note the observed latency.
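A small sketch of how a double can be pushed into those ranges, assuming IEEE 754 doubles and default compiler flags (no -ffast-math and no flush-to-zero mode). The helper names are made up for these notes. On many CPUs arithmetic on subnormal values is noticeably slower than on normal ones, which is presumably what this exercise is meant to expose, but that depends on the hardware.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: halves x `steps` times, drifting toward underflow.
double halve_repeatedly(double x, int steps) {
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) {
        x *= 0.5;
    }
    return x;
}

// Hypothetical helper: doubles x `steps` times, drifting toward overflow.
double double_repeatedly(double x, int steps) {
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) {
        x *= 2.0;
    }
    return x;
}
```

`std::fpclassify` from `<cmath>` tells you which range a value has reached: starting from 1.0, 1030 halvings give a subnormal (`FP_SUBNORMAL`), 1080 halvings underflow to zero (`FP_ZERO`), and 1030 doublings overflow to infinity (`FP_INFINITE`). Timing a multiply loop in each range separately shows whether the latency changes.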