Prev: measurement Next: measuring-cpus
Before beginning performance optimization, you should know how long an operation should take: without that estimate, you can't tell whether an observed slowness is even a fixable problem.
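One way to get that estimate is from first principles: bytes to move divided by the bandwidth of the device moving them. A minimal sketch, where the hardware figures (roughly 10 GB/s for a main-memory copy, roughly 500 MB/s for a sequential SSD read) are illustrative round numbers, not measurements:

```python
# Back-of-envelope estimate: how long *should* this operation take?
# The bandwidth figures are assumed round numbers; substitute your hardware's.
MEMORY_COPY_BYTES_PER_SEC = 10e9   # ~10 GB/s main-memory copy
SSD_READ_BYTES_PER_SEC = 500e6     # ~500 MB/s sequential SSD read

def estimate_seconds(num_bytes: float, bytes_per_sec: float) -> float:
    return num_bytes / bytes_per_sec

work = 100e6  # 100 MB of data
print(f"copy in RAM:   ~{estimate_seconds(work, MEMORY_COPY_BYTES_PER_SEC) * 1e3:.0f} ms")
print(f"read from SSD: ~{estimate_seconds(work, SSD_READ_BYTES_PER_SEC) * 1e3:.0f} ms")
```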
Latency = time to complete a unit of work
Throughput = units of work completed per unit of time
Tail latency = slowest transactions in a distribution
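To make the three definitions concrete, here is a minimal sketch that times a workload N times and reports average latency, throughput, and the 99th-percentile (tail) latency; do_work is a stand-in for whatever unit of work you care about:

```python
import time

def do_work():
    # Placeholder unit of work; replace with the real operation.
    sum(range(10_000))

N = 1_000
latencies = []
start = time.perf_counter()
for _ in range(N):
    t0 = time.perf_counter()
    do_work()
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start

latencies.sort()
avg = sum(latencies) / N
p99 = latencies[int(0.99 * N) - 1]   # tail latency: start of the slowest 1%
print(f"avg latency: {avg * 1e6:.1f} us")
print(f"p99 latency: {p99 * 1e6:.1f} us")
print(f"throughput:  {N / elapsed:.0f} ops/s")
```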
In server-side software, we generally favor a slower average request with a narrower distribution over a faster average with a wider distribution, because the wider distribution is harder to predict.
Since a request may be broken into a hundred sub-requests executed in parallel, the whole request is only as fast as its slowest part, so minimizing tail latency is extremely important.
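A quick calculation shows why fan-out amplifies the tail: with 100 parallel sub-requests, the chance that at least one of them lands in the slowest 1% is already well over half. The 100-way fan-out and the 1% threshold below are illustrative numbers:

```python
# Probability that a fanned-out request is delayed by at least one
# slow (99th-percentile) sub-request, assuming independent sub-requests.
fanout = 100
p_slow = 0.01  # each sub-request has a 1% chance of being "tail slow"

p_hit_tail = 1 - (1 - p_slow) ** fanout
print(f"{p_hit_tail:.0%} of requests see at least one p99 sub-request")
# -> about 63%
```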
In this case, the average, min, max, and median are all less useful. Histograms, from which we can read off percentiles, are more useful.
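A minimal sketch of the idea, assuming latencies measured in microseconds and power-of-two bucket boundaries (both assumptions are mine, not the book's):

```python
import bisect
import random

# Power-of-two bucket upper bounds, in microseconds: 1, 2, 4, ... ~1 s.
BOUNDS = [2 ** i for i in range(21)]

def build_histogram(latencies_us):
    counts = [0] * (len(BOUNDS) + 1)           # last bucket is the overflow
    for x in latencies_us:
        counts[bisect.bisect_left(BOUNDS, x)] += 1
    return counts

def percentile_bucket(counts, pct):
    """Return the bucket upper bound containing the given percentile."""
    target = pct / 100 * sum(counts)
    running = 0
    for bound, count in zip(BOUNDS + [float("inf")], counts):
        running += count
        if running >= target:
            return bound

samples = [random.lognormvariate(5, 1) for _ in range(100_000)]  # fake latencies
hist = build_histogram(samples)
print("p50 <=", percentile_bucket(hist, 50), "us")
print("p99 <=", percentile_bucket(hist, 99), "us")
```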
The author fixed the system behind one such graph by bringing its 99th-percentile latency down to 150 milliseconds, a fix that paid for 10 years of his salary.
To diagnose performance-related issues, we first estimate how long an operation should take, then observe how long it actually takes, and then explain the difference.
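A sketch of that loop for a single operation, where the estimate comes from an assumed device bandwidth and the observation from a simple timer; the file path and the 500 MB/s figure are placeholders:

```python
import os
import time

SSD_READ_BYTES_PER_SEC = 500e6        # assumed device bandwidth
PATH = "/tmp/testfile"                # hypothetical test file

size = os.path.getsize(PATH)
estimated = size / SSD_READ_BYTES_PER_SEC

t0 = time.perf_counter()
with open(PATH, "rb") as f:
    while f.read(1 << 20):            # read 1 MiB at a time
        pass
observed = time.perf_counter() - t0

# The interesting number is the gap: much faster than estimated usually means
# cache hits; much slower means contention or a slow lower layer.
print(f"estimated {estimated * 1e3:.1f} ms, observed {observed * 1e3:.1f} ms, "
      f"ratio {observed / estimated:.1f}x")
```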
Some transactions are only sometimes slow because they are waiting on a response from a lower layer. Thus, we always need to find the slowest lower layer and fix that.
We can test a specific transaction by isolating it, feeding it mock data (preferably captured from production), and then observing it.
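One way to do that is sketched below, with a hypothetical handle_transaction function and a JSON-lines file of captured production requests (both names are placeholders):

```python
import json
import time

def handle_transaction(request: dict) -> dict:
    # Placeholder for the real transaction being isolated.
    return {"status": "ok", "echo": request}

def replay(capture_path: str) -> None:
    """Replay captured production requests against the isolated handler."""
    latencies = []
    with open(capture_path) as f:
        for line in f:
            request = json.loads(line)
            t0 = time.perf_counter()
            handle_transaction(request)
            latencies.append(time.perf_counter() - t0)
    latencies.sort()
    p99 = latencies[int(0.99 * len(latencies))]
    print(f"{len(latencies)} requests, p99 = {p99 * 1e3:.2f} ms")

# replay("captured_requests.jsonl")   # one JSON request per line
```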
The most interesting case, a normally fast RPC that is slow only in production, requires good instrumentation with less than 1% overhead.
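A sketch of the cheapest kind of always-on instrumentation: record one small, fixed-size entry per RPC into a preallocated ring buffer, and do all formatting and analysis offline. The field layout and buffer size are illustrative; in a real server this would live in the service's own language, and keeping overhead under 1% is a property of this design, not of the Python code itself.

```python
import time

RING_SIZE = 1 << 16                        # preallocated, power-of-two ring
events = [None] * RING_SIZE
next_slot = 0

def record_rpc(rpc_id: int, start_ns: int, end_ns: int) -> None:
    """O(1) write of one fixed-size event; no I/O, no string formatting."""
    global next_slot
    events[next_slot & (RING_SIZE - 1)] = (rpc_id, start_ns, end_ns)
    next_slot += 1

def traced(rpc_id, func, *args):
    """Wrap an RPC handler, timestamping its start and end."""
    t0 = time.monotonic_ns()
    result = func(*args)
    record_rpc(rpc_id, t0, time.monotonic_ns())
    return result

def dump():
    """Offline (or on demand): pull the raw events out for analysis."""
    return [e for e in events if e is not None]
```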
The fundamental resources shared between unrelated programs running on a single server are CPU, memory, disk/SSD, and network. With cooperating threads, a fifth shared resource appears: software critical sections (locks).
To debug performance regressions, start off by measuring the five fundamental resources: if contention on any of them is high, performance suffers.
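A Linux-specific sketch of that first step, sampling two of the shared resources (CPU and memory) from /proc on a modern kernel; disk, network, and lock contention would need /proc/diskstats, /proc/net/dev, and application-level lock statistics. The 1-second sampling interval is an arbitrary choice:

```python
import time

def cpu_busy_fraction(interval: float = 1.0) -> float:
    """Fraction of CPU time spent non-idle over `interval` seconds."""
    def snapshot():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]          # idle + iowait
        return idle, sum(fields)
    idle0, total0 = snapshot()
    time.sleep(interval)
    idle1, total1 = snapshot()
    delta = total1 - total0
    return 1.0 - (idle1 - idle0) / delta if delta else 0.0

def memory_in_use_fraction() -> float:
    """Fraction of RAM not reported as available."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values are in kB
    return 1.0 - info["MemAvailable"] / info["MemTotal"]

if __name__ == "__main__":
    print(f"CPU busy:   {cpu_busy_fraction():.1%}")
    print(f"RAM in use: {memory_in_use_fraction():.1%}")
```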