We want to count how many distinct filenames were created in the past 24 hours, but the collection is too big to fit in memory, or a single server.
In exchange, hyperloglog is approximate.
The probability that a random string of bits has K leading zeroes is 1 / 2 ^ K.
The algorithm divides the stream of elements into m
independent buckets and calculates the HLL by calculating the harmonic
mean of the buckets.
Hyperloglog improves this by throwing out the top 30% of the buckets with the highest values, as this decreases error from 1.3/sqrt(m) to 1.05/sqrt(m). Also, the harmonic mean decreases this error to 1.04/sqrt(m).