Concurrent Median in a Stream
Try to figure out the concurrent median in a stream.
Imagine you have a class that:
- accepts the percentile in its constructor
- has a function called “add sample” that adds a sample to find
- has a function called “get sample” that returns the sample corresponding to the percentile.
- Pretty good if you can use a counter/sorted dictionary collection
- Better with buckets, as we see with the followup
Followups
What happens if we allow this class to call get sample
from multiple
threads?
Solutions
- Explain/use Concurrent Hashmap (with buckets)
- Explain how if we use a bucket approach, we only have to use a mutex per bucket, so we can read/write faster with concurrency.
- You can use something like left/right (maintain two collections of the underlying data structure) to have reads and writes not have to mutate the underlying data structure, in exchange for stale reads (since reads dont synchronize before fetching).
- Read-Copy-Update works, but