chapter 13 design a search autocomplete system
Prev: chapter-12-design-a-chat-system Next: chapter-14-design-youtube
This chapter outlines autocomplete, which is used at Google and Amazon.
Make sure to ask questions about the problem and scope: Q: Is the matching only supported at the beginning of a search query or in the middle as well? A: Only at the beginning of a search query. Q: How many autocomplete suggestions should the system return? A: 5 Q: How does the system know which 5 suggestions to return? A: This is determined by popularity, decided by the historical query frequency. Q: Does the system support spell check? A: No, spell check or autocorrect is not supported. Q: Are search queries in English? A: Yes. If time allows at the end, we can discuss multi-language support. Q: Do we allow capitalization and special characters? A: No, we assume all search queries have lowercase alphabetic characters. Q: How many users use the product? A: 10 million DAU.
Back of the Envelope Estimation
- Assuming 10 million daily active users (DAU)
- An average person performs 10 searches per day
- 20 bytes of data per query string:
- For every character entered in the search box, the client sends a network request.
- ~ 24000 QPS = 10M users _ 10 queries/day _ 20 characters / 24 hours / 3600 seconds.
- Peak QPS is double the normal QPS, ~48,000 QPS.
- Assume 20% of the daily queries are new. 10M _ 10 queries/day _ 20 bytes per query * 20% = 0.4 GB. 0.4GB of new data is added to storage daily.
High level design
Split the system into two parts:
- Data gathering
- Query service
Data gathering service
As we query a new keyword, we add it into the database, along with its frequency (imagine a hashmap of key → hit count).
use std::collections::HashMap;
let mut map = HashMap::new();
let count = map.entry(&str).or_insert(0);
*count += 1;
We can query the DB with SQL:
SELECT * FROM queries
where query like `prefix%`
ORDER BY frequency DESC
LIMIT 5;
This works fine for small datasets but hits a bottleneck in larger datasets.
Prev: chapter-12-design-a-chat-system Next: chapter-14-design-youtube