Prev: backup-and-recovery Next: mysql-in-the-cloud
This chapter walks us through ProxySQL and Vitess, accepted solutions to scale writes.
Scaling is the system’s ability to support growing traffic.
Capability is the amount of work it can perform in a given amount of time.
Scalability is the ability to add capacity by adding resources: It’s important to think of it in terms of:
To scale, first figure out your workload. Is it primarily writes (INSERT, UPDATE, DELETE) or reads (SELECT)?
Don’t fall into the trap of analyzing QPS only.
If you start off with one leader, you can optimize indexes, queries, and caching data. But eventually you’ll have to add replicas to scale read traffic.
When the bottleneck is write-bound, it’s a bit more difficult. You can scale vertically for a while, but once that hits its limit, you have to think about how to split your data so that you can accept writes in parallel on separate subsets. Sharding data means that joins become a lot more problematic though, as they have to go to another node.
Splitting your data is context dependent.
Since Read Pools are replicated, you just need to stick a load balancer in front of them. HAProxy is a hardware load balancer that works for this, or Nginx also works.
Here’s a sample HAProxy file that maps one virtual IP to two different servers.
global
log 127.0.0.1 local0 notice
user haproxy
group haproxy
defaults
log global
retries 2
timeout connect 3000
timeout server 5000
timeout client 5000
listen mysql-readpool
bind 127.0.0.1:3306
mode tcp
option mysql-check user haproxy_check
balance leastconn
server mysql-1 10.0.0.1:3306 check
server mysql-2 10.0.0.2:3306 check
Balancing with leastconn
is recommended in MySQL.
roundrobin
is another setting, but it might leave some
nodes overloaded.
Now that we have a gate, we need a herder. Service discovery is a good option to automatically discover what hosts can be on the list.
Keep the following in mind for service discovery:
How do you check if a replica is healthy? You can use a heartbeat, but keep in mind while adjusting it:
You should build a heartbeat endpoint for each of your MySQL servers.
These all have their pros and cons. You might be tempted to disregard random, but since it comes with little overhead, it actually works out pretty well.
Scaling your application layer becomes more complex with write transactions that favors consistency over availability.
More nodes results in more timeouts and retries. Before sharding the data, we can use queues.
With a queue, the web server places a request on a queue, which is then picked up by a server and then returned. Thus, the server has to poll for the response and give it back to the client or the client can come back later to see their response.
Sharding means to split your data into different, smaller database clusters so that writes can be executed on more writes on moer source hosts. There are two types.
Functional Partitioning divides different nodes to different tasks.
Data sharding splits the data into smaller pieces, and stores them on different nodes.
Most applications have some logical data set that could be sharded. For example, if an account should be able to query a set of data, you can place an account and all its relevant data on a shard.
Sharding by a hash of each table’s primary key scales poorly, because you have to check other shards for data you need.
All queries should be localized to one shard if possible.
Try to find a partition key that lets you avoid cross-shard queries as much as possible.
If you have data that touches multiple partitions (like user id -> book id) then you can shard users and books together.
Most sharded applications have at least some queries that need to aggregate or join data from multiple shards. These queries are slow, and difficult to implement. You’ll need to aggressively cache, and use summary tables, which traverse all shards and store results redundantly on each shard.
Maintaining data consistency is also harder. Foreign keys don’t work across shards.
Vitess is a database-clustering system for MySQL.
It does the following:
Vitess has the following perks:
ProxySQL is similar, and provides a layer between application code and the database to increase consistency and availability.
Prev: backup-and-recovery Next: mysql-in-the-cloud