Next: monitoring-in-a-reliability-engineering-world
The architecture for MySQL looks like this:
The first layer handles connection pooling.
The second layer handles parsing the SQL given into an optimized form.
The third layer contains the storage engines. Storage engines have different drawbacks, but the most popular one in MySQL is InnoDB.
The optimizer might rewrite the query, determine the order it will read tables, which indexes to use, and so on. Hints may be provided to the Query optimizer.
One note: MySQL used to use a query cache to serve results. With the increase in concurrency, though, the query cache was deprecated in MySQL 5.7.20. This pattern is useful, so caching can be done in Memcached or Redis.
MySQL storage engines offer a few solutions:
Transactions allow us to batch reads and writes, and follow ACID principles
By default, all commands are wrapped in a transaction and committed immediately.
To create a transaction, disable the AUTOCOMMIT
variable
by setting it to 0, or use BEGIN
or
START TRANSACTION
.
Certain commands that affect the DDL (ALTER TABLE
,
LOCK TABLES
) are immediately committed.
The isolation level can be changed by using
SET TRANSACTION ISOLATION LEVEL
.
Note: Since transactions are managed by storage engines, not MySQL, mixing storage engines during transactions can cause lost data during rollbacks.
You can explicitly lock a table in two-phase commit by using
SELECT ... FOR SHARE
or
SELECT ... FOR UPDATE
.
MVCC is used in tandem with Row-level locking mechanisms for
increasing concurrency. MVCC takes snapshots of the data as it existed
in some point in time for writes. Thus, all reads can read the data that
corresponds to the time they are in, so there’s no need for locks (since
readers dont block writers or vice-versa). This comes with the caveat
that MVCC only works with READ COMMITTED
and
REPEATABLE READ
. SERIALIZABLE
is not allowed
because reads lock every row to create ordered reads.
Since MySQL is designed to accept writes on one node, the default way to increase reliability is through replication, which involves a leader node replicating writes to all the other nodes through binary log shipping.
In MySQL 5.X, you can query the information_schema
to
get data about tables. In MySQL 8.X+, all that information has been
included in the tables .ibd
file.
InnoDB is the default engine, which performs well for processing
short-lived transactions that mainly complete instead of being rolled
back. It uses MVCC, defaults to REPEATABLE READ
, and
prevents phantom reads by using a next-key locking strategy.
Its tables use a clustered index, so it provides fast primary key lookups, but secondary indexes contain the primary key columns, so secondary indexes are generally very large.
It supports hash indexes, read-ahead prefetching data from disk, and insert buffers for faster inserts.
It supports hot online backups through a variety of services as well.
MySQL 5.6 allowed for in place updates of tables as well.
MySQL 8 introduced atomic data definition changes. In MySQL 5, if DDL statements partially finished in a transaction, they would be partially applied.