An index contains values from one or more columns in a table.
The B-tree is the default index type in MySQL. It speeds up CRUD operations by making row lookups O(log n).
Indexes in MySQL are also called keys.
This SQL indexes last_name, first_name, and dob in the People table:
CREATE TABLE People (
   last_name  varchar(50) not null,
   first_name varchar(50) not null,
   dob        date not null,
   key(last_name, first_name, dob)
);
As well, InnoDB has a feature called adaptive hash indexes, which adds an in-memory hash index on top of commonly accessed B-Tree values.
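The feature can be checked or toggled at runtime through the innodb_adaptive_hash_index server variable; a minimal sketch:

-- Check whether the adaptive hash index is enabled (it typically is by default)
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';

-- Disable it at runtime if profiling shows it isn't helping your workload
SET GLOBAL innodb_adaptive_hash_index = OFF;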
The order of the columns in an index is important, because MySQL uses index columns from left to right. In the SQL above, the index starts with last_name, so searching by dob alone is less useful than filtering on last_name.
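For example, here is a sketch of queries against the People table above; MySQL can seek the index only when the leading column is constrained:

-- Can use the (last_name, first_name, dob) index, starting from its leftmost column
SELECT * FROM People WHERE last_name = 'Hopper';
SELECT * FROM People WHERE last_name = 'Hopper' AND first_name = 'Grace';

-- Cannot seek the index, because last_name (the leftmost column) is not constrained
SELECT * FROM People WHERE dob = '1906-12-09';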
Full-text indexes are used like search engines: queries use MATCH ... AGAINST instead of a plain WHERE comparison for matching.
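A minimal sketch (the articles table and its FULLTEXT key here are hypothetical):

CREATE TABLE articles (
   id    int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
   title varchar(200) NOT NULL,
   body  text,
   FULLTEXT KEY ft_title_body (title, body)
) ENGINE=InnoDB;

-- MATCH ... AGAINST does the matching instead of an ordinary WHERE comparison
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('query optimization');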
Indexes hold data in sorted order, which helps the server reduce the amount of data it has to examine and avoid sorts and temporary tables.
Space can be saved by indexing only the beginning (the leftmost prefix) of a column's values instead of the whole value.
Columns with more unique values have higher index selectivity, which also helps query performance.
BLOB and TEXT columns force prefix indexes on you, since MySQL can't index their full length.
As well, you should be using EXPLAIN to let the optimizer show you why it prefers a specific query plan.
To find a good prefix length, you can start by calculating the selectivity of the full column:
mysql> SELECT COUNT(DISTINCT city)/COUNT(*) FROM sakila.city_demo;
+-------------------------------+
| COUNT(DISTINCT city)/COUNT(*) |
+-------------------------------+
|                        0.0312 |
+-------------------------------+
Or for several prefix lengths at once. From the output below, prefixes of 5, 6, or 7 characters come close to the full column's selectivity:
mysql> SELECT COUNT(DISTINCT LEFT(city, 3))/COUNT(*) AS sel3,
    ->        COUNT(DISTINCT LEFT(city, 4))/COUNT(*) AS sel4,
    ->        COUNT(DISTINCT LEFT(city, 5))/COUNT(*) AS sel5,
    ->        COUNT(DISTINCT LEFT(city, 6))/COUNT(*) AS sel6,
    ->        COUNT(DISTINCT LEFT(city, 7))/COUNT(*) AS sel7
    -> FROM sakila.city_demo;
+--------+--------+--------+--------+--------+
| sel3   | sel4   | sel5   | sel6   | sel7   |
+--------+--------+--------+--------+--------+
| 0.0239 | 0.0293 | 0.0305 | 0.0309 | 0.0310 |
+--------+--------+--------+--------+--------+
Once we find that 7 is a good prefix length, we can add it to our database:
ALTER TABLE sakila.city_demo ADD KEY (city(7));
Individual indexes on lots of columns won't help MySQL much. That being said, it can use a strategy called index merge, where it uses multiple indexes on a single table and merges the results for a query. This speeds up performance but might indicate poorly chosen indexes. If there's an index merge in EXPLAIN, you can glean the following:
- If it's an index intersection (usually for AND conditions), you should have a single index with all the relevant columns instead of multiple indexes.
- If it's an index union (usually for OR conditions), the indexes should be more selective.

Since B-Trees sort from the left, choosing a good leading column is extremely important. We could choose the column order based on selectivity; in this case, customer_id is more selective than staff_id, so it should come first:
mysql> SELECT COUNT(DISTINCT staff_id)/COUNT(*) AS staff_id_selectivity,
    ->        COUNT(DISTINCT customer_id)/COUNT(*) AS customer_id_selectivity,
    ->        COUNT(*)
    -> FROM payment\G
*************************** 1. row ***************************
   staff_id_selectivity: 0.0001
customer_id_selectivity: 0.0373
               COUNT(*): 16049
However, indexes don't always help. Take this query:
SELECT COUNT(DISTINCT threadId) AS COUNT_VALUE
FROM Message
WHERE (groupId = 10137) AND (userId = 1288826) AND (anonymous = 0)
ORDER BY priority DESC, modifiedDate DESC
EXPLAIN shows that this query uses the index on (groupId, userId):
           id: 1
  select_type: SIMPLE
        table: Message
         type: ref
          key: ix_groupId_userId
      key_len: 18
          ref: const,const
         rows: 1251162
        Extra: Using where
But in this case, no index will help much, because each of the three WHERE conditions matches almost every row in the table:
mysql> SELECT COUNT(*), SUM(groupId = 10137),
    ->        SUM(userId = 1288826), SUM(anonymous = 0)
    -> FROM Message\G
*************************** 1. row ***************************
              count(*): 4142217
  sum(groupId = 10137): 4092654
 sum(userId = 1288826): 1288496
    sum(anonymous = 0): 4141934
Clustered indexes store all rows in the index’s leaf pages, with adjacent key values stored close to one another.
There is a limit of one clustered index per table. Covering indexes allow you to emulate multiple clustered indexes, however.
Clustering data has advantages (related rows are stored close together, so reads by key touch fewer pages) and disadvantages. In particular, insert speed depends heavily on insertion order: you may need to run OPTIMIZE TABLE after loading a lot of data if the data wasn't loaded in primary key order. Consider this table:
d.CREATE TABLE layout_test (
int NOT NULL,
col1 int NOT NULL,
col2 PRIMARY KEY(col1),
KEY(col2)
);
For this table, each clustered index row contains col1 (the primary key), col2, the transaction ID, and the rollback pointer InnoDB uses for transactional and MVCC purposes. The secondary index on col2 stores the primary key value (col1) in its leaf nodes rather than a row pointer.
It may be a good idea to define a surrogate key, which is a primary key whose value is not derived from the data.
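A minimal sketch with a hypothetical orders table: the AUTO_INCREMENT id carries no business meaning, while the natural identifier keeps its own unique index:

CREATE TABLE orders (
   id           bigint unsigned NOT NULL AUTO_INCREMENT,  -- surrogate key
   order_number varchar(32) NOT NULL,                     -- natural, business-facing identifier
   PRIMARY KEY (id),
   UNIQUE KEY (order_number)
) ENGINE=InnoDB;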
It's best to avoid random (nonsequential, widely distributed) clustered keys, especially for I/O-bound workloads. UUIDs are a big offender here: they scatter inserts across the clustered index, and index performance suffers.
Take this example table which has an unsigned int as primary key:
CREATE TABLE userinfo (
   id              int unsigned NOT NULL AUTO_INCREMENT,
   name            varchar(64) NOT NULL DEFAULT '',
   email           varchar(64) NOT NULL DEFAULT '',
   password        varchar(64) NOT NULL DEFAULT '',
   dob             date DEFAULT NULL,
   address         varchar(255) NOT NULL DEFAULT '',
   city            varchar(64) NOT NULL DEFAULT '',
   state_id        tinyint unsigned NOT NULL DEFAULT '0',
   zip             varchar(8) NOT NULL DEFAULT '',
   country_id      smallint unsigned NOT NULL DEFAULT '0',
   gender          enum('M','F') NOT NULL DEFAULT 'M',
   account_type    varchar(32) NOT NULL DEFAULT '',
   verified        tinyint NOT NULL DEFAULT '0',
   allow_mail      tinyint unsigned NOT NULL DEFAULT '0',
   parrent_account int unsigned NOT NULL DEFAULT '0',
   closest_airport varchar(3) NOT NULL DEFAULT '',
   PRIMARY KEY (id),
   UNIQUE KEY email (email),
   KEY country_id (country_id),
   KEY state_id (state_id),
   KEY state_id_2 (state_id, city, address)
) ENGINE=InnoDB
Versus the same table, but with a UUID as the primary key:
CREATE TABLE userinfo_uuid (
   uuid varchar(36) NOT NULL,
   ...
)
First, a million rows were inserted into both tables, so the index could fit into memory. Then three million more rows of random data were inserted, which no longer fit in memory.
Note how the time to insert the rows into the UUID table is substantially slower (3x+ once writes hit disk) and its index is roughly 70% larger.
| Table         | Rows      | Time (sec) | Index size (MB) |
|---------------|-----------|------------|-----------------|
| userinfo      | 1,000,000 | 137        | 342             |
| userinfo_uuid | 1,000,000 | 180        | 544             |
| userinfo      | 3,000,000 | 1233       | 1036            |
| userinfo_uuid | 3,000,000 | 4525       | 1707            |
This happens because with an AUTO_INCREMENT primary key, InnoDB can store each new row immediately after the previous insert, but with a UUID key, where values are randomly distributed, it may have to flush pages more often and seek to random places in the buffer pool to place records.
This can be improved somewhat with OPTIMIZE TABLE, which lets InnoDB reorder the records.
Inserting in primary key order can be worse for high-concurrency workloads, though, because AUTO_INCREMENT requires taking a lock. It's possible that removing the auto-incrementing primary key would be better for performance in that case.
When a query is covered by an index, you'll see Using index in the Extra column of EXPLAIN:
mysql> EXPLAIN SELECT store_id, film_id FROM sakila.inventory\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inventory
   partitions: NULL
         type: index
possible_keys: NULL
          key: idx_store_id_film_id
      key_len: 3
          ref: NULL
         rows: 4581
     filtered: 100.00
        Extra: Using index
InnoDB secondary indexes store the primary key value in their leaf nodes, so an index can cover a query even when the primary key column isn't explicitly part of it. In this case, actor_id (the primary key) isn't part of idx_actor_last_name, but the query is still covered:
mysql> EXPLAIN SELECT actor_id, last_name
    -> FROM sakila.actor WHERE last_name = 'HOPPER'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
   partitions: NULL
         type: ref
possible_keys: idx_actor_last_name
          key: idx_actor_last_name
      key_len: 182
          ref: const
         rows: 2
     filtered: 100.00
        Extra: Using index
MySQL can use indexes for sorting as well as for finding rows. Ordering the results by index works only when the index's order is exactly the same as the ORDER BY clause, and all columns are sorted in the same direction (ASC or DESC). If not, MySQL will use a sort.
This can be checked in the output of EXPLAIN: if MySQL has to sort, the Extra column shows Using filesort. Here the index satisfies both the WHERE clause and the ORDER BY:
mysql> EXPLAIN SELECT rental_id, staff_id FROM sakila.rental
    -> WHERE rental_date = '2005-05-25'
    -> ORDER BY inventory_id, customer_id\G
*************************** 1. row ***************************
         type: ref
possible_keys: rental_date
          key: rental_date
         rows: 1
        Extra: Using where
Indexes can be forced in MySQL if the optimizer doesn't do what you want.
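For example, using MySQL's index hints (USE INDEX, FORCE INDEX, IGNORE INDEX) against the rental query above:

-- Force the optimizer to use the rental_date index for this query
SELECT rental_id, staff_id
FROM sakila.rental FORCE INDEX (rental_date)
WHERE rental_date = '2005-05-25';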
Duplicate indexes are indexes of the same type created on the same columns:
CREATE TABLE test (
   ID INT NOT NULL PRIMARY KEY,
   A  INT NOT NULL,
   B  INT NOT NULL,
   UNIQUE(ID),
   INDEX(ID)
) ENGINE=InnoDB;
This actually creates three indexes on the same column (ID), because MySQL implements both PRIMARY KEY and UNIQUE constraints with indexes, duplicated again by the plain INDEX(ID). MySQL only warns you about duplicate indexes; it doesn't prevent them. Remove them if possible.
Redundant indexes look like the following: if there's an index on (A, B), then another index on (A) is redundant, because any lookup on A can use the (A, B) index anyway (A is a leftmost prefix of it). However, an index on (B, A) or on (B) would not be redundant, because B is not a leftmost prefix of the index (A, B).
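A minimal sketch of the difference, using a hypothetical table t:

CREATE TABLE t (
   A int NOT NULL,
   B int NOT NULL,
   KEY idx_a_b (A, B),
   KEY idx_a   (A),    -- redundant: A is a leftmost prefix of (A, B)
   KEY idx_b_a (B, A)  -- not redundant: B is not a leftmost prefix of (A, B)
) ENGINE=InnoDB;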
That being said, sometimes redundant indexes are desirable for performance reasons:
CREATE TABLE userinfo (
   id              int unsigned NOT NULL AUTO_INCREMENT,
   name            varchar(64) NOT NULL DEFAULT '',
   email           varchar(64) NOT NULL DEFAULT '',
   password        varchar(64) NOT NULL DEFAULT '',
   dob             date DEFAULT NULL,
   address         varchar(255) NOT NULL DEFAULT '',
   city            varchar(64) NOT NULL DEFAULT '',
   state_id        tinyint unsigned NOT NULL DEFAULT '0',
   zip             varchar(8) NOT NULL DEFAULT '',
   country_id      smallint unsigned NOT NULL DEFAULT '0',
   account_type    varchar(32) NOT NULL DEFAULT '',
   verified        tinyint NOT NULL DEFAULT '0',
   allow_mail      tinyint unsigned NOT NULL DEFAULT '0',
   parent_account  int unsigned NOT NULL DEFAULT '0',
   closest_airport varchar(3) NOT NULL DEFAULT '',
   PRIMARY KEY (id),
   UNIQUE KEY email (email),
   KEY country_id (country_id),
   KEY state_id (state_id)
) ENGINE=InnoDB
This table shows the queries per second (QPS) for SELECTs against one million rows of the above table, indexed in different ways. With both state_id and state_id_2 in place, throughput for the query that only needs state_id drops slightly compared to state_id alone, a mostly negligible difference, while the other query keeps the benefit of the wider index.
| QPS     | state_id only | state_id_2 only | Both state_id and state_id_2 |
|---------|---------------|-----------------|------------------------------|
| Query 1 | 108.55        | 100.33          | 107.97                       |
| Query 2 | 12.12         | 28.04           | 28.0                         |
However, having a bigger index slows insertion speed quite a bit:
|                                        | state_id only | Both state_id and state_id_2 |
|----------------------------------------|---------------|------------------------------|
| InnoDB, enough memory for both indexes | 80 seconds    | 136 seconds                  |
In MySQL 8.0, there's a new invisible index feature, which makes the optimizer disregard an index without dropping it. If you find queries are EXPLAINed the same with or without the index, you can remove it.
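For example, assuming the state_id index on the userinfo table above:

-- Hide the index from the optimizer without dropping it
ALTER TABLE userinfo ALTER INDEX state_id INVISIBLE;

-- ... re-run EXPLAIN on the queries you care about ...

-- If the plans didn't change, drop the index; otherwise make it visible again
ALTER TABLE userinfo ALTER INDEX state_id VISIBLE;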
Some indexes are just plain unused. Find and get rid of them with performance_schema and the sys schema:
SELECT * from sys.schema_unused_indexes;
+---------------+---------------+---------------------+
| object_schema | object_name   | index_name          |
+---------------+---------------+---------------------+
| sakila        | actor         | idx_actor_last_name |
| sakila        | address       | idx_fk_city_id      |
| sakila        | address       | idx_location        |
| sakila        | payment       | fk_payment_rental   |
.. trimmed for brevity ..
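Once you've confirmed an index really is unused over a representative stretch of traffic, it can be dropped; for example, for one of the rows above:

ALTER TABLE sakila.address DROP INDEX idx_location;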
The three goals of table maintenance are finding and fixing corruption, maintaining accurate index statistics, and reducing fragmentation.
Tables can become corrupted due to hardware problems or bugs in MySQL itself.
Corrupted indexes can return incorrect results on queries, or raise duplicate-key errors when there is no duplicated value.
CHECK TABLE can usually detect whether a table is corrupt. To fix corruption, REPAIR TABLE can work, or you can run ALTER TABLE table ENGINE=INNODB to rebuild the table with its current engine. As well, you could dump the data and reload it.
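A minimal sketch against an InnoDB table (REPAIR TABLE only works for some storage engines, so the ALTER TABLE rebuild is the usual InnoDB route):

-- Check the table for corruption
CHECK TABLE sakila.rental;

-- Rebuild the table with its current engine, which also rewrites its indexes
ALTER TABLE sakila.rental ENGINE=InnoDB;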
The storage engine provides the optimizer with fuzzy data about the number of rows a query might examine. The optimizer uses this data (index statistics) to make guesses and generate a query plan. These statistics can fall out of sync after lots of updates to a table; if so, you can run ANALYZE TABLE to regenerate them.
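For example:

-- Regenerate the index statistics for a table
ANALYZE TABLE sakila.rental;

-- Inspect the cardinality estimates the optimizer will use
SHOW INDEX FROM sakila.rental;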
B-Trees can become fragmented. To defragment the data, run OPTIMIZE TABLE, or dump and reload the table. Alternatively, you can run ALTER TABLE table ENGINE=INNODB to rebuild it.