Next: b-tree-basics
Databases come in three general groups, based on the queries they run.
Online Transaction Processing (OLTP): large amounts of queries that are predefined, short-lived Online Analytical Processing (OLAP): complex aggregations used for analytics. Complex and long-lived Hybrid Transactional and Analytical Processing (HTAP): Run both OLTP and OLAP queries
The architecture of a DBMS might look like the following:
Some databases which operate purely in memory have becoming more
popular, due to the decrease in cost, and less complexity to implement
in-memory versus on-disk data structures. Intel Optane (now discontinued) used to combine the
properties of the fast access of RAM (~30 microseconds) with the
durability of an SSD. It did not become very popular, however.
Most RDBMS’ are row-oriented, meaning each row is stored as a data
unit. Some databases are column-oriented, where they store each column
in an array. Most OLTP workloads prefer row-orientation, because they
grab data horizontally, whereas OLAP workloads may grab certain parts of
a record, and thus prefer column orientation. Some examples of row oriented databases: MySQL,
Postgres. Some examples of column oriented databases: Cassandra,
Clickhouse.
Choosing data layout has a large impact in potential optimizations.
For example, for time-series databases, which store very similar data,
compression can save 95%+ of the memory costs of data. Facebook’s Gorilla DB was able to reduce every row of
data (a 3-tuple of string key, 8-byte timestamp, and 8-byte floating
point number) from 16 bytes to 1.37 bytes, about 12x compression on each
row, by exploiting redundancy in nearby rows.
Wide column stores e.g. BigTable and HBase
represent data as a multidimensional map. Columns are
grouped into column families, which store data of the same type, and
data is stored row-wise. This is useless for storing data retrieved by
key or sequence of keys.
This allows data to be fetched from hierarchical indexes. As well, column families store multiple versions of data by timestamp, which allows them to quickly locate higher-level entries.
A database file stores data to allow quick access. That means that
Next: b-tree-basics