Database Storage II

Table of Contents

Database Storage II

Prev: database-storage-i Next: buffer-pools

Data Representation

Tuples are just byte arrays on disk. The DBMS needs to interpret those bytes accordingly.

Integers

BIGINT = i64 INTEGER = i32 SMALLINT = i16 TINYINT = i8

Variable Precision Numbers

FLOAT = f64 REAL = f32

Fixed Point Precision Numbers

NUMERIC, DECIMAL

Variable Length Data

VARCHAR, VARBINARY, TEXT, BLOB

Dates and Times

TIME, DATE, TIMESTAMP

Workloads

OLTP: On-line Transaction Processing - Fast, short running operations - Queries operate on a single entity at a time - More writes that reads - repetitive operations

OLAP: On-line Analytical Processing - Long running, complex queries - Reads large parts of the database - More reads than writes - Exploratory queries

Storage Models

N-Ary Storage Model (NSM)

Stores all the attributes for a single tuple contiguously, so NSM is known as a “row store”. Useful for OLTP workloads.

Advantages: - Fast insert, updates, and deletes - Fast if data is accessed by row

Disadvantages: - Slow + wasteful if scanning by singular or multiple attributes column wise.

There are two ways to organize an NSM database:

Decomposition Storage Model (DSM)

Stores all attributes for all tuples contiguously in a block of data. A “column store”. Ideal for OLAP workloads.

Advantages: - Reduces the amount of wasted work, since DBMS only reads data required. - Enables better compression of values, since they are more likely to be similar

Disadvantages: - Slow for point queries, inserts, updates, and deletes due to tuple stitching/splitting.

To put tuples back together, databases can set fixed-size offsets, or embed the tuple id inside the attribute of each column, to allow for faster jumping to specific attributes.

Prev: database-storage-i Next: buffer-pools