Database Storage II
Prev: database-storage-i Next: buffer-pools
Data Representation
Tuples are just byte arrays on disk. The DBMS needs to interpret those bytes accordingly.
Integers
BIGINT = i64 INTEGER = i32 SMALLINT = i16 TINYINT = i8
Variable Precision Numbers
- Inexact, variable-precision numeric types that use the IEEE-754 floating point numbers.
FLOAT = f64 REAL = f32
Fixed Point Precision Numbers
- Numeric data types with arbitrary precision and scale.
NUMERIC, DECIMAL
Variable Length Data
- An array of bytes of arbitrary length.
- Generally a header with length of bytes + the bytes themselves.
- Some systems will let you store large values in an external file, and then the tuple will contain a pointer to that file. This is called a
BLOB
normally.
VARCHAR, VARBINARY, TEXT, BLOB
Dates and Times
- Datetime Representations as the number of micro/milliseconds since the unix epoch.
TIME, DATE, TIMESTAMP
Workloads
OLTP: On-line Transaction Processing
- Fast, short running operations
- Queries operate on a single entity at a time
- More writes that reads
- repetitive operations
OLAP: On-line Analytical Processing
- Long running, complex queries
- Reads large parts of the database
- More reads than writes
- Exploratory queries
Storage Models
N-Ary Storage Model (NSM)
Stores all the attributes for a single tuple contiguously, so NSM is known as a “row store”. Useful for OLTP workloads.
Advantages:
- Fast insert, updates, and deletes
- Fast if data is accessed by row
Disadvantages:
- Slow + wasteful if scanning by singular or multiple attributes column wise.
There are two ways to organize an NSM database:
- Heap-Organized Tuples are stored in a heap, which is orderless.
- Index-Organized Tuples are stored in the primary key index.
Decomposition Storage Model (DSM)
Stores all attributes for all tuples contiguously in a block of data. A “column store”. Ideal for OLAP workloads.
Advantages:
- Reduces the amount of wasted work, since DBMS only reads data required.
- Enables better compression of values, since they are more likely to be similar
Disadvantages:
- Slow for point queries, inserts, updates, and deletes due to tuple stitching/splitting.
To put tuples back together, databases can set fixed-size offsets, or embed the tuple id inside the attribute of each column, to allow for faster jumping to specific attributes.
Prev: database-storage-i Next: buffer-pools