Logging Protocols + Schemes
Prev: multi-version-concurrency-control Next: crash-recovery-algorithms
Crash Recovery
Recovery algorithms ensure consistency, atomicity, and durability.
Every recovery algorithm has two parts:
- Actions during normal transaction processing to ensure transaction aborts leave the DB in a consistent state.
- Actions after a failure to recover the database to a consistent, atomic, durable state.
The two key primitives are UNDO
and REDO
.
- UNDO: Removing the effects of an aborted transaction.
- REDO: Reinstating a committed transaction for durability.
Failure Classification
- Transaction Failures
- Logical Errors: A transaction cannot complete due to an integrity/constrain violation.
- Internal State Errors: The DBMS terminates an active transaction due to an error condition.
- System Failures:
- Software Failure: There is a problem with the DBMS implementation that causes the system to halt.
- Hardware Failure: The computer hosting the DBMS crashes. Assume that non-volatile storage is not corrupted by crash.
- Storage Media Failure:
- Non-Repairable Hardware Failure: A disk failure destroys all or some parts of non-volatile storage. Destruction is detectable. The database must be restored from an archived version.
Buffer Pool Management Policies
Steal Policy: Whether the DBMS allows an uncommitted transaction to overwrite the most recently committed value of an object on disk.
- STEAL: allowed
- NO-STEAL: not allowed Force Policy: Whether the DBMS ensures that all updates made by a transaction are reflected on non-volatile storage before the transaction is allowed to commit.
- FORCE: enforced
- NO-FORCE: not enforced
Force makes it easier to recover but results in poorer runtime performance.
Shadow Paging
The DBMS maintains two separate copies of the database (master, shadow). Updates are only made in the shadow copy. When a transaction commits, atomically switch the shadow to become the new master.
This is a NO-STEAL
+ FORCE
system.
Implementation:
- Database pages are a tree, where the root is a single disk page.
- Two copies of the tree, the master and shadow.
- Root points to master.
- Updates are applied to the shadow.
- To install updates, overwrite the root so it points to the shadow. (Pointer swap). This is atomic on most architectures.
- Before overwriting the root, none of the transaction updates are part of the database on disk.
- After the swap, all the transactions are on disk.
UNDO: Remove the shadow pages. Leave the master and DB root pointer alone. REDO: Not needed.
Write-Ahead Logging
The DBMS records all changes made to the database in a log file on non-volatile storage before the change is made to disk. The log contains instructions required to undo and redo after a crash.
This is a STEAL
+ NO-FORCE
system.
This has the fastest runtime performance, but recovery time is slower than shadow paging because it has to replay the log.
Implementation:
- All log records are written to non-volatile storage before the page is written to non-volatile storage.
- A transaction is not considered committed until all its log recordshave been written to disk.
- When the transaction starts, write a
BEGIN
record to the log. - When it finishes, write a
COMMIT
to make sure all log records are flushed before it returns an acknowledgement to the application. - Each log entry contains the following information:
- Transaction ID
- Object ID
- Before Value (for UNDOing)
- After Value (for REDOing)
- Log entries to disk after transaction commits. Group commits can be used to batch multiple log flushes together to amoritize overhead.
Checkpoints
Write-Ahead Logging has a problem in that the log file grows infinitely. After a crash, the DBMS has to replay the entire log, which can take a long time if the log file is large. Thus, the DBMS can periodically make a checkpoint where it flushes all buffers out to disk.
Blocking Checkpoint Implementation:
- The DBMS stops accepting new transactions and waits for active transactions to complete.
- Flush all log records and dirty blocks to disk.
- Write a
CHECKPOINT
entry to the log and flush to stable storage.
Logging Schemes
Physical Logging:
- Record the changes made to a specific location in the database
Logical Logging:
- Record the high level operations executed by transactions. Not necessarily restricted to a single page. Requires less data, but can be difficult to implement for non-deterministic queries. (RAND() will be tricky).
Physiological Logging:
- Hybrid approach where log records target a single page but don’t specify the organization of the page.
Prev: multi-version-concurrency-control Next: crash-recovery-algorithms