RRB-Trees: Efficient Immutable Vectors

Table of Contents

RRB-Trees: Efficient Immutable Vectors

RRB Trees stands for “Relaxed Radix Balanced Trees (RRB-Trees)”. This allows for immutable vector concatenation, insert-at and splits in O(log N) time while maintaining an index, with update and iteration speeds of O(1) time.

Introduction

Cons is slow. Lists in Lisp are slow; as they must be immutable, appending, updating, deleting take O(n) time, while prepending takes O(1) time. As well, since the structure of a list isn’t guaranteed in stack memory, they take a long time to dereference properly.

Vectors are preferable to lists for this reason. As well, disjoint parts of the vector can be worked on in parallel.

Clojure tries to solve this using “Ideal Hash Tries” (HAMTs), which are used for hashmaps and 32-way branching trees. This allows for single-element append in constant time, indexing in 1/5 log N time, and updates in 32/5 log N time. As well, using 32-wide arrays as tree nodes makes it cache friendly. An index update incurs 1/5 log N indirect memory access, which is close to constant time.

Parallel processing requires concatenation, splitting, and insertion at a given index. The RRB Tree has O(log N) concatenation and insertions. This is useful for “filtering”, where the size of the individual partition results isn’t known beforehand.

Vectors

Using HAMTs like clojure, with a 32-way branching structure allows for 1/5 log N index performance.

Concatenation

Concatenation of two vectors naively requires O(n) time, since it requires copying all of the nodes from the right hand side to the left hand side.

RRB-Trees

In Vectors, the branch factor m is always 32, and the trees are balanced. This allows radix search to find the child node during indexing.

Radix Search is cache-aware, which means it can be three times faster than a binary or linear search at the node.

Conclusions

RRB-Trees provide a viable extension to 32-way HAMTs by having the advantage of O (log N) concatenation and splits while maintaining the same costs as the HAMT.