Prev: sorting–aggregations Next: query-execution-i
For a tuple \(r \in R\), and a tuple \(s \in S\) that match on join attributes, the join operator concatenates \(r\) and \(s\) together into a new output tuple.
This can be done a few ways:
Cost Analysis
The cost analysis metric will be the number of disk I/Os used to compute the join.
Set \(M\) as the pages in \(R\), and \(N\) as the pages in \(S\).
This algorithm involves using two for loops, with the smaller table on the outer loop, and the larger table in the inner loop.
Simple Nested Loop Join The regular \(O(n^2)\) algorithm. Cost: \(M + (m * N)\)
Block Nested Loop Join For each block in the outer table, fetch each block from the inner table, and work on each pair of blocks. This is no faster in terms of comparisons, but less costly in terms of I/O. Cost: \(M + (M * N)\).
Index Nested Loop Join If the inner table has an index, we can use that. Assume that each index lookup costs a constant, \(C\). Cost: \(M + (m * C)\).
First sort the tuples on the join key, and then merge them where they match.
Sort Cost for \(R\): \(2M * 1 + \log{B-1}(\frac{M}{B})\) Sort Cost for \(S\): \(2N * 1 + \log{B-1}(\frac{N}{B})\) Merge Cost: \((M + N)\) Cost: Sort + Merge \(\approx O(M + N \log{}(M + N))\)
Basic Hash Join Hash each item in \(R\) and \(S\) on the join key. Then, for each bucket of a hash function that matches, check the members of each hash bucket in \(R\) and \(S\).
This has a variable cost, depending on the amount of collisions, but is \(\approx O(\max{R}{S})\).
Grace Hash Join When hash tables do not fit in memory, swapping to disk would be expensive. Thus, this algorithm hashes the inner table into partitions that are written out to disk as well.
Partition Cost: \(2 * (M + N)\) Probe Cost: \((M + N)\) Total Cost: \(3 * (M + N)\)
Prev: sorting–aggregations Next: query-execution-i