Data Structures
Problems
8.1
Prove Lemma 8.4.
8.2
Prove Lemma 8.5.
8.3
Due to K. Mulmuley [315].
Consider the following version of the Mulmuley games. The pool consists of the sets P, B, T, and S, where P is a set of p players, B a set of b bystanders, T a set of t triggers, and S a set of s stoppers. Assume that the players are totally ordered and that all sets are non-empty and pairwise disjoint. The game consists of picking random elements of the pool, without replacement, until the pool is empty. The value of the game, F, is defined as the expected value of the following quantity: after all triggers have been chosen, and before any stopper has been chosen, the number of players who, when chosen, are larger than all previously chosen players. This is the same as Game E except for the requirement that we start counting only after all triggers have been picked.
Determine the expected value F of this game.
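Before attempting an exact derivation, the value of this modified game can be estimated empirically. The following Monte Carlo sketch is only an intuition aid (the function name and the encoding of the pool are our own, not part of the problem): a player scores if it is a new maximum among the players drawn so far, all triggers have already appeared, and no stopper has appeared yet.

```python
import random

def game_value(p, b, t, s, trials=20000):
    """Monte Carlo estimate of the game value described above."""
    total = 0
    for _ in range(trials):
        # players are the integers 1..p (their total order);
        # 'B', 'T', 'S' mark bystanders, triggers, and stoppers
        pool = list(range(1, p + 1)) + ['B'] * b + ['T'] * t + ['S'] * s
        random.shuffle(pool)
        triggers_seen = 0
        best = 0      # largest player drawn so far (whether counted or not)
        count = 0
        for item in pool:
            if item == 'S':
                break                      # a stopper ends the counting phase
            if item == 'T':
                triggers_seen += 1
            elif isinstance(item, int) and item > best:
                best = item
                if triggers_seen == t:     # count only after all triggers
                    count += 1
        total += count
    return total / trials
```

With no triggers and no stoppers the game reduces to Game A, whose value is H_p, which gives a quick sanity check on the simulation.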
8.4
Given a set of keys S, consider constructing a random treap for S where we do not introduce the dummy leaves needed for the endogenous property. Is every element of S equally likely to be a leaf in this treap? Discuss the implications of your result for the performance of a treap.
8.5
We have shown that for any element of a set S of size n, the expected depth of a random treap for S is O(log n). Show that the depth is O(log n) with high probability. Conclude a similar high probability bound on the height of a random treap. (Hint: One way of achieving this bound is to derive a Chernoff-type bound on the tail of the distribution of the value of Game A.)
8.6
Let T be a random treap for a set S of size n. Determine the expected size of the sub-tree of T rooted at an element whose rank in S is k.
8.7
Due to C.R. Aragon and R.G. Seidel [30].
Let T be a random treap for the set S, and let x, y ∈ S be two elements whose ranks differ by r. Prove that the expected length of the unique path from x to y in T is O(log r).
8.8
While the Mulmuley games are useful for explaining the analysis of random treaps, they are easily dispensed with. To see this, attempt to provide a direct proof of Lemmas 8.6 and 8.7.
8.9
A finger search tree is a binary search tree with a special pointer (the finger) associated with it. The finger always points to the last item accessed in the tree. Describe how you would implement the FIND operation starting from the finger, rather than the root. Finger search trees perform especially well on a sequence of FINDs that has some locality of reference. Analyze the performance of a random treap in terms of the ranks of the keys accessed during a sequence of FIND operations. (The result in Problem 8.7 may be useful for this purpose.)
8.10
Due to C.R. Aragon and R.G. Seidel [30].
Another important property of random treaps is that they adapt well to scenarios where the elements have specific access frequencies. Suppose that each key in S will be accessed a prespecified number of times, but the exact order of the accesses is unknown. Equivalently, consider accesses that involve an element of S chosen at random according to a specific distribution that is not necessarily uniform. In either case, the following notion of a weighted treap provides an optimal solution to the resulting data-structuring problem.
(a) Consider a random treap for a set S. Associate a positive integer weight w_x with each x ∈ S, and define W = Σ_{x ∈ S} w_x. Define a random weighted treap as a treap for S obtained by choosing the priority p_x for each x ∈ S as follows: p_x is the maximum of w_x independent samples from a continuous distribution D. Describe how you will maintain a random weighted treap under the full set of operations supported by an unweighted treap.
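As a concrete illustration of the priority rule in (a), the sketch below draws a priority for a key of weight w. The second function is a standard inverse-transform shortcut (the maximum of w uniform samples has CDF x^w, so U^(1/w) has the same distribution); it is our own observation for efficiency, not something the problem requires, and both function names are illustrative.

```python
import random

def weighted_priority(w):
    """Priority of a key with integer weight w: the maximum of w
    independent Uniform(0,1) samples, exactly as in the definition."""
    return max(random.random() for _ in range(w))

def weighted_priority_fast(w):
    """Equivalent in distribution: if U ~ Uniform(0,1), then U**(1/w)
    has CDF x**w, the same as the max of w uniform samples."""
    return random.random() ** (1.0 / w)
```

Heavier keys thus receive stochastically larger priorities, so they tend to sit nearer the root, which is exactly the adaptation the weighted treap exploits.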
(b) Prove the following performance bounds for random weighted treaps with an arbitrary choice of the weights w_x.
- The expected time for a FIND, INSERT, or DELETE operation involving a key x is O(1 + log(W / min{w_{x⁻}, w_x, w_{x⁺}})), where W includes the weight of x, and the keys x⁻ and x⁺ are the predecessor and successor of x in the set S.
- The expected number of rotations needed for an INSERT or DELETE operation involving a key x is O(1 + log(1 + w_x/w_{x⁻}) + log(1 + w_x/w_{x⁺})), where the keys x⁻ and x⁺ are the predecessor and successor of x in the set S.
- The expected time to perform a JOIN, PASTE, or SPLIT operation involving sets S₁ and S₂ of total weight W₁ and W₂, respectively, is O(1 + log(W₁/w_x) + log(W₂/w_y)), where x is the largest key in S₁ and y is the smallest key in S₂.
8.11
In Problem 8.10, it was assumed that the access frequencies or probabilities are known in advance, and this knowledge was important in choosing an appropriate distribution for the elements' priorities. Explain how weighted treaps can be made to adapt to the observed frequency of access of the elements in the treap. (There is a solution that does not explicitly keep track of the observed frequencies and uses no more random bits than in the case where the frequencies are known in advance.)
8.12
Let us now analyze the number of random bits needed to implement the operations of a treap. Suppose we pick each priority p_x uniformly at random from the unit interval [0,1]. Then the binary representation of each p_x can be generated as a potentially infinite sequence of bits that are the outcomes of unbiased coin flips. The idea is to generate only as many bits in this sequence as is necessary for resolving comparisons between different priorities. Suppose we have generated only some prefix of the binary representation of the priority of each element in the treap T. Now, while inserting an item y, we compare its priority p_y to the priorities of other elements to determine how y should be rotated. While comparing p_y to some p_x, if their current partial binary representations can resolve the comparison, then we are done. Otherwise, the two partial binary representations are identical, and we keep generating more bits for each until they first differ.
Compute a tight upper bound on the expected number of coin flips or random bits needed for each update operation. (See also Problem 1.5.)
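The lazy scheme described above can be sketched directly; the class and function names are our own, and the sketch only models the bit-generation mechanism, not the treap itself. Each comparison between two fresh priorities examines a geometrically distributed number of bit positions (each position agrees with probability 1/2), which is the starting point for the bound the problem asks for.

```python
import random

class LazyPriority:
    """A priority in [0,1) whose binary expansion is generated on demand."""
    def __init__(self):
        self.bits = []          # bits of the expansion generated so far

    def bit(self, i):
        # extend the expansion with fresh fair coin flips as needed
        while len(self.bits) <= i:
            self.bits.append(random.getrandbits(1))
        return self.bits[i]

def flips_to_compare(p, q):
    """Compare two lazy priorities; return (p < q, positions examined).
    New bits are generated only until the two expansions first differ."""
    i = 0
    while p.bit(i) == q.bit(i):
        i += 1
    return (p.bit(i) < q.bit(i), i + 1)
```

Note that ties are impossible here: the loop keeps flipping until the expansions differ, which happens with probability 1.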
8.13
Compute a tight upper bound on the expected number of coin flips or random bits needed for each update operation for random skip lists.
8.14
In Lemma 8.10 we gave an upper bound on the expected cost of a FIND operation in a random skip list. Determine the expectation of this random variable as precisely as you can. (Hint: We suggest the following approach. For each element x ∈ S, determine the probability that it lies on the search path for a particular query y, and sum this over x ∈ S to get the desired expectation. To determine the probability, find a characterization of the level numbers that will lead to x being on the search path.)
8.15
We have shown that the expected cost of a FIND operation in a random skip list is O(log n). Prove that the cost is bounded by O(log n) with high probability, using a Chernoff-type bound for the sum of geometrically distributed random variables. Can you prove a similar high probability bound for the INSERT and DELETE operations?
8.16
Give a high probability bound on the space requirement of a random skip list for a set S of size n.
8.17
Due to W. Pugh [339].
In defining a random leveling for a skip list, we sampled the elements of level L_i with probability 1/2 to determine the next level L_{i+1}. Consider instead the skip list obtained by performing the sampling with probability p at each level, for some 0 < p < 1.
(a) Determine the expectation of the number of levels r, and prove a high probability bound on the value of r.
(b) Determine as precisely as you can the expected cost of each operation in this skip list.
(c) Discuss the relation between the choice of the value p and the performance of the skip list in practice.
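The modified leveling amounts to the following sketch: each element's top level is determined by repeated promotion with probability p, so its level is geometrically distributed with mean 1/(1-p). The function name and the safety cap max_level are our own.

```python
import random

def random_level(p, max_level=32):
    """Top level of a new skip-list element when each level is promoted
    to the next with probability p: a geometric random variable."""
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level
```

With p = 1/2 this reduces to the leveling used in the text; varying p trades the expected number of levels, about log base 1/p of n, against the expected width scanned per level.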
8.18
Formulate and prove results similar to those in Problems 8.7 and 8.9 for random skip lists.
8.19
Consider the scenario described in Problem 8.10 for random treaps. Adapt the random skip list structure to prove similar results, and compare the bounds obtained in the two cases.
8.20
Due to M.N. Wegman and J.L. Carter [414]; see also M. Blum and S. Kannan [66].
Consider the problem of deciding whether two integer multisets S₁ and S₂ are identical in the sense that each integer occurs the same number of times in both sets. This problem can be solved by sorting the two sets in O(n log n) time, where n is the cardinality of the multisets. In Problem 7.4, we considered applying the randomized techniques for verifying polynomial identities to the solution of the multiset identity problem. Suggest a randomized algorithm for solving this problem using universal hash functions. Compare your solution with the randomized algorithm suggested in Problem 7.4.
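For comparison with the hashing solution the problem asks for, here is a sketch of the polynomial-identity approach in the spirit of Problem 7.4: the multisets are identical if and only if the polynomials prod(z - a) over a in S₁ and prod(z - b) over b in S₂ are identical, so it suffices to compare their values at a random point modulo a prime. The function name and the particular prime are illustrative choices of ours.

```python
import random

def multisets_equal_probably(A, B, p=(1 << 61) - 1):
    """One-sided Monte Carlo test for multiset identity: never errs when
    the multisets are identical; a false positive requires the random
    point z to be a root of the difference polynomial (probability <= n/p)."""
    if len(A) != len(B):
        return False
    z = random.randrange(p)
    fa = fb = 1
    for a in A:
        fa = fa * ((z - a) % p) % p
    for b in B:
        fb = fb * ((z - b) % p) % p
    return fa == fb
```

This runs in O(n) time and uses one random point; the universal-hashing solution the problem asks for should be contrasted with it on error probability, randomness used, and the model of arithmetic assumed.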
8.21
Due to J.L. Carter and M.N. Wegman [88].
Suppose that m = 2^u and n = 2^b. Let 𝓜 denote the space of Boolean matrices with b rows and u+1 columns. For any x ∈ M, denote by x̂ the (u+1)-bit vector obtained by appending a 1 to the end of x. For A ∈ 𝓜, define h_A(x) = Ax̂, where the matrix-vector product is computed over GF(2).
Show that H = {h_A | A ∈ 𝓜} is a 2-universal hash family. Is it also strongly 2-universal? Why did we augment the vector x to x̂? Compare the complexity and the use of randomness in this construction with that of the construction described in Section 8.4.
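A bit-level sketch of this family may help in working through the questions. The encoding is our own: each row of the random matrix is stored as an integer bitmask, the appended 1 goes at the low-order end, and each output bit is the parity of a bitwise AND (a GF(2) inner product).

```python
import random

def random_matrix_hash(u, b, seed_rows=None):
    """A member h_A(x) = A*xhat of the family: x is a u-bit key, xhat
    appends a constant 1 bit, and A is a random b x (u+1) Boolean matrix.
    Pass seed_rows to fix the matrix explicitly (for testing)."""
    rows = seed_rows or [random.getrandbits(u + 1) for _ in range(b)]
    def h(x):
        xhat = (x << 1) | 1      # append the 1 bit at the low-order end
        out = 0
        for i, row in enumerate(rows):
            bit = bin(row & xhat).count('1') & 1   # parity = GF(2) dot product
            out |= bit << i
        return out
    return h
```

Note that for two keys x and y, x̂ XOR ŷ has its appended bits cancel, so h_A(x) XOR h_A(y) depends only on x XOR y; thinking about why the constant 1 is needed anyway is the point of the question.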
8.22
Due to J.L. Carter and M.N. Wegman [88].
In this problem we consider a weakening of the notion of 2-universal families of hash functions. Let M, N, and the prime p be as before. For each a ∈ Z_p, define the function f_a(x) = ax mod p, and h_a = g ∘ f_a, and let H = {h_a | a ∈ Z_p}. Show that H is nearly-2-universal in that, for all x ≠ y,
Pr[h_a(x) = h_a(y)] ≤ 2/n.
Also, show that the bound on the collision probability is close to the best possible for this family of hash functions.
8.23
Due to M.N. Wegman and J.L. Carter [414].
Define a super-strong universal hash family to be a family of hash functions from M to N that is strongly k-universal for all values of k simultaneously. Provide a complete characterization of the function families that satisfy this definition.
8.24
Due to N. Nisan [320].
An interesting property of a strongly 2-universal hash family is the following. For any X ⊆ M, define ρ(X) = |X|/|M|; similarly, for any Y ⊆ N, define ρ(Y) = |Y|/|N|. For any X ⊆ M, Y ⊆ N, and ε > 0, a hash function h is said to be ε-good for X and Y if, for x chosen uniformly at random from M,
|Pr[x ∈ X and h(x) ∈ Y] − ρ(X)ρ(Y)| ≤ ε.
Let h be chosen uniformly at random from a strongly 2-universal hash family H. Show that for any X ⊆ M, Y ⊆ N, and ε > 0, the probability that h is not ε-good for X and Y is at most
ρ(X)ρ(Y) / (ε² |M|).
8.25
Prove Corollary 8.20.
8.26
Due to M.L. Fredman, J. Komlós, and E. Szemerédi [156].
Show that the hash table representation analyzed in Theorem 8.19 can be constructed in expected O(n) preprocessing time, using O(n) cells and the same O(1) search time.
8.27
Due to M.L. Fredman, J. Komlós, and E. Szemerédi [156].
Show that the hash table representation described in Theorem 8.19 can be constructed in polynomial worst-case preprocessing time, using O(n) cells and the same O(1) search time.
8.28
Due to M.L. Fredman, J. Komlós, and E. Szemerédi [156].
Show that the hashing scheme of Section 8.5 can be modified to use space while still requiring only polynomial preprocessing time and constant query time. (Hint: Increase the size of the primary hash table and observe that most of the bins will be empty. Find an efficient scheme for packing together the non-empty bins, while creating secondary hash tables only for the bins of size greater than .)