Data Structures
Problems
8.1
Prove Lemma 8.4.
8.2
Prove Lemma 8.5.
8.3
Due to K. Mulmuley [315].
Consider the following version of the Mulmuley games. The pool consists of the sets P, B, T, and S, where P is a set of p players, B a set of b bystanders, T a set of t triggers, and S a set of s stoppers. Assume that the players are totally ordered and that all sets are non-empty and pairwise disjoint. The game consists of picking random elements of the pool, without replacement, until the pool is empty. The value of the game, F, is defined as the expected value of the following quantity: after all triggers have been chosen, and before any stopper has been chosen, the number of players who, when chosen, are larger than all previously chosen players. This is the same as Game E except for the requirement that we start counting only after all triggers have been picked.
Determine the expected value F of this game.
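Before attempting an exact derivation, the value of this modified game can be estimated empirically. The following Monte Carlo sketch is only an intuition aid (the function name and the encoding of the pool are our own, not part of the problem): a player scores if it is a new maximum among the players drawn so far, all triggers have already appeared, and no stopper has appeared yet.

```python
import random

def game_value(p, b, t, s, trials=20000):
    """Monte Carlo estimate of the game value described above."""
    total = 0
    for _ in range(trials):
        # players are the integers 1..p (their total order);
        # 'B', 'T', 'S' mark bystanders, triggers, and stoppers
        pool = list(range(1, p + 1)) + ['B'] * b + ['T'] * t + ['S'] * s
        random.shuffle(pool)
        triggers_seen = 0
        best = 0      # largest player drawn so far (whether counted or not)
        count = 0
        for item in pool:
            if item == 'S':
                break                      # a stopper ends the counting phase
            if item == 'T':
                triggers_seen += 1
            elif isinstance(item, int) and item > best:
                best = item
                if triggers_seen == t:     # count only after all triggers
                    count += 1
        total += count
    return total / trials
```

With no triggers and no stoppers the game reduces to Game A, whose value is H_p, which gives a quick sanity check on the simulation.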
8.4
Given a set of keys S, consider constructing a random treap for S where we do not introduce the dummy leaves needed for the endogenous property. Is every element of S equally likely to be a leaf in this treap? Discuss the implications of your result for the performance of a treap.
8.5
We have shown that for any element of a set S of size n, the expected depth of a random treap for S is O(log n). Show that the depth is O(log n) with high probability. Conclude a similar high probability bound on the height of a random treap. (Hint: One way of achieving this bound is to derive a Chernoff-type bound on the tail of the distribution of the value of Game A.)
8.6
Let T be a random treap for a set S of size n. Determine the expected size of the sub-tree of T rooted at an element whose rank in S is k.
8.7
Due to C.R. Aragon and R.G. Seidel [30].
Let T be a random treap for the set S, and let x, y ∈ S be two elements whose ranks differ by r. Prove that the expected length of the unique path from x to y in T is O(log r).
8.8
While the Mulmuley games are useful for explaining the analysis of random treaps, they are easily dispensed with. To see this, attempt to provide a direct proof of Lemmas 8.6 and 8.7.
8.9
A finger search tree is a binary search tree with a special pointer (the finger) associated with it. The finger always points to the last item accessed in the tree. Describe how you would implement the FIND operation starting from the finger, rather than the root. Finger search trees perform especially well on a sequence of FINDs that has some locality of reference. Analyze the performance of a random treap in terms of the ranks of the keys accessed during a sequence of FIND operations. (The result in Problem 8.7 may be useful for this purpose.)
8.10
Due to C.R. Aragon and R.G. Seidel [30].
Another important property of random treaps is that they adapt well to scenarios where the elements have specific access frequencies. Suppose that each key in S will be accessed a prespecified number of times, but the exact order of the accesses is unknown. Equivalently, consider accesses that involve an element of S chosen at random according to a specific distribution that is not necessarily uniform. In either case, the following notion of a weighted treap provides an optimal solution to the resulting data-structuring problem.
(a) Consider a random treap for a set S. Associate a positive integer weight w_x with each x ∈ S, and define W = Σ_{x ∈ S} w_x. Define a random weighted treap as a treap for S obtained by choosing the priority p_x for each x ∈ S as follows: p_x is the maximum of w_x independent samples from a continuous distribution D. Describe how you will maintain a random weighted treap under the full set of operations supported by an unweighted treap.
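As a concrete illustration of the priority rule in (a), the sketch below draws a priority for a key of weight w. The second function is a standard inverse-transform shortcut (the maximum of w uniform samples has CDF x^w, so U^(1/w) has the same distribution); it is our own observation for efficiency, not something the problem requires, and both function names are illustrative.

```python
import random

def weighted_priority(w):
    """Priority of a key with integer weight w: the maximum of w
    independent Uniform(0,1) samples, exactly as in the definition."""
    return max(random.random() for _ in range(w))

def weighted_priority_fast(w):
    """Equivalent in distribution: if U ~ Uniform(0,1), then U**(1/w)
    has CDF x**w, the same as the max of w uniform samples."""
    return random.random() ** (1.0 / w)
```

Heavier keys thus receive stochastically larger priorities, so they tend to sit nearer the root, which is exactly the adaptation the weighted treap exploits.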
(b) Prove the following performance bounds for random weighted treaps with an arbitrary choice of the weights w_x.
- The expected time for a FIND, INSERT, or DELETE operation involving a key x is O(1 + log(W / min{w_{x⁻}, w_x, w_{x⁺}})), where W includes the weight of x, and the keys x⁻ and x⁺ are the predecessor and successor of x in the set S.
- The expected number of rotations needed for an INSERT or DELETE operation involving a key x is O(1 + log(1 + w_x/w_{x⁻}) + log(1 + w_x/w_{x⁺})), where the keys x⁻ and x⁺ are the predecessor and successor of x in the set S.
- The expected time to perform a JOIN, PASTE, or SPLIT operation involving sets S₁ and S₂ of total weight W₁ and W₂, respectively, is O(1 + log(W₁/w_x) + log(W₂/w_y)), where x is the largest key in S₁ and y is the smallest key in S₂.
8.11
In Problem 8.10, it was assumed that the access frequencies or probabilities are known in advance, and this knowledge was important in choosing an appropriate distribution for the elements' priorities. Explain how weighted treaps can be made to adapt to the observed frequency of access of the elements in the treap. (There is a solution that does not explicitly keep track of the observed frequencies and uses no more random bits than in the case where the frequencies are known in advance.)
8.12
Let us now analyze the number of random bits needed to implement the operations of a treap. Suppose we pick each priority p_x uniformly at random from the unit interval [0,1]. Then the binary representation of each p_x can be generated as a potentially infinite sequence of bits that are the outcomes of unbiased coin flips. The idea is to generate only as many bits in this sequence as is necessary for resolving comparisons between different priorities. Suppose we have generated only some prefix of the binary representation of the priority of each element in the treap T. Now, while inserting an item y, we compare its priority p_y to the priorities of other elements to determine how y should be rotated. While comparing p_y to some p_x, if their current partial binary representations can resolve the comparison, then we are done. Otherwise, the two partial binary representations are identical, and we keep generating more bits for each until they first differ.
Compute a tight upper bound on the expected number of coin flips or random bits needed for each update operation. (See also Problem 1.5.)
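The lazy scheme described above can be sketched directly; the class and function names are our own, and the sketch only models the bit-generation mechanism, not the treap itself. Each comparison between two fresh priorities examines a geometrically distributed number of bit positions (each position agrees with probability 1/2), which is the starting point for the bound the problem asks for.

```python
import random

class LazyPriority:
    """A priority in [0,1) whose binary expansion is generated on demand."""
    def __init__(self):
        self.bits = []          # bits of the expansion generated so far

    def bit(self, i):
        # extend the expansion with fresh fair coin flips as needed
        while len(self.bits) <= i:
            self.bits.append(random.getrandbits(1))
        return self.bits[i]

def flips_to_compare(p, q):
    """Compare two lazy priorities; return (p < q, positions examined).
    New bits are generated only until the two expansions first differ."""
    i = 0
    while p.bit(i) == q.bit(i):
        i += 1
    return (p.bit(i) < q.bit(i), i + 1)
```

Note that ties are impossible here: the loop keeps flipping until the expansions differ, which happens with probability 1.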
8.13
Compute a tight upper bound on the expected number of coin flips or random bits needed for each update operation for random skip lists.
8.14
In Lemma 8.10 we gave an upper bound on the expected cost of a FIND operation in a random skip list. Determine the expectation of this random variable as precisely as you can. (Hint: We suggest the following approach. For each element x ∈ S, determine the probability that it lies on the search path for a particular query y, and sum this over x ∈ S to get the desired expectation. To determine the probability, find a characterization of the level numbers that will lead to x being on the search path.)
8.15
We have shown that the expected cost of a FIND operation in a random skip list is O(log n). Prove that the cost is bounded by O(log n) with high probability, using a Chernoff-type bound for the sum of geometrically distributed random variables. Can you prove a similar high probability bound for the INSERT and DELETE operations?
8.16
Give a high probability bound on the space requirement of a random skip list for a set S of size n.
8.17
Due to W. Pugh [339].
In defining a random leveling for a skip list, we sampled the elements of level L_i with probability 1/2 to determine the next level L_{i+1}. Consider instead the skip list obtained by performing the sampling with probability p at each level, for some 0 < p < 1.
(a) Determine the expectation of the number of levels r, and prove a high probability bound on the value of r.
(b) Determine as precisely as you can the expected cost of each operation in this skip list.
(c) Discuss the relation between the choice of the value p and the performance of the skip list in practice.
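The modified leveling amounts to the following sketch: each element's top level is determined by repeated promotion with probability p, so its level is geometrically distributed with mean 1/(1-p). The function name and the safety cap max_level are our own.

```python
import random

def random_level(p, max_level=32):
    """Top level of a new skip-list element when each level is promoted
    to the next with probability p: a geometric random variable."""
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level
```

With p = 1/2 this reduces to the leveling used in the text; varying p trades the expected number of levels, about log base 1/p of n, against the expected width scanned per level.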
8.18
Formulate and prove results similar to those in Problems 8.7 and 8.9 for random skip lists.
8.19
Consider the scenario described in Problem 8.10 for random treaps. Adapt the random skip list structure to prove similar results, and compare the bounds obtained in the two cases.
8.20
Due to M.N. Wegman and J.L. Carter [414]; see also M. Blum and S. Kannan [66].
Consider the problem of deciding whether two integer multisets S₁ and S₂ are identical in the sense that each integer occurs the same number of times in both sets. This problem can be solved by sorting the two sets in O(n log n) time, where n is the cardinality of the multisets. In Problem 7.4, we considered applying the randomized techniques for verifying polynomial identities to the solution of the multiset identity problem. Suggest a randomized algorithm for solving this problem using universal hash functions. Compare your solution with the randomized algorithm suggested in Problem 7.4.
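For comparison with the hashing solution the problem asks for, here is a sketch of the polynomial-identity approach in the spirit of Problem 7.4: the multisets are identical if and only if the polynomials prod(z - a) over a in S₁ and prod(z - b) over b in S₂ are identical, so it suffices to compare their values at a random point modulo a prime. The function name and the particular prime are illustrative choices of ours.

```python
import random

def multisets_equal_probably(A, B, p=(1 << 61) - 1):
    """One-sided Monte Carlo test for multiset identity: never errs when
    the multisets are identical; a false positive requires the random
    point z to be a root of the difference polynomial (probability <= n/p)."""
    if len(A) != len(B):
        return False
    z = random.randrange(p)
    fa = fb = 1
    for a in A:
        fa = fa * ((z - a) % p) % p
    for b in B:
        fb = fb * ((z - b) % p) % p
    return fa == fb
```

This runs in O(n) time and uses one random point; the universal-hashing solution the problem asks for should be contrasted with it on error probability, randomness used, and the model of arithmetic assumed.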
8.21
Due to J.L. Carter and M.N. Wegman [88].
Suppose that m = 2^u and n = 2^b. Let 𝓜 denote the space of Boolean matrices with b rows and u+1 columns. For any x ∈ M, denote by x̂ the (u+1)-bit vector obtained by appending a 1 to the end of x. For A ∈ 𝓜, define h_A(x) = Ax̂, where the matrix-vector product is computed over GF(2).
Show that H = {h_A | A ∈ 𝓜} is a 2-universal hash family. Is it also strongly 2-universal? Why did we augment the vector x to x̂? Compare the complexity and the use of randomness in this construction with that of the construction described in Section 8.4.
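A bit-level sketch of this family may help in working through the questions. The encoding is our own: each row of the random matrix is stored as an integer bitmask, the appended 1 goes at the low-order end, and each output bit is the parity of a bitwise AND (a GF(2) inner product).

```python
import random

def random_matrix_hash(u, b, seed_rows=None):
    """A member h_A(x) = A*xhat of the family: x is a u-bit key, xhat
    appends a constant 1 bit, and A is a random b x (u+1) Boolean matrix.
    Pass seed_rows to fix the matrix explicitly (for testing)."""
    rows = seed_rows or [random.getrandbits(u + 1) for _ in range(b)]
    def h(x):
        xhat = (x << 1) | 1      # append the 1 bit at the low-order end
        out = 0
        for i, row in enumerate(rows):
            bit = bin(row & xhat).count('1') & 1   # parity = GF(2) dot product
            out |= bit << i
        return out
    return h
```

Note that for two keys x and y, x̂ XOR ŷ has its appended bits cancel, so h_A(x) XOR h_A(y) depends only on x XOR y; thinking about why the constant 1 is needed anyway is the point of the question.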
8.22
Due to J.L. Carter and M.N. Wegman [88].
In this problem we consider a weakening of the notion of 2-universal families of hash functions. Let M, N, and the prime p be as before. For each a ∈ Z_p, define the function f_a(x) = ax mod p, and h_a = g ∘ f_a, and let H = {h_a | a ∈ Z_p}. Show that H is nearly-2-universal in that, for all x ≠ y,
Pr[h_a(x) = h_a(y)] ≤ 2/n.
Also, show that the bound on the collision probability is close to the best possible for this family of hash functions.
8.23
Due to M.N. Wegman and J.L. Carter [414].
Define a super-strong universal hash family to be a family of hash functions from M to N that is strongly k-universal for all values of k simultaneously. Provide a complete characterization of the function families that satisfy this definition.
8.24
Due to N. Nisan [320].
An interesting property of a strongly 2-universal hash family is the following. For any X ⊆ M, define ρ(X) = |X|/|M|; similarly, for any Y ⊆ N, define ρ(Y) = |Y|/|N|. For any X ⊆ M, Y ⊆ N, and ε > 0, a hash function h is said to be ε-good for X and Y if, for x chosen uniformly at random from M,
|Pr[x ∈ X and h(x) ∈ Y] − ρ(X)ρ(Y)| ≤ ε.
Let h be chosen uniformly at random from a strongly 2-universal hash family H. Show that for any X ⊆ M, Y ⊆ N, and ε > 0, the probability that h is not ε-good for X and Y is at most
ρ(X)ρ(Y) / (ε² |M|).
8.25
Prove Corollary 8.20.
8.26
Due to M.L. Fredman, J. Komlós, and E. Szemerédi [156].
Show that the hash table representation analyzed in Theorem 8.19 can be constructed in expected O(n) preprocessing time, using O(n) cells and the same O(1) search time.
8.27
Due to M.L. Fredman, J. Komlós, and E. Szemerédi [156].
Show that the hash table representation described in Theorem 8.19 can be constructed in polynomial worst-case preprocessing time, using O(n) cells and the same O(1) search time.
8.28
Due to M.L. Fredman, J. Komlós, and E. Szemerédi [156].
Show that the hashing scheme of Section 8.5 can be modified to use space while still requiring only polynomial preprocessing time and constant query time. (Hint: Increase the size of the primary hash table and observe that most of the bins will be empty. Find an efficient scheme for packing together the non-empty bins, while creating secondary hash tables only for the bins of size greater than .)