Heaps
Prev: Binary Trees Next: Searching
Problems
11.1
This problem is motivated by the following scenario. You are given 500 files, each containing stock trade information for an S&P 500 company. Each trade is encoded by a line in the format 1232111,AAPL,30,456.12. The first number is the time of the trade expressed as the number of milliseconds since the start of the day’s trading. Lines within each file are sorted in increasing order of time. The remaining values are the stock symbol, number of shares, and price. You are to create a single file containing all the trades from the 500 files, sorted in order of increasing trade times.
In the abstract, write a program that takes as input a set of sorted sequences and computes the union of these sequences as a sorted sequence. For example, if the input is , , and , then the output is .
Hint: Which part of each sequence is significant as the algorithm executes?
11.2
An array is said to be -increasing-decreasing if elements repeatedly increase up to a certain index after which they decrease, then again increase, a total of times.
Design an efficient algorithm for sorting a -increasing-decreasing array.
Hint: Can you cast this in terms of combining sorted arrays?
11.3
Often data is almost-sorted. For example, a server receives timestamped stock quotes and earlier quotes may arrive slightly after later quotes because of differences in server loads and network routes. In this problem we address efficient ways to sort such data.
Write a program which takes as input a very long sequence of numbers and prints the numbers in sorted order. Each number is at most away from its correctly sorted position. Such an array is sometimes referred to as being -sorted. For example, no number in the sequence is more than 2 away from its final sorted position.
Hint: How many numbers must you read after reading the th number to be sure you can place it in the correct location?
11.4
Consider a coordinate system for the Milky Way, in which Earth is at . Model stars as points, and assume distances are in light years. The Milky Way consists of approximately stars, and their coordinates are stored in a file.
How would you compute the stars which are closest to Earth?
Hint: Suppose you know the closest stars in the first stars. If the th star is to be added to the set of closest stars, which element in that set should be evicted?
Variant: Design an time algorithm that reads a sequence of elements and, for each element, starting from the th element, prints the th largest element read up to that point. The length of the sequence is not known in advance. Your algorithm cannot use more than additional storage. What are the worst-case inputs for your algorithm?
11.5
You want to compute the running median of a sequence of numbers. The sequence is presented to you in a streaming fashion, so you cannot back up to read an earlier value, and you need to output the median after reading in each new element. For example, if the input is the output is .
Design an algorithm for computing the running median of a sequence.
Hint: Avoid looking at all values each time you read a new value.
11.6
A heap contains limited information about the ordering of elements, so unlike a sorted array or a balanced BST, naive algorithms for computing the largest elements have a time complexity that depends linearly on the number of elements in the collection.
Given a max-heap, represented as an array , design an algorithm that computes the largest elements stored in the max-heap. You cannot modify the heap. For example, if the heap is the one shown in Figure 11.1(a) on Page 173, then the array representation is , and the four largest elements are 561, 314, 401, and 359.
11.7
We discussed the notion of reduction when describing problem solving patterns. Usually, reductions are used to solve a more complex problem using a solution to a simpler problem as a subroutine.
Occasionally it makes sense to go the other way. For example, if we need the functionality of a heap, we can use a BST library, which is more commonly available, with modest performance penalties with respect, for example, to an array-based implementation of a heap.
How would you implement a stack API using a heap?
Hint: Store an additional value with each element that is inserted.
Variant: How would you implement a queue API using a heap?
Answers
11.1
Maintain the first unmerged element from each input sequence in a min-heap. Repeatedly extract the smallest heap entry, append it to the output, and then insert the next element from the same sequence if one exists. The heap never holds more than entries, where is the number of input sequences, so the running time is and the extra space is beyond the output.
11.2
Decompose the array into maximal monotone runs. Reverse each decreasing run so that every run becomes sorted in increasing order, then merge the resulting sorted arrays using the solution to Problem 11.1. The total running time is .
11.3
Keep a min-heap of the next candidates. After reading the first elements, each new input value can be inserted into the heap and the minimum can be emitted immediately, because no later element can precede it by more than positions. When the input ends, repeatedly extract the remaining heap elements. This runs in time and uses space.
11.4
Track only the current closest stars in a max-heap keyed by distance from Earth. Insert each new star; if the heap grows to size , remove the farthest star. At the end, the heap contains exactly the answer set. This processes the stream in time using additional space.
11.5
Split the data into two heaps: a max-heap for the smaller half and a min-heap for the larger half. Keep them balanced so their sizes differ by at most one, with the min-heap allowed to have one extra element when the count is odd. After inserting a new value into the appropriate heap, rebalance if needed. The median is either the top of the min-heap or the average of the two heap tops. Each update costs in the book’s formulation.
11.6
Use a max-heap of candidates whose entries store both an array index and its value. Initialize it with the root at index 0. Repeatedly extract the largest candidate, append its value to the result, and insert its children indices if they exist. Because every extracted candidate can add at most two more, the algorithm performs heap operations, each taking , for total time and extra space .
11.7
Augment each pushed value with a monotonically increasing timestamp and store pairs in a max-heap ordered by timestamp. The most recently pushed element then rises to the top, so push inserts a timestamped pair, pop extracts the heap maximum, and peek returns the maximum without removing it. push and pop are both and peek is .