Searching

Problems

12.1

Binary search commonly asks for the index of any element of a sorted array that is equal to a specified element. The following problem has a slight twist on this.

Write a method that takes a sorted array and a key and returns the index of the first occurrence of that key in the array. For example, when applied to the array in Figure 12.1 your algorithm should return 3 if the given key is 108; if it is 285, your algorithm should return 6.

Hint: What happens when every entry equals $k$ ? Don’t stop when you first see $k$ .

Variant: Design an efficient algorithm that takes a sorted array and a key, and finds the index of the first occurrence of an element greater than that key. For example, when applied to the array in Figure 12.1 on the preceding page your algorithm should return 9 if the key is 285; if it is -13, your algorithm should return 1.

Variant: Let $A$ be an unsorted array of $n$ integers, with $A [0] \geq A [1]$ and $A [n - 2] \leq A [n - 1]$ . Call an index $i$ a local minimum if $A [i]$ is less than or equal to its neighbors. How would you efficiently find a local minimum, if one exists?

Variant: Write a program which takes a sorted array $A$ of integers, and an integer $k$ , and returns the interval enclosing $k$ , i.e., the pair of integers $L$ and $U$ such that $L$ is the first occurrence of $k$ in $A$ and $U$ is the last occurrence of $k$ in $A$ . If $k$ does not appear in $A$ , return [-1, -1]. For example if $A = ⟨ 1, 2, 2, 4, 4, 4, 7, 11, 11, 13 ⟩$ and $k = 11$ , you should return [7, 8].

Variant: Write a program which tests if $p$ is a prefix of a string in an array of sorted strings.

12.2

Design an efficient algorithm that takes a sorted array of distinct integers, and returns an index $i$ such that the element at index $i$ equals $i$ . For example, when the input is $⟨ - 2, 0, 2, 3, 6, 7, 9 ⟩$ your algorithm should return 2 or 3.

Hint: Reduce this problem to ordinary binary search.

Variant: Solve the same problem when $A$ is sorted but may contain duplicates.

12.3

An array is said to be cyclically sorted if it is possible to cyclically shift its entries so that it becomes sorted. For example, the array in Figure 12.2 is cyclically sorted, a cyclic left shift by 4 leads to a sorted array.

Design an $O (lo g n)$ algorithm for finding the position of the smallest element in a cyclically sorted array. Assume all elements are distinct. For example, for the array in Figure 12.2 on the preceding page, your algorithm should return 4.

Hint: Use the divide and conquer principle.

Variant: A sequence is strictly ascending if each element is greater than its predecessor. Suppose it is known that an array $A$ consists of a strictly ascending sequence followed by a strictly descending sequence. Design an algorithm for finding the maximum element in $A$ .

Variant: Design an $O (lo g n)$ algorithm for finding the position of an element $k$ in a cyclically sorted array of distinct elements.

12.4

Write a program which takes a nonnegative integer and returns the largest integer whose square is less than or equal to the given integer. For example, if the input is 16, return 4; if the input is 300, return 17, since $1 7^{2} = 289 < 300$ and $1 8^{2} = 324 > 300$ .

Hint: Look out for a corner-case.

12.5

Square root computations can be implemented using sophisticated numerical techniques involving iterative methods and logarithms. However, if you were asked to implement a square root function, you would not be expected to know these techniques.

Implement a function which takes as input a floating point value and returns its square root.

Hint: Iteratively compute a sequence of intervals, each contained in the previous interval, that contain the result.

Variant: Given two positive floating point numbers $x$ and $y$ , how would you compute $\frac{x}{y}$ to within a specified tolerance $ϵ$ if the division operator cannot be used? You cannot use any library functions, such as log and exp; addition and multiplication are acceptable.

12.6

Call a 2D array sorted if its rows and its columns are nondecreasing.

Design an algorithm that takes a 2D sorted array and a number and checks whether that number appears in the array. For example, if the input is the 2D sorted array in Figure 12.3, and the number is 7, your algorithm should return false; if the number is 8, your algorithm should return true.

Hint: Can you eliminate a row or a column per comparison?

12.7

Given an array of comparable objects, you can find either the min or the max of the elements in the array with $n - 1$ comparisons, where $n$ is the length of the array.

Comparing elements may be expensive, e.g., a comparison may involve a number of nested calls or the elements being compared may be long strings. Therefore, it is natural to ask if both the min and the max can be computed with less than the $2 (n - 1)$ comparisons required to compute the min and the max independently.

Design an algorithm to find the min and max elements in an array. For example, if $A = ⟨ 3, 2, 5, 1, 2, 4 ⟩$ , you should return 1 for the min and 5 for the max.

Hint: Use the fact that $a < b$ and $b < c$ implies $a < c$ to reduce the number of comparisons used by the brute-force approach.

Variant: What is the least number of comparisons required to find the min and the max in the worst-case?

12.8

Many algorithms require as a subroutine the computation of the $k$ th largest element of an array. The first largest element is simply the largest element. The $n$ th largest element is the smallest element, where $n$ is the length of the array.

For example, if the input array $A = ⟨ 3, 2, 1, 5, 4 ⟩$ , then $A [3]$ is the first largest element in $A$ , $A [0]$ is the third largest element in $A$ , and $A [2]$ is the fifth largest element in $A$ .

Design an algorithm for computing the $k$ th largest element in an array. Assume entries are distinct.

Hint: Use divide and conquer in conjunction with randomization.

Variant: Design an algorithm for finding the median of an array.

Variant: Design an algorithm for finding the $k$ th largest element of $A$ in the presence of duplicates. The $k$ th largest element is defined to be $A [k - 1]$ after $A$ has been sorted in a stable manner, i.e., if $A [i] = A [j]$ and $i < j$ then $A [i]$ must appear before $A [j]$ after stable sorting.

Variant: A number of apartment buildings are coming up on a new street. The postal service wants to place a single mailbox on the street. Their objective is to minimize the total distance that residents have to walk to collect their mail each day. Different buildings may have different numbers of residents. Devise an algorithm that computes where to place the mailbox so as to minimize the total distance that residents travel to get to the mailbox. Assume the input is specified as an array of building objects, where each building object has a field indicating the number of residents in that building, and a field indicating the building’s distance from the start of the street.

12.9

The storage capacity of hard drives dwarfs that of RAM. This can lead to interesting space-time trade-offs.

Suppose you were given a file containing roughly one billion IP addresses, each of which is a 32-bit quantity. How would you programmatically find an IP address that is not in the file? Assume you have unlimited drive space but only a few megabytes of RAM at your disposal.

Hint: Can you be sure there is an address which is not in the file?

12.10

If an array contains $n - 1$ integers, each between 0 and $n - 1$ , inclusive, and all numbers in the array are distinct, then it must be the case that exactly one number between 0 and $n - 1$ is absent.

We can determine the missing number in $O (n)$ time and $O (1)$ space by computing the sum of the elements in the array. Since the sum of all the numbers from 0 to $n - 1$ , inclusive, is $\frac{( n - 1 ) n}{2}$ , we can subtract the sum of the numbers in the array from $\frac{( n - 1 ) n}{2}$ to get the missing number.

Similarly, if the array contains $n + 1$ integers, each between 0 and $n - 1$ , inclusive, with exactly one element appearing twice, the duplicated integer will be equal to the sum of the elements of the array minus $\frac{( n - 1 ) n}{2}$ .

You are given an array of $n$ integers, each between 0 and $n - 1$ , inclusive. Exactly one element appears twice, implying that exactly one number between 0 and $n - 1$ is missing from the array. How would you compute the duplicate and missing numbers?

Hint: Consider performing multiple passes through the array.

Answers

12.1

Run binary search, but when you encounter k, record the index and continue searching the left half, since no position to the right can be the first occurrence. If A[mid] < k, go right; otherwise go left. This preserves the $O (lo g n)$ time bound and uses $O (1)$ space.

12.2

Binary search on the derived quantity A[i] - i. In a sorted array of distinct integers, this value is strictly increasing, so if A[mid] - mid is positive the solution, if any, must lie to the left, and if it is negative it must lie to the right. Equality gives an answer immediately. This runs in $O (lo g n)$ time.

12.3

Compare the middle element with the rightmost element. If A[mid] > A[right], the minimum must lie strictly to the right of mid; otherwise it lies in [left, mid]. Repeating this shrinks the interval exactly like binary search and finds the minimum index in $O (lo g n)$ time and $O (1)$ space.

12.4

Binary search the integer interval [0, k] for the largest value whose square does not exceed k. Maintain the invariant that everything left of left is feasible and everything right of right is infeasible. The answer is left - 1 when the search terminates. The running time is $O (lo g k)$ .

12.5

Binary search over a real interval containing $x$ . If x >= 1, search in [1, x]; otherwise search in [x, 1]. At each step compare mid^2 with x and discard the half that cannot contain the answer, stopping when the interval is within the required tolerance. This gives $O (lo g \frac{x}{s})$ time in the book’s notation, where $s$ is the tolerance.

12.6

Start from the top-right corner. If the current value is too small, eliminate the entire row by moving down; if it is too large, eliminate the entire column by moving left; if it matches, return true. Each comparison removes one row or one column, so at most $m + n - 1$ entries are inspected, for $O (m + n)$ time.

12.7

Process the array in pairs. First compare the two elements in each pair to decide which can still be the local minimum candidate and which can still be the local maximum candidate. Then compare the smaller member only against the global minimum and the larger member only against the global maximum. This reduces the worst-case number of comparisons to about $\frac{3 n}{2} - 2$ while keeping $O (n)$ time and $O (1)$ space.

12.8

Use randomized selection, i.e., quickselect. Pick a pivot at random, partition the array into elements greater than the pivot and less than the pivot, and inspect the pivot’s final rank. If exactly k - 1 elements are larger, the pivot is the answer; otherwise recurse or iterate into the side that can still contain the $k$ th largest element. The expected running time is $O (n)$ , worst case $O (n^{2})$ , and extra space can be $O (1)$ if partitioning is done in place.

12.9

Use two passes over the file. In the first pass, count how many addresses fall into each 16-bit prefix bucket. Because the file has fewer than $2^{32}$ entries, at least one bucket must be underfull. In the second pass, restrict attention to that bucket and mark its lower 16-bit suffixes in a bitset. Any unset bit yields a missing IP address. This fits in a few megabytes: the count array has $2^{16}$ entries and the bitset has $2^{16}$ bits.

12.10

Let miss_xor_dup be the XOR of all indices 0..n-1 and all array entries. This equals missing XOR duplicate. Isolate any set bit where those two values differ, and XOR together only the indices and entries having that bit set. The result is one of the two target numbers. A final pass determines whether that value is the duplicate or the missing value, and the other is recovered by XOR. The algorithm runs in $O (n)$ time and $O (1)$ space.

Takashi's Notes

Explorer

Searching

Searching

Problems

12.1

12.2

12.3

12.4

12.5

12.6

12.7

12.8

12.9

12.10

Answers

12.1

12.2

12.3

12.4

12.5

12.6

12.7

12.8

12.9

12.10

Graph View

Table of Contents

Backlinks