Strings

Problems

7.1

A string is a sequence of characters. A string may encode an integer, e.g., "123" encodes 123. In this problem, you are to implement methods that take a string representing an integer and return the corresponding integer, and vice versa. Your code should handle negative integers. You cannot use library functions like stoi in C++ and parseInt in Java.

Implement string/integer inter-conversion functions.

Hint: Build the result one digit at a time.

7.2

In the decimal number system, the position of a digit is used to signify the power of 10 that the digit is to be multiplied with. For example, “314” denotes the number $3 \times 100 + 1 \times 10 + 4 \times 1$ . The base- $b$ number system generalizes the decimal number system: the string $a_{k - 1} a_{k - 2} \dots a_{1} a_{0}$ , where $0 \leq a_{i} < b$ , denotes in base $b$ the integer $a_{0} \times b^{0} + a_{1} \times b^{1} + a_{2} \times b^{2} + \dots + a_{k - 1} \times b^{k - 1}$ .

Write a program that performs base conversion. The input is a string, an integer $b_{1}$ , and another integer $b_{2}$ . The string represents an integer in base $b_{1}$ . The output should be the string representing the integer in base $b_{2}$ . Assume $2 \leq b_{1}, b_{2} \leq 16$ . Use “A” to represent 10, “B” for 11, …, and “F” for 15. For example, if the string is “615”, $b_{1}$ is 7 and $b_{2}$ is 13, then the result should be “1A7”, since $6 \times 7^{2} + 1 \times 7 + 5 = 1 \times 1 3^{2} + 10 \times 13 + 7$ .

Hint: What base can you easily convert to and from?

7.3

Spreadsheets often use an alphabetical encoding of the successive columns. Specifically, columns are identified by “A”, “B”, “C”, …, “X”, “Y”, “Z”, “AA”, “AB”, …, “ZZ”, “AAA”, “AAB”, …

Implement a function that converts a spreadsheet column id to the corresponding integer, with “A” corresponding to 1. For example, you should return 4 for “D”, 27 for “AA”, 702 for “ZZ”, etc. How would you test your code?

Hint: There are 26 characters in ["A", "Z"], and each can be mapped to an integer.

Variant: Solve the same problem with “A” corresponding to 0.

Variant: Implement a function that converts an integer to the spreadsheet column id. For example, you should return “D” for 4, “AA” for 27, and “ZZ” for 702.

7.4

Consider the following two rules that are to be applied to an array of characters.

Replace each 'a' by two 'd's.
Delete each entry containing a 'b'.

For example, applying these rules to the array $⟨ a, c, d, b, b, c, a ⟩$ results in the array $⟨ d, d, c, d, d, c, d, d ⟩$ .

Write a program which takes as input an array of characters, and removes each 'b' and replaces each 'a' by two 'd's. Specifically, along with the array, you are provided an integer-valued size. Size denotes the number of entries of the array that the operation is to be applied to. You do not have to worry about preserving subsequent entries. For example, if the array is $⟨ a, b, a, c,_⟩$ and the size is 4, then you can return $⟨ d, d, d, d, c ⟩$ . You can assume there is enough space in the array to hold the final result.

Hint: Consider performing multiple passes on the array.

Variant: You have an array $C$ of characters. The characters may be letters, digits, blanks, and punctuation. The telex-encoding of the array $C$ is an array $T$ of characters in which letters, digits, and blanks appear as before, but punctuation marks are spelled out. For example, telex-encoding entails replacing the character ”.” by the string “DOT”, the character ”,” by “COMMA”, the character ”?” by “QUESTION MARK”, and the character ”!” by “EXCLAMATION MARK”. Design an algorithm to perform telex-encoding with $O (1)$ space.

Variant: Write a program which merges two sorted arrays of integers, $A$ and $B$ . Specifically, the final result should be a sorted array of length $m + n$ , where $n$ and $m$ are the lengths of $A$ and $B$ , respectively. Use $O (1)$ additional storage. Assume the result is stored in $A$ , which has sufficient space. These arrays are C-style arrays, i.e., contiguous preallocated blocks of memory.

7.5

For the purpose of this problem, define a palindromic string to be a string which, when all the nonalphanumeric characters are removed, reads the same front to back ignoring case. For example, “A man, a plan, a canal, Panama.” and “Able was I, ere I saw Elba!” are palindromic, but “Ray a Ray” is not.

Implement a function which takes as input a string $s$ and returns true if $s$ is a palindromic string.

Hint: Use two indices.

7.6

Given a string containing a set of words separated by whitespace, we would like to transform it to a string in which the words appear in the reverse order. For example, “Alice likes Bob” transforms to “Bob likes Alice”. We do not need to keep the original string.

Implement a function for reversing the words in a string $s$ .

Hint: It’s difficult to solve this with one pass.

7.7

Each digit, apart from 0 and 1, in a phone keypad corresponds to one of three or four letters of the alphabet, as shown in Figure 7.1. Since words are easier to remember than numbers, it is natural to ask if a 7- or 10-digit phone number can be represented by a word. For example, “2276696” corresponds to “ACRONYM” as well as “ABPOMZN”.

Write a program which takes as input a phone number, specified as a string of digits, and returns all possible character sequences that correspond to the phone number. The cell phone keypad is specified by a mapping that takes a digit and returns the corresponding set of characters. The character sequences do not have to be legal words or phrases.

Hint: Use recursion.

Variant: Solve the same problem without using recursion.

7.8

The look-and-say sequence starts with 1. Subsequent numbers are derived by describing the previous number in terms of consecutive digits. Specifically, to generate an entry of the sequence from the previous entry, read off the digits of the previous entry, counting the number of digits in groups of the same digit. For example, 1; one 1; two 1s; one 2 then one 1; one 1, then one 2, then two 1s; three 1s, then two 2s, then one 1. The first eight numbers in the look-and-say sequence are $⟨ 1, 11, 21, 1211, 111221, 312211, 13112221, 1113213211 ⟩$ .

Write a program that takes as input an integer $n$ and returns the $n$ th integer in the look-and-say sequence. Return the result as a string.

Hint: You need to return the result as a string.

7.9

The Roman numeral representation of positive integers uses the symbols $I, V, X, L, C, D, M$ . Each symbol represents a value, with $I$ being 1, $V$ being 5, $X$ being 10, $L$ being 50, $C$ being 100, $D$ being 500, and $M$ being 1000.

In this problem we give simplified rules for representing numbers in this system. Specifically, define a string over the Roman number symbols to be a valid Roman number-string if symbols appear in nonincreasing order, with the following exceptions allowed:

$I$ can immediately precede $V$ and $X$ .
$X$ can immediately precede $L$ and $C$ .
$C$ can immediately precede $D$ and $M$ .

Back-to-back exceptions are not allowed, e.g., $I XC$ is invalid, as is $C D M$ .

A valid complex Roman number string represents the integer which is the sum of the symbols that do not correspond to exceptions; for the exceptions, add the difference of the larger symbol and the smaller symbol.

For example, the strings “XXXXXIIIIIIIII”, “LVIIII”, and “LIX” are valid Roman number strings representing 59. The shortest valid complex Roman number string corresponding to the integer 59 is “LIX”.

Write a program which takes as input a valid Roman number string $s$ and returns the integer it corresponds to.

Hint: Start by solving the problem assuming no exception cases.

Variant: Write a program that takes as input a string of Roman number symbols and checks whether that string is valid.

Variant: Write a program that takes as input a positive integer $n$ and returns a shortest valid simple Roman number string representing $n$ .

7.10

A decimal string is a string consisting of digits between 0 and 9. Internet Protocol (IP) addresses can be written as four decimal strings separated by periods, e.g., 192.168.1.201. A careless programmer mangles a string representing an IP address in such a way that all the periods vanish.

Write a program that determines where to add periods to a decimal string so that the resulting string is a valid IP address. There may be more than one valid IP address corresponding to a string, in which case you should print all possibilities. For example, if the mangled string is “19216811” then two corresponding IP addresses are 192.168.1.1 and 19.216.81.1. There are seven other possible IP addresses for this string.

Hint: Use nested loops.

Variant: Solve the analogous problem when the number of periods is a parameter $k$ and the string length is unbounded.

7.11

We illustrate what it means to write a string in sinusoidal fashion by means of an example. The string “Hello_World!” written in sinusoidal fashion is

 e   _   l
H l o W r d
  l   o   !

Here _ denotes a blank.

Define the snakestring of $s$ to be the left-right top-to-bottom sequence in which characters appear when $s$ is written in sinusoidal fashion. For example, the snakestring for “Hello_World!” is “e_lHloWrdlo!”, where _ again denotes a blank.

Write a program which takes as input a string $s$ and returns the snakestring of $s$ .

Hint: Try concrete examples, and look for periodicity.

7.12

Run-length encoding (RLE) compression offers a fast way to do efficient on-the-fly compression and decompression of strings. The idea is simple: encode successive repeated characters by the repetition count and the character. For example, the RLE of “aaaabcccaa” is “4a1b3c2a”. The decoding of “3e4f2e” returns “eeeffffee”.

Implement run-length encoding and decoding functions. Assume the string to be encoded consists of letters of the alphabet, with no digits, and the string to be decoded is a valid encoding.

Hint: This is similar to converting between binary and string representations.

7.13

A good string search algorithm is fundamental to the performance of many applications. Several clever algorithms have been proposed for string search, each with its own trade-offs. As a result, there is no single perfect answer. If someone asks you this question in an interview, the best way to approach this problem would be to work through one good algorithm in detail and discuss at a high level other algorithms.

Given two strings $s$ (the “search string”) and $t$ (the “text”), find the first occurrence of $s$ in $t$ .

Hint: Form a signature from a string.

Answers

7.1

Convert a string to an integer with a left-to-right fold: multiply the running value by 10 and add the next digit, applying the sign at the end. Convert an integer to a string by repeatedly taking the least-significant digit with modulo 10, appending digits in reverse order, then reversing and restoring the sign. Handle 0 explicitly. Both directions run in $O (n)$ time.

7.2

Convert from base $b_{1}$ to an ordinary integer using the same multiply-and-add idea as in Problem 7.1, then convert that integer to base $b_{2}$ by repeated division and remainder, mapping values 10 through 15 to A through F. A recursive helper or a reverse-at-the-end pass both work. The running time is $O (n (1 + lo g_{b_{2}} b_{1}))$ , where $n$ is the length of the input string.

7.3

Treat the column label like a base-26 numeral, except that A maps to 1 rather than 0. Scan left to right, updating result = result * 26 + value(c). This yields an $O (n)$ algorithm. Good tests cluster around boundaries such as A, Z, AA, AZ, BA, and ZZ.

7.4

Do two passes. In the first pass, remove every b by compacting the remaining characters to the left and count how many as remain. That tells you the final length. In the second pass, work backwards from the end of the compacted prefix, writing each a as dd and copying every other character once. This runs in $O (n)$ time using $O (1)$ extra space.

7.5

Use two indices, one starting at the beginning and the other at the end. Advance each index until it points at an alphanumeric character, compare the lowercase versions, and stop on the first mismatch. If the indices cross, the string is palindromic. The running time is $O (n)$ and the extra space is $O (1)$ .

7.6

Reverse the entire string first. That puts the words in reverse order but also reverses the letters inside each word. Then scan for word boundaries and reverse each word individually. On mutable strings this is in-place and runs in $O (n)$ time with $O (1)$ extra space.

7.7

Use recursion or backtracking. Keep a partial mnemonic buffer, and at position $i$ try every letter mapped from digit $i$ , recursing to position $i + 1$ . When the buffer is full, append a copy to the result. If the phone number has length $n$ , the worst-case number of leaves is $4^{n}$ , and each completed mnemonic costs $O (n)$ to copy, so the total running time is $O (4^{n} n)$ .

7.8

Generate the sequence iteratively. Start from "1", and to obtain the next term, scan the current string, count the length of each maximal run of equal digits, and append the count followed by the digit. Repeating this process $n - 1$ times yields the $n$ th term. A simple upper bound on the running time is $O (n 2^{n})$ .

7.9

Scan the Roman numeral from right to left. Keep the value of the symbol to the right; if the current symbol is smaller, subtract it, otherwise add it. Since the input is guaranteed to be valid, this handles the six allowed subtractive cases without extra validation logic. The running time is $O (n)$ .

7.10

A valid IPv4 address has exactly four parts, each between 0 and 255, with no leading zeros unless the part is exactly 0. Enumerate all placements of the three periods, restricting each part length to 1 through 3 characters, and validate the four resulting substrings. The search space is constant-sized, so the running time is $O (1)$ .

7.11

Observe the period-4 pattern in the sinusoidal layout. The top row consists of indices 1, 5, 9, …, the middle row of 0, 2, 4, …, and the bottom row of 3, 7, 11, … Concatenating those three traversals yields the snakestring in $O (n)$ time.

7.12

For decoding, scan the encoded string, accumulate the current count from consecutive digits, then append that many copies of the following character. For encoding, scan the input and count runs of equal characters, emitting count followed by the character whenever the run ends. Both directions are linear in the length of the processed string.

7.13

Use Rabin-Karp. Hash the pattern and the first window of the text of the same length, then slide the window one character at a time using a rolling hash update. Whenever the hashes match, do a direct string comparison to guard against collisions. With a good hash, the running time is $O (m + n)$ for pattern length $m$ and text length $n$ .

Takashi's Notes

Explorer

Strings

Strings

Problems

7.1

7.2

7.3

7.4

7.5

7.6

7.7

7.8

7.9

7.10

7.11

7.12

7.13

Answers

7.1

7.2

7.3

7.4

7.5

7.6

7.7

7.8

7.9

7.10

7.11

7.12

7.13

Graph View

Table of Contents

Backlinks