Combinatorics Probability and Information Theory

Prev: Digital Signatures Next: Elliptic Curves and Cryptography

Exercises

Section 5.1. Basic Principles of Counting

5.1

The Rhind papyrus is an ancient Egyptian mathematical manuscript that is more than 3500 years old. Problem 79 of the Rhind papyrus poses a problem that can be paraphrased as follows: there are seven houses; in each house lives seven cats; each cat kills seven mice; each mouse has eaten seven spelt seeds; each spelt seed would have produced seven hekat of spelt. What is the sum of all of the named items? Solve this 3500 year old problem.

5.2

(a) How many $n$ -tuples $(x_{1}, x_{2}, \dots, x_{n})$ are there if the coordinates are required to be integers satisfying $0 \leq x_{i} < q$ ?

(b) Same question as (a), except now there are separate bounds $0 \leq x_{i} < q_{i}$ for each coordinate.

(c) How many $n$ -by- $n$ matrices are there if the entries $x_{i, j}$ of the matrix are integers satisfying $0 \leq x_{i, j} < q$ ?

(d) Same question as (a), except now the order of the coordinates does not matter. So for example, $(0, 0, 1, 3)$ and $(1, 0, 3, 0)$ are considered the same. This one is rather tricky.

(e) Twelve students are each taking four classes, for each class they need two loose-leaf notebooks, for each notebook they need 100 sheets of paper, and each sheet of paper has 32 lines on it. Altogether, how many students, classes, notebooks, sheets, and lines are there? Bonus: Make this or a similar problem of your own devising into a rhyme like the St. Ives riddle.

5.3

(a) List all of the permutations of the set ${A, B, C}$ .

(b) List all of the permutations of the set ${1, 2, 3, 4}$ .

(d) Seven students are to be assigned to seven dormitory rooms, each student receiving his or her own room. In how many ways can this be done?

(e) How many different words can be formed with the four symbols $A, A, B, C$ ?

5.4

(a) List the 24 possible permutations of the letters $A_{1}, A_{2}, B_{1}, B_{2}$ . If $A_{1}$ is indistinguishable from $A_{2}$ , and $B_{1}$ is indistinguishable from $B_{2}$ , show how the permutations become grouped into 6 distinct letter arrangements, each containing 4 of the original 24 permutations.

(b) Using the seven symbols $A, A, A, A, B, B, B$ , how many different seven letter words can be formed?

(d) Using the seven symbols $A, A, A, A, B, B, B$ , how many different five letter words can be formed?

5.5

(a) There are 100 students eligible for an award, and the winner gets to choose from among 5 different possible prizes. How many possible outcomes are there?

(b) Same as in (a), but this time there is a first place winner, a second place winner, and a third place winner, each of whom gets to select a prize. However, there is only one of each prize. How many possible outcomes are there?

(c) Same as in (b), except that there are multiple copies of each prize, so each of the three winners may choose any of the prizes. Now how many possible outcomes are there? Is this larger or smaller than your answer from (b)?

(d) Same as in (c), except that rather than specifying a first, second, and third place winner, we just choose three winning students without differentiating between them. Now how many possible outcomes are there? Compare the size of your answers to (b), (c), and (d).

5.6

Use the binomial theorem (Theorem 5.10) to compute each of the following quantities.

(a) $(5 z + 2)^{3}$ .

(b) $(2 a - 3 b)^{4}$ .

5.7

The binomial coefficients satisfy many interesting identities. Give three proofs of the identity

(j n) = (j - 1 n - 1) + (j n - 1) .

(a) For Proof #1, use the definition of $(j n)$ as $\frac{n !}{( n - j )! j !}$ .

(b) For Proof #2, use the binomial theorem (Theorem 5.10) and compare the coefficients of $x^{j} y^{n - j}$ on the two sides of the identity

(x + y)^{n} = (x + y) (x + y)^{n - 1} .

(c) For Proof #3, argue directly that choosing $j$ objects from a set of $n$ objects can be decomposed into either choosing $j - 1$ objects from $n - 1$ objects or choosing $j$ objects from $n - 1$ objects.

5.8

Let $p$ be a prime number. This exercise sketches another proof of Fermat’s little theorem (Theorem 1.24).

(a) If $1 \leq j \leq p - 1$ , prove that the binomial coefficient $(j p)$ is divisible by $p$ .

(b) Use (a) and the binomial theorem (Theorem 5.10) to prove that

(a + b)^{p} \equiv a^{p} + b^{p} (mod p) for all a, b \in Z .

(d) Use (c) to deduce that $a^{p - 1} \equiv 1 (mod p)$ for all $a$ with $g cd (p, a) = 1$ .

5.9

We know that there are $n!$ different permutations of the set ${1, 2, \dots, n}$ .

(a) How many of these permutations leave no number fixed?

(b) How many of these permutations leave at least one number fixed?

(d) How many of these permutations leave at least two numbers fixed?

For each part of this problem, give a formula or algorithm that can be used to compute the answer for an arbitrary value of $n$ , and then compute the value for $n = 10$ and $n = 26$ . This exercise generalizes Exercise 1.5.

Section 5.2. The Vigenere Cipher

5.10

Encrypt each of the following Vigenere plaintexts using the given keyword and the Vigenere tableau (Table 5.1).

(a) Keyword: hamlet

Plaintext: To be, or not to be, that is the question.

(b) Keyword: fortune

Plaintext: The treasure is buried under the big W.

5.11

Decrypt each of the following Vigenere ciphertexts using the given keyword and the Vigenere tableau (Table 5.1).

(a) Keyword: condiment

Ciphertext:

r s g h z b m c x t d v f s q h n i g q x r n b m
p d n s q s m b t r k u

(b) Keyword: rabbithole

Ciphertext:

k h f e q y m s c i e t c s i g j v p w f f b s q
m o a p x z c s f x e p s o x y e n p k d a i c x
c e b s m t t p t x z o o e q l a f l g k i p o c
z s w q m t a u j w g h b o h v r j t q h u

5.12

Explain how a cipher wheel with rotating inner wheel (see Fig. 1.1 on page 3) can be used in place of a Vigenere tableau (Table 5.1) to perform Vigenere encryption and decryption. Illustrate by describing the sequence of rotations used to perform a Vigenere encryption with the keyword mouse.

5.13

Let

s = "I am the very model of a modern major general."
t = "I have information vegetable, animal, and mineral."

(a) Make frequency tables for $s$ and $t$ .

(b) Compute $IndCo (s)$ and $IndCo (t)$ .

5.14

The following strings are blocks from a Vigenere encryption. It turns out that the keyword contains a repeated letter, so two of these blocks were encrypted with the same shift. Compute $MutIndCo (s_{i}, s_{j})$ for $1 \leq i < j \leq 3$ and use these values to deduce which two strings were encrypted using the same shift.

s1 = iwseesetftuonhdptbunnybioeatneghictdnsevi
s2 = qibfhroeqeickxmirbqlflgkrqkejbejpepldfjbk
s3 = iesnnciiheptevaireittuevmhooottrtaaflnatg

5.15

(a) One of the following two strings was encrypted using a simple substitution cipher, while the other is a random string of letters. Compute the index of coincidence of each string and use the results to guess which is which.

s1 = RCZBWBFHSLPSCPILHBGZJTGBIBJGLYIJIBFHCQQFZBYFP
s2 = KHQWGIZMGKPOYRKHUITDUXLXCWZOTWPAHFOHMGFEVUEJJ

(b) One of the following two strings was encrypted using a simple substitution cipher, while the other is a random permutation of the same set of letters.

s1 = NTDCFVDHCTHKGUNGKEPGXKEWNECKEGWEWETWKUEVHDKK
     CDGCWXKDEEAMNHGNDIWUVWSSCTUNIGDSWKE
s2 = IGWSKGEHEXNGECKVWNKVWNKSUTEHTWHEKDNCDXWSIEKD
     AECKFGNDCPUCKDNCUVWEMGEKWGEUTDGTWHD

Thus their indices of coincidence are identical. Develop a method to compute a bigram index of coincidence, i.e., the frequency of pairs of letters, and use it to determine which string is most likely the encrypted text.

Bonus: Decrypt the encrypted texts in (a) and (b), but be forewarned that the plaintexts are in Latin.

5.16

Table 5.13 is a Vigenere ciphertext in which we have marked some of the repeated trigrams for you. How long do you think the keyword is? Why?

Bonus: Complete the cryptanalysis and recover the plaintext.

nhqrk vvvfe fwgjo mzjgc kocgk lejrj wossy wgvkk hnesg kwebi
bkkcj vqazx wnvll zetjc zwgqz zwhah kwdxj fgnyw gdfgh bitig
mrkwn nsuhy iecru ljjvs qlvvw zzxyv woenx ujgyr kqbfj lvjzx
dxjfg nywus rwoar xhvvx ssmja vkrwt uhktm malcz ygrsz xwnvl
lzavs hyigh rvwpn ljazl nispv jahym ntewj jvrzg qvzcr estul
fkwis tfylk ysnir rddpb svsux zjgqk xouhs zzrjj kyiwc zckov
qyhdv rhhny wqhyi rjdqm iwutf nkzgd vvibg oenwb kolca mskle
cuwwz rgusl zgfhy etfre ijjvy ghfau wvwtn xlljv vywyj apgzw
trggr dxfgs ceyts tiiih vjjvt tcxfj hciiv voaro lrxij vjnok
mvrgw kmirt twfer oimsb qgrgc

5.17

We applied a Kasiski test to the Vigenere ciphertext listed in Table 5.14 and found that the key length is probably 5. We then performed a mutual index of coincidence test to each shift of each pair of blocks and listed the results for you in Table 5.15. Use Table 5.15 to guess the relative rotations of the blocks, as we did in Table 5.6. This will give you a rotated version of the keyword. Try rotating it, as we did in Table 5.7, to find the correct keyword and decrypt the text.

Table 5.14:

togmg gbymk kcqiv dmlxk kbyif vcuek cuuis vvxqs pwwej koqgg
phumt whlsf yovww knhhm rcqfq vvhkw psued ugrsf ctwij khvfa
thkef fwptj ggviv cgdra pgwvm osqxg hkdvt whuev kcwyj psgsn
gfwsl jsfse ooqhw tofsh aciin gfbif gabgj adwsy topml ecqzw
asgvs fwrqs fsfvq rhdrs nmvmk cbhrv kblxk gzi

Table 5.15:

Blocks  Shift amount 0..12
1 2  0.044 0.047 0.021 0.054 0.046 0.038 0.022 0.034 0.057 0.035 0.040 0.023 0.038
1 3  0.038 0.031 0.027 0.037 0.045 0.036 0.034 0.032 0.039 0.039 0.047 0.038 0.050
1 4  0.025 0.039 0.053 0.043 0.023 0.035 0.032 0.043 0.029 0.040 0.041 0.050 0.027
1 5  0.050 0.050 0.025 0.031 0.038 0.045 0.037 0.028 0.032 0.038 0.063 0.033 0.034
2 3  0.035 0.037 0.039 0.031 0.031 0.035 0.047 0.048 0.034 0.031 0.031 0.067 0.053
2 4  0.040 0.033 0.046 0.031 0.033 0.023 0.052 0.027 0.031 0.039 0.078 0.034 0.029
2 5  0.042 0.040 0.042 0.029 0.033 0.035 0.035 0.038 0.037 0.057 0.039 0.038 0.040
3 4  0.032 0.033 0.035 0.049 0.053 0.027 0.030 0.022 0.047 0.036 0.040 0.036 0.052
3 5  0.043 0.043 0.040 0.034 0.033 0.034 0.043 0.035 0.026 0.030 0.050 0.068 0.044
4 5  0.045 0.033 0.044 0.046 0.021 0.032 0.030 0.038 0.047 0.040 0.025 0.037 0.068
 
Blocks  Shift amount 13..25
1 2  0.040 0.063 0.033 0.025 0.032 0.055 0.038 0.030 0.032 0.045 0.035 0.030 0.044
1 3  0.026 0.046 0.042 0.053 0.027 0.024 0.040 0.047 0.048 0.018 0.037 0.034 0.066
1 4  0.042 0.050 0.042 0.031 0.024 0.052 0.027 0.051 0.020 0.037 0.042 0.069 0.031
1 5  0.030 0.048 0.039 0.030 0.034 0.038 0.042 0.035 0.036 0.043 0.055 0.030 0.035
2 3  0.039 0.015 0.030 0.045 0.049 0.037 0.023 0.036 0.030 0.049 0.039 0.050 0.037
2 4  0.027 0.048 0.050 0.037 0.032 0.021 0.035 0.043 0.047 0.041 0.047 0.042 0.035
2 5  0.033 0.035 0.039 0.033 0.037 0.047 0.037 0.028 0.034 0.066 0.054 0.032 0.022
3 4  0.040 0.048 0.041 0.044 0.033 0.028 0.039 0.027 0.036 0.017 0.038 0.051 0.065
3 5  0.039 0.029 0.045 0.040 0.033 0.028 0.031 0.037 0.038 0.036 0.033 0.051 0.036
4 5  0.049 0.033 0.029 0.043 0.028 0.033 0.020 0.040 0.040 0.041 0.039 0.039 0.059

5.18

Table 5.16 gives a Vigenere ciphertext for you to analyze from scratch. It is probably easiest to do so by writing a computer program, but you are welcome to try to decrypt it with just paper and pencil.

mgodt beida psgls akowu hxukc iawlr csoyh prtrt udrqh cengx
uuqtu habxw dgkie ktsnp sekld zlvnh wefss glzrn peaoy lbyig
uaafv eqgjo ewabz saawl rzjpv feyky gylwu btlyd kroec bpfvt
psgki puxfb uxfuq cvymy okagl sactt uwlrx psgiy ytpsf rjfuw
igxhr oyazd rakce dxeyr pdobr buehr uwcue ekfic zehrq ijezr
xsyor tcylf egcy

(a) Make a list of matching trigrams as we did in Table 5.3. Use the Kasiski test on matching trigrams to find the likely key length.

(b) Make a table of indices of coincidence for various key lengths, as we did in Table 5.4. Use your results to guess the probable key length.

(c) Using the probable key length from (a) or (b), make a table of mutual indices of coincidence between rotated blocks, as we did in Table 5.5. Pick the largest indices from your table and use them to guess the relative rotations of the blocks, as we did in Table 5.6.

(d) Use your results from (c) to guess a rotated version of the keyword, and then try the different rotations as we did in Table 5.7 to find the correct keyword and decrypt the text.

5.19

The autokey cipher is similar to the Vigenere cipher, except that rather than repeating the key, it simply uses the key to encrypt the first few letters and then uses the plaintext itself (shifted over) to continue the encryption. For example, in order to encrypt the message “The autokey cipher is cool” using the keyword random, we proceed as follows:

Plaintext  t h e a u t o k e y c i p h e r i s c o o l
Key        r a n d o m t h e a u t o k e y c i p h e r
Ciphertext k h r d i f h r i y w b d r i p k a r v s c

The autokey cipher has the advantage that different messages are encrypted using different keys, except for the first few letters. Further, since the key does not repeat, there is no key length, so the autokey is not directly susceptible to a Kasiski or index of coincidence analysis. A disadvantage of the autokey is that a single mistake in encryption renders the remainder of the message unintelligible.

(a) Encrypt the following message using the autokey cipher:

Keyword: LEAR

Plaintext: Come not between the dragon and his wrath.

(b) Decrypt the following message using the autokey cipher:

Keyword: CORDELIA

Ciphertext: pckkm yowvz ejwzk knyzv vurux cstri tgac

Plaintext  ifmusicbethefoodofloveplayon
Ciphertext azdzwqvjjfbwnqphhmptjsszfjci

Help Eve to figure out the keyword that was used for encryption. Describe your method in sufficient generality to show that the autokey cipher is susceptible to known plaintext attacks.

(d) Bonus Problem: Try to formulate a statistical or algebraic attack on the autokey cipher, assuming that you are given a large amount of ciphertext to analyze.

Section 5.3. Probability Theory

5.20

Use the definition (5.15) of the probability of an event to prove the following basic facts about probability theory.

(a) Let $E$ and $F$ be disjoint events. Then

Pr (E \cup F) = Pr (E) + Pr (F) .

(b) Let $E$ and $F$ be events that need not be disjoint. Then

Pr (E \cup F) = Pr (E) + Pr (F) - Pr (E \cap F) .

(d) Let $E_{1}, E_{2}, E_{3}$ be events. Prove that

Pr (E_{1} \cup E_{2} \cup E_{3}) = Pr (E_{1}) + Pr (E_{2}) + Pr (E_{3}) - Pr (E_{1} \cap E_{2}) - Pr (E_{1} \cap E_{3}) - Pr (E_{2} \cap E_{3}) + Pr (E_{1} \cap E_{2} \cap E_{3}) .

The formulas in (b) and (d) and their generalization to $n$ events are known as the inclusion-exclusion principle.

5.21

We continue with the coin tossing scenario from Example 5.23, so our experiment consists in tossing a fair coin ten times. Compute the probabilities of the following events.

(a) The first and last tosses are both heads.

(b) Either the first toss or the last toss, or both, are heads.

(d) There are exactly $k$ heads and $10 - k$ tails. Compute the probability for each value of $k$ between 0 and 10. Hint: To save time, note that the probability of exactly $k$ heads is the same as the probability of exactly $k$ tails.

(e) There is an even number of heads.

(f) There is an odd number of heads.

5.22

Alice offers to make the following bet with you. She will toss a fair coin 14 times. If exactly 7 heads come up, she will give you $\$ 4 $; o t h er w i seyo u m u s t g i v e h er$ $1$. Would you take this bet? If so, and if you repeated the bet 10000 times, how much money would you expect to win or lose?

5.23

Let $E$ and $F$ be events.

(a) Prove that $Pr (E ∣ E) = 1$ . Explain in words why this is reasonable.

(b) If $E$ and $F$ are disjoint, prove that $Pr (F ∣ E) = 0$ . Explain in words why this is reasonable.

(c) Let $F_{1}, \dots, F_{n}$ be events satisfying $F_{i} \cap F_{j} = \emptyset$ for all $i \neq = j$ . We say that $F_{1}, \dots, F_{n}$ are pairwise disjoint. Prove then that

Pr (i = 1 ⋃ n F_{i}) = i = 1 \sum n Pr (F_{i}) .

(d) Let $F_{1}, \dots, F_{n}$ be pairwise disjoint as in (c), and assume further that

F_{1} \cup \dots \cup F_{n} = Ω,

where recall that $Ω$ is the entire sample space. Prove the following general version of the decomposition formula (5.20) in Proposition 5.24(a):

Pr (E) = i = 1 \sum n Pr (E ∣ F_{i}) Pr (F_{i}) .

(e) Prove a general version of Bayes’s formula:

Pr (F_{i} ∣ E) = \frac{Pr ( E ∣ F _{i} ) Pr ( F _{i} )}{Pr ( E ∣ F _{1} ) Pr ( F _{1} ) + Pr ( E ∣ F _{2} ) Pr ( F _{2} ) + \dots + Pr ( E ∣ F _{n} ) Pr ( F _{n} )} .

5.24

There are two urns containing pens and pencils. Urn #1 contains three pens and seven pencils and Urn #2 contains eight pens and four pencils.

(a) An urn is chosen at random and an object is drawn. What is the probability that it is a pencil?

(b) An urn is chosen at random and an object is drawn. If the object drawn is a pencil, what is the probability that it came from Urn #1?

(c) If an urn is chosen at random and two objects are drawn simultaneously, what is the probability that both are pencils?

5.25

An urn contains 20 silver coins and 10 gold coins. You are the sixth person in line to randomly draw and keep a coin from the urn.

(a) What is the probability that you draw a gold coin?

(b) If you draw a gold coin, what is the probability that the five people ahead of you all drew silver coins?

5.26

Consider the three prisoners scenario described in Example 5.26. Let $A$ , $B$ , and $C$ denote respectively the events that Alice is to be released, Bob is to be released, and Carl is to be released, which we assume to be equally likely, so $Pr (A) = Pr (B) = Pr (C) = \frac{1}{3}$ . Also let $J$ be the event that the jailer tells Alice that Bob is to stay in jail.

(a) Compute the values of $Pr (B ∣ J)$ , $Pr (J ∣ B)$ , and $Pr (J ∣ C)$ .

(b) Compute the values of $Pr (J ∣ A^{c})$ and $Pr (J^{c} ∣ A^{c})$ , where the event $A^{c}$ is the event that Alice stays in jail.

(c) Suppose that if Alice is the one who is to be released, then the jailer flips a fair coin to decide whether to tell Alice that Bob stays in jail or that Carl stays in jail. What is the value of $Pr (A ∣ J)$ ?

(d) Suppose instead that if Alice is the one who is to be released, then the jailer always tells her that Bob will stay in jail. Now what is the value of $Pr (A ∣ J)$ ?

Other similar problems with counterintuitive conclusions include the Monty Hall problem (Exercise 5.27), Bertrand’s box paradox, and the principle of restricted choice in contract bridge.

5.27

The Monty Hall Problem. Monty Hall gives Dan the choice of three curtains. Behind one curtain is a car, while behind the other two curtains are goats. Dan chooses a curtain, but before it is opened, Monty Hall opens one of the other curtains and reveals a goat. He then offers Dan the option of keeping his original curtain or switching to the remaining closed curtain. The Monty Hall problem is to figure out Dan’s best strategy: “To stick or to switch?”

(a) What is the probability that Dan wins the car if he always sticks to his first choice of curtain? What is the probability that Dan wins the car if he always switches curtains? Which is his best strategy? If the answer seems counterintuitive, suppose instead that there are 1000 curtains and that Monty Hall opens 998 goat curtains. Now what are the winning probabilities for the two strategies?

(b) Suppose that we give Monty Hall another option, namely he’s allowed to force Dan to stick with his first choice of curtain. Assuming that Monty Hall dislikes giving away cars, now what is Dan’s best strategy, and what is his probability of winning a car?

(c) More generally, suppose that there are $N$ curtains and $M$ cars, and suppose that Monty Hall opens $K$ curtains that have goats behind them. Compute the probabilities

Pr (Dan wins a car ∣ Dan sticks), Pr (Dan wins a car ∣ Dan switches) .

Which is the better strategy?

5.28

Let $S$ be a set, let $A$ be a property of interest, and suppose that for $m \in S$ , we have

Pr (m does not have property A) = δ .

Suppose further that a Monte Carlo algorithm applied to $m$ and a random number $r$ satisfy:

If the algorithm returns Yes, then $m$ definitely has property $A$ .
If $m$ has property $A$ , then the probability that the algorithm returns Yes is at least $p$ .

Notice that we can restate (1) and (2) as conditional probabilities:

$Pr (m has property A ∣ algorithm returns Yes) = 1$ .
$Pr (algorithm returns Yes ∣ m has property A) \geq p$ .

Suppose that we run the algorithm $N$ times on the number $m$ , and suppose that the algorithm returns No every single time. Derive a lower bound, in terms of $δ$ , $p$ , and $N$ , for the probability that $m$ does not have property $A$ . This generalizes the version of the Monte Carlo method that we studied in Sect. 5.3.3 with $δ = 0.01$ and $p = \frac{1}{2}$ . Be careful to distinguish $p$ from $1 - p$ in your calculations.

5.29

We continue with the setup described in Exercise 5.28.

(a) Suppose that $δ = \frac{9}{10}$ and $p = \frac{3}{4}$ . If we run the algorithm 25 times on the input $m$ and always get back No, what is the probability that $m$ does not have property $A$ ?

(b) Same question as (a), but this time we run the algorithm 100 times.

(c) Suppose that $δ = \frac{99}{100}$ and $p = \frac{1}{2}$ . How many times should we run the algorithm on $m$ to be $99%$ confident that $m$ does not have property $A$ , assuming that every output is No?

(d) Same question as (c), except now we want to be $99.9999%$ confident.

5.30

If an integer $n$ is composite, then the Miller-Rabin test has at least a $75%$ chance of succeeding in proving that $n$ is composite, while it never misidentifies a prime as being composite. Suppose that we run the Miller-Rabin test $N$ times on the integer $n$ and that it fails to prove that $n$ is composite. Show that the probability that $n$ is prime satisfies approximately

Pr (n is prime ∣ the Miller-Rabin test fails N times) \geq 1 - \frac{ln ( n )}{4 ^{N}} .

Hint: Use Exercise 5.28 with appropriate choices of $A$ , $S$ , $δ$ , and $p$ . You may also use the estimate from Sect. 3.4.1 that the probability that $n$ is prime is approximately $1/ ln (n)$ .

5.31

It is natural to assume that if $Pr (E ∣ F)$ is significantly larger than $Pr (E)$ , then somehow $F$ is causing $E$ . Bayes’s formula illustrates the fallacy of this sort of reasoning, since it says that

\frac{Pr ( E ∣ F )}{Pr ( E )} = \frac{Pr ( F ∣ E )}{Pr ( F )} .

So if $F$ is “causing” $E$ , then the same reasoning shows that $E$ is “causing” $F$ . All that one can really say is that $E$ and $F$ are correlated with one another, in the sense that either one of them being true makes it more likely that the other one is true. It is incorrect to deduce a cause-and-effect relation.

Here is a concrete example. Testing shows that first graders are more likely to be good spellers if their shoe sizes are larger than average. This is an experimental fact. Hence if we stretch a child’s foot, it will make them a better speller. Alternatively, by Bayes’s formula, if we give them extra spelling lessons, then their feet will grow faster. Explain why these last two assertions are nonsense, and describe what’s really going on.

5.32

Let $f_{X} (k)$ be the binomial density function (5.23). Prove directly, using the binomial theorem, that $\sum_{k = 0}^{n} f_{X} (k) = 1$ .

5.33

In Example 5.37 we used a differentiation trick to compute the value of the infinite series $\sum_{n = 1}^{\infty} n p (1 - p)^{n - 1}$ . This exercise further develops this useful technique. The starting point is the formula for the geometric series

n = 0 \sum \infty x^{n} = \frac{1}{1 - x} for ∣ x ∣ < 1 (5.57)

and the differential operator

D = x \frac{d}{d x} .

(a) Using the fact that $D (x^{n}) = n x^{n}$ , prove that

n = 1 \sum \infty n x^{n} = \frac{x}{( 1 - x ) ^{2}} (5.58)

by applying $D$ to both sides of (5.57). For which $x$ does the left-hand side of (5.58) converge? Hint: Use the ratio test.

(b) Applying $D$ again, prove that

n = 0 \sum \infty n^{2} x^{n} = \frac{x + x ^{2}}{( 1 - x ) ^{3}} . (5.59)

n = 0 \sum \infty n^{k} x^{n} = \frac{F _{k} ( x )}{( 1 - x ) ^{k + 1}} . (5.60)

Hint: Use induction on $k$ .

(d) The first few polynomials $F_{k} (x)$ in (c) are $F_{0} (x) = 1$ , $F_{1} (x) = x$ , and $F_{2} (x) = x + x^{2}$ . These follow from (5.57), (5.58), and (5.59). Compute $F_{3} (x)$ and $F_{4} (x)$ .

(e) Prove that the polynomial $F_{k} (x)$ in (c) has degree $k$ .

5.34

In each case, compute the expectation of the random variable $X$ .

(a) The values of $X$ are uniformly distributed on the set ${0, 1, 2, \dots, N - 1}$ . See Example 5.28.

(b) The values of $X$ are uniformly distributed on the set ${1, 2, \dots, N}$ .

(d) $X$ is a random variable with a binomial density function; see formula (5.23) in Example 5.29 on page 240.

5.35

Let $X$ be a random variable on the probability space $Ω$ . It might seem more natural to define the expected value of $X$ by the formula

ω \in Ω \sum X (ω) Pr (ω) . (5.61)

Prove that the formula (5.61) gives the same value as Eq. (5.27) on page 244, which we used in the text to define $E (X)$ .

Section 5.4. Collision Algorithms and the Birthday Paradox

5.36

(a) In a group of 23 strangers, what is the probability that at least two of them have the same birthday? How about if there are 40 strangers? In a group of 200 strangers, what is the probability that one of them has the same birthday as your birthday? Hint: See the discussion in Sect. 5.4.1.

(b) Suppose that there are $N$ days in a year, where $N$ could be any number, and that there are $n$ people. Develop a general formula, analogous to (5.28), for the probability that at least two of them have the same birthday. Hint: Do a calculation similar to the proof of (5.28) in the collision theorem (Theorem 5.38), but note that the formula is a bit different because the birthdays are being selected from a single list of $N$ days.

Pr (at least one match) \geq 1 - e^{- (some function of n and N)}

for the probability in (b), analogous to the estimate (5.29).

5.37

A deck of cards is shuffled and the top eight cards are turned over.

(a) What is the probability that the king of hearts is visible?

(b) A second deck is shuffled and its top eight cards are turned over. What is the probability that a visible card from the first deck matches a visible card from the second deck? Note that this is slightly different from Example 5.39 because the cards in the second deck are not being replaced.

5.38

(a) Prove that

e^{- x} \geq 1 - x for all values of x .

Hint: Look at the graphs of $e^{- x}$ and $1 - x$ , or use calculus to compute the minimum of the function $f (x) = e^{- x} - (1 - x)$ .

(b) Prove that for all $a > 1$ , the inequality

e^{- a x} \leq (1 - x)^{a} + \frac{1}{2} a x^{2}

is valid for all $0 \leq x \leq 1$ . This is a challenging problem.

(c) We used the inequality in (a) during the proof of the lower bound (5.29) in the collision theorem (Theorem 5.38). Use (b) to prove that

Pr (at least one red) \leq 1 - e^{- mn / N} + \frac{m n ^{2}}{2 N ^{2}} .

Thus if $N$ is large and $m$ and $n$ are not much larger than $N$ , then the estimate

Pr (at least one red) \approx 1 - e^{- mn / N}

is quite accurate. Hint: Use (b) with $a = m$ and $x = n / N$ .

5.39

Solve the discrete logarithm problem $1 0^{x} = 106$ in the finite field $F_{811}$ by finding a collision among the random powers $1 0^{i}$ and $106 \cdot 1 0^{i}$ that are listed in Table 5.17.

i    g^i  h*g^i      i    g^i  h*g^i      i    g^i  h*g^i      i    g^i  h*g^i
116   96    444     519   291     28     791   496    672     406   801    562
497  326    494     286   239    193     385   437     95     745   194    289
225  757    764     298   358    642     178   527    714     234   304    595
233  517    465     500   789    101     471   117    237     556   252    760
677  787    700     272    24    111      42   448    450     326   649    670
622  523    290     307   748    621     258   413    795     399   263    304

Section 5.5. Pollard’s Rho Method

5.40

Table 5.18 gives some of the computations for the solution of the discrete logarithm problem

1 1^{t} = 41387 in F_{81799} (5.62)

using Pollard’s rho method. Use the data in Table 5.18 to solve (5.62).

i    x_i    y_i    alpha_i  beta_i  gamma_i  delta_i
0       1      1        0       0        0        0
1      11    121        1       0        2        0
2     121  14641        2       0        4        0
3    1331  42876        3       0       12        2
4   14641   7150        4       0       25        4
...
151  4862  33573    40876   45662    29798    73363
152 23112  53431    81754    9527    37394    48058
153  8835  23112    81755    9527    67780    28637
154 15386  15386    81756    9527    67782    28637

5.41

Table 5.19 gives some of the computations for the solution of the discrete logarithm problem

7^{t} = 3018 in F_{7963} (5.63)

using Pollard’s rho method. Extend Table 5.19 until you find a collision, and then solve (5.63).

i   x_i   y_i   alpha_i  beta_i  gamma_i  delta_i
0     1     1       0       0        0        0
1     7    49       1       0        2        0
2    49  2401       2       0        4        0
3   343  6167       3       0        6        0
4  2401  1399       4       0        7        1
...
87 1329  1494    6736    7647    3148     3904
88 1340  1539    6737    7647    3150     3904
89 1417  4767    6738    7647    6302     7808
90 1956  1329    6739    7647    4642     7655

5.42

Write a computer program implementing Pollard’s rho method for solving the discrete logarithm problem and use it to solve each of the following:

(a) $2^{t} = 2495$ in $F_{5011}$ .

(b) $1 7^{t} = 14226$ in $F_{17959}$ .

5.43

Evaluate the integral $I = \int_{0}^{\infty} t^{2} e^{- t^{2} /2} d t$ appearing in the proof of Theorem 5.48. Hint: Write $I^{2}$ as an iterated integral,

I^{2} = \int_{0}^{\infty} \int_{0}^{\infty} x^{2} e^{- x^{2} /2} \cdot y^{2} e^{- y^{2} /2} d x d y,

and switch to polar coordinates.

5.44

This exercise describes Pollard’s rho factorization algorithm. Let $N$ be an integer that is not prime, and let

f : Z / N Z ⟶ Z / N Z

be a mixing function, for example $f (x) = x^{2} + 1 mod N$ . As in the abstract version of Pollard’s rho method (Theorem 5.48), let $x_{0} = y_{0}$ be an initial value, and generate sequences by setting $x_{i + 1} = f (x_{i})$ and $y_{i + 1} = f (f (y_{i}))$ . At each step, also compute the greatest common divisor

g_{i} = g cd (∣ x_{i} - y_{i} ∣, N) .

(a) Let $p$ be the smallest prime divisor of $N$ . If the function $f$ is sufficiently random, show that with high probability we have

g_{k} = p for some k = O (p) .

Hence the algorithm factors $N$ in $O (p)$ steps.

(b) Program Pollard’s rho algorithm with $f (x) = x^{2} + 1$ and $x_{0} = y_{0} = 0$ , and use it to factor the following numbers. In each case, give the smallest value of $k$ such that $g_{k}$ is a nontrivial factor of $N$ and print the ratio $k / N$ .

(i) $N = 2201$ .

(ii) $N = 9409613$ .

(iii) $N = 1782886219$ .

(d) Explain what happens if you run Pollard’s rho algorithm and $N$ is prime.

(e) Explain what happens if you run Pollard’s rho algorithm with $f (x) = x^{2}$ and any initial values for $x_{0}$ .

(f) Try running Pollard’s rho algorithm with the function $f (x) = x^{2} - 2$ . Explain what is happening. Hint: This part is more challenging. It may help to use the identity $f^{n} (u + u^{- 1}) = u^{2^{n}} + u^{- 2^{n}}$ , which you can prove by induction.

Section 5.6. Information Theory

5.45

Consider the cipher that has three keys, three plaintexts, and four ciphertexts that are combined using the following encryption table.

	$m_{1}$	$m_{2}$	$m_{3}$
$k_{1}$	$c_{2}$	$c_{4}$	$c_{1}$
$k_{2}$	$c_{1}$	$c_{3}$	$c_{2}$
$k_{3}$	$c_{3}$	$c_{1}$	$c_{2}$

Suppose further that the plaintexts and keys are used with the following probabilities:

f (m_{1}) = f (m_{2}) = \frac{2}{5}, f (m_{3}) = \frac{1}{5}, f (k_{1}) = f (k_{2}) = f (k_{3}) = \frac{1}{3} .

(a) Compute $f (c_{1})$ , $f (c_{2})$ , $f (c_{3})$ , and $f (c_{4})$ .

(b) Compute $f (c_{1} ∣ m_{1})$ , $f (c_{1} ∣ m_{2})$ , and $f (c_{1} ∣ m_{3})$ . Does this cryptosystem have perfect secrecy?

(d) Compute $f (k_{1} ∣ c_{3})$ and $f (k_{2} ∣ c_{3})$ .

5.46

Suppose that a shift cipher is employed such that each key, i.e., each shift amount from 0 to 25, is used with equal probability and such that a new key is chosen to encrypt each successive letter. Show that this cryptosystem has perfect secrecy by filling in the details of the following steps.

(a) Show that $\sum_{k \in K} f_{M} (d_{k} (c)) = 1$ for every ciphertext $c \in C$ .

(b) Compute the ciphertext density function $f_{C}$ using (5.47), which in this case says that

f_{C} (c) = k \in K \sum f_{K} (k) f_{M} (d_{k} (c)) .

5.47

Give the details of the proof of (5.47), which says that

f_{C} (c) = k \in K c \in e_{k} (M) \sum f_{K} (k) f_{M} (d_{k} (c)) .

Hint: Use the decomposition formula from Exercise 5.23(d).

5.48

Suppose that a cryptosystem has the same number of plaintexts as it does ciphertexts, $# M = # C$ . Prove that for any given key $k \in K$ and any given ciphertext $c \in C$ , there is a unique plaintext $m \in M$ that encrypts to $c$ using the key $k$ . We used this fact during the proof of Theorem 5.56. Notice that the proof does not require the cryptosystem to have perfect secrecy; all that is needed is that $# M = # C$ .

5.49

Let

S_{m, c} = {k \in K : e_{k} (m) = c}

be the set used during the proof of Theorem 5.56. Prove that if $c \neq = c^{'}$ , then $S_{m, c} \cap S_{m, c^{'}} = \emptyset$ . Prove this for any cryptosystem; it is not necessary to assume perfect secrecy.

5.50

Suppose that a cryptosystem satisfies $# K = # M = # C$ and that it has perfect secrecy. Prove that every ciphertext is used with equal probability and that every plaintext is used with equal probability. Hint: We proved one of these during the course of proving Theorem 5.56. The proof of the other is similar.

5.51

Prove the “only if” part of Theorem 5.56, i.e., prove that if a cryptosystem with an equal number of keys, plaintexts, and ciphertexts satisfies conditions (a) and (b) of Theorem 5.56, then it has perfect secrecy.

5.52

Let $X_{n}$ be a uniformly distributed random variable on $n$ objects, and let $r \geq 1$ . Prove directly from Property H3 of entropy that

H (X_{n}^{r}) = rH (X_{n}) .

This generalizes Example 5.58.

5.53

Let $X$ , $Y$ , and $Z_{1}, \dots, Z_{m}$ be random variables as described in Property H3 on page 270. Let

p_{i} = Pr (Y = Z_{i}) and q_{ij} = Pr (Z_{i} = x_{ij}),

so $Pr (X = x_{ij}) = p_{i} q_{ij}$ .

With this notation, Property H3 says that

H ((p_{i} q_{ij})_{1 \leq i \leq n 1 \leq j \leq m_{i}}) = H ((p_{i})_{1 \leq i \leq n}) + i = 1 \sum n p_{i} H ((q_{ij})_{1 \leq j \leq m_{i}}) .

Then the formula (5.51) for entropy given in Theorem 5.60 implies that

i = 1 \sum n j = 1 \sum m_{i} p_{i} q_{ij} lo g_{2} (p_{i} q_{ij}) = i = 1 \sum n p_{i} lo g_{2} (p_{i}) + i = 1 \sum n p_{i} j = 1 \sum m_{i} q_{ij} lo g_{2} (q_{ij}) . (5.64)

Prove directly that (5.64) is true. Hint: Remember that the probabilities satisfy $\sum_{i} p_{i} = 1$ and $\sum_{j} q_{ij} = 1$ .

5.54

Let $F (x)$ be a twice differentiable function with the property that $F^{''} (x) < 0$ for all $x$ in its domain. Prove that $F$ is concave in the sense of (5.52). Conclude in particular that the function $F (x) = lo g x$ is concave for all $x > 0$ .

5.55

Use induction to prove Jensen’s inequality (Theorem 5.61).

5.56

Let $X$ and $Y$ be independent random variables.

(a) Prove that the equivocation $H (X ∣ Y)$ is equal to the entropy $H (X)$ .

(b) If $H (X ∣ Y) = H (X)$ , is it necessarily true that $X$ and $Y$ are independent?

5.57

Prove that key equivocation satisfies the formula

H (K ∣ C) = H (K) + H (M) - H (C)

as described in Proposition 5.64.

5.58

We continue with the cipher described in Exercise 5.45.

(a) Compute the entropies $H (K)$ , $H (M)$ , and $H (C)$ .

(b) Compute the key equivocation $H (K ∣ C)$ .

5.59

Suppose that the key equivocation of a certain cryptosystem vanishes, i.e., suppose that $H (K ∣ C) = 0$ . Prove that even a single observed ciphertext uniquely determines which key was used.

5.60

Write a computer program that reads a text file and performs the following tasks:

Convert all alphabetic characters to lowercase and convert all strings of consecutive nonalphabetic characters to a single space. The reason for leaving in a space is that when you count bigrams and trigrams, you will want to know where words begin and end.
Count the frequency of each letter a to z, print a frequency table, and use your frequency table to estimate the entropy of a single letter in English, as we did in Sect. 5.6.3 using Table 1.3.
Count the frequency of each bigram aa, ab, …, zz, being careful to include only bigrams that appear within words. As an alternative, also allow bigrams that either start or end with a space, in which case there are $2 7^{2} - 1 = 728$ possible bigrams. Print a frequency table of the 25 most common bigrams and their probabilities, and use your full frequency table to estimate the entropy of bigrams in English. In the notation of Sect. 5.6.3, this is the quantity $H (L^{2})$ . Compare $\frac{1}{2} H (L^{2})$ with the value of $H (L)$ from step [1].
Repeat [3], but this time with trigrams. Compare $\frac{1}{3} H (L^{3})$ with the values of $H (L)$ and $\frac{1}{2} H (L^{2})$ from [2] and [3]. Note that for this part, you will need a large quantity of text in order to get some reasonable frequencies.

Try running your program on some long blocks of text. For example, the following noncopyrighted material is available in the form of ordinary text files from Project Gutenberg at http://www.gutenberg.org/. To what extent are the letter frequencies similar and to what extent do they differ in these different texts?

(a) Alice’s Adventures in Wonderland by Lewis Carroll, http://www.gutenberg.org/etext/11

(b) Relativity: the Special and General Theory by Albert Einstein, http://www.gutenberg.org/etext/5001

(d) 20000 Lieues Sous Les Mers (20000 Leagues Under the Sea) by Jules Verne, http://www.gutenberg.org/etext/5097. Note that this one is a little trickier, since first you will need to convert all of the letters to their unaccented forms.

Takashi's Notes

Explorer

Combinatorics Probability and Information Theory

Combinatorics Probability and Information Theory

Exercises

Section 5.1. Basic Principles of Counting

5.1

5.2

5.3

5.4

5.5

5.6

5.7

5.8

5.9

Section 5.2. The Vigenere Cipher

5.10

5.11

5.12

5.13

5.14

5.15

5.16

5.17

5.18

5.19

Section 5.3. Probability Theory

5.20

5.21

5.22

5.23

5.24

5.25

5.26

5.27

5.28

5.29

5.30

5.31

5.32

5.33

5.34

5.35

Section 5.4. Collision Algorithms and the Birthday Paradox

5.36

5.37

5.38

5.39

Section 5.5. Pollard’s Rho Method

5.40

5.41

5.42

5.43

5.44

Section 5.6. Information Theory

5.45

5.46

5.47

5.48

5.49

5.50

5.51

5.52

5.53

5.54

5.55

5.56

5.57

5.58

5.59

5.60

Graph View

Table of Contents

Backlinks