Conditional expectation


Problems

Exercises marked “Stat110 solution available.” have detailed solutions at http://stat110.net.

Conditional expectation given an event

9.1

Fred wants to travel from Blotchville to Blissville, and is deciding between 3 options (involving different routes or different forms of transportation). The \(j\)th option would take an average of \(\mu_j\) hours, with a standard deviation of \(\sigma_j\) hours. Fred randomly chooses between the 3 options, with equal probabilities. Let \(T\) be how long it takes for him to get from Blotchville to Blissville.

(a) Find \(E(T)\). Is it simply \((\mu_1 + \mu_2 + \mu_3)/3\), the average of the expectations?

(b) Find \(\text{Var}(T)\). Is it simply \((\sigma_1^2 + \sigma_2^2 + \sigma_3^2)/3\), the average of the variances?


9.2

While Fred is sleeping one night, \(X\) legitimate emails and \(Y\) spam emails are sent to him. Suppose that \(X\) and \(Y\) are independent, with \(X \sim \text{Pois}(10)\) and \(Y \sim \text{Pois}(40)\). When he wakes up, he observes that he has 30 new emails in his inbox. Given this information, what is the expected value of how many new legitimate emails he has?


9.3

A group of 21 women and 14 men are enrolled in a medical study. Each of them has a certain disease with probability \(p\), independently. It is then found (through extremely reliable testing) that exactly 5 of the people have the disease. Given this information, what is the expected number of women who have the disease?


9.4

A researcher studying crime is interested in how often people have gotten arrested. Let \(X \sim \text{Pois}(\lambda)\) be the number of times that a random person got arrested in the last 10 years. However, data from police records are being used for the researcher’s study, and people who were never arrested in the last 10 years do not appear in the records. In other words, the police records have a selection bias: they only contain information on people who have been arrested in the last 10 years.

So averaging the numbers of arrests for people in the police records does not directly estimate \(E(X)\); it makes more sense to think of the police records as giving us information about the conditional distribution of how many times a person was arrested, given that the person was arrested at least once in the last 10 years. The conditional distribution of \(X\), given that \(X \geq 1\), is called a truncated Poisson distribution (see Exercise 14 from Chapter 3 for another example of this distribution).

(a) Find \(E(X \mid X \geq 1)\).

(b) Find \(\text{Var}(X \mid X \geq 1)\).


9.5

A fair 20-sided die is rolled repeatedly, until a gambler decides to stop. The gambler pays $1 per roll, and receives the amount shown on the die when the gambler stops (e.g., if the die is rolled 7 times and the gambler decides to stop then, with an 18 as the value of the last roll, the net payoff is $18 - $7 = $11). Suppose the gambler uses the following strategy: keep rolling until a value of \(m\) or greater is obtained, and then stop (where \(m\) is a fixed integer between 1 and 20).

(a) What is the expected net payoff?

Hint: The average of consecutive integers is the same as the average of the first and last of these. See the math appendix for more information about series.

(b) Use R or other software to find the optimal value of \(m\).
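For (b), here is a minimal sketch in Python, standing in for R. It assumes the reasoning suggested by the hint in (a): with threshold \(m\), each roll is at least \(m\) with probability \((21 - m)/20\), so the number of $1 rolls has a First Success distribution with mean \(20/(21 - m)\), and the stopping roll is equally likely to be any of \(m, \dots, 20\). The function names are illustrative. Exact fractions remove any doubt from rounding when neighboring thresholds give close payoffs.

```python
from fractions import Fraction


def expected_net_payoff(m: int) -> Fraction:
    """Expected net payoff in dollars for the rule 'stop at the first roll >= m'."""
    if not 1 <= m <= 20:
        raise ValueError("m must be an integer from 1 to 20")
    # Each roll is >= m with probability (21 - m)/20, so the expected number of
    # $1 rolls is the First Success mean 20/(21 - m).
    expected_cost = Fraction(20, 21 - m)
    # The stopping roll is uniform on m, ..., 20; its mean is the average of
    # the first and last of these integers (the hint in part (a)).
    expected_prize = Fraction(m + 20, 2)
    return expected_prize - expected_cost


def optimal_threshold() -> int:
    """Brute-force search over the 20 possible thresholds."""
    return max(range(1, 21), key=expected_net_payoff)


if __name__ == "__main__":
    for m in range(1, 21):
        print(f"m = {m:2d}: expected net payoff = {float(expected_net_payoff(m)):.4f}")
    print("optimal m:", optimal_threshold())
```

With only 20 candidates, brute force is simpler and less error-prone than calculus on a discrete variable.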


9.6

Let \(X \sim \text{Expo}(\lambda)\). Find \(E(X \mid X < 1)\) in two different ways:

(a) by calculus, working with the conditional PDF of \(X\) given \(X < 1\);

(b) without calculus, by expanding \(E(X)\) using the law of total expectation.


9.7

You are given an opportunity to bid on a mystery box containing a mystery prize. The value of the prize is completely unknown, except that it is worth at least nothing, and at most a million dollars. So the true value \(V\) of the prize is considered to be Uniform on \([0, 1]\) (measured in millions of dollars).

You can choose to bid any nonnegative amount \(b\) (in millions of dollars). If \(b < \frac{2}{3}V\), then your bid is rejected and nothing is gained or lost. If \(b \geq \frac{2}{3}V\), then your bid is accepted and your net payoff is \(V - b\) (since you pay \(b\) to get a prize worth \(V\)).

Find your expected payoff as a function of \(b\) (be sure to specify it for all \(b \geq 0\)). Then find the optimal bid \(b\), to maximize your expected payoff.


9.8

Stat110 solution available.

You get to choose between two envelopes, each of which contains a check for some positive amount of money. Unlike in the two-envelope paradox, it is not given that one envelope contains twice as much money as the other envelope. Instead, assume that the two values were generated independently from some distribution on the positive real numbers, with no information given about what that distribution is.

After picking an envelope, you can open it and see how much money is inside (call this value \(x\)), and then you have the option of switching. As no information has been given about the distribution, it may seem impossible to have better than a 50% chance of picking the better envelope. Intuitively, we may want to switch if \(x\) is “small” and not switch if \(x\) is “large”, but how do we define “small” and “large” in the grand scheme of all possible distributions? [The last sentence was a rhetorical question.]

Consider the following strategy for deciding whether to switch. Generate a threshold \(T \sim \text{Expo}(1)\), and switch envelopes if and only if the observed value \(x\) is less than the value of \(T\). Show that this strategy succeeds in picking the envelope with more money with probability strictly greater than \(1/2\).

Hint: Let \(t\) be the value of \(T\) (generated by a random draw from the \(\text{Expo}(1)\) distribution). First explain why the strategy works very well if \(t\) happens to be in between the two envelope values, and does no harm in any case (i.e., there is no case in which the strategy succeeds with probability strictly less than \(1/2\)).
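A simulation can make the claim concrete before you write the proof. The sketch below assumes the threshold is drawn as \(T \sim \text{Expo}(1)\), independently of the envelopes. The problem deliberately leaves the envelope distribution unspecified, so the code takes a sampler as an argument. The Expo(1) and Uniform(0, 10) samplers in the usage block are hypothetical choices for illustration only.

```python
import random


def estimate_success(draw_amount, n_trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(the threshold strategy ends with the larger check).

    draw_amount(rng) returns one envelope value; it stands in for the unknown
    distribution, and the strategy itself never looks at it.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        opened = draw_amount(rng)     # value seen in the envelope we picked
        other = draw_amount(rng)      # value in the other envelope
        t = rng.expovariate(1.0)      # threshold T ~ Expo(1)
        final = other if opened < t else opened   # switch iff observed value < T
        wins += int(final == max(opened, other))
    return wins / n_trials


if __name__ == "__main__":
    # Two hypothetical envelope distributions, chosen only for illustration.
    print(estimate_success(lambda rng: rng.expovariate(1.0)))
    print(estimate_success(lambda rng: rng.uniform(0, 10)))
```

Changing the sampler moves the estimate around. Up to simulation noise, it should never fall below 1/2, in line with the "does no harm" part of the hint.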


9.9

There are two envelopes, each of which has a check for a \(\text{Unif}(0, 1)\) amount of money, measured in thousands of dollars. The amounts in the two envelopes are independent. You get to choose an envelope and open it, and then you can either keep that amount or switch to the other envelope and get whatever amount is in that envelope.

Suppose that you use the following strategy: choose an envelope and open it. If you observe \(x\), then stick with that envelope with probability \(x\), and switch to the other envelope with probability \(1 - x\).

(a) Find the probability that you get the larger of the two amounts.

(b) Find the expected value of what you will receive.


9.10

Suppose \(n\) people are bidding on a mystery prize that is up for auction. The bids are to be submitted in secret, and the individual who submits the highest bid wins the prize. The \(j\)th bidder receives a signal \(X_j\), with \(X_1, \dots, X_n\) i.i.d. The value of the prize, \(V\), is defined to be the sum of the individual bidders’ signals:

\[V = X_1 + X_2 + \dots + X_n.\]

This is known in economics as the wallet game: we can imagine that the \(n\) people are bidding on the total amount of money in their wallets, and each person’s signal is the amount of money in their own wallet. Of course, the wallet is a metaphor; the game can also be used to model company takeovers, where each of two companies bids to take over the other, and a company knows its own value but not the value of the other company. For this problem, assume the \(X_j\) are i.i.d. \(\text{Unif}(0, 1)\).

(a) Before receiving her signal, what is bidder 1’s unconditional expectation for \(V\)?

(b) Conditional on receiving the signal \(X_1 = x_1\), what is bidder 1’s expectation for \(V\)?

(c) Suppose each bidder submits a bid equal to their conditional expectation for \(V\), i.e., bidder \(j\) bids \(E(V \mid X_j = x_j)\). Conditional on receiving the signal \(X_1 = x_1\) and winning the auction, what is bidder 1’s expectation for \(V\)? Explain intuitively why this quantity is always less than the quantity calculated in (b).


9.11

Stat110 solution available.

A coin with probability \(p\) of Heads is flipped repeatedly. For (a) and (b), suppose that \(p\) is a known constant, with \(0 < p < 1\).

(a) What is the expected number of flips until the pattern HT is observed?

(b) What is the expected number of flips until the pattern HH is observed?

(c) Now suppose that \(p\) is unknown, and that we use a \(\text{Beta}(a, b)\) prior to reflect our uncertainty about \(p\) (where \(a\) and \(b\) are known constants and are greater than 2). In terms of \(a\) and \(b\), find the corresponding answers to (a) and (b) in this setting.
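Here is a Monte Carlo sketch for checking answers to (a) through (c). The helper flips_until and the other names are illustrative. For (c), \(p\) is drawn once from the \(\text{Beta}(a, b)\) prior for each simulated sequence, because all flips share the same unknown \(p\). When \(a\) is close to 2, the waiting time has a heavy right tail under the prior, so averages in (c) can converge slowly.

```python
import random


def flips_until(pattern: str, p: float, rng: random.Random) -> int:
    """Flip a coin with P(Heads) = p until `pattern` (e.g. "HT") appears; return the number of flips."""
    window = ""                      # the most recent len(pattern) outcomes
    flips = 0
    while window != pattern:
        window = (window + ("H" if rng.random() < p else "T"))[-len(pattern):]
        flips += 1
    return flips


def mean_flips(pattern: str, p: float, n_trials: int = 100_000, seed: int = 0) -> float:
    """Average waiting time for `pattern` when p is a known constant (parts (a), (b))."""
    rng = random.Random(seed)
    return sum(flips_until(pattern, p, rng) for _ in range(n_trials)) / n_trials


def mean_flips_beta_prior(pattern: str, a: float, b: float,
                          n_trials: int = 100_000, seed: int = 0) -> float:
    """Average waiting time when p ~ Beta(a, b) is drawn once per sequence (part (c))."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        p = rng.betavariate(a, b)    # one draw of p shared by the whole sequence
        total += flips_until(pattern, p, rng)
    return total / n_trials


if __name__ == "__main__":
    print(mean_flips("HT", 0.5), mean_flips("HH", 0.5))
    print(mean_flips_beta_prior("HH", 5, 3))   # hypothetical prior parameters
```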


9.12

A coin with probability \(p\) of Heads is flipped repeatedly, where \(0 < p < 1\). The sequence of outcomes can be divided into runs (blocks of H’s or blocks of T’s), e.g., HHHTTTTHTTTHH becomes HHH TTTT H TTT HH, which has 5 runs, with lengths 3, 4, 1, 3, 2, respectively. Assume that the coin is flipped at least until the start of the third run.

(a) Find the expected length of the first run.

(b) Find the expected length of the second run.


9.13

A fair 6-sided die is rolled once. Find the expected number of additional rolls needed to obtain a value at least as large as that of the first roll.
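The law of total expectation, conditioning on the first roll, turns this into a short exact computation, and a simulation gives an independent check. Below is a minimal Python sketch. The sides parameter is an assumption added only so the code can be reused for other dice.

```python
import random
from fractions import Fraction


def expected_additional_rolls(sides: int = 6) -> Fraction:
    """Exact answer by conditioning on the first roll (law of total expectation)."""
    total = Fraction(0)
    for first in range(1, sides + 1):
        # Each later roll is >= first with probability (sides - first + 1)/sides,
        # so the number of additional rolls is First Success with mean
        # sides/(sides - first + 1).
        total += Fraction(1, sides) * Fraction(sides, sides - first + 1)
    return total


def simulate_additional_rolls(n_trials: int = 200_000, seed: int = 0, sides: int = 6) -> float:
    """Monte Carlo check of the same expectation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        first = rng.randint(1, sides)
        rolls = 1                            # count the first additional roll
        while rng.randint(1, sides) < first:
            rolls += 1                       # it fell short, so roll again
        total += rolls
    return total / n_trials


if __name__ == "__main__":
    print(expected_additional_rolls(), simulate_additional_rolls())
```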


9.14

A fair 6-sided die is rolled repeatedly.

(a) Find the expected number of rolls needed to get a 1 followed right away by a 2.

Hint: Start by conditioning on whether or not the first roll is a 1.

(b) Find the expected number of rolls needed to get two consecutive 1’s.

(c) Let \(e_n\) be the expected number of rolls needed to get the same value \(n\) times in a row (i.e., to obtain a streak of \(n\) consecutive \(j\)’s for some not-specified-in-advance value of \(j\)). Find a recursive formula for \(e_n\) in terms of \(e_{n-1}\).

Hint: Divide the time until there are \(n\) consecutive appearances of the same value into two pieces: the time until there are \(n - 1\) consecutive appearances, and the rest.

(d) Find a simple, explicit formula for \(e_n\) for all \(n \geq 1\). What is (numerically)?
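The answers to (a) through (d) can be sanity-checked by simulation. The sketch below handles a specific pattern such as a 1 followed by a 2 (rolls_until_pattern) separately from a streak of any repeated value (rolls_until_streak), because (c) does not fix the value in advance. The function names and the default trial count are illustrative choices.

```python
import random


def rolls_until_pattern(pattern: tuple, rng: random.Random, sides: int = 6) -> int:
    """Roll until the specific sequence `pattern` (e.g. (1, 2)) appears as consecutive rolls."""
    k = len(pattern)
    window = ()                      # the most recent k rolls
    rolls = 0
    while window != pattern:
        window = (window + (rng.randint(1, sides),))[-k:]
        rolls += 1
    return rolls


def rolls_until_streak(n: int, rng: random.Random, sides: int = 6) -> int:
    """Roll until some value (not fixed in advance) appears n times in a row."""
    rolls, streak, last = 0, 0, None
    while streak < n:
        x = rng.randint(1, sides)
        streak = streak + 1 if x == last else 1   # extend or restart the current streak
        last = x
        rolls += 1
    return rolls


def average(sampler, n_trials: int = 20_000, seed: int = 0) -> float:
    """Average of sampler(rng) over independent trials with a fixed seed."""
    rng = random.Random(seed)
    return sum(sampler(rng) for _ in range(n_trials)) / n_trials


if __name__ == "__main__":
    print("1 then 2:", average(lambda rng: rolls_until_pattern((1, 2), rng)))
    print("1 then 1:", average(lambda rng: rolls_until_pattern((1, 1), rng)))
    for n in range(1, 4):
        print(f"streak of {n}:", average(lambda rng: rolls_until_streak(n, rng)))
```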


Conditional expectation given a random variable

9.15

Stat110 solution available.

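Let \(X_1, \dots, X_n\) be i.i.d., and let \(\bar{X}_n\) be the sample mean. In many statistics problems, it is useful or important to obtain a conditional expectation given \(\bar{X}_n\). As an example of this, find \(E(w_1 X_1 + \dots + w_n X_n \mid \bar{X}_n)\), where \(w_1, \dots, w_n\) are constants with \(w_1 + \dots + w_n = 1\).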


9.16

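Let \(X_1, X_2, \dots\) be i.i.d. r.v.s with mean 0, and let \(S_n = X_1 + \dots + X_n\). As shown in Example 9.3.6, the expected value of the first term given the sum of the first \(n\) terms is

\[E(X_1 \mid S_n) = \frac{S_n}{n}.\]

Generalize this result by finding \(E(S_k \mid S_n)\) for all positive integers \(k\) and \(n\).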


9.17

Stat110 solution available.

Consider a group of \(n\) roommate pairs at a college (so there are \(2n\) students). Each of these \(2n\) students independently decides randomly whether to take a certain course, with probability \(p\) of success (where “success” is defined as taking the course).

Let \(X\) be the number of students among these \(2n\) who take the course, and let \(Y\) be the number of roommate pairs where both roommates in the pair take the course. Find \(E(Y)\) and \(E(Y \mid X)\).
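For small \(n\), \(E(Y \mid X = x)\) can be computed by brute force, which gives a way to test a formula for \(E(Y \mid X)\). The sketch below relies on one fact: given \(X = x\), every set of \(x\) students is equally likely to be the set taking the course, because all \(2n\) students have the same success probability \(p\). Averaging over \(X \sim \text{Bin}(2n, p)\) then recovers \(E(Y)\) by Adam’s law. Labeling the roommates as students \(2i\) and \(2i + 1\) is an arbitrary choice.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def conditional_mean_pairs(n: int, x: int) -> Fraction:
    """E(Y | X = x) by brute force for n roommate pairs (students 2i and 2i+1 room together).

    Given X = x, each set of x students is equally likely to be the set taking
    the course, so we average the number of complete pairs over all such sets.
    """
    total = 0
    count = 0
    for chosen in combinations(range(2 * n), x):
        taking = set(chosen)
        total += sum(1 for i in range(n) if 2 * i in taking and 2 * i + 1 in taking)
        count += 1
    return Fraction(total, count)


def expected_pairs(n: int, p: Fraction) -> Fraction:
    """E(Y) via Adam's law: sum over x of P(X = x) * E(Y | X = x), with X ~ Bin(2n, p)."""
    return sum(comb(2 * n, x) * p**x * (1 - p)**(2 * n - x) * conditional_mean_pairs(n, x)
               for x in range(2 * n + 1))


if __name__ == "__main__":
    n = 3                                        # hypothetical small number of pairs
    print([conditional_mean_pairs(n, x) for x in range(2 * n + 1)])
    print(expected_pairs(n, Fraction(1, 3)))     # hypothetical p
```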


9.18

Stat110 solution available.

Show that

so these two expressions for agree.

Hint for the variance: Adding a constant (or something acting as a constant) does not affect variance.


9.19

Let \(Y\) be the height of a randomly chosen adult man, and \(X\) be his father’s height, where \(X\) and \(Y\) have been standardized to have mean 0 and standard deviation 1. Suppose that \((X, Y)\) is Bivariate Normal, with \(X, Y \sim \mathcal{N}(0, 1)\) and \(\text{Corr}(X, Y) = \rho\).

(a) Let \(y = ax + b\) be the equation of the best line for predicting \(Y\) from \(X\) (in the sense of minimizing the mean squared error), e.g., if we were to observe \(X = x\) then we would predict that \(Y\) is \(ax + b\). Now suppose that we want to use \(Y\) to predict \(X\), rather than using \(X\) to predict \(Y\). Give and explain an intuitive guess for what the slope is of the best line for predicting \(X\) from \(Y\).

(b) Find a constant \(c\) (in terms of \(\rho\)) and an r.v. \(V\) such that \(Y = cX + V\), with \(V\) independent of \(X\).

Hint: Start by finding \(c\) such that \(\text{Cov}(X, Y - cX) = 0\).

(c) Find a constant \(d\) (in terms of \(\rho\)) and an r.v. \(W\) such that \(X = dY + W\), with \(W\) independent of \(Y\).

(d) Find \(E(Y \mid X)\) and \(E(X \mid Y)\).

(e) Reconcile (a) and (d), if your intuitive guess in (a) differed from what the results of (d) implied. Give a clear and correct intuitive explanation of the relationship between the slope of the best line for predicting \(Y\) from \(X\) and the slope of the best line for predicting \(X\) from \(Y\).


9.20

Let .

(a) Find and .

(b) Find .


9.21

Let \(X\) be a discrete r.v., \(A\) be an event with \(0 < P(A) < 1\), and \(I_A\) be the indicator r.v. for \(A\).

(a) Explain precisely how the r.v. \(E(X \mid I_A)\) relates to the numbers \(E(X \mid A)\) and \(E(X \mid A^c)\).

(b) Show that , directly from the definitions of expectation and conditional expectation.

Hint: Let , and then find an expression for the PMF of .

(c) Use (b) to give a short proof of the fact that


9.22

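Show that the following version of LOTP, which we encountered in Section 7.1, is also a consequence of Adam’s law: for any event \(A\) and continuous r.v. \(X\) with PDF \(f_X\),

\[P(A) = \int_{-\infty}^{\infty} P(A \mid X = x) f_X(x) \, dx.\]

Hint: Consider \(E(I_A \mid X)\), where \(I_A\) is the indicator r.v. for \(A\).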


9.23

Stat110 solution available.

Let \(X\) and \(Y\) be random variables with finite variances, and let \(W = Y - E(Y \mid X)\). This is a residual: the difference between the true value of \(Y\) and the predicted value of \(Y\) based on \(X\).

(a) Compute \(E(W)\) and \(E(W \mid X)\).

(b) Compute \(\text{Var}(W)\), for the case that \(W \mid X \sim \mathcal{N}(0, X^2)\) with \(X \sim \mathcal{N}(0, 1)\).


9.24

Stat110 solution available.

One of two identical-looking coins is picked from a hat randomly, where one coin has probability \(p_1\) of Heads and the other has probability \(p_2\) of Heads. Let \(X\) be the number of Heads after flipping the chosen coin \(n\) times. Find the mean and variance of \(X\).


9.25

Kelly makes a series of bets, each of which she has probability \(p\) of winning, independently. Initially, she has \(X_0\) dollars. Let \(X_n\) be the amount she has immediately after her \(n\)th bet is settled. Let \(f\) be a constant in \((0, 1)\), called the betting fraction. On each bet, Kelly wagers a fraction \(f\) of her wealth, and then she either wins or loses that amount. For example, if her current wealth is $100 and \(f = 0.25\), then she bets $25 and either gains or loses that amount. (A famous choice when \(p > 1/2\) is \(f = 2p - 1\), which is known as the Kelly criterion.) Find \(E(X_n)\) (in terms of \(n\), \(p\), \(f\), and \(X_0\)).

Hint: First find \(E(X_{n+1} \mid X_n)\).
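The hint can be checked numerically. Write \(X_0\) (code name x0) for the initial wealth, \(p\) for the win probability, and \(f\) for the betting fraction. Each bet multiplies wealth by \(1 + f\) or \(1 - f\), so wealth after \(n\) bets depends only on the number of wins \(W \sim \text{Bin}(n, p)\). The sketch compares that direct enumeration with the formula obtained by iterating the conditional expectation from the hint. The function names are illustrative.

```python
from math import comb, isclose


def expected_wealth_by_enumeration(x0: float, p: float, f: float, n: int) -> float:
    """E(X_n) computed directly: with W ~ Bin(n, p) wins, wealth is x0*(1+f)^W*(1-f)^(n-W)."""
    return sum(comb(n, w) * p**w * (1 - p)**(n - w) * x0 * (1 + f)**w * (1 - f)**(n - w)
               for w in range(n + 1))


def expected_wealth_adams_law(x0: float, p: float, f: float, n: int) -> float:
    """E(X_n) from the hint: E(X_{k+1} | X_k) = X_k * (1 + f*(2p - 1)); take E and iterate n times."""
    return x0 * (1 + f * (2 * p - 1)) ** n


if __name__ == "__main__":
    args = (100, 0.6, 0.25, 10)          # hypothetical x0, p, f, n
    a, b = expected_wealth_by_enumeration(*args), expected_wealth_adams_law(*args)
    print(a, b, isclose(a, b))
```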


9.26

Let be the number of movies that will be released next year. Suppose that for each movie the number of tickets sold is , independent of other movies and of . Find the mean and variance of the number of movie tickets that will be sold next year.


9.27

A party is being held from 8:00 pm to midnight on a certain night, and people are going to show up. They will all arrive at uniformly random times while the party is going on, independently of each other and of .

(a) Find the expected time at which the first person arrives, given that at least one person shows up. Give both an exact answer in terms of , measured in minutes after 8:00 pm, and an answer rounded to the nearest minute for , expressed in time notation (e.g., 8:20 pm).

(b) Find the expected time at which the last person arrives, given that at least one person shows up. As in (a), give both an exact answer and an answer rounded to the nearest minute for .


9.28

Stat110 solution available.

We wish to estimate an unknown parameter \(\theta\), based on an r.v. \(X\) we will get to observe. As in the Bayesian perspective, assume that \(X\) and \(\theta\) have a joint distribution. Let \(\hat{\theta}\) be the estimator (which is a function of \(X\)). Then \(\hat{\theta}\) is said to be unbiased if \(E(\hat{\theta} \mid \theta) = \theta\), and \(\hat{\theta}\) is said to be the Bayes procedure if \(E(\theta \mid X) = \hat{\theta}\).

(a) Let \(\hat{\theta}\) be unbiased. Find \(E(\hat{\theta} - \theta)^2\) (the average squared difference between the estimator and the true value of \(\theta\)), in terms of marginal moments of \(\hat{\theta}\) and \(\theta\).

Hint: Condition on \(\theta\).

(b) Repeat (a), except in this part suppose that \(\hat{\theta}\) is the Bayes procedure rather than assuming that it is unbiased.

Hint: Condition on \(X\).

(c) Show that it is impossible for \(\hat{\theta}\) to be both the Bayes procedure and unbiased, except in silly problems where we get to know \(\theta\) perfectly by observing \(X\).

Hint: If \(Y\) is a nonnegative r.v. with mean 0, then \(P(Y = 0) = 1\).


9.29

Show that if \(E(Y \mid X) = c\) is a constant, then \(X\) and \(Y\) are uncorrelated.

Hint: Use Adam’s law to find \(E(Y)\) and \(E(XY)\).


9.30

Show by example that it is possible to have uncorrelated \(X\) and \(Y\) such that \(E(Y \mid X)\) is not a constant.

Hint: Consider a standard Normal and its square.


9.31

Stat110 solution available.

Emails arrive one at a time in an inbox. Let \(T_n\) be the time at which the \(n\)th email arrives (measured on a continuous scale from some starting point in time). Suppose that the waiting times between emails are i.i.d. \(\text{Expo}(\lambda)\), i.e., \(T_1, T_2 - T_1, T_3 - T_2, \dots\) are i.i.d. \(\text{Expo}(\lambda)\).

Each email is non-spam with probability \(p\), and spam with probability \(q = 1 - p\) (independently of the other emails and of the waiting times). Let \(X\) be the time at which the first non-spam email arrives (so \(X\) is a continuous r.v., with \(X = T_1\) if the 1st email is non-spam, \(X = T_2\) if the 1st email is spam but the 2nd one isn’t, etc.).

(a) Find the mean and variance of \(X\).

(b) Find the MGF of \(X\). What famous distribution does this imply that \(X\) has (be sure to state its parameter values)?

Hint for both parts: Let \(N\) be the number of emails until the first non-spam (including that one), and write \(X\) as a sum of \(N\) terms; then condition on \(N\).


9.32

Customers arrive at a store according to a Poisson process of rate \(\lambda\) customers per hour. Each makes a purchase with probability \(p\), independently. Given that a customer makes a purchase, the amount spent has mean \(\mu\) (in dollars) and variance \(\sigma^2\).

(a) Find the mean and variance of how much a random customer spends (note that the customer may spend nothing).

(b) Find the mean and variance of the revenue the store obtains in an 8-hour time interval, using (a) and results from this chapter.

(c) Find the mean and variance of the revenue the store obtains in an 8-hour time interval, using the chicken-egg story and results from this chapter.


9.33

Fred’s beloved computer will last an \(\text{Expo}(\lambda)\) amount of time until it has a malfunction. When that happens, Fred will try to get it fixed. With probability \(p\), he will be able to get it fixed. If he is able to get it fixed, the computer is good as new again and will last an additional, independent \(\text{Expo}(\lambda)\) amount of time until the next malfunction (when again he is able to get it fixed with probability \(p\), and so on). If after any malfunction Fred is unable to get it fixed, he will buy a new computer. Find the expected amount of time until Fred buys a new computer. (Assume that the time spent on computer diagnosis, repair, and shopping is negligible.)


9.34

A green die is rolled until it lands 1 for the first time. An orange die is rolled until it lands 6 for the first time. The dice are fair, six-sided dice. Let \(G\) be the sum of the values of the rolls of the green die (including the 1 at the end) and \(O\) be the sum of the values of the rolls of the orange die (including the 6 at the end). Two students are debating whether \(E(G) = E(O)\) or \(E(G) < E(O)\). They kindly gave permission to quote their arguments here.

Student A: We have \(E(G) = E(O)\). By Adam’s law, the expected sum of the rolls of a die is the expected number of rolls times the expected value of one roll, and each of these factors is the same for the two dice. In more detail, let \(N_G\) be the number of rolls of the green die and \(N_O\) be the number of rolls of the orange die. By Adam’s law and linearity,

\[E(G) = E(E(G \mid N_G)) = E(3.5 N_G) = 3.5 E(N_G),\]

and the same method applied to the orange die gives \(E(O) = 3.5 E(N_O)\), which equals \(E(G)\).

Student B: Actually, \(E(G) < E(O)\). I agree that the expected number of rolls is the same for the two dice, but the key difference is that we know the last roll is a 1 for the green die and a 6 for the orange die. The expected totals are the same for the two dice excluding the last roll of each, and then including the last roll makes \(E(G) < E(O)\).

(a) Discuss in words the extent to which Student A’s argument is convincing and correct.

(b) Discuss in words the extent to which Student B’s argument is convincing and correct.

(c) Give careful derivations of \(E(G)\) and \(E(O)\).
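Before weighing the two arguments, it may help to know numerically what \(E(G)\) and \(E(O)\) are. Below is a Monte Carlo sketch in which the stopping value is 1 for the green die and 6 for the orange die. The function name and trial count are arbitrary choices.

```python
import random


def mean_total(stop_value: int, n_trials: int = 100_000, seed: int = 0) -> float:
    """Average sum of all rolls of a fair die rolled until `stop_value` first appears.

    The final roll (equal to stop_value) is included in the sum.
    """
    rng = random.Random(seed)
    grand_total = 0
    for _ in range(n_trials):
        while True:
            roll = rng.randint(1, 6)
            grand_total += roll
            if roll == stop_value:
                break
    return grand_total / n_trials


if __name__ == "__main__":
    print("green die (stop at 1):", mean_total(1))
    print("orange die (stop at 6):", mean_total(6))
```

A simulation can only tell you which conclusion is numerically right. It cannot say whether either student's reasoning is valid, which is what (a) and (b) ask.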


9.35

Stat110 solution available.

Judit plays in a total of \(N \sim \text{Geom}(s)\) chess tournaments in her career. Suppose that in each tournament she has probability \(p\) of winning the tournament, independently. Let \(T\) be the number of tournaments she wins in her career.

(a) Find the mean and variance of \(T\).

(b) Find the MGF of \(T\). What is the name of this distribution (with its parameters)?


9.36

In Story 8.4.5, we showed (among other things) that if and , then the marginal distribution of is . Derive this result using Adam’s law and MGFs.

Hint: Consider the conditional MGF of .


9.37

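Let \(X_1, \dots, X_n\) be i.i.d. r.v.s with mean \(\mu\) and variance \(\sigma^2\), and \(n \geq 2\). A bootstrap sample of \(X_1, \dots, X_n\) is a sample of \(n\) r.v.s \(X_1^*, \dots, X_n^*\) formed from the \(X_j\) by sampling with replacement with equal probabilities. Let \(\bar{X}^*\) denote the sample mean of the bootstrap sample:

\[\bar{X}^* = \frac{1}{n}\left(X_1^* + \dots + X_n^*\right).\]

(a) Calculate \(E(X_j^*)\) and \(\text{Var}(X_j^*)\) for each \(j\).

(b) Calculate \(E(\bar{X}^* \mid X_1, \dots, X_n)\) and \(\text{Var}(\bar{X}^* \mid X_1, \dots, X_n)\).

Hint: Conditional on \(X_1, \dots, X_n\), the \(X_j^*\) are independent, with a PMF that puts probability \(1/n\) at each of the points \(X_1, \dots, X_n\). As a check, your answers should be random variables that are functions of \(X_1, \dots, X_n\).

(c) Calculate \(E(\bar{X}^*)\) and \(\text{Var}(\bar{X}^*)\).

(d) Explain intuitively why \(\text{Var}(\bar{X}) < \text{Var}(\bar{X}^*)\), where \(\bar{X}\) is the sample mean of \(X_1, \dots, X_n\).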
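Parts (b) and (c) can be sanity-checked on a tiny sample by brute force. Conditional on the data, each of the \(n^n\) sequences of resampled indices is equally likely, so for small \(n\) the exact conditional mean and variance of \(\bar{X}^*\) can be found by enumeration. The sample values in the usage block are made up for illustration. The inputs should be integers or Fractions so that the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import product


def bootstrap_mean_moments(sample):
    """Exact E(Xbar* | sample) and Var(Xbar* | sample) for a small observed sample.

    Conditional on the sample, each of the n**n index sequences is equally likely,
    so we can enumerate every bootstrap resample (feasible only for small n).
    """
    n = len(sample)
    means = [Fraction(sum(sample[i] for i in idx), n)
             for idx in product(range(n), repeat=n)]
    mean = sum(means, Fraction(0)) / len(means)
    var = sum(((m - mean) ** 2 for m in means), Fraction(0)) / len(means)
    return mean, var


if __name__ == "__main__":
    data = [1, 4, 6, 9]                  # hypothetical observed sample
    print(bootstrap_mean_moments(data))
```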


9.38

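An insurance company covers disasters in two neighboring regions, \(R_1\) and \(R_2\). Let \(I_1\) and \(I_2\) be the indicator r.v.s for whether \(R_1\) and \(R_2\) are hit by the insured disaster, respectively. The indicators \(I_1\) and \(I_2\) may be dependent. Let \(p_j = E(I_j)\) for \(j = 1, 2\), and \(p_{12} = E(I_1 I_2)\).

The company reimburses a total cost of

\[C = I_1 T_1 + I_2 T_2\]

to these regions, where \(T_j\) has mean \(\mu_j\) and variance \(\sigma_j^2\). Assume that \(T_1\) and \(T_2\) are independent of each other and that \((T_1, T_2)\) is independent of \((I_1, I_2)\).

(a) Find \(E(C)\).

(b) Find \(\text{Var}(C)\).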


9.39

Stat110 solution available.

A certain stock has low volatility on some days and high volatility on other days. Suppose that the probability of a low volatility day is \(p\) and of a high volatility day is \(q = 1 - p\), and that on low volatility days the percent change in the stock price is \(\mathcal{N}(0, \sigma_1^2)\), while on high volatility days the percent change is \(\mathcal{N}(0, \sigma_2^2)\), with \(\sigma_1 < \sigma_2\). Let \(X\) be the percent change of the stock on a certain day. The distribution is said to be a mixture of two Normal distributions, and a convenient way to represent \(X\) is as \(X = I_1 X_1 + I_2 X_2\) where \(I_1\) is the indicator r.v. of having a low volatility day, \(I_2 = 1 - I_1\), \(X_j \sim \mathcal{N}(0, \sigma_j^2)\), and \(I_1, X_1, X_2\) are independent.

(a) Find \(\text{Var}(X)\) in two ways: using Eve’s law, and by using properties of covariance to calculate \(\text{Cov}(I_1 X_1 + I_2 X_2, I_1 X_1 + I_2 X_2)\).

(b) Recall from Chapter 6 that the kurtosis of an r.v. \(Y\) with mean \(\mu\) and standard deviation \(\sigma\) is defined by

\[\text{Kurt}(Y) = \frac{E(Y - \mu)^4}{\sigma^4} - 3.\]

Find the kurtosis of \(X\) (in terms of \(p, q, \sigma_1^2, \sigma_2^2\), fully simplified). The result will show that even though the kurtosis of any Normal distribution is 0, the kurtosis of \(X\) is positive and in fact can be very large depending on the parameter values.


9.40

Let , , and be random variables, such that has finite variance. Let

Show that

Also, check that this makes sense in the extreme cases where is independent of and where for some function .

Hint: Use Eve’s law on .


9.41

Show that for any r.v.s \(X\) and \(Y\),

\[E(Y \mid E(Y \mid X)) = E(Y \mid X).\]

This has a nice intuitive interpretation if we think of \(E(Y \mid X)\) as the prediction we would make for \(Y\) based on \(X\): given the prediction we would use for predicting \(Y\) from \(X\), we no longer need to know \(X\) to predict \(Y\); we can just use the prediction we have. For example, letting \(E(Y \mid X) = g(X)\), if we observe \(g(X) = 7\), then we may or may not know what \(X\) is (since \(g\) may not be one-to-one). But even without knowing \(X\), we know that the prediction for \(Y\) based on \(X\) is 7.

Hint: Use Adam’s law with extra conditioning.


9.42

A researcher wishes to know whether a new treatment for the disease conditionitis is more effective than the standard treatment. It is unfortunately not feasible to do a randomized experiment, but the researcher does have the medical records of patients who received the new treatment and those who received the standard treatment. She is worried, though, that doctors tend to give the new treatment to younger, healthier patients. If this is the case, then naively comparing the outcomes of patients in the two groups would be like comparing apples and oranges.

Suppose each patient has background variables \(X\), which might be age, height and weight, and measurements relating to previous health status. Let \(Z\) be the indicator of receiving the new treatment. The researcher fears that \(Z\) is dependent on \(X\), i.e., that the distribution of \(X\) given \(Z = 1\) is different from the distribution of \(X\) given \(Z = 0\).

In order to compare apples to apples, the researcher wants to match every patient who received the new treatment to a patient with similar background variables who received the standard treatment. But \(X\) could be a high-dimensional random vector, which often makes it very difficult to find a match with a similar value of \(X\).

The propensity score reduces the possibly high-dimensional vector of background variables down to a single number (then it is much easier to match someone to a person with a similar propensity score than to match someone to a person with a similar value of \(X\)). The propensity score of a person with background characteristics \(X\) is defined as

\[S = E(Z \mid X).\]

By the fundamental bridge, a person’s propensity score is their probability of receiving the treatment, given their background characteristics. Show that conditional on \(S\), the treatment indicator \(Z\) is independent of the background variables \(X\).

Hint: This problem relates to the previous one. Show that \(P(Z = 1 \mid S, X) = P(Z = 1 \mid S)\), which is equivalent to showing \(E(Z \mid S, X) = E(Z \mid S)\).


9.43

This exercise develops a useful identity for covariance, similar in spirit to Adam’s law for expectation and Eve’s law for variance. First define conditional covariance in a manner analogous to how we defined conditional variance:

\[\text{Cov}(X, Y \mid Z) = E\big((X - E(X \mid Z))(Y - E(Y \mid Z)) \mid Z\big).\]

(a) Show that

\[\text{Cov}(X, Y \mid Z) = E(XY \mid Z) - E(X \mid Z)E(Y \mid Z).\]

This should be true since it is the conditional version of the fact that

\[\text{Cov}(X, Y) = E(XY) - E(X)E(Y),\]

and conditional probabilities are probabilities, but for this problem you should prove it directly using properties of expectation and conditional expectation.

(b) ECCE, or the law of total covariance, says that

\[\text{Cov}(X, Y) = E(\text{Cov}(X, Y \mid Z)) + \text{Cov}(E(X \mid Z), E(Y \mid Z)).\]

That is, the covariance of \(X\) and \(Y\) is the expected value of their conditional covariance plus the covariance of their conditional expectations, where all these conditional quantities are conditional on \(Z\). Prove this identity.

Hint: We can assume without loss of generality that \(E(X) = 0\), since adding a constant to an r.v. has no effect on its covariance with any r.v. Then expand out the covariances on the right-hand side of the identity and apply Adam’s law.


Mixed practice

9.44

A group of \(n\) friends often go out for dinner together. At their dinners, they play “credit card roulette” to decide who pays the bill. This means that at each dinner, one person is chosen uniformly at random to pay the entire bill (independently of what happens at the other dinners).

(a) Find the probability that in \(k\) dinners, no one will have to pay the bill more than once (do not simplify for the case \(k \leq n\), but do simplify fully for the case \(k > n\)).

(b) Find the expected number of dinners it takes in order for everyone to have paid at least once (you can leave your answer as a finite sum of simple-looking terms).

(c) Alice and Bob are two of the \(n\) friends. Find the covariance between how many times Alice pays and how many times Bob pays in \(k\) dinners.
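For small numbers of friends and dinners, parts (a) and (c) can be checked by enumerating every equally likely sequence of payers. In the sketch below, \(n\) is the number of friends and \(k\) the number of dinners (names chosen here). Friends are labeled 0 to \(n - 1\), and Alice and Bob are arbitrarily taken to be friends 0 and 1, so \(n \geq 2\) is required.

```python
from fractions import Fraction
from itertools import product


def roulette_stats(n: int, k: int):
    """Exact results for k dinners among n friends by enumerating all n**k payer sequences.

    Returns (P(no one pays more than once), Cov(times Alice pays, times Bob pays)),
    with Alice labeled 0 and Bob labeled 1.
    """
    if n < 2:
        raise ValueError("need at least two friends (Alice and Bob)")
    total = n ** k
    no_repeat = sum_a = sum_b = sum_ab = 0
    for payers in product(range(n), repeat=k):
        if len(set(payers)) == k:      # all k payers are distinct
            no_repeat += 1
        a, b = payers.count(0), payers.count(1)
        sum_a += a
        sum_b += b
        sum_ab += a * b
    cov = Fraction(sum_ab, total) - Fraction(sum_a, total) * Fraction(sum_b, total)
    return Fraction(no_repeat, total), cov


if __name__ == "__main__":
    print(roulette_stats(4, 3))        # hypothetical small sizes
```

Enumeration grows as \(n^k\), so it is only a check on small cases, not a substitute for the general derivation.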


9.45

As in the previous problem, a group of friends play “credit card roulette” at their dinners. In this problem, let the number of dinners be a \(\text{Pois}(\lambda)\) r.v.

(a) Alice is one of the friends. Find the correlation between how many dinners Alice pays for and how many free dinners Alice gets.

(b) The costs of the dinners are i.i.d. r.v.s, independent of the number of dinners. Find the mean and variance of the total cost.


9.46

Joe will read books next year. Each book has a number of pages, with book lengths independent of each other and independent of .

(a) Find the expected number of book pages that Joe will read next year.

(b) Find the variance of the number of book pages that Joe will read next year.

(c) For each of the books, Joe likes it with probability and dislikes it with probability , independently. Find the conditional distribution of how many of the books Joe likes, given that he dislikes exactly of the books.


9.47

Buses arrive at a certain bus stop according to a Poisson process of rate . Each bus has seats and, at the instant when it arrives at the stop, has a number of passengers. Assume that the numbers of passengers on different buses are independent of each other, and independent of the arrival times of the buses.

Let be the number of buses that arrive in the time interval , and be the total number of passengers on the buses that arrive in the time interval .

(a) Find the mean and variance of .

(b) Find the mean and variance of .

(c) A bus is full if it has exactly passengers when it arrives at the stop. Find the probability that exactly buses arrive in , of which are full and are not full.


9.48

Paul and \(n\) other runners compete in a marathon. Their times are independent continuous r.v.s with CDF \(F\).

(a) For \(j = 1, 2, \dots, n\), let \(A_j\) be the event that anonymous runner \(j\) completes the race faster than Paul. Explain whether the events \(A_j\) are independent, and whether they are conditionally independent given Paul’s time to finish the race.

(b) For the rest of this problem, let \(N\) be the number of runners who finish faster than Paul. Find \(E(N)\). (Your answer should depend only on \(n\), since Paul’s time is an r.v.)

(c) Find the conditional distribution of \(N\), given that Paul’s time to finish the marathon is \(t\).

(d) Find \(\text{Var}(N)\). (Your answer should depend only on \(n\), since Paul’s time is an r.v.)

Hint: Let \(T\) be Paul’s time, and use Eve’s law to condition on \(T\). Alternatively, use indicator r.v.s.
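Here is a simulation sketch for checking (b) and (d). It draws all finishing times i.i.d. Unif(0, 1), a hypothetical choice of the continuous CDF \(F\). The number of runners faster than Paul depends only on the ordering of the times, so any continuous \(F\) should give the same distribution.

```python
import random


def simulate_faster_count(n: int, n_trials: int = 100_000, seed: int = 0):
    """Estimate E(N) and Var(N), where N = number of the n other runners faster than Paul.

    All times are i.i.d. Unif(0, 1), a hypothetical stand-in for the continuous CDF F.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        paul = rng.random()                                         # Paul's time
        counts.append(sum(1 for _ in range(n) if rng.random() < paul))
    mean = sum(counts) / n_trials
    var = sum((c - mean) ** 2 for c in counts) / n_trials
    return mean, var


if __name__ == "__main__":
    print(simulate_faster_count(9))    # hypothetical number of other runners
```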


9.49

Emails arrive in an inbox according to a Poisson process of rate \(\lambda\) emails per hour.

(a) Find the name and parameters of the conditional distribution of the number of emails that arrive in the first 2 hours of an 8-hour time period, given that exactly \(n\) emails arrive in that time period.

(b) Each email is legitimate with probability \(p\) and spam with probability \(q = 1 - p\), independently. Find the name and parameters of the conditional distribution of the number of legitimate emails that arrive in an 8-hour time period, given that exactly \(n\) spams arrived in that time period.

(c) Reading an email takes a random amount of time, with mean \(\mu\) hours and standard deviation \(\sigma\) hours. These reading times are i.i.d. and independent of the email arrival process. Find the (unconditional) mean and variance of the total time it takes to read all the emails that arrive in an 8-hour time period.


9.50

An actuary wishes to estimate various quantities related to the number of insurance claims and the dollar amounts of those claims for someone named Fred. Suppose that Fred will make \(N\) claims next year, where \(N \mid \lambda \sim \text{Pois}(\lambda)\). But \(\lambda\) is unknown, so the actuary, taking a Bayesian approach, gives \(\lambda\) a prior distribution based on past experience. Specifically, the prior is . The dollar amount of a claim is Log-Normal with parameters \(\mu\) and \(\sigma^2\) (here \(\mu\) and \(\sigma^2\) are the mean and variance of the underlying Normal), with \(\mu\) and \(\sigma^2\) known. The dollar amounts of the claims are i.i.d. and independent of \(N\).

(a) Find \(E(N)\) and \(\text{Var}(N)\) using properties of conditional expectation (your answers should not depend on \(\lambda\), since \(\lambda\) is unknown and being treated as an r.v.).

(b) Find the mean and variance of the total dollar amount of all the claims.

(c) Find the distribution of \(N\). If it is a named distribution we have studied, give its name and parameters.

(d) Find the posterior distribution of \(\lambda\), given that it is observed that Fred makes \(N = n\) claims next year. If it is a named distribution we have studied, give its name and parameters.


9.51

Stat110 solution available.

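Let \(X_1, X_2, X_3\) be independent with \(X_j \sim \text{Expo}(\lambda_j)\) (so with possibly different rates). Recall from Chapter 7 that

\[P(X_1 < X_2) = \frac{\lambda_1}{\lambda_1 + \lambda_2}.\]

(a) Find \(E(X_1 + X_2 + X_3 \mid X_1 > 1, X_2 > 2, X_3 > 3)\) in terms of \(\lambda_1, \lambda_2, \lambda_3\).

(b) Find \(P(X_1 = \min(X_1, X_2, X_3))\), the probability that the first of the three Exponentials is the smallest.

Hint: Restate this in terms of \(X_1\) and \(\min(X_2, X_3)\).

(c) For the case \(\lambda_1 = \lambda_2 = \lambda_3 = 1\), find the PDF of \(\max(X_1, X_2, X_3)\). Is this one of the important distributions we have studied?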


9.52

Stat110 solution available.

A task is randomly assigned to one of two people (with probability 1/2 for each person). If assigned to the first person, the task takes an \(\text{Expo}(\lambda_1)\) length of time to complete (measured in hours), while if assigned to the second person it takes an \(\text{Expo}(\lambda_2)\) length of time to complete (independent of how long the first person would have taken). Let \(T\) be the time taken to complete the task.

(a) Find the mean and variance of \(T\).

(b) Suppose instead that the task is assigned to both people, and let \(X\) be the time taken to complete it (by whoever completes it first, with the two people working independently). It is observed that after 24 hours, the task has not yet been completed. Conditional on this information, what is the expected value of \(X\)?


9.53

Suppose for this problem that “true IQ” is a meaningful concept rather than a reified social construct. Suppose that in the U.S. population, the distribution of true IQs is Normal with mean 100 and SD 15. A person is chosen at random from this population to take an IQ test. The test is a noisy measure of true ability: it’s correct on average but has a Normal measurement error with SD 5.

Let \(\mu\) be the person’s true IQ, viewed as a random variable, and let \(Y\) be her score on the IQ test. Then we have

\[Y \mid \mu \sim \mathcal{N}(\mu, 5^2)\]

and

\[\mu \sim \mathcal{N}(100, 15^2).\]

(a) Find the unconditional mean and variance of \(Y\).

(b) Find the marginal distribution of \(Y\). One way is via the MGF.

(c) Find \(\text{Cov}(\mu, Y)\).


9.54

Stat110 solution available.

A certain genetic characteristic is of interest. It can be measured numerically. Let \(X_1\) and \(X_2\) be the values of the genetic characteristic for two twin boys. Given that they are identical twins, \(X_1 = X_2\) and \(X_1\) has mean 0 and variance \(\sigma^2\); given that they are fraternal twins, \(X_1\) and \(X_2\) have mean 0, variance \(\sigma^2\), and correlation \(\rho\). The probability that the twins are identical is \(1/2\). Find \(\text{Cov}(X_1, X_2)\) in terms of \(\rho, \sigma^2\).


9.55

Stat110 solution available.

The Mass Cash lottery randomly chooses 5 of the 35 numbers $1, 2, \dots, 35$ each day (without repetitions within the choice of 5 numbers). Suppose that we want to know how long it will take until all numbers $1, 2, \dots, 35$ have been chosen. Let $a_j$ be the average number of additional days needed if we are missing $j$ numbers (so $a_0 = 0$ and $a_{35}$ is the average number of days needed to collect all 35 numbers). Find a recursive formula for the $a_j$.
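A recursive formula for the average number of additional days can be checked against a direct simulation of the drawings. The Python sketch below estimates that average when a given number of the 35 numbers are still missing, drawing 5 distinct numbers per day as in the problem. By symmetry it does not matter which numbers are missing, so the sketch labels them $0, 1, \dots$. The function name and its parameters are hypothetical.

```python
import random


def estimate_days(missing, trials=10_000, seed=0, pool=35, per_day=5):
    """Monte Carlo estimate of the mean number of additional days needed
    when `missing` of the `pool` numbers have not yet been drawn.

    Each day draws `per_day` distinct numbers uniformly from the pool.
    By symmetry, the missing numbers can be labeled 0, 1, ..., missing - 1.
    """
    rng = random.Random(seed)
    total_days = 0
    for _ in range(trials):
        still_missing = set(range(missing))
        days = 0
        while still_missing:
            days += 1
            # One day's drawing: per_day distinct numbers, no repeats within a day.
            still_missing.difference_update(rng.sample(range(pool), per_day))
        total_days += days
    return total_days / trials


if __name__ == "__main__":
    for j in (1, 2, 35):
        print(j, estimate_days(j))
```

With a single missing number, each day covers it with probability 5/35 = 1/7, so the estimate for one missing number should come out near 7 days; that is a convenient check on both the simulation and a proposed recursion.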


9.56

Two chess players, Vishy and Magnus, play a series of games. Given $p$, the game results are i.i.d. with probability $p$ of Vishy winning, and probability $q = 1 - p$ of Magnus winning (assume that each game ends in a win for one of the two players). But $p$ is unknown, so we will treat it as an r.v. To reflect our uncertainty about $p$, we use the prior $p \sim \text{Beta}(a, b)$, where $a$ and $b$ are known positive integers and $a \geq 2$.

(a) Find the expected number of games needed in order for Vishy to win a game (including the win). Simplify fully; your final answer should not use factorials or $\Gamma$.

(b) Explain in terms of independence vs. conditional independence the direction of the inequality between the answer to (a) and $1 + E(G)$ for $G \sim \text{Geom}\left(\frac{a}{a+b}\right)$.

(c) Find the conditional distribution of $p$ given that Vishy wins exactly 7 out of the first 10 games.
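A simulation makes the comparison in part (b) concrete. The Python sketch below draws $p$ once from the $\text{Beta}(a, b)$ prior for each simulated series, then plays games until Vishy wins. A second helper returns the comparison value $1 + E(G)$ for $G \sim \text{Geom}(a/(a+b))$, which is what the expected number of games would be if $p$ were known to equal its prior mean; it uses the fact that a Geometric counting failures before the first success, with success probability $a/(a+b)$, has mean $b/a$. The example values $a = 3$, $b = 2$ and both function names are hypothetical.

```python
import random


def games_until_first_win(a, b, trials=100_000, seed=0):
    """Monte Carlo estimate of the expected number of games up to and
    including Vishy's first win, with p ~ Beta(a, b) drawn once per series."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        p = rng.betavariate(a, b)  # one draw of Vishy's win probability
        games = 1
        # Given p, each game is an independent Bern(p) trial for a Vishy win.
        while rng.random() >= p:
            games += 1
        total += games
    return total / trials


def games_if_p_known(a, b):
    """1 + E(G) for G ~ Geom(a/(a+b)), counting failures before the first
    success: 1 + b/a = (a + b)/a."""
    return (a + b) / a


if __name__ == "__main__":
    print(games_until_first_win(3, 2), games_if_p_known(3, 2))
```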


9.57

Laplace’s law of succession says that if $X_1, X_2, \dots, X_{n+1}$ are conditionally independent $\text{Bern}(p)$ r.v.s given $p$, but $p$ is given a $\text{Unif}(0, 1)$ prior to reflect ignorance about its value, then

$$P(X_{n+1} = 1 \mid X_1 + \dots + X_n = k) = \frac{k+1}{n+2}.$$

As an example, Laplace discussed the problem of predicting whether the sun will rise tomorrow, given that the sun did rise every time for all $n$ days of recorded history; the above formula then gives $\frac{n+1}{n+2}$ as the probability of the sun rising tomorrow (of course, assuming independent trials with $p$ unchanging over time may be a very unreasonable model for the sunrise problem).

(a) Find the posterior distribution of $p$ given $X_1 = x_1, X_2 = x_2, \dots, X_n = x_n$, and show that it only depends on the sum of the $x_j$ (so we only need the one-dimensional quantity $x_1 + x_2 + \dots + x_n$ to obtain the posterior distribution, rather than needing all $n$ data points).

(b) Prove Laplace’s law of succession, using a form of the law of total probability to find $P(X_{n+1} = 1 \mid X_1 + \dots + X_n = k)$ by conditioning on $p$. (The next exercise, which is closely related, involves an equivalent Adam’s law proof.)
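Conditioning on $p$ reduces the probability in (b) to a ratio of two Beta integrals, and for whole-number arguments those integrals have closed forms in factorials. The Python sketch below (function names hypothetical) evaluates that ratio with exact rational arithmetic, so the stated formula $(k+1)/(n+2)$ can be checked for many $(n, k)$ pairs at once.

```python
from fractions import Fraction
from math import factorial


def beta_int(x, y):
    """Beta function B(x, y) for positive integers, as an exact fraction:
    B(x, y) = (x-1)! (y-1)! / (x+y-1)!."""
    return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))


def succession_prob(n, k):
    """P(X_{n+1} = 1 | X_1 + ... + X_n = k) under a Unif(0, 1) prior on p,
    computed by conditioning on p:

        integral of p * p^k (1-p)^(n-k) dp / integral of p^k (1-p)^(n-k) dp.

    The binomial coefficient and the uniform density cancel in the ratio.
    """
    return beta_int(k + 2, n - k + 1) / beta_int(k + 1, n - k + 1)


if __name__ == "__main__":
    # Exact check of the law of succession for all 0 <= k <= n < 20.
    assert all(
        succession_prob(n, k) == Fraction(k + 1, n + 2)
        for n in range(20)
        for k in range(n + 1)
    )
    print(succession_prob(10, 10))  # sun rose on all n = 10 days; prints 11/12
```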


9.58

Two basketball teams, A and B, play an $n$-game match. Let $X_j$ be the indicator of team A winning the $j$th game. Given $p$, the r.v.s $X_1, \dots, X_n$ are i.i.d. with $X_j \mid p \sim \text{Bern}(p)$. But $p$ is unknown, so we will treat it as an r.v. Let the prior distribution be $p \sim \text{Unif}(0, 1)$, and let $X$ be the number of wins for team A.

(a) Find $E(X)$ and $\text{Var}(X)$.

(b) Use Adam’s law to find the probability that team A will win game $j+1$, given that they win exactly $a$ of the first $j$ games. (The previous exercise, which is closely related, involves an equivalent LOTP proof.)

Hint: Letting $C$ be the event that team A wins exactly $a$ of the first $j$ games,

$$P(X_{j+1} = 1 \mid C) = E\big(E(X_{j+1} \mid C, p) \mid C\big) = E(p \mid C).$$
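The Adam’s law argument can be checked by brute force: draw $p$ from the uniform prior, simulate the first games, keep only the runs in which team A wins exactly the required number of them, and record how often the next game is a win. The Python sketch below does this by rejection sampling. Here `j` is the number of games observed and `a` the number of those won by team A; these names and the function name are illustrative, and the estimate becomes noisy when the conditioning event is rare.

```python
import random


def estimate_next_win(j, a, trials=200_000, seed=0):
    """Estimate P(team A wins game j+1 | A wins exactly a of the first j games)
    when p ~ Unif(0, 1) and games are conditionally i.i.d. Bern(p) given p."""
    rng = random.Random(seed)
    kept = next_wins = 0
    for _ in range(trials):
        p = rng.random()  # draw p from the Unif(0, 1) prior
        wins = sum(rng.random() < p for _ in range(j))
        if wins == a:  # rejection step: keep only runs where the event occurs
            kept += 1
            next_wins += rng.random() < p  # simulate game j+1 with the same p
    return next_wins / kept


if __name__ == "__main__":
    print(estimate_next_win(5, 3))
```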

(c) Find the PMF of $X$. (There are various ways to do this, including a very fast way to see it based on results from earlier chapters.)

(d) The Putnam exam from 2002 posed the following problem:

Shanille O’Keal shoots free throws on a basketball court. She hits the first and misses the second, and thereafter the [conditional] probability that she hits the next shot is equal to the proportion of shots she has hit so far. What is the probability she hits exactly 50 of her first 100 shots?

Solve this Putnam problem by applying the result of Part (c). Be sure to explain why it is valid to apply that result, despite the fact that the Putnam problem does not seem to be using the same model, e.g., it does not mention a prior distribution, let alone mention a $\text{Unif}(0, 1)$ prior.
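Independently of the argument through Part (c), the Putnam probability can be computed exactly by following the shooting rule step by step. The Python sketch below (function name hypothetical) tracks the distribution of the number of hits after each shot using exact fractions, starting from one hit in the first two shots, so its output can be compared with the answer obtained from Part (c).

```python
from fractions import Fraction


def putnam_prob(total_shots=100, target_hits=50):
    """Exact probability of exactly `target_hits` hits in the first
    `total_shots` free throws, by dynamic programming over the hit count."""
    # After 2 shots she has exactly 1 hit (hit the first, missed the second).
    dist = {1: Fraction(1)}  # maps number of hits so far -> probability
    for shots in range(2, total_shots):
        new_dist = {}
        for hits, prob in dist.items():
            p_hit = Fraction(hits, shots)  # proportion of shots hit so far
            new_dist[hits + 1] = new_dist.get(hits + 1, 0) + prob * p_hit
            new_dist[hits] = new_dist.get(hits, 0) + prob * (1 - p_hit)
        dist = new_dist
    return dist.get(target_hits, Fraction(0))


if __name__ == "__main__":
    print(putnam_prob())
```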


9.59

Let $X \mid p \sim \text{Bin}(n, p)$, with $p \sim \text{Beta}(a, b)$. So $X$ has a Beta-Binomial distribution, as mentioned in Story 8.3.3 and Example 8.5.3. Find $E(X)$ and $\text{Var}(X)$.
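Answers obtained with Adam’s and Eve’s laws can be checked against a brute-force computation from the PMF. The Python sketch below (function names hypothetical) sums over the Beta-Binomial PMF with exact fractions. Restricting $a$ and $b$ to positive integers is an assumption of the sketch, made so that the Beta function reduces to factorials; the exercise does not require it.

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(x, y):
    """Beta function B(x, y) for positive integers: (x-1)!(y-1)!/(x+y-1)!."""
    return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))


def beta_binomial_moments(n, a, b):
    """Exact E(X) and Var(X) for X | p ~ Bin(n, p), p ~ Beta(a, b).

    Uses the Beta-Binomial PMF P(X = k) = C(n, k) B(a + k, b + n - k) / B(a, b);
    a and b are restricted to positive integers so the arithmetic stays exact.
    """
    pmf = [comb(n, k) * beta_int(a + k, b + n - k) / beta_int(a, b) for k in range(n + 1)]
    assert sum(pmf) == 1  # sanity check: the PMF sums to 1
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second_moment = sum(k * k * pk for k, pk in enumerate(pmf))
    return mean, second_moment - mean**2


if __name__ == "__main__":
    print(beta_binomial_moments(10, 2, 3))  # hypothetical n = 10, a = 2, b = 3
```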


9.60

An election is being held. There are two candidates, A and B, and there are $n$ voters. The probability of voting for Candidate A varies by city. There are $m$ cities, labeled $1, 2, \dots, m$. The $j$th city has $n_j$ voters, so $n_1 + n_2 + \dots + n_m = n$. Let $X_j$ be the number of people in the $j$th city who vote for Candidate A, with $X_j \mid p_j \sim \text{Bin}(n_j, p_j)$. To reflect our uncertainty about the probability of voting for Candidate A in each city, we treat $p_1, \dots, p_m$ as r.v.s, with prior distribution asserting that they are i.i.d. $\text{Beta}(a, b)$. Assume that $X_1, \dots, X_m$ are independent, both unconditionally and conditional on $p_1, \dots, p_m$. Let $X$ be the total number of votes for Candidate A.

(a) Find the marginal distribution of $X_1$ and the posterior distribution of $p_1 \mid X_1 = x_1$.

(b) Find $E(X)$ and $\text{Var}(X)$ in terms of $n$ and $s$, where $s = n_1^2 + n_2^2 + \dots + n_m^2$.
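A simulation of the two-level model gives a numerical check on (b). The Python sketch below draws a separate voting probability for each city from the Beta prior and then that city's votes. The city sizes and Beta parameters in the example are hypothetical, and the Binomial draw is done as a sum of Bernoulli votes to keep to the standard library.

```python
import random
import statistics


def simulate_votes(city_sizes, a, b, trials=30_000, seed=0):
    """Monte Carlo mean and variance of X, the total votes for Candidate A.

    For each city j: p_j ~ Beta(a, b) independently, then X_j | p_j ~ Bin(n_j, p_j).
    city_sizes lists n_1, ..., n_m (hypothetical values in the example below).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        x = 0
        for n_j in city_sizes:
            p_j = rng.betavariate(a, b)  # this city's own voting probability
            # Binomial(n_j, p_j) draw as a sum of n_j independent Bernoulli votes.
            x += sum(rng.random() < p_j for _ in range(n_j))
        totals.append(x)
    return statistics.fmean(totals), statistics.pvariance(totals)


if __name__ == "__main__":
    # Hypothetical example: three cities with 10, 20, and 30 voters, prior Beta(2, 3).
    print(simulate_votes((10, 20, 30), 2, 3))
```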