Conditional expectation

Problems

Exercises marked “Stat110 solution available” have detailed solutions at http://stat110.net.

Conditional expectation given an event

9.1

Fred wants to travel from Blotchville to Blissville, and is deciding between 3 options (involving different routes or different forms of transportation). The $j$th option would take an average of $μ_{j}$ hours, with a standard deviation of $σ_{j}$ hours. Fred randomly chooses between the 3 options, with equal probabilities. Let $T$ be how long it takes for him to get from Blotchville to Blissville.

(a) Find $E (T)$ . Is it simply $(μ_{1} + μ_{2} + μ_{3}) /3$ , the average of the expectations?

(b) Find $Var (T)$ . Is it simply $(σ_{1}^{2} + σ_{2}^{2} + σ_{3}^{2}) /3$ , the average of the variances?
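A quick simulation is one way to check answers to (a) and (b). The Python sketch below is illustrative only: the exercise gives just the means and standard deviations, so it assumes (hypothetically) that each option's travel time is Normal, and the values of $μ_{j}$ and $σ_{j}$ are made up. Since $E(T)$ and $Var(T)$ depend only on the means and variances of the options, the Normal assumption does not change what is being estimated.

```python
import random

def simulate_travel_time(mus, sigmas, n_sims=200_000, seed=0):
    """Monte Carlo sketch for Exercise 9.1.

    Assumption (not part of the exercise): option j takes N(mu_j, sigma_j^2)
    hours. On each simulated trip, Fred picks an option uniformly at random.
    Returns the sample mean and sample variance of the simulated times T.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_sims):
        j = rng.randrange(len(mus))              # options are equally likely
        times.append(rng.gauss(mus[j], sigmas[j]))
    mean = sum(times) / n_sims
    var = sum((t - mean) ** 2 for t in times) / (n_sims - 1)
    return mean, var

mus, sigmas = (1.0, 2.0, 3.0), (0.5, 0.5, 0.5)   # hypothetical values, in hours
mean_t, var_t = simulate_travel_time(mus, sigmas)
print(f"simulated E(T)   ~ {mean_t:.3f}; average of the mu_j      = {sum(mus) / 3:.3f}")
print(f"simulated Var(T) ~ {var_t:.3f}; average of the sigma_j^2 = {sum(s * s for s in sigmas) / 3:.3f}")
```

Comparing each simulated value with the corresponding proposed average is a direct check on the two questions asked.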

9.2

While Fred is sleeping one night, $X$ legitimate emails and $Y$ spam emails are sent to him. Suppose that $X$ and $Y$ are independent, with $X \sim Pois (10)$ and $Y \sim Pois (40)$ . When he wakes up, he observes that he has 30 new emails in his inbox. Given this information, what is the expected value of how many new legitimate emails he has?
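A closed-form answer here can be checked numerically by computing the conditional PMF directly from Bayes' rule and independence, $P(X = k ∣ X + Y = 30) \propto P(X = k) P(Y = 30 - k)$ for $k = 0, \dots, 30$, and then summing. A minimal Python sketch (the function names are illustrative, not from any library):

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Pois(lam), computed on the log scale to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def conditional_mean_given_total(lam_x, lam_y, total):
    """E(X | X + Y = total) for independent X ~ Pois(lam_x) and Y ~ Pois(lam_y)."""
    # Bayes' rule: P(X = k | X + Y = total) is proportional to P(X = k) P(Y = total - k).
    weights = [poisson_pmf(k, lam_x) * poisson_pmf(total - k, lam_y)
               for k in range(total + 1)]
    return sum(k * w for k, w in enumerate(weights)) / sum(weights)

print(conditional_mean_given_total(10, 40, 30))
```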

9.3

A group of 21 women and 14 men are enrolled in a medical study. Each of them has a certain disease with probability $p$ , independently. It is then found (through extremely reliable testing) that exactly 5 of the people have the disease. Given this information, what is the expected number of women who have the disease?

9.4

A researcher studying crime is interested in how often people have gotten arrested. Let $X \sim Pois (λ)$ be the number of times that a random person got arrested in the last 10 years. However, data from police records are being used for the researcher’s study, and people who were never arrested in the last 10 years do not appear in the records. In other words, the police records have a selection bias: they only contain information on people who have been arrested in the last 10 years.

So averaging the numbers of arrests for people in the police records does not directly estimate $E (X)$ ; it makes more sense to think of the police records as giving us information about the conditional distribution of how many times a person was arrested, given that the person was arrested at least once in the last 10 years. The conditional distribution of $X$ , given that $X \geq 1$ , is called a truncated Poisson distribution (see Exercise 14 from Chapter 3 for another example of this distribution).

(a) Find $E (X ∣ X \geq 1)$ .

(b) Find $Var (X ∣ X \geq 1)$ .
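A direct numerical check is to sum the series for the conditional PMF $P(X = k)/P(X \geq 1)$, $k \geq 1$, cut off far in the right tail. The sketch below uses an illustrative value $λ = 2$; its output can be compared with closed-form answers to (a) and (b).

```python
import math

def truncated_poisson_moments(lam, max_k=None):
    """Mean and variance of X given X >= 1, for X ~ Pois(lam), by direct summation.

    The conditional PMF is P(X = k) / P(X >= 1) for k >= 1; the series is
    truncated at max_k, which defaults to a point far in the right tail.
    """
    if max_k is None:
        max_k = int(lam + 20 * math.sqrt(lam) + 50)
    p_at_least_1 = -math.expm1(-lam)        # 1 - e^{-lam}, accurate even for small lam
    m1 = m2 = 0.0
    for k in range(1, max_k + 1):
        p = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) / p_at_least_1
        m1 += k * p
        m2 += k * k * p
    return m1, m2 - m1 ** 2                 # conditional mean and variance

print(truncated_poisson_moments(2.0))
```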

9.5

A fair 20-sided die is rolled repeatedly, until a gambler decides to stop. The gambler pays \$1 per roll, and receives the amount shown on the die when the gambler stops (e.g., if the die is rolled 7 times and the gambler decides to stop then, with an 18 as the value of the last roll, then the net payoff is \$18 - \$7 = \$11). Suppose the gambler uses the following strategy: keep rolling until a value of $m$ or greater is obtained, and then stop (where $m$ is a fixed integer between 1 and 20).

(a) What is the expected net payoff?

Hint: The average of consecutive integers $a, a + 1, \dots, a + n$ is the same as the average of the first and last of these. See the math appendix for more information about series.

(b) Use R or other software to find the optimal value of $m$ .
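The exercise suggests R; the sketch below uses Python instead. One possible approach computes the expected net payoff exactly for each $m$ by conditioning on the first roll $R$: that roll costs one dollar, and if $R < m$ the game effectively starts over, so the expected net payoff (called $v_{m}$ here, a label introduced only for this sketch) satisfies $v_{m} = -1 + \sum_{k = m}^{20} k/20 + P(R < m)\, v_{m}$. Exact fractions avoid rounding when comparing neighboring values of $m$.

```python
from fractions import Fraction

SIDES = 20

def expected_net_payoff(m, sides=SIDES):
    """Exact expected net payoff of 'stop at the first roll >= m'.

    From v = -1 + sum_{k >= m} k/sides + P(roll < m) * v, solve for v.
    """
    p_stop = Fraction(sides - m + 1, sides)                        # P(roll >= m)
    reward_if_stop = sum(Fraction(k, sides) for k in range(m, sides + 1))
    return (reward_if_stop - 1) / p_stop

payoffs = {m: expected_net_payoff(m) for m in range(1, SIDES + 1)}
best_m = max(payoffs, key=payoffs.get)
print(best_m, float(payoffs[best_m]))
```

A simulation of the strategy would also work, but neighboring values of $m$ give expected payoffs close enough that Monte Carlo noise could make the comparison unreliable.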

9.6

Let $X \sim Expo (λ)$ . Find $E (X ∣ X < 1)$ in two different ways:

(a) by calculus, working with the conditional PDF of $X$ given $X < 1$ ;

(b) without calculus, by expanding $E (X)$ using the law of total expectation.
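Either answer can be checked against a direct numerical integral of $x$ times the conditional PDF $λ e^{-λ x} / (1 - e^{-λ})$ on $(0, 1)$. The sketch below uses the midpoint rule; the default number of steps is an arbitrary choice that is more than enough for this smooth integrand.

```python
import math

def conditional_mean_below(lam, c=1.0, steps=100_000):
    """E(X | X < c) for X ~ Expo(lam), via the midpoint rule.

    The conditional PDF on (0, c) is lam * exp(-lam * x) / P(X < c).
    """
    h = c / steps
    p_below = -math.expm1(-lam * c)         # P(X < c) = 1 - e^{-lam c}
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h                   # midpoint of the i-th subinterval
        total += x * lam * math.exp(-lam * x)
    return total * h / p_below

print(conditional_mean_below(2.0))
```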

9.7

You are given an opportunity to bid on a mystery box containing a mystery prize. The value of the prize is completely unknown, except that it is worth at least nothing, and at most a million dollars. So the true value $V$ of the prize is considered to be Uniform on $[0, 1]$ (measured in millions of dollars).

You can choose to bid any nonnegative amount $b$ (in millions of dollars). If $b < V /4$ , then your bid is rejected and nothing is gained or lost. If $b \geq V /4$ , then your bid is accepted and your net payoff is $V - b$ (since you pay $b$ to get a prize worth $V$ ).

Find your expected payoff as a function of $b$ (be sure to specify it for all $b \geq 0$ ). Then find the optimal bid $b$ , to maximize your expected payoff.
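Before optimizing analytically, a simulation can show the shape of the expected payoff as a function of $b$, including bids large enough that they are always accepted. This Monte Carlo sketch uses only the model stated in the exercise; the grid of bids is arbitrary.

```python
import random

def expected_bid_payoff(b, n_sims=200_000, seed=0):
    """Monte Carlo estimate of the expected payoff (in millions of dollars) from bid b.

    V ~ Unif(0, 1). The bid is accepted iff b >= V / 4, in which case the
    payoff is V - b; a rejected bid has payoff 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        v = rng.random()
        if b >= v / 4:
            total += v - b
    return total / n_sims

for b in (0.0, 0.1, 0.2, 0.25, 0.3, 0.5):   # an arbitrary grid of bids
    print(f"b = {b:.2f}: estimated expected payoff {expected_bid_payoff(b):+.4f}")
```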

9.8

Stat110 solution available.

You get to choose between two envelopes, each of which contains a check for some positive amount of money. Unlike in the two-envelope paradox, it is not given that one envelope contains twice as much money as the other envelope. Instead, assume that the two values were generated independently from some distribution on the positive real numbers, with no information given about what that distribution is.

After picking an envelope, you can open it and see how much money is inside (call this value $x$ ), and then you have the option of switching. As no information has been given about the distribution, it may seem impossible to have better than a 50% chance of picking the better envelope. Intuitively, we may want to switch if $x$ is “small” and not switch if $x$ is “large”, but how do we define “small” and “large” in the grand scheme of all possible distributions? [The last sentence was a rhetorical question.]

Consider the following strategy for deciding whether to switch. Generate a threshold $T \sim Expo (1)$ , and switch envelopes if and only if the observed value $x$ is less than the value of $T$ . Show that this strategy succeeds in picking the envelope with more money with probability strictly greater than $1/2$ .

Hint: Let $t$ be the value of $T$ (generated by a random draw from the $Expo (1)$ distribution). First explain why the strategy works very well if $t$ happens to be in between the two envelope values, and does no harm in any case (i.e., there is no case in which the strategy succeeds with probability strictly less than $1/2$ ).
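The strategy can also be explored by simulation. Because the exercise allows any distribution for the envelope amounts, the sketch below takes that distribution as a parameter. The three distributions tried (Expo(1), Log-Normal, and Uniform on $[0, 1000]$) are hypothetical examples; the strategy itself never uses them.

```python
import random

def threshold_strategy_success(draw_amount, n_sims=200_000, seed=0):
    """Monte Carlo estimate of the probability of ending with the larger check.

    draw_amount(rng) returns one envelope amount; the two amounts are i.i.d.
    The strategy opens one envelope and switches iff the observed amount is
    below an independent threshold T ~ Expo(1).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        opened, other = draw_amount(rng), draw_amount(rng)   # i.i.d., so this is a random pick
        t = rng.expovariate(1.0)
        final = other if opened < t else opened
        wins += final == max(opened, other)
    return wins / n_sims

p_expo = threshold_strategy_success(lambda rng: rng.expovariate(1.0))
p_lognormal = threshold_strategy_success(lambda rng: rng.lognormvariate(0.0, 1.0))
p_wide_uniform = threshold_strategy_success(lambda rng: 1000 * rng.random())
print(p_expo, p_lognormal, p_wide_uniform)
```

The estimated advantage over 1/2 depends strongly on the amount distribution: when the amounts are typically far above the scale of $T$, the threshold rarely lands between them, and the advantage can be too small to detect with this many simulations.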

9.9

There are two envelopes, each of which has a check for a $Unif (0, 1)$ amount of money, measured in thousands of dollars. The amounts in the two envelopes are independent. You get to choose an envelope and open it, and then you can either keep that amount or switch to the other envelope and get whatever amount is in that envelope.

Suppose that you use the following strategy: choose an envelope and open it. If you observe $U$ , then stick with that envelope with probability $U$ , and switch to the other envelope with probability $1 - U$ .

(a) Find the probability that you get the larger of the two amounts.

(b) Find the expected value of what you will receive.

9.10

Suppose $n$ people are bidding on a mystery prize that is up for auction. The bids are to be submitted in secret, and the individual who submits the highest bid wins the prize. The $i$th bidder receives a signal $X_{i}$, with $X_{1}, \dots, X_{n}$ i.i.d. The value of the prize, $V$, is defined to be the sum of the individual bidders’ signals:

V = X_{1} + \dots + X_{n} .

This is known in economics as the wallet game: we can imagine that the $n$ people are bidding on the total amount of money in their wallets, and each person’s signal is the amount of money in their own wallet. Of course, the wallet is a metaphor; the game can also be used to model company takeovers, where each of two companies bids to take over the other, and a company knows its own value but not the value of the other company. For this problem, assume the $X_{i}$ are i.i.d. $Unif (0, 1)$ .

(a) Before receiving her signal, what is bidder 1’s unconditional expectation for $V$ ?

(b) Conditional on receiving the signal $X_{1} = x_{1}$ , what is bidder 1’s expectation for $V$ ?

(c) Suppose each bidder submits a bid equal to their conditional expectation for $V$ , i.e., bidder $i$ bids $E (V ∣ X_{i} = x_{i})$ . Conditional on receiving the signal $X_{1} = x_{1}$ and winning the auction, what is bidder 1’s expectation for $V$ ? Explain intuitively why this quantity is always less than the quantity calculated in (b).

9.11

Stat110 solution available.

A coin with probability $p$ of Heads is flipped repeatedly. For (a) and (b), suppose that $p$ is a known constant, with $0 < p < 1$ .

(a) What is the expected number of flips until the pattern HT is observed?

(b) What is the expected number of flips until the pattern HH is observed?

(c) Now suppose that $p$ is unknown, and that we use a $Beta (a, b)$ prior to reflect our uncertainty about $p$ (where $a$ and $b$ are known constants and are greater than 2). In terms of $a$ and $b$ , find the corresponding answers to (a) and (b) in this setting.
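The expected waiting times can be estimated by simulating flips until the pattern appears. In the sketch below, `draw_p` (a name introduced here) supplies the Heads probability for each simulated coin: a constant for (a) and (b), or a fresh draw from a Beta prior via Python's `random.betavariate` for (c). The prior parameters $a = b = 6$ in the last lines are illustrative.

```python
import random

def mean_wait_for_pattern(pattern, draw_p, n_sims=50_000, seed=0):
    """Monte Carlo estimate of the expected number of flips until `pattern`
    (a string such as "HT" or "HH") first appears.

    draw_p(rng) returns the Heads probability used for one simulated coin:
    a constant for parts (a) and (b), or a draw from a prior for part (c).
    """
    rng = random.Random(seed)
    k = len(pattern)
    total = 0
    for _ in range(n_sims):
        p = draw_p(rng)
        recent, flips = "", 0
        while recent != pattern:
            flips += 1
            recent = (recent + ("H" if rng.random() < p else "T"))[-k:]   # last k flips
        total += flips
    return total / n_sims

ht_half = mean_wait_for_pattern("HT", lambda rng: 0.5)
hh_half = mean_wait_for_pattern("HH", lambda rng: 0.5)
ht_beta = mean_wait_for_pattern("HT", lambda rng: rng.betavariate(6, 6))
hh_beta = mean_wait_for_pattern("HH", lambda rng: rng.betavariate(6, 6))
print(ht_half, hh_half)
print(ht_beta, hh_beta)
```

For (c), the waiting time has a very heavy right tail when $a$ is not much larger than 2, so Monte Carlo estimates then converge slowly; treat them as rough checks.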

9.12

A coin with probability $p$ of Heads is flipped repeatedly, where $0 < p < 1$ . The sequence of outcomes can be divided into runs (blocks of H’s or blocks of T’s), e.g., HHHTTTTHTTTHH becomes HHH TTTT H TTT HH, which has 5 runs, with lengths $3, 4, 1, 3, 2$ , respectively. Assume that the coin is flipped at least until the start of the third run.

(a) Find the expected length of the first run.

(b) Find the expected length of the second run.

9.13

A fair 6-sided die is rolled once. Find the expected number of additional rolls needed to obtain a value at least as large as that of the first roll.

9.14

A fair 6-sided die is rolled repeatedly.

(a) Find the expected number of rolls needed to get a 1 followed right away by a 2.

Hint: Start by conditioning on whether or not the first roll is a 1.

(b) Find the expected number of rolls needed to get two consecutive 1’s.

(c) Let $a_{n}$ be the expected number of rolls needed to get the same value $n$ times in a row (i.e., to obtain a streak of $n$ consecutive $j$ ’s for some not-specified-in-advance value of $j$ ). Find a recursive formula for $a_{n + 1}$ in terms of $a_{n}$ .

Hint: Divide the time until there are $n + 1$ consecutive appearances of the same value into two pieces: the time until there are $n$ consecutive appearances, and the rest.

(d) Find a simple, explicit formula for $a_{n}$ for all $n \geq 1$ . What is $a_{7}$ (numerically)?
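Simulation gives a numerical check on (a), (b), and the first few values of $a_{n}$. The sketch below has one function for a fixed pattern of consecutive values and one for a streak of an unspecified value; the number of simulations is an arbitrary choice.

```python
import random

def mean_wait_for_rolls(pattern, n_sims=20_000, sides=6, seed=0):
    """Monte Carlo estimate of the expected number of rolls until the values in
    `pattern` (a tuple such as (1, 2)) appear as consecutive rolls."""
    rng = random.Random(seed)
    k = len(pattern)
    total = 0
    for _ in range(n_sims):
        recent, rolls = (), 0
        while recent != pattern:
            rolls += 1
            recent = (recent + (rng.randint(1, sides),))[-k:]   # keep the last k rolls
        total += rolls
    return total / n_sims

def mean_wait_for_streak(n, n_sims=20_000, sides=6, seed=0):
    """Monte Carlo estimate of a_n: the expected number of rolls until some value
    (not specified in advance) has appeared n times in a row."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sims):
        prev, streak, rolls = None, 0, 0
        while streak < n:
            roll = rng.randint(1, sides)
            rolls += 1
            streak = streak + 1 if roll == prev else 1   # extend or restart the streak
            prev = roll
        total += rolls
    return total / n_sims

wait_12 = mean_wait_for_rolls((1, 2))
wait_11 = mean_wait_for_rolls((1, 1))
streaks = {n: mean_wait_for_streak(n) for n in (1, 2, 3)}
print(wait_12, wait_11, streaks)
```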

Conditional expectation given a random variable

9.15

Stat110 solution available.

Let $X_{1}, X_{2}$ be i.i.d., and let $\bar{X} = \frac{1}{2} (X_{1} + X_{2})$ be the sample mean. In many statistics problems, it is useful or important to obtain a conditional expectation given $\bar{X}$. As an example of this, find $E (w_{1} X_{1} + w_{2} X_{2} ∣ \bar{X})$, where $w_{1}, w_{2}$ are constants with $w_{1} + w_{2} = 1$.

9.16

Let $X_{1}, X_{2}, \dots$ be i.i.d. r.v.s with mean 0, and let $S_{n} = X_{1} + \dots + X_{n}$ . As shown in Example 9.3.6, the expected value of the first term given the sum of the first $n$ terms is

E (X_{1} ∣ S_{n}) = \frac{S _{n}}{n} .

Generalize this result by finding $E (S_{k} ∣ S_{n})$ for all positive integers $k$ and $n$ .

9.17

Stat110 solution available.

Consider a group of $n$ roommate pairs at a college (so there are $2 n$ students). Each of these $2 n$ students independently decides randomly whether to take a certain course, with probability $p$ of success (where “success” is defined as taking the course).

Let $N$ be the number of students among these $2 n$ who take the course, and let $X$ be the number of roommate pairs where both roommates in the pair take the course. Find $E (X)$ and $E (X ∣ N)$ .

9.18

Stat110 solution available.

Show that

E ((Y - E (Y ∣ X))^{2} ∣ X) = E (Y^{2} ∣ X) - (E (Y ∣ X))^{2},

so these two expressions for $Var (Y ∣ X)$ agree.

Hint for the variance: Adding a constant (or something acting as a constant) does not affect variance.

9.19

Let $X$ be the height of a randomly chosen adult man, and $Y$ be his father’s height, where $X$ and $Y$ have been standardized to have mean 0 and standard deviation 1. Suppose that $(X, Y)$ is Bivariate Normal, with $X, Y \sim N (0, 1)$ and $Corr (X, Y) = ρ$ .

(a) Let $y = a x + b$ be the equation of the best line for predicting $Y$ from $X$ (in the sense of minimizing the mean squared error), e.g., if we were to observe $X = 1.3$ then we would predict that $Y$ is $1.3 a + b$. Now suppose that we want to use $Y$ to predict $X$, rather than using $X$ to predict $Y$. Give and explain an intuitive guess for the slope of the best line for predicting $X$ from $Y$.

(b) Find a constant $c$ (in terms of $ρ$ ) and an r.v. $V$ such that $Y = c X + V$ , with $V$ independent of $X$ .

Hint: Start by finding $c$ such that $Cov (X, Y - c X) = 0$ .

(d) Find $E (Y ∣ X)$ and $E (X ∣ Y)$ .

(e) Reconcile (a) and (d), if your intuitive guess in (a) differed from what the results of (d) implied. Give a clear and correct intuitive explanation of the relationship between the slope of the best line for predicting $Y$ from $X$ and the slope of the best line for predicting $X$ from $Y$ .

9.20

Let $X \sim Mult_{5} (n, p)$ .

(a) Find $E (X_{1} ∣ X_{2})$ and $Var (X_{1} ∣ X_{2})$ .

(b) Find $E (X_{1} ∣ X_{2} + X_{3})$ .

9.21

Let $Y$ be a discrete r.v., $A$ be an event with $0 < P (A) < 1$ , and $I_{A}$ be the indicator r.v. for $A$ .

(a) Explain precisely how the r.v. $E (Y ∣ I_{A})$ relates to the numbers $E (Y ∣ A)$ and $E (Y ∣ A^{c})$ .

(b) Show that $E (Y ∣ A) = E (Y I_{A}) / P (A)$ , directly from the definitions of expectation and conditional expectation.

Hint: Let $X = Y I_{A}$ , and then find an expression for the PMF of $X$ .

(c) Use (b) to show that

E (Y) = E (Y ∣ A) P (A) + E (Y ∣ A^{c}) P (A^{c}) .

9.22

Show that the following version of LOTP, which we encountered in Section 7.1, is also a consequence of Adam’s law: for any event $A$ and continuous r.v. $X$ with PDF $f_{X}$ ,

P (A) = \int_{- \infty}^{\infty} P (A ∣ X = x) f_{X} (x) d x .

Hint: Consider $E (I (A) ∣ X = x)$ .

9.23

Stat110 solution available.

Let $X$ and $Y$ be random variables with finite variances, and let $W = Y - E (Y ∣ X)$ . This is a residual: the difference between the true value of $Y$ and the predicted value of $Y$ based on $X$ .

(a) Compute $E (W)$ and $E (W ∣ X)$ .

(b) Compute $Var (W)$ , for the case that $W ∣ X \sim N (0, X^{2})$ with $X \sim N (0, 1)$ .

9.24

Stat110 solution available.

One of two identical-looking coins is picked from a hat randomly, where one coin has probability $p_{1}$ of Heads and the other has probability $p_{2}$ of Heads. Let $X$ be the number of Heads after flipping the chosen coin $n$ times. Find the mean and variance of $X$ .

9.25

Kelly makes a series of $n$ bets, each of which she has probability $p$ of winning, independently. Initially, she has $x_{0}$ dollars. Let $X_{j}$ be the amount she has immediately after her $j$th bet is settled. Let $f$ be a constant in $(0, 1)$, called the betting fraction. On each bet, Kelly wagers a fraction $f$ of her wealth, and then she either wins or loses that amount. For example, if her current wealth is \$100 and $f = 0.25$, then she bets \$25 and either gains or loses that amount. (A famous choice when $p > 1/2$ is $f = 2 p - 1$, which is known as the Kelly criterion.) Find $E (X_{n})$ (in terms of $n, p, f, x_{0}$).

Hint: First find $E (X_{j + 1} ∣ X_{j})$ .
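A simulation can check a formula for $E(X_{n})$. The sketch below follows the betting rule in the exercise directly; the values $n = 10$, $p = 0.6$, and $x_{0} = 100$ are hypothetical, and $f = 0.2 = 2p - 1$ is the Kelly choice for that $p$.

```python
import random

def simulate_final_wealth(n, p, f, x0, n_sims=100_000, seed=0):
    """Monte Carlo estimate of E(X_n) when a fraction f of current wealth is
    wagered on each of n independent bets, each won with probability p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        wealth = x0
        for _ in range(n):
            stake = f * wealth
            wealth += stake if rng.random() < p else -stake   # win or lose the stake
        total += wealth
    return total / n_sims

mean_final = simulate_final_wealth(n=10, p=0.6, f=0.2, x0=100)
print(mean_final)
```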

9.26

Let $N \sim Pois (λ_{1})$ be the number of movies that will be released next year. Suppose that for each movie the number of tickets sold is $Pois (λ_{2})$ , independent of other movies and of $N$ . Find the mean and variance of the number of movie tickets that will be sold next year.
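The mean and variance of this random sum can be checked by simulation. Real ticket counts would be far larger; the sketch below uses small hypothetical rates so that the simple uniform-product Poisson sampler (Knuth's method, which becomes slow for large rates) stays fast.

```python
import math
import random

def poisson_sample(lam, rng):
    """One draw from Pois(lam) by multiplying uniforms (Knuth's method; fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_ticket_totals(lam1, lam2, n_sims=50_000, seed=0):
    """Sample mean and variance of the total number of tickets, where
    N ~ Pois(lam1) movies each sell an independent Pois(lam2) number of tickets."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        n_movies = poisson_sample(lam1, rng)
        totals.append(sum(poisson_sample(lam2, rng) for _ in range(n_movies)))
    mean = sum(totals) / n_sims
    var = sum((t - mean) ** 2 for t in totals) / (n_sims - 1)
    return mean, var

mean_total, var_total = simulate_ticket_totals(lam1=3, lam2=4)   # hypothetical rates
print(mean_total, var_total)
```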

9.27

A party is being held from 8:00 pm to midnight on a certain night, and $N \sim Pois (λ)$ people are going to show up. They will all arrive at uniformly random times while the party is going on, independently of each other and of $N$ .

(a) Find the expected time at which the first person arrives, given that at least one person shows up. Give both an exact answer in terms of $λ$ , measured in minutes after 8:00 pm, and an answer rounded to the nearest minute for $λ = 20$ , expressed in time notation (e.g., 8:20 pm).

(b) Find the expected time at which the last person arrives, given that at least one person shows up. As in (a), give both an exact answer and an answer rounded to the nearest minute for $λ = 20$ .
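A simulation can check both answers, including the rounding for $λ = 20$. The sketch below measures time in minutes after 8:00 pm, discards simulated parties with no guests (to condition on $N \geq 1$), and uses a simple uniform-product Poisson sampler that is adequate for moderate $λ$.

```python
import math
import random

PARTY_MINUTES = 240   # 8:00 pm to midnight

def poisson_sample(lam, rng):
    """One draw from Pois(lam) by multiplying uniforms (Knuth's method)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def mean_first_and_last_arrival(lam, n_sims=50_000, seed=0):
    """Monte Carlo estimates of E(first arrival | N >= 1) and E(last arrival | N >= 1),
    in minutes after 8:00 pm, for N ~ Pois(lam) guests arriving i.i.d. uniformly."""
    rng = random.Random(seed)
    firsts, lasts = [], []
    for _ in range(n_sims):
        n = poisson_sample(lam, rng)
        if n == 0:
            continue                      # condition on at least one guest showing up
        times = [PARTY_MINUTES * rng.random() for _ in range(n)]
        firsts.append(min(times))
        lasts.append(max(times))
    return sum(firsts) / len(firsts), sum(lasts) / len(lasts)

first, last = mean_first_and_last_arrival(20)
print(f"first arrival ~ {first:.1f} min after 8:00 pm; last arrival ~ {last:.1f} min after 8:00 pm")
```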

9.28

Stat110 solution available.

We wish to estimate an unknown parameter $θ$ , based on an r.v. $X$ we will get to observe. As in the Bayesian perspective, assume that $X$ and $θ$ have a joint distribution. Let $\hat{θ}$ be the estimator (which is a function of $X$ ). Then $\hat{θ}$ is said to be unbiased if $E (\hat{θ} ∣ θ) = θ$ , and $\hat{θ}$ is said to be the Bayes procedure if $E (θ ∣ X) = \hat{θ}$ .

(a) Let $\hat{θ}$ be unbiased. Find $E (\hat{θ} - θ)^{2}$ (the average squared difference between the estimator and the true value of $θ$ ), in terms of marginal moments of $\hat{θ}$ and $θ$ .

Hint: Condition on $θ$ .

(b) Repeat (a), except in this part suppose that $\hat{θ}$ is the Bayes procedure rather than assuming that it is unbiased.

Hint: Condition on $X$ .

(c) Show that it is impossible for $\hat{θ}$ to be both the Bayes procedure and unbiased, except in silly problems where we get to know $θ$ perfectly by observing $X$ .

Hint: If $Y$ is a nonnegative r.v. with mean 0, then $P (Y = 0) = 1$ .

9.29

Show that if $E (Y ∣ X) = c$ is a constant, then $X$ and $Y$ are uncorrelated.

Hint: Use Adam’s law to find $E (Y)$ and $E (X Y)$ .

9.30

Show by example that it is possible to have uncorrelated $X$ and $Y$ such that $E (Y ∣ X)$ is not a constant.

Hint: Consider a standard Normal and its square.

9.31

Stat110 solution available.

Emails arrive one at a time in an inbox. Let $T_{n}$ be the time at which the $n$th email arrives (measured on a continuous scale from some starting point in time). Suppose that the waiting times between emails are i.i.d. $Expo (λ)$, i.e., $T_{1}, T_{2} - T_{1}, T_{3} - T_{2}, \dots$ are i.i.d. $Expo (λ)$.

Each email is non-spam with probability $p$ , and spam with probability $q = 1 - p$ (independently of the other emails and of the waiting times). Let $X$ be the time at which the first non-spam email arrives (so $X$ is a continuous r.v., with $X = T_{1}$ if the 1st email is non-spam, $X = T_{2}$ if the 1st email is spam but the 2nd one isn’t, etc.).

(a) Find the mean and variance of $X$ .

(b) Find the MGF of $X$ . What famous distribution does this imply that $X$ has (be sure to state its parameter values)?

Hint for both parts: Let $N$ be the number of emails until the first non-spam (including that one), and write $X$ as a sum of $N$ terms; then condition on $N$ .

9.32

Customers arrive at a store according to a Poisson process of rate $λ$ customers per hour. Each makes a purchase with probability $p$ , independently. Given that a customer makes a purchase, the amount spent has mean $μ$ (in dollars) and variance $σ^{2}$ .

(a) Find the mean and variance of how much a random customer spends (note that the customer may spend nothing).

(b) Find the mean and variance of the revenue the store obtains in an 8-hour time interval, using (a) and results from this chapter.

(c) Find the mean and variance of the revenue the store obtains in an 8-hour time interval, using the chicken-egg story and results from this chapter.

9.33

Fred’s beloved computer will last an $Expo (λ)$ amount of time until it has a malfunction. When that happens, Fred will try to get it fixed. With probability $p$ , he will be able to get it fixed. If he is able to get it fixed, the computer is good as new again and will last an additional, independent $Expo (λ)$ amount of time until the next malfunction (when again he is able to get it fixed with probability $p$ , and so on). If after any malfunction Fred is unable to get it fixed, he will buy a new computer. Find the expected amount of time until Fred buys a new computer. (Assume that the time spent on computer diagnosis, repair, and shopping is negligible.)

9.34

A green die is rolled until it lands 1 for the first time. An orange die is rolled until it lands 6 for the first time. The dice are fair, six-sided dice. Let $T_{1}$ be the sum of the values of the rolls of the green die (including the 1 at the end) and $T_{6}$ be the sum of the values of the rolls of the orange die (including the 6 at the end). Two students are debating whether $E (T_{1}) = E (T_{6})$ or $E (T_{1}) < E (T_{6})$ . They kindly gave permission to quote their arguments here.

Student A: We have $E (T_{1}) = E (T_{6})$ . By Adam’s law, the expected sum of the rolls of a die is the expected number of rolls times the expected value of one roll, and each of these factors is the same for the two dice. In more detail, let $N_{1}$ be the number of rolls of the green die and $N_{6}$ be the number of rolls of the orange die. By Adam’s law and linearity,

E (T_{1}) = E (E (T_{1} ∣ N_{1})) = E (3.5 N_{1}) = 3.5 E (N_{1}),

and the same method applied to the orange die gives $3.5 E (N_{6})$ , which equals $3.5 E (N_{1})$ .

Student B: Actually, $E (T_{1}) < E (T_{6})$ . I agree that the expected number of rolls is the same for the two dice, but the key difference is that we know the last roll is a 1 for the green die and a 6 for the orange die. The expected totals are the same for the two dice excluding the last roll of each, and then including the last roll makes $E (T_{1}) < E (T_{6})$ .

(a) Discuss in words the extent to which Student A’s argument is convincing and correct.

(b) Discuss in words the extent to which Student B’s argument is convincing and correct.
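Before weighing the two arguments, it may help to see what $E(T_{1})$ and $E(T_{6})$ look like numerically. The simulation below estimates both; it does not, by itself, say which steps of either argument are valid.

```python
import random

def mean_total_until(target, n_sims=50_000, sides=6, seed=0):
    """Monte Carlo estimate of the expected sum of all rolls of a fair die,
    rolled until `target` first appears (the final roll is included)."""
    rng = random.Random(seed)
    grand_total = 0
    for _ in range(n_sims):
        while True:
            roll = rng.randint(1, sides)
            grand_total += roll
            if roll == target:
                break
    return grand_total / n_sims

t1 = mean_total_until(1)   # green die
t6 = mean_total_until(6)   # orange die
print(f"E(T_1) ~ {t1:.2f}, E(T_6) ~ {t6:.2f}")
```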

9.35

Stat110 solution available.

Judit plays in a total of $N \sim Geom (s)$ chess tournaments in her career. Suppose that in each tournament she has probability $p$ of winning the tournament, independently. Let $T$ be the number of tournaments she wins in her career.

(a) Find the mean and variance of $T$ .

(b) Find the MGF of $T$ . What is the name of this distribution (with its parameters)?

9.36

In Story 8.4.5, we showed (among other things) that if $λ \sim Gamma (r_{0}, b_{0})$ and $Y ∣ λ \sim Pois (λ)$ , then the marginal distribution of $Y$ is $NBin (r_{0}, b_{0} / (b_{0} + 1))$ . Derive this result using Adam’s law and MGFs.

Hint: Consider the conditional MGF of $Y ∣ λ$ .

9.37

Let $X_{1}, \dots, X_{n}$ be i.i.d. r.v.s with mean $μ$ and variance $σ^{2}$, and $n \geq 2$. A bootstrap sample of $X_{1}, \dots, X_{n}$ is a sample of $n$ r.v.s $X_{1}^{*}, \dots, X_{n}^{*}$ formed from the $X_{j}$ by sampling with replacement with equal probabilities. Let $\bar{X}^{*}$ denote the sample mean of the bootstrap sample:

\bar{X}^{*} = \frac{1}{n} (X_{1}^{*} + \dots + X_{n}^{*}) .

(a) Calculate $E (X_{j}^{*})$ and $Var (X_{j}^{*})$ for each $j$ .

(b) Calculate $E (\bar{X}^{*} ∣ X_{1}, \dots, X_{n})$ and $Var (\bar{X}^{*} ∣ X_{1}, \dots, X_{n})$.

Hint: Conditional on $X_{1}, \dots, X_{n}$ , the $X_{j}^{*}$ are independent, with a PMF that puts probability $1/ n$ at each of the points $X_{1}, \dots, X_{n}$ . As a check, your answers should be random variables that are functions of $X_{1}, \dots, X_{n}$ .

(d) Explain intuitively why $Var (\bar{X}) < Var (\bar{X}^{*})$.
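Conditional on the data, the moments in (b) can be approximated by resampling many times and averaging. The data vector in this sketch is hypothetical; evaluating answers to (b) at these four values should match the printed estimates up to Monte Carlo error.

```python
import random

def bootstrap_mean_moments(data, n_boot=100_000, seed=0):
    """Estimate E(Xbar* | data) and Var(Xbar* | data) by drawing n_boot
    bootstrap samples (len(data) draws with replacement) and averaging."""
    rng = random.Random(seed)
    n = len(data)
    boot_means = [sum(rng.choices(data, k=n)) / n for _ in range(n_boot)]
    center = sum(boot_means) / n_boot
    spread = sum((m - center) ** 2 for m in boot_means) / n_boot
    return center, spread

data = [1.0, 2.0, 3.0, 4.0]   # a hypothetical observed sample
boot_mean, boot_var = bootstrap_mean_moments(data)
print(boot_mean, boot_var)
```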

9.38

An insurance company covers disasters in two neighboring regions, $R_{1}$ and $R_{2}$ . Let $I_{1}$ and $I_{2}$ be the indicator r.v.s for whether $R_{1}$ and $R_{2}$ are hit by the insured disaster, respectively. The indicators $I_{1}$ and $I_{2}$ may be dependent. Let $p_{j} = E (I_{j})$ for $j = 1, 2$ , and $p_{12} = E (I_{1} I_{2})$ .

The company reimburses a total cost of

C = I_{1} T_{1} + I_{2} T_{2}

to these regions, where $T_{j}$ has mean $μ_{j}$ and variance $σ_{j}^{2}$ . Assume that $T_{1}$ and $T_{2}$ are independent of each other and that $(T_{1}, T_{2})$ is independent of $(I_{1}, I_{2})$ .

(a) Find $E (C)$ .

(b) Find $Var (C)$ .

9.39

Stat110 solution available.

A certain stock has low volatility on some days and high volatility on other days. Suppose that the probability of a low volatility day is $p$ and of a high volatility day is $q = 1 - p$ , and that on low volatility days the percent change in the stock price is $N (0, σ_{1}^{2})$ , while on high volatility days the percent change is $N (0, σ_{2}^{2})$ , with $σ_{1} < σ_{2}$ . Let $X$ be the percent change of the stock on a certain day. The distribution is said to be a mixture of two Normal distributions, and a convenient way to represent $X$ is as $X = I_{1} X_{1} + I_{2} X_{2}$ where $I_{1}$ is the indicator r.v. of having a low volatility day, $I_{2} = 1 - I_{1}$ , $X_{j} \sim N (0, σ_{j}^{2})$ , and $I_{1}, X_{1}, X_{2}$ are independent.

(a) Find $Var (X)$ in two ways: using Eve’s law, and by using properties of covariance to calculate $Cov (I_{1} X_{1} + I_{2} X_{2}, I_{1} X_{1} + I_{2} X_{2})$ .

(b) Recall from Chapter 6 that the kurtosis of an r.v. $Y$ with mean $μ$ and standard deviation $σ$ is defined by

Kurt (Y) = \frac{E ( Y - μ ) ^{4}}{σ ^{4}} - 3.

Find the kurtosis of $X$ (in terms of $p, q, σ_{1}^{2}, σ_{2}^{2}$ , fully simplified). The result will show that even though the kurtosis of any Normal distribution is 0, the kurtosis of $X$ is positive and in fact can be very large depending on the parameter values.

9.40

Let $X_{1}$ , $X_{2}$ , and $Y$ be random variables, such that $Y$ has finite variance. Let

A = E (Y ∣ X_{1}) \quad \text{and} \quad B = E (Y ∣ X_{1}, X_{2}) .

Show that

Var (A) \leq Var (B) .

Also, check that this makes sense in the extreme cases where $Y$ is independent of $X_{1}$ and where $Y = h (X_{2})$ for some function $h$ .

Hint: Use Eve’s law on $B$ .

9.41

Show that for any r.v.s $X$ and $Y$ ,

E (Y ∣ E (Y ∣ X)) = E (Y ∣ X) .

This has a nice intuitive interpretation if we think of $E (Y ∣ X)$ as the prediction we would make for $Y$ based on $X$ : given the prediction we would use for predicting $Y$ from $X$ , we no longer need to know $X$ to predict $Y$ ; we can just use the prediction we have. For example, letting $E (Y ∣ X) = g (X)$ , if we observe $g (X) = 7$ , then we may or may not know what $X$ is (since $g$ may not be one-to-one). But even without knowing $X$ , we know that the prediction for $Y$ based on $X$ is 7.

Hint: Use Adam’s law with extra conditioning.

9.42

A researcher wishes to know whether a new treatment for the disease conditionitis is more effective than the standard treatment. It is unfortunately not feasible to do a randomized experiment, but the researcher does have the medical records of patients who received the new treatment and those who received the standard treatment. She is worried, though, that doctors tend to give the new treatment to younger, healthier patients. If this is the case, then naively comparing the outcomes of patients in the two groups would be like comparing apples and oranges.

Suppose each patient has background variables $X$ , which might be age, height and weight, and measurements relating to previous health status. Let $Z$ be the indicator of receiving the new treatment. The researcher fears that $Z$ is dependent on $X$ , i.e., that the distribution of $X$ given $Z = 1$ is different from the distribution of $X$ given $Z = 0$ .

In order to compare apples to apples, the researcher wants to match every patient who received the new treatment to a patient with similar background variables who received the standard treatment. But $X$ could be a high-dimensional random vector, which often makes it very difficult to find a match with a similar value of $X$ .

The propensity score reduces the possibly high-dimensional vector of background variables down to a single number (then it is much easier to match someone to a person with a similar propensity score than to match someone to a person with a similar value of $X$ ). The propensity score of a person with background characteristics $X$ is defined as

S = E (Z ∣ X) .

By the fundamental bridge, a person’s propensity score is their probability of receiving the treatment, given their background characteristics. Show that conditional on $S$ , the treatment indicator $Z$ is independent of the background variables $X$ .

Hint: This problem relates to the previous one. Show that $P (Z = 1 ∣ S, X) = P (Z = 1 ∣ S)$ , which is equivalent to showing $E (Z ∣ S, X) = E (Z ∣ S)$ .

9.43

This exercise develops a useful identity for covariance, similar in spirit to Adam’s law for expectation and Eve’s law for variance. First define conditional covariance in a manner analogous to how we defined conditional variance:

Cov (X, Y ∣ Z) = E ((X - E (X ∣ Z)) (Y - E (Y ∣ Z)) ∣ Z) .

(a) Show that

Cov (X, Y ∣ Z) = E (X Y ∣ Z) - E (X ∣ Z) E (Y ∣ Z) .

This should be true since it is the conditional version of the fact that

Cov (X, Y) = E (X Y) - E (X) E (Y),

and conditional probabilities are probabilities, but for this problem you should prove it directly using properties of expectation and conditional expectation.

(b) ECCE, or the law of total covariance, says that

Cov (X, Y) = E (Cov (X, Y ∣ Z)) + Cov (E (X ∣ Z), E (Y ∣ Z)) .

That is, the covariance of $X$ and $Y$ is the expected value of their conditional covariance plus the covariance of their conditional expectations, where all these conditional quantities are conditional on $Z$ . Prove this identity.

Hint: We can assume without loss of generality that $E (X) = E (Y) = 0$ , since adding a constant to an r.v. has no effect on its covariance with any r.v. Then expand out the covariances on the right-hand side of the identity and apply Adam’s law.
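Because ECCE is an exact identity, it can be checked with exact rational arithmetic on any small joint PMF. The sketch below uses a hypothetical PMF for $(X, Y, Z)$ with $Z \in \{0, 1\}$, computes conditional covariance from the definition given at the start of this exercise, and compares the two sides. A check like this is not a proof, but it can catch algebra slips.

```python
from fractions import Fraction
from itertools import product

# A hypothetical joint PMF for (X, Y, Z) on {0,1,2} x {0,1,2} x {0,1}.
weights = {(x, y, z): Fraction(1 + x * y + (x + y) * z + 2 * z)
           for x, y, z in product(range(3), range(3), range(2))}
total_weight = sum(weights.values())
pmf = {outcome: w / total_weight for outcome, w in weights.items()}

def always(x, y, z):
    return True

def expect(g, cond=always):
    """E(g(X, Y, Z) | cond holds), computed exactly from the PMF."""
    mass = sum(p for (x, y, z), p in pmf.items() if cond(x, y, z))
    return sum(g(x, y, z) * p for (x, y, z), p in pmf.items() if cond(x, y, z)) / mass

def cov(g, h, cond=always):
    """Covariance given cond, from the definition E((g - E(g|.))(h - E(h|.)) | .)."""
    mg, mh = expect(g, cond), expect(h, cond)
    return expect(lambda x, y, z: (g(x, y, z) - mg) * (h(x, y, z) - mh), cond)

def X(x, y, z):
    return x

def Y(x, y, z):
    return y

def given_z(z0):
    """The event Z = z0, as a condition."""
    return lambda x, y, z: z == z0

z_values = (0, 1)
p_z = {z0: sum(p for (_, _, z), p in pmf.items() if z == z0) for z0 in z_values}

lhs = cov(X, Y)                                        # Cov(X, Y)

# E(Cov(X, Y | Z)) + Cov(E(X | Z), E(Y | Z)), averaging over the values of Z.
cond_cov = {z0: cov(X, Y, given_z(z0)) for z0 in z_values}
ex_given_z = {z0: expect(X, given_z(z0)) for z0 in z_values}
ey_given_z = {z0: expect(Y, given_z(z0)) for z0 in z_values}
mean_ex = sum(ex_given_z[z0] * p_z[z0] for z0 in z_values)
mean_ey = sum(ey_given_z[z0] * p_z[z0] for z0 in z_values)
rhs = (sum(cond_cov[z0] * p_z[z0] for z0 in z_values)
       + sum((ex_given_z[z0] - mean_ex) * (ey_given_z[z0] - mean_ey) * p_z[z0]
             for z0 in z_values))

print(lhs, rhs, lhs == rhs)
```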

Mixed practice

9.44

A group of $n$ friends often go out for dinner together. At their dinners, they play “credit card roulette” to decide who pays the bill. This means that at each dinner, one person is chosen uniformly at random to pay the entire bill (independently of what happens at the other dinners).

(a) Find the probability that in $k$ dinners, no one will have to pay the bill more than once (do not simplify for the case $k \leq n$ , but do simplify fully for the case $k > n$ ).

(b) Find the expected number of dinners it takes in order for everyone to have paid at least once (you can leave your answer as a finite sum of simple-looking terms).

(c) Alice and Bob are two of the friends. Find the covariance between how many times Alice pays and how many times Bob pays in $k$ dinners.
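For (c), a simulation gives a numerical check. The sketch below labels Alice as friend 0 and Bob as friend 1 (an arbitrary choice) and uses hypothetical values $n = 4$ and $k = 10$.

```python
import random

def simulate_payment_covariance(n, k, n_sims=100_000, seed=0):
    """Monte Carlo estimate of Cov(A, B), where A and B count how many of k
    dinners Alice (friend 0) and Bob (friend 1) pay for, with the payer at
    each dinner chosen uniformly from the n friends."""
    rng = random.Random(seed)
    sum_a = sum_b = sum_ab = 0
    for _ in range(n_sims):
        a = b = 0
        for _ in range(k):
            payer = rng.randrange(n)
            a += payer == 0
            b += payer == 1
        sum_a += a
        sum_b += b
        sum_ab += a * b
    mean_a, mean_b = sum_a / n_sims, sum_b / n_sims
    return sum_ab / n_sims - mean_a * mean_b     # E(AB) - E(A)E(B)

cov_est = simulate_payment_covariance(n=4, k=10)
print(cov_est)
```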

9.45

As in the previous problem, a group of $n$ friends play “credit card roulette” at their dinners. In this problem, let the number of dinners be a $Pois (λ)$ r.v.

(a) Alice is one of the friends. Find the correlation between how many dinners Alice pays for and how many free dinners Alice gets.

(b) The costs of the dinners are i.i.d. $Gamma (a, b)$ r.v.s, independent of the number of dinners. Find the mean and variance of the total cost.

9.46

Joe will read $N \sim Pois (λ)$ books next year. Each book has a $Pois (μ)$ number of pages, with book lengths independent of each other and independent of $N$ .

(a) Find the expected number of book pages that Joe will read next year.

(b) Find the variance of the number of book pages that Joe will read next year.

(c) For each of the $N$ books, Joe likes it with probability $p$ and dislikes it with probability $1 - p$ , independently. Find the conditional distribution of how many of the $N$ books Joe likes, given that he dislikes exactly $d$ of the books.

9.47

Buses arrive at a certain bus stop according to a Poisson process of rate $λ$ . Each bus has $n$ seats and, at the instant when it arrives at the stop, has a $Bin (n, p)$ number of passengers. Assume that the numbers of passengers on different buses are independent of each other, and independent of the arrival times of the buses.

Let $N_{t}$ be the number of buses that arrive in the time interval $[0, t]$ , and $X_{t}$ be the total number of passengers on the buses that arrive in the time interval $[0, t]$ .

(a) Find the mean and variance of $N_{t}$ .

(b) Find the mean and variance of $X_{t}$ .

(c) A bus is full if it has exactly $n$ passengers when it arrives at the stop. Find the probability that exactly $a + b$ buses arrive in $[0, t]$ , of which $a$ are full and $b$ are not full.

9.48

Paul and $n$ other runners compete in a marathon. Their times are independent continuous r.v.s with CDF $F$ .

(a) For $j = 1, 2, \dots, n$ , let $A_{j}$ be the event that anonymous runner $j$ completes the race faster than Paul. Explain whether the events $A_{j}$ are independent, and whether they are conditionally independent given Paul’s time to finish the race.

(b) For the rest of this problem, let $N$ be the number of runners who finish faster than Paul. Find $E (N)$ . (Your answer should depend only on $n$ , since Paul’s time is an r.v.)

(d) Find $Var (N)$ . (Your answer should depend only on $n$ , since Paul’s time is an r.v.)

Hint: Let $T$ be Paul’s time, and use Eve’s law to condition on $T$ . Alternatively, use indicator r.v.s.
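A simulation sketch for checking answers to (b) and (d). The times are i.i.d. and continuous, so only their ordering affects $N$. For convenience, the sketch therefore draws $Unif (0, 1)$ times; any continuous $F$ gives the same distribution of $N$. The function name and `n=9` are illustrative.

```python
import random
import statistics


def simulate_faster_runners(n, trials=100_000, seed=3):
    """Count how many of n other runners beat Paul, using i.i.d. Unif(0, 1) finishing times."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        paul = rng.random()  # Paul's time
        counts.append(sum(rng.random() < paul for _ in range(n)))
    return counts


counts = simulate_faster_runners(n=9)
print("E(N) ~", statistics.fmean(counts), " Var(N) ~", statistics.variance(counts))
```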

9.49

Emails arrive in an inbox according to a Poisson process of rate $λ$ emails per hour.

(a) Find the name and parameters of the conditional distribution of the number of emails that arrive in the first 2 hours of an 8-hour time period, given that exactly $n$ emails arrive in that time period.

(b) Each email is legitimate with probability $p$ and spam with probability $q = 1 - p$ , independently. Find the name and parameters of the conditional distribution of the number of legitimate emails that arrive in an 8-hour time period, given that exactly $s$ spams arrived in that time period.

(c) Reading an email takes a random amount of time, with mean $μ$ hours and standard deviation $σ$ hours. These reading times are i.i.d. and independent of the email arrival process. Find the (unconditional) mean and variance of the total time it takes to read all the emails that arrive in an 8-hour time period.
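A sketch for checking (c) by simulation. The problem fixes only the mean $μ$ and SD $σ$ of a reading time. As an explicit assumption, the sketch draws reading times from a Gamma distribution with shape $μ^{2}/σ^{2}$ and scale $σ^{2}/μ$, which has exactly that mean and SD. The names and the values `rate=0.5, mu=0.1, sigma=0.05` are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw from Pois(lam) by Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_reading_time(rate, mu, sigma, hours=8.0, trials=100_000, seed=4):
    """Total time to read all emails arriving in `hours` hours (Gamma reading times assumed)."""
    rng = random.Random(seed)
    shape, scale = (mu / sigma) ** 2, sigma**2 / mu  # Gamma with mean mu and SD sigma
    totals = []
    for _ in range(trials):
        emails = sample_poisson(rate * hours, rng)  # Pois(rate * hours) arrivals
        totals.append(sum(rng.gammavariate(shape, scale) for _ in range(emails)))
    return totals


totals = simulate_reading_time(rate=0.5, mu=0.1, sigma=0.05)
print("mean, variance of total reading time ~", statistics.fmean(totals), statistics.variance(totals))
```

Because the answer should depend only on $μ$ and $σ$, swapping in another reading-time distribution with the same mean and SD should not change the estimates beyond simulation noise.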

9.50

An actuary wishes to estimate various quantities related to the number of insurance claims and the dollar amounts of those claims for someone named Fred. Suppose that Fred will make $N$ claims next year, where $N ∣ λ \sim Pois (λ)$ . But $λ$ is unknown, so the actuary, taking a Bayesian approach, gives $λ$ a prior distribution based on past experience. Specifically, the prior is $λ \sim Expo (1)$ . The dollar amount of a claim is Log-Normal with parameters $μ$ and $σ^{2}$ (here $μ$ and $σ^{2}$ are the mean and variance of the underlying Normal), with $μ$ and $σ^{2}$ known. The dollar amounts of the claims are i.i.d. and independent of $N$ .

(a) Find $E (N)$ and $Var (N)$ using properties of conditional expectation (your answers should not depend on $λ$ , since $λ$ is unknown and being treated as an r.v.).

(b) Find the mean and variance of the total dollar amount of all the claims.

(d) Find the posterior distribution of $λ$ , given that it is observed that Fred makes $N = n$ claims next year. If it is a named distribution we have studied, give its name and parameters.
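A simulation sketch of the hierarchical model in 9.50: draw $λ \sim Expo (1)$, then $N ∣ λ \sim Pois (λ)$, then $N$ i.i.d. Log-Normal claim amounts. Python's `random.lognormvariate(mu, sigma)` takes the mean and the *standard deviation* of the underlying Normal, so it is passed $σ$ rather than $σ^{2}$. The names and the values `mu=0.0, sigma=0.5` are illustrative.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw from Pois(lam) by Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_claims(mu, sigma, trials=100_000, seed=5):
    """Return claim counts N and total dollar amounts under the Expo(1) prior on lambda."""
    rng = random.Random(seed)
    claim_counts, totals = [], []
    for _ in range(trials):
        lam = rng.expovariate(1.0)  # prior: lambda ~ Expo(1)
        n_claims = sample_poisson(lam, rng)  # N | lambda ~ Pois(lambda)
        claim_counts.append(n_claims)
        totals.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_claims)))
    return claim_counts, totals


claim_counts, totals = simulate_claims(mu=0.0, sigma=0.5)
print("E(N), Var(N) ~", statistics.fmean(claim_counts), statistics.variance(claim_counts))
print("mean total ~", statistics.fmean(totals))
```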

9.51

Stat110 solution available.

Let $X_{1}, X_{2}, X_{3}$ be independent with $X_{i} \sim Expo (λ_{i})$ (so with possibly different rates). Recall from Chapter 7 that

P (X_{1} < X_{2}) = \frac{λ_{1}}{λ_{1} + λ_{2}} .

(a) Find $E (X_{1} + X_{2} + X_{3} ∣ X_{1} > 1, X_{2} > 2, X_{3} > 3)$ in terms of $λ_{1}, λ_{2}, λ_{3}$ .

(b) Find $P (X_{1} = min (X_{1}, X_{2}, X_{3}))$ , the probability that the first of the three Exponentials is the smallest.

Hint: Restate this in terms of $X_{1}$ and $min (X_{2}, X_{3})$ .

(c) For the case $λ_{1} = λ_{2} = λ_{3} = 1$ , find the PDF of $max (X_{1}, X_{2}, X_{3})$ . Is this one of the important distributions we have studied?
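A simulation sketch for 9.51. Part (a) is estimated by rejection: only draws with $X_{1} > 1, X_{2} > 2, X_{3} > 3$ are kept, so just a small fraction of draws is used. Part (b) counts how often $X_{1}$ is the smallest, and part (c) averages the maximum of three i.i.d. $Expo (1)$ draws. The rates `(1.0, 0.5, 0.25)` and the function names are illustrative.

```python
import random
import statistics


def simulate_exponentials(rates, trials=100_000, seed=6):
    """Estimate P(X1 is the minimum) and E(X1 + X2 + X3 | X1 > 1, X2 > 2, X3 > 3)."""
    rng = random.Random(seed)
    first_is_min = 0
    conditional_sums = []
    for _ in range(trials):
        x = [rng.expovariate(r) for r in rates]
        first_is_min += x[0] == min(x)
        # Rejection step for part (a): keep only draws satisfying the conditioning event.
        if x[0] > 1 and x[1] > 2 and x[2] > 3:
            conditional_sums.append(sum(x))
    return first_is_min / trials, statistics.fmean(conditional_sums)


def simulate_max_of_three(trials=100_000, seed=7):
    """Draws of max(X1, X2, X3) for i.i.d. Expo(1) r.v.s, for part (c)."""
    rng = random.Random(seed)
    return [max(rng.expovariate(1.0) for _ in range(3)) for _ in range(trials)]


p_first_min, cond_mean = simulate_exponentials(rates=(1.0, 0.5, 0.25))
maxima = simulate_max_of_three()
print("P(X1 is min) ~", p_first_min, " conditional mean ~", cond_mean)
print("mean of max ~", statistics.fmean(maxima))
```

A histogram of `maxima` can be compared with the PDF derived in (c).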

9.52

Stat110 solution available.

A task is randomly assigned to one of two people (with probability $1/2$ for each person). If assigned to the first person, the task takes an $Expo (λ_{1})$ length of time to complete (measured in hours), while if assigned to the second person it takes an $Expo (λ_{2})$ length of time to complete (independent of how long the first person would have taken). Let $T$ be the time taken to complete the task.

(a) Find the mean and variance of $T$ .

(b) Suppose instead that the task is assigned to both people, and let $X$ be the time taken to complete it (by whoever completes it first, with the two people working independently). It is observed that after 24 hours, the task has not yet been completed. Conditional on this information, what is the expected value of $X$ ?
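A simulation sketch for 9.52, with the rates measured per hour. Part (a) picks a person with probability $1/2$ each, then draws an Exponential time at that person's rate. Part (b) keeps only runs where $X = min$ of the two independent times exceeds 24 hours. The values `rate1=0.1, rate2=0.05` and the function name are illustrative; they are small enough that a useful number of runs survive the 24-hour condition.

```python
import random
import statistics


def simulate_task(rate1, rate2, trials=100_000, seed=8):
    """Return draws of T for part (a) and of X given X > 24 for part (b)."""
    rng = random.Random(seed)
    # Part (a): choose a person with probability 1/2 each, then an Expo time at that rate.
    times = [rng.expovariate(rate1 if rng.random() < 0.5 else rate2) for _ in range(trials)]
    # Part (b): both work independently; keep only runs not finished after 24 hours.
    unfinished = []
    for _ in range(trials):
        x = min(rng.expovariate(rate1), rng.expovariate(rate2))
        if x > 24:
            unfinished.append(x)
    return times, unfinished


times, unfinished = simulate_task(rate1=0.1, rate2=0.05)
print("E(T), Var(T) ~", statistics.fmean(times), statistics.variance(times))
print("E(X | X > 24) ~", statistics.fmean(unfinished))
```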

9.53

Suppose for this problem that “true IQ” is a meaningful concept rather than a reified social construct. Suppose that in the U.S. population, the distribution of true IQs is Normal with mean 100 and SD 15. A person is chosen at random from this population to take an IQ test. The test is a noisy measure of true ability: it’s correct on average but has a Normal measurement error with SD 5.

Let $μ$ be the person’s true IQ, viewed as a random variable, and let $Y$ be her score on the IQ test. Then we have

Y ∣ μ \sim N (μ, 5^{2})

and

μ \sim N (100, 15^{2}) .

(a) Find the unconditional mean and variance of $Y$ .

(b) Find the marginal distribution of $Y$ . One way is via the MGF.
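A two-stage simulation sketch of the model above: draw the true IQ $μ$ from $N (100, 15^{2})$, then the score $Y$ from $N (μ, 5^{2})$. Python's `random.gauss(mu, sigma)` takes a standard deviation. The function name is hypothetical.

```python
import random
import statistics


def simulate_scores(trials=100_000, seed=9):
    """Draw test scores Y by first drawing the true IQ, then the noisy measurement."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        true_iq = rng.gauss(100, 15)  # mu ~ N(100, 15^2)
        scores.append(rng.gauss(true_iq, 5))  # Y | mu ~ N(mu, 5^2)
    return scores


scores = simulate_scores()
print("mean, variance of Y ~", statistics.fmean(scores), statistics.variance(scores))
```

A histogram of `scores` can also be compared with the marginal distribution found in (b).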

9.54

Stat110 solution available.

A certain genetic characteristic is of interest. It can be measured numerically. Let $X_{1}$ and $X_{2}$ be the values of the genetic characteristic for two twin boys. Given that they are identical twins, $X_{1} = X_{2}$ and $X_{1}$ has mean 0 and variance $σ^{2}$ ; given that they are fraternal twins, $X_{1}$ and $X_{2}$ have mean 0, variance $σ^{2}$ , and correlation $ρ$ . The probability that the twins are identical is $1/2$ . Find $Cov (X_{1}, X_{2})$ in terms of $ρ, σ^{2}$ .

9.55

Stat110 solution available.

The Mass Cash lottery randomly chooses 5 of the numbers from $1, 2, \dots, 35$ each day (without repetitions within the choice of 5 numbers). Suppose that we want to know how long it will take until all numbers have been chosen. Let $a_{j}$ be the average number of additional days needed if we are missing $j$ numbers (so $a_{0} = 0$ and $a_{35}$ is the average number of days needed to collect all 35 numbers). Find a recursive formula for the $a_{j}$ .
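One way to set up such a recursion is to condition on how many of today's 5 numbers are among the $j$ missing ones; that count is Hypergeometric. This gives $a_{j} = 1 + \sum_{i} P(i \text{ new}) \, a_{j - i}$, and the $i = 0$ term is moved to the left side. The sketch below evaluates that recursion with exact fractions. The function name `expected_days` and its `total`/`draw` parameters are hypothetical; with `draw=1` it reduces to the classic coupon collector problem, which gives a quick sanity check.

```python
from fractions import Fraction
from math import comb


def expected_days(total=35, draw=5):
    """Return [a_0, ..., a_total]; a_j = expected additional days when j numbers are missing."""
    a = [Fraction(0)] * (total + 1)
    all_draws = comb(total, draw)
    for j in range(1, total + 1):
        # probs[i] = P(exactly i of today's numbers are among the j missing ones).
        probs = [Fraction(comb(j, i) * comb(total - j, draw - i), all_draws)
                 for i in range(draw + 1)]
        # a_j = 1 + sum_i probs[i] * a_{j-i}; solve for a_j by moving the i = 0 term left.
        rhs = 1 + sum(probs[i] * a[j - i] for i in range(1, draw + 1) if i <= j)
        a[j] = rhs / (1 - probs[0])
    return a


a = expected_days()
print("a_35 ~", float(a[35]))
```

For example, with one number missing, it appears on a given day with probability $5/35 = 1/7$, so $a_{1} = 7$.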

9.56

Two chess players, Vishy and Magnus, play a series of games. Given $p$ , the game results are i.i.d. with probability $p$ of Vishy winning, and probability $q = 1 - p$ of Magnus winning (assume that each game ends in a win for one of the two players). But $p$ is unknown, so we will treat it as an r.v. To reflect our uncertainty about $p$ , we use the prior $p \sim Beta (a, b)$ , where $a$ and $b$ are known positive integers and $a \geq 2$ .

(a) Find the expected number of games needed in order for Vishy to win a game (including the win). Simplify fully; your final answer should not use factorials or $Γ$ .

(b) Explain in terms of independence vs. conditional independence the direction of the inequality between the answer to (a) and $1 + E (G)$ for $G \sim Geom (\frac{a}{a + b})$ .
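A simulation sketch for 9.56: draw $p$ from the $Beta (a, b)$ prior, then play games until Vishy wins, counting the winning game. The code also prints $1 + E (G)$ for comparison, using the fact that $E (G) = b/a$ when $G \sim Geom (\frac{a}{a + b})$ counts failures before the first success. The values `a=3, b=2` are illustrative.

```python
import random
import statistics


def games_until_vishy_wins(a, b, trials=200_000, seed=11):
    """Average number of games, including Vishy's first win, with p ~ Beta(a, b)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        p = rng.betavariate(a, b)  # draw p from the Beta(a, b) prior
        games = 1
        while rng.random() >= p:  # Magnus wins this game, so another game is played
            games += 1
        counts.append(games)
    return statistics.fmean(counts)


a, b = 3, 2
estimate = games_until_vishy_wins(a, b)
baseline = 1 + b / a  # 1 + E(G) for G ~ Geom(a / (a + b))
print("simulated ~", estimate, " 1 + E(G) =", baseline)
```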

9.57

Laplace’s law of succession says that if $X_{1}, X_{2}, \dots, X_{n + 1}$ are conditionally independent $Bern (p)$ r.v.s given $p$ , but $p$ is given a $Unif (0, 1)$ prior to reflect ignorance about its value, then

P (X_{n + 1} = 1 ∣ X_{1} + \dots + X_{n} = k) = \frac{k + 1}{n + 2} .

As an example, Laplace discussed the problem of predicting whether the sun will rise tomorrow, given that the sun did rise every time for all $n$ days of recorded history; the above formula then gives $(n + 1) / (n + 2)$ as the probability of the sun rising tomorrow (of course, assuming independent trials with $p$ unchanging over time may be a very unreasonable model for the sunrise problem).

(a) Find the posterior distribution of $p$ given $X_{1} = x_{1}, X_{2} = x_{2}, \dots, X_{n} = x_{n}$ , and show that it only depends on the sum of the $x_{j}$ (so we only need the one-dimensional quantity $x_{1} + x_{2} + \dots + x_{n}$ to obtain the posterior distribution, rather than needing all $n$ data points).

(b) Prove Laplace’s law of succession, using a form of the law of total probability to find $P (X_{n + 1} = 1 ∣ X_{1} + \dots + X_{n} = k)$ by conditioning on $p$ . (The next exercise, which is closely related, involves an equivalent Adam’s law proof.)
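A sketch that checks Laplace's law of succession with exact rational arithmetic. It uses the Beta integral $\int_{0}^{1} p^{α - 1} (1 - p)^{β - 1} \, dp = \frac{(α - 1)! (β - 1)!}{(α + β - 1)!}$ for positive integers $α, β$. The $Unif (0, 1)$ prior density is 1, and the binomial coefficient cancels between numerator and denominator. The helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def beta_integral(alpha, beta):
    """Exact integral of p^(alpha-1) * (1-p)^(beta-1) over [0, 1], for positive integers."""
    return Fraction(factorial(alpha - 1) * factorial(beta - 1), factorial(alpha + beta - 1))


def succession_probability(n, k):
    """P(X_{n+1} = 1 | k successes in n trials) under a Unif(0, 1) prior on p."""
    # Numerator integrates p * p^k (1-p)^(n-k); denominator integrates p^k (1-p)^(n-k).
    return beta_integral(k + 2, n - k + 1) / beta_integral(k + 1, n - k + 1)


print(succession_probability(10, 10))  # the sunrise case with n = 10
```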

9.58

Two basketball teams, A and B, play an $n$-game match. Let $X_{j}$ be the indicator of team A winning the $j$ th game. Given $p$ , the r.v.s $X_{1}, \dots, X_{n}$ are i.i.d. with $X_{j} ∣ p \sim Bern (p)$ . But $p$ is unknown, so we will treat it as an r.v. Let the prior distribution be $p \sim Unif (0, 1)$ , and let $X$ be the number of wins for team A.

(a) Find $E (X)$ and $Var (X)$ .

(b) Use Adam’s law to find the probability that team A will win game $j + 1$ , given that they win exactly $a$ of the first $j$ games. (The previous exercise, which is closely related, involves an equivalent LOTP proof.)

Hint: Letting $C$ be the event that team A wins exactly $a$ of the first $j$ games,

P (X_{j + 1} = 1 ∣ C) = E (X_{j + 1} ∣ C) = E (E (X_{j + 1} ∣ C, p) ∣ C) = E (p ∣ C) .

(c) Find the PMF of $X$ . (There are various ways to do this, including a very fast way to see it based on results from earlier chapters.)

(d) The Putnam exam from 2002 posed the following problem:

Shanille O’Keal shoots free throws on a basketball court. She hits the first and misses the second, and thereafter the [conditional] probability that she hits the next shot is equal to the proportion of shots she has hit so far. What is the probability she hits exactly 50 of her first 100 shots?

Solve this Putnam problem by applying the result of Part (c). Be sure to explain why it is valid to apply that result, despite the fact that the Putnam problem does not seem to be using the same model, e.g., it does not mention a prior distribution, let alone mention a $Unif (0, 1)$ prior.
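A sketch that simulates Shanille's shooting rule exactly as stated, so the estimate can be compared with the answer obtained from Part (c). The function name `prob_exact_hits` is hypothetical. The 10-shot case runs quickly and is a useful smaller check.

```python
import random


def prob_exact_hits(total_shots, target_hits, trials, seed=12):
    """Estimate P(exactly target_hits hits in total_shots shots) under Shanille's rule."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(trials):
        hits, shots = 1, 2  # she hits the first shot and misses the second
        for _ in range(total_shots - 2):
            # Hit the next shot with probability equal to the proportion hit so far.
            if rng.random() < hits / shots:
                hits += 1
            shots += 1
        matches += hits == target_hits
    return matches / trials


small_case = prob_exact_hits(total_shots=10, target_hits=5, trials=100_000)
putnam_case = prob_exact_hits(total_shots=100, target_hits=50, trials=10_000)
print("10 shots, 5 hits ~", small_case, " 100 shots, 50 hits ~", putnam_case)
```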

9.59

Let $X ∣ p \sim Bin (n, p)$ , with $p \sim Beta (a, b)$ . So $X$ has a Beta-Binomial distribution, as mentioned in Story 8.3.3 and Example 8.5.3. Find $E (X)$ and $Var (X)$ .
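A simulation sketch of the Beta-Binomial story: draw $p \sim Beta (a, b)$, then $X ∣ p \sim Bin (n, p)$. The sample mean and variance can be compared with the answers from Adam's and Eve's laws. The values `n=10, a=2.0, b=3.0` are illustrative.

```python
import random
import statistics


def simulate_beta_binomial(n, a, b, trials=100_000, seed=13):
    """Draw X by first drawing p ~ Beta(a, b), then X | p ~ Bin(n, p)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(trials):
        p = rng.betavariate(a, b)
        draws.append(sum(rng.random() < p for _ in range(n)))
    return draws


draws = simulate_beta_binomial(n=10, a=2.0, b=3.0)
print("E(X), Var(X) ~", statistics.fmean(draws), statistics.variance(draws))
```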

9.60

An election is being held. There are two candidates, A and B, and there are $n$ voters. The probability of voting for Candidate A varies by city. There are $m$ cities, labeled $1, 2, \dots, m$ . The $j$ th city has $n_{j}$ voters, so $n_{1} + n_{2} + \dots + n_{m} = n$ . Let $X_{j}$ be the number of people in the $j$ th city who vote for Candidate A, with $X_{j} ∣ p_{j} \sim Bin (n_{j}, p_{j})$ . To reflect our uncertainty about the probability of voting for Candidate A in each city, we treat $p_{1}, \dots, p_{m}$ as r.v.s, with prior distribution asserting that they are i.i.d. $Unif (0, 1)$ . Assume that $X_{1}, \dots, X_{m}$ are independent, both unconditionally and conditional on $p_{1}, \dots, p_{m}$ . Let $X$ be the total number of votes for Candidate A.

(a) Find the marginal distribution of $X_{1}$ and the posterior distribution of $p_{1} ∣ (X_{1} = k_{1})$ .

(b) Find $E (X)$ and $Var (X)$ in terms of $n$ and $s$ , where $s = n_{1}^{2} + n_{2}^{2} + \dots + n_{m}^{2}$ .
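A simulation sketch of the election model, for checking the formulas in (b) on a small example. The city sizes `[3, 5, 8]` are an arbitrary illustrative choice, and `simulate_election` is a hypothetical name; `n` and `s` are computed as defined in the problem.

```python
import random
import statistics


def simulate_election(city_sizes, trials=100_000, seed=14):
    """Total votes for Candidate A, with i.i.d. Unif(0, 1) voting probabilities per city."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        votes = 0
        for size in city_sizes:
            p_j = rng.random()  # p_j ~ Unif(0, 1), independently across cities
            votes += sum(rng.random() < p_j for _ in range(size))  # X_j | p_j ~ Bin(n_j, p_j)
        totals.append(votes)
    return totals


city_sizes = [3, 5, 8]
totals = simulate_election(city_sizes)
n = sum(city_sizes)
s = sum(size**2 for size in city_sizes)
print("n =", n, " s =", s)
print("E(X), Var(X) ~", statistics.fmean(totals), statistics.variance(totals))
```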
