Random variables and their distributions
Problems
Exercises marked as solved in the book have detailed solutions at http://stat110.net.
PMFs and CDFs
3.1
People are arriving at a party one at a time. While waiting for more people to arrive they entertain themselves by comparing their birthdays. Let $X$ be the number of people needed to obtain a birthday match, i.e., before person $X$ arrives no two people have the same birthday, but when person $X$ arrives there is a match. Find the PMF of $X$.
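A simulation can serve as a sanity check on a derived PMF for this problem. The sketch below is only illustrative: it assumes 365 equally likely birthdays (an assumption not stated in the problem text), and the function names and the number of trials are arbitrary choices.

```python
import random
from collections import Counter

def people_until_match(days, rng):
    """Add people one at a time; return how many were needed for the first birthday match."""
    seen = set()
    count = 0
    while True:
        count += 1
        birthday = rng.randrange(days)
        if birthday in seen:
            return count
        seen.add(birthday)

def empirical_birthday_pmf(trials=100_000, days=365, seed=0):
    """Estimate the PMF of the number of people needed, by relative frequencies."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(people_until_match(days, rng) for _ in range(trials))
    return {k: c / trials for k, c in sorted(counts.items())}

birthday_pmf_hat = empirical_birthday_pmf()
for k in (2, 10, 23, 40):
    print(k, birthday_pmf_hat.get(k, 0.0))
```

Comparing these relative frequencies with the values of a closed-form answer at a few points is a quick way to catch an off-by-one error in the support.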
3.2
(a) Independent Bernoulli trials are performed, with probability $p$ of success, until there has been at least one success. Find the PMF of the number of trials performed.
(b) Independent Bernoulli trials are performed, with probability $p$ of success, until there has been at least one success and at least one failure. Find the PMF of the number of trials performed.
3.3
Let $X$ be an r.v. with CDF $F$, and $Y = \mu + \sigma X$, where $\mu$ and $\sigma$ are real numbers with $\sigma > 0$. (Then $Y$ is called a location-scale transformation of $X$; we will encounter this concept many times in Chapter 5 and beyond.) Find the CDF of $Y$, in terms of $F$.
3.4
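Let $n$ be a positive integer and
$$F(x) = \frac{\lfloor x \rfloor}{n}$$
for $0 \le x \le n$, $F(x) = 0$ for $x < 0$, and $F(x) = 1$ for $x > n$, where $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$. Show that $F$ is a CDF, and find the PMF that it corresponds to.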
3.5
(a) Show that for is a valid PMF for a discrete r.v.
(b) Find the CDF of a random variable with the PMF from (a).
3.6
Stat110 solution available.
Benford’s law states that in a very large variety of real-life data sets, the first digit approximately follows a particular distribution with about a 30% chance of a 1, an 18% chance of a 2, and in general
$$P(D = j) = \log_{10}\left(\frac{j+1}{j}\right), \text{ for } j \in \{1, 2, 3, \dots, 9\},$$
where $D$ is the first digit of a randomly chosen element. Check that this is a valid PMF (using properties of logs, not with a calculator).
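The exercise asks for an argument using properties of logs, so the following is only a numerical sanity check, not a solution. It assumes the standard form of Benford's law, $P(D = j) = \log_{10}((j+1)/j)$ for $j = 1, \dots, 9$; the variable names are illustrative.

```python
from math import log10

# Benford probabilities for first digits 1..9 (standard form, assumed here).
benford = {j: log10((j + 1) / j) for j in range(1, 10)}

for digit, prob in benford.items():
    print(digit, round(prob, 3))

# The nine probabilities should be nonnegative and add up to 1 (up to rounding).
benford_total = sum(benford.values())
print("total:", benford_total)
```

The first two printed values should agree with the "about 30%" and "about 18%" figures quoted in the problem.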
3.7
Bob is playing a video game that has 7 levels. He starts at level 1, and has probability $p_1$ of reaching level 2. In general, given that he reaches level $j$, he has probability $p_j$ of reaching level $j+1$, for $1 \le j \le 6$. Let $X$ be the highest level that he reaches. Find the PMF of $X$ (in terms of $p_1, \dots, p_6$).
3.8
There are 100 prizes, with one worth $1, one worth $2, …, and one worth $100. There are 100 boxes, each of which contains one of the prizes. You get 5 prizes by picking random boxes one at a time, without replacement. Find the PMF of how much your most valuable prize is worth (as a simple expression in terms of binomial coefficients).
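One way to check a candidate PMF for the most valuable prize is to simulate the drawing. The sketch below is a hedged illustration: the function name, the number of trials, and the seed are arbitrary choices, and the output is an estimate rather than the exact PMF.

```python
import random
from collections import Counter

def simulate_max_prize(trials=100_000, n_prizes=100, n_picks=5, seed=0):
    """Estimate the PMF of the largest prize among n_picks boxes drawn without replacement."""
    rng = random.Random(seed)
    prizes = range(1, n_prizes + 1)  # prize values $1, $2, ..., $100
    counts = Counter(max(rng.sample(prizes, n_picks)) for _ in range(trials))
    return {value: c / trials for value, c in sorted(counts.items())}

max_prize_pmf_hat = simulate_max_prize()
for value in (5, 50, 90, 100):
    print(value, max_prize_pmf_hat.get(value, 0.0))
```

Since five distinct prizes are drawn, no estimate should appear for values below 5, which is a useful check on the support of a derived answer.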
3.9
Let $F_1$ and $F_2$ be CDFs, $0 < p < 1$, and $F(x) = pF_1(x) + (1-p)F_2(x)$ for all $x$.
(a) Show directly that $F$ has the properties of a valid CDF (see Theorem 3.6.3). The distribution defined by $F$ is called a mixture of the distributions defined by $F_1$ and $F_2$.
(b) Consider creating an r.v. in the following way. Flip a coin with probability $p$ of Heads. If the coin lands Heads, generate an r.v. according to $F_1$; if the coin lands Tails, generate an r.v. according to $F_2$. Show that the r.v. obtained in this way has CDF $F$.
3.10
(a) Is there a discrete distribution with support $\{1, 2, 3, \dots\}$, such that the value of the PMF at $n$ is proportional to $1/n$?
Hint: See the math appendix for a review of some facts about series.
(b) Is there a discrete distribution with support $\{1, 2, 3, \dots\}$, such that the value of the PMF at $n$ is proportional to $1/n^2$?
3.11
Stat110 solution available.
Let $X$ be an r.v. whose possible values are $0, 1, 2, \dots$, with CDF $F$. In some countries, rather than using a CDF, the convention is to use the function $G$ defined by $G(x) = P(X < x)$ to specify a distribution. Find a way to convert from $F$ to $G$, i.e., if $F$ is a known function, show how to obtain $G(x)$ for all real $x$.
3.12
(a) Give an example of r.v.s $X$ and $Y$ such that $F_X(x) \le F_Y(x)$ for all $x$, where the inequality is strict for some $x$. Here $F_X$ is the CDF of $X$ and $F_Y$ is the CDF of $Y$. For the example you gave, sketch the CDFs of both $X$ and $Y$ on the same axes. Then sketch their PMFs on a second set of axes.
(b) In Part (a), you found an example of two different CDFs where the first is less than or equal to the second everywhere. Is it possible to find two different PMFs where the first is less than or equal to the second everywhere? In other words, find discrete r.v.s $X$ and $Y$ such that $P(X = x) \le P(Y = x)$ for all $x$, where the inequality is strict for some $x$, or show that it is impossible to find such r.v.s.
3.13
Let $X$, $Y$, $Z$ be discrete r.v.s such that $X$ and $Y$ have the same conditional distribution given $Z$, i.e., for all $a$ and $z$ we have
$$P(X = a \mid Z = z) = P(Y = a \mid Z = z).$$
Show that $X$ and $Y$ have the same distribution (unconditionally, not just when given $Z$).
3.14
Let $X$ be the number of purchases that Fred will make on the online site for a certain company (in some specified time period). Suppose that the PMF of $X$ is $P(X = k) = e^{-\lambda}\lambda^k/k!$ for $k = 0, 1, 2, \dots$. This distribution is called the Poisson distribution with parameter $\lambda$, and it will be studied extensively in later chapters.
(a) Find $P(X \ge 1)$ and $P(X \ge 2)$ without summing infinite series.
(b) Suppose that the company only knows about people who have made at least one purchase on their site (a user sets up an account to make a purchase, but someone who has never made a purchase there doesn’t appear in the customer database). If the company computes the number of purchases for everyone in their database, then these data are draws from the conditional distribution of the number of purchases, given that at least one purchase is made. Find the conditional PMF of $X$ given $X \ge 1$. (This conditional distribution is called a truncated Poisson distribution.)
Named distributions
3.15
Find the CDF of an r.v. $X \sim \text{DUnif}(1, 2, \dots, n)$.
3.16
Let $X \sim \text{DUnif}(C)$, and $B$ be a nonempty subset of $C$. Find the conditional distribution of $X$, given that $X$ is in $B$.
3.17
An airline overbooks a flight, selling more tickets for the flight than there are seats on the plane (figuring that it’s likely that some people won’t show up). The plane has 100 seats, and 110 people have booked the flight. Each person will show up for the flight with probability 0.9, independently. Find the probability that there will be enough seats for everyone who shows up for the flight.
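Once the number of passengers who show up is modeled as Binomial with 110 trials and success probability 0.9 (as the independence assumption in the problem suggests), the requested probability is a Binomial CDF value. The following sketch computes it with the standard library; `binom_cdf` is an illustrative helper, not a library function.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Bin(n, p), with x an integer between 0 and n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# Enough seats means at most 100 of the 110 booked passengers show up.
p_enough_seats = binom_cdf(100, 110, 0.9)
print(round(p_enough_seats, 3))
```

In R, the same quantity would be obtained with `pbinom(100, 110, 0.9)`.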
3.18
Stat110 solution available.
(a) In the World Series of baseball, two teams (call them A and B) play a sequence of games against each other, and the first team to win four games wins the series. Let $p$ be the probability that A wins an individual game, and assume that the games are independent. What is the probability that team A wins the series?
(b) Give a clear intuitive explanation of whether the answer to (a) depends on whether the teams always play 7 games (and whoever wins the majority wins the series), or the teams stop playing more games as soon as one team has won 4 games (as is actually the case in practice: once the match is decided, the two teams do not keep playing more games).
3.19
In a chess tournament, $n$ games are being played, independently. Each game ends in a win for one player with probability 0.4 and ends in a draw (tie) with probability 0.6. Find the PMFs of the number of games ending in a draw, and of the number of players whose games end in draws.
3.20
Suppose that a lottery ticket has probability $p$ of being a winning ticket, independently of other tickets. A gambler buys 3 tickets, hoping this will triple the chance of having at least one winning ticket.
(a) What is the distribution of how many of the 3 tickets are winning tickets?
(b) Show that the probability that at least 1 of the 3 tickets is winning is $1 - (1-p)^3$, in two different ways: by using inclusion-exclusion, and by taking the complement of the desired event and then using the PMF of a certain named distribution.
(c) Show that the gambler’s chances of having at least one winning ticket do not quite triple (compared with buying only one ticket), but that they do approximately triple if $p$ is small.
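Part (c) can be explored numerically before proving it. The sketch below writes $p$ for the probability that a single ticket wins (a symbol assumed here) and uses the complement rule for independent tickets, so that the chance of at least one winner among 3 tickets is $1 - (1-p)^3$. The sample values of $p$ are arbitrary.

```python
def p_at_least_one(p, tickets=3):
    """Chance of at least one winning ticket among independent tickets."""
    return 1 - (1 - p) ** tickets

# Ratio of the 3-ticket chance to the 1-ticket chance, for several values of p.
ticket_ratios = {p: p_at_least_one(p) / p for p in (0.3, 0.1, 0.01, 0.001)}
for p, ratio in ticket_ratios.items():
    print(p, ratio)
```

The printed ratios stay below 3 and move toward 3 as $p$ shrinks, which is the pattern the exercise asks you to justify.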
3.21
Stat110 solution available.
Let $X \sim \text{Bin}(n, p)$ and $Y \sim \text{Bin}(m, p)$, independent of $X$. Show that $X - Y$ is not Binomial.
3.22
There are two coins, one with probability $p_1$ of Heads and the other with probability $p_2$ of Heads. One of the coins is randomly chosen (with equal probabilities for the two coins). It is then flipped $n \ge 2$ times. Let $X$ be the number of times it lands Heads.
(a) Find the PMF of $X$.
(b) What is the distribution of $X$ if $p_1 = p_2$?
(c) Give an intuitive explanation of why $X$ is not Binomial for $p_1 \ne p_2$ (its distribution is called a mixture of two Binomials). You can assume that $n$ is large for your explanation, so that the frequentist interpretation of probability can be applied.
3.23
There are $n$ people eligible to vote in a certain election. Voting requires registration. Decisions are made independently. Each of the $n$ people will register with probability $p_1$. Given that a person registers, they will vote with probability $p_2$. Given that a person votes, they will vote for Kodos (who is one of the candidates) with probability $p_3$. What is the distribution of the number of votes for Kodos (give the PMF, fully simplified, or the name of the distribution, including its parameters)?
3.24
Let $X$ be the number of Heads in 10 fair coin tosses.
(a) Find the conditional PMF of $X$, given that the first two tosses both land Heads.
(b) Find the conditional PMF of $X$, given that at least two tosses land Heads.
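Because there are only $2^{10} = 1024$ equally likely toss sequences, both conditional PMFs can be checked by brute-force enumeration: restrict to the sequences in the conditioning event and count how many have each number of Heads. This is a sketch for checking a derived formula; `conditional_pmf` is an illustrative helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def conditional_pmf(event, n=10):
    """PMF of the number of Heads in n fair tosses, given that event(tosses) is true."""
    counts = Counter()
    for tosses in product((0, 1), repeat=n):  # 1 = Heads, 0 = Tails
        if event(tosses):
            counts[sum(tosses)] += 1
    total = sum(counts.values())
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}

# (a) condition on the first two tosses both landing Heads
pmf_given_first_two = conditional_pmf(lambda t: t[0] == 1 and t[1] == 1)
# (b) condition on at least two Heads overall
pmf_given_at_least_two = conditional_pmf(lambda t: sum(t) >= 2)

print(pmf_given_first_two)
print(pmf_given_at_least_two)
```

Exact fractions are used so that the output can be compared directly with a closed-form answer.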
3.25
Stat110 solution available.
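Alice flips a fair coin $n$ times and Bob flips another fair coin $n + 1$ times, resulting in independent $X \sim \text{Bin}(n, 1/2)$ and $Y \sim \text{Bin}(n + 1, 1/2)$.
(a) Show that $P(X < Y) = P(n - X < n + 1 - Y)$.
(b) Compute $P(X < Y)$.
Hint: Use (a) and the fact that $X$ and $Y$ are integer-valued.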
3.26
If $X \sim \text{HGeom}(w, b, n)$, what is the distribution of $n - X$? Give a short proof.
3.27
Recall de Montmort’s matching problem from Chapter 1: in a deck of $n$ cards labeled 1 through $n$, a match occurs when the number on the card matches the card’s position in the deck. Let $X$ be the number of matching cards. Is $X$ Binomial? Is $X$ Hypergeometric?
3.28
Stat110 solution available.
There are $n$ eggs, each of which hatches a chick with probability $p$ (independently). Each of these chicks survives with probability $r$, independently. What is the distribution of the number of chicks that hatch? What is the distribution of the number of chicks that survive? (Give the PMFs; also give the names of the distributions and their parameters, if applicable.)
3.29
Stat110 solution available.
A sequence of $n$ independent experiments is performed. Each experiment is a success with probability $p$ and a failure with probability $q = 1 - p$. Show that conditional on the number of successes, all valid possibilities for the list of outcomes of the experiment are equally likely.
3.30
A certain company has $n$ employees, consisting of $w$ women and $m$ men. The company is deciding which employees to promote.
(a) Suppose for this part that the company decides to promote $t$ employees, where $1 \le t \le n$, by choosing $t$ random employees (with equal probabilities for each set of $t$ employees). What is the distribution of the number of women who get promoted?
(b) Now suppose that instead of having a predetermined number of promotions to give, the company decides independently for each employee, promoting the employee with probability $p$. Find the distributions of the number of women who are promoted, the number of women who are not promoted, and the number of employees who are promoted.
(c) In the set-up from (b), find the conditional distribution of the number of women who are promoted, given that exactly $t$ employees are promoted.
3.31
Once upon a time, a famous statistician offered tea to a lady. The lady claimed that she could tell whether milk had been added to the cup before or after the tea. The statistician decided to run some experiments to test her claim.
(a) The lady is given 6 cups of tea, where it is known in advance that 3 will be milk-first and 3 will be tea-first, in a completely random order. The lady gets to taste each and then guess which 3 were milk-first. Assume for this part that she has no ability whatsoever to distinguish milk-first from tea-first cups of tea. Find the probability that at least 2 of her 3 guesses are correct.
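With only $\binom{6}{3} = 20$ ways to choose which 3 cups to call milk-first, part (a) can be checked by enumeration. The sketch below assumes that having "no ability whatsoever" means every set of 3 cups is an equally likely guess; the cup labels and variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

cups = range(6)
milk_first = {0, 1, 2}  # arbitrary labeling of the truly milk-first cups

# Every set of 3 cups the lady could name, taken as equally likely.
guesses = list(combinations(cups, 3))
favorable = sum(1 for guess in guesses if len(milk_first & set(guess)) >= 2)

p_at_least_two_correct = Fraction(favorable, len(guesses))
print(p_at_least_two_correct)
```

The count of correct guesses is the overlap between her chosen set and the true set, which is the same structure that underlies the Hypergeometric story.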
(b) Now the lady is given one cup of tea, with probability of it being milk-first. She needs to say whether she thinks it was milk-first. Let be the lady’s probability of being correct given that it was milk-first, and be her probability of being correct given that it was tea-first. She claims that the cup was milk-first. Find the posterior odds that the cup is milk-first, given this information.
3.32
In Evan’s history class, 10 out of 100 key terms will be randomly selected to appear on the final exam; Evan must then choose 7 of those 10 to define. Since he knows the format of the exam in advance, Evan is trying to decide how many key terms he should study.
(a) Suppose that Evan decides to study $s$ key terms, where $s$ is an integer between 0 and 100. Let $X$ be the number of key terms appearing on the exam that he has studied. What is the distribution of $X$? Give the name and parameters, in terms of $s$.
(b) Using R or other software, calculate the probability that Evan knows at least 7 of the 10 key terms that appear on the exam, assuming that he studies key terms.
3.33
A book has $n$ typos. Two proofreaders, Prue and Frida, independently read the book. Prue catches each typo with probability $p_1$ and misses it with probability $q_1 = 1 - p_1$, independently, and likewise for Frida, who has probabilities $p_2$ of catching and $q_2 = 1 - p_2$ of missing each typo. Let $X_1$ be the number of typos caught by Prue, $X_2$ be the number caught by Frida, and $X$ be the number caught by at least one of the two proofreaders.
(a) Find the distribution of $X$.
(b) For this part only, assume that $p_1 = p_2$. Find the conditional distribution of $X_1$ given that $X_1 + X_2 = t$.
3.34
There are $n$ students at a certain school, of whom $X \sim \text{Bin}(n, p)$ are Statistics majors. A simple random sample of size $m$ is drawn (“simple random sample” means sampling without replacement, with all subsets of the given size equally likely).
(a) Find the PMF of the number of Statistics majors in the sample, using the law of total probability (don’t forget to say what the support is). You can leave your answer as a sum (though with some algebra it can be simplified, by writing the binomial coefficients in terms of factorials and using the binomial theorem).
(b) Give a story proof derivation of the distribution of the number of Statistics majors in the sample; simplify fully.
Hint: Does it matter whether the students declare their majors before or after the random sample is drawn?
3.35
Stat110 solution available.
Players A and B take turns in answering trivia questions, starting with player A answering the first question. Each time A answers a question, she has probability $p_1$ of getting it right. Each time B plays, he has probability $p_2$ of getting it right.
(a) If A answers $m$ questions, what is the PMF of the number of questions she gets right?
(b) If A answers $m$ times and B answers $n$ times, what is the PMF of the total number of questions they get right (you can leave your answer as a sum)? Describe exactly when/whether this is a Binomial distribution.
(c) Suppose that the first player to answer correctly wins the game (with no predetermined maximum number of questions that can be asked). Find the probability that A wins the game.
3.36
There are $n$ voters in an upcoming election in a certain country, where $n$ is a large, even number. There are two candidates: Candidate A (from the Unite Party) and Candidate B (from the Untie Party). Let $X$ be the number of people who vote for Candidate A. Suppose that each voter chooses randomly whom to vote for, independently and with equal probabilities.
(a) Find an exact expression for the probability of a tie in the election (so the candidates end up with the same number of votes).
(b) Use Stirling’s approximation, which approximates the factorial function as
$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n,$$
to find a simple approximation to the probability of a tie. Your answer should be of the form , with a constant (which you should specify).
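A numerical experiment can suggest what the approximation should look like before it is derived. The sketch below writes $n$ for the (even) number of voters, a symbol assumed here, and treats the number of votes for Candidate A as Binomial with $n$ trials and probability 1/2, so a tie means exactly $n/2$ votes for A. The values of $n$ are arbitrary illustrations.

```python
from math import comb, sqrt

def p_tie(n):
    """Exact probability that n fair, independent voters split evenly (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even for a tie to be possible")
    # Python integers are exact, so the ratio is computed without overflow.
    return comb(n, n // 2) / 2**n

for n in (10, 100, 1000, 10000):
    exact = p_tie(n)
    print(n, exact, exact * sqrt(n))
```

The last column settles down as $n$ grows, which indicates that the tie probability shrinks at the rate of $1/\sqrt{n}$ and hints at the constant that Stirling's approximation produces.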
3.37
Stat110 solution available.
A message is sent over a noisy channel. The message is a sequence $x_1, x_2, \dots, x_n$ of $n$ bits ($x_i \in \{0, 1\}$). Since the channel is noisy, there is a chance that any bit might be corrupted, resulting in an error (a 0 becomes a 1 or vice versa). Assume that the error events are independent. Let $p$ be the probability that an individual bit has an error ($0 < p < 1/2$). Let $y_1, y_2, \dots, y_n$ be the received message (so $y_i = x_i$ if there is no error in that bit, but $y_i = 1 - x_i$ if there is an error there).
To help detect errors, the $n$th bit is reserved for a parity check: $x_n$ is defined to be 0 if $x_1 + x_2 + \dots + x_{n-1}$ is even, and 1 if $x_1 + x_2 + \dots + x_{n-1}$ is odd. When the message is received, the recipient checks whether $y_n$ has the same parity as $y_1 + y_2 + \dots + y_{n-1}$. If the parity is wrong, the recipient knows that at least one error occurred; otherwise, the recipient assumes that there were no errors.
(a) For , , what is the probability that the received message has errors which go undetected?
(b) For general $n$ and $p$, write down an expression (as a sum) for the probability that the received message has errors which go undetected.
(c) Give a simplified expression, not involving a sum of a large number of terms, for the probability that the received message has errors which go undetected.
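Hint for (c): Letting
$$a = \sum_{k \text{ even},\, k \ge 0} \binom{n}{k} p^k (1-p)^{n-k}, \qquad b = \sum_{k \text{ odd},\, k \ge 1} \binom{n}{k} p^k (1-p)^{n-k},$$
the binomial theorem makes it possible to find simple expressions for $a + b$ and $a - b$, which then makes it possible to obtain $a$ and $b$.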
Independence of r.v.s
3.38
(a) Give an example of dependent r.v.s and such that .
(b) Give an example of independent r.v.s and such that .
3.39
Give an example of two discrete random variables and on the same sample space such that and have the same distribution, with support , but the event never occurs. If and are independent, is it still possible to construct such an example?
3.40
Suppose $X$ and $Y$ are discrete r.v.s such that $P(X = Y) = 1$. This means that $X$ and $Y$ always take on the same value.
(a) Do $X$ and $Y$ have the same PMF?
(b) Is it possible for $X$ and $Y$ to be independent?
3.41
If $X$, $Y$, $Z$ are r.v.s such that $X$ and $Y$ are independent and $Y$ and $Z$ are independent, does it follow that $X$ and $Z$ are independent?
Hint: Think about simple and extreme examples.
3.42
Stat110 solution available.
Let $X$ be a random day of the week, coded so that Monday is 1, Tuesday is 2, etc. (so $X$ takes values $1, 2, \dots, 7$, with equal probabilities). Let $Y$ be the next day after $X$ (again represented as an integer between 1 and 7). Do $X$ and $Y$ have the same distribution? What is $P(X < Y)$?
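The sample space here has only seven equally likely outcomes, so the two PMFs and the chance that the first day's code is smaller than the next day's code can be tabulated directly. The sketch below is an illustrative check; it writes $X$ for the random day and $Y$ for the following day (symbols assumed here), with Sunday (7) followed by Monday (1).

```python
from collections import Counter
from fractions import Fraction

days = range(1, 8)  # Monday = 1, ..., Sunday = 7
pairs = [(x, x % 7 + 1) for x in days]  # (X, Y) with Y the day after X

def pmf(values):
    """Exact PMF of a list of equally likely values."""
    counts = Counter(values)
    return {v: Fraction(c, len(values)) for v, c in sorted(counts.items())}

pmf_x = pmf([x for x, _ in pairs])
pmf_y = pmf([y for _, y in pairs])
p_x_less_than_y = Fraction(sum(1 for x, y in pairs if x < y), len(pairs))

print(pmf_x == pmf_y, p_x_less_than_y)
```

The comparison shows that two r.v.s can share a distribution while one is smaller than the other most of the time, since equality of distributions says nothing about how the r.v.s relate outcome by outcome.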
3.43
(a) Is it possible to have two r.v.s and such that and have the same distribution but , where:
- ?
- ?
- ?
- ?
For each, give an example showing it is possible, or prove it is impossible.
Hint: Do the previous question first.
(b) Consider the same question as in Part (a), but now assume that $X$ and $Y$ are independent. Do your answers change?
3.44
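For $x$ and $y$ binary digits (0 or 1), let $x \oplus y$ be 0 if $x = y$ and 1 if $x \ne y$ (this operation is called exclusive or (often abbreviated to XOR), or addition mod 2).
(a) Let $X \sim \text{Bern}(p)$ and $Y \sim \text{Bern}(1/2)$, independently. What is the distribution of $X \oplus Y$?
(b) With notation as in (a), is $X \oplus Y$ independent of $X$? Is $X \oplus Y$ independent of $Y$? Be sure to consider both the case $p = 1/2$ and the case $p \ne 1/2$.
(c) Let $X_1, \dots, X_n$ be i.i.d. $\text{Bern}(1/2)$. For each nonempty subset $J$ of $\{1, 2, \dots, n\}$, let
$$Y_J = \bigoplus_{j \in J} X_j,$$
where the notation means to “add” in the $\oplus$ sense all the elements of $J$; the order in which this is done doesn’t matter since $x \oplus y = y \oplus x$ and $(x \oplus y) \oplus z = x \oplus (y \oplus z)$. Show that $Y_J \sim \text{Bern}(1/2)$ and that these $2^n - 1$ r.v.s are pairwise independent, but not independent. For example, we can use this to simulate 1023 pairwise independent fair coin tosses using only 10 independent fair coin tosses.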
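Hint: Apply the previous parts with $p = 1/2$. Show that if $J$ and $K$ are two different nonempty subsets of $\{1, 2, \dots, n\}$, then we can write $Y_J = A \oplus B$, $Y_K = A \oplus C$, where $A$ consists of the $X_i$ with $i \in J \cap K$, $B$ consists of the $X_i$ with $i \in J \cap K^c$, and $C$ consists of the $X_i$ with $i \in J^c \cap K$. Then $A$, $B$, $C$ are independent since they are based on disjoint sets of $X_i$. Also, at most one of these sets of $X_i$ can be empty. If $J \cap K = \emptyset$, then $Y_J = B$, $Y_K = C$. Otherwise, compute $P(Y_J = y, Y_K = z)$ by conditioning on whether $A = 1$.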
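The claims in part (c) can be verified exhaustively for a small case, which may help in seeing why the general proof works. The sketch below assumes 3 independent fair bits (an arbitrary small choice), forms the XOR of the bits in each of the $2^3 - 1 = 7$ nonempty index subsets, and checks the marginal distributions, pairwise independence, and one failure of full independence. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

n_bits = 3  # small illustrative case; the exercise treats general n
subsets = [J for r in range(1, n_bits + 1) for J in combinations(range(n_bits), r)]
outcomes = list(product((0, 1), repeat=n_bits))  # equally likely fair-bit vectors

def xor_over(J, x):
    """XOR (addition mod 2) of the bits of x with indices in J."""
    return sum(x[j] for j in J) % 2

def prob(event):
    """Exact probability of an event over the equally likely outcomes."""
    return Fraction(sum(1 for x in outcomes if event(x)), len(outcomes))

# Each subset XOR is a fair bit.
marginals_fair = all(
    prob(lambda x: xor_over(J, x) == 1) == Fraction(1, 2) for J in subsets
)

# Every pair of distinct subset XORs has joint PMF equal to 1/4 at each point.
pairwise_independent = all(
    prob(lambda x: xor_over(J, x) == a and xor_over(K, x) == b) == Fraction(1, 4)
    for J, K in combinations(subsets, 2)
    for a in (0, 1)
    for b in (0, 1)
)

# Full independence fails: bit 0, bit 1, and their XOR cannot all equal 1.
joint_all_ones = prob(
    lambda x: xor_over((0,), x) == 1
    and xor_over((1,), x) == 1
    and xor_over((0, 1), x) == 1
)

print(len(subsets), marginals_fair, pairwise_independent, joint_all_ones)
```

Full independence would require the last probability to be 1/8, so comparing it with the printed value shows the failure concretely.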
Mixed practice
3.45
Stat110 solution available.
A new treatment for a disease is being tested, to see whether it is better than the standard treatment. The existing treatment is effective on 50% of patients. It is believed initially that there is a chance that the new treatment is effective on 60% of patients, and a chance that the new treatment is effective on 50% of patients. In a pilot study, the new treatment is given to 20 random patients, and is effective for 15 of them.
(a) Given this information, what is the probability that the new treatment is better than the standard treatment?
(b) A second study is done later, giving the new treatment to 20 new random patients. Given the results of the first study, what is the PMF for how many of the new patients the new treatment is effective on? (Letting $p$ be the answer to (a), your answer can be left in terms of $p$.)
3.46
Independent Bernoulli trials are performed, with success probability 1/2 for each trial. An important question that often comes up in such settings is how many trials to perform. Many controversies have arisen in statistics over the issue of how to analyze data coming from an experiment where the number of trials can depend on the data collected so far.
For example, if we can follow the rule “keep performing trials until there are more than twice as many failures as successes, and then stop”, then naively looking at the ratio of failures to successes (if and when the process stops) will give more than 2:1 rather than the true theoretical 1:1 ratio; this could be a very misleading result! However, it might never happen that there are more than twice as many failures as successes; in this problem, you will find the probability of that happening.
(a) Two gamblers, A and B, make a series of bets, where each has probability 1/2 of winning a bet, but A gets \$2 for each win and loses \$1 for each loss (a very favorable game for A!). Assume that the gamblers are allowed to borrow money, so they can and do gamble forever. Let $p_k$ be the probability that A, starting with \$$k$, will ever reach \$0, for each $k \ge 0$. Explain how this story relates to the original problem, and how the original problem can be solved if we can find $p_k$.
(b) Find $p_k$.
Hint: As in the gambler’s ruin, set up and solve a difference equation for $p_k$. We have $p_k \to 0$ as $k \to \infty$ (you don’t need to prove this, but it should make sense since the game is so favorable to A, which will result in A’s fortune going to $\infty$; a formal proof, not required here, could be done using the law of large numbers, an important theorem from Chapter 10). The solution can be written neatly in terms of the golden ratio.
(c) Find the probability of ever having more than twice as many failures as successes with independent $\text{Bern}(1/2)$ trials, as originally desired.
3.47
A copy machine is used to make $n$ pages of copies per day. The machine has two trays in which paper gets loaded, and each page used is taken randomly and independently from one of the trays. At the beginning of the day, the trays are refilled so that they each have $m$ pages.
(a) Let pbinom(x,n,p) be the CDF of the $\text{Bin}(n, p)$ distribution, evaluated at $x$. In terms of pbinom, find a simple expression for the probability that both trays have enough paper on any particular day, when this probability is strictly between 0 and 1 (also specify the values of $m$ for which the probability is 0 and the values for which it is 1).
Hint: Be careful about whether inequalities are strict, since the Binomial is discrete.
(b) Using a computer, find the smallest value of for which there is at least a 95% chance that both trays have enough paper on a particular day, for , , , and .
Hint: If you use R, you may find the following commands useful:
g <- function(m,n) [your answer from (a)] defines a function g such that g(m,n) is your answer from (a), g(1:100,100) gives the vector (g(1,100),...,g(100,100)), which(v>0.95) gives the indices of the components of vector v that exceed 0.95, and min(w) gives the minimum of a vector w.
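For readers working outside R, the helpers named in the hint have straightforward Python counterparts. The sketch below is one possible translation, not part of the original hint: `pbinom` mimics the R function of the same name for the Binomial CDF, and `smallest_index_above` plays the role of `min(which(v > 0.95))`. The function g is deliberately left out, since it is the answer to part (a).

```python
from math import comb, floor

def pbinom(x, n, p):
    """Binomial CDF P(X <= x) for X ~ Bin(n, p), analogous to R's pbinom(x, n, p)."""
    if x < 0:
        return 0.0
    k_max = min(n, floor(x))  # the CDF is flat between integers and equals 1 past n
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

def smallest_index_above(values, threshold=0.95):
    """1-based index of the first entry exceeding threshold, or None if there is none."""
    for index, value in enumerate(values, start=1):
        if value > threshold:
            return index
    return None

# Toy usage on a made-up vector (not the answer to the exercise):
print(pbinom(5, 10, 0.5))
print(smallest_index_above([0.10, 0.50, 0.96, 0.99]))
```

For an increasing vector, the first index above the threshold is the same as the minimum of all such indices, so scanning from the left is enough.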