Continuous random variables
Problems
Exercises marked as solved in the book have detailed solutions at http://stat110.net.
PDFs and CDFs
5.1
The Rayleigh distribution from Example 5.1.7 has PDF
$$f(x) = x e^{-x^2/2}, \quad x > 0.$$
Let $X$ have the Rayleigh distribution.
(a) Find .
(b) Find the first quartile, median, and third quartile of $X$; these are defined to be the values $q_1, q_2, q_3$ (respectively) such that $P(X \le q_j) = j/4$ for $j = 1, 2, 3$.
5.2
(a) Make up a PDF $f$, with an application for which that PDF would be plausible, where $f(x) > 1$ for all $x$ in a certain interval.
(b) Show that if a PDF $f$ has $f(x) > 1$ for all $x$ in a certain interval, then that interval must have length less than 1.
5.3
Let be the CDF of a continuous r.v., and be the PDF.
(a) Show that defined by is also a valid PDF.
(b) Show that defined by is also a valid PDF.
5.4
Let $X$ be a continuous r.v. with CDF $F$ and PDF $f$.
(a) Find the conditional CDF of $X$ given $X > a$, for a constant $a$ with $P(X > a) \neq 0$. That is, find $P(X \le x \mid X > a)$ for all $x$, in terms of $F$.
(b) Find the conditional PDF of $X$ given $X > a$ (this is the derivative of the conditional CDF).
(c) Check that the conditional PDF from (b) is a valid PDF, by showing directly that it is nonnegative and integrates to 1.
5.5
A circle with a random radius is generated. Let be its area.
(a) Find the mean and variance of , without first finding the CDF or PDF of .
(b) Find the CDF and PDF of .
5.6
The 68-95-99.7% rule gives approximate probabilities of a Normal r.v. being within 1, 2, and 3 standard deviations of its mean. Derive analogous rules for the following distributions.
(a) .
(b) .
(c) . Discuss whether there is one such rule that applies to all Exponential distributions, just as the 68-95-99.7% rule applies to all Normal distributions, not just to the standard Normal.
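As a baseline for this problem, the Normal rule itself can be checked numerically. The sketch below assumes only the standard identity $P(|Z| \le k) = \mathrm{erf}(k/\sqrt{2})$ for $Z \sim \mathcal{N}(0,1)$; the helper name `normal_within` is hypothetical and not from the book.

```python
import math


def normal_within(k: float) -> float:
    """P(|Z| <= k) for Z ~ N(0, 1), via Phi(k) - Phi(-k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # By location-scale, the same values hold for any Normal r.v.
    # measured in standard deviations from its mean.
    print(f"within {k} SD of the mean: {normal_within(k):.4f}")
```

The same pattern (evaluate the CDF at the mean plus and minus $k$ standard deviations) applies to the other distributions once their CDFs are written down.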
5.7
Let
$$F(x) = \frac{2}{\pi} \sin^{-1}\left(\sqrt{x}\right)$$
for $0 < x < 1$, and let $F(x) = 0$ for $x \le 0$ and $F(x) = 1$ for $x \ge 1$.
(a) Check that $F$ is a valid CDF, and find the corresponding PDF $f$. This distribution is called the Arcsine distribution, though it also goes by the name Beta$(1/2, 1/2)$ (we will explore the Beta in depth in Chapter 8).
(b) Explain how it is possible for $f$ to be a valid PDF even though $f(x)$ goes to $\infty$ as $x$ approaches 0 from the right and as $x$ approaches 1 from the left.
5.8
The Beta distribution with parameters , has PDF
(We will discuss the Beta in detail in Chapter 8.) Let have this distribution.
(a) Find the CDF of .
(b) Find .
(c) Find the mean and variance of (without quoting results about the Beta distribution).
5.9
The Cauchy distribution has PDF
$$f(x) = \frac{1}{\pi(1 + x^2)}$$
for all real $x$. (We will introduce the Cauchy from another point of view in Chapter 7.) Find the CDF of a random variable with the Cauchy PDF.
Hint: Recall that the derivative of the inverse tangent function $\tan^{-1}(x)$ is $\frac{1}{1 + x^2}$.
Uniform and universality of the Uniform
5.10
Let .
(a) Find without using calculus.
(b) Find the conditional distribution of given .
5.11
Stat110 solution available.
Let be a Uniform r.v. on the interval (be careful about minus signs).
(a) Compute , , and .
(b) Find the CDF and PDF of . Is the distribution of Uniform on ?
5.12
Stat110 solution available.
A stick is broken into two pieces, at a uniformly random breakpoint. Find the CDF and average of the length of the longer piece.
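A quick simulation can be used to sanity-check a derived CDF and mean for the longer piece. This is only a sketch: it assumes the stick has length 1 (the problem does not state a length), and the function names are hypothetical.

```python
import random


def longer_piece_lengths(n_sims, seed=0):
    """Simulate the length of the longer piece of a unit stick.

    Assumption: the stick has length 1 and the breakpoint is Unif(0, 1).
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_sims):
        u = rng.random()              # breakpoint
        lengths.append(max(u, 1 - u)) # longer of the two pieces
    return lengths


def empirical_cdf(samples, x):
    """Fraction of simulated values that are <= x."""
    return sum(s <= x for s in samples) / len(samples)


samples = longer_piece_lengths(100_000)
print("estimated mean length:", sum(samples) / len(samples))
for x in (0.6, 0.75, 0.9):
    print(f"estimated P(L <= {x}):", empirical_cdf(samples, x))
```

Comparing these estimates against the formulas obtained by hand is a cheap way to catch an algebra slip.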
5.13
A stick of length 1 is broken at a uniformly random point, yielding two pieces. Let and be the lengths of the shorter and longer pieces, respectively, and let be the ratio of the lengths and .
(a) Find the CDF and PDF of .
(b) Find the expected value of (if it exists).
(c) Find the expected value of (if it exists).
5.14
Let be i.i.d. , and . What is the PDF of ? What is ?
Hint: Find the CDF of first, by translating the event into an event involving .
5.15
Let . Using , construct .
5.16
Stat110 solution available.
Let $U \sim \mathrm{Unif}(0,1)$, and
$$X = \log\left(\frac{U}{1-U}\right).$$
Then $X$ has the Logistic distribution, as defined in Example 5.1.6.
(a) Write down (but do not compute) an integral giving .
(b) Find without using calculus.
Hint: A useful symmetry property here is that $U$ has the same distribution as $1 - U$.
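The claim that this transformation of a Uniform produces a Logistic r.v. can be checked by simulation. The sketch below assumes $U \sim \mathrm{Unif}(0,1)$, $X = \log(U/(1-U))$, and the standard Logistic CDF $F(x) = e^x/(1+e^x)$; the function names are hypothetical.

```python
import math
import random


def logistic_cdf(x):
    """Standard Logistic CDF, e^x / (1 + e^x), written in a stable form."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_logistic(n, seed=0):
    """Draw n values of log(U / (1 - U)) with U ~ Unif(0, 1)."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # guard against log(0); u == 0 is possible in principle
            draws.append(math.log(u / (1.0 - u)))
    return draws


draws = simulate_logistic(100_000)
for x in (-2.0, 0.0, 1.0):
    empirical = sum(d <= x for d in draws) / len(draws)
    print(f"x = {x:+.1f}: empirical CDF {empirical:.4f}, "
          f"Logistic CDF {logistic_cdf(x):.4f}")
```

This is universality of the Uniform in action: plugging a Uniform into an inverse CDF yields an r.v. with that CDF.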
5.17
Let . As a function of , create an r.v. with CDF for .
5.18
The Pareto distribution with parameter $a > 0$ has PDF $f(x) = a/x^{a+1}$ for $x \ge 1$ (and 0 otherwise). This distribution is often used in statistical modeling.
(a) Find the CDF of a Pareto r.v. with parameter $a$; check that it is a valid CDF.
(b) Suppose that for a simulation you want to run, you need to generate i.i.d. Pareto($a$) r.v.s. You have a computer that knows how to generate i.i.d. $\mathrm{Unif}(0,1)$ r.v.s but does not know how to generate Pareto r.v.s. Show how to do this.
Normal
5.19
Let . Create an r.v. , as a simple-looking function of . Make sure to check that your has the correct mean and variance.
5.20
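Engineers sometimes work with the “error function”
$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-x^2}\,dx$$
instead of the standard Normal CDF $\Phi$.
(a) Show that the following conversion between $\Phi$ and $\mathrm{erf}$ holds for all $z$:
$$\Phi(z) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left(\frac{z}{\sqrt{2}}\right).$$
(b) Show that $\mathrm{erf}$ is an odd function, i.e., $\mathrm{erf}(-z) = -\mathrm{erf}(z)$.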
5.21
(a) Find the points of inflection of the $\mathcal{N}(0,1)$ PDF $\varphi$, i.e., the points where the curve switches from convex (second derivative positive) to concave (second derivative negative) or vice versa.
(b) Use the result of (a) and a location-scale transformation to find the points of inflection of the $\mathcal{N}(\mu, \sigma^2)$ PDF.
5.22
The distance between two points needs to be measured, in meters. The true distance between the points is 10 meters, but due to measurement error we can’t measure the distance exactly. Instead, we will observe a value of , where the error is distributed . Find the probability that the observed distance is within 0.4 meters of the true distance (10 meters). Give both an exact answer in terms of and an approximate numerical answer.
5.23
Alice is trying to transmit to Bob the answer to a yes-no question, using a noisy channel. She encodes “yes” as 1 and “no” as 0, and sends the appropriate value. However, the channel adds noise; specifically, Bob receives what Alice sends plus a noise term (the noise is independent of what Alice sends). If Bob receives a value greater than he interprets it as “yes”; otherwise, he interprets it as “no”.
(a) Find the probability that Bob understands Alice correctly.
(b) What happens to the result from (a) if is very small? What about if is very large? Explain intuitively why the results in these extreme cases make sense.
5.24
A woman is pregnant, with a due date of January 10, 2020. Of course, the actual date on which she will give birth is not necessarily the due date. On a timeline, define time 0 to be the instant when January 10, 2020 begins. Suppose that the time when the woman gives birth has a Normal distribution, centered at 0 and with standard deviation 8 days. What is the probability that she gives birth on her due date? (Your answer should be in terms of , and simplified.)
5.25
We will show in the next chapter that if and are independent with , then . Use this result to find for , with and independent.
Hint: Write and then standardize . Check that your answer makes sense in the special case where and are i.i.d.
5.26
Walter and Carl both often need to travel from Location A to Location B. Walter walks, and his travel time is Normal with mean minutes and standard deviation minutes (travel time can’t be negative without using a tachyon beam, but assume that is so much larger than that the chance of a negative travel time is negligible).
Carl drives his car, and his travel time is Normal with mean minutes and standard deviation minutes (the standard deviation is larger for Carl due to variability in traffic conditions). Walter’s travel time is independent of Carl’s. On a certain day, Walter and Carl leave from Location A to Location B at the same time.
(a) Find the probability that Carl arrives first (in terms of and the parameters). For this you can use the important fact, proven in the next chapter, that if and are independent with , then .
(b) Give a fully simplified criterion (not in terms of ), such that Carl has more than a 50% chance of arriving first if and only if the criterion is satisfied.
(c) Walter and Carl want to make it to a meeting at Location B that is scheduled to begin minutes after they depart from Location A. Give a fully simplified criterion (not in terms of ) such that Carl is more likely than Walter to make it on time for the meeting if and only if the criterion is satisfied.
5.27
Let . We know from the 68-95-99.7% rule that there is a 68% chance of being in the interval . Give a visual explanation of whether or not there is an interval that is shorter than the interval , yet which has at least as large a chance as of containing .
5.28
Let . Use the fact that to construct a random interval (that is, an interval whose endpoints are r.v.s), such that the probability that is in the interval is approximately 0.95. This interval is called a confidence interval for ; such intervals are often desired in statistics when estimating unknown parameters based on data.
5.29
Let , with . This is a well-defined continuous r.v., even though the absolute value function is not differentiable at 0 (due to the sharp corner).
(a) Find the CDF of in terms of . Be sure to specify the CDF everywhere.
(b) Find the PDF of .
(c) Is the PDF of continuous at 0? If not, is this a problem as far as using the PDF to find probabilities?
5.30
Stat110 solution available.
Let $Z \sim \mathcal{N}(0,1)$ and let $S$ be a random sign independent of $Z$, i.e., $S$ is 1 with probability $1/2$ and $-1$ with probability $1/2$. Show that $SZ \sim \mathcal{N}(0,1)$.
5.31
Stat110 solution available.
Let . Find without using LOTUS, where is the CDF of .
5.32
Stat110 solution available.
Let $Z \sim \mathcal{N}(0,1)$ and $X = Z^2$. Then the distribution of $X$ is called Chi-Square with 1 degree of freedom. This distribution appears in many statistical methods.
(a) Find a good numerical approximation to using facts about the Normal distribution, without querying a calculator/computer/table about values of the Normal CDF.
(b) Let and be the CDF and PDF of , respectively. Show that for any ,
Using this and LOTUS, derive Mills’ inequality, which is the following lower bound on :
5.33
Let , with CDF . We will show in Chapter 8 that the PDF of is the function given by
for , and for .
(a) Find expressions for as integrals in two different ways, one based on the PDF of and the other based on the PDF of .
(b) Find .
5.34
Stat110 solution available.
Let $Z \sim \mathcal{N}(0,1)$. A measuring device is used to observe $Z$, but the device can only handle positive values, and gives a reading of 0 if $Z \le 0$; this is an example of censored data. So assume that $X = Z I_{Z > 0}$ is observed rather than $Z$, where $I_{Z > 0}$ is the indicator of $Z > 0$. Find $E(X)$ and $\mathrm{Var}(X)$.
5.35
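Let $Z \sim \mathcal{N}(0,1)$, and $c$ be a nonnegative constant. Find $E(\max(Z - c, 0))$, in terms of the standard Normal CDF $\Phi$ and PDF $\varphi$. (This kind of calculation often comes up in quantitative finance.)
Hint: Use LOTUS, and handle the max symbol by adjusting the limits of integration appropriately. As a check, make sure that your answer reduces to $1/\sqrt{2\pi}$ when $c = 0$; this must be the case since we showed in Example 5.4.7 that $E|Z| = \sqrt{2/\pi}$, and we have $|Z| = \max(Z, 0) + \max(-Z, 0)$ so by symmetry
$$E|Z| = E(\max(Z, 0)) + E(\max(-Z, 0)) = 2E(\max(Z, 0)).$$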
Exponential
5.36
Stat110 solution available.
A post office has 2 clerks. Alice enters the post office while 2 other customers, Bob and Claire, are being served by the 2 clerks. She is next in line. Assume that the time a clerk spends serving a customer has an $\mathrm{Expo}(\lambda)$ distribution.
(a) What is the probability that Alice is the last of the 3 customers to be done being served?
Hint: No integrals are needed.
(b) What is the expected total time that Alice needs to spend at the post office?
5.37
Let $T$ be the time until a radioactive particle decays, and suppose (as is often done in physics and chemistry) that $T \sim \mathrm{Expo}(\lambda)$.
(a) The half-life of the particle is the time at which there is a 50% chance that the particle has decayed (in statistical terminology, this is the median of the distribution of $T$). Find the half-life of the particle.
(b) Show that for $\epsilon$ a small, positive constant, the probability that the particle decays in the time interval $[t, t + \epsilon]$, given that it has survived until time $t$, does not depend on $t$ and is approximately proportional to $\epsilon$.
Hint: $e^x \approx 1 + x$ if $x \approx 0$.
(c) Now consider $n$ radioactive particles, with i.i.d. times until decay $T_1, \dots, T_n \sim \mathrm{Expo}(\lambda)$. Let $L$ be the first time at which one of the particles decays. Find the CDF of $L$. Also, find $E(L)$ and $\mathrm{Var}(L)$.
(d) Continuing (c), find the mean and variance of $M = \max(T_1, \dots, T_n)$, the last time at which one of the particles decays, without using calculus.
Hint: Draw a timeline, apply (c), and remember the memoryless property.
5.38
Stat110 solution available.
Fred wants to sell his car, after moving back to Blissville (where he is happy with the bus system). He decides to sell it to the first person to offer at least $18,000 for it. Assume that the offers are independent Exponential random variables with mean $12,000, and that Fred is able to keep getting offers until he obtains one that meets his criterion.
(a) Find the expected number of offers Fred will have.
(b) Find the expected amount of money that Fred will get for the car.
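Both parts can be checked against a simulation of Fred's selling rule. The sketch below uses the numbers stated in the problem (threshold $18,000, Exponential offers with mean $12,000); the function names are hypothetical, and the output is a Monte Carlo estimate rather than an exact answer.

```python
import random

THRESHOLD = 18_000.0   # minimum acceptable offer, from the problem
MEAN_OFFER = 12_000.0  # mean of each Exponential offer, from the problem


def sell_car(rng):
    """Return (number of offers received, accepted offer) for one sale."""
    n_offers = 0
    while True:
        n_offers += 1
        # random.expovariate takes the rate, which is 1 / mean.
        offer = rng.expovariate(1.0 / MEAN_OFFER)
        if offer >= THRESHOLD:
            return n_offers, offer


def simulate(n_sims=100_000, seed=0):
    """Average number of offers and average sale price over n_sims sales."""
    rng = random.Random(seed)
    results = [sell_car(rng) for _ in range(n_sims)]
    avg_offers = sum(n for n, _ in results) / n_sims
    avg_price = sum(p for _, p in results) / n_sims
    return avg_offers, avg_price


avg_offers, avg_price = simulate()
print("estimated expected number of offers:", round(avg_offers, 2))
print("estimated expected sale price:", round(avg_price))
```

The number of offers follows a First Success pattern and the accepted offer is where the memoryless property enters; the simulation makes no use of either fact, so it is an independent check.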
5.39
As in the previous problem, Fred wants to sell his car, and the offers for his car are i.i.d. Exponential r.v.s with mean $12,000. Assume now though that he will wait until he has 3 offers (no matter how large or small they are), and then accept the largest of the 3 offers. Find the expected amount of money that Fred will get for his car.
5.40
(a) Fred visits Blotchville again. He finds that the city has installed an electronic display at the bus stop, showing the time when the previous bus arrived. The times between arrivals of buses are still independent Exponentials with mean 10 minutes. Fred waits for the next bus, and then records the time between that bus and the previous bus. On average, what length of time between buses does he see?
(b) Fred then visits Blunderville, where the times between buses are also 10 minutes on average, and independent. Yet to his dismay, he finds that on average he has to wait more than 1 hour for the next bus when he arrives at the bus stop! How is it possible that the average Fred-to-bus time is greater than the average bus-to-bus time even though Fred arrives at some time between two bus arrivals? Explain this intuitively, and construct a specific discrete distribution for the times between buses showing that this is possible.
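The effect in part (a) can be explored by simulation before working it out. The sketch below assumes Exponential gaps with mean 10 minutes (as stated), a long time horizon, and that Fred shows up at a uniformly random time within that horizon; the horizon length, the bus placed at time 0, and the function names are illustrative assumptions.

```python
import bisect
import random


def simulate_bus_gaps(n_visits=50_000, horizon=1_000_000.0,
                      mean_gap=10.0, seed=0):
    """Return (all gaps between buses, gaps that contain a random visit)."""
    rng = random.Random(seed)
    arrivals = [0.0]  # assumption: a bus arrives at time 0
    while arrivals[-1] <= horizon:
        arrivals.append(arrivals[-1] + rng.expovariate(1.0 / mean_gap))
    all_gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]

    seen_gaps = []
    for _ in range(n_visits):
        t = rng.uniform(0.0, horizon)           # Fred's arrival time
        i = bisect.bisect_right(arrivals, t)    # arrivals[i-1] <= t < arrivals[i]
        seen_gaps.append(arrivals[i] - arrivals[i - 1])
    return all_gaps, seen_gaps


all_gaps, seen_gaps = simulate_bus_gaps()
print("average gap between buses:", sum(all_gaps) / len(all_gaps))
print("average gap Fred records: ", sum(seen_gaps) / len(seen_gaps))
```

Longer gaps cover more of the timeline, so a random arrival time is more likely to land in one; the same length-biasing idea is the key to part (b).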
5.41
Fred and Gretchen are waiting at a bus stop in Blotchville. Two bus routes, Route 1 and Route 2, have buses that stop at this bus stop. For Route , buses arrive according to a Poisson process with rate buses/minute. The Route 1 process is independent of the Route 2 process. Fred is waiting for a Route 1 bus, and Gretchen is waiting for a Route 2 bus.
(a) Given that Fred has already waited for 20 minutes, on average how much longer will he have to wait for his bus?
(b) Find the probability that at least Route 1 buses will pass by before the first Route 2 bus arrives. The following result from Chapter 7 may be useful here: for independent random variables , , we have .
(c) For this part only, assume that . Find the expected time it will take until both Fred and Gretchen have caught their buses.
5.42
Stat110 solution available.
Joe is waiting in continuous time for a book called The Winds of Winter to be released. Suppose that the waiting time until news of the book’s release is posted, measured in years relative to some starting point, has an Exponential distribution with .
Joe is not so obsessive as to check multiple times a day; instead, he checks the website once at the end of each day. Therefore, he observes the day on which the news was posted, rather than the exact time . Let be this measurement, where means that the news was posted within the first day (after the starting point), means it was posted on the second day, etc. (assume that there are 365 days in a year). Find the PMF of . Is this a named distribution that we have studied?
5.43
The Exponential is the analog of the Geometric in continuous time. This problem explores the connection between Exponential and Geometric in more detail, asking what happens to a Geometric in a limit where the Bernoulli trials are performed faster and faster but with smaller and smaller success probabilities.
Suppose that Bernoulli trials are being performed in continuous time; rather than only thinking about first trial, second trial, etc., imagine that the trials take place at points on a timeline. Assume that the trials are at regularly spaced times , where is a small positive number. Let the probability of success of each trial be , where is a positive constant. Let be the number of failures before the first success (in discrete time), and be the time of the first success (in continuous time).
(a) Find a simple equation relating to .
Hint: Draw a timeline and try out a simple example.
(b) Find the CDF of .
Hint: First find .
(c) Show that as , the CDF of converges to the CDF, evaluating all the CDFs at a fixed .
Hint: Use the compound interest limit (see Section A.2.5 of the math appendix).
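The convergence in (c) can be watched numerically. The sketch below assumes trials at times $0, \Delta t, 2\Delta t, \dots$ with success probability $\lambda \Delta t$ each, and compares the empirical CDF of the first success time with the $\mathrm{Expo}(\lambda)$ CDF $1 - e^{-\lambda t}$; the symbol names `lam` and `dt` and the specific values are illustrative choices, not from the book.

```python
import math
import random


def first_success_time(rng, lam, dt):
    """Time of the first success when trial k occurs at time k * dt."""
    k = 0
    while rng.random() >= lam * dt:  # failure, with probability 1 - lam * dt
        k += 1
    return k * dt


def empirical_cdf_at(t, lam, dt, n_sims=20_000, seed=0):
    """Estimate P(T <= t) by simulation."""
    rng = random.Random(seed)
    hits = sum(first_success_time(rng, lam, dt) <= t for _ in range(n_sims))
    return hits / n_sims


lam, t = 2.0, 0.5
print("Expo CDF at t:", 1 - math.exp(-lam * t))
for dt in (0.1, 0.02, 0.005):
    print(f"dt = {dt}: estimated P(T <= t) =", empirical_cdf_at(t, lam, dt))
```

Shrinking `dt` means more trials per unit time but a smaller success probability per trial, which is exactly the regime described in the problem.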
5.44
The Laplace distribution has PDF
$$f(x) = \frac{1}{2} e^{-|x|}$$
for all real $x$. The Laplace distribution is also called a symmetrized Exponential distribution. Explain this in the following two ways.
(a) Plot the PDFs and explain how they relate.
(b) Let $X \sim \mathrm{Expo}(1)$ and $S$ be a random sign (1 or $-1$, with equal probabilities), with $S$ and $X$ independent. Find the PDF of $SX$ (by first finding the CDF), and compare the PDF of $SX$ and the Laplace PDF.
5.45
Emails arrive in an inbox according to a Poisson process with rate 20 emails per hour. Let be the time at which the 3rd email arrives, measured in hours after a certain fixed starting time. Find without using calculus.
Hint: Apply the count-time duality.
5.46
Let $T$ be the lifetime of a certain person (how long that person lives), and let $T$ have CDF $F$ and PDF $f$. The hazard function of $T$ is defined by
$$h(t) = \frac{f(t)}{1 - F(t)}.$$
(a) Explain why $h$ is called the hazard function and in particular, why $h(t)$ is the probability density for death at time $t$, given that the person survived up until then.
(b) Show that an Exponential r.v. has constant hazard function and conversely, if the hazard function of $T$ is a constant then $T$ must be $\mathrm{Expo}(\lambda)$ for some $\lambda$.
5.47
Let $T$ be the lifetime of a person (or animal or gadget), with CDF $F$ and PDF $f$. Let $h$ be the hazard function, defined as in the previous problem. If we know $F$ then we can calculate $f$, and then in turn we can calculate $h$. In this problem, we consider the reverse problem: how to recover $F$ and $f$ from knowing $h$.
(a) Show that the CDF and hazard function are related by
$$F(t) = 1 - \exp\left(-\int_0^t h(s)\,ds\right)$$
for all $t > 0$.
Hint: Let $G(t) = 1 - F(t)$ be the survival function, and consider the derivative of $\log G(t)$.
(b) Show that the PDF and hazard function are related by
$$f(t) = h(t) \exp\left(-\int_0^t h(s)\,ds\right)$$
for all $t > 0$.
Hint: Apply the result of (a).
5.48
Stat110 solution available.
Find for , using LOTUS and the fact that and , and integration by parts at most once. In the next chapter, we’ll learn how to find for all .
5.49
Stat110 solution available.
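The Gumbel distribution is the distribution of $-\log X$ with $X \sim \mathrm{Expo}(1)$.
(a) Find the CDF of the Gumbel distribution.
(b) Let $X_1, X_2, \dots$ be i.i.d. $\mathrm{Expo}(1)$ and let $M_n = \max(X_1, \dots, X_n)$. Show that $M_n - \log n$ converges in distribution to the Gumbel distribution, i.e., as $n \to \infty$ the CDF of $M_n - \log n$ converges to the Gumbel CDF.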
Mixed practice
5.50
Explain intuitively why if and are i.i.d., but equality may not hold if and are not independent or not identically distributed.
5.51
Let be an r.v. (discrete or continuous) such that always holds. Let .
(a) Show that
Hint: With probability 1, we have .
(b) Show that there is only one possible distribution for for which . What is the name of this distribution?
5.52
The Rayleigh distribution from Example 5.1.7 has PDF
$$f(x) = x e^{-x^2/2}, \quad x > 0.$$
Let $X$ have the Rayleigh distribution.
(a) Find $E(X)$ without using much calculus, by interpreting the integral in terms of known results about the Normal distribution.
(b) Find $E(X^2)$.
Hint: A nice approach is to use LOTUS and the substitution $u = x^2/2$, and then interpret the resulting integral in terms of known results about the Exponential distribution.
5.53
Stat110 solution available.
Consider an experiment where we observe the value of a random variable $X$, and estimate the value of an unknown constant $\theta$ using some random variable $T = g(X)$ that is a function of $X$. The r.v. $T$ is called an estimator. Think of $X$ as the data observed in the experiment, and $\theta$ as an unknown parameter related to the distribution of $X$.
For example, consider the experiment of flipping a coin $n$ times, where the coin has an unknown probability $\theta$ of Heads. After the experiment is performed, we have observed the value of $X \sim \mathrm{Bin}(n, \theta)$. The most natural estimator for $\theta$ is then $X/n$.
The bias of an estimator $T$ for $\theta$ is defined as $b(T) = E(T) - \theta$. The mean squared error is the average squared error when using $T$ to estimate $\theta$:
$$\mathrm{MSE}(T) = E(T - \theta)^2.$$
Show that
$$\mathrm{MSE}(T) = \mathrm{Var}(T) + (b(T))^2.$$
This implies that for fixed MSE, lower bias can only be attained at the cost of higher variance and vice versa; this is a form of the bias-variance tradeoff, a phenomenon which arises throughout statistics.
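The decomposition of mean squared error into variance plus squared bias can be verified exactly in the coin-flipping example. The sketch below assumes $X \sim \mathrm{Bin}(n, \theta)$ and compares two estimators: the sample proportion $X/n$ and a shrinkage estimator $(X+1)/(n+2)$. The second estimator and all function names are hypothetical illustrations, not from the book.

```python
from math import comb, isclose


def binom_pmf(k, n, theta):
    """P(X = k) for X ~ Bin(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def mse_parts(estimator, n, theta):
    """Return (MSE, variance, bias) of estimator(X, n), computed exactly."""
    pmf = [binom_pmf(k, n, theta) for k in range(n + 1)]
    values = [estimator(k, n) for k in range(n + 1)]
    mean = sum(p * t for p, t in zip(pmf, values))
    var = sum(p * (t - mean) ** 2 for p, t in zip(pmf, values))
    mse = sum(p * (t - theta) ** 2 for p, t in zip(pmf, values))
    return mse, var, mean - theta


def proportion(x, n):
    return x / n


def shrunk(x, n):
    return (x + 1) / (n + 2)  # hypothetical biased estimator


n, theta = 10, 0.3
for name, est in (("X/n", proportion), ("(X+1)/(n+2)", shrunk)):
    mse, var, bias = mse_parts(est, n, theta)
    print(f"{name}: MSE={mse:.5f}, Var={var:.5f}, bias={bias:.5f}, "
          f"Var + bias^2={var + bias**2:.5f}")
    assert isclose(mse, var + bias**2)
```

Because the sums run over the whole Binomial support, the identity is checked exactly (up to floating point) rather than by simulation.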
5.54
Stat110 solution available.
(a) Suppose that we have a list of the populations of every country in the world. Guess, without looking at data yet, what percentage of the populations have the digit 1 as their first digit (e.g., a country with a population of 1,234,567 has first digit 1 and a country with population 89,012,345 does not).
(b) After having done (a), look through a list of populations and count how many start with a 1. What percentage of countries is this? Benford’s law states that in a very large variety of real-life data sets, the first digit approximately follows a particular distribution with about a 30% chance of a 1, an 18% chance of a 2, and in general
$$P(D = j) = \log_{10}\left(\frac{j+1}{j}\right), \quad \text{for } j \in \{1, 2, \dots, 9\},$$
where $D$ is the first digit of a randomly chosen element. (Exercise 6 from Chapter 3 asks for a proof that this is a valid PMF.) How closely does the percentage found in the data agree with that predicted by Benford’s law?
(c) Suppose that we write the random value in some problem (e.g., the population of a random country) in scientific notation as , where is a nonnegative integer and . Assume that is a continuous r.v. with PDF
and 0 otherwise, with a constant. What is the value of (be careful with the bases of logs)? Intuitively, we might hope that the distribution of does not depend on the choice of units in which is measured. To see whether this holds, let with . What is the PDF of (specifying where it is nonzero)?
(d) Show that if we have a random number (written in scientific notation) and has the PDF from (c), then the first digit (which is also the first digit of ) has Benford’s law as its PMF.
Hint: What does correspond to in terms of the values of ?
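For the comparison in part (b), it helps to have the Benford probabilities tabulated. The sketch below assumes the standard statement of Benford's law, $P(D = j) = \log_{10}((j+1)/j)$ for $j = 1, \dots, 9$; the variable names are illustrative.

```python
import math

# Benford's law PMF for the first digit D.
benford = {j: math.log10((j + 1) / j) for j in range(1, 10)}

for digit, prob in benford.items():
    print(f"P(D = {digit}) = {prob:.4f}")

# The terms telescope to log10(10) = 1, so this is a valid PMF.
print("total probability:", sum(benford.values()))
```

An observed first-digit frequency table from a list of country populations can be compared against `benford` entry by entry.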
5.55
Stat110 solution available.
(a) Let be independent r.v.s., and let be the smallest value of such that (i.e., the index of the first exceeding 4). In terms of , find .
(b) Let and be PDFs with and for all . Let be a random variable with PDF . Find the expected value of the ratio
Such ratios come up very often in statistics, when working with a quantity known as a likelihood ratio and when using a computational technique known as importance sampling.
(c) Define
This is a CDF and is a continuous, strictly increasing function. Let have CDF , and define . What are the mean and variance of ?
5.56
Let be i.i.d.
(a) Find an expression for as an integral.
(b) Find .
(c) Find .
5.57
Let be i.i.d., and .
(a) Find the CDF and PDF of .
(b) Let be the PDF of and be the PDF of . Find unsimplified expressions for as integrals in two different ways, one based on and one based on .
(c) Find , in terms of .
Hint: Move all of the r.v.s to one side of the inequality.
5.58
Let and . So is if , and is 0 if .
(a) Find an expression for as an integral (which can be unsimplified).
(b) Let be independent r.v.s, each with the same distribution as . Let , i.e., is the smallest value such that . Find .
(c) Find the CDF of in terms of . (Be sure to specify it for all real numbers.)
5.59
The unit circle $\{(x, y) : x^2 + y^2 = 1\}$ is divided into three arcs by choosing three random points $A, B, C$ on the circle (independently and uniformly), forming arcs between $A$ and $B$, between $A$ and $C$, and between $B$ and $C$. Let $L$ be the length of the arc containing the point $(1, 0)$. What is $E(L)$? Study this by working through the following steps.
(a) Explain what is wrong with the following argument: “The total length of the arcs is $2\pi$, the circumference of the circle. So by symmetry and linearity, each arc has length $2\pi/3$ on average. Referring to the arc containing $(1, 0)$ is just a way to specify one of the arcs (it wouldn’t matter if $(1, 0)$ were replaced by $(0, -1)$ or any other specific point on the circle in the statement of the problem). So the expected value of $L$ is $2\pi/3$.”
(b) Let the arc containing $(1, 0)$ be divided into two pieces: the piece extending counterclockwise from $(1, 0)$ and the piece extending clockwise from $(1, 0)$. Write $L = L_1 + L_2$, where $L_1$ and $L_2$ are the lengths of the counterclockwise and clockwise pieces, respectively. Find the CDF, PDF, and expected value of $L_1$.
(c) Use (b) to find $E(L)$.
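A simulation gives a quick way to test the argument quoted in (a). The sketch below assumes the circle has circumference $2\pi$ and takes the fixed reference point to be at angle 0 (an assumption made here for concreteness); points are represented by their angles in $[0, 2\pi)$, and the function names are hypothetical.

```python
import math
import random

TWO_PI = 2 * math.pi


def arc_containing_reference(rng):
    """Length of the arc containing the point at angle 0."""
    angles = [rng.uniform(0.0, TWO_PI) for _ in range(3)]
    counterclockwise = min(angles)       # from angle 0 to the nearest point going up
    clockwise = TWO_PI - max(angles)     # from angle 0 to the nearest point going down
    return counterclockwise + clockwise


def estimate_mean_arc(n_sims=100_000, seed=0):
    rng = random.Random(seed)
    return sum(arc_containing_reference(rng) for _ in range(n_sims)) / n_sims


print("estimated E(L):        ", estimate_mean_arc())
print("naive guess, 2*pi / 3: ", TWO_PI / 3)
```

If the two printed numbers disagree, the symmetry argument must be treating the arc containing a fixed point as if it were a typical arc.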
5.60
Stat110 solution available.
As in Example 5.7.4, athletes compete one at a time at the high jump. Let be how high the th jumper jumped, with i.i.d. with a continuous distribution. We say that the th jumper is “best in recent memory” if they jump higher than the previous 2 jumpers (for ; the first 2 jumpers don’t qualify).
(a) Find the expected number of best in recent memory jumpers among the 3rd through th jumpers.
(b) Let be the event that the th jumper is the best in recent memory. Find , , and . Are and independent?
5.61
Tyrion, Cersei, and other guests arrive at a party at i.i.d. times drawn from a continuous distribution with support , and stay until the end (time 0 is the party’s start time and time 1 is the end time). The party will be boring at times when neither Tyrion nor Cersei is there, fun when exactly one of them is there, and awkward when both Tyrion and Cersei are there.
(a) On average, how many of the other guests will arrive at times when the party is fun?
(b) Jaime and Robert are two of the other guests. By computing both sides in the definition of independence, determine whether the event “Jaime arrives at a fun time” is independent of the event “Robert arrives at a fun time”.
(c) Give a clear intuitive explanation of whether the two events from (b) are independent, and whether they are conditionally independent given the arrival times of everyone else, i.e., everyone except Jaime and Robert.
5.62
Let $X_1, X_2, \dots$ be the annual rainfalls in Boston (measured in inches) in the years 2101, 2102, … , respectively. Assume that annual rainfalls are i.i.d. draws from a continuous distribution. A rainfall value is a record high if it is greater than those in all previous years (starting with 2101), and a record low if it is lower than those in all previous years.
(a) In the 22nd century (the years 2101 through 2200, inclusive), find the expected number of years that have either a record low or a record high rainfall.
(b) On average, in how many years in the 22nd century is there a record low followed in the next year by a record high? (Only the record low is required to be in the 22nd century, not the record high.)
(c) By definition, the year 2101 is a record high (and record low). Let $N$ be the number of years required to get a new record high. Find $P(N > n)$ for all positive integers $n$, and use this to find the PMF of $N$.
Hint: Note that $P(N = n) + P(N > n) = P(N > n - 1)$.
(d) With notation as above, show that $E(N)$ is infinite.
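Part (a) lends itself to a simulation check. The sketch below assumes 100 years of i.i.d. rainfalls; since only the relative ranks matter for records, $\mathrm{Unif}(0,1)$ draws stand in for the unspecified continuous distribution. The function names are hypothetical.

```python
import random


def count_record_years(rng, n_years=100):
    """Number of years that set a record high or a record low."""
    values = [rng.random() for _ in range(n_years)]
    count = 1                    # the first year is a record by definition
    high = low = values[0]
    for v in values[1:]:
        if v > high:             # new record high
            high = v
            count += 1
        elif v < low:            # new record low
            low = v
            count += 1
    return count


def estimate_expected_records(n_sims=20_000, seed=0):
    rng = random.Random(seed)
    return sum(count_record_years(rng) for _ in range(n_sims)) / n_sims


print("estimated expected number of record years:",
      estimate_expected_records())
```

The estimate can be compared with the value obtained from indicator r.v.s and linearity of expectation.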