Problems

1. To Begin or Not to Begin?

An urn contains k black balls and a single red ball. Peter and Paula draw balls from this urn without replacement, alternating after each draw until the red ball is drawn. The game is won by the player who happens to draw the single red ball. Peter is a gentleman and offers Paula the choice of whether she wants to start or not. Paula has a hunch that she might be better off if she starts; after all, she might succeed in the first draw. On the other hand, if her first draw yields a black ball, then Peter’s chances to draw the red ball in his first draw are increased, because then one black ball is already removed from the urn. How should Paula decide in order to maximize her probability of winning?

Let’s solve this by looking at small values of k. The urn holds k + 1 balls, and the red ball is equally likely to come up at any position in the drawing order; the player who starts gets the odd-numbered draws. If k is 0, going first wins with certainty, and going second never wins. If k is 1, going first and going second each win with probability 50%. If k is 2, the player who goes first takes 2 of the 3 draws and wins with probability 2/3, versus 1/3 for the second player, so going first is better. If k is 3, each player takes 2 of the 4 draws, so this is again 50/50.

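Thus, only the parity of k matters. If k is odd, both players take the same number of draws and each wins with probability 1/2, so the choice makes no difference. If k is even, the first player takes one draw more and wins with probability (k + 2)/(2(k + 1)), which exceeds 1/2. Since going first never hurts, Paula should always go first.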
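The parity argument can be checked with a short sketch. It assumes, as argued above, that the red ball is equally likely to sit at any position in the drawing order; the function name `p_first_wins` is illustrative only.

```python
from fractions import Fraction

def p_first_wins(k):
    """Probability that the player who draws first gets the red ball (k black, 1 red)."""
    total = k + 1                    # balls in the urn
    first_draws = (total + 1) // 2   # draws 1, 3, 5, ... belong to the first player
    return Fraction(first_draws, total)

# Small cases: 1/2 for odd k, more than 1/2 for even k.
for k in range(6):
    print(k, p_first_wins(k))
```

For every k the value is at least 1/2, so starting never hurts.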

2. A Tournament Problem

Ten players participate in the first round of a tennis tournament: 2 females and 8 males. Five single matches are fixed at random by successively drawing, without replacement, the names of all 10 players from an urn: the player drawn first plays against the one whose name comes up second, the third against the fourth, etc.

a. What is the probability that there will not be a single match involving two female players? Is this probability smaller, equal to, or larger than the corresponding probability with 20 females and 80 males?

b. Try to answer the general case in which there are 2n players, of whom 2 ≤ k ≤ n are female. What is the probability p(k, n) that among the n matches there will not be a single one involving two female players?

  • The two female players occupy 2 of the 10 positions in the drawing order, and all C(10, 2) = 45 pairs of positions are equally likely. Exactly 5 of these pairs form a match, namely positions (1, 2), (3, 4), ..., (9, 10), so the probability of a woman vs. woman match is 5/45 = 1/9.
  • Thus the chance is 40/45 or 8/9 for there to be no woman vs. woman game.
  • For the case with 20 females and 80 males, it drops to about 9%, so the probability is much smaller than with 2 females and 8 males.
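Part b is not worked out in these notes. A standard counting argument (not taken from the notes) gives p(k, n) = C(n, k) · 2^k / C(2n, k): choose the k matches that contain exactly one female, choose one of the two slots in each, and divide by the number of ways to place k females among 2n positions. The sketch below evaluates this formula; the name `p_no_female_match` is illustrative.

```python
from fractions import Fraction
from math import comb

def p_no_female_match(k, n):
    """P(no match pairs two females) with k females among 2n players."""
    # Favourable placements: pick k of the n matches, then 1 of 2 slots in each.
    favourable = comb(n, k) * 2**k
    # All placements of the k females among the 2n drawing positions.
    total = comb(2 * n, k)
    return Fraction(favourable, total)

print(p_no_female_match(2, 5))            # 2 females, 8 males
print(float(p_no_female_match(20, 50)))   # 20 females, 80 males
```

The first value reproduces the 8/9 found above, and the second is close to the 9% quoted for the larger tournament.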

3. Mean Waiting Time for 1 − 1 vs. 1 − 2

Peter and Paula play a simple game of dice, as follows. Peter keeps throwing the (unbiased) die until he obtains the sequence 1 − 1 in two successive throws. For Paula, the rules are similar, but she throws the die until she obtains the sequence 1 − 2 in two successive throws.

a. On average, will both have to throw the die the same number of times? If not, whose expected waiting time is shorter (no explicit calculations are required)?

b. Derive the actual expected waiting times for Peter and Paula.

a. The mean waiting time for Peter is longer than for Paula. Both first need a 1. If Peter's next throw then fails (anything other than a 1), he has to start over from scratch. If Paula's next throw fails, it may still be a 1, in which case she keeps her partial progress and only needs a 2 on the following throw.

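We can model this using a game tree for Peter:

  • With probability $p$ ($p = 1/6$), Peter rolls a 1.
    • With probability $p$, Peter rolls a 1 again and is done.
    • With probability $1 - p$, Peter does not roll a 1 and restarts.
  • With probability $1 - p$, Peter does not roll a 1 and restarts.

Thus, there are three outcomes:

The outcome that ends the game, 1-1, has probability $p^2$ (or $1/36$). The outcome that he rolls a 1 and then anything else has probability $p(1 - p)$ (or $5/36$). The outcome that he does not roll a 1 on the first throw and restarts immediately has probability $1 - p$ (or $5/6$).

The first two outcomes take 2 throws and the last one takes 1 throw; after a restart, the expected remaining waiting time is again $W$. Thus, the expected waiting time $W$ is the sum over the outcomes of the time taken times the probability that they occur:

$$W = 2p^2 + p(1 - p)(2 + W) + (1 - p)(1 + W).$$

Simplifying:

$$W = 1 + p + (1 - p^2)\,W \quad\Longrightarrow\quad W = \frac{1 + p}{p^2}.$$

When $p = 1/6$, $W = 42$.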

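For Paula’s case, we can model this as a Markov chain:

There are three states: the starting state $S_0$, a state $S_1$ for when the last throw was a 1, and the final state $S_2$, reached once 1-2 has been thrown.

Thus, we can find the expected waiting times $E_0$ and $E_1$ from $S_0$ and $S_1$ as follows:

$$E_0 = 1 + \tfrac{1}{6}E_1 + \tfrac{5}{6}E_0, \qquad E_1 = 1 + \tfrac{1}{6}E_1 + \tfrac{4}{6}E_0.$$

The first equation gives $E_0 = 6 + E_1$; substituting into the second gives $E_1 = 30$. This gives us $W = 36$, solving for $E_0$.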
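Both waiting times can be verified with exact arithmetic. The sketch below is only a check, not the derivation itself: it writes down the closed-form solutions of the two recursions (stated in the comments) and confirms that they satisfy them. The variable names `w_peter`, `e0`, and `e1` are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 6)  # probability of any particular face of a fair die

# Peter (1-1): W = 2p^2 + p(1-p)(2 + W) + (1-p)(1 + W)  =>  W = (1 + p) / p^2
w_peter = (1 + p) / p**2

# Paula (1-2): e0 = wait from the start, e1 = wait when the last throw was a 1.
#   e0 = 1 + p*e1 + (1 - p)*e0
#   e1 = 1 + p*e1 + (1 - 2p)*e0      (a 2 ends the game, with probability p)
e1 = (1 - p) / p**2
e0 = 1 / p + e1

# Confirm that the closed forms really solve the recursions.
assert w_peter == 2 * p**2 + p * (1 - p) * (2 + w_peter) + (1 - p) * (1 + w_peter)
assert e0 == 1 + p * e1 + (1 - p) * e0
assert e1 == 1 + p * e1 + (1 - 2 * p) * e0

print(w_peter, e0)
```

Peter's expected wait comes out six throws longer than Paula's, in line with the qualitative argument in part a.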

4. How to Divide up Gains in Interrupted Games

Peter and Paula play a game of chance that consists of several rounds. Each individual round is won, with equal probabilities of 1/2, by either Peter or Paula; the winner then receives one point. Successive rounds are independent. Each has staked $50 for a total of $100, and they agree that the game ends as soon as one of them has won a total of 5 points; this player then receives the $100. After they have completed four rounds, of which Peter has won three and Paula only one, a fire breaks out so that they cannot continue their game.

a. How should the $100 be divided between Peter and Paula?

b. How should the $100 be divided in the general case, when Peter needs to win a more rounds and Paula needs to win b more rounds?

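a. In the case where Peter has 3 wins and Paula has 1, Peter needs 2 more wins and Paula needs 4 more. The game is decided after at most 5 further rounds: if all 5 were played, Peter would win the game by taking at least 2 of them, whereas Paula would need to take at least 4.

We can enumerate all the cases, or we can just play 5 rounds and count the outcomes in which Peter wins 0 or 1 times out of the 5, to find Paula’s winning chance:

$$P(\text{Paula wins}) = \left[\binom{5}{0} + \binom{5}{1}\right]\left(\frac{1}{2}\right)^{5} = \frac{6}{32}.$$

This is 3/16, so Peter should receive 13/16 of the stake ($81.25), and Paula should receive 3/16 ($18.75).

b. In the general case, we can apply the same technique:

The maximum number of rounds that can still be played is $a + b - 1$.

We can apply the same formula in the general case, where we count the outcomes of these $a + b - 1$ rounds in which Peter wins fewer than $a$ times, $\sum_{j=0}^{a-1}\binom{a+b-1}{j}$, and multiply it by $(1/2)^{a+b-1}$. This is Paula's winning probability and hence her share of the stake; Peter receives the rest.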
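The general rule can be sketched as a small function. It assumes, as above, that all a + b − 1 remaining rounds are played out hypothetically and that each round is a fair coin flip; the name `paula_share` is illustrative.

```python
from fractions import Fraction
from math import comb

def paula_share(a, b):
    """Paula's winning probability when Peter needs a more wins and Paula b more."""
    n = a + b - 1  # rounds after which the game is certainly decided
    # Paula wins exactly when Peter takes fewer than a of these n rounds.
    peter_too_few = sum(comb(n, j) for j in range(a))
    return Fraction(peter_too_few, 2**n)

share = paula_share(2, 4)        # Peter leads 3:1 in a game to 5 points
print(share, float(100 * share)) # Paula's fraction and her amount in dollars
```

Swapping the arguments gives Peter's share, and the two shares always add up to 1.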

5. How Often Do Head and Tail Occur Equally Often?

According to many people’s intuition, when two events, such as head and tail in coin tossing, are equally likely then the probability that these events will occur equally often increases with the number of trials. This expectation reflects the intuitive notion that in the long run, asymmetries of the frequencies of head and tail will “balance out” and cancel. To find the basis of this intuition, consider that 2n fair and independent coins are thrown at a time.

a. What is the probability of an even split for head and tail when 2n = 20?

b. Consider the same question for and .

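a. To find the probability of an even n : n split, we count the sequences with exactly n heads and divide by the total number of equally likely sequences.

In the case of 20 coin flips, there are $2^{20} = 1{,}048{,}576$, or about 1 million, possible sequences of heads and tails (order matters here).

To count the number of sequences with exactly 10 heads and 10 tails:

$$\binom{20}{10} = \frac{20!}{10!\,10!} = 184{,}756.$$

Dividing this by $2^{20}$, we get 17.62%.

b. The distribution of the number of heads approaches the normal distribution as $n$ increases, with mean $n$ and variance $n/2$.

We can approximate this probability as the following, trying to create a rectangle of width 1 whose height is the normal density at the mean, so that its area matches the area below the curve:

$$P(\text{even split}) \approx 1 \cdot \frac{1}{\sqrt{2\pi \cdot n/2}} = \frac{1}{\sqrt{\pi n}}.$$

For $n = 10$ this gives about 17.8%, close to the exact 17.62%. The approximation also shows that, contrary to the intuition, the probability of an even split decreases as $n$ grows.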
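The exact probability and the normal approximation 1/√(πn) can be compared numerically. In the sketch below the values of n in the loop are illustrative choices, not necessarily those asked for in part b, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def exact_even_split(n):
    """Exact P(n heads and n tails) when 2n fair coins are thrown."""
    return Fraction(comb(2 * n, n), 4**n)

def approx_even_split(n):
    """Normal approximation: unit-width rectangle at the mode, 1 / sqrt(pi * n)."""
    return 1 / sqrt(pi * n)

# Illustrative sample sizes; the probability shrinks as n grows.
for n in (10, 100, 1000):
    print(2 * n, float(exact_even_split(n)), approx_even_split(n))
```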

6. Sample Size vs. Signal Strength

An urn contains six balls — three red and three blue. One of these balls — let us call it ball A — is selected at random and permanently removed from the urn without the color of this ball being shown to an observer. This observer may now draw successively — at random and with replacement — a number of individual balls (one at a time) from among the five remaining balls, so as to form a noisy impression about the ratio of red vs. blue balls that remained in the urn after A was removed. Peter may draw a ball six times, and each time the ball he draws turns out to be red. Paula may draw a ball 600 times; 303 times she draws a red ball, and 297 times a blue ball. Clearly, both will tend to predict that ball A was probably blue. Which of them — if either — has the stronger empirical evidence for his/her prediction?

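They both have the same amount of empirical evidence. The only thing that matters is the difference between the numbers of red and blue balls drawn, which is 6 for both.

To calculate the probability that ball A was blue for Peter, we can use Bayes' theorem. A priori, A is blue or red with probability 1/2 each. If A was blue, the urn holds 3 red and 2 blue balls, so each draw is red with probability 3/5; if A was red, each draw is red with probability 2/5:

$$P(A \text{ blue} \mid 6 \text{ red}) = \frac{(3/5)^6}{(3/5)^6 + (2/5)^6} = \frac{3^6}{3^6 + 2^6} = \frac{729}{793} \approx 0.919.$$

For Paula, we get the same result:

If A was blue:

$$P(303 \text{ red},\, 297 \text{ blue} \mid A \text{ blue}) \propto (3/5)^{303}\,(2/5)^{297}.$$

If A was red:

$$P(303 \text{ red},\, 297 \text{ blue} \mid A \text{ red}) \propto (2/5)^{303}\,(3/5)^{297}.$$

The chance of blue out of both blue and red:

$$P(A \text{ blue} \mid \text{data}) = \frac{3^{303}\,2^{297}}{3^{303}\,2^{297} + 2^{303}\,3^{297}} = \frac{3^6}{3^6 + 2^6} = \frac{729}{793},$$

which is the same as above.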
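The equality of the two posteriors can be confirmed with exact fractions. This sketch assumes a prior of 1/2 for each colour of ball A (it cancels in the ratio); the function name `posterior_a_blue` is illustrative.

```python
from fractions import Fraction

def posterior_a_blue(reds, blues):
    """P(ball A was blue | observed draws), with a 1/2 prior on A's colour."""
    # A blue -> urn has 3 red, 2 blue; A red -> urn has 2 red, 3 blue.
    like_blue = Fraction(3, 5)**reds * Fraction(2, 5)**blues
    like_red = Fraction(2, 5)**reds * Fraction(3, 5)**blues
    return like_blue / (like_blue + like_red)

peter = posterior_a_blue(6, 0)
paula = posterior_a_blue(303, 297)
print(peter, paula, peter == paula)
```

Only the difference between the red and blue counts enters the likelihood ratio, which is why both calls return the same fraction.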

7. Birthday Holidays

The following problem is described in Cacoullos (1989, pp. 35–36). A worker’s legal code specifies as a holiday any day during which at least one worker in a certain factory has a birthday. All other days are working days. How many workers (n) must the factory employ so that the expected number of working man-days is maximized during the year?

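Each of the $n$ workers can work up to 365 days a year. However, any day on which at least one worker has a birthday is a holiday for everyone. Assuming birthdays are independent and uniform over the 365 days, a given day is a working day with probability $(364/365)^n$, so the expected number of working man-days is $E(n) = 365\,n\,(364/365)^n$. Thus, we want to find the point at which adding one more worker no longer increases $E(n)$: the ratio $E(n+1)/E(n) = \frac{n+1}{n}\cdot\frac{364}{365}$ drops to 1 at $n = 364$, so $E(n)$ is maximal at $n = 364$ and $n = 365$, which tie.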
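A numerical check is sketched below. It assumes independent, uniformly distributed birthdays over 365 days (not stated in the problem) and uses the expected man-days n · 365 · (364/365)^n; the name `expected_man_days` and the search range are illustrative.

```python
from fractions import Fraction

def expected_man_days(n, days=365):
    """Expected working man-days with n workers: n * days * P(a day has no birthday)."""
    return n * days * Fraction(days - 1, days)**n

# max() returns the first maximiser, so a tie shows up as the smaller n.
best = max(range(1, 1001), key=expected_man_days)
print(best, float(expected_man_days(best)))
print(expected_man_days(best) == expected_man_days(best + 1))
```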

8. Random Areas

Peter and Paula both want to cut out a rectangular piece of paper. Because they are both probabilists they determine the exact form of the rectangle by using realizations of a positive rv, say , as follows. Peter is lazy and generates just a single realization of this rv; he then cuts out a square that has length and width equal to this value. Paula likes diversity and generates two independent realizations of . She then cuts out a rectangle with width equal to the first realization and length equal to the second realization.

a. Will the areas cut out by Peter and Paula differ in expectation?

b. If they do, is Peter’s or Paula’s rectangle expected to be larger?

9. Maximize Your Gain

A nonnegative rv has DF and density ; its mean and variance are both finite. A game is offered, as follows: you may choose a nonnegative number ; if then you win the amount , otherwise you win nothing.

As an example, suppose is the height (measured in cm) of the next person entering a specific public train station. If you choose then you will almost surely win that amount. A value of would double your amount if you win, but of course drastically reduce your winning probability.

a. Find an equation to characterize the value of that maximizes the expected gain.

b. Give a characterization of the optimal value of in terms of the hazard function of (see page 2 for the definition of the hazard function).

c. Derive explicitly for an exponential rv with rate (see page 1 for a definition). How large is the maximum expected gain?

10. Maximize Your Gain When Losses Are Possible

Under the same assumptions as in Problem 9, the rule of the game is changed, as follows: if an amount of is won, but otherwise the amount is lost.

a. Characterize the value of that maximizes the expected gain, .

b. Can the game be unfavorable, i.e., can even the maximum expected gain become negative?

c. How large is the maximum expected gain for an exponential rv with rate ? Compare this to the solution of the previous problem; explain the difference.

11. The Optimal Level of Supply

A man offers milk to the spectators of the weekly baseball matches. Before each match, he orders units of milk at a price of per unit; during the match he sells each unit for a price of . Because the milk cannot be kept fresh for a week, each unsold unit imposes a loss of . The actual demand of milk varies from week to week, according to a strictly increasing DF with density . Assume that and that is a continuous rv with finite mean.

a. For given prices , , and a given demand structure , the milkman can choose among different values of . Consider as a continuous variable, and denote as the expected net gain as a function of . Discuss the shape and qualitative properties of .

b. What is the optimal amount of milk, , the man should stock in order to maximize his expected net gain?

12. Mixing RVs vs. Mixing Their Distributions

The concept of a “mixture distribution” is used in probability and its applications in at least two different ways that have quite different meanings.

Let and be two independent normal rvs, with means and and standard deviations of . Consider the rv defined by

The idea here is that in each realization both a value of and a value of are generated, and half of each is added to produce the value of . Thus, in a fairly direct sense each individual realization of contains one part of and one part of . In this sense, is a mixture of and , much like one mixes half a pound of butter and half a pound of flour.

Next, consider an rv that comes, in each realization, with probability from a normal distribution (namely, that of ) with mean and standard deviation and with probability from a normal distribution (namely, that of ) with mean and standard deviation . Thus, its density is equal to

where is the normal density with mean and standard deviation . The idea here is that in each realization either a value of or a value of is generated (but not of both) with equal probability, and that this value then determines that of . However, across many realizations, the density of will still represent a mixture of and .

a. Sketch the densities of and . Is normally distributed? Is ?

b. Derive the means and variances of and . Compare and explain.

c. Let be two arbitrary but independent rvs with densities ; means ; and standard deviations . Let be a proportion mixing either the rvs themselves:

or their densities

Derive for this more general case the means and variances of and .

13. Throwing the Same vs. Different Dice

The standard binomial sampling scheme assumes independent trials with a constant probability of success. Suppose, for example, that we are given a single fair die, and that the success event consists in throwing, say, a . Thus, each single throw results in a success with probability . Within independent trials, we would thus expect about successes, and the variance of the number of successes would be equal to .

a. Suppose we are given two biased dice. Die A will show a with probability , and die B shows a with probability . With these two dice 144 trials are conducted, as follows. For the first 72 trials we use die A, and during the last 72 trials we use die B. Given that the average success probability across both dice is equal to , we would again expect 24 successes. Is the variance of the number of successes larger than, equal to, or smaller than with the standard binomial sampling scheme?

b. Again we are given the two dice A and B described in a. This time, however, we select at random one die and then throw this selected die 144 times, observing again the number of throws yielding a . Is the variance of the number of successes larger than, equal to, or smaller than with the standard binomial sampling scheme?

c. Once again we are given the two dice A and B. This time, however, in each of 144 trials we start by choosing at random one of the two dice, then throw it and observe whether or not the trial yields a . Is the variance of the number of successes larger than, equal to, or smaller than with the standard binomial sampling scheme?

14. Random Ranks

Peter draws 100 independent realizations of a continuous rv and ranks them in increasing order from 1 to 100. Subsequently, Paula draws a single value from the same population and inserts this value into the rank order created earlier by Peter. For example, if her value is such that 50 of Peter’s draws are smaller and 50 are larger, then the rank associated with her draw would be 51 - that is, overall, her value would be the 51st in increasing order. Or, if her value is smaller than all 100 of Peter’s, then the rank 1 would be associated with it.

a. Is it more likely that Paula’s value will occupy rank 51 than rank 1?

b. Derive for general the probability that Paula’s value will occupy rank , where .

15. Ups and Downs

Ups and downs occur in probability just as in real life. An elementary probability version with the real-life property that ups are more often followed by downs is as follows.

Let be three independent and identically distributed continuous rvs that are realized sequentially: first , then , and finally . Let us say that an increment occurs with if , and a decrement otherwise. Similarly, an increment occurs with if , and a decrement otherwise.

a. Suppose you are told only that has led to an increment, but not the actual value of . Argue that conditional on this information is twice as likely to yield a decrement than an increment, even though and are independent.

b. Random variables are said to be exchangeable if their joint density is the same for any permutation of its arguments. Thus, if are exchangeable, then their density, say , is the same for any permutation of . Note that exchangeable rvs may still be dependent. Does the property described in a. still hold if are exchangeable?

c. Suppose that the three rvs have the same marginal distribution with mean and variance , and let them have the common pairwise correlation . Show that the correlation of the rvs and is generally equal to .
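For part a, a small enumeration can make the 2 : 1 ratio concrete. The sketch assumes that for three i.i.d. continuous rvs all six rank orderings are equally likely (ties have probability zero); the tuple positions stand for the first, second, and third realization, and all names are illustrative.

```python
from itertools import permutations

# Each tuple gives the ranks of the (first, second, third) realization.
all_orders = list(permutations((1, 2, 3)))

# Condition on an increment at the second realization: second > first.
orders = [r for r in all_orders if r[1] > r[0]]

# Among those, count decrements at the third realization: third < second.
decrements = [r for r in orders if r[2] < r[1]]

print(len(decrements), "of", len(orders), "orderings yield a decrement")
```

Because the argument uses only the equal likelihood of the orderings, it does not depend on the particular continuous distribution.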

16. Is 2X the Same as X1 + X2?

Let be a continuous rv with density , and let be two independent rvs, both distributed as is . It is then not usually the case that the rv is distributed as is . However, the Cauchy density whose standardized form is given by

possesses this property: has the same distribution as the rv . It is illustrated and compared to the standard normal distribution in Figure 1.1.

a. Based on the property described earlier, argue without any explicit calculation that the variance of the Cauchy distribution is necessarily infinite.

b. Give an inductive argument for the rather unintuitive feature that for the Cauchy distribution the arithmetic mean from a sample of independent realizations of has exactly the same distribution as each contributing summand itself.

17. How Many Donors Needed?

To organize a charity event that costs $100, an organization raises funds. Independent of each other, one donor after another gives some amount of money (considered as a continuous quantity here) that is exponentially distributed (for definition, see page 1), with a mean of $20. The process is stopped as soon as $100 or more has been collected. Find the distribution, mean, and variance of the number of successive donors needed until at least $100 has been collected.
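No solution is given in these notes; the following is only a hedged sketch of one possible approach. Exponential donations can be read as the inter-arrival gaps of a Poisson process, so the number of donors needed would be one plus a Poisson count with mean 100/20 = 5. The function name `donors_pmf` and the truncation point of the sums are assumptions made for illustration.

```python
from math import exp, factorial

def donors_pmf(n, target=100.0, mean_gift=20.0):
    """P(N = n), n = 1, 2, ..., assuming N - 1 is Poisson with mean target / mean_gift."""
    lam = target / mean_gift
    return exp(-lam) * lam**(n - 1) / factorial(n - 1)

# Truncate the infinite sums where the remaining tail is negligible.
ns = range(1, 80)
mean = sum(n * donors_pmf(n) for n in ns)
var = sum((n - mean)**2 * donors_pmf(n) for n in ns)
print(round(mean, 6), round(var, 6))
```

Under this reading, the mean is one more than the Poisson mean while the variance equals the Poisson mean, since adding the constant 1 shifts the distribution without spreading it.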

18. Large Gaps

Denote as a Poisson process with rate . Let be the waiting time until for the first time no event in has occurred for the last time units. Derive . For example, might stand for the sequence of cars passing at a crosswalk. A pedestrian needs seconds to cross the road. How long does it on average take before he has reached the other side of the road?

19. Small Gaps

Related to the preceding problem, let be the waiting time until two events occur within time units. Derive .

In some applications, the event that has latency is called a “coincidence.” For example, a volume of biological tissue could be permanently destroyed when two damaging particles are absorbed within (or less) time units. The idea here is that following the first absorption the tissue needs to recover for time units; this opens a window of vulnerability during which a further (second) particle has a lethal effect.

20. Random Powers of Random Variables

Let be independent Poisson rvs (for definition, see page 1), with parameters (means) , respectively. Define a rv .

a. Find the expectation .

b. Does generally increase as a function of and ? Explain.

21. How Many Bugs Are Left?

Peter and Paula are given copies of the same text for independent proofreading. Peter finds 20 errors, and Paula finds 15 errors, of which 10 were found by Peter as well. Estimate the number of errors remaining in the text that have not been detected by either Peter or Paula.
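No solution is given in these notes. One common way to approach such a question is the capture-recapture (Lincoln-Petersen) estimate, which assumes that both readers detect errors independently and that every error is equally easy to find; these assumptions are not stated in the problem. The function name `estimate_remaining` is illustrative.

```python
from fractions import Fraction

def estimate_remaining(found_a, found_b, found_both):
    """Capture-recapture estimate of errors missed by both proofreaders."""
    # If the readers are independent, found_both / found_b estimates reader A's
    # detection rate, so the total is about found_a * found_b / found_both.
    total_est = Fraction(found_a * found_b, found_both)
    distinct_found = found_a + found_b - found_both
    return total_est - distinct_found

print(estimate_remaining(20, 15, 10))
```

With Peter's 20 errors, Paula's 15, and 10 in common, the estimated total is 30, of which 25 distinct errors have been found.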

22. ML Estimation with the Geometric Distribution

Let the discrete rv have a geometric distribution (see page 1),

and let be independent realizations from it.

a. Find the maximum likelihood (ML) estimate (for definition, see page 2), , of . Give a simple argument why overestimates .

b. Determine the asymptotic standard error, , of this estimate as a function of . Consider the asymptotic standard error as and as . For which value of is maximal?

c. Explain the apparent paradox that is, on the one hand, the reciprocal of a sum of rvs (which in turn will tend to normality), and on the other hand that it is itself asymptotically normally distributed.

23. How Many Twins Are Homozygotic?

Twin pairs are either homozygotic (with probability ) or dizygotic (probability ). In the first case, their common sex is genetically determined only once, for both twins together, so that they must necessarily be of the same sex. In the second case, the twins’ sexes are determined independently and therefore could potentially be different. Suppose that generally in any genetic sex determination in a given population the outcome will be “male” () with probability and “female” () with probability .

Obviously, the sex-related classification of any given twin pair (i.e., vs. vs. ) is readily apparent, whereas the determination of their homo- vs. dizygotic status requires elaborate genetic diagnostics. Specifically, if the two twins are of opposite sexes (i.e., the pair is ) then they must necessarily be dizygotic, but if they are not (i.e., an or pair) then they may be either homo- or dizygotic.

a. L. v. Bortkiewicz (1920; for a summary, see von Mises, 1931, p. 407) has collected the sex-related classification for a total of twin pairs born in Berlin from 1879 to 1911. The frequencies were , , and . From these data, how could one estimate the latent probability of a twin pair to be homozygotic?

b. Is the estimate of significantly different from ?

24. The Lady Tasting Tea

In 1935, R. A. Fisher presented the famous “lady tasting tea” problem. A lady claims she can tell whether in a tea-plus-milk infusion the tea (type cup) or the milk (type cup) was poured in first. To test this claim, the lady is presented with cups of type and cups of type ; her task is to tell these two sets, which differ in nothing other than the order in which tea and milk were poured in, apart. More specifically, after she has been informed that there is an equal number of and cups, she first tastes the content of each of the cups and then singles out those of them that she thinks are, say, of type ; this implies that she considers the remaining cups to be of type .

As a simple model of this situation, suppose the lady can identify the true state of any given cup with probability , ; with probability , she can only guess. Of course, the skeptic’s claim would be that . Judgments about successive cups are considered to be independent.

a. Define a “hit” to occur when a cup is correctly classified. Explain why the number of hits is necessarily the same for both the set of cups the lady designates as type and the remaining set of cups she considers to be of type . Therefore, if we know the number of hits in the first set, the number of hits in the second set is redundant.

b. What is the probability that among the cups she selects as being of type , there will be exactly , , hits, i.e., cups that are in fact of type ?

c. Suppose the lady’s ability to tell and apart corresponds to in the preceding model. Thus, while not perfect, she clearly does much better than guessing. Using the largest possible significance level below the conventional , how large is her probability to demonstrate her ability in an experiment involving , and cups of each type? A computer is required to do these calculations.

25. How to Aggregate Significance Levels

Suppose a one-sided statistical test is based on a statistic that under the has the distribution function , the rejection region being formed by large values of . The test is applied to a first set of data and the statistic is observed, with an associated observed p-value of . On a second independent set of data the same test is applied, and this time a statistic is observed, with an associated observed p-value of .

a. Argue that the observed p-value is a rv that is uniformly distributed on if is true.

b. From a., derive a way to aggregate into a single overall p-value.

c. With the procedure from b., describe the set of that is judged overall-significant at some given theoretical significance level , such as .

26. Approximately How Tall Is the Tallest?

Let be a continuous rv with strictly increasing DF , and let be the largest of independent realizations of .

a. Determine the probability of the event , where is the inverse function of .

b. From a., derive an approximation, valid for medium and large , for the DF, say , of .

c. Show that if is exponentially distributed, then tends to the double exponential distribution.

d. Suppose the height (measured in cm) of males in a country is normally distributed, with mean cm and standard deviation cm. Sketch the approximate DF of the height of the tallest male, if there are (i) 4 million males (as in a small country) or (ii) 120 million males (as in a large country). Compare these graphs to the corresponding figures for females, assuming that their height is normally distributed, with mean cm and standard deviation cm.

27. The Range in Samples of Exponential RVs

Let be an exponential rv with density ; the parameter is called the rate of the density (cf. page 1). It is not difficult to show that . Exponential rvs have two interesting properties.

(i) The minimum of independent exponential rvs, which may have different rates, is again exponentially distributed, with a rate equal to the sum of the individual rates. For example, the minimum of two independent exponential rvs with rates and is again an exponential rv, with rate .

(ii) If an exponential rv is larger than some other positive independent rv, let us call it , then the excess is again exponentially distributed with rate . This characteristic feature holds also, as a special case, when is a constant, say . It is sometimes called the “lack-of-memory” property: given that exceeds , then its additional lifetime, from there on, has exactly the same distribution as the original lifetime, just as if the process had “no memory” for the already-elapsed period of time.

We draw a sample of i.i.d. realizations of (for the meaning of i.i.d., see page 1). Let the rv , , be the th smallest value in this sample. The positive rv is the largest minus the smallest value; it is called the range of the sample. Of course, the value of will vary from sample to sample, according to some distribution that depends on and, of course, on .

a. Use the two properties described earlier heuristically to derive an expression for the expectation of .

b. Similarly, reason heuristically to find the distribution function of without explicit calculations.

28. The Median in Samples of Exponential RVs

As in Problem 27, let be an exponential rv with density . We draw a sample of size ( a positive integer) i.i.d. realizations of , and estimate the median of by the th order statistic of this sample, i.e., the midmost element of the sample that is both larger and smaller than other realizations. Call this median estimate .

a. Derive the density of .

b. Find an expression for the expectation of , and show that for small it is severely biased.

29. Breaking the Record

Consider a process in which, in a chronological (sequential) order, the i.i.d. realizations of an rv with parent density and DF are generated, yielding the sequence . From time to time, it will then be the case that a realization occurs that is larger than the largest value that has been seen so far. In accordance with the everyday meaning of this term, such a realization is called a record. By this definition, is necessarily a record. Also, the second record is the first realization of that is larger than was. In general, the th record is the first realization of that is larger than the th record. Denote the density of the th record as .

a. Consider a sequence of realizations of . How many records could there be minimally and maximally? Give a simple recursive argument, relating the cases and , to find how many records one would on average expect to see in realizations. Explain why the number of records is independent of the parent distribution .

b. Derive (in terms of ) the density of the second record, .

c. Give an inductive argument to establish the general result that the density

d. Explain that in the exponential case of the result derived in c. could be anticipated without explicit calculations from the lack-of-memory property described in the statement of Problem 27.

30. Paradoxical Contribution

Two large predatory birds, A and B, feed on the same habitat. Bird A’s daily prey (in grams) is normally distributed with mean g and ; for bird B, g and . Thus, on average, bird A is a more successful predator; also, its amount of daily prey is less variable than that of bird B. Clearly, on average, the daily overall prey of both birds together equals 100 g. However, as both birds feed on the same habitat, the correlation, , of their daily prey is negative across days, say : any given animal in the habitat that has fallen prey to A is no longer available to B, and vice versa. Let us assume that the situation is adequately described by a bivariate normal distribution.

On an unusually successful day, A and B together manage to reap 175 g.

a. Try to estimate from the information given how much, in expectation, bird A’s share was on days in which A and B together have reaped 175 g.

b. Try to answer the question in a. if the birds feed on different habitats so that .

31. Attracting Mediocrity

Peter has an IQ of 90 whereas the IQ of Paula is 110. However, due to unsystematic biological or psychological day-to-day variation that is unrelated to the IQ per se, any single measurement of either IQ is distorted by an independent additive measurement error that has a zero-mean normal distribution with variance . For example, if Paula’s IQ were measured repeatedly, the outcomes would be normally distributed with a mean of 110 (her “true” IQ) and a standard deviation of .

a. Suppose that either Peter or Paula is selected at random (), and his/her IQ is measured. You do not know who was selected, but you are told that the result of this first measurement is 105. Now the same person, whose identity is unknown to you, is measured a second time. What is your prediction for the outcome of this second measurement if ?

b. Answer the same question if .

c. Suppose that instead of just contemplating Peter and Paula we now deal with a large population of individuals whose true IQs (an rv that we may call ) is normally distributed with mean and variance . As before, each individual measurement is distorted by an independent and additive normally distributed error that has zero-mean and variance . A single person is drawn at random from this large population, his/her IQ is measured, and the outcome is some above-average value . Show that when the IQ of that same person is measured a second time, the expected outcome of the second (“repeated”) measurement is larger than but smaller than .

32. Discrete Variables with Continuous Error

Sometimes an underlying latent (i.e., not directly observable) rv N of theoretical or practical interest is generically integer-valued, such as a binomial, Poisson, or geometric rv. However, measurements of it are distorted by a random error that is continuous. For example, a random integer number of coins of one type (e.g., dimes) is inserted into a vending machine with a control balance that registers the total weight, say in units of the nominal weight of a single dime. In this example, there will be an integer number of coins, but due to minting imperfections, wear-off, and soiling, each coin will differ somewhat from its ideal nominal weight. Summing the individual deviations from the nominal weight across all coins we get the overall measurement error E.

If we assume that the measurement error E (thought to be independent of N) has a normal distribution with zero mean and standard deviation σ, then the actually observed measurements are in fact realizations of the rv X = N + E. Let f be the density of X.

a. Characterize in words or with a figure the changes of the density f as σ varies from a near-zero value to larger values. When, approximately, will f change from multi- to unimodality? To fix ideas, consider the example of a geometric distribution for N.

b. Argue that f can also be interpreted as a discrete mixture of normal densities with means n (the possible values of N) and common standard deviation σ, the mixture weights being given by P(N = n).
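The mixture view in part b suggests a direct numerical exploration of part a. The following sketch assumes a geometric latent variable on 1, 2, … with success parameter 0.5. That parameterization is an assumption, since the source does not preserve the one intended. The function names are illustrative. The code evaluates the mixture density on a grid and counts its local maxima for a given error standard deviation.

```python
import math


def mixture_density(x, sigma, p=0.5, n_max=60):
    """Density of latent + error: geometric weights times normal densities.

    Assumes P(latent = n) = p * (1 - p)**(n - 1) for n = 1, 2, ...,
    truncated at n_max (the neglected tail mass is negligible here).
    """
    total = 0.0
    for n in range(1, n_max + 1):
        weight = p * (1 - p) ** (n - 1)
        z = (x - n) / sigma
        total += weight * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    return total


def count_modes(sigma, lo=-2.0, hi=12.0, steps=2800):
    """Count strict local maxima of the density on an equally spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_density(x, sigma) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


if __name__ == "__main__":
    for sigma in (0.1, 0.3, 0.5, 1.0, 2.0):
        print(sigma, count_modes(sigma))
```

Scanning a finer range of standard deviations with `count_modes` gives a rough idea of where the density stops being multimodal under the assumed weights.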

33. The High-Resolution and the Black-White View

Ideally, many properties and processes of the real world show extremely fine gradations and variations. However, in order to measure and to process this information, we are often forced to simplify and reduce it. For example, many measurement devices digitize quantities (such as force, voltage, or time) that in theory are conceptualized as continuous variables. An extreme form of data compression is thresholding, which reduces a continuous input into a simple binary output. Two important questions then are: how should this threshold be set, and how much information do we lose along the way? The following problem deals with these two questions in a simple but exemplary context.

Consider a sample of n independent realizations of an exponential rv with density f(x) = (1/μ) exp(−x/μ), x ≥ 0. Researchers A and B both seek to estimate the unknown mean, μ. A observes the original raw data and can make use of the full information to estimate μ. B on the other hand obtains the data only after they have passed a digital filter with a threshold-type mechanism, such that each realization is classified as 0 if it is ≤ c, and classified as 1 if it is > c. Suppose k, 0 < k < n, of the n realizations turned out to be > c.

a. Derive the ML estimate of μ and its asymptotic standard error (a.s.e.) from the raw data as seen by researcher A.

b. For a given threshold value c, derive the ML estimate of μ from the digitized data as seen by researcher B. Does this estimate correspond to your intuition?

c. Find the a.s.e. of this estimate from the data as seen by researcher B.

d. For a given value of μ, what is the “best” threshold value of c, i.e., the one that minimizes the a.s.e. determined in part c.? How large, relative to the a.s.e. of the estimate in a., is this a.s.e.?
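Part d can be explored numerically. The sketch below is not taken from the source. It assumes the exponential density is parameterized by its mean, so that each digitized observation is a Bernoulli trial with probability exp(−c/mean) of exceeding the threshold c. Under that assumption, the Fisher information gives an asymptotic standard error of mean · sqrt(exp(t) − 1) / (t · sqrt(n)) with t = c/mean for the thresholded data, against mean / sqrt(n) for the raw data. Function names are illustrative.

```python
import math


def ase_raw(mean, n):
    """Asymptotic standard error of the ML estimate from the raw data."""
    return mean / math.sqrt(n)


def ase_thresholded(mean, n, c):
    """Asymptotic standard error from the 0/1 data (Fisher information)."""
    t = c / mean
    return mean * math.sqrt(math.exp(t) - 1.0) / (t * math.sqrt(n))


def best_threshold_ratio(lo=0.1, hi=5.0, steps=49_000):
    """Grid search for the ratio t = c/mean that minimizes the a.s.e."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return min(grid, key=lambda t: math.sqrt(math.exp(t) - 1.0) / t)


if __name__ == "__main__":
    t_best = best_threshold_ratio()
    # The relative loss does not depend on the mean or on n.
    print(t_best, ase_thresholded(1.0, 100, t_best) / ase_raw(1.0, 100))
```

The grid search is a deliberately simple choice. The minimizing ratio could also be found by solving the first-order condition t = 2 · (1 − exp(−t)) with a root finder.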

34. The Bivariate Lognormal

The standard parametric model for simple correlation and regression contexts is the bivariate normal distribution. It is quite interesting to see how important characteristics regarding, e.g., the correlation coefficient change under a different bivariate distribution model, such as the lognormal distribution.

Let X be a normal rv with mean μ and standard deviation σ. Then the rv exp(X) has a univariate lognormal distribution. A basic result about the rv exp(X) is that its expectation is equal to

E[exp(X)] = exp(μ + σ²/2).

Similarly, let the rvs (X₁, X₂) have the bivariate normal distribution with parameters (μ₁, σ₁, μ₂, σ₂, ρ), where μᵢ, σᵢ are the mean and standard deviation of Xᵢ, i = 1, 2, and ρ is the correlation of X₁ and X₂. Then the pair (exp(X₁), exp(X₂)) has a bivariate lognormal distribution. Note that exp(X₁), exp(X₂) are a pair of nonnegative rvs.

a. Based on the result given earlier concerning the expectation of exp(X), derive without new explicit calculations the variance of exp(X).

b. Assume that X₁, X₂ are uncorrelated, ρ = 0. In this case, are the rvs exp(X₁), exp(X₂) uncorrelated, too?

c. Explain why for σ₁ = σ₂ the rvs exp(X₁), exp(X₂) are perfectly correlated, too, if ρ = 1. Why does this result not extend to ρ = −1 as well, even though in the bivariate normal model the cases ρ = 1 and ρ = −1 are perfectly symmetric?

d. Using the results from a. and b., show that the correlation of exp(X₁) and exp(X₂) equals

[exp(ρσ₁σ₂) − 1] / √{[exp(σ₁²) − 1] · [exp(σ₂²) − 1]}.
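The behavior asked about in parts b and c can be checked numerically from the standard closed form for the correlation of two jointly lognormal rvs, (exp(ρ·s1·s2) − 1) / sqrt((exp(s1²) − 1)(exp(s2²) − 1)). Here ρ is the correlation and s1, s2 are the standard deviations of the underlying normal rvs. The sketch below evaluates it. The function and argument names are illustrative.

```python
import math


def lognormal_corr(rho, sd1, sd2):
    """Correlation of exp(X1) and exp(X2) for bivariate normal (X1, X2)."""
    numerator = math.exp(rho * sd1 * sd2) - 1.0
    denominator = math.sqrt((math.exp(sd1**2) - 1.0) * (math.exp(sd2**2) - 1.0))
    return numerator / denominator


if __name__ == "__main__":
    for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(rho, lognormal_corr(rho, 1.0, 1.0))
```

With equal unit standard deviations the value at ρ = 1 is 1, while at ρ = −1 it is −1/e, which illustrates the asymmetry raised in part c.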

35. The arcsin(sqrt(p)) Transform

Let the rv K be the number of successes in n independent trials, each with success probability p. Clearly, K is binomially distributed with parameters n and p, and p̂ = K/n is the usual estimate of p. It is unbiased, that is, E[p̂] = p, and its variance is p(1 − p)/n, which varies as a quadratic function of p. However, many applications, especially in analysis of variance and regression contexts, require variables that may differ in mean but not in variance across conditions. To circumvent this problem, it is customary to analyze the strictly increasing transformation arcsin(√p̂) rather than p̂ itself.

a. Let g(p) be a differentiable function of p. Argue heuristically that if n is large, then

g(p̂) ≈ g(p) + ε · g′(p),

where ε = p̂ − p is the sampling error of the estimate p̂. From this representation, derive the approximate mean and variance of the rv g(p̂).

b. At first glance, the choice of the transform g(p) = arcsin(√p) seems rather exotic. Give a principled rationale for this particular choice.
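To see what the transform achieves, one can compute the exact variance of arcsin(sqrt(K/n)) for a binomial count K and compare it across success probabilities. The sketch below does this by summing over the binomial distribution. The function names and the chosen values of n and p are illustrative assumptions. The reference value 1/(4n) is the first-order (delta-method) approximation.

```python
import math


def binomial_pmf(n, k, p):
    """P(K = k) for a binomial count with parameters n and p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def variance_of_transform(n, p):
    """Exact variance of arcsin(sqrt(K/n)) under the binomial model."""
    mean = second_moment = 0.0
    for k in range(n + 1):
        prob = binomial_pmf(n, k, p)
        y = math.asin(math.sqrt(k / n))
        mean += prob * y
        second_moment += prob * y * y
    return second_moment - mean * mean


if __name__ == "__main__":
    n = 200
    for p in (0.2, 0.5, 0.8):
        raw = p * (1 - p) / n                  # variance of K/n
        print(p, raw, variance_of_transform(n, p), 1 / (4 * n))
```

The raw variances differ noticeably between p = 0.2 and p = 0.5, whereas the transformed variances stay close to the same constant.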

36. Binomial Trials Depending on a Latent Variable

In many situations, the outcome of binomial trials is modeled through an underlying latent, that is, not directly observable, rv X with strictly increasing DF F, such that each of the n binomial trials yields a success if and only if X ≤ c. For example, in a population of individuals the susceptibility to flu may follow a particular population distribution. A flu is successfully avoided by an individual if his or her (standardized) susceptibility does not exceed some critical, but unknown, flu threshold c. From observations of the relative flu frequency in a random sample taken from this population, one would then like to estimate the threshold parameter, c.

Suppose n binomial trials yield k successes, leading to the usual estimate p̂ = k/n for the true success probability p = F(c). Thus, on equating p̂ = F(ĉ), the natural estimate, ĉ, of the model parameter c is ĉ = F⁻¹(p̂), where F⁻¹ is the inverse function of F.

Of course, from sample to sample our estimate ĉ will vary, just as the relative success frequency, p̂, of which it is a function. Determine the approximate standard error (i.e., the standard deviation) of the estimate ĉ based on a sample of n trials if F is

a. the standard logistic DF, F(x) = 1/(1 + exp(−x)),

b. the standard normal DF.
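Both cases follow one pattern under a first-order (delta-method) argument: the standard error of the threshold estimate is approximately sqrt(p(1 − p)/n) divided by the density of the latent variable at the threshold. The sketch below encodes that pattern as an assumption about the intended solution route, not as the source's own derivation. The function names are illustrative.

```python
import math
from statistics import NormalDist


def se_threshold(p, n, density_at_threshold):
    """Delta-method standard error of the inverse-DF estimate."""
    return math.sqrt(p * (1 - p) / n) / density_at_threshold


def se_logistic(p, n):
    """Standard logistic DF: the density at the threshold equals p * (1 - p)."""
    return se_threshold(p, n, p * (1 - p))


def se_normal(p, n):
    """Standard normal DF: the density is evaluated at the p-quantile."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(p)
    return se_threshold(p, n, std_normal.pdf(threshold))


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, se_logistic(p, 100), se_normal(p, 100))
```

The two models place the threshold on different scales, so the standard errors are comparable within a model across values of p, not directly across models.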

37. The Delta Technique with One Variable

Let X be an rv with expectation μ and variance σ², and let g be a given, known function of X. In general then

E[g(X)] ≠ g(μ).

a. Find a series-based approximation to E[g(X)] when σ² is small so that X is concentrated in the neighborhood of μ.

b. Apply this result to the case of g(X) = exp(X) when X is a normal rv with mean μ and variance σ². Compare this approximation to the exact result for this case, E[exp(X)] = exp(μ + σ²/2), as given in Problem 34. When will the approximation be acceptable?

c. A machine is constructed to throw a ball vertically with an initial velocity of 120 [m/s]. However, due to imperfections of the machine, the actual angle varies from throw to throw according to a normal distribution with a mean of and a standard deviation of . What is, approximately, the expected maximum height that the ball will reach, neglecting the air resistance?
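For the comparison requested in part b, a short numerical table is often enough. The sketch below assumes the usual second-order approximation, g(mean) + g″(mean) · variance / 2, which for the exponential function becomes exp(mean) · (1 + variance/2). It sets this against the exact lognormal mean. The function names and parameter values are illustrative.

```python
import math


def delta_mean_exp(mean, sd):
    """Second-order series approximation to E[exp(X)]."""
    return math.exp(mean) * (1.0 + sd**2 / 2.0)


def exact_mean_exp(mean, sd):
    """Exact E[exp(X)] for a normal X (lognormal mean)."""
    return math.exp(mean + sd**2 / 2.0)


def relative_error(mean, sd):
    """Relative shortfall of the approximation against the exact value."""
    exact = exact_mean_exp(mean, sd)
    return (exact - delta_mean_exp(mean, sd)) / exact


if __name__ == "__main__":
    for sd in (0.1, 0.5, 1.0, 2.0):
        print(sd, relative_error(0.0, sd))
```

The relative error does not depend on the mean, only on the standard deviation, so a single column of values covers all cases.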

38. The Delta Technique with Two Variables

Problem 37 can be generalized to functions of more than one rv. Thus, let X₁, X₂ be two rvs with expectations μ₁, μ₂, variances σ₁², σ₂², and correlation ρ. Also, let g(X₁, X₂) be a given function of these rvs.

a. Find a series-based approximation to E[g(X₁, X₂)] that holds when X₁ and X₂ are concentrated in a region around the point (μ₁, μ₂).

b. Apply this result to the case of the ratio of two independent rvs, i.e., g(X₁, X₂) = X₁/X₂ and ρ = 0.

c. Explain how the solution in part b. may also be obtained from the solution of Problem 37, related to functions of a single rv.

d. Consider the special case of an rv defined as

F = (U/m) / (V/n),

where U, V are independent χ²-rvs, with m, n degrees of freedom, respectively. Compare for this case the approximation to the exact result, namely the mean of the F-distribution, which is n/(n − 2).

e. Suppose that in question c. of Problem 37 in addition to, and independent of, the variation of the angle , the initial velocity of the ball varies according to a normal distribution with [m/s] and [m/s]. Under these conditions, what is, approximately, the expected maximum height the ball will reach?
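The comparison in part d can be tabulated directly. The sketch below assumes the second-order approximation for the mean of a ratio of independent rvs, (mean1/mean2) · (1 + var2/mean2²), and applies it to numerator and denominator chi-square variables divided by their degrees of freedom, so that both have mean 1 and the denominator has variance 2/n. The function names are illustrative, and n denotes the denominator degrees of freedom.

```python
def ratio_mean_approx(mean1, mean2, var2):
    """Series approximation to E[X1 / X2] for independent X1, X2."""
    return mean1 / mean2 * (1.0 + var2 / mean2**2)


def f_mean_approx(n):
    """Approximate mean of the F ratio with n denominator degrees of freedom."""
    return ratio_mean_approx(1.0, 1.0, 2.0 / n)


def f_mean_exact(n):
    """Exact mean of the F-distribution (requires n > 2)."""
    return n / (n - 2)


if __name__ == "__main__":
    for n in (5, 10, 30, 100):
        print(n, f_mean_approx(n), f_mean_exact(n))
```

Neither expression involves the numerator degrees of freedom, and the gap between them shrinks as n grows.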

39. How Many Trials Produced a Given Maximum?

Let N be a positive discrete rv with probability distribution p(n) = P(N = n) and associated probability generating function g(z) = E[z^N] (cf. page 2). Also, let X₁, X₂, … be continuous i.i.d. rvs, with a common DF F. The rv M defined by

M = max(X₁, …, X_N)

is then a random maximum, the largest of a random number (namely, N) of rvs (namely, the Xᵢ). It is intuitively clear that N and M are positively related. If M is large, then N will, on average, tend to be larger, because it is then the maximum of a larger number of i.i.d. realizations.

a. Consider the simple case that N equals (with probability 1/2) either 1 or 2. This implies that g(z) = (z + z²)/2. Assume the Xᵢ to be uniform rvs on [0, 1]. This means that the random maximum M is (with equal probability) either simply X₁ (namely, if N = 1) or the maximum of X₁ and X₂ (i.e., if N = 2). Given that M = m, what is the conditional expectation of N?

b. Show that in general

E[N | M = m] = 1 + F(m) · g″(F(m)) / g′(F(m)).

c. Consider E[N | M = m] for the case that N has a geometric distribution, so that g(z) = pz / (1 − (1 − p)z).

d. Consider the special case in which N equals either 1 or k, both with probability 1/2. For a uniform rv, F(x) = x, 0 ≤ x ≤ 1, the case k = 2 corresponds to part a.
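A Monte Carlo check is a convenient way to test a candidate answer for parts a and d. The sketch below assumes uniform variables on [0, 1] and a count that equals 1 or k with probability 1/2 each. The closed form it compares against, (1 + k² · m^(k−1)) / (1 + k · m^(k−1)), is a derived expression under those assumptions and is not quoted from the source. The function names are illustrative.

```python
import random


def cond_mean_one_or_k(m, k):
    """Candidate closed form for the expected count given maximum m (uniform case)."""
    return (1 + k * k * m ** (k - 1)) / (1 + k * m ** (k - 1))


def simulated_cond_mean(m, k, half_width=0.02, reps=200_000, seed=3):
    """Average count among simulated maxima falling within half_width of m."""
    rng = random.Random(seed)
    total, hits = 0, 0
    for _ in range(reps):
        count = 1 if rng.random() < 0.5 else k
        maximum = max(rng.random() for _ in range(count))
        if abs(maximum - m) <= half_width:
            total += count
            hits += 1
    return total / hits


if __name__ == "__main__":
    for m in (0.1, 0.5, 0.9):
        print(m, cond_mean_one_or_k(m, 2), simulated_cond_mean(m, 2))
```

Conditioning on a narrow window around m stands in for conditioning on the exact value, so the agreement is approximate and improves with more replications and a narrower window.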

40. Waiting for Success

The probability p of a certain event is usually estimated by looking at how often it occurs in n independent trials. If this frequency is k, then the usual estimate of p is k/n. In this procedure, n is fixed in advance, independent of the outcome of the individual trials.

An alternative way to estimate p is to look at how long (i.e., how many trials) it takes to achieve a preset number k of successes. With this procedure, the total number of trials required is an rv, N. Intuitively, the larger N is, the smaller will be our estimate of p.

a. Use to derive a moment estimate, say , for . In Problem 22 we already saw that is also the ML estimate of derived from . Find the expectation of this estimate. Is biased?

b. Argue that , and derive a moment estimate for based on . Use the results from Problem 37 to derive its approximate expectation. When will this approximation be valid?

c. Haldane proposed the improved estimate (k − 1)/(N − 1) for p, with k the preset number of successes and N the number of trials required. What is the bias of Haldane’s estimate?
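A simulation can suggest what to expect in part c before working it out analytically. The sketch below assumes a hypothetical success probability of 0.3 and a preset count of 5 successes (both chosen only for illustration). It compares the simulated mean of the ratio (successes)/(trials) with that of Haldane's (successes − 1)/(trials − 1). The function names are illustrative.

```python
import random


def trials_until_successes(target, prob, rng):
    """Number of independent trials needed to reach `target` successes."""
    trials = successes = 0
    while successes < target:
        trials += 1
        if rng.random() < prob:
            successes += 1
    return trials


def compare_estimates(target, prob, reps=50_000, seed=7):
    """Simulated means of target/N and of (target - 1)/(N - 1).

    Requires target >= 2, so that N - 1 is never zero.
    """
    rng = random.Random(seed)
    naive_sum = haldane_sum = 0.0
    for _ in range(reps):
        n_trials = trials_until_successes(target, prob, rng)
        naive_sum += target / n_trials
        haldane_sum += (target - 1) / (n_trials - 1)
    return naive_sum / reps, haldane_sum / reps


if __name__ == "__main__":
    print(compare_estimates(5, 0.3))
```

In such runs the first average sits visibly above the true probability while the second stays close to it, which is the pattern the analytic answer should explain.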