Chapter 5 Common Distributions
5.1 Introduction
No Problems.
5.2 Bernoulli and Binomial
The pf of a Bernoulli random variable is written as \[f(x) = p^x (1-p)^{1-x} \;\; \cdot I(x \in \{0,1\})\]
One convenient place for calculating probabilities is a Web App created by Chester Ismay: https://ismay.shinyapps.io/ProbApp/
Alternatively you could use Mathematica, Wolfram Alpha, or R. In a pinch, you could resort to the tables in the back of your book, but I wouldn’t recommend that.
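If you go the R route, the `dbinom` and `pbinom` functions handle most of the binomial-type calculations in this chapter. A minimal sketch, using an arbitrary \(Binomial(10,\,0.3)\) example:

```r
# P(X = 4) and P(X <= 4) for X ~ Binomial(n = 10, p = 0.3)
dbinom(4, size = 10, prob = 0.3)   # about 0.200
pbinom(4, size = 10, prob = 0.3)   # about 0.850
```

Every distribution in this chapter has an analogous family of d/p/q/r functions in R (e.g. `dpois`, `dhyper`, `pnorm`).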
- Show that if \(X\sim Bernoulli(p)\) then
- The expectation of \(X\) is \(E(X) = p\)
- The variance of \(X\) is \(Var(X) = p(1-p)\)
- The moment generating function of \(X\) is \(\psi(t)=pe^t + (1-p)\)
The pf of a Binomial random variable is \[f(x)= {n \choose x}p^x (1-p)^{n-x} \;\;\cdot I(x\in \{0,1,2,\dots,n\})\]
- Show that if \(X \sim Bin(n,p)\) then
- The expectation of \(X\) is \(E(X) = np\)
- The variance of \(X\) is \(Var(X) = np(1-p)\)
- The moment generating function of \(X\) is \(\left( pe^t + (1-p) \right)^n\)
Suppose that a fair coin (\(p=0.5\)) is tossed independently four times. What is the probability that the number of heads is greater than the number of tails? What if we tossed the coin ten times?
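If you want to check your two answers numerically, a quick sketch in R (counting heads as successes):

```r
1 - pbinom(2, size = 4, prob = 0.5)    # 4 tosses: need at least 3 heads, about 0.31
1 - pbinom(5, size = 10, prob = 0.5)   # 10 tosses: need at least 6 heads, about 0.38
```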
Suppose that we have an airplane that seats 200 passengers. Through long experience we know that approximately 1% of customers don’t show up for their flight. So, to maximize our profit, we might sell 202 tickets; if 2 people don’t show up, then we make more money. Given that we have sold 202 tickets, what is the probability that everybody who shows up for the flight has a seat?
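One way to set this up is to count the no-shows among the 202 ticket holders; a quick check in R, assuming passengers show up independently:

```r
# X = number of no-shows ~ Binomial(202, 0.01); everyone has a seat if X >= 2
1 - pbinom(1, size = 202, prob = 0.01)   # roughly 0.60
```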
My two children are throwing toys into a toy chest. Elise has three toys and each has probability of \(2/3\) of going into the chest. Casey has four toys to throw but his probability is only \(1/3\). What is the expected number of toys to make it into the chest?
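The exact answer follows from linearity of expectation; if you want a simulation check of whatever you derive, a small sketch in R:

```r
set.seed(1)
elise <- rbinom(1e5, size = 3, prob = 2/3)   # Elise's toys that land in the chest
casey <- rbinom(1e5, size = 4, prob = 1/3)   # Casey's toys that land in the chest
mean(elise + casey)                          # compare to your expected value
```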
5.3 Hypergeometric
We next consider a distribution that is the sum of dependent Bernoulli random variables. Consider the scenario where there are \(A\) balls that are amber and \(B\) balls that are blue. The balls are thoroughly mixed and we will select \(n\) of those without replacement. Of interest is \(Y\), the number of amber balls drawn. It is helpful to think of randomly arranging all \(A+B\) balls into some order and then selecting the first \(n\) balls. Define \(X_i=1\) if the \(i\)th ball is amber and \(X_i=0\) if the \(i\)th ball is blue. Finally we note that \(X_i\) is not independent of \(X_j\) and that \[Y=\sum_{i=1}^n X_i\]
- Derive the pf of \(Y \sim Hypergeometric(A,B,n)\).
- How many ways are there to draw, without replacement, \(n\) balls from \(A+B\) when order doesn’t matter?
- How many ways are there to draw \(x\) amber balls and \(n-x\) blue balls (assuming \(x\le A\) and \(n-x \le B\))?
- Give the pf of a Hypergeometric(A,B,n) distribution.
Notice that absent any information about the other \(X_j\) balls, \(X_i \sim Bernoulli \Big( \frac{A}{A+B} \Big)\). Use this information to derive the expectation of \(Y\).
- Because \(X_i\) is negatively correlated with \(X_j\), we can’t easily derive the variance of \(Y\). Instead we will be obnoxiously clever.
- Let \(n=A+B\) so that we are selecting all the balls. Argue that \(Var(Y) = 0\).
- Again with \(n=A+B\), notice that because of the symmetry of the random assignments of balls, \(Cov(X_i,X_j)\) is the same for all \(i\ne j\). There isn’t anything to do here, but this part of our argument is critical.
- We know that for any set of random variables \[Var(Y) = \sum_{i=1}^n Var(X_i) + \sum_{i \ne j} Cov(X_i,X_j)\] Use this information, along with parts (a) and (b), to solve for \(Cov(X_i, X_j)\) in terms of just \(A\) and \(B\) under the \(n=A+B\) condition. The hard part here is figuring out how many covariance terms there are. Because this result is just a statement about two draws, it doesn’t matter how many balls are drawn, just as in problem 2. This step therefore gives us the covariance between \(X_i\) and \(X_j\) regardless of the number of draws.
- Finally, letting \(n\) be an arbitrary integer between \(0\) and \(A+B\), and using the \(Cov(X_i, X_j)\) calculated above, show that \[Var(Y) = \frac{nAB}{(A+B)^2}\cdot \frac{A+B-n}{A+B-1}\]
- My daughter recently mixed \(20\) M&Ms and \(30\) Skittles in a bowl and left them for me to find.
- What is the probability that I select only M&Ms when I select 6 pieces of candy?
- What is the expected number of M&Ms in the next 6 pieces (from the 50)?
- What kind of monster am I raising?
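For parts (a) and (b) above, R's `dhyper` encodes the same counting argument we just derived; a quick check, with the 20 M&Ms playing the role of the amber balls:

```r
dhyper(6, m = 20, n = 30, k = 6)   # part (a): choose(20,6)/choose(50,6), about 0.0024
6 * 20 / 50                        # part (b): n * A/(A+B) = 2.4 expected M&Ms
```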
5.4 Poisson
The Poisson distribution is used to model the number of events that happen in some unit of time. The critical idea is that the numbers of events that occur in any two disjoint time periods are independent, regardless of the lengths of the periods. By splitting the time unit into many sub-intervals, each of which can contain only 0 or 1 event, and treating those sub-intervals as independent Bernoulli RVs, it is possible to justify the following probability function \[f(x) = \frac{e^{-\lambda} \lambda^x}{x!} \;\;\cdot I(x \in \{0,1,2,\dots\} )\] where the parameter \(\lambda\) represents the mean number of events per time interval of some specific size.
- Suppose that \(X\sim Poisson(\lambda)\). Show that
- This is a valid pf by showing that \(f(x) \ge 0\) for all \(x\) and that \(\sum_{x=0}^\infty f(x) = 1\). Hint, look at the series expansion of \(e^\lambda\).
- The expectation of \(X\) is \(E(X)=\lambda\).
- The variance of \(X\) is \(Var(X)=\lambda\). Hint, derive \(E[ X(X-1) ]\) and use that to figure out \(E[X^2]\).
- The moment generating function of \(X\) is \[\psi(t)=e^{\lambda(e^t-1)}\]
Show that if \(X_1,\dots,X_n\) are independent and identically distributed \(Poisson(\lambda)\) random variables, which we denote as \[X_i \stackrel{iid}{\sim} Poisson(\lambda)\] then \[Y=\sum_{i=1}^n X_i \sim Poisson(n\lambda).\]
- While driving along the interstate, the number of cars that pass us is reasonably approximated by a Poisson process with rate parameter \(\lambda=0.3\) cars per minute.
- What is the rate parameter if we wanted it in cars per hour?
- What is the probability that 2 cars pass us in a 5 minute stretch of time?
- What is the probability that 4 or more cars pass us in a 5 minute stretch of time?
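A check of the three parts in R, after converting the rate to the relevant time window:

```r
0.3 * 60                       # part (a): cars per hour
lam5 <- 0.3 * 5                # mean number of cars in 5 minutes
dpois(2, lambda = lam5)        # part (b): about 0.25
1 - ppois(3, lambda = lam5)    # part (c): about 0.066
```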
- Suppose that the number of items produced by a machine follows a Poisson distribution with mean \(\lambda\). Each of those items independently has probability \(p\) of being defective. We are interested in the marginal distribution of the total number of defective items produced. Let \(Y\) be the number of defective items, \(N\) be the number of items produced, and \(X_i\) be whether the \(i\)th item produced is defective.
- What is the distribution of \(X_i\)? What is the distribution of \(N\)? What is the conditional distribution of \(Y\) given \(N\)?
- What is the joint distribution of \(Y\) and \(N\)?
- By summing out \(N\), what is the marginal distribution of \(Y\)? Hint: Notice that \(Y\le N\).
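Before doing the algebra, a simulation can suggest what the marginal distribution of \(Y\) looks like; a sketch with arbitrary values \(\lambda = 10\) and \(p = 0.2\):

```r
set.seed(1)
lambda <- 10; p <- 0.2
N <- rpois(1e5, lambda)                 # number of items produced in each simulated run
Y <- rbinom(1e5, size = N, prob = p)    # each item is independently defective with probability p
c(mean(Y), var(Y))                      # compare these to each other and to lambda * p
```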
5.5 Geometric and Negative Binomial
We will first define the geometric distribution. Here we consider another experiment built from multiple Bernoulli random variables: we repeatedly sample from a \(Bernoulli(p)\) distribution, where \(p\) is the probability that a draw is a success and each draw is independent of all previous draws. We are interested in “the number of failures before the first success.”
Show that the Moment Generating Function for the \(Geometric(p)\) distribution is \[\psi(t) = \frac{p}{1-(1-p)e^t}\] Hint, the Geometric distribution is named as such because the Geometric Series result is necessary to show this.
Utilize the mgf to derive the expected value and variance of a \(Geometric(p)\) distribution.
The Negative Binomial distribution extends the idea of “number of failures until the first success” to the number of failures until the \(r\)th success.
- The pf of the \(Y \sim Negative \;Binomial(r,p)\) distribution can be derived with the following:
- What is the probability of observing exactly \(y\) failures in a row, followed by \(r\) successes?
- How many ways are there to distribute the \(r\) successes among the \(r+y\) draws, keeping in mind that the final draw must be a success?
- Use the above ideas to derive the pf.
A second way of thinking about the negative binomial distribution is as the sum of \(r\) independent Geometric random variables. Utilize this interpretation to derive the expectation, variance, and moment generating function of a \(Negative \;Binomial(r,p)\) distribution.
Suppose that we have a Negative Binomial distribution with expectation \(\mu\) and variance \(\sigma^2\). What must \(r\) and \(p\) be? Notice this just allows for an alternative parameterization of the distribution in terms of the mean and variance.
- In the game Pass the Pigs, the probability of a Snouter is approximately 4%.
- What is the expected number of failures before I roll my fourth Snouter?
- What is the variance of the number of failures before I roll my fourth Snouter?
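Once you have the negative binomial mean and variance from the previous problem, this is a plug-in calculation; a sketch in R, whose `rnbinom` uses the same failures-before-the-\(r\)th-success definition:

```r
r <- 4; p <- 0.04
r * (1 - p) / p                            # expected failures before the 4th Snouter
r * (1 - p) / p^2                          # variance of that count
mean(rnbinom(1e5, size = r, prob = p))     # simulation check of the expectation
```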
Suppose that \(X\sim Geometric(p)\). What is the probability that \(X \in \{0,2,4,\dots\}\)? Hint: write the even values as \(x=2w\) and sum over \(w \in \{0,1,2,\dots\}\).
Show that if \(X \sim Geometric(p)\), then \(Pr( X \ge k ) = (1-p)^k\) for \(k \in \{0,1,2,\dots\}\).
The Memoryless Property of the Geometric Distribution: Let \(X\sim Geometric(p)\). Show that if it is known that \(X\ge k\) for \(k \in \{0,1,2,\dots\}\), then \(Pr( X = k+t \;|\; X \ge k) = Pr( X =t )\).
Some books parameterize the geometric (and negative binomial) distributions as the total number of draws needed to obtain the first (or \(r\)th) success. Whenever you are looking up properties of this distribution, make sure it is defined the way you want it. For example, the Wikipedia pages for the geometric and negative binomial distributions give the information for both definitions.
5.6 Normal
The normal distribution plays a central role in statistics because many asymptotic results show that various quantities converge to a normal distribution.
The normal distribution is defined by its expectation \(\mu\) and variance \(\sigma^2\) and has pdf \[f(x) = \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ -\frac{(x-\mu)^2}{2\sigma^2} \right]\] where \(\exp [w] = e^w\) is just a notational convenience. Notice that we have no indicator function, and \(f(x) > 0\) for all \(x\in \mathbb{R}\).
One of the more tedious results to derive is that the pdf integrates to one, but it can be done using an appropriate polar-coordinate transformation (see the appendix).
For our first result, we will derive the moment generating function.
\[\begin{aligned} \psi(t) &= E(e^{tX}) \\ &= \int_{-\infty}^\infty e^{tx} \; \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ -\frac{(x-\mu)^2}{2\sigma^2} \right] \,dx\\ &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ tx -\frac{(x-\mu)^2}{2\sigma^2} \right] \,dx\\ \end{aligned}\] We then add and subtract the term \(\mu t + \frac{1}{2}\sigma^2 t^2\) in the exponent and complete the square in \(x\) by re-arranging to find
\[\begin{aligned} \psi(t) &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ \mu t + \frac{1}{2}\sigma^2 t^2 -\mu t -\frac{1}{2} \sigma^2 t^2 + tx -\frac{(x-\mu)^2}{2\sigma^2} \right] \,dx\\ &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ \mu t + \frac{1}{2}\sigma^2 t^2 - \left\{ \mu t +\frac{1}{2} \sigma^2 t^2 - tx +\frac{(x-\mu)^2}{2\sigma^2} \right\} \right] \,dx\\ &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ \mu t + \frac{1}{2}\sigma^2 t^2 - \left\{\frac{2\sigma^2 \mu t + \sigma^4 t^2 - 2\sigma^2 tx + x^2 -2x\mu + \mu^2}{2\sigma^2} \right\} \right] \,dx\\ &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ \mu t + \frac{1}{2}\sigma^2 t^2 - \left\{ \frac{x^2 - 2x(\mu+\sigma^2 t) + (\mu + \sigma^2t)^2}{ 2 \sigma^2} \right\} \right] \,dx\\ &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ \mu t + \frac{1}{2} \sigma^2 t^2 - \frac{\Big(x-(\mu+\sigma^2t)\Big)^2}{2\sigma^2} \right] \,dx\\ &= \exp \left[ \mu t + \frac{1}{2} \sigma^2 t^2 \right] \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ - \frac{\Big(x-(\mu+\sigma^2t)\Big)^2}{2\sigma^2} \right] \,dx \\ &= \exp \left[ \mu t + \frac{1}{2} \sigma^2 t^2 \right] \end{aligned}\]
Show that if \(X \sim N(\mu, \sigma^2)\), then \(E(X) = \mu\). Hint, use the mgf.
Show that if \(X \sim N(\mu, \sigma^2)\), then \(Var(X) = \sigma^2\). Hint, use the mgf.
Show that if \(X \sim N(\mu, \sigma^2)\), then \(Y=aX + b\) has distribution \(Y\sim N(a\mu+b, \,a^2 \sigma^2)\). In particular, notice that \(Z=\frac{X-\mu}{\sigma}\) has a \(N(0,1)\) distribution.
Show that if \(X_i \sim N(\mu_i,\, \sigma^2_i)\) for \(i \in \{1,2,\dots,n\}\) where \(X_i\) are all independent, then \(Y=\sum X_i\) has distribution \(Y \sim N\left( \sum \mu_i, \sum \sigma^2_i \right)\)
- In STA 270/275 it was claimed that the mean, median, and mode of a \(N(\mu,\sigma^2)\) distribution are all \(\mu\). We can show this for the \(N(0,1)\) case and, due to the transformations confirmed in problem three, it will hold for all normal distributions. In the parts below, let \(Z\sim N(0,1)\).
- Use symmetry to show that the median of the standard normal distribution is 0.
- Show that the peak of the distribution occurs at 0 using the first derivative test. The peak of a distribution is often called the mode.
- Using the second derivative, show that the curve has inflection points at \(z=-1\) and \(z=1\) where the concavity changes.
- The amount of dry kibble that I feed my cats each morning can be well approximated by a normal distribution with mean \(\mu=200\) grams and standard deviation \(\sigma=30\) grams.
- What is the probability that I fed my cats more than 250 grams of kibble this morning?
- From my cats’ perspective, more food is better. How much would I have to feed them for this morning to be among the top \(10\%\) of feedings?
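Both parts are one-line calculations with R's normal distribution functions:

```r
1 - pnorm(250, mean = 200, sd = 30)   # part (a): about 0.048
qnorm(0.90, mean = 200, sd = 30)      # part (b): about 238 grams
```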
5.7 Exponential and Gamma
Suppose that \(X \sim Gamma(\alpha, \beta)\) and \(c\) is a positive constant. Show that \(cX\) is distributed \(Gamma(\alpha, \beta/c)\).
Find the mode (peak of the distribution) for a \(Gamma(\alpha, \beta)\) distribution.
Suppose that \(X_i \stackrel{iid}{\sim} Exp(\beta)\) for \(i\in \{1,2,\dots,n\}\). Find the distribution of the sample mean \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\).
Suppose that \(X_i \sim Exp(\beta_i)\) are independently distributed for \(i\in \{1,2,\dots,n\}\). Show that the distribution of the sample minimum \(Y = \min\{X_1, X_2, \dots, X_n\}\) is \(Exp \left( \sum\beta_i \right)\).
Suppose that an electronic system contains \(n\) similar components that function independently of each other and that are connected in series so that the system fails as soon as one of the components fails. Suppose also that the length of life for each component, measured in hours, has the exponential distribution with mean \(\mu\). Determine the mean and the variance of the length of time until the system fails.
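A simulation sketch you can check your derivation against, assuming (arbitrarily) \(n = 5\) components with mean life \(\mu = 100\) hours:

```r
set.seed(1)
n <- 5; mu <- 100
lifetimes <- matrix(rexp(n * 1e5, rate = 1/mu), ncol = n)  # 1e5 simulated systems, one row each
fail_time <- apply(lifetimes, 1, min)                      # system fails at the first component failure
c(mean(fail_time), var(fail_time))                         # compare to your derived mean and variance
```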
5.8 Beta
Suppose that \(X \sim Beta(\alpha, \beta)\). Derive the expectation and variance of \(X\).
Suppose that \(X \sim Beta(\alpha, \beta)\). Derive the distribution of \(Y = 1-X\).
- Suppose that \(X \sim Gamma(\alpha_1, \beta)\) and \(Y \sim Gamma(\alpha_2, \beta)\) and that \(X\) and \(Y\) are independent. Define \(U=X/(X+Y)\) and \(V=X+Y\).
- Derive the joint distribution of \(U\) and \(V\). Hint: This goes back to Section 3.9. Theorem 5.8.4 in this section is also helpful.
- Show that \(U\) and \(V\) are independent.
- Show that \(U \sim Beta(\alpha_1, \alpha_2)\)
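A simulation sketch (with arbitrary \(\alpha_1 = 2\), \(\alpha_2 = 3\), \(\beta = 1\)) whose output should be consistent with parts (b) and (c):

```r
set.seed(1)
x <- rgamma(1e5, shape = 2, rate = 1)    # alpha_1 = 2
y <- rgamma(1e5, shape = 3, rate = 1)    # alpha_2 = 3, same rate beta = 1
u <- x / (x + y);  v <- x + y
cor(u, v)                                # near 0, consistent with independence
c(mean(u), 2 / (2 + 3))                  # sample mean of U vs. the Beta(2, 3) mean
```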