Chapter 5 Common Distributions

5.1 Introduction

No Problems. Really!

5.2 Bernoulli and Binomial

Definition 5.1 Define a Bernoulli random variable as a variable that takes on the value 0 with probability \(1-p\) and the value 1 with probability \(p\) where \(0\le p \le 1\). We will write this as \(X\sim Bernoulli( p )\).

The pf of a Bernoulli random variable is written as \[f(x) = p^x (1-p)^{1-x} \;\; \cdot I(x \in \{0,1\})\]

  1. Show that if \(X\sim Bernoulli(p)\) then
    1. The expectation of \(X\) is \(E(X) = p\)
    2. The variance of \(X\) is \(Var(X) = p(1-p)\)
    3. The moment generating function of \(X\) is \(\psi(t)=pe^t + (1-p)\)
Definition 5.2 Define a Binomial random variable as the sum of \(n\) independent and identically distributed Bernoulli(p) random variables. We will write \(X \sim Bin(n,p)\).

The pf of a Binomial random variable is \[f(x)= {n \choose x}p^x (1-p)^{n-x} \;\;\cdot I(x\in \{0,1,2,\dots,n\})\]

  1. Show that if \(X \sim Bin(n,p)\) then
    1. The expectation of \(X\) is \(E(X) = np\)
    2. The variance of \(X\) is \(Var(X) = np(1-p)\)
    3. The moment generating function of \(X\) is \(\left( pe^t + (1-p) \right)^n\)
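
The claims in these exercises are easy to sanity-check numerically. Below is a minimal simulation sketch, assuming Python with NumPy; the parameter values, seed, and sample size are arbitrary illustrative choices, not part of the text. It builds Binomial draws as sums of independent Bernoulli draws and compares the sample moments and an empirical mgf value with the formulas above.

```python
import numpy as np

# Arbitrary illustrative values (not from the text)
n, p, reps = 10, 0.3, 100_000
rng = np.random.default_rng(seed=42)

# Each Binomial(n, p) draw is the sum of n independent Bernoulli(p) draws
bernoulli = (rng.random((reps, n)) < p).astype(int)   # reps x n matrix of 0/1 values
binomial = bernoulli.sum(axis=1)                      # row sums are Binomial(n, p)

print("sample mean:", binomial.mean(), " vs  np      =", n * p)
print("sample var: ", binomial.var(),  " vs  np(1-p) =", n * p * (1 - p))

# Point check of the mgf: E[e^{tX}] should be close to (p e^t + 1 - p)^n
t = 0.5
print("empirical mgf:", np.mean(np.exp(t * binomial)),
      " vs  (p e^t + 1-p)^n =", (p * np.exp(t) + 1 - p) ** n)
```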

5.3 Hypergeometric

We next consider a distribution that is the sum of dependent Bernoulli random variables. Consider the scenario where there are \(A\) balls that are amber and \(B\) balls that are blue. The balls are thoroughly mixed and we will select \(n\) of those without replacement. Of interest is \(Y\), the number of amber balls drawn. It is helpful to think of randomly arranging all \(A+B\) balls into some order and then selecting the first \(n\) balls. Define \(X_i=1\) if the \(i\)th ball is amber and \(X_i=0\) if the \(i\)th ball is blue. Finally we note that \(X_i\) is not independent of \(X_j\) and that \[Y=\sum_{i=1}^n X_i\]
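
Before working through the derivation, it can help to simulate the urn directly. The following is a minimal sketch, assuming Python with NumPy; the values of \(A\), \(B\), \(n\), the seed, and the number of repetitions are arbitrary illustrative choices. It shuffles the \(A+B\) balls, keeps the first \(n\), and records the number of amber balls, so the sample mean and variance can be compared with the expressions that appear in the exercises below.

```python
import numpy as np

# Arbitrary illustrative values (not from the text)
A, B, n, reps = 7, 5, 4, 100_000
rng = np.random.default_rng(seed=1)

# 1 = amber, 0 = blue; shuffle all A+B balls and keep the first n
balls = np.array([1] * A + [0] * B)
draws = np.array([rng.permutation(balls)[:n].sum() for _ in range(reps)])

# Compare these with E(Y) and Var(Y) from problems 2 and 3 below
print("sample mean of Y:", draws.mean())
print("sample var  of Y:", draws.var())
```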

  1. Derive the pf of \(Y \sim Hypergeometric(A,B,n)\).
    1. How many ways are there to draw, without replacement, \(n\) balls from \(A+B\) when order doesn’t matter?
    2. How many ways are there to draw \(x\) amber balls and \(n-x\) blue balls (assuming \(x\le A\) and \(n-x \le B\))?
    3. Give the pf of a Hypergeometric(A,B,n) distribution.
  2. Notice that, absent any information about the other draws \(X_j\) (\(j \ne i\)), we have \(X_i \sim Bernoulli \Big( \frac{A}{A+B} \Big)\). Use this information to derive the expectation of \(Y\).

  3. Because \(X_i\) is negatively correlated with \(X_j\), we can’t easily derive the variance of \(Y\). Instead we will be obnoxiously clever.
    1. Let \(n=A+B\) so that we are selecting all the balls. Argue that \(Var(Y) = 0\).
    2. Because of the symmetry of the random assignment of balls, \(Cov(X_i,X_j)\) is the same for all \(i\ne j\). There isn’t anything to do here, but this part of our argument is critical.
    3. We know that for any set of random variables \[Var(Y) = \sum_{i=1}^n Var(X_i) + \sum_{i \ne j} Cov(X_i,X_j)\] Use this information, along with parts (a) and (b), to solve for the \(Cov(X_i, X_j)\). The hard part here is figuring out how many covariance terms there are.
    4. Finally, show that \[Var(Y) = \frac{nAB}{(A+B)^2}\cdot \frac{A+B-n}{A+B-1}\]
  4. Suppose that we have \(Y \sim Hypergeometric(A,B,n)\) and separately we have \(W \sim Binomial \Big( n, \frac{A}{A+B} \Big)\). Show that for \(n > 1\), \(Var(W) > Var(Y)\).

  5. My daughter recently mixed \(20\) M&Ms and \(30\) Skittles in a bowl and left them for me to find.
    1. What is the probability that I select only M&Ms when I select 6 pieces of candy?
    2. What is the expected number of M&Ms in the next 6 pieces (from the 50)?
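
One way to check the answers to this last problem is to evaluate the hypergeometric pf directly. A minimal sketch, assuming Python (only the standard library’s `math.comb` is used); the helper function `hyper_pf` is an illustrative name, not something defined in the text.

```python
from math import comb

A, B, n = 20, 30, 6   # 20 M&Ms, 30 Skittles, 6 pieces selected

def hyper_pf(x, A, B, n):
    """Hypergeometric pf: probability of drawing exactly x M&Ms among the n pieces."""
    return comb(A, x) * comb(B, n - x) / comb(A + B, n)

# Part 1: probability that all 6 selected pieces are M&Ms
print("P(all 6 are M&Ms) =", hyper_pf(6, A, B, n))

# Part 2: expected number of M&Ms, computed directly from the pf
expected = sum(x * hyper_pf(x, A, B, n) for x in range(n + 1))
print("E(number of M&Ms) =", expected)
```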

5.4 Poisson

The Poisson distribution is used to model the number of events that happen in some unit of time. The critical idea is that the numbers of events that occur in any two disjoint time periods are independent, regardless of the lengths of the periods. By splitting the time unit into many sub-intervals, each of which could have only 0 or 1 event, and treating those sub-intervals as independent Bernoulli RVs, it is possible to justify the following probability function \[f(x) = \frac{e^{-\lambda} \lambda^x}{x!} \;\;\cdot I(x \in \{0,1,2,\dots\} )\] where the parameter \(\lambda\) represents the mean number of events that should happen per time interval of some specific size.
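
The splitting argument can be illustrated numerically: if the time unit is divided into \(n\) sub-intervals, each an independent \(Bernoulli(\lambda/n)\) trial, then the resulting \(Bin(n, \lambda/n)\) pf approaches the Poisson pf as \(n\) grows. A minimal sketch, assuming Python; the value of \(\lambda\) and the choices of \(n\) are arbitrary.

```python
from math import comb, exp, factorial

lam = 2.5   # arbitrary illustrative rate

def binom_pf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

# Split the time unit into n Bernoulli(lam/n) sub-intervals and compare pfs at x = 0..4
for n in (10, 100, 10_000):
    print(f"Bin(n={n:>6}, lam/n):", [round(binom_pf(x, n, lam / n), 4) for x in range(5)])
print("Poisson(lam):        ", [round(poisson_pf(x, lam), 4) for x in range(5)])
```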

  1. Suppose that \(X\sim Poisson(\lambda)\). Show that
    1. This is a valid pf by showing that \(f(x) \ge 0\) for all \(x\) and that \(\sum_{x=0}^\infty f(x) = 1\). Hint, look at the series expansion of \(e^\lambda\).
    2. The expectation of \(X\) is \(E(X)=\lambda\).
    3. The variance of \(X\) is \(Var(X)=\lambda\). Hint, derive \(E[ X(X-1) ]\) and use that to figure out \(E[X^2]\).
    4. The moment generating function of \(X\) is \[\psi(t)=e^{\lambda(e^t-1)}\]
  2. Show that if \(X_1,\dots,X_n\) are independent and identically distributed \(Poisson(\lambda)\) random variables, which we denote as \[X_i \stackrel{iid}{\sim} Poisson(\lambda)\] then \[Y=\sum_{i=1}^n X_i \sim Poisson(n\lambda).\] A small simulation check of this result appears below.
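
Here is the simulation check mentioned above: a minimal sketch, assuming Python with NumPy, with arbitrary choices of \(\lambda\), \(n\), the seed, and the number of repetitions. It compares sums of \(n\) iid \(Poisson(\lambda)\) draws with draws made directly from a \(Poisson(n\lambda)\) distribution.

```python
import numpy as np

lam, n, reps = 1.7, 4, 100_000   # arbitrary illustrative values
rng = np.random.default_rng(seed=3)

summed = rng.poisson(lam, size=(reps, n)).sum(axis=1)   # sums of n iid Poisson(lam) draws
direct = rng.poisson(n * lam, size=reps)                # draws directly from Poisson(n*lam)

print("mean of sums:", summed.mean(), " vs  direct draws:", direct.mean())
print("var  of sums:", summed.var(),  " vs  direct draws:", direct.var())
```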

5.5 Geometric and Negative Binomial

We will first define the geometric distribution. Here we consider another experiment built from repeated Bernoulli random variables. This time we repeatedly sample from a \(Bernoulli(p)\) distribution, where \(p\) is the probability that a draw is a success, and each draw is independent of all previous draws. We are interested in “the number of failures before the first success.”

  1. We first consider \(Y \sim Geometric(p)\).
    1. What is the probability that there are no failures? That is, what is the probability that the first success occurs on the first draw? \[f(0) = Pr( Y = 0 ) = \;\;?\]
    2. What is the probability that there is one failure followed by a success? What is the probability that there are \(y\) failures before the first success?
    3. Use this to derive the pf of a \(Geometric(p)\) distribution.
  2. Show that the Moment Generating Function for the \(Geometric(p)\) distribution is \[\psi(t) = \frac{p}{1-(1-p)e^t}\] Hint, the Geometric distribution is named as such because the Geometric Series result is necessary to show this.

  3. Utilize the mgf to derive the expected value and variance of a \(Geometric(p)\) distribution.
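
The “number of failures before the first success” description translates directly into a simulation, which you can use to check the expectation and variance you obtain from the mgf. A minimal sketch, assuming Python with NumPy; the function name `failures_before_success`, the parameter value, the seed, and the sample size are arbitrary illustrative choices.

```python
import numpy as np

p, reps = 0.3, 100_000   # arbitrary illustrative values
rng = np.random.default_rng(seed=5)

def failures_before_success(p, rng):
    """Draw independent Bernoulli(p) trials until the first success; count the failures."""
    count = 0
    while rng.random() >= p:   # a uniform draw below p counts as a success
        count += 1
    return count

draws = np.array([failures_before_success(p, rng) for _ in range(reps)])

# Compare these with the expectation and variance derived from the mgf in problem 3
print("sample mean:", draws.mean())
print("sample var: ", draws.var())
```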

The Negative Binomial distribution extends the idea of “number of failures until the first success” to the number of failures until the \(r\)th success.

  1. The pf of the \(Y \sim Negative \;Binomial(r,p)\) distribution can be derived with the following:
    1. What is the probability of observing exactly \(y\) failures in a row, followed by \(r\) successes?
    2. How many ways are there to distribute the \(r\) successes among the \(r+y\) draws, keeping in mind that the final draw must be a success?
    3. Use the above ideas to derive the pf.
  2. A second way of thinking about the negative binomial distribution is as the sum of \(r\) independent Geometric random variables. Utilize this interpretation to derive the expectation, variance, and moment generating function of a \(Negative \;Binomial(r,p)\) distribution.
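
The sum-of-geometrics interpretation in the last problem can be checked by simulation: since expectations and variances of independent random variables add, the Negative Binomial mean and variance should be \(r\) times the corresponding Geometric quantities. A minimal sketch, assuming Python with NumPy; the names, parameter values, seed, and sample sizes are arbitrary illustrative choices.

```python
import numpy as np

p, r, reps = 0.4, 3, 50_000   # arbitrary illustrative values
rng = np.random.default_rng(seed=7)

def geometric_failures(p, rng):
    """Number of independent Bernoulli(p) failures before the first success."""
    count = 0
    while rng.random() >= p:
        count += 1
    return count

single = np.array([geometric_failures(p, rng) for _ in range(reps)])
summed = np.array([sum(geometric_failures(p, rng) for _ in range(r))
                   for _ in range(reps)])

# For sums of independent variables, expectations and variances add
print("NB sample mean:", summed.mean(), " vs  r * Geometric mean:", r * single.mean())
print("NB sample var: ", summed.var(),  " vs  r * Geometric var: ", r * single.var())
```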

Some books parameterize the geometric (and negative binomial) distributions in terms of the total number of draws needed to obtain the first (or \(r\)th) success. Whenever you are looking up properties of these distributions, make sure they are defined the way you want. For example, the Wikipedia pages for the geometric and negative binomial distributions give the information for both definitions.

5.6 Normal

The normal distribution plays a central role in statistics because many asymptotic results show that some quantity of interest converges to a normal distribution.

The normal distribution is defined by its expectation \(\mu\) and variance \(\sigma^2\) and has pdf \[f(x) = \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ -\frac{(x-\mu)^2}{2\sigma^2} \right]\] where \(\exp [w] = e^w\) is just a notational convenience. Notice that we have no indicator function, and \(f(x) > 0\) for all \(x\in \mathbb{R}\).

One of the more tedious results to derive is that the pdf integrates to one (see the appendix); it can be done using an appropriate polar-coordinate transformation.

For our first result, we will derive the moment generating function.

\[\begin{aligned} \psi(t) &= E(e^{tX}) \\ &= \int_{-\infty}^\infty e^{tx} \; \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ -\frac{(x-\mu)^2}{2\sigma^2} \right] \,dx\\ &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ tx -\frac{(x-\mu)^2}{2\sigma^2} \right] \,dx\\ \end{aligned}\] We then complete the square in the exponent by adding and subtracting the term \(\mu t + \frac{1}{2}\sigma^2 t^2\) and re-arranging to find

\[\begin{aligned} \psi(t) &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ \mu t + \frac{1}{2} \sigma^2 t^2 - \frac{\Big(x-(\mu+\sigma^2t)\Big)^2}{2\sigma^2} \right] \,dx\\ &= \exp \left[ \mu t + \frac{1}{2} \sigma^2 t^2 \right] \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}\sigma} \exp \left[ - \frac{\Big(x-(\mu+\sigma^2t)\Big)^2}{2\sigma^2} \right] \,dx \\ &= \exp \left[ \mu t + \frac{1}{2} \sigma^2 t^2 \right] \end{aligned}\] The last equality follows because the remaining integrand is the pdf of a \(N(\mu+\sigma^2 t,\, \sigma^2)\) random variable and therefore integrates to one.
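
The algebra can be spot-checked by comparing a Monte Carlo estimate of \(E(e^{tX})\) with \(\exp\left[\mu t + \frac{1}{2}\sigma^2 t^2\right]\). A minimal sketch, assuming Python with NumPy; the parameter values, seed, sample size, and choices of \(t\) are arbitrary.

```python
import numpy as np

mu, sigma, reps = 1.0, 2.0, 1_000_000   # arbitrary illustrative values
rng = np.random.default_rng(seed=11)
x = rng.normal(loc=mu, scale=sigma, size=reps)

for t in (0.1, 0.25, 0.5):
    empirical = np.mean(np.exp(t * x))               # Monte Carlo estimate of E[e^{tX}]
    exact = np.exp(mu * t + 0.5 * sigma**2 * t**2)   # mgf derived above
    print(f"t = {t}:  empirical {empirical:.4f}   vs   exact {exact:.4f}")
```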

  1. Show that if \(X \sim N(\mu, \sigma^2)\), then \(E(X) = \mu\). Hint, use the mgf.

  2. Show that if \(X \sim N(\mu, \sigma^2)\), then \(Var(X) = \sigma^2\). Hint, use the mgf.

  3. Show that if \(X \sim N(\mu, \sigma^2)\), then \(Y=aX + b\) has distribution \(Y\sim N(a\mu+b, \,a^2 \sigma^2)\).

  4. Show that if \(X_i \sim N(\mu_i,\, \sigma^2_i)\) for \(i \in \{1,2,\dots,n\}\) where the \(X_i\) are all independent, then \(Y=\sum X_i\) has distribution \(Y \sim N\left( \sum \mu_i, \sum \sigma^2_i \right)\).
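
Problems 3 and 4 can also be checked by simulation. A minimal sketch, assuming Python with NumPy; all constants, means, and variances below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=13)
reps = 500_000

# Problem 3: Y = aX + b where X ~ N(mu, sigma^2)
mu, sigma, a, b = 2.0, 1.5, -3.0, 4.0
y = a * rng.normal(mu, sigma, reps) + b
print("aX + b:  mean", round(y.mean(), 3), " vs ", a * mu + b,
      "   var", round(y.var(), 3), " vs ", a**2 * sigma**2)

# Problem 4: sum of independent normals with different means and variances
mus = [1.0, -2.0, 0.5]
sigmas = [1.0, 2.0, 0.3]
samples = np.column_stack([rng.normal(m, s, reps) for m, s in zip(mus, sigmas)])
total = samples.sum(axis=1)
print("sum:     mean", round(total.mean(), 3), " vs ", sum(mus),
      "   var", round(total.var(), 3), " vs ", sum(s**2 for s in sigmas))
```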

5.7 Uniform

The continuous uniform distribution on the interval \((a,b)\), where \(a<b\), has density function \[f(x) = \frac{1}{b-a} \;\cdot I(a < x < b)\]

  1. Show that if \(X\sim Unif(a,b)\) then \(f(x)\) is a valid pdf, i.e., show that \(f(x) \ge 0\) for all \(x\) and that \(\int_{-\infty}^\infty f(x)\,dx = 1\).

  2. Show that the moment generating function of a \(Uniform(a,b)\) distribution is \[\psi(t) = \frac{e^{bt} - e^{at}}{t(b-a)}\]

  3. Show that if \(U \sim Uniform(0,1)\) then \(X=(b-a)U + a\) has a \(Uniform(a,b)\) distribution.

  4. Show that the \(Uniform(0,1)\) distribution has expectation \(\frac{1}{2}\) and variance \(\frac{1}{12}\).

  5. Show that the \(Uniform(a,b)\) distribution has expectation \(\frac{a+b}{2}\) and variance \(\frac{(b-a)^2}{12}\).
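
The last three problems can be checked with a short simulation. A minimal sketch, assuming Python with NumPy; the endpoints \(a\) and \(b\), the seed, and the sample size are arbitrary illustrative choices.

```python
import numpy as np

a, b, reps = 2.0, 5.0, 1_000_000   # arbitrary illustrative values
rng = np.random.default_rng(seed=17)

u = rng.random(reps)        # Uniform(0, 1) draws
x = (b - a) * u + a         # claimed to be Uniform(a, b)

print("Uniform(0,1): mean", u.mean(), " vs  1/2,    var", u.var(), " vs ", 1 / 12)
print("Uniform(a,b): mean", x.mean(), " vs ", (a + b) / 2,
      "   var", x.var(), " vs ", (b - a) ** 2 / 12)
print("transformed draws stay inside (a, b):", x.min(), "to", x.max())
```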

5.8 Exponential and Gamma

5.9 Beta

5.10 Bivariate Normal