**Sampling Distributions:**

Suppose I randomly select 100 seniors in Henry County and record each one’s GPA.

| 1.95 | 1.98 | 1.86 | 2.04 | 2.75 | 2.72 | 2.06 | 3.36 | 2.09 | 2.06 |
| 2.33 | 2.56 | 2.17 | 1.67 | 2.75 | 3.95 | 2.23 | 4.53 | 1.31 | 3.79 |
| 1.29 | 3.00 | 1.89 | 2.36 | 2.76 | 3.29 | 1.51 | 1.09 | 2.75 | 2.68 |
| 2.28 | 3.13 | 2.62 | 2.85 | 2.41 | 3.16 | 3.39 | 3.18 | 4.05 | 3.26 |
| 1.95 | 3.23 | 2.53 | 3.70 | 2.90 | 2.79 | 3.08 | 2.79 | 3.26 | 2.29 |
| 2.59 | 1.36 | 2.38 | 2.03 | 3.31 | 2.05 | 1.58 | 3.12 | 3.33 | 2.04 |
| 2.81 | 3.94 | 0.82 | 3.14 | 2.63 | 1.51 | 2.24 | 2.22 | 1.85 | 1.96 |
| 2.05 | 2.62 | 3.27 | 1.94 | 2.01 | 1.68 | 2.01 | 3.15 | 3.44 | 4.00 |
| 2.33 | 3.01 | 3.15 | 2.25 | 3.34 | 2.22 | 3.29 | 3.90 | 2.96 | 2.61 |
| 3.01 | 2.86 | 1.70 | 1.55 | 1.63 | 2.37 | 2.84 | 1.67 | 2.92 | 3.29 |

These 100 seniors make up one possible **sample**.

All seniors in Henry County make up the **population**.

The sample mean ($\bar{x}$) is 2.5470 and the sample standard deviation ($s$) is 0.7150.

The population mean ($\mu$) and the population standard deviation ($\sigma$) are unknown.

We can use $\bar{x}$ to estimate $\mu$ and we can use $s$ to estimate $\sigma$. These estimates may or may not be reliable.
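As a small illustration of how these two statistics are computed, the sketch below uses only the first ten GPAs from the table (an illustrative subset, not the full sample of 100), so its results will differ from 2.5470 and 0.7150. The variable names are chosen here for illustration.

```python
import statistics

# First ten GPAs from the sample above (an illustrative subset, not all 100).
gpas = [1.95, 1.98, 1.86, 2.04, 2.75, 2.72, 2.06, 3.36, 2.09, 2.06]

x_bar = statistics.mean(gpas)   # sample mean
s = statistics.stdev(gpas)      # sample standard deviation (divides by n - 1)

print(f"x-bar = {x_bar:.4f}, s = {s:.4f}")
```

Note that `statistics.stdev` uses the sample formula (dividing by n - 1), which is the usual choice when the data are a sample rather than the whole population.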

A number that describes the population is called a **parameter**. Hence, $\mu$ and $\sigma$ are both parameters.

A parameter is usually represented by a Greek letter such as $\mu$ or $\sigma$ (or by $p$ for a proportion).

A number that is computed from a sample is called a **statistic**. Therefore, $\bar{x}$ and $s$ are both statistics.

A statistic is usually represented by a Latin letter such as $\bar{x}$ or $s$ (or by $\hat{p}$ for a proportion).

If I had chosen a different 100 seniors, then I would have a different sample, but it would still represent the same population. A different sample almost always produces different statistics.

*Example*:
Let $\hat{p}$ represent the proportion of seniors in a sample of 100 seniors whose GPA is 2.0 or higher.

If I compare many different samples and the statistic is very similar in each one, then the **sampling variability** is low.

If I compare many different samples and the statistic is very different in each one, then the **sampling variability** is high.

The **sampling model** of a statistic is a model of the values of the statistic from all possible samples of the same size from the same population.
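Listing every possible sample is rarely practical, so a sampling model is often approximated by simulation. The sketch below is one way to do that in Python. The population of 10,000 seniors and the 75% share with a GPA of 2.0 or higher are made-up values for illustration, not data from these notes.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 seniors, 75% with a GPA of 2.0 or higher
# (1 = GPA of 2.0 or higher, 0 = below 2.0). These numbers are assumptions.
population = [1] * 7500 + [0] * 2500

def sample_proportion(pop, n):
    """Draw an SRS of size n (without replacement) and return p-hat."""
    sample = random.sample(pop, n)
    return sum(sample) / n

# Approximate the sampling model with 2,000 samples of size 100.
p_hats = [sample_proportion(population, 100) for _ in range(2000)]

mean_p_hat = sum(p_hats) / len(p_hats)
print(f"mean of the p-hat values: {mean_p_hat:.3f}")
print(f"smallest: {min(p_hats):.2f}, largest: {max(p_hats):.2f}")
```

The spread between the smallest and largest values gives a rough picture of the sampling variability for samples of this size.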


*Example*:
Suppose the sampling model consists of the values of $\hat{p}$ from ten samples. (Note: There are actually many more than ten possible samples.) This sampling model has mean 0.754 and standard deviation 0.049.

The statistic used to estimate a parameter is **unbiased** if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

*Example*:

Since the mean of the sampling model is 0.754, $\hat{p}$ is an unbiased estimator of $p$ if the true value of $p$ (the proportion of all seniors in Henry County with a GPA of 2.0 or higher) equals 0.754.

A statistic can be **unbiased** and still have high variability.

**Sample Proportions:**

The parameter $p$ is the population proportion. In practice, this value is always unknown. (If we knew the population proportion, then there would be no need for a sample.)

The statistic $\hat{p}$ is the sample proportion.

We use $\hat{p}$ to estimate the value of $p$.

The value of the statistic $\hat{p}$ changes as the sample changes.

How can we describe the sampling model for $\hat{p}$?

1. shape?

2. center?

3. spread?

If our sample is an SRS of size *n*, then the following statements describe the sampling model for $\hat{p}$:

1. The shape is __approximately normal__.

**ASSUMPTION:** Sample size is sufficiently large.

**CONDITION:** Both $np$ and $nq$ are at least 10, where $q = 1 - p$.

2. The __mean__ is $p$.

3. The __standard deviation__ is $\sqrt{\dfrac{p(1-p)}{n}}$ or $\sqrt{\dfrac{pq}{n}}$.

**ASSUMPTION:** The sampled values are independent of each other.

**CONDITION:** The population is at least 10 times as large as the sample.
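The three statements and their conditions can be collected into one small helper. This is a minimal sketch, not part of the original notes: the function name `p_hat_model`, the value p = 0.5, and the population size of 5,000 are all hypothetical. The "large enough" check uses the rule that np and nq should each be at least 10.

```python
import math

def p_hat_model(p, n, population_size):
    """Describe the sampling model of p-hat for an SRS of size n."""
    q = 1 - p
    return {
        "mean": p,                                     # center of the model
        "sd": math.sqrt(p * q / n),                    # spread of the model
        "large_enough": n * p >= 10 and n * q >= 10,   # np and nq at least 10
        "ten_percent_ok": population_size >= 10 * n,   # population >= 10 x sample
    }

# Hypothetical inputs: p = 0.5, n = 100, population of 5,000.
model = p_hat_model(p=0.5, n=100, population_size=5000)
print(model)
```

With these inputs the standard deviation works out to $\sqrt{(0.5)(0.5)/100} = 0.05$, and both conditions are met.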

**Sample Means:**

If we have **categorical** data, then we must use **sample proportions** to construct a sampling model.

*Example*:

Suppose we want to know how many seniors in Maryland plan to attend college. We want to know how many seniors would answer “YES” to the question, “Do you plan to attend college?” These responses are **categorical**.

So $p$ (our parameter) is the proportion of all seniors in Maryland who plan to attend college.

Let $\hat{p}$ (our statistic) be the proportion of Maryland seniors in an SRS of size 100 who plan to attend college.

To calculate the value of $\hat{p}$, we divide the number of “Yes” responses in our sample by the total number of students in the sample.

If I graph the values of $\hat{p}$ for all possible samples of size 100, then I have constructed a **sampling model**. What will the sampling model look like?

It will be **approximately** normal. In fact, the larger my sample size, the closer it will be to a normal model.

It can never be perfectly normal, because our data is discrete, and normal distributions are continuous.

So how large is large enough to ensure that the sampling model is close to normal?

Both **np** and **nq** (where $q = 1 - p$) should be at least 10 in order for normal approximations to be useful.

Furthermore…

The mean of the sampling model will equal the true population proportion.

And…

The standard deviation (if the population is at least 10 times as large as the sample) will be $\sqrt{\dfrac{pq}{n}}$.

If, on the other hand, we have **quantitative** data, then we can use **sample means** to construct a sampling model.

*Example*:

Suppose I randomly select 100 seniors in Georgia and record each one’s GPA. I am interested in knowing the average GPA of a senior in Georgia.

| 1.95 | 1.98 | 1.86 | 2.04 | 2.75 | 2.72 | 2.06 | 3.36 | 2.09 | 2.06 |
| 2.33 | 2.56 | 2.17 | 1.67 | 2.75 | 3.95 | 2.23 | 4.53 | 1.31 | 3.79 |
| 1.29 | 3.00 | 1.89 | 2.36 | 2.76 | 3.29 | 1.51 | 1.09 | 2.75 | 2.68 |
| 2.28 | 3.13 | 2.62 | 2.85 | 2.41 | 3.16 | 3.39 | 3.18 | 4.05 | 3.26 |
| 1.95 | 3.23 | 2.53 | 3.70 | 2.90 | 2.79 | 3.08 | 2.79 | 3.26 | 2.29 |
| 2.59 | 1.36 | 2.38 | 2.03 | 3.31 | 2.05 | 1.58 | 3.12 | 3.33 | 2.04 |
| 2.81 | 3.94 | 0.82 | 3.14 | 2.63 | 1.51 | 2.24 | 2.22 | 1.85 | 1.96 |
| 2.05 | 2.62 | 3.27 | 1.94 | 2.01 | 1.68 | 2.01 | 3.15 | 3.44 | 4.00 |
| 2.33 | 3.01 | 3.15 | 2.25 | 3.34 | 2.22 | 3.29 | 3.90 | 2.96 | 2.61 |
| 3.01 | 2.86 | 1.70 | 1.55 | 1.63 | 2.37 | 2.84 | 1.67 | 2.92 | 3.29 |

These 100 seniors make up one possible **sample**.

The sample mean ($\bar{x}$) is 2.5470 and the sample standard deviation ($s$) is 0.7150.

So $\mu$ (our parameter) is the true mean GPA of a senior in Georgia.

And $\bar{x}$ (our statistic) is the mean GPA of a senior in Georgia in an SRS of size 100.

To calculate the value of $\bar{x}$, we find the mean of our sample ($\bar{x} = \dfrac{\sum x_i}{n}$).

If we pick different samples, then the value of our statistic changes.

If I graph the values of $\bar{x}$ for all possible samples of size 100, then I have constructed a **sampling model** of sample means. What will the sampling model look like?

Remember that each $\bar{x}$ value is a mean. Means are __less variable__ than individual observations because if we are looking only at means, then we don’t see any extreme values, only average values. We won’t see GPAs that are very low or very high, only average GPAs.

The larger the sample size, the less variation we will see in the values of $\bar{x}$. So the standard deviation decreases as the sample size increases.

**So what will the sampling model look like?**

If the sample size is large, it will be approximately normal.

It can never be perfectly normal, because our data is discrete, and normal distributions are continuous.

Furthermore…

The mean of the sampling model will equal the true population mean $\mu$.

And…

The standard deviation will be $\dfrac{\sigma}{\sqrt{n}}$ (if the population is at least 10 times as large as the sample).
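To see the $\sigma/\sqrt{n}$ behavior numerically, the following sketch simulates sample means for two sample sizes and compares their spread with the formula. The population values (mean 2.5, standard deviation 0.7, normal shape) are assumptions chosen for illustration, and because the values are drawn from a model rather than a finite list, the "10 times as large" condition is not an issue here.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

MU, SIGMA = 2.5, 0.7   # hypothetical population mean and SD (assumed values)

def sd_of_sample_means(n, reps=2000):
    """Simulate `reps` samples of size n and return the SD of their means."""
    means = [
        statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

for n in (25, 100):
    simulated = sd_of_sample_means(n)
    predicted = SIGMA / n ** 0.5
    print(f"n = {n:3d}: simulated SD = {simulated:.4f}, sigma/sqrt(n) = {predicted:.4f}")
```

Quadrupling the sample size from 25 to 100 should roughly halve the standard deviation of the sample means, since the formula divides by $\sqrt{n}$.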

**Central Limit Theorem**

Draw an SRS of size *n* from any population whatsoever with mean $\mu$ and standard deviation $\sigma$.

When *n* is large, the sampling model of the sample means $\bar{x}$ is close to the normal model with mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}}$.
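The theorem can be checked informally by simulation. The sketch below draws sample means from a strongly right-skewed population and compares them with what a normal model predicts. The exponential population (mean 1, standard deviation 1), the sample size of 50, and the 5,000 repetitions are all assumptions made for this illustration.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

n, reps = 50, 5000
# Strongly right-skewed population: exponential with mean 1 and SD 1.
means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mu, sd = 1.0, 1.0 / n ** 0.5   # CLT prediction: mean mu, SD sigma/sqrt(n)
within_one_sd = sum(abs(m - mu) <= sd for m in means) / reps

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"SD of sample means:   {statistics.stdev(means):.3f} (predicted {sd:.3f})")
print(f"share within 1 SD:    {within_one_sd:.3f} (normal model: about 0.68)")
```

Even though individual observations from this population are far from normal, the share of sample means within one standard deviation of $\mu$ should land near the 68% that a normal model gives.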

**Law of Large Numbers**

Draw observations at random from any population with mean $\mu$. As the number of observations increases, the sample mean $\bar{x}$ gets closer and closer to $\mu$.
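A running average makes this easy to watch. In the sketch below, rolls of a fair six-sided die stand in for the population (an assumption for illustration; its mean is 3.5), and the sample mean is printed at a few checkpoints as more observations accumulate.

```python
import random

random.seed(4)  # fixed seed so the run is repeatable

MU = 3.5   # mean of a fair six-sided die, used as a stand-in population
total = 0
checkpoints = {10, 100, 1000, 10000, 100000}
running_means = {}

for i in range(1, 100001):
    total += random.randint(1, 6)       # one new observation
    if i in checkpoints:
        running_means[i] = total / i    # sample mean so far
        print(f"after {i:6d} rolls: sample mean = {total / i:.4f}")
```

The approach to $\mu$ is not perfectly steady from one checkpoint to the next, but over the long run the sample mean settles near 3.5.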