Test for Goodness of Fit:

To analyze categorical data, we construct two-way tables and examine the counts or percents of the explanatory and response variables.

Proportion of M&M MINIs® Colors

COLOR      Observed Count, O      Expected Count, E
Blue
Brown
Green
Orange
Red
Yellow

We want to compare the observed counts to the expected counts.

The null hypothesis is that there is no difference between the observed and expected counts.

The alternative hypothesis is that there is a difference between the observed and expected counts.

The quantity

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

is called the chi-square statistic.  It measures how well the observed counts fit the expected counts, assuming that the null hypothesis is true.
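As a rough illustration, the statistic can be computed by summing (O - E)^2 / E over the six color categories. The sketch below is only an example: the observed counts and the equal-proportions null hypothesis are hypothetical values chosen for easy arithmetic, not data from this handout, and the function name is made up for illustration.

```python
# Hypothetical observed counts, for illustration only (not data from the handout).
observed = {"Blue": 12, "Brown": 8, "Green": 11, "Orange": 14, "Red": 7, "Yellow": 8}

# Assumed null hypothesis for this sketch: all six colors are equally likely.
null_proportions = {color: 1 / 6 for color in observed}


def chi_square_statistic(observed, proportions):
    """Return the sum of (O - E)^2 / E over all categories."""
    n = sum(observed.values())
    total = 0.0
    for category, o in observed.items():
        e = n * proportions[category]  # expected count under the null hypothesis
        total += (o - e) ** 2 / e
    return total


statistic = chi_square_statistic(observed, null_proportions)
print(round(statistic, 3))
```

A value near zero would mean the observed counts sit close to the expected counts; larger values mean a poorer fit.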

The distribution of the chi-square statistic is called the chi-square distribution, χ².  This distribution is a density curve.

The total area under the curve is 1.

The curve begins at zero on the horizontal axis and is skewed right.

As the degrees of freedom increase, the shape of the curve becomes more symmetric.
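The properties of the curve listed above can be checked numerically. The sketch below assumes the standard chi-square density formula and the standard skewness formula sqrt(8/df), neither of which appears in this handout; the function names and the integration settings are illustrative choices, and the density function here is written for df of 2 or more.

```python
import math


def chi_square_pdf(x, df):
    """Chi-square density with df degrees of freedom (this sketch assumes df >= 2)."""
    if x < 0:
        return 0.0
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))


def area_under_curve(df, upper=100.0, steps=20_000):
    """Trapezoid-rule approximation of the area under the density from 0 to `upper`."""
    width = upper / steps
    total = 0.5 * (chi_square_pdf(0.0, df) + chi_square_pdf(upper, df))
    for i in range(1, steps):
        total += chi_square_pdf(i * width, df)
    return total * width


def skewness(df):
    """Skewness of the chi-square distribution; it shrinks toward 0 as df grows."""
    return math.sqrt(8 / df)


for df in (2, 5, 20):
    print(df, round(area_under_curve(df), 4), round(skewness(df), 3))
```

The area should come out very close to 1 for each df, and the positive skewness (right skew) should decrease as the degrees of freedom increase.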

Using the M&M Minis® chi-square statistic, find the probability of obtaining a χ² value at least this extreme assuming the null hypothesis is true.

This is known as the “Goodness of Fit Test.”

Graph the chi-square distribution with (n – 1) = 5 degrees of freedom, where n = 6 is the number of color categories:

TI-83+: χ²pdf(X, 5)

[Graph: horizontal axis marked from 0 to 18 in steps of 2]

We would expect to obtain a χ² value at least this extreme in about _____ out of every _____ samples, assuming the null hypothesis is true.
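The probability described here is the area under the chi-square density curve to the right of the observed statistic. One way to approximate it without a calculator is sketched below, using a standard series for the regularized incomplete gamma function. The function name, tolerance, and the example statistic of 3.8 are assumptions made for illustration; 3.8 is not a value from this handout.

```python
import math


def chi_square_right_tail(x, df, tol=1e-12, max_terms=10_000):
    """Approximate P(chi-square >= x) for df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    # Series for the regularized lower incomplete gamma function:
    #   P(a, z) = z^a * e^(-z) * sum over n >= 0 of z^n / Gamma(a + n + 1)
    term = 1.0 / math.gamma(a + 1)
    series = term
    for n in range(1, max_terms):
        term *= z / (a + n)  # turns z^(n-1)/Gamma(a+n) into z^n/Gamma(a+n+1)
        series += term
        if term < tol * series:
            break
    left_area = z ** a * math.exp(-z) * series
    return max(0.0, 1.0 - left_area)


# Hypothetical statistic with 5 degrees of freedom (six categories).
p_value = chi_square_right_tail(3.8, 5)
print(round(p_value, 4))
```

A P-value of p can be read as "about 100p out of every 100 samples" giving a statistic at least this large when the null hypothesis is true.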

CONDITIONS: The Goodness of Fit Test may be used when all expected counts are at least 1 and no more than 20% of the expected counts are less than 5.
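The condition can be written as a small check. This sketch assumes the rule applies to the expected counts (the usual convention), and the function name is hypothetical.

```python
def goodness_of_fit_conditions_met(expected_counts):
    """True if every expected count is >= 1 and at most 20% of them are below 5."""
    if not expected_counts:
        return False
    all_at_least_one = all(e >= 1 for e in expected_counts)
    number_below_five = sum(1 for e in expected_counts if e < 5)
    return all_at_least_one and number_below_five <= 0.20 * len(expected_counts)


# Hypothetical expected counts for six categories.
print(goodness_of_fit_conditions_met([10, 10, 10, 10, 10, 10]))
print(goodness_of_fit_conditions_met([10, 10, 10, 10, 4, 4]))
```

In the second call, two of the six expected counts (about 33%) are below 5, so the check fails.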

Following the Goodness of Fit Test, check which component, (O – E)²/E, made the greatest contribution to the chi-square statistic to see where the observed counts differ most from the expected counts.
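This follow-up step amounts to listing each category's contribution, (O - E)^2 / E, and picking the largest. The counts below are hypothetical and used only to show the mechanics; the function name is also made up for this sketch.

```python
def chi_square_components(observed, expected):
    """Map each category to its contribution (O - E)^2 / E."""
    return {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}


# Hypothetical counts, for illustration only (not data from the handout).
observed = {"Blue": 12, "Brown": 8, "Green": 11, "Orange": 14, "Red": 7, "Yellow": 8}
expected = {color: 10.0 for color in observed}

components = chi_square_components(observed, expected)
largest = max(components, key=components.get)

for color, value in components.items():
    print(color, round(value, 2))
print("Largest contribution:", largest)
```

The sign of O - E for the largest component then shows whether that category was over- or under-represented relative to what was expected.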

Inference for Two-Way Tables:

To compare two proportions, we use a 2-Proportion Z Test.  If we want to compare three or more proportions, we need a new procedure.

The first step in the overall test for comparing several proportions is to arrange the data in a two-way table.

Think of the counts as elements of a matrix with r rows and c columns.  This is called an r × c table with (r)(c) cells.

Our null hypothesis is that there is no difference among the proportions.  The alternative hypothesis is that there is some difference among the proportions.

We will use the chi-square test to measure how far the observed values are from the expected values.

To calculate the expected counts, multiply the row total by the column total, and divide by the table total:

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
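The expected-count rule can be applied to every cell at once. The sketch below represents the table as a list of rows; the 2 × 3 table of counts is hypothetical and the function name is an illustrative choice.

```python
def expected_counts(table):
    """Return the expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in column_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed counts (rows = samples, columns = categories).
table = [
    [20, 30, 50],
    [30, 30, 40],
]

for row in expected_counts(table):
    print(row)
```

Each row of expected counts adds up to the same row total as the observed table, and likewise for the columns.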

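The chi-square statistic is the sum over all r × c cells in the table:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed count and E is the expected count in each cell.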

The number of degrees of freedom is (r – 1)(c – 1).  The P-value is the area to the right of the χ² statistic under the chi-square density curve.
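Putting the pieces together for a two-way table might look like the sketch below. The table is hypothetical and the function name is made up. The P-value line relies on a special case: with exactly 2 degrees of freedom, the area to the right of x under the chi-square curve is exp(-x/2). For other degrees of freedom a calculator or statistical software would be needed.

```python
import math


def chi_square_for_table(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    table_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / table_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2 x 3 table of observed counts, for illustration only.
table = [
    [20, 30, 50],
    [30, 30, 40],
]

statistic, df = chi_square_for_table(table)
# Closed form for the right-tail area; valid only because df == 2 for a 2 x 3 table.
p_value = math.exp(-statistic / 2)
print(round(statistic, 3), df, round(p_value, 3))
```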

The chi-square test can also be used to show evidence that there is a relationship between two categorical variables.

Use this test if you have independent SRSs from several populations, with each individual classified according to one categorical variable (the other variable identifies which sample the individual came from);

or if you have a single SRS, with each individual classified according to two categorical variables;

or if you have an entire population, with each individual classified according to two categorical variables.