Test for Goodness of Fit:
To analyze categorical data,
we construct two-way tables and examine the counts or percents of the
explanatory and response variables.
Proportion of M&M MINIs® Colors

COLOR     Observed Count, O     Expected Count, E     $(O - E)^2 / E$
Blue
Brown
Green
Orange
Red
Yellow

We want to compare the
observed counts to the expected counts.
The null hypothesis is that
there is no difference between the
observed and expected counts.
The alternative hypothesis
is that there is a difference between the observed and expected counts.
$X^2 = \sum \frac{(O - E)^2}{E}$ is
called the chi-square statistic. It
measures how well the observed counts fit the expected counts, assuming that
the null hypothesis is true.
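The chi-square statistic is a short calculation once the observed and expected counts are in hand. The sketch below is only an illustration: the six observed counts are made up (they are not real M&M data), and the null hypothesis of equal color proportions is an assumption chosen to keep the arithmetic simple.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed counts for six colors (illustrative only).
observed = [12, 8, 10, 14, 9, 7]

# Assumed null hypothesis: all six colors are equally likely,
# so every expected count is (total count) / 6.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

x2 = chi_square_statistic(observed, expected)
print(round(x2, 3))
```

A larger value of the statistic means the observed counts sit farther from the expected counts.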
The distribution of the chi-square statistic is called the chi-square distribution, $\chi^2$. This distribution is described by a density curve.
▪ The total area under the curve is 1.
▪ The curve begins at zero on the horizontal axis and is skewed right.
▪ As the degrees of freedom increase, the shape of the curve becomes more symmetric.
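These three properties can be checked numerically. The sketch below uses the standard chi-square density formula and the known skewness value sqrt(8 / df). The function names, the trapezoid-rule integration, and the upper integration limit of 100 are choices made for this illustration, and the density function assumes at least 3 degrees of freedom.

```python
import math


def chi_square_pdf(x, df):
    """Chi-square density at x. This sketch assumes df >= 3, so the density at 0 is 0."""
    if x <= 0:
        return 0.0
    half = df / 2.0
    log_density = (
        (half - 1.0) * math.log(x)
        - x / 2.0
        - half * math.log(2.0)
        - math.lgamma(half)
    )
    return math.exp(log_density)


def area_under_curve(df, upper=100.0, steps=10_000):
    """Trapezoid-rule approximation of the area under the density from 0 to upper."""
    h = upper / steps
    inner = sum(chi_square_pdf(i * h, df) for i in range(1, steps))
    ends = 0.5 * (chi_square_pdf(0.0, df) + chi_square_pdf(upper, df))
    return h * (ends + inner)


def skewness(df):
    """Skewness of the chi-square distribution: sqrt(8 / df)."""
    return math.sqrt(8.0 / df)


# Area should be very close to 1 for 5 degrees of freedom.
print(round(area_under_curve(5), 4))

# Skewness shrinks toward 0 (a symmetric curve) as df grows.
for df in (5, 10, 50):
    print(df, round(skewness(df), 3))
```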
Using the M&M MINIs®
chi-square statistic, find the probability of obtaining an $X^2$ value
at least this extreme, assuming the null hypothesis is true.
This is known as the “Goodness of Fit Test.”
Graph the chi-square distribution
with (n – 1) = 5 degrees of freedom, where n = 6 is the number of color categories:
TI-83+: χ²pdf(X, 5)
[Graph: chi-square density curve with 5 degrees of freedom; horizontal axis marked from 0 to 18 in steps of 2]
We would expect to obtain an $X^2$ value
at least this extreme in about _____ out of every _____ samples, assuming the
null hypothesis is true.
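Away from the calculator, the probability in question is the area to the right of the statistic under the chi-square density curve. The sketch below is one possible way to compute it with only the Python standard library, using a power series for the regularized lower incomplete gamma function. The function name and the statistic 3.4 are hypothetical, chosen for illustration.

```python
import math


def chi_square_p_value(x2, df):
    """Right-tail area: P(chi-square with df degrees of freedom >= x2)."""
    if x2 <= 0:
        return 1.0
    a = df / 2.0
    x = x2 / 2.0
    # Series for the regularized lower incomplete gamma function:
    # P(a, x) = x^a * e^(-x) / Gamma(a + 1) * sum_{n >= 0} x^n / ((a+1)(a+2)...(a+n))
    term = 1.0
    series = 1.0
    n = 0
    while term > 1e-15 * series:
        n += 1
        term *= x / (a + n)
        series += term
    lower = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0)) * series
    # The right-tail area is 1 minus the area to the left.
    return max(0.0, 1.0 - lower)


# Hypothetical chi-square statistic with 5 degrees of freedom.
p = chi_square_p_value(3.4, 5)
print(f"about {round(100 * p)} out of every 100 samples")
```

Multiplying the P-value by 100 gives one way to fill in the blanks above in the form "about ___ out of every 100 samples."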
CONDITIONS: The Goodness of Fit Test may be used when all expected counts are at least 1
and no more than 20% of the expected counts are less than 5.
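The conditions can be checked mechanically before running the test. This is a minimal sketch, assuming the rule is applied to the expected counts (the usual convention); the function name is hypothetical.

```python
def conditions_met(expected):
    """Check the two count conditions for the chi-square test.

    1. Every expected count is at least 1.
    2. No more than 20% of the expected counts are less than 5.
    """
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


# Hypothetical expected counts for six categories.
print(conditions_met([10, 10, 10, 10, 10, 10]))  # every count is large
print(conditions_met([4, 4, 10, 10, 10, 10]))    # 2 of 6 counts are below 5
```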
Following the Goodness of
Fit Test, check which component, $(O - E)^2 / E$, made the greatest contribution to the
chi-square statistic to see which category's observed count differs most from its expected count.
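This follow-up step amounts to listing each category's term of the chi-square sum and picking the largest. In the sketch below the color labels and counts are hypothetical, used only to show the mechanics.

```python
def components(observed, expected):
    """Return each category's contribution (O - E)^2 / E to the chi-square statistic."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Hypothetical labels and counts (illustrative only, not real data).
colors = ["Blue", "Brown", "Green", "Orange", "Red", "Yellow"]
observed = [12, 8, 10, 14, 9, 7]
expected = [10.0] * 6  # assumes equal proportions under the null hypothesis

parts = components(observed, expected)
for color, part in zip(colors, parts):
    print(f"{color:<7} {part:.2f}")

# The category with the largest component contributes most to the statistic.
largest_color, largest_part = max(zip(colors, parts), key=lambda pair: pair[1])
print("Largest contribution:", largest_color)
```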
Inference for Two-Way Tables:
To compare two proportions,
we use a 2-Proportion Z Test. If we want to compare three or more
proportions, we need a new procedure.
The first step in the overall
test for comparing several proportions is to arrange the data in a two-way
table.
Think of the counts as
elements of a matrix with r rows and c columns.
This is called an r × c table
with (r)(c) cells.
Our null hypothesis is that
there is no difference among the proportions.
The alternative hypothesis is that there is some difference among the
proportions.
We will use the chi-square
test to measure how far the observed values are from the expected values.
To calculate the expected
counts, multiply the row total by the column total, and divide by the table
total:
$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$
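The expected-count rule applies cell by cell. The sketch below shows it on a made-up 3 × 2 table; the counts and the function name are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Hypothetical 3 x 2 table of observed counts (rows = samples, columns = outcomes).
observed = [
    [20, 30],
    [30, 20],
    [10, 40],
]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

Each row total here is 50 and the column totals are 60 and 90, so every row has expected counts 50 × 60 / 150 = 20 and 50 × 90 / 150 = 30.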
The chi-square statistic is
the sum over all r × c cells in the table:
$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$
The number of degrees of freedom is (r
– 1)(c – 1).
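Putting the pieces together for a two-way table, the sketch below computes the expected counts, the chi-square statistic, and the degrees of freedom. The 3 × 2 table is hypothetical and the function name is chosen for this illustration.

```python
def two_way_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count: row total * column total / table total.
            expected = row_totals[i] * col_totals[j] / table_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 3 x 2 table of observed counts.
observed = [
    [20, 30],
    [30, 20],
    [10, 40],
]

x2, df = two_way_chi_square(observed)
print(round(x2, 3), df)
```

For this table r = 3 and c = 2, so there are (3 – 1)(2 – 1) = 2 degrees of freedom.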
The P-value is the area to
the right of the $X^2$ statistic under the
chi-square density curve.
The chi-square test can also
be used to show evidence that there is a relationship between two categorical
variables.
▪ Use this if you have independent SRSs from several populations where
one variable is categorical and the other is the sample number
▪ Or, if you have a single SRS with each individual classified according to two
categorical variables
▪ Or, if you have an entire population with each individual classified according to two
categorical variables