Statistics in the hands of an
engineer are like a lamppost to a drunk—
they're used more for support than illumination.
A. E. Housman
Chapter 2
Sec 2.1
In Chapter 1 we learned that exploring a single
quantitative variable starts with plotting the data in a graph, usually a
histogram or stem plot. We should look for an overall pattern by
discussing shape, center and spread and noting any outliers. Then we can
calculate a numerical summary to describe the center and spread: the mean and
standard deviation for symmetric distributions and the five-number summary for
skewed distributions.
NOW we add one more step...
Sometimes the overall pattern of a very large number of observations is SO
REGULAR that we can describe it by a smooth curve. This curve is a
mathematical model for the distribution, i.e., an idealized description.
It gives a quick picture of the overall pattern but ignores minor irregularities
as well as outliers. It is easier to work with a smooth curve than with a
histogram because the histogram depends on the choice of classes, while the
curve does not depend on any choices made by us.
Rationale: the bars of a histogram suggest areas, and these areas represent proportions of the observations. When the smooth curve outlining the histogram is scaled so that the total area under it is exactly 1 (representing ALL the observations), the curve is a density curve. A density curve is a curve that is always on or above the horizontal axis and has area exactly 1 underneath it. For any specific value, the area under the curve to its left is the proportion of observations below that value, and the area to its right is the proportion above it. A symmetric density curve, such as a NORMAL curve, mimics a symmetric histogram, and its mean and median are EQUAL. Other density curves may be skewed like their corresponding histograms, with the mean pulled away from the median in the direction of the long tail.
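As a rough check of the area idea, the sketch below approximates areas under a density curve with many thin rectangles, much like the bars of a very fine histogram. The curve used (a normal curve with mean 0 and standard deviation 1), the function names, and the number of rectangles are assumptions made for illustration; they do not come from the text.

```python
import math

def normal_density(x, mu=0.0, sigma=1.0):
    """Height of the normal density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under(density, left, right, steps=100_000):
    """Approximate the area under `density` between left and right
    by adding up thin rectangles (midpoint rule)."""
    width = (right - left) / steps
    return sum(density(left + (i + 0.5) * width) for i in range(steps)) * width

# Almost all of the area lies within 8 standard deviations of the mean,
# so -8 and 8 stand in for the far ends of the curve.
total_area = area_under(normal_density, -8, 8)
area_left_of_1 = area_under(normal_density, -8, 1)
area_right_of_1 = area_under(normal_density, 1, 8)

print(round(total_area, 4))
print(round(area_left_of_1, 4))
print(round(area_right_of_1, 4))
```

The total should come out very close to 1, and the areas to the left and right of the chosen point should add up to that total.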
Graphically, the MEDIAN of a density curve is the equal-area point, the point with half the area under the curve to its left and the other half to its right. The quartiles divide the area under the curve into quarters. One quarter of the area is to the left of Q_{1} and three quarters of the area is to the left of Q_{3}. You can roughly locate the median and quartiles of any density curve by eye by dividing the area under the curve into four equal parts. The MEAN is the point at which the curve would BALANCE if made of solid material.
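To see the equal-area points and the balance point side by side, here is a minimal sketch for one skewed density curve. The curve itself (a straight line falling from height 2 at x = 0 to height 0 at x = 1, so it is right-skewed with total area 1) and all names are hypothetical choices for illustration.

```python
def density(x):
    """A right-skewed density on [0, 1]: tall near 0, long tail toward 1."""
    return 2.0 * (1.0 - x)

STEPS = 100_000
WIDTH = 1.0 / STEPS
midpoints = [(i + 0.5) * WIDTH for i in range(STEPS)]

def equal_area_point(fraction):
    """Approximate point with `fraction` of the area to its left,
    found by adding thin strips from the left until enough area is reached."""
    area = 0.0
    for x in midpoints:
        area += density(x) * WIDTH
        if area >= fraction:
            return x
    return 1.0

q1 = equal_area_point(0.25)
median = equal_area_point(0.50)
q3 = equal_area_point(0.75)

# Balance point: each thin strip of area is weighted by its position.
mean = sum(x * density(x) * WIDTH for x in midpoints)

print(round(q1, 3), round(median, 3), round(q3, 3), round(mean, 3))
```

For this curve the mean lands to the right of the median, that is, toward the long tail.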
New notation: since a density curve is an idealized description of the data (not actual data), we will distinguish between the mean and standard deviation of the curve and the mean (x̄) and standard deviation (s) computed from the actual observations. The notation for the mean of this idealized distribution is μ (the Greek letter mu) and for its standard deviation is σ (the Greek letter sigma).
Normal distributions arise from some chance outcomes that are repeated many, many times. Chance experiments can be carried out on the TI-83 with some skill. Since you will only be pretending to do the experiment, it is called a "simulation." See page 85 in the text for the rolling of a die.
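A simulation of this kind can also be sketched in Python instead of on the calculator. This is not the example from page 85 of the text; the number of dice per trial, the number of trials, the seed, and every name below are assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import mean, pstdev

rng = random.Random(2)          # fixed seed so the run can be repeated
DICE_PER_TRIAL = 20             # assumed number of dice rolled in each trial
TRIALS = 10_000                 # assumed number of repetitions

def roll_total(n_dice):
    """Simulate rolling n_dice fair dice and return the total showing."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))

totals = [roll_total(DICE_PER_TRIAL) for _ in range(TRIALS)]
center = mean(totals)
spread = pstdev(totals)

# Share of simulated totals within one standard deviation of their mean.
within_one_sd = sum(1 for t in totals if abs(t - center) <= spread) / TRIALS

# Rough text histogram: totals grouped in classes of width 5,
# one '*' for every 50 trials in the class.
classes = Counter((t // 5) * 5 for t in totals)
for start in sorted(classes):
    print(f"{start:3d}-{start + 4:3d} {'*' * (classes[start] // 50)}")
print(f"mean {center:.1f}, sd {spread:.1f}, within one sd {within_one_sd:.0%}")
```

The totals should pile up in a roughly bell-shaped pattern centered near 70 (20 dice times an average face of 3.5), even though a single die is not bell-shaped at all.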
Normal distributions are symmetric, single-peaked, and bell-shaped. Their density curves are called normal curves. All normal distributions have the same overall shape. The exact density curve for a particular normal distribution is described by giving its mean μ and standard deviation σ. The mean is located at the center of the symmetric curve and is the same as the median. Changing μ without changing σ moves the normal curve along the horizontal axis WITHOUT changing its spread. The standard deviation controls the spread. The curve with the larger standard deviation is more spread out.
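The effect of the mean and the standard deviation can be checked with Python's standard `statistics.NormalDist` class. The three curves below use made-up values chosen only to show the comparison.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=3, sigma=1)   # same spread, center moved to 3
wide = NormalDist(mu=0, sigma=2)      # same center, twice the spread

# Moving the mean slides the peak along the axis but keeps its height.
peak_narrow = narrow.pdf(0)
peak_shifted = shifted.pdf(3)

# A larger standard deviation gives a lower, wider curve
# (the total area still has to be 1).
peak_wide = wide.pdf(0)

# The area within 1 unit of the center shrinks as the curve spreads out.
area_narrow = narrow.cdf(1) - narrow.cdf(-1)
area_wide = wide.cdf(1) - wide.cdf(-1)

print(round(peak_narrow, 4), round(peak_shifted, 4), round(peak_wide, 4))
print(round(area_narrow, 4), round(area_wide, 4))
```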
A normal curve has points at which the curvature changes; these are called "inflection points". They are located on both sides of the mean at a distance of σ, that is, at μ ± σ.
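One way to see where the curvature changes is to test whether a normal curve bends downward or upward at points closer to and farther from the mean than one standard deviation. The sketch below does this numerically; the mean of 10, the standard deviation of 2, and the helper name are arbitrary choices for illustration.

```python
from statistics import NormalDist

MU, SIGMA = 10.0, 2.0            # arbitrary values chosen for this sketch
curve = NormalDist(MU, SIGMA)

def bends_downward(x, h=0.01):
    """True if the curve is concave down at x, judged by the sign of the
    second difference of the curve's height around x."""
    second_diff = curve.pdf(x - h) - 2 * curve.pdf(x) + curve.pdf(x + h)
    return second_diff < 0

inside = [MU - 0.5 * SIGMA, MU, MU + 0.5 * SIGMA]   # closer than one sd
outside = [MU - 1.5 * SIGMA, MU + 1.5 * SIGMA]      # farther than one sd

print([bends_downward(x) for x in inside])
print([bends_downward(x) for x in outside])
```

The curve bends downward at every point checked within one standard deviation of the mean and upward at the points checked beyond it, so the change happens one standard deviation from the mean on each side.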
Normal distributions are important because
they are good descriptions of many kinds of real data (like SAT scores and
psychological tests), they are good approximations to the results of many kinds
of chance outcomes (tossing a coin, rolling a die), and the statistical
inference procedures based on them also work well for other roughly symmetric
distributions.
Empirical Rule: In a normal distribution with mean μ and standard deviation σ, approximately
68% of the observations fall within one standard deviation of the mean
95% of the observations fall within two standard deviations of the mean
99.7% of the observations fall within three standard deviations of the mean
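These percentages can be checked with Python's standard `statistics.NormalDist` class. The mean of 100 and standard deviation of 15 below are hypothetical; any normal distribution should give the same shares.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)   # hypothetical mean and standard deviation

def share_within(k):
    """Proportion of the distribution within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The computed shares are about 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7.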
Describing any normal distribution can be done with a shortcut notation...
N(μ, σ)
Frequently test scores are reported in percentiles rather than raw scores. If you score at the 94th percentile, then 94% of the students taking the test scored at or below your score. Percentiles are used to examine where an individual observation stands relative to the other individuals in the distribution. In our statistical language, the median is the 50th percentile, Q_{1} is the 25th percentile, and Q_{3} is the 75th percentile. A good explanation appears in Ex. 2.6 and 2.7 on page 89.
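The percentile idea can be sketched in two ways: by counting in a list of actual scores, and by reading percentiles off a normal curve. The scores, the N(500, 100) model, and the function name below are all made up for illustration; they are not the data from Ex. 2.6 or 2.7.

```python
from statistics import NormalDist

def percentile_of(score, all_scores):
    """Percent of the scores that are at or below `score`."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100 * at_or_below / len(all_scores)

# Hypothetical test scores for a class of 20 students.
scores = [52, 58, 61, 64, 66, 68, 70, 71, 73, 74,
          75, 77, 78, 80, 82, 84, 86, 89, 93, 97]

# 19 of the 20 scores are at or below 93.
standing = percentile_of(93, scores)
print(standing)

# For a normal model, percentiles come from the area to the left of a value.
model = NormalDist(mu=500, sigma=100)   # hypothetical N(500, 100)
q1 = model.inv_cdf(0.25)        # 25th percentile
median = model.inv_cdf(0.50)    # 50th percentile
q3 = model.inv_cdf(0.75)        # 75th percentile
print(round(q1, 1), round(median, 1), round(q3, 1))
```

In the class list, a score of 93 sits at the 95th percentile. In the normal model, the quartiles fall about two thirds of a standard deviation on either side of the mean.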