"An approximate answer to the right question is worth a good deal more than the exact answer to an approximate problem."

*John Tukey*

Chapter 3

Sec 3.2

Now that we have seen what a scatterplot looks like, and that it may
suggest an association between the two variables, we can get a little more
descriptive. The plot displays the general direction, form, and strength
of the relationship. Linear relations are important because they are so
common. We say a linear relation is *strong* if the points lie
close to a straight line, and we measure the direction and strength of a linear relationship with the correlation "r":

$$
r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)
$$
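To see the formula at work, here is a minimal Python sketch that computes "r" exactly as written: standardize each x and y, multiply the pairs, add the products, and divide by n-1. The function name `correlation` and the five data pairs are hypothetical, made up for illustration; `statistics.stdev` gives the sample standard deviation (dividing by n-1), which is what $s_x$ and $s_y$ mean here.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Correlation r computed directly from the formula:
    r = 1/(n-1) * sum of (standardized x) * (standardized y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length, with at least 2 pairs")
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return total / (n - 1)

# Hypothetical example data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))  # 0.7746
```

If every x (or every y) is the same, its standard deviation is 0 and "r" is undefined; this sketch would then raise a `ZeroDivisionError`.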

Part of the formula involves "standardizing the observations" (both x and
y) to remove confusion caused by mixing units of measurement...that is done by
computing $(x_i - \bar{x})/s_x$ and the corresponding $(y_i - \bar{y})/s_y$. The
correlation "r" is (nearly) the __average__ of the products of the standardized variables; we divide by n-1 rather than n.
(To have the TI-83 report "r", diagnostics must be turned on: run DiagnosticOn from the CATALOG.)
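Here is a small Python sketch of what standardizing does. The heights are hypothetical values for five imaginary people, recorded once in inches and once in centimeters. After standardizing, the two lists match, and the z-scores have mean 0 and standard deviation 1. The helper name `standardize` is an illustrative choice, not a standard library function.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical heights of the same five people in two units
heights_in = [60, 62, 65, 68, 70]
heights_cm = [h * 2.54 for h in heights_in]

z_in = standardize(heights_in)
z_cm = standardize(heights_cm)

# The z-scores agree (up to floating-point rounding): the units are gone
print(all(abs(a - b) < 1e-9 for a, b in zip(z_in, z_cm)))  # True

# Standardized values have mean 0 and standard deviation 1
print(round(mean(z_in), 10), round(stdev(z_in), 10))
```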

Some additional thoughts on correlation...the formula helps
show that "r" is positive when there is a positive association and "r" is negative
when the association between x and y is negative. When x and y tend to be above their means together (or below them together), the products of their standardized values are mostly positive. When one tends to be above its mean while the other is below, the products are mostly negative. Correlation makes no
distinction between explanatory and response variables...it doesn't matter which
you call x and which you call y. Correlation requires that both variables
be quantitative (numeric) so that arithmetic makes sense. Because "r" uses
the standardized values of the observations, "r" does NOT change when the units of
measurement are changed. The correlation is always a number between -1 and
1, with the strongest linear association at the extreme ends of this interval on
the number line. Values of exactly -1 or 1 indicate that the points of the scatterplot lie
exactly on a straight line, a __perfect__ correlation. Values near 0 indicate a
weak linear relationship.
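Several of these properties can be checked numerically. The following Python sketch uses the same standardized-product formula as above, restated here so the block runs on its own, with the same hypothetical data. It checks that swapping x and y leaves "r" unchanged, that reversing the direction of y makes "r" negative, and that points lying exactly on a straight line give 1 or -1. The lines y = 3x + 1 and y = -2x + 7 are arbitrary choices made for illustration.

```python
import math
from statistics import mean, stdev

def correlation(x, y):
    """r = sum of products of z-scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum((a - x_bar) / s_x * (b - y_bar) / s_y for a, b in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical data

print(math.isclose(correlation(x, y), correlation(y, x)))  # True: order of x and y does not matter
print(correlation(x, [-v for v in y]) < 0)                 # True: negative association gives negative r
print(math.isclose(correlation(x, [3 * v + 1 for v in x]), 1.0))    # True: rising straight line
print(math.isclose(correlation(x, [-2 * v + 7 for v in x]), -1.0))  # True: falling straight line
```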

As handy as correlation is for describing a linear relationship, it
is not a complete description of two-variable data...**always give the means
and standard deviations of both x and y as well.**
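As a reminder of what a fuller summary includes, here is a Python sketch that reports the two means, the two standard deviations, and "r" together. The function name `describe_bivariate` and the data are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def describe_bivariate(x, y):
    """Report the means, standard deviations, and correlation for paired data."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    r = sum((a - x_bar) / s_x * (b - y_bar) / s_y for a, b in zip(x, y)) / (n - 1)
    return {"mean_x": x_bar, "sd_x": s_x, "mean_y": y_bar, "sd_y": s_y, "r": r}

summary = describe_bivariate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```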