"An approximate answer to the right question is worth a good deal more than the exact answer to an approximate problem. "
John Tukey

Chapter 3
    Sec 3.2

Now that we have seen what a scatterplot looks like, and that it may suggest an association between the two variables, we can get a little more descriptive.  The plot displays the general direction, form, and strength of the relationship.  Linear relations are important because they are so common.  We say a linear relation is strong if the points lie close to a straight line, and weak if they are widely scattered about a line.  Our eyes are NOT good judges of how strong a linear relationship is, so our strategy for data analysis is to supplement the graph with a numerical measure.  This measure is called correlation.  The correlation, written "r", measures the direction and strength of the linear relationship between two quantitative variables.  We already know how to calculate the mean and standard deviation, and the formula for r is built from those measures.

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]


Part of the formula involves "standardizing the observations" (both x and y) to remove the confusion caused by mixing units of measurement.  For x, that means subtracting the mean and dividing by the standard deviation, (xi - x̄)/sx, with the corresponding change made for y.  The correlation "r" is essentially the average of the products of the standardized values (the sum is divided by n - 1 rather than n).  (To have the TI-83 report "r", diagnostics must be turned on.)
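To make the formula concrete, here is a minimal sketch in Python that follows it step by step: standardize each x and y, multiply the pairs, add them up, and divide by n - 1.  The function name "correlation" and the small data set (hours studied and quiz scores) are made up for illustration and are not from the text.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r, computed from standardized values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    # Standardize each observation: subtract the mean, divide by the SD.
    zx = [(x - x_bar) / s_x for x in xs]
    zy = [(y - y_bar) / s_y for y in ys]
    # Add up the products of the standardized values and divide by n - 1.
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


# Hypothetical data, invented for this example.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = correlation(hours, scores)
print(round(r, 3))  # 0.775
```

A value of about 0.775 says the made-up points show a fairly strong positive linear association.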

Some additional thoughts on correlation.  The formula helps show that "r" is positive when there is a positive association and negative when the association between x and y is negative.  Correlation makes no distinction between explanatory and response variables; it doesn't matter which you call x and which you call y.  Correlation requires that both variables be quantitative (numeric) so that the arithmetic makes sense.  Because "r" uses the standardized values of the observations, "r" does NOT change when the units of measurement are changed.  The correlation is always a number between -1 and 1, with the strongest associations at the extreme ends of this segment (on the number line).  Values of exactly -1 or 1 indicate that the points of the scatterplot all lie on a line, a perfect correlation.  Values near 0 indicate a weak linear relationship.  Correlation does NOT describe curved relationships, no matter how strong they are.  Because the mean and standard deviation are used to calculate "r", and since those measures are not resistant to outliers, neither is "r".
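Several of these properties can be checked numerically.  The sketch below is only an illustration: it uses the same made-up data as before, a hypothetical helper named "correlation", and an invented unit change (inches to centimeters) and outlier.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r from standardized values (helper for this sketch)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data, invented for this example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = correlation(x, y)
r_yx = correlation(y, x)                    # swap the roles of x and y

x_cm = [2.54 * v for v in x]                # pretend x was inches; convert to cm
r_units = correlation(x_cm, y)              # changing units leaves r alone

r_line_up = correlation(x, [3 * v + 1 for v in x])     # points exactly on a rising line
r_line_down = correlation(x, [10 - 2 * v for v in x])  # points exactly on a falling line

sym = [-2, -1, 0, 1, 2]
r_curve = correlation(sym, [v * v for v in sym])       # perfect curve, but not linear

r_outlier = correlation(x + [20], y + [0])  # add one invented outlier

print(r_xy, r_yx)              # same value both ways
print(r_units)                 # same value again
print(r_line_up, r_line_down)  # 1 and -1, up to rounding
print(r_curve)                 # 0, up to rounding
print(r_outlier)               # the single outlier makes r negative
```

The last line is the one to remember: a single unusual point is enough to flip the sign of "r" in this example, which is why the scatterplot should always be looked at alongside the number.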

As handy as correlation is for describing a linear relationship, it is not a complete description of two-variable data.  Always give the means and standard deviations of both x and y along with "r".
