To be a statistician is great!! You never have to be "absolutely sure" of something...
Being "reasonably certain" is enough!
Pavel E. Guarisma, North Carolina State University


Chapter 4
    Sec 4.2

Be careful... Be careful... Be careful! That is the theme of this section. We have learned many fancy tools for picturing, analyzing, and summarizing data to show associations between two variables. Be cautious when introducing the element of "causation" into the analysis.

First, correlation and regression describe only linear relationships, and they are not resistant (they are affected by outliers and other influential observations). Always plot the data before interpreting a regression or correlation. Here comes an extensive list of cautions:

Do NOT make predictions for x values far outside the range of the data used to fit the line; the relationship may not stay linear out there, so such predictions are often badly inaccurate. This is called extrapolation. DON'T!

So far our skills allow us to look at two variables, but sometimes many variables are at work in a relationship. The original relationship can be influenced by other variables, lurking in the background, that we did not think about. A lurking variable is not among the explanatory or response variables in a study, and yet it may influence the interpretation of the relationships among those variables. It can falsely suggest a strong relationship between x and y, or it can hide a relationship that is really there. DON'T ignore lurking variables! (One useful method for detecting lurking variables is to plot both the response variable and the regression residuals against the time order of the observations, when the time order is available; this requires two plots.)

Beware of using averaged data to draw conclusions about individuals. DON'T! Correlations based on averages are usually TOO HIGH when applied to individuals.

Now we address the true nature of this discussion... the question of causation. In many studies of the relationship between two variables, the goal is to establish that changes in the explanatory variable cause changes in the response variable. Even when a strong association is present, the conclusion that the association is due to a causal link between the variables is elusive.

What constitutes good evidence for causation???

In attempting to explain an association, there are several possibilities.

Direct causation: changing x causes y to change. An example would be (x) the amount of artificial sweetener (saccharin) in a rat's diet and (y) the count of tumors in the rat's bladder.

OR common response: both x and y change in response to some lurking variable. For example, a high school senior's SAT score and first-year college grade point average are associated, but it would be a mistake to conclude that the SAT score causes the GPA; both likely respond to the student's ability and knowledge. The lurking variable(s) create an association even though there may be no direct causal link between x and y.

OR confounding: the effect of x on y is confounded with the effect of a lurking variable z. Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be other explanatory variables or lurking variables. When many variables interact with each other, we are prevented from drawing conclusions about causation. For example, people who regularly attend religious services tend to live longer, but they may also differ in other habits (such as smoking less or exercising more). Because those effects are confounded with attendance, the data alone cannot establish that attending services causes a longer life.

Soooo... the best evidence for causation comes from experiments that change x while holding all other factors FIXED. If the response variable changes, then we have good reason to conclude that x caused the change. However, many causation questions cannot be settled by experiments, which makes it difficult to pinpoint cause and effect in settings involving complex relations among many interacting variables. Sometimes experiments are not practical or ethical. To handle these situations, there are agreed-upon criteria.

Establishing causation in the absence of an experiment requires that:
1)  the association be strong and supported by many studies
2)  the association be consistent across different groups
3)  the association get stronger with higher doses of the alleged cause
4)  the alleged cause be plausible

Even when a well-designed experiment concludes "no evidence found" or notes no association, be careful not to overstate the conclusion; it is best stated as "after careful study, a link between (x) and (y) could not be found."

In conclusion... remember that even a very strong association between two variables is NOT by itself good evidence of a cause-and-effect link between them. Be careful again, though, since even well-established causal relations may not generalize to other settings.