It is easy to lie with statistics, but it is easier to lie without them.
When you examine the relationship between two or more variables, first ask the preliminary questions as before:
* What individuals do the data describe?
* What exactly are the variables? How are they measured?
* Are all the variables quantitative, or is at least one a categorical variable?
When we have data on several variables, categorical variables are often present and help organize the data. There is one MORE question you should ask when examining relationships among several variables:
* Do you want simply to explore the nature of the relationship, OR do you think that some of the variables explain, or even cause, changes in others? That is, are some variables independent and others dependent? In statistics we keep this familiar idea but change the vocabulary: independent variables (plotted horizontally) are called explanatory variables (x), and dependent variables (plotted vertically) are called response variables (y). A response variable measures an outcome of a study, while an explanatory variable tries to explain the observed outcomes. It will take a little while to get used to the new names instead of just saying x and y.
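To make the vocabulary concrete, here is a minimal Python sketch. The data set, its variable names (`hours_studied`, `exam_score`, `class_section`) and its values are hypothetical, invented only for illustration. Each dictionary is one individual and each key is a variable; the helper `variable_types` (also hypothetical) labels each variable as quantitative or categorical, and the `roles` dictionary records which variable we choose to treat as explanatory and which as response.

```python
# Hypothetical data: each dict is one individual (a student),
# and each key is a variable measured on that individual.
students = [
    {"hours_studied": 2.0, "exam_score": 65, "class_section": "A"},
    {"hours_studied": 5.5, "exam_score": 82, "class_section": "B"},
    {"hours_studied": 4.0, "exam_score": 74, "class_section": "A"},
]

def variable_types(data):
    """Label a variable 'quantitative' if every value is a number, else 'categorical'."""
    types = {}
    for name in data[0]:
        values = [row[name] for row in data]
        is_number = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        types[name] = "quantitative" if is_number else "categorical"
    return types

# The roles are a choice made by the investigator, not something the data decide.
# Assumed question: does study time help explain exam score?
roles = {"explanatory": "hours_studied", "response": "exam_score"}

print(variable_types(students))
print(roles)
```

Note that nothing in the numbers themselves says which variable is explanatory; that comes from the question you are asking about the individuals.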
The techniques used to study relationships among variables are more complex than the one-variable methods. However, the principles that guide the examination of data are the same:
* First plot the data, then add numerical summaries.
* Look for overall patterns and deviations from those patterns.
* When the overall pattern is quite regular, use a compact mathematical model to describe it.
See Ex. 3.1 (page 122) for these principles at work.
A scatterplot is the most effective way to display the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis (x axis), and the values of the other variable appear on the vertical axis (y axis). Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual. If there is an explanatory variable, plot it on the horizontal axis; if there is no distinction between the explanatory and response variables, you may plot either one on the horizontal axis.
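The construction of a scatterplot can be sketched in plain Python without any plotting library. This is a minimal, assumed text-only version (the function `text_scatterplot` and the data values are hypothetical): each (x, y) pair is scaled onto a character grid, so the explanatory variable runs left to right and the response variable runs bottom to top.

```python
def text_scatterplot(xs, ys, width=40, height=12, mark="*"):
    """Return a character-grid scatterplot: x on the horizontal axis, y on the vertical."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need the same, nonzero number of x and y values")
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1  # avoid dividing by zero when all x are equal
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index on the grid.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = mark  # row 0 is printed at the top, so flip y
    lines = ["|" + "".join(cells) for cells in grid]  # "|" marks the vertical axis
    lines.append("+" + "-" * width)                    # horizontal axis
    return "\n".join(lines)

hours = [1.0, 2.5, 3.0, 4.5, 6.0]   # hypothetical explanatory variable
scores = [58, 64, 70, 77, 88]       # hypothetical response variable
print(text_scatterplot(hours, scores))
```

In practice you would use a plotting library or statistical software; the point here is only that each individual becomes one point located by its two values.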
We will discuss Ex. 3.6 (page 125) in detail.
Interpreting a scatterplot consists of:
* looking for the overall pattern and for striking deviations from that pattern
* describing the overall pattern by the form (linear or some other mathematical model), direction (positive or negative), and strength (how closely the points follow a clear form) of the relationship
* looking for outliers: an outlier is an individual value that falls outside the overall pattern
* noting any clusters in the data
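Once the data are plotted, a numerical summary can back up what the eye sees. The sketch below is an assumption on our part, not something defined in these notes: it computes the correlation r, a common summary of the direction and strength of a *linear* relationship. The function names and data are hypothetical. Because r only measures straight-line patterns, it does not replace the plot: a strong curved relationship can have r equal to 0, and a single outlier can change r a lot.

```python
import math

def correlation(xs, ys):
    """Correlation r between two quantitative variables, between -1 and 1.

    The sign of r gives the direction of a linear pattern; values near -1 or 1
    mean the points lie close to a straight line.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no spread has no correlation")
    return sxy / math.sqrt(sxx * syy)

def direction(r):
    """Direction of a linear pattern, read from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "no linear direction"

hours = [1.0, 2.5, 3.0, 4.5, 6.0]   # hypothetical explanatory variable
scores = [58, 64, 70, 77, 88]       # hypothetical response variable
r = correlation(hours, scores)
print(round(r, 3), direction(r))

# A perfect curved (U-shaped) pattern with no linear direction:
print(correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # → 0.0
```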
See Ex. 3.1 (page 127) for a strong linear relationship.
Of course, NOT all relationships are linear, and not all have a clear direction. To add a categorical variable to a scatterplot, use a different color or symbol for the points in each category.
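The idea of plotting a categorical variable with different symbols can be sketched the same text-only way. In this minimal sketch (the function `grouped_text_scatterplot`, the category labels, and the symbols are all hypothetical), each individual carries a category, each category is drawn with its own character, and a legend line is added under the axis.

```python
def grouped_text_scatterplot(points, symbols, width=40, height=12):
    """Character-grid scatterplot where each category gets its own symbol.

    points:  list of (x, y, category) tuples
    symbols: dict mapping each category to a single plotting character
    """
    if not points:
        raise ValueError("need at least one point")
    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1  # avoid division by zero
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y, category in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # A later point landing in the same cell overwrites an earlier one.
        grid[height - 1 - row][col] = symbols[category]
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)
    # Legend: which symbol stands for which category
    lines.append("  ".join(f"{mark} = {cat}" for cat, mark in symbols.items()))
    return "\n".join(lines)

# Hypothetical data: (hours studied, exam score, class section)
data = [(1.0, 58, "A"), (2.5, 64, "B"), (3.0, 70, "A"), (4.5, 77, "B"), (6.0, 88, "A")]
print(grouped_text_scatterplot(data, {"A": "o", "B": "x"}))
```

Comparing where the `o` points and the `x` points fall shows whether the relationship looks the same in each category.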