"To guess is cheap. To guess
wrongly is expensive."

*Chinese proverb*

Chapter 3

Sec 3.3

If plotting the data
results in a scatterplot that suggests a linear relationship, it is useful
to summarize the overall pattern by drawing a line through the scatterplot.
*Least squares regression* is the method for doing this, but only in a specific
situation: one variable must help explain or predict the other. A regression
line (LSRL, the Least Squares Regression Line) is a
straight line that describes how a response variable y changes as an
explanatory variable x changes. The line is a mathematical model used to
*predict* the value of y for a given x. Regression therefore requires that we
have an explanatory variable and a response variable.

No line will pass
through all the data points unless the relationship is PERFECT. More likely the
line will mimic the points, and it should be as close to them as possible. Close
means "close in the vertical direction." Error is defined as observed value -
predicted value, and we are seeking the line that makes these errors as small
as possible overall. Specifically, the least squares regression line of y on x is
the line that makes the sum of the *squares* of the vertical distances of
the data points from the line as small as possible. Yes, actual *squares*.
See page 152 for a visual.
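
As a minimal sketch of what "the sum of the squares of the vertical distances" means, the Python below scores two candidate lines on a small data set. The five data points and the function name `sum_squared_errors` are made up for illustration; they do not come from the text.

```python
# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def sum_squared_errors(intercept, slope, xs, ys):
    """Add up the squared vertical distances (observed y minus predicted y)."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = intercept + slope * x
        error = y - predicted
        total += error ** 2
    return total


# Two candidate lines, each written as y hat = intercept + slope * x.
sse_guess = sum_squared_errors(1.0, 1.0, xs, ys)  # an eyeballed line
sse_lsrl = sum_squared_errors(2.2, 0.6, xs, ys)   # the least squares line for this data

print("eyeballed line:", sse_guess)
print("least squares line:", sse_lsrl)
```

For these made-up points the second line gives the smaller total, and no other straight line gives a smaller one.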

The least squares
regression line has the same form as any line: it has a slope and an intercept.
To indicate that this is a calculated line we will change from "y =" to "y hat
=". It can be shown that **the slope b = r (s_{y}/s_{x})**,
where r is the correlation and s_{x} and s_{y} are the standard deviations of x
and y. Note: the standard deviations appear in the same order as in the
usual slope formula from Algebra I (change in y / change in x). The

A quantity related to the
regression output is r^{2}. Although it looks like this
quantity is simply the square of r, there is much more to learn.
**When r^{2} is close to 0, the regression line is NOT a good model for the
data.**

Let's see the text (pp. 158-162) for
the complete explanation of how r^{2} is developed from
previously measured values. Once we understand how the method is
derived, we shall use the calculator to calculate the values.
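
One common way to develop r^{2}, which may not match the textbook's presentation step for step, compares the variation of y about its mean with the variation left over about the regression line. The sketch below does that arithmetic on invented data; the variable names are illustrative, and the line y hat = 2.2 + 0.6x is taken as given as the least squares line for these points.

```python
# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least squares line for this data (y hat = 2.2 + 0.6x), taken as given here.
intercept, slope = 2.2, 0.6

y_bar = sum(ys) / len(ys)

# Total variation in y: squared distances from the mean of y.
total_variation = sum((y - y_bar) ** 2 for y in ys)

# Left-over variation: squared vertical distances from the regression line.
leftover_variation = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)
)

# Fraction of the variation in y that the line accounts for.
r_squared = (total_variation - leftover_variation) / total_variation

print("r squared:", round(r_squared, 4))
```

Here the line accounts for 60% of the variation in y, which agrees with squaring the correlation for these points (r is about 0.77).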

Some additional facts about least squares regression are:

Regression is one of the most common statistical settings, and least squares
is the most common method for fitting a regression line to data. (Another
method is the median-median line, which often produces a line similar to the
LSRL.)

The order of the variables (explanatory and response) is critical when
calculating a regression line. Interchanging x and y produces a different line.

There is a close connection between correlation and the slope of the least
squares line.

**The least squares regression line always passes through the point (x̄, ȳ)**,
that is, the mean of x and the mean of y.

The correlation r describes the strength of a straight-line relationship. The
square of the correlation, r^{2}, is the fraction of the variation in the
values of y that is explained by the regression of y on x.

Remember, when you report a regression line it is a good idea to include
r^{2} as a measure of how successful the regression was in explaining the
response.
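
Several of these facts can be checked numerically. The sketch below uses invented data and hypothetical helper names (`mean`, `std_dev`, `correlation`, `lsrl`). It computes the slope as r (s_{y}/s_{x}), picks the intercept so that the line passes through the point of means, and then fits again with x and y interchanged.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation r: average product of standardized x and y values."""
    mx, my = mean(xs), mean(ys)
    sx, sy = std_dev(xs), std_dev(ys)
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def lsrl(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    slope = correlation(xs, ys) * std_dev(ys) / std_dev(xs)
    # Choosing the intercept this way forces the line through (x bar, y bar).
    intercept = mean(ys) - slope * mean(xs)
    return intercept, slope


a, b = lsrl(xs, ys)                  # y on x
a_swapped, b_swapped = lsrl(ys, xs)  # x on y: roles interchanged
r = correlation(xs, ys)

print("y hat =", round(a, 3), "+", round(b, 3), "x")
print("x hat =", round(a_swapped, 3), "+", round(b_swapped, 3), "y")
print("r squared =", round(r ** 2, 3))
```

Solving the first line for x does not give the second line, which is why the choice of explanatory and response variable matters.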

When the regression line is
calculated by least squares and the vertical distances from the data points to
the line are measured, those distances represent "left-over" variation that the
line does not account for. These distances are called **residuals**.
A residual is the difference between an observed value of the response
variable and the value predicted by the regression line, that is,
observed y - predicted y.
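
A short sketch of the residual calculation, using invented data and taking y hat = 2.2 + 0.6x as the least squares line for those points (both are assumptions for illustration, not values from the text):

```python
# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least squares line for this data (y hat = 2.2 + 0.6x), taken as given here.
intercept, slope = 2.2, 0.6

residuals = []
for x, y in zip(xs, ys):
    predicted = intercept + slope * x
    residuals.append(y - predicted)  # observed minus predicted

for x, res in zip(xs, residuals):
    print(f"x = {x:.0f}: residual = {res:+.2f}")
```

A positive residual means the point lies above the line and a negative one means it lies below. For a least squares line the residuals add up to zero, apart from rounding.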

Several things can show up when
viewing residuals:

A curved pattern might appear, showing that the relationship is not linear.

Increasing or decreasing spread about the line as x increases indicates that
prediction of y will be less accurate where the spread is larger (for larger
values of x when the spread increases).

Individual points with large residuals are outliers in the vertical
direction.

Individual points that are extreme in the x direction are also important, as
influential observations.
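
To see why a point that is extreme in the x direction matters, the sketch below fits the slope with and without one such point. The data, the added point (15, 2), and the helper name `least_squares_slope` are all invented for illustration. The slope is computed in sum-of-products form, which is algebraically the same as r (s_{y}/s_{x}).

```python
def least_squares_slope(xs, ys):
    """Slope of the least squares line; equivalent to r * (s_y / s_x)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# The same data plus one point far out in the x direction.
xs_extra = xs + [15.0]
ys_extra = ys + [2.0]

slope_without = least_squares_slope(xs, ys)
slope_with = least_squares_slope(xs_extra, ys_extra)

print("slope without the extreme point:", round(slope_without, 3))
print("slope with the extreme point:", round(slope_with, 3))
```

In this made-up case a single point is enough to turn a positive slope into a slightly negative one.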

Some definitions...

An *outlier*
is an observation that lies outside the overall pattern of the other
observations.

An observation is

We will complete the activity on page 154. You have experience from Algebra 2.