The latest survey shows that 3 out of 4 people make up 75% of the world's population.

Chapter 5

Sec 5.1

Now that we are experts at evaluating someone else's data, it is time to learn how to produce valid data that will lead to probable predictions and reliable conclusions.

Since we cannot put a question to the entire population of the U.S., we can put the question to a sample that represents the opinion of the entire population. How this sample is chosen is extremely important. A sample is a picture (snapshot) of the entire population with little interference resulting from the act of gathering the information. This is called an **observational study**. Other times we can gather data from an experiment.

Observational studies of the effect of one variable on another often fail because the explanatory variable is confounded with lurking variables, as discussed earlier. Well-designed experiments take steps to defeat confounding. In some situations, it may not be possible to observe individuals directly or to perform an experiment. In other cases, it may be logistically difficult or inconvenient to obtain a sample or to impose a treatment. *Simulation* provides an alternative method in these circumstances. After producing data, the next logical step is to use formal statistical inference, which answers specific questions with a known degree of confidence.

The entire group of individuals that we want information about is called the __population__. A __sample__ is a part of the population that we actually examine in order to gather information. Note: Sampling and conducting a census are two distinct ways of collecting data. __Sampling__ involves studying a part in order to gain information about the whole. A __census__ attempts to contact every individual in the entire population. A carefully constructed sample is often more accurate than a census. Accountants sample a firm's inventory to verify the accuracy of the records because attempting to count every last item in the warehouse would be expensive and time-consuming. The DESIGN of a sample refers to the method used to choose the sample from the population. Poor sample designs can produce misleading conclusions.

Several issues can compromise the results of a study conducted on a sample of the population. A voluntary response sample consists of people who choose themselves by responding to a general appeal, like a telephone call-in poll. This is one of the common bad sample designs. Another is convenience sampling, which chooses the individuals easiest to reach. Both voluntary response samples and convenience samples choose a sample that is almost *guaranteed* not to represent the entire population. These sampling methods display BIAS, or systematic error, by favoring some parts of the population over others.
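To see how a voluntary response sample can mislead, here is a small simulation sketch. Everything in it is an assumption made up for illustration: the population size, the true share of "yes" opinions (30%), and the guess that people who hold the "yes" opinion are five times as likely to call in. It is not real survey data.

```python
import random

def simulate_voluntary_response(pop_size=10_000, true_share=0.30, seed=1):
    """Compare a call-in style sample with the whole population.

    Hypothetical setup: each person says 'yes' with probability true_share,
    and 'yes' people are assumed to be five times as likely to call in.
    """
    rng = random.Random(seed)
    # True means the person holds the 'yes' opinion.
    population = [rng.random() < true_share for _ in range(pop_size)]
    # Assumed response rates (illustrative only, not measured values).
    respond_prob = {True: 0.25, False: 0.05}
    callers = [opinion for opinion in population
               if rng.random() < respond_prob[opinion]]
    pop_share = sum(population) / len(population)
    caller_share = sum(callers) / len(callers)
    return pop_share, caller_share

pop_share, caller_share = simulate_voluntary_response()
print(f"share of 'yes' in the population: {pop_share:.2f}")
print(f"share of 'yes' among callers:     {caller_share:.2f}")
```

Under these assumed response rates, the callers overstate the "yes" share by a wide margin, and taking more callers would not fix it. That is what systematic error means: the error comes from the design, not from bad luck.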

The statistical remedy for the bias introduced by personal choice is to allow impersonal *CHANCE* to choose the sample. A sample chosen by chance allows neither favoritism by the sampler nor self-selection by respondents. Choosing a sample by chance attacks bias by giving all individuals an equal chance to be chosen. The simplest way to use chance to select a sample is to place names in a hat (from the entire population) and draw out a handful (the sample). This is the idea of SIMPLE RANDOM SAMPLING (SRS). A simple random sample of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected. An SRS not only gives each individual an equal chance to be chosen but also gives every possible sample an equal chance to be chosen. In practice, computer software can choose an SRS from a list of individuals in the population by using a random number generator or by consulting a table of random digits.
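As a rough sketch of the "names in a hat" idea in software, Python's standard `random.sample` draws without replacement, so every set of n individuals is equally likely. The roster of 30 students, the sample size of 5, and the function name `simple_random_sample` are all made up for this illustration.

```python
import math
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals drawn without replacement."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: a class roster of 30 students.
roster = [f"student_{i:02d}" for i in range(1, 31)]
sample = simple_random_sample(roster, 5, seed=42)

# Number of different samples of size 5 that could have been drawn.
possible_samples = math.comb(30, 5)

print(sample)
print(possible_samples)  # 142506
```

With 30 individuals and n = 5 there are 142,506 possible samples, and an SRS gives each of them the same chance of being the one selected.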

A table of random digits is a long string of the digits 0, 1, 2, 3, ..., 9 with two properties: (1) each entry in the table is equally likely to be any of the 10 digits 0 through 9, and (2) the entries are independent of each other.

See Ex. 5.4 (page 276) for an explanation of how random numbers are used to create a simple random sample. The steps are:

1) Assign a numerical label to every individual in the population.

2) Use a table or random number generator to select labels at random.
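The two steps above can be sketched in code. This is only an illustration under stated assumptions: the population is a made-up list of 30 students labeled 01 through 30, and the digit string is an arbitrary one typed in for the example, not a line copied from any published table. The helper name `srs_from_digits` is hypothetical. Groups of digits that match no label, or that repeat a label already chosen, are skipped.

```python
def srs_from_digits(labels_to_names, digits, n):
    """Pick n individuals by reading fixed-width groups from a digit string."""
    digits = digits.replace(" ", "")          # tables are often printed in blocks
    width = len(next(iter(labels_to_names)))  # all labels share the same width
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        group = digits[start:start + width]
        # Skip groups that are not labels and labels already selected.
        if group in labels_to_names and group not in chosen:
            chosen.append(group)
        if len(chosen) == n:
            break
    if len(chosen) < n:
        raise ValueError("ran out of digits before the sample was complete")
    return [labels_to_names[label] for label in chosen]

# Step 1: assign two-digit labels 01..30 (hypothetical population).
labels = {f"{i:02d}": f"student_{i:02d}" for i in range(1, 31)}

# Step 2: read two-digit groups: 48 52 07 31 16 90 27 25
print(srs_from_digits(labels, "48520 73116 90272 5", 4))
# ['student_07', 'student_16', 'student_27', 'student_25']
```

Giving every label the same number of digits matters: it ensures each individual has the same chance of matching the next group read from the table.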

*The use of CHANCE to select the sample is the essential principle of statistical sampling.* A probability sample is a sample chosen by chance. We must know what samples are possible and what chance, or probability, each possible sample has.

When sampling from large populations, it is common to sample important groups within the population separately and then combine these samples. This is a *stratified* sample. To select a *stratified random sample*, first divide the population into groups of similar individuals, called *strata*. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.
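A minimal sketch of a stratified random sample follows. The population (200 undergraduates and 50 graduate students), the per-stratum sample sizes, and the function name `stratified_random_sample` are assumptions chosen only to illustrate the procedure.

```python
import random
from collections import defaultdict

def stratified_random_sample(population, stratum_of, sizes, seed=None):
    """Take a separate SRS inside each stratum and combine the results.

    stratum_of: function mapping an individual to its stratum name.
    sizes: dict mapping stratum name to the SRS size wanted in that stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)
    sample = []
    for name, n in sizes.items():
        sample.extend(rng.sample(strata[name], n))  # SRS within the stratum
    return sample

# Hypothetical population: (name, stratum) pairs.
students = ([(f"undergrad_{i}", "undergrad") for i in range(200)]
            + [(f"grad_{i}", "grad") for i in range(50)])

sample = stratified_random_sample(
    students, lambda s: s[1], {"undergrad": 8, "grad": 2}, seed=7)
print(sample)
```

Unlike an SRS of 10 from all 250 students, this design guarantees that both groups appear in the sample in the chosen numbers.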

Another common means of restricting random selection is to choose the sample in stages. For example, the Current Population Survey uses a multistage sampling design, as do opinion polls and other national samples.
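The idea of sampling in stages can be sketched with two stages: first choose some groups at random, then choose individuals at random within each chosen group. This is a simplified illustration and not the actual design of the Current Population Survey; the counties, households, and the function name `two_stage_sample` are invented for the example.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of members in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical population: 8 counties with 20 households each.
clusters = {
    f"county_{c}": [f"county_{c}_household_{h}" for h in range(20)]
    for c in "ABCDEFGH"
}

sample = two_stage_sample(clusters, n_clusters=3, n_per_cluster=4, seed=3)
for county, households in sample.items():
    print(county, households)
```

Only the chosen counties need a household list, which is one practical reason national samples are often drawn in stages.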

There are a few cautions about using sample surveys in particular. When the population consists of human beings, accurate information from a sample requires much more than a good sampling design. To begin, we need an accurate and complete list of the population. Because such a list is rarely available, MOST SAMPLES SUFFER FROM SOME DEGREE OF __UNDER-COVERAGE__. The results of national sample surveys have some BIAS if the people not covered, who often are poor, differ from the rest of the population. A more serious source of bias in most sample surveys is NON-RESPONSE, which occurs when a selected individual cannot be contacted or refuses to cooperate.

Again: under-coverage occurs when some groups in the population are left out of the process of choosing the sample. Non-response occurs when an individual chosen for the sample can't be contacted or does not cooperate. Under-coverage is always present with a national census since the list of addresses is incomplete with respect to homeless people.

In addition, the behavior of the respondent or of the interviewer can cause response bias.

Respondents may lie, and an interviewer whose attitude suggests that some answers are more desirable than others will get these answers more often. The race or sex of the interviewer can influence responses, as can the faulty memory of participants. Good interviewing technique is another aspect of a well-done sample survey. The wording of questions is the most important influence on the answers. Never trust the results of a sample survey until you have read the exact questions posed. The sampling design, the amount of non-response, and the date of the survey are also important. Good statistical design is a part, but only a part, of a trustworthy survey.

Some final comments: Using chance to choose a sample eliminates bias in the actual selection of the sample, but it is unlikely that results from a sample are exactly the same as for the entire population. Properly designed samples avoid systematic bias, but their results are rarely exactly correct and they vary from sample to sample. Because we deliberately use chance, the results obey the laws of probability that govern chance behavior. (We will study all the laws of probability in Chapter 6.) Results from a survey usually come with a margin of error, which we will learn about in Chapter 10. Finally, larger random samples give more accurate results than smaller samples.
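A short simulation can illustrate the last two points: results vary from sample to sample, and larger random samples vary less. This is a sketch under assumptions chosen for the example. The true population proportion (0.6), the sample sizes, and the number of repeated samples are made up, and the population is treated as so large that each draw is independent.

```python
import random
import statistics

def spread_of_sample_proportions(true_p, n, repeats=500, seed=0):
    """Draw many random samples of size n and summarize their sample proportions."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repeats):
        # Each individual says 'yes' with probability true_p (assumed value).
        hits = sum(rng.random() < true_p for _ in range(n))
        proportions.append(hits / n)
    return statistics.mean(proportions), statistics.stdev(proportions)

for n in (100, 1000):
    center, spread = spread_of_sample_proportions(0.6, n)
    print(f"n = {n:4d}: mean of sample proportions = {center:.3f}, "
          f"standard deviation = {spread:.3f}")
```

In both cases the sample proportions center near the assumed true value, which reflects the absence of systematic bias, while the spread is noticeably smaller for the larger samples.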