There are two kinds of statistics, the kind you look up,
and the kind you make up.
Rex Stout


Sec. 10.3

Pharmaceutical companies require significant evidence of effectiveness and safety prior to the introduction of a new product.

Sellers/marketers want to know whether a new ad campaign significantly outperforms the old one before expending major funds.

Medical researchers want to know whether a new therapy performs better than the old one.

The calculations are relatively easy, but using the tests wisely requires study.

The purpose of a test of significance is to give a clear statement about the degree of evidence provided by the sample (or samples) against the null hypothesis.

Using the test statistic and p-value to make a decision usually requires a predetermined level of strength of evidence, called the alpha (α) level or significance level.
(Oftentimes alpha is NOT fixed ahead of time.)
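
As an illustration of comparing a p-value with alpha, here is a minimal sketch of a two-sided one-sample z-test. The numbers (sample mean 103, hypothesized mean 100, known sigma 15, n = 100) and the function name z_test_p_value are made up for this example and do not come from the text.

```python
import math

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: mu = mu0, assuming a known population sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

alpha = 0.05  # significance level chosen before looking at the data
z, p = z_test_p_value(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
reject = p < alpha

print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0 at alpha = {alpha}: {reject}")
```

With these illustrative numbers the test statistic is z = 2, and the p-value falls just below 0.05, so H0 would be rejected at the 5% level but not at the 1% level.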

Selecting the significance level is based on:
1) How plausible is H0? If H0 is generally held to be true, then STRONG evidence (a very small alpha, and hence a very small p-value) will be needed to convince people to change their minds.
2) What are the consequences of rejecting H0? Will rejecting H0 mean large expenditures to support Ha? If yes, then strong evidence (small alpha) is needed.

A good idea is to report the specific p-value and let each reader decide on the strength of evidence required.
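
To see why reporting the p-value itself is useful, the sketch below checks one hypothetical p-value (0.03, chosen only for illustration) against several alpha levels that different readers might require. The helper name decisions is an assumption of this example.

```python
def decisions(p_value, alphas=(0.10, 0.05, 0.01)):
    """Map each alpha level to True if H0 would be rejected at that level."""
    return {alpha: p_value < alpha for alpha in alphas}

result = decisions(0.03)
for alpha, reject in result.items():
    print(f"alpha = {alpha:.2f}: {'reject H0' if reject else 'do not reject H0'}")
```

A reader who requires alpha = 0.10 or 0.05 would reject H0 here, while a reader who requires alpha = 0.01 would not; the reported p-value lets each reader draw that line.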

Note: Fixed alpha levels were used before fast computers were available, when results had to be looked up in tables. The most common level was 0.05 (or 5%), which is sometimes considered the rule of thumb, but this is NOT universal.

Yet, even with all these precautions, flawed conclusions can still result.

Statistical inference may not be valid for all data, i.e., data generated by badly designed surveys or poorly designed experiments.
Randomization and the laws of probability underlie confidence intervals and tests of significance. Always question HOW the data were gathered.
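
One way to see how randomization underlies a test of significance is a permutation test, which is not covered in this section and is shown here only as a sketch. If subjects were assigned to two groups at random and H0 is true, the group labels are arbitrary, so reshuffling them shows how large a difference chance alone produces. The data values and the function name permutation_test are hypothetical.

```python
import random

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Approximate two-sided p-value for a difference in means by reshuffling labels."""
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling, as if H0 were true
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # Count the observed labeling itself so the estimate is never exactly zero
    return (hits + 1) / (n_permutations + 1)

# Hypothetical measurements for two randomly assigned groups
group_a = [12, 15, 14, 16, 13, 17]
group_b = [10, 11, 9, 12, 10, 11]
p = permutation_test(group_a, group_b)
print(f"approximate p-value = {p:.4f}")
```

The reshuffling argument only makes sense if the original assignment really was random, which is why the way the data were gathered matters so much.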

A common experimental flaw is one called "the Hawthorne Effect". The Hawthorne Effect states that when subjects know they are part of a study, that knowledge causes a change in their behavior that may not be attributable to the question being studied.
Example: A manufacturing facility tries playing music in the background to see if the productivity of its workers improves. Even if improvement occurs, one still cannot conclude with certainty that the change was caused by the music.

Some important points to reconsider:

Have a hypothesis in place before deciding on a data-gathering procedure.

Design a study (experiment, simulation) that seeks to demonstrate the effect. Make sure the subjects are randomly selected and the data are unbiased.

Do the calculations to test the statistical significance of the data, and use these values as "real" evidence.
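
The three points above can be sketched for the ad campaign question raised earlier, using a one-sided two-proportion z-test. The hypotheses are fixed first (H0: the new campaign's response rate equals the old one's; Ha: it is higher), and customers are assumed to have been assigned to the campaigns at random. The counts and the function name two_proportion_z_test are invented for illustration.

```python
import math

def two_proportion_z_test(successes_old, n_old, successes_new, n_new):
    """One-sided test of H0: p_new = p_old against Ha: p_new > p_old."""
    p_old = successes_old / n_old
    p_new = successes_new / n_new
    # Pooled proportion: the best estimate of the common rate if H0 is true
    pooled = (successes_old + successes_new) / (n_old + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    # Upper-tail probability P(Z >= z) for a standard normal Z
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical results: 200 of 2000 responded to the old ad, 250 of 2000 to the new
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

Reporting both z and the p-value, rather than only "significant" or "not significant", leaves the decision about how much evidence is enough to the reader.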