If you want three opinions, just ask two statisticians.


Chapter 7
    Sec 7.1

Sample spaces need not consist of numbers.  In statistics, however, we are most often interested in numerical outcomes, such as the count of how often something occurs.  A variable X that records such an outcome is called a random variable because its value varies when the phenomenon is repeated.  We denote random variables by capital letters near the end of the alphabet, such as X or Y.

A random variable is a variable whose value is a numerical outcome of a random phenomenon.  When a random variable describes a random phenomenon, the sample space S just lists the possible values of the random variable.  Two ways of assigning probabilities to the values of a random variable, one for discrete and one for continuous random variables, will dominate our application of probability as we study statistical inference.
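To make the definition concrete, here is a minimal Python sketch built on an assumed example that is not in the text: toss a coin four times and let X be the number of heads.  The raw outcomes are strings such as HTHH, which are not numbers; X turns each one into a count, and the possible values of X then serve as the sample space.  The function name X is just an illustrative choice.

```python
from itertools import product

# Assumed example: four tosses of a coin.
# Each raw outcome is a string such as "HTHH" (not a number).
outcomes = ["".join(toss) for toss in product("HT", repeat=4)]

def X(outcome: str) -> int:
    """Random variable (illustrative): the number of heads in the outcome."""
    return outcome.count("H")

# The sample space for X just lists its possible values.
values_of_X = sorted({X(o) for o in outcomes})

print(len(outcomes))   # 16 raw outcomes
print(values_of_X)     # [0, 1, 2, 3, 4]
```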

Random variables can be either discrete or continuous.  A discrete random variable X has a countable number of possible values.  The probability distribution of X lists the values x1, x2, ..., xk and their probabilities p1, p2, ..., pk in table form.  The probabilities must satisfy two requirements:
1) every probability pi is a number between 0 and 1;
2) p1 + p2 + ... + pk = 1.
The probability of any event is found by adding the probabilities pi of the particular values xi that make up the event.
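The two requirements and the adding rule can be checked mechanically.  The sketch below assumes a probability table for the number of heads in four tosses of a fair coin (1/16, 4/16, 6/16, 4/16, 1/16); the helper names is_valid and prob are hypothetical, and exact fractions are used so the sum comes out to exactly 1.

```python
from fractions import Fraction

# Assumed probability table: X = number of heads in four tosses
# of a fair coin (values x_i mapped to probabilities p_i).
dist = {
    0: Fraction(1, 16),
    1: Fraction(4, 16),
    2: Fraction(6, 16),
    3: Fraction(4, 16),
    4: Fraction(1, 16),
}

def is_valid(dist):
    """Check both requirements: 0 <= p_i <= 1 and p_1 + ... + p_k == 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1

def prob(event, dist):
    """P(event): add the p_i of the values x_i that make up the event."""
    return sum(p for x, p in dist.items() if event(x))

print(is_valid(dist))                   # True
print(prob(lambda x: x >= 3, dist))     # 5/16
```

Here the event "at least 3 heads" is made up of the values 3 and 4, so its probability is 4/16 + 1/16 = 5/16.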

In Chapters 1 and 2 we used histograms and density curves to describe the distributions of quantitative data.  In this chapter we use analogous methods to describe probability distributions: histograms for discrete random variables and density curves for continuous ones.  For a discrete random variable, a probability histogram can display the distribution instead of a table.  The height of each bar shows the probability of the outcome at its base.  Because the heights are probabilities, they add to 1.  All the bars in the histogram have the same width, so the areas of the bars also display the assignment of probability to outcomes.  See Ex. 7.2, page 394, for more explanation.
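Because every bar has the same width, bar area is proportional to bar height, so the areas also sum to 1.  A rough sketch, assuming the same coin-toss table as an example, a bar width of 1 (a hypothetical choice), and a crude text histogram in place of a plot:

```python
from fractions import Fraction

# Assumed distribution: count of heads in four fair coin tosses.
dist = {0: Fraction(1, 16), 1: Fraction(4, 16), 2: Fraction(6, 16),
        3: Fraction(4, 16), 4: Fraction(1, 16)}

BAR_WIDTH = 1  # hypothetical: every bar has the same width, centered on its value

# Area of each bar = height (probability) * width.
areas = {x: p * BAR_WIDTH for x, p in dist.items()}
total_area = sum(areas.values())

# Crude text histogram: one '#' per 1/16 of probability.
for x, p in dist.items():
    print(f"{x} | {'#' * int(p * 16)}")
print("total area:", total_area)   # total area: 1
```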

A continuous random variable can take any value in an interval, so it has infinitely many possible values, and other methods must be employed.  We cannot assign a probability to EACH individual value of X and then add them up, since there are INFINITELY many possible values.  Instead, we assign probabilities directly to events using areas under a density curve.  Any density curve has area exactly 1 underneath it, corresponding to total probability 1.
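As an illustration, assume X is a random number spread evenly over the interval from 0 to 1, so its density curve is a flat line of height 1 there.  The sketch below approximates areas under that curve with the midpoint rule, a numerical method chosen here for illustration (for a flat density the exact area is simply width times height).  The names density and area_under are hypothetical.

```python
# Assumed example: X is uniform on [0, 1], so the density curve is
# a flat line of height 1 on that interval and 0 elsewhere.
def density(x: float) -> float:
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def area_under(f, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

print(round(area_under(density, 0.0, 1.0), 6))   # total probability: 1.0
print(round(area_under(density, 0.3, 0.7), 6))   # P(0.3 <= X <= 0.7): 0.4
```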

More formally...

A continuous random variable X takes all values in an interval of numbers.  The probability distribution of X is described by a density curve.  The probability of any event is the area under the density curve and above the values of X that make up the event.

The probability model for a continuous random variable assigns probabilities to intervals of outcomes rather than to individual outcomes.  In fact, all continuous probability distributions assign probability 0 to every individual outcome: a single value is an interval of width 0, so the area above it is 0.  Only intervals of values have positive probability.
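One way to see why a single value gets probability 0 is to shrink an interval around it and watch the area vanish.  This sketch uses the standard normal distribution from Python's statistics.NormalDist as an example density (the choice of distribution and of the point 1 is an assumption); probabilities of intervals come from differences of the cumulative distribution function.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal, used here only as an example

# P(1 - w <= Z <= 1 + w) for narrower and narrower intervals around 1.
for w in [0.5, 0.1, 0.01, 0.001]:
    p = Z.cdf(1 + w) - Z.cdf(1 - w)
    print(f"half-width {w}: probability {p:.6f}")

# A single value is an interval of width 0, so its probability is 0.
print(Z.cdf(1) - Z.cdf(1))  # 0.0
```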

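We ignore the distinction between > and ≥ (and between < and ≤) when finding probabilities for continuous random variables, because the endpoint itself has probability 0.  We keep the distinction when working with discrete random variables, where a single value can carry positive probability.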
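A small contrast, assuming the coin-toss table for the discrete case and a standard normal Y for the continuous case: for X, switching from ≥ to > drops the probability of the endpoint, while for Y the cumulative distribution function gives a single number that serves as both P(Y ≥ 1) and P(Y > 1).

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete (assumed example): X = number of heads in four fair coin tosses.
dist = {0: Fraction(1, 16), 1: Fraction(4, 16), 2: Fraction(6, 16),
        3: Fraction(4, 16), 4: Fraction(1, 16)}
p_ge = sum(p for x, p in dist.items() if x >= 2)   # P(X >= 2)
p_gt = sum(p for x, p in dist.items() if x > 2)    # P(X > 2)
print(p_ge, p_gt)  # 11/16 5/16 -- they differ by P(X = 2) = 6/16

# Continuous (assumed example): Y ~ N(0, 1). Since P(Y = 1) = 0,
# 1 - P(Y <= 1) is both P(Y >= 1) and P(Y > 1).
Y = NormalDist(0, 1)
print(1 - Y.cdf(1))
```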

Because any density curve describes an assignment of probabilities, normal distributions are probability distributions.  Recall the notation N(μ, σ) for a normal distribution with mean μ and standard deviation σ, and recall that we standardized data to z-scores using z = (x - μ)/σ.  A normal random variable can be standardized with the same formula: if X has the N(μ, σ) distribution, then Z = (X - μ)/σ is a standard normal random variable with distribution N(0, 1).
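A quick numerical check of standardization, using Python's statistics.NormalDist.  The values μ = 100, σ = 15, and x = 130 are hypothetical, chosen only for illustration.  Finding P(X < 130) directly from N(100, 15) gives the same answer as standardizing to z = 2 and using N(0, 1).

```python
from statistics import NormalDist

# Hypothetical values: X ~ N(mu, sigma) with mu = 100 and sigma = 15.
mu, sigma = 100, 15
X = NormalDist(mu, sigma)
Z = NormalDist(0, 1)  # standard normal N(0, 1)

x = 130
z = (x - mu) / sigma          # standardize: z = (x - mu) / sigma

# P(X < 130) computed directly and via the standardized value agree.
print(z)                      # 2.0
print(round(X.cdf(x), 4))     # same value as the next line
print(round(Z.cdf(z), 4))
```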