There are no facts, only interpretations.
Friedrich Nietzsche

Chapter 1
   
 Sec. 1.1 Notes:

Statistics is the SCIENCE of DATA.  Individuals are the objects described by a set of data.  Individuals may be people, but also animals or things.  A variable is any characteristic of an individual and can take different values for different individuals.

When you encounter NEW data, ask yourself...WHO? (what individuals and how many), WHAT? (how many variables, their definitions, and units), and WHY? (what is the reason the data were gathered).
NOTE:  The word "data" is plural...be careful with the verb.

A VARIABLE  is any measured characteristic or attribute that differs for different subjects.
For example, if the height of 50 subjects were measured, then height would be a variable.
Variable: any characteristic that can be assigned a number or category.


There are two (2) kinds of variables - numerical and categorical.

Numerical variables have a more formal name, QUANTITATIVE, and measure a numerical characteristic such as weight, height, income, height of trees, number of students, or dollars received as tips.  (A quantitative variable can sometimes be converted into a categorical one, as when income is recorded as a category such as 10,000-24,999.)
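The conversion from a quantitative variable to a categorical one can be sketched in a few lines of Python. The incomes and the bracket boundaries below are made up for illustration; only the 10,000-24,999 category comes from the notes.

```python
# Hypothetical incomes in dollars (made-up values for illustration).
incomes = [12500, 8000, 24999, 31000, 10000, 47250]

def income_category(income):
    """Map a dollar amount to an income bracket label.

    The bracket boundaries are assumptions chosen for this example.
    """
    if income < 10000:
        return "under 10,000"
    elif income < 25000:
        return "10,000-24,999"
    else:
        return "25,000 and over"

# Each quantitative value becomes a category designation.
categories = [income_category(x) for x in incomes]
print(categories)
```

Note that information is lost in the conversion: once income is recorded only as a category, the exact dollar amount cannot be recovered.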

Categorical variables also have a formal name, QUALITATIVE, and record a category designation: birth month, shirt size, soft drink, color of eyes, types of jobs.  A special case is the "binary" variable, where ONLY 2 possible categories exist...yes/no, true/false, male/female, etc.

Consider your classmates as "observational units", that is, persons or things to which a number or category can be assigned.  Hair color is a legitimate variable.  Number of students with blonde hair is NOT a variable, because it is a single number for the whole class rather than a value recorded for each student.  Height of the shortest student is NOT a variable for the same reason.  Whether or not a student has black hair IS a categorical (qualitative) binary variable, having only two possible outcomes.  Other binary variables would be gender or political identity (considering our two party system).  Age of the teacher is NOT a variable, since it does not differ from student to student.  The number of states that a student has visited is quantitative, along with heights of students.  NOTE:  IF the observational units had been all the classes at this school, then the number of students with blonde hair would become a variable.
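A small Python sketch may make the role of the observational unit concrete. The roster below is hypothetical (class names and hair colors are invented); it shows the same blonde-hair count as a single summary when students are the units and as a variable when classes are the units.

```python
# Hypothetical roster: each tuple is one student (class name, hair color).
students = [
    ("Algebra", "blonde"), ("Algebra", "black"), ("Algebra", "blonde"),
    ("Biology", "brown"), ("Biology", "blonde"),
    ("Chemistry", "black"), ("Chemistry", "brown"),
]

# Units = students: hair color has a value for each student, so it is a variable.
hair_colors = [hair for _, hair in students]

# The number of blonde students is one number for the whole group,
# not a value per student, so it is NOT a variable here.
total_blonde = hair_colors.count("blonde")

# Units = classes: now each class gets its own blonde count,
# so the count differs from unit to unit and becomes a variable.
blonde_per_class = {cls: 0 for cls, _ in students}
for cls, hair in students:
    if hair == "blonde":
        blonde_per_class[cls] += 1

print(hair_colors)
print(total_blonde)
print(blonde_per_class)
```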

Data may be "UNIVARIATE", meaning only one (1) measurement on each object is recorded, such as the height of a child, or "BIVARIATE", meaning two (2) measurements on each object are recorded, such as the height AND weight of a child.
The data type will determine the type of display used.

The distribution of a variable tells what values the variable takes and how often it takes them.  When we examine data in order to describe their main features it is called "Exploratory Data Analysis."  We should always begin by examining each variable by itself and then move to relationships among the variables.  Do this with a GRAPH, then add NUMERICAL SUMMARIES.

BAR GRAPHS and PIE CHARTS (using calculated %'s) are suitable to display the distribution of a categorical variable.  Bar graphs compare the counts in each category using the heights of the bars.  Pie charts show what part of the whole (percentage) each group or category forms.
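The counts behind a bar graph and the percentages behind a pie chart can be computed as in the following sketch. The shirt sizes are made-up data, and the text bars printed with "#" are only a stand-in for a real graph.

```python
from collections import Counter

# Hypothetical categorical data: shirt sizes of 10 students.
shirt_sizes = ["M", "L", "S", "M", "M", "L", "XL", "M", "S", "L"]

# Bar graph input: the count in each category.
counts = Counter(shirt_sizes)

# Pie chart input: each category's share of the whole, in percent.
total = sum(counts.values())
percents = {size: 100 * n / total for size, n in counts.items()}

# Rough text version of a bar graph, most common category first.
for size, n in counts.most_common():
    print(f"{size:>2} {'#' * n} {n} ({percents[size]:.0f}%)")
```

The percentages must add up to 100, which is a useful check before drawing a pie chart.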

DOT PLOTS (number line with dots) and HISTOGRAMS are appropriate for quantitative data.  A histogram, a special and important type of graph most commonly used for this type of variable, shows selected intervals (classes) using adjoining bars without gaps.

STEM PLOTS, also called stem-leaf plots, are sideways histograms but should be used only for small data sets.  Choose the number of stems with care, since too few stems hide the pattern and too many stems dilute it.
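As an illustration, a stem plot for a small data set can be built with a short Python function. This is a minimal sketch that assumes non-negative whole numbers, with the tens digit as the stem and the ones digit as the leaf; the test scores are invented.

```python
from collections import defaultdict

# Hypothetical test scores (made-up values for illustration).
scores = [52, 67, 71, 58, 83, 75, 79, 64, 91, 77, 68, 85]

def stem_plot(values):
    """Return the rows of a stem plot: tens digit as stem, ones digit as leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lowest, highest = min(leaves), max(leaves)
    rows = []
    # Include every stem in the range, even one with no leaves,
    # so gaps in the distribution stay visible.
    for stem in range(lowest, highest + 1):
        rows.append(f"{stem} | {''.join(str(d) for d in leaves[stem])}")
    return rows

for row in stem_plot(scores):
    print(row)
```

The number of leaves on each row equals the height of the corresponding histogram bar for classes of width 10 (50-59, 60-69, and so on), which is why a stem plot reads as a sideways histogram.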

When looking at the data, some characteristics are readily observable...symmetry or non-symmetry.  Symmetric distributions will have two sides that are approximate mirror images of each other.  Non-symmetric distributions may have long tails on either side of "center" and are said to be "skewed right" if the tail is long on the right or "skewed left" if the tail is long on the left.

The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
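The definition can be applied directly, as in this sketch. The data are invented, and the function names are chosen for this example. The rule used here (take the smallest observation with at least p percent of the data at or below it) is one simple convention; calculators and software often use slightly different rules, so answers may differ a little on small data sets.

```python
import math

# Hypothetical test scores (made-up values for illustration).
scores = [52, 58, 64, 67, 68, 71, 75, 77, 79, 83]

def percent_at_or_below(data, value):
    """Percent of observations that fall at or below the given value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

def percentile(data, p):
    """Smallest observation with at least p percent of the data at or below it."""
    ordered = sorted(data)
    # Position (1-based) of the first observation that reaches p percent.
    k = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[k - 1]

print(percentile(scores, 80))
print(percent_at_or_below(scores, 77))
```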

A relative cumulative frequency graph  (see page 28) gives information about the "relative" standing of an individual observation while a histogram displays the distribution of all the values.
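The points plotted on a relative cumulative frequency graph can be computed as in the sketch below. The scores and the class boundaries are assumptions made for this example; each point pairs a class boundary with the percent of observations that fall below it.

```python
# Hypothetical test scores and class boundaries (made-up for illustration).
scores = [52, 58, 64, 67, 68, 71, 75, 77, 79, 83]
class_edges = [50, 60, 70, 80, 90]

def relative_cumulative_frequency(data, edges):
    """Return (edge, percent of observations below that edge) pairs."""
    n = len(data)
    return [(edge, 100 * sum(1 for x in data if x < edge) / n) for edge in edges]

points = relative_cumulative_frequency(scores, class_edges)
for edge, pct in points:
    print(f"below {edge}: {pct:.0f}%")
```

Reading the graph from the horizontal axis up gives the relative standing of a score; reading from the vertical axis across gives an approximate percentile.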

A time plot of a variable plots each observation against the time at which it was measured.  Time goes on the horizontal axis and the variable on the vertical axis.  When examining a time plot, you may observe an overall pattern called a "trend" or a pattern that repeats at regular intervals, known as "seasonal variation."

It would be helpful for you to summarize this section in your own words.

A more thorough discussion of variable types can be found at
http://davidmlane.com/hyperstat/intro.html