There are no facts, only
interpretations.

*Friedrich Nietzsche*

Chapter 1

Sec. 1.1 Notes: Statistics is the SCIENCE of DATA. Individuals are the objects described by a set of data. Individuals may be people, but also animals or things. A variable is any characteristic of an individual and can take different values for different individuals.

When you encounter NEW data, ask yourself... WHO? (what individuals and how many), WHAT? (how many variables, their definitions, and units), and WHY? (what is the reason the data were gathered).

NOTE: "Data" is a plural noun...be careful with the verb.

Variable: any characteristic that can be assigned a number or category.

There are two (2) kinds of variables: numerical and categorical.

Numerical variables have a more formal name, QUANTITATIVE, and measure a numerical characteristic such as weight, height, income, height of trees, number of students, or tips in dollars. (A quantitative variable can sometimes be converted into a categorical one, for example by grouping income into a category such as 10,000-24,999.)

Categorical variables also have a formal name, QUALITATIVE, and record a category designation: birth month, shirt size, soft drink, eye color, type of job. A special case is the "binary" variable, where ONLY 2 possible categories exist... yes/no, true/false, male/female, etc.

Consider your classmates as "observational units," that is, persons or things to which a number or category can be assigned. Hair color is a legitimate variable. The number of students with blonde hair is NOT a variable, because it is a single count for the whole class rather than a value recorded for each student. The height of the shortest student is NOT a variable for the same reason. Whether or not a student has black hair IS a categorical (qualitative) binary variable, having only two possible outcomes. Other binary variables would be gender or political identity (considering our two-party system). The age of the teacher is NOT a variable. The number of states that a student has visited is quantitative, as are the heights of students. NOTE: IF the observational units had been all the classes at this school, then the number of students with blonde hair would become a variable.
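To make the distinction concrete, here is a minimal Python sketch. The four students and all of their values are made up for illustration. A variable produces one value per observational unit, while a class-wide count or minimum produces a single number for the whole group.

```python
# Hypothetical class: each dictionary is one observational unit (a student).
students = [
    {"name": "Ana",   "hair": "black",  "height_in": 64, "states_visited": 5},
    {"name": "Ben",   "hair": "blonde", "height_in": 70, "states_visited": 12},
    {"name": "Carla", "hair": "brown",  "height_in": 61, "states_visited": 3},
    {"name": "Dev",   "hair": "black",  "height_in": 68, "states_visited": 8},
]

# Variables: one value recorded for EACH student.
hair_color = [s["hair"] for s in students]                # categorical
has_black_hair = [s["hair"] == "black" for s in students] # categorical, binary
states_visited = [s["states_visited"] for s in students]  # quantitative

# NOT variables for these units: a single number describing the whole class.
num_blonde = sum(1 for s in students if s["hair"] == "blonde")
shortest_height = min(s["height_in"] for s in students)

print(hair_color)       # one entry per student
print(has_black_hair)   # one True/False per student
print(num_blonde, shortest_height)
```

If each observational unit were a whole class instead of a student, a count like `num_blonde` would be recorded once per class and would then behave as a variable.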

Data may be "UNIVARIATE," meaning only one (1) measurement on each object is recorded, such as the height of a child, or "BIVARIATE," meaning two (2) measurements on each object are recorded, such as the height AND weight of a child.

The data type will determine the type of display used.

The distribution of a variable tells what values the variable takes and how often it takes them. Examining data in order to describe their main features is called "Exploratory Data Analysis." We should always begin by examining each variable by itself and then move to relationships among the variables. Do this with a GRAPH, then add NUMERICAL SUMMARIES.

BAR GRAPHS and PIE CHARTS (using calculated %'s) are suitable to display the distribution of categorical variables. Bar graphs compare counts within categories using the height of bars. Pie charts show what part of the whole (percentage) each group or category forms.
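As a small illustration of the numbers behind these two displays, the Python sketch below tallies a categorical variable. The eye-color sample is made up. The counts are what the bars of a bar graph would show, and the percentages are the slice sizes of a pie chart.

```python
from collections import Counter

# Hypothetical sample: eye color recorded for 10 students (made-up values).
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "brown", "hazel", "blue", "brown"]

counts = Counter(eye_colors)   # category -> count (bar heights)
total = sum(counts.values())

# Percent of the whole for each category (pie slice sizes).
percents = {color: 100 * n / total for color, n in counts.items()}

for color, n in counts.most_common():
    print(f"{color:<6} count={n}  percent={percents[color]:.0f}%")
```

The percentages must add to 100, which is a useful check before drawing a pie chart.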

DOT PLOTS (a number line with dots) and HISTOGRAMS are appropriate for quantitative data. A histogram, a special and important type of graph most commonly used for this type of variable, shows selected intervals (classes) using adjoining bars without gaps. STEM PLOTS, also called stem-leaf plots, are sideways histograms but should be used for small data sets. Choose the number of stems with care, since too few stems hide the pattern and too many stems dilute it.

When looking at the data, some characteristics are readily observable... symmetry or non-symmetry. Symmetric distributions will have two sides that are approximate mirror images of each other. Non-symmetric distributions may have long tails on either side of "center" and are said to be "skewed right" if the tail is long on the right or "skewed left" if the tail is long on the left.

The pth percentile of a distribution is the value such that p percent of the observations fall "at" or "below it."
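The percentile definition can be checked with a short Python sketch. The scores are made up, and the helper names `percent_at_or_below` and `percentile` are hypothetical. Textbooks and software use slightly different percentile conventions; this version simply takes the smallest observation that has at least p percent of the data at or below it.

```python
import math

def percent_at_or_below(data, value):
    """Percent of observations that are less than or equal to value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

def percentile(data, p):
    """Smallest observation with at least p percent of the data at or below it.

    One simple convention among several; other sources interpolate.
    """
    ordered = sorted(data)
    k = max(1, math.ceil(p * len(ordered) / 100))  # position in the sorted list
    return ordered[k - 1]

# Hypothetical test scores for 10 students (made-up values).
scores = [62, 70, 71, 75, 78, 80, 84, 85, 90, 97]

p80 = percentile(scores, 80)
print(p80, percent_at_or_below(scores, p80))
```

With these scores, 8 of the 10 observations are at or below 85, so 85 is the 80th percentile under this convention.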

A relative cumulative frequency graph (see page 28) gives information about the "relative" standing of an individual observation, while a histogram displays the distribution of all the values.
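The points plotted on such a graph can be computed as in the Python sketch below. The ages and class intervals are made up, and the function name `cumulative_percent_points` is hypothetical. Each point pairs the upper end of a class with the percent of observations that fall below that value.

```python
def cumulative_percent_points(data, classes):
    """Return (upper class limit, cumulative percent) pairs.

    Each class is (lower, upper) and includes its lower limit only.
    """
    n = len(data)
    running = 0
    points = []
    for lower, upper in classes:
        running += sum(1 for x in data if lower <= x < upper)
        points.append((upper, 100 * running / n))
    return points

# Hypothetical ages of 20 people (made-up values), in classes of width 10.
ages = [42, 46, 47, 49,
        50, 51, 51, 52, 54, 55, 56, 57, 57, 58,
        60, 61, 64, 65, 68, 69]
classes = [(40, 50), (50, 60), (60, 70)]

for upper, pct in cumulative_percent_points(ages, classes):
    print(f"below {upper}: {pct:.0f}%")
```

Reading the result, a person aged 60 in this made-up group stands at about the 70th percentile, since 70 percent of the ages fall below 60.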

A time plot of a variable plots each observation against the time at which it was measured. Time is on the horizontal axis and the variable is on the vertical axis. When examining a time plot, you may observe an overall pattern called a "trend" or a pattern that repeats at regular intervals, known as seasonal variation.

It would be helpful for you to summarize this section in your own words.

A more thorough discussion of variable types can be found at

http://davidmlane.com/hyperstat/intro.html