Saturday, February 26, 2011

fundamentals of statistics ch.1

PART 1
Getting the Information You Need

CHAPTER 1 Data Collection

Statistics is a process: a series of steps that leads to a goal. This text is divided
into four parts to help the reader see the process of statistics.

Part 1 focuses on the first step in the process, which is to determine the research
objective or question to be answered. Then information is obtained to answer the
questions stated in the research objective.


1 Data Collection

Outline
1.1 Introduction to the Practice of Statistics
1.2 Observational Studies versus Designed Experiments
1.3 Simple Random Sampling
1.4 Other Effective Sampling Methods
1.5 Bias in Sampling
1.6 The Design of Experiments

It is Monday morning and already you
are thinking about Friday night—movie
night. You don’t trust the movie reviews
published by professional critics, so you
decide to survey “regular” people yourself.
You need to design a questionnaire that can
be used to help you make an informed decision
about whether to attend a particular
movie. See the Decisions project on page 60.
PUTTING IT TOGETHER
Statistics plays a major role in many different areas of our lives. For example, it is used in sports to help a
general manager decide which player might be the best fit for a team. It is used by politicians to help them
understand how the public feels about various governmental policies. Statistics is used to help determine the
effectiveness (efficacy) of experimental drugs.
Used appropriately, statistics can provide an understanding of the world around us. Used inappropriately,
it can lend support to inaccurate beliefs. Understanding the methodologies of statistics will provide you with
the ability to analyze and critique studies. With this ability, you will be an informed consumer of information,
which will enable you to distinguish solid analysis from the bogus presentation of numerical “facts.”
To help you understand the features of this text, and for hints to help you study, read the Pathway to
Success on the front inside cover of the text.

Section 1.1 Introduction to the Practice of Statistics

1.1 INTRODUCTION TO THE PRACTICE OF STATISTICS
Objectives 1 Define statistics and statistical thinking
2 Explain the process of statistics
3 Distinguish between qualitative and quantitative variables
4 Distinguish between discrete and continuous variables
5 Determine the level of measurement of a variable

1 Define Statistics and Statistical Thinking

What is statistics? When asked this question, many people respond that statistics is
numbers. After all, we are bombarded by numbers that supposedly represent how
we feel and who we are. For example, we hear on the radio that 50% of first marriages,
67% of second marriages, and 74% of third marriages end in divorce (Forest
Institute of Professional Psychology, Springfield, MO).

Another interesting consideration about the “facts” we hear or read is that two
different sources can report two different results. For example, a July 12–15, 2007,
Gallup poll indicated that 66% of Americans disapproved of the job that Congress
was doing. However, a July 25–29, 2007, poll conducted by the Pew Research Center
indicated that 54% of Americans disapproved of the job that Congress was doing.
Is it possible that Congress’s disapproval rating could decrease by 12 percentage points
in less than 2 weeks, or is something else going on? Statistics helps to provide the answer.

Certainly, statistics has a lot to do with numbers, but this definition is only
partially correct. Statistics is also about where the numbers come from (that is, how
they were obtained) and how closely the numbers reflect reality.

Definition
Statistics is the science of collecting, organizing, summarizing, and analyzing
information to draw conclusions or answer questions. In addition, statistics is
about providing a measure of confidence in any conclusions.

It is helpful to consider this definition in four parts. The first part of the definition
states that statistics involves the collection of information. The second refers to
the organization and summarization of information. The third states that the information
is analyzed to draw conclusions or answer specific questions. The fourth part
states that results should be reported with some measure that represents how convinced
we are that our conclusions reflect reality.

What is the information referred to in the definition? The information is data.
The American Heritage Dictionary defines data as “a fact or proposition used to
draw a conclusion or make a decision.” Data can be numerical, as in height, or nonnumerical,
as in gender. In either case, data describe characteristics of an individual.
The reason that data are important in statistics can be seen in this definition: data
are used to draw a conclusion or make a decision.

Analysis of data can lead to powerful results. Data can be used to offset anecdotal
claims, such as the suggestion that cellular telephones cause brain cancer. After
carefully collecting, summarizing, and analyzing data regarding this phenomenon,
it was determined that there is no link between cell phone usage and brain cancer.
See Examples 1 and 2 in Section 1.2.

In Other Words
Anecdotal means that the information being conveyed is based on casual
observation, not scientific research.

Because data are powerful, they can be dangerous when misused. The misuse
of data usually occurs when data are incorrectly obtained or analyzed. For example,
radio or television talk shows regularly ask poll questions for which respondents must
call in or use the Internet to supply their vote. Most likely, the individuals who are
going to call in are those that have a strong opinion about the topic. This group is not
likely to be representative of people in general, so the results of the poll are not meaningful.
Whenever we look at data, we should be mindful of where the data come from.
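
The call-in poll problem can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population split, the response rates, and the names `simulate_call_in_poll` and `percent_in_favor` are all hypothetical and do not come from any real poll.

```python
import random

def simulate_call_in_poll(population, response_rate, rng):
    """Return the opinions of the people who choose to respond.

    population    -- list of (opinion, strength) tuples
    response_rate -- dict mapping strength to the chance of calling in
    """
    return [opinion for opinion, strength in population
            if rng.random() < response_rate[strength]]

def percent_in_favor(opinions):
    """Percent of the given opinions that are 'favor'."""
    return 100 * sum(1 for opinion in opinions if opinion == "favor") / len(opinions)

# Hypothetical population of 10,000: most mildly favor, a minority strongly oppose.
population = [("favor", "mild")] * 7_000 + [("oppose", "strong")] * 3_000

# Assumed behavior: people with strong opinions call in far more often.
response_rate = {"mild": 0.02, "strong": 0.50}

respondents = simulate_call_in_poll(population, response_rate, random.Random(42))

true_percent = percent_in_favor([opinion for opinion, _ in population])
poll_percent = percent_in_favor(respondents)

print(f"In favor, whole population: {true_percent:.1f}%")
print(f"In favor, call-in poll:     {poll_percent:.1f}%")
```

Under these assumptions the poll reports far less support than actually exists, because the respondents select themselves rather than being selected by the researcher.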


Even when data tell us that a relation exists, we need to investigate. For example,
a study showed that breast-fed children have higher IQs than those who were
not breast-fed. Does this study mean that mothers should breast-feed their
children? Not necessarily. It may be that some factor other than breast-feeding
contributes to the IQ of the children. In this case, it turns out that mothers who
breast-feed generally have higher IQs than those who do not. Therefore, it may be
genetics that leads to the higher IQ, not breast-feeding. This illustrates an idea in
statistics known as the lurking variable. In statistics, we must consider lurking variables,
because two variables are often influenced by a third variable. A good statistical
study will have a way of dealing with lurking variables.

A key aspect of data is that they vary. To help understand this variability,
consider the students in your classroom. Is everyone the same height? No. Does
everyone have the same color hair? No. So, among a group of individuals there is
variation. Now consider yourself. Do you eat the same amount of food each day?
No. Do you sleep the same number of hours each day? No. So, even looking at an individual
there is variation. Data vary. One goal of statistics is to describe and understand
the sources of variation. Variability in data may help to explain the different
results obtained by Gallup and Pew mentioned at the beginning of this section.

Because of this variability, the results that we obtain using data can vary. This is
a very different idea than what you may be used to from your mathematics classes.
In mathematics, if Bob and Jane are asked to solve 3x + 5 = 11, they will both
obtain x = 2 as the solution, if they use the correct procedures. In statistics, if Bob
and Jane are asked to estimate the average commute time for workers in Dallas,
Texas, they will likely get different answers, even though they both use the correct
procedure. The different answers occur because they likely surveyed different individuals,
and these individuals have different commute times. Note: The only way
Bob and Jane would get the same result is if they both asked all commuters or the
same commuters how long it takes to get to work, but how likely is this?
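
A short simulation makes the Bob and Jane comparison concrete. It is a sketch only: the commute times are made-up numbers, not Dallas data, and the function name `estimate_mean_commute`, the sample size of 50, and the random seeds are assumptions chosen for illustration.

```python
import random
import statistics

def estimate_mean_commute(population, sample_size, rng):
    """Estimate the population mean from a simple random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)

# Hypothetical population: 10,000 made-up commute times in minutes.
population_rng = random.Random(0)
population = [max(1.0, population_rng.gauss(21.5, 8.0)) for _ in range(10_000)]

# Bob and Jane use the same correct procedure but survey different individuals.
bob = estimate_mean_commute(population, 50, random.Random(1))
jane = estimate_mean_commute(population, 50, random.Random(2))

# Asking every commuter removes the sampling variability.
everyone = statistics.mean(population)

print(f"Bob's estimate:  {bob:.2f} minutes")
print(f"Jane's estimate: {jane:.2f} minutes")
print(f"All commuters:   {everyone:.2f} minutes")
```

Both estimates come from the same procedure, yet they differ because the samples differ. Only surveying all commuters, or the same commuters, would force the two answers to agree.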

So, in mathematics when a problem is solved correctly, the results can be reported
with 100% certainty. In statistics, when a problem is solved, the results do not
have 100% certainty. In statistics, we might say that we are 95% confident that the
average commute time in Dallas, Texas, is between 20 and 23 minutes. Although uncertain
results may sound disturbing now, what this means will become clearer
as we proceed through the course.

Without certainty, how can statistics be useful? Statistics can provide an understanding
of the world around us because recognizing where variability in data comes
from can help us to control it. Understanding the techniques presented in this text
will provide you with powerful tools that will give you the ability to analyze and
critique media reports, make investment decisions (such as what mutual fund to
invest in), or conduct research on major purchases (such as what type of car you
should buy). This will help to make you an informed consumer of information and
guide you in becoming a critical and statistical thinker.

2 Explain the Process of Statistics

Consider the following scenario.

You are walking down the street and notice that a person walking in front of
you drops $100. Nobody seems to notice the $100 except you. Since you could
keep the money without anyone knowing, would you keep the money or
return it to the owner?

Note: Certainly, obtaining a truthful response to a question such as this is challenging.
In Section 1.5, we present some techniques for obtaining truthful responses to
sensitive questions.


Suppose you wanted to use this scenario as a gauge of the morality of students
at your school by determining the percent of students who would return the money.
How might you go about doing this? Well, you could attempt to present the scenario
to every student at the school, but this is likely to be difficult or impossible since the
number of enrolled students is likely large. A second possibility is to present the scenario
to 50 students and use the results from these 50 students to make a statement
about all the students at the school.

Figure 1: Population, sample, and individual.


Definitions
The entire group of individuals to be studied is called the population. An
individual is a person or object that is a member of the population being studied.
A sample is a subset of the population that is being studied. See Figure 1.

In the scenario presented, the population is all the students at the school. Each
student is an individual. The sample is the 50 students selected to participate in the
study.

Suppose 39 of the 50 students stated that they would return the money to the
owner. We could present this result by saying the percent of students in the survey
that stated they would return the money to the owner is 78%. This is an example of
a descriptive statistic because it describes the results of the sample without making
any general conclusions about the population.

Definitions
A statistic is a numerical summary of a sample. Descriptive statistics consist of
organizing and summarizing data. Descriptive statistics describe data through
numerical summaries, tables, and graphs.

So 78% is a statistic because it is a numerical summary based on a sample. Descriptive
statistics make it easier to get an overview of what the data are telling us.

If we extend the results of our sample to the population and say that the proportion
of all students at the school who would return the money is 78%, we are performing
inferential statistics.

Definition
Inferential statistics uses methods that take a result from a sample, extend it
to the population, and measure the reliability of the result.

When generalizing to a population from a sample, there is always uncertainty as
to the accuracy of the generalization, because we cannot learn everything about a
population by looking at a sample. Therefore, when performing inferential statistics,
we always report a measure that quantifies how confident we are in our results. So,
rather than saying that 78% of all students would return the money, we might say
that we are 95% confident that between 76% and 80% of all students would return
the money. Notice how this inferential statement includes a level of confidence
(measure of reliability) in our results. It also provides a range of values to account
for the variability in our results.

One goal of inferential statistics is to use statistics to estimate parameters.

Definition
A parameter is a numerical summary of a population.


EXAMPLE 1
Parameter versus Statistic

Suppose the percentage of all students on your campus who own a car is 48.2%. This
value represents a parameter because it is a numerical summary of a population.
Suppose a sample of 100 students is obtained, and from this sample we find that
46% own a car. This value represents a statistic because it is a numerical summary of
a sample.

Now Work Problem 13
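
The parameter and statistic in Example 1 can be mimicked in a few lines of Python. This is a sketch under an assumption: the campus is modeled as exactly 1,000 students, 482 of whom own a car, so that the parameter comes out to 48.2%. The helper name `percent_true` and the seed are hypothetical, and the simulated sample will not necessarily reproduce the 46% in the example.

```python
import random

def percent_true(flags):
    """Percent of entries that are True in a list of booleans."""
    return 100 * sum(flags) / len(flags)

# Assumed population: 1,000 students, 482 of whom own a car.
population = [True] * 482 + [False] * 518

# Parameter: a numerical summary of the whole population.
parameter = percent_true(population)

# Statistic: a numerical summary of a sample of 100 students.
sample = random.Random(13).sample(population, 100)
statistic = percent_true(sample)

print(f"Parameter (all students): {parameter:.1f}%")
print(f"Statistic (sample of 100): {statistic:.1f}%")
```

The same helper applied to 39 "would return the money" answers out of 50 gives the 78% descriptive statistic from the survey scenario.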

Many nonscientific studies are
based on convenience samples, such
as Internet surveys or phone-in polls.
The results of any study performed
using this type of sampling method
are not reliable.

The methods of statistics follow a process.

The Process of Statistics

1. Identify the research objective. A researcher must determine the question(s)
he or she wants answered. The question(s) must be detailed enough to identify
the population that is to be studied.
2. Collect the data needed to answer the question(s) posed in (1). Gaining access
to an entire population is often difficult and expensive. When conducting
research, we typically look at a sample. The collection-of-data step is vital to
the statistical process, because if the data are not collected correctly, the conclusions
drawn are meaningless. Do not overlook the importance of appropriate
data-collection processes. We discuss this step in detail in Sections 1.2
through 1.6.
3. Describe the data. Obtaining descriptive statistics allows the researcher to
obtain an overview of the data and can provide insight as to the type of
statistical methods the researcher should use. We discuss this step in detail in
Chapters 2 through 4.
4. Perform inference. Apply the appropriate techniques to extend the results
obtained from the sample to the population and report a level of reliability of
the results. We discuss techniques for measuring reliability in Chapters 5
through 8 and inferential techniques in Chapters 9 through 12.

EXAMPLE 2
The Process of Statistics: Do You Favor Stricter Gun Laws?

A poll was conducted by the Gallup Organization on October 4–7, 2007, to learn
how Americans feel about existing gun-control laws. The following statistical
process allowed the researchers at Gallup to conduct their study.

1. Identify the research objective. The researchers wished to determine the percentage
of Americans aged 18 years or older who were in favor of more strict
gun-control laws. Therefore, the population being studied was Americans aged
18 years or older.
2. Collect the information needed to answer the question posed in (1). It is unreasonable
to expect to survey the more than 200 million Americans aged 18 years
or older to determine how they feel about gun-control laws. So the researchers
surveyed a sample of 1,010 Americans aged 18 years or older. Of those surveyed,
515 stated they were in favor of more strict laws covering the sale of
firearms.
3. Describe the data. Of the 1,010 individuals in the survey, 51% (= 515/1,010)
are in favor of more strict laws covering the sale of firearms. This is a descriptive
statistic because its value is determined from a sample.
4. Perform inference. The researchers at Gallup wanted to extend the results
of the survey to all Americans aged 18 years or older. Remember, when generalizing
results from a sample to a population, the results are uncertain. To
account for this uncertainty, Gallup reported a 3% margin of error. This
means that Gallup feels fairly certain (in fact, Gallup is 95% certain) that the
percentage of all Americans aged 18 years or older in favor of more strict laws
covering the sale of firearms is somewhere between 48% (51% - 3%) and
54% (51% + 3%).
Now Work Problem 57
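
The arithmetic in steps 3 and 4 can be checked with a short sketch. It rests on assumptions the example does not state: a simple random sample and the usual normal-approximation formula for a 95% margin of error, with 1.96 as the multiplier. Gallup's actual procedure may differ, and the function name `proportion_interval` is hypothetical.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Sample proportion, margin of error, and interval (normal approximation)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Figures from Example 2: 515 of 1,010 respondents favored stricter laws.
p_hat, margin, (low, high) = proportion_interval(515, 1010)

print(f"Sample proportion: {p_hat:.1%}")
print(f"Margin of error:   {margin:.1%}")
print(f"Interval:          {low:.1%} to {high:.1%}")
```

Under these assumptions the computed margin is close to the 3% that Gallup reported, which is why the interval runs from about 48% to about 54%.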

3 Distinguish between Qualitative and Quantitative Variables

Once a research objective is stated, a list of the information the researcher desires
about the individuals must be created. Variables are the characteristics of the individuals
within the population. For example, this past spring my son and I planted
a tomato plant in our backyard. We decided to collect some information about the
tomatoes harvested from the plant. The individuals we studied were the tomatoes.
The variable that interested us was the weight of the tomatoes. My son noted that
the tomatoes had different weights even though they all came from the same plant.
He discovered that variables such as weight vary.

If variables did not vary, they would be constants, and statistical inference would
not be necessary. Think about it this way: If all the tomatoes had the same weight,
then knowing the weight of one tomato would be sufficient to determine the weights
of all tomatoes. However, the weights vary from one tomato to the next. One goal of
research is to learn the causes of the variability so that we can learn to grow plants
that yield the best tomatoes.

Variables can be classified into two groups: qualitative or quantitative.

Definitions
Qualitative, or categorical, variables allow for classification of individuals based
on some attribute or characteristic.

Quantitative variables provide numerical measures of individuals. Arithmetic
operations such as addition and subtraction can be performed on the values of a
quantitative variable and will provide meaningful results.
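
One way to see the distinction is to ask which summaries make sense for each kind of variable. The sketch below uses made-up observations. Storing a numeric-looking label such as a zip code as a string is an illustrative design choice, not something the definitions require.

```python
from collections import Counter
import statistics

# Made-up observations for illustration only.
zip_codes = ["60601", "60601", "73301", "10001"]   # qualitative: labels for locations
temperatures_f = [60.0, 70.0, 65.0]                # quantitative: numerical measures

# Qualitative values are classified and counted.
zip_counts = Counter(zip_codes)
most_common_zip = zip_counts.most_common(1)[0][0]

# Quantitative values support meaningful arithmetic, such as an average.
mean_temperature = statistics.mean(temperatures_f)

print(f"Most common zip code: {most_common_zip}")
print(f"Mean temperature:     {mean_temperature:.1f} degrees F")
```

Keeping the labels as strings means an accidental sum or average of zip codes raises an error instead of quietly producing a meaningless number.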


Many examples in this text will include a suggested approach, or a way to look
at and organize a problem so that it can be solved. The approach will be a suggested
method of attack toward solving the problem. This does not mean that the approach
given is the only way to solve the problem, because many problems have more than
one approach leading to a correct solution. For example, if you turn the key in your
car’s ignition and it doesn’t start, one approach would be to look under the hood to
try to determine what is wrong. (Of course, this approach will work only if you know
how to fix cars.) A second, equally valid approach would be to call an automobile
mechanic to service the car.

In Other Words
Typically, there is more than one correct approach to solving a problem.

EXAMPLE 3
Distinguishing between Qualitative and Quantitative Variables

Problem: Determine whether the following variables are qualitative or quantitative.

(a) Gender
(b) Temperature
(c) Number of days during the past week that a college student aged 21 years or
older has had at least one drink
(d) Zip code

Approach: Quantitative variables are numerical measures such that meaningful
arithmetic operations can be performed on the values of the variable. Qualitative variables
describe an attribute or characteristic of the individual that allows researchers to
categorize the individual.

Solution

(a) Gender is a qualitative variable because it allows a researcher to categorize the
individual as male or female. Notice that arithmetic operations cannot be performed
on these attributes.
(b) Temperature is a quantitative variable because it is numeric, and operations
such as addition and subtraction provide meaningful results. For example, 70°F is
10°F warmer than 60°F.
(c) Number of days during the past week that a college student aged 21 years or
older had at least one drink is a quantitative variable because it is numeric, and
operations such as addition and subtraction provide meaningful results.
(d) Zip code is a qualitative variable because it categorizes a location. Notice that,
even though they are numeric, the addition or subtraction of zip codes does not
provide meaningful results.

Now Work Problem 21

On the basis of the result of Example 3(d), we conclude that a variable may be
qualitative while having values that are numeric. Just because the value of a variable
is numeric does not mean that the variable is quantitative.

4 Distinguish between Discrete and Continuous Variables

We can further classify quantitative variables into two types: discrete or continuous.

Definitions
A discrete variable is a quantitative variable that has either a finite number of
possible values or a countable number of possible values. The term countable
means that the values result from counting, such as 0, 1, 2, 3, and so on.

A continuous variable is a quantitative variable that has an infinite number of
possible values that are not countable.

In Other Words
If you count to get the value of a quantitative variable, it is discrete.
If you measure to get the value of a quantitative variable, it is continuous.
When deciding whether a variable is discrete or continuous, ask yourself if
it is counted or measured.

Figure 2 illustrates the relationship among qualitative, quantitative, discrete, and continuous
variables.

Figure 2: Qualitative variables; quantitative variables (discrete variables, continuous variables).

An example should help to clarify the definitions.

EXAMPLE 4
Distinguishing between Discrete and Continuous Variables

Problem: Determine whether the following quantitative variables are discrete or
continuous.

(a) The number of heads obtained after flipping a coin five times.
(b) The number of cars that arrive at a McDonald’s drive-through between 12:00 P.M.
and 1:00 P.M.
(c) The distance a 2007 Toyota Prius can travel in city driving conditions with a full
tank of gas.

Approach: A variable is discrete if its value results from counting. A variable is
continuous if its value is measured.

Solution

(a) The number of heads obtained by flipping a coin five times would be a discrete
variable because we would count the number of heads obtained. The possible values
of the discrete variable are 0, 1, 2, 3, 4, 5.
(b) The number of cars that arrive at a McDonald’s drive-through between 12:00 P.M.
and 1:00 P.M. is a discrete variable because its value would result from counting the
cars. The possible values of the discrete variable are 0, 1, 2, 3, 4, and so on. Notice
that there is no predetermined upper limit to the number of cars that may arrive.
(c) The distance traveled is a continuous variable because we measure the distance.

Now Work Problem 29

Continuous variables are often rounded. For example, when the miles per
gallon (mpg) of gasoline for a certain make of car is given as 24 mpg, it means that
the miles per gallon is greater than or equal to 23.5 and less than 24.5, or
23.5 ≤ mpg < 24.5.

The type of variable (qualitative, discrete, or continuous) dictates the methods
that can be used to analyze the data.

The list of observed values for a variable is data. Gender is a variable; the observations
male or female are data. Qualitative data are observations corresponding to
a qualitative variable. Quantitative data are observations corresponding to a quantitative
variable. Discrete data are observations corresponding to a discrete variable,
and continuous data are observations corresponding to a continuous variable.

EXAMPLE 5
Distinguishing between Variables and Data

Problem: Table 1 presents a group of selected countries and information regarding
these countries as of October, 2007. Identify the individuals, variables, and data in
Table 1.

Table 1

Country        Government Type                  Life Expectancy (years)  Population (in millions)
Australia      Federal parliamentary democracy  80.62                    20.4
Canada         Constitutional monarchy          80.34                    33.4
France         Republic                         80.59                    63.7
Morocco        Constitutional monarchy          71.22                    33.8
Poland         Republic                         75.19                    38.52
Sri Lanka      Republic                         74.80                    20.93
United States  Federal republic                 78.00                    301.14

Source: CIA World Factbook

Approach: An individual is an object or person for whom we wish to obtain data.
The variables are the characteristics of the individuals, and the data are the specific
values of the variables.

Solution: The individuals in the study are the countries: Australia, Canada, and so
on. The variables measured for each country are government type, life
expectancy, and population. The variable government type is qualitative
because it categorizes the individual. The variables life expectancy and population
are quantitative.

The quantitative variable life expectancy is continuous because it is measured.
The quantitative variable population is discrete because we count people. The observations
are the data. For example, the data corresponding to the
variable life expectancy are 80.62, 80.34, 80.59, 71.22, 75.19, 74.80, and 78.00. The following
data correspond to the individual Poland: a republic whose residents have a
life expectancy of 75.19 years and whose population is 38.52 million
people. Republic is an instance of qualitative data that results from observing the
value of the qualitative variable government type. The life expectancy of 75.19 years
is an instance of quantitative data that results from observing the value of the quantitative
variable life expectancy.


Now Work Problem 51
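
The roles of individuals, variables, and data in Example 5 map naturally onto a nested dictionary. This is a minimal sketch: the values are copied from Table 1, while the layout and the key names (`government_type`, `life_expectancy_years`, `population_millions`) are illustrative choices.

```python
# Each outer key is an individual (a country), each inner key is a variable,
# and each inner value is one observation (a piece of data).
table_1 = {
    "Australia": {"government_type": "Federal parliamentary democracy",
                  "life_expectancy_years": 80.62, "population_millions": 20.4},
    "Canada": {"government_type": "Constitutional monarchy",
               "life_expectancy_years": 80.34, "population_millions": 33.4},
    "France": {"government_type": "Republic",
               "life_expectancy_years": 80.59, "population_millions": 63.7},
    "Morocco": {"government_type": "Constitutional monarchy",
                "life_expectancy_years": 71.22, "population_millions": 33.8},
    "Poland": {"government_type": "Republic",
               "life_expectancy_years": 75.19, "population_millions": 38.52},
    "Sri Lanka": {"government_type": "Republic",
                  "life_expectancy_years": 74.80, "population_millions": 20.93},
    "United States": {"government_type": "Federal republic",
                      "life_expectancy_years": 78.00, "population_millions": 301.14},
}

# Classification of each variable, as given in the Solution.
variable_types = {
    "government_type": "qualitative",
    "life_expectancy_years": "quantitative, continuous",
    "population_millions": "quantitative, discrete",
}

individuals = list(table_1)
life_expectancy_data = [row["life_expectancy_years"] for row in table_1.values()]
poland = table_1["Poland"]

print("Individuals:", individuals)
print("Life expectancy data:", life_expectancy_data)
print("Data for Poland:", poland)
```

Reading across one country gives the data for an individual. Reading one key across all countries gives the data for a variable.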

5 Determine the Level of Measurement of a Variable

Rather than classify a variable as qualitative or quantitative, we can assign a level of
measurement to the variable.

Definitions
A variable is at the nominal level of measurement if the values of the variable
name, label, or categorize. In addition, the naming scheme does not allow for the
values of the variable to be arranged in a ranked or specific order.

A variable is at the ordinal level of measurement if it has the properties of the
nominal level of measurement and the naming scheme allows for the values of
the variable to be arranged in a ranked or specific order.

A variable is at the interval level of measurement if it has the properties of the
ordinal level of measurement and the differences in the values of the variable
have meaning. A value of zero in the interval level of measurement does not
mean the absence of the quantity. Arithmetic operations such as addition and
subtraction can be performed on values of the variable.

A variable is at the ratio level of measurement if it has the properties of the interval
level of measurement and the ratios of the values of the variable have
meaning. A value of zero in the ratio level of measurement means the absence
of the quantity. Arithmetic operations such as multiplication and division can be
performed on the values of the variable.

In Other Words
The word nominal comes from the Latin word nomen, which means to name. When
you see the word ordinal, think order.

Variables that are nominal or ordinal are qualitative variables, while variables
that are interval or ratio are quantitative variables.

EXAMPLE 6
Determining the Level of Measurement of a Variable

Problem: For each of the following variables, determine the level of measurement.

(a) Gender
(b) Temperature
(c) Number of days during the past week that a college student aged 21 years or
older has had at least one drink
(d) Letter grade earned in your statistics class

Approach: For each variable, we ask the following: Does the variable simply categorize
each individual? If so, the variable is nominal. Does the variable categorize
and allow ranking of each value of the variable? If so, the variable is ordinal. Do differences
in values of the variable have meaning, but a value of zero does not mean
the absence of the quantity? If so, the variable is interval. Do ratios of values of the
variable have meaning, and is there a natural zero starting point? If so, the variable
is ratio.

Solution

(a) Gender is a variable measured at the nominal level because it only allows for categorization
of male or female. Plus, it is not possible to rank gender classifications.
(b) Temperature is a variable measured at the interval level because differences in
the value of the variable make sense. For example, 70°F is 10°F warmer than 60°F.
Notice that the ratio of temperatures does not represent a meaningful result. For
example, 60°F is not twice as warm as 30°F. In addition, 0°F does not represent the
absence of heat.
(c) Number of days during the past week that a college student aged 21 years or
older has had at least one drink is measured at the ratio level, because the ratio of
two values makes sense and a value of zero has meaning. For example, a student
who drank on four days drank on twice as many days as a student who drank on two days.
(d) Letter grade is a variable measured at the ordinal level because the values of
the variable can be ranked, but differences in values have no meaning. For example,
an A is better than a B, but A - B has no meaning.

Now Work Problem 37

When classifying variables according to their level of measurement, it is extremely
important to be careful to recognize what the variable is intended to measure.
For example, suppose we want to know whether cars with 4-cylinder engines get
better gas mileage than cars with 6-cylinder engines. Here, engine size represents a
category of data and so the variable is nominal. On the other hand, if we want to know
the average number of cylinders in cars in the United States, the variable is classified
as ratio (an 8-cylinder engine has twice as many cylinders as a 4-cylinder engine).
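
The difference between the interval and ratio levels can be checked numerically. The sketch below relies on the standard Fahrenheit-to-Celsius conversion. Re-expressing a count of days in hours is an illustrative device, not part of the example.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

# Interval level: equal differences stay equal after a change of units...
diff_c_1 = fahrenheit_to_celsius(70) - fahrenheit_to_celsius(60)
diff_c_2 = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(50)

# ...but ratios do not, because zero is an arbitrary point on each scale.
ratio_f = 60 / 30
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)

# Ratio level: a count of days has a true zero, so ratios survive a change of units.
days_a, days_b = 4, 2
ratio_days = days_a / days_b
ratio_hours = (days_a * 24) / (days_b * 24)

print(f"Differences in Celsius: {diff_c_1:.2f} and {diff_c_2:.2f}")
print(f"Temperature ratio: {ratio_f:.1f} in Fahrenheit, {ratio_c:.1f} in Celsius")
print(f"Day-count ratio: {ratio_days:.1f} in days, {ratio_hours:.1f} in hours")
```

The temperature ratio changes completely when the units change, which is why "60°F is twice as warm as 30°F" carries no meaning, whereas the day-count ratio is 2 in either unit.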

Validity, Reliability, and Variability
Divide the class into groups of four to six students.
(a) Select one student to be the group leader. Each student in the
group measures the length of the right arm of the group leader.
As the group leader is being measured, the other students in the
group do not look on. Do not share the measurements obtained with
others in the group until everyone has obtained a measurement!
Record the results.
(b) The group leader measures the length of the right arm of each of
the other students in the group. Record the results.
(c) Validity of a variable or measurement represents how close to
the true value the measurement is. In other words, a variable is valid if
it measures what it is supposed to measure. For example, if a student
measured arm length from the shoulder to the wrist and another
student measured arm length from the shoulder to the tip of the
middle finger, the variable is not valid. How valid are the results
obtained from part (a)? What could have been done by the group
to increase the validity of the variable?
(d) Reliability of a variable or measurement represents the ability
of different measurements of the same individual to yield the same
results. How reliable are the measurements obtained in part (b)?
Why is it likely that the results from part (b) are valid, but may not
be reliable?
(e) Which set of data appears to have more variability, the data from
part (a) or the data from part (b)? Why?
(f) Compare the results of all the groups. Which group do you think has
the most valid results? Which group has the most reliable results?

1.1 ASSESS YOUR UNDERSTANDING
Concepts and Vocabulary

1.
Define statistics.
2.
Explain the difference between a population and a sample.
3.
A(n) ______ is a person or object that is a member of the
population being studied.
4.
______ statistics consists of organizing and summarizing
information collected, while ______ statistics uses methods
that generalize results obtained from a sample to the population
and measure the reliability of the results.

5. A(n) ______ is a numerical summary of a sample.
A(n) ______ is a numerical summary of a population.
6. ______ are the characteristics of the individuals of the
population being studied.



7.
Contrast the differences between qualitative and quantitative
variables.
8.
Discuss the differences between discrete and continuous
variables.
9.
In your own words, define the four levels of measurement of
a variable. Give an example of each.
10.
Explain what is meant when we say “data vary.” How does
this variability affect the results of statistical analysis?
11. Explain the process of statistics.
12.
The age of a person is commonly considered to be a continuous
random variable. Could it be considered a discrete random
variable instead? Explain.
Skill Building

In Problems 13–20, determine whether the underlined value is a
parameter or a statistic.

13.
State Government Following the 2006 national midterm
election, 18% of the governors of the 50 United States were
female.
Source: National Governors Association

14.
Calculus Exam The average score for a class of 28 students
taking a calculus midterm exam was 72%.
15.
Illegal Drugs In a national survey of high school students
(grades 9 to 12), 25% of respondents reported that someone
had offered, sold, or given them an illegal drug on school
property.
Source: Bureau of Justice Statistics jointly with the U.S. Department
of Education, Indicators of School Crime and
Safety, 2006, December 2006

16.
Alcohol Use In a national survey on substance abuse, 66.4%
of respondents who were full-time college students aged 18
to 22 reported using alcohol within the past month.
Source: Substance Abuse and Mental Health Services Administration,
Results from the 2006 National Survey on Drug
Use and Health: National Findings, September 2007

17.
Batting Average Ty Cobb is one of Major League Baseball’s
greatest hitters of all time, with a career batting average of
0.366.
Source: baseball-almanac.com
18.
Moonwalkers Only 12 men have walked on the moon.
The average age of these men at the time of their moonwalks
was 39 years, 11 months, 15 days.
Source: Wikipedia.org

19.
Hygiene Habits A study of 6,076 adults in public rest rooms
(in Atlanta, Chicago, New York City, and San Francisco) found
that 23% did not wash their hands before exiting.
Source: American Society for Microbiology and the Soap and
Detergent Association, Press Release: Hygiene Habits Stall:
Public Handwashing Down. September 17, 2007

20.
Public Knowledge Telephone interviews of 1,502 adults
18 years of age or older, conducted nationwide February
1–13, 2007, found that only 69% could identify the current
vice-president.
Source: The Pew Research Center, Public Knowledge of
Current Affairs Little Changed by News and Information
Revolutions: What Americans Know: 1989–2007, April 15,
2007

In Problems 21–28, classify the variable as qualitative or
quantitative.

21.
Nation of origin
22. Number of siblings
23. Grams of carbohydrates in a doughnut
24. Number on a football player’s jersey
25.
Number of unpopped kernels in a bag of ACT microwave
popcorn
26.
Assessed value of a house
27.
Phone number
28.
Student ID number
In Problems 29–36, determine whether the quantitative variable is
discrete or continuous.

29. Runs scored in a season by Albert Pujols
30. Volume of water lost each day through a leaky faucet
31. Length (in minutes) of a country song
32.
Number of sequoia trees in a randomly selected acre of
Yosemite National Park
33.
Temperature on a randomly selected day in Memphis,
Tennessee
34. Internet connection speed in kilobytes per second
35. Points scored in an NCAA basketball game
36.
Air pressure in pounds per square inch in an automobile
tire
In Problems 37–44, determine the level of measurement of each
variable.

37. Nation of origin
38. Movie ratings of one star through five stars
39. Volume of water used by a household in a day
40. Year of birth of college students
41. Highest degree conferred (high school, bachelor’s, and so on)
42. Eye color
43.
Assessed value of a house
44. Time of day measured in military time
In Problems 45–50, a research objective is presented. For each research
objective, identify the population and sample in the study.

45.
The Gallup Organization contacts 1,028 teenagers who are
13 to 17 years of age and live in the United States and asks
whether or not they had been prescribed medications for
any mental disorders, such as depression or anxiety.
46.
A quality-control manager randomly selects 50 bottles of
Coca-Cola that were filled on October 15 to assess the calibration
of the filling machine.
47.
A farmer wanted to learn about the weight of his soybean
crop. He randomly sampled 100 plants and weighed the soybeans
on each plant.
48.
Every year the U.S. Census Bureau releases the Current Population
Report based on a survey of 50,000 households. The goal of this
report is to learn the demographic characteristics of all households
within the United States, such as income.

49.
Folate and Hypertension Researcher John P. Forman and
co-workers wanted to determine whether or not higher folate
intake is associated with a lower risk of hypertension
(high blood pressure) in younger women (27 to 44 years of
age). To make this determination, they looked at 7,373 cases
of hypertension in younger women and found that younger
women who consumed at least 1,000 micrograms per day (μg/d)
of total folate (dietary plus supplemental) had a
decreased risk of hypertension compared with those who
consumed less than 200 μg/d.
Source: John P. Forman, MD; Eric B. Rimm, ScD; Meir J.
Stampfer, MD; Gary C. Curhan, MD, ScD, “Folate Intake
and the Risk of Incident Hypertension among US Women,”
Journal of the American Medical Association 293:320–329,
2005

50.
A large community college has noticed that an increasing
number of full-time students are working while attending
the school. The administration randomly selects 128 students
and asks this question: How many hours per week do you
work?
In Problems 51–54, identify the individuals, variables, and data
corresponding to the variables. Determine whether each variable is
qualitative, continuous, or discrete.

51.
Widescreen TVs The following data relate to widescreen
high-definition televisions.
Model                     Size (in.)   Screen Type   Price ($)
Hitachi #P50X901          50           Plasma        4,000
Mitsubishi #WD-73833      73           Projection    4,300
Sony #KDF-50E3000         50           Projection    1,500
Panasonic #TH-65PZ750U    65           Plasma        9,000
Phillips #60PP9200D37     60           Projection    1,600
Samsung #FP-T5884         58           Plasma        4,200
LG #52LB5D                52           Plasma        3,500

Source: bestbuy.com

52.
BMW Cars The following information relates to the 2008
model year product line of BMW automobiles.
Model Body Style Weight (lb) Number of Seats
3 Series Coupe 3,351 4
5 Series Sedan 3,505 5
6 Series Convertible 4,277 4
7 Series Sedan 4,486 5
X3 Sport utility 4,012 5
Z4 Roadster Coupe 3,087 2

Source: www.motortrend.com

53. Driver’s License Laws The following data represent driver’s
license laws for various states.
State            Minimum Age for Driver’s   Mandatory Belt Use   Maximum Allowable Speed Limit
                 License (unrestricted)     Seating Positions    (cars on rural interstate), mph, 2007
Alabama          17                         Front                70
Colorado         17                         Front                75
Indiana          18                         All                  70
North Carolina   16                         All                  70
Wisconsin        18                         All                  65

Source: Governors Highway Safety Association

54.
Media Players The following information concerns various
digital media players that can be purchased online at
circuitcity.com.
Product                  Memory Size (GB)   Weight (oz)   Price ($)
Samsung YP-U3            2                  0.8           79.99
SanDisk Sansa c200       2                  10.4          74.99
Microsoft Zune           4                  8.3           149.99
SanDisk Sansa Connect    4                  1.7           129.99
Apple iPod nano          4                  1.7           149.99
Apple iPod touch         8                  4.2           299.99
Archos 605               30                 6.7           299.99

Applying the Concepts

55.
A Cure for the Common Wart A study conducted by
researchers was designed “to determine if application of
duct tape is as effective as cryotherapy in the treatment of
common warts.” The researchers randomly divided 51 patients
into two groups. The 26 patients in group 1 had their
warts treated by applying duct tape to the wart for 6.5 days
and then removing the tape for 12 hours, at which point the
cycle was repeated for a maximum of 2 months. The 25 patients
in group 2 had their warts treated by cryotherapy (liquid
nitrogen applied to the wart for 10 seconds every 2 to
3 weeks) for a maximum of six treatments. Once the treatments
were complete, it was determined that 85% of the patients
in group 1 and 60% of the patients in group 2 had
complete resolution of their warts. The researchers concluded
that duct tape is significantly more effective in treating
warts than cryotherapy.
Source: Dean R. Focht III, Carole Spicer, Mary P. Fairchok.
“The Efficacy of Duct Tape vs. Cryotherapy in the Treatment
of Verruca Vulgaris (The Common Wart),” Archives of Pediatrics
and Adolescent Medicine, 156(10), 2002

(a) What is the research objective?
(b) What is the population being studied? What is the sample?
(c) What are the descriptive statistics?
(d) What are the conclusions of the study?
56.
Early Epidurals A study was conducted at Northwestern
University in Chicago to determine if pregnant women in
first-time labor could receive early low-dose epidurals, an
anesthetic to control pain during childbirth, without raising
their chances of a Cesarean section. In the study, reported
in the New England Journal of Medicine, “728 women in first-time
labor were divided into two groups. One group received
the spinal shot and then got epidurals when the cervix dilated
to about 2 centimeters. The other group initially received
pain-relieving medicine directly into their bloodstreams, and
put off epidurals until 4 centimeters if they could tolerate
the pain.” In the end, the C-section rate was 18% in the early
epidural group and 21% in the delayed group. The researchers
concluded that pregnant women in first-time labor
can be given a low-dose epidural early without raising their
chances of a C-section.

Source: Associated Press, February 22, 2005

(a) What is the research objective?
(b) What is the population being studied? What is the sample?
(c) What are the descriptive statistics?
(d) What are the conclusions of the study?
57.
When Are You Best? Gallup News Service conducted a
survey of 1,019 American adults aged 18 years or older,
August 13–16, 2007. The respondents were asked, “Generally
speaking, at what hour of the day or night are you personally at
your best?” Of the 1,019 adults surveyed, 55% said they were
personally at their best in the morning (5 A.M. to 11:59 A.M.).
Gallup reported that 55% of all adult Americans felt they
were personally at their best in the morning, with a 3% margin
of error with 95% confidence.
(a) What is the research objective?
(b) What is the population?
(c) What is the sample?
(d) List the descriptive statistics.
(e) What can be inferred from this survey?
58.
Financial Worries? Gallup News Service conducted a survey
of 1,006 American adults aged 18 years or older, September
24–27, 2007. The respondents were asked, “What, if
anything, worries you most about your personal financial situation
in the long term?” Of the 1,006 adults surveyed, 18%
said they were most worried about having enough money for
retirement. (Ironically, not having enough money for retirement
was not a short-term concern.) Gallup reported that
18% of all adult Americans were most worried about not
having enough money for retirement, with a 4% margin of
error with 95% confidence.
(a) What is the research objective?
(b) What is the population?
(c) What is the sample?
(d) List the descriptive statistics.
(e) What can be inferred from this survey?
59.
What Level of Measurement? It is extremely important for
a researcher to clearly define the variables in a study because
this helps to determine the type of analysis that can be
performed on the data. For example, if a researcher wanted
to describe baseball players based on jersey number, what
level of measurement would the variable jersey number be?
Now suppose the researcher felt that certain players who
were of lower caliber received higher numbers. Does the
level of measurement of the variable change? If so, how?
60.
Interpreting the Variable Suppose a fundraiser holds a raffle
for which each person that enters the room receives a ticket.
The tickets are numbered 1 to N, where N is the number of
people at the fundraiser. The first person to arrive receives
ticket number 1, the second person receives ticket number 2,
and so on. Determine the level of measurement for each of
the following interpretations of the variable ticket number.
(a) The winning ticket number.
(b) The winning ticket number was announced as 329. An
attendee noted his ticket number was 294 and stated,
“I guess I arrived too early.”
(c) The winning ticket number was announced as 329. An
attendee looked around the room and commented, “It doesn’t
look like there are 329 people in attendance.”
61.
Analyze the Article Read the newspaper article and identify
(a) the research question the study addresses, (b) the
population, (c) the sample, (d) the descriptive statistics, and
(e) the inferences of the study.
Study: Educational TV for Toddlers OK

CHICAGO (AP)—Arthur and Barney are OK for toddler TV-
watching, but not Rugrats and certainly not Power Rangers,
reports a new study of early TV-watching and future attention
problems.
The research involved children younger than 3, so TV is
mostly a no-no anyway, according to the experts. But if TV
is allowed, it should be of the educational variety, the researchers
said.
Every hour per day that kids under 3 watched violent child-
oriented entertainment their risk doubled for attention
problems five years later, the study found. Even nonviolent
kids’ shows like Rugrats and The Flintstones carried a still
substantial risk for attention problems, though slightly lower.
On the other hand, educational shows, including Arthur,
Barney and Sesame Street had no association with future
attention problems.
Interestingly, the risks only occurred in children younger
than age 3, perhaps because that is a particularly crucial period
of brain development. Those results echo a different
study last month that suggested TV-watching has less impact
on older children’s behavior than on toddlers.
The American Academy of Pediatrics recommends no television
for children younger than 2 and limited TV for older
children.
The current study by University of Washington researchers
was prepared for release Monday in November’s issue of
the journal Pediatrics.
Previous research and news reports on TV’s effects have
tended to view television as a single entity, without regard
to content. But “the reality is that it’s not inherently
good or bad. It really depends on what they watch,” said
Dr. Dimitri Christakis, who co-authored the study with
researcher Frederick Zimmerman.
Their study was based on parent questionnaires. They acknowledge
it’s observational data that only suggests a link
and isn’t proof that TV habits cause attention problems.
Still, they think the connection is plausible.



The researchers called a show violent if it involved fighting,
hitting people, threats or other violence that was central to
the plot or a main character. Shows listed included Power
Rangers, Lion King and Scooby Doo.
These shows, and other kids’ shows without violence, also
tend to be very fast-paced, which may hamper children’s
ability to focus attention, Christakis said.
Shows with violence also send a flawed message, namely that
“if someone gets bonked on the head with a rolling pin, it just
makes a funny sound and someone gets dizzy for a minute
and then everything is back to normal,” Christakis said.
Dennis Wharton of the National Association of Broadcasters,
a trade association for stations and networks including
those with entertainment and educational children’s TV
shows, said he had not had a chance to thoroughly review
the research and declined to comment on specifics.
Wharton said his group believes “there are many superb
television programs for children, and would acknowledge
that it is important for parents to supervise the media consumption
habits of young children.”
The study involved a nationally representative sample of 967
children whose parents answered government-funded child
development questionnaires in 1997 and 2002. Questions
involved television viewing habits in 1997. Parents were
asked in 2002 about their children’s behavior, including inattentiveness,
difficulty concentrating and restlessness.
The researchers took into account other factors that might
have influenced the results—including cultural differences
and parents’ education levels—and still found a strong link
between the non-educational shows and future attention
problems.
Peggy O’Brien, senior vice president for educational programming
and services at the Corporation for Public Broadcasting,
said violence in ads accompanying shows on
commercial TV might contribute to the study results.
She said lots of research about brain development goes
into the production of educational TV programming for
children, and that the slower pace is intentional.
“We want it to be kind of an extension of play” rather than
fantasy, she said.

Source: Copyright © 2008 The Associated Press. All rights
reserved. The information contained in the AP News report
may not be published, broadcast, rewritten or redistributed
without the prior written authority of The Associated Press.

1.2 OBSERVATIONAL STUDIES VERSUS
DESIGNED EXPERIMENTS

Objectives 1 Distinguish between an observational study and an experiment
2 Explain the various types of observational studies
1 Distinguish between an Observational
Study and an Experiment

Once our research question is developed, we must develop methods for obtaining
the data that can be used to answer the questions posed in our research objective.
There are two methods for collecting data, observational studies and designed experiments.
To help see the difference between these two methods for obtaining data,
read the following two studies.

EXAMPLE 1
Cellular Phones and Brain Tumors

Researchers Joachim Schüz and associates wanted “to investigate cancer risk
among Danish cellular telephone users who were followed for up to 21 years.” To do
so, they kept track of 420,095 people whose first cellular telephone subscription was
between 1982 and 1995. In 2002, they recorded the number of people out of the
420,095 people who had a brain tumor and compared the rate of brain tumors in this
group to the rate of brain tumors in the general population. They found no significant
difference in the rate of brain tumors between the two groups. The researchers
concluded “cellular telephone use was not associated with increased risk for brain
tumors.” (Source: Joachim Schüz et al. “Cellular Telephone Use and Cancer Risk:
Update of a Nationwide Danish Cohort,” Journal of the National Cancer Institute
98(23): 1707–1713, 2006)




EXAMPLE 2
Cellular Phones and Brain Tumors

Researchers Joseph L. Roti Roti and associates examined “whether chronic exposure
to radio frequency (RF) radiation at two common cell phone signals—

835.62 megahertz, a frequency used by analogue cell phones, and 847.74 megahertz,
a frequency used by digital cell phones—caused brain tumors in rats.” To do so, the
researchers divided 480 rats into three groups. The rats in group 1 were exposed to
the analogue cell phone frequency; the rats in group 2 were exposed to the digital
frequency; the rats in group 3 served as controls and received no radiation. The
exposure was done for 4 hours a day, 5 days a week for 2 years. The rats in all three
groups were treated the same, except for the RF exposure.
After 505 days of exposure, the researchers reported the following after analyzing
the data. “We found no statistically significant increases in any tumor type, including
brain, liver, lung or kidney, compared to the control group.” (Source: M. La
Regina, E. Moros, W. Pickard, W. Straube, J. L. Roti Roti, “The Effect of Chronic
Exposure to 835.62 MHz FMCW or 847.7 MHz CDMA on the Incidence of Spontaneous
Tumors in Rats,” Bioelectromagnetic Society Conference, June 25, 2002.)


In both studies, the goal was to determine if radio frequencies from cell phones
increase the risk of contracting brain tumors. Whether or not brain cancer was contracted
is the response variable. The level of cell phone usage is the explanatory variable.
In research, we wish to determine how varying the amount of an explanatory
variable affects the value of a response variable.

What are the differences between the study in Example 1 and the study in
Example 2? Obviously, in Example 1 the study was conducted on humans, whereas
the study in Example 2 was conducted on rats. However, there is a bigger difference.
In Example 1, no attempt was made to influence the individuals in the study. The
researchers simply let the 420,095 people go through their everyday lives and talk
on the phone as much or as little as they wished. In other words, no attempt was
made to influence the value of the explanatory variable, radio-frequency exposure
(cell phone use). Because the researchers simply observed the behavior of the study
participants, we call the study in Example 1 an observational study.

Definition
An observational study measures the value of the response variable without
attempting to influence the value of either the response or explanatory variables.
That is, in an observational study, the researcher observes the behavior of
the individuals in the study without trying to influence the outcome of the study.

Now let’s consider the study in Example 2. In this study, the researchers obtained
480 rats and divided the rats into three groups. Each group was intentionally
exposed to various levels of radiation. The researchers then compared the number
of rats that had brain tumors. Clearly, there was an attempt to influence the individuals
in this study because the value of the explanatory variable (exposure to radio
frequency) was influenced. Because the researchers controlled the value of the
explanatory variable, we call the study in Example 2 a designed experiment.

Definition If a researcher assigns the individuals in a study to a certain group, intentionally
changes the value of an explanatory variable, and then records the value of
the response variable for each group, the researcher is conducting a designed
experiment.
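
The assignment step in a designed experiment can be sketched in a few lines of Python. This is only an illustration: the source does not say how the 480 rats in Example 2 were divided, so the random assignment, the equal group sizes, and the names assign_groups, analogue, digital, and control are all assumptions.

```python
import random


def assign_groups(n_individuals, group_names, seed=None):
    """Randomly split individuals 0..n-1 into equally sized groups.

    Assumes n_individuals is divisible by the number of groups.
    """
    rng = random.Random(seed)
    ids = list(range(n_individuals))
    rng.shuffle(ids)  # random order, so no individual chooses its own group
    size = n_individuals // len(group_names)
    return {
        name: sorted(ids[i * size:(i + 1) * size])
        for i, name in enumerate(group_names)
    }


# Hypothetical setup mirroring Example 2: 480 rats, three exposure groups.
groups = assign_groups(480, ["analogue", "digital", "control"], seed=1)
for name, members in groups.items():
    print(name, len(members))
```

The point of the sketch is that the researcher, not the individual, determines the value of the explanatory variable (here, the type of exposure).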

Now Work Problem 9
Which Is Better? A Designed Experiment or an Observational Study?

To answer this question, let’s consider another study.


EXAMPLE 3

Do Flu Shots Benefit Seniors?

Researchers wanted to determine the long-term benefits of the influenza vaccine
on seniors aged 65 years and older. The researchers looked at records of over
36,000 seniors for 10 years. The seniors were divided into two groups. Group 1 were
seniors who chose to get a flu vaccination shot, and group 2 were seniors who chose
not to get a flu vaccination shot. After observing the seniors for 10 years, it was
determined that seniors who get flu shots are 27% less likely to be hospitalized for
pneumonia or influenza and 48% less likely to die from pneumonia or influenza.
(Source: Kristin L. Nichol, MD, MPH, MBA, James D. Nordin, MD, MPH, David B.
Nelson, PhD, John P. Mullooly, PhD, Eelko Hak, PhD. “Effectiveness of Influenza
Vaccine in the Community-Dwelling Elderly,” New England Journal of Medicine
357:1373–1381, 2007)


Wow! The results of this study sound great! All seniors should go out and get a flu
shot. Right? Well, hold on a second. The authors of the study admitted that there may be
some flaws in their results. They were concerned with confounding. That is, the authors
were concerned that there might be a different explanation for lower hospitalization
and death rates than the flu shot. Could it be that seniors that get flu shots are more
health conscious in the first place? Could it be that seniors who get flu shots are able to
get around more easily, so they can get to the clinic to get the flu shot? Does race, income,
or gender play a role in whether one might contract (and possibly die from) influenza?

Definition
Confounding in a study occurs when the effects of two or more explanatory
variables are not separated. Therefore, any relation that may exist between an
explanatory variable and the response variable may be due to some other variable
or variables not accounted for in the study.

Confounding is potentially a major problem with observational studies. Often,
the cause of confounding is a lurking variable.

Definition
A lurking variable is an explanatory variable that was not considered in a study,
but that affects the value of the response variable in the study. In addition,
lurking variables are typically related to explanatory variables considered in
the study.
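
To see how a lurking variable can produce an association on its own, consider the following simulation sketch in Python. Everything in it is hypothetical: the probabilities are invented, the variable health_conscious stands in for a lurking variable, and the simulated flu shot is given no effect at all on hospitalization. It is not the data from the influenza study.

```python
import random


def simulate(n=100_000, seed=0):
    """Return (health_conscious, shot, hospitalized) for n simulated seniors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        # The lurking variable is related to the explanatory variable:
        # health-conscious seniors are more likely to get the shot.
        shot = rng.random() < (0.8 if health_conscious else 0.3)
        # The response depends ONLY on the lurking variable, not on the shot.
        hospitalized = rng.random() < (0.05 if health_conscious else 0.15)
        rows.append((health_conscious, shot, hospitalized))
    return rows


def hospitalization_rate(rows, shot, health_conscious=None):
    """Hospitalization rate for one shot group, optionally within one stratum."""
    selected = [
        r for r in rows
        if r[1] == shot and (health_conscious is None or r[0] == health_conscious)
    ]
    return sum(r[2] for r in selected) / len(selected)


rows = simulate()
print("all seniors, shot:   ", round(hospitalization_rate(rows, True), 3))
print("all seniors, no shot:", round(hospitalization_rate(rows, False), 3))
for hc in (True, False):
    print("health conscious =", hc,
          "| shot:", round(hospitalization_rate(rows, True, hc), 3),
          "| no shot:", round(hospitalization_rate(rows, False, hc), 3))
```

In this sketch the overall comparison shows a lower hospitalization rate for seniors who got the shot, yet within each level of the lurking variable the two rates are nearly the same. The overall difference is due entirely to the variable that was left out.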

In the influenza study, possible lurking variables might be age, health status,
or mobility of the senior. How can we manage the effect of lurking variables? One
possibility is to look at the individuals in the study to determine if they differ in any
significant way. For example, it turns out in the influenza study that the seniors who
elected to get a flu shot were actually less healthy than those who did not. The
researchers also accounted for race and income. Another variable the authors identified
as a potential lurking variable was functional status, meaning the ability of
the seniors to conduct day-to-day activities on their own. The authors were able to
adjust their results for this variable as well.

Even after accounting for all the potential lurking variables in the study, the authors
were still careful to conclude that getting an influenza shot is associated with a
lower risk of being hospitalized or dying from influenza. The reason the authors
used the term associated instead of saying that influenza shots result in (or cause) a
lower risk of death due to influenza is because the study was observational.

Observational studies do not allow a researcher to claim causation, only
association.


Designed experiments, on the other hand, are used whenever control of certain
variables is possible and desirable. This type of research allows the researcher to
identify certain cause and effect relationships among the variables in the study.

So why ever conduct a study through an observational study? Often, it is
unethical to conduct an experiment. Consider the link between smoking and lung
cancer. Would you want to participate in a designed experiment to determine if
smoking causes lung cancer in humans? To do so, a researcher would divide a group
of volunteers into two groups. Group 1 would be told to smoke a pack of cigarettes
every day for the next 10 years, while group 2 would not. In addition, the researcher
would control eating habits, sleeping habits, and exercise so that the only difference
between the two groups was smoking. After 10 years the researcher would compare
the incidence rate of lung cancer (the response variable) in the smoking group to
the nonsmoking group. If the two cancer rates differ significantly, we could say that
smoking causes cancer. By approaching the study in this way, we are able to control
many of the factors that might affect the incidence rate of lung cancer that were
beyond our control in the observational study.

Other reasons exist for conducting observational studies over designed experiments.
Kjell Benson and Arthur Hartz wrote an article in the New England Journal
of Medicine in support of observational studies by stating, “observational studies
have several advantages over designed experiments, including lower cost, greater
timeliness, and a broader range of patients.” (Source: Kjell Benson, BA, and Arthur
J. Hartz, MD, PhD. “A Comparison of Observational Studies and Randomized,
Controlled Trials,” New England Journal of Medicine 342:1878–1886, 2000)
For the remainder of this section, and in Sections 1.3 through 1.5, we will look at
obtaining data through various types of observational studies. We look at designed
experiments in Section 1.6.

2 Explain the Various Types
of Observational Studies

There are three major categories of observational studies: (1) cross-sectional studies,
(2) case-control studies, and (3) cohort studies.
Cross-sectional Studies These are observational studies that collect information
about individuals at a specific point in time or over a very short period of time.

For example, a researcher might want to assess the risk associated with smoking
by looking at a group of people, determining how many are smokers and comparing
the incidence rate of lung cancer of the smokers to the nonsmokers.

A clear advantage of cross-sectional studies is that they are cheap and quick to
do. However, cross-sectional studies have limitations. For our lung cancer study, it
could be that individuals develop cancer after the data are collected, so our study
will not give the full picture.
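
The comparison in the cross-sectional example might be sketched as follows in Python. The counts are invented purely for illustration (200 smokers with 12 cases, 800 nonsmokers with 8 cases), and the helper names build_records and incidence_rate are hypothetical.

```python
def build_records(n_smokers, smoker_cases, n_nonsmokers, nonsmoker_cases):
    """Expand group counts into one (is_smoker, has_lung_cancer) record per person."""
    records = [(True, i < smoker_cases) for i in range(n_smokers)]
    records += [(False, i < nonsmoker_cases) for i in range(n_nonsmokers)]
    return records


def incidence_rate(records, smoker):
    """Proportion of the chosen group recorded as having lung cancer."""
    group = [has_cancer for is_smoker, has_cancer in records if is_smoker == smoker]
    return sum(group) / len(group)


# Hypothetical snapshot of 1,000 people taken at a single point in time.
records = build_records(200, 12, 800, 8)
print(f"smokers:    {incidence_rate(records, True):.3f}")
print(f"nonsmokers: {incidence_rate(records, False):.3f}")
```

Because the records describe a single point in time, anyone who develops cancer after the snapshot is counted as not having it, which is the limitation described above.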

Case-control Studies These studies are retrospective, meaning that they require
individuals to look back in time or require the researcher to look at existing records.
In case-control studies, individuals that have a certain characteristic are matched
with those that do not.

For example, we might match individuals that have lung cancer with those that
do not. When we say “match” individuals, we mean that we would like the individuals
in the study to be as similar (homogeneous) as possible in terms of demographics
and other variables that may affect the response variable. Once homogeneous
groups are established, we would ask the individuals in each group how much they
smoked. The smoking histories of the two groups would then be
compared.

Certainly, a disadvantage to this type of study is that it requires individuals to
recall information from the past. Plus, it requires the individuals to be truthful in
their responses. An advantage of case-control studies is that they are relatively inexpensive
to conduct and can be done relatively quickly.


Now Work Problem 19

Cohort Studies A cohort study first identifies a group of individuals to participate
in the study (the cohort). The cohort is then observed over a period of time
(sometimes a long period of time). Over this time period, characteristics about the
individuals are recorded and some individuals in the study will be exposed to certain
factors (not intentionally) and others will not. At the end of the study the value of
the response variable is recorded for the individuals.

The observational study in Example 1 is a cohort study that took over 21 years
to complete! The individuals were divided into groups depending on their cell
phone usage. A cohort study was done to further advance the link between lung cancer
and smoking. Typically, cohort studies require many individuals to participate
over long periods of time. Because the data are collected over time, cohort studies
are prospective. Another problem with cohort studies is that individuals tend to
drop out due to the long time frame. This could lead to misleading results. Cohort
studies definitely are the most powerful of the observational studies.

One of the largest cohort studies is the Framingham Heart Study. In this study,
more than 10,000 individuals have been monitored since 1948. The study continues
to this day, with the grandchildren of the original participants taking part in the
study. This cohort study is responsible for many of the breakthroughs in understanding
heart disease. The cost of this study is in excess of $10 million.

Some Concluding Remarks about Observational Studies
versus Designed Experiments

Is a designed experiment superior to an observational study? Not necessarily. Plus,
observational studies play a role in the research process. For example, because
cross-sectional and case-control observational studies are relatively inexpensive,
they provide an opportunity to explore possible associations prior to undertaking
large cohort studies or designing experiments.

Also, it is not always possible to conduct an experiment. For example, we could
not conduct an experiment to investigate the perceived link between high tension
wires and leukemia. Do you see why?

Existing Sources of Data and Census Data

Have you ever heard this saying? There is no point in reinventing the wheel. Well,
there is no point in spending energy obtaining data that already exist either. If a researcher
wishes to conduct a study and a data set exists that can be used to answer
the researcher’s questions, then it would be silly to collect the data from scratch. For
example, various federal agencies regularly collect data that are available to the public.
Some of these agencies include the Centers for Disease Control and Prevention
(www.cdc.gov), the Internal Revenue Service (www.irs.gov), and the Department of
Justice (http://fjsrc.urban.org/index.cfm). In fact, a great website that lists virtually
all the sources of federal data is www.fedstats.gov. Another great source of data is
the General Social Survey (GSS) administered by the University of Chicago. This
survey regularly asks “demographic and attitudinal questions” of individuals around
the country. The website is www.gss.norc.org.

Another source of data is a census.

A census is a list of all individuals in a population along with certain characteristics
of each individual.

The United States attempts to conduct a census every 10 years to learn the
demographic makeup of the United States. Everyone whose usual residence is
within the borders of the United States must fill out a questionnaire packet. There
are two different census forms: a short form and a long form. The short form goes to
every household in the United States and includes questions on name, gender, age,
relationship of individuals living in the household, Hispanic origin, race, and housing
tenure (whether the home is owned or rented). About 83% of all households
received the short form in 2000, with the remaining households receiving the long
form. The cost of obtaining the census in 2000 was approximately $6 billion; about
860,000 temporary workers were hired to assist in collecting the data.

Why is the U.S. Census so important? The results of the census are used to
determine the number of representatives in the House of Representatives in each
state, congressional districts, distribution of funds for government programs (such as
Medicaid), and planning for the construction of schools and roads. The first census
of the United States was obtained in 1790 under the direction of Thomas Jefferson.
It is a constitutional mandate that a census be conducted every 10 years (Article 1,
Section 2, of the U.S. Constitution).

Is the United States successful in obtaining a census? Not entirely. Inevitably,
certain individuals in the United States go uncounted. Why? There are a number
of reasons, but a few of the common reasons include illiteracy, language issues,
and homelessness. Given what is at stake politically based on the results of the
census, politicians often debate on how to count these individuals. In fact, statisticians
have offered solutions to the counting problem. The interested reader can
go to www.census.gov and in the search box type count homeless. You will find
many articles related to the Census Bureau’s attempt to count the homeless. The
bottom line is that even census data can have flaws.

1.2 ASSESS YOUR UNDERSTANDING
Concepts and Vocabulary

1.
In your own words, define explanatory variable and response
variable.
2.
What is an observational study? What is a designed experiment?
Which allows the researcher to claim causation
between an explanatory variable and a response variable?
3.
Explain what is meant by confounding. What is a lurking
variable?
4.
Given a choice, would you conduct a study using an observational
study or a designed experiment? Why?
5.
What is a cross-sectional study? What is a case-control
study? Which is the superior observational study? Why?
6.
The data used in the influenza study presented in Example 3
were obtained from a cohort study. What does this mean?
Why is a cohort study superior to a case-control study?
7.
Explain why it would be unlikely to use a designed experiment
to answer the research question posed in Example 3.
8.
What does it mean when an observational study is retrospective?
What does it mean when an observational study is
prospective?
Skill Building

In Problems 9–16, determine whether the study depicts an observational
study or an experiment.

9.
Researchers wanted to know if there is a link between
proximity to high-tension wires and the rate of leukemia in
children. To conduct the study, researchers compared the
incidence rate of leukemia for children who lived within 1/2
mile of high-tension wires to the incidence rate of leukemia
for children who did not live within 1/2 mile of high-tension
wires.
NW
10.
Rats with cancer are divided into two groups. One group receives
5 milligrams (mg) of a medication that is thought to
fight cancer, and the other receives 10 mg. After 2 years, the
spread of the cancer is measured.
11.
Seventh-grade students are randomly divided into two
groups. One group is taught math using traditional techniques;
the other is taught math using a reform method. After
1 year, each group is given an achievement test to compare
proficiency.
12.
A poll is conducted in which 500 people are asked whom they
plan to vote for in the upcoming election.
13.
A survey is conducted asking 400 people, “Do you prefer
Coke or Pepsi?”
14.
While shopping, 200 people are asked to perform a taste test
in which they drink from two randomly placed, unmarked
cups. They are then asked which drink they prefer.
15.
Sixty patients with carpal tunnel syndrome are randomly
divided into two groups. One group is treated weekly with
both acupuncture and an exercise regimen. The other is
treated weekly with the exact same exercise regimen, but no
acupuncture. After 1 year, both groups are questioned about
their level of pain due to carpal tunnel syndrome.
16.
Conservation agents netted 250 large-mouth bass in a lake
and determined how many were carrying parasites.
Applying the Concepts

17.
Daily Coffee Consumption Researchers wanted to determine
if there was an association between daily coffee
consumption and the occurrence of skin cancer. The researchers
looked at 93,676 women enrolled in the Women’s
Health Initiative Observational Study and asked them to
report their coffee-drinking habits. The researchers also determined
which of the women had nonmelanoma skin cancer.
After their analysis, the researchers concluded that
consumption of six or more cups of caffeinated coffee per
day was associated with a reduction in nonmelanoma skin
cancer.
Source: European Journal of Cancer Prevention, 16(5):
446–452, October 2007



(a) What type of observational study was this? Explain.
(b) What is the response variable in the study? What is the
explanatory variable?
(c) In their report, the researchers stated that “After
adjusting for various demographic and lifestyle variables,
daily consumption of six or more cups was associated
with a 30% reduced prevalence of nonmelanoma
skin cancer.” Why was it important to adjust for these
variables?
18.
Obesity and Artery Calcification Scientists were interested
in determining if abdominal obesity is related to coronary
artery calcification (CAC). The scientists studied 2,951 participants
in the Coronary Artery Risk Development in
Young Adults Study to investigate a possible link. Waist and
hip girths were measured in 1985–1986, 1995–1996 (year 10),
and in 2000–2001 (waist girth only). CAC measurements
were taken in 2001–2002. The results of the study indicated
that abdominal obesity measured by waist girth is associated
with early atherosclerosis as measured by the presence
of CAC in participants.
Source: American Journal of Clinical Nutrition, 86(1): 48–54,
2007

(a) What type of observational study was this? Explain.
(b) What is the response variable in the study? What is the
explanatory variable?
19.
Television in the Bedroom Researchers Christelle Delmas
and associates wanted to determine if having a television
(TV) in the bedroom is associated with obesity. The researchers
administered a questionnaire to 379 twelve-year-old
French adolescents. After analyzing the results, the
researchers determined that the body mass index of the adolescents
who had a TV in their bedroom was significantly
higher than that of the adolescents who did not have a TV in
their bedroom.
NW
Source: Christelle Delmas, Carine Platat, Brigette Schweitzer,
Aline Wagner, Mohamed Oujaa, and Chantal Simon. “Association
Between Television in Bedroom and Adiposity Throughout
Adolescence,” Obesity, 15:2495–2503, 2007

(a) Why is this an observational study? What type of observational
study is this?
(b) What is the response variable in the study? What is the
explanatory variable?
(c) Can you think of any lurking variables that may affect
the results of the study?
(d) In the report, the researchers stated, “These results
remain significant after adjustment for socioeconomic
status.” What does this mean?
(e) Does a television in the bedroom cause a higher body
mass index? Explain.
20.
Get Married, Gain Weight Researcher Penny Gordon-
Larson and her associate wanted to determine whether
young couples who marry or cohabitate are more likely to
gain weight than those who stay single. The researchers followed
8,000 men and women from 1995 through 2002 as they
matured from the teens to young adults. When the study
began, none of the participants was married or living with a
romantic partner. By 2002, 14% of the participants were
married and 16% were living with a romantic partner. At the
end of the study, married or cohabiting women gained, on
average, 9 pounds more than single women, and married
or cohabiting men gained, on average, 6 pounds more than
single men.

(a) Why is this an observational study? What type of observational
study is this?
(b) What is the response variable in the study? What is the
explanatory variable?
(c) Identify some potential lurking variables in this study.
(d) Does getting married or cohabiting cause one to gain
weight? Explain.
21.
Analyze the Article Write a summary of the following opinion.
The opinion is posted at abcnews.com. Include the type
of study conducted, possible lurking variables, and conclusions.
What is the message of the author of the article?
Power Lines and Cancer—To Move or Not to Move

New Research May Cause More Fear Than Warranted, One
Physician Explains

OPINION by JOSEPH MOORE, M.D.
May 30, 2007—
A recent study out of Switzerland indicates there might be
an increased risk of certain blood cancers in people with
prolonged exposure to electromagnetic fields, like those
generated from high-voltage power lines.
If you live in a house near one of these high-voltage power
lines, a study like this one might make you wonder whether
you should move.
But based on what we know now, I don’t think that’s necessary.
We can never say there is no risk, but we can say that
the risk appears to be extremely small.

“Scare Science”

The results of studies like this add a bit more to our knowledge
of potential harmful environmental exposures, but
they should also be seen in conjunction with the results of
hundreds of studies that have gone before. It cannot be
seen as a definitive call to action in and of itself.
The current study followed more than 20,000 Swiss railway
workers over a period of 30 years. True, that represents a lot
of people over a long period of time.
However, the problem with many epidemiological studies,
like this one, is that it is difficult to have an absolute control
group of people to compare results with. The researchers
compared the incidence of different cancers of workers with
a high amount of electromagnetic field exposure to those
workers with lower exposures.
These studies aren’t like those that have identified definitive
links between an exposure and a disease—like those involving
smoking and lung cancer. In those studies, we can
actually measure the damage done to lung tissue as a direct
result of smoking. But usually it’s very difficult for the conclusions
of an epidemiological study to rise to the level of controlled
studies in determining public policy.
Remember the recent scare about coffee and increased risk
of pancreatic cancer? Or the always-simmering issue of cell
phone use and brain tumors?
As far as I can tell, none of us have turned in our cell
phones. In our own minds, we’ve decided that any links to
cell phone use and brain cancer have not been proven
definitively. While we can’t say that there is absolutely no
risk in using cell phones, individuals have determined on
their own that the potential risks appear to be quite small
and are outweighed by the benefits.

Findings Shouldn’t Lead to Fear

As a society, we should continue to investigate these and
other related exposures to try to prove one way or another
whether they are disease-causing. If we don’t continue to
study, we won’t find out. It’s that simple.
When findings like these come out, and I’m sure there will
be more in the future, I would advise people not to lose
their heads. Remain calm. You should take the results as we
scientists do—as intriguing pieces of data about a problem
we will eventually learn more about, either positively or
negatively, in the future. It should not necessarily alter what
we do right now.
What we can do is take actions that we know will reduce our
chances of developing cancer.
Stop smoking and avoid passive smoke. It is the leading
cause of cancer that individuals have control over.
Whenever you go outside, put on sunscreen or cover up.
Eat a healthy diet and stay physically active.
Make sure you get tested or screened. Procedures like
colonoscopies, mammograms, pap smears and prostate
exams can catch the early signs of cancer, when the chances
of successfully treating them are the best.
Taking the actions above will go much farther in reducing
your risks for cancer than moving away from power lines or
throwing away your cell phone.

Dr. Joseph Moore is a medical oncologist at Duke University
Comprehensive Cancer Center.
Source: Reprinted with the permission of the author.


22.
Reread the article in Problem 61 from Section 1.1. What type
of observational study does this appear to be? Name some
lurking variables that the researchers accounted for.
23.
Putting It Together: Passive Smoke The following abstract
appears in The New England Journal of Medicine:
BACKGROUND. The relation between passive smoking
and lung cancer is of great public health importance. Some
previous studies have suggested that exposure to environmental
tobacco smoke in the household can cause lung
cancer, but others have found no effect. Smoking by the
spouse has been the most commonly used measure of this
exposure.

METHODS. In order to determine whether lung cancer is
associated with exposure to tobacco smoke within the household,
we conducted a case-control study of 191 patients with
lung cancer who had never smoked and an equal number of
persons without lung cancer who had never smoked. Lifetime
residential histories including information on exposure to
environmental tobacco smoke were compiled and analyzed.
Exposure was measured in terms of “smoker-years,” determined
by multiplying the number of years in each residence
by the number of smokers in the household.

RESULTS. Household exposure to 25 or more smoker-
years during childhood and adolescence doubled the risk of
lung cancer. Approximately 15 percent of the control subjects
who had never smoked reported this level of exposure.
Household exposure of less than 25 smoker-years during
childhood and adolescence did not increase the risk of lung
cancer. Exposure to a spouse’s smoking, which constituted
less than one third of total household exposure on average,
was not associated with an increase in risk.

CONCLUSIONS. The possibility of recall bias and other
methodologic problems may influence the results of case-
control studies of environmental tobacco smoke. Nonetheless,
our findings regarding exposure during early life suggest
that approximately 17 percent of lung cancers among nonsmokers
can be attributed to high levels of exposure to cigarette
smoke during childhood and adolescence.

(a) What is the research objective?
(b) What makes this study a case-control study? Why is this
a retrospective study?
(c) What is the explanatory variable in the study? Is it qualitative
or quantitative?
(d) Can you identify any lurking variables that may have
affected this study?
(e) What is the conclusion of the study? Does exposure to
smoke in the household cause lung cancer?
(f) Would it be possible to design an experiment to answer
the research question in part (a)? Explain.
1.3 SIMPLE RANDOM SAMPLING
Objective 1 Obtain a simple random sample
Sampling

Besides the observational studies that we looked at in Section 1.2, observational studies
can also be conducted by administering a survey. Whenever administering a survey,
the researcher must first identify the population that is to be targeted. For example,
the Gallup Organization regularly surveys Americans about various pop-culture and
political issues. Often, the population of interest in these surveys is adult Americans
aged 18 years or older. Of course, it is unreasonable to expect the Gallup Organization
to survey all adult Americans (there are over 200 million), so instead the Gallup
Organization will typically survey a random sample of about 1,000 adult Americans.



Definition
Random sampling is the process of using chance to select individuals from a
population to be included in the sample.

What allows a researcher to be confident the results of a survey accurately reflect
the feelings of an entire population? For the results of a survey to be reliable, the
characteristics of the individuals in the sample must be representative of the characteristics
of the individuals in the population. How can this be accomplished? The key
to obtaining a sample representative of a population is to let chance or randomness
play a role in dictating which individuals are in the sample, rather than convenience.
If convenience is used to obtain a sample, the results of the survey are meaningless.

For example, suppose that Gallup wants to know the proportion of adult Americans
who consider themselves to be baseball fans. If Gallup obtained a sample by
standing outside of Fenway Park (home of the Boston Red Sox professional baseball
team), the results of the survey are not likely to be reliable. Why? Clearly, the individuals
in the sample do not accurately reflect the makeup of the entire population. As
another example, suppose you wanted to learn the proportion of students on your
campus who work. It might be convenient to survey the students in your statistics
class, but do the students in your class represent the overall student body? Is the proportion
of freshmen, sophomores, juniors, and seniors in your class close to the
proportion of freshmen, sophomores, juniors, and seniors on campus? Is the proportion
of males and females in your class close to the proportion of males and females
on campus? Probably not. For this reason, the convenient sample is not representative
of the population, which means the results of your survey are misleading.

We will discuss four basic sampling techniques: simple random sampling, stratified
sampling, systematic sampling, and cluster sampling. These sampling methods are
designed so that any selection biases introduced (knowingly or unknowingly) by the
surveyor during the selection process are eliminated. In other words, the surveyor
does not have a choice as to which individuals are in the study. We will discuss simple
random sampling now and the remaining three types of sampling in the next section.

1 Obtain a Simple Random Sample

The most basic sample survey design is simple random sampling.

Definition
A sample of size n from a population of size N is obtained through simple
random sampling if every possible sample of size n has an equally likely chance
of occurring. The sample is then called a simple random sample.

In Other Words
Simple random sampling is like selecting names from a hat.

The sample is always a subset of the population, meaning that the number of individuals
in the sample is less than the number of individuals in the population.

Simple Random Sampling
This activity illustrates the idea of simple random sampling.
(a) Choose 5 students in the class to represent a population. Number
the students 1 through 5.
(b) Form all possible samples of size n = 2 from the population of size
N = 5. How many different simple random samples are possible?
(c) Write the numbers 1 through 5 on five pieces of paper and then
place the paper in a hat. Select two of the numbers. The two individuals
corresponding to these numbers are in the sample.
(d) Put the two numbers back in the hat. Select two of the numbers.
The two individuals corresponding to these numbers are in the sample.
Are the individuals in the second sample the same as the individuals in
the first sample?
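
For readers who would like to try the activity on a computer, the following is a minimal sketch in Python. It is only an illustration: the helper name draw_from_hat is made up for this sketch, and the students are represented simply by the numbers 1 through 5. It lists every possible sample for part (b) and then imitates the two draws in parts (c) and (d).

```python
import random
from itertools import combinations

students = [1, 2, 3, 4, 5]  # the five numbered students (N = 5)

# Part (b): every possible sample of size n = 2, ignoring order.
all_samples = list(combinations(students, 2))
print(len(all_samples), "possible samples:", all_samples)


def draw_from_hat(slips, n, rng=random):
    """Pull n slips from the hat without replacement (hypothetical helper)."""
    return sorted(rng.sample(slips, n))


# Parts (c) and (d): draw two slips, put them back, then draw two again.
first_draw = draw_from_hat(students, 2)
second_draw = draw_from_hat(students, 2)
print("first:", first_draw, "second:", second_draw,
      "same sample?", first_draw == second_draw)
```

Because no seed is set, the two draws will usually change from run to run, just as they would with a real hat.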


EXAMPLE 1
Illustrating Simple Random Sampling

Problem: Sophia has four tickets to a concert. Six of her friends, Yolanda, Michael,
Kevin, Marissa, Annie, and Katie, have all expressed an interest in going to the concert.
Sophia decides to randomly select three of her six friends to attend the concert.

(a) List all possible samples of size n = 3 (without replacement) from the population
of size N = 6.
(b) Comment on the likelihood of the sample containing Michael, Kevin, and Marissa.

Approach: We list all possible combinations of three people chosen from the six.
Remember, in simple random sampling, each sample of size 3 is equally likely to occur.
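
The approach can also be checked by brute force. The sketch below is an illustration in Python, not part of the text's method. It counts the possible samples of three friends and then estimates, by repeated random draws, how often one particular sample turns up. The function name estimate_chance, the number of trials, and the seed are arbitrary choices made for this sketch.

```python
import random
from itertools import combinations

friends = ["Yolanda", "Michael", "Kevin", "Marissa", "Annie", "Katie"]

# Every possible sample of size 3, chosen without replacement.
samples = list(combinations(friends, 3))
target = frozenset(["Michael", "Kevin", "Marissa"])


def estimate_chance(trials=20000, seed=1):
    """Estimate how often a random sample of 3 equals the target sample."""
    rng = random.Random(seed)
    hits = sum(frozenset(rng.sample(friends, 3)) == target
               for _ in range(trials))
    return hits / trials


print("number of possible samples:", len(samples))
print("chance of any one sample:", 1 / len(samples))
print("estimated chance of the target sample:", estimate_chance())
```

If every sample really is equally likely, the estimated chance should settle near 1 divided by the number of possible samples as the number of trials grows.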

Solution

(a) The possible samples of size 3 are listed in Table 2.

Table 2
Yolanda, Michael, Kevin      Yolanda, Michael, Marissa    Yolanda, Michael, Annie      Yolanda, Michael, Katie
Yolanda, Kevin, Marissa      Yolanda, Kevin, Annie        Yolanda, Kevin, Katie        Yolanda, Marissa, Annie
Yolanda, Marissa, Katie      Yolanda, Annie, Katie        Michael, Kevin, Marissa      Michael, Kevin, Annie
Michael, Kevin, Katie        Michael, Marissa, Annie      Michael, Marissa, Katie      Michael, Annie, Katie
Kevin, Marissa, Annie        Kevin, Marissa, Katie        Kevin, Annie, Katie          Marissa, Annie, Katie

From Table 2, we see that there are 20 possible samples of size 3 from the population
of size 6. We use the term sample to mean the individuals in the sample.

(b) There is 1 sample that contains Michael, Kevin, and Marissa and 20 possible samples,
so there is a 1 in 20 chance that the simple random sample will contain Michael,
Kevin, and Marissa. In fact, all the samples of size 3 have a 1 in 20 chance of occurring.

Now Work Problem 7

Obtaining a Simple Random Sample

The results of Example 1 leave one question unanswered: How do we select the
individuals in a simple random sample? To obtain a simple random sample from a
population, we could write the names of the individuals in the population on different
sheets of paper and then select names from a hat.

Often, however, the size of the population is so large that performing simple
random sampling in this fashion is not practical. Typically, random numbers are used
by assigning each individual in the population a unique number between 1 and N,
where N is the size of the population. Then n random numbers from this list are
selected, where n represents the size of the sample. Because we must number
the individuals in the population, we must have a list of all the individuals within the
population, called a frame.

In Other Words
A frame lists all the individuals in a population. For example, a list of all
registered voters in a particular precinct might be a frame.

EXAMPLE 2
Obtaining a Simple Random Sample

Problem: Senese and Associates has increased its accounting business. To make
sure their clients are still satisfied with the services they are receiving, Senese and
Associates decides to send a survey out to a simple random sample of 5 of its 30 clients.

Approach

Step 1: A list of the 30 clients must be obtained (the frame). Each client is then
assigned a unique number from 01 to 30.
Step 2: Five unique numbers will be randomly selected. The clients corresponding
to the numbers are sent a survey. This process is called sampling without replacement.
When we sample without replacement, once an individual is selected, he or she
is removed from the population and cannot be chosen again. Contrast this with
sampling with replacement, which means the selected individual is placed back into
the population and so could be chosen a second time. We use sampling without
replacement so that we don’t select the same client twice.

Solution

Step 1: Table 3 shows the list of clients. We arrange the clients in alphabetic order
(although this is not necessary). Because there are 30 clients, we number the clients
from 01 to 30.

Table 3
01. ABC Electric 11. Fox Studios 21. R&Q Realty
02. Brassil Construction 12. Haynes Hauling 22. Ritter Engineering
03. Bridal Zone 13. House of Hair 23. Simplex Forms
04. Casey’s Glass House 14. John’s Bakery 24. Spruce Landscaping
05. Chicago Locksmith 15. Logistics Management, Inc. 25. Thors, Robert DDS
06. DeSoto Painting 16. Lucky Larry’s Bistro 26. Travel Zone
07. Dino Jump 17. Moe’s Exterminating 27. Ultimate Electric
08. Euro Car Care 18. Nick’s Tavern 28. Venetian Gardens Restaurant
09. Farrell’s Antiques 19. Orion Bowling 29. Walker Insurance

10. First Fifth Bank 20. Precise Plumbing 30. Worldwide Wireless
Step 2: A table of random numbers can be used to select the individuals to be in the
sample. See Table 4.* We select a starting place in the table of random numbers. This
can be done by closing your eyes and placing your finger on the table. This may
sound haphazard, but it accomplishes the goal of being random. Suppose we start in
column 4, row 13. Because our data have two digits, we select two-digit numbers
from the table using columns 4 and 5. We only select numbers greater than or equal
to 01 and less than or equal to 30. Anytime we encounter 00, a number greater than
30, or a number already selected, we skip it and continue to the next number.

Table 4
       Column Number
Row    01–05  06–10  11–15  16–20  21–25  26–30  31–35  36–40  41–45  46–50
01     89392  23212  74483  36590  25956  36544  68518  40805  09980  00467
02     61458  17639  96252  95649  73727  33912  72896  66218  52341  97141
03     11452  74197  81962  48433  90360  26480  73231  37740  26628  44690
04     27575  04429  31308  02241  01698  19191  18948  78871  36030  23980
05     36829  59109  88976  46845  28329  47460  88944  08264  00843  84592
06     81902  93458  42161  26099  09419  89073  82849  09160  61845  40906
07     59761  55212  33360  68751  86737  79743  85262  31887  37879  17525
08     46827  25906  64708  20307  78423  15910  86548  08763  47050  18513
09     24040  66449  32353  83668  13874  86741  81312  54185  78824  00718
10     98144  96372  50277  15571  82261  66628  31457  00377  63423  55141
11     14228  17930  30118  00438  49666  65189  62869  31304  17117  71489
12     55366  51057  90065  14791  62426  02957  85518  28822  30588  32798
13     96101  30646  35526  90389  73634  79304  96635  06626  94683  16696
14     38152  55474  30153  26525  83647  31988  82182  98377  33802  80471
15     85007  18416  24661  95581  45868  15662  28906  36392  07617  50248
16     85544  15890  80011  18160  33468  84106  40603  01315  74664  20553
17     10446  20699  98370  17684  16932  80449  92654  02084  19985  59321
18     67237  45509  17638  65115  29757  80705  82686  48565  72612  61760
19     23026  89817  05403  82209  30573  47501  00135  33955  50250  72592
20     67411  58542  18678  46491  13219  84084  27783  34508  55158  78742

(Starting point: row 13, column 4. We skip 52 because it is larger than 30.)

*Each digit is in its own column. The digits are displayed in groups of five for ease of reading. The digits in
row 1 are 893922321274483, and so on. The first digit, 8, is in column 1; the second digit, 9, is in column 2;
the ninth digit, 1, is in column 9.

The first number in the list is 01, so the client corresponding to 01 will receive a
survey. Moving down the list, the next number is 52. Because 52 is greater than 30, we
skip it. Continuing down the list, the following numbers are selected from the list:

01, 07, 26, 11, 23

The clients corresponding to these numbers are

ABC Electric, Dino Jump, Travel Zone, Fox Studios, Simplex Forms

Each random number used to select the individuals in the sample is set in boldface
type in Table 4 to help you to understand where the numbers come from.
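
The table walk in Step 2 can be traced with a short program. The following Python sketch is an illustration under stated assumptions, not the text's procedure: the function name select_from_table is made up, only the first ten columns of Table 4 are typed in, and after reaching row 20 the walk is assumed to continue at the top of the next pair of columns (columns 6 and 7), which is consistent with the fifth number, 23, in the list above.

```python
# First ten columns of each row of Table 4 (rows 01 through 20).
TABLE_4 = {
    1: "8939223212", 2: "6145817639", 3: "1145274197", 4: "2757504429",
    5: "3682959109", 6: "8190293458", 7: "5976155212", 8: "4682725906",
    9: "2404066449", 10: "9814496372", 11: "1422817930", 12: "5536651057",
    13: "9610130646", 14: "3815255474", 15: "8500718416", 16: "8554415890",
    17: "1044620699", 18: "6723745509", 19: "2302689817", 20: "6741158542",
}


def select_from_table(table, start_row, start_col, n, largest_label):
    """Read two-digit numbers down a column pair, skipping unusable ones."""
    selected = []
    row, col = start_row, start_col
    last_row = max(table)
    while len(selected) < n:
        digits = table[row]
        if col + 1 > len(digits):
            raise ValueError("ran out of digits in the typed-in table")
        value = int(digits[col - 1:col + 1])  # columns col and col + 1
        # Skip 00, numbers above the largest label, and repeats.
        if 1 <= value <= largest_label and value not in selected:
            selected.append(value)
        row += 1
        if row > last_row:  # assumed rule: go to the top of the next column pair
            row, col = 1, col + 2
    return selected


print(select_from_table(TABLE_4, start_row=13, start_col=4, n=5, largest_label=30))
```

Starting somewhere else in the table, or reading across rows instead of down columns, would give a different but equally valid simple random sample.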


EXAMPLE 3
Obtaining a Simple Random Sample Using Technology

Problem: Find a simple random sample of five clients for the problem presented in
Example 2.

Approach: The approach is similar to that given in Example 2.

Step 1: A list of the 30 clients must be obtained (the frame). The clients are then
assigned a number from 01 to 30.
Step 2: Five numbers are randomly selected using a random number generator. The
clients corresponding to the numbers are given a survey. We sample without replacement
so that we don’t select the same client twice. To generate random numbers using
technology, we must first set the seed. The seed in a random-number
generator provides an initial point for the generator to start creating random numbers.
It is just like selecting the initial point in the table of random numbers. The
seed can be any nonzero number. Statistical software such as MINITAB or Excel
can be used to generate random numbers, but we will use a TI-84 Plus graphing
calculator. The steps for obtaining random numbers using MINITAB, Excel, and
the TI-83/84 graphing calculator can be found in the Technology Step-by-Step on
page 29.

Solution

Step 1: Table 3 on page 25 shows the list of clients and numbers corresponding to
the clients.
Step 2: See Figure 3(a) for an illustration of setting the seed using a TI-84 Plus
graphing calculator, where the seed is set at 34. We are now ready to obtain the list
of random numbers. Figure 3(b) shows the results obtained from a TI-84 Plus graphing
calculator.


Figure 3
(a) Setting the seed on a TI-84 Plus graphing calculator. (b) The random numbers generated.

Using Technology
If you are using a different statistical package or type of calculator, the
random numbers generated will likely be different. This does not mean you are
wrong. There is no such thing as a wrong random sample as long as the correct
procedures are followed.

Random-number generators are not truly random, because they are programs, and
programs do not act “randomly.” The seed dictates the random numbers that are
generated.

The following numbers are generated by the calculator:

11, 4, 20, 29, 11, 27

We ignore the second 11 because we are sampling without replacement. The clients
corresponding to these numbers are the clients to be surveyed: Fox Studios, Casey’s
Glass House, Precise Plumbing, Walker Insurance, and Ultimate Electric.

Now Work Problem 11
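
The same two steps can be carried out in general-purpose software. The following Python sketch is offered only as an illustration: the function name simple_random_sample is made up for this sketch, and Python's random-number generator is not the one in the TI-84 Plus, so a seed of 34 will almost certainly not reproduce the calculator's numbers.

```python
import random

# The frame: the 30 clients of Table 3, in order (client 01 first).
clients = [
    "ABC Electric", "Brassil Construction", "Bridal Zone",
    "Casey's Glass House", "Chicago Locksmith", "DeSoto Painting",
    "Dino Jump", "Euro Car Care", "Farrell's Antiques", "First Fifth Bank",
    "Fox Studios", "Haynes Hauling", "House of Hair", "John's Bakery",
    "Logistics Management, Inc.", "Lucky Larry's Bistro",
    "Moe's Exterminating", "Nick's Tavern", "Orion Bowling",
    "Precise Plumbing", "R&Q Realty", "Ritter Engineering", "Simplex Forms",
    "Spruce Landscaping", "Thors, Robert DDS", "Travel Zone",
    "Ultimate Electric", "Venetian Gardens Restaurant", "Walker Insurance",
    "Worldwide Wireless",
]


def simple_random_sample(frame, n, seed):
    """Pick n distinct client numbers (1 to N) using a seeded generator."""
    rng = random.Random(seed)  # setting the seed fixes the numbers generated
    numbers = rng.sample(range(1, len(frame) + 1), n)  # without replacement
    return [(k, frame[k - 1]) for k in numbers]


for number, name in simple_random_sample(clients, 5, seed=34):
    print(f"{number:02d}. {name}")
```

Here random.sample never returns the same number twice, so there is no repeated value to ignore as there was with the calculator. Running the sketch again with the same seed gives the same five clients, and changing the seed gives a different sample.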


There is a very important consequence when comparing the by-hand and
technology solutions from Examples 2 and 3. Because both samples were obtained
randomly, they resulted in different individuals in the sample! For this reason, each
sample will likely result in different descriptive statistics. Any inference based on
each sample may result in different conclusions regarding the population. This is the
nature of statistics. Inferences based on samples will vary because the individuals in
different samples vary.

1.3 ASSESS YOUR UNDERSTANDING
Concepts and Vocabulary

1. Explain why a frame is necessary to obtain a simple random
sample.
2. Discuss why sampling is used in statistics.
3. What does it mean when sampling is done without
replacement?
4. What is random sampling? Why is it used and how does it
compare with convenience sampling?
Skill Building

5. Literature As part of a college literature course, students
must select three classic works of literature from the provided
list and complete critical book reviews for each selected
work. Obtain a simple random sample of size 3 from this list.
Write a short description of the process you used to generate
your sample.

Pride and Prejudice     The Sun Also Rises     The Jungle
As I Lay Dying          A Tale of Two Cities   Huckleberry Finn
Death of a Salesman     Scarlet Letter         Crime and Punishment

6. Team Captains A coach must select two players to serve as
captains at the beginning of a soccer match. He has 10 players
on his team and, to be fair, wants to randomly select 2
players to be the captains. Obtain a simple random sample of
size 2 from the following list. Write a short description of the
process you used to generate your sample.

Mady     Breanne   Jory
Evin     Tori      Payton
Emily    Claire    Jordyn
Caty

7. Course Selection A student entering a doctoral program in
educational psychology is required to select two courses from
the list of courses provided as part of his or her program.
EPR 616, Research in Child Development
EPR 630, Educational Research Planning and Interpretation
EPR 631, Nonparametric Statistics
EPR 632, Methods of Multivariate Analysis
EPR 645, Theory of Measurement
EPR 649, Fieldwork Methods in Educational Research
EPR 650, Interpretive Methods in Educational Research
(a) List all possible two-course selections.
(b) Comment on the likelihood that the pair of courses EPR
630 and EPR 645 will be selected.
8. Merit Badge Requirements To complete the Citizenship in
the World merit badge, one must select TWO of the following
organizations and describe their role in the world.
Source: Boy Scouts of America
1. The United Nations
2. The World Court
3. World Organization of the Scout Movement
4. The World Health Organization
5. Amnesty International
6. The International Committee of the Red Cross
7. CARE
(a) List all possible pairs of organizations.
(b) Comment on the likelihood that the pair The United
Nations and Amnesty International will be selected.
Applying the Concepts

9. Sampling the Faculty A small community college employs
87 full-time faculty members. To gain the faculty’s opinions
about an upcoming building project, the college president
wishes to obtain a simple random sample that will consist of
9 faculty members. He numbers the faculty from 1 to 87.
(a) Using Table I from Appendix A, the president closes his
eyes and drops his ink pen on the table. It points to the digit
in row 5, column 22. Using this position as the starting point
and proceeding downward, determine the numbers for the
9 faculty members who will be included in the sample.
(b) If the president uses technology, determine the numbers
for the 9 faculty members who will be included in the
sample.
10. Sampling the Students The same community college from
Problem 9 has 7,656 students currently enrolled in classes.
To gain the students’ opinions about an upcoming building project, the college president wishes to obtain a simple random sample of
20 students. He numbers the students from 1 to 7,656.

(a) Using Table I from Appendix A, the president closes his eyes and drops his ink pen on the table. It points to the digit in row 11,
column 32. Using this position as the starting point and proceeding downward, determine the numbers for the 20 students who
will be included in the sample.
(b) If the president uses technology, determine the numbers for the 20 students who will be included in the sample.
11. Obtaining a Simple Random Sample The following table lists the 50 states.
(a) Obtain a simple random sample of size 10 using Table I in Appendix A, a graphing calculator, or computer software.
(b) Obtain a second simple random sample of size 10 using Table I in Appendix A, a graphing calculator, or computer software.
1. Alabama 11. Hawaii 21. Massachusetts 31. New Mexico 41. South Dakota
2. Alaska 12. Idaho 22. Michigan 32. New York 42. Tennessee
3. Arizona 13. Illinois 23. Minnesota 33. North Carolina 43. Texas
4. Arkansas 14. Indiana 24. Mississippi 34. North Dakota 44. Utah
5. California 15. Iowa 25. Missouri 35. Ohio 45. Vermont
6. Colorado 16. Kansas 26. Montana 36. Oklahoma 46. Virginia
7. Connecticut 17. Kentucky 27. Nebraska 37. Oregon 47. Washington
8. Delaware 18. Louisiana 28. Nevada 38. Pennsylvania 48. West Virginia
9. Florida 19. Maine 29. New Hampshire 39. Rhode Island 49. Wisconsin

10. Georgia 20. Maryland 30. New Jersey 40. South Carolina 50. Wyoming
12. Obtaining a Simple Random Sample The following table lists the 44 presidents of the United States.
(a) Obtain a simple random sample of size 8 using Table I in Appendix A, a graphing calculator, or computer software.
(b) Obtain a second simple random sample of size 8 using Table I in Appendix A, a graphing calculator, or computer software.
1. Washington 10. Tyler 19. Hayes 28. Wilson 37. Nixon
2. J. Adams 11. Polk 20. Garfield 29. Harding 38. Ford
3. Jefferson 12. Taylor 21. Arthur 30. Coolidge 39. Carter
4. Madison 13. Fillmore 22. Cleveland 31. Hoover 40. Reagan
5. Monroe 14. Pierce 23. B. Harrison 32. F. D. Roosevelt 41. G. H. Bush
6. J. Q. Adams 15. Buchanan 24. Cleveland 33. Truman 42. Clinton
7. Jackson 16. Lincoln 25. McKinley 34. Eisenhower 43. G. W. Bush
8. Van Buren 17. A. Johnson 26. T. Roosevelt 35. Kennedy 44. Obama
9. W. H. Harrison 18. Grant 27. Taft 36. L. B. Johnson

13. Obtaining a Simple Random Sample Suppose you are the
president of the student government. You wish to conduct a
survey to determine the student body’s opinion regarding
student services. The administration provides you with a list
of the names and phone numbers of the 19,935 registered
students.
(a) Discuss the procedure you would follow to obtain a simple
random sample of 25 students.
(b) Obtain this sample.
14. Obtaining a Simple Random Sample Suppose the mayor of
Justice, Illinois, asks you to poll the residents of the village.
The mayor provides you with a list of the names and phone
numbers of the 5,832 residents of the village.
(a) Discuss the procedure you would follow to obtain a
simple random sample of 20 residents.
(b) Obtain this sample.
15. Future Government Club The Future Government Club
wants to sponsor a panel discussion on the upcoming
national election. The club wants four of its members to lead
the panel discussion. Obtain a simple random sample of size
4 from the table. Write a short description of the process you
used to generate your sample.
Blouin Fallenbuchel Niemeyer Rice
Bolden Grajewski Nolan Salihar
Bolt Haydra Ochs Tate
Carter Keating Opacian Thompson
Cooper Khouri Pawlak Trudeau
Debold Lukens Pechtold Washington
De Young May Ramirez Wright
Engler Motola Redmond Zenkel



16. Worker Morale The owner of a private food store is concerned
about employee morale. She decides to survey the
employees to see if she can learn about work environment
and job satisfaction. Obtain a simple random sample of size
5 from the names in the given table. Write a short description
of the process you used to generate your sample.

Archer     Foushi    Kemp      Oliver
Bolcerek   Gow       Lathus    Orsini
Bryant     Grove     Lindsey   Salazar
Carlisle   Hall      Massie    Ullrich
Cole       Hills     McGuffin  Vaneck
Dimas      Houston   Musa      Weber
Ellison    Kats      Nickas    Zavodny
Everhart

TECHNOLOGY STEP-BY-STEP Obtaining a Simple Random Sample

TI-83/84 Plus

1. Enter any nonzero number (the seed) on the HOME screen.
2. Press the STO▶ button.
3. Press the MATH button.
4. Highlight the PRB menu and select 1: rand.
5. From the HOME screen press ENTER.
6. Press the MATH button. Highlight the PRB menu and select 5: randInt(.
7. With randInt( on the HOME screen, enter 1, N, where N is the population size. For example, if N = 500, enter the following:

randInt(1,500)

Press ENTER to obtain the first individual in the sample. Continue pressing ENTER until the desired
sample size is obtained.

MINITAB

1. Select the Calc menu and highlight Set Base….
2. Enter any seed number you desire. Note that it
is not necessary to set the seed, because MINITAB
uses the time of day in seconds to set the seed.
3. Select the Calc menu, highlight Random Data,
and select Integer….
4. Fill in the following window with the appropriate
values. To obtain a simple random sample for the
situation in Example 2, we would enter the following:

The reason we generate 10 rows of data (instead
of 5) is in case any of the random numbers repeat.
Select OK, and the random numbers will appear in
column 1 (C1) in the spreadsheet.

Excel

1. Be sure the Data Analysis ToolPak is activated.
This is done by selecting the Tools menu and
highlighting Add-Ins…. Check the box for the
Analysis ToolPak and select OK.
2. Select Tools and highlight Data Analysis….
Highlight Random Number Generation and
select OK.
3. Fill in the window with the appropriate values.
To obtain a simple random sample for the situation
in Example 2, we would fill in the following:

The reason we generate 10 rows of data (instead of 5)
is in case any of the random numbers repeat. Notice
also that the parameter is between 1 and 31, so any
value greater than or equal to 1 and less than or equal
to 31 is possible. In the unlikely event that 31 appears,
simply ignore it. Select OK and the random numbers
will appear in column 1 (A1) in the spreadsheet.
Ignore any values to the right of the decimal place.
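
For readers working in a general-purpose language instead, the same idea (set a seed, draw integers between 1 and N, discard repeats) can be sketched in Python with the standard `random` module. This is an illustrative sketch only: the function name `simple_random_sample` is hypothetical, the seed of 34 and N = 500 are borrowed from the examples above, and Python's generator will not reproduce the numbers produced by a TI-84 Plus, MINITAB, or Excel even with the same seed.

```python
import random

def simple_random_sample(population_size, sample_size, seed):
    """Draw integers from 1..population_size until sample_size distinct values are found."""
    rng = random.Random(seed)      # setting the seed makes the run repeatable
    chosen = []
    while len(chosen) < sample_size:
        draw = rng.randint(1, population_size)   # inclusive at both ends, like randInt(1,N)
        if draw not in chosen:                   # ignore any repeated number
            chosen.append(draw)
    return chosen

sample = simple_random_sample(population_size=500, sample_size=5, seed=34)
print(sample)

# The same seed always yields the same "random" sample.
print(sample == simple_random_sample(500, 5, 34))  # True
```

Python also offers `random.sample(range(1, N + 1), n)`, which samples without replacement in a single call; the loop above is written out only to mirror the press-ENTER-and-skip-repeats procedure.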




1.4 OTHER EFFECTIVE SAMPLING METHODS
Objectives 1 Obtain a stratified sample
2 Obtain a systematic sample
3 Obtain a cluster sample
The goal of sampling is to obtain as much information as possible about the population
at the least cost. Remember, we are using the word cost in a general sense. Cost
includes monetary outlays, time, and other resources. With this goal in mind, we may
find it advantageous to use sampling techniques other than simple random sampling.

1 Obtain a Stratified Sample

Under certain circumstances, stratified sampling provides more information about
the population for less cost than simple random sampling.

Definition
A stratified sample is obtained by separating the population into nonoverlapping
groups called strata and then obtaining a simple random sample from
each stratum. The individuals within each stratum should be homogeneous (or
similar) in some way.

For example, suppose Congress was considering a bill that abolishes estate
taxes. In an effort to determine the opinion of her constituency, a senator asks a pollster
to conduct a survey within her district. The pollster may divide the population
of registered voters within the district into three strata: Republican, Democrat, and
Independent. This grouping makes sense because the members within each of the
three party affiliations may have the same opinion regarding estate taxes, but opinions
between parties may differ. The main criterion in performing a stratified sample
is that each group (stratum) must have a common attribute that results in the
individuals being similar within the stratum.

An advantage of stratified sampling over simple random sampling is that it may
allow fewer individuals to be surveyed while obtaining the same or more information.
This result occurs because individuals within each subgroup have similar
characteristics, so opinions within the group are not as likely to vary much from one
individual to the next. In addition, a stratified sample guarantees that each stratum
is represented in the sample.

In Other Words
Stratum is singular, while strata is plural. The word strata means divisions.
So a stratified sample is a simple random sample of different divisions of
the population.

EXAMPLE 1
Obtaining a Stratified Sample

Problem: The president of DePaul University wants to conduct a survey to determine
the community’s opinion regarding campus safety. The president divides the
DePaul community into three groups: resident students, nonresident (commuting)
students, and staff (including faculty) so that he can obtain a stratified sample. Suppose
there are 6,204 resident students, 13,304 nonresident students, and 2,401 staff,
for a total of 21,909 individuals in the population. The president wants to obtain a
sample of size 100, with the number of individuals selected from each stratum
weighted by the population size. So resident students make up 6,204/21,909 ≈ 28%
of the sample, nonresident students account for 61% of the sample, and staff constitute
11% of the sample. To obtain a sample of size 100, the president will obtain a
stratified sample of 0.28(100) = 28 resident students, 0.61(100) = 61 nonresident
students, and 0.11(100) = 11 staff.

Approach: To obtain the stratified sample, conduct a simple random sample within
each group. That is, obtain a simple random sample of 28 resident students (from the
6,204 resident students), a simple random sample of 61 nonresident students, and a
simple random sample of 11 staff. Be sure to use a different seed for each stratum.


Solution: Using MINITAB, with the seed set to 4032 and the values shown in
Figure 4, we obtain the following sample of staff:
240, 630, 847, 190, 2096, 705, 2320, 323, 701, 471, 744

Figure 4


Do not use the same seed
(or starting point in Table I) for all the
groups in a stratified sample, because
we want the simple random samples
within each stratum to be
independent of each other.


Repeat this procedure for the resident and nonresident students using a different
seed.
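
The allocation arithmetic and the per-stratum sampling in Example 1 can be sketched in Python. This is a minimal sketch under stated assumptions: individuals in each stratum are assumed to be numbered from 1 up to the stratum size, the seed 4032 for staff comes from the solution above, and the other two seeds are hypothetical. Python's generator will not reproduce the MINITAB output.

```python
import random

strata_sizes = {"resident": 6204, "nonresident": 13304, "staff": 2401}
sample_size = 100
total = sum(strata_sizes.values())   # 21,909 individuals in all

# Proportional allocation: each stratum's share of the population, rounded.
allocation = {name: round(sample_size * size / total)
              for name, size in strata_sizes.items()}
print(allocation)  # {'resident': 28, 'nonresident': 61, 'staff': 11}

# A different seed for each stratum keeps the within-stratum samples independent.
# 4032 is the staff seed from the text; the other two are hypothetical.
seeds = {"resident": 1001, "nonresident": 2002, "staff": 4032}

stratified_sample = {}
for name, size in strata_sizes.items():
    rng = random.Random(seeds[name])
    # Simple random sample, without replacement, from the numbers 1..size.
    stratified_sample[name] = rng.sample(range(1, size + 1), allocation[name])

print(len(stratified_sample["staff"]))  # 11
```

In this example the rounded allocations happen to add up to 100. With other stratum sizes, rounding can leave the total one or two individuals off, in which case the allocation needs a small manual adjustment.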

An advantage of stratified sampling over simple random sampling is that the
researcher is able to determine characteristics within each stratum. This allows an
analysis to be performed on each subgroup to see if any significant differences between
the groups exist. For example, we could analyze the data obtained in Example 1
to see if there is a difference in the opinions of students versus staff.

Now Work Problem 25
2 Obtain a Systematic Sample

In both simple random sampling and stratified sampling, it is necessary for a list of
the individuals in the population being studied (the frame) to exist. Therefore, these
sampling techniques require some preliminary work before the sample is obtained.
A sampling technique that does not require a frame is systematic sampling.

Definition
A systematic sample is obtained by selecting every kth individual from the
population. The first individual selected corresponds to a random number
between 1 and k.

Because systematic sampling does not require a frame, it is a useful technique
when you can’t obtain a list of the individuals in the population that you wish to
study.

The idea behind obtaining a systematic sample is relatively simple: Select a
number k, randomly select a number between 1 and k and survey that individual,
then survey every kth individual thereafter. For example, we might decide to survey
every k = 8th individual. We randomly select a number between 1 and 8, such as 5.
This means we survey the 5th, 5 + 8 = 13th, 13 + 8 = 21st, 21 + 8 = 29th, and so
on, individuals until we reach the desired sample size.

EXAMPLE 2
Obtaining a Systematic Sample without a Frame

Problem: The manager of Kroger Food Stores wants to measure the satisfaction
of the store’s customers. Design a sampling technique that can be used to obtain a
sample of 40 customers.

Approach: A frame of Kroger customers would be difficult, if not impossible, to
obtain. Therefore, it is reasonable to use systematic sampling by surveying every
kth customer who leaves the store.


Solution: The manager decides to obtain a systematic sample by surveying
every 7th customer. He randomly determines a number between 1 and 7, say 5.
He then surveys the 5th customer exiting the store and every 7th customer thereafter,
until a sample of 40 customers is reached. The survey will include customers
5, 12, 19, ..., 278.*

Now Work Problem 27


But how do we select the value of k? If the size of the population is unknown,
there is no mathematical way to determine k. It must be chosen by determining a
value of k that is not so large that we are unable to achieve our desired sample
size, but not so small that we obtain a sample that is not representative of the
population.

To clarify this point, let’s revisit Example 2. Suppose we chose a value of k that
was too large, say 30. This means that we will survey every 30th shopper, starting
with the 5th. To obtain a sample of size 40 would require that 1,175 shoppers visit
Kroger on that day. If Kroger does not have 1,175 shoppers, the desired sample size
will not be achieved. On the other hand, if k is too small, say 4, we would survey the
5th, 9th, ..., 161st shopper. It may be that the 161st shopper exits the store at 3 P.M.,
which means our survey did not include any of the evening shoppers. Certainly, this
sample is not representative of all Kroger patrons! An estimate of the size of the
population would certainly help determine an appropriate value for k.
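
The shopper counts quoted in this discussion follow from one line of arithmetic: the last person surveyed is at position start + (n - 1)k. A short Python check, with the hypothetical helper name `last_position`, reproduces the three figures used for the Kroger example.

```python
def last_position(start, k, n):
    """Position in the exit order of the n-th (final) person surveyed."""
    return start + (n - 1) * k

print(last_position(5, 30, 40))  # 1175: shoppers needed when k = 30
print(last_position(5, 7, 40))   # 278:  the value of k chosen in Example 2
print(last_position(5, 4, 40))   # 161:  sampling ends early when k = 4
```

Comparing this last position with a rough estimate of daily traffic is one way to judge whether a proposed k is too large or too small.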

Determining the value of k when the size of the population, N, is known is
relatively straightforward. Suppose we wish to survey a population whose size is
known to be N = 20,325 and we desire a sample of size n = 100. To guarantee
that individuals are selected evenly from both the beginning and the end of the
population (such as early and late shoppers), we compute N/n and round down to
the nearest integer. For example, 20,325/100 = 203.25, so k = 203. Then we randomly
select a number between 1 and 203 and select every 203rd individual thereafter.
So, if we randomly selected 90 as our starting point, we would survey the
90th, 293rd, 496th, ..., 20,187th individuals.

We summarize the procedure as follows:

Steps in Systematic Sampling

Step 1: If possible, approximate the population size, N.

Step 2: Determine the sample size desired, n.

Step 3: Compute N/n and round down to the nearest integer. This value is k.

Step 4: Randomly select a number between 1 and k. Call this number p.

Step 5: The sample will consist of the following individuals:

p, p + k, p + 2k, ..., p + (n - 1)k
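
The five steps above can be sketched in Python as follows. The function names `choose_k`, `choose_start`, and `systematic_positions` are hypothetical, and the seed passed to `choose_start` is an arbitrary illustration. The worked values use N = 20,325, n = 100, and the starting point p = 90 from the preceding discussion.

```python
import random

def choose_k(N, n):
    """Step 3: compute N/n and round down to the nearest integer."""
    return N // n

def choose_start(k, seed=None):
    """Step 4: randomly select a number between 1 and k (inclusive)."""
    return random.Random(seed).randint(1, k)

def systematic_positions(p, k, n):
    """Step 5: p, p + k, p + 2k, ..., p + (n - 1)k."""
    return [p + i * k for i in range(n)]

k = choose_k(20325, 100)
print(k)  # 203

# Using the starting point from the text (p = 90) rather than a fresh random draw.
positions = systematic_positions(90, k, 100)
print(positions[:3], positions[-1])  # [90, 293, 496] 20187

# A random start for a new sample; seed 12 is arbitrary.
p = choose_start(k, seed=12)
```

Rounding N/n down, rather than up, keeps the last selected position p + (n - 1)k from exceeding N for any start between 1 and k.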


Because systematic sampling does not require a frame, it typically provides
more information for a given cost than does simple random sampling. In addition,
systematic sampling is easier to employ, so there is less likelihood of interviewer
error occurring, such as selecting the wrong individual to be surveyed.

3 Obtain a Cluster Sample

A fourth sampling method is called cluster sampling. The previous three sampling
methods discussed have benefits under certain circumstances. So does cluster
sampling.

*Because we are surveying 40 customers, the first individual surveyed is the 5th, the second is the
5 + 7 = 12th, the third is the 5 + (2)7 = 19th, and so on, until we reach the 40th, which is the
5 + (39)7 = 278th shopper.



Definition
A cluster sample is obtained by selecting all individuals within a randomly
selected collection or group of individuals.

In Other Words

Imagine a mall parking lot. Each
subsection of the lot could be a cluster
(Section F-4, for example).

Suppose a school administrator wants to learn the characteristics of students
enrolled in online classes. Rather than obtaining a simple random sample based on
the frame of all students enrolled in online classes, the administrator could treat
each online class as a cluster and then obtain a simple random sample of these clusters.
The administrator would then survey all the students in the selected clusters.

Stratified and cluster samples
are different. In a stratified sample,
we divide the population into two or
more homogeneous groups. Then we
obtain a simple random sample from
each group. In a cluster sample, we
divide the population into groups,
obtain a simple random sample of
some of the groups, and survey all
individuals in the selected groups.

EXAMPLE 3
Obtaining a Cluster Sample

Problem: A sociologist wants to gather data regarding household income within
the city of Boston. Obtain a sample using cluster sampling.

Approach: The city of Boston can be set up so that each city block is a cluster.
Once the city blocks have been identified, we obtain a simple random sample of the
city blocks and survey all households on the blocks selected.

Solution: Suppose there are 10,493 city blocks in Boston. First, we must number
the blocks from 1 to 10,493. Suppose the sociologist has enough time and money to
survey 20 clusters (city blocks). Therefore, the sociologist should obtain a simple
random sample of 20 numbers between 1 and 10,493 and survey all households from
the clusters selected. Cluster sampling is a good choice in this example because it reduces
the travel time to households that is likely to occur with both simple random
sampling and stratified sampling. In addition, there is no need to obtain a detailed
frame with cluster sampling. The only frame needed is one that provides information
regarding city blocks.

Now Work Problem 13
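
The selection in Example 3 can be sketched in Python. This is only an illustration: the seed is hypothetical, and `households_on` is a made-up stand-in for the field work of listing the households on a block (here it simply pretends every block has three households).

```python
import random

NUM_BLOCKS = 10_493          # city blocks, numbered 1..10,493 as in the example
CLUSTERS_TO_SAMPLE = 20

rng = random.Random(2011)    # hypothetical seed
# Simple random sample of block numbers, without replacement.
selected_blocks = sorted(rng.sample(range(1, NUM_BLOCKS + 1), CLUSTERS_TO_SAMPLE))

def households_on(block):
    """Hypothetical lookup standing in for listing a block's households."""
    return [f"block {block}, household {h}" for h in range(1, 4)]

# Cluster sampling surveys every household on each selected block.
to_survey = [hh for block in selected_blocks for hh in households_on(block)]

print(len(selected_blocks), len(to_survey))  # 20 60
```

Randomness enters only in the choice of blocks. Within a selected block nothing is sampled, which is the key difference from a stratified design.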


Recall that in systematic sampling we had to determine an appropriate value
for k, the number of individuals to skip between individuals selected to be in the
sample. We have a similar problem in cluster sampling. The following are a few of
the questions that arise:

• How do I cluster the population?
• How many clusters do I sample?
• How many individuals should be in each cluster?

First, it must be determined whether the individuals within the proposed cluster
are homogeneous (similar individuals) or heterogeneous (dissimilar individuals).
Consider the results of Example 3. City blocks tend to have similar households.
Surveying one house on a city block is likely to result in similar responses from
another house on the same block. This results in duplicate information. We conclude
the following: If the clusters have homogeneous individuals, it is better to have more
clusters with fewer individuals in each cluster.

What if the cluster is heterogeneous? Under this circumstance, the heterogeneity
of the cluster likely resembles the heterogeneity of the population. In other
words, each cluster is a scaled-down representation of the overall population. For
example, a quality-control manager might use shipping boxes that contain 100 light
bulbs as a cluster, since the rate of defects within the cluster would closely mimic the
rate of defects in the population, assuming the bulbs are randomly placed in the box.
Thus, when each cluster is heterogeneous, fewer clusters with more individuals in
each cluster are appropriate.

The four sampling techniques just presented are sampling techniques in
which the individuals are selected randomly. Often, however, sampling methods
are used in which the individuals are not randomly selected, such as convenience
sampling.


Convenience Sampling

Have you ever been stopped in the mall by someone holding a clipboard? These
folks are responsible for gathering information, but their methods of data collection
are inappropriate, and the results of their analysis are suspect because they obtained
their data using a convenience sample.

Definition
A convenience sample is a sample in which the individuals are easily obtained
and not based on randomness.

Studies that use convenience
sampling generally have results that
are suspect. The results should be
looked on with extreme skepticism.

There are many types of convenience samples, but probably the most popular
are those in which the individuals in the sample are self-selected (the individuals
themselves decide to participate in a survey). These are also called voluntary
response samples. Examples of self-selected sampling include phone-in polling; a
radio personality will ask his or her listeners to phone the station to submit their
opinions. Another example is the use of the Internet to conduct surveys. For example,
Dateline will present a story regarding a certain topic and ask its viewers to “tell
us what you think” by completing a questionnaire online or phoning in an opinion.
Both of these samples are poor designs because the individuals who decide to be in
the sample generally have strong opinions about the topic. A more typical individual
in the population will not bother phoning or logging on to a computer to complete
a survey. Any inference made regarding the population from this type of
sample should be made with extreme caution.

The reason convenience samples yield unreliable results is that the individuals
chosen to participate in the survey are not chosen using random sampling. Instead,
the interviewer or participant selects who is in the survey. Do you think an interviewer
would select an ornery individual? Of course not! Therefore, the sample is
likely not to be representative of the population.

Multistage Sampling

In practice, most large-scale surveys obtain samples using a combination of the techniques
just presented.

As an example of multistage sampling, consider Nielsen Media Research.
Nielsen randomly selects households and monitors the television programs these
households are watching through a People Meter. The meter is an electronic box
placed on each TV within the household. The People Meter measures what program
is being watched and who is watching it. Nielsen selects the households with the use
of a two-stage sampling process.

Stage 1: Using U.S. Census data, Nielsen divides the country into geographic
areas (strata). The strata are typically city blocks in urban areas and geographic
regions in rural areas. About 6,000 strata are randomly selected.

Stage 2: Nielsen sends representatives to the selected strata and lists the households
within the strata. The households are then randomly selected through a
simple random sample.
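
A two-stage design of this general shape can be sketched in Python. Everything in the sketch is a toy assumption made for illustration: the 50 areas, the household counts, the 6 areas selected, the 3 households per area, and the seed. It does not describe Nielsen's actual procedure or scale.

```python
import random

rng = random.Random(7)   # hypothetical seed

# Toy frame of geographic areas. Each area's household list stands in for
# the listing that field representatives would compile in Stage 2.
areas = {a: [f"area{a}-hh{h}" for h in range(1, rng.randint(8, 20) + 1)]
         for a in range(1, 51)}

# Stage 1: randomly select some of the areas.
selected_areas = rng.sample(sorted(areas), 6)

# Stage 2: a simple random sample of households within each selected area.
HOUSEHOLDS_PER_AREA = 3
sample = {a: rng.sample(areas[a], HOUSEHOLDS_PER_AREA) for a in selected_areas}

for area, households in sample.items():
    print(area, households)
```

Only the selected areas need a household list, which is why a multistage design can be far cheaper than building a frame of every household in the country.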

Nielsen sells the information obtained to television stations and companies. These
results are used to help determine prices for commercials.

As another example of multistage sampling, consider the sample used by the
Census Bureau for the Current Population Survey. This survey requires five stages
of sampling:

Stage 1: Stratified sample

Stage 2: Cluster sample

Stage 3: Stratified sample

Stage 4: Cluster sample

Stage 5: Systematic sample



This survey is very important because it is used to obtain demographic estimates of
the United States in noncensus years. A detailed presentation of the sampling
method used by the Census Bureau can be found in The Current Population Survey:
Design and Methodology, Technical Paper No. 40.

Sample Size Considerations

Throughout our discussion of sampling, we did not mention how to determine the
sample size. Determining the sample size is key in the overall statistical process. In
other words, the researcher must ask this question: “How many individuals must I
survey to draw conclusions about the population within some predetermined margin
of error?” The researcher must find the correct balance between the reliability
of the results and the cost of obtaining these results. The bottom line is that time and
money determine the level of confidence a researcher will place on the conclusions
drawn from the sample data. The more time and money the researcher has available,
the more accurate the results of the statistical inference will be.

Nonetheless, techniques do exist for determining the sample size required to
estimate characteristics regarding the population within some margin of error.
We will consider some of these techniques in Sections 9.1 and 9.3. (For a detailed
discussion of sample size considerations, consult a text on sampling techniques
such as Elements of Sampling Theory and Methods by Z. Govindarajulu, Prentice
Hall, 1999.)

Summary

Figure 5 provides a summary of the four sampling techniques presented.

Figure 5: Summary of the four sampling techniques. Panels: Simple Random Sampling (population, sample); Stratified Sampling (population, strata, sample); Systematic Sampling (population, sample with every 3rd person selected); Cluster Sampling (population, cluster population, randomly selected clusters).


Different Sampling Methods
The following question was recently asked by the Gallup Organization:
In general, are you satisfied or dissatisfied with the way things are going
in the country?
(a) Number the students in the class from 1 to N, where N is the
number of students. Obtain a simple random sample and have them
answer this question. Record the number of satisfied responses and the
number of dissatisfied responses.
(b) Divide the students in the class by gender. Treat each gender as a
stratum. Obtain a simple random sample from each stratum and have
them answer this question. Record the number of satisfied responses
and the number of dissatisfied responses.
(c) Treat each row of desks as a cluster. Obtain a simple random sample
of clusters and have each student in the selected clusters answer this
question. Record the number of satisfied responses and the number
of dissatisfied responses.
(d) Number the students in the class from 1 to N, where N is the number
of students. Obtain a systematic sample and have the selected students
answer this question. Record the number of satisfied responses and the
number of dissatisfied responses.
(e) Were there any differences in the results of the survey? State some
reasons for any differences.
1.4 ASSESS YOUR UNDERSTANDING
Concepts and Vocabulary

1. Describe a circumstance in which stratified sampling would
be an appropriate sampling method.
2. Which sampling method does not require a frame?
3. Why are convenience samples ill advised?
4. A(n) ______ is obtained by dividing the population into
groups and selecting all individuals from within a random
sample of the groups.
5. A(n) ______ is obtained by dividing the population into
homogeneous groups and randomly selecting individuals
from each group.
6. True or False: When taking a systematic random sample of
size n, every group of size n from the population has the
same chance of being selected.
7. True or False: A simple random sample is always preferred
because it obtains the same information as other sampling
plans but requires a smaller sample size.
8. True or False: When conducting a cluster sample, it is better
to have fewer clusters with more individuals when the clusters
are heterogeneous.
9. True or False: Inferences based on voluntary response samples
are generally not reliable.
10. True or False: When obtaining a stratified sample, the number
of individuals included within each stratum must be
equal.
Skill Building

In Problems 11–22, identify the type of sampling used.

11. To estimate the percentage of defects in a recent manufacturing
batch, a quality-control manager at Intel selects every
8th chip that comes off the assembly line starting with the
3rd until she obtains a sample of 140 chips.
12. To determine the prevalence of human growth hormone
(HGH) use among high school varsity baseball players, the
State Athletic Commission randomly selects 50 high schools.
All members of the selected high schools’ varsity baseball
teams are tested for HGH.
13. To determine customer opinion of its boarding policy, Southwest
Airlines randomly selects 60 flights during a certain
week and surveys all passengers on the flights.
14. A member of Congress wishes to determine her constituency’s
opinion regarding estate taxes. She divides her
constituency into three income classes: low-income households,
middle-income households, and upper-income households.
She then takes a simple random sample of households
from each income class.
15. In an effort to identify if an advertising campaign has been
effective, a marketing firm conducts a nationwide poll by
randomly selecting individuals from a list of known users of
the product.
16. A radio station asks its listeners to call in their opinion regarding
the use of U.S. forces in peacekeeping missions.

17.
A farmer divides his orchard into 50 subsections, randomly
selects 4, and samples all the trees within the 4 subsections to
approximate the yield of his orchard.
18.
A school official divides the student population into five
classes: freshman, sophomore, junior, senior, and graduate
student. The official takes a simple random sample from
each class and asks the members’ opinions regarding student
services.
19.
A survey regarding download time on a certain website is
administered on the Internet by a market research firm to
anyone who would like to take it.
20.
The presider of a guest-lecture series at a university stands
outside the auditorium before a lecture begins and hands
every fifth person who arrives, beginning with the third, a
speaker evaluation survey to be completed and returned at
the end of the program.
21.
To determine his DSL Internet connection speed, Shawn
divides up the day into four parts: morning, midday,
evening, and late night. He then measures his Internet
connection speed at 5 randomly selected times during each
part of the day.
22.
24 Hour Fitness wants to administer a satisfaction survey to
its current members. Using its membership roster, the club
randomly selects 40 club members and asks them about their
level of satisfaction with the club.
23.
A salesperson obtained a systematic sample of size 20 from
a list of 500 clients. To do so, he randomly selected a number
from 1 to 25, obtaining the number 16. He included in the
sample the 16th client on the list and every 25th client
thereafter. List the numbers that correspond to the 20 clients
selected.
24.
A quality-control expert wishes to obtain a cluster sample by
selecting 10 of 795 clusters. She numbers the clusters from 1 to
795. Using Table I from Appendix A, she closes her eyes and
drops a pencil on the table. It points to the digit in row 8,
column 38. Using this position as the starting point and
proceeding downward, determine the numbers for the 10 clusters
selected.
Applying the Concepts
25.
Stratified Sampling The Future Government Club wants to
sponsor a panel discussion on the upcoming national election.
The club wants to have four of its members lead the
panel discussion. To be fair, however, the panel should consist
of two Democrats and two Republicans. From the list of
current members of the club, obtain a stratified sample of
two Democrats and two Republicans to serve on the panel.

Democrats                      Republicans
Bolden          Motola         Blouin       Ochs
Bolt            Nolan          Cooper       Pechtold
Carter          Opacian        De Young     Redmond
Debold          Pawlak         Engler       Rice
Fallenbuchel    Ramirez        Grajewski    Salihar
Haydra          Tate           Keating      Thompson
Khouri          Washington     May          Trudeau
Lukens          Wright         Niemeyer     Zenkel

26.
Stratified Sampling The owner of a private food store is concerned
about employee morale. She decides to survey the
managers and hourly employees to see if she can learn about
work environment and job satisfaction. From the list of
workers at the store, obtain a stratified sample of two managers
and four hourly employees to survey.
Managers                  Hourly Employees
Carlisle    Oliver        Archer      Foushi     Massie
Hills       Orsini        Bolcerek    Gow        Musa
Kats        Ullrich       Bryant      Grove      Nickas
Lindsey     McGuffin      Cole        Hall       Salazar
                          Dimas       Houston    Vaneck
                          Ellison     Kemp       Weber
                          Everhart    Lathus     Zavodny

NW27. Systematic Sample The human resource department at a
certain company wants to conduct a survey regarding worker
morale. The department has an alphabetical list of all
4,502 employees at the company and wants to conduct a systematic
sample.
(a) Determine k if the sample size is 50.
(b) Determine the individuals who will be administered the
survey. More than one answer is possible.
28.
Systematic Sample To predict the outcome of a county election,
a newspaper obtains a list of all 945,035 registered voters
in the county and wants to conduct a systematic sample.
(a) Determine k if the sample size is 130.
(b) Determine the individuals who will be administered the
survey. More than one answer is possible.
29.
Which Method? The mathematics department at a university
wishes to administer a survey to a sample of students taking
college algebra. The department is offering 32 sections of college
algebra, similar in class size and makeup, with a total of
1,280 students. They would like the sample size to be roughly
10% of the population of college algebra students this semester.
How might the department obtain a simple random
sample? A stratified sample? A cluster sample? Which
method do you think is best in this situation?
30.
Good Sampling Method? To obtain students’ opinions
about proposed changes to course registration procedures,
the administration of a small college asked for faculty
volunteers who were willing to administer a survey in one
of their classes. Twenty-three faculty members volunteered.
Each of these faculty members gave the survey to
all the students in one course of their choosing. Would this
sampling method be considered a cluster sample? Why or
why not?
31.
Sample Design The city of Naperville is considering the
construction of a new commuter rail station. The city wishes
to survey the residents of the city to obtain their opinion regarding
the use of tax dollars for this purpose. Design a sampling
method to obtain the individuals in the sample. Be sure
to support your choice.
32.
Sample Design A school board at a local community college
is considering raising the student services fees. The board
wants to obtain the opinion of the student body before proceeding.
Design a sampling method to obtain the individuals
in the sample. Be sure to support your choice.

33.
Sample Design Target wants to open a new store in the village
of Lockport. Before construction, Target’s marketers want to
obtain some demographic information regarding the area
under consideration. Design a sampling method to obtain the
individuals in the sample. Be sure to support your choice.
34.
Sample Design The county sheriff wishes to determine if a
certain highway has a high proportion of speeders traveling
on it. Design a sampling method to obtain the individuals in
the sample. Be sure to support your choice.
35.
Sample Design A pharmaceutical company wants to conduct
a survey of 30 individuals who have high cholesterol.
The company has obtained a list from doctors throughout
the country of 6,600 individuals who are known to have high
cholesterol. Design a sampling method to obtain the individuals
in the sample. Be sure to support your choice.
36.
Sample Design A marketing executive for Coca-Cola, Inc.,
wants to identify television shows that people in the Boston
area who typically drink Coke are watching. The executive
has a list of all households in the Boston area. Design a
sampling method to obtain the individuals in the sample. Be
sure to support your choice.
37.
Putting It Together: Comparing Sampling Methods Suppose
a political strategist wants to get a sense of how American
adults aged 18 years or older feel about health care and
health insurance.

(a) In a political poll, what would be a good frame to use for
obtaining a sample?
(b) Explain why simple random sampling may not guarantee
that the sample has an accurate representation of
registered Democrats, registered Republicans, and registered
Independents.
(c) How can stratified sampling guarantee this representation?
38.
Putting It Together: Thinking about Randomness What is
random sampling? Why is it necessary for a sample to be
obtained randomly rather than conveniently? Will randomness
guarantee that a sample will provide accurate information
about the population? Explain.
39.
Research the origins of the Gallup Poll and the current sampling
method the organization uses. Report your findings to
the class.
40.
Research the sampling methods used by a market research
firm in your neighborhood. Report your findings to the class.
The report should include the types of sampling methods
used, number of stages, and sample size.

1.5 BIAS IN SAMPLING
Objective 1 Explain the sources of bias in sampling
1 Explain the Sources of Bias in Sampling

So far we have looked at how to obtain samples, but not at some of the problems
that inevitably arise in sampling. Remember, the goal of sampling is to obtain information
about a population through a sample.

Definition
If the results of the sample are not representative of the population, then the
sample has bias.

There are three sources of bias in sampling:

1. Sampling bias
2. Nonresponse bias
3. Response bias

In Other Words
The word bias could mean to give preference to selecting some individuals
over others. It could also mean that certain responses are more likely to
occur in the sample than in the population.

Sampling Bias

Sampling bias means that the technique used to obtain the individuals to be in
the sample tends to favor one part of the population over another. Any convenience
sample has sampling bias because the individuals are not chosen through a random
sample. For example, a voluntary response sample will have sampling bias because
the opinions of individuals who decide to be in the sample are probably not representative
of the population as a whole.

Sampling bias also results from undercoverage. Undercoverage occurs when
the proportion of one segment of the population is lower in a sample than it is in
the population. Undercoverage can result because the frame used to obtain the sample
is incomplete or not representative of the population. Recall that the frame is the
list of all individuals in the population under study. Sometimes, obtaining the frame
would seem to be a relatively easy task, such as obtaining the list of all registered
voters for a study regarding voter preference in an upcoming election. Even under
this circumstance, however, the frame may be incomplete since people who recently
registered to vote may not be on the published list of registered voters.

Sampling bias can result in incorrect predictions. For example, the magazine
Literary Digest predicted that Alfred M. Landon would defeat Franklin D. Roosevelt
in the 1936 presidential election. The Literary Digest conducted a poll by mailing
questionnaires based on a list of its subscribers, telephone directories, and automobile
owners. On the basis of the results, the Literary Digest predicted that Landon
would win the election with 57% of the popular vote. However, Roosevelt won the
election with about 62% of the popular vote. Bear in mind that this election was
taking place during the height of the Great Depression. The incorrect prediction
by the Literary Digest was the result of sampling bias. In 1936, most subscribers to
the magazine, households with telephones, and automobile owners were Republican,
the party of Landon. Therefore, the choice of the frame used to conduct the
survey led to an incorrect prediction. Essentially, there was undercoverage of Democrats.
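
The effect of an unrepresentative frame can be illustrated with a short
simulation. The sketch below is hypothetical: the population size, the 62%
level of support, and the chance that a supporter or nonsupporter appears in
the frame are assumed values chosen for illustration, not the actual 1936 data.

```python
import random

random.seed(1936)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 voters; True = supports candidate R.
population = [True] * 62_000 + [False] * 38_000

# Assumed frame coverage: supporters are less likely to be listed.
COVERAGE = {True: 0.30, False: 0.70}
frame = [voter for voter in population if random.random() < COVERAGE[voter]]


def share(group):
    """Proportion of supporters in a list of voters."""
    return sum(group) / len(group)


# A simple random sample drawn from the incomplete frame.
sample = random.sample(frame, 2_000)

print(f"population support: {share(population):.3f}")
print(f"frame support:      {share(frame):.3f}")
print(f"sample support:     {share(sample):.3f}")
```

Even though the sample is chosen at random, it is random only with respect to
the frame. Because the frame undercovers supporters, the sample proportion
tracks the frame rather than the population.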


Often, it is difficult to gain access to a complete list of individuals in a population.
For example, in public-opinion polls, random telephone surveys are frequently
conducted, which implies that the frame is all households with telephones. This
method of sampling will exclude any household that does not have a telephone, as
well as homeless people. If the individuals without a telephone or homeless people
differ in some way from people with a telephone or with homes, then the results of
the sample may not be valid.

Nonresponse Bias

Nonresponse bias exists when individuals selected to be in the sample who do not
respond to the survey have different opinions from those who do. Nonresponse can
occur because individuals selected for the sample do not wish to respond or the
interviewer was unable to contact them.

All surveys will suffer from nonresponse. The federal government uses a complex
random sample to select individuals to participate in its Current Population
Survey. Overall, the response rate is about 92%, but it varies depending on the age
of the individual. For example, the response rate for 20- to 29-year-olds is 85%,
while the response rate for individuals at least 70 years of age is 99%. Response
rates in random digit dialing (RDD) telephone surveys are typically around 70%.
Response rates for e-mail surveys typically hover around 40%, and mail surveys can
have response rates as high as 60%.
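
To see how unequal response rates change the makeup of the respondents, the
following sketch applies the two Current Population Survey response rates
quoted above to a hypothetical selection of 1,000 individuals from each age
group. The group sizes are an assumption made only for illustration.

```python
# Response rates quoted in the text for the Current Population Survey.
response_rate = {"20-29": 0.85, "70+": 0.99}

# Hypothetical: 1,000 individuals selected from each age group.
selected = {"20-29": 1000, "70+": 1000}

# Expected number of respondents in each group.
expected_respondents = {
    group: selected[group] * response_rate[group] for group in selected
}
total = sum(expected_respondents.values())

for group, count in expected_respondents.items():
    print(f"{group}: {count:.0f} respondents, {count / total:.1%} of all respondents")
```

Under these assumptions the younger group makes up half of those selected but
only about 46% of those who respond. If the two groups hold different
opinions, the unadjusted results will lean toward the older group.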

Nonresponse bias can be controlled using callbacks. For example, if nonresponse
occurs because a mailed questionnaire was not returned, a callback might
mean phoning the individual to conduct the survey. If nonresponse occurs because
an individual was not at home, a callback might mean returning to the home at other
times in the day or on other days of the week.

Another method to reduce nonresponse is using rewards and incentives.
Rewards may include cash payments for completing a questionnaire. Incentives
might include a cover letter that states that the responses to the questionnaire will
determine future policy. For example, I received $1 with a survey regarding my
satisfaction with a recent purchase. The $1 “payment” was meant to make me feel
guilty enough to fill out the questionnaire. As another example, a city may send out
questionnaires to households and state in a cover letter that the responses to the
questionnaire will be used to decide pending issues within the city.

Let’s consider the Literary Digest poll again. The Literary Digest mailed out
more than 10 million questionnaires and 2.3 million people responded. The rather
low response rate (23%) contributed to the Literary Digest making an incorrect
prediction. After all, Roosevelt was the incumbent president and only those who
were unhappy with his administration were likely to respond. By the way, in the
same election, the 35-year-old George Gallup predicted that Roosevelt would win
the election. He surveyed only 50,000 people to come to his conclusion.


The wording of questions can
significantly affect the responses and,
therefore, the validity of a study.

Response Bias

Response bias exists when the answers on a survey do not reflect the true feelings
of the respondent. Response bias can find its way into survey results in a number
of ways.

Interviewer Error A trained interviewer is essential to obtain accurate information
from a survey. A good interviewer will have the skill necessary to elicit responses
from individuals within a sample and be able to make the interviewee feel comfortable
enough to give truthful responses. For example, a good interviewer should be
able to obtain truthful answers to questions as sensitive as “Have you ever cheated
on your taxes?” Do not be quick to trust surveys that are conducted by poorly
trained interviewers. Do not trust survey results if the sponsor has a vested interest
in the results of the survey. Would you trust a survey conducted by a car dealer that
reports 90% of customers say they would buy another car from the dealer?

Misrepresented Answers Some survey questions result in responses that misrepresent
facts or are flat-out lies. For example, a survey of recent college graduates
may find that self-reported salaries are somewhat inflated. Also, people may overestimate
their abilities. For example, ask people how many push-ups they can do in
1 minute, and then ask them to do the push-ups. How accurate were they?

Wording of Questions The wording of a question plays a large role in the type of
response given to the question. The way a question is worded can lead to response
bias in a survey, so questions must always be asked in balanced form. For example,
the “yes/no” question

Do you oppose the reduction of estate taxes?

should be written

Do you favor or oppose the reduction of estate taxes?

The second question is balanced. Do you see the difference? Consider the following
report based on studies from Schuman and Presser (Questions and Answers in
Attitude Surveys, 1981, p. 277), who asked the following two questions:

(A) Do you think the United States should forbid public speeches against
democracy?
(B) Do you think the United States should allow public speeches against democracy?

For those respondents presented with question A, 21.4% gave “yes” responses,
while for those given question B, 47.8% gave “no” responses. The conclusion you
may arrive at is that most people are not necessarily willing to forbid something, but
more people are willing not to allow something. These results imply that the wording
of the question can alter the outcome of a survey.

Another consideration in wording a question is not to be vague. For example,
the question “How much do you study?” is too vague. Does the researcher mean
how much do I study for all my classes or just for statistics? Does the researcher
mean per day or per week? The question should be written “How many hours do
you study statistics each week?”

Ordering of Questions or Words Many surveys will rearrange the order of the
questions within a questionnaire so that responses are not affected by prior questions.
Consider the following example from Schuman and Presser in which the
following two questions were asked:

(A) Do you think the United States should let Communist newspaper reporters
from other countries come in here and send back to their papers the news as they
see it?
(B) Do you think a Communist country such as Russia should let American newspaper
reporters come in and send back to America the news as they see it?

For surveys conducted in 1980 in which the questions appeared in the order (A, B),
54.7% of respondents answered “yes” to A and 63.7% answered “yes” to B. If
the questions were ordered (B, A), then 74.6% answered “yes” to A and 81.9%
answered “yes” to B. When Americans are first asked if U.S. reporters should be
allowed to report Communist news, they are more likely to agree that Communists
should be allowed to report American news. Questions should be rearranged as
much as possible to help reduce effects of this type.

Pollsters will also rearrange words within a question. For example, the Gallup
Organization asked the following question of 1,017 adults aged 18 years or older:

Do you [rotated: approve (or) disapprove] of the job George W. Bush is
doing as president?

Notice how the words approve and disapprove were rotated. The purpose of this is
to remove the effect that may occur by writing the word approve first in the question.

Type of Question One of the first considerations in designing a question is
determining whether the question should be open or closed.

An open question is one for which the respondent is free to choose
his or her response. For example:

What is the most important problem facing America’s youth today?

A closed question is one for which the respondent must choose from
a list of predetermined responses.

What is the most important problem facing America’s youth today?

(a) Drugs
(b) Violence
(c) Single-parent homes
(d) Promiscuity
(e) Peer pressure

Not only should the order of the questions or certain words within the question
be rearranged, but in closed questions the possible responses should also be rearranged.
The reason is that respondents are likely to choose early choices in a list
rather than later choices.
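
For a survey delivered electronically, rearranging the responses can be
automated. The sketch below shuffles the five choices from the closed question
above separately for each respondent. The function name `rotated_choices` is
hypothetical, and the seed is fixed only so the example is repeatable.

```python
import random

CHOICES = ["Drugs", "Violence", "Single-parent homes", "Promiscuity", "Peer pressure"]


def rotated_choices(rng):
    """Return the response options in a fresh random order."""
    order = CHOICES[:]   # copy so the master list is unchanged
    rng.shuffle(order)
    return order


rng = random.Random(42)  # seeded only to make the example repeatable
for respondent in range(1, 4):
    print(f"Respondent {respondent}:")
    for label, option in zip("abcde", rotated_choices(rng)):
        print(f"  ({label}) {option}")
```

Because every ordering is equally likely, no single choice is favored by
appearing first more often than the others over many respondents.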

When designing an open question, be sure to phrase the question so that the responses
are similar. (You don’t want a wide variety of responses.) This allows for easy
analysis of the responses. The benefit of closed questions is that they limit the number
of respondent choices and, therefore, the results are much easier to analyze. However,
this limits the choices and does not always allow the respondent to respond the way
he or she might want to. If the desired answer is not provided as a choice, the respondent
will be forced to choose a secondary answer or skip the question.

Survey designers recommend conducting pretest surveys with open questions
and then using the most popular answers as the choices on closed-question surveys.
Another issue to consider in the closed-question design is the number of responses
the respondent may choose from. It is recommended that the option “no opinion” be
omitted, because this option does not allow for meaningful analysis. The bottom line
is to try to limit the number of choices in a closed-question format without forcing respondents
to choose an option they otherwise would not. If the respondents choose
an option they otherwise would not choose, the survey will have response bias.

Data-entry Error Although not technically a result of response bias, data-entry
error will lead to results that are not representative of the population. Once data
are collected, the results typically must be entered into a computer, which could result
in input errors. For example, 39 may be entered as 93. It is imperative that data
be checked for accuracy. In this text, we present some suggestions for checking for
data error.
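
One common check, offered here as an illustration and not necessarily the
approach presented later in the text, is double entry: the same forms are
keyed in twice and the two versions are compared. The sketch below uses
hypothetical ages and a hypothetical helper named `find_mismatches`.

```python
def find_mismatches(first_entry, second_entry):
    """Return (position, first value, second value) wherever two keyings disagree."""
    if len(first_entry) != len(second_entry):
        raise ValueError("both entries must contain the same number of records")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(first_entry, second_entry))
        if a != b
    ]


# Hypothetical ages keyed in twice from the same paper forms.
first_entry = [21, 39, 24, 19, 22]
second_entry = [21, 93, 24, 19, 22]   # 39 mistyped as 93

for position, a, b in find_mismatches(first_entry, second_entry):
    print(f"record {position}: first entry {a}, second entry {b}; check the original form")
```

A disagreement does not say which keying is correct, so each flagged record
still has to be checked against the original form.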


Can a Census Have Bias?

The discussion thus far has focused on bias in samples. This is not to imply that bias
cannot occur when conducting a census, however. For example, it is entirely possible
that a question on a census form is misunderstood, thereby leading to response bias
in the results. We also mentioned that it is often difficult to contact each individual
in a population. For example, the U.S. Census Bureau is challenged to count each
homeless person in the country, so the census data published by the U.S. government
likely suffers from nonresponse bias.

Sampling Error versus Nonsampling Error

Nonresponse bias, response bias, and data-entry errors are types of nonsampling error.
However, whenever a sample is used to learn information about a population, there
will inevitably also be sampling error.

Definitions
Nonsampling errors are errors that result from undercoverage, nonresponse
bias, response bias, or data-entry error. Such errors could also be present in a
complete census of the population. Sampling error is the error that results from
using a sample to estimate information about a population. This type of error
occurs because a sample gives incomplete information about a population.

By incomplete information, we mean that the individuals in the sample cannot
reveal all the information about the population. Consider the following: Suppose
that we wanted to determine the average age of the students enrolled in an
introductory statistics course. To do this, we obtain a simple random sample of four
students and ask them to write their age on a sheet of paper and turn it in. The average
age of these four students is found to be 23.25 years. Assume that no students lied
about their age, nobody misunderstood the question, and the sampling was done
appropriately. If the actual average age of all 30 students in the class (the population)
is 22.91 years, then the sampling error is 23.25 - 22.91 = 0.34 year. Now suppose
that the same survey is conducted, but this time one individual lies about his age.
Then the results of the survey will also have nonsampling error.

In Other Words
We can think of sampling error as error that results from using a subset of the
population to describe characteristics of the population. Nonsampling error is
error that results from obtaining and recording the information collected.
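
The sampling error in the age example is simply the sample mean minus the
population mean. The sketch below first repeats that subtraction with the
figures from the text, then draws several simple random samples of four from a
hypothetical class of 30 to show that the sampling error changes from sample
to sample. The ages are randomly generated stand-ins, not the data behind the
example, and `sampling_error` is a hypothetical helper name.

```python
import random
from statistics import mean


def sampling_error(sample, population):
    """Sample mean minus population mean."""
    return mean(sample) - mean(population)


# Figures from the example in the text (in years).
print(round(23.25 - 22.91, 2))

# Hypothetical ages for a class of 30 students.
rng = random.Random(7)
ages = [rng.randint(18, 35) for _ in range(30)]

# Each simple random sample of 4 students gives a different sampling error.
for trial in range(1, 6):
    sample = rng.sample(ages, 4)
    error = sampling_error(sample, ages)
    print(f"sample {trial}: mean {mean(sample):.2f}, sampling error {error:+.2f}")
```

Sampling error of this kind is present even when every response is truthful
and correctly recorded. It arises only because four students cannot carry all
the information about thirty.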

A Classroom Survey
As a class, answer the following questions. Throughout the semester, the
results of the survey can be used to illustrate various statistical concepts.
1. What is your gender?
2. What is your age?
3. How many semester hours are you enrolled in this semester?
4. How many minutes did you watch television last night?
5. What is your major? If you don’t know, state undeclared.
6. How many hours did you work last week? If you don’t work, write 0.
7. How many siblings do you have (include half- and step-siblings)?
8. Do you own your own car? If so, what make (Chevrolet, Honda, etc.)?
9. Do you speak more than one language fluently? If so, what
language(s)?
10. How many hours do you study each week?
11. How many hours did you study last night?
12. How long does it take you (in minutes) to get to campus?
13. What is your eye color?

1.5 ASSESS YOUR UNDERSTANDING
Concepts and Vocabulary

1.
Why is it rare for frames to be completely accurate?
2.
What are some solutions to nonresponse?
3.
What is a closed question? What is an open question? Discuss
the advantages and disadvantages of each type of question.
4.
What does it mean when a part of the population is underrepresented?
5.
Discuss the benefits of having trained interviewers.
6.
What are the advantages of having a presurvey when constructing
a questionnaire that has closed questions?
7.
Discuss the pros and cons of telephone interviews that take
place during dinner time in the early evening.
8.
Why is a high response rate desired? How would a low response
rate affect survey results?
9.
Discuss why the order of questions or choices within a questionnaire
is important in sample surveys.
10.
Suppose a survey asks, “Do you own any CDs?” Explain
how this could be interpreted in more than one way. Suggest
a way in which the question could be improved.
11.
What is bias? Name the three sources of bias and provide an
example of each. How can a census have bias?
12.
Distinguish between nonsampling error and sampling error.
Skill Building

In Problems 13–24, the survey has bias. (a) Determine the type of
bias. (b) Suggest a remedy.

13.
A retail store manager wants to conduct a study regarding
the shopping habits of his customers. He selects
the first 60 customers who enter his store on a Saturday
morning.
14.
The village of Oak Lawn wishes to conduct a study regarding
the income level of households within the village. The village
manager selects 10 homes in the southwest corner of
the village and sends an interviewer to the homes to determine
household income.
15.
An antigun advocate wants to estimate the percentage of
people who favor stricter gun laws. He conducts a nationwide
survey of 1,203 randomly selected adults 18 years old
and older. The interviewer asks the respondents, “Do
you favor harsher penalties for individuals who sell guns
illegally?”
16.
Suppose you are conducting a survey regarding the sleeping
habits of students. From a list of registered students, you
obtain a simple random sample of 150 students. One survey
question is “How much sleep do you get?”
17.
A polling organization conducts a study to estimate the percentage
of households that speak a foreign language as the
primary language. It mails a questionnaire to 1,023 randomly
selected households throughout the United States and asks
the head of household if a foreign language is the primary
language spoken in the home. Of the 1,023 households
selected, 12 responded.
18.
Cold Stone Creamery is considering opening a new store in
O’Fallon. Before opening the store, the company would like
to know the percentage of households in O’Fallon that regularly
visit an ice cream shop. The market researcher obtains a
list of households in O’Fallon and randomly selects 150 of
them. He mails a questionnaire to the 150 households that
asks about ice cream eating habits and flavor preferences. Of
the 150 questionnaires mailed, 4 are returned.

19.
A newspaper article reported, “The Cosmopolitan magazine
survey of more than 5,000 Australian women aged 18–34
found about 42 percent considered themselves overweight
or obese.”
Source: Herald Sun, September 9, 2007

20.
A health teacher wishes to do research on the weight of
college students. She obtains the weights for all the students
in her 9 A.M. class by looking at their driver’s licenses or
state IDs.
21.
A magazine is conducting a study on the effects of infidelity
in a marriage. The editors randomly select 400 women whose
husbands were unfaithful and ask, “Do you believe a marriage
can survive when the husband destroys the trust that
must exist between husband and wife?”
22.
A textbook publisher wants to determine what percentage
of college professors either require or recommend that their
students purchase textbook packages with supplemental
materials, such as study guides, digital media, and online
tools. The publisher sends out surveys by e-mail to a random
sample of 320 faculty members who have registered with
its website and have agreed to receive solicitations. The publisher
reports that 80% of college professors require or recommend
that their students purchase some type of textbook
package.
23.
Suppose you are conducting a survey regarding illicit drug use
among teenagers in the Baltimore school district. You obtain
a cluster sample of 12 schools within the district and sample
all sophomore students in the randomly selected schools.
The survey is administered by the teachers.
24.
To determine the public’s opinion of the police department,
the police chief obtains a cluster sample of 15 census tracts
within his jurisdiction and samples all households in the randomly
selected tracts. Uniformed police officers go door to
door to conduct the survey.
Applying the Concepts

25.
Response Rates Surveys tend to suffer from low response
rates. Based on past experience, a researcher determines that
the typical response rate for an e-mail survey is 40%. She
wishes to obtain a sample of 300 respondents, so she e-mails
the survey to 1500 randomly selected e-mail addresses. Assuming
the response rate for her survey is 40%, will the respondents
form an unbiased sample? Explain.
26.
Delivery Format The General Social Survey asked, “About
how often did you have sex in the past 12 months?” About
47% of respondents indicated they had sex at least once a
week. In a Web survey for a marriage and family wellness
center, respondents were asked, “How often do you and
your partner have sex (on average)?” About 31% of respondents
indicated they had sex with their partner at least once
a week. Explain how the delivery method for such a question
could result in biased responses.

27.
Order of the Questions Consider the following two questions:
(a) Suppose that a rape is committed in which the woman
becomes pregnant. Do you think the criminal should
or should not face additional charges if the woman
becomes pregnant?
(b) Do you think abortions should be legal under any circumstances,
legal under certain circumstances, or illegal
in all circumstances?
Do you think the order in which the questions are asked
will affect the survey results? If so, what can the pollster do
to alleviate this response bias?

28.
Order of the Questions Consider the following two questions:
(a) Do you believe that the government should or should
not be allowed to prohibit individuals from expressing
their religious beliefs at their place of employment?
(b) Do you believe that the government should or should
not be allowed to prohibit teachers from expressing
their religious beliefs in public school classrooms?
Do you think the order in which the questions are asked
will affect the survey results? If so, what can the pollster do
to alleviate this response bias? Discuss the choice of the
word prohibit in the survey questions.

29.
Improving Response Rates Suppose you are reading an
article at psychcentral.com and the following text appears in
a pop-up window:
What tactic is the company using to increase the response
rate for its survey?

30.
Rotating Choices Consider this question from a recent
Gallup poll:
Which of the following approaches to solving the nation’s energy
problems do you think the U.S. should follow right now—
[ROTATED: emphasize production of more oil, gas and coal supplies
(or) emphasize more conservation by consumers of existing
energy supplies]?

Why is it important to rotate the two choices presented in the
question?

31.
Random Digit Dialing Many polls use random digit dialing
(RDD) to obtain a sample, which means a computer randomly
generates phone numbers. What is the frame for this
type of sampling? Who would be excluded from the survey
and how might this affect the results of the survey?
32.
Caller ID How do you think caller ID has affected phone
surveys?
33.
Don’t Call Me! The Telephone Consumer Protection Act
(TCPA) allows consumers to put themselves on a do-not-call
registry. If a number is on the registry, commercial telemarketers
are not allowed to call you. Do you believe this has
affected the ability of surveyors to obtain accurate polling
results? If so, how?

34.
Current Population Survey In the federal government’s
Current Population Survey, the response rate for 20- to
29-year-olds is 85%, while the response rate for individuals at
least 70 years of age is 99%. Why do you think this is?
35.
Analyze an Article Read the following article from the January
20, 2005 USA Today. What types of nonsampling errors
led to incorrect exit polls?
Firms Report Flaws That Threw Off Exit Polls

Kerry backers’ willingness, pollsters’ inexperience cited

By Mark Memmott, USA Today

The exit polls of voters on Election Day so overstated Sen.
John Kerry’s support that, going back to 1988, they rank as
the most inaccurate in a presidential election, the firms that
did the work concede.
One reason the surveys were skewed, they say, was because
Kerry’s supporters were more willing to participate
than Bush’s. Also, the people they hired to quiz voters were
on average too young and too inexperienced and needed
more training.
The exit polls, which are supposed to help the TV networks
shape their coverage on election night, were sharply criticized.
Leaks of preliminary data showed up on the Internet in
the early afternoon of Election Day, fueling talk that Kerry was
beating President Bush. After the election, some political
scientists, pollsters and journalists questioned their value.
In a report to the six media companies that paid them to
conduct the voter surveys, pollsters Warren Mitofsky and
Joseph Lenski said Wednesday that “on average, the results
from each precinct overstated the Kerry-Bush difference by
6.5 (percentage) points. This is the largest (overstatement) we
have observed . . . in the last five presidential elections.”
Lenski said Wednesday that issuing the report was like
“hanging out your dirty underwear. You hope it’s cleaner
than people expected.”
Among the findings:

They hired too many relatively young adults to conduct
the interviews. Half of the 1,400 interviewers were
younger than 35. That may explain in part why Kerry voters
were more inclined to participate, since he drew
more of the youth vote than did Bush. But Mitofsky and
Lenski also found younger interviewers were more likely
to make mistakes.

Early results were skewed by a “programming error” that
led to including too many female voters. Kerry outpolled
Bush among women.

Some local officials prevented interviewers from getting
close to voters.

For future exit polls, Lenski and Mitofsky recommended
hiring more experienced polltakers and giving them better
training, and working with election officials to ensure access
to polling places.
Lenski and Mitofsky noted that none of the media outlets
they worked for—ABC, CBS, CNN, Fox News, NBC and the
Associated Press—made any wrong “calls” on election
night. Representatives of those six are reviewing the report.
Many other news media, including USA Today, also paid to
get some of the data.

Source: USA TODAY. January 20, 2005. Reprinted with
Permission.


36.
Increasing Response Rates Offering rewards or incentives
is one way of attempting to increase response rates. Discuss
a possible disadvantage of such a practice.
37.
Wording Survey Questions Write a survey question that
contains strong wording and a survey question that contains
tempered wording. Present the strongly worded question to
10 randomly selected people and the tempered question to
10 different randomly selected people. How does the wording
affect the response?
38.
Order in Survey Questions Write two questions that
could have different responses, depending on the order in
which the questions are presented. Randomly select 20
people and present the questions in one order to 10 of the
people and in the opposite order to the other 10 people.
Did the results differ?
39.
Research a survey method used by a company or government
branch. Determine the sampling method used, the
sample size, the method of collection, and the frame used.
40.
Informed Opinions People often respond to survey questions
without any knowledge of the subject matter. A common
example of this is the discussion on banning dihydrogen
monoxide. The Centers for Disease Control (CDC) reports
that there were 1,493 deaths due to asbestos in 2002, but
over 3,200 deaths were attributed to dihydrogen monoxide
in 2000. Articles and Web sites, such as www.dhmo.org tell
how this substance is widely used despite the dangers associated
with it. Many people have joined the cause to ban this
substance without realizing that dihydrogen monoxide is
simply water (H2O). Their eagerness to protect the environment
or their fear of seeming uninformed may be part of the
problem. Put together a survey that asks individuals whether
dihydrogen monoxide should or should not be banned. Give
the survey to 20 randomly selected students around campus
and report your results to the class. An example survey
might look like the following:
Dihydrogen monoxide is colorless, odorless, and kills thousands
of people every year. Most of these deaths are
caused by accidental inhalation, but the dangers of dihydrogen
monoxide do not stop there. Prolonged exposure to its
solid form can severely damage skin tissue. Symptoms of
ingestion can include excessive sweating and urination and
possibly a bloated feeling, nausea, vomiting, and body electrolyte
imbalance. Dihydrogen monoxide is a major component
of acid rain and can cause corrosion after coming in
contact with certain metals.

Do you believe that the government should or should not
ban the use of dihydrogen monoxide?

41.
Name two biases that led to the Literary Digest making an
incorrect prediction in the presidential election of 1936.
42.
Research on George Gallup Research the polling done by
George Gallup in the 1936 presidential election. Write a report
on your findings. Be sure to include information about
the sampling technique and sample size. Now research the
polling done by Gallup for the 1948 presidential election.
Did Gallup accurately predict the outcome of the election?
What lessons were learned by Gallup?
43.
Putting It Together: Speed Limit In the state of California,
speed limits are established through traffic engineering surveys.
One aspect of the survey is for city officials to measure
the speed of vehicles on a particular road.
Source: www.ci.eureka.ca.gov, www.nctimes.com

(a) What is the population of interest for this portion of the
engineering survey?
(b) What is the variable of interest for this portion of the
engineering survey?
(c) Is the variable qualitative or quantitative?
(d) What is the level of measurement for the variable?
(e) Is a census feasible in this situation? Explain why or why
not.
(f) Is a sample feasible in this situation? If so, explain what
type of sampling plan could be used? If not, explain why
not.
(g) In July 2007, the Temecula City Council refused a request
to increase the speed limit on Pechanga Parkway from 40
to 45 mph despite survey results indicating that the prevailing
speed on the parkway favored the increase. Opponents
were concerned that it was visitors to a nearby casino
who were driving at the increased speeds and that city residents
actually favored the lower speed limit. Explain how
bias might be playing a role in the city council’s decision.
1.6 THE DESIGN OF EXPERIMENTS
Objectives 1 Describe the characteristics of an experiment
2 Explain the steps in designing an experiment
3 Explain the completely randomized design
4 Explain the matched-pairs design

The major theme of this chapter has been data collection. Section 1.2 briefly discussed
the idea of an experiment, but the main focus was on observational studies.
Sections 1.3 through 1.5 focused on sampling and surveys. In this section, we further
develop the idea of collecting data through an experiment.


1 Describe the Characteristics of an Experiment

Remember, in an observational study, if an association exists between an explanatory
variable and response variable, the researcher cannot claim causality. If a
researcher is interested in demonstrating how changes in the explanatory variable
cause changes in the response variable, the researcher needs to conduct an
experiment.


Definition
An experiment is a controlled study conducted to determine the effect varying
one or more explanatory variables or factors has on a response variable. Any
combination of the values of the factors is called a treatment.

In an experiment, the experimental unit is a person, object, or some other
well-defined item upon which a treatment is applied. We often refer to the
experimental unit as a subject when he or she is a person. The subject is
analogous to the individual in a survey.

The overriding goal in an experiment is to determine the effect various
treatments have on the response variable. For example, we might want to determine
whether a new treatment is superior to an existing treatment (or no treatment at
all). To make this determination, experiments require a control group. A control
group serves as a baseline treatment that can be used to compare to other
treatments. For example, a researcher in education might want to determine if
students who do their homework using an online homework system do better on an
exam than those who do their homework from the text. The students doing the text
homework might serve as the control group (since this is the currently accepted
practice). The factor is the type of homework. There are two treatments: online
homework and text homework. A second method for defining the control group
is through the use of a placebo. A placebo is an innocuous medication, such as a
sugar tablet, that looks, tastes, and smells like the experimental medication.

In an experiment, it is important that each group be treated the same way.
It is also important that individuals do not adjust their behavior in some way due
to the treatment they are receiving. For this reason, many experiments use a
technique called blinding. Blinding refers to nondisclosure of the treatment an
experimental unit is receiving. There are two types of blinding: single blinding and
double blinding.

Historical Note
Sir Ronald Fisher, often called the Father of Modern Statistics, was born in
England on February 17, 1890. He received a BA in astronomy from Cambridge
University in 1912. In 1914, he took a position teaching mathematics and physics
at a high school. He did this to help serve his country during World War I. (He was
rejected by the army because of his poor eyesight.) In 1919, Fisher took a job as a
statistician at Rothamsted Experimental Station, where he was involved in
agricultural research. In 1933, Fisher became Galton Professor of Eugenics at
University College London, where he studied Rh blood groups. In 1943 he was
appointed to the Balfour Chair of Genetics at Cambridge. He was knighted by
Queen Elizabeth in 1952. Fisher retired in 1957 and died in Adelaide, Australia,
on July 29, 1962. One of his famous quotations is “To call in the statistician after
the experiment is done may be no more than asking him to perform a postmortem
examination: he may be able to say what the experiment died of.”



Definitions
A single-blind experiment is one in which the experimental unit (or subject)
does not know which treatment he or she is receiving. A double-blind experiment
is one in which neither the experimental unit nor the researcher in
contact with the experimental unit knows which treatment the experimental
unit is receiving.

EXAMPLE 1
The Characteristics of an Experiment

Problem: Lipitor is a cholesterol-lowering drug by Pfizer. In the Collaborative
Atorvastatin Diabetes Study (CARDS), the effect of Lipitor on cardiovascular disease
was assessed in 2,838 subjects, ages 40 to 75, with type 2 diabetes, without prior
history of cardiovascular disease. In this placebo-controlled, double-blind experiment,
subjects were randomly allocated to either Lipitor 10 mg daily (1,429) or
placebo (1,411) and were followed for 4 years. The response variable was the occurrence
of any major cardiovascular event.


Lipitor significantly reduced the rate of major cardiovascular events (83 events
in the Lipitor group versus 127 events in the placebo group). There were 61 deaths in
the Lipitor group versus 82 deaths in the placebo group.

(a) What does it mean for the experiment to be placebo-controlled?
(b) What does it mean for the experiment to be double-blind?
(c) What is the population for which this study applies? What is the sample?
(d) What are the treatments?
(e) What is the response variable?
Approach: We will apply the definitions just presented.
Solution

(a) The placebo is a medication that looks, smells, and tastes like Lipitor. The purpose
of the placebo control group is to serve as a baseline against which to compare
the results from the group receiving Lipitor. Another reason for the placebo is to
account for the fact that people tend to behave differently when they are in a study.
By having a placebo control group, the effect of this is neutralized.
(b) Since the experiment is double-blind, the subjects do not know whether they are
receiving Lipitor or the placebo. Plus, the individual monitoring the subjects does
not know whether the subject is receiving Lipitor or the placebo. The reason we
double-blind is so that the subjects receiving the medication do not behave differently
from those receiving the placebo and the individual monitoring the subjects does not
treat the folks in the Lipitor group differently from those in the placebo group.
(c) The population is individuals from 40 to 75 years of age with type 2 diabetes
without a prior history of cardiovascular disease. The sample is the 2,838 subjects in
the study.
(d) The treatments are 10 mg of Lipitor or a placebo daily.
(e) The response variable is whether the subject had any major cardiovascular
event, such as a stroke, or not.
Now Work Problem 9
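
The counts reported in Example 1 can be converted to proportions so that the two groups are compared on the same scale. The short Python sketch below does only that arithmetic. The variable and function names are illustrative and are not part of the study, and deciding whether the difference is statistically significant is left to the inferential methods covered later in the text.

```python
# Counts as reported in Example 1 (CARDS); names are illustrative only.
groups = {
    "Lipitor": {"n": 1429, "events": 83, "deaths": 61},
    "placebo": {"n": 1411, "events": 127, "deaths": 82},
}


def proportion(count, n):
    """Return count / n, the fraction of a group with the outcome."""
    return count / n


event_rate = {g: proportion(d["events"], d["n"]) for g, d in groups.items()}
death_rate = {g: proportion(d["deaths"], d["n"]) for g, d in groups.items()}

for g in groups:
    print(f"{g}: events {event_rate[g]:.1%}, deaths {death_rate[g]:.1%}")
```

Run as written, this gives roughly 5.8% of the Lipitor group and 9.0% of the placebo group with a major cardiovascular event, and death rates of roughly 4.3% and 5.8%.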

2 Explain the Steps in Designing an Experiment

To design an experiment means to describe the overall plan in conducting the
experiment. The process of conducting an experiment requires a series of steps.

Step 1: Identify the Problem to Be Solved. The statement of the problem should
be as explicit as possible. The statement should provide the experimenter with
direction. In addition, the statement must identify the response variable and the
population to be studied. Often, the statement is referred to as the claim.

Step 2: Determine the Factors That Affect the Response Variable. The factors
are usually identified by an expert in the field of study. In identifying the factors,
we must ask, “What things affect the value of the response variable?” Once the
factors are identified, it must be determined which factors will be fixed at some
predetermined level, which will be manipulated, and which will be uncontrolled.

Step 3: Determine the Number of Experimental Units. As a general rule, choose
as many experimental units as time and money will allow. Techniques do exist for
determining sample size, provided certain information is available. Some of these
techniques are discussed later in the text.

Step 4: Determine the Level of Each Factor. There are two ways to deal with
the factors:

1. Control: There are two ways to control the factors.
(a) Fix their level at one predetermined value throughout the experiment.
These are factors whose effect on the response variable is not of interest.

(b) Set them at predetermined levels. These are the factors whose effect
on the response variable interests us. The combinations of the levels of
these factors constitute the treatments in the experiment.
2. Randomize: Randomize the experimental units to various treatment
groups so that the effect of factors whose levels cannot be controlled is minimized.
The idea is that randomization averages out the effects of uncontrolled
factors (explanatory variables). It is difficult, if not impossible, to
identify all factors in an experiment. This is why randomization is so important.
It mutes the effect of variation attributable to factors not controlled.
Step 5: Conduct the Experiment.

(a) The experimental units are randomly assigned to the treatments. Replication
occurs when each treatment is applied to more than one experimental unit. By
using more than one experimental unit for each treatment, we can be assured
that the effect of a treatment is not due to some characteristic of a single experimental
unit. It is a good idea to assign an equal number of experimental units to
each treatment.
(b) Collect and process the data. Measure the value of the response variable
for each replication. Then organize the results. The idea is that the value of the
response variable for each treatment group is the same before the experiment
because of randomization. Then any difference in the value of the response
variable among the different treatment groups can be attributed to differences
in the level of the treatment.
Step 6: Test the Claim. This is the subject of inferential statistics. Inferential
statistics is a process in which generalizations about a population are made on
the basis of results obtained from a sample. In addition, a statement regarding
our level of confidence in our generalization is provided. We study methods of
inferential statistics in Chapters 9 through 12.
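
Steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs: the two factors, their levels, the 600 experimental units, and the "uncontrolled" value attached to each unit are all hypothetical. The sketch forms the treatments as combinations of factor levels, randomly assigns an equal number of units to each treatment, and then prints the average of the uncontrolled value in each group.

```python
import itertools
import random
import statistics

# Hypothetical factors and levels (not from the text).
factors = {
    "fertilizer": ["none", "low", "high"],
    "watering": ["daily", "weekly"],
}

# Step 4: each combination of factor levels is one treatment.
treatments = list(itertools.product(*factors.values()))

rng = random.Random(2011)  # fixed seed so the sketch is repeatable

# Hypothetical units, each with an uncontrolled characteristic we cannot set.
units = [{"id": i, "uncontrolled": rng.uniform(0, 1)} for i in range(600)]

# Step 5(a): shuffle, then deal units out evenly so each treatment is
# replicated on the same number of units.
shuffled = units[:]
rng.shuffle(shuffled)
per_group = len(shuffled) // len(treatments)
groups = {
    t: shuffled[k * per_group:(k + 1) * per_group]
    for k, t in enumerate(treatments)
}

for t, members in groups.items():
    avg = statistics.mean(u["uncontrolled"] for u in members)
    print(t, len(members), round(avg, 3))
```

With 100 units per treatment, the group averages of the uncontrolled value tend to land close to one another. Small groups can still differ by chance, which is one reason replication matters.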

3 Explain the Completely Randomized Design

The steps just given apply to any type of designed experiment. We now concentrate
on the simplest type of experiment.

Definition
A completely randomized design is one in which each experimental unit is randomly
assigned to a treatment.

We illustrate this type of experimental design using the steps just given.

EXAMPLE 2
A Completely Randomized Design

Problem: A farmer wishes to determine the optimal level of a new fertilizer on his
soybean crop. Design an experiment that will assist him.

Approach: We follow the steps for designing an experiment.

Solution

Step 1: The farmer wants to identify the optimal level of fertilizer for growing soybeans.
We define optimal as the level that maximizes yield. So the response variable
will be crop yield.


Step 2: Some factors that affect crop yield are fertilizer, precipitation, sunlight,
method of tilling the soil, type of soil, plant, and temperature.
Step 3: In this experiment, we will plant 60 soybean plants (experimental units).



In Other Words: The various levels of the factor are the treatments in a
completely randomized design.

Step 4: We list the factors and their levels.


Fertilizer. This factor will be set at three levels. We wish to measure the effect of
varying the level of this variable on the response variable, yield. We will set the
treatments (level of fertilizer) as follows:
Treatment A: 20 soybean plants receive no fertilizer.
Treatment B: 20 soybean plants receive 2 teaspoons of fertilizer per gallon
of water every 2 weeks.
Treatment C: 20 soybean plants receive 4 teaspoons of fertilizer per gallon
of water every 2 weeks.
See Figure 6.

[Figure 6: soybean plants grouped under Treatment A, Treatment B, and Treatment C]



Precipitation. Although we cannot control the amount of rainfall, we can control
the amount of watering we do. This factor will be controlled so that each plant receives
the same amount of precipitation.

Sunlight. This is an uncontrollable factor, but it will be roughly the same for each
plant.

Method of tilling. We can control this factor. We agree to use the round-up ready
method of tilling for each plant.

Type of soil. We can control certain aspects of the soil such as level of acidity.
In addition, each plant will be planted within a 1-acre area, so it is reasonable to
assume that the soil conditions for each plant are equivalent.

Plant. There may be variation from plant to plant. To account for this, we randomly
assign the plants to a treatment.

Temperature. This factor is not within our control, but will be the same for each
plant.
Step 5:

(a) We need to assign each plant to a treatment group. To do this, we will number the
plants from 1 to 60. To determine which plants get treatment A, we randomly generate
20 numbers. The plants corresponding to these numbers get treatment A. Now number
the remaining plants 1 to 40 and randomly generate 20 numbers. The plants
corresponding to these numbers get treatment B. The remaining plants get treatment C.
Now till the soil, plant the soybean plants, and fertilize according to the schedule
prescribed.
(b) At the end of the growing season, determine the crop yield for each plant.
Step 6: Determine whether any differences in yield exist among the three treatment
groups.
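
The assignment described in Step 5(a) can be sketched in Python. This is one possible implementation, not the text's own: the function name `assign_plants`, its `seed` parameter, and the use of `random.sample` to draw distinct numbers are assumptions made for illustration.

```python
import random


def assign_plants(n_plants=60, group_size=20, seed=None):
    """Assign numbered plants to treatments A, B, and C as in Step 5(a)."""
    rng = random.Random(seed)
    remaining = list(range(1, n_plants + 1))  # plants numbered 1 to 60
    assignment = {}
    for treatment in ("A", "B"):
        # Renumber the remaining plants 1..len(remaining) and draw
        # group_size distinct numbers from that range.
        positions = rng.sample(range(1, len(remaining) + 1), group_size)
        chosen = {remaining[p - 1] for p in positions}
        assignment[treatment] = sorted(chosen)
        remaining = [plant for plant in remaining if plant not in chosen]
    # Whatever is left receives treatment C.
    assignment["C"] = remaining
    return assignment


plan = assign_plants(seed=1)
for treatment, plants in plan.items():
    print(treatment, plants)
```

Each call produces three groups of 20 plants with no plant in more than one group. Passing a seed only makes the draw repeatable, and omitting it gives a fresh random assignment.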
Figure 7 illustrates the experimental design.

[Figure 7: Random assignment of plants to treatments. Group 1 receives 20 plants
(Treatment A: no fertilizer); Group 2 receives 20 plants (Treatment B: 2 teaspoons);
Group 3 receives 20 plants (Treatment C: 4 teaspoons). Compare yield.]

Now Work Problem 11

Example 2 is a completely randomized design because the experimental units
(the plants) were randomly assigned to the treatments. It is the most popular
experimental design because of its simplicity, but it is not always the best. We
discuss inferential procedures for the completely randomized design in which there
are two treatments in Section 11.2 and in which there are three or more treatments
in Section C.4 on the CD that accompanies this text.

4 Explain the Matched-Pairs Design

Another type of experimental design is called a matched-pairs design.

Definition
A matched-pairs design is an experimental design in which the experimental
units are paired up. The pairs are matched up so that they are somehow related
(that is, the same person before and after a treatment, twins, husband and wife,
same geographical location, and so on). There are only two levels of treatment
in a matched-pairs design.

In a matched-pairs design, one matched individual receives one treatment and
the other matched individual receives a different treatment. The assignment of the
matched pair to the treatment is done randomly using a coin flip or a random-number
generator. We then look at the difference in the results of each matched
pair. One common type of matched-pairs design is to measure a response variable
on an experimental unit before a treatment is applied, and then to measure the
response variable on the same experimental unit after the treatment is applied. In
this way, the individual is matched against itself. These experiments are sometimes
called before–after or pretest–posttest experiments.

EXAMPLE 3
A Matched-Pairs Design

Problem: An educational psychologist wanted to determine whether listening to
music has an effect on a student’s ability to learn. Design an experiment to help the
psychologist answer the question.

Approach: We will use a matched-pairs design by matching students according to
IQ and gender (just in case gender plays a role in learning with music).

Solution: We match students according to IQ and gender. For example, a female
with an IQ in the 110 to 115 range will be matched with a second female with an IQ
in the 110 to 115 range.

For each pair of students, we will flip a coin to determine whether the first
student in the pair is assigned the treatment of a quiet room or a room with music
playing in the background.

Each student will be given a statistics textbook and asked to study Section 1.1.
After 2 hours, the students will enter a testing center and take a short quiz on the
material in the section. We compute the difference in the scores of each matched
pair. Any differences in scores will be attributed to the treatment. Figure 8 illustrates
the design.

[Figure 8: Match students according to gender and IQ. Randomly assign a student
from each pair to a treatment. Administer treatment and exam to each matched pair.
For each matched pair, compute the difference in scores on the exam.]

Now Work Problem 13

We discuss statistical inference for the matched-pairs design in Section 11.1.
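
The coin flip and the paired differences in Example 3 can be sketched in Python. Everything concrete here is hypothetical: the student labels, the quiz scores, and the function names are invented for illustration, and the coin flip is simulated with Python's `random` module.

```python
import random


def assign_within_pairs(pairs, seed=None):
    """Flip a coin for each matched pair to decide who studies in quiet."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        heads = rng.random() < 0.5  # simulated fair coin
        quiet, music = (first, second) if heads else (second, first)
        plan.append({"quiet": quiet, "music": music})
    return plan


def score_differences(plan, scores):
    """Return music score minus quiet score for each matched pair."""
    return [scores[p["music"]] - scores[p["quiet"]] for p in plan]


# Hypothetical matched pairs and quiz scores, for illustration only.
pairs = [("S1", "S2"), ("S3", "S4"), ("S5", "S6")]
scores = {"S1": 8, "S2": 6, "S3": 7, "S4": 7, "S5": 5, "S6": 9}

plan = assign_within_pairs(pairs, seed=3)
print(plan)
print(score_differences(plan, scores))
```

Taking the difference within each pair, rather than comparing two separate group averages, is what lets the matching on IQ and gender cancel out of the comparison.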


One note about the relation between a designed experiment and simple random
sampling: It is often the case that the experimental units selected to participate in a
study are not randomly selected. This is because we often need the experimental units
to have some common trait, such as high blood pressure. For this reason, participants in
experiments are recruited or volunteer to be in a study. However, once we have the experimental
units, we use simple random sampling to assign them to treatment groups.
With random assignment we assume that the participants are similar at the start of the
experiment. Because the treatment is the only difference between the groups, we can
say the treatment caused the difference observed in the response variable.

Experimental Design (Hippity-Hop)
You are commissioned by the board of directors of Paper Toys, Inc.
to design a new paper frog for their Christmas catalog. The design for
the construction of the frog has already been completed and will be
provided to you. However, the material with which to make the frogs
has not yet been determined. The Materials Department has narrowed
the choices down to either newspaper or brown paper (such as that used
in grocery bags). You have decided to test both types of paper. Management
decided to build the frogs from sheets of paper 9 inches square.
The goal of the experiment is to determine the material that results
in frogs that jump farther.
(a) As a class, design an experiment that will answer the research
question.
(b) Make the frogs.
(c) Conduct the experiment.
(d) As a class, discuss the strengths and weaknesses of the design.
Would you change anything?
1.6 ASSESS YOUR UNDERSTANDING
Concepts and Vocabulary

1.
Define the following:
(a) Experimental unit
(b) Treatment
(c) Response variable
(d) Factor
(e) Placebo
(f) Confounding
2.
What is replication in an experiment?
3.
Explain the difference between a single-blind and a double-
blind experiment.
4.
List the steps in designing an experiment.
5.
A(n) ______ ______ design is one in which each experimental
unit is randomly assigned to a treatment. A(n) ______
______ design is one in which the experimental units are
paired up.
6.
True or False: Generally, the goal of an experiment is to determine
the effect that treatments will have on the response
variable.
7.
True or False: Observational studies can be used to
determine causality between explanatory and response
variables.
8.
Discuss why control groups are needed in experiments.
Applying the Concepts

9.
Caffeinated Sports Drinks Researchers conducted a double-
blind, placebo-controlled, repeated-measures experiment
to compare the effectiveness of a commercial caffeinated
carbohydrate–electrolyte sports drink with a commercial
noncaffeinated carbohydrate–electrolyte sports drink and a
flavored-water placebo. Sixteen highly trained cyclists each
completed three trials of prolonged cycling in a warm environment:
one while receiving the placebo, one while receiving
the noncaffeinated sports drink, and one while receiving
the caffeinated sports drink. For a given trial, one beverage
treatment was administered throughout a 2-hour variable-
intensity cycling bout followed by a 15-minute performance
ride. Total work in kilojoules (kJ) performed during the final
15 minutes was used to measure performance. The beverage
order for the individual subjects was randomly assigned. A
period of at least 5 days separated the trials. All trials took
place at approximately the same time of day in an environmental
chamber at 28.5°C and 60% relative humidity with
fan airflow of approximately 2.5 meters per second (m/s).
The researchers found that cycling performance, as assessed
by the total work completed during the performance
ride, was 23% greater for the caffeinated sports drink than
for the placebo and 15% greater for the caffeinated sports
drink than for the noncaffeinated sports drink. Cycling
performances for the noncaffeinated sports drink and the
placebo were not significantly different. The researchers
concluded that the caffeinated carbohydrate–electrolyte
sports drink substantially enhanced physical performance
during prolonged exercise compared with the noncaffeinated
carbohydrate–electrolyte sports drink and the placebo.

Source: Kirk J. Cureton, Gordon L. Warren et al. “Caffeinated
Sports Drink: Ergogenic Effects and Possible Mechanisms,”
International Journal of Sport Nutrition and Exercise
Metabolism, 17(1):35–55, 2007

(a) What does it mean for the experiment to be placebo-
controlled?
(b) What does it mean for the experiment to be double-
blind? Why do you think it is necessary for the experiment
to be double-blind?
(c) How is randomization used in this experiment?
(d) What is the population for which this study applies?
What is the sample?
(e) What are the treatments?
(f) What is the response variable?
(g) This experiment used a repeated-measures design, a design
type that has not been directly discussed in this
textbook. Using this experiment as a guide, determine
what it means for the design of the experiment to be
repeated-measures. How does this design relate to the
matched-pairs design?
10.
Alcohol Dependence To determine if topiramate is a safe
and effective treatment for alcohol dependence, researchers
conducted a 14-week trial of 371 men and women aged 18 to
65 years diagnosed with alcohol dependence. In this double-
blind, randomized, placebo-controlled experiment, subjects
were randomly given either 300 milligrams (mg) of topiramate
(183 subjects) or a placebo (188 subjects) daily, along
with a weekly compliance enhancement intervention. The
variable used to determine the effectiveness of the treatment
was self-reported percentage of heavy drinking days.
Results indicated that topiramate was more effective than
placebo at reducing the percentage of heavy drinking days.
The researchers concluded that topiramate is a promising
treatment for alcohol dependence.
Source: Bankole A. Johnson, Norman Rosenthal, et al.
“Topiramate for Treating Alcohol Dependence: A Randomized
Controlled Trial,” Journal of the American Medical
Association, 298(14):1641–1651, 2007

(a) What does it mean for the experiment to be placebo-
controlled?
(b) What does it mean for the experiment to be double-
blind? Why do you think it is necessary for the experiment
to be double-blind?
(c) What does it mean for the experiment to be randomized?
(d) What is the population for which this study applies?
What is the sample?
(e) What are the treatments?
(f) What is the response variable?
11.
School Psychology A school psychologist wants to test the
effectiveness of a new method for teaching reading. She recruits
500 first-grade students in District 203 and randomly
divides them into two groups. Group 1 is taught by means of
the new method, while group 2 is taught via traditional
methods. The same teacher is assigned to teach both groups.
At the end of the year, an achievement test is administered
and the results of the two groups are compared.

(a) What is the response variable in this experiment?
(b) Think of some of the factors in the study. How are they
controlled?
(c) What are the treatments? How many treatments are
there?
(d) How are the factors that are not controlled dealt with?
(e) Which group serves as the control group?
(f) What type of experimental design is this?
(g) Identify the subjects.
(h) Draw a diagram similar to Figure 7, 8, or 10 to illustrate
the design.
12.
Pharmacy A pharmaceutical company has developed an
experimental drug meant to relieve symptoms associated with
the common cold. The company identifies 300 adult males
25 to 29 years old who have a common cold and randomly divides
them into two groups. Group 1 is given the experimental
drug, while group 2 is given a placebo. After 1 week of treatment,
the proportions of each group that still have cold symptoms
are compared.
(a) What is the response variable in this experiment?
(b) Think of some of the factors in the study. How are they
controlled?
(c) What are the treatments? How many treatments are
there?
(d) How are the factors that are not controlled dealt with?
(e) What type of experimental design is this?
(f) Identify the subjects.
(g) Draw a diagram similar to Figure 7, 8, or 10 to illustrate
the design.
13.
Whiter Teeth An ad for Crest Whitestrips Premium claims
that the strips will whiten teeth in 7 days and the results will
last for 12 months. A researcher who wishes to test this claim
studies 20 sets of identical twins. Within each set of twins,
one is randomly selected to use Crest Whitestrips Premium
in addition to regular brushing and flossing, while the other
just brushes and flosses. Whiteness of teeth is measured at
the beginning of the study, after 7 days, and every month
thereafter for 12 months.
(a) What type of experimental design is this?
(b) What is the response variable in this experiment?
(c) What are the treatments?
(d) What are other factors (controlled or uncontrolled) that
could affect the response variable?
(e) What might be an advantage of using identical twins as
subjects in this experiment?
14.
Assessment To help assess student learning in her developmental
math courses, a mathematics professor at a community
college implemented pre- and posttests for her
developmental math students. A knowledge-gained score was
obtained by taking the difference of the two test scores.
(a) What type of experimental design is this?
(b) What is the response variable in this experiment?
(c) What is the treatment?
15.
Insomnia Researchers Jack D. Edinger and associates wanted
to test the effectiveness of a new cognitive behavioral therapy
(CBT) compared with both an older behavioral treatment and
a placebo therapy for treating insomnia. They identified
75 adults with chronic insomnia. Patients were randomly
assigned to one of three treatment groups. Twenty-five patients
were randomly assigned to receive CBT (sleep education,
stimulus control, and time-in-bed restrictions), another 25 received
muscle relaxation training (RT), and the final 25
received a placebo treatment. Treatment lasted 6 weeks, with
follow-up conducted at 6 months. To measure the effectiveness
of the treatment, researchers used wake time after sleep onset
(WASO). Cognitive behavioral therapy produced larger improvements
than did RT or placebo treatment. For example,
the CBT-treated patients achieved an average 54% reduction
in their WASO, whereas RT-treated and placebo-treated
patients, respectively, achieved only 16% and 12% reductions
in this measure. Results suggest that CBT treatment leads to
significant sleep improvements within 6 weeks, and these improvements
appear to endure through 6 months of follow-up.

Source: Jack D. Edinger, PhD; William K. Wohlgemuth, PhD;
Rodney A. Radtke, MD; Gail R. Marsh, PhD; Ruth E. Quillian,
PhD. “Cognitive Behavioral Therapy for Treatment of Chronic
Primary Insomnia,” Journal of the American Medical Association,
285:1856–1864, 2001

(a) What type of experimental design is this?
(b) What is the population being studied?
(c) What is the response variable in this study?
(d) What are the treatments?
(e) Identify the experimental units.
(f) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.
16.
Depression Researchers wanted to compare the effectiveness
and safety of an extract of St. John’s wort with placebo
in outpatients with major depression. To do this, they recruited
200 adult outpatients diagnosed as having major depression
and having a baseline Hamilton Rating Scale for
Depression (HAM-D) score of at least 20. Participants were
randomly assigned to receive either St. John’s wort extract,
900 milligrams per day (mg/d) for 4 weeks, increased to
1200 mg/d in the absence of an adequate response thereafter,
or a placebo for 8 weeks. The response variable was the
change on the HAM-D over the treatment period. After
analysis of the data, it was concluded that St. John’s wort
was not effective for treatment of major depression.
Source: Richard C. Shelton, MD, et al. “Effectiveness of St.
John’s Wort in Major Depression,” Journal of the American
Medical Association 285:1978–1986, 2001

(a) What type of experimental design is this?
(b) What is the population that is being studied?
(c) What is the response variable in this study?
(d) What are the treatments?
(e) Identify the experimental units.
(f) What is the control group in this study?
(g) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.
17.
The Memory Drug? Researchers wanted to
evaluate whether ginkgo, an over-the-counter
herb marketed as enhancing memory, improves
memory in elderly adults as measured by objective
tests. To do this, they recruited 98 men and
132 women older than 60 years and in good
health. Participants were randomly assigned to receive
ginkgo, 40 milligrams (mg) 3 times per day,
or a matching placebo. The measure of memory improvement
was determined by a standardized test of learning and
memory. After 6 weeks of treatment, the data indicated that
ginkgo did not increase performance on standard tests of
learning, memory, attention, and concentration. These data
suggest that, when taken following the manufacturer’s instructions,
ginkgo provides no measurable increase in memory or
related cognitive function to adults with healthy cognitive
function.
Source: Paul R. Solomon et al. “Ginkgo for Memory
Enhancement,” Journal of the American Medical Association
288:835–840, 2002

(a) What type of experimental design is this?
(b) What is the population being studied?
(c) What is the response variable in this study?
(d) What is the factor that is set to predetermined levels?
What are the treatments?
(e) Identify the experimental units.
(f) What is the control group in this study?
(g) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.
18.
Treating Depression Researchers wanted to test whether a
new drug therapy results in a more rapid response in patients
with major depression. To do this, they recruited 63 inpatients
with a diagnosis of major depression. Patients were
randomly assigned to two treatment groups receiving either
placebo (31 patients) or the new drug therapy (32 patients).
The response variable was the Hamilton Rating Scale for
Depression score. After collecting and analyzing the data, it
was concluded that the new drug therapy is effective in the
treatment of major depression.
Source: Jahn Holger, MD, et al. “Metyrapone as Additive
Treatment in Major Depression,” Archives of General Psychiatry,
61:1235–1244, 2004
(a) What type of experimental design is this?
(b) What is the population that is being studied?
(c) What is the response variable in this study?
(d) What are the treatments?
(e) Identify the experimental units.
(f) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.
19.
Dominant Hand Professor Andy Neill wanted to determine
if the reaction time of people differs in their dominant
hand versus their nondominant hand. To do this, he recruited
15 students. Each student was asked to hold a yardstick
between the index finger and thumb. The student was asked
to open the hand, release the yardstick, and then asked to
catch the yardstick between the index finger and thumb. The
distance that the yardstick fell served as a measure of reaction
time. A coin flip was used to determine whether the student
would use their dominant hand first or the nondominant
hand. Results indicated that the reaction time in the dominant
hand exceeded that of the nondominant hand.
(a) What type of experimental design is this?
(b) What is the response variable in this study?
(c) What is the treatment?
(d) Identify the experimental units.
(e) Why did Professor Neill use a coin flip to determine
whether the student should begin with the dominant
hand or the nondominant hand?
(f) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.

20.
Golf Anyone? A local golf pro wanted to compare two
styles of golf club. One golf club had a graphite shaft and the
other had the latest style of steel shaft. It is a common belief
that graphite shafts allow a player to hit the ball farther, but
the manufacturer of the new steel shaft said the ball travels
just as far with its new technology. To test this belief, the pro
recruited 10 golfers from the driving range. Each player was
asked to hit one ball with the graphite-shafted club and one
ball with the new steel-shafted club. The distance that the
ball traveled was determined using a range finder. A coin
flip was used to determine whether the player hit with the
graphite club or the steel club first. Results indicated that
the distance the ball was hit with the graphite club was no
different than the distance when using the steel club.
(a) What type of experimental design is this?
(b) What is the response variable in this study?
(c) What is the factor that is set to predetermined levels?
What is the treatment?
(d) Identify the experimental units.
(e) Why did the golf pro use a coin flip to determine
whether the golfer should hit with the graphite first or
the steel first?
(f) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.
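The coin-flip step used in Problems 19 and 20 can be sketched in code. The following is a minimal Python illustration, not the procedure the golf pro actually followed: the function name `coin_flip_order`, the golfer labels, and the seed value are hypothetical, and a pseudorandom draw stands in for a physical coin.

```python
import random

def coin_flip_order(subjects, conditions, seed):
    """For each subject, flip a simulated fair coin to decide which of the
    two conditions is used first. Returns {subject: (first, second)}."""
    rng = random.Random(seed)  # seed is an arbitrary, illustrative choice
    order = {}
    for subject in subjects:
        first, second = conditions
        if rng.random() < 0.5:  # simulated "tails": swap the order
            first, second = second, first
        order[subject] = (first, second)
    return order

# Hypothetical labels for the 10 golfers recruited at the driving range.
golfers = [f"Golfer {i}" for i in range(1, 11)]
order = coin_flip_order(golfers, ("graphite", "steel"), seed=7)

for golfer, (first, second) in order.items():
    print(f"{golfer}: {first} first, then {second}")
```

Every golfer still hits with both clubs; only the order is left to chance, so that any effect of going first or second (warming up, fatigue) is not tied to one club.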
21.
Drug Effectiveness A pharmaceutical company wants to
test the effectiveness of an experimental drug meant to reduce
high cholesterol. The researcher at the pharmaceutical
company has decided to test the effectiveness of the drug
through a completely randomized design. She has obtained
20 volunteers with high cholesterol: Ann, John, Michael,
Kevin, Marissa, Christina, Eddie, Shannon, Julia, Randy, Sue,
Tom, Wanda, Roger, Laurie, Rick, Kim, Joe, Colleen, and
Bill. Number the volunteers from 1 to 20. Use a random-
number generator to randomly assign 10 of the volunteers
to the experimental group. The remaining volunteers will go
into the control group. List the individuals in each group.
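One way to carry out the random assignment described in Problem 21 is sketched below in Python. This is an illustration under stated assumptions: the helper name `assign_groups` and the seed 2011 are arbitrary choices, and a different seed (or Table I in Appendix A) would give a different, equally valid assignment.

```python
import random

volunteers = [
    "Ann", "John", "Michael", "Kevin", "Marissa", "Christina", "Eddie",
    "Shannon", "Julia", "Randy", "Sue", "Tom", "Wanda", "Roger", "Laurie",
    "Rick", "Kim", "Joe", "Colleen", "Bill",
]

def assign_groups(names, group_size, seed):
    """Number the names 1..N, draw group_size numbers without replacement,
    and return (experimental, control) lists of names."""
    rng = random.Random(seed)  # seed is an arbitrary, illustrative choice
    numbered = dict(enumerate(names, start=1))  # 1 -> "Ann", 2 -> "John", ...
    chosen = set(rng.sample(sorted(numbered), group_size))
    experimental = [numbered[i] for i in sorted(chosen)]
    control = [numbered[i] for i in sorted(numbered) if i not in chosen]
    return experimental, control

experimental, control = assign_groups(volunteers, 10, seed=2011)
print("Experimental group:", experimental)
print("Control group:", control)
```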
22.
Effects of Alcohol A researcher has recruited 20 volunteers
to participate in a study. The researcher wishes to measure
the effect of alcohol on an individual’s reaction time. The
20 volunteers are randomly divided into two groups. Group 1
will serve as a control group in which participants drink four
1-ounce cups of a liquid that looks, smells, and tastes like
alcohol in 15-minute increments. Group 2 will serve as an
experimental group in which participants drink four 1-ounce
cups of 80-proof alcohol in 15-minute increments. After
drinking the last 1-ounce cup, the participants sit for 20 minutes.
After the 20-minute resting period, the reaction time to
a stimulus is measured.
(a) What type of experimental design is this?
(b) Use Table I in Appendix A or a random-number generator
to divide the 20 volunteers into groups 1 and 2 by assigning
the volunteers a number between 1 and 20. Then randomly
select 10 numbers between 1 and 20. The individuals
corresponding to these numbers will go into group 1.
23.
Tomatoes An oncologist wants to perform a long-term
study on the benefits of eating tomatoes. In particular, she
wishes to determine whether there is a significant difference
in the rate of prostate cancer among adult males after eating
one serving of tomatoes per week for 5 years, after eating
three servings of tomatoes per week for 5 years, and after
eating five servings of tomatoes per week for 5 years. Help
the oncologist design the experiment. Include a diagram to
illustrate your design.

24.
Batteries An engineer wants to determine the effect of temperature
on battery voltage. In particular, he is interested in
determining if there is a significant difference in the voltage of
the batteries when exposed to temperatures of 90°F, 70°F, and
50°F. Help the engineer design the experiment. Include a diagram
to illustrate your design.
25.
The Better Paint Suppose you are interested in comparing
Benjamin Moore’s MoorLife Latex house paint with
Sherwin Williams’ LowTemp 35 Exterior Latex paint.
Design an experiment that will answer this question: Which
paint is better for painting the exterior of a house? Include a
diagram to illustrate your design.
26.
Tire Design An engineer has just developed a new tire
design. However, before going into production, the tire company
wants to determine if the new tire reduces braking
distance on a car traveling 60 miles per hour compared with
radial tires. Design an experiment to help the engineer determine
if the new tire reduces braking distance.
27.
Designing an Experiment Researchers wish to know if
there is a link between hypertension (high blood pressure)
and consumption of salt. Past studies have indicated that
the consumption of fruits and vegetables offsets the negative
impact of salt consumption. It is also known that there
is quite a bit of person-to-person variability as far as the
ability of the body to process and eliminate salt. However,
no method exists for identifying individuals who have a
higher ability to process salt. The U.S. Department of Agriculture
recommends that daily intake of salt should not
exceed 2400 milligrams (mg). The researchers want to keep
the design simple, so they choose to conduct their study
using a completely randomized design.
(a) What is the response variable in the study?
(b) Name three factors that have been identified.
(c) For each factor identified, determine whether the
variable can be controlled or cannot be controlled. If a
factor cannot be controlled, what should be done to reduce
variability in the response variable?
(d) How many treatments would you recommend? Why?
28.
Search a newspaper, magazine, or other periodical that
describes an experiment. Identify the population, experimental
unit, response variable, treatment, factors, and their levels.
29.
Research the placebo effect and the Hawthorne effect. Write
a paragraph that describes how each affects the outcome of
an experiment.
30.
Coke or Pepsi Suppose you want to perform an experiment
whose goal is to determine whether people prefer Coke or
Pepsi. Design an experiment that utilizes the completely
randomized design. Design an experiment that utilizes the
matched-pairs design. In both designs, be sure to identify the
response variable, the role of blinding, and randomization.
Which design do you prefer? Why?
31.
Putting It Together: Mosquito Control In an attempt to
identify ecologically friendly methods for controlling
mosquito populations, researchers conducted field experiments
in India where aquatic nymphs of the dragonfly
Brachytron pretense were used against the larvae of
mosquitoes. For the experiment, the researchers selected
ten 300-liter (L) outdoor, open, concrete water tanks, which
were natural breeding places for mosquitoes. Each tank was
manually sieved to ensure that it was free of any nonmosquito
larvae, nymphs, or fish. Only larvae of mosquitoes
were allowed to remain in the tanks. The larval density in
each tank was assessed using a 250-milliliter (mL) dipper.
For each tank, 30 dips were taken and the mean larval density
per dip was calculated. Ten freshly collected nymphs of
Brachytron pretense were introduced into each of five randomly
selected tanks. No nymphs were released into the remaining
five tanks, which served as controls. After 15 days,
larval densities in all the tanks were assessed again and all
the introduced nymphs were removed. After another 15
days, the larval densities in all the tanks were assessed a
third time.

In the nymph-treated tanks, the density of larval mosquitoes
dropped significantly from 7.34 to 0.83 larvae per dip
15 days after the Brachytron pretense nymphs were introduced.
Further, the larval density increased significantly to
6.83 larvae per dip 15 days after the nymphs were removed.
Over the same time period, the control tanks did not show a
significant difference in larval density, with density measurements
of 7.12, 6.83, and 6.79 larvae per dip. The researchers
concluded that Brachytron pretense can be used effectively
as a strong, ecologically friendly control of mosquitoes and
mosquito-borne diseases.

Source: S. N. Chatterjee, A. Ghosh, and G. Chandra.
“Eco-Friendly Control of Mosquito Larvae by Brachytron
pretense Nymph,” Journal of Environmental Health,
69(8):44–48, 2007

(a) Identify the research objective.
(b) What type of experimental design is this?
(c) What is the response variable? Is it quantitative or
qualitative? If quantitative, is it discrete or continuous?
(d) What is the factor the researchers controlled and set
to predetermined levels? What are the treatments?
(e) Can you think of other factors that may affect larvae of
mosquitoes? How are they controlled or dealt with?
(f) What is the population for which this study applies?
What is the sample?
(g) List the descriptive statistics.
(h) How did the researchers control this experiment?
(i) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.
(j) State the conclusion made in the study.
Emotional “Aspirin”
Americans have a long history of altering their moods with
chemicals, ranging from alcohol and illicit drugs to prescription
medications, such as diazepam (Valium) for anxiety and
fluoxetine (Prozac) for depression. Today, there’s a new trend:
the over-the-counter availability of apparently effective mood
modifiers in the form of herbs and other dietary supplements.

One problem is that many people who are treating
themselves with these remedies may be sufficiently anxious or
depressed to require professional care and monitoring.
Self-treatment can be dangerous, particularly with depression,
which causes some 20,000 reported suicides a year in the
United States. Another major pitfall is that dietary supplements
are largely unregulated by the government, so consumers have
almost no protection against substandard preparations.

To help consumers and doctors, Consumer Reports tested
the amounts of key ingredients in representative brands of
several major mood-changing pills. To avoid potential bias,
we tested samples from different lots of the pills using a
randomized statistical design. The table contains a subset of
the data from this study.

Each of these pills has a label claim of 200 mg of
SAM-E. The column labeled Random Code contains a set of
3-digit random codes that were used so that the laboratory
did not know which manufacturer was being tested. The column
labeled Mg SAM-E contains the amount of SAM-E
measured by the laboratory.

(a) Why is it important to label the pills with random codes?
(b) Why is it important to randomize the order in which the
pills are tested instead of testing all of brand A first, followed
by all of brand B, and so on?
Run Order  Brand  Random Code  Mg SAM-E
1          B      461          238.9
2          D      992          219.2
3          C      962          227.1
4          A      305          231.2
5          B      835          263.7
6          D      717          251.1
7          A      206          232.9
8          D      649          192.8
9          C      132          213.4
10         B      923          224.6
11         A      823          261.1
12         C      515          207.8

(c) Sort the data by brand. Does it appear that each brand is
meeting its label claims?
(d) Design an experiment that follows the steps presented to
answer the following research question: “Is there a difference
in the amount of SAM-E contained in brands A, B, C, and D?”
Note to Readers: In many cases, our test protocol and analytical
methods are more complicated than described in this
example. The data and discussion have been modified to
make the material more appropriate for the audience.

Source: © 2002 by Consumers Union of U.S., Inc., Yonkers, NY 10703-1057,
a nonprofit organization. Reprinted with permission from the
Dec. 2002 issue of CONSUMER REPORTS® for educational
purposes only. No commercial use or photocopying permitted. To learn
more about Consumers Union, log onto www.ConsumersReports.org.
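
For part (c) of the SAM-E exercise above, the sorting and the comparison against the 200 mg label claim can be sketched as follows. This is a minimal Python illustration that uses only the twelve measurements in the table; the variable names are assumptions made for this sketch.

```python
from collections import defaultdict

LABEL_CLAIM_MG = 200.0  # label claim stated in the exercise

# (run order, brand, random code, mg SAM-E), copied from the table
rows = [
    (1, "B", 461, 238.9), (2, "D", 992, 219.2), (3, "C", 962, 227.1),
    (4, "A", 305, 231.2), (5, "B", 835, 263.7), (6, "D", 717, 251.1),
    (7, "A", 206, 232.9), (8, "D", 649, 192.8), (9, "C", 132, 213.4),
    (10, "B", 923, 224.6), (11, "A", 823, 261.1), (12, "C", 515, 207.8),
]

# Group the measurements by brand.
by_brand = defaultdict(list)
for _run, brand, _code, mg in rows:
    by_brand[brand].append(mg)

# Report each brand's measurements, mean, and any values under the claim.
for brand in sorted(by_brand):
    values = sorted(by_brand[brand])
    mean = sum(values) / len(values)
    below = [v for v in values if v < LABEL_CLAIM_MG]
    print(f"Brand {brand}: {values}  mean = {mean:.1f} mg  below claim: {below}")
```

With only three pills per brand, this summary describes the subset shown and does not by itself settle whether a brand meets its label claim.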


CHAPTER 1 REVIEW
Summary

We defined statistics as a science in which data are collected, organized,
summarized, and analyzed to infer characteristics regarding a
population. Statistics also provides a measure of confidence in the
conclusions that are drawn. Descriptive statistics consists of organizing
and summarizing information, while inferential statistics consists
of drawing conclusions about a population based on results
obtained from a sample. The population is a collection of individuals
about which information is desired and the sample is a subset of
the population.

Data are the observations of a variable. Data can be either
qualitative or quantitative. Quantitative data are either discrete
or continuous.

Data can be obtained from four sources: a census, existing
sources, observational studies, or a designed experiment. A census
will list all the individuals in the population, along with certain
characteristics. Due to the cost of obtaining a census, most researchers
opt for obtaining a sample. In observational studies,
the response variable is measured without attempting to influence
its value. In addition, the explanatory variable is not manipulated.
Designed experiments are used when control of the
individuals in the study is desired to isolate the effect of a certain
treatment on a response variable.

We introduced five sampling methods: simple random sampling,
stratified sampling, systematic sampling, cluster sampling,
and convenience sampling. All the sampling methods, except for
convenience sampling, allow for unbiased statistical inference to
be made. Convenience sampling typically leads to an unrepresentative
sample and biased results.

Vocabulary

Be sure you can define the following:

Statistics (p. 3)
Data (pp. 3, 9)
Population (p. 5)
Individual (p. 5)
Sample (p. 5)
Descriptive statistics (p. 5)
Statistic (p. 5)
Inferential statistics (pp. 5, 48)
Parameter (p. 5)
Variable (p. 7)
Qualitative or categorical variable (p. 7)
Quantitative variable (p. 7)
Discrete variable (p. 8)
Continuous variable (p. 8)
Qualitative data (p. 9)
Quantitative data (p. 9)
Discrete data (p. 9)
Continuous data (p. 9)
Nominal level of measurement (p. 10)
Ordinal level of measurement (p. 10)
Interval level of measurement (p. 10)
Ratio level of measurement (p. 10)
Validity (p. 11)


Reliability (p. 11)
Explanatory variable (p. 16)
Response variable (p. 16)
Observational study (p. 16)
Designed experiment (p. 16)
Confounding (pp. 17, 51)
Lurking variable (p. 17)
Retrospective (p. 18)
Prospective (p. 19)
Census (p. 19)
Random sampling (p. 23)
Simple random sampling (p. 23)
Simple random sample (p. 23)
Frame (p. 24)
Sampling without replacement (p. 24)
Sampling with replacement (p. 24)
Seed (p. 26)
Stratified sample (p. 30)
Systematic sample (p. 31)
Cluster sample (p. 33)
Convenience sample (p. 34)
Self-selected (p. 34)
Voluntary response (p. 34)


Bias (p. 38)
Sampling bias (p. 38)
Undercoverage (p. 38)
Nonresponse bias (p. 39)
Response bias (p. 40)
Open question (p. 41)
Closed question (p. 41)
Nonsampling error (p. 42)
Sampling error (p. 42)
Experiment (p. 46)
Factors (p. 46)
Treatment (p. 46)
Experimental unit (p. 46)
Subject (p. 46)
Control group (p. 46)
Placebo (p. 46)
Blinding (p. 46)
Single-blind (p. 46)
Double-blind (p. 46)
Design (p. 47)
Replication (p. 48)
Completely randomized design (p. 48)
Matched-pairs design (p. 50)


Objectives
Section  You should be able to …                                                 Example(s)  Review Exercises
1.1      1 Define statistics and statistical thinking (p. 3)                     pp. 3–4     1
         2 Explain the process of statistics (p. 4)                              1, 2        7, 14, 15
         3 Distinguish between qualitative and quantitative variables (p. 7)     3           11–13
         4 Distinguish between discrete and continuous variables (p. 8)          4, 5        11–13
         5 Determine the level of measurement of a variable (p. 10)              6           16–19
1.2      1 Distinguish between an observational study and an experiment (p. 15)  1–3         20–21
         2 Explain the various types of observational studies (p. 18)            pp. 18–19   6, 22
1.3      1 Obtain a simple random sample (p. 23)                                 1–3         28, 30
1.4      1 Obtain a stratified sample (p. 30)                                    1           25
         2 Obtain a systematic sample (p. 31)                                    2           26, 29
         3 Obtain a cluster sample (p. 32)                                       3           24
1.5      1 Explain the sources of bias in sampling (p. 38)                       pp. 38–42   8, 9, 27
1.6      1 Describe the characteristics of an experiment (p. 46)                 1           5
         2 Explain the steps in designing an experiment (p. 47)                  pp. 47–48   10
         3 Explain the completely randomized design (p. 48)                      2           31, 33, 34
         4 Explain the matched-pairs design (p. 50)                              3           34

Review Exercises

In Problems 1–5, provide a definition using your own words.

1.
Statistics
2.
Population
3.
Sample
4.
Observational study
5.
Designed experiment
6.
List and describe the three major types of observational
studies.
7.
What is meant by the process of statistics?
8.
List and explain the three sources of bias in sampling. Provide
some methods that might be used to minimize bias in
sampling.
9.
Distinguish between sampling and nonsampling error.
10.
Explain the steps in designing an experiment.
In Problems 11–13, classify the variable as qualitative or quantitative.
If the variable is quantitative, state whether it is discrete or
continuous.

11.
Number of new automobiles sold at a dealership on a
given day
12.
Weight in carats of an uncut diamond
13.
Brand name of a pair of running shoes
In Problems 14 and 15, determine whether the underlined value is a
parameter or a statistic.

14.
In a survey of 1011 people age 50 or older, 73% agreed with
the statement “I believe in life after death.”
Source: Bill Newcott. “Life after Death,” AARP Magazine,
Sept./Oct. 2007


15. Completion Rate
In the 2007 NCAA Football Championship
Game, quarterback Chris Leak completed 69% of his
passes for a total of 213 yards and 1 touchdown.
In Problems 16–19, determine the level of measurement of each
variable.

16.
Birth year
17.
Marital status
18.
Stock rating (strong buy, buy, hold, sell, strong sell)
19.
Number of siblings
In Problems 20 and 21, determine whether the study depicts an
observational study or a designed experiment.

20.
A parent group examines 25 randomly selected PG-13 movies
and 25 randomly selected PG movies and records the number
of sexual innuendos and curse words that occur in each. They
then compare the number of sexual innuendos and curse
words between the two movie ratings.
21.
A sample of 504 patients in early stages of Alzheimer’s disease
is divided into two groups. One group receives an
experimental drug; the other receives a placebo. The advance
of the disease in the patients from the two groups is tracked
at 1-month intervals over the next year.
22.
Read the following description of an observational study and
determine whether it is a cross-sectional, a case-control, or a
cohort study. Explain your choice.
The Cancer Prevention Study II (CPS-II) examines the
relationship among environmental and lifestyle factors of
cancer cases by tracking approximately 1.2 million men
and women. Study participants completed an initial study
questionnaire in 1982 providing information on a range
of lifestyle factors, such as diet, alcohol and tobacco use,
occupation, medical history, and family cancer history.
These data have been examined extensively in relation to
cancer mortality. The vital status of study participants is
updated biennially.

Source: American Cancer Society

In Problems 23–26, determine the type of sampling used.

23.
On election day, a pollster for Fox News positions herself
outside a polling place near her home. She then asks the first
50 voters leaving the facility to complete a survey.
24.
An Internet service provider randomly selects 15 residential
blocks from a large city. It then surveys every household in
these 15 blocks to determine the number that would use a
high-speed Internet service if it were made available.
25.
Thirty-five sophomores, 22 juniors, and 35 seniors are
randomly selected to participate in a study from 574 sophomores,
462 juniors, and 532 seniors at a certain high
school.
26.
Officers for the Department of Motor Vehicles pull aside
every 40th tractor trailer passing through a weigh station,
starting with the 12th, for an emissions test.

27.
Each of the following surveys has bias. Determine the type of
bias and suggest a remedy.
(a) A politician sends a survey about tax issues to a random
sample of subscribers to a literary magazine.
(b) An interviewer with little foreign language knowledge
is sent to an area where her language is not commonly
spoken.
(c) A data-entry clerk mistypes survey results into his
computer.
28. Obtaining a Simple Random Sample
The mayor of a small
town wants to conduct personal interviews with small business
owners to determine if there is anything the mayor
could do to help improve business conditions. The following
list gives the names of the companies in the town. Obtain
a simple random sample of size 5 from the companies in the
town.
Allied Tube and Conduit     Lighthouse Financial        Senese’s Winery
Bechstien Construction Co.  Mill Creek Animal Clinic    Skyline Laboratory
Cizer Trucking Co.          Nancy’s Flowers             Solus, Maria, DDS
D & M Welding               Norm’s Jewelry              Trust Lock and Key
Grace Cleaning Service      Papoose Children’s Center   Ultimate Carpet
Jiffy Lube                  Plaza Inn Motel             Waterfront Tavern
Levin, Thomas, MD           Risky Business Security     WPA Pharmacy

29. Obtaining a Systematic Sample
A quality-control engineer
wants to be sure that bolts coming off an assembly line are
within prescribed tolerances. He wants to conduct a systematic
sample by selecting every 9th bolt to come off the
assembly line. The machine produces 30,000 bolts per day,
and the engineer wants a sample of 32 bolts. Which bolts will
be sampled?
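The selection in Problem 29 could be sketched as follows, assuming the usual systematic-sampling procedure of a random start between 1 and k followed by every kth item after it. The function name `systematic_sample` and the seed are hypothetical, and a different random start gives a different list of bolts.

```python
import random

def systematic_sample(k, n, seed):
    """Pick a random start between 1 and k (inclusive), then take every
    kth item until n items have been selected."""
    rng = random.Random(seed)  # seed is an arbitrary, illustrative choice
    start = rng.randint(1, k)  # randint includes both endpoints
    return [start + k * i for i in range(n)]

# Every 9th bolt, 32 bolts in total, as stated in the problem.
bolts = systematic_sample(k=9, n=32, seed=29)
print("Bolts to inspect:", bolts)
```

With k = 9 and n = 32, the last bolt selected is numbered at most 9 + 9(31) = 288, well within the 30,000 bolts produced per day.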
30. Obtaining a Simple Random Sample
Based on the Military
Standard 105E (ANSI/ASQC Z1.4, ISO 2859) Tables, a lot of
91 to 150 items with an acceptable quality level (AQL) of 1%
and a normal inspection plan would require a sample of size
13 to be inspected for defects. If the sample contains no
defects, the entire lot is accepted. Otherwise, the entire lot is
rejected. A shipment of 100 night-vision goggles is received
and must be inspected. Discuss the procedure you would follow
to obtain a simple random sample of 13 goggles to inspect.

31. Ballasts
An electronics company has just developed a new
electric ballast to be used in fluorescent bulbs. To determine if
the new ballast is more energy efficient than the older ballast,
the company randomly divides 200 fluorescent bulbs into two
groups. The group 1 bulbs are to be given the new ballast and
the group 2 bulbs are to be given the old ballast. The amount
of energy required to light each bulb is measured.
(a) What type of experimental design is this?
(b) What is the response variable in this experiment?
(c) What are the treatments?
(d) Which group serves as the control group?
(e) What are the experimental units?
(f) What role does randomization play in this experiment?
(g) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.
32. Multiple Choice
A common tip for taking multiple-choice
tests is to always pick (b) or (c) if you are unsure. The idea is
that instructors tend to feel the answer is more hidden if it is
surrounded by distractor answers. An astute statistics instructor
is aware of this and decides to use a table of random digits
to select which choice will be the correct answer. If each
question has five choices, use Table I in Appendix A or a
random-number generator to determine the correct answers
for a 20-question multiple-choice exam.
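For readers who choose the random-number generator route in Problem 32, a minimal Python sketch follows. It assumes the five choices are labeled (a) through (e) and that each choice is equally likely to be the correct answer on every question; the function name answer_key and the seed are illustrative.

```python
import random

def answer_key(num_questions, choices="abcde", seed=None):
    """Choose the correct answer for each question independently and uniformly at random."""
    rng = random.Random(seed)
    return [rng.choice(choices) for _ in range(num_questions)]

key = answer_key(20, seed=32)
for number, letter in enumerate(key, start=1):
    print(f"{number}. ({letter})")
```

Because each question is drawn independently, the same letter may appear several times in a row. That is expected behavior for a random key, not a flaw.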
33. Humor in Advertising
A marketing research firm wants to
know whether information presented in a commercial is better
recalled when presented using humor or serious commentary
by adults between 18 and 35 years of age. They will use an
exam that asks questions of 50 subjects about information
presented in the ad. The response variable will be percentage
of information recalled. Create a completely randomized design
to answer the question. Be sure to include a diagram to
illustrate your design.
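The random assignment step of a completely randomized design for Problem 33 could be sketched in Python as shown below. The sketch assumes the 50 subjects are numbered 1 through 50 and are split into two groups of 25; the equal split, the group labels, and the function name completely_randomized are assumptions made for illustration.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

groups = completely_randomized(range(1, 51), ["humor", "serious"], seed=33)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Each group then views its version of the commercial, takes the exam, and the percentage of information recalled is compared between the two groups.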
34. Describe what is meant by a matched-pairs design. Contrast
this experimental design with a completely randomized design.
CHAPTER TEST
1. List the four components that comprise the definition of
statistics.
2. What is meant by the process of statistics?
In Problems 3–5, determine the level of measurement for the
variable and identify if the variable is qualitative or quantitative.
If the variable is quantitative, determine if it is discrete or
continuous.

3. Time to complete the 500-meter race in speed skating.
4. Video game rating system by the Entertainment Software
Rating Board (EC, E, E10+, T, M, AO, RP)
5. The number of surface imperfections on a camera lens.
In Problems 6 and 7, determine whether the study depicts an observational
study or a designed experiment. Identify the response
variable in each case.

6. A random sample of 30 digital cameras is selected and divided
into two groups. One group uses a brand-name battery,
while the other uses a generic plain-label battery. All variables
besides battery type are controlled. Pictures are taken
under identical conditions and the battery life of the two
groups is compared.
7. A sports reporter asks 100 baseball fans if Barry Bonds’s 756th
homerun ball should be marked with an asterisk when sent to
the Baseball Hall of Fame.

8. Contrast the three major types of observational studies in
terms of the time frame when the data are collected.
9. Compare and contrast observational studies and designed
experiments. Which study allows a researcher to claim
causality?
10. Explain why it is important to use a control group and blinding
in an experiment.
11. List the steps required to conduct an experiment.
12. A tanning company is looking for ways to improve customer
satisfaction. They want to select a simple random sample of
four stores from their 15 franchises in which to conduct customer
satisfaction surveys. Discuss the procedure you would
use, and then use the procedure to select a simple random
sample of size n = 4. The locations are as follows:
Afton Ballwin Chesterfield Clayton Deer Creek

Ellisville Farmington Fenton Ladue Lake St. Louis

O’Fallon Pevely Shrewsbury Troy Warrenton

13. A congresswoman wants to survey her constituency regarding
public policy. She asks one of her staff members to
obtain a sample of residents of the district. The frame she has
available lists 9,012 Democrats, 8,302 Republicans, and
3,012 Independents. Obtain a stratified random sample of
8 Democrats, 7 Republicans, and 3 Independents. Be sure to
discuss the procedure used.
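The stratified selection in Problem 13 can be sketched in Python as follows. The sketch assumes the residents within each party are numbered separately, starting at 1, and that a simple random sample is drawn within each stratum; the function name stratified_sample, the dictionary names, and the seed are hypothetical.

```python
import random

def stratified_sample(strata_sizes, sample_sizes, seed=None):
    """Draw a simple random sample of the requested size within each stratum."""
    rng = random.Random(seed)
    return {
        name: sorted(rng.sample(range(1, strata_sizes[name] + 1), sample_sizes[name]))
        for name in strata_sizes
    }

# Stratum sizes and sample sizes taken from the problem statement
frame = {"Democrat": 9012, "Republican": 8302, "Independent": 3012}
wanted = {"Democrat": 8, "Republican": 7, "Independent": 3}

sample = stratified_sample(frame, wanted, seed=13)
for party, residents in sample.items():
    print(party, residents)
```

The numbers printed for each party identify positions in that party's portion of the frame, so the same number may appear under more than one party.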
14. A farmer has a 500-acre orchard in Florida. Each acre is
subdivided into blocks of 5. Altogether, there are 2,500
blocks of trees on the farm. After a frost, he wants to get an
idea of the extent of the damage. Obtain a sample of 10
blocks of trees using a cluster sample. Be sure to discuss the
procedure used.
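One possible reading of Problem 14 treats each acre as a cluster of 5 blocks, so that 2 randomly chosen acres supply the 10 blocks. The Python sketch below follows that reading. It assumes acres are numbered 1 through 500 and blocks are numbered 1 through 2,500 so that acre 1 holds blocks 1 to 5, acre 2 holds blocks 6 to 10, and so on. The numbering scheme, the function name cluster_sample, and the seed are assumptions, not part of the exercise.

```python
import random

def cluster_sample(num_clusters, cluster_size, num_selected, seed=None):
    """Randomly select whole clusters and return every block in each selected cluster."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(1, num_clusters + 1), num_selected))
    return {
        acre: list(range((acre - 1) * cluster_size + 1, acre * cluster_size + 1))
        for acre in chosen
    }

# 500 acres of 5 blocks each; 2 acres give the 10 blocks requested
blocks_by_acre = cluster_sample(num_clusters=500, cluster_size=5, num_selected=2, seed=14)
for acre, blocks in blocks_by_acre.items():
    print(f"acre {acre}: blocks {blocks}")
```

All blocks within a selected acre are inspected, which is what distinguishes a cluster sample from a stratified sample.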
15. A casino manager wants to inspect a sample of 14 slot machines
in his casino for quality-control purposes. There are
600 sequentially numbered slot machines operating in the
casino. Obtain a systematic sample of 14 slot machines. Be
sure to discuss how you obtained the sample.
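When the interval is not given, as in Problem 15, it can be computed from the population and sample sizes. The Python sketch below assumes k is N/n rounded down and that the starting machine is chosen at random from the first k machines; the function name systematic_sample and the seed are illustrative.

```python
import random

def systematic_sample(population_size, n, seed=None):
    """Compute k = N // n, pick a random start in 1..k, then take every k-th item."""
    k = population_size // n
    rng = random.Random(seed)
    start = rng.randint(1, k)
    return k, [start + i * k for i in range(n)]

# 600 slot machines, sample of 14 (values taken from the problem statement)
k, machines = systematic_sample(600, 14, seed=15)
print("k =", k)
print(machines)
```

Here k = 42 because 600/14 is about 42.9. Rounding down guarantees that the last machine selected, at most 42 + 13(42) = 588, is still within the 600 machines.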
16. Describe what is meant by an experiment that is a completely
randomized design.
17. Each of the following surveys has bias. Identify the type of
bias.
(a) A television survey that gives 900 phone numbers for
viewers to call with their vote. Each call costs $2.00.
(b) An employer distributes a survey to her 450 employees
asking them how many hours each week, on average,
they surf the Internet during business hours. Three of the
employees complete the survey.
(c) A question on a survey asks, “Do you favor or oppose a
minor increase in property tax to ensure fair salaries for
teachers and properly equipped school buildings?”
(d) A researcher conducting a poll about national politics
sends a survey to a random sample of subscribers to Time
magazine.
18. The four members of Skylab had their lymphocyte count per
cubic millimeter measured 1 day before lift-off and measured
again on their return to Earth.
(a) What is the response variable in this experiment?
(b) What is the treatment?
(c) What type of experimental design is this?
(d) Identify the experimental units.
(e) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.
19. Nucryst Pharmaceuticals, Inc., announced the results of its
first human trial of NPI 32101, a topical form of its skin
ointment. A total of 225 patients diagnosed with skin irritations
were randomly divided into three groups as part of a
double-blind, placebo-controlled study to test the effectiveness
of the new topical cream. The first group received a 0.5%
cream, the second group received a 1.0% cream, and the third
group received a placebo. Groups were treated twice daily for
a 6-week period.
Source: www.nucryst.com

(a) What type of experimental design is this?
(b) What is the response variable in this experiment?
(c) What is the factor that is set to predetermined levels?
What are the treatments?
(d) What does it mean for this study to be double-blind?
(e) What is the control group for this study?
(f) Identify the experimental units.
(g) Draw a diagram similar to Figure 7 or 8 to illustrate the
design.
20. Researchers Katherine Tucker and associates wanted to determine
whether consumption of cola is associated with
lower bone mineral density. They looked at 1,125 men and
1,413 women in the Framingham Osteoporosis Study, which
is a cohort that began in 1971. The first examination in this
study began between 1971 and 1975, with participants returning
for an examination every 4 years. Based on results of
questionnaires, the researchers were able to determine cola
consumption on a weekly basis. Analysis of the results indicated
that women who consumed at least one cola per day (on
average) had a bone mineral density that was significantly
lower at the femoral neck than those who consumed less
than one cola per day. The researchers did not find this
relation in men.
Source: “Colas, but not other carbonated beverages, are associated
with low bone mineral density in older women: The
Framingham Osteoporosis Study,” American Journal of Clinical
Nutrition 84:936–942, 2006

(a) Why is this a cohort study?
(b) What is the response variable in this study? What is the
explanatory variable?
(c) Is the response variable qualitative or quantitative?
(d) The following appears in the article: “Variables that
could potentially confound the relation between carbonated
beverage consumption and bone mineral density
were obtained from information collected (in the questionnaire).”
What does this mean?
(e) Can you think of any lurking variables that should be
accounted for?
(f) What are the conclusions of the study? Does increased
cola consumption cause a lower bone mineral density?

What Movie Should I Go To?
One of the most difficult tasks of surveying is phrasing questions
so that they are not misunderstood. In addition, questions must be
phrased so that the researcher obtains answers that allow for
meaningful analysis. We wish to create a questionnaire that can be
used to make an informed decision about whether to attend a certain
movie. Select a movie that you wish to see. If the movie is still
in theaters, make sure that it has been released for at least a
couple of weeks so that it is likely that a number of people have
seen it. Design a questionnaire to be filled out by individuals who
have seen the movie. You may wish to include questions regarding
the demographics of the respondents first (such as age, gender,
level of education, and so on). Ask as many questions as you feel
are necessary to obtain an opinion regarding the movie. The
questions can be open or closed. Administer the survey to at least
20 randomly selected people who have seen the movie. While
administering the survey, keep track of those individuals who have
not seen the movie. In particular, keep track of their demographic
information. After administering the survey, summarize your
findings. On the basis of the survey results, do you think that you
will enjoy the movie? Why? Now see the movie. Did you like it? Did
the survey accurately predict whether you would enjoy the movie?
Now answer the following questions:
(a) What sampling method did you use? Why? Did you have a frame
for the population?
(b) Did you have any problems with respondents misinterpreting
your questions? How could this issue have been resolved?
(c) What role did the demographics of the respondents have in
forming your opinion? Why?
(d) Did the demographics of individuals who did not see the movie
play a role while you were forming your opinion regarding the
movie?
(e) Look up a review of the movie by a professional movie critic.
Did the movie critic’s opinion agree with yours? What might
account for the similarities or differences in your opinions?
(f) Describe the problems that you had in administering the
survey. If you had to do this survey over again, would you change
anything? Why?
The Chapter Case Study is located on the CD that accompanies this text.