Variables and Data Collection

Site: Saylor Academy
Course: MA121: Introduction to Statistics
Book: Variables and Data Collection

Description

Read these sections and complete the questions at the end of each section. This section introduces several types of data and their distinguishing features. You will learn about independent and dependent variables and how common data can be coded and collected.

Variables

Learning Objectives

  1. Define and distinguish between independent and dependent variables
  2. Define and distinguish between discrete and continuous variables
  3. Define and distinguish between qualitative and quantitative variables

Source: Heidi Ziemer, https://onlinestatbook.com/2/introduction/variables.html
This work is in the Public Domain.

Independent and dependent variables

Variables are properties or characteristics of some event, object, or person that can take on different values or amounts (as opposed to constants such as π that do not vary). When conducting research, experimenters often manipulate variables. For example, an experimenter might compare the effectiveness of four types of antidepressants. In this case, the variable is "type of antidepressant". When a variable is manipulated by an experimenter, it is called an independent variable. The experiment seeks to determine the effect of the independent variable on relief from depression. In this example, relief from depression is called a dependent variable. In general, the independent variable is manipulated by the experimenter and its effects on the dependent variable are measured.

Example #1: Can blueberries slow down aging? A study indicates that antioxidants found in blueberries may slow down the process of aging. In this study, 19-month-old rats (equivalent to 60-year-old humans) were fed either their standard diet or a diet supplemented by either blueberry, strawberry, or spinach powder. After eight weeks, the rats were given memory and motor skills tests. Although all supplemented rats showed improvement, those supplemented with blueberry powder showed the most notable improvement.

  1. What is the independent variable? (dietary supplement: none, blueberry, strawberry, and spinach)
  2. What are the dependent variables? (memory test and motor skills test)

Example #2: Does beta-carotene protect against cancer? Beta-carotene supplements have been thought to protect against cancer. However, a study published in the Journal of the National Cancer Institute suggests this is false. The study was conducted with 39,000 women aged 45 and up. These women were randomly assigned to receive a beta-carotene supplement or a placebo, and their health was studied over their lifetime. Cancer rates for women taking the beta-carotene supplement did not differ systematically from the cancer rates of those women taking the placebo.

  1. What is the independent variable? (supplements: beta-carotene or placebo)
  2. What is the dependent variable? (occurrence of cancer)


Example #3: How bright is right? An automobile manufacturer wants to know how bright brake lights should be in order to minimize the time required for the driver of a following car to realize that the car in front is stopping and to hit the brakes.

  1. What is the independent variable? (brightness of brake lights)
  2. What is the dependent variable? (time to hit brakes)


Levels of an Independent Variable

If an experiment compares an experimental treatment with a control treatment, then the independent variable (type of treatment) has two levels: experimental and control. If an experiment were comparing five types of diets, then the independent variable (type of diet) would have five levels. In general, the number of levels of an independent variable is the number of experimental conditions.
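As a small illustration, the number of levels can be counted as the number of distinct conditions to which subjects are assigned. The Python sketch below uses the four diet conditions from Example #1; the list of assignments itself is hypothetical and exists only for illustration.

```python
# Hypothetical condition assignments for eight rats (illustrative only);
# the four condition names come from Example #1.
assignments = [
    "none", "blueberry", "strawberry", "spinach",
    "blueberry", "none", "spinach", "strawberry",
]

# The levels of the independent variable are its distinct conditions.
levels = sorted(set(assignments))
print(levels)       # ['blueberry', 'none', 'spinach', 'strawberry']
print(len(levels))  # 4
```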


Qualitative and Quantitative Variables

An important distinction between variables is between qualitative variables and quantitative variables. Qualitative variables are those that express a qualitative attribute such as hair color, eye color, religion, favorite movie, gender, and so on. The values of a qualitative variable do not imply a numerical ordering. Values of the variable "religion" differ qualitatively; no ordering of religions is implied. Qualitative variables are sometimes referred to as categorical variables. Quantitative variables are those variables that are measured in terms of numbers. Some examples of quantitative variables are height, weight, and shoe size.

In the study on the effect of diet discussed above, the independent variable was type of supplement: none, strawberry, blueberry, and spinach. The variable "type of supplement" is a qualitative variable; there is nothing quantitative about it. In contrast, the dependent variable "memory test" is a quantitative variable since memory performance was measured on a quantitative scale (number correct).


Discrete and Continuous Variables

Variables such as number of children in a household are called discrete variables since the possible scores are discrete points on the scale. For example, a household could have three children or six children, but not 4.53 children. Other variables such as "time to respond to a question" are continuous variables since the scale is continuous and not made up of discrete steps. The response time could be 1.64 seconds, or it could be 1.64237123922121 seconds. Of course, the practicalities of measurement preclude most measured variables from being truly continuous.

Questions

Question 1 out of 6.

Which of the following are qualitative variables?

  • height measured in number of feet
  • weight measured in number of pounds
  • number of days it snowed
  • hair color
  • gender
  • average daily temperature


Question 2 out of 6.

In a study of the effect of handedness on athletic ability, participants were divided into three groups: right-handed, left-handed, and ambidextrous. Athletic ability was measured on a 12-point scale. The independent variable is _________; the number of levels of the independent variable is _______.

  • athletic ability; three
  • athletic ability; twelve
  • handedness; three
  • handedness; twelve


Question 3 out of 6.

In a study of the effect of handedness on athletic ability, participants were divided into three groups: right-handed, left-handed, and ambidextrous. Athletic ability was measured on a 12-point scale. The dependent variable is

  • handedness.
  • athletic ability.
  • not described.
  • both handedness and athletic ability.


Question 4 out of 6.

In a study of the effect of handedness on athletic ability, participants were divided into three groups: right-handed, left-handed, and ambidextrous. Athletic ability was measured on a 12-point scale. Check all that apply. The variable athletic ability is

  • discrete.
  • qualitative.
  • continuous.
  • quantitative.
  • a dependent variable.
  • an independent variable.


Question 5 out of 6.

In an experiment on the effect of sleep on memory, the independent variable is

  • number of hours of sleep
  • recall score on a memory test
  • gender of the subjects
  • gender of the experimenter


Question 6 out of 6.

In an experiment on the effect of sleep on memory, the dependent variable is

  • number of hours of sleep
  • recall score on a memory test
  • gender of the subjects
  • gender of the experimenter

Answers


  1. The qualitative variables are hair color and gender.

  2. The independent variable is handedness. Since there are three types of handedness, the number of levels is 3.

  3. Athletic ability is the dependent variable.

  4. The variable is discrete because the rating scale contains only 12 points and therefore is not continuous. Athletic ability is a discrete quantitative dependent variable.

  5. The independent variable is the number of hours of sleep.

  6. The dependent variable is the recall score on a memory test.

Basics of Data Collection

Learning Objectives

  1. Describe how a variable such as height should be recorded
  2. Choose a good response scale for a questionnaire

Most statistical analyses require that your data be in numerical rather than verbal form (you can't punch letters into your calculator). Therefore, data collected in verbal form must be coded so that it is represented by numbers. To illustrate, consider the data in Table 1.

Table 1. Example Data

Student Name   Hair Color   Gender   Major            Height   Computer Experience
Norma          Brown        Female   Psychology       5'4"     Lots
Amber          Blonde       Female   Social Science   5'7"     Very little
Paul           Blonde       Male     History          6'1"     Moderate
Christopher    Black        Male     Biology          5'10"    Lots
Sonya          Brown        Female   Psychology       5'4"     Little


Can you conduct statistical analyses on the above data, or must you re-code it in some way? For example, how would you go about computing the average height of the 5 students? You cannot enter students' heights in their current form into a statistical program; the computer would probably give you an error message because it does not understand notation such as 5'4". One solution is to change all the numbers to inches. So, 5'4" becomes (5 × 12) + 4 = 64, 6'1" becomes (6 × 12) + 1 = 73, and so forth. In this way, you are converting height in feet and inches to simply height in inches. From there, it is very easy to ask a statistical program to calculate the mean height in inches for the 5 students.
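The conversion described above can be sketched in a few lines of Python. This is only one possible approach; the function name height_to_inches is a hypothetical helper, and it assumes every height is written in the feet'inches" style used in Table 1.

```python
from statistics import mean

def height_to_inches(text):
    """Convert a height written like 5'4" into total inches."""
    feet, rest = text.strip().split("'")
    inches = rest.strip().rstrip('"')
    return int(feet) * 12 + int(inches)

# Heights from Table 1, in row order.
heights = ["5'4\"", "5'7\"", "6'1\"", "5'10\"", "5'4\""]
inches = [height_to_inches(h) for h in heights]

print(inches)        # [64, 67, 73, 70, 64]
print(mean(inches))  # 67.6
```

The mean height of the five students comes out to 67.6 inches.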

You may ask, "Why not simply ask subjects to write their height in inches in the first place?" Well, the number one rule of data collection is to ask for information in such a way that it will be most accurately reported. Most people know their height in feet and inches and cannot quickly and accurately convert it into inches "on the fly". So, in order to preserve data accuracy, it is best for researchers to make the necessary conversions.

Let's take another example. Suppose you wanted to calculate the mean amount of computer experience for the five students shown in Table 1. One way would be to convert the verbal descriptions to numbers as shown in Table 2. Thus, "Very Little" would be converted to "1" and "Little" would be converted to "2".

Table 2. Conversion of verbal descriptions to numbers.

1             2        3          4      5
Very Little   Little   Moderate   Lots   Very Lots
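The coding in Table 2 can be applied to the Computer Experience column of Table 1 with a simple lookup. The sketch below is illustrative; the dictionary name EXPERIENCE_CODES is an assumption, and taking a mean of the codes treats the five categories as equally spaced, which is itself an assumption.

```python
from statistics import mean

# Codes from Table 2; keys are lowercased so that "Very little" and
# "Very Little" are treated the same.
EXPERIENCE_CODES = {
    "very little": 1,
    "little": 2,
    "moderate": 3,
    "lots": 4,
    "very lots": 5,
}

# Computer Experience column from Table 1, in row order.
responses = ["Lots", "Very little", "Moderate", "Lots", "Little"]
coded = [EXPERIENCE_CODES[r.lower()] for r in responses]

print(coded)        # [4, 1, 3, 4, 2]
print(mean(coded))  # 2.8
```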


Measurement Examples

Example #1: How much information should I record?

Say you are volunteering at a track meet at your college, and your job is to record each runner's time as they pass the finish line for each race. Their times are shown in large red numbers on a digital clock with eight digits to the right of the decimal point, and you are told to record the entire number in your tablet. Thinking eight decimal places is a bit excessive, you only record runners' times to one decimal place. The track meet begins, and runner number one finishes with a time of 22.93219780 seconds. You dutifully record her time in your tablet, but only to one decimal place, that is, 22.9. Race number two finishes and you record 32.7 for the winning runner. The fastest time in Race number three is 25.6. The winning time in Race number four is 22.9, Race number five is… But wait! You suddenly realize your mistake; you now have a tie between runner one and runner four for the title of Fastest Overall Runner! You should have recorded more information from the digital clock. That information is now lost, and you cannot go back in time and record running times to more decimal places.

The point is that you should think very carefully about the scales and specificity of information needed in your research before you begin collecting data. If you believe you might need additional information later but are not sure, measure it; you can always decide to not use some of the data, or "collapse" your data down to lower scales if you wish, but you cannot expand your data set to include more information after the fact. In this example, you probably would not need to record eight digits to the right of the decimal point. But recording only one decimal digit is clearly too few.
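The loss of information from rounding can be seen in a short Python sketch. Runner one's time is taken from the example; the full-precision time for runner four is not given in the text, so the value below is purely hypothetical.

```python
# Runner one's clock time is from the example above.
runner_one = 22.93219780
# Hypothetical full-precision time for runner four (not given in the text).
runner_four = 22.87654321

# Recorded to one decimal place, the two times are indistinguishable.
print(round(runner_one, 1), round(runner_four, 1))  # 22.9 22.9

# Recorded to three decimal places, the tie disappears.
print(round(runner_one, 3), round(runner_four, 3))  # 22.932 22.877
```

Under this assumption, three decimal places would have been enough to separate the two runners, while the one-decimal records cannot be un-rounded afterward.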


Example #2

Pretend for a moment that you are teaching five children in middle school (yikes!), and you are trying to convince them that they must study more in order to earn better grades. To prove your point, you decide to collect actual data from their recent math exams, and, toward this end, you develop a questionnaire to measure their study time and subsequent grades. You might develop a questionnaire which looks like the following:

  1. Please write your name: ____________________________
  2. Please indicate how much you studied for this math exam:
    a lot .......... moderate .......... little
  3. Please circle the grade you received on the math exam:
    A  B  C  D   F

Given the above questionnaire, your obtained data might look like the following:

Name        Amount Studied   Grade
John        Little           C
Sally       Moderate         B
Alexander   Lots             A
Linda       Moderate         A
Thomas      Little           B


Eyeballing the data, it seems as if the children who studied more received better grades, but it's difficult to tell. "Little," "lots," and "B," are imprecise, qualitative terms. You could get more precise information by asking specifically how many hours they studied and their exact score on the exam. The data then might look as follows:

Name        Hours studied   % Correct
John        5               71
Sally       9               83
Alexander   13              97
Linda       12              91
Thomas      7               85


Of course, this assumes the students would know how many hours they studied. Rather than trust the students' memories, you might ask them to keep a log of their study time as they study.
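Once study time and exam scores are recorded as numbers, the relationship between them can be summarized numerically rather than eyeballed. One possible summary, not used in the text itself and offered here only as a sketch, is the Pearson correlation coefficient. The function name pearson_r is a hypothetical helper; the data are the hours and percent-correct values from the table above.

```python
from math import sqrt

# Hours studied and % correct from the table above, in row order.
hours = [5, 9, 13, 12, 7]
percent = [71, 83, 97, 91, 85]

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, percent)
print(round(r, 2))  # 0.93
```

A value near 1 indicates that, in this small sample, students who studied more hours tended to score higher. The same calculation could not be done directly on "Little", "Moderate", and letter grades without first coding them.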


Questions

Question 1 out of 2.

You should always record data to as many decimal places as possible.

  • false
  • true


Question 2 out of 2.

If you wished to know how long since your subjects had last eaten, it would be better to ask them

  • What time they last ate and what time it is now.
  • How many minutes since they had last eaten.

Answers

  1. false
    For many experiments you do not need to record more than a few decimal places. However, make sure to record enough.

  2. What time they last ate and what time it is now.
    Subjects may make errors in arithmetic. It is better to compute the difference yourself.
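Computing the difference yourself is straightforward once both clock times are recorded. The following Python sketch assumes the two times fall on the same day and are written in 24-hour HH:MM form; the function name minutes_since and the sample times are hypothetical.

```python
from datetime import datetime

def minutes_since(last_ate, now, fmt="%H:%M"):
    """Minutes between two same-day clock times given as strings."""
    start = datetime.strptime(last_ate, fmt)
    end = datetime.strptime(now, fmt)
    return int((end - start).total_seconds() // 60)

# Hypothetical responses: last ate at 7:45, current time 13:10.
print(minutes_since("07:45", "13:10"))  # 325
```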