Unit 1: Statistics and Data
In today's technologically advanced world, we have access to large volumes of data. The first step of data analysis is to summarize this data accurately, both graphically and numerically, so that we can understand what it reveals. Using and interpreting data correctly is essential to making informed decisions. For instance, when you see a survey of opinions about a certain TV program, you may be interested in the proportion of respondents who actually like the program.
In this unit, you will learn about descriptive statistics, which are used to summarize and display data. After completing this unit, you will know how to present your findings once you have collected data. For example, suppose you want to buy a new mobile phone with a particular type of camera. You are not sure what phones with this feature cost, so you visit a website that gives you a sample data set of prices for phones with your desired features. Looking at every price in a sample can be confusing. A better way to compare this data might be to look at the median price and the variation of prices. The median and the variation are two of several ways to describe data. You can also graph the data so that it is easier to see what the price distribution looks like.
In this unit, you will study precisely this; namely, you will learn both numerical and graphical ways to describe and display your data. You will understand the essentials of calculating common descriptive statistics for measuring center, variability, and skewness in data. You will learn to calculate and interpret these measurements and graphs.
Descriptive statistics are, as their name suggests, descriptive: they summarize what the data shows and do not generalize beyond the data considered. Numerical descriptive measures computed from sample data are called statistics, while numerical descriptive measures of a population are called parameters. Inferential statistics can be used to generalize the findings from sample data to a broader population.
Completing this unit should take you approximately 7 hours.
Upon successful completion of this unit, you will be able to:
- describe various sampling methods for data collection, and apply these methods;
- create and interpret frequency tables;
- display data graphically and interpret the following types of graphs: stem plots, histograms, and boxplots;
- identify, describe, and calculate the following measures of the location of data: quartiles and percentiles;
- identify, describe, and calculate the following measures of the center of data: mean, median, and mode; and
- identify, describe, and calculate the following measures of the spread of data: variance, standard deviation, and range.
1.1: The Science of Statistics and Its Importance
1.1.1: What is Statistics?
Read this brief introduction to the field of statistics and some relevant examples of how statistics can lend credibility to arguments. Complete the practice questions in these sections.
1.1.2: Descriptive and Inferential Statistics
Read these sections and complete the questions at the end of each section. Here, we introduce descriptive statistics using examples and discuss the difference between descriptive and inferential statistics. We also talk about samples and populations, explain how you can identify biased samples, and define inferential statistics.
Read section 1 from chapter 1 to further enhance your understanding of the elements of descriptive and inferential statistics. This section introduces some of the key concepts in statistics and has numerous exercises and examples. Complete the odd-numbered exercises before checking the answers.
1.1.3: Types of Data and Their Collection
Read these sections and complete the questions at the end of each section. This section introduces several types of data and their distinguishing features. You will learn about independent and dependent variables and how data are commonly coded and collected.
This section talks about how data can be presented. Attempt the exercises and check your answers.
1.2: Methods for Describing Data
1.2.1: Graphical Methods for Describing Quantitative Data
Read these sections and complete the questions at the end of each section. First, we'll look at the available methods for portraying distributions of quantitative variables. Then, we'll introduce the stem and leaf plot and show how it captures the frequency of your data. We'll also discuss box plots, which are useful for identifying outliers and comparing distributions, as well as bar charts for quantitative variables. Finally, we'll talk about line graphs, which are based on bar graphs.
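As a rough sketch of two of these displays, the Python code below builds a text stem and leaf plot (tens digit as the stem, ones digit as the leaf) and computes box plot fences using the common "1.5 times the interquartile range" rule for flagging outliers. The scores are hypothetical, the function names are illustrative only, and quartiles are computed with the standard library's `statistics.quantiles` (Python 3.8+); textbooks use several quartile conventions, so the reading's fences may differ slightly.

```python
import statistics
from collections import defaultdict

def stem_and_leaf(values):
    """Build stem-and-leaf lines for non-negative integers.

    The stem is the tens digit(s) and the leaf is the ones digit; stems with
    no leaves are kept so gaps in the data stay visible.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

def boxplot_fences(values):
    """Return (lower, upper) fences 1.5 IQRs beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 'exclusive' method by default
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical exam scores, made up for illustration.
scores = [21, 52, 58, 61, 63, 67, 68, 72, 74, 75, 79, 81, 88, 93]
for line in stem_and_leaf(scores):
    print(line)

low, high = boxplot_fences(scores)
outliers = [s for s in scores if s < low or s > high]
print(f"fences: {low}, {high}; outliers: {outliers}")
```

The empty stems 3 and 4 make the gap below the rest of the data visible in the stem plot, and the fence rule flags the same low score as an outlier.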
This section elaborates on how to describe data. In particular, you will learn about the relative frequency histogram. Complete the exercises and check your answers.
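A relative frequency histogram plots, for each class interval, the proportion of observations falling in that interval rather than the raw count. The sketch below computes such a table for hypothetical scores. The bin start, width, and the half-open [lower, upper) convention are assumptions chosen for illustration, and a row of `#` characters stands in for each bar.

```python
def relative_frequency_table(values, start, width, num_bins):
    """Return (lower, upper, count, relative_frequency) for equal-width bins.

    Each bin is half-open, [lower, upper), so a value on a boundary is counted
    in the higher bin.
    """
    counts = [0] * num_bins
    for v in values:
        k = int((v - start) // width)
        if not 0 <= k < num_bins:
            raise ValueError(f"value {v} lies outside the bins")
        counts[k] += 1
    n = len(values)
    return [
        (start + k * width, start + (k + 1) * width, c, c / n)
        for k, c in enumerate(counts)
    ]

# Hypothetical scores, made up for illustration.
scores = [52, 58, 61, 63, 67, 68, 72, 74, 75, 79, 81, 88, 93, 96]
for lower, upper, count, rel in relative_frequency_table(scores, 50, 10, 5):
    # A text "bar" whose length is proportional to the relative frequency.
    print(f"[{lower}, {upper})  {count:2d}  {rel:.3f}  " + "#" * round(rel * 50))
```

Because each count is divided by the total number of observations, the relative frequencies sum to 1, which makes histograms of data sets with different sizes directly comparable.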
1.2.2: Numerical Measures of Central Tendency and Variability
Read these sections and complete the questions at the end of each section. First, we will define central tendency and introduce the mean, median, and mode. We will then elaborate on the median and mean and discuss their strengths and weaknesses in measuring central tendency. Finally, we'll address variability: range, interquartile range, variance, and standard deviation.
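The measures listed above can be computed with Python's standard `statistics` module (Python 3.8+ for `quantiles` and `multimode`). The sketch below uses a small hypothetical data set; the `summarize` helper is illustrative, not part of any course material. It also shows one weakness of the mean that the reading discusses: a single extreme value pulls the mean much further than the median. Note that `statistics.variance` and `statistics.stdev` are the sample versions, and that the interquartile range depends on which quartile convention is used.

```python
import statistics

def summarize(values):
    """Return common measures of center and spread for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),     # every most-frequent value
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),   # sample variance, n - 1 denominator
        "stdev": statistics.stdev(values),         # sample standard deviation
    }

# Hypothetical data, made up for illustration.
data = [2, 3, 3, 5, 7, 10]
print(summarize(data))

# One extreme value pulls the mean far more than the median.
skewed = data + [100]
print(statistics.mean(skewed), statistics.median(skewed))
```

Here the mean jumps from 5 to about 18.6 when 100 is added, while the median only moves from 4 to 5, which is why the median is often preferred for skewed data.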
This section elaborates on mean, median, and mode at the population level and sample level. This section also contains many interesting examples of range, variance, and standard deviation. Complete the exercises and check your answers.
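The main computational difference between the population level and the sample level is the denominator of the variance: the population variance divides the sum of squared deviations from the population mean by N, while the sample variance divides the sum of squared deviations from the sample mean by n - 1. The minimal sketch below writes both formulas out directly for a hypothetical data set; the function names are illustrative.

```python
import math

def population_variance(values):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

# Hypothetical data, made up for illustration.
data = [2, 3, 3, 5, 7, 10]
print("population:", population_variance(data), math.sqrt(population_variance(data)))
print("sample:    ", sample_variance(data), math.sqrt(sample_variance(data)))
```

The squared deviations here sum to 46, so the population variance is 46 / 6 and the sample variance is 46 / 5 = 9.2; each standard deviation is the square root of the corresponding variance.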
Watch this video series, which begins with a discussion of descriptive and inferential statistics and then covers the mean, median, and mode, as well as sample variance.
1.2.3: Methods for Describing Relative Standing
This section discusses percentiles, which are useful for describing relative standings of observations in a dataset.
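Percentiles have several competing definitions. The sketch below assumes one common convention: the P-th percentile sits at rank R = P/100 × (N + 1) in the ordered data, with linear interpolation between neighboring values and clamping to the minimum or maximum at the ends. It also computes a percentile rank, here assumed to mean the percentage of observations strictly below a value. The data and function names are hypothetical, and the reading's own definition may give slightly different answers.

```python
def percentile(values, p):
    """P-th percentile using rank R = P/100 * (N + 1) and linear interpolation.

    One of several conventions; ranks outside 1..N are clamped to min/max.
    """
    data = sorted(values)
    n = len(data)
    rank = p / 100 * (n + 1)
    if rank <= 1:
        return data[0]
    if rank >= n:
        return data[-1]
    lower = int(rank)    # integer part of the (1-based) rank
    frac = rank - lower  # fractional part, used to interpolate
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

def percentile_rank(values, x):
    """Percentage of observations strictly below x (one common convention)."""
    return 100 * sum(v < x for v in values) / len(values)

# Hypothetical data, made up for illustration.
scores = [3, 5, 7, 8, 9, 11, 13, 15]
print(percentile(scores, 25))
print(percentile(scores, 85))
print(percentile_rank(scores, 11))
```

For the 25th percentile, R = 0.25 × 9 = 2.25, so the result lies a quarter of the way from the 2nd value (5) to the 3rd value (7), giving 5.5.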
1.2.4: Methods for Describing Bivariate Relationships
Watch this tutorial to learn how to create a scatter plot for bivariate data using two variables, x and y.
This section introduces Pearson's correlation and explains what the typical values represent. It then elaborates on the properties of r, particularly that it is invariant under linear transformation. Finally, it introduces several formulas we can use to compute Pearson's correlation.
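One standard formula for Pearson's r divides the sum of cross-products of deviations from the means by the square root of the product of the two sums of squared deviations. The sketch below implements that formula for hypothetical data and checks the linear-transformation property: rescaling and shifting the variables with positive multipliers leaves r unchanged, while multiplying one variable by a negative constant flips its sign but keeps its size. The function name and data are illustrative; the reading may present the computation with a different but equivalent formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation: sum of cross-products of deviations divided by
    sqrt(sum of squared x deviations * sum of squared y deviations)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Positive linear transformations (e.g. inches to centimeters, a shifted scale).
r_transformed = pearson_r([2.54 * v for v in x], [10 + 3 * v for v in y])
# A negative multiplier reverses the direction of the relationship.
r_flipped = pearson_r([-v for v in x], y)
print(round(r, 4), round(r_transformed, 4), round(r_flipped, 4))
```

For these values the cross-products sum to 6 and the squared deviations sum to 10 and 6, so r = 6 / √60, about 0.775.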
Unit 1 Assessment
Take this assessment to see how well you understood this unit.
- This assessment does not count towards your grade. It is just for practice!
- You will see the correct answers when you submit your answers. Use this to help you study for the final exam!
- You can take this assessment as many times as you want, whenever you want.