### Unit 1: Statistics and Data

In today's technologically advanced world, we have access to large volumes of data. The first step of data analysis is to accurately summarize all of this data, both graphically and numerically, so that we can understand what the data reveals. Being able to use and interpret data correctly is essential to making informed decisions. For instance, when you see an opinion survey about a certain TV program, you may be interested in the proportion of respondents who like the program.

In this unit, you will learn about descriptive statistics, which are used to summarize and display data. After completing this unit, you will know how to present your findings once you have collected data. For example, suppose you want to buy a new mobile phone with a particular type of camera. You are not sure about the prices of phones with this feature, so you access a website that provides a sample data set of prices for phones with your desired features. Looking at every price in a sample can be confusing. A better way to compare this data might be to look at the median price and the variation of prices. The median and the variation are two of several ways to describe data. You can also graph the data so that it is easier to see what the price distribution looks like.
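As a rough illustration of the phone example, the sketch below summarizes a small list of prices with the median and two simple measures of variation. The prices are made up for illustration and do not come from any real data set; the variable names are likewise only assumptions.

```python
import statistics

# Hypothetical phone prices in dollars (invented for illustration).
prices = [299, 349, 399, 449, 499, 549, 999]

# The median is the middle value once the prices are sorted.
median_price = statistics.median(prices)

# Two simple ways to describe variation: the range and the
# sample standard deviation.
price_range = max(prices) - min(prices)
sample_sd = statistics.stdev(prices)

print(f"median price: {median_price}")
print(f"range: {price_range}")
print(f"sample standard deviation: {sample_sd:.2f}")
```

Note that the single expensive phone (999) stretches the range considerably but leaves the median unchanged, which is one reason the median is often a useful summary of prices.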

In this unit, you will study precisely this; namely, you will learn both numerical and graphical ways to describe and display your data. You will understand the essentials of calculating common descriptive statistics for measuring center, variability, and skewness in data. You will learn to calculate and interpret these measurements and graphs.

Descriptive statistics are, as their name suggests, descriptive. They illustrate what the data shows and do not generalize beyond the data considered. Numerical descriptive measures computed from sample data are called statistics. Numerical descriptive measures of a population are called parameters. Inferential statistics, by contrast, can be used to generalize the findings from sample data to a broader population.
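The distinction between a parameter and a statistic can be sketched in a few lines. The example below assumes a tiny, fully known population (hypothetical ages of ten club members) so that both quantities can be computed; in practice the population is usually too large to measure directly.

```python
import random
import statistics

# Hypothetical population: ages of all 10 members of a small club.
population = [21, 25, 30, 34, 38, 41, 45, 52, 58, 66]

# Parameter: a numerical measure computed from the entire population.
population_mean = statistics.mean(population)

# Statistic: the same measure computed from a sample of the population.
rng = random.Random(0)  # fixed seed so the sample is reproducible
sample = rng.sample(population, 4)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean}")
print(f"sample: {sample}")
print(f"sample mean (statistic): {sample_mean}")
```

The sample mean will generally differ from the population mean, and a different sample would give a different value. Quantifying that uncertainty is the job of inferential statistics.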

**Completing this unit should take you approximately 22 hours.**

### 1.1: The Science of Statistics and Its Importance

### 1.1.1: What is Statistics?

Read sections 2 and 3 from chapter 1. Section 2 provides a brief introduction to the field of statistics and some relevant examples. Section 3 presents more examples of how statistics can lend credibility to making arguments. Also, complete the questions in these sections.

### 1.1.2: Descriptive and Inferential Statistics

Read sections 4 and 5 from chapter 1, and then complete the questions at the end of each section. Section 4 introduces descriptive statistics by using examples and discusses the difference between descriptive and inferential statistics. Section 5 talks about samples and populations, explains how one can identify biased samples, and defines inferential statistics.

Read section 1 from chapter 1 to further enhance your understanding of the elements of descriptive and inferential statistics. This section introduces some of the key concepts in statistics and has numerous exercises and examples. Complete the odd-numbered exercises before checking the answers.

### 1.1.3: Types of Data and Their Collection

Read section 7 from chapter 1 and section 4 from chapter 6. Also, complete the questions at the end of each section. Section 7 will introduce several types of data and their distinguishing features. You will also learn about independent and dependent variables. Section 4 will explain how common data can be coded and collected.

Study section 3 from chapter 1. This reading talks about ways that data can be presented. Attempt the odd-numbered exercises before checking the answers.

### 1.2: Methods for Describing Data

### 1.2.1: Graphical Methods for Describing Quantitative Data

Read sections 3-7, 9, and 10 from chapter 2. Also, complete the questions at the end of each section. Section 3 provides an overview of the available methods to portray distributions of quantitative variables. Section 4 introduces you to the stem and leaf plot. In sections 5 and 6, you will learn how to capture the frequency of your data. Section 7 discusses box plots for the purpose of identifying outliers and for comparing distributions. Section 9 discusses bar charts for quantitative variables. Section 10 talks about the method of line graphs, which is based on bar graphs.

Read section 1 from chapter 2. This reading further elaborates on ways of describing data. In particular, you will learn about the relative frequency histogram. Complete the odd-numbered exercises before checking the answers.
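To give a feel for relative frequencies before the reading, here is a minimal sketch that counts how often each value occurs and divides by the number of observations. The data are hypothetical, and for simplicity each distinct value is treated as its own class; a histogram of continuous data would group values into intervals instead.

```python
from collections import Counter

# Hypothetical data: ten observed values (invented for illustration).
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

# Frequency: how many times each value occurs.
counts = Counter(values)

# Relative frequency: frequency divided by the total number of observations.
relative_freq = {v: counts[v] / len(values) for v in sorted(counts)}

# A crude text version of a relative frequency histogram.
for v, rf in relative_freq.items():
    print(f"{v}: {'#' * counts[v]} ({rf:.0%})")
```

The relative frequencies always sum to 1, which makes distributions from samples of different sizes easier to compare.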

### 1.2.2: Numerical Measures of Central Tendency and Variability

Read sections 2, 4, 8, 12, and 13 from chapter 3. Also, complete the questions at the end of each section. Section 2 defines the concept of central tendency. Section 4 introduces mean, median, and mode in the context of examples. Section 8 further elaborates on median and mean and discusses their strengths and weaknesses in measuring the central tendency. Section 12 addresses the concept of variability. Section 13 discusses range, interquartile range, variance, and the standard deviation.
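The measures named in these sections can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical data set. Note that textbooks differ in how they define quartiles, so the interquartile range here follows the module's default ("exclusive") method and may not match the convention used in the readings.

```python
import statistics

# Hypothetical data set (invented for illustration).
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability.
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, default method
iqr = q3 - q1
variance = statistics.variance(data)  # sample variance (divides by n - 1)
sd = statistics.stdev(data)           # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, IQR={iqr}, variance={variance}, sd={sd:.3f}")
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by n instead of n - 1.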

Read sections 2 and 3 from chapter 2. Section 2.2 further elaborates on mean, median, and mode at both the population level and the sample level. This section contains many interesting examples and exercises. Section 2.3 talks about range, variance, and standard deviation using many examples. Complete the odd-numbered problems in the exercise sets for each section before checking the answers.

Watch this video series, which begins with a discussion on descriptive statistics and inferential statistics and then talks about mean, median, and mode, as well as sample variance.

### 1.2.3: Methods for Describing Relative Standing

Read section 8 of chapter 1. Also, complete the questions at the end of the section. This reading discusses percentiles, which are useful for describing relative standings of observations in a dataset. This reading presents several definitions, so make sure to take notes.
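Because the reading presents several definitions of a percentile, the sketch below should be taken as only one of them: the percentage of observations strictly below a given value. The function name `percentile_rank` and the scores are hypothetical and chosen for illustration.

```python
def percentile_rank(scores, value):
    """Percentage of scores strictly below `value`.

    This is one common definition; others also count scores
    equal to `value`, fully or by half.
    """
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)


# Hypothetical test scores (invented for illustration).
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank_of_85 = percentile_rank(scores, 85)
print(f"A score of 85 is at percentile rank {rank_of_85}")  # 6 of 10 scores are below 85
```

Under a different definition, the same score could receive a slightly different percentile rank, which is why it helps to note which definition a source uses.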

### 1.2.4: Methods for Describing Bivariate Relationships

Watch this video tutorial to learn how to create a scatter plot for bivariate data, using two variables x and y. It may be useful to review the definitions on this webpage.

Read sections 3, 5, and 6 from chapter 4. Also, complete the questions at the end of each section. Section 3 introduces Pearson's correlation and explains what the typical values represent. Section 5 further elaborates on the properties of r, particularly the fact that it is invariant under linear transformation. Section 6 introduces several formulas that can be used to compute Pearson's correlation.
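The invariance property can be checked numerically. The sketch below computes Pearson's r from a standard sum-of-products formula (the helper `pearson_r` and the data are hypothetical, not taken from the readings) and then rescales both variables. One caveat worth stating: r is unchanged when the multipliers are positive, while multiplying one variable by a negative number flips the sign of r but keeps its magnitude.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical bivariate data (invented for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_original = pearson_r(x, y)

# Linear transformations with positive slopes (like a change of units).
x_scaled = [1.8 * v + 32 for v in x]
y_scaled = [10 * v - 3 for v in y]
r_transformed = pearson_r(x_scaled, y_scaled)

# A negative multiplier reverses the direction of the relationship.
r_flipped = pearson_r([-v for v in x], y)

print(f"original r:    {r_original:.4f}")
print(f"transformed r: {r_transformed:.4f}")
print(f"flipped r:     {r_flipped:.4f}")
```

In practice this means the correlation between two measurements does not depend on the units in which they are recorded.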

### End of Unit Assessment

Please take this assessment to check your understanding of the materials presented in this unit.

**Notes:** There is no minimum required score to pass this assessment, and your score on this assessment **will not** factor into your overall course grade. This assessment is designed to prepare you for the Final Exam that will determine your course grade. Upon submission of your assessment, you will be provided with the correct answers and/or other feedback meant to help in your understanding of the topics being assessed. You may attempt this assessment as many times as needed, whenever you would like.