### Course Introduction

Welcome to the amazing world of statistics! You might be thinking that the topic is just about a bunch of charts, graphs, and odd-looking formulas, but in fact, it is a fascinating and challenging field of study. In this course, we will indeed study those charts and graphs, and yes, that array of complex formulas. But beyond those tools, we will find an entirely new way of thinking, a new way of approaching and understanding the world around us. We will learn why taking aspirin helps lower the risk and severity of a heart attack; how researchers have determined that the more friends you have on a social networking site, the more likely you are to have fewer friends in real life; and how political pollsters almost always know the outcome of an election even before the polls open.

The course is divided into 10 units of study. The first two units are devoted to simple statistical calculations and graphical representations of data. Most of this material will be familiar to you from previous math or science courses. Unit 3 is devoted to a foundational concept of statistics, which is the study of probability. Unit 4 will introduce you to random variables and a very important distribution called the binomial distribution. Unit 5 will focus entirely on one topic: the bell curve. You may have studied the bell curve, also called the normal distribution, in other courses, but this unit will make sure that you are confident and competent in knowing its properties, its uses, and its central importance to all of the material in the rest of the course and in the entire field of statistics.

The first five units build the foundation of concepts, vocabulary, knowledge, and skills for success in the remainder of the course. In the final five units, we will take the plunge into the domain of inferential statistics, where we make statistical decisions based on the data that we have collected. In Unit 6, we will learn how to design statistically sound experiments and studies, in order to collect valid, reliable data. In Unit 7 and Unit 8, we will learn how to analyze the data, using confidence intervals and hypothesis tests, to make statistically sound decisions and inferences about our results.

The final two units will be devoted to two topics frequently used in statistical research: linear regression and chi-square analysis. These exotic-sounding topics will act as springboards for your further study of the discipline in your college undergraduate or graduate programs.

We will use a variety of resources in addition to the text. An online course needs to be multidimensional so that you won't be lulled into a daily grind of textbook reading and doing homework problems. In light of this, you will supplement your course text with video lessons, interactive applets, and research into some of statistics' most interesting, controversial, and fascinating experiments and studies. By the end of the course, you will have mastered the foundational concepts of a field of endeavor that will assist you in studying and understanding the world around you as never before.

### Unit 1: Introduction to Statistical Analysis

Every discipline has its own unique vocabulary and introductory concepts. In Unit 1, you will be introduced to several topics that you have studied in previous math or science courses. But be cautious about skimming through the material! You will see these topics (and some new ones) introduced in a *statistical* context, and this treatment will lay the foundation for the rest of the course. Pay special attention to the concepts of population versus sample. Commit the symbols for means and standard deviations to memory. Don't worry about memorizing the formulas; they will be provided to you on a standardized formula sheet for use on checkpoints, assessments, and the AP exam.

**Completing this unit should take you approximately 5 hours.**

### Unit 2: Visualizations of Data

A picture is worth a thousand words. In Unit 2, you will learn how very true this is. You will learn that every data set has a story to tell, and you will begin the telling of that story by its graphical representation. Just as a photographer chooses the best lighting, pose, and camera setting for conveying an idea or mood, so does the statistician choose the most appropriate graph or chart for best conveying the story that the data are trying to tell.

In this unit, you will do more than choose the right graph; you will also begin to look behind the scenes to discern some characteristics not obvious to a casual observer. You will use the concepts of center, shape, and spread of a data distribution to get both the bigger picture and the sharper focus of what the data are saying.
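As a small illustration of center and spread, here is a minimal Python sketch using only the standard `statistics` module. The quiz scores and the `summarize` helper are invented for this example and do not come from the course materials. The sketch also shows the population-versus-sample distinction from Unit 1: the two standard deviations differ only in whether they divide by n or by n - 1.

```python
import statistics


def summarize(data):
    """Return basic measures of center and spread for a list of numbers."""
    # Quartile cut points (Python 3.8+); q2 is the median.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "population_sd": statistics.pstdev(data),  # divides by n
        "sample_sd": statistics.stdev(data),       # divides by n - 1
        "iqr": q3 - q1,
    }


# Hypothetical quiz scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in summarize(scores).items():
    print(f"{name}: {value:.3f}")
```

The mean and median describe center, while the standard deviations and the interquartile range describe spread. Shape is best judged from a graph such as a histogram.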

**Completing this unit should take you approximately 6 hours.**

### Unit 3: Probability

Penelope wants to be a lawyer and is trying to decide which of two colleges to attend. College A costs considerably more than College B, but a higher percentage of College A's graduates are accepted into law school than of College B's graduates. Also, the prelaw curriculum is more difficult at College A than at College B, so she might not have as competitive a GPA if she attends College A. Penelope is overwhelmed with the comparisons of cost, acceptance rates, and so forth, while trying to decide which college to attend.

While this is a personal decision for Penelope, her knowledge of probability theory will help her make the decision and be satisfied that she made the best decision for her unique circumstances.

Probability is everywhere, and you make decisions every day based on your own personal probabilities or on the probabilities calculated for you or by you. Should you drive in the rain and face a higher accident probability, or should you choose to wait until the rain subsides and risk being late for an important job interview? Should you buy a cheaper used car that might soon need repairs or a more expensive new car with an extended warranty? Should you enroll in an SAT prep course to improve your test scores?

The study of probability in Unit 3 will give you tools for understanding the use of probability theory in making both statistical and personal decisions. You will learn about sample spaces and mutually exclusive and independent events and how to calculate their probabilities.

The critical concept of conditional probability will also be addressed. You will put all of these concepts together to have a solid understanding of probability, and this foundation will provide you with the skills to use probability concepts later in the course and in life to help make sound decisions.
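To make sample spaces, conditional probability, and independence concrete, here is a minimal Python sketch. The two-dice example and the helper names (`probability`, `sum_is_8`, `first_is_even`) are assumptions chosen for illustration, not examples from the course text. It relies on the standard definitions P(A | B) = P(A and B) / P(B) and, for independent events, P(A and B) = P(A) × P(B).

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(event) when every outcome in the sample space is equally likely."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


def sum_is_8(outcome):
    return outcome[0] + outcome[1] == 8


def first_is_even(outcome):
    return outcome[0] % 2 == 0


def both(outcome):
    return sum_is_8(outcome) and first_is_even(outcome)


p_a = probability(sum_is_8)
p_b = probability(first_is_even)
p_a_and_b = probability(both)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

# Independent events satisfy P(A and B) = P(A) * P(B)
independent = p_a_and_b == p_a * p_b

print(p_a, p_a_given_b, independent)  # prints: 5/36 1/6 False
```

Knowing that the first die is even changes the chance of a sum of 8 from 5/36 to 1/6, so the two events are not independent.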

**Completing this unit should take you approximately 11 hours.**

### Unit 4: Discrete Probability Distributions

You received a check from Aunt Matilda for your birthday, and you want to go to the bank to deposit it during your 45-minute lunch break. You know that it takes you 10 minutes to drive to the bank and 10 minutes to get back to school. You know that sometimes there is no waiting line at the bank and sometimes there is a really long line, and you might drive all the way there and find out that you don't have time to wait in line before it's time to go back to school. Your knowledge of probability distributions and expected values from this unit will help you make the statistically optimal decision.
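The bank decision can be sketched in a few lines of Python. The waiting-time distribution below is entirely hypothetical (the text gives no probabilities), and the sketch ignores the time spent at the teller. Only the 45-minute lunch break and the two 10-minute drives come from the scenario above.

```python
from fractions import Fraction

# Hypothetical distribution of waiting time in line: minutes -> probability.
wait_distribution = {
    0: Fraction(3, 10),
    5: Fraction(3, 10),
    15: Fraction(2, 10),
    30: Fraction(2, 10),
}

LUNCH_MINUTES = 45
DRIVE_MINUTES = 10 + 10  # to the bank and back, as stated in the scenario


def expected_value(distribution):
    """Probability-weighted average of the possible values."""
    return sum(value * prob for value, prob in distribution.items())


def probability_at_most(distribution, limit):
    """Probability that the random variable is no larger than `limit`."""
    return sum(prob for value, prob in distribution.items() if value <= limit)


# A valid probability distribution must sum to 1.
assert sum(wait_distribution.values()) == 1

time_available = LUNCH_MINUTES - DRIVE_MINUTES  # minutes left for waiting
print(float(expected_value(wait_distribution)))                       # 10.5
print(float(probability_at_most(wait_distribution, time_available)))  # 0.8
```

Under these made-up numbers, the expected wait of 10.5 minutes fits comfortably in the 25 minutes available, but there is still a 20% chance that the trip is wasted. Weighing those two facts is the kind of decision this unit prepares you to make.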

Probability distributions give us condensed information about possible outcomes of both mathematical and practical situations. Probability distributions aid us in making sound decisions about waiting in line, buying raffle tickets, and taking prescription medications.

The binomial distribution is one type of probability distribution that has a myriad of applications. What guessing strategy should I use on the SAT? Will I get into the college I want, or won't I? What are the chances that the football team will win every one of its home games this season? These types of situations lend themselves to an analysis of the underlying binomial distribution. You might be surprised at the answers!
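The football question can be sketched with the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The number of home games (6) and the chance of winning each one (0.7) are assumptions made up for this example, and the sketch also assumes that the games are independent with the same win probability.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical season: 6 home games, 70% chance of winning each one.
n_games = 6
p_win = 0.7

p_win_all = binomial_pmf(n_games, n_games, p_win)
p_win_at_least_4 = sum(
    binomial_pmf(k, n_games, p_win) for k in range(4, n_games + 1)
)

print(f"P(win all {n_games} home games) = {p_win_all:.4f}")
print(f"P(win at least 4 home games) = {p_win_at_least_4:.4f}")
```

Even a team that wins 70% of its games has a fairly small chance of sweeping all six, which is the kind of surprise the paragraph above hints at.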

**Completing this unit should take you approximately 11 hours.**

### Unit 5: The Normal Distribution

Your SAT scores are here! You scored 520 on Math and 600 on Verbal. Should you jump for joy, or should you groan and sign up for a retest date?

While there are many factors affecting your reaction, it will always be useful to know how your performance compares to that of others who took the test.

SAT scores, IQ scores, cockroach lengths, gestation time of elephants, and a multitude of other measures follow a bell-shaped, or normal, distribution. In this unit, we will study the normal curve in great detail. Knowledge of the normal curve, its characteristics, and the calculation of probabilities for variables that are normally distributed is critical to success in this course. In addition, you will find out how easy it is to determine how well you fared on the SAT or if the used car you want to buy is overpriced.
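Here is a minimal sketch of how the 520 and 600 from the SAT example could be compared with other test takers. It assumes, purely for illustration, that section scores are normally distributed with a mean of 500 and a standard deviation of 100. Those two numbers are not given in the text, and real score distributions vary by year. `NormalDist` is part of Python's standard `statistics` module (Python 3.8+).

```python
from statistics import NormalDist


def fraction_scoring_below(score, mean=500.0, sd=100.0):
    """Fraction of test takers expected to score below `score`.

    The default mean and standard deviation are illustrative assumptions.
    """
    return NormalDist(mu=mean, sigma=sd).cdf(score)


def z_score(score, mean=500.0, sd=100.0):
    """Number of standard deviations that `score` lies above the mean."""
    return (score - mean) / sd


for section, score in [("Math", 520), ("Verbal", 600)]:
    z = z_score(score)
    below = fraction_scoring_below(score)
    print(f"{section}: z = {z:.1f}, about {below:.0%} scored lower")
```

Under these assumptions, a 600 sits one standard deviation above the mean, ahead of roughly 84% of test takers, while a 520 is only slightly above average.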

**Completing this unit should take you approximately 8 hours.**

### Unit 6: Conducting Experiments and Studies

A recent study revealed that breast-fed babies have higher IQs than babies that were not breast-fed.

What was your reaction to this study? Did you immediately think that any baby you ever have will be breast-fed so that he or she will have a high IQ? Or did you see that there are other issues to consider that the outcome did not address?

Unit 6 will sharpen your ability to "talk back to a statistic." You will learn the critical differences between a study and an experiment and how to understand the difference between a correlation between two variables and a cause-and-effect relationship between two variables. Big difference!

You will learn about the importance of taking a representative sample from a population and how to design an experiment that measures what you really intend to measure. You will learn to consider issues affecting a study that may cloud your results.

**Completing this unit should take you approximately 5 hours.**

### Unit 7: Sampling Distributions and Estimations

"Four out of five dentists surveyed recommend sugarless gum for their patients who chew gum."

This is a statement that has been part of a television commercial for many years. Reread the statement. Does this mean that they asked just 5 dentists, and 4 of them recommended sugarless gum? Or did they ask 50 dentists, and 40 of them recommended sugarless gum? Or did they ask 50,000 dentists? Does it matter? Would you trust the statement more if they had asked 50,000 dentists instead of 5?

In Unit 7, you will learn that it certainly does make a difference. You will find out that if you have a representative sample, a large sample size will yield more trustworthy results. This is the realm of sampling distributions and confidence intervals. These unwieldy-sounding concepts provide basic tools for statisticians to use when they decide how big a sample they need and how much "error" they can be comfortable with.

Political pollsters use margins of error when they report approval ratings for politicians (e.g., 42% approval rate, ± 3%) to give a complete picture of opinion data that they have collected. You will learn why these are important and how to calculate them yourself.
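As a rough sketch of where a figure like "± 3%" can come from, the code below uses the common large-sample formula for a proportion: margin = z × sqrt(p(1 - p) / n). The 42% approval rate is from the example above. The sample size of 1,000 and the 95% confidence level are assumptions added for illustration, since the text does not state them.

```python
from math import ceil, sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Approximate margin of error for a sample proportion (large n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * sqrt(p_hat * (1 - p_hat) / n)


def required_sample_size(p_hat, margin, confidence=0.95):
    """Smallest n whose margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / margin) ** 2 * p_hat * (1 - p_hat))


# 42% approval is from the text; n = 1000 is a hypothetical poll size.
moe = margin_of_error(0.42, 1000)
print(f"Margin of error with n = 1000: about {moe:.1%}")

n_needed = required_sample_size(0.42, 0.03)
print(f"Respondents needed for a 3-point margin: {n_needed}")
```

Notice that the margin shrinks with the square root of the sample size, so cutting it in half takes about four times as many respondents.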

**Completing this unit should take you approximately 10 hours.**

### Unit 8: Hypothesis Testing

A student competing in the science fair performs an experiment on the growth of tomato plants. She applies fertilizer to one group of tomato plants and no fertilizer to another group. At the end of the experiment, she weighs all of the tomatoes grown in each group. The fertilizer group averages 5.2 pounds per tomato plant, and the nonfertilizer group averages 4.9 pounds per tomato plant.

It is obvious that the fertilizer group has produced a higher mean yield, but is it just slightly higher or significantly higher? Could the difference in the two yields be attributed to just chance, or did the use of the fertilizer make a real difference? Could the fertilizer company use these results to claim that their fertilizer will definitely contribute to a higher plant yield?

This is an example of the type of experimental situation that you will encounter in analyzing data and using the results to make an inference about an entire population.
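One way to ask whether a 0.3-pound difference could be due to chance alone is an exact permutation test, sketched below. This is only one of several possible approaches and is not necessarily the test that this unit teaches. The per-plant yields are invented so that the group means match the 5.2 and 4.9 pounds quoted above. The text gives no individual measurements or group sizes.

```python
from itertools import combinations
from statistics import mean

# Hypothetical per-plant yields in pounds, invented so that the group
# means match the 5.2 and 4.9 quoted in the text.
fertilized = [5.6, 4.8, 5.3, 5.0, 5.4, 5.1]
unfertilized = [4.7, 5.2, 4.6, 5.1, 4.9, 4.9]


def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test for mean(group_a) > mean(group_b).

    Returns the observed difference in means and the fraction of all ways
    to relabel the pooled plants that give a difference at least as large.
    """
    pooled = group_a + group_b
    total = sum(pooled)
    n_a, n_b = len(group_a), len(group_b)
    observed = mean(group_a) - mean(group_b)

    at_least_as_large = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = sum_a / n_a - (total - sum_a) / n_b
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-9:
            at_least_as_large += 1
        n_splits += 1
    return observed, at_least_as_large / n_splits


observed_diff, p_value = permutation_p_value(fertilized, unfertilized)
print(f"Observed difference: {observed_diff:.2f} pounds per plant")
print(f"One-sided p-value: {p_value:.3f}")
```

A small p-value would mean that random relabeling rarely produces a gap this large, which counts as evidence that the fertilizer made a real difference. A large one would mean that chance alone is a plausible explanation.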

In previous units you learned about probability, sampling, experimental design, and various statistical distributions. In Unit 8, you will put all of this information together. You will perform actual statistical tests of hypotheses and make decisions based on the experimental outcomes. You will be using the same analysis techniques and tools that professional researchers use on a daily basis. This unit is the culmination of all previous units, and its mastery will stand you in good stead for research you might conduct in the future.

**Completing this unit should take you approximately 11 hours.**

### Unit 9: Regression and Correlation

Up to this point, we have studied univariate data, which record a single measurement for each subject, such as weight loss, IQ, or miles per gallon. In Unit 9, we expand our study of statistics to include bivariate data, which record two numerical measures for each subject so that we can examine the relationship between them. These might include the relationship between calories eaten and pounds lost, or the number of hours studied and the score on an exam. The study of bivariate data allows us to see if knowledge of one variable (the number of registered boats in Florida, for example) can predict the value of another variable (the number of manatees killed) with some degree of accuracy.

In Unit 9, we will learn how to create a scatterplot from our data and then use sophisticated mathematical techniques to determine the best linear equation that relates the two variables. We will calculate the correlation between the variables and test hypotheses about the strength of the linear relationship between them.
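The "best linear equation" is usually taken to be the least-squares line, and the sketch below computes it along with the correlation coefficient r. The hours-studied and exam-score pairs are hypothetical values invented for illustration, and the function names are this example's own.

```python
from math import sqrt


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correlation(xs, ys):
    """Pearson correlation coefficient r between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [62, 68, 71, 79, 83, 90]

slope, intercept = least_squares_line(hours, scores)
r = correlation(hours, scores)
print(f"predicted score = {intercept:.1f} + {slope:.1f} * hours")
print(f"r = {r:.3f}")
```

The slope estimates how many points the predicted score rises for each additional hour studied, and a value of r close to 1 or -1 indicates a strong linear relationship.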

**Completing this unit should take you approximately 10 hours.**

### Unit 10: Chi-Square

You open a bag of plain M&Ms and your friend opens his bag of peanut M&Ms, and you immediately notice that the two bags appear to have different proportions of the different colors. Each of you calculates the proportions of each color, and you are surprised to see that they are quite different for the two candy types. You wonder if these bags are just unusual (maybe the machine was out of whack at the factory when the bags were filled) or if the M&M people intentionally put different proportions of colors into the plain M&M bags compared to the peanut M&Ms.

While this example is not exactly akin to the quest for curing world hunger, it poses a question about categorical data that actually has many practical and important implications. Up to this point, we have only been able to do hypothesis tests for one proportion against a particular value, or for the equality of two proportions. But Unit 10 allows us to compare multiple proportions. In our example, we can compare the proportions of brown, blue, red, orange, green, and yellow candies all at the same time. We use the chi-square procedures for doing this.
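The comparison of color proportions across the two bags can be sketched by computing the chi-square statistic, the sum of (observed - expected)² / expected over every cell of the table. The color counts below are made up for illustration and are not real M&M data.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a two-way table.

    `table` is a list of rows of observed counts, with one row per group
    and one column per category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if both groups shared the same
            # underlying proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts of brown, blue, red, orange, green, yellow candies.
plain = [7, 12, 7, 11, 9, 8]
peanut = [4, 8, 5, 9, 6, 10]

statistic, df = chi_square_statistic([plain, peanut])
print(f"chi-square = {statistic:.2f} with {df} degrees of freedom")
```

With 5 degrees of freedom, the statistic would need to exceed about 11.07 to be significant at the 5% level. A smaller value suggests that the differences between the two bags could easily be due to chance.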

The chi-square procedures have been used to test Mendelian genetics, beverage preference tests, and whether drivers of red cars really do get more tickets than other drivers.

**Completing this unit should take you approximately 4 hours.**