### Course Introduction

This course introduces business statistics: the application of statistics in the workplace. Statistics is the study of methods for gathering, analyzing, and interpreting data. If you have taken a statistics course before, some of the topics here may be familiar. Statistics applies to almost any field - from anthropology to hedge fund management - because most of us interpret data best when it is presented in an organized fashion.

You can analyze data in many forms. Summary statistics, for example, provide an overview of a data set, such as the average score on an exam. However, the average does not always tell the whole story. If the average score is 80, it may be because half of the students received 100s and the other half received 60s. That is a very different story from a class in which everyone received an 80, which demonstrates consistency. Statistics provides more than simple averages.

In this course, you will learn how to apply statistical tools to analyze data, draw conclusions, and make predictions about the future. The course begins with data distributions, followed by probability analysis, sampling, hypothesis testing, inferential statistics, and, finally, regression. The course is mathematically intensive, but much of what you learn will deal with things you encounter every day. It also makes use of spreadsheets, an important tool for working with and making sense of numerical data.
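The exam-score example above can be checked with a few lines of Python. This is only an illustrative sketch: the class size of ten and the variable names are assumptions, and the population standard deviation is used here as one possible measure of spread.

```python
from statistics import mean, pstdev

# Hypothetical exam scores for two classes of ten students each.
split_class = [100] * 5 + [60] * 5   # half scored 100, half scored 60
steady_class = [80] * 10             # everyone scored 80

for name, scores in [("split", split_class), ("steady", steady_class)]:
    # Same mean in both classes; the standard deviation exposes the difference.
    print(name, "mean:", mean(scores), "std dev:", pstdev(scores))
```

Both classes have a mean of 80, but the standard deviation is 20 for the split class and 0 for the steady class, which is the "different story" that the average alone hides.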

### Unit 1: Introduction to Statistical Analysis

Statistics may appear to be a difficult, even scary, subject. You will find, however, that you are already familiar with the fundamentals of statistics from your life experience. For instance, you know from experience that most adult men wear a shoe size close to the average, while a few wear sizes well below or well above it. In statistics, a variable whose data follow this pattern is said to be approximately normally distributed.

This unit provides an introduction to statistical analysis and how it relates to business. For example, you may want to learn the average price of a 50-inch digital TV by gathering its price from 30 different stores. You take your 30 prices and compute the average. Because thousands of stores sell that particular product, the next question in statistics is: Are you confident enough to say that your computed average reflects the real average that would be computed from the prices at all stores that sell that TV?

You are probably familiar with the average of a data set. In this course, we will refer to what most people call the average as the arithmetic mean. Strictly speaking, an average is any single value used to describe the middle of a data set. The most common averages used in statistics are the arithmetic mean, the median, and the mode. Each describes the middle of a data set in a different way. The mean is the sum of all values divided by the number of values. The median is the value that separates the upper half of a data set from the lower half. The mode is the most common value within the data set.

In many instances, the median and the mean are similar, but this introductory unit will also identify many examples where they are not. The distinction between summary statistics is important in business statistics. This unit will define various terms that you may not be familiar with, such as variance and outliers. Understanding this vocabulary will be vital to the successful completion of this course.
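As a small sketch of the three averages described above, the snippet below uses Python's standard `statistics` module. The seven prices are made-up values, with one deliberately extreme price to show how an outlier can pull the mean away from the median.

```python
from statistics import mean, median, mode

# Hypothetical prices (in dollars) for the same TV at seven stores.
# The last store is an outlier.
prices = [400, 400, 420, 450, 480, 500, 1200]

print("mean:  ", mean(prices))    # sum of values / number of values
print("median:", median(prices))  # middle value once the data are sorted
print("mode:  ", mode(prices))    # most frequent value
```

With these assumed numbers the mean is 550, the median is 450, and the mode is 400: three different answers to "what is the typical price?"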

**Completing this unit should take you approximately 7 hours.**

### Unit 2: Counting, Probability, and Probability Distributions

What is the likelihood that an event will occur? What are the chances that a given student will receive a score between 60 and 69? By studying distributions of data, you can determine the probability that a certain event will occur. By looking at the distribution of grades in a class, you can identify the probability that a student will receive a score between 60 and 69. The applications of probability in business are wide-ranging: from predicting profits to determining the chances that regulation will affect a business model, businesses frequently use probability to make decisions.
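Reading a probability off a grade distribution can be sketched as below. The counts per score band are invented for illustration, and the calculation assumes the class is representative of the student in question.

```python
# Hypothetical grade distribution: score band -> number of students.
grade_counts = {"90-100": 6, "80-89": 12, "70-79": 14, "60-69": 5, "below 60": 3}

total_students = sum(grade_counts.values())

# Empirical probability = students in the band / all students.
p_60_to_69 = grade_counts["60-69"] / total_students
print(f"P(60-69) = {p_60_to_69:.3f}")
```

With these assumed counts, 5 of 40 students fall in the 60-69 band, so the estimated probability is 0.125.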

Before you can focus on probability, you must first learn how to count. What's that you say? You already know how to count? Maybe - but in this unit you will learn techniques for counting the different ways that multiple events can occur together. These techniques are called "combinations" and "permutations," and they are fundamental concepts needed to fully understand probability.
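To make the two counting techniques concrete, here is a brief sketch using Python's standard library (`math.comb` and `math.perm` require Python 3.8 or later). The four candidate names are hypothetical.

```python
from itertools import combinations, permutations
from math import comb, perm

candidates = ["Ana", "Ben", "Cy", "Dee"]  # hypothetical names

# Permutations: order matters (e.g. who is chair and who is deputy).
ordered_pairs = list(permutations(candidates, 2))
# Combinations: order does not matter (e.g. a two-person committee).
unordered_pairs = list(combinations(candidates, 2))

print(len(ordered_pairs), perm(4, 2))    # both count ordered selections
print(len(unordered_pairs), comb(4, 2))  # both count unordered selections
```

Choosing 2 people from 4 gives 12 permutations but only 6 combinations, because each unordered pair corresponds to two possible orderings.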

**Completing this unit should take you approximately 19 hours.**

### Unit 3: The Normal Distribution

A probability distribution describes how likely each possible outcome of an event is, and it is often drawn as a line graph. It is similar to a histogram, but in a distribution the user does not choose the groupings; instead, the shape of the curve reflects how likely each value is to occur. A distribution also lets you analyze the probability of a specific outcome or range of outcomes, whereas a histogram requires outcomes to be grouped.

An important distribution is the "normal" distribution. The normal distribution is used to approximate real-world occurrences. If you can make certain assumptions about the occurrence of an event, then you can use the normal distribution to find the probability of that event occurring. Many of the events that are important to business can be approximated using the normal distribution.
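As a hedged illustration of finding probabilities from a normal distribution, the sketch below uses `statistics.NormalDist` from Python's standard library. The scenario (daily demand with a mean of 200 units and a standard deviation of 25) is an assumption made up for the example.

```python
from statistics import NormalDist

# Assumed parameters: daily demand is roughly normal,
# with a mean of 200 units and a standard deviation of 25.
daily_demand = NormalDist(mu=200, sigma=25)

# P(X <= 225): probability demand is at most one std dev above the mean.
p_at_most_225 = daily_demand.cdf(225)
# P(175 <= X <= 225): probability demand is within one std dev of the mean.
p_between = daily_demand.cdf(225) - daily_demand.cdf(175)

print(round(p_at_most_225, 4), round(p_between, 4))
```

Under these assumptions, about 84% of days have demand of 225 units or fewer, and about 68% of days fall within one standard deviation of the mean.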

**Completing this unit should take you approximately 5 hours.**

### Unit 4: Sampling and Sampling Distributions

While you may not become a professional data gatherer, it is likely that you will need to compile data on a regular basis. When gathering data, you will not always have the luxury of collecting all available data. For example, economists cannot record the employment status of every person in the population, so they take a random sample instead. Likewise, in a manufacturing facility, quality control managers do not have the resources to test every product that comes off the line; it is simply not feasible. Instead, they take samples at various points during the production process to test the quality of the products the firm produces.

There are a number of methods employed in sampling data. It is important that the sampling method fits the application. For example, marketing managers may wish to test a product on various groups of people. They may define these groups by age, race, geography, income, or any other factors. They then divide the population into these groups and take samples from each group in a process known as stratified sampling. (In cluster sampling, by contrast, entire groups are selected at random and studied.) If marketers do not properly divide the population, they may end up marketing to the wrong demographic and achieving poor sales.
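Dividing a population into groups and then sampling from every group, as described above, is usually called stratified sampling. The following is a minimal sketch of that idea; the function name `stratified_sample`, the age bands, and the population of 60 customers are all hypothetical.

```python
import random

def stratified_sample(population, key, per_group, seed=0):
    """Draw up to `per_group` random members from each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    groups = {}
    for person in population:
        groups.setdefault(key(person), []).append(person)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

# Hypothetical population: 60 customers spread evenly over three age bands.
bands = ["18-34", "35-54", "55+"]
population = [{"id": i, "age_band": bands[i % 3]} for i in range(60)]

sample = stratified_sample(population, key=lambda p: p["age_band"], per_group=5)
print(len(sample))
```

Every age band contributes the same number of people to the sample, so no group is left out by chance, which is the point of sampling within groups rather than from the population as a whole.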

**Completing this unit should take you approximately 4 hours.**

### Unit 5: Estimation and Hypothesis Testing

Estimation is the process of making predictions based on the best available information. Businesses employ estimation in order to help managers make decisions regarding the future. For example, if the CFO estimates profits will be lower next year, the CEO will consider cost-cutting measures to make up for the loss. Normally, companies do not want to pursue aggressive cost-cutting because it usually comes in the form of layoffs, which are bad for employee morale.

To check whether the data support an estimate, companies use hypothesis testing. For example, assume the CFO thinks profits will be below 5% of revenue next year. The null hypothesis is that profits will be 5% or greater next year. The alternative hypothesis is that profits will be below 5% next year. This seems counter-intuitive, but in statistics a hypothesis cannot be proven true; it can only be rejected, or shown to be unlikely given the data. Through the hypothesis testing process, the CFO will either reject or fail to reject the null hypothesis. Hypothesis tests are always framed in this manner because, with imperfect information, nothing can be proven.

Note: The best non-business analogy to hypothesis testing comes from the courtroom. In the United States, a defendant is presumed innocent until proven guilty. The null hypothesis in this scenario is innocent or not guilty. The alternative hypothesis is guilty. In order to find the defendant guilty, the jury must be offered enough evidence that suggests the defendant is guilty beyond a reasonable doubt. If the members of the jury make that decision, then they reject the null hypothesis. If the jury members decide they do not have enough evidence to make that judgment, then they must find the defendant not guilty. Notice not guilty does not mean the jury claims the defendant is innocent. The decision simply means the members of the jury do not have enough information to find the person guilty, so they err on the side of caution and fail to reject the null hypothesis. As an aside, in this example, beyond a reasonable doubt is analogous to the level of significance, which you will learn is crucial to hypothesis testing.
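The reject / fail-to-reject decision in the CFO example can be sketched as a one-sided test of a mean. Everything numeric here is assumed for illustration: the twelve profit margins, the 5% significance level, and the use of a normal (z) approximation. With a sample this small, a t distribution would typically be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical profit margins (% of revenue) from 12 comparable periods.
margins = [4.1, 5.2, 3.8, 4.6, 4.9, 4.0, 5.1, 3.9, 4.4, 4.7, 4.2, 4.5]

null_margin = 5.0  # H0: mean margin >= 5%;  H1: mean margin < 5%
alpha = 0.05       # level of significance (assumed)

# Test statistic: how many standard errors the sample mean lies from 5%.
standard_error = stdev(margins) / sqrt(len(margins))
z = (mean(margins) - null_margin) / standard_error

# Lower-tail probability, because the alternative is "less than".
p_value = NormalDist().cdf(z)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
```

The level of significance plays the role of "beyond a reasonable doubt": the null hypothesis is rejected only when the p-value falls below `alpha`, and otherwise the conclusion is "fail to reject" rather than "accept."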

**Completing this unit should take you approximately 12 hours.**

### Unit 6: Correlation and Regression

If two variables move in the same direction, does that mean that one causes the other? How are we to analyze their correlation?

Regression is an analysis of the relationship of one variable to another. A regression might identify, for example, the relationship between car speed and the number of fatal accidents. In this example, speed and number of accidents are the two variables; the number of accidents is said to be the dependent variable, because it is assumed to depend on the speed. Speed is considered the independent variable. While regressions can be calculated manually, regressing a large dataset by hand could take a long time.

Regressions not only allow us to determine whether a relationship exists but also to identify how strong that relationship is. The regression coefficient estimates how much the dependent variable changes for each one-unit change in the independent variable, while the strength of the relationship is measured by the correlation coefficient (or, equivalently, the coefficient of determination). If the relationship is relatively weak, then speed may not be the major factor in fatal accidents. Perhaps the major factor is the time of day, whether it rained or not, or whether alcohol was involved. With multiple regression, a number of independent variables can be tested against the dependent variable at the same time. The estimated coefficients and their statistical significance then help indicate which variables have the strongest relationship with the dependent variable.

In business, you will frequently use regression to predict future events. Though not an exact science, regression can be used to make useful predictions if the relevant variables are identified. For example, first responders could use regression outputs to predict the number of fatal accidents in a given shift based on average travel speed, time of day, weather, and any other factors deemed significant. This unit will also stress the importance of determining the factors that most likely contribute to a dependent variable.
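A simple regression on the speed and accident example might be sketched as follows. The six data points and the helper name `fit_line` are hypothetical; the function fits a straight line by ordinary least squares and also returns the correlation coefficient r as a measure of how strong the linear relationship is.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; also returns r."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    slope = sxy / sxx                   # change in y per one-unit change in x
    intercept = y_bar - slope * x_bar   # fitted value of y when x is 0
    r = sxy / (sxx * syy) ** 0.5        # correlation coefficient
    return slope, intercept, r

# Hypothetical data: average speed (mph) and fatal accidents per month.
speed = [45, 50, 55, 60, 65, 70]
accidents = [2, 3, 3, 5, 6, 8]

slope, intercept, r = fit_line(speed, accidents)
predicted = intercept + slope * 62  # predicted accidents at 62 mph
print(f"slope={slope:.3f}, r={r:.3f}, prediction at 62 mph={predicted:.2f}")
```

Here speed is the independent variable and the accident count is the dependent variable. A value of r close to 1 suggests a strong positive linear relationship in this made-up data, though it does not by itself show that speed causes the accidents.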

Regression is often used in finance. Investors often want to know the relationship between a stock's performance and the overall performance of the market. By regressing a stock's periodic returns on the market's returns over the same periods, investors obtain the regression coefficient (the slope of the fitted line). This coefficient is known as the stock's beta and is covered extensively in BUS202: Principles of Finance.
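As a rough sketch of the beta calculation, the slope of a regression of stock returns on market returns equals their covariance divided by the variance of the market returns. The six monthly returns below are invented for illustration only.

```python
from statistics import mean

# Hypothetical monthly returns (as decimals) for the market and one stock.
market_returns = [0.01, -0.02, 0.03, 0.015, -0.01, 0.02]
stock_returns = [0.015, -0.03, 0.04, 0.02, -0.02, 0.035]

m_bar, s_bar = mean(market_returns), mean(stock_returns)

# Sums of cross-products and squares; the common divisor cancels in the ratio.
co_movement = sum((m - m_bar) * (s - s_bar)
                  for m, s in zip(market_returns, stock_returns))
market_spread = sum((m - m_bar) ** 2 for m in market_returns)

beta = co_movement / market_spread  # slope of stock returns on market returns
print(f"beta = {beta:.2f}")
```

A beta above 1 in this made-up data would indicate that the stock's returns tend to move more than the market's returns in the same direction.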

**Completing this unit should take you approximately 6 hours.**