Time: 53 hours
College Credit Recommended ($25 Proctor Fee)
Here, we will look at summary statistics, which give an overview of a data set, such as the average score on an exam. However, the average does not always tell the entire story: half of the students could have scored 100 on the exam and the other half 60, yet the average would be 80, a score no student actually earned. Using statistics, we can learn a lot more about how data is organized. To do that, we will use statistical tools to analyze data, draw conclusions, and make predictions about the future. The course will begin with data distributions, followed by probability analysis, sampling, hypothesis testing, inferential statistics, and regression.
Even if you have not taken a statistics course before, you are already familiar with the fundamentals of statistics from everyday life. For instance, you already know that most adult males have a shoe size close to the average, and that a few adult males on either side of the average have smaller or larger shoe sizes. In statistics, we call this phenomenon the "normal distribution".
This unit will introduce you to statistical analysis and how it relates to business. For example, you may be interested in learning about the average price of a 50-inch TV by gathering price data from 30 different stores. You would then take your 30 prices and compute the average price. Given that thousands of stores sell that particular product, the next question in statistics is: are you confident enough to say that your computed average reflects the real average you would get if you checked the price of that TV at every possible store?
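As a preview of that question, the short Python sketch below simulates a hypothetical market of 3,000 stores (all prices invented for illustration) and shows that two different 30-store surveys produce two different sample means, each of which only approximates the true average:

```python
import random
import statistics

random.seed(42)

# Hypothetical population: TV prices at 3,000 stores,
# roughly normal around $450 with a $40 standard deviation (invented numbers)
population = [round(random.gauss(450, 40), 2) for _ in range(3000)]

# One survey of 30 stores gives one sample mean...
sample = random.sample(population, 30)
print(round(statistics.mean(sample), 2))

# ...but a different set of 30 stores gives a somewhat different mean.
another = random.sample(population, 30)
print(round(statistics.mean(another), 2))

# Both are estimates of the true population mean, which we can only
# compute here because we simulated every store's price.
print(round(statistics.mean(population), 2))
```

How confident you can be that a 30-store average reflects the all-store average is exactly the kind of question the later units on sampling and inference address.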
You are probably familiar with the average of a data set. In this course, we will refer to what most people call the average as the "arithmetic mean". An average is actually any single value used to describe the middle of a data set. The most common averages used in statistics are the arithmetic mean, the median, and the mode, and each describes the middle of a data set in a different way. The mean is the sum of all values divided by the number of values. The median is the numeric value that separates the upper and lower halves of a data set. The mode is the most common value in the data set.
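These three averages can disagree noticeably when data is skewed. Here is a quick sketch using Python's built-in statistics module (the salary figures are invented for illustration):

```python
import statistics

# Six hypothetical salaries: one very large value skews the data
salaries = [40_000, 45_000, 45_000, 50_000, 55_000, 400_000]

# The mean is pulled upward by the single large salary
print(statistics.mean(salaries))    # ≈ 105833.33

# The median ignores the outlier's size and sits in the middle
print(statistics.median(salaries))  # 47500.0

# The mode is the most frequently occurring value
print(statistics.mode(salaries))    # 45000
```

Here the mean is more than double the median, which is why "the average salary" can be a misleading summary on its own.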
In many instances, the median and the mean are similar, but we will also look at many examples where they are not. The distinction between these kinds of summary statistics is important in business statistics. Understanding this vocabulary will be vital to your success in this course and in the business world.
Completing this unit should take you approximately 6 hours.
How likely is it that a certain event will occur? What are the chances that a given student will receive a grade of 60-69 on their exam? By studying distributions of data, you can determine the probability that a certain event will occur. For example, by looking at the distribution of grades in a class, you can identify the probability that a student will receive between a 60 and a 69. Probability is used in business to predict profits, to estimate the chances that regulation will affect a business model, and in many other ways.
Before you can focus on probability, you must first learn how to count. What's that? You already know how to count? Maybe – but in this unit, you will learn how to count the different ways that multiple events can occur together. These are called combinations and permutations, and they are a fundamental concept of probability.
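Python's math module can count these arrangements directly. A brief sketch (the race and committee scenarios are just illustrations of ordered versus unordered counting):

```python
from math import comb, perm, factorial

# Permutations count ordered arrangements: the number of ways to award
# gold, silver, and bronze among 10 runners (order matters)
print(perm(10, 3))   # 720, i.e. 10 * 9 * 8

# Combinations count unordered selections: the number of ways to pick
# a 3-person committee from 10 people (order does not matter)
print(comb(10, 3))   # 120

# The two are related: each group of 3 can be ordered in 3! = 6 ways
assert perm(10, 3) == comb(10, 3) * factorial(3)
```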
Completing this unit should take you approximately 12 hours.
A distribution is a line graph representation of the probability that an event will occur. It is similar to a histogram, but in a distribution the user does not choose the grouping; instead, data is grouped according to the likelihood that it will occur within the data set. Distributions also allow for the analysis of a specific event, whereas a histogram requires that events be grouped.
An important type of distribution is the "normal" distribution. The normal distribution is used to approximate real-world occurrences. If you can make certain assumptions about the occurrence of an event, then you can use the normal distribution to find out the probabilities of that event occurring. Many of the events that are important to business can be approximated using the normal distribution.
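For instance, if we assume exam scores in a hypothetical class are roughly normal with a mean of 75 and a standard deviation of 10 (both figures invented), Python's built-in NormalDist can answer the grade question posed earlier:

```python
from statistics import NormalDist

# Hypothetical class: exam scores roughly normal, mean 75, std dev 10
grades = NormalDist(mu=75, sigma=10)

# Probability that a student scores between 60 and 69:
# the area under the curve between those two values
p = grades.cdf(69) - grades.cdf(60)
print(round(p, 4))

# Sanity check: about 68% of scores fall within one
# standard deviation of the mean (between 65 and 85)
within_one_sd = grades.cdf(85) - grades.cdf(65)
print(round(within_one_sd, 4))
```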
Completing this unit should take you approximately 3 hours.
While you may not become a professional data gatherer, it is likely that you will need to compile data on a regular basis. When gathering data, you will not always have the luxury of collecting all available data. For example, economists cannot measure unemployment across the entire population, so they must take a random sample instead. Likewise, in a manufacturing facility, quality control managers do not have the resources to test every product that comes off the line; it is simply not feasible. Instead, they take samples at various points during the production process to test the quality of the products the firm produces.
There are a number of methods employed in sampling data. It is important that the sampling method fit the application. For example, marketing managers may wish to test a product on various groups of people. They may define these groups by age, race, geography, income, or any other factor. They then divide the population into these groups and take samples from each group in a process known as stratified sampling. If marketers do not properly divide the population, they may end up marketing to the wrong demographic and achieving poor sales.
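A minimal sketch of stratified sampling in Python (the age strata, group sizes, and 10% sampling rate are all arbitrary choices for illustration):

```python
import random

random.seed(7)

# Hypothetical population divided into age-group strata of different sizes
strata = {
    "18-29": [f"person_{i}" for i in range(500)],
    "30-49": [f"person_{i}" for i in range(500, 1200)],
    "50+":   [f"person_{i}" for i in range(1200, 1500)],
}

# Stratified sample: draw from every stratum in proportion to its size
# (a 10% rate here), so each group is guaranteed representation
sample = []
for group, members in strata.items():
    k = len(members) // 10
    sample.extend(random.sample(members, k))

print(len(sample))  # 150: 50 + 70 + 30 people across the three strata
```

By contrast, drawing 150 people at random from the whole population could, by chance, leave a small stratum badly under-represented.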
Completing this unit should take you approximately 3 hours.
Estimation is the process of making predictions based on the best available information. Businesses employ estimation in order to help managers make decisions regarding the future. For example, if the CFO estimates profits will be lower next year, the CEO will consider cost-cutting measures to make up for the loss. Normally, companies do not want to pursue aggressive cost-cutting because it usually comes in the form of layoffs, which are bad for employee morale.
In order to make accurate estimates, companies use hypothesis testing. For example, assume the CFO thinks profits will be below 5% of revenue next year. The null hypothesis is that profits will be 5% of revenue or greater next year; the alternative hypothesis is that profits will be less than 5% of revenue. This seems counter-intuitive, but statistics proposes that a hypothesis cannot be proven true; it can only be rejected, or shown to be not true. Through the hypothesis testing process, the CFO will either reject or fail to reject the null hypothesis. Hypothesis tests are always framed in this manner because, with imperfect information, nothing can be proven.
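The CFO's test can be sketched as a one-sided test in Python. This sketch assumes the population standard deviation is known, which reduces it to a z-test; all the margin figures and the standard deviation are invented for illustration:

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical monthly profit margins (% of revenue) for the past year
margins = [4.1, 4.8, 4.4, 5.2, 4.0, 4.6, 4.3, 4.9, 4.2, 4.5, 4.7, 4.4]

mu0 = 5.0      # null hypothesis: the true mean margin is 5% or greater
sigma = 0.5    # assumed known population std dev (simplifies to a z-test)

# Left-tailed z-test: a very negative z (tiny p-value) is strong
# evidence that margins are really below 5%
z = (mean(margins) - mu0) / (sigma / sqrt(len(margins)))
p_value = NormalDist().cdf(z)

alpha = 0.05   # level of significance: the "beyond a reasonable doubt" bar
print(round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these invented numbers the sample mean is well below 5%, so the test rejects the null hypothesis; note that the other outcome is "fail to reject", never "the null hypothesis is proven true".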
The best non-business analogy to hypothesis testing comes from the courtroom. In the United States, a defendant is presumed innocent until proven guilty. The null hypothesis in this scenario is innocent, or not guilty. The alternative hypothesis is guilty. In order to find the defendant guilty, the jury must be offered enough evidence to suggest the defendant is guilty beyond a reasonable doubt. If the members of the jury make that decision, then they reject the null hypothesis. If the jury members decide they do not have enough evidence to make that judgment, then they must find the defendant not guilty. Notice that "not guilty" does not mean the jury claims the defendant is innocent. The decision simply means the members of the jury do not have enough information to find the person guilty, so they err on the side of caution and fail to reject the null hypothesis. As an aside, in this example, "beyond a reasonable doubt" is analogous to the level of significance, which you will learn is crucial to hypothesis testing.
Completing this unit should take you approximately 11 hours.
If two variables move in the same direction, does that mean that one causes the other? How are we to analyze their correlation?
Regression is an analysis of the relationship of one variable to another. A regression might identify, for example, the relationship between car speed and the number of fatal accidents. In this example, speed and the number of accidents are the two variables; the number of accidents is said to be the dependent variable, because the number of accidents depends on the speed, while speed is the independent variable. While regressions can be calculated manually, a data set large enough to yield meaningful results could take a long time to regress by hand.
Regressions not only allow us to determine whether a relationship exists but also to identify how strong that relationship is. The strength of the relationship is measured by the correlation coefficient, while the regression coefficient describes how much the dependent variable changes for each unit change in the independent variable. If the correlation is relatively low, then speed may not be the major factor in fatal accidents. Perhaps the major factor is the time of day, whether it rained or not, or whether alcohol was involved. With multiple regression, a number of independent variables can be tested against the dependent variable at the same time, and the results indicate which variables have the strongest relationship with the dependent variable. In business, you will frequently use regression to predict future events. Though not an exact science, regression can produce reliable predictions if enough relevant variables are identified. For example, first responders could use regression outputs to predict the number of fatal accidents in a given shift based on average travel speed, time of day, weather, and any other factors deemed significant. This unit will also stress the importance of determining the factors that most likely contribute to a dependent variable.
Regression is often used in finance. Investors often want to know the relationship between a stock's performance and the overall performance of the market. By regressing the periodic returns of a stock against the returns of the market, investors can estimate the regression coefficient. This coefficient is known as the stock's beta and is covered extensively in BUS202: Principles of Finance.
Completing this unit should take you approximately 5 hours.