BUS204 Study Guide

Unit 4: Sampling and Sampling Distributions

4a. Differentiate the population from a sample

What is the difference between the population and a sample? Why are samples necessary?
If you're a fan of a sport, you're used to reading about statistics from a particular player or team. Suppose you're looking at a baseball player's batting average for a season. Would this be considered a parameter or a statistic? Should a football game's halftime statistics really be called "halftime parameters"? Why or why not?

The population is the group being studied, and the sample is a (hopefully) representative portion of that population. Rarely will we have access to an entire population, so we must gather descriptive statistics (mean, standard deviation, etc.) from the sample and use them to infer the properties of the population. This last part is called inferential statistics.

A numerical summary of a population (its mean, standard deviation, etc.) is called a parameter; the same kind of summary computed from a sample is called a statistic.
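
The distinction can be illustrated with a minimal sketch. The population of exam scores below is made up for illustration, and the seed and sample size of 50 are arbitrary assumptions.

```python
import random
import statistics

# Hypothetical population: exam scores for 1,000 students (made-up data).
rng = random.Random(42)
population = [rng.gauss(70, 10) for _ in range(1000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample drawn from that population.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two numbers will usually be close but not identical, which is why a statistic is treated as an estimate of the parameter rather than as the parameter itself.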

To review, see The Central Limit Theorem for Sample Means.

4b. Define and apply simple random sampling

What is simple random sampling, and how does it differ from other methods?
Are there situations where simple random sampling may not be the best sampling method?

Simple random sampling from a population means that each member of the population has an equal chance of being selected. More precisely, every possible sample of a given size is equally likely to be chosen.

The probability of $x$ taking on a certain value can be found using equally likely outcomes. If there are 3 red marbles, 2 white, and 5 blue, the probability of selecting a red marble at random is 3 out of 10.
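
The marble example can be checked with a short sketch. The jar contents come from the text above; the sample size of 4 and the seed are arbitrary assumptions added for illustration.

```python
import random
from fractions import Fraction

# The jar from the text: 3 red, 2 white, and 5 blue marbles.
jar = ["red"] * 3 + ["white"] * 2 + ["blue"] * 5

# With equally likely outcomes, P(red) = (number of red) / (total marbles).
p_red = Fraction(jar.count("red"), len(jar))
print(p_red)  # 3/10

# A simple random sample of 4 marbles, drawn without replacement.
# The seed is an arbitrary choice so the draw can be repeated.
rng = random.Random(0)
draw = rng.sample(jar, 4)
print(draw)
```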

To review, see Definitions of Statistics, Probability, and Key Terms.

4c. Determine different types of selection bias and sampling errors, and explain how to avoid these errors in survey sampling, such as selection and estimation errors

Why is it important to be careful to make sure that your sample is representative of the population?
What types of errors or biases must a researcher be aware of when selecting his/her sample and running the survey?

There are several types of response bias, along with related pitfalls in drawing conclusions, that you have to be careful about when designing surveys and obtaining samples:

Ordering of Answers: The order in which answer choices are presented can change the results.
- For example: "Would you say traffic contributes more or less to air pollution than industry?"
  - Results: Traffic, 45%; Industry, 27%
- When the order of the two choices is reversed, we get different results:
  - Results: Industry, 57%; Traffic, 24%

Misleading Conclusions: Concluding that one variable causes another when, in fact, the variables are only correlated. Two variables that may seem linked are smoking and heart rate, but an observed association alone does not let us conclude that one causes the other. Correlation does not imply causality.

Small Samples: Conclusions should not be based on samples that are far too small. Example: basing a school suspension rate on a sample of only three students.

Loaded Questions: If survey questions are not worded carefully, the results of a study can be misleading. Compare the responses to two wordings of the same question:
- 97% yes: "Should the President have the line item veto to eliminate waste?"
- 57% yes: "Should the President have the line item veto, or not?"

Leading Questions: The wording of the question or its response options may unduly favor one response over another. For example, a satisfaction survey may ask the respondent to indicate whether she is satisfied, dissatisfied, or very dissatisfied. By giving the respondent one response option to express satisfaction and two response options to express dissatisfaction, this survey question is biased toward getting a dissatisfied response.

Social Desirability: Most people like to present themselves in a favorable light, so they will be reluctant to admit to unsavory attitudes or illegal activities in a survey, particularly if the survey results are not confidential. Instead, their responses may be biased toward what they believe is socially desirable.

If you would like additional help, try watching this external video: Techniques for Random Sampling and Avoiding Bias.

4d. Describe and identify the different sampling methods, including systematic, stratified random, cluster, convenience, panel, and quota sampling, and identify an example of each

Why would you need to use a method other than simple random sampling when gathering your sample?
Explain the difference between stratified and cluster sampling. When would you use each?
Why is it important to choose a good sampling method for your situation? What’s wrong with informal methods like Facebook polls?

Simple random sampling is, as the name suggests, the simplest way of drawing a sample from a population. Each member of the population has an equal chance of being selected. It is often referred to as the "picking names out of a hat" method: select a piece of paper randomly from a hat, or use a random number generator. There are cases, however, when a simple random sample can, purely by chance, under-represent or miss small groups within the population, which is one reason the other methods below exist.

Stratified sampling is used when your population is not homogeneous, and you want to make sure that various groups in the population are proportionally represented in the sample. Suppose you are conducting a survey at your college, and you know that males and females will have very different opinions, yet you want your sample to be representative of the entire population. You divide the population into these groups, or strata, and take a simple random sample from each group. If the student population is 60% female and 40% male, and you want to sample 50 students, you should randomly select 30 women (60% of 50) and 20 men (40% of 50).
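
The 30-and-20 split can be reproduced with a minimal sketch of proportional allocation. The roster of 1,000 students, the ID labels, and the seed are hypothetical; only the 60/40 split and the sample size of 50 come from the example above.

```python
import random

rng = random.Random(1)

# Hypothetical roster of 1,000 students: 60% female, 40% male.
strata = {
    "female": [f"F{i}" for i in range(600)],
    "male": [f"M{i}" for i in range(400)],
}
population_size = sum(len(members) for members in strata.values())
sample_size = 50

stratified_sample = {}
for name, members in strata.items():
    # Proportional allocation: each stratum's share of the sample
    # matches its share of the population.
    n_stratum = round(sample_size * len(members) / population_size)
    # Simple random sample within the stratum.
    stratified_sample[name] = rng.sample(members, n_stratum)

print({name: len(chosen) for name, chosen in stratified_sample.items()})
# {'female': 30, 'male': 20}
```

With less tidy proportions, rounding each stratum separately may not add up to exactly the intended sample size, so a real survey would need a rule for handling the remainder.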

Cluster sampling is similar to stratified sampling, except it is used when the population is already divided into several subgroups (clusters), each of which is heterogeneous and reasonably representative of the population as a whole. Instead of sampling individuals from the entire population, you take a simple random sample of whole clusters and survey everyone in the clusters you selected. Example: A large apartment complex has 1,000 residents living in 10 buildings of 100 people each. If you want to select a sample of 200 residents, and you want to minimize the walking up and down stairs, you randomly select 2 of the 10 buildings and sample every person in them.
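
The apartment example can be sketched as follows. The building and resident labels and the seed are hypothetical; the counts (10 buildings, 100 residents each, 2 buildings chosen) come from the example above.

```python
import random

rng = random.Random(2)

# Hypothetical complex: 10 buildings with 100 residents each.
buildings = {
    b: [f"building{b}-resident{r}" for r in range(1, 101)]
    for b in range(1, 11)
}

# Randomly choose 2 whole buildings (the clusters) ...
chosen_buildings = rng.sample(sorted(buildings), 2)

# ... and survey every resident in the chosen buildings.
cluster_sample = [
    resident for b in chosen_buildings for resident in buildings[b]
]

print(chosen_buildings, len(cluster_sample))
```

Note the contrast with stratified sampling: here the randomness is in which groups are chosen, and everyone inside a chosen group is surveyed.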

Systematic sampling selects every $k$th item from an ordered list or stream. This is useful when you know what sample size you want but can only approximate the population size. A classic example is doing a quality inspection on every 20th item to come off an assembly line. This is also sometimes called the "shopping mall" method because of people standing in shopping centers selecting every 10th customer who walks in.
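
A minimal sketch of the assembly-line example follows. The helper `systematic_sample`, the run of 200 items, and the seed are hypothetical. Starting at a random position among the first $k$ items is a common convention and is an assumption here; the text above only specifies taking every $k$th item.

```python
import random

def systematic_sample(items, k, rng):
    """Return every k-th item, starting at a random position among the first k."""
    start = rng.randrange(k)  # random index from 0 to k - 1
    return items[start::k]

rng = random.Random(3)

# Hypothetical run of 200 items coming off an assembly line, numbered 1 to 200.
assembly_line = list(range(1, 201))

# Inspect every 20th item.
inspected = systematic_sample(assembly_line, 20, rng)
print(inspected)
```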

Convenience sampling is not really a valid method but is mentioned here for illustration. Convenience sampling means choosing whoever is easiest to reach, as in "whatever, I'll poll my Facebook friends." Any time there is no diversity in the sample or when the sample is self-selected, it is prone to bias. An example would be if you took a poll to see who should be the next President of the United States, and you sampled people overwhelmingly from one gender, age group, profession, or state, or asked only members of a political club you belong to.

If you would like additional help, try watching this external video: Techniques for Random Sampling and Avoiding Bias.

4e. Use a point estimator from a sample to estimate the entire population

What is a point estimator for a population parameter, and which statistic is often used?
What is the purpose of a point estimator?

A point estimator is a sample statistic used as a single best guess for a population parameter. The sample mean, for example, is the usual point estimator of the population mean. The point estimate is the starting point when estimating a parameter from sample data: a confidence interval for the population mean, for instance, is the point estimate plus or minus a margin of error. Unit 5 covers how to calculate these margins of error for means and other parameters. The point estimate is also used to compute the test statistic behind a p-value, which gives you a clue on whether or not to reject a particular hypothesis about the population.
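
As a minimal sketch, the point estimates of a population mean and standard deviation can be computed directly from a sample. The commute-time data below are made up for illustration.

```python
import statistics

# Hypothetical sample: commute times in minutes for 8 surveyed employees.
sample = [22, 35, 18, 41, 27, 30, 25, 34]

# The sample mean is the point estimate of the population mean.
x_bar = statistics.mean(sample)

# The sample standard deviation (n - 1 in the denominator) is the
# point estimate of the population standard deviation.
s = statistics.stdev(sample)

print(f"point estimate of the mean: {x_bar}")
print(f"point estimate of the standard deviation: {s:.2f}")
```

Each estimate is a single number. The margin of error that turns it into a confidence interval is a separate calculation and is not shown here.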

If you would like additional help, try watching this external video: Confidence Intervals and Margin of Error.

Unit 4 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included.

cluster sampling
descriptive statistics
equally likely outcomes
inferential statistics
parameter
point estimator
population
response bias
sample
simple random sampling
statistic
stratified sampling
systematic sampling