BUS607 Study Guide
Unit 5: Deriving Data Insights
5a. Describe the mean, median, and mode of a set of data
- What are the differences between the mean, median, and mode of a data set?
- How do you differentiate between nominal, ordinal, interval, and ratio scales?
The 'center' of a data set describes where its values are typically located. The mean and median are the two most widely used measures of center. The mean, also called the average, is the sum of the values divided by the number of values. The median is the middle value of the ordered data set, splitting it into two equal halves; when the number of values is even, it is the average of the two middle values. The median is used most often when there are extreme values or outliers in the data, since it depends on the order of the values rather than on their exact magnitudes. Another measure of the center is the mode, which is the most frequent value. A data set can have more than one mode when two or more values share the highest frequency.
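The three measures can be compared on a small example. The sketch below uses Python's standard `statistics` module; the data values are made up for illustration and include one deliberate outlier to show how the mean and median react differently.

```python
import statistics

# Hypothetical data: seven observations, one of which (95) is an outlier.
values = [4, 7, 7, 8, 9, 10, 95]

mean = statistics.mean(values)        # sum of the values / number of values
median = statistics.median(values)    # middle value of the ordered data
modes = statistics.multimode(values)  # every value tied for highest frequency

print("mean:", mean)      # pulled upward by the outlier
print("median:", median)  # unaffected by how large the outlier is
print("mode(s):", modes)

# A data set with two modes: 1 and 2 both occur twice.
bimodal = statistics.multimode([1, 1, 2, 2, 3])
print("bimodal example:", bimodal)
```

Here the outlier raises the mean to 20 while the median stays at 8, which is why the median is often the better summary for skewed data.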
Frequency is the number of times an event or a value occurs in a dataset. A frequency table lists each item and the number of times the item appears. The level of measurement describes how the values in a data set are measured, and it varies with the type of data being analyzed. The four levels of measurement include:
- Nominal scale, which is qualitative and consists of categories with no inherent order, such as colors, names, or labels
- Ordinal scale, which is similar to the nominal scale but whose categories can be ordered, like a ranking of the top restaurants in a city or the best beaches in a country
- Interval scale, which measures data that has a definite order and measurable differences between values but no true zero point, like temperature in degrees Celsius or Fahrenheit
- Ratio scale, which is the most informative since it has a true 0 starting point, so values can be ordered, subtracted, and compared as ratios
To review, see Frequency, Frequency Tables, and Levels of Measurement.
5b. Analyze data presented in frequency tables, frequency distributions, and graphics
- How do you differentiate between frequency, relative frequency, and cumulative relative frequency?
- How are frequency distributions utilized to analyze data?
- What are the criteria for selecting the best graphics to display data to a particular audience?
When organizing data, it is important to know how many times a value appears. Typical questions include how many hours students study or what percentage of families have multiple pets. Frequency (also called absolute frequency), relative frequency, and cumulative relative frequency are measures that answer questions like these.
The absolute frequency is the number of times a value occurs in the data. The relative frequency is the ratio of the number of times a value occurs to the total number of values. The cumulative relative frequency is the running sum of the relative frequencies up to and including the current value; its last entry equals 1, or 100%.
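The three measures can be computed side by side in a frequency table. The following is a minimal sketch using only the Python standard library; the survey data (hours studied per night by ten students) is hypothetical.

```python
from collections import Counter

# Hypothetical survey: hours studied per night by 10 students.
hours = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]

counts = Counter(hours)  # absolute frequency of each value
total = len(hours)       # total number of values

table = []
cumulative_count = 0
for value in sorted(counts):
    frequency = counts[value]
    relative = frequency / total                      # share of all values
    cumulative_count += frequency
    cumulative_relative = cumulative_count / total    # running share so far
    table.append((value, frequency, relative, cumulative_relative))

print("value  freq  rel.freq  cum.rel.freq")
for value, frequency, relative, cumulative_relative in table:
    print(f"{value:>5}  {frequency:>4}  {relative:>8.2f}  {cumulative_relative:>12.2f}")
```

The cumulative column is built from the running count rather than by adding rounded relative frequencies, which avoids small floating-point drift in the final entry.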
When displaying data to an audience, it's important to make the right choice to help them quickly understand the point being made. Some simple charts that can be used include:
- Line charts for comparing trends, multiple datasets over time, or correlations
- Area charts for comparing change over time from two or more variables
- Column charts for showing frequency distribution and comparing datasets
- Bar charts for ranking datasets or comparing datasets
- Pie charts for comparing datasets as percentages of a whole
To review, see Frequency, Frequency Tables, and Levels of Measurement and Presenting Data.
5c. Analyze relative frequencies and the relationship with frequency tables
- How are relative frequencies different from absolute frequencies?
- What are some of the ways a frequency distribution can be displayed?
- What are the features of a histogram versus the features of a bar chart?
Frequency distributions are displays that organize and present frequency counts so that the information can be interpreted more easily. They can be shown as absolute frequencies or as relative frequencies, such as proportions or percentages. A frequency distribution can be shown in a table or graph. Common methods of showing frequency distributions include frequency tables, histograms, and bar charts.
A histogram displays the distribution of all observations in a quantitative dataset. It can be used for describing the shape, center, and spread to better understand the data distribution. The height of each column shows the frequency for a specific range of values. The columns are usually of equal width. The ranges must be mutually exclusive, so each observation falls into exactly one column, and the columns are drawn with no spaces between them because the ranges are continuous. The x-axis label should state unambiguously what is being measured and where each range begins and ends.
The columns in a bar chart represent categorical variables or discrete ungrouped numeric variables. A bar chart is primarily used to compare the frequency (count) of one category or characteristic against another. The length of each bar, whether drawn vertically or horizontally, shows the frequency for that category or characteristic. The shape of the distribution is not important since each bar represents an individual category or characteristic. Therefore, gaps are included between the bars, and the bars can be arranged in any order without changing the meaning of the data.
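The grouping behind a histogram can be sketched in a few lines of Python. The helper `histogram_counts`, the ages, and the choice of bins starting at 20 with a width of 10 are all hypothetical, chosen only to illustrate equal-width, non-overlapping ranges.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width ranges.

    Bin i covers start + i*width up to, but not including,
    start + (i+1)*width, so no value can land in two bins.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the covered range
            counts[index] += 1
    return counts


# Hypothetical data: ages of 10 customers.
ages = [21, 23, 25, 29, 30, 34, 35, 38, 41, 47]
counts = histogram_counts(ages, start=20, width=10, n_bins=3)

# Text rendering: one row per range, bar length equals the frequency.
for i, count in enumerate(counts):
    low = 20 + i * 10
    print(f"{low}-{low + 9}: {'#' * count} ({count})")
```

Because an age of 30 belongs only to the 30-39 range, the ranges stay mutually exclusive while still covering the axis without gaps.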
To review, see The Statistical Language of Frequency Distribution and Frequency Tables.
5d. Interpret cumulative frequency distribution and explain its use in decision-making
- What are the differences between cumulative relative frequency and relative or absolute frequencies?
- What should the last entry in a cumulative distribution be equal to?
Cumulative relative frequency is the accumulation of the relative frequencies up to a given row. To find the cumulative relative frequency for a row, add all the previous relative frequencies to the relative frequency for the current row. The last entry should equal 1, or 100%, which lets the analyst confirm that all entries have been accounted for.
By contrast, the absolute frequency is the number of times a single value occurs in the data, and the relative frequency is the ratio of that count to the total number of values. Both describe one value at a time, while the cumulative relative frequency describes the share of observations at or below a value.
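As an illustration of how a cumulative distribution can support a decision, the sketch below answers two questions from a hypothetical table of delivery times: what share of orders arrive within a given number of days, and how many days are needed to cover a target share. The data and the helper `days_to_cover` are assumptions made for this example.

```python
from itertools import accumulate

# Hypothetical frequency table: delivery time in days -> number of orders.
frequencies = {1: 5, 2: 15, 3: 20, 4: 8, 5: 2}
total = sum(frequencies.values())

days = sorted(frequencies)
cumulative_counts = list(accumulate(frequencies[d] for d in days))
cumulative_relative = [c / total for c in cumulative_counts]


def days_to_cover(target):
    """Return the smallest delivery time whose cumulative share reaches target."""
    for day, share in zip(days, cumulative_relative):
        if share >= target:
            return day
    return None  # target above 1 can never be reached


for day, share in zip(days, cumulative_relative):
    print(f"within {day} day(s): {share:.0%} of orders")

print("days needed to cover 90% of orders:", days_to_cover(0.9))
```

The final cumulative entry equals 1, confirming that every order in the table has been counted.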
To review, see Frequency, Frequency Tables, and Levels of Measurement and The Statistical Language of Frequency Distribution.
Unit 5 Vocabulary
This vocabulary list includes the terms that you will need to know to successfully complete the final exam.
- absolute frequency
- area chart
- bar chart
- column chart
- cumulative relative frequency
- frequency
- frequency distributions
- frequency tables
- histogram
- interval scale
- level of measurement
- line chart
- mean
- median
- mode
- nominal scale
- ordinal scale
- pie chart
- ratio scale
- relative frequency