Shapes of Distributions

Read this section. Note that the instructions given in the text are for version 2.x of OpenOffice. If you have 3.x, some steps are slightly different; you may need to consult the help documents.

Distributions are described by the names of their shapes, such as uniform, symmetric, and skewed.

One of the aspects of a sample that is often similar to the population is the shape of the distribution. If a good random sample of sufficient size has a symmetric distribution, then the population is likely to have a symmetric distribution. The process of projecting results from a sample to a population is called generalizing. Thus we can say that the shape of a sample distribution generalizes to a population.

uniform   peaked symmetric   skewed
1         1                  1
2         5                  5
3         7                  8
4         9                  9
5         10                 11
6         11                 12
7         12                 13
8         12                 14
9         13                 15
10        13                 16
11        14                 17
12        14                 18
13        14                 19
14        14                 20
15        15                 20
16        15                 21
17        15                 22
18        15                 23
19        16                 24
20        16                 23
21        17                 24
22        17                 25
23        18                 26
24        19                 27
25        20                 25
26        22                 26
27        24                 27
28        28                 28

Box plots and frequency histograms are two different views of the distribution of the same data, so the shape seen in a frequency histogram has a counterpart in the associated box plot. The following charts show the frequency histograms and box plots for three distributions: a uniform distribution, a peaked symmetric (heap) distribution, and a left-skewed distribution.
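As a text-only stand-in for the frequency histograms, the sketch below counts the three data columns from the table above into classes and prints one row of marks per class. The class width of 4 (upper limits 4, 8, ..., 28) is an assumption made for illustration, as are the function names; a value equal to a class upper limit is counted in that class.

```python
from bisect import bisect_left

# The three data columns from the table in this section.
uniform = list(range(1, 29))
peaked = [1, 5, 7, 9, 10, 11, 12, 12, 13, 13, 14, 14, 14, 14,
          15, 15, 15, 15, 16, 16, 17, 17, 18, 19, 20, 22, 24, 28]
skewed = [1, 5, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
          20, 21, 22, 23, 24, 23, 24, 25, 26, 27, 25, 26, 27, 28]

# Assumed classes: width 4, with class upper limits 4, 8, ..., 28.
UPPER_LIMITS = list(range(4, 29, 4))


def frequencies(data, upper_limits):
    """Count the values in each class.

    A value equal to an upper limit is counted in that class.
    Values above the last upper limit are not handled in this sketch.
    """
    counts = [0] * len(upper_limits)
    for value in data:
        counts[bisect_left(upper_limits, value)] += 1
    return counts


def text_histogram(name, data):
    """Return a small text histogram: one line of '#' marks per class."""
    lines = [name]
    for limit, count in zip(UPPER_LIMITS, frequencies(data, UPPER_LIMITS)):
        lines.append(f"{limit:>3} | {'#' * count}")
    return "\n".join(lines)


for name, data in [("uniform", uniform),
                   ("peaked symmetric", peaked),
                   ("skewed", skewed)]:
    print(text_histogram(name, data))
    print()
```

With these assumed classes the uniform rows are all the same length, the peaked symmetric rows rise to a single tall middle class and fall away evenly, and the skewed rows grow steadily toward the maximum.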

The uniform data is evenly distributed across the range. The whiskers run from the minimum to the maximum value, and the interquartile range is the largest of the three distributions.

The peaked symmetric data has the smallest interquartile range because the bulk of the data is close to the middle of the distribution. In the box plot this can be seen in the small interquartile range centered on the median. The peaked symmetric data also has two potential outliers, one at the minimum value and one at the maximum value.

The skewed data has the bulk of the data near the maximum. In the box plot this can be seen by the interquartile range - the box - being "pushed" up towards the maximum value. The whiskers are also of unequal length, another sign of a skewed distribution.
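The box plot comparisons above can be checked numerically. The following is a minimal sketch, not the text's own procedure: it assumes the "inclusive" quartile method of Python's statistics module and the common 1.5 × IQR rule for flagging potential outliers, neither of which is stated in this section. The function name is hypothetical.

```python
from statistics import quantiles

# The three data columns from the table in this section.
uniform = list(range(1, 29))
peaked = [1, 5, 7, 9, 10, 11, 12, 12, 13, 13, 14, 14, 14, 14,
          15, 15, 15, 15, 16, 16, 17, 17, 18, 19, 20, 22, 24, 28]
skewed = [1, 5, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
          20, 21, 22, 23, 24, 23, 24, 25, 26, 27, 25, 26, 27, 28]


def box_plot_summary(data):
    """Return the five-number summary, the IQR, and potential outliers."""
    # Assumption: "inclusive" quartiles; other quartile methods give
    # slightly different values.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: values beyond 1.5 * IQR from the box are potential outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "min": min(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(data),
        "iqr": iqr,
        "potential_outliers": sorted(
            x for x in data if x < low_fence or x > high_fence
        ),
    }


for name, data in [("uniform", uniform),
                   ("peaked symmetric", peaked),
                   ("skewed", skewed)]:
    print(name, box_plot_summary(data))
```

Under these assumptions the uniform data has the largest interquartile range, the peaked symmetric data has the smallest and flags only its minimum and maximum as potential outliers, and the skewed data has a third quartile much closer to the maximum than the first quartile is to the minimum.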

Creating histograms with spreadsheets

Making histograms with LibreOffice Calc

Select both the column with the class upper limits and the column with the frequencies.

Click on the chart wizard button. If not selected by default, choose a column chart. Click on Next.

At the second step, the data range step, select "First column as label" as seen in the next image.

Click on Next. At step three there is usually nothing that needs to be done if one has correctly selected their columns prior to starting the chart wizard.

Click on Next. On the next screen fill in the appropriate titles. The legend can be "unchecked" as seen in the next image.

When done, click on Finish. Double click any column in the chart to open up the data series dialog box.

Click on the options tab and set the Spacing to zero percent as seen in the previous image. In the data series dialog box one can alter the background color, add column borders, or make other customizations. Click on OK.
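The chart wizard steps above start from two columns that already exist: class upper limits and frequencies. As a rough sketch of how such a table might be prepared outside the spreadsheet, the code below splits the range of the data into equal-width classes and writes the two columns as CSV text. The choice of five classes, the equal-width rule, and the function names are assumptions for illustration only.

```python
import csv
import io
from bisect import bisect_left

# The peaked symmetric column from the table earlier in this section.
peaked = [1, 5, 7, 9, 10, 11, 12, 12, 13, 13, 14, 14, 14, 14,
          15, 15, 15, 15, 16, 16, 17, 17, 18, 19, 20, 22, 24, 28]


def class_table(data, classes=5):
    """Return (class upper limit, frequency) rows for equal-width classes."""
    low, high = min(data), max(data)
    width = (high - low) / classes
    # Use the maximum itself as the last upper limit to avoid rounding drift.
    upper_limits = [low + width * i for i in range(1, classes)] + [high]
    counts = [0] * classes
    for value in data:
        # A value equal to an upper limit is counted in that class.
        counts[bisect_left(upper_limits, value)] += 1
    return list(zip(upper_limits, counts))


def to_csv(rows):
    """Format the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["class upper limit", "frequency"])
    for limit, count in rows:
        writer.writerow([f"{limit:.1f}", count])
    return buffer.getvalue()


print(to_csv(class_table(peaked)))
```

The resulting text can be saved to a file and opened in a spreadsheet, after which the two columns can be selected as described in the steps above.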

Gnumeric histogram notes

Select both the class upper limits and the frequencies. Choose the chart wizard. At the first step of the chart wizard select the Column chart option. Click on "Use first series as shared abscissa". The first series is the first column, the class upper limits. The abscissa is another word for x-axis.

At step two of two, select PlotBarCol1 and set the Gap to zero.

In step two a title can be added to Chart1 by clicking on Chart1 and then clicking on the Add button. The drop down menu includes the item "Title to Chart1" in alphabetic order on the list. To add a label to Y-Axis1, click on Add and then choose "Label to Y-Axis1". When one has made all desired modifications, click on Insert and then drag to size the chart.

As an anecdote, dragging to choose the size of the chart is the way Microsoft Excel 95 operated. While this may seem retro, this is an instructor's blessing. No two students are going to execute the exact same drag, hence no two homework assignments should have exactly the same size chart in exactly the same location.

Making histograms with Microsoft Excel 97/2000/XP

Select ONLY the column with the frequencies. Click on the chart wizard.

Click on Next.

In step 2 of 4, click on the Series tab.

Click in the Category (X) axis labels text box.

Select the class upper limits by dragging with the mouse. Click on Next when done.

Fill in the appropriate titles and then click on Finish.

Double click any column to open up the Format Data Series dialog box.

Click on the Options tab and set the gap width to zero.

Click on OK.

Making histograms with Microsoft Excel 2007

Excel 2007 and 2010 are vastly different from earlier versions of Excel. The differences are beyond cosmetic and involve a fundamental shift in the philosophy, the gestalt if you will, of the interface. Excel 2010 made cosmetic improvements to ribbon background colors in an attempt to improve usability.

Note these examples use different data than the examples above. The original data derives from speed of sound measurements made by the physical science class.

Fundamentally, the program violates the old user-interface precept of reducing the number of modes. A modal interface shows and hides menus according to a mode setting. Office 2007 turns this precept on its head and is all about modes. The program opens in the "Home" mode, a basic editing mode. The main menus are replaced by a structure called "the ribbon" seen in the image below.

[Image: Home]

In the home mode the chart wizard is hidden from view. Click on the Insert tab on the ribbon.

[Image: Insert]

The charts section of the ribbon is horizontally compressed in the image above. The charts section usually appears as follows.

[Image: Charts]

Select the data to be charted in the histogram, and then click on the column button.

[Image: Select data and then column button]

Select the chart subtype.

[Image: Chart subtype selection]

The chart appears.

Right click on the chart to pop up the chart context menu. Choose "Select Data".

[Image: Context menu]

Remove the class upper limits (CUL) item from the Legend Entries (Series) column.

Click on "Edit" in the Horizontal (Category) Axis Labels column.

After clicking "Edit" the screen highlights the existing frequency column.

Select the class upper limits (classes). Click OK.

Click OK again. To set the gap width (spacing) to zero, right click on the series and choose Format Data Series.

Set the gap width to zero.

[Image: Gap width setting]

The result is a tad cartoonish - borderless columns - but that is a default style for Excel 2007.

[Image: Borderless columns]

One can delete the legend, but x and y axis labels are usually necessary. Adding these is possibly the most non-obvious step for an OpenOffice.org or Excel 97/2000 user.

Note at the top of the Excel screen that there is a tab marked "Design". The two words to its right are also tabs, camouflaged so that they do not look like tabs. Click on the camouflaged Layout tab.

Now select Axis Titles, then Primary Horizontal Axis Title, then Title Below Axis from the sub-sub-menu. This adds an x-axis label which one can then edit.

To obtain a y-axis label, select Axis Titles, then Primary Vertical Axis Title, then Rotated Title. This adds a y-axis title which one can then edit.


Source: Dana Lee Ling; http://www.comfsm.fm/~dleeling/statistics/text5.html
This work is licensed under a Creative Commons Attribution 4.0 License.

Last modified: Wednesday, April 3, 2019, 3:59 PM