MathCS.org - Statistics

back | next

3.6. Frequency Charts for Categorical Variables

Often one would like to know the frequency of occurrence of values for a variable in percent. This is similar to a frequency histogram we studies earlier, but a histogram only applies to numerical variables, while the procedure outlines in this section will apply to categorical variables. Unfortunately the procedure is somewhat lengthy, but with a little bit of practice it should not be too bad.

Example: A survey was conducted in the summer of 2004, asking several students in a statistics course a number of questions about their background and musical taste. The data can be found by clicking on the link below. Display a bar chart for the race of the students. In other words, compute how many of the students are white, black, hispanic, etc. and display those figures in a bar chart.

Here is the spreadsheet that contains the results of this survey:

Student Survey
Loading this data into Excel, we see that there is one column of interest, entitled "Race". However, that column represents a categorical variable (ordinal or nominal?) so we can not compute a frequency histogram. Also, the values are not numeric so we can't ask Excel to automatically add up all the "hispanics" (for example).
Spreadsheet Data
But since that column does contain the data we want to display, we need to learn a new procedure for handling categorical data. The procedure should automatically count the frequencies of the various races and present those counts in a bar chart.

Before we figure out how Excel can do this automatically let's simply do it by hand. Inspecting the data we see that there are 5 categories, White, Black, Hispanic, Pacific Islander, and Other. We type these categories into an empty part of the Excel spreadsheet and manually count how many people in each category are contained in our data. We add these counts, or frequencies, next to each category manually:

Spreadsheet Data
Now it should be easy to create the appropriate bar chart - make sure to do it, it works just as described in the previous section on creating simple bar charts.

Our manual procedure barely worked because we did not have that much data. For large data sets we need to figure out an automatic procedure to create a table of frequencies and the associated bar chart. Fortunately, Excel has just such a procedure, called a Pivot Table. The Pivot tool is found as the first button of the "Insert" ribbon. It looks slightly different depending on the version of Excel but the differences are pretty minor and you should be able to figure it out. For detailed assistance - if you are using Excel 2007 - you could try this helpful video.

The Pivot tool is actually a lot more flexible than we will need in our course, but it will for sure create the type of tables that we will be interested in. We will, in fact, see that tool again subsequent sections.

You will see a "potential frequency/percentage table" containing labels such as "Drop Row Field Here", "Drop Column Fields Here", etc., but no data values are yet contained in the table. There will also be a window called "Field List" containing the available variables from your data, in our case "ID", "Sex", "Weight", "Height", "Race", etc. You can "drag-and-drop" these variables to the various slots in the table to create a variety of useful tables for data analysis.

Pivot Table

You will finally see the counts of how many occurrences fell inside each race category, which of course will turn out similar to the one we created manually before, except this time it includes the "blank" category (and the order may be different.

For extra credit, see if you can eliminate the "blank" response row. Hint: maybe you can find a drop-down menu somewhere where you can 'uncheck' unwanted categories. Also, when you double-click the "Count of race" label in the table you can specify exactly what type of counts should be shown and in which way it should be formatted. Try for example to get your counts to appear as percent of the overall total.

In subsequent sections we'll revisit the Pivot Table tool and investigate additional options and possibilities. If you have problems with the Pivot tool so far, you might want to check out the following video:

Exercise: Please practice using the same data set and create:

  1. a bar chart representing the number of males and females in the survey
  2. a bar chart representing the number of vegetarians in the survey
  3. For review, create a histogram for "weight". The obstacle here will be that there is one (or more) blank data points for this variable.
  4. The more review, create a histogram for "height". In this case you will notice that one guy entered 5 as height, which does not make sense compared to all the other data points. What should you do with this exceptional value?
You should of course use the Pivot tool, not count the frequencies manually. For your reference, the charts for the first two questions are as follows:
Pivot Table

The last two questions require a good old histogram, which we have covered before. The appearent problems with this will be discussed in a homework.