# Categorical Data

Have you ever filled out a customer satisfaction survey? How about one where you were asked about your income level? Then you have participated in categorical data collection!

## Categorical Data Definition

First let's take a quick look at what categorical data is.

Categorical data is data which can be divided into different groups instead of being measured numerically.

Some examples of categorical data are hair color, the type of pets someone has, and favorite foods. On the other hand, things like height, weight, and the number of cups of coffee that someone drinks per day are measured numerically, and so are not categorical data.

For a more thorough explanation of categorical data and what it is used for see the article Categorical Variables. To see the various types of data and how they are used you can take a look at One-Variable Data and Data Analysis.

There are two types of categorical data: nominal and ordinal.

Nominal categorical data is data whose categories are labels with no natural order and no numbers attached. For example, if you asked people whether they lived in a rural area or a city, "rural" and "city" would be nominal categories.

Ordinal categorical data is data whose categories have a natural order, so they can be assigned numbers, but arithmetic on those numbers isn't meaningful. For example, if you ran a customer satisfaction survey and asked people to rate the service on a scale from $$1$$ to $$5$$, that would be ordinal categorical data. Notice that you can't add a satisfaction level of $$2$$ and a satisfaction level of $$4$$ together to get a satisfaction level of $$6$$!
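The difference between the two types can be sketched in a few lines of Python. This is only an illustration: the response lists below are invented, and the variable names are assumptions rather than anything from a real survey.

```python
from collections import Counter

# Hypothetical responses, invented for illustration only.
area = ["rural", "city", "city", "rural", "city"]  # nominal: labels with no order
satisfaction = [2, 4, 5, 4, 1]                     # ordinal: ratings on a 1 to 5 scale

# Counting how often each category appears works for both types.
area_counts = Counter(area)
satisfaction_counts = Counter(satisfaction)

# Ordering only makes sense for ordinal data: there is a lowest and a highest rating.
lowest, highest = min(satisfaction), max(satisfaction)

print(area_counts["city"])     # 3
print(satisfaction_counts[4])  # 2
print(lowest, highest)         # 1 5
```

Note that nothing here adds two ratings together, since a sum of satisfaction levels has no meaning.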

## Categorical vs. Quantitative Data

Now you know what categorical data is, but how is that different from quantitative data? It helps to look at the definition first.

Quantitative data is data that is a count of how many things in a data set have a particular quality.

Quantitative data usually answers questions like "how many" or "how much". For example, quantitative data would be collected if you wanted to know how much people spent on buying a cell phone. Quantitative data is often used to compare multiple sets of data. For a more complete discussion of quantitative data and what it is used for, take a look at Quantitative Variables.

Categorical data is qualitative, not quantitative!

## Categorical vs. Continuous Data

All right, what about continuous data? Can that be categorical? Let's take a look at the definition of continuous data.

Continuous data is data that is measured on a scale of numbers, where the data could be any number in the scale.

A good example of continuous data is height. For any of the numbers between $$4 \, ft.$$ and $$5 \, ft.$$ there could be someone of that height. In general, categorical data is not continuous data.

## Categorical Data Examples

Now that you have seen some comparisons between categorical data and other types, let's look at some more examples of categorical data.

Suppose you are having a party, and you want to make sure everyone has a dessert that they can eat. So you ask people to fill out a survey telling you their favorite dessert, and you gather up their data into a table like the one below.

| Favorite Dessert | Frequency |
| --- | --- |
| Ice cream | $$4$$ |
| Cake | $$2$$ |
| Fruit | $$17$$ |
| Pudding | $$5$$ |
| Cookies | $$10$$ |

Is the data in the table categorical data?

Solution

Yes. Because the data is divided up into categories (favorite dessert), this is categorical data. In fact, since the desserts have no natural order, this would be considered nominal categorical data.

Let's take a look at another example.

Suppose you were asked to give a survey to decide whether people liked a particular soft drink and got back the following information:

• 14 people liked the soft drink; and
• 50 people did not like it.

Is this categorical data?

Solution

Yes. You can divide up the answers into two categories, in this case "liked it" and "didn't like it". This would be an example of nominal categorical data.

Let's take a look at one more example.

Suppose you came across a survey someone had done which measured how far away from the center of a city someone lived and compared it to their income. Would this be categorical data?

Solution

It depends on the questions asked when gathering data. Let's take a look at a couple of surveys.

Survey 1

Question 1: How far do you live from the city center?

(a) I live in the city center.

(b) I live within 1 mile of the city center, but not in it.

(c) I live between 1 and 5 miles from the city center.

(d) I live more than 5 miles from the city center.

Question 2: What is your income?

(a) Less than \$10,000 per year.

(b) Between \$10,000 and \$20,000 per year.

(c) More than \$20,000 per year.

Survey 2

Question 1: How many miles do you live from the city center?

Question 2: What is your income per year?

Then Survey 1 has the information divided up into categories. It is actually collecting two types of categorical data, and those can be compared together.

On the other hand, Survey 2 asks people for numbers. Their answers can be any number that is not negative. So this survey is collecting continuous data. The data is not divided into categories, so it is not categorical data.
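One way to see the link between the two surveys is that the numeric answers to Survey 2 can be grouped into the categories of Survey 1. Below is a minimal Python sketch of that idea. The function name `distance_category`, the sample distances, and the choice to treat exactly 0 miles as "in the city center" are assumptions made for illustration; the surveys themselves don't say how boundary cases are handled.

```python
def distance_category(miles: float) -> str:
    """Map a distance in miles to a Survey 1, Question 1 answer.

    Assumption: 0 miles means "in the city center", and each upper
    boundary (1 mile, 5 miles) belongs to the closer category.
    """
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles == 0:
        return "a"  # in the city center
    if miles <= 1:
        return "b"  # within 1 mile, but not in the center
    if miles <= 5:
        return "c"  # between 1 and 5 miles
    return "d"      # more than 5 miles


# Hypothetical Survey 2 answers (continuous data), invented for illustration.
survey_2_answers = [0, 0.4, 3.2, 12.0]
categories = [distance_category(m) for m in survey_2_answers]
print(categories)  # ['a', 'b', 'c', 'd']
```

Going the other way is not possible: once an answer has been recorded as a category, the exact distance is lost.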

It is reasonable to ask how you analyze categorical data.

## Categorical Data Analysis

Two of the most common ways to look at categorical data are bar charts and pie charts.

Let's go back to the example about soft drinks, where you discovered that 14 people liked the soft drink and 50 didn't. You could just look at the total number of responses, and make a bar chart showing the information.

Like and Didn't Like Bar Chart

You could also make a pie chart with the data.

Pie chart showing percentage of people who liked or didn't like the soda

Either one gives you a visual comparison of the data. For many more examples of how to construct a chart for categorical data, see Bar Graphs.
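The numbers behind both charts are easy to work out by hand or in code. The sketch below (plain Python, with variable names chosen only for illustration) computes the percentage each slice of the pie chart would take and prints a rough text version of the bar chart.

```python
# Soft drink survey results from the example above.
counts = {"liked": 14, "didn't like": 50}
total = sum(counts.values())  # 64 responses in all

# Percentage of responses in each category (the size of each pie slice).
percentages = {answer: 100 * n / total for answer, n in counts.items()}

for answer, n in counts.items():
    bar = "#" * n  # one '#' per response, a very rough bar chart
    print(f"{answer:<12} {bar} {n} ({percentages[answer]:.1f}%)")
```

With these counts, about $$21.9 \%$$ of the people liked the soft drink and about $$78.1 \%$$ didn't.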

## Frequency Tables and Relative Frequency

If you go back to the example about dessert, there was a table of data. It listed how many people liked each kind of dessert. This kind of table is also called a frequency table. The heading "Frequency" is shorthand for frequency of response, so a table headed "Number of Responses" would give exactly the same information.

| Favorite Dessert | Frequency |
| --- | --- |
| Ice cream | $$4$$ |
| Cake | $$2$$ |
| Fruit | $$17$$ |
| Pudding | $$5$$ |
| Cookies | $$10$$ |

Let's take a look at the more formal definition.

A categorical frequency distribution is a table that organizes categorical data into frequencies.

So in fact the table above could be called a categorical frequency distribution!
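If you start from the raw survey answers rather than a finished table, a frequency distribution can be built by counting. Here is a minimal Python sketch using `collections.Counter` from the standard library. The list of raw answers is an assumption: it is reconstructed from the totals in the table, since the individual responses aren't given.

```python
from collections import Counter

# Raw answers, reconstructed from the table totals (the real order is unknown).
raw_answers = (
    ["Ice cream"] * 4 + ["Cake"] * 2 + ["Fruit"] * 17
    + ["Pudding"] * 5 + ["Cookies"] * 10
)

# Counter tallies how many times each category appears.
frequency_table = Counter(raw_answers)

for dessert, frequency in frequency_table.items():
    print(f"{dessert:<10} {frequency}")
```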

Once you know that, it is normal to ask questions like "what percentage of the party goers like fruit for dessert?". That is asking for the relative frequency.

The relative frequency is the proportion of the number of times a category appears in the data set when compared to the total number in the data set.

In other words, the relative frequency is just the number in that category divided by the total number of responses. Because relative frequencies are proportions of the whole, if you add up all of the relative frequencies in a table you should get $$1$$, or $$100 \%$$. Let's do an example.

From the table of dessert choices, make a table of relative frequencies.

| Favorite Dessert | Frequency |
| --- | --- |
| Ice cream | $$4$$ |
| Cake | $$2$$ |
| Fruit | $$17$$ |
| Pudding | $$5$$ |
| Cookies | $$10$$ |

Solution

First you need to know how many responses there were to the survey. You can find that by adding up the frequency column of the table, so

$\mbox{total responses } = 4+2+17+5+10 = 38.$

Then you can find the relative frequency of each category by dividing the frequency by the total number of responses. For example the relative frequency of ice cream is

$\mbox{relative frequency of ice cream } = \frac{4}{38} \approx 0.105$

to three decimal places.

You can fill in the rest of the table in exactly the same way.

| Favorite Dessert | Frequency | Relative Frequency |
| --- | --- | --- |
| Ice cream | $$4$$ | $$0.105$$ |
| Cake | $$2$$ | $$0.053$$ |
| Fruit | $$17$$ | $$0.447$$ |
| Pudding | $$5$$ | $$0.132$$ |
| Cookies | $$10$$ | $$0.263$$ |

Notice that if you add up all of the relative frequencies, you get $$1$$, so you know these are more than likely correct. It is a good check to do to see if you are on the right track.
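The same calculation can be sketched in Python. The dictionary and variable names are illustrative assumptions; the counts come from the dessert table.

```python
# Frequencies from the dessert table.
frequencies = {"Ice cream": 4, "Cake": 2, "Fruit": 17, "Pudding": 5, "Cookies": 10}

total = sum(frequencies.values())  # 38 responses

# Relative frequency = frequency of the category / total number of responses.
relative = {dessert: count / total for dessert, count in frequencies.items()}

for dessert, value in relative.items():
    print(f"{dessert:<10} {value:.3f}")

# Sanity check: the relative frequencies should add up to 1.
print(f"Sum: {sum(relative.values()):.3f}")
```

The check is done on the unrounded values. If you add up rounded values instead, the sum can be off from $$1$$ by a small amount.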

You can also look at a table that includes the cumulative relative frequency. That is just a fancy way of saying that the table includes, for each row, the sum of the relative frequency of that row and of all the rows before it.

Let's go back to the dessert table (which sounds like we should all be getting a piece of cake instead of more math). The cumulative relative frequency of the first row is just the relative frequency of the first row. The cumulative relative frequency of the second row is given by the sum of the relative frequency of the first row PLUS the relative frequency of the second row. Here is the table with cumulative relative frequency.

| Favorite Dessert | Frequency | Relative Frequency | Cumulative Relative Frequency |
| --- | --- | --- | --- |
| Ice cream | $$4$$ | $$0.105$$ | $$0.105$$ |
| Cake | $$2$$ | $$0.053$$ | $$0.105 + 0.053 = 0.158$$ |
| Fruit | $$17$$ | $$0.447$$ | $$0.447 + 0.158 = 0.605$$ |
| Pudding | $$5$$ | $$0.132$$ | $$0.132 + 0.605 = 0.737$$ |
| Cookies | $$10$$ | $$0.263$$ | $$0.263 + 0.737 = 1$$ |
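A running total like this is easy to compute with `itertools.accumulate` from the Python standard library. The sketch below is one possible approach, not the only one: it adds up the counts first and divides by the total afterwards, which is an implementation choice made here to avoid building up rounding error. To three decimal places it gives the same values as the table.

```python
from itertools import accumulate

# Frequencies from the dessert table, in table order.
frequencies = {"Ice cream": 4, "Cake": 2, "Fruit": 17, "Pudding": 5, "Cookies": 10}
total = sum(frequencies.values())  # 38 responses

# Running totals of the counts: 4, 4+2, 4+2+17, and so on.
running_counts = list(accumulate(frequencies.values()))

# Cumulative relative frequency = running count / total number of responses.
cumulative = {
    dessert: running / total
    for dessert, running in zip(frequencies, running_counts)
}

for dessert, value in cumulative.items():
    print(f"{dessert:<10} {value:.3f}")
```

The last row always comes out as exactly $$1$$, because by then every response has been counted.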

But what happens if you have two kinds of categorical data and want to compare them?

## Two-way Tables

Two-way tables are a way to compare types of categorical data. This is easiest to understand with an example. Let's go back to Survey 1:

Question 1: How far do you live from the city center?

(a) I live in the city center.

(b) I live within 1 mile of the city center, but not in it.

(c) I live between 1 and 5 miles from the city center.

(d) I live more than 5 miles from the city center.

Question 2: What is your income?

(a) Less than \$10,000 per year.

(b) Between \$10,000 and \$20,000 per year.

(c) More than \$20,000 per year.

There are two questions, and each one is a type of categorical data. Suppose you got the following responses from the survey:

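| Person Number | Question 1 | Question 2 | Person Number | Question 1 | Question 2 |
| --- | --- | --- | --- | --- | --- |
| 1 | a | a | 7 | b | c |
| 2 | a | b | 8 | c | a |
| 3 | d | a | 9 | a | b |
| 4 | b | c | 10 | d | c |
| 5 | c | c | 11 | d | b |
| 6 | d | a | 12 | b | c |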

It is hard to see any relationship between distance from the city center and income this way! So instead you can make a two-way table. The columns are the possible responses to the first question, and the rows are the possible responses to the second question. The empty two-way table would be:

| | City Center | Within 1 mile | 1 to 5 miles | More than 5 miles |
| --- | --- | --- | --- | --- |
| Less than \$10,000 | | | | |
| \$10,000 - \$20,000 | | | | |
| More than \$20,000 | | | | |

The entry in each cell of the table is the total number of responses that gave both the row answer and the column answer.

For example, in the responses above, $$2$$ people (persons 2 and 9) answered (a) (city center) for Question 1 and (b) (between \$10,000 and \$20,000) for Question 2. So at the intersection of "City Center" and "\$10,000 - \$20,000" there should be a $$2$$.

| | City Center | Within 1 mile | 1 to 5 miles | More than 5 miles |
| --- | --- | --- | --- | --- |
| Less than \$10,000 | | | | |
| \$10,000 - \$20,000 | 2 | | | |
| More than \$20,000 | | | | |

You can fill in the rest of the table the same way.

| | City Center | Within 1 mile | 1 to 5 miles | More than 5 miles |
| --- | --- | --- | --- | --- |
| Less than \$10,000 | 1 | 0 | 1 | 2 |
| \$10,000 - \$20,000 | 2 | 0 | 0 | 1 |
| More than \$20,000 | 0 | 3 | 1 | 1 |

Now it is much easier to see any connections between distance from the city center and income. Notice that if you add up all the entries in the table you get $$12$$, which is exactly the same as the number of survey responses. You can graph these in a bar chart just like you normally would.
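Filling in a two-way table is just counting pairs of answers, which can be sketched in Python as follows. The twelve response pairs are taken from the list of survey responses; the label dictionaries and variable names are assumptions introduced for readability.

```python
from collections import Counter

# (Question 1 answer, Question 2 answer) for persons 1 to 12.
responses = [
    ("a", "a"), ("a", "b"), ("d", "a"), ("b", "c"), ("c", "c"), ("d", "a"),
    ("b", "c"), ("c", "a"), ("a", "b"), ("d", "c"), ("d", "b"), ("b", "c"),
]

distance_labels = {"a": "City Center", "b": "Within 1 mile",
                   "c": "1 to 5 miles", "d": "More than 5 miles"}
income_labels = {"a": "Less than $10,000", "b": "$10,000 - $20,000",
                 "c": "More than $20,000"}

# Count how many people gave each combination of answers.
pair_counts = Counter(responses)

# Rows are incomes (Question 2); columns are distances (Question 1).
# A combination nobody chose is simply counted as 0.
two_way = {
    income_labels[q2]: [pair_counts[(q1, q2)] for q1 in distance_labels]
    for q2 in income_labels
}

for income, row in two_way.items():
    print(f"{income:<20}", row)

total = sum(sum(row) for row in two_way.values())
print("Total responses:", total)
```

The total of all the cells is $$12$$, matching the number of people surveyed.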

Bar graph for income vs. distance from city center

To look at the data in a two-way table graphically you can make a segmented bar graph. In a segmented bar graph, each bar of the graph is divided up into percentages based on the number of answers of that type. Sometimes a segmented bar graph is called a stacked bar chart. A segmented bar graph makes it easier to see what percentage of the total falls into each category.

Stacked bar chart for income vs distance from city center

Using the graph above, you can quickly see that more than half of the people making more than \$20,000 per year live within 1 mile of the city center!
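The percentages behind a segmented bar graph come from dividing each cell of the two-way table by its row total. The sketch below assumes that each bar stands for one income group and is split by distance from the city center, which matches the observation about people making more than \$20,000; the function name `row_percentages` is made up for this example.

```python
# Two-way table: rows are income groups, columns are distances in the order
# City Center, Within 1 mile, 1 to 5 miles, More than 5 miles.
two_way = {
    "Less than $10,000": [1, 0, 1, 2],
    "$10,000 - $20,000": [2, 0, 0, 1],
    "More than $20,000": [0, 3, 1, 1],
}


def row_percentages(row):
    """Convert the counts in one row to percentages of that row's total."""
    row_total = sum(row)
    return [100 * count / row_total for count in row]


# One bar per income group; each entry is the size of one segment.
segments = {income: row_percentages(row) for income, row in two_way.items()}

for income, percentages in segments.items():
    formatted = ", ".join(f"{p:.1f}%" for p in percentages)
    print(f"{income:<20} {formatted}")
```

For the "More than \$20,000" group, $$3$$ of the $$5$$ people live within 1 mile of the city center, which is $$60 \%$$ of that bar.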

## Categorical Data - Key takeaways

• Categorical data is data which can be divided into different groups instead of being measured numerically.
• Nominal categorical data is data whose categories are labels with no natural order and no numbers attached.
• Ordinal categorical data is data whose categories have a natural order, so they can be assigned numbers, but arithmetic on those numbers isn't meaningful.
• A categorical frequency distribution is a table that organizes categorical data into frequencies.
• The relative frequency is the proportion of the number of times a category appears in the data set when compared to the total number in the data set.
• Categorical data is qualitative, not quantitative.
• Continuous data is data that is measured on a scale of numbers, where the data could be any number in the scale.
• Two of the most common ways to look at categorical data are bar charts and pie charts.
• Two-way tables are a way to compare types of categorical data.
• To look at the data in a two-way table graphically you can use a segmented bar graph, also called a stacked bar chart.

## Final Categorical Data Quiz

Question

What is categorical data?

Categorical data is data which can be divided into different groups instead of being measured numerically.

Question

What is nominal categorical data?

Nominal categorical data is data whose categories are labels with no natural order and no numbers attached.

Question

What is ordinal categorical data?

Ordinal categorical data is data whose categories have a natural order, so they can be assigned numbers, but arithmetic on those numbers isn't meaningful.

Question

What is quantitative data?

Quantitative data is data that is a count of how many things in a data set have a particular quality.

Question

Is categorical data quantitative?

No, it is qualitative.

Question

What is continuous data?

Continuous data is data that is measured on a scale of numbers, where the data could be any number in the scale.

Question

Is categorical data continuous?

No. However, continuous data can be turned into categorical data by grouping it into ranges, such as heights between 4 and 5 feet.

Question

What are two common ways to graph categorical data?

Bar charts and pie charts.

Question

What is a frequency table?

It is a table that lists how many responses there were within each category.

Question

What is a categorical frequency distribution?

A categorical frequency distribution is a table that organizes categorical data into frequencies.

Question

What is the relative frequency of a table of categorical data?

The relative frequency is the proportion of the number of times a category appears in the data set when compared to the total number in the data set.

Question

How do you find the relative frequency of a categorical frequency table?

The relative frequency is just the number in that category divided by the total number of responses.

Question

If you add up all the relative frequencies in a frequency table, what should you get?

You should get 1 (apart from small rounding differences), because they add up to 100% of the data.

Question

What kind of table would you use to compare two kinds of categorical data?

A two-way table.

Question

How could you compare two kinds of categorical data graphically?

In a stacked bar chart.

Question

What is another name for a stacked bar chart?

A segmented bar graph.

Question

What are segmented bar graphs used for?

Comparing two kinds of categorical data.
