# Categorical Variables

How satisfied are you with this app? Please rate it on the following scale:

• $$1$$ very unsatisfied

• $$2$$ somewhat unsatisfied

• $$3$$ neither satisfied nor unsatisfied

• $$4$$ somewhat satisfied

• $$5$$ very satisfied

You have just seen categorical variables!

## Definition of Categorical Variables

Remember that univariate data, also known as one-variable data, are observations that are made on the individuals in a population or sample. That data comes in different types, like qualitative, quantitative, categorical, continuous, discrete, and so on. In particular you will be looking at categorical variables, which are also often called categorical data. Let's first look at the definition.

A variable is called a categorical variable if the data collected falls into categories.

In other words, the name tells you exactly what it is! It is data that falls into categories. Categorical variables are qualitative variables because they deal with qualities, not quantities.

## Types of Categorical Variables

There are two main types of categorical variables, nominal and ordinal.

A categorical variable is called ordinal if it has an implied order to it.

An example of ordinal categorical data would be the survey at the start of this article. It asked you to rate satisfaction on a scale of $$1$$ to $$5$$, meaning there is an implied order to your rating. The ratings are also written as numbers, so survey data like this can be both ordinal and numerical in form, even though the numbers act as labels for the categories rather than as quantities.

A categorical variable is called nominal if the categories are named but have no implied order.

Suppose a survey asked you what kind of housing you live in, and the options you could pick from were dorm, house, and apartment. Those are examples of named categories, so that is nominal categorical data. In other words, if it has a named category but isn't numerically ordered, then it is a nominal categorical variable.
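
To make the difference concrete, here is a minimal Python sketch of the two examples above. The class names `Satisfaction` and `Housing` are made up for illustration, and using `IntEnum` for the ordinal scale is just one convenient way to get an ordering. Note that `IntEnum` would also let you add ratings together, which is not meaningful for ordinal data.

```python
from enum import Enum, IntEnum


class Satisfaction(IntEnum):
    """Ordinal: the categories have an implied order (1 is lowest, 5 is highest)."""
    VERY_UNSATISFIED = 1
    SOMEWHAT_UNSATISFIED = 2
    NEITHER = 3
    SOMEWHAT_SATISFIED = 4
    VERY_SATISFIED = 5


class Housing(Enum):
    """Nominal: the categories are named but have no order."""
    DORM = "dorm"
    HOUSE = "house"
    APARTMENT = "apartment"


# Ordinal categories can be compared and sorted.
ratings = [
    Satisfaction.SOMEWHAT_SATISFIED,
    Satisfaction.VERY_UNSATISFIED,
    Satisfaction.NEITHER,
]
sorted_ratings = sorted(ratings)
lowest = min(ratings)

# Nominal categories only support equality checks; trying to order them fails.
try:
    Housing.DORM < Housing.HOUSE
    housing_is_orderable = True
except TypeError:
    housing_is_orderable = False

print([r.name for r in sorted_ratings], lowest.name, housing_is_orderable)
```

The ordinal ratings sort from lowest to highest, while asking whether a dorm is "less than" a house raises an error, because that question has no answer for nominal data.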

## Categorical Variables in Statistics

Before you go on to look at more examples of categorical variables, let's look at some of the advantages and disadvantages of categorical data.

Advantages:

• The results are very straightforward because people only get a few options to choose from.

• Because the options are laid out ahead of time, there are no open-ended questions that need to be analyzed. Categorical data is called concrete because of this property.

• Categorical data can be much easier to analyze (and less expensive to analyze) than other kinds of data.

Disadvantages:

• In general, you need to get quite a few samples to make sure the survey accurately represents the population. This can be expensive to do.

• Because the categories are laid out at the start of the survey, it isn't very sensitive. For example, if the only two options for hair color on a survey are brown hair and white hair, people will have trouble deciding which category to put their hair color in (assuming they have any at all). This can lead to non-responses, and to people making unanticipated choices about their hair color, which skews the data.

• You can't do quantitative analysis on categorical data! Because it isn't numerical data, you can't do arithmetic on it. For example, you can't take a survey satisfaction of $$4$$ and add it to a survey satisfaction of $$3$$ to get a survey satisfaction of $$7$$.

How do you collect categorical data? This is often done through interviews (either in person or on the phone) or surveys (either online, in the mail, or in person). In either case, the questions asked are not open-ended. They will always ask people to choose between a specific set of options.

How do you analyze categorical data? Often it is done with proportions or percentages, and it can be in tables or graphs. For more information on these topics see Categorical Data in Tables and Bar Graphs.
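
As a rough illustration of analyzing categorical data with proportions, the sketch below counts how often each category appears and divides by the total number of responses. The responses and the helper name `frequency_table` are invented for this example.

```python
from collections import Counter

# Hypothetical survey responses (made up for illustration).
responses = [
    "house", "apartment", "dorm", "apartment",
    "house", "apartment", "dorm", "apartment",
]


def frequency_table(values):
    """Return {category: (count, proportion)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, count / total) for category, count in counts.items()}


table = frequency_table(responses)

# Print one row per category: name, count, and percentage of all responses.
for category, (count, proportion) in sorted(table.items()):
    print(f"{category:<10} {count:>3} {proportion:>7.1%}")
```

The proportions in such a table always add up to $$1$$, since together the categories account for every response.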

## Examples of Categorical Variables

Let's look at some examples of what categorical data can be.

Suppose you are interested in seeing a movie, and you ask a bunch of your friends whether they liked it or not in order to decide whether you want to spend money on it. Of your friends, $$15$$ liked the movie and $$50$$ didn't like it. What is the variable here, and what kind of variable is it?

Solution

First of all, this is categorical data. It is divided into two categories, "liked" and "didn't like". There is one variable in the data set, namely your friends' opinions of the movie. In fact, this is an example of nominal categorical data.
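
As a quick worked calculation with these numbers (an addition for illustration, not part of the original solution), the proportion of friends in each category is

$$\frac{15}{15 + 50} = \frac{15}{65} \approx 0.23 \quad \text{and} \quad \frac{50}{15 + 50} = \frac{50}{65} \approx 0.77,$$

so about $$23\%$$ of your friends liked the movie and about $$77\%$$ didn't.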

Let's look at another example.

Going back to the movie example, suppose you asked your friends whether or not they liked a particular movie, and what city they live in. How many variables are there, and what kind are they?

Solution

Just like in the previous example, your friends' opinion of the movie is one variable, and it is categorical. Since you also asked what city your friends live in, there is a second variable here: the name of the city they live in. There are only finitely many cities, so there is a finite number of places they could list. So the city is a second nominal categorical variable you have collected data on.

Now suppose you have asked your friends about how much they are willing to pay to see the movie, and you give them three price ranges: less than $5; between $5 and $10; and more than $10. What kind of data is this?

Solution

This is still categorical data because you have laid out the categories your friends can answer in before you asked them to answer your survey. However this time it is ordinal categorical data since you can order the categories by price (which is a number).
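
Here is a small sketch of how a dollar amount could be sorted into those three price ranges. The function name `price_category` and the sample amounts are hypothetical, and putting exactly $5 and exactly $10 in the middle range is an assumption, since the example doesn't say where the boundaries fall.

```python
# The three categories in their natural order, lowest to highest.
PRICE_ORDER = ["less than $5", "between $5 and $10", "more than $10"]


def price_category(price):
    """Map a dollar amount to one of the three ordered price ranges."""
    if price < 5:
        return PRICE_ORDER[0]
    elif price <= 10:  # assumption: $5 and $10 themselves count as "between"
        return PRICE_ORDER[1]
    else:
        return PRICE_ORDER[2]


answers = [12.0, 4.5, 8.0, 10.0]  # hypothetical amounts
categories = [price_category(p) for p in answers]

# Sort by the category's rank in PRICE_ORDER, not alphabetically.
ordered = sorted(categories, key=PRICE_ORDER.index)
print(ordered)
```

Because the categories come from a number, they have a natural order, which is exactly what makes this ordinal rather than nominal data.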

So how do you compare categorical variables anyway?

## Correlation Between Categorical Variables

Suppose you asked your friends whether or not they liked a particular movie, and whether they paid less than $5, between $5 and $10, or more than $10 to see it. Those are two categorical variables, so how can you compare them? Is there any way to see if how much they paid to see the movie influenced how much they liked it?

One thing you can do is look at comparative bar charts of the data, or at a two-way table. You can find more information about those in the article Bar Graphs. The other thing you can do is run a formal statistical test called a chi-square test. This topic can be found in the article Inference for Distributions of Categorical Data.
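
To give a feel for both ideas, the sketch below builds a two-way table from made-up answers and computes the chi-square statistic, the sum of (observed − expected)² / expected over every cell of the table. The data and function names are hypothetical. This only computes the statistic; a full chi-square test would go on to compare it against a chi-square distribution, which is left to the article mentioned above.

```python
from collections import Counter

# Hypothetical (opinion, price range) answers, made up for illustration.
answers = (
    [("liked", "less than $5")] * 10
    + [("liked", "between $5 and $10")] * 5
    + [("liked", "more than $10")] * 5
    + [("didn't like", "less than $5")] * 5
    + [("didn't like", "between $5 and $10")] * 5
    + [("didn't like", "more than $10")] * 10
)


def two_way_table(pairs):
    """Count how often each (row, column) combination occurs."""
    return Counter(pairs)


def chi_square_statistic(pairs):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    observed = two_way_table(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    n = len(pairs)
    statistic = 0.0
    for row in row_totals:
        for col in col_totals:
            # Expected count if the two variables were unrelated.
            expected = row_totals[row] * col_totals[col] / n
            statistic += (observed[(row, col)] - expected) ** 2 / expected
    return statistic


table = two_way_table(answers)
statistic = chi_square_statistic(answers)
print(table[("liked", "less than $5")], round(statistic, 3))
```

A statistic of $$0$$ would mean the observed counts match what you would expect if price and opinion were unrelated; larger values mean the table departs further from that.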

## Categorical Variables - Key takeaways

• A variable is called a categorical variable if the data collected falls into categories.
• Categorical variables are qualitative variables because they deal with qualities, not quantities.
• A categorical variable is called ordinal if it has an implied order to it.
• A categorical variable is called nominal if the categories are named.
• Ways to look at categorical variables include tables and bar charts.

A categorical variable is one where the data collected isn't a measurement. For example, hair color is a kind of categorical data, but pounds of produce bought per week is not.

Hair color, educational level, and customer satisfaction on a scale of 1 to 5 are all categorical variables.

A nominal categorical variable is one that can be put into categories, but the categories aren't intrinsically ordered. For example, whether you live in a house, an apartment, or someplace else is categorical, but the options don't have an intrinsic number associated with them.

Quantitative data is data that represents an amount, like height in inches. Categorical data is data that is collected in categories, for example when a survey asks whether someone is less than 4 feet tall, between 4 and 6 feet tall, or more than 6 feet tall.

The most common way to summarize categorical data is with percentages, which are often displayed graphically, as in bar graphs.

## Final Categorical Variables Quiz

Question

This type of data has no intrinsic ordering to its categories.

Nominal data.

Question

Examples of categorical data are...

Hair color, eye color, educational level, etc.

Question

The results of categorical data are concrete. This means…

They are without subjective open-ended questions.

Question

What is one disadvantage of doing a survey to gather categorical data?

You need to do a sufficient number of surveys to make sure you are getting an accurate picture of the population.

Question

Why can gathering categorical data be expensive?

Because it involves surveys and interviews, both of which are generally expensive to do.

Question

Categorical data can be analyzed using…………

Bar graphs, pie charts, and tables.

Question

Is categorical data sensitive?

No, because the categories are fixed ahead of time and no open-ended answers are allowed, so responses that don't fit a category get lost.

Question

Why can't you do arithmetic on categorical data?

Because it isn't numerical data.

Question

The two main types of categorical data are……..

Nominal and ordinal.

Question

What are categorical and quantitative data?

Quantitative variables are variables where the data represent amounts (e.g. height, weight, or age). Categorical variables are variables where the data represent groups.

Question

The weight of an object is a categorical variable. True/False

False, because weight is a continuous variable, unless you divide the weight into categories (for example less than 10 pounds, 10-20 pounds, more than 20 pounds) before you do the survey. Then it would be a categorical variable.

Question

Categorical variables can be divided into

Nominal variables and Ordinal variables.

Question

Analyzing categorical data normally involves the use of__________

tables and graphs.

Question

Classify gender and temperature as examples under Nominal or Ordinal variables.

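Gender is a nominal variable. Temperature is an ordinal variable when it is recorded in ordered categories (for example cold, warm, hot); measured in degrees, it is quantitative instead.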

Question

Categorical data can be presented by counting the number of observations that fall into each group for two variables using a________

Two-way data table.

Question

Can categorical data be numerical?

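Yes. The categories can be labeled with numbers, like a satisfaction rating from 1 to 5, but the numbers act as labels rather than quantities.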

Question

Two kinds of categorical data are_____

Nominal and ordinal.

Question

Why can't you do quantitative analysis on categorical variables?

Because it isn't quantitative data.

Question

What kind of graph would you use to look at categorical data?

A bar graph.

Question

What is an ordinal categorical variable?

A categorical variable is called ordinal if it has an implied order to it.

Question

What kind of test can be used to compare two categorical variables?

A chi-square test.

Question

One advantage of categorical data is the fact that it can be less expensive to analyze.  Why?

Because the categories are laid out ahead of time and there are no open-ended questions.

Question

Nominal data is a kind of...

categorical data.

Question

One disadvantage of Categorical data in terms of sampling is the fact that it requires larger samples which are ______

Expensive/costly.

Question

Assuming that there are only two genders, state the categories of gender as a categorical variable.

Male/female.

Question

What is categorical data?

Categorical data is data which can be divided into different groups instead of being measured numerically.

Question

What is nominal categorical data?

Nominal categorical data is data whose categories are named but have no intrinsic order.

Question

What is ordinal categorical data?

Ordinal categorical data is data whose categories have an implied order. The categories can be assigned numbers, but you can't meaningfully add the numbers together.

Question

What is quantitative data?

Quantitative data is data that represents an amount, like a height in inches or a count of items.

Question

Is categorical data quantitative?

No, it is qualitative.

Question

What is continuous data?

Continuous data is data that is measured on a scale of numbers, where the data could be any number in the scale.

Question

Is categorical data continuous?

No. You can turn continuous data into categorical data by grouping it into ranges, like heights between 4 and 5 feet, but the resulting categories are no longer continuous.

Question

What are two common ways to graph categorical data?

Bar charts and pie charts.

Question

What is a frequency table?

It is a table that lists how many responses there were within each category.

Question

What is a categorical frequency distribution?

A categorical frequency distribution is a table that organizes categorical data into frequencies.

Question

What is the relative frequency of a table of categorical data?

The relative frequency is the proportion of the number of times a category appears in the data set when compared to the total number in the data set.

Question

How do you find the relative frequency of a categorical frequency table?

The relative frequency is just the number in that category divided by the total number of responses.

Question

If you add up all the relative frequencies in a frequency table, what should you get?

You should always get 1, because they add up to 100% of the data.

Question

What kind of table would you use to compare two kinds of categorical data?

A two-way table.

Question

How could you compare two kinds of categorical data graphically?

In a stacked bar chart.

Question

What is another name for a stacked bar chart?

A segmented bar graph.

Question

What are segmented bar graphs used for?

Comparing two kinds of categorical data.

Question

What is a bar graph?

A bar graph is a pictorial representation of the distribution of a data set, that uses vertical or horizontal bars of the same width to represent different categories.

Question

A bar graph is also known as a bar chart or bar diagram.

True.

Question

What does the length of each bar in a bar graph represent?

The frequency of appearance of each category in the data.

Question

You can use bar graphs to represent the distribution of a data set when the observations refer to only one variable.

True.

Question

What is a frequency distribution?

A frequency distribution is a table showing all the possible categories and their corresponding frequencies.

Question

When you use a bar graph to represent the distribution of a data set, you can do so by using the frequency or relative frequency of each category.

True.

Question

How do you calculate the frequency of a category in a data set?

No formula required. All you need to do is count how many times you find each category in the data set.

Question

How do you calculate the relative frequency of a category in a data set?

$$\text{Relative Frequency} = \frac{f}{n},$$

where $$f$$ is the frequency of the category and $$n$$ is the total number of observations, which equals the sum of all the frequencies.
