StudySmarter - The all-in-one study app.

Two Categorical Variables

- Calculus
- Absolute Maxima and Minima
- Absolute and Conditional Convergence
- Accumulation Function
- Accumulation Problems
- Algebraic Functions
- Alternating Series
- Antiderivatives
- Application of Derivatives
- Approximating Areas
- Arc Length of a Curve
- Area Between Two Curves
- Arithmetic Series
- Average Value of a Function
- Calculus of Parametric Curves
- Candidate Test
- Combining Differentiation Rules
- Combining Functions
- Continuity
- Continuity Over an Interval
- Convergence Tests
- Cost and Revenue
- Density and Center of Mass
- Derivative Functions
- Derivative of Exponential Function
- Derivative of Inverse Function
- Derivative of Logarithmic Functions
- Derivative of Trigonometric Functions
- Derivatives
- Derivatives and Continuity
- Derivatives and the Shape of a Graph
- Derivatives of Inverse Trigonometric Functions
- Derivatives of Polar Functions
- Derivatives of Sec, Csc and Cot
- Derivatives of Sin, Cos and Tan
- Determining Volumes by Slicing
- Direction Fields
- Disk Method
- Divergence Test
- Eliminating the Parameter
- Euler's Method
- Evaluating a Definite Integral
- Evaluation Theorem
- Exponential Functions
- Finding Limits
- Finding Limits of Specific Functions
- First Derivative Test
- Function Transformations
- General Solution of Differential Equation
- Geometric Series
- Growth Rate of Functions
- Higher-Order Derivatives
- Hydrostatic Pressure
- Hyperbolic Functions
- Implicit Differentiation Tangent Line
- Implicit Relations
- Improper Integrals
- Indefinite Integral
- Indeterminate Forms
- Initial Value Problem Differential Equations
- Integral Test
- Integrals of Exponential Functions
- Integrals of Motion
- Integrating Even and Odd Functions
- Integration Formula
- Integration Tables
- Integration Using Long Division
- Integration of Logarithmic Functions
- Integration using Inverse Trigonometric Functions
- Intermediate Value Theorem
- Inverse Trigonometric Functions
- Jump Discontinuity
- Lagrange Error Bound
- Limit Laws
- Limit of Vector Valued Function
- Limit of a Sequence
- Limits
- Limits at Infinity
- Limits at Infinity and Asymptotes
- Limits of a Function
- Linear Approximations and Differentials
- Linear Differential Equation
- Linear Functions
- Logarithmic Differentiation
- Logarithmic Functions
- Logistic Differential Equation
- Maclaurin Series
- Manipulating Functions
- Maxima and Minima
- Maxima and Minima Problems
- Mean Value Theorem for Integrals
- Models for Population Growth
- Motion Along a Line
- Motion in Space
- Natural Logarithmic Function
- Net Change Theorem
- Newton's Method
- Nonhomogeneous Differential Equation
- One-Sided Limits
- Optimization Problems
- P Series
- Particle Model Motion
- Particular Solutions to Differential Equations
- Polar Coordinates
- Polar Coordinates Functions
- Polar Curves
- Population Change
- Power Series
- Radius of Convergence
- Ratio Test
- Removable Discontinuity
- Riemann Sum
- Rolle's Theorem
- Root Test
- Second Derivative Test
- Separable Equations
- Separation of Variables
- Simpson's Rule
- Solid of Revolution
- Solutions to Differential Equations
- Surface Area of Revolution
- Symmetry of Functions
- Tangent Lines
- Taylor Polynomials
- Taylor Series
- Techniques of Integration
- The Fundamental Theorem of Calculus
- The Mean Value Theorem
- The Power Rule
- The Squeeze Theorem
- The Trapezoidal Rule
- Theorems of Continuity
- Trigonometric Substitution
- Vector Valued Function
- Vectors in Calculus
- Vectors in Space
- Washer Method
- Decision Maths
- Geometry
- 2 Dimensional Figures
- 3 Dimensional Vectors
- 3-Dimensional Figures
- Altitude
- Angles in Circles
- Arc Measures
- Area and Volume
- Area of Circles
- Area of Circular Sector
- Area of Parallelograms
- Area of Plane Figures
- Area of Rectangles
- Area of Regular Polygons
- Area of Rhombus
- Area of Trapezoid
- Area of a Kite
- Composition
- Congruence Transformations
- Congruent Triangles
- Convexity in Polygons
- Coordinate Systems
- Dilations
- Distance and Midpoints
- Equation of Circles
- Equilateral Triangles
- Figures
- Fundamentals of Geometry
- Geometric Inequalities
- Geometric Mean
- Geometric Probability
- Glide Reflections
- HL ASA and AAS
- Identity Map
- Inscribed Angles
- Isometry
- Isosceles Triangles
- Law of Cosines
- Law of Sines
- Linear Measure and Precision
- Median
- Parallel Lines Theorem
- Parallelograms
- Perpendicular Bisector
- Plane Geometry
- Polygons
- Projections
- Properties of Chords
- Proportionality Theorems
- Pythagoras Theorem
- Rectangle
- Reflection in Geometry
- Regular Polygon
- Rhombuses
- Right Triangles
- Rotations
- SSS and SAS
- Segment Length
- Similarity
- Similarity Transformations
- Special quadrilaterals
- Squares
- Surface Area of Cone
- Surface Area of Cylinder
- Surface Area of Prism
- Surface Area of Sphere
- Surface Area of a Solid
- Surface of Pyramids
- Symmetry
- Translations
- Trapezoids
- Triangle Inequalities
- Triangles
- Using Similar Polygons
- Vector Addition
- Vector Product
- Volume of Cone
- Volume of Cylinder
- Volume of Pyramid
- Volume of Solid
- Volume of Sphere
- Volume of prisms
- Mechanics Maths
- Acceleration and Time
- Acceleration and Velocity
- Angular Speed
- Assumptions
- Calculus Kinematics
- Coefficient of Friction
- Connected Particles
- Conservation of Mechanical Energy
- Constant Acceleration
- Constant Acceleration Equations
- Converting Units
- Elastic Strings and Springs
- Force as a Vector
- Kinematics
- Newton's First Law
- Newton's Law of Gravitation
- Newton's Second Law
- Newton's Third Law
- Power
- Projectiles
- Pulleys
- Resolving Forces
- Statics and Dynamics
- Tension in Strings
- Variable Acceleration
- Work Done by a Constant Force
- Probability and Statistics
- Bar Graphs
- Basic Probability
- Charts and Diagrams
- Conditional Probabilities
- Continuous and Discrete Data
- Frequency, Frequency Tables and Levels of Measurement
- Independent Events Probability
- Line Graphs
- Mean Median and Mode
- Mutually Exclusive Probabilities
- Probability Rules
- Probability of Combined Events
- Quartiles and Interquartile Range
- Systematic Listing
- Pure Maths
- ASA Theorem
- Absolute Value Equations and Inequalities
- Addition and Subtraction of Rational Expressions
- Addition, Subtraction, Multiplication and Division
- Algebra
- Algebraic Fractions
- Algebraic Notation
- Algebraic Representation
- Analyzing Graphs of Polynomials
- Angle Measure
- Angles
- Angles in Polygons
- Approximation and Estimation
- Area and Circumference of a Circle
- Area and Perimeter of Quadrilaterals
- Area of Triangles
- Argand Diagram
- Arithmetic Sequences
- Average Rate of Change
- Bijective Functions
- Binomial Expansion
- Binomial Theorem
- Chain Rule
- Circle Theorems
- Circles
- Circles Maths
- Combination of Functions
- Combinatorics
- Common Factors
- Common Multiples
- Completing the Square
- Completing the Squares
- Complex Numbers
- Composite Functions
- Composition of Functions
- Compound Interest
- Compound Units
- Conic Sections
- Construction and Loci
- Converting Metrics
- Convexity and Concavity
- Coordinate Geometry
- Coordinates in Four Quadrants
- Cubic Function Graph
- Cubic Polynomial Graphs
- Data transformations
- De Moivre's Theorem
- Deductive Reasoning
- Definite Integrals
- Deriving Equations
- Determinant of Inverse Matrix
- Determinants
- Differential Equations
- Differentiation
- Differentiation Rules
- Differentiation from First Principles
- Differentiation of Hyperbolic Functions
- Direct and Inverse proportions
- Disjoint and Overlapping Events
- Disproof by Counterexample
- Distance from a Point to a Line
- Divisibility Tests
- Double Angle and Half Angle Formulas
- Drawing Conclusions from Examples
- Ellipse
- Equation of Line in 3D
- Equation of a Perpendicular Bisector
- Equation of a circle
- Equations
- Equations and Identities
- Equations and Inequalities
- Estimation in Real Life
- Euclidean Algorithm
- Evaluating and Graphing Polynomials
- Even Functions
- Exponential Form of Complex Numbers
- Exponential Rules
- Exponentials and Logarithms
- Expression Math
- Expressions and Formulas
- Faces Edges and Vertices
- Factorials
- Factoring Polynomials
- Factoring Quadratic Equations
- Factorising expressions
- Factors
- Finding Maxima and Minima Using Derivatives
- Finding Rational Zeros
- Finding the Area
- Forms of Quadratic Functions
- Fractional Powers
- Fractional Ratio
- Fractions
- Fractions and Decimals
- Fractions and Factors
- Fractions in Expressions and Equations
- Fractions, Decimals and Percentages
- Function Basics
- Functional Analysis
- Functions
- Fundamental Counting Principle
- Fundamental Theorem of Algebra
- Generating Terms of a Sequence
- Geometric Sequence
- Gradient and Intercept
- Graphical Representation
- Graphing Rational Functions
- Graphing Trigonometric Functions
- Graphs
- Graphs and Differentiation
- Graphs of Common Functions
- Graphs of Exponents and Logarithms
- Graphs of Trigonometric Functions
- Greatest Common Divisor
- Growth and Decay
- Growth of Functions
- Highest Common Factor
- Hyperbolas
- Imaginary Unit and Polar Bijection
- Implicit differentiation
- Inductive Reasoning
- Inequalities Maths
- Infinite geometric series
- Injective functions
- Instantaneous Rate of Change
- Integers
- Integrating Polynomials
- Integrating Trigonometric Functions
- Integrating e^x and 1/x
- Integration
- Integration Using Partial Fractions
- Integration by Parts
- Integration by Substitution
- Integration of Hyperbolic Functions
- Interest
- Inverse Hyperbolic Functions
- Inverse Matrices
- Inverse and Joint Variation
- Inverse functions
- Iterative Methods
- L'Hopital's Rule
- Law of Cosines in Algebra
- Law of Sines in Algebra
- Laws of Logs
- Limits of Accuracy
- Linear Expressions
- Linear Systems
- Linear Transformations of Matrices
- Location of Roots
- Logarithm Base
- Logic
- Lower and Upper Bounds
- Lowest Common Denominator
- Lowest Common Multiple
- Math formula
- Matrices
- Matrix Addition and Subtraction
- Matrix Determinant
- Matrix Multiplication
- Metric and Imperial Units
- Misleading Graphs
- Mixed Expressions
- Modulus Functions
- Modulus and Phase
- Multiples of Pi
- Multiplication and Division of Fractions
- Multiplicative Relationship
- Multiplying and Dividing Rational Expressions
- Natural Logarithm
- Natural Numbers
- Notation
- Number
- Number Line
- Number Systems
- Numerical Methods
- Odd functions
- Open Sentences and Identities
- Operation with Complex Numbers
- Operations with Decimals
- Operations with Matrices
- Operations with Polynomials
- Order of Operations
- Parabola
- Parallel Lines
- Parametric Differentiation
- Parametric Equations
- Parametric Integration
- Partial Fractions
- Pascal's Triangle
- Percentage
- Percentage Increase and Decrease
- Percentage as fraction or decimals
- Perimeter of a Triangle
- Permutations and Combinations
- Perpendicular Lines
- Points Lines and Planes
- Polynomial Graphs
- Polynomials
- Powers Roots And Radicals
- Powers and Exponents
- Powers and Roots
- Prime Factorization
- Prime Numbers
- Problem-solving Models and Strategies
- Product Rule
- Proof
- Proof and Mathematical Induction
- Proof by Contradiction
- Proof by Deduction
- Proof by Exhaustion
- Proof by Induction
- Properties of Exponents
- Proportion
- Proving an Identity
- Pythagorean Identities
- Quadratic Equations
- Quadratic Function Graphs
- Quadratic Graphs
- Quadratic functions
- Quadrilaterals
- Quotient Rule
- Radians
- Radical Functions
- Rates of Change
- Ratio
- Ratio Fractions
- Rational Exponents
- Rational Expressions
- Rational Functions
- Rational Numbers and Fractions
- Ratios as Fractions
- Real Numbers
- Reciprocal Graphs
- Recurrence Relation
- Recursion and Special Sequences
- Remainder and Factor Theorems
- Representation of Complex Numbers
- Rewriting Formulas and Equations
- Roots of Complex Numbers
- Roots of Polynomials
- Roots of Unity
- Rounding
- SAS Theorem
- SSS Theorem
- Scalar Triple Product
- Scale Drawings and Maps
- Scale Factors
- Scientific Notation
- Second Order Recurrence Relation
- Sector of a Circle
- Segment of a Circle
- Sequences
- Sequences and Series
- Series Maths
- Sets Math
- Similar Triangles
- Similar and Congruent Shapes
- Simple Interest
- Simplifying Fractions
- Simplifying Radicals
- Simultaneous Equations
- Sine and Cosine Rules
- Small Angle Approximation
- Solving Linear Equations
- Solving Linear Systems
- Solving Quadratic Equations
- Solving Radical Inequalities
- Solving Rational Equations
- Solving Simultaneous Equations Using Matrices
- Solving Systems of Inequalities
- Solving Trigonometric Equations
- Solving and Graphing Quadratic Equations
- Solving and Graphing Quadratic Inequalities
- Special Products
- Standard Form
- Standard Integrals
- Standard Unit
- Straight Line Graphs
- Substraction and addition of fractions
- Sum and Difference of Angles Formulas
- Sum of Natural Numbers
- Surds
- Surjective functions
- Tables and Graphs
- Tangent of a Circle
- The Quadratic Formula and the Discriminant
- Transformations
- Transformations of Graphs
- Translations of Trigonometric Functions
- Triangle Rules
- Triangle trigonometry
- Trigonometric Functions
- Trigonometric Functions of General Angles
- Trigonometric Identities
- Trigonometric Ratios
- Trigonometry
- Turning Points
- Types of Functions
- Types of Numbers
- Types of Triangles
- Unit Circle
- Units
- Variables in Algebra
- Vectors
- Verifying Trigonometric Identities
- Writing Equations
- Writing Linear Equations
- Statistics
- Bias in Experiments
- Binomial Distribution
- Binomial Hypothesis Test
- Bivariate Data
- Box Plots
- Categorical Data
- Categorical Variables
- Central Limit Theorem
- Chi Square Test for Goodness of Fit
- Chi Square Test for Homogeneity
- Chi Square Test for Independence
- Chi-Square Distribution
- Combining Random Variables
- Comparing Data
- Comparing Two Means Hypothesis Testing
- Conditional Probability
- Conducting a Study
- Conducting a Survey
- Conducting an Experiment
- Confidence Interval for Population Mean
- Confidence Interval for Population Proportion
- Confidence Interval for Slope of Regression Line
- Confidence Interval for the Difference of Two Means
- Confidence Intervals
- Correlation Math
- Cumulative Distribution Function
- Cumulative Frequency
- Data Analysis
- Data Interpretation
- Degrees of Freedom
- Discrete Random Variable
- Distributions
- Dot Plot
- Empirical Rule
- Errors in Hypothesis Testing
- Estimator Bias
- Events (Probability)
- Frequency Polygons
- Generalization and Conclusions
- Geometric Distribution
- Histograms
- Hypothesis Test for Correlation
- Hypothesis Test for Regression Slope
- Hypothesis Test of Two Population Proportions
- Hypothesis Testing
- Inference for Distributions of Categorical Data
- Inferences in Statistics
- Large Data Set
- Least Squares Linear Regression
- Linear Interpolation
- Linear Regression
- Measures of Central Tendency
- Methods of Data Collection
- Normal Distribution
- Normal Distribution Hypothesis Test
- Normal Distribution Percentile
- Paired T-Test
- Point Estimation
- Probability
- Probability Calculations
- Probability Density Function
- Probability Distribution
- Probability Generating Function
- Quantitative Variables
- Quartiles
- Random Variables
- Randomized Block Design
- Residual Sum of Squares
- Residuals
- Sample Mean
- Sample Proportion
- Sampling
- Sampling Distribution
- Scatter Graphs
- Single Variable Data
- Skewness
- Spearman's Rank Correlation Coefficient
- Standard Deviation
- Standard Error
- Standard Normal Distribution
- Statistical Graphs
- Statistical Measures
- Stem and Leaf Graph
- Sum of Independent Random Variables
- Survey Bias
- T-distribution
- Transforming Random Variables
- Tree Diagram
- Two Categorical Variables
- Two Quantitative Variables
- Type I Error
- Type II Error
- Types of Data in Statistics
- Variance for Binomial Distribution
- Venn Diagrams

Lerne mit deinen Freunden und bleibe auf dem richtigen Kurs mit deinen persönlichen Lernstatistiken

Jetzt kostenlos anmeldenNie wieder prokastinieren mit unseren Lernerinnerungen.

While on a train, our journey was suddenly halted. A murder had been committed on board, and a detective was brought in to investigate. During the investigation, the detective first considered the boarding class of the culprit and took a census of all the passengers in first class and economy class. Afterwards, we were grouped by the hand we write with into left-handed, right-handed, and ambidextrous.

The detective had just considered two categorical variables, boarding class and handwriting, but was he able to solve the crime? In this article, you will learn about the correlation, graphs, tests, and more for two categorical variables. You can be a detective too!

Earlier in the crime story, it was mentioned that the detective had approached the case by considering two categorical variables. What is a categorical variable?

A **categorical variable**, also known as a **qualitative variable**, is a variable whose values are categories that are described rather than measured.

If the properties of a variable can be measured or counted, it is known as a **quantitative variable**. This article does not focus on these variables.

Definitions are always better understood with examples!

You get thirsty while on the train, so you go and get a can of soda. To be more specific, you get a \(12\) oz lime-flavored soda, which comes in a green can and has \(40\) calories.

In this example, the **categorical variables** are those that you can describe, such as the flavor and the color of the can. The amount of liquid in the can and the calorie count are both measurable, so they are **quantitative variables.**

And what does the detective mean when talking about two categorical variables?

When talking about **two categorical variables**, you are classifying each subject by two separate categorical variables at once, so you study the combinations of categories that the two variables produce.

Let's go back to the investigation. The detective considered two categorical variables: boarding class and handwriting. Boarding class has two categories and handwriting has three, so these two variables produce \(2 \cdot 3 = 6\) possible combinations:

- First-class right-handed
- Economy class right-handed
- First-class left-handed
- Economy class left-handed
- First-class ambidextrous
- Economy class ambidextrous
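One way to check this count is to let a computer pair every category of one variable with every category of the other. Below is a minimal Python sketch using the standard-library function `itertools.product`. The variable names are illustrative, and the labels come from the story.

```python
from itertools import product

# Categories of each variable, as used in the detective's investigation
boarding_classes = ["First class", "Economy class"]
handwriting = ["Right-handed", "Left-handed", "Ambidextrous"]

# Every pairing of one boarding class with one handwriting category
combinations = list(product(boarding_classes, handwriting))

for boarding, hand in combinations:
    print(f"{boarding}, {hand}")

print(len(combinations))  # 2 * 3 = 6
```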

A two-way table, or contingency table, is a table that organizes the observations according to two categorical variables. Each cell in a contingency table represents a combination of two factors, and the frequency of the subjects that fall within those categories is written in that cell.

The detective used a contingency table to classify the passengers on the train by boarding class and handwriting.

| Handwriting / Boarding class | First class | Economy class |
|---|---|---|
| Right | 30 | 35 |
| Left | 13 | 11 |
| Ambidextrous | 4 | 7 |

For example, by looking at the table, you can tell that \(30\) of the first-class passengers are right-handed. You can find the rest of the frequencies of the other combinations of factors by looking at the respective cell.

The frequencies of a contingency table show how many subjects fall within each **combination** of the two categorical variables.

Typically, contingency tables also include an extra row at the bottom and an extra column to the right to count totals.

| Handwriting / Boarding class | First class | Economy class | Total |
|---|---|---|---|
| Right | 30 | 35 | 65 |
| Left | 13 | 11 | 24 |
| Ambidextrous | 4 | 7 | 11 |
| Total | 47 | 53 | 100 |

For example, there are \(65\) right-handed passengers, and there are \(53\) passengers in economy class. The bottom-right corner shows that there are \(100\) passengers in total.
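The totals follow a simple rule: each row total sums across its row, each column total sums down its column, and the grand total sums every cell. Below is a minimal Python sketch of that rule. It assumes the table is stored as a nested dictionary, which is one convenient choice for illustration, not a required format.

```python
# Observed frequencies from the detective's contingency table:
# table[handwriting][boarding_class] = number of passengers
table = {
    "Right": {"First": 30, "Economy": 35},
    "Left": {"First": 13, "Economy": 11},
    "Ambidextrous": {"First": 4, "Economy": 7},
}
columns = ["First", "Economy"]

# Row totals (extra column on the right): sum across each row
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals (extra row at the bottom): sum down each column
col_totals = {col: sum(cells[col] for cells in table.values()) for col in columns}

# Grand total (bottom-right corner)
grand_total = sum(row_totals.values())

print(row_totals)   # {'Right': 65, 'Left': 24, 'Ambidextrous': 11}
print(col_totals)   # {'First': 47, 'Economy': 53}
print(grand_total)  # 100
```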

Sometimes, rather than the actual numbers, you just need to know which fraction of the subjects fall within each category. This fraction, or ratio, is known as **relative frequency.**

The **relative frequency** of an observation is its frequency divided by the total number of observations.

Suppose you want to know what fraction of all the suspects are left-handed first-class passengers. The relative frequency of left-handed first-class passengers is:

\[\frac{13}{100} \]

or written as a percentage:

\[ \frac{13}{100} \cdot 100 \% = 13\%\]

Two other kinds of relative frequency you will often use are the **marginal relative frequency** and the **conditional relative frequency**.

Contingency tables typically show totals in the rightmost column and in the bottom row. These totals are known as **marginal frequencies.**

The marginal frequency is the number of subjects that fall within each individual category. The **marginal distribution** consists of all the marginal frequencies of the table.

In the train scenario, the marginal distribution tells you the number of first-class, economy class, right-handed, left-handed, and ambidextrous passengers.

The marginal distribution receives its name from the fact that the totals are shown on the **margins** of the table.

The marginal frequencies of a contingency table show how many subjects fall within each category of a single categorical variable, regardless of the other variable.

If you know how to find marginal frequencies and relative frequencies, then you also know how to find marginal relative frequencies! Whenever you divide a marginal frequency by the total number of subjects, you are finding a marginal relative frequency.

Suppose you want the marginal relative frequency of economy class passengers among all the suspects in the table. The marginal frequency of economy class passengers is \(53\) and the total frequency is \(100\), so the marginal relative frequency of economy class passengers is:

\[\frac{53}{100}\]

or written as a percentage:

\[ \frac{53}{100} \cdot 100 \% = 53\%\]

You can also apply this reasoning to find more frequencies. Try finding the marginal frequency of left-handed people, or the marginal relative frequency of first-class passengers.
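To check your answers to the exercise above, here is a minimal Python sketch that turns every marginal frequency into a marginal relative frequency. The totals are copied from the table, and the helper name `as_percent` is purely illustrative.

```python
# Marginal frequencies (the totals on the margins of the table)
row_totals = {"Right": 65, "Left": 24, "Ambidextrous": 11}
col_totals = {"First": 47, "Economy": 53}
grand_total = 100


def as_percent(count, total):
    """Relative frequency of `count` out of `total`, as a percentage."""
    return 100 * count / total


# Marginal relative frequencies: each marginal frequency divided by the grand total
row_marginal = {k: as_percent(v, grand_total) for k, v in row_totals.items()}
col_marginal = {k: as_percent(v, grand_total) for k, v in col_totals.items()}

print(row_marginal)  # {'Right': 65.0, 'Left': 24.0, 'Ambidextrous': 11.0}
print(col_marginal)  # {'First': 47.0, 'Economy': 53.0}
```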

Using the same table, if you focus on a particular row, you are working with a single handwriting category. Likewise, if you focus on a particular column, you are working with a single boarding class.

In this case, you are placing a **condition** on the values that you are reading from the table.

The **conditional frequency** is the number of subjects that fall within a category of one variable, given that the category of the other variable has already been specified.

The conditional frequency makes more sense when talking about **conditional relative frequency**.

A **conditional relative frequency** is a **conditional frequency** divided by the **marginal frequency** of the specified category.

Typically, the word “given” is used to emphasize that you are dealing with a conditional frequency.

As usual, this idea is better understood with an example.

Using the information in the table, determine the conditional relative frequency that a suspect is left-handed, given that they are in economy class.

**Solution:**

Here is the table again, so you do not have to scroll back up.

| Handwriting / Boarding class | First class | Economy class | Total |
|---|---|---|---|
| Right | 30 | 35 | 65 |
| Left | 13 | 11 | 24 |
| Ambidextrous | 4 | 7 | 11 |
| Total | 47 | 53 | 100 |

Table 1. Categories of people and hand dexterity.

You are asked to find a conditional relative frequency **given** that the passenger is in economy class. This means that you focus on the column that contains the frequencies of the economy class passengers.

Since you are asked for the conditional relative frequency that a subject is left-handed, you now look at the row with the left-handed passengers. There are \(11\) left-handed passengers in economy class.

Finally, divide this frequency by the marginal frequency of economy class passengers. The number at the bottom of the economy class column tells you that there are \(53\) passengers in economy class, so the conditional relative frequency that a suspect is left-handed, given that they are in economy class, is:

\[ \frac{11}{53}\]

which, with the help of a calculator, you can write as a percentage:

\[ \frac{11}{53} \cdot 100 \% \approx 20.75 \%\]
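The key point in this example is the denominator: because of the condition, you divide by the economy class total (\(53\)) rather than by the grand total (\(100\)). Below is a minimal Python sketch of that calculation, with variable names chosen only for illustration.

```python
# Economy class column of the contingency table
economy = {"Right": 35, "Left": 11, "Ambidextrous": 7}

# The condition restricts attention to economy class passengers, so the
# denominator is that column's marginal frequency, not the grand total of 100
economy_total = sum(economy.values())  # 53

# Conditional relative frequency of "left-handed given economy class"
p_left_given_economy = economy["Left"] / economy_total
print(f"{p_left_given_economy:.2%}")  # 20.75%
```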

Relative frequencies let you re-express a contingency table as percentages, either of the grand total or of a chosen condition on one of the variables. Once the table is expressed in percentages, you can draw it as a pie chart or a bar chart.

The two-way table is the starting point for visualizing two categorical variables. The following example is a quick illustration of how to graph two categorical variables.

Using the table from the previous examples, draw a pie chart that represents all of the data.

**Solution:**

In this case, both categorical variables must be represented in one pie chart, with one slice for each combination of categories. This means that the contingency table first has to be re-expressed in percentages. Below is the table of the data given:

| Handwriting / Boarding class | First class | Economy class | Total |
|---|---|---|---|
| Right | 30 | 35 | 65 |
| Left | 13 | 11 | 24 |
| Ambidextrous | 4 | 7 | 11 |
| Total | 47 | 53 | 100 |

Table 2. Categories of people and hand dexterity.

Each cell has to be expressed as a percentage of the grand total, which is \(100\). For example, the relative frequency of left-handed first-class passengers is calculated as:

\[\frac{13}{100}\cdot 100\%=13\%\]

By repeating this process with all the frequencies, you obtain the following table.

| Handwriting / Boarding class | First class | Economy class | Total |
|---|---|---|---|
| Right | 30% | 35% | 65% |
| Left | 13% | 11% | 24% |
| Ambidextrous | 4% | 7% | 11% |
| Total | 47% | 53% | 100% |

Table 3. Categories of people and hand dexterity.

A pie chart with one slice for each of the six combinations in this table, sized by these percentages, is a graphical representation of the data collated by the detective and an example of a graph of two categorical variables.
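Converting every cell to a percentage of the grand total is the same division repeated six times, so it is easy to automate. Below is a minimal Python sketch. It assumes each cell is stored under a (handwriting, boarding class) key, and the printed labels are only there for readability.

```python
# Observed frequencies: (handwriting, boarding class) -> count
counts = {
    ("Right", "First"): 30, ("Right", "Economy"): 35,
    ("Left", "First"): 13, ("Left", "Economy"): 11,
    ("Ambidextrous", "First"): 4, ("Ambidextrous", "Economy"): 7,
}
grand_total = sum(counts.values())  # 100

# Joint relative frequency of each cell: cell count / grand total, as a percentage
percentages = {cell: 100 * n / grand_total for cell, n in counts.items()}

# One line per pie-chart slice
for (hand, boarding), pct in percentages.items():
    print(f"{boarding} class, {hand}: {pct:.0f}%")
```

Because each slice is a share of the same grand total, the six percentages always add up to \(100\%\), which is what makes a single pie chart possible.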

It is also common to draw charts using conditional relative frequencies.

The detective decides to focus his attention on the first-class passengers. Draw a pie chart of the conditional relative frequencies of the passengers given that they are first-class passengers.

**Solution:**

Since you are asked to draw a pie chart for the passengers who meet the condition of being in first class, focus on the first-class column of the table.

| First class passengers | Frequency |
|---|---|
| Right-handed | 30 |
| Left-handed | 13 |
| Ambidextrous | 4 |
| Total | 47 |

Table 4. Categories of people and hand dexterity.

As usual, to draw a pie chart you need to find the relative frequencies, which in this case will be conditional relative frequencies. For right-handed first-class passengers, this is:

\[ \frac{30}{47} \cdot 100 \% \approx 63.8\%\]

Then find the rest of the conditional relative frequencies in the same way to obtain the following table.

| First class passengers | Conditional relative frequency |
|---|---|
| Right-handed | 63.8% |
| Left-handed | 27.7% |
| Ambidextrous | 8.5% |
| Total | 100% |

Table 5. Categories of people and hand dexterity.
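The same calculation, restricted to one column, gives the conditional relative frequencies above. Below is a minimal Python sketch for the first-class column. Note that the denominator is the column total of \(47\), not the grand total.

```python
# First-class column of the contingency table
first_class = {"Right-handed": 30, "Left-handed": 13, "Ambidextrous": 4}
total = sum(first_class.values())  # marginal frequency of first class: 47

# Conditional relative frequency of each group, given first class
conditional = {group: 100 * n / total for group, n in first_class.items()}

for group, pct in conditional.items():
    print(f"{group}: {pct:.1f}%")
```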

The resulting pie chart has three slices, one for each handwriting category, sized according to these percentages.

Keep in mind that you can also use other types of graphs to study two categorical variables, such as bar graphs or stacked bar charts.

There are measures of association between two categorical variables, such as the tetrachoric correlation, the polychoric correlation, and Cramér's V. However, these methods are not covered in AP Statistics, so they are outside the scope of this article.

At the AP level, correlation refers to the correlation between quantitative variables. For more information about this, please take a look at our article about Two Quantitative Variables.

To check for an association between the variables in a contingency table, the chi-square (\(\chi^2\)) test is used. The test compares two hypotheses. The null hypothesis, denoted \(H_0\), states that there is no association between the two variables, which means that the variables are independent. The alternative hypothesis, denoted \(H_a\), states that there is an association between the two variables, which means that they are dependent.

For more information about the chi-square test and how to perform it, please see our Chi-Square Tests article.
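This is only a rough illustration, since the full procedure, including its conditions and how to state conclusions, is covered in the Chi-Square Tests article. The minimal Python sketch below computes the chi-square statistic for the detective's table using the standard formulas: expected count = row total × column total ÷ grand total, and \(\chi^2 = \sum (O - E)^2 / E\). The p-value line relies on the fact that a chi-square distribution with \(2\) degrees of freedom has \(P(\chi^2 \ge x) = e^{-x/2}\). That shortcut only works for this table's \(2\) degrees of freedom. In general, you would use a table, a calculator, or a library routine such as SciPy's `scipy.stats.chi2_contingency`.

```python
import math

# Observed counts: rows = handwriting (Right, Left, Ambidextrous),
# columns = boarding class (First, Economy)
observed = [
    [30, 35],
    [13, 11],
    [4, 7],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under H0 (independence): row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut valid ONLY when dof == 2: P(chi-square >= x) = exp(-x / 2)
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.3f}, df = {dof}, p = {p_value:.3f}")
```

For these counts, the statistic is roughly \(1.01\) with \(2\) degrees of freedom, which gives a p-value of about \(0.60\).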

Besides studying the relationships within the data you have collected, statistics can also be used to predict outcomes. Given a large enough data set, you can start making predictions based on the data you previously gathered. This is the main idea behind **regression analysis.**

**Regression analysis** is a collection of techniques used in statistics to find a mathematical model that can describe the relationship between two (or more) variables.

Regression analysis is usually done on quantitative variables because it works with the numerical values of those variables. However, in some scenarios it is possible to assign numerical values to a categorical variable, so that regression techniques become available. The following example might sound familiar.

For administrative purposes, restaurants often rely on surveys to evaluate a customer's satisfaction. This satisfaction can be seen as a categorical variable, which will typically be described using words like:

- Terrible
- Bad
- Regular
- Good
- Excellent

However, you cannot do arithmetic with these words. One way of dealing with this is to assign a numerical value to each of the possible categories, which is why the following phrase might sound familiar to you:

“On a scale from \(1\) to \(5\), where \(1\) is terrible and \(5\) is excellent, how would you rate the service?”

This way, you can assign a numerical value to each of the words that you would have used:

- Terrible: \(1\)
- Bad: \(2\)
- Regular: \(3\)
- Good: \(4\)
- Excellent: \(5\)

If you feel like the service was *almost* excellent, this method also allows you to give intermediate values, like \(4.8\).
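Once the words are mapped to numbers, they can be used in ordinary calculations. Below is a minimal Python sketch of the 1-to-5 mapping. The dictionary name `RATING_SCORES`, the helper `encode`, and the five survey answers are hypothetical and were made up only to show the idea.

```python
# Numeric score for each rating word, following the 1-to-5 scale
# (1 = terrible, 5 = excellent)
RATING_SCORES = {
    "Terrible": 1,
    "Bad": 2,
    "Regular": 3,
    "Good": 4,
    "Excellent": 5,
}


def encode(responses):
    """Convert a list of rating words into their numeric scores."""
    return [RATING_SCORES[word] for word in responses]


# Hypothetical survey answers, invented purely for illustration
responses = ["Good", "Excellent", "Regular", "Good", "Bad"]
scores = encode(responses)

print(scores)                     # [4, 5, 3, 4, 2]
print(sum(scores) / len(scores))  # 3.6
```

This kind of mapping only makes sense because the ratings have a natural order from terrible to excellent. Assigning numbers to an unordered variable, such as the soda flavor from earlier, would produce averages with no meaning.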

Once you have assigned numerical values to a categorical variable, you can start doing regression with it. Just make sure that the assigned values are reasonable. For more information about regression analysis, please take a look at our Linear Regression article.

Try as many examples as possible to build your competence with tasks involving two categorical variables.

The detective now decides to focus on investigating right-handed people. Draw a pie chart of the conditional relative frequencies of the passengers given that they are right-handed.

**Solution:**

Since you are asked to draw a pie chart for the passengers that meet the condition of being right-handed, you should focus on the corresponding row of the table.

| | First class | Economy class | Total |
|---|---|---|---|
| Right-handed people | 30 | 35 | 65 |

Table 6. Categories of people and hand dexterity for right-handed people.

Relative frequencies are always required for drawing pie charts, so find them using the usual method. This will result in the following table.

| | First class | Economy class | Total |
|---|---|---|---|
| Right-handed people | 46.2% | 53.8% | 100% |

Table 7. Categories of people and hand dexterity for right-handed people.
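Conditioning on a row works exactly like conditioning on a column, except that the denominator is now the row total (here \(65\)). Below is a minimal Python sketch with a small illustrative helper, `given_row`, which is not a standard library function.

```python
# Contingency table: table[handwriting][boarding_class] = count
table = {
    "Right": {"First class": 30, "Economy class": 35},
    "Left": {"First class": 13, "Economy class": 11},
    "Ambidextrous": {"First class": 4, "Economy class": 7},
}


def given_row(table, row):
    """Conditional relative frequencies (in %) across one row.

    The condition fixes the row, so each cell is divided by that row's
    marginal total rather than by the grand total.
    """
    cells = table[row]
    row_total = sum(cells.values())
    return {col: 100 * n / row_total for col, n in cells.items()}


for boarding, pct in given_row(table, "Right").items():
    print(f"{boarding}: {pct:.1f}%")
```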

Using this table, you can draw the pie chart of these conditional relative frequencies.

Remember the crime scene? During his investigation, the detective confirmed that the crime had been committed by an ambidextrous person in first class who also had the flu. By placing an additional condition, the detective narrowed down the search! However, the only ambidextrous passenger with the flu on the train was... ME. Did I commit murder? Read on past the following example to find out.

A survey was carried out to determine the relationship between people's sociability and the size of their family. Each individual was asked whether they considered themselves sociable or not, and whether they came from a family of four or fewer members or from a family of more than four members. The results are shown below.

| Categories | Sociable | Not sociable |
| --- | --- | --- |
| Family size of four and below | \(40\) | \(50\) |
| Family size of above four | \(90\) | \(20\) |

Table 8. Categories of people and their sociability.

a. Find the relative frequency of individuals who come from a family size above four, relative to the total number of individuals sampled.

b. Determine the conditional relative frequency that an individual is from a family of four and below, given that they are not sociable.

c. Make a graph of the information given.

**Solution:**

Since the marginal frequencies are not provided, begin by finding them and adding them to the contingency table. To do so, add the values in each row and in each column.

| Categories | Sociable | Not sociable | Total |
| --- | --- | --- | --- |
| Family size of four and below | \(40\) | \(50\) | \(90\) |
| Family size of above four | \(90\) | \(20\) | \(110\) |
| Total | \(130\) | \(70\) | \(200\) |

Table 9. Categories of people and their sociability.
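The row and column totals can also be computed programmatically. This Python sketch stores Table 8 as a nested dictionary (a representation chosen here for illustration, not one the article prescribes) and sums each row and each column with a hypothetical helper, `marginal_totals`.

```python
# Table 8 as a nested dictionary: rows are family sizes, columns are sociability
table = {
    "Four and below": {"Sociable": 40, "Not sociable": 50},
    "Above four": {"Sociable": 90, "Not sociable": 20},
}


def marginal_totals(table):
    """Return the row totals, the column totals and the grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    column_totals = {}
    for cells in table.values():
        for column, count in cells.items():
            column_totals[column] = column_totals.get(column, 0) + count
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


rows, columns, grand = marginal_totals(table)
print(rows)     # {'Four and below': 90, 'Above four': 110}
print(columns)  # {'Sociable': 130, 'Not sociable': 70}
print(grand)    # 200
```

The grand total is computed from the row totals; summing the column totals instead gives the same \(200\), which is a useful check on the arithmetic.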

Now, you can answer the questions.

a. This is the marginal frequency of individuals who come from families of more than four, divided by the total number of individuals sampled:

\[\frac{110}{200}=\frac{11}{20}\]

or written as a percentage

\[ \frac{11}{20} \cdot 100 \% = 55 \%\]
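The same division can be checked with Python's standard `fractions` module, which keeps the result as an exact, simplified fraction. The variable names here are illustrative only.

```python
from fractions import Fraction

# Marginal frequency and grand total from Table 9
family_above_four = 110
total_sampled = 200

relative_frequency = Fraction(family_above_four, total_sampled)
print(relative_frequency)        # 11/20
print(relative_frequency * 100)  # 55
```

Using `Fraction` rather than floating-point division avoids rounding noise when the result is converted to a percentage.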

b. You are asked to find the conditional relative frequency that an individual is from a family of four and below, **given** that they are not sociable. The **condition** placed here is that the individual is **not sociable**, so you will focus on the corresponding column.

Next, go to the row for a family size of four and below to find that \(50\) of the not sociable individuals come from such families. To find the conditional relative frequency, divide this number by the total number of not sociable individuals, \(70\), so:

\[\frac{50}{70}=\frac{5}{7}\]

or written as a percentage

\[ \frac{5}{7} \cdot 100 \% \approx 71.4\%\]
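A small sketch of this calculation, again using `fractions.Fraction`. The hypothetical helper `conditional_relative_frequency` takes the column selected by the condition (here, not sociable) and divides the chosen cell by that column's marginal total.

```python
from fractions import Fraction

# The "Not sociable" column from Table 9
not_sociable = {"Four and below": 50, "Above four": 20}


def conditional_relative_frequency(column, category):
    """Frequency of `category` divided by the column's marginal total."""
    return Fraction(column[category], sum(column.values()))


result = conditional_relative_frequency(not_sociable, "Four and below")
print(result)                         # 5/7
print(round(float(result) * 100, 1))  # 71.4
```

The denominator is the marginal total of the condition (\(70\)), not the grand total (\(200\)); that is the difference between a conditional relative frequency and an ordinary relative frequency.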

c. A bar graph can be drawn to give a visual interpretation of the data. Before drawing it, it helps to have a separate table with each frequency expressed as a percentage of the total number of individuals sampled, \(200\), as shown below:

| Categories | Sociable | Not sociable | Total |
| --- | --- | --- | --- |
| Family size of four and below | \(20\%\) | \(25\%\) | \(45\%\) |
| Family size of above four | \(45\%\) | \(10\%\) | \(55\%\) |
| Total | \(65\%\) | \(35\%\) | \(100\%\) |

Table 10. Categories of people and their sociability.
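Table 10 is obtained by dividing every cell of Table 9, including the totals, by the grand total of \(200\). A Python sketch of that conversion, storing Table 9 as a nested dictionary (an assumed representation, chosen for illustration):

```python
# Table 9 frequencies, including the marginal totals
table = {
    "Four and below": {"Sociable": 40, "Not sociable": 50, "Total": 90},
    "Above four": {"Sociable": 90, "Not sociable": 20, "Total": 110},
    "Total": {"Sociable": 130, "Not sociable": 70, "Total": 200},
}
grand_total = table["Total"]["Total"]

# Express every cell as a percentage of the grand total
percentages = {
    row: {column: count * 100 / grand_total for column, count in cells.items()}
    for row, cells in table.items()
}

for row, cells in percentages.items():
    print(row, cells)
```

Multiplying by \(100\) before dividing keeps these particular values exact, since each frequency times \(100\) is a multiple of \(200\).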

The bar graph below is a representation of the data given:

After the detective revealed that I was the culprit, I woke up from my weary dream. Nonetheless, everything you have learned here is based on statistical principles and will prove very useful when you attempt more tasks. See how statistics can be useful even when solving crimes?

- Two categorical variables are data representations arranged by considering two factors or groups, which are otherwise termed categories.
- When studying two categorical variables, the data are typically arranged in **contingency tables**, which are also known as **two-way tables**.
- Each value in a contingency table represents the frequency of the individuals that fall under each **combination** of the two categorical variables.
- Contingency tables typically also include totals in their margins. These totals are known as **marginal frequencies**.
- The marginal frequencies of a contingency table show how many subjects fall within each category of each variable.
- Relative frequency is the fraction of an event out of the total frequency in a statistical experiment.
- A conditional relative frequency is obtained by dividing one of the frequencies in the table by the marginal frequency of the category that is being used as the condition.
- The data from studying two categorical variables can be displayed using the charts typically used for categorical data, such as bar charts and pie charts.
