
Chi Square Test for Goodness of Fit

So, you've familiarized yourself with the concept of Chi-square distributions and been introduced to the concept of Chi-square tests. Well, now you've come to the good bit. Now it's time to learn how to actually apply these handy little concepts to perform actual statistical testing on sets of data. The first chi-square test that can be performed is the Chi-square test for goodness of fit. In this explanation, you'll learn how you can use this cool little test to check if a distribution actually occurs as projected in reality, or if the distribution, in reality, differs from the projection in a statistically meaningful way.

If you don't feel totally comfortable with the idea of a Chi-Square Distribution or the basic concept of Chi-Square Tests, don't sweat it, there are StudySmarter explanations for both!

No point waiting around then, let's dive into it!

Chi-Square Test for Goodness of Fit Definition

What is the Chi-square test for goodness of fit then? Well...

The Chi-square test for goodness of fit is a statistical hypothesis test used to determine whether an expected distribution of outcomes is significantly different from the actual observed distribution of outcomes.

This is a lot of talk of outcomes and distributions and all sorts of statistics talk, but what does it all mean?

Well, imagine if you rolled a \(6\)-sided die \(100\) times. You would expect it to land on each of the sides roughly an equal number of times.

If you actually carried this out and recorded the results, you could then use the Chi-squared goodness of fit test to check if the real-life data matched your expectation, within reasonable limits of course.
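To get a feel for this, the experiment can be simulated. The sketch below is an illustration only: it assumes a perfectly fair die, uses Python's standard `random` module, and the function name `roll_die` and the fixed seed are choices made for this example rather than anything from the explanation above.

```python
import random
from collections import Counter


def roll_die(n_rolls, seed=None):
    """Simulate n_rolls of a fair six-sided die and count each face."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    counts = Counter(rolls)
    # Report every face, including any that never came up.
    return {face: counts.get(face, 0) for face in range(1, 7)}


observed = roll_die(100, seed=1)
expected_per_face = 100 / 6  # about 16.7 rolls per face

for face, count in observed.items():
    print(f"face {face}: observed {count}, expected {expected_per_face:.1f}")
```

The observed counts will rarely match the expected count exactly, which is the whole reason a test is needed to judge whether the gap is "within reasonable limits".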

Useful right? Ok so now you're hopefully familiar with the what and the why of Chi-square tests for goodness of fit, now let's get into the good stuff. The how.

Chi-Square Test for Goodness of Fit Hypotheses

The Chi-square test for goodness of fit is a hypothesis test. This means that of course, it must start with a set of hypotheses.

Now, to conduct a hypothesis test like this, you need a null hypothesis and an alternative hypothesis.

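A null hypothesis is the default claim that the test assumes to be true. For a goodness of fit test, it states that the population follows the expected distribution, so any difference between the observed and expected results is down to random chance. For instance: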

\(H_0:\) A flipped coin will land on heads \(50\%\) of the time.

If the test gives enough evidence to reject the null hypothesis, what is supported instead? The alternative hypothesis:

\(H_a:\) A flipped coin will not land on heads \(50\%\) of the time.

How does the Chi-square test for goodness of fit decide between the two? Well, it measures how likely a sample at least as far from expectation as yours would be if the null hypothesis were true. If that probability is low enough, the null hypothesis is rejected in favor of the alternative hypothesis. If it is not, the test has failed to reject the null hypothesis, which is not the same as proving it true.

For instance, say your sample was \(100\) coin flips and you got the following result.

| Heads | Tails |
|---|---|
| \(99\) | \(1\) |

Table 1. Heads vs tails test.

If there was a \(50\%\) chance of flipping heads with each flip, as the null hypothesis states, then how likely would this result be? It makes sense intuitively that the probability is so low that it's bordering on impossible.
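That intuition can be checked directly with the binomial distribution. This is a minimal sketch, assuming the flips are independent and the coin is fair; the helper name `binomial_probability` is made up for this example.

```python
from math import comb


def binomial_probability(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Chance of exactly 99 heads in 100 flips of a fair coin.
p_99_heads = binomial_probability(99, 100, 0.5)

# Chance of a result at least this lopsided towards heads (99 or 100 heads).
p_at_least_99 = sum(binomial_probability(k, 100, 0.5) for k in (99, 100))

print(f"P(exactly 99 heads) = {p_99_heads:.3e}")
print(f"P(99 or more heads) = {p_at_least_99:.3e}")
```

Both probabilities come out vanishingly small, which backs up the sense that a fair coin would essentially never produce this table.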

What about if you obtained these results?

| Heads | Tails |
|---|---|
| \(58\) | \(42\) |

Table 2. Heads vs tails test.

Well, this is a bit closer so it's hard to say by eye, but the Chi-square test for goodness of fit can determine whether this result gives enough evidence to reject the null hypothesis.

Chi-Square Test for Goodness of Fit Assumptions and Conditions

The Chi-square test for goodness of fit is not appropriate to be used on all data. In fact, there are four conditions (sometimes referred to as assumptions) that must hold true.

  • The sampling method is simple random sampling.

  • The variable under study is categorical.

  • The expected value of observations for each category must be at least five.

  • Each outcome in the variable under study must be independent.

Let's take a look at each of these conditions a little more closely.

Random Sampling

For the Chi-square test for goodness of fit, the sample being analyzed must have constituents that have been chosen at random.

Say you wished to try and predict the frequency at which different types of candy appear in a mixed bag. Well, if you wished to see if your prediction was accurate, you could potentially use a Chi-square test for goodness of fit only if the bags you take to check this are chosen completely at random.

Categorical Variable

What is a categorical variable? Well, let's take the example of the mixed bags of candy from before. Each of the candies in the bag can be categorized by what type of sweet it is, so every observation falls into exactly one named group. There is no inherent ordering to these categories, which makes the variable nominal, the simplest kind of categorical variable. If, for instance, your data categories were school years, the groups would be ordered from low to high, making the variable ordinal. Ordinal groups can still be counted and tested, but the test ignores their ordering.

What the Chi-square test for goodness of fit cannot handle directly is a variable that is measured rather than sorted into groups, such as the weight of each candy. A variable like that would first have to be split into groups before the test could be used.

Expected Value

The next condition for a Chi-square test for goodness of fit is an expected count of at least five observations in every category. This one is nice and simple. Basically, this test can only be used on large enough sample sizes. Your hypothesis might be that each type of sweet is equally common across the bags. If your sample includes \(200\) candies and five types of candy, then the expected number of each sweet found in the sample would be \(\frac{200}{5} = 40\). This is above five, and therefore meets this condition for the test.
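Checking this condition is just arithmetic, so it is easy to sketch in code. The snippet below is an illustration under the equal-proportions hypothesis described above; the function names `expected_counts` and `meets_expected_count_condition` are invented for this example.

```python
def expected_counts(sample_size, proportions):
    """Expected count per category under the hypothesized proportions."""
    return {name: sample_size * p for name, p in proportions.items()}


def meets_expected_count_condition(expected, minimum=5):
    """True when every category has an expected count of at least `minimum`."""
    return all(count >= minimum for count in expected.values())


candy_types = ["Cola Bottle", "Flying Saucer", "Gummy Bear", "Fruit Lace", "Toffee"]
# Hypothesis: every type is equally common, so each has proportion 1/5.
proportions = {name: 1 / len(candy_types) for name in candy_types}

expected = expected_counts(200, proportions)
print(expected)
print(meets_expected_count_condition(expected))
```

With only \(20\) candies in the sample, the same hypothesis would give an expected count of \(4\) per type, and the condition would fail.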

Outcome Independence

The final condition for the Chi-square test for goodness of fit is outcome independence. All this means is that the probability of each outcome is not affected by the outcomes that came before. For instance, when it comes to the bag of candies, each time a sweet is picked from a bag there is a \(\frac{1}{5}\) chance that it is a cola bottle. This is true no matter how many cola bottles have been picked before, or how many gummy bears. The previous outcomes have no effect on this one, so the outcomes are independent and the condition is met.

Formula for Chi-Square Goodness of Fit Test Statistic

Once the hypotheses have been formulated and the conditions confirmed to have been met, it's time to calculate the Chi-square test statistic. This is done with this simple formula

\[\chi^2 = \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i}\]

Where \(O_i\) is the observed count in the \(i^{th}\) category, \(E_i\) is the expected count in the \(i^{th}\) category, and \(n\) is the number of categories.

For example, with the following expected and observed values, the calculation would be carried out as follows.

| | Cola Bottle | Flying Saucer | Gummy Bear | Fruit Lace | Toffee |
|---|---|---|---|---|---|
| Expected | \(40\) | \(40\) | \(40\) | \(40\) | \(40\) |
| Observed | \(20\) | \(25\) | \(15\) | \(18\) | \(22\) |

Table 3. Expected and observed values, Chi-square test.

\[\begin{align} \chi^2 &= \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i} \\\\ &= \frac{(20-40)^2}{40} + \frac{(25-40)^2}{40} + \frac{(15-40)^2}{40} + \frac{(18-40)^2}{40} + \frac{(22-40)^2}{40} \\\\ &= 10 + 5.625 + 15.625 + 12.1 + 8.1 \\\\ &= 51.45 \end{align} \]
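The same sum is easy to compute in code. Here is a minimal sketch in plain Python using the values from Table 3; the function name `chi_square_statistic` is an assumption for this example and not part of any library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Values from Table 3.
expected = [40, 40, 40, 40, 40]
observed = [20, 25, 15, 18, 22]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # 51.45
```

Because each difference is squared, categories that come in below expectation add to the statistic just as much as categories that come in above it.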

Performing the Test for Goodness of Fit

Firstly, you will need to know the significance level, \(\alpha\). The significance level sets how strong the evidence must be before you reject the null hypothesis. It is the probability you are willing to accept of rejecting a null hypothesis that is actually true. Often the significance level will be set at \(5\%\) (\(\alpha=0.05\)). A lower significance level means that a greater strength of evidence is required.

Secondly, you will need to know the number of degrees of freedom of the problem. This is the number of group counts that are free to vary once the total sample size is fixed, which is just the number of groups \(-1\). For example, for a variable with five groups, the number of degrees of freedom is four.

The next step in the test is to find either the Chi-square value (often called the critical value) or the \(p-\)value. Either of these values can be used to complete the test.

Performing the Test With the Chi-Square Value

From the Chi-square table, you can find the Chi-square value for your test for the significance level and degrees of freedom of your specific problem. Below is a small segment of the table.

| Degrees of Freedom | \(\alpha = 0.2\) | \(\alpha = 0.1\) | \(\alpha = 0.05\) | \(\alpha = 0.025\) | \(\alpha = 0.01\) |
|---|---|---|---|---|---|
| \(1\) | \(1.64\) | \(2.71\) | \(3.84\) | \(5.02\) | \(6.64\) |
| \(2\) | \(3.22\) | \(4.61\) | \(5.99\) | \(7.38\) | \(9.21\) |
| \(3\) | \(4.64\) | \(6.25\) | \(7.82\) | \(9.35\) | \(11.35\) |
| \(4\) | \(5.99\) | \(7.78\) | \(9.49\) | \(11.14\) | \(13.28\) |

Table 4 - Chi-Square Values

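So, back to the candy example. If the significance level is set at \(5\%\), what is the Chi-square value? Well, with five types of candy there are \(4\) degrees of freedom, and the value where \(\alpha = 0.05\) and \(4\) degrees of freedom meet is \(9.49\).

The question that now arises is whether the test statistic is greater or smaller than the Chi-square value. If your test statistic is greater than the Chi-square value, you reject the null hypothesis. If it is lower, there is not sufficient evidence to reject the null hypothesis, although this does not prove the null hypothesis true. For the candy example, \(51.45 > 9.49\), so the null hypothesis is rejected.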
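The table lookup and comparison can be sketched as a small dictionary and one function. This is only an illustration: the names `CRITICAL_VALUES` and `exceeds_critical_value` are invented here, and the numbers are copied from Table 4 rather than computed.

```python
# Chi-square values copied from Table 4: CRITICAL_VALUES[df][alpha].
CRITICAL_VALUES = {
    1: {0.2: 1.64, 0.1: 2.71, 0.05: 3.84, 0.025: 5.02, 0.01: 6.64},
    2: {0.2: 3.22, 0.1: 4.61, 0.05: 5.99, 0.025: 7.38, 0.01: 9.21},
    3: {0.2: 4.64, 0.1: 6.25, 0.05: 7.82, 0.025: 9.35, 0.01: 11.35},
    4: {0.2: 5.99, 0.1: 7.78, 0.05: 9.49, 0.025: 11.14, 0.01: 13.28},
}


def exceeds_critical_value(statistic, df, alpha):
    """True when the test statistic is greater than the tabulated value."""
    return statistic > CRITICAL_VALUES[df][alpha]


# Candy example: five groups, so 4 degrees of freedom, tested at alpha = 0.05.
print(CRITICAL_VALUES[4][0.05])
print(exceeds_critical_value(51.45, df=4, alpha=0.05))
```

For the candy data the statistic of \(51.45\) is far beyond \(9.49\), so the observed counts are very unlikely under the equal-proportions hypothesis.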

Performing the Test With the P-value

The \(p-\)value is the probability that, if the null hypothesis is true, sampling variation alone would produce a test statistic at least as large as the one found in the current sample. That's a bit wordy. In other words, it's the probability that random sampling would give observed counts at least as far from the expected counts as yours are.

Once again, the table is consulted. This time, find the row for your degrees of freedom, find the two entries your test statistic lies between, and read off the corresponding significance levels from the header row. For example, for a test statistic of \(5\) with \(3\) degrees of freedom, the statistic lies between \(4.64\) and \(6.25\), so \(0.1 < p < 0.2\). As long as the \(p-\)value is greater than the significance level, the null hypothesis has not been rejected.
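Reading bounds for the \(p-\)value off the table can be sketched as a short loop. The snippet is a rough illustration: `ALPHAS`, `TABLE`, and `p_value_bounds` are names invented for this example, and the values are the ones tabulated above.

```python
# Column headings and rows of the Chi-square table (indexed by degrees of freedom).
ALPHAS = [0.2, 0.1, 0.05, 0.025, 0.01]
TABLE = {
    1: [1.64, 2.71, 3.84, 5.02, 6.64],
    2: [3.22, 4.61, 5.99, 7.38, 9.21],
    3: [4.64, 6.25, 7.82, 9.35, 11.35],
    4: [5.99, 7.78, 9.49, 11.14, 13.28],
}


def p_value_bounds(statistic, df):
    """Return (lower, upper) bounds on the p-value read from the table.

    None means the table cannot give a bound on that side.
    """
    lower, upper = None, None
    for alpha, critical in zip(ALPHAS, TABLE[df]):
        if statistic >= critical:
            upper = alpha  # statistic reaches this column, so p <= alpha
        else:
            lower = alpha  # statistic falls short of this column, so p > alpha
            break
    return lower, upper


print(p_value_bounds(5, df=3))      # (0.1, 0.2)
print(p_value_bounds(10.28, df=2))  # (None, 0.01)
```

A statistic of \(5\) with \(3\) degrees of freedom gives bounds of \(0.1\) and \(0.2\), while a statistic larger than every entry in its row only tells you that the \(p-\)value is below the smallest tabulated level.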

Chi-square Test for Goodness of Fit Example

(1) A biologist hypothesizes that each of the three types of fish occurs in equal numbers in a pond. They take a random sample of \(120\) fish to test the hypothesis, and the results were as follows.

| Bass | Crappie | Sunfish |
|---|---|---|
| \(32\) | \(52\) | \(36\) |

Table 5. Fish data table.

| Degrees of Freedom | \(\alpha = 0.2\) | \(\alpha = 0.1\) | \(\alpha = 0.05\) | \(\alpha = 0.025\) | \(\alpha = 0.01\) |
|---|---|---|---|---|---|
| \(1\) | \(1.64\) | \(2.71\) | \(3.84\) | \(5.02\) | \(6.64\) |
| \(2\) | \(3.22\) | \(4.61\) | \(5.99\) | \(7.38\) | \(9.21\) |
| \(3\) | \(4.64\) | \(6.25\) | \(7.82\) | \(9.35\) | \(11.35\) |
| \(4\) | \(5.99\) | \(7.78\) | \(9.49\) | \(11.14\) | \(13.28\) |

Table 6. Degrees of freedom and significance level.

(a) State the hypotheses being tested.

(b) Does the data being tested meet the conditions for a Chi-square test for goodness of fit?

(c) Calculate the Chi-square test statistic.

(d) Find the Chi-square value of the data, given the significance level is \(5\%\).

(e) Does the sample give sufficient evidence to reject the null hypothesis?

Solution:

(a) The first step is to define the hypotheses.

\(H_0\): Each type of fish occurs in equal numbers in the pond.

\(H_a\): Each type of fish does not occur in equal numbers in the pond.

(b) The question states that the sample is random, so the first condition is met.

The variable is categorical as it is made up of unordered groups, therefore the second condition is met.

The expected value of each group is \(\frac{120}{3} = 40\), which is over five, therefore the third condition is met.

Finally, under the null hypothesis, when a fish is pulled out of the water there is always a \(\frac{1}{3}\) chance of it being each of the types of fish, regardless of what was caught before. Therefore each outcome is independent, and so the fourth condition is met.

Yes, it meets the four conditions.

(c) \[\begin{align} \chi^2& = \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i} \\\\ &=\frac{(32-40)^2}{40} +\frac{(52-40)^2}{40} + \frac{(36-40)^2}{40} \\\\ &= 5.6 \end{align}\]

(d) \[\begin{align} df &= n - 1 \\\\ &= 3 - 1 \\\\ &= 2\end{align}\]

With a significance level of \(5\%\), \(\alpha = 0.05\), the Chi-square value from the table is \(5.99\).

(e) As the test statistic is less than the Chi-square value \((5.6 < 5.99)\), the test has shown there is not sufficient evidence to reject the null hypothesis.
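The whole fish example can be reproduced in a few lines of plain Python. This is a sketch for checking the arithmetic, with variable names chosen for this example; the value \(5.99\) is taken from the table rather than computed.

```python
observed = {"Bass": 32, "Crappie": 52, "Sunfish": 36}

sample_size = sum(observed.values())    # 120 fish in total
expected = sample_size / len(observed)  # 40 of each type under the null hypothesis

statistic = sum((count - expected) ** 2 / expected for count in observed.values())
degrees_of_freedom = len(observed) - 1
critical_value = 5.99  # from the table: 2 degrees of freedom, alpha = 0.05

print(round(statistic, 2), degrees_of_freedom)
if statistic > critical_value:
    print("Reject the null hypothesis")
else:
    print("Not enough evidence to reject the null hypothesis")
```

Note how close the call is here: a sample with only slightly more crappie would have pushed the statistic past \(5.99\).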

(2) A school does a study about the occurrence of different colored eyes in its pupils. It is hypothesized that \(15\%\) of pupils will have green eyes, \(25\%\) of pupils will have blue eyes, and \(60\%\) of pupils will have brown eyes. Of the \(1000\) pupils, \(80\) are chosen at random. The results of the sample are as follows.

| Green | Blue | Brown |
|---|---|---|
| \(18\) | \(28\) | \(34\) |

Table 7. Eye color data.

| Degrees of Freedom | \(\alpha = 0.2\) | \(\alpha = 0.1\) | \(\alpha = 0.05\) | \(\alpha = 0.025\) | \(\alpha = 0.01\) |
|---|---|---|---|---|---|
| \(1\) | \(1.64\) | \(2.71\) | \(3.84\) | \(5.02\) | \(6.64\) |
| \(2\) | \(3.22\) | \(4.61\) | \(5.99\) | \(7.38\) | \(9.21\) |
| \(3\) | \(4.64\) | \(6.25\) | \(7.82\) | \(9.35\) | \(11.35\) |
| \(4\) | \(5.99\) | \(7.78\) | \(9.49\) | \(11.14\) | \(13.28\) |

Table 8. Degrees of freedom and significance level.

(a) State the hypotheses being tested.

(b) Does the data being tested meet the conditions for a Chi-square test for goodness of fit?

(c) Calculate the Chi-square test statistic.

(d) Find the \(p-\)value of the data, given the significance level is \(5\%\).

(e) Does the sample give sufficient evidence to reject the null hypothesis?

Answer:

(a) \(H_0\): \(15\%\) of pupils will have green eyes, \(25\%\) of pupils will have blue eyes, and \(60\%\) of pupils will have brown eyes.

\(H_a\): It is not the case that \(15\%\) of pupils will have green eyes, \(25\%\) of pupils will have blue eyes, and \(60\%\) of pupils will have brown eyes.

(b) The question states that the sample is random, so the first condition is met. The variable is categorical as it is made up of unordered groups therefore the second condition is met. The expected value of each group can be calculated as follows

\[Green = 80 \cdot 0.15 = 12\]

\[Blue = 80 \cdot 0.25 = 20\]

\[Brown = 80 \cdot 0.6 = 48\]

As the expected value of each group is greater than \(5\), the third condition is met.

Finally, the color of one student's eyes is not affected by the color of any other student's eyes, therefore the fourth condition is met.

(c) \[\begin{align} \chi^2& = \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i} \\\\ &=\frac{(18-12)^2}{12} +\frac{(28-20)^2}{20} + \frac{(34-48)^2}{48} \\\\ &= 10.28 \end{align}\]

(d) First, find the degrees of freedom

\[\begin{align} df &= n - 1 \\\\ &= 3-1 \\\\ &=2 \end{align}\]

Now, as the test statistic of \(10.28\) is greater than \(9.21\), the largest value in the row for \(2\) degrees of freedom, the table gives

\[p < 0.01 \]

(e) As the \(p-\)value is smaller than the significance level, sufficient evidence has been provided to reject the null hypothesis.

\[p < 0.01 < 0.05\]
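The eye color example can be checked the same way, and this time the \(p-\)value can be computed exactly instead of bracketed. The sketch below leans on one assumption worth flagging loudly: the closed form \(e^{-x/2}\) for the upper-tail probability holds only for a Chi-square distribution with exactly \(2\) degrees of freedom, so it is a shortcut for this example and not a general method.

```python
from math import exp

sample_size = 80
hypothesized = {"Green": 0.15, "Blue": 0.25, "Brown": 0.60}
observed = {"Green": 18, "Blue": 28, "Brown": 34}

# Expected counts under the null hypothesis: 12, 20 and 48 pupils.
expected = {color: sample_size * share for color, share in hypothesized.items()}

statistic = sum(
    (observed[color] - expected[color]) ** 2 / expected[color] for color in observed
)

# Special case for exactly 2 degrees of freedom: P(X >= x) = exp(-x / 2).
# This shortcut does not hold for other degrees of freedom.
p_value = exp(-statistic / 2)

print(round(statistic, 2))
print(p_value < 0.01)
```

The exact value lands below \(0.01\), in agreement with the bound read from the table.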

Chi-Square Test for Goodness of Fit - Key takeaways

  • The Chi-square test for goodness of fit is a statistical hypothesis test used to determine whether an expected distribution of outcomes is significantly different from the actual observed distribution of outcomes.
  • The Chi-square test for goodness of fit can only be carried out on data that meets the four conditions.
  • The Chi-square test for goodness of fit can be carried out either by comparing the Chi-square value and test statistic or by comparing the \(p-\)value of the data and the significance level.

Frequently Asked Questions about Chi Square Test for Goodness of Fit

The Chi-square test for goodness of fit can be used when you wish to test hypotheses about categorical data sets.

A chi-square test for goodness of fit can be conducted to confirm or deny a hypothesis about the distribution of a categorical data set.

  1. The sample method must be random
  2. The variable under study must be categorical
  3. The expected value of observations for each category must be at least five
  4. Each outcome in the variable under study must be independent.

The Chi-square test for goodness of fit can only be conducted on data that meets the four conditions.

One.

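The Chi-square test for goodness of fit is right-tailed because every term of the Chi-square test statistic is a squared difference divided by a positive expected count. The statistic is therefore never negative, and any departure from the expected counts, in either direction, makes it larger.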

Final Chi Square Test for Goodness of Fit Quiz

Question: What is the Chi-square test for goodness of fit?

Answer: The Chi-square test for goodness of fit is a statistical hypothesis test used to check whether the observed distribution of a categorical data set differs significantly from a hypothesized distribution.

Question: What is a categorical data set?

Answer: Categorical data is data that is divided into discrete groups.

Question: Give an example of a categorical data set.

Answer: Anything that fits the definition 'Categorical data is data that is divided into discrete groups', such as the types of candy in a bag, the number of people with various eye colors, etc.

Question: How many conditions need to be met by a data set for a Chi-square test for goodness of fit to be used?

Answer: Four.

Question: What are the conditions that need to be met by a data set for a Chi-square test for goodness of fit to be used?

Answer:

  • The sampling method is simple random sampling.

  • The variable under study is categorical.

  • The expected value of observations for each category must be at least five.

  • Each outcome in the variable under study must be independent.

Question: What makes a sampling method random?

Answer: Every member of the population must have an equal chance of being chosen for the sample.

Question: Someone flips a coin a hundred times and records the result of each flip, heads or tails. Is each outcome in this dataset independent?

Answer: Yes, the outcome of any given flip does not affect the probability of any outcome on another flip.

Question: What is the formula for calculating the Chi-square test statistic?

Answer: \[\chi^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i}\]

Question: What is the significance level of a Chi-square test for goodness of fit?

Answer: The significance level sets how strong the evidence must be before you reject the null hypothesis. It is the probability you are willing to accept of rejecting a null hypothesis that is actually true.

Question: What is a null hypothesis?

Answer: A null hypothesis is the default claim that the test assumes to be true. It states that any difference between the observed and expected results is down to random chance.

Question: What is the alternative hypothesis in a Chi-square test for goodness of fit?

Answer: The alternative hypothesis is the claim that the data does not follow the distribution stated in the null hypothesis. It is what the evidence supports if the null hypothesis is rejected.

Question: How can you find the number of degrees of freedom of a categorical variable?

Answer: \[df =\text { number of groups} - 1\]

Question: What values do you need to find the Chi-square value for a Chi-square test for goodness of fit?

Answer: Degrees of freedom, significance level, and a Chi-square table.

Question: What makes a group of a categorical variable independent?

Answer: The likelihood of a given observation belonging to that group is not affected by the number of observations belonging to any other group.

Question: Why is a Chi-square test for goodness of fit always right-tailed?

Answer: The Chi-square goodness of fit test is right-tailed because every term of the test statistic is a squared difference divided by a positive expected count, so any departure from the expected counts makes the statistic larger.
