
# Central Limit Theorem

If you were asked whether there are any important things in your life, I bet it wouldn't be a difficult question to answer. You could easily identify the parts of your daily life that you couldn't do without while keeping a reasonable quality of life. You could label these things as central in your life.

The same is true in several areas of knowledge, particularly in statistics. There is a mathematical result so important in statistics that they made a point of including the word central in its designation. And it is central not only in its importance, but also in its simplifying power.

It is the Central Limit Theorem and in this article, you will see its definition, its formula, conditions, calculations and examples of application.

## Understanding the Central Limit Theorem

Consider the following example.

Imagine you have a bag with four balls

• of equal size;
• indistinguishable to touch;
• and numbered with the even numbers 2, 4, 6, and 8.

You are going to remove two balls at random, with replacement, and you'll calculate the mean of the numbers of the two balls you removed.

"With replacement" means you remove the first ball from the bag, you put it back, and you remove the second ball. And yes, this can lead to the same ball being removed twice.

Notice that you have 16 possible combinations; we present them in the tables below, with their means calculated.

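| 1st ball | 2 | 2 | 2 | 2 | 4 | 4 | 4 | 4 |
|---|---|---|---|---|---|---|---|---|
| 2nd ball | 2 | 4 | 6 | 8 | 2 | 4 | 6 | 8 |
| mean | 2 | 3 | 4 | 5 | 3 | 4 | 5 | 6 |

| 1st ball | 6 | 6 | 6 | 6 | 8 | 8 | 8 | 8 |
|---|---|---|---|---|---|---|---|---|
| 2nd ball | 2 | 4 | 6 | 8 | 2 | 4 | 6 | 8 |
| mean | 4 | 5 | 6 | 7 | 5 | 6 | 7 | 8 |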
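The enumeration in the tables can also be reproduced programmatically. The following is a minimal Python sketch; the function name `mean_counts` and the text-based bar chart are illustrative choices, not part of the original example.

```python
from collections import Counter
from itertools import product


def mean_counts(labels, draws=2):
    """Count how often each mean occurs over all ordered draws with replacement."""
    counts = Counter()
    for combo in product(labels, repeat=draws):
        counts[sum(combo) / draws] += 1
    return dict(sorted(counts.items()))


# The bag from the example: four balls numbered 2, 4, 6 and 8.
counts = mean_counts([2, 4, 6, 8])

# Print a rough text version of the bar graph of means.
for mean, freq in counts.items():
    print(f"{mean:>4}: {'#' * freq}")
```

The counts rise from one occurrence of the mean 2 to four occurrences of the mean 5 and fall back symmetrically, which is the peaked shape described in the text.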

Now let's draw a bar graph of these means (figure 2).

Fig. 2 - Bar graph of the list of means in the tables

If you notice, the shape of this bar graph is heading towards the shape of a normal distribution, don’t you agree? It’s getting closer to the form of a normal curve!

Now, if instead of 4 balls numbered with 2, 4, 6 and 8, you had 5 balls numbered with 2, 4, 6, 8 and 10, then you’d have 25 possible combinations, which leads to 25 means.

What would the bar graph of this new list of means look like? Yes, it would have a similar form to that of a normal curve.

More importantly, if you kept increasing the number of balls drawn for each mean (that is, the sample size), the corresponding bar graph would get closer and closer to a normal curve.

"Why is that?" you ask. This leads you to the next section.

## Definition of Central Limit Theorem

The Central Limit Theorem is an important theorem in statistics, if not the most important. It is what makes the bar graphs of means in the above example approach the curve of the normal distribution.

Let's start by looking at its statement, and then recall two important concepts involved in it: a distribution of sample means, and the useful normal distribution.

### Central Limit Theorem Statement

The statement of the Central Limit Theorem says:

If you take a sufficiently large number of samples from any random distribution, the distribution of the sample means can be approximated by the normal distribution.

Easy-peasy, right?! “Uhh… No…!!” Ok, ok. Let's understand it by simplifying its statement a bit:

If you take a large number of samples from a distribution, the distribution of the sample mean can be approximated by the normal distribution.

Let's forget for a moment "a sufficiently large number" and "any random distribution", and focus on:

• a sample mean;

• and normal distribution.

### Understanding the Distribution of Sample Means

Imagine you have to perform a statistical study for a particular attribute. You identify the population of your study and from it, you’ll draw a random sample. You will then calculate a particular statistic related to that attribute you’re interested in from this sample, and it’ll be the mean.

Now imagine drawing another sample randomly from the same population, with the same size as the previous one, and calculating the mean of the attribute of this new sample.

Imagine doing this a few more (and more and more) times. What you’ll end up with is a list of means from the samples you’ve drawn. And voilà! That list of means you end up with constitutes a distribution of sample means.
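The procedure described above can be sketched as a small simulation. This is only an illustration: the population (the four ball numbers from the earlier example), the sample size of 30, the 1000 repetitions and the helper name `sample_means` are all assumptions chosen for the sketch.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw `num_samples` random samples (with replacement) and return their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


population = [2, 4, 6, 8]  # hypothetical population, borrowed from the bag example
means = sample_means(population, sample_size=30, num_samples=1000)

# The list `means` is an empirical distribution of sample means.
print(len(means), round(statistics.fmean(means), 2))
```

Plotting a histogram of `means` would give a picture of the distribution of sample means for this population.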

To deepen your knowledge on this topic, read our article Sample Mean.

### Recalling the Normal Distribution

The normal distribution is especially useful because it approximates the frequency curves of many physical measurements quite well. For example, measurements such as the height and weight of people sampled from a human population can be approximated by this distribution. Now you’re close to seeing another important application of this distribution.

By now you may already know that the normal distribution is a probability distribution with two parameters, a mean $$\mu$$ and a standard deviation $$\sigma$$, and that its graph is a bell-shaped curve – see figure 1.

Fig. 1 – Normal curve of a normal distribution of mean 0 and standard deviation 0.05

The mean is the value at which the distribution is centered, and the standard deviation describes its degree of dispersion.

In the case of figure 1, the normal curve is centered at 0 and its dispersion is fairly low, 0.05. The lower the dispersion, the more tightly the curve is concentrated around the mean, which here lies on the $$y$$-axis.
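To see the effect of the standard deviation numerically, here is a small sketch using Python's standard-library `statistics.NormalDist`. The comparison with a standard deviation of 1 and the interval from -0.1 to 0.1 are assumptions made for illustration only.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=0.05)  # the curve from figure 1
wide = NormalDist(mu=0, sigma=1)       # a wider curve, for comparison

# Height of each curve at its center.
peak_narrow = narrow.pdf(0)
peak_wide = wide.pdf(0)

# Share of each distribution lying between -0.1 and 0.1.
share_narrow = narrow.cdf(0.1) - narrow.cdf(-0.1)
share_wide = wide.cdf(0.1) - wide.cdf(-0.1)

print(round(peak_narrow, 2), round(peak_wide, 2))
print(round(share_narrow, 3), round(share_wide, 3))
```

The curve with the smaller standard deviation is much taller at the center and holds most of its area in a narrow band around the mean.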

To refresh your memory on this topic, read our article Normal Distribution.

### How Many is Enough?

What you need to understand here is that the Central Limit Theorem tells us that, for a sufficiently large number of samples from a distribution, the distribution of the sample mean gets closer to the normal distribution.

Recalling the example above:

"Imagine you have a bag with four balls

• of equal size;
• indistinguishable to touch;
• and numbered with the even numbers 2, 4, 6, and 8.

You are going to remove two balls at random, with replacement, and you'll calculate the mean of the numbers of the two balls you removed."

Notice that here each sample is a pair of removed balls, and the distribution in question is that of the list of means obtained.

Now, bringing back what we set aside for a moment, the Central Limit Theorem says that no matter what the initial distribution is ("any random distribution"), the distribution of its sample mean approaches the normal distribution as the number of samples grows ("a sufficiently large number of samples").

This raises the question: what is a sufficiently large number of samples? This leads us to the next section.

## Conditions for the Central Limit Theorem

There are two main conditions that must be met for you to apply the Central Limit Theorem.

The conditions are the following:

• Randomness – the sample collection must be random, which means every element of the population must have the same chance of being selected.

Coming back to the first example, you had the 4 balls in a bag, and they were indistinguishable to touch. These elements randomize the experiment.

• Sufficiently large sample: as a practical rule, when the sample size is at least 30, the distribution of the sample means will satisfactorily approach a normal distribution.

This is why the example above serves only to illustrate the idea of the Central Limit Theorem in a simple way. Each sample there contained only 2 balls, and there were just 16 possible samples (or 25 with 5 balls), which is far too few for the approximation to be good.
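The role of the sample size can be explored with a rough simulation. The sketch below is an illustration under several assumptions that are not in the text: the initial distribution is exponential (a strongly skewed one), skewness is used as a simple measure of how far the sample means are from a symmetric bell shape, and the helper names are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Third central moment divided by the 1.5 power of the second central moment."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def skewness_of_sample_means(sample_size, num_samples=5000, seed=1):
    """Simulate sample means of exponential data and return their skewness."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]
    return skewness(means)


skew_small = skewness_of_sample_means(sample_size=2)
skew_large = skewness_of_sample_means(sample_size=30)
print(f"n=2: {skew_small:.2f}, n=30: {skew_large:.2f}")
```

A normal distribution has zero skewness, so the smaller value for samples of size 30 suggests that their means are already much closer to a normal shape than the means of samples of size 2.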

## Central Limit Theorem Formula

Addressing the Central Limit Theorem formula is equivalent to restating it by introducing all the necessary notation, and giving it further details.

It’s worth repeating the first statement:

If you take a sufficiently large number of samples from any random distribution, the distribution of the sample means can be approximated by the normal distribution.

Now introducing the appropriate notation:

Assume you have an initial distribution, with either an unknown or known probability distribution, and let $$\mu$$ be its mean and $$\sigma$$ be its standard deviation.

Also, assume you’ll take a sample of $$n$$ observations from this initial distribution, with $$n\ge30$$.

Then, the sample mean, $$\bar{x}$$, with mean $$\mu_\bar{x}$$ and standard deviation $$\sigma_\bar{x}$$, will be approximately normally distributed with mean $$\mu$$ and standard deviation $$\frac{\sigma}{\sqrt{n}}$$.

As a result of this new restatement of the Central Limit Theorem, you can conclude that:

1. The mean of the distribution of the sample mean $$\bar{x}$$ will be equal to the mean of the initial distribution, i.e., $\mu_\bar{x}=\mu;$
2. The standard deviation of the distribution of the sample mean $$\bar{x}$$ will be $$\frac{1}{\sqrt{n}}$$ of the standard deviation of the initial distribution, i.e., $\sigma_\bar{x}=\frac{\sigma}{\sqrt{n}};$

This is actually good: notice that as $$n$$ increases, $$\frac{\sigma}{\sqrt{n}}$$ decreases, so the dispersion of $$\bar{x}$$ decreases, which means the sample mean tends to fall closer and closer to $$\mu$$.

3. The Central Limit Theorem applies to any distribution with many samples, be it known (like a binomial, a uniform, or a Poisson distribution) or an unknown distribution.
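The two formulas above can be justified with a short derivation. This is a sketch that assumes (beyond what the text states) that the $$n$$ observations $$X_1,\dots,X_n$$ are independent and each has mean $$\mu$$ and standard deviation $$\sigma$$, so that $$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}X_i$$:

\begin{align} \mu_{\bar{x}} &= E\left[\frac{1}{n}\sum_{i=1}^{n}X_i\right] = \frac{1}{n}\sum_{i=1}^{n}E[X_i] = \frac{n\mu}{n} = \mu, \\ \sigma_{\bar{x}}^2 &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \\ \sigma_{\bar{x}} &= \frac{\sigma}{\sqrt{n}}. \end{align}

Independence is what lets the variance of the sum split into a sum of variances in the second line.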

Let's look at an example where you'll see this notation in action.

A study reports that the mean age of peanut buyers is $$30$$ years and the standard deviation is $$12$$. With a sample size of $$100$$ people, what are the mean and standard deviation for the sample mean ages of the peanut buyers?

Solution:

The population, and consequently the sample of the study, consists of peanut buyers, and the attribute they were interested in was age.

So, you're told the mean and the standard deviation of the initial distribution are $$\mu=30$$ and $$\sigma=12$$.

You're also told the sample size, so $$n=100$$.

Since $$n$$ is greater than $$30$$, you can apply the Central Limit Theorem. Then, there will be a sample mean $$\bar{x}$$ that is normally distributed with mean $$\mu_\bar{x}$$ and standard deviation $$\sigma_\bar{x}$$.

And you know more,

\begin{align} \mu_\bar{x}&=\mu\\ &=30\end{align}

and

\begin{align} \sigma_\bar{x}&=\frac{\sigma}{\sqrt{n}} \\ &=\frac{12}{\sqrt{100}} \\ &=\frac{12}{10} \\ &=1.2 .\end{align}

Therefore, $$\bar{x}$$ is normally distributed with mean $$30$$ and standard deviation $$1.2$$.
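The same calculation can be written as a tiny helper. This is a minimal sketch; the function name `sampling_distribution` is hypothetical and simply applies the two formulas $$\mu_\bar{x}=\mu$$ and $$\sigma_\bar{x}=\frac{\sigma}{\sqrt{n}}$$.

```python
import math


def sampling_distribution(mu, sigma, n):
    """Return the mean and standard deviation of the sample mean for sample size n."""
    return mu, sigma / math.sqrt(n)


# Peanut buyers: mu = 30, sigma = 12, n = 100.
mean, std_error = sampling_distribution(mu=30, sigma=12, n=100)
print(mean, std_error)  # 30 1.2
```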

## Calculations Involving the Central Limit Theorem

As you know by now, the Central Limit Theorem allows us to approximate the distribution of sample means, for a large number of samples, by the normal distribution. This means that some of the calculations where the Central Limit Theorem is applicable will involve the normal distribution. Here, what you'll be doing is converting a normal distribution to the standard normal distribution.

To recall more about this last concept, please read our article Standard Normal Distribution.

The importance of this conversion is that it gives you access to a table of values of the standard normal distribution, also known as a z-table, which you can refer to in your calculations.

Any point $$x$$ from a normal distribution can be converted to the standard normal distribution $$z$$ by doing the following

$z=\frac{x-\mu}{\sigma},$

where $$z$$ follows the standard normal distribution (with mean $$\mu=0$$ and standard deviation $$\sigma=1$$).

Because $$\bar{x}$$ is normally distributed with mean $$\mu$$ and standard deviation

$\frac{\sigma}{\sqrt{n}},$

the conversion becomes

$z=\frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}.$

You can refresh your memory on this topic by reading our article z-score.

The following example serves as a reminder of the conversion to the standard normal distribution.

A random sample of size $$n=90$$ is selected from a population with mean $$\mu=20$$ and standard deviation $$\sigma=7$$. Determine the probability that $$\bar{x}$$ is less than or equal to $$22$$.

Solution:

Since the sample size is $$n=90$$, which is greater than $$30$$, you can apply the Central Limit Theorem. This means $$\bar{x}$$ will follow a normal distribution with mean

$\mu_\bar{x}=\mu=20$

and standard deviation

\begin{align} \sigma_\bar{x}&=\frac{\sigma}{\sqrt{n}} \\ &=\frac{7}{\sqrt{90}} \\ &=0.738 \end{align}

to three decimal places.

Now you want to find $$P(\bar{x}\le 22)$$, and for that you apply the conversion to the standard normal:

\begin{align} P(\bar{x}\le 22)&=P\left( z\le \frac{22-20}{0.738} \right) \\ \\ &=P( z\le 2.71) \\ \\ &=\text{ area under the normal curve to the left of 2.71} \\ \\ &=0.9966 \end{align}
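If you don't have a table at hand, the same steps can be checked with Python's standard-library `statistics.NormalDist`. This is a minimal sketch; the helper name `prob_sample_mean_at_most` is hypothetical, and the code keeps full precision instead of rounding the standard deviation to three decimal places first.

```python
import math
from statistics import NormalDist


def prob_sample_mean_at_most(threshold, mu, sigma, n):
    """Return (z, P(sample mean <= threshold)) using the normal approximation."""
    std_error = sigma / math.sqrt(n)   # standard deviation of the sample mean
    z = (threshold - mu) / std_error   # conversion to the standard normal
    return z, NormalDist().cdf(z)      # area to the left of z


z, p = prob_sample_mean_at_most(22, mu=20, sigma=7, n=90)
print(round(z, 2), round(p, 4))
```

Rounded to the same number of decimal places, the result agrees with the table-based value above.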

## Examples of the Central Limit Theorem

To consolidate the learnings from this article, let's now turn to application examples. Here, you will see an overview of all the main aspects of the Central Limit Theorem.

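Let's start with the first example.

A female population’s weight data follows a normal distribution with a mean of 65 kg and a standard deviation of 14 kg. What is the standard deviation of the sample mean if a researcher analyzes the records of 50 females?

Solution:

The initial distribution has mean $$\mu=65$$ and standard deviation $$\sigma=14$$, and the sample size is $$n=50$$. Since $$n$$ is greater than $$30$$, you can apply the Central Limit Theorem, so the standard deviation of the sample mean is

\begin{align} \sigma_\bar{x}&=\frac{\sigma}{\sqrt{n}} \\ &=\frac{14}{\sqrt{50}} \\ &=1.980 \end{align}

to three decimal places.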

Let's do a final word problem.

A small hotel receives on average $$10$$ new customers per day with a standard deviation of 3 customers. Calculate the probability that, in a 30-day period, the hotel receives on average more than $$12$$ customers per day.

Solution:

The initial distribution has a mean $$\mu=10$$ and a standard deviation $$\sigma=3$$. As the time period is 30 days, $$n=30$$. Therefore, you can apply the Central Limit Theorem. This means you'll have $$\bar{x}$$ whose distribution has a mean $$\mu_\bar{x}$$ and a standard deviation $$\sigma_\bar{x}$$, and

\begin{align} \mu_\bar{x}&=\mu\\ &=10 \end{align}

and

\begin{align} \sigma_\bar{x}&=\frac{\sigma}{\sqrt{n}}\\ &=\frac{3}{\sqrt{30}} \\ &=0.548 \end{align}

to three decimal places.

You're asked to calculate $$P(\bar{x}\ge 12)$$, and for that you'll convert $$\bar{x}$$ to the standard normal $$z$$:

\begin{align} P(\bar{x}\ge 12)&=P\left(z \ge \frac{12-10}{0.548} \right) \\ \\ &=P(z \ge 3.65) .\end{align}

Now, the final calculations:

\begin{align} P(z\ge 3.65)&=\text{ area under the normal curve to the right of 3.65} \\ &=1-0.9999 \\ &=0.0001\, (0.01\%).\end{align}

Therefore, the probability that in a 30-day period the hotel receives on average more than $$12$$ customers per day is $$0.01\%$$.
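As a cross-check, the upper-tail probability can be computed directly from the distribution of $$\bar{x}$$, without converting to $$z$$ first. This is a sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 10, 3, 30

# Approximate distribution of the daily average over 30 days.
x_bar_dist = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

# Upper tail: probability that the average exceeds 12 customers per day.
p_more_than_12 = 1 - x_bar_dist.cdf(12)
print(f"{p_more_than_12:.5f}")
```

Because a four-decimal table rounds the area to the left of 3.65 to 0.9999, the unrounded result is slightly larger than 0.0001, but it leads to the same conclusion: such a month would be very unusual.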

## Importance of the Central Limit Theorem

There are many situations in which the Central Limit Theorem is of importance. Here are some of them:

• In instances where it is difficult to collect data on each element of a population, the Central Limit Theorem is used to approximate the features of the population.

• The Central Limit Theorem is useful in making significant inferences about the population from a sample. It can be used to tell whether two samples were drawn from the same population, and also check if the sample was drawn from a certain population.

• To build robust statistical models in data science, the Central Limit Theorem is applied.

• To assess the performance of a model in machine learning, the Central Limit Theorem is employed.

• You test a hypothesis in statistics using the Central Limit Theorem to determine if a sample belongs to a certain population.

## The Central Limit Theorem - Key takeaways

• The Central Limit Theorem says that if you take a sufficiently large number of samples from any random distribution, the distribution of the sample means can be approximated by the normal distribution.

• Another way of stating the Central Limit Theorem is that if $$n\ge 30$$, then the sample mean $$\bar{x}$$ approximately follows a normal distribution with $$\mu_\bar{x}=\mu$$ and $$\sigma_\bar{x}=\frac{\sigma}{\sqrt{n}}.$$

• The sample mean $$\bar{x}$$ can be converted to the standard normal by doing $$z=\frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}.$$

• Knowledge of the standard normal distribution, its table and its properties helps you in calculations involving the Central Limit Theorem.

The Central Limit Theorem is an important theorem in Statistics that involves approximating a distribution of sample means to the normal distribution.

Assume you have a random variable X, with either an unknown or known probability distribution. Let σ be the standard deviation of X and μ be its mean. The new random variable, X̄, comprising the sample means, will be approximately normally distributed, for a large sample size (n ≥ 30), with mean μ and standard deviation σ/√n.

The Central Limit Theorem is not a prerequisite for confidence intervals. However, it helps to construct them by justifying the treatment of sample means as approximately normally distributed.

## Final Central Limit Theorem Quiz

Question

What are the mean and standard deviation of the sampling distribution for samples of size 40 trips if the population mean of the number of fish caught each trip to a given fishing hole is 3.2 and the population standard deviation is 1.8?

mean = 3.2 and standard deviation = 0.285

Question

What is the Central Limit Theorem?

The Central Limit Theorem is an important theorem in statistics that involves approximating a distribution of sample means to the normal distribution.

Question

What is the minimum sample size to consider when using the Central Limit Theorem?

30

Question

How can you construct a distribution of sample means?

By drawing many samples of the same size from the same population and calculating the mean of the attribute you're interested in for each one, you form a list of means that constitutes a distribution of sample means.

Question

What are two important conditions for the Central Limit Theorem?

Two important conditions are randomness and a sufficiently large number of samples.

Question

What important concepts does the Central Limit Theorem involve?

There are two important concepts that the Central Limit Theorem involves: a distribution of sample means and the normal distribution.

Question

The Central Limit Theorem applies to any distribution with many samples, be it known, like a binomial, a uniform, or a Poisson distribution, or an unknown distribution. True or false?

True.

Question

What does the Central Limit Theorem tell us?

The Central Limit Theorem says that if you take a sufficiently large number of samples from any random distribution, the distribution of the sample means can be approximated by the normal distribution.

Question

State the formula for the Central Limit Theorem.

For $$X$$ with mean $$\mu$$ and standard deviation $$\sigma$$, if $$n\ge 30$$, then the sample mean $$\bar{X}$$ satisfies $$\bar{X}\approx N\left (\mu, \frac{\sigma}{\sqrt{n}} \right)$$.

Question

The Central Limit Theorem is useful in making significant inferences about the population from a sample. It can be used to tell whether two samples were drawn from the same population, and also check if the sample was drawn from a certain population. True or False?

True.

Question

In instances where it is difficult to collect data on each element of a population, the Central Limit Theorem won't be useful to approximate the features of the population. True or False?

False.

Question

The Central Limit Theorem allows approximating any distribution, for a large sample size, to the binomial distribution. True or False?

False.

Question

By the Central Limit Theorem, the distribution of the sample means will have the same mean and standard deviation as the initial distribution. True or False?

False.

Question

By the Central Limit Theorem, if a random variable $$X$$ of a particular distribution has a standard deviation of $$\sigma$$, what will be the standard deviation of $$\bar{X}$$?

$$\sigma_\bar{X}=\frac{\sigma}{\sqrt{n}}$$

Question

The standard normal distribution is an important distribution when dealing with calculations involving the Central Limit Theorem. True or False?

True.

Question

Any distribution $$X$$ of mean $$\mu$$ and standard deviation $$\sigma$$ can be easily converted to the standard normal by doing $$Z=\frac{X-\mu}{\sigma}$$. True or False?

False. Only the normal distribution can be converted to the standard normal distribution.
