
# Sampling Distribution

Let's say you want to know the average GPA of high school senior students in Atlanta, Georgia. To calculate the exact value, you would need the GPA of every member of the population, that is, of all the senior students in Atlanta, Georgia. That sounds exhausting! But what if you take a sample of the population instead of asking all the senior students? This is the idea behind sampling distributions.

In this article, you'll find the definition of sampling distributions, types of sampling distributions, the formulas, the mean and the standard deviation of sampling distributions, and examples of application.

## Introduction to Sampling Distributions

Coming back to the example above, let's say you randomly select and sample $$100$$ senior students and calculate the average GPA from this sample. This average GPA would not be the same as the mean GPA of all senior students in Atlanta. It could be lower or higher, but it would most likely not be exactly equal to the population mean.

If you select a second sample of $$100$$ senior students, the average GPA for this sample would most likely differ from the mean of your first one. Thus, different random samples produce different mean values. Despite this variety, when many sample means are obtained, you can plot them on a graph, and the center of that plot provides an estimate of the mean of the entire population. This process explains the concept of creating sampling distributions of the mean.

## Definition of Sampling Distributions

A value that is calculated from the information in a sample is called a statistic. Statistics allow you to estimate characteristics of an entire population. As you saw in the example above, different random samples can give different values for a statistic; this difference is called sampling variability (or sampling error). Sampling variability can be reduced by increasing the sample size.

The distribution formed by all the possible values for sample statistics obtained for every possible different sample of a given size is called the sampling distribution.
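The claim that sampling variability shrinks as the sample size grows can be checked with a small simulation. The sketch below is only an illustration: the population is synthetic (randomly generated GPA-like values, not real data), and the helper name `spread_of_sample_means` is made up for this example.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, num_samples, rng):
    """Return the standard deviation of the means of repeated random samples."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

# Synthetic population of 10,000 GPA-like values clipped to [0, 4]
# (an assumption for illustration, not real student data).
rng = random.Random(42)
population = [min(4.0, max(0.0, rng.gauss(3.0, 0.5))) for _ in range(10_000)]

spread_small = spread_of_sample_means(population, 25, 500, rng)
spread_large = spread_of_sample_means(population, 400, 500, rng)
print(f"n=25:  spread of sample means = {spread_small:.3f}")
print(f"n=400: spread of sample means = {spread_large:.3f}")
```

The means of the larger samples should cluster much more tightly around the population mean than the means of the smaller samples.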

### Conditions for Sampling Distributions

To ensure that the sampling distribution truly estimates the entire population, you must make sure that these two criteria are checked:

1. Randomization condition: the most important condition necessary for creating a sampling distribution is that your data comes from samples randomly selected.

2. Independence ($$10\%$$ condition): the sampled values must be independent of one another. When sampling without replacement, this condition is commonly treated as satisfied if the sample size is no larger than $$10\%$$ of the entire population.

Let's go back to the average GPA example. For the randomization condition, as long as you do not draw from a biased list (for instance, a list of the students with the highest GPA in Atlanta), choosing any $$100$$ students at random is enough to satisfy this condition.

On the other hand, for the independence condition, it is not unreasonable to assume that there are more than $$10\, 000$$ senior students in Atlanta, so $$10\%$$ of this is $$1\,000$$. Any sample size of at most $$1\,000$$ satisfies this condition, so samples of size $$100$$ are acceptable.
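The $$10\%$$ check is simple arithmetic, and could be sketched as a tiny helper. The function name is hypothetical, and the population size of $$10\,000$$ is the assumed figure from the example, not a real count.

```python
def satisfies_ten_percent_condition(sample_size, population_size):
    """True when the sample is no larger than 10% of the population."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return 10 * sample_size <= population_size

print(satisfies_ten_percent_condition(100, 10_000))    # sample used in the example
print(satisfies_ten_percent_condition(1_500, 10_000))  # too large a sample
```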

## Types of Sampling Distributions

There are 3 types of sampling distributions:

1. Sampling distribution of proportions

2. Sampling distribution of means

3. T-distribution

### Sampling Distribution of Proportions

It is used to estimate a population proportion, that is, the proportion of successes or the chance that a specific event will occur. The mean of the sample proportions obtained from all the samples is an estimate of the proportion of success in the entire population.

### Sampling Distribution of Means

It entails calculating the means of all sample groups from a selected population. Then, the average of the means of all the samples is an estimated mean of the entire population.

### T-distribution

It is used when the sample size is small or the population standard deviation is unknown. It is used to estimate the population mean and in related procedures such as confidence intervals, linear regression, and tests for differences between groups. Since this distribution uses $$t$$-scores to calculate probabilities, it is out of the scope of this article.

## Formula for Sampling Distributions

The sample proportion, denoted by $$\widehat{p}$$, is calculated by counting how many successes are in the sample (success means that an individual possesses the characteristic of interest) and dividing it by the total sample size $$n$$

$\widehat{p}=\frac{\text{number of successes in the sample}}{n}.$

The sample mean, denoted by $$\overline{x}$$, is calculated by adding up all the values obtained from the sample and dividing by the total sample size $$n$$. The idea is the same as finding the average for a set of data. The formula is

$\overline{x}=\frac{x_1+x_2+...+x_n}{n},$

where $$\overline{x}$$ is the sample mean, $$x_i$$ is each one of the values of the sample, and $$n$$ is the sample size.
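Both formulas translate directly into code. The following is a minimal sketch; the function names and the small data lists are invented for illustration only.

```python
def sample_proportion(outcomes):
    """Fraction of successes; `outcomes` is a list of True/False values."""
    return sum(outcomes) / len(outcomes)

def sample_mean(values):
    """Sum of the sampled values divided by the sample size n."""
    return sum(values) / len(values)

# Hypothetical sample data, made up for this example.
likes_pineapple = [True, False, True, True, False]
gpas = [3.2, 3.8, 2.9, 3.5]

print(sample_proportion(likes_pineapple))  # 3 successes out of 5
print(sample_mean(gpas))
```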

## Mean and Standard Deviation of Sampling Distributions

All probability distributions have characteristics that distinguish them, and sampling distributions are no exception. Knowing the mean and standard deviation can give you a lot of information about the shape of the distribution.

### Mean and Standard Deviation of the Sample Proportion

Let $$p$$ be the proportion of success in a population and $$\widehat{p}$$ the sample proportion, that is, the proportion of success in a random sample of size $$n$$, then the sampling distribution of $$\widehat{p}$$ has mean and standard deviation given by $\mu_\widehat{p}=p\,\text{ and }\, \sigma_\widehat{p}=\sqrt{\frac{p(1-p)}{n}}.$

Moreover, if $np\geq 10\,\text{ and }\, n(1-p)\geq 10,$ then the sampling distribution of $$\widehat{p}$$ is approximately normal.

A random sample is selected from a population that has a proportion of successes $$p=0.72$$. Calculate the mean and standard deviation of the sampling distribution of $$\widehat{p}$$ with sample size $$n=20$$.

Solution:

Using the formulas stated before, the mean is equal to the proportion of success of the population, then $\mu_\widehat{p}=0.72,$ while the standard deviation is given by $\sigma_\widehat{p} =\sqrt{\frac{0.72(0.28)}{20}}\approx 0.100.$
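The same calculation can be reproduced in a few lines of Python. This is a sketch under the formulas above; the function name `proportion_sampling_distribution` is an assumption made for this example, and the third return value applies the $$np\geq 10$$ and $$n(1-p)\geq 10$$ check.

```python
import math

def proportion_sampling_distribution(p, n):
    """Return (mean, standard deviation, normal-approximation flag) for p-hat."""
    mean = p
    std_dev = math.sqrt(p * (1 - p) / n)
    approx_normal = n * p >= 10 and n * (1 - p) >= 10
    return mean, std_dev, approx_normal

mean, std_dev, approx_normal = proportion_sampling_distribution(0.72, 20)
print(f"mean = {mean}, standard deviation = {std_dev:.3f}")
print(f"large enough for the normal approximation: {approx_normal}")
```

Note that with $$n=20$$ and $$p=0.72$$, $$n(1-p)=5.6<10$$, so the normal approximation condition is not met for this particular sample size, even though the mean and standard deviation formulas still apply.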

### Mean and Standard Deviation of the Sample Mean

Let $$\mu$$ be the mean and $$\sigma$$ the standard deviation of the population. Let $$\overline{x}$$ be the sample mean of a random sample of size $$n$$, then the sampling distribution of $$\overline{x}$$ has mean and standard deviation given by $\mu_\overline{x}=\mu\,\text{ and }\, \sigma_\overline{x}=\frac{\sigma}{\sqrt{n}}.$

The standard deviation of the sampling distribution of means is also known as the standard error of the mean (SEM).
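Why the standard deviation shrinks by a factor of $$\sqrt{n}$$ can be sketched as follows, assuming the sampled values $$x_1, x_2, \dots, x_n$$ are independent and each has variance $$\sigma^2$$ (this is where the independence condition is used):

$\operatorname{Var}(\overline{x})=\operatorname{Var}\left(\frac{x_1+x_2+\dots+x_n}{n}\right)=\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i)=\frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}.$

Taking the square root gives $$\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}$$.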

If the sample size $$n$$ is large enough (a common rule of thumb when applying the Central Limit Theorem is $$n\geq 30$$), then the sampling distribution of $$\overline{x}$$ is approximately normal.

A random sample is selected from a population with mean $$\mu=80$$ and standard deviation $$\sigma=5$$. Calculate the mean and standard deviation of the sampling distribution of $$\overline{x}$$ with sample size $$n=35$$.

Solution:

Using the formulas stated before, the mean of the sampling distribution of the sample mean is equal to the mean of the population, so $\mu_\overline{x}=80.$ The standard deviation of the sample mean is

$\sigma_\overline{x}=\frac{5}{\sqrt{35}}\approx 0.845.$
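As a quick check of this arithmetic, the standard error can be computed directly. The function name below is chosen for illustration only.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)

sem = standard_error_of_mean(5, 35)
print(f"{sem:.3f}")  # 0.845

# Quadrupling the sample size halves the standard error.
print(f"{standard_error_of_mean(5, 4 * 35):.3f}")
```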

## Examples of Sampling Distributions

Let's see an example using sampling distributions.

A restaurant stated $$30\%$$ of their customers like pineapple on their pizza. If there are $$100$$ customers on a given day, what is the probability that at least $$40\%$$ of these customers will buy a pizza with pineapple?

Solution:

(1) Note that $$p=0.30$$, $$(1-p)=0.70$$ and the sample size is $$n=100$$. Thus, the mean $$\mu_\widehat{p}=0.30$$ and the standard deviation $\sigma_{\widehat{p}}=\sqrt{\frac{(0.30)(0.70)}{100}}\approx 0.046.$

(2) Since $$np=100(0.30)=30>10$$ and $$n(1-p)=100(0.70)=70>10$$, then the sampling distribution of $$\widehat{p}$$ is similar to a normal distribution, and you can use this later to calculate the probability.

(3) Converting $$\widehat{p}$$ into a $$z$$-score (see the article $$z$$-scores for more details), you will have

\begin{align} P(\widehat{p}\geq 0.40) &= P\left(z\geq\frac{0.40-0.30}{0.046}\right) \\ &=P(z\geq 2.17) \\ & =1-P(z<2.17) \\ &= 1-0.9850 \\ &=0.015.\end{align}

Thus, the probability that at least $$40\%$$ of these customers ask for a pizza with pineapple is $$0.015$$.
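The steps above can also be carried out without a $$z$$-table. The sketch below uses only the Python standard library and builds the standard normal CDF from `math.erf`; the helper name `normal_cdf` is an assumption of this example. Because it does not round the standard deviation to $$0.046$$, the $$z$$-score comes out slightly different from the hand calculation, but the probability still rounds to about $$0.015$$.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n, threshold = 0.30, 100, 0.40

std_dev = math.sqrt(p * (1 - p) / n)  # standard deviation of p-hat
z = (threshold - p) / std_dev         # z-score of the 40% threshold
prob = 1 - normal_cdf(z)              # upper-tail probability

print(f"z = {z:.2f}, P(p_hat >= {threshold}) = {prob:.4f}")
```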

Let's see one extra example.

A company claims that the average lifetime of their lightbulbs is $$2\,000$$ hours with a standard deviation of $$300$$ hours. What is the probability that a random sample of $$50$$ lightbulbs have an average lifetime of less than $$1\,900$$ hours?

Solution:

(1) Since the sample size is $$n=50$$, according to the Central Limit Theorem, the sampling distribution of the mean $$\overline{x}$$ approximately follows a normal distribution with mean $$\mu_\overline{x}=2\,000$$ and standard deviation $\sigma_\overline{x}=\frac{300}{\sqrt{50}} \approx 42.426.$

(2) Converting the $$\overline{x}$$ into $$z$$-scores and using the standard normal table (see the article Standard Normal Distribution for more information), you will have

\begin{align} P(\overline{x}<1\,900) &=P\left(z<\frac{1\,900-2\,000}{42.426}\right) \\ &\approx P(z<-2.36) \\ &\approx 0.0091. \end{align}

Thus, the probability that a sample of $$n=50$$ lightbulbs has an average lifetime of less than $$1\,900$$ hours is approximately $$0.0091$$.
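This probability can be cross-checked numerically. The sketch below again builds the standard normal CDF from `math.erf` (the helper name `normal_cdf` is an assumption of this example). Without intermediate rounding the $$z$$-score is about $$-2.357$$, so the result lands near $$0.009$$, in line with the table lookup.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n, threshold = 2_000, 300, 50, 1_900

std_error = sigma / math.sqrt(n)   # standard deviation of the sample mean
z = (threshold - mu) / std_error   # z-score of the 1,900-hour threshold
prob = normal_cdf(z)               # lower-tail probability

print(f"standard error = {std_error:.3f}")
print(f"z = {z:.3f}, P(x_bar < {threshold}) = {prob:.4f}")
```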

## Sampling Distribution - Key takeaways

• A sampling distribution shows every possible value of a statistic that can be obtained from every possible sample of a given size from the population.
• The sampling distribution of proportion $$\widehat{p}$$ has mean and standard deviation $\mu_\widehat{p}=p\, \text{ and } \,\sigma_\widehat{p}=\sqrt{\frac{p(1-p)}{n}}.$
• When $$np\geq 10$$ and $$n(1-p)\geq 10,$$ the sampling distribution of proportion $$\widehat{p}$$ behaves like a normal distribution.
• The sampling distribution of mean $$\overline{x}$$ has mean and standard deviation $\mu_\overline{x}=\mu\,\text{ and }\, \sigma_\overline{x}=\frac{\sigma}{\sqrt{n}}.$
• When $$n\geq 30$$ (a common rule of thumb), the Central Limit Theorem implies that the sampling distribution of mean $$\overline{x}$$ behaves approximately like a normal distribution.

A sampling distribution is a statistical tool that helps to determine the probability of an event or another statistical parameter in a population based on taking random and small samples of it.

✓ Sampling distribution of proportions

✓ Sampling distribution of means

✓ T-distribution

To find the sampling distribution, follow these steps:

1. select random samples of fixed size from the population;
2. obtain your data and summarize;
3. plot the distribution of the summary data.
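The three steps above can be sketched as a short simulation. Everything here is illustrative: the population is synthetic (drawn from a skewed exponential distribution chosen only for the demonstration), the sample size and number of samples are arbitrary, and the "plot" is a plain text histogram so that no plotting library is needed.

```python
import random
import statistics
from collections import Counter

rng = random.Random(0)

# Hypothetical population: 5,000 values from a skewed distribution with mean near 50.
population = [rng.expovariate(1 / 50) for _ in range(5_000)]

# Step 1: select random samples of fixed size from the population.
sample_size, num_samples = 40, 1_000
samples = [rng.sample(population, sample_size) for _ in range(num_samples)]

# Step 2: summarize each sample, here with its mean.
sample_means = [statistics.mean(s) for s in samples]

# Step 3: plot the distribution of the summaries (bins of width 5).
bins = Counter(5 * int(m // 5) for m in sample_means)
for left in sorted(bins):
    print(f"{left:3d}-{left + 5:<3d} {'#' * (bins[left] // 10)}")

print(f"population mean      = {statistics.mean(population):.2f}")
print(f"mean of sample means = {statistics.mean(sample_means):.2f}")
```

Even though the population is skewed, the histogram of sample means should look roughly bell-shaped and be centered close to the population mean.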

✓ The sample mean is an unbiased estimator of the population mean.

✓ The sample means are centered on, or close to, the true population mean.
✓ The distribution is approximately normal, with a symmetric shape, when the sample size is large enough (at least 30, as a common rule of thumb for applying the Central Limit Theorem).

The sampling distribution allows you to determine information about an entire population using only information from small samples.

## Final Sampling Distribution Quiz

Question

To use the normal distribution to model a sampling distribution of mean, the following condition regarding the sample size must be satisfied:

$$n\geq 30$$.

Question

The standard deviation of the sampling distribution of the proportion $$\widehat{p}$$ can be calculated using the formula ____.

$$\sigma_\widehat{p}=\sqrt{\frac{p(1-p)}{n}}$$

Question

The ____ is a statistical tool that helps to calculate the probability of an event by sampling a small group repeatedly instead of sampling an entire population.

sampling distribution

Question

To use the normal distribution to model a sampling distribution of proportion, the following condition must be satisfied:

$$np\geq 10$$ and

$$n(1-p)\geq 10$$.

Question

The standard deviation of the sampling distribution of the mean $$\overline{x}$$ can be calculated using the formula ____.

$$\sigma_\overline{x}=\frac{\sigma}{\sqrt{n}}$$

Question

Mention the 3 types of sampling distributions.

Proportions, means, and T-distribution.

Question

This type of sampling distribution is used with small samples.

T-distribution

Question

When the data produces a bell-shaped curve, it is said to follow a ____ distribution.

normal

Question

When the normality condition is satisfied, the sampling distribution of proportions follows a normal distribution with mean and standard deviation given by

$$\mu_\widehat{p}=p$$ and $$\sigma_\widehat{p}=\sqrt{\frac{p(1-p)}{n}}$$.

Question

When the normality condition is satisfied, the sampling distribution of means follows a normal distribution with mean and standard deviation given by

$$\mu_\overline{x}=\mu$$ and $$\sigma_\overline{x}=\frac{\sigma}{\sqrt{n}}$$.

Question

The standard deviation of the sampling distribution of means is also known as the ___.

standard error of the mean

Question

What does the randomization condition mean?

The collected data comes from samples randomly selected.

Question

What does the independence condition ($$10\%$$ condition) mean?

The sampled values must be independent one from another.

Question

The sampling variability can be reduced by _____.

increasing the sample size

Question

Sampling variability is also known as ____.

sampling error

Question

The standard deviation of the sampling distribution of proportion is given by

$$\sigma_\widehat{p}=\sqrt{\frac{p(1-p)}{n}}.$$

Question

How do you calculate a sample proportion?

$$\widehat{p}=\dfrac{\text{number of successes in the sample}}{n}$$.

Question

A restaurant wants to know how many customers order dessert. They asked 50 customers, of which 23 said they do order dessert. Which notation is correct to represent this proportion?

$$\widehat{p}=23/50$$.

Question

What does the $$10\%$$ condition mean?

The population must be at least $$10$$ times the sample size.

Question

If you know the population proportion and the sample size, can you calculate the standard deviation of the sample proportion?

Yes.

Question

This is the normality condition for sample proportions

$$np\geq 10$$ and

$$n(1-p)\geq 10$$.

Question

The sample proportion can only take values from $[0,1].$

True.

Question

$$P(\widehat{p}<0.11)$$.

Question

If the sampling distribution of a proportion $$\widehat{p}$$ is normally distributed, how do you convert a value $$\widehat{p}$$ into a $$z$$-value?

$$z=\dfrac{\widehat{p}-\mu_\widehat{p}}{\sigma_\widehat{p}}$$.

Question

What does the randomization condition mean?

Samples must be randomly selected.

Question

What are the mean and standard deviation of the sampling distribution for samples of size 40 trips if the population mean of the number of fish caught each trip to a given fishing hole is 3.2 and the population standard deviation is 1.8?

mean = 3.2 and standard deviation = 0.285

Question

What is the Central Limit Theorem?

The Central Limit Theorem is an important theorem in statistics that involves approximating a distribution of sample means to the normal distribution.

Question

What is the minimum sample size to consider when using the Central Limit Theorem?

30

Question

How can you construct a distribution of sample means?

By drawing many samples of the same size from the same population and calculating the mean of the attribute you're interested in, you form a list of means from those samples that may become a distribution of sample means.

Question

What are two important conditions for the Central Limit Theorem?

Two important conditions are randomness and a sufficiently large sample size.

Question

What important concepts does the Central Limit Theorem involve?

There are two important concepts that the Central Limit Theorem involves: a distribution of sample means and the normal distribution.

Question

The Central Limit Theorem applies to any distribution with many samples, be it known, like a binomial, a uniform, or a Poisson distribution, or an unknown distribution. True or false?

True.

Question

What does the Central Limit Theorem tell us?

The Central Limit Theorem says that if you take random samples of a sufficiently large size from any distribution with a finite mean and standard deviation, the distribution of the sample means can be approximated by the normal distribution.

Question

State the formula for the Central Limit Theorem.

For $$X$$ with mean $$\mu$$ and standard deviation $$\sigma$$, if $$n\ge 30$$, then the sample mean $$\bar{X}$$ satisfies $$\bar{X}\approx N\left (\mu, \frac{\sigma}{\sqrt{n}} \right)$$.

Question

The Central Limit Theorem is useful in making significant inferences about the population from a sample. It can be used to tell whether two samples were drawn from the same population, and also check if the sample was drawn from a certain population. True or False?

True.

Question

The mean of the sampling distribution of proportion is given by

$$\mu_\widehat{p}=p$$.

Question

What is an estimator?

It is a statistic used to estimate a parameter; the value it takes for a particular sample is the point estimate.

Question

If the expected value of the estimator is equal to the parameter, what statement is true?

bias = 0

Question

What is true about the maximum likelihood function?

derivative = 0

Question

What is an example of a Bernoulli distribution?

Throwing a coin

Question

What is an example of a Poisson distribution?

The number of cars going past a school in 10 minutes

Question

True or False: The advantage of point estimation is that you don't know how close or how far away from the true value of the parameter the estimator is.

False.

Question

When an estimator is both consistent and unbiased, you have what is called:

The best-unbiased estimator.

Question

Two important properties of estimators are ___ and ____.

Consistency and unbiasedness.

Question

The point estimate of the population mean $$\mu$$ is:

The sample mean $$\bar{x}$$.

Question

What is point estimation?

Point estimation is the use of statistics taken from one or several samples to estimate the value of an unknown parameter of a population.

Question

If the distribution is Poisson, how do we find $$P(X = 7 \mid \lambda = 2)$$?

Use the Poisson distribution with $$\lambda = 2$$ and find $$P(X \leq 7) - P(X \leq 6)$$.

Question

What is true about the likelihood function?

Product of all the probabilities at a particular parameter.

Question

In instances where it is difficult to collect data on each element of a population, the Central Limit Theorem won't be useful to approximate the features of the population. True or False?

False.

Question

The Central Limit Theorem allows approximating any distribution, for a large sample size, to the binomial distribution. True or False?

False.
