Comparing Two Means Hypothesis Testing

When facing different scenarios, you will need to adapt your hypothesis testing method. One scenario that frequently arises is one where you wish to test whether there is a difference between two means. You might have done this already using the normal distribution. But what happens if you don't know the variances of these populations and your sample sizes are small?

That's where the \(t\)-distribution comes in. This article will take you through a hypothesis test for the difference in means of two independent, normally distributed populations.

Comparing Two Means: Hypothesis Testing Overview

The \(t\)-distribution can also be used to test the means of two independent normal distributions when the variances are unknown and the sample sizes are small. To do so, you will need to assume the populations have the same variance and therefore need to use a pooled estimate of variance.

For a reminder on the \(t\)-distribution and its properties, see the article T-distribution.

Unlike the paired \(t\)-test, where you are comparing the results of an experiment before and after some treatment, here you are comparing two independent normal distributions.

Describe the kind of hypothesis test you would use in the following scenarios.

1. A mobile phone company has released a new software update. They have asked you to find statistical evidence to support their claim that the software update has improved battery life.

2. A pet store sells Welsh Corgi puppies from two different breeders. They wish to determine whether there is a significant difference between the weights of the puppies from each breeder.

Solution

1. In order to conduct this experiment, you would need to collect samples of information on phone battery life before and after the software update. Since the samples will be taken from the same population after a change has been made, they are not independent. Therefore, you need to use a paired t-test.

2. In this case, you would take samples of weights from two different breeders, so the samples come from two independent distributions. Assuming that the populations have the same variance, you would use a test for the difference of two means with a pooled estimate of variance, and not a paired t-test.

Hypothesis testing for the difference of two means

The hypothesis test for the difference of two means follows these steps:

  1. Find the null hypothesis and alternative hypothesis, \(H_0\) and \(H_1\).

  2. Determine the significance level from the question, \(\alpha\).

  3. Determine the number of degrees of freedom, \(\upsilon\).

  4. Find the critical region.

  5. Calculate the pooled estimate of the variance, \(s^2_p\).

  6. Calculate \(t\).

  7. Compare the value of \(t\) with your critical region and state your conclusion, addressing whether the result is significant, and what this means in the context of the question.

Next let's take a look at the hypotheses you will need to do the test.

Null hypothesis for comparing two means

While comparing two means, your null hypothesis will state that the difference between the two populations you are testing is equal to zero. In other words, the null hypothesis is that there is no difference in the population means.

Samples are taken from two distributions, \(X\) and \(Y\), under the assumption that they are independent and normally distributed.

To perform a hypothesis test for the difference between the means of these distributions, use the following null hypothesis,

\[H_0:\, \mu _x =\mu _y.\]

What about the alternative hypothesis?

Alternative hypothesis for comparing two means

The alternative hypothesis for comparing two means will depend on whether you wish to test whether one particular distribution is greater than the other (a one-tailed test), or simply whether there is any difference at all (a two-tailed test).

When using a two-tailed test, remember to divide the significance level between the two tails!

Remember to read the question carefully to determine which sort of alternative hypothesis to use.

Samples are taken from two distributions, \(X\) and \(Y\), under the assumption that they are independent and normally distributed.

In the case that you wish to test whether the means are different (that is a two-tailed test), you will have the following alternative hypothesis,

\[H_1:\, \mu _x \neq \mu _y.\]

In the case that you wish to test whether the mean of \(X\) is greater than the mean of \(Y\) (that is a one-tailed test), you will have the following alternative hypothesis,

\[H_1:\, \mu _x > \mu _y.\]
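The choice of alternative hypothesis changes which tail (or tails) make up the critical region. A minimal sketch of that decision rule is given here, assuming the critical value has already been read from tables as a positive number. The function name `in_critical_region` and the labels `"two-sided"`, `"greater"` and `"less"` are illustrative choices, not part of the article, and the \(t\) values passed in at the end are made up purely to exercise the function.

```python
def in_critical_region(t_stat, critical_value, alternative):
    """Return True when t_stat falls in the critical region.

    critical_value is the positive table value:
    - for "two-sided", looked up with alpha / 2 in each tail;
    - for "greater" or "less", looked up with alpha in one tail.
    """
    if critical_value <= 0:
        raise ValueError("critical_value must be positive")
    if alternative == "two-sided":   # H1: mu_x != mu_y
        return abs(t_stat) > critical_value
    if alternative == "greater":     # H1: mu_x > mu_y
        return t_stat > critical_value
    if alternative == "less":        # H1: mu_x < mu_y
        return t_stat < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Hypothetical t values, used only to show the three cases.
print(in_critical_region(2.5, 1.895, "two-sided"))
print(in_critical_region(1.5, 1.363, "greater"))
print(in_critical_region(-0.5, 1.363, "less"))
```

Keeping the table value positive and flipping the sign inside the function mirrors the way printed \(t\)-tables are usually laid out.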

Next let's see some of the calculations involved.

Comparing Two Population Means Hypothesis Testing: Calculations

When testing for the difference between means, there are some extra calculations that you'll need to perform to find the pooled estimate of the variance and the value of \(t\) that you wish to test.

Using sample variances, \(s^2_x\) and \(s^2_y\), and the size of each sample, \(n_x\) and \(n_y\), the pooled estimate of the variance is given by the formula

\[s^2_p=\frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)}.\]
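As a rough illustration, the pooled estimate can be computed directly from two samples. The sketch below assumes Python's standard `statistics.variance`, which uses the \(n-1\) divisor and so matches \(s^2_x\) and \(s^2_y\) as used here. The function name `pooled_variance` and the sample values are hypothetical.

```python
from statistics import variance


def pooled_variance(sample_x, sample_y):
    """Pooled estimate of variance for two independent samples."""
    n_x, n_y = len(sample_x), len(sample_y)
    if n_x < 2 or n_y < 2:
        raise ValueError("each sample needs at least two observations")
    s2_x = variance(sample_x)  # sample variance, divisor n_x - 1
    s2_y = variance(sample_y)  # sample variance, divisor n_y - 1
    # Weight each sample variance by its degrees of freedom.
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / ((n_x - 1) + (n_y - 1))


# Made-up data, for illustration only.
print(pooled_variance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]))
```

Because each variance is weighted by its degrees of freedom, the larger sample has more influence on the pooled estimate.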

Once you have found \(s^2_p\), you will need to calculate the test statistic, \(t^*\), that you will compare with the critical value.

Given sample means and variances \(\bar{x}\), \(\bar{y}\), \(s^2_x\) and \(s^2_y\) and the pooled estimate of variance \(s^2_p\), the test statistic \(t^*\) is:

\[t^*=\frac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}.\]
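The formula translates almost term by term into code. The following is a sketch only, working from summary statistics rather than raw data. The name `pooled_t_statistic` and the parameter `hypothesised_diff` (standing for \(\mu_x-\mu_y\) under \(H_0\)) are assumptions made for this example, as are the numbers in the call.

```python
from math import sqrt


def pooled_t_statistic(mean_x, mean_y, s2_p, n_x, n_y, hypothesised_diff=0.0):
    """t* for the difference of two means with pooled variance s2_p.

    hypothesised_diff is the value of mu_x - mu_y under the null
    hypothesis; it is 0 when H0 states that the means are equal.
    """
    standard_error = sqrt(s2_p * (1 / n_x + 1 / n_y))
    return ((mean_x - mean_y) - hypothesised_diff) / standard_error


# Made-up summary statistics, for illustration only.
print(pooled_t_statistic(10.0, 8.0, 4.0, 8, 8))
```

Leaving `hypothesised_diff` at its default of zero covers the usual null hypothesis \(\mu_x=\mu_y\).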

Hypothesis Testing Two Population Means Examples

Next, let's look at a couple of examples on how to use and calculate these statistics within an actual hypothesis test.

A pet store sells Welsh Corgi puppies on behalf of two puppy breeders, \(X\) and \(Y\). They have sampled the weights of puppies from each breeder.

Fig. 1 - Puppies always make math better!

Weights of puppies from breeder \(X\) in kilograms: \(5.44,5.32,5.21,5.67.\)

Weights of puppies from breeder \(Y\) in kilograms: \(5.02,4.99,5.42,5.21,5.11.\)

The pet store wishes to know whether there is a statistically significant difference between the weights of the puppies from each breeder.

a. If you wanted to test the difference in the weights of the puppies, what assumptions need to be made?

b. Test whether the mean weights of puppies from the two breeders are different at the \(10\%\) significance level.

Solution

a. In order to test the difference in the weights of the puppies, the assumptions to be made are that the populations of puppy weights are normally distributed, that the samples are independent, and that the two populations have the same variance.

b. The test is two-tailed, so the hypotheses are,

\[ \begin{align} &H_0:\, \mu _x=\mu _y \\ &H_1: \,\mu _x \neq \mu _y.\end{align}\]

This is a two-tailed test since the alternative hypothesis is that the mean weights are different. The significance level is \(10\)%, so the critical region will have the probability of \(0.05\) in each tail of the distribution.

The number of degrees of freedom is

\[\upsilon = (4-1)+(5-1)=7.\]

To find degrees of freedom in this case, you need to add together the degrees of freedom from each sample. Or, you can use the formula \(\upsilon = n_x+n_y-2\).

The critical value can be found using a calculator or probability tables:

\[t_{\upsilon =7}(0.05)=1.895.\]

Next, find the pooled estimate of variance. You should have \(\bar{x}=5.41\) and \(\bar{y}=5.15.\)

The sample variances are \(s^2_x=0.038866667 \) and \(s^2_y=0.03015\).

Therefore, the pooled estimate of variance is,

\[\begin{align} s^2_p &= \frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)} \\&= \frac{(4-1)0.038867 +(5-1)0.03015 }{(4-1)+(5-1)} \\&=0.033886 \text{ to 5 s.f.} \end{align}\]

Your value of \(t^*\) is then:

\[\begin{align} t^*&=\frac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}\\&=\dfrac{(5.41-5.15)-(0)}{\sqrt{0.033886\left(\dfrac{1}{4}+\dfrac{1}{5}\right)}}\\&=2.1055\end{align}\]

Since \(t^*=2.1055>1.895=t_\upsilon\), your value of \(t^*\) falls within the critical region. Therefore, at the \(10\)% significance level, you can reject the null hypothesis.

In conclusion, there is evidence to suggest there is a difference between the means of the weights of Welsh Corgi puppies from the two breeders.
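The puppy example can be checked numerically. This sketch recomputes every quantity from the raw weights without intermediate rounding, using only Python's standard library. The critical value \(1.895\) is typed in from the tables as above rather than computed, and all variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Puppy weights in kilograms, as listed in the example.
x = [5.44, 5.32, 5.21, 5.67]
y = [5.02, 4.99, 5.42, 5.21, 5.11]

n_x, n_y = len(x), len(y)
dof = n_x + n_y - 2  # degrees of freedom

# Pooled estimate of variance (statistics.variance uses the n - 1 divisor).
s2_p = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / dof

# Test statistic under H0: mu_x = mu_y, so the hypothesised difference is 0.
t_stat = (mean(x) - mean(y)) / sqrt(s2_p * (1 / n_x + 1 / n_y))

# Two-tailed test at the 10% level: 0.05 in each tail, 7 degrees of freedom.
critical_value = 1.895  # taken from t-tables, not computed here
reject_h0 = abs(t_stat) > critical_value

print(f"sample means: {mean(x):.2f} and {mean(y):.2f}")
print(f"pooled variance: {s2_p:.6f}")
print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Running a check like this is a quick way to catch arithmetic slips in the sample means or variances before drawing a conclusion.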

This second example is slightly different to the first. The method will need to be adapted slightly.

A food delivery service, \(A\), claims that their average food delivery time is more than \(5\) minutes faster than the delivery time of their competitor, \(B\).

A random sample of delivery times from each company is collected:

  • Food delivery time for \(A\), in minutes: \(22,16,45,23,39,32.\)
  • Food delivery time for \(B\), in minutes: \(34,42,63,18,25,46,47.\)

Food delivery service \(B\) hires you to test whether this claim is statistically significant at the \(10\%\) significance level. Complete a hypothesis test for the difference between means and explain what this means for the two food delivery services.

Solution

Since the samples are independent, the null hypothesis would normally be that the two means are the same. However, the claim is that service \(A\) averages more than \(5\) minutes faster than its competitor, so the null hypothesis is instead \(\mu _A=\mu _B -5 \). Since you are only interested in whether the mean delivery time for \(A\) is more than \(5\) minutes below that for \(B\), the hypotheses are:

\[ \begin{align} &H_0:\,\mu _A=\mu _B -5 \\ &H_1: \,\mu_A < \mu _B-5. \end{align}\]

This is a one-tailed test. The significance level is \(10\)%, so the critical region will have the probability of \(0.10\) in the left tail of the distribution.

The number of degrees of freedom is

\[\upsilon = (6-1)+(7-1)=11.\]

The critical value can be found using a calculator or probability tables,

\[t_{\upsilon =11}(0.10)=1.363.\]

Since you are only interested in whether \(\mu _a\) is less than \(\mu _b -5\), the critical value is \(t_\upsilon = -1.363\).

If the alternative hypothesis had been greater than, you would have used \(t_\upsilon = 1.363\) instead.

Next, find the pooled estimate of variance. You have \(\bar{a}=29.5\) and \(\bar{b}=39.3\). The samples variances are \(s^2_a=123.50 \) and \(s^2_b=226.57\). Therefore, the pooled estimate of variance is:

\[\begin{align} s^2_p &= \frac{(n_a-1)s^2_a+(n_b-1)s^2_b}{(n_a-1)+(n_b-1)} \\&= \frac{(6-1)123.50 +(7-1)226.57 }{(6-1)+(7-1)} \\&=179.72\text{ to 5 s.f.} \end{align}\]

The value of \(t^*\) is therefore,

\[\begin{align} t^*&=\frac{(\bar{a}-\bar{b})-(\mu _a - \mu _b)}{\sqrt{s^2_p\left(\dfrac{1}{n_a}+\dfrac{1}{n_b}\right)}}\\&=\dfrac{(29.5 -39.3)-(-5)}{\sqrt{179.72 \left(\dfrac{1}{6}+\dfrac{1}{7}\right)}}\\&=-0.64357.\end{align}\]

Since the null hypothesis states that \(\mu _A=\mu _B-5\), the term \(\mu _a-\mu _b\) in the formula for \(t^*\) is equal to \(-5\).

Since \(t^*=-0.64357>-1.363=t_\upsilon \), the value of \(t\) falls within the acceptance region. Therefore, at the \(10\%\) significance level, you fail to reject the null hypothesis.

This means that there is not sufficient evidence to suggest that the average delivery time of service \(A\) is more than \(5\) minutes faster than that of delivery service \(B\).
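The delivery example differs from the first in two ways: the hypothesised difference is \(-5\) rather than \(0\), and the critical region lies in the left tail only. A sketch of both adaptations follows. The helper name `pooled_t` and its `hypothesised_diff` parameter are assumptions for illustration, and the critical value is typed in from the tables. Because the code keeps full precision for \(\bar{b}\) instead of the rounded \(39.3\), its \(t\) value can differ from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_x, sample_y, hypothesised_diff=0.0):
    """Return (t statistic, degrees of freedom) for a pooled two-sample test."""
    n_x, n_y = len(sample_x), len(sample_y)
    dof = n_x + n_y - 2
    s2_p = ((n_x - 1) * variance(sample_x)
            + (n_y - 1) * variance(sample_y)) / dof
    standard_error = sqrt(s2_p * (1 / n_x + 1 / n_y))
    t_stat = ((mean(sample_x) - mean(sample_y)) - hypothesised_diff) / standard_error
    return t_stat, dof


# Delivery times in minutes, as listed in the example.
a = [22, 16, 45, 23, 39, 32]
b = [34, 42, 63, 18, 25, 46, 47]

# H0: mu_A = mu_B - 5, so mu_A - mu_B = -5 under the null hypothesis.
t_stat, dof = pooled_t(a, b, hypothesised_diff=-5)

# Left-tailed test at the 10% level with 11 degrees of freedom.
critical_value = -1.363  # negative of the table value, not computed here
reject_h0 = t_stat < critical_value

print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
print("reject H0" if reject_h0 else "fail to reject H0")
```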

For a more detailed explanation of the pooled estimate of variance, check out the article Pooled Estimate of Variance.

Comparing Two Means Hypothesis Testing - Key takeaways

  • The \(t\)-distribution can be used to test the means of two independent normal distributions when the variances are unknown
  • The assumptions are that the populations are independent, normal and have the same variance
  • The pooled estimate of variance formula is \[s^2_p=\frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)}.\]
  • The \(t^*\) value is \[t^*=\dfrac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}.\]

Frequently Asked Questions about Comparing Two Means Hypothesis Testing

It depends on if the samples are independent or not.  If they are not independent then you can use a paired t-test.  If they are independent then you can use a test for the difference of two means.

If the two samples are independent, then the null hypothesis is that the difference in their means is zero.

The two means are significantly different if the calculated \(t\) value falls in the critical region for the significance level selected for the hypothesis test.

Assuming that the samples are independent, the null hypothesis will be that the difference in the means is zero. The alternative hypothesis will depend on whether you want to see if one mean is larger than the other, or if they are just different from each other.

A comparison of means test is a kind of hypothesis test done when you have two independent samples and it uses a pooled estimate of variance.

Final Comparing Two Means Hypothesis Testing Quiz

Question

The t-distribution can be used to test the means of two independent normal distributions when the variances are known.

Answer

False.

Question

To test the difference between two means, the distributions need to...

Answer

...be independent normal distributions with unknown variances where the sample sizes are small.

Question

What are the assumptions to test for the difference between two means using the \(t\)-distribution?

Answer

The assumptions are that:

1. the two populations are independent;

2. they are normally distributed; and

3. they have the same variance.

Question

What is the formula for the pooled estimate of variance?

Answer

\(s^2_p=\frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)}\).

Question

What is the formula for the \(t\) value when testing for the difference between two means?

Answer

\(t=\frac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\frac{1}{n_x}+\frac{1}{n_y}\right)}}\).

Question

What is the difference between the paired \(t\)-test and testing for the difference between two means?

Answer

The paired test is used when comparing the results of an experiment before and after some treatment. The difference between two means tests for differences between two independent distributions.

Question

What is the first step in performing a hypothesis test?

Answer

Finding the null hypothesis and alternative hypothesis, \(H_0\) and \(H_1\).

Question

Samples are taken from two distributions, \(X\) and \(Y\), under the assumption that they are independent and normally distributed.  When performing a hypothesis test for the difference between the means, what should your null hypothesis be? 

Answer

The null hypothesis should be that there is no difference between the means, i.e.

\(H_0:\mu _x =\mu _y\).

Question

Samples are taken from two distributions, \(X\) and \(Y\). They are independent and normally distributed. You suspect that the means of the distributions are different. What should your alternative hypothesis be? 

Answer

\(H_1: \mu _x \neq \mu _y\).

Question

Samples are taken from two distributions, \(X\) and \(Y\). They are independent and normally distributed.  You suspect that the mean of \(Y\) is larger than that of \(X\). What should your alternative hypothesis be? 

Answer

\(H_1: \mu _x < \mu _y\).

Question

What is the final stage of performing a hypothesis test?

Answer

You must compare the \(t\)-value with your critical region and state your conclusion, addressing firstly whether the result is significant and secondly what this means in the context of the question.

Question

With sample variances \(s^2_x\) and \(s^2_y\), with sample sizes \(n_x\) and \(n_y\), the pooled estimate of the variance is:

Answer

\(\dfrac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)}\).

Question

If you have the sample means and variances \(\bar{x}\), \(\bar{y}\), \(s^2_x\) and \(s^2_y\) and the pooled estimate of variance \(s^2_p\), the value of \(t\) is:

Answer

\(t=\dfrac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}\).

Question

What formula can you use to find degrees of freedom of the pooled samples \(X\) and \(Y\)?

Answer

\(\upsilon = n_x+n_y-2\).

Question

Which of the following statements are true for paired \(t\)-tests and the test for the difference between two means?

Answer

They both use the \(t\)-distribution.
