# Inferences in Statistics

In looking at large data sets, the idea is to find out what information you can gather on particular populations, given the tremendous amounts and kinds of data gathered. But how do you know what kind of conclusions drawn from the data are valid and which are mostly guesses? The process of making inferences in statistics can help you decide!

## The Meaning of Inferences in Statistics

Statistics is defined as a discipline in applied mathematics concerned with the systematic study of the collection, presentation, analysis, and interpretation of data. The collection and analysis of data using different techniques and methods are called descriptive statistics. Now, after describing data with various techniques, what's next? That’s where inferences in statistics come in. Inferential statistics is the branch of statistics that deals with making the right conclusions, interpretations, and predictions from the analyzed data.

Inferences in statistics are techniques employed to examine the results of data to arrive at conclusions, interpretations, and predictions. Inferences in statistics are also referred to as inferential statistics or statistical inference.

Inferences in statistics can help you make predictions and conclusions about the populations you are looking at by interpreting the results of random samples from that population. The two main applications of inferential statistics that help us to draw these conclusions are hypothesis testing and confidence intervals of the data.

Statistical inferences are dependent on three main components:

• the size of samples;

• variability in the samples; and

• the size of the observed differences.

In general, you need a way to talk about the difference between the entire group you are looking at and the specific people who answered a survey or were part of a study.

The population refers to a group of units (persons, objects, or other items) enumerated in a census or from which a sample is drawn.

You can look at various sub-groups of the population. These sub-groups are referred to as samples.

A sample is defined as a subset of a population selected for measurement, observation, or questioning, to provide statistical information about the population.

To conduct statistical inference, the following conditions must be met:

1. The data for the experiment should be obtained through random samples or randomized experiments

2. The distribution of the sample means must be approximately normal

3. Individual observations must be independent
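
To make the second condition concrete, here is a minimal simulation sketch. It assumes a hypothetical skewed population (exponential with mean 1) and made-up values for `SAMPLE_SIZE` and `NUM_SAMPLES`; none of these come from the article. It draws many independent random samples and checks that the sample means cluster around the population mean with a spread close to $$\sigma / \sqrt{n}$$.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 40    # hypothetical number of observations per sample
NUM_SAMPLES = 2000  # hypothetical number of repeated random samples


def sample_mean(n):
    """Mean of n independent draws from an exponential population with mean 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


# Each entry is the mean of one independent random sample.
sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means:    {mean_of_means:.3f}")
print(f"std dev of sample means: {sd_of_means:.3f}")
print(f"sigma / sqrt(n):         {1 / SAMPLE_SIZE ** 0.5:.3f}")
```

Even though individual observations are skewed, the sample means behave approximately normally, which is what the inference methods in this article rely on.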

## Methods for Inference in Statistics

There are two main methods for making inferences in statistics: hypothesis tests and confidence intervals. Hypothesis tests assess whether sample data provide enough evidence to reject a statement about a population. Confidence intervals give a range within which the value of a parameter is expected to be found, with a stated degree of assurance.

These are general steps that could be followed to make statistical inferences:

1. Plan and design your study

2. Collect data

3. Analyze data

4. Interpret the results

5. Present the results.

Let's look at a quick example of these steps.

There are fifty states in the United States, and overall, a population of more than 300 million people. Let's say the government wants to determine the average age of the population to gain insights into changing population conditions and social and economic trends.

Planning the study: They definitely cannot go door to door asking the age of every single person in the United States! However, they can use more strategic ways and statistical inferences to arrive at values and facts very close to or equal to that of the population, and this will form the plan and strategy of the study. One of the things the government would need to be concerned with is sources of bias in surveys, to ensure the accuracy of the data.

Collecting Data: This can involve looking at census data or taking a random sample of people in the United States and asking the ages of people in their families. Take a look at the articles Random Sampling and Survey Sampling Methods for more information.

Analyzing the Data: The goal is to estimate the average (or mean) of a population, so the appropriate analysis would be a confidence interval or hypothesis test for a population mean.

Interpret the Results: This step is especially important! Often you will see things like "the approval rating is $$54 \% \pm 3\%$$", meaning that the rating isn't exactly $$54\%$$, but they can say with a stated degree of confidence that it is within $$3$$ percentage points of $$54\%$$. Being able to justify the claims you make is a big part of inference in statistics.

Presenting the Results: Once the average age is determined, it needs to be presented in such a way that other people (newscasters, bloggers, etc.) can understand it and explain it to other people.
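
As a rough illustration of where a statement like "$$54\% \pm 3\%$$" in the interpretation step can come from, the sketch below computes a normal-approximation interval for a proportion. The sample size `n=1067` and the 95% confidence level are assumptions chosen for illustration; the article does not state them, and `proportion_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    # Critical value that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# Assumed poll: 54% approval among 1067 randomly sampled respondents.
low, high, margin = proportion_interval(p_hat=0.54, n=1067)
print(f"margin of error: {margin:.3f}")
print(f"interval: ({low:.3f}, {high:.3f})")
```

With these assumed inputs the margin of error is about 0.03, that is, roughly 3 percentage points either side of 54%.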

## Types of Inferences in Statistics

Inferences in statistics can be done in several ways, with one of the most frequently seen being hypothesis testing.

A hypothesis is an assumption taken to be true for argument or investigation. An example of a hypothesis would be that the president's approval rating has declined since last year.

Hypothesis testing refers to the process of testing these assumptions and drawing conclusions about parameters from a sample regarding the population. It is done to assess the credibility of a certain hypothesis using data from a sample.

You can look at the article Hypothesis Testing for further information on what a hypothesis really is and how the testing is done.

Another method used in inference is constructing confidence intervals. A confidence interval gives a range of values within which you can conclude, with a stated level of confidence, that the real value lies. You might have seen this in political commentary when someone says something along the lines of "the candidate is leading by $$18$$ points, plus or minus $$2$$". That would mean that they have constructed a confidence interval for the candidate's lead, and it runs from $$16$$ to $$20$$ percentage points. Depending on what you are measuring, you would construct one of the following kinds of intervals:

• Confidence Intervals for a Population Proportion
• Confidence Intervals for a Population Mean
• Confidence Intervals for the Difference of Two Proportions
• Confidence Intervals for the Slope of a Regression Model
• Confidence Intervals for the Difference of Two Means
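
As one instance of the interval types listed above, here is a minimal sketch of a confidence interval for a population mean. The `ages` list is invented sample data, and `mean_interval` is a hypothetical helper. It uses a normal critical value, which is only an approximation; for small samples a t critical value would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mean_interval(sample, confidence=0.95):
    """Approximate confidence interval for a population mean (normal critical value)."""
    n = len(sample)
    centre = fmean(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical random sample of 30 ages.
ages = [23, 35, 41, 29, 52, 38, 47, 31, 60, 44,
        27, 36, 55, 33, 49, 40, 25, 58, 39, 46,
        30, 42, 51, 34, 37, 45, 28, 53, 32, 48]

low, high = mean_interval(ages)
print(f"sample mean: {fmean(ages):.1f}")
print(f"approximate 95% interval: ({low:.1f}, {high:.1f})")
```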

As you have already seen in the article Data Analysis, sometimes the data collected isn't numerical. It could be categorical, such as in surveys. If you would like to draw inferences from categorical data, then you will generally use the Chi-Square Distribution. For more information on this kind of inference, see the article Inference for Distributions of Categorical Data.
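
To show what inference on categorical counts can look like, the sketch below computes a chi-square goodness-of-fit statistic for a die. The roll counts are made up for illustration, and the critical value 11.07 is the standard table value for 5 degrees of freedom at the 5% significance level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for faces 1 to 6 over 60 rolls.
observed_rolls = [8, 9, 11, 10, 7, 15]
# A fair die would give equal expected counts for every face.
expected_rolls = [sum(observed_rolls) / 6] * 6

statistic = chi_square_statistic(observed_rolls, expected_rolls)
degrees_of_freedom = len(observed_rolls) - 1
CRITICAL_VALUE_5_PERCENT = 11.07  # table value for 5 degrees of freedom

print(f"chi-square statistic: {statistic:.2f} with {degrees_of_freedom} df")
print("reject fairness" if statistic > CRITICAL_VALUE_5_PERCENT
      else "no evidence against fairness")
```

With these assumed counts the statistic is about 4.0, well below the critical value, so the data would not give evidence that the die is unfair.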

What do you think of when you hear the term causal inference?

Causal inference is the process of concluding that a particular treatment given to the independent variable was the cause of the effect observed in the dependent variable.

An academic field known as causal inference examines the presumptions, research plans, and estimating techniques that enable researchers to infer causal relationships from data. Here, the treatment given to the independent variable is known as the intervention, while the effect observed in the dependent variable is the outcome.

Causal inference is when one deduces that something is or is most likely to be the cause of another. For instance, one may assume that someone is (or was) playing piano based on the sound of piano music.

However, correlation may be mistaken for causation. When variables show a relationship or association, this should not be taken to mean that one directly affects the other, as there may be a third variable involved. For instance, the fact that cucumbers and tomatoes both have higher production in one year does not mean that the yield of one affects the yield of the other. They are both associated with another variable, which is climate.

Nonetheless, if deliberately changing one variable in a well-designed experiment consistently produces a corresponding increase or decrease in the other variable, then there is evidence of a cause-and-effect relationship between the two variables. There are ways to design experiments so that as many outside effects as possible are eliminated. For more information on these techniques, see Experiment Methods, Sources of Bias in Experiments, and Randomized Block Design.
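
The cucumber and tomato example can be sketched as a small simulation. Everything here is an assumption made for illustration: the simulated `climate` variable, the coefficients 2 and 3, and the noise levels. Neither crop influences the other in the simulation, yet their yields are strongly correlated because both depend on climate.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Simulated third variable that drives both yields.
climate = [random.gauss(0, 1) for _ in range(500)]
cucumbers = [2 * c + random.gauss(0, 0.5) for c in climate]
tomatoes = [3 * c + random.gauss(0, 0.5) for c in climate]

r_raw = pearson_r(cucumbers, tomatoes)

# Remove the (known, simulated) climate effect and correlate what is left.
cucumber_noise = [y - 2 * c for y, c in zip(cucumbers, climate)]
tomato_noise = [y - 3 * c for y, c in zip(tomatoes, climate)]
r_adjusted = pearson_r(cucumber_noise, tomato_noise)

print(f"correlation of yields:             {r_raw:.2f}")
print(f"correlation after removing climate: {r_adjusted:.2f}")
```

Once the shared climate effect is removed, the remaining correlation is close to zero, which is the pattern you would expect when a third variable explains the association.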

## Power Function in Statistics Inference

A power function maps the true value of a parameter to the probability of rejecting a null hypothesis about the value of that parameter. See the article Errors in Hypothesis Testing for more information on the types of errors in hypothesis tests and what can cause them.
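
As a hedged illustration, the sketch below evaluates a power function for an upper-tailed z-test of a mean with known standard deviation. The null mean of 50, standard deviation of 10, sample size of 25, and significance level of 0.05 are all assumed values, and `power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def power(true_mean, null_mean=50.0, sigma=10.0, n=25, alpha=0.05):
    """Probability of rejecting H0: mean <= null_mean when the mean is true_mean."""
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha)
    # How far the true mean sits from the null mean, in standard-error units.
    shift = (true_mean - null_mean) / (sigma / sqrt(n))
    return 1 - standard_normal.cdf(z_critical - shift)


for mu in (50, 52, 54, 56):
    print(f"true mean {mu}: power = {power(mu):.3f}")
```

When the true mean equals the null mean, the power equals the significance level, and it rises toward 1 as the true mean moves further into the alternative.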

## Examples of Inference in Statistics

Let's take a look at an example of inference in statistics.

Suppose you are interested in seeing if there is a relationship between the number of hours of sleep someone gets and how good their grades are. To answer the question, you select random people in your class (this would be your sample) and ask them how many hours of sleep they get in a night and what their grade in the class is. You can then use this sample of the whole class (the whole class is the population) to make a hypothesis about the number of hours of sleep and the relationship this has to grades and do a hypothesis test to check the results. From there, you can make an inference about the population based on your sample.

Let's look at another example.

A drug manufacturer wants to test a new product that they hope will cure cancer. Naturally, they first start by testing it on mice rather than people. They select one group of mice with cancer and a second group of mice without cancer. Some of each group get the new product (these are the treatment groups), and some do not (these are the control groups). They can then measure how the mice who get the drug fare and compare that with how the mice who didn't get the drug fare.

This is an example of conducting an experiment, and the manufacturer would need to do hypothesis testing with two samples to see if their drug is effective. From there, they can draw an inference and decide if they want to continue pursuing development of the drug.

## Inferences in Statistics - Key takeaways

• Inferences in statistics are techniques employed to examine the results of data and draw the right conclusions and interpretations in the presence of random variation. Inference in statistics is also referred to as inferential statistics or statistical inference.
• To conduct inferences in statistics, follow these steps:
1. Plan and design your study

2. Collect data

3. Analyze data

4. Interpret the results

5. Present the results.

• Hypothesis testing refers to the process of testing assumptions and drawing conclusions about parameters from a sample regarding the population.


An example of inference in statistics is claiming that A depends on B and using statistical methods to reject the null hypothesis that A and B are independent. This is specific to hypothesis testing as a type of inference in statistics.


## Final Inferences in Statistics Quiz

Question

What are the parameters that dictate the shape of a chi-square distribution?

The only parameter is the Degrees of Freedom, $$k$$.

Question

What is the range of a $$\chi^{2}_{k}$$ distribution?

The range is $$0$$ to $$\infty$$.

Question

What is the standard deviation of a $$\chi^{2}_{k}$$ distribution?

$$\sqrt{2k}$$.

Question

A chi-square distribution with $$4$$ degrees of freedom has a $$95\%$$ critical value of $$9.49$$.

True.

Question

A chi-square distribution with $$18$$ degrees of freedom has a $$10\%$$ critical value of $$25.99$$.

False.

Question

What is the mode of a $$\chi^{2}_{k}$$ distribution?

$$k - 2$$ if $$k \geq 2$$.

Question

What is the variance of a $$\chi^{2}_{k}$$ distribution?

$$\sigma^{2} = 2k$$.

Question

A chi-square distribution is a _____ distribution that becomes increasingly ____ as its degrees of freedom, $$k$$, increase.

non-symmetric, symmetrical.

Question

You have a chi-square distribution with a standard deviation of $$4$$. How many degrees of freedom does the distribution have?

1. Start with the formula for standard deviation: $\sigma = \sqrt{2k},$ where $k = \text{degrees of freedom}.$
2. Rearrange the formula to solve for $$k$$: $\dfrac{\sigma^{2}}{2} = k.$
3. Plug in the value for standard deviation: $k = \dfrac{(4)^{2}}{2}.$
4. Solve for $$k$$: $k=8.$

Question

Let $$Z_{i}$$ represent a standard normal random variable. What distribution does $$\sum^{15}_{i = 1} Z^{2}_{i}$$ follow?

$$\chi^{2}_{15}$$.
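
The claim that a sum of squared standard normal variables follows a chi-square distribution can be checked informally by simulation. The sketch below, which uses an assumed number of draws (`NUM_DRAWS`) and a hypothetical helper `chi_square_draw`, compares the simulated mean and variance with the theoretical values $$k$$ and $$2k$$.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

K = 15             # number of squared standard normals in each sum
NUM_DRAWS = 20000  # hypothetical number of simulated sums


def chi_square_draw(k):
    """One draw of the sum of k squared independent standard normal variables."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))


draws = [chi_square_draw(K) for _ in range(NUM_DRAWS)]

sample_mean = statistics.fmean(draws)
sample_variance = statistics.variance(draws)

print(f"simulated mean:     {sample_mean:.2f} (theory: {K})")
print(f"simulated variance: {sample_variance:.2f} (theory: {2 * K})")
```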

Question

What is the mean of a $$\chi^{2}_{9}$$ distribution?

$$\mu = k = 9$$.

Question

What is the mean of a $$\chi^{2}_{k}$$ distribution?

$$\mu = k$$.

Question

What distribution does

$Q = \chi^{2}_{6} + \chi^{2}_{11}$

follow?

$$Q \sim \chi^{2}_{17}$$, provided the two chi-square variables are independent.

Question

You want to know if people of different hair colors prefer different cuisines. Which chi-square test should you perform?

Chi-Square Test for Homogeneity

Question

You only have one categorical variable. What kind of chi-square test can you perform?

Chi-Square Test for Goodness of Fit

Question

Your friend seems to roll a lot of sixes...what kind of chi-square test can tell you if they are cheating?

Chi-Square Test for Goodness of Fit

Question

You work for a car dealership. You want to know if race plays a role in what model car people buy. You can get the data from your company's records, but what kind of chi-square test should you perform to answer your question?

Chi-Square Test for Independence

Question

Your contingency table has 5 columns and 5 rows. You are performing a chi-square test for homogeneity. What are the degrees of freedom?

16

Question

When considering whether gender and eye color are independent, you measure gender as either Male, Female, or Non-Binary. You also keep counts of 6 eye colors. How many degrees of freedom will your chi-square test have?

10
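
The degrees-of-freedom rule used in these answers, (rows - 1) × (columns - 1), can be sketched alongside the test statistic itself. The 3 × 6 table of counts below is invented purely for illustration, and `contingency_chi_square` is a hypothetical helper name.

```python
def contingency_chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts: 3 gender categories (rows) by 6 eye colors (columns).
counts = [
    [40, 30, 20, 16, 12, 10],
    [44, 28, 22, 14, 12, 10],
    [20, 16, 12, 10, 8, 6],
]

statistic, degrees_of_freedom = contingency_chi_square(counts)
print(f"chi-square statistic: {statistic:.2f} with {degrees_of_freedom} df")
```

The statistic would then be compared with a chi-square critical value for the computed degrees of freedom, taken from a table or statistical software.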

Question

A roulette wheel has 38 possible outcomes. You've been watching a roulette wheel for a few hours, gathering data to determine if the wheel is fair. How many degrees of freedom will your chi-square test have?

37

Question

Can you use percentages for a chi-square test? If not, what kind of data can you use?

No. A chi-square test requires count data.

Question

What is the formula for the test statistic in a hypothesis test for population proportion?

$z=\frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}$
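
A minimal sketch of this test statistic in code follows. The data (60 successes in 100 trials against a hypothesized proportion of 0.5) are invented for illustration, and `one_proportion_z_test` with its `tail` argument is a hypothetical helper, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, tail="two-sided"):
    """z statistic and p-value for a test of a single population proportion."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    standard_normal = NormalDist()
    if tail == "greater":      # alternative: p > p0
        p_value = 1 - standard_normal.cdf(z)
    elif tail == "less":       # alternative: p < p0
        p_value = standard_normal.cdf(z)
    else:                      # alternative: p != p0
        p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes out of 100, testing whether p exceeds 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5, tail="greater")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these assumed numbers, $$z = 2$$ and the one-tailed p-value is about 0.023, so the null hypothesis would be rejected at the 0.05 level.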

Question

A politician says that $$50\%$$ of gun crime in Chicago is committed with illegal weapons. What kind of hypothesis test for population proportion can you use to test this claim?

A two-tailed test

Question

A poll says that over $$90\%$$ of Americans believe assault rifles should not be banned. You've performed a survey and gathered your own random sample of responses. What kind of hypothesis test for population proportion would you use to test the claim made by the first poll?

A one-tailed test

Question

You want to know if more than $$25\%$$ of your customers are Spanish speakers. What are the correct null and alternative hypotheses for this test?

$$h_0: p \leq 0.25$$

$$h_1: p > 0.25$$

Question

As a quarterback, you want to know if you complete at least $$80\%$$ of passes during games. Your friend randomly tracks your performance, counting catches and incompletes. What null and alternative hypotheses should you use to test your completion percentage?

$$h_0: p \geq 0.8$$

$$h_1: p < 0.8$$

Question

What null and alternative hypotheses would you use to test the claim that approximately $$25\%$$ of married men have affairs?

$$h_0: p = 0.25$$

$$h_1: p \neq 0.25$$

Question

In a blind taste test, $$122$$ people preferred Pepsi and $$80$$ people preferred Coca Cola. Do more people in general prefer Pepsi?

The p-value from this one-tailed hypothesis test is approximately $$0.002$$ ($$\hat{p} = 122/202 \approx 0.60$$, $$z \approx 2.96$$); at $$0.05$$ significance, there is evidence that more than $$50\%$$ of people prefer Pepsi over Coke.

Question

In a survey of $$708$$ households, approximately $$59\%$$ had at least one dog. Using a significance level of $$0.05$$, test the hypothesis that $$55\%$$ of American households have a dog.

This is a two-tailed hypothesis test. Taking $$\hat{p} = 0.59$$ and $$n = 708$$, the test statistic is $$z \approx 2.14$$ and the total p-value is approximately $$0.03$$, which is lower than the significance level. Therefore, there is evidence that the proportion of households that own dogs is different from $$0.55$$.

Question

How do you know if your sample size is large enough for a hypothesis test for population proportion?

The sample size is big enough when $$np \geq 10$$ and $$n(1-p) \geq 10$$.

Question

The samples used in a hypothesis test for the difference of two population means should be random.

True.

Question

Which of the following is not a possibility when comparing the means?

$$\mu _{1}\leq \mu _{2}$$.

Question

The observations for the hypothesis test are ____.

Independent.

Question

State all three possibilities of the alternative hypothesis.

$h_{1}:\mu _{1}-\mu _{2} > 0$

$h_{1}:\mu _{1}-\mu _{2} < 0$

$h_{1}:\mu _{1}-\mu _{2}\neq 0$

Question

Write the test statistic formula for the hypothesis test for the difference of two population means.

$t=\frac{\left( \bar{x}_{1}-\bar{x}_{2} \right)-\left( \mu_{1}-\mu_{2} \right)}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}$

Question

Concluding that the population means differ corresponds to ____.

Rejection of the null hypothesis.

Question

The variables are normally distributed in both populations.

True.

Question

The null hypothesis states ____.

No difference in population means.

Question

Combined variance is also known as ____?

Grouped variance.

Question

Which of the following is the CORRECT degree of freedom, when considering pooled variance?

$$n_1+n_2-2$$.

Question

Write the test statistic formula that uses the pooled standard deviation, $$s_{pooled}$$.

$t=\frac{\left ( \bar{x}_{1}-\bar{x}_{2} \right )-\left ( \mu _{1}-\mu _{2} \right )}{s_{pooled}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}}}$
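
The two test statistics in this part of the quiz can be compared in a short sketch. It assumes a null hypothesis of $$\mu_{1}-\mu_{2}=0$$, uses invented measurements in `group_a` and `group_b`, and computes the pooled variance with the standard weighted formula, which the article itself does not spell out. The helper names are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def unpooled_t_statistic(sample1, sample2):
    """t statistic using separate sample variances (null difference of 0)."""
    n1, n2 = len(sample1), len(sample2)
    standard_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (fmean(sample1) - fmean(sample2)) / standard_error


def pooled_t_statistic(sample1, sample2):
    """t statistic using a pooled variance estimate (null difference of 0)."""
    n1, n2 = len(sample1), len(sample2)
    # Weighted average of the two sample variances, divided by n1 + n2 - 2.
    pooled_variance = ((n1 - 1) * variance(sample1)
                       + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_variance) * sqrt(1 / n1 + 1 / n2)
    return (fmean(sample1) - fmean(sample2)) / standard_error


# Hypothetical measurements from two independent random samples.
group_a = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
group_b = [4.2, 4.8, 5.0, 4.4, 4.6, 4.9]

t_unpooled = unpooled_t_statistic(group_a, group_b)
t_pooled = pooled_t_statistic(group_a, group_b)
print(f"unpooled t = {t_unpooled:.3f}, pooled t = {t_pooled:.3f}")
```

When the two samples have the same size, as here, the two statistics coincide; they differ when the sample sizes and variances are unequal.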

Question

A viral disease outbreak is reported in children. 57 children got infected with it and 43 of them already had the vaccination.

Similarly, another outbreak was recorded in the past with 100 infected children, of which 83 had the vaccination. Write the CORRECT hypotheses to check for a difference in the vaccination rate between the two groups.

$$h_{0}:p_{1}-p_{2} = 0$$, $$h_{1}:p_{1}-p_{2}\neq 0$$.
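
Because vaccination rates are proportions, one way to carry out this comparison is a two-proportion z-test with a pooled proportion. The sketch below applies it to the counts in the question; the helper name `two_proportion_z_test` is hypothetical, and the choice of this particular test is an assumption rather than something the question specifies.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two population proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # Under the null hypothesis both groups share one proportion.
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Counts from the question: 43 of 57 vaccinated now, 83 of 100 in the past.
z, p_value = two_proportion_z_test(43, 57, 83, 100)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2f}")
```

For these counts, $$z$$ is roughly $$-1.14$$ and the p-value is roughly $$0.25$$, so at the $$0.05$$ level the data would not show a difference between the two vaccination rates.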

Question

What variable is considered in the following case?

The hypothesis test is performed to determine the difference between the average salary of New York residents and California residents.

Salary.

Question

Is the pooled test used for hypothesis testing in real-life scenarios?

Yes

Question

What is compared to the p-value in a t-test to decide whether to reject the null hypothesis?

Significance level.

Question

The confidence interval of a population proportion can be said to be ____.

an estimated range of values within which the real or actual population proportion is expected to fall, with a stated level of certainty.

Question

True or False?

The confidence interval for a population proportion gives you an estimated boundary or range for which the exact value is expected to be found, with a specified level of assurance.

True.

Question

True or False?

There are only 3 confidence levels.

False.

Question

True or False?

Statisticians mostly use the $$90\%$$ confidence level.

False.

Question

When choosing a confidence level, you should aim to be ___.

more precise and more certain.
