# Probability and Statistics

Probability, the science of chance, and statistics, the science of interpreting data, influence our daily lives. They are used to predict the weather, to determine the effectiveness of medicines and to support scientific breakthroughs. They can even help us play card games.

More generally, probability and statistics help individuals and institutions make better decisions by learning from events in the past and applying this knowledge to the present to affect the future.

## The relationship between probability and statistics

It is important to understand the differences and similarities between probability and statistics. They are different but related subjects.

Probability is a theoretical subject used to analyse the likelihood of events happening in the future. On the other hand, statistics is an applied subject which uses probability theory to analyse data which has been collected.

Probability and statistics are related in that the theoretical models we develop in probability can be compared with statistical findings, which tells us more about the data. We can also use statistics to estimate the probability of something happening in the future.
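As a minimal sketch of that last idea, the Python snippet below estimates a probability from observed data using a relative frequency. The helper name `estimate_probability`, the simulated coin and the fixed seed are illustrative assumptions, not something defined in this article.

```python
import random


def estimate_probability(outcomes, event):
    """Estimate P(event) as the relative frequency of `event` in the observations."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one observation")
    return sum(1 for o in outcomes if o == event) / len(outcomes)


# Simulate 10,000 tosses of a fair coin (fixed seed so the run is repeatable).
rng = random.Random(0)
tosses = [rng.choice(["heads", "tails"]) for _ in range(10_000)]

estimate = estimate_probability(tosses, "heads")
print(f"Estimated P(heads) = {estimate:.3f}")  # should be close to the theoretical 0.5
```

The more observations we collect, the closer the estimate tends to be to the theoretical probability.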

## Probability

Probability is the likelihood of some event occurring. The probability of an event is always a number between 0 and 1, which can be written as a fraction, a decimal or a percentage (between 0% and 100%).

Probability is the branch of mathematics that studies the likelihood of events occurring.

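The basic formula for probability, when all outcomes are equally likely, is the following:

$$P(A) = \frac{\text{number of ways event } A \text{ can occur}}{\text{total number of possible outcomes}}$$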

### Some terminology for probability

There is a lot of terminology in probability which you need to be familiar with. The better you understand the basic concepts detailed below, the easier it will be when you tackle trickier probability questions!

| Terminology | Definition |
| --- | --- |
| Experiment or trial | A repeatable action which has a defined set of outcomes. |
| Sample space | The set of all possible outcomes of an experiment, usually denoted by S. |
| Event | One particular outcome, or set of outcomes, of an experiment, usually denoted by a capital letter. The probability of event A occurring is commonly written as P(A). |

#### An example of probability

You are on a game show and you have to choose between 2 boxes. One contains a sports car and the other is empty. You are given no clues as to which box contains the prize and must therefore choose a box at random. What is the probability you win the sports car?

Solution

Here, the sample space is S = {W, L} where W stands for 'win' and L stands for 'lose'. This is the sample space since these are the only possible outcomes.

Intuitively, since you are choosing at random, there is an equal probability of choosing the prize and choosing the empty box. Since the probabilities of these two outcomes must add up to 1, the probability of choosing the sports car is 0.5 (= 50%).

We can also arrive at this answer by using the basic formula for probability. The number of ways to choose the box with the sports car is 1, and the total number of outcomes (either choosing the sports car or choosing the empty box) is 2. Using mathematical notation, the probability of winning is as follows:

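$$P(W) = \frac{\text{number of ways to choose the sports car}}{\text{total number of outcomes}} = \frac{1}{2}$$

Therefore, the probability of choosing the sports car, and so winning, is $\frac{1}{2}$, which is equivalent to 50% or 0.5.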
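The same calculation can be sketched in Python. This is only an illustration: the function name `probability` is an assumption made for this example, and it presumes that every outcome in the sample space is equally likely.

```python
from fractions import Fraction


def probability(favourable_outcomes, total_outcomes):
    """Favourable outcomes divided by the total number of equally likely outcomes."""
    if total_outcomes <= 0:
        raise ValueError("total_outcomes must be positive")
    if not 0 <= favourable_outcomes <= total_outcomes:
        raise ValueError("favourable_outcomes must be between 0 and total_outcomes")
    return Fraction(favourable_outcomes, total_outcomes)


sample_space = ["W", "L"]  # W = win, L = lose
winning_outcomes = [outcome for outcome in sample_space if outcome == "W"]

p_win = probability(len(winning_outcomes), len(sample_space))
print(p_win, float(p_win), f"{float(p_win):.0%}")  # prints: 1/2 0.5 50%
```

Using `Fraction` keeps the answer exact, and converting to `float` gives the decimal form.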

## Statistics

Data usually contains far too many individual numbers for the human brain to take in, let alone understand. This is why we need statistics: to help us to better understand the underlying complexities in the data we have collected.

Statistics concerns the analysis and interpretation of data which has been collected.

The sorts of tasks you will have in statistics questions will fall into one of three categories: data collection, analysis, and visualisation and interpretation.

### Data collection

The type of data collected will depend on the statistical question you want answered, i.e. the characteristic you want to study. You might be interested in collecting data through holding a scientific experiment (e.g. measuring the effect of location on the growth of plants) or perhaps by collecting data through observation (e.g. recording the number of students that are late for class).

All data falls into two categories: qualitative and quantitative.

Qualitative data refers to descriptive data, such as words. It might be collected through interviews or surveys.

Quantitative data refers to things which can be represented using numbers, like measurements such as height and weight, or quantities of something that we are interested in knowing more about.

Important methods of collecting data include:

• conducting a census: data is collected from every single member of a population;

• sampling: data is collected from a subset of a population called a sample, which is either randomly chosen or chosen to be representative of the wider population;

• controlled experiments: a scientific procedure is planned such that accurate data can be collected and analysed.

Quantitative data also falls into two categories: continuous and discrete.

Continuous variables can take on any value within a range, which can make it more difficult for us to count them. For example, length can be measured to as many decimal places as is possible. It's up to you, the data collector, to decide how many decimal places are necessary.

Discrete variables must take on particular values within a range, which makes it easier to count them. For example, shoe size can be 4 or 4.5, but not 4.26735!

### Analysis

Once you have collected your data, it is now time to start analysing it. Since it is very difficult (and often impossible!) to understand data in its raw form, we need to condense it into manageable descriptions that retain as much information about the data as possible whilst being understandable. This is the ultimate goal of statistics: to understand, describe and find meaningful information from datasets.

Descriptive statistics are particularly useful. These are numbers that tell us something about the data. There are two kinds: measures of location and measures of spread.

Measures of location use one statistic to summarise a dataset. They include:

• mean: a particularly common and useful statistic that requires summing the values and dividing by the number of values, often denoted by $\bar{x}$ (pronounced 'x bar'):

$$\bar{x} = \frac{\sum x}{n}$$

where $n$ is the number of values;

• median: the middle value in an ordered list of values;

• mode: the most common value.
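As a short sketch, the three measures of location can be computed with Python's standard `statistics` module. The list `values` is made-up illustrative data, not data from this article.

```python
import statistics

values = [2, 3, 3, 5, 7]  # illustrative data only

mean = sum(values) / len(values)    # sum the values, then divide by how many there are
median = statistics.median(values)  # middle value of the ordered list
mode = statistics.mode(values)      # most common value

print(mean, median, mode)  # prints: 4.0 3 3
```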

Measures of spread describe the variability of the data. They include:

• range: the largest value minus the smallest value in the data;

• interquartile range: the upper quartile minus the lower quartile, i.e. the width of the central 50% of the data, which lies between the 25% and 75% points on either side of the median at 50%.
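The snippet below is a hedged sketch of both measures of spread in Python. The list `values` is illustrative data. Quartile conventions differ between textbooks and software; this example relies on the default ('exclusive') method of `statistics.quantiles`, so other tools may give slightly different quartiles for the same data.

```python
import statistics

values = [23, 42, 12, 10, 15, 14, 9]  # illustrative data only

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

# quantiles(n=4) returns the three cut points: lower quartile, median, upper quartile.
lower_q, median, upper_q = statistics.quantiles(values, n=4)

# Interquartile range: upper quartile minus lower quartile.
iqr = upper_q - lower_q

print(data_range, lower_q, upper_q, iqr)  # prints: 33 10.0 23.0 13.0
```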

Data is often represented in frequency tables. Frequency refers to the number of times something occurs, which in the case of statistics will be the number of times a particular data value occurs. From a frequency table, you will be able to extract descriptive statistics such as the ones listed above. You will also be able to use the table to visualise the data.
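As an illustration, a frequency table can be built in Python with `collections.Counter`, and descriptive statistics can then be read off the table. The `shoe_sizes` list is hypothetical data invented for this sketch.

```python
from collections import Counter

shoe_sizes = [4, 4.5, 5, 4.5, 6, 5, 4.5, 4]  # hypothetical raw data

# The frequency table maps each data value to the number of times it occurs.
frequency_table = Counter(shoe_sizes)

for value in sorted(frequency_table):
    print(value, frequency_table[value])

# Descriptive statistics recovered from the table rather than from the raw list.
n = sum(frequency_table.values())                                   # number of values
mean = sum(value * f for value, f in frequency_table.items()) / n   # sum of value × frequency, over n
mode = frequency_table.most_common(1)[0][0]                         # value with the highest frequency

print(n, mean, mode)
```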

### Visualisation and interpretation

Descriptive statistics are useful in condensing data into a small amount of information, and can tell us about the location and spread of the data. Data visualisations, on the other hand, are able to graphically represent the data. For a thorough analysis, you would ideally use both methods to be able to fully understand the behaviour of the data.

We will now go through some examples of different data visualisations. Don't worry if you can't yet fully understand the examples below – there are in-depth explanations in other articles!

#### Line graphs

These are useful in representing continuous data and trends.

Figure: example of a line graph. StudySmarter originals, Rebecca Farthing

#### Bar charts

These are useful in representing data which is grouped into categories.

Figure: example of a bar chart. StudySmarter originals, Rebecca Farthing

#### Histograms

These are useful in representing how frequently numerical values fall within each interval of a range.

Figure: example of a histogram. StudySmarter originals, Rebecca Farthing

#### Pie charts

These are useful in representing proportional data.

Figure: example of a pie chart. StudySmarter originals, Rebecca Farthing

#### Box-and-whisker plots

These visualise the range, interquartile range and median.

Figure: example of a box-and-whisker plot. StudySmarter originals, Rebecca Farthing

#### Scatter graphs

These show the relationship between two variables.

Figure: example of a scatter graph. StudySmarter originals, Rebecca Farthing

Finally, once the data has been analysed and represented graphically, we can draw conclusions from the data. Have a look at the example below.

#### An example of statistics

The following is data collected from a team's football matches over a season of 20 matches. For each player, the table records the number of matches in which they scored 0, 1, 2, 3 or 4 goals.

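| Team member | Scored 0 goals | Scored 1 goal | Scored 2 goals | Scored 3 goals | Scored 4 goals |
| --- | --- | --- | --- | --- | --- |
| Zack | 5 | 6 | 4 | 4 | 1 |
| Josh | 9 | 5 | 7 | 0 | 0 |
| Amy | 0 | 13 | 3 | 4 | 0 |
| Ahmed | 2 | 11 | 4 | 1 | 2 |
| Emily | 7 | 12 | 0 | 0 | 1 |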

a) What type of data is presented here?

b) Which team member scores the largest number of goals over the course of the season?

c) Which player scored the highest average number of goals per game?

d) Represent this data using a pie chart.

e) What does this data tell us about the performance of the players?

Solution

a) Frequency and goals scored per match are both quantitative, discrete data. It is impossible to score 0.5 goals!

b) By multiplying the number of goals by the frequency, f, we can find the total goals scored by each player:

| Team member | f × 0 | f × 1 | f × 2 | f × 3 | f × 4 | Total goals |
| --- | --- | --- | --- | --- | --- | --- |
| Zack | 0 | 6 × 1 = 6 | 4 × 2 = 8 | 4 × 3 = 12 | 1 × 4 = 4 | 30 |
| Josh | 0 | 5 × 1 = 5 | 7 × 2 = 14 | 0 × 3 = 0 | 0 × 4 = 0 | 19 |
| Amy | 0 | 13 × 1 = 13 | 3 × 2 = 6 | 4 × 3 = 12 | 0 × 4 = 0 | 31 |
| Ahmed | 0 | 11 × 1 = 11 | 4 × 2 = 8 | 1 × 3 = 3 | 2 × 4 = 8 | 30 |
| Emily | 0 | 12 × 1 = 12 | 0 × 2 = 0 | 0 × 3 = 0 | 1 × 4 = 4 | 16 |

We can see that the most goals scored by one player was by Amy, who scored 31 during the season.

c) To find the average number of goals per match, we need to divide the total number of goals by the number of matches:

Zack: 30 ÷ 20 = 1.5 goals per match

Josh: 19 ÷ 20 = 0.95 goals per match

Amy: 31 ÷ 20 = 1.55 goals per match

Ahmed: 30 ÷ 20 = 1.5 goals per match

Emily: 16 ÷ 20 = 0.8 goals per match

Amy therefore scored the highest average number of goals per match.

d) A pie chart shows proportional data. This means we need to work out what proportion of the total goals in the season are scored by each player. To do this, we should divide each player's total goals by the total number of goals scored during the season.

Total number of goals scored by all players = 30 + 19 + 31 + 30 + 16 = 126

Zack: 30 ÷ 126 ≈ 23.8%

Josh: 19 ÷ 126 ≈ 15.1%

Amy: 31 ÷ 126 ≈ 24.6%

Ahmed: 30 ÷ 126 ≈ 23.8%

Emily: 16 ÷ 126 ≈ 12.7%

Figure: pie chart of each player's share of the goals. StudySmarter originals, Rebecca Farthing

e) There isn't just one right answer here! This is about interpretation, so what is important is to have your answer backed up by statistics that you have found from the data.

A possible conclusion we might take from the data is that the three highest-scoring players, Zack, Amy and Ahmed, all contributed a similar number of goals, together making up 72% of the goals scored. On the other hand, Josh and Emily together contributed 28% of the goals, which is considerably less.
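The calculations in parts b) and c) can also be sketched in Python. This is an illustrative sketch that uses only Ahmed's and Emily's rows from the table; the function names `total_goals` and `goals_per_match` are assumptions made for this example.

```python
# For each player: goals scored in a match -> number of matches with that score.
frequencies = {
    "Ahmed": {0: 2, 1: 11, 2: 4, 3: 1, 4: 2},
    "Emily": {0: 7, 1: 12, 2: 0, 3: 0, 4: 1},
}


def total_goals(freq):
    """Sum of (goals × frequency) across one player's row."""
    return sum(goals * f for goals, f in freq.items())


def goals_per_match(freq):
    """Total goals divided by the number of matches recorded in the row."""
    return total_goals(freq) / sum(freq.values())


for player, freq in frequencies.items():
    print(player, total_goals(freq), goals_per_match(freq))
# prints:
# Ahmed 30 1.5
# Emily 16 0.8
```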

## Probability and Statistics - Key takeaways

• Probability is the branch of mathematics that studies the likelihood of events occurring
• An experiment or trial is a repeatable action which has a defined set of outcomes
• A sample space is the set of all possible outcomes of an experiment
• An event is one particular outcome, or set of outcomes, of an experiment. The probability of event A occurring is P(A)
• Statistics concerns the analysis and interpretation of data which has been collected
• Data can be qualitative or quantitative
• Quantitative data can be discrete or continuous
• Measures of location use one statistic to summarise a dataset, e.g. mean, median and mode
• Measures of spread describe the variability of the data, e.g. range and interquartile range
• Some examples of data visualisations are line graphs, bar charts, histograms, pie charts, box-and-whisker plots and scatter plots

Probability is the science of chance.

Statistics concerns the analysis and interpretation of data which has been collected.

An example of statistics and probabilities is a coin toss: we know that if the coin is unbiased, the probability of 'heads' is 0.5.

Solving probabilities requires applying probability theory and logic. Using statistics requires analysing data.

One of the main rules of probability theory is that a probability can never be greater than 1 or less than 0.

## Final Probability and Statistics Quiz

Question: What are the two types of measures that are usually commented on when comparing data distributions?

Answer: A measure of location and a measure of spread.

Question: What is a measure of spread?

Answer: A measure of spread provides us with information regarding the variability of data in a given data set, i.e. how close or far away the different points in a data set are from each other.

Question: What is a measure of location?

Answer: A measure of location is used to summarise an entire data set with a single value.

Question: Compare data set A (median 25, Q1 = 18, Q3 = 56) with data set B (median 24, Q1 = 14, Q3 = 130).

Answer: Data set A has a slightly higher measure of location (median) and a lower variability among the data (interquartile range 38 compared with 116).

Question: Compare data set A (median 100, Q1 = 50, Q3 = 150) with data set B (median 200, Q1 = 150, Q3 = 250).

Answer: Data set A has a lower measure of location (median). The variability of the two data sets is equal (both have an interquartile range of 100).

Question: Compare data set A (median 300, Q1 = 275, Q3 = 325) with data set B (median 200, Q1 = 150, Q3 = 250).

Answer: Data set A has a higher measure of location (median) and a lower variability among the data (interquartile range 50 compared with 100).

Question: Which measure of spread is appropriate to use along with the median for comparison?

Answer: The interquartile range.

Question: Which measure of spread is appropriate to use along with the mean for comparison?

Answer: The standard deviation.

Question: Which measure of location is appropriate to use along with the standard deviation for comparison?

Answer: The mean.

Question: Which measure of location is appropriate to use along with the interquartile range for comparison?

Answer: The median.

Question: Which measures should you use for comparing a data set with extreme values?

Answer: The median and the interquartile range, since they are less affected by extreme values than the mean and standard deviation.

Question: Compare data set A (mean 100, standard deviation 50) with data set B (mean 200, standard deviation 50).

Answer: Data set A has a lower measure of location (mean). The variability of the two data sets is equal.

Question: Compare data set A (mean 13, standard deviation 5) with data set B (mean 18, standard deviation 15).

Answer: Data set A has a lower measure of location (mean) and a lower variability among the data.

Question: Compare data set A (mean 13, standard deviation 5) with data set B (mean 13, standard deviation 5).

Answer: Both data sets have the same measure of location and the same spread.

Question: What is a box plot?

Answer: The box plot is a type of graph that helps visualise the five-number summary: the median, the lower and upper quartiles, and the lower and upper extremes of the data set.

Question: True/False: A box plot is a very helpful tool when it comes to displaying variation in a set of data.

Answer: True.

Question: What elements are needed in making a box plot?

Answer: The minimum value of the data set, the maximum value of the data set, the lower quartile, the median and the upper quartile.

Question: What is the first step in making a box plot given a data set?

Answer: Rearrange the data set in ascending order.

Question: What is the lower quartile?

Answer: The value below which the lower 25% of the data are contained.

Question: Given the data set 23, 42, 12, 10, 15, 14, 9, what is the lower quartile?

Answer: 10

Question: What is the upper quartile?

Answer: The value above which the upper 25% of the data are contained.

Question: Given the data set 23, 42, 12, 10, 15, 14, 9, what is the upper quartile?

Answer: 23

Question: True/False: A box plot is not helpful in displaying data with outliers.

Answer: False.

Question: What is an appropriate reason to use a box plot to visualise data?

Answer: Comparing scores between classrooms or schools.

Question: What is the next step in creating a box plot after rearranging the data values in ascending order?

Answer: Identifying the minimum and maximum values.

Question: What is the first thing to do in creating the box plot after all the required values have been found?

Answer: Draw a number line that fits the data, and plot all the necessary values on it.

Question: Given the data set 23, 42, 12, 10, 15, 14, 9, rearrange it in ascending order.

Answer: 9, 10, 12, 14, 15, 23, 42

Question: True/False: In constructing the box plot, the rectangle encloses the minimum value of the data set.

Answer: False.

Question: What is another name for the box plot?

Answer: Box-and-whisker plot.

Question: What are measures of central tendency?

Answer: Measures of central tendency are techniques used to indicate the approximate central point of a data set.

Question: What are the measures of central tendency?

Answer: Mean, mode, median, and moving averages.

Question: How do you find the mean of a data set?

Answer: The mean is found by adding all the values in the data set and then dividing the result by the number of values added.

Question: What symbol denotes the mean?

Answer: μ for a population mean, and $\bar{x}$ for the mean of a sample.

Question: What is the mean of the data set 6, 14, 18, 23, 25, 37, 44, 78?

Answer: 245 ÷ 8 = 30.625

Question: What is the median?

Answer: The median is the middle element of a data set when the values are arranged from lowest to highest.

Question: What is the median of the data set 6, 14, 18, 23, 25, 37, 44, 78?

Answer: 24, the value halfway between the two middle values, 23 and 25.

Question: What is the mode?

Answer: The mode in statistics is the most frequently occurring value in a given data set.

Question: What is the mode of the data set 4, 5, 6, 14, 18, 23, 25, 37, 44, 78, 14?

Answer: 14

Question: What is the moving average?

Answer: The moving average is a type of measure of central tendency that accounts for average change in a data series over time.

Question: What is the mean of the data set 88, 73, 60, 89, 79, 73, 69, 65, 71?

Answer: 667 ÷ 9 ≈ 74.1

Question: What is statistics?

Answer: Statistics is the science concerned with methods for collecting, analysing, interpreting and presenting empirical data.

Question: Is qualitative data a type of statistics?

Answer: No. Qualitative data is a type of data, not a type of statistics.

Question: What are the types of statistics?

Answer: Descriptive statistics and inferential statistics.

Question: What is descriptive statistics?

Answer: Descriptive statistics deals with the brief descriptive coefficients that summarise a given data set, which can be either a representation of the entire population or a sample of a population.

Question: What does the statistical symbol μ mean?

Answer: Population mean.

Question: What is data?

Answer: Data, in relation to statistics, is the individual pieces of information recorded and used for the purpose of analysis.

Question: What are the types of data?

Answer: Qualitative and quantitative data.

Question: Describe quantitative data.

Answer: This type of data can be measured in numerical values.

Question: What are the two types of quantitative data?

Answer: Discrete data and continuous data.

Question: State whether the ages of students, once collected, are qualitative or quantitative data.

Answer: Quantitative data.
