
# Comparing Data

You have probably already come across methods of analysing and interpreting data in given data distributions. In many real-world applications, we are required to compare information between multiple data sets. Let's look at how to compare data between data distributions.

## Comparing data distributions

When comparing multiple data distributions, you can comment on:

• A measure of location – a measure of location summarises an entire data set with a single value. For example, the mean and the median are measures of location.

• A measure of spread – a measure of spread tells us about the variability of the data in a given data set, i.e. how close to or far away from each other the different points in the data set are. The standard deviation and the interquartile range are examples of measures of spread.

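You can compare different data distributions using the mean and standard deviation, or using the median and interquartile range. When data sets contain extreme values and/or outliers, the median and interquartile range are usually more appropriate, because they are less affected by a few unusually large or small values.

Do not use the median and standard deviation together, or the mean and interquartile range together. The standard deviation is calculated from the mean, whereas the interquartile range, like the median, is based on quartiles.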
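As a rough illustration of why the median and interquartile range suit data with outliers, the sketch below summarises a small data set with and without one extreme value. The numbers and the helper name `summarise` are made up for this example, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the quartile rule used in your course.

```python
import statistics

def summarise(data):
    """Return (mean, standard deviation, median, interquartile range)."""
    # quantiles(n=4) returns the three cut points Q1, Q2 and Q3.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (
        statistics.mean(data),
        statistics.pstdev(data),  # population standard deviation
        statistics.median(data),
        q3 - q1,
    )

# Hypothetical values, invented purely for illustration.
typical = [12, 14, 15, 15, 16, 17, 18]
with_outlier = typical + [95]  # the same data plus one extreme value

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    mean, sd, median, iqr = summarise(data)
    print(f"{label}: mean={mean:.2f}, sd={sd:.2f}, median={median}, IQR={iqr}")
```

Running this should show the mean and standard deviation changing a great deal when the extreme value is added, while the median and interquartile range barely move.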

Let's explore the concept further with the help of examples.

## Comparing mean and standard deviations of data sets

The daily mean temperatures during August are recorded at Heathrow and Leeming. For Heathrow, ∑x=562 and ∑x²=10301.2. For Leeming, the mean temperature was 15.6°C with a standard deviation of 2.01°C.

a) Calculate the mean and standard deviation for Heathrow.

b) Compare the data for Heathrow with that of Leeming.

Solutions

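a) August has 31 days, so n = 31. For Heathrow,

$$\bar{x} = \frac{\sum x}{n} = \frac{562}{31} \approx 18.1\,^\circ\text{C}$$

$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{10301.2}{31} - \left(\frac{562}{31}\right)^2} \approx 1.91\,^\circ\text{C}$$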

b) From the above information, we see that the mean temperature at Heathrow during August was higher than at Leeming, and the spread/variability of temperatures at Heathrow was smaller than at Leeming.
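The calculation in part a) can be checked with a few lines of Python. This is only a sketch: the function name `mean_and_sd` is made up for this example, and n = 31 assumes one reading for each day of August.

```python
import math

def mean_and_sd(n, sum_x, sum_x2):
    """Mean and population standard deviation from n, the sum of x and the sum of x squared."""
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2  # mean of the squares minus square of the mean
    return mean, math.sqrt(variance)

# Heathrow: n = 31 is an assumption (one daily mean temperature per day in August).
heathrow_mean, heathrow_sd = mean_and_sd(31, 562, 10301.2)

# Leeming: values given directly in the question.
leeming_mean, leeming_sd = 15.6, 2.01

print(f"Heathrow: mean = {heathrow_mean:.1f}, sd = {heathrow_sd:.2f}")
print("Heathrow warmer on average:", heathrow_mean > leeming_mean)
print("Heathrow less variable:", heathrow_sd < leeming_sd)
```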

A company collects the delivery times in minutes for suppliers A and B for a period of 20 days. The following is the result of the data collected. Compare the performance of the two suppliers.

| Supplier | ∑x | ∑x² |
|---|---|---|
| A | 360 | 18000 |
| B | 300 | 29000 |

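Solutions

Each supplier was observed for 20 days, so n = 20.

For supplier A,

$$\bar{x} = \frac{\sum x}{n} = \frac{360}{20} = 18 \text{ minutes}, \qquad \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{18000}{20} - 18^2} = \sqrt{576} = 24 \text{ minutes}$$

For supplier B,

$$\bar{x} = \frac{\sum x}{n} = \frac{300}{20} = 15 \text{ minutes}, \qquad \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{29000}{20} - 15^2} = \sqrt{1225} = 35 \text{ minutes}$$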

From the above information, we see that supplier A has a longer mean delivery time, while supplier B has a greater spread in delivery times.

Consider the above example in a real-world context. If the company wants to keep one of its suppliers and let go of the other, it could compare the above data just like we have. If the priority of the company is to reduce delivery times on average, it would favour supplier B. If the priority on the other hand is greater reliability, it would favour the supplier with less variability, and that would be supplier A.
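The decision described above can be sketched in code. The names `totals`, `stats` and `preferred` are made up for this example, and the rule "smallest mean for speed, smallest standard deviation for reliability" is a simplification of how a real company might weigh the two.

```python
import math

# Summary totals from the table: (sum of x, sum of x squared), over n = 20 days.
N_DAYS = 20
totals = {"A": (360, 18000), "B": (300, 29000)}

# Mean and population standard deviation of delivery time for each supplier.
stats = {}
for supplier, (sum_x, sum_x2) in totals.items():
    mean = sum_x / N_DAYS
    sd = math.sqrt(sum_x2 / N_DAYS - mean ** 2)
    stats[supplier] = {"mean": mean, "sd": sd}

def preferred(priority):
    """Pick the supplier with the smallest mean ('speed') or smallest sd (any other priority)."""
    key = "mean" if priority == "speed" else "sd"
    return min(stats, key=lambda s: stats[s][key])

print(stats)
print("Best for speed:", preferred("speed"))
print("Best for reliability:", preferred("reliability"))
```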

## Comparing median and interquartile range of data sets

The students of two different sections sit an exam. The quartiles and median mark of each section are provided. Compare the performance of the 2 sections.

| Section | Q1 | Median | Q3 |
|---|---|---|---|
| Section 1 | 58 | 71 | 87 |
| Section 2 | 62 | 74 | 83 |

Solutions

The interquartile range for Section 1 = Q3 - Q1 = 87 - 58 = 29

The interquartile range for Section 2 = Q3 - Q1 = 83 - 62 = 21

From the given data, we see that the median mark is higher for Section 2, while the variability of marks is higher in Section 1.
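The same comparison can be written as a short Python sketch. It assumes the three values given for each section are the lower quartile, the median and the upper quartile, in that order; the names `sections`, `iqr` and `compare` are made up for this example.

```python
# Values for each section, read as (Q1, median, Q3). This ordering is an assumption.
sections = {"Section 1": (58, 71, 87), "Section 2": (62, 74, 83)}

def iqr(q1, q3):
    """Interquartile range: upper quartile minus lower quartile."""
    return q3 - q1

def compare(a, b):
    """Return (section with the higher median, section with the larger IQR).

    Ties are not handled: if the values are equal, b is returned.
    """
    (q1_a, med_a, q3_a), (q1_b, med_b, q3_b) = sections[a], sections[b]
    higher_median = a if med_a > med_b else b
    more_spread = a if iqr(q1_a, q3_a) > iqr(q1_b, q3_b) else b
    return higher_median, more_spread

higher_median, more_spread = compare("Section 1", "Section 2")
print("Higher median mark:", higher_median)
print("Greater variability:", more_spread)
```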

A company collects the delivery times for suppliers, A and B, for a period of 20 days. The median delivery time was 4 hours for supplier A, and 3 hours for supplier B. The interquartile range for supplier A was 0.8 hours and for supplier B was 1.5 hours.

Compare the performance of the suppliers in terms of speed and reliability.

Solutions

Supplier B appears to be faster, with a lower median delivery time. Supplier A appears to be more reliable, with a lower spread/variability in delivery time.

## Comparing Data - Key takeaways

• In many real-world applications, we are required to compare information between multiple data sets to make better-informed decisions.
• When comparing multiple data distributions, you can comment on
  • a measure of location
  • a measure of spread
• You can compare different data distributions using the mean and standard deviation, or using the median and interquartile range.

Bar graphs allow you to easily visualise the measures of location and spread.

## Final Comparing Data Quiz

Question

What are the two types of measures that are usually commented on when comparing data distributions?

1. measure of location

2. measure of spread

Question

What is a measure of spread?

A measure of spread tells us about the variability of the data in a given data set, i.e. how close to or far away from each other the different points in the data set are.

Question

What is a measure of location?

A measure of location summarises an entire data set with a single value.

Question

Compare the 2 data sets

Data set A - median 25, Q1 = 18, Q3 = 56

Data set B - median 24, Q1 = 14, Q3 = 130

Data set A has a higher measure of location (median) and a lower variability among the data (an interquartile range of 38 against 116).

Question

Compare the 2 data sets

Data set A - median 100, Q1 = 50, Q3 = 150

Data set B - median 200, Q1 = 150, Q3 = 250

Data set A has a lower measure of location (median). The two data sets have equal variability (an interquartile range of 100 each).

Question

Compare the 2 data sets

Data set A - median 300, Q1 = 275, Q3 = 325

Data set B - median 200, Q1 = 150, Q3 = 250

Data set A has a higher measure of location (median) and a lower variability among the data (an interquartile range of 50 against 100).

Question

Which of the following is appropriate to use along with median for comparison?

Interquartile range


Question

Which of the following is appropriate to use along with mean for comparison?

standard deviation


Question

Which of the following is appropriate to use along with standard deviation for comparison?

mean

Question

Which of the following is appropriate to use along with interquartile range for comparison?

median

Question

Which of the following should you use for comparing a data set with extreme values?

median and interquartile range

Question

Compare the 2 data sets

Data set A - mean 100, standard deviation = 50

Data set B - mean 200, standard deviation = 50

Data set A has a lower measure of location (mean). There is an equal variability among the data sets.


Question

Compare the 2 data sets

Data set A - mean = 13, standard deviation = 5

Data set B - mean = 18, standard deviation = 15

Data set A has a lower measure of location (mean) and a lower variability among the data.

Question

Compare the 2 data sets

Data set A - mean = 13, standard deviation = 5

Data set B - mean = 13, standard deviation = 5

Both data sets have similar measures of location and spread within the data.

