# Data transformations

Data transformation means applying the same combination of mathematical operations (such as addition or multiplication) to every single data point in a set. It is especially useful for making awkward numbers easier to handle. When transforming data, it is essential to know how the transformation affects statistical parameters such as measures of central tendency (e.g., the mean) and dispersion (e.g., the standard deviation).

## Why Do We Transform Data in Statistics?

Suppose we wanted to find the mean of four numbers: 305, 304, 305, and 310. We could use a calculator to work this out:

$\frac{305+304+305+310}{4}=\frac{1224}{4}=306$

But there is a much easier way, one you could even manage without a calculator: subtract 300 from each of the four values. The data values become 5, 4, 5, and 10, and the mean of these is 6. Of course, 6 is not the mean of the original data points, but if we add back the 300 we subtracted, we get 306. This is the same mean we worked out with a calculator, found this time by data transformation.

## Statistical Data Transformation Methods and Techniques

There are several methods and techniques to transform statistical data so that it becomes easier to manipulate and interpret. Here we look at data transformation by addition and data transformation by multiplication.

By far the easiest way to keep track of what happens to the data points is to assign a new variable to the transformation formula. For data transformation by addition (or subtraction), the formula generally takes the form $y=x-a$, where $a$ is a constant.

Let's look at this in the context of our earlier example. We assign the original data points to the variable $x$, and the formula gives us the transformed variable $y$. We can then find the mean of $y$, written $\overline{y}$ ('y-bar'). Since we subtracted 300 in the earlier example, $a=300$.

It is worth noting that exam questions will almost always specify the transformation formula you need to apply. If they don't, think about how you can make the data smaller and easier to handle.

Step 1. To find the new y-value data points, use the formula:

$y=x-300$.

The x-value data points are as follows: $x_1=305, x_2=304, x_3=305, x_4=310$.

The y-value data points are therefore:

$y_1=x_1-300=305-300=5$

$y_2=x_2-300=304-300=4$

$y_3=x_3-300=305-300=5$

$y_4=x_4-300=310-300=10$

Step 2. Find the mean: $\overline{y}=\frac{y_1+y_2+y_3+y_4}{4}=\frac{5+4+5+10}{4}=\frac{24}{4}=6$

Step 3. To convert $\overline{y}$ back into $\overline{x}$, use the formula $\overline{y}=\overline{x}-300$ to find the mean of the original data points:

$\overline{y}=\overline{x}-300=6$,

therefore, $\overline{x}=6+300=306$.
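As a minimal sketch, the three steps can be reproduced in Python with the standard-library `statistics.mean`. The variable names are illustrative choices, and $a=300$ is the shift used in the text.

```python
from statistics import mean

a = 300                    # the constant subtracted from every data point
x = [305, 304, 305, 310]   # original data

# Step 1: transform every data point with y = x - a.
y = [xi - a for xi in x]   # [5, 4, 5, 10]

# Step 2: the mean of the small, transformed values.
y_bar = mean(y)            # 6

# Step 3: rearrange y_bar = x_bar - a to decode the original mean.
x_bar = y_bar + a          # 306
```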

So, we have successfully found the mean. But what about the standard deviation, $\sigma_x$? First, let's work out the standard deviation of $y$, $\sigma_y$.

Step 1. Use the standard deviation formula:

$\sigma_y=\sqrt{\frac{\sum(y_i-\overline{y})^2}{N}}$.

(To recap, $N$ is the number of data points, $y_i$ is each data point, and $\overline{y}$ is the mean of $y$.)

Step 2. Use the values for $y$ that we found earlier, 5, 4, 5, 10, and sum the squares of their deviations from the mean, 6: $(5-6)^2+(4-6)^2+(5-6)^2+(10-6)^2=1+4+1+16=22$.

Step 3. Find the standard deviation:

$\sigma_y=\sqrt{\frac{22}{4}}=2.3452$ to 5 s.f.
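The formula divides by $N$, so it gives the population standard deviation. Below is a direct sketch of it in Python; `population_sd` is a hypothetical helper name, and the standard library's `statistics.pstdev` (which also divides by $N$) is used only as a cross-check.

```python
from math import isclose, sqrt
from statistics import pstdev

def population_sd(values):
    """Square root of the mean squared deviation from the mean (divides by N)."""
    n = len(values)
    m = sum(values) / n
    sum_sq = sum((v - m) ** 2 for v in values)   # 22.0 for the y-values below
    return sqrt(sum_sq / n)

y = [5, 4, 5, 10]
sigma_y = population_sd(y)

print(round(sigma_y, 4))              # 2.3452
assert isclose(sigma_y, pstdev(y))    # agrees with the library function
```

Using `statistics.stdev` instead would divide by $N-1$ (the sample standard deviation) and give a different value.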

Now, what about $\sigma_x$ (the standard deviation of the x-value data points)?

Step 1. Use the standard deviation formula:

$\sigma_x=\sqrt{\frac{\sum(x_i-\overline{x})^2}{N}}$.

Step 2. Use the original values for $x$, 305, 304, 305, 310, and sum the squares of their deviations from the mean, 306: $(305-306)^2+(304-306)^2+(305-306)^2+(310-306)^2=1+4+1+16=22$.

Step 3. Find the standard deviation:

${\sigma }_{x}=\sqrt{\frac{22}{4}}=2.3452$ to 5 s.f.

Notice that the standard deviations of the x- and y-values are the same. Why does subtracting the same number from every data point change the mean but leave the standard deviation unchanged? In other words, why do transformations affect statistical parameters differently?

Intuitively, since the standard deviation is a measure of spread, its value should be preserved when every data point is shifted up or down by the same amount. More precisely, the transformation $y=x-a$ leaves each deviation from the mean unchanged, because $y_i-\overline{y}=(x_i-a)-(\overline{x}-a)=x_i-\overline{x}$. Since the sum of the squared deviations does not change, neither does the standard deviation.

It is important to remember that adding or subtracting a constant does not change the standard deviation. Mathematically speaking, the value of $a$ has no effect on the standard deviation: $\sigma_y=\sigma_x$.
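A quick numerical check of this claim, sketched with the standard library's `statistics.mean` and `statistics.pstdev`. Apart from 300, the shifts tried below are arbitrary illustrative choices, not values from the text.

```python
from math import isclose
from statistics import mean, pstdev

x = [305, 304, 305, 310]

for a in (0, 300, 305, -1000):   # several arbitrary shifts
    y = [xi - a for xi in x]
    # The mean moves by exactly a ...
    assert isclose(mean(y), mean(x) - a)
    # ... but the spread is untouched.
    assert isclose(pstdev(y), pstdev(x))
```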

### Data Transformation by Multiplication

Say we have a different set of data points: 152, 160, 128, 136. We could subtract a number from these, but another approach is to divide each point by a number, $b$, using the transformation formula $y=\frac{x}{b}$. Here every value is a multiple of 8, so $b=8$ is a convenient choice.

Unlike with transformation by addition, both the mean and the standard deviation will need to be decoded.

We can also use multiplication and addition simultaneously, in which case the formula takes the form $y=\frac{x-a}{b}$.
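Why the decoding works can be seen from a short derivation, sketched here for the general formula $y=\frac{x-a}{b}$ with $b>0$ (the case used in this section):

$\overline{y}=\frac{1}{N}\sum\frac{x_i-a}{b}=\frac{\overline{x}-a}{b}\quad\Rightarrow\quad\overline{x}=b\,\overline{y}+a$

$y_i-\overline{y}=\frac{x_i-a}{b}-\frac{\overline{x}-a}{b}=\frac{x_i-\overline{x}}{b}\quad\Rightarrow\quad\sigma_y=\sqrt{\frac{\sum(x_i-\overline{x})^2}{b^2N}}=\frac{\sigma_x}{b}\quad\Rightarrow\quad\sigma_x=b\,\sigma_y$

The constant $a$ cancels out of every deviation, which is why it affects only the mean, while $b$ scales every deviation and hence the standard deviation. (For a negative $b$, the same steps give $\sigma_x=|b|\,\sigma_y$.)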

Let's find the mean and standard deviation of the data above using this formula with $a=0$ and $b=8$.

Step 1: To find the new $y$ values, use the formula $y=\frac{x}{8}$. This gives 19, 20, 16, 17.

Step 2: Find the transformed mean, $\overline{y}=\frac{19+20+16+17}{4}=\frac{72}{4}=18$.

Step 3: Find the mean of the original data points, $\overline{x}=\overline{y}\times 8=18\times 8=144$.

Step 4: Find the transformed standard deviation,

$\sigma_y=\sqrt{\frac{\sum(y_i-18)^2}{4}}=\sqrt{\frac{1^2+2^2+(-2)^2+(-1)^2}{4}}=\sqrt{\frac{10}{4}}=1.5811$ to 5 s.f.

Step 5: Finally, find the standard deviation of the original data points: $\sigma_x=8\times 1.5811\ldots=12.649$ to 5 s.f.
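The same five steps in Python, as a minimal sketch using the standard library's `statistics.mean` and `statistics.pstdev`; `b = 8` is the divisor from the text, and the variable names are illustrative.

```python
from statistics import mean, pstdev

b = 8                      # divisor: every value is a multiple of 8
x = [152, 160, 128, 136]   # original data

# Step 1: transform the data with y = x / b.
y = [xi / b for xi in x]   # [19.0, 20.0, 16.0, 17.0]

# Steps 2 and 4: summary statistics of the transformed data.
y_bar = mean(y)
sigma_y = pstdev(y)        # population standard deviation (divides by N)

# Steps 3 and 5: decode. Dividing by b divided both the mean and the
# standard deviation by b, so multiply both back by b.
x_bar = y_bar * b
sigma_x = sigma_y * b

print(x_bar, round(sigma_x, 3))   # 144.0 12.649
```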

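Notice that, unlike $a$, the parameter $b$ does affect the standard deviation: dividing every data point by $b$ divides the standard deviation by $b$, so $\sigma_x=b\,\sigma_y$ (for positive $b$).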

## Data Transformations - Key takeaways

• Data transformations are mathematical operations applied to every data point in a set.
• A formula connects the original data points to the transformed data points.
• For transformation by addition: $y=x-a$
• For transformation by multiplication: $y=\frac{x}{b}$
• For both addition and multiplication simultaneously: $y=\frac{x-a}{b}$
• The value of $a$ has no effect on the standard deviation, but does affect the mean.
• The value of $b$ affects both the standard deviation and the mean: $\overline{x}=b\,\overline{y}+a$ and $\sigma_x=b\,\sigma_y$ (for positive $b$).
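The whole recipe fits in two small functions. This is a sketch only: `encode` and `decode` are hypothetical helper names, and $a=128$ (the smallest data value) is an arbitrary choice for illustration; the text's own example used $a=0$, $b=8$.

```python
from math import isclose
from statistics import mean, pstdev

def encode(xs, a=0.0, b=1.0):
    """Apply the transformation y = (x - a) / b to every data point."""
    return [(x - a) / b for x in xs]

def decode(y_bar, sigma_y, a=0.0, b=1.0):
    """Recover the original mean and standard deviation from transformed ones.

    Uses x_bar = b * y_bar + a and sigma_x = |b| * sigma_y:
    a shifts the mean only, b scales both.
    """
    return b * y_bar + a, abs(b) * sigma_y

x = [152, 160, 128, 136]
y = encode(x, a=128, b=8)            # [3.0, 4.0, 0.0, 1.0]
x_bar, sigma_x = decode(mean(y), pstdev(y), a=128, b=8)

# Cross-check against statistics computed directly on the original data.
assert isclose(x_bar, mean(x))
assert isclose(sigma_x, pstdev(x))
```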

Data transformation allows for easier handling of tricky numbers (such as numbers that are very large).

Types of data transformation include transformation by addition (or subtraction) and by multiplication (or division).

If data has negative values, you can add a constant large enough to make every value positive or, if all the values are negative, multiply them all by –1.

The best method for manual data transformation (i.e., without the help of software) is to assign a new variable to the transformation formula.
