Data transformation is the process of converting data from one form into another. More formally, a transformation creates a new variable, or set of variables, from an existing variable or set of variables.
Objectives of transformation
Data transformation is undertaken with the following objectives:
- To make patterns in the data easier to see (e.g., log transformations and Principal Components Analysis).
- To make patterns in the data easier to communicate (e.g., the Net Promoter Score).
- To address violations of the assumptions of statistical tests (e.g., ranks and log transformations).
- To improve the validity of regression models (e.g., basis functions).
- To reduce the amount of data (e.g., Principal Components Analysis).
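Two of the transformations named in this list can be sketched in a few lines of Python. The data values and the variable names (`incomes`, `ratings`) are hypothetical and chosen only for illustration. The Net Promoter Score is computed with its usual definition: the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 to 6).

```python
import math

# Hypothetical, heavily skewed data: annual incomes in dollars.
incomes = [20_000, 35_000, 50_000, 120_000, 2_000_000]

# Log transformation: a new variable derived from an existing one.
# The base-10 log compresses the long right tail of the distribution.
log_incomes = [math.log10(x) for x in incomes]

# Hypothetical 0-10 "likelihood to recommend" ratings.
ratings = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]


def net_promoter_score(scores):
    """Percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


nps = net_promoter_score(ratings)
```

In this sketch the log-transformed incomes fall between roughly 4.3 and 6.3, so the single very large income no longer dominates the scale, and the ten ratings are summarized by one number that is easy to communicate.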
Standard transformations of a categorical variable
A categorical variable can be transformed in one of two ways:
- It can be turned into a numeric variable by defining rules that assign a numeric value to each category.
- Its categories can be combined. Most commonly, small categories are merged into larger categories.
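Both kinds of categorical transformation can be sketched in Python as follows. The scoring rule in `SCORES`, the helper `merge_small_categories`, its `min_count` threshold, and the sample data are all assumptions made for illustration; in practice the numeric values assigned to categories and the cut-off for a "small" category are judgment calls.

```python
from collections import Counter

# Hypothetical survey responses on an ordinal agreement scale.
responses = ["Agree", "Neutral", "Strongly agree", "Disagree", "Agree"]

# Assumed rule for the numeric interpretation of each category.
SCORES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

# Transformation 1: categorical variable -> numeric variable.
numeric = [SCORES[r] for r in responses]


def merge_small_categories(values, min_count, other_label="Other"):
    """Replace every category seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Hypothetical brand-preference data with two rarely chosen brands.
brands = ["Brand A", "Brand B", "Brand A", "Brand C", "Brand A", "Brand B", "Brand D"]

# Transformation 2: small categories merged into a larger one.
merged = merge_small_categories(brands, min_count=2)
```

Here `numeric` holds one score per response, and `merged` keeps "Brand A" and "Brand B" while folding the two brands that appear only once into a single "Other" category.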
Standard transformations of a numeric variable
A more up-to-date version of this content is on www.displayr.com.