Feature transformation modifies data while preserving its information content, with the aim of giving features more discriminatory power. The transformed data may have a different interpretation from the original data. Key feature transformation methods are outlined below.
Numeric Data #
Binning (or bucketing) converts numeric features into categorical ones, normally to reduce cardinality; e.g., converting income figures into the categories “low”, “medium”, and “high”, or grouping the transaction values of an eCommerce shop into just 50 bins. Popular approaches are equal-width binning, equal-frequency binning, and entropy-based binning.
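A minimal sketch of equal-width and equal-frequency binning with pandas; the column name, bin counts, and labels are illustrative, not prescribed:

```python
import pandas as pd

df = pd.DataFrame({"income": [12_000, 35_000, 58_000, 91_000, 140_000]})

# Equal-width binning: split the income range into three equal-sized intervals
df["income_band"] = pd.cut(df["income"], bins=3, labels=["low", "medium", "high"])

# Equal-frequency binning: each bin holds roughly the same number of rows
df["income_bin"] = pd.qcut(df["income"], q=4, labels=False)

print(df)
```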
Numeric to binary is a special case of binning with only two bins, true or false; e.g., a class mark below 50% is encoded as “fail” and a mark of 50% or above as “pass”.
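A sketch of the pass/fail example, using a simple threshold (the data and column names are illustrative):

```python
import numpy as np
import pandas as pd

marks = pd.DataFrame({"mark": [35, 50, 72, 49, 88]})

# Two bins only: a mark below 50 becomes "fail", 50 or above becomes "pass"
marks["result"] = np.where(marks["mark"] >= 50, "pass", "fail")
print(marks)
```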
Scaling enlarges or diminishes the values of features that have very different scales or units so that they fall within a specific range (e.g., 50 to 100), eliminating the dominance of some features over others; e.g., slight changes in “house price” will dominate the results for a dataset that contains “house price” stated in hundreds of thousands of dollars alongside “number of rooms”. Scaling is required by algorithms such as k-means and k-nearest neighbors that work by calculating distances between data points. Common methods are min-max scaling, mean scaling, and standardization (Z-score).
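A sketch of min-max scaling and standardization with scikit-learn, assuming a toy version of the house dataset from the example above:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({
    "house_price": [250_000, 480_000, 310_000, 720_000],
    "num_rooms": [3, 5, 4, 7],
})

# Min-max scaling: map each feature into [0, 1]; feature_range can target
# any interval, e.g. MinMaxScaler(feature_range=(50, 100))
minmax = MinMaxScaler().fit_transform(df)

# Standardization (Z-score): zero mean, unit variance per feature
zscore = StandardScaler().fit_transform(df)

print(pd.DataFrame(minmax, columns=df.columns))
print(pd.DataFrame(zscore, columns=df.columns))
```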
Categorical Data #
Categorical values must be converted to numbers because many machine learning algorithms accept only numbers as input.
Label encoding assigns a distinct integer to each unique value and suits ordinal features, where the categories have an intrinsic order. See Figure 1.
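A minimal sketch of label encoding with pandas, using an ordered categorical so the integers respect the category rank (the “size” feature and its order are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Encode each ordinal category as an integer that respects its rank
order = ["small", "medium", "large"]
df["size_encoded"] = pd.Categorical(df["size"], categories=order, ordered=True).codes

print(df)  # small -> 0, medium -> 1, large -> 2
```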
One-hot encoding creates a new column for each possible value of a categorical feature and maps it to 1 or 0 depending on whether the value is present. This is useful for nominal features, where there is no intrinsic ranking or order. See Figure 2.
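A sketch of one-hot encoding with pandas.get_dummies (the “color” feature is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One new 0/1 column per distinct value of the nominal feature
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```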
Flagging replaces a value with a 0/1 flag indicating its presence or absence. It applies equally to mapping binary values such as “yes”/“no”.
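A sketch of both uses of flagging, assuming illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({
    "subscribed": ["yes", "no", "yes"],
    "phone": ["555-0101", None, "555-0199"],
})

# Map a binary value directly onto 0/1
df["subscribed_flag"] = df["subscribed"].map({"yes": 1, "no": 0})

# Flag the presence or absence of a value
df["has_phone"] = df["phone"].notna().astype(int)

print(df)
```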
Functional Transformation #
Functional transformation changes the underlying shape of the data to satisfy the assumptions of certain algorithms or to enable better visualization; e.g., a log transformation can make quadratic distributions more linear, enabling separation of classes.
Normalization is the most popular functional transformation. It reshapes the data in a feature so that its distribution is approximately normal (bell curve/Gaussian), i.e., it removes skewness, non-linearity, or heteroskedasticity, in order to meet the requirements of algorithms such as linear discriminant analysis that assume normally distributed data. Normalization also reduces the impact of outliers. Common transformations are log, square root, and inverse.
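A sketch of the three common transformations with NumPy, assuming a right-skewed feature with one large value:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 9.0, 120.0])  # right-skewed, one large value

log_x = np.log1p(x)    # log transform; log1p also handles zeros safely
sqrt_x = np.sqrt(x)    # square-root transform, milder than log
inv_x = 1.0 / x        # inverse (reciprocal) transform

print(log_x, sqrt_x, inv_x, sep="\n")
```

Note how the log transform compresses the gap between 9 and 120, which is what dampens the influence of outliers.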
Other Transformations #
Date to numeric transformation extracts components from a date or time as numeric values relative to other components; e.g., hour relative to day, hour relative to month, or day relative to year.
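A sketch of extracting numeric date components with pandas (the timestamps are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"ts": pd.to_datetime([
    "2023-03-05 14:30", "2023-07-19 08:15", "2023-11-30 22:45",
])})

# Numeric components relative to larger units
df["hour_of_day"] = df["ts"].dt.hour        # hour relative to day
df["day_of_month"] = df["ts"].dt.day        # day relative to month
df["day_of_year"] = df["ts"].dt.dayofyear   # day relative to year

print(df)
```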
Text processing encodes text data in a form ready for use by machine learning algorithms, e.g., in sentiment analysis. Examples of such encodings are tokenization and TF-IDF vectorization.
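A sketch of TF-IDF vectorization with scikit-learn (the sample sentences are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the product is great",
    "terrible product, would not buy",
    "great value, great product",
]

# Tokenize each document and weight terms by TF-IDF
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))
```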