Normalize Data Using scikit-learn in Python

By Chandrashekhar Fakirpure

Updated on Feb 06, 2024

In this tutorial, we will explain how to normalize data using scikit-learn in Python. Normalizing data is a crucial preprocessing step in machine learning to make sure that features are on a similar scale.

Scikit-learn provides a simple and effective way to normalize data through its preprocessing.normalize() function, which supports several norms. It treats each sample (row) independently and scales it to unit norm, so every row in the dataset has a length of 1 after normalization.

The norm parameter accepts three values. Here is a list of the norms available with preprocessing.normalize():

1 - L1
Also known as least absolute deviations. It scales each row by the sum of its absolute values, so that the absolute values of each row sum to 1.

2 - L2
Also known as Euclidean normalization. It scales each row by the square root of the sum of the squares of its elements, so that each row has a Euclidean length of 1.

3 - max
It scales each row by its maximum absolute value, ensuring that the largest absolute value in each row is 1.
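As a quick worked illustration of the three norms (plain NumPy arithmetic, not part of scikit-learn), consider the row [3.0, 4.0]:

```python
import numpy as np

row = np.array([3.0, 4.0])

# L1: divide by the sum of absolute values (3 + 4 = 7)
l1 = row / np.sum(np.abs(row))

# L2: divide by the Euclidean length (sqrt(9 + 16) = 5)
l2 = row / np.sqrt(np.sum(row ** 2))  # [0.6, 0.8]

# max: divide by the largest absolute value (4)
mx = row / np.max(np.abs(row))        # [0.75, 1.0]

print(l1, l2, mx)
```

After L1 normalization the entries sum to 1, after L2 the Euclidean length is 1, and after max the largest entry is 1.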

Row-wise normalization

Here is Python code showing how we can use preprocessing.normalize() to normalize data with scikit-learn:

from sklearn import preprocessing
import numpy as np

# Sample data
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Normalizing the data using preprocessing.normalize()
normalized_data = preprocessing.normalize(data, norm='l2')  # 'l2' normalization

print("Normalized Data:")
print(normalized_data)

Here is the explanation of the above code:

  • First, we import preprocessing from sklearn and import numpy.
  • Next, we create a sample dataset with a NumPy array and assign it to the data variable.
  • We use the preprocessing.normalize() function with the L2 norm, which scales each row so that its Euclidean length is 1.
  • Finally, we print the normalized data.

If we want, we can also use the other norms to normalize the data.
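For example, a minimal sketch applying the 'l1' and 'max' norms to the same sample data, checking the property each norm guarantees:

```python
from sklearn import preprocessing
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# L1 norm: the absolute values in each row sum to (approximately) 1
l1_rows = preprocessing.normalize(data, norm='l1')
print(l1_rows.sum(axis=1))

# max norm: the largest absolute value in each row is 1
max_rows = preprocessing.normalize(data, norm='max')
print(max_rows.max(axis=1))
```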

The crucial note is that preprocessing.normalize() performs row-wise normalization by default. If we want to normalize column-wise or use other techniques, we can use MinMaxScaler, StandardScaler, or custom normalization functions.

Column-wise normalization

We can use the MinMaxScaler or StandardScaler classes to normalize the data column-wise. They provide a simple and effective way to normalize data.

MinMaxScaler: It scales the data to a fixed range, 0 to 1 by default, by subtracting the minimum value of each feature and dividing by that feature's range.

StandardScaler: This scaler standardizes features by removing the mean and scaling to unit variance. It centers the data around 0 with a standard deviation of 1.

Following is the Python code that shows how to use both scalers:

from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np

# Sample data
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Using MinMaxScaler
scaler = MinMaxScaler()
normalized_data_minmax = scaler.fit_transform(data)

print("Min-Max Scaled Data:")
print(normalized_data_minmax)

# Using StandardScaler
scaler = StandardScaler()
normalized_data_standard = scaler.fit_transform(data)

print("\nStandard Scaled Data:")
print(normalized_data_standard)

Here is the explanation of the above code:

  • First, we import MinMaxScaler and StandardScaler from sklearn.preprocessing and import numpy.
  • We create a sample dataset and assign it to the data variable.
  • We create instances of both MinMaxScaler and StandardScaler.
  • We then apply the fit_transform() method to normalize the data.
  • Finally, we print the normalized data.
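To connect the scalers back to the formulas described above, here is a short sketch verifying that their output matches the manual column-wise calculations (a check for illustration, not something required in practice):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# MinMaxScaler is equivalent to (x - min) / (max - min), column-wise
manual_minmax = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
print(np.allclose(MinMaxScaler().fit_transform(data), manual_minmax))

# StandardScaler is equivalent to (x - mean) / std, column-wise
manual_standard = (data - data.mean(axis=0)) / data.std(axis=0)
print(np.allclose(StandardScaler().fit_transform(data), manual_standard))
```

Both comparisons print True, confirming the scalers operate on each feature (column) independently.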

Remember to use fit_transform() on the training data and transform() on the testing data. This ensures that the testing data is scaled using the same parameters learned from the training data.
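The pattern above can be sketched as follows (the train/test arrays are hypothetical values chosen for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and testing data
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from training data
X_test_scaled = scaler.transform(X_test)        # reuses those same parameters

print(X_test_scaled)
```

Calling fit_transform() on X_test instead would recompute the mean and standard deviation from the test set, producing values on a different scale than the training data.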

Normalization helps algorithms converge faster during training and prevents features with larger scales from dominating those with smaller scales.

We have seen how to normalize data using scikit-learn in Python, covering two approaches: row-wise normalization with preprocessing.normalize() and column-wise scaling with MinMaxScaler and StandardScaler.