In this tutorial, we will explain how to normalize data using scikit-learn in Python. Normalizing data is a crucial preprocessing step in machine learning to make sure that features are on a similar scale.

Scikit-learn provides a simple and effective way to normalize data: the `preprocessing.normalize()` function, which rescales data using one of several norms. By default it treats each sample (row) independently and scales it to unit norm, so each row in the dataset has a length of 1 under the chosen norm after normalization.

The `norm` parameter accepts three values. Here is a list of the norms that we can use with `preprocessing.normalize()`:

**1 - L1**

The L1 norm is the sum of absolute values, also called the Manhattan norm. It scales each row or column by the sum of its absolute values, so the absolute values in each row or column sum to 1.

**2 - L2**

It is known as Euclidean normalization. It scales each row or column by the square root of the sum of the squares of its elements, so the squared values in each row or column sum to 1.

**3 - max**

It scales each row or column by its maximum absolute value, so the largest absolute value in each row or column is 1.
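To make the three norms concrete, here is a small sketch that applies each one to a single row. The values are illustrative and chosen so the divisors are easy to check by hand; the variable names are not from any particular codebase.

```python
import numpy as np
from sklearn.preprocessing import normalize

# One illustrative sample (row) with easy-to-check arithmetic
row = np.array([[-3.0, 4.0]])

l1 = normalize(row, norm="l1")    # divides by |-3| + |4| = 7
l2 = normalize(row, norm="l2")    # divides by sqrt((-3)^2 + 4^2) = 5
mx = normalize(row, norm="max")   # divides by the largest absolute value, 4

print("l1 :", l1)
print("l2 :", l2)
print("max:", mx)
```

Note that the signs of the values are preserved; only the scale of the row changes.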

#### Row-wise normalization

Here is Python code showing how we can use `preprocessing.normalize()` to normalize data using *scikit-learn*:

```python
from sklearn import preprocessing
import numpy as np

# Sample data
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Normalizing the data using preprocessing.normalize()
normalized_data = preprocessing.normalize(data, norm='l2')  # 'l2' normalization

print("Normalized Data:")
print(normalized_data)
```

Here is an explanation of the above code:

- First, we import *preprocessing* from *sklearn* and import *numpy*.
- Next, we create a sample dataset as a NumPy array and assign it to the *data* variable.
- We use the *preprocessing.normalize()* function with the *L2 norm* to normalize the data. It scales each row so that the sum of its squared values equals 1.
- Finally, we print out the normalized data.

If we want, we can pass one of the other norms, `'l1'` or `'max'`, to normalize the data instead.

An important note: `preprocessing.normalize()` performs row-wise normalization by default. If we want to normalize column-wise or use other techniques, we can use `MinMaxScaler`, `StandardScaler`, or custom normalization functions.
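As a side note, `normalize()` also accepts an `axis` argument (`axis=1`, the default, works on rows; `axis=0` works on columns). The following is a minimal sketch comparing the two on the same sample data; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import normalize

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

by_row = normalize(data, norm="l2")           # axis=1 is the default: each row gets unit L2 norm
by_col = normalize(data, norm="l2", axis=0)   # each column gets unit L2 norm instead

# Check the resulting lengths along each direction
row_norms = np.linalg.norm(by_row, axis=1)
col_norms = np.linalg.norm(by_col, axis=0)

print("Row norms after row-wise normalization:", row_norms)
print("Column norms after column-wise normalization:", col_norms)
```

This is still unit-norm scaling, which is different from the min-max and standard scaling described in the next section.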

#### Column-wise normalization

We can use the `MinMaxScaler` or `StandardScaler` classes to normalize each column (feature) of the data. Both provide a simple and effective way to scale data.

**MinMaxScaler:** It scales each feature to a fixed range, between 0 and 1 by default, by subtracting the feature's minimum value and dividing by its range.

**StandardScaler:** This scaler standardizes features by removing the mean and scaling to unit variance. It centers the data around 0 with a standard deviation of 1.

Following is the Python code that shows how to use both scalers:

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np

# Sample data
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Using MinMaxScaler
scaler = MinMaxScaler()
normalized_data_minmax = scaler.fit_transform(data)

print("Min-Max Scaled Data:")
print(normalized_data_minmax)

# Using StandardScaler
scaler = StandardScaler()
normalized_data_standard = scaler.fit_transform(data)

print("\nStandard Scaled Data:")
print(normalized_data_standard)
```

Here is an explanation of the above code:

- First, we import *MinMaxScaler* and *StandardScaler* from *sklearn.preprocessing* and import *numpy*.
- We create a sample dataset and assign it to the *data* variable.
- We create an instance of each scaler, *MinMaxScaler* and *StandardScaler*.
- We then apply the *fit_transform()* method to normalize the data.
- Finally, we print out the normalized data.
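To see what the two scalers compute, the sketch below reproduces their formulas by hand with NumPy and compares the results with scikit-learn's output. The `manual_*` names are illustrative, and the comparison assumes the default settings of both scalers.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Min-max scaling: (x - min) / (max - min), computed per column
col_min = data.min(axis=0)
col_max = data.max(axis=0)
manual_minmax = (data - col_min) / (col_max - col_min)

# Standardization: (x - mean) / std, computed per column.
# StandardScaler uses the population standard deviation (ddof=0), which is NumPy's default.
manual_standard = (data - data.mean(axis=0)) / data.std(axis=0)

minmax_matches = np.allclose(manual_minmax, MinMaxScaler().fit_transform(data))
standard_matches = np.allclose(manual_standard, StandardScaler().fit_transform(data))

print("Manual min-max matches MinMaxScaler:", minmax_matches)
print("Manual standardization matches StandardScaler:", standard_matches)
```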

Remember to use `fit_transform()` for the training data and `transform()` for the testing data. This makes sure that the testing data is scaled using the same parameters as the training data.
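A minimal sketch of that pattern is shown below. The data is randomly generated purely for illustration, and the split sizes and variable names are assumptions rather than part of any real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix, generated only for this illustration
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from the training data
X_test_scaled = scaler.transform(X_test)        # reuses those training statistics

print("Means learned from training data:", scaler.mean_)
print("Training column means after scaling:", X_train_scaled.mean(axis=0))
print("Testing column means after scaling:", X_test_scaled.mean(axis=0))
```

The training columns end up with a mean of approximately 0, while the testing columns are only close to 0, because they are scaled with the training statistics rather than their own.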

Normalization helps algorithms converge faster during training and prevents features with larger scales from dominating those with smaller scales.

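We have seen how to normalize data using scikit-learn in Python. We covered two approaches: row-wise normalization with `preprocessing.normalize()` and column-wise scaling with `MinMaxScaler` and `StandardScaler`.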