Label Encoding and One Hot Encoding



Label Encoding and One-Hot Encoding are two common techniques for converting categorical variables into a numerical format that machine learning algorithms can process.

Both belong to the data preprocessing stage of a machine learning (ML) pipeline. Categorical variables are encoded before the data is fed into an ML model.


1. Label Encoding:

Label encoding converts each category of a feature into an integer by assigning a unique number (label) to each category. This method is best suited to ordinal categorical features, where the categories have a meaningful order, provided the integers follow that order (for example, Low = 0, Medium = 1, High = 2).
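
A minimal plain-Python sketch of label encoding an ordinal feature, assuming a hypothetical Size feature whose natural order is Small < Medium < Large. Writing the order out explicitly matters: sorting the names alphabetically would give Large, Medium, Small, which reverses the real order. (scikit-learn's OrdinalEncoder accepts a categories parameter for the same purpose.)

```python
# Hypothetical ordinal feature "Size", listed in its natural order.
SIZE_ORDER = ["Small", "Medium", "Large"]
size_to_code = {size: code for code, size in enumerate(SIZE_ORDER)}

sizes = ["Medium", "Small", "Large", "Medium"]
encoded = [size_to_code[s] for s in sizes]

print(size_to_code)       # {'Small': 0, 'Medium': 1, 'Large': 2}
print(encoded)            # [1, 0, 2, 1]
print(sorted(SIZE_ORDER)) # ['Large', 'Medium', 'Small'] (alphabetical order loses the meaning)
```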


Example:

Consider a feature called Color with three categories: Red, Green, Blue.

Color      Label Encoded
Red        0
Green      1
Blue       2
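
The table above can be reproduced with a short plain-Python sketch. Here label_encode is a hypothetical helper that numbers categories in order of first appearance, which matches the table. Library encoders may choose a different order: scikit-learn's LabelEncoder, for instance, sorts the classes, so it would assign Blue 0, Green 1 and Red 2.

```python
def label_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # next unused integer
    return [mapping[v] for v in values], mapping


colors = ["Red", "Green", "Blue", "Green", "Red"]
codes, mapping = label_encode(colors)
print(mapping)  # {'Red': 0, 'Green': 1, 'Blue': 2}
print(codes)    # [0, 1, 2, 1, 0]
```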

Advantages:


Simple and easy to implement.

Works well for ordinal features (categories with an inherent order).

Disadvantages:

For nominal (non-ordinal) features, label encoding may introduce a false sense of order: with the codes above, Blue (2) looks "greater" than Green (1). This can mislead algorithms that treat the codes as quantities (e.g., linear models).
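
A quick illustration of that false order, reusing the codes from the Color table (the helper code_distance is hypothetical). Nothing about the colors makes Blue further from Red than Green is, yet the encoded numbers say so, and any model that measures differences between codes will inherit that assumption.

```python
# Label codes from the Color example (Red=0, Green=1, Blue=2).
color_codes = {"Red": 0, "Green": 1, "Blue": 2}


def code_distance(a, b):
    """Absolute difference between two label-encoded categories."""
    return abs(color_codes[a] - color_codes[b])


print(code_distance("Red", "Green"))  # 1
print(code_distance("Red", "Blue"))   # 2  (Blue looks "further" from Red)
```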

2. One-Hot Encoding:

One-hot encoding creates a binary column for each category and assigns a 1 or 0 to indicate the presence of that category. This method is suitable for nominal categorical features where the categories have no ordinal relationship.


Example:

For the same Color feature, one-hot encoding creates separate columns for each category.

Color      Red      Green      Blue
Red        1        0          0
Green      0        1          0
Blue       0        0          1
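
A minimal plain-Python sketch of one-hot encoding that reproduces the table above. one_hot_encode is a hypothetical helper, not a library function; pandas (get_dummies) and scikit-learn (OneHotEncoder) provide ready-made versions.

```python
def one_hot_encode(values, categories=None):
    """Return one binary row per value, plus the column order used.

    If categories is None, columns follow the order of first appearance.
    A value missing from categories becomes an all-zero row.
    """
    if categories is None:
        categories = list(dict.fromkeys(values))  # de-duplicate, keep order
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


rows, columns = one_hot_encode(["Red", "Green", "Blue"])
print(columns)  # ['Red', 'Green', 'Blue']
for row in rows:
    print(row)  # [1, 0, 0], then [0, 1, 0], then [0, 0, 1]
```

Passing the training data's categories explicitly when encoding new data keeps the column layout identical between the two; in this sketch a category that was never seen simply becomes an all-zero row.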


Advantages:


Suitable for nominal features (no natural order).

Prevents introducing false ordinal relationships.
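
To see why one-hot encoding avoids a false order, compare the distances between the one-hot vectors from the Color table. Every pair of distinct colors is exactly the same distance apart (the square root of 2), so no category sits "between" two others. A small sketch using math.dist (available from Python 3.8):

```python
import math

# One-hot vectors from the Color table (columns: Red, Green, Blue).
one_hot = {
    "Red": (1, 0, 0),
    "Green": (0, 1, 0),
    "Blue": (0, 0, 1),
}

pairs = [("Red", "Green"), ("Red", "Blue"), ("Green", "Blue")]
distances = {(a, b): math.dist(one_hot[a], one_hot[b]) for a, b in pairs}

for (a, b), d in distances.items():
    print(a, b, round(d, 4))  # each pair prints 1.4142, i.e. sqrt(2)
```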

Disadvantages:


Increases dimensionality: one new column is added per category, so features with many distinct categories produce wide, mostly-zero data and can contribute to the "curse of dimensionality."
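
A rough sketch of how quickly the width grows, assuming a hypothetical City feature with 50 distinct values spread over 1,000 rows. Label encoding would need a single column; one-hot encoding needs 50, and because each row holds exactly one 1, almost every cell is zero.

```python
# Hypothetical high-cardinality feature: 1,000 rows drawn from 50 city names.
cities = [f"city_{i % 50}" for i in range(1000)]

categories = sorted(set(cities))
index = {c: i for i, c in enumerate(categories)}

rows = []
for c in cities:
    row = [0] * len(categories)  # one column per distinct city
    row[index[c]] = 1            # exactly one 1 per row
    rows.append(row)

width = len(categories)
nonzero = sum(sum(row) for row in rows)
density = nonzero / (len(rows) * width)
print(width)    # 50 columns, versus 1 column with label encoding
print(density)  # 0.02, i.e. 98% of the cells are zeros
```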

Key Differences:

Label Encoding assigns a numeric label to each category, making it more compact but potentially misleading for nominal data.

One-Hot Encoding expands the feature space, making it more suitable for nominal data but at the cost of higher dimensionality.

Use label encoding for ordinal data and one-hot encoding for nominal data.
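
The rule of thumb can be written as a small hypothetical helper: if an explicit order is supplied, the feature is treated as ordinal and label encoded in that order; otherwise it is treated as nominal and one-hot encoded. This is a sketch under those assumptions, not a library API; the column name "code" is also illustrative.

```python
def encode_feature(values, order=None):
    """Hypothetical helper: label encode ordinal data, one-hot encode nominal data.

    Returns (rows, column_names), where each row is a list of numbers.
    """
    if order is not None:
        # Ordinal: integers follow the supplied order.
        mapping = {cat: i for i, cat in enumerate(order)}
        return [[mapping[v]] for v in values], ["code"]
    # Nominal: one binary column per category, in order of first appearance.
    categories = list(dict.fromkeys(values))
    return [[int(v == c) for c in categories] for v in values], categories


ordinal_rows, ordinal_cols = encode_feature(
    ["Low", "High", "Medium"], order=["Low", "Medium", "High"]
)
nominal_rows, nominal_cols = encode_feature(["Red", "Green", "Red"])
print(ordinal_rows)  # [[0], [2], [1]]
print(nominal_cols)  # ['Red', 'Green']
print(nominal_rows)  # [[1, 0], [0, 1], [1, 0]]
```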


Download Label Encoding Dataset from here

Download Iris Dataset from here

Go To Jupyter Notebook: Label Encoding and One Hot Encoding
