Label Encoding and One-Hot Encoding
Label Encoding and One-Hot Encoding are two common techniques for converting categorical variables into a numerical format that machine learning algorithms can process.
Both are applied during the data preprocessing stage of the machine learning (ML) pipeline, before the data is fed into a model.
1. Label Encoding:
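Label encoding converts each category in a feature into an integer by assigning a unique number (label) to each category. This method is suitable for ordinal categorical features, where the order of the categories matters, provided the integers are assigned in that same order.
Example:
Consider a feature called Color with three categories: Red, Green, Blue. Color is a nominal feature, so the integers below only illustrate the mechanics and carry no real ordering.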
Color    Label Encoded
Red      0
Green    1
Blue     2
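The mapping in the table can be reproduced with a minimal sketch in plain Python. The helper name `label_encode` and its `categories` argument are assumptions made for illustration and are not taken from the article's notebook. Note that scikit-learn's `LabelEncoder` sorts categories alphabetically, so it would assign Blue 0, Green 1, Red 2 instead of the order shown above.

```python
def label_encode(values, categories):
    """Map each value to the integer position of its category.

    `categories` fixes the order explicitly, so the caller decides
    which category receives which integer.
    """
    mapping = {category: index for index, category in enumerate(categories)}
    return [mapping[value] for value in values], mapping


colors = ["Red", "Green", "Blue", "Green"]
encoded, mapping = label_encode(colors, categories=["Red", "Green", "Blue"])

print(mapping)  # {'Red': 0, 'Green': 1, 'Blue': 2}
print(encoded)  # [0, 1, 2, 1]
```

Passing the category order explicitly is a design choice: for an ordinal feature it lets the integers follow the real ranking instead of an arbitrary or alphabetical one.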
Advantages:
Simple and easy to implement.
Works well for ordinal features (categories with an inherent order).
Disadvantages:
For nominal (non-ordinal) features, label encoding introduces an artificial order and spacing between categories, which can mislead algorithms that treat the encoded values as magnitudes (e.g., linear models).
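The false sense of order can be made concrete with a small sketch. It reuses the Color codes from the example above; the helper `code_distance` is a hypothetical name used only for illustration.

```python
# Label codes from the Color example: Red 0, Green 1, Blue 2.
codes = {"Red": 0, "Green": 1, "Blue": 2}


def code_distance(a, b):
    """Absolute difference between the integer codes of two categories."""
    return abs(codes[a] - codes[b])


# A model that treats the codes as magnitudes sees Red as "closer"
# to Green than to Blue, although the colors have no such ordering.
print(code_distance("Red", "Green"))  # 1
print(code_distance("Red", "Blue"))   # 2
```

These differences are purely a side effect of how the integers were assigned; a different assignment would produce a different, equally arbitrary ordering.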
2. One-Hot Encoding:
One-hot encoding creates a binary column for each category and assigns a 1 or 0 to indicate the presence of that category. This method is suitable for nominal categorical features where the categories have no ordinal relationship.
Example:
For the same Color feature, one-hot encoding creates separate columns for each category.
Color    Red  Green  Blue
Red      1    0      0
Green    0    1      0
Blue     0    0      1
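The table above can be reproduced with a minimal plain-Python sketch. The function name `one_hot_encode` is an assumption for illustration; in practice, library helpers such as pandas `get_dummies` or scikit-learn's `OneHotEncoder` are commonly used for the same purpose.

```python
def one_hot_encode(values, categories):
    """Return one row per value, with one 0/1 column per category."""
    return [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]


categories = ["Red", "Green", "Blue"]
rows = one_hot_encode(["Red", "Green", "Blue"], categories)

for color, row in zip(["Red", "Green", "Blue"], rows):
    print(color, row)
# Red [1, 0, 0]
# Green [0, 1, 0]
# Blue [0, 0, 1]
```

Each row contains exactly one 1, in the column of the category it represents, so no category appears larger or smaller than another.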
Advantages:
Suitable for nominal features (no natural order).
Prevents introducing false ordinal relationships.
Disadvantages:
Increases dimensionality: a feature with k categories becomes k binary columns, so features with many categories produce wide, sparse data and can contribute to the "curse of dimensionality."
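To get a feel for how quickly the column count grows, the sketch below counts the one-hot columns a feature would need. The `cities` list is synthetic data invented for illustration, and `one_hot_width` is a hypothetical helper name.

```python
def one_hot_width(values):
    """Number of one-hot columns needed: one per distinct category."""
    return len(set(values))


colors = ["Red", "Green", "Blue", "Green"]
# Synthetic high-cardinality feature with 1000 distinct values.
cities = [f"city_{i}" for i in range(1000)]

print(one_hot_width(colors))  # 3
print(one_hot_width(cities))  # 1000
```

Label encoding would keep both features as a single column, which is the compactness trade-off against one-hot encoding.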
Key Differences:
Label Encoding assigns a numeric label to each category, making it more compact but potentially misleading for nominal data.
One-Hot Encoding expands the feature space, making it more suitable for nominal data but at the cost of higher dimensionality.
Use label encoding for ordinal data and one-hot encoding for nominal data.
Download Label Encoding Dataset from here
Download Iris Dataset from here
Go To Jupyter Notebook: Label Encoding and One Hot Encoding