Label Encoding Using Python

 

Label Encoding

Before Applying Label Encoding:

import pandas as pd
from sklearn.datasets import fetch_openml

# Load the Titanic dataset
titanic = fetch_openml(name='titanic', version=1, as_frame=True)

# Combine the features and the target into a single DataFrame
df = pd.concat([titanic.data, titanic.target], axis=1)

# Display the first few rows of the DataFrame before label encoding
df.head()

 

After Applying Label Encoding:

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import LabelEncoder

# Load the Titanic dataset
titanic = fetch_openml(name='titanic', version=1, as_frame=True)

# Combine the features and the target into a single DataFrame
df = pd.concat([titanic.data, titanic.target], axis=1)

# Define columns for label encoding
columns_to_encode = ['sex', 'embarked', 'pclass', 'survived']

# Apply label encoding
# fit_transform refits the encoder on each column, so after the loop
# label_encoder only remembers the classes of the last column ('survived')
label_encoder = LabelEncoder()
for column in columns_to_encode:
    df[column] = label_encoder.fit_transform(df[column])

# Display the first few rows of the DataFrame after label encoding
df.head()

 

This Python code performs the following tasks:

Import necessary libraries: The code begins by importing the required libraries - pandas, fetch_openml from sklearn.datasets, and LabelEncoder from sklearn.preprocessing.

 

Load the Titanic dataset: Using the fetch_openml function, it loads the Titanic dataset with the specified name ('titanic') and version (1) as a DataFrame (as_frame=True). This dataset contains information about passengers aboard the Titanic.

 

Convert the data and target into a DataFrame: The code concatenates the data and target of the Titanic dataset into a single DataFrame called df. This DataFrame contains both the features (data) and the target variable ('survived').

 

Define columns for label encoding: It specifies the columns that need to be label encoded. In this case, the columns include 'sex', 'embarked', 'pclass', and 'survived'.

 

Apply label encoding: A LabelEncoder object is created, and a loop iterates over the columns chosen for label encoding. For each column, the encoder's fit_transform method learns the distinct values in that column and replaces each one with an integer label. Because the same encoder is refitted on every pass, it only retains the mapping for the last column once the loop finishes.

 

Display the first few rows of the DataFrame after label encoding: Finally, the code displays the first few rows of the DataFrame (df) after label encoding using the head() method.
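The encoding loop described in the steps above can be sketched on a small hand-built DataFrame, so it runs without downloading anything. This is only an illustration: the toy rows, the encoders dictionary, and the choice to keep one encoder per column are assumptions added here, not part of the original script. Keeping one fitted encoder per column is one way to make each mapping reversible later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for two Titanic columns (illustrative values, not real rows)
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "Q", "S"],
})

# Keep a separate fitted encoder per column so each mapping can be reversed
encoders = {}
encoded = toy.copy()
for column in ["sex", "embarked"]:
    encoder = LabelEncoder()
    encoded[column] = encoder.fit_transform(toy[column])
    encoders[column] = encoder

# Recover the original labels for one column from its integer codes
restored_sex = encoders["sex"].inverse_transform(encoded["sex"])
```

With a single shared encoder, as in the script above, inverse_transform would only work for the last column that was encoded.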

 

Label encoding replaces categorical values with numeric labels, which makes the data usable by machine learning algorithms that require numerical input. LabelEncoder assigns the integers in sorted order of the labels, so in the 'sex' column 'female' is encoded as 0 and 'male' as 1, while in the 'embarked' column 'C', 'Q', and 'S' are encoded as 0, 1, and 2 respectively.
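To check which integer a label actually receives, the fitted encoder's classes_ attribute can be inspected. The short sketch below uses a hand-written list of port codes rather than the real 'embarked' column, and the variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["S", "C", "Q", "S"])

# classes_ holds the sorted unique labels; a label's position is its code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'C': 0, 'Q': 1, 'S': 2}
```

The codes follow the sorted order of the labels, not the order in which they first appear in the data.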

 

 

 

 

 
