BBA Analytics Using Python Practice Notebook
In [4]:
import pandas as pd
import seaborn as sns
# Load the Titanic dataset from Seaborn
titanic = sns.load_dataset('titanic')
# Display the first few rows of the DataFrame
print("Original DataFrame:")
print(titanic.head())
# Encode the 'sex' column as integers: 0 for male, 1 for female
titanic['sex'] = titanic['sex'].map({'male': 0, 'female': 1})
# Fill the missing 'embarked' values with 'S' (Southampton), the most common port.
# Assigning the result back avoids a chained inplace fillna, which newer pandas versions warn about.
titanic['embarked'] = titanic['embarked'].fillna('S')
# Display the modified DataFrame
print("\nDataFrame after filling values:")
titanic.head()
Original DataFrame:
   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third
1         1       1  female  38.0      1      0  71.2833        C  First
2         1       3  female  26.0      0      0   7.9250        S  Third
3         1       1  female  35.0      1      0  53.1000        S  First
4         0       3    male  35.0      0      0   8.0500        S  Third

     who  adult_male deck  embark_town alive  alone
0    man        True  NaN  Southampton    no  False
1  woman       False    C    Cherbourg   yes  False
2  woman       False  NaN  Southampton   yes   True
3  woman       False    C  Southampton   yes  False
4    man        True  NaN  Southampton    no   True

DataFrame after filling values:
Out[4]:
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 0 | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | 1 | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | 1 | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | 1 | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | 0 | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
In [8]:
import pandas as pd
import seaborn as sns
# Load the Titanic dataset from Seaborn
titanic = sns.load_dataset('titanic')
# Display total null values
print("Total null values before filling:")
print(titanic.isnull().sum())
# Fill nulls in the numeric columns with each column's mean.
# numeric_only=True skips text and categorical columns such as 'deck' and 'embarked', which keep their nulls.
titanic.fillna(titanic.mean(numeric_only=True), inplace=True)
# Display DataFrame after filling null values with mean
print("\nDataFrame after filling null values with mean:")
titanic.head()
Total null values before filling:
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64

DataFrame after filling null values with mean:
Out[8]:
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
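The first five rows look unchanged after the fill because none of them had a missing `age`. As a minimal sketch of what mean imputation does, the example below uses a small made-up DataFrame (the values and the name `df` are hypothetical, not taken from the Titanic data) so the effect is visible without downloading anything:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few Titanic-style columns
df = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0],
    "fare": [7.25, 71.28, np.nan, 53.10],
    "deck": [None, "C", None, "C"],
})

# Means of the numeric columns only; the text column 'deck' is skipped
numeric_means = df.mean(numeric_only=True)

# Passing a Series to fillna fills each column whose name matches a label in the Series
filled = df.fillna(numeric_means)

print(filled)
print(filled.isnull().sum())
```

Only `age` and `fare` are filled here. `deck` keeps its missing values because a mean is not defined for text; a column like that would need a different strategy, such as filling with its most frequent value.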
In [9]:
import pandas as pd
import seaborn as sns
# Load the dataset from Seaborn
iris = sns.load_dataset('iris')
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(iris.head())
# Calculate mean of the numeric columns (numeric_only=True skips the text column 'species')
print("\nMean values:")
print(iris.mean(numeric_only=True))
# Calculate median
print("\nMedian values:")
print(iris.median(numeric_only=True))
# Calculate mode (works on every column, including 'species')
print("\nMode values:")
print(iris.mode())
# Calculate correlation
print("\nCorrelation matrix:")
print(iris.corr(numeric_only=True))
# Calculate covariance
print("\nCovariance matrix:")
print(iris.cov(numeric_only=True))
# Calculate standard deviation
print("\nStandard deviation values:")
print(iris.std(numeric_only=True))
First few rows of the dataset:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Mean values:
sepal_length    5.843333
sepal_width     3.057333
petal_length    3.758000
petal_width     1.199333
dtype: float64

Median values:
sepal_length    5.80
sepal_width     3.00
petal_length    4.35
petal_width     1.30
dtype: float64

Mode values:
   sepal_length  sepal_width  petal_length  petal_width     species
0           5.0          3.0           1.4          0.2      setosa
1           NaN          NaN           1.5          NaN  versicolor
2           NaN          NaN           NaN          NaN   virginica

Correlation matrix:
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      1.000000    -0.117570      0.871754     0.817941
sepal_width      -0.117570     1.000000     -0.428440    -0.366126
petal_length      0.871754    -0.428440      1.000000     0.962865
petal_width       0.817941    -0.366126      0.962865     1.000000

Covariance matrix:
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      0.685694    -0.042434      1.274315     0.516271
sepal_width      -0.042434     0.189979     -0.329656    -0.121639
petal_length      1.274315    -0.329656      3.116278     1.295609
petal_width       0.516271    -0.121639      1.295609     0.581006

Standard deviation values:
sepal_length    0.828066
sepal_width     0.435866
petal_length    1.765298
petal_width     0.762238
dtype: float64
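The correlation and covariance matrices are linked through the standard deviations: each correlation is the covariance divided by the product of the two columns' standard deviations. For example, from the iris output, 1.274315 / (0.828066 × 1.765298) ≈ 0.8718, which matches the sepal_length and petal_length correlation. The sketch below checks the same relationship on a small made-up DataFrame (the columns `x` and `y` are hypothetical), so it runs without loading a dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; any numeric DataFrame would do
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 5.0, 4.0, 5.0],
})

cov = df.cov()   # sample covariance (ddof=1)
std = df.std()   # sample standard deviation (ddof=1)

# Divide every covariance by the product of the two matching standard deviations
corr_manual = cov / np.outer(std, std)
corr_pandas = df.corr()

print(corr_manual)
print(np.allclose(corr_manual, corr_pandas))
```

Because correlation is rescaled this way, it always lies between -1 and 1, while covariance depends on the units of the columns.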
In [11]:
import numpy as np
# Define two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Addition
addition_result = arr1 + arr2
print("Addition result:", addition_result)
# Element-wise multiplication (each pair of matching elements is multiplied)
multiplication_result = arr1 * arr2
print("Multiplication result:", multiplication_result)
# Product of array elements
array_product = np.prod(arr1)
print("Product of array elements:", array_product)
Addition result: [5 7 9]
Multiplication result: [ 4 10 18]
Product of array elements: 6
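Note that `*` on NumPy arrays multiplies element by element; it is not a dot product. As a small sketch using the same two arrays, the dot product can be computed with `np.dot` or the `@` operator, and it equals the sum of the element-wise products:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Element-wise product: one result per pair of elements
elementwise = arr1 * arr2

# Dot product: the element-wise products added together
dot_result = np.dot(arr1, arr2)
same_with_operator = arr1 @ arr2

print("Element-wise:", elementwise)
print("Dot product:", dot_result)
print("Sum of element-wise products:", elementwise.sum())
```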
In [12]:
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset directly from Seaborn (for example, let's use the 'iris' dataset)
iris = sns.load_dataset('iris')
# Draw a boxplot for each numerical column
sns.boxplot(data=iris)
# Show the plot
plt.show()
Here’s a more detailed breakdown of the box plot:
- The top whisker extends to the largest data point within 1.5 × IQR above the upper quartile.
- The upper quartile (Q3) is the 75th percentile: 75% of the data points fall below this value.
- The box spans the lower quartile to the upper quartile, so it covers the middle 50% of the data points. Its length is the interquartile range (IQR), and the line inside it marks the median.
- The lower quartile (Q1) is the 25th percentile: 25% of the data points fall below this value.
- The bottom whisker extends to the smallest data point within 1.5 × IQR below the lower quartile.
- Outliers are the data points that fall beyond the whiskers; they are drawn as individual markers.
Box plots are a quick way to visualize the distribution of data, including its center, spread, and outliers. They can be used to compare data sets or to identify patterns in data.
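The quantities described above can be computed by hand. The following is a minimal sketch with NumPy on a made-up sample (the array `values` is hypothetical, not the iris data), assuming the default whisker rule of 1.5 × IQR:

```python
import numpy as np

# Hypothetical sample with one unusually large value
values = np.array([2.0, 2.2, 2.5, 2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 3.5, 4.4])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Limits beyond which a point counts as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the limits
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the limits is drawn as an individual outlier marker
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("Whiskers:", whisker_low, whisker_high)
print("Outliers:", outliers)
```

In this sample the largest value lies above the upper limit, so the top whisker stops at the next-largest point and the extreme value is reported as an outlier.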