BBA Analytics Using Python Practice Notebook
In [4]:
import pandas as pd
import seaborn as sns

# Load the Titanic dataset from Seaborn
titanic = sns.load_dataset('titanic')

# Display the first few rows of the DataFrame
print("Original DataFrame:")
print(titanic.head())

# Encode the 'sex' column: 0 for male, 1 for female
titanic['sex'] = titanic['sex'].map({'male': 0, 'female': 1})

# Fill missing 'embarked' values with 'S' (Southampton, the most common port).
# Assigning the result back avoids a chained in-place fillna, which newer pandas versions warn about.
titanic['embarked'] = titanic['embarked'].fillna('S')

# Display the modified DataFrame
print("\nDataFrame after filling values:")
titanic.head()
Original DataFrame:
survived pclass sex age sibsp parch fare embarked class \
0 0 3 male 22.0 1 0 7.2500 S Third
1 1 1 female 38.0 1 0 71.2833 C First
2 1 3 female 26.0 0 0 7.9250 S Third
3 1 1 female 35.0 1 0 53.1000 S First
4 0 3 male 35.0 0 0 8.0500 S Third
who adult_male deck embark_town alive alone
0 man True NaN Southampton no False
1 woman False C Cherbourg yes False
2 woman False NaN Southampton yes True
3 woman False C Southampton yes False
4 man True NaN Southampton no True
DataFrame after filling values:
Out[4]:
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 0 | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | 1 | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | 1 | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | 1 | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | 0 | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
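One thing worth checking after a `map` call is whether any label was missing from the dictionary, because `Series.map` silently turns unmapped values into `NaN`. The sketch below uses a small hand-made Series (not the Titanic data) with a deliberately mis-capitalised label; the variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample with one inconsistently capitalised label
sex = pd.Series(["male", "female", "Female", "male"])

# Labels that are not keys of the dictionary become NaN
codes = sex.map({"male": 0, "female": 1})

# List the original labels that failed to map
unmapped = sex[codes.isna()].unique().tolist()
print("Unmapped labels:", unmapped)

# Normalising the case first lets every row map cleanly
normalized_codes = sex.str.lower().map({"male": 0, "female": 1})
print(normalized_codes.tolist())
```

In the Titanic dataset the `sex` column only contains `male` and `female`, so the mapping in the cell above leaves no gaps; the check matters more with hand-entered data.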
In [8]:
import pandas as pd
import seaborn as sns

# Load the Titanic dataset from Seaborn
titanic = sns.load_dataset('titanic')

# Display total null values
print("Total null values before filling:")
print(titanic.isnull().sum())

# Fill null values in the numeric columns with the column mean.
# numeric_only=True skips text and categorical columns, which have no mean.
titanic = titanic.fillna(titanic.mean(numeric_only=True))

# Display DataFrame after filling null values with mean
print("\nDataFrame after filling null values with mean:")
titanic.head()
Total null values before filling:
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64

DataFrame after filling null values with mean:
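Note: calling `titanic.mean()` without restricting it to numeric columns emits a FutureWarning on pandas 1.x ("Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction."). On pandas 2.0 and later the same call raises a TypeError, so pass `numeric_only=True` or select the numeric columns first. Only the numeric columns are filled; `deck`, `embarked` and `embark_town` keep their missing values.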
Out[8]:
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
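Mean imputation only makes sense for numeric columns. A common companion step, shown here as a minimal sketch on a small hand-made DataFrame (the values are made up for illustration, not taken from the Titanic data), is to fill text columns with their most frequent value instead.

```python
import pandas as pd

# Hypothetical sample with gaps in numeric and text columns
df = pd.DataFrame({
    "age": [22.0, None, 26.0, 36.0],
    "fare": [7.25, 71.28, None, 8.05],
    "embark_town": ["Southampton", "Cherbourg", None, "Southampton"],
})

# Numeric columns: fill each gap with that column's mean
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Remaining columns: fill each gap with the most frequent value (the mode)
for col in df.columns.difference(numeric_cols):
    df[col] = df[col].fillna(df[col].mode()[0])

print(df)
print("Remaining nulls:", int(df.isnull().sum().sum()))
```

Whether the mean or the mode is a sensible fill value depends on the column; for a column such as `deck`, where most values are missing, dropping the column may be the better choice.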
In [9]:
import pandas as pd
import seaborn as sns

# Load the dataset from Seaborn
iris = sns.load_dataset('iris')

# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(iris.head())

# Keep only the numeric columns; 'species' is text and has no mean or correlation
numeric = iris.select_dtypes(include='number')

# Calculate mean
print("\nMean values:")
print(numeric.mean())

# Calculate median
print("\nMedian values:")
print(numeric.median())

# Calculate mode (defined for text columns too, so use the full DataFrame)
print("\nMode values:")
print(iris.mode())

# Calculate correlation
print("\nCorrelation matrix:")
print(numeric.corr())

# Calculate covariance
print("\nCovariance matrix:")
print(numeric.cov())

# Calculate standard deviation
print("\nStandard deviation values:")
print(numeric.std())
First few rows of the dataset:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
Mean values:
sepal_length 5.843333
sepal_width 3.057333
petal_length 3.758000
petal_width 1.199333
dtype: float64
Median values:
sepal_length 5.80
sepal_width 3.00
petal_length 4.35
petal_width 1.30
dtype: float64
Mode values:
sepal_length sepal_width petal_length petal_width species
0 5.0 3.0 1.4 0.2 setosa
1 NaN NaN 1.5 NaN versicolor
2 NaN NaN NaN NaN virginica
Correlation matrix:
sepal_length sepal_width petal_length petal_width
sepal_length 1.000000 -0.117570 0.871754 0.817941
sepal_width -0.117570 1.000000 -0.428440 -0.366126
petal_length 0.871754 -0.428440 1.000000 0.962865
petal_width 0.817941 -0.366126 0.962865 1.000000
Covariance matrix:
sepal_length sepal_width petal_length petal_width
sepal_length 0.685694 -0.042434 1.274315 0.516271
sepal_width -0.042434 0.189979 -0.329656 -0.121639
petal_length 1.274315 -0.329656 3.116278 1.295609
petal_width 0.516271 -0.121639 1.295609 0.581006
Standard deviation values:
sepal_length 0.828066
sepal_width 0.435866
petal_length 1.765298
petal_width 0.762238
dtype: float64
Note: on pandas 1.x, calling `iris.mean()`, `iris.median()` and `iris.std()` on the full DataFrame emits a FutureWarning for each call ("Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction."). The text column `species` is the cause. On pandas 2.0 and later these calls raise an error instead, so select the numeric columns before calling the reduction.
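The correlation, covariance and standard deviation outputs above are tied together by the Pearson formula: the correlation of two columns is their covariance divided by the product of their standard deviations. As a quick sanity check, the sketch below plugs in the printed values for `petal_length` and `petal_width`; the variable names are illustrative.

```python
import math

# Values copied from the covariance and standard deviation outputs above
cov_petal = 1.295609          # cov(petal_length, petal_width)
std_petal_length = 1.765298
std_petal_width = 0.762238

# Pearson correlation = covariance / (std_x * std_y)
r = cov_petal / (std_petal_length * std_petal_width)
print(f"correlation: {r:.4f}")  # should agree with the 0.962865 entry in the correlation matrix

# The diagonal of the covariance matrix holds the variances,
# so its square root should reproduce the standard deviation
var_petal_length = 3.116278
print(f"sqrt(variance): {math.sqrt(var_petal_length):.4f}")
```

The same relationship explains why the diagonal of the correlation matrix is exactly 1: each variance divided by the squared standard deviation of the same column.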
In [11]:
import numpy as np

# Define two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Addition (element-wise)
addition_result = arr1 + arr2
print("Addition result:", addition_result)

# Multiplication (element-wise)
multiplication_result = arr1 * arr2
print("Multiplication result:", multiplication_result)

# Product of array elements
array_product = np.prod(arr1)
print("Product of array elements:", array_product)
Addition result: [5 7 9]
Multiplication result: [ 4 10 18]
Product of array elements: 6
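Note that `arr1 * arr2` multiplies element by element and returns an array. It is easy to confuse this with the dot product, which sums those element-wise products into a single number. A short sketch with the same two arrays shows the difference.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Element-wise product: one result per position
elementwise = arr1 * arr2

# Dot product: the sum of the element-wise products (4 + 10 + 18)
dot_product = np.dot(arr1, arr2)

print("Element-wise:", elementwise)
print("Dot product:", dot_product)
```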
In [12]:
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset directly from Seaborn (here, the 'iris' dataset)
iris = sns.load_dataset('iris')

# Draw a boxplot for each numerical column
sns.boxplot(data=iris)

# Show the plot
plt.show()
Here is a more detailed breakdown of the box plot:
- The top whisker extends to the largest data point within 1.5 × IQR above the upper quartile.
- The upper quartile is the 75th percentile, which means that 75% of the data points fall below this value.
- The box represents the middle 50% of the data points, or the interquartile range (IQR); the line inside the box marks the median.
- The lower quartile is the 25th percentile, which means that 25% of the data points fall below this value.
- The bottom whisker extends to the smallest data point within 1.5 × IQR below the lower quartile.
- The outliers are any data points that fall outside the whiskers.
- Box plots are a useful way to quickly visualize the distribution of data, including the center, spread, and outliers. They can be used to compare data sets or to identify patterns in data.
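The quantities in the list above can be computed directly with NumPy. The following is a minimal sketch on a small made-up sample (not the iris data), using the 1.5 × IQR rule described above; the variable names are illustrative.

```python
import numpy as np

# Hypothetical sample with one obviously extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences: the furthest a whisker is allowed to reach
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
bottom_whisker = inside.min()
top_whisker = inside.max()

# Anything beyond the fences is drawn as an outlier
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("Q1:", q1, "median:", median, "Q3:", q3, "IQR:", iqr)
print("Whiskers:", bottom_whisker, "to", top_whisker)
print("Outliers:", outliers)
```

Here the top whisker ends at 9 rather than at the upper fence, because whiskers are drawn to actual data points, and 30 is plotted separately as an outlier.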