BBA Analytics Using Python Practice Notebook

1  Import the Titanic dataset and display the DataFrame.

Fill the sex column with male as 0 and female as 1, and the embarked column as “S”, “C”, and “Q”. Interpret the result.

In [4]:
executed in 70ms, finished 21:44:35 2024-04-19
Original DataFrame:
   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     who  adult_male deck  embark_town alive  alone  
0    man        True  NaN  Southampton    no  False  
1  woman       False    C    Cherbourg   yes  False  
2  woman       False  NaN  Southampton   yes   True  
3  woman       False    C  Southampton   yes  False  
4    man        True  NaN  Southampton    no   True  

DataFrame after filling values:
Out[4]:
   survived  pclass  sex   age  sibsp  parch     fare embarked  class  \
0         0       3    0  22.0      1      0   7.2500        S  Third
1         1       1    1  38.0      1      0  71.2833        C  First
2         1       3    1  26.0      0      0   7.9250        S  Third
3         1       1    1  35.0      1      0  53.1000        S  First
4         0       3    0  35.0      0      0   8.0500        S  Third

     who  adult_male deck  embark_town alive  alone
0    man        True  NaN  Southampton    no  False
1  woman       False    C    Cherbourg   yes  False
2  woman       False  NaN  Southampton   yes   True
3  woman       False    C  Southampton   yes  False
4    man        True  NaN  Southampton    no   True
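
The cell's code is not shown on this page. A minimal sketch of one way to get the encoded result, assuming pandas and using only five rows copied from the output above (the full dataset is assumed to come from something like `seaborn.load_dataset("titanic")`):

```python
import pandas as pd

# Five rows copied from the output above. The full dataset is an assumption:
# it could be loaded with seaborn.load_dataset("titanic") instead.
titanic = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0],
    "pclass": [3, 1, 3, 1, 3],
    "sex": ["male", "female", "female", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "embarked": ["S", "C", "S", "S", "S"],
})

# Replace the text labels with the numeric codes the task asks for.
titanic["sex"] = titanic["sex"].map({"male": 0, "female": 1})

# embarked stays as letters; check that only S, C, or Q appear.
embarked_ok = titanic["embarked"].isin(["S", "C", "Q"]).all()

# With sex coded 0/1, the column mean is the share of female passengers.
female_share = titanic["sex"].mean()

print(titanic)
print("Embarked values valid:", embarked_ok)
print("Share of female passengers:", female_share)
```

For interpretation: once sex is numeric it can be used in `mean()`, `corr()`, and similar functions, while embarked remains a text label for the port of embarkation.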

2  Import the Titanic dataset. Find the null values and display the total number of null values in each column.

If there are null values, fill them with the column mean. Display the DataFrame after filling the null values.

In [8]:
executed in 83ms, finished 21:47:05 2024-04-19
Total null values before filling:
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64

DataFrame after filling null values with mean:
C:\Users\Admin\AppData\Local\Temp\ipykernel_25652\2346204559.py:12: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.
  titanic.fillna(titanic.mean(), inplace=True)
Out[8]:
   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third
1         1       1  female  38.0      1      0  71.2833        C  First
2         1       3  female  26.0      0      0   7.9250        S  Third
3         1       1  female  35.0      1      0  53.1000        S  First
4         0       3    male  35.0      0      0   8.0500        S  Third

     who  adult_male deck  embark_town alive  alone
0    man        True  NaN  Southampton    no  False
1  woman       False    C    Cherbourg   yes  False
2  woman       False  NaN  Southampton   yes   True
3  woman       False    C  Southampton   yes  False
4    man        True  NaN  Southampton    no   True
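
The FutureWarning in the output above comes from calling `titanic.mean()` on a frame that also holds text columns. A hedged sketch of the same fill that avoids the warning by passing `numeric_only=True`; the small frame below is a toy example with some ages blanked out for illustration, not the real data:

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only (values are not the real dataset).
titanic = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "deck": [np.nan, "C", np.nan, "C", np.nan],
})

# Count the missing values in each column.
null_counts = titanic.isnull().sum()
print("Total null values before filling:")
print(null_counts)

# Compute means for numeric columns only, then fill column by column.
means = titanic.mean(numeric_only=True)
filled = titanic.fillna(means)
print("DataFrame after filling null values with mean:")
print(filled)
```

Only numeric columns have a mean, so text columns such as deck keep their NaN values. That matches the output above, where deck still shows NaN after the fill.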

3  Import any dataset and apply the mean, median, mode, corr, cov, and std functions.

In [9]:
executed in 8.32s, finished 21:49:22 2024-04-19
First few rows of the dataset:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Mean values:
sepal_length    5.843333
sepal_width     3.057333
petal_length    3.758000
petal_width     1.199333
dtype: float64

Median values:
sepal_length    5.80
sepal_width     3.00
petal_length    4.35
petal_width     1.30
dtype: float64

Mode values:
   sepal_length  sepal_width  petal_length  petal_width     species
0           5.0          3.0           1.4          0.2      setosa
1           NaN          NaN           1.5          NaN  versicolor
2           NaN          NaN           NaN          NaN   virginica

Correlation matrix:
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      1.000000    -0.117570      0.871754     0.817941
sepal_width      -0.117570     1.000000     -0.428440    -0.366126
petal_length      0.871754    -0.428440      1.000000     0.962865
petal_width       0.817941    -0.366126      0.962865     1.000000

Covariance matrix:
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      0.685694    -0.042434      1.274315     0.516271
sepal_width      -0.042434     0.189979     -0.329656    -0.121639
petal_length      1.274315    -0.329656      3.116278     1.295609
petal_width       0.516271    -0.121639      1.295609     0.581006

Standard deviation values:
sepal_length    0.828066
sepal_width     0.435866
petal_length    1.765298
petal_width     0.762238
dtype: float64
C:\Users\Admin\AppData\Local\Temp\ipykernel_25652\4261197520.py:13: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.
  print(iris.mean())
C:\Users\Admin\AppData\Local\Temp\ipykernel_25652\4261197520.py:17: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.
  print(iris.median())
C:\Users\Admin\AppData\Local\Temp\ipykernel_25652\4261197520.py:33: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.
  print(iris.std())
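
The three warnings above are raised because species is a text column. A minimal sketch, assuming pandas, that selects the numeric columns first and then applies each function. It uses the five iris rows printed above (without petal_width, which is constant in those rows), so the numbers differ from the full-dataset output:

```python
import pandas as pd

# Five rows copied from the printed head of the dataset above.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "species": ["setosa"] * 5,
})

# Keep only numeric columns so the reductions do not touch species.
numeric = iris.select_dtypes(include="number")

mean = numeric.mean()
median = numeric.median()
mode = iris.mode()      # mode also works on text columns
corr = numeric.corr()   # Pearson correlation by default
cov = numeric.cov()
std = numeric.std()     # sample standard deviation (ddof=1)

for name, result in [("Mean", mean), ("Median", median), ("Mode", mode),
                     ("Correlation", corr), ("Covariance", cov),
                     ("Standard deviation", std)]:
    print(f"{name}:\n{result}\n")
```

The diagonal of the covariance matrix holds each column's variance, which is the square of the value reported by `std()`.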

4  Import the array package and perform the following operations:

  1. Addition
  2. Multiplication
  3. Product of array elements
In [11]:
executed in 22ms, finished 21:50:27 2024-04-19
Addition result: [5 7 9]
Multiplication result: [ 4 10 18]
Product of array elements: 6
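
The code for this cell is missing from the page. The sketch below reproduces the printed results under two assumptions: that the "array package" is NumPy (the output format suggests it), and that the inputs were `[1, 2, 3]` and `[4, 5, 6]`, which are inferred from the results rather than shown in the source.

```python
import numpy as np

# Assumed inputs, inferred from the printed results above.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition and multiplication.
print("Addition result:", a + b)
print("Multiplication result:", a * b)

# Product of all elements of the first array: 1 * 2 * 3.
print("Product of array elements:", np.prod(a))
```

Note that `a * b` multiplies element by element; it is not a matrix or dot product.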

5  Use any dataset and draw a boxplot. Interpret the boxplot.

In [12]:
executed in 179ms, finished 21:52:15 2024-04-19

Here’s a more detailed breakdown of the box plot:

  • The top whisker extends to the largest data point within 1.5 × IQR above the upper quartile.
  • The upper quartile is the 75th percentile, which means that 75% of the data points fall below this value.
  • The box spans from the lower quartile to the upper quartile. Its height is the interquartile range (IQR), and it contains the middle 50% of the data points.
  • The lower quartile is the 25th percentile, which means that 25% of the data points fall below this value.
  • The bottom whisker extends to the smallest data point within 1.5 × IQR below the lower quartile.
  • The outliers are any data points that fall outside the whiskers.
  • Box plots are a useful way to quickly visualize the distribution of data, including the center, spread, and outliers. They can be used to compare data sets or to identify patterns in data.
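
The quantities in the list above can be computed directly. This is a minimal sketch using NumPy on a made-up sample (the values are hypothetical and chosen so that one point is an outlier); it is not the dataset used for the plot in this notebook.

```python
import numpy as np

# Hypothetical sample for illustration; 30 is far from the rest.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

# Quartiles and interquartile range (the height of the box).
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Fences: whiskers may not extend past these limits.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences.
whisker_low = data[data >= lower_fence].min()
whisker_high = data[data <= upper_fence].max()

# Points beyond the whiskers are drawn as outliers.
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Whiskers:", whisker_low, "to", whisker_high)
print("Outliers:", outliers)
```

To draw the plot itself, `matplotlib.pyplot.boxplot(data)` uses the same 1.5 × IQR rule for its whiskers by default.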
