Working with Data Frame Using Basic Libraries


Modules, Packages and Libraries in Python

What are modules in Python?


A module is a file that contains Python definitions and statements. The module name is the file name without the .py suffix. Modules can contain functions, classes, and other executable code.
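As a small illustration of the definition above, the sketch below imports the standard math module and lists the names it defines. The variable public_names is a hypothetical helper name chosen for this example only.

```python
import math

# Every imported module is an object; its __name__ matches the module name.
print(math.__name__)

# dir() lists the definitions inside the module; skip private names.
public_names = [name for name in dir(math) if not name.startswith("_")]
print("sqrt" in public_names)

# Definitions are accessed with the module_name.definition syntax.
print(math.sqrt(15))
```

The same module_name.definition pattern applies to every module type described in this post.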

Python has three types of modules:

1.    Built-in Modules:

Built-in modules are part of the standard library. You can use them directly in your code without installing any additional packages.

For example: datetime, math, random, os, etc.

Example:

import datetime

time=datetime.datetime.now()

print(time)

Output:

2023-04-05 12:51:24.704548

 

Another example,

import math

s_root=math.sqrt(15)

s_root

Output:

3.872983346207417

 

2.    Third Party Modules:

Third-party modules are created by other developers. You cannot use them directly in your code; first install them with a package manager such as pip.

For example:

pip install pandas numpy matplotlib

Then import the installed modules in your code:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

 

Some commonly used third-party modules are pandas, numpy, matplotlib, and tensorflow.

 

3.    User Defined Modules:

User-defined modules are modules that you create yourself and can reuse in different programs.

The following example creates a user-defined module:

def greetings(name):
    # Note the trailing space so the name is separated from the greeting
    print("Hello friend " + name)

Save this code in a file named my_module.py, then import the module in another Python program:

import my_module

my_module.greetings("Manisha")

Output:

Hello friend Manisha

This program imports my_module and then calls its greetings function.
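The save-then-import workflow above can also be tried in a single script. The following is a minimal sketch, not the only way to do it: it writes my_module.py into a temporary folder (standing in for saving the file by hand) and, as a small variation, the greetings function returns the text instead of printing it so the result can be checked.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write the module source into a temporary folder (assumption: this
# replaces manually saving my_module.py next to your program).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_module.py").write_text(
    "def greetings(name):\n"
    "    return 'Hello friend ' + name\n"
)

# Make the folder importable, then import the module by its file name
# without the .py suffix.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
my_module = importlib.import_module("my_module")

message = my_module.greetings("Manisha")
print(message)  # Hello friend Manisha
```

In everyday use a plain import my_module is enough, provided my_module.py sits in the same folder as the program that imports it.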

Working With DataFrame:

import pandas as pd

import numpy as np

import seaborn as sns

import matplotlib.pyplot as plt

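# Create a list of lists: [subject, internal marks, external marks]
student_data = [['C', 23, 69],
                ['C++', 21, 65],
                ['Python', 20, 66],
                ['PHP', 17, 70],
                ['SE', 19, 64],
                ['HTML', 29, 69],
                ['DBMS', 16, 56],
                ['COA', 19, 65],
                ['BC', 21, 55],
                ['MIS', 27, 67],
                ['VB', 11, 26],
                ['RDBMS', 10, 22],
                ['ML', 9, 21]]

# Create a DataFrame from the list of lists
df = pd.DataFrame(student_data, columns=['Subject Name', 'Internal Marks', 'External Marks'])
print(df)

    Subject Name  Internal Marks  External Marks
0              C              23              69
1            C++              21              65
2         Python              20              66
3            PHP              17              70
4             SE              19              64
5           HTML              29              69
6           DBMS              16              56
7            COA              19              65
8             BC              21              55
9            MIS              27              67
10            VB              11              26
11         RDBMS              10              22
12            ML               9              21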

 

# Create another column, Total Marks
df["Total Marks"] = df["Internal Marks"] + df["External Marks"]
df

    Subject Name  Internal Marks  External Marks  Total Marks
0              C              23              69           92
1            C++              21              65           86
2         Python              20              66           86
3            PHP              17              70           87
4             SE              19              64           83
5           HTML              29              69           98
6           DBMS              16              56           72
7            COA              19              65           84
8             BC              21              55           76
9            MIS              27              67           94
10            VB              11              26           37
11         RDBMS              10              22           32
12            ML               9              21           30

 

# Create the Percentage column (the formula divides by a maximum total of 100,
# so here the percentage equals the total marks)
df["Percentage"] = df["Total Marks"] * 100 / 100
df

    Subject Name  Internal Marks  External Marks  Total Marks  Percentage
0              C              23              69           92        92.0
1            C++              21              65           86        86.0
2         Python              20              66           86        86.0
3            PHP              17              70           87        87.0
4             SE              19              64           83        83.0
5           HTML              29              69           98        98.0
6           DBMS              16              56           72        72.0
7            COA              19              65           84        84.0
8             BC              21              55           76        76.0
9            MIS              27              67           94        94.0
10            VB              11              26           37        37.0
11         RDBMS              10              22           32        32.0
12            ML               9              21           30        30.0
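The second 100 in the percentage formula is the maximum total marks. The sketch below makes that explicit with named constants. The maximums of 30 internal and 70 external marks are assumptions for illustration (the post does not state the split), and sample is a hypothetical three-row subset of the data.

```python
import pandas as pd

MAX_INTERNAL = 30  # assumed maximum internal marks (not stated in the post)
MAX_EXTERNAL = 70  # assumed maximum external marks (not stated in the post)
MAX_TOTAL = MAX_INTERNAL + MAX_EXTERNAL

# Three rows taken from the table above
sample = pd.DataFrame({
    "Subject Name": ["C", "C++", "ML"],
    "Internal Marks": [23, 21, 9],
    "External Marks": [69, 65, 21],
})

sample["Total Marks"] = sample["Internal Marks"] + sample["External Marks"]
# Percentage = marks obtained * 100 / maximum marks
sample["Percentage"] = sample["Total Marks"] * 100 / MAX_TOTAL
print(sample)
```

With a different marking scheme, only the two constants would need to change.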

 

# Create an empty Grade column (filled in the next step)
df["Grade"] = " "
df

    Subject Name  Internal Marks  External Marks  Total Marks  Percentage  Grade
0              C              23              69           92        92.0
1            C++              21              65           86        86.0
2         Python              20              66           86        86.0
3            PHP              17              70           87        87.0
4             SE              19              64           83        83.0
5           HTML              29              69           98        98.0
6           DBMS              16              56           72        72.0
7            COA              19              65           84        84.0
8             BC              21              55           76        76.0
9            MIS              27              67           94        94.0
10            VB              11              26           37        37.0
11         RDBMS              10              22           32        32.0
12            ML               9              21           30        30.0

 

# Fill the Grade column based on the Percentage value

df.loc[df['Percentage'] >= 90, 'Grade'] = 'Outstanding'

df.loc[(df['Percentage'] >= 80) & (df['Percentage'] < 90), 'Grade'] = 'Excellent'

df.loc[(df['Percentage'] >= 70) & (df['Percentage'] < 80), 'Grade'] = 'Distinction'

df.loc[(df['Percentage'] >= 60) & (df['Percentage'] < 70), 'Grade'] = 'First Class'

df.loc[(df['Percentage'] >= 50) & (df['Percentage'] < 60), 'Grade'] = 'Second Class'

df.loc[(df['Percentage'] >= 40) & (df['Percentage'] < 50), 'Grade'] = 'Pass'

df.loc[df['Percentage'] < 40, 'Grade'] = 'Fail'

df

    Subject Name  Internal Marks  External Marks  Total Marks  Percentage        Grade
0              C              23              69           92        92.0  Outstanding
1            C++              21              65           86        86.0    Excellent
2         Python              20              66           86        86.0    Excellent
3            PHP              17              70           87        87.0    Excellent
4             SE              19              64           83        83.0    Excellent
5           HTML              29              69           98        98.0  Outstanding
6           DBMS              16              56           72        72.0  Distinction
7            COA              19              65           84        84.0    Excellent
8             BC              21              55           76        76.0  Distinction
9            MIS              27              67           94        94.0  Outstanding
10            VB              11              26           37        37.0         Fail
11         RDBMS              10              22           32        32.0         Fail
12            ML               9              21           30        30.0         Fail
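The same grade boundaries can also be expressed as bins. The following is an alternative sketch using pd.cut, not the method used in this post. The first four percentages come from the table above; the values 90.0 and 40.0 are hypothetical additions to check the boundaries.

```python
import pandas as pd

# 92, 86, 72, 37 appear in the table; 90 and 40 are made-up boundary checks
percentages = pd.Series([92.0, 86.0, 72.0, 37.0, 90.0, 40.0])

# Bin edges mirror the df.loc conditions: each bin is [lower, upper)
edges = [float("-inf"), 40, 50, 60, 70, 80, 90, float("inf")]
labels = ["Fail", "Pass", "Second Class", "First Class",
          "Distinction", "Excellent", "Outstanding"]

# right=False makes the lower edge inclusive, matching the >= comparisons
grades = pd.cut(percentages, bins=edges, labels=labels, right=False).astype(str)
print(grades.tolist())
# ['Outstanding', 'Excellent', 'Distinction', 'Fail', 'Outstanding', 'Pass']
```

Keeping the edges and labels in two lists makes it easier to see that no percentage range is left without a grade.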

 

# Applying Basic DataFrame Functions

1.     Display the top 5 rows of the DataFrame

df.head()

   Subject Name  Internal Marks  External Marks  Total Marks  Percentage        Grade
0             C              23              69           92        92.0  Outstanding
1           C++              21              65           86        86.0    Excellent
2        Python              20              66           86        86.0    Excellent
3           PHP              17              70           87        87.0    Excellent
4            SE              19              64           83        83.0    Excellent

 

2.     Display the bottom 5 rows of the DataFrame

df.tail()

    Subject Name  Internal Marks  External Marks  Total Marks  Percentage        Grade
8             BC              21              55           76        76.0  Distinction
9            MIS              27              67           94        94.0  Outstanding
10            VB              11              26           37        37.0         Fail
11         RDBMS              10              22           32        32.0         Fail
12            ML               9              21           30        30.0         Fail

 

3.     Get overall information of your dataframe

df.info()

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 13 entries, 0 to 12

Data columns (total 6 columns):

 #   Column          Non-Null Count  Dtype 

---  ------          --------------  ----- 

 0   Subject Name    13 non-null     object

 1   Internal Marks  13 non-null     int64 

 2   External Marks  13 non-null     int64 

 3   Total Marks     13 non-null     int64 

 4   Percentage      13 non-null     float64

 5   Grade           13 non-null     object

dtypes: float64(1), int64(3), object(2)

memory usage: 752.0+ bytes

 

4.        Check list of columns in your data frame

df.columns

Index(['Subject Name', 'Internal Marks', 'External Marks', 'Total Marks',

       'Percentage', 'Grade'],

      dtype='object')

5.        Check Null values from your data frame

df.isna()

    Subject Name  Internal Marks  External Marks  Total Marks  Percentage  Grade
0          False           False           False        False       False  False
1          False           False           False        False       False  False
2          False           False           False        False       False  False
3          False           False           False        False       False  False
4          False           False           False        False       False  False
5          False           False           False        False       False  False
6          False           False           False        False       False  False
7          False           False           False        False       False  False
8          False           False           False        False       False  False
9          False           False           False        False       False  False
10         False           False           False        False       False  False
11         False           False           False        False       False  False
12         False           False           False        False       False  False

# Get the count of null values in each column of the DataFrame

df.isna().sum()

Subject Name      0

Internal Marks    0

External Marks    0

Total Marks       0

Percentage        0

Grade             0

dtype: int64
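Because the marks table has no missing values, every count above is zero. The sketch below uses a small hypothetical DataFrame named demo, with two values deliberately set to NaN, to show what the same call reports when data is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two marks are intentionally missing
demo = pd.DataFrame({
    "Subject Name": ["C", "C++", "Python"],
    "Internal Marks": [23, np.nan, 20],
    "External Marks": [69, 65, np.nan],
})

# Null count per column
null_counts = demo.isna().sum()
print(null_counts)

# Null count for the whole DataFrame
total_nulls = int(null_counts.sum())
print(total_nulls)  # 2
```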

 

6.     Get Basic Statistics of  data frame

df.describe()

       Internal Marks  External Marks  Total Marks  Percentage
count       13.000000       13.000000    13.000000   13.000000
mean        18.615385       55.000000    73.615385   73.615385
std          6.090135       18.819316    24.174844   24.174844
min          9.000000       21.000000    30.000000   30.000000
25%         16.000000       55.000000    72.000000   72.000000
50%         19.000000       65.000000    84.000000   84.000000
75%         21.000000       67.000000    87.000000   87.000000
max         29.000000       70.000000    98.000000   98.000000

 

7.     Calculate the mean

df.mean(numeric_only=True)  # text columns cannot be averaged, so use only the numeric columns

Internal Marks    18.615385

External Marks    55.000000

Total Marks       73.615385

Percentage        73.615385

dtype: float64

 

8.       Calculate median

df.median(numeric_only=True)  # use only the numeric columns

Internal Marks    19.0

External Marks    65.0

Total Marks       84.0

Percentage        84.0

dtype: float64

 

 

9.      Calculate mode

df.mode()

   Subject Name  Internal Marks  External Marks  Total Marks  Percentage      Grade
0            BC            19.0            65.0         86.0        86.0  Excellent
1             C            21.0            69.0          NaN         NaN        NaN
2           C++             NaN             NaN          NaN         NaN        NaN
3           COA             NaN             NaN          NaN         NaN        NaN
4          DBMS             NaN             NaN          NaN         NaN        NaN
5          HTML             NaN             NaN          NaN         NaN        NaN
6           MIS             NaN             NaN          NaN         NaN        NaN
7            ML             NaN             NaN          NaN         NaN        NaN
8           PHP             NaN             NaN          NaN         NaN        NaN
9        Python             NaN             NaN          NaN         NaN        NaN
10        RDBMS             NaN             NaN          NaN         NaN        NaN
11           SE             NaN             NaN          NaN         NaN        NaN
12           VB             NaN             NaN          NaN         NaN        NaN

 

10.     Calculate standard Deviation

df.std(numeric_only=True)  # use only the numeric columns

Internal Marks     6.090135

External Marks    18.819316

Total Marks       24.174844

Percentage        24.174844

dtype: float64
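The mean, median, and standard deviation can also be collected in one table. This is a minimal sketch on a hypothetical four-row subset named marks (the rows are taken from the data above); it selects the numeric columns first and then applies agg.

```python
import pandas as pd

# Four rows taken from the marks table
marks = pd.DataFrame({
    "Subject Name": ["C", "C++", "Python", "PHP"],
    "Internal Marks": [23, 21, 20, 17],
    "External Marks": [69, 65, 66, 70],
})

# Keep only numeric columns, then compute several statistics at once
numeric = marks.select_dtypes(include="number")
summary = numeric.agg(["mean", "median", "std"])
print(summary)
```

Selecting the numeric columns explicitly avoids errors from text columns such as Subject Name.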

 

Data Visualization

Basic Libraries

import matplotlib.pyplot as plt

import seaborn as sns

 

1.     Pie Chart

# Create a pie chart

grade = df['Grade'].value_counts()

plt.pie(grade, labels=grade.index, autopct='%2.3f%%')

plt.title('Student Grades')

plt.show()
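The numbers behind the pie chart can be checked without drawing it. The sketch below rebuilds the Grade column from the table shown earlier as a standalone Series and computes the count and percentage share of each grade; the autopct='%2.3f%%' argument prints these shares with three decimal places.

```python
import pandas as pd

# Grade values copied from the graded table, in row order
grades = pd.Series(
    ["Outstanding", "Excellent", "Excellent", "Excellent", "Excellent",
     "Outstanding", "Distinction", "Excellent", "Distinction", "Outstanding",
     "Fail", "Fail", "Fail"],
    name="Grade",
)

# Number of subjects per grade (the wedge sizes passed to plt.pie)
counts = grades.value_counts()

# Share of each grade in percent (what autopct prints on each wedge)
shares = grades.value_counts(normalize=True) * 100

for label in counts.index:
    print(f"{label}: {counts[label]} subjects, {shares[label]:.3f}%")
```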

 



 

2.     Histogram

 

df.hist(figsize=(10, 10))

plt.show()

 



 

 

3.     Distribution Plot

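# distplot is deprecated in recent seaborn releases; histplot with kde=True
# draws a comparable histogram with a density curve
sns.histplot(df['Percentage'], kde=True, color='red')

plt.show()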

 



 

4.     Count Plot

sns.countplot(x="Percentage", data=df)  # create the countplot with the specified column and dataframe



5.      Box Plot

sns.boxplot(data=df)  # create the box plot for the entire dataframe

 



 

 

6.     Line Graph

plt.plot(df["Subject Name"], df["Percentage"])

plt.xlabel("Subject Name")  # set the x-axis label to the name of the x column

plt.ylabel("Percentage")  # set the y-axis label to the name of the y column

plt.title('Student Percentage')  # set the title of the plot

plt.show()  # display the plot



7.     Scatter Plot

plt.scatter(df["Subject Name"], df["Percentage"])

plt.xlabel("Subject Name")  # set the x-axis label to the name of the x column

plt.ylabel("Percentage")  # set the y-axis label to the name of the y column

plt.title('Scatter Plot')  # set the title of the plot

plt.show()  # display the plot

 



 

 

 

 

 

 

