Working with Data Frame Using Basic Libraries
Modules, Packages and Libraries in Python
What are modules in Python?
Modules are files that contain Python definitions and statements. The file name is the module name with the suffix .py added. Modules can contain functions, classes and other executable code.
In Python there are three types of modules:
1. Built-in Modules:
Built-in modules are part of the Python standard library. They can be used in code directly, without installing any additional packages. Examples are datetime, math, random and os.
Example,
import datetime
time=datetime.datetime.now()
print(time)
Output:
2023-04-05 12:51:24.704548
Another example,
import math
s_root=math.sqrt(15)
s_root
Output:
3.872983346207417
2. Third Party Modules:
Third party modules are created by other developers and are not shipped with Python. You cannot use them directly in your code; first you have to install them with a package manager such as pip. For example:
pip install pandas
Then you import the installed modules in your code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Some commonly used third party modules are pandas, numpy, matplotlib and tensorflow.
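Before importing a third party module it can be useful to check whether it is installed. A minimal sketch using find_spec from the standard library's importlib.util; the helper name is_installed is hypothetical and not part of the original example:

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if the import system can find a top-level module."""
    return find_spec(module_name) is not None


# math ships with Python, so it is always found.
print("math:", is_installed("math"))

# Third party modules are found only after "pip install <name>".
for name in ("pandas", "numpy", "matplotlib"):
    status = "installed" if is_installed(name) else "missing, install it with pip"
    print(f"{name}: {status}")
```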
3. User Defined Modules:
User defined modules are modules created by the user; they can be reused in different programs.
The following example creates a user defined module:
def greetings(name):
    print("Hello friend " + name)
Save this code in a file named my_module.py. The module can then be imported in another Python program:
import my_module
my_module.greetings("Manisha")
Output:
Hello friend Manisha
In this program my_module is imported and its greetings function is called to print the greeting.
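The save-then-import steps can be sketched in one runnable script. This is only an illustration: it writes my_module.py into a temporary folder and adds that folder to sys.path, which is an assumption made to keep the example self-contained (normally the file just sits next to your program). Here greetings returns the text instead of printing it so that the result can be reused.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of the user defined module, written to disk for this demo.
MODULE_SOURCE = '''
def greetings(name):
    """Return the greeting so the caller can print or reuse it."""
    return "Hello friend " + name
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "my_module.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)        # make the folder importable
    importlib.invalidate_caches()      # let the import system see the new file
    try:
        my_module = importlib.import_module("my_module")
        message = my_module.greetings("Manisha")
    finally:
        sys.path.remove(tmp_dir)

print(message)
```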
Working With DataFrame:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# create a list of lists
Student_data = [['C' ,23,69 ],
['C++',21, 65 ],
['Python',20, 66 ],
['PHP', 17,70 ],
['SE', 19,64 ],
['HTML',29, 69 ]]
# create a DataFrame from the list of lists
df = pd.DataFrame(Student_data,
columns=['Subject Name', 'Internal Marks', 'External Marks'])
print(df)
  Subject Name  Internal Marks  External Marks
0            C              23              69
1          C++              21              65
2       Python              20              66
3          PHP              17              70
4           SE              19              64
5         HTML              29              69
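The tables later in this post contain 13 subjects, while the list above defines only six. The step that adds the remaining rows is not shown. A sketch of one way to do it, assuming pd.concat was used (the marks are copied from the later output tables; the variable names more_data and more_rows are hypothetical):

```python
import pandas as pd

# The six subjects defined in the list above.
student_data = [['C', 23, 69], ['C++', 21, 65], ['Python', 20, 66],
                ['PHP', 17, 70], ['SE', 19, 64], ['HTML', 29, 69]]
columns = ['Subject Name', 'Internal Marks', 'External Marks']
df = pd.DataFrame(student_data, columns=columns)

# Rows 6 to 12, copied from the later output tables.
more_data = [['DBMS', 16, 56], ['COA', 19, 65], ['BC', 21, 55], ['MIS', 27, 67],
             ['VB', 11, 26], ['RDBMS', 10, 22], ['ML', 9, 21]]
more_rows = pd.DataFrame(more_data, columns=columns)

# Assumption: pd.concat is one way to append rows;
# ignore_index=True renumbers the index from 0 to 12.
df = pd.concat([df, more_rows], ignore_index=True)
print(df.shape)
```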
# Create another column, Total Marks
df["Total Marks"] = df["Internal Marks"] + df["External Marks"]
df
Subject Name Internal Marks External Marks Total Marks
0 C 23 69 92
1 C++ 21 65 86
2 Python 20 66 86
3 PHP 17 70 87
4 SE 19 64 83
5 HTML 29 69 98
6 DBMS 16 56 72
7 COA 19 65 84
8 BC 21 55 76
9 MIS 27 67 94
10 VB 11 26 37
11 RDBMS 10 22 32
12 ML 9 21 30
# Creating Percentage Column (marks obtained * 100 / maximum marks, here 100)
df["Percentage"] = df["Total Marks"] * 100 / 100
df
Subject Name Internal Marks External Marks Total Marks Percentage
0 C 23 69 92 92.0
1 C++ 21 65 86 86.0
2 Python 20 66 86 86.0
3 PHP 17 70 87 87.0
4 SE 19 64 83 83.0
5 HTML 29 69 98 98.0
6 DBMS 16 56 72 72.0
7 COA 19 65 84 84.0
8 BC 21 55 76 76.0
9 MIS 27 67 94 94.0
10 VB 11 26 37 37.0
11 RDBMS 10 22 32 32.0
12 ML 9 21 30 30.0
# Creating Grade Column
df["Grade"] = " "
df
Subject Name Internal Marks External Marks Total Marks Percentage Grade
0 C 23 69 92 92.0
1 C++ 21 65 86 86.0
2 Python 20 66 86 86.0
3 PHP 17 70 87 87.0
4 SE 19 64 83 83.0
5 HTML 29 69 98 98.0
6 DBMS 16 56 72 72.0
7 COA 19 65 84 84.0
8 BC 21 55 76 76.0
9 MIS 27 67 94 94.0
10 VB 11 26 37 37.0
11 RDBMS 10 22 32 32.0
12 ML 9 21 30 30.0
# Filling Grades to Grade Column
df.loc[df['Percentage'] >= 90, 'Grade'] = 'Outstanding'
df.loc[(df['Percentage'] >= 80) & (df['Percentage'] < 90), 'Grade'] = 'Excellent'
df.loc[(df['Percentage'] >= 70) & (df['Percentage'] < 80), 'Grade'] = 'Distinction'
df.loc[(df['Percentage'] >= 60) & (df['Percentage'] < 70), 'Grade'] = 'First Class'
df.loc[(df['Percentage'] >= 50) & (df['Percentage'] < 60), 'Grade'] = 'Second Class'
df.loc[(df['Percentage'] >= 40) & (df['Percentage'] < 50), 'Grade'] = 'Pass'
df.loc[df['Percentage'] < 40, 'Grade'] = 'Fail'
df
Subject Name Internal Marks External Marks Total Marks Percentage Grade
0 C 23 69 92 92.0 Outstanding
1 C++ 21 65 86 86.0 Excellent
2 Python 20 66 86 86.0 Excellent
3 PHP 17 70 87 87.0 Excellent
4 SE 19 64 83 83.0 Excellent
5 HTML 29 69 98 98.0 Outstanding
6 DBMS 16 56 72 72.0 Distinction
7 COA 19 65 84 84.0 Excellent
8 BC 21 55 76 76.0 Distinction
9 MIS 27 67 94 94.0 Outstanding
10 VB 11 26 37 37.0 Fail
11 RDBMS 10 22 32 32.0 Fail
12 ML 9 21 30 30.0 Fail
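The seven df.loc assignments describe fixed percentage bands. As an alternative (not the method used in this post), the same bands can be sketched with pd.cut. The names edges and labels are hypothetical, and the percentages are copied from the table above:

```python
import numpy as np
import pandas as pd

# Percentages taken from the table above.
df = pd.DataFrame({"Percentage": [92.0, 86.0, 86.0, 87.0, 83.0, 98.0, 72.0,
                                  84.0, 76.0, 94.0, 37.0, 32.0, 30.0]})

# Band edges; right=False makes each band include its lower edge,
# e.g. [80, 90) is 'Excellent', matching the >= and < conditions.
edges = [-np.inf, 40, 50, 60, 70, 80, 90, np.inf]
labels = ["Fail", "Pass", "Second Class", "First Class",
          "Distinction", "Excellent", "Outstanding"]

df["Grade"] = pd.cut(df["Percentage"], bins=edges, labels=labels, right=False)
print(df["Grade"].value_counts())
```

The resulting Grade column is categorical, so the grades keep their order from Fail to Outstanding when sorted.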
# Applying Basic Functions of Data Frame
1. Display the top 5 rows of the DataFrame
df.head()
Subject Name Internal Marks External Marks Total Marks Percentage Grade
0 C 23 69 92 92.0 Outstanding
1 C++ 21 65 86 86.0 Excellent
2 Python 20 66 86 86.0 Excellent
3 PHP 17 70 87 87.0 Excellent
4 SE 19 64 83 83.0 Excellent
2. Display the bottom 5 rows of the DataFrame
df.tail()
Subject Name Internal Marks External Marks Total Marks Percentage Grade
8 BC 21 55 76 76.0 Distinction
9 MIS 27 67 94 94.0 Outstanding
10 VB 11 26 37 37.0 Fail
11 RDBMS 10 22 32 32.0 Fail
12 ML 9 21 30 30.0 Fail
3. Get overall information about your DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Subject Name    13 non-null     object
 1   Internal Marks  13 non-null     int64
 2   External Marks  13 non-null     int64
 3   Total Marks     13 non-null     int64
 4   Percentage      13 non-null     float64
 5   Grade           13 non-null     object
dtypes: float64(1), int64(3), object(2)
memory usage: 752.0+ bytes
4. Check the list of columns in your DataFrame
df.columns
Index(['Subject Name', 'Internal Marks', 'External Marks', 'Total Marks',
       'Percentage', 'Grade'],
      dtype='object')
5. Check for null values in your DataFrame
df.isna()
Subject Name Internal Marks External Marks Total Marks Percentage Grade
0 False False False False False False
1 False False False False False False
2 False False False False False False
3 False False False False False False
4 False False False False False False
5 False False False False False False
6 False False False False False False
7 False False False False False False
8 False False False False False False
9 False False False False False False
10 False False False False False False
11 False False False False False False
12 False False False False False False
# Get the total count of null values in each column
df.isna().sum()
Subject Name      0
Internal Marks    0
External Marks    0
Total Marks       0
Percentage        0
Grade             0
dtype: int64
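The student data has no missing values, so every count above is 0. To see what the same calls report when a value is missing, here is a small sketch on hypothetical data (the missing External Marks for SE is invented purely for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the External Marks of SE are missing.
marks = pd.DataFrame({
    "Subject Name": ["C", "PHP", "SE"],
    "Internal Marks": [23, 17, 19],
    "External Marks": [69, 70, np.nan],
})

null_counts = marks.isna().sum()                    # nulls per column
print(null_counts)

rows_with_nulls = marks[marks.isna().any(axis=1)]   # rows with at least one null
print(rows_with_nulls)
```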
6. Get basic statistics of the DataFrame
df.describe()
Internal Marks External Marks Total Marks Percentage
count 13.000000 13.000000 13.000000 13.000000
mean 18.615385 55.000000 73.615385 73.615385
std 6.090135 18.819316 24.174844 24.174844
min 9.000000 21.000000 30.000000 30.000000
25% 16.000000 55.000000 72.000000 72.000000
50% 19.000000 65.000000 84.000000 84.000000
75% 21.000000 67.000000 87.000000 87.000000
max 29.000000 70.000000 98.000000 98.000000
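By default describe() summarises only the numeric columns, which is why Subject Name and Grade are absent above. For a text column, calling describe() on the column itself reports count, unique, top and freq instead. A sketch with the grades copied from the earlier table:

```python
import pandas as pd

grades = pd.Series(
    ["Outstanding", "Excellent", "Excellent", "Excellent", "Excellent",
     "Outstanding", "Distinction", "Excellent", "Distinction", "Outstanding",
     "Fail", "Fail", "Fail"],
    name="Grade",
)

summary = grades.describe()   # count, unique, top, freq for text data
print(summary)
```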
7. Calculate the mean
Pass numeric_only=True so that the text columns (Subject Name and Grade) are skipped. Without it, older pandas versions print a FutureWarning and pandas 2.0 and later raise a TypeError.
df.mean(numeric_only=True)
Internal Marks    18.615385
External Marks    55.000000
Total Marks       73.615385
Percentage        73.615385
dtype: float64
8. Calculate the median
df.median(numeric_only=True)  # numeric_only=True skips the text columns
Internal Marks    19.0
External Marks    65.0
Total Marks       84.0
Percentage        84.0
dtype: float64
9. Calculate the mode
df.mode()
Subject Name Internal Marks External Marks Total Marks Percentage Grade
0 BC 19.0 65.0 86.0 86.0 Excellent
1 C 21.0 69.0 NaN NaN NaN
2 C++ NaN NaN NaN NaN NaN
3 COA NaN NaN NaN NaN NaN
4 DBMS NaN NaN NaN NaN NaN
5 HTML NaN NaN NaN NaN NaN
6 MIS NaN NaN NaN NaN NaN
7 ML NaN NaN NaN NaN NaN
8 PHP NaN NaN NaN NaN NaN
9 Python NaN NaN NaN NaN NaN
10 RDBMS NaN NaN NaN NaN NaN
11 SE NaN NaN NaN NaN NaN
12 VB NaN NaN NaN NaN NaN
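df.mode() lists every value that ties for most frequent in each column and pads the shorter columns with NaN. Every subject name occurs once, so all 13 names appear, while Grade has a single mode. Computing the mode one column at a time avoids the padding. A sketch using values copied from the tables above:

```python
import pandas as pd

df = pd.DataFrame({
    "Internal Marks": [23, 21, 20, 17, 19, 29, 16, 19, 21, 27, 11, 10, 9],
    "Grade": ["Outstanding", "Excellent", "Excellent", "Excellent", "Excellent",
              "Outstanding", "Distinction", "Excellent", "Distinction",
              "Outstanding", "Fail", "Fail", "Fail"],
})

# 19 and 21 both occur twice, so Internal Marks has two modes.
internal_modes = df["Internal Marks"].mode().tolist()

# "Excellent" occurs five times, more than any other grade.
grade_modes = df["Grade"].mode().tolist()

print(internal_modes, grade_modes)
```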
10. Calculate the standard deviation
df.std(numeric_only=True)  # numeric_only=True skips the text columns
Internal Marks     6.090135
External Marks    18.819316
Total Marks       24.174844
Percentage        24.174844
dtype: float64
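The mean, median and standard deviation can also be computed in one call. A sketch, assuming the numeric columns are selected first with select_dtypes and then passed to agg (the variable names numeric and stats are hypothetical; the marks are copied from the tables above):

```python
import pandas as pd

df = pd.DataFrame({
    "Subject Name": ["C", "C++", "Python", "PHP", "SE", "HTML", "DBMS",
                     "COA", "BC", "MIS", "VB", "RDBMS", "ML"],
    "Internal Marks": [23, 21, 20, 17, 19, 29, 16, 19, 21, 27, 11, 10, 9],
    "External Marks": [69, 65, 66, 70, 64, 69, 56, 65, 55, 67, 26, 22, 21],
})
df["Total Marks"] = df["Internal Marks"] + df["External Marks"]

numeric = df.select_dtypes(include="number")     # keep only the numeric columns
stats = numeric.agg(["mean", "median", "std"])   # one row per statistic
print(stats.round(2))
```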
Data Visualization
Basic Libraries
import matplotlib.pyplot as plt
import seaborn as sns
1. Pie Chart
# Create a pie chart
grade = df['Grade'].value_counts()
plt.pie(grade, labels=grade.index,
autopct='%2.3f%%')
plt.title('Student Grades')
plt.show()
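The pie chart takes its slice sizes from value_counts(), and autopct prints each slice as a percentage of the total. The numbers behind the chart can be sketched without plotting, using the grades copied from the earlier table (the variable names counts and shares are hypothetical):

```python
import pandas as pd

grades = pd.Series(
    ["Outstanding", "Excellent", "Excellent", "Excellent", "Excellent",
     "Outstanding", "Distinction", "Excellent", "Distinction", "Outstanding",
     "Fail", "Fail", "Fail"],
    name="Grade",
)

counts = grades.value_counts()                       # slice sizes passed to plt.pie
shares = grades.value_counts(normalize=True) * 100   # percentages shown on the slices
print(counts)
print(shares.round(3))
```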
2. Histogram
df.hist(figsize=(10, 10))
plt.show()
3. Distribution Plot
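# sns.distplot is deprecated in recent seaborn releases;
# histplot with kde=True draws a similar histogram with a density curve
sns.histplot(df['Percentage'], kde=True, color='red')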
4. Count Plot
# create the count plot with the specified column and dataframe
sns.countplot(x="Percentage", data=df)
5. Box Plot
sns.boxplot(data=df)  # create the box plot for the entire dataframe
6. Line Graph
plt.plot(df["Subject Name"], df["Percentage"])
plt.xlabel("Subject Name")  # set the x-axis label to the name of the x column
plt.ylabel("Percentage")  # set the y-axis label to the name of the y column
plt.title('Student Percentage')  # set the title of the plot
plt.show()  # display the plot
7. Scatter Plot
plt.scatter(df["Subject Name"], df["Percentage"])
plt.xlabel("Subject Name")  # set the x-axis label to the name of the x column
plt.ylabel("Percentage")  # set the y-axis label to the name of the y column
plt.title('Scatter Plot')  # set the title of the plot
plt.show()  # display the plot
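The line graph and the scatter plot use the same two columns, so they can be compared side by side. A sketch, assuming matplotlib's subplots interface and only the first six subjects; the Agg backend and the output file name student_percentage.png are assumptions made so the script runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Subject Name": ["C", "C++", "Python", "PHP", "SE", "HTML"],
    "Percentage": [92.0, 86.0, 86.0, 87.0, 83.0, 98.0],
})

# One row, two columns of plots sharing the same data.
fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

ax_line.plot(df["Subject Name"], df["Percentage"])
ax_line.set_title("Student Percentage")

ax_scatter.scatter(df["Subject Name"], df["Percentage"])
ax_scatter.set_title("Scatter Plot")

for ax in (ax_line, ax_scatter):
    ax.set_xlabel("Subject Name")
    ax.set_ylabel("Percentage")

fig.tight_layout()
fig.savefig("student_percentage.png")  # hypothetical output file name
```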