Regular Expressions In Python
What is RegEx in Python:
A RegExp, or regular expression, is a sequence of characters that forms a search pattern. This search pattern is then used to check whether a string contains a match for it. Regular expressions are widely used in UNIX tools.
The Python module re is used for working with regular expressions. When writing a regular expression we can use a raw string such as r'expression', so that backslashes reach the re module unchanged. A programmer can match and extract any string pattern from a given text with the help of regular expressions.
For example, suppose we want to match the "Mr. ABC" keyword in a given list of names and then extract only the name, i.e. ABC, without the Mr. prefix. We can solve this problem with regular expressions in Python.
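The "Mr. ABC" problem described above could be sketched as follows. The list of names and the variable names are made up for illustration, and the pattern assumes each name is a single word after the prefix.

```python
import re

# Hypothetical sample data; the names are made up for illustration.
people = ["Mr. ABC", "Mr. XYZ", "Ms. PQR", "Mr. LMN"]

# "Mr\. " matches the literal prefix; the parentheses capture the name after it.
pattern = re.compile(r"^Mr\. (\w+)$")

names = []
for person in people:
    match = pattern.search(person)
    if match:
        # group(1) is the text captured by the parentheses, without the prefix
        names.append(match.group(1))

print(names)
```

Only the entries that start with Mr. contribute a name, so Ms. PQR is skipped.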
Regular expressions are widely used to process texts, emails and documents. A regular expression is sometimes described as a small string-matching language.
Where to use Regular Expression:
Form Validations:
Regular expressions are mostly used in form validations such as email validation, phone number validation and password validation.
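A minimal sketch of such validations is given here. The rules are assumptions chosen to keep the example short (a very loose email shape and a 10-digit phone number), and the function names are hypothetical. Real forms usually need stricter rules.

```python
import re

# Deliberately simple, assumed rules (not a complete standard):
# an email is "something@domain.tld", a phone number is exactly 10 digits.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\d{10}")


def is_valid_email(text):
    # fullmatch() succeeds only if the whole string fits the pattern
    return EMAIL_PATTERN.fullmatch(text) is not None


def is_valid_phone(text):
    return PHONE_PATTERN.fullmatch(text) is not None


print(is_valid_email("user@example.com"))
print(is_valid_email("user@example"))
print(is_valid_phone("9876543210"))
print(is_valid_phone("98765"))
```

The {10} in the phone pattern means "exactly ten of the preceding item", here ten digits.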
Account Details:
Credit card and debit card numbers typically have 16 digits, and the first few digits indicate whether the card is a Visa, MasterCard or RuPay card. Regular expressions are used to detect or search for the specified pattern in a given number.
The IFSC code of a bank branch starts with letters that identify the bank, followed by a branch code. Regular expressions are used to find such a sequence.
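As a rough illustration of checking the first digits of a card number, the sketch below assumes that a Visa-style number has 16 digits and starts with 4. The function name is hypothetical, and the check only looks at the shape of the number. It does not verify that the card is real.

```python
import re

# Assumption for this sketch: a Visa-style number is 16 digits starting with 4.
VISA_LIKE = re.compile(r"4\d{15}")


def looks_like_visa(number):
    # Remove spaces and hyphens that people often type between digit groups
    digits = re.sub(r"[\s-]", "", number)
    return VISA_LIKE.fullmatch(digits) is not None


print(looks_like_visa("4111 1111 1111 1111"))
print(looks_like_visa("5111-1111-1111-1111"))
```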
Regular Expression in Data Mining/ NLP:
In data mining, unstructured data has to be converted into a structured form before a model is built and trained to get the final results. Regular expressions play a very important role in transforming the data from unstructured to structured form.
Data is also cleansed with regular expressions, for example by removing stop words, special symbols, punctuation etc.
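The cleansing step could look roughly like the sketch below. The sample text and the tiny stop-word list are made up for illustration. A real project would normally use a much larger stop-word list.

```python
import re

# Hypothetical raw text and a tiny, made-up stop-word list.
raw_text = "Hello!!! This is, an example... of #noisy text."
stop_words = ["this", "is", "an", "of"]

# 1. Remove everything that is not a word character or white space.
no_symbols = re.sub(r"[^\w\s]", "", raw_text)

# 2. Remove the stop words as whole words, ignoring case.
stop_pattern = r"\b(?:" + "|".join(stop_words) + r")\b"
no_stops = re.sub(stop_pattern, "", no_symbols, flags=re.IGNORECASE)

# 3. Collapse the repeated white space left behind by the removals.
clean_text = re.sub(r"\s+", " ", no_stops).strip()

print(clean_text)
```

The \b on both sides of the stop words keeps the pattern from deleting the "is" inside a longer word such as "noisy".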
RegEx Module:
Python has a built-in package called re, which is used for regular expressions. To work with regular expressions we first have to import the re module as follows,
import re
Example
import re
name = "My name is Manisha"
stringsearch = re.search(r"^My.*Manisha$", name)
if stringsearch:
    print("Yes match found")
else:
    print("No match found")
O/P:
Yes match found
In the above example the name variable stores the string My name is Manisha. Another variable, stringsearch, stores the result of the search. The ^ symbol matches the start of the string. The . symbol matches any character except the newline character. The * symbol matches zero or more occurrences of the item before it. The $ symbol matches the end of the string. Together, the pattern matches a string that starts with My and ends with Manisha.
Another Example,
import re
India = 'India is my country, All Indians are my brothers and sisters'
match = re.search(r'brothers', India)
print("The start index", match.start())
print("The end index", match.end())
O/P:
The start index 40
The end index 48
Meta Characters:
The following metacharacters are used in regular expressions.
1. \ (Backslash):
The backslash makes sure that the character after it is not treated in a special way. If you want to search for a character that has a special meaning in a regular expression, such as . or *, put a backslash before that character so that it is matched literally.
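A small sketch of the effect of the backslash is given here. The sample string is made up. It compares an unescaped dot, which stands for any character, with an escaped dot, which stands only for a literal dot.

```python
import re

# Hypothetical sample string
price = "Total: 3.50 or 3x50"

# Unescaped dot: "." stands for any character, so both 3.50 and 3x50 match.
any_char = re.findall(r"3.50", price)

# Escaped dot: "\." stands only for a literal dot.
literal_dot = re.findall(r"3\.50", price)

print(any_char)
print(literal_dot)
```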
2. [] Square Bracket:
Square brackets are used to represent a set of characters. We can also write a range of characters between the square brackets.
For example,
[0-9], [a-zA-Z]
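To see the digit range and the letter range in action, here is a short sketch with a made-up sample string. Each call collects the single characters that belong to the set.

```python
import re

# Hypothetical sample string
code = "Room B12, floor 3"

# [0-9] matches any one digit
digits = re.findall(r"[0-9]", code)

# [a-zA-Z] matches any one lowercase or uppercase letter
letters = re.findall(r"[a-zA-Z]", code)

print(digits)
print(letters)
```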
3. ^- Caret:
The ^, i.e. caret, symbol is used to check whether the string starts with a given character or not.
For example,
^m will check whether the given string starts with m, as in more, mane, multiple etc.
4. $- Dollar:
The dollar symbol is used to check whether the string ends with a given character or not.
For example,
l$ is used to check whether the given string ends with the character l, as in beautiful, colourful etc.
5. .- Dot:
The . matches any single character except the newline character.
For example,
m.n will match a string that contains m, then any one character in place of the dot, then n, such as man or men.
6. |- Or:
The Or symbol works like a logical Or. It checks whether the pattern before or the pattern after the Or symbol is present in the given string.
For example,
m|n will search and match any string which contains m or n, such as many, any, etc.
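The caret, dollar, dot and Or symbols can be tried together on a small word list. The words and variable names below are made up for illustration. Each line keeps the words in which re.search() finds the pattern.

```python
import re

# Hypothetical word list
words = ["many", "any", "man", "moon", "beautiful"]

# ^m : the word starts with m
starts_with_m = [w for w in words if re.search(r"^m", w)]

# l$ : the word ends with l
ends_with_l = [w for w in words if re.search(r"l$", w)]

# m.n : m, then exactly one character, then n
m_dot_n = [w for w in words if re.search(r"m.n", w)]

# b|y : the word contains b or y
b_or_y = [w for w in words if re.search(r"b|y", w)]

print(starts_with_m)
print(ends_with_l)
print(m_dot_n)
print(b_or_y)
```

Note that moon is not matched by m.n, because two characters sit between m and n.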
7. ?- Question Mark:
The question mark symbol checks whether the character before the question mark in the regular expression occurs once or not at all.
For example,
xy?z will find xz or xyz in strings such as xz, xzy and mxyz, but it will not match xyyz because there are two y.
8. *- Star:
The * symbol matches zero or more occurrences of the regular expression preceding it.
For example,
xy*z will be matched in a string where x is followed by zero or more y and then z, like xyzxyz, abxyz, klmnopxyz etc.
9. +- Plus:
The plus symbol matches one or more occurrences of the regular expression preceding the plus symbol.
For example,
xy+z will match xyz and xyyz, but it will not match xz because at least one y is required.
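The difference between ?, * and + is easiest to see side by side. The sketch below tests three made-up sample strings against each pattern with re.fullmatch(), which requires the whole string to fit the pattern.

```python
import re

# Sample strings with zero, one and two y characters
samples = ["xz", "xyz", "xyyz"]

results = {}
for pattern in [r"xy?z", r"xy*z", r"xy+z"]:
    # Keep only the samples that match the pattern from start to end
    results[pattern] = [s for s in samples if re.fullmatch(pattern, s)]
    print(pattern, results[pattern])
```

Here ? accepts zero or one y, * accepts any number of y including none, and + needs at least one y.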
Special Sequences Used in Regular Expression:
A special sequence in a regular expression is a \ followed by one of the characters given below,
1. \A:
The \A sequence matches the specified characters only at the beginning of the string. If a match is found, findall() returns it in a list; otherwise it returns an empty list.
For example,
# Use of \A sequence
import re
string1 = "Hello Friends how are you"
check = re.findall(r"\AH", string1)
print(check)
if check:
    print("Yes, string starts with H!")
else:
    print("No, string does not start with H")
O/P:
['H']
Yes, string starts with H!
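2. \b:
The \b sequence finds the specified characters at the beginning or at the end of a word.
For example,
# Use of \b sequence
import re
string1 = "Hello Friends how are you"
# \b after llo: llo must be at the end of a word
check = re.findall(r"llo\b", string1)
print(check)
if check:
    print("Yes, character is found!")
else:
    print("No, character is not found")
O/P:
['llo']
Yes, character is found!
# Use of \b sequence
import re
string1 = "Hello Friends how are you"
# \b before llo: llo must be at the beginning of a word
check = re.findall(r"\bllo", string1)
print(check)
if check:
    print("Yes, character is found!")
else:
    print("No, character is not found")
O/P:
[]
No, character is not found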
3. \B:
This sequence is used to find the specified characters when they are present in the string but not at the beginning of a word.
For example,
# Use of \B sequence
import re
string1 = "rain rain come again 7"
check = re.findall(r"\Bain", string1)
print(check)
if check:
    print("Yes, character is found!")
else:
    print("No, character is not found")
O/P:
['ain', 'ain', 'ain']
Yes, character is found!
4. \d and \D:
\d matches any digit in the specified string, i.e. 0-9, and \D matches any character that is not a digit.
For example,
# Use of \d sequence
import re
string1 = "rain rain come again 7"
check = re.findall(r"\d", string1)
print(check)
if check:
    print("Yes, digit is found!")
else:
    print("No, digit is not found")
O/P:
['7']
Yes, digit is found!
Example for \D
# Use of \D sequence
import re
string1 = "rain rain come again 7"
check = re.findall(r"\D", string1)
print(check)
if check:
    print("Yes, match found!")
else:
    print("No, match is not found")
O/P:
['r', 'a', 'i', 'n', ' ', 'r', 'a', 'i', 'n', ' ', 'c', 'o', 'm', 'e', ' ', 'a', 'g', 'a', 'i', 'n', ' ']
Yes, match found!
5. \s and \S:
\s returns a match where the string contains a white space character, and \S returns a match where the string contains a character that is not white space.
Example:
# Use of \s sequence
import re
string1 = "rain rain come again 7"
check = re.findall(r"\s", string1)
print(check)
if check:
    print("Yes, white space found!")
else:
    print("No, white space is not found")
O/P:
[' ', ' ', ' ', ' ']
Yes, white space found!
# Use of \s sequence on a string without white space
import re
string1 = "rainraincomeagain123"
check = re.findall(r"\s", string1)
print(check)
if check:
    print("Yes, white space found!")
else:
    print("No, white space is not found")
O/P:
[]
No, white space is not found
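Both examples above call re.findall() with \s, so a short sketch of \S may help. The sample string is made up. \S matches one character that is not white space, and \S+ matches a run of such characters.

```python
import re

# Hypothetical sample string
string1 = "rain 7"

# \S : every single character that is not white space
single_chars = re.findall(r"\S", string1)

# \S+ : runs of non-white-space characters, i.e. the separate words
runs = re.findall(r"\S+", string1)

print(single_chars)
print(runs)
```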
6. \w and \W:
\w returns a match where the string contains any word character, i.e. a-z, A-Z, 0-9 and _ (underscore).
\W returns a match where the string contains any character that is not a word character.
Example:
# Use of \w sequence
import re
string1 = "rainraincomeagain123"
check = re.findall(r"\w", string1)
print(check)
if check:
    print("Yes, match found!")
else:
    print("No, match is not found")
O/P:
['r', 'a', 'i', 'n', 'r', 'a', 'i', 'n', 'c', 'o', 'm', 'e', 'a', 'g', 'a', 'i', 'n', '1', '2', '3']
Yes, match found!
# Use of \W sequence
import re
string1 = "rainraincomeagain123"
check = re.findall(r"\W", string1)
print(check)
if check:
    print("Yes, match found!")
else:
    print("No, match is not found")
O/P:
[]
No, match is not found
7. \Z:
Returns a match if the specified characters are at the end of the string.
Example,
# Use of \Z sequence
import re
string1 = "rain rain come again"
check = re.findall(r"again\Z", string1)
print(check)
if check:
    print("Yes, match found!")
else:
    print("No, match is not found")
O/P:
['again']
Yes, match found!
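Since both \Z and the $ metacharacter deal with the end of the string, a small comparison may be useful. The sketch below uses a made-up string that ends with a newline. In Python, $ also matches just before a trailing newline, while \Z matches only at the very end of the string.

```python
import re

# Hypothetical string that ends with a newline character
with_newline = "rain rain come again\n"

# $ also matches just before a trailing newline
dollar_match = re.search(r"again$", with_newline)

# \Z matches only at the very end of the string
z_match = re.search(r"again\Z", with_newline)

print(dollar_match is not None)
print(z_match is not None)
```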
Functions used in Regular Expressions:
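1. findall() Function:
The findall() function returns a list containing all matches.
For example,
import re
# find all 'i' characters in given string
string1 = "India is my country"
check = re.findall("i", string1)
print(check)
O/P:
['i', 'i']
The capital I at the start of India is not in the list because matching is case sensitive by default.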
2. search() Function:
The search() function looks for the pattern anywhere in the string and returns a match object for the first match. If no match is found then None is returned.
For example,
import re
string1 = "India is my country"
check = re.search("himalay", string1)
print(check)
O/P:
None
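Because search() can return None, it is usually safer to test the result before using it. The sketch below reuses the same sentence with a pattern that does occur in it. group() and start() are methods of the match object.

```python
import re

string1 = "India is my country"
check = re.search(r"my", string1)

# Test the result first: calling group() on None would raise an error
if check is not None:
    print("Found", check.group(), "at index", check.start())
else:
    print("No match found")
```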
3. split() Function:
The split() function returns a list in which the string is split at every match.
For example,
# split() Function
import re
string1 = "My nation is India"
splits = re.split(r"\s", string1)
print(splits)
O/P:
['My', 'nation', 'is', 'India']
We can control the splits by specifying the maximum number of splits with the maxsplit parameter.
For example,
# split() Function with maxsplit specified
import re
string1 = "My nation is India"
splits = re.split(r"\s", string1, maxsplit=2)
print(splits)
O/P:
['My', 'nation', 'is India']
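split() is most useful when the separator is not a single fixed character. As a sketch, the made-up string below uses commas or semicolons, each optionally followed by white space, and one pattern handles all of them.

```python
import re

# Hypothetical string with mixed separators
fruits = "apple, banana;cherry,  mango"

# [,;] matches a comma or a semicolon; \s* also swallows any spaces after it
parts = re.split(r"[,;]\s*", fruits)

print(parts)
```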
4. sub() Function:
The sub() function replaces one or more matches with another string.
For example,
# sub() Function
import re
string1 = "My nation is India"
replace = re.sub(r"\s", "**", string1)
print(replace)
O/P:
My**nation**is**India
We can control the number of replacements with the count parameter.
For example,
# sub() Function with count parameter
import re
string1 = "My nation is India"
replace = re.sub(r"\s", "**", string1, count=1)
print(replace)
O/P:
My**nation is India