Regular Expressions in Python
What is a Regular Expression?
Regular expressions, often
abbreviated as regex or regexp, are sequences of characters that define a
search pattern. They are used for matching patterns in strings, which can be
helpful for tasks like searching, replacing, and validating text data.
In Python, regular expressions are
supported through the re module, which provides functions and methods for
working with regular expressions.
Here are some key concepts related
to regular expressions in Python:
Pattern Matching: Regular
expressions define patterns that can be matched against strings. Patterns can
include literal characters, metacharacters, character classes, quantifiers, and
more.
Metacharacters: Metacharacters are
special characters in regular expressions that have special meanings. For
example, . matches any character except a newline, ^ matches the start of a
string, and $ matches the end of a string.
Character Classes: Character
classes allow you to match specific sets of characters. For example, [aeiou]
matches any lowercase vowel, [0-9] matches any digit, and [^a-z] matches any
character except lowercase letters.
Quantifiers: Quantifiers specify
how many times a pattern should be repeated. For example, * matches zero or
more occurrences, + matches one or more occurrences, and ? matches zero or one
occurrence.
Anchors: Anchors are used to
specify positions in the string where a match should occur. For example, ^
matches the start of a string, $ matches the end of a string, and \b matches a
word boundary.
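Modifiers (Flags): Flags change how a pattern is matched. In Python they are
passed to the re functions. For example, re.IGNORECASE (re.I) performs
case-insensitive matching, re.MULTILINE (re.M) lets ^ and $ match at the start
and end of each line, and re.DOTALL (re.S) lets . match newlines as well. The
same flags can be written inside a pattern as (?i), (?m), and (?s).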
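The concepts above can be tried out in a few lines. The following is a minimal sketch; the sample string and the variable names are assumptions chosen for illustration and do not come from the article.

```python
import re

# Sample text chosen for illustration (an assumption, not from the article).
sample = "Cat sat.\ncat ran 42 times"

# Metacharacter: '.' matches any single character except a newline.
dot_matches = re.findall(r"a.", sample)

# Character class: [0-9] matches any single digit.
digits = re.findall(r"[0-9]", sample)

# Quantifier: '+' repeats the preceding item one or more times.
numbers = re.findall(r"[0-9]+", sample)

# Anchor: \b marks a word boundary, so only the whole word "cat" matches.
whole_words = re.findall(r"\bcat\b", sample)

# Flag: re.IGNORECASE ignores letter case.
any_case = re.findall(r"cat", sample, flags=re.IGNORECASE)

# Flag: re.MULTILINE lets ^ match at the start of every line.
line_starts = re.findall(r"^\w+", sample, flags=re.MULTILINE)

# Flag: re.DOTALL lets '.' match a newline, so the match can span two lines.
across_lines = re.search(r"sat.+cat", sample, flags=re.DOTALL)

print(dot_matches, digits, numbers, whole_words, any_case, line_starts)
```

Without re.DOTALL the last search would return None, because the dot would stop at the newline between the two lines.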
Regular expressions are powerful
tools for text processing and manipulation, allowing you to perform complex
pattern matching operations with relatively simple patterns. They are widely
used in tasks such as data validation, text extraction, parsing, and more.
A simple example that demonstrates pattern search with regular expressions:

import re

# Sample text
text = "The quick brown fox jumps over the lazy dog."

# Search for the word "fox" in the text
pattern = r'fox'
match = re.search(pattern, text)
if match:
    print("Found 'fox' at position:", match.start())
else:
    print("Pattern not found.")

# Search for any word starting with "q" in the text
pattern = r'\bq\w+'
matches = re.findall(pattern, text)
if matches:
    print("Words starting with 'q':", matches)
else:
    print("Pattern not found.")

# Replace all occurrences of "fox" with "cat" in the text
pattern = r'fox'
new_text = re.sub(pattern, 'cat', text)
print("Modified text:", new_text)
Explanation:
This code demonstrates three common
operations with regular expressions:
Searching: It searches for a
specific pattern (in this case, the word "fox") within the text using
the re.search() function. If a match is found, it prints the position of the
match.
Finding All Matches: It finds all
words in the text that start with the letter "q" using the
re.findall() function. It uses the pattern \bq\w+, where \b indicates a word
boundary, q matches the letter "q", and \w+ matches one or more word
characters.
Substitution: It replaces all
occurrences of "fox" with "cat" in the text using the
re.sub() function. This function takes the pattern to search for, the
replacement text, and the input text. It returns a new string with all
occurrences of the pattern replaced.
This is a basic example to
demonstrate the usage of regular expressions. Regular expressions can be much
more powerful and versatile, allowing you to define complex patterns for
searching, matching, and manipulating text data.
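As a small extension of the three operations described above, the sketch below shows a few more details of the same functions: what a match object exposes, and how to limit or count substitutions. The variable names are assumptions made for illustration.

```python
import re

text = "The quick brown fox jumps over the lazy dog."

# A match object carries the matched text and its position.
match = re.search(r"fox", text)
matched_text = match.group() if match else None   # the substring that matched
span = match.span() if match else None            # (start, end) indexes

# re.sub() accepts an optional count that limits how many replacements are made.
limited = re.sub(r"o", "0", text, count=2)

# re.subn() works like re.sub() but also returns the number of replacements.
replaced, how_many = re.subn(r"o", "0", text)

print(matched_text, span)
print(limited)
print(replaced, how_many)
```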
Another example, this time taking user input:

import re

# Get user input
text = input("Enter a sentence: ")

# Search for the word "fox" in the text
pattern = r'fox'
match = re.search(pattern, text)
if match:
    print("Found 'fox' at position:", match.start())
else:
    print("Pattern not found.")

# Search for any word starting with "q" in the text
pattern = r'\bq\w+'
matches = re.findall(pattern, text)
if matches:
    print("Words starting with 'q':", matches)
else:
    print("Pattern not found.")

# Replace all occurrences of "fox" with "cat" in the text
pattern = r'fox'
new_text = re.sub(pattern, 'cat', text)
print("Modified text:", new_text)
If re.search() finds the pattern, it returns a match object; otherwise it
returns None. re.findall() returns an empty list when nothing matches.
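Because a failed search yields None, the result should be checked before any method is called on it. A minimal sketch of that check follows; find_word is a hypothetical helper introduced only for this illustration.

```python
import re

def find_word(word, text):
    """Return the start index of word in text, or -1 when it is absent.

    find_word is a hypothetical helper written for this sketch.
    """
    # re.escape() makes any special characters in word match literally.
    match = re.search(re.escape(word), text)
    # re.search() returns None when nothing matches, so test before using it.
    return match.start() if match else -1

print(find_word("fox", "The quick brown fox"))
print(find_word("cat", "The quick brown fox"))
```

Calling match.start() directly on a None result would raise an AttributeError, which is why the conditional comes first.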
Common Pattern Search Characters:
Here are some common characters used in regular expressions and their
purposes:
. (Dot):
Purpose: Matches any single
character except newline (\n).
^ (Caret):
Purpose: Matches the start of the
string.
When used inside square brackets
[], it negates the character class, matching any character except the ones
listed.
$ (Dollar):
Purpose: Matches the end of the
string or just before the newline at the end of the string.
\ (Backslash):
Purpose: Escapes special
characters, allowing you to match literal characters like ., ^, $, etc.
[] (Square brackets):
Purpose: Defines a character class,
allowing you to match any one of the characters inside the brackets.
| (Pipe):
Purpose: Acts as an OR operator,
allowing you to match either the expression before or after the pipe.
* (Asterisk):
Purpose: Matches zero or more
occurrences of the preceding character or group.
+ (Plus):
Purpose: Matches one or more
occurrences of the preceding character or group.
? (Question mark):
Purpose: Matches zero or one
occurrence of the preceding character or group. It makes the preceding token
optional.
{} (Curly braces):
Purpose: Specifies the number of
occurrences of the preceding character or group. {n} matches exactly n
occurrences, {n,} matches n or more occurrences, and {n,m} matches between n
and m occurrences.
These are some of the basic
characters used in regular expressions. They allow for powerful pattern
matching and manipulation of strings.
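Several of the characters listed above are easier to remember once seen in action. The following is a short sketch; the sample strings and variable names are assumptions made for illustration.

```python
import re

# Backslash: \. matches a literal dot rather than "any character".
decimals = re.findall(r"\d+\.\d+", "pi is 3.14, e is 2.71, x is 3x14")

# Pipe: matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# Curly braces: {4} means exactly four digits; \b keeps longer numbers out.
years = re.findall(r"\b\d{4}\b", "1999 20 2024 123456")

# Curly braces: {2,3} means two or three lowercase letters.
short_words = re.findall(r"\b[a-z]{2,3}\b", "a to the moon")

# Caret inside brackets: negates the class (here: no vowels, no whitespace).
consonants = re.findall(r"[^aeiou\s]", "regex")

# Question mark: makes the preceding "u" optional.
spellings = re.findall(r"colou?r", "color colour colr")

print(decimals, pets, years, short_words, consonants, spellings)
```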
For Example,
Email validation:
Let's validate email addresses against the following rules:
· The email address must start with one or more word characters (letters, digits, or underscores).
· It may contain dots (.), hyphens (-), or underscores (_) after the initial word characters.
· The domain part must start with an @ symbol followed by one or more word characters.
· The domain can have multiple subdomains separated by dots.
· The TLD (Top-Level Domain) must contain 2 to 4 word characters.
import re

pattern = r'^\w+([-._]\w+)*@\w+([.-]\w+)*\.\w{2,4}$'

# Test email addresses
emails = [
    "user@example.com",
    "first.last@example.co.uk",
    "user123@sub.domain.example",  # rejected: "example" is longer than 4 characters
    "invalid.email@invalid",       # rejected: no dot followed by a TLD
]

# Validate email addresses
for email in emails:
    if re.match(pattern, email):
        print(f"{email} is a valid email address.")
    else:
        print(f"{email} is not a valid email address.")
Explanation of the regular expression pattern:
^: Matches the start of the string.
\w+: Matches one or more word
characters (letters, digits, or underscores).
([-._]\w+)*: Matches zero or more
occurrences of a hyphen, dot, or underscore followed by one or more word
characters, allowing for local parts such as first.last.
@: Matches the @ symbol.
\w+([.-]\w+)*: Matches the domain
part, allowing for dots (.) or hyphens (-) followed by one or more word
characters. This is the part that allows subdomains.
\.\w{2,4}: Matches the dot (.)
followed by the TLD (Top-Level Domain) with 2 to 4 characters.
$: Matches the end of the string.
This regular expression pattern
demonstrates the usage of several special characters: ^, \w, [], (), *, +,
\. (an escaped dot), {}, and $.
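One detail is worth checking: $ also matches just before a newline at the very end of the string, so a pattern anchored with ^ and $ and tested with re.match() accepts an address with a trailing newline. The sketch below shows one possible way to tighten this with re.fullmatch(). The names ANCHORED_PATTERN, EMAIL_PATTERN, and is_valid_email are assumptions introduced for illustration, and this is a simplified check, not a complete email validator.

```python
import re

# The article's pattern, anchored with ^ and $.
ANCHORED_PATTERN = r"^\w+([-._]\w+)*@\w+([.-]\w+)*\.\w{2,4}$"

# The same pattern without anchors, compiled once for reuse with fullmatch().
EMAIL_PATTERN = re.compile(r"\w+([-._]\w+)*@\w+([.-]\w+)*\.\w{2,4}")

def is_valid_email(address):
    """Return True only when the entire string fits the pattern."""
    # fullmatch() requires the whole string to match, trailing newline included.
    return EMAIL_PATTERN.fullmatch(address) is not None

with_newline = "user@example.com\n"

# re.match() with $ still accepts the trailing newline.
loose_result = re.match(ANCHORED_PATTERN, with_newline) is not None

# fullmatch() rejects it.
strict_result = is_valid_email(with_newline)

print(loose_result, strict_result)
```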
Password Validation:
This regular expression pattern
ensures that the password contains:
At least one uppercase letter.
At least one lowercase letter.
At least one digit.
At least one special character from
@$!%*?&.
A length between 8 and 20
characters.
Code:

import re

def validate_password(password):
    # Password pattern: 8-20 characters, at least one uppercase letter,
    # one lowercase letter, one digit, and one special character
    pattern = r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,20}$'
    # re.match() returns a match object or None; bool() turns that into True/False
    return bool(re.match(pattern, password))

# Test passwords
passwords = [
    "StrongPassword123!",
    "weakpassword",
    "1234567890",
    "Abcdefg123",
    "Test@Password!",
    "verylongpasswordthatexceeds20characters"
]

# Validate passwords
for password in passwords:
    if validate_password(password):
        print(f"{password} is a valid password.")
    else:
        print(f"{password} is not a valid password.")
Explanation of the regular expression pattern:
^: Matches the start of the string.
(?=.*[A-Z]): Positive lookahead
assertion to ensure there is at least one uppercase letter.
(?=.*[a-z]): Positive lookahead
assertion to ensure there is at least one lowercase letter.
(?=.*\d): Positive lookahead
assertion to ensure there is at least one digit.
(?=.*[@$!%*?&]): Positive
lookahead assertion to ensure there is at least one special character among
@$!%*?&.
[A-Za-z\d@$!%*?&]{8,20}:
Matches 8 to 20 characters consisting of letters (uppercase and lowercase), digits,
and the listed special characters. Any other character, such as a space, makes
the match fail.
$: Matches the end of the string.
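A single combined pattern can only say whether a password passes or fails. As an alternative sketch, each lookahead can be turned into its own small pattern so that the failing rules can be reported. PASSWORD_RULES and password_problems are hypothetical names introduced for this illustration; the rules themselves mirror the pattern explained above.

```python
import re

# Each rule pairs a readable label with the pattern that checks it.
PASSWORD_RULES = [
    ("an uppercase letter", r"[A-Z]"),
    ("a lowercase letter", r"[a-z]"),
    ("a digit", r"\d"),
    ("a special character from @$!%*?&", r"[@$!%*?&]"),
    ("8 to 20 allowed characters", r"^[A-Za-z\d@$!%*?&]{8,20}$"),
]

def password_problems(password):
    """Return the labels of the rules the password fails (empty when valid)."""
    return [label for label, pattern in PASSWORD_RULES
            if not re.search(pattern, password)]

for candidate in ["StrongPassword123!", "Abcdefg123", "Test@Password!"]:
    problems = password_problems(candidate)
    if problems:
        print(f"{candidate} is missing: {', '.join(problems)}")
    else:
        print(f"{candidate} is a valid password.")
```

The trade-off is a few more lines of code in exchange for feedback that tells the user what to fix.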