Regular Expressions in Python

 


What Is a Regular Expression?

Regular expressions, often abbreviated as regex or regexp, are sequences of characters that define a search pattern. They are used for matching patterns in strings, which can be helpful for tasks like searching, replacing, and validating text data.

 

In Python, regular expressions are supported through the re module, which provides functions and methods for working with regular expressions.

 

Here are some key concepts related to regular expressions in Python:

 

Pattern Matching: Regular expressions define patterns that can be matched against strings. Patterns can include literal characters, metacharacters, character classes, quantifiers, and more.

 

Metacharacters: Metacharacters are special characters in regular expressions that have special meanings. For example, . matches any character except a newline, ^ matches the start of a string, and $ matches the end of a string.
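A minimal sketch of these three metacharacters in action. The sample strings are made up for illustration.

```python
import re

# . matches any single character except a newline,
# so "c\nt" is skipped while "cat", "cot" and "cut" match
dot_matches = re.findall(r'c.t', "cat cot c\nt cut")

# ^ anchors the pattern to the start of the string
starts_with_the = re.search(r'^The', "The end") is not None

# $ anchors the pattern to the end of the string
ends_with_end = re.search(r'end$', "The end") is not None

print(dot_matches)      # ['cat', 'cot', 'cut']
print(starts_with_the)  # True
print(ends_with_end)    # True
```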

 

Character Classes: Character classes allow you to match specific sets of characters. For example, [aeiou] matches any vowel, [0-9] matches any digit, and [^a-z] matches any character except lowercase letters.
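The three classes mentioned above can be tried on a short string. This is only a sketch, and the sample text is an assumption chosen to keep the output small.

```python
import re

text = "Hi 5!"

vowels = re.findall(r'[aeiou]', text)        # any one vowel
digits = re.findall(r'[0-9]', text)          # any one digit
not_lowercase = re.findall(r'[^a-z]', text)  # anything except a-z

print(vowels)         # ['i']
print(digits)         # ['5']
print(not_lowercase)  # ['H', ' ', '5', '!']
```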

 

Quantifiers: Quantifiers specify how many times a pattern should be repeated. For example, * matches zero or more occurrences, + matches one or more occurrences, and ? matches zero or one occurrence.
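To see how the three quantifiers differ, the sketch below tests the same candidate strings against each one. The candidate list is invented for illustration, and re.fullmatch() is used so that the whole string has to match.

```python
import re

candidates = ["ac", "abc", "abbbc"]

# ab*c: zero or more "b" between "a" and "c"
star = [s for s in candidates if re.fullmatch(r'ab*c', s)]

# ab+c: one or more "b"
plus = [s for s in candidates if re.fullmatch(r'ab+c', s)]

# ab?c: zero or one "b"
optional = [s for s in candidates if re.fullmatch(r'ab?c', s)]

print(star)      # ['ac', 'abc', 'abbbc']
print(plus)      # ['abc', 'abbbc']
print(optional)  # ['ac', 'abc']
```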

 

Anchors: Anchors are used to specify positions in the string where a match should occur. For example, ^ matches the start of a string, $ matches the end of a string, and \b matches a word boundary.
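The word-boundary anchor is easiest to understand by comparing a pattern with and without it. A small sketch, using a made-up sentence:

```python
import re

text = "cat concatenate cat."

# Without anchors, "cat" is also found inside "concatenate"
anywhere = re.findall(r'cat', text)

# \b on both sides restricts the match to the standalone word
whole_words = re.findall(r'\bcat\b', text)

print(len(anywhere))     # 3
print(len(whole_words))  # 2
```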

 

Modifiers: Modifiers (called flags in Python) change how a pattern is interpreted. For example, re.IGNORECASE (re.I) performs case-insensitive matching, re.MULTILINE (re.M) makes ^ and $ match at the start and end of each line, and re.DOTALL (re.S) lets . match newlines as well.
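A short sketch of these three options, passed through the flags argument of the re functions. The two-line sample text is invented for illustration.

```python
import re

text = "First line\nsecond line"

# Case-insensitive matching: "LINE" matches "line"
case_insensitive = re.findall(r'LINE', text, flags=re.IGNORECASE)

# Multiline mode: ^ matches at the start of every line
line_starts = re.findall(r'^\w+', text, flags=re.MULTILINE)

# Dot-all mode: . also matches the newline between the two lines
without_dotall = re.search(r'First.*second', text)
with_dotall = re.search(r'First.*second', text, flags=re.DOTALL)

print(case_insensitive)        # ['line', 'line']
print(line_starts)             # ['First', 'second']
print(without_dotall is None)  # True
print(with_dotall is None)     # False
```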

 

Regular expressions are powerful tools for text processing and manipulation, allowing you to perform complex pattern matching operations with relatively simple patterns. They are widely used in tasks such as data validation, text extraction, parsing, and more.

 

A simple example that demonstrates regular expressions for pattern search:

import re

 

# Sample text

text = "The quick brown fox jumps over the lazy dog."

 

# Search for the word "fox" in the text

pattern = r'fox'

match = re.search(pattern, text)

if match:

    print("Found 'fox' at position:", match.start())

else:

    print("Pattern not found.")

 

# Search for any word starting with "q" in the text

pattern = r'\bq\w+'

matches = re.findall(pattern, text)

if matches:

    print("Words starting with 'q':", matches)

else:

    print("Pattern not found.")

 

# Replace all occurrences of "fox" with "cat" in the text

pattern = r'fox'

new_text = re.sub(pattern, 'cat', text)

print("Modified text:", new_text)

Explanation:

This code demonstrates three common operations with regular expressions:

 

Searching: It searches for a specific pattern (in this case, the word "fox") within the text using the re.search() function. If a match is found, it prints the position of the match.

 

Finding All Matches: It finds all words in the text that start with the letter "q" using the re.findall() function. It uses the pattern \bq\w+, where \b indicates a word boundary, q matches the letter "q", and \w+ matches one or more word characters.

 

Substitution: It replaces all occurrences of "fox" with "cat" in the text using the re.sub() function. This function takes the pattern to search for, the replacement text, and the input text. It returns a new string with all occurrences of the pattern replaced.

 

This is a basic example to demonstrate the usage of regular expressions. Regular expressions can be much more powerful and versatile, allowing you to define complex patterns for searching, matching, and manipulating text data.

 

Another example, this time taking user input:

import re

 

# Get user input

text = input("Enter a sentence: ")

 

# Search for the word "fox" in the text

pattern = r'fox'

match = re.search(pattern, text)

if match:

    print("Found 'fox' at position:", match.start())

else:

    print("Pattern not found.")

 

# Search for any word starting with "q" in the text

pattern = r'\bq\w+'

matches = re.findall(pattern, text)

if matches:

    print("Words starting with 'q':", matches)

else:

    print("Pattern not found.")

 

# Replace all occurrences of "fox" with "cat" in the text

pattern = r'fox'

new_text = re.sub(pattern, 'cat', text)

print("Modified text:", new_text)

If re.search() finds the pattern anywhere in the string, it returns a match object; otherwise it returns None.
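The return value of re.search() can be inspected directly. The sketch below reuses the sample sentence from the first example and looks for a word that is not in it ("cat") to show the None case.

```python
import re

text = "The quick brown fox jumps over the lazy dog."

match = re.search(r'fox', text)
if match:
    print(match.group())  # fox  (the matched text)
    print(match.start())  # 16   (index where the match begins)
    print(match.end())    # 19   (index just past the match)

# No match: re.search() returns None, so always test before using it
missing = re.search(r'cat', text)
print(missing)  # None
```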

Pattern Search Characters:

Here are some common characters used in regular expressions and their purposes:

 

. (Dot):

Purpose: Matches any single character except newline (\n).

^ (Caret):

Purpose: Matches the start of the string.

When used inside square brackets [], it negates the character class, matching any character except the ones listed.

$ (Dollar):

Purpose: Matches the end of the string or just before the newline at the end of the string.

\ (Backslash):

Purpose: Escapes special characters, allowing you to match literal characters like ., ^, $, etc.

[] (Square brackets):

Purpose: Defines a character class, allowing you to match any one of the characters inside the brackets.

| (Pipe):

Purpose: Acts as an OR operator, allowing you to match either the expression before or after the pipe.

* (Asterisk):

Purpose: Matches zero or more occurrences of the preceding character or group.

+ (Plus):

Purpose: Matches one or more occurrences of the preceding character or group.

? (Question mark):

Purpose: Matches zero or one occurrence of the preceding character or group. It makes the preceding token optional.

{} (Curly braces):

Purpose: Specifies the number of occurrences of the preceding character or group. {n} matches exactly n occurrences, {n,} matches n or more occurrences, and {n,m} matches between n and m occurrences.

These are some of the basic characters used in regular expressions. They allow for powerful pattern matching and manipulation of strings.
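A few of the characters listed above, shown as a hedged sketch. The input strings are invented for illustration.

```python
import re

# \ escapes a metacharacter: \. matches a literal dot
literal_dot = re.findall(r'\d+\.\d+', "v3.14 and 42")

# | matches either alternative
pets = re.findall(r'cat|dog', "cat, bird, dog")

# {n} matches exactly n repetitions
exactly_four = re.findall(r'\b\d{4}\b', "7 2024 123456")

# {n,m} matches between n and m repetitions
two_to_three = re.findall(r'\b\d{2,3}\b', "7 42 365 2024")

print(literal_dot)   # ['3.14']
print(pets)          # ['cat', 'dog']
print(exactly_four)  # ['2024']
print(two_to_three)  # ['42', '365']
```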

For example:

Email Validation:

Let's validate email addresses against the following rules:

·        The email address must start with one or more word characters (letters, digits, or underscores).

·        It may contain dots (.), hyphens (-), or underscores (_) after the initial word characters.

·        The domain part must start with an @ symbol followed by one or more word characters.

·        The domain can have multiple subdomains separated by dots.

·        The TLD (top-level domain) must be 2 to 4 characters long.

 

import re

pattern = r'^\w+([-._]\w+)*@\w+([.-]\w+)*\.\w{2,4}$'

 

# Test email addresses

emails = [

    "user@example.com",

    "first.last@example.co.uk",

    "user123@sub.domain.example",

    "invalid.email@invalid"

]

 

# Validate email addresses

for email in emails:

    if re.match(pattern, email):

        print(f"{email} is a valid email address.")

    else:

        print(f"{email} is not a valid email address.")

Explanation of the regular expression pattern:

 

^: Matches the start of the string.

\w+: Matches one or more word characters (letters, digits, or underscores).

([-._]\w+)*: Matches zero or more occurrences of a hyphen, dot, or underscore followed by one or more word characters, allowing separators inside the part before the @.

@: Matches the @ symbol.

\w+([.-]\w+)*: Matches the domain part, allowing for dots (.) or hyphens (-) followed by one or more word characters. This is what permits subdomains.

\.\w{2,4}: Matches the dot (.) followed by the TLD (Top-Level Domain) with 2 to 4 characters.

$: Matches the end of the string.

This regular expression pattern demonstrates the usage of several special characters and constructs: ^, \w, [], (), *, +, an escaped dot (\.), {}, and $.
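One detail worth knowing: $ also matches just before a newline at the end of the string, so the anchored pattern accepts an address with a trailing newline. The sketch below is one possible way around that, using re.compile() and fullmatch(). The helper name is_valid_email is hypothetical, and the pattern is a teaching example rather than a complete email validator.

```python
import re

# Same pattern as above without ^ and $: fullmatch() already
# requires the entire string to match.
EMAIL_RE = re.compile(r'\w+([-._]\w+)*@\w+([.-]\w+)*\.\w{2,4}')

def is_valid_email(email):  # hypothetical helper name
    return EMAIL_RE.fullmatch(email) is not None

anchored = r'^\w+([-._]\w+)*@\w+([.-]\w+)*\.\w{2,4}$'

# $ tolerates one trailing newline, fullmatch() does not
print(bool(re.match(anchored, "user@example.com\n")))  # True
print(is_valid_email("user@example.com\n"))            # False
print(is_valid_email("user@example.com"))              # True
print(is_valid_email("first.last@example.co.uk"))      # True
print(is_valid_email("invalid.email@invalid"))         # False
```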

Password Validation:

This regular expression pattern ensures that the password contains:

 

At least one uppercase letter.

At least one lowercase letter.

At least one digit.

At least one special character from @$!%*?&.

A length between 8 and 20 characters.

Code

import re

 

def validate_password(password):

    # Password pattern: 8-20 characters, at least one uppercase letter,

    # one lowercase letter, one digit, and one special character

    pattern = r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,20}$'

    # re.match() returns a match object or None; turn that into a bool

    return re.match(pattern, password) is not None

 

# Test passwords

passwords = [

    "StrongPassword123!",

    "weakpassword",

    "1234567890",

    "Abcdefg123",

    "Test@Password!",

    "verylongpasswordthatexceeds20characters"

]

 

# Validate passwords

for password in passwords:

    if validate_password(password):

        print(f"{password} is a valid password.")

    else:

        print(f"{password} is not a valid password.")

Explanation of the regular expression pattern:

 

^: Matches the start of the string.

(?=.*[A-Z]): Positive lookahead assertion to ensure there is at least one uppercase letter.

(?=.*[a-z]): Positive lookahead assertion to ensure there is at least one lowercase letter.

(?=.*\d): Positive lookahead assertion to ensure there is at least one digit.

(?=.*[@$!%*?&]): Positive lookahead assertion to ensure there is at least one special character among @$!%*?&.

[A-Za-z\d@$!%*?&]{8,20}: Matches 8 to 20 characters consisting of letters (uppercase and lowercase), digits, and special characters.

$: Matches the end of the string.
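Because each lookahead checks one rule independently, the same rules can also be written as separate searches. The sketch below does that in order to report which rule a password breaks. The names CHARACTER_RULES, ALLOWED, and password_problems are hypothetical and not part of the original code.

```python
import re

# Each entry mirrors one lookahead from the combined pattern
CHARACTER_RULES = {
    "an uppercase letter": r'[A-Z]',
    "a lowercase letter": r'[a-z]',
    "a digit": r'\d',
    "a special character (@$!%*?&)": r'[@$!%*?&]',
}

# Mirrors the final [A-Za-z\d@$!%*?&]{8,20} part of the pattern
ALLOWED = re.compile(r'[A-Za-z\d@$!%*?&]{8,20}')

def password_problems(password):
    """Return a list of broken rules; an empty list means valid."""
    problems = [
        f"missing {name}"
        for name, pattern in CHARACTER_RULES.items()
        if not re.search(pattern, password)
    ]
    if not ALLOWED.fullmatch(password):
        problems.append("must be 8 to 20 characters from the allowed set")
    return problems

print(password_problems("StrongPassword123!"))  # []
print(password_problems("Test@Password!"))      # ['missing a digit']
```

Splitting the checks costs a few extra lines but makes it possible to tell the user what to fix, which a single True or False result cannot do.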
