Regular Expressions - re Module, Pattern Matching

Python Regular Expressions (Regex): Pattern Matching Mastery

Harness the power of regex to search, validate, and transform text data!

1. What Are Regular Expressions?

Regular expressions (regex) are patterns used to match and manipulate text. They’re ideal for:

Validating emails/phone numbers
Extracting data (e.g., dates, URLs)
Replacing text patterns

2. re Module Basics

Key Functions

Function	Description
re.search()	Check if a pattern exists anywhere in the string.
re.match()	Check if the pattern matches at the start of the string.
re.findall()	Return all non-overlapping matches as a list.
re.sub()	Replace matches with new text.
re.compile()	Precompile a regex pattern for reuse.
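
As a rough illustration of how these five functions differ, the sketch below runs each one against the same sample sentence. The sentence and the variable names are made up for this example.

```python
import re

text = "Order 66 shipped; order 7 pending."

# re.search() scans the whole string and returns the first match (or None).
first_number = re.search(r"\d+", text).group()

# re.match() only succeeds if the pattern matches at position 0.
starts_with_digit = re.match(r"\d+", text)

# re.findall() returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)

# re.sub() replaces every match with the replacement text.
masked = re.sub(r"\d+", "#", text)

# re.compile() builds a reusable pattern object that offers the same methods.
order_re = re.compile(r"order (\d+)", re.IGNORECASE)
order_ids = order_re.findall(text)

print(first_number)       # 66
print(starts_with_digit)  # None
print(all_numbers)        # ['66', '7']
print(masked)             # Order # shipped; order # pending.
print(order_ids)          # ['66', '7']
```

Note that findall() returns the captured group rather than the whole match when the pattern contains exactly one group, which is why order_ids holds only the numbers.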

3. Pattern Syntax Cheatsheet

Pattern	Meaning	Example
^	Start of string	^Hello
$	End of string	world$
.	Any character (except newline)	h.t → "hot", "hat"
\d	Digit (0-9; other Unicode digits too unless re.ASCII is set)	\d\d → "42"
\w	Word character (a-z, A-Z, 0-9, _; Unicode letters too unless re.ASCII is set)	\w+ → "Python3"
\s	Whitespace (space, tab, newline)	\s+
[abc]	Match any of a, b, or c	[aeiou] → vowels
[^abc]	Match not a, b, or c	[^0-9] → non-digits
a|b	Match a or b	cat|dog
{n}	Exactly n repetitions	\d{3} → "123"
{n,m}	Between n and m repetitions	\w{2,4} → "A1b"
*	0 or more repetitions	a* → "", "a", "aa"
+	1 or more repetitions	\d+ → "7", "45"
?	0 or 1 repetition	colou?r → "color", "colour"
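
One way to get a feel for the cheatsheet is to run a handful of its patterns through re.findall() and look at what comes back. The sample strings below are invented for illustration.

```python
import re

# Each tuple is (pattern, sample text); the samples are invented for illustration.
samples = [
    (r"^Hello", "Hello world"),
    (r"world$", "Hello world"),
    (r"h.t", "hot hat hit"),
    (r"[aeiou]", "regex"),
    (r"cat|dog", "cat and dog"),
    (r"\d{3}", "call 123 now"),
    (r"colou?r", "color or colour"),
]

# Map each pattern to the list of matches found in its sample text.
results = {pattern: re.findall(pattern, text) for pattern, text in samples}

for pattern, found in results.items():
    print(f"{pattern!r:12} -> {found}")
```

For example, h.t finds all three of "hot", "hat", and "hit", and colou?r finds both spellings.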

4. Common Use Cases

1. Email Validation


import re  

pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'  
email = "user.name@example.com"  

if re.match(pattern, email):  
    print("Valid email!")  
else:  
    print("Invalid email!")

Pattern Breakdown:

^[\w\.-]+: Username (letters, numbers, ., -, _)
@: Literal @
[\w\.-]+: Domain name
\.\w+$: Top-level domain (e.g., .com, .org)
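
The same check can be wrapped in a small helper, sketched below. The function name looks_like_email and the sample addresses are hypothetical. The pattern is a deliberate simplification of real email rules. The sketch uses re.fullmatch(), which requires the entire string to match, so the ^ and $ anchors are not needed.

```python
import re

# Same simplified pattern as above, minus the anchors (fullmatch implies them).
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

def looks_like_email(candidate: str) -> bool:
    """Return True if the whole string fits the simplified email pattern."""
    return EMAIL_RE.fullmatch(candidate) is not None

checks = {
    addr: looks_like_email(addr)
    for addr in [
        "user.name@example.com",   # fits the pattern
        "no-at-sign.example.com",  # missing @
        "user@example",            # missing top-level domain
        "user@example.com\n",      # trailing newline
    ]
}

for addr, ok in checks.items():
    print(repr(addr), ok)
```

The last case is the reason to prefer fullmatch() here: with re.match() and a trailing $, a string that ends in a newline still passes, because $ also matches just before a final newline.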

2. Phone Number Extraction


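import re

text = "Call me at 555-1234 or (555) 567-8901."
# The area code (with optional parentheses) sits in an optional
# non-capturing group, so 7-digit numbers such as 555-1234 also match.
pattern = r'(?:\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}'

matches = re.findall(pattern, text)
print(matches)  # Output: ['555-1234', '(555) 567-8901']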

3. URL Extraction


text = "Visit https://example.com or http://sub.site.org"  
pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'  

urls = re.findall(pattern, text)  
print(urls)  # Output: ['https://example.com', 'http://sub.site.org']

5. Groups & Capturing

Use parentheses () to capture parts of a match.

Example: Extract date components


text = "Date: 2024-07-15"  
pattern = r'(\d{4})-(\d{2})-(\d{2})'  

match = re.search(pattern, text)  
if match:  
    year, month, day = match.groups()  
    print(f"Year: {year}, Month: {month}, Day: {day}")

Output:


Year: 2024, Month: 07, Day: 15
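
Groups can also be given names with the (?P<name>...) syntax, which tends to read better than positional indexes. The sketch below reuses the date string from the example above and reorders the parts with re.sub(). The target format DD/MM/YYYY is chosen only for illustration.

```python
import re

text = "Date: 2024-07-15"
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

match = re.search(pattern, text)

# groupdict() maps each group name to the text it captured.
parts = match.groupdict() if match else {}
print(parts)  # {'year': '2024', 'month': '07', 'day': '15'}

# \g<name> refers back to a named group inside the replacement string.
reordered = re.sub(pattern, r"\g<day>/\g<month>/\g<year>", text)
print(reordered)  # Date: 15/07/2024
```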

6. Flags for Advanced Matching

Modify regex behavior with flags:

re.IGNORECASE (re.I): Case-insensitive matching
re.MULTILINE (re.M): ^ and $ match start/end of lines
re.DOTALL (re.S): . matches newlines

Example:


text = "HELLO\nworld"  
matches = re.findall(r'^[a-z]+', text, flags=re.I | re.M)  
print(matches)  # Output: ['HELLO', 'world']
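
The example above combines re.I and re.M. A similar sketch for re.DOTALL, using an invented three-line string, shows how the flag changes what . can consume, with re.MULTILINE included for comparison.

```python
import re

text = "start\nmiddle\nend"

# Without DOTALL, "." stops at the first newline.
default_match = re.search(r"start.*", text).group()

# With DOTALL, "." also consumes newlines, so the match runs to the end.
dotall_match = re.search(r"start.*", text, flags=re.DOTALL).group()

# Without MULTILINE, ^ and $ anchor only at the ends of the whole string.
whole_string_only = re.findall(r"^\w+$", text)

# With MULTILINE, ^ and $ anchor at every line boundary.
line_words = re.findall(r"^\w+$", text, flags=re.MULTILINE)

print(repr(default_match))  # 'start'
print(repr(dotall_match))   # 'start\nmiddle\nend'
print(whole_string_only)    # []
print(line_words)           # ['start', 'middle', 'end']
```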

7. Greedy vs. Non-Greedy Matching

Greedy (*, +, ?): Match as much as possible.

Non-Greedy (*?, +?, ??): Match as little as possible.

Example:


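text = "<div>Content</div><div>More</div>"

# Greedy: .* runs on to the last </div>
greedy_match = re.search(r'<div>.*</div>', text).group()
print(greedy_match)  # Output: <div>Content</div><div>More</div>

# Non-Greedy: .*? stops at the first </div>
non_greedy_match = re.search(r'<div>.*?</div>', text).group()
print(non_greedy_match)  # Output: <div>Content</div>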
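
The difference is easiest to see when a delimiter appears more than once. The sketch below uses an invented sentence with two quoted phrases. It also shows a negated character class, a common alternative to a non-greedy quantifier.

```python
import re

text = 'say "hello" and "bye"'

# Greedy: .* runs to the LAST closing quote in the string.
greedy = re.search(r'".*"', text).group()

# Non-greedy: .*? stops at the FIRST closing quote it can.
lazy = re.findall(r'".*?"', text)

# Alternative: a negated class can never run past a quote.
negated = re.findall(r'"[^"]*"', text)

print(greedy)   # "hello" and "bye"
print(lazy)     # ['"hello"', '"bye"']
print(negated)  # ['"hello"', '"bye"']
```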

8. Best Practices

Test Patterns: Use tools like Regex101 to debug.
Raw Strings: Use r"pattern" so backslashes reach the regex engine unchanged instead of being interpreted as Python string escapes.
Precompile: Use re.compile() for frequently used patterns.
Comment Complex Regex: Use re.VERBOSE for multi-line patterns.
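
The last two practices combine naturally. The sketch below precompiles a date pattern once with re.VERBOSE, so each piece can carry its own comment, then reuses it across several lines. The name DATE_RE and the sample log lines are invented for this example.

```python
import re

# In VERBOSE mode, whitespace in the pattern is ignored and "#" starts a comment.
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -                 # literal separator
    (?P<month>\d{2})  # two-digit month
    -                 # literal separator
    (?P<day>\d{2})    # two-digit day
    """,
    re.VERBOSE,
)

log_lines = ["2024-07-15 started", "no date here", "2024-07-16 finished"]

# Reuse the compiled pattern for every line instead of re-passing the string.
dates = []
for line in log_lines:
    m = DATE_RE.search(line)
    if m:
        dates.append(m.group(0))

print(dates)  # ['2024-07-15', '2024-07-16']
```

Because VERBOSE ignores whitespace, a literal space in such a pattern has to be written as \s, "\ ", or [ ].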

Practice Problem

Extract all hashtags from a tweet:


tweet = "Learning #Python is fun! #Coding #100DaysOfCode"  
pattern = r'#\w+'  
hashtags = re.findall(pattern, tweet)  
print(hashtags)  # Output: ['#Python', '#Coding', '#100DaysOfCode']

Key Takeaways

✅ Use re.search(), re.findall(), and re.sub() for common tasks.
✅ Metacharacters like \d, \w, ^, $ define patterns.
✅ Groups capture sub-patterns, flags modify behavior.
✅ Greedy vs. Non-Greedy quantifiers control match length.

What’s Next?

Learn web scraping with requests and BeautifulSoup to extract data from websites!

Ethical circuits - AI News and Tips
