Regular Expressions - re Module, Pattern Matching

Python Regular Expressions (Regex): Pattern Matching Mastery

Harness the power of regex to search, validate, and transform text data!

1. What Are Regular Expressions?

Regular expressions (regex) are patterns used to match and manipulate text. They’re ideal for:

  • Validating emails/phone numbers
  • Extracting data (e.g., dates, URLs)
  • Replacing text patterns

2. re Module Basics

Key Functions

Function        Description
re.search()     Check if a pattern exists anywhere in the string.
re.match()      Check if the pattern matches at the start of the string.
re.findall()    Return all non-overlapping matches as a list.
re.sub()        Replace matches with new text.
re.compile()    Precompile a regex pattern for reuse.
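
To make the differences concrete, here is a minimal sketch that runs each of the five functions on one sample sentence. The sample text and variable names are made up for illustration.

```python
import re

# Illustrative sample text (not from any real dataset)
text = "Order 66 shipped on 2024-07-15; order 67 ships later."

# re.search: first match anywhere in the string (Match object or None)
first_number = re.search(r"\d+", text)
print(first_number.group())  # 66

# re.match: succeeds only if the pattern matches at position 0
print(re.match(r"\d+", text))            # None (the string starts with "Order")
print(re.match(r"Order", text).group())  # Order

# re.findall: every non-overlapping match, left to right
print(re.findall(r"\d+", text))  # ['66', '2024', '07', '15', '67']

# re.sub: replace every match with new text
print(re.sub(r"\d+", "#", text))  # Order # shipped on #-#-#; order # ships later.

# re.compile: build the pattern once, then reuse its methods
number = re.compile(r"\d+")
print(number.findall("a1 b22 c333"))  # ['1', '22', '333']
```

Note that re.search() and re.match() return a Match object (or None), so call .group() only after checking that a match was found.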

3. Pattern Syntax Cheatsheet

Pattern   Meaning                              Example
^         Start of string                      ^Hello
$         End of string                        world$
.         Any character (except newline)       h.t → "hot", "hat"
\d        Digit (0-9)                          \d\d → "42"
\w        Word character (a-z, A-Z, 0-9, _)    \w+ → "Python3"
\s        Whitespace (space, tab, newline)     \s+
[abc]     Match any of a, b, or c              [aeiou] → vowels
[^abc]    Match anything except a, b, or c     [^0-9] → non-digits
a|b       Match a or b                         cat|dog
{n}       Exactly n repetitions                \d{3} → "123"
{n,m}     Between n and m repetitions          \w{2,4} → "A1b"
*         0 or more repetitions                a* → "", "a", "aa"
+         1 or more repetitions                \d+ → "7", "45"
?         0 or 1 repetition                    colou?r → "color", "colour"

Note: for str patterns in Python 3, \d, \w, and \s also match Unicode digits, letters, and whitespace unless the re.ASCII flag is set.
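
A quick way to check several cheatsheet entries is to run them through re.findall() and compare the results. The sketch below does that for a handful of patterns; the sample strings and the checks list are made up for this example.

```python
import re

# Each entry: (pattern, sample text, expected result of re.findall)
checks = [
    (r"h.t", "hot hat hut", ["hot", "hat", "hut"]),
    (r"\d{3}", "abc 123 4567", ["123", "456"]),
    (r"colou?r", "color colour", ["color", "colour"]),
    (r"[^0-9]+", "ab12cd", ["ab", "cd"]),
    (r"cat|dog", "cat, dog, bird", ["cat", "dog"]),
    (r"^Hello", "Hello world", ["Hello"]),
    (r"world$", "Hello world", ["world"]),
]

for pattern, sample, expected in checks:
    found = re.findall(pattern, sample)
    print(f"{pattern!r:12} {sample!r:18} -> {found}")
    assert found == expected  # fails loudly if a pattern behaves differently
```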

4. Common Use Cases

1. Email Validation


import re  

pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'  
email = "user.name@example.com"  

if re.match(pattern, email):  
    print("Valid email!")  
else:  
    print("Invalid email!")  

            

Pattern Breakdown:

  • ^[\w\.-]+: Username (letters, numbers, ., -, _)
  • @: Literal @
  • [\w\.-]+: Domain name
  • \.\w+$: Top-level domain (e.g., .com, .org)
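
One subtlety: $ also matches just before a trailing newline, so re.match() with ^...$ accepts "user.name@example.com\n". A possible alternative, sketched here with a hypothetical helper named is_plausible_email, is to use re.fullmatch(), which requires the entire string to fit the pattern. This is still a loose sanity check, not full email validation.

```python
import re

# Same pattern as above without ^ and $, because fullmatch anchors both ends.
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")


def is_plausible_email(candidate: str) -> bool:
    """Return True if the whole string fits the simple email pattern."""
    return EMAIL_RE.fullmatch(candidate) is not None


print(is_plausible_email("user.name@example.com"))    # True
print(is_plausible_email("user.name@example.com\n"))  # False
print(is_plausible_email("no-at-sign.example.com"))   # False

# For comparison: the ^...$ version lets the trailing newline through.
anchored = r"^[\w\.-]+@[\w\.-]+\.\w+$"
print(bool(re.match(anchored, "user.name@example.com\n")))  # True
```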

2. Phone Number Extraction


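text = "Call me at 555-1234 or (555) 567-8901."
# The area code (optional parentheses plus separator) sits in an optional
# non-capturing group, so 7-digit numbers such as 555-1234 also match.
pattern = r'(?:\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}'

matches = re.findall(pattern, text)
print(matches)  # Output: ['555-1234', '(555) 567-8901']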

            

3. URL Extraction


text = "Visit https://example.com or http://sub.site.org"  
pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'  

urls = re.findall(pattern, text)  
print(urls)  # Output: ['https://example.com', 'http://sub.site.org']  

            

5. Groups & Capturing

Use parentheses () to capture parts of a match.

Example: Extract date components


text = "Date: 2024-07-15"  
pattern = r'(\d{4})-(\d{2})-(\d{2})'  

match = re.search(pattern, text)  
if match:  
    year, month, day = match.groups()  
    print(f"Year: {year}, Month: {month}, Day: {day}")  

            

Output:


Year: 2024, Month: 07, Day: 15  
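
As a variation, the same date pattern could use named groups, written (?P<name>...), so each part is read by name rather than by position. This is a minimal sketch; the group names year, month, and day are chosen here for illustration.

```python
import re

text = "Date: 2024-07-15"
# (?P<name>...) gives each capturing group a name
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

match = re.search(pattern, text)
if match:
    print(match.group("year"))  # 2024
    print(match.group(0))       # 2024-07-15 (group 0 is the whole match)
    print(match.groupdict())    # {'year': '2024', 'month': '07', 'day': '15'}
```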

            

6. Flags for Advanced Matching

Modify regex behavior with flags:

  • re.IGNORECASE (re.I): Case-insensitive matching
  • re.MULTILINE (re.M): ^ and $ match start/end of lines
  • re.DOTALL (re.S): . matches newlines

Example:


text = "HELLO\nworld"  
matches = re.findall(r'^[a-z]+', text, flags=re.I | re.M)  
print(matches)  # Output: ['HELLO', 'world']  
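
The example above combines re.I and re.M but does not exercise re.DOTALL. The following sketch, using a made-up three-line string, shows each flag's effect in isolation.

```python
import re

text = "start\nmiddle\nend"

# Without DOTALL, "." stops at the first newline.
plain = re.search(r"start.*", text).group()
print(repr(plain))  # 'start'

# With DOTALL, "." also consumes newlines, so the match runs to the end.
dotall = re.search(r"start.*", text, flags=re.DOTALL).group()
print(repr(dotall))  # 'start\nmiddle\nend'

# Without MULTILINE, "^" matches only at the very start of the string.
print(re.findall(r"^\w+", text))  # ['start']

# With MULTILINE, "^" matches at the start of every line.
print(re.findall(r"^\w+", text, flags=re.MULTILINE))  # ['start', 'middle', 'end']
```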

            

7. Greedy vs. Non-Greedy Matching

Greedy (*, +, ?): Match as much as possible.

Non-Greedy (*?, +?, ??): Match as little as possible.

Example:


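# Sample HTML (the <div> tag is an illustrative choice)
text = "<div>Content</div><div>More</div>"

# Greedy: .* runs on to the last closing tag
greedy_match = re.search(r'<div>.*</div>', text).group()
print(greedy_match)  # Output: <div>Content</div><div>More</div>

# Non-Greedy: .*? stops at the first closing tag
non_greedy_match = re.search(r'<div>.*?</div>', text).group()
print(non_greedy_match)  # Output: <div>Content</div>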

8. Best Practices

  • Test Patterns: Use tools like Regex101 to debug.
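  • Raw Strings: Write patterns as r"pattern" so that backslashes such as \d reach the regex engine unchanged, instead of being interpreted (or needing to be doubled) as Python string escapes.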
  • Precompile: Use re.compile() for frequently used patterns.
  • Comment Complex Regex: Use re.VERBOSE for multi-line patterns.
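
The last two practices combine naturally. Here is a minimal sketch that precompiles the date pattern from section 5 with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. The name DATE_RE and the sample lines are made up for illustration.

```python
import re

# Compiled once, reused for every line below.
DATE_RE = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    re.VERBOSE,
)

results = []
for line in ["Date: 2024-07-15", "no date here"]:
    match = DATE_RE.search(line)
    results.append(match.groups() if match else None)

print(results)  # [('2024', '07', '15'), None]
```

In verbose mode a literal space or # inside the pattern must be escaped (or placed in a character class), since unescaped whitespace and comments are discarded.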

Practice Problem

Extract all hashtags from a tweet:


tweet = "Learning #Python is fun! #Coding #100DaysOfCode"  
pattern = r'#\w+'  
hashtags = re.findall(pattern, tweet)  
print(hashtags)  # Output: ['#Python', '#Coding', '#100DaysOfCode']  

            

Key Takeaways

  • ✅ Use re.search(), re.findall(), and re.sub() for common tasks.
  • ✅ Metacharacters like \d, \w, ^, $ define patterns.
  • ✅ Groups capture sub-patterns, flags modify behavior.
  • ✅ Greedy vs. Non-Greedy quantifiers control match length.

What’s Next?

Learn web scraping with requests and BeautifulSoup to extract data from websites!
