Python Regular Expressions (Regex): Pattern Matching Mastery
Harness the power of regex to search, validate, and transform text data!
1. What Are Regular Expressions?
Regular expressions (regex) are patterns used to match and manipulate text. They're ideal for:
- Validating emails/phone numbers
- Extracting data (e.g., dates, URLs)
- Replacing text patterns
2. re Module Basics
Key Functions
Function | Description
---|---
`re.search()` | Check if a pattern exists anywhere in the string.
`re.match()` | Check if the pattern matches at the start of the string.
`re.findall()` | Return all non-overlapping matches as a list.
`re.sub()` | Replace matches with new text.
`re.compile()` | Precompile a regex pattern for reuse.
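A quick demonstration of each function (the sample string here is illustrative):

```python
import re

text = "Order 66 shipped on day 3"

print(re.search(r'\d+', text).group())  # first match anywhere: '66'
print(re.match(r'\w+', text).group())   # match at the start: 'Order'
print(re.findall(r'\d+', text))         # all matches: ['66', '3']
print(re.sub(r'\d+', '#', text))        # 'Order # shipped on day #'

digits = re.compile(r'\d+')             # precompiled pattern for reuse
print(digits.findall(text))             # ['66', '3']
```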
3. Pattern Syntax Cheatsheet
Pattern | Meaning | Example
---|---|---
`^` | Start of string | `^Hello`
`$` | End of string | `world$`
`.` | Any character (except newline) | `h.t` → "hot", "hat"
`\d` | Digit (0-9) | `\d\d` → "42"
`\w` | Word character (a-z, A-Z, 0-9, _) | `\w+` → "Python3"
`\s` | Whitespace (space, tab, newline) | `\s+`
`[abc]` | Match any of a, b, or c | `[aeiou]` → vowels
`[^abc]` | Match anything except a, b, or c | `[^0-9]` → non-digits
`a\|b` | Match a or b | `cat\|dog`
`{n}` | Exactly n repetitions | `\d{3}` → "123"
`{n,m}` | Between n and m repetitions | `\w{2,4}` → "A1b"
`*` | 0 or more repetitions | `a*` → "", "a", "aa"
`+` | 1 or more repetitions | `\d+` → "7", "45"
`?` | 0 or 1 repetition | `colou?r` → "color", "colour"
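A few of these patterns in action (the example strings are illustrative):

```python
import re

print(re.findall(r'h.t', "hot hat hut"))       # ['hot', 'hat', 'hut']
print(re.findall(r'\d{3}', "abc123def4567"))   # ['123', '456'] (non-overlapping)
print(re.findall(r'colou?r', "color colour"))  # ['color', 'colour']
print(re.findall(r'cat|dog', "cat dog bird"))  # ['cat', 'dog']
```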
4. Common Use Cases
1. Email Validation
```python
import re

pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
email = "user.name@example.com"

if re.match(pattern, email):
    print("Valid email!")
else:
    print("Invalid email!")
```
Pattern Breakdown:
- `^[\w\.-]+`: Username (letters, digits, `.`, `-`, `_`)
- `@`: Literal @
- `[\w\.-]+`: Domain name
- `\.\w+$`: Top-level domain (e.g., .com, .org)
2. Phone Number Extraction
```python
text = "Call me at 555-1234 or (555) 567-8901."
# Area code (with optional parentheses) is optional, so 7-digit numbers match too
pattern = r'(?:\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}'
matches = re.findall(pattern, text)
print(matches)  # Output: ['555-1234', '(555) 567-8901']
```
3. URL Extraction
```python
text = "Visit https://example.com or http://sub.site.org"
pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'
urls = re.findall(pattern, text)
print(urls)  # Output: ['https://example.com', 'http://sub.site.org']
```
5. Groups & Capturing
Use parentheses `()` to capture parts of a match.
Example: Extract date components
```python
text = "Date: 2024-07-15"
pattern = r'(\d{4})-(\d{2})-(\d{2})'
match = re.search(pattern, text)

if match:
    year, month, day = match.groups()
    print(f"Year: {year}, Month: {month}, Day: {day}")
```
Output:
```
Year: 2024, Month: 07, Day: 15
```
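As an aside (this goes beyond the example above), groups can also be named with `(?P<name>...)`, which makes extraction self-documenting:

```python
import re

text = "Date: 2024-07-15"
# Named groups: access parts by name instead of position
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, text)

if match:
    print(match.group('year'))  # '2024'
    print(match.groupdict())    # {'year': '2024', 'month': '07', 'day': '15'}
```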
6. Flags for Advanced Matching
Modify regex behavior with flags:
- `re.IGNORECASE` (`re.I`): Case-insensitive matching
- `re.MULTILINE` (`re.M`): `^` and `$` match at the start/end of each line
- `re.DOTALL` (`re.S`): `.` matches newlines
Example:
```python
text = "HELLO\nworld"
matches = re.findall(r'^[a-z]+', text, flags=re.I | re.M)
print(matches)  # Output: ['HELLO', 'world']
```
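`re.DOTALL` can be shown the same way: without the flag, `.` stops at the newline, so the pattern cannot reach the second line.

```python
import re

text = "HELLO\nworld"

print(re.search(r'H.*d', text))                      # None: '.' won't cross the newline
print(re.search(r'H.*d', text, flags=re.S).group())  # 'HELLO\nworld'
```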
7. Greedy vs. Non-Greedy Matching
- Greedy (`*`, `+`, `?`): Match as much as possible.
- Non-Greedy (`*?`, `+?`, `??`): Match as little as possible.
Example:
```python
text = "<b>Content</b><b>More</b>"

# Greedy: .* grabs everything up to the last </b>
greedy_match = re.search(r'<b>.*</b>', text).group()
print(greedy_match)  # Output: <b>Content</b><b>More</b>

# Non-Greedy: .*? stops at the first </b>
non_greedy_match = re.search(r'<b>.*?</b>', text).group()
print(non_greedy_match)  # Output: <b>Content</b>
```
8. Best Practices
- Test Patterns: Use tools like Regex101 to debug.
- Raw Strings: Use `r"pattern"` to avoid escaping backslashes.
- Precompile: Use `re.compile()` for frequently used patterns.
- Comment Complex Regex: Use `re.VERBOSE` for multi-line patterns.
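A sketch combining the last two practices: a precompiled pattern annotated with `re.VERBOSE`, which lets you split a regex across lines and comment each piece (whitespace inside the pattern is ignored). The phone-number pattern is reused here purely for illustration.

```python
import re

phone = re.compile(r"""
    \(?\d{3}\)?   # area code, optional parentheses
    [-.\s]?       # optional separator
    \d{3}         # exchange
    [-.\s]?       # optional separator
    \d{4}         # line number
""", re.VERBOSE)

print(phone.findall("Call (555) 567-8901 today"))  # ['(555) 567-8901']
```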
Practice Problem
Extract all hashtags from a tweet:
```python
tweet = "Learning #Python is fun! #Coding #100DaysOfCode"
pattern = r'#\w+'
hashtags = re.findall(pattern, tweet)
print(hashtags)  # Output: ['#Python', '#Coding', '#100DaysOfCode']
```
Key Takeaways
- ✅ Use `re.search()`, `re.findall()`, and `re.sub()` for common tasks.
- ✅ Metacharacters like `\d`, `\w`, `^`, `$` define patterns.
- ✅ Groups capture sub-patterns; flags modify behavior.
- ✅ Greedy vs. Non-Greedy quantifiers control match length.
What’s Next?
Learn web scraping with requests and BeautifulSoup to extract data from websites!
Tags:
python