Python: RegEx

Regular expressions (regex) are a powerful tool for pattern matching and string manipulation. Python supports them through the re module in its standard library. Here's an overview of how to use regex in Python:

Importing the re Module:


To use the functionalities provided by the re module, you need to import it:


import re
 

Searching for a Pattern:


The re.search() function scans a string for the first location where a pattern matches. It returns a match object for that first match, or None if the pattern does not match anywhere in the string.


text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"
match = re.search(pattern, text)
if match:
    print("Pattern found at index:", match.start())
else:
    print("Pattern not found")
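The match object carries more than the start index. As a small sketch using the same sentence, the methods group(), span(), start(), and end() expose the matched text and its position (the variable names here are only illustrative):

```python
import re

text = "The quick brown fox jumps over the lazy dog"
match = re.search(r"fox", text)

if match:
    matched_text = match.group()  # the substring that matched
    start, end = match.span()     # (start, end) indices; end is exclusive
    print("Matched:", matched_text)
    print("Span:", start, end)
    print("Slice check:", text[start:end])
```

Slicing the original string with the span gives back the matched text, which is a handy sanity check when working with positions.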

 

Matching Patterns:


The re.match() function attempts to match a pattern at the beginning of a string. It returns a match object if the pattern is found at the beginning of the string, or None otherwise.


text = "The quick brown fox jumps over the lazy dog"
pattern = "The"
match = re.match(pattern, text)
if match:
    print("Pattern found")
else:
    print("Pattern not found")
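The difference between re.match() and re.search() is easy to miss, so here is a brief comparison. It also includes re.fullmatch(), which succeeds only if the pattern covers the entire string. The variable names are illustrative only:

```python
import re

text = "The quick brown fox"

# re.match() only tries at index 0, so "quick" is not found
at_start = re.match(r"quick", text)

# re.search() scans the whole string and finds "quick" at index 4
anywhere = re.search(r"quick", text)

# re.fullmatch() requires the pattern to cover the entire string
whole = re.fullmatch(r"The", text)
whole_ok = re.fullmatch(r"The.*fox", text)

print(at_start)            # None
print(anywhere.start())    # 4
print(whole)               # None
print(whole_ok is not None)  # True
```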

 

Finding All Matches:


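The re.findall() function finds all non-overlapping occurrences of a pattern within a string and returns them as a list. If the pattern has no groups, the list contains the matched strings; with one group it contains that group's text, and with several groups it contains tuples.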


text = "The quick brown fox jumps over the lazy dog"
pattern = "the"
matches = re.findall(pattern, text, re.IGNORECASE)  # Case-insensitive search
print("Number of matches:", len(matches))
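What re.findall() returns depends on whether the pattern contains groups. The sketch below shows the three cases on the same sentence, plus re.finditer(), which yields match objects when positions are needed as well. The patterns are chosen only for illustration:

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# No groups: a list of the whole matches
whole = re.findall(r"the", text, re.IGNORECASE)

# One group: a list of that group's contents
after_the = re.findall(r"the (\w+)", text, re.IGNORECASE)

# Two groups: a list of tuples, one tuple per match
pairs = re.findall(r"(\w+) (fox|dog)", text)

# finditer() yields match objects, so positions are available too
positions = [m.start() for m in re.finditer(r"the", text, re.IGNORECASE)]

print(whole)      # ['The', 'the']
print(after_the)  # ['quick', 'lazy']
print(pairs)      # [('brown', 'fox'), ('lazy', 'dog')]
print(positions)  # [0, 31]
```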

 

Substituting Patterns:


The re.sub() function returns a new string in which every non-overlapping occurrence of a pattern is replaced with a specified replacement. The original string is left unchanged.


text = "The quick brown fox jumps over the lazy dog"
pattern = "lazy"
replacement = "sleepy"
new_text = re.sub(pattern, replacement, text)
print("New text:", new_text)
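Beyond a fixed replacement string, re.sub() also accepts a count argument, backreferences to groups in the replacement, and a function as the replacement. A minimal sketch of each, with patterns picked purely for illustration:

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# count limits how many occurrences are replaced
first_only = re.sub(r"the", "a", text, count=1, flags=re.IGNORECASE)

# \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r"(quick) (brown)", r"\2 \1", text)

# A function receives each match object and returns the replacement text
shouted = re.sub(r"\b\w{5}\b", lambda m: m.group().upper(), text)

print(first_only)  # a quick brown fox jumps over the lazy dog
print(swapped)     # The brown quick fox jumps over the lazy dog
print(shouted)     # The QUICK BROWN fox JUMPS over the lazy dog
```

Passing count and flags as keyword arguments avoids accidentally supplying a flag where re.sub() expects the count.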

 

Regular Expression Patterns:


Regular expressions support a wide range of pattern matching options, including:

  • Literal characters
  • Character classes ( [...] )
  • Quantifiers ( *, +, ?, {m}, {m,n})
  • Anchors (^, $)
  • Groups (...)
  • Alternation ( | )
  • Special sequences ( \d , \w , \s , etc.)
  • And more...

Example:

text = "The quick brown fox jumps over the lazy dog"
pattern = r"\b\w{3}\b"  # Matches whole words of exactly three word characters (letters, digits, or underscore)
matches = re.findall(pattern, text)
print("Three-letter words:", matches)
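Several of the pattern elements listed above can be combined in one expression. The sketch below uses a made-up log line (a hypothetical input, not from any real system) to show anchors, quantifiers, groups, alternation, and special sequences working together:

```python
import re

# Hypothetical log line, invented for this example
line = "2024-03-15 ERROR disk usage at 91%"

pattern = (
    r"^(\d{4})-(\d{2})-(\d{2}) "  # anchor, quantifiers, and groups for the date
    r"(INFO|WARN|ERROR) "         # alternation inside a group
    r".*?(\d+)%$"                 # lazy wildcard, then a percentage at the end
)

m = re.search(pattern, line)
if m:
    year, month, day, level, percent = m.groups()
    print(year, month, day, level, percent)  # 2024 03 15 ERROR 91
```

Splitting the pattern across adjacent raw string literals keeps each part short enough to comment on.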

 

Flags:


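The re module also supports flags that modify how a pattern is matched. Common flags include re.IGNORECASE (case-insensitive matching) and re.MULTILINE (makes ^ and $ match at the start and end of every line, not only at the start and end of the whole string). Flags can be combined with the | operator.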

Example:

text = "The quick brown fox\njumps over\nthe lazy dog"
pattern = r"^the"  # With re.MULTILINE, ^ matches at the beginning of every line
matches = re.findall(pattern, text, re.IGNORECASE | re.MULTILINE)
print("Matches:", matches)  # ['The', 'the']
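To see what re.MULTILINE actually changes, it helps to run the same pattern with and without the flag. A small sketch on a two-line string made up for this comparison:

```python
import re

text = "the fox\nthe dog"

# Without the flag, ^ matches only at the very start of the string
without_flag = re.findall(r"^the", text)

# With the flag, ^ also matches right after each newline
with_flag = re.findall(r"^the", text, re.MULTILINE)

# $ behaves the same way for line endings
last_word_only = re.findall(r"\w+$", text)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(without_flag)    # ['the']
print(with_flag)       # ['the', 'the']
print(last_word_only)  # ['dog']
print(line_ends)       # ['fox', 'dog']
```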

 

Regular expressions provide a concise and flexible way to search, extract, and manipulate text based on complex patterns. However, they can quickly become hard to read, so use them judiciously and document them well.
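One way to document a pattern is the re.VERBOSE flag, which lets a pattern span several lines with whitespace ignored and # comments allowed. The sketch below rewrites the three-character-word pattern in that style and precompiles it with re.compile(); the name word_re is only illustrative:

```python
import re

word_re = re.compile(
    r"""
    \b       # start at a word boundary
    \w{3}    # exactly three word characters
    \b       # end at a word boundary
    """,
    re.VERBOSE,
)

text = "The quick brown fox jumps over the lazy dog"
three_char_words = word_re.findall(text)
print(three_char_words)  # ['The', 'fox', 'the', 'dog']
```

A compiled pattern object offers the same methods as the module-level functions (search, match, findall, sub), which is convenient when one pattern is reused in many places.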