Python Regular Expressions (RegEx) Guide

A regular expression (RegEx) is a sequence of characters that forms a search pattern. RegEx can be used to check whether a string contains a specified search pattern.

RegEx module

There is a built-in package in Python called re that can be used to process regular expressions. Import the re module:

import re

RegEx in Python

Once you have imported the re module, you can start using regular expressions.

Example: Search a string to see if it starts with “The” and ends with “Spain”:

import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
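The call above stores its result in x but never inspects it. A minimal sketch of one way to check the result, relying on the fact that re.search returns None when nothing matches (the printed messages are only illustrative):

```python
import re

txt = "The rain in Spain"
x = re.search(r"^The.*Spain$", txt)

# re.search returns a Match object (truthy) or None (falsy)
if x:
    print("Match found")
else:
    print("No match")
```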

RegEx functions

The re module provides a set of functions that allow us to search a string for matches:

Function   Description

findall    Returns a list containing all matches

search     Returns a Match object if there is a match anywhere in the string

split      Returns a list where the string has been split at each match

sub        Replaces one or more matches with a string
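As a quick orientation, the four functions can be tried side by side on one string. This is only a sketch; the variable names and the "-" replacement text are chosen for illustration.

```python
import re

txt = "The rain in Spain"

all_matches = re.findall(r"ai", txt)   # every non-overlapping match, as a list
first_match = re.search(r"ai", txt)    # Match object for the first match, or None
parts = re.split(r"\s", txt)           # pieces of the string between matches
replaced = re.sub(r"\s", "-", txt)     # new string with each match replaced

print(all_matches)          # ['ai', 'ai']
print(first_match.start())  # 5
print(parts)                # ['The', 'rain', 'in', 'Spain']
print(replaced)             # The-rain-in-Spain
```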

Metacharacters

Metacharacters are characters with special meaning:

Character Description Example

[] A set of characters "[a-m]"

\ Signals a special sequence (can also be used to escape special characters) "\d"

. Any character except newline "he..o"

^ Starts with "^hello"

$ Ends with "planet$"

* Zero or more occurrences "he.*o"

+ One or more occurrences "he.+o"

? Zero or one occurrence "he.?o"

{} Exactly the specified number of occurrences "he.{2}o"

| Either or "falls|stays"

() Capture and group
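The metacharacters in the table can be tried with re.findall. The sketch below assumes two sample strings that are not part of the original examples; the comments show what each pattern returns for them.

```python
import re

txt = "hello planet"                      # assumed sample string
weather = "The rain in Spain stays"       # assumed sample string

any_two = re.findall(r"he..o", txt)       # "he", any two characters, "o" -> ['hello']
starts = re.findall(r"^hello", txt)       # anchored at the start -> ['hello']
ends = re.findall(r"planet$", txt)        # anchored at the end -> ['planet']
star = re.findall(r"he.*o", txt)          # zero or more characters between -> ['hello']
plus = re.findall(r"he.+o", txt)          # one or more characters between -> ['hello']
optional = re.findall(r"he.?o", txt)      # at most one character between -> []
exact = re.findall(r"he.{2}o", txt)       # exactly two characters between -> ['hello']
either = re.findall(r"falls|stays", weather)       # alternation -> ['stays']
groups = re.search(r"(\w+) (\w+)", txt).groups()   # two captured groups

print(any_two, starts, ends, star, plus, optional, exact, either)
print(groups)  # ('hello', 'planet')
```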

Special sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character Description Example

\A Returns a match if the specified characters are at the beginning of the string "\AThe"

\b Returns a match where the specified characters are at the beginning or at the end of a word (the leading "r" makes the pattern a raw string, so "\b" is not read as a backspace character) r"\bain" r"ain\b"

\B Returns a match where the specified characters are present, but not at the beginning (or at the end) of a word (the leading "r" makes the pattern a raw string) r"\Bain" r"ain\B"

\d Returns a match where the string contains a digit (0-9) "\d"

\D Returns a match where the string contains a character that is not a digit "\D"

\s Returns a match where the string contains a whitespace character "\s"

\S Returns a match where the string contains a character that is not whitespace "\S"

\w Returns a match where the string contains a word character (a to z, A to Z, 0 to 9, and the underscore _; for str patterns Python also treats other Unicode letters and digits as word characters by default) "\w"

\W Returns a match where the string contains a character that is not a word character "\W"

\Z Returns a match if the specified characters are at the end of the string "Spain\Z"
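A short sketch of several special sequences in action. The sample string, with a year appended so that \d has something to match, is an assumption made for illustration.

```python
import re

txt = "The rain in Spain 2024"  # assumed sample string

at_start = re.findall(r"\AThe", txt)     # ['The']
word_start = re.findall(r"\bain", txt)   # [] because no word starts with "ain"
word_end = re.findall(r"ain\b", txt)     # ['ain', 'ain'] from "rain" and "Spain"
inside = re.findall(r"\Bain", txt)       # ['ain', 'ain'], "ain" not at a word start
digits = re.findall(r"\d", txt)          # ['2', '0', '2', '4']
spaces = re.findall(r"\s", txt)          # four single-space matches
words = re.findall(r"\w+", txt)          # runs of word characters
at_end = re.findall(r"2024\Z", txt)      # ['2024']

print(at_start, word_start, word_end, inside)
print(digits, len(spaces), words, at_end)
```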

Sets

A set is a group of characters enclosed in a pair of square brackets [] that has a special meaning:

Set Description

[arn] Returns a match where one of the specified characters (a, r, or n) is present

[a-n] Returns a match for any lowercase character alphabetically between a and n

[^arn] Returns a match for any character except a, r, and n

[0123] Returns a match where any of the specified digits (0, 1, 2, or 3) is present

[0-9] Returns a match for any digit between 0 and 9

[0-5][0-9] Returns a match for any two-digit number from 00 to 59

[a-zA-Z] Returns a match for any letter between a and z, lowercase or uppercase

[+] In a set, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string
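The sets above can be checked against a string that mixes letters, digits, and punctuation. The sample strings here are assumptions chosen for illustration.

```python
import re

txt = "8 times before 11:45 AM"  # assumed sample string

single = re.findall(r"[arn]", txt)          # only "r" occurs (in "before")
digits = re.findall(r"[0-9]", txt)          # each digit separately
minutes = re.findall(r"[0-5][0-9]", txt)    # two-digit numbers from 00 to 59
letters = re.findall(r"[a-zA-Z]+", txt)     # runs of letters, either case
plus_signs = re.findall(r"[+]", "2 + 2")    # "+" is literal inside a set

print(single)      # ['r']
print(digits)      # ['8', '1', '1', '4', '5']
print(minutes)     # ['11', '45']
print(letters)     # ['times', 'before', 'AM']
print(plus_signs)  # ['+']
```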

findall() function

The findall() function returns a list of all matches.

Example: Print a list of all matches:

import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

The list contains matches in the order they are found. If no match is found, an empty list is returned:

Example: Search for a word that does not occur in the string:

import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)

search() function

The search() function searches a string for a match and returns a Match object if there is a match. If there are multiple matches, only the first match will be returned:

Example: Search for the first space character in a string:

import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)  # raw string keeps the backslash intact

print("The first space character is at position:", x.start())

If no match is found, the return value is None:

Example: Conduct a search that returns no matches:

import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
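Because search() can return None, calling a method such as .start() on the result without a check raises an AttributeError. One possible guard is sketched below; first_match_text is a hypothetical helper name, not part of the re module.

```python
import re

def first_match_text(pattern, text):
    """Hypothetical helper: return the first matched text, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m is not None else None

txt = "The rain in Spain"
print(first_match_text(r"ai", txt))        # ai
print(first_match_text(r"Portugal", txt))  # None
```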

split() function

The split() function returns a list where the string has been split at each match:

Example: Split at every space character:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)  # raw string keeps the backslash intact
print(x)

You can limit the number of splits by specifying the maxsplit parameter:

Example: Split the string only at the first occurrence:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)  # keyword form; positional maxsplit is deprecated in Python 3.13
print(x)
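To see how maxsplit changes the result, the same string can be split with different limits. This is a small sketch with illustrative variable names; the remainder of the string always ends up in the last list item.

```python
import re

txt = "The rain in Spain"

one_split = re.split(r"\s", txt, maxsplit=1)
two_splits = re.split(r"\s", txt, maxsplit=2)

print(one_split)   # ['The', 'rain in Spain']
print(two_splits)  # ['The', 'rain', 'in Spain']
```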

sub() function

The sub() function replaces matches with text of your choice:

Example: Replace each space character with the number 9:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)  # raw string keeps the backslash intact
print(x)

You can control the number of substitutions by specifying the count parameter:

Example: Replace the first two matches:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)  # keyword form; positional count is deprecated in Python 3.13
print(x)
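The effect of count is easiest to see when the limited and unlimited calls are compared directly. A minimal sketch, with variable names chosen for illustration:

```python
import re

txt = "The rain in Spain"

all_replaced = re.sub(r"\s", "9", txt)           # every space replaced
first_two = re.sub(r"\s", "9", txt, count=2)     # only the first two spaces replaced

print(all_replaced)  # The9rain9in9Spain
print(first_two)     # The9rain9in Spain
```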

Match object

A Match object is an object that contains information about the search and results.

Note: If there is no match, the value None will be returned instead of a Match object.

Example: Perform a search that returns a Match object:

import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x)  # prints a Match object

Match objects have properties and methods for retrieving information about searches and results:

  • .span() Returns a tuple containing the start and end positions of the match.
  • .string Returns the string passed to the function.
  • .group() Returns the part of the string that matched.

Example: Print the position (start and end) of the first match. The regular expression looks for any word that starts with an uppercase “S”:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())

Example: Print the string passed to the function:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)

Example: Print the part of the string that matched. The regular expression looks for any word that starts with an uppercase “S”:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())
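Match objects also expose the capture groups created with (). The sketch below splits the word into two groups purely for illustration, and checks for None before using the result.

```python
import re

txt = "The rain in Spain"
m = re.search(r"\b(S)(\w+)", txt)  # group 1: the capital S, group 2: the rest of the word

if m is not None:
    print(m.group())           # Spain (the whole match)
    print(m.group(1))          # S
    print(m.group(2))          # pain
    print(m.span())            # (12, 17)
    print(m.start(), m.end())  # 12 17
```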
