A regular expression (RegEx) is a sequence of characters that forms a search pattern. RegEx can be used to check whether a string contains a specified search pattern.
RegEx module
There is a built-in package in Python called re that can be used to process regular expressions. Import the re module:
import re
RegEx in Python
Once you have imported the re module, you can start using regular expressions.
Example: Search a string to see if it starts with “The” and ends with “Spain”:
import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
RegEx functions
The re module provides a set of functions that allow us to search a string for matches:

Function    Description
findall     Returns a list containing all matches
search      Returns a Match object if there is a match anywhere in the string
split       Returns a list where the string has been split at each match
sub         Replaces one or more matches with a string

Metacharacters
Metacharacters are characters with a special meaning:

Character   Description                                       Example
[]          A set of characters                               "[a-m]"
\           Signals a special sequence (can also be used
            to escape special characters)                     r"\d"
.           Any character except newline                      "he..o"
^           Starts with                                       "^hello"
$           Ends with                                         "planet$"
*           Zero or more occurrences                          "he.*o"
+           One or more occurrences                           "he.+o"
?           Zero or one occurrence                            "he.?o"
{}          Exactly the specified number of occurrences       "he.{2}o"
|           Either or                                         "falls|stays"
()          Capture and group
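The metacharacters above can be tried out with a short script. This is only a sketch: the sample strings and variable names are made up for illustration, and each pattern is taken from the Example column.

```python
import re

txt = "hello planet"  # sample text, chosen only for this illustration

any_char = re.findall("he..o", txt)       # ".": any character except newline
starts = bool(re.search("^hello", txt))   # "^": the string starts with "hello"
ends = bool(re.search("planet$", txt))    # "$": the string ends with "planet"
zero_or_more = re.findall("he.*o", txt)   # "*": zero or more characters between "he" and "o"
one_or_more = re.findall("he.+o", txt)    # "+": at least one character between "he" and "o"
zero_or_one = re.findall("he.?o", txt)    # "?": at most one character, so "hello" does not fit
exactly_two = re.findall("he.{2}o", txt)  # "{2}": exactly two characters between "he" and "o"
either = re.findall("falls|stays", "The rain falls, the sun stays")  # "|": either word

print(any_char, starts, ends)
print(zero_or_more, one_or_more, zero_or_one, exactly_two)
print(either)
```

Note that "he.?o" finds nothing here, because "hello" has two characters between "he" and "o".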
Special sequences
A special sequence is a \ followed by one of the characters in the list below, and it has a special meaning. The leading "r" in the examples makes Python treat the pattern as a raw string, so the backslash reaches the re module unchanged:

Character   Description                                                 Example
\A          Matches if the specified characters are at the
            beginning of the string                                     r"\AThe"
\b          Matches where the specified characters are at the
            beginning or at the end of a word                           r"\bain", r"ain\b"
\B          Matches where the specified characters are present,
            but not at the beginning (or end) of a word                 r"\Bain", r"ain\B"
\d          Matches any digit (0-9)                                     r"\d"
\D          Matches any character that is not a digit                   r"\D"
\s          Matches any whitespace character                            r"\s"
\S          Matches any character that is not whitespace                r"\S"
\w          Matches any word character (a to z, A to Z, 0 to 9,
            and the underscore _)                                       r"\w"
\W          Matches any character that is not a word character          r"\W"
\Z          Matches if the specified characters are at the end
            of the string                                               r"Spain\Z"
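As a rough illustration of the special sequences, the sketch below runs several of them against the sample sentence used throughout this article. The variable names are arbitrary and only serve the example.

```python
import re

txt = "The rain in Spain"

at_start = re.findall(r"\AThe", txt)      # "The" at the very beginning of the string
word_start = re.findall(r"\bain", txt)    # no word begins with "ain"
word_end = re.findall(r"ain\b", txt)      # "rain" and "Spain" both end with "ain"
inside_word = re.findall(r"\Bain", txt)   # "ain" that is not at the start of a word
digits = re.findall(r"\d", txt)           # the sentence contains no digits
spaces = re.findall(r"\s", txt)           # one entry per whitespace character
at_end = re.findall(r"Spain\Z", txt)      # "Spain" at the very end of the string

print(at_start, word_start, word_end, inside_word)
print(digits, len(spaces), at_end)
```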
Sets
A set is a group of characters enclosed in a pair of square brackets [] that has a special meaning:

Set         Description
[arn]       Matches any one of the specified characters (a, r, or n)
[a-n]       Matches any lowercase character alphabetically between a and n
[^arn]      Matches any character except a, r, and n
[0123]      Matches any of the specified digits (0, 1, 2, or 3)
[0-9]       Matches any digit from 0 to 9
[0-5][0-9]  Matches any two-digit number from 00 to 59
[a-zA-Z]    Matches any letter from a to z, lowercase or uppercase
[+]         In a set, +, *, ., |, (), $, {} have no special meaning, so [+] matches any + character in the string
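A few of these sets in action, as a minimal sketch. The sample strings other than "The rain in Spain" are invented for the illustration.

```python
import re

txt = "The rain in Spain"

listed = re.findall("[arn]", txt)     # every a, r, or n, in order of appearance
ranged = re.findall("[a-n]", txt)     # lowercase letters from a to n only
negated = re.findall("[^arn]", txt)   # everything that is not a, r, or n

# Two-digit values from 00 to 59, for example the parts of a clock time.
minutes = re.findall("[0-5][0-9]", "8 times before 11:45 AM")

# Inside a set, "+" is an ordinary character.
plus_signs = re.findall("[+]", "1 + 1 = 2")

print(listed)
print(ranged)
print(len(negated), minutes, plus_signs)
```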
findall() function
The findall() function returns a list of all matches.
Example: Print a list of all matches:
import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)
The list contains matches in the order they are found. If no match is found, an empty list is returned:
Example: If no match is found, return an empty list:
import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)
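findall() becomes more useful when the pattern uses metacharacters. The following sketch, with illustrative variable names, collects every word ending in "ain" and shows that the result is always a list, even when nothing matches.

```python
import re

txt = "The rain in Spain"

# One or more word characters followed by "ain".
words = re.findall(r"\w+ain", txt)
print(words)

# With no match the result is an empty list, so len() is always safe to call.
missing = re.findall("Portugal", txt)
print(len(missing))
```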
search() function
The search() function searches a string for a match and returns a Match object if there is a match. If there are multiple matches, only the first match will be returned:
Example: Search for the first space character in a string:
import re

txt = "The rain in Spain"
# A raw string keeps Python from treating "\s" as a string escape
x = re.search(r"\s", txt)
print("The first space character is at position:", x.start())
If no match is found, the return value is None:
Example: Conduct a search that returns no matches:
import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
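Because search() can return None, calling a method such as .start() on the result directly would raise an AttributeError when nothing matches. One possible way to guard against that is sketched below; first_match is a hypothetical helper written for this example, not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return the first matched text, or None if the pattern is not found."""
    m = re.search(pattern, text)
    # A Match object is truthy and None is falsy, so a plain "if" is enough.
    return m.group() if m else None

txt = "The rain in Spain"
found = first_match("ai", txt)
not_found = first_match("Portugal", txt)
print(found, not_found)
```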
split() function
The split() function returns a list where the string has been split at each occurrence:
Example: Split at every space character:
import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)
You can control the number of occurrences by specifying the maxsplit parameter:
Example: Split the string only at the first occurrence:
import re

txt = "The rain in Spain"
# Pass maxsplit by keyword so it cannot be confused with the flags argument
x = re.split(r"\s", txt, maxsplit=1)
print(x)
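When the text contains runs of whitespace, splitting on a single "\s" produces empty strings between the consecutive separators. The sketch below compares that with "\s+", which treats each run as one separator. The sample text with a double space is an assumption made for the illustration.

```python
import re

txt = "The  rain in Spain"  # note the two spaces after "The"

single = re.split(r"\s", txt)              # each space is its own separator
runs = re.split(r"\s+", txt)               # a run of spaces counts as one separator
first_only = re.split(r"\s+", txt, maxsplit=1)

print(single)
print(runs)
print(first_only)
```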
sub() function
The sub() function replaces matches with text of your choice:
Example: Replace each space character with the number 9:
import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)
You can control the number of substitutions by specifying the count parameter:
Example: Replace the first two matches:
import re

txt = "The rain in Spain"
# Pass count by keyword so it cannot be confused with the flags argument
x = re.sub(r"\s", "9", txt, count=2)
print(x)
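sub() works with any pattern, not only single characters. As a small sketch, the example below masks lowercase vowels using a set, once for every match and once limited by count. The variable names are illustrative.

```python
import re

txt = "The rain in Spain"

# Without count (the default is 0), every match is replaced.
all_masked = re.sub("[aeiou]", "*", txt)

# With count=3, only the first three matches are replaced.
some_masked = re.sub("[aeiou]", "*", txt, count=3)

print(all_masked)
print(some_masked)
```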
Match object
A Match object is an object that contains information about the search and results.
Note: If there is no match, the value None will be returned instead of a Match object.
Example: Perform a search that returns a Match object:
import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x)  # This will print a Match object
Match objects have properties and methods for retrieving information about searches and results:
.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed to the function.
.group() returns the part of the string where there was a match.
Example: Print the position (start and end) of the first match. The regular expression looks for any word that starts with an uppercase "S":
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())
Example: Print the string passed to the function:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)
Example: Print the part of the string where there was a match. The regular expression looks for any word that starts with an uppercase "S":
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())
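The three members can also be used together. The sketch below assumes the same sample sentence and shows that slicing .string with the positions from .span() gives back the same text as .group().

```python
import re

txt = "The rain in Spain"
m = re.search(r"\bS\w+", txt)

if m:  # guard against None in case the pattern is not found
    start, end = m.span()
    matched = m.group()
    # The span positions index into the original string.
    same_text = m.string[start:end] == matched
    print(start, end, matched, same_text)
```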