Python regular expressions, generic functions, and exception handling

Directory

1. Regular expressions

3.1 Matching function

3.2 Retrieval and Replacement

3.3 Regular Expression Objects

2. Commonly used methods

3. Use of generic functions

4. Context manager

5. Decorator

6. Exceptions

6.1 Throwing and catching exceptions


1. Regular expression

A regular expression is a special sequence of characters that helps you check whether a string matches a certain pattern.

Python's built-in re module provides full regular expression support. Its main functions are:

  • re.match(): tries to match only at the beginning of the string; returns None if that fails.
  • re.search(): scans the entire string and returns the first match.
  • Match.groups(): returns a tuple of all captured groups (this is a method of the Match object returned by match()/search(), not a function of the re module).
  • re.sub(): replaces matches in a string.
  • re.compile(): compiles a regular expression into a regular expression (Pattern) object whose match(), search(), and other methods can be reused.
  • re.findall(): returns all matches in the string as a list.
  • re.finditer(): finds all substrings matched by the regular expression and returns them as an iterator of Match objects.
  • re.split(): splits the string at each match and returns a list of the pieces.
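re.finditer() and re.split() are not demonstrated elsewhere in this article, so here is a minimal sketch using made-up sample strings: finditer() yields one Match object per match, and split() cuts the string wherever the pattern matches.

```python
import re

text = "tel: 010-1234, fax: 020-5678"  # made-up sample text

# re.finditer() yields a Match object for each non-overlapping match
numbers = []
for m in re.finditer(r'\d+', text):
    numbers.append(m.group())
    print(m.group(), m.span())  # 010 (5, 8), then 1234 (9, 13), ...

# re.split() cuts the string wherever the pattern (a comma or colon plus optional spaces) matches
parts = re.split(r'[,:]\s*', "a, b:c,d")
print(parts)  # ['a', 'b', 'c', 'd']
```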

(1) Special character classes and their meanings

(2) Character classes

(3) Regular expression mode

Pattern strings use a special syntax to represent a regular expression:

Letters and digits represent themselves: in a pattern they match the same characters in the string. Most letters and digits take on a different meaning when preceded by a backslash (for example, \d matches any digit). Special punctuation characters such as . ^ $ * + ? { } [ ] \ | ( ) have special meanings and match themselves only when escaped with a backslash. The backslash itself must be escaped as \\. Because regular expressions often contain backslashes, it is best to write them as raw strings: the pattern element r'\t' (equivalent to '\\t') matches a tab character.
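The tables of special characters and character classes are not reproduced in this text. As a partial illustration only, not a replacement for those tables, the sketch below shows a few common elements (\d, \w, \t, an escaped dot) on a made-up sample string, and why raw strings are convenient.

```python
import re

# A raw string and an escaped backslash produce the same pattern text
print(r'\d+' == '\\d+')  # True

text = "Price: 42 USD\tin stock"  # made-up sample with a tab character

digits = re.findall(r'\d+', text)      # \d: one digit; \d+: a run of digits
words = re.findall(r'\w+', text)       # \w: a "word" character (letter, digit, underscore)
tabs = re.findall(r'\t', text)         # r'\t' matches a tab character
dots = re.findall(r'\.', "a.b.c")      # an escaped dot matches a literal "."
anything = re.findall(r'.', "ab")      # an unescaped dot matches any character except a newline

print(digits)    # ['42']
print(words)     # ['Price', '42', 'USD', 'in', 'stock']
print(tabs)      # ['\t']
print(dots)      # ['.', '.']
print(anything)  # ['a', 'b']
```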

The following table lists the special elements in the regular expression pattern syntax. When a pattern is used with the optional flags argument, the meaning of some pattern elements changes.

3.1 Matching function

(1) re.match function

re.match() tries to match the pattern at the start of the string. If it succeeds, it returns a Match object whose span() gives the matched range as a half-open interval (start included, end excluded). If the pattern does not match at the start, match() returns None.

re.match(pattern, string, flags=0)  # pattern: the regular expression; string: the string to match; flags: optional matching flags

import re  # import the module
print(re.match('www', 'www.runoob.com').span())  # matches at the start, prints (0, 3)
print(re.match('com', 'www.runoob.com'))         # no match at the start, prints None

Use the methods of the Match object to retrieve the matched text:

  • group(num): returns the substring captured by group num;
  • group(): returns the entire match (the same as group(0));
  • groups(): returns a tuple of all captured groups, from group 1 up to the last group.
import re

line = "Cats are smarter than dogs"  # the string to search

matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.groups() : ", matchObj.groups())   # ('Cats', 'smarter')
    print("matchObj.group() : ", matchObj.group())     # Cats are smarter than dogs
    print("matchObj.group(1) : ", matchObj.group(1))   # Cats
    print("matchObj.group(2) : ", matchObj.group(2))   # smarter
else:
    print("No match!!")

(2) re.search method

Scans the entire string and returns the first successful match. Use the Match object's group(num) or groups() methods to retrieve the matched text.

re.search(pattern, string, flags=0)

import re
print(re.search('www', 'www.runoob.com').span())  # www is at (0, 3)
print(re.search('com', 'www.runoob.com').span())  # com is at (11, 14), a half-open interval

line = "Cats are smarter than dogs"

searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())    # Cats are smarter than dogs
    print("searchObj.group(1) : ", searchObj.group(1))  # Cats
    print("searchObj.group(2) : ", searchObj.group(2))  # smarter
else:
    print("Nothing found!!")

(3) The difference between re.match and re.search

re.match only matches at the beginning of the string: if the beginning does not match the regular expression, the match fails and the function returns None. re.search scans the entire string until it finds a match.

import re
line = "Cats are smarter than dogs"

matchObj = re.match(r'dogs', line, re.M | re.I)  # match() only tries position 0, and the string does not start with "dogs", so this fails
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")  # this branch runs

matchObj = re.search(r'dogs', line, re.M | re.I)  # search() scans the whole string and finds "dogs"
if matchObj:
    print("search --> searchObj.group() : ", matchObj.group())  # this branch runs and prints "dogs"
else:
    print("No match!!")

3.2 Retrieval and Replacement

(1) re.sub function

The re module provides re.sub() for replacing matches in a string.

re.sub(pattern, repl, string, count=0, flags=0)
  • pattern: the pattern string of the regular expression.
  • repl: the replacement; it can be a string or a function.
  • string: the original string to be searched and replaced.
  • count: the maximum number of replacements; the default 0 means replace all matches.
  • flags: optional matching flags, such as re.I.
import re
 
phone = "2004-959-559 # This is a foreign phone number"
 
# Remove Python comments from the string
num = re.sub(r'#.*$', "", phone)
print("The phone number is: ", num) #The phone number is: 2004-959-559
 
# Remove every non-digit character (dashes, spaces, and the comment text)
num = re.sub(r'\D', "", phone)
print("The phone number is: ", num) #The phone number is: 2004959559
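Since repl can also be a function, here is a minimal sketch with made-up sample strings showing a function replacement (it receives each Match object and returns the replacement text), group references such as \1 and \2 in a string replacement, and the count parameter. The helper name double is only for illustration.

```python
import re

def double(match):
    """Hypothetical helper: replace each number with twice its value."""
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, "A23G4HFD567")
print(doubled)  # A46G8HFD1134

# \2 and \1 refer to the second and first captured groups: swap key=value into value=key
swapped = re.sub(r'(\w+)=(\w+)', r'\2=\1', "width=20 height=10")
print(swapped)  # 20=width 10=height

# count=4 stops after the first four replacements
masked = re.sub(r'\d', '#', "2004-959-559", count=4)
print(masked)  # ####-959-559
```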

(2) re.compile function

re.compile() compiles a regular expression into a regular expression (Pattern) object, which can then be used by methods such as match() and search(). Its optional flags argument accepts:

  1. re.I: ignore case
  2. re.L: make \w, \W, \b, \B (and case-insensitive matching) depend on the current locale; in Python 3 it can only be used with bytes patterns
  3. re.M: multiline mode, where ^ and $ also match at the start and end of each line
  4. re.S: make . match any character, including a newline (by default . does not match a newline)
  5. re.U: make \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database (the default for str patterns in Python 3)
  6. re.X: verbose mode, which ignores whitespace in the pattern and treats text after # as a comment, for readability
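A minimal sketch of the most common flags on a made-up three-line string; flags can be combined with |.

```python
import re

text = "Python\npython\nPYTHON"  # made-up sample: three lines

# re.I: case-insensitive matching
print(re.findall(r'python', text, re.I))       # ['Python', 'python', 'PYTHON']

# re.M: ^ matches at the start of every line, not only at the start of the string
print(re.findall(r'^p\w+', text))              # []
print(re.findall(r'^p\w+', text, re.M))        # ['python']

# re.S: . also matches the newline character
print(re.match(r'Python.python', text))        # None
print(re.match(r'Python.python', text, re.S).group() == "Python\npython")  # True

# re.X: whitespace and # comments inside the pattern are ignored
phone = re.compile(r"""
    (\d{3})   # area code
    -
    (\d{4})   # number
""", re.X)
print(phone.match("010-1234").groups())        # ('010', '1234')

# Flags combined with |
print(re.findall(r'^p\w+', text, re.I | re.M)) # ['Python', 'python', 'PYTHON']
```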

When match() or search() succeeds, it returns a Match object, where:

  • The group([group1, …]) method returns the string matched by one or more groups. To get the entire matched substring, use group() or group(0);
  • The start([group]) method returns the starting position of the substring matched by the group in the whole string (the index of its first character); the argument defaults to 0;
  • The end([group]) method returns the end position of the substring matched by the group in the whole string (the index of its last character + 1); the argument defaults to 0;
  • The span([group]) method returns (start(group), end(group)).
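A minimal sketch of these Match methods on a made-up e-mail string, both for the whole match (group 0) and for a single group:

```python
import re

m = re.search(r'(\w+)@(\w+)\.com', "contact: alice@example.com")

print(m.group())       # alice@example.com
print(m.group(1, 2))   # ('alice', 'example'): several groups at once give a tuple
print(m.start(), m.end(), m.span())     # 9 26 (9, 26): the whole match
print(m.start(2), m.end(2), m.span(2))  # 15 22 (15, 22): only group 2
```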

(3) findall

findall() finds all substrings matched by the regular expression and returns them as a list. If the pattern contains multiple groups, it returns a list of tuples; if nothing matches, it returns an empty list.

Note: match() and search() return only one match, while findall() returns all matches.

Pattern.findall(string[, pos[, endpos]])
  • string: the string to match.
  • pos: optional; the position in the string where the search starts, default 0.
  • endpos: optional; the position where the search ends, default the length of the string.
import re

pattern = re.compile(r'\d+')  # one or more digits
result1 = pattern.findall('runoob 123 google 456')
result2 = pattern.findall('run88oob123google456', 0, 10)  # only searches 'run88oob12'

print(result1)  # ['123', '456']
print(result2)  # ['88', '12']

Multiple groups:

result = re.findall(r'(\w+)=(\d+)', 'set width=20 and height=10')
print(result)  # [('width', '20'), ('height', '10')]
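The shape of findall()'s result depends on how many groups the pattern has. A minimal sketch on the same sample string:

```python
import re

text = "set width=20 and height=10"

no_groups = re.findall(r'\w+=\d+', text)     # no groups: the whole matches
one_group = re.findall(r'(\w+)=\d+', text)   # one group: a list of strings, not tuples
two_groups = re.findall(r'(\w+)=(\d+)', text)  # two or more groups: a list of tuples
no_match = re.findall(r'depth=\d+', text)    # nothing matches: an empty list

print(no_groups)   # ['width=20', 'height=10']
print(one_group)   # ['width', 'height']
print(two_groups)  # [('width', '20'), ('height', '10')]
print(no_match)    # []
```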

3.3 Regular Expression Object

re.compile() returns a Pattern object (called RegexObject in older Python versions). Its match() and search() methods return a Match object, which provides:

  • group() returns the string matched by the regular expression
  • start() returns the position where the match starts
  • end() returns the position where the match ends
  • span() returns a tuple containing the (start, end) positions of the match
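The main benefit of compiling is reusing one Pattern object on many strings. A minimal sketch with a hypothetical date pattern; the pattern and groups attributes of the Pattern object expose the source text and the number of groups.

```python
import re

date_re = re.compile(r'(\d{4})-(\d{2})-(\d{2})')  # hypothetical YYYY-MM-DD pattern
print(date_re.pattern)  # (\d{4})-(\d{2})-(\d{2})
print(date_re.groups)   # 3

for text in ["Released on 2021-07-15.", "no date here"]:
    m = date_re.search(text)  # the compiled pattern is reused for each string
    if m:
        print(m.group(), m.span(), m.groups())  # 2021-07-15 (12, 22) ('2021', '07', '15')
    else:
        print("no match in", repr(text))
```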

2. Commonly used methods

import re
print(re.match('www', 'www.runoob.com').span())  # matches at the start: (0, 3)
print(re.match('com', 'www.runoob.com'))         # no match at the start: None


#------ Find all the numbers in the string --------#
pattern = re.compile(r'\d+')  # one or more digits
result1 = pattern.findall('runoob 123 google 456')
result2 = pattern.findall('run88oob123google456', 0, 10)
print(result1)  # ['123', '456']
print(result2)  # ['88', '12']


#------ Multiple groups: returns a list of tuples --------#
result = re.findall(r'(\w+)=(\d+)', 'set width=20 and height=10')
print(result)  # [('width', '20'), ('height', '10')]

1. Regular expression: match a URL

import re

str1 = input()  # e.g. https://www.runoob.com
result = re.match("https://www", str1)
if result:
    print(result.span())  # range matched from the start, e.g. (0, 11)
else:
    print("The input does not start with https://www")

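2. map() function: applies a function to every item of an iterable and returns an iterator.

from collections.abc import Iterator  # abstract base class for iterators

map_obj = map(lambda x: x*2, [1,2,3,4,5])
print(isinstance(map_obj, Iterator))  # True
print(list(map_obj))  # [2, 4, 6, 8, 10]

def square(x):  # define a function square()
    return x**2  # return the value so that map() can collect it

print(list(map(square, [1,2,3,4,5])))  # [1, 4, 9, 16, 25]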

3. filter() function: keeps only the items for which the function returns a true value.

filter_obj = filter(lambda x: x > 5, range(0,10))
print(list(filter_obj))  # [6, 7, 8, 9]

4. isinstance(): determines whether an object is an instance of a specific type or custom class (including its subclasses).

print(isinstance("hello world", str)) #True
print(isinstance(10,int)) #True
print(isinstance(10.0,float)) #True
print(isinstance(5,float)) #False
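The examples above only use built-in types. A minimal sketch with hypothetical classes Animal and Cat, showing that isinstance() also accepts subclasses and a tuple of types, unlike an exact type() comparison:

```python
class Animal:  # hypothetical base class
    pass

class Cat(Animal):  # hypothetical subclass of Animal
    pass

kitty = Cat()
print(isinstance(kitty, Cat))       # True
print(isinstance(kitty, Animal))    # True: instances of subclasses count too
print(type(kitty) == Animal)        # False: type() compares the exact class only
print(isinstance(5, (int, float)))  # True: a tuple means "any of these types"
print(isinstance(True, int))        # True: bool is a subclass of int
```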

5. hasattr(obj, attribute): determines whether the object obj has the attribute attribute; getattr(obj, attribute) returns the attribute's value.

import json

aa = hasattr(json, "dumps")
print(aa)  # True

bb = getattr(json, "__path__")  # get the value of the attribute __path__
print(bb)  # e.g. ['D:\Anacoda3\lib\json'], depending on the installation
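getattr() also accepts a third argument, a default returned when the attribute is missing, which avoids a separate hasattr() check. A minimal sketch; the attribute name no_such_attr is deliberately made up.

```python
import json

print(hasattr(json, "loads"))               # True
missing = getattr(json, "no_such_attr", None)
print(missing)                              # None: the default instead of AttributeError

# getattr() also works with a name that is only known at run time
func_name = "dumps"
encoded = getattr(json, func_name)({"a": 1})
print(encoded)                              # {"a": 1}
```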

6. callable(): determines whether an object can be called (functions and classes, for example, are callable objects).

print(callable("hello python")) #False
print(callable(list)) #True
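Instances of a class that defines __call__ are callable as well. A minimal sketch with a hypothetical Multiplier class:

```python
class Multiplier:  # hypothetical example class
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):  # defining __call__ makes instances callable like functions
        return x * self.factor

double = Multiplier(2)
print(callable(Multiplier))  # True: classes are callable
print(callable(double))      # True: the class defines __call__
print(double(21))            # 42
print(callable(42))          # False
```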

7. Module attributes

import json

print(json.__doc__)   # the module's docstring (help(json) shows it too, along with more details)
print(json.__name__)  # the module's name: json
print(json.__file__)  # the module's file path; modules built into the interpreter (such as sys) have no __file__, and accessing it raises AttributeError
print(json.__dict__)  # the module's namespace as a dictionary

3. Use of generic functions

Generic functions: functools.singledispatch turns a function into a generic function, which calls a different implementation depending on the type of the first argument passed in.

from functools import singledispatch

@singledispatch
def age(obj):
    print('Please pass in a legal type of parameter!')

@age.register(int)
def _(age):
    print('I am {} years old.'.format(age))

@age.register(str)
def _(age):
    print('I am {} years old.'.format(age))


age(23)              # int: I am 23 years old.
age('twenty three')  # str: I am twenty three years old.
age(['23'])          # list: Please pass in a legal type of parameter!
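Since Python 3.7, register can also read the type from the first parameter's annotation. A minimal sketch with a hypothetical describe() function, which also shows that dispatch follows subclasses (bool is handled by the int implementation):

```python
from functools import singledispatch

@singledispatch
def describe(obj):
    # fallback used when no more specific implementation is registered
    return "unsupported type: " + type(obj).__name__

@describe.register  # type taken from the annotation (Python 3.7+)
def _(obj: int):
    return f"int {obj}"

@describe.register
def _(obj: list):
    return f"list with {len(obj)} items"

print(describe(23))      # int 23
print(describe([1, 2]))  # list with 2 items
print(describe(True))    # int True (bool is a subclass of int)
print(describe(1.5))     # unsupported type: float
```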

(1) Example: a generic concatenation function

from functools import singledispatch

def check_type(func):  # decorator: refuse to concatenate arguments of different types
    def wrapper(*args):
        arg1, arg2 = args[:2]
        if type(arg1) != type(arg2):
            return '[Error]: The parameter types are different and cannot be concatenated!!'
        return func(*args)
    return wrapper

@singledispatch
def add(obj, new_obj):
    raise TypeError  # no implementation is registered for this type

@add.register(str)
@check_type
def _(obj, new_obj):
    obj += new_obj  # string concatenation
    return obj

@add.register(list)
@check_type
def _(obj, new_obj):
    obj.extend(new_obj)  # list concatenation (modifies obj in place)
    return obj

@add.register(dict)
@check_type
def _(obj, new_obj):
    obj.update(new_obj)  # dictionary merge (modifies obj in place)
    return obj

@add.register(tuple)
@check_type
def _(obj, new_obj):
    return (*obj, *new_obj)  # tuple concatenation

print(add('hello', ', world'))  # hello, world
print(add([1,2,3], [4,5,6]))  # [1, 2, 3, 4, 5, 6]
print(add({'name': 'wangbm'}, {'age':25}))  # {'name': 'wangbm', 'age': 25}
print(add(('apple', 'huawei'), ('vivo', 'oppo')))  # ('apple', 'huawei', 'vivo', 'oppo')

# A list and a string cannot be concatenated (different types)
print(add([1,2,3], '4,5,6'))  # [Error]: The parameter types are different and cannot be concatenated!!

4. Context manager

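Benefits of context managers: they improve code reuse, elegance, and readability. A context manager serves two purposes: managing resources (for example, closing a file automatically) and handling exceptions.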

(1) Read the file content

file = open(r"C:\Users\HY\Desktop\Autotest\1.txt")  # raw string: otherwise "\U" in the path is a syntax error
print(file.readline())  # read the first line
print(file.read())      # read the rest of the file (everything after the first line)
file.close()            # close the file handle manually

(2) Use the with statement to read the file; the file handle is closed automatically after reading (the object returned by open() is the context manager)

  • Context expression: with open('test.txt') as file:
  • Context manager: open('test.txt')
  • file: the resource object
with open(r"C:\Users\HY\Desktop\Autotest\1.txt") as file:
    print(file.read())

(3) Implement a context manager in a class: if a class defines the __enter__ and __exit__ methods, its instances are context managers

class Resource():
    def __enter__(self):
        print("-----connect to resource-----")
        return self
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        print("-----close resource connection------")
        
    def func(self):
        print("----Execute the logic inside the function")
    
with Resource() as result:
    result.func()

# output:
# -----connect to resource-----
# ----Execute the logic inside the function
# -----close resource connection------

__exit__ must accept these three parameters (in addition to self):

  • exc_type: the exception type

  • exc_val: the exception value

  • exc_tb: the exception's traceback (stack information)

If the code in the with block raises no exception, all three parameters are None.
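When the with block does raise, __exit__ receives the exception, and a true return value suppresses it. A minimal sketch with a hypothetical SafeResource class that swallows only ZeroDivisionError and lets every other exception propagate:

```python
class SafeResource:  # hypothetical variant of the Resource class above
    def __init__(self):
        self.suppressed = None  # records an exception that __exit__ swallowed

    def __enter__(self):
        print("-----connect to resource-----")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print("-----close resource connection------")
        if exc_type is None:
            return False  # no exception: nothing to handle
        if issubclass(exc_type, ZeroDivisionError):
            self.suppressed = exc_val
            print("handled:", exc_val)
            return True   # a true return value tells Python to suppress the exception
        return False      # any other exception keeps propagating

res = SafeResource()
with res:
    1 / 0
print("still running, suppressed:", repr(res.suppressed))
```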

(4) Use contextlib to build a context manager (implement the context manager with a function instead of a class)

In Python, the contextlib.contextmanager decorator turns a generator function into a context manager. The example below builds one for opening files, similar to with open.

import contextlib

@contextlib.contextmanager
def open_func(file_name):
    # __enter__ part: runs before the with block
    print("open file:", file_name, "in __enter__")
    file_handler = open(file_name, "r")  # open the file

    yield file_handler  # the value bound by "as"; the with block runs here

    # __exit__ part: runs after the with block
    print("close file:", file_name, "in __exit__")
    file_handler.close()  # close the handle

with open_func(r"C:\Users\HY\Desktop\Autotest\1.txt") as file_in:
    for line in file_in:
        print(line)

The code above only achieves the first purpose of a context manager (managing resources), not the second (handling exceptions): if the with block raises, the code after yield never runs, so open_func does not close the file. To handle exceptions, change it to:

import contextlib

@contextlib.contextmanager
def open_func(file_name):
    # __enter__ part
    print("open file:", file_name, "in __enter__")
    file_handler = open(file_name, "r")  # open the file

    try:
        yield file_handler  # the with block runs here; its exception is re-raised at this point
    except Exception as exc:
        print("the exception was thrown:", exc)  # handled here, so it does not propagate
    finally:
        # __exit__ part: always runs
        print("close file:", file_name, "in __exit__")
        file_handler.close()  # close the handle

with open_func(r"C:\Users\HY\Desktop\Autotest\1.txt") as file_in:
    for line in file_in:
        1/0  # deliberately raise ZeroDivisionError
        print(line)

5. Decorator

A decorator is essentially a Python function that lets other functions gain extra functionality without any changes to their code. The return value of a decorator is also a function object.

Decorators are often used for cross-cutting concerns such as logging, performance testing, transaction handling, caching, and permission checks. With decorators, we can extract large amounts of identical code that has nothing to do with a function's own logic and reuse it.

How to use a decorator:

  • First define a decorator (the hat)

  • Then define your business function or class (the person)

  • Finally, put the decorator (hat) on the head of the function (person)

#================== Use of decorators ==================#
# define the decorator
def decorator(func):
    def wrapper(*args, **kw):
        return func(*args, **kw)  # pass the arguments through to the decorated function
    return wrapper

# define the business function and decorate it
@decorator
def function():
    print("hello world")

function()  # hello world

(1) Ordinary decorator

#=================== Use of ordinary decorators ===================#
# Define the decorator: logger is the decorator, and its parameter func is the decorated function

def logger(func):
    def wrapper(*args, **kw):
        print('Start executing {} function: '.format(func.__name__))

        # function body: the logic that really needs to be executed
        result = func(*args, **kw)

        print('Execution completed')
        return result  # pass the decorated function's return value through
    return wrapper

# the business function
@logger
def add(x, y):
    print(f"{x} + {y}={x + y}")  # both formatting styles print the same text
    print("{} + {}={}".format(x, y, x + y))

add(20, 30)
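One side effect of a plain wrapper is that the decorated function takes on the wrapper's name and docstring. The standard library's functools.wraps copies them over. A minimal sketch of the same logger with functools.wraps and a returned result:

```python
import functools

def logger(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kw):
        print('Start executing {} function:'.format(func.__name__))
        result = func(*args, **kw)  # keep the decorated function's return value
        print('Execution completed')
        return result
    return wrapper

@logger
def add(x, y):
    """Return the sum of x and y."""
    return x + y

print(add(20, 30))   # 50, printed after the two log lines
print(add.__name__)  # add (without functools.wraps it would be "wrapper")
print(add.__doc__)   # Return the sum of x and y.
```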

6. Exceptions

An exception is an error raised while a program is running; if it is not handled, the program aborts and exits abnormally. By default the program does not handle exceptions; Python prints an error message (a traceback) instead. Every exception is an instance of an exception class, and built-in exception class names start with a capital letter:

  • SyntaxError: syntax error
  • TypeError: type error, raised when an operation or function is applied to an object of an inappropriate type, such as adding an integer and a string
  • IndexError: index error, such as the common case of an index beyond the bounds of a sequence
  • KeyError: key error, mainly in dictionaries; raised when you try to access a key that does not exist in the dictionary
  • ValueError: raised when an argument has the right type but an unexpected value, such as trying to get the index of a value that is not in a list
  • AttributeError: attribute error, raised when trying to access an attribute that an object does not have (for example, a dictionary has a get method but a list does not, so calling get on a list raises this exception)
  • NameError: name error, raised when you use a variable that has not been defined (assigned)
  • IOError: input/output error, raised for example when you try to open a nonexistent file for reading. In Python 3, IOError is an alias of OSError, and a missing file raises its subclass FileNotFoundError
  • StopIteration: raised when next() is called on an iterator that has no more values
  • AssertionError: raised when the expression in an assert statement is false
  • IndentationError: indentation error
  • ImportError: raised when an import fails, for example because the package name or path is wrong or the package is not installed
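A minimal sketch that triggers each exception in the list and reports its class name. The helpers exception_name and failing_assert, the file name no_such_file.txt, and the module name no_such_module_xyz are made up for illustration; note that a missing file or module raises a subclass (FileNotFoundError, ModuleNotFoundError).

```python
def exception_name(func):
    """Hypothetical helper: call func() and return the name of the exception it raises."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None

def failing_assert():
    assert 1 == 2  # skipped when Python runs with -O

print(exception_name(lambda: compile('1 +', '<demo>', 'eval')))       # SyntaxError
print(exception_name(lambda: 1 + 'a'))                                # TypeError
print(exception_name(lambda: [1, 2][5]))                              # IndexError
print(exception_name(lambda: {'a': 1}['b']))                          # KeyError
print(exception_name(lambda: [1, 2].index(3)))                        # ValueError
print(exception_name(lambda: [1, 2].get(0)))                          # AttributeError
print(exception_name(lambda: undefined_name))                         # NameError
print(exception_name(lambda: open('no_such_file.txt')))               # FileNotFoundError
print(exception_name(lambda: next(iter([]))))                         # StopIteration
print(exception_name(failing_assert))                                 # AssertionError
print(exception_name(lambda: compile('if True:\nprint(1)', '<demo>', 'exec')))  # IndentationError
print(exception_name(lambda: __import__('no_such_module_xyz')))       # ModuleNotFoundError
print(IOError is OSError)                                             # True
print(issubclass(ModuleNotFoundError, ImportError))                   # True
```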

6.1 Throwing and catching exceptions

Exception handling includes raising (throwing) and catching.

Catching means wrapping specific statements in try...except, while raise throws an exception deliberately.

(1) Throw an exception

Exceptions come from two sources:

  • The program automatically throws: For example, 1/0 will automatically throw ZeroDivisionError

  • Raised by the developer: use the raise keyword to raise.

import os

def demo_func(filename):
    if not os.path.isfile(filename):
        raise Exception(f"{filename} is not a file")  # raise throws an exception

(2) Catch exception

There are four common forms of exception catching:

#====================== Only catch, without getting the exception information ======================#
try:
    code A
except [EXCEPTION]:
    code B

#========= Catch and also get the exception information: if code A raises, the exception is assigned to e and code B runs (for example, to log e) =========#
try:
    code A
except [EXCEPTION] as e:
    code B

#============= If code A raises, code B runs; if it does not, code C runs =============#
try:
    code A
except [EXCEPTION] as e:
    code B
else:
    code C

#============= If code A raises, code B runs; code C runs at the end whether or not there was an exception =============#
try:
    code A
except [EXCEPTION] as e:
    code B
finally:
    code C
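The templates above use placeholder code. A minimal runnable sketch combining except, else, and finally in a hypothetical safe_divide() function; note that finally still runs even when the except or else branch returns.

```python
def safe_divide(a, b):
    """Hypothetical example: divide a by b, returning None on division by zero."""
    try:
        result = a / b                # code A
    except ZeroDivisionError as e:
        print("error:", e)            # except branch: runs only if code A raised
        return None
    else:
        print("no exception")         # else branch: runs only if code A did not raise
        return result
    finally:
        print("finally always runs")  # finally branch: runs in every case, even after return

print(safe_divide(10, 2))  # no exception / finally always runs / 5.0
print(safe_divide(1, 0))   # error: division by zero / finally always runs / None
```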

(3) Catch multiple exceptions: except can catch one or more exceptions

1) Each except catches an exception

A try statement may have multiple except clauses to specify handlers for different exceptions, but at most one handler will be executed.

try:
    1/0  # an exception is raised here, because the divisor cannot be 0
except IOError:
    print("IO read and write error")
except FloatingPointError:
    # floating point calculation error
    print("calculation error")
except ZeroDivisionError:  # the exception is caught here: only this handler runs, the others are skipped
    # the divisor cannot be 0
    print("calculation error")  # final output: calculation error

2) One except catches multiple exceptions

An except clause can be followed by several exceptions written as a parenthesized tuple. If the raised exception matches any of them, it is caught and the corresponding branch runs.

try:
    1/0
except IOError:
    print("IO read and write error")
except (ZeroDivisionError, FloatingPointError): #Exception caught here
    print("calculation error")

(4) Custom exception

Custom exceptions should inherit from the Exception class, either directly or indirectly.

The custom exception class below, InputError, indicates that a problem occurred while accepting user input. Like the standard exceptions, its name ends with Error; custom exceptions should follow this naming convention too.

class InputError(Exception):
    def __init__(self, msg):
        super().__init__(msg)  # let Exception store the message as well
        self.message = msg

    def __str__(self):
        return self.message

def get_input():
    name = input("Please enter your name:")
    if name == '':
        raise InputError("No input content")

try:
    get_input()
except InputError as e:
    print(e)
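Custom exceptions are often organized as a small hierarchy under one base class, so callers can catch the whole family with a single handler (this matches the "abstract consistency of module exception classes" point in the summary). A minimal sketch; AppError, EmptyInputError, NotANumberError, and parse_age are hypothetical names.

```python
class AppError(Exception):
    """Base class for every error raised by this hypothetical module."""

class EmptyInputError(AppError):
    """Raised when the input is empty."""

class NotANumberError(AppError):
    """Raised when the input is not a whole number."""

def parse_age(text):
    text = text.strip()
    if text == '':
        raise EmptyInputError("No input content")
    if not text.isdigit():
        raise NotANumberError(f"Not a number: {text!r}")
    return int(text)

for raw in ["25", "", "abc"]:
    try:
        print(parse_age(raw))
    except AppError as e:  # one handler covers every subclass of AppError
        print(type(e).__name__, "-", e)
# 25
# EmptyInputError - No input content
# NotANumberError - Not a number: 'abc'
```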

(5) How to turn off the automatic exception context

If an exception is raised inside an exception handler or a finally block, Python by default attaches the previous exception to the new one as its __context__ attribute. This automatic exception context is enabled by default.

To control this context, add the from keyword to state which exception directly caused the new one. The expression after from must be an exception class or instance, or None; writing from None turns off the display of the automatic context in the traceback.

try:
    print(1/0)  # raises ZeroDivisionError
except Exception as exc:
    raise RuntimeError("Something bad happened") from exc  # raises RuntimeError; the ZeroDivisionError from 1/0 is stored as its __cause__ and shown in the traceback

You can also use the with_traceback() method, which sets the new exception's traceback (its __traceback__ attribute) so that the traceback shows where the original error happened.

try:
    print(1 / 0)
except Exception as exc:
    raise RuntimeError("bad thing").with_traceback(exc.__traceback__)
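A minimal sketch comparing the three cases (implicit context, from exc, from None) by inspecting the chaining attributes of the new exception; the function names are hypothetical. Note that from None does not delete __context__; it sets __suppress_context__ so the traceback does not display it.

```python
def implicit(value):
    try:
        return int(value)
    except ValueError:
        raise RuntimeError("conversion failed")            # context attached automatically

def explicit(value):
    try:
        return int(value)
    except ValueError as exc:
        raise RuntimeError("conversion failed") from exc   # explicit cause

def suppressed(value):
    try:
        return int(value)
    except ValueError:
        raise RuntimeError("conversion failed") from None  # hide the automatic context

def chain_info(func):
    """Call func('abc') and report (__cause__ type, __context__ type, __suppress_context__)."""
    try:
        func("abc")
    except RuntimeError as err:
        return (type(err.__cause__).__name__,
                type(err.__context__).__name__,
                err.__suppress_context__)

for f in (implicit, explicit, suppressed):
    print(f.__name__, chain_info(f))
# implicit ('NoneType', 'ValueError', False)
# explicit ('ValueError', 'ValueError', True)
# suppressed ('NoneType', 'ValueError', True)
```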

Summary:

  • Wrap only the statements that may raise an exception in try, and avoid vague, catch-all handling logic

  • Maintain the abstract consistency of module exception classes, and wrap the underlying exception classes when necessary

  • Repeated exception handling logic can be simplified by using a “context manager”
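As a minimal sketch of that last point: the standard library's contextlib.suppress replaces a try/except/pass block, and a small contextmanager can hold handling logic that would otherwise be repeated. The log_errors helper and the failed_tasks list are hypothetical.

```python
import contextlib

# contextlib.suppress ignores the listed exceptions inside the with block
with contextlib.suppress(FileNotFoundError):
    open("no_such_file.txt")  # the FileNotFoundError is ignored
print("still running")

failed_tasks = []  # hypothetical record of failed task names

@contextlib.contextmanager
def log_errors(task_name):
    """Hypothetical reusable handler: record and report an exception, then re-raise it."""
    try:
        yield
    except Exception as exc:
        failed_tasks.append(task_name)
        print(f"[{task_name}] failed with {type(exc).__name__}: {exc}")
        raise  # keep the exception visible to the caller

try:
    with log_errors("division"):
        1 / 0
except ZeroDivisionError:
    print("caught by the caller")

with log_errors("addition"):
    print(1 + 1)  # no exception, so nothing is recorded
```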