PHP preg_replace(): regular-expression replacement of all matching strings

PHP's preg_replace() differs from JavaScript's regular-expression replacement: by default, preg_replace() replaces every match of the pattern, not just the first one.

The data a program has to process is not always designed in advance with a database in mind, and some of it cannot be stored in a database structure at all.
Examples include template engines parsing templates and filters for spam or sensitive content.
In these cases we usually match with preg_match() and replace with preg_replace(), using regular expressions that encode our rules.
Most everyday applications, however, are little more than database CRUD, so there are few opportunities to work with regular expressions.
To summarize, there are two scenarios: statistics and analysis use matching; processing uses replacement.

preg_replace(pattern, replacement, subject, limit [maximum number of replacements; default -1, meaning no limit], count [variable that receives the number of replacements performed])
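
The limit and count parameters can be sketched as follows. This is a minimal illustration, assuming a made-up sample string; the variable names are not from the original article.

```php
<?php
// Hypothetical sample data, chosen only for illustration.
$text = 'a-b-c-d';

// Default limit (-1): every "-" is replaced, and $countAll receives the number of replacements.
$all = preg_replace('/-/', '+', $text, -1, $countAll);

// limit = 1: only the first "-" is replaced, similar to a JavaScript regex without "g".
$first = preg_replace('/-/', '+', $text, 1, $countFirst);

echo $all, ' (', $countAll, ")\n";     // a+b+c+d (3)
echo $first, ' (', $countFirst, ")\n"; // a+b-c-d (1)
```

Passing a limit of 1 is the closest PHP equivalent to replacing only the first match.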

Regular expressions in most languages are similar, but there are subtle differences.

PHP regular expressions

Character     Meaning
\             Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
^             Matches the start of the input string. If the m (multiline) modifier is set, ^ also matches the position immediately after a newline.
$             Matches the end of the input string. If the m (multiline) modifier is set, $ also matches the position immediately before a newline.
*             Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+             Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?             Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" and "does". ? is equivalent to {0,1}.
{n}           n is a non-negative integer. Matches exactly n times. For example, "o{2}" cannot match the "o" in "Bob", but it matches the two o's in "food".
{n,}          n is a non-negative integer. Matches at least n times. For example, "o{2,}" cannot match the "o" in "Bob", but it matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m}         m and n are non-negative integers, where n<=m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there can be no spaces between the comma and the two numbers.
?             When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), matching becomes non-greedy. Non-greedy matching consumes as little of the searched string as possible, while the default greedy matching consumes as much as possible. For example, for the string "oooo", "o+?" matches a single "o", and "o+" matches all the o's.
.             Matches any single character except "\n". To match any character including "\n", use a pattern like "[\s\S]".
(pattern)     Matches pattern and captures the match. In PHP, captured groups are available in the $matches array filled by preg_match() and as $1...$n in the replacement string of preg_replace(). To match literal parentheses, use "\(" or "\)".
(?:pattern)   Matches pattern but does not capture the result; the match is not stored for later use. This is useful when combining parts of a pattern with the "or" character "|". For example, "industr(?:y|ies)" is a more compact expression than "industry|industries".
(?=pattern)   Positive lookahead. Matches at any position where the text that follows matches pattern. This is a non-capturing match. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but not "Windows" in "Windows3.1". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)   Negative lookahead. Matches at any position where the text that follows does not match pattern. This is a non-capturing match. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but not "Windows" in "Windows2000".
(?<=pattern)  Positive lookbehind. Similar to positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches "Windows" in "2000Windows", but not "Windows" in "3.1Windows".
(?<!pattern)  Negative lookbehind. Similar to negative lookahead, but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" matches "Windows" in "3.1Windows", but not "Windows" in "2000Windows".
x|y           Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz]         Character set. Matches any one of the characters it contains. For example, "[abc]" matches the "a" in "plain".
[^xyz]        Negated character set. Matches any character not in the set. For example, "[^abc]" matches "p", "l", "i" and "n" in "plain".
[a-z]         Character range. Matches any character within the specified range. For example, "[a-z]" matches any lowercase letter from "a" through "z". Note: a hyphen denotes a range only when it is inside a character set and between two characters; at the beginning of a character set it stands for the hyphen itself.
[^a-z]        Negated character range. Matches any character not within the specified range. For example, "[^a-z]" matches any character that is not in the range "a" through "z".
\b            Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
\B            Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
\cx           Matches the control character specified by x. For example, \cM matches Control-M, the carriage return character. The value of x must be one of A-Z or a-z. Otherwise, c is treated as a literal "c" character.
\d            Matches a digit. Equivalent to [0-9].
\D            Matches a non-digit character. Equivalent to [^0-9].
\f            Matches a form feed character. Equivalent to \x0c and \cL.
\n            Matches a newline character. Equivalent to \x0a and \cJ.
\r            Matches a carriage return character. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\n\r\t\v].
\S            Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t            Matches a tab character. Equivalent to \x09 and \cI.
\v            Matches a vertical tab character. Equivalent to \x0b and \cK.
\w            Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W            Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn           Matches n, where n is a hexadecimal escape value exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num          Matches num, where num is a positive integer: a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n            (n is a digit) Identifies an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm           Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml          If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.
\un           Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). PHP's PCRE engine does not support \u; with the u modifier, write \x{00A9} instead.
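
A few of the table entries can be tried directly in PHP. The sketch below uses the table's own example strings; the variable names are only illustrative.

```php
<?php
// Greedy vs. non-greedy: "o+" takes every "o", "o+?" takes as few as possible.
preg_match('/o+/', 'oooo', $greedy);   // $greedy[0] is "oooo"
preg_match('/o+?/', 'oooo', $lazy);    // $lazy[0] is "o"

// Positive lookahead: "Windows" is replaced only when followed by 95, 98, NT or 2000.
// The lookahead text is not consumed, so "2000" stays in the result.
$ahead = preg_replace('/Windows(?=95|98|NT|2000)/', 'Win', 'Windows2000 Windows3.1');
// $ahead is "Win2000 Windows3.1"

// Negative lookbehind: "Windows" is replaced only when NOT preceded by 95, 98, NT or 2000.
$behind = preg_replace('/(?<!95|98|NT|2000)Windows/', 'Win', '2000Windows 3.1Windows');
// $behind is "2000Windows 3.1Win"

// Backreference: "(.)\1" finds two identical consecutive characters.
preg_match('/(.)\1/', 'food', $pair);  // $pair[0] is "oo"

echo $greedy[0], ' ', $lazy[0], ' ', $ahead, ' ', $behind, ' ', $pair[0], "\n";
```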

The table above is a fairly comprehensive reference for regular expressions. The characters listed in it have special meanings and no longer stand for themselves. For example, "+" in a regular expression is not a plus sign; it means "match one or more times". To make "+" stand for a literal plus sign, escape it with "\" in front, that is, write "\+".

To match the text 1+1=2, the regular expression is: 1\+1=2
The regular expression 1+1=2 instead matches one or more 1s followed by 1=2, that is:
11=2 matches the regular expression 1+1=2
111=2 matches the regular expression 1+1=2
1111=2 matches the regular expression 1+1=2
…

In other words, every metacharacter has a specific meaning, and to use it as a literal character you must escape it with "\" in front. Escaping a non-alphanumeric character that is not a metacharacter is harmless. A backslash before a letter or digit, however, has a meaning of its own (for example, "\1" is a backreference), so letters and digits must not be escaped.

For 1+1=2, the regular expression can also be written: 1\+1\=2
Here "=" is escaped even though it does not need to be. This works, but it is not recommended.
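
The difference between the escaped and unescaped "+" can be checked with preg_match(). This is a small sketch; the ^ and $ anchors and the variable names are additions for the illustration.

```php
<?php
// Unescaped "+" is a quantifier: "1+1=2" means one or more "1", then "1=2".
$quantifier  = preg_match('/^1+1=2$/', '111=2');  // 1 (matches)
$literalMiss = preg_match('/^1+1=2$/', '1+1=2');  // 0 (the "+" sign itself is not matched)

// Escaped "\+" is a literal plus sign.
$literalHit  = preg_match('/^1\+1=2$/', '1+1=2'); // 1 (matches)

var_dump($quantifier, $literalMiss, $literalHit);
```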

Regular expressions must be enclosed in delimiters. In JavaScript the delimiter is "/". In PHP, "/" is the most common delimiter, but "#" can also be used, and the whole pattern, delimiters included, must be written as a quoted string.

If the pattern itself contains the delimiter character, that character must be escaped.

PHP regular expression delimiter

Most languages delimit regular expressions with "/"; PHP also allows "#". If the pattern contains many "/" characters and "/" is the delimiter, every one of them must be escaped. With "#" as the delimiter, no escaping is needed and the pattern is more concise.

<?php
$weigeti='The URL of the W3CSchool online tutorial is http://e.jb51.net/. Can you replace this URL with the correct URL?';
// The requirement is to replace http://e.jb51.net/ with http://e.jb51.net/w3c/
// "." is a metacharacter and must be escaped. "/" is the delimiter, so every "/" in the pattern must be escaped too. ":" needs no escaping here.
echo preg_replace('/http:\/\/e\.jb51\.net\//','http://e.jb51.net/w3c/',$weigeti);
// When # is used as the delimiter, / is no longer a delimiter and does not need to be escaped.
echo preg_replace('#http://e\.jb51\.net/#','http://e.jb51.net/w3c/',$weigeti);
// Both calls produce the same output: [The URL of the W3CSchool online tutorial is http://e.jb51.net/w3c/. Can you replace this URL with the correct URL?]
?>

The two replacements above show that when a pattern contains many "/" characters, either "/" or "#" works as the delimiter, but "#" makes the code more concise. Even so, E-Dimension Technology recommends sticking with "/", because languages such as JavaScript accept only "/" as the delimiter. Keeping one habit makes patterns easier to carry over to other languages.
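
When the text to match is a literal string, PHP's preg_quote() can do the escaping, including the delimiter when it is passed as the second argument. A minimal sketch, assuming a made-up subject string:

```php
<?php
// Literal text that should be matched exactly, not interpreted as a pattern.
$url = 'http://e.jb51.net/';

// preg_quote() escapes regex metacharacters; the second argument also escapes the delimiter.
$pattern = '/' . preg_quote($url, '/') . '/';

// Hypothetical subject string for the illustration.
$subject = 'Visit http://e.jb51.net/ today';
$result = preg_replace($pattern, 'http://e.jb51.net/w3c/', $subject);

echo $result, "\n"; // Visit http://e.jb51.net/w3c/ today
```

This avoids hand-escaping each "/" and "." while still using "/" as the delimiter.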

PHP Regular Expression Modifiers

Modifiers are placed after the closing delimiter "/" of the PHP regular expression and before the closing quotation mark of the pattern string.

i  Ignores case: matching is case-insensitive.
m  Multiline: ^ and $ match at the start and end of each line rather than only of the whole string. If the string contains no newline character [\n], it behaves like an ordinary regular expression.
s  Makes the metacharacter . match the newline character [\n] as well. If not set, . cannot match \n.
x  Ignores unescaped whitespace in the pattern.
e  Runs the replacement through eval() as PHP code for each match. This modifier was deprecated in PHP 5.5 and removed in PHP 7.0.
A  Anchored: the match is constrained to start at the beginning of the target string.
D  Makes $ match only at the very end of the string. Without D, if the string ends with a newline character [\n], $ also matches just before that final newline. If modifier m is set, modifier D is ignored.
S  Spends extra time analyzing the pattern, which can speed up non-anchored patterns that are used repeatedly.
U  Inverts greediness: quantifiers are non-greedy by default, and adding "?" after a quantifier restores greedy matching.
X  Enables additional PCRE features that are incompatible with Perl.
u  Treats the pattern and the subject string as UTF-8. According to the author, this is generally needed in documents that are not UTF-8 encoded and is not recommended in a UTF-8 environment: E-Dimension Technology's investigation found a bug when using it there. Bug URL:
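
Several of these modifiers can be compared side by side. The sketch below uses made-up sample strings; the variable names are only illustrative.

```php
<?php
// i: case-insensitive matching.
$i = preg_replace('/php/i', 'X', 'PHP php Php');            // "X X X"

// x: unescaped whitespace in the pattern is ignored, so a pattern can be spaced out.
$x = preg_replace('/ (\d+) - (\d+) /x', '$2-$1', '10-20');  // "20-10"

// U: quantifiers become non-greedy by default.
$greedy = preg_replace('/<.+>/', '', '<b>bold</b>');        // "" (one match spans the whole string)
$lazy   = preg_replace('/<.+>/U', '', '<b>bold</b>');       // "bold" (each tag is matched separately)

// D: "$" matches only at the very end, not before a trailing newline.
$withoutD = preg_match('/^abc$/', "abc\n");                 // 1
$withD    = preg_match('/^abc$/D', "abc\n");                // 0

var_dump($i, $x, $greedy, $lazy, $withoutD, $withD);
```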

If you know JavaScript regular expressions, you will know the modifier "g", which means "match all occurrences". PHP's regular-expression replacement already replaces every match by default, so PHP has no equivalent of the JavaScript modifier "g".

PHP regular expressions: Chinese characters and ignoring case

By default, preg_replace() is case-sensitive and treats the subject as single-byte characters. To ignore case, add the modifier i; to handle multi-byte UTF-8 text such as Chinese, add the modifier u.

<?php
$weigeti='W3CSchool online tutorial URL: https://www.jb51.net/w3school/';
echo preg_replace('/w3cschool/','w3c',$weigeti);
// The case differs, so nothing is replaced. Output [W3CSchool online tutorial URL: https://www.jb51.net/w3school/]
echo preg_replace('/w3cschool/i','w3c',$weigeti);
// Case is ignored and the replacement is performed. Output [w3c online tutorial URL: https://www.jb51.net/w3school/]
echo preg_replace('/URL/u','',$weigeti);
// u treats the pattern and subject as UTF-8 (needed for Chinese text); the replacement is performed. Output [W3CSchool online tutorial : https://www.jb51.net/w3school/]
?>

PHP patterns are sensitive to both case and multi-byte encoding. JavaScript regular expressions are only case-sensitive: ignoring case also uses the modifier i, but JavaScript does not need to be told that the text contains multi-byte characters such as UTF-8 Chinese and can match them directly.
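
The effect of the u modifier on multi-byte text can be seen by counting what "." matches. This is a sketch that assumes PHP 7.0 or later for the "\u{...}" string escapes, which keep the example independent of the file's own encoding.

```php
<?php
// Two Chinese characters, written with Unicode escapes (PHP 7.0+).
$text = "\u{4E2D}\u{6587}";

// Without u, "." matches one byte at a time; each of these characters is 3 bytes in UTF-8.
$bytes = preg_match_all('/./', $text);

// With u, "." matches one whole UTF-8 character.
$chars = preg_match_all('/./u', $text);

echo $bytes, ' ', $chars, "\n"; // 6 2
```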

PHP regular expression line break example

When a PHP regular expression encounters a newline character, it treats it as an ordinary character in the middle of the string. However, the wildcard . cannot match \n, so there are several things to watch for when the subject string contains newlines.

<?php
$weigeti="jb51.net\nIS\nLOVING\nYOU";
// Goal: reduce $weigeti above to jb51.net
echo preg_replace('/^[A-Z].*[A-Z]$/','',$weigeti);
// Without modifiers, ^ and $ anchor to the whole string. $weigeti starts with the lowercase j, which does not fit [A-Z], and . cannot match \n, so nothing matches.
// Output is unchanged [jb51.net, IS, LOVING and YOU, each on its own line]
echo preg_replace('/^[A-Z].*[A-Z]$/s','',$weigeti);
// With the modifier s, . can match \n, but the string still starts with the lowercase j, so nothing matches.
// Output is unchanged
echo preg_replace('/^[A-Z].*[A-Z]$/m','',$weigeti);
// With the modifier m, each line separated by \n is matched independently. It is equivalent to:
/*
$preg_m=preg_replace('/^[A-Z].*[A-Z]$/m','',$weigeti);
$p='/^[A-Z].*[A-Z]$/';
$a=preg_replace($p,'','jb51.net');
$b=preg_replace($p,'','IS');
$c=preg_replace($p,'','LOVING');
$d=preg_replace($p,'','YOU');
$preg_m === $a."\n".$b."\n".$c."\n".$d;
*/
// IS, LOVING and YOU are removed and the newlines remain. Output [jb51.net followed by three newline characters]
?>

When you crawl website content with PHP and batch-replace it with regular expressions, it is easy to overlook that the fetched content contains line breaks, so take care when using regular-expression replacement.
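
The three behaviours (no modifier, s, and m) are easiest to compare on a subject in which every line starts and ends with an uppercase letter. A minimal sketch, assuming a made-up subject string:

```php
<?php
// Hypothetical subject: every line starts and ends with an uppercase letter.
$lines = "IS\nLOVING\nYOU";

// No modifier: "." cannot cross "\n", so the pattern cannot span the whole string.
$plain  = preg_replace('/^[A-Z].*[A-Z]$/', '', $lines);   // unchanged

// s: "." also matches "\n", so the whole string is one match.
$single = preg_replace('/^[A-Z].*[A-Z]$/s', '', $lines);  // ""

// m: "^" and "$" anchor at every line, so each line is matched and removed separately.
$multi  = preg_replace('/^[A-Z].*[A-Z]$/m', '', $lines);  // "\n\n" (only the newlines remain)

var_dump($plain, $single, $multi);
```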

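Executing a function on matches in PHP regular-expression replacement

PHP regular-expression replacement can use the modifier e, which runs the replacement through eval() so that a function can be executed on the matched content. The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() is its replacement.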

<?php
// The URL is written in uppercase here so that the conversion is visible.
$weigeti='W3CSchool online tutorial website: HTTPS://WWW.JB51.NET/, are you Jbzj!?';
// Convert the URL above to lowercase (the e modifier works only in PHP versions before 7.0)
echo preg_replace('/(https?:[\/\w.\-]+\/)/ie','strtolower("$1")',$weigeti);
// With the modifier e, the PHP function strtolower() is executed on the matched URL.
// Output [W3CSchool online tutorial website: https://www.jb51.net/, are you Jbzj!?]
?>

As the code above shows, although strtolower() is written inside quotation marks in the replacement string, it is still executed by eval().
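
On PHP 7.0 and later, the same effect can be obtained with preg_replace_callback(), which passes each match to a function instead of evaluating a string. A minimal sketch, assuming a made-up subject with an uppercase URL so the effect is visible:

```php
<?php
// Hypothetical subject string with an uppercase URL.
$subject = 'Website: HTTPS://WWW.JB51.NET/, are you Jbzj!?';

// The callback receives the match array; $m[1] is the first captured group.
$result = preg_replace_callback(
    '/(https?:[\/\w.\-]+\/)/i',
    function (array $m) {
        return strtolower($m[1]);
    },
    $subject
);

echo $result, "\n"; // Website: https://www.jb51.net/, are you Jbzj!?
```

Because no string is evaluated as code, matched text cannot be executed by accident.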

Backreferences to matched groups in regular-expression replacement

If you know JavaScript, you will be familiar with backreferences such as $1, $2, $3 and so on. PHP accepts the same notation in the replacement string, and it also lets you write \1, \2 and so on for the same backreferences.

A backreference works like this: the regular expression matches one large fragment, and parentheses inside the pattern divide it into smaller captured groups. Each group can then be referenced in the replacement by its number, counted in the order of the opening parentheses.

<?php
$weigeti='W3CSchool online tutorial website: https://www.jb51.net, are you Jbzj!?';
echo preg_replace('/.+(https?:[\w\-\/.]+)[^\w\-!]+([\w\-!]+).+/','$1',$weigeti);
echo preg_replace('/.+(https?:[\w\-\/.]+)[^\w\-!]+([\w\-!]+).+/','\\1',$weigeti);
echo preg_replace('/.+(https?:[\w\-\/.]+)[^\w\-!]+([\w\-!]+).+/','\1',$weigeti);
// All three output [https://www.jb51.net]
echo preg_replace('/^(.+) website: (https?:[\w\-\/.]+), are you ([\w\-!]+).+$/','Column: $1<br>Website: $2<br>Trademark: $3',$weigeti);
/*
Column: W3CSchool online tutorial
Website: https://www.jb51.net
Trademark: Jbzj!
*/
// With nested parentheses, groups are numbered by their opening parenthesis, so the outer group is counted first.
echo preg_replace('/^((.+) website: (https?:[\w\-\/.]+), are you ([\w\-!]+).+)$/','Original text: $1<br>Column: $2<br>Website: $3<br>Trademark: $4',$weigeti);
/*
Original text: W3CSchool online tutorial website: https://www.jb51.net, are you Jbzj!?
Column: W3CSchool online tutorial
Website: https://www.jb51.net
Trademark: Jbzj!
*/
?>
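
One detail worth knowing when a backreference is followed directly by a digit: PHP accepts the form ${1} to mark where the group number ends. A small sketch, assuming a made-up subject string:

```php
<?php
// Goal: append a literal "0" to every number. Hypothetical sample data.
$subject = 'width 5, height 12';

// '${1}0' is group 1 followed by a literal "0".
// Writing '$10' instead would be read as a reference to group 10.
$result = preg_replace('/(\d+)/', '${1}0', $subject);

echo $result, "\n"; // width 50, height 120
```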